[jira] [Assigned] (SPARK-33108) Remove sbt-dependency-graph SBT plugin
[ https://issues.apache.org/jira/browse/SPARK-33108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33108: - Assignee: Dongjoon Hyun > Remove sbt-dependency-graph SBT plugin > -- > > Key: SPARK-33108 > URL: https://issues.apache.org/jira/browse/SPARK-33108 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33108) Remove sbt-dependency-graph SBT plugin
[ https://issues.apache.org/jira/browse/SPARK-33108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33108. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29997 [https://github.com/apache/spark/pull/29997] > Remove sbt-dependency-graph SBT plugin > -- > > Key: SPARK-33108 > URL: https://issues.apache.org/jira/browse/SPARK-33108 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33104) Fix `YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil`
[ https://issues.apache.org/jira/browse/SPARK-33104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211563#comment-17211563 ] Yang Jie commented on SPARK-33104: -- Is this an inevitable problem? `mvn test` can pass, may need to add some log to determine the file loading path of `core-site.xml` > Fix `YarnClusterSuite.yarn-cluster should respect conf overrides in > SparkHadoopUtil` > > > Key: SPARK-33104 > URL: https://issues.apache.org/jira/browse/SPARK-33104 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/1377/testReport/org.apache.spark.deploy.yarn/YarnClusterSuite/yarn_cluster_should_respect_conf_overrides_in_SparkHadoopUtil__SPARK_16414__SPARK_23630_/ > {code} > 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: > Exit code from container container_1602245728426_0006_02_01 is : 15 > 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: > Exception from container-launch with container ID: > container_1602245728426_0006_02_01 and exit code: 15 > ExitCodeException exitCode=15: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 20/10/09 05:18:13.211 ContainersLauncher #0 WARN ContainerLaunch: Container > exited with a non-zero exit code 15 > 20/10/09 05:18:13.237 AsyncDispatcher event handler WARN NMAuditLogger: > USER=jenkins OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE > APPID=application_1602245728426_0006 > CONTAINERID=container_1602245728426_0006_02_01 > 20/10/09 05:18:13.244 Socket Reader #1 for port 37112 INFO Server: Auth > successful for appattempt_1602245728426_0006_02 (auth:SIMPLE) > 20/10/09 05:18:13.326 IPC Parameter Sending Thread #0 DEBUG Client: IPC > Client (1123559518) connection to > amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins sending #37 > 20/10/09 05:18:13.327 IPC Client (1123559518) connection to > amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins DEBUG Client: IPC > Client (1123559518) connection to > amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins got value #37 > 20/10/09 05:18:13.328 main DEBUG ProtobufRpcEngine: Call: > getApplicationReport took 2ms > 20/10/09 05:18:13.328 main INFO Client: Application report for > application_1602245728426_0006 (state: FINISHED) > 20/10/09 05:18:13.328 main DEBUG Client: >client token: N/A >diagnostics: User class threw exception: > org.scalatest.exceptions.TestFailedException: null was not equal to > "testvalue" > at > 
org.scalatest.matchers.MatchersHelper$.indicateFailure(MatchersHelper.scala:344) > at > org.scalatest.matchers.should.Matchers$ShouldMethodHelperClass.shouldMatcher(Matchers.scala:6778) > at > org.scalatest.matchers.should.Matchers$AnyShouldWrapper.should(Matchers.scala:6822) > at > org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.$anonfun$main$2(YarnClusterSuite.scala:383) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) > at > org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.main(YarnClusterSuite.scala:382) > at > org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf.main(YarnClusterSuite.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMe
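For context, the assertion that fails above ("null was not equal to \"testvalue\"") checks that a configuration value passed to Spark reaches the Hadoop configuration seen by the driver. A minimal local sketch of that mechanism, assuming an illustrative key name and using the public sparkContext.hadoopConfiguration rather than the internal SparkHadoopUtil path the suite exercises in yarn-cluster mode:

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative key/value; the real suite uses its own keys and the value "testvalue".
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("hadoop-conf-override-sketch")
  // Any "spark.hadoop.*" entry is copied into the Hadoop Configuration with the prefix stripped.
  .config("spark.hadoop.test.overridden.key", "testvalue")
  .getOrCreate()

val observed = spark.sparkContext.hadoopConfiguration.get("test.overridden.key")
assert(observed == "testvalue", s"expected 'testvalue' but got $observed")

spark.stop()
{code}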
[jira] [Created] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back
Dongjoon Hyun created SPARK-33109: - Summary: Upgrade to SBT 1.4 and support `dependencyTree` back Key: SPARK-33109 URL: https://issues.apache.org/jira/browse/SPARK-33109 Project: Spark Issue Type: Task Components: Build Affects Versions: 3.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
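A note on how `dependencyTree` can come back once the build is on sbt 1.4: as far as I understand, sbt 1.4 in-sources the sbt-dependency-graph functionality, so the task can be enabled from `project/plugins.sbt` without the third-party plugin that SPARK-33108 removes. A sketch under that assumption, not a description of the final Spark change:

{code:scala}
// project/plugins.sbt (sbt 1.4+)
// Assumption: sbt 1.4 ships the dependency-graph plugin in-sourced, enabled by the helper
// below instead of an addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % ...) line.
addDependencyTreePlugin
{code}

From the sbt shell the task should then be available per project and configuration, e.g. `core / Compile / dependencyTree`.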
[jira] [Updated] (SPARK-33094) ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33094: - Fix Version/s: 2.4.8 > ORC format does not propagate Hadoop config from DS options to underlying > HDFS file system > -- > > Key: SPARK-33094 > URL: https://issues.apache.org/jira/browse/SPARK-33094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > When running: > {code:java} > spark.read.format("orc").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
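To make the reported behaviour concrete, here is a hedged sketch of the usage pattern the fix targets: Hadoop settings passed as data source options, which are expected to reach the FileSystem that the ORC reader opens. The property name and path below are illustrative only:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("orc-options-sketch").getOrCreate()

// Hadoop settings passed as data source options; before the fix the ORC source did not
// forward them to the underlying FileSystem, so per-read settings like this were ignored.
val conf = Map("fs.s3a.access.key" -> "<access-key>")   // illustrative Hadoop property

val df = spark.read
  .format("orc")
  .options(conf)
  .load("s3a://some-bucket/some/path")                  // hypothetical path
{code}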
[jira] [Updated] (SPARK-33079) Replace the existing Maven job for Scala 2.13 in Github Actions with SBT job
[ https://issues.apache.org/jira/browse/SPARK-33079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-33079: --- Summary: Replace the existing Maven job for Scala 2.13 in Github Actions with SBT job (was: Add Scala 2.13 build test in GitHub Action for SBT) > Replace the existing Maven job for Scala 2.13 in Github Actions with SBT job > > > Key: SPARK-33079 > URL: https://issues.apache.org/jira/browse/SPARK-33079 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > SPARK-32926 added a build test to GitHub Action for Scala 2.13 but it's only > with Maven. > As SPARK-32873 reported, some compilation error happens only with SBT so I > think we need to add another build test to GitHub Action for SBT. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32082) Project Zen: Improving Python usability
[ https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211560#comment-17211560 ] Hyukjin Kwon commented on SPARK-32082: -- Nope, these are all the JIRAs linked here. I should still collect feedback and investigate with a proper design for that. Feel free to send an email (with cc'ing me) or file a JIRA if you have a concrete idea. > Project Zen: Improving Python usability > --- > > Key: SPARK-32082 > URL: https://issues.apache.org/jira/browse/SPARK-32082 > Project: Spark > Issue Type: Epic > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > > The importance of Python and PySpark has grown radically in the last few > years. The number of PySpark downloads reached [more than 1.3 million _every > week_|https://pypistats.org/packages/pyspark] when we count them _only_ in > PyPI. Nevertheless, PySpark is still less Pythonic. It exposes many JVM error > messages as an example, and the API documentation is poorly written. > This epic tickets aims to improve the usability in PySpark, and make it more > Pythonic. To be more explicit, this JIRA targets four bullet points below. > Each includes examples: > * Being Pythonic > ** Pandas UDF enhancements and type hints > ** Avoid dynamic function definitions, for example, at {{funcitons.py}} > which makes IDEs unable to detect. > * Better and easier usability in PySpark > ** User-facing error message and warnings > ** Documentation > ** User guide > ** Better examples and API documentation, e.g. > [Koalas|https://koalas.readthedocs.io/en/latest/] and > [pandas|https://pandas.pydata.org/docs/] > * Better interoperability with other Python libraries > ** Visualization and plotting > ** Potentially better interface by leveraging Arrow > ** Compatibility with other libraries such as NumPy universal functions or > pandas possibly by leveraging Koalas > * PyPI Installation > ** PySpark with Hadoop 3 support on PyPi > ** Better error handling -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-32082) Project Zen: Improving Python usability
[ https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211560#comment-17211560 ] Hyukjin Kwon edited comment on SPARK-32082 at 10/10/20, 4:55 AM: - Nope, these are all the JIRAs linked here. I should still collect feedback and investigate with a proper design for that. Feel free to send an email to dev mailing list (with cc'ing me) or file a JIRA if you have a concrete idea. was (Author: hyukjin.kwon): Nope, these are all the JIRAs linked here. I should still collect feedback and investigate with a proper design for that. Feel free to send an email (with cc'ing me) or file a JIRA if you have a concrete idea. > Project Zen: Improving Python usability > --- > > Key: SPARK-32082 > URL: https://issues.apache.org/jira/browse/SPARK-32082 > Project: Spark > Issue Type: Epic > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > > The importance of Python and PySpark has grown radically in the last few > years. The number of PySpark downloads reached [more than 1.3 million _every > week_|https://pypistats.org/packages/pyspark] when we count them _only_ in > PyPI. Nevertheless, PySpark is still less Pythonic. It exposes many JVM error > messages as an example, and the API documentation is poorly written. > This epic tickets aims to improve the usability in PySpark, and make it more > Pythonic. To be more explicit, this JIRA targets four bullet points below. > Each includes examples: > * Being Pythonic > ** Pandas UDF enhancements and type hints > ** Avoid dynamic function definitions, for example, at {{funcitons.py}} > which makes IDEs unable to detect. > * Better and easier usability in PySpark > ** User-facing error message and warnings > ** Documentation > ** User guide > ** Better examples and API documentation, e.g. > [Koalas|https://koalas.readthedocs.io/en/latest/] and > [pandas|https://pandas.pydata.org/docs/] > * Better interoperability with other Python libraries > ** Visualization and plotting > ** Potentially better interface by leveraging Arrow > ** Compatibility with other libraries such as NumPy universal functions or > pandas possibly by leveraging Koalas > * PyPI Installation > ** PySpark with Hadoop 3 support on PyPi > ** Better error handling -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33102) Use stringToSeq on SQL list typed parameters
[ https://issues.apache.org/jira/browse/SPARK-33102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33102: Assignee: Gabor Somogyi > Use stringToSeq on SQL list typed parameters > > > Key: SPARK-33102 > URL: https://issues.apache.org/jira/browse/SPARK-33102 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33102) Use stringToSeq on SQL list typed parameters
[ https://issues.apache.org/jira/browse/SPARK-33102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33102. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29989 [https://github.com/apache/spark/pull/29989] > Use stringToSeq on SQL list typed parameters > > > Key: SPARK-33102 > URL: https://issues.apache.org/jira/browse/SPARK-33102 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33105) Broken installation of source packages on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-33105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33105. -- Fix Version/s: 3.1.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/29991 > Broken installation of source packages on AppVeyor > -- > > Key: SPARK-33105 > URL: https://issues.apache.org/jira/browse/SPARK-33105 > Project: Spark > Issue Type: Bug > Components: Project Infra, R >Affects Versions: 3.1.0 > Environment: *strong text* >Reporter: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.0 > > > It looks like AppVeyor configuration is broken, which leads to failure of > installation of source packages (become a problem when {{rlang}} has been > updated from 0.4.7 and 0.4.8, with latter available only as a source package). > {code} > [00:01:48] trying URL > 'https://cloud.r-project.org/src/contrib/rlang_0.4.8.tar.gz' > [00:01:48] Content type 'application/x-gzip' length 847517 bytes (827 KB) > [00:01:48] == > [00:01:48] downloaded 827 KB > [00:01:48] > [00:01:48] Warning in strptime(xx, f, tz = tz) : > [00:01:48] unable to identify current timezone 'C': > [00:01:48] please set environment variable 'TZ' > [00:01:49] * installing *source* package 'rlang' ... > [00:01:49] ** package 'rlang' successfully unpacked and MD5 sums checked > [00:01:49] ** using staged installation > [00:01:49] ** libs > [00:01:49] > [00:01:49] *** arch - i386 > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c capture.c -o capture.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c export.c -o export.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c internal.c -o internal.o > [00:01:50] In file included from ./lib/rlang.h:74, > [00:01:50] from internal/arg.c:1, > [00:01:50] from internal.c:1: > [00:01:50] internal/eval-tidy.c: In function 'rlang_tilde_eval': > [00:01:50] ./lib/env.h:33:10: warning: 'top' may be used uninitialized > in this function [-Wmaybe-uninitialized] > [00:01:50]return ENCLOS(env); > [00:01:50] ^~~ > [00:01:50] In file included from internal.c:8: > [00:01:50] internal/eval-tidy.c:406:9: note: 'top' was declared here > [00:01:50]sexp* top; > [00:01:50] ^~~ > [00:01:50] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c lib.c -o lib.o > [00:01:51] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c version.c -o version.o > [00:01:52] C:/Rtools40/mingw64/bin/gcc -shared -s -static-libgcc -o > rlang.dll tmp.def capture.o export.o internal.o lib.o version.o > -LC:/R/bin/i386 -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > cannot find -lR > [00:01:52] collect2.exe: error: ld returned 1 exit status > [00:01:52] no DLL was created > [00:01:52] ERROR: compilation failed for package 
'rlang' > [00:01:52] * removing 'C:/RLibrary/rlang' > [00:01:52] > [00:01:52] The downloaded source packages are in > [00:01:52] > 'C:\Users\appveyor\AppData\Local\Temp\1\Rtmp8qrryA\downloaded_packages' > [00:01:52] Warning message: > [00:01:52] In install.packages(c("knitr", "rmarkdown", "testthat", > "e1071", : > [00:01:52] installation of package 'rlang' had non-zero exit status > {code} > This leads to failures to install {{devtools}} and generate Rd files and, as > a result, CRAN check failure. > There are some discrepancies in the > {{dev/appveyor-install-dependencies.ps1}}, but the direct source of this > issue seems to be {{$env:BINPREF}}, which forces usage of 64 bit mingw, even > if packages are compiled for 32 bit. > Modifying the variable to include current architecture: > {code} > $env:BINPREF=$RtoolsDrive + '/Rtools40/mingw$(WIN)/bin/' > {code} > (as proposed [here|https://stackoverflow.com/a/44035904] by R Yoda) looks > like a valid fix, though we might want to clean remaining issues as well. -- This message was s
[jira] [Commented] (SPARK-32907) adaptively blockify instances
[ https://issues.apache.org/jira/browse/SPARK-32907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211550#comment-17211550 ] Apache Spark commented on SPARK-32907: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/29998 > adaptively blockify instances > - > > Key: SPARK-32907 > URL: https://issues.apache.org/jira/browse/SPARK-32907 > Project: Spark > Issue Type: Sub-task > Components: ML > Affects Versions: 3.1.0 > Reporter: zhengruifeng > Priority: Major > Attachments: blockify_svc_perf_20201010.xlsx > > > According to the performance test in https://issues.apache.org/jira/browse/SPARK-31783, the performance gain is mainly related to the nnz of a block, so it is reasonable to control the block size. > I had some offline discussion with [~weichenxu123], and we think the following changes are worthwhile: > 1. infer an appropriate blockSize (in MB) based on numFeatures and nnz by default; > 2. implementations should use a relatively small memory footprint when processing one block and should not use a large pre-allocated buffer, so we need to revert GMM; > 3. use the new blockify strategy in LinearSVC/LoR/LiR/AFT. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32907) adaptively blockify instances
[ https://issues.apache.org/jira/browse/SPARK-32907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211549#comment-17211549 ] Apache Spark commented on SPARK-32907: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/29998 > adaptively blockify instances > - > > Key: SPARK-32907 > URL: https://issues.apache.org/jira/browse/SPARK-32907 > Project: Spark > Issue Type: Sub-task > Components: ML > Affects Versions: 3.1.0 > Reporter: zhengruifeng > Priority: Major > Attachments: blockify_svc_perf_20201010.xlsx > > > According to the performance test in https://issues.apache.org/jira/browse/SPARK-31783, the performance gain is mainly related to the nnz of a block, so it is reasonable to control the block size. > I had some offline discussion with [~weichenxu123], and we think the following changes are worthwhile: > 1. infer an appropriate blockSize (in MB) based on numFeatures and nnz by default; > 2. implementations should use a relatively small memory footprint when processing one block and should not use a large pre-allocated buffer, so we need to revert GMM; > 3. use the new blockify strategy in LinearSVC/LoR/LiR/AFT. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33108) Remove sbt-dependency-graph SBT plugin
[ https://issues.apache.org/jira/browse/SPARK-33108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211547#comment-17211547 ] Apache Spark commented on SPARK-33108: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29997 > Remove sbt-dependency-graph SBT plugin > -- > > Key: SPARK-33108 > URL: https://issues.apache.org/jira/browse/SPARK-33108 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33108) Remove sbt-dependency-graph SBT plugin
[ https://issues.apache.org/jira/browse/SPARK-33108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33108: Assignee: Apache Spark > Remove sbt-dependency-graph SBT plugin > -- > > Key: SPARK-33108 > URL: https://issues.apache.org/jira/browse/SPARK-33108 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33108) Remove sbt-dependency-graph SBT plugin
[ https://issues.apache.org/jira/browse/SPARK-33108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33108: Assignee: (was: Apache Spark) > Remove sbt-dependency-graph SBT plugin > -- > > Key: SPARK-33108 > URL: https://issues.apache.org/jira/browse/SPARK-33108 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32907) adaptively blockify instances
[ https://issues.apache.org/jira/browse/SPARK-32907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-32907: - Attachment: blockify_svc_perf_20201010.xlsx > adaptively blockify instances > - > > Key: SPARK-32907 > URL: https://issues.apache.org/jira/browse/SPARK-32907 > Project: Spark > Issue Type: Sub-task > Components: ML > Affects Versions: 3.1.0 > Reporter: zhengruifeng > Priority: Major > Attachments: blockify_svc_perf_20201010.xlsx > > > According to the performance test in https://issues.apache.org/jira/browse/SPARK-31783, the performance gain is mainly related to the nnz of a block, so it is reasonable to control the block size. > I had some offline discussion with [~weichenxu123], and we think the following changes are worthwhile: > 1. infer an appropriate blockSize (in MB) based on numFeatures and nnz by default; > 2. implementations should use a relatively small memory footprint when processing one block and should not use a large pre-allocated buffer, so we need to revert GMM; > 3. use the new blockify strategy in LinearSVC/LoR/LiR/AFT. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
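A minimal sketch of point 1 in the SPARK-32907 description above (inferring a block size from numFeatures and nnz). All names, the byte-size assumptions, and the 1 MB default target are hypothetical; this is not Spark's actual heuristic:

{code:scala}
// Pick a per-block row count so that one block stays near a target size in MB.
def inferRowsPerBlock(
    numFeatures: Long,
    avgNnzPerRow: Double,
    sparse: Boolean,
    targetSizeMB: Double = 1.0): Int = {
  // Rough per-row memory: dense rows store every feature (8 bytes per value),
  // sparse rows only the non-zeros (8-byte value + 4-byte index each).
  val bytesPerRow = if (sparse) avgNnzPerRow * (8 + 4) else numFeatures * 8.0
  math.max((targetSizeMB * 1024 * 1024 / math.max(bytesPerRow, 1.0)).toInt, 1)
}

// e.g. 10k sparse features with ~100 non-zeros per row => roughly 870 rows per 1 MB block
val rowsPerBlock = inferRowsPerBlock(numFeatures = 10000L, avgNnzPerRow = 100.0, sparse = true)
{code}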
[jira] [Created] (SPARK-33108) Remove sbt-dependency-graph SBT plugin
Dongjoon Hyun created SPARK-33108: - Summary: Remove sbt-dependency-graph SBT plugin Key: SPARK-33108 URL: https://issues.apache.org/jira/browse/SPARK-33108 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33107) Remove hive-2.3 workaround code
[ https://issues.apache.org/jira/browse/SPARK-33107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33107: Assignee: (was: Apache Spark) > Remove hive-2.3 workaround code > --- > > Key: SPARK-33107 > URL: https://issues.apache.org/jira/browse/SPARK-33107 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We can make code more clear and readable after SPARK-33082. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33107) Remove hive-2.3 workaround code
[ https://issues.apache.org/jira/browse/SPARK-33107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33107: Assignee: Apache Spark > Remove hive-2.3 workaround code > --- > > Key: SPARK-33107 > URL: https://issues.apache.org/jira/browse/SPARK-33107 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > We can make code more clear and readable after SPARK-33082. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33107) Remove hive-2.3 workaround code
[ https://issues.apache.org/jira/browse/SPARK-33107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211526#comment-17211526 ] Apache Spark commented on SPARK-33107: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/29996 > Remove hive-2.3 workaround code > --- > > Key: SPARK-33107 > URL: https://issues.apache.org/jira/browse/SPARK-33107 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We can make code more clear and readable after SPARK-33082. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33107) Remove hive-2.3 workaround code
Yuming Wang created SPARK-33107: --- Summary: Remove hive-2.3 workaround code Key: SPARK-33107 URL: https://issues.apache.org/jira/browse/SPARK-33107 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang We can make code more clear and readable after SPARK-33082. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211522#comment-17211522 ] jiaan.geng commented on SPARK-33045: I'm working on. > Implement built-in LIKE ANY and LIKE ALL UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support LIKE ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement built-in LIKE ANY and LIKE ALL UDF to fix this issue. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
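To make the SPARK-33045 report concrete: the stack trace above suggests the pattern list is expanded into a deep tree of binary expressions, which is what overflows the stack once the list gets very long. A hedged reproduction sketch; the table, column, and the element count actually needed to overflow are illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("like-any-sketch").getOrCreate()
spark.range(10).selectExpr("concat('item', id) AS name").createOrReplaceTempView("t")

// Small pattern lists are fine:
spark.sql("SELECT * FROM t WHERE name LIKE ANY ('item1%', 'item2%')").show()

// The report: with many thousands of patterns the expanded expression tree gets deep enough
// to throw StackOverflowError (the ticket mentions more than 14378 elements).
val patterns = (0 until 20000).map(i => s"'p$i%'").mkString(", ")
// spark.sql(s"SELECT * FROM t WHERE name LIKE ANY ($patterns)").show()
{code}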
[jira] [Assigned] (SPARK-33080) Replace compiler reporter with more robust and maintainable solution
[ https://issues.apache.org/jira/browse/SPARK-33080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33080: Assignee: Apache Spark > Replace compiler reporter with more robust and maintainable solution > > > Key: SPARK-33080 > URL: https://issues.apache.org/jira/browse/SPARK-33080 > Project: Spark > Issue Type: Improvement > Components: Build > Affects Versions: 3.1.0 > Reporter: Denis Pyshev > Assignee: Apache Spark > Priority: Minor > > The existing solution for failing the build on any warning except deprecation ones ([https://github.com/apache/spark/blob/v3.0.1/project/SparkBuild.scala#L285]) is hard to maintain across the upgrade to the latest sbt. > With sbt 1.4.0 that snippet breaks the build import entirely. > Implement a new solution that switches on the compiler version: the silencer compiler plugin for Scala before 2.13.2 and the built-in warning configuration from Scala 2.13.2 onward. > Depends on the changes for SPARK-21708. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33080) Replace compiler reporter with more robust and maintainable solution
[ https://issues.apache.org/jira/browse/SPARK-33080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33080: Assignee: (was: Apache Spark) > Replace compiler reporter with more robust and maintainable solution > > > Key: SPARK-33080 > URL: https://issues.apache.org/jira/browse/SPARK-33080 > Project: Spark > Issue Type: Improvement > Components: Build > Affects Versions: 3.1.0 > Reporter: Denis Pyshev > Priority: Minor > > The existing solution for failing the build on any warning except deprecation ones ([https://github.com/apache/spark/blob/v3.0.1/project/SparkBuild.scala#L285]) is hard to maintain across the upgrade to the latest sbt. > With sbt 1.4.0 that snippet breaks the build import entirely. > Implement a new solution that switches on the compiler version: the silencer compiler plugin for Scala before 2.13.2 and the built-in warning configuration from Scala 2.13.2 onward. > Depends on the changes for SPARK-21708. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33080) Replace compiler reporter with more robust and maintainable solution
[ https://issues.apache.org/jira/browse/SPARK-33080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211494#comment-17211494 ] Apache Spark commented on SPARK-33080: -- User 'gemelen' has created a pull request for this issue: https://github.com/apache/spark/pull/29995 > Replace compiler reporter with more robust and maintainable solution > > > Key: SPARK-33080 > URL: https://issues.apache.org/jira/browse/SPARK-33080 > Project: Spark > Issue Type: Improvement > Components: Build > Affects Versions: 3.1.0 > Reporter: Denis Pyshev > Priority: Minor > > The existing solution for failing the build on any warning except deprecation ones ([https://github.com/apache/spark/blob/v3.0.1/project/SparkBuild.scala#L285]) is hard to maintain across the upgrade to the latest sbt. > With sbt 1.4.0 that snippet breaks the build import entirely. > Implement a new solution that switches on the compiler version: the silencer compiler plugin for Scala before 2.13.2 and the built-in warning configuration from Scala 2.13.2 onward. > Depends on the changes for SPARK-21708. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
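A hedged sketch of the "switch over compiler versions" idea from the SPARK-33080 description: Scala 2.13.2+ has the built-in `-Wconf` warning configuration, while older Scala versions would keep relying on the silencer compiler plugin. The version check is simplified to the 2.13 line, and the exact filters Spark ends up using may differ; this illustrates the approach, not the actual SparkBuild.scala change:

{code:scala}
// build.sbt fragment (illustrative)
scalacOptions ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, n)) if n >= 13 =>
      // Built-in warning configuration: keep deprecations as verbose warnings,
      // escalate every other warning to an error.
      Seq("-Wconf:cat=deprecation:wv,any:e")
    case _ =>
      // Scala before 2.13.2: no -Wconf; suppression/escalation is handled by the
      // silencer compiler plugin added elsewhere via addCompilerPlugin(...).
      Seq.empty
  }
}
{code}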
[jira] [Created] (SPARK-33106) Fix sbt resolvers clash
Denis Pyshev created SPARK-33106: Summary: Fix sbt resolvers clash Key: SPARK-33106 URL: https://issues.apache.org/jira/browse/SPARK-33106 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.1.0 Reporter: Denis Pyshev During the sbt upgrade from 0.13 to 1.x, the resolvers list was carried over as-is. That leads to a name clash between local resolvers, which shows up as a warning from sbt: {code:java} [warn] Multiple resolvers having different access mechanism configured with same name 'local'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolver (`publishTo`). {code} This needs to be fixed to avoid potential errors and to reduce log noise. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
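One common way to address the warning quoted in SPARK-33106 is to give the publishing resolver a name other than "local", so it no longer collides with sbt's default local Ivy resolver. A sketch under that assumption; the name and path are illustrative, not Spark's actual build change:

{code:scala}
// build.sbt fragment (illustrative): publish to a local Maven cache under a distinct name
// instead of reusing the name "local" that sbt's default Ivy resolver already takes.
publishTo := Some(MavenCache("local-publish", file(sys.props("user.home")) / ".m2" / "repository"))
{code}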
[jira] [Updated] (SPARK-33104) Fix `YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil`
[ https://issues.apache.org/jira/browse/SPARK-33104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33104: -- Summary: Fix `YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil` (was: Fix YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil) > Fix `YarnClusterSuite.yarn-cluster should respect conf overrides in > SparkHadoopUtil` > > > Key: SPARK-33104 > URL: https://issues.apache.org/jira/browse/SPARK-33104 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/1377/testReport/org.apache.spark.deploy.yarn/YarnClusterSuite/yarn_cluster_should_respect_conf_overrides_in_SparkHadoopUtil__SPARK_16414__SPARK_23630_/ > {code} > 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: > Exit code from container container_1602245728426_0006_02_01 is : 15 > 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: > Exception from container-launch with container ID: > container_1602245728426_0006_02_01 and exit code: 15 > ExitCodeException exitCode=15: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 20/10/09 05:18:13.211 ContainersLauncher #0 WARN ContainerLaunch: Container > exited with a non-zero exit code 15 > 20/10/09 05:18:13.237 AsyncDispatcher event handler WARN NMAuditLogger: > USER=jenkins OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE > APPID=application_1602245728426_0006 > CONTAINERID=container_1602245728426_0006_02_01 > 20/10/09 05:18:13.244 Socket Reader #1 for port 37112 INFO Server: Auth > successful for appattempt_1602245728426_0006_02 (auth:SIMPLE) > 20/10/09 05:18:13.326 IPC Parameter Sending Thread #0 DEBUG Client: IPC > Client (1123559518) connection to > amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins sending #37 > 20/10/09 05:18:13.327 IPC Client (1123559518) connection to > amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins DEBUG Client: IPC > Client (1123559518) connection to > amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins got value #37 > 20/10/09 05:18:13.328 main DEBUG ProtobufRpcEngine: Call: > getApplicationReport took 2ms > 20/10/09 05:18:13.328 main INFO Client: Application report for > application_1602245728426_0006 (state: FINISHED) > 20/10/09 05:18:13.328 main DEBUG Client: >client token: N/A >diagnostics: User class threw exception: > org.scalatest.exceptions.TestFailedException: null was not equal to > "testvalue" > at > 
org.scalatest.matchers.MatchersHelper$.indicateFailure(MatchersHelper.scala:344) > at > org.scalatest.matchers.should.Matchers$ShouldMethodHelperClass.shouldMatcher(Matchers.scala:6778) > at > org.scalatest.matchers.should.Matchers$AnyShouldWrapper.should(Matchers.scala:6822) > at > org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.$anonfun$main$2(YarnClusterSuite.scala:383) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) > at > org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.main(YarnClusterSuite.scala:382) > at > org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf.main(YarnClusterSuite.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.D
[jira] [Commented] (SPARK-32082) Project Zen: Improving Python usability
[ https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211474#comment-17211474 ] Andrew Malone Melo commented on SPARK-32082: > Potentially better interface by leveraging Arrow Is there an open Jira for this? > Project Zen: Improving Python usability > --- > > Key: SPARK-32082 > URL: https://issues.apache.org/jira/browse/SPARK-32082 > Project: Spark > Issue Type: Epic > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical > > The importance of Python and PySpark has grown radically in the last few > years. The number of PySpark downloads reached [more than 1.3 million _every > week_|https://pypistats.org/packages/pyspark] when we count them _only_ in > PyPI. Nevertheless, PySpark is still less Pythonic. It exposes many JVM error > messages as an example, and the API documentation is poorly written. > This epic tickets aims to improve the usability in PySpark, and make it more > Pythonic. To be more explicit, this JIRA targets four bullet points below. > Each includes examples: > * Being Pythonic > ** Pandas UDF enhancements and type hints > ** Avoid dynamic function definitions, for example, at {{funcitons.py}} > which makes IDEs unable to detect. > * Better and easier usability in PySpark > ** User-facing error message and warnings > ** Documentation > ** User guide > ** Better examples and API documentation, e.g. > [Koalas|https://koalas.readthedocs.io/en/latest/] and > [pandas|https://pandas.pydata.org/docs/] > * Better interoperability with other Python libraries > ** Visualization and plotting > ** Potentially better interface by leveraging Arrow > ** Compatibility with other libraries such as NumPy universal functions or > pandas possibly by leveraging Koalas > * PyPI Installation > ** PySpark with Hadoop 3 support on PyPi > ** Better error handling -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33105) Broken installation of source packages on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-33105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33105: Assignee: (was: Apache Spark) > Broken installation of source packages on AppVeyor > -- > > Key: SPARK-33105 > URL: https://issues.apache.org/jira/browse/SPARK-33105 > Project: Spark > Issue Type: Bug > Components: Project Infra, R >Affects Versions: 3.1.0 > Environment: *strong text* >Reporter: Maciej Szymkiewicz >Priority: Major > > It looks like AppVeyor configuration is broken, which leads to failure of > installation of source packages (become a problem when {{rlang}} has been > updated from 0.4.7 and 0.4.8, with latter available only as a source package). > {code} > [00:01:48] trying URL > 'https://cloud.r-project.org/src/contrib/rlang_0.4.8.tar.gz' > [00:01:48] Content type 'application/x-gzip' length 847517 bytes (827 KB) > [00:01:48] == > [00:01:48] downloaded 827 KB > [00:01:48] > [00:01:48] Warning in strptime(xx, f, tz = tz) : > [00:01:48] unable to identify current timezone 'C': > [00:01:48] please set environment variable 'TZ' > [00:01:49] * installing *source* package 'rlang' ... > [00:01:49] ** package 'rlang' successfully unpacked and MD5 sums checked > [00:01:49] ** using staged installation > [00:01:49] ** libs > [00:01:49] > [00:01:49] *** arch - i386 > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c capture.c -o capture.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c export.c -o export.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c internal.c -o internal.o > [00:01:50] In file included from ./lib/rlang.h:74, > [00:01:50] from internal/arg.c:1, > [00:01:50] from internal.c:1: > [00:01:50] internal/eval-tidy.c: In function 'rlang_tilde_eval': > [00:01:50] ./lib/env.h:33:10: warning: 'top' may be used uninitialized > in this function [-Wmaybe-uninitialized] > [00:01:50]return ENCLOS(env); > [00:01:50] ^~~ > [00:01:50] In file included from internal.c:8: > [00:01:50] internal/eval-tidy.c:406:9: note: 'top' was declared here > [00:01:50]sexp* top; > [00:01:50] ^~~ > [00:01:50] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c lib.c -o lib.o > [00:01:51] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c version.c -o version.o > [00:01:52] C:/Rtools40/mingw64/bin/gcc -shared -s -static-libgcc -o > rlang.dll tmp.def capture.o export.o internal.o lib.o version.o > -LC:/R/bin/i386 -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > cannot find -lR > [00:01:52] collect2.exe: error: ld returned 1 exit status > [00:01:52] no DLL was created > [00:01:52] ERROR: compilation failed for package 'rlang' > [00:01:52] * removing 'C:/RLibrary/rlang' > [00:01:52] > [00:01:52] The 
downloaded source packages are in > [00:01:52] > 'C:\Users\appveyor\AppData\Local\Temp\1\Rtmp8qrryA\downloaded_packages' > [00:01:52] Warning message: > [00:01:52] In install.packages(c("knitr", "rmarkdown", "testthat", > "e1071", : > [00:01:52] installation of package 'rlang' had non-zero exit status > {code} > This leads to failures to install {{devtools}} and generate Rd files and, as > a result, CRAN check failure. > There are some discrepancies in the > {{dev/appveyor-install-dependencies.ps1}}, but the direct source of this > issue seems to be {{$env:BINPREF}}, which forces usage of 64 bit mingw, even > if packages are compiled for 32 bit. > Modifying the variable to include current architecture: > {code} > $env:BINPREF=$RtoolsDrive + '/Rtools40/mingw$(WIN)/bin/' > {code} > (as proposed [here|https://stackoverflow.com/a/44035904] by R Yoda) looks > like a valid fix, though we might want to clean remaining issues as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (SPARK-33105) Broken installation of source packages on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-33105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33105: Assignee: Apache Spark > Broken installation of source packages on AppVeyor > -- > > Key: SPARK-33105 > URL: https://issues.apache.org/jira/browse/SPARK-33105 > Project: Spark > Issue Type: Bug > Components: Project Infra, R >Affects Versions: 3.1.0 > Environment: *strong text* >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > > It looks like AppVeyor configuration is broken, which leads to failure of > installation of source packages (become a problem when {{rlang}} has been > updated from 0.4.7 and 0.4.8, with latter available only as a source package). > {code} > [00:01:48] trying URL > 'https://cloud.r-project.org/src/contrib/rlang_0.4.8.tar.gz' > [00:01:48] Content type 'application/x-gzip' length 847517 bytes (827 KB) > [00:01:48] == > [00:01:48] downloaded 827 KB > [00:01:48] > [00:01:48] Warning in strptime(xx, f, tz = tz) : > [00:01:48] unable to identify current timezone 'C': > [00:01:48] please set environment variable 'TZ' > [00:01:49] * installing *source* package 'rlang' ... > [00:01:49] ** package 'rlang' successfully unpacked and MD5 sums checked > [00:01:49] ** using staged installation > [00:01:49] ** libs > [00:01:49] > [00:01:49] *** arch - i386 > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c capture.c -o capture.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c export.c -o export.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c internal.c -o internal.o > [00:01:50] In file included from ./lib/rlang.h:74, > [00:01:50] from internal/arg.c:1, > [00:01:50] from internal.c:1: > [00:01:50] internal/eval-tidy.c: In function 'rlang_tilde_eval': > [00:01:50] ./lib/env.h:33:10: warning: 'top' may be used uninitialized > in this function [-Wmaybe-uninitialized] > [00:01:50]return ENCLOS(env); > [00:01:50] ^~~ > [00:01:50] In file included from internal.c:8: > [00:01:50] internal/eval-tidy.c:406:9: note: 'top' was declared here > [00:01:50]sexp* top; > [00:01:50] ^~~ > [00:01:50] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c lib.c -o lib.o > [00:01:51] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c version.c -o version.o > [00:01:52] C:/Rtools40/mingw64/bin/gcc -shared -s -static-libgcc -o > rlang.dll tmp.def capture.o export.o internal.o lib.o version.o > -LC:/R/bin/i386 -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > cannot find -lR > [00:01:52] collect2.exe: error: ld returned 1 exit status > [00:01:52] no DLL was created > [00:01:52] ERROR: compilation failed for package 'rlang' > [00:01:52] * removing 'C:/RLibrary/rlang' > [00:01:52] > 
[00:01:52] The downloaded source packages are in > [00:01:52] > 'C:\Users\appveyor\AppData\Local\Temp\1\Rtmp8qrryA\downloaded_packages' > [00:01:52] Warning message: > [00:01:52] In install.packages(c("knitr", "rmarkdown", "testthat", > "e1071", : > [00:01:52] installation of package 'rlang' had non-zero exit status > {code} > This leads to failures to install {{devtools}} and generate Rd files and, as > a result, CRAN check failure. > There are some discrepancies in the > {{dev/appveyor-install-dependencies.ps1}}, but the direct source of this > issue seems to be {{$env:BINPREF}}, which forces usage of 64 bit mingw, even > if packages are compiled for 32 bit. > Modifying the variable to include current architecture: > {code} > $env:BINPREF=$RtoolsDrive + '/Rtools40/mingw$(WIN)/bin/' > {code} > (as proposed [here|https://stackoverflow.com/a/44035904] by R Yoda) looks > like a valid fix, though we might want to clean remaining issues as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) ---
[jira] [Commented] (SPARK-33105) Broken installation of source packages on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-33105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211456#comment-17211456 ] Apache Spark commented on SPARK-33105: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/29991 > Broken installation of source packages on AppVeyor > -- > > Key: SPARK-33105 > URL: https://issues.apache.org/jira/browse/SPARK-33105 > Project: Spark > Issue Type: Bug > Components: Project Infra, R >Affects Versions: 3.1.0 > Environment: *strong text* >Reporter: Maciej Szymkiewicz >Priority: Major > > It looks like AppVeyor configuration is broken, which leads to failure of > installation of source packages (become a problem when {{rlang}} has been > updated from 0.4.7 and 0.4.8, with latter available only as a source package). > {code} > [00:01:48] trying URL > 'https://cloud.r-project.org/src/contrib/rlang_0.4.8.tar.gz' > [00:01:48] Content type 'application/x-gzip' length 847517 bytes (827 KB) > [00:01:48] == > [00:01:48] downloaded 827 KB > [00:01:48] > [00:01:48] Warning in strptime(xx, f, tz = tz) : > [00:01:48] unable to identify current timezone 'C': > [00:01:48] please set environment variable 'TZ' > [00:01:49] * installing *source* package 'rlang' ... > [00:01:49] ** package 'rlang' successfully unpacked and MD5 sums checked > [00:01:49] ** using staged installation > [00:01:49] ** libs > [00:01:49] > [00:01:49] *** arch - i386 > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c capture.c -o capture.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c export.c -o export.o > [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c internal.c -o internal.o > [00:01:50] In file included from ./lib/rlang.h:74, > [00:01:50] from internal/arg.c:1, > [00:01:50] from internal.c:1: > [00:01:50] internal/eval-tidy.c: In function 'rlang_tilde_eval': > [00:01:50] ./lib/env.h:33:10: warning: 'top' may be used uninitialized > in this function [-Wmaybe-uninitialized] > [00:01:50]return ENCLOS(env); > [00:01:50] ^~~ > [00:01:50] In file included from internal.c:8: > [00:01:50] internal/eval-tidy.c:406:9: note: 'top' was declared here > [00:01:50]sexp* top; > [00:01:50] ^~~ > [00:01:50] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c lib.c -o lib.o > [00:01:51] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG > -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 > -mstackrealign -c version.c -o version.o > [00:01:52] C:/Rtools40/mingw64/bin/gcc -shared -s -static-libgcc -o > rlang.dll tmp.def capture.o export.o internal.o lib.o version.o > -LC:/R/bin/i386 -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > skipping incompatible C:/R/bin/i386/R.dll when searching for -lR > [00:01:52] > c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > cannot find -lR > [00:01:52] collect2.exe: error: ld returned 1 exit status > [00:01:52] no DLL was created > [00:01:52] 
ERROR: compilation failed for package 'rlang' > [00:01:52] * removing 'C:/RLibrary/rlang' > [00:01:52] > [00:01:52] The downloaded source packages are in > [00:01:52] > 'C:\Users\appveyor\AppData\Local\Temp\1\Rtmp8qrryA\downloaded_packages' > [00:01:52] Warning message: > [00:01:52] In install.packages(c("knitr", "rmarkdown", "testthat", > "e1071", : > [00:01:52] installation of package 'rlang' had non-zero exit status > {code} > This leads to failures to install {{devtools}} and generate Rd files and, as > a result, CRAN check failure. > There are some discrepancies in the > {{dev/appveyor-install-dependencies.ps1}}, but the direct source of this > issue seems to be {{$env:BINPREF}}, which forces usage of 64 bit mingw, even > if packages are compiled for 32 bit. > Modifying the variable to include current architecture: > {code} > $env:BINPREF=$RtoolsDrive + '/Rtools40/mingw$(WIN)/bin/' > {code} > (as proposed [here|https://stackoverflow.com/a/44035904] by R Yoda) looks > like a valid fix, though we might want to clean remaining issues as well.
[jira] [Commented] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211453#comment-17211453 ] Bruce Robbins commented on SPARK-33098: --- I left out one case, which I added to the bottom to the description. All the cases are covered by the PR for SPARK-25056, except for the last one, which still throws an exception ('Filtering is supported only on partition keys of type string'). > Exception when using 'in' to compare a partition column to a literal with the > wrong type > > > Key: SPARK-33098 > URL: https://issues.apache.org/jira/browse/SPARK-33098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Priority: Major > > Comparing a partition column against a literal with the wrong type works if > you use equality ('='). However, if you use 'in', you get: > {noformat} > MetaException(message:Filtering is supported only on partition keys of type > string) > {noformat} > For example: > {noformat} > spark-sql> create table test (a int) partitioned by (b int) stored as parquet; > Time taken: 0.323 seconds > spark-sql> insert into test values (1, 1), (1, 2), (2, 2); > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updated size to 418 > 20/10/08 19:57:14 WARN log: Updated size to 836 > Time taken: 2.124 seconds > spark-sql> -- this works, of course > spark-sql> select * from test where b in (2); > 1 2 > 2 2 > Time taken: 0.13 seconds, Fetched 2 row(s) > spark-sql> -- this also works (equals with wrong type) > spark-sql> select * from test where b = '2'; > 1 2 > 2 2 > Time taken: 0.132 seconds, Fetched 2 row(s) > spark-sql> -- this does not work ('in' with wrong type) > spark-sql> select * from test where b in ('2'); > 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b > in ('2')] > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > - > Caused by: MetaException(message:Filtering is supported only on partition > keys of type string) > {noformat} > There are also interesting variations of this using the dataframe API: > {noformat} > scala> sql("select cast(b as string) as b from test where b in > (2)").show(false) > +---+ > |b | > +---+ > |2 | > |2 | > +---+ > scala> sql("select cast(b as string) as b from test").filter("b in > (2)").show(false) > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. 
Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is > supported only on partition keys of type string > {noformat} > Also this: > {noformat} > scala> sql("select cast(b as string) as b from test").filter("b in > ('2')").show(false) > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > - > - > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is > supported only on partition keys of type string > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-33098: -- Description: Comparing a partition column against a literal with the wrong type works if you use equality ('='). However, if you use 'in', you get: {noformat} MetaException(message:Filtering is supported only on partition keys of type string) {noformat} For example: {noformat} spark-sql> create table test (a int) partitioned by (b int) stored as parquet; Time taken: 0.323 seconds spark-sql> insert into test values (1, 1), (1, 2), (2, 2); 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updated size to 418 20/10/08 19:57:14 WARN log: Updated size to 836 Time taken: 2.124 seconds spark-sql> -- this works, of course spark-sql> select * from test where b in (2); 1 2 2 2 Time taken: 0.13 seconds, Fetched 2 row(s) spark-sql> -- this also works (equals with wrong type) spark-sql> select * from test where b = '2'; 1 2 2 2 Time taken: 0.132 seconds, Fetched 2 row(s) spark-sql> -- this does not work ('in' with wrong type) spark-sql> select * from test where b in ('2'); 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')] java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) - - - Caused by: MetaException(message:Filtering is supported only on partition keys of type string) {noformat} There are also interesting variations of this using the dataframe API: {noformat} scala> sql("select cast(b as string) as b from test where b in (2)").show(false) +---+ |b | +---+ |2 | |2 | +---+ scala> sql("select cast(b as string) as b from test").filter("b in (2)").show(false) java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) - - Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string {noformat} Also this: {noformat} scala> sql("select cast(b as string) as b from test").filter("b in ('2')").show(false) java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK - - Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string {noformat} was: Comparing a partition column against a literal with the wrong type works if you use equality ('='). 
However, if you use 'in', you get: {noformat} MetaException(message:Filtering is supported only on partition keys of type string) {noformat} For example: {noformat} spark-sql> create table test (a int) partitioned by (b int) stored as parquet; Time taken: 0.323 seconds spark-sql> insert into test values (1, 1), (1, 2), (2, 2); 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updated size to 418 20/10/08 19:57:14 WARN log: Updated size to 836 Time taken: 2.124 seconds spark-sql> -- this works, of course spark-sql> select * from test where b in (2); 1 2 2 2 Time taken: 0.13 seconds, Fetched 2 row(s) spark-sql> -- this also works (equals with wrong type) spark-sql> select * from test where b = '2'; 1 2 2 2 Time taken: 0.132 seconds, Fetched 2 row(s) spark-sql> -- this does not work ('in' with wrong type) spark-sql> select * from test where b in ('2'); 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')] java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/ji
[jira] [Updated] (SPARK-33105) Broken installation of source packages on AppVeyor
[ https://issues.apache.org/jira/browse/SPARK-33105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-33105: --- Description: It looks like AppVeyor configuration is broken, which leads to failure of installation of source packages (become a problem when {{rlang}} has been updated from 0.4.7 and 0.4.8, with latter available only as a source package). {code} [00:01:48] trying URL 'https://cloud.r-project.org/src/contrib/rlang_0.4.8.tar.gz' [00:01:48] Content type 'application/x-gzip' length 847517 bytes (827 KB) [00:01:48] == [00:01:48] downloaded 827 KB [00:01:48] [00:01:48] Warning in strptime(xx, f, tz = tz) : [00:01:48] unable to identify current timezone 'C': [00:01:48] please set environment variable 'TZ' [00:01:49] * installing *source* package 'rlang' ... [00:01:49] ** package 'rlang' successfully unpacked and MD5 sums checked [00:01:49] ** using staged installation [00:01:49] ** libs [00:01:49] [00:01:49] *** arch - i386 [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c capture.c -o capture.o [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c export.c -o export.o [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c internal.c -o internal.o [00:01:50] In file included from ./lib/rlang.h:74, [00:01:50] from internal/arg.c:1, [00:01:50] from internal.c:1: [00:01:50] internal/eval-tidy.c: In function 'rlang_tilde_eval': [00:01:50] ./lib/env.h:33:10: warning: 'top' may be used uninitialized in this function [-Wmaybe-uninitialized] [00:01:50]return ENCLOS(env); [00:01:50] ^~~ [00:01:50] In file included from internal.c:8: [00:01:50] internal/eval-tidy.c:406:9: note: 'top' was declared here [00:01:50]sexp* top; [00:01:50] ^~~ [00:01:50] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c lib.c -o lib.o [00:01:51] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c version.c -o version.o [00:01:52] C:/Rtools40/mingw64/bin/gcc -shared -s -static-libgcc -o rlang.dll tmp.def capture.o export.o internal.o lib.o version.o -LC:/R/bin/i386 -lR [00:01:52] c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:/R/bin/i386/R.dll when searching for -lR [00:01:52] c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:/R/bin/i386/R.dll when searching for -lR [00:01:52] c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lR [00:01:52] collect2.exe: error: ld returned 1 exit status [00:01:52] no DLL was created [00:01:52] ERROR: compilation failed for package 'rlang' [00:01:52] * removing 'C:/RLibrary/rlang' [00:01:52] [00:01:52] The downloaded source packages are in [00:01:52] 'C:\Users\appveyor\AppData\Local\Temp\1\Rtmp8qrryA\downloaded_packages' [00:01:52] Warning message: [00:01:52] In install.packages(c("knitr", "rmarkdown", "testthat", "e1071", : [00:01:52] installation of package 'rlang' had non-zero exit status {code} This leads to failures to install {{devtools}} and generate Rd files and, as a result, CRAN check failure. 
There are some discrepancies in the {{dev/appveyor-install-dependencies.ps1}}, but the direct source of this issue seems to be {{$env:BINPREF}}, which forces usage of 64 bit mingw, even if packages are compiled for 32 bit. Modifying the variable to include current architecture: {code} $env:BINPREF=$RtoolsDrive + '/Rtools40/mingw$(WIN)/bin/' {code} (as proposed [here|https://stackoverflow.com/a/44035904] by R Yoda) looks like a valid fix, though we might want to clean remaining issues as well. was: It looks like AppVeyor configuration is broken, which leads to failure of installation of source packages (become a problem when {{rlang}} has been updated from 0.4.7 and 0.4.8, with latter available only as a source package). {code} [00:01:48] trying URL 'https://cloud.r-project.org/src/contrib/rlang_0.4.8.tar.gz' [00:01:48] Content type 'application/x-gzip' length 847517 bytes (827 KB) [00:01:48] == [00:01:48] downloaded 827 KB [00:01:48] [00:01:48] Warning in strptime(xx, f, tz = tz) : [00:01:48] unable to identify current timezone 'C': [00:01:48] please set environment variable 'TZ' [00:01:49] * installing *source* package 'rlang' ... [00:01:49] ** package 'rlang' s
[jira] [Created] (SPARK-33105) Broken installation of source packages on AppVeyor
Maciej Szymkiewicz created SPARK-33105: -- Summary: Broken installation of source packages on AppVeyor Key: SPARK-33105 URL: https://issues.apache.org/jira/browse/SPARK-33105 Project: Spark Issue Type: Bug Components: Project Infra, R Affects Versions: 3.1.0 Environment: *strong text* Reporter: Maciej Szymkiewicz It looks like AppVeyor configuration is broken, which leads to failure of installation of source packages (become a problem when {{rlang}} has been updated from 0.4.7 and 0.4.8, with latter available only as a source package). {code} [00:01:48] trying URL 'https://cloud.r-project.org/src/contrib/rlang_0.4.8.tar.gz' [00:01:48] Content type 'application/x-gzip' length 847517 bytes (827 KB) [00:01:48] == [00:01:48] downloaded 827 KB [00:01:48] [00:01:48] Warning in strptime(xx, f, tz = tz) : [00:01:48] unable to identify current timezone 'C': [00:01:48] please set environment variable 'TZ' [00:01:49] * installing *source* package 'rlang' ... [00:01:49] ** package 'rlang' successfully unpacked and MD5 sums checked [00:01:49] ** using staged installation [00:01:49] ** libs [00:01:49] [00:01:49] *** arch - i386 [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c capture.c -o capture.o [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c export.c -o export.o [00:01:49] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c internal.c -o internal.o [00:01:50] In file included from ./lib/rlang.h:74, [00:01:50] from internal/arg.c:1, [00:01:50] from internal.c:1: [00:01:50] internal/eval-tidy.c: In function 'rlang_tilde_eval': [00:01:50] ./lib/env.h:33:10: warning: 'top' may be used uninitialized in this function [-Wmaybe-uninitialized] [00:01:50]return ENCLOS(env); [00:01:50] ^~~ [00:01:50] In file included from internal.c:8: [00:01:50] internal/eval-tidy.c:406:9: note: 'top' was declared here [00:01:50]sexp* top; [00:01:50] ^~~ [00:01:50] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c lib.c -o lib.o [00:01:51] C:/Rtools40/mingw64/bin/gcc -I"C:/R/include" -DNDEBUG -I./lib/ -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c version.c -o version.o [00:01:52] C:/Rtools40/mingw64/bin/gcc -shared -s -static-libgcc -o rlang.dll tmp.def capture.o export.o internal.o lib.o version.o -LC:/R/bin/i386 -lR [00:01:52] c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:/R/bin/i386/R.dll when searching for -lR [00:01:52] c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: skipping incompatible C:/R/bin/i386/R.dll when searching for -lR [00:01:52] c:/Rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lR [00:01:52] collect2.exe: error: ld returned 1 exit status [00:01:52] no DLL was created [00:01:52] ERROR: compilation failed for package 'rlang' [00:01:52] * removing 'C:/RLibrary/rlang' [00:01:52] [00:01:52] The downloaded source packages are in [00:01:52] 'C:\Users\appveyor\AppData\Local\Temp\1\Rtmp8qrryA\downloaded_packages' [00:01:52] Warning message: [00:01:52] In install.packages(c("knitr", "rmarkdown", "testthat", "e1071", : [00:01:52] installation of package 'rlang' had non-zero exit status {code} There are 
some discrepancies in the {{dev/appveyor-install-dependencies.ps1}}, but the direct source of this issue seems to be {{$env:BINPREF}}, which forces usage of 64 bit mingw, even if packages are compiled for 32 bit. Modifying the variable to include current architecture: {code} $env:BINPREF=$RtoolsDrive + '/Rtools40/mingw$(WIN)/bin/' {code} (as proposed [here|https://stackoverflow.com/a/44035904] by R Yoda) looks like a valid fix, though we might want to clean remaining issues as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-33098: -- Description: Comparing a partition column against a literal with the wrong type works if you use equality ('='). However, if you use 'in', you get: {noformat} MetaException(message:Filtering is supported only on partition keys of type string) {noformat} For example: {noformat} spark-sql> create table test (a int) partitioned by (b int) stored as parquet; Time taken: 0.323 seconds spark-sql> insert into test values (1, 1), (1, 2), (2, 2); 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updated size to 418 20/10/08 19:57:14 WARN log: Updated size to 836 Time taken: 2.124 seconds spark-sql> -- this works, of course spark-sql> select * from test where b in (2); 1 2 2 2 Time taken: 0.13 seconds, Fetched 2 row(s) spark-sql> -- this also works (equals with wrong type) spark-sql> select * from test where b = '2'; 1 2 2 2 Time taken: 0.132 seconds, Fetched 2 row(s) spark-sql> -- this does not work ('in' with wrong type) spark-sql> select * from test where b in ('2'); 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')] java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) - - - Caused by: MetaException(message:Filtering is supported only on partition keys of type string) {noformat} There are also interesting variations of this using the dataframe API: {noformat} scala> sql("select cast(b as string) as b from test where b in (2)").show(false) +---+ |b | +---+ |2 | |2 | +---+ scala> sql("select cast(b as string) as b from test").filter("b in (2)").show(false) java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) - - Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string scala> sql("select cast(b as string) as b from test").filter("b in ('2')").show(false) java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK - - Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string {noformat} was: Comparing a partition column against a literal with the wrong type works if you use equality ('='). 
However, if you use 'in', you get: {noformat} MetaException(message:Filtering is supported only on partition keys of type string) {noformat} For example: {noformat} spark-sql> create table test (a int) partitioned by (b int) stored as parquet; Time taken: 0.323 seconds spark-sql> insert into test values (1, 1), (1, 2), (2, 2); 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updated size to 418 20/10/08 19:57:14 WARN log: Updated size to 836 Time taken: 2.124 seconds spark-sql> -- this works, of course spark-sql> select * from test where b in (2); 1 2 2 2 Time taken: 0.13 seconds, Fetched 2 row(s) spark-sql> -- this also works (equals with wrong type) spark-sql> select * from test where b = '2'; 1 2 2 2 Time taken: 0.132 seconds, Fetched 2 row(s) spark-sql> -- this does not work ('in' with wrong type) spark-sql> select * from test where b in ('2'); 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')] java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.
[jira] [Updated] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-33098: -- Description: Comparing a partition column against a literal with the wrong type works if you use equality ('='). However, if you use 'in', you get: {noformat} MetaException(message:Filtering is supported only on partition keys of type string) {noformat} For example: {noformat} spark-sql> create table test (a int) partitioned by (b int) stored as parquet; Time taken: 0.323 seconds spark-sql> insert into test values (1, 1), (1, 2), (2, 2); 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updated size to 418 20/10/08 19:57:14 WARN log: Updated size to 836 Time taken: 2.124 seconds spark-sql> -- this works, of course spark-sql> select * from test where b in (2); 1 2 2 2 Time taken: 0.13 seconds, Fetched 2 row(s) spark-sql> -- this also works (equals with wrong type) spark-sql> select * from test where b = '2'; 1 2 2 2 Time taken: 0.132 seconds, Fetched 2 row(s) spark-sql> -- this does not work ('in' with wrong type) spark-sql> select * from test where b in ('2'); 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')] java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) - - - Caused by: MetaException(message:Filtering is supported only on partition keys of type string) {noformat} There are also interesting variations of this using the dataframe API: {noformat} scala> sql("select cast(b as string) as b from test where b in (2)").show(false) +---+ |b | +---+ |2 | |2 | +---+ scala> sql("select cast(b as string) as b from test").filter("b in (2)").show(false) java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) - - Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string scala> sql("select cast(b as string) as b from test").filter("b in ('2')").show(false) java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string {noformat} was: Comparing a partition column against a literal with the wrong type works if you use equality ('='). 
However, if you use 'in', you get: {noformat} MetaException(message:Filtering is supported only on partition keys of type string) {noformat} For example: {noformat} spark-sql> create table test (a int) partitioned by (b int) stored as parquet; Time taken: 0.323 seconds spark-sql> insert into test values (1, 1), (1, 2), (2, 2); 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test 20/10/08 19:57:14 WARN log: Updated size to 418 20/10/08 19:57:14 WARN log: Updated size to 836 Time taken: 2.124 seconds spark-sql> -- this works, of course spark-sql> select * from test where b in (2); 1 2 2 2 Time taken: 0.13 seconds, Fetched 2 row(s) spark-sql> -- this also works (equals with wrong type) spark-sql> select * from test where b = '2'; 1 2 2 2 Time taken: 0.132 seconds, Fetched 2 row(s) spark-sql> -- this does not work ('in' with wrong type) spark-sql> select * from test where b in ('2'); 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')] java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apac
[jira] [Assigned] (SPARK-31972) Improve heurestic for selecting nodes for scale down to take into account graceful decommission cost
[ https://issues.apache.org/jira/browse/SPARK-31972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31972: Assignee: Apache Spark > Improve heurestic for selecting nodes for scale down to take into account > graceful decommission cost > > > Key: SPARK-31972 > URL: https://issues.apache.org/jira/browse/SPARK-31972 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Major > > Once SPARK-31198 is in we should see if we can come up with a better graceful > decommissioning aware heuristic to use for selecting nodes to scale down. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31972) Improve heurestic for selecting nodes for scale down to take into account graceful decommission cost
[ https://issues.apache.org/jira/browse/SPARK-31972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211408#comment-17211408 ] Apache Spark commented on SPARK-31972: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/29993 > Improve heurestic for selecting nodes for scale down to take into account > graceful decommission cost > > > Key: SPARK-31972 > URL: https://issues.apache.org/jira/browse/SPARK-31972 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Major > > Once SPARK-31198 is in we should see if we can come up with a better graceful > decommissioning aware heuristic to use for selecting nodes to scale down. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31972) Improve heurestic for selecting nodes for scale down to take into account graceful decommission cost
[ https://issues.apache.org/jira/browse/SPARK-31972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31972: Assignee: (was: Apache Spark) > Improve heurestic for selecting nodes for scale down to take into account > graceful decommission cost > > > Key: SPARK-31972 > URL: https://issues.apache.org/jira/browse/SPARK-31972 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Major > > Once SPARK-31198 is in we should see if we can come up with a better graceful > decommissioning aware heuristic to use for selecting nodes to scale down. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32881) NoSuchElementException occurs during decommissioning
[ https://issues.apache.org/jira/browse/SPARK-32881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211407#comment-17211407 ] Apache Spark commented on SPARK-32881: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/29992 > NoSuchElementException occurs during decommissioning > > > Key: SPARK-32881 > URL: https://issues.apache.org/jira/browse/SPARK-32881 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > `BlockManagerMasterEndpoint` seems to fail at `getReplicateInfoForRDDBlocks` > due to `java.util.NoSuchElementException`. This happens on K8s IT testing, > but the main code seems to need a graceful handling of > `NoSuchElementException` instead of showing a naive error message. > {code} > private def getReplicateInfoForRDDBlocks(blockManagerId: BlockManagerId): > Seq[ReplicateBlock] = { > val info = blockManagerInfo(blockManagerId) >... > } > {code} > {code} > 20/09/14 18:56:54 INFO ExecutorPodsAllocator: Going to request 1 executors > from Kubernetes. > 20/09/14 18:56:54 INFO BasicExecutorFeatureStep: Adding decommission script > to lifecycle > 20/09/14 18:56:55 ERROR TaskSchedulerImpl: Lost executor 1 on 172.17.0.4: > Executor decommission. > 20/09/14 18:56:55 INFO BlockManagerMaster: Removal of executor 1 requested > 20/09/14 18:56:55 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove > non-existent executor 1 > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Trying to remove > executor 1 from BlockManagerMaster. > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Removing block manager > BlockManagerId(1, 172.17.0.4, 41235, None) > 20/09/14 18:56:55 INFO DAGScheduler: Executor lost: 1 (epoch 1) > 20/09/14 18:56:55 ERROR Inbox: Ignoring error > java.util.NoSuchElementException > at scala.collection.concurrent.TrieMap.apply(TrieMap.scala:833) > at > org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$getReplicateInfoForRDDBlocks(BlockManagerMasterEndpoint.scala:383) > at > org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:171) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) > at > org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Trying to remove > executor 1 from BlockManagerMaster. 
> 20/09/14 18:56:55 INFO BlockManagerMaster: Removed 1 successfully in > removeExecutor > 20/09/14 18:56:55 INFO DAGScheduler: Shuffle files lost for executor: 1 > (epoch 1) > 20/09/14 18:56:58 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (172.17.0.7:46674) with > ID 4, ResourceProfileId 0 > 20/09/14 18:56:58 INFO BlockManagerMasterEndpoint: Registering block > manager 172.17.0.7:40495 with 593.9 MiB RAM, BlockManagerId(4, 172.17.0.7, > 40495, None) > 20/09/14 18:57:23 INFO SparkContext: Starting job: count at > /opt/spark/tests/decommissioning.py:49 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32881) NoSuchElementException occurs during decommissioning
[ https://issues.apache.org/jira/browse/SPARK-32881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32881: Assignee: Apache Spark > NoSuchElementException occurs during decommissioning > > > Key: SPARK-32881 > URL: https://issues.apache.org/jira/browse/SPARK-32881 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > `BlockManagerMasterEndpoint` seems to fail at `getReplicateInfoForRDDBlocks` > due to `java.util.NoSuchElementException`. This happens on K8s IT testing, > but the main code seems to need a graceful handling of > `NoSuchElementException` instead of showing a naive error message. > {code} > private def getReplicateInfoForRDDBlocks(blockManagerId: BlockManagerId): > Seq[ReplicateBlock] = { > val info = blockManagerInfo(blockManagerId) >... > } > {code} > {code} > 20/09/14 18:56:54 INFO ExecutorPodsAllocator: Going to request 1 executors > from Kubernetes. > 20/09/14 18:56:54 INFO BasicExecutorFeatureStep: Adding decommission script > to lifecycle > 20/09/14 18:56:55 ERROR TaskSchedulerImpl: Lost executor 1 on 172.17.0.4: > Executor decommission. > 20/09/14 18:56:55 INFO BlockManagerMaster: Removal of executor 1 requested > 20/09/14 18:56:55 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove > non-existent executor 1 > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Trying to remove > executor 1 from BlockManagerMaster. > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Removing block manager > BlockManagerId(1, 172.17.0.4, 41235, None) > 20/09/14 18:56:55 INFO DAGScheduler: Executor lost: 1 (epoch 1) > 20/09/14 18:56:55 ERROR Inbox: Ignoring error > java.util.NoSuchElementException > at scala.collection.concurrent.TrieMap.apply(TrieMap.scala:833) > at > org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$getReplicateInfoForRDDBlocks(BlockManagerMasterEndpoint.scala:383) > at > org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:171) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) > at > org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Trying to remove > executor 1 from BlockManagerMaster. 
> 20/09/14 18:56:55 INFO BlockManagerMaster: Removed 1 successfully in > removeExecutor > 20/09/14 18:56:55 INFO DAGScheduler: Shuffle files lost for executor: 1 > (epoch 1) > 20/09/14 18:56:58 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (172.17.0.7:46674) with > ID 4, ResourceProfileId 0 > 20/09/14 18:56:58 INFO BlockManagerMasterEndpoint: Registering block > manager 172.17.0.7:40495 with 593.9 MiB RAM, BlockManagerId(4, 172.17.0.7, > 40495, None) > 20/09/14 18:57:23 INFO SparkContext: Starting job: count at > /opt/spark/tests/decommissioning.py:49 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32881) NoSuchElementException occurs during decommissioning
[ https://issues.apache.org/jira/browse/SPARK-32881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32881: Assignee: (was: Apache Spark) > NoSuchElementException occurs during decommissioning > > > Key: SPARK-32881 > URL: https://issues.apache.org/jira/browse/SPARK-32881 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > `BlockManagerMasterEndpoint` seems to fail at `getReplicateInfoForRDDBlocks` > due to `java.util.NoSuchElementException`. This happens on K8s IT testing, > but the main code seems to need a graceful handling of > `NoSuchElementException` instead of showing a naive error message. > {code} > private def getReplicateInfoForRDDBlocks(blockManagerId: BlockManagerId): > Seq[ReplicateBlock] = { > val info = blockManagerInfo(blockManagerId) >... > } > {code} > {code} > 20/09/14 18:56:54 INFO ExecutorPodsAllocator: Going to request 1 executors > from Kubernetes. > 20/09/14 18:56:54 INFO BasicExecutorFeatureStep: Adding decommission script > to lifecycle > 20/09/14 18:56:55 ERROR TaskSchedulerImpl: Lost executor 1 on 172.17.0.4: > Executor decommission. > 20/09/14 18:56:55 INFO BlockManagerMaster: Removal of executor 1 requested > 20/09/14 18:56:55 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove > non-existent executor 1 > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Trying to remove > executor 1 from BlockManagerMaster. > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Removing block manager > BlockManagerId(1, 172.17.0.4, 41235, None) > 20/09/14 18:56:55 INFO DAGScheduler: Executor lost: 1 (epoch 1) > 20/09/14 18:56:55 ERROR Inbox: Ignoring error > java.util.NoSuchElementException > at scala.collection.concurrent.TrieMap.apply(TrieMap.scala:833) > at > org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$getReplicateInfoForRDDBlocks(BlockManagerMasterEndpoint.scala:383) > at > org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:171) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) > at > org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 20/09/14 18:56:55 INFO BlockManagerMasterEndpoint: Trying to remove > executor 1 from BlockManagerMaster. 
> 20/09/14 18:56:55 INFO BlockManagerMaster: Removed 1 successfully in > removeExecutor > 20/09/14 18:56:55 INFO DAGScheduler: Shuffle files lost for executor: 1 > (epoch 1) > 20/09/14 18:56:58 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (172.17.0.7:46674) with > ID 4, ResourceProfileId 0 > 20/09/14 18:56:58 INFO BlockManagerMasterEndpoint: Registering block > manager 172.17.0.7:40495 with 593.9 MiB RAM, BlockManagerId(4, 172.17.0.7, > 40495, None) > 20/09/14 18:57:23 INFO SparkContext: Starting job: count at > /opt/spark/tests/decommissioning.py:49 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33104) Fix YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil
Dongjoon Hyun created SPARK-33104: - Summary: Fix YarnClusterSuite.yarn-cluster should respect conf overrides in SparkHadoopUtil Key: SPARK-33104 URL: https://issues.apache.org/jira/browse/SPARK-33104 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 3.1.0 Reporter: Dongjoon Hyun - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/1377/testReport/org.apache.spark.deploy.yarn/YarnClusterSuite/yarn_cluster_should_respect_conf_overrides_in_SparkHadoopUtil__SPARK_16414__SPARK_23630_/ {code} 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: Exit code from container container_1602245728426_0006_02_01 is : 15 20/10/09 05:18:13.211 ContainersLauncher #0 WARN DefaultContainerExecutor: Exception from container-launch with container ID: container_1602245728426_0006_02_01 and exit code: 15 ExitCodeException exitCode=15: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 20/10/09 05:18:13.211 ContainersLauncher #0 WARN ContainerLaunch: Container exited with a non-zero exit code 15 20/10/09 05:18:13.237 AsyncDispatcher event handler WARN NMAuditLogger: USER=jenkinsOPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1602245728426_0006 CONTAINERID=container_1602245728426_0006_02_01 20/10/09 05:18:13.244 Socket Reader #1 for port 37112 INFO Server: Auth successful for appattempt_1602245728426_0006_02 (auth:SIMPLE) 20/10/09 05:18:13.326 IPC Parameter Sending Thread #0 DEBUG Client: IPC Client (1123559518) connection to amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins sending #37 20/10/09 05:18:13.327 IPC Client (1123559518) connection to amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins DEBUG Client: IPC Client (1123559518) connection to amp-jenkins-worker-04.amp/192.168.10.24:43090 from jenkins got value #37 20/10/09 05:18:13.328 main DEBUG ProtobufRpcEngine: Call: getApplicationReport took 2ms 20/10/09 05:18:13.328 main INFO Client: Application report for application_1602245728426_0006 (state: FINISHED) 20/10/09 05:18:13.328 main DEBUG Client: client token: N/A diagnostics: User class threw exception: org.scalatest.exceptions.TestFailedException: null was not equal to "testvalue" at org.scalatest.matchers.MatchersHelper$.indicateFailure(MatchersHelper.scala:344) at org.scalatest.matchers.should.Matchers$ShouldMethodHelperClass.shouldMatcher(Matchers.scala:6778) at org.scalatest.matchers.should.Matchers$AnyShouldWrapper.should(Matchers.scala:6822) at org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.$anonfun$main$2(YarnClusterSuite.scala:383) at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf$.main(YarnClusterSuite.scala:382) at org.apache.spark.deploy.yarn.YarnClusterDriverUseSparkHadoopUtilConf.main(YarnClusterSuite.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732) ApplicationMaster host: amp-jenkins-worker-04.amp ApplicationMaster RPC port: 36200 queue: default start time: 1602245859148 final status: FAILED tracking URL: http://amp-jenkins-worker-04.amp:39546/proxy/application_1602245728426_0
[jira] [Resolved] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-9686. - Resolution: Duplicate > Spark Thrift server doesn't return correct JDBC metadata > - > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 >Reporter: pin_zhang >Priority: Critical > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33103) Custom Schema with Custom RDD reorders columns when more than 4 added
Justin Mays created SPARK-33103: --- Summary: Custom Schema with Custom RDD reorders columns when more than 4 added Key: SPARK-33103 URL: https://issues.apache.org/jira/browse/SPARK-33103 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Environment: Java Application Reporter: Justin Mays I have a custom RDD written in Java that uses a custom schema. Everything appears to work fine with using 4 columns, but when i add a 5th column, calling show() fails with java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of here is the schema definition in java: StructType schema = new StructType() StructType schema = new StructType() .add("recordId", DataTypes.LongType, false) .add("col1", DataTypes.DoubleType, false) .add("col2", DataTypes.DoubleType, false) .add("col3", DataTypes.IntegerType, false) .add("col4", DataTypes.IntegerType, false); Here is the printout of schema.printTreeString(); == Physical Plan == *(1) Scan dw [recordId#0L,col1#1,col2#2,col3#3,col4#4] PushedFilters: [], ReadSchema: struct I hardcoded a return in my Row object with values matching the schema: @Override @Override public Object get(int i) \{ switch(i) { case 0: return 0L; case 1: return 1.1911950001644689D; case 2: return 9.10949955666E9D; case 3: return 476; case 4: return 500; } return 0L; } Here is the output of the show command: 15:30:26.875 ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)15:30:26.875 ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of doublevalidateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, col1), DoubleType) AS col1#30validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, recordId), LongType) AS recordId#31Lvalidateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, col2), DoubleType) AS col2#32validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 3, col3), IntegerType) AS col3#33validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 4, col4), IntegerType) AS col4#34 at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:197) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1] at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) ~[scala-library-2.12.10.jar:?] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] 
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.0.1.jar:3.0.1] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729) ~[spark-sql_2.12-3.0.1.jar:3.0.1] at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) ~[spark-sql_2.12-3.0.1.jar:3.0.1] at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.scheduler.Task.run(Task.scala:127) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) ~[spark-core_2.12-3.0.1.jar:3.0.1] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) [spark-core_2.12-3.0.1.jar:3.0.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]Caused by: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of double at org.apache.spark.sql.catalyst.expre
[jira] [Commented] (SPARK-32069) Improve error message on reading unexpected directory which is not a table partition
[ https://issues.apache.org/jira/browse/SPARK-32069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211312#comment-17211312 ] Aoyuan Liao commented on SPARK-32069: - [~Gengliang.Wang] If no one is working on it, can I take this one? > Improve error message on reading unexpected directory which is not a table > partition > > > Key: SPARK-32069 > URL: https://issues.apache.org/jira/browse/SPARK-32069 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Minor > Labels: starter > > To reproduce: > {code:java} > spark-sql> create table test(i long); > spark-sql> insert into test values(1); > {code} > {code:java} > bash $ mkdir ./spark-warehouse/test/data > {code} > There will be such error messge > {code:java} > java.io.IOException: Not a file: > file:/Users/gengliang.wang/projects/spark/spark-warehouse/test/data > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2173) > at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) > at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:412) > at > org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:282) > at > org.apache.spark.
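The failure above comes from the legacy HadoopRDD read path: FileInputFormat.getSplits throws a bare "Not a file" IOException as soon as it meets a sub-directory under a non-partitioned table location. Below is a minimal sketch of the kind of up-front check the improvement asks for; the path and the message wording are illustrative only, not the eventual Spark fix.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: scan a non-partitioned table location and report stray
// sub-directories by name, instead of letting FileInputFormat.getSplits
// fail later with a generic "Not a file" IOException.
object CheckTableDir {
  def main(args: Array[String]): Unit = {
    // Path from the reproduction above; purely illustrative.
    val tablePath = new Path("spark-warehouse/test")
    val fs = tablePath.getFileSystem(new Configuration())
    val strayDirs = fs.listStatus(tablePath).filter(_.isDirectory).map(_.getPath)
    if (strayDirs.nonEmpty) {
      // A clearer message than "Not a file: ...": name the offending entries
      // and explain that the table is not partitioned.
      sys.error(
        s"Path $tablePath contains sub-directories ${strayDirs.mkString(", ")}, " +
          "but the table is not partitioned; remove them or recreate the table as partitioned.")
    }
  }
}
{code}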
[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211305#comment-17211305 ] Aoyuan Liao commented on SPARK-9686: [~srowen] I think SPARK-28426 fixed this. > Spark Thrift server doesn't return correct JDBC metadata > - > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 >Reporter: pin_zhang >Priority: Critical > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33039) Misleading watermark calculation in structure streaming
[ https://issues.apache.org/jira/browse/SPARK-33039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211294#comment-17211294 ] Sean R. Owen commented on SPARK-33039: -- Yeah I think that's an OK resolution. Thanks for resolving Sandish. > Misleading watermark calculation in structure streaming > --- > > Key: SPARK-33039 > URL: https://issues.apache.org/jira/browse/SPARK-33039 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4 >Reporter: Sandish Kumar HN >Priority: Major > > source code: > {code:java} > import org.apache.spark.sql.SparkSession > import org.apache.hadoop.fs.Path > import java.sql.Timestamp > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.streaming.{ProcessingTime, Trigger} > object TestWaterMark extends App { > val spark = SparkSession.builder().master("local").getOrCreate() > val sc = spark.sparkContext > val dir = new Path("/tmp/test-structured-streaming") > val fs = dir.getFileSystem(sc.hadoopConfiguration) > fs.mkdirs(dir) > val schema = StructType(StructField("vilue", StringType) :: > StructField("timestamp", TimestampType) :: > Nil) > val eventStream = spark > .readStream > .option("sep", ";") > .option("header", "false") > .schema(schema) > .csv(dir.toString) > // Watermarked aggregation > val eventsCount = eventStream > .withWatermark("timestamp", "5 seconds") > .groupBy(window(col("timestamp"), "10 seconds")) > .count > def writeFile(path: Path, data: String) { > val file = fs.create(path) > file.writeUTF(data) > file.close() > } > // Debug query > val query = eventsCount.writeStream > .format("console") > .outputMode("complete") > .option("truncate", "false") > .trigger(Trigger.ProcessingTime("5 seconds")) > .start() > writeFile(new Path(dir, "file1"), """ > |OLD;2019-08-09 10:05:00 > |OLD;2019-08-09 10:10:00 > |OLD;2019-08-09 10:15:00""".stripMargin) > query.processAllAvailable() > val lp1 = query.lastProgress > println(lp1.eventTime) > writeFile(new Path(dir, "file2"), """ > |NEW;2020-08-29 10:05:00 > |NEW;2020-08-29 10:10:00 > |NEW;2020-08-29 10:15:00""".stripMargin) > query.processAllAvailable() > val lp2 = query.lastProgress > println(lp2.eventTime) > writeFile(new Path(dir, "file4"), """ > |OLD;2017-08-10 10:05:00 > |OLD;2017-08-10 10:10:00 > |OLD;2017-08-10 10:15:00""".stripMargin) > writeFile(new Path(dir, "file3"), "") > query.processAllAvailable() > val lp3 = query.lastProgress > println(lp3.eventTime) > query.awaitTermination() > fs.delete(dir, true) > } > {code} > OUTPUT: > > {code:java} > --- > Batch: 0 > --- > +--+-+ > |window |count| > +--+-+ > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2019-08-09T17:05:00.000Z, avg=2019-08-09T17:10:00.000Z, > watermark=1970-01-01T00:00:00.000Z, max=2019-08-09T17:15:00.000Z} > --- > Batch: 1 > --- > +--+-+ > |window |count| > +--+-+ > |[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 | > |[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 | > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2020-08-29T17:05:00.000Z, avg=2020-08-29T17:10:00.000Z, > watermark=2019-08-09T17:14:55.000Z, max=2020-08-29T17:15:00.000Z} > --- > Batch: 2 > --- > +--+-+ > |window |count| > +--+-+ > |[2017-08-10 10:15:00, 
2017-08-10 10:15:10]|1 | > |[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 | > |[2017-08-10 10:05:00, 2017-08-10 10:05:10]|1 | > |[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 | > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2017-08-10 10:10:00, 2017-08-10 10:10:10]|1 | > |[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2017-08-10T17:05:00.000Z, avg=2017-08-10T17:10:00.000Z, > watermark=2020-08-29T17:14:55.000Z, max=2017-08-10T17:15:00.000Z} > {
[jira] [Resolved] (SPARK-33039) Misleading watermark calculation in structure streaming
[ https://issues.apache.org/jira/browse/SPARK-33039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandish Kumar HN resolved SPARK-33039. -- Resolution: Invalid > Misleading watermark calculation in structure streaming > --- > > Key: SPARK-33039 > URL: https://issues.apache.org/jira/browse/SPARK-33039 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4 >Reporter: Sandish Kumar HN >Priority: Major > > source code: > {code:java} > import org.apache.spark.sql.SparkSession > import org.apache.hadoop.fs.Path > import java.sql.Timestamp > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.streaming.{ProcessingTime, Trigger} > object TestWaterMark extends App { > val spark = SparkSession.builder().master("local").getOrCreate() > val sc = spark.sparkContext > val dir = new Path("/tmp/test-structured-streaming") > val fs = dir.getFileSystem(sc.hadoopConfiguration) > fs.mkdirs(dir) > val schema = StructType(StructField("vilue", StringType) :: > StructField("timestamp", TimestampType) :: > Nil) > val eventStream = spark > .readStream > .option("sep", ";") > .option("header", "false") > .schema(schema) > .csv(dir.toString) > // Watermarked aggregation > val eventsCount = eventStream > .withWatermark("timestamp", "5 seconds") > .groupBy(window(col("timestamp"), "10 seconds")) > .count > def writeFile(path: Path, data: String) { > val file = fs.create(path) > file.writeUTF(data) > file.close() > } > // Debug query > val query = eventsCount.writeStream > .format("console") > .outputMode("complete") > .option("truncate", "false") > .trigger(Trigger.ProcessingTime("5 seconds")) > .start() > writeFile(new Path(dir, "file1"), """ > |OLD;2019-08-09 10:05:00 > |OLD;2019-08-09 10:10:00 > |OLD;2019-08-09 10:15:00""".stripMargin) > query.processAllAvailable() > val lp1 = query.lastProgress > println(lp1.eventTime) > writeFile(new Path(dir, "file2"), """ > |NEW;2020-08-29 10:05:00 > |NEW;2020-08-29 10:10:00 > |NEW;2020-08-29 10:15:00""".stripMargin) > query.processAllAvailable() > val lp2 = query.lastProgress > println(lp2.eventTime) > writeFile(new Path(dir, "file4"), """ > |OLD;2017-08-10 10:05:00 > |OLD;2017-08-10 10:10:00 > |OLD;2017-08-10 10:15:00""".stripMargin) > writeFile(new Path(dir, "file3"), "") > query.processAllAvailable() > val lp3 = query.lastProgress > println(lp3.eventTime) > query.awaitTermination() > fs.delete(dir, true) > } > {code} > OUTPUT: > > {code:java} > --- > Batch: 0 > --- > +--+-+ > |window |count| > +--+-+ > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2019-08-09T17:05:00.000Z, avg=2019-08-09T17:10:00.000Z, > watermark=1970-01-01T00:00:00.000Z, max=2019-08-09T17:15:00.000Z} > --- > Batch: 1 > --- > +--+-+ > |window |count| > +--+-+ > |[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 | > |[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 | > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2020-08-29T17:05:00.000Z, avg=2020-08-29T17:10:00.000Z, > watermark=2019-08-09T17:14:55.000Z, max=2020-08-29T17:15:00.000Z} > --- > Batch: 2 > --- > +--+-+ > |window |count| > +--+-+ > |[2017-08-10 10:15:00, 2017-08-10 10:15:10]|1 | > |[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 | > |[2017-08-10 10:05:00, 
2017-08-10 10:05:10]|1 | > |[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 | > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2017-08-10 10:10:00, 2017-08-10 10:10:10]|1 | > |[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2017-08-10T17:05:00.000Z, avg=2017-08-10T17:10:00.000Z, > watermark=2020-08-29T17:14:55.000Z, max=2017-08-10T17:15:00.000Z} > {code} > EXPECTED: > expected to drop the last batch events to get dropped as the watermark i
[jira] [Commented] (SPARK-33039) Misleading watermark calculation in structure streaming
[ https://issues.apache.org/jira/browse/SPARK-33039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211288#comment-17211288 ] Aoyuan Liao commented on SPARK-33039: - [~srowen] This is actually not a bug. The user didn't fully understand the documenation. The output is correct as what we intended. Can we mark it as "not a problem"? > Misleading watermark calculation in structure streaming > --- > > Key: SPARK-33039 > URL: https://issues.apache.org/jira/browse/SPARK-33039 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4 >Reporter: Sandish Kumar HN >Priority: Major > > source code: > {code:java} > import org.apache.spark.sql.SparkSession > import org.apache.hadoop.fs.Path > import java.sql.Timestamp > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.streaming.{ProcessingTime, Trigger} > object TestWaterMark extends App { > val spark = SparkSession.builder().master("local").getOrCreate() > val sc = spark.sparkContext > val dir = new Path("/tmp/test-structured-streaming") > val fs = dir.getFileSystem(sc.hadoopConfiguration) > fs.mkdirs(dir) > val schema = StructType(StructField("vilue", StringType) :: > StructField("timestamp", TimestampType) :: > Nil) > val eventStream = spark > .readStream > .option("sep", ";") > .option("header", "false") > .schema(schema) > .csv(dir.toString) > // Watermarked aggregation > val eventsCount = eventStream > .withWatermark("timestamp", "5 seconds") > .groupBy(window(col("timestamp"), "10 seconds")) > .count > def writeFile(path: Path, data: String) { > val file = fs.create(path) > file.writeUTF(data) > file.close() > } > // Debug query > val query = eventsCount.writeStream > .format("console") > .outputMode("complete") > .option("truncate", "false") > .trigger(Trigger.ProcessingTime("5 seconds")) > .start() > writeFile(new Path(dir, "file1"), """ > |OLD;2019-08-09 10:05:00 > |OLD;2019-08-09 10:10:00 > |OLD;2019-08-09 10:15:00""".stripMargin) > query.processAllAvailable() > val lp1 = query.lastProgress > println(lp1.eventTime) > writeFile(new Path(dir, "file2"), """ > |NEW;2020-08-29 10:05:00 > |NEW;2020-08-29 10:10:00 > |NEW;2020-08-29 10:15:00""".stripMargin) > query.processAllAvailable() > val lp2 = query.lastProgress > println(lp2.eventTime) > writeFile(new Path(dir, "file4"), """ > |OLD;2017-08-10 10:05:00 > |OLD;2017-08-10 10:10:00 > |OLD;2017-08-10 10:15:00""".stripMargin) > writeFile(new Path(dir, "file3"), "") > query.processAllAvailable() > val lp3 = query.lastProgress > println(lp3.eventTime) > query.awaitTermination() > fs.delete(dir, true) > } > {code} > OUTPUT: > > {code:java} > --- > Batch: 0 > --- > +--+-+ > |window |count| > +--+-+ > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2019-08-09T17:05:00.000Z, avg=2019-08-09T17:10:00.000Z, > watermark=1970-01-01T00:00:00.000Z, max=2019-08-09T17:15:00.000Z} > --- > Batch: 1 > --- > +--+-+ > |window |count| > +--+-+ > |[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 | > |[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 | > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2020-08-29T17:05:00.000Z, avg=2020-08-29T17:10:00.000Z, > watermark=2019-08-09T17:14:55.000Z, 
max=2020-08-29T17:15:00.000Z} > --- > Batch: 2 > --- > +--+-+ > |window |count| > +--+-+ > |[2017-08-10 10:15:00, 2017-08-10 10:15:10]|1 | > |[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 | > |[2017-08-10 10:05:00, 2017-08-10 10:05:10]|1 | > |[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 | > |[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 | > |[2017-08-10 10:10:00, 2017-08-10 10:10:10]|1 | > |[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 | > |[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 | > |[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 | > +--+-+ > {min=2017-08-10T17:05:00.000Z, avg
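Why the output above is "correct as intended": the watermark a batch uses is derived from the maximum event time seen in the earlier batches minus the 5-second delay, and it only ever moves forward; and because the query runs in complete output mode, windows are never dropped from the result table, which is why the 2017 and 2019 windows still appear in Batch 2. A small standalone sketch (not Spark's implementation) that reproduces the watermark values printed in the progress lines:

{code:scala}
import java.time.Instant

// The watermark used by batch N is max(event time seen through batch N-1) - delay,
// and it never decreases. Timestamps are the UTC values from the reported output.
object WatermarkWalkthrough {
  val delaySeconds = 5L

  def advance(current: Instant, maxEventTimeOfPrevBatch: Instant): Instant = {
    val candidate = maxEventTimeOfPrevBatch.minusSeconds(delaySeconds)
    if (candidate.isAfter(current)) candidate else current // never moves backwards
  }

  def main(args: Array[String]): Unit = {
    val initial     = Instant.EPOCH // batch 0 runs with 1970-01-01T00:00:00Z
    val afterBatch0 = advance(initial,     Instant.parse("2019-08-09T17:15:00Z"))
    val afterBatch1 = advance(afterBatch0, Instant.parse("2020-08-29T17:15:00Z"))
    val afterBatch2 = advance(afterBatch1, Instant.parse("2017-08-10T17:15:00Z"))
    println(s"batch 1 watermark: $afterBatch0") // 2019-08-09T17:14:55Z
    println(s"batch 2 watermark: $afterBatch1") // 2020-08-29T17:14:55Z
    println(s"next watermark:    $afterBatch2") // unchanged: old 2017 data cannot pull it back
  }
}
{code}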
[jira] [Resolved] (SPARK-31430) Bug in the approximate quantile computation.
[ https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31430. -- Resolution: Duplicate > Bug in the approximate quantile computation. > > > Key: SPARK-31430 > URL: https://issues.apache.org/jira/browse/SPARK-31430 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Siddartha Naidu >Priority: Major > Attachments: approx_quantile_data.csv > > > I am seeing a bug where passing lower relative error to the > {{approxQuantile}} function is leading to incorrect result in the presence of > partitions. Setting a relative error 1e-6 causes it to compute equal values > for 0.9 and 1.0 quantiles. Coalescing it back to 1 partition gives correct > results. This issue was not present in spark version 2.4.5, we noticed it > when testing 3.0.0-preview. > {{>>> df = spark.read.csv('file:///tmp/approx_quantile_data.csv', > header=True, > schema=T.StructType([T.StructField('Store',T.StringType(),True),T.StructField('seconds',T.LongType(),True)]))}} > {{>>> df = df.repartition(200, 'Store').localCheckpoint()}} > {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.0001)}} > {{[1422576000.0, 1430352000.0, 1438300800.0]}} > {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.1)}} > {{[1422576000.0, 1430524800.0, 1438300800.0]}} > {color:#de350b}{{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], > 0.01)}}{color} > {color:#de350b}{{[1422576000.0, 1438300800.0, 1438300800.0]}}{color} > {{>>> df.coalesce(1).approxQuantile('seconds', [0.8, 0.9, 1.0], 0.01)}} > {{[1422576000.0, 1430524800.0, 1438300800.0]}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
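For reference, the same calls in the Scala API, shown only to make the relativeError argument explicit; the file path, column name, and quantiles are the ones from the report above, and the schema is inferred here for brevity.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/approx_quantile_data.csv")
  .repartition(200, col("Store"))

// probabilities first, then relativeError: smaller values ask for a tighter
// (and more expensive) approximation of the true quantiles.
val q = df.stat.approxQuantile("seconds", Array(0.8, 0.9, 1.0), 0.001)
println(q.mkString(", "))
{code}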
[jira] [Resolved] (SPARK-16859) History Server storage information is missing
[ https://issues.apache.org/jira/browse/SPARK-16859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-16859. -- Resolution: Not A Problem Sounds fine. It's very old in any event. > History Server storage information is missing > - > > Key: SPARK-16859 > URL: https://issues.apache.org/jira/browse/SPARK-16859 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Andrei Ivanov >Priority: Major > Labels: historyserver, newbie > > It looks like job history storage tab in history server is broken for > completed jobs since *1.6.2*. > More specifically it's broken since > [SPARK-13845|https://issues.apache.org/jira/browse/SPARK-13845]. > I've fixed for my installation by effectively reverting the above patch > ([see|https://github.com/EinsamHauer/spark/commit/3af62ea09af8bb350c8c8a9117149c09b8feba08]). > IMHO, the most straightforward fix would be to implement > _SparkListenerBlockUpdated_ serialization to JSON in _JsonProtocol_ making > sure it works from _ReplayListenerBus_. > The downside will be that it will still work incorrectly with pre patch job > histories. But then, it doesn't work since *1.6.2* anyhow. > PS: I'd really love to have this fixed eventually. But I'm pretty new to > Apache Spark and missing hands on Scala experience. So I'd prefer that it be > fixed by someone experienced with roadmap vision. If nobody volunteers I'll > try to patch myself. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31430) Bug in the approximate quantile computation.
[ https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211286#comment-17211286 ] Sean R. Owen commented on SPARK-31430: -- Sounds good, I usually mark as a Duplicate. > Bug in the approximate quantile computation. > > > Key: SPARK-31430 > URL: https://issues.apache.org/jira/browse/SPARK-31430 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Siddartha Naidu >Priority: Major > Attachments: approx_quantile_data.csv > > > I am seeing a bug where passing lower relative error to the > {{approxQuantile}} function is leading to incorrect result in the presence of > partitions. Setting a relative error 1e-6 causes it to compute equal values > for 0.9 and 1.0 quantiles. Coalescing it back to 1 partition gives correct > results. This issue was not present in spark version 2.4.5, we noticed it > when testing 3.0.0-preview. > {{>>> df = spark.read.csv('file:///tmp/approx_quantile_data.csv', > header=True, > schema=T.StructType([T.StructField('Store',T.StringType(),True),T.StructField('seconds',T.LongType(),True)]))}} > {{>>> df = df.repartition(200, 'Store').localCheckpoint()}} > {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.0001)}} > {{[1422576000.0, 1430352000.0, 1438300800.0]}} > {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.1)}} > {{[1422576000.0, 1430524800.0, 1438300800.0]}} > {color:#de350b}{{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], > 0.01)}}{color} > {color:#de350b}{{[1422576000.0, 1438300800.0, 1438300800.0]}}{color} > {{>>> df.coalesce(1).approxQuantile('seconds', [0.8, 0.9, 1.0], 0.01)}} > {{[1422576000.0, 1430524800.0, 1438300800.0]}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16859) History Server storage information is missing
[ https://issues.apache.org/jira/browse/SPARK-16859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211284#comment-17211284 ] Aoyuan Liao commented on SPARK-16859: - [~srowen] After configuring "spark.eventLog.logBlockUpdates.enabled=true", it works on v3.0.1. Should we mark it as "not a problem"? > History Server storage information is missing > - > > Key: SPARK-16859 > URL: https://issues.apache.org/jira/browse/SPARK-16859 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Andrei Ivanov >Priority: Major > Labels: historyserver, newbie > > It looks like job history storage tab in history server is broken for > completed jobs since *1.6.2*. > More specifically it's broken since > [SPARK-13845|https://issues.apache.org/jira/browse/SPARK-13845]. > I've fixed for my installation by effectively reverting the above patch > ([see|https://github.com/EinsamHauer/spark/commit/3af62ea09af8bb350c8c8a9117149c09b8feba08]). > IMHO, the most straightforward fix would be to implement > _SparkListenerBlockUpdated_ serialization to JSON in _JsonProtocol_ making > sure it works from _ReplayListenerBus_. > The downside will be that it will still work incorrectly with pre patch job > histories. But then, it doesn't work since *1.6.2* anyhow. > PS: I'd really love to have this fixed eventually. But I'm pretty new to > Apache Spark and missing hands on Scala experience. So I'd prefer that it be > fixed by someone experienced with roadmap vision. If nobody volunteers I'll > try to patch myself. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
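The setting mentioned above has to be enabled on the application that writes the event log, not on the history server, since the Storage tab is rebuilt from logged block updates on replay. A minimal sketch (the log directory is a placeholder):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("storage-tab-demo")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "hdfs:///tmp/spark-events")     // placeholder path
  .config("spark.eventLog.logBlockUpdates.enabled", "true")     // off by default; grows the event log
  .getOrCreate()
{code}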
[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211283#comment-17211283 ] Sean R. Owen commented on SPARK-9686: - [~EveLiao] probably - do you know what other issue or PR might have resolved it so we can mark as a Duplicate? if we don't know, I usually just mark "Not a Problem" (anymore). > Spark Thrift server doesn't return correct JDBC metadata > - > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 >Reporter: pin_zhang >Priority: Critical > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31430) Bug in the approximate quantile computation.
[ https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211282#comment-17211282 ] Aoyuan Liao commented on SPARK-31430: - [~srowen] This is already fixed. > Bug in the approximate quantile computation. > > > Key: SPARK-31430 > URL: https://issues.apache.org/jira/browse/SPARK-31430 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Siddartha Naidu >Priority: Major > Attachments: approx_quantile_data.csv > > > I am seeing a bug where passing lower relative error to the > {{approxQuantile}} function is leading to incorrect result in the presence of > partitions. Setting a relative error 1e-6 causes it to compute equal values > for 0.9 and 1.0 quantiles. Coalescing it back to 1 partition gives correct > results. This issue was not present in spark version 2.4.5, we noticed it > when testing 3.0.0-preview. > {{>>> df = spark.read.csv('file:///tmp/approx_quantile_data.csv', > header=True, > schema=T.StructType([T.StructField('Store',T.StringType(),True),T.StructField('seconds',T.LongType(),True)]))}} > {{>>> df = df.repartition(200, 'Store').localCheckpoint()}} > {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.0001)}} > {{[1422576000.0, 1430352000.0, 1438300800.0]}} > {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.1)}} > {{[1422576000.0, 1430524800.0, 1438300800.0]}} > {color:#de350b}{{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], > 0.01)}}{color} > {color:#de350b}{{[1422576000.0, 1438300800.0, 1438300800.0]}}{color} > {{>>> df.coalesce(1).approxQuantile('seconds', [0.8, 0.9, 1.0], 0.01)}} > {{[1422576000.0, 1430524800.0, 1438300800.0]}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211280#comment-17211280 ] Aoyuan Liao commented on SPARK-9686: [~srowen] This is resolved on 3.0.1. Should we mark it as fixed? > Spark Thrift server doesn't return correct JDBC metadata > - > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 >Reporter: pin_zhang >Priority: Critical > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33062) Make DataFrameReader.jdbc work for DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-33062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-33062. Resolution: Not A Problem > Make DataFrameReader.jdbc work for DataSource V2 > - > > Key: SPARK-33062 > URL: https://issues.apache.org/jira/browse/SPARK-33062 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Support multiple catalogs in DataFrameReader.jdbc -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33010) Make DataFrameWriter.jdbc work for DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-33010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-33010. Resolution: Not A Problem > Make DataFrameWriter.jdbc work for DataSource V2 > - > > Key: SPARK-33010 > URL: https://issues.apache.org/jira/browse/SPARK-33010 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > > Support multiple catalogs in DataFrameWriter.jdbc -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
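Presumably these two sub-tasks were closed because multi-catalog access to JDBC sources goes through the catalog plugin API rather than the DataFrameReader.jdbc / DataFrameWriter.jdbc shortcuts. A sketch of that route follows; the catalog implementation class, the option keys, and the H2 URL are assumptions for illustration, not taken from these tickets.

{code:scala}
import org.apache.spark.sql.SparkSession

// Register a named JDBC catalog ("h2") and address tables through it.
// Class name, option keys, and URL below are assumptions for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.catalog.h2",
    "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
  .config("spark.sql.catalog.h2.url", "jdbc:h2:mem:testdb")
  .config("spark.sql.catalog.h2.driver", "org.h2.Driver")
  .getOrCreate()

// Tables resolve against the "h2" catalog by qualified name.
val df = spark.table("h2.test_schema.people")
df.writeTo("h2.test_schema.people_copy").append() // appends to an existing table
{code}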
[jira] [Assigned] (SPARK-33081) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (DB2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-33081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33081: Assignee: Apache Spark > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (DB2 dialect) > -- > > Key: SPARK-33081 > URL: https://issues.apache.org/jira/browse/SPARK-33081 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Major > > Override the default SQL strings for: > * ALTER TABLE UPDATE COLUMN TYPE > * ALTER TABLE UPDATE COLUMN NULLABILITY > in the following DB2 JDBC dialect according to official documentation. > Write DB2 integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33081) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (DB2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-33081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33081: Assignee: (was: Apache Spark) > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (DB2 dialect) > -- > > Key: SPARK-33081 > URL: https://issues.apache.org/jira/browse/SPARK-33081 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > * ALTER TABLE UPDATE COLUMN TYPE > * ALTER TABLE UPDATE COLUMN NULLABILITY > in the following DB2 JDBC dialect according to official documentation. > Write DB2 integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33102) Use stringToSeq on SQL list typed parameters
[ https://issues.apache.org/jira/browse/SPARK-33102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33102: Assignee: (was: Apache Spark) > Use stringToSeq on SQL list typed parameters > > > Key: SPARK-33102 > URL: https://issues.apache.org/jira/browse/SPARK-33102 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33102) Use stringToSeq on SQL list typed parameters
[ https://issues.apache.org/jira/browse/SPARK-33102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33102: Assignee: Apache Spark > Use stringToSeq on SQL list typed parameters > > > Key: SPARK-33102 > URL: https://issues.apache.org/jira/browse/SPARK-33102 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33102) Use stringToSeq on SQL list typed parameters
[ https://issues.apache.org/jira/browse/SPARK-33102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211166#comment-17211166 ] Apache Spark commented on SPARK-33102: -- User 'gaborgsomogyi' has created a pull request for this issue: https://github.com/apache/spark/pull/29989 > Use stringToSeq on SQL list typed parameters > > > Key: SPARK-33102 > URL: https://issues.apache.org/jira/browse/SPARK-33102 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-33081) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (DB2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-33081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reopened SPARK-33081: > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (DB2 dialect) > -- > > Key: SPARK-33081 > URL: https://issues.apache.org/jira/browse/SPARK-33081 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > * ALTER TABLE UPDATE COLUMN TYPE > * ALTER TABLE UPDATE COLUMN NULLABILITY > in the following DB2 JDBC dialect according to official documentation. > Write DB2 integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-33081) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (DB2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-33081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-33081: --- Comment: was deleted (was: This is done by smaller subtasks) > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (DB2 dialect) > -- > > Key: SPARK-33081 > URL: https://issues.apache.org/jira/browse/SPARK-33081 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > * ALTER TABLE UPDATE COLUMN TYPE > * ALTER TABLE UPDATE COLUMN NULLABILITY > in the following DB2 JDBC dialect according to official documentation. > Write DB2 integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33081) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (DB2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-33081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211144#comment-17211144 ] Huaxin Gao commented on SPARK-33081: This is done by smaller subtasks > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (DB2 dialect) > -- > > Key: SPARK-33081 > URL: https://issues.apache.org/jira/browse/SPARK-33081 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > * ALTER TABLE UPDATE COLUMN TYPE > * ALTER TABLE UPDATE COLUMN NULLABILITY > in the following DB2 JDBC dialect according to official documentation. > Write DB2 integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33081) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (DB2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-33081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-33081. Resolution: Not A Problem > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (DB2 dialect) > -- > > Key: SPARK-33081 > URL: https://issues.apache.org/jira/browse/SPARK-33081 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > * ALTER TABLE UPDATE COLUMN TYPE > * ALTER TABLE UPDATE COLUMN NULLABILITY > in the following DB2 JDBC dialect according to official documentation. > Write DB2 integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
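A sketch of the two SQL strings the DB2 dialect would have to emit, following DB2's documented ALTER TABLE ... ALTER COLUMN syntax; how they are wired into the JDBC dialect is left out here, since that is exactly what the sub-task implements.

{code:scala}
// Standalone sketch of the DB2-specific SQL, not the Spark dialect code itself.
object Db2AlterColumnSql {
  // ALTER TABLE UPDATE COLUMN TYPE
  def updateColumnType(table: String, column: String, newType: String): String =
    s"ALTER TABLE $table ALTER COLUMN $column SET DATA TYPE $newType"

  // ALTER TABLE UPDATE COLUMN NULLABILITY
  def updateColumnNullability(table: String, column: String, nullable: Boolean): String = {
    val clause = if (nullable) "DROP NOT NULL" else "SET NOT NULL"
    s"ALTER TABLE $table ALTER COLUMN $column $clause"
  }
}
{code}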
[jira] [Commented] (SPARK-33102) Use stringToSeq on SQL list typed parameters
[ https://issues.apache.org/jira/browse/SPARK-33102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1723#comment-1723 ] Gabor Somogyi commented on SPARK-33102: --- Filing a PR soon... > Use stringToSeq on SQL list typed parameters > > > Key: SPARK-33102 > URL: https://issues.apache.org/jira/browse/SPARK-33102 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33102) Use stringToSeq on SQL list typed parameters
Gabor Somogyi created SPARK-33102: - Summary: Use stringToSeq on SQL list typed parameters Key: SPARK-33102 URL: https://issues.apache.org/jira/browse/SPARK-33102 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
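The ticket has no description beyond its title, so as a rough illustration of the intent: a stringToSeq-style helper turns a comma-separated, list-typed SQL parameter into a trimmed sequence with empty entries dropped. The helper below is a standalone stand-in, not Spark's own Utils.stringToSeq.

{code:scala}
// Stand-in for the splitting behaviour the title refers to: trim each entry
// and drop empty ones, so "a, b ,, c" and "a,b,c" parse to the same list.
def stringToSeq(str: String): Seq[String] =
  str.split(",").map(_.trim).filter(_.nonEmpty).toSeq

// e.g. a list-typed SQL parameter value
assert(stringToSeq(" a, b ,, c ") == Seq("a", "b", "c"))
{code}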
[jira] [Comment Edited] (SPARK-25080) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
[ https://issues.apache.org/jira/browse/SPARK-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211011#comment-17211011 ] Anika Kelhanka edited comment on SPARK-25080 at 10/9/20, 2:24 PM: -- I am able to produce this issue while querying a external Hive on parquet table from spark shell in Spark 2.4. The scenarios is: Certain decimal fields in parquet have value higher than the precision defined in hive table. Basically, Parquet has a value that needs to be converted to a target type with not enough precision. scala> val df = spark.sql("select 'dummy' as name, 100010.7010 as value") scala> df.write.mode("Overwrite").parquet("/my/hdfs/location/test") hive> create external table db1.test_precision(name string, value Decimal(18,6)) STORED As PARQUET LOCATION '/my/hdfs/location/test'; scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet","false") scala> val df_hive = spark.sql("select * from db_gwm_morph_mrd.test_precision") scala> df_hive.show 20/10/09 09:33:12 WARN hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl 20/10/09 09:33:12 ERROR executor.Executor: Exception in task 0.0 in stage 5.0 (TID 5) java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:107) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:415) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:443) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:434) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at 
java.lang.Thread.run(Thread.java:748) was (Author: anikakelhanka): I am able to produce this issue while querying a external Hive on parquet table with value is higher than the precision hive table is defined with (significant side) from spark shell in Spark 2.4. Th scala> val df = spark.sql("select 'dummy' as name, 100010.7010 as value") scala> df.write.mode("Overwrite").parquet("/my/hdfs/location/test") hive> create external table db1.test_precision(name string, value Decimal(18,6)) STORED As PARQUET LOCATION '/my/hdfs/location/test'; scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet","false") scala> val df_hive = spark.sql("select * from db_gwm_morph_mrd.test_precision") scala> df_hive.show 20/10/09 09:33:12 WARN hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl 20/10/09 09:33:12 ERROR executor.Executor: Exception in task 0.0 in stage 5.0 (TID 5) java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:107) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:41
[jira] [Commented] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211040#comment-17211040 ] Apache Spark commented on SPARK-33098: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/29988 > Exception when using 'in' to compare a partition column to a literal with the > wrong type > > > Key: SPARK-33098 > URL: https://issues.apache.org/jira/browse/SPARK-33098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Priority: Major > > Comparing a partition column against a literal with the wrong type works if > you use equality ('='). However, if you use 'in', you get: > {noformat} > MetaException(message:Filtering is supported only on partition keys of type > string) > {noformat} > For example: > {noformat} > spark-sql> create table test (a int) partitioned by (b int) stored as parquet; > Time taken: 0.323 seconds > spark-sql> insert into test values (1, 1), (1, 2), (2, 2); > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updated size to 418 > 20/10/08 19:57:14 WARN log: Updated size to 836 > Time taken: 2.124 seconds > spark-sql> -- this works, of course > spark-sql> select * from test where b in (2); > 1 2 > 2 2 > Time taken: 0.13 seconds, Fetched 2 row(s) > spark-sql> -- this also works (equals with wrong type) > spark-sql> select * from test where b = '2'; > 1 2 > 2 2 > Time taken: 0.132 seconds, Fetched 2 row(s) > spark-sql> -- this does not work ('in' with wrong type) > spark-sql> select * from test where b in ('2'); > 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b > in ('2')] > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > - > Caused by: MetaException(message:Filtering is supported only on partition > keys of type string) > {noformat} > There are also interesting variations of this using the dataframe API: > {noformat} > scala> sql("select cast(b as string) as b from test where b in > (2)").show(false) > +---+ > |b | > +---+ > |2 | > |2 | > +---+ > scala> sql("select cast(b as string) as b from test").filter("b in > (2)").show(false) > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is > supported only on partition keys of type string > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211043#comment-17211043 ] Apache Spark commented on SPARK-33098: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/29988 > Exception when using 'in' to compare a partition column to a literal with the > wrong type > > > Key: SPARK-33098 > URL: https://issues.apache.org/jira/browse/SPARK-33098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Priority: Major > > Comparing a partition column against a literal with the wrong type works if > you use equality ('='). However, if you use 'in', you get: > {noformat} > MetaException(message:Filtering is supported only on partition keys of type > string) > {noformat} > For example: > {noformat} > spark-sql> create table test (a int) partitioned by (b int) stored as parquet; > Time taken: 0.323 seconds > spark-sql> insert into test values (1, 1), (1, 2), (2, 2); > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updated size to 418 > 20/10/08 19:57:14 WARN log: Updated size to 836 > Time taken: 2.124 seconds > spark-sql> -- this works, of course > spark-sql> select * from test where b in (2); > 1 2 > 2 2 > Time taken: 0.13 seconds, Fetched 2 row(s) > spark-sql> -- this also works (equals with wrong type) > spark-sql> select * from test where b = '2'; > 1 2 > 2 2 > Time taken: 0.132 seconds, Fetched 2 row(s) > spark-sql> -- this does not work ('in' with wrong type) > spark-sql> select * from test where b in ('2'); > 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b > in ('2')] > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > - > Caused by: MetaException(message:Filtering is supported only on partition > keys of type string) > {noformat} > There are also interesting variations of this using the dataframe API: > {noformat} > scala> sql("select cast(b as string) as b from test where b in > (2)").show(false) > +---+ > |b | > +---+ > |2 | > |2 | > +---+ > scala> sql("select cast(b as string) as b from test").filter("b in > (2)").show(false) > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is > supported only on partition keys of type string > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33098: Assignee: (was: Apache Spark) > Exception when using 'in' to compare a partition column to a literal with the > wrong type > > > Key: SPARK-33098 > URL: https://issues.apache.org/jira/browse/SPARK-33098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Priority: Major > > Comparing a partition column against a literal with the wrong type works if > you use equality ('='). However, if you use 'in', you get: > {noformat} > MetaException(message:Filtering is supported only on partition keys of type > string) > {noformat} > For example: > {noformat} > spark-sql> create table test (a int) partitioned by (b int) stored as parquet; > Time taken: 0.323 seconds > spark-sql> insert into test values (1, 1), (1, 2), (2, 2); > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updated size to 418 > 20/10/08 19:57:14 WARN log: Updated size to 836 > Time taken: 2.124 seconds > spark-sql> -- this works, of course > spark-sql> select * from test where b in (2); > 1 2 > 2 2 > Time taken: 0.13 seconds, Fetched 2 row(s) > spark-sql> -- this also works (equals with wrong type) > spark-sql> select * from test where b = '2'; > 1 2 > 2 2 > Time taken: 0.132 seconds, Fetched 2 row(s) > spark-sql> -- this does not work ('in' with wrong type) > spark-sql> select * from test where b in ('2'); > 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b > in ('2')] > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > - > Caused by: MetaException(message:Filtering is supported only on partition > keys of type string) > {noformat} > There are also interesting variations of this using the dataframe API: > {noformat} > scala> sql("select cast(b as string) as b from test where b in > (2)").show(false) > +---+ > |b | > +---+ > |2 | > |2 | > +---+ > scala> sql("select cast(b as string) as b from test").filter("b in > (2)").show(false) > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is > supported only on partition keys of type string > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33098: Assignee: Apache Spark > Exception when using 'in' to compare a partition column to a literal with the > wrong type > > > Key: SPARK-33098 > URL: https://issues.apache.org/jira/browse/SPARK-33098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Assignee: Apache Spark >Priority: Major > > Comparing a partition column against a literal with the wrong type works if > you use equality ('='). However, if you use 'in', you get: > {noformat} > MetaException(message:Filtering is supported only on partition keys of type > string) > {noformat} > For example: > {noformat} > spark-sql> create table test (a int) partitioned by (b int) stored as parquet; > Time taken: 0.323 seconds > spark-sql> insert into test values (1, 1), (1, 2), (2, 2); > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updated size to 418 > 20/10/08 19:57:14 WARN log: Updated size to 836 > Time taken: 2.124 seconds > spark-sql> -- this works, of course > spark-sql> select * from test where b in (2); > 1 2 > 2 2 > Time taken: 0.13 seconds, Fetched 2 row(s) > spark-sql> -- this also works (equals with wrong type) > spark-sql> select * from test where b = '2'; > 1 2 > 2 2 > Time taken: 0.132 seconds, Fetched 2 row(s) > spark-sql> -- this does not work ('in' with wrong type) > spark-sql> select * from test where b in ('2'); > 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b > in ('2')] > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > - > Caused by: MetaException(message:Filtering is supported only on partition > keys of type string) > {noformat} > There are also interesting variations of this using the dataframe API: > {noformat} > scala> sql("select cast(b as string) as b from test where b in > (2)").show(false) > +---+ > |b | > +---+ > |2 | > |2 | > +---+ > scala> sql("select cast(b as string) as b from test").filter("b in > (2)").show(false) > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is > supported only on partition keys of type string > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33094) ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211025#comment-17211025 ] Apache Spark commented on SPARK-33094: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29987 > ORC format does not propagate Hadoop config from DS options to underlying > HDFS file system > -- > > Key: SPARK-33094 > URL: https://issues.apache.org/jira/browse/SPARK-33094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > When running: > {code:java} > spark.read.format("orc").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
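For context on what SPARK-33094 enables, a hedged sketch of the intended behaviour once data source options reach the underlying Hadoop FileSystem. The bucket, path, and the particular fs.s3a.* keys are placeholders chosen for illustration, not taken from the ticket.
{code:scala}
// Hadoop settings passed as ORC data source options should be visible to the
// file system instance that actually reads the files.
val hadoopOpts = Map(
  "fs.s3a.access.key" -> sys.env("AWS_ACCESS_KEY_ID"),    // placeholder credentials
  "fs.s3a.secret.key" -> sys.env("AWS_SECRET_ACCESS_KEY")
)
val df = spark.read
  .format("orc")
  .options(hadoopOpts)
  .load("s3a://some-bucket/some/path")                     // placeholder location
df.printSchema()
{code}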
[jira] [Commented] (SPARK-33094) ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211022#comment-17211022 ] Apache Spark commented on SPARK-33094: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29987 > ORC format does not propagate Hadoop config from DS options to underlying > HDFS file system > -- > > Key: SPARK-33094 > URL: https://issues.apache.org/jira/browse/SPARK-33094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > When running: > {code:java} > spark.read.format("orc").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25080) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
[ https://issues.apache.org/jira/browse/SPARK-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211011#comment-17211011 ] Anika Kelhanka commented on SPARK-25080: I am able to reproduce this issue from the spark shell in Spark 2.4, when querying an external Hive-on-Parquet table whose stored value has more digits (on the significant side) than the precision the Hive table is defined with. scala> val df = spark.sql("select 'dummy' as name, 100010.7010 as value") scala> df.write.mode("Overwrite").parquet("/my/hdfs/location/test") hive> create external table db1.test_precision(name string, value Decimal(18,6)) STORED As PARQUET LOCATION '/my/hdfs/location/test'; scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet","false") scala> val df_hive = spark.sql("select * from db_gwm_morph_mrd.test_precision") scala> df_hive.show 20/10/09 09:33:12 WARN hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl 20/10/09 09:33:12 ERROR executor.Executor: Exception in task 0.0 in stage 5.0 (TID 5) java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:107) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:415) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:443) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:434) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) > NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110) > -- > > Key: SPARK-25080 > URL: 
https://issues.apache.org/jira/browse/SPARK-25080 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 2.3.1 > Environment: AWS EMR >Reporter: Andrew K Long >Priority: Minor > > NPE while reading hive table. > > ``` > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task > 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor > 487): java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > a
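A small diagnostic sketch related to the SPARK-25080 reproduction above (paths and table names are the ones it uses). It only compares the decimal type Spark wrote into the Parquet files with the type the external Hive table declares; writing the column already cast to the declared type is a hedged way to avoid the overflow situation in which the NPE is reported, not a fix for toCatalystDecimal itself.
{code:scala}
// Inspect what was actually written: the literal 100010.7010 is inferred as decimal(10,4),
// while the Hive table declares decimal(18,6).
spark.read.parquet("/my/hdfs/location/test").printSchema()

// Hedged avoidance: write the column pre-cast to the type the Hive table declares.
val df = spark.sql("select 'dummy' as name, cast(100010.7010 as decimal(18,6)) as value")
df.write.mode("overwrite").parquet("/my/hdfs/location/test")
{code}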
[jira] [Comment Edited] (SPARK-32924) Web UI sort on duration is wrong
[ https://issues.apache.org/jira/browse/SPARK-32924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210977#comment-17210977 ] Rakesh Raushan edited comment on SPARK-32924 at 10/9/20, 1:49 PM: -- I think it's due to string sorting. A similar issue was fixed in SPARK-31983 was (Author: rakson): I thinking its due to string sorting. One similar issue is fixed here [SPARK-31983|https://issues.apache.org/jira/browse/SPARK-31983] > Web UI sort on duration is wrong > > > Key: SPARK-32924 > URL: https://issues.apache.org/jira/browse/SPARK-32924 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.6 >Reporter: t oo >Priority: Major > Attachments: ui_sort.png > > > See attachment, 9 s(econds) is shown as larger than 8.1 min -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32924) Web UI sort on duration is wrong
[ https://issues.apache.org/jira/browse/SPARK-32924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210977#comment-17210977 ] Rakesh Raushan commented on SPARK-32924: I think it's due to string sorting. A similar issue was fixed in [SPARK-31983|https://issues.apache.org/jira/browse/SPARK-31983] > Web UI sort on duration is wrong > > > Key: SPARK-32924 > URL: https://issues.apache.org/jira/browse/SPARK-32924 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.6 >Reporter: t oo >Priority: Major > Attachments: ui_sort.png > > > See attachment, 9 s(econds) is shown as larger than 8.1 min -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
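A standalone sketch of the string-sorting explanation offered above; it is unrelated to the actual Web UI code and only illustrates the effect. Lexicographically "9 s" sorts above "8.1 min" because '9' > '8', which matches the misordering in the attachment, while sorting on the underlying milliseconds gives the expected order.
{code:scala}
val durations = Seq("9 s" -> 9000L, "8.1 min" -> (8.1 * 60 * 1000).toLong)

// Descending string sort: "9 s" comes first, i.e. it looks "larger" than 8.1 min.
println(durations.map(_._1).sorted(Ordering[String].reverse))   // List(9 s, 8.1 min)

// Descending numeric sort on the raw duration gives the expected order.
println(durations.sortBy { case (_, ms) => -ms }.map(_._1))     // List(8.1 min, 9 s)
{code}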
[jira] [Updated] (SPARK-33094) ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33094: - Fix Version/s: 3.0.2 > ORC format does not propagate Hadoop config from DS options to underlying > HDFS file system > -- > > Key: SPARK-33094 > URL: https://issues.apache.org/jira/browse/SPARK-33094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > When running: > {code:java} > spark.read.format("orc").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33101: - Fix Version/s: 3.0.2 2.4.8 > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210808#comment-17210808 ] Apache Spark commented on SPARK-33101: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29986 > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210807#comment-17210807 ] Apache Spark commented on SPARK-33101: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29986 > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33094) ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210799#comment-17210799 ] Apache Spark commented on SPARK-33094: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29985 > ORC format does not propagate Hadoop config from DS options to underlying > HDFS file system > -- > > Key: SPARK-33094 > URL: https://issues.apache.org/jira/browse/SPARK-33094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("orc").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33094) ORC format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210798#comment-17210798 ] Apache Spark commented on SPARK-33094: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29985 > ORC format does not propagate Hadoop config from DS options to underlying > HDFS file system > -- > > Key: SPARK-33094 > URL: https://issues.apache.org/jira/browse/SPARK-33094 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("orc").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32896) Add DataStreamWriter.table API
[ https://issues.apache.org/jira/browse/SPARK-32896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32896. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29767 [https://github.com/apache/spark/pull/29767] > Add DataStreamWriter.table API > -- > > Key: SPARK-32896 > URL: https://issues.apache.org/jira/browse/SPARK-32896 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.1.0 > > > For now, there is no way to write to a table (especially a catalog table) even > if the table is capable of handling streaming writes. > We can add a DataStreamWriter.table API that lets end users specify the table as > the provider and lets the streaming query write into that table. This only > specifies the target table; the overall usage of DataStreamWriter is unchanged. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
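A minimal sketch of the usage the description above asks for. The method name is taken from the issue title; the name and exact signature in the merged PR may differ, and the source, checkpoint path, and table name are placeholders.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("stream-to-table-sketch").getOrCreate()

val query = spark.readStream
  .format("rate")                                            // placeholder streaming source
  .load()
  .writeStream
  .option("checkpointLocation", "/tmp/rate-sink-checkpoint") // placeholder path
  .table("rate_sink")                                        // start the query, writing into this catalog table
query.awaitTermination()
{code}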
[jira] [Resolved] (SPARK-33099) Respect executor idle timeout conf in ExecutorPodsAllocator
[ https://issues.apache.org/jira/browse/SPARK-33099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33099. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29981 [https://github.com/apache/spark/pull/29981] > Respect executor idle timeout conf in ExecutorPodsAllocator > --- > > Key: SPARK-33099 > URL: https://issues.apache.org/jira/browse/SPARK-33099 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33099) Respect executor idle timeout conf in ExecutorPodsAllocator
[ https://issues.apache.org/jira/browse/SPARK-33099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33099: - Assignee: Dongjoon Hyun > Respect executor idle timeout conf in ExecutorPodsAllocator > --- > > Key: SPARK-33099 > URL: https://issues.apache.org/jira/browse/SPARK-33099 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
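The ticket above does not spell out which conf it refers to; assuming it is the standard dynamic-allocation idle timeout, a hedged sketch of the relevant settings on Kubernetes. The master URL and container image are placeholders.
{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("k8s://https://kubernetes.example.com:6443")           // placeholder API server
  .set("spark.kubernetes.container.image", "example/spark:3.1.0")   // placeholder image
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")   // needed without an external shuffle service
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")        // the idle timeout presumed to be respected here
{code}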
[jira] [Resolved] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33101. --- Resolution: Fixed Issue resolved by pull request 29984 [https://github.com/apache/spark/pull/29984] > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33093) Why do my Spark 3 jobs fail to use external shuffle service on YARN?
[ https://issues.apache.org/jira/browse/SPARK-33093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210715#comment-17210715 ] Julien commented on SPARK-33093: That worked, [~yumwang]; thanks! > Why do my Spark 3 jobs fail to use external shuffle service on YARN? > > > Key: SPARK-33093 > URL: https://issues.apache.org/jira/browse/SPARK-33093 > Project: Spark > Issue Type: Question > Components: Deploy, Java API >Affects Versions: 3.0.0 >Reporter: Julien >Priority: Minor > > We are running a Spark-on-YARN setup, where each client uploads their own > Spark JARs for their job, to run in YARN executors. YARN exposes a shuffle > service on every NodeManager's 7337 port, and clients enable use of that. > This has worked for a while, with clients using Spark 2 JARs, but we are > seeing issues when clients attempt to use Spark 3 JAR. When shuffling is > either disabled, or enabled but no use of the shuffle service is made, things > seems to continue working in Spark 3. > When a Spark 3 job attempts to use the external service, we get a stack-trace > that looks like this: > {noformat}java.lang.IllegalArgumentException: Unknown message type: 10 > at > org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder.fromByteBuffer(BlockTransferMessage.java:67) > at > org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:71) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at ...{noformat} > Message type 10 was introduced as of SPARK-27651, released in Spark 3.0.0; > this error hints at an older version of > {{BlockTransferMessage$Decoder.fromByteBuffer}} being used. > {{ExternalShuffleBlockHandler}} was renamed to {{ExternalBlockHandler}} as of > SPARK-28593, also released in Spark 3.0.0; this stack-trace hints at an older > JAR being loaded. > Our current Hadoop setup (Cloudera CDH parcels) is very likely to be > polluting the class-path with older JARs. Trying to figure out where the old > JARs come from, I added {{-verbose:class}} to the executor options, to log > all class loading. 
> This is where things get interesting: there is no mention of the old > {{ExternalShuffleBlockHandler}} class anywhere, and > {{BlockTransferMessage$Decoder}} is reported as loaded from the Spark 3 JARs: > {noformat}grep -E > 'org.apache.spark.network.shuffle.protocol.BlockTransferMessage|org.apache.spark.network.shuffle.ExternalShuffleBlockHandler|org.apache.spark.network.server.TransportRequestHandler|org.apache.spark.network.server.TransportChannelHandler|org.apache.spark.network.shuffle.ExternalBlockHandler' > example_shuffle_stdout.txt > [Loaded org.apache.spark.network.server.TransportRequestHandler from > file:/hadoop/2/yarn/nm/filecache/0/2170513/spark-network-common_2.12-3.0.0.jar] > [Loaded org.apache.spark.network.server.TransportChannelHandler from > file:/hadoop/2/yarn/nm/filecache/0/2170513/spark-network-common_2.12-3.0.0.jar] > [Loaded org.apache.spark.network.shuffle.protocol.BlockTransferMessage from > file:/hadoop/1/yarn/nm/filecache/0/2170571/spark-network-shuffle_2.12-3.0.0.jar] > [Loaded org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Type > from > file:/hadoop/1/yarn/nm/filecache/0/2170571/spark-network-shuffle_2.12-3.0.0.jar] > [Loaded > org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder from > file:/hadoop/1/yarn/nm/filecache/0/2170571/spark-network-shuffle_2.12-3.0.0.jar] > [Loaded org.apache.spark.network.server.TransportRequestHandler$1 from > file:/hadoop/2/yarn/nm/filecache/0/2170513/spark-network-common_2.12-3.0.0.jar] > [Loaded > org.apache.spark.network.server.TransportRequestHandler$$Lambda$666/376989599 > from org.apache.spark.network.server.TransportRequestHandler]{noformat} > I do not know how this is possible: > - is the executor reporting a stack-trace that comes from another process > rather than itself? > - are old classes loaded without being reported by {{-verbose:class}}? > I'm not sure how to investigate this further, as I failed to locate precisely > how the instance of {{RpcHandler}} is injected into the > {{TransportRequestHandler}} for my executors. > I did try setting {{spark.executor.userClas
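The comment above does not say which suggestion worked, so the following is not a claim about this ticket's actual resolution. One compatibility setting Spark 3 ships for exactly the symptom described (a new-protocol message reaching an old ExternalShuffleBlockHandler) is spark.shuffle.useOldFetchProtocol; a hedged sketch:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark3-on-old-shuffle-service")              // placeholder app name
  .config("spark.shuffle.service.enabled", "true")       // keep using the NodeManager shuffle service
  .config("spark.shuffle.useOldFetchProtocol", "true")   // fall back to the pre-3.0 fetch protocol
  .getOrCreate()
{code}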
[jira] [Commented] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210692#comment-17210692 ] Apache Spark commented on SPARK-33101: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29984 > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210694#comment-17210694 ] Apache Spark commented on SPARK-33101: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29984 > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33101: Assignee: Maxim Gekk (was: Apache Spark) > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33101) LibSVM format does not propagate Hadoop config from DS options to underlying HDFS file system
[ https://issues.apache.org/jira/browse/SPARK-33101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33101: Assignee: Apache Spark (was: Maxim Gekk) > LibSVM format does not propagate Hadoop config from DS options to underlying > HDFS file system > - > > Key: SPARK-33101 > URL: https://issues.apache.org/jira/browse/SPARK-33101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.1.0 > > > When running: > {code:java} > spark.read.format("libsvm").options(conf).load(path) > {code} > The underlying file system will not receive the `conf` options. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13860) TPCDS query 39 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210691#comment-17210691 ] Apache Spark commented on SPARK-13860: -- User 'leanken' has created a pull request for this issue: https://github.com/apache/spark/pull/29983 > TPCDS query 39 returns wrong results compared to TPC official result set > - > > Key: SPARK-13860 > URL: https://issues.apache.org/jira/browse/SPARK-13860 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 2.1.1, 2.2.0 >Reporter: JESSE CHEN >Priority: Major > Labels: bulk-closed, tpcds-result-mismatch > > Testing Spark SQL using TPC queries. Query 39 returns wrong results compared > to official result set. This is at 1GB SF (validation run). > q39a - 3 extra rows in SparkSQL output (eg. > [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733]) ; q39b > - 3 extra rows in SparkSQL output (eg. > [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733]) > Actual results 39a: > {noformat} > [1,265,1,324.75,1.2438391781531353,1,265,2,329.0,1.0151581328149208] > [1,363,1,499.5,1.031941572270649,1,363,2,321.0,1.1411766752007977] > [1,679,1,373.75,1.0955498064867504,1,679,2,417.5,1.042970994259454] > [1,695,1,450.75,1.0835888283564505,1,695,2,368.75,1.1356494125569416] > [1,789,1,357.25,1.03450938027956,1,789,2,410.0,1.0284221852702604] > [1,815,1,216.5,1.1702270938111008,1,815,2,150.5,1.3057281471249382] > [1,827,1,271.75,1.1046890134130438,1,827,2,424.75,1.1653198631238286] > [1,1041,1,382.5,1.284808399803008,1,1041,2,424.75,1.000577271456812] > [1,1155,1,184.0,NaN,1,1155,2,343.3,1.1700233592269733] > [1,1569,1,212.0,1.630213519639535,1,1569,2,239.25,1.2641513267800557] > [1,1623,1,338.25,1.1285483279713715,1,1623,2,261.3,1.2717809002195564] > [1,2581,1,448.5,1.060429041250449,1,2581,2,476.25,1.0362984739390064] > [1,2705,1,246.25,1.0120308357959693,1,2705,2,294.7,1.0742134101583702] > [1,3131,1,393.75,1.0037613982687346,1,3131,2,480.5,1.0669144981482768] > [1,3291,1,374.5,1.195189833087008,1,3291,2,265.25,1.572972106948466] > [1,3687,1,279.75,1.4260909081999698,1,3687,2,157.25,1.4534340882531784] > [1,4955,1,495.25,1.0318296151625301,1,4955,2,322.5,1.1693842343776149] > [1,5627,1,282.75,1.5657032366359889,1,5627,2,297.5,1.2084286841430678] > [1,7017,1,175.5,1.0427454215644427,1,7017,2,321.3,1.0183356932936254] > [1,7317,1,366.3,1.025466403613547,1,7317,2,378.0,1.2172513189920555] > [1,7569,1,430.5,1.0874396852180854,1,7569,2,360.25,1.047005559314515] > [1,7999,1,166.25,1.7924231710846223,1,7999,2,375.3,1.008092263550718] > [1,8319,1,306.75,1.1615378040478215,1,8319,2,276.0,1.1420996385609428] > [1,8443,1,327.75,1.256718374192724,1,8443,2,332.5,1.0044167259988928] > [1,8583,1,319.5,1.024108893111539,1,8583,2,310.25,1.2358813775861328] > [1,8591,1,398.0,1.1478168692042447,1,8591,2,355.75,1.0024472149348966] > [1,8611,1,300.5,1.5191545184147954,1,8611,2,243.75,1.2342122780960432] > [1,9081,1,367.0,1.0878932141280895,1,9081,2,435.0,1.0330530776324107] > [1,9357,1,351.7,1.1902922622025887,1,9357,2,427.0,1.0438583026358363] > [1,9449,1,406.25,1.0183183104803557,1,9449,2,175.0,1.0544779796296408] > [1,9713,1,242.5,1.1035044355064203,1,9713,2,393.0,1.208474608738988] > [1,9809,1,479.0,1.0189602512117633,1,9809,2,317.5,1.0614142074924882] > [1,9993,1,417.75,1.0099832672435247,1,9993,2,204.5,1.552870745350107] > [1,10127,1,239.75,1.0561770587198123,1,10127,2,359.25,1.1857980403742183] > [1,11159,1,407.25,1.0785507154337637,1,11159,2,250.0,1.334757905639321] > 
[1,11277,1,211.25,1.2615858275316627,1,11277,2,330.75,1.0808767951625093] > [1,11937,1,344.5,1.085804026843784,1,11937,2,200.34,1.0638527063883725] > [1,12373,1,387.75,1.1014904822941258,1,12373,2,306.0,1.0761744390394028] > [1,12471,1,365.25,1.0607570183728479,1,12471,2,327.25,1.0547560580567852] > [1,12625,1,279.0,1.3016560542373208,1,12625,2,443.25,1.0604958838068959] > [1,12751,1,280.75,1.10833057888089,1,12751,2,369.3,1.3416504398884601] > [1,12779,1,331.0,1.041690207320035,1,12779,2,359.0,1.028978056175258] > [1,13077,1,367.7,1.345523904195734,1,13077,2,358.7,1.5132429058096555] > [1,13191,1,260.25,1.063569632291568,1,13191,2,405.0,1.0197999172180061] > [1,13561,1,335.25,1.2609616961776389,1,13561,2,240.0,1.0513604502245155] > [1,13935,1,311.75,1.0399289695412326,1,13935,2,275.0,1.0367527180321774] > [1,14687,1,358.0,1.4369356919381713,1,14687,2,187.0,1.5493631531474956] > [1,14719,1,209.0,1.0411509639707628,1,14719,2,489.0,1.376616882800804] > [1,15345,1,148.5,1.5295784035794024,1,15345,2,246.5,1.5087987747231526] > [1,15427,1,482.75,1.0124238928335
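The extra rows above all carry NaN in the covariance column, which lines up with Spark's documented NaN semantics: NaN compares greater than any other numeric value, so a q39-style predicate such as stdev/mean > 1 keeps rows whose coefficient of variation is NaN. Offered as a hedged explanation, not a confirmed root cause for this ticket.
{code:scala}
// Expected true under Spark's NaN ordering (NaN sorts above every other double).
spark.sql("SELECT double('NaN') > 1.0 AS nan_gt_one").show()

// stddev over a single sample should be NaN, one way stdev/mean can become NaN.
spark.sql("SELECT stddev_samp(x) AS s FROM VALUES (184.0) AS t(x)").show()
{code}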
[jira] [Commented] (SPARK-33098) Exception when using 'in' to compare a partition column to a literal with the wrong type
[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210689#comment-17210689 ] Peter Toth commented on SPARK-33098: I've started to look into this issue. > Exception when using 'in' to compare a partition column to a literal with the > wrong type > > > Key: SPARK-33098 > URL: https://issues.apache.org/jira/browse/SPARK-33098 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Priority: Major > > Comparing a partition column against a literal with the wrong type works if > you use equality ('='). However, if you use 'in', you get: > {noformat} > MetaException(message:Filtering is supported only on partition keys of type > string) > {noformat} > For example: > {noformat} > spark-sql> create table test (a int) partitioned by (b int) stored as parquet; > Time taken: 0.323 seconds > spark-sql> insert into test values (1, 1), (1, 2), (2, 2); > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test > 20/10/08 19:57:14 WARN log: Updated size to 418 > 20/10/08 19:57:14 WARN log: Updated size to 836 > Time taken: 2.124 seconds > spark-sql> -- this works, of course > spark-sql> select * from test where b in (2); > 1 2 > 2 2 > Time taken: 0.13 seconds, Fetched 2 row(s) > spark-sql> -- this also works (equals with wrong type) > spark-sql> select * from test where b = '2'; > 1 2 > 2 2 > Time taken: 0.132 seconds, Fetched 2 row(s) > spark-sql> -- this does not work ('in' with wrong type) > spark-sql> select * from test where b in ('2'); > 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b > in ('2')] > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > - > Caused by: MetaException(message:Filtering is supported only on partition > keys of type string) > {noformat} > There are also interesting variations of this using the dataframe API: > {noformat} > scala> sql("select cast(b as string) as b from test where b in > (2)").show(false) > +---+ > |b | > +---+ > |2 | > |2 | > +---+ > scala> sql("select cast(b as string) as b from test").filter("b in > (2)").show(false) > java.lang.RuntimeException: Caught Hive MetaException attempting to get > partition metadata by filter from Hive. You can set the Spark configuration > setting spark.sql.hive.manageFilesourcePartitions to false to work around > this problem, however this will result in degraded performance. Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828) > - > - > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is > supported only on partition keys of type string > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33100) Support parse the sql statements with c-style comments
[ https://issues.apache.org/jira/browse/SPARK-33100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang updated SPARK-33100: Description: Currently, spark-sql cannot parse SQL statements that contain C-style comments. The SQL statements: {code:java} /* SELECT 'test'; */ SELECT 'test'; {code} would be split into two statements: The first: "/* SELECT 'test'" The second: "*/ SELECT 'test'" Then it would throw an exception because the first one is illegal. was: Now the spark-sql does not support parse the sql statements with c-style coments. For example: For the sql statements: {code:java} /* SELECT 'test'; */ SELECT 'test'; {code} Would be split to two statements: The first: "/* SELECT 'test'" The second: "*/ SELECT 'test'" Then it would throw an exception because the first one is illegal. > Support parse the sql statements with c-style comments > -- > > Key: SPARK-33100 > URL: https://issues.apache.org/jira/browse/SPARK-33100 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: feiwang >Assignee: Apache Spark >Priority: Minor > > Currently, spark-sql cannot parse SQL statements that contain C-style > comments. > The SQL statements: > {code:java} > /* SELECT 'test'; */ > SELECT 'test'; > {code} > would be split into two statements: > The first: "/* SELECT 'test'" > The second: "*/ SELECT 'test'" > Then it would throw an exception because the first one is illegal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
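An illustrative sketch of the behaviour the description asks for, not the actual spark-sql CLI patch: split on ';' only outside /* ... */ comments (string literals and -- line comments are ignored here for brevity), so the example statement above stays in one piece.
{code:scala}
def splitStatements(sql: String): Seq[String] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[String]
  val cur = new StringBuilder
  var i = 0
  var inComment = false
  while (i < sql.length) {
    if (!inComment && sql.startsWith("/*", i)) { inComment = true; cur.append("/*"); i += 2 }
    else if (inComment && sql.startsWith("*/", i)) { inComment = false; cur.append("*/"); i += 2 }
    else if (!inComment && sql.charAt(i) == ';') { out += cur.toString.trim; cur.clear(); i += 1 }
    else { cur.append(sql.charAt(i)); i += 1 }
  }
  if (cur.toString.trim.nonEmpty) out += cur.toString.trim
  out.toSeq
}

splitStatements("/* SELECT 'test'; */ SELECT 'test';")
// returns Seq("/* SELECT 'test'; */ SELECT 'test'"), i.e. one statement rather than two
{code}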