[jira] [Commented] (SPARK-17110) Pyspark with locality ANY throw java.io.StreamCorruptedException
[ https://issues.apache.org/jira/browse/SPARK-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440786#comment-15440786 ]

Miao Wang commented on SPARK-17110:
-----------------------------------

I set up a two-node cluster (one master, one worker) with 48 cores and 1G of memory. Running the code above in pyspark works fine; no exception is thrown. It seems this bug has been fixed in the latest master branch. Can you upgrade and try again?

> Pyspark with locality ANY throw java.io.StreamCorruptedException
> ----------------------------------------------------------------
>
>                 Key: SPARK-17110
>                 URL: https://issues.apache.org/jira/browse/SPARK-17110
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: Cluster of 2 AWS r3.xlarge nodes launched via ec2 scripts, Spark 2.0.0, hadoop: yarn, pyspark shell
>            Reporter: Tomer Kaftan
>            Priority: Critical
>
> In Pyspark 2.0.0, any task that accesses cached data non-locally throws a StreamCorruptedException like the stacktrace below:
> {noformat}
> WARN TaskSetManager: Lost task 7.0 in stage 2.0 (TID 26, 172.31.26.184): java.io.StreamCorruptedException: invalid stream header: 12010A80
>         at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:807)
>         at java.io.ObjectInputStream.<init>(ObjectInputStream.java:302)
>         at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63)
>         at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63)
>         at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122)
>         at org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:146)
>         at org.apache.spark.storage.BlockManager$$anonfun$getRemoteValues$1.apply(BlockManager.scala:524)
>         at org.apache.spark.storage.BlockManager$$anonfun$getRemoteValues$1.apply(BlockManager.scala:522)
>         at scala.Option.map(Option.scala:146)
>         at org.apache.spark.storage.BlockManager.getRemoteValues(BlockManager.scala:522)
>         at org.apache.spark.storage.BlockManager.get(BlockManager.scala:609)
>         at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:661)
>         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
>         at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>         at org.apache.spark.scheduler.Task.run(Task.scala:85)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The simplest way I have found to reproduce this is by running the following code in the pyspark shell, on a cluster of 2 nodes set to use only one worker core each:
> {code}
> x = sc.parallelize([1, 1, 1, 1, 1, 1000, 1, 1, 1], numSlices=9).cache()
> x.count()
> import time
> def waitMap(x):
>     time.sleep(x)
>     return x
> x.map(waitMap).count()
> {code}
> Or by running the following via spark-submit:
> {code}
> from pyspark import SparkContext
> sc = SparkContext()
> x = sc.parallelize([1, 1, 1, 1, 1, 1000, 1, 1, 1], numSlices=9).cache()
> x.count()
> import time
> def waitMap(x):
>     time.sleep(x)
>     return x
> x.map(waitMap).count()
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
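The "invalid stream header" in the trace above means the bytes handed to Java's ObjectInputStream were never produced by Java serialization: a Java serialization stream must begin with the magic 0xACED followed by version 0x0005, while the remotely fetched cached block here began with 12010A80 (i.e., the block was written in a different format than the JavaSerializer used on the read path). As a rough illustration only (plain Python, not Spark or JDK source), the header check that readStreamHeader performs looks like this:

```python
# Illustration (not Spark/JDK source): the header check ObjectInputStream
# performs. Java serialization streams begin with magic 0xACED and version
# 0x0005; the fetched cached block in this report began with 0x12010A80,
# hence the StreamCorruptedException.
import struct

JAVA_STREAM_MAGIC = 0xACED
JAVA_STREAM_VERSION = 0x0005

def check_stream_header(data):
    """Raise if data does not start with a Java serialization header."""
    magic, version = struct.unpack(">HH", data[:4])  # two big-endian uint16s
    if magic != JAVA_STREAM_MAGIC or version != JAVA_STREAM_VERSION:
        raise ValueError("invalid stream header: %02X%02X%02X%02X" % tuple(data[:4]))

check_stream_header(bytes.fromhex("ACED0005"))       # valid header: passes silently
try:
    check_stream_header(bytes.fromhex("12010A80"))   # the header from this bug report
except ValueError as e:
    print(e)  # invalid stream header: 12010A80
```

This also explains why the bug only shows with locality ANY: the mismatch is on the remote-fetch path (BlockManager.getRemoteValues), so a task reading its cached partition locally never hits the Java deserializer.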
[jira] [Updated] (SPARK-17276) Stop environment parameters flooding Jenkins build output
[ https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xin Ren updated SPARK-17276:
----------------------------
    Attachment: Screen Shot 2016-08-26 at 10.52.07 PM.png

> Stop environment parameters flooding Jenkins build output
> ---------------------------------------------------------
>
>                 Key: SPARK-17276
>                 URL: https://issues.apache.org/jira/browse/SPARK-17276
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Tests
>    Affects Versions: 2.0.0
>            Reporter: Xin Ren
>            Priority: Minor
>         Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png
>
> When I was trying to find an error message in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.
[jira] [Commented] (SPARK-17276) Stop environment parameters flooding Jenkins build output
[ https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440782#comment-15440782 ]

Xin Ren commented on SPARK-17276:
---------------------------------

I'm working on it.

> Stop environment parameters flooding Jenkins build output
> ---------------------------------------------------------
>
>                 Key: SPARK-17276
>                 URL: https://issues.apache.org/jira/browse/SPARK-17276
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Tests
>    Affects Versions: 2.0.0
>            Reporter: Xin Ren
>            Priority: Minor
>         Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png
>
> When I was trying to find an error message in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.
[jira] [Created] (SPARK-17276) Stop environment parameters flooding Jenkins build output
Xin Ren created SPARK-17276:
-------------------------------

             Summary: Stop environment parameters flooding Jenkins build output
                 Key: SPARK-17276
                 URL: https://issues.apache.org/jira/browse/SPARK-17276
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, Tests
    Affects Versions: 2.0.0
            Reporter: Xin Ren
            Priority: Minor


When I was trying to find an error message in a failed Jenkins build job, I was annoyed by the huge env output. The env parameter output should be muted.
{code}
[info] PipedRDDSuite:
[info] - basic pipe (51 milliseconds)
0 0 0
[info] - basic pipe with tokenization (60 milliseconds)
[info] - failure in iterating over pipe input (49 milliseconds)
[info] - advanced pipe (100 milliseconds)
[info] - pipe with empty partition (117 milliseconds)
PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
BUILD_CAUSE_GHPRBCAUSE=true
SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver
HUDSON_HOME=/var/lib/jenkins
AWS_SECRET_ACCESS_KEY=
JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
LINES=24
CURRENT_BLOCK=18
ANDROID_HOME=/home/android-sdk/
ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
ghprbSourceBranch=codeWalkThroughML
GITHUB_OAUTH_KEY=
MAIL=/var/mail/jenkins
AMPLAB_JENKINS=1
JENKINS_SERVER_COOKIE=472906e9832aeb79
ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
LOGNAME=jenkins
PWD=/home/jenkins/workspace/SparkPullRequestBuilder
JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
ROOT_BUILD_CAUSE_GHPRBCAUSE=true
ghprbActualCommitAuthorEmail=iamsh...@126.com
ghprbTargetBranch=master
BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
SHELL=/bin/bash
ROOT_BUILD_CAUSE=GHPRBCAUSE
SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
JENKINS_HOME=/var/lib/jenkins
sha1=origin/pr/14836/merge
ghprbPullDescription=GitHub pull request #14836 of commit 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
NODE_NAME=amp-jenkins-worker-02
BUILD_DISPLAY_NAME=#64504
JAVA_7_HOME=/usr/java/jdk1.7.0_79
GIT_BRANCH=codeWalkThroughML
SHLVL=3
AMP_JENKINS_PRB=true
JAVA_HOME=/usr/java/jdk1.8.0_60
JENKINS_MASTER_HOSTNAME=amp-jenkins-master
BUILD_ID=64504
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
JOB_NAME=SparkPullRequestBuilder
BUILD_CAUSE=GHPRBCAUSE
SPARK_SCALA_VERSION=2.11
AWS_ACCESS_KEY_ID=
NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
SPARK_PREPEND_CLASSES=1
COLUMNS=80
WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
SPARK_TESTING=1
_=/usr/java/jdk1.8.0_60/bin/java
GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
ghprbPullId=14836
EXECUTOR_NUMBER=9
SSH_CLIENT=192.168.10.10 44762 22
HUDSON_SERVER_COOKIE=472906e9832aeb79
cat: nonexistent_file: No such file or directory
cat: nonexistent_file: No such file or directory
{code}
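The requested behavior (muting the env dump down to the handful of variables a CI log actually needs) can be sketched language-neutrally. Spark's test harness is Scala, so the Python snippet below and its `INTERESTING` whitelist are purely illustrative, not the actual fix:

```python
# Illustrative sketch (not Spark's fix): filter an environment dump so a test
# log shows only a whitelisted subset instead of flooding the build output.

# Hypothetical whitelist: variables worth showing in CI logs.
INTERESTING = {"JAVA_HOME", "SPARK_SCALA_VERSION", "SPARK_TESTING"}

def filtered_env(env):
    """Keep only whitelisted variables; everything else is muted."""
    return {k: v for k, v in env.items() if k in INTERESTING}

sample = {
    "JAVA_HOME": "/usr/java/jdk1.8.0_60",
    "AWS_SECRET_ACCESS_KEY": "topsecret",   # secrets never belong in build logs
    "PATH": "/usr/local/bin:/bin:/usr/bin",
    "SPARK_TESTING": "1",
}
print(filtered_env(sample))
# {'JAVA_HOME': '/usr/java/jdk1.8.0_60', 'SPARK_TESTING': '1'}
```

A whitelist rather than a blacklist is the safer default here: it mutes noise and, as a side effect, keeps credentials such as AWS_SECRET_ACCESS_KEY out of public Jenkins logs.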
[jira] [Commented] (SPARK-17274) Move join optimizer rules into a separate file
[ https://issues.apache.org/jira/browse/SPARK-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440750#comment-15440750 ]

Apache Spark commented on SPARK-17274:
--------------------------------------

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14846

> Move join optimizer rules into a separate file
> ----------------------------------------------
>
>                 Key: SPARK-17274
>                 URL: https://issues.apache.org/jira/browse/SPARK-17274
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
[jira] [Assigned] (SPARK-17274) Move join optimizer rules into a separate file
[ https://issues.apache.org/jira/browse/SPARK-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17274:
------------------------------------

    Assignee: Reynold Xin  (was: Apache Spark)

> Move join optimizer rules into a separate file
> ----------------------------------------------
>
>                 Key: SPARK-17274
>                 URL: https://issues.apache.org/jira/browse/SPARK-17274
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
[jira] [Assigned] (SPARK-17274) Move join optimizer rules into a separate file
[ https://issues.apache.org/jira/browse/SPARK-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17274:
------------------------------------

    Assignee: Apache Spark  (was: Reynold Xin)

> Move join optimizer rules into a separate file
> ----------------------------------------------
>
>                 Key: SPARK-17274
>                 URL: https://issues.apache.org/jira/browse/SPARK-17274
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Apache Spark
>
[jira] [Commented] (SPARK-17275) Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning
[ https://issues.apache.org/jira/browse/SPARK-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440747#comment-15440747 ]

Yin Huai commented on SPARK-17275:
----------------------------------

cc [~felixcheung] [~shivaram]

> Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17275
>                 URL: https://issues.apache.org/jira/browse/SPARK-17275
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Yin Huai
>
[jira] [Created] (SPARK-17275) Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning
Yin Huai created SPARK-17275:
--------------------------------

             Summary: Flaky test: org.apache.spark.deploy.RPackageUtilsSuite.jars that don't exist are skipped and print warning
                 Key: SPARK-17275
                 URL: https://issues.apache.org/jira/browse/SPARK-17275
             Project: Spark
          Issue Type: Bug
          Components: SparkR
            Reporter: Yin Huai


https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1623/testReport/junit/org.apache.spark.deploy/RPackageUtilsSuite/jars_that_don_t_exist_are_skipped_and_print_warning/

{code}
Error Message

java.io.IOException: Unable to delete directory /home/jenkins/.ivy2/cache/a/mylib.

Stacktrace

sbt.ForkMain$ForkError: java.io.IOException: Unable to delete directory /home/jenkins/.ivy2/cache/a/mylib.
        at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1541)
        at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
        at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
        at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
        at org.apache.spark.deploy.IvyTestUtils$.purgeLocalIvyCache(IvyTestUtils.scala:394)
        at org.apache.spark.deploy.IvyTestUtils$.withRepository(IvyTestUtils.scala:384)
        at org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply$mcV$sp(RPackageUtilsSuite.scala:103)
        at org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply(RPackageUtilsSuite.scala:100)
        at org.apache.spark.deploy.RPackageUtilsSuite$$anonfun$3.apply(RPackageUtilsSuite.scala:100)
        at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
        at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
        at org.scalatest.Transformer.apply(Transformer.scala:22)
        at org.scalatest.Transformer.apply(Transformer.scala:20)
        at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
        at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
        at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
        at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
        at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
        at org.apache.spark.deploy.RPackageUtilsSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(RPackageUtilsSuite.scala:38)
        at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
        at org.apache.spark.deploy.RPackageUtilsSuite.runTest(RPackageUtilsSuite.scala:38)
        at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
        at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
        at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
        at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
        at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
        at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
        at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
        at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
        at org.scalatest.Suite$class.run(Suite.scala:1424)
        at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
        at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
        at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
        at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
        at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
        at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
        at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
        at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
        at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
        at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
        at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:502)
        at sbt.ForkMain$Run$2.call(ForkMain.java:296)
        at sbt.ForkMain$Run$2.call(ForkMain.java:286)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
{code}
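The failure above is a cleanup race: FileUtils.deleteDirectory gives up with an IOException if the Ivy cache directory changes underneath it mid-delete, which makes the test flaky. A common mitigation for this kind of test-teardown flakiness is to retry the recursive delete with a short backoff. The sketch below is an illustration only, not the fix Spark adopted (Spark's suite is Scala, and the retry counts here are arbitrary):

```python
# Sketch of a retrying recursive delete, a common mitigation for flaky test
# cleanup like the IvyTestUtils.purgeLocalIvyCache failure above.
import shutil
import tempfile
import time
from pathlib import Path

def rmtree_with_retries(path, attempts=3, delay=0.1):
    """Remove a directory tree, retrying transient OSErrors with a backoff."""
    for i in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True          # someone else already deleted it: goal achieved
        except OSError:
            if i == attempts - 1:
                raise            # out of retries: surface the real error
            time.sleep(delay)    # brief backoff before trying again

# Usage: build a small Ivy-cache-like tree, then purge it.
root = Path(tempfile.mkdtemp()) / "ivy-cache" / "a" / "mylib"
root.mkdir(parents=True)
(root / "mylib.jar").write_bytes(b"\x00")
assert rmtree_with_retries(root.parent.parent) is True
```

Treating FileNotFoundError as success matters here: if a concurrent process finishes the deletion first, the cache is gone either way, which is all the teardown cares about.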
[jira] [Created] (SPARK-17274) Move join optimizer rules into a separate file
Reynold Xin created SPARK-17274:
-----------------------------------

             Summary: Move join optimizer rules into a separate file
                 Key: SPARK-17274
                 URL: https://issues.apache.org/jira/browse/SPARK-17274
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin
            Assignee: Reynold Xin
[jira] [Assigned] (SPARK-17273) Move expression optimizer rules into a separate file
[ https://issues.apache.org/jira/browse/SPARK-17273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17273:
------------------------------------

    Assignee: Reynold Xin  (was: Apache Spark)

> Move expression optimizer rules into a separate file
> ----------------------------------------------------
>
>                 Key: SPARK-17273
>                 URL: https://issues.apache.org/jira/browse/SPARK-17273
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
[jira] [Commented] (SPARK-17273) Move expression optimizer rules into a separate file
[ https://issues.apache.org/jira/browse/SPARK-17273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440742#comment-15440742 ]

Apache Spark commented on SPARK-17273:
--------------------------------------

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14845

> Move expression optimizer rules into a separate file
> ----------------------------------------------------
>
>                 Key: SPARK-17273
>                 URL: https://issues.apache.org/jira/browse/SPARK-17273
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
[jira] [Assigned] (SPARK-17273) Move expression optimizer rules into a separate file
[ https://issues.apache.org/jira/browse/SPARK-17273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17273:
------------------------------------

    Assignee: Apache Spark  (was: Reynold Xin)

> Move expression optimizer rules into a separate file
> ----------------------------------------------------
>
>                 Key: SPARK-17273
>                 URL: https://issues.apache.org/jira/browse/SPARK-17273
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Apache Spark
>
[jira] [Assigned] (SPARK-17272) Move subquery optimizer rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17272:
------------------------------------

    Assignee: Apache Spark  (was: Reynold Xin)

> Move subquery optimizer rules into its own file
> -----------------------------------------------
>
>                 Key: SPARK-17272
>                 URL: https://issues.apache.org/jira/browse/SPARK-17272
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Apache Spark
>
[jira] [Assigned] (SPARK-17272) Move subquery optimizer rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17272:
------------------------------------

    Assignee: Reynold Xin  (was: Apache Spark)

> Move subquery optimizer rules into its own file
> -----------------------------------------------
>
>                 Key: SPARK-17272
>                 URL: https://issues.apache.org/jira/browse/SPARK-17272
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
[jira] [Created] (SPARK-17273) Move expression optimizer rules into a separate file
Reynold Xin created SPARK-17273:
-----------------------------------

             Summary: Move expression optimizer rules into a separate file
                 Key: SPARK-17273
                 URL: https://issues.apache.org/jira/browse/SPARK-17273
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin
            Assignee: Reynold Xin
[jira] [Commented] (SPARK-17272) Move subquery optimizer rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440733#comment-15440733 ]

Apache Spark commented on SPARK-17272:
--------------------------------------

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14844

> Move subquery optimizer rules into its own file
> -----------------------------------------------
>
>                 Key: SPARK-17272
>                 URL: https://issues.apache.org/jira/browse/SPARK-17272
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
[jira] [Created] (SPARK-17272) Move subquery optimizer rules into its own file
Reynold Xin created SPARK-17272:
-----------------------------------

             Summary: Move subquery optimizer rules into its own file
                 Key: SPARK-17272
                 URL: https://issues.apache.org/jira/browse/SPARK-17272
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin
            Assignee: Reynold Xin
[jira] [Commented] (SPARK-17270) Move object optimization rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440725#comment-15440725 ] Apache Spark commented on SPARK-17270: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/14843 > Move object optimization rules into its own file > > > Key: SPARK-17270 > URL: https://issues.apache.org/jira/browse/SPARK-17270 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17269) Move finish analysis stage into its own file
[ https://issues.apache.org/jira/browse/SPARK-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-17269. - Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 > Move finish analysis stage into its own file > > > Key: SPARK-17269 > URL: https://issues.apache.org/jira/browse/SPARK-17269 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.1, 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17270) Move object optimization rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-17270. -- Resolution: Fixed Fix Version/s: 2.1.0 https://github.com/apache/spark/pull/14839 has been merged to master. > Move object optimization rules into its own file > > > Key: SPARK-17270 > URL: https://issues.apache.org/jira/browse/SPARK-17270 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10747) add support for NULLS FIRST|LAST in ORDER BY clause
[ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Wu updated SPARK-10747: --- Summary: add support for NULLS FIRST|LAST in ORDER BY clause (was: add support for window specification to include how NULLS are ordered) > add support for NULLS FIRST|LAST in ORDER BY clause > --- > > Key: SPARK-10747 > URL: https://issues.apache.org/jira/browse/SPARK-10747 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.0 >Reporter: N Campbell > > You cannot express how NULLS are to be sorted in the window order > specification and have to use a compensating expression to simulate. > Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' > near 'nulls' > line 1:82 missing EOF at 'last' near 'nulls'; > SQLState: null > Same limitation as Hive reported in Apache JIRA HIVE-9535 ) > This fails > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc > nulls last) from tolap > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when > c3 is null then 1 else 0 end) from tolap -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
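The `CASE WHEN` workaround quoted in the issue above can be mirrored in plain Scala to show the ordering that `ORDER BY c3 DESC NULLS LAST` is meant to produce. This is a standalone illustrative sketch, not Spark code:

```scala
// Emulate ORDER BY c3 DESC NULLS LAST over an Option column: nulls are
// mapped to a sort key that places them after every non-null value,
// which is exactly what the CASE WHEN ... THEN 1 ELSE 0 END workaround does.
val c3 = List(Some(3), None, Some(1), None, Some(2))

val descNullsLast: Ordering[Option[Int]] = Ordering.by {
  case None    => (1, 0)   // nulls sort last
  case Some(v) => (0, -v)  // non-nulls first, descending on the value
}

val sorted = c3.sorted(descNullsLast)
// non-null values in descending order, followed by the nulls
```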
[jira] [Updated] (SPARK-17268) Break Optimizer.scala apart
[ https://issues.apache.org/jira/browse/SPARK-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17268: Description: Optimizer.scala has become too large to maintain. We would need to break it apart into multiple files each of which contains rules that are logically relevant. We can create the following files for logical grouping: - finish analysis - joins - expressions - subquery - objects was: Optimizer.scala has become too large to maintain. We would need to break it apart into multiple files each of which contains rules that are logically relevant. > Break Optimizer.scala apart > --- > > Key: SPARK-17268 > URL: https://issues.apache.org/jira/browse/SPARK-17268 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > Optimizer.scala has become too large to maintain. We would need to break it > apart into multiple files each of which contains rules that are logically > relevant. > We can create the following files for logical grouping: > - finish analysis > - joins > - expressions > - subquery > - objects -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10747) add support for window specification to include how NULLS are ordered
[ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440656#comment-15440656 ] Xin Wu commented on SPARK-10747: This JIRA may be changed to support NULLS FIRST|LAST feature in ORDER BY clause. > add support for window specification to include how NULLS are ordered > - > > Key: SPARK-10747 > URL: https://issues.apache.org/jira/browse/SPARK-10747 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.0 >Reporter: N Campbell > > You cannot express how NULLS are to be sorted in the window order > specification and have to use a compensating expression to simulate. > Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' > near 'nulls' > line 1:82 missing EOF at 'last' near 'nulls'; > SQLState: null > Same limitation as Hive reported in Apache JIRA HIVE-9535 ) > This fails > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc > nulls last) from tolap > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when > c3 is null then 1 else 0 end) from tolap -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10747) add support for window specification to include how NULLS are ordered
[ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10747: Assignee: Apache Spark > add support for window specification to include how NULLS are ordered > - > > Key: SPARK-10747 > URL: https://issues.apache.org/jira/browse/SPARK-10747 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.0 >Reporter: N Campbell >Assignee: Apache Spark > > You cannot express how NULLS are to be sorted in the window order > specification and have to use a compensating expression to simulate. > Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' > near 'nulls' > line 1:82 missing EOF at 'last' near 'nulls'; > SQLState: null > Same limitation as Hive reported in Apache JIRA HIVE-9535 ) > This fails > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc > nulls last) from tolap > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when > c3 is null then 1 else 0 end) from tolap -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10747) add support for window specification to include how NULLS are ordered
[ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440651#comment-15440651 ] Apache Spark commented on SPARK-10747: -- User 'xwu0226' has created a pull request for this issue: https://github.com/apache/spark/pull/14842 > add support for window specification to include how NULLS are ordered > - > > Key: SPARK-10747 > URL: https://issues.apache.org/jira/browse/SPARK-10747 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0 >Reporter: N Campbell > > You cannot express how NULLS are to be sorted in the window order > specification and have to use a compensating expression to simulate. > Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' > near 'nulls' > line 1:82 missing EOF at 'last' near 'nulls'; > SQLState: null > Same limitation as Hive reported in Apache JIRA HIVE-9535 ) > This fails > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc > nulls last) from tolap > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when > c3 is null then 1 else 0 end) from tolap -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10747) add support for window specification to include how NULLS are ordered
[ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10747: Assignee: (was: Apache Spark) > add support for window specification to include how NULLS are ordered > - > > Key: SPARK-10747 > URL: https://issues.apache.org/jira/browse/SPARK-10747 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.0 >Reporter: N Campbell > > You cannot express how NULLS are to be sorted in the window order > specification and have to use a compensating expression to simulate. > Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' > near 'nulls' > line 1:82 missing EOF at 'last' near 'nulls'; > SQLState: null > Same limitation as Hive reported in Apache JIRA HIVE-9535 ) > This fails > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc > nulls last) from tolap > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when > c3 is null then 1 else 0 end) from tolap -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10747) add support for window specification to include how NULLS are ordered
[ https://issues.apache.org/jira/browse/SPARK-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin Wu updated SPARK-10747: --- Issue Type: New Feature (was: Improvement) > add support for window specification to include how NULLS are ordered > - > > Key: SPARK-10747 > URL: https://issues.apache.org/jira/browse/SPARK-10747 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.0 >Reporter: N Campbell > > You cannot express how NULLS are to be sorted in the window order > specification and have to use a compensating expression to simulate. > Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' > near 'nulls' > line 1:82 missing EOF at 'last' near 'nulls'; > SQLState: null > Same limitation as Hive reported in Apache JIRA HIVE-9535 ) > This fails > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc > nulls last) from tolap > select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when > c3 is null then 1 else 0 end) from tolap -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16963) Change Source API so that sources do not need to keep unbounded state
[ https://issues.apache.org/jira/browse/SPARK-16963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440598#comment-15440598 ] Frederick Reiss commented on SPARK-16963: - Updated the pull request to address some conflicting changes in the main branch and to address some minor review comments. Changed the name of `getMinOffset` to `lastCommittedOffset` per Prashant's comments. Changes are still ready for review. > Change Source API so that sources do not need to keep unbounded state > - > > Key: SPARK-16963 > URL: https://issues.apache.org/jira/browse/SPARK-16963 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Frederick Reiss > > The version of the Source API in Spark 2.0.0 defines a single getBatch() > method for fetching records from the source, with the following Scaladoc > comments defining the semantics: > {noformat} > /** > * Returns the data that is between the offsets (`start`, `end`]. When > `start` is `None` then > * the batch should begin with the first available record. This method must > always return the > * same data for a particular `start` and `end` pair. > */ > def getBatch(start: Option[Offset], end: Offset): DataFrame > {noformat} > These semantics mean that a Source must retain all past history for the > stream that it backs. Further, a Source is also required to retain this data > across restarts of the process where the Source is instantiated, even when > the Source is restarted on a different machine. > These restrictions make it difficult to implement the Source API, as any > implementation requires potentially unbounded amounts of distributed storage. > See the mailing list thread at > [http://apache-spark-developers-list.1001551.n3.nabble.com/Source-API-requires-unbounded-distributed-storage-td18551.html] > for more information. 
> This JIRA will cover augmenting the Source API with an additional callback > that will allow Structured Streaming scheduler to notify the source when it > is safe to discard buffered data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
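The shape of the proposed change above — `getBatch` served from still-buffered data, plus a callback letting the scheduler tell the source when buffered data may be dropped — can be sketched with a toy in-memory source. All names here are illustrative, not the actual Spark Source API:

```scala
import scala.collection.mutable

// Toy buffered source: records are retained only until the engine signals
// (via the proposed callback, here called `commit`) that an offset is
// durable, after which earlier data may be discarded.
class ToyBufferedSource[A] {
  private val buffer = mutable.TreeMap.empty[Long, A] // offset -> record
  private var nextOffset = 0L

  def append(record: A): Unit = { buffer(nextOffset) = record; nextOffset += 1 }

  // Data in (start, end], matching the Source API contract quoted above.
  def getBatch(start: Option[Long], end: Long): Seq[A] =
    buffer.range(start.map(_ + 1).getOrElse(0L), end + 1).values.toSeq

  // The proposed notification: everything up to `offset` may be forgotten,
  // so the source no longer needs unbounded storage.
  def commit(offset: Long): Unit =
    buffer.keys.takeWhile(_ <= offset).toList.foreach(buffer.remove)
}
```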
[jira] [Assigned] (SPARK-17271) Planner adds un-necessary Sort even if child ordering is semantically same as required ordering
[ https://issues.apache.org/jira/browse/SPARK-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17271: Assignee: Apache Spark > Planner adds un-necessary Sort even if child ordering is semantically same as > required ordering > --- > > Key: SPARK-17271 > URL: https://issues.apache.org/jira/browse/SPARK-17271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Tejas Patil >Assignee: Apache Spark > > Found a case when the planner is adding un-needed SORT operation due to bug > in the way comparison for `SortOrder` is done at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 > `SortOrder` needs to be compared semantically because `Expression` within two > `SortOrder` can be "semantically equal" but not literally equal objects. > eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` > Expression in required SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId, > qualifier = Some("a") > ) > {code} > Expression in child SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId) > {code} > Notice that the output column has a qualifier but the child attribute does > not but the inherent expression is the same and hence in this case we can say > that the child satisfies the required sort order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17271) Planner adds un-necessary Sort even if child ordering is semantically same as required ordering
[ https://issues.apache.org/jira/browse/SPARK-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17271: Assignee: (was: Apache Spark) > Planner adds un-necessary Sort even if child ordering is semantically same as > required ordering > --- > > Key: SPARK-17271 > URL: https://issues.apache.org/jira/browse/SPARK-17271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Tejas Patil > > Found a case when the planner is adding un-needed SORT operation due to bug > in the way comparison for `SortOrder` is done at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 > `SortOrder` needs to be compared semantically because `Expression` within two > `SortOrder` can be "semantically equal" but not literally equal objects. > eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` > Expression in required SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId, > qualifier = Some("a") > ) > {code} > Expression in child SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId) > {code} > Notice that the output column has a qualifier but the child attribute does > not but the inherent expression is the same and hence in this case we can say > that the child satisfies the required sort order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17271) Planner adds un-necessary Sort even if child ordering is semantically same as required ordering
[ https://issues.apache.org/jira/browse/SPARK-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440557#comment-15440557 ] Apache Spark commented on SPARK-17271: -- User 'tejasapatil' has created a pull request for this issue: https://github.com/apache/spark/pull/14841 > Planner adds un-necessary Sort even if child ordering is semantically same as > required ordering > --- > > Key: SPARK-17271 > URL: https://issues.apache.org/jira/browse/SPARK-17271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Tejas Patil > > Found a case when the planner is adding un-needed SORT operation due to bug > in the way comparison for `SortOrder` is done at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 > `SortOrder` needs to be compared semantically because `Expression` within two > `SortOrder` can be "semantically equal" but not literally equal objects. > eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` > Expression in required SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId, > qualifier = Some("a") > ) > {code} > Expression in child SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId) > {code} > Notice that the output column has a qualifier but the child attribute does > not but the inherent expression is the same and hence in this case we can say > that the child satisfies the required sort order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17271) Planner adds un-necessary Sort even if child ordering is semantically same as required ordering
[ https://issues.apache.org/jira/browse/SPARK-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17271: Description: Found a case when the planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 `SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects. eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` Expression in required SortOrder: {code} AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId, qualifier = Some("a") ) {code} Expression in child SortOrder: {code} AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId) {code} Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order. was: Found a case when the planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 `SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects. eg. 
In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` Expression in required SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId, qualifier = Some("a") ) ``` Expression in child SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId) ``` Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order. > Planner adds un-necessary Sort even if child ordering is semantically same as > required ordering > --- > > Key: SPARK-17271 > URL: https://issues.apache.org/jira/browse/SPARK-17271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Tejas Patil > > Found a case when the planner is adding un-needed SORT operation due to bug > in the way comparison for `SortOrder` is done at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 > `SortOrder` needs to be compared semantically because `Expression` within two > `SortOrder` can be "semantically equal" but not literally equal objects. > eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")` > Expression in required SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId, > qualifier = Some("a") > ) > {code} > Expression in child SortOrder: > {code} > AttributeReference( > name = "col1", > dataType = LongType, > nullable = false > ) (exprId = exprId) > {code} > Notice that the output column has a qualifier but the child attribute does > not but the inherent expression is the same and hence in this case we can say > that the child satisfies the required sort order. 
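The distinction the report draws — "semantically equal" but not literally equal objects — can be shown with a standalone sketch (illustrative types, not Spark's `AttributeReference`/`SortOrder`): two expression nodes that differ only in cosmetic metadata such as a qualifier should compare equal once that metadata is ignored.

```scala
// Minimal model: an attribute is identified by its exprId; the qualifier
// ("a" in the JOIN example) is cosmetic and must be ignored when deciding
// whether a child ordering already satisfies the required ordering.
case class Attr(name: String, exprId: Long, qualifier: Option[String]) {
  def semanticEquals(other: Attr): Boolean = exprId == other.exprId
}
case class Order(child: Attr, descending: Boolean) {
  def satisfiedBy(actual: Order): Boolean =
    descending == actual.descending && child.semanticEquals(actual.child)
}

val required = Order(Attr("col1", 42L, Some("a")), descending = false)
val actual   = Order(Attr("col1", 42L, None), descending = false)
// required != actual (case-class equality sees the qualifier), yet
// required.satisfiedBy(actual) holds, so no extra Sort should be planned
```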
[jira] [Commented] (SPARK-16216) CSV data source does not write date and timestamp correctly
[ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440546#comment-15440546 ] Apache Spark commented on SPARK-16216: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/14840 > CSV data source does not write date and timestamp correctly > --- > > Key: SPARK-16216 > URL: https://issues.apache.org/jira/browse/SPARK-16216 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Blocker > Labels: releasenotes > Fix For: 2.0.1, 2.1.0 > > > Currently, CSV data source write {{DateType}} and {{TimestampType}} as below: > {code} > ++ > |date| > ++ > |14406372| > |14144598| > |14540400| > ++ > {code} > It would be nicer if it write dates and timestamps as a formatted string just > like JSON data sources. > Also, CSV data source currently supports {{dateFormat}} option to read dates > and timestamps in a custom format. It might be better if this option can be > applied in writing as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
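What the issue above asks for — formatted strings instead of raw epoch numbers on write — amounts to applying a date pattern like the one `dateFormat` accepts. A standalone `java.time` sketch with a hypothetical epoch value, not Spark's CSV writer:

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// A timestamp stored internally as an epoch value prints as an opaque
// number; formatting it with a pattern yields the readable string the
// issue wants CSV output to contain. The epoch value here is hypothetical.
val epochSeconds = 1440637200L
val formatter = DateTimeFormatter
  .ofPattern("yyyy-MM-dd'T'HH:mm:ss")
  .withZone(ZoneOffset.UTC)
val formatted = formatter.format(Instant.ofEpochSecond(epochSeconds))
// "2015-08-27T01:00:00"
```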
[jira] [Resolved] (SPARK-17266) PrefixComparatorsSuite's "String prefix comparator" failed when both input strings are empty strings
[ https://issues.apache.org/jira/browse/SPARK-17266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-17266. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14837 [https://github.com/apache/spark/pull/14837] > PrefixComparatorsSuite's "String prefix comparator" failed when both input > strings are empty strings > > > Key: SPARK-17266 > URL: https://issues.apache.org/jira/browse/SPARK-17266 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai > Fix For: 2.1.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1620/testReport/junit/org.apache.spark.util.collection.unsafe.sort/PrefixComparatorsSuite/String_prefix_comparator/ > {code} > org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException: > TestFailedException was thrown during property evaluation. Message: 0 > equaled 0, but 1 did not equal 0, and 0 was not less than 0, and 0 was not > greater than 0 Location: (PrefixComparatorsSuite.scala:42) Occurred when > passed generated values ( arg0 = "", arg1 = "" ) > {code} > I could not reproduce it locally. But, let me add this case in the > regressionTests to explicitly test it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440505#comment-15440505 ] Sun Rui commented on SPARK-13525: - What's your spark cluster deployment mode? yarn or standalone? > SparkR: java.net.SocketTimeoutException: Accept timed out when running any > dataframe function > - > > Key: SPARK-13525 > URL: https://issues.apache.org/jira/browse/SPARK-13525 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Shubhanshu Mishra > Labels: sparkr > > I am following the code steps from this example: > https://spark.apache.org/docs/1.6.0/sparkr.html > There are multiple issues: > 1. The head and summary and filter methods are not overridden by spark. Hence > I need to call them using `SparkR::` namespace. > 2. When I try to execute the following, I get errors: > {code} > $> $R_HOME/bin/R > R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" > Copyright (C) 2015 The R Foundation for Statistical Computing > Platform: x86_64-pc-linux-gnu (64-bit) > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. > Type 'license()' or 'licence()' for distribution details. > Natural language support but running in an English locale > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. 
> Welcome at Fri Feb 26 16:19:35 2016 > Attaching package: 'SparkR' > The following objects are masked from 'package:base': > colnames, colnames<-, drop, intersect, rank, rbind, sample, subset, > summary, transform > Launching java with spark-submit command > /content/smishra8/SOFTWARE/spark/bin/spark-submit --driver-memory "50g" > sparkr-shell /tmp/RtmpfBQRg6/backend_portc3bc16f09b1b > > df <- createDataFrame(sqlContext, iris) > Warning messages: > 1: In FUN(X[[i]], ...) : > Use Sepal_Length instead of Sepal.Length as column name > 2: In FUN(X[[i]], ...) : > Use Sepal_Width instead of Sepal.Width as column name > 3: In FUN(X[[i]], ...) : > Use Petal_Length instead of Petal.Length as column name > 4: In FUN(X[[i]], ...) : > Use Petal_Width instead of Petal.Width as column name > > training <- filter(df, df$Species != "setosa") > Error in filter(df, df$Species != "setosa") : > no method for coercing this S4 class to a vector > > training <- SparkR::filter(df, df$Species != "setosa") > > model <- SparkR::glm(Species ~ Sepal_Length + Sepal_Width, data = training, > > family = "binomial") > 16/02/26 16:26:46 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.net.SocketTimeoutException: Accept timed out > at java.net.PlainSocketImpl.socketAccept(Native Method) > at > java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) > at java.net.ServerSocket.implAccept(ServerSocket.java:530) > at java.net.ServerSocket.accept(ServerSocket.java:498) > at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431) > at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at
[jira] [Updated] (SPARK-17266) PrefixComparatorsSuite's "String prefix comparator" failed when both input strings are empty strings
[ https://issues.apache.org/jira/browse/SPARK-17266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-17266: - Assignee: Yin Huai > PrefixComparatorsSuite's "String prefix comparator" failed when both input > strings are empty strings > > > Key: SPARK-17266 > URL: https://issues.apache.org/jira/browse/SPARK-17266 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai >Assignee: Yin Huai > Fix For: 2.1.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1620/testReport/junit/org.apache.spark.util.collection.unsafe.sort/PrefixComparatorsSuite/String_prefix_comparator/ > {code} > org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException: > TestFailedException was thrown during property evaluation. Message: 0 > equaled 0, but 1 did not equal 0, and 0 was not less than 0, and 0 was not > greater than 0 Location: (PrefixComparatorsSuite.scala:42) Occurred when > passed generated values ( arg0 = "", arg1 = "" ) > {code} > I could not reproduce it locally. But, let me add this case in the > regressionTests to explicitly test it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17271) Planner adds un-necessary Sort even if child ordering is semantically same as required ordering
Tejas Patil created SPARK-17271: --- Summary: Planner adds un-necessary Sort even if child ordering is semantically same as required ordering Key: SPARK-17271 URL: https://issues.apache.org/jira/browse/SPARK-17271 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0, 1.6.2 Reporter: Tejas Patil Found a case where the planner adds an unneeded SORT operation due to a bug in the way `SortOrder` comparison is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253 `SortOrder` needs to be compared semantically, because the `Expression`s within two `SortOrder`s can be "semantically equal" without being literally equal objects. E.g. for `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`: Expression in required SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId, qualifier = Some("a") ) ``` Expression in child SortOrder: ``` AttributeReference( name = "col1", dataType = LongType, nullable = false ) (exprId = exprId) ``` Notice that the required column has a qualifier but the child attribute does not; yet the underlying expression is the same, so in this case the child satisfies the required sort order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
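The qualifier-only difference described above can be sketched with a toy Python model (field names mirror the report; this is an illustration, not Spark's Catalyst implementation, whose semantic-equality check covers more cases):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AttributeReference:
    """Toy model of Catalyst's AttributeReference (hypothetical, not Spark's class)."""
    name: str
    data_type: str
    nullable: bool
    expr_id: int
    qualifier: Optional[str] = None

    def semantically_equal(self, other: "AttributeReference") -> bool:
        # The qualifier is cosmetic: "a.col1" and "col1" with the same exprId
        # refer to the same attribute, so ignore it when comparing.
        return (self.name, self.data_type, self.nullable, self.expr_id) == \
               (other.name, other.data_type, other.nullable, other.expr_id)

required = AttributeReference("col1", "LongType", False, expr_id=7, qualifier="a")
child    = AttributeReference("col1", "LongType", False, expr_id=7)

assert required != child                   # literal object equality fails...
assert required.semantically_equal(child)  # ...but they are the same attribute
```

A literal comparison of `SortOrder` objects sees the qualifier and adds the extra Sort; a semantic comparison would recognize the child ordering as sufficient.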
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440476#comment-15440476 ] Sun Rui commented on SPARK-13525: - Another guess: could you check "localhost" works for local TCP connection? > SparkR: java.net.SocketTimeoutException: Accept timed out when running any > dataframe function > - > > Key: SPARK-13525 > URL: https://issues.apache.org/jira/browse/SPARK-13525 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Shubhanshu Mishra > Labels: sparkr > > I am following the code steps from this example: > https://spark.apache.org/docs/1.6.0/sparkr.html > There are multiple issues: > 1. The head and summary and filter methods are not overridden by spark. Hence > I need to call them using `SparkR::` namespace. > 2. When I try to execute the following, I get errors: > {code} > $> $R_HOME/bin/R > R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" > Copyright (C) 2015 The R Foundation for Statistical Computing > Platform: x86_64-pc-linux-gnu (64-bit) > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. > Type 'license()' or 'licence()' for distribution details. > Natural language support but running in an English locale > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. 
> Welcome at Fri Feb 26 16:19:35 2016 > Attaching package: 'SparkR' > The following objects are masked from 'package:base': > colnames, colnames<-, drop, intersect, rank, rbind, sample, subset, > summary, transform > Launching java with spark-submit command > /content/smishra8/SOFTWARE/spark/bin/spark-submit --driver-memory "50g" > sparkr-shell /tmp/RtmpfBQRg6/backend_portc3bc16f09b1b > > df <- createDataFrame(sqlContext, iris) > Warning messages: > 1: In FUN(X[[i]], ...) : > Use Sepal_Length instead of Sepal.Length as column name > 2: In FUN(X[[i]], ...) : > Use Sepal_Width instead of Sepal.Width as column name > 3: In FUN(X[[i]], ...) : > Use Petal_Length instead of Petal.Length as column name > 4: In FUN(X[[i]], ...) : > Use Petal_Width instead of Petal.Width as column name > > training <- filter(df, df$Species != "setosa") > Error in filter(df, df$Species != "setosa") : > no method for coercing this S4 class to a vector > > training <- SparkR::filter(df, df$Species != "setosa") > > model <- SparkR::glm(Species ~ Sepal_Length + Sepal_Width, data = training, > > family = "binomial") > 16/02/26 16:26:46 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.net.SocketTimeoutException: Accept timed out > at java.net.PlainSocketImpl.socketAccept(Native Method) > at > java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) > at java.net.ServerSocket.implAccept(ServerSocket.java:530) > at java.net.ServerSocket.accept(ServerSocket.java:498) > at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431) > at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at
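Sun Rui's suggestion above, checking that "localhost" accepts local TCP connections, can be sanity-checked outside Spark with a short sketch (hypothetical helper; it only verifies that loopback name resolution and connect/accept work, which is the part the R worker handshake depends on):

```python
import socket

def localhost_tcp_works(timeout: float = 2.0) -> bool:
    """Bind an ephemeral loopback port, then connect to it as "localhost"."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    try:
        # If "localhost" does not resolve to the loopback interface,
        # this connect (or the accept) fails -- mirroring the symptom
        # of the Accept timed out error in the report.
        client = socket.create_connection(("localhost", port), timeout=timeout)
        conn, _ = server.accept()
        conn.close()
        client.close()
        return True
    except OSError:
        return False
    finally:
        server.close()

assert localhost_tcp_works()
```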
[jira] [Commented] (SPARK-17270) Move object optimization rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440368#comment-15440368 ] Apache Spark commented on SPARK-17270: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/14839 > Move object optimization rules into its own file > > > Key: SPARK-17270 > URL: https://issues.apache.org/jira/browse/SPARK-17270 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17270) Move object optimization rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17270: Assignee: Reynold Xin (was: Apache Spark) > Move object optimization rules into its own file > > > Key: SPARK-17270 > URL: https://issues.apache.org/jira/browse/SPARK-17270 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17270) Move object optimization rules into its own file
[ https://issues.apache.org/jira/browse/SPARK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17270: Assignee: Apache Spark (was: Reynold Xin) > Move object optimization rules into its own file > > > Key: SPARK-17270 > URL: https://issues.apache.org/jira/browse/SPARK-17270 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17270) Move object optimization rules into its own file
Reynold Xin created SPARK-17270: --- Summary: Move object optimization rules into its own file Key: SPARK-17270 URL: https://issues.apache.org/jira/browse/SPARK-17270 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17269) Move finish analysis stage into its own file
[ https://issues.apache.org/jira/browse/SPARK-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440359#comment-15440359 ] Apache Spark commented on SPARK-17269: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/14838 > Move finish analysis stage into its own file > > > Key: SPARK-17269 > URL: https://issues.apache.org/jira/browse/SPARK-17269 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17269) Move finish analysis stage into its own file
[ https://issues.apache.org/jira/browse/SPARK-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17269: Assignee: Reynold Xin (was: Apache Spark) > Move finish analysis stage into its own file > > > Key: SPARK-17269 > URL: https://issues.apache.org/jira/browse/SPARK-17269 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17269) Move finish analysis stage into its own file
[ https://issues.apache.org/jira/browse/SPARK-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17269: Assignee: Apache Spark (was: Reynold Xin) > Move finish analysis stage into its own file > > > Key: SPARK-17269 > URL: https://issues.apache.org/jira/browse/SPARK-17269 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17269) Move finish analysis stage into its own file
Reynold Xin created SPARK-17269: --- Summary: Move finish analysis stage into its own file Key: SPARK-17269 URL: https://issues.apache.org/jira/browse/SPARK-17269 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17268) Break Optimizer.scala apart
Reynold Xin created SPARK-17268: --- Summary: Break Optimizer.scala apart Key: SPARK-17268 URL: https://issues.apache.org/jira/browse/SPARK-17268 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Optimizer.scala has become too large to maintain. We would need to break it apart into multiple files each of which contains rules that are logically relevant. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17244) Joins should not pushdown non-deterministic conditions
[ https://issues.apache.org/jira/browse/SPARK-17244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-17244. -- Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull request 14815 [https://github.com/apache/spark/pull/14815] > Joins should not pushdown non-deterministic conditions > -- > > Key: SPARK-17244 > URL: https://issues.apache.org/jira/browse/SPARK-17244 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Sameer Agarwal > Fix For: 2.0.1, 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
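Why the pushdown is unsound can be illustrated with a seeded toy example (plain Python, not Catalyst): pushing a non-deterministic predicate below the join changes how many times it fires, so the two plans disagree even with the same seed:

```python
import random

# Toy data: one left row that joins to four right rows on key 1.
left = [1]
right = [1, 1, 1, 1]

def join(l, r):
    return [(a, b) for a in l for b in r if a == b]

def coin():
    # The non-deterministic predicate: rand() < 0.5
    return random.random() < 0.5

# Plan A: filter AFTER the join -- the predicate fires once per joined row.
random.seed(0)
after = [row for row in join(left, right) if coin()]

# Plan B: predicate pushed BELOW the join -- it fires once per left row.
random.seed(0)
pushed = join([a for a in left if coin()], right)

# Same seed, different plans, different answers: the rewrite is unsound.
assert len(after) != len(pushed)
```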
[jira] [Updated] (SPARK-17244) Joins should not pushdown non-deterministic conditions
[ https://issues.apache.org/jira/browse/SPARK-17244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-17244: - Assignee: Sameer Agarwal > Joins should not pushdown non-deterministic conditions > -- > > Key: SPARK-17244 > URL: https://issues.apache.org/jira/browse/SPARK-17244 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal > Fix For: 2.0.1, 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17163) Merge MLOR into a single LOR interface
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440183#comment-15440183 ] DB Tsai commented on SPARK-17163: - It relates to this [SPARK-17201], but seems that it's not a concern. > Merge MLOR into a single LOR interface > -- > > Key: SPARK-17163 > URL: https://issues.apache.org/jira/browse/SPARK-17163 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Reporter: Seth Hendrickson > > Before the 2.1 release, we should finalize the API for logistic regression. > After SPARK-7159, we have both LogisticRegression and > MultinomialLogisticRegression models. This may be confusing to users and, is > a bit superfluous since MLOR can do basically all of what BLOR does. We > should decide if it needs to be changed and implement those changes before 2.1 > *Update*: Seems we have decided to merge the two estimators. I changed the > title to reflect that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17266) PrefixComparatorsSuite's "String prefix comparator" failed when both input strings are empty strings
[ https://issues.apache.org/jira/browse/SPARK-17266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440146#comment-15440146 ] Apache Spark commented on SPARK-17266: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/14837 > PrefixComparatorsSuite's "String prefix comparator" failed when both input > strings are empty strings > > > Key: SPARK-17266 > URL: https://issues.apache.org/jira/browse/SPARK-17266 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1620/testReport/junit/org.apache.spark.util.collection.unsafe.sort/PrefixComparatorsSuite/String_prefix_comparator/ > {code} > org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException: > TestFailedException was thrown during property evaluation. Message: 0 > equaled 0, but 1 did not equal 0, and 0 was not less than 0, and 0 was not > greater than 0 Location: (PrefixComparatorsSuite.scala:42) Occurred when > passed generated values ( arg0 = "", arg1 = "" ) > {code} > I could not reproduce it locally. But, let me add this case in the > regressionTests to explicitly test it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17266) PrefixComparatorsSuite's "String prefix comparator" failed when both input strings are empty strings
[ https://issues.apache.org/jira/browse/SPARK-17266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17266: Assignee: (was: Apache Spark) > PrefixComparatorsSuite's "String prefix comparator" failed when both input > strings are empty strings > > > Key: SPARK-17266 > URL: https://issues.apache.org/jira/browse/SPARK-17266 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1620/testReport/junit/org.apache.spark.util.collection.unsafe.sort/PrefixComparatorsSuite/String_prefix_comparator/ > {code} > org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException: > TestFailedException was thrown during property evaluation. Message: 0 > equaled 0, but 1 did not equal 0, and 0 was not less than 0, and 0 was not > greater than 0 Location: (PrefixComparatorsSuite.scala:42) Occurred when > passed generated values ( arg0 = "", arg1 = "" ) > {code} > I could not reproduce it locally. But, let me add this case in the > regressionTests to explicitly test it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17266) PrefixComparatorsSuite's "String prefix comparator" failed when both input strings are empty strings
[ https://issues.apache.org/jira/browse/SPARK-17266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17266: Assignee: Apache Spark > PrefixComparatorsSuite's "String prefix comparator" failed when both input > strings are empty strings > > > Key: SPARK-17266 > URL: https://issues.apache.org/jira/browse/SPARK-17266 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Reporter: Yin Huai >Assignee: Apache Spark > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1620/testReport/junit/org.apache.spark.util.collection.unsafe.sort/PrefixComparatorsSuite/String_prefix_comparator/ > {code} > org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException: > TestFailedException was thrown during property evaluation. Message: 0 > equaled 0, but 1 did not equal 0, and 0 was not less than 0, and 0 was not > greater than 0 Location: (PrefixComparatorsSuite.scala:42) Occurred when > passed generated values ( arg0 = "", arg1 = "" ) > {code} > I could not reproduce it locally. But, let me add this case in the > regressionTests to explicitly test it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17163) Merge MLOR into a single LOR interface
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440142#comment-15440142 ] Joseph K. Bradley commented on SPARK-17163: --- I was guessing that optimization would be more likely to diverge and return blown-up coefficients when not pivoting with regParam=0 (more likely than when pivoting). A given training dataset could constrain the problem enough to make a well-defined optimal solution with regParam=0 and pivoting, but the same might not hold true when not pivoting. > Merge MLOR into a single LOR interface > -- > > Key: SPARK-17163 > URL: https://issues.apache.org/jira/browse/SPARK-17163 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Reporter: Seth Hendrickson > > Before the 2.1 release, we should finalize the API for logistic regression. > After SPARK-7159, we have both LogisticRegression and > MultinomialLogisticRegression models. This may be confusing to users and, is > a bit superfluous since MLOR can do basically all of what BLOR does. We > should decide if it needs to be changed and implement those changes before 2.1 > *Update*: Seems we have decided to merge the two estimators. I changed the > title to reflect that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17267) Long running structured streaming requirements
[ https://issues.apache.org/jira/browse/SPARK-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17267: Priority: Blocker (was: Major) > Long running structured streaming requirements > -- > > Key: SPARK-17267 > URL: https://issues.apache.org/jira/browse/SPARK-17267 > Project: Spark > Issue Type: Bug > Components: SQL, Streaming >Reporter: Reynold Xin >Priority: Blocker > > This is an umbrella ticket to track various things that are required in order > to have the engine for structured streaming run non-stop in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15698) Ability to remove old metadata for structure streaming MetadataLog
[ https://issues.apache.org/jira/browse/SPARK-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-15698: Issue Type: Sub-task (was: Improvement) Parent: SPARK-17267 > Ability to remove old metadata for structure streaming MetadataLog > -- > > Key: SPARK-15698 > URL: https://issues.apache.org/jira/browse/SPARK-15698 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Saisai Shao >Priority: Minor > > The current MetadataLog lacks the ability to remove old checkpoint files. We > should add this functionality to MetadataLog and honor it wherever > MetadataLog is used; that will reduce unnecessary small files in > long-running scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17235) MetadataLog should support purging old logs
[ https://issues.apache.org/jira/browse/SPARK-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17235: Issue Type: Sub-task (was: New Feature) Parent: SPARK-17267 > MetadataLog should support purging old logs > --- > > Key: SPARK-17235 > URL: https://issues.apache.org/jira/browse/SPARK-17235 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Peter Lee >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > This is a useful primitive operation to have to support checkpointing and > forgetting old logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17165) FileStreamSource should not track the list of seen files indefinitely
[ https://issues.apache.org/jira/browse/SPARK-17165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-17165: Issue Type: Sub-task (was: Bug) Parent: SPARK-17267 > FileStreamSource should not track the list of seen files indefinitely > - > > Key: SPARK-17165 > URL: https://issues.apache.org/jira/browse/SPARK-17165 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Reynold Xin >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > FileStreamSource currently tracks all the files seen indefinitely, which > means it can run out of memory or overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
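One way to bound the seen-files tracking, sketched below under the assumption of age-based eviction (class and method names are hypothetical; the actual fix may differ), is to forget entries older than a configurable horizon behind the newest timestamp:

```python
class SeenFilesMap:
    """Hedged sketch: track seen files, but forget entries older than max_age_ms."""

    def __init__(self, max_age_ms: int):
        self.max_age_ms = max_age_ms
        self.entries = {}   # path -> timestamp in ms
        self.latest = 0     # newest timestamp observed so far

    def add(self, path: str, timestamp_ms: int) -> None:
        self.entries[path] = timestamp_ms
        self.latest = max(self.latest, timestamp_ms)

    def is_new(self, path: str, timestamp_ms: int) -> bool:
        # Anything older than the purge horizon is treated as already seen,
        # so forgetting those entries cannot cause duplicate processing.
        if timestamp_ms < self.latest - self.max_age_ms:
            return False
        return path not in self.entries

    def purge(self) -> None:
        horizon = self.latest - self.max_age_ms
        self.entries = {p: t for p, t in self.entries.items() if t >= horizon}

m = SeenFilesMap(max_age_ms=1000)
m.add("a.txt", 100)
m.add("b.txt", 2000)
m.purge()
assert "a.txt" not in m.entries      # evicted: older than latest - max_age
assert not m.is_new("b.txt", 2000)   # still tracked
assert not m.is_new("a.txt", 100)    # too old to ever be considered new
```

The key invariant is that eviction and the "too old" cutoff use the same horizon, so memory stays bounded without reprocessing files.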
[jira] [Created] (SPARK-17267) Long running structured streaming requirements
Reynold Xin created SPARK-17267: --- Summary: Long running structured streaming requirements Key: SPARK-17267 URL: https://issues.apache.org/jira/browse/SPARK-17267 Project: Spark Issue Type: Bug Components: SQL, Streaming Reporter: Reynold Xin This is an umbrella ticket to track various things that are required in order to have the engine for structured streaming run non-stop in production. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17235) MetadataLog should support purging old logs
[ https://issues.apache.org/jira/browse/SPARK-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-17235. - Resolution: Fixed Assignee: Peter Lee Fix Version/s: 2.1.0 2.0.1 > MetadataLog should support purging old logs > --- > > Key: SPARK-17235 > URL: https://issues.apache.org/jira/browse/SPARK-17235 > Project: Spark > Issue Type: New Feature > Components: SQL, Streaming >Reporter: Peter Lee >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > This is a useful primitive operation to have to support checkpointing and > forgetting old logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
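The purge primitive resolved above can be sketched as a file-per-batch log (a hypothetical layout for illustration, not Spark's HDFSMetadataLog):

```python
import os
import tempfile

class FileMetadataLog:
    """Minimal sketch of a file-per-batch metadata log with purge()."""

    def __init__(self, path: str):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def add(self, batch_id: int, metadata: str) -> None:
        # One file per batch id, holding that batch's metadata.
        with open(os.path.join(self.path, str(batch_id)), "w") as f:
            f.write(metadata)

    def batch_ids(self):
        return sorted(int(name) for name in os.listdir(self.path))

    def purge(self, threshold_batch_id: int) -> None:
        # Drop every batch strictly older than the threshold.
        for bid in self.batch_ids():
            if bid < threshold_batch_id:
                os.remove(os.path.join(self.path, str(bid)))

log = FileMetadataLog(tempfile.mkdtemp())
for i in range(5):
    log.add(i, f"offsets-{i}")
log.purge(3)
assert log.batch_ids() == [3, 4]
```

With this primitive, a long-running query can checkpoint and then purge everything below the last committed batch instead of accumulating small files forever.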
[jira] [Created] (SPARK-17266) PrefixComparatorsSuite's "String prefix comparator" failed when both input strings are empty strings
Yin Huai created SPARK-17266: Summary: PrefixComparatorsSuite's "String prefix comparator" failed when both input strings are empty strings Key: SPARK-17266 URL: https://issues.apache.org/jira/browse/SPARK-17266 Project: Spark Issue Type: Bug Components: SQL, Tests Reporter: Yin Huai https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1620/testReport/junit/org.apache.spark.util.collection.unsafe.sort/PrefixComparatorsSuite/String_prefix_comparator/ {code} org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException: TestFailedException was thrown during property evaluation. Message: 0 equaled 0, but 1 did not equal 0, and 0 was not less than 0, and 0 was not greater than 0 Location: (PrefixComparatorsSuite.scala:42) Occurred when passed generated values ( arg0 = "", arg1 = "" ) {code} I could not reproduce it locally. But, let me add this case in the regressionTests to explicitly test it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440090#comment-15440090 ] Arihanth Jain commented on SPARK-13525: --- [~sunrui] I have tried "spark.sparkr.use.daemon" to false with no luck. Now, dealing with this by creating R cluster using base package "parallel" and makePSOCKcluster function. I believe this gets closer to finding: By passing nodes Hostname the workers fail with following error and hangs on it forever @@@ @ WARNING: POSSIBLE DNS SPOOFING DETECTED! @ @@@ The RSA host key for test02.servers.jiffybox.net has changed, and the key for the corresponding IP address 134.xxx.xx.xxx is unchanged. This could either mean that DNS SPOOFING is happening or the IP address for the host and its host key have changed at the same time. Offending key for IP in /root/.ssh/known_hosts:10 @@@ @WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! Someone could be eavesdropping on you right now (man-in-the-middle attack)! It is also possible that the RSA host key has just been changed. The fingerprint for the RSA key sent by the remote host is . Please contact your system administrator. Add correct host key in /root/.ssh/known_hosts to get rid of this message. Offending key in /root/.ssh/known_hosts:9 RSA host key for test02.servers.jiffybox.net has changed and you have requested strict checking. Host key verification failed. The same works fine and all workers are started when passing nodes IP address instead of Hostname. 
starting worker pid=32407 on master.jiffybox.net:11575 at 22:18:50.050 starting worker pid=3523 on master.jiffybox.net:11575 at 22:18:50.464 starting worker pid=2583 on master.jiffybox.net:11575 at 22:18:50.885 starting worker pid=5227 on master.jiffybox.net:11575 at 22:18:51.294 -- The above "DNS SPOOFING" issue was simply resolved by removing the matching entries from .ssh/known_hosts and recreating them for all nodes "ssh root@hostname". This fixed the previous issue and I was able to create a socket cluster with 4 nodes (now at port 11977). starting worker pid=6804 on master.jiffybox.net:11977 at 23:59:23.245 starting worker pid=10257 on master.jiffybox.net:11977 at 23:59:23.668 starting worker pid=9776 on master.jiffybox.net:11977 at 23:59:24.107 starting worker pid=12073 on master.jiffybox.net:11977 at 23:59:24.540 note: Neither the path to Rscript nor any port number was specified. -- Unfortunately, this did not resolve the problem with SparkR. It fails with the existing issue "java.net.SocketTimeoutException: Accept timed out". > SparkR: java.net.SocketTimeoutException: Accept timed out when running any > dataframe function > - > > Key: SPARK-13525 > URL: https://issues.apache.org/jira/browse/SPARK-13525 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Shubhanshu Mishra > Labels: sparkr > > I am following the code steps from this example: > https://spark.apache.org/docs/1.6.0/sparkr.html > There are multiple issues: > 1. The head and summary and filter methods are not overridden by spark. Hence > I need to call them using `SparkR::` namespace. > 2. When I try to execute the following, I get errors: > {code} > $> $R_HOME/bin/R > R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" > Copyright (C) 2015 The R Foundation for Statistical Computing > Platform: x86_64-pc-linux-gnu (64-bit) > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. 
> Type 'license()' or 'licence()' for distribution details. > Natural language support but running in an English locale > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. > Welcome at Fri Feb 26 16:19:35 2016 > Attaching package: 'SparkR' > The following objects are masked from 'package:base': > colnames, colnames<-, drop, intersect, rank, rbind, sample, subset, > summary, transform > Launching java with spark-submit command > /content/smishra8/SOFTWARE/spark/bin/spark-submit --driver-memory "50g" > sparkr-shell /tmp/RtmpfBQRg6/backend_portc3bc16f09b1b > > df <- createDataFrame(sqlContext, iris) > Warning messages: > 1: In FUN(X[[i]], ...) : >
[jira] [Commented] (SPARK-17044) Add window function test in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-17044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439925#comment-15439925 ] Dongjoon Hyun commented on SPARK-17044: --- Hi, [~rxin]. Could you review this issue? > Add window function test in SQLQueryTestSuite > - > > Key: SPARK-17044 > URL: https://issues.apache.org/jira/browse/SPARK-17044 > Project: Spark > Issue Type: Improvement >Reporter: Dongjoon Hyun >Priority: Minor > > This issue adds a SQL query test for Window functions for new > `SQLQueryTestSuite`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17044) Add window function test in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-17044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-17044: -- Description: This issue adds a SQL query test for Window functions for new `SQLQueryTestSuite`. (was: New `SQLQueryTestSuite` simplifies SQL testcases. This issue aims to replace `WindowQuerySuite.scala` of `sql/hive` module with `window_functions.sql` in `sql/core` module.) > Add window function test in SQLQueryTestSuite > - > > Key: SPARK-17044 > URL: https://issues.apache.org/jira/browse/SPARK-17044 > Project: Spark > Issue Type: Improvement >Reporter: Dongjoon Hyun >Priority: Minor > > This issue adds a SQL query test for Window functions for new > `SQLQueryTestSuite`.
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439857#comment-15439857 ] Apache Spark commented on SPARK-17243: -- User 'ajbozarth' has created a pull request for this issue: https://github.com/apache/spark/pull/14835 > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of the Spark 2.0 history server web UI keeps displaying "Loading > history summary..." indefinitely and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation: the "historypage.js" file sends a REST request to the > /api/v1/applications endpoint of the history server and gets back a JSON response. > When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. With only hundreds > or thousands of applications it runs fine.
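The linked PR's actual change is not reproduced here, but the underlying idea can be sketched in pure Python under one assumption: render only a bounded page of the parsed applications list instead of all 10K+ rows at once. The function and parameter names below are hypothetical.

```python
import json

def render_page(applications_json, page=0, page_size=100):
    """Parse the /api/v1/applications response once and return only the
    rows for a single page, so the browser never renders 10K+ rows."""
    apps = json.loads(applications_json)
    start = page * page_size
    return apps[start:start + page_size]

# Simulated response with 10,000 application entries
payload = json.dumps([{"id": "app-%05d" % i} for i in range(10000)])
first_page = render_page(payload)
print(len(first_page))  # 100
```

Parsing 10K entries is cheap; it is rendering 10K table rows in the DOM that hangs the browser, which is why bounding the rendered page helps.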
[jira] [Assigned] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17243: Assignee: (was: Apache Spark) > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of the Spark 2.0 history server web UI keeps displaying "Loading > history summary..." indefinitely and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation: the "historypage.js" file sends a REST request to the > /api/v1/applications endpoint of the history server and gets back a JSON response. > When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. With only hundreds > or thousands of applications it runs fine.
[jira] [Assigned] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17243: Assignee: Apache Spark > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu >Assignee: Apache Spark > > The summary page of the Spark 2.0 history server web UI keeps displaying "Loading > history summary..." indefinitely and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation: the "historypage.js" file sends a REST request to the > /api/v1/applications endpoint of the history server and gets back a JSON response. > When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. With only hundreds > or thousands of applications it runs fine.
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439799#comment-15439799 ] Alex Bozarth commented on SPARK-17243: -- So I decided to work on this as a short break from my current work. I have a fix that just requires some final testing before I open a PR; it should be open by EOD. > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of the Spark 2.0 history server web UI keeps displaying "Loading > history summary..." indefinitely and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation: the "historypage.js" file sends a REST request to the > /api/v1/applications endpoint of the history server and gets back a JSON response. > When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. With only hundreds > or thousands of applications it runs fine.
[jira] [Commented] (SPARK-15882) Discuss distributed linear algebra in spark.ml package
[ https://issues.apache.org/jira/browse/SPARK-15882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439767#comment-15439767 ] Joseph K. Bradley commented on SPARK-15882: --- This seems like an important feature, but less critical than other feature parity issues in {{spark.ml}}. Essentially, most users who want distributed linear algebra are fairly expert. Those expert users are often experienced enough to know how to work with RDDs and DataFrames to do conversions as needed. Missing algorithms, on the other hand, often impact non-experts who do not know how to combine spark.ml with spark.mllib. Therefore, I'd prioritize adding missing algorithms (FPGrowth, etc.) to spark.ml over adding distributed linear algebra. That said, we definitely need to do this task before too long, so it would be great to start thinking about design. RDDs vs. Datasets: For the initial implementation, I'd say we should use RDDs to limit initial work required, though I'm open to Datasets if we do scaling tests. However, I strongly prefer only using Datasets in the public APIs, with the expectation that we can eventually switch over to Dataset-based implementations. It is true that RDDs offer more flexibility now, but we should push for the needed flexibility in Datasets so that we can take advantage of their other improvements over RDDs. Functionality: This can be sketched in the design doc. The main question is whether we want to change APIs from spark.mllib, especially if any are not Java-friendly. Plugging in other local linear algebra: This should be addressed in the design doc. I hope, however, that this decision can be made later (by exposing internal APIs as needed) so that the migration is not held up by massive design discussions. Scaling: Regardless of our approach, we'll need to do proper scalability tests to make sure we do not have regressions in the migration. 
"Migration": I should clarify that I'm assuming we will leave spark.mllib.linalg alone and will be adding new APIs in spark.ml.linalg. > Discuss distributed linear algebra in spark.ml package > -- > > Key: SPARK-15882 > URL: https://issues.apache.org/jira/browse/SPARK-15882 > Project: Spark > Issue Type: Brainstorming > Components: ML >Reporter: Joseph K. Bradley > > This JIRA is for discussing how org.apache.spark.mllib.linalg.distributed.* > should be migrated to org.apache.spark.ml. > Initial questions: > * Should we use Datasets or RDDs underneath? > * If Datasets, are there missing features needed for the migration? > * Do we want to redesign any aspects of the distributed matrices during this > move?
[jira] [Resolved] (SPARK-17246) Support BigDecimal literal parsing
[ https://issues.apache.org/jira/browse/SPARK-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-17246. - Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 > Support BigDecimal literal parsing > -- > > Key: SPARK-17246 > URL: https://issues.apache.org/jira/browse/SPARK-17246 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Herman van Hovell >Assignee: Herman van Hovell >Priority: Minor > Fix For: 2.0.1, 2.1.0 > >
[jira] [Resolved] (SPARK-16967) Collect Mesos support code into a module/profile
[ https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-16967. Resolution: Fixed Assignee: Michael Gummelt Fix Version/s: 2.1.0 > Collect Mesos support code into a module/profile > > > Key: SPARK-16967 > URL: https://issues.apache.org/jira/browse/SPARK-16967 > Project: Spark > Issue Type: Task > Components: Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Sean Owen >Assignee: Michael Gummelt >Priority: Critical > Fix For: 2.1.0 > > > CC [~mgummelt] [~tnachen] [~skonto] > I think this is fairly easy and would be beneficial as more work goes into > Mesos. It should separate into a module like YARN does, just on principle > really, but also because it means anyone that doesn't need Mesos support can > build without it. > I'm entirely willing to take a shot at this.
[jira] [Created] (SPARK-17265) EdgeRDD Difference throws an exception
Shishir Kharel created SPARK-17265: -- Summary: EdgeRDD Difference throws an exception Key: SPARK-17265 URL: https://issues.apache.org/jira/browse/SPARK-17265 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Environment: windows, ubuntu Reporter: Shishir Kharel Subtracting two edge RDDs throws an exception.
[jira] [Commented] (SPARK-17261) Using HiveContext after re-creating SparkContext in Spark 2.0 throws "Java.lang.illegalStateException: Cannot call methods on a stopped sparkContext"
[ https://issues.apache.org/jira/browse/SPARK-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439613#comment-15439613 ] Dongjoon Hyun commented on SPARK-17261: --- Hi, [~dakghar] For me, those seem not to work even in `spark-shell`. Could you add a `show` at the end? I tested and got the same result in 2.0.0 and current master branch. {code} scala> import org.apache.spark.sql.SparkSession scala> val spark = SparkSession.builder.enableHiveSupport().getOrCreate() scala> spark.sql("show databases").show +------------+ |databaseName| +------------+ | default| +------------+ scala> spark.stop() scala> val spark = SparkSession.builder.enableHiveSupport().getOrCreate() scala> spark.sql("show databases").show 16/08/26 12:09:22 ERROR Schema: Failed initialising database. Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@6b60d99c, see the next exception for details. {code} > Using HiveContext after re-creating SparkContext in Spark 2.0 throws > "Java.lang.illegalStateException: Cannot call methods on a stopped > sparkContext" > - > > Key: SPARK-17261 > URL: https://issues.apache.org/jira/browse/SPARK-17261 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.0 > Environment: Amazon AWS EMR 5.0 >Reporter: Rahul Jain > Fix For: 2.0.0 > > > After stopping a SparkSession, if we recreate it and use HiveContext in it, it > will throw an error. 
> Steps to reproduce: > spark = SparkSession.builder.enableHiveSupport().getOrCreate() > spark.sql("show databases") > spark.stop() > spark = SparkSession.builder.enableHiveSupport().getOrCreate() > spark.sql("show databases") > "Java.lang.illegalStateException: Cannot call methods on a stopped > sparkContext" > The above error occurs only in PySpark, not in spark-shell.
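A plain-Python sketch of why a cached builder can hand back a stopped session. The classes below are hypothetical stand-ins, not the actual PySpark implementation: the point is that `get_or_create` must discard the cached instance once it has been stopped, otherwise the second `sql()` call hits the stopped context.

```python
class Session:
    """Minimal stand-in for a SparkSession backed by a context."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

    def sql(self, query):
        if self.stopped:
            raise RuntimeError(
                "Cannot call methods on a stopped sparkContext")
        return []  # placeholder result

_active = None

def get_or_create():
    """Return the cached session unless it was stopped; forgetting the
    `or _active.stopped` check reproduces the reported symptom."""
    global _active
    if _active is None or _active.stopped:
        _active = Session()
    return _active

s1 = get_or_create()
s1.sql("show databases")
s1.stop()
s2 = get_or_create()   # a fresh, usable session
s2.sql("show databases")
```

This also suggests why the symptom could differ between spark-shell and PySpark: the two front ends cache and rebuild their session objects through different paths.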
[jira] [Updated] (SPARK-17220) Upgrade Py4J to 0.10.3
[ https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-17220: - Component/s: PySpark > Upgrade Py4J to 0.10.3 > -- > > Key: SPARK-17220 > URL: https://issues.apache.org/jira/browse/SPARK-17220 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Weiqing Yang >Priority: Minor > > Py4J 0.10.3 has landed. It includes some important bug fixes. For example: > Both sides: fixed memory leak issue with ClientServer and potential deadlock > issue by creating a memory leak test suite. (Py4J 0.10.2) > Both sides: added more memory leak tests and fixed a potential memory leak > related to listeners. (Py4J 0.10.3) > So it's time to upgrade py4j from 0.10.1 to 0.10.3. The changelog is > available at https://www.py4j.org/changelog.html
[jira] [Updated] (SPARK-17265) EdgeRDD Difference throws an exception
[ https://issues.apache.org/jira/browse/SPARK-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shishir Kharel updated SPARK-17265: --- Description: Subtracting two edge RDDs throws an exception. val difference = graph1.edges.subtract(graph2.edges) gives Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.ClassCastException: org.apache.spark.graphx.Edge cannot be cast to scala.Tuple2 at org.apache.spark.rdd.RDD$$anonfun$subtract$3$$anon$3.getPartition(RDD.scala:968) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) was: Subtracting two edge RDDs throws an exception. > EdgeRDD Difference throws an exception > -- > > Key: SPARK-17265 > URL: https://issues.apache.org/jira/browse/SPARK-17265 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: windows, ubuntu >Reporter: Shishir Kharel > > Subtracting two edge RDDs throws an exception. 
> val difference = graph1.edges.subtract(graph2.edges) > gives > Exception in thread "main" org.apache.spark.SparkException: Job aborted due > to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: > Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.ClassCastException: > org.apache.spark.graphx.Edge cannot be cast to scala.Tuple2 > at > org.apache.spark.rdd.RDD$$anonfun$subtract$3$$anon$3.getPartition(RDD.scala:968) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745)
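The failure itself is Scala/GraphX-specific (the subtract partitioner assumes key/value tuples, not `Edge` objects), but one possible workaround pattern can be sketched language-neutrally: convert each edge to a plain tuple, subtract on the tuples, then rebuild edges. `Edge` below is a hypothetical pure-Python stand-in for `org.apache.spark.graphx.Edge`, not the real class.

```python
from collections import namedtuple

# Hypothetical stand-in for org.apache.spark.graphx.Edge
Edge = namedtuple("Edge", ["src_id", "dst_id", "attr"])

def subtract_edges(edges1, edges2):
    """Subtract by keying edges as plain (src, dst, attr) tuples,
    mirroring a map-to-tuple / subtract / map-back workaround."""
    keys2 = {(e.src_id, e.dst_id, e.attr) for e in edges2}
    return [e for e in edges1
            if (e.src_id, e.dst_id, e.attr) not in keys2]

g1 = [Edge(1, 2, "a"), Edge(2, 3, "b")]
g2 = [Edge(2, 3, "b")]
print(subtract_edges(g1, g2))  # [Edge(src_id=1, dst_id=2, attr='a')]
```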
[jira] [Commented] (SPARK-16501) spark.mesos.secret exposed on UI and command line
[ https://issues.apache.org/jira/browse/SPARK-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439609#comment-15439609 ] Eric Daniel commented on SPARK-16501: - Great to know, thanks! > spark.mesos.secret exposed on UI and command line > - > > Key: SPARK-16501 > URL: https://issues.apache.org/jira/browse/SPARK-16501 > Project: Spark > Issue Type: Improvement > Components: Spark Submit, Web UI >Affects Versions: 1.6.2 >Reporter: Eric Daniel > Labels: security > > There are two related problems with spark.mesos.secret: > 1) The web UI shows its value in the "environment" tab > 2) Passing it as a command-line option to spark-submit (or creating a > SparkContext from python, with the effect of launching spark-submit) exposes > it to "ps" > I'll be happy to submit a patch but I could use some advice first. > The first problem is easy enough, just don't show that value in the UI > For the second problem, I'm not sure what the best solution is. A > "spark.mesos.secret-file" parameter would let the user store the secret in a > non-world-readable file. Alternatively, the mesos secret could be obtained > from the environment, which other users don't have access to. Either > solution would work in client mode, but I don't know if they're workable in > cluster mode.
[jira] [Resolved] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error
[ https://issues.apache.org/jira/browse/SPARK-17207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-17207. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14785 [https://github.com/apache/spark/pull/14785] > Comparing Vector in relative tolerance or absolute tolerance in UnitTests > error > > > Key: SPARK-17207 > URL: https://issues.apache.org/jira/browse/SPARK-17207 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Reporter: Peng Meng > Fix For: 2.1.0 > > > The result of comparing two vectors using the UnitTests helpers > (org.apache.spark.mllib.util.TestingUtils) is sometimes wrong. > For example: > val a = Vectors.dense(Array(1.0, 2.0)) > val b = Vectors.zeros(0) > a ~== b absTol 1e-1 // the result is true.
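A minimal pure-Python sketch of the missing check (a hypothetical helper, not the actual TestingUtils code): vectors of different lengths must never compare approximately equal, which is exactly the case the bug report exercises.

```python
def vectors_approx_equal(a, b, abs_tol):
    """Absolute-tolerance comparison that checks lengths first.
    Without the length check, comparing [1.0, 2.0] against an empty
    vector can vacuously succeed -- the reported bug."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= abs_tol for x, y in zip(a, b))

print(vectors_approx_equal([1.0, 2.0], [], 0.1))           # False
print(vectors_approx_equal([1.0, 2.0], [1.05, 2.0], 0.1))  # True
```

The vacuous success happens because element-wise comparison over `zip` of mismatched lengths simply runs out of pairs, so every compared pair trivially passes.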
[jira] [Updated] (SPARK-17165) FileStreamSource should not track the list of seen files indefinitely
[ https://issues.apache.org/jira/browse/SPARK-17165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17165: - Fix Version/s: 2.1.0 2.0.1 > FileStreamSource should not track the list of seen files indefinitely > - > > Key: SPARK-17165 > URL: https://issues.apache.org/jira/browse/SPARK-17165 > Project: Spark > Issue Type: Bug > Components: SQL, Streaming >Reporter: Reynold Xin >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > FileStreamSource currently tracks all the files seen indefinitely, which > means it can run out of memory or overflow.
[jira] [Resolved] (SPARK-17165) FileStreamSource should not track the list of seen files indefinitely
[ https://issues.apache.org/jira/browse/SPARK-17165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-17165. -- Resolution: Fixed Assignee: Peter Lee > FileStreamSource should not track the list of seen files indefinitely > - > > Key: SPARK-17165 > URL: https://issues.apache.org/jira/browse/SPARK-17165 > Project: Spark > Issue Type: Bug > Components: SQL, Streaming >Reporter: Reynold Xin >Assignee: Peter Lee > > FileStreamSource currently tracks all the files seen indefinitely, which > means it can run out of memory or overflow.
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439517#comment-15439517 ] Gang Wu commented on SPARK-17243: - Thanks [~ajbozarth]! Let me know when it is done. > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of the Spark 2.0 history server web UI keeps displaying "Loading > history summary..." indefinitely and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation: the "historypage.js" file sends a REST request to the > /api/v1/applications endpoint of the history server and gets back a JSON response. > When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. With only hundreds > or thousands of applications it runs fine.
[jira] [Updated] (SPARK-17250) Remove HiveClient and setCurrentDatabase from HiveSessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-17250: - Assignee: Xiao Li (was: Apache Spark) > Remove HiveClient and setCurrentDatabase from HiveSessionCatalog > > > Key: SPARK-17250 > URL: https://issues.apache.org/jira/browse/SPARK-17250 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.1.0 > > > This is the first step to clean `HiveClient` from `HiveSessionState`. In the > metastore interaction, we always set fully qualified names when > accessing/operating a table. That means, we always specify the database. > Thus, it is not necessary to use `HiveClient` to change the active database > in Hive metastore. > In `HiveSessionCatalog `, `setCurrentDatabase` is the only function that uses > `HiveClient`. Thus, we can remove it after removing `setCurrentDatabase`
[jira] [Resolved] (SPARK-17250) Remove HiveClient and setCurrentDatabase from HiveSessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-17250. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14821 [https://github.com/apache/spark/pull/14821] > Remove HiveClient and setCurrentDatabase from HiveSessionCatalog > > > Key: SPARK-17250 > URL: https://issues.apache.org/jira/browse/SPARK-17250 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Apache Spark > Fix For: 2.1.0 > > > This is the first step to clean `HiveClient` from `HiveSessionState`. In the > metastore interaction, we always set fully qualified names when > accessing/operating a table. That means, we always specify the database. > Thus, it is not necessary to use `HiveClient` to change the active database > in Hive metastore. > In `HiveSessionCatalog `, `setCurrentDatabase` is the only function that uses > `HiveClient`. Thus, we can remove it after removing `setCurrentDatabase`
[jira] [Commented] (SPARK-17163) Merge MLOR into a single LOR interface
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439499#comment-15439499 ] Seth Hendrickson commented on SPARK-17163: -- [~dbtsai] We can discuss these design points on the WIP PR. We can change from what is currently implemented there, but I find it is always easier to communicate if we can directly look at code :) > Merge MLOR into a single LOR interface > -- > > Key: SPARK-17163 > URL: https://issues.apache.org/jira/browse/SPARK-17163 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Reporter: Seth Hendrickson > > Before the 2.1 release, we should finalize the API for logistic regression. > After SPARK-7159, we have both LogisticRegression and > MultinomialLogisticRegression models. This may be confusing to users and is > a bit superfluous, since MLOR can do basically all of what BLOR does. We > should decide if it needs to be changed and implement those changes before 2.1 > *Update*: Seems we have decided to merge the two estimators. I changed the > title to reflect that.
[jira] [Updated] (SPARK-17192) Issuing an exception when users specify the partitioning columns without a given schema
[ https://issues.apache.org/jira/browse/SPARK-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-17192: - Assignee: Xiao Li > Issuing an exception when users specify the partitioning columns without a > given schema > --- > > Key: SPARK-17192 > URL: https://issues.apache.org/jira/browse/SPARK-17192 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.1.0 > > > We need to issue an exception when users specify the partitioning columns > without a given schema.
[jira] [Resolved] (SPARK-17192) Issuing an exception when users specify the partitioning columns without a given schema
[ https://issues.apache.org/jira/browse/SPARK-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-17192. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14572 [https://github.com/apache/spark/pull/14572] > Issuing an exception when users specify the partitioning columns without a > given schema > --- > > Key: SPARK-17192 > URL: https://issues.apache.org/jira/browse/SPARK-17192 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > Fix For: 2.1.0 > > > We need to issue an exception when users specify the partitioning columns > without a given schema.
[jira] [Commented] (SPARK-17252) Performing arithmetic in VALUES can lead to ClassCastException / MatchErrors during query parsing
[ https://issues.apache.org/jira/browse/SPARK-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439478#comment-15439478 ] Josh Rosen commented on SPARK-17252: Looks like this issue only affects 2.0.0, so I'm going to resolve it as fixed in 2.0.1. > Performing arithmetic in VALUES can lead to ClassCastException / MatchErrors > during query parsing > - > > Key: SPARK-17252 > URL: https://issues.apache.org/jira/browse/SPARK-17252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen > Fix For: 2.0.1 > > > The following example fails with a ClassCastException: > {code} > create table t(d double); > insert into t VALUES (1 * 1.0); > {code} > Here's the error: > {code} > java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be > cast to java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at scala.math.Numeric$IntIsIntegral$.times(Numeric.scala:57) > at > org.apache.spark.sql.catalyst.expressions.Multiply.nullSafeEval(arithmetic.scala:207) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:416) > at > org.apache.spark.sql.catalyst.expressions.CreateStruct$$anonfun$eval$2.apply(complexTypeCreator.scala:198) > at > org.apache.spark.sql.catalyst.expressions.CreateStruct$$anonfun$eval$2.apply(complexTypeCreator.scala:198) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.sql.catalyst.expressions.CreateStruct.eval(complexTypeCreator.scala:198) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:320) > at > 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitInlineTable$1$$anonfun$39.apply(AstBuilder.scala:677) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitInlineTable$1$$anonfun$39.apply(AstBuilder.scala:674) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitInlineTable$1.apply(AstBuilder.scala:674) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitInlineTable$1.apply(AstBuilder.scala:658) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:96) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInlineTable(AstBuilder.scala:658) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInlineTable(AstBuilder.scala:43) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$InlineTableContext.accept(SqlBaseParser.java:9358) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:57) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitInlineTableDefault1(SqlBaseBaseVisitor.java:608) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$InlineTableDefault1Context.accept(SqlBaseParser.java:7073) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:57) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitQueryTermDefault(SqlBaseBaseVisitor.java:580) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$QueryTermDefaultContext.accept(SqlBaseParser.java:6895) > at > 
org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:47) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.plan(AstBuilder.scala:83) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleInsertQuery$1.apply(AstBuilder.scala:158) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleInsertQuery$1.apply(AstBuilder.scala:162) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:96) > at >
[jira] [Resolved] (SPARK-17252) Performing arithmetic in VALUES can lead to ClassCastException / MatchErrors during query parsing
[ https://issues.apache.org/jira/browse/SPARK-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-17252. Resolution: Fixed Fix Version/s: 2.0.1
[jira] [Commented] (SPARK-17163) Merge MLOR into a single LOR interface
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439475#comment-15439475 ] Apache Spark commented on SPARK-17163: -- User 'sethah' has created a pull request for this issue: https://github.com/apache/spark/pull/14834 > Merge MLOR into a single LOR interface > -- > > Key: SPARK-17163 > URL: https://issues.apache.org/jira/browse/SPARK-17163 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Reporter: Seth Hendrickson > > Before the 2.1 release, we should finalize the API for logistic regression. > After SPARK-7159, we have both LogisticRegression and > MultinomialLogisticRegression models. This may be confusing to users and is > a bit superfluous, since MLOR can do basically all of what BLOR does. We > should decide if it needs to be changed and implement those changes before 2.1. > *Update*: Seems we have decided to merge the two estimators. I changed the > title to reflect that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17163) Merge MLOR into a single LOR interface
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17163: Assignee: Apache Spark
[jira] [Assigned] (SPARK-17163) Merge MLOR into a single LOR interface
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17163: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439470#comment-15439470 ] Alex Bozarth commented on SPARK-17243: -- Thanks [~ste...@apache.org], this idea is great. [~wgtmac], based on this I might be able to get a small fix for this out next week instead of waiting to include it in my larger update next month. > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of the Spark 2.0 history server web UI keeps displaying "Loading > history summary..." indefinitely and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation: "historypage.js" sends a REST request to the > /api/v1/applications endpoint of the history server and gets back a JSON > response. When there are more than 10K applications inside the event log > directory, it takes forever to parse them and render the page. With only > hundreds or thousands of application histories it runs fine.
[jira] [Commented] (SPARK-17165) FileStreamSource should not track the list of seen files indefinitely
[ https://issues.apache.org/jira/browse/SPARK-17165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439391#comment-15439391 ] Frederick Reiss commented on SPARK-17165: - This problem is actually deeper than just FileStreamSource. With the current version of the Source trait, *every* source needs to keep infinite state. [~scrapco...@gmail.com] ran into that issue while writing a connector for MQTT. I opened SPARK-16963 a few weeks back to cover the core issue with the Source trait. My open PR for that JIRA (https://github.com/apache/spark/pull/14553) has a fair amount of overlap with the PR here and with the one in SPARK-17235. Can we merge our efforts here to make a single sequence of small, easy-to-review change sets that will resolve these state management issues across all sources? I'm thinking that we can create a single JIRA (or reuse one of the existing ones) to cover "keep only bounded state for Structured Streaming data sources", then divide that JIRA into the following tasks:
# Add a method to `Source` to trigger cleaning of processed data
# Add a method to `HDFSMetadataLog` to clean out processed metadata
# Implement garbage collection of old data (metadata and files) in `FileStreamSource`
# Implement garbage collection of old data in `MemoryStream` and other stubs of Source
# Modify the scheduler (`StreamExecution`) so that it triggers garbage collection of data and metadata
Thoughts? > FileStreamSource should not track the list of seen files indefinitely > - > > Key: SPARK-17165 > URL: https://issues.apache.org/jira/browse/SPARK-17165 > Project: Spark > Issue Type: Bug > Components: SQL, Streaming >Reporter: Reynold Xin > > FileStreamSource currently tracks all the files seen indefinitely, which > means it can run out of memory or overflow.
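As a rough illustration of the first and third tasks in the list above, a source's seen-file tracking could be bounded by purging entries once the batch that saw them has been committed. This is only a sketch under assumed names ({{SeenFilesLog}}, {{markSeen}}, {{purge}} are all hypothetical), not the actual Spark API:

```scala
// Sketch only: illustrative names, not real Spark classes.
// A bounded "seen files" log: each entry is tagged with the batch that
// first saw it, and entries are purged once that batch is committed,
// so the in-memory state no longer grows without bound.
class SeenFilesLog {
  private var seen = Map.empty[String, Long] // file path -> batch id

  def markSeen(path: String, batchId: Long): Unit =
    seen += (path -> batchId)

  def isNewFile(path: String): Boolean = !seen.contains(path)

  // Proposed cleanup hook: drop state for batches at or below the
  // committed watermark.
  def purge(committedBatchId: Long): Unit =
    seen = seen.filter { case (_, batchId) => batchId > committedBatchId }

  def size: Int = seen.size
}
```

A real implementation would also need an age or retention window so that a purged file reappearing late is not reprocessed; that detail is omitted here.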
[jira] [Commented] (SPARK-17251) "ClassCastException: OuterReference cannot be cast to NamedExpression" for correlated subquery on the RHS of an IN operator
[ https://issues.apache.org/jira/browse/SPARK-17251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439360#comment-15439360 ] Herman van Hovell commented on SPARK-17251: --- Ok, I have taken a look at this one. We should make {{OuterReference}} a {{NamedExpression}} and then we are good (have most of the code working locally). If we fix this, it will fail analysis because we are using a correlated predicate in a {{Project}}. We could make an exception for IN, but I am just wondering if we support such a weird construct at all. > "ClassCastException: OuterReference cannot be cast to NamedExpression" for > correlated subquery on the RHS of an IN operator > --- > > Key: SPARK-17251 > URL: https://issues.apache.org/jira/browse/SPARK-17251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen > > The following test case produces a ClassCastException in the analyzer: > {code} > CREATE TABLE t1(a INTEGER); > INSERT INTO t1 VALUES(1),(2); > CREATE TABLE t2(b INTEGER); > INSERT INTO t2 VALUES(1); > SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2); > {code} > Here's the exception: > {code} > java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.OuterReference cannot be cast to > org.apache.spark.sql.catalyst.expressions.NamedExpression > at > org.apache.spark.sql.catalyst.plans.logical.Project$$anonfun$1.apply(basicLogicalOperators.scala:48) > at > scala.collection.LinearSeqOptimized$class.exists(LinearSeqOptimized.scala:80) > at scala.collection.immutable.List.exists(List.scala:84) > at > org.apache.spark.sql.catalyst.plans.logical.Project.resolved$lzycompute(basicLogicalOperators.scala:44) > at > org.apache.spark.sql.catalyst.plans.logical.Project.resolved(basicLogicalOperators.scala:43) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveSubquery$$resolveSubQuery(Analyzer.scala:1091) > at > 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveSubquery$$resolveSubQueries$1.applyOrElse(Analyzer.scala:1130) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveSubquery$$resolveSubQueries$1.applyOrElse(Analyzer.scala:1116) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:144) > at > 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveSubquery$$resolveSubQueries(Analyzer.scala:1116) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery$$anonfun$apply$16.applyOrElse(Analyzer.scala:1148) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery$$anonfun$apply$16.applyOrElse(Analyzer.scala:1141) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61) > at >
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439352#comment-15439352 ] Steve Loughran commented on SPARK-17243: The REST API actually lets you set a time range for querying entries coming back, though not a limit. This problem could presumably be addressed in a couple of ways:
# add a {{limit}} argument to the REST API, declaring the max number of responses to return
# leave the REST API alone but tweak the client code to work backwards from now to try and get a range. That's more convoluted and is probably brittle to clocks.
Strategy #1 is simpler and would avoid the server being overloaded by large requests made directly by arbitrary callers; that serialization is going to be expensive too, and an easy way to bring the history server down.
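Strategy #1 amounts to truncating the application list on the server before it is serialized. A minimal sketch of that filtering step follows; the {{limit}} parameter and all names here are hypothetical (no such argument existed in the history server REST API at the time):

```scala
// Sketch only: a hypothetical server-side cap on the number of
// application summaries returned by /api/v1/applications.
case class AppSummary(id: String, name: String)

def applyLimit(apps: Seq[AppSummary], limit: Option[Int]): Seq[AppSummary] =
  limit match {
    // A non-negative limit caps the (already newest-first) list before
    // serialization, so a 10K+ history no longer floods the client.
    case Some(n) if n >= 0 => apps.take(n)
    // No limit given: keep the current return-everything behaviour.
    case _ => apps
  }
```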
[jira] [Commented] (SPARK-16998) select($"column1", explode($"column2")) is extremely slow
[ https://issues.apache.org/jira/browse/SPARK-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439336#comment-15439336 ] Takeshi Yamamuro commented on SPARK-16998: -- can we link this ticket to SPARK-15214? > select($"column1", explode($"column2")) is extremely slow > - > > Key: SPARK-16998 > URL: https://issues.apache.org/jira/browse/SPARK-16998 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: TobiasP > > Using a Dataset containing 10.000 rows, each containing null and an array of > 5.000 Ints, I observe the following performance (in local mode): > {noformat} > scala> time(ds.select(explode($"value")).sample(false, 0.001, 1).collect) > 1.219052 seconds > > res9: Array[org.apache.spark.sql.Row] = Array([3761], [3766], [3196]) > scala> time(ds.select($"dummy", explode($"value")).sample(false, 0.001, > 1).collect) > 20.219447 seconds > > res5: Array[org.apache.spark.sql.Row] = Array([null,3761], [null,3766], > [null,3196]) > {noformat}
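The {{time(...)}} helper in the benchmark above is not a Spark or Scala built-in, and the reporter's definition is not shown in the ticket. A minimal equivalent might look like this (hypothetical reconstruction):

```scala
// Sketch of a timing helper like the one used in the benchmark above:
// runs the body once, prints the elapsed wall-clock time in seconds,
// and returns the body's result.
def time[T](body: => T): T = {
  val start = System.nanoTime()
  val result = body
  val elapsedSeconds = (System.nanoTime() - start) / 1e9
  println(f"$elapsedSeconds%.6f seconds")
  result
}
```

Note that a single run like this measures JIT warm-up and caching effects along with the query itself, so the reported numbers should be read as rough magnitudes rather than precise costs.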
[jira] [Commented] (SPARK-16998) select($"column1", explode($"column2")) is extremely slow
[ https://issues.apache.org/jira/browse/SPARK-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439331#comment-15439331 ] Takeshi Yamamuro commented on SPARK-16998: -- yea, no problem. thanks!
[jira] [Resolved] (SPARK-17260) move CreateTables to HiveStrategies
[ https://issues.apache.org/jira/browse/SPARK-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-17260. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14825 [https://github.com/apache/spark/pull/14825] > move CreateTables to HiveStrategies > --- > > Key: SPARK-17260 > URL: https://issues.apache.org/jira/browse/SPARK-17260 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.1.0
[jira] [Commented] (SPARK-16998) select($"column1", explode($"column2")) is extremely slow
[ https://issues.apache.org/jira/browse/SPARK-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439276#comment-15439276 ] Herman van Hovell commented on SPARK-16998: --- [~maropu] Do you mind if I do it myself? I already started hacking.
[jira] [Commented] (SPARK-17252) Performing arithmetic in VALUES can lead to ClassCastException / MatchErrors during query parsing
[ https://issues.apache.org/jira/browse/SPARK-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439269#comment-15439269 ] Herman van Hovell commented on SPARK-17252: --- I cannot reproduce this. I tried both on the latest master and branch-2.0.