[jira] [Updated] (SPARK-3288) All fields in TaskMetrics should be private and use getters/setters
[ https://issues.apache.org/jira/browse/SPARK-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3288: --- Labels: starter (was: ) > All fields in TaskMetrics should be private and use getters/setters > --- > > Key: SPARK-3288 > URL: https://issues.apache.org/jira/browse/SPARK-3288 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Patrick Wendell > Labels: starter > > This is particularly bad because we expose this as a developer API. > Technically a library could create a TaskMetrics object and then change the > values inside of it and pass it onto someone else. It can be written pretty > compactly like below: > {code} > /** >* Number of bytes written for the shuffle by this task >*/ > @volatile private var _shuffleBytesWritten: Long = _ > def incrementShuffleBytesWritten(value: Long) = _shuffleBytesWritten += > value > def decrementShuffleBytesWritten(value: Long) = _shuffleBytesWritten -= > value > def shuffleBytesWritten = _shuffleBytesWritten > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3288) All fields in TaskMetrics should be private and use getters/setters
[ https://issues.apache.org/jira/browse/SPARK-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3288: --- Assignee: (was: Andrew Or) > All fields in TaskMetrics should be private and use getters/setters > --- > > Key: SPARK-3288 > URL: https://issues.apache.org/jira/browse/SPARK-3288 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Patrick Wendell > Labels: starter > > This is particularly bad because we expose this as a developer API. > Technically a library could create a TaskMetrics object and then change the > values inside of it and pass it onto someone else. It can be written pretty > compactly like below: > {code} > /** >* Number of bytes written for the shuffle by this task >*/ > @volatile private var _shuffleBytesWritten: Long = _ > def incrementShuffleBytesWritten(value: Long) = _shuffleBytesWritten += > value > def decrementShuffleBytesWritten(value: Long) = _shuffleBytesWritten -= > value > def shuffleBytesWritten = _shuffleBytesWritten > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
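The {code} snippet quoted above is line-wrapped by the mail digest; below is a cleaned-up, self-contained Scala sketch of the same private-field-plus-accessor pattern. The enclosing class name is hypothetical — only the field and method names come from the ticket.
{code}
/** Sketch of the proposed accessor pattern; the wrapping class name is made up. */
class ShuffleWriteMetricsSketch {
  /** Number of bytes written for the shuffle by this task. */
  @volatile private var _shuffleBytesWritten: Long = _

  def incrementShuffleBytesWritten(value: Long): Unit = _shuffleBytesWritten += value
  def decrementShuffleBytesWritten(value: Long): Unit = _shuffleBytesWritten -= value
  def shuffleBytesWritten: Long = _shuffleBytesWritten
}
{code}
With the field private, callers can only read the value or go through the increment/decrement methods, which addresses the mutability concern described in the issue.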
[jira] [Resolved] (SPARK-3576) Provide script for creating the Spark AMI from scratch
[ https://issues.apache.org/jira/browse/SPARK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3576. Resolution: Fixed This was fixed in spark-ec2 itself > Provide script for creating the Spark AMI from scratch > -- > > Key: SPARK-3576 > URL: https://issues.apache.org/jira/browse/SPARK-3576 > Project: Spark > Issue Type: Bug > Components: EC2 >Reporter: Patrick Wendell >Assignee: Patrick Wendell > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147465#comment-14147465 ] Patrick Wendell commented on SPARK-3687: Can you perform a jstack on the executor when it is hanging? We usually only post things on JIRA like this when a specific issue has been debugged a bit more. But if you can produce a jstack of the hung executor we can keep it open :) > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > > In my application, I read more than 100 sequence files to a JavaPairRDD, > perform a flatMap to get another JavaRDD, and then use takeOrdered to get the > result. > Quite often (but not always), Spark hangs while executing > some of the 110th-130th tasks. > The job can hang for several hours, maybe forever (I can't wait for its > completion). > When the Spark job hangs, I can't find any error message anywhere, and I > can't kill the job from the web UI. > The current workaround is to use coalesce to reduce the number of partitions > to be processed. > I never see a job hang if the number of partitions to be processed is no > greater than 80. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
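For context on the workaround mentioned in the report, here is a minimal Scala sketch of the same job shape with coalesce applied before the action. The reporter used the Java API; the input path, the split logic, and the partition cap of 80 are placeholders loosely based on the description, and a jstack of the hung executor would still be needed to find the root cause.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

// Minimal sketch of the reported job shape with the coalesce workaround applied.
object SequenceFileJobSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("seqfile-sketch"))
    val pairs = sc.sequenceFile[String, String]("hdfs:///path/to/seqfiles/*")
    // Workaround from the report: cap the number of partitions before the
    // action so that no more than ~80 tasks are in flight.
    val top = pairs.coalesce(80)
      .flatMap { case (_, v) => v.split("\\s+") }
      .takeOrdered(10)
    top.foreach(println)
    sc.stop()
  }
}
{code}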
[jira] [Resolved] (SPARK-2778) Add unit tests for Yarn integration
[ https://issues.apache.org/jira/browse/SPARK-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2778. Resolution: Fixed Fix Version/s: 1.2.0 Fixed by: https://github.com/apache/spark/pull/2257 > Add unit tests for Yarn integration > --- > > Key: SPARK-2778 > URL: https://issues.apache.org/jira/browse/SPARK-2778 > Project: Spark > Issue Type: Test > Components: YARN >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 1.2.0 > > > It would be nice to add some Yarn integration tests to the unit tests in > Spark; Yarn provides a "MiniYARNCluster" class that can be used to spawn a > cluster locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
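A rough Scala sketch of what spawning a MiniYARNCluster in a test might look like, assuming the hadoop-yarn-server-tests test artifact is on the classpath; the service counts and names are illustrative, and this is not the code from the linked pull request.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.server.MiniYARNCluster

// Spawn a local YARN cluster for a test and read back its rewritten config.
object MiniYarnClusterSketch {
  def main(args: Array[String]): Unit = {
    val conf = new YarnConfiguration()
    val cluster = new MiniYARNCluster("spark-yarn-test", 1 /* node managers */, 1, 1)
    cluster.init(conf)
    cluster.start()
    // The started cluster rewrites the RM/scheduler addresses into its own
    // config; a test would hand this configuration to the YARN client.
    val clusterConf = cluster.getConfig
    println(clusterConf.get(YarnConfiguration.RM_ADDRESS))
    cluster.stop()
  }
}
{code}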
[jira] [Commented] (SPARK-2647) DAGScheduler plugs others when processing one JobSubmitted event
[ https://issues.apache.org/jira/browse/SPARK-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147378#comment-14147378 ] Nan Zhu commented on SPARK-2647: isn't it the expected behaviour as we keep DAGScheduler as a single-thread mode? > DAGScheduler plugs others when processing one JobSubmitted event > > > Key: SPARK-2647 > URL: https://issues.apache.org/jira/browse/SPARK-2647 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: YanTang Zhai > > If a few of jobs are submitted, DAGScheduler plugs others when processing one > JobSubmitted event. > For example ont JobSubmitted event is processed as follows and costs much time > "spark-akka.actor.default-dispatcher-67" daemon prio=10 > tid=0x7f75ec001000 nid=0x7dd6 in Object.wait() [0x7f76063e1000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at org.apache.hadoopcdh3.ipc.Client.call(Client.java:1130) > - locked <0x000783b17330> (a org.apache.hadoopcdh3.ipc.Client$Call) > at org.apache.hadoopcdh3.ipc.RPC$Invoker.invoke(RPC.java:241) > at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) > at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoopcdh3.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:83) > at > org.apache.hadoopcdh3.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:60) > at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) > at > org.apache.hadoopcdh3.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1472) > at > org.apache.hadoopcdh3.hdfs.DFSClient.getBlockLocations(DFSClient.java:1498) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$1.doCall(Cdh3DistributedFileSystem.java:208) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$1.doCall(Cdh3DistributedFileSystem.java:204) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getFileBlockLocations(Cdh3DistributedFileSystem.java:204) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) > at > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:233) > at > StorageEngineClient.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:141) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:172) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) > at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) > at scala.Option.getOrElse(Option.scala:120) > at 
org.apache.spark.rdd.RDD.partitions(RDD.scala:202) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:54) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:54) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:54) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartition
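The trace above shows HadoopRDD.getPartitions (and its NameNode round trips) running inside the DAGScheduler's single event-processing thread, which is why other JobSubmitted events wait behind it. As a hedged, caller-side illustration only — not a scheduler change — the partition metadata can be forced on the submitting thread first, since RDD.partitions caches its result; the helper name and input path below are placeholders.
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Force partition metadata on the caller thread so the expensive file listing
// is already cached by the time the DAGScheduler event loop touches the RDD.
object PrecomputePartitionsSketch {
  def materializePartitions(rdd: RDD[_]): Unit = {
    // RDD.partitions caches its result, so this pays the cost once, here.
    val n = rdd.partitions.length
    println(s"computed $n partitions on the caller thread")
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "precompute-partitions-sketch")
    val rdd = sc.textFile("hdfs:///path/to/inputs/*").map(_.length)
    materializePartitions(rdd)
    println(rdd.count()) // job submission no longer blocks on listing files
    sc.stop()
  }
}
{code}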
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Description: In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result. It is quite often (but not always) that the spark hangs while the executing some of 110th-130th tasks. The job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't find any error message in anywhere, and I can't kill the job from web UI. The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get a job hanged if the number of partitions to be processed is no greater than 80. was: In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result. It is quite often (but not always) that the spark hangs while the executing some of 110th-130th tasks. The job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't find any error message in anywhere, and I can't kill the job from web UI. The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get job hanged if the number of partitions to be processed is no greater than 80. > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > > In my application, I read more than 100 sequence files to a JavaPairRDD, > perform flatmap to get another JavaRDD, and then use takeOrdered to get the > result. > It is quite often (but not always) that the spark hangs while the executing > some of 110th-130th tasks. > The job can hang for several hours, maybe forever (I can't wait for its > completion). > When the spark job hangs, I can't find any error message in anywhere, and I > can't kill the job from web UI. > The current workaround is to use coalesce to reduce the number of partitions > to be processed. > I never get a job hanged if the number of partitions to be processed is no > greater than 80. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Description: In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result. It is quite often (but not always) that the spark hangs while the executing some of 110th-130th tasks. The job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't find any error message in anywhere, and I can't kill the job from web UI. The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get job hanged if the number of partitions to be processed is no greater than 80. was:In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > > In my application, I read more than 100 sequence files to a JavaPairRDD, > perform flatmap to get another JavaRDD, and then use takeOrdered to get the > result. > It is quite often (but not always) that the spark hangs while the executing > some of 110th-130th tasks. > The job can hang for several hours, maybe forever (I can't wait for its > completion). > When the spark job hangs, I can't find any error message in anywhere, and I > can't kill the job from web UI. > The current workaround is to use coalesce to reduce the number of partitions > to be processed. > I never get job hanged if the number of partitions to be processed is no > greater than 80. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Description: In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered (was: In my application, I read more than 100 sequence files, ) > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > > In my application, I read more than 100 sequence files to a JavaPairRDD, > perform flatmap to get another JavaRDD, and then use takeOrdered -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Description: In my application, I read more than 100 sequence files, (was: I use spark ) > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > > In my application, I read more than 100 sequence files, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Description: I use spark > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > > I use spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Component/s: Spark Core > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Affects Version/s: 1.0.2 1.1.0 > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Ziv Huang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziv Huang updated SPARK-3687: - Summary: Spark hang while processing more than 100 sequence files (was: Spark hang while ) > Spark hang while processing more than 100 sequence files > > > Key: SPARK-3687 > URL: https://issues.apache.org/jira/browse/SPARK-3687 > Project: Spark > Issue Type: Bug >Reporter: Ziv Huang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3687) Spark hang while
Ziv Huang created SPARK-3687: Summary: Spark hang while Key: SPARK-3687 URL: https://issues.apache.org/jira/browse/SPARK-3687 Project: Spark Issue Type: Bug Reporter: Ziv Huang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3686) flume.SparkSinkSuite.Success is flaky
[ https://issues.apache.org/jira/browse/SPARK-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147331#comment-14147331 ] Apache Spark commented on SPARK-3686: - User 'harishreedharan' has created a pull request for this issue: https://github.com/apache/spark/pull/2531 > flume.SparkSinkSuite.Success is flaky > - > > Key: SPARK-3686 > URL: https://issues.apache.org/jira/browse/SPARK-3686 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Patrick Wendell >Assignee: Hari Shreedharan >Priority: Blocker > > {code} > Error Message > 4000 did not equal 5000 > Stacktrace > sbt.ForkMain$ForkError: 4000 did not equal 5000 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:498) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1559) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:416) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite.org$apache$spark$streaming$flume$sink$SparkSinkSuite$$assertChannelIsEmpty(SparkSinkSuite.scala:195) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply$mcV$sp(SparkSinkSuite.scala:54) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) > at > org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) > at > org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158) > at org.scalatest.Suite$class.withFixture(Suite.scala:1121) > at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:200) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1559) > at org.scalatest.Suite$class.run(Suite.scala:1423) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > 
at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:204) > at org.scalatest.FunSuite.run(FunSuite.scala:1559) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:444) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:651) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Example test result (this will stop working in a few days): > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/719/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/testReport/junit/org.apache.spark.streaming.flume.sink/SparkSinkSuite/Success_with_ack/ -- This m
[jira] [Resolved] (SPARK-546) Support full outer join and multiple join in a single shuffle
[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-546. --- Resolution: Fixed Fix Version/s: 1.2.0 Assignee: Aaron Staple Fixed by: https://github.com/apache/spark/pull/1395 > Support full outer join and multiple join in a single shuffle > - > > Key: SPARK-546 > URL: https://issues.apache.org/jira/browse/SPARK-546 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Streaming >Reporter: Reynold Xin >Assignee: Aaron Staple > Fix For: 1.2.0 > > > RDD[(K,V)] now supports left/right outer join but not full outer join. > Also it'd be nice to provide a way for users to join multiple RDDs on the > same key in a single shuffle. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
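As a hedged illustration of the two features mentioned (the linked PR adds a built-in fullOuterJoin; this sketch is not that code), the example below joins three RDDs on the same key with a single cogroup shuffle and expresses a full outer join in terms of a two-way cogroup. The data is made up.
{code}
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair RDD functions in Spark 1.x

object MultiJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "multi-join-sketch")
    val a = sc.parallelize(Seq(1 -> "a1", 2 -> "a2"))
    val b = sc.parallelize(Seq(2 -> "b2", 3 -> "b3"))
    val c = sc.parallelize(Seq(1 -> "c1", 3 -> "c3"))

    // One shuffle, three inputs grouped by the same key.
    val grouped = a.cogroup(b, c)
    grouped.collect().foreach(println)

    // A full outer join of a and b expressed with cogroup: emit a row for
    // every key seen on either side, padding the missing side with None.
    val fullOuter = a.cogroup(b).flatMap { case (k, (as, bs)) =>
      val left: Iterable[Option[String]]  = if (as.isEmpty) Seq(None) else as.map(Some(_))
      val right: Iterable[Option[String]] = if (bs.isEmpty) Seq(None) else bs.map(Some(_))
      for (x <- left; y <- right) yield (k, (x, y))
    }
    fullOuter.collect().foreach(println)
    sc.stop()
  }
}
{code}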
[jira] [Commented] (SPARK-3686) flume.SparkSinkSuite.Success is flaky
[ https://issues.apache.org/jira/browse/SPARK-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147316#comment-14147316 ] Hari Shreedharan commented on SPARK-3686: - Unlike the other tests in this suite, this one does not have a sleep to let the sink commit the transactions back to the channel. So because this does not give enough time for the channel to actually becoming empty. Let me add a sleep - will send a PR and run the pre-commit hook a bunch of times to ensure that it fixes it. > flume.SparkSinkSuite.Success is flaky > - > > Key: SPARK-3686 > URL: https://issues.apache.org/jira/browse/SPARK-3686 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Patrick Wendell >Assignee: Hari Shreedharan >Priority: Blocker > > {code} > Error Message > 4000 did not equal 5000 > Stacktrace > sbt.ForkMain$ForkError: 4000 did not equal 5000 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:498) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1559) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:416) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite.org$apache$spark$streaming$flume$sink$SparkSinkSuite$$assertChannelIsEmpty(SparkSinkSuite.scala:195) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply$mcV$sp(SparkSinkSuite.scala:54) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) > at > org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) > at > org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158) > at org.scalatest.Suite$class.withFixture(Suite.scala:1121) > at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:200) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1559) > at org.scalatest.Suite$class.run(Suite.scala:1423) > at > 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:204) > at org.scalatest.FunSuite.run(FunSuite.scala:1559) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:444) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:651) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Example test result (this will stop working in a
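The fix described above is simply to wait before asserting that the channel is empty. As an illustration only — not the code in PR #2531, which adds a sleep — a small polling helper avoids relying on a fixed delay; the "pending events" counter below stands in for checking the Flume channel.
{code}
import java.util.concurrent.atomic.AtomicInteger

// Poll a condition until it holds or a timeout expires, instead of sleeping
// for a fixed interval and hoping the sink has committed by then.
object EventuallySketch {
  def waitUntil(timeoutMs: Long, intervalMs: Long = 100L)(condition: => Boolean): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var ok = condition
    while (!ok && System.currentTimeMillis() < deadline) {
      Thread.sleep(intervalMs)
      ok = condition
    }
    ok
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for "the channel is empty"; in the suite this would check the
    // Flume channel's remaining events after the sink commits its transaction.
    val pendingEvents = new AtomicInteger(3)
    new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(300); pendingEvents.set(0) }
    }).start()
    assert(waitUntil(timeoutMs = 5000L) { pendingEvents.get == 0 }, "channel never drained")
    println("channel drained")
  }
}
{code}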
[jira] [Commented] (SPARK-3686) flume.SparkSinkSuite.Success is flaky
[ https://issues.apache.org/jira/browse/SPARK-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147314#comment-14147314 ] Hari Shreedharan commented on SPARK-3686: - Looking into this. > flume.SparkSinkSuite.Success is flaky > - > > Key: SPARK-3686 > URL: https://issues.apache.org/jira/browse/SPARK-3686 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Patrick Wendell >Assignee: Hari Shreedharan >Priority: Blocker > > {code} > Error Message > 4000 did not equal 5000 > Stacktrace > sbt.ForkMain$ForkError: 4000 did not equal 5000 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:498) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1559) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:416) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite.org$apache$spark$streaming$flume$sink$SparkSinkSuite$$assertChannelIsEmpty(SparkSinkSuite.scala:195) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply$mcV$sp(SparkSinkSuite.scala:54) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) > at > org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) > at > org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) > at > org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158) > at org.scalatest.Suite$class.withFixture(Suite.scala:1121) > at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:200) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1559) > at org.scalatest.Suite$class.run(Suite.scala:1423) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1559) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:204) > at 
org.scalatest.FunSuite.run(FunSuite.scala:1559) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:444) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:651) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Example test result (this will stop working in a few days): > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/719/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/testReport/junit/org.apache.spark.streaming.flume.sink/SparkSinkSuite/Success_with_ack/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-3666) Extract interfaces for EdgeRDD and VertexRDD
[ https://issues.apache.org/jira/browse/SPARK-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147280#comment-14147280 ] Apache Spark commented on SPARK-3666: - User 'ankurdave' has created a pull request for this issue: https://github.com/apache/spark/pull/2530 > Extract interfaces for EdgeRDD and VertexRDD > > > Key: SPARK-3666 > URL: https://issues.apache.org/jira/browse/SPARK-3666 > Project: Spark > Issue Type: Improvement > Components: GraphX >Reporter: Ankur Dave >Assignee: Ankur Dave > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3665) Java API for GraphX
[ https://issues.apache.org/jira/browse/SPARK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-3665: -- Description: The Java API will wrap the Scala API in a similar manner as JavaRDD. Components will include: # JavaGraph #- removes optional param from persist, subgraph, mapReduceTriplets, Graph.fromEdgeTuples, Graph.fromEdges, Graph.apply #- removes implicit {{=:=}} param from mapVertices, outerJoinVertices #- merges multiple parameters lists #- incorporates GraphOps # JavaVertexRDD # JavaEdgeRDD # JavaGraphLoader #- removes optional params, or uses builder pattern was: The Java API will wrap the Scala API in a similar manner as JavaRDD. Components will include: 1. JavaGraph -- removes optional param from persist, subgraph, mapReduceTriplets, Graph.fromEdgeTuples, Graph.fromEdges, Graph.apply -- removes implicit {{=:=}} param from mapVertices, outerJoinVertices -- merges multiple parameters lists -- incorporates GraphOps 2. JavaVertexRDD 3. JavaEdgeRDD 4. JavaGraphLoader -- removes optional params, or uses builder pattern > Java API for GraphX > --- > > Key: SPARK-3665 > URL: https://issues.apache.org/jira/browse/SPARK-3665 > Project: Spark > Issue Type: Improvement > Components: GraphX, Java API >Reporter: Ankur Dave >Assignee: Ankur Dave > > The Java API will wrap the Scala API in a similar manner as JavaRDD. > Components will include: > # JavaGraph > #- removes optional param from persist, subgraph, mapReduceTriplets, > Graph.fromEdgeTuples, Graph.fromEdges, Graph.apply > #- removes implicit {{=:=}} param from mapVertices, outerJoinVertices > #- merges multiple parameters lists > #- incorporates GraphOps > # JavaVertexRDD > # JavaEdgeRDD > # JavaGraphLoader > #- removes optional params, or uses builder pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
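To make the proposal concrete, here is a purely hypothetical sketch of what a JavaGraph wrapper could look like, in the spirit of JavaRDD wrapping RDD; none of these class or method names exist in GraphX today, and the real design may differ.
{code}
import scala.reflect.ClassTag
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical wrapper shape only: delegate to the Scala Graph and hide the
// Scala-specific implicit parameters from Java callers.
class JavaGraphSketch[VD, ED](val graph: Graph[VD, ED])
                             (implicit vdTag: ClassTag[VD], edTag: ClassTag[ED]) {
  /** Expose vertices and edges as Java-friendly RDDs. */
  def vertices: JavaRDD[(VertexId, VD)] = JavaRDD.fromRDD(graph.vertices)
  def edges: JavaRDD[Edge[ED]] = JavaRDD.fromRDD(graph.edges)

  /** mapVertices without the Scala-only implicit =:= evidence parameter. */
  def mapVertices[VD2: ClassTag](f: (VertexId, VD) => VD2): JavaGraphSketch[VD2, ED] =
    new JavaGraphSketch(graph.mapVertices(f))
}
{code}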
[jira] [Comment Edited] (SPARK-3610) History server log name should not be based on user input
[ https://issues.apache.org/jira/browse/SPARK-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147274#comment-14147274 ] Kousuke Saruta edited comment on SPARK-3610 at 9/25/14 2:35 AM: Hi [~SK], I'm trying to resolve similar issue and I think I can resolve this issue using Application ID. See https://github.com/apache/spark/pull/2432 was (Author: sarutak): Hi [~skrishna...@gmail.com], I'm trying to resolve similar issue and I think I can resolve this issue using Application ID. See https://github.com/apache/spark/pull/2432 > History server log name should not be based on user input > - > > Key: SPARK-3610 > URL: https://issues.apache.org/jira/browse/SPARK-3610 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: SK >Priority: Critical > > Right now we use the user-defined application name when creating the logging > file for the history server. We should use some type of GUID generated from > inside of Spark instead of allowing user input here. It can cause errors if > users provide characters that are not valid in filesystem paths. > Original bug report: > {quote} > The default log files for the Mllib examples use a rather long naming > convention that includes special characters like parentheses and comma.For > e.g. one of my log files is named > "binaryclassifier-with-params(input.txt,100,1.0,svm,l2,0.1)-1410566770032". > When I click on the program on the history server page (at port 18080), to > view the detailed application logs, the history server crashes and I need to > restart it. I am using Spark 1.1 on a mesos cluster. > I renamed the log file by removing the special characters and then it loads > up correctly. I am not sure which program is creating the log files. Can it > be changed so that the default log file naming convention does not include > special characters? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3610) History server log name should not be based on user input
[ https://issues.apache.org/jira/browse/SPARK-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147274#comment-14147274 ] Kousuke Saruta commented on SPARK-3610: --- Hi [~skrishna...@gmail.com], I'm trying to resolve similar issue and I think I can resolve this issue using Application ID. See https://github.com/apache/spark/pull/2432 > History server log name should not be based on user input > - > > Key: SPARK-3610 > URL: https://issues.apache.org/jira/browse/SPARK-3610 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: SK >Priority: Critical > > Right now we use the user-defined application name when creating the logging > file for the history server. We should use some type of GUID generated from > inside of Spark instead of allowing user input here. It can cause errors if > users provide characters that are not valid in filesystem paths. > Original bug report: > {quote} > The default log files for the Mllib examples use a rather long naming > convention that includes special characters like parentheses and comma.For > e.g. one of my log files is named > "binaryclassifier-with-params(input.txt,100,1.0,svm,l2,0.1)-1410566770032". > When I click on the program on the history server page (at port 18080), to > view the detailed application logs, the history server crashes and I need to > restart it. I am using Spark 1.1 on a mesos cluster. > I renamed the log file by removing the special characters and then it loads > up correctly. I am not sure which program is creating the log files. Can it > be changed so that the default log file naming convention does not include > special characters? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
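As an illustration of the idea in the issue description only — the linked PR instead derives the name from the application ID — an event-log name could be built from a generated identifier plus a sanitized form of the user-supplied app name; all names below are hypothetical.
{code}
import java.util.UUID

// Strip anything unsafe for filesystem paths and append a generated id,
// rather than using the raw user-supplied application name.
object EventLogNameSketch {
  def sanitize(name: String): String =
    name.toLowerCase.replaceAll("[^a-z0-9\\-_]", "-")

  def eventLogName(appName: String): String = {
    val id = UUID.randomUUID().toString.take(8)
    s"${sanitize(appName)}-$id-${System.currentTimeMillis()}"
  }

  def main(args: Array[String]): Unit = {
    // e.g. binaryclassifier-with-params-input-txt-...-<8-char id>-<timestamp>
    println(eventLogName("BinaryClassifier-with-params(input.txt,100,1.0,svm,L2,0.1)"))
  }
}
{code}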
[jira] [Commented] (SPARK-3412) Add Missing Types for Row API
[ https://issues.apache.org/jira/browse/SPARK-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147261#comment-14147261 ] Apache Spark commented on SPARK-3412: - User 'adrian-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/2529 > Add Missing Types for Row API > - > > Key: SPARK-3412 > URL: https://issues.apache.org/jira/browse/SPARK-3412 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Cheng Hao >Assignee: Cheng Hao >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3685) Spark's local dir scheme is not configurable
[ https://issues.apache.org/jira/browse/SPARK-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147206#comment-14147206 ] Andrew Or commented on SPARK-3685: -- Note that this is not meaningful unless we also change the usages of this to use the Hadoop FileSystem. This requires a non-trivial refactor of shuffle and spill code to use the Hadoop API. > Spark's local dir scheme is not configurable > > > Key: SPARK-3685 > URL: https://issues.apache.org/jira/browse/SPARK-3685 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.1.0 >Reporter: Andrew Or > > When you try to set local dirs to "hdfs:/tmp/foo" it doesn't work. What it > will try to do is create a folder called "hdfs:" and put "tmp" inside it. > This is because in Util#getOrCreateLocalRootDirs we use java.io.File instead > of Hadoop's file system to parse this path. We also need to resolve the path > appropriately. > This may not have an urgent use case, but it fails silently and does what is > least expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
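A sketch of the parsing side of this, with hypothetical helper names and not reflecting any actual patch: resolving the configured directory through Hadoop's FileSystem/Path API honors the scheme instead of treating "hdfs:" as a literal folder name. As the comment notes, the shuffle and spill code would also have to write through the same API for this to be meaningful.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Resolve a configured local-dir entry by scheme (hdfs://, file://, ...)
// instead of handing the raw string to java.io.File.
object LocalDirResolveSketch {
  def getOrCreateRootDir(dir: String, hadoopConf: Configuration): Path = {
    val path = new Path(dir)
    val fs = path.getFileSystem(hadoopConf) // picks the FS by the path's scheme
    val qualified = fs.makeQualified(path)
    if (!fs.exists(qualified)) fs.mkdirs(qualified)
    qualified
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // "file:/..." resolves on the local FS; with a NameNode configured, the
    // same call accepts "hdfs://namenode:8020/tmp/foo".
    println(getOrCreateRootDir("file:/tmp/spark-local-dir-sketch", conf))
  }
}
{code}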
[jira] [Created] (SPARK-3686) flume.SparkSinkSuite.Success is flaky
Patrick Wendell created SPARK-3686: -- Summary: flume.SparkSinkSuite.Success is flaky Key: SPARK-3686 URL: https://issues.apache.org/jira/browse/SPARK-3686 Project: Spark Issue Type: Bug Components: Streaming Reporter: Patrick Wendell Assignee: Hari Shreedharan Priority: Blocker {code} Error Message 4000 did not equal 5000 Stacktrace sbt.ForkMain$ForkError: 4000 did not equal 5000 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:498) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1559) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:416) at org.apache.spark.streaming.flume.sink.SparkSinkSuite.org$apache$spark$streaming$flume$sink$SparkSinkSuite$$assertChannelIsEmpty(SparkSinkSuite.scala:195) at org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply$mcV$sp(SparkSinkSuite.scala:54) at org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) at org.apache.spark.streaming.flume.sink.SparkSinkSuite$$anonfun$1.apply(SparkSinkSuite.scala:40) at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158) at org.scalatest.Suite$class.withFixture(Suite.scala:1121) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167) at org.scalatest.FunSuite.runTest(FunSuite.scala:1559) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:200) at org.scalatest.FunSuite.runTests(FunSuite.scala:1559) at org.scalatest.Suite$class.run(Suite.scala:1423) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1559) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:204) at org.scalatest.FunSuite.run(FunSuite.scala:1559) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:444) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:651) at sbt.ForkMain$Run$2.call(ForkMain.java:294) at sbt.ForkMain$Run$2.call(ForkMain.java:284) at 
java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} Example test result (this will stop working in a few days): https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/719/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/testReport/junit/org.apache.spark.streaming.flume.sink/SparkSinkSuite/Success_with_ack/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3476) Yarn ClientBase.validateArgs memory checks wrong
[ https://issues.apache.org/jira/browse/SPARK-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147180#comment-14147180 ] Apache Spark commented on SPARK-3476: - User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/2528 > Yarn ClientBase.validateArgs memory checks wrong > > > Key: SPARK-3476 > URL: https://issues.apache.org/jira/browse/SPARK-3476 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves > > The yarn ClientBase.validateArgs memory checks are no longer valid. It used > to be that the overhead was taken out of what the user specified, now we add > it on top of what the user specifies. We can probably just remove these. > (args.amMemory <= memoryOverhead) -> ("Error: AM memory size must be" + > "greater than: " + memoryOverhead), > (args.executorMemory <= memoryOverhead) -> ("Error: Executor memory > size" + > "must be greater than: " + memoryOverhead.toString) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
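For illustration, and assuming hypothetical argument-holder names that mirror the fields quoted above, the check could be reduced to simple positivity requirements — or, as the ticket suggests, dropped entirely; this is a sketch, not the eventual patch.
{code}
// Hypothetical stand-in for the YARN client arguments referenced in the ticket.
case class YarnArgsSketch(amMemory: Int, executorMemory: Int)

object ValidateArgsSketch {
  def validate(args: YarnArgsSketch): Unit = {
    // The overhead is now added on top of these values, so comparing them
    // against the overhead no longer makes sense; just require sane inputs.
    require(args.amMemory > 0, "Error: AM memory size must be positive: " + args.amMemory)
    require(args.executorMemory > 0,
      "Error: Executor memory size must be positive: " + args.executorMemory)
  }

  def main(argv: Array[String]): Unit = {
    validate(YarnArgsSketch(amMemory = 512, executorMemory = 1024))
    println("args look sane")
  }
}
{code}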
[jira] [Created] (SPARK-3685) Spark's local dir scheme is not configurable
Andrew Or created SPARK-3685: Summary: Spark's local dir scheme is not configurable Key: SPARK-3685 URL: https://issues.apache.org/jira/browse/SPARK-3685 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.1.0 Reporter: Andrew Or When you try to set local dirs to "hdfs:/tmp/foo" it doesn't work. What it will try to do is create a folder called "hdfs:" and put "tmp" inside it. This is because in Util#getOrCreateLocalRootDirs we use java.io.File instead of Hadoop's file system to parse this path. We also need to resolve the path appropriately. This may not have an urgent use case, but it fails silently and does what is least expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3615) Kafka test should not hard code Zookeeper port
[ https://issues.apache.org/jira/browse/SPARK-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3615. Resolution: Fixed https://github.com/apache/spark/pull/2483 > Kafka test should not hard code Zookeeper port > -- > > Key: SPARK-3615 > URL: https://issues.apache.org/jira/browse/SPARK-3615 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Patrick Wendell >Assignee: Saisai Shao >Priority: Blocker > > This is causing failures in our master build if port 2181 is contented. > Instead of binding to a static port we should re-factor this such that it > opens a socket on port 0 and then reads back the port. So we can never have > contention. > {code} > sbt.ForkMain$ForkError: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95) > at > org.apache.spark.streaming.kafka.KafkaTestUtils$EmbeddedZookeeper.(KafkaStreamSuite.scala:200) > at > org.apache.spark.streaming.kafka.KafkaStreamSuite.beforeFunction(KafkaStreamSuite.scala:62) > at > org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.setUp(JavaKafkaStreamSuite.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > at org.junit.runners.ParentRunner.run(ParentRunner.java:300) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:24) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > at org.junit.runners.ParentRunner.run(ParentRunner.java:300) > at org.junit.runner.JUnitCore.run(JUnitCore.java:157) > at org.junit.runner.JUnitCore.run(JUnitCore.java:136) > at 
com.novocode.junit.JUnitRunner.run(JUnitRunner.java:90) > at sbt.RunnerWrapper$1.runRunner2(FrameworkWrapper.java:223) > at sbt.RunnerWrapper$1.execute(FrameworkWrapper.java:236) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
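The "bind to port 0 and read the real port back" pattern suggested above, shown generically with a plain ServerSocket rather than the embedded ZooKeeper factory the suite actually uses; in the real test the factory itself should report the port it bound to, which also avoids the small reuse race this helper has.
{code}
import java.net.ServerSocket

object EphemeralPortSketch {
  def findFreePort(): Int = {
    val socket = new ServerSocket(0) // 0 = let the OS pick a free port
    try socket.getLocalPort finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    val port = findFreePort()
    println(s"embedded ZooKeeper could be told to listen on port $port")
  }
}
{code}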
[jira] [Updated] (SPARK-3681) Failed to serialized ArrayType or MapType after accessing them in Python
[ https://issues.apache.org/jira/browse/SPARK-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3681: --- Component/s: PySpark > Failed to serialized ArrayType or MapType after accessing them in Python > - > > Key: SPARK-3681 > URL: https://issues.apache.org/jira/browse/SPARK-3681 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Davies Liu >Assignee: Davies Liu > > {code} > files_schema_rdd.map(lambda x: x.files).take(1) > {code} > Also it will lose the schema after iterate an ArrayType. > {code} > files_schema_rdd.map(lambda x: [f.batch for f in x.files]).take(1) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3663) Document SPARK_LOG_DIR and SPARK_PID_DIR
[ https://issues.apache.org/jira/browse/SPARK-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3663: --- Component/s: Documentation > Document SPARK_LOG_DIR and SPARK_PID_DIR > > > Key: SPARK-3663 > URL: https://issues.apache.org/jira/browse/SPARK-3663 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Andrew Ash >Assignee: Andrew Ash > > I'm using these two parameters in some puppet scripts for standalone > deployment and realized that they're not documented anywhere. We should > document them -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3610) History server log name should not be based on user input
[ https://issues.apache.org/jira/browse/SPARK-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3610: --- Component/s: Spark Core > History server log name should not be based on user input > - > > Key: SPARK-3610 > URL: https://issues.apache.org/jira/browse/SPARK-3610 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: SK >Priority: Critical > > Right now we use the user-defined application name when creating the logging > file for the history server. We should use some type of GUID generated from > inside of Spark instead of allowing user input here. It can cause errors if > users provide characters that are not valid in filesystem paths. > Original bug report: > {quote} > The default log files for the Mllib examples use a rather long naming > convention that includes special characters like parentheses and comma.For > e.g. one of my log files is named > "binaryclassifier-with-params(input.txt,100,1.0,svm,l2,0.1)-1410566770032". > When I click on the program on the history server page (at port 18080), to > view the detailed application logs, the history server crashes and I need to > restart it. I am using Spark 1.1 on a mesos cluster. > I renamed the log file by removing the special characters and then it loads > up correctly. I am not sure which program is creating the log files. Can it > be changed so that the default log file naming convention does not include > special characters? > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
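Two possible shapes of the fix described above, sketched in plain Python rather than Spark's actual code; the app name is the one from the report:
{code}
import re
import time
import uuid

app_name = "binaryclassifier-with-params(input.txt,100,1.0,svm,l2,0.1)"

# Option 1: sanitize the user-supplied name before using it in a path.
safe_name = re.sub(r"[^A-Za-z0-9_.-]", "-", app_name.lower())

# Option 2: ignore user input and derive the log name from a generated id.
log_name = "app-%d-%s" % (int(time.time() * 1000), uuid.uuid4().hex[:8])

print(safe_name)   # binaryclassifier-with-params-input.txt-100-1.0-svm-l2-0.1-
print(log_name)    # e.g. app-1411600000000-3f9c2a1b
{code}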
[jira] [Resolved] (SPARK-3604) unbounded recursion in getNumPartitions triggers stack overflow for large UnionRDD
[ https://issues.apache.org/jira/browse/SPARK-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3604. Resolution: Not a Problem > unbounded recursion in getNumPartitions triggers stack overflow for large > UnionRDD > -- > > Key: SPARK-3604 > URL: https://issues.apache.org/jira/browse/SPARK-3604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 > Environment: linux. Used python, but error is in Scala land. >Reporter: Eric Friedman >Priority: Critical > > I have a large number of parquet files all with the same schema and attempted > to make a UnionRDD out of them. > When I call getNumPartitions(), I get a stack overflow error > that looks like this: > Py4JJavaError: An error occurred while calling o3275.partitions. > : java.lang.StackOverflowError > at > scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:239) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:243) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:65) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:202) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:65) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:65) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:65) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
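A workaround sketch for the recursion described above, assuming an existing SparkContext named sc (e.g. the pyspark shell): build one flat union with sc.union() instead of chaining rdd.union() calls, which nests one UnionRDD per input and makes partitions() recurse once per level of nesting.
{code}
# 500 small RDDs stand in for the many parquet inputs from the report.
rdds = [sc.parallelize(range(i, i + 10)) for i in range(500)]

# Chained unions nest ~500 UnionRDDs deep and can overflow the stack:
# deep = reduce(lambda a, b: a.union(b), rdds)

flat = sc.union(rdds)               # a single, flat union over all inputs
print(flat.getNumPartitions())
{code}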
[jira] [Created] (SPARK-3684) Can't configure local dirs in Yarn mode
Andrew Or created SPARK-3684: Summary: Can't configure local dirs in Yarn mode Key: SPARK-3684 URL: https://issues.apache.org/jira/browse/SPARK-3684 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.1.0 Reporter: Andrew Or We can't set SPARK_LOCAL_DIRS or spark.local.dirs because they're not picked up in Yarn mode. However, we can't set YARN_LOCAL_DIRS or LOCAL_DIRS either because these are overridden by Yarn. I'm trying to set these through SPARK_YARN_USER_ENV. I'm aware that the default behavior is for Spark to use Yarn's local dirs, but right now there's no way to change it even if the user wants to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2691: --- Assignee: Tim Chen (was: Timothy Hunter) > Allow Spark on Mesos to be launched with Docker > --- > > Key: SPARK-2691 > URL: https://issues.apache.org/jira/browse/SPARK-2691 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Timothy Chen >Assignee: Tim Chen > Labels: mesos > > Currently to launch Spark with Mesos one must upload a tarball and specify > the executor URI to be passed in, which is to be downloaded on each slave or > even on each execution, depending on whether coarse mode is used or not. > We want to make Spark able to support launching Executors via a Docker image > that utilizes the recent Docker and Mesos integration work. > With the recent integration Spark can simply specify a Docker image and > the options that are needed and it should continue to work as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2691: --- Assignee: Timothy Hunter > Allow Spark on Mesos to be launched with Docker > --- > > Key: SPARK-2691 > URL: https://issues.apache.org/jira/browse/SPARK-2691 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Timothy Chen >Assignee: Timothy Hunter > Labels: mesos > > Currently to launch Spark with Mesos one must upload a tarball and specify > the executor URI to be passed in, which is to be downloaded on each slave or > even on each execution, depending on whether coarse mode is used or not. > We want to make Spark able to support launching Executors via a Docker image > that utilizes the recent Docker and Mesos integration work. > With the recent integration Spark can simply specify a Docker image and > the options that are needed and it should continue to work as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2691: --- Assignee: Timothy Chen (was: Tim Chen) > Allow Spark on Mesos to be launched with Docker > --- > > Key: SPARK-2691 > URL: https://issues.apache.org/jira/browse/SPARK-2691 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Timothy Chen >Assignee: Timothy Chen > Labels: mesos > > Currently to launch Spark with Mesos one must upload a tarball and specify > the executor URI to be passed in, which is to be downloaded on each slave or > even on each execution, depending on whether coarse mode is used or not. > We want to make Spark able to support launching Executors via a Docker image > that utilizes the recent Docker and Mesos integration work. > With the recent integration Spark can simply specify a Docker image and > the options that are needed and it should continue to work as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3678) Yarn app name reported in RM is different between cluster and client mode
[ https://issues.apache.org/jira/browse/SPARK-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3678: - Affects Version/s: (was: 1.2.0) 1.1.0 > Yarn app name reported in RM is different between cluster and client mode > - > > Key: SPARK-3678 > URL: https://issues.apache.org/jira/browse/SPARK-3678 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves > > If you launch an application in yarn cluster mode the name of the application > in the ResourceManager generally shows up as the full name > org.apache.spark.examples.SparkHdfsLR. If you start the same app in client > mode it shows up as SparkHdfsLR. > We should be consistent between them. > I haven't looked at it in detail, perhaps its only the examples but I think > I've seen this with customer apps also. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3683) PySpark Hive query generates "NULL" instead of None
Tamas Jambor created SPARK-3683: --- Summary: PySpark Hive query generates "NULL" instead of None Key: SPARK-3683 URL: https://issues.apache.org/jira/browse/SPARK-3683 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 1.1.0 Reporter: Tamas Jambor When I run a Hive query in Spark SQL, I get the new Row object, which does not convert Hive NULL into Python None; instead it keeps it as the string 'NULL'. It's only an issue with the String type; it works with other types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
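A possible user-side workaround until the conversion is fixed, assuming a HiveContext bound to sqlContext and a hypothetical string column called name:
{code}
rows = sqlContext.sql("SELECT name FROM some_table")

def null_to_none(value):
    # Hive NULL currently arrives as the literal string 'NULL' for string columns.
    return None if value == 'NULL' else value

print(rows.map(lambda row: null_to_none(row.name)).take(5))
{code}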
[jira] [Commented] (SPARK-889) Bring back DFS broadcast
[ https://issues.apache.org/jira/browse/SPARK-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147047#comment-14147047 ] Reynold Xin commented on SPARK-889: --- Yea I think we should close this as won't fix for now. > Bring back DFS broadcast > > > Key: SPARK-889 > URL: https://issues.apache.org/jira/browse/SPARK-889 > Project: Spark > Issue Type: Improvement >Reporter: Matei Zaharia >Priority: Minor > > DFS broadcast was a simple way to get better-than-single-master performance > for broadcast, so we should add it back for people who have HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-889) Bring back DFS broadcast
[ https://issues.apache.org/jira/browse/SPARK-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-889. --- Resolution: Won't Fix > Bring back DFS broadcast > > > Key: SPARK-889 > URL: https://issues.apache.org/jira/browse/SPARK-889 > Project: Spark > Issue Type: Improvement >Reporter: Matei Zaharia >Priority: Minor > > DFS broadcast was a simple way to get better-than-single-master performance > for broadcast, so we should add it back for people who have HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3639) Kinesis examples set master as local
[ https://issues.apache.org/jira/browse/SPARK-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147042#comment-14147042 ] Josh Rosen commented on SPARK-3639: --- This sounds reasonable to me; feel free to open a PR. If you look at most of the other Spark examples, they only set the appName when creating the SparkContext and leave the master unspecified in order to allow it to be set when passing the script to {{spark-submit}}. > Kinesis examples set master as local > > > Key: SPARK-3639 > URL: https://issues.apache.org/jira/browse/SPARK-3639 > Project: Spark > Issue Type: Bug > Components: Examples, Streaming >Affects Versions: 1.0.2, 1.1.0 >Reporter: Aniket Bhatnagar >Priority: Minor > Labels: examples > > Kinesis examples set master as local thus not allowing the example to be > tested on a cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
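The pattern Josh describes, sketched in PySpark for brevity (the Kinesis examples themselves are Scala/Java, and the app name below is made up): set only the application name and let spark-submit supply the master, e.g. spark-submit --master yarn-client example.py.
{code}
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("KinesisWordCountExample")   # note: no setMaster() call
sc = SparkContext(conf=conf)
{code}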
[jira] [Commented] (SPARK-889) Bring back DFS broadcast
[ https://issues.apache.org/jira/browse/SPARK-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147032#comment-14147032 ] Josh Rosen commented on SPARK-889: -- In fact, I think [~rxin] has some JIRAs and PRs to make TorrentBroadcast _even_ better than it is now (it was greatly improved from 1.0.2 to 1.1.0), so it's probably safe to close this. > Bring back DFS broadcast > > > Key: SPARK-889 > URL: https://issues.apache.org/jira/browse/SPARK-889 > Project: Spark > Issue Type: Improvement >Reporter: Matei Zaharia >Priority: Minor > > DFS broadcast was a simple way to get better-than-single-master performance > for broadcast, so we should add it back for people who have HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3682: -- Target Version/s: 1.3.0 (was: 1.2.0) > Add helpful warnings to the UI > -- > > Key: SPARK-3682 > URL: https://issues.apache.org/jira/browse/SPARK-3682 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 1.1.0 >Reporter: Sandy Ryza > > Spark has a zillion configuration options and a zillion different things that > can go wrong with a job. Improvements like incremental and better metrics > and the proposed spark replay debugger provide more insight into what's going > on under the covers. However, it's difficult for non-advanced users to > synthesize this information and understand where to direct their attention. > It would be helpful to have some sort of central location on the UI users > could go to that would provide indications about why an app/job is failing or > performing poorly. > Some helpful messages that we could provide: > * Warn that the tasks in a particular stage are spending a long time in GC. > * Warn that spark.shuffle.memoryFraction does not fit inside the young > generation. > * Warn that tasks in a particular stage are very short, and that the number > of partitions should probably be decreased. > * Warn that tasks in a particular stage are spilling a lot, and that the > number of partitions should probably be decreased. > * Warn that a cached RDD that gets a lot of use does not fit in memory, and a > lot of time is being spent recomputing it. > To start, probably two kinds of warnings would be most helpful. > * Warnings at the app level that report on misconfigurations, issues with the > general health of executors. > * Warnings at the job level that indicate why a job might be performing > slowly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3682) Add helpful warnings to the UI
[ https://issues.apache.org/jira/browse/SPARK-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3682: - Target Version/s: 1.2.0 Affects Version/s: 1.1.0 > Add helpful warnings to the UI > -- > > Key: SPARK-3682 > URL: https://issues.apache.org/jira/browse/SPARK-3682 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 1.1.0 >Reporter: Sandy Ryza > > Spark has a zillion configuration options and a zillion different things that > can go wrong with a job. Improvements like incremental and better metrics > and the proposed spark replay debugger provide more insight into what's going > on under the covers. However, it's difficult for non-advanced users to > synthesize this information and understand where to direct their attention. > It would be helpful to have some sort of central location on the UI users > could go to that would provide indications about why an app/job is failing or > performing poorly. > Some helpful messages that we could provide: > * Warn that the tasks in a particular stage are spending a long time in GC. > * Warn that spark.shuffle.memoryFraction does not fit inside the young > generation. > * Warn that tasks in a particular stage are very short, and that the number > of partitions should probably be decreased. > * Warn that tasks in a particular stage are spilling a lot, and that the > number of partitions should probably be decreased. > * Warn that a cached RDD that gets a lot of use does not fit in memory, and a > lot of time is being spent recomputing it. > To start, probably two kinds of warnings would be most helpful. > * Warnings at the app level that report on misconfigurations, issues with the > general health of executors. > * Warnings at the job level that indicate why a job might be performing > slowly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2131) Collect per-task filesystem-bytes-read/written metrics
[ https://issues.apache.org/jira/browse/SPARK-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2131. --- Resolution: Duplicate > Collect per-task filesystem-bytes-read/written metrics > -- > > Key: SPARK-2131 > URL: https://issues.apache.org/jira/browse/SPARK-2131 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Sandy Ryza > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3682) Add helpful warnings to the UI
Sandy Ryza created SPARK-3682: - Summary: Add helpful warnings to the UI Key: SPARK-3682 URL: https://issues.apache.org/jira/browse/SPARK-3682 Project: Spark Issue Type: New Feature Components: Web UI Reporter: Sandy Ryza Spark has a zillion configuration options and a zillion different things that can go wrong with a job. Improvements like incremental and better metrics and the proposed spark replay debugger provide more insight into what's going on under the covers. However, it's difficult for non-advanced users to synthesize this information and understand where to direct their attention. It would be helpful to have some sort of central location on the UI users could go to that would provide indications about why an app/job is failing or performing poorly. Some helpful messages that we could provide: * Warn that the tasks in a particular stage are spending a long time in GC. * Warn that spark.shuffle.memoryFraction does not fit inside the young generation. * Warn that tasks in a particular stage are very short, and that the number of partitions should probably be decreased. * Warn that tasks in a particular stage are spilling a lot, and that the number of partitions should probably be decreased. * Warn that a cached RDD that gets a lot of use does not fit in memory, and a lot of time is being spent recomputing it. To start, probably two kinds of warnings would be most helpful. * Warnings at the app level that report on misconfigurations, issues with the general health of executors. * Warnings at the job level that indicate why a job might be performing slowly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
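A toy sketch of one of the proposed app-level warnings; the per-task numbers and the 20% threshold are made up, and Spark's real metrics objects are not used here:
{code}
tasks = [
    {"id": 1, "duration_ms": 4000, "gc_time_ms": 300},
    {"id": 2, "duration_ms": 3500, "gc_time_ms": 1200},
]

GC_WARN_FRACTION = 0.2   # warn if more than 20% of task time was spent in GC

for t in tasks:
    if t["gc_time_ms"] > GC_WARN_FRACTION * t["duration_ms"]:
        print("WARN: task %d spent %.0f%% of its time in GC"
              % (t["id"], 100.0 * t["gc_time_ms"] / t["duration_ms"]))
{code}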
[jira] [Commented] (SPARK-3681) Failed to serialized ArrayType or MapType after accessing them in Python
[ https://issues.apache.org/jira/browse/SPARK-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146903#comment-14146903 ] Apache Spark commented on SPARK-3681: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/2526 > Failed to serialized ArrayType or MapType after accessing them in Python > - > > Key: SPARK-3681 > URL: https://issues.apache.org/jira/browse/SPARK-3681 > Project: Spark > Issue Type: Bug >Reporter: Davies Liu >Assignee: Davies Liu > > {code} > files_schema_rdd.map(lambda x: x.files).take(1) > {code} > Also it will lose the schema after iterate an ArrayType. > {code} > files_schema_rdd.map(lambda x: [f.batch for f in x.files]).take(1) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3681) Failed to serialized ArrayType or MapType after accessing them in Python
Davies Liu created SPARK-3681: - Summary: Failed to serialized ArrayType or MapType after accessing them in Python Key: SPARK-3681 URL: https://issues.apache.org/jira/browse/SPARK-3681 Project: Spark Issue Type: Bug Reporter: Davies Liu {code} files_schema_rdd.map(lambda x: x.files).take(1) {code} Also it will lose the schema after iterating over an ArrayType. {code} files_schema_rdd.map(lambda x: [f.batch for f in x.files]).take(1) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3679) pickle the exact globals of functions
[ https://issues.apache.org/jira/browse/SPARK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-3679. --- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2522 [https://github.com/apache/spark/pull/2522] > pickle the exact globals of functions > - > > Key: SPARK-3679 > URL: https://issues.apache.org/jira/browse/SPARK-3679 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Critical > Fix For: 1.2.0 > > > function.func_code.co_names has all the names used in the function, including > name of attributes. It will pickle some unnecessary globals if there is a > global having the same name with attribute (in co_names). > There is a regression introduced by PR 2114 > https://github.com/apache/spark/pull/2144/files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-889) Bring back DFS broadcast
[ https://issues.apache.org/jira/browse/SPARK-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146804#comment-14146804 ] Andrew Ash commented on SPARK-889: -- [~matei] should we close this ticket as Won't Fix then, since effort is better spent making TorrentBroadcast better? > Bring back DFS broadcast > > > Key: SPARK-889 > URL: https://issues.apache.org/jira/browse/SPARK-889 > Project: Spark > Issue Type: Improvement >Reporter: Matei Zaharia >Priority: Minor > > DFS broadcast was a simple way to get better-than-single-master performance > for broadcast, so we should add it back for people who have HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3634) Python modules added through addPyFile should take precedence over system modules
[ https://issues.apache.org/jira/browse/SPARK-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-3634. --- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2492 [https://github.com/apache/spark/pull/2492] > Python modules added through addPyFile should take precedence over system > modules > - > > Key: SPARK-3634 > URL: https://issues.apache.org/jira/browse/SPARK-3634 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.0.2, 1.1.0 >Reporter: Josh Rosen > Fix For: 1.2.0 > > > Python modules added through {{SparkContext.addPyFile()}} are currently > _appended_ to {{sys.path}}; this is probably the opposite of the behavior > that we want, since it causes system versions of modules to take precedence > over versions explicitly added by users. > To fix this, we should change the {{sys.path}} manipulation code in > {{context.py}} and {{worker.py}} to prepend files to {{sys.path}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
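The change itself is small; a plain-Python sketch of the prepend-vs-append difference (the directory path is hypothetical):
{code}
import sys

user_module_dir = "/tmp/spark-user-files"   # where addPyFile payloads would be extracted

# Old behaviour: appended, so a system-installed copy of a module shadows the user's version.
# sys.path.append(user_module_dir)

# New behaviour: prepended, so user-supplied modules take precedence.
sys.path.insert(0, user_module_dir)
{code}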
[jira] [Commented] (SPARK-3680) java.lang.Exception: makeCopy when using HiveGeneric UDFs on Converted Parquet Metastore tables
[ https://issues.apache.org/jira/browse/SPARK-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146738#comment-14146738 ] Apache Spark commented on SPARK-3680: - User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/2525 > java.lang.Exception: makeCopy when using HiveGeneric UDFs on Converted > Parquet Metastore tables > --- > > Key: SPARK-3680 > URL: https://issues.apache.org/jira/browse/SPARK-3680 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3680) java.lang.Exception: makeCopy when using HiveGeneric UDFs on Converted Parquet Metastore tables
Michael Armbrust created SPARK-3680: --- Summary: java.lang.Exception: makeCopy when using HiveGeneric UDFs on Converted Parquet Metastore tables Key: SPARK-3680 URL: https://issues.apache.org/jira/browse/SPARK-3680 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146718#comment-14146718 ] Apache Spark commented on SPARK-3628: - User 'CodingCat' has created a pull request for this issue: https://github.com/apache/spark/pull/2524 > Don't apply accumulator updates multiple times for tasks in result stages > - > > Key: SPARK-3628 > URL: https://issues.apache.org/jira/browse/SPARK-3628 > Project: Spark > Issue Type: Bug >Reporter: Matei Zaharia >Assignee: Nan Zhu >Priority: Blocker > > In previous versions of Spark, accumulator updates only got applied once for > accumulators that are only used in actions (i.e. result stages), letting you > use them to deterministically compute a result. Unfortunately, this got > broken in some recent refactorings. > This is related to https://issues.apache.org/jira/browse/SPARK-732, but that > issue is about applying the same semantics to intermediate stages too, which > is more work and may not be what we want for debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146714#comment-14146714 ] Nan Zhu commented on SPARK-3628: https://github.com/apache/spark/pull/2524 > Don't apply accumulator updates multiple times for tasks in result stages > - > > Key: SPARK-3628 > URL: https://issues.apache.org/jira/browse/SPARK-3628 > Project: Spark > Issue Type: Bug >Reporter: Matei Zaharia >Priority: Blocker > > In previous versions of Spark, accumulator updates only got applied once for > accumulators that are only used in actions (i.e. result stages), letting you > use them to deterministically compute a result. Unfortunately, this got > broken in some recent refactorings. > This is related to https://issues.apache.org/jira/browse/SPARK-732, but that > issue is about applying the same semantics to intermediate stages too, which > is more work and may not be what we want for debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
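The semantics under discussion, as a small PySpark sketch (the local master is set only so the snippet runs standalone): an accumulator updated in a result stage, here via foreach, should have each update applied exactly once.
{code}
from pyspark import SparkContext

sc = SparkContext("local[4]", "accumulator-example")
acc = sc.accumulator(0)

def count_it(_):
    acc.add(1)

sc.parallelize(range(1000), 4).foreach(count_it)

# Expected: exactly 1000, even if individual tasks were retried; the issue
# reports that this guarantee regressed for result stages.
print(acc.value)
{code}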
[jira] [Resolved] (SPARK-3659) Set EC2 version to 1.1.0 in master branch
[ https://issues.apache.org/jira/browse/SPARK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3659. Resolution: Fixed Fix Version/s: 1.2.0 https://github.com/apache/spark/pull/2510 > Set EC2 version to 1.1.0 in master branch > - > > Key: SPARK-3659 > URL: https://issues.apache.org/jira/browse/SPARK-3659 > Project: Spark > Issue Type: Bug > Components: EC2 >Reporter: Shivaram Venkataraman >Assignee: Shivaram Venkataraman >Priority: Minor > Fix For: 1.2.0 > > > Master branch should be in sync with branch-1.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3679) pickle the exact globals of functions
[ https://issues.apache.org/jira/browse/SPARK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146691#comment-14146691 ] Apache Spark commented on SPARK-3679: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/2522 > pickle the exact globals of functions > - > > Key: SPARK-3679 > URL: https://issues.apache.org/jira/browse/SPARK-3679 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Critical > > function.func_code.co_names has all the names used in the function, including > name of attributes. It will pickle some unnecessary globals if there is a > global having the same name with attribute (in co_names). > There is a regression introduced by PR 2114 > https://github.com/apache/spark/pull/2144/files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3679) pickle the exact globals of functions
Davies Liu created SPARK-3679: - Summary: pickle the exact globals of functions Key: SPARK-3679 URL: https://issues.apache.org/jira/browse/SPARK-3679 Project: Spark Issue Type: Bug Components: PySpark Reporter: Davies Liu Priority: Critical function.func_code.co_names has all the names used in the function, including the names of attributes. It will pickle some unnecessary globals if there is a global with the same name as an attribute (in co_names). This is a regression introduced by PR 2114: https://github.com/apache/spark/pull/2144/files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
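A self-contained illustration of the name collision being described, in plain Python rather than the Spark code path itself:
{code}
files = list(range(1000))           # module-level global named "files"

class Record(object):
    def __init__(self, files):
        self.files = files          # attribute with the same name

def extract(record):
    return record.files             # only attribute access is intended here

# Attribute names also land in co_names (func_code.co_names on Python 2),
# so a pickler that treats every co_names entry as a global would also
# capture the unrelated module-level list above.
print(extract(Record([1, 2, 3])))              # [1, 2, 3]
print('files' in extract.__code__.co_names)    # True
{code}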
[jira] [Updated] (SPARK-3466) Limit size of results that a driver collects for each action
[ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash updated SPARK-3466: -- Description: Right now, operations like {{collect()}} and {{take()}} can crash the driver with an OOM if they bring back too many data. We should add a {{spark.driver.maxResultSize}} setting (or something like that) that will make the driver abort a job if its result is too big. We can set it to some fraction of the driver's memory by default, or to something like 100 MB. (was: Right now, operations like collect() and take() can crash the driver if they bring back too many data. We should add a spark.driver.maxResultSize setting (or something like that) that will make the driver abort a job if its result is too big. We can set it to some fraction of the driver's memory by default, or to something like 100 MB.) > Limit size of results that a driver collects for each action > > > Key: SPARK-3466 > URL: https://issues.apache.org/jira/browse/SPARK-3466 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Matei Zaharia > > Right now, operations like {{collect()}} and {{take()}} can crash the driver > with an OOM if they bring back too many data. We should add a > {{spark.driver.maxResultSize}} setting (or something like that) that will > make the driver abort a job if its result is too big. We can set it to some > fraction of the driver's memory by default, or to something like 100 MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example
[ https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146639#comment-14146639 ] Evan Samanas commented on SPARK-3662: - I wouldn't focus on the example, that I modified it, or whether I should be importing a small portion of pandas. The issue here is that Spark breaks in this case because of a name collision. Modifying the example is simply the one reproducer I've found. I was modifying the example to learn about how Spark ships Python code to the cluster. In this case, I expected pandas to only be imported in the driver program and not to be imported by any workers. The workers do not have pandas installed, so expected behavior means the example would run to completion, and an ImportError would mean that the workers are importing things they don't need for the task at hand. The way I expected Spark to work IS actually how Spark works...modules will only be imported by workers if a function passed to them uses the modules, but this error showed me false evidence to the contrary. I'm assuming the error is in Spark's modifications to CloudPickle...not in the way the example is set up. > Importing pandas breaks included pi.py example > -- > > Key: SPARK-3662 > URL: https://issues.apache.org/jira/browse/SPARK-3662 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 1.1.0 > Environment: Xubuntu 14.04. Yarn cluster running on Ubuntu 12.04. >Reporter: Evan Samanas > > If I add "import pandas" at the top of the included pi.py example and submit > using "spark-submit --master yarn-client", I get this stack trace: > {code} > Traceback (most recent call last): > File "/home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py", line > 39, in > count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add) > File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 759, in reduce > vals = self.mapPartitions(func).collect() > File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 723, in collect > bytesInJava = self._jrdd.collect().iterator() > File > "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", > line 538, in __call__ > File > "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", > line 300, in get_return_value > py4j.protocol.Py4JJavaError14/09/23 15:51:58 INFO TaskSetManager: Lost task > 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED: > org.apache.spark.api.python.PythonException (Traceback (most recent call > last): > File > "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py", > line 75, in main > command = pickleSer._read_with_length(infile) > File > "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py", > line 150, in _read_with_length > return self.loads(obj) > ImportError: No module named algos > {code} > The example works fine if I move the statement "from random import random" > from the top and into the function (def f(_)) defined in the example. Near > as I can tell, "random" is getting confused with a function of the same name > within pandas.algos. > Submitting the same script using --master local works, but gives a > distressing amount of random characters to stdout or stderr and messes up my > terminal: > {code} > ... 
> @J@J@J@J@J@J@J@J@J@J@J@J@J@JJ@J@J@J@J > @J!@J"@J#@J$@J%@J&@J'@J(@J)@J*@J+@J,@J-@J.@J/@J0@J1@J2@J3@J4@J5@J6@J7@J8@J9@J:@J;@J<@J=@J>@J?@J@@JA@JB@JC@JD@JE@JF@JG@JH@JI@JJ@JK@JL@JM@JN@JO@JP@JQ@JR@JS@JT@JU@JV@JW@JX@JY@JZ@J[@J\@J]@J^@J_@J`@Ja@Jb@Jc@Jd@Je@Jf@Jg@Jh@Ji@Jj@Jk@Jl@Jm@Jn@Jo@Jp@Jq@Jr@Js@Jt@Ju@Jv@Jw@Jx@Jy@Jz@J{@J|@J}@J~@J@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JJJ�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JAJAJAJAJAJAJAJAAJ >AJ > AJ > AJ > AJAJAJAJAJAJAJAJAJAJAJAJAJAJJAJAJAJAJ > AJ!AJ"AJ#AJ$AJ%AJ&AJ'AJ(AJ)AJ*AJ+AJ,AJ-AJ.AJ/AJ0AJ1AJ2AJ3AJ4AJ5AJ6AJ7AJ8AJ9AJ:AJ;AJAJ?AJ@AJAAJBAJCAJDAJEAJFAJGAJHAJIAJJAJKAJLAJMAJNAJOAJPAJQAJRAJSAJTAJUAJVAJWAJXAJYAJZAJ[AJ\AJ]AJ^AJ_AJ`AJaAJbAJcAJdAJeAJfAJgAJhAJiAJjAJkAJlAJmAJnAJoAJpAJqAJrAJsAJtAJuAJvAJwAJxAJyAJzAJ{AJ|AJ}AJ~AJAJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJJJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�A14/09/23 > 15:42:09 INFO SparkContext: Job finished: reduce at > /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took > 11.276879779 s > J�AJ�AJ�AJ�AJ�AJ�AJ�AJ
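The workaround the reporter describes (moving the random import into the mapped function) looks roughly like this, sketched against the pi.py mapper:
{code}
import pandas   # driver-only import; the workers never need it

def f(_):
    from random import random   # imported inside the shipped function, avoiding the pandas.algos collision
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0
{code}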
[jira] [Commented] (SPARK-3466) Limit size of results that a driver collects for each action
[ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146640#comment-14146640 ] Andrew Ash commented on SPARK-3466: --- How would you design this feature? I can imagine measuring the size of partitions / RDD elements while they are held in memory across the cluster, sending those sizes back to the driver, and having the driver throw an exception if the requested size exceeds the threshold. Otherwise proceed as normal. Is that how you were envisioning implementation? > Limit size of results that a driver collects for each action > > > Key: SPARK-3466 > URL: https://issues.apache.org/jira/browse/SPARK-3466 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Matei Zaharia > > Right now, operations like collect() and take() can crash the driver if they > bring back too many data. We should add a spark.driver.maxResultSize setting > (or something like that) that will make the driver abort a job if its result > is too big. We can set it to some fraction of the driver's memory by default, > or to something like 100 MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
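One rough user-level approximation of the idea, not Spark internals: sample a few elements, estimate the average serialized size, and abort before collecting if the extrapolated total exceeds a threshold (the 100 MB limit below is arbitrary).
{code}
import pickle

MAX_RESULT_BYTES = 100 * 1024 * 1024

def guarded_collect(rdd, limit=MAX_RESULT_BYTES):
    count = rdd.count()
    sample = rdd.take(100)
    if sample:
        avg = sum(len(pickle.dumps(x, -1)) for x in sample) / float(len(sample))
        if avg * count > limit:
            raise RuntimeError("estimated result size %.0f bytes exceeds %d"
                               % (avg * count, limit))
    return rdd.collect()
{code}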
[jira] [Created] (SPARK-3678) Yarn app name reported in RM is different between cluster and client mode
Thomas Graves created SPARK-3678: Summary: Yarn app name reported in RM is different between cluster and client mode Key: SPARK-3678 URL: https://issues.apache.org/jira/browse/SPARK-3678 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Reporter: Thomas Graves If you launch an application in yarn cluster mode, the name of the application in the ResourceManager generally shows up as the full name org.apache.spark.examples.SparkHdfsLR. If you start the same app in client mode, it shows up as SparkHdfsLR. We should be consistent between them. I haven't looked at it in detail; perhaps it's only the examples, but I think I've seen this with customer apps also. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146246#comment-14146246 ] Ryan D Braley commented on SPARK-2691: -- +1. Spark typically lags behind Mesos in version numbers, so if you run Mesos today you have to choose between Spark and Docker. With this we could have our cake and eat it too :) > Allow Spark on Mesos to be launched with Docker > --- > > Key: SPARK-2691 > URL: https://issues.apache.org/jira/browse/SPARK-2691 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Timothy Chen > Labels: mesos > > Currently to launch Spark with Mesos one must upload a tarball and specify > the executor URI to be passed in, which is to be downloaded on each slave or > even on each execution, depending on whether coarse mode is used or not. > We want to make Spark able to support launching Executors via a Docker image > that utilizes the recent Docker and Mesos integration work. > With the recent integration Spark can simply specify a Docker image and > the options that are needed and it should continue to work as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3639) Kinesis examples set master as local
[ https://issues.apache.org/jira/browse/SPARK-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146245#comment-14146245 ] Matthew Farrellee commented on SPARK-3639: -- seems reasonable to me > Kinesis examples set master as local > > > Key: SPARK-3639 > URL: https://issues.apache.org/jira/browse/SPARK-3639 > Project: Spark > Issue Type: Bug > Components: Examples, Streaming >Affects Versions: 1.0.2, 1.1.0 >Reporter: Aniket Bhatnagar >Priority: Minor > Labels: examples > > Kinesis examples set master as local thus not allowing the example to be > tested on a cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3677) Scalastyle is never applied to the sources under yarn/common
[ https://issues.apache.org/jira/browse/SPARK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146171#comment-14146171 ] Apache Spark commented on SPARK-3677: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/2520 > Scalastyle is never applied to the sources under yarn/common > > > Key: SPARK-3677 > URL: https://issues.apache.org/jira/browse/SPARK-3677 > Project: Spark > Issue Type: Bug > Components: Build, YARN >Affects Versions: 1.2.0 >Reporter: Kousuke Saruta > > When we run "sbt -Pyarn scalastyle", scalastyle is not applied to the sources > under yarn/common. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3677) Scalastyle is never applied to the sources under yarn/common
Kousuke Saruta created SPARK-3677: - Summary: Scalastyle is never applied to the sources under yarn/common Key: SPARK-3677 URL: https://issues.apache.org/jira/browse/SPARK-3677 Project: Spark Issue Type: Bug Components: Build, YARN Affects Versions: 1.2.0 Reporter: Kousuke Saruta When we run "sbt -Pyarn scalastyle", scalastyle is not applied to the sources under yarn/common. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3526) Docs section on data locality
[ https://issues.apache.org/jira/browse/SPARK-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146094#comment-14146094 ] Apache Spark commented on SPARK-3526: - User 'ash211' has created a pull request for this issue: https://github.com/apache/spark/pull/2519 > Docs section on data locality > - > > Key: SPARK-3526 > URL: https://issues.apache.org/jira/browse/SPARK-3526 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.0.2 >Reporter: Andrew Ash >Assignee: Andrew Ash > > Several threads on the mailing list have been about data locality and how to > interpret PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, etc. Let's get some more > details in the docs on this concept so we can point future questions there. > A couple people appreciated the below description of locality so it could be > a good starting point: > {quote} > The locality is how close the data is to the code that's processing it. > PROCESS_LOCAL means data is in the same JVM as the code that's running, so > it's really fast. NODE_LOCAL might mean that the data is in HDFS on the same > node, or in another executor on the same node, so is a little slower because > the data has to travel across an IPC connection. RACK_LOCAL is even slower > -- data is on a different server so needs to be sent over the network. > Spark switches to lower locality levels when there's no unprocessed data on a > node that has idle CPUs. In that situation you have two options: wait until > the busy CPUs free up so you can start another task that uses data on that > server, or start a new task on a farther away server that needs to bring data > from that remote place. What Spark typically does is wait a bit in the hopes > that a busy CPU frees up. Once that timeout expires, it starts moving the > data from far away to the free CPU. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
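For reference, the scheduler's willingness to wait for a better locality level is already tunable; the values below (milliseconds) are illustrative, not recommendations:
{code}
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("locality-demo")
        .set("spark.locality.wait", "3000")        # how long to wait before dropping a locality level
        .set("spark.locality.wait.node", "3000")   # NODE_LOCAL-specific override
        .set("spark.locality.wait.rack", "3000"))  # RACK_LOCAL-specific override
sc = SparkContext(conf=conf)
{code}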
[jira] [Commented] (SPARK-3676) jdk version lead to spark sql test suite error
[ https://issues.apache.org/jira/browse/SPARK-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146050#comment-14146050 ] wangfei commented on SPARK-3676: hmm, i see, thanks for that. > jdk version lead to spark sql test suite error > -- > > Key: SPARK-3676 > URL: https://issues.apache.org/jira/browse/SPARK-3676 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: wangfei > Fix For: 1.2.0 > > > System.out.println(1/500d) get different result in diff jdk version > jdk 1.6.0(_31) 0.0020 > jdk 1.7.0(_05) 0.002 > this will lead to spark sql hive test suite failed (replay by set jdk > version = 1.6.0_31)--- > [info] - division *** FAILED *** > [info] Results do not match for division: > [info] SELECT 2 / 1, 1 / 2, 1 / 3, 1 / COUNT(*) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [(2 / 1) AS c_0#692,(1 / 2) AS c_1#693,(1 / 3) AS > c_2#694,(1 / COUNT(1)) AS c_3#695] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Aggregate [], [(CAST(2, DoubleType) / CAST(1, DoubleType)) AS > c_0#692,(CAST(1, DoubleType) / CAST(2, DoubleType)) AS c_1#693,(CAST(1, > DoubleType) / CAST(3, DoubleType)) AS c_2#694,(CAST(CAST(1, LongType), Doub > leType) / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Aggregate [], [2.0 AS c_0#692,0.5 AS c_1#693,0. AS > c_2#694,(1.0 / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] Project [] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Aggregate false, [], [2.0 AS c_0#692,0.5 AS > c_1#693,0. AS c_2#694,(1.0 / CAST(SUM(PartialCount#699L), > DoubleType)) AS c_3#695] > [info] Exchange SinglePartition > [info] Aggregate true, [], [COUNT(1) AS PartialCount#699L] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0c_1 c_2 c_3 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !2.0 0.5 0. 0.002 2.0 0.5 > 0. 0.0020 (HiveComparisonTest.scala:370) > [info] - timestamp cast #1 *** FAILED *** > [info] Results do not match for timestamp cast #1: > [info] SELECT CAST(CAST(1 AS TIMESTAMP) AS DOUBLE) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !0.001 0.0010 (HiveComparisonTest.scala:370) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3663) Document SPARK_LOG_DIR and SPARK_PID_DIR
[ https://issues.apache.org/jira/browse/SPARK-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146041#comment-14146041 ] Apache Spark commented on SPARK-3663: - User 'ash211' has created a pull request for this issue: https://github.com/apache/spark/pull/2518 > Document SPARK_LOG_DIR and SPARK_PID_DIR > > > Key: SPARK-3663 > URL: https://issues.apache.org/jira/browse/SPARK-3663 > Project: Spark > Issue Type: Documentation >Reporter: Andrew Ash >Assignee: Andrew Ash > > I'm using these two parameters in some puppet scripts for standalone > deployment and realized that they're not documented anywhere. We should > document them -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3676) jdk version lead to spark sql test suite error
[ https://issues.apache.org/jira/browse/SPARK-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146040#comment-14146040 ] Sean Owen commented on SPARK-3676: -- (For the interested, I looked it up, since the behavior change sounds surprising. This is in fact a bug in Java 6 that was fixed in Java 7: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4428022 It may even be fixed in later versions of Java 6, but I have a very recent one and it is not.) > jdk version lead to spark sql test suite error > -- > > Key: SPARK-3676 > URL: https://issues.apache.org/jira/browse/SPARK-3676 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: wangfei > Fix For: 1.2.0 > > > System.out.println(1/500d) get different result in diff jdk version > jdk 1.6.0(_31) 0.0020 > jdk 1.7.0(_05) 0.002 > this will lead to spark sql hive test suite failed (replay by set jdk > version = 1.6.0_31)--- > [info] - division *** FAILED *** > [info] Results do not match for division: > [info] SELECT 2 / 1, 1 / 2, 1 / 3, 1 / COUNT(*) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [(2 / 1) AS c_0#692,(1 / 2) AS c_1#693,(1 / 3) AS > c_2#694,(1 / COUNT(1)) AS c_3#695] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Aggregate [], [(CAST(2, DoubleType) / CAST(1, DoubleType)) AS > c_0#692,(CAST(1, DoubleType) / CAST(2, DoubleType)) AS c_1#693,(CAST(1, > DoubleType) / CAST(3, DoubleType)) AS c_2#694,(CAST(CAST(1, LongType), Doub > leType) / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Aggregate [], [2.0 AS c_0#692,0.5 AS c_1#693,0. AS > c_2#694,(1.0 / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] Project [] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Aggregate false, [], [2.0 AS c_0#692,0.5 AS > c_1#693,0. AS c_2#694,(1.0 / CAST(SUM(PartialCount#699L), > DoubleType)) AS c_3#695] > [info] Exchange SinglePartition > [info] Aggregate true, [], [COUNT(1) AS PartialCount#699L] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0c_1 c_2 c_3 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !2.0 0.5 0. 0.002 2.0 0.5 > 0. 
0.0020 (HiveComparisonTest.scala:370) > [info] - timestamp cast #1 *** FAILED *** > [info] Results do not match for timestamp cast #1: > [info] SELECT CAST(CAST(1 AS TIMESTAMP) AS DOUBLE) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !0.001 0.0010 (HiveComparisonTest.scala:370) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3267) Deadlock between ScalaReflectionLock and Data type initialization
[ https://issues.apache.org/jira/browse/SPARK-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146039#comment-14146039 ] Aaron Davidson commented on SPARK-3267: --- I don't have it anymore, unfortunately. Michael and I did a little digging at the time, and I think we found the reason for the deadlock, shown in the stack traces above, but decided it was a very unlikely scenario. Indeed, the query did not consistently deadlock; this only occurred a single time. > Deadlock between ScalaReflectionLock and Data type initialization > - > > Key: SPARK-3267 > URL: https://issues.apache.org/jira/browse/SPARK-3267 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Aaron Davidson >Priority: Critical > > Deadlock here: > {code} > "Executor task launch worker-0" daemon prio=10 tid=0x7fab50036000 > nid=0x27a in Object.wait() [0x7fab60c2e000 > ] >java.lang.Thread.State: RUNNABLE > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.defaultPrimitive(CodeGenerator.scala:565) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:202) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:195) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:218) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:214) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.expressionEvaluator(CodeGenerator.scala:4 > 93) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$Evaluate2$2.evaluateAs(CodeGenerator.scal > a:175) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:304) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:195) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:218) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:214) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.expressionEvaluator(CodeGenerator.scala:4 > 93) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:314) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:195) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:218) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:214) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.expressionEvaluator(CodeGenerator.scala:4 > 93) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:313) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$1.applyOrElse(CodeGenerator.scal > a:195) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:218) > at scala.PartialFunction$Lifted.apply(PartialFunction.scala:214) > ... 
> {code} > and > {code} > "Executor task launch worker-2" daemon prio=10 tid=0x7fab100f0800 > nid=0x27e in Object.wait() [0x7fab0eeec000 > ] >java.lang.Thread.State: RUNNABLE > at > org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:250) > - locked <0x00064e5d9a48> (a > org.apache.spark.sql.catalyst.expressions.Cast) > at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247) > at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263) > at > org.apache.spark.sql.parquet.ParquetTableScan$$anonfun$execute$2$$anonfun$6.apply(ParquetTableOperations. > scala:139) > at > org.apache.spark.sql.parquet.ParquetTableScan$$anonfun$execute$2$$anonfun$6.apply(ParquetTableOperations. > scala:139) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.sql.parquet.ParquetTableScan$$anonfun$execute$2.apply(ParquetTableOperations.scala:139) > at > org.apache.spark.sql.parquet.ParquetTableScan$$anonfu
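The two traces above have the classic shape of a lock-ordering deadlock around Scala lazy val initialization: one thread holds a shared lock and then trips a lazy val initializer (which synchronizes on the owning object), while a second thread is already inside that initializer and is waiting on the shared lock. The sketch below is only an illustration of that general pattern under assumed names (ReflectionLock and LazyHolder are made-up stand-ins, not the Spark classes involved), not the actual code path in the traces:
{code}
// Minimal lock-ordering deadlock sketch: two threads take the same two
// monitors in opposite order.
object ReflectionLock

class LazyHolder {
  // lazy val initialization synchronizes on `this`; the initializer then
  // also needs ReflectionLock, so this thread's order is this -> ReflectionLock.
  lazy val resolved: String = ReflectionLock.synchronized { "resolved" }
}

object DeadlockSketch extends App {
  val holder = new LazyHolder

  val t1 = new Thread(new Runnable {
    // Order: ReflectionLock -> holder's monitor (via the lazy val).
    def run(): Unit = ReflectionLock.synchronized {
      Thread.sleep(100)               // widen the race window
      println(holder.resolved)
    }
  })
  val t2 = new Thread(new Runnable {
    // Order: holder's monitor (via the lazy val) -> ReflectionLock.
    def run(): Unit = println(holder.resolved)
  })

  t1.start(); t2.start()
  t1.join(); t2.join()                // with unlucky timing, never returns
}
{code}
The window only opens when the first use of the lazy value races with a holder of the shared lock, which is consistent with the deadlock being observed only once.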
[jira] [Commented] (SPARK-3676) jdk version lead to spark sql test suite error
[ https://issues.apache.org/jira/browse/SPARK-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146035#comment-14146035 ] Apache Spark commented on SPARK-3676: - User 'scwf' has created a pull request for this issue: https://github.com/apache/spark/pull/2517 > jdk version lead to spark sql test suite error > -- > > Key: SPARK-3676 > URL: https://issues.apache.org/jira/browse/SPARK-3676 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: wangfei > Fix For: 1.2.0 > > > System.out.println(1/500d) get different result in diff jdk version > jdk 1.6.0(_31) 0.0020 > jdk 1.7.0(_05) 0.002 > this will lead to spark sql hive test suite failed (replay by set jdk > version = 1.6.0_31)--- > [info] - division *** FAILED *** > [info] Results do not match for division: > [info] SELECT 2 / 1, 1 / 2, 1 / 3, 1 / COUNT(*) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [(2 / 1) AS c_0#692,(1 / 2) AS c_1#693,(1 / 3) AS > c_2#694,(1 / COUNT(1)) AS c_3#695] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Aggregate [], [(CAST(2, DoubleType) / CAST(1, DoubleType)) AS > c_0#692,(CAST(1, DoubleType) / CAST(2, DoubleType)) AS c_1#693,(CAST(1, > DoubleType) / CAST(3, DoubleType)) AS c_2#694,(CAST(CAST(1, LongType), Doub > leType) / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Aggregate [], [2.0 AS c_0#692,0.5 AS c_1#693,0. AS > c_2#694,(1.0 / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] Project [] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Aggregate false, [], [2.0 AS c_0#692,0.5 AS > c_1#693,0. AS c_2#694,(1.0 / CAST(SUM(PartialCount#699L), > DoubleType)) AS c_3#695] > [info] Exchange SinglePartition > [info] Aggregate true, [], [COUNT(1) AS PartialCount#699L] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0c_1 c_2 c_3 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !2.0 0.5 0. 0.002 2.0 0.5 > 0. 0.0020 (HiveComparisonTest.scala:370) > [info] - timestamp cast #1 *** FAILED *** > [info] Results do not match for timestamp cast #1: > [info] SELECT CAST(CAST(1 AS TIMESTAMP) AS DOUBLE) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !0.001 0.0010 (HiveComparisonTest.scala:370) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
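The underlying sensitivity is that java.lang.Double.toString renders some values differently across JDK releases, while the Hive comparison tests compare rendered strings. The sketch below illustrates that sensitivity and one way a comparison could be made JDK-independent; the sameNumber helper is a hypothetical illustration, not necessarily the approach taken in the pull request above:
{code}
object DoubleFormatCheck extends App {
  val v = 1 / 500d
  // Per the report: JDK 1.6.0_31 renders this as "0.0020" and JDK 1.7.0_05
  // as "0.002", so comparing the rendered string is JDK-dependent.
  println(v.toString)

  // Comparing parsed numeric values instead of rendered strings is stable,
  // because both strings round-trip to the same Double.
  def sameNumber(expected: String, actual: String): Boolean =
    try expected.toDouble == actual.toDouble
    catch { case _: NumberFormatException => expected == actual }

  assert(sameNumber("0.002", "0.0020"))   // holds on either JDK
}
{code}
Pinning the build to a single JDK, as the reporter did to reproduce the failure (1.6.0_31), avoids the mismatch without touching the tests.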
[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example
[ https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146017#comment-14146017 ] Sean Owen commented on SPARK-3662: -- Maybe I miss something, but, does this just mean you can't "import pandas" entirely? If you're modifying the example, you should import only what you need from pandas. Or, it may be that you need to modify the "import random", indeed, to accommodate other modifications you want to make. But what is the problem with the included example? it runs fine without modifications, no? > Importing pandas breaks included pi.py example > -- > > Key: SPARK-3662 > URL: https://issues.apache.org/jira/browse/SPARK-3662 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 1.1.0 > Environment: Xubuntu 14.04. Yarn cluster running on Ubuntu 12.04. >Reporter: Evan Samanas > > If I add "import pandas" at the top of the included pi.py example and submit > using "spark-submit --master yarn-client", I get this stack trace: > {code} > Traceback (most recent call last): > File "/home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py", line > 39, in > count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add) > File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 759, in reduce > vals = self.mapPartitions(func).collect() > File "/home/evan/pub_src/spark/python/pyspark/rdd.py", line 723, in collect > bytesInJava = self._jrdd.collect().iterator() > File > "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", > line 538, in __call__ > File > "/home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", > line 300, in get_return_value > py4j.protocol.Py4JJavaError14/09/23 15:51:58 INFO TaskSetManager: Lost task > 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED: > org.apache.spark.api.python.PythonException (Traceback (most recent call > last): > File > "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py", > line 75, in main > command = pickleSer._read_with_length(infile) > File > "/yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py", > line 150, in _read_with_length > return self.loads(obj) > ImportError: No module named algos > {code} > The example works fine if I move the statement "from random import random" > from the top and into the function (def f(_)) defined in the example. Near > as I can tell, "random" is getting confused with a function of the same name > within pandas.algos. > Submitting the same script using --master local works, but gives a > distressing amount of random characters to stdout or stderr and messes up my > terminal: > {code} > ... 
> @J@J@J@J@J@J@J@J@J@J@J@J@J@JJ@J@J@J@J > @J!@J"@J#@J$@J%@J&@J'@J(@J)@J*@J+@J,@J-@J.@J/@J0@J1@J2@J3@J4@J5@J6@J7@J8@J9@J:@J;@J<@J=@J>@J?@J@@JA@JB@JC@JD@JE@JF@JG@JH@JI@JJ@JK@JL@JM@JN@JO@JP@JQ@JR@JS@JT@JU@JV@JW@JX@JY@JZ@J[@J\@J]@J^@J_@J`@Ja@Jb@Jc@Jd@Je@Jf@Jg@Jh@Ji@Jj@Jk@Jl@Jm@Jn@Jo@Jp@Jq@Jr@Js@Jt@Ju@Jv@Jw@Jx@Jy@Jz@J{@J|@J}@J~@J@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JJJ�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JAJAJAJAJAJAJAJAAJ >AJ > AJ > AJ > AJAJAJAJAJAJAJAJAJAJAJAJAJAJJAJAJAJAJ > AJ!AJ"AJ#AJ$AJ%AJ&AJ'AJ(AJ)AJ*AJ+AJ,AJ-AJ.AJ/AJ0AJ1AJ2AJ3AJ4AJ5AJ6AJ7AJ8AJ9AJ:AJ;AJAJ?AJ@AJAAJBAJCAJDAJEAJFAJGAJHAJIAJJAJKAJLAJMAJNAJOAJPAJQAJRAJSAJTAJUAJVAJWAJXAJYAJZAJ[AJ\AJ]AJ^AJ_AJ`AJaAJbAJcAJdAJeAJfAJgAJhAJiAJjAJkAJlAJmAJnAJoAJpAJqAJrAJsAJtAJuAJvAJwAJxAJyAJzAJ{AJ|AJ}AJ~AJAJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJJJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�A14/09/23 > 15:42:09 INFO SparkContext: Job finished: reduce at > /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took > 11.276879779 s > J�AJ�AJ�AJ�AJ�AJ�AJ�A�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJBJBJBJBJBJBJBJBBJ > BJ > BJ > BJ > BJBJBJBJBJBJBJBJBJBJBJBJBJBJJBJBJBJBJ > BJ!BJ"BJ#BJ$BJ%BJ&BJ'BJ(BJ)BJ*BJ+BJ,BJ-BJ.BJ/BJ0BJ1BJ2BJ3BJ4BJ5BJ6BJ7BJ8BJ9BJ:BJ;BJBJ?BJ@Be. > �]qJ#1a. > �]qJX4a. > �]qJX4a. > �]qJ#1a. > �]qJX4a. > �]qJX4a. > �]qJ#1a. > �]qJX4a. > �]qJX4a. > �]qJa. > Pi is roughly 3.146136 > {code} > No idea if that's related, but thought I'd include it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For ad
[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146011#comment-14146011 ] Apache Spark commented on SPARK-3620: - User 'tigerquoll' has created a pull request for this issue: https://github.com/apache/spark/pull/2516 > Refactor config option handling code for spark-submit > - > > Key: SPARK-3620 > URL: https://issues.apache.org/jira/browse/SPARK-3620 > Project: Spark > Issue Type: Improvement > Components: Deploy >Affects Versions: 1.0.0, 1.1.0 >Reporter: Dale Richardson >Assignee: Dale Richardson >Priority: Minor > > I'm proposing its time to refactor the configuration argument handling code > in spark-submit. The code has grown organically in a short period of time, > handles a pretty complicated logic flow, and is now pretty fragile. Some > issues that have been identified: > 1. Hand-crafted property file readers that do not support the property file > format as specified in > http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader) > 2. ResolveURI not called on paths read from conf/prop files > 3. inconsistent means of merging / overriding values from different sources > (Some get overridden by file, others by manual settings of field on object, > Some by properties) > 4. Argument validation should be done after combining config files, system > properties and command line arguments, > 5. Alternate conf file location not handled in shell scripts > 6. Some options can only be passed as command line arguments > 7. Defaults for options are hard-coded (and sometimes overridden multiple > times) in many through-out the code e.g. master = local[*] > Initial proposal is to use typesafe conf to read in the config information > and merge the various config sources -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
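For readers unfamiliar with Typesafe Config, the proposal in the last paragraph amounts to expressing the precedence chain once via withFallback instead of hand-rolled property-file readers and scattered overrides. The sketch below only illustrates that direction under an assumed precedence (command line over system properties over conf file over hard-coded defaults); resolveSubmitConf is a hypothetical helper and this is not the contents of the linked pull request:
{code}
import java.io.File
import scala.collection.JavaConverters._
import com.typesafe.config.{Config, ConfigFactory}

object SubmitConfSketch {
  // Merge the sources once; argument validation and URI resolution can then
  // run against a single, fully merged Config.
  def resolveSubmitConf(cliArgs: Map[String, String], confFile: File): Config = {
    val fromCli  = ConfigFactory.parseMap(cliArgs.asJava)       // highest precedence
    val fromSys  = ConfigFactory.systemProperties()
    val fromFile = ConfigFactory.parseFile(confFile)            // e.g. spark-defaults.conf
    val defaults = ConfigFactory.parseString("""spark.master = "local[*]" """)

    // withFallback: the receiver wins, so the chain reads top-down as
    // command line > system properties > conf file > hard-coded defaults.
    fromCli
      .withFallback(fromSys)
      .withFallback(fromFile)
      .withFallback(defaults)
      .resolve()
  }
}
{code}
Item 4 in the list above then follows naturally: validation happens once, after all sources have been combined.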
[jira] [Updated] (SPARK-3676) jdk version lead to spark sql test suite error
[ https://issues.apache.org/jira/browse/SPARK-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-3676: --- Summary: jdk version lead to spark sql test suite error (was: jdk version lead to spark hql test suite error) > jdk version lead to spark sql test suite error > -- > > Key: SPARK-3676 > URL: https://issues.apache.org/jira/browse/SPARK-3676 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: wangfei > Fix For: 1.2.0 > > > System.out.println(1/500d) get different result in diff jdk version > jdk 1.6.0(_31) 0.0020 > jdk 1.7.0(_05) 0.002 > this will lead to spark sql hive test suite failed (replay by set jdk > version = 1.6.0_31)--- > [info] - division *** FAILED *** > [info] Results do not match for division: > [info] SELECT 2 / 1, 1 / 2, 1 / 3, 1 / COUNT(*) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [(2 / 1) AS c_0#692,(1 / 2) AS c_1#693,(1 / 3) AS > c_2#694,(1 / COUNT(1)) AS c_3#695] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Aggregate [], [(CAST(2, DoubleType) / CAST(1, DoubleType)) AS > c_0#692,(CAST(1, DoubleType) / CAST(2, DoubleType)) AS c_1#693,(CAST(1, > DoubleType) / CAST(3, DoubleType)) AS c_2#694,(CAST(CAST(1, LongType), Doub > leType) / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Aggregate [], [2.0 AS c_0#692,0.5 AS c_1#693,0. AS > c_2#694,(1.0 / CAST(COUNT(1), DoubleType)) AS c_3#695] > [info] Project [] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Aggregate false, [], [2.0 AS c_0#692,0.5 AS > c_1#693,0. AS c_2#694,(1.0 / CAST(SUM(PartialCount#699L), > DoubleType)) AS c_3#695] > [info] Exchange SinglePartition > [info] Aggregate true, [], [COUNT(1) AS PartialCount#699L] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0c_1 c_2 c_3 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !2.0 0.5 0. 0.002 2.0 0.5 > 0. 0.0020 (HiveComparisonTest.scala:370) > [info] - timestamp cast #1 *** FAILED *** > [info] Results do not match for timestamp cast #1: > [info] SELECT CAST(CAST(1 AS TIMESTAMP) AS DOUBLE) FROM src LIMIT 1 > [info] == Parsed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] UnresolvedRelation None, src, None > [info] > [info] == Analyzed Logical Plan == > [info] Limit 1 > [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Optimized Logical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] MetastoreRelation default, src, None > [info] > [info] == Physical Plan == > [info] Limit 1 > [info]Project [0.0010 AS c_0#995] > [info] HiveTableScan [], (MetastoreRelation default, src, None), None > [info] > [info] Code Generation: false > [info] == RDD == > [info] c_0 > [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == > [info] !0.001 0.0010 (HiveComparisonTest.scala:370) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3676) jdk version lead to spark hql test suite error
wangfei created SPARK-3676: -- Summary: jdk version lead to spark hql test suite error Key: SPARK-3676 URL: https://issues.apache.org/jira/browse/SPARK-3676 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 System.out.println(1/500d) get different result in diff jdk version jdk 1.6.0(_31) 0.0020 jdk 1.7.0(_05) 0.002 this will lead to spark sql hive test suite failed (replay by set jdk version = 1.6.0_31)--- [info] - division *** FAILED *** [info] Results do not match for division: [info] SELECT 2 / 1, 1 / 2, 1 / 3, 1 / COUNT(*) FROM src LIMIT 1 [info] == Parsed Logical Plan == [info] Limit 1 [info]Project [(2 / 1) AS c_0#692,(1 / 2) AS c_1#693,(1 / 3) AS c_2#694,(1 / COUNT(1)) AS c_3#695] [info] UnresolvedRelation None, src, None [info] [info] == Analyzed Logical Plan == [info] Limit 1 [info]Aggregate [], [(CAST(2, DoubleType) / CAST(1, DoubleType)) AS c_0#692,(CAST(1, DoubleType) / CAST(2, DoubleType)) AS c_1#693,(CAST(1, DoubleType) / CAST(3, DoubleType)) AS c_2#694,(CAST(CAST(1, LongType), Doub leType) / CAST(COUNT(1), DoubleType)) AS c_3#695] [info] MetastoreRelation default, src, None [info] [info] == Optimized Logical Plan == [info] Limit 1 [info]Aggregate [], [2.0 AS c_0#692,0.5 AS c_1#693,0. AS c_2#694,(1.0 / CAST(COUNT(1), DoubleType)) AS c_3#695] [info] Project [] [info] MetastoreRelation default, src, None [info] [info] == Physical Plan == [info] Limit 1 [info]Aggregate false, [], [2.0 AS c_0#692,0.5 AS c_1#693,0. AS c_2#694,(1.0 / CAST(SUM(PartialCount#699L), DoubleType)) AS c_3#695] [info] Exchange SinglePartition [info] Aggregate true, [], [COUNT(1) AS PartialCount#699L] [info] HiveTableScan [], (MetastoreRelation default, src, None), None [info] [info] Code Generation: false [info] == RDD == [info] c_0c_1 c_2 c_3 [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == [info] !2.0 0.5 0. 0.002 2.0 0.5 0. 0.0020 (HiveComparisonTest.scala:370) [info] - timestamp cast #1 *** FAILED *** [info] Results do not match for timestamp cast #1: [info] SELECT CAST(CAST(1 AS TIMESTAMP) AS DOUBLE) FROM src LIMIT 1 [info] == Parsed Logical Plan == [info] Limit 1 [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] [info] UnresolvedRelation None, src, None [info] [info] == Analyzed Logical Plan == [info] Limit 1 [info]Project [CAST(CAST(1, TimestampType), DoubleType) AS c_0#995] [info] MetastoreRelation default, src, None [info] [info] == Optimized Logical Plan == [info] Limit 1 [info]Project [0.0010 AS c_0#995] [info] MetastoreRelation default, src, None [info] [info] == Physical Plan == [info] Limit 1 [info]Project [0.0010 AS c_0#995] [info] HiveTableScan [], (MetastoreRelation default, src, None), None [info] [info] Code Generation: false [info] == RDD == [info] c_0 [info] !== HIVE - 1 row(s) == == CATALYST - 1 row(s) == [info] !0.001 0.0010 (HiveComparisonTest.scala:370) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org