[jira] [Updated] (SPARK-5719) allow daemons to bind to specified host
[ https://issues.apache.org/jira/browse/SPARK-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Wang updated SPARK-5719: Description: Currently the web UI binds to 0.0.0.0. When multiple network planes are enabled, we may want to bind the UI port to a specific IP address so that firewall work (IP filtering) is possible. The added config items also work for daemons. was: Currently the web UI binds to 0.0.0.0. When multiple network planes are enabled, we may want to bind the UI port to a specific IP address so that firewall work (IP filtering, etc.) is possible. The added config items also work for daemons. allow daemons to bind to specified host --- Key: SPARK-5719 URL: https://issues.apache.org/jira/browse/SPARK-5719 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Tao Wang Priority: Minor Currently the web UI binds to 0.0.0.0. When multiple network planes are enabled, we may want to bind the UI port to a specific IP address so that firewall work (IP filtering) is possible. The added config items also work for daemons. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
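A minimal sketch of the idea in SPARK-5719, assuming a Jetty-based UI server; the spark.ui.bindHost key and the startUiServer helper are made up for illustration and are not existing Spark configuration or code:
{code}
import java.net.InetSocketAddress
import org.eclipse.jetty.server.Server

// Illustrative only: bind the web UI to a configured host instead of 0.0.0.0 so that
// firewall rules (IP filtering) can be applied to a single interface.
// "spark.ui.bindHost" is a hypothetical key, not an existing Spark configuration item.
def startUiServer(conf: Map[String, String], port: Int): Server = {
  val host = conf.getOrElse("spark.ui.bindHost", "0.0.0.0")
  val server = new Server(new InetSocketAddress(host, port))
  server.start()
  server
}
{code}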
[jira] [Updated] (SPARK-1805) Error launching cluster when master and slave machines are of different virtualization types
[ https://issues.apache.org/jira/browse/SPARK-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-1805: - Target Version/s: (was: 1.3.0) Assignee: Nicholas Chammas Error launching cluster when master and slave machines are of different virtualization types Key: SPARK-1805 URL: https://issues.apache.org/jira/browse/SPARK-1805 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.1, 1.2.0 Reporter: Han JU Assignee: Nicholas Chammas Priority: Minor Fix For: 1.4.0 In the current EC2 script, the AMI image object is loaded only once. This is OK when the master and slave machines are of the same virtualization type (pv or hvm). But this won't work if, say, the master is pv and the slaves are hvm since the AMI is not compatible across these two kinds of virtualization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
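An illustrative sketch of the idea behind the SPARK-1805 fix (the real spark-ec2 script is Python and is not shown here): resolve the AMI per virtualization type instead of loading a single image object, since pv and hvm AMIs are not interchangeable. The instance-type mapping and AMI ids below are placeholders.
{code}
// Placeholder data: which virtualization each instance type uses, and one AMI per type.
val virtualizationOf = Map("m1.large" -> "pv", "m3.xlarge" -> "hvm")
val amiFor = Map("pv" -> "ami-pv-placeholder", "hvm" -> "ami-hvm-placeholder")

def resolveAmi(instanceType: String): String =
  amiFor(virtualizationOf.getOrElse(instanceType, "hvm"))

// Master and slaves may need different AMIs when their instance types differ.
val masterAmi = resolveAmi("m1.large")   // pv
val slaveAmi  = resolveAmi("m3.xlarge")  // hvm
{code}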
[jira] [Resolved] (SPARK-1805) Error launching cluster when master and slave machines are of different virtualization types
[ https://issues.apache.org/jira/browse/SPARK-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1805. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 4455 [https://github.com/apache/spark/pull/4455] Error launching cluster when master and slave machines are of different virtualization types Key: SPARK-1805 URL: https://issues.apache.org/jira/browse/SPARK-1805 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.1, 1.2.0 Reporter: Han JU Priority: Minor Fix For: 1.4.0 In the current EC2 script, the AMI image object is loaded only once. This is OK when the master and slave machines are of the same virtualization type (pv or hvm). But this won't work if, say, the master is pv and the slaves are hvm since the AMI is not compatible across these two kinds of virtualization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4237) Generate right Manifest File for maven building
[ https://issues.apache.org/jira/browse/SPARK-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4237. -- Resolution: Won't Fix Per PR discussion, it sounds like the action here if anything is to omit most or all of the manifest, but this in particular is WontFix. Generate right Manifest File for maven building --- Key: SPARK-4237 URL: https://issues.apache.org/jira/browse/SPARK-4237 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: wangfei Currently, building Spark with Maven produces Guava's manifest file; we should generate the correct manifest file for the Maven build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5720) Support `Create Table Like` in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Sheng updated SPARK-5720: Summary: Support `Create Table Like` in HiveContext (was: Support `Create Table xx Like xx` in HiveContext) Support `Create Table Like` in HiveContext -- Key: SPARK-5720 URL: https://issues.apache.org/jira/browse/SPARK-5720 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.3.0 Reporter: Li Sheng Fix For: 1.3.0 Original Estimate: 72h Remaining Estimate: 72h -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2653) Heap size should be the sum of driver.memory and executor.memory in local mode
[ https://issues.apache.org/jira/browse/SPARK-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314330#comment-14314330 ] liu chang commented on SPARK-2653: -- Hi Davies Liu, I have reviewed the Spark core code and found where the JVM heap size is set: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala#L109 I'm not sure: is changing $newDriverMemory to the sum of spark.driver.memory and spark.executor.memory the correct fix? I'd appreciate your help, thanks. Heap size should be the sum of driver.memory and executor.memory in local mode -- Key: SPARK-2653 URL: https://issues.apache.org/jira/browse/SPARK-2653 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Davies Liu Priority: Minor Original Estimate: 1h Remaining Estimate: 1h In local mode, the driver and executor run in the same JVM, so the JVM heap size should be the sum of spark.driver.memory and spark.executor.memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
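A rough sketch of the change being asked about in SPARK-2653, not the actual SparkSubmitDriverBootstrapper code; parseMb is a hypothetical helper that only handles simple "512m"/"2g" strings:
{code}
// In local mode the driver and executor share one JVM, so size the heap as the sum of
// spark.driver.memory and spark.executor.memory (illustrative sketch only).
def parseMb(s: String): Int = s.toLowerCase match {
  case m if m.endsWith("g") => m.dropRight(1).toInt * 1024
  case m if m.endsWith("m") => m.dropRight(1).toInt
  case m                    => m.toInt
}

val driverMemory   = sys.props.getOrElse("spark.driver.memory", "512m")
val executorMemory = sys.props.getOrElse("spark.executor.memory", "512m")
val newDriverMemory = s"${parseMb(driverMemory) + parseMb(executorMemory)}m"
// e.g. driver 1g + executor 512m => the JVM would be launched with -Xmx1536m
{code}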
[jira] [Closed] (SPARK-5717) add sc.stop to LDA examples
[ https://issues.apache.org/jira/browse/SPARK-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuhao yang closed SPARK-5717. - merged. Thanks add sc.stop to LDA examples --- Key: SPARK-5717 URL: https://issues.apache.org/jira/browse/SPARK-5717 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: yuhao yang Assignee: yuhao yang Priority: Trivial Fix For: 1.3.0 Original Estimate: 1h Remaining Estimate: 1h Trivial. add sc stop and reorganize import in LDAExample and JavaLDAExample -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5081) Shuffle write increases
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315341#comment-14315341 ] Kevin Jung commented on SPARK-5081: --- Xuefeng Wu mentioned one difference: the snappy version. <dependency> <groupId>org.xerial.snappy</groupId> <artifactId>snappy-java</artifactId> <version>1.0.5.3</version> </dependency> It was changed to 1.1.1.6 in Spark 1.2. We need to consider these two. Shuffle write increases --- Key: SPARK-5081 URL: https://issues.apache.org/jira/browse/SPARK-5081 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 1.2.0 Reporter: Kevin Jung The size of shuffle write shown in the Spark web UI is much different when I execute the same Spark job with the same input data in Spark 1.1 and Spark 1.2. At the sortBy stage, the size of shuffle write is 98.1MB in Spark 1.1 but 146.9MB in Spark 1.2. I set the spark.shuffle.manager option to hash because its default value changed, but Spark 1.2 still writes more shuffle output than Spark 1.1. This can increase disk I/O overhead substantially as the input file gets bigger and causes the jobs to take more time to complete. In the case of about 100GB input, for example, the size of shuffle write is 39.7GB in Spark 1.1 but 91.0GB in Spark 1.2.
spark 1.1
||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
|9|saveAsTextFile| |1169.4KB| |
|12|combineByKey| |1265.4KB|1275.0KB|
|6|sortByKey| |1276.5KB| |
|8|mapPartitions| |91.0MB|1383.1KB|
|4|apply| |89.4MB| |
|5|sortBy|155.6MB| |98.1MB|
|3|sortBy|155.6MB| | |
|1|collect| |2.1MB| |
|2|mapValues|155.6MB| |2.2MB|
|0|first|184.4KB| | |
spark 1.2
||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
|12|saveAsTextFile| |1170.2KB| |
|11|combineByKey| |1264.5KB|1275.0KB|
|8|sortByKey| |1273.6KB| |
|7|mapPartitions| |134.5MB|1383.1KB|
|5|zipWithIndex| |132.5MB| |
|4|sortBy|155.6MB| |146.9MB|
|3|sortBy|155.6MB| | |
|2|collect| |2.0MB| |
|1|mapValues|155.6MB| |2.2MB|
|0|first|184.4KB| | |
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5687) in TaskResultGetter need to catch OutOfMemoryError.
[ https://issues.apache.org/jira/browse/SPARK-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5687. -- Resolution: Won't Fix Resolving WontFix per PR discussion. in TaskResultGetter need to catch OutOfMemoryError. --- Key: SPARK-5687 URL: https://issues.apache.org/jira/browse/SPARK-5687 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Lianhui Wang In enqueueSuccessfulTask another thread fetches the result; if the result is very large, it may throw an OutOfMemoryError. If we do not catch the OutOfMemoryError, the DAGScheduler does not know the status of this task. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
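A sketch of the behaviour proposed in SPARK-5687 (and ultimately not adopted), assuming hypothetical fetchResult/reportFailure hooks rather than the real TaskResultGetter API:
{code}
// If the result-fetching thread dies from an uncaught OutOfMemoryError, the DAGScheduler
// never learns the task's status; catching it lets us report a failure instead.
def enqueueSuccessfulTaskSketch(fetchResult: () => Array[Byte],
                                reportFailure: Throwable => Unit): Unit = {
  try {
    val resultBytes = fetchResult()   // may allocate a very large buffer
    // ... deserialize resultBytes and hand the result to the scheduler ...
  } catch {
    case oom: OutOfMemoryError => reportFailure(oom)
  }
}
{code}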
[jira] [Updated] (SPARK-5679) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method
[ https://issues.apache.org/jira/browse/SPARK-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5679: --- Priority: Major (was: Blocker) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method -- Key: SPARK-5679 URL: https://issues.apache.org/jira/browse/SPARK-5679 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Kostas Sakellis Labels: flaky-test Please audit these and see if there are any assumptions with respect to File IO that might not hold in all cases. I'm happy to help if you can't find anything. These both failed in the same run: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-SBT/38/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/#showFailuresLink {code} org.apache.spark.metrics.InputOutputMetricsSuite.input metrics with mixed read method Failing for the past 13 builds (Since Failed#26 ) Took 48 sec. Error Message 2030 did not equal 6496 Stacktrace sbt.ForkMain$ForkError: 2030 did not equal 6496 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply$mcV$sp(InputOutputMetricsSuite.scala:135) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.metrics.InputOutputMetricsSuite.runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at 
org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfterAll$$super$run(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at
[jira] [Resolved] (SPARK-5709) Add EXPLAIN support for DataFrame API for debugging purpose
[ https://issues.apache.org/jira/browse/SPARK-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5709. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4496 [https://github.com/apache/spark/pull/4496] Add EXPLAIN support for DataFrame API for debugging purpose - Key: SPARK-5709 URL: https://issues.apache.org/jira/browse/SPARK-5709 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5443) jsonRDD with schema should ignore sub-objects that are omitted in schema
[ https://issues.apache.org/jira/browse/SPARK-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315530#comment-14315530 ] Yin Huai commented on SPARK-5443: - Yeah, I think we can improve performance by only constructing rows with the needed fields. Also, we can get a bigger improvement if we only parse the needed fields. jsonRDD with schema should ignore sub-objects that are omitted in schema Key: SPARK-5443 URL: https://issues.apache.org/jira/browse/SPARK-5443 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.2.0 Reporter: Derrick Burns Original Estimate: 168h Remaining Estimate: 168h Reading the code for jsonRDD, it appears that all fields of a JSON object are read into a ROW independent of the provided schema. I would expect it to be more efficient to only store in the ROW those fields that are explicitly included in the schema. For example, assume that I only wish to extract the id field of a tweet. If I provided a schema that simply had one field within a map named id, then the row object would only store that field within a map. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
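A usage sketch of the scenario described in SPARK-5443, using Spark 1.3-style imports and assuming sc is the shell's SparkContext; the file path is a placeholder:
{code}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val sqlContext = new SQLContext(sc)

// A schema naming only the field we care about; per this ticket, rows built from
// jsonRDD with this schema should carry (and ideally only parse) just "id".
val idOnlySchema = StructType(Seq(StructField("id", LongType, nullable = true)))
val tweets = sqlContext.jsonRDD(sc.textFile("/path/to/tweets.json"), idOnlySchema)
tweets.printSchema()
{code}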
[jira] [Closed] (SPARK-5729) Potential NPE in StandaloneRestServer if user specifies bad path
[ https://issues.apache.org/jira/browse/SPARK-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5729. Resolution: Fixed Fix Version/s: 1.3.0 Potential NPE in StandaloneRestServer if user specifies bad path Key: SPARK-5729 URL: https://issues.apache.org/jira/browse/SPARK-5729 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical Fix For: 1.3.0 When we delegate something to the default ErrorServlet, the context should be /*, not just /. One line fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5728) MQTTStreamSuite leaves behind ActiveMQ database files
[ https://issues.apache.org/jira/browse/SPARK-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315360#comment-14315360 ] Apache Spark commented on SPARK-5728: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/4517 MQTTStreamSuite leaves behind ActiveMQ database files - Key: SPARK-5728 URL: https://issues.apache.org/jira/browse/SPARK-5728 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.2.1 Reporter: Sean Owen Assignee: Sean Owen Priority: Trivial I've seen this several times and finally wanted to fix it: {{MQTTStreamSuite}} uses a local ActiveMQ broker, that creates a working dir for its database in the {{external/mqtt}} directory called {{activemq}}. This doesn't get cleaned up, at least often it does not for me. It's trivial to set it to use a temp directory which the test harness does clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
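A minimal sketch of the SPARK-5728 fix idea (not necessarily what the pull request does): point the embedded broker's database at a temporary directory that the test harness cleans up.
{code}
import java.nio.file.Files
import org.apache.activemq.broker.BrokerService

val broker = new BrokerService()
// Keep the broker's database files out of external/mqtt by using a temp directory.
broker.setDataDirectory(Files.createTempDirectory("activemq-test").toString)
broker.addConnector("mqtt://localhost:1883")
broker.start()
// ... exercise the MQTT stream under test ...
broker.stop()
{code}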
[jira] [Created] (SPARK-5730) Group methods in the generated doc for spark.ml algorithms.
Xiangrui Meng created SPARK-5730: Summary: Group methods in the generated doc for spark.ml algorithms. Key: SPARK-5730 URL: https://issues.apache.org/jira/browse/SPARK-5730 Project: Spark Issue Type: Documentation Components: Documentation, ML Affects Versions: 1.3.0 Reporter: Xiangrui Meng In spark.ml, we have params and their setters/getters. It would be nice to group them in the generated docs. Params should be at the top, while setters/getters should be at the bottom. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
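One possible approach for SPARK-5730, sketched with Scaladoc's @groupname/@group tags on a made-up class (the real spark.ml classes and group names may differ, and Scaladoc needs the -groups flag for these to take effect):
{code}
/**
 * Illustrative estimator showing how members can be grouped in the generated Scaladoc.
 *
 * @groupname param Parameters
 * @groupprio param 10
 * @groupname setParam Parameter setters
 * @groupprio setParam 20
 */
class ExampleEstimator {
  /** Regularization parameter.
   *  @group param */
  val regParam: Double = 0.0

  /** Sets the regularization parameter (a no-op in this sketch).
   *  @group setParam */
  def setRegParam(value: Double): ExampleEstimator = this
}
{code}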
[jira] [Created] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
Patrick Wendell created SPARK-5731: -- Summary: Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming Reporter: Patrick Wendell Assignee: Tathagata Das {code} sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 110 times over 20.070287525 seconds. Last failure message: 300 did not equal 48 didn't get all messages. at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply$mcV$sp(DirectKafkaStreamSuite.scala:110) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at 
org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfterAll$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.run(DirectKafkaStreamSuite.scala:38) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at
[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Affects Version/s: 1.3.0 Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das {code} sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 110 times over 20.070287525 seconds. Last failure message: 300 did not equal 48 didn't get all messages. at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply$mcV$sp(DirectKafkaStreamSuite.scala:110) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfterAll$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at
[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Component/s: Tests Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das {code} sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 110 times over 20.070287525 seconds. Last failure message: 300 did not equal 48 didn't get all messages. at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply$mcV$sp(DirectKafkaStreamSuite.scala:110) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfterAll$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at
[jira] [Updated] (SPARK-4879) Missing output partitions after job completes with speculative execution
[ https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4879: - Fix Version/s: 1.3.0 Missing output partitions after job completes with speculative execution Key: SPARK-4879 URL: https://issues.apache.org/jira/browse/SPARK-4879 Project: Spark Issue Type: Bug Components: Input/Output, Spark Core Affects Versions: 1.0.2, 1.1.1, 1.2.0 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Critical Fix For: 1.3.0 Attachments: speculation.txt, speculation2.txt When speculative execution is enabled ({{spark.speculation=true}}), jobs that save output files may report that they have completed successfully even though some output partitions written by speculative tasks may be missing. h3. Reproduction This symptom was reported to me by a Spark user and I've been doing my own investigation to try to come up with an in-house reproduction. I'm still working on a reliable local reproduction for this issue, which is a little tricky because Spark won't schedule speculated tasks on the same host as the original task, so you need an actual (or containerized) multi-host cluster to test speculation. Here's a simple reproduction of some of the symptoms on EC2, which can be run in {{spark-shell}} with {{--conf spark.speculation=true}}: {code}
// Rig a job such that all but one of the tasks complete instantly
// and one task runs for 20 seconds on its first attempt and instantly
// on its second attempt:
val numTasks = 100
sc.parallelize(1 to numTasks, numTasks).repartition(2).mapPartitionsWithContext { case (ctx, iter) =>
  if (ctx.partitionId == 0) { // If this is the one task that should run really slow
    if (ctx.attemptId == 0) { // If this is the first attempt, run slow
      Thread.sleep(20 * 1000)
    }
  }
  iter
}.map(x => (x, x)).saveAsTextFile("/test4")
{code} When I run this, I end up with a job that completes quickly (due to speculation) but reports failures from the speculated task: {code} [...]
14/12/11 01:41:13 INFO scheduler.TaskSetManager: Finished task 37.1 in stage 3.0 (TID 411) in 131 ms on ip-172-31-8-164.us-west-2.compute.internal (100/100) 14/12/11 01:41:13 INFO scheduler.DAGScheduler: Stage 3 (saveAsTextFile at console:22) finished in 0.856 s 14/12/11 01:41:13 INFO spark.SparkContext: Job finished: saveAsTextFile at console:22, took 0.885438374 s 14/12/11 01:41:13 INFO scheduler.TaskSetManager: Ignoring task-finished event for 70.1 in stage 3.0 because task 70 has already completed successfully scala 14/12/11 01:41:13 WARN scheduler.TaskSetManager: Lost task 49.1 in stage 3.0 (TID 413, ip-172-31-8-164.us-west-2.compute.internal): java.io.IOException: Failed to save output of task: attempt_201412110141_0003_m_49_413 org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:160) org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172) org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132) org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:109) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:991) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} One interesting thing to note about this stack trace: if we look at {{FileOutputCommitter.java:160}} ([link|http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-core/2.5.0-mr1-cdh5.2.0/org/apache/hadoop/mapred/FileOutputCommitter.java#160]), this point in the execution seems to correspond to a case where a task completes, attempts to commit its output, fails for some reason, then deletes the destination file, tries again, and fails: {code} if (fs.isFile(taskOutput)) { 152 Path finalOutputPath = getFinalPath(jobOutputDir, taskOutput, 153 getTempTaskOutputPath(context)); 154 if (!fs.rename(taskOutput, finalOutputPath)) { 155if (!fs.delete(finalOutputPath, true)) { 156 throw new IOException(Failed to delete
[jira] [Updated] (SPARK-5155) Python API for MQTT streaming
[ https://issues.apache.org/jira/browse/SPARK-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5155: - Assignee: Prabeesh K Python API for MQTT streaming - Key: SPARK-5155 URL: https://issues.apache.org/jira/browse/SPARK-5155 Project: Spark Issue Type: New Feature Components: PySpark, Streaming Reporter: Davies Liu Assignee: Prabeesh K Python API for MQTT Utils -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5729) Potential NPE in StandaloneRestServer if user specifies bad path
[ https://issues.apache.org/jira/browse/SPARK-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315428#comment-14315428 ] Apache Spark commented on SPARK-5729: - User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/4518 Potential NPE in StandaloneRestServer if user specifies bad path Key: SPARK-5729 URL: https://issues.apache.org/jira/browse/SPARK-5729 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical When we delegate something to the default ErrorServlet, the context should be /*, not just /. One line fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5729) Potential NPE in StandaloneRestServer if user specifies bad path
Andrew Or created SPARK-5729: Summary: Potential NPE in StandaloneRestServer if user specifies bad path Key: SPARK-5729 URL: https://issues.apache.org/jira/browse/SPARK-5729 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical When we delegate something to the default ErrorServlet, the context should be /*, not just /. One line fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5683) Improve the json serialization for DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5683. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4468 [https://github.com/apache/spark/pull/4468] Improve the json serialization for DataFrame API Key: SPARK-5683 URL: https://issues.apache.org/jira/browse/SPARK-5683 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Priority: Minor Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1302) httpd doesn't start in spark-ec2 (cc2.8xlarge)
[ https://issues.apache.org/jira/browse/SPARK-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315447#comment-14315447 ] Greg Temchenko commented on SPARK-1302: --- I'm getting this httpd error on a t2.medium instance. Maybe you can describe the recommended instance types in the documentation? {code} Starting httpd: httpd: Syntax error on line 153 of /etc/httpd/conf/httpd.conf: Cannot load modules/mod_authn_alias.so into server: /etc/httpd/modules/mod_authn_alias.so: cannot open shared object file: No such file or directory {code} httpd doesn't start in spark-ec2 (cc2.8xlarge) -- Key: SPARK-1302 URL: https://issues.apache.org/jira/browse/SPARK-1302 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 0.9.0 Reporter: Shivaram Venkataraman Priority: Minor In a cc2.8xlarge EC2 cluster launched from the master branch, httpd won't start (i.e. ganglia doesn't work). The reason seems to be that httpd.conf is wrong (newer httpd version?). The config file contains a bunch of non-existent modules, and this happens because we overwrite the default conf with our config file from spark-ec2. We could explore using patch or something like that to just apply the diff we need. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5454) [SQL] Self join with ArrayType columns problems
[ https://issues.apache.org/jira/browse/SPARK-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315515#comment-14315515 ] Apache Spark commented on SPARK-5454: - User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/4520 [SQL] Self join with ArrayType columns problems --- Key: SPARK-5454 URL: https://issues.apache.org/jira/browse/SPARK-5454 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Pierre Borckmans Priority: Blocker Weird behaviour when performing self join on a table with some ArrayType field. (potential bug ?) I have set up a minimal non working example here: https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f In a nutshell, if the ArrayType column used for the pivot is created manually in the StructType definition, everything works as expected. However, if the ArrayType pivot column is obtained by a sql query (be it by using a array wrapper, or using a collect_list operator for instance), then results are completely off. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5706) Support inference schema from a single json string
[ https://issues.apache.org/jira/browse/SPARK-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315524#comment-14315524 ] Yin Huai commented on SPARK-5706: - Is https://issues.apache.org/jira/browse/SPARK-4336 the same as this one? Support inference schema from a single json string -- Key: SPARK-5706 URL: https://issues.apache.org/jira/browse/SPARK-5706 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao We have noticed some developers complaining that JSON parsing is very slow, particularly schema inference. Some of them suggest providing a simple interface for inferring the schema from a single complete JSON string record, instead of sampling. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
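Until such an interface exists, one workaround sketch for SPARK-5706 using existing APIs (sc as in spark-shell; the sample record and paths are placeholders): infer the schema from a one-record RDD, then reuse it for the full load so the expensive sampling pass is skipped.
{code}
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val sample = """{"id": 1, "name": "a", "tags": ["x", "y"]}"""

// Infer the schema from a single complete record...
val inferredSchema = sqlContext.jsonRDD(sc.parallelize(Seq(sample))).schema
// ...then parse the full dataset with that fixed schema instead of sampling it.
val records = sqlContext.jsonRDD(sc.textFile("/data/records.json"), inferredSchema)
{code}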
[jira] [Commented] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts
[ https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315545#comment-14315545 ] Kashish Jain commented on SPARK-5613: - Thanks Patrick and Andrew YarnClientSchedulerBackend fails to get application report when yarn restarts - Key: SPARK-5613 URL: https://issues.apache.org/jira/browse/SPARK-5613 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Reporter: Kashish Jain Assignee: Kashish Jain Priority: Minor Fix For: 1.3.0, 1.2.2 Original Estimate: 24h Remaining Estimate: 24h Steps to Reproduce 1) Run any spark job 2) Stop yarn while the spark job is running (an application id has been generated by now) 3) Restart yarn now 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException exception. This leads to termination of thread. Here is the StackTrace 15/02/05 05:22:37 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:38 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:39 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:40 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 5/02/05 05:22:40 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) Exception in thread Yarn application state monitor org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1423113179043_0003' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Unknown Source) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120) Caused by:
[jira] [Updated] (SPARK-4879) Missing output partitions after job completes with speculative execution
[ https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4879: - Labels: backport-needed (was: ) Missing output partitions after job completes with speculative execution Key: SPARK-4879 URL: https://issues.apache.org/jira/browse/SPARK-4879 Project: Spark Issue Type: Bug Components: Input/Output, Spark Core Affects Versions: 1.0.2, 1.1.1, 1.2.0 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Critical Labels: backport-needed Fix For: 1.3.0 Attachments: speculation.txt, speculation2.txt When speculative execution is enabled ({{spark.speculation=true}}), jobs that save output files may report that they have completed successfully even though some output partitions written by speculative tasks may be missing. h3. Reproduction This symptom was reported to me by a Spark user and I've been doing my own investigation to try to come up with an in-house reproduction. I'm still working on a reliable local reproduction for this issue, which is a little tricky because Spark won't schedule speculated tasks on the same host as the original task, so you need an actual (or containerized) multi-host cluster to test speculation. Here's a simple reproduction of some of the symptoms on EC2, which can be run in {{spark-shell}} with {{--conf spark.speculation=true}}: {code}
// Rig a job such that all but one of the tasks complete instantly
// and one task runs for 20 seconds on its first attempt and instantly
// on its second attempt:
val numTasks = 100
sc.parallelize(1 to numTasks, numTasks).repartition(2).mapPartitionsWithContext { case (ctx, iter) =>
  if (ctx.partitionId == 0) { // If this is the one task that should run really slow
    if (ctx.attemptId == 0) { // If this is the first attempt, run slow
      Thread.sleep(20 * 1000)
    }
  }
  iter
}.map(x => (x, x)).saveAsTextFile("/test4")
{code} When I run this, I end up with a job that completes quickly (due to speculation) but reports failures from the speculated task: {code} [...]
14/12/11 01:41:13 INFO scheduler.TaskSetManager: Finished task 37.1 in stage 3.0 (TID 411) in 131 ms on ip-172-31-8-164.us-west-2.compute.internal (100/100) 14/12/11 01:41:13 INFO scheduler.DAGScheduler: Stage 3 (saveAsTextFile at console:22) finished in 0.856 s 14/12/11 01:41:13 INFO spark.SparkContext: Job finished: saveAsTextFile at console:22, took 0.885438374 s 14/12/11 01:41:13 INFO scheduler.TaskSetManager: Ignoring task-finished event for 70.1 in stage 3.0 because task 70 has already completed successfully scala 14/12/11 01:41:13 WARN scheduler.TaskSetManager: Lost task 49.1 in stage 3.0 (TID 413, ip-172-31-8-164.us-west-2.compute.internal): java.io.IOException: Failed to save output of task: attempt_201412110141_0003_m_49_413 org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:160) org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172) org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132) org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:109) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:991) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} One interesting thing to note about this stack trace: if we look at {{FileOutputCommitter.java:160}} ([link|http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-core/2.5.0-mr1-cdh5.2.0/org/apache/hadoop/mapred/FileOutputCommitter.java#160]), this point in the execution seems to correspond to a case where a task completes, attempts to commit its output, fails for some reason, then deletes the destination file, tries again, and fails: {code} if (fs.isFile(taskOutput)) { 152 Path finalOutputPath = getFinalPath(jobOutputDir, taskOutput, 153 getTempTaskOutputPath(context)); 154 if (!fs.rename(taskOutput, finalOutputPath)) { 155if (!fs.delete(finalOutputPath, true)) { 156
[jira] [Updated] (SPARK-5493) Support proxy users under kerberos
[ https://issues.apache.org/jira/browse/SPARK-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5493: --- Assignee: Marcelo Vanzin Support proxy users under kerberos -- Key: SPARK-5493 URL: https://issues.apache.org/jira/browse/SPARK-5493 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Brock Noland Assignee: Marcelo Vanzin Fix For: 1.3.0 When using kerberos, services may want to use spark-submit to submit jobs as a separate user. For example a service like oozie might want to submit jobs as a client user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5493) Support proxy users under kerberos
[ https://issues.apache.org/jira/browse/SPARK-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5493. Resolution: Fixed Fix Version/s: 1.3.0 Target Version/s: 1.3.0 Support proxy users under kerberos -- Key: SPARK-5493 URL: https://issues.apache.org/jira/browse/SPARK-5493 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Brock Noland Assignee: Marcelo Vanzin Fix For: 1.3.0 When using kerberos, services may want to use spark-submit to submit jobs as a separate user. For example a service like oozie might want to submit jobs as a client user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
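A sketch of the underlying Hadoop mechanism the SPARK-5493 feature builds on (not Spark's actual implementation); the user name "alice" is a placeholder:
{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// The kerberos-authenticated service user (e.g. the oozie principal) impersonates
// the end user by running the submission inside a proxy UGI.
val realUser  = UserGroupInformation.getLoginUser
val proxyUser = UserGroupInformation.createProxyUser("alice", realUser)

proxyUser.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // submit the job here; Hadoop services see "alice" as the caller, provided
    // core-site.xml's hadoop.proxyuser.* settings allow the impersonation
  }
})
{code}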
[jira] [Created] (SPARK-5728) MQTTStreamSuite leaves behind ActiveMQ database files
Sean Owen created SPARK-5728: Summary: MQTTStreamSuite leaves behind ActiveMQ database files Key: SPARK-5728 URL: https://issues.apache.org/jira/browse/SPARK-5728 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.2.1 Reporter: Sean Owen Assignee: Sean Owen Priority: Trivial I've seen this several times and finally wanted to fix it: {{MQTTStreamSuite}} uses a local ActiveMQ broker, that creates a working dir for its database in the {{external/mqtt}} directory called {{activemq}}. This doesn't get cleaned up, at least often it does not for me. It's trivial to set it to use a temp directory which the test harness does clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5658) Finalize DDL and write support APIs
[ https://issues.apache.org/jira/browse/SPARK-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5658. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4446 [https://github.com/apache/spark/pull/4446] Finalize DDL and write support APIs --- Key: SPARK-5658 URL: https://issues.apache.org/jira/browse/SPARK-5658 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Priority: Blocker Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5576) saveAsTable into Hive fails due to duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-5576. - Resolution: Won't Fix I am resolving it per discussions in the PR (https://github.com/apache/spark/pull/4346). saveAsTable into Hive fails due to duplicate columns Key: SPARK-5576 URL: https://issues.apache.org/jira/browse/SPARK-5576 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Dan Osipov Loading JSON files infers a case-sensitive schema, which results in an error when attempting to save to Hive.
{code}
import org.apache.spark.sql._
import org.apache.spark.sql.hive._

val hive = new HiveContext(sc)
val data = hive.jsonFile("/path/")
data.saveAsTable("table")
{code}
This results in an error: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Duplicate column name data-errorcode in the table definition. Outputting the schema shows the problem field:
|-- data-errorCode: string (nullable = true)
|-- data-errorcode: string (nullable = true)
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
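A possible workaround for SPARK-5576 until the inference behaviour changes (illustrative only, not part of the ticket) is to project away one of the case-colliding columns before writing, since Hive treats column names case-insensitively; table and column names below are taken from the report, the alias is made up:
{code}
// Keep a single, explicitly aliased copy of the ambiguous field before saveAsTable.
import org.apache.spark.sql.hive.HiveContext

val hive = new HiveContext(sc)
val data = hive.jsonFile("/path/")
data.registerTempTable("raw_json")

// Select only one of data-errorCode / data-errorcode, under a Hive-safe name.
val deduped = hive.sql("SELECT `data-errorCode` AS data_error_code FROM raw_json")
deduped.saveAsTable("table")
{code}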
[jira] [Resolved] (SPARK-5704) createDataFrame replace applySchema/inferSchema
[ https://issues.apache.org/jira/browse/SPARK-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5704. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4498 [https://github.com/apache/spark/pull/4498] createDataFrame replace applySchema/inferSchema --- Key: SPARK-5704 URL: https://issues.apache.org/jira/browse/SPARK-5704 Project: Spark Issue Type: Sub-task Components: PySpark, SQL Reporter: Davies Liu Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5655) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode
[ https://issues.apache.org/jira/browse/SPARK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314424#comment-14314424 ] Apache Spark commented on SPARK-5655: - User 'growse' has created a pull request for this issue: https://github.com/apache/spark/pull/4507 YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode - Key: SPARK-5655 URL: https://issues.apache.org/jira/browse/SPARK-5655 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Environment: Both CDH5.3.0 and CDH5.1.3, latest build on branch-1.2 Reporter: Andrew Rowson Labels: hadoop When running a Spark job on a YARN cluster which doesn't run containers under the same user as the nodemanager, and also when using the YARN auxiliary shuffle service, jobs fail with something similar to:
{code:java}
java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied)
{code}
The root cause of this is here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287
Spark will attempt to chmod 700 any application directories it creates during the job, which includes files created in the nodemanager's usercache directory. The owner of these files is the container UID, which on a secure cluster is the name of the user creating the job, and on a nonsecure cluster with yarn.nodemanager.container-executor.class configured is the value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user.
The problem with this is that the auxiliary shuffle manager runs as part of the nodemanager, which is typically running as the user 'yarn'. It can't access these files, which are only owner-readable.
YARN already attempts to secure files created under appcache but keep them readable by the nodemanager, by setting the group of the appcache directory to 'yarn' and also setting the setgid flag. This means that files and directories created under this should also have the 'yarn' group. Normally this means that the nodemanager should also be able to read these files, but Spark setting chmod 700 wipes this out.
I'm not sure what the right approach is here. Commenting out the chmod 700 functionality makes this work on YARN, and still makes the application files only readable by the owner and the group:
{code}
/data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah
total 206M
drwxr-s--- 2 nobody yarn 4.0K Feb 6 18:30 .
drwxr-s--- 12 nobody yarn 4.0K Feb 6 18:30 ..
-rw-r- 1 nobody yarn 206M Feb 6 18:30 shuffle_0_0_0.data
{code}
But this may not be the right approach on non-YARN deployments. Perhaps an additional check is required to see whether this chmod 700 step is necessary (i.e. non-YARN). Sadly, I don't have a non-YARN environment to test, otherwise I'd be able to suggest a patch. I believe this is a related issue in the MapReduce framework: https://issues.apache.org/jira/browse/MAPREDUCE-3728 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
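The suggestion in SPARK-5655 amounts to making the chmod 700 conditional. A hedged sketch of what that could look like; runningUnderYarn is an assumed helper rather than an existing flag, and the linked pull requests pick the real condition:
{code}
// Sketch only: skip the owner-only chmod when the directory lives inside YARN's
// usercache, so the setgid 'yarn' group applied by the NodeManager survives and
// the auxiliary shuffle service can still read shuffle files.
import java.io.File

def chmod700(file: File): Boolean = {
  file.setReadable(false, false) &&
  file.setReadable(true, true) &&
  file.setWritable(false, false) &&
  file.setWritable(true, true) &&
  file.setExecutable(false, false) &&
  file.setExecutable(true, true)
}

def createDirectory(dir: File, runningUnderYarn: Boolean): Unit = {
  dir.mkdirs()
  if (!runningUnderYarn) {  // assumed condition; the actual fix may test something finer-grained
    chmod700(dir)
  }
}
{code}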
[jira] [Commented] (SPARK-5720) `Create Table Like` in HiveContext need support `like registered temporary table`
[ https://issues.apache.org/jira/browse/SPARK-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314446#comment-14314446 ] Li Sheng commented on SPARK-5720: - We need to support a user-registered temp table as the source table in the `create table like` command. `Create Table Like` in HiveContext need support `like registered temporary table` - Key: SPARK-5720 URL: https://issues.apache.org/jira/browse/SPARK-5720 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Li Sheng Fix For: 1.3.0 Original Estimate: 72h Remaining Estimate: 72h -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5655) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode
[ https://issues.apache.org/jira/browse/SPARK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314451#comment-14314451 ] Apache Spark commented on SPARK-5655: - User 'growse' has created a pull request for this issue: https://github.com/apache/spark/pull/4509 YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode - Key: SPARK-5655 URL: https://issues.apache.org/jira/browse/SPARK-5655 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Environment: Both CDH5.3.0 and CDH5.1.3, latest build on branch-1.2 Reporter: Andrew Rowson Labels: hadoop When running a Spark job on a YARN cluster which doesn't run containers under the same user as the nodemanager, and also when using the YARN auxiliary shuffle service, jobs fail with something similar to:
{code:java}
java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied)
{code}
The root cause of this is here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287
Spark will attempt to chmod 700 any application directories it creates during the job, which includes files created in the nodemanager's usercache directory. The owner of these files is the container UID, which on a secure cluster is the name of the user creating the job, and on a nonsecure cluster with yarn.nodemanager.container-executor.class configured is the value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user.
The problem with this is that the auxiliary shuffle manager runs as part of the nodemanager, which is typically running as the user 'yarn'. It can't access these files, which are only owner-readable.
YARN already attempts to secure files created under appcache but keep them readable by the nodemanager, by setting the group of the appcache directory to 'yarn' and also setting the setgid flag. This means that files and directories created under this should also have the 'yarn' group. Normally this means that the nodemanager should also be able to read these files, but Spark setting chmod 700 wipes this out.
I'm not sure what the right approach is here. Commenting out the chmod 700 functionality makes this work on YARN, and still makes the application files only readable by the owner and the group:
{code}
/data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah
total 206M
drwxr-s--- 2 nobody yarn 4.0K Feb 6 18:30 .
drwxr-s--- 12 nobody yarn 4.0K Feb 6 18:30 ..
-rw-r- 1 nobody yarn 206M Feb 6 18:30 shuffle_0_0_0.data
{code}
But this may not be the right approach on non-YARN deployments. Perhaps an additional check is required to see whether this chmod 700 step is necessary (i.e. non-YARN). Sadly, I don't have a non-YARN environment to test, otherwise I'd be able to suggest a patch. I believe this is a related issue in the MapReduce framework: https://issues.apache.org/jira/browse/MAPREDUCE-3728 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5717) add sc.stop to LDA examples
[ https://issues.apache.org/jira/browse/SPARK-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5717: - Fix Version/s: 1.3.0 add sc.stop to LDA examples --- Key: SPARK-5717 URL: https://issues.apache.org/jira/browse/SPARK-5717 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: yuhao yang Assignee: yuhao yang Priority: Trivial Fix For: 1.3.0 Original Estimate: 1h Remaining Estimate: 1h Trivial. add sc stop and reorganize import in LDAExample and JavaLDAExample -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5720) `Create Table Like` in HiveContext need support `like registered temporary table`
[ https://issues.apache.org/jira/browse/SPARK-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Sheng updated SPARK-5720: Issue Type: Improvement (was: New Feature) `Create Table Like` in HiveContext need support `like registered temporary table` - Key: SPARK-5720 URL: https://issues.apache.org/jira/browse/SPARK-5720 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Li Sheng Fix For: 1.3.0 Original Estimate: 72h Remaining Estimate: 72h -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5720) `Create Table Like` in HiveContext need support `like registered temporary table`
[ https://issues.apache.org/jira/browse/SPARK-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Sheng updated SPARK-5720: Summary: `Create Table Like` in HiveContext need support `like registered temporary table` (was: Support `Create Table Like` in HiveContext) `Create Table Like` in HiveContext need support `like registered temporary table` - Key: SPARK-5720 URL: https://issues.apache.org/jira/browse/SPARK-5720 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.3.0 Reporter: Li Sheng Fix For: 1.3.0 Original Estimate: 72h Remaining Estimate: 72h -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2653) Heap size should be the sum of driver.memory and executor.memory in local mode
[ https://issues.apache.org/jira/browse/SPARK-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314441#comment-14314441 ] Davies Liu commented on SPARK-2653: --- [~andrewor14] Could you help to answer the question here? I'm not familiar with SparkSubmit. Heap size should be the sum of driver.memory and executor.memory in local mode -- Key: SPARK-2653 URL: https://issues.apache.org/jira/browse/SPARK-2653 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Davies Liu Priority: Minor Original Estimate: 1h Remaining Estimate: 1h In local mode, the driver and executor run in the same JVM, so the heap size of JVM should be the sum of spark.driver.memory and spark.executor.memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
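A minimal sketch of the behaviour SPARK-2653 asks for (not the actual SparkSubmit change; the helper and its parsing rules are illustrative):
{code}
// In local mode the driver and executor share one JVM, so the usable heap
// should be spark.driver.memory + spark.executor.memory rather than the
// driver value alone. Sketch only.
def memoryStringToMb(s: String): Long = {
  val lower = s.toLowerCase
  if (lower.endsWith("g")) lower.dropRight(1).toLong * 1024
  else if (lower.endsWith("m")) lower.dropRight(1).toLong
  else lower.toLong / (1024 * 1024)  // assume plain bytes otherwise
}

def localModeHeapMb(driverMemory: String, executorMemory: String): Long =
  memoryStringToMb(driverMemory) + memoryStringToMb(executorMemory)

// e.g. localModeHeapMb("2g", "4g") == 6144
{code}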
[jira] [Resolved] (SPARK-5717) add sc.stop to LDA examples
[ https://issues.apache.org/jira/browse/SPARK-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5717. -- Resolution: Fixed Assignee: yuhao yang Target Version/s: 1.3.0 Resolved by https://github.com/apache/spark/commit/6cc96cf0c3ea87ab65d42a59725959d94701577b add sc.stop to LDA examples --- Key: SPARK-5717 URL: https://issues.apache.org/jira/browse/SPARK-5717 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: yuhao yang Assignee: yuhao yang Priority: Trivial Original Estimate: 1h Remaining Estimate: 1h Trivial. add sc stop and reorganize import in LDAExample and JavaLDAExample -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5239) JdbcRDD throws java.lang.AbstractMethodError: oracle.jdbc.driver.xxxxxx.isClosed()Z
[ https://issues.apache.org/jira/browse/SPARK-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5239: - Assignee: Gankun Luo JdbcRDD throws java.lang.AbstractMethodError: oracle.jdbc.driver.xx.isClosed()Z - Key: SPARK-5239 URL: https://issues.apache.org/jira/browse/SPARK-5239 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.1, 1.2.0 Environment: centos6.4 + ojdbc14 Reporter: Gankun Luo Assignee: Gankun Luo Priority: Minor Fix For: 1.3.0 I tried to use JdbcRDD to operate on a table in an Oracle database, but it failed. My test code is as follows:
{code}
import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.SparkConf

object JdbcRDD4Oracle {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcRDD4Oracle").setMaster("local[2]"))
    val rdd = new JdbcRDD(sc, () => getConnection, getSQL, 12987, 13055, 3,
      r => { (r.getObject("HISTORY_ID"), r.getObject("APPROVE_OPINION")) })
    println(rdd.collect.toList)
    sc.stop()
  }

  def getConnection() = {
    Class.forName("oracle.jdbc.driver.OracleDriver").newInstance()
    DriverManager.getConnection("jdbc:oracle:thin:@hadoop000:1521/ORCL", "scott", "tiger")
  }

  def getSQL() = {
    "select HISTORY_ID, APPROVE_OPINION from CI_APPROVE_HISTORY WHERE HISTORY_ID >= ? AND HISTORY_ID <= ?"
  }
}
{code}
Running the example, I get the following exception:
{code}
09:56:48,302 [Executor task launch worker-0] ERROR Logging$class : Error in TaskCompletionListener
java.lang.AbstractMethodError: oracle.jdbc.driver.OracleResultSetImpl.isClosed()Z
    at org.apache.spark.rdd.JdbcRDD$$anon$1.close(JdbcRDD.scala:99)
    at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63)
    at org.apache.spark.rdd.JdbcRDD$$anon$1$$anonfun$1.apply(JdbcRDD.scala:71)
    at org.apache.spark.rdd.JdbcRDD$$anon$1$$anonfun$1.apply(JdbcRDD.scala:71)
    at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:85)
    at org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:110)
    at org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:108)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContext.markTaskCompleted(TaskContext.scala:108)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:64)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
09:56:48,302 [Executor task launch worker-1] ERROR Logging$class : Error in TaskCompletionListener
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
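The AbstractMethodError in SPARK-5239 comes from calling ResultSet.isClosed() on a driver compiled against an older JDBC spec (ojdbc14). A hedged sketch of a defensive close, not necessarily the exact change in the linked pull request:
{code}
// Close a ResultSet without assuming the driver implements isClosed():
// pre-JDBC-4 drivers such as ojdbc14 throw AbstractMethodError for it.
import java.sql.ResultSet

def closeQuietly(rs: ResultSet): Unit = {
  if (rs != null) {
    try {
      if (!rs.isClosed) rs.close()
    } catch {
      case _: AbstractMethodError =>
        // Old driver: isClosed() is missing, so just attempt the close.
        try rs.close() catch { case _: Exception => }
      case _: Exception =>
    }
  }
}
{code}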
[jira] [Created] (SPARK-5712) Semicolon at end of a comment line
Adrian Wang created SPARK-5712: -- Summary: Semicolon at end of a comment line Key: SPARK-5712 URL: https://issues.apache.org/jira/browse/SPARK-5712 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5239) JdbcRDD throws java.lang.AbstractMethodError: oracle.jdbc.driver.xxxxxx.isClosed()Z
[ https://issues.apache.org/jira/browse/SPARK-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5239. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4470 [https://github.com/apache/spark/pull/4470] JdbcRDD throws java.lang.AbstractMethodError: oracle.jdbc.driver.xx.isClosed()Z - Key: SPARK-5239 URL: https://issues.apache.org/jira/browse/SPARK-5239 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.1, 1.2.0 Environment: centos6.4 + ojdbc14 Reporter: Gankun Luo Priority: Minor Fix For: 1.3.0 I tried to use JdbcRDD to operate on a table in an Oracle database, but it failed. My test code is as follows:
{code}
import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.SparkConf

object JdbcRDD4Oracle {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcRDD4Oracle").setMaster("local[2]"))
    val rdd = new JdbcRDD(sc, () => getConnection, getSQL, 12987, 13055, 3,
      r => { (r.getObject("HISTORY_ID"), r.getObject("APPROVE_OPINION")) })
    println(rdd.collect.toList)
    sc.stop()
  }

  def getConnection() = {
    Class.forName("oracle.jdbc.driver.OracleDriver").newInstance()
    DriverManager.getConnection("jdbc:oracle:thin:@hadoop000:1521/ORCL", "scott", "tiger")
  }

  def getSQL() = {
    "select HISTORY_ID, APPROVE_OPINION from CI_APPROVE_HISTORY WHERE HISTORY_ID >= ? AND HISTORY_ID <= ?"
  }
}
{code}
Running the example, I get the following exception:
{code}
09:56:48,302 [Executor task launch worker-0] ERROR Logging$class : Error in TaskCompletionListener
java.lang.AbstractMethodError: oracle.jdbc.driver.OracleResultSetImpl.isClosed()Z
    at org.apache.spark.rdd.JdbcRDD$$anon$1.close(JdbcRDD.scala:99)
    at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63)
    at org.apache.spark.rdd.JdbcRDD$$anon$1$$anonfun$1.apply(JdbcRDD.scala:71)
    at org.apache.spark.rdd.JdbcRDD$$anon$1$$anonfun$1.apply(JdbcRDD.scala:71)
    at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:85)
    at org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:110)
    at org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:108)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContext.markTaskCompleted(TaskContext.scala:108)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:64)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
09:56:48,302 [Executor task launch worker-1] ERROR Logging$class : Error in TaskCompletionListener
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5700) Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313789#comment-14313789 ] Apache Spark commented on SPARK-5700: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/4499 Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5700 URL: https://issues.apache.org/jira/browse/SPARK-5700 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Labels: flaky-test This is a follow-up ticket for SPARK-5671 and SPARK-5696. JetS3t 0.9.2 contains a log4j.properties file inside the artifact and breaks our tests (see SPARK-5696). This is fixed in 0.9.3. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313788#comment-14313788 ] Cheng Lian commented on SPARK-5671: --- Thanks, Josh. PR submitted. Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5671 URL: https://issues.apache.org/jira/browse/SPARK-5671 Project: Spark Issue Type: Improvement Components: Build Reporter: Josh Rosen Assignee: Josh Rosen Fix For: 1.3.0 Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and hadoop-2.4 profiles fixes a dependency conflict issue that was causing UISeleniumSuite tests to fail with ClassNotFoundExceptions in the with YARN builds. Jets3t release notes can be found here: http://www.jets3t.org/RELEASE_NOTES.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-5671: -- Comment: was deleted (was: Thanks, Josh. PR submitted.) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5671 URL: https://issues.apache.org/jira/browse/SPARK-5671 Project: Spark Issue Type: Improvement Components: Build Reporter: Josh Rosen Assignee: Josh Rosen Fix For: 1.3.0 Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and hadoop-2.4 profiles fixes a dependency conflict issue that was causing UISeleniumSuite tests to fail with ClassNotFoundExceptions in the with YARN builds. Jets3t release notes can be found here: http://www.jets3t.org/RELEASE_NOTES.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313790#comment-14313790 ] Cheng Lian commented on SPARK-5671: --- Thanks, Josh. PR submitted. Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5671 URL: https://issues.apache.org/jira/browse/SPARK-5671 Project: Spark Issue Type: Improvement Components: Build Reporter: Josh Rosen Assignee: Josh Rosen Fix For: 1.3.0 Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and hadoop-2.4 profiles fixes a dependency conflict issue that was causing UISeleniumSuite tests to fail with ClassNotFoundExceptions in the with YARN builds. Jets3t release notes can be found here: http://www.jets3t.org/RELEASE_NOTES.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5700) Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313793#comment-14313793 ] Cheng Lian commented on SPARK-5700: --- Thanks, Josh. PR submitted. Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5700 URL: https://issues.apache.org/jira/browse/SPARK-5700 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Labels: flaky-test This is a follow-up ticket for SPARK-5671 and SPARK-5696. JetS3t 0.9.2 contains a log4j.properties file inside the artifact and breaks our tests (see SPARK-5696). This is fixed in 0.9.3. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-5700) Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-5700: -- Comment: was deleted (was: Thanks, Josh. PR submitted.) Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5700 URL: https://issues.apache.org/jira/browse/SPARK-5700 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Labels: flaky-test This is a follow-up ticket for SPARK-5671 and SPARK-5696. JetS3t 0.9.2 contains a log4j.properties file inside the artifact and breaks our tests (see SPARK-5696). This is fixed in 0.9.3. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5712) Semicolon at end of a comment line
[ https://issues.apache.org/jira/browse/SPARK-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313807#comment-14313807 ] Apache Spark commented on SPARK-5712: - User 'adrian-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/4500 Semicolon at end of a comment line -- Key: SPARK-5712 URL: https://issues.apache.org/jira/browse/SPARK-5712 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang HIVE-3348 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5712) Semicolon at end of a comment line
[ https://issues.apache.org/jira/browse/SPARK-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-5712: --- Description: HIVE-3348 Semicolon at end of a comment line -- Key: SPARK-5712 URL: https://issues.apache.org/jira/browse/SPARK-5712 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang HIVE-3348 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5700) Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313794#comment-14313794 ] Cheng Lian commented on SPARK-5700: --- Thanks, Josh. PR submitted. Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5700 URL: https://issues.apache.org/jira/browse/SPARK-5700 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Labels: flaky-test This is a follow-up ticket for SPARK-5671 and SPARK-5696. JetS3t 0.9.2 contains a log4j.properties file inside the artifact and breaks our tests (see SPARK-5696). This is fixed in 0.9.3. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4900) MLlib SingularValueDecomposition ARPACK IllegalStateException
[ https://issues.apache.org/jira/browse/SPARK-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313863#comment-14313863 ] Sean Owen commented on SPARK-4900: -- Yeah, that fixes the error at least. I imagine the rest is a function of the input and ARPACK, but if there's evidence that something is wrong in the Spark in between, you can reopen. MLlib SingularValueDecomposition ARPACK IllegalStateException -- Key: SPARK-4900 URL: https://issues.apache.org/jira/browse/SPARK-4900 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.1.1, 1.2.0, 1.2.1 Environment: Ubuntu 1410, Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode) spark local mode Reporter: Mike Beyer Assignee: Sean Owen Fix For: 1.3.0 java.lang.reflect.InvocationTargetException ... Caused by: java.lang.IllegalStateException: ARPACK returns non-zero info = 3 Please refer ARPACK user guide for error message. at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:120) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:235) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:171) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5558) pySpark zip function unexpected errors
[ https://issues.apache.org/jira/browse/SPARK-5558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5558. -- Resolution: Fixed Fix Version/s: 1.3.0 Could be related to how https://issues.apache.org/jira/browse/SPARK-5351 was fixed. OK, let's close for now if it seems to be verified as fixed for 1.3. pySpark zip function unexpected errors -- Key: SPARK-5558 URL: https://issues.apache.org/jira/browse/SPARK-5558 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Charles Hayden Labels: pyspark Fix For: 1.3.0 Example:
{quote}
x = sc.parallelize(range(0,5))
y = x.map(lambda x: x+1000, preservesPartitioning=True)
y.take(10)
x.zip(y).collect()
{quote}
Fails in the JVM: py4J: org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition. If the range is changed to range(0,1000) it fails in pySpark code: ValueError: Can not deserialize RDD with different number of items in pair: (100, 1). It also fails if y.take(10) is replaced with y.toDebugString(). It even fails if we print y._jrdd. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
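A common workaround for the SPARK-5558 situation, shown here in Scala (the same pattern applies in PySpark), is to index both RDDs and join them instead of relying on zip's requirement that both sides have identical partition layouts; this is a usage sketch, not the fix that closed the ticket:
{code}
// zip() requires the same number of partitions *and* the same number of
// elements per partition; zipWithIndex + join only requires equal lengths.
val x = sc.parallelize(0 until 5)
val y = x.map(_ + 1000)

val zipped = x.zipWithIndex.map(_.swap)
  .join(y.zipWithIndex.map(_.swap))
  .sortByKey()
  .values

// zipped.collect() => Array((0,1000), (1,1001), (2,1002), (3,1003), (4,1004))
{code}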
[jira] [Commented] (SPARK-4208) stack over flow error while using sqlContext.sql
[ https://issues.apache.org/jira/browse/SPARK-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313871#comment-14313871 ] seekerak commented on SPARK-4208: - Me too. My env:
=
centos 6.3, jdk 1.7, spark-1.2.0-bin-hadoop2.4.1
=
My code:
=
val sparkConf = new SparkConf().setAppName("action_sql").setMaster("spark://xxx:7077");
val sc = new SparkContext(sparkConf);
val sqlContext = new org.apache.spark.sql.SQLContext(sc);
val file = sc.textFile("/hosts");
val schemaString = "ip host";
val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)));
val rowRDD = file.map(_.split(" ")).map(p => Row(p(0), p(1).trim()));
val ip_hostRDD = sqlContext.applySchema(rowRDD, schema);
ip_hostRDD.registerTempTable("ip_host_table");
val hosts = sqlContext.sql("select ip from ip_host_table");
hosts.map(t => "ip: " + t(0)).collect().foreach(println);
=
stack over flow error while using sqlContext.sql Key: SPARK-4208 URL: https://issues.apache.org/jira/browse/SPARK-4208 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.1.0 Environment: windows 7 , prebuilt spark-1.1.0-bin-hadoop2.3 Reporter: milq Labels: java, spark, sparkcontext, sql error happens when using sqlContext.sql
14/11/03 18:54:43 INFO BlockManager: Removing block broadcast_1
14/11/03 18:54:43 INFO MemoryStore: Block broadcast_1 of size 2976 dropped from memory (free 28010260
14/11/03 18:54:43 INFO ContextCleaner: Cleaned broadcast 1
root
|-- firstName : string (nullable = true)
|-- lastNameX: string (nullable = true)
Exception in thread "main" java.lang.StackOverflowError
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5714) Refactor initial step of LDA to remove redundant operations
Liang-Chi Hsieh created SPARK-5714: -- Summary: Refactor initial step of LDA to remove redundant operations Key: SPARK-5714 URL: https://issues.apache.org/jira/browse/SPARK-5714 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Liang-Chi Hsieh Priority: Minor The initialState of LDA performs several RDD operations that look redundant. This PR tries to simplify these operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4906) Spark master OOMs with exception stack trace stored in JobProgressListener
[ https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313947#comment-14313947 ] Matt Cheah edited comment on SPARK-4906 at 2/10/15 10:15 AM: - Spark has logic for failing a stage if there are too many task failures. Keeping the entire UI state is problematic however even without stack traces. Just having a large number of jobs accumulating in the master along with each of those jobs having a large number of tasks can bloat the heap on the master because of the UI state. I don't see why we can't make JobProgressListener use a Spillable object or something similar to keep some of the UI state on disk. Maybe even maintain the state as compressed bytes in memory, if we don't want to deal with the hassles of disk spilling? was (Author: mcheah): Spark has logic for failing a stage if there are too many task failures. Keeping the entire UI state is problematic however even without stack traces. Just having a large number of jobs accumulating in the master along with each of those jobs having a large number of tasks can bloat the heap on the master because of the UI state. Spark master OOMs with exception stack trace stored in JobProgressListener -- Key: SPARK-4906 URL: https://issues.apache.org/jira/browse/SPARK-4906 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.1.1 Reporter: Mingyu Kim Spark master was OOMing with a lot of stack traces retained in JobProgressListener. The object dependency goes like the following. JobProgressListener.stageIdToData = StageUIData.taskData = TaskUIData.errorMessage Each error message is ~10kb since it has the entire stack trace. As we have a lot of tasks, when all of the tasks across multiple stages go bad, these error messages accounted for 0.5GB of heap at some point. Please correct me if I'm wrong, but it looks like all the task info for running applications are kept in memory, which means it's almost always bound to OOM for long-running applications. Would it make sense to fix this, for example, by spilling some UI states to disk? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
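Until UI state is spilled or compressed as suggested in SPARK-4906, the practical mitigation is to cap how much state the listener retains. A small example using the existing retention settings (exact defaults vary by version):
{code}
// Bound the UI bookkeeping that JobProgressListener keeps on the driver/master:
// completed jobs and stages beyond these limits are evicted from the web UI.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "200")    // lower than the default for long-running applications
  .set("spark.ui.retainedStages", "200")
{code}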
[jira] [Created] (SPARK-5716) Support TOK_CHARSETLITERAL in HiveQl
Adrian Wang created SPARK-5716: -- Summary: Support TOK_CHARSETLITERAL in HiveQl Key: SPARK-5716 URL: https://issues.apache.org/jira/browse/SPARK-5716 Project: Spark Issue Type: New Feature Components: SQL Environment: _UTF8 0x12345678 Reporter: Adrian Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5716) Support TOK_CHARSETLITERAL in HiveQl
[ https://issues.apache.org/jira/browse/SPARK-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-5716: --- Description: where value = _UTF8 0x12345678 Environment: (was: _UTF8 0x12345678) Support TOK_CHARSETLITERAL in HiveQl Key: SPARK-5716 URL: https://issues.apache.org/jira/browse/SPARK-5716 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang where value = _UTF8 0x12345678 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5717) add sc.stop to LDA examples
yuhao yang created SPARK-5717: - Summary: add sc.stop to LDA examples Key: SPARK-5717 URL: https://issues.apache.org/jira/browse/SPARK-5717 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: yuhao yang Priority: Trivial Trivial. add sc stop and reorganize import in LDAExample and JavaLDAExample -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5566) Tokenizer for mllib package
[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313996#comment-14313996 ] Augustin Borsu commented on SPARK-5566: --- We could use a tokenizer like this, but we would need to add regex and Array[String] parameter types to be able to change those parameters in a cross-validation. https://github.com/apache/spark/pull/4504 Tokenizer for mllib package --- Key: SPARK-5566 URL: https://issues.apache.org/jira/browse/SPARK-5566 Project: Spark Issue Type: New Feature Components: ML, MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley There exist tokenizer classes in the spark.ml.feature package and in the LDAExample in the spark.examples.mllib package. The Tokenizer in the LDAExample is more advanced and should be made into a full-fledged public class in spark.mllib.feature. The spark.ml.feature.Tokenizer class should become a wrapper around the new Tokenizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
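A hedged sketch of a parameterised tokenizer along the lines of the SPARK-5566 comment; this is plain Scala rather than the spark.ml Transformer API, so the regex and the stop-word list appear as the two tunable parameters, and all names are illustrative:
{code}
// Minimal tokenizer whose behaviour is driven by two parameters that a
// cross-validator could vary: the token pattern and a stop-word list.
class SimpleRegexTokenizer(pattern: String, stopWords: Array[String]) extends Serializable {
  private val regex = pattern.r
  private val stop = stopWords.toSet

  def tokenize(text: String): Seq[String] =
    regex.findAllIn(text.toLowerCase).filterNot(stop.contains).toSeq
}

// Usage: val tok = new SimpleRegexTokenizer("[a-z']+", Array("the", "a", "of"))
//        tok.tokenize("The quick brown fox") == Seq("quick", "brown", "fox")
{code}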
[jira] [Commented] (SPARK-5676) License missing from spark-ec2 repo
[ https://issues.apache.org/jira/browse/SPARK-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313995#comment-14313995 ] Adam B commented on SPARK-5676: --- I agree that spark-ec2 has less to do with Mesos than even Spark itself does. After graduating from the incubator, Mesos lives at github.com/apache/mesos and the remaining projects under the mesos github org are mostly Mesos frameworks or other Mesos ecosystem components. To be honest, spark-ec2 is the only one that doesn't fit into that category. While we are happy to continue hosting spark-ec2 under the Mesos repo, even adding new Collaborators with Write access as needed, it probably makes sense to transfer ownership of the repo to a Spark or Berkeley-owned github org. Just ping d...@mesos.apache.org if you need any assistance from us. License missing from spark-ec2 repo --- Key: SPARK-5676 URL: https://issues.apache.org/jira/browse/SPARK-5676 Project: Spark Issue Type: Bug Components: EC2 Reporter: Florian Verhein There is no LICENSE file or licence headers in the code in the spark-ec2 repo. Also, I believe there is no contributor license agreement notification in place (like there is in the main spark repo). It would be great to fix this (sooner better than later while contributors list is small), so that users wishing to use this part of Spark are not in doubt over licensing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5705) Explore GPU-accelerated Linear Algebra Libraries
[ https://issues.apache.org/jira/browse/SPARK-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5705. -- Resolution: Duplicate This already had a discussion going; let's merge the two. Explore GPU-accelerated Linear Algebra Libraries Key: SPARK-5705 URL: https://issues.apache.org/jira/browse/SPARK-5705 Project: Spark Issue Type: Bug Components: MLlib Reporter: Evan Sparks Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4360) task only execute on one node when spark on yarn
[ https://issues.apache.org/jira/browse/SPARK-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313898#comment-14313898 ] seekerak commented on SPARK-4360: - i have resolved this issue by configure yarn scheduler like this:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
or
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
and
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
so the real reason is yarn's resource scheduler, if one node can provide all resource that tasks required, all task maybe run on one node only. task only execute on one node when spark on yarn Key: SPARK-4360 URL: https://issues.apache.org/jira/browse/SPARK-4360 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.2 Reporter: seekerak hadoop version: hadoop 2.0.3-alpha spark version: 1.0.2 when i run spark jobs on yarn, i found all the task only run on one node, my cluster has 4 nodes, executors has 3, but only one has task, the others hasn't, my command like this : /opt/hadoopcluster/spark-1.0.2-bin-hadoop2/bin/spark-submit --class org.sr.scala.Spark_LineCount_G0 --executor-memory 2G --num-executors 12 --master yarn-cluster /home/Spark_G0.jar /data /output/ou_1 is there any one knows why? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5713) Support python serialization for RandomForest
Guillaume Charhon created SPARK-5713: Summary: Support python serialization for RandomForest Key: SPARK-5713 URL: https://issues.apache.org/jira/browse/SPARK-5713 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Environment: Tested on MacOS Reporter: Guillaume Charhon I was trying to pickle a trained random forest model. Unfortunately, it is impossible to serialize the model for future use.
model = RandomForest.trainRegressor(trainingData, categoricalFeaturesInfo={}, numTrees=nb_tree, featureSubsetStrategy="auto", impurity='variance', maxDepth=depth)
output = open('model.ml', 'wb')
pickle.dump(model, output)
I am getting this error: TypeError: can't pickle lock objects. I am using Spark 1.2.0. I have also tested Spark 1.2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5714) Refactor initial step of LDA to remove redundant operations
[ https://issues.apache.org/jira/browse/SPARK-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313917#comment-14313917 ] Apache Spark commented on SPARK-5714: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/4501 Refactor initial step of LDA to remove redundant operations --- Key: SPARK-5714 URL: https://issues.apache.org/jira/browse/SPARK-5714 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Liang-Chi Hsieh Priority: Minor The initialState of LDA performs several RDD operations that look redundant. This PR tries to simplify these operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5667) Remove version from spark-ec2 example.
[ https://issues.apache.org/jira/browse/SPARK-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Peralvo closed SPARK-5667. - Remove version from spark-ec2 example. -- Key: SPARK-5667 URL: https://issues.apache.org/jira/browse/SPARK-5667 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.2.0, 1.3.0, 1.2.1, 1.2.2 Reporter: Miguel Peralvo Priority: Trivial Labels: documentation Fix For: 1.3.0 Remove version from spark-ec2 example for spark-ec2/Launch Cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5715) Shuffle size increase, performance loss from Spark 1.1.0 to Spark 1.2.0 (and 1.2.1)
Dr. Christian Betz created SPARK-5715: - Summary: Shuffle size increase, performance loss from Spark 1.1.0 to Spark 1.2.0 (and 1.2.1) Key: SPARK-5715 URL: https://issues.apache.org/jira/browse/SPARK-5715 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0, 1.2.1 Environment: Running with local[*] as master (experienced it during pre-integration test, not in production cluster), 100GByte memory assigned, 16 core machine. Reporter: Dr. Christian Betz I see a *factor four performance loss* in my Spark jobs when migrating from Spark 1.1.0 to Spark 1.2.0 or 1.2.1. Also, I see an *increase in the size of shuffle writes* (which is also reported by Kevin Jung on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tt20894.html Together with this I experience a *huge number of disk spills*. I'm experiencing these with my job under the following circumstances: * Spark 1.2.0 with Sort-based Shuffle * Spark 1.2.0 with Hash-based Shuffle * Spark 1.2.1 with Sort-based Shuffle All three combinations show the same behavior, which contrasts from Spark 1.1.0. In Spark 1.1.0, my job runs for about an hour, in Spark 1.2.x it runs for almost four hours. Configuration is identical otherwise - I only added org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102. As a consequence (I think, but causality might be different) I see lots and lots of disk spills. I cannot provide a small test case, but maybe the log entries for a single worker thread can help someone investigate on this. (See below.) I will also open up an issue, if nobody stops me by providing an answer ;) Any help will be greatly appreciated, because otherwise I'm stuck with Spark 1.1.0, as quadrupling runtime is not an option. Sincerely, Chris 2015-02-09T14:06:06.328+01:00 INFO org.apache.spark.executor.Executor Running task 9.0 in stage 18.0 (TID 300) Executor task launch worker-18 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.CacheManager Partition rdd_35_9 not found, computing it Executor task launch worker-18 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18 2015-02-09T14:06:07.396+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(2582904) called with curMem=300174944, maxMe... Executor task launch worker-18 2015-02-09T14:06:07.397+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_35_9 stored as bytes in memory (estimated size 2.5... Executor task launch worker-18 2015-02-09T14:06:07.398+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_35_9 Executor task launch worker-18 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.CacheManager Partition rdd_38_9 not found, computing it Executor task launch worker-18 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18 2015-02-09T14:06:07.400+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18 2015-02-09T14:06:07.567+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(944848) called with curMem=302757848, maxMem... 
Executor task launch worker-18 2015-02-09T14:06:07.568+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_38_9 stored as values in memory (estimated size 92... Executor task launch worker-18 2015-02-09T14:06:07.569+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_38_9 Executor task launch worker-18 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.CacheManager Partition rdd_41_9 not found, computing it Executor task launch worker-18 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 3 non-empty blocks out of 10 blocks Executor task launch worker-18 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18 2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(0)
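For reference, the Kryo registration mentioned in the SPARK-5715 description (the workaround for SPARK-5102 on 1.2.0) looks roughly like this; the registrator class name is whatever the job already configures via spark.kryo.registrator:
{code}
// Register CompressedMapStatus so Kryo can serialize shuffle map-status objects
// on Spark 1.2.0 (see SPARK-5102). Sketch of a typical custom registrator.
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Class.forName avoids compiling against the private[spark] class directly.
    kryo.register(Class.forName("org.apache.spark.scheduler.CompressedMapStatus"))
    // ... plus the application's own classes
  }
}
{code}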
[jira] [Commented] (SPARK-4906) Spark master OOMs with exception stack trace stored in JobProgressListener
[ https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313947#comment-14313947 ] Matt Cheah commented on SPARK-4906: --- Spark has logic for failing a stage if there are too many task failures. Keeping the entire UI state is problematic however even without stack traces. Just having a large number of jobs accumulating in the master along with each of those jobs having a large number of tasks can bloat the heap on the master because of the UI state. Spark master OOMs with exception stack trace stored in JobProgressListener -- Key: SPARK-4906 URL: https://issues.apache.org/jira/browse/SPARK-4906 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.1.1 Reporter: Mingyu Kim Spark master was OOMing with a lot of stack traces retained in JobProgressListener. The object dependency goes like the following. JobProgressListener.stageIdToData = StageUIData.taskData = TaskUIData.errorMessage Each error message is ~10kb since it has the entire stack trace. As we have a lot of tasks, when all of the tasks across multiple stages go bad, these error messages accounted for 0.5GB of heap at some point. Please correct me if I'm wrong, but it looks like all the task info for running applications are kept in memory, which means it's almost always bound to OOM for long-running applications. Would it make sense to fix this, for example, by spilling some UI states to disk? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5716) Support TOK_CHARSETLITERAL in HiveQl
[ https://issues.apache.org/jira/browse/SPARK-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313955#comment-14313955 ] Apache Spark commented on SPARK-5716: - User 'adrian-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/4502 Support TOK_CHARSETLITERAL in HiveQl Key: SPARK-5716 URL: https://issues.apache.org/jira/browse/SPARK-5716 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang where value = _UTF8 0x12345678 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5700) Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-5700. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4499 [https://github.com/apache/spark/pull/4499] Bump jets3t version from 0.9.2 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5700 URL: https://issues.apache.org/jira/browse/SPARK-5700 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Labels: flaky-test Fix For: 1.3.0 This is a follow-up ticket for SPARK-5671 and SPARK-5696. JetS3t 0.9.2 contains a log4j.properties file inside the artifact and breaks our tests (see SPARK-5696). This is fixed in 0.9.3. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5717) add sc.stop to LDA examples
[ https://issues.apache.org/jira/browse/SPARK-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313968#comment-14313968 ] Apache Spark commented on SPARK-5717: - User 'hhbyyh' has created a pull request for this issue: https://github.com/apache/spark/pull/4503 add sc.stop to LDA examples --- Key: SPARK-5717 URL: https://issues.apache.org/jira/browse/SPARK-5717 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: yuhao yang Priority: Trivial Original Estimate: 1h Remaining Estimate: 1h Trivial. add sc stop and reorganize import in LDAExample and JavaLDAExample -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
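The change is simply about releasing resources when the examples finish; a minimal sketch of the pattern being asked for (not the actual LDAExample code) is:
{code}
// Sketch of the pattern the ticket asks for: stop the SparkContext when an example finishes.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("LDAExample"))
try {
  // ... run the LDA example ...
} finally {
  sc.stop()  // release executors and shut down cleanly
}
{code}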
[jira] [Comment Edited] (SPARK-4360) task only execute on one node when spark on yarn
[ https://issues.apache.org/jira/browse/SPARK-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313898#comment-14313898 ] seekerak edited comment on SPARK-4360 at 2/10/15 9:37 AM: -- I have resolved this issue by configuring the YARN scheduler like this: <property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property> or <property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value> </property> and <property> <name>yarn.scheduler.capacity.resource-calculator</name> <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value> </property> So the real reason is YARN's resource scheduler: if one node can provide all the resources that the tasks require, all tasks may run on that one node only. was (Author: omicronak): i have resolved this issue by configure yarn scheduler like this: <property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property> or <property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value> </property> and <property> <name>yarn.scheduler.capacity.resource-calculator</name> <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value> </property> so the real reason is yarn's resource scheduler, if one node can provide all resource that tasks required, all task maybe run on one node only. task only execute on one node when spark on yarn Key: SPARK-4360 URL: https://issues.apache.org/jira/browse/SPARK-4360 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.2 Reporter: seekerak hadoop version: hadoop 2.0.3-alpha spark version: 1.0.2 when i run spark jobs on yarn, i found all the task only run on one node, my cluster has 4 nodes, executors has 3, but only one has task, the others hasn't, my command like this : /opt/hadoopcluster/spark-1.0.2-bin-hadoop2/bin/spark-submit --class org.sr.scala.Spark_LineCount_G0 --executor-memory 2G --num-executors 12 --master yarn-cluster /home/Spark_G0.jar /data /output/ou_1 is there any one knows why? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2653) Heap size should be the sum of driver.memory and executor.memory in local mode
[ https://issues.apache.org/jira/browse/SPARK-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313940#comment-14313940 ] liu chang commented on SPARK-2653: -- I would like to fix it. I will submit a PR soon. Heap size should be the sum of driver.memory and executor.memory in local mode -- Key: SPARK-2653 URL: https://issues.apache.org/jira/browse/SPARK-2653 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Davies Liu Priority: Minor Original Estimate: 1h Remaining Estimate: 1h In local mode, the driver and executor run in the same JVM, so the heap size of the JVM should be the sum of spark.driver.memory and spark.executor.memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
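A minimal sketch of the proposed behaviour (an assumption about the eventual fix, not Spark's actual code): when the master is local, the single shared JVM would be sized to the sum of the two settings.
{code}
// Illustration only: how the requested heap could be computed for local mode,
// where driver and executor share one JVM.
def requestedHeapMb(driverMemoryMb: Int, executorMemoryMb: Int, master: String): Int =
  if (master.startsWith("local")) driverMemoryMb + executorMemoryMb  // one shared JVM
  else driverMemoryMb                                                // driver JVM only

// e.g. requestedHeapMb(1024, 2048, "local[*]") == 3072
{code}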
[jira] [Resolved] (SPARK-5715) Shuffle size increase, performance loss from Spark 1.1.0 to Spark 1.2.0 (and 1.2.1)
[ https://issues.apache.org/jira/browse/SPARK-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5715. -- Resolution: Duplicate Can you move your comment to the existing JIRA, SPARK-5081? Shuffle size increase, performance loss from Spark 1.1.0 to Spark 1.2.0 (and 1.2.1) --- Key: SPARK-5715 URL: https://issues.apache.org/jira/browse/SPARK-5715 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0, 1.2.1 Environment: Running with local[*] as master (experienced it during pre-integration test, not in production cluster), 100GByte memory assigned, 16 core machine. Reporter: Dr. Christian Betz Labels: performance I see a *factor four performance loss* in my Spark jobs when migrating from Spark 1.1.0 to Spark 1.2.0 or 1.2.1. Also, I see an *increase in the size of shuffle writes* (which is also reported by Kevin Jung on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tt20894.html Together with this I experience a *huge number of disk spills*. I'm experiencing these with my job under the following circumstances: * Spark 1.2.0 with Sort-based Shuffle * Spark 1.2.0 with Hash-based Shuffle * Spark 1.2.1 with Sort-based Shuffle All three combinations show the same behavior, which contrasts from Spark 1.1.0. In Spark 1.1.0, my job runs for about an hour, in Spark 1.2.x it runs for almost four hours. Configuration is identical otherwise - I only added org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102. As a consequence (I think, but causality might be different) I see lots and lots of disk spills. I cannot provide a small test case, but maybe the log entries for a single worker thread can help someone investigate on this. (See below.) I will also open up an issue, if nobody stops me by providing an answer ;) Any help will be greatly appreciated, because otherwise I'm stuck with Spark 1.1.0, as quadrupling runtime is not an option. Sincerely, Chris 2015-02-09T14:06:06.328+01:00 INFO org.apache.spark.executor.Executor Running task 9.0 in stage 18.0 (TID 300) Executor task launch worker-18 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.CacheManager Partition rdd_35_9 not found, computing it Executor task launch worker-18 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18 2015-02-09T14:06:07.396+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(2582904) called with curMem=300174944, maxMe... Executor task launch worker-18 2015-02-09T14:06:07.397+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_35_9 stored as bytes in memory (estimated size 2.5... 
Executor task launch worker-18 2015-02-09T14:06:07.398+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_35_9 Executor task launch worker-18 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.CacheManager Partition rdd_38_9 not found, computing it Executor task launch worker-18 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18 2015-02-09T14:06:07.400+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18 2015-02-09T14:06:07.567+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(944848) called with curMem=302757848, maxMem... Executor task launch worker-18 2015-02-09T14:06:07.568+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_38_9 stored as values in memory (estimated size 92... Executor task launch worker-18 2015-02-09T14:06:07.569+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_38_9 Executor task launch worker-18 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.CacheManager Partition rdd_41_9 not found, computing it Executor task launch worker-18 2015-02-09T14:06:38.931+01:00 INFO
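For readers who also hit the Kryo issue mentioned above (SPARK-5102), registering CompressedMapStatus looks roughly like the sketch below; the registrator class name is made up, and Class.forName is used because the class is internal to Spark.
{code}
// Sketch of a Kryo registrator that registers Spark's internal CompressedMapStatus,
// needed on 1.2.0 when Kryo registration is required (see SPARK-5102).
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(Class.forName("org.apache.spark.scheduler.CompressedMapStatus"))
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[MyRegistrator].getName)
{code}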
[jira] [Created] (SPARK-5719) allow UIs to bind to specified host
Tao Wang created SPARK-5719: --- Summary: allow UIs to bind to specified host Key: SPARK-5719 URL: https://issues.apache.org/jira/browse/SPARK-5719 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Tao Wang Priority: Minor Now web ui binds to 0.0.0.0. When multiple network plane is enabled, we may try to bind ui port to some specified ip address so that it is possible to do some firewall work(ip filter..etc..) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5719) allow daemons to bind to specified host
[ https://issues.apache.org/jira/browse/SPARK-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Wang updated SPARK-5719: Summary: allow daemons to bind to specified host (was: allow UIs to bind to specified host) allow daemons to bind to specified host --- Key: SPARK-5719 URL: https://issues.apache.org/jira/browse/SPARK-5719 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Tao Wang Priority: Minor Now web ui binds to 0.0.0.0. When multiple network plane is enabled, we may try to bind ui port to some specified ip address so that it is possible to do some firewall work(ip filter..etc..) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5719) allow daemons to bind to specified host
[ https://issues.apache.org/jira/browse/SPARK-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Wang updated SPARK-5719: Description: Now web ui binds to 0.0.0.0. When multiple network plane is enabled, we may try to bind ui port to some specified ip address so that it is possible to do some firewall work(ip filter..etc..) The added config items also work for daemons. was:Now web ui binds to 0.0.0.0. When multiple network plane is enabled, we may try to bind ui port to some specified ip address so that it is possible to do some firewall work(ip filter..etc..) allow daemons to bind to specified host --- Key: SPARK-5719 URL: https://issues.apache.org/jira/browse/SPARK-5719 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Tao Wang Priority: Minor Now web ui binds to 0.0.0.0. When multiple network plane is enabled, we may try to bind ui port to some specified ip address so that it is possible to do some firewall work(ip filter..etc..) The added config items also work for daemons. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5719) allow daemons to bind to specified host
[ https://issues.apache.org/jira/browse/SPARK-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314192#comment-14314192 ] Apache Spark commented on SPARK-5719: - User 'WangTaoTheTonic' has created a pull request for this issue: https://github.com/apache/spark/pull/4505 allow daemons to bind to specified host --- Key: SPARK-5719 URL: https://issues.apache.org/jira/browse/SPARK-5719 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Tao Wang Priority: Minor Now web ui binds to 0.0.0.0. When multiple network plane is enabled, we may try to bind ui port to some specified ip address so that it is possible to do some firewall work(ip filter..etc..) The added config items also work for daemons. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
[ https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314651#comment-14314651 ] Aaron Davidson commented on SPARK-3889: --- The only place we memory map in 1.1 is this method: https://github.com/apache/spark/blob/branch-1.1/core/src/main/scala/org/apache/spark/storage/DiskStore.scala#L106 This threshold is configurable with spark.storage.memoryMapThreshold -- we upped the default from 2 KB to 2 MB in 1.2, which you could try here as well. JVM dies with SIGBUS, resulting in ConnectionManager failed ACK --- Key: SPARK-3889 URL: https://issues.apache.org/jira/browse/SPARK-3889 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Aaron Davidson Assignee: Aaron Davidson Priority: Critical Fix For: 1.2.0 Here's the first part of the core dump, possibly caused by a job which shuffles a lot of very small partitions. {code} # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704 # # JRE version: 7.0_25-b30 # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 compressed oops) # Problematic frame: # v ~StubRoutines::jbyte_disjoint_arraycopy # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try ulimit -c unlimited before starting Java again # # If you would like to submit a bug report, please include # instructions on how to reproduce the bug and visit: # https://bugs.launchpad.net/ubuntu/+source/openjdk-7/ # --- T H R E A D --- Current thread (0x7fa4b0631000): JavaThread Executor task launch worker-170 daemon [_thread_in_Java, id=6783, stack(0x7fa4448ef000,0x7fa4449f)] siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), si_addr=0x7fa428f79000 {code} Here is the only useful content I can find related to JVM and SIGBUS from Google: https://bugzilla.redhat.com/show_bug.cgi?format=multipleid=976664 It appears it may be related to disposing byte buffers, which we do in the ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of them in BufferMessage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
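As a concrete illustration of the workaround Aaron describes, the threshold can be raised so small blocks are read with normal I/O instead of being memory-mapped; on 1.1 the value of this setting is interpreted in bytes.
{code}
// Illustration of the suggested workaround: raise spark.storage.memoryMapThreshold
// so small shuffle blocks are no longer memory-mapped (value in bytes on 1.1).
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.storage.memoryMapThreshold", (2 * 1024 * 1024).toString)  // 2 MB, the 1.2 default
{code}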
[jira] [Resolved] (SPARK-5716) Support TOK_CHARSETLITERAL in HiveQl
[ https://issues.apache.org/jira/browse/SPARK-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5716. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4502 [https://github.com/apache/spark/pull/4502] Support TOK_CHARSETLITERAL in HiveQl Key: SPARK-5716 URL: https://issues.apache.org/jira/browse/SPARK-5716 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang Fix For: 1.3.0 where value = _UTF8 0x12345678 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5721) Propagate missing external shuffle service errors to client
Kostas Sakellis created SPARK-5721: -- Summary: Propagate missing external shuffle service errors to client Key: SPARK-5721 URL: https://issues.apache.org/jira/browse/SPARK-5721 Project: Spark Issue Type: Bug Components: Spark Core, YARN Reporter: Kostas Sakellis When spark.shuffle.service.enabled=true, the yarn AM expects to find an aux service running in the namenode. If it cannot find one an exception like this is present in the app master logs. {noformat} Exception in thread ContainerLauncher #0 Exception in thread ContainerLauncher #1 java.lang.Error: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:206) at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:110) at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ... 2 more java.lang.Error: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:206) at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:110) at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ... 2 more {noformat} We should propagate this error to the driver (in yarn-client mode) because it is otherwise unclear why the number of executors expected are not starting up. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
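For context (not part of the ticket text): the external shuffle service runs as an auxiliary service inside each YARN NodeManager, so both the Spark side and the YARN side have to be configured. A rough sketch, with the yarn-site.xml part described in comments, is:
{code}
// Spark side: ask executors to hand shuffle files to the external service.
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.shuffle.service.enabled", "true")

// YARN side (described here as comments): each NodeManager's yarn-site.xml must list
// "spark_shuffle" under yarn.nodemanager.aux-services and map
// yarn.nodemanager.aux-services.spark_shuffle.class to
// org.apache.spark.network.yarn.YarnShuffleService, with the Spark YARN shuffle jar
// on the NodeManager classpath. If that auxiliary service is missing, container launch
// fails with the InvalidAuxServiceException shown above.
{code}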
[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4105: -- Affects Version/s: 1.2.1 1.3.0 FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle - Key: SPARK-4105 URL: https://issues.apache.org/jira/browse/SPARK-4105 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Affects Versions: 1.2.0, 1.3.0, 1.2.1 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Blocker We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during shuffle read. Here's a sample stacktrace from an executor: {code} 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 33053) java.io.IOException: FAILED_TO_UNCOMPRESS(5) at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391) at org.xerial.snappy.Snappy.uncompress(Snappy.java:427) at org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127) at org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) at org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58) at org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090) at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116) at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} Here's another occurrence of a similar error: {code} java.io.IOException: failed to read chunk org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:348)
[jira] [Updated] (SPARK-5655) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode
[ https://issues.apache.org/jira/browse/SPARK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5655: - Target Version/s: 1.3.0, 1.2.2 Affects Version/s: (was: 1.2.0) 1.2.1 YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode - Key: SPARK-5655 URL: https://issues.apache.org/jira/browse/SPARK-5655 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.1 Environment: Both CDH5.3.0 and CDH5.1.3, latest build on branch-1.2 Reporter: Andrew Rowson Priority: Blocker Labels: hadoop When running a Spark job on a YARN cluster which doesn't run containers under the same user as the nodemanager, and also when using the YARN auxiliary shuffle service, jobs fail with something similar to: {code:java} java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied) {code} The root cause of this here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287 Spark will attempt to chmod 700 any application directories it creates during the job, which includes files created in the nodemanager's usercache directory. The owner of these files is the container UID, which on a secure cluster is the name of the user creating the job, and on an nonsecure cluster but with the yarn.nodemanager.container-executor.class configured is the value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user. The problem with this is that the auxiliary shuffle manager runs as part of the nodemanager, which is typically running as the user 'yarn'. This can't access these files that are only owner-readable. YARN already attempts to secure files created under appcache but keep them readable by the nodemanager, by setting the group of the appcache directory to 'yarn' and also setting the setgid flag. This means that files and directories created under this should also have the 'yarn' group. Normally this means that the nodemanager should also be able to read these files, but Spark setting chmod700 wipes this out. I'm not sure what the right approach is here. Commenting out the chmod700 functionality makes this work on YARN, and still makes the application files only readable by the owner and the group: {code} /data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah total 206M drwxr-s--- 2 nobody yarn 4.0K Feb 6 18:30 . drwxr-s--- 12 nobody yarn 4.0K Feb 6 18:30 .. -rw-r- 1 nobody yarn 206M Feb 6 18:30 shuffle_0_0_0.data {code} But this may not be the right approach on non-YARN. Perhaps an additional step to see if this chmod700 step is necessary (ie non-YARN) is required. Sadly, I don't have a non-YARN environment to test, otherwise I'd be able to suggest a patch. I believe this is a related issue in the MapReduce framwork: https://issues.apache.org/jira/browse/MAPREDUCE-3728 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5655) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode
[ https://issues.apache.org/jira/browse/SPARK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5655: - Priority: Critical (was: Blocker) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode - Key: SPARK-5655 URL: https://issues.apache.org/jira/browse/SPARK-5655 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.1 Environment: Both CDH5.3.0 and CDH5.1.3, latest build on branch-1.2 Reporter: Andrew Rowson Priority: Critical Labels: hadoop When running a Spark job on a YARN cluster which doesn't run containers under the same user as the nodemanager, and also when using the YARN auxiliary shuffle service, jobs fail with something similar to: {code:java} java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied) {code} The root cause of this here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287 Spark will attempt to chmod 700 any application directories it creates during the job, which includes files created in the nodemanager's usercache directory. The owner of these files is the container UID, which on a secure cluster is the name of the user creating the job, and on an nonsecure cluster but with the yarn.nodemanager.container-executor.class configured is the value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user. The problem with this is that the auxiliary shuffle manager runs as part of the nodemanager, which is typically running as the user 'yarn'. This can't access these files that are only owner-readable. YARN already attempts to secure files created under appcache but keep them readable by the nodemanager, by setting the group of the appcache directory to 'yarn' and also setting the setgid flag. This means that files and directories created under this should also have the 'yarn' group. Normally this means that the nodemanager should also be able to read these files, but Spark setting chmod700 wipes this out. I'm not sure what the right approach is here. Commenting out the chmod700 functionality makes this work on YARN, and still makes the application files only readable by the owner and the group: {code} /data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah total 206M drwxr-s--- 2 nobody yarn 4.0K Feb 6 18:30 . drwxr-s--- 12 nobody yarn 4.0K Feb 6 18:30 .. -rw-r- 1 nobody yarn 206M Feb 6 18:30 shuffle_0_0_0.data {code} But this may not be the right approach on non-YARN. Perhaps an additional step to see if this chmod700 step is necessary (ie non-YARN) is required. Sadly, I don't have a non-YARN environment to test, otherwise I'd be able to suggest a patch. I believe this is a related issue in the MapReduce framwork: https://issues.apache.org/jira/browse/MAPREDUCE-3728 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
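A minimal sketch of the kind of conditional permission handling the reporter suggests (purely hypothetical, not Spark's code): only drop group access when no external service needs to read the directory.
{code}
// Hypothetical sketch: keep YARN's group-readable, setgid layout intact when the
// NodeManager's shuffle service must read the files; otherwise fall back to chmod 700.
import java.io.File

def secureLocalDir(dir: File, shuffleServiceNeedsAccess: Boolean): Unit = {
  if (!shuffleServiceNeedsAccess) {
    // owner-only access, equivalent to chmod 700 (what Utils.scala does today)
    dir.setReadable(false, false); dir.setReadable(true, true)
    dir.setWritable(false, false); dir.setWritable(true, true)
    dir.setExecutable(false, false); dir.setExecutable(true, true)
  }
  // else: leave the permissions YARN set up (group 'yarn' plus setgid) untouched
}
{code}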
[jira] [Created] (SPARK-5720) Support `Create Table xx Like xx` in HiveContext
Li Sheng created SPARK-5720: --- Summary: Support `Create Table xx Like xx` in HiveContext Key: SPARK-5720 URL: https://issues.apache.org/jira/browse/SPARK-5720 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.3.0 Reporter: Li Sheng Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
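The feature being requested is the standard HiveQL form shown below; a hypothetical invocation through HiveContext (table names are made up, and a spark-shell style `sc` is assumed) would be:
{code}
// Hypothetical example of the statement this ticket asks HiveContext to support:
// create an empty table with the same schema and layout as an existing one.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("CREATE TABLE new_orders LIKE orders")
{code}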
[jira] [Commented] (SPARK-3754) Spark Streaming fileSystem API is not callable from Java
[ https://issues.apache.org/jira/browse/SPARK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314899#comment-14314899 ] Tathagata Das commented on SPARK-3754: -- Yes, it is. This has already been fixed. Spark Streaming fileSystem API is not callable from Java Key: SPARK-3754 URL: https://issues.apache.org/jira/browse/SPARK-3754 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.0, 1.1.0 Reporter: holdenk Assignee: Holden Karau Priority: Critical The Spark Streaming Java API for fileSystem is not callable from Java. We should do something like with how it is handled in the Java Spark Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3754) Spark Streaming fileSystem API is not callable from Java
[ https://issues.apache.org/jira/browse/SPARK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das closed SPARK-3754. Resolution: Duplicate Spark Streaming fileSystem API is not callable from Java Key: SPARK-3754 URL: https://issues.apache.org/jira/browse/SPARK-3754 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.0, 1.1.0 Reporter: holdenk Assignee: Holden Karau Priority: Critical The Spark Streaming Java API for fileSystem is not callable from Java. We should do something like with how it is handled in the Java Spark Context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark
[ https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Don Drake updated SPARK-5722: - Description: The Integers datatype in Python does not match what a Scala/Java integer is defined as. This causes inference of data types and schemas to fail when data is larger than 2^32 and it is inferred incorrectly as an Integer. Since the range of valid Python integers is wider than Java Integers, this causes problems when inferring Integer vs. Long datatypes. This will cause problems when attempting to save SchemaRDD as Parquet or JSON. Here's an example: {code} sqlCtx = SQLContext(sc) from pyspark.sql import Row rdd = sc.parallelize([Row(f1='a', f2=100)]) srdd = sqlCtx.inferSchema(rdd) srdd.schema() StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true))) {code} That number is a LongType in Java, but an Integer in python. We need to check the value to see if it should really by a LongType when a IntegerType is initially inferred. More tests: {code} from pyspark.sql import _infer_type # OK print _infer_type(1) IntegerType # OK print _infer_type(2**31-1) IntegerType #WRONG print _infer_type(2**31) #WRONG IntegerType print _infer_type(2**61 ) #OK IntegerType print _infer_type(2**71 ) LongType {code} Java Primitive Types defined: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html Python Built-in Types: https://docs.python.org/2/library/stdtypes.html#typesnumeric was: The Integers datatype in Python does not match what a Scala/Java integer is defined as. This causes inference of data types and schemas to fail when data is larger than 2^32 and it is inferred incorrectly as an Integer. Since the range of valid Python integers is wider than Java Integers, this causes problems when inferring Integer vs. Long datatypes. This will cause problems when attempting to save SchemaRDD as Parquet or JSON. Here's an example: sqlCtx = SQLContext(sc) from pyspark.sql import Row rdd = sc.parallelize([Row(f1='a', f2=100)]) srdd = sqlCtx.inferSchema(rdd) srdd.schema() StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true))) That number is a LongType in Java, but an Integer in python. We need to check the value to see if it should really by a LongType when a IntegerType is initially inferred. More tests: from pyspark.sql import _infer_type # OK print _infer_type(1) IntegerType # OK print _infer_type(2**31-1) IntegerType #WRONG print _infer_type(2**31) #WRONG IntegerType print _infer_type(2**61 ) #OK IntegerType print _infer_type(2**71 ) LongType Java Primitive Types defined: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html Python Built-in Types: https://docs.python.org/2/library/stdtypes.html#typesnumeric Infer_schema_type incorrect for Integers in pyspark --- Key: SPARK-5722 URL: https://issues.apache.org/jira/browse/SPARK-5722 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Don Drake The Integers datatype in Python does not match what a Scala/Java integer is defined as. This causes inference of data types and schemas to fail when data is larger than 2^32 and it is inferred incorrectly as an Integer. Since the range of valid Python integers is wider than Java Integers, this causes problems when inferring Integer vs. Long datatypes. This will cause problems when attempting to save SchemaRDD as Parquet or JSON. 
Here's an example: {code} sqlCtx = SQLContext(sc) from pyspark.sql import Row rdd = sc.parallelize([Row(f1='a', f2=100)]) srdd = sqlCtx.inferSchema(rdd) srdd.schema() StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true))) {code} That number is a LongType in Java, but an Integer in python. We need to check the value to see if it should really by a LongType when a IntegerType is initially inferred. More tests: {code} from pyspark.sql import _infer_type # OK print _infer_type(1) IntegerType # OK print _infer_type(2**31-1) IntegerType #WRONG print _infer_type(2**31) #WRONG IntegerType print _infer_type(2**61 ) #OK IntegerType print _infer_type(2**71 ) LongType {code} Java Primitive Types defined: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html Python Built-in Types: https://docs.python.org/2/library/stdtypes.html#typesnumeric -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5686) Support `show current roles`
[ https://issues.apache.org/jira/browse/SPARK-5686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5686. - Resolution: Fixed Issue resolved by pull request 4471 [https://github.com/apache/spark/pull/4471] Support `show current roles` Key: SPARK-5686 URL: https://issues.apache.org/jira/browse/SPARK-5686 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Li Sheng Priority: Minor Fix For: 1.3.0 Original Estimate: 3h Remaining Estimate: 3h show current roles -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts
[ https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5613. Resolution: Fixed YarnClientSchedulerBackend fails to get application report when yarn restarts - Key: SPARK-5613 URL: https://issues.apache.org/jira/browse/SPARK-5613 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Reporter: Kashish Jain Assignee: Kashish Jain Priority: Minor Fix For: 1.3.0, 1.2.2 Original Estimate: 24h Remaining Estimate: 24h Steps to Reproduce 1) Run any spark job 2) Stop yarn while the spark job is running (an application id has been generated by now) 3) Restart yarn now 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException exception. This leads to termination of thread. Here is the StackTrace 15/02/05 05:22:37 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:38 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:39 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:40 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 5/02/05 05:22:40 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) Exception in thread Yarn application state monitor org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1423113179043_0003' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Unknown Source) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id
[jira] [Updated] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark
[ https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Don Drake updated SPARK-5722: - Summary: Infer_schema_type incorrect for Integers in pyspark (was: Infer_schma_type incorrect for Integers in pyspark) Infer_schema_type incorrect for Integers in pyspark --- Key: SPARK-5722 URL: https://issues.apache.org/jira/browse/SPARK-5722 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Don Drake The Integers datatype in Python does not match what a Scala/Java integer is defined as. This causes inference of data types and schemas to fail when data is larger than 2^32 and it is inferred incorrectly as an Integer. Since the range of valid Python integers is wider than Java Integers, this causes problems when inferring Integer vs. Long datatypes. This will cause problems when attempting to save SchemaRDD as Parquet or JSON. Here's an example: sqlCtx = SQLContext(sc) from pyspark.sql import Row rdd = sc.parallelize([Row(f1='a', f2=100)]) srdd = sqlCtx.inferSchema(rdd) srdd.schema() StructType(List(StructField(f1,StringType,true),StructField(f2,IntegerType,true))) That number is a LongType in Java, but an Integer in python. We need to check the value to see if it should really by a LongType when a IntegerType is initially inferred. More tests: from pyspark.sql import _infer_type # OK print _infer_type(1) IntegerType # OK print _infer_type(2**31-1) IntegerType #WRONG print _infer_type(2**31) #WRONG IntegerType print _infer_type(2**61 ) #OK IntegerType print _infer_type(2**71 ) LongType Java Primitive Types defined: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html Python Built-in Types: https://docs.python.org/2/library/stdtypes.html#typesnumeric -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4964) Exactly-once + WAL-free Kafka Support in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314903#comment-14314903 ] Apache Spark commented on SPARK-4964: - User 'koeninger' has created a pull request for this issue: https://github.com/apache/spark/pull/4511 Exactly-once + WAL-free Kafka Support in Spark Streaming Key: SPARK-4964 URL: https://issues.apache.org/jira/browse/SPARK-4964 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Cody Koeninger Fix For: 1.3.0 There are two issues with the current Kafka support - Use of Write Ahead Logs in Spark Streaming to ensure no data is lost - Causes data replication in both Kafka AND Spark Streaming. - Lack of exactly-once semantics - For background, see http://apache-spark-developers-list.1001551.n3.nabble.com/Which-committers-care-about-Kafka-td9827.html We want to solve both these problem in JIRA. Please see the following design doc for the solution. https://docs.google.com/a/databricks.com/document/d/1IuvZhg9cOueTf1mq4qwc1fhPb5FVcaRLcyjrtG4XU1k/edit#heading=h.itproy77j3p -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
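For orientation, the design doc describes a receiver-less stream that tracks Kafka offset ranges itself. A sketch of how such an API is used, based on the createDirectStream API this work produced for 1.3 (broker address and topic name are placeholders, and a spark-shell style `sc` is assumed):
{code}
// Sketch of the direct (receiver-less) Kafka stream described in the design doc.
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))
// Offsets are tracked by Spark itself, so no write-ahead log or receiver is needed.
{code}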
[jira] [Created] (SPARK-5723) Change the default file format to Parquet for CTAS statements.
Yin Huai created SPARK-5723: --- Summary: Change the default file format to Parquet for CTAS statements. Key: SPARK-5723 URL: https://issues.apache.org/jira/browse/SPARK-5723 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Right now, if you issue a CTAS query without specifying a file format and serde info, we will use TextFile. We should switch to Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
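Until the default changes, the format can of course be stated explicitly. A hypothetical CTAS that already produces Parquet output (table names and query are made up, spark-shell style `sc` assumed):
{code}
// Hypothetical CTAS that names the file format explicitly instead of relying on the default.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql(
  "CREATE TABLE daily_counts STORED AS PARQUET AS SELECT day, count(*) AS n FROM events GROUP BY day")
{code}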
[jira] [Commented] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts
[ https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314922#comment-14314922 ] Andrew Or commented on SPARK-5613: -- Thanks Patrick. I just verified that it was merged into all of 1.2, 1.3 and Master. Closing this again. YarnClientSchedulerBackend fails to get application report when yarn restarts - Key: SPARK-5613 URL: https://issues.apache.org/jira/browse/SPARK-5613 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Reporter: Kashish Jain Assignee: Kashish Jain Priority: Minor Fix For: 1.3.0, 1.2.2 Original Estimate: 24h Remaining Estimate: 24h Steps to Reproduce 1) Run any spark job 2) Stop yarn while the spark job is running (an application id has been generated by now) 3) Restart yarn now 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException exception. This leads to termination of thread. Here is the StackTrace 15/02/05 05:22:37 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:38 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:39 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 15/02/05 05:22:40 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 5/02/05 05:22:40 INFO Client: Retrying connect to server: nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) Exception in thread Yarn application state monitor org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1423113179043_0003' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Unknown Source) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120) Caused by:
[jira] [Closed] (SPARK-4136) Under dynamic allocation, cancel outstanding executor requests when no longer needed
[ https://issues.apache.org/jira/browse/SPARK-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4136. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Sandy Ryza Under dynamic allocation, cancel outstanding executor requests when no longer needed Key: SPARK-4136 URL: https://issues.apache.org/jira/browse/SPARK-4136 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
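For context, dynamic allocation is driven by a handful of existing settings; this ticket is about cancelling executor requests that are still outstanding when the pending task backlog shrinks. A minimal sketch of enabling the feature (the values here are arbitrary):
{code}
// Illustration only: enable dynamic executor allocation (requires the external shuffle service).
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "50")
  .set("spark.shuffle.service.enabled", "true")
{code}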
[jira] [Updated] (SPARK-4879) Missing output partitions after job completes with speculative execution
[ https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4879: - Target Version/s: 1.0.3, 1.3.0, 1.1.2, 1.2.2 (was: 1.0.3, 1.3.0, 1.1.2, 1.2.1) Missing output partitions after job completes with speculative execution Key: SPARK-4879 URL: https://issues.apache.org/jira/browse/SPARK-4879 Project: Spark Issue Type: Bug Components: Input/Output, Spark Core Affects Versions: 1.0.2, 1.1.1, 1.2.0 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Critical Attachments: speculation.txt, speculation2.txt When speculative execution is enabled ({{spark.speculation=true}}), jobs that save output files may report that they have completed successfully even though some output partitions written by speculative tasks may be missing. h3. Reproduction This symptom was reported to me by a Spark user and I've been doing my own investigation to try to come up with an in-house reproduction. I'm still working on a reliable local reproduction for this issue, which is a little tricky because Spark won't schedule speculated tasks on the same host as the original task, so you need an actual (or containerized) multi-host cluster to test speculation. Here's a simple reproduction of some of the symptoms on EC2, which can be run in {{spark-shell}} with {{--conf spark.speculation=true}}: {code} // Rig a job such that all but one of the tasks complete instantly // and one task runs for 20 seconds on its first attempt and instantly // on its second attempt: val numTasks = 100 sc.parallelize(1 to numTasks, numTasks).repartition(2).mapPartitionsWithContext { case (ctx, iter) => if (ctx.partitionId == 0) { // If this is the one task that should run really slow if (ctx.attemptId == 0) { // If this is the first attempt, run slow Thread.sleep(20 * 1000) } } iter }.map(x => (x, x)).saveAsTextFile("/test4") {code} When I run this, I end up with a job that completes quickly (due to speculation) but reports failures from the speculated task: {code} [...]
14/12/11 01:41:13 INFO scheduler.TaskSetManager: Finished task 37.1 in stage 3.0 (TID 411) in 131 ms on ip-172-31-8-164.us-west-2.compute.internal (100/100) 14/12/11 01:41:13 INFO scheduler.DAGScheduler: Stage 3 (saveAsTextFile at console:22) finished in 0.856 s 14/12/11 01:41:13 INFO spark.SparkContext: Job finished: saveAsTextFile at console:22, took 0.885438374 s 14/12/11 01:41:13 INFO scheduler.TaskSetManager: Ignoring task-finished event for 70.1 in stage 3.0 because task 70 has already completed successfully scala 14/12/11 01:41:13 WARN scheduler.TaskSetManager: Lost task 49.1 in stage 3.0 (TID 413, ip-172-31-8-164.us-west-2.compute.internal): java.io.IOException: Failed to save output of task: attempt_201412110141_0003_m_49_413 org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:160) org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172) org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132) org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:109) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:991) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} One interesting thing to note about this stack trace: if we look at {{FileOutputCommitter.java:160}} ([link|http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-core/2.5.0-mr1-cdh5.2.0/org/apache/hadoop/mapred/FileOutputCommitter.java#160]), this point in the execution seems to correspond to a case where a task completes, attempts to commit its output, fails for some reason, then deletes the destination file, tries again, and fails: {code} if (fs.isFile(taskOutput)) { 152 Path finalOutputPath = getFinalPath(jobOutputDir, taskOutput, 153 getTempTaskOutputPath(context)); 154 if (!fs.rename(taskOutput, finalOutputPath)) { 155if (!fs.delete(finalOutputPath, true)) { 156 throw new