[jira] [Commented] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
[ https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341997#comment-14341997 ] Mridul Muralidharan commented on SPARK-3889: [~adav] I have seen this a lot in my recent tests, both with the latest 1.3 and with the stable 1.2.1 release. Usually, though, the reasons turn out to be legitimate upon investigation - e.g. the remote side died via a SIGTERM from YARN. So it would be good to know whether this happened legitimately or due to some other bug/issue. > JVM dies with SIGBUS, resulting in ConnectionManager failed ACK > --- > > Key: SPARK-3889 > URL: https://issues.apache.org/jira/browse/SPARK-3889 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Aaron Davidson >Assignee: Aaron Davidson >Priority: Critical > Fix For: 1.2.0 > > > Here's the first part of the core dump, possibly caused by a job which > shuffles a lot of very small partitions. > {code} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704 > # > # JRE version: 7.0_25-b30 > # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # v ~StubRoutines::jbyte_disjoint_arraycopy > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please include > # instructions on how to reproduce the bug and visit: > # https://bugs.launchpad.net/ubuntu/+source/openjdk-7/ > # > --- T H R E A D --- > Current thread (0x7fa4b0631000): JavaThread "Executor task launch > worker-170" daemon [_thread_in_Java, id=6783, > stack(0x7fa4448ef000,0x7fa4449f)] > siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), > si_addr=0x7fa428f79000 > {code} > Here is the only useful content I can find related to JVM and SIGBUS from > Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664 > It appears it may be related to disposing byte buffers, which we do in the > ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of > them in BufferMessage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
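The suspicion above (a disposed mmap'd buffer) fits the crash signature: an arraycopy out of a memory-mapped region whose backing pages have gone away faults at the CPU level, which the JVM surfaces as SIGBUS rather than a Java exception. A minimal JVM-level sketch of the mapping pattern involved, in plain Java (this is illustrative, not Spark's actual DiskStore code):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapDemo {
    // Map a small temp file and read through the mapping, the same pattern
    // DiskStore uses for blocks above spark.storage.memoryMapThreshold.
    static int mappedSum() throws Exception {
        File f = File.createTempFile("shuffle", ".data");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write(new byte[]{1, 2, 3, 4});
            MappedByteBuffer buf = raf.getChannel()
                    .map(FileChannel.MapMode.READ_ONLY, 0, 4);
            // While the mapping is intact this is an ordinary read. If the
            // pages behind it disappear early -- e.g. the buffer is "disposed"
            // (its Cleaner forcibly run) while an arraycopy is still reading
            // from it -- the access faults in native code and the JVM dies
            // with SIGBUS instead of throwing a Java exception.
            return buf.get(0) + buf.get(3);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mappedSum()); // 5
    }
}
```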
[jira] [Resolved] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-6075. --- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 4835 [https://github.com/apache/spark/pull/4835] > After SPARK-3885, some tasks' accumulator updates may be lost > - > > Key: SPARK-6075 > URL: https://issues.apache.org/jira/browse/SPARK-6075 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.4.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Blocker > Fix For: 1.4.0 > > > It looks like some of the AccumulatorSuite tests have started failing > nondeterministically on Jenkins. The errors seem to be due to lost / missing > accumulator updates, e.g. > {code} > Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901 > {code} > This could somehow be related to SPARK-3885 / > https://github.com/apache/spark/pull/4021, a patch to garbage-collect > accumulators, which was only merged into master. > https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/ > I think I've figured it out: consider the lifecycle of an accumulator in a > task, say ShuffleMapTask: on the executor, each task deserializes its own > copy of the RDD inside its runTask method, so the strong reference to the > RDD disappears at the end of runTask. In Executor.run(), we call > Accumulators.values after runTask has exited, so there's a small window in > which the task's RDD can be GC'd, causing accumulators to be GC'd as well > because there are no longer any strong references to them. > The fix is to keep strong references in localAccums, since we clear this at > the end of each task anyway. 
I'm glad that I was able to figure out > precisely why this was necessary and sorry that I missed this during review; > I'll submit a fix shortly. In terms of preventative measures, it might be a > good idea to write up the lifetime / lifecycle of objects' strong references > whenever we're using WeakReferences, since the process of explicitly writing > that out would prevent these sorts of mistakes in the future.
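The race described above hinges on how weak references behave once the last strong reference is dropped. A minimal JVM-level sketch in plain Java (not Spark's actual classes; the names are illustrative):

```java
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    // Stand-in for the accumulator registry described above: it holds only a
    // weak reference, so it does not keep the accumulator alive by itself.
    static WeakReference<Object> register(Object accum) {
        return new WeakReference<>(accum);
    }

    public static void main(String[] args) {
        Object accum = new Object();             // strong reference, like the task's RDD
        WeakReference<Object> entry = register(accum);
        System.out.println(entry.get() != null); // true: still strongly reachable
        accum = null;                            // runTask returns; last strong ref gone
        System.gc();                             // a GC landing in this window...
        // ...may clear the weak reference before the updates are read back,
        // which is exactly the lost-update race described in the comment.
        System.out.println(entry.get() == null ? "cleared" : "still reachable");
    }
}
```

The first line printed is always `true`; whether the second shows `cleared` depends on the GC, which is what makes the original test failures nondeterministic.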
[jira] [Comment Edited] (SPARK-3785) Support off-loading computations to a GPU
[ https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341974#comment-14341974 ] Tycho Grouwstra edited comment on SPARK-3785 at 3/1/15 6:32 AM: I was wondering: it seems [ArrayFire|http://www.arrayfire.com/docs/group__arrayfire__func.htm] has already parallelized a number of mathematical/reduction functions for C(++) arrays. If Spark RDDs/DataFrames expose some array interface for columns, might it be possible to use those through JNI? Not sure there'd be tangible performance gains without using APUs, but it seemed interesting to me. was (Author: tycho01): Hm, tried commenting a bit earlier but seems it failed. I was wondering, it seems [ArrayFire](http://www.arrayfire.com/docs/group__arrayfire__func.htm) already parallelized a number of mathematical/reductor functions for C(++) arrays. If Spark RDDS/DataFrames expose some array interface for columns, might it be possible to use those through JNI? Not sure there'd be tangible performance gains without using APUs, but seemed interesting to me. > Support off-loading computations to a GPU > - > > Key: SPARK-3785 > URL: https://issues.apache.org/jira/browse/SPARK-3785 > Project: Spark > Issue Type: Brainstorming > Components: MLlib >Reporter: Thomas Darimont >Priority: Minor > > Are there any plans to add support for off-loading computations to the > GPU, e.g. via an OpenCL binding? > http://www.jocl.org/ > https://code.google.com/p/javacl/ > http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL
[jira] [Commented] (SPARK-3785) Support off-loading computations to a GPU
[ https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341974#comment-14341974 ] Tycho Grouwstra commented on SPARK-3785: Hm, I tried commenting a bit earlier but it seems it failed. I was wondering: it seems [ArrayFire](http://www.arrayfire.com/docs/group__arrayfire__func.htm) has already parallelized a number of mathematical/reduction functions for C(++) arrays. If Spark RDDs/DataFrames expose some array interface for columns, might it be possible to use those through JNI? Not sure there'd be tangible performance gains without using APUs, but it seemed interesting to me. > Support off-loading computations to a GPU > - > > Key: SPARK-3785 > URL: https://issues.apache.org/jira/browse/SPARK-3785 > Project: Spark > Issue Type: Brainstorming > Components: MLlib >Reporter: Thomas Darimont >Priority: Minor > > Are there any plans to add support for off-loading computations to the > GPU, e.g. via an OpenCL binding? > http://www.jocl.org/ > https://code.google.com/p/javacl/ > http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL
[jira] [Commented] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container
[ https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341970#comment-14341970 ] Aaron Davidson commented on SPARK-6056: --- It's possible that it's actually the shuffle read that is doing the memory mapping -- please try setting spark.storage.memoryMapThreshold to around 1073741824 (1 GB) to disable this form of memory mapping for the test. > Unlimit offHeap memory use cause RM killing the container > - > > Key: SPARK-6056 > URL: https://issues.apache.org/jira/browse/SPARK-6056 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.1 >Reporter: SaintBacchus > > No matter whether you set `preferDirectBufs` or limit the number of threads, Spark cannot limit its use of off-heap memory. > At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, > Netty allocates an off-heap buffer of the same size as the heap buffer being transferred. > So for every buffer you want to transfer, the same amount of off-heap memory will be > allocated. > But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed. 
> I wrote some simple code to test it: > {code:title=test.scala|borderStyle=solid} > import org.apache.spark.storage._ > import org.apache.spark._ > val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new > Array[Byte](10*1024*1024)).persist > bufferRdd.count > val part = bufferRdd.partitions(0) > val sparkEnv = SparkEnv.get > val blockMgr = sparkEnv.blockManager > def test = { > val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index)) > val resultIt = > blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]] > val len = resultIt.map(_.length).sum > println(s"[${Thread.currentThread.getId}] get block length = $len") > } > def test_driver(count:Int, parallel:Int)(f: => Unit) = { > val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel) > val taskSupport = new > scala.collection.parallel.ForkJoinTaskSupport(tpool) > val parseq = (1 to count).par > parseq.tasksupport = taskSupport > parseq.foreach(x=>f) > tpool.shutdown > tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS) > } > {code} > progress: > 1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1 > 2. :load test.scala in spark-shell > 3. use the following command to watch the executor on the slave node > {code} > pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p > $pid|grep $pid > {code} > 4. test_driver(20,100)(test) in spark-shell > 5. watch the output of the command on the slave node > If multiple threads fetch the block at once, the physical memory will soon exceed the > limit set by spark.yarn.executor.memoryOverhead
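Aaron's suggestion can be applied directly to step 1 of the repro above; a sketch of the launch command (this assumes the standard `--conf` option for passing Spark properties; the value is in bytes):

```shell
# Raise the memory-map threshold to 1 GB so shuffle blocks smaller than that
# are read with normal stream I/O instead of being memory-mapped.
bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1 \
  --conf spark.storage.memoryMapThreshold=1073741824
```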
[jira] [Commented] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
[ https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341962#comment-14341962 ] Aaron Davidson commented on SPARK-3889: --- This may be a new issue. I would open a new ticket, especially because the "ConnectionManager failed ACK" thing shouldn't be happening in 1.2.1; there should be different symptoms and perhaps a different cause as well. A last ditch thing to try, by the way, is to up spark.storage.memoryMapThreshold to a very large number (e.g., 1 GB in bytes) and see if it still occurs -- if so, then please report more details about your workload and any other possible symptoms you see. > JVM dies with SIGBUS, resulting in ConnectionManager failed ACK > --- > > Key: SPARK-3889 > URL: https://issues.apache.org/jira/browse/SPARK-3889 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Aaron Davidson >Assignee: Aaron Davidson >Priority: Critical > Fix For: 1.2.0 > > > Here's the first part of the core dump, possibly caused by a job which > shuffles a lot of very small partitions. > {code} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704 > # > # JRE version: 7.0_25-b30 > # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # v ~StubRoutines::jbyte_disjoint_arraycopy > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please include > # instructions on how to reproduce the bug and visit: > # https://bugs.launchpad.net/ubuntu/+source/openjdk-7/ > # > --- T H R E A D --- > Current thread (0x7fa4b0631000): JavaThread "Executor task launch > worker-170" daemon [_thread_in_Java, id=6783, > stack(0x7fa4448ef000,0x7fa4449f)] > siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), > si_addr=0x7fa428f79000 > {code} > Here is the only useful content I can find related to JVM and SIGBUS from > Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664 > It appears it may be related to disposing byte buffers, which we do in the > ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of > them in BufferMessage.
[jira] [Issue Comment Deleted] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
[ https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson updated SPARK-3889: -- Comment: was deleted (was: The only place we memory map in 1.1 is this method: https://github.com/apache/spark/blob/branch-1.1/core/src/main/scala/org/apache/spark/storage/DiskStore.scala#L106 This threshold is configurable with "spark.storage.memoryMapThreshold" -- we upped the default from 2 KB to 2 MB in 1.2, which you could try here as well.) > JVM dies with SIGBUS, resulting in ConnectionManager failed ACK > --- > > Key: SPARK-3889 > URL: https://issues.apache.org/jira/browse/SPARK-3889 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Aaron Davidson >Assignee: Aaron Davidson >Priority: Critical > Fix For: 1.2.0 > > > Here's the first part of the core dump, possibly caused by a job which > shuffles a lot of very small partitions. > {code} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704 > # > # JRE version: 7.0_25-b30 > # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # v ~StubRoutines::jbyte_disjoint_arraycopy > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please include > # instructions on how to reproduce the bug and visit: > # https://bugs.launchpad.net/ubuntu/+source/openjdk-7/ > # > --- T H R E A D --- > Current thread (0x7fa4b0631000): JavaThread "Executor task launch > worker-170" daemon [_thread_in_Java, id=6783, > stack(0x7fa4448ef000,0x7fa4449f)] > siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), > si_addr=0x7fa428f79000 > {code} > Here is the only useful content I can find related to JVM and SIGBUS from > Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664 > It appears it may be related to disposing byte buffers, which we do in the > ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of > them in BufferMessage.
[jira] [Comment Edited] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container
[ https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957 ] SaintBacchus edited comment on SPARK-6056 at 3/1/15 6:04 AM: - [~adav] Thanks for the comment; I'm not sure I follow it completely. Do you mean that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only allocate heap memory? I tested it again, and I have updated the test program in the Description. I set _spark.shuffle.io.preferDirectBufs_ and when I run *test_driver(20,100)(test)*, the result is this: _ 76602 root 20 0 1597m 339m 25m S 0 0.1 0:04.89 java _ _76602 root 20 0 {color:red} 1777m {color} 1.0g 26m S 99 0.3 0:07.88 java _ _ 76602 root 20 0 1597m 880m 26m S 4 0.3 0:07.99 java_ The red number is virtual memory: it rose by about 180 MB in that moment, while the test case transfers 200 MB of data in total (20 * 10 MB) from the executor to the driver. I think that's a problem: if I use 40 threads to fetch the result, it will need nearly 400 MB of memory and soon exceed the YARN limit, finally being killed by YARN. It would be good if there were a way to limit the peak memory use. In addition, I think the number of remote block-fetch threads on the user side is uncontrollable, so it would be better for Spark to control it. [~lianhuiwang] I used the recent Spark from GitHub and I also tested the 1.2.0 release. In my test case I used the default memory: 1 GB executor and 384 MB overhead, but in a real case the memory is much larger. was (Author: carlmartin): [~adav] Thx for comment. I can't understand what you say clearly. Do you mean if 'spark.shuffle.io.preferDirectBufs ' was set to be false, would netty only allocat the heap memory? Is it right? I test it again.And I had update the test program in the Description. 
I had set the _spark.shuffle.io.preferDirectBufs_ and when I type *test_driver(20,100)(test)* , the result is this: _ 76602 root 20 0 1597m 339m 25m S0 0.1 0:04.89 java _ _76602 root 20 0 {color:red} 1777m {color} 1.0g 26m S 99 0.3 0:07.88 java _ _ 76602 root 20 0 1597m 880m 26m S4 0.3 0:07.99 java_ The red num is visual memory and it had raised about 180mb in the moment and total transfor 200mb data (20 * 10MB) from executor to driver. I think it's a problem. If I use 40 threads to get the result, it will need near 400mb momery and soon exceed the limit of yarn, fanally killed by yarn. If there is a way to limit the peek use of memory, it will be fine. In addtion, I though the number of remote fetch block threads in user side is uncontrollable, it's better to be controlled in spark. [~lianhuiwang] I use the recent spark in github and I also tested the 1.2.0 release version. In my test case, I use the default memory: 1G executor and 384 overhead. But in the real case, momery is much more. > Unlimit offHeap memory use cause RM killing the container > - > > Key: SPARK-6056 > URL: https://issues.apache.org/jira/browse/SPARK-6056 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.1 >Reporter: SaintBacchus > > No matter set the `preferDirectBufs` or limit the number of thread or not > ,spark can not limit the use of offheap memory. > At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, > Netty had allocated a offheap memory buffer with the same size in heap. > So how many buffer you want to transfor, the same size offheap memory will be > allocated. > But once the allocated memory size reach the capacity of the overhead momery > set in yarn, this executor will be killed. 
> I wrote a simple code to test it: > {code:title=test.scala|borderStyle=solid} > import org.apache.spark.storage._ > import org.apache.spark._ > val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new > Array[Byte](10*1024*1024)).persist > bufferRdd.count > val part = bufferRdd.partitions(0) > val sparkEnv = SparkEnv.get > val blockMgr = sparkEnv.blockManager > def test = { > val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index)) > val resultIt = > blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]] > val len = resultIt.map(_.length).sum > println(s"[${Thread.currentThread.getId}] get block length = $len") > } > def test_driver(count:Int, parallel:Int)(f: => Unit) = { > val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel) > val taskSupport = ne
[jira] [Comment Edited] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container
[ https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957 ] SaintBacchus edited comment on SPARK-6056 at 3/1/15 6:03 AM: - [~adav] Thanks for the comment; I'm not sure I follow it completely. Do you mean that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only allocate heap memory? I tested it again, and I have updated the test program in the Description. I set _spark.shuffle.io.preferDirectBufs_ and when I run *test_driver(20,100)(test)*, the result is this: _ 76602 root 20 0 1597m 339m 25m S 0 0.1 0:04.89 java _ _76602 root 20 0 {color:red} 1777m {color} 1.0g 26m S 99 0.3 0:07.88 java _ _ 76602 root 20 0 1597m 880m 26m S 4 0.3 0:07.99 java_ The red number is virtual memory: it rose by about 180 MB in that moment, while a total of 200 MB of data (20 * 10 MB) is transferred from the executor to the driver. I think that's a problem: if I use 40 threads to fetch the result, it will need nearly 400 MB of memory and soon exceed the YARN limit, finally being killed by YARN. It would be good if there were a way to limit the peak memory use. In addition, I think the number of remote block-fetch threads on the user side is uncontrollable, so it would be better for Spark to control it. [~lianhuiwang] I used the recent Spark from GitHub and I also tested the 1.2.0 release. In my test case I used the default memory: 1 GB executor and 384 MB overhead, but in a real case the memory is much larger. was (Author: carlmartin): [~adav] Thx for comment. I can't understand what you say clearly. Do you mean if 'spark.shuffle.io.preferDirectBufs ' was set to be false, would netty only allocat the heap memory? Is it right? I test it again.And I had update the test program in the Description. 
I had set the _spark.shuffle.io.preferDirectBufs_ and when I type *test_driver(20,100)(test)* , the result is this: _ 76602 root 20 0 1597m 339m 25m S0 0.1 0:04.89 java _ _76602 root 20 0 {color:red} 1777m {color} 1.0g 26m S 99 0.3 0:07.88 java _ _ 76602 root 20 0 1597m 880m 26m S4 0.3 0:07.99 java_ The red num is visual memory and it had raised about 180mb in the moment and total transfor 200mb data (20 * 10MB) from executor to driver. I think it's a big problem. If I use 40 threads to get the result, it will need near 400mb momery and so exceed the limit of yarn fanally killed by yarn. If there is a way to limit the peek use of memory, it will be fine. In addtion, I though the user side number of remote fetch block threads is uncontrollable, it's better to be controlled in spark. [~lianhuiwang] I use the recent spark in github and I also tested the 1.2.0 release version. In my test case, I use the default memory: 1G executor and 384 overhead. But in the real case, momery is much more. > Unlimit offHeap memory use cause RM killing the container > - > > Key: SPARK-6056 > URL: https://issues.apache.org/jira/browse/SPARK-6056 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.1 >Reporter: SaintBacchus > > No matter set the `preferDirectBufs` or limit the number of thread or not > ,spark can not limit the use of offheap memory. > At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, > Netty had allocated a offheap memory buffer with the same size in heap. > So how many buffer you want to transfor, the same size offheap memory will be > allocated. > But once the allocated memory size reach the capacity of the overhead momery > set in yarn, this executor will be killed. 
> I wrote a simple code to test it: > {code:title=test.scala|borderStyle=solid} > import org.apache.spark.storage._ > import org.apache.spark._ > val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new > Array[Byte](10*1024*1024)).persist > bufferRdd.count > val part = bufferRdd.partitions(0) > val sparkEnv = SparkEnv.get > val blockMgr = sparkEnv.blockManager > def test = { > val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index)) > val resultIt = > blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]] > val len = resultIt.map(_.length).sum > println(s"[${Thread.currentThread.getId}] get block length = $len") > } > def test_driver(count:Int, parallel:Int)(f: => Unit) = { > val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel) > val taskSupport = new > scala.collec
[jira] [Comment Edited] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container
[ https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957 ] SaintBacchus edited comment on SPARK-6056 at 3/1/15 6:01 AM: - [~adav] Thanks for the comment; I'm not sure I follow it completely. Do you mean that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only allocate heap memory? I tested it again, and I have updated the test program in the Description. I set _spark.shuffle.io.preferDirectBufs_ and when I run *test_driver(20,100)(test)*, the result is this: _ 76602 root 20 0 1597m 339m 25m S 0 0.1 0:04.89 java _ _76602 root 20 0 {color:red} 1777m {color} 1.0g 26m S 99 0.3 0:07.88 java _ _ 76602 root 20 0 1597m 880m 26m S 4 0.3 0:07.99 java_ The red number is virtual memory: it rose by about 180 MB in that moment, while a total of 200 MB of data (20 * 10 MB) is transferred from the executor to the driver. I think it's a big problem: if I use 40 threads to fetch the result, it will need nearly 400 MB of memory and so exceed the YARN limit, finally being killed by YARN. It would be good if there were a way to limit the peak memory use. In addition, I think the number of remote block-fetch threads on the user side is uncontrollable, so it would be better for Spark to control it. [~lianhuiwang] I used the recent Spark from GitHub and I also tested the 1.2.0 release. In my test case I used the default memory: 1 GB executor and 384 MB overhead, but in a real case the memory is much larger. was (Author: carlmartin): [~adav] Thx for comment. I can't understand what you say clearly. Do you mean if 'spark.shuffle.io.preferDirectBufs ' was set to be false, would netty only allocat the heap memory? Is it right? I test it again.And I had update the test program in the Description. 
I had set the _spark.shuffle.io.preferDirectBufs_ and when I type *test_driver(20,100)(test)* , the result is this: _ 76602 root 20 0 1597m 339m 25m S0 0.1 0:04.89 java _ _76602 root 20 0 {color:red} 1777m {color} 1.0g 26m S 99 0.3 0:07.88 java _ _ 76602 root 20 0 1597m 880m 26m S4 0.3 0:07.99 java_ The red num had raised about 180mb in the moment and total transfor 200mb data (20 * 10MB) from executor to driver. I think it's a big problem. If I use 40 threads to get the result, it will need near 400mb momery and so exceed the limit of yarn fanally killed by yarn. If there is a way to limit the peek use of memory, it will be fine. In addtion, I though the user side number of remote fetch block threads is uncontrollable, it's better to be controlled in spark. [~lianhuiwang] I use the recent spark in github and I also tested the 1.2.0 release version. In my test case, I use the default memory: 1G executor and 384 overhead. But in the real case, momery is much more. > Unlimit offHeap memory use cause RM killing the container > - > > Key: SPARK-6056 > URL: https://issues.apache.org/jira/browse/SPARK-6056 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.1 >Reporter: SaintBacchus > > No matter set the `preferDirectBufs` or limit the number of thread or not > ,spark can not limit the use of offheap memory. > At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, > Netty had allocated a offheap memory buffer with the same size in heap. > So how many buffer you want to transfor, the same size offheap memory will be > allocated. > But once the allocated memory size reach the capacity of the overhead momery > set in yarn, this executor will be killed. 
> I wrote a simple code to test it: > {code:title=test.scala|borderStyle=solid} > import org.apache.spark.storage._ > import org.apache.spark._ > val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new > Array[Byte](10*1024*1024)).persist > bufferRdd.count > val part = bufferRdd.partitions(0) > val sparkEnv = SparkEnv.get > val blockMgr = sparkEnv.blockManager > def test = { > val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index)) > val resultIt = > blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]] > val len = resultIt.map(_.length).sum > println(s"[${Thread.currentThread.getId}] get block length = $len") > } > def test_driver(count:Int, parallel:Int)(f: => Unit) = { > val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel) > val taskSupport = new > scala.collection.parallel.ForkJoinTaskS
[jira] [Commented] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container
[ https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957 ] SaintBacchus commented on SPARK-6056: - [~adav] Thanks for the comment; I'm not sure I follow it completely. Do you mean that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only allocate heap memory? I tested it again, and I have updated the test program in the Description. I set _spark.shuffle.io.preferDirectBufs_ and when I run *test_driver(20,100)(test)*, the result is this: _ 76602 root 20 0 1597m 339m 25m S 0 0.1 0:04.89 java _ _76602 root 20 0 {color:red} 1777m {color} 1.0g 26m S 99 0.3 0:07.88 java _ _ 76602 root 20 0 1597m 880m 26m S 4 0.3 0:07.99 java_ The red number rose by about 180 MB in that moment, while a total of 200 MB of data (20 * 10 MB) is transferred from the executor to the driver. I think it's a big problem: if I use 40 threads to fetch the result, it will need nearly 400 MB of memory and so exceed the YARN limit, finally being killed by YARN. It would be good if there were a way to limit the peak memory use. In addition, I think the number of remote block-fetch threads on the user side is uncontrollable, so it would be better for Spark to control it. [~lianhuiwang] I used the recent Spark from GitHub and I also tested the 1.2.0 release. In my test case I used the default memory: 1 GB executor and 384 MB overhead, but in a real case the memory is much larger. > Unlimit offHeap memory use cause RM killing the container > - > > Key: SPARK-6056 > URL: https://issues.apache.org/jira/browse/SPARK-6056 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.1 >Reporter: SaintBacchus > > No matter whether you set `preferDirectBufs` or limit the number of threads, Spark cannot limit its use of off-heap memory. > At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, > Netty allocates an off-heap buffer of the same size as the heap buffer being transferred. 
> So for every buffer you want to transfer, the same amount of off-heap memory will be > allocated. > But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed. > I wrote some simple code to test it: > {code:title=test.scala|borderStyle=solid} > import org.apache.spark.storage._ > import org.apache.spark._ > val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new > Array[Byte](10*1024*1024)).persist > bufferRdd.count > val part = bufferRdd.partitions(0) > val sparkEnv = SparkEnv.get > val blockMgr = sparkEnv.blockManager > def test = { > val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index)) > val resultIt = > blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]] > val len = resultIt.map(_.length).sum > println(s"[${Thread.currentThread.getId}] get block length = $len") > } > def test_driver(count:Int, parallel:Int)(f: => Unit) = { > val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel) > val taskSupport = new > scala.collection.parallel.ForkJoinTaskSupport(tpool) > val parseq = (1 to count).par > parseq.tasksupport = taskSupport > parseq.foreach(x=>f) > tpool.shutdown > tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS) > } > {code} > progress: > 1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1 > 2. :load test.scala in spark-shell > 3. use the following command to watch the executor on the slave node > {code} > pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p > $pid|grep $pid > {code} > 4. test_driver(20,100)(test) in spark-shell > 5. watch the output of the command on the slave node > If multiple threads fetch the block at once, the physical memory will soon exceed the > limit set by spark.yarn.executor.memoryOverhead
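The off-heap growth this thread describes comes from direct buffers, which are invisible to Java heap accounting but fully visible to YARN's process-level (RSS) accounting. A minimal JVM-level sketch in plain Java (not Netty's actual allocation path; sizes are illustrative):

```java
import java.nio.ByteBuffer;

public class DirectBufDemo {
    // Allocate `mb` megabytes off-heap, the way Netty's NIO path backs each
    // outbound heap buffer with a same-sized direct buffer.
    static ByteBuffer allocateOffHeap(int mb) {
        return ByteBuffer.allocateDirect(mb * 1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        ByteBuffer direct = allocateOffHeap(64);
        long heapDelta = rt.totalMemory() - rt.freeMemory() - heapBefore;
        // The 64 MB lives outside the Java heap: heap usage barely moves, but
        // the process RSS grows, which is exactly what YARN's
        // spark.yarn.executor.memoryOverhead accounting sees.
        System.out.println(direct.isDirect());
        System.out.println(heapDelta < 64L * 1024 * 1024);
    }
}
```

Both lines print `true`: the buffer is direct, and the heap grew by far less than the 64 MB actually taken from the process's address space.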
[jira] [Updated] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container
[ https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SaintBacchus updated SPARK-6056:
Description:
No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit the use of off-heap memory. At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the one in the heap. So however many buffers you want to transfer, the same amount of off-heap memory will be allocated. But once the allocated memory reaches the capacity of the overhead memory set in YARN, this executor will be killed. I wrote some simple code to test it:
{code:title=test.scala|borderStyle=solid}
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x => new Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part = bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
  val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
  val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
  val len = resultIt.map(_.length).sum
  println(s"[${Thread.currentThread.getId}] get block length = $len")
}
def test_driver(count: Int, parallel: Int)(f: => Unit) = {
  val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
  val taskSupport = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
  val parseq = (1 to count).par
  parseq.tasksupport = taskSupport
  parseq.foreach(x => f)
  tpool.shutdown
  tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
{code}
Steps:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use this command to watch the executor on the slave node:
{code}
pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid
{code}
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node
If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead
was:
No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit the use of off-heap memory. At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the one in the heap. So however many buffers you want to transfer, the same amount of off-heap memory will be allocated. But once the allocated memory reaches the capacity of the overhead memory set in YARN, this executor will be killed. I wrote some simple code to test it:
{code:title=test.scala|borderStyle=solid}
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x => new Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part = bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
  val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
  val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
  val len = resultIt.map(_.length).sum
  println(s"[${Thread.currentThread.getId}] get block length = $len")
}
def test_driver(count: Int, parallel: Int)(f: => Unit) = {
  val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
  val taskSupport = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
  val parseq = (1 to count).par
  parseq.tasksupport = taskSupport
  parseq.foreach(x => f)
  tpool.shutdown
  tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
{code}
Steps:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node
If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead
> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 1.2.1
> Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit the use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the one in the heap.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the capacity of the overhead memory set in YARN, this executor will be killed.
[jira] [Commented] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
[ https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341947#comment-14341947 ] Apache Spark commented on SPARK-5771: - User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/4841 > Number of Cores in Completed Applications of Standalone Master Web Page > always be 0 if sc.stop() is called > -- > > Key: SPARK-5771 > URL: https://issues.apache.org/jira/browse/SPARK-5771 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.2.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu >Priority: Minor > Fix For: 1.4.0 > > > In Standalone mode, the number of cores in Completed Applications on the > Master web page will always be zero if sc.stop() is called, but the number > is correct if sc.stop() is not called. > The reason may be: after sc.stop() is called, the function removeExecutor of class > ApplicationInfo is called, reducing the variable coresGranted to > zero. The variable coresGranted is used to display the number of cores on > the web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
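The mechanism described in the issue can be sketched with a toy class (a hypothetical simplification, not the real ApplicationInfo): because the same counter drives both scheduling and the UI, a clean shutdown that removes every executor zeroes the displayed value before the application is archived.

```scala
// Minimal model of the reported behavior: coresGranted is both the
// live "cores in use" counter and the number the master web page shows.
class AppInfoSketch {
  var coresGranted: Int = 0 // value rendered on the master web UI

  def addExecutor(cores: Int): Unit = { coresGranted += cores }

  // Called for each executor during sc.stop(); drives the counter
  // back to zero before the app moves to Completed Applications.
  def removeExecutor(cores: Int): Unit = { coresGranted -= cores }
}

val app = new AppInfoSketch
app.addExecutor(4); app.addExecutor(4)        // running app: UI shows 8
val coresShownWhileRunning = app.coresGranted
app.removeExecutor(4); app.removeExecutor(4)  // sc.stop(): UI shows 0
val coresShownAfterStop = app.coresGranted
```

A fix along the lines of the linked pull request would need to snapshot the granted cores before (or independently of) executor removal.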
[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map
[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341934#comment-14341934 ] Joseph K. Bradley commented on SPARK-5981: -- True, NaiveBayesModel does not use JavaModelWrapper. It's only a problem for models which use it. > pyspark ML models should support predict/transform on vector within map > --- > > Key: SPARK-5981 > URL: https://issues.apache.org/jira/browse/SPARK-5981 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley > > Currently, most Python models only have limited support for single-vector > prediction. > E.g., one can call {code}model.predict(myFeatureVector){code} for a single > instance, but that fails within a map for Python ML models and transformers > which use JavaModelWrapper: > {code} > data.map(lambda features: model.predict(features)) > {code} > This fails because JavaModelWrapper.call uses the SparkContext (within the > transformation). (It works for linear models, which do prediction within > Python.) > Supporting prediction within a map would require storing the model and doing > prediction/transformation within Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
[ https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341922#comment-14341922 ] Idan Zalzberg commented on SPARK-3889: -- Hi, I am still getting the same error with spark 1.2.1 (sporadically): {noformat} # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS (0x7) at pc=0x7ff5ed042220, pid=3694, tid=140692916811520 # # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 1.7.0_55-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode linux-amd64 compressed oops) # Problematic frame: # v ~StubRoutines::jint_disjoint_arraycopy # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again {noformat} Should we re-open this one, or open a new ticket? > JVM dies with SIGBUS, resulting in ConnectionManager failed ACK > --- > > Key: SPARK-3889 > URL: https://issues.apache.org/jira/browse/SPARK-3889 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Aaron Davidson >Assignee: Aaron Davidson >Priority: Critical > Fix For: 1.2.0 > > > Here's the first part of the core dump, possibly caused by a job which > shuffles a lot of very small partitions. > {code} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704 > # > # JRE version: 7.0_25-b30 > # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # v ~StubRoutines::jbyte_disjoint_arraycopy > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please include > # instructions on how to reproduce the bug and visit: > # https://bugs.launchpad.net/ubuntu/+source/openjdk-7/ > # > --- T H R E A D --- > Current thread (0x7fa4b0631000): JavaThread "Executor task launch > worker-170" daemon [_thread_in_Java, id=6783, > stack(0x7fa4448ef000,0x7fa4449f)] > siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), > si_addr=0x7fa428f79000 > {code} > Here is the only useful content I can find related to JVM and SIGBUS from > Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664 > It appears it may be related to disposing byte buffers, which we do in the > ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of > them in BufferMessage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
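The crash log above advises running "ulimit -c unlimited" before starting Java so that a future SIGBUS actually leaves a core file to inspect. For launcher scripts written in Python, a hedged equivalent (assumes a Unix host; the `resource` module is POSIX-only, and an unprivileged process can only raise the soft limit up to the hard limit):

```python
import resource

# Raise the core-dump soft limit to the hard limit before spawning the
# JVM; child processes inherit the limit.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
```

If the hard limit itself is 0 (as on some locked-down hosts), this is a no-op and the core dump will still be suppressed.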
[jira] [Updated] (SPARK-5984) TimSort broken
[ https://issues.apache.org/jira/browse/SPARK-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-5984: --- Affects Version/s: (was: 1.3.0) > TimSort broken > -- > > Key: SPARK-5984 > URL: https://issues.apache.org/jira/browse/SPARK-5984 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.2.1 >Reporter: Reynold Xin >Priority: Minor > Fix For: 1.3.0 > > > See > http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/ > Our TimSort is based on Android's TimSort, which is broken in some corner > case. Marking it minor as this problem exists for almost all TimSort > implementations out there, including Android, OpenJDK, Python, and it hasn't > manifested itself in practice yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5984) TimSort broken
[ https://issues.apache.org/jira/browse/SPARK-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-5984. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: (was: Aaron Davidson) > TimSort broken > -- > > Key: SPARK-5984 > URL: https://issues.apache.org/jira/browse/SPARK-5984 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.2.1 >Reporter: Reynold Xin >Priority: Minor > Fix For: 1.3.0 > > > See > http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/ > Our TimSort is based on Android's TimSort, which is broken in some corner > case. Marking it minor as this problem exists for almost all TimSort > implementations out there, including Android, OpenJDK, Python, and it hasn't > manifested itself in practice yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
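The linked analysis locates the bug in how TimSort maintains its merge invariant: for the stack of pending run lengths it must hold that runLen[i-2] > runLen[i-1] + runLen[i] and runLen[i-1] > runLen[i], and the broken implementations re-establish the invariant by checking too few stack entries after a merge. A small sketch of the invariant check itself (illustrative, not Spark's Java implementation):

```python
# Check the TimSort merge invariant over a stack of pending run lengths.
def invariant_holds(run_lens):
    # run_lens[i-2] > run_lens[i-1] + run_lens[i] for every i >= 2
    for i in range(2, len(run_lens)):
        if run_lens[i - 2] <= run_lens[i - 1] + run_lens[i]:
            return False
    # run_lens[i-1] > run_lens[i] for every i >= 1
    for i in range(1, len(run_lens)):
        if run_lens[i - 1] <= run_lens[i]:
            return False
    return True
```

The corner case is that a sort can push a run which silently violates the invariant two or three entries down the stack, which is why it went unnoticed in Android, OpenJDK, and Python for years.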
[jira] [Updated] (SPARK-6089) Size of task result fetched can't be found in UI
[ https://issues.apache.org/jira/browse/SPARK-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-6089: - Description: When you do a large collect the amount of data fetched as task result from each task is not present in the WebUI. We should make this appear under the 'Output' column (both per-task and in executor-level aggregation) cc [~kayousterhout] was: When you do a large collect the amount of data fetched as task result from each task is not present in the WebUI. We should make this appear under the 'Output' column (both per-task and in executor-level aggregation) [cc ~kayousterhout] > Size of task result fetched can't be found in UI > > > Key: SPARK-6089 > URL: https://issues.apache.org/jira/browse/SPARK-6089 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.3.0 >Reporter: Shivaram Venkataraman > > When you do a large collect the amount of data fetched as task result from > each task is not present in the WebUI. > We should make this appear under the 'Output' column (both per-task and in > executor-level aggregation) > cc [~kayousterhout] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6089) Size of task result fetched can't be found in UI
Shivaram Venkataraman created SPARK-6089: Summary: Size of task result fetched can't be found in UI Key: SPARK-6089 URL: https://issues.apache.org/jira/browse/SPARK-6089 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.3.0 Reporter: Shivaram Venkataraman When you do a large collect the amount of data fetched as task result from each task is not present in the WebUI. We should make this appear under the 'Output' column (both per-task and in executor-level aggregation) [cc ~kayousterhout] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
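The requested 'Output' column needs the per-task result sizes plus an executor-level rollup. A sketch of that aggregation over hypothetical task-result sizes (the executor IDs and byte counts are made up for illustration):

```python
from collections import defaultdict

# (executor_id, task_result_bytes) pairs as the scheduler might record them
task_results = [("exec-1", 1024), ("exec-2", 2048), ("exec-1", 512)]

# executor-level aggregation for the UI's 'Output' column
per_executor = defaultdict(int)
for executor_id, nbytes in task_results:
    per_executor[executor_id] += nbytes
```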
[jira] [Commented] (SPARK-4144) Support incremental model training of Naive Bayes classifier
[ https://issues.apache.org/jira/browse/SPARK-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341901#comment-14341901 ] Chris Fregly commented on SPARK-4144: - Hey [~freeman-lab]! I was literally just talking to [~josephkb] in the office last week about picking this up. Great timing! Let's coordinate offline. I'll shoot you an email. -Chris > Support incremental model training of Naive Bayes classifier > > > Key: SPARK-4144 > URL: https://issues.apache.org/jira/browse/SPARK-4144 > Project: Spark > Issue Type: Improvement > Components: MLlib, Streaming >Reporter: Chris Fregly >Assignee: Jeremy Freeman > > Per Xiangrui Meng from the following user list discussion: > http://mail-archives.apache.org/mod_mbox/spark-user/201408.mbox/%3CCAJgQjQ_QjMGO=jmm8weq1v8yqfov8du03abzy7eeavgjrou...@mail.gmail.com%3E > > "For Naive Bayes, we need to update the priors and conditional > probabilities, which means we should also remember the number of > observations for the updates." -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6088) UI is malformed when tasks fetch remote results
[ https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-6088: -- Description: There are three issues when tasks get remote results: (1) The status never changes from GET_RESULT to SUCCEEDED (2) The time to get the result is shown as the absolute time (resulting in a non-sensical output that says getting the result took >1 million hours) rather than the elapsed time (3) The getting result time is included as part of the scheduler delay cc [~shivaram] was: There are two issues when tasks get remote results: (1) The status never changes from GET_RESULT to SUCCEEDED (2) The time to get the result is shown as the absolute time (resulting in a non-sensical output that says getting the result took >1 million hours) rather than the elapsed time cc [~shivaram] > UI is malformed when tasks fetch remote results > --- > > Key: SPARK-6088 > URL: https://issues.apache.org/jira/browse/SPARK-6088 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout > Attachments: Screenshot 2015-02-28 18.24.42.png > > > There are three issues when tasks get remote results: > (1) The status never changes from GET_RESULT to SUCCEEDED > (2) The time to get the result is shown as the absolute time (resulting in a > non-sensical output that says getting the result took >1 million hours) > rather than the elapsed time > (3) The getting result time is included as part of the scheduler delay > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
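Issue (2) above, a duration rendered as an absolute timestamp, is easy to reproduce: epoch milliseconds interpreted as a duration come out at an absurdly large number of hours. A sketch with hypothetical timestamps:

```python
# Hypothetical epoch-millisecond timestamps for a task that took 33 min
# to fetch its result (mirroring the 33-min example in the comments).
result_fetch_start_ms = 1_425_168_000_000
result_fetch_end_ms = result_fetch_start_ms + 33 * 60 * 1000

MS_PER_HOUR = 3_600_000

# correct: elapsed time
elapsed_hours = (result_fetch_end_ms - result_fetch_start_ms) / MS_PER_HOUR

# buggy: the raw finish timestamp treated as a duration
absolute_hours = result_fetch_end_ms / MS_PER_HOUR
```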
[jira] [Commented] (SPARK-6088) UI is malformed when tasks fetch remote results
[ https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341898#comment-14341898 ] Apache Spark commented on SPARK-6088: - User 'kayousterhout' has created a pull request for this issue: https://github.com/apache/spark/pull/4839 > UI is malformed when tasks fetch remote results > --- > > Key: SPARK-6088 > URL: https://issues.apache.org/jira/browse/SPARK-6088 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout > Attachments: Screenshot 2015-02-28 18.24.42.png > > > There are two issues when tasks get remote results: > (1) The status never changes from GET_RESULT to SUCCEEDED > (2) The time to get the result is shown as the absolute time (resulting in a > non-sensical output that says getting the result took >1 million hours) > rather than the elapsed time > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6088) UI is malformed when tasks fetch remote results
[ https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341896#comment-14341896 ] Shivaram Venkataraman commented on SPARK-6088: -- Also for some reason the get result time is also included in the Scheduler Delay. Screen shot attached shows how the get result took 33 mins and how this shows up in scheduler delay. > UI is malformed when tasks fetch remote results > --- > > Key: SPARK-6088 > URL: https://issues.apache.org/jira/browse/SPARK-6088 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout > Attachments: Screenshot 2015-02-28 18.24.42.png > > > There are two issues when tasks get remote results: > (1) The status never changes from GET_RESULT to SUCCEEDED > (2) The time to get the result is shown as the absolute time (resulting in a > non-sensical output that says getting the result took >1 million hours) > rather than the elapsed time > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6088) UI is malformed when tasks fetch remote results
[ https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-6088: - Attachment: Screenshot 2015-02-28 18.24.42.png > UI is malformed when tasks fetch remote results > --- > > Key: SPARK-6088 > URL: https://issues.apache.org/jira/browse/SPARK-6088 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout > Attachments: Screenshot 2015-02-28 18.24.42.png > > > There are two issues when tasks get remote results: > (1) The status never changes from GET_RESULT to SUCCEEDED > (2) The time to get the result is shown as the absolute time (resulting in a > non-sensical output that says getting the result took >1 million hours) > rather than the elapsed time > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341895#comment-14341895 ] Marko Bonaci commented on SPARK-2620: - *Spark 1.2 shell local:* {code:java} scala> case class P(name:String) defined class P scala> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) ps: Array[P] = Array(P(alice), P(bob), P(charly), P(bob)) scala> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect res8: Array[(P, Int)] = Array((P(alice),1), (P(charly),1), (P(bob),2)) {code} > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6088) UI is malformed when tasks fetch remote results
Kay Ousterhout created SPARK-6088: - Summary: UI is malformed when tasks fetch remote results Key: SPARK-6088 URL: https://issues.apache.org/jira/browse/SPARK-6088 Project: Spark Issue Type: Bug Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout There are two issues when tasks get remote results: (1) The status never changes from GET_RESULT to SUCCEEDED (2) The time to get the result is shown as the absolute time (resulting in a non-sensical output that says getting the result took >1 million hours) rather than the elapsed time cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6086: --- Component/s: SQL > Exceptions in DAGScheduler.updateAccumulators > - > > Key: SPARK-6086 > URL: https://issues.apache.org/jira/browse/SPARK-6086 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core, SQL >Affects Versions: 1.3.0 >Reporter: Kai Zeng >Priority: Critical > > Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler > is collecting status from tasks. These exceptions happen occasionally, > especially when there are many stages in a job. > Application code: > https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala > Script used: ./bin/spark-submit --class > org.apache.spark.examples.sql.hive.SQLSuite > examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar > benchmark-cache 6 > There are two types of error messages: > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to > scala.collection.TraversableOnce > at > org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at > org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
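Both stack traces show scala.None$ reaching addInPlace, i.e. an empty task-side value being added into a typed accumulator on the driver. A rough Python analogue of that failure mode and a defensive merge (illustrative only; not the DAGScheduler's actual code or the eventual fix):

```python
# Driver-side merge of per-task accumulator updates. The unguarded
# equivalent of `total += u` with u = None fails here just as the
# reported ClassCastException does in updateAccumulators.
def merge_updates(total, updates):
    for u in updates:
        if u is None:
            # skip empty updates instead of crashing the event loop
            continue
        total += u
    return total
```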
[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6086: --- Component/s: Spark Core > Exceptions in DAGScheduler.updateAccumulators > - > > Key: SPARK-6086 > URL: https://issues.apache.org/jira/browse/SPARK-6086 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 1.3.0 >Reporter: Kai Zeng >Priority: Critical > > Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler > is collecting status from tasks. These exceptions happen occasionally, > especially when there are many stages in a job. > Application code: > https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala > Script used: ./bin/spark-submit --class > org.apache.spark.examples.sql.hive.SQLSuite > examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar > benchmark-cache 6 > There are two types of error messages: > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to > scala.collection.TraversableOnce > at > org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at > org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6086: --- Description: Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler is collecting status from tasks. These exceptions happen occasionally, especially when there are many stages in a job. Application code: https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala Script used: ./bin/spark-submit --class org.apache.spark.examples.sql.hive.SQLSuite examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar benchmark-cache 6 There are two types of error messages: {code} java.lang.ClassCastException: scala.None$ cannot be cast to scala.collection.TraversableOnce at org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.Accumulators$.add(Accumulators.scala:335) at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) {code} {code} java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.Accumulators$.add(Accumulators.scala:335) at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) {code} was: Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler is collecting status from tasks. These exceptions happen occasionally, especially when there are many stages in a job. 
Application code: https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala Script used: ./bin/spark-submit --class org.apache.spark.examples.sql.hive.SQLSuite examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar benchmark-cache 6 There are two types of error messages: java.lang.ClassCastException: scala.None$ cannot be cast to scala.collection.TraversableOnce at org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collec
[jira] [Commented] (SPARK-6066) Metadata in event log makes it very difficult for external libraries to parse event log
[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341881#comment-14341881 ] Patrick Wendell commented on SPARK-6066: [~vanzin] - yes you are right (an early scratch version of the feature used a Gzip stream, I think). There are python bindings for all three of those compression codecs. To be fair, I'm not 100% sure the codecs are standardized enough to be compatible across different implementations. Gzip is pretty good in this regard, but not sure about those other three. > Metadata in event log makes it very difficult for external libraries to parse > event log > --- > > Key: SPARK-6066 > URL: https://issues.apache.org/jira/browse/SPARK-6066 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Kay Ousterhout >Assignee: Andrew Or >Priority: Blocker > > The fix for SPARK-2261 added a line at the beginning of the event log that > encodes metadata. This line makes it much more difficult to parse the event > logs from external libraries (like > https://github.com/kayousterhout/trace-analysis, which is used by folks at > Berkeley) because: > (1) The metadata is not written as JSON, unlike the rest of the file > (2) More annoyingly, if the file is compressed, the metadata is not > compressed. This has a few side-effects: first, someone can't just use the > command line to uncompress the file and then look at the logs, because the > file is in this weird half-compressed format; and second, now external tools > that parse these logs also need to deal with this weird format. > We should fix this before the 1.3 release, because otherwise we'll have to > add a bunch more backward-compatibility code to handle this weird format! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
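The complaint is the half-compressed format: an uncompressed, non-JSON metadata line prepended to a compressed JSON-lines body. A sketch of the uniform alternative the commenters favor, with every line (metadata included) as JSON inside a single compressed stream; plain gzip is used here for illustration, whereas Spark's event logs use the codecs discussed above:

```python
import gzip
import io
import json

# Every record, metadata included, is a JSON line in one gzip stream, so
# `gunzip` followed by a line-oriented JSON parser recovers everything.
events = [
    {"Event": "SparkListenerLogStart", "Spark Version": "1.3.0"},
    {"Event": "SparkListenerApplicationStart", "App Name": "demo"},
]

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    for e in events:
        f.write((json.dumps(e) + "\n").encode("utf-8"))

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    parsed = [json.loads(line) for line in f]
```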
[jira] [Created] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough
Patrick Wendell created SPARK-6087: -- Summary: Provide actionable exception if Kryo buffer is not large enough Key: SPARK-6087 URL: https://issues.apache.org/jira/browse/SPARK-6087 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Patrick Wendell Priority: Critical Right now if you don't have a large enough Kryo buffer, you get a really confusing exception. I noticed this when using Kryo to serialize broadcasted tables in Spark SQL. We should catch-then-rethrow this in the KryoSerializer, wrapping it in a message that suggests increasing the Kryo buffer size configuration variable. {code} com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 3 Serialization trace: value (org.apache.spark.sql.catalyst.expressions.MutableAny) values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow) at com.esotericsoftware.kryo.io.Output.require(Output.java:138) at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446) at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:167) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:234) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
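The catch-then-rethrow proposed above can be sketched as follows. This is a hypothetical illustration, not the actual Spark patch: `KryoException` and `SparkException` are stood in for by local classes so the snippet is self-contained, and the configuration key named in the message (`spark.kryoserializer.buffer.max.mb` in this era) is an assumption.

```scala
// Hedged sketch of the proposed catch-then-rethrow. The exception classes
// below are local stand-ins so the snippet runs without Kryo or Spark on
// the classpath; the config key named in the message is an assumption.
class KryoException(msg: String) extends RuntimeException(msg)
class SparkException(msg: String, cause: Throwable) extends RuntimeException(msg, cause)

object KryoWrap {
  // Wrap a raw buffer-overflow failure in an actionable message that tells
  // the user which setting to increase.
  def serialize[T](value: T)(doSerialize: T => Array[Byte]): Array[Byte] =
    try {
      doSerialize(value)
    } catch {
      case e: KryoException if e.getMessage.contains("Buffer overflow") =>
        throw new SparkException(
          "Kryo serialization failed: " + e.getMessage +
            ". To avoid this, increase spark.kryoserializer.buffer.max.mb.", e)
    }
}
```

With this in place, the opaque "Buffer overflow. Available: 0, required: 3" would surface wrapped in a message that points directly at the buffer-size setting.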
[jira] [Updated] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough
[ https://issues.apache.org/jira/browse/SPARK-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6087: --- Description: Right now if you don't have a large enough Kryo buffer, you get a really confusing exception. I noticed this when using Kryo to serialize broadcasted tables in Spark SQL. We should catch-then-rethrow this in the KryoSerializer, wrapping it in a message that suggests increasing the Kryo buffer size configuration variable. {code} com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 3 Serialization trace: value (org.apache.spark.sql.catalyst.expressions.MutableAny) values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow) at com.esotericsoftware.kryo.io.Output.require(Output.java:138) at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446) at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at 
com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:167) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:234) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} /cc [~kayousterhout] who helped report this issue was: Right now if you don't have a large enough Kryo buffer, you get a really confusing exception. I noticed this when using Kryo to serialize broadcasted tables in Spark SQL. We should catch-then-rethrow this in the KryoSerializer, wrapping it in a message that suggests increasing the Kryo buffer size configuration variable. {code} com.esotericsoftware.kryo.KryoException: Buffer overflow.
Available: 0, required: 3 Serialization trace: value (org.apache.spark.sql.catalyst.expressions.MutableAny) values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow) at com.esotericsoftware.kryo.io.Output.require(Output.java:138) at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446) at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializ
[jira] [Created] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators
Kai Zeng created SPARK-6086: --- Summary: Exceptions in DAGScheduler.updateAccumulators Key: SPARK-6086 URL: https://issues.apache.org/jira/browse/SPARK-6086 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.3.0 Reporter: Kai Zeng Priority: Critical Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler is collecting status from tasks. These exceptions happen occasionally, especially when there are many stages in a job. Application code: https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala Script used: ./bin/spark-submit --class org.apache.spark.examples.sql.hive.SQLSuite examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar benchmark-cache 6 There are two types of error messages: java.lang.ClassCastException: scala.None$ cannot be cast to scala.collection.TraversableOnce at org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.Accumulators$.add(Accumulators.scala:335) at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.Accumulators$.add(Accumulators.scala:335) at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
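The failure mode in both traces above, an accumulator update of `None` being applied by a typed accumulator param, can be reproduced in isolation. This is only an illustration of the cast that fails in `BoxesRunTime.unboxToInt`, not the scheduler code itself:

```scala
// A None stored as Any and then unboxed as Int fails exactly like the
// second trace above ("scala.None$ cannot be cast to java.lang.Integer").
// Illustration only; not DAGScheduler code.
val update: Any = None
val failedAsInTrace =
  try { update.asInstanceOf[Int]; false }
  catch { case _: ClassCastException => true }
```

This suggests some task is reporting `None` where the accumulator machinery expects a typed partial value, consistent with the bug appearing only occasionally across many stages.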
[jira] [Resolved] (SPARK-2301) add ability to submit multiple jars for Driver
[ https://issues.apache.org/jira/browse/SPARK-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-2301. -- Resolution: Won't Fix The PR says this is WontFix. > add ability to submit multiple jars for Driver > -- > > Key: SPARK-2301 > URL: https://issues.apache.org/jira/browse/SPARK-2301 > Project: Spark > Issue Type: Improvement > Components: Deploy >Reporter: Lianhui Wang > > add ability to submit multiple jars for Driver > see PR: > https://github.com/apache/spark/pull/1113 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3357) Internal log messages should be set at DEBUG level instead of INFO
[ https://issues.apache.org/jira/browse/SPARK-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341846#comment-14341846 ] Apache Spark commented on SPARK-3357: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/4838 > Internal log messages should be set at DEBUG level instead of INFO > -- > > Key: SPARK-3357 > URL: https://issues.apache.org/jira/browse/SPARK-3357 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Xiangrui Meng >Priority: Minor > > spark-shell shows INFO by default, so we should carefully choose what to show > at INFO level. For example, if I run > {code} > sc.parallelize(0 until 100).count() > {code} > and wait for one minute or so. I will see messages that mix with the current > input box, which is annoying: > {code} > scala> 14/09/02 17:09:00 INFO BlockManager: Removing broadcast 0 > 14/09/02 17:09:00 INFO BlockManager: Removing block broadcast_0 > 14/09/02 17:09:00 INFO MemoryStore: Block broadcast_0 of size 1088 dropped > from memory (free 278019440) > 14/09/02 17:09:00 INFO ContextCleaner: Cleaned broadcast 0 > {code} > Does a user need to know when a broadcast variable is removed? Maybe not. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6068) KMeans Parallel test may fail
[ https://issues.apache.org/jira/browse/SPARK-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341833#comment-14341833 ] Joseph K. Bradley commented on SPARK-6068: -- [~derrickburns] I'm sorry about how it can take a long time to get a PR into Spark, but sending small PRs with one PR per JIRA helps a lot. For a reviewer to say "LGTM," they need to fully understand and be prepared to "own" the code, which makes reviewing large patches *much* harder. I've spent a lot of time breaking my patches into smaller pieces. Looking over your JIRAs, the changes all sound useful. It also seems like the most important change for you (supporting general Bregman divergences) could potentially be added in spark.ml or spark.mllib without making breaking changes. Since there is no distance metric parameter currently, adding one based on a Bregman divergence API should be possible. However, it's pretty hard to figure out exactly what changes are needed because of the many issues being addressed in your big k-means PR. A smaller PR would help a lot. I hope it will prove worthwhile for you to help get these improvements into MLlib, piece by piece. I don't think they will all require waiting for the spark.ml API, but if you do want to make major API changes, then this would be the time to design the new API for the spark.ml package. * [SPARK-6001] might require an API change since it would return a model which could not be serialized. Perhaps it could follow a similar pattern as LDA, which returns a DistributedLDAModel (with info about the training dataset topic distributions), which in turn can be converted into a LocalLDAModel (which stores model parameters locally and drops the training dataset info). 
> KMeans Parallel test may fail > - > > Key: SPARK-6068 > URL: https://issues.apache.org/jira/browse/SPARK-6068 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.1 >Reporter: Derrick Burns > Labels: clustering > Original Estimate: 24h > Remaining Estimate: 24h > > The test "k-means|| initialization in KMeansSuite can fail when the random > number generator is truly random. > The test is predicated on the assumption that each round of K-Means || will > add at least one new cluster center. The current implementation of K-Means > || adds 2*k cluster centers with high probability. However, there is no > deterministic lower bound on the number of cluster centers added. > Choices are: > 1) change the KMeans || implementation to iterate on selecting points until > it has satisfied a lower bound on the number of points chosen. > 2) eliminate the test > 3) ignore the problem and depend on the random number generator to sample the > space in a lucky manner. > Option (1) is most in keeping with the contract that KMeans || should provide > a precise number of cluster centers when possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
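Option (1) above, iterating until a lower bound on the number of centers is met, could be sketched like this. It is a hypothetical one-dimensional simplification, not MLlib's k-means|| implementation:

```scala
// Hedged sketch of option (1): repeat k-means||-style sampling rounds until
// at least k distinct candidate centers exist, so the count is deterministic
// rather than probabilistic. One-dimensional points for simplicity.
import scala.util.Random

def sampleAtLeastK(points: IndexedSeq[Double], k: Int,
                   oversample: Int, rng: Random): Seq[Double] = {
  require(points.distinct.size >= k, "need at least k distinct points")
  var centers = Vector(points(rng.nextInt(points.length)))
  while (centers.distinct.size < k) {
    // Each point is kept with probability ~ oversample / n per round;
    // rounds continue until the k-center lower bound is satisfied.
    val p = oversample.toDouble / points.size
    centers ++= points.filter(_ => rng.nextDouble() < p)
  }
  centers.distinct.take(k)
}
```

Under this scheme the test's assumption (at least k centers after initialization) holds by construction, regardless of how the random number generator behaves in any single round.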
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341819#comment-14341819 ] Sean Owen commented on SPARK-6069: -- No, I set it kind of preemptively. I don't know that I serialize any Guava classes though, come to think of it. I am using YARN + Hadoop 2.5. I don't think it should be necessary in general. Guava is a strange special case, so I thought it worth trying. If you have the energy, you might try 1.3.0-SNAPSHOT since I see a few things fixed that may be relevant: https://issues.apache.org/jira/browse/SPARK-4877 https://issues.apache.org/jira/browse/SPARK-4660 > Deserialization Error ClassNotFoundException with Kryo, Guava 14 > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel >Priority: Critical > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341809#comment-14341809 ] Pat Ferrel commented on SPARK-6069: --- Embarrassed to say we're still on Hadoop 1.2.1, and so no YARN. The packaging is not in the app jar but a separate pruned-down dependencies-only jar. I can see why YARN would throw a unique kink into the situation. So I guess you ran into this and had to use the {{user.classpath.first}} workaround, or are you saying it doesn't occur in Oryx? Still, none of this should be necessary, right? Why else would jars be specified in context creation? We do have a workaround for anyone who has to work with 1.2.1, but because of that it doesn't seem like a good version to recommend. Maybe I'll try 1.2 and install Hadoop 2 and YARN, which seems like what the distros support. > Deserialization Error ClassNotFoundException with Kryo, Guava 14 > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel >Priority: Critical > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7
[ https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341790#comment-14341790 ] Nicholas Chammas edited comment on SPARK-5389 at 2/28/15 9:48 PM: -- Marking as major since the shell -is technically broken- is behaving terribly when Java cannot be found. Reopening since multiple reports of this problem have come in. was (Author: nchammas): Marking as major since the shell is technically broken. (Trivial is for mostly cosmetic problems.) Reopening since multiple reports of this problem have come in. > spark-shell.cmd does not run from DOS Windows 7 > --- > > Key: SPARK-5389 > URL: https://issues.apache.org/jira/browse/SPARK-5389 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0 > Environment: Windows 7 >Reporter: Yana Kadiyska > Attachments: SparkShell_Win7.JPG > > > spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. > spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2 > Marking as trivial since calling spark-shell2.cmd also works fine > Attaching a screenshot since the error isn't very useful: > {code} > spark-1.2.0-bin-cdh4>bin\spark-shell.cmd > else was unexpected at this time. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6084) spark-shell broken on Windows
[ https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-6084. - Resolution: Duplicate Resolving as duplicate of SPARK-5389. That seems a more likely match for this than SPARK-4833. > spark-shell broken on Windows > - > > Key: SPARK-6084 > URL: https://issues.apache.org/jira/browse/SPARK-6084 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0, 1.2.1 > Environment: Windows 7, Scala 2.11.4, Java 1.8 >Reporter: Nicholas Chammas > Labels: windows > > Original report here: > http://stackoverflow.com/questions/28747795/spark-launch-find-version > For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this: > {code} > bin\spark-shell.cmd > {code} > Yields the following error: > {code} > find: 'version': No such file or directory > else was unexpected at this time. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7
[ https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5389: Description: spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2 Marking as trivial since calling spark-shell2.cmd also works fine Attaching a screenshot since the error isn't very useful: {code} spark-1.2.0-bin-cdh4>bin\spark-shell.cmd else was unexpected at this time. {code} was: spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2 Marking as trivial sine calling spark-shell2.cmd also works fine Attaching a screenshot since the error isn't very useful: spark-1.2.0-bin-cdh4>bin\spark-shell.cmd else was unexpected at this time. Priority: Major (was: Trivial) Environment: Windows 7 Marking as major since the shell is technically broken. (Trivial is for mostly cosmetic problems.) Reopening since multiple reports of this problem have come in. > spark-shell.cmd does not run from DOS Windows 7 > --- > > Key: SPARK-5389 > URL: https://issues.apache.org/jira/browse/SPARK-5389 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0 > Environment: Windows 7 >Reporter: Yana Kadiyska > Attachments: SparkShell_Win7.JPG > > > spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. > spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2 > Marking as trivial since calling spark-shell2.cmd also works fine > Attaching a screenshot since the error isn't very useful: > {code} > spark-1.2.0-bin-cdh4>bin\spark-shell.cmd > else was unexpected at this time. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7
[ https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-5389: - > spark-shell.cmd does not run from DOS Windows 7 > --- > > Key: SPARK-5389 > URL: https://issues.apache.org/jira/browse/SPARK-5389 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0 > Environment: Windows 7 >Reporter: Yana Kadiyska > Attachments: SparkShell_Win7.JPG > > > spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. > spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2 > Marking as trivial since calling spark-shell2.cmd also works fine > Attaching a screenshot since the error isn't very useful: > {code} > spark-1.2.0-bin-cdh4>bin\spark-shell.cmd > else was unexpected at this time. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7
[ https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341789#comment-14341789 ] Nicholas Chammas commented on SPARK-5389: - Yeah, I think we found another instance of this in SPARK-6084 / [here|http://stackoverflow.com/questions/28747795/spark-launch-find-version]. > spark-shell.cmd does not run from DOS Windows 7 > --- > > Key: SPARK-5389 > URL: https://issues.apache.org/jira/browse/SPARK-5389 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0 >Reporter: Yana Kadiyska >Priority: Trivial > Attachments: SparkShell_Win7.JPG > > > spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. > spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2 > Marking as trivial sine calling spark-shell2.cmd also works fine > Attaching a screenshot since the error isn't very useful: > spark-1.2.0-bin-cdh4>bin\spark-shell.cmd > else was unexpected at this time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5396) Syntax error in spark scripts on windows.
[ https://issues.apache.org/jira/browse/SPARK-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341788#comment-14341788 ] Nicholas Chammas commented on SPARK-5396: - What does that error message say in English? So we can pattern match to similar reports elsewhere. > Syntax error in spark scripts on windows. > - > > Key: SPARK-5396 > URL: https://issues.apache.org/jira/browse/SPARK-5396 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.3.0 > Environment: Window 7 and Window 8.1. >Reporter: Vladimir Protsenko >Assignee: Masayoshi TSUZUKI >Priority: Critical > Fix For: 1.3.0 > > Attachments: windows7.png, windows8.1.png > > > I made the following steps: > 1. downloaded and installed Scala 2.11.5 > 2. downloaded spark 1.2.0 by git clone git://github.com/apache/spark.git > 3. run dev/change-version-to-2.11.sh and mvn -Dscala-2.11 -DskipTests clean > package (in git bash) > After installation tried to run spark-shell.cmd in cmd shell and it says > there is a syntax error in file. The same with spark-shell2.cmd, > spark-submit.cmd and spark-submit2.cmd. > !windows7.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6085) Increase default value for memory overhead
[ https://issues.apache.org/jira/browse/SPARK-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341786#comment-14341786 ] Apache Spark commented on SPARK-6085: - User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/4836 > Increase default value for memory overhead > -- > > Key: SPARK-6085 > URL: https://issues.apache.org/jira/browse/SPARK-6085 > Project: Spark > Issue Type: Improvement >Reporter: Ted Yu > > Several users have communicated how current default memory overhead value > resulted in failed computation in Spark on YARN. > See this thread: > http://search-hadoop.com/m/JW1q58FDel > Increasing default value for memory overhead would improve out of box user > experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6084) spark-shell broken on Windows
[ https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341787#comment-14341787 ] Nicholas Chammas commented on SPARK-6084: - Ah, there's also SPARK-5396, though it's in Russian (?) so I'm not sure if the error is the same. > spark-shell broken on Windows > - > > Key: SPARK-6084 > URL: https://issues.apache.org/jira/browse/SPARK-6084 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0, 1.2.1 > Environment: Windows 7, Scala 2.11.4, Java 1.8 >Reporter: Nicholas Chammas > Labels: windows > > Original report here: > http://stackoverflow.com/questions/28747795/spark-launch-find-version > For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this: > {code} > bin\spark-shell.cmd > {code} > Yields the following error: > {code} > find: 'version': No such file or directory > else was unexpected at this time. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6085) Increase default value for memory overhead
Ted Yu created SPARK-6085: - Summary: Increase default value for memory overhead Key: SPARK-6085 URL: https://issues.apache.org/jira/browse/SPARK-6085 Project: Spark Issue Type: Improvement Reporter: Ted Yu Several users have communicated how current default memory overhead value resulted in failed computation in Spark on YARN. See this thread: http://search-hadoop.com/m/JW1q58FDel Increasing default value for memory overhead would improve out of box user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
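For context, the default under discussion is computed roughly as below. The 7% factor and 384 MB floor are assumptions taken from the Spark 1.x-era YARN code, not figures stated in this issue:

```scala
// Hedged sketch of the YARN memory overhead default being discussed.
// The 0.07 factor and 384 MB floor are assumed from Spark 1.x-era code,
// not quoted from this issue.
val MemoryOverheadFactor = 0.07
val MemoryOverheadMinMb = 384

def defaultOverheadMb(executorMemoryMb: Int): Int =
  math.max((MemoryOverheadFactor * executorMemoryMb).toInt, MemoryOverheadMinMb)
```

Small executors are protected by the floor, so an insufficient percentage mostly bites at larger executor sizes, where off-heap usage (netty buffers, mmapped shuffle files, etc.) can exceed the computed overhead and trigger YARN container kills.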
[jira] [Commented] (SPARK-6084) spark-shell broken on Windows
[ https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341785#comment-14341785 ] Sean Owen commented on SPARK-6084: -- Oops, I meant SPARK-5389. It still may not be the same thing. Maybe [~tsudukim] can look to double-check whether it's the same? Is {{find "version"}} supposed to work in Windows at large, or just PowerShell, or...? > spark-shell broken on Windows > - > > Key: SPARK-6084 > URL: https://issues.apache.org/jira/browse/SPARK-6084 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0, 1.2.1 > Environment: Windows 7, Scala 2.11.4, Java 1.8 >Reporter: Nicholas Chammas > Labels: windows > > Original report here: > http://stackoverflow.com/questions/28747795/spark-launch-find-version > For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this: > {code} > bin\spark-shell.cmd > {code} > Yields the following error: > {code} > find: 'version': No such file or directory > else was unexpected at this time. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-6084) spark-shell broken on Windows
[ https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-6084: - Don't see how this is a dup of SPARK-4833. > spark-shell broken on Windows > - > > Key: SPARK-6084 > URL: https://issues.apache.org/jira/browse/SPARK-6084 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.2.0, 1.2.1 > Environment: Windows 7, Scala 2.11.4, Java 1.8 >Reporter: Nicholas Chammas > Labels: windows > > Original report here: > http://stackoverflow.com/questions/28747795/spark-launch-find-version > For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this: > {code} > bin\spark-shell.cmd > {code} > Yields the following error: > {code} > find: 'version': No such file or directory > else was unexpected at this time. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6084) spark-shell broken on Windows
[ https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341776#comment-14341776 ] Nicholas Chammas commented on SPARK-6084: - I took a look at the linked issue (SPARK-4833) and I don't see how they are duplicates. They both relate to spark-shell and Windows, but the error messages and conditions are different. Here the user is claiming spark-shell fails with an error right away. There, the user is claiming spark-shell runs OK the first time, but then doesn't run a second time.
[jira] [Resolved] (SPARK-6084) spark-shell broken on Windows
[ https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-6084. -- Resolution: Duplicate Target Version/s: (was: 1.3.0) I think a lot does not work on Windows.
[jira] [Commented] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341767#comment-14341767 ] Apache Spark commented on SPARK-6075: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/4835 > After SPARK-3885, some tasks' accumulator updates may be lost > - > > Key: SPARK-6075 > URL: https://issues.apache.org/jira/browse/SPARK-6075 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.4.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Blocker > > It looks like some of the AccumulatorSuite tests have started failing > nondeterministically on Jenkins. The errors seem to be due to lost / missing > accumulator updates, e.g. > {code} > Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901 > {code} > This could somehow be related to SPARK-3885 / > https://github.com/apache/spark/pull/4021, a patch to garbage-collect > accumulators, which was only merged into master. > https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/ > I think I've figured it out: consider the lifecycle of an accumulator in a > task, say ShuffleMapTask: on the executor, each task deserializes its own > copy of the RDD inside of its runTask method, so the strong reference to the > RDD disappears at the end of runTask. In Executor.run(), we call > Accumulators.values after runTask has exited, so there's a small window in > which the task's RDD can be GC'd, causing accumulators to be GC'd as well > because there are no longer any strong references to them. > The fix is to keep strong references in localAccums, since we clear this at > the end of each task anyway.
I'm glad that I was able to figure out > precisely why this was necessary and sorry that I missed this during review; > I'll submit a fix shortly. In terms of preventative measures, it might be a > good idea to write up the lifetime / lifecycle of objects' strong references > whenever we're using WeakReferences, since the process of explicitly writing > that out would prevent these sorts of mistakes in the future.
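[Editor's note] The GC race described in SPARK-6075 is easy to reproduce outside Spark. The sketch below is a hedged, non-Spark Python analogue (names like `registry`, `local_accums`, `run_task` are illustrative, not Spark's actual identifiers): a registry that holds only weak references loses an update as soon as the task drops its last strong reference, and keeping a strong reference for the task's lifetime fixes it. CPython's reference counting makes the clearing deterministic here, which the real JVM race is not.

```python
# Toy analogue of the bug: a weak-reference-only registry drops an
# accumulator's update once the task's strong reference dies.
import weakref

class Accumulator:
    def __init__(self, value=0):
        self.value = value

registry = {}  # id -> weakref; stands in for the global Accumulators registry

def register(acc):
    registry[id(acc)] = weakref.ref(acc)

def run_task():
    # The task holds the only strong reference to its accumulator, just as
    # runTask holds the only strong reference to the deserialized RDD.
    acc = Accumulator()
    register(acc)
    acc.value += 1
    return id(acc)  # the strong reference 'acc' dies when we return

acc_id = run_task()
# Back on the "executor", we read accumulator values after the task exits:
assert registry[acc_id]() is None  # the update has already been lost

# The fix described above: hold a strong reference for the task's lifetime.
local_accums = {}  # cleared at the end of each task anyway

def run_task_fixed():
    acc = Accumulator()
    register(acc)
    local_accums[id(acc)] = acc  # strong reference survives past return
    acc.value += 1
    return id(acc)

acc_id = run_task_fixed()
assert registry[acc_id]().value == 1  # update is still reachable
local_accums.clear()  # end-of-task cleanup
```

This mirrors the fix Josh describes: `localAccums` keeps strong references that are explicitly cleared at the end of each task, closing the window in which the only references are weak.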
[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6075: -- Priority: Blocker (was: Critical)
[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6075: -- Labels: (was: flaky-test)
[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6075: -- Description: It looks like some of the AccumulatorSuite tests have started failing nondeterministically on Jenkins. The errors seem to be due to lost / missing accumulator updates, e.g. {code} Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901 {code} This could somehow be related to SPARK-3885 / https://github.com/apache/spark/pull/4021, a patch to garbage-collect accumulators, which was only merged into master. https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/ I think I've figured it out: consider the lifecycle of an accumulator in a task, say ShuffleMapTask: on the executor, each task deserializes its own copy of the RDD inside of its runTask method, so the strong reference to the RDD disappears at the end of runTask. In Executor.run(), we call Accumulators.values after runTask has exited, so there's a small window in which the task's RDD can be GC'd, causing accumulators to be GC'd as well because there are no longer any strong references to them. The fix is to keep strong references in localAccums, since we clear this at the end of each task anyway. I'm glad that I was able to figure out precisely why this was necessary and sorry that I missed this during review; I'll submit a fix shortly. In terms of preventative measures, it might be a good idea to write up the lifetime / lifecycle of objects' strong references whenever we're using WeakReferences, since the process of explicitly writing that out would prevent these sorts of mistakes in the future. was: It looks like some of the AccumulatorSuite tests have started failing nondeterministically on Jenkins. The errors seem to be due to lost / missing accumulator updates, e.g. {code} Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901 {code} This could somehow be related to SPARK-3885 / https://github.com/apache/spark/pull/4021, a patch to garbage-collect accumulators, which was only merged into master. https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
[jira] [Assigned] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-6075: - Assignee: Josh Rosen
[jira] [Issue Comment Deleted] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6075: -- Comment: was deleted (was: I left some notes on the PR that might have introduced the bug: https://github.com/apache/spark/pull/4021#issuecomment-76511660 {quote} I'm still trying to see if I can spot the problem, but my hunch is that maybe the localAccums thread-local maps should not hold weak references. When deserializing an accumulator in an executor and registering it with localAccums, is there ever a moment in which the accumulator has no strong references pointing to it? Does some object hold a strong reference to an accumulator while it's being deserialized? If not, this could lead to it being dropped from the localAccums map, causing that task's accumulator updates to be lost. {quote})
[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6075: -- Summary: After SPARK-3885, some tasks' accumulator updates may be lost (was: Flaky AccumulatorSuite.add value to collection accumulators test)
[jira] [Updated] (SPARK-6075) Flaky AccumulatorSuite.add value to collection accumulators test
[ https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-6075: -- Priority: Critical (was: Major)
[jira] [Commented] (SPARK-3885) Provide mechanism to remove accumulators once they are no longer used
[ https://issues.apache.org/jira/browse/SPARK-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341753#comment-14341753 ] Josh Rosen commented on SPARK-3885: --- I found a correctness issue in this patch, which I'll fix shortly: see SPARK-6075 > Provide mechanism to remove accumulators once they are no longer used > - > > Key: SPARK-3885 > URL: https://issues.apache.org/jira/browse/SPARK-3885 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0, 1.2.0 >Reporter: Josh Rosen >Assignee: Ilya Ganelin > Fix For: 1.4.0 > > > Spark does not currently provide any mechanism to delete accumulators after > they are no longer used. This can lead to OOMs for long-lived SparkContexts > that create many large accumulators. > Part of the problem is that accumulators are registered in a global > {{Accumulators}} registry. Maybe the fix would be as simple as using weak > references in the Accumulators registry so that accumulators can be GC'd once > they can no longer be used. > In the meantime, here's a workaround that users can try: > Accumulators have a public setValue() method that can be called (only by the > driver) to change an accumulator’s value. You might be able to use this to > reset accumulators’ values to smaller objects (e.g. the “zero” object of > whatever your accumulator type is, or ‘null’ if you’re sure that the > accumulator will never be accessed again). > This issue was originally reported by [~nkronenfeld] on the dev mailing list: > http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Accumulator-question-td8709.html
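[Editor's note] The weak-reference idea floated in the SPARK-3885 description can be sketched in a few lines. This is a hedged illustration in Python, not Spark's Scala code: `WeakValueDictionary` stands in for the global {{Accumulators}} registry, and the `Accumulator` class here is a toy. Entries vanish from the registry once the driver drops its last reference, so a long-lived context stops leaking accumulators.

```python
# Registry that holds accumulators only weakly: dropping the driver's
# reference lets the entry be garbage-collected automatically.
from weakref import WeakValueDictionary

class Accumulator:
    next_id = 0
    def __init__(self, zero):
        self.value = zero
        self.id = Accumulator.next_id
        Accumulator.next_id += 1

registry = WeakValueDictionary()  # id -> accumulator, weakly held

acc = Accumulator(zero=0)
registry[acc.id] = acc
acc.value += 10
assert registry[acc.id].value == 10  # reachable while the driver uses it

del acc                    # driver no longer uses the accumulator...
assert len(registry) == 0  # ...and the registry entry is gone with it
```

SPARK-6075 (above) shows the cost of this design if applied carelessly on the executor side: any consumer that must observe the value before the owner drops it needs its own strong reference.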
[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map
[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341752#comment-14341752 ] Manoj Kumar commented on SPARK-5981: It seems for NaiveBayes it does work, see https://github.com/apache/spark/pull/4834 . I shall have a better look tomorrow. Sorry for the delay. > pyspark ML models should support predict/transform on vector within map > --- > > Key: SPARK-5981 > URL: https://issues.apache.org/jira/browse/SPARK-5981 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley > > Currently, most Python models only have limited support for single-vector > prediction. > E.g., one can call {code}model.predict(myFeatureVector){code} for a single > instance, but that fails within a map for Python ML models and transformers > which use JavaModelWrapper: > {code} > data.map(lambda features: model.predict(features)) > {code} > This fails because JavaModelWrapper.call uses the SparkContext (within the > transformation). (It works for linear models, which do prediction within > Python.) > Supporting prediction within a map would require storing the model and doing > prediction/transformation within Python.
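[Editor's note] The failure mode described in SPARK-5981 can be imitated without Spark at all. The following is a hypothetical, simplified stand-in — none of these class names are PySpark's real API: a model that predicts in pure Python carries everything it needs into a map, while one that routes predict() through a driver-side SparkContext handle fails once that handle is gone, which is effectively what a worker process sees.

```python
# Illustrative stand-ins (not PySpark classes) for the two model kinds
# described above: pure-Python prediction vs. JVM-delegated prediction.

class PurePythonModel:
    """Predicts in pure Python, like the linear models mentioned above."""
    def __init__(self, weights):
        self.weights = weights
    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

class JvmBackedModel:
    """Delegates predict() through a driver-side context handle; a worker
    effectively sees this handle as missing."""
    def __init__(self, sc):
        self._sc = sc
    def predict(self, features):
        if self._sc is None:
            raise RuntimeError("SparkContext is unavailable inside a transformation")
        return None  # real code would call through the JVM gateway here

data = [[1.0, 1.0], [2.0, 0.0]]

local = PurePythonModel([1.0, 2.0])
assert list(map(local.predict, data)) == [3.0, 2.0]  # works "inside a map"

remote = JvmBackedModel(sc=None)  # what a worker process effectively sees
try:
    list(map(remote.predict, data))
    raised = False
except RuntimeError:
    raised = True
assert raised
```

Supporting prediction within a map, as the issue says, means moving the model's state and logic into the Python side so it looks like `PurePythonModel` to the serializer.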
[jira] [Commented] (SPARK-6084) spark-shell broken on Windows
[ https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341746#comment-14341746 ] Nicholas Chammas commented on SPARK-6084: - cc [~pwendell], [~andrewor14] I haven't confirmed this issue myself. Just forwarding along the report I saw on Stack Overflow.
[jira] [Created] (SPARK-6084) spark-shell broken on Windows
Nicholas Chammas created SPARK-6084: --- Summary: spark-shell broken on Windows Key: SPARK-6084 URL: https://issues.apache.org/jira/browse/SPARK-6084 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.2.1, 1.2.0 Environment: Windows 7, Scala 2.11.4, Java 1.8 Reporter: Nicholas Chammas Original report here: http://stackoverflow.com/questions/28747795/spark-launch-find-version For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this: {code} bin\spark-shell.cmd {code} Yields the following error: {code} find: 'version': No such file or directory else was unexpected at this time. {code}
[jira] [Commented] (SPARK-6083) Make Python API example consistent in NaiveBayes
[ https://issues.apache.org/jira/browse/SPARK-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341735#comment-14341735 ] Apache Spark commented on SPARK-6083: - User 'MechCoder' has created a pull request for this issue: https://github.com/apache/spark/pull/4834 > Make Python API example consistent in NaiveBayes > > > Key: SPARK-6083 > URL: https://issues.apache.org/jira/browse/SPARK-6083 > Project: Spark > Issue Type: Documentation > Components: Documentation, MLlib >Reporter: Manoj Kumar >Priority: Minor
[jira] [Commented] (SPARK-3413) Spark Blocked due to Executor lost in FIFO MODE
[ https://issues.apache.org/jira/browse/SPARK-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341732#comment-14341732 ] Sean Owen commented on SPARK-3413: -- This looks like it might be stale. It's also not a great deal of info to go on. The driver should be rescheduling tasks that fail or whose executors fail, right? Is there more info? Can this still be reproduced? > Spark Blocked due to Executor lost in FIFO MODE > --- > > Key: SPARK-3413 > URL: https://issues.apache.org/jira/browse/SPARK-3413 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 0.9.2 >Reporter: Patrick Liu > > I run spark on yarn. > Spark scheduler is running in FIFO mode. > I have 80 worker instances setup. However, as time passes, some worker will > be lost. (Killed by JVM when OOM, etc). > But some tasks will still run in those executors. > Obviously the task will never finish. > Then the stage will not finish. So the later stages will be blocked.
[jira] [Created] (SPARK-6083) Make Python API example consistent in NaiveBayes
Manoj Kumar created SPARK-6083: -- Summary: Make Python API example consistent in NaiveBayes Key: SPARK-6083 URL: https://issues.apache.org/jira/browse/SPARK-6083 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Reporter: Manoj Kumar Priority: Minor
[jira] [Created] (SPARK-6082) SparkSQL should fail gracefully when input data format doesn't match expectations
Kay Ousterhout created SPARK-6082: - Summary: SparkSQL should fail gracefully when input data format doesn't match expectations Key: SPARK-6082 URL: https://issues.apache.org/jira/browse/SPARK-6082 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.1 Reporter: Kay Ousterhout I have a udf that creates a tab-delimited table. If any of the column values contain a tab, SQL fails with an ArrayIndexOutOfBounds exception (pasted below). It would be great if SQL failed gracefully here, with a helpful exception (something like "One row contained too many values"). It looks like this can be done quite easily, by checking here if i > columnBuilders.size and if so, throwing a nicer exception: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala#L124. One thing that makes this problem especially annoying to debug is because if you do "CREATE table foo as select transform(..." and then "CACHE table foo", it works fine. It only fails if you do "CACHE table foo as select transform(...". Because of this, it would be great if the problem were more transparent to users. 
Stack trace: java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.spark.sql.columnar.InMemoryRelation$anonfun$3$anon$1.next(InMemoryColumnarTableScan.scala:125) at org.apache.spark.sql.columnar.InMemoryRelation$anonfun$3$anon$1.next(InMemoryColumnarTableScan.scala:112) at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70) at org.apache.spark.rdd.RDD.iterator(RDD.scala:245) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:220) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
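The graceful failure Kay asks for — validate the column count up front and name the likely cause in the error — can be sketched outside Spark. This is an illustrative Python sketch, not Spark's actual code; the function name and error message are made up:

```python
def parse_row(line, expected_columns, delimiter="\t"):
    """Split a delimited line and fail with a descriptive error
    instead of letting a later index lookup blow up."""
    values = line.split(delimiter)
    if len(values) != expected_columns:
        raise ValueError(
            "Row contained %d values but the table has %d columns; "
            "a column value may contain the delimiter itself: %r"
            % (len(values), expected_columns, line))
    return values

# A clean row parses fine...
print(parse_row("a\tb\tc", 3))  # ['a', 'b', 'c']
# ...but a row whose value embeds a tab is rejected with a clear message.
try:
    parse_row("a\tb with\ttab\tc", 3)
except ValueError as e:
    print("rejected:", e)
```

The point is the error text: it reports how many values were found versus expected, which would make a failure like the one in the stack trace above self-explanatory.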
[jira] [Resolved] (SPARK-3402) Library for Natural Language Processing over Spark.
[ https://issues.apache.org/jira/browse/SPARK-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3402. -- Resolution: Won't Fix I think the consensus at this point would be that most third-party libraries built on Spark should by default be hosted outside the Spark project, and linked to at http://spark-packages.org/ That's the way to go if you've already got your own stand-alone project. Reopen if you mean you have an implementation to submit that fits, likely, the new ML Pipelines API. > Library for Natural Language Processing over Spark. > --- > > Key: SPARK-3402 > URL: https://issues.apache.org/jira/browse/SPARK-3402 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Nagamallikarjuna >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3387) Misleading stage description on the driver UI
[ https://issues.apache.org/jira/browse/SPARK-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3387. -- Resolution: Not a Problem Your code does not include a call to {{groupBy}}, right? It calls {{groupByKey}} and that's part of the list here. Yes, the actual execution plan does not map directly to what the user called. Some user-facing API methods are not distributed operations at all; some invoke several different distributed operations. I think this is as-intended. > Misleading stage description on the driver UI > - > > Key: SPARK-3387 > URL: https://issues.apache.org/jira/browse/SPARK-3387 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.0.2 > Environment: Java 1.6, OSX Mountain Lion >Reporter: Christian Chua > > Steps to reproduce : compile and run this modified version of the 1.0.2 > pagerank example : > public static void main(String[] args) throws Exception { > JavaSparkContext sc = new JavaSparkContext("local[8]", "Sample"); > JavaRDD < String > inputRDD = sc.textFile(INPUT_FILE,1); > JavaPairRDD < String , String > a = inputRDD.mapToPair(new > PairFunction < String , String , String >() { > @Override > public Tuple2 < String , String > call(String s) throws Exception > { > String[] parts = SPACES.split(s); > return new Tuple2 < String , String >(parts[0], parts[1]); > } > }); > JavaPairRDD < String , String > b = a.distinct(); > JavaPairRDD < String , Iterable < String >> c = b.groupByKey(11); > System.out.println(c.toDebugString()); > System.out.println(c.collect()); > JOptionPane.showMessageDialog(null, "Last Line"); > sc.stop(); > } > The debug string will appear as : > MappedValuesRDD[11] at groupByKey at Sample.java:45 (11 partitions) > MappedValuesRDD[10] at groupByKey at Sample.java:45 (11 partitions) > MapPartitionsRDD[9] at groupByKey at Sample.java:45 (11 partitions) > ShuffledRDD[8] at groupByKey at Sample.java:45 (11 partitions) > MappedRDD[7] at distinct at 
Sample.java:41 (1 partitions) > MapPartitionsRDD[6] at distinct at Sample.java:41 (1 partitions) > ShuffledRDD[5] at distinct at Sample.java:41 (1 partitions) > MapPartitionsRDD[4] at distinct at Sample.java:41 (1 partitions) > MappedRDD[3] at distinct at Sample.java:41 (1 partitions) > MappedRDD[2] at mapToPair at Sample.java:30 (1 partitions) > MappedRDD[1] at textFile at Sample.java:28 (1 partitions) > HadoopRDD[0] at textFile at Sample.java:28 (1 > partitions) > The problem is that the "list of stages" in the UI (localhost:4040) does not > mention anything about "groupBy" > In fact it mentions "distinct" twice: > stage 0 : collect > stage 1 : distinct > stage 2 : distinct > This piece of misleading information can confuse a learner significantly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas
[ https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341727#comment-14341727 ] Sean Owen commented on SPARK-3312: -- Interesting, is the reduce / max / min in question here by key? We have the {{stats()}} method for RDDs of {{Double}} already to take care of this for a whole RDD. Rather than add an API method for the by-key case, it's possible to use {{StatCounter}} to compute all of these at once over a bunch of values that have been collected by key. Does that do the trick or is this something more? > Add a groupByKey which returns a special GroupBy object like in pandas > -- > > Key: SPARK-3312 > URL: https://issues.apache.org/jira/browse/SPARK-3312 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: holdenk >Priority: Minor > > A common pattern which causes problems for new Spark users is using > groupByKey followed by a reduce. I'd like to make a special version of > groupByKey which returns a groupBy object (like the pandas groupby object). > The resulting class would have a number of functions (min, max, stats, reduce) > which could all be implemented efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
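Sean's suggestion — compute min, max, count, and mean in one pass per key, StatCounter-style, rather than grouping and then reducing — can be sketched in plain Python. This is illustrative only; Spark's {{StatCounter}} is a Scala class, and the helper name here is made up:

```python
from collections import defaultdict

def stats_by_key(pairs):
    """One pass over (key, value) pairs, accumulating count, sum,
    min and max per key -- no need to materialize groups first."""
    acc = defaultdict(lambda: {"count": 0, "sum": 0.0,
                               "min": float("inf"), "max": float("-inf")})
    for k, v in pairs:
        s = acc[k]
        s["count"] += 1
        s["sum"] += v
        s["min"] = min(s["min"], v)
        s["max"] = max(s["max"], v)
    # Derive the mean at the end; every statistic came from one pass.
    return {k: dict(s, mean=s["sum"] / s["count"]) for k, s in acc.items()}

print(stats_by_key([("a", 1.0), ("a", 3.0), ("b", 2.0)]))
```

This is the same idea as folding a StatCounter per key with aggregateByKey: the accumulator is small and mergeable, so none of the per-key value lists that make groupByKey-then-reduce problematic ever need to exist.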
[jira] [Updated] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes
[ https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-2930: - Component/s: YARN Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) Seems like a good easy improvement; I don't know webhdfs integration enough to write it, but does anyone who does have a moment to take this on? > clarify docs on using webhdfs with spark.yarn.access.namenodes > -- > > Key: SPARK-2930 > URL: https://issues.apache.org/jira/browse/SPARK-2930 > Project: Spark > Issue Type: Improvement > Components: Documentation, YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Minor > > The documentation of spark.yarn.access.namenodes talks about putting > namenodes in it and gives an example with hdfs://. > It can also be used with webhdfs, so we should clarify how to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs
[ https://issues.apache.org/jira/browse/SPARK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6081: - Component/s: Spark Submit Priority: Minor (was: Major) > DriverRunner doesn't support pulling HTTP/HTTPS URIs > > > Key: SPARK-6081 > URL: https://issues.apache.org/jira/browse/SPARK-6081 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Reporter: Timothy Chen >Priority: Minor > > According to the docs, standalone cluster mode supports specifying http|https > jar URLs, but when actually invoked the driver runner is > not able to pull HTTP URIs from the URLs passed to it, due to its use of Hadoop FS get. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5628) Add option to return spark-ec2 version
[ https://issues.apache.org/jira/browse/SPARK-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341724#comment-14341724 ] Apache Spark commented on SPARK-5628: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/4833 > Add option to return spark-ec2 version > -- > > Key: SPARK-5628 > URL: https://issues.apache.org/jira/browse/SPARK-5628 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: backport-needed > Fix For: 1.3.0, 1.4.0 > > > We need a {{--version}} option for {{spark-ec2}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6069: - Priority: Critical (was: Major) Summary: Deserialization Error ClassNotFoundException with Kryo, Guava 14 (was: Deserialization Error ClassNotFound ) To clarify the properties situation, in Spark 1.2.x we have {{spark.files.userClassPathFirst}} _and_ {{spark.yarn.user.classpath.first}}. {{spark.driver.userClassPathFirst}} and {{spark.executor.userClassPathFirst}} are the new more logical versions in 1.3+ only. So ignore those. {{spark.yarn.user.classpath.first}} is actually what I am setting: https://github.com/OryxProject/oryx/blob/master/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/BatchLayer.java#L153 But it sounds like you are not using YARN. Guava 14.0.1 is packaged with the app: https://github.com/OryxProject/oryx/blob/master/pom.xml#L233 I'm running this on 1.2.0 + YARN, and also local[*] + 1.3.0-SNAPSHOT. My question is whether this is perhaps not working for standalone in 1.2 but does work in 1.3, since there has been some overhaul to this mechanism since 1.2. > Deserialization Error ClassNotFoundException with Kryo, Guava 14 > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel >Priority: Critical > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. 
Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
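The classpath-precedence question being debated in this thread can be mimicked with Python's module search path. This is purely an analogy — Spark's real mechanism is JVM classloader delegation, not sys.path — and the module name below is invented:

```python
# Two directories each provide a module with the same name but a
# different "version", standing in for Spark's bundled Guava 11
# versus the app's bundled Guava 14.
import os
import sys
import tempfile

def write_module(body):
    d = tempfile.mkdtemp()
    with open(os.path.join(d, "guava_like.py"), "w") as f:
        f.write(body)
    return d

framework_dir = write_module("VERSION = 11\n")  # what the framework ships
user_dir = write_module("VERSION = 14\n")       # what the app ships

# Default resolution order: framework path ahead of the user path,
# so the app silently gets version 11 even though it bundled 14.
sys.path[:0] = [framework_dir, user_dir]
import guava_like
parent_first = guava_like.VERSION

# "User classpath first": put the user path ahead and re-resolve.
del sys.modules["guava_like"]
sys.path.remove(user_dir)
sys.path.insert(0, user_dir)
import guava_like
user_first = guava_like.VERSION

print(parent_first, user_first)  # 11 14
```

Flipping the search order like this is what {{spark.executor.userClassPathFirst}} (or {{spark.yarn.user.classpath.first}} on YARN) is intended to do for the executor's classloader.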
[jira] [Resolved] (SPARK-6078) create event log directory automatically if not exists
[ https://issues.apache.org/jira/browse/SPARK-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-6078. -- Resolution: Duplicate > create event log directory automatically if not exists > -- > > Key: SPARK-6078 > URL: https://issues.apache.org/jira/browse/SPARK-6078 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.3.0 >Reporter: Zhang, Liye > > when the event log directory does not exist, Spark just throws an > IllegalArgumentException and stops the job. Users need to manually create the > directory first. It would be better to create the directory automatically if the > directory does not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341705#comment-14341705 ] Pat Ferrel commented on SPARK-6069: --- I agree, that part makes me suspicious, which is why I’m not sure I trust my builds completely. No the ‘app' is one of the Spark-Mahout’s CLI drivers. The jar is a dependencies-reduced type thing that has only scopt and guava. In any case if I put -D:spark.executor.extraClassPath=/Users/pat/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-dependency-reduced.jar on the command line, which passes the key=value to the SparkConf then the Mahout CLI driver it works. The test setup is a standalone localhost only cluster (not local[n]). It is started with sbin/start-all.sh The same jar is used to create the context and I’ve checked that and the contents of the jar quite carefully. On Feb 28, 2015, at 10:09 AM, Sean Owen (JIRA) wrote: [ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341699#comment-14341699 ] Sean Owen commented on SPARK-6069: -- Hm, the thing is I have been successfully running an app, without spark-submit, with kryo, with Guava 14 just like you and have never had a problem. I can't figure out what the difference is here. The kryo not-found exception is stranger still. You aren't packaging spark classes with your app right? -- This message was sent by Atlassian JIRA (v6.3.4#6332) > Deserialization Error ClassNotFound > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. 
The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341699#comment-14341699 ] Sean Owen commented on SPARK-6069: -- Hm, the thing is I have been successfully running an app, without spark-submit, with kryo, with Guava 14 just like you and have never had a problem. I can't figure out what the difference is here. The kryo not-found exception is stranger still. You aren't packaging spark classes with your app right? > Deserialization Error ClassNotFound > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341689#comment-14341689 ] ASF GitHub Bot commented on SPARK-6069: --- Github user pferrel commented on the pull request: https://github.com/apache/mahout/pull/74#issuecomment-76536731 This seems to be a bug in Spark 1.2.1: SPARK-6069. The workaround is to add the following either to your SparkConf in your app or as -D:spark.executor.extraClassPath=/Users/pat/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-dependency-reduced.jar to the mahout spark-xyz driver, where the jar contains any class that needs to be deserialized and the path exists on all workers. Therefore it currently looks like Spark 1.2.1 is not worth supporting. > Deserialization Error ClassNotFound > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs
Timothy Chen created SPARK-6081: --- Summary: DriverRunner doesn't support pulling HTTP/HTTPS URIs Key: SPARK-6081 URL: https://issues.apache.org/jira/browse/SPARK-6081 Project: Spark Issue Type: Improvement Reporter: Timothy Chen According to the docs, standalone cluster mode supports specifying http|https jar URLs, but when actually invoked the driver runner is not able to pull HTTP URIs from the URLs passed to it, due to its use of Hadoop FS get. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs
[ https://issues.apache.org/jira/browse/SPARK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341687#comment-14341687 ] Apache Spark commented on SPARK-6081: - User 'tnachen' has created a pull request for this issue: https://github.com/apache/spark/pull/4832 > DriverRunner doesn't support pulling HTTP/HTTPS URIs > > > Key: SPARK-6081 > URL: https://issues.apache.org/jira/browse/SPARK-6081 > Project: Spark > Issue Type: Improvement >Reporter: Timothy Chen > > According to the docs, standalone cluster mode supports specifying http|https > jar URLs, but when actually invoked the driver runner is > not able to pull HTTP URIs from the URLs passed to it, due to its use of Hadoop FS get. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341679#comment-14341679 ] Pat Ferrel commented on SPARK-6069: --- No goodness from spark.executor.userClassPathFirst either--same error as above. I'll try again Monday when I'm back to my regular cluster. > Deserialization Error ClassNotFound > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341672#comment-14341672 ] Pat Ferrel edited comment on SPARK-6069 at 2/28/15 5:31 PM: Not sure I completely trust this result--I'm away from my HDFS cluster right now and so the standalone Spark is not quite that same as before... Also didn't see you spark.executor.userClassPathFirst comment--will try next. I tried: sparkConf.set("spark.files.userClassPathFirst", "true") But got the following error: 15/02/28 09:23:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.0.7): java.lang.NoClassDefFoundError: org/apache/spark/serializer/KryoRegistrator at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42) at org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:50) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:274) at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103) at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103) at scala.Option.map(Option.scala:145) at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:103) 
at org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:159) at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:121) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:214) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.spark.serializer.KryoRegistrator at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 
36 more was (Author: pferrel): Not sure I completely trust this result--I'm away from my HDFS cluster right now and so the standalone Spark is not quite that same as before... I tried: sparkConf.set("spark.files.userClassPathFirst", "true") But got the following error: 15/02/28 09:23:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.0.7): java.lang.NoClassDefFoundError: org/apache/spark/serializer/KryoRegistrator at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defi
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341672#comment-14341672 ] Pat Ferrel commented on SPARK-6069: --- Not sure I completely trust this result--I'm away from my HDFS cluster right now and so the standalone Spark is not quite that same as before... I tried: sparkConf.set("spark.files.userClassPathFirst", "true") But got the following error: 15/02/28 09:23:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.0.7): java.lang.NoClassDefFoundError: org/apache/spark/serializer/KryoRegistrator at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42) at org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:50) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:274) at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103) at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103) at scala.Option.map(Option.scala:145) at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:103) at org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:159) at 
org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:121) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:214) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.spark.serializer.KryoRegistrator at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 
36 more > Deserialization Error ClassNotFound > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard
[jira] [Resolved] (SPARK-5993) Published Kafka-assembly JAR was empty in 1.3.0-RC1
[ https://issues.apache.org/jira/browse/SPARK-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5993. -- Resolution: Fixed Looks like this was resolved in https://github.com/apache/spark/pull/4753 > Published Kafka-assembly JAR was empty in 1.3.0-RC1 > --- > > Key: SPARK-5993 > URL: https://issues.apache.org/jira/browse/SPARK-5993 > Project: Spark > Issue Type: Bug > Components: Build, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Blocker > Fix For: 1.3.0 > > > This is because the maven build generated two JARs: > 1. an empty JAR file (since kafka-assembly has no code of its own) > 2. an assembly JAR file containing everything, in a different location than 1 > The maven publishing plugin uploaded 1 and not 2. > Instead, if 2 is not configured to be generated in a different location, there is > only 1 jar containing everything, which gets published. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341655#comment-14341655 ] Sean Owen commented on SPARK-6069: -- This is an app-level setting as it's specific to the app. I would make the change in your app rather than globally. Although the new prop is spark.executor.userClassPathFirst I haven't double-checked whether that's 1.3+ only. Heh, set them all. > Deserialization Error ClassNotFound > > > Key: SPARK-6069 > URL: https://issues.apache.org/jira/browse/SPARK-6069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 > Environment: Standalone one worker cluster on localhost, or any > cluster >Reporter: Pat Ferrel > > A class is contained in the jars passed in when creating a context. It is > registered with kryo. The class (Guava HashBiMap) is created correctly from > an RDD and broadcast but the deserialization fails with ClassNotFound. > The work around is to hard code the path to the jar and make it available on > all workers. Hard code because we are creating a library so there is no easy > way to pass in to the app something like: > spark.executor.extraClassPath /path/to/some.jar
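For concreteness, the two places such a setting can go (using the property names discussed above; whether spark.executor.userClassPathFirst exists before 1.3 is not verified here):

```
# conf/spark-defaults.conf (applies cluster-wide)
spark.files.userClassPathFirst     true
spark.executor.userClassPathFirst  true
```

Or per application, before the context is created, e.g. in Scala: new SparkConf().set("spark.executor.userClassPathFirst", "true").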
[jira] [Comment Edited] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341640#comment-14341640 ] Pat Ferrel edited comment on SPARK-6069 at 2/28/15 4:48 PM: I can try it. Are you suggesting an app change or a master conf change? I need to add to conf/spark-defaults.conf? spark.files.userClassPathFirst true Or should I add that to the context via SparkConf? We have a standalone app that is not launched via spark-submit. But I guess your comment suggests an app change via SparkConf so I'll try that. was (Author: pferrel): I can try it. Are you suggesting an app change or a master conf change? I need to add to conf/spark-defaults.conf? spark.files.userClassPathFirst true Or should I add that to the context via SparkConf? We have a standalone app that is not launched via spark-submit.
[jira] [Comment Edited] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341640#comment-14341640 ] Pat Ferrel edited comment on SPARK-6069 at 2/28/15 4:47 PM: I can try it. Are you suggesting an app change or a master conf change? I need to add to conf/spark-defaults.conf? spark.files.userClassPathFirst true Or should I add that to the context via SparkConf? We have a standalone app that is not launched via spark-submit. was (Author: pferrel): I can try it. Are you suggesting an app change or a master conf change? I need to add to conf/spark-defaults.conf? spark.files.userClassPathFirst true Or should I add that to the context via SparkConf?
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341640#comment-14341640 ] Pat Ferrel commented on SPARK-6069: --- I can try it. Are you suggesting an app change or a master conf change? I need to add to conf/spark-defaults.conf? spark.files.userClassPathFirst true Or should I add that to the context via SparkConf?
[jira] [Commented] (SPARK-6068) KMeans Parallel test may fail
[ https://issues.apache.org/jira/browse/SPARK-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341634#comment-14341634 ] Sean Owen commented on SPARK-6068: -- Yes, it seems like too much change to the existing version. From https://github.com/apache/spark/pull/2634 it seems like there are just some differences of opinion about what's worth doing and how. I think the only way forward would be to propose integrating what you've done as a new version in the {{.ml}} package, since it's not clear the existing PR is going to proceed. I'm hoping to just drive a resolution to what is almost one big issue rather than leave it hanging. I'm looking at the ~8 JIRAs for k-means you created: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20reporter%20%3D%20%22Derrick%20Burns%22%20AND%20resolution%20%3D%20Unresolved I assume a couple (like this one) are 'back-portable' from your work to the existing impl. Can we zap those and close them with a PR? This would be great and I'd like to help get those quick wins in. The rest sound like interdependent aspects of one proposal: create a new k-means implementation with different design and properties X / Y / Z, and use it in the new pipelines API. (I can't say whether this would be accepted or not but that's what's on the table.) I'd rather collect that coherently than have it live in pieces in JIRA, esp. since I'm getting the sense these remaining pieces won't otherwise move forward. > KMeans Parallel test may fail > - > > Key: SPARK-6068 > URL: https://issues.apache.org/jira/browse/SPARK-6068 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.1 >Reporter: Derrick Burns > Labels: clustering > Original Estimate: 24h > Remaining Estimate: 24h > > The test "k-means|| initialization" in KMeansSuite can fail when the random > number generator is truly random.
> The test is predicated on the assumption that each round of K-Means || will > add at least one new cluster center. The current implementation of K-Means > || adds 2*k cluster centers with high probability. However, there is no > deterministic lower bound on the number of cluster centers added. > Choices are: > 1) change the KMeans || implementation to iterate on selecting points until > it has satisfied a lower bound on the number of points chosen. > 2) eliminate the test > 3) ignore the problem and depend on the random number generator to sample the > space in a lucky manner. > Option (1) is most in keeping with the contract that KMeans || should provide > a precise number of cluster centers when possible.
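The probabilistic behaviour described above is easy to see in a toy simulation (illustrative Java, not the MLlib implementation; uniform per-point costs are assumed so each point survives a round with probability l/n):

```java
import java.util.Random;

// Toy model of k-means|| seeding: each round keeps every candidate point
// independently with probability ~ l * cost(x) / totalCost. With uniform
// costs that is l/n, so the number added per round is Binomial(n, l/n):
// about l = 2k in expectation, but with no deterministic lower bound.
public class KMeansParallelSketch {
    static int[] addedPerRound(long seed, int n, int k, int rounds) {
        double l = 2.0 * k; // oversampling factor
        Random rng = new Random(seed);
        int[] added = new int[rounds];
        for (int r = 0; r < rounds; r++) {
            int count = 0;
            for (int i = 0; i < n; i++) {
                if (rng.nextDouble() < l / n) count++;
            }
            added[r] = count;
        }
        return added;
    }

    public static void main(String[] args) {
        int[] a = addedPerRound(42L, 1000, 5, 5);
        for (int r = 0; r < a.length; r++) {
            System.out.println("round " + r + ": added " + a[r] + " centers");
        }
    }
}
```

A test asserting that every round adds at least one center is asserting something that holds only with high probability, which is why option (1), iterating until enough centers are chosen, best matches the intended contract.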
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341625#comment-14341625 ] Sean Owen commented on SPARK-6069: -- No, I am not suggesting that {{--conf spark.executor.extraClassPath}} is a right way to do this, but {{userClassPathFirst}} may be. There is no class conflict problem, but there is definitely a classloader visibility and thus ordering problem. It's worth a try if you have a second, since I would think this is the right way to address this and really any of this type of issue. It remains to be seen though.
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341608#comment-14341608 ] Pat Ferrel commented on SPARK-6069: --- It may be a dup, [~vanzin] said as much but I couldn't find the obvious Jira. Any time the work around is to use "spark-submit --conf spark.executor.extraClassPath=/guava.jar blah", that means standalone apps must have hard coded paths that are honored on every worker. And as you know a lib is pretty much blocked from use of this version of Spark, hence the blocker severity. We'll have to warn people to not use this version of Spark. I could easily be wrong but userClassPathFirst doesn't seem to be the issue. There is no class conflict.
[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-5427: -- Labels: math (was: ) > Add support for floor function in Spark SQL > --- > > Key: SPARK-5427 > URL: https://issues.apache.org/jira/browse/SPARK-5427 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu > Labels: math > > floor() function is supported in Hive SQL. > This issue is to add floor() function to Spark SQL. > Related thread: http://search-hadoop.com/m/JW1q563fc22
[jira] [Resolved] (SPARK-1965) Spark UI throws NPE on trying to load the app page for non-existent app
[ https://issues.apache.org/jira/browse/SPARK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1965. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 4777 [https://github.com/apache/spark/pull/4777] > Spark UI throws NPE on trying to load the app page for non-existent app > --- > > Key: SPARK-1965 > URL: https://issues.apache.org/jira/browse/SPARK-1965 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Kay Ousterhout >Priority: Minor > Fix For: 1.4.0 > > > If you try to load the Spark UI for an application that doesn't exist: > sparkHost:8080/app/?appId=foobar > The UI throws an NPE. The problem is in ApplicationPage.scala -- Spark > proceeds even if the "app" variable is null. We should handle this more > gracefully.
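The graceful-handling idea, sketched in Java with Optional rather than Spark's actual Scala code (render and the message strings are hypothetical stand-ins for ApplicationPage's logic):

```java
import java.util.Optional;

public class AppPageSketch {
    // Stand-in for the page renderer: instead of dereferencing a possibly
    // null app and hitting an NPE, treat "not found" as a first-class
    // outcome with its own page content.
    static String render(Optional<String> app) {
        return app.map(name -> "Application: " + name)
                  .orElse("Not Found: no application with that id");
    }

    public static void main(String[] args) {
        System.out.println(render(Optional.of("my-app")));
        System.out.println(render(Optional.empty()));
    }
}
```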
[jira] [Assigned] (SPARK-1965) Spark UI throws NPE on trying to load the app page for non-existent app
[ https://issues.apache.org/jira/browse/SPARK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-1965: Assignee: Sean Owen
[jira] [Updated] (SPARK-5983) Don't respond to HTTP TRACE in HTTP-based UIs
[ https://issues.apache.org/jira/browse/SPARK-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5983: - Component/s: (was: Spark Core) Web UI Labels: security (was: ) Resolved by https://github.com/apache/spark/pull/4765 > Don't respond to HTTP TRACE in HTTP-based UIs > - > > Key: SPARK-5983 > URL: https://issues.apache.org/jira/browse/SPARK-5983 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Labels: security > Fix For: 1.4.0 > > > This was flagged a while ago during a routine security scan: the HTTP-based > Spark services respond to an HTTP TRACE command. This is basically an HTTP > verb that has no practical use, and has a pretty theoretical chance of being > an exploit vector. It is flagged as a security issue by one common tool, > however. > Spark's HTTP services are based on Jetty, which by default does not enable > TRACE (like Tomcat). However, the services do reply to TRACE requests. I > think it is because the use of Jetty is pretty 'raw' and does not enable much > of the default additional configuration you might get by using Jetty as a > standalone server. > I know that it is at least possible to stop the reply to TRACE with a few > extra lines of code, so I think it is worth shutting off TRACE requests. > Although the security risk is quite theoretical, it should be easy to fix and > bring the Spark services into line with the common default of HTTP servers > today.
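The shape of the fix, reduced to its essence in a hedged Java sketch (Spark's actual change in PR 4765 is done at the Jetty configuration level; TraceGuard and statusFor are hypothetical names used only to illustrate the rule):

```java
public class TraceGuard {
    // Map an HTTP method to the status a hardened server should return:
    // 405 Method Not Allowed for TRACE, normal handling otherwise.
    static int statusFor(String method) {
        if ("TRACE".equalsIgnoreCase(method)) {
            return 405; // refuse TRACE outright
        }
        return 200; // in a real server, delegate to the actual handler
    }

    public static void main(String[] args) {
        System.out.println("TRACE -> " + statusFor("TRACE")); // 405
        System.out.println("GET   -> " + statusFor("GET"));   // 200
    }
}
```

In a Jetty deployment the same effect is usually achieved with server configuration (e.g. a security constraint on the TRACE method) rather than hand-written method checks.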