[jira] [Commented] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK

2015-02-28 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341997#comment-14341997
 ] 

Mridul Muralidharan commented on SPARK-3889:


[~adav] I have seen this a lot in my recent tests with the latest 1.3 and with 
the stable 1.2.1 version.
Usually the reasons turn out to be legitimate upon investigation - for example, the 
remote side died via a SIGTERM from YARN.

So it would be good to know whether this happened legitimately or due to some other 
bug/issue.

> JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
> ---
>
> Key: SPARK-3889
> URL: https://issues.apache.org/jira/browse/SPARK-3889
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Aaron Davidson
>Assignee: Aaron Davidson
>Priority: Critical
> Fix For: 1.2.0
>
>
> Here's the first part of the core dump, possibly caused by a job which 
> shuffles a lot of very small partitions.
> {code}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704
> #
> # JRE version: 7.0_25-b30
> # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # v  ~StubRoutines::jbyte_disjoint_arraycopy
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please include
> # instructions on how to reproduce the bug and visit:
> #   https://bugs.launchpad.net/ubuntu/+source/openjdk-7/
> #
> ---  T H R E A D  ---
> Current thread (0x7fa4b0631000):  JavaThread "Executor task launch 
> worker-170" daemon [_thread_in_Java, id=6783, 
> stack(0x7fa4448ef000,0x7fa4449f)]
> siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), 
> si_addr=0x7fa428f79000
> {code}
> Here is the only useful content I can find related to JVM and SIGBUS from 
> Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664
> It appears it may be related to disposing byte buffers, which we do in the 
> ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of 
> them in BufferMessage.
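
As background on what "disposing" means here: it is the explicit unmapping of a MappedByteBuffer rather than waiting for GC, and touching the buffer after such an unmap is exactly the kind of access that surfaces as SIGBUS/BUS_ADRERR. A minimal sketch of the usual reflection-based idiom on Java 7-era JVMs (a sketch only, not Spark's exact code):

{code}
import java.nio.MappedByteBuffer

// Sketch only: force-unmap a MappedByteBuffer via its internal cleaner.
// Any subsequent read of `buf` may fault with SIGBUS rather than throw an exception.
def dispose(buf: MappedByteBuffer): Unit = {
  val cleanerMethod = buf.getClass.getMethod("cleaner")
  cleanerMethod.setAccessible(true)   // the concrete buffer class is package-private
  val cleaner = cleanerMethod.invoke(buf)
  cleaner.getClass.getMethod("clean").invoke(cleaner)
}
{code}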



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-6075.
---
   Resolution: Fixed
Fix Version/s: 1.4.0

Issue resolved by pull request 4835
[https://github.com/apache/spark/pull/4835]

> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
> Fix For: 1.4.0
>
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
> I think I've figured it out: consider the lifecycle of an accumulator in a 
> task, say ShuffleMapTask: on the executor, each task deserializes its own 
> copy of the RDD inside of its runTask method, so the strong reference to the 
> RDD disappears at the end of runTask. In Executor.run(), we call 
> Accumulators.values after runTask has exited, so there's a small window in 
> which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
> because there are no longer any strong references to them.
> The fix is to keep strong references in localAccums, since we clear this at 
> the end of each task anyway. I'm glad that I was able to figure out 
> precisely why this was necessary and sorry that I missed this during review; 
> I'll submit a fix shortly. In terms of preventative measures, it might be a 
> good idea to write up the lifetime / lifecycle of objects' strong references 
> whenever we're using WeakReferences, since the process of explicitly writing 
> that out would prevent these sorts of mistakes in the future.
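
To make the race concrete, here is a minimal, self-contained sketch of the failure mode (this is not Spark's Accumulators code; the registry below is a hypothetical stand-in that, like the pre-fix path, holds only weak references):

{code}
import java.lang.ref.WeakReference

// Hypothetical registry holding only weak references to per-task updates.
object WeakRegistry {
  private var refs = List.empty[WeakReference[Array[Long]]]
  def register(v: Array[Long]): Unit = refs ::= new WeakReference(v)
  def liveValues: List[Array[Long]] = refs.flatMap(r => Option(r.get()))
}

def runTask(): Unit = {
  val update = Array.fill(1000)(1L)  // the only strong reference lives inside runTask
  WeakRegistry.register(update)
}                                    // strong reference dropped when runTask returns

runTask()
System.gc()                           // a GC in the window before the values are read...
println(WeakRegistry.liveValues.size) // ...may already report 0: the "lost" update
{code}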



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3785) Support off-loading computations to a GPU

2015-02-28 Thread Tycho Grouwstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341974#comment-14341974
 ] 

Tycho Grouwstra edited comment on SPARK-3785 at 3/1/15 6:32 AM:


I was wondering: it seems 
[ArrayFire|http://www.arrayfire.com/docs/group__arrayfire__func.htm] has already 
parallelized a number of mathematical/reduction functions for C(++) arrays. If 
Spark RDDs/DataFrames exposed some array interface for columns, might it be 
possible to use those through JNI? Not sure there'd be tangible performance 
gains without using APUs, but it seemed interesting to me.


was (Author: tycho01):
Hm, tried commenting a bit earlier but seems it failed.

I was wondering, it seems 
[ArrayFire](http://www.arrayfire.com/docs/group__arrayfire__func.htm) already 
parallelized a number of mathematical/reductor functions for C(++) arrays. If 
Spark RDDS/DataFrames expose some array interface for columns, might it be 
possible to use those through JNI? Not sure there'd be tangible performance 
gains without using APUs, but seemed interesting to me.


> Support off-loading computations to a GPU
> -
>
> Key: SPARK-3785
> URL: https://issues.apache.org/jira/browse/SPARK-3785
> Project: Spark
>  Issue Type: Brainstorming
>  Components: MLlib
>Reporter: Thomas Darimont
>Priority: Minor
>
> Are there any plans to add support for off-loading computations to the 
> GPU, e.g. via an OpenCL binding? 
> http://www.jocl.org/
> https://code.google.com/p/javacl/
> http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3785) Support off-loading computations to a GPU

2015-02-28 Thread Tycho Grouwstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341974#comment-14341974
 ] 

Tycho Grouwstra commented on SPARK-3785:


Hm, I tried commenting a bit earlier but it seems it failed.

I was wondering: it seems 
[ArrayFire](http://www.arrayfire.com/docs/group__arrayfire__func.htm) has already 
parallelized a number of mathematical/reduction functions for C(++) arrays. If 
Spark RDDs/DataFrames exposed some array interface for columns, might it be 
possible to use those through JNI? Not sure there'd be tangible performance 
gains without using APUs, but it seemed interesting to me.
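
Purely to illustrate the kind of JNI hook being suggested (every name below is hypothetical; no such binding exists in Spark or ArrayFire today):

{code}
// Hypothetical native wrapper; sumDouble would be implemented in C/C++ (for example
// on top of ArrayFire) and shipped as libnativereduce.so. Illustration only.
object NativeReduce {
  System.loadLibrary("nativereduce")
  @native def sumDouble(values: Array[Double]): Double
}

// Per-partition use: materialize each column partition as a primitive array and hand it to JNI.
// val partialSums = doubleRdd.mapPartitions(it => Iterator(NativeReduce.sumDouble(it.toArray)))
{code}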


> Support off-loading computations to a GPU
> -
>
> Key: SPARK-3785
> URL: https://issues.apache.org/jira/browse/SPARK-3785
> Project: Spark
>  Issue Type: Brainstorming
>  Components: MLlib
>Reporter: Thomas Darimont
>Priority: Minor
>
> Are there any plans to add support for off-loading computations to the 
> GPU, e.g. via an OpenCL binding? 
> http://www.jocl.org/
> https://code.google.com/p/javacl/
> http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341970#comment-14341970
 ] 

Aaron Davidson commented on SPARK-6056:
---

It's possible that it's actually the shuffle read that's doing the 
memory-mapping -- please try setting spark.storage.memoryMapThreshold to around 
1073741824 (1 GB) to disable this form of memory mapping for the test.
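
For anyone trying this, the setting can be passed either on the command line or through SparkConf (the 1 GB figure is just the value suggested above, in bytes):

{code}
# when launching the application or shell
bin/spark-shell --conf spark.storage.memoryMapThreshold=1073741824

// or programmatically, before the SparkContext is created
val conf = new SparkConf().set("spark.storage.memoryMapThreshold", "1073741824")
{code}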

> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> {code:title=test.scala|borderStyle=solid}
> import org.apache.spark.storage._
> import org.apache.spark._
> val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
> Array[Byte](10*1024*1024)).persist
> bufferRdd.count
> val part =  bufferRdd.partitions(0)
> val sparkEnv = SparkEnv.get
> val blockMgr = sparkEnv.blockManager
> def test = {
> val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
> val resultIt = 
> blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
> val len = resultIt.map(_.length).sum
> println(s"[${Thread.currentThread.getId}] get block length = $len")
> }
> def test_driver(count:Int, parallel:Int)(f: => Unit) = {
> val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
> val taskSupport  = new 
> scala.collection.parallel.ForkJoinTaskSupport(tpool)
> val parseq = (1 to count).par
> parseq.tasksupport = taskSupport
> parseq.foreach(x=>f)
> tpool.shutdown
> tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
> }
> {code}
> Steps to reproduce:
> 1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
> 2. :load test.scala in spark-shell
> 3. use the following command to watch the executor on the slave node
> {code}
> pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid
> {code}
> 4. run test_driver(20,100)(test) in spark-shell
> 5. watch the output of the command on the slave node
> If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK

2015-02-28 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341962#comment-14341962
 ] 

Aaron Davidson commented on SPARK-3889:
---

This may be a new issue. I would open a new ticket, especially because the 
"ConnectionManager failed ACK" thing shouldn't be happening in 1.2.1; there 
should be different symptoms and perhaps a different cause as well.

A last-ditch thing to try, by the way, is to raise 
spark.storage.memoryMapThreshold to a very large number (e.g., 1 GB in bytes) 
and see if the crash still occurs -- if so, please report more details about 
your workload and any other symptoms you see.

> JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
> ---
>
> Key: SPARK-3889
> URL: https://issues.apache.org/jira/browse/SPARK-3889
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Aaron Davidson
>Assignee: Aaron Davidson
>Priority: Critical
> Fix For: 1.2.0
>
>
> Here's the first part of the core dump, possibly caused by a job which 
> shuffles a lot of very small partitions.
> {code}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704
> #
> # JRE version: 7.0_25-b30
> # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # v  ~StubRoutines::jbyte_disjoint_arraycopy
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please include
> # instructions on how to reproduce the bug and visit:
> #   https://bugs.launchpad.net/ubuntu/+source/openjdk-7/
> #
> ---  T H R E A D  ---
> Current thread (0x7fa4b0631000):  JavaThread "Executor task launch 
> worker-170" daemon [_thread_in_Java, id=6783, 
> stack(0x7fa4448ef000,0x7fa4449f)]
> siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), 
> si_addr=0x7fa428f79000
> {code}
> Here is the only useful content I can find related to JVM and SIGBUS from 
> Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664
> It appears it may be related to disposing byte buffers, which we do in the 
> ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of 
> them in BufferMessage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK

2015-02-28 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson updated SPARK-3889:
--
Comment: was deleted

(was: The only place we memory map in 1.1 is this method: 
https://github.com/apache/spark/blob/branch-1.1/core/src/main/scala/org/apache/spark/storage/DiskStore.scala#L106

This threshold is configurable with "spark.storage.memoryMapThreshold" -- we 
upped the default from 2 KB to 2 MB in 1.2, which you could try here as well.)

> JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
> ---
>
> Key: SPARK-3889
> URL: https://issues.apache.org/jira/browse/SPARK-3889
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Aaron Davidson
>Assignee: Aaron Davidson
>Priority: Critical
> Fix For: 1.2.0
>
>
> Here's the first part of the core dump, possibly caused by a job which 
> shuffles a lot of very small partitions.
> {code}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704
> #
> # JRE version: 7.0_25-b30
> # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # v  ~StubRoutines::jbyte_disjoint_arraycopy
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please include
> # instructions on how to reproduce the bug and visit:
> #   https://bugs.launchpad.net/ubuntu/+source/openjdk-7/
> #
> ---  T H R E A D  ---
> Current thread (0x7fa4b0631000):  JavaThread "Executor task launch 
> worker-170" daemon [_thread_in_Java, id=6783, 
> stack(0x7fa4448ef000,0x7fa4449f)]
> siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), 
> si_addr=0x7fa428f79000
> {code}
> Here is the only useful content I can find related to JVM and SIGBUS from 
> Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664
> It appears it may be related to disposing byte buffers, which we do in the 
> ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of 
> them in BufferMessage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957
 ] 

SaintBacchus edited comment on SPARK-6056 at 3/1/15 6:04 AM:
-

[~adav] Thanks for the comment. I don't quite understand what you are saying. Do you mean 
that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only 
allocate heap memory? Is that right?
I tested it again, and I have updated the test program in the Description.
I set _spark.shuffle.io.preferDirectBufs_ and when I run 
*test_driver(20,100)(test)*, the result is this:
_ 76602 root  20   0 1597m 339m  25m S0  0.1   0:04.89 java   _
 _76602 root  20   0 {color:red} 1777m {color} 1.0g  26m S   99  0.3   
0:07.88 java   _
 
_ 76602 root  20   0 1597m 880m  26m S4  0.3   0:07.99 java_

The red number is the virtual memory; it rose by about 180 MB at that moment, while 
the test case only transfers 200 MB of data (20 * 10 MB) in total from executor to driver.
I think this is a problem. If I use 40 threads to get the result, it will need 
nearly 400 MB of memory and soon exceed the YARN limit, so the executor is finally killed by YARN.
If there were a way to limit the peak use of this memory, that would be fine. In addition, 
since the number of remote fetch block threads on the user side is 
uncontrollable, it would be better for Spark to control it.
[~lianhuiwang] I use the recent Spark from GitHub and I also tested the 1.2.0 
release version.
In my test case, I use the default memory: 1 GB executor and 384 MB overhead. But in 
the real case, the memory is much larger.


was (Author: carlmartin):
[~adav] Thx for comment. I can't understand what you say clearly. Do you mean 
if 'spark.shuffle.io.preferDirectBufs ' was set to be false, would netty only 
allocat the heap memory? Is it right?
I test it again.And I had update the test program in the Description.
I had set the _spark.shuffle.io.preferDirectBufs_  and when I type 
*test_driver(20,100)(test)* , the result is this:
_ 76602 root  20   0 1597m 339m  25m S0  0.1   0:04.89 java   _
 _76602 root  20   0 {color:red} 1777m {color} 1.0g  26m S   99  0.3   
0:07.88 java   _
 
_ 76602 root  20   0 1597m 880m  26m S4  0.3   0:07.99 java_

The red num is visual memory and it had raised about 180mb in the moment and 
total transfor 200mb data (20 * 10MB) from executor to driver.
I think it's a problem. If I use 40 threads to get the result, it will need 
near 400mb momery and soon exceed the limit of yarn, fanally killed by yarn.
If there is a way to limit the peek use of memory, it will be fine. In addtion, 
I though the  number of remote fetch block threads in user side is 
uncontrollable, it's better to be controlled in spark.
[~lianhuiwang] I use the recent spark in github and I also tested the 1.2.0 
release version.
In my test case, I use the default memory: 1G executor and 384 overhead. But in 
the real case, momery is much more.

> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> {code:title=test.scala|borderStyle=solid}
> import org.apache.spark.storage._
> import org.apache.spark._
> val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
> Array[Byte](10*1024*1024)).persist
> bufferRdd.count
> val part =  bufferRdd.partitions(0)
> val sparkEnv = SparkEnv.get
> val blockMgr = sparkEnv.blockManager
> def test = {
> val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
> val resultIt = 
> blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
> val len = resultIt.map(_.length).sum
> println(s"[${Thread.currentThread.getId}] get block length = $len")
> }
> def test_driver(count:Int, parallel:Int)(f: => Unit) = {
> val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
> val taskSupport  = ne

[jira] [Comment Edited] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957
 ] 

SaintBacchus edited comment on SPARK-6056 at 3/1/15 6:03 AM:
-

[~adav] Thanks for the comment. I don't quite understand what you are saying. Do you mean 
that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only 
allocate heap memory? Is that right?
I tested it again, and I have updated the test program in the Description.
I set _spark.shuffle.io.preferDirectBufs_ and when I run 
*test_driver(20,100)(test)*, the result is this:
_ 76602 root  20   0 1597m 339m  25m S0  0.1   0:04.89 java   _
 _76602 root  20   0 {color:red} 1777m {color} 1.0g  26m S   99  0.3   
0:07.88 java   _
 
_ 76602 root  20   0 1597m 880m  26m S4  0.3   0:07.99 java_

The red number is the virtual memory; it rose by about 180 MB at that moment, while 
the test case only transfers 200 MB of data (20 * 10 MB) in total from executor to driver.
I think this is a problem. If I use 40 threads to get the result, it will need 
nearly 400 MB of memory and soon exceed the YARN limit, so the executor is finally killed by YARN.
If there were a way to limit the peak use of this memory, that would be fine. In addition, 
since the number of remote fetch block threads on the user side is 
uncontrollable, it would be better for Spark to control it.
[~lianhuiwang] I use the recent Spark from GitHub and I also tested the 1.2.0 
release version.
In my test case, I use the default memory: 1 GB executor and 384 MB overhead. But in 
the real case, the memory is much larger.


was (Author: carlmartin):
[~adav] Thx for comment. I can't understand what you say clearly. Do you mean 
if 'spark.shuffle.io.preferDirectBufs ' was set to be false, would netty only 
allocat the heap memory? Is it right?
I test it again.And I had update the test program in the Description.
I had set the _spark.shuffle.io.preferDirectBufs_  and when I type 
*test_driver(20,100)(test)* , the result is this:
_ 76602 root  20   0 1597m 339m  25m S0  0.1   0:04.89 java   _
 _76602 root  20   0 {color:red} 1777m {color} 1.0g  26m S   99  0.3   
0:07.88 java   _
 
_ 76602 root  20   0 1597m 880m  26m S4  0.3   0:07.99 java_

The red num is visual memory and it had raised about 180mb in the moment and 
total transfor 200mb data (20 * 10MB) from executor to driver.
I think it's a big problem. If I use 40 threads to get the result, it will need 
near 400mb momery and so exceed the limit of yarn fanally killed by yarn.
If there is a way to limit the peek use of memory, it will be fine. In addtion, 
I though the user side number of remote fetch block threads is uncontrollable, 
it's better to be controlled in spark.
[~lianhuiwang] I use the recent spark in github and I also tested the 1.2.0 
release version.
In my test case, I use the default memory: 1G executor and 384 overhead. But in 
the real case, momery is much more.

> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> {code:title=test.scala|borderStyle=solid}
> import org.apache.spark.storage._
> import org.apache.spark._
> val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
> Array[Byte](10*1024*1024)).persist
> bufferRdd.count
> val part =  bufferRdd.partitions(0)
> val sparkEnv = SparkEnv.get
> val blockMgr = sparkEnv.blockManager
> def test = {
> val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
> val resultIt = 
> blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
> val len = resultIt.map(_.length).sum
> println(s"[${Thread.currentThread.getId}] get block length = $len")
> }
> def test_driver(count:Int, parallel:Int)(f: => Unit) = {
> val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
> val taskSupport  = new 
> scala.collec

[jira] [Comment Edited] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957
 ] 

SaintBacchus edited comment on SPARK-6056 at 3/1/15 6:01 AM:
-

[~adav] Thanks for the comment. I don't quite understand what you are saying. Do you mean 
that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only 
allocate heap memory? Is that right?
I tested it again, and I have updated the test program in the Description.
I set _spark.shuffle.io.preferDirectBufs_ and when I run 
*test_driver(20,100)(test)*, the result is this:
_ 76602 root  20   0 1597m 339m  25m S0  0.1   0:04.89 java   _
 _76602 root  20   0 {color:red} 1777m {color} 1.0g  26m S   99  0.3   
0:07.88 java   _
 
_ 76602 root  20   0 1597m 880m  26m S4  0.3   0:07.99 java_

The red number is the virtual memory; it rose by about 180 MB at that moment, while 
the test case only transfers 200 MB of data (20 * 10 MB) in total from executor to driver.
I think this is a problem. If I use 40 threads to get the result, it will need 
nearly 400 MB of memory and soon exceed the YARN limit, so the executor is finally killed by YARN.
If there were a way to limit the peak use of this memory, that would be fine. In addition, 
since the number of remote fetch block threads on the user side is 
uncontrollable, it would be better for Spark to control it.
[~lianhuiwang] I use the recent Spark from GitHub and I also tested the 1.2.0 
release version.
In my test case, I use the default memory: 1 GB executor and 384 MB overhead. But in 
the real case, the memory is much larger.


was (Author: carlmartin):
[~adav] Thx for comment. I can't understand what you say clearly. Do you mean 
if 'spark.shuffle.io.preferDirectBufs ' was set to be false, would netty only 
allocat the heap memory? Is it right?
I test it again.And I had update the test program in the Description.
I had set the _spark.shuffle.io.preferDirectBufs_  and when I type 
*test_driver(20,100)(test)* , the result is this:
_ 76602 root  20   0 1597m 339m  25m S0  0.1   0:04.89 java   _
 _76602 root  20   0 {color:red} 1777m {color} 1.0g  26m S   99  0.3   
0:07.88 java   _
 
_ 76602 root  20   0 1597m 880m  26m S4  0.3   0:07.99 java_

The red num had raised about 180mb in the moment and total transfor 200mb data 
(20 * 10MB) from executor to driver.
I think it's a big problem. If I use 40 threads to get the result, it will need 
near 400mb momery and so exceed the limit of yarn fanally killed by yarn.
If there is a way to limit the peek use of memory, it will be fine. In addtion, 
I though the user side number of remote fetch block threads is uncontrollable, 
it's better to be controlled in spark.
[~lianhuiwang] I use the recent spark in github and I also tested the 1.2.0 
release version.
In my test case, I use the default memory: 1G executor and 384 overhead. But in 
the real case, momery is much more.

> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> {code:title=test.scala|borderStyle=solid}
> import org.apache.spark.storage._
> import org.apache.spark._
> val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
> Array[Byte](10*1024*1024)).persist
> bufferRdd.count
> val part =  bufferRdd.partitions(0)
> val sparkEnv = SparkEnv.get
> val blockMgr = sparkEnv.blockManager
> def test = {
> val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
> val resultIt = 
> blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
> val len = resultIt.map(_.length).sum
> println(s"[${Thread.currentThread.getId}] get block length = $len")
> }
> def test_driver(count:Int, parallel:Int)(f: => Unit) = {
> val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
> val taskSupport  = new 
> scala.collection.parallel.ForkJoinTaskS

[jira] [Commented] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341957#comment-14341957
 ] 

SaintBacchus commented on SPARK-6056:
-

[~adav] Thanks for the comment. I don't quite understand what you are saying. Do you mean 
that if 'spark.shuffle.io.preferDirectBufs' is set to false, Netty will only 
allocate heap memory? Is that right?
I tested it again, and I have updated the test program in the Description.
I set _spark.shuffle.io.preferDirectBufs_ and when I run 
*test_driver(20,100)(test)*, the result is this:
_ 76602 root  20   0 1597m 339m  25m S0  0.1   0:04.89 java   _
 _76602 root  20   0 {color:red} 1777m {color} 1.0g  26m S   99  0.3   
0:07.88 java   _
 
_ 76602 root  20   0 1597m 880m  26m S4  0.3   0:07.99 java_

The red number is the virtual memory; it rose by about 180 MB at that moment, while 
the test case only transfers 200 MB of data (20 * 10 MB) in total from executor to driver.
I think this is a problem. If I use 40 threads to get the result, it will need 
nearly 400 MB of memory and soon exceed the YARN limit, so the executor is finally killed by YARN.
If there were a way to limit the peak use of this memory, that would be fine. In addition, 
since the number of remote fetch block threads on the user side is 
uncontrollable, it would be better for Spark to control it.
[~lianhuiwang] I use the recent Spark from GitHub and I also tested the 1.2.0 
release version.
In my test case, I use the default memory: 1 GB executor and 384 MB overhead. But in 
the real case, the memory is much larger.
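
As a side note for anyone reproducing this, the knobs discussed in this thread can be set per run as below (values are illustrative; whether they actually bound Netty's direct-buffer usage is exactly what this issue questions):

{code}
bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1 \
  --conf spark.shuffle.io.preferDirectBufs=false \
  --conf spark.yarn.executor.memoryOverhead=1024
{code}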

> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> {code:title=test.scala|borderStyle=solid}
> import org.apache.spark.storage._
> import org.apache.spark._
> val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
> Array[Byte](10*1024*1024)).persist
> bufferRdd.count
> val part =  bufferRdd.partitions(0)
> val sparkEnv = SparkEnv.get
> val blockMgr = sparkEnv.blockManager
> def test = {
> val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
> val resultIt = 
> blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
> val len = resultIt.map(_.length).sum
> println(s"[${Thread.currentThread.getId}] get block length = $len")
> }
> def test_driver(count:Int, parallel:Int)(f: => Unit) = {
> val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
> val taskSupport  = new 
> scala.collection.parallel.ForkJoinTaskSupport(tpool)
> val parseq = (1 to count).par
> parseq.tasksupport = taskSupport
> parseq.foreach(x=>f)
> tpool.shutdown
> tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
> }
> {code}
> Steps to reproduce:
> 1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
> 2. :load test.scala in spark-shell
> 3. use the following command to watch the executor on the slave node
> {code}
> pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid
> {code}
> 4. run test_driver(20,100)(test) in spark-shell
> 5. watch the output of the command on the slave node
> If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-6056:

Description: 
No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
I wrote a simple code to test it:
{code:title=test.scala|borderStyle=solid}
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
println(s"[${Thread.currentThread.getId}] get block length = $len")
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = {
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
{code}
Steps to reproduce:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the following command to watch the executor on the slave node
{code}
pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid
{code}
4. run test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node

If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead

  was:
No matter set the `preferDirectBufs` or limit the number of thread or not 
,spark can not limit the use of offheap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty 
had allocated a offheap memory buffer with the same size in heap.
So how many buffer you want to transfor, the same size offheap memory will be 
allocated.
But once the allocated memory size reach the capacity of the overhead momery 
set in yarn, this executor will be killed.
I wrote a simple code to test it:
{code:title=test.scala|borderStyle=solid}
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
println(s"[${Thread.currentThread.getId}] get block length = $len")
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = {
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
{code}
progress:
1. bin/spark-shell --master yarn-cilent --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use comman {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print 
$1}');top -b -p $pid|grep $pid} to catch executor on slave node
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on slave node

If use multi-thread to get len, the physical memery will soon   exceed the 
limit set by spark.yarn.executor.memoryOverhead


> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But o

[jira] [Updated] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-6056:

Description: 
No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
I wrote a simple code to test it:
{code:title=test.scala|borderStyle=solid}
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
println(s"[${Thread.currentThread.getId}] get block length = $len")
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = {
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
{code}
Steps to reproduce:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
4. run test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node

If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead

  was:
No matter set the `preferDirectBufs` or limit the number of thread or not 
,spark can not limit the use of offheap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty 
had allocated a offheap memory buffer with the same size in heap.
So how many buffer you want to transfor, the same size offheap memory will be 
allocated.
But once the allocated memory size reach the capacity of the overhead momery 
set in yarn, this executor will be killed.
I wrote a simple code to test it:

progress:
1. bin/spark-shell --master yarn-cilent --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use comman {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print 
$1}');top -b -p $pid|grep $pid} to catch executor on slave node
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on slave node

If use multi-thread to get len, the physical memery will soon   exceed the 
limit set by spark.yarn.executor.memoryOverhead


> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> {code:title=test.scala|borderStyle=solid}
> import org.apache.spark.storage._
> import org.apache.spark._
> val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
> Array[Byte](10*1024*1024)).persist
> bufferRdd.count
> val part =  bufferRdd.partitions(0)
> val sparkEnv = SparkEnv.get
> val blockMgr = sparkEnv.blockManager
> def test = {
> val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
> val resultIt = 
> blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
> val len = resultIt.map(_.length).sum
> println(s"[${Thread.currentThread.getId}] get block length = $len")
> }
> def test_driver(count:Int, parallel:Int)(f: => Unit) = {
> val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
> val taskSupport  = new 
> scala.collection.parallel.ForkJoinTaskSupport(tpool)
> val parseq 

[jira] [Updated] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-6056:

Description: 
No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
I wrote a simple code to test it:

Steps to reproduce:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
4. run test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node

If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead

  was:
No matter set the `preferDirectBufs` or limit the number of thread or not 
,spark can not limit the use of offheap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty 
had allocated a offheap memory buffer with the same size in heap.
So how many buffer you want to transfor, the same size offheap memory will be 
allocated.
But once the allocated memory size reach the capacity of the overhead momery 
set in yarn, this executor will be killed.
I wrote a simple code to test it:
```test.scala
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test =
 {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = 
{
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
```
progress:
1. bin/spark-shell --master yarn-cilent --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use comman {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print 
$1}');top -b -p $pid|grep $pid} to catch executor on slave node
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on slave node

If use multi-thread to get len, the physical memery will soon   exceed the 
limit set by spark.yarn.executor.memoryOverhead


> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> Steps to reproduce:
> 1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
> 2. :load test.scala in spark-shell
> 3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
> 4. run test_driver(20,100)(test) in spark-shell
> 5. watch the output of the command on the slave node
> If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-6056:

Description: 
No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
I wrote a simple code to test it:
```test.scala
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test =
 {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = 
{
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
```
Steps to reproduce:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
4. run test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node

If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead

  was:
No matter set the `preferDirectBufs` or limit the number of thread or not 
,spark can not limit the use of offheap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty 
had allocated a offheap memory buffer with the same size in heap.
So how many buffer you want to transfor, the same size offheap memory will be 
allocated.
But once the allocated memory size reach the capacity of the overhead momery 
set in yarn, this executor will be killed.
I wrote a simple code to test it:
```test.scala
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = {
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
```
progress:
1. bin/spark-shell --master yarn-cilent --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use comman {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print 
$1}');top -b -p $pid|grep $pid} to catch executor on slave node
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on slave node

If use multi-thread to get len, the physical memery will soon   exceed the 
limit set by spark.yarn.executor.memoryOverhead


> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
> So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
> I wrote a simple code to test it:
> ```test.scala
> import org.apache.spark.storage._
> import org.apache.spa

[jira] [Updated] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-6056:

Description: 
No matter whether `preferDirectBufs` is set or the number of threads is limited, Spark cannot limit its use of off-heap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty allocates an off-heap memory buffer of the same size as the on-heap buffer.
So however many buffers you want to transfer, the same amount of off-heap memory will be allocated.
But once the allocated memory reaches the overhead memory capacity set in YARN, the executor will be killed.
I wrote a simple code to test it:
```test.scala
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = {
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
```
Steps to reproduce:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print $1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
4. run test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node

If multiple threads are used to get len, the physical memory will soon exceed the limit set by spark.yarn.executor.memoryOverhead

  was:
No matter set the `preferDirectBufs` or limit the number of thread or not 
,spark can not limit the use of offheap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty 
had allocated a offheap memory buffer with the same size in heap.
So how many buffer you want to transfor, the same size offheap memory will be 
allocated.
But once the allocated memory size reach the capacity of the overhead momery 
set in yarn, this executor will be killed.
I wrote a simple code to test it:
```test.scala
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
println(s"[${Thread.currentThread.getId}] get block length = $len")
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = {
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
```
Steps to reproduce:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print 
$1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node

If multiple threads call test to get len, the physical memory will soon exceed 
the limit set by spark.yarn.executor.memoryOverhead


> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether we set `preferDirectBufs` or limit the number of threads, 
> Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, 
> Netty allocates an off-heap memory buffer of the same size as the heap buffer.
> So however much buffer data you want to transfer, the same amount of off-heap 
> memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in 
> YARN, the executor will be killed.
> I wrote some simple code to test it:
>
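For context, a minimal sketch of the two settings the report refers to (assuming a stock 1.2.x configuration); per the description above, even with these in place the direct buffer allocated at AbstractNioByteChannel line 269 is not bounded:

{code}
import org.apache.spark.SparkConf

// Sketch only: the knobs mentioned in the report. preferDirectBufs asks the
// Netty transport to favor heap buffers, and memoryOverhead is the YARN cap
// (in MB) against which the container is killed when off-heap use grows.
val conf = new SparkConf()
  .set("spark.shuffle.io.preferDirectBufs", "false")
  .set("spark.yarn.executor.memoryOverhead", "1024")
{code}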

[jira] [Updated] (SPARK-6056) Unlimit offHeap memory use cause RM killing the container

2015-02-28 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-6056:

Description: 
No matter whether we set `preferDirectBufs` or limit the number of threads, 
Spark cannot limit its use of off-heap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty 
allocates an off-heap memory buffer of the same size as the heap buffer.
So however much buffer data you want to transfer, the same amount of off-heap 
memory will be allocated.
But once the allocated memory reaches the overhead memory capacity set in YARN, 
the executor will be killed.
I wrote some simple code to test it:
```test.scala
import org.apache.spark.storage._
import org.apache.spark._
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
def test = {
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
println(s"[${Thread.currentThread.getId}] get block length = $len")
}

def test_driver(count:Int, parallel:Int)(f: => Unit) = {
val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
val taskSupport  = new scala.collection.parallel.ForkJoinTaskSupport(tpool)
val parseq = (1 to count).par
parseq.tasksupport = taskSupport
parseq.foreach(x=>f)

tpool.shutdown
tpool.awaitTermination(100, java.util.concurrent.TimeUnit.SECONDS)
}
```
Steps to reproduce:
1. bin/spark-shell --master yarn-client --executor-cores 40 --num-executors 1
2. :load test.scala in spark-shell
3. use the command {pid=$(jps|grep CoarseGrainedExecutorBackend |awk '{print 
$1}');top -b -p $pid|grep $pid} to watch the executor on the slave node
4. test_driver(20,100)(test) in spark-shell
5. watch the output of the command on the slave node

If multiple threads call test to get len, the physical memory will soon exceed 
the limit set by spark.yarn.executor.memoryOverhead

  was:
No matter whether we set `preferDirectBufs` or limit the number of threads, 
Spark cannot limit its use of off-heap memory.
At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, Netty 
allocates an off-heap memory buffer of the same size as the heap buffer.
So however much buffer data you want to transfer, the same amount of off-heap 
memory will be allocated.
But once the allocated memory reaches the overhead memory capacity set in YARN, 
the executor will be killed.
I wrote some simple code to test it:
```scala
val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
Array[Byte](10*1024*1024)).persist
bufferRdd.count
val part =  bufferRdd.partitions(0)
val sparkEnv = SparkEnv.get
val blockMgr = sparkEnv.blockManager
val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
val resultIt = blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
val len = resultIt.map(_.length).sum
```
If multiple threads run this to get len, the physical memory will soon exceed 
the limit set by spark.yarn.executor.memoryOverhead


> Unlimit offHeap memory use cause RM killing the container
> -
>
> Key: SPARK-6056
> URL: https://issues.apache.org/jira/browse/SPARK-6056
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.1
>Reporter: SaintBacchus
>
> No matter whether we set `preferDirectBufs` or limit the number of threads, 
> Spark cannot limit its use of off-heap memory.
> At line 269 of the class 'AbstractNioByteChannel' in netty-4.0.23.Final, 
> Netty allocates an off-heap memory buffer of the same size as the heap buffer.
> So however much buffer data you want to transfer, the same amount of off-heap 
> memory will be allocated.
> But once the allocated memory reaches the overhead memory capacity set in 
> YARN, the executor will be killed.
> I wrote some simple code to test it:
> ```test.scala
> import org.apache.spark.storage._
> import org.apache.spark._
> val bufferRdd = sc.makeRDD(0 to 10, 10).map(x=>new 
> Array[Byte](10*1024*1024)).persist
> bufferRdd.count
> val part =  bufferRdd.partitions(0)
> val sparkEnv = SparkEnv.get
> val blockMgr = sparkEnv.blockManager
> def test = {
> val blockOption = blockMgr.get(RDDBlockId(bufferRdd.id, part.index))
> val resultIt = 
> blockOption.get.data.asInstanceOf[Iterator[Array[Byte]]]
> val len = resultIt.map(_.length).sum
> println(s"[${Thread.currentThread.getId}] get block length = $len")
> }
> def test_driver(count:Int, parallel:Int)(f: => Unit) = {
> val tpool = new scala.concurrent.forkjoin.ForkJoinPool(parallel)
> val taskSupport  = new 
> scala.collection.parallel.ForkJoinTaskSupport(tpool)
> val parseq = (1 to count).par

[jira] [Commented] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341947#comment-14341947
 ] 

Apache Spark commented on SPARK-5771:
-

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/4841

> Number of Cores in Completed Applications of Standalone Master Web Page 
> always be 0 if sc.stop() is called
> --
>
> Key: SPARK-5771
> URL: https://issues.apache.org/jira/browse/SPARK-5771
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
>Priority: Minor
> Fix For: 1.4.0
>
>
> In Standalone mode, the number of cores in Completed Applications on the 
> Master Web Page will always be zero if sc.stop() is called, 
> but the number is correct if sc.stop() is not called.
> The likely reason: 
> after sc.stop() is called, the function removeExecutor of class 
> ApplicationInfo is called, which reduces the variable coresGranted to 
> zero. The variable coresGranted is used to display the number of cores on 
> the Web Page.
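A toy sketch of the accounting described above (class and field names follow the description, not the actual Master code): if the completed-application entry reads the live coresGranted counter, it shows zero once sc.stop() has removed every executor.

{code}
// Illustrative only; mirrors the description, not Spark's ApplicationInfo.
class AppCoresSketch {
  var coresGranted: Int = 0
  def addExecutor(cores: Int): Unit = coresGranted += cores
  def removeExecutor(cores: Int): Unit = coresGranted -= cores
}

val app = new AppCoresSketch
app.addExecutor(8); app.addExecutor(8)        // while running: 16 cores shown
app.removeExecutor(8); app.removeExecutor(8)  // after sc.stop(): 0 cores shown
{code}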



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map

2015-02-28 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341934#comment-14341934
 ] 

Joseph K. Bradley commented on SPARK-5981:
--

True, NaiveBayesModel does not use JavaModelWrapper.  It's only a problem for 
models which use it.

> pyspark ML models should support predict/transform on vector within map
> ---
>
> Key: SPARK-5981
> URL: https://issues.apache.org/jira/browse/SPARK-5981
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>
> Currently, most Python models only have limited support for single-vector 
> prediction.
> E.g., one can call {code}model.predict(myFeatureVector){code} for a single 
> instance, but that fails within a map for Python ML models and transformers 
> which use JavaModelWrapper:
> {code}
> data.map(lambda features: model.predict(features))
> {code}
> This fails because JavaModelWrapper.call uses the SparkContext (within the 
> transformation).  (It works for linear models, which do prediction within 
> Python.)
> Supporting prediction within a map would require storing the model and doing 
> prediction/transformation within Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3889) JVM dies with SIGBUS, resulting in ConnectionManager failed ACK

2015-02-28 Thread Idan Zalzberg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341922#comment-14341922
 ] 

Idan Zalzberg commented on SPARK-3889:
--

Hi,
I am still getting the same error with spark 1.2.1 (sporadically):
{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
# 
#  SIGBUS (0x7) at pc=0x7ff5ed042220, pid=3694, tid=140692916811520
#
# JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 1.7.0_55-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# v  ~StubRoutines::jint_disjoint_arraycopy
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
{noformat}

Should we re-open this one, or open a new ticket?

> JVM dies with SIGBUS, resulting in ConnectionManager failed ACK
> ---
>
> Key: SPARK-3889
> URL: https://issues.apache.org/jira/browse/SPARK-3889
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Aaron Davidson
>Assignee: Aaron Davidson
>Priority: Critical
> Fix For: 1.2.0
>
>
> Here's the first part of the core dump, possibly caused by a job which 
> shuffles a lot of very small partitions.
> {code}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x7fa5885fcdb0, pid=488, tid=140343502632704
> #
> # JRE version: 7.0_25-b30
> # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # v  ~StubRoutines::jbyte_disjoint_arraycopy
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please include
> # instructions on how to reproduce the bug and visit:
> #   https://bugs.launchpad.net/ubuntu/+source/openjdk-7/
> #
> ---  T H R E A D  ---
> Current thread (0x7fa4b0631000):  JavaThread "Executor task launch 
> worker-170" daemon [_thread_in_Java, id=6783, 
> stack(0x7fa4448ef000,0x7fa4449f)]
> siginfo:si_signo=SIGBUS: si_errno=0, si_code=2 (BUS_ADRERR), 
> si_addr=0x7fa428f79000
> {code}
> Here is the only useful content I can find related to JVM and SIGBUS from 
> Google: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=976664
> It appears it may be related to disposing byte buffers, which we do in the 
> ConnectionManager -- we mmap shuffle files via ManagedBuffer and dispose of 
> them in BufferMessage.
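A small sketch of the mmap-then-dispose pattern the last paragraph points at (plain NIO, not Spark's ManagedBuffer): reading a mapped buffer is only safe while the mapping is valid, and if it is unmapped or the backing file changes underneath a copy in flight, the JVM can take a SIGBUS in the arraycopy stub instead of throwing a Java exception.

{code}
import java.io.{File, RandomAccessFile}
import java.nio.channels.FileChannel

// Sketch: map a file the way a shuffle block would be, copy from it, then close.
val f = File.createTempFile("shuffle-sketch", ".data")
val raf = new RandomAccessFile(f, "rw")
raf.setLength(4096)
val mapped = raf.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, 4096)
val copy = new Array[Byte](4096)
mapped.get(copy)   // safe: the mapping is still valid here
raf.close()
// If the mapping were explicitly disposed (as BufferMessage does) while another
// thread was still inside mapped.get(...), that access could fault with SIGBUS.
{code}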



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5984) TimSort broken

2015-02-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-5984:
---
Affects Version/s: (was: 1.3.0)

> TimSort broken
> --
>
> Key: SPARK-5984
> URL: https://issues.apache.org/jira/browse/SPARK-5984
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.2.1
>Reporter: Reynold Xin
>Priority: Minor
> Fix For: 1.3.0
>
>
> See 
> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
> Our TimSort is based on Android's TimSort, which is broken in some corner 
> case. Marking it minor as this problem exists for almost all TimSort 
> implementations out there, including Android, OpenJDK, Python, and it hasn't 
> manifested itself in practice yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5984) TimSort broken

2015-02-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-5984.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: (was: Aaron Davidson)

> TimSort broken
> --
>
> Key: SPARK-5984
> URL: https://issues.apache.org/jira/browse/SPARK-5984
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.2.1
>Reporter: Reynold Xin
>Priority: Minor
> Fix For: 1.3.0
>
>
> See 
> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
> Our TimSort is based on Android's TimSort, which is broken in some corner 
> case. Marking it minor as this problem exists for almost all TimSort 
> implementations out there, including Android, OpenJDK, Python, and it hasn't 
> manifested itself in practice yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6089) Size of task result fetched can't be found in UI

2015-02-28 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-6089:
-
Description: 
When you do a large collect the amount of data fetched as task result from each 
task is not present in the WebUI. 

We should make this appear under the 'Output' column (both per-task and in 
executor-level aggregation)

cc [~kayousterhout]

  was:
When you do a large collect the amount of data fetched as task result from each 
task is not present in the WebUI. 

We should make this appear under the 'Output' column (both per-task and in 
executor-level aggregation)

[cc ~kayousterhout]


> Size of task result fetched can't be found in UI
> 
>
> Key: SPARK-6089
> URL: https://issues.apache.org/jira/browse/SPARK-6089
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.3.0
>Reporter: Shivaram Venkataraman
>
> When you do a large collect the amount of data fetched as task result from 
> each task is not present in the WebUI. 
> We should make this appear under the 'Output' column (both per-task and in 
> executor-level aggregation)
> cc [~kayousterhout]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6089) Size of task result fetched can't be found in UI

2015-02-28 Thread Shivaram Venkataraman (JIRA)
Shivaram Venkataraman created SPARK-6089:


 Summary: Size of task result fetched can't be found in UI
 Key: SPARK-6089
 URL: https://issues.apache.org/jira/browse/SPARK-6089
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.3.0
Reporter: Shivaram Venkataraman


When you do a large collect the amount of data fetched as task result from each 
task is not present in the WebUI. 

We should make this appear under the 'Output' column (both per-task and in 
executor-level aggregation)

[cc ~kayousterhout]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4144) Support incremental model training of Naive Bayes classifier

2015-02-28 Thread Chris Fregly (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341901#comment-14341901
 ] 

Chris Fregly commented on SPARK-4144:
-

Hey [~freeman-lab]!

I was literally just talking to [~josephkb] in the office last week about 
picking this up.  Great timing!

Let's coordinate offline.  I'll shoot you an email.

-Chris



> Support incremental model training of Naive Bayes classifier
> 
>
> Key: SPARK-4144
> URL: https://issues.apache.org/jira/browse/SPARK-4144
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Streaming
>Reporter: Chris Fregly
>Assignee: Jeremy Freeman
>
> Per Xiangrui Meng from the following user list discussion:  
> http://mail-archives.apache.org/mod_mbox/spark-user/201408.mbox/%3CCAJgQjQ_QjMGO=jmm8weq1v8yqfov8du03abzy7eeavgjrou...@mail.gmail.com%3E
>
> "For Naive Bayes, we need to update the priors and conditional
> probabilities, which means we should also remember the number of
> observations for the updates."
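A minimal sketch of the sufficient statistics that quote implies (illustrative only, not MLlib's NaiveBayesModel): keeping per-class observation counts and per-class feature sums is enough to refresh both the priors and the conditional probabilities as new data arrives.

{code}
// Sketch: incremental sufficient statistics for multinomial Naive Bayes.
final case class NBStats(classCounts: Map[Int, Long],
                         featureSums: Map[Int, Array[Double]]) {
  def update(label: Int, features: Array[Double]): NBStats = {
    val counts = classCounts.updated(label, classCounts.getOrElse(label, 0L) + 1L)
    val sums = featureSums.getOrElse(label, Array.fill(features.length)(0.0)).clone()
    var i = 0
    while (i < features.length) { sums(i) += features(i); i += 1 }
    copy(classCounts = counts, featureSums = featureSums.updated(label, sums))
  }
  // Priors follow from classCounts / total observations; conditionals from the
  // (smoothed) featureSums, so both can be refreshed after each update.
}
{code}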



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6088) UI is malformed when tasks fetch remote results

2015-02-28 Thread Kay Ousterhout (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout updated SPARK-6088:
--
Description: 
There are three issues when tasks get remote results:

(1) The status never changes from GET_RESULT to SUCCEEDED
(2) The time to get the result is shown as the absolute time (resulting in a 
non-sensical output that says getting the result took >1 million hours) rather 
than the elapsed time
(3) The getting result time is included as part of the scheduler delay

cc [~shivaram]

  was:
There are two issues when tasks get remote results:

(1) The status never changes from GET_RESULT to SUCCEEDED
(2) The time to get the result is shown as the absolute time (resulting in a 
non-sensical output that says getting the result took >1 million hours) rather 
than the elapsed time

cc [~shivaram]


> UI is malformed when tasks fetch remote results
> ---
>
> Key: SPARK-6088
> URL: https://issues.apache.org/jira/browse/SPARK-6088
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
> Attachments: Screenshot 2015-02-28 18.24.42.png
>
>
> There are three issues when tasks get remote results:
> (1) The status never changes from GET_RESULT to SUCCEEDED
> (2) The time to get the result is shown as the absolute time (resulting in a 
> non-sensical output that says getting the result took >1 million hours) 
> rather than the elapsed time
> (3) The getting result time is included as part of the scheduler delay
> cc [~shivaram]
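A tiny sketch of issue (2) above (variable names are illustrative, not the UI code): rendering the raw timestamp where an elapsed duration belongs is what produces the ">1 million hours" readings.

{code}
// Sketch: absolute timestamp vs. elapsed duration.
val gettingResultStart = System.currentTimeMillis()  // when status became GET_RESULT
val finishTime = gettingResultStart + 1234L          // when the result arrived
val shownToday = gettingResultStart                  // wrong: epoch millis rendered as a duration
val shouldShow = finishTime - gettingResultStart     // right: elapsed time, 1234 ms
{code}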



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6088) UI is malformed when tasks fetch remote results

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341898#comment-14341898
 ] 

Apache Spark commented on SPARK-6088:
-

User 'kayousterhout' has created a pull request for this issue:
https://github.com/apache/spark/pull/4839

> UI is malformed when tasks fetch remote results
> ---
>
> Key: SPARK-6088
> URL: https://issues.apache.org/jira/browse/SPARK-6088
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
> Attachments: Screenshot 2015-02-28 18.24.42.png
>
>
> There are two issues when tasks get remote results:
> (1) The status never changes from GET_RESULT to SUCCEEDED
> (2) The time to get the result is shown as the absolute time (resulting in a 
> non-sensical output that says getting the result took >1 million hours) 
> rather than the elapsed time
> cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6088) UI is malformed when tasks fetch remote results

2015-02-28 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341896#comment-14341896
 ] 

Shivaram Venkataraman commented on SPARK-6088:
--

Also, for some reason the get-result time is included in the Scheduler 
Delay. The attached screenshot shows a get-result that took 33 minutes and how 
it shows up in the scheduler delay.

> UI is malformed when tasks fetch remote results
> ---
>
> Key: SPARK-6088
> URL: https://issues.apache.org/jira/browse/SPARK-6088
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
> Attachments: Screenshot 2015-02-28 18.24.42.png
>
>
> There are two issues when tasks get remote results:
> (1) The status never changes from GET_RESULT to SUCCEEDED
> (2) The time to get the result is shown as the absolute time (resulting in a 
> non-sensical output that says getting the result took >1 million hours) 
> rather than the elapsed time
> cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6088) UI is malformed when tasks fetch remote results

2015-02-28 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-6088:
-
Attachment: Screenshot 2015-02-28 18.24.42.png

> UI is malformed when tasks fetch remote results
> ---
>
> Key: SPARK-6088
> URL: https://issues.apache.org/jira/browse/SPARK-6088
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
> Attachments: Screenshot 2015-02-28 18.24.42.png
>
>
> There are two issues when tasks get remote results:
> (1) The status never changes from GET_RESULT to SUCCEEDED
> (2) The time to get the result is shown as the absolute time (resulting in a 
> non-sensical output that says getting the result took >1 million hours) 
> rather than the elapsed time
> cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2015-02-28 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341895#comment-14341895
 ] 

Marko Bonaci commented on SPARK-2620:
-

*Spark 1.2 shell local:*

{code:java}
scala> case class P(name:String)
defined class P

scala> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
ps: Array[P] = Array(P(alice), P(bob), P(charly), P(bob))

scala> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
res8: Array[(P, Int)] = Array((P(alice),1), (P(charly),1), (P(bob),2))
{code}

> case class cannot be used as key for reduce
> ---
>
> Key: SPARK-2620
> URL: https://issues.apache.org/jira/browse/SPARK-2620
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.0.0, 1.1.0
> Environment: reproduced on spark-shell local[4]
>Reporter: Gerard Maas
>Assignee: Tobias Schlatter
>Priority: Critical
>  Labels: case-class, core
>
> Using a case class as a key doesn't seem to work properly on Spark 1.0.0
> A minimal example:
> case class P(name:String)
> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
> [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), 
> (P(bob),1), (P(abe),1), (P(charly),1))
> In contrast to the expected behavior, that should be equivalent to:
> sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect
> Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2))
> groupByKey and distinct also present the same behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6088) UI is malformed when tasks fetch remote results

2015-02-28 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-6088:
-

 Summary: UI is malformed when tasks fetch remote results
 Key: SPARK-6088
 URL: https://issues.apache.org/jira/browse/SPARK-6088
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Kay Ousterhout
Assignee: Kay Ousterhout


There are two issues when tasks get remote results:

(1) The status never changes from GET_RESULT to SUCCEEDED
(2) The time to get the result is shown as the absolute time (resulting in a 
non-sensical output that says getting the result took >1 million hours) rather 
than the elapsed time

cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators

2015-02-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6086:
---
Component/s: SQL

> Exceptions in DAGScheduler.updateAccumulators
> -
>
> Key: SPARK-6086
> URL: https://issues.apache.org/jira/browse/SPARK-6086
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core, SQL
>Affects Versions: 1.3.0
>Reporter: Kai Zeng
>Priority: Critical
>
> Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler 
> is collecting status from tasks. These exceptions happen occasionally, 
> especially when there are many stages in a job.
> Application code: 
> https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala
> Script used: ./bin/spark-submit --class 
> org.apache.spark.examples.sql.hive.SQLSuite 
> examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar 
> benchmark-cache 6
> There are two types of error messages:
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to 
> scala.collection.TraversableOnce
>   at 
> org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer
>   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
>   at 
> org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators

2015-02-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6086:
---
Component/s: Spark Core

> Exceptions in DAGScheduler.updateAccumulators
> -
>
> Key: SPARK-6086
> URL: https://issues.apache.org/jira/browse/SPARK-6086
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 1.3.0
>Reporter: Kai Zeng
>Priority: Critical
>
> Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler 
> is collecting status from tasks. These exceptions happen occasionally, 
> especially when there are many stages in a job.
> Application code: 
> https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala
> Script used: ./bin/spark-submit --class 
> org.apache.spark.examples.sql.hive.SQLSuite 
> examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar 
> benchmark-cache 6
> There are two types of error messages:
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to 
> scala.collection.TraversableOnce
>   at 
> org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer
>   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
>   at 
> org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at 
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators

2015-02-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6086:
---
Description: 
Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler is 
collecting status from tasks. These exceptions happen occasionally, especially 
when there are many stages in a job.

Application code: 
https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala
Script used: ./bin/spark-submit --class 
org.apache.spark.examples.sql.hive.SQLSuite 
examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar 
benchmark-cache 6

There are two types of error messages:
{code}
java.lang.ClassCastException: scala.None$ cannot be cast to 
scala.collection.TraversableOnce
  at 
org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188)
  at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
  at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
  at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
  at 
org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}

{code}
java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
  at 
org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263)
  at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
  at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
  at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
  at 
org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}

  was:
Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler is 
collecting status from tasks. These exceptions happen occasionally, especially 
when there are many stages in a job.

Application code: 
https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala
Script used: ./bin/spark-submit --class 
org.apache.spark.examples.sql.hive.SQLSuite 
examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar 
benchmark-cache 6

There are two types of error messages:

java.lang.ClassCastException: scala.None$ cannot be cast to 
scala.collection.TraversableOnce
  at 
org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188)
  at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
  at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collec

[jira] [Commented] (SPARK-6066) Metadata in event log makes it very difficult for external libraries to parse event log

2015-02-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341881#comment-14341881
 ] 

Patrick Wendell commented on SPARK-6066:


[~vanzin] - yes you are right (an early scratch version of the feature used a 
Gzip stream, I think). There are python bindings for all three of those 
compression codecs. To be fair, I'm not 100% sure the codecs are standardized 
enough to be compatible across different implementations. Gzip is pretty good 
in this regard, but not sure about those other three.

> Metadata in event log makes it very difficult for external libraries to parse 
> event log
> ---
>
> Key: SPARK-6066
> URL: https://issues.apache.org/jira/browse/SPARK-6066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Kay Ousterhout
>Assignee: Andrew Or
>Priority: Blocker
>
> The fix for SPARK-2261 added a line at the beginning of the event log that 
> encodes metadata.  This line makes it much more difficult to parse the event 
> logs from external libraries (like 
> https://github.com/kayousterhout/trace-analysis, which is used by folks at 
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not 
> compressed.  This has a few side-effects: first, someone can't just use the 
> command line to uncompress the file and then look at the logs, because the 
> file is in this weird half-compressed format; and second, now external tools 
> that parse these logs also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to 
> add a bunch more backward-compatibility code to handle this weird format!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough

2015-02-28 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-6087:
--

 Summary: Provide actionable exception if Kryo buffer is not large 
enough
 Key: SPARK-6087
 URL: https://issues.apache.org/jira/browse/SPARK-6087
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.0
Reporter: Patrick Wendell
Priority: Critical


Right now if you don't have a large enough Kryo buffer, you get a really 
confusing exception. I noticed this when using Kryo to serialize broadcasted 
tables in Spark SQL. We should catch-then-rethrow this in the KryoSerializer, 
wrapping it in a message that suggests increasing the Kryo buffer size 
configuration variable.

{code}
com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, 
required: 3
Serialization trace:
value (org.apache.spark.sql.catalyst.expressions.MutableAny)
values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow)
at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446)
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:167)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:234)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
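A sketch of the suggested catch-then-rethrow (the wrapper name is illustrative, and the config key assumes the 1.x setting spark.kryoserializer.buffer.max.mb is the variable to point users at):

{code}
import com.esotericsoftware.kryo.KryoException

// Sketch: wrap the Kryo call and surface an actionable message on overflow.
def withBufferHint[T](serialize: => T): T =
  try serialize
  catch {
    case e: KryoException if e.getMessage != null && e.getMessage.contains("Buffer overflow") =>
      throw new IllegalArgumentException(
        "Kryo serialization failed: " + e.getMessage +
          ". To avoid this, increase spark.kryoserializer.buffer.max.mb.", e)
  }
{code}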



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough

2015-02-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6087:
---
Description: 
Right now if you don't have a large enough Kryo buffer, you get a really 
confusing exception. I noticed this when using Kryo to serialize broadcasted 
tables in Spark SQL. We should catch-then-rethrow this in the KryoSerializer, 
wrapping it in a message that suggests increasing the Kryo buffer size 
configuration variable.

{code}
com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, 
required: 3
Serialization trace:
value (org.apache.spark.sql.catalyst.expressions.MutableAny)
values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow)
at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446)
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:167)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:234)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}

/cc [~kayousterhout] who helped report this issue

  was:
Right now if you don't have a large enough Kryo buffer, you get a really 
confusing exception. I noticed this when using Kryo to serialize broadcasted 
tables in Spark SQL. We should catch-then-rethrow this in the KryoSerializer, 
wrapping it in a message that suggests increasing the Kryo buffer size 
configuration variable.

{code}
com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, 
required: 3
Serialization trace:
value (org.apache.spark.sql.catalyst.expressions.MutableAny)
values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow)
at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446)
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializ

[jira] [Created] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators

2015-02-28 Thread Kai Zeng (JIRA)
Kai Zeng created SPARK-6086:
---

 Summary: Exceptions in DAGScheduler.updateAccumulators
 Key: SPARK-6086
 URL: https://issues.apache.org/jira/browse/SPARK-6086
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.3.0
Reporter: Kai Zeng
Priority: Critical


Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler is 
collecting status from tasks. These exceptions happen occasionally, especially 
when there are many stages in a job.

Application code: 
https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala
Script used: ./bin/spark-submit --class 
org.apache.spark.examples.sql.hive.SQLSuite 
examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar 
benchmark-cache 6

There are two types of error messages:

java.lang.ClassCastException: scala.None$ cannot be cast to 
scala.collection.TraversableOnce
  at 
org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188)
  at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
  at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
  at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
  at 
org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
  at 
org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263)
  at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
  at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
  at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
  at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
  at 
org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
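For reference, a minimal reproduction of the second exception type outside the scheduler (this only demonstrates the cast itself, not where the stray None comes from):

{code}
// Unboxing a None where an Int value is expected throws the same
// "scala.None$ cannot be cast to java.lang.Integer" seen in the trace above.
val update: Any = None
val unboxed = update.asInstanceOf[Int]   // throws java.lang.ClassCastException at runtime
{code}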



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2301) add ability to submit multiple jars for Driver

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-2301.
--
Resolution: Won't Fix

The PR says this is WontFix.

> add ability to submit multiple jars for Driver
> --
>
> Key: SPARK-2301
> URL: https://issues.apache.org/jira/browse/SPARK-2301
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: Lianhui Wang
>
> add ability to submit multiple jars for Driver
> see PR:
> https://github.com/apache/spark/pull/1113



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3357) Internal log messages should be set at DEBUG level instead of INFO

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341846#comment-14341846
 ] 

Apache Spark commented on SPARK-3357:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4838

> Internal log messages should be set at DEBUG level instead of INFO
> --
>
> Key: SPARK-3357
> URL: https://issues.apache.org/jira/browse/SPARK-3357
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> spark-shell shows INFO by default, so we should carefully choose what to show 
> at INFO level. For example, if I run
> {code}
> sc.parallelize(0 until 100).count()
> {code}
> and wait for one minute or so, I will see messages that mix with the current 
> input box, which is annoying:
> {code}
> scala> 14/09/02 17:09:00 INFO BlockManager: Removing broadcast 0
> 14/09/02 17:09:00 INFO BlockManager: Removing block broadcast_0
> 14/09/02 17:09:00 INFO MemoryStore: Block broadcast_0 of size 1088 dropped 
> from memory (free 278019440)
> 14/09/02 17:09:00 INFO ContextCleaner: Cleaned broadcast 0
> {code}
> Does a user need to know when a broadcast variable is removed? Maybe not.
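
Until the defaults change, one way to quiet these particular messages is to raise the log4j level for the noisy internal loggers; a minimal sketch, assuming these are the loggers involved (the names are taken from the log lines above, not from a fix in this ticket):

{code}
import org.apache.log4j.{Level, Logger}

// silence the cleanup chatter shown above without losing other INFO output
Logger.getLogger("org.apache.spark.storage.BlockManager").setLevel(Level.WARN)
Logger.getLogger("org.apache.spark.storage.MemoryStore").setLevel(Level.WARN)
Logger.getLogger("org.apache.spark.ContextCleaner").setLevel(Level.WARN)
{code}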



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6068) KMeans Parallel test may fail

2015-02-28 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341833#comment-14341833
 ] 

Joseph K. Bradley commented on SPARK-6068:
--

[~derrickburns]  I'm sorry about how it can take a long time to get a PR into 
Spark, but sending small PRs with one PR per JIRA helps a lot.  For a reviewer 
to say "LGTM," they need to fully understand and be prepared to "own" the code, 
which makes reviewing large patches *much* harder.  I've spent a lot of time 
breaking my patches into smaller pieces.

Looking over your JIRAs, the changes all sound useful.  It also seems like the 
most important change for you (supporting general Bregman divergences) could 
potentially be added in spark.ml or spark.mllib without making breaking 
changes.  Since there is no distance metric parameter currently, adding one 
based on a Bregman divergence API should be possible.  However, it's pretty 
hard to figure out exactly what changes are needed because of the many issues 
being addressed in your big k-means PR.  A smaller PR would help a lot.

I hope it will prove worthwhile for you to help get these improvements into 
MLlib, piece by piece.  I don't think they will all require waiting for the 
spark.ml API, but if you do want to make major API changes, then this would be the 
time to design the new API for the spark.ml package.
* [SPARK-6001] might require an API change since it would return a model which 
could not be serialized.  Perhaps it could follow a similar pattern as LDA, 
which returns a DistributedLDAModel (with info about the training dataset topic 
distributions), which in turn can be converted into a LocalLDAModel (which 
stores model parameters locally and drops the training dataset info).

> KMeans Parallel test may fail
> -
>
> Key: SPARK-6068
> URL: https://issues.apache.org/jira/browse/SPARK-6068
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.2.1
>Reporter: Derrick Burns
>  Labels: clustering
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The test  "k-means|| initialization in KMeansSuite can fail when the random 
> number generator is truly random.
> The test is predicated on the assumption that each round of K-Means || will 
> add at least one new cluster center.  The current implementation of K-Means 
> || adds 2*k cluster centers with high probability.  However, there is no 
> deterministic lower bound on the number of cluster centers added.
> Choices are:
> 1)  change the KMeans || implementation to iterate on selecting points until 
> it has satisfied a lower bound on the number of points chosen.
> 2) eliminate the test
> 3) ignore the problem and depend on the random number generator to sample the 
> space in a lucky manner. 
> Option (1) is most in keeping with the contract that KMeans || should provide 
> a precise number of cluster centers when possible. 
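
A minimal sketch of option (1), showing only the termination condition (the distance-weighted sampling of the real k-means|| algorithm is omitted, and all names are illustrative):

{code}
import scala.util.Random

// Keep drawing candidate centers round by round until a lower bound on the
// number of distinct centers is met; duplicates are dropped by the set.
def chooseCenters(points: IndexedSeq[Vector[Double]], k: Int, rng: Random): Seq[Vector[Double]] = {
  require(points.distinct.size >= k, "need at least k distinct candidate points")
  val centers = scala.collection.mutable.LinkedHashSet.empty[Vector[Double]]
  while (centers.size < k) {
    centers ++= rng.shuffle(points).take(2 * k)  // one "round" of candidate draws
  }
  centers.take(k).toSeq
}
{code}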



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341819#comment-14341819
 ] 

Sean Owen commented on SPARK-6069:
--

No, I set it kind of preemptively. I don't know that I serialize any Guava 
classes though, come to think of it.
I am using YARN + Hadoop 2.5.

I don't think it should be necessary in general. Guava is a strange special 
case, so I thought it worth trying.

If you have the energy, you might try 1.3.0-SNAPSHOT since I see a few things 
fixed that may be relevant:
https://issues.apache.org/jira/browse/SPARK-4877
https://issues.apache.org/jira/browse/SPARK-4660


> Deserialization Error ClassNotFoundException with Kryo, Guava 14
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>Priority: Critical
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar
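
For reference, a sketch of the registration and the hard-coded workaround being discussed, assuming Guava is on the driver classpath ({{registerKryoClasses}} and the config keys are existing Spark settings; the jar path is the placeholder from the description):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import com.google.common.collect.HashBiMap

val conf = new SparkConf()
  .setAppName("kryo-guava-repro")  // illustrative app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[HashBiMap[_, _]]))
  // the hard-coded workaround from the description: make the jar visible to executors
  .set("spark.executor.extraClassPath", "/path/to/some.jar")

val sc = new SparkContext(conf)
{code}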



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341809#comment-14341809
 ] 

Pat Ferrel commented on SPARK-6069:
---

Embarrassed to say I'm still on Hadoop 1.2.1, so no YARN. The packaging is not 
in the app jar but in a separate, pruned-down, dependencies-only jar. I can see why 
YARN would throw a unique kink into the situation.  So I guess you ran into 
this and had to use the {{user.classpath.first}} workaround, or are you saying 
it doesn't occur in Oryx?

Still, none of this should be necessary, right? Why else would jars be specified 
in context creation? We do have a workaround if someone has to work with 
1.2.1, but because of that it doesn't seem like a good version to recommend. 
Maybe I'll try 1.2 and install H2 and YARN--which seems like what the distros 
support.

> Deserialization Error ClassNotFoundException with Kryo, Guava 14
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>Priority: Critical
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7

2015-02-28 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341790#comment-14341790
 ] 

Nicholas Chammas edited comment on SPARK-5389 at 2/28/15 9:48 PM:
--

Marking as major since the shell -is technically broken- is behaving terribly 
when Java cannot be found.

Reopening since multiple reports of this problem have come in.


was (Author: nchammas):
Marking as major since the shell is technically broken. (Trivial is for mostly 
cosmetic problems.)

Reopening since multiple reports of this problem have come in.

> spark-shell.cmd does not run from DOS Windows 7
> ---
>
> Key: SPARK-5389
> URL: https://issues.apache.org/jira/browse/SPARK-5389
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0
> Environment: Windows 7
>Reporter: Yana Kadiyska
> Attachments: SparkShell_Win7.JPG
>
>
> spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. 
> spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2
> Marking as trivial since calling spark-shell2.cmd also works fine
> Attaching a screenshot since the error isn't very useful:
> {code}
> spark-1.2.0-bin-cdh4>bin\spark-shell.cmd
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas resolved SPARK-6084.
-
Resolution: Duplicate

Resolving as duplicate of SPARK-5389. That seems a more likely match for this 
than SPARK-4833.

> spark-shell broken on Windows
> -
>
> Key: SPARK-6084
> URL: https://issues.apache.org/jira/browse/SPARK-6084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0, 1.2.1
> Environment: Windows 7, Scala 2.11.4, Java 1.8
>Reporter: Nicholas Chammas
>  Labels: windows
>
> Original report here: 
> http://stackoverflow.com/questions/28747795/spark-launch-find-version
> For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:
> {code}
> bin\spark-shell.cmd
> {code}
> Yields the following error:
> {code}
> find: 'version': No such file or directory
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7

2015-02-28 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5389:

Description: 
spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. 

spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2

Marking as trivial since calling spark-shell2.cmd also works fine

Attaching a screenshot since the error isn't very useful:

{code}
spark-1.2.0-bin-cdh4>bin\spark-shell.cmd
else was unexpected at this time.
{code}

  was:
spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. 

spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2

Marking as trivial sine calling spark-shell2.cmd also works fine

Attaching a screenshot since the error isn't very useful:

spark-1.2.0-bin-cdh4>bin\spark-shell.cmd
else was unexpected at this time.

   Priority: Major  (was: Trivial)
Environment: Windows 7

Marking as major since the shell is technically broken. (Trivial is for mostly 
cosmetic problems.)

Reopening since multiple reports of this problem have come in.

> spark-shell.cmd does not run from DOS Windows 7
> ---
>
> Key: SPARK-5389
> URL: https://issues.apache.org/jira/browse/SPARK-5389
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0
> Environment: Windows 7
>Reporter: Yana Kadiyska
> Attachments: SparkShell_Win7.JPG
>
>
> spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. 
> spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2
> Marking as trivial since calling spark-shell2.cmd also works fine
> Attaching a screenshot since the error isn't very useful:
> {code}
> spark-1.2.0-bin-cdh4>bin\spark-shell.cmd
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7

2015-02-28 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas reopened SPARK-5389:
-

> spark-shell.cmd does not run from DOS Windows 7
> ---
>
> Key: SPARK-5389
> URL: https://issues.apache.org/jira/browse/SPARK-5389
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0
> Environment: Windows 7
>Reporter: Yana Kadiyska
> Attachments: SparkShell_Win7.JPG
>
>
> spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. 
> spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2
> Marking as trivial since calling spark-shell2.cmd also works fine
> Attaching a screenshot since the error isn't very useful:
> {code}
> spark-1.2.0-bin-cdh4>bin\spark-shell.cmd
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7

2015-02-28 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341789#comment-14341789
 ] 

Nicholas Chammas commented on SPARK-5389:
-

Yeah, I think we found another instance of this in SPARK-6084 / 
[here|http://stackoverflow.com/questions/28747795/spark-launch-find-version].

> spark-shell.cmd does not run from DOS Windows 7
> ---
>
> Key: SPARK-5389
> URL: https://issues.apache.org/jira/browse/SPARK-5389
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0
>Reporter: Yana Kadiyska
>Priority: Trivial
> Attachments: SparkShell_Win7.JPG
>
>
> spark-shell.cmd crashes in DOS prompt Windows 7. Works fine under PowerShell. 
> spark-shell.cmd works fine for me in v.1.1 so this is new in spark1.2
> Marking as trivial sine calling spark-shell2.cmd also works fine
> Attaching a screenshot since the error isn't very useful:
> spark-1.2.0-bin-cdh4>bin\spark-shell.cmd
> else was unexpected at this time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5396) Syntax error in spark scripts on windows.

2015-02-28 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341788#comment-14341788
 ] 

Nicholas Chammas commented on SPARK-5396:
-

What does that error message say in English? So we can pattern match to similar 
reports elsewhere.

> Syntax error in spark scripts on windows.
> -
>
> Key: SPARK-5396
> URL: https://issues.apache.org/jira/browse/SPARK-5396
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.3.0
> Environment: Window 7 and Window 8.1.
>Reporter: Vladimir Protsenko
>Assignee: Masayoshi TSUZUKI
>Priority: Critical
> Fix For: 1.3.0
>
> Attachments: windows7.png, windows8.1.png
>
>
> I made the following steps: 
> 1. downloaded and installed Scala 2.11.5 
> 2. downloaded spark 1.2.0 by git clone git://github.com/apache/spark.git 
> 3. run dev/change-version-to-2.11.sh and mvn -Dscala-2.11 -DskipTests clean 
> package (in git bash) 
> After installation tried to run spark-shell.cmd in cmd shell and it says 
> there is a syntax error in file. The same with spark-shell2.cmd, 
> spark-submit.cmd and  spark-submit2.cmd.
> !windows7.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6085) Increase default value for memory overhead

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341786#comment-14341786
 ] 

Apache Spark commented on SPARK-6085:
-

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/4836

> Increase default value for memory overhead
> --
>
> Key: SPARK-6085
> URL: https://issues.apache.org/jira/browse/SPARK-6085
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> Several users have reported that the current default memory overhead value 
> resulted in failed computations in Spark on YARN.
> See this thread:
> http://search-hadoop.com/m/JW1q58FDel
> Increasing the default value for memory overhead would improve the 
> out-of-the-box user experience.
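
In the meantime, the overhead can be raised per application; a minimal sketch (the property names are the existing Spark-on-YARN settings, the values are illustrative only):

{code}
import org.apache.spark.SparkConf

// Reserve extra off-heap room (in MB) beyond the driver/executor heap on YARN.
val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024")
  .set("spark.yarn.driver.memoryOverhead", "1024")
{code}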



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341787#comment-14341787
 ] 

Nicholas Chammas commented on SPARK-6084:
-

Ah, there's also SPARK-5396, though it's in Russian (?) so I'm not sure if the 
error is the same.

> spark-shell broken on Windows
> -
>
> Key: SPARK-6084
> URL: https://issues.apache.org/jira/browse/SPARK-6084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0, 1.2.1
> Environment: Windows 7, Scala 2.11.4, Java 1.8
>Reporter: Nicholas Chammas
>  Labels: windows
>
> Original report here: 
> http://stackoverflow.com/questions/28747795/spark-launch-find-version
> For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:
> {code}
> bin\spark-shell.cmd
> {code}
> Yields the following error:
> {code}
> find: 'version': No such file or directory
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6085) Increase default value for memory overhead

2015-02-28 Thread Ted Yu (JIRA)
Ted Yu created SPARK-6085:
-

 Summary: Increase default value for memory overhead
 Key: SPARK-6085
 URL: https://issues.apache.org/jira/browse/SPARK-6085
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu


Several users have reported that the current default memory overhead value 
resulted in failed computations in Spark on YARN.
See this thread:
http://search-hadoop.com/m/JW1q58FDel

Increasing the default value for memory overhead would improve the out-of-the-box 
user experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341785#comment-14341785
 ] 

Sean Owen commented on SPARK-6084:
--

Oops, I meant SPARK-5389. It still may not be the same thing. Maybe [~tsudukim] 
can look to double-check whether it's the same? Is {{find "version"}} supposed 
to work in Windows at large, or just PowerShell or...?

> spark-shell broken on Windows
> -
>
> Key: SPARK-6084
> URL: https://issues.apache.org/jira/browse/SPARK-6084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0, 1.2.1
> Environment: Windows 7, Scala 2.11.4, Java 1.8
>Reporter: Nicholas Chammas
>  Labels: windows
>
> Original report here: 
> http://stackoverflow.com/questions/28747795/spark-launch-find-version
> For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:
> {code}
> bin\spark-shell.cmd
> {code}
> Yields the following error:
> {code}
> find: 'version': No such file or directory
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas reopened SPARK-6084:
-

Don't see how this is a dup of SPARK-4833.

> spark-shell broken on Windows
> -
>
> Key: SPARK-6084
> URL: https://issues.apache.org/jira/browse/SPARK-6084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0, 1.2.1
> Environment: Windows 7, Scala 2.11.4, Java 1.8
>Reporter: Nicholas Chammas
>  Labels: windows
>
> Original report here: 
> http://stackoverflow.com/questions/28747795/spark-launch-find-version
> For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:
> {code}
> bin\spark-shell.cmd
> {code}
> Yields the following error:
> {code}
> find: 'version': No such file or directory
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341776#comment-14341776
 ] 

Nicholas Chammas commented on SPARK-6084:
-

I took a look at the linked issue (SPARK-4833) and I don't see how they are 
duplicates. They both relate to spark-shell and Windows, but the error messages 
and conditions are different.

Here the user is claiming spark-shell fails with an error right away. There, the 
user is claiming spark-shell runs OK the first time, but then doesn't run a 
second time.

> spark-shell broken on Windows
> -
>
> Key: SPARK-6084
> URL: https://issues.apache.org/jira/browse/SPARK-6084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0, 1.2.1
> Environment: Windows 7, Scala 2.11.4, Java 1.8
>Reporter: Nicholas Chammas
>  Labels: windows
>
> Original report here: 
> http://stackoverflow.com/questions/28747795/spark-launch-find-version
> For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:
> {code}
> bin\spark-shell.cmd
> {code}
> Yields the following error:
> {code}
> find: 'version': No such file or directory
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6084.
--
  Resolution: Duplicate
Target Version/s:   (was: 1.3.0)

I think a lot does not work on Windows.

> spark-shell broken on Windows
> -
>
> Key: SPARK-6084
> URL: https://issues.apache.org/jira/browse/SPARK-6084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0, 1.2.1
> Environment: Windows 7, Scala 2.11.4, Java 1.8
>Reporter: Nicholas Chammas
>  Labels: windows
>
> Original report here: 
> http://stackoverflow.com/questions/28747795/spark-launch-find-version
> For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:
> {code}
> bin\spark-shell.cmd
> {code}
> Yields the following error:
> {code}
> find: 'version': No such file or directory
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341767#comment-14341767
 ] 

Apache Spark commented on SPARK-6075:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4835

> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
> I think I've figured it out: consider the lifecycle of an accumulator in a 
> task, say ShuffleMapTask: on the executor, each task deserializes its own 
> copy of the RDD inside of its runTask method, so the strong reference to the 
> RDD disappears at the end of runTask. In Executor.run(), we call 
> Accumulators.values after runTask has exited, so there's a small window in 
> which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
> because there are no longer any strong references to them.
> The fix is to keep strong references in localAccums, since we clear this at 
> the end of each task anyways. I'm glad that I was able to figure out 
> precisely why this was necessary and sorry that I missed this during review; 
> I'll submit a fix shortly. In terms of preventative measures, it might be a 
> good idea to write up the lifetime / lifecycle of objects' strong references 
> whenever we're using WeakReferences, since the process of explicitly writing 
> that out would prevent these sorts of mistakes in the future.
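
A sketch of the fix described above (the names mirror the description, not the actual Spark source): the per-task registry holds strong references and is cleared at the end of each task, so accumulators cannot be collected in the window between {{runTask}} returning and {{Accumulators.values}} being read.

{code}
import scala.collection.mutable

object AccumulatorRegistrySketch {
  // per-thread (per-task) map holding *strong* references for the task's lifetime
  private val localAccums = new ThreadLocal[mutable.Map[Long, Any]] {
    override def initialValue(): mutable.Map[Long, Any] = mutable.Map.empty
  }

  def register(id: Long, accum: Any): Unit =
    localAccums.get() += (id -> accum)

  // read after runTask has finished; safe because the strong refs are still held here
  def values: Map[Long, Any] = localAccums.get().toMap

  // called at the end of every task, so nothing leaks across tasks
  def clear(): Unit = localAccums.get().clear()
}
{code}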



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-6075:
--
Priority: Blocker  (was: Critical)

> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Priority: Blocker
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
> I think I've figured it out: consider the lifecycle of an accumulator in a 
> task, say ShuffleMapTask: on the executor, each task deserializes its own 
> copy of the RDD inside of its runTask method, so the strong reference to the 
> RDD disappears at the end of runTask. In Executor.run(), we call 
> Accumulators.values after runTask has exited, so there's a small window in 
> which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
> because there are no longer any strong references to them.
> The fix is to keep strong references in localAccums, since we clear this at 
> the end of each task anyways. I'm glad that I was able to figure out 
> precisely why this was necessary and sorry that I missed this during review; 
> I'll submit a fix shortly. In terms of preventative measures, it might be a 
> good idea to write up the lifetime / lifecycle of objects' strong references 
> whenever we're using WeakReferences, since the process of explicitly writing 
> that out would prevent these sorts of mistakes in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-6075:
--
Labels:   (was: flaky-test)

> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Priority: Critical
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
> I think I've figured it out: consider the lifecycle of an accumulator in a 
> task, say ShuffleMapTask: on the executor, each task deserializes its own 
> copy of the RDD inside of its runTask method, so the strong reference to the 
> RDD disappears at the end of runTask. In Executor.run(), we call 
> Accumulators.values after runTask has exited, so there's a small window in 
> which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
> because there are no longer any strong references to them.
> The fix is to keep strong references in localAccums, since we clear this at 
> the end of each task anyways. I'm glad that I was able to figure out 
> precisely why this was necessary and sorry that I missed this during review; 
> I'll submit a fix shortly. In terms of preventative measures, it might be a 
> good idea to write up the lifetime / lifecycle of objects' strong references 
> whenever we're using WeakReferences, since the process of explicitly writing 
> that out would prevent these sorts of mistakes in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-6075:
--
Description: 
It looks like some of the AccumulatorSuite tests have started failing 
nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
accumulator updates, e.g.

{code}
Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
{code}

This could somehow be related to SPARK-3885 / 
https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
accumulators, which was only merged into master.

https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/

I think I've figured it out: consider the lifecycle of an accumulator in a 
task, say ShuffleMapTask: on the executor, each task deserializes its own copy 
of the RDD inside of its runTask method, so the strong reference to the RDD 
disappears at the end of runTask. In Executor.run(), we call 
Accumulators.values after runTask has exited, so there's a small window in 
which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
because there are no longer any strong references to them.

The fix is to keep strong references in localAccums, since we clear this at the 
end of each task anyways. I'm glad that I was able to figure out precisely why 
this was necessary and sorry that I missed this during review; I'll submit a 
fix shortly. In terms of preventative measures, it might be a good idea to 
write up the lifetime / lifecycle of objects' strong references whenever we're 
using WeakReferences, since the process of explicitly writing that out would 
prevent these sorts of mistakes in the future.

  was:
It looks like some of the AccumulatorSuite tests have started failing 
nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
accumulator updates, e.g.

{code}
Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
{code}

This could somehow be related to SPARK-3885 / 
https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
accumulators, which was only merged into master.

https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/


> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Priority: Critical
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
> I think I've figured it out: consider the lifecycle of an accumulator in a 
> task, say ShuffleMapTask: on the executor, each task deserializes its own 
> copy of the RDD inside of its runTask method, so the strong reference to the 
> RDD disappears at the end of runTask. In Executor.run(), we call 
> Accumulators.values after runTask has exited, so there's a small window in 
> which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
> because there are no longer any strong references to them.
> The fix is to keep strong references in localAccums, since we clear this at 
> the end of each task anyways. I'm glad that I was able to figure out 
> precisely why this was necessary and sorry that I missed this during review; 
> I'll submit a fix shortly. In terms of preventative measures, it might be a 
> good idea to write up the lifetime / lifecycle of objects' strong references 
> whenever we're using WeakReferences, since the process of explicitly writing 
> that out would prevent these sorts of mistakes in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-6075:
-

Assignee: Josh Rosen

> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
> I think I've figured it out: consider the lifecycle of an accumulator in a 
> task, say ShuffleMapTask: on the executor, each task deserializes its own 
> copy of the RDD inside of its runTask method, so the strong reference to the 
> RDD disappears at the end of runTask. In Executor.run(), we call 
> Accumulators.values after runTask has exited, so there's a small window in 
> which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
> because there are no longer any strong references to them.
> The fix is to keep strong references in localAccums, since we clear this at 
> the end of each task anyways. I'm glad that I was able to figure out 
> precisely why this was necessary and sorry that I missed this during review; 
> I'll submit a fix shortly. In terms of preventative measures, it might be a 
> good idea to write up the lifetime / lifecycle of objects' strong references 
> whenever we're using WeakReferences, since the process of explicitly writing 
> that out would prevent these sorts of mistakes in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-6075:
--
Comment: was deleted

(was: I left some notes on the PR that might have introduced the bug: 
https://github.com/apache/spark/pull/4021#issuecomment-76511660

{quote}
I'm still trying to see if I can spot the problem, but my hunch is that maybe 
the localAccums thread-local maps should not hold weak references. When 
deserializing an accumulator in an executor and registering it with 
localAccums, is there ever a moment in which the accumulator has no strong 
references pointing to it? Does someone object hold a strong reference to an 
accumulator while it's being deserialized? If not, this could lead to it being 
dropped from the localAccums map, causing that task's accumulator updates to be 
lost.

{quote})

> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Priority: Critical
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/
> I think I've figured it out: consider the lifecycle of an accumulator in a 
> task, say ShuffleMapTask: on the executor, each task deserializes its own 
> copy of the RDD inside of its runTask method, so the strong reference to the 
> RDD disappears at the end of runTask. In Executor.run(), we call 
> Accumulators.values after runTask has exited, so there's a small window in 
> which the task's RDD can be GC'd, causing accumulators to be GC'd as well 
> because there are no longer any strong references to them.
> The fix is to keep strong references in localAccums, since we clear this at 
> the end of each task anyways. I'm glad that I was able to figure out 
> precisely why this was necessary and sorry that I missed this during review; 
> I'll submit a fix shortly. In terms of preventative measures, it might be a 
> good idea to write up the lifetime / lifecycle of objects' strong references 
> whenever we're using WeakReferences, since the process of explicitly writing 
> that out would prevent these sorts of mistakes in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6075) After SPARK-3885, some tasks' accumulator updates may be lost

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-6075:
--
Summary: After SPARK-3885, some tasks' accumulator updates may be lost  
(was: Flaky AccumulatorSuite.add value to collection accumulators test)

> After SPARK-3885, some tasks' accumulator updates may be lost
> -
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Priority: Critical
>  Labels: flaky-test
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6075) Flaky AccumulatorSuite.add value to collection accumulators test

2015-02-28 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-6075:
--
Priority: Critical  (was: Major)

> Flaky AccumulatorSuite.add value to collection accumulators test
> 
>
> Key: SPARK-6075
> URL: https://issues.apache.org/jira/browse/SPARK-6075
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Josh Rosen
>Priority: Critical
>  Labels: flaky-test
>
> It looks like some of the AccumulatorSuite tests have started failing 
> nondeterministically on Jenkins.  The errors seem to be due to lost / missing 
> accumulator updates, e.g.
> {code}
> Set(843, 356, 437, [...], 181, 618, 131) did not contain element 901
> {code}
> This could somehow be related to SPARK-3885 / 
> https://github.com/apache/spark/pull/4021, a patch to garbage-collect 
> accumulators, which was only merged into master.
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/lastCompletedBuild/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/org.apache.spark/AccumulatorSuite/add_value_to_collection_accumulators/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3885) Provide mechanism to remove accumulators once they are no longer used

2015-02-28 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341753#comment-14341753
 ] 

Josh Rosen commented on SPARK-3885:
---

I found a correctness issue in this patch, which I'll fix shortly: see 
SPARK-6075

> Provide mechanism to remove accumulators once they are no longer used
> -
>
> Key: SPARK-3885
> URL: https://issues.apache.org/jira/browse/SPARK-3885
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.0, 1.2.0
>Reporter: Josh Rosen
>Assignee: Ilya Ganelin
> Fix For: 1.4.0
>
>
> Spark does not currently provide any mechanism to delete accumulators after 
> they are no longer used.  This can lead to OOMs for long-lived SparkContexts 
> that create many large accumulators.
> Part of the problem is that accumulators are registered in a global 
> {{Accumulators}} registry.  Maybe the fix would be as simple as using weak 
> references in the Accumulators registry so that accumulators can be GC'd once 
> they can no longer be used.
> In the meantime, here's a workaround that users can try:
> Accumulators have a public setValue() method that can be called (only by the 
> driver) to change an accumulator’s value.  You might be able to use this to 
> reset accumulators’ values to smaller objects (e.g. the “zero” object of 
> whatever your accumulator type is, or ‘null’ if you’re sure that the 
> accumulator will never be accessed again).
> This issue was originally reported by [~nkronenfeld] on the dev mailing list: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Accumulator-question-td8709.html
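
A minimal illustration of the quoted workaround, assuming an existing {{SparkContext}} {{sc}} and a plain integer accumulator (both assumptions for the example, not part of this ticket):

{code}
// driver-side only: reset a long-lived accumulator back to its "zero" value
val acc = sc.accumulator(0)
sc.parallelize(1 to 1000).foreach(x => acc += x)
println(acc.value)   // read the result on the driver...
acc.setValue(0)      // ...then release the accumulated state so it cannot pile up
{code}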



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map

2015-02-28 Thread Manoj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341752#comment-14341752
 ] 

Manoj Kumar commented on SPARK-5981:


It seems that for NaiveBayes it does work; see 
https://github.com/apache/spark/pull/4834. I shall have a better look 
tomorrow. Sorry for the delay.

> pyspark ML models should support predict/transform on vector within map
> ---
>
> Key: SPARK-5981
> URL: https://issues.apache.org/jira/browse/SPARK-5981
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>
> Currently, most Python models only have limited support for single-vector 
> prediction.
> E.g., one can call {code}model.predict(myFeatureVector){code} for a single 
> instance, but that fails within a map for Python ML models and transformers 
> which use JavaModelWrapper:
> {code}
> data.map(lambda features: model.predict(features))
> {code}
> This fails because JavaModelWrapper.call uses the SparkContext (within the 
> transformation).  (It works for linear models, which do prediction within 
> Python.)
> Supporting prediction within a map would require storing the model and doing 
> prediction/transformation within Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341746#comment-14341746
 ] 

Nicholas Chammas commented on SPARK-6084:
-

cc [~pwendell], [~andrewor14]

I haven't confirmed this issue myself. Just forwarding along the report I saw 
on Stack Overflow.

> spark-shell broken on Windows
> -
>
> Key: SPARK-6084
> URL: https://issues.apache.org/jira/browse/SPARK-6084
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.2.0, 1.2.1
> Environment: Windows 7, Scala 2.11.4, Java 1.8
>Reporter: Nicholas Chammas
>  Labels: windows
>
> Original report here: 
> http://stackoverflow.com/questions/28747795/spark-launch-find-version
> For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:
> {code}
> bin\spark-shell.cmd
> {code}
> Yields the following error:
> {code}
> find: 'version': No such file or directory
> else was unexpected at this time.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6084) spark-shell broken on Windows

2015-02-28 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-6084:
---

 Summary: spark-shell broken on Windows
 Key: SPARK-6084
 URL: https://issues.apache.org/jira/browse/SPARK-6084
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.2.1, 1.2.0
 Environment: Windows 7, Scala 2.11.4, Java 1.8
Reporter: Nicholas Chammas


Original report here: 
http://stackoverflow.com/questions/28747795/spark-launch-find-version

For both spark-1.2.0-bin-hadoop2.4 and spark-1.2.1-bin-hadoop2.4, doing this:

{code}
bin\spark-shell.cmd
{code}

Yields the following error:

{code}
find: 'version': No such file or directory
else was unexpected at this time.
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6083) Make Python API example consistent in NaiveBayes

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341735#comment-14341735
 ] 

Apache Spark commented on SPARK-6083:
-

User 'MechCoder' has created a pull request for this issue:
https://github.com/apache/spark/pull/4834

> Make Python API example consistent in NaiveBayes
> 
>
> Key: SPARK-6083
> URL: https://issues.apache.org/jira/browse/SPARK-6083
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, MLlib
>Reporter: Manoj Kumar
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3413) Spark Blocked due to Executor lost in FIFO MODE

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341732#comment-14341732
 ] 

Sean Owen commented on SPARK-3413:
--

This looks like it might be stale. It's also not a great deal of info to go on. 
The driver should be rescheduling tasks that fail or whose executors fail, 
right? Is there more info? Can this still be reproduced?

> Spark Blocked due to Executor lost in FIFO MODE
> ---
>
> Key: SPARK-3413
> URL: https://issues.apache.org/jira/browse/SPARK-3413
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.2
>Reporter: Patrick Liu
>
> I run Spark on YARN.
> The Spark scheduler is running in FIFO mode.
> I have 80 worker instances set up. However, as time passes, some workers will 
> be lost (killed by the JVM on OOM, etc.).
> But some tasks will still run on those executors. 
> Obviously those tasks will never finish.
> Then the stage will not finish, so the later stages will be blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6083) Make Python API example consistent in NaiveBayes

2015-02-28 Thread Manoj Kumar (JIRA)
Manoj Kumar created SPARK-6083:
--

 Summary: Make Python API example consistent in NaiveBayes
 Key: SPARK-6083
 URL: https://issues.apache.org/jira/browse/SPARK-6083
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, MLlib
Reporter: Manoj Kumar
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6082) SparkSQL should fail gracefully when input data format doesn't match expectations

2015-02-28 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-6082:
-

 Summary: SparkSQL should fail gracefully when input data format 
doesn't match expectations
 Key: SPARK-6082
 URL: https://issues.apache.org/jira/browse/SPARK-6082
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.1
Reporter: Kay Ousterhout


I have a udf that creates a tab-delimited table. If any of the column values 
contain a tab, SQL fails with an ArrayIndexOutOfBounds exception (pasted 
below).  It would be great if SQL failed gracefully here, with a helpful 
exception (something like "One row contained too many values").

It looks like this can be done quite easily, by checking here if i > 
columnBuilders.size and if so, throwing a nicer exception: 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala#L124.

One thing that makes this problem especially annoying to debug is that if 
you do "CREATE table foo as select transform(..." and then "CACHE table foo", 
it works fine.  It only fails if you do "CACHE table foo as select 
transform(...".  Because of this, it would be great if the problem were more 
transparent to users.
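
As a rough illustration of that check (a hedged sketch, not an actual patch; `row` and 
`columnBuilders` are assumed to match the linked code):

{code}
// Hedged sketch: guard the per-column copy loop so a row with extra fields fails
// with a descriptive message instead of an ArrayIndexOutOfBoundsException.
var i = 0
while (i < row.length) {
  if (i >= columnBuilders.length) {
    throw new IllegalArgumentException(
      s"Row has ${row.length} values but the cached table only has " +
      s"${columnBuilders.length} columns; one row contained too many values")
  }
  columnBuilders(i).appendFrom(row, i)
  i += 1
}
{code}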

Stack trace:
java.lang.ArrayIndexOutOfBoundsException: 3
  at 
org.apache.spark.sql.columnar.InMemoryRelation$anonfun$3$anon$1.next(InMemoryColumnarTableScan.scala:125)
  at 
org.apache.spark.sql.columnar.InMemoryRelation$anonfun$3$anon$1.next(InMemoryColumnarTableScan.scala:112)
  at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
  at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
  at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:245)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:56)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:220)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3402) Library for Natural Language Processing over Spark.

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-3402.
--
Resolution: Won't Fix

I think the consensus at this point would be that most third-party libraries 
built on Spark should by default be hosted outside the Spark project, and 
linked to at http://spark-packages.org/  That's the way to go if you've already 
got your own stand-alone project.

Reopen if you mean you have an implementation to submit that fits, likely, the 
new ML Pipelines API.

> Library for Natural Language Processing over Spark.
> ---
>
> Key: SPARK-3402
> URL: https://issues.apache.org/jira/browse/SPARK-3402
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Nagamallikarjuna
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3387) Misleading stage description on the driver UI

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-3387.
--
Resolution: Not a Problem

Your code does not include a call to {{groupBy}}, right? It calls 
{{groupByKey}} and that's part of the list here.
Yes, the actual execution plan does not map directly to what the user called. 
Some user-facing API methods are not distributed operations at all; some invoke 
several different distributed operations. I think this is as-intended.

> Misleading stage description on the driver UI
> -
>
> Key: SPARK-3387
> URL: https://issues.apache.org/jira/browse/SPARK-3387
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.2
> Environment: Java 1.6, OSX Mountain Lion
>Reporter: Christian Chua
>
> Steps to reproduce : compile and run this modified version of the 1.0.2 
> pagerank example :
> public static void main(String[] args) throws Exception {
> JavaSparkContext sc = new JavaSparkContext("local[8]", "Sample");
> JavaRDD < String > inputRDD = sc.textFile(INPUT_FILE,1);
> JavaPairRDD < String , String > a = inputRDD.mapToPair(new 
> PairFunction < String , String , String >() {
> @Override
> public Tuple2 < String , String > call(String s) throws Exception 
> {
> String[] parts = SPACES.split(s);
> return new Tuple2 < String , String >(parts[0], parts[1]);
> }
> });
> JavaPairRDD < String , String > b = a.distinct();
> JavaPairRDD < String , Iterable < String >> c = b.groupByKey(11);
> System.out.println(c.toDebugString());
> System.out.println(c.collect());
> JOptionPane.showMessageDialog(null, "Last Line");
> sc.stop();
> }
> The debug string will appear as :
> MappedValuesRDD[11] at groupByKey at Sample.java:45 (11 partitions)
>   MappedValuesRDD[10] at groupByKey at Sample.java:45 (11 partitions)
> MapPartitionsRDD[9] at groupByKey at Sample.java:45 (11 partitions)
>   ShuffledRDD[8] at groupByKey at Sample.java:45 (11 partitions)
> MappedRDD[7] at distinct at Sample.java:41 (1 partitions)
>   MapPartitionsRDD[6] at distinct at Sample.java:41 (1 partitions)
> ShuffledRDD[5] at distinct at Sample.java:41 (1 partitions)
>   MapPartitionsRDD[4] at distinct at Sample.java:41 (1 partitions)
> MappedRDD[3] at distinct at Sample.java:41 (1 partitions)
>   MappedRDD[2] at mapToPair at Sample.java:30 (1 partitions)
> MappedRDD[1] at textFile at Sample.java:28 (1 partitions)
>   HadoopRDD[0] at textFile at Sample.java:28 (1 
> partitions)
> The problem is that the "list of stages" in the UI (localhost:4040) does not 
> mention anything about "groupBy" 
> In fact it mentions "distinct" twice:
> stage 0 : collect
> stage 1 : distinct
> stage 2 : distinct
> This piece of misleading information can confuse the learner significantly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341727#comment-14341727
 ] 

Sean Owen commented on SPARK-3312:
--

Interesting, is the reduce / max / min in question here by key? We have the 
{{stats()}} method for RDDs of {{Double}} already to take care of this for a 
whole RDD. Rather than add an API method for the by-key case, it's possible to 
use {{StatCounter}} to compute all of these at once over a bunch of values that 
have been collected by key. Does that do the trick or is this something more?
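
For concreteness, a minimal sketch of that approach, assuming an RDD[(String, Double)] named 
pairs (the variable name is made up):

{code}
import org.apache.spark.SparkContext._
import org.apache.spark.util.StatCounter

// Compute count / mean / min / max / stdev per key in one pass,
// without adding a new groupBy-style API method.
val statsByKey = pairs.aggregateByKey(new StatCounter())(
  (acc, value) => acc.merge(value),   // fold each value into the per-key StatCounter
  (a, b) => a.merge(b)                // combine partial StatCounters across partitions
)
// statsByKey: RDD[(String, StatCounter)]
{code}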

> Add a groupByKey which returns a special GroupBy object like in pandas
> --
>
> Key: SPARK-3312
> URL: https://issues.apache.org/jira/browse/SPARK-3312
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: holdenk
>Priority: Minor
>
> A common pattern which causes problems for new Spark users is using 
> groupByKey followed by a reduce. I'd like to make a special version of 
> groupByKey which returns a groupBy object (like the Panda's groupby object). 
> The resulting class would have a number of functions (min,max, stats, reduce) 
> which could all be implemented efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2930:
-
Component/s: YARN
   Priority: Minor  (was: Major)
 Issue Type: Improvement  (was: Bug)

Seems like a good, easy improvement; I don't know the webhdfs integration well enough to 
write it, but does anyone who does have a moment to take this on?
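
For example, the clarified docs might show something like the following (hosts and ports 
are purely illustrative):

{code}
# spark-defaults.conf -- hedged example of mixing schemes; the hosts are made up
spark.yarn.access.namenodes  hdfs://nn1.example.com:8020,webhdfs://nn1.example.com:50070
{code}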

> clarify docs on using webhdfs with spark.yarn.access.namenodes
> --
>
> Key: SPARK-2930
> URL: https://issues.apache.org/jira/browse/SPARK-2930
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
>
> The documentation of spark.yarn.access.namenodes talks about putting 
> namenodes in it and gives example with hdfs://.  
> I can also be used with webhdfs so we should clarify how to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6081:
-
Component/s: Spark Submit
   Priority: Minor  (was: Major)

> DriverRunner doesn't support pulling HTTP/HTTPS URIs
> 
>
> Key: SPARK-6081
> URL: https://issues.apache.org/jira/browse/SPARK-6081
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Reporter: Timothy Chen
>Priority: Minor
>
> According to the docs, standalone cluster mode supports specifying http|https 
> jar URLs, but the URLs actually passed to the DriverRunner cannot be pulled 
> over HTTP, because it uses Hadoop FileSystem get.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5628) Add option to return spark-ec2 version

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341724#comment-14341724
 ] 

Apache Spark commented on SPARK-5628:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4833

> Add option to return spark-ec2 version
> --
>
> Key: SPARK-5628
> URL: https://issues.apache.org/jira/browse/SPARK-5628
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: backport-needed
> Fix For: 1.3.0, 1.4.0
>
>
> We need a {{--version}} option for {{spark-ec2}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6069:
-
Priority: Critical  (was: Major)
 Summary: Deserialization Error ClassNotFoundException with Kryo, Guava 14  
(was: Deserialization Error ClassNotFound )

To clarify the properties situation, in Spark 1.2.x we have 
{{spark.files.userClassPathFirst}} _and_ {{spark.yarn.user.classpath.first}}. 
{{spark.driver.userClassPathFirst}} and {{spark.executor.userClassPathFirst}} 
are the new more logical versions in 1.3+ only. So ignore those.

{{spark.yarn.user.classpath.first}} is actually what I am setting:
https://github.com/OryxProject/oryx/blob/master/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/BatchLayer.java#L153

But it sounds like you are not using YARN.

Guava 14.0.1 is packaged with the app:
https://github.com/OryxProject/oryx/blob/master/pom.xml#L233

I'm running this on 1.2.0 + YARN, and also local[*] + 1.3.0-SNAPSHOT.

My question is whether this is perhaps not working for standalone in 1.2 but does work in 
1.3, since there has been some overhaul of this mechanism since 1.2.
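
To keep the property names straight, here they are as SparkConf calls (a sketch; which 
ones are honored depends on the Spark version and deploy mode, as described above):

{code}
val conf = new org.apache.spark.SparkConf()
  // 1.2.x names
  .set("spark.files.userClassPathFirst", "true")
  .set("spark.yarn.user.classpath.first", "true")   // YARN only
  // 1.3+ replacements
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
{code}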

> Deserialization Error ClassNotFoundException with Kryo, Guava 14
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>Priority: Critical
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6078) create event log directory automatically if not exists

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6078.
--
Resolution: Duplicate

> create event log directory automatically if not exists
> --
>
> Key: SPARK-6078
> URL: https://issues.apache.org/jira/browse/SPARK-6078
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Zhang, Liye
>
> When the event log directory does not exist, Spark just throws an 
> IllegalArgumentException and stops the job. The user needs to manually create the 
> directory first. It would be better to create the directory automatically if it 
> does not exist.
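
For reference, a minimal sketch of the requested behavior using the Hadoop FileSystem API 
(the path is illustrative; this is not the actual Spark change):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Create the event log directory if it is missing, instead of throwing.
val logDir = new Path("hdfs:///spark-events")   // example value of spark.eventLog.dir
val fs = FileSystem.get(logDir.toUri, new Configuration())
if (!fs.exists(logDir)) {
  fs.mkdirs(logDir)
}
{code}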



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341705#comment-14341705
 ] 

Pat Ferrel commented on SPARK-6069:
---

I agree, that part makes me suspicious, which is why I’m not sure I trust my 
builds completely.

No the ‘app' is one of the Spark-Mahout’s CLI drivers. The jar is a 
dependencies-reduced type thing that has only scopt and guava.

In any case, if I put 
-D:spark.executor.extraClassPath=/Users/pat/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-dependency-reduced.jar
 on the command line, which passes the key=value through the SparkConf to the 
Mahout CLI driver, it works. The test setup is a standalone, localhost-only 
cluster (not local[n]), started with sbin/start-all.sh. The same jar is 
used to create the context, and I’ve checked that and the contents of the jar 
quite carefully.
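
Roughly, the working setup amounts to something like this in the app (a hedged sketch; the 
jar path above must exist at the same location on every worker, and the app name is made up):

{code}
val conf = new org.apache.spark.SparkConf()
  .setAppName("mahout-spark-driver")   // illustrative app name
  .set("spark.executor.extraClassPath",
    "/Users/pat/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-dependency-reduced.jar")
{code}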

On Feb 28, 2015, at 10:09 AM, Sean Owen (JIRA)  wrote:


   [ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341699#comment-14341699
 ] 

Sean Owen commented on SPARK-6069:
--

Hm, the thing is I have been successfully running an app, without spark-submit, 
with kryo, with Guava 14 just like you and have never had a problem. I can't 
figure out what the difference is here.

The kryo not-found exception is stranger still. You aren't packaging spark 
classes with your app, right?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341699#comment-14341699
 ] 

Sean Owen commented on SPARK-6069:
--

Hm, the thing is I have been successfully running an app, without spark-submit, 
with kryo, with Guava 14 just like you and have never had a problem. I can't 
figure out what the difference is here.

The kryo not-found exception is stranger still. You aren't packaging spark 
classes with your app, right?

> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341689#comment-14341689
 ] 

ASF GitHub Bot commented on SPARK-6069:
---

Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/74#issuecomment-76536731
  
This seems to be a bug in Spark 1.2.1 SPARK-6069

The workaround is to add the following either to your SparkConf in your app, or as 
-D:spark.executor.extraClassPath=/Users/pat/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-dependency-reduced.jar

to the mahout spark-xyz driver, where the jar contains any class that needs 
to be deserialized and the path exists on all workers.

Therefore it currently looks like Spark 1.2.1 is not worth supporting.


> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs

2015-02-28 Thread Timothy Chen (JIRA)
Timothy Chen created SPARK-6081:
---

 Summary: DriverRunner doesn't support pulling HTTP/HTTPS URIs
 Key: SPARK-6081
 URL: https://issues.apache.org/jira/browse/SPARK-6081
 Project: Spark
  Issue Type: Improvement
Reporter: Timothy Chen


According to the docs, standalone cluster mode supports specifying http|https 
jar URLs, but the URLs actually passed to the DriverRunner cannot be pulled over 
HTTP, because it uses Hadoop FileSystem get.
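
As a hedged sketch of the gap (illustrative only, not the actual DriverRunner code), a fetch 
helper would need to fall back to plain HTTP streaming for http/https schemes:

{code}
import java.net.{URI, URL}
import java.nio.file.{Files, Paths, StandardCopyOption}

// Illustrative helper: pull http/https URIs directly, leave other schemes to Hadoop FS.
def fetchJar(uri: String, dest: String): Unit = {
  Option(new URI(uri).getScheme).map(_.toLowerCase) match {
    case Some("http") | Some("https") =>
      val in = new URL(uri).openStream()
      try Files.copy(in, Paths.get(dest), StandardCopyOption.REPLACE_EXISTING)
      finally in.close()
    case _ =>
      // hdfs://, file://, etc. would continue to go through Hadoop FileSystem get.
  }
}
{code}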





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs

2015-02-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341687#comment-14341687
 ] 

Apache Spark commented on SPARK-6081:
-

User 'tnachen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4832

> DriverRunner doesn't support pulling HTTP/HTTPS URIs
> 
>
> Key: SPARK-6081
> URL: https://issues.apache.org/jira/browse/SPARK-6081
> Project: Spark
>  Issue Type: Improvement
>Reporter: Timothy Chen
>
> According to the docs, standalone cluster mode supports specifying http|https 
> jar URLs, but the URLs actually passed to the DriverRunner cannot be pulled 
> over HTTP, because it uses Hadoop FileSystem get.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341679#comment-14341679
 ] 

Pat Ferrel commented on SPARK-6069:
---

No goodness from spark.executor.userClassPathFirst either--same error as above. 
I'll try again Monday when I'm back to my regular cluster.

> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341672#comment-14341672
 ] 

Pat Ferrel edited comment on SPARK-6069 at 2/28/15 5:31 PM:


Not sure I completely trust this result--I'm away from my HDFS cluster right 
now, so the standalone Spark setup is not quite the same as before...

Also didn't see your spark.executor.userClassPathFirst comment--will try that next. 

I tried: 

 sparkConf.set("spark.files.userClassPathFirst", "true")

But got the following error:

15/02/28 09:23:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
(TID 0, 192.168.0.7): java.lang.NoClassDefFoundError: 
org/apache/spark/serializer/KryoRegistrator
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:50)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103)
at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:103)
at 
org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:159)
at 
org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:121)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:214)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.serializer.KryoRegistrator
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 36 more



was (Author: pferrel):
Not sure I completely trust this result--I'm away from my HDFS cluster right 
now, so the standalone Spark setup is not quite the same as before...

I tried: 

 sparkConf.set("spark.files.userClassPathFirst", "true")

But got the following error:

15/02/28 09:23:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
(TID 0, 192.168.0.7): java.lang.NoClassDefFoundError: 
org/apache/spark/serializer/KryoRegistrator
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defi

[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341672#comment-14341672
 ] 

Pat Ferrel commented on SPARK-6069:
---

Not sure I completely trust this result--I'm away from my HDFS cluster right 
now, so the standalone Spark setup is not quite the same as before...

I tried: 

 sparkConf.set("spark.files.userClassPathFirst", "true")

But got the following error:

15/02/28 09:23:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
(TID 0, 192.168.0.7): java.lang.NoClassDefFoundError: 
org/apache/spark/serializer/KryoRegistrator
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:50)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103)
at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:103)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:103)
at 
org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:159)
at 
org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:121)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:214)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.serializer.KryoRegistrator
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 36 more


> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard 

[jira] [Resolved] (SPARK-5993) Published Kafka-assembly JAR was empty in 1.3.0-RC1

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5993.
--
Resolution: Fixed

Looks like this was resolved in https://github.com/apache/spark/pull/4753

> Published Kafka-assembly JAR was empty in 1.3.0-RC1
> ---
>
> Key: SPARK-5993
> URL: https://issues.apache.org/jira/browse/SPARK-5993
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Blocker
> Fix For: 1.3.0
>
>
> This is because the maven build generated two JARs:
> 1. an empty JAR file (since kafka-assembly has no code of its own)
> 2. an assembly JAR file containing everything, in a different location than 1
> The maven publishing plugin uploaded 1 and not 2. 
> Instead, if 2 is not configured to be generated in a different location, there is 
> only one JAR containing everything, which gets published.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341655#comment-14341655
 ] 

Sean Owen commented on SPARK-6069:
--

This is an app-level setting as it's specific to the app. I would make the 
change in your app rather than globally. Although the new prop is 
spark.executor.userClassPathFirst I haven't double-checked whether that's 1.3+ 
only. Heh, set them all.

> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341640#comment-14341640
 ] 

Pat Ferrel edited comment on SPARK-6069 at 2/28/15 4:48 PM:


I can try it. Are you suggesting an app change or a master conf change?

Do I need to add this to conf/spark-defaults.conf?
spark.files.userClassPathFirst  true

Or should I add that to the context via SparkConf?

We have a standalone app that is not launched via spark-submit. But I guess 
your comment suggests an app change via SparkConf so I'll try that.


was (Author: pferrel):
I can try it. Are you suggesting an app change or a master conf change?

Do I need to add this to conf/spark-defaults.conf?
spark.files.userClassPathFirst  true

Or should I add that to the context via SparkConf?

We have a standalone app that is not launched via spark-submit.

> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341640#comment-14341640
 ] 

Pat Ferrel edited comment on SPARK-6069 at 2/28/15 4:47 PM:


I can try it. Are you suggesting an app change or a master conf change?

Do I need to add this to conf/spark-defaults.conf?
spark.files.userClassPathFirst  true

Or should I add that to the context via SparkConf?

We have a standalone app that is not launched via spark-submit.


was (Author: pferrel):
I can try it. Are you suggesting an app change or a master conf change?

Do I need to add this to conf/spark-defaults.conf?
spark.files.userClassPathFirst  true

Or should I add that to the context via SparkConf?

> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341640#comment-14341640
 ] 

Pat Ferrel commented on SPARK-6069:
---

I can try it. Are you suggesting an app change or a master conf change?

Do I need to add this to conf/spark-defaults.conf?
spark.files.userClassPathFirst  true

Or should I add that to the context via SparkConf?

> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6068) KMeans Parallel test may fail

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341634#comment-14341634
 ] 

Sean Owen commented on SPARK-6068:
--

Yes, it seems like too much change to the existing version. From 
https://github.com/apache/spark/pull/2634 it seems like there are just some 
differences of opinion about what's worth doing and how. I think the only way 
forward would be to propose integrating what you've done for the new version in 
the {{.ml}} package, because it's not clear the existing PR isn't going to 
proceed.

I'm hoping to just drive a resolution to what is almost one big issue rather 
than leave it hanging. I'm looking at the ~8 JIRAs for k-means you created:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20reporter%20%3D%20%22Derrick%20Burns%22%20AND%20resolution%20%3D%20Unresolved

I assume a couple (like this one) are 'back-portable' from your work to the 
existing impl. Can we zap those and close them with a PR? This would be great 
and I'd like to help get those quick wins in.

The rest sound like interdependent aspects of one proposal: create a new 
k-means implementation with different design and properties X / Y / Z, and use 
it in the new pipelines API. (I can't say whether this would be accepted or not, 
but that's what's on the table.) I'd rather collect that coherently than 
have it live in pieces in JIRA, especially since I'm getting the sense these 
remaining pieces won't otherwise move forward.

> KMeans Parallel test may fail
> -
>
> Key: SPARK-6068
> URL: https://issues.apache.org/jira/browse/SPARK-6068
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.2.1
>Reporter: Derrick Burns
>  Labels: clustering
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The test "k-means|| initialization" in KMeansSuite can fail when the random 
> number generator is truly random.
> The test is predicated on the assumption that each round of K-Means || will 
> add at least one new cluster center.  The current implementation of K-Means 
> || adds 2*k cluster centers with high probability.  However, there is no 
> deterministic lower bound on the number of cluster centers added.
> Choices are:
> 1)  change the KMeans || implementation to iterate on selecting points until 
> it has satisfied a lower bound on the number of points chosen.
> 2) eliminate the test
> 3) ignore the problem and depend on the random number generator to sample the 
> space in a lucky manner. 
> Option (1) is most in keeping with the contract that KMeans || should provide 
> a precise number of cluster centers when possible. 
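
A hedged sketch of what option (1) could look like; sampleRound is a hypothetical stand-in 
for one k-means|| oversampling pass, not Spark's implementation:

{code}
// Keep running sampling rounds until at least k candidate centers exist
// (with a cap on rounds), so the test's assumption holds deterministically.
def selectAtLeastK[T](k: Int, maxRounds: Int, seed: Vector[T])
                     (sampleRound: Vector[T] => Vector[T]): Vector[T] = {
  var centers = seed
  var round = 0
  while (centers.length < k && round < maxRounds) {
    centers = centers ++ sampleRound(centers)   // hypothetical oversampling step
    round += 1
  }
  centers
}
{code}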



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341625#comment-14341625
 ] 

Sean Owen commented on SPARK-6069:
--

No, I am not suggesting that {{--conf spark.executor.extraClassPath}} is a 
right way to do this, but {{userClassPathFirst}} may be. There is no class 
conflict problem, but there is definitely a classloader visibility and thus 
ordering problem. It's worth a try if you have a second, since I would think 
this is the right way to address this and really any of this type of issue. It 
remains to be seen though.

> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFound

2015-02-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341608#comment-14341608
 ] 

Pat Ferrel commented on SPARK-6069:
---

It may be a dup, [~vanzin] said as much but I couldn't find the obvious Jira.

Any time the workaround is to use "spark-submit --conf 
spark.executor.extraClassPath=/guava.jar blah”, that means standalone apps 
must have hard-coded paths that are honored on every worker. And as you know, a 
lib is pretty much blocked from using this version of Spark; hence the blocker 
severity. We’ll have to warn people not to use this version of Spark.

I could easily be wrong but userClassPathFirst doesn’t seem to be the issue. 
There is no class conflict.


> Deserialization Error ClassNotFound 
> 
>
> Key: SPARK-6069
> URL: https://issues.apache.org/jira/browse/SPARK-6069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: Standalone one worker cluster on localhost, or any 
> cluster
>Reporter: Pat Ferrel
>
> A class is contained in the jars passed in when creating a context. It is 
> registered with kryo. The class (Guava HashBiMap) is created correctly from 
> an RDD and broadcast but the deserialization fails with ClassNotFound.
> The work around is to hard code the path to the jar and make it available on 
> all workers. Hard code because we are creating a library so there is no easy 
> way to pass in to the app something like:
> spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-02-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Labels: math  (was: )

> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1965) Spark UI throws NPE on trying to load the app page for non-existent app

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1965.
--
   Resolution: Fixed
Fix Version/s: 1.4.0

Issue resolved by pull request 4777
[https://github.com/apache/spark/pull/4777]

> Spark UI throws NPE on trying to load the app page for non-existent app
> ---
>
> Key: SPARK-1965
> URL: https://issues.apache.org/jira/browse/SPARK-1965
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Fix For: 1.4.0
>
>
> If you try to load the Spark UI for an application that doesn't exist:
> sparkHost:8080/app/?appId=foobar
> The UI throws an NPE.  The problem is in ApplicationPage.scala -- Spark 
> proceeds even if the "app" variable is null.  We should handle this more 
> gracefully.
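
A hedged sketch of the kind of guard described (generic and illustrative; the rendering 
functions are hypothetical stand-ins, not the ApplicationPage code):

{code}
// Look up the app and fall back to a friendly "not found" page instead of
// dereferencing a null/absent application and throwing an NPE.
def renderAppPage[A](appId: String, lookup: String => Option[A])
                    (renderApp: A => String, renderNotFound: String => String): String =
  lookup(appId) match {
    case Some(app) => renderApp(app)          // normal detail page
    case None      => renderNotFound(appId)   // graceful error page
  }
{code}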



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-1965) Spark UI throws NPE on trying to load the app page for non-existent app

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-1965:


Assignee: Sean Owen

> Spark UI throws NPE on trying to load the app page for non-existent app
> ---
>
> Key: SPARK-1965
> URL: https://issues.apache.org/jira/browse/SPARK-1965
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Kay Ousterhout
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 1.4.0
>
>
> If you try to load the Spark UI for an application that doesn't exist:
> sparkHost:8080/app/?appId=foobar
> The UI throws an NPE.  The problem is in ApplicationPage.scala -- Spark 
> proceeds even if the "app" variable is null.  We should handle this more 
> gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5983) Don't respond to HTTP TRACE in HTTP-based UIs

2015-02-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5983:
-
Component/s: (was: Spark Core)
 Web UI
 Labels: security  (was: )

Resolved by https://github.com/apache/spark/pull/4765

> Don't respond to HTTP TRACE in HTTP-based UIs
> -
>
> Key: SPARK-5983
> URL: https://issues.apache.org/jira/browse/SPARK-5983
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>  Labels: security
> Fix For: 1.4.0
>
>
> This was flagged a while ago during a routine security scan: the HTTP-based 
> Spark services respond to an HTTP TRACE command. This is basically an HTTP 
> verb that has no practical use, and has a pretty theoretical chance of being 
> an exploit vector. It is flagged as a security issue by one common tool, 
> however.
> Spark's HTTP services are based on Jetty, which by default does not enable 
> TRACE (like Tomcat). However, the services do reply to TRACE requests. I 
> think it is because the use of Jetty is pretty 'raw' and does not enable much 
> of the default additional configuration you might get by using Jetty as a 
> standalone server.
> I know that it is at least possible to stop the reply to TRACE with a few 
> extra lines of code, so I think it is worth shutting off TRACE requests. 
> Although the security risk is quite theoretical, it should be easy to fix and 
> bring the Spark services into line with the common default of HTTP servers 
> today.
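
As a hedged illustration of those "few extra lines" (one possible approach, not necessarily 
the patch in the linked PR), a servlet-level override can refuse TRACE:

{code}
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

// Refuse TRACE instead of letting the default doTrace echo the request back.
class NoTraceServlet extends HttpServlet {
  override def doTrace(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
    resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
  }
}
{code}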



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


