Re: RE: Spark checkpoint problem

2015-11-26 Thread eric wong
I don't think it is a deliberate design. You may need to run an action on the RDD after calling checkpoint() on it, if you want the RDD to be checkpointed explicitly. 2015-11-26 13:23 GMT+08:00 wyphao.2007 : > Spark 1.5.2. > > On 2015-11-26 13:19:39, "张志强(旺轩)" wrote: >
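The behavior described above can be sketched as follows (a minimal sketch that assumes a running Spark cluster; the checkpoint directory and data are illustrative). checkpoint() only marks the RDD; the data is actually written out when an action materializes it:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
sc.setCheckpointDir("/tmp/spark-checkpoints")

val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint()   // only marks the RDD for checkpointing
rdd.count()        // this action triggers the actual checkpoint write

assert(rdd.isCheckpointed)
```

Without the count() (or some other action), the checkpoint files are never written, which matches the advice in the reply above.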

Re: when cached RDD will unpersist its data

2015-06-23 Thread eric wong
When memory cannot hold all of the cached RDDs, the BlockManager will evict some older blocks to make room for new RDD blocks. Hope that is helpful. 2015-06-24 13:22 GMT+08:00 bit1...@163.com bit1...@163.com: I am kind of confused about when a cached RDD will unpersist its data. I know we
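A short sketch of the two cases (assumes an existing SparkContext `sc`; the HDFS path is illustrative): MEMORY_ONLY blocks are silently dropped under memory pressure and recomputed from lineage when needed, while unpersist() removes the blocks explicitly:

```scala
import org.apache.spark.storage.StorageLevel

val cached = sc.textFile("hdfs:///data/points").persist(StorageLevel.MEMORY_ONLY)
cached.count()   // materializes and caches the partitions

// ... later, free the memory explicitly instead of waiting for eviction:
cached.unpersist(blocking = true)
```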

How to set DEBUG level log of spark executor on Standalone deploy mode

2015-04-29 Thread eric wong
Hi, I want to check the DEBUG log of the Spark executor in Standalone deploy mode. But: 1. Set log4j.properties in the spark/conf folder on the master node and restart the cluster. None of the above works. 2. Using spark-submit --properties-file log4j just prints debug logs to the screen, but the executor log still seems to
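For reference, a conf/log4j.properties along these lines is the usual way to get executor DEBUG output in standalone mode (a sketch; note it has to be present on every worker node, not just the master, since each executor JVM reads its own local copy):

```properties
# conf/log4j.properties on every worker node
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```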

WebUI shows poor locality when task scheduling

2015-04-21 Thread eric wong
Hi, when running an experimental KMeans job, the cached RDD is the original points data. I saw poor locality in the task details on the WebUI. Almost half of the tasks' input is Network instead of Memory, and a task with network input consumes almost the same time compared with the task
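One knob worth checking here (an assumption about the cause, not a confirmed fix): the scheduler gives up waiting for a data-local slot after spark.locality.wait and launches the task on a non-local executor, which then shows up as Network input in the WebUI. Raising it trades scheduling latency for locality:

```properties
# spark-defaults.conf: wait longer for a process-/node-local slot
# before falling back to rack-local or any executor (default 3s)
spark.locality.wait 10s
```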

Re: How does Spark honor data locality when allocating computing resources for an application

2015-03-14 Thread eric wong
You seem not to have noticed the configuration variable spreadOutApps and its comment: // As a temporary workaround before better ways of configuring memory, we allow users to set // a flag that will perform round-robin scheduling across the nodes (spreading out each app // among all the
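The flag that comment refers to is exposed to users as spark.deploy.spreadOut on the standalone master (a sketch; this only applies to standalone deploy mode):

```properties
# spark-defaults.conf (standalone master):
# true (default) spreads each app across as many nodes as possible;
# false packs it onto as few nodes as possible
spark.deploy.spreadOut false
```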

Re: Re: I think I am almost lost in the internals of Spark

2015-01-06 Thread eric wong
A good starting point if you read Chinese: https://github.com/JerryLead/SparkInternals/tree/master/markdown 2015-01-07 10:13 GMT+08:00 bit1...@163.com bit1...@163.com: Thank you, Tobias. I will look into the Spark paper. But it looks like the paper has been moved,

Re: how to set log level of spark executor on YARN(using yarn-cluster mode)

2014-10-20 Thread eric wong
, eric wong win19...@gmail.com wrote: Hi, I want to check the DEBUG log of the Spark executor on YARN (using yarn-cluster mode), but: 1. yarn daemonlog setlevel DEBUG YarnChild.class 2. set log4j.properties in the spark/conf folder on the client node. Neither of the above works. So how can I set
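A common way to do this on YARN (a sketch; the paths, class, and jar names are illustrative) is to ship a custom log4j.properties into every container with --files and point the executor JVM at it via spark.executor.extraJavaOptions:

```shell
# Ship log4j.properties to every YARN container and make the
# executor JVM load it instead of the default
spark-submit \
  --master yarn-cluster \
  --files /path/to/log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.MyApp \
  myapp.jar
```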

how to submit multiple jar files when using spark-submit script in shell?

2014-10-17 Thread eric wong
Hi, I am using the comma-separated style to submit multiple jar files in the shell command below, but it does not work: bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans --master yarn-cluster --executor-memory 2g --jars
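For reference, a sketch of a working invocation (the jar paths and application arguments are illustrative): the --jars list must be comma-separated with no spaces, come before the application jar, and note the typo in the original command (--execur-memory should be --executor-memory):

```shell
bin/spark-submit \
  --class org.apache.spark.examples.mllib.JavaKMeans \
  --master yarn-cluster \
  --executor-memory 2g \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  spark-examples.jar hdfs:///input/points 8 10
```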

how to set log level of spark executor on YARN(using yarn-cluster mode)

2014-10-15 Thread eric wong
Hi, I want to check the DEBUG log of the Spark executor on YARN (using yarn-cluster mode), but: 1. yarn daemonlog setlevel DEBUG YarnChild.class 2. set log4j.properties in the spark/conf folder on the client node. Neither of the above works. So how can I set the log level of the Spark executor on a YARN container