A question about accumulators

2015-11-10 Thread Tan Tim
Hi, all

There is a discussion about accumulators on Stack Overflow:
http://stackoverflow.com/questions/27357440/spark-accumalator-value-is-different-when-inside-rdd-and-outside-rdd

I commented on this question (as user Tim). Based on the output I got when I
tried it, I have two questions (a reproduction sketch follows the list):
1. Why is the addInPlace function called twice?
2. Why is the order of the two outputs different?
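
For reference, a minimal sketch of how I reproduce it, assuming the Scala
accumulator API with a custom AccumulatorParam (LoggingIntParam and AccumDemo
are my own names, for illustration only). One plausible reason addInPlace
fires more than once per update: += on the executor routes through addInPlace
(via the default addAccumulator), and the driver calls it again when merging
each finished task's local value. Task-completion order is also not
deterministic, which may explain why the two outputs appear in a different
order:

import org.apache.spark.{AccumulatorParam, SparkConf, SparkContext}

// Custom param that logs every merge, to see when addInPlace fires.
object LoggingIntParam extends AccumulatorParam[Int] {
  def zero(initial: Int): Int = 0
  def addInPlace(r1: Int, r2: Int): Int = {
    println(s"addInPlace($r1, $r2)")
    r1 + r2
  }
}

object AccumDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("AccumDemo").setMaster("local[2]"))
    val acc = sc.accumulator(0)(LoggingIntParam)
    val rdd = sc.parallelize(1 to 10, 2).map { x => acc += x; x }
    rdd.count() // accumulator updates only happen once an action runs
    println(s"final value = ${acc.value}")
    sc.stop()
  }
}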

Any suggestion will be appreciated.


Re: why a machine learning application runs slowly on the spark cluster

2014-07-29 Thread Tan Tim
The application is Logistic Regression (OWLQN); we developed a sparse-vector
version. The feature dimension is 1M+, but the data is very sparse. This
application can run on another Spark cluster, where every stage takes about
50 seconds and every executor shows high CPU usage. The only difference is
the OS (the faster cluster is Ubuntu, the slower one is CentOS).


Re: why a machine learning application runs slowly on the spark cluster

2014-07-29 Thread Tan Tim
> input data is evenly distributed to the executors.

The input data is on HDFS, not already on the Spark executors. How can I
make the data distributed to the executors?


On Wed, Jul 30, 2014 at 1:52 PM, Xiangrui Meng  wrote:

> The weight vector is usually dense and if you have many partitions,
> the driver may slow down. You can also take a look at the driver
> memory inside the Executor tab in WebUI. Another setting to check is
> the HDFS block size and whether the input data is evenly distributed
> to the executors. Are the hardware specs the same for the two
> clusters? -Xiangrui


Re: why a machine learning application runs slowly on the spark cluster

2014-07-30 Thread Tan Tim
I modified the code from:
lines.map(parsePoint).persist(StorageLevel.MEMORY_ONLY)
to:
lines.map(parsePoint).repartition(64).persist(StorageLevel.MEMORY_ONLY)

Every stage now runs much faster, about 30 seconds (down from 3.5 minutes).
But I found the total task count dropped from 200 to 64 after the first
stage.


But I don't know if this is reasonable.
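
For reference, a minimal sketch of why the task count follows the partition
count (parsePoint here is just a stand-in for the parser above, and the HDFS
path is a placeholder):

import org.apache.spark.storage.StorageLevel

// Stand-in for the parser used in this thread.
def parsePoint(line: String): String = line

val lines = sc.textFile("hdfs://...") // ~200 HDFS blocks => ~200 partitions
val points = lines.map(parsePoint)
println(points.partitions.size)       // ~200: the first stage runs ~200 tasks

val trainset = points.repartition(64).persist(StorageLevel.MEMORY_ONLY)
println(trainset.partitions.size)     // 64: every later stage runs 64 tasks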


On Wed, Jul 30, 2014 at 2:11 PM, Xiangrui Meng  wrote:

> After you load the data in, call `.repartition(number of
> executors).cache()`. If the data is evenly distributed, it may be hard
> to guess the root cause. Do the two clusters have the same internode
> bandwidth? -Xiangrui


how to track the job status without the web UI

2014-09-18 Thread Tan Tim
Hi, all,

I can see that the job failed from the web UI. But when I run ps on the
client (the machine I submitted the job from), I find the process still
exists:
user_tt   5971  2.6  2.2 15030180 3029840 ?Sl   11:41   4:37 java  -cp
/var/bh/lib/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar:/home/user_tt/pole_star_2.0/lib/owlqn_2.10-1.0.jar
OWLQN

I run the Spark job in the background, so I use the pid to track the job's
status. My question is: how can I tell that the job has actually failed when
the driver process is still alive? A sketch of one workaround is below.
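
A minimal sketch of one workaround (an idea, not a confirmed fix): make the
driver exit with a non-zero status on failure, so a background launcher can
check the exit code instead of only the pid. runTraining is a hypothetical
stand-in for the actual OWLQN driver logic:

object OWLQN {
  // Hypothetical stand-in for the actual training logic in this thread.
  def runTraining(args: Array[String]): Unit = {
    // ... build the SparkContext, run OWLQN, save the model ...
  }

  def main(args: Array[String]): Unit = {
    try {
      runTraining(args)
      sys.exit(0) // launcher sees exit code 0 on success
    } catch {
      case e: Exception =>
        System.err.println("Job failed: " + e.getMessage)
        sys.exit(1) // non-zero exit code signals failure
    }
  }
}

Launched from a shell script, the status can then be read from $? after the
java command returns.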


tim.tan


Re: Spark runs slow after unexpected repartition

2014-09-18 Thread Tan Tim
I also encountered a similar problem: after some stages, all the tasks are
assigned to one machine, and stage execution gets slower and slower.

*[the spark conf setting]*
val conf = new SparkConf()
  .setMaster(sparkMaster)
  .setAppName("ModelTraining")
  .setSparkHome(sparkHome)
  .setJars(List(jarFile))
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "LRRegistrator")
conf.set("spark.storage.memoryFraction", "0.7")
conf.set("spark.executor.memory", "8g")
conf.set("spark.cores.max", "150")
conf.set("spark.speculation", "true")
conf.set("spark.storage.blockManagerHeartBeatMs", "30")

val sc = new SparkContext(conf)
val lines = sc.textFile("hdfs://xxx:52310" + inputPath, 3)
val trainset = lines.map(parseWeightedPoint).repartition(50).persist(StorageLevel.MEMORY_ONLY)

*[the warn log from the spark]*
14/09/19 10:26:23 WARN TaskSetManager: Loss was due to fetch failure from BlockManagerId(45, TS-BH109, 48384, 0)
14/09/19 10:27:18 WARN TaskSetManager: Lost TID 726 (task 14.0:9)
14/09/19 10:29:03 WARN SparkDeploySchedulerBackend: Ignored task status update (737 state FAILED) from unknown executor Actor[akka.tcp://sparkExecutor@TS-BH96:33178/user/Executor#-913985102] with ID 39
14/09/19 10:29:03 WARN TaskSetManager: Loss was due to fetch failure from BlockManagerId(30, TS-BH136, 28518, 0)
14/09/19 11:01:22 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(47, TS-BH136, 31644, 0) with no recent heart beats: 47765ms exceeds 45000ms
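
One detail worth double-checking against the conf above (an observation, not
a confirmed root cause): properties ending in "Ms" are in milliseconds, so
"30" means a 30 ms heartbeat interval. If 30 seconds was intended, the
hypothetical correction would be:

conf.set("spark.storage.blockManagerHeartBeatMs", "30000") // 30 s, not 30 ms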

Any suggestions?

On Thu, Sep 18, 2014 at 4:46 PM, shishu  wrote:

>  Hi dear all~
>
> My spark application sometimes runs much slower than it used to, so I
> wonder why this happens.
>
> I found that after a repartition stage (stage 17), all tasks go to one
> executor. But in my code, I only use repartition at the very beginning.
>
> In my application, every stage before stage 17 runs successfully within 1
> minute, but every stage after stage 17 costs more than 10 minutes.
> Normally my application finishes successfully within 9 minutes.
>
> My spark version is 0.9.1, and my program is written in Scala.
>
>
>
> I took some screenshots; you can see them in the archive.
>
>
>
> Great thanks if you can help~
>
>
>
> Shi Shu