It is strange that there are always two tasks slower than the others, and the corresponding partitions' data are larger, no matter how many partitions I use. The per-executor metrics look like this (truncated):

Executor ID: 1, Address: slave129.vsvs.com:56691, Task Time: 16 s, Shuffle Read Size / Records: 99.5 MB / ...

I'd appreciate any help.
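One way to check whether a few hot keys are behind the skew (my own suggestion, not something from a reply; the data and names below are placeholders) is to count records per partition and per key before the slow stage:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark

    object SkewCheckSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SkewCheckSketch"))

        // Placeholder data; in the real job this would be the pair RDD feeding the slow stage.
        val keyedRdd = sc.parallelize(Seq("a" -> 1, "a" -> 1, "a" -> 1, "b" -> 1, "c" -> 1), 3)

        // Records per partition: a large imbalance here matches the skew in the stage UI.
        keyedRdd
          .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
          .collect()
          .foreach { case (idx, n) => println(s"partition $idx: $n records") }

        // Hottest keys: one or two dominant keys would explain two persistently slow tasks.
        keyedRdd
          .map { case (k, _) => (k, 1L) }
          .reduceByKey(_ + _)
          .sortBy(_._2, ascending = false)
          .take(10)
          .foreach(println)

        sc.stop()
      }
    }

If a couple of keys dominate, simply adding more partitions will not help those two tasks; the hot keys themselves have to be spread out (for example by salting the key).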
My program runs for 500 iterations, but it usually fails at about 150 iterations. It's hard to explain the details of my program, but I think the program itself is OK, since it sometimes runs successfully. I just want to know in which situations this exception (com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException) can happen.
The detailed error information is:
? and what is the cluster setup that you are using? Given the logs, it looks like the master is dead for some reason.
Thanks
Best Regards
On Sun, Oct 19, 2014 at 2:48 PM, randylu <randylu26@...> wrote:
In addition, the driver receives several DisassociatedEvent messages.
The cluster also runs other applications every hour as normal, so the master is always running. No matter how many cores I use or how much input data I feed it (as long as it is big enough), the application just fails about 1.1 hours later.
My application implements LDA (a topic model, trained with Gibbs sampling); it's hard for me to explain LDA here, so please search for it if needed.
I did increase spark.akka.frameSize to 1 GB (even 5 GB), both in the master/workers' spark-defaults.conf and in SparkConf, but it has no effect at all.
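For reference, a minimal sketch of setting it programmatically (the app name and the rest of my configuration are not shown above; note that spark.akka.frameSize is given in MB on Spark 1.x, so 1 GB is "1024"):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: raising the Akka frame size in SparkConf (value is in MB).
    val conf = new SparkConf()
      .setAppName("lda-gibbs")                 // hypothetical app name
      .set("spark.akka.frameSize", "1024")     // 1 GB expressed in MB
    val sc = new SparkContext(conf)

Settings passed this way only take effect if they are applied before the SparkContext is created.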
I'm
it?
Best,
randylu
Dear all,
In my test program there are 3 partitions for each RDD. The iteration procedure is as follows:

var rdd_0 = ... // init
for (...) {
  rdd_1 = rdd_0.reduceByKey(...).partitionBy(p)   // calculate rdd_1 from rdd_0
  rdd_0 = rdd_0.partitionBy(p).join(rdd_1)...     //
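For context, here is a minimal self-contained sketch of this kind of iterative reduceByKey/partitionBy/join loop (the key/value types, the update rule and the data are my own placeholders, not the original program):

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}
    import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark

    object IterativeJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("IterativeJoinSketch"))
        val p = new HashPartitioner(3)                      // 3 partitions, as in the post

        // Hypothetical initial state: (key, value) pairs.
        var rdd0 = sc.parallelize(Seq((1, 1.0), (2, 2.0), (3, 3.0))).partitionBy(p)

        for (i <- 0 until 10) {
          // Aggregate by key; keeping the same partitioner lets the join avoid an extra shuffle.
          val rdd1 = rdd0.reduceByKey(_ + _).partitionBy(p)
          // Join back and compute the next state (the real update rule isn't shown in the post).
          rdd0 = rdd0.join(rdd1).mapValues { case (v, agg) => v + agg }
        }
        println(rdd0.collect().mkString(", "))
        sc.stop()
      }
    }

Note that each iteration extends the lineage of rdd_0, which is also relevant to the StackOverflowError discussion further down.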
My code is as follows:

documents.flatMap { case words => words.map(w => (w, 1)) }.reduceByKey(_ + _).collect()

In the driver's log, reduceByKey() is finished, but collect() seems to run forever and never finishes. In addition, there are about 200,000,000 words that need to be collected. Is it
Thanks rxin, I still have a doubt about collect().
The number of words before reduceByKey() is about 200 million, and after reduceByKey() it decreases to 18 million. The driver's memory is initialized to 15 GB, and when I print runtime.freeMemory() before reduceByKey(), it shows 13 GB of free memory.
If memory were not enough, an OutOfMemoryError should occur, but there is nothing in the driver's log.
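A rough back-of-envelope check (my own estimate, not from the thread): 18 million collected (word, count) pairs, at something like 100 bytes each once String, Tuple2 and boxing overhead are counted, come to roughly 18,000,000 x 100 B, i.e. about 1.8 GB, so raw heap space alone would not obviously explain the hang with 13 GB free.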
hi, TD. Thanks very much! I got it.
Hi TD, I also fell into the trap of long lineage, and your suggestions do work well. But I don't understand why a long lineage can cause a StackOverflowError, and where it takes effect.
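In case it helps others: a minimal sketch (my own illustration, not code from this thread) of how the lineage grows in an iterative job and how periodic checkpointing truncates it. Each iteration adds one more RDD to the lineage, and walking a very deep DAG recursively during job submission/serialization is what can overflow the stack:

    import org.apache.spark.{SparkConf, SparkContext}

    object LongLineageSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("LongLineageSketch"))
        sc.setCheckpointDir("/tmp/spark-checkpoints")   // hypothetical directory

        var rdd = sc.parallelize(1 to 1000000)
        for (i <- 1 to 1000) {
          rdd = rdd.map(_ + 1)        // every iteration adds one more RDD to the lineage
          if (i % 50 == 0) {
            rdd.checkpoint()          // cut the lineage so the DAG stays shallow
            rdd.count()               // force materialization so the checkpoint actually runs
          }
        }
        println(rdd.count())
        sc.stop()
      }
    }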
rdd.coalesce() will do the trick:

rdd.coalesce(1, true).saveAsTextFile(save_path)
My program runs in standalone mode; the command line is like this:
/opt/spark-1.0.0/bin/spark-submit \
--verbose \
--class $class_name --master spark://master:7077 \
--driver-memory 15G \
--driver-cores 2 \
--deploy-mode cluster \
Hi Andrew Ash, thanks for your reply.
In fact, I have already used unpersist(), but it doesn't take effect.
One reason I chose version 1.0.0 is precisely that it provides the unpersist() interface.
My code is just like the following:

var rdd1 = ...
var rdd2 = ...
var kv = ...
for (i <- 0 until n) {
  var kvGlobal = sc.broadcast(kv)   // broadcast kv
  rdd1 = rdd2.map {
    case t => doSomething(t, kvGlobal.value)
  }
  var tmp =
Even when rdd1 is cached, it has no effect:

var rdd1 = ...
var rdd2 = ...
var kv = ...
for (i <- 0 until n) {
  var kvGlobal = sc.broadcast(kv)   // broadcast kv
  rdd1 = rdd2.map {
    case t => doSomething(t, kvGlobal.value)
  }.cache()
  var tmp =
But when I put the broadcast variable outside the for loop, it works well (leaving aside the memory issue you pointed out):

var rdd1 = ...
var rdd2 = ...
var kv = ...
var kvGlobal = sc.broadcast(kv)   // broadcast kv
for (i <- 0 until n) {
  rdd1 =
I am running Spark 1.0.0, the newest under-development version.
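For reference, a self-contained sketch of the per-iteration broadcast pattern being discussed (kv, doSomething and the data are my own placeholders, not the original code). Re-broadcasting every iteration and explicitly unpersisting the previous broadcast is one common way to get updated values to the workers without piling up old blocks:

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastLoopSketch {
      // Hypothetical per-record update that uses the broadcast map.
      def doSomething(t: (Int, Double), kv: Map[Int, Double]): (Int, Double) =
        (t._1, t._2 + kv.getOrElse(t._1, 0.0))

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("BroadcastLoopSketch"))
        val rdd2 = sc.parallelize(Seq((1, 1.0), (2, 2.0), (3, 3.0))).cache()
        var kv = Map(1 -> 0.1, 2 -> 0.2, 3 -> 0.3)

        for (i <- 0 until 10) {
          val kvGlobal = sc.broadcast(kv)                  // broadcast the current kv
          val rdd1 = rdd2.map(t => doSomething(t, kvGlobal.value))
          kv = rdd1.collect().toMap                        // pull the updated values back to the driver
          kvGlobal.unpersist(blocking = true)              // drop the previous broadcast's blocks
        }
        println(kv)
        sc.stop()
      }
    }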
I found that reading the small broadcast variable always took about 10 s, not 5 s or any other value.
Is there some property/conf (whose default is 10) that controls this timeout?
In my code there are two broadcast variables. Sometimes reading the small one took more time than reading the big one, which is very strange!
The log on a slave node is as follows:

Block broadcast_2 stored as values to memory (estimated size 4.0 KB, free 17.2 GB)
Reading broadcast variable 2 took 9.998537123 s

In addition, reading the big broadcast variable always took about 2 s.
I got it, thanks very much :)
My code is like this:

rdd2 = rdd1.filter(_._2.length > 1)
rdd2.collect()

It works well, but if I use a variable num instead of 1:

var num = 1
rdd2 = rdd1.filter(_._2.length > num)
rdd2.collect()

it fails at rdd2.collect(). So strange!
14/04/23 17:17:40 INFO DAGScheduler: Failed to run collect at SparkListDocByTopic.scala:407
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
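The replies aren't quoted here, but one common cause of a filter failing only when it captures a variable (a guess on my part, not confirmed by the visible messages) is that num is a field of an enclosing class: referencing it inside the closure captures the whole instance, and if that class is not serializable the job fails as soon as the action ships the tasks. A minimal sketch of that failure mode and the usual local-copy workaround:

    import org.apache.spark.SparkContext

    // Hypothetical enclosing class; note it is NOT Serializable.
    class DocFilter(sc: SparkContext) {
      var num = 1

      def failing(): Array[(Long, Array[Int])] = {
        val rdd1 = sc.parallelize(Seq((1L, Array(1, 2)), (2L, Array(3))))
        // num is really this.num, so the closure drags in the whole DocFilter instance.
        rdd1.filter(_._2.length > num).collect()
      }

      def working(): Array[(Long, Array[Int])] = {
        val threshold = num                    // copy the field into a local val first
        val rdd1 = sc.parallelize(Seq((1L, Array(1, 2)), (2L, Array(3))))
        rdd1.filter(_._2.length > threshold).collect()
      }
    }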
@Cheng Lian-2, Sourav Chandra, thanks very much.
You are right! The situation is just like what you said. So nice!
I just call saveAsTextFile() twice. 'doc_topic_dist' is of type RDD[(Long, Array[Int])]; each element is a pair of (doc, topic_arr), and for the same doc the two output files contain different topic_arr values.

...
doc_topic_dist.coalesce(1, true).saveAsTextFile(save_path)

It's OK when I call doc_topic_dist.cache() first.
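A minimal sketch of that behaviour (the data and the randomness are my own placeholders; in the LDA case the nondeterminism presumably comes from the Gibbs sampling step): without cache(), every action recomputes the lineage, so any random choice is made again and the two output files can differ; caching materializes the result once, so both saves see the same data:

    import scala.util.Random
    import org.apache.spark.{SparkConf, SparkContext}

    object DoubleSaveSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DoubleSaveSketch"))

        // Hypothetical stand-in for doc_topic_dist: a nondeterministic topic assignment per doc.
        val docTopicDist = sc.parallelize(1L to 100L)
          .map(doc => (doc, Array.fill(3)(Random.nextInt(10))))
          .cache()                              // without this, each save recomputes and re-samples

        def dump(path: String): Unit =
          docTopicDist
            .map { case (doc, topics) => doc + "\t" + topics.mkString(" ") }
            .coalesce(1, true)
            .saveAsTextFile(path)

        dump("/tmp/out_a")                      // hypothetical output paths
        dump("/tmp/out_b")                      // same values as out_a (order aside) thanks to the cache()
        sc.stop()
      }
    }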