Re: Spark stalls or hangs: is this a clue? remote fetches seem to never return?

2015-02-05 Thread Xuefeng Wu
What does the jstack dump show? Yours respectfully, Xuefeng Wu (吴雪峰) On Feb 6, 2015, at 10:20 AM, Michael Albert m_albert...@yahoo.com.INVALID wrote: My apologies for following up my own post, but I thought this might be of interest. I terminated the Java process corresponding to the executor which had

Re: how to debug this kind of error, e.g. lost executor?

2015-02-05 Thread Xuefeng Wu
Could you find the shuffle files? Or were the files deleted by other processes? Yours respectfully, Xuefeng Wu (吴雪峰) On Feb 5, 2015, at 11:14 PM, Yifan LI iamyifa...@gmail.com wrote: Hi, I am running a GraphX application with heavy memory/CPU overhead. I think the memory is sufficient and set RDDs

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Xuefeng Wu
: user-h...@spark.apache.org -- ~Yours respectfully, Xuefeng Wu/吴雪峰

Re: Is it possible to store graph directly into HDFS?

2014-12-30 Thread Xuefeng Wu
How about saving it as an object? Yours respectfully, Xuefeng Wu (吴雪峰) On Dec 30, 2014, at 9:27 PM, Jason Hong begger3...@gmail.com wrote: Dear all :) We're trying to build a graph from large input data and get a subgraph by applying some filter. Now we want to save this graph to HDFS so that we can load later

Re: Alternatives to groupByKey

2014-12-03 Thread Xuefeng Wu
Looks good. I am concerned about foldLeftByKey, which seems to break consistency with foldLeft in RDD and aggregateByKey in PairRDD. Yours respectfully, Xuefeng Wu (吴雪峰) On Dec 4, 2014, at 7:47 AM, Koert Kuipers ko...@tresata.com wrote: foldLeftByKey

How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu
= for { (_, ageScores) <- takeTop(scores, _.age); (_, numScores) <- takeTop(ageScores, _.num) } yield { numScores }; topScores.size -- ~Yours respectfully, Xuefeng Wu/吴雪峰

Re: How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu
Hi Debasish, I found test code in map translate. Would it collect all products too? + val sortedProducts = products.toArray.sorted(ord.reverse) Yours respectfully, Xuefeng Wu (吴雪峰) On Dec 2, 2014, at 1:33 AM, Debasish Das debasish.da...@gmail.com wrote: rdd.top collects it on master... If you want

Re: [scala-user] Why aggregate is inconsistent?

2014-10-30 Thread Xuefeng Wu
, 2014 at 5:39 PM, Xuefeng Wu ben...@gmail.com wrote: scala> import scala.collection.GenSeq scala> val seq = GenSeq("This", "is", "an", "example") scala> seq.aggregate("0")(_ + _, _ + _) res0: String = 0Thisisanexample scala> seq.par.aggregate("0")(_ + _, _ + _) res1: String = 0This0is0an0example
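
The two results quoted above can be reproduced without parallel collections. The following is only a sketch: `chunkedAggregate` is a hypothetical helper, not part of the Scala library, and it assumes that a parallel aggregate folds each chunk starting from the zero value and then combines the partial results. How the real `par.aggregate` splits its input is an implementation detail and may differ.

```scala
object AggregateSketch {
  // Hypothetical helper: fold every chunk from the same zero value,
  // then merge the per-chunk results with combOp.
  def chunkedAggregate[A, B](chunks: Seq[Seq[A]], zero: B)(
      seqOp: (B, A) => B,
      combOp: (B, B) => B): B =
    chunks.map(_.foldLeft(zero)(seqOp)).reduce(combOp)

  def main(args: Array[String]): Unit = {
    val words = Seq("This", "is", "an", "example")

    // One chunk behaves like the sequential call: the zero is used once.
    println(chunkedAggregate(Seq(words), "0")(_ + _, _ + _))

    // One chunk per element: the zero is used once per chunk.
    println(chunkedAggregate(words.map(Seq(_)), "0")(_ + _, _ + _))

    // With a zero that is neutral for combOp, chunking no longer matters.
    println(chunkedAggregate(words.map(Seq(_)), "")(_ + _, _ + _))
  }
}
```

Under these assumptions the first call yields `0Thisisanexample` and the second `0This0is0an0example`, which suggests the difference comes from the zero value `"0"` not being a neutral element for string concatenation, rather than from `aggregate` itself.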

Re: how to use SPARK_PUBLIC_DNS

2014-08-10 Thread Xuefeng Wu
There is a Docker script for Spark 0.9 in the Spark git repository. Yours respectfully, Xuefeng Wu (吴雪峰) On Aug 10, 2014, at 8:27 PM, 诺铁 noty...@gmail.com wrote: Hi all, I am playing with Docker, trying to create a Spark cluster with Docker containers. Since the Spark master, worker, and driver all need to visit each