Re: sequenceFile and groupByKey

2014-03-09 Thread Shixiong Zhu
Hi Kane, in the sequence file the key/value class is org.apache.hadoop.io.Text, so you need to convert Text to String. There are two approaches: 1. Use implicit conversions to convert Text to String automatically. I recommend this one. E.g., val t2 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
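The implicit-conversion mechanism Shixiong recommends can be sketched in plain Scala. Note this is a self-contained illustration: MyText below is a made-up stand-in for org.apache.hadoop.io.Text, and in real Spark code the conversion is supplied by Spark's built-in Writable converters rather than a hand-written implicit.

```scala
// Stand-in for org.apache.hadoop.io.Text (hypothetical, for illustration only).
class MyText(val bytes: Array[Byte]) {
  override def toString: String = new String(bytes, "UTF-8")
}

object TextConversion {
  // Implicit conversion: any MyText can be used where a String is expected,
  // so callers never have to call toString explicitly.
  implicit def textToString(t: MyText): String = t.toString

  def main(args: Array[String]): Unit = {
    val raw = new MyText("hello".getBytes("UTF-8"))
    val s: String = raw          // conversion applied automatically
    println(s.toUpperCase)       // String methods are now available
  }
}
```

With the conversion in scope, code that receives Text-like values can treat them as Strings directly, which is why sc.sequenceFile[String, String](...) works once the appropriate implicits are imported.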

Aggregators in GraphX

2014-03-09 Thread Sebastian Schelter
Hi, does GraphX currently support Giraph/Pregel's aggregator feature? I was thinking of implementing a PageRank version that correctly handles dangling vertices (i.e. vertices with no outlinks). Therefore I would have to globally sum up the rank associated with them in every iteration,
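The dangling-vertex correction Sebastian describes can be sketched in plain Scala, without GraphX: in each iteration, globally sum the rank sitting on vertices with no outlinks (the aggregator-style step) and redistribute it uniformly so ranks keep summing to 1. The tiny adjacency-map representation and parameter names here are illustrative assumptions, not GraphX API.

```scala
// One PageRank iteration that handles dangling vertices.
// links: vertex -> outlink targets (empty Seq = dangling vertex)
// ranks: vertex -> current rank (assumed to sum to 1)
// n: number of vertices, d: damping factor
object DanglingPageRank {
  def iterate(links: Map[Int, Seq[Int]], ranks: Map[Int, Double],
              n: Int, d: Double = 0.85): Map[Int, Double] = {
    // Contributions from vertices that do have outlinks.
    val contribs = links.toSeq.flatMap { case (v, outs) =>
      if (outs.nonEmpty) outs.map(u => u -> ranks(v) / outs.size)
      else Seq.empty
    }.groupBy(_._1).map { case (u, cs) => u -> cs.map(_._2).sum }

    // Global sum of rank on dangling vertices -- the value a Pregel-style
    // aggregator would compute in each superstep.
    val danglingMass =
      links.collect { case (v, outs) if outs.isEmpty => ranks(v) }.sum

    // Redistribute the dangling mass uniformly over all vertices.
    ranks.keys.map { v =>
      v -> ((1 - d) / n + d * (contribs.getOrElse(v, 0.0) + danglingMass / n))
    }.toMap
  }
}
```

In GraphX terms, the danglingMass step is just a global reduction over the vertices with zero out-degree, which can stand in for a Pregel aggregator even though GraphX has no aggregator primitive by that name.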

Re: Explain About Logs NetworkWordcount.scala

2014-03-09 Thread Eduardo Costa Alfaia
Yes TD, I can use tcpdump to see whether the data are being accepted by the receiver and whether they are arriving in the IP packets. Thanks. On 3/8/14, 4:19, Tathagata Das wrote: I am not sure how to debug this without any more information about the source. Can you monitor on the receiver side

RE: major Spark performance problem

2014-03-09 Thread Livni, Dana
YARN also has this scheduling option. The problem is that all of our applications have the same flow, where the first stage is the heaviest and the rest are very small. The problem is that when several requests (applications) start to run at the same time, the first stages of all of them are scheduled in parallel, and

State of spark docker script

2014-03-09 Thread Aureliano Buendia
Hi, is the Spark docker script now mature enough to substitute for the spark-ec2 script? Is anyone here using the docker script in production?

Re: major Spark performance problem

2014-03-09 Thread Matei Zaharia
Hi Dana, It’s hard to tell exactly what is consuming time, but I’d suggest starting by profiling the single application first. Three things to look at there: 1) How many stages and how many tasks per stage is Spark launching (in the application web UI at http://driver:4040)? If you have

Spark on YARN use only one node

2014-03-09 Thread Assaf
Hi, I've installed Spark 0.8.1 on IDH 3.0.2 to run on YARN. My cluster has 3 servers: 1 is both NN and DN, the other 2 are DN only. I managed to launch spark-shell and execute the MLlib k-means. The problem is that it uses only one node (the NN) and does not run on the other 2 DNs. Please advise. My spark-env.sh

Re: State of spark docker script

2014-03-09 Thread Aaron Davidson
Whoa, wait, the docker scripts are only used for testing purposes right now. They have not been designed with the intention of replacing the spark-ec2 scripts. For instance, there isn't an ssh server running so you can stop and restart the cluster (like sbin/stop-all.sh). Also, we currently mount

no stdout output from worker

2014-03-09 Thread Sen, Ranjan [USA]
Hi, I have some System.out.println calls in my Java code that work OK in a local environment. But when I run the same code in standalone mode on an EC2 cluster, I do not see the output in the worker stdout (on the worker node under the Spark location/work directory) or at the driver console. Could you help me

CDH5b2, Spark 0.9.0 and shark

2014-03-09 Thread danoomistmatiste
Hi, I am running cdh5b2. I have installed the hadoop2 version of Shark 0.9.0 for cdh5. I want to know if there is a compatible version of Shark that will run with this combination.

Re: Sbt Permgen

2014-03-09 Thread Sandy Ryza
There was an issue related to this fixed recently: https://github.com/apache/spark/pull/103 On Sun, Mar 9, 2014 at 8:40 PM, Koert Kuipers ko...@tresata.com wrote: I edit the last line of sbt/sbt, after which I run: sbt/sbt test On Sun, Mar 9, 2014 at 10:24 PM, Sean Owen so...@cloudera.com wrote:

Re: no stdout output from worker

2014-03-09 Thread Patrick Wendell
Hey Sen, is your code in the driver or inside one of the tasks? If it's in the tasks, the place you would expect the output is the stdout file under spark/appid/work/[stdout/stderr]. Are you seeing at least stderr logs in that folder? If not, then the tasks might not be running on the