Re: Error when run Spark on mesos

2014-04-03 Thread panfei
After upgrading to 0.9.1, everything goes well now. Thanks for the reply. 2014-04-03 13:47 GMT+08:00 andy petrella andy.petre...@gmail.com: Hello, It's indeed due to a known bug, but using another IP for the driver won't be enough (other problems will pop up). An easy solution would be to

How to stop system info output in spark shell

2014-04-03 Thread weida xu
Hi, all. When I start Spark in the shell, it automatically outputs some system info every minute; see below. Can I stop or block the output of this info? I tried the :silent command, but the automatic output remains. 14/04/03 19:34:30 INFO MetadataCleaner: Ran metadata cleaner for
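For reference, the :silent command only suppresses the REPL's result printing, not Spark's logging. A common way to quiet these periodic INFO messages in this era of Spark is to raise the console log level in conf/log4j.properties — a sketch (the appender layout shown is the stock Spark one, not something the thread confirms):

```
# conf/log4j.properties — raise the console log level from INFO to WARN
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Copying conf/log4j.properties.template to conf/log4j.properties and editing the rootCategory line is usually enough.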

Spark Disk Usage

2014-04-03 Thread Surendranauth Hiraman
Hi, I know if we call persist with the right options, we can have Spark persist an RDD's data on disk. I am wondering what happens in intermediate operations that could conceivably create large collections/Sequences, like GroupBy and shuffling. Basically, one part of the question is when is
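The explicit-persistence half of the question maps to RDD.persist with a disk-backed StorageLevel. A minimal sketch, assuming a local context and a placeholder key/value dataset (names are illustrative, not from the thread):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext("local", "disk-usage-demo")
val pairs = sc.parallelize(1 to 1000).map(i => (i % 10, i))

// MEMORY_AND_DISK spills partitions to disk when they do not fit in memory;
// DISK_ONLY would keep them on disk exclusively.
val grouped = pairs.groupByKey().persist(StorageLevel.MEMORY_AND_DISK)
grouped.count()
```

The shuffle side of groupByKey writes intermediate map output to local disk regardless of the persist call; persist only governs where the resulting RDD's partitions live.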

Re: Strange behavior of RDD.cartesian

2014-04-03 Thread Jaonary Rabarisoa
You can find here a gist that illustrates this issue https://gist.github.com/jrabary/9953562 I got this with spark from master branch. On Sat, Mar 29, 2014 at 7:12 PM, Andrew Ash and...@andrewash.com wrote: Is this spark 0.9.0? Try setting spark.shuffle.spill=false There was a hash collision
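For reference, the workaround Andrew suggests is set as a system property before the SparkContext is created (the 0.9.x configuration style) — a sketch with an illustrative app name:

```scala
import org.apache.spark.SparkContext

// 0.9.x-style configuration: disable shuffle spilling to work around
// the hash-collision bug mentioned above.
System.setProperty("spark.shuffle.spill", "false")
val sc = new SparkContext("local", "cartesian-repro")
```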

Re: Avro serialization

2014-04-03 Thread FRANK AUSTIN NOTHAFT
We use avro objects in our project, and have a Kryo serializer for generic Avro SpecificRecords. Take a look at: https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/serialization/ADAMKryoRegistrator.scala Also, Matt Massie has a good blog post
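The linked ADAM registrator follows the standard Spark Kryo pattern. A simplified sketch of that pattern, assuming an Avro-generated SpecificRecord class here called MyRecord (the class names are illustrative, not ADAM's actual code):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Register Avro-generated classes with Kryo; a custom Serializer
// (as ADAM does for SpecificRecords) would be passed as a second
// argument to kryo.register.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyRecord])
  }
}

// Enable it (0.9.x-style properties, set before creating the SparkContext):
System.setProperty("spark.serializer",
  "org.apache.spark.serializer.KryoSerializer")
System.setProperty("spark.kryo.registrator", "MyKryoRegistrator")
```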

Re: Is there a way to get the current progress of the job?

2014-04-03 Thread Philip Ogren
This is great news thanks for the update! I will either wait for the 1.0 release or go and test it ahead of time from git rather than trying to pull it out of JobLogger or creating my own SparkListener. On 04/02/2014 06:48 PM, Andrew Or wrote: Hi Philip, In the upcoming release of Spark

Re: Is there a way to get the current progress of the job?

2014-04-03 Thread Philip Ogren
I can appreciate the reluctance to expose something like the JobProgressListener as a public interface. It's exactly the sort of thing that you want to deprecate as soon as something better comes along and can be a real pain when trying to maintain the level of backwards compatibility that

Re: what does SPARK_EXECUTOR_URI in spark-env.sh do ?

2014-04-03 Thread andy petrella
Indeed, that's how Mesos works. So the tarball just has to be somewhere accessible by the Mesos slaves; that's why it is often put in HDFS. On 3 Apr 2014 18:46, felix cnwe...@gmail.com wrote: So, if I set this parameter, there is no need to copy the spark tarball to every mesos
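In conf/spark-env.sh this looks roughly like the following (the HDFS path and library location are illustrative placeholders, not from the thread):

```shell
# conf/spark-env.sh — point Mesos slaves at a tarball they can all reach,
# e.g. one uploaded to HDFS:
export SPARK_EXECUTOR_URI=hdfs://namenode:9000/spark/spark-0.9.1.tar.gz
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
```

Each Mesos slave downloads and unpacks the tarball from that URI when it launches an executor, so no per-node Spark install is needed.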

Spark 1.0.0 release plan

2014-04-03 Thread Bhaskar Dutta
Hi, Is there any change in the release plan for Spark 1.0.0-rc1 release date from what is listed in the Proposal for Spark Release Strategy thread? == Tentative Release Window for 1.0.0 == Feb 1st - April 1st: General development April 1st: Code freeze for new features April 15th: RC1 Thanks,

Re: Job initialization performance of Spark standalone mode vs YARN

2014-04-03 Thread Kevin Markey
We are now testing precisely what you ask about in our environment. But Sandy's questions are relevant. The bigger issue is not Spark vs. Yarn but "client" vs. "standalone" and where the client is located on the network relative to the cluster. The "client" options

Re: Spark 1.0.0 release plan

2014-04-03 Thread Matei Zaharia
Hey Bhaskar, this is still the plan, though QAing might take longer than 15 days. Right now since we’ve passed April 1st, the only features considered for a merge are those that had pull requests in review before. (Some big ones are things like annotating the public APIs and simplifying

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-03 Thread Vipul Pandey
Any word on this one? On Apr 2, 2014, at 12:26 AM, Vipul Pandey vipan...@gmail.com wrote: I downloaded 0.9.0 fresh and ran the mvn command - the assembly jar thus generated also has both shaded and real versions of the protobuf classes Vipuls-MacBook-Pro-3:spark-0.9.0-incubating vipul$ jar -ftv

Spark SQL transformations, narrow vs. wide

2014-04-03 Thread Jan-Paul Bultmann
Hey, Does somebody know the kinds of dependencies that the new SQL operators produce? I’m specifically interested in the relational join operation as it seems substantially more optimized. The old join was narrow on two RDDs with the same partitioner. Is the relational join narrow as well?

Re: Spark SQL transformations, narrow vs. wide

2014-04-03 Thread Michael Armbrust
I'm sorry, but I don't really understand what you mean when you say wide in this context. For a HashJoin, the only dependencies of the produced RDD are the two input RDDs. For BroadcastNestedLoopJoin The only dependence will be on the streamed RDD. The other RDD will be distributed to all

Re: Optimal Server Design for Spark

2014-04-03 Thread Matei Zaharia
To run multiple workers with Spark’s standalone mode, set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For example, if you have 16 cores and want 2 workers, you could add export SPARK_WORKER_INSTANCES=2 export SPARK_WORKER_CORES=8 Matei On Apr 3, 2014, at 12:38 PM,
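Matei's example, as it would appear in conf/spark-env.sh (splitting a 16-core machine across two workers):

```shell
# conf/spark-env.sh — two standalone workers per node, 8 cores each
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=8
```

With multiple workers per node it is also worth bounding SPARK_WORKER_MEMORY so the instances do not oversubscribe the machine's RAM.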

Re: Optimal Server Design for Spark

2014-04-03 Thread Debasish Das
@Mayur... I am hitting ulimits on the cluster if I go beyond 4 cores per worker, and I don't think I can change the ulimit due to sudo issues etc. If I have more workers, in ALS I can go for 20 blocks (right now I am running 10 blocks on 10 nodes with 4 cores each, and now I can go up to 20 blocks