After upgrading to 0.9.1, everything works well now. Thanks for the reply.
2014-04-03 13:47 GMT+08:00 andy petrella andy.petre...@gmail.com:
Hello,
It's indeed due to a known bug, but using another IP for the driver won't
be enough (other problems will pop up).
An easy solution would be to
Hi, all
When I start Spark in the shell, it automatically outputs some system info
every minute; see below. Can I stop or block the output of this info? I
tried the :silent command, but the automatic output remains.
14/04/03 19:34:30 INFO MetadataCleaner: Ran metadata cleaner for
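Those lines come from Spark's log4j logging rather than the REPL itself, so
:silent won't help. A minimal sketch (not from this thread) of quieting them
from inside the shell by raising the log level:

    import org.apache.log4j.{Level, Logger}

    // Raise the log level for Spark's (and Akka's) loggers so routine INFO
    // messages like the MetadataCleaner line above are suppressed.
    Logger.getLogger("org").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)

The same effect can be achieved persistently by editing conf/log4j.properties.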
Hi,
I know if we call persist with the right options, we can have Spark persist
an RDD's data on disk.
I am wondering what happens in intermediate operations that could
conceivably create large collections/Sequences, like GroupBy and shuffling.
Basically, one part of the question is when is
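For the first part, a minimal sketch (the input path is illustrative, not
from the original message) of explicitly persisting an RDD to disk:

    import org.apache.spark.storage.StorageLevel

    // Keep this RDD's partitions on local disk instead of in memory.
    val pairs = sc.textFile("hdfs:///some/input")
      .map(line => (line.length, line))
    pairs.persist(StorageLevel.DISK_ONLY)
    pairs.groupByKey().count()   // the grouped data is still built by the shuffle machinery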
You can find here a gist that illustrates this issue
https://gist.github.com/jrabary/9953562
I got this with Spark built from the master branch.
On Sat, Mar 29, 2014 at 7:12 PM, Andrew Ash and...@andrewash.com wrote:
Is this Spark 0.9.0? Try setting spark.shuffle.spill=false. There was a
hash collision
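A minimal sketch (the app name is illustrative) of turning that setting off
when building the context:

    import org.apache.spark.{SparkConf, SparkContext}

    // Disable shuffle spilling, as suggested above for the 0.9.0 issue.
    val conf = new SparkConf()
      .setAppName("spill-workaround")
      .set("spark.shuffle.spill", "false")
    val sc = new SparkContext(conf)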
We use Avro objects in our project, and have a Kryo serializer for generic
Avro SpecificRecords. Take a look at:
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/serialization/ADAMKryoRegistrator.scala
Also, Matt Massie has a good blog post
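For reference, a rough sketch of how a custom registrator gets wired in (the
class and record names here are hypothetical, not ADAM's; see the link above
for the real registrator):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical registrator: register your Avro SpecificRecord classes here.
    class MyAvroRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        // kryo.register(classOf[MySpecificRecord], new MySpecificRecordSerializer())
      }
    }

    // Point Spark at Kryo and at the registrator (use the fully-qualified class name).
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyAvroRegistrator")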
This is great news, thanks for the update! I will either wait for the
1.0 release or go and test it ahead of time from git rather than trying
to pull it out of JobLogger or creating my own SparkListener.
On 04/02/2014 06:48 PM, Andrew Or wrote:
Hi Philip,
In the upcoming release of Spark
I can appreciate the reluctance to expose something like the
JobProgressListener as a public interface. It's exactly the sort of
thing that you want to deprecate as soon as something better comes along
and can be a real pain when trying to maintain the level of backwards
compatibility that
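For anyone who does go the custom-listener route in the meantime, a rough
sketch (the event's field names vary between Spark versions, so treat this
as an outline rather than a drop-in implementation):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

    // Log a line whenever a stage finishes; real code would pull timing and
    // task metrics off the event object.
    class StageLoggingListener extends SparkListener {
      override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) {
        println("Stage completed: " + stageCompleted)
      }
    }

    sc.addSparkListener(new StageLoggingListener)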
Indeed, that's how Mesos works. The tarball just has to be somewhere
accessible to the Mesos slaves, which is why it is often put in
HDFS.
On 3 Apr 2014 18:46, felix cnwe...@gmail.com wrote:
So, if I set this parameter, there is no need to copy the Spark tarball to
every Mesos
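A minimal sketch (the HDFS path is illustrative) of pointing the Mesos
executors at a tarball in HDFS instead of copying it to each slave:

    import org.apache.spark.SparkConf

    // Mesos slaves download and unpack this tarball themselves.
    val conf = new SparkConf()
      .set("spark.executor.uri", "hdfs://namenode:9000/dist/spark-0.9.1.tar.gz")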
Hi,
Is there any change in the release plan for the Spark 1.0.0-rc1 release date
from what is listed in the "Proposal for Spark Release Strategy" thread?
== Tentative Release Window for 1.0.0 ==
Feb 1st - April 1st: General development
April 1st: Code freeze for new features
April 15th: RC1
Thanks,
We are now testing precisely what you ask about in our environment.
But Sandy's questions are relevant. The bigger issue is not Spark
vs. Yarn but "client" vs. "standalone" and where the client is
located on the network relative to the cluster.
The "client" options
Hey Bhaskar, this is still the plan, though QA might take longer than 15
days. Right now, since we've passed April 1st, the only features considered for
merging are those that already had pull requests in review. (Some big ones are
things like annotating the public APIs and simplifying
Any word on this one?
On Apr 2, 2014, at 12:26 AM, Vipul Pandey vipan...@gmail.com wrote:
I downloaded 0.9.0 fresh and ran the mvn command; the assembly jar thus
generated also has both the shaded and real versions of the protobuf classes
Vipuls-MacBook-Pro-3:spark-0.9.0-incubating vipul$ jar -ftv
Hey,
Does anybody know what kinds of dependencies the new SQL operators produce?
I’m specifically interested in the relational join operation as it seems
substantially more optimized.
The old join was narrow on two RDDs with the same partitioner.
Is the relational join narrow as well?
I'm sorry, but I don't really understand what you mean when you say "wide"
in this context. For a HashJoin, the only dependencies of the produced RDD
are the two input RDDs. For a BroadcastNestedLoopJoin, the only dependency
will be on the streamed RDD. The other RDD will be distributed to all
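To illustrate the narrow case mentioned above for the old RDD join, a sketch
(the data and partition count are made up) of co-partitioning two pair RDDs
so the join adds no extra shuffle:

    import org.apache.spark.HashPartitioner

    // Both sides share the same partitioner, so join() can match partitions
    // one-to-one instead of reshuffling either input.
    val part   = new HashPartitioner(8)
    val left   = sc.parallelize(Seq((1, "a"), (2, "b"))).partitionBy(part)
    val right  = sc.parallelize(Seq((1, 1.0), (2, 2.0))).partitionBy(part)
    val joined = left.join(right)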
To run multiple workers with Spark’s standalone mode, set
SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES in conf/spark-env.sh. For
example, if you have 16 cores and want 2 workers, you could add
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=8
Matei
On Apr 3, 2014, at 12:38 PM,
@Mayur... I am hitting ulimits on the cluster if I go beyond 4 cores per
worker, and I don't think I can change the ulimit due to sudo issues etc...
If I have more workers, in ALS I can go for 20 blocks (right now I am
running 10 blocks on 10 nodes with 4 cores each, and now I can go up to 20
blocks
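For reference, a sketch (the input path and hyperparameters are illustrative)
of passing the block count to MLlib's ALS explicitly:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Parse "user,product,rating" lines and factorize with 20 blocks so the
    // ALS partitioning matches the 20-worker layout discussed above.
    val ratings = sc.textFile("hdfs:///path/to/ratings").map { line =>
      val Array(user, product, rating) = line.split(",")
      Rating(user.toInt, product.toInt, rating.toDouble)
    }
    val model = ALS.train(ratings, 10 /* rank */, 10 /* iterations */,
      0.01 /* lambda */, 20 /* blocks */)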