This was actually a bug in the log message itself, where the Master would
print its own IP and port instead of the registered worker's. It has been
fixed in 0.9.1 and 1.0.0 (here's the patch:
https://github.com/apache/spark/commit/c0795cf481d47425ec92f4fd0780e2e0b3fdda85).
Sorry about the …
AFAIK cache() is just a shortcut for the persist method with MEMORY_ONLY
as the storage level.
From the source code of RDD:

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def persist(): RDD[T] = persist(StorageLevel.MEMORY_ONLY)

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def cache(): RDD[T] = persist()
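So in practice the two calls are interchangeable. A quick usage sketch,
assuming an existing SparkContext sc (the input path is illustrative):

import org.apache.spark.storage.StorageLevel

val lines = sc.textFile("hdfs:///data/input")
lines.cache()                              // same effect as the line below
// lines.persist(StorageLevel.MEMORY_ONLY)
lines.count()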
On Thu, Apr 10, 2014 at 12:24 PM, Andrew Ash and...@andrewash.com wrote:
The biggest issue I've come across is that the cluster is somewhat unstable
when under memory pressure. Meaning that if you attempt to persist an RDD
that's too big for memory, even with MEMORY_AND_DISK, you'll often …
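For reference, the kind of call being described (a minimal sketch; the
input path is illustrative):

import org.apache.spark.storage.StorageLevel

val big = sc.textFile("hdfs:///very-large-input")
big.persist(StorageLevel.MEMORY_AND_DISK)  // partitions that don't fit in memory spill to disk
big.count()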
It's highly dependent on what the issue is with your particular job, but
the ones I modify most commonly are:
spark.storage.memoryFraction
spark.shuffle.memoryFraction
parallelism (a parameter on many RDD calls) -- increase from the default
level to get more, smaller tasks that are more likely to …
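A minimal sketch of setting these (the 0.9-era SparkConf API; the master
URL and values are purely illustrative, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("tuning-example")
  .set("spark.storage.memoryFraction", "0.5")   // default 0.6
  .set("spark.shuffle.memoryFraction", "0.3")
val sc = new SparkContext(conf)

// Raising the parallelism on a wide operation:
val counts = sc.textFile("hdfs:///data/input")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _, 200)  // 200 tasks instead of the default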
Thanks for attaching the code. If I got your use case right, you want to
call the sentiment analysis code from Spark Streaming, right? For that I
think you can just use jvmr if that works, and I don't think you need
SparkR. SparkR is mainly intended as an API for large-scale jobs which are
written in …
Hi, I have multiple filters as shown below. Should I use a single combined
filter instead of them? Can these filters degrade Spark's performance?
http://apache-spark-user-list.1001560.n3.nabble.com/file/n4185/Capture.png
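For illustration (hypothetical code, since the actual filters are only in
the attached screenshot), the two alternatives being asked about:

val nums = sc.parallelize(1 to 1000000)

// Several chained filters...
val chained = nums.filter(_ > 10).filter(_ % 2 == 0).filter(_ < 900000)

// ...versus one combined predicate. Both are pipelined within a single
// stage, so the difference is mostly per-element function-call overhead.
val combined = nums.filter(x => x > 10 && x % 2 == 0 && x < 900000)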
Or how about the updateStateByKey() operation?
https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html
The StatefulNetworkWordCount example demonstrates how to keep state across RDDs.
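A condensed sketch along the lines of that example (0.9-era streaming API;
the host, port, and batch interval are illustrative):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext("local[2]", "StatefulWordCount", Seconds(1))
ssc.checkpoint(".")  // updateStateByKey requires a checkpoint directory

// Fold each batch's counts into the running total per word.
val updateFunc = (values: Seq[Int], state: Option[Int]) =>
  Some(values.sum + state.getOrElse(0))

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .updateStateByKey[Int](updateFunc)

counts.print()
ssc.start()
ssc.awaitTermination()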
On Mar 28, 2014, at 8:44 PM, Mayur Rustagi mayur.rust...@gmail.com wrote:
Are you referring to …
Many thanks for your explanation.
So there's just my issue with that "TaskSchedulerImpl: Initial job has not
accepted any resources" message that prevents me from getting started with
Spark (or at least from executing the examples successfully) ;)
br, Gerd
On 13 April 2014 10:17, Aaron Davidson ilike...@gmail.com wrote:
This is usually due to a memory misconfiguration somewhere. Your job may be
requesting that each executor has 512MB, and your cluster may not be able
to satisfy that (if you're only allowing 64MB executors, for instance). Try
setting spark.executor.memory to be the same as SPARK_WORKER_MEMORY.
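For example (illustrative values; this assumes the workers were started
with SPARK_WORKER_MEMORY=512m in spark-env.sh):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("example")
  .set("spark.executor.memory", "512m")  // must fit within SPARK_WORKER_MEMORY
val sc = new SparkContext(conf)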
Hi,
The 512MB is the default amount of memory that each executor requests, and
your job may not actually need that much. You can create a SparkContext with:

val sc = new SparkContext("local-cluster[2,1,512]", "test") // suppose you use local-cluster mode

Here the 512 is the amount of memory (in MB) allocated to each worker.
What is the difference between checkpointing and caching an RDD?
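For context, the two calls being compared (an illustrative sketch:
cache() keeps the RDD in memory but retains its lineage, while
checkpoint() writes it to reliable storage and truncates the lineage):

sc.setCheckpointDir("hdfs:///checkpoints")

val data = sc.textFile("hdfs:///data/input").map(_.length)
data.cache()       // kept in memory; recomputed from lineage if lost
data.checkpoint()  // materialized to the checkpoint dir at the next action
data.count()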
A few questions about the resilience of the client side of Spark:
What would happen if the client process crashes? Can it reconstruct its state?
Suppose I just want to serialize it and reload it back; is this possible?
A more advanced use case: is there a way to move a SparkContext between …