Hi,
I'm confused about -Dspark.local.dir and SPARK_WORKER_DIR (--work-dir).
What's the difference?
I have set -Dspark.local.dir for all my worker nodes but I'm still seeing
directories being created in /tmp when the job is running.
I have also tried setting -Dspark.local.dir when I run the
I'm not 100% sure but I think it goes like this:
spark.local.dir can and should be set both on the executors and on
the driver (if the driver broadcasts variables, the files will be
stored in this directory)
the SPARK_WORKER_DIR is where the worker puts its per-application work
directories (application jars, executor logs), defaulting to SPARK_HOME/work.
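For example, the driver-side half can be set programmatically like below (the
path and app name are just placeholders), while the worker JVMs still need
-Dspark.local.dir in their own options:

import org.apache.spark.{SparkConf, SparkContext}

// Scratch space for shuffle files and blocks that spill to disk on the driver
// (the path is only an example -- point it at a disk with enough room).
val conf = new SparkConf()
  .setAppName("local-dir-example")
  .set("spark.local.dir", "/data/spark-tmp")
val sc = new SparkContext(conf)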
Do you mean the worker nodes?
I don't think they are Jetty connectors, and the directories are empty.
Hello,
Is there anything special about calling functions that parse JSON lines
from within a filter()?
I have code that looks like this:
def jsonMatches(line: String): Boolean = {
// take a line in JSON format
val jline = parse(line)
// pull out the "event" field
val je = jline \ "event"
if (je != JNothing && je.values.toString ==
The triangle count failed for me when I ran it on more than one node. There
was this assertion in TriangleCount.scala:
// double count should be even (divisible by two)
assert((dblCount & 1) == 0)
That did not hold true when I ran this on multiple nodes, even when
following the
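For context, the documented precondition for triangleCount is edges in
canonical orientation (srcId < dstId) plus an explicit partitionBy -- a toy
sketch of that setup (sc and the edge attribute values are just placeholders):

import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// triangleCount assumes every edge already satisfies srcId < dstId and that
// the graph has been partitioned with partitionBy.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0), Edge(1L, 3L, 0)))
val graph = Graph.fromEdges(edges, defaultValue = 0)
  .partitionBy(PartitionStrategy.RandomVertexCut)

val triangles = graph.triangleCount().vertices.collect()
// here each of the three vertices sits in exactly one triangle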
It's trying to send the driver-side object your function hangs off of. You
just need to have the jsonMatches function available on the worker side of the
interaction rather than on the driver side, e.g., put it on an object
CodeThatIsRemote that gets shipped with the JARs and then
filter(CodeThatIsRemote.jsonMatches) and you should be off to the races.
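In other words, something along these lines -- a rough sketch, assuming
json4s-style parse/JNothing (lift-json looks almost identical), that `lines`
is your RDD[String], and with "Sign Up" borrowed from the hard-coded value
mentioned later in the thread:

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Lives on a plain object, so workers just call it; nothing from the
// driver has to be serialized along with it.
object CodeThatIsRemote {
  def jsonMatches(line: String): Boolean = {
    val jLine = parse(line)
    val je = jLine \ "event"
    je != JNothing && je.values.toString == "Sign Up"
  }
}

// On the driver:
val matching = lines.filter(CodeThatIsRemote.jsonMatches)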
You can check out Ganglia for network utilization.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Thu, Mar 13, 2014 at 2:04 AM, moxiecui moxie...@gmail.com wrote:
Hello everyone:
Say I have an application running on
Hmm.
The whole thing is packaged in a .jar file and I execute .addJar on the
SparkContext. My expectation is that the whole jar together with that
function is available on every worker automatically. Is that not a valid
expectation?
Ognen
On 3/13/14, 11:09 AM, Paul Brown wrote:
It's
Well, the question is how you're referencing it. If you reference it in a
static fashion (function on an object, Scala-wise), then that's
dereferenced on the worker side. If you reference it in a way that refers
to something on the driver side, serializing the block will attempt to
serialize the referenced driver-side object as well.
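Concretely, the difference is roughly this (a sketch, not the actual code from
the thread; Pipeline and Matchers are made-up names):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Referencing an instance method: the closure drags `this` (the whole
// Pipeline, SparkContext and all) along, and task serialization fails.
class Pipeline(sc: SparkContext) {
  def jsonMatches(line: String): Boolean = line.contains("event")
  def run(lines: RDD[String]): RDD[String] = lines.filter(jsonMatches)
}

// Referencing a function on an object: nothing driver-side gets captured.
object Matchers {
  def jsonMatches(line: String): Boolean = line.contains("event")
}
// lines.filter(Matchers.jsonMatches) runs fine on the workers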
I must be really dense! :)
Here is the most simplified version of the code; I removed a bunch of
stuff and hard-coded the "event" and "Sign Up" strings.
def jsonMatches(line: String): Boolean = {
val jLine = parse(line)
// extract the "event" field from the line
val e = jLine \ "event"
I even tried this:
def jsonMatches(line:String):Boolean = true
It is still failing with the same error.
Ognen
On 3/13/14, 11:45 AM, Ognen Duzlevski wrote:
I must be really dense! :)
Here is the most simplified version of the code, I removed a bunch of
stuff and hard-coded the event and
Hi,
I have an RDD of (S, Tuple2[I, List]) which I want to reduceByKey and get
I+I and a List of Lists
(add the integers and build a list of the lists).
BUT reduceByKey requires that the return value is of the same type as the
input, so how can I combine the lists?
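Is the trick to map the values into the target shape first and then
reduceByKey on that? Something like this sketch, with String/Int standing in
for my S and I (sc assumed):

// rdd: RDD[(String, (Int, List[String]))]
val rdd = sc.parallelize(Seq(
  ("a", (1, List("x"))),
  ("a", (2, List("y", "z"))),
  ("b", (5, List("w")))))

// Lift each value into the output shape (Int, List[List[String]]) first, so
// the reduce function preserves the value type as reduceByKey requires.
val combined = rdd
  .mapValues { case (i, xs) => (i, List(xs)) }
  .reduceByKey { case ((i1, ls1), (i2, ls2)) => (i1 + i2, ls1 ++ ls2) }

// combined: RDD[(String, (Int, List[List[String]]))]
// "a" -> (3, List(List("x"), List("y", "z")))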
Have a look at my project:
https://github.com/adamnovak/sequence-graphs/blob/master/importVCF/src/main/scala/importVCF.scala.
I use the SBT Native Packager, which dumps my jar and all its
dependency jars into one directory. Then I have my code find the jar
it's running from, and loop through that directory.
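The gist of the pattern, as a sketch (not the exact code in that file; sc is
the SparkContext):

import java.io.File

// Find the directory the running jar was loaded from...
val jarDir = new File(
  getClass.getProtectionDomain.getCodeSource.getLocation.toURI).getParentFile

// ...and ship every jar sitting next to it to the executors.
jarDir.listFiles()
  .filter(_.getName.endsWith(".jar"))
  .foreach(jar => sc.addJar(jar.getAbsolutePath))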
By the way, this is the underlying error for me:
java.lang.VerifyError: (class:
org/jboss/netty/channel/socket/nio/NioWorkerPool, method: createWorker
signature:
(Ljava/util/concurrent/Executor;)Lorg/jboss/netty/channel/socket/nio/AbstractNioWorker;)
Wrong return type in function
at
You can find it here:
https://github.com/apache/incubator-spark/tree/master/graphx/data
On Thu, Mar 13, 2014 at 10:13 AM, Diana Carroll dcarr...@cloudera.com wrote:
I'd like to play around with the Page Rank example included with Spark but
I can't find any sample data to work with.
Hi, I would like to know the correct way to add Kafka to my project on
standalone YARN, given that it's now in a different artifact than Spark
core.
I tried adding the dependency to my project but I get a
ClassNotFoundException for my main class. Also, that makes my jar file very
big,
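For reference, the usual sbt setup for this looks something like the
following -- the version numbers are only a guess for a 0.9.x cluster, and
marking Spark itself as "provided" is what keeps it out of the fat jar:

// build.sbt
libraryDependencies ++= Seq(
  // already on the cluster, so keep these out of the assembly jar
  "org.apache.spark" %% "spark-core"            % "0.9.0-incubating" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "0.9.0-incubating" % "provided",
  // this one (and its Kafka dependencies) does need to ship with the app
  "org.apache.spark" %% "spark-streaming-kafka" % "0.9.0-incubating"
)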
Looks like everything from 0.8.0 and before errors similarly (though Spark
0.3 for Scala 2.9 has a malformed link as well).
On Thu, Mar 13, 2014 at 10:52 AM, Walrus theCat walrusthe...@gmail.com wrote:
Sup,
Where can I get Spark 0.7.3? It's 404 here:
http://spark.apache.org/downloads.html
Hi,
After sorting an RDD and writing it to Hadoop, would the RDD still be sorted
when reading it back?
Can sorting be guaranteed after reading back, when the RDD was written as 1
partition with rdd.coalesce(1)?
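To make the scenario concrete, something like this sketch (the path is just a
placeholder):

// Sort, squash to a single partition, and write out...
val rdd = sc.parallelize(Seq(3 -> "c", 1 -> "a", 2 -> "b"))
val sorted = rdd.sortByKey().coalesce(1)
sorted.map { case (k, v) => s"$k\t$v" }.saveAsTextFile("hdfs:///tmp/sorted-out")

// ...then later read it back -- is this still in key order?
val readBack = sc.textFile("hdfs:///tmp/sorted-out")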
OK, problem solved.
Interesting thing - I separated the jsonMatches function below and put
it as a method on a separate object in its own file. Once done that way, it
all serializes and works.
Ognen
On 3/13/14, 11:52 AM, Ognen Duzlevski wrote:
I even tried this:
def
In Spark 1.0 we've added better randomization to the scheduling of
tasks so they are distributed more evenly by default.
https://github.com/apache/spark/commit/556c56689bbc32c6cec0d07b57bd3ec73ceb243e
However having specific policies like that isn't really supported
unless you subclass the RDD
not that long ago there was a nice example on here about how to combine
multiple operations on a single RDD. so basically if you want to do a
count() and something else, how to roll them into a single job. i think
patrick wendell gave the examples.
i can't find them anymore. patrick, can you
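in the meantime, the gist as i remember it was folding both into one pass
with aggregate -- a rough sketch from memory, not the actual example:

val nums = sc.parallelize(1 to 100)

// one pass over the data: count the elements and sum them at the same time
val (count, sum) = nums.aggregate((0L, 0L))(
  (acc, x) => (acc._1 + 1, acc._2 + x),   // fold each value into a partition's accumulator
  (a, b) => (a._1 + b._1, a._2 + b._2))   // merge the per-partition accumulators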
The amplab spark internals talk you mentioned is actually referring to the
RDD persistence levels, where by default we do not persist RDDs to disk (
https://spark.apache.org/docs/0.9.0/scala-programming-guide.html#rdd-persistence
).
spark.shuffle.spill refers to a different behavior -- if the in-memory data
structures used during a shuffle grow too large, their contents are spilled
to disk.