If you are running Spark in local mode, executor parameters are not used because
there is no separate executor process. You should set the corresponding driver
parameter instead to get the same effect.
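For example (a sketch only; 8g, com.example.MyApp and myapp.jar are placeholder
values), since spark.driver.memory must be set before the driver JVM starts, pass
it on the command line rather than in SparkConf:

./bin/spark-shell --driver-memory 8g
./bin/spark-submit --driver-memory 8g --class com.example.MyApp myapp.jar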
On Mon, Jan 19, 2015, 00:21 Sean Owen so...@cloudera.com wrote:
OK. Are you sure the executor has the memory you think? -Xmx24g in
Just make sure both versions of Spark are the same (the one from which you are
submitting the job, and the one to which you are submitting the job).
Another possible cause is a firewall issue, if you are submitting the job from
another network or a remote machine.
Thanks
Best Regards
On Sun, Jan 18, 2015 at
Hi Sean,
Thanks for your advice, a normal 'val' will suffice. But will it be
serialized and transferred with every batch and every partition? That's why
broadcast exists, right?
For now I'm going to use a 'val', but I'm still looking for a broadcast-based
solution.
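For reference, a rough sketch of the broadcast approach I have in mind (assuming
an existing StreamingContext ssc; the lookup table and the socket source are
placeholders):

// the table is shipped to each executor once and reused, instead of being
// serialized into every task closure
val lookupTable: Map[String, Int] = Map("a" -> 1, "b" -> 2)   // placeholder data built on the driver
val bcTable = ssc.sparkContext.broadcast(lookupTable)

val lines = ssc.socketTextStream("localhost", 9999)           // placeholder source
val tagged = lines.map { line =>
  // bcTable.value is read on the executors
  (line, bcTable.value.getOrElse(line, 0))
}

(This alone doesn't cover re-broadcasting an updated dataset each batch, which is
the part I'm still unsure about.)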
On Sun, Jan 18, 2015 at 5:36 PM, Sean
bq. there was no 2.11 Kafka available
That's right. Adding external/kafka module resulted in:
[ERROR] Failed to execute goal on project spark-streaming-kafka_2.11: Could
not resolve dependencies for project
org.apache.spark:spark-streaming-kafka_2.11:jar:1.3.0-SNAPSHOT: Could not
find artifact
Hi Jeff,
From my understanding this seems more like a bug: since JavaDStreamLike is meant
for Java code, returning a Scala DStream is not reasonable. You can fix this by
submitting a PR, or I can help you fix it.
Thanks
Jerry
From: Jeff Nadler [mailto:jnad...@srcginc.com]
Sent: Monday, January
It seems a netty jar with an incompatible method signature is being picked up. Can
you check whether there are different versions of the netty jar in your classpath?
From: Walrus theCat [mailto:walrusthe...@gmail.com]
Sent: Sunday, January 18, 2015 3:37 PM
To: user@spark.apache.org
Subject: Re: SparkSQL 1.2.0 sources
NioWorkerPool(Executor workerExecutor, int workerCount) was added in netty
3.5.4
https://github.com/netty/netty/blob/netty-3.5.4.Final/src/main/java/org/jboss/netty/channel/socket/nio/NioWorkerPool.java
If there is a netty jar in the classpath older than the above release, you
would see the
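As a quick check (a sketch using plain JVM reflection, nothing Spark-specific),
you can print which jar the class was actually loaded from in the running JVM:

// prints the location of the jar that NioWorkerPool was loaded from
println(classOf[org.jboss.netty.channel.socket.nio.NioWorkerPool].getProtectionDomain.getCodeSource.getLocation)

If that points at a netty jar older than 3.5.4, that would explain the missing
constructor.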
This looks like a bug in the master branch of Spark, related to some recent
changes to EventLoggingListener. You can reproduce this bug on a fresh
Spark checkout by running
./bin/spark-shell --conf spark.eventLog.enabled=true --conf
spark.eventLog.dir=/tmp/nonexistent-dir
where
The error in the log file says:
*java.lang.OutOfMemoryError: GC overhead limit exceeded*
for a certain task ID, and the error repeats for subsequent task IDs.
What could be the problem?
On Sun, Jan 18, 2015 at 2:45 PM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
Updating the Spark version means
Hi mehrdad,
I seem to have the same issue as you wrote about here. Did you manage to
resolve it?
Also, I used the following pattern to extract information from a file path and
add it to the output of a transformation:
https://gist.github.com/btiernay/1ad5e3dea08904fe07d9
You may find it useful as well.
Cheers,
Bob
From: btier...@hotmail.com
To: so...@cloudera.com;
I think the problem is that you have a single object that is larger than
2GB and so fails to serialize to a byte array. I think it is best not to
design it this way, as you can't parallelize combining the maps. You could go
all the way and emit key-value pairs and reduceByKey. There are solutions
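A minimal sketch of the emit-pairs-then-reduceByKey shape (assuming an existing
SparkContext sc; the word-count-like input is only illustrative):

// emit (key, 1L) pairs and let reduceByKey combine them in a distributed way,
// instead of building one huge Map and merging maps on the driver
val records = sc.parallelize(Seq("a b a", "b c"))      // placeholder input
val counts = records
  .flatMap(line => line.split(" "))                    // stand-in for extracting keys from a record
  .map(key => (key, 1L))
  .reduceByKey(_ + _)                                  // combined per partition, then across partitions
counts.collect().foreach(println)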
You may also want to keep an eye on SPARK-5182 / SPARK-5302 which may help if
you are using Spark SQL. It should be noted that this is possible with
HiveContext today.
Cheers,
Bob
Date: Sun, 18 Jan 2015 08:47:06 +
Subject: Re: Directory / File Reading Patterns
From: so...@cloudera.com
Hi,
Please help me with this problem. I would really appreciate your help!
I am using Spark 1.2.0. I have a map-reduce job written in Spark in the
following way:
val sumW = splittedTrainingDataRDD.map(localTrainingData => LocalSGD(w,
localTrainingData, numeratorCtEta, numitorCtEta, regularizer,
Hi experts,
I'm getting ExceptionInInitializerError when using a class defined in REPL.
Code is something like this:
case class TEST(a: String)
sc.textFile(~~~).map(TEST(_)).count
The code above used to work well until yesterday, but suddenly, for some
reason, it fails with this error.
Hi,
I am trying to find out where Spark persists RDDs when we call the persist()
API and run under YARN. This is purely for understanding...
In my driver program, I wait indefinitely, so as to avoid any clean up
problems.
In the actual job, I roughly do the following:
JavaRDD<String> lines =
Does updating the Spark version mean setting up the entire cluster once more?
Or can we update it in some other way?
On Sat, Jan 17, 2015 at 3:22 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you paste the code? Also you can try updating your spark version.
Thanks
Best Regards
On Sat,
Hi,
We have large datasets in a format suitable for a Spark MLlib matrix, but they
are pre-computed by Hive and stored inside Hive. My question is: can we create a
distributed matrix such as IndexedRowMatrix directly from Hive tables,
avoiding reading data from Hive tables and feeding them into an
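For what it's worth, this is the kind of thing we are hoping is possible (a sketch
only; the table name hive_matrix and its (row_index bigint, features array<double>)
schema are invented for illustration, and we assume the array column comes back as
a Seq[Double]):

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

val hiveContext = new HiveContext(sc)
// hypothetical table: (row_index bigint, features array<double>)
val indexedRows = hiveContext.sql("SELECT row_index, features FROM hive_matrix").map { row =>
  val idx = row.getLong(0)
  val values = row(1).asInstanceOf[Seq[Double]].toArray   // assumed representation of the array column
  IndexedRow(idx, Vectors.dense(values))
}
val matrix = new IndexedRowMatrix(indexedRows)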
The singleton hack works very differently in Spark 1.2.0 (it does not work if
the program has multiple map-reduce jobs in the same program). I guess there
should be official documentation on how to have each machine/node do an
init step locally before executing any other instructions (e.g.
Why do you say it does not work? The singleton pattern works the same as
ever. It is not a pattern that involves Spark.
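For reference, the usual shape of the pattern (a sketch; ExpensiveResource and its
contents are placeholders, and an existing SparkContext sc is assumed):

// nothing Spark-specific here: a lazily initialized field in an object is created
// at most once per JVM, i.e. at most once per executor, on first access
object ExpensiveResource {
  lazy val instance: Map[String, Int] = {
    println("initializing on " + java.net.InetAddress.getLocalHost.getHostName)
    Map("placeholder" -> 1)   // stand-in for loading a model, dictionary, connection pool, ...
  }
}

// any number of jobs can run; each executor JVM still initializes only once
val rdd = sc.parallelize(1 to 10)
val result = rdd.mapPartitions { iter =>
  val resource = ExpensiveResource.instance
  iter.map(i => resource.getOrElse(i.toString, 0))
}
result.count()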
On Jan 18, 2015 12:57 PM, octavian.ganea octavian.ga...@inf.ethz.ch
wrote:
The singleton hack works very differently in Spark 1.2.0 (it does not work if
the program has multiple
I think that this problem is not Spark-specific since you are simply side
loading some data into memory. Therefore you do not need an answer that
uses Spark.
Simply load the data and then poll for an update each time it is accessed?
Or some reasonable interval? This is just something you write in
Hi,
After some experiments, there are three methods that work for 'joining a
DStream with another dataset that is updated periodically'.
1. Create an RDD in a transform operation
val words = ssc.socketTextStream("localhost", ...).flatMap(_.split(" "))
val filtered = words transform { rdd =>
val spam =
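A fuller sketch of this first method, filling in the truncated snippet above (the
port, the file path, and the idea of a "spam" word list are placeholders); the
transform body runs on the driver once per batch, so re-reading the file there
picks up periodic updates:

// sketch of method 1: re-create the reference RDD inside transform, which is
// evaluated on the driver every batch interval
val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

val filtered = words.transform { rdd =>
  val spam = ssc.sparkContext.textFile("/path/to/spam-words")   // placeholder path, re-read each batch
  rdd.subtract(spam)                                            // keep only words not in the spam list
}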
I think that putting part of the data (only) in a filename is an
anti-pattern, but we sometimes have to play them where they lie.
You can list all the directory paths containing the CSV files, map each of them
to an RDD with textFile, transform the RDDs to include info from the
path, and then simply
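A sketch of that approach (the directory layout and the date-in-path detail are
hypothetical, and an existing SparkContext sc is assumed):

// hypothetical layout: /data/2015-01-18/*.csv, /data/2015-01-17/*.csv, ...
val dirs = Seq("/data/2015-01-18", "/data/2015-01-17")

val tagged = dirs.map { dir =>
  val date = dir.split("/").last                        // information that only exists in the path
  sc.textFile(dir + "/*.csv").map(line => (date, line))
}.reduce(_ union _)                                     // union everything back into a single RDD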
You can try increasing the parallelism. Can you be more specific about the
task that you are doing? Maybe pasting the piece of code would help.
On 18 Jan 2015 13:22, Deep Pradhan pradhandeep1...@gmail.com wrote:
The error in the log file says:
*java.lang.OutOfMemoryError: GC overhead limit
OK. Are you sure the executor has the memory you think? -Xmx24g in
its command line? It may be that for some reason your job is reserving
an exceptionally large amount of non-heap memory. I am not sure that's
to be expected with the ALS job though. Even if the settings work,
consider using the
Oh: are you running the tests with a different profile setting than
what the last assembly was built with? this particular test depends on
those matching. Not 100% sure that's the problem, but a good guess.
On Sat, Jan 17, 2015 at 4:54 PM, Ted Yu yuzhih...@gmail.com wrote:
The test passed here:
I could be wrong, but I thought this was on purpose. At the time it
was set up, there was no 2.11 Kafka available? Or one of its
dependencies wouldn't work with 2.11?
But I'm not sure what the OP means by 'maven doesn't build Spark's
dependencies', because Ted indicates it does, and of course you
Nathan,
I posted a bunch of questions for you as a comment on your question
http://stackoverflow.com/q/28002443/877069 on Stack Overflow. If you
answer them (don't forget to @ping me) I may be able to help you.
Nick
On Sat Jan 17 2015 at 3:49:54 PM gen tang gen.tan...@gmail.com wrote:
Hi,
I have the same issue.
- Original Message -
From: Rasika Pohankar rasikapohan...@gmail.com
Sent: 18/01/2015 18:48
To: user@spark.apache.org user@spark.apache.org
Subject: Spark Streaming with Kafka
I am using Spark Streaming to process data received through Kafka. The Spark
These will be under the working directory of the YARN container
running the executor. I don't have it handy, but I think it will also be
under a spark-local or similarly named directory.
On Sun, Jan 18, 2015 at 2:50 PM, Hemanth Yamijala yhema...@gmail.com wrote:
Hi,
I am trying to find where Spark persists
Right, done with matrix blocks. Seems like a lot of duplicate effort, but
that’s the way of OSS sometimes.
I didn’t see transpose in the Jira. Are there plans for transpose and
rowSimilarity without transpose? The latter seems easier than columnSimilarity
in the general/naive case. Thresholds
I am using Spark Streaming to process data received through Kafka. The
Spark version is 1.2.0. I have written the code in Java and am compiling it
using sbt. The program runs and receives data from Kafka and processes it
as well. But it stops receiving data suddenly after some time (it has run
for
I posted about the Application WebUI error (specifically the application WebUI, not
the master WebUI in general) and have spent at least a few hours a day for over a
week trying to resolve it, so I’d be very grateful for any suggestions. It is
quite troubling that I appear to be the only one
Yes.
That could be the cause.
On Sun, Jan 18, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:
Oh: are you running the tests with a different profile setting than
what the last assembly was built with? this particular test depends on
those matching. Not 100% sure that's the problem, but