Hi,
I'm executing a Spark Streaming code with Kafka. The code was working, but
today I tried to execute the code again and I got an exception; I don't know
what is happening. Right now there are no job executions on YARN.
How could I fix it?
Exception in thread main
Could you also provide the code where you set up the Kafka DStream? I don't
see it in the snippet.
On Fri, Jun 26, 2015 at 2:45 PM, Ashish Nigam ashnigamt...@gmail.com
wrote:
Here's code -
def createStreamingContext(checkpointDirectory: String) :
StreamingContext = {
val conf = new
1. You need checkpointing mostly for recovering from driver failures, and
in some cases also for some stateful operations.
2. Could you try not using the SPARK_CLASSPATH environment variable?
TD
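For reference, a minimal sketch of the checkpoint-recovery pattern TD is describing, using StreamingContext.getOrCreate; the checkpoint path, batch interval and app name are placeholders, not taken from the thread:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Builds a fresh context; only invoked when no checkpoint data exists yet.
def createStreamingContext(checkpointDirectory: String): StreamingContext = {
  val conf = new SparkConf().setAppName("streaming-example")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDirectory)
  // set up the Kafka DStream and output operations here
  ssc
}

// Recover from the checkpoint if one exists, otherwise create a new context.
val checkpointDirectory = "hdfs:///tmp/streaming-checkpoint"
val ssc = StreamingContext.getOrCreate(checkpointDirectory,
  () => createStreamingContext(checkpointDirectory))
ssc.start()
ssc.awaitTermination()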
On Sat, Jun 27, 2015 at 1:00 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I don't have any
Well, though randomly chosen, SPARK_CLASSPATH is a recognized env variable
that is picked up by spark-submit. That is what was used pre-Spark-1.0, but
got deprecated after that. Mind renaming that variable and trying it out
again? At least it will reduce one possible source of problems.
TD
On
still feels like a bug to have to create unique names before a join.
On Fri, Jun 26, 2015 at 9:51 PM, ayan guha guha.a...@gmail.com wrote:
You can declare the schema with unique names before creating the DataFrame.
On 27 Jun 2015 13:01, Axel Dahl a...@whisperstream.com wrote:
I have the following
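A minimal sketch of the workaround ayan is suggesting, renaming one side's columns before the join so the result has no ambiguous names; the DataFrame and column names here are made up for illustration:

import org.apache.spark.sql.DataFrame

// dfA and dfB both carry an "id" key and a "value" column; rename B's "value"
// so the joined result does not end up with two columns of the same name.
def joinWithUniqueNames(dfA: DataFrame, dfB: DataFrame): DataFrame = {
  val dfBRenamed = dfB.withColumnRenamed("value", "value_b")
  dfA.join(dfBRenamed, "id")
}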
Well, SPARK_CLASSPATH is just a random name; the complete script is this:
export HADOOP_CONF_DIR=/etc/hadoop/conf
SPARK_CLASSPATH=file:/usr/metrics/conf/elasticSearch.properties,file:/usr/metrics/conf/redis.properties,/etc/spark/conf.cloudera.spark_on_yarn/yarn-conf/
for lib in `ls
How are you trying to execute the code again? From checkpoints, or
otherwise?
Also cc'ed Hari who may have a better idea of YARN related issues.
On Sat, Jun 27, 2015 at 12:35 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Hi,
I'm executing a Spark Streaming code with Kafka. The code was
I'm checking the logs in YARN and I found this error as well
Application application_1434976209271_15614 failed 2 times due to AM
Container for appattempt_1434976209271_15614_02 exited with exitCode:
255
Diagnostics: Exception from container-launch.
Container id:
Could you print the time on the driver (that is, in foreachRDD but before
RDD.foreachPartition) and see if it is behaving weird?
TD
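A minimal sketch of what TD is asking for, printing the batch time on the driver before the per-partition work runs on the executors; the stream variable stands in for whatever DStream the job uses:

stream.foreachRDD { (rdd, time) =>
  // This runs on the driver, once per batch.
  println(s"Batch time: $time, driver clock: ${System.currentTimeMillis()}")
  rdd.foreachPartition { partition =>
    // executor-side work goes here
  }
}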
On Fri, Jun 26, 2015 at 3:57 PM, Emrehan Tüzün emrehan.tu...@gmail.com
wrote:
On Fri, Jun 26, 2015 at 12:30 PM, Sea 261810...@qq.com wrote:
Hi, all
I find
I don't have any checkpoint in my code. Really, I don't have to save any
state. It's just log processing for a PoC.
I have been testing the code in a VM from Cloudera and I never got that
error. Now it's a real cluster.
The command to execute Spark is:
spark-submit --name PoC Logs --master
In the receiver-based approach, if the receiver crashes for any reason
(the receiver crashed or the executor crashed), it should get restarted on
another executor and should start reading data from the offset present in
ZooKeeper. There is some chance of data loss, which can be alleviated using
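A minimal sketch (not from the thread) of the receiver-based stream with the write-ahead log enabled, which is the usual way to shrink that data-loss window; the ZooKeeper quorum, group id, topic and paths are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("receiver-wal-example")
  // Write received blocks to the checkpoint directory as well.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///tmp/wal-checkpoint")

// Receiver-based Kafka stream; with the WAL on, a non-replicated storage level is enough.
val lines = KafkaUtils.createStream(ssc, "zk-host:2181", "example-group",
  Map("example-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)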
Does YARN provide the token through that env variable you mentioned? Or how
does YARN do this?
Tim
On Fri, Jun 26, 2015 at 3:51 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Fri, Jun 26, 2015 at 3:44 PM, Dave Ariens dari...@blackberry.com
wrote:
Fair. I will look into an alternative
Do you have SPARK_CLASSPATH set in both cases, before and after the checkpoint?
If yes, then you should not be using SPARK_CLASSPATH; it has been
deprecated since Spark 1.0 because of its ambiguity.
Also, where do you have spark.executor.extraClassPath set? I don't see it in
the spark-submit command.
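For reference, a minimal sketch of setting the extra classpath through SparkConf instead of SPARK_CLASSPATH; the paths are placeholders, and the same keys can also be passed to spark-submit with --conf:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("classpath-example")
  // Replaces the deprecated SPARK_CLASSPATH: one setting for the driver, one for the executors.
  .set("spark.driver.extraClassPath", "/path/to/extra/conf")
  .set("spark.executor.extraClassPath", "/path/to/extra/conf")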
I looked at the code and found that batch exceptions are indeed ignored.
This is something worth fixing: batch exceptions should not be silently ignored.
Also, you can catch failed batch jobs (irrespective of the number of
retries) by catching the exception in foreachRDD. Here is an
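A minimal sketch of that idea, wrapping the batch's output action in a try/catch inside foreachRDD; the stream variable and per-record work are placeholders:

stream.foreachRDD { rdd =>
  try {
    // The output action that actually triggers the batch job.
    rdd.foreachPartition { partition =>
      partition.foreach { record =>
        // per-record work goes here
      }
    }
  } catch {
    case e: Exception =>
      // Reached on the driver once the batch job has failed after all retries.
      println(s"Batch failed: ${e.getMessage}")
  }
}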
I changed the variable name and I got the same error.
2015-06-27 11:36 GMT+02:00 Tathagata Das t...@databricks.com:
Well, though randomly chosen, SPARK_CLASSPATH is a recognized env variable
that is picked up by spark-submit. That is what was used pre-Spark-1.0, but
got deprecated after that.
Please take a look at JavaPairRDD.scala
Cheers
On Sat, Jun 27, 2015 at 3:42 AM, Bahubali Jain bahub...@gmail.com wrote:
Hi,
Why doesn't JavaRDD have saveAsNewAPIHadoopFile() associated with it?
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
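To illustrate what JavaPairRDD.scala shows: saveAsNewAPIHadoopFile is defined for key/value RDDs, so a plain RDD has to be turned into pairs first (in Java, mapToPair(...) gives you a JavaPairRDD). A Scala sketch with placeholder types and output path:

import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.rdd.RDD

def save(lines: RDD[String]): Unit = {
  lines
    .map(line => (NullWritable.get(), new Text(line)))  // make it a pair RDD
    .saveAsNewAPIHadoopFile(
      "hdfs:///tmp/output",
      classOf[NullWritable],
      classOf[Text],
      classOf[TextOutputFormat[NullWritable, Text]])
}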
Hi,
There is another option to try: the receiver-based low-level Kafka consumer,
which is part of Spark Packages (
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer). This can
be used with the WAL as well for end-to-end zero data loss.
This is also a reliable receiver and commits offsets to
SPARK_CLASSPATH is nice; spark.jars needs to list all the jars one by one when
submitting to YARN because spark.driver.classpath and spark.executor.classpath
are not available in YARN mode. Can someone remove the warning from the code
or upload the jar in spark.driver.classpath and
Guillermo:
bq. Shell output: Requested user hdfs is not whitelisted and has id
496, which is below the minimum allowed 1000
Are you using a secure cluster?
Can user hdfs be re-created with uid 1000?
Cheers
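For context, that check usually comes from the LinuxContainerExecutor, which rejects users whose uid is below min.user.id (default 1000) or who appear in banned.users. The relevant container-executor.cfg entries look roughly like this (common defaults, not values taken from this cluster):

min.user.id=1000
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=nobody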
On Sat, Jun 27, 2015 at 2:32 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
I'm
Yes, things go well now. It was a problem with SimpleDateFormat. Thank you all.
------------------ Original message ------------------
From: Dumas Hwang dumas.hw...@gmail.com
Date: Saturday, June 27, 2015, 8:16
To: Tathagata Das t...@databricks.com
Cc: Emrehan
Hi,
Why doesn't JavaRDD have saveAsNewAPIHadoopFile() associated with it?
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
I had a look at the new R-on-Spark API / feature in Spark 1.4.0.
For those skilled in the art (of R and distributed computing) it will be
immediately clear that ON is a marketing ploy and what it actually is is
TO, i.e. Spark 1.4.0 offers an INTERFACE from R TO DATA stored in Spark in
distributed
How do you partition by product in Python?
The only API is partitionBy(50).
On Jun 18, 2015, at 8:42 AM, Debasish Das debasish.da...@gmail.com wrote:
Also, in my experiments it's much faster to do blocked BLAS through cartesian
rather than doing sc.union. Here are the details on the experiments:
Hello!
We use pyspark to run a set of data extractors (think regex). The extractors
(regexes) generally run quite quickly and find a few matches which are
returned and stored into a database.
My question is -- is it possible to make the function that runs the
extractors have a timeout? I.E. if
Yeah, you shouldn't have to rename the columns before joining them.
Do you see the same behavior on 1.3 vs 1.4?
Nick
On Sat, Jun 27, 2015 at 2:51 AM, Axel Dahl a...@whisperstream.com wrote:
still feels like a bug to have to create unique names before a join.
On Fri, Jun 26, 2015 at 9:51 PM, ayan
I've only tested on 1.4, but imagine 1.3 is the same or a lot of people's
code would be failing right now.
On Saturday, June 27, 2015, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Yeah, you shouldn't have to rename the columns before joining them.
Do you see the same behavior on 1.3 vs
Have you looked at:
http://stackoverflow.com/questions/2281850/timeout-function-if-it-takes-too-long-to-finish
FYI
On Sat, Jun 27, 2015 at 8:33 AM, wasauce wferr...@gmail.com wrote:
Hello!
We use pyspark to run a set of data extractors (think regex). The
extractors
(regexes) generally run
I would test it against 1.3 to be sure, because it could -- though unlikely
-- be a regression. For example, I recently stumbled upon this issue
https://issues.apache.org/jira/browse/SPARK-8670 which was specific to
1.4.
On Sat, Jun 27, 2015 at 12:28 PM Axel Dahl a...@whisperstream.com wrote:
For anyone monitoring the thread, I was able to successfully install and run
a small Spark cluster and model using this method:
First, make sure that the username being used to login to RStudio Server is
the one that was used to install Spark on the EC2 instance. Thanks to
Shivaram for his help
Our project is having a hard time following what we are supposed to do to
migrate this function from Spark 1.2 to 1.3.
/**
* Dump matrix as computed Mahout's DRM into specified (HD)FS path
* @param path
*/
def dfsWrite(path: String) = {
val ktag = implicitly[ClassTag[K]]
Thanks Mark for the update. For those interested Vincent Warmerdam also has
some details on making the /root/spark installation work at
https://issues.apache.org/jira/browse/SPARK-8596?focusedCommentId=14604328page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14604328
Not sure what the issue is, but when I run spark-submit or spark-shell I
am getting the below error:
/usr/bin/spark-class: line 24: /usr/bin/load-spark-env.sh: No such file or
directory
Can someone please help?
Thanks,
Try setting the yarn executor memory overhead to a higher value like 1g or
1.5g or more.
Regards
Sab
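For reference, a minimal sketch of setting that overhead from code; the same key also works as a --conf flag to spark-submit, and 1536 MB is only an example value:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("als-example")
  // Extra off-heap headroom per executor container, in MB, on top of spark.executor.memory.
  .set("spark.yarn.executor.memoryOverhead", "1536")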
On 28-Jun-2015 9:22 am, Ayman Farahat ayman.fara...@yahoo.com wrote:
That's correct, this is YARN
and Spark 1.4.
Also using the Anaconda tar for NumPy and other libs.
Sent from my iPhone
On
Are you running on top of YARN? Plus pls provide your infrastructure
details.
Regards
Sab
On 28-Jun-2015 8:47 am, Ayman Farahat ayman.fara...@yahoo.com.invalid
wrote:
Hello;
I tried to adjust the number of blocks by repartitioning the input.
Here is how I do it (I am partitioning by users):
Where do I do that ?
Thanks
Sent from my iPhone
On Jun 27, 2015, at 8:59 PM, Sabarish Sasidharan
sabarish.sasidha...@manthan.com wrote:
Try setting the yarn executor memory overhead to a higher value like 1g or
1.5g or more.
Regards
Sab
On 28-Jun-2015 9:22 am, Ayman Farahat
Are you running on top of YARN? Plus pls provide your infrastructure
details.
Regards
Sab
On 28-Jun-2015 9:20 am, Sabarish Sasidharan
sabarish.sasidha...@manthan.com wrote:
Are you running on top of YARN? Plus pls provide your infrastructure
details.
Regards
Sab
On 28-Jun-2015 8:47 am,
That's correct, this is YARN
and Spark 1.4.
Also using the Anaconda tar for NumPy and other libs.
Sent from my iPhone
On Jun 27, 2015, at 8:50 PM, Sabarish Sasidharan
sabarish.sasidha...@manthan.com wrote:
Are you running on top of YARN? Plus pls provide your infrastructure details.
created as SPARK-8685
https://issues.apache.org/jira/browse/SPARK-8685
@Yin, thanks, I have fixed the sample code with the correct names.
On Sat, Jun 27, 2015 at 1:56 PM, Yin Huai yh...@databricks.com wrote:
Axel,
Can you file a jira and attach your code in the description of the jira?
This looks
Hello;
I tried to adjust the number of blocks by repartitioning the input.
Here is how I do it (I am partitioning by users):
tot = newrdd.map(lambda l:
(l[1],Rating(int(l[1]),int(l[2]),l[4]))).partitionBy(50).cache()
ratings = tot.values()
numIterations =8
rank = 80
model =