I am also seeing this error in a YARN Spark Streaming (1.2.0) application.
Tim Smith wrote:
Similar issue (Spark 1.0.0). The streaming app runs for a few seconds
before these errors start to pop up all over the driver logs:
14/09/12 17:30:23 WARN TaskSetManager: Loss was due to java.lang.Exception
I believe the message merely means that a block has been removed from
memory, either because it is no longer needed or because it is also
persisted on disk and memory is low. It does not mean data is lost. What is
the end problem you observe? This does not match the problem you link to in
the mailing list.
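(For illustration only, a minimal Scala sketch, assuming an existing
SparkContext sc and some RDD rdd: with MEMORY_AND_DISK persistence, a block
dropped from memory can still be read back from disk, so nothing is lost.)

  import org.apache.spark.storage.StorageLevel

  // If memory runs low, Spark may drop this block from memory,
  // but it remains readable from disk.
  val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)
  cached.count() // materializes the blocks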
It's a Typesafe jar conflict; you will need to put the jar that contains the
getDuration method first on your classpath.
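For example, a sketch with a hypothetical jar name (--driver-class-path is
the documented spark-submit option for adding entries to the driver
classpath):

  spark-submit \
    --driver-class-path /path/to/typesafe-config.jar \
    --class com.example.YourApp your-app.jar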
Thanks
Best Regards
On Wed, Dec 31, 2014 at 4:38 PM, Christophe Billiard
christophe.billi...@gmail.com wrote:
Hi all,
I am currently trying to combine DataStax's
Hi Fernando,
Here's a simple log parser/analyser written in Scala that you can run
without spark-shell/spark-submit: https://github.com/sigmoidanalytics/Test
Basically, to run a Spark job without spark-submit or the shell you need a
build file.
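For instance, a minimal standalone sketch (hypothetical names and input
path) that you could run with plain sbt run, given a build file that pulls
in spark-core:

  import org.apache.spark.{SparkConf, SparkContext}

  object SimpleLogParser {
    def main(args: Array[String]): Unit = {
      // local[2] lets it run without any cluster or spark-submit
      val conf = new SparkConf().setAppName("SimpleLogParser").setMaster("local[2]")
      val sc = new SparkContext(conf)
      val errorCount = sc.textFile("access.log") // hypothetical input file
        .filter(_.contains("ERROR"))
        .count()
      println("ERROR lines: " + errorCount)
      sc.stop()
    }
  }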
Hi,
I am having a similar problem and tried your solution with Spark 1.2 built
with Hadoop
I am saving objects to Parquet files where some fields are of type Array.
When I fetch them as below, I get:
java.lang.ClassCastException: [B cannot be cast to java.lang.CharSequence
def
It would be great if you could share the piece of code running inside your
mapPartition; I'm assuming you are creating/handling a lot of complex
objects, which slows down the performance. Here's a link to the performance
tuning guide if you haven't seen it:
http://spark.apache.org/docs/latest/tuning.html
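A hedged sketch of the usual fix, assuming a hypothetical ExpensiveParser
and an RDD of lines: build heavy objects once per partition rather than once
per record.

  rdd.mapPartitions { iter =>
    val parser = new ExpensiveParser() // hypothetical costly object, built once per partition
    iter.map(line => parser.parse(line)) // reused for every record
  }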
One approach would be to create an event-based streaming pipeline: your
Spark Streaming job listens on a socket (or whatever source) for the event
to happen, and once it happens, it hits your Cassandra and does the work.
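Roughly, a sketch assuming an existing StreamingContext ssc and a
hypothetical queryCassandraAndProcess helper:

  // listen on a socket for incoming events
  val events = ssc.socketTextStream("localhost", 9999)
  events.foreachRDD { rdd =>
    rdd.foreach { eventId =>
      // for each event, hit Cassandra and do the work
      queryCassandraAndProcess(eventId) // hypothetical helper
    }
  }
  ssc.start()
  ssc.awaitTermination()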
Thanks
Best Regards
On Wed, Dec 31, 2014 at 3:14 PM, Oleg Ruchovets
How about this: apply flatMap per line, and in that function parse each
line and return all the columns you need.
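For example, a sketch assuming comma-separated lines in an RDD called lines:

  val columns = lines.flatMap { line =>
    // parse each line and emit every column as its own element
    line.split(",").map(_.trim)
  }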
On Wed, Dec 31, 2014 at 10:16 AM, Sanjay Subramanian
sanjaysubraman...@yahoo.com.invalid wrote:
hey guys
Some of you may care :-) but this is just to give you some background
OK... we need to get these centralized somewhere, as the documentation for
spark-env.sh sends people far, far off in the wrong direction. Maybe remove
all the directives in that script in favor of a link to a page that is more
live and can be updated?
Kevin
On Thu, Jan 1, 2015 at 12:43 AM,
You don't in general configure Spark with environment variables. They exist
but largely for backwards compatibility. Use arguments like
--executor-memory on spark-submit, which are explained in the docs and the
help message. It is possible to directly set the system properties with -D
too if you
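For illustration, with hypothetical class and jar names (the flags are the
standard spark-submit options):

  spark-submit \
    --class com.example.MyApp \
    --executor-memory 4g \
    --driver-memory 2g \
    myapp.jar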
Also, it looks like when I store Strings in Parquet and try to fetch them
using Spark code, I get a ClassCastException.
Below is how my array of strings is saved; each character's ASCII value is
present in an array of ints:
res25: Array[Seq[String]] = Array(ArrayBuffer(Array(104, 116, 116, 112,
Hi All,
I am consuming an 8-partition Kafka topic through multiple DStreams and
processing them in Spark.
But despite the multiple InputDStreams, the Spark master UI is showing
only one receiver.
The following is the consumer part of the Spark code:
int numStreams = 8;
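For reference, the usual shape of that consumer in Scala, as a sketch
assuming the Spark 1.x receiver-based KafkaUtils API and hypothetical
zkQuorum/group/topic values; each createStream call should appear as its own
receiver, provided the app has enough cores:

  import org.apache.spark.streaming.kafka.KafkaUtils

  val numStreams = 8
  val kafkaStreams = (1 to numStreams).map { _ =>
    // one receiver per stream, each reading the topic with one thread
    KafkaUtils.createStream(ssc, zkQuorum, group, Map(topic -> 1))
  }
  val unified = ssc.union(kafkaStreams)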
Hi,
I encountered the same issue and solved it. Please check my blog post
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
Thank you
Oops:
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
This solved the issue. Important for everyone.
Hi,
I wrote a blog post about this.
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
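For reference, a minimal Scala sketch of JdbcRDD with a hypothetical URL and
table; note that the query must contain the two ? placeholders that JdbcRDD
binds to the partition bounds:

  import java.sql.DriverManager
  import org.apache.spark.rdd.JdbcRDD

  val rows = new JdbcRDD(sc,
    () => DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "pass"),
    "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
    1, 1000, 4, // lowerBound, upperBound, numPartitions
    rs => (rs.getInt(1), rs.getString(2)))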
I am trying to read a file into a single partition, but it seems like
sparkContext.textFile ignores the passed minPartitions value. I know I can
repartition the RDD, but I was curious whether this is expected behaviour or
a bug that needs to be investigated further.
Thanks Akhil, that will help a lot!
It turned out that spark-jobserver does not work in development mode, but
if you deploy a server it works (it looks like the dependencies are not
right when running jobserver from sbt).
On Thu, Jan 1, 2015 at 5:22 AM, Akhil Das ak...@sigmoidanalytics.com
Thanks, let me try that out.
From: Hitesh Khamesra hiteshk...@gmail.com
To: Sanjay Subramanian sanjaysubraman...@yahoo.com
Cc: Kapil Malik kma...@adobe.com; Sean Owen so...@cloudera.com;
user@spark.apache.org user@spark.apache.org
Sent: Thursday, January 1, 2015 9:46 AM
Subject: Re:
I have seen that link. I am using an RDD of byte arrays and Kryo
serialization. Inside mapPartition, when I measure the time, it is never
more than 1 ms, whereas the total time taken by the application is around
30 min. The codebase has a lot of dependencies; I'm trying to come up with a
simple version where I can reproduce it.
Thank you for opening a ticket for the issue.
On 2015/01/02 15:45, Akhil Das ak...@sigmoidanalytics.com wrote:
Yep, Opened SPARK-5054 https://issues.apache.org/jira/browse/SPARK-5054
Thanks
Best Regards
On Tue, Dec 30, 2014 at 5:52 AM, Michael Armbrust mich...@databricks.com
wrote:
Yeah, this looks like a regression in the API due to the addition of
arbitrary decimal support. Can you open a JIRA?
This log message is normal; in this case, it is saying that the final stage
needed to compute your job has no dependencies/parent stages, and therefore
there are no parent stages that need to be computed.
On Thu, Jan 1, 2015 at 11:02 PM, shahid sha...@trialx.com wrote:
hi guys
I have just started using Spark, and I am getting this as an INFO message:
15/01/02 11:54:17 INFO DAGScheduler: Parents of final stage: List()
15/01/02 11:54:17 INFO DAGScheduler: Missing parents: List()
15/01/02 11:54:17 INFO DAGScheduler: Submitting Stage 6 (PythonRDD[12] at
RDD at
Hi Ankit,
The optional number-of-partitions value is there to increase the number of
partitions, not to reduce it below the default.
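If you need a single partition, you can coalesce the RDD afterwards; a
minimal sketch with a hypothetical path:

  val rdd = sc.textFile("hdfs:///path/to/file") // minPartitions only raises the count
  val single = rdd.coalesce(1) // now exactly one partition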
On Thu, Jan 1, 2015 at 10:43 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
I am trying to read a file into a single partition but it seems like
sparkContext.textFile
Hi Anoop,
The Spark decision tree implementation supports regression and multi-class
classification, continuous and categorical features, and pruning; it does
not support missing features at present. You can probably think of it as
distributed CART, though personally I always find the acronyms
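For context, a minimal MLlib sketch using the Spark 1.x API and a
hypothetical data path:

  import org.apache.spark.mllib.tree.DecisionTree
  import org.apache.spark.mllib.util.MLUtils

  // binary classification on LibSVM-format data; the empty map means
  // all features are treated as continuous
  val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")
  val model = DecisionTree.trainClassifier(data, 2, Map[Int, Int](),
    "gini", 5, 32) // numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins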