Re: Exception writing on two Cassandra tables NoHostAvailableException: All host(s) tried for query failed (no host was tried)

2015-06-01 Thread Helena Edelson
Hi Antonio, First, what version of the Spark Cassandra Connector are you using? You are using Spark 1.3.1, which the Cassandra connector currently supports only in builds from the master branch - the release with public artifacts supporting Spark 1.3.1 is coming soon ;) Please see

Re: Exception writing on two Cassandra tables NoHostAvailableException: All host(s) tried for query failed (no host was tried)

2015-06-01 Thread Helena Edelson
) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 15/06/01 16:43:30 WARN TaskSetManager: Lost task 1.0 in stage 61.0 (TID 82, localhost): org.apache.spark.TaskKilledException at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194) A G 2015-06-01 13:26 GMT+02:00 Helena

Re: Grouping and storing unordered time series data stream to HDFS

2015-05-16 Thread Helena Edelson
Consider using Cassandra with Spark Streaming for time series; Cassandra has been doing time series for years. Here are some snippets with Kafka streaming and writing/reading the data back:

Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
Streaming _from_ Cassandra, CassandraInputDStream, is coming BTW: https://issues.apache.org/jira/browse/SPARK-6283 - I am working on it now. Helena @helenaedelson On Mar 23, 2015, at 5:22 AM, Khanderao Kand Gmail khanderao.k...@gmail.com wrote:

Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
Rizal anriza...@gmail.com wrote: Helena, The CassandraInputDStream sounds interesting. I don't find many things in the JIRA though. Do you have more details on what it tries to achieve? Thanks, Anwar. On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson helena.edel...@datastax.com

Re: How to parse Json formatted Kafka message in spark streaming

2015-03-05 Thread Helena Edelson
Hi Cui, What version of Spark are you using? There was a bug ticket that may be related to this, fixed in core/src/main/scala/org/apache/spark/rdd/RDD.scala, that is merged into versions 1.3.0 and 1.2.1. If you are using 1.1.1, that may be the reason, but it's a stretch

Re: How to parse Json formatted Kafka message in spark streaming

2015-03-05 Thread Helena Edelson
[MonthlyCommits] }.saveToCassandra("githubstats", "monthly_commits") HELENA EDELSON Senior Software Engineer, DSE Analytics On Mar 5, 2015, at 9:33 AM, Ted Yu yuzhih...@gmail.com wrote: Cui: You can check messages.partitions.size to determine whether messages is an empty RDD. Cheers

Re: Error: Spark-streaming to Cassandra

2014-12-13 Thread Helena Edelson
I am curious why you use the 1.0.4 java artifact with the latest 1.1.0? This might be your compilation problem - the older java version.

<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector_2.10</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>

Re: JSON Input files

2014-12-13 Thread Helena Edelson
One solution can be found here: https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets - Helena @helenaedelson On Dec 13, 2014, at 11:18 AM, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi Team, I have a large JSON file in Hadoop. Could you please let me know
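The linked guide's approach, sketched here as a hedged example for the Spark 1.1-era API — the HDFS path and field names are hypothetical, and each line of the input file must be a complete JSON object:

```scala
// Sketch assuming Spark 1.1 SQL (SQLContext.jsonFile); path and column
// names are made up for illustration. `sc` is an existing SparkContext.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val people = sqlContext.jsonFile("hdfs://namenode/data/people.json")
people.registerTempTable("people")
val adults = sqlContext.sql("SELECT name FROM people WHERE age > 21")
```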

Re: Spark-Streaming: output to cassandra

2014-12-05 Thread Helena Edelson
You can just do something like this; the Spark Cassandra Connector handles the rest: KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, Map(KafkaTopicRaw -> 10), StorageLevel.DISK_ONLY_2).map { case (_, line) => line.split(",") }
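Assembled into a fuller, hedged sketch — the keyspace, table, and two-column split are assumptions, not from the thread:

```scala
// Sketch assuming Spark 1.1 streaming and spark-cassandra-connector 1.1.
// "my_keyspace"/"my_table" and the field layout are hypothetical.
import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.streaming._

def streamToCassandra(ssc: StreamingContext,
                      kafkaParams: Map[String, String],
                      topic: String): Unit =
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map(topic -> 10), StorageLevel.DISK_ONLY_2)
    .map { case (_, line) => line.split(",") }      // payload is CSV, by assumption
    .map(fields => (fields(0), fields(1)))          // tuple maps to the table's columns
    .saveToCassandra("my_keyspace", "my_table")
```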

Re: Spark-Streaming: output to cassandra

2014-12-05 Thread Helena Edelson
. Thanks and Regards, Md. Aiman Sarosh. Accenture Services Pvt. Ltd. Mob #: (+91) - 9836112841. From: Helena Edelson helena.edel...@datastax.com Sent: Friday, December 5, 2014 6:26 PM To: Sarosh, M. Cc: user@spark.apache.org Subject: Re: Spark-Streaming: output to cassandra You

Re: Spark streaming cannot receive any message from Kafka

2014-11-13 Thread Helena Edelson
I encounter no issues with streaming from kafka to spark in 1.1.0. Do you perhaps have a version conflict? Helena On Nov 13, 2014 12:55 AM, Jay Vyas jayunit100.apa...@gmail.com wrote: Yup, very important that n > 1 for spark streaming jobs. If local, use local[2]. The thing to remember is
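The local[2] point, as a minimal hedged setup — the app name and batch interval are arbitrary:

```scala
// A streaming receiver occupies one thread, so "local" (a single thread)
// leaves nothing for processing; "local[2]" is the minimum for local runs.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("kafka-test")
val ssc = new StreamingContext(conf, Seconds(5))
```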

Re: Cassandra spark connector exception: NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;

2014-11-11 Thread Helena Edelson
Hi, It looks like you are building from master (spark-cassandra-connector-assembly-1.2.0). - Append this to your com.google.guava declaration: % "provided" - Be sure your version of the connector dependency is the same as the assembly build. For instance, if you are using 1.1.0-beta1, build your
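In sbt form, the suggestion might look like this — the guava version shown is a guess; match it to what your connector build expects:

```scala
// build.sbt sketch; versions are assumptions. Keep the connector dependency
// at the same version as the assembly jar you build.
libraryDependencies ++= Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1",
  "com.google.guava" % "guava" % "16.0.1" % "provided"
)
```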

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
Hi Harold, Can you include the versions of spark and spark-cassandra-connector you are using? Thanks! Helena @helenaedelson On Oct 30, 2014, at 12:58 PM, Harold Nguyen har...@nexgate.com wrote: Hi all, I'd like to be able to modify values in a DStream, and then send it off to an

Re: Accessing Cassandra with SparkSQL, Does not work?

2014-10-31 Thread Helena Edelson
Hi Shahab, I’m just curious, are you explicitly needing to use thrift? Just using the connector with spark does not require any thrift dependencies. Simply: "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1" But to your question, you declare the keyspace but also unnecessarily

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
, Harold On Fri, Oct 31, 2014 at 10:31 AM, Helena Edelson helena.edel...@datastax.com wrote: Hi Harold, Can you include the versions of spark and spark-cassandra-connector you are using? Thanks! Helena @helenaedelson On Oct 30, 2014, at 12:58 PM, Harold Nguyen har...@nexgate.com

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Helena Edelson
Hi Harold, This is a great use case, and here is how you could do it, for example, with Spark Streaming: Using a Kafka stream: https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrweather/KafkaStreamingActor.scala#L50 Save raw data to

Re: Accessing Cassandra with SparkSQL, Does not work?

2014-10-31 Thread Helena Edelson
, "org.slf4j" % "slf4j-api" % "1.7.7", "org.slf4j" % "slf4j-simple" % "1.7.7", "org.clapper" %% "grizzled-slf4j" % "1.0.2", "log4j" % "log4j" % "1.2.17" On Fri, Oct 31, 2014 at 6:42 PM, Helena Edelson helena.edel...@datastax.com wrote: Hi Shahab, I’m just curious, are you explicitly needing

Re: Best way to partition RDD

2014-10-30 Thread Helena Edelson
-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L26-L37 Cheers, Helena @helenaedelson On Oct 30, 2014, at 1:12 PM, Helena Edelson helena.edel...@datastax.com wrote: Hi Shahab, -How many spark/cassandra nodes are in your cluster? -What is your deploy

Re: PySpark and Cassandra 2.1 Examples

2014-10-29 Thread Helena Edelson
Nice! - Helena @helenaedelson On Oct 29, 2014, at 12:01 PM, Mike Sukmanowsky mike.sukmanow...@gmail.com wrote: Hey all, Just thought I'd share this with the list in case any one else would benefit. Currently working on a proper integration of PySpark and DataStax's new

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Helena Edelson
Hi Harold, It seems like, based on your previous post, you are using one version of the connector as a dependency yet building the assembly jar from master? You were using 1.1.0-alpha3 (you can upgrade to alpha4, beta coming this week) yet your assembly is

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Helena Edelson
absolutely new to Spark and Scala and sbt). I'll write a blog post on how to get this working later, in case it can help someone. I really appreciate the help! Harold On Tue, Oct 28, 2014 at 11:55 AM, Helena Edelson helena.edel...@datastax.com wrote: Hi Harold, It seems like, based

Re: NoSuchMethodError: cassandra.thrift.ITransportFactory.openTransport()

2014-10-27 Thread Helena Edelson
Hi Sasi, Thrift is not needed to integrate Cassandra with Spark. In fact the only dependency you need is spark-cassandra-connector_2.10-1.1.0-alpha3.jar, and you can upgrade to alpha4; we’re publishing beta very soon. For future reference, questions/tickets can be created

Re: Spark as Relational Database

2014-10-26 Thread Helena Edelson
Hi, It is very easy to integrate Cassandra in a use case such as this. For instance, do your joins in Spark and your data storage in Cassandra, which allows a very flexible schema (unlike a relational DB) and is much faster and fault tolerant, with spark and colocation WRT data
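A hedged sketch of that split — joins in Spark, storage in Cassandra. The keyspace, tables, and columns are made up for illustration:

```scala
// Read two Cassandra tables as pair RDDs keyed on user_id, then join in Spark.
// `sc` is an existing SparkContext; "shop"/"users"/"orders" are hypothetical.
import org.apache.spark.SparkContext._   // pair-RDD functions (join) in Spark 1.x
import com.datastax.spark.connector._

val users = sc.cassandraTable("shop", "users")
  .map(row => (row.getInt("user_id"), row.getString("name")))
val orders = sc.cassandraTable("shop", "orders")
  .map(row => (row.getInt("user_id"), row.getString("item")))

val joined = users.join(orders)   // RDD[(Int, (String, String))]
```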