Re: saveAsTextFile error

2014-11-14 Thread Harold Nguyen
Hi Niko, It looks like you are calling a method that does not exist on DStream. Check out: https://spark.apache.org/docs/1.1.0/streaming-programming-guide.html#output-operations-on-dstreams for the method saveAsTextFiles Harold On Fri, Nov 14, 2014 at 10:39 AM, Niko Gamulin
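
To illustrate the distinction in the reply above: DStream offers saveAsTextFiles (plural), while saveAsTextFile (singular) belongs to RDD. The following is a minimal sketch, not the original poster's code. The app name, host, port, and output paths are hypothetical, and it assumes Spark Streaming is on the classpath and a text source is listening on the socket.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

object SaveStreamSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and local master; adjust for a real cluster.
    val conf = new SparkConf().setAppName("SaveStreamSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: a text server on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)

    // DStream method: writes one directory per batch, named
    // "<prefix>-<batch time in ms>.<suffix>".
    lines.saveAsTextFiles("output/lines", "txt")

    // RDD method: reachable per batch through foreachRDD if custom naming is needed.
    lines.foreachRDD { (rdd: RDD[String], time: Time) =>
      rdd.saveAsTextFile(s"output/manual-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Calling saveAsTextFile directly on the DStream does not compile, which matches the error described in the thread title.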

Re: Can spark read and write to cassandra without HDFS?

2014-11-12 Thread Harold Nguyen
Hi Kevin, Yes, Spark can read and write to Cassandra without Hadoop. Have you seen this: https://github.com/datastax/spark-cassandra-connector Harold On Wed, Nov 12, 2014 at 9:28 PM, Kevin Burton bur...@spinn3r.com wrote: We have all our data in Cassandra so I’d prefer to not have to bring
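
As a rough illustration of the connector linked above, the sketch below reads a table and writes rows back using only Spark and Cassandra, with no HDFS involved. The keyspace "test", the table "kv" with columns "key" and "value", and the host address are assumptions made for the example, and the connector jar must be on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object CassandraRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraRoundTrip")                     // hypothetical name
      .setMaster("local[2]")                                // local mode, no Hadoop cluster
      .set("spark.cassandra.connection.host", "127.0.0.1")  // assumed Cassandra node
    val sc = new SparkContext(conf)

    // Read: expose the assumed table test.kv as an RDD of rows.
    val rows = sc.cassandraTable("test", "kv")
    rows.collect().foreach(println)

    // Write: save tuples into the same table, mapping tuple fields to columns.
    sc.parallelize(Seq(("key3", 3), ("key4", 4)))
      .saveToCassandra("test", "kv", SomeColumns("key", "value"))

    sc.stop()
  }
}
```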

Spark Streaming - Most popular Twitter Hashtags

2014-11-03 Thread Harold Nguyen
Hi all, I was just reading this nice documentation here: http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html And got to the end of it, which says: Note that there are more efficient ways to get the top 10 hashtags. For example, instead of sorting the entire of
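
One common way to get the top few items without sorting everything is a bounded min-heap, sketched below in plain Scala. This is an illustration of the general idea only, not necessarily the approach the tutorial has in mind. The object name TopK, the helper topK, and the sample counts are all made up for the example.

```scala
import scala.collection.mutable

object TopK {
  // Returns the k pairs with the largest counts, in descending order of count.
  // Only k + 1 elements are ever held in the heap at once.
  def topK(counts: Iterator[(String, Int)], k: Int): List[(String, Int)] = {
    // Reversed ordering turns Scala's max-heap into a min-heap on the count.
    val byCountReversed = Ordering.by[(String, Int), Int](_._2).reverse
    val heap = mutable.PriorityQueue.empty[(String, Int)](byCountReversed)
    counts.foreach { pair =>
      heap.enqueue(pair)
      if (heap.size > k) heap.dequeue() // evict the current smallest count
    }
    heap.toList.sortBy(p => -p._2)
  }

  def main(args: Array[String]): Unit = {
    val counts = Iterator("#spark" -> 42, "#scala" -> 17, "#kafka" -> 8, "#hadoop" -> 23)
    println(topK(counts, 2)) // prints List((#spark,42), (#hadoop,23))
  }
}
```

In Spark itself, RDD.top(n) with a suitable Ordering is a built-in alternative that avoids writing the heap by hand.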

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Harold Nguyen
Thanks Lalit, and Helena, What I'd like to do is manipulate the values within a DStream like this: DStream.foreachRDD( rdd => { val arr = record.toArray } I'd then like to be able to insert results from the arr back into Cassandra, after I've manipulated the arr array. However, for all

NonSerializable Exception in foreachRDD

2014-10-30 Thread Harold Nguyen
Hi all, In Spark Streaming, when I do foreachRDD on my DStreams, I get a NonSerializable exception when I try to do something like: DStream.foreachRDD( rdd => { var sc.parallelize(Seq(("test", "blah"))) }) Is there any way around that? Thanks, Harold

Re: Manipulating RDDs within a DStream

2014-10-30 Thread Harold Nguyen
Hi, Sorry, there's a typo there: val arr = rdd.toArray Harold On Thu, Oct 30, 2014 at 9:58 AM, Harold Nguyen har...@nexgate.com wrote: Hi all, I'd like to be able to modify values in a DStream, and then send it off to an external source like Cassandra, but I keep getting Serialization

Manipulating RDDs within a DStream

2014-10-30 Thread Harold Nguyen
Hi all, I'd like to be able to modify values in a DStream, and then send it off to an external source like Cassandra, but I keep getting Serialization errors and am not sure how to use the correct design pattern. I was wondering if you could help me. I'd like to be able to do the following:

Spark Streaming from Kafka

2014-10-29 Thread Harold Nguyen
Hi, Just wondering if you've seen the following error when reading from Kafka: ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.NoClassDefFoundError: scala/reflect/ClassManifest at kafka.utils.Log4jController$.<init>(Log4jController.scala:29) at

Spark Streaming with Kinesis

2014-10-29 Thread Harold Nguyen
Hi all, I followed the guide here: http://spark.apache.org/docs/latest/streaming-kinesis-integration.html But got this error: Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider Would you happen to know what dependency or jar is needed? Harold

Re: Spark Streaming with Kinesis

2014-10-29 Thread Harold Nguyen
, 2014 at 9:22 AM, Harold Nguyen har...@nexgate.com wrote: Hi all, I followed the guide here: http://spark.apache.org/docs/latest/streaming-kinesis-integration.html But got this error: Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider Would

Convert DStream to String

2014-10-29 Thread Harold Nguyen
Hi all, How do I convert a DStream to a string? For instance, I want to be able to: val myword = words.filter(word => word.startsWith("blah")) And use myword in other places, like tacking it onto (key, value) pairs, like so: val pairs = words.map(word => (myword+"_"+word, 1)) Thanks for any help,

Re: Convert DStream to String

2014-10-29 Thread Harold Nguyen
but which are usually RDDs of things. On Wed, Oct 29, 2014 at 11:15 PM, Harold Nguyen har...@nexgate.com wrote: Hi all, How do I convert a DStream to a string? For instance, I want to be able to: val myword = words.filter(word => word.startsWith("blah")) And use myword in other

Saving to Cassandra from Spark Streaming

2014-10-28 Thread Harold Nguyen
Hi all, I'm having trouble troubleshooting this particular block of code for Spark Streaming and saving to Cassandra: val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER) val words = lines.flatMap(_.split(" ")) val wordCounts = words.map(x => (x,

Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Harold Nguyen
Hi all, The following works fine when submitting dependency jars through Spark-Shell: ./bin/spark-shell --master spark://ip-172-31-38-112:7077 --jars

Spark Streaming into Cassandra - NoClass ColumnMapper

2014-10-27 Thread Harold Nguyen
Hi Spark friends, I'm trying to connect Spark Streaming into Cassandra by modifying the NetworkWordCount.scala streaming example, making as few changes as possible while having it insert data into Cassandra. Could you let me know if you see any errors? I'm using the

Re: Spark Streaming into Cassandra - NoClass ColumnMapper

2014-10-27 Thread Harold Nguyen
, Oct 27, 2014 at 9:22 PM, Harold Nguyen har...@nexgate.com wrote: Hi Spark friends, I'm trying to connect Spark Streaming into Cassandra by modifying the NetworkWordCount.scala streaming example, making as few changes as possible while having it insert data into Cassandra. Could you