RE: streaming sequence files?

2014-07-28 Thread Barnaby Falls
e cluster resources may get consumed by
> the application.
> http://spark.apache.org/docs/latest/spark-standalone.html
>
> TD
>
> On Thu, Jul 24, 2014 at 4:57 PM, Barnaby wrote:
> > I have the streaming program writing sequence files. I can find one of the
> > fil

Re: streaming sequence files?

2014-07-24 Thread Barnaby
I have the streaming program writing sequence files. I can find one of the files and load it in the shell using:
scala> val rdd = sc.sequenceFile[String, Int]("tachyon://localhost:19998/files/WordCounts/20140724-213930")
14/07/24 21:47:50 INFO storage.MemoryStore: ensureFreeSpace(32856) called wit

streaming sequence files?

2014-07-23 Thread Barnaby
If I save an RDD as a sequence file such as:
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.foreachRDD( d => {
  d.saveAsSequenceFile("tachyon://localhost:19998/files/WordCounts-" + (new SimpleDateFormat("MMdd-HHmmss") format Calendar.getInstance.getTime).t

Re: saveAsSequenceFile for DStream

2014-07-22 Thread Barnaby Falls
sequenceFileStream() method? Thanks again for your help.
> On Jul 22, 2014, at 1:57, "Sean Owen" wrote:
>
> What about simply:
>
> dstream.foreachRDD(_.saveAsSequenceFile(...))
>
> ?
>
>> On Tue, Jul 22, 2014 at 2:06 AM, Barnaby wrote:
>> First of all, I do no

saveAsSequenceFile for DStream

2014-07-21 Thread Barnaby
First of all, I do not know Scala, but I am learning. I'm doing a proof of concept by streaming content from a socket, counting the words, and writing the counts to a Tachyon disk. A different script will read the file stream and print out the results.
val lines = ssc.socketTextStream(args(0), args(1).toInt, St