> …the cluster resources may get consumed by
> the application.
> http://spark.apache.org/docs/latest/spark-standalone.html
>
> TD
>
> On Thu, Jul 24, 2014 at 4:57 PM, Barnaby wrote:
> > I have the streaming program writing sequence files. I can find one of the
> > files and load it in the shell using: …
I have the streaming program writing sequence files. I can find one of the
files and load it in the shell using:
scala> val rdd = sc.sequenceFile[String,
Int]("tachyon://localhost:19998/files/WordCounts/20140724-213930")
14/07/24 21:47:50 INFO storage.MemoryStore: ensureFreeSpace(32856) called
with …
If I save an RDD as a sequence file such as:
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.foreachRDD( d => {
  d.saveAsSequenceFile("tachyon://localhost:19998/files/WordCounts-" +
    (new SimpleDateFormat("MMdd-HHmmss") format Calendar.getInstance.getTime))
})
can I then read those files back in as a stream with the
sequenceFileStream() method?
Thanks again for your help.
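[Editor's note: as far as I can tell, StreamingContext does not expose a sequenceFileStream() helper; sequence files written this way are usually read back with fileStream, parameterized with the Hadoop Writable types that saveAsSequenceFile produces for (String, Int) pairs. A minimal sketch, assuming the directory URI from this thread; the app name and batch interval are made up:]

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReadWordCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadWordCounts") // hypothetical app name
    val ssc  = new StreamingContext(conf, Seconds(10))      // batch interval is a guess

    // fileStream monitors the directory and picks up newly written files.
    // (String, Int) pairs are stored as (Text, IntWritable) in sequence files.
    val counts = ssc.fileStream[Text, IntWritable,
        SequenceFileInputFormat[Text, IntWritable]](
        "tachyon://localhost:19998/files/WordCounts")
      .map { case (k, v) => (k.toString, v.get) } // back to plain Scala types

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```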
> On Jul 22, 2014, at 1:57, "Sean Owen" wrote:
>
> What about simply:
>
> dstream.foreachRDD(_.saveAsSequenceFile(...))
>
> ?
>
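[Editor's note: expanding Sean's one-liner with the timestamped naming from the earlier snippet — the path construction itself is plain Java/Scala and can be pulled into a small helper. The base URI is the one from the thread; the helper name is made up:]

```scala
import java.text.SimpleDateFormat
import java.util.Date

// Same "MMdd-HHmmss" pattern as the original snippet: each batch gets
// its own output directory, so saveAsSequenceFile never overwrites.
def outputPath(base: String, time: Date): String =
  base + new SimpleDateFormat("MMdd-HHmmss").format(time)

// In the streaming job this would plug into Sean's suggestion, e.g.:
//   dstream.foreachRDD { rdd =>
//     rdd.saveAsSequenceFile(
//       outputPath("tachyon://localhost:19998/files/WordCounts-", new Date))
//   }
```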
>> On Tue, Jul 22, 2014 at 2:06 AM, Barnaby wrote:
>> First of all, I do not know Scala, but I am learning. …
First of all, I do not know Scala, but I am learning.
I'm doing a proof of concept: streaming content from a socket, counting
the words, and writing the results to a Tachyon disk. A different script will
read the file stream and print out the results.
val lines = ssc.socketTextStream(args(0), args(1).toInt,
St…
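[Editor's note: the counting stage used throughout this thread, words.map(x => (x, 1)).reduceByKey(_ + _), has a direct plain-Scala equivalent, which can help when checking the logic without a cluster; the sample input is made up:]

```scala
// Plain-collections version of the Spark transformation
// words.map(x => (x, 1)).reduceByKey(_ + _)
def wordCounts(words: Seq[String]): Map[String, Int] =
  words.map(w => (w, 1))
    .groupBy(_._1)
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

// wordCounts(Seq("to", "be", "or", "not", "to", "be"))
//   → Map("to" -> 2, "be" -> 2, "or" -> 1, "not" -> 1)
```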