Running as a standalone cluster. From my monitoring console:

Spark Master at spark://101.73.54.149:7077
* URL: spark://101.73.54.149:7077
* Workers: 1
* Cores: 2 Total, 0 Used
* Memory: 2.4 GB Total, 0.0 B Used
* Applications: 0 Running, 24 Completed
* Drivers: 0 Running, 0 Completed
* Status: ALIVE

Workers
Id                                         Address              State  Cores       Memory
worker-20140723222518-101.73.54.149-37995  101.73.54.149:37995  ALIVE  2 (0 Used)  2.4 GB (0.0 B Used)
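With only 2 cores total, the first app submitted grabs both by default, which matches TD's explanation below. A minimal sketch of capping an application's share (untested; spark.cores.max is the setting described in the standalone docs TD links, while the app name and 10-second batch interval here are just placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Cap this app at 1 core so a second app can use the other core.
    val conf = new SparkConf()
      .setAppName("SequenceFileWriter")  // placeholder name
      .set("spark.cores.max", "1")       // upper bound on cores this app claims
    val ssc = new StreamingContext(conf, Seconds(10))

With that set in both programs, each should get one of the two cores instead of one app holding the whole cluster.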
> From: tathagata.das1...@gmail.com
> Date: Sat, 26 Jul 2014 20:14:37 -0700
> Subject: Re: streaming sequence files?
> To: user@spark.apache.org
> CC: u...@spark.incubator.apache.org
>
> Which deployment environment are you running the streaming programs in?
> Standalone? In that case you have to specify the max cores for each
> application; otherwise all the cluster resources may get consumed by
> one application.
> http://spark.apache.org/docs/latest/spark-standalone.html
>
> TD
>
> On Thu, Jul 24, 2014 at 4:57 PM, Barnaby <bfa...@outlook.com> wrote:
> > I have the streaming program writing sequence files. I can find one of the
> > files and load it in the shell using:
> >
> > scala> val rdd = sc.sequenceFile[String,
> > Int]("tachyon://localhost:19998/files/WordCounts/20140724-213930")
> > 14/07/24 21:47:50 INFO storage.MemoryStore: ensureFreeSpace(32856) called
> > with curMem=0, maxMem=309225062
> > 14/07/24 21:47:50 INFO storage.MemoryStore: Block broadcast_0 stored as
> > values to memory (estimated size 32.1 KB, free 294.9 MB)
> > rdd: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[1] at sequenceFile
> > at <console>:12
> >
> > So I got some type information; that seems good.
> >
> > It took a while to research, but I got the following streaming code to
> > compile and run:
> >
> > val wordCounts = ssc.fileStream[String, Int, SequenceFileInputFormat[String,
> > Int]](args(0))
> >
> > It works now, and I offer it for reference to anybody else who may be
> > curious about saving sequence files and then streaming them back in.
> >
> > Question:
> > When running both streaming programs at the same time using spark-submit, I
> > noticed that only one app would really run. To get the one app to continue, I
> > had to stop the other app. Is there a way to get these running
> > simultaneously?
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/streaming-sequence-files-tp10557p10620.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
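For reference, here is the read side as one self-contained sketch. The fileStream call and its type parameters are copied from the working snippet quoted above; the object name, 10-second batch interval, and reading the input directory from args(0) are placeholders, and the import assumes the new-API (org.apache.hadoop.mapreduce) SequenceFileInputFormat, which fileStream expects:

    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SequenceFileStreamer {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("SequenceFileStreamer")  // placeholder name
          .set("spark.cores.max", "1")         // leave a core for the writer app
        val ssc = new StreamingContext(conf, Seconds(10))

        // Pick up each new sequence file of (String, Int) pairs as it
        // appears in the monitored directory.
        val wordCounts =
          ssc.fileStream[String, Int, SequenceFileInputFormat[String, Int]](args(0))
        wordCounts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }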