Re: Does Spark Streaming calculate during a batch?
On Thu, Nov 13, 2014 at 11:02 AM, Sean Owen wrote:
> Yes. Data is collected for 5 minutes, then processing starts at the end. The result may be an arbitrary function of the data in the interval, so the interval has to finish before computation can start.

Thanks everyone.
Does Spark Streaming calculate during a batch?
I was running a proof of concept for my company with Spark Streaming, and the conclusion I came to is that Spark collects data for the batch duration, THEN starts the data-pipeline calculations. My batch size was 5 minutes, and the CPU was all but dead for those 5 minutes; then, when the 5 minutes were up, the CPUs would spike for a while, presumably doing the calculations. Is this presumption true, or is it running the data through the calculation pipeline before the batch is up? What could lead to the periodic CPU spike - I had a reduceByKey, so was it doing that only after all the batch data was in? Thanks
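For what it's worth, this collect-then-compute behavior is inherent to the micro-batch model: nothing in the pipeline (including the reduceByKey) runs until the interval closes. If the goal is to keep the CPUs busier rather than spike every five minutes, one commonly suggested shape is a shorter batch interval with a five-minute window on top of it. A rough, untested sketch; the source, host, port, and durations are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._   // pair-DStream operations in Spark 1.x

object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowedWordCount")
    // 30-second batches spread the work out; the window still aggregates 5 minutes of data.
    val ssc = new StreamingContext(conf, Seconds(30))

    val lines = ssc.socketTextStream("localhost", 9999)   // placeholder source
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Minutes(5), Minutes(5))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}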
Re: ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId
Can you list what your fix was so others can benefit?

On Wed, Oct 22, 2014 at 8:15 PM, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote:
> Hi,
>
> I have managed to resolve it; it was caused by a wrong setting. Please ignore this.
>
> Regards
> Arthur
>
> On 23 Oct, 2014, at 5:14 am, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote:
>
>> 14/10/23 05:09:04 WARN ConnectionManager: All connections not cleaned up
Re: Submission to cluster fails (Spark SQL; NoSuchMethodError on SchemaRDD)
For posterity's sake, I solved this. The problem was the Cloudera cluster I was submitting to is running 1.0, and I was compiling against the latest 1.1 release. Downgrading to 1.0 on my compile got me past this. On Tue, Oct 14, 2014 at 6:08 PM, Michael Campbell < michael.campb...@gmail.com> wrote: > Hey all, I'm trying a very basic spark SQL job and apologies as I'm new to > a lot of this, but I'm getting this failure: > > Exception in thread "main" java.lang.NoSuchMethodError: > org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row; > > I've tried a variety of uber-jar creation, but it always comes down to > this. Right now my jar has *NO* dependencies on other jars (other than > Spark itself), and my "uber jar" contains essentially only the .class file > of my own code. > > My own code simply reads a parquet file and does a count(*) on the > contents. I'm sure this is very basic, but I'm at a loss. > > Thoughts and/or debugging tips welcome. > > Here's my code, which I call from the "main" object which passes in the > context, Parquet file name, and some arbitrary sql, (which for this is just > "select count(*) from flows"). > > I have run the equivalent successfully in spark-shell in the cluster. > > def readParquetFile(sparkCtx: SparkContext, dataFileName: String, sql: > String) = { > val sqlContext = new SQLContext(sparkCtx) > val parquetFile = sqlContext.parquetFile(dataFileName) > > parquetFile.registerAsTable("flows") > > println(s"About to run $sql") > > val start = System.nanoTime() > val countRDD = sqlContext.sql(sql) > val rows: Array[Row] = countRDD.take(1) // DIES HERE > val stop = System.nanoTime() > > println(s"result: ${rows(0)}") > println(s"Query took ${(stop - start) / 1e9} seconds.") > println(s"Query was $sql") > > } > > > >
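For anyone searching later: the NoSuchMethodError here came down to compiling against a different Spark release than the cluster runs. One way to guard against that (a sketch only; the project name, versions, and Scala release below are examples, not a prescription) is to pin the Spark dependencies to the cluster's version and mark them "provided" so the uber jar never ships its own copy of Spark:

// build.sbt (illustrative)
name := "spark-sql-poc"

scalaVersion := "2.10.4"

// Match these to the Spark release actually installed on the cluster,
// and keep them out of the assembly with "provided".
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.0.0" % "provided"
)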
Spark Bug? job fails to run when given options on spark-submit (but starts and fails without)
TL;DR - a spark SQL job fails with an OOM (Out of heap space) error. If given "--executor-memory" values, it won't even start. Even (!) if the values given ARE THE SAME AS THE DEFAULT.

Without --executor-memory:

14/10/16 17:14:58 INFO TaskSetManager: Serialized task 1.0:64 as 14710 bytes in 1 ms
14/10/16 17:14:58 WARN TaskSetManager: Lost TID 26 (task 1.0:25)
14/10/16 17:14:58 WARN TaskSetManager: Loss was due to java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
        at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:609)
        at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
...

USING --executor-memory (WITH ANY VALUE), even "1G" which is the default:

Parsed arguments:
  master          spark://:7077
  deployMode      null
  executorMemory  1G
  ...

System properties:
  spark.executor.memory -> 1G
  spark.eventLog.enabled -> true
  ...

14/10/16 17:14:23 INFO TaskSchedulerImpl: Adding task set 1.0 with 678 tasks
14/10/16 17:14:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Spark 1.0.0. Is this a bug?
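Not necessarily a bug diagnosis, but the "Initial job has not accepted any resources" warning generally means no registered worker could satisfy the resources the submit asked for, so it's worth comparing the requested executor memory against what each worker advertises on the cluster UI. A hypothetical invocation for reference; the class name, master URL, jar path, and memory value are placeholders:

# keep --executor-memory at or below the per-worker memory shown in the master's web UI
spark-submit \
  --class com.example.FlowCount \
  --master spark://your-master:7077 \
  --executor-memory 512m \
  target/scala-2.10/flow-count-assembly.jar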
Re: jsonRDD: NoSuchMethodError
How did you resolve it? On Tue, Jul 15, 2014 at 3:50 AM, SK wrote: > The problem is resolved. Thanks. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/jsonRDD-NoSuchMethodError-tp9688p9742.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. >
Submission to cluster fails (Spark SQL; NoSuchMethodError on SchemaRDD)
Hey all, I'm trying a very basic Spark SQL job and apologies as I'm new to a lot of this, but I'm getting this failure:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row;

I've tried a variety of uber-jar creation, but it always comes down to this. Right now my jar has *NO* dependencies on other jars (other than Spark itself), and my "uber jar" contains essentially only the .class file of my own code.

My own code simply reads a parquet file and does a count(*) on the contents. I'm sure this is very basic, but I'm at a loss.

Thoughts and/or debugging tips welcome.

Here's my code, which I call from the "main" object, which passes in the context, Parquet file name, and some arbitrary SQL (which for this is just "select count(*) from flows").

I have run the equivalent successfully in spark-shell in the cluster.

def readParquetFile(sparkCtx: SparkContext, dataFileName: String, sql: String) = {
  val sqlContext = new SQLContext(sparkCtx)
  val parquetFile = sqlContext.parquetFile(dataFileName)

  parquetFile.registerAsTable("flows")

  println(s"About to run $sql")

  val start = System.nanoTime()
  val countRDD = sqlContext.sql(sql)
  val rows: Array[Row] = countRDD.take(1) // DIES HERE
  val stop = System.nanoTime()

  println(s"result: ${rows(0)}")
  println(s"Query took ${(stop - start) / 1e9} seconds.")
  println(s"Query was $sql")
}
Re: can't print DStream after reduce
I think you typo'd the jira id; it should be https://issues.apache.org/jira/browse/SPARK-2475 "Check whether #cores > #receivers in local mode" On Mon, Jul 14, 2014 at 3:57 PM, Tathagata Das wrote: > The problem is not really for local[1] or local. The problem arises when > there are more input streams than there are cores. > But I agree, for people who are just beginning to use it by running it > locally, there should be a check addressing this. > > I made a JIRA for this. > https://issues.apache.org/jira/browse/SPARK-2464 > > TD > > > On Sun, Jul 13, 2014 at 4:26 PM, Sean Owen wrote: > >> How about a PR that rejects a context configured for local or local[1]? >> As I understand it is not intended to work and has bitten several people. >> On Jul 14, 2014 12:24 AM, "Michael Campbell" >> wrote: >> >>> This almost had me not using Spark; I couldn't get any output. It is >>> not at all obvious what's going on here to the layman (and to the best of >>> my knowledge, not documented anywhere), but now you know you'll be able to >>> answer this question for the numerous people that will also have it. >>> >>> >>> On Sun, Jul 13, 2014 at 5:24 PM, Walrus theCat >>> wrote: >>> >>>> Great success! >>>> >>>> I was able to get output to the driver console by changing the >>>> construction of the Streaming Spark Context from: >>>> >>>> val ssc = new StreamingContext("local" /**TODO change once a cluster >>>> is up **/, >>>> "AppName", Seconds(1)) >>>> >>>> >>>> to: >>>> >>>> val ssc = new StreamingContext("local[2]" /**TODO change once a cluster >>>> is up **/, >>>> "AppName", Seconds(1)) >>>> >>>> >>>> I found something that tipped me off that this might work by digging >>>> through this mailing list. >>>> >>>> >>>> On Sun, Jul 13, 2014 at 2:00 PM, Walrus theCat >>>> wrote: >>>> >>>>> More strange behavior: >>>>> >>>>> lines.foreachRDD(x => println(x.first)) // works >>>>> lines.foreachRDD(x => println((x.count,x.first))) // no output is >>>>> printed to driver console >>>>> >>>>> >>>>> >>>>> >>>>> On Sun, Jul 13, 2014 at 11:47 AM, Walrus theCat < >>>>> walrusthe...@gmail.com> wrote: >>>>> >>>>>> >>>>>> Thanks for your interest. >>>>>> >>>>>> lines.foreachRDD(x => println(x.count)) >>>>>> >>>>>> And I got 0 every once in a while (which I think is strange, because >>>>>> lines.print prints the input I'm giving it over the socket.) >>>>>> >>>>>> >>>>>> When I tried: >>>>>> >>>>>> lines.map(_->1).reduceByKey(_+_).foreachRDD(x => println(x.count)) >>>>>> >>>>>> I got no count. >>>>>> >>>>>> Thanks >>>>>> >>>>>> >>>>>> On Sun, Jul 13, 2014 at 11:34 AM, Tathagata Das < >>>>>> tathagata.das1...@gmail.com> wrote: >>>>>> >>>>>>> Try doing DStream.foreachRDD and then printing the RDD count and >>>>>>> further inspecting the RDD. >>>>>>> On Jul 13, 2014 1:03 AM, "Walrus theCat" >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have a DStream that works just fine when I say: >>>>>>>> >>>>>>>> dstream.print >>>>>>>> >>>>>>>> If I say: >>>>>>>> >>>>>>>> dstream.map(_,1).print >>>>>>>> >>>>>>>> that works, too. However, if I do the following: >>>>>>>> >>>>>>>> dstream.reduce{case(x,y) => x}.print >>>>>>>> >>>>>>>> I don't get anything on my console. What's going on? >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >
Re: can't print DStream after reduce
Thank you Tathagata. This had me going for far too long. On Mon, Jul 14, 2014 at 3:57 PM, Tathagata Das wrote: > The problem is not really for local[1] or local. The problem arises when > there are more input streams than there are cores. > But I agree, for people who are just beginning to use it by running it > locally, there should be a check addressing this. > > I made a JIRA for this. > https://issues.apache.org/jira/browse/SPARK-2464 > > TD > > > On Sun, Jul 13, 2014 at 4:26 PM, Sean Owen wrote: > >> How about a PR that rejects a context configured for local or local[1]? >> As I understand it is not intended to work and has bitten several people. >> On Jul 14, 2014 12:24 AM, "Michael Campbell" >> wrote: >> >>> This almost had me not using Spark; I couldn't get any output. It is >>> not at all obvious what's going on here to the layman (and to the best of >>> my knowledge, not documented anywhere), but now you know you'll be able to >>> answer this question for the numerous people that will also have it. >>> >>> >>> On Sun, Jul 13, 2014 at 5:24 PM, Walrus theCat >>> wrote: >>> >>>> Great success! >>>> >>>> I was able to get output to the driver console by changing the >>>> construction of the Streaming Spark Context from: >>>> >>>> val ssc = new StreamingContext("local" /**TODO change once a cluster >>>> is up **/, >>>> "AppName", Seconds(1)) >>>> >>>> >>>> to: >>>> >>>> val ssc = new StreamingContext("local[2]" /**TODO change once a cluster >>>> is up **/, >>>> "AppName", Seconds(1)) >>>> >>>> >>>> I found something that tipped me off that this might work by digging >>>> through this mailing list. >>>> >>>> >>>> On Sun, Jul 13, 2014 at 2:00 PM, Walrus theCat >>>> wrote: >>>> >>>>> More strange behavior: >>>>> >>>>> lines.foreachRDD(x => println(x.first)) // works >>>>> lines.foreachRDD(x => println((x.count,x.first))) // no output is >>>>> printed to driver console >>>>> >>>>> >>>>> >>>>> >>>>> On Sun, Jul 13, 2014 at 11:47 AM, Walrus theCat < >>>>> walrusthe...@gmail.com> wrote: >>>>> >>>>>> >>>>>> Thanks for your interest. >>>>>> >>>>>> lines.foreachRDD(x => println(x.count)) >>>>>> >>>>>> And I got 0 every once in a while (which I think is strange, because >>>>>> lines.print prints the input I'm giving it over the socket.) >>>>>> >>>>>> >>>>>> When I tried: >>>>>> >>>>>> lines.map(_->1).reduceByKey(_+_).foreachRDD(x => println(x.count)) >>>>>> >>>>>> I got no count. >>>>>> >>>>>> Thanks >>>>>> >>>>>> >>>>>> On Sun, Jul 13, 2014 at 11:34 AM, Tathagata Das < >>>>>> tathagata.das1...@gmail.com> wrote: >>>>>> >>>>>>> Try doing DStream.foreachRDD and then printing the RDD count and >>>>>>> further inspecting the RDD. >>>>>>> On Jul 13, 2014 1:03 AM, "Walrus theCat" >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have a DStream that works just fine when I say: >>>>>>>> >>>>>>>> dstream.print >>>>>>>> >>>>>>>> If I say: >>>>>>>> >>>>>>>> dstream.map(_,1).print >>>>>>>> >>>>>>>> that works, too. However, if I do the following: >>>>>>>> >>>>>>>> dstream.reduce{case(x,y) => x}.print >>>>>>>> >>>>>>>> I don't get anything on my console. What's going on? >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >
Re: not getting output from socket connection
Make sure you use "local[n]" (where n > 1) in your context setup too, (if you're running locally), or you won't get output. On Sat, Jul 12, 2014 at 11:36 PM, Walrus theCat wrote: > Thanks! > > I thought it would get "passed through" netcat, but given your email, I > was able to follow this tutorial and get it to work: > > http://docs.oracle.com/javase/tutorial/networking/sockets/clientServer.html > > > > > On Fri, Jul 11, 2014 at 1:31 PM, Sean Owen wrote: > >> netcat is listening for a connection on port . It is echoing what >> you type to its console to anything that connects to and reads. >> That is what Spark streaming does. >> >> If you yourself connect to and write, nothing happens except that >> netcat echoes it. This does not cause Spark to somehow get that data. >> nc is only echoing input from the console. >> >> On Fri, Jul 11, 2014 at 9:25 PM, Walrus theCat >> wrote: >> > Hi, >> > >> > I have a java application that is outputting a string every second. I'm >> > running the wordcount example that comes with Spark 1.0, and running nc >> -lk >> > . When I type words into the terminal running netcat, I get counts. >> > However, when I write the String onto a socket on port , I don't get >> > counts. I can see the strings showing up in the netcat terminal, but no >> > counts from Spark. If I paste in the string, I get counts. >> > >> > Any ideas? >> > >> > Thanks >> > >
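To make the client/server distinction concrete: socketTextStream connects out to the given host:port and reads lines, so something must be listening there and writing to whoever connects (which is exactly what nc -lk does with console input). Below is a rough stand-in server along the lines of that Oracle tutorial; the port number and the payload are placeholders:

import java.io.PrintWriter
import java.net.ServerSocket

object LineServer {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)             // placeholder port
    while (true) {
      val socket = server.accept()                  // Spark's receiver connects here
      val out = new PrintWriter(socket.getOutputStream, true)
      // keep writing one line per second until the client disconnects
      while (!out.checkError()) {
        out.println(s"hello streaming ${System.currentTimeMillis()}")
        Thread.sleep(1000)
      }
      socket.close()
    }
  }
}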
Re: can't print DStream after reduce
This almost had me not using Spark; I couldn't get any output. It is not at all obvious what's going on here to the layman (and to the best of my knowledge, not documented anywhere), but now you know you'll be able to answer this question for the numerous people that will also have it.

On Sun, Jul 13, 2014 at 5:24 PM, Walrus theCat wrote:
> Great success!
>
> I was able to get output to the driver console by changing the construction of the Streaming Spark Context from:
>
> val ssc = new StreamingContext("local" /**TODO change once a cluster is up **/, "AppName", Seconds(1))
>
> to:
>
> val ssc = new StreamingContext("local[2]" /**TODO change once a cluster is up **/, "AppName", Seconds(1))
>
> I found something that tipped me off that this might work by digging through this mailing list.
>
> On Sun, Jul 13, 2014 at 2:00 PM, Walrus theCat wrote:
>> More strange behavior:
>>
>> lines.foreachRDD(x => println(x.first)) // works
>> lines.foreachRDD(x => println((x.count,x.first))) // no output is printed to driver console
>>
>> On Sun, Jul 13, 2014 at 11:47 AM, Walrus theCat <walrusthe...@gmail.com> wrote:
>>> Thanks for your interest.
>>>
>>> lines.foreachRDD(x => println(x.count))
>>>
>>> And I got 0 every once in a while (which I think is strange, because lines.print prints the input I'm giving it over the socket.)
>>>
>>> When I tried:
>>>
>>> lines.map(_->1).reduceByKey(_+_).foreachRDD(x => println(x.count))
>>>
>>> I got no count.
>>>
>>> Thanks
>>>
>>> On Sun, Jul 13, 2014 at 11:34 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>>> Try doing DStream.foreachRDD and then printing the RDD count and further inspecting the RDD.
>>>> On Jul 13, 2014 1:03 AM, "Walrus theCat" wrote:
>>>>> Hi,
>>>>>
>>>>> I have a DStream that works just fine when I say:
>>>>>
>>>>> dstream.print
>>>>>
>>>>> If I say:
>>>>>
>>>>> dstream.map(_,1).print
>>>>>
>>>>> that works, too. However, if I do the following:
>>>>>
>>>>> dstream.reduce{case(x,y) => x}.print
>>>>>
>>>>> I don't get anything on my console. What's going on?
>>>>>
>>>>> Thanks
Re: HELP!? Re: Streaming trouble (reduceByKey causes all printing to stop)
I don't know if this matters, but I'm looking at the web UI that Spark puts up, and I see under the streaming tab:

- *Started at:* Thu Jun 12 11:42:10 EDT 2014
- *Time since start:* 6 minutes 3 seconds
- *Network receivers:* 1
- *Batch interval:* 5 seconds
- *Processed batches:* 0
- *Waiting batches:* 1

Why would a batch be "waiting" far longer than my batch time of 5 seconds?

On Thu, Jun 12, 2014 at 10:18 AM, Michael Campbell <michael.campb...@gmail.com> wrote:
> Ad... it's NOT working.
>
> Here's the code:
>
> val bytes = kafkaStream.map({ case (key, messageBytes) => messageBytes }) // Map to just get the bytes part out...
> val things = bytes.flatMap(bytesArrayToThings) // convert to a thing
> val srcDestinations = things.map(thing => (ipToString(thing.getSourceIp), Set(ipToString(thing.getDestinationIp))))
> // up to this point works fine.
>
> // this fails to print
> val srcDestinationSets = srcDestinations.reduceByKey((exist: Set[String], addl: Set[String]) => exist ++ addl)
>
> What it does...
>
> From a kafka message containing many "things", convert the message to an array of said "things", flatMap them out to a stream of 1 "thing" at a time, pull out and make a Tuple of a (SourceIP, DestinationIP).
>
> ALL THAT WORKS. If I do a "srcDestinations.print()" I get output like the following, every 5 seconds, which is my batch size.
>
> ---
> Time: 140258200 ms
> ---
> (10.30.51.216,Set(10.20.1.1))
> (10.20.11.3,Set(10.10.61.98))
> (10.20.11.3,Set(10.10.61.95))
> ...
>
> What I want is a SET of (sourceIP -> Set(all the destination IPs)). Using a set because, as you can see above, the same source may have the same destination multiple times and I want to eliminate dupes on the destination side.
>
> When I call the reduceByKey() method, I never get any output. When I do a "srcDestinationSets.print()" NOTHING EVER PRINTS. Ever. Never.
>
> What am I doing wrong? (The same happens for "reduceByKeyAndWindow(..., Seconds(5))".)
>
> I'm sure this is something I've done, but I cannot figure out what it was.
>
> Help, please?
HELP!? Re: Streaming trouble (reduceByKey causes all printing to stop)
Ad... it's NOT working.

Here's the code:

val bytes = kafkaStream.map({ case (key, messageBytes) => messageBytes }) // Map to just get the bytes part out...
val things = bytes.flatMap(bytesArrayToThings) // convert to a thing
val srcDestinations = things.map(thing => (ipToString(thing.getSourceIp), Set(ipToString(thing.getDestinationIp))))
// up to this point works fine.

// this fails to print
val srcDestinationSets = srcDestinations.reduceByKey((exist: Set[String], addl: Set[String]) => exist ++ addl)

What it does...

From a kafka message containing many "things", convert the message to an array of said "things", flatMap them out to a stream of 1 "thing" at a time, pull out and make a Tuple of a (SourceIP, DestinationIP).

ALL THAT WORKS. If I do a "srcDestinations.print()" I get output like the following, every 5 seconds, which is my batch size.

---
Time: 140258200 ms
---
(10.30.51.216,Set(10.20.1.1))
(10.20.11.3,Set(10.10.61.98))
(10.20.11.3,Set(10.10.61.95))
...

What I want is a SET of (sourceIP -> Set(all the destination IPs)). Using a set because, as you can see above, the same source may have the same destination multiple times and I want to eliminate dupes on the destination side.

When I call the reduceByKey() method, I never get any output. When I do a "srcDestinationSets.print()" NOTHING EVER PRINTS. Ever. Never.

What am I doing wrong? (The same happens for "reduceByKeyAndWindow(..., Seconds(5))".)

I'm sure this is something I've done, but I cannot figure out what it was.

Help, please?
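One way to separate the reduce logic from the streaming setup is to run the same Set-union reduce on a plain RDD in spark-shell; if this prints the merged sets, the function itself is fine and the problem is elsewhere in the streaming job. The data below is made up:

// spark-shell sketch; `sc` is the shell's SparkContext, addresses are invented
val pairs = sc.parallelize(Seq(
  ("10.30.51.216", Set("10.20.1.1")),
  ("10.20.11.3",   Set("10.10.61.98")),
  ("10.20.11.3",   Set("10.10.61.95")),
  ("10.20.11.3",   Set("10.10.61.98"))   // duplicate destination collapses in the Set
))

// same union-of-sets reduce as in the code above
val srcDestinationSets = pairs.reduceByKey((exist: Set[String], addl: Set[String]) => exist ++ addl)
srcDestinationSets.collect().foreach(println)
// expected output along the lines of:
// (10.30.51.216,Set(10.20.1.1))
// (10.20.11.3,Set(10.10.61.98, 10.10.61.95))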
Kafka client - specify offsets?
Is there a way in the Apache Spark Kafka Utils to specify an offset to start reading? Specifically, from the start of the queue, or failing that, a specific point?
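With the receiver-based KafkaUtils.createStream in Spark 1.x (built on Kafka's high-level consumer), I don't believe an arbitrary offset can be seeded; the closest knob is the consumer's auto.offset.reset setting, which only applies when the consumer group has no committed offset yet (so a fresh group.id is usually needed). A hedged sketch; the ZooKeeper address, group, and topic names are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaFromBeginning {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaFromBeginning"), Seconds(5))

    val kafkaParams = Map(
      "zookeeper.connect" -> "zk-host:2181",
      "group.id"          -> "my-fresh-group",
      // "smallest" = start from the earliest retained offset; "largest" = only new messages
      "auto.offset.reset" -> "smallest"
    )

    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)

    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()
  }
}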
Re: Having trouble with streaming (updateStateByKey)
I rearranged my code to do a reduceByKey which I think is working. I also don't think the problem was that updateState call, but something else; unfortunately I changed a lot in looking for this issue, so not sure what the actual fix might have been, but I think it's working now. On Wed, Jun 11, 2014 at 1:47 PM, Michael Campbell < michael.campb...@gmail.com> wrote: > I'm having a little trouble getting an "updateStateByKey()" call to work; > was wondering if anyone could help. > > In my chain of calls from getting Kafka messages out of the queue to > converting the message to a set of "things", then pulling out 2 attributes > of those things to a Tuple2, everything works. > > So what I end up with is about a 1 second dump of things like this (this > is crufted up data, but it's basically 2 IPV6 addresses...) > > --- > Time: 1402507839000 ms > --- > (:::a14:b03,:::a0a:2bd4) > (:::a14:b03,:::a0a:2bd4) > (:::a0a:25a7,:::a14:b03) > (:::a14:b03,:::a0a:2483) > (:::a14:b03,:::a0a:2480) > (:::a0a:2d96,:::a14:b03) > (:::a0a:abb5,:::a14:100) > (:::a0a:dcd7,:::a14:28) > (:::a14:28,:::a0a:da94) > (:::a14:b03,:::a0a:2def) > ... > > > This works ok. > > The problem is when I add a call to updateStateByKey - the streaming app > runs and runs and runs and never outputs anything. When I debug, I can't > confirm that my state update passed-in function is ever actually getting > called. > > Indeed I have breakpoints and println statements in my updateFunc and it > LOOKS like it's never getting called. I can confirm that the > updateStateByKey function IS getting called (via it stopping on a > breakpoint). > > Is there something obvious I'm missing? >
Having trouble with streaming (updateStateByKey)
I'm having a little trouble getting an "updateStateByKey()" call to work; was wondering if anyone could help.

In my chain of calls from getting Kafka messages out of the queue to converting the message to a set of "things", then pulling out 2 attributes of those things to a Tuple2, everything works.

So what I end up with is about a 1 second dump of things like this (this is crufted up data, but it's basically 2 IPV6 addresses...)

---
Time: 1402507839000 ms
---
(:::a14:b03,:::a0a:2bd4)
(:::a14:b03,:::a0a:2bd4)
(:::a0a:25a7,:::a14:b03)
(:::a14:b03,:::a0a:2483)
(:::a14:b03,:::a0a:2480)
(:::a0a:2d96,:::a14:b03)
(:::a0a:abb5,:::a14:100)
(:::a0a:dcd7,:::a14:28)
(:::a14:28,:::a0a:da94)
(:::a14:b03,:::a0a:2def)
...

This works ok.

The problem is when I add a call to updateStateByKey - the streaming app runs and runs and runs and never outputs anything. When I debug, I can't confirm that my state update passed-in function is ever actually getting called.

Indeed I have breakpoints and println statements in my updateFunc and it LOOKS like it's never getting called. I can confirm that the updateStateByKey function IS getting called (via it stopping on a breakpoint).

Is there something obvious I'm missing?
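Not a diagnosis of the problem above, just the minimal shape of an updateStateByKey pipeline for comparison. One easy-to-miss requirement is that stateful operations need a checkpoint directory set on the context. The names, state type, and path below are illustrative, with `pairs` standing in for the (srcIP, dstIP) DStream described above and `ssc` for its StreamingContext:

// stateful operations require checkpointing; the path is a placeholder
ssc.checkpoint("/tmp/streaming-checkpoints")

// accumulate the set of destination IPs seen per source IP across batches
def updateDestinations(newDsts: Seq[String], state: Option[Set[String]]): Option[Set[String]] =
  Some(state.getOrElse(Set.empty[String]) ++ newDsts)

val destinationsPerSource = pairs.updateStateByKey(updateDestinations _)
destinationsPerSource.print()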
Re: New user streaming question
Thanks all - I still don't know what the underlying problem is, but I KIND OF got it working by dumping my random-words stuff to a file and pointing spark streaming to that. So it's not "Streaming", as such, but I got output. More investigation to follow =) On Sat, Jun 7, 2014 at 8:22 AM, Gino Bustelo wrote: > I would make sure that your workers are running. It is very difficult to > tell from the console dribble if you just have no data or the workers just > disassociated from masters. > > Gino B. > > On Jun 6, 2014, at 11:32 PM, Jeremy Lee > wrote: > > Yup, when it's running, DStream.print() will print out a timestamped block > for every time step, even if the block is empty. (for v1.0.0, which I have > running in the other window) > > If you're not getting that, I'd guess the stream hasn't started up > properly. > > > On Sat, Jun 7, 2014 at 11:50 AM, Michael Campbell < > michael.campb...@gmail.com> wrote: > >> I've been playing with spark and streaming and have a question on stream >> outputs. The symptom is I don't get any. >> >> I have run spark-shell and all does as I expect, but when I run the >> word-count example with streaming, it *works* in that things happen and >> there are no errors, but I never get any output. >> >> Am I understanding how it it is supposed to work correctly? Is the >> Dstream.print() method supposed to print the output for every (micro)batch >> of the streamed data? If that's the case, I'm not seeing it. >> >> I'm using the "netcat" example and the StreamingContext uses the network >> to read words, but as I said, nothing comes out. >> >> I tried changing the .print() to .saveAsTextFiles(), and I AM getting a >> file, but nothing is in it other than a "_temporary" subdir. >> >> I'm sure I'm confused here, but not sure where. Help? >> > > > > -- > Jeremy Lee BCompSci(Hons) > The Unorthodox Engineers > >
New user streaming question
I've been playing with Spark and streaming and have a question on stream outputs. The symptom is I don't get any.

I have run spark-shell and all does as I expect, but when I run the word-count example with streaming, it *works* in that things happen and there are no errors, but I never get any output.

Am I understanding how it is supposed to work correctly? Is the DStream.print() method supposed to print the output for every (micro)batch of the streamed data? If that's the case, I'm not seeing it.

I'm using the "netcat" example and the StreamingContext uses the network to read words, but as I said, nothing comes out.

I tried changing the .print() to .saveAsTextFiles(), and I AM getting a file, but nothing is in it other than a "_temporary" subdir.

I'm sure I'm confused here, but not sure where. Help?
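One thing that bites a lot of first-time streaming users running locally (it may or may not be what's happening here, but it produces exactly this "no errors, no output" symptom): the network receiver occupies a core, so a master of local or local[1] leaves nothing free to process batches. A sketch of the netcat word count with at least two local threads; the host and port are placeholders and should match the nc -lk invocation:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._   // pair-DStream operations in Spark 1.x

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // local[2] or more: one thread for the socket receiver, at least one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    val lines = ssc.socketTextStream("localhost", 9999)   // pair with: nc -lk 9999
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()   // prints a timestamped block every batch, even an empty one

    ssc.start()
    ssc.awaitTermination()
  }
}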