Re: can't print DStream after reduce
Yeah. I have been wondering how to check this in the general case, across all deployment modes, but that's a hard problem. Last week I realized that even if we can do it just for local mode, we get the biggest bang for the buck.

TD

On Tue, Jul 15, 2014 at 9:31 PM, Tobias Pfeiffer wrote:
> Hi,
>
> thanks for creating the issue. It feels like in the last week, more or
> less half of the questions about Spark Streaming have been rooted in
> setting the master to "local" ;-)
>
> Tobias
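The local-mode check TD describes (and that SPARK-2475 tracks) could look roughly like the sketch below. This is an illustration only, not the actual patch; the object and method names are invented for the example:

```scala
// Hypothetical sketch of a local-mode sanity check: parse the core count
// out of a "local[N]" master string and compare it with the number of
// receiver input streams. Illustrative only -- not the SPARK-2475 patch.
object LocalModeCheck {
  private val LocalN = """local\[(\d+)\]""".r

  /** Number of cores implied by a local master URL, if it is local at all. */
  def localCores(master: String): Option[Int] = master match {
    case "local"    => Some(1)
    case "local[*]" => Some(Runtime.getRuntime.availableProcessors)
    case LocalN(n)  => Some(n.toInt)
    case _          => None // non-local deployments are not checked here
  }

  /** Warn when receivers would occupy every available core. */
  def verify(master: String, numReceivers: Int): Unit =
    for (cores <- localCores(master) if cores <= numReceivers)
      System.err.println(
        s"Warning: $numReceivers receiver(s) but only $cores core(s); " +
        s"no core is left for processing. Use at least local[${numReceivers + 1}].")
}
```

For example, `LocalModeCheck.verify("local[1]", 1)` would warn, because the single receiver leaves no core free to process batches.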
Re: can't print DStream after reduce
Hi,

thanks for creating the issue. It feels like in the last week, more or less half of the questions about Spark Streaming have been rooted in setting the master to "local" ;-)

Tobias

On Wed, Jul 16, 2014 at 11:03 AM, Tathagata Das wrote:
> Aah, right, copied from the wrong browser tab, I guess. Thanks!
>
> TD
Re: can't print DStream after reduce
Aah, right, copied from the wrong browser tab, I guess. Thanks!

TD

On Tue, Jul 15, 2014 at 5:57 PM, Michael Campbell wrote:
> I think you typo'd the JIRA id; it should be
> https://issues.apache.org/jira/browse/SPARK-2475 "Check whether #cores >
> #receivers in local mode"
Re: can't print DStream after reduce
I think you typo'd the JIRA id; it should be https://issues.apache.org/jira/browse/SPARK-2475 "Check whether #cores > #receivers in local mode"

On Mon, Jul 14, 2014 at 3:57 PM, Tathagata Das wrote:
> The problem is not really for local[1] or local. The problem arises when
> there are more input streams than there are cores.
> But I agree: for people who are just beginning by running it locally,
> there should be a check addressing this.
>
> I made a JIRA for this:
> https://issues.apache.org/jira/browse/SPARK-2464
>
> TD
Re: can't print DStream after reduce
Thank you, Tathagata. This had me going for far too long.

On Mon, Jul 14, 2014 at 3:57 PM, Tathagata Das wrote:
> The problem is not really for local[1] or local. The problem arises when
> there are more input streams than there are cores.
> But I agree: for people who are just beginning by running it locally,
> there should be a check addressing this.
>
> I made a JIRA for this:
> https://issues.apache.org/jira/browse/SPARK-2464
>
> TD
Re: can't print DStream after reduce
The problem is not really for local[1] or local. The problem arises when there are more input streams than there are cores. But I agree: for people who are just beginning to use Spark Streaming by running it locally, there should be a check addressing this.

I made a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-2464

TD

On Sun, Jul 13, 2014 at 4:26 PM, Sean Owen wrote:
> How about a PR that rejects a context configured for local or local[1]?
> As I understand it, it is not intended to work and has bitten several
> people.
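To illustrate TD's point with a sketch (host and port values are placeholder assumptions): each receiver-based input stream pins one core, so two socket streams need at least three local cores, two for the receivers plus one to process the batches.

```scala
// Two receiver-based input streams: local[2] would starve processing,
// because both cores would be held by receivers. local[3] leaves one
// core free to run the batch jobs that actually produce output.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("local[3]", "TwoReceivers", Seconds(1))
val a = ssc.socketTextStream("localhost", 9998) // receiver #1: one core
val b = ssc.socketTextStream("localhost", 9999) // receiver #2: one core
a.union(b).count().print()                      // runs on the remaining core

ssc.start()
ssc.awaitTermination()
```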
Re: can't print DStream after reduce
How about a PR that rejects a context configured for local or local[1]? As I understand it, it is not intended to work and has bitten several people.

On Jul 14, 2014 12:24 AM, "Michael Campbell" wrote:
> This almost had me not using Spark; I couldn't get any output. It is not
> at all obvious to the layman what's going on here (and, to the best of my
> knowledge, it is not documented anywhere), but now that you know, you'll
> be able to answer this question for the numerous people who will also
> have it.
Re: can't print DStream after reduce
This almost had me not using Spark; I couldn't get any output. It is not at all obvious to the layman what's going on here (and, to the best of my knowledge, it is not documented anywhere), but now that you know, you'll be able to answer this question for the numerous people who will also have it.

On Sun, Jul 13, 2014 at 5:24 PM, Walrus theCat wrote:
> Great success!
>
> I was able to get output to the driver console by changing the
> construction of the StreamingContext from "local" to "local[2]".
Re: can't print DStream after reduce
Great success!

I was able to get output to the driver console by changing the construction of the StreamingContext from:

  val ssc = new StreamingContext("local" /* TODO: change once a cluster is up */,
    "AppName", Seconds(1))

to:

  val ssc = new StreamingContext("local[2]" /* TODO: change once a cluster is up */,
    "AppName", Seconds(1))

I found something that tipped me off that this might work by digging through this mailing list.

On Sun, Jul 13, 2014 at 2:00 PM, Walrus theCat wrote:
> More strange behavior:
>
> lines.foreachRDD(x => println(x.first)) // works
> lines.foreachRDD(x => println((x.count,x.first))) // no output is
> printed to driver console
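For reference, the same fix can be written with a SparkConf instead of the string-arguments constructor; this is just a sketch, and the app name here is a placeholder:

```scala
// The same "local" -> "local[2]" fix, expressed via SparkConf.
// At least 2 threads: one per receiver, plus one to process batches --
// otherwise the receiver occupies the only core and nothing is printed.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]") // one core for the receiver + one for processing
  .setAppName("AppName")
val ssc = new StreamingContext(conf, Seconds(1))
```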
Re: can't print DStream after reduce
More strange behavior:

  lines.foreachRDD(x => println(x.first)) // works
  lines.foreachRDD(x => println((x.count,x.first))) // no output is printed to driver console

On Sun, Jul 13, 2014 at 11:47 AM, Walrus theCat wrote:
> Thanks for your interest.
>
> lines.foreachRDD(x => println(x.count))
>
> And I got 0 every once in a while (which I think is strange, because
> lines.print prints the input I'm giving it over the socket).
Re: can't print DStream after reduce
Thanks for your interest.

  lines.foreachRDD(x => println(x.count))

And I got 0 every once in a while (which I think is strange, because lines.print prints the input I'm giving it over the socket).

When I tried:

  lines.map(_->1).reduceByKey(_+_).foreachRDD(x => println(x.count))

I got no count.

Thanks

On Sun, Jul 13, 2014 at 11:34 AM, Tathagata Das wrote:
> Try doing DStream.foreachRDD and then printing the RDD count and further
> inspecting the RDD.
Re: can't print DStream after reduce
Update on this:

  val lines = ssc.socketTextStream("localhost", )

  lines.print // works
  lines.map(_->1).print // works
  lines.map(_->1).reduceByKey(_+_).print // nothing printed to driver console

Just lots of:

  14/07/13 11:37:40 INFO receiver.BlockGenerator: Pushed block input-0-1405276660400
  14/07/13 11:37:41 INFO scheduler.ReceiverTracker: Stream 0 received 1 blocks
  14/07/13 11:37:41 INFO scheduler.JobScheduler: Added jobs for time 1405276661000 ms
  14/07/13 11:37:41 INFO storage.MemoryStore: ensureFreeSpace(60) called with curMem=1275, maxMem=98539929
  14/07/13 11:37:41 INFO storage.MemoryStore: Block input-0-1405276661400 stored as bytes to memory (size 60.0 B, free 94.0 MB)
  14/07/13 11:37:41 INFO storage.BlockManagerInfo: Added input-0-1405276661400 in memory on 25.17.218.118:55820 (size: 60.0 B, free: 94.0 MB)
  14/07/13 11:37:41 INFO storage.BlockManagerMaster: Updated info of block input-0-1405276661400

Any insight?

Thanks

On Sun, Jul 13, 2014 at 1:03 AM, Walrus theCat wrote:
> Hi,
>
> I have a DStream that works just fine when I say:
>
> dstream.print
>
> If I say:
>
> dstream.map(_,1).print
>
> that works, too. However, if I do the following:
>
> dstream.reduce{case(x,y) => x}.print
>
> I don't get anything on my console. What's going on?
>
> Thanks
Re: can't print DStream after reduce
Try doing DStream.foreachRDD and then printing the RDD count and further inspecting the RDD.

On Jul 13, 2014 1:03 AM, "Walrus theCat" wrote:
> Hi,
>
> I have a DStream that works just fine when I say:
>
> dstream.print
>
> If I say:
>
> dstream.map(_,1).print
>
> that works, too. However, if I do the following:
>
> dstream.reduce{case(x,y) => x}.print
>
> I don't get anything on my console. What's going on?
>
> Thanks
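A minimal sketch of the inspection pattern suggested above (the stream source, host, and port are placeholder assumptions):

```scala
// Instead of DStream.print, pull each batch out with foreachRDD and
// inspect the RDD directly on the driver. If the master is "local" or
// "local[1]", the receiver occupies the only core and these batches
// never run -- which is why nothing appears on the console.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("local[2]", "Inspect", Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)

lines.reduce((x, _) => x).foreachRDD { rdd =>
  println(s"batch count = ${rdd.count()}")          // how many elements survived the reduce
  rdd.take(1).foreach(x => println(s"first = $x"))  // safe peek, even on an empty batch
}

ssc.start()
ssc.awaitTermination()
```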