Re: truly bizarre behavior with local[n] on Spark 1.0.1

2014-07-15 Thread Walrus theCat
This is (obviously) spark streaming, by the way.


On Mon, Jul 14, 2014 at 8:27 PM, Walrus theCat walrusthe...@gmail.com
wrote:

 Hi,

 I've got a socketTextStream through which I'm reading input.  I have three
 Dstreams, all of which are the same window operation over that
 socketTextStream.  I have a four core machine.  As we've been covering
 lately, I have to give a cores parameter to my StreamingSparkContext:

 ssc = new StreamingContext(local[4] /**TODO change once a cluster is up
 **/,
   AppName, Seconds(1))

 Now, I have three dstreams, and all I ask them to do is print or count.  I
 should preface this with the statement that they all work on their own.

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 If I construct the ssc with local[8], and put these statements in this
 order, I get prints on the first one, and zero counts on the second one:

 ssc(local[8])  // hyperthread dat sheezy
 dstream1.print // works
 dstream2.count.print // always prints 0



 If I do this, this happens:
 ssc(local[4])
 dstream1.print // doesn't work, just gives me the Time:  ms message
 dstream2.count.print // doesn't work, prints 0

 ssc(local[6])
 dstream1.print // doesn't work, just gives me the Time:  ms message
 dstream2.count.print // works, prints 1

 Sometimes these results switch up, seemingly at random. How can I get
 things to the point where I can develop and test my application locally?

 Thanks









Re: truly bizarre behavior with local[n] on Spark 1.0.1

2014-07-15 Thread Tathagata Das
This sounds really really weird. Can you give me a piece of code that I can
run to reproduce this issue myself?

TD


On Tue, Jul 15, 2014 at 12:02 AM, Walrus theCat walrusthe...@gmail.com
wrote:

 This is (obviously) spark streaming, by the way.


 On Mon, Jul 14, 2014 at 8:27 PM, Walrus theCat walrusthe...@gmail.com
 wrote:

 Hi,

 I've got a socketTextStream through which I'm reading input.  I have
 three Dstreams, all of which are the same window operation over that
 socketTextStream.  I have a four core machine.  As we've been covering
 lately, I have to give a cores parameter to my StreamingSparkContext:

 ssc = new StreamingContext(local[4] /**TODO change once a cluster is up
 **/,
   AppName, Seconds(1))

 Now, I have three dstreams, and all I ask them to do is print or count.
 I should preface this with the statement that they all work on their own.

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 If I construct the ssc with local[8], and put these statements in this
 order, I get prints on the first one, and zero counts on the second one:

 ssc(local[8])  // hyperthread dat sheezy
 dstream1.print // works
 dstream2.count.print // always prints 0



 If I do this, this happens:
 ssc(local[4])
 dstream1.print // doesn't work, just gives me the Time:  ms message
 dstream2.count.print // doesn't work, prints 0

 ssc(local[6])
 dstream1.print // doesn't work, just gives me the Time:  ms message
 dstream2.count.print // works, prints 1

 Sometimes these results switch up, seemingly at random. How can I get
 things to the point where I can develop and test my application locally?

 Thanks










Re: truly bizarre behavior with local[n] on Spark 1.0.1

2014-07-15 Thread Walrus theCat
Will do.


On Tue, Jul 15, 2014 at 12:56 PM, Tathagata Das tathagata.das1...@gmail.com
 wrote:

 This sounds really really weird. Can you give me a piece of code that I
 can run to reproduce this issue myself?

 TD


 On Tue, Jul 15, 2014 at 12:02 AM, Walrus theCat walrusthe...@gmail.com
 wrote:

 This is (obviously) spark streaming, by the way.


 On Mon, Jul 14, 2014 at 8:27 PM, Walrus theCat walrusthe...@gmail.com
 wrote:

 Hi,

 I've got a socketTextStream through which I'm reading input.  I have
 three Dstreams, all of which are the same window operation over that
 socketTextStream.  I have a four core machine.  As we've been covering
 lately, I have to give a cores parameter to my StreamingSparkContext:

 ssc = new StreamingContext(local[4] /**TODO change once a cluster is
 up **/,
   AppName, Seconds(1))

 Now, I have three dstreams, and all I ask them to do is print or count.
 I should preface this with the statement that they all work on their own.

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 If I construct the ssc with local[8], and put these statements in this
 order, I get prints on the first one, and zero counts on the second one:

 ssc(local[8])  // hyperthread dat sheezy
 dstream1.print // works
 dstream2.count.print // always prints 0



 If I do this, this happens:
 ssc(local[4])
 dstream1.print // doesn't work, just gives me the Time:  ms message
 dstream2.count.print // doesn't work, prints 0

 ssc(local[6])
 dstream1.print // doesn't work, just gives me the Time:  ms message
 dstream2.count.print // works, prints 1

 Sometimes these results switch up, seemingly at random. How can I get
 things to the point where I can develop and test my application locally?

 Thanks











truly bizarre behavior with local[n] on Spark 1.0.1

2014-07-14 Thread Walrus theCat
Hi,

I've got a socketTextStream through which I'm reading input.  I have three
Dstreams, all of which are the same window operation over that
socketTextStream.  I have a four core machine.  As we've been covering
lately, I have to give a cores parameter to my StreamingSparkContext:

ssc = new StreamingContext(local[4] /**TODO change once a cluster is up
**/,
  AppName, Seconds(1))

Now, I have three dstreams, and all I ask them to do is print or count.  I
should preface this with the statement that they all work on their own.

dstream1 // 1 second window
dstream2 // 2 second window
dstream3 // 5 minute window


If I construct the ssc with local[8], and put these statements in this
order, I get prints on the first one, and zero counts on the second one:

ssc(local[8])  // hyperthread dat sheezy
dstream1.print // works
dstream2.count.print // always prints 0



If I do this, this happens:
ssc(local[4])
dstream1.print // doesn't work, just gives me the Time:  ms message
dstream2.count.print // doesn't work, prints 0

ssc(local[6])
dstream1.print // doesn't work, just gives me the Time:  ms message
dstream2.count.print // works, prints 1

Sometimes these results switch up, seemingly at random. How can I get
things to the point where I can develop and test my application locally?

Thanks