Re: getting different results from same line of code repeated

2015-11-20 Thread Walrus theCat
://polyglotprogramming.com > > On Fri, Nov 20, 2015 at 12:45 PM, Walrus theCat <walrusthe...@gmail.com> > wrote: > >> Dean, >> >> What's the point of Scala without magic? :-) >> >> Thanks for your help. It's still giving me unreliable results

getting different results from same line of code repeated

2015-11-18 Thread Walrus theCat
Hi, I'm launching a Spark cluster with the spark-ec2 script and playing around in spark-shell. I'm running the same line of code over and over again, and getting different results, and sometimes exceptions. Towards the end, after I cache the first RDD, it gives me the correct result multiple
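
A minimal sketch of the caching approach the thread ends on; the input path and transformations are placeholders, not the original session:

    // Placeholder source and lineage; the real session's input is unknown.
    val first = sc.textFile("hdfs:///path/to/input").flatMap(_.split(" "))

    // Each action normally re-runs the whole lineage, so a racy or changing
    // source can yield different answers run to run. Caching pins the first
    // materialization so later actions reuse it.
    first.cache()
    first.count()   // forces evaluation and populates the cache
    first.count()   // should now be repeatable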

Re: send transformed RDD to s3 from slaves

2015-11-16 Thread Walrus theCat
Update: You can now answer this on stackoverflow for 100 bounty: http://stackoverflow.com/questions/33704073/how-to-send-transformed-data-from-partitions-to-s3 On Fri, Nov 13, 2015 at 4:56 PM, Walrus theCat <walrusthe...@gmail.com> wrote: > Hi, > > I have an RDD which crashes

send transformed RDD to s3 from slaves

2015-11-13 Thread Walrus theCat
Hi, I have an RDD which crashes the driver when being collected. I want to send the data on its partitions out to S3 without bringing it back to the driver. I try calling rdd.foreachPartition, but the data that gets sent has not gone through the chain of transformations that I need. It's the
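
One way to write each partition from the executors without collecting to the driver, sketched under assumptions: uploadToS3 is a hypothetical stand-in for a real S3 client call, and the transformation chain here is illustrative only.

    import java.util.UUID

    // Hypothetical helper; swap in whatever S3 client is on the executors' classpath.
    def uploadToS3(bucket: String, key: String, body: String): Unit =
      println(s"PUT s3://$bucket/$key (${body.length} bytes)")   // stand-in for a real upload

    // Build the fully transformed RDD first, then call foreachPartition on that result.
    val transformed = sc.parallelize(1 to 100).map(n => s"record-$n")   // placeholder chain

    // foreachPartition is an action that runs on the executors, so nothing
    // returns to the driver; each task ships its own partition to S3.
    transformed.foreachPartition { iter =>
      val body = iter.mkString("\n")
      uploadToS3("my-bucket", s"chunk-${UUID.randomUUID()}", body)
    }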

maven doesn't build dependencies with Scala 2.11

2015-01-17 Thread Walrus theCat
Hi, When I run this: dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package as per here https://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211, maven doesn't build Spark's dependencies. Only when I run:

Re: SparkSQL 1.2.0 sources API error

2015-01-17 Thread Walrus theCat
I'm getting this also, with Scala 2.11 and Scala 2.10: 15/01/18 07:34:51 INFO slf4j.Slf4jLogger: Slf4jLogger started 15/01/18 07:34:51 INFO Remoting: Starting remoting 15/01/18 07:34:51 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread

Re: using multiple dstreams together (spark streaming)

2014-07-17 Thread Walrus theCat
this. 2s-window-stream.transformWith(1s-window-stream, (rdd1: RDD[...], rdd2: RDD[...]) => { ... // return a new RDD }) And streamingContext.transform() extends it to N DStreams. :) Hope this helps! TD On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat walrusthe...@gmail.com wrote
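
Filled in with concrete types, the transformWith suggestion might look like the following; the socket source, port, and the subtract-based "delta" are assumptions, not code from the thread:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Assumed setup: 1-second batches and a socket source.
    val ssc = new StreamingContext(sc, Seconds(1))
    val input = ssc.socketTextStream("localhost", 9999)

    val oneSec = input.window(Seconds(1))
    val twoSec = input.window(Seconds(2))

    // transformWith pairs the two DStreams' RDDs batch-by-batch and lets you
    // combine them with ordinary RDD operations.
    val delta = twoSec.transformWith(oneSec,
      (big: RDD[String], small: RDD[String]) => big.subtract(small))
    delta.print()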

using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
Hi, My application has multiple dstreams on the same inputstream: dstream1 // 1 second window dstream2 // 2 second window dstream3 // 5 minute window I want to write logic that deals with all three windows (e.g. when the 1 second window differs from the 2 second window by some delta ...) I've

Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
several Kafka dstreams using the join operation, but you have the limitation that the duration of the batch has to be the same, i.e. 1 second window for all dstreams... so it would not work for you. 2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com: Hi, My application has multiple

Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
problem. On Wed, Jul 16, 2014 at 10:30 AM, Walrus theCat walrusthe...@gmail.com wrote: Yeah -- I tried the .union operation and it didn't work for that reason. Surely there has to be a way to do this, as I imagine this is a commonly desired goal in streaming applications? On Wed, Jul 16

Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
... maybe consuming all streams at the same time with an actor that would act as a new DStream source... but this is just a random idea... I don't really know if that would be a good idea or even possible. 2014-07-16 18:30 GMT+01:00 Walrus theCat walrusthe...@gmail.com: Yeah -- I tried the .union

Re: truly bizarre behavior with local[n] on Spark 1.0.1

2014-07-15 Thread Walrus theCat
This is (obviously) Spark Streaming, by the way. On Mon, Jul 14, 2014 at 8:27 PM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I've got a socketTextStream through which I'm reading input. I have three DStreams, all of which are the same window operation over that socketTextStream. I

Re: truly bizarre behavior with local[n] on Spark 1.0.1

2014-07-15 Thread Walrus theCat
Will do. On Tue, Jul 15, 2014 at 12:56 PM, Tathagata Das tathagata.das1...@gmail.com wrote: This sounds really really weird. Can you give me a piece of code that I can run to reproduce this issue myself? TD On Tue, Jul 15, 2014 at 12:02 AM, Walrus theCat walrusthe...@gmail.com wrote

truly bizarre behavior with local[n] on Spark 1.0.1

2014-07-14 Thread Walrus theCat
Hi, I've got a socketTextStream through which I'm reading input. I have three DStreams, all of which are the same window operation over that socketTextStream. I have a four-core machine. As we've been covering lately, I have to give a cores parameter to my StreamingContext: ssc = new
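
A sketch of the kind of setup described above; the app name, port, window length, and "local[4]" are assumptions, not the original code:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // The socket receiver permanently occupies one core, so with local[1]
    // the windowed streams never get scheduled; local[4] matches the
    // four-core machine described in the report.
    val ssc = new StreamingContext("local[4]", "WindowTest", Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999)   // port is an assumption

    // Three DStreams built from the same window operation, as above.
    val w1 = lines.window(Seconds(5))
    val w2 = lines.window(Seconds(5))
    val w3 = lines.window(Seconds(5))
    Seq(w1, w2, w3).foreach(_.print())

    ssc.start()
    ssc.awaitTermination()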

can't print DStream after reduce

2014-07-13 Thread Walrus theCat
Hi, I have a DStream that works just fine when I say: dstream.print If I say: dstream.map((_, 1)).print that works, too. However, if I do the following: dstream.reduce { case (x, y) => x }.print I don't get anything on my console. What's going on? Thanks

Re: can't print DStream after reduce

2014-07-13 Thread Walrus theCat
of block input-0-1405276661400 Any insight? Thanks On Sun, Jul 13, 2014 at 1:03 AM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I have a DStream that works just fine when I say: dstream.print If I say: dstream.map((_, 1)).print that works, too. However, if I do the following

Re: can't print DStream after reduce

2014-07-13 Thread Walrus theCat
. Thanks On Sun, Jul 13, 2014 at 11:34 AM, Tathagata Das tathagata.das1...@gmail.com wrote: Try doing DStream.foreachRDD and then printing the RDD count and further inspecting the RDD. On Jul 13, 2014 1:03 AM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I have a DStream that works just
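
Filled out, that foreachRDD-based inspection might look like the following; a queueStream stands in for the real source, and the variable names are assumptions:

    import scala.collection.mutable
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Self-contained toy stream, only to show the inspection pattern.
    val ssc = new StreamingContext(sc, Seconds(1))
    val queue = mutable.Queue(sc.parallelize(Seq("a", "b", "c")))
    val dstream = ssc.queueStream(queue)

    val reduced = dstream.reduce { case (x, y) => x }   // same shape as the question
    reduced.foreachRDD { rdd =>
      println(s"batch count = ${rdd.count()}")          // is the batch empty?
      rdd.take(5).foreach(println)                      // peek at a few elements on the driver
    }

    ssc.start()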

Re: can't print DStream after reduce

2014-07-13 Thread Walrus theCat
More strange behavior: lines.foreachRDD(x => println(x.first)) // works lines.foreachRDD(x => println((x.count,x.first))) // no output is printed to driver console On Sun, Jul 13, 2014 at 11:47 AM, Walrus theCat walrusthe...@gmail.com wrote: Thanks for your interest. lines.foreachRDD(x

Re: can't print DStream after reduce

2014-07-13 Thread Walrus theCat
change once a cluster is up **/, AppName, Seconds(1)) I found something that tipped me off that this might work by digging through this mailing list. On Sun, Jul 13, 2014 at 2:00 PM, Walrus theCat walrusthe...@gmail.com wrote: More strange behavior: lines.foreachRDD(x => println(x.first

Re: not getting output from socket connection

2014-07-13 Thread Walrus theCat
, (if you're running locally), or you won't get output. On Sat, Jul 12, 2014 at 11:36 PM, Walrus theCat walrusthe...@gmail.com wrote: Thanks! I thought it would get passed through netcat, but given your email, I was able to follow this tutorial and get it to work: http://docs.oracle.com/javase
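
A rough Scala equivalent of the server-socket approach referenced above; the port, message format, and loop bound are assumptions:

    import java.io.PrintWriter
    import java.net.ServerSocket

    // Spark's socketTextStream connects as a *client*, so the producing
    // application has to be the server side; nc -lk only echoes console input.
    val server = new ServerSocket(9999)            // port is an assumption
    val client = server.accept()                   // blocks until Spark connects
    val out = new PrintWriter(client.getOutputStream, true)
    for (i <- 1 to 1000) {
      out.println(s"hello world $i")               // one line per second
      Thread.sleep(1000)
    }
    out.close(); client.close(); server.close()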

Re: not getting output from socket connection

2014-07-12 Thread Walrus theCat
that data. nc is only echoing input from the console. On Fri, Jul 11, 2014 at 9:25 PM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I have a java application that is outputting a string every second. I'm running the wordcount example that comes with Spark 1.0, and running nc -lk

not getting output from socket connection

2014-07-11 Thread Walrus theCat
Hi, I have a java application that is outputting a string every second. I'm running the wordcount example that comes with Spark 1.0, and running nc -lk . When I type words into the terminal running netcat, I get counts. However, when I write the String onto a socket on port , I don't get

Re: not getting output from socket connection

2014-07-11 Thread Walrus theCat
I forgot to add that I get the same behavior if I tail -f | nc localhost on a log file. On Fri, Jul 11, 2014 at 1:25 PM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I have a java application that is outputting a string every second. I'm running the wordcount example that comes

Re: How can adding a random count() change the behavior of my program?

2014-05-11 Thread Walrus theCat
Nick, I have encountered strange things like this before (usually when programming with mutable structures and side effects), and for me, the answer was that, until .count (or .first, or similar) is called, your variable 'a' refers to a set of instructions that only get executed to form the
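
A tiny illustration of that point; the data and the map body are placeholders:

    // `a` is only a recipe until an action runs.
    val a = sc.parallelize(1 to 10).map { x =>
      println(s"computing $x")   // appears in the executor/console logs only when an action fires
      x * 2
    }
    // No "computing ..." output yet: the map has not executed.
    a.count()                     // action: now the lineage is evaluated
    a.count()                     // evaluated again (unless a.cache() was called first)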

Re: can't sc.paralellize in Spark 0.7.3 spark-shell

2014-04-15 Thread Walrus theCat
Replaying: sc.parallelize(List(1,2,3)) <console>:8: error: not found: value sc sc.parallelize(List(1,2,3)) On Mon, Apr 14, 2014 at 7:51 PM, Walrus theCat walrusthe...@gmail.com wrote: Nevermind -- I'm like 90% sure the problem is that I'm importing stuff that declares a SparkContext as sc

Re: can't sc.paralellize in Spark 0.7.3 spark-shell

2014-04-15 Thread Walrus theCat
Thank you very much! On Tue, Apr 15, 2014 at 11:29 AM, Aaron Davidson ilike...@gmail.com wrote: This is probably related to the Scala bug that :cp does not work: https://issues.scala-lang.org/browse/SI-6502 On Tue, Apr 15, 2014 at 11:21 AM, Walrus theCat walrusthe...@gmail.com wrote: Actually

can't sc.paralellize in Spark 0.7.3 spark-shell

2014-04-14 Thread Walrus theCat
Hi, Using the spark-shell, I can't sc.parallelize to get an RDD. Looks like a bug. scala> sc.parallelize(Array(a,s,d)) java.lang.NullPointerException at <init>(<console>:17) at <init>(<console>:22) at <init>(<console>:24) at <init>(<console>:26) at <init>(<console>:28) at <init>(<console>:30)

Re: can't sc.paralellize in Spark 0.7.3 spark-shell

2014-04-14 Thread Walrus theCat
Nevermind -- I'm like 90% sure the problem is that I'm importing stuff that declares a SparkContext as sc. If it's not, I'll report back. On Mon, Apr 14, 2014 at 2:55 PM, Walrus theCat walrusthe...@gmail.com wrote: Hi, Using the spark-shell, I can't sc.parallelize to get an RDD. Looks like

interleave partitions?

2014-03-26 Thread Walrus theCat
Hi, I want to do something like this: rdd3 = rdd1.coalesce(N).partitions.zip(rdd2.coalesce(N).partitions) I realize the above will get me something like Array[(partition,partition)]. I hope you see what I'm going for here -- any tips on how to accomplish this? Thanks

Re: interleave partitions?

2014-03-26 Thread Walrus theCat
Answering my own question here. This may not be efficient, but this is what I came up with: rdd1.coalesce(N).glom.zip(rdd2.coalesce(N).glom).map { case (x, y) => x ++ y } On Wed, Mar 26, 2014 at 11:11 AM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I want to do something like this: rdd3
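
The same recipe run on small example data, as a usage sketch; the inputs and N = 2 are assumptions:

    val rdd1 = sc.parallelize(Seq(1, 2, 3, 4), 4)
    val rdd2 = sc.parallelize(Seq(10, 20, 30, 40), 4)
    val N = 2

    // glom turns each partition into one array, zip pairs them up
    // partition-by-partition, and ++ concatenates the pair.
    val interleaved = rdd1.coalesce(N).glom()
      .zip(rdd2.coalesce(N).glom())
      .map { case (x, y) => x ++ y }

    // Each output element holds a partition of rdd1 followed by the matching
    // partition of rdd2, e.g. "1,2,10,20" and "3,4,30,40".
    interleaved.collect().foreach(a => println(a.mkString(",")))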

Re: coalescing RDD into equally sized partitions

2014-03-26 Thread Walrus theCat
For the record, I tried this, and it worked. On Wed, Mar 26, 2014 at 10:51 AM, Walrus theCat walrusthe...@gmail.com wrote: Oh, so if I had something more reasonable, like RDDs full of tuples of, say, (Int,Set,Set), I could expect a more uniform distribution? Thanks On Mon, Mar 24, 2014

question about partitions

2014-03-24 Thread Walrus theCat
Hi, Quick question about partitions. If my RDD is partitioned into 5 partitions, does that mean that I am constraining it to exist on at most 5 machines? Thanks

Re: question about partitions

2014-03-24 Thread Walrus theCat
For instance, I need to work with an RDD in terms of N parts. Will calling RDD.coalesce(N) possibly cause processing bottlenecks? On Mon, Mar 24, 2014 at 1:28 PM, Walrus theCat walrusthe...@gmail.com wrote: Hi, Quick question about partitions. If my RDD is partitioned into 5 partitions

Re: question about partitions

2014-03-24 Thread Walrus theCat
. you are shrinking partitions, do not set shuffle=true, otherwise it will cause additional unnecessary shuffle overhead. On Mon, Mar 24, 2014 at 2:32 PM, Walrus theCat walrusthe...@gmail.com wrote: For instance, I need to work with an RDD in terms of N parts. Will calling RDD.coalesce(N
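
That advice spelled out as a short sketch; the partition counts are arbitrary:

    val someRdd = sc.parallelize(1 to 1000, 20)

    val shrunk = someRdd.coalesce(5)                  // shrinking: narrow dependency, no shuffle
    val grown  = someRdd.coalesce(50, shuffle = true) // growing/rebalancing is the case that needs the shuffle

    println(shrunk.partitions.length)   // 5
    println(grown.partitions.length)    // 50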

coalescing RDD into equally sized partitions

2014-03-24 Thread Walrus theCat
Hi, sc.parallelize(Array.tabulate(100)(i => i)).filter( _ % 20 == 0 ).coalesce(5,true).glom.collect yields Array[Array[Int]] = Array(Array(0, 20, 40, 60, 80), Array(), Array(), Array(), Array()) How do I get something more like: Array(Array(0), Array(20), Array(40), Array(60), Array(80))
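
One way (not from this thread) to force exactly one element per partition is to key each surviving element by its index and partition on that key with a HashPartitioner; a sketch:

    import org.apache.spark.HashPartitioner

    val sparse = sc.parallelize(Array.tabulate(100)(i => i)).filter(_ % 20 == 0)

    // Key by position, spread the keys across 5 partitions, then drop the keys.
    val even = sparse.zipWithIndex()
      .map { case (v, i) => (i, v) }
      .partitionBy(new HashPartitioner(5))
      .values

    // Roughly: Array(0), Array(20), Array(40), Array(60), Array(80)
    even.glom().collect().foreach(a => println(a.mkString(",")))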

Re: inexplicable exceptions in Spark 0.7.3

2014-03-18 Thread Walrus theCat
Hi Andrew, Thanks for your interest. This is a standalone job. On Mon, Mar 17, 2014 at 4:30 PM, Andrew Ash and...@andrewash.com wrote: Are you running from the spark shell or from a standalone job? On Mon, Mar 17, 2014 at 4:17 PM, Walrus theCat walrusthe...@gmail.com wrote: Hi, I'm

inexplicable exceptions in Spark 0.7.3

2014-03-17 Thread Walrus theCat
Hi, I'm getting this stack trace using Spark 0.7.3. There are no references to anything in my code, and I've never experienced anything like this before. Any ideas what is going on? java.lang.ClassCastException: spark.SparkContext$$anonfun$9 cannot be cast to scala.Function2 at