> On Fri, Nov 20, 2015 at 12:45 PM, Walrus theCat <walrusthe...@gmail.com>
> wrote:
>
>> Dean,
>>
>> What's the point of Scala without magic? :-)
>>
>> Thanks for your help. It's still giving me unreliable results
Hi,
I'm launching a Spark cluster with the spark-ec2 script and playing around
in spark-shell. I'm running the same line of code over and over again, and
getting different results, and sometimes exceptions. Towards the end,
after I cache the first RDD, it gives me the correct result multiple
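For reference, a minimal sketch of the caching step described above (the input path and the repeated action are placeholders, not the actual session):

// Cache the first RDD so repeated actions reuse the same materialized data
// instead of recomputing the lineage each time (the path is a placeholder).
val first = sc.textFile("s3n://some-bucket/some-input").cache()
first.count()
first.count()  // same result on repeated runs once the data is cached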
Update:
You can now answer this on stackoverflow for 100 bounty:
http://stackoverflow.com/questions/33704073/how-to-send-transformed-data-from-partitions-to-s3
On Fri, Nov 13, 2015 at 4:56 PM, Walrus theCat <walrusthe...@gmail.com>
wrote:
> Hi,
>
> I have an RDD which crashes
Hi,
I have an RDD which crashes the driver when being collected. I want to
send the data on its partitions out to S3 without bringing it back to the
driver. I try calling rdd.foreachPartition, but the data that gets sent has
not gone through the chain of transformations that I need. It's the
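For reference, a minimal sketch of the foreachPartition pattern being described. uploadToS3 is a hypothetical helper standing in for whatever S3 client is used, and the bucket/key scheme is an assumption; the one thing worth checking is that foreachPartition is called on the RDD produced by the full transformation chain, not on an earlier RDD.

import java.util.UUID
import org.apache.spark.rdd.RDD

// Hypothetical helper wrapping an S3 client of your choice.
def uploadToS3(bucket: String, key: String, body: String): Unit = ???

// `transformed` stands for the RDD at the end of the transformation chain.
val transformed: RDD[String] = ???

transformed.foreachPartition { iter =>
  // Runs on the executors; nothing is collected back to the driver.
  val lines = iter.mkString("\n")
  if (lines.nonEmpty)
    uploadToS3("my-bucket", s"output/part-${UUID.randomUUID()}", lines)
}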
Hi,
When I run this:
dev/change-version-to-2.11.sh
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
as per here
https://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211,
maven doesn't build Spark's dependencies.
Only when I run:
I'm getting this also, with Scala 2.11 and Scala 2.10:
15/01/18 07:34:51 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/18 07:34:51 INFO Remoting: Starting remoting
15/01/18 07:34:51 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread
this.
2s-window-stream.transformWith(1s-window-stream, (rdd1: RDD[...], rdd2: RDD[...]) => {
  ...
  // return a new RDD
})
And streamingContext.transform() extends it to N DStreams. :)
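For concreteness, a minimal sketch of the same idea, assuming an input DStream[String] called lines and a count-based comparison (the names, windows, and comparison are placeholders):

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Seconds

val oneSecWindow = lines.window(Seconds(1))
val twoSecWindow = lines.window(Seconds(2))

// Pair up the two windows' RDDs in each batch interval and return a new RDD,
// e.g. the absolute difference of the two window counts.
val delta = twoSecWindow.transformWith(oneSecWindow, (rdd2s: RDD[String], rdd1s: RDD[String]) => {
  rdd2s.context.parallelize(Seq(math.abs(rdd2s.count() - rdd1s.count())))
})
delta.print()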
Hope this helps!
TD
On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat walrusthe...@gmail.com
wrote
Hi,
My application has multiple dstreams on the same inputstream:
dstream1 // 1 second window
dstream2 // 2 second window
dstream3 // 5 minute window
I want to write logic that deals with all three windows (e.g. when the 1
second window differs from the 2 second window by some delta ...)
I've
several kafka dstreams using the join operation but you have
the limitation that the duration of the batch has to be the same, i.e. a 1 second
window for all dstreams... so it would not work for you.
2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:
Hi,
My application has multiple
problem.
On Wed, Jul 16, 2014 at 10:30 AM, Walrus theCat walrusthe...@gmail.com
wrote:
Yeah -- I tried the .union operation and it didn't work for that reason.
Surely there has to be a way to do this, as I imagine this is a commonly
desired goal in streaming applications?
On Wed, Jul 16
... maybe consuming all streams at the same time with an actor that
would act as a new DStream source... but this is just a random idea... I
don't really know if that would be a good idea or even possible.
2014-07-16 18:30 GMT+01:00 Walrus theCat walrusthe...@gmail.com:
Yeah -- I tried the .union
This is (obviously) spark streaming, by the way.
On Mon, Jul 14, 2014 at 8:27 PM, Walrus theCat walrusthe...@gmail.com
wrote:
Hi,
I've got a socketTextStream through which I'm reading input. I have three
DStreams, all of which are the same window operation over that
socketTextStream. I
Will do.
On Tue, Jul 15, 2014 at 12:56 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:
This sounds really really weird. Can you give me a piece of code that I
can run to reproduce this issue myself?
TD
On Tue, Jul 15, 2014 at 12:02 AM, Walrus theCat walrusthe...@gmail.com
wrote
Hi,
I've got a socketTextStream through which I'm reading input. I have three
DStreams, all of which are the same window operation over that
socketTextStream. I have a four core machine. As we've been covering
lately, I have to give a cores parameter to my StreamingContext:
ssc = new
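Presumably that constructor call continues roughly like this minimal sketch (the "local[4]" master is an assumption based on the four-core machine, and the app name is a placeholder):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[4]: enough cores for the receiver plus the processing tasks.
val ssc = new StreamingContext("local[4]", "AppName", Seconds(1))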
Hi,
I have a DStream that works just fine when I say:
dstream.print
If I say:
dstream.map((_, 1)).print
that works, too. However, if I do the following:
dstream.reduce { case (x, y) => x }.print
I don't get anything on my console. What's going on?
Thanks
of block
input-0-1405276661400
Any insight?
Thanks
On Sun, Jul 13, 2014 at 1:03 AM, Walrus theCat walrusthe...@gmail.com
wrote:
Hi,
I have a DStream that works just fine when I say:
dstream.print
If I say:
dstream.map((_, 1)).print
that works, too. However, if I do the following
.
Thanks
On Sun, Jul 13, 2014 at 11:34 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:
Try doing DStream.foreachRDD and then printing the RDD count and further
inspecting the RDD.
On Jul 13, 2014 1:03 AM, Walrus theCat walrusthe...@gmail.com wrote:
Hi,
I have a DStream that works just
More strange behavior:
lines.foreachRDD(x => println(x.first)) // works
lines.foreachRDD(x => println((x.count, x.first))) // no output is printed
to driver console
On Sun, Jul 13, 2014 at 11:47 AM, Walrus theCat walrusthe...@gmail.com
wrote:
Thanks for your interest.
lines.foreachRDD(x
change once a cluster is up **/, AppName, Seconds(1))
I found something that tipped me off that this might work by digging
through this mailing list.
On Sun, Jul 13, 2014 at 2:00 PM, Walrus theCat walrusthe...@gmail.com
wrote:
More strange behavior:
lines.foreachRDD(x => println(x.first
, (if
you're running locally), or you won't get output.
On Sat, Jul 12, 2014 at 11:36 PM, Walrus theCat walrusthe...@gmail.com
wrote:
Thanks!
I thought it would get passed through netcat, but given your email, I
was able to follow this tutorial and get it to work:
http://docs.oracle.com/javase
that data.
nc is only echoing input from the console.
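For anyone following along, a minimal sketch of one way to serve lines over a socket so that Spark's socketTextStream can connect and read them (the port and the message are placeholders):

import java.io.PrintWriter
import java.net.ServerSocket

object LineServer {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)          // placeholder port
    val socket = server.accept()                 // socketTextStream connects here
    val out = new PrintWriter(socket.getOutputStream, true)
    while (true) {
      out.println("hello from the application")  // one newline-terminated line per second
      Thread.sleep(1000)
    }
  }
}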
On Fri, Jul 11, 2014 at 9:25 PM, Walrus theCat walrusthe...@gmail.com
wrote:
Hi,
I have a java application that is outputting a string every second. I'm
running the wordcount example that comes with Spark 1.0, and running nc
-lk
Hi,
I have a java application that is outputting a string every second. I'm
running the wordcount example that comes with Spark 1.0, and running nc -lk
. When I type words into the terminal running netcat, I get counts.
However, when I write the String onto a socket on port , I don't get
I forgot to add that I get the same behavior if I tail -f a log file and pipe it
into nc localhost.
On Fri, Jul 11, 2014 at 1:25 PM, Walrus theCat walrusthe...@gmail.com
wrote:
Hi,
I have a java application that is outputting a string every second. I'm
running the wordcount example that comes
Nick,
I have encountered strange things like this before (usually when
programming with mutable structures and side-effects), and for me, the
answer was that, until .count (or .first, or similar) is called, your
variable 'a' refers to a set of instructions that only get executed to form
the
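The point being made is RDD laziness; a minimal sketch with hypothetical data (in local mode the println side effects show up in the console, on a cluster they land in the executor logs):

val a = sc.parallelize(1 to 10).map { x =>
  println("transforming " + x)  // recorded as part of the lineage, not executed yet
  x * 2
}
a.count()  // only now do the recorded instructions actually run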
Replaying: sc.parallelize(List(1,2,3))
<console>:8: error: not found: value sc
sc.parallelize(List(1,2,3))
On Mon, Apr 14, 2014 at 7:51 PM, Walrus theCat walrusthe...@gmail.com wrote:
Nevermind -- I'm like 90% sure the problem is that I'm importing stuff
that declares a SparkContext as sc
Thank you very much!
On Tue, Apr 15, 2014 at 11:29 AM, Aaron Davidson ilike...@gmail.com wrote:
This is probably related to the Scala bug that :cp does not work:
https://issues.scala-lang.org/browse/SI-6502
On Tue, Apr 15, 2014 at 11:21 AM, Walrus theCat walrusthe...@gmail.com wrote:
Actually
Hi,
Using the spark-shell, I can't sc.parallelize to get an RDD.
Looks like a bug.
scala> sc.parallelize(Array(a,s,d))
java.lang.NullPointerException
at <init>(<console>:17)
at <init>(<console>:22)
at <init>(<console>:24)
at <init>(<console>:26)
at <init>(<console>:28)
at <init>(<console>:30)
Nevermind -- I'm like 90% sure the problem is that I'm importing stuff that
declares a SparkContext as sc. If it's not, I'll report back.
On Mon, Apr 14, 2014 at 2:55 PM, Walrus theCat walrusthe...@gmail.com wrote:
Hi,
Using the spark-shell, I can't sc.parallelize to get an RDD.
Looks like
Hi,
I want to do something like this:
rdd3 = rdd1.coalesce(N).partitions.zip(rdd2.coalesce(N).partitions)
I realize the above will get me something like Array[(partition,partition)].
I hope you see what I'm going for here -- any tips on how to accomplish
this?
Thanks
Answering my own question here. This may not be efficient, but this is
what I came up with:
rdd1.coalesce(N).glom.zip(rdd2.coalesce(N).glom).map { case (x, y) => x ++ y }
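For example (hypothetical data), assuming both RDDs are coalesced to the same N so that zip sees matching partition counts:

val rdd1 = sc.parallelize(1 to 6)
val rdd2 = sc.parallelize(101 to 106)
val N = 3
// glom turns each partition into an Array, zip pairs them up partition-wise,
// and ++ concatenates the two arrays from matching partitions.
val rdd3 = rdd1.coalesce(N).glom.zip(rdd2.coalesce(N).glom).map { case (x, y) => x ++ y }
rdd3.collect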
On Wed, Mar 26, 2014 at 11:11 AM, Walrus theCat walrusthe...@gmail.com wrote:
Hi,
I want to do something like this:
rdd3
For the record, I tried this, and it worked.
On Wed, Mar 26, 2014 at 10:51 AM, Walrus theCat walrusthe...@gmail.com wrote:
Oh so if I had something more reasonable, like RDDs full of tuples of,
say, (Int, Set, Set), I could expect a more uniform distribution?
Thanks
On Mon, Mar 24, 2014
Hi,
Quick question about partitions. If my RDD is partitioned into 5
partitions, does that mean that I am constraining it to exist on at most 5
machines?
Thanks
For instance, I need to work with an RDD in terms of N parts. Will calling
RDD.coalesce(N) possibly cause processing bottlenecks?
On Mon, Mar 24, 2014 at 1:28 PM, Walrus theCat walrusthe...@gmail.com wrote:
Hi,
Quick question about partitions. If my RDD is partitioned into 5
partitions
. you are shrinking partitions, do not set
shuffle=true, otherwise it will cause additional unnecessary shuffle
overhead.
On Mon, Mar 24, 2014 at 2:32 PM, Walrus theCat walrusthe...@gmail.com wrote:
For instance, I need to work with an RDD in terms of N parts. Will
calling RDD.coalesce(N
Hi,
sc.parallelize(Array.tabulate(100)(i => i)).filter(_ % 20 == 0).coalesce(5, true).glom.collect yields
Array[Array[Int]] = Array(Array(0, 20, 40, 60, 80), Array(), Array(),
Array(), Array())
How do I get something more like:
Array(Array(0), Array(20), Array(40), Array(60), Array(80))
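A hedged sketch of one possible way to spread those five values across five partitions (not from this thread, and assuming a Spark version with RDD.zipWithIndex): key each element by its position and partition on that key.

import org.apache.spark.HashPartitioner

val spread = sc.parallelize(Array.tabulate(100)(i => i))
  .filter(_ % 20 == 0)
  .zipWithIndex()                       // (0,0L), (20,1L), (40,2L), (60,3L), (80,4L)
  .map { case (v, i) => (i, v) }        // key each value by its position
  .partitionBy(new HashPartitioner(5))  // keys 0..4 hash to distinct partitions
  .values

spread.glom.collect  // Array(Array(0), Array(20), Array(40), Array(60), Array(80))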
Hi Andrew,
Thanks for your interest. This is a standalone job.
On Mon, Mar 17, 2014 at 4:30 PM, Andrew Ash and...@andrewash.com wrote:
Are you running from the spark shell or from a standalone job?
On Mon, Mar 17, 2014 at 4:17 PM, Walrus theCat walrusthe...@gmail.com wrote:
Hi,
I'm
Hi,
I'm getting this stack trace using Spark 0.7.3. It has no references to anything
in my code, and I've never experienced anything like this before. Any ideas what is
going on?
java.lang.ClassCastException: spark.SparkContext$$anonfun$9 cannot be cast
to scala.Function2
at