Re: A proposal for Spark 2.0

2015-11-11 Thread Zoltán Zvara
Hi, Reconsidering the execution model behind Streaming would be a good candidate here, as Spark will not be able to provide the low latency and sophisticated windowing semantics that more and more use cases will require. Maybe relaxing the strict batch model would help a lot. (Mainly this would

Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread Zoltán Zvara
I personally build with SBT and run Spark on YARN with IntelliJ. You need to connect to the remote JVMs with a remote debugger. You need to do something similar if you use Python, because it launches a JVM on the driver as well. On Wed, Aug 19, 2015 at 2:10 PM canan chen ccn...@gmail.com wrote:
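A hedged sketch of the setup described above: these are real Spark configuration keys, but the port numbers are arbitrary examples, and `suspend=y` on the driver means the JVM waits until IntelliJ's remote debugger attaches. Adapt before use.

```scala
import org.apache.spark.SparkConf

// Make the driver and executor JVMs listen for a JDWP (remote debugger) connection.
// suspend=y blocks the JVM at startup until a debugger attaches; suspend=n does not.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
       "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005")
  .set("spark.executor.extraJavaOptions",
       "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5006")
```

In IntelliJ, a matching "Remote JVM Debug" run configuration pointed at the listed host and port lets you set breakpoints in driver or executor code.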

SparkSqlSerializer2

2015-07-03 Thread Zoltán Zvara
Hi, is there any way to bypass the limitations of SparkSqlSerializer2 in the SQL module? Namely, 1) it does not support complex types, and 2) it assumes key-value pairs. Is there any other pluggable serializer that can be used here? Thanks!

DStream.reduce

2015-06-30 Thread Zoltán Zvara
Why is reduce in DStream implemented with a map, reduceByKey and another map, given that we have an RDD.reduce?
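One answer the question hints at: `RDD.reduce` is an action that returns a value, while `DStream.reduce` must return a `DStream[T]`, so it has to be built from transformations. Below is a plain-collections sketch (no Spark required, names are illustrative) of that map → reduceByKey → map decomposition: every element is keyed by a dummy key so a reduceByKey-style merge can combine values, and the final map drops the key.

```scala
// Mimics DStream.reduce's decomposition using ordinary Scala collections.
object ReduceViaReduceByKey {
  def reduceLikeDStream[T](data: Seq[T])(f: (T, T) => T): T = {
    val keyed = data.map(v => ((), v))          // map: pair each value with a dummy key
    val reduced = keyed                         // "reduceByKey": merge values sharing the key
      .groupBy(_._1)
      .map { case (k, pairs) => (k, pairs.map(_._2).reduce(f)) }
    reduced.values.head                         // map: drop the dummy key, keep the value
  }
}
```

On a real DStream the middle step is a shuffle, which is what lets the combine happen across partitions on every batch.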

Re: YARN mode startup takes too long (10+ secs)

2015-05-11 Thread Zoltán Zvara
: I think so. In fact, the flow is: allocator.allocateResources() → sleep → allocator.allocateResources() → sleep … But I guess that on the first allocateResources() the allocation is not fulfilled, so the sleep occurs. *From:* Zoltán Zvara [mailto:zoltan.zv...@gmail.com] *Sent:* Friday, May

Connect to remote YARN cluster

2015-04-09 Thread Zoltán Zvara
I'm trying to debug Spark in yarn-client mode. On my local, single-node cluster everything works fine, but the remote YARN resource manager rejects my request because of an authentication error. I'm running IntelliJ 14 on Ubuntu and the driver tries to connect to YARN with my local user name. How

Re: Spark remote communication pattern

2015-04-09 Thread Zoltán Zvara
/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara zoltan.zv...@gmail.com wrote: Dear Developers, I'm trying to investigate the communication pattern regarding data-flow during execution of a Spark program defined

RDD firstParent

2015-04-08 Thread Zoltán Zvara
It does not seem safe to call RDD.firstParent from anywhere, as it might throw a java.util.NoSuchElementException: head of empty list. This seems to be a bug from the perspective of a consumer of the RDD API.

Can't assembly YARN project with SBT

2015-03-25 Thread Zoltán Zvara
Hi! I'm using the latest IntelliJ and I can't compile the yarn project into the Spark assembly fat JAR. Because of that, I'm getting a SparkException with the message "Unable to load YARN support". The yarn project is also missing from the SBT tasks and I can't add it. How can I force SBT to include it? Thanks!

Re: Can't assembly YARN project with SBT

2015-03-25 Thread Zoltán Zvara
) 2015-03-25 9:45 GMT+01:00 Zoltán Zvara zoltan.zv...@gmail.com: Hi! I'm using the latest IntelliJ and I can't compile the yarn project into the Spark assembly fat JAR. That is why I'm getting a SparkException with message Unable to load YARN support. The yarn project is also missing from SBT

Re: Optimize the first map reduce of DStream

2015-03-24 Thread Zoltán Zvara
work this way? Does Flink work like this? On Tue, Mar 24, 2015 at 7:04 PM Zoltán Zvara zoltan.zv...@gmail.com wrote: There is a BlockGenerator on each worker node next to the ReceiverSupervisorImpl, which generates Blocks out of an ArrayBuffer in each interval (block_interval

Re: Optimize the first map reduce of DStream

2015-03-24 Thread Zoltán Zvara
There is a BlockGenerator on each worker node next to the ReceiverSupervisorImpl, which generates Blocks out of an ArrayBuffer in each interval (block_interval). These Blocks are passed to the ReceiverSupervisorImpl, which puts them into the BlockManager for storage. BlockInfos are passed
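The buffer-and-roll flow described above can be sketched with plain collections. This is a hypothetical, simplified model, not Spark's BlockGenerator: records accumulate in a buffer, and once per block interval the buffer is swapped out and frozen into a block that would then be handed to the BlockManager.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model of the BlockGenerator: buffer records, roll a block per interval.
class ToyBlockGenerator[T] {
  private var buffer = new ArrayBuffer[T]()
  private val blocks = new ArrayBuffer[Seq[T]]()

  def addRecord(r: T): Unit = buffer += r

  // In the real implementation a timer calls this once per block_interval.
  def rollBlock(): Unit = {
    if (buffer.nonEmpty) {
      blocks += buffer.toSeq       // freeze the current buffer into a block
      buffer = new ArrayBuffer[T]()
    }
  }

  def generatedBlocks: Seq[Seq[T]] = blocks.toSeq
}
```

Each rolled block corresponds to one unit of storage and, later, roughly one partition of the batch's RDD.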

Spark scheduling, data locality

2015-03-19 Thread Zoltán Zvara
I'm trying to understand the task scheduling mechanism of Spark, and I'm curious about where locality preferences get evaluated. I'm trying to determine if locality preferences are fetchable before the task gets serialized. A hint would be most appreciated! Have a nice day! Zvara Zoltán
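Partial answer to the question above: preferred locations come from `RDD.getPreferredLocations` and are evaluated on the driver by the DAGScheduler when it creates tasks, i.e. before serialization. The sketch below is illustrative only (not Spark's actual code): Spark's `TaskLocality` levels are ordered `PROCESS_LOCAL`, `NODE_LOCAL`, `NO_PREF`, `RACK_LOCAL`, `ANY`, and the scheduler prefers the best level a resource offer can satisfy (rack resolution is omitted here for simplicity).

```scala
// Toy model of matching a task's locality preferences against a resource offer.
object ToyLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value

  // taskPrefs: hosts the task's input data lives on (from RDD.preferredLocations).
  def levelFor(taskPrefs: Seq[String], offerHost: String, sameExecutor: Boolean): Value = {
    if (sameExecutor) PROCESS_LOCAL                  // data cached in this executor's JVM
    else if (taskPrefs.contains(offerHost)) NODE_LOCAL
    else if (taskPrefs.isEmpty) NO_PREF              // no preference recorded at all
    else ANY                                         // fall through (rack check omitted)
  }
}
```

In the real scheduler, delay scheduling additionally makes a task wait (spark.locality.wait) before accepting a worse level.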

Spark Streaming - received block allocation to batch

2015-03-11 Thread Zoltán Zvara
I'm trying to understand the block allocation mechanism Spark uses to generate batch jobs and a JobSet. JobGenerator.generateJobs tries to allocate received blocks to a batch; effectively, ReceivedBlockTracker.allocateBlocksToBatch creates a streamIdToBlocks map, where stream IDs (Int) are mapped to
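The allocation step described above can be modeled with plain collections. This is a toy sketch, not the real ReceivedBlockTracker: unallocated block IDs are queued per receiver stream, and allocating to a batch drains those queues and binds the resulting streamIdToBlocks map to the batch time, where the batch's jobs can later look it up.

```scala
import scala.collection.mutable

// Toy model of ReceivedBlockTracker.allocateBlocksToBatch.
class ToyBlockTracker {
  private val unallocated = mutable.Map[Int, mutable.Queue[String]]()
  private val allocated   = mutable.Map[Long, Map[Int, Seq[String]]]()

  def addBlock(streamId: Int, blockId: String): Unit =
    unallocated.getOrElseUpdate(streamId, mutable.Queue()) += blockId

  // Drain every stream's pending blocks and bind them to this batch time.
  def allocateBlocksToBatch(batchTime: Long): Unit = {
    val streamIdToBlocks = unallocated.map { case (id, q) =>
      id -> q.dequeueAll(_ => true).toSeq
    }.toMap
    allocated(batchTime) = streamIdToBlocks
  }

  def blocksOf(batchTime: Long): Map[Int, Seq[String]] =
    allocated.getOrElse(batchTime, Map.empty)
}
```

A block received after allocation simply waits in the queue for the next batch, which is why a batch's input set is fixed at generation time.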