Hi,
Reconsidering the execution model behind Streaming would be a good
candidate here, as Spark will not be able to provide the low latency and
sophisticated windowing semantics that more and more use cases will
require. Maybe relaxing the strict batch model would help a lot. (Mainly
this would …
I personally build with SBT and run Spark on YARN from IntelliJ. You need
to connect to the remote JVMs with a remote debugger. You need to do
something similar if you use Python, because it launches a JVM on the
driver as well.
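For reference, a minimal sketch of the debugger wiring, assuming you
attach over JDWP (the port numbers here are arbitrary choices):

    import org.apache.spark.SparkConf

    // Attach a JDWP agent so a remote debugger (e.g. IntelliJ) can connect.
    // suspend=y would make the JVM wait until the debugger attaches.
    val conf = new SparkConf()
      .set("spark.driver.extraJavaOptions",
        "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005")
      .set("spark.executor.extraJavaOptions",
        "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5006")

Then create an IntelliJ "Remote" run configuration pointing at the right
host and port.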
On Wed, Aug 19, 2015 at 2:10 PM canan chen ccn...@gmail.com wrote:
Hi,
Is there any way to bypass the limitations of SparkSqlSerializer2 in the
SQL module? Namely:
1) it does not support complex types, and
2) it assumes key-value pairs.
Is there any other pluggable serializer that can be used here?
Thanks!
Why is reduce in DStream implemented with a map, reduceByKey and another
map, given that we have an RDD.reduce?
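For context, a self-contained sketch of that pattern on plain RDDs (one
plausible reason for the indirection in DStream is that DStream.reduce
must return a DStream, so it cannot simply call the RDD.reduce action):

    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    // Global reduce expressed as map -> reduceByKey -> map: give every
    // element the same key, reduce within that single key (and a single
    // partition), then strip the key again. Spark 1.3+ provides the
    // pair-RDD implicits automatically.
    def reduceViaReduceByKey[T: ClassTag](rdd: RDD[T], f: (T, T) => T): T =
      rdd.map(x => ((), x)).reduceByKey(f, 1).map(_._2).first()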
I think so.
In fact, the flow is: allocator.allocateResources() → sleep →
allocator.allocateResources() → sleep …
But I guess that on the first allocateResources() call the allocation is
not yet fulfilled, so the sleep occurs.
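For illustration, a toy version of that polling pattern (not Spark's
actual YarnAllocator code):

    // Request resources; if the allocation is not yet fulfilled,
    // sleep and try again on the next iteration.
    def pollUntilAllocated(tryAllocate: () => Boolean, intervalMs: Long): Unit = {
      while (!tryAllocate()) {
        Thread.sleep(intervalMs)
      }
    }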
From: Zoltán Zvara [mailto:zoltan.zv...@gmail.com]
Sent: Friday, May …
I'm trying to debug Spark in yarn-client mode. On my local single-node
cluster everything works fine, but the remote YARN resource manager
rejects my request because of an authentication error. I'm running
IntelliJ 14 on Ubuntu, and the driver tries to connect to YARN with my
local user name. How …
/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara zoltan.zv...@gmail.com
wrote:
Dear Developers,
I'm trying to investigate the communication pattern regarding data flow
during the execution of a Spark program defined …
It does not seem to be safe to call RDD.firstParent from anywhere, as it
might throw a java.util.NoSuchElementException: head of empty list. This
seems to be a bug from the perspective of a consumer of the RDD API.
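For context, firstParent is essentially dependencies.head.rdd, so any RDD
with no parents (e.g. an input RDD) triggers exactly this. A minimal
reproduction:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("firstParent-demo"))
    val root = sc.parallelize(1 to 10)  // an input RDD: no parent dependencies
    println(root.dependencies)          // empty Seq
    root.dependencies.head              // throws java.util.NoSuchElementException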
Zvara Zoltán
mail, hangout, skype: zoltan.zv...@gmail.com
mobile, viber: +36203129543
Hi!
I'm using the latest IntelliJ and I can't compile the yarn project into the
Spark assembly fat JAR. That is why I'm getting a SparkException with the
message "Unable to load YARN support". The yarn project is also missing
from the SBT tasks and I can't add it. How can I force SBT to include it?
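For what it's worth, the Spark build enables YARN through profile flags;
the exact profiles depend on your Spark and Hadoop versions, e.g.:

    build/sbt -Pyarn -Phadoop-2.4 assembly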
Thanks!
2015-03-25 9:45 GMT+01:00 Zoltán Zvara zoltan.zv...@gmail.com:
… work like this? Does Flink work like this?
On Tue, Mar 24, 2015 at 7:04 PM Zoltán Zvara zoltan.zv...@gmail.com
wrote:
There is a BlockGenerator on each worker node next to the
ReceiverSupervisorImpl, which generates Blocks out of an ArrayBuffer in
each interval (block_interval). These Blocks are passed to the
ReceiverSupervisorImpl, which hands them to the BlockManager
for storage. BlockInfos are passed …
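To make the idea concrete, a toy sketch of that generation loop (not
Spark's actual BlockGenerator):

    import scala.collection.mutable.ArrayBuffer

    // Records accumulate in a buffer; every blockIntervalMs the buffer is
    // swapped out atomically and pushed downstream as one block.
    class ToyBlockGenerator[T](blockIntervalMs: Long, pushBlock: Seq[T] => Unit) {
      private var buffer = new ArrayBuffer[T]

      def addData(record: T): Unit = synchronized { buffer += record }

      private def generateBlock(): Unit = {
        val block = synchronized {
          val old = buffer; buffer = new ArrayBuffer[T]; old
        }
        if (block.nonEmpty) pushBlock(block)
      }

      private val timer = new Thread(new Runnable {
        def run(): Unit = while (true) { Thread.sleep(blockIntervalMs); generateBlock() }
      })
      timer.setDaemon(true)
      timer.start()
    }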
I'm trying to understand the task scheduling mechanism of Spark, and I'm
curious about where locality preferences get evaluated. I'm trying to
determine whether locality preferences are fetchable before the task gets
serialized. Any hints would be most appreciated!
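For what it's worth, locality preferences are visible on the driver
through the public RDD.preferredLocations API before any task is
serialized; a quick check (assuming an existing SparkContext named sc):

    val rdd = sc.textFile("hdfs:///some/file")
    rdd.partitions.foreach { p =>
      println(s"partition ${p.index}: " + rdd.preferredLocations(p).mkString(", "))
    }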
Have a nice day!
Zvara Zoltán
I'm trying to understand the block allocation mechanism Spark uses to
generate batch jobs and a JobSet.
JobGenerator.generateJobs tries to allocate received blocks to the batch;
effectively, ReceivedBlockTracker.allocateBlocksToBatch creates
a streamIdToBlocks map, where stream IDs (Int) are mapped to …
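For illustration, a toy version of that allocation step (not Spark's
actual ReceivedBlockTracker code):

    import scala.collection.mutable

    // Drain each stream's queue of received-but-unallocated blocks into
    // an immutable streamId -> blocks map for the current batch.
    def allocateBlocksToBatch[B](queues: mutable.Map[Int, mutable.Queue[B]]): Map[Int, Seq[B]] =
      queues.map { case (streamId, queue) =>
        val drained = queue.toList // blocks received since the last batch
        queue.clear()
        (streamId, drained: Seq[B])
      }.toMap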