Re: spark 2.0 readStream from a REST API

2016-08-11 Thread Sela, Amit
The current available output modes are Complete and Append. Complete mode is for stateful processing (aggregations), and Append mode for stateless processing (I.e., map/filter). See : http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes

Re: What / Where / When / How questions in Spark 2.0 ?

2016-05-21 Thread Sela, Amit
It seems I forgot to add the link to the “Technical Vision” paper so there it is - https://docs.google.com/document/d/1y4qlQinjjrusGWlgq-mYmbxRW2z7-_X5Xax-GG0YsC0/edit?usp=sharing From: "Sela, Amit" <ans...@paypal.com<mailto:ans...@paypal.com>> Date: Saturday, Ma

Re: What / Where / When / How questions in Spark 2.0 ?

2016-05-21 Thread Sela, Amit
This is a “Technical Vision” paper for the Spark runner, which provides general guidelines to the future development of Spark’s Beam support as part of the Apache Beam (incubating) project. This is our JIRA -

Apache Beam Spark runner

2016-03-19 Thread Sela, Amit
Hi all, The Apache Beam Spark runner is now available at: https://github.com/apache/incubator-beam/tree/master/runners/spark Check it out! The Apache Beam (http://beam.incubator.apache.org/) project is a unified model for building data pipelines using Google’s Dataflow programming model, and

Why does DStream have a different StorageLevel than RDD ?

2016-01-26 Thread Sela, Amit
I was wondering why does DStream and RDD have a different cache() StorageLevel ? Thanks, Amit

Support for custom serializers in Checkpoint

2015-12-06 Thread Sela, Amit
Why does Spark allows only Java Serializable in Checkpointing ? I see in Checkpoint.serialize() that it doesn’t even try to load a serializer from the configuration and uses Java’s ObjectOutputStream. This means that I can’t use Avro (fro eaxmple) in updateStateByKey, right ? Is there a reason

Accumulators internals and reliability

2015-10-26 Thread Sela, Amit
It seems like there is not much literature about Spark's Accumulators so I thought I'd ask here: Do Accumulators reside in a Task ? Are they being serialized with the task ? Sent back on task completion as part of the ResultTask ? Are they reliable ? If so, when ? Can I relay on accumulators

NullPointerException when adding to accumulator

2015-10-14 Thread Sela, Amit
I'm running a simple streaming application that reads from Kafka, maps the events and prints them and I'm trying to use accumulators to count the number of mapped records. While this works in standalone(IDE), when submitting to YARN I get NullPointerException on accumulator.add(1) or

Anyone using Intel's spark-streamingsql project to execute SQL queries over Spark streaming ?

2015-07-29 Thread Sela, Amit
Is there an available release? Anyone using in production? Is the project being actively developed and maintained? Thanks!

How does Spark streaming move data around ?

2015-07-06 Thread Sela, Amit
I know that Spark is using data parallelism over, say, HDFS - optimally running computations on local data (aka data locality). I was wondering how Spark streaming moves data (messages) around? since the data is streamed in as DStreams and is not on a distributed FS like HDFS. Thanks!