Running driver app as a daemon

2015-07-21 Thread algermissen1971
Hi, I am trying to start a driver app as a daemon using Linux' start-stop-daemon script (I need console detaching, unbuffered STDOUT/STDERR to logfile and start/stop using a PID file). I am doing this like this (which works great for the other apps we have) /sbin/start-stop-daemon -c $USER

Joda Time best practice?

2015-07-20 Thread algermissen1971
Hi, I am having trouble with Joda Time in a Spark application and have since seen that I am not the only one (it generally seems to be related to serialization and internal caches of the Joda Time objects). Is there a known best practice to work around these issues? Jan

Re: Joda Time best practice?

2015-07-20 Thread algermissen1971
On Mon, Jul 20, 2015 at 1:19 PM, algermissen1971 algermissen1...@icloud.com wrote: Hi Harish, On 20 Jul 2015, at 20:37, Harish Butani rhbutani.sp...@gmail.com wrote: Hey Jan, Can you provide more details on the serialization and cache issues. My symptom is that I have a Joda DateTime

Re: Joda Time best practice?

2015-07-20 Thread algermissen1971
functionality with spark-sql please consider: https://github.com/SparklineData/spark-datetime It provides a simple way to combine joda datetime expressions with spark sql. regards, Harish. On Mon, Jul 20, 2015 at 7:37 AM, algermissen1971 algermissen1...@icloud.com wrote: Hi, I am

Re: Sessionization using updateStateByKey

2015-07-15 Thread algermissen1971
' of which a limited number exists (the users of the visits or the products sold). Yes? Jan Maybe someone has a better idea, I'd like to hear it. On Wed, Jul 15, 2015 at 8:54 AM, algermissen1971 algermissen1...@icloud.com wrote: Hi Cody, oh ... I thought that was one of *the* use cases

Re: Sessionization using updateStateByKey

2015-07-15 Thread algermissen1971
Hi Cody, oh ... I thought that was one of *the* use cases for it. Do you have a suggestion / best practice for achieving the same thing with better scaling characteristics? Jan On 15 Jul 2015, at 15:33, Cody Koeninger c...@koeninger.org wrote: I personally would try to avoid

Re: Master vs. Slave Nodes Clarification

2015-07-14 Thread algermissen1971
. Master can be on one of the C* node or a non-C* node. Mohammed -Original Message- From: algermissen1971 [mailto:algermissen1...@icloud.com] Sent: Sunday, July 12, 2015 12:35 PM To: Spark User Subject: Master vs. Slave Nodes Clarification Hi, I have a question that I

Re: Spark Streaming and using Swift object store for checkpointing

2015-07-11 Thread algermissen1971
On 10 Jul 2015, at 23:10, algermissen1971 algermissen1...@icloud.com wrote: Hi, initially today when moving my streaming application to the cluster for the first time I ran into the newbie error of using a local file system for checkpointing and the RDD partition count differences (see

Starting Spark-Application without explicit submission to cluster?

2015-07-10 Thread algermissen1971
Hi, I am a bit confused about the steps I need to take to start a Spark application on a cluster. So far I had the impression from the documentation that I need to explicitly submit the application using for example spark-submit. However, from the SparkContext constructor signature I get the

Spark Streaming and using Swift object store for checkpointing

2015-07-10 Thread algermissen1971
Hi, initially today when moving my streaming application to the cluster for the first time I ran into the newbie error of using a local file system for checkpointing and the RDD partition count differences (see exception below). Having neither HDFS nor S3 (and the Cassandra-Connector not yet

Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy DatabaseConnection

2015-06-12 Thread algermissen1971
, algermissen1971 algermissen1...@icloud.com wrote: Hi, I have a scenario with spark streaming, where I need to write to a database from within updateStateByKey[1]. That means that inside my update function I need a connection. I have so far understood that I should create a new (lazy

Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy DatabaseConnection

2015-06-12 Thread algermissen1971
of partition / executor / stage, but I get the idea.) Jan On Fri, Jun 12, 2015 at 4:11 PM, algermissen1971 algermissen1...@icloud.com wrote: On 12 Jun 2015, at 22:59, Cody Koeninger c...@koeninger.org wrote: Close. the mapPartitions call doesn't need to do anything at all to the iter

Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy DatabaseConnection

2015-06-12 Thread algermissen1971
request to store the data), not a DB connection - I presume this does not change the concept? Jan On Fri, Jun 12, 2015 at 3:55 PM, algermissen1971 algermissen1...@icloud.com wrote: Cody, On 12 Jun 2015, at 17:26, Cody Koeninger c...@koeninger.org wrote: There are several database apis

Spark Streaming, updateStateByKey and mapPartitions() - and lazy DatabaseConnection

2015-06-12 Thread algermissen1971
Hi, I have a scenario with spark streaming, where I need to write to a database from within updateStateByKey[1]. That means that inside my update function I need a connection. I have so far understood that I should create a new (lazy) connection for every partition. But since I am not working

How to obtain ActorSystem and/or ActorFlowMaterializer in updateStateByKey

2015-06-08 Thread algermissen1971
Hi, I am writing some code inside an update function for updateStateByKey that flushes data to a remote system using akka-http. For the akka-http request I need an ActorSystem and an ActorFlowMaterializer. Can anyone share a pattern or insights that address the following questions: - Where and

Re: Roadmap for Spark with Kafka on Scala 2.11?

2015-06-04 Thread algermissen1971
Hi Iulian, On 26 May 2015, at 13:04, Iulian Dragoș iulian.dra...@typesafe.com wrote: On Tue, May 26, 2015 at 10:09 AM, algermissen1971 algermissen1...@icloud.com wrote: Hi, I am setting up a project that requires Kafka support and I wonder what the roadmap is for Scala 2.11 Support