Re: How to execute a function from class in distributed jar on each worker node?
I'm not sure this will work, but it makes sense to me: write the functionality in a static block in a class and broadcast that class. I'm not sure what your use case is, but I need to load a native library and would like to avoid running the init in mapPartitions if it isn't necessary (just to make the code look cleaner). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-execute-a-function-from-class-in-distributed-jar-on-each-worker-node-tp3870p18611.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
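For the fallback mentioned above (running the init inside mapPartitions), the usual trick is a guard that makes the init run only once per worker process, however many partitions that process handles. Below is a minimal sketch in plain Python, not Spark code and not a JVM static block. `_load_native_library`, `ensure_initialized` and `process_partition` are hypothetical names made up for illustration, and the "native library load" is a stand-in that only bumps a counter.

```python
import threading

_init_lock = threading.Lock()
_initialized = False
init_calls = 0  # demo-only counter showing how often the init really ran


def _load_native_library():
    # Hypothetical stand-in for the expensive one-time setup
    # (for example, loading a native library).
    global init_calls
    init_calls += 1


def ensure_initialized():
    # Run the one-time setup at most once in this process.
    global _initialized
    if _initialized:
        return
    with _init_lock:
        # Re-check under the lock so concurrent callers do not both initialize.
        if not _initialized:
            _load_native_library()
            _initialized = True


def process_partition(records):
    # Hypothetical per-partition function: cheap guard first, then the real work.
    ensure_initialized()
    return [r * 2 for r in records]


# Three "partitions" handled by the same process trigger a single init.
results = [process_partition(p) for p in [[1, 2], [3], [4, 5]]]
```

The guard keeps the per-partition cost down to one boolean check after the first call, so the init call in the partition function is mostly a readability cost rather than a runtime one.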
Re: Spark Streaming with long batch / window duration
So I think I may end up using Hourglass (https://engineering.linkedin.com/datafu/datafus-hourglass-incremental-data-processing-hadoop), a Hadoop framework for incremental data processing. It would be very cool if Spark (not Streaming) could support something like this. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-long-batch-window-duration-tp10191p10311.html
Spark Streaming with long batch / window duration
Would it be a reasonable use case for Spark Streaming to have a very large window size (let's say on the scale of weeks)? In this particular case the reduce function would be invertible, which would aid efficiency. I assume that using a larger batch size, since the window is so large, would also lighten the workload for Spark. The slide duration is not too important; I just want to know whether this is reasonable for Spark to handle with any slide duration. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-long-batch-window-duration-tp10191.html
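To make the point about invertibility concrete: with an invertible reduce, each new window can be derived from the previous one by adding the batch that enters and subtracting the batch that leaves, instead of re-reducing every batch in the window. The following is a minimal sketch in plain Python, not the Spark API; `sliding_window_reduce` and the sample data are made up for illustration, and the slide is assumed to be exactly one batch.

```python
from collections import deque


def sliding_window_reduce(batches, window_len, add, sub, zero):
    """Reduce each full window of `window_len` batches incrementally.

    `add` folds a batch into the running value; `sub` is its inverse and
    removes the batch that slides out of the window.
    """
    window = deque()
    total = zero
    results = []
    for value in batches:
        window.append(value)
        total = add(total, value)
        if len(window) > window_len:
            # Invert the contribution of the oldest batch instead of recomputing.
            total = sub(total, window.popleft())
        if len(window) == window_len:
            results.append(total)
    return results


batch_counts = [3, 1, 4, 1, 5, 9]  # one pre-reduced value per batch

incremental = sliding_window_reduce(
    batch_counts, window_len=3,
    add=lambda a, b: a + b, sub=lambda a, b: a - b, zero=0,
)

# Naive version for comparison: re-reduce all batches of every window.
naive = [sum(batch_counts[i:i + 3]) for i in range(len(batch_counts) - 2)]
```

Each step costs one `add` and one `sub` regardless of the window length, which is why a window spanning weeks is less alarming with an invertible function than without one. The batches still inside the window have to be retained so they can be subtracted later.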
Re: NullPointerException When Reading Avro Sequence Files
I think you probably want to use `AvroSequenceFileInputFormat` with `newAPIHadoopFile`. I'm not even sure that in Hadoop you would use SequenceFileInputFormat to read an Avro sequence file. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-reading-Avro-Sequence-files-tp10201p10203.html
Re: Spark Streaming with long batch / window duration
Unfortunately, for reasons I won't go into, my options for what I can use are limited. It was more a curiosity to see whether Spark could handle a use case like this, since the functionality I wanted fits perfectly into the reduceByKeyAndWindow way of thinking. Anyway, thanks for answering. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-long-batch-window-duration-tp10191p10219.html
Re: Difference among batchDuration, windowDuration, slideDuration
The only other thing to keep in mind is that the window duration and the slide duration both have to be multiples of the batch duration; I'm not sure that was made fully clear. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Difference-among-batchDuration-windowDuration-slideDuration-tp9966p9973.html
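As a quick illustration of the multiples rule, here is a small plain-Python check (not part of Spark; `check_durations` is a hypothetical helper and all durations are assumed to be whole seconds) that can be run on a planned configuration before the streaming job is wired up.

```python
def check_durations(batch_s, window_s, slide_s):
    """Return a list of problems; an empty list means the durations are consistent."""
    if batch_s <= 0:
        raise ValueError("batch duration must be positive")
    problems = []
    if window_s <= 0 or window_s % batch_s != 0:
        problems.append(
            f"window duration {window_s}s is not a positive multiple of batch duration {batch_s}s"
        )
    if slide_s <= 0 or slide_s % batch_s != 0:
        problems.append(
            f"slide duration {slide_s}s is not a positive multiple of batch duration {batch_s}s"
        )
    return problems


ok = check_durations(batch_s=10, window_s=60, slide_s=20)   # both are multiples of 10
bad = check_durations(batch_s=10, window_s=65, slide_s=20)  # 65 is not a multiple of 10
```

With a 10-second batch, a 60-second window covers exactly 6 batches and a 20-second slide advances by exactly 2, whereas a 65-second window would end partway through a batch.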