Re: How to execute a function from class in distributed jar on each worker node?

2014-11-11 Thread aaronjosephs
I'm not sure this will work, but it makes sense to me: write the
functionality in a static block in a class and broadcast that class. I'm not
sure what your use case is, but I need to load a native library and would
like to avoid running the init in mapPartitions if it isn't necessary (just
to make the code look cleaner).



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-execute-a-function-from-class-in-distributed-jar-on-each-worker-node-tp3870p18611.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Streaming with long batch / window duration

2014-07-21 Thread aaronjosephs
So I think I may end up using Hourglass
(https://engineering.linkedin.com/datafu/datafus-hourglass-incremental-data-processing-hadoop),
a Hadoop framework for incremental data processing. It would be very cool if
Spark (not Streaming) could support something like this.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-long-batch-window-duration-tp10191p10311.html


Spark Streaming with long batch / window duration

2014-07-18 Thread aaronjosephs
Would it be a reasonable use case for Spark Streaming to have a very large
window size (let's say on the scale of weeks)? In this particular case the
reduce function would be invertible, which would aid efficiency. I assume
that using a larger batch size, since the window is so large, would also
lighten the workload for Spark. The slide duration is not too important; I
just want to know whether this is reasonable for Spark to handle with any
slide duration.
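
To make the point about invertible reduce functions concrete, here is a
minimal sketch in plain Python (not Spark code). It keeps a running result
and, as the window slides, adds the newest batch and "subtracts" the batch
that fell out, instead of re-reducing the whole window each time. The name
`windowed_reduce` and its parameters are hypothetical, and the window length
is counted in batches for simplicity.

```python
from collections import deque


def windowed_reduce(batches, window_len, reduce_fn, inverse_fn):
    """Yield the reduced value over the last `window_len` batches.

    `reduce_fn` combines two values; `inverse_fn(total, old)` removes the
    contribution of a batch that has left the window.
    """
    window = deque()
    total = None
    for value in batches:
        window.append(value)
        total = value if total is None else reduce_fn(total, value)
        if len(window) > window_len:
            # Constant work per slide: drop the oldest batch via the inverse.
            total = inverse_fn(total, window.popleft())
        yield total


# Example: sliding sum over the last 3 batches.
sums = list(windowed_reduce([1, 2, 3, 4, 5], 3,
                            lambda a, b: a + b,
                            lambda a, b: a - b))
```

The work per slide does not depend on the window length, which is why an
invertible reduce matters when the window spans weeks; the old batch values
still have to be kept around until they leave the window.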



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-long-batch-window-duration-tp10191.html


Re: NullPointerException When Reading Avro Sequence Files

2014-07-18 Thread aaronjosephs
I think you probably want to use `AvroSequenceFileInputFormat` with
`newAPIHadoopFile`. I'm not even sure that in Hadoop you would use
`SequenceFileInputFormat` to read an Avro sequence file.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-reading-Avro-Sequence-files-tp10201p10203.html


Re: Spark Streaming with long batch / window duration

2014-07-18 Thread aaronjosephs
Unfortunately, for reasons I won't go into, my options for what I can use
are limited. It was more of a curiosity to see if Spark could handle a use
case like this, since the functionality I wanted fit perfectly into the
reduceByKeyAndWindow frame of thinking. Anyway, thanks for answering.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-long-batch-window-duration-tp10191p10219.html


Re: Difference among batchDuration, windowDuration, slideDuration

2014-07-16 Thread aaronjosephs
The only other thing to keep in mind is that the window duration and slide
duration have to be multiples of the batch duration. I don't know if you
made that fully clear.
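
As a quick illustration of that constraint, here is a small plain-Python
check (not part of any Spark API; the function name and the choice of
milliseconds as the unit are assumptions made for the example).

```python
def is_valid_window_config(batch_ms, window_ms, slide_ms):
    """Return True if window and slide are positive multiples of the batch."""
    if batch_ms <= 0 or window_ms <= 0 or slide_ms <= 0:
        return False
    return window_ms % batch_ms == 0 and slide_ms % batch_ms == 0


# A 2 s batch with a 10 s window and a 4 s slide satisfies the rule.
ok = is_valid_window_config(2000, 10000, 4000)
# A 5 s window is not a multiple of a 2 s batch, so this one does not.
bad = is_valid_window_config(2000, 5000, 4000)
```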



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Difference-among-batchDuration-windowDuration-slideDuration-tp9966p9973.html