I'm not sure this will work, but it makes sense to me. Basically you
write the functionality in a static block in a class and broadcast that
class. Not sure what your use case is, but I need to load a native library
and want to avoid running the init in mapPartitions if it's not necessary
(just
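A minimal plain-Java sketch of the idea (class and method names here are hypothetical, not from the thread): a static initializer runs at most once per JVM, so if tasks reference the class from inside mapPartitions, the load happens once per executor rather than once per partition.

```java
public class StaticInitDemo {
    // Hypothetical holder class: the static block runs at most once per JVM
    // (i.e. once per Spark executor), no matter how many partitions touch it.
    static class NativeLibHolder {
        static int initCount = 0;

        static {
            // In a real job this would be e.g. System.loadLibrary("mylib");
            initCount++;
        }

        static void ensureLoaded() {
            // Touching any static member forces class initialization exactly once.
        }
    }

    public static void main(String[] args) {
        // Simulate several partitions on the same executor touching the class.
        for (int partition = 0; partition < 4; partition++) {
            NativeLibHolder.ensureLoaded();
        }
        System.out.println(NativeLibHolder.initCount); // prints 1
    }
}
```

The same effect is why people reach for a singleton/static holder in Spark closures: the JVM's class-initialization guarantee does the "run once" bookkeeping for you.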
So I think I may end up using Hourglass
(https://engineering.linkedin.com/datafu/datafus-hourglass-incremental-data-processing-hadoop),
a Hadoop framework for incremental data processing. It would be very cool if
Spark (not Streaming) could support something like this.
--
Would it be a reasonable use case of Spark Streaming to have a very large
window size (let's say on the scale of weeks)? In this particular case the
reduce function would be invertible, which would aid in efficiency. I
assume that having a larger batch size, since the window is so large, would
also help.
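For intuition on what the invertible reduce buys: reduceByKeyAndWindow with an inverse function folds the newest batch in and the oldest batch out, instead of re-reducing the entire window on every slide. A plain-Java sketch of that incremental update, using addition/subtraction as the reduce/inverse pair (all names hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingSumDemo {
    // Incremental windowed reduce: when the reduce function is invertible
    // (here, addition with subtraction as its inverse), sliding the window
    // costs O(1) per batch instead of re-reducing the whole window.
    static class SlidingSum {
        private final int windowBatches;
        private final Deque<Long> batches = new ArrayDeque<>();
        private long total = 0;

        SlidingSum(int windowBatches) { this.windowBatches = windowBatches; }

        long slide(long newBatchSum) {
            total += newBatchSum;               // reduce: fold the new batch in
            batches.addLast(newBatchSum);
            if (batches.size() > windowBatches) {
                total -= batches.removeFirst(); // inverse: fold the old batch out
            }
            return total;
        }
    }

    public static void main(String[] args) {
        SlidingSum window = new SlidingSum(3); // window = 3 batch durations
        long[] batchSums = {5, 2, 7, 4, 1};
        for (long b : batchSums) {
            System.out.println(window.slide(b));
        }
        // Output: 5, 7, 14, 13, 12
    }
}
```

With a weeks-long window this matters a lot: without the inverse, every slide would re-reduce thousands of batches; with it, only the entering and leaving batches are touched.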
I think you probably want to use `AvroSequenceFileOutputFormat` with
`newAPIHadoopFile`. I'm not even sure that in Hadoop you would use
SequenceFileInputFormat to read an Avro sequence file.
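If it helps, here is a rough sketch of that write/read pairing through Spark's Java API. This is untested: it assumes the avro-mapred artifact is on the classpath, that the pair RDD already holds AvroKey/AvroValue wrappers, and the package/class names should be verified against your Avro version; `rdd`, `path`, and the schemas are placeholders.

```java
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroSequenceFileInputFormat;
import org.apache.avro.mapreduce.AvroSequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AvroSeqFileSketch {
    // Write an RDD of (AvroKey, AvroValue) pairs as an Avro sequence file.
    static void write(JavaPairRDD<AvroKey<CharSequence>, AvroValue<Long>> rdd,
                      String path, Schema keySchema, Schema valueSchema) throws Exception {
        Job job = Job.getInstance();
        AvroJob.setOutputKeySchema(job, keySchema);
        AvroJob.setOutputValueSchema(job, valueSchema);
        rdd.saveAsNewAPIHadoopFile(path, AvroKey.class, AvroValue.class,
                AvroSequenceFileOutputFormat.class, job.getConfiguration());
    }

    // Read it back with the matching Avro-aware input format, not the plain
    // SequenceFileInputFormat.
    static JavaPairRDD<AvroKey<CharSequence>, AvroValue<Long>> read(
            JavaSparkContext sc, String path) throws Exception {
        Job job = Job.getInstance();
        return sc.newAPIHadoopFile(path, AvroSequenceFileInputFormat.class,
                AvroKey.class, AvroValue.class, job.getConfiguration());
    }
}
```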
--
Unfortunately, for reasons I won't go into, my options for what I can use
are limited. It was more of a curiosity to see if Spark could handle a use
case like this, since the functionality I wanted fit perfectly into the
reduceByKeyAndWindow frame of thinking. Anyway, thanks for answering.
--
The only other thing to keep in mind is that the window duration and slide
duration have to be multiples of the batch duration; I don't know if that
was made fully clear.
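To make the constraint concrete, a small sanity check with hypothetical durations in milliseconds (a 10 s batch admits a 60 s window with a 20 s slide, but not a 45 s window):

```java
public class WindowCheckDemo {
    // Spark Streaming requires window and slide durations to be integer
    // multiples of the batch duration; a quick sanity check (values in ms).
    static boolean validWindow(long batchMs, long windowMs, long slideMs) {
        return windowMs % batchMs == 0 && slideMs % batchMs == 0;
    }

    public static void main(String[] args) {
        long batch = 10_000;                                    // 10 s batches
        System.out.println(validWindow(batch, 60_000, 20_000)); // true
        System.out.println(validWindow(batch, 45_000, 20_000)); // false
    }
}
```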
--