Re: How do I Process Streams that span multiple lines?
If you are using Kafka, you can push an entire file to Kafka as a single message. Your DStream will then receive that one message, whose payload is the full contents of the file, and it can of course span multiple lines.

Thanks
Best Regards

On Mon, Aug 3, 2015 at 8:27 PM, Spark Enthusiast sparkenthusi...@yahoo.in wrote:
> How do I process Streams that span multiple lines? Are there examples that I can use?
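As a minimal sketch of the idea above, the parsing you would apply to each whole-file message (for example inside DStream.foreachRDD) might look like this. Plain Python is used here so the sketch is self-contained; the blank-line-separated record format is a hypothetical assumption, not something from the thread:

```python
# Sketch: each Kafka message carries the full contents of one file, so a
# single message can span many lines. The job of the per-message parser is
# to split that payload back into logical multi-line records.
# Assumption (hypothetical): records are separated by blank lines.

def split_records(message):
    """Split one whole-file message into multi-line records."""
    records = [r.strip() for r in message.split("\n\n")]
    return [r for r in records if r]  # drop empty fragments

message = "line 1 of record A\nline 2 of record A\n\nline 1 of record B"
print(split_records(message))
# -> ['line 1 of record A\nline 2 of record A', 'line 1 of record B']
```

In a real job this function would be the body of a map over the messages in each micro-batch; the key point is that the record boundary logic lives in your parser, not in the stream source.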
How do I Process Streams that span multiple lines?
All examples of Spark Streaming programming that I see assume streams of lines that are then tokenised and acted upon (like the WordCount example). How do I process streams that span multiple lines? Are there examples that I can use?
Re: How do I Process Streams that span multiple lines?
Are you looking for RDD.wholeTextFiles?

On 3 August 2015 at 10:57, Spark Enthusiast sparkenthusi...@yahoo.in wrote:
> How do I process Streams that span multiple lines? Are there examples that I can use?
Re: How do I Process Streams that span multiple lines?
Sorry, I meant SparkContext.wholeTextFiles. Not sure about streams.

On 3 August 2015 at 14:50, Michal Čizmazia mici...@gmail.com wrote:
> Are you looking for RDD.wholeTextFiles?
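SparkContext.wholeTextFiles reads each file as a single (path, content) pair, so a multi-line file arrives as one string rather than being split into line records. A plain-Python stand-in for that pair-based RDD (the sample paths, contents, and per-file computation below are hypothetical, purely for illustration):

```python
# SparkContext.wholeTextFiles yields one (path, fileContent) pair per file,
# preserving line boundaries inside fileContent. This list mimics that
# shape with made-up data.
pairs = [
    ("/data/a.txt", "header\nbody line 1\nbody line 2"),
    ("/data/b.txt", "header\nonly one body line"),
]

# Example per-file computation: count the lines after the header,
# treating each whole file as a single record.
body_counts = {path: len(content.splitlines()) - 1 for path, content in pairs}
print(body_counts)
# -> {'/data/a.txt': 2, '/data/b.txt': 1}
```

Note this is a batch (SparkContext) API, which matches the "not sure about streams" caveat above: there is no drop-in whole-file equivalent shown in this thread for a streaming source.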