Re: How to split log data into different files according to severity

2015-06-14 Thread Hao Wang
Thanks for the link. I’m still running 1.3.1 but will give it a try :)

Hao

On Jun 13, 2015, at 9:38 AM, Will Briggs wrbri...@gmail.com wrote:
> Check out this recent post by Cheng Lian regarding dynamic partitioning in Spark 1.4:

Re: How to split log data into different files according to severity

2015-06-13 Thread Hao Wang
I am currently using filter inside a loop over all severity levels to do this, which I think is pretty inefficient: it has to read the entire data set once for each severity. I wonder if there is a more efficient way that takes just one pass over the data? Thanks.

Best,
Hao Wang

On Jun 13, 2015,
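The one-pass alternative the poster is asking about can be sketched outside Spark: instead of filtering the whole data set once per severity level, bucket every line by its severity in a single scan. A minimal plain-Python sketch of that logic (the severity tag format is taken from the example later in the thread; this models the idea only, not Spark's API):

```python
from collections import defaultdict

def split_by_severity(lines):
    """Group log lines by severity in a single pass over the data."""
    buckets = defaultdict(list)
    for line in lines:
        # Assume each line starts with a "[SEVERITY]" tag, e.g. "[ERROR] log1".
        severity = line.split("]", 1)[0].lstrip("[")
        buckets[severity].append(line)
    return dict(buckets)

logs = ["[ERROR] log1", "[INFO] log2", "[ERROR] log3"]
print(split_by_severity(logs))
# One scan of `logs` fills every bucket, versus one full scan per
# severity level with the filter-in-a-loop approach.
```

In Spark terms this corresponds to keying each record by severity and writing each group out once, rather than running a separate `filter` job per level.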

Re: How to split log data into different files according to severity

2015-06-13 Thread Will Briggs
Check out this recent post by Cheng Lian regarding dynamic partitioning in Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html

On June 13, 2015, at 5:41 AM, Hao Wang bill...@gmail.com wrote:
> Hi, I have a bunch of large log files on Hadoop. Each line contains a log and
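The dynamic partitioning referenced here lays output out as one directory per distinct value of the partition column, Hive-style (in PySpark 1.4+ this is roughly `df.write.partitionBy("severity").parquet(path)`). A hedged plain-Python sketch of that directory layout, with illustrative paths and column names:

```python
import os
import tempfile
from collections import defaultdict

def write_partitioned(records, out_dir, column="severity"):
    """Mimic a Hive-style partitioned layout: one subdirectory per
    distinct value of `column`, e.g. out_dir/severity=ERROR/part-00000."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[column]].append(rec["message"])
    for value, messages in groups.items():
        part_dir = os.path.join(out_dir, f"{column}={value}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-00000"), "w") as f:
            f.write("\n".join(messages))

out = tempfile.mkdtemp()
write_partitioned(
    [{"severity": "ERROR", "message": "log1"},
     {"severity": "INFO", "message": "log2"}],
    out,
)
print(sorted(os.listdir(out)))  # → ['severity=ERROR', 'severity=INFO']
```

The `column=value` directory naming is what lets downstream readers prune partitions by severity without scanning the other files.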

How to split log data into different files according to severity

2015-06-13 Thread Hao Wang
Hi, I have a bunch of large log files on Hadoop. Each line contains a log and its severity. Is there a way that I can use Spark to split the entire data set into different files on Hadoop according to the severity field? Thanks. Below is an example of the input and output.

Input:
[ERROR] log1
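Whatever approach is used, the first step is extracting the severity field from each line. A small sketch of that parsing, assuming the `[SEVERITY] message` format shown in the example above (the regex and function name are illustrative):

```python
import re

# Assumed line format from the example in the thread: "[ERROR] log1".
LINE_RE = re.compile(r"^\[(?P<severity>[A-Z]+)\]\s*(?P<message>.*)$")

def parse_line(line):
    """Return (severity, message), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    return (m.group("severity"), m.group("message")) if m else None

print(parse_line("[ERROR] log1"))  # → ('ERROR', 'log1')
```

Returning `None` for malformed lines lets a Spark job drop or quarantine them with a `filter` before splitting by severity.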

Re: How to split log data into different files according to severity

2015-06-13 Thread Akhil Das
Are you looking for something like filter? See a similar example here: https://spark.apache.org/examples.html

Thanks
Best Regards

On Sat, Jun 13, 2015 at 3:11 PM, Hao Wang bill...@gmail.com wrote:
> Hi, I have a bunch of large log files on Hadoop. Each line contains a log and its severity. Is
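The filter approach suggested here selects the matching lines for one severity at a time; in PySpark's RDD API it would look roughly like `logs.filter(lambda l: l.startswith("[ERROR]")).saveAsTextFile(path)`. A plain-Python sketch of the pattern (names illustrative):

```python
def filter_severity(lines, severity):
    """One filtering pass selecting only lines tagged with `severity`."""
    tag = f"[{severity}]"
    return [line for line in lines if line.startswith(tag)]

logs = ["[ERROR] log1", "[INFO] log2", "[ERROR] log3"]
print(filter_severity(logs, "ERROR"))  # → ['[ERROR] log1', '[ERROR] log3']
# Splitting into N severity files this way rescans the input N times,
# which is the inefficiency raised elsewhere in the thread.
```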