Thanks for the link. I’m still running 1.3.1 but will give it a try :)
Hao
On Jun 13, 2015, at 9:38 AM, Will Briggs wrbri...@gmail.com wrote:
Check out this recent post by Cheng Lian regarding dynamic partitioning in
Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html
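In 1.4 the DataFrame writer can partition the output by a column, so the whole job is a single pass. Roughly something like this (untested sketch; the input path, output path, and the way the severity is parsed out of each line are just placeholders for your actual data):

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._

  // Parse each line into (severity, message); "[ERROR] log1" -> ("ERROR", "[ERROR] log1")
  val logs = sc.textFile("hdfs:///logs/input")
    .map(line => (line.takeWhile(_ != ']').drop(1), line))
    .toDF("severity", "message")

  // One pass over the data; Spark writes one output directory per severity value
  logs.write.partitionBy("severity").format("parquet").save("hdfs:///logs/by_severity")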
I am currently using filter inside a loop over all the severity levels to do
this, which I think is pretty inefficient: it has to read the entire data set
once for each severity. I wonder if there is a more efficient way that takes
just one pass over the data? Thanks.
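Roughly what I have now (a sketch; the paths and the severity list are just placeholders):

  val logs = sc.textFile("hdfs:///logs/input")
  val severities = Seq("ERROR", "WARN", "INFO")

  // One filter + save per severity level, so the whole data set is scanned once per level
  for (sev <- severities) {
    logs.filter(_.startsWith("[" + sev + "]"))
        .saveAsTextFile("hdfs:///logs/output/" + sev)
  }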
Best,
Hao Wang
On June 13, 2015, at 5:41 AM, Hao Wang bill...@gmail.com wrote:
Hi,
I have a bunch of large log files on Hadoop. Each line contains a log and
its severity. Is there a way that I can use Spark to split the entire data
set into different files on Hadoop according to the severity field? Thanks.
Below is an example of the input and output.
Input:
[ERROR] log1
Are you looking for something like filter? See a similar example here
https://spark.apache.org/examples.html
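For one severity it would be something like this (minimal sketch; the path and the "[ERROR]" tag are placeholders):

  val logs = sc.textFile("hdfs:///logs/input")
  logs.filter(_.startsWith("[ERROR]")).saveAsTextFile("hdfs:///logs/output/ERROR")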
Thanks
Best Regards