Hi,

I have a number of large log files on Hadoop. Each line contains a log message
and its severity. Is there a way I can use Spark to split the entire data set
into different files on Hadoop according to the severity field? Thanks.
Below is an example of the input and output, followed by a sketch of the
approach I have been considering.

Input:
[ERROR] log1
[INFO] log2
[ERROR] log3
[INFO] log4

Output:
error_file
[ERROR] log1
[ERROR] log3

info_file
[INFO] log2
[INFO] log4
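For reference, here is roughly what I have in mind: key each line by its
severity, then write with saveAsHadoopFile using a custom
MultipleTextOutputFormat so each severity lands under its own output directory
(e.g. error_file/, info_file/). This is just a sketch, the paths, class names,
and the assumption that every line starts with "[SEVERITY] " are my own; I am
not sure it is the idiomatic way, which is why I am asking.

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Routes each (severity, line) pair to a directory named after the severity,
// e.g. error_file/part-00000, and suppresses the key so each line is written
// back verbatim.
class SeverityOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String].toLowerCase + "_file/" + name
}

object SplitLogsBySeverity {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SplitLogsBySeverity"))
    sc.textFile("hdfs:///logs/input")          // placeholder input path
      .map { line =>
        // "[ERROR] log1" -> ("ERROR", "[ERROR] log1"); assumes every line
        // begins with a bracketed severity tag
        val severity = line.substring(1, line.indexOf(']'))
        (severity, line)
      }
      .saveAsHadoopFile("hdfs:///logs/output", // placeholder output path
        classOf[String], classOf[String], classOf[SeverityOutputFormat])
    sc.stop()
  }
}

With this, each severity would end up as a set of part files under its own
directory (error_file/, info_file/) rather than a single file per severity,
which I think would be fine for my use case.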


Best,
Hao Wang
