Hi all,

I am planning to handle streaming data from Kafka in Spark using Python. With my own log files I split each line on whitespace and picked the fields out by index, but that approach breaks on Apache logs: fields such as the timestamp and the user agent contain spaces, so the indices shift. Is it possible to use a regex inside transform() on the DStream instead? Or is there any other way to extract the different groups?

For example, this is what the whitespace split produces for an Apache log line:
[u'10.10.80.1', u'-', u'-', u'[08/Sep/2015:12:15:15', u'+0530]', u'"GET', u'/', u'HTTP/1.1"', u'200', u'1213', u'"-"', u'"Mozilla/5.0', u'(Windows', u'NT', u'10.0;', u'WOW64)', u'AppleWebKit/537.36', u'(KHTML,', u'like', u'Gecko)', u'Chrome/45.0.2454.85', u'Safari/537.36"']

What I need instead is the fields grouped like this:

IP: 10.10.80.1
IDENTITY: -
USER: -
TIME: 08/Sep/2015:12:15:15 +0530
SERVER MESSAGE: GET /favicon.ico HTTP/1.1
STATUS: 404
SIZE: 514
REFERER: http://xxxxxxxxxxxxxxxx.com/
CLIENT MESSAGE: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36

Thanks & Regards,
Amithsha

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
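A regex over the raw line, rather than a whitespace split, can yield exactly those groups. Here is a minimal sketch for the Apache combined log format; the helper name `parse_log` and the field labels are my own choices, and the Spark Streaming wiring is only indicated in a comment:

```python
import re

# Apache combined log format: host, identity, user, [time], "request",
# status, size, "referer", "user-agent".  Quoted fields may contain spaces,
# which is why index-based splitting fails.
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$'
)

def parse_log(line):
    """Return a dict of named fields for one log line, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ip, identity, user, time, request, status, size, referer, agent = m.groups()
    return {
        "IP": ip,
        "IDENTITY": identity,
        "USER": user,
        "TIME": time,
        "SERVER MESSAGE": request,
        "STATUS": status,
        "SIZE": size,
        "REFERER": referer,
        "CLIENT MESSAGE": agent,
    }

# The sample line from above, as a single raw string:
line = ('10.10.80.1 - - [08/Sep/2015:12:15:15 +0530] "GET / HTTP/1.1" 200 1213 '
        '"-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"')

print(parse_log(line)["IP"])    # 10.10.80.1
print(parse_log(line)["TIME"])  # 08/Sep/2015:12:15:15 +0530

# In Spark Streaming you would apply this per record, e.g.:
#   parsed = lines.map(parse_log).filter(lambda d: d is not None)
```

Since `parse_log` is a plain function, a `map` on the DStream is enough; `transform` is only needed if you want full RDD operations per batch.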