Roy Smith <r...@panix.com> writes:

> In article <mailman.96.1365077619.3114.python-l...@python.org>,
>  Jason Swails <jason.swa...@gmail.com> wrote:
>
>> The only time I regularly break my rule is for regular expressions (at some
>> point I may embrace re.X to allow me to break those up, too).
>
> re.X is a pretty cool tool for making huge regexes readable.  But, it
> turns out that python's auto-continuation and string literal
> concatenation rules are enough to let you get much the same effect.
> Here's a regex we use to parse haproxy log files.  This would be utter
> line noise all run together.  This way, it's almost readable :-)
>
> pattern = re.compile(r'haproxy\[(?P<pid>\d+)]: '
>                      r'(?P<client_ip>(\d{1,3}\.){3}\d{1,3}):'
>                      r'(?P<client_port>\d{1,5}) '
>                      r'\[(?P<accept_date>\d{2}/\w{3}/\d{4}(:\d{2}){3}\.\d{3})] '
>                      r'(?P<frontend_name>\S+) '
>                      r'(?P<backend_name>\S+)/'
>                      r'(?P<server_name>\S+) '
>                      r'(?P<Tq>(-1|\d+))/'
>                      r'(?P<Tw>(-1|\d+))/'
>                      r'(?P<Tc>(-1|\d+))/'
>                      r'(?P<Tr>(-1|\d+))/'
>                      r'(?P<Tt>\+?\d+) '
>                      r'(?P<status_code>\d{3}) '
>                      r'(?P<bytes_read>\d+) '
>                      r'(?P<captured_request_cookie>\S+) '
>                      r'(?P<captured_response_cookie>\S+) '
>                      r'(?P<termination_state>[\w-]{4}) '
>                      r'(?P<actconn>\d+)/'
>                      r'(?P<feconn>\d+)/'
>                      r'(?P<beconn>\d+)/'
>                      r'(?P<srv_conn>\d+)/'
>                      r'(?P<retries>\d+) '
>                      r'(?P<srv_queue>\d+)/'
>                      r'(?P<backend_queue>\d+) '
>                      r'(\{(?P<request_id>.*?)\} )?'
>                      r'(\{(?P<captured_request_headers>.*?)\} )?'
>                      r'(\{(?P<captured_response_headers>.*?)\} )?'
>                      r'"(?P<http_request>.+)"'
>                      )
>
> And, for those of you who go running in the other direction every time
> regex is suggested as a solution, I challenge you to come up with easier
> to read (or write) code for parsing a line like this (probably
> hopelessly mangled by the time you read it):
>
> 2013-04-03T00:00:00+00:00 localhost haproxy[5199]: 10.159.19.244:57291
> [02/Apr/2013:23:59:59.811] app-nodes next-song-nodes/web8.songza.com
> 0/0/3/214/219 200 593 sessionid=NWiX5KGOdvg6dSaA
> sessionid=NWiX5KGOdvg6dSaA ---- 249/249/149/14/0 0/0
> {4C0ABFA9-515B6DEF-933229} "POST
> /api/1/station/892337/song/16024201/notify-play HTTP/1.0"
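
For comparison, here is a minimal sketch of the re.X style mentioned in the quoted message, limited to the first three fields of that pattern. The name pattern_x and the shortened sample line are made up for illustration. In verbose mode unescaped whitespace in the pattern is ignored, so literal spaces are written as "[ ]" here.

```python
import re

# First three fields of the quoted pattern, rewritten with re.X (re.VERBOSE).
# Whitespace and newlines inside the pattern are ignored in this mode, so
# each literal space that must match is spelled "[ ]".
pattern_x = re.compile(r"""
    haproxy\[(?P<pid>\d+)]:[ ]
    (?P<client_ip>(\d{1,3}\.){3}\d{1,3}):
    (?P<client_port>\d{1,5})[ ]
""", re.X)

# Shortened version of the sample log line (illustration only).
line = ("2013-04-03T00:00:00+00:00 localhost haproxy[5199]: "
        "10.159.19.244:57291 [02/Apr/2013:23:59:59.811]")

m = pattern_x.search(line)
print(m.group("pid"), m.group("client_ip"), m.group("client_port"))
```

The string-concatenation form in the quoted message keeps literal spaces visible as plain spaces, which is one reason to prefer it for a space-delimited log format.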

Is using csv.DictReader with delimiter=' ' not sufficient for this? I
did not actually read the regular expression in its entirety.

-- 
regards,
kushal
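
To make the csv.DictReader idea concrete, here is a minimal sketch run against the sample line from the quoted message. It assumes the wrapped line is re-joined with single spaces, and the column names in fieldnames are hypothetical (one per space-separated token), not taken from haproxy or from the quoted pattern.

```python
import csv
import io

# Sample line from the quoted message, re-joined with single spaces
# (an assumption; the original was wrapped in transit).
line = ('2013-04-03T00:00:00+00:00 localhost haproxy[5199]: '
        '10.159.19.244:57291 [02/Apr/2013:23:59:59.811] app-nodes '
        'next-song-nodes/web8.songza.com 0/0/3/214/219 200 593 '
        'sessionid=NWiX5KGOdvg6dSaA sessionid=NWiX5KGOdvg6dSaA ---- '
        '249/249/149/14/0 0/0 {4C0ABFA9-515B6DEF-933229} '
        '"POST /api/1/station/892337/song/16024201/notify-play HTTP/1.0"')

# Hypothetical column names, one per space-separated token.
fieldnames = ['timestamp', 'host', 'process', 'client', 'accept_date',
              'frontend_name', 'backend_server', 'timers', 'status_code',
              'bytes_read', 'captured_request_cookie',
              'captured_response_cookie', 'termination_state',
              'conn_counts', 'queues', 'request_id', 'http_request']

# The default quotechar '"' keeps the quoted HTTP request in one field.
reader = csv.DictReader(io.StringIO(line), fieldnames=fieldnames,
                        delimiter=' ')
row = next(reader)

print(row['http_request'])
print(row['client'])

# Slash-separated groups still need a second splitting step.
timers = row['timers'].split('/')
print(timers)
```

This splits the sample line into 17 tokens, but composite tokens such as the timers, the client ip:port pair, and the bracketed date still need a second pass. The three brace-delimited groups are optional in the quoted pattern and may contain spaces, so on other lines the columns would shift relative to these names.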