I have been writing a parser for the Apache log format with Parslet. So far
the parser works for the two parts I am looking at now: the IP address and the
date. The problem is the transform.
My transform looks like this:
class WebLogTransform < Parslet::Transform
  rule(:wordmonth => simple(:month)) {
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct',
     'Nov', 'Dec'].index(month) + 1
  }
  rule(:rawdate => simple(:date)) {
    DateTime.new(Integer(Date.year), Integer(Date.month), Integer(Date.day))
  }
end
(Note: I just added the conversion for wordmonth, and rawdate depends on it. I
am not sure of the best way to handle that.)
The transform does not affect anything: the parse tree comes back unchanged. I
would like to think that I have a handle on everything else.
Any suggestions?
The output is like this:
=> {:IP=>"137.207.74.55"@0, :rawdate=>{:day=>"08"@19,
:month=>{:month=>"Feb"@22}, :year=>"2013"@26, :hour=>"19"@31, :minute=>"28"@34,
:second=>"10"@37, :timezone=>{:tzpm=>"-"@40, :tz=>"0500"@41}}}
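Given a tree of that shape, the conversion the rawdate rule is aiming for can be sketched in plain Ruby, without Parslet. This is only an illustration: `month_number` and `build_datetime` are hypothetical helper names, and the input hash assumes the month has already been reduced to its three-letter name. One detail worth noting is that `Integer("08")` raises, because a leading zero is read as octal, so the sketch passes the base explicitly.

```ruby
require 'date'

MONTH_ABBRS = %w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec].freeze

# Hypothetical helper: "Feb" -> 2. Raises if the abbreviation is unknown.
def month_number(abbr)
  index = MONTH_ABBRS.index(abbr.to_s)
  raise ArgumentError, "unknown month: #{abbr}" if index.nil?
  index + 1
end

# Hypothetical helper: builds a DateTime from the captured date fields.
# Values may be Strings or Parslet slices; to_s makes both behave the same.
def build_datetime(fields)
  # Integer("08") raises (leading zero means octal), so pass base 10.
  num = ->(key) { Integer(fields[key].to_s, 10) }
  tz = fields[:timezone]
  digits = tz[:tz].to_s
  # "-" and "0500" become the offset string "-05:00"
  offset = "#{tz[:tzpm]}#{digits[0, 2]}:#{digits[2, 2]}"
  DateTime.new(num[:year], month_number(fields[:month]), num[:day],
               num[:hour], num[:minute], num[:second], offset)
end

# Assumed input: the :rawdate subtree with the month flattened to a string.
fields = {
  day: '08', month: 'Feb', year: '2013',
  hour: '19', minute: '28', second: '10',
  timezone: { tzpm: '-', tz: '0500' }
}
puts build_datetime(fields).iso8601   # prints 2013-02-08T19:28:10-05:00
```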
The full parser file is below. I am amazed at how clean this is, much easier to
read than Boost::Spirit.
With thanks,
Jeffrey Drake.
#!/usr/bin/env ruby
require 'parslet'
require 'date'
class WebLog < Parslet::Parser
  rule(:integer) { match('[0-9]').repeat(1) }
  rule(:space) { match('\s').repeat(1) }
  rule(:space?) { space.maybe }
  # str('.') matches a literal dot; match('.') is a regex and accepts any character
  rule(:dot) { str('.') }
  rule(:month) { (str('Jan') | str('Feb') |
                  str('Mar') | str('Apr') |
                  str('May') | str('Jun') |
                  str('Jul') | str('Aug') |
                  str('Sep') | str('Oct') |
                  str('Nov') | str('Dec')).as(:wordmonth) >> space?
  }
  rule(:timezone) { match('[+-]').as(:tzpm) >> integer.as(:tz) >> space? }
  rule(:date) { str('[') >> integer.as(:day) >>
                str('/') >> month.as(:month) >>
                str('/') >> integer.as(:year) >>
                str(':') >> integer.as(:hour) >>
                str(':') >> integer.as(:minute) >>
                str(':') >> integer.as(:second) >>
                space? >> timezone.as(:timezone) >>
                str(']')
  }
  rule(:ipaddr) { integer >> dot >>
                  integer >> dot >>
                  integer >> dot >>
                  integer }
  rule(:weblog) { ipaddr.as(:IP) >> space? >>
                  str('-') >> space? >> str('-') >> space? >>
                  date.as(:rawdate)
  }
  root :weblog
end
class WebLogTransform < Parslet::Transform
  rule(:wordmonth => simple(:month)) {
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct',
     'Nov', 'Dec'].index(month) + 1
  }
  rule(:rawdate => simple(:date)) {
    DateTime.new(Integer(Date.year), Integer(Date.month), Integer(Date.day))
  }
end
def parse(str)
  log = WebLog.new
  trans = WebLogTransform.new
  puts trans.apply(log.parse(str))
rescue Parslet::ParseFailed => failure
  puts failure.cause.ascii_tree
end
parse "137.207.74.55 - - [08/Feb/2013:19:28:10 -0500]"