Re: [racket-users] Quick regexp question
It is nice. I'll have to play with this as well. Thanks again everyone.

On Friday, February 2, 2018 at 5:49:15 PM UTC-6, johnbclements wrote:
> Oh, that’s nice.

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: [racket-users] Quick regexp question
On Feb 2, 2018, at 3:21 PM, Matthew Butterick wrote:
> If you like keeping the names and patterns together, you could also create an
> association list of the names and subpatterns, and iterate:
> [...]

Oh, that’s nice.

In fact, I’ll tell you what I *really* like about that: it could radically simplify the irritating process of debugging regexps. Instead of breaking them in various places to perform a binary search, you could provide a nice error message specifying exactly which part of the regexp failed to match.

One thing to be aware of is that you’d need to make sure that your regexp still works without backtracking. If you broke #px".*abc" into #px".*" and #px"abc", it wouldn’t mean the same thing any more.

John
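The backtracking caveat can be seen in a few lines. This is a minimal sketch in Python rather than Racket (Python's `re` backtracks in the same way for this pattern); the input string "xxabc" is made up for illustration.

```python
import re

text = "xxabc"

# As a single pattern, ".*" can give characters back so that "abc" still matches.
whole = re.match(r".*abc", text)

# Applied as two separate patterns in sequence, ".*" consumes the entire string
# and never gives anything back, so nothing is left for "abc" to match.
first = re.compile(r".*").match(text, 0)
second = re.compile(r"abc").match(text, first.end())

print(whole.group(0))  # xxabc
print(first.group(0))  # xxabc
print(second)          # None
```

Splitting is safe only when no sub-pattern depends on an earlier one giving characters back, which holds for the space-separated log fields in this thread.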
Re: [racket-users] Quick regexp question
On Feb 2, 2018, at 10:23 AM, 'John Clements' via Racket Users wrote:
> This macro gets the names in much closer to the corresponding patterns than
> matching by index, but it doesn’t actually embed the names into the regexp.

If you like keeping the names and patterns together, you could also create an association list of the names and subpatterns, and iterate:

#lang racket

(define msg "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")
(with-input-from-string msg
  (thunk
   (for/hash ([(name pat) (in-dict '((date . "[-\\dT:]+")
                                     (username . "\\w+")
                                     (hostname . "[-\\w\\d]+")
                                     (ip . "[\\d\\.]+")
                                     (message . ".+")))])
     (values name (car (regexp-match (pregexp pat) (current-input-port)))))))

'#hash((message . #" something broke")
       (date . #"2018-02-02T11:26:34")
       (username . #"someuser")
       (hostname . #"some-computername01")
       (ip . #"233.194.20.110"))
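For comparison, here is a rough Python sketch of the same idea: walk a list of (name, sub-pattern) pairs, search for each one starting where the previous match ended, and report which named part failed. The names `PARTS` and `match_parts` are hypothetical, chosen only for this illustration, and the error wording is an assumption rather than anything proposed in the thread.

```python
import re

# Hypothetical field names and sub-patterns, mirroring the association list above.
PARTS = [
    ("date", r"[-\dT:]+"),
    ("username", r"\w+"),
    ("hostname", r"[-\w\d]+"),
    ("ip", r"[\d\.]+"),
    ("message", r".+"),
]

def match_parts(text, parts=PARTS):
    """Search for each sub-pattern in turn, resuming where the previous one ended."""
    fields = {}
    pos = 0
    for name, pattern in parts:
        m = re.compile(pattern).search(text, pos)
        if m is None:
            # Name the failing piece instead of reporting one opaque "no match".
            raise ValueError(
                f"sub-pattern {name!r} ({pattern}) found no match after offset {pos}"
            )
        fields[name] = m.group(0)
        pos = m.end()
    return fields

msg = "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke"
print(match_parts(msg)["hostname"])  # some-computername01
```

As in the Racket version, each search skips ahead to the first place the sub-pattern matches, so the `message` field keeps its leading space. Calling `match_parts("2018-02-02T11:26:34 someuser")` raises a `ValueError` that names `hostname` as the part that failed.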
Re: [racket-users] Quick regexp question
In the long run this is probably better than what I wanted. Thank you.

On Friday, February 2, 2018 at 12:23:48 PM UTC-6, johnbclements wrote:
> Not sure if this gets you as far as you want, but you could use a macro to
> associate names with paren-wrapped items:
> [...]
Re: [racket-users] Quick regexp question
Not sure if this gets you as far as you want, but you could use a macro to associate names with paren-wrapped items:

#lang racket

(define-syntax re-match
  (syntax-rules ()
    [(_ str re name ...)
     (match str
       [(regexp re (list _ name ...))
        (list (list (quote name) name) ...)])]))

(define msg "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")

(re-match msg
          #px"^([-\\dT:]+)\\s(\\w+)\\s([-\\w\\d]+)\\s([\\d\\.]+)\\s(.+)$"
          date username hostname ip message)

… produces:

'((date "2018-02-02T11:26:34")
  (username "someuser")
  (hostname "some-computername01")
  (ip "233.194.20.110")
  (message "something broke"))

This macro gets the names in much closer to the corresponding patterns than matching by index, but it doesn’t actually embed the names into the regexp.

John Clements

On Feb 2, 2018, at 10:01 AM, nocher...@gmail.com wrote:
> Is there a way to do this in Racket? Of course, pairing matches with field
> names by index is an option, but not as convenient in some situations.
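The macro's core move, pairing positional capture groups with a list of names supplied next to the pattern, does not depend on Racket. Below is a minimal sketch of the same pairing in Python, assuming a hypothetical helper named `re_match`; it is an illustration, not a translation of the macro's expansion.

```python
import re

def re_match(text, pattern, *names):
    """Pair each capture group, in order, with the name given at the same position."""
    m = re.match(pattern, text)
    if m is None:
        return None
    groups = m.groups()
    if len(groups) != len(names):
        raise ValueError(f"{len(names)} names given for {len(groups)} groups")
    return list(zip(names, groups))

msg = "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke"
fields = re_match(
    msg,
    r"^([-\dT:]+)\s(\w+)\s([-\w\d]+)\s([\d\.]+)\s(.+)$",
    "date", "username", "hostname", "ip", "message",
)
print(fields[0])  # ('date', '2018-02-02T11:26:34')
```

Like the macro, this keeps the names beside the pattern but outside it, so reordering the groups without reordering the names silently mislabels fields.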
[racket-users] Quick regexp question
Sorry if I've missed this in the documentation, but I don't see it, and it is starting to bother me.

In PowerShell, Python, and Splunk I'm able to perform automatic field extraction on strings and access the values of fields by name. Is there a way to do this in Racket? Of course, pairing matches with field names by index is an option, but not as convenient in some situations.

Take the string "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke" as a trivial example.

PowerShell:

"2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke" -match "^(?<date>[\d\-T:]+)\s(?<username>\w+)\s(?<hostname>[\w\-\d]+)\s(?<IP>[\d\.]+)\s(?<message>.+)$" | Out-Null

$matches.date
$matches.username
$matches.hostname
$matches.IP
$matches.message

Python:

import re

m = re.match(r"^(?P<Date>[\d\-T:]+)\s(?P<Username>\w+)\s(?P<Hostname>[\w\-\d]+)\s(?P<IP>[\d\.]+)\s(?P<Message>.+)$",
             "2018-02-02T11:26:34 someuser some-computername01 233.194.20.110 something broke")

m['Date']
m['Username']
m['Hostname']
m['IP']
m['Message']

Both output:

2018-02-02T11:26:34
someuser
some-computername01
233.194.20.110
something broke