Re: patterns.c question or possible bug
On Jan 30, 2018 12:05 AM, Ori Bernstein wrote:
>
> On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn wrote:
>
> > I'm trying to use patterns.c for some pattern matching. The manual
> > mentions captures using "()" around what you want to capture. I don't
> > see how to get at the data though. Here is a sample program.
> >
> > #include <stdio.h>
> > #include "patterns.h"
> >
> > int
> > main(int argc, char *argv[])
> > {
> > 	const char *errstr = NULL;
> > 	const char *string = "the quick the brown the fox";
> > 	const char *pattern = "the";
> > 	int ret;
> > 	struct str_match match;
> >
> > 	ret = str_match(string, pattern, &match, &errstr);
> >
> > 	if (errstr != NULL)
> > 		printf("%s\n", errstr);
> > 	else
> > 		printf("number of matches %d\n", match.sm_nmatch);
> >
> > 	return 0;
> > }
> >
> > It prints 2, but I was expecting 3. I've tried multiple other patterns
> > and it seems the answer is always 2, which leads me to believe I'm doing
> > something wrong. Any assistance appreciated.
> >
> > Thanks,
> >
> > Edgar
>
> The code is looking for a match of the pattern in the string, not all matches
> of the pattern in the string. It also makes the (IMO, surprising) decision
> that not having any capture groups in the pattern implies capturing the whole
> pattern. The whole string goes into the first match (sm_match[0]).
>
> So, in your case, you're matching:
>
> "the quick the brown the fox";
>  ^^^
>
> Accordingly:
>
> match.sm_match[0] = "the quick the brown the fox"
> match.sm_match[1] = "the"
>
> If you had 'quick', you'd get similar behavior:
>
> "the quick the brown the fox";
>      ^^^^^
>
> Equivalently, putting the whole pattern in '()' will match the same thing:
>
> pattern = "(quick)"
>
> But multiple parens will match their substrings:
>
> pattern = "(qu)ick (the)"
>
> "the quick the brown the fox";
>      ^^    ^^^
> match.sm_match[0] = "the quick the brown the fox"
> match.sm_match[1] = "qu"
> match.sm_match[2] = "the"
>
> The choice to capture implicitly, I think, is confusing, but the behavior
> seems to me to be correct.
>
> --
> Ori Bernstein

Thanks. Makes sense now. I probably would have figured it out for myself if
I had printed out match.sm_match[0], etc. Live and learn.

Edgar
Re: patterns.c question or possible bug
On Tue, Jan 30, 2018 at 07:48:17AM +0100, Otto Moerbeek wrote:
> On Mon, Jan 29, 2018 at 11:23:18PM -0600, Edgar Pettijohn wrote:
>
> > I'm trying to use patterns.c for some pattern matching. The manual mentions
> > captures using "()" around what you want to capture. I don't see how to get
> > at the data though. Here is a sample program.
> >
> > #include <stdio.h>
> > #include "patterns.h"
> >
> > int
> > main(int argc, char *argv[])
> > {
> > 	const char *errstr = NULL;
> > 	const char *string = "the quick the brown the fox";
> > 	const char *pattern = "the";
> > 	int ret;
> > 	struct str_match match;
> >
> > 	ret = str_match(string, pattern, &match, &errstr);
> >
> > 	if (errstr != NULL)
> > 		printf("%s\n", errstr);
> > 	else
> > 		printf("number of matches %d\n", match.sm_nmatch);
> >
> > 	return 0;
> > }
> >
> > It prints 2, but I was expecting 3. I've tried multiple other patterns and
> > it seems the answer is always 2, which leads me to believe I'm doing
> > something wrong. Any assistance appreciated.
> >
> > Thanks,
> >
> > Edgar
>
> Hmm, str_match() isn't a function in any OpenBSD API. So I have no
> idea what function you are talking about.
>
> -Otto
>

It is in httpd's patterns.c, which is based on the Lua pattern-matching code.

--
Kind regards,
Hiltjo
Re: patterns.c question or possible bug
On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn wrote:
> I'm trying to use patterns.c for some pattern matching. The manual
> mentions captures using "()" around what you want to capture. I don't
> see how to get at the data though. Here is a sample program.
>
> #include <stdio.h>
> #include "patterns.h"
>
> int
> main(int argc, char *argv[])
> {
> 	const char *errstr = NULL;
> 	const char *string = "the quick the brown the fox";
> 	const char *pattern = "the";
> 	int ret;
> 	struct str_match match;
>
> 	ret = str_match(string, pattern, &match, &errstr);
>
> 	if (errstr != NULL)
> 		printf("%s\n", errstr);
> 	else
> 		printf("number of matches %d\n", match.sm_nmatch);
>
> 	return 0;
> }
>
> It prints 2, but I was expecting 3. I've tried multiple other patterns
> and it seems the answer is always 2, which leads me to believe I'm doing
> something wrong. Any assistance appreciated.
>
> Thanks,
>
> Edgar

The code is looking for a match of the pattern in the string, not all matches
of the pattern in the string. It also makes the (IMO, surprising) decision
that not having any capture groups in the pattern implies capturing the whole
pattern. The whole string goes into the first match (sm_match[0]).

So, in your case, you're matching:

"the quick the brown the fox";
 ^^^

Accordingly:

match.sm_match[0] = "the quick the brown the fox"
match.sm_match[1] = "the"

If you had 'quick', you'd get similar behavior:

"the quick the brown the fox";
     ^^^^^

Equivalently, putting the whole pattern in '()' will match the same thing:

pattern = "(quick)"

But multiple parens will match their substrings:

pattern = "(qu)ick (the)"

"the quick the brown the fox";
     ^^    ^^^
match.sm_match[0] = "the quick the brown the fox"
match.sm_match[1] = "qu"
match.sm_match[2] = "the"

The choice to capture implicitly, I think, is confusing, but the behavior
seems to me to be correct.

--
Ori Bernstein
Re: patterns.c question or possible bug
On Mon, Jan 29, 2018 at 11:23:18PM -0600, Edgar Pettijohn wrote:
> I'm trying to use patterns.c for some pattern matching. The manual mentions
> captures using "()" around what you want to capture. I don't see how to get
> at the data though. Here is a sample program.
>
> #include <stdio.h>
> #include "patterns.h"
>
> int
> main(int argc, char *argv[])
> {
> 	const char *errstr = NULL;
> 	const char *string = "the quick the brown the fox";
> 	const char *pattern = "the";
> 	int ret;
> 	struct str_match match;
>
> 	ret = str_match(string, pattern, &match, &errstr);
>
> 	if (errstr != NULL)
> 		printf("%s\n", errstr);
> 	else
> 		printf("number of matches %d\n", match.sm_nmatch);
>
> 	return 0;
> }
>
> It prints 2, but I was expecting 3. I've tried multiple other patterns and
> it seems the answer is always 2, which leads me to believe I'm doing
> something wrong. Any assistance appreciated.
>
> Thanks,
>
> Edgar

Hmm, str_match() isn't a function in any OpenBSD API. So I have no
idea what function you are talking about.

-Otto
patterns.c question or possible bug
I'm trying to use patterns.c for some pattern matching. The manual mentions
captures using "()" around what you want to capture. I don't see how to get
at the data though. Here is a sample program.

#include <stdio.h>
#include "patterns.h"

int
main(int argc, char *argv[])
{
	const char *errstr = NULL;
	const char *string = "the quick the brown the fox";
	const char *pattern = "the";
	int ret;
	struct str_match match;

	ret = str_match(string, pattern, &match, &errstr);

	if (errstr != NULL)
		printf("%s\n", errstr);
	else
		printf("number of matches %d\n", match.sm_nmatch);

	return 0;
}

It prints 2, but I was expecting 3. I've tried multiple other patterns and
it seems the answer is always 2, which leads me to believe I'm doing
something wrong. Any assistance appreciated.

Thanks,

Edgar