Re: GRegex regular expression failing to match
On Mon, May 14, 2012 at 01:36:02PM -0800, Christopher Howard wrote:
> Is there anything wrong with the regexp,

Sure. Two things. It should be

    ^/[0-9]+$

not

    ^/d+$

First, it lacks the backslash needed to make \d a digit atom. Second, since \d matches any digit (possibly whatever Unicode says is a digit), not just 0-9, you should really use [0-9] to match ASCII digits.

Yeti

_______________________________________________
gtk-app-devel-list mailing list
gtk-app-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-app-devel-list
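A minimal sketch of the first point in the reply above, assuming POSIX regcomp()/regexec() from <regex.h> as a stand-in for GRegex (this is not the GRegex API). POSIX does not define \d, which keeps the comparison simple: without a backslash, d is only the literal letter "d", so ^/d+$ accepts "/ddd" and rejects "/12345", while ^/[0-9]+$ accepts a slash followed by ASCII digits only. The helper name ere_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of GLib): returns 1 if `subject` matches
 * the POSIX extended regular expression `pattern`, 0 otherwise. */
int ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;               /* treat a compile error as "no match" */

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, ere_matches("^/[0-9]+$", "/12345") is true, while ere_matches("^/d+$", "/12345") is false because the pattern only accepts runs of the letter "d". With GRegex the same comparison would presumably go through g_regex_match_simple(), which also understands \d.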
Re: GRegex(win32) : 500 tests passed, 3 failed
On Thu, 15/03/2007 at 18.41 +0100, Hans Breuer wrote:
> with only small modifications I was able to compile GRegex with msvc,
> thanks for providing an almost working makefile.msc ;-)
[...]
> But now for the question: are these 3 failed specific to my build so I
> should investigate them further?

It's my fault: I wrote makefile.msc (without testing it) before the release of PCRE 7. PCRE 6.x recognizes only one of \n, \r or \r\n as a newline. PCRE 7.x added the ability to match any newline character, so I changed the default value from 10 (\n) to -1 (PCRE_NEWLINE_ANY) in Makefile.am but not in makefile.msc. Sorry :)

-- 
Marco Barisione
http://www.barisione.org/

_______________________________________________
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list
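To illustrate why the newline default described above can flip a test result, here is a simplified sketch in plain C with no PCRE involved. It checks whether some line of a subject is exactly a given string, roughly what a multiline ^b$ asks for, under two hypothetical newline conventions: LF only, and "CR, LF or CRLF". The names newline_mode, newline_len and has_exact_line are invented for this example. The real PCRE_NEWLINE_ANY setting is understood to cover further Unicode newline sequences as well, and the subject strings used by the failing tests are an assumption here, not something the thread states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical newline conventions for this sketch only. */
enum newline_mode { NL_LF_ONLY, NL_CR_LF_CRLF };

/* Length of the newline sequence starting at s, or 0 if there is none. */
size_t newline_len(const char *s, enum newline_mode mode)
{
    if (mode == NL_LF_ONLY)
        return s[0] == '\n' ? 1 : 0;
    if (s[0] == '\r' && s[1] == '\n')
        return 2;
    return (s[0] == '\n' || s[0] == '\r') ? 1 : 0;
}

/* Returns 1 if some line of `subject` is exactly `line`, 0 otherwise. */
int has_exact_line(const char *subject, const char *line,
                   enum newline_mode mode)
{
    const char *start, *p;
    size_t want;

    assert(subject != NULL && line != NULL);
    want = strlen(line);
    start = p = subject;

    for (;;) {
        size_t nl = newline_len(p, mode);

        if (*p != '\0' && nl == 0) {
            p++;                /* ordinary character: keep scanning */
            continue;
        }
        /* p is at a line end: a newline sequence or the end of subject. */
        if ((size_t)(p - start) == want && memcmp(start, line, want) == 0)
            return 1;
        if (*p == '\0')
            return 0;
        p += nl;                /* skip the newline sequence */
        start = p;
    }
}
```

With an LF-only convention, the subject "a\r\nb\r\nc" has the lines "a\r", "b\r" and "c", so no line is exactly "b"; with the broader convention the middle line is "b". A build whose default differs from what the test suite expects would therefore report unexpected mismatches of this kind.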
Re: GRegex(win32) : 500 tests passed, 3 failed
Having newlines seems suspicious. What kind of newlines are they?

Hans Breuer wrote:
> with only small modifications I was able to compile GRegex with msvc,
> thanks for providing an almost working makefile.msc ;-)
>
> The first attempt to run
>     regex-test.exe --noisy
> did crash due to gnulib not liking
>     g_strdup_vprintf ("matching \"%s\" against \"%s\" \t", "%", "\p{Common}")
>
> The attached patch works around this and also removes the
> #include <glib.h> from gregex.h. I think it is better to only include
> required sub-headers like almost all glib/*.h do.
>
> But now for the question: are these 3 failed specific to my build so I
> should investigate them further?
>
> Thanks,
>     Hans
>
> matching a b c against ^b$ (start: 0, len: -1) failed (unexpected mismatch)
> matching a b c against ^b$ (start: 0, len: -1) failed (unexpected mismatch)
> matching a against a# b (start: 0, len: -1)failed (unexpected match)
>
> Hans at Breuer dot Org
> Tell me what you need, and I'll tell you how to get along without it.
>                                                 -- Dilbert
>
> Index: glib/gregex.h
> ===================================================================
> --- glib/gregex.h	(revision 5410)
> +++ glib/gregex.h	(working copy)
> @@ -22,7 +22,8 @@
>  #ifndef __G_REGEX_H__
>  #define __G_REGEX_H__
>  
> -#include <glib.h>
> +#include <glib/gerror.h>
> +#include <glib/gstring.h>
>  
>  G_BEGIN_DECLS
>  
> Index: tests/regex-test.c
> ===================================================================
> --- tests/regex-test.c	(revision 5409)
> +++ tests/regex-test.c	(working copy)
> @@ -230,7 +230,10 @@
>  		gboolean expected)
>  {
>    gboolean match;
> -  
> +
> +  if (string[0] == '%' && string[1] == '\0')
> +    string = "%%";
> +
>    verbose ("matching \"%s\" against \"%s\" \t", string, pattern);
>  
>    match = g_regex_match_simple (pattern, string, compile_opts, match_opts);

-- 
JAKE GOULDING
Software Engineer
Vivísimo [Search Done Right]
1710 Murray Avenue
Pittsburgh, PA 15217 USA
tel: +1.412.422.2499 x105
fax: +1.412.422.2495
vivisimo.com
clusty.com
Re: GRegex
On Sat, 28/10/2006 at 19.35 +0200, Murray Cumming wrote:
> If it's possible, it would be nice to avoid making it a GObject just to
> add easy reference counting. That tends to restrict how it can be
> wrapped by language bindings for which automatic memory management is
> not the default.

It can't be a GObject because GRegex will be in libglib.

> I don't know exactly how it might be done in C (it's easy in C++), but I
> would hope that there's some way to reference-count anything without
> forcing the object itself to do the reference counting.

What do you mean? GRegex handles ref counting the same way other structures in GLib do.

-- 
Marco Barisione
http://www.barisione.org/
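For readers unfamiliar with the pattern being discussed, the following is a minimal sketch of reference counting on a plain C structure, with no GObject involved. It is not the actual GRegex implementation: the type CountedPattern and its functions are hypothetical, the counter is not atomic (so the sketch is not thread-safe), and only the C standard library is used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted structure; not the real GRegex layout. */
typedef struct {
    int   ref_count;
    char *pattern;
} CountedPattern;

/* Creates a new object holding a copy of `pattern`, with one reference. */
CountedPattern *counted_pattern_new(const char *pattern)
{
    CountedPattern *cp;
    size_t len;

    assert(pattern != NULL);

    cp = malloc(sizeof *cp);
    if (cp == NULL)
        return NULL;

    len = strlen(pattern) + 1;
    cp->pattern = malloc(len);
    if (cp->pattern == NULL) {
        free(cp);
        return NULL;
    }
    memcpy(cp->pattern, pattern, len);
    cp->ref_count = 1;
    return cp;
}

/* Takes an additional reference and returns the same pointer. */
CountedPattern *counted_pattern_ref(CountedPattern *cp)
{
    assert(cp != NULL && cp->ref_count > 0);
    cp->ref_count++;            /* plain increment: not thread-safe */
    return cp;
}

/* Drops one reference; frees the object when the last one is released.
 * Returns 1 if the object was freed, 0 if references remain. */
int counted_pattern_unref(CountedPattern *cp)
{
    assert(cp != NULL && cp->ref_count > 0);
    if (--cp->ref_count > 0)
        return 0;
    free(cp->pattern);
    free(cp);
    return 1;
}
```

Each owner calls counted_pattern_ref() when it stores the pointer and counted_pattern_unref() when it is done; the object itself carries the counter, which is the style the reply above attributes to other GLib structures.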
Re: GRegex
On 10/24/06, Behdad Esfahbod wrote:
> On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
>
> This is broken. It should err at configure time, not run time. The user
> shouldn't need to check the output of g_regex_new for failures, just
> like any other thing we do with glib.

I have just uploaded a new patch that corrects this and some other problems. I kept the run-time check; it's useful when cross-compiling or if the installed PCRE library is updated.

-- 
Marco Barisione
http://www.barisione.org/
Re: GRegex
On 10/24/06, Marco Barisione wrote:
> As discussed some time ago [1] I propose to add a PCRE wrapper to GLib.
> Bug #50075 [2] contains a patch that adds it as a separate libgregex.
> The documentation of the new API is at [3] (yes, there are some
> unresolved problems with gtk-doc).
>
> Owen Taylor would prefer to have GRegex directly in the main GLib
> library:
> [...]

To give you an idea of the size of libgregex and libpcre, these are the sizes of the stripped .so files on my computer:

    libgregex with internal PCRE       138 KB
    libgregex with system PCRE          24 KB
    libpcre with Unicode support       125 KB
    libpcre without Unicode support     96 KB

-- 
Marco Barisione
http://www.barisione.org/
Re: GRegex
Hi Marco,

Please take my review with a grain of salt. I've been wanting a convenience API on top of PCRE for a while now, and it would be great if we could get something like GRegex into GLib proper.

1) Please don't name variables 'string', as there may be a conflict with C++'s std::string.

2) I noticed that there are g_regex_ref/unref() methods. Why did you choose to do this, rather than subclass GObject? You would also then have easy GObject-style accessors for the regex's pattern and match_options.

3) Should there be a GRegexMatch object too? For instance, at least Python and Java have a notion of a read-only Pattern and a Match Set. Your design combines the two into a single GRegex object. Having the pattern be read-only gets around your thread-safety gotcha comment in the docs.

4) Python's search() and match() methods have a start position and an end position argument, while your match_full() has only a start position argument. Is there a reason for this? Could it be implemented?

5) I didn't fully investigate, but Java and Python have a concept of search vs. match with slightly different semantics. Is this semantic distinction easily expressible in your API?

   http://docs.python.org/lib/re-objects.html

6) GRegex requires that PCRE be built with UTF-8 support, which some existing installations aren't. For reference, Gnumeric and Goffice get around this by including a copy of PCRE in their distribution and statically linking it in. How do you ensure that GRegex finds a version of PCRE compiled with UTF-8 support?

Thanks,
Dom

On 10/24/06, Marco Barisione wrote:
> As discussed some time ago [1] I propose to add a PCRE wrapper to GLib.
> Bug #50075 [2] contains a patch that adds it as a separate libgregex.
> The documentation of the new API is at [3] (yes, there are some
> unresolved problems with gtk-doc).
>
> Owen Taylor would prefer to have GRegex directly in the main GLib
> library:
>
> (17:38:55) owen: is the latest plan for gregex really a separate library?
> (17:39:45) mclasen: owen: you would prefer it folded in ?
> (17:40:16) owen: mclasen: I think it makes tons more sense folded in. A
>            regular expression facility is most useful if you can just
>            use it when you need it
> (17:40:36) owen: mclasen: And on the desktop, having it folded in is
>            purely a performance win
> (17:41:36) owen: if there is an embedded problem (how big is it
>            anyways?) then a --without-regex configure option would be
>            better
> (17:43:19) mclasen: owen: you are probably right
>
> What are your ideas?
>
> I would like to add to the documentation a simple and short tutorial on
> regular expressions and the GRegex API. Does someone know something good
> (and with a compatible license) to copy?
>
> [1] http://mail.gnome.org/archives/gtk-devel-list/2006-July/msg00099.html
> [2] http://bugzilla.gnome.org/show_bug.cgi?id=50075
> [3] http://www.barisione.org/gregex/
>
> --
> Marco Barisione
> http://www.barisione.org/

-- 
Counting bodies like sheep to the rhythm of the war drums.
Re: GRegex
On 10/24/2006 10:17 AM, Dominic Lachowicz wrote:
> Hi Marco,
>
> Please take my review with a grain of salt. I've been wanting a
> convenience API on top of PCRE for a while now, and it would be great
> if we could get something like GRegex into Glib proper.
[...]
> 2) I noticed that there are g_regex_ref/unref() methods. Why did you
> choose to do this, rather than subclass GObject? You would also then
> have easy GObject-style accessors for the regex's pattern and
> match_options.

In that case, GRegex couldn't be included in libglib proper. It would have to be in libgobject, or in a separate (libgregex?) library that depends on libgobject.

	-brian
Re: GRegex
On Tue, 24/10/2006 at 13.17 -0400, Dominic Lachowicz wrote:
> 1) Please don't name variables 'string', as there may be a conflict
> with C++'s std::string

I think they were called string in the original version of GRegex written by Scott Wimer in 1999. PCRE calls the string "subject". However, it's not a problem with C++; this program is valid:

    #include <string>
    #include <iostream>
    using namespace std;
    int main ()
    {
      string string = "hello";
      cout << string << endl;
    }

> 2) I noticed that there are g_regex_ref/unref() methods. Why did you
> choose to do this, rather than subclass GObject? You would also then
> have easy GObject-style accessors for the regex's pattern and
> match_options.

The original plan was to include GRegex directly in GLib, so it cannot depend on GObject. This could be changed if we decide to put GRegex in a separate library. However, is it really necessary to have a real object?

I added _ref and _unref because the only two programs that currently use my modified version of EggRegex are GtkSourceView and MooEdit. Both programs need reference counting for regular expressions. In GLib there are other structures that are reference counted without being objects, such as GHashTable, GAsyncQueue, GIOChannel and others.

> 3) Should there be a GRegexMatch object too? For instance, at least
> Python and Java have a notion of a read-only Pattern and a Match Set.
> Your design combines the two into a single GRegex object. Having the
> pattern be read-only gets around your thread-safety gotcha comment in
> the docs.

I know this, but using them is easier in a language with a garbage collector. The regex class in Qt uses the same approach as GRegex.

> 4) Python's search() and match() methods have a start position and an
> end position argument, while your match_full() has only a start
> position argument. Is there a reason for this? Could it be implemented?

It has a length argument.

> 5) I didn't fully investigate, but Java and Python have a concept of
> search vs. match with slightly different semantics. Is this semantic
> distinction easily expressible in your API?
>
> http://docs.python.org/lib/re-objects.html

In Python, match matches only at the start of the string, search at any position. You can get the match behavior by adding a ^ at the beginning of the pattern, or by passing the compile option G_REGEX_ANCHORED or the match option G_REGEX_MATCH_ANCHORED. I prefer to have only one function, as I have always found this distinction in Python a bit confusing.

> 6) GRegex requires that PCRE be built with UTF-8 support, which some
> existing installations aren't. For reference, Gnumeric and Goffice get
> around this by including a copy of PCRE in their distribution and
> statically link it in. How do you ensure that GRegex finds a version of
> PCRE compiled with UTF-8 support?

The default for GRegex is to use its internal copy of PCRE. This is automatically patched to use GLib for Unicode and memory management. If you prefer, you can pass --enable-system-pcre to use the system-supplied library but, if it's compiled without UTF-8 support, g_regex_new fails.

-- 
Marco Barisione
http://www.barisione.org/
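The search-versus-match distinction from point 5 can be sketched as follows. This uses POSIX regcomp()/regexec() as a stand-in, not GRegex, and the two function names are hypothetical. "Search" accepts a hit anywhere in the subject; "match" accepts only a hit that begins at offset 0, which is the behavior an anchored option is meant to give. Because regexec() reports the leftmost match, checking that its start offset is 0 is enough.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Search semantics: 1 if `pattern` matches anywhere in `subject`. */
int regex_search(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;               /* treat a compile error as "no match" */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Match semantics: 1 only if a match starts at the first character. */
int regex_match_at_start(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    /* regexec() reports the leftmost match in m; rm_so is its start. */
    rc = regexec(&re, subject, 1, &m, 0);
    regfree(&re);
    return rc == 0 && m.rm_so == 0;
}
```

For the pattern [0-9]+ and the subject "abc123", regex_search() succeeds while regex_match_at_start() does not; for "123abc" both succeed. Checking the offset avoids rewriting the pattern, which matters for alternations: simply prefixing ^ to a|b would anchor only the first alternative.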
Re: GRegex
On Tue, 2006-10-24 at 22:05 +0200, Marco Barisione wrote:
> On Tue, 24/10/2006 at 13.17 -0400, Dominic Lachowicz wrote:
> > 1) Please don't name variables 'string', as there may be a conflict
> > with C++'s std::string
>
> I think they were called string in the original version of GRegex
> written by Scott Wimer in 1999. PCRE calls the string "subject".
> However, it's not a problem with C++; this program is valid:
>
>     #include <string>
>     #include <iostream>
>     using namespace std;
>     int main ()
>     {
>       string string = "hello";
>       cout << string << endl;
>     }

It's not necessary to challenge every compiler and every build environment with that. A rename is easy.

> > 2) I noticed that there are g_regex_ref/unref() methods. Why did you
> > choose to do this, rather than subclass GObject? You would also then
> > have easy GObject-style accessors for the regex's pattern and
> > match_options.
>
> The original plan was to include GRegex directly in GLib, so it cannot
> depend on GObject. This could be changed if we decide to put GRegex in
> a separate library. However, is it really necessary to have a real
> object?
> I added _ref and _unref because the only two programs that currently
> use my modified version of EggRegex are GtkSourceView and MooEdit.
> Both programs need reference counting for regular expressions.
[snip]

Do they need to reference count plain strings too?

-- 
Murray Cumming
www.murrayc.com
www.openismus.com
Re: GRegex
On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
> If you prefer you can pass --enable-system-pcre to use the
> system-supplied library but, if it's compiled without utf-8 support,
> g_regex_new fails.

This is broken. It should err at configure time, not run time. The user shouldn't need to check the output of g_regex_new for failures, just like any other thing we do with glib.

-- 
behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
        -- Dan Bern, "New American Language"
Re: GRegex
On 10/24/06, Behdad Esfahbod wrote:
> On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
> > If you prefer you can pass --enable-system-pcre to use the
> > system-supplied library but, if it's compiled without utf-8 support,
> > g_regex_new fails.
>
> This is broken. It should err at configure time, not run time. The user
> shouldn't need to check the output of g_regex_new for failures, just
> like any other thing we do with glib.

It should be possible to write an auto* check that basically checks whether something like:

    #include <pcre.h>

    int main (int argc, char **argv)
    {
      int has_utf8_support;

      /* pcre_config() returns 0 on success and stores 1 or 0 in the
         output variable. */
      if (pcre_config (PCRE_CONFIG_UTF8, &has_utf8_support) == 0)
        return has_utf8_support;

      return 0;
    }

returns '1' or '0'. If it returns '1', we should probably favor the system installation of PCRE over the glib-supplied one.

Best,
Dom

-- 
Counting bodies like sheep to the rhythm of the war drums.
Re: GRegex
On Tue, 2006-10-24 at 16:48 -0400, Dominic Lachowicz wrote:
> On 10/24/06, Behdad Esfahbod wrote:
> > On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
> > > If you prefer you can pass --enable-system-pcre to use the
> > > system-supplied library but, if it's compiled without utf-8
> > > support, g_regex_new fails.
> >
> > This is broken. It should err at configure time, not run time. The
> > user shouldn't need to check the output of g_regex_new for failures,
> > just like any other thing we do with glib.
>
> It should be possible to write an auto* check that basically checks
> whether something like:
>
>     #include <pcre.h>
>
>     int main (int argc, char **argv)
>     {
>       int has_utf8_support;
>
>       if (pcre_config (PCRE_CONFIG_UTF8, &has_utf8_support) == 0)
>         return has_utf8_support;
>
>       return 0;
>     }
>
> returns '1' or '0'. If so, we should probably favor the system
> installation of PCRE over the glib-supplied one.

At the expense of relying on whatever older version of the Unicode Character Database the system PCRE is using, and of course loading two sets of Unicode data tables into memory. PCRE itself is rather small compared to the data tables, so last time the conclusion was that using glib's tables probably makes more sense, as they are already in memory anyway.

> Best,
> Dom

-- 
behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
        -- Dan Bern, "New American Language"