Re: GRegex regular expression failing to match

2012-05-14 Thread David Nečas
On Mon, May 14, 2012 at 01:36:02PM -0800, Christopher Howard wrote:
 Is there anything wrong with the regexp,

Sure.  Two things.  It should be

^/[0-9]+$

not

^/d+$

First, it lacks the backslash needed to make \d a digit atom; as written,
d+ matches one or more literal 'd' characters.  Second, \d matches a digit
(possibly whatever Unicode may say is a digit), not just 0-9, so you
should really use [0-9] to match ASCII digits.

Yeti
___
gtk-app-devel-list mailing list
gtk-app-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-app-devel-list


Re: GRegex(win32) : 500 tests passed, 3 failed

2007-03-16 Thread Marco Barisione
On Thu, 15/03/2007 at 18:41 +0100, Hans Breuer wrote:
 with only small modifications I was able to compile GRegex with msvc,
 thanks for providing an almost working makefile.msc ;-)
 [...]
 But now for the question: are these 3 failed specific to my build so I
 should investigate them further?

It's my fault: I wrote makefile.msc (without testing it) before PCRE 7
was released.
PCRE 6.x recognizes just one of \n, \r or \r\n as the newline sequence.
PCRE 7.x added the ability to match any newline character, so I changed
the default value from 10 (\n) to -1 (PCRE_NEWLINE_ANY) in Makefile.am
but not in makefile.msc.

Sorry :)

-- 
Marco Barisione
http://www.barisione.org/

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: GRegex(win32) : 500 tests passed, 3 failed

2007-03-15 Thread Jake Goulding
Having newlines seems suspicious. What kind of newlines are they?

Hans Breuer wrote:
 with only small modifications I was able to compile GRegex with msvc,
 thanks for providing an almost working makefile.msc ;-)

 The first attempt to run

   regex-test.exe --noisy

 did crash due to gnulib not liking

 g_strdup_vprintf ("matching \"%s\" against \"%s\" \t", "%", "\p{Common}")

 The attached patch works around this and also removes the
 #include <glib.h> from gregex.h. I think it is better to only include
 required sub-headers like almost all glib/*.h do.

 But now for the question: are these 3 failed specific to my build so I
 should investigate them further?

 Thanks,
   Hans

 matching a

 b

 c against ^b$ (start: 0, len: -1)  failed  (unexpected mismatch)
 matching a
 b
 c against ^b$ (start: 0, len: -1)  failed  (unexpected mismatch)

 matching a against a#
 b (start: 0, len: -1)failed  (unexpected match)



 Index: glib/gregex.h
 ===
 --- glib/gregex.h (revision 5410)
 +++ glib/gregex.h (working copy)
 @@ -22,7 +22,8 @@
  #ifndef __G_REGEX_H__
  #define __G_REGEX_H__
  
 -#include <glib.h>
 +#include <glib/gerror.h>
 +#include <glib/gstring.h>
  
  G_BEGIN_DECLS
  
 Index: tests/regex-test.c
 ===
 --- tests/regex-test.c (revision 5409)
 +++ tests/regex-test.c (working copy)
 @@ -230,7 +230,10 @@
  gboolean expected)
  {
    gboolean match;
 -  
 +  
 +  if (string[0] == '%' && string[1] == '\0')
 +    string = "%%";
 +
    verbose ("matching \"%s\" against \"%s\" \t", string, pattern);
  
    match = g_regex_match_simple (pattern, string, compile_opts, match_opts);



Re: GRegex

2006-10-29 Thread Marco Barisione
On Sat, 28/10/2006 at 19:35 +0200, Murray Cumming wrote:
 If it's possible, it would be nice to avoid making it a GObject just to
 add easy reference counting. That tends to restrict how it can be
 wrapped by language bindings for whom automatic memory management is not
 the default.

It can't be a GObject because GRegex will be in libglib.

 I don't know exactly how it might be done in C (it's easy in C++), but I
 would hope that there's some way to reference-count anything without
 forcing the object itself to do the reference counting.

What do you mean? GRegex handles ref counting as other structures in
GLib.



Re: GRegex

2006-10-25 Thread Marco Barisione
On 10/24/06, Behdad Esfahbod wrote:
 On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
 This is broken.  It should err at configure time, not run time.  The
 user shouldn't need to check the output of g_regex_new for failures,
 just like any other thing we do with glib.

I have just uploaded a new patch that corrects this and some other problems.

I kept the run-time check; it's useful when cross-compiling or if the
installed PCRE library is updated.



Re: GRegex

2006-10-25 Thread Marco Barisione
On 10/24/06, Marco Barisione wrote:
 As discussed some time ago [1] I propose to add a PCRE wrapper to GLib.
 Bug #50075 [2] contains a patch that adds it as a separate libgregex.
 The documentation of the new API is at [3] (yes, there are some
 unresolved problems with gtk-doc).

 Owen Taylor would prefer to have GRegex directly in the main GLib
 library:
 [...]

To give you an idea of the size of libgregex and libpcre, these are
the sizes of the stripped .so files on my computer:

libgregex with internal PCRE      138 KB
libgregex with system PCRE         24 KB
libpcre with Unicode support      125 KB
libpcre without Unicode support    96 KB



Re: GRegex

2006-10-24 Thread Dominic Lachowicz
Hi Marco,

Please take my review with a grain of salt. I've been wanting a
convenience API on top of PCRE for a while now, and it would be great
if we could get something like GRegex into Glib proper.

1) Please don't name variables 'string', as there may be a conflict
with C++'s std::string

2) I noticed that there are g_regex_ref/unref() methods. Why did you
choose to do this, rather than subclass GObject? You would also then
have easy GObject-style accessors for the regex's pattern and
match_options.

3) Should there be a GRegexMatch object too? For instance, at least
Python and Java have a notion of a read-only Pattern and a Match
Set. Your design combines the two into a single GRegex object. Having
the pattern be read-only gets around your thread-safety gotcha
comment in the docs.

4) Python's search() and match() methods have a start position and
an end position argument, while your match_full() has only a start
position argument. Is there a reason for this? Could it be
implemented?

5) I didn't fully investigate, but Java and Python have a concept of
search vs. match with slightly different semantics. Is this semantic
distinction easily expressible in your API?

http://docs.python.org/lib/re-objects.html

6) GRegex requires that PCRE be built with UTF-8 support, which some
existing installations aren't. For reference, Gnumeric and Goffice get
around this by including a copy of PCRE in their distribution and
statically link it in. How do you ensure that GRegex finds a version
of PCRE compiled with UTF-8 support?

Thanks,
Dom

On 10/24/06, Marco Barisione wrote:
 As discussed some time ago [1] I propose to add a PCRE wrapper to GLib.
 Bug #50075 [2] contains a patch that adds it as a separate libgregex.
 The documentation of the new API is at [3] (yes, there are some
 unresolved problems with gtk-doc).

 Owen Taylor would prefer to have GRegex directly in the main GLib
 library:
 (17:38:55) owen: is the latest plan for gregex really a separate
 library?
 (17:39:45) mclasen: owen: you would prefer it folded in ?
 (17:40:16) owen: mclasen: I think it makes tons more sense folded in. A
 regular expression facility is most useful if you can just use it when
 you need it
 (17:40:36) owen: mclasen: And on the desktop, having it folded in is
 purely a performance win
 (17:41:36) owen: if there is an embedded problem (how big is it
 anyways?) then a --without-regex configure option would be better
 (17:43:19) mclasen: owen: you are probably right

 What are your ideas?


 I would like to add to the documentation a simple and short tutorial on
 regular expressions and GRegex API. Does someone know something good
 (and with a compatible license) to copy?


 [1]
 http://mail.gnome.org/archives/gtk-devel-list/2006-July/msg00099.html

 [2] http://bugzilla.gnome.org/show_bug.cgi?id=50075

 [3] http://www.barisione.org/gregex/





-- 
Counting bodies like sheep to the rhythm of the war drums.


Re: GRegex

2006-10-24 Thread Brian J. Tarricone

On 10/24/2006 10:17 AM, Dominic Lachowicz wrote:
 Hi Marco,
 
 Please take my review with a grain of salt. I've been wanting a
 convenience API on top of PCRE for a while now, and it would be great
 if we could get something like GRegex into Glib proper.
[...]
 2) I noticed that there are g_regex_ref/unref() methods. Why did you
 choose to do this, rather than subclass GObject? You would also then
 have easy GObject-style accessors for the regex's pattern and
 match_options.

In that case, GRegex couldn't be included in libglib proper.  It would
have to be in libgobject, or in a separate (libgregex?) library that
depends on libgobject.

-brian



Re: GRegex

2006-10-24 Thread Marco Barisione
On Tue, 24/10/2006 at 13:17 -0400, Dominic Lachowicz wrote:
 1) Please don't name variables 'string', as there may be a conflict
 with C++'s std::string

I think they were called 'string' in the original version of GRegex
written by Scott Wimer in 1999. PCRE calls the string 'subject'.

However, it's not a problem with C++; this program is valid:
#include <string>
#include <iostream>

using namespace std;

int main ()
{
  string string = "hello";
  cout << string << endl;
}

 2) I noticed that there are g_regex_ref/unref() methods. Why did you
 choose to do this, rather than subclass GObject? You would also then
 have easy GObject-style accessors for the regex's pattern and
 match_options.

The original plan was to include GRegex directly in GLib, so it cannot
depend on GObject. This could be changed if we decide to include GRegex
in a separate library.

However, is it really necessary to have a real object?

I added _ref and _unref because the only two programs that are currently
using my modified version of EggRegex are GtkSourceView and MooEdit. Both
programs need reference counting for regular expressions.

In GLib there are other structures that are reference counted without
being objects, such as GHashTable, GAsyncQueue, GIOChannel and others.

 3) Should there be a GRegexMatch object too? For instance, at least
 Python and Java have a notion of a read-only Pattern and a Match
 Set. Your design combines the two into a single GRegex object. Having
 the pattern be read-only gets around your thread-safety gotcha
 comment in the docs.

I know, but having separate pattern and match objects is easier in a
language with a garbage collector. The regex class in Qt uses the same
approach as GRegex.

 4) Python's search() and match() methods have a start position and
 an end position argument, while your match_full() has only a start
 position argument. Is there a reason for this? Could it be
 implemented?

It has a length argument.

 5) I didn't fully investigate, but Java and Python have a concept of
 search vs. match with slightly different semantics. Is this semantic
 distinction easily expressible in your API?
 
 http://docs.python.org/lib/re-objects.html

In Python, match() matches only at the start of the string, while
search() matches at any position. You can get the match() behavior by
adding a ^ at the beginning of the pattern, or by passing the compile
option G_REGEX_ANCHORED or the match option G_REGEX_MATCH_ANCHORED.

I prefer to have only one function, as I have always found this
distinction in Python a bit confusing.

 6) GRegex requires that PCRE be built with UTF-8 support, which some
 existing installations aren't. For reference, Gnumeric and Goffice get
 around this by including a copy of PCRE in their distribution and
 statically link it in. How do you ensure that GRegex finds a version
 of PCRE compiled with UTF-8 support?

The default for GRegex is to use its internal copy of PCRE. This copy is
automatically patched to use GLib for Unicode and memory management.

If you prefer, you can pass --enable-system-pcre to use the
system-supplied library; but if that library was compiled without UTF-8
support, g_regex_new fails.




Re: GRegex

2006-10-24 Thread Murray Cumming
On Tue, 2006-10-24 at 22:05 +0200, Marco Barisione wrote:
 On Tue, 24/10/2006 at 13:17 -0400, Dominic Lachowicz wrote:
  1) Please don't name variables 'string', as there may be a conflict
  with C++'s std::string
 
 I think they were called 'string' in the original version of GRegex
 written by Scott Wimer in 1999. PCRE calls the string 'subject'.
 
 However, it's not a problem with C++; this program is valid:
 #include <string>
 #include <iostream>
 
 using namespace std;
 
 int main ()
 {
   string string = "hello";
   cout << string << endl;
 }

It's not necessary to challenge every compiler and every build
environment with that. A rename is easy.

  2) I noticed that there are g_regex_ref/unref() methods. Why did you
  choose to do this, rather than subclass GObject? You would also then
  have easy GObject-style accessors for the regex's pattern and
  match_options.
 
 The original plan was to include GRegex directly in GLib, so it cannot
 depend on GObject. This could be changed if we decide to include GRegex
 in a separate library.
 
 However, is it really necessary to have a real object?
 
 I added _ref and _unref because the only two programs that are currently
 using my modified version of EggRegex are GtkSourceView and MooEdit. Both
 programs need reference counting for regular expressions.
[snip]

Do they need to reference count plain strings too?


-- 
Murray Cumming
www.murrayc.com
www.openismus.com



Re: GRegex

2006-10-24 Thread Behdad Esfahbod
On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
 
 If you prefer you can pass --enable-system-pcre to use the
 system-supplied library but, if it's compiled without utf-8 support,
 g_regex_new fails. 

This is broken.  It should err at configure time, not run time.  The
user shouldn't need to check the output of g_regex_new for failures,
just like any other thing we do with glib.

-- 
behdad
http://behdad.org/

Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill
-- Dan Bern, New American Language



Re: GRegex

2006-10-24 Thread Dominic Lachowicz
On 10/24/06, Behdad Esfahbod wrote:
 On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
 
  If you prefer you can pass --enable-system-pcre to use the
  system-supplied library but, if it's compiled without utf-8 support,
  g_regex_new fails.

 This is broken.  It should err at configure time, not run time.  The
 user shouldn't need to check the output of g_regex_new for failures,
 just like any other thing we do with glib.

It should be possible to write an auto* check that basically checks
whether something like:

#include <pcre.h>
int main(int argc, char ** argv) {
 int has_utf8_support;
 /* pcre_config() returns 0 on success, a negative error code otherwise */
 if(pcre_config(PCRE_CONFIG_UTF8, &has_utf8_support) == 0)
   return has_utf8_support;
 return 0;
}

returns '1' or '0'. If so, we should probably favor the system
installation of PCRE over the glib-supplied one.

Best,
Dom


Re: GRegex

2006-10-24 Thread Behdad Esfahbod
On Tue, 2006-10-24 at 16:48 -0400, Dominic Lachowicz wrote:
 On 10/24/06, Behdad Esfahbod wrote:
  On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
  
   If you prefer you can pass --enable-system-pcre to use the
   system-supplied library but, if it's compiled without utf-8 support,
   g_regex_new fails.
 
  This is broken.  It should err at configure time, not run time.  The
  user shouldn't need to check the output of g_regex_new for failures,
  just like any other thing we do with glib.
 
 It should be possible to write an auto* check that basically checks
 whether something like:
 
 #include <pcre.h>
 int main(int argc, char ** argv) {
  int has_utf8_support;
  /* pcre_config() returns 0 on success, a negative error code otherwise */
  if(pcre_config(PCRE_CONFIG_UTF8, &has_utf8_support) == 0)
    return has_utf8_support;
  return 0;
 }
 
 returns '1' or '0'. If so, we should probably favor the system
 installation of PCRE over the glib-supplied one.

At the expense of relying on whatever older version of the Unicode
Character Database that library is using, and of course loading two sets
of Unicode data tables into memory.  PCRE itself is rather small compared
to the data tables, so last time the conclusion was that using glib's
probably makes more sense, as they are already in memory anyway.

 Best,
 Dom