Re: let g_warn_if_fail replace g_assert

2007-10-17 Thread Marco Barisione
On Wed, 17/10/2007 at 11.56 +0200, Tim Janik wrote:
 - add g_warn_if_fail (condition); which produces a critical
 warning about failing assertions but contrary to g_assert
 returns.

If it's called g_warn_if_fail() I would expect a g_warning() not a
g_critical().
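
To make the behaviour under discussion concrete, here is a minimal
sketch of a check that reports a failed condition and then returns
control to the caller instead of aborting. SKETCH_WARN_IF_FAIL,
sketch_warning_count and sketch_clamp_percent are hypothetical names
invented for this illustration; this is not GLib's macro, and whether
the message should be logged as a warning or as a critical is exactly
the open naming question.

```c
#include <stdio.h>

/* Hypothetical stand-in, not GLib's macro: counts and reports a failed
 * condition on stderr, then lets execution continue. */
static int sketch_warning_count = 0;

#define SKETCH_WARN_IF_FAIL(cond)                                  \
  do {                                                             \
    if (!(cond)) {                                                 \
      sketch_warning_count++;                                      \
      fprintf (stderr, "%s:%d: warning: check '%s' failed\n",      \
               __FILE__, __LINE__, #cond);                         \
    }                                                              \
  } while (0)

/* Clamps value to 0..100. A bad input is reported, but the function
 * still returns a usable result because the check does not abort. */
static int
sketch_clamp_percent (int value)
{
  SKETCH_WARN_IF_FAIL (value >= 0 && value <= 100);
  if (value < 0)
    return 0;
  if (value > 100)
    return 100;
  return value;
}
```

Calling sketch_clamp_percent (150) prints one message and still returns
100, which is the "returns" part of the proposal.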

-- 
Marco Barisione
http://www.barisione.org/

___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: turning g_assert* into warnings

2007-10-12 Thread Marco Barisione
On Fri, 12/10/2007 at 15.16 +0200, Tim Janik wrote:
 please reread my reasoning about G_DISABLE_ASSERT, there already is no
 behavior of g_assert() you could rely on. (and some distributions do
 build their binaries with G_DISABLE_ASSERT and/or G_DISABLE_CHECKS
 defined).

Which distributions? I mean excluding Gentoo and other distros that let
the user choose how to build everything.

-- 
Marco Barisione
http://www.barisione.org/



Re: Performance implications of GRegex structure

2007-03-16 Thread Marco Barisione
On Thu, 15/03/2007 at 10.18 -0400, Owen Taylor wrote:
 But looking over the header file, there is something that puzzles me
 about the way that it's set up: there is no distinction between a
 pattern/regular expression object and a match/matcher object.

The internal code of GRegex was deeply modified, but the API is still
quite similar to the original one written by Scott Wimer and later
modified by Matthias Clasen, so I kept a single GRegex object, though
with many doubts.

In the end I decided to keep a single object because I prefer this
approach in languages without a garbage collector, and because
QRegExp (the equivalent class in Qt) is a single object.

This matter was brought up on the mailing list and in bugzilla, but only
Havoc Pennington and Yevgen Muntyan gave an opinion, and both said that
they prefer a single object.

BTW, if you want, I can split GRegex into two separate objects.

-- 
Marco Barisione
http://www.barisione.org/



Re: GRegex(win32) : 500 tests passed, 3 failed

2007-03-16 Thread Marco Barisione
On Thu, 15/03/2007 at 18.41 +0100, Hans Breuer wrote:
 with only small modifications I was able to compile GRegex with msvc,
 thanks for providing an almost working makefile.msc ;-)
 [...]
 But now for the question: are these 3 failed specific to my build so I
 should investigate them further?

It's my fault: I wrote makefile.msc (without testing it) before the
release of PCRE 7.
PCRE 6.x recognizes exactly one of \n, \r or \r\n as the newline. PCRE
7.x added the ability to match any newline sequence, so I changed the
default value from 10 (\n) to -1 (PCRE_NEWLINE_ANY) in Makefile.am but
not in makefile.msc.
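
To show why the stale default can make a few tests fail, here is a
rough sketch of how the two settings treat the same input. The names
SKETCH_NEWLINE_LF, SKETCH_NEWLINE_ANY and sketch_newline_length are
made up for this example and are not part of PCRE; the "any" case below
only covers \n, \r and \r\n, while the real library recognizes more
sequences than that.

```c
#include <stddef.h>

/* Hypothetical constants mirroring the two values named above. */
#define SKETCH_NEWLINE_LF  10
#define SKETCH_NEWLINE_ANY (-1)

/* Returns the byte length of the newline sequence at the start of s,
 * or 0 if s does not start with a newline under the given setting.
 * Any setting other than SKETCH_NEWLINE_LF is treated as "any". */
static size_t
sketch_newline_length (const char *s, int setting)
{
  if (setting == SKETCH_NEWLINE_LF)
    return s[0] == '\n' ? 1 : 0;

  /* "Any" newline, ASCII subset only: CRLF, lone CR or lone LF. */
  if (s[0] == '\r')
    return s[1] == '\n' ? 2 : 1;
  if (s[0] == '\n')
    return 1;
  return 0;
}
```

With the old value, a subject that uses \r\n line endings is not seen
as starting a new line at the \r, so a test that expects the new
default would fail only in the build that still uses 10.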

Sorry :)

-- 
Marco Barisione
http://www.barisione.org/



Re: GRegex

2006-10-29 Thread Marco Barisione
On Sat, 28/10/2006 at 19.35 +0200, Murray Cumming wrote:
 If it's possible, it would be nice to avoid making it a GObject just to
 add easy reference counting. That tends to restrict how it can be
 wrapped by language bindings for whom automatic memory management is not
 the default.

It can't be a GObject because GRegex will be in libglib.

 I don't know exactly how it might be done in C (it's easy in C++), but I
 would hope that there's some way to reference-count anything without
 forcing the object itself to do the reference counting.

What do you mean? GRegex handles reference counting in the same way as
other structures in GLib.

-- 
Marco Barisione
http://www.barisione.org/



Re: GRegex

2006-10-25 Thread Marco Barisione
On 10/24/06, Behdad Esfahbod wrote:
 On Tue, 2006-10-24 at 16:05 -0400, Marco Barisione wrote:
 This is broken.  It should err at configure time, not run time.  The
 user shouldn't need to check the output of g_regex_new for failures,
 just like any other thing we do with glib.

I have just uploaded a new patch that corrects this and some other problems.

I kept the run-time check; it's useful when cross-compiling or if the
installed PCRE library is updated later.

-- 
Marco Barisione
http://www.barisione.org/


Re: GRegex

2006-10-25 Thread Marco Barisione
On 10/24/06, Marco Barisione wrote:
 As discussed some time ago [1] I propose to add a PCRE wrapper to GLib.
 Bug #50075 [2] contains a patch that adds it as a separate libgregex.
 The documentation of the new API is at [3] (yes, there are some
 unresolved problems with gtk-doc).

 Owen Taylor would prefer to have GRegex directly in the main GLib
 library:
 [...]

To give you an idea of the size of libgregex and libpcre, these are
the sizes of the stripped .so files on my computer:

libgregex with internal PCRE      138 KB
libgregex with system PCRE         24 KB
libpcre with Unicode support      125 KB
libpcre without Unicode support    96 KB

-- 
Marco Barisione
http://www.barisione.org/


GRegex

2006-10-24 Thread Marco Barisione
As discussed some time ago [1] I propose to add a PCRE wrapper to GLib.
Bug #50075 [2] contains a patch that adds it as a separate libgregex.
The documentation of the new API is at [3] (yes, there are some
unresolved problems with gtk-doc).

Owen Taylor would prefer to have GRegex directly in the main GLib
library:
(17:38:55) owen: is the latest plan for gregex really a separate
library?
(17:39:45) mclasen: owen: you would prefer it folded in ?
(17:40:16) owen: mclasen: I think it makes tons more sense folded in. A
regular expression facility is most useful if you can just use it when
you need it
(17:40:36) owen: mclasen: And on the desktop, having it folded in is
purely a performance win
(17:41:36) owen: if there is an embedded problem (how big is it
anyways?) then a --without-regex configure option would be better
(17:43:19) mclasen: owen: you are probably right

What are your ideas?


I would like to add a short, simple tutorial on regular expressions and
the GRegex API to the documentation. Does anyone know of a good one
(with a compatible license) that I could reuse?


[1]
http://mail.gnome.org/archives/gtk-devel-list/2006-July/msg00099.html

[2] http://bugzilla.gnome.org/show_bug.cgi?id=50075

[3] http://www.barisione.org/gregex/


-- 
Marco Barisione
http://www.barisione.org/



Re: GRegex

2006-10-24 Thread Marco Barisione
On Tue, 24/10/2006 at 13.17 -0400, Dominic Lachowicz wrote:
 1) Please don't name variables 'string', as there may be a conflict
 with C++'s std::string

I think they were called string in the original version of GRegex
written by Scott Wimer in 1999. PCRE calls the string "subject".

However, it's not a problem in C++; this program is valid:
#include <string>
#include <iostream>

using namespace std;

int main ()
{
  // The variable name hides the type name from this point on.
  string string = "hello";
  cout << string << endl;
}

 2) I noticed that there are g_regex_ref/unref() methods. Why did you
 choose to do this, rather than subclass GObject? You would also then
 have easy GObject-style accessors for the regex's pattern and
 match_options.

The original plan was to include GRegex directly in GLib, so it cannot
depend on GObject. This could be changed if we decide to ship GRegex
in a separate library.

However, is it really necessary to have a real object?

I added _ref and _unref because the only two programs that currently
use my modified version of EggRegex are GtkSourceView and MooEdit. Both
programs need reference counting for regular expressions.

In GLib there are other structures that are reference counted without
being objects, such as GHashTable, GAsyncQueue, GIOChannel and others.
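
As an illustration of reference counting on a plain structure, without
any object system, here is a minimal sketch. SketchRegex and the
sketch_regex_* functions are hypothetical names invented for this
example; this is not the GRegex implementation, and the plain int
counter is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure: a counter stored next to the data it guards. */
typedef struct {
  int   ref_count;
  char *pattern;
} SketchRegex;

/* Creates a new structure with one reference, or NULL if out of memory. */
static SketchRegex *
sketch_regex_new (const char *pattern)
{
  SketchRegex *re = malloc (sizeof *re);

  if (re == NULL)
    return NULL;
  re->pattern = malloc (strlen (pattern) + 1);
  if (re->pattern == NULL)
    {
      free (re);
      return NULL;
    }
  strcpy (re->pattern, pattern);
  re->ref_count = 1;
  return re;
}

/* Takes an additional reference and returns the same pointer. */
static SketchRegex *
sketch_regex_ref (SketchRegex *re)
{
  re->ref_count++;
  return re;
}

/* Drops one reference. Returns 1 if this was the last reference and
 * the structure was freed, 0 otherwise. */
static int
sketch_regex_unref (SketchRegex *re)
{
  if (--re->ref_count > 0)
    return 0;
  free (re->pattern);
  free (re);
  return 1;
}
```

Each owner calls sketch_regex_ref when it stores the pointer and
sketch_regex_unref when it is done, so two users can share one
compiled pattern without agreeing on who frees it.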

 3) Should there be a GRegexMatch object too? For instance, at least
 Python and Java have a notion of a read-only Pattern and a Match
 Set. Your design combines the two into a single GRegex object. Having
 the pattern be read-only gets around your thread-safety gotcha
 comment in the docs.

I know this, but using separate objects is easier in a language with a
garbage collector. The regex class in Qt uses the same approach as GRegex.

 4) Python's search() and match() methods have a start position and
 an end position argument, while your match_full() has only a start
 position argument. Is there a reason for this? Could it be
 implemented?

It has a length argument.

 5) I didn't fully investigate, but Java and Python have a concept of
 search vs. match with slightly different semantics. Is this semantic
 distinction easily expressible in your API?
 
 http://docs.python.org/lib/re-objects.html

In Python, match matches only at the start of the string, while search
matches at any position. You can get the match behavior by adding a ^ at
the beginning of the pattern, or by passing the compile option
G_REGEX_ANCHORED or the match option G_REGEX_MATCH_ANCHORED.

I prefer to have only one function, as I have always found this
distinction in Python a bit confusing.
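
The point that one search function plus an anchor covers both cases can
be sketched with the POSIX regcomp/regexec functions from <regex.h>.
This deliberately uses POSIX regex as a stand-in rather than GRegex or
PCRE, and sketch_search is a hypothetical helper written only for this
example.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if pattern is found anywhere in subject ("search"),
 * 0 if it is not found, and -1 if the pattern does not compile. */
static int
sketch_search (const char *pattern, const char *subject)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, subject, 0, NULL, 0);
  regfree (&re);
  return rc == 0 ? 1 : 0;
}
```

Here sketch_search ("b+", "aabb") finds a match in the middle of the
subject, while sketch_search ("^b+", "aabb") does not, which is the
"match" semantics obtained from the same function by anchoring the
pattern.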

 6) GRegex requires that PCRE be built with UTF-8 support, which some
 existing installations aren't. For reference, Gnumeric and Goffice get
 around this by including a copy of PCRE in their distribution and
 statically link it in. How do you ensure that GRegex finds a version
 of PCRE compiled with UTF-8 support?

The default for GRegex is to use its internal copy of PCRE. This is
automatically patched to use GLib for Unicode and memory management.

If you prefer, you can pass --enable-system-pcre to use the
system-supplied library but, if it was compiled without UTF-8 support,
g_regex_new fails.


-- 
Marco Barisione
http://www.barisione.org/



Re: EggRegex

2006-07-25 Thread Marco Barisione
Marco Barisione wrote:
 My version of EggRegex is at http://techn.ocracy.org/eggregex/ and a 
 copy of the documentation is at http://www.barisione.org/eggregex/

And a tar.gz generated by make dist is at 
http://www.barisione.org/eggregex/eggregex-0.1.tar.gz

Over the last few days I made some changes; IMHO EggRegex is now decent
and usable. Could someone review the code (and the docs, as I'm not a
native English speaker)?

-- 
Marco Barisione
http://www.barisione.org/


Re: EggRegex

2006-07-22 Thread Marco Barisione
Marco Barisione wrote:
 Can someone take a look to pcre/ucptable.c, pcre/ucp.h and 
 pcre/pcre_ucp_searchfuncs.c?

Now the internal PCRE uses glib for Unicode properties.

There is a problem: PCRE allows script names in \p{}, so you can match
an Arabic character using \p{Arabic}. But AFAIK glib does not know about
scripts.

gucharmap handles this internally but I can't copy the code because, as
far as I know, it's under GPL and not LGPL.


However, I think that the best solution is to add this directly to glib:

typedef enum
{
   G_UNICODE_SCRIPT_ARABIC,
   G_UNICODE_SCRIPT_ARMENIAN,
   ...
   G_UNICODE_SCRIPT_UGARITIC
} GUnicodeScript;

/* returns the script of c */
GUnicodeScript g_unichar_get_script(gunichar c);

/* returns the (translated?) name of the script */
const gchar *g_unichar_get_script_name(GUnicodeScript script);
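
One possible way to back such a function is a sorted table of code
point ranges searched with a binary search. The sketch below is only
an illustration under stated assumptions: the Sketch* names are
hypothetical and not proposed API, the table holds just three entries,
and each entry is a whole Unicode block, which is only a rough
approximation because script assignment is really per character.

```c
#include <stddef.h>

typedef enum {
  SKETCH_SCRIPT_UNKNOWN,
  SKETCH_SCRIPT_ARABIC,
  SKETCH_SCRIPT_ARMENIAN,
  SKETCH_SCRIPT_UGARITIC
} SketchScript;

typedef struct {
  unsigned long first;   /* first code point of the range */
  unsigned long last;    /* last code point of the range, inclusive */
  SketchScript  script;
} SketchScriptRange;

/* Must stay sorted by 'first' with no overlaps for the search to work. */
static const SketchScriptRange sketch_ranges[] = {
  { 0x0530,  0x058F,  SKETCH_SCRIPT_ARMENIAN },
  { 0x0600,  0x06FF,  SKETCH_SCRIPT_ARABIC   },
  { 0x10380, 0x1039F, SKETCH_SCRIPT_UGARITIC }
};

/* Binary search over the range table; returns SKETCH_SCRIPT_UNKNOWN
 * for code points that no range covers. */
static SketchScript
sketch_get_script (unsigned long c)
{
  size_t lo = 0;
  size_t hi = sizeof sketch_ranges / sizeof sketch_ranges[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (c < sketch_ranges[mid].first)
        hi = mid;
      else if (c > sketch_ranges[mid].last)
        lo = mid + 1;
      else
        return sketch_ranges[mid].script;
    }
  return SKETCH_SCRIPT_UNKNOWN;
}
```

A real table would be generated from the Unicode data files rather
than written by hand.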


-- 
Marco Barisione
http://www.barisione.org/



Re: EggRegex

2006-07-22 Thread Marco Barisione
Marco Barisione wrote:
 gucharmap handles this internally but I can't copy the code because, as 
 far as I know, it's under GPL and not LGPL.

I was wrong: it's in the library and not in the app, so it's LGPLed.

What should I do with the scripts? Obviously eggregex cannot depend on
libgucharmap.

-- 
Marco Barisione
http://www.barisione.org/


Re: EggRegex

2006-07-21 Thread Marco Barisione
Owen Taylor wrote:

 Considering that a large amount of the size of GLib is Unicode tables,
 it's almost certainly better that a few apps have two copies of the PCRE
 code than all processes have two copies of the Unicode tables.

Using the internal copy, if there is a security bug in PCRE, distros 
have to update two libraries instead of just libpcre.


Can someone take a look at pcre/ucptable.c, pcre/ucp.h and
pcre/pcre_ucp_searchfuncs.c?

The files are here:
http://techn.ocracy.org/eggregex/?f=03897669abe3;file=pcre/ucp.h;style=raw
http://techn.ocracy.org/eggregex/?f=3ad939693cb3;file=pcre/ucptable.c;style=raw
http://techn.ocracy.org/eggregex/?f=b567294355b0;file=pcre/pcre_ucp_searchfuncs.c;style=raw

I need some advice on how to do this.

-- 
Marco Barisione
http://www.barisione.org/


Re: EggRegex

2006-07-20 Thread Marco Barisione
Matthias Clasen wrote:
 When I was last looking at regular expressions for GLib (which
 resulted in the current eggregex code), the first decision was to
 go for Perl regular expression, rather than posix. That naturally
 leads to PCRE. The main gripe with PCRE was (and is) that it
 had (and probably still has) relatively limited Unicode support.

The version of eggregex in libegg uses pcre 4.5, which is three years
old. The current pcre 6.7 has better support for Unicode.

Now PCRE:
- handles UTF-8
- knows that, in a caseless match, à matches À
- has generic character types for non-ASCII characters, so \p{Lt}
matches a title case letter, \p{Sc} matches a currency symbol, and so on

Extended properties such as Greek or InMusicalSymbols are not supported.

 And it brings its own implementation of the necessary Unicode
 data, instead of using the GLib one.

Yes, but it shouldn't be too difficult to port pcre to use glib for 
Unicode. I can't do it because my knowledge of Unicode is very limited.

However this would mean that we should always use the internal PCRE 
instead of the system supplied one.

-- 
Marco Barisione
http://www.barisione.org/


EggRegex

2006-07-19 Thread Marco Barisione
Hi,
GtkSourceView 2 will have a new syntax highlighting engine that
requires a faster and more powerful regular expression library. This is
why I worked on EggRegex (a wrapper library around PCRE) to fix bugs and
add new features.

My version of EggRegex is at http://techn.ocracy.org/eggregex/ and a 
copy of the documentation is at http://www.barisione.org/eggregex/

EggRegex was originally written by Scott Wimer to be included in glib
under the name GRegex. However, including it in glib would mean either
adding a dependency on libpcre or linking it statically, which increases
the size of glib (a stripped libeggregex is 144 KB on my computer).

So my question is: what should be the future of EggRegex?
If it is not included in glib, what do you think about having a
separate libgregex?
If it becomes a separate library, can I use the name GRegex or should I
choose another name that does not use the G namespace?

-- 
Marco Barisione
http://www.barisione.org/


Re: EggRegex

2006-07-19 Thread Marco Barisione
Behdad Esfahbod wrote:
 Last time I checked, PCRE's didn't use Unicode Character Database to
 classify characters and so is a poor choice for a highlighting engine
 and definitely suboptimal in GNOME.

It supports utf-8 and Unicode properties. Don't ask me more about this 
because I know very little about Unicode :)

  I believe GNOME
 should use the GNU regexp engine.

It's slower and doesn't support some patterns supported by pcre.

PCRE benefits:

- it's faster

- has more advanced regular expressions

- supports partial matching (using the pattern "^ab" against the string
"a" the match fails, but pcre knows that there is a partial match, so
adding more characters may lead to a match), see
http://www.barisione.org/eggregex/eggregex-eggregex.html#egg-regex-is-partial-match

- DFA matching (matching ".*" against "abc" you get "a", "ab" and
"abc")
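
The idea of a partial match can be sketched for the simplest possible
case, a literal string anchored at the start of the subject. This is
not how pcre implements it; SketchMatch and sketch_match_literal are
hypothetical names, and treating an empty subject as "no match" is a
choice made only for this sketch.

```c
#include <stddef.h>

typedef enum {
  SKETCH_NO_MATCH,
  SKETCH_PARTIAL_MATCH,
  SKETCH_FULL_MATCH
} SketchMatch;

/* Matches 'literal' anchored at the start of 'subject'. Reports a
 * partial match when the subject ends while every character seen so
 * far agreed with the literal, so more input could complete it. */
static SketchMatch
sketch_match_literal (const char *literal, const char *subject)
{
  size_t i = 0;

  while (literal[i] != '\0')
    {
      if (subject[i] == '\0')
        return i > 0 ? SKETCH_PARTIAL_MATCH : SKETCH_NO_MATCH;
      if (subject[i] != literal[i])
        return SKETCH_NO_MATCH;
      i++;
    }
  return SKETCH_FULL_MATCH;
}
```

A highlighting engine can use the partial result to decide that it has
to wait for more text before giving up on a pattern.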


-- 
Marco Barisione
http://www.barisione.org/


Re: EggRegex

2006-07-19 Thread Marco Barisione
Simos Xenitellis wrote:
 Per http://www.pcre.org/pcre.txt
 
 The current implementation of PCRE (release
 6.x) corresponds approximately with Perl  5.8,  including  support  for
 UTF-8 encoded strings and Unicode general category properties. However,
 this support has to be explicitly enabled; it is not the default.

Today most distributions ship a copy of pcre that supports utf-8 and 
unicode properties.

However you can pass --enable-internal-pcre to configure to statically 
link an internal copy of pcre.

If eggregex links to a pcre version without Unicode support,
egg_regex_new() prints an error message and fails.

-- 
Marco Barisione
http://www.barisione.org/


Re: EggRegex

2006-07-19 Thread Marco Barisione
Tristan Van Berkom wrote:
 If eggregex links to a pcre version without unicode egg_regex_new() 
 prints an error message and fails.
 
 Are you sugesting that highlighting be a site-dependant feature ?
 i.e. g_regexp_supported() ... similar to g_thread_supported() ?

No, I'm saying that if you link against a pcre that does not support 
Unicode you will see immediately that something is not working, so you 
can use the internal copy of pcre.

-- 
Marco Barisione
http://www.barisione.org/


Re: Gtk+ print support - request for feedback

2006-03-07 Thread Marco Barisione

Alexander Larsson wrote:

  locale_data = localeconv ();
  decimal_point = locale_data->decimal_point;
...
  val = strtod (nptr, &fail_pos);

What happens if another thread calls setlocale() after localeconv() but 
before strtod()?
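
One way to sidestep this kind of race is to avoid consulting the locale
at all. The sketch below is not the approach of the quoted code; it is
a deliberately limited, hypothetical parser (sketch_parse_decimal) that
always uses '.' as the decimal point, handles only an optional sign,
digits and an optional fraction, and makes no attempt at exponents or
correct rounding of long inputs.

```c
#include <stddef.h>

/* Parses [+-]digits[.digits] with '.' as the decimal point, without
 * looking at any locale state. Returns 1 and stores the value in *out
 * on success, or 0 if the whole string is not a number in this form. */
static int
sketch_parse_decimal (const char *s, double *out)
{
  double value = 0.0;
  double scale = 1.0;
  int negative = 0;
  int digits = 0;

  if (*s == '+' || *s == '-')
    {
      negative = (*s == '-');
      s++;
    }
  while (*s >= '0' && *s <= '9')
    {
      value = value * 10.0 + (*s - '0');
      s++;
      digits++;
    }
  if (*s == '.')
    {
      s++;
      while (*s >= '0' && *s <= '9')
        {
          value = value * 10.0 + (*s - '0');
          scale *= 10.0;
          s++;
          digits++;
        }
    }
  if (digits == 0 || *s != '\0')
    return 0;
  *out = (negative ? -value : value) / scale;
  return 1;
}
```

Because no global state is read, a concurrent setlocale() call in
another thread cannot change what this function accepts.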


--
Marco Barisione


Re: Announcing: Project Ridley

2005-08-25 Thread Marco Barisione

Jonathan Blandford wrote:

The primary goal of Project Ridley is to cut down on the number of
problem libraries that are part of the GNOME platform.  We propose to do
this by moving functionality into GTK+, wherever it makes sense.


What about EggRegex?

--
Marco Barisione