Re: utf-16 and glib (was: g_malloc overhead)

2009-01-30 Thread Maciej Piechotka
On Mon, 2009-01-26 at 22:49 +0100, Martin (OPENGeoMap) wrote:
> Maciej Piechotka wrote:
> > On Mon, 2009-01-26 at 22:30 +0100, Martin (OPENGeoMap) wrote:
> >   
> >> hi:
> >> 
>  
>  
> >>> Well - what do you mean? Having 2 functions - one receiving utf-16 and
> >>> one utf-8? To be honest - it doesn't make any sense to me (it would
> >>> create much mess, double the code, make programming errors easier...).
> >>>
> >>> Converting? What's wrong with g_utf16_to_utf8?
> >>>   
> >>>   
> >> I was talking about a full utf16 and utf8 api in glib and use a macro to
> >> work with intermediate string:
> >>
> >> For example in windows they have this types:
> >> LPSTR =char *
> >> 
> >
> > char * is used for utf-8 AFAIR
> >
> >   
> >> LPWSTR= utf16windowschar *
> >>
> >> 
> >
> > gunichar2
> >
> >   
> >> perhaps in glib we could have utf16 and utf8 in that way or am i wrong?
> >>
> >> 
> >
> > I'm not glib developer. As far as the module of operating on utf-16
> > strings is proposed I'm not against. However I would prefer to not have
> > 2 entries to each function.
> >   
> 
> Hi:
> 
> What is wrong with:
> gchar*  g_utf8_strncpy  (gchar *dest,const gchar *src,gsize n);

That one is not needed, as strncpy() should work.

> gunichar2 *  g_utf16_strncpy  (gunichar2*dest,const gunichar2*src,gsize n);

That's the kind of support I'm not against.

> and the macro:
> gtext*  g_text_strncpy  (gtext*dest,const gtext*src,gsize n);
> 
> 
> regards.
> 

With the entries - nothing. With the macro - it may be just me, but I
perceive it as shooting yourself in the foot. Just imagine that some
header assumes gtext to be utf-8. Another header (or user code) turns on
the macro and changes it to utf-16. IMHO, having a magic switch which
might change the ABI is not good.
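
The hazard can be sketched in plain C. Everything here is hypothetical:
gtext, TEXT_IS_UTF16 and text_length() are illustrative names, not GLib
API. The sketch only assumes that one macro selects the element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch: neither gtext nor TEXT_IS_UTF16 exists in GLib. */
#ifdef TEXT_IS_UTF16
typedef uint16_t gtext;   /* 2-byte code units */
#else
typedef char gtext;       /* 1-byte code units */
#endif

/* Count code units up to the terminating 0.
 * The prototype reads the same in both modes, but the element size,
 * and therefore the binary interface, depends on the macro. */
size_t text_length(const gtext *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

If a library is built without TEXT_IS_UTF16 and a caller is built with
it, both sides still compile against the same declaration, yet they
disagree about what the pointer points to.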

Regards


___
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list


Re: g_malloc overhead

2009-01-30 Thread Maciej Piechotka
On Mon, 2009-01-26 at 22:30 +0100, Martin (OPENGeoMap) wrote:
> hi:
> >> 
> >
> > Well - what do you mean? Having 2 functions - one receiving utf-16 and
> > one utf-8? To be honest - it doesn't make any sense to me (it would
> > create much mess, double the code, make programming errors easier...).
> >
> > Converting? What's wrong with g_utf16_to_utf8?
> >   
> I was talking about a full utf16 and utf8 api in glib and use a macro to
> work with intermediate string:
> 
> For example in windows they have this types:
> LPSTR =char *

char * is used for utf-8 AFAIR

> LPWSTR= utf16windowschar *
> 

gunichar2

> perhaps in glib we could have utf16 and utf8 in that way or am i wrong?
> 

I'm not a glib developer. As far as a module for operating on utf-16
strings is proposed, I'm not against it. However, I would prefer not to
have 2 entries for each function.

Regards

> Regards.
> 
> 
> 




Re: g_malloc overhead

2009-01-30 Thread PHILIP PAGE, BLOOMBERG/ 731 LEXIN
> strncpy() works fine for C strings that represent text in whatever
> multi-byte codeset (as long as it lacks zero bytes), like UTF-8,
> Microsoft's double-byte codepages, etc.
>
> (Well, I exaggerate, obviously if you want to be sure that multi-byte
> characters don't get truncated you shouldn't use strncpy(), but some
> encoding-aware function.)

That encoding-aware function would be g_utf8_strlcpy.
See http://bugzilla.gnome.org/show_bug.cgi?id=520116
This was entered by Behdad some time back. I have recently attached a proposed 
implementation. Maybe it can be committed.

Philip Page
---
The best laid schemes o' mice an' men gang aft aglay. -Burns


Re: g_malloc overhead

2009-01-29 Thread muppet


On Jan 29, 2009, at 2:44 PM, Tor Lillqvist wrote:

>> this is even more true if you see there's a memcpy() function that is quite
>> the same as what strncpy() is.
>
> No it isn't. strncpy() stops when it encounters a zero char (byte).
> memcpy() always copies exactly the requested number of chars (bytes).


It's even more confusing than that.  strncpy() will touch n bytes.  If  
a nul is encountered in the source string before reaching n, then from  
there to n will be filled with zeroes.  If it reaches n before  
reaching the end of the source string, it simply stops, without  
terminating.  Hence the creation of strlcpy().
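
Both behaviours are easy to check. The following is a minimal sketch in
standard C; is_terminated() and copy_terminated() are illustrative
helper names, not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a NUL terminator. */
int is_terminated(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}

/* Copy with strncpy(), then terminate by hand.
 * The result is truncated if src does not fit into n bytes. */
char *copy_terminated(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL && n > 0);
    strncpy(dst, src, n - 1);   /* zero-fills the rest if src is short */
    dst[n - 1] = '\0';          /* strncpy() does not guarantee this */
    return dst;
}
```

Copying "hi" into an 8-byte buffer with strncpy() writes all 8 bytes
(two characters and six zeroes). Copying "abcdef" into a 4-byte buffer
writes "abcd" and no terminator, which is the case is_terminated()
detects and copy_terminated() avoids.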


http://www.gratisoft.us/todd/papers/strlcpy.html

http://www.gtk.org/api/2.6/glib/glib-String-Utility-Functions.html#g-strlcpy

--
Doing a good job around here is like wetting your pants in a dark  
suit; you get a warm feeling, but no one notices.

  -- unknown




Re: g_malloc overhead

2009-01-29 Thread Tor Lillqvist
> I don't think it is so confusing since I think strncpy() expects ASCII
> characters,

No. strncpy() expects C chars, half of which are not even in ASCII! In
other words bytes. It doesn't care at all whether the bytes represent
ASCII, EBCDIC, or whatever.

strncpy() works fine for C strings that represent text in whatever
multi-byte codeset (as long as it lacks zero bytes), like UTF-8,
Microsoft's double-byte codepages, etc.

(Well, I exaggerate, obviously if you want to be sure that multi-byte
characters don't get truncated you shouldn't use strncpy(), but some
encoding-aware function.)

> this is even more true if you see there's a memcpy() function that is quite
> the same as what strncpy() is.

No it isn't. strncpy() stops when it encounters a zero char (byte).
memcpy() always copies exactly the requested number of chars (bytes).

> Then considering both strncpy() and
> g_utf8_strncpy() takes the number of chars as the size argument

That is a quite misleading misuse of the term "char". g_utf8_strncpy()
takes the number of Unicode characters (code points), each of which is
represented by one or more bytes. Not "chars". Please let's stick to
using the term "char" to always mean what it means in C, i.e. "byte"
or "octet" (as long as we ignore weird architectures). If you mean the
more abstract concept "character", say so!
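
The difference between chars (bytes) and characters (code points) can be
shown without GLib. This is a minimal sketch that assumes valid UTF-8
input; utf8_count_code_points() is an illustrative name, not a GLib
function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a valid, NUL-terminated UTF-8 string.
 * Every byte that is not a continuation byte (10xxxxxx) starts one. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (an e with acute accent encoded as two
bytes), strlen() reports 6 chars while this function reports 5
characters.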

--tml


Re: g_malloc overhead

2009-01-29 Thread Martín Vales

Tor Lillqvist wrote:
>> What is wrong with:
>> gchar*  g_utf8_strncpy  (gchar *dest,const gchar *src,gsize n);
>
> It isn't needed. The nice thing about UTF-8 is that strings in UTF-8
> can be handled with normal C str* functions just fine.

This function really does exist :-[ .
http://library.gnome.org/devel/glib/unstable/glib-Unicode-Manipulation.html#g-utf8-strncpy

n is the number of "real" characters, not the number of bytes.

regards.




Re: g_malloc overhead

2009-01-29 Thread Colomban Wendling

Tor Lillqvist wrote:
> The existing g_utf8_strncpy() has it meaning characters. As such I
> think the name is bit unfortunate, because of the similarity to
> strncpy() but then different semantics of the "size" parameter.
>
> --tml
I don't think it is so confusing, since I think strncpy() expects ASCII
characters, and ASCII characters are obviously 1 byte in size; this is
even more true when you see that there's a memcpy() function that is
much the same as strncpy(). Considering that both strncpy() and
g_utf8_strncpy() take the number of chars as the size argument removes
the confusion, as long as each is used with what it was designed for
(ASCII and UTF-8 respectively).
And when working with UTF-8 strings, I think it is obvious that if
there's a utf8_* function, it does the same as the C one does with an
ASCII string, no?


Regards,
Colomban


Re: g_malloc overhead

2009-01-29 Thread Xan Lopez
On Thu, Jan 29, 2009 at 5:02 PM, Xavier Bestel  wrote:
> On Thu, 2009-01-29 at 16:51 +0200, Tor Lillqvist wrote:
>> > I think strncpy() is one of the few that needs an utf8 equivalent,
>> > because a char may span several bytes.
>>
>> Well, he didn't say exactly what semantics he wanted his
>> g_utf8_strncpy() and g_utf16_strncpy() to have. In the UTF-8 case,
>> should the "size" mean characters or bytes? In the UTF-16 case,
>> characters or 16-bit units?
>>
>> The existing g_utf8_strncpy() has it meaning characters. As such I
>> think the name is bit unfortunate, because of the similarity to
>> strncpy() but then different semantics of the "size" parameter.
>
> Even if the meaning was "bytes", I think an utf8-aware function that
> avoids cutting in the middle of a multibyte char is a plus.
>

Then the meaning wouldn't be bytes anymore. It would be bytes with
some exceptions, which would be A LOT more confusing.

Xan

>Xav
>
>


Re: g_malloc overhead

2009-01-29 Thread Xavier Bestel
On Thu, 2009-01-29 at 16:51 +0200, Tor Lillqvist wrote:
> > I think strncpy() is one of the few that needs an utf8 equivalent,
> > because a char may span several bytes.
> 
> Well, he didn't say exactly what semantics he wanted his
> g_utf8_strncpy() and g_utf16_strncpy() to have. In the UTF-8 case,
> should the "size" mean characters or bytes? In the UTF-16 case,
> characters or 16-bit units?
> 
> The existing g_utf8_strncpy() has it meaning characters. As such I
> think the name is bit unfortunate, because of the similarity to
> strncpy() but then different semantics of the "size" parameter.

Even if the meaning was "bytes", I think an utf8-aware function that
avoids cutting in the middle of a multibyte char is a plus.
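
A byte-bounded copy that never splits a character could look like the
sketch below. utf8_copy_bounded() is a hypothetical helper, not an
existing GLib function, and it assumes that src is valid UTF-8. Like
strlcpy(), it takes the destination size in bytes, always terminates,
and returns strlen(src).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GLib.
 * Copies at most dst_size - 1 bytes of src into dst, terminates dst,
 * and truncates only at a code point boundary.
 * Returns strlen(src) so the caller can detect truncation. */
size_t utf8_copy_bounded(char *dst, const char *src, size_t dst_size)
{
    size_t src_len, n;

    assert(dst != NULL && src != NULL);
    src_len = strlen(src);
    if (dst_size == 0)
        return src_len;

    n = src_len < dst_size ? src_len : dst_size - 1;

    /* If the first byte left out is a continuation byte (10xxxxxx),
     * the cut is inside a character: back up to its lead byte. */
    while (n > 0 && n < src_len && ((unsigned char)src[n] & 0xC0) == 0x80)
        n--;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return src_len;
}
```

With this design the size still means bytes as an upper bound; the copy
may simply use fewer bytes than allowed, which is the behaviour the
follow-up objection ("bytes with some exceptions") is about.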

Xav




Re: g_malloc overhead

2009-01-29 Thread Tor Lillqvist
> I think strncpy() is one of the few that needs an utf8 equivalent,
> because a char may span several bytes.

Well, he didn't say exactly what semantics he wanted his
g_utf8_strncpy() and g_utf16_strncpy() to have. In the UTF-8 case,
should the "size" mean characters or bytes? In the UTF-16 case,
characters or 16-bit units?

The existing g_utf8_strncpy() has it meaning characters. As such I
think the name is a bit unfortunate, because of the similarity to
strncpy() but with different semantics for the "size" parameter.

--tml


Re: g_malloc overhead

2009-01-29 Thread Xavier Bestel
Hi Tor,

On Thu, 2009-01-29 at 16:37 +0200, Tor Lillqvist wrote:
> > What is wrong with:
> > gchar*  g_utf8_strncpy  (gchar *dest,const gchar *src,gsize n);
> 
> It isn't needed. The nice thing about UTF-8 is that strings in UTF-8
> can be handled with normal C str* functions just fine.

I think strncpy() is one of the few that needs an utf8 equivalent,
because a char may span several bytes.

Xav




Re: g_malloc overhead

2009-01-29 Thread Tor Lillqvist
> What is wrong with:
> gchar*  g_utf8_strncpy  (gchar *dest,const gchar *src,gsize n);

It isn't needed. The nice thing about UTF-8 is that strings in UTF-8
can be handled with normal C str* functions just fine.

> gunichar2 *  g_utf16_strncpy  (gunichar2*dest,const gunichar2*src,gsize n);

Such a function might well be useful in some circumstances dealing
with interoperability or data formats, and I don't oppose adding it to
GLib. (Together with g_utf16_strcpy(), g_utf16_strcat() etc.)

But I don't think I have ever personally needed such a function in
platform-independent GTK code;)

(And in code that is inside a Windows ifdef, such functions aren't
needed either. The C library on Windows already has wcsncpy(),
wcscpy(), wcscat() etc.)

> and the macro:
> gtext*  g_text_strncpy  (gtext*dest,const gtext*src,gsize n);

Never, ever. Didn't the previous replies get this across strongly
enough? This idiocy is not something we want to copy from the stone
age Windows programming style.

(In current-day Windows-specific programming in C, I see no reason to
uglify your code with those TEXT() macros, TCHAR types, etc. Just use
wchar_t for characters, wchar_t literals (L'A'), and wchar_t string
literals (L"Foo"), and call the wide-char versions of C library and
Win32 API functions explicitly. Win9x is dead. No reason not to use
Unicode explicitly all the time.)

(And actually, why would one want to do Windows-specific programming
in general in C (or C++) any more... C# and Java are so much nicer.
And neither of them has any of this silly TEXT and TCHAR stuff.)

--tml


Re: g_malloc overhead

2009-01-26 Thread Martin (OPENGeoMap)

Maciej Piechotka wrote:
> On Mon, 2009-01-26 at 22:30 +0100, Martin (OPENGeoMap) wrote:
>> hi:
>>
>>> Well - what do you mean? Having 2 functions - one receiving utf-16 and
>>> one utf-8? To be honest - it doesn't make any sense to me (it would
>>> create much mess, double the code, make programming errors easier...).
>>>
>>> Converting? What's wrong with g_utf16_to_utf8?
>>
>> I was talking about a full utf16 and utf8 api in glib and use a macro to
>> work with intermediate string:
>>
>> For example in windows they have this types:
>> LPSTR =char *
>
> char * is used for utf-8 AFAIR
>
>> LPWSTR= utf16windowschar *
>
> gunichar2
>
>> perhaps in glib we could have utf16 and utf8 in that way or am i wrong?
>
> I'm not glib developer. As far as the module of operating on utf-16
> strings is proposed I'm not against. However I would prefer to not have
> 2 entries to each function.

Hi:

What is wrong with:
gchar*  g_utf8_strncpy  (gchar *dest,const gchar *src,gsize n);
gunichar2 *  g_utf16_strncpy  (gunichar2*dest,const gunichar2*src,gsize n);
and the macro:
gtext*  g_text_strncpy  (gtext*dest,const gtext*src,gsize n);

regards.



Re: g_malloc overhead

2009-01-26 Thread Martin (OPENGeoMap)

hi:



> Well - what do you mean? Having 2 functions - one receiving utf-16 and
> one utf-8? To be honest - it doesn't make any sense to me (it would
> create much mess, double the code, make programming errors easier...).
>
> Converting? What's wrong with g_utf16_to_utf8?

I was talking about a full UTF-16 and UTF-8 API in glib, plus a macro to
work with an intermediate string type.

For example, on Windows they have these types:
LPSTR  = char *
LPWSTR = utf16windowschar *

... and the LPTSTR type: if the _UNICODE macro is defined it is LPWSTR,
otherwise LPSTR.

On top of that they have a full API to manage UTF-16 and ANSI strings
(strcat, strcpy, etc.):

http://msdn.microsoft.com/en-us/library/h1x0y282.aspx

... and finally macros such as _TEXT and _T, so that string literals are
written the same way in both modes:

_TEXT("are you defined _UNICODE macro?. Perhaps i am ansi or perhaps utf16")
http://msdn.microsoft.com/en-us/library/dd374074(VS.85).aspx

Perhaps in glib we could have UTF-16 and UTF-8 in that way, or am I wrong?

Regards.





Re: g_malloc overhead

2009-01-26 Thread Maciej Piechotka
Martín Vales  writes:

> Colin Walters wrote:
>> On Mon, Jan 26, 2009 at 9:12 AM, Behdad Esfahbod  wrote:
>>   
>>> Lets just say that
>>> UTF-16 is at best implementation details of Firefox.
>>> 
>>
>> Well, JavaScript is notably UTF-16.  Given that the Web, Java and
>> .NET

To be honest - isn't the web currently XML-based (XHTML & co.)? And isn't
UTF-8 the default encoding, and incidentally the most widely used one, for XML?

>
>> But yeah, there's no way POSIX/GNOME etc. could switch even if it made
>> sense to do so (which it clearly doesn't).
>>   
> Yes, i only talked about the overhead with utf8 outside of glib, only that.
> Perhaps the only solution is add more support to utf16 in glib with
> more methods.
>

Well - what do you mean? Having 2 functions - one receiving utf-16 and
one utf-8? To be honest - it doesn't make any sense to me (it would
create much mess, double the code, make programming errors easier...).

Converting? What's wrong with g_utf16_to_utf8?

Regards
-- 
I've probably left my head... somewhere. Please wait untill I find it.
Homepage (pl_PL): http://uzytkownik.jogger.pl/
(GNU/)Linux User: #425935 (see http://counter.li.org/)



Re: g_malloc overhead

2009-01-26 Thread Paul LeoNerd Evans
On Mon, Jan 26, 2009 at 12:57:28PM -0500, Owen Taylor wrote:
> On Mon, 2009-01-26 at 18:30 +0100, Martín Vales wrote:
> > Yes, i only talked about the overhead with utf8 outside of glib, only that.
> > Perhaps the only solution is add more support to utf16 in glib with more
> > methods.
> 
> There's zero point in talking about a "solution" until you have profile
> data indicating that there is a problem.

Indeed. UTF-16 is horribly broken by design, and any attempt made to
migrate in the direction _towards_ it is a flawed one, and should be
avoided.

UTF-8 is backward-compatible with the legacy str*() functions in C,
which, like it or not, will be around for a while yet. 

 * It makes sure not to embed any ASCII NUL ('\0') in the stream unless
   it means it, as U+0000, which makes it work with these functions.

 * UTF-8 has nice properties in substring matches - grep can work on
   UTF-8 despite not knowing it, because the encoded bytes of one
   character never match inside the encoding of a different character.

 * This also means that the only occurrence of '\n' in a UTF-8 stream is
   a real one. This means that cat, head/tail, awk, etc... can properly
   detect where the linefeeds are. 'head' can print, say, the first 3
   lines of UTF-8 text without knowing it's UTF-8.

 * UTF-8 can be sorted by only sorting the encoded bytes. sort can sort
   a UTF-8-encoded text file. The code point order of the Unicode
   strings is the same as the bytewise-sorted order of the raw bytes
   that encode them.

This list goes on.
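
Two of the properties listed above can be checked with nothing but the
legacy str*() functions. This is a minimal sketch; utf8_contains() and
utf8_compare() are illustrative wrapper names, and the ordering they
give is code point order, not locale-aware collation.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise substring search; needs no knowledge of UTF-8. */
int utf8_contains(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    return strstr(haystack, needle) != NULL;
}

/* Byte-wise comparison. strcmp() compares bytes as unsigned char, so
 * for valid UTF-8 the sign of the result follows code point order. */
int utf8_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}
```

For example, "z" (U+007A) sorts before "\xC3\xA9" (U+00E9), which sorts
before "\xE2\x82\xAC" (U+20AC), which sorts before "\xF0\x9F\x98\x80"
(U+1F600), exactly as the code points do.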


Meanwhile, on the other end of the spectrum, storing Unicode data as
decoded 32bit integers makes some sense. It means string indexing
operations are constant-width - the substring between the 4th and 9th
characters in such an array will be known to lie between the 16th and
36th bytes. The presence of combining characters, and double-width
glyphs does make this transformation a bit harder, effectively reducing
the advantage such a scheme has.


Compared to that, UTF-16 offers NONE of these advantages. UTF-16 cannot
be passed through any legacy str*() function, nor will it work in grep,
sed, awk, cut, sort, head, tail, or in fact _any_ of the standard UNIX
text tools. Nor can UTF-16 be array indexed in constant time, because of
the surrogate pairs used to encode codepoints outside of the BMP (Basic
Multilingual Plane).
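
The surrogate-pair mechanism can be sketched in a few lines of C. The
function names are illustrative; the counting function assumes
well-formed UTF-16 and has to walk the whole string, which is why
indexing by character is not constant time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16. Writes 1 or 2 units to out and
 * returns that count, or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {             /* inside the BMP: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Count code points in a 0-terminated, well-formed UTF-16 string by
 * skipping low (trailing) surrogates. This is O(n) in the length. */
size_t utf16_count_code_points(const uint16_t *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != 0; s++) {
        if (*s < 0xDC00 || *s > 0xDFFF)
            count++;
    }
    return count;
}
```

U+0041 fits in a single unit, while U+1F600 becomes the pair 0xD83D
0xDE00, so a string of four units can hold only three characters.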


In Summary - UTF-16. Don't. Just Don't.

-- 
Paul "LeoNerd" Evans

leon...@leonerd.org.uk
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: g_malloc overhead

2009-01-26 Thread Owen Taylor
On Mon, 2009-01-26 at 18:30 +0100, Martín Vales wrote:
> Colin Walters wrote:
> > On Mon, Jan 26, 2009 at 9:12 AM, Behdad Esfahbod  wrote:
> >   
> >> Lets just say that
> >> UTF-16 is at best implementation details of Firefox.
> >> 
> >
> > Well, JavaScript is notably UTF-16.  Given that the Web, Java and .NET
> > (i.e. all the most important platforms) are all UTF-16 it's likely to
> > be with us for quite a while, so it's important to understand.
> >   
> Yes i only wanted say that. For example i work in c# and i would like 
> create glib libraries and use it in .net, but the "char" in mono/.NET is 
> utf16  and therefore i have there the same overhead.
> 
> The solution are 2:
> 
> 1.- conversion using glib ():
> http://library.gnome.org/devel/glib/2.19/glib-Unicode-Manipulation.html#gunichar2
> .-2. automatic NET conversion in the p/invoke side.
> 
> The 2 solutions have the same overhead.
> 
> > But yeah, there's no way POSIX/GNOME etc. could switch even if it made
> > sense to do so (which it clearly doesn't).
> >   
> Yes, i only talked about the overhead with utf8 outside of glib, only that.
> Perhaps the only solution is add more support to utf16 in glib with more
> methods.
> 

There's zero point in talking about a "solution" until you have profile
data indicating that there is a problem.

- Owen




Re: g_malloc overhead

2009-01-26 Thread Martín Vales

Colin Walters wrote:
> On Mon, Jan 26, 2009 at 9:12 AM, Behdad Esfahbod  wrote:
>> Lets just say that
>> UTF-16 is at best implementation details of Firefox.
>
> Well, JavaScript is notably UTF-16.  Given that the Web, Java and .NET
> (i.e. all the most important platforms) are all UTF-16 it's likely to
> be with us for quite a while, so it's important to understand.

Yes, I only wanted to say that. For example, I work in C# and I would like
to create glib libraries and use them in .NET, but the "char" in Mono/.NET
is UTF-16, and therefore I have the same overhead there.

There are 2 solutions:

1. Conversion using glib ():
http://library.gnome.org/devel/glib/2.19/glib-Unicode-Manipulation.html#gunichar2
2. Automatic .NET conversion on the P/Invoke side.

The 2 solutions have the same overhead.

> But yeah, there's no way POSIX/GNOME etc. could switch even if it made
> sense to do so (which it clearly doesn't).

Yes, I only talked about the overhead with UTF-8 outside of glib, only that.
Perhaps the only solution is to add more support for UTF-16 to glib, with
more methods.

Regards.




Re: g_malloc overhead

2009-01-26 Thread Colin Walters
On Mon, Jan 26, 2009 at 9:12 AM, Behdad Esfahbod  wrote:
> Lets just say that
> UTF-16 is at best implementation details of Firefox.

Well, JavaScript is notably UTF-16.  Given that the Web, Java and .NET
(i.e. all the most important platforms) are all UTF-16 it's likely to
be with us for quite a while, so it's important to understand.

But yeah, there's no way POSIX/GNOME etc. could switch even if it made
sense to do so (which it clearly doesn't).


Re: g_malloc overhead

2009-01-26 Thread Behdad Esfahbod
Martín Vales wrote:
> I can see the advantages of use utf8 but the true it´s most of people
> use utf16. I know gnome/linux/cairo/freedesktop promote utf8 but most
> people use utf16:
> http://unicode.org/notes/tn12/#Software_16

This is a very baseless claim.  One that actually turns out to be false.  Most
people don't write Windows code.  Most people read and write content on the
internet, and I bet more than 99% of the Unicode content on the net is in UTF-8.

As for the technical note you cite, it's a very biased document of its own.  I
once wrote a full critical review of it but can't find it.  Let's just say that
UTF-16 is at best an implementation detail of Firefox.  I can't see how that can
be relevant here.  Moreover, it's plain wrong that Python uses UTF-16.  Python
APIs are encoding-agnostic, and while Python 2.x can be compiled with UCS-2,
it's recommended that UCS-4 be enabled.  And note the difference: I said
UCS-2, not UTF-16.

UTF-16 is a disease.  It's variable-width, so it doesn't have the benefits of
UTF-32.  It's sixteen bit, so it doesn't have the ASCII-compatibility of
UTF-8.  Tell me one good thing about it other than "everyone made the mistake
of using it and now they have to keep doing that because they exposed it in
their API".

behdad


Re: g_malloc overhead

2009-01-26 Thread Mathias Hasselmann
On Monday, 2009-01-26 at 12:40 +0100, Martín Vales wrote:
> Paul LeoNerd Evans wrote:
> > On Sun, 18 Jan 2009 17:43:57 +0100
> > Martín Vales  wrote:
> >
> >   
> >> Other overhead i see is the open dir/file funtions, where in windows we 
> >> need do the utf8 to utf16 everytime in windows. If JAVA,.NET and Qt use 
> >> utf16 by default why in gnome world we use utf8 by default?.
> >> 
> >
> > Probably one of the biggest reasons, is that UTF-8 does not use \0
> > octets, whereas UTF-16 does. This means that UTF-8 data can transparently
> > pass through all of the usual str*() functions in C, such as strlen(),
> > strcpy(), etc...
> >
> >   
> I can see the advantages of use utf8 but the true it´s most of people 
> use utf16. I know gnome/linux/cairo/freedesktop promote utf8 but most 
> people use utf16:
> http://unicode.org/notes/tn12/#Software_16

Currently C doesn't support UTF-16 literals. The wchar_t type is 32
bits on Linux. So instead of:

do_something ("abc");

you'd suddenly have to write:

const utf16_t abc_literal[] = { 97, 98, 99, 0 }; /* "abc" */
do_something (abc_literal);

I really don't see how this would help.

Ciao,
Mathias
-- 
Mathias Hasselmann 
Personal Blog: http://taschenorakel.de/mathias/
Openismus GmbH: http://www.openismus.com/



Re: g_malloc overhead

2009-01-26 Thread Martín Vales

Paul LeoNerd Evans wrote:
> On Sun, 18 Jan 2009 17:43:57 +0100
> Martín Vales  wrote:
>
>> Other overhead i see is the open dir/file functions, where in windows we
>> need do the utf8 to utf16 everytime. If JAVA,.NET and Qt use
>> utf16 by default why in gnome world we use utf8 by default?.
>
> Probably one of the biggest reasons, is that UTF-8 does not use \0
> octets, whereas UTF-16 does. This means that UTF-8 data can transparently
> pass through all of the usual str*() functions in C, such as strlen(),
> strcpy(), etc...

I can see the advantages of using UTF-8, but the truth is that most people
use UTF-16. I know gnome/linux/cairo/freedesktop promote UTF-8, but most
people use UTF-16:

http://unicode.org/notes/tn12/#Software_16


Regards.



Re: g_malloc overhead

2009-01-26 Thread Paul LeoNerd Evans
On Sun, 18 Jan 2009 17:43:57 +0100
Martín Vales  wrote:

> Other overhead i see is the open dir/file functions, where in windows we
> need do the utf8 to utf16 everytime. If JAVA,.NET and Qt use
> utf16 by default why in gnome world we use utf8 by default?.

Probably one of the biggest reasons, is that UTF-8 does not use \0
octets, whereas UTF-16 does. This means that UTF-8 data can transparently
pass through all of the usual str*() functions in C, such as strlen(),
strcpy(), etc...

-- 
Paul "LeoNerd" Evans

leon...@leonerd.org.uk
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: g_malloc overhead

2009-01-23 Thread Liam R E Quin
On Wed, 2009-01-21 at 10:21 +0100, BJörn Lindqvist wrote:
> 2009/1/21 Liam R E Quin :
> > On Mon, 2009-01-19 at 18:43 +0100, BJörn Lindqvist wrote:
> >> Actually, a custom allocator could be useful even in the general case.
> >> Malloc is a system call and has quite bad performance on certain
> >> platforms (windows in particular i think). Something like the gslice
> >> allocator could probably improve performance a bit.
> >
> > malloc is a library call.
> 
> On Linux, it is implemented using mmap() and brk() which are system
> calls.

brk(2) is called to grow the heap, but not on every malloc() call;
mmap(2) is used only for large objects, and then not always.

If you malloc() a few megabytes and then call free, a program
that allocates a lot of small objects may well go faster on some
systems, and slower on others.

Yes, g_slice was tested, but the program _calling_ g_slice is in
the domain of the user, and errors in calling g_slice or malloc()
can be hard to debug.

No more from me on this.

Liam

> 
-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org



Re: g_malloc overhead

2009-01-23 Thread Liam R E Quin
On Mon, 2009-01-19 at 18:43 +0100, BJörn Lindqvist wrote:
> Actually, a custom allocator could be useful even in the general case.
> Malloc is a system call and has quite bad performance on certain
> platforms (windows in particular i think). Something like the gslice
> allocator could probably improve performance a bit.

malloc is a library call.

It's not worth changing memory allocators unless you have a good
solid understanding of how your program uses memory, and have
done *very* detailed timings.

The main trade-offs are between space, time and complexity of code.

Errors in malloc() or other memory code can be very difficult to
find and debug, so it's an area to avoid if at all possible.

Having said all that, yes, using g_slice may help in some cases.
But you need to do timings and profiling, of course.

Liam

Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org



Re: g_malloc overhead

2009-01-23 Thread Emmanuel Rodriguez
On Tue, Jan 20, 2009 at 12:48 PM, Larry Reaves wrote:
> On Tue, 2009-01-20 at 09:01 +0100, Martín Vales wrote:
>> BJörn Lindqvist wrote:
>> > Actually, a custom allocator could be useful even in the general case.
>> > Malloc is a system call and has quite bad performance on certain
>> > platforms (windows in particular i think). Something like the gslice
>> > allocator could
>> > Probably improve performance a bit.
>> >
>> gslice i believe use malloc internally. I believe you always need
>> malloc/new-(C/C++) because you depend on ms Windows API.
>>
>> I am not sure if you can build your own malloc because you depend on the
>> operating system.
> Sure, you must malloc to get new memory, but you can malloc bigger than
> what you need and hand out the extra memory later at a much lower cost.
>
I recall reading somewhere that mmap can be used to build custom
memory allocators. If that's true, then one can bypass malloc. I think
you can request memory through mmap by using MAP_ANONYMOUS.
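
A minimal sketch of that idea, assuming a POSIX system (Linux, BSD) where
mmap() accepts MAP_ANONYMOUS. The helper names page_alloc() and page_free()
are made up for illustration; they are not part of glib or any other
library. A real allocator would carve the mapping into smaller blocks
instead of mapping once per request.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel for `len` bytes of zero-filled,
 * private memory that is not backed by any file. Returns NULL on failure. */
void *page_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: hand the mapping back to the kernel. `len` must be
 * the same length that was passed to page_alloc(). Returns 0 on success. */
int page_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

Every page_alloc() call here is a system call, so this only pays off when
the mapping is large and then subdivided in user space.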

Emmanuel Rodriguez


Re: g_malloc overhead

2009-01-23 Thread Larry Reaves
On Tue, 2009-01-20 at 09:01 +0100, Martín Vales wrote:
> BJörn Lindqvist wrote:
> > Actually, a custom allocator could be useful even in the general case.
> > Malloc is a system call and has quite bad performance on certain
> > platforms (windows in particular i think). Something like the gslice
> > allocator could
> > Probably improve performance a bit.
> >   
> gslice i believe use malloc internally. I believe you always need 
> malloc/new-(C/C++) because you depend on ms Windows API.
> 
> I am not sure if you can build your own malloc because you depend on the 
> operating system.
Sure, you must malloc to get new memory, but you can malloc bigger than
what you need and hand out the extra memory later at a much lower cost.
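
Roughly what I mean, as a sketch only: a "bump" arena that calls malloc()
once and then hands out pieces with pointer arithmetic. The Arena type and
the arena_* functions are invented for this mail, not glib API, and
individual pieces cannot be freed, only the whole arena.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump arena: one malloc() up front, then pointer arithmetic. */
typedef struct {
    unsigned char *base;  /* start of the big block */
    size_t         size;  /* total bytes in the block */
    size_t         used;  /* bytes handed out so far */
} Arena;

/* Returns non-zero on success. */
int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hands out `n` bytes, rounded up so every block stays suitably aligned.
 * Returns NULL when the arena is exhausted. */
void *arena_alloc(Arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);

    if (n > SIZE_MAX - (align - 1))
        return NULL;                         /* rounding would overflow */
    size_t rounded = (n + align - 1) / align * align;
    if (rounded > a->size - a->used)
        return NULL;                         /* arena exhausted */

    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Releases everything that was handed out, in one free(). */
void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

After arena_init(), each arena_alloc() is a couple of additions and a
comparison; the C library is only involved again in arena_destroy().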

-Larry



> 
> regards.
> 
> >
> > 2009/1/18, muppet :
> >   
> >> On Jan 18, 2009, at 11:43 AM, Martín Vales wrote:
> >>
> >> 
> >>> What are the advantages of use a glib_mem_vtable ???. I think we
> >>> have the same malloc function in all operating systems?
> >>>   
> >> This vtable allows you to swap in a different allocator with next to
> >> no effort.  Maybe it has special OOM handling, or uses a special pool
> >> or allocation algorithm tuned to your use-case, or does debugging
> >> logging work, or whatever.  The fact that the default is the same
> >> everywhere is a bit beside the point of having the functionality.
> >>
> >>
> >>
> 
> ___
> gtk-app-devel-list mailing list
> gtk-app-devel-l...@gnome.org
> http://mail.gnome.org/mailman/listinfo/gtk-app-devel-list



Re: g_malloc overhead

2009-01-22 Thread Maciej Piechotka
Martín Vales  writes:

> hi:
>
> I working with visual c++ in Windows and i find glib very useful for
> many C task, but i am worry about the g_malloc overhead.
>
> We really need a new malloc??
>
> gpointer
> g_malloc (gsize n_bytes)
> {
>   if (G_UNLIKELY (!g_mem_initialized))
>     g_mem_init_nomessage ();
>   if (G_LIKELY (n_bytes))
>     {
>       gpointer mem;
>
>       mem = glib_mem_vtable.malloc (n_bytes);
>       if (mem)
>         return mem;
>
>       g_error ("%s: failed to allocate %"G_GSIZE_FORMAT" bytes",
>                G_STRLOC, n_bytes);
>     }
>
>   return NULL;
> }
>
>
>
>
>
> What are the advantages of use a glib_mem_vtable ???. I think we have
> the same malloc function in all operating systems?.
> static GMemVTable glib_mem_vtable = {
>  standard_malloc,
>  standard_realloc,
>  standard_free,
>  standard_calloc,
>  standard_try_malloc,
>  standard_try_realloc,
> };
>

g_malloc will abort the program when no more memory is available
(programmers usually don't bother handling that case, since handling it
would usually require... allocating memory).

From g_try_malloc:
"Attempts to allocate n_bytes, and returns NULL on failure. Contrast
with g_malloc(), which aborts the program on failure. " 
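
To make the contrast concrete, here is a plain C sketch of the two
conventions. try_alloc() and alloc_or_abort() are hypothetical stand-ins
that only mimic the behaviour described for g_try_malloc() and g_malloc();
they are not the glib implementations.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical analogue of g_try_malloc(): failure is reported to the
 * caller as NULL, and the caller decides what to do. */
void *try_alloc(size_t n_bytes)
{
    return n_bytes ? malloc(n_bytes) : NULL;
}

/* Hypothetical analogue of g_malloc(): failure ends the program, so
 * callers never need a NULL check for a non-zero request. */
void *alloc_or_abort(size_t n_bytes)
{
    if (n_bytes == 0)
        return NULL;            /* same convention as the quoted g_malloc */

    void *mem = malloc(n_bytes);
    if (mem == NULL) {
        fprintf(stderr, "failed to allocate %zu bytes\n", n_bytes);
        abort();
    }
    return mem;
}
```

With the second form the error path lives in one place; with the first,
every call site has to carry its own check.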

>
> Other overhead i see is the open dir/file funtions, where in windows
> we need do the utf8 to utf16 everytime in windows. If JAVA,.NET and Qt
> use utf16 by default why in gnome world we use utf8 by default?.
>

I guess that:
1. utf-8 is currently the main encoding for Unicode (see xml & co.).
2. Most strings in the Latin alphabet will be nearly 2x smaller than in
utf-16 (on average, in my mother language, AFAIR utf-8 is bigger than
iso-8859-2 by a few %, while utf-16 would be 100% bigger).
3. utf-8 is the standard on the main Gnome platform, GNU/Linux. While
I have seen xx_XX.UTF-8 locales generated in many places, I've never
encountered a utf-16 one.
4. utf-16 is not fixed-size, so that is not an advantage over utf-8
(utf-32 is fixed-size).
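
Points 2 and 4 can be checked with a few lines of C. This is only a sketch:
the function names are made up, and the input is assumed to be valid
Unicode scalar values (no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one code point in UTF-8. */
size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;   /* ASCII */
    if (cp < 0x800)   return 2;   /* e.g. Latin letters with diacritics */
    if (cp < 0x10000) return 3;   /* rest of the Basic Multilingual Plane */
    return 4;                     /* supplementary planes */
}

/* Bytes needed in UTF-16: one 16-bit unit inside the BMP, otherwise a
 * surrogate pair (two units). */
size_t utf16_len(uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

/* Sums the encoded sizes of `n` code points in both encodings. */
void encoded_sizes(const uint32_t *cps, size_t n,
                   size_t *utf8_bytes, size_t *utf16_bytes)
{
    *utf8_bytes = 0;
    *utf16_bytes = 0;
    for (size_t i = 0; i < n; i++) {
        *utf8_bytes  += utf8_len(cps[i]);
        *utf16_bytes += utf16_len(cps[i]);
    }
}
```

Four ASCII letters such as "glib" come out as 4 bytes in utf-8 and 8 in
utf-16. The Polish word "mały" (m, a, U+0142, y) is 5 bytes against 8. A
code point above U+FFFF, such as U+1F600, needs 4 bytes in both, and in
utf-16 those 4 bytes are two code units, which is why utf-16 is not
fixed-size.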

Regards
-- 
I've probably left my head... somewhere. Please wait untill I find it.
Homepage (pl_PL): http://uzytkownik.jogger.pl/
(GNU/)Linux User: #425935 (see http://counter.li.org/)



Re: g_malloc overhead

2009-01-21 Thread Tor Lillqvist
> Malloc is a system call and has quite bad performance on certain
> platforms (windows in particular i think).

Malloc is not a system call. And please don't make performance
assumptions without having benchmark data to back it up. Note that it
is not necessarily that clear what is a "system call" on Windows, as
far as I know.

> Something like the gslice allocator could probably improve performance a bit.

At least the g_slice_free() API requires passing the size of the
block, so it is not possible to simply have g_malloc() call
g_slice_alloc(), and g_free() and g_realloc() call g_slice_free().

If you start adding a bookkeeping layer to keep track of the size of
each allocation, you end up with a bunch of code that might well
correspond to what the C library's malloc, or the heap management code
in the kernel32 library (which is code running at user level, not in
the kernel, as far as I know) that it calls, already does anyway.
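
To illustrate the bookkeeping layer, here is a sketch with invented names.
sized_alloc() and sized_free() stand in for a size-aware allocator in the
style of g_slice_alloc() and g_slice_free1(); they are backed by malloc()
here only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size-aware allocator: the caller must pass the block size
 * again when freeing. */
void *sized_alloc(size_t size)           { return malloc(size); }
void  sized_free(size_t size, void *mem) { (void) size; free(mem); }

/* Header stored in front of every block. The union pads it so the
 * pointer given to the user keeps malloc-grade alignment. */
typedef union {
    size_t      size;
    max_align_t align;
} Header;

/* malloc()-style entry point: remembers the size on the caller's behalf. */
void *wrapped_malloc(size_t n)
{
    if (n > (size_t) -1 - sizeof(Header))
        return NULL;                      /* header would overflow size_t */

    Header *h = sized_alloc(sizeof(Header) + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                         /* user memory starts after it */
}

/* Recovers the size that was requested for a block. */
size_t wrapped_size(const void *p)
{
    return ((const Header *) p - 1)->size;
}

/* free()-style entry point: reads the size back from the header. */
void wrapped_free(void *p)
{
    if (p == NULL)
        return;
    Header *h = (Header *) p - 1;
    sized_free(sizeof(Header) + h->size, h);
}
```

The per-block header is exactly the kind of overhead a general-purpose
malloc already carries, so the wrapper gives back much of what the sized
API was meant to save.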

--tml


Re: g_malloc overhead

2009-01-21 Thread BJörn Lindqvist
2009/1/21 Liam R E Quin :
> On Mon, 2009-01-19 at 18:43 +0100, BJörn Lindqvist wrote:
>> Actually, a custom allocator could be useful even in the general case.
>> Malloc is a system call and has quite bad performance on certain
>> platforms (windows in particular i think). Something like the gslice
>> allocator could
>> Probably improve performance a bit.
>
> malloc is a library call.

On Linux, it is implemented using mmap() and brk() which are system
calls. The point is that malloc usually translates into one or more
system calls which are expensive. With a custom allocator the system
call part of malloc can be avoided.

> It's not worth changing memory allocators unless you have a good
> solid understanding of how your program uses memory, and have
> done *very* detailed timings.

You are right of course. GSlice in particular was tested
thoroughly when it was merged into glib. See
http://markmail.org/message/ohmuxdfyttuy4ipa. For gtk programs I
believe we have quite a good understanding of how applications use
memory.

Another example is Python which also uses a custom memory allocator.
It works very well because Python uses lots of short-lived small
objects.


-- 
mvh Björn


Re: g_malloc overhead

2009-01-20 Thread Martín Vales

BJörn Lindqvist wrote:
> Actually, a custom allocator could be useful even in the general case.
> Malloc is a system call and has quite bad performance on certain
> platforms (windows in particular i think). Something like the gslice
> allocator could probably improve performance a bit.

I believe gslice uses malloc internally. I believe you always need
malloc/new (C/C++) because you depend on the MS Windows API.

I am not sure you can build your own malloc, because you depend on the
operating system.

regards.

> 2009/1/18, muppet :
>> On Jan 18, 2009, at 11:43 AM, Martín Vales wrote:
>>> What are the advantages of use a glib_mem_vtable ???. I think we
>>> have the same malloc function in all operating systems?
>>
>> This vtable allows you to swap in a different allocator with next to
>> no effort.  Maybe it has special OOM handling, or uses a special pool
>> or allocation algorithm tuned to your use-case, or does debugging
>> logging work, or whatever.  The fact that the default is the same
>> everywhere is a bit beside the point of having the functionality.





Re: g_malloc overhead

2009-01-19 Thread BJörn Lindqvist
Actually, a custom allocator could be useful even in the general case.
Malloc is a system call and has quite bad performance on certain
platforms (windows in particular i think). Something like the gslice
allocator could probably improve performance a bit.


2009/1/18, muppet :
>
> On Jan 18, 2009, at 11:43 AM, Martín Vales wrote:
>
>> What are the advantages of use a glib_mem_vtable ???. I think we
>> have the same malloc function in all operating systems?
>
> This vtable allows you to swap in a different allocator with next to
> no effort.  Maybe it has special OOM handling, or uses a special pool
> or allocation algorithm tuned to your use-case, or does debugging
> logging work, or whatever.  The fact that the default is the same
> everywhere is a bit beside the point of having the functionality.
>
>
>


-- 
mvh Björn


Re: g_malloc overhead

2009-01-18 Thread muppet


On Jan 18, 2009, at 11:43 AM, Martín Vales wrote:

> What are the advantages of use a glib_mem_vtable ???. I think we
> have the same malloc function in all operating systems?


This vtable allows you to swap in a different allocator with next to  
no effort.  Maybe it has special OOM handling, or uses a special pool  
or allocation algorithm tuned to your use-case, or does debugging  
logging work, or whatever.  The fact that the default is the same  
everywhere is a bit beside the point of having the functionality.
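
A cut-down sketch of the mechanism, in plain C. The MemVTable type and the
mem_* functions are hypothetical and much smaller than the real
GMemVTable; in glib itself the table is installed with g_mem_set_vtable().
The replacement shown here just counts live blocks, the sort of
debugging/logging use mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down table in the spirit of GMemVTable. */
typedef struct {
    void *(*alloc_fn)(size_t n_bytes);
    void  (*free_fn)(void *mem);
} MemVTable;

/* Default table: the C library, same on every platform. */
static MemVTable current = { malloc, free };

/* Swap in a different allocator for all later mem_alloc()/mem_free(). */
void  mem_set_vtable(const MemVTable *vt) { current = *vt; }
void *mem_alloc(size_t n)                 { return current.alloc_fn(n); }
void  mem_free(void *p)                   { current.free_fn(p); }

/* Example replacement: tracks how many blocks are currently live. */
long live_blocks = 0;

void *counting_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}
```

Code that calls mem_alloc() and mem_free() never changes; only the table
does. The price is one indirect call per allocation.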




--
Me:  What's that in your mouth?
Zella:  *swallows laboriously*  Nothing.
Me:  What did you just swallow?
Zella:  A booger.
Me:  Baby girl, don't eat boogers.  That's gross.
Zella:  But it was in my nose.



Re: g_malloc overhead

2009-01-18 Thread Colin Walters
On Sun, Jan 18, 2009 at 11:43 AM, Martín Vales  wrote:
>
> Other overhead i see is the open dir/file funtions, where in windows we need
> do the utf8 to utf16 everytime in windows. If JAVA,.NET and Qt use utf16 by
> default why in gnome world we use utf8 by default?.

Historically, Unix was a late adopter of Unicode.  And crucially, the
Unicode designers originally thought 16 bits would be enough.  So Java
was explicitly designed around Unicode and specifically UTF-16, and
Windows was a relatively early adopter.  Only later did it become
clear that more code point space was needed, and also that UTF-8
specifically had a number of advantages.

Strings and encodings are actually a pretty interesting subject I
think, and for any programmer it's worth taking some time to read
available material on the web at least.


g_malloc overhead

2009-01-18 Thread Martín Vales

hi:

I am working with Visual C++ on Windows and I find glib very useful for
many C tasks, but I am worried about the g_malloc overhead.


Do we really need a new malloc?

gpointer
g_malloc (gsize n_bytes)
{
  if (G_UNLIKELY (!g_mem_initialized))
    g_mem_init_nomessage ();
  if (G_LIKELY (n_bytes))
    {
      gpointer mem;

      mem = glib_mem_vtable.malloc (n_bytes);
      if (mem)
        return mem;

      g_error ("%s: failed to allocate %"G_GSIZE_FORMAT" bytes",
               G_STRLOC, n_bytes);
    }

  return NULL;
}





What are the advantages of using a glib_mem_vtable? I think we have
the same malloc function on all operating systems.

static GMemVTable glib_mem_vtable = {
 standard_malloc,
 standard_realloc,
 standard_free,
 standard_calloc,
 standard_try_malloc,
 standard_try_realloc,
};


Another overhead I see is in the open dir/file functions, where on Windows
we need to do the utf8 to utf16 conversion every time. If Java, .NET and Qt
use utf16 by default, why do we use utf8 by default in the gnome world?


.
#ifdef G_OS_WIN32
  wpath = g_utf8_to_utf16 (path, -1, NULL, NULL, error);

  if (wpath == NULL)
    return NULL;

  dir = g_new (GDir, 1);

  dir->wdirp = _wopendir (wpath);
  g_free (wpath);

  if (dir->wdirp)
    return dir;

.
Regards.


  

