php-i18n Digest 12 May 2008 06:37:34 -0000 Issue 392

Topics (messages 1183 through 1188):

Re: ubuntu 7.10 pecl install intl
        1183 by: Ed Batutis
        1185 by: Darren Cook
        1186 by: Ed Batutis
        1187 by: Darren Cook
        1188 by: Stanislav Malyshev

Re: proposal: unification of the grapheme_extract functions
        1184 by: Texin, Tex


----------------------------------------------------------------------
--- Begin Message ---
> Has anyone managed to install "intl" pecl extension on ubuntu?

Yes - ubuntu 8.04 64bit.

> I try just pressing Enter and get lots of output ending ... failed

I just pressed ENTER and it found icu. What it is looking for is 

  bin/icu-config

On my system this is in /usr. If the default empty path doesn't work, I'd
try '/usr'. If you can't find 'icu-config' on your system then you don't
have libXXicu-dev installed.

=Ed Batutis



--- End Message ---
--- Begin Message ---
Ed Batutis wrote:
>> Has anyone managed to install "intl" pecl extension on ubuntu?

Hi Ed,
Thanks for the reply. I have icu-config in /usr/bin, but get the same error:

...
checking whether to enable internationalization support... yes, shared
/tmp/pear/temp/intl/configure: line 3838: syntax error near unexpected token `INTL_SHARED_LIBADD'
/tmp/pear/temp/intl/configure: line 3838: `PHP_SETUP_ICU(INTL_SHARED_LIBADD)'
ERROR: `/tmp/pear/temp/intl/configure --with-icu-dir=/usr' failed


I've had a look in the files that exist at that point and PHP_SETUP_ICU
only exists on that one line; it doesn't seem to be defined anywhere
(not in configure, nor in any other of the files in the
/tmp/pear/temp/intl directory).

And INTL_SHARED_LIBADD is not mentioned before that line; it is only
used afterwards (and is only used, not defined anywhere, as far as I can
see).

But I suppose the configure file might be cleverer than me and be creating
those functions/symbols on the fly?

Anyone got any suggestions I can try? This is intl-1.0.0beta.tgz by the way.

Darren

P.S. The above error message is identical in all cases:
  * I press enter
  * I input abort
  * I input all and give a path that exists
  * I input all and give a load of garbage
I.e. I think the --with-icu-dir setting is a red herring, and the
problem is earlier.






-- 
Darren Cook
http://dcook.org/mlsn/ (English-Japanese-German-Chinese free dictionary)
http://dcook.org/work/ (About me and my work)
http://dcook.org/work/charts/  (My flash charting demos)

--- End Message ---
--- Begin Message ---
> I've had a look in the files that exist at that point and PHP_SETUP_ICU
> only exists on that one line; it doesn't seem to be defined anywhere

It should be defined in acinclude.m4 in the php5 dev directory -
/usr/lib/php5/build on my ubuntu - installed when you installed php5-dev.

Maybe your autoconf isn't finding it for some reason. You might check that
if so.
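
One way to check this is sketched below. It is only a rough check:
m4_defines_macro() is a made-up helper name, the path is the one from my
ubuntu box, and it assumes the macro is introduced with AC_DEFUN the way
autoconf macros normally are. If the macro is missing, it would presumably
be left unexpanded in the generated configure script, which would fit the
syntax error.

```php
<?php
/**
 * Hypothetical helper: report whether an m4 file defines the given macro.
 * Assumes the definition looks like AC_DEFUN([NAME], ...) or AC_DEFUN(NAME, ...).
 */
function m4_defines_macro(string $path, string $macro): bool
{
    $contents = @file_get_contents($path);
    if ($contents === false) {
        return false; // file missing or unreadable
    }
    $pattern = '/AC_DEFUN\(\s*\[?' . preg_quote($macro, '/') . '\]?\s*,/';
    return preg_match($pattern, $contents) === 1;
}

// Path as mentioned in this thread; it may differ on other systems.
$file = '/usr/lib/php5/build/acinclude.m4';
echo m4_defines_macro($file, 'PHP_SETUP_ICU')
    ? "PHP_SETUP_ICU is defined in $file\n"
    : "PHP_SETUP_ICU is not defined in $file (or the file is unreadable)\n";
```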

=Ed



--- End Message ---
--- Begin Message ---
>> I've had a look in the files that exist at that point and PHP_SETUP_ICU
>> only exists on that one line; it doesn't seem to be defined anywhere
> 
> It should be defined in acinclude.m4 in the php5 dev directory -
> /usr/lib/php5/build on my ubuntu - installed when you installed php5-dev.

Thanks Ed,
That narrows it down a lot: it isn't defined in that file!
The top line says:
 dnl $Id: acinclude.m4,v 1.332.2.14.2.15 2007/05/24 21:40:41 sniper Ex

So I assume ubuntu 8 is using a more recent PHP than ubuntu 7.10 and
defines that macro. My php -v says:
 PHP 5.2.3-1ubuntu6.3 (cli) (built: Jan 10 2008 09:38:37)

pecl intl should work with php 5.2 shouldn't it? So is it fair to say
that the php-dev ubuntu 7.10 package has a bug because the acinclude.m4
file (2007/05/24) wasn't updated in the last build (2008/01/10)? (I'm
just trying to work out who I should give a bug report to.)

Is it possible for you to post your PHP_SETUP_ICU definition? I'll
paste that into acinclude.m4 to see if it is the only problem.

Thanks,

Darren



--- End Message ---
--- Begin Message ---
Hi!

> So I assume ubuntu 8 is using a more recent PHP than ubuntu 7.10 and
> defines that macro. My php -v says:
>  PHP 5.2.3-1ubuntu6.3 (cli) (built: Jan 10 2008 09:38:37)

I think the ICU stuff might have been added to the 5.2 branch after 5.2.3.
You may want to d/l 5.2.6 from php.net and compare the acinclude.m4, or
just see the one on cvs.php.net in the PHP_5_2 branch. I understand it's
not as good as having it ready on your system, but 5.2.3 is a year old now,
and 5.2.6 is current, so I hope Ubuntu has some more up-to-date package...
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--- End Message ---
--- Begin Message ---
Thanks Ed. I remember the discussion now.
Personally I don't think it makes sense.
It is an option that should be offered, because it is good for performance, but 
it is more tedious programming and harder to migrate programs to use this 
functionality.

The tradeoff is like this:

Let's say a program is on the third character in a string. Today the program
knows it is at an index of 3.
In a multibyte world, if the start value is a character count, the first thing
the function does is scan the string to find the byte offset where the 3rd
character begins.
However, it is likely that this same byte position is known from immediately
prior work on the string. So passing byte offsets around saves frequent
rescanning of the string.
An important caveat is that if the string is modified the byte offsets have to
be thrown away, or at least those after the point where the string is modified.
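
To make the rescanning cost concrete, here is a minimal sketch of that scan.
It assumes valid UTF-8 and counts code points rather than graphemes, and
utf8_byte_offset() is a made-up name, not anything in the intl extension.
The optional last two arguments let the scan resume from a position that is
already known instead of starting over at byte 0.

```php
<?php
/**
 * Hypothetical helper: byte offset of the code point at index $charIndex
 * in a valid UTF-8 string. The scan may resume from a known pair
 * ($knownChar, $knownByte) instead of the start of the string.
 */
function utf8_byte_offset(string $s, int $charIndex, int $knownChar = 0, int $knownByte = 0): int
{
    $len  = strlen($s);
    $byte = $knownByte;
    for ($char = $knownChar; $char < $charIndex && $byte < $len; $char++) {
        $byte++; // step past the lead byte
        // then past any continuation bytes (bit pattern 10xxxxxx)
        while ($byte < $len && (ord($s[$byte]) & 0xC0) === 0x80) {
            $byte++;
        }
    }
    return $byte;
}

$s = "a\u{E9}bcdefgh";                    // the second character takes 2 bytes

$at3 = utf8_byte_offset($s, 3);           // full scan from the beginning
$at7 = utf8_byte_offset($s, 7, 3, $at3);  // resumes from the known offset
echo substr($s, $at7, 2), "\n";
```

If $s is modified before character 7, the remembered $at7 is no longer
trustworthy and has to be recomputed.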

On the other hand, most existing code is doing character count arithmetic and 
changing it means replacing simple indexing with functions to get byte offsets.
It is harder to convert the code. It is of course possible to make the intl 
extension much smarter and remember index to byte mappings, but we didn't have 
time in the initial version.

$start = 3;
// Does stuff at 3 and then wants to do stuff 4 characters after this position.
$extractbegin = $start + 4;
$ext = substr($mystr, $extractbegin, $len);

Becomes code that has to:
  * Call a function to find the byte offset of character 3 in the string (by
    scanning).
  * Keep 2 variables to remember both the current character count and the
    byte count.
  * Call a function to find the byte offset of character 7 by either
    scanning from the beginning of the string or starting from the known
    offset of character 3.
$ext=graphemeextract....
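
Roughly, the migrated code could look like the sketch below. It assumes the
intl extension is loaded and uses grapheme_extract() with
GRAPHEME_EXTR_COUNT, where the start argument and the value written to the
last (by-reference) argument are byte offsets. skip_graphemes() and the
sample string are made up for illustration.

```php
<?php
// Sketch only: skip_graphemes() is a hypothetical helper, not part of intl.

/**
 * Returns the byte offset reached after skipping $count graphemes,
 * starting from byte offset $from.
 */
function skip_graphemes(string $str, int $count, int $from = 0): int
{
    $next = $from;
    if ($count > 0 && $from < strlen($str)) {
        // $next receives the byte offset just past the extracted graphemes.
        grapheme_extract($str, $count, GRAPHEME_EXTR_COUNT, $from, $next);
    }
    return $next;
}

if (extension_loaded('intl')) {
    // "a" + combining ring and "e" + combining acute are one grapheme each.
    $mystr = "a\u{030A}bcde\u{0301}fghij";
    $len   = 2;

    $start     = 3;                               // grapheme count
    $startByte = skip_graphemes($mystr, $start);  // scan from the beginning
    // ... does stuff at grapheme 3 ...

    $extractbegin     = $start + 4;               // grapheme count
    $extractbeginByte = skip_graphemes($mystr, 4, $startByte); // resume from known offset

    $ext = grapheme_extract($mystr, $len, GRAPHEME_EXTR_COUNT, $extractbeginByte);
    echo $ext, "\n";
}
```

Note the two parallel variables for every position (grapheme count and byte
offset), which is the bookkeeping I am objecting to.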

My preference is for start to optionally be grapheme or character count and let 
the migration be quick and then add optimizations into the extension to 
recognize strings that are ascii, cache recently used offsets, etc.
But that's just me...

For most programs the performance enhancement of using byte offsets is 
countered by the extra function calls etc. Especially for the typically short 
strings.
(Scanning large buffers repeatedly for offsets into the last few characters can 
hurt, but can usually be worked around thru other optimizations.)

And making the migration difficult will reduce the number of programs that 
actually support languages that need graphemes...

This wasn't your decision so no reflection on you of course. Next version 
should add in support for start values to be grapheme counts....
tex


> -----Original Message-----
> From: Ed Batutis [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, May 08, 2008 12:42 PM
> To: Texin, Tex; [EMAIL PROTECTED]
> Subject: RE: [PHP-I18N] proposal: unification of the 
> grapheme_extract functions
> 
> 
> >  If I use GRAPHEME_EXTR_MAXBYTES, does it return ...
> 
> > I assume it is the max # of whole graphemes that do not 
> exceed the max 
> > bytes.
> 
> Yes. It works just like the old grapheme_extractb.
> 
> > Also, the $start value is that in byte, character or grapheme units 
> > for each of the types?
> 
> The start value is always bytes. I was unsure if this made 
> sense, really, but it is consistent (and easy to implement).
> 
> =Ed
> 
> 
> 

--- End Message ---
