Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
Okay, I can be convinced. :) Diff and RFC updated. And I agree on the slight sense of unease at adding a third encoding converter, but ICU is so far ahead of the other two that I really feel we should encourage its use over iconv and mbstring. -Sara On Wed, Oct 31, 2012 at 6:40 PM, Adam Harvey ahar...@php.net wrote: On 31 October 2012 06:18, Sara Golemon poll...@php.net wrote: With the exception of renaming the UConverter::UCNV_* constants to remove the UCNV_ prefix, I believe I've addressed the concerns thus far. ((Waiting to hear if anyone else wants to weigh in on the contants)) The RFC has been updated accordingly.+ I would say that unprefixing makes sense in general, particularly since that's what happens in other intl classes (such as Collator). Prefixing the callback reason constants with REASON_* seems like a good compromise there — as you said upthread, they are different to the other constants defined on UConverter. Beyond that, I think the type constants would do fine as direct UConverter constants (UConverter::UTF8, for instance, makes sense to me — in general, you're using a converter because you want to deal with encoding types). I'm +1 on the functionality in general. The thought of another encoding conversion API in PHP doesn't fill me with great joy given we already have mbstring and iconv, but it does provide features neither of those libraries provide: combined with the existing intl functionality and the ever-increasing need for internationalisation support, I'd like to think we might look at nudging intl towards being the usual way of providing that functionality in PHP. Thanks, Adam, who is apparently in a run-on sentence kind of mood today. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
After the updates it looks really good, very handy functionality to have. On Tue, Oct 30, 2012 at 6:18 PM, Sara Golemon poll...@php.net wrote: With the exception of renaming the UConverter::UCNV_* constants to remove the UCNV_ prefix, I believe I've addressed the concerns thus far. ((Waiting to hear if anyone else wants to weigh in on the contants)) The RFC has been updated accordingly.+ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
On 31 October 2012 06:18, Sara Golemon poll...@php.net wrote: With the exception of renaming the UConverter::UCNV_* constants to remove the UCNV_ prefix, I believe I've addressed the concerns thus far. ((Waiting to hear if anyone else wants to weigh in on the contants)) The RFC has been updated accordingly.+ I would say that unprefixing makes sense in general, particularly since that's what happens in other intl classes (such as Collator). Prefixing the callback reason constants with REASON_* seems like a good compromise there — as you said upthread, they are different to the other constants defined on UConverter. Beyond that, I think the type constants would do fine as direct UConverter constants (UConverter::UTF8, for instance, makes sense to me — in general, you're using a converter because you want to deal with encoding types). I'm +1 on the functionality in general. The thought of another encoding conversion API in PHP doesn't fill me with great joy given we already have mbstring and iconv, but it does provide features neither of those libraries provide: combined with the existing intl functionality and the ever-increasing need for internationalisation support, I'd like to think we might look at nudging intl towards being the usual way of providing that functionality in PHP. Thanks, Adam, who is apparently in a run-on sentence kind of mood today. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
hi Sara! On Tue, Oct 30, 2012 at 2:57 AM, Sara Golemon poll...@php.net wrote: http://wiki.php.net/rfc/uconverter Discuss! Nice job! Some comments, raw :) : - the UCNV prefix is not necessary, we are in the UConverter class already, I would simply use LATIN_1, or ILLEGAL for exampe. - It is OO, we should or could use exception. Warning are not nice to deal with and errors are quite common during conversion - I would add ERROR_ to the error related constants (UNASSIGNED co), or? Some questions, is 'latin1' the same than const UCNV_LATIN_1? or is there any relation between the two? Cheers, -- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
btw, can you add the changes to config.w32 too pls? Pretty much the same than in .m4, so I can add it to our CI and valid the RFC patch in a more handy manner :) On Tue, Oct 30, 2012 at 2:57 AM, Sara Golemon poll...@php.net wrote: http://wiki.php.net/rfc/uconverter Discuss! -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php -- Pierre @pierrejoye | http://blog.thepimp.net | http://www.libgd.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
Very nice work. Bravo! On 30/10/12 02:57, Sara Golemon wrote: http://wiki.php.net/rfc/uconverter Discuss! -- Ivan Enderlin Developer of Hoa http://hoa.42/ or http://hoa-project.net/ PhD. student at DISC/Femto-ST (Vesontio) and INRIA (Cassis) http://disc.univ-fcomte.fr/ and http://www.inria.fr/ Member of HTML and WebApps Working Group of W3C http://w3.org/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
Em 2012-10-30 2:57, Sara Golemon escreveu: http://wiki.php.net/rfc/uconverter Discuss! Good work. I do, however, have some concerns: * Error handling is done in a different way from the rest of the extension. Error handling in ext/intl is, admittedly, a little strange. Errors are both stored globally (to be retrieved by intl_get_error_code() and intl_get_error_message()) and in the object that raised them (through ::getErrorCode() and ::getErrorMessage()). The mechanism in ICU where one can do several calls in succession and check the error code only at the end is broken; the error codes are reset at the start of each call. The errors may raise any kind of PHP error (depending on the value of a INI setting, intl.error_level) or an exception (activated globally via intl.use_exceptions). On failure, functions/methods return false (constructors and factory methods return null). Unusual as this may be, it would be a bad idea to introduce inconsistency here. See the implementations in the other modules. * The tests have poor coverage. -- Gustavo Lopes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
- the UCNV prefix is not necessary, we are in the UConverter class already, I would simply use LATIN_1, or ILLEGAL for exampe. - I would add ERROR_ to the error related constants (UNASSIGNED co), or? For the UConverterType enum I could maybe get behind that. For UConverterCallbackReason I think we need something, but ERROR_ isn't universally applicable since three are just status incidents (UCNV_RESET, UCNV_CLOSE, UCNV_CLONE). Perhaps: REASON_* for these? In that vein, I'd be tempted to make the UConverterType values be TYPE_* - It is OO, we should or could use exception. Warning are not nice to deal with and errors are quite common during conversion We are. UConverterException is thrown everywhere but one place. That one exception is the static method I added to give a non-oop-like API for the most core functionality. Given its non-oopness, I left it non-exceptiony. See Gustvo's notes about error handling below. Some questions, is 'latin1' the same than const UCNV_LATIN_1? or is there any relation between the two? The UCNV_LATIN1 constant is a numeric value, so they're not the same, no. The relationship is that when you instantiate a UConverter with 'latin1' (or 'iso-8859-1', or any of its aliases), $conv-getSourceType() (or getDestinationType() as appropriate) will return UConverter::UCNV_LATIN_1. Each type in the UConverterType set uses a different algorithm approach to do the character conversion. Sets like latin1, utf*, ascii, etc... can be done faster than sets like 'koi8-r' because the conversions are simple bitshifts (very simple in the case of ascii/latin1). This just surfaces some info on what the converter is doing internally and isn't much use beyond a bit of logging, in practice. btw, can you add the changes to config.w32 too pls? Ah, right sorry, had meant to do that at the last minute. Will add that in a bit, Error handling is done in a different way from the rest of the extension. Unusual as this may be, it would be a bad idea to introduce inconsistency here. See the implementations in the other modules. I do agree, I was hoping I could slip that one by. :p I'll take another look at the rest of ext/intl and see about adapting my error handling. The tests have poor coverage. The important bits are hit, but yeah, it could stand to have better testing. I'll add some. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
Hi! http://wiki.php.net/rfc/uconverter Discuss! Looks nice. Some points: 1. transcode() accepts options, but there's no comparable way to set options to the object. I think these APIs should be synchronized. Imagine code keeping options in array/config object - it's be really annoying to have two separate procedures to feed these to object and to transcode(). Also, description of options would be helpful. 2. Shouldn't Enumeration and lookup methods be static? They look like independent from encodings and don't use the object. 3. For Advanced Use, I think no error condition should be the default and not requiring explicit action. 4. I think error reporting should match other intl functions. It'd not really be good if intl submodules would be all different in error reporting. 5. What is $source parameter for callbacks? 6. Why toUCallback returns string but fromUCallback gets codepoint as long? Shouldn't those be the same - i.e., if toU returns unicode codepoint, it should be long? Or it can return multiple codepoints? In which case it becomes confusing as we represent codepoints as both string and long in the same API. 7. Link to ICU API from the RFC would be helpful for reviewers and later docs, I think. -- Stanislav Malyshev, Software Architect SugarCRM: http://www.sugarcrm.com/ (408)454-6900 ext. 227 -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
1. transcode() accepts options, but there's no comparable way to set options to the object. I think these APIs should be synchronized. Imagine code keeping options in array/config object - it's be really annoying to have two separate procedures to feed these to object and to transcode(). transcode() having an $options parameter is to make up for the instance version (convert()) being able to set those via instance functions (setSubstChars()). I don't picture a given app using both convert() and transcode(), the latter only exists to placate those who are objectophobic. Also, description of options would be helpful. They're covered in the RFC: to_subst and from_subst under Simple Use 2. Shouldn't Enumeration and lookup methods be static? They look like independent from encodings and don't use the object. They are in the patch, I just forgot to note that in the RFC. Updated. 3. For Advanced Use, I think no error condition should be the default and not requiring explicit action. If you take no action at all, then an error still exists. This is consistent with the underlying API. 4. I think error reporting should match other intl functions. It'd not really be good if intl submodules would be all different in error reporting. Mentioned in previous feedback, I plan to look at this again. 5. What is $source parameter for callbacks? It's context for where in the conversion we are. $codeunit/$codepoint is the specific element causing the problem, $source is the string from that point forward. 6. Why toUCallback returns string but fromUCallback gets codepoint as long? Shouldn't those be the same - i.e., if toU returns unicode codepoint, it should be long? Or it can return multiple codepoints? In which case it becomes confusing as we represent codepoints as both string and long in the same API. Actually (I left this out of the RFC), they both can return a large number of types. In the case of toUCallback, you can return a utf-8 string (most reasonable Unicode representation to be returned as a char* string) and the callback mechanism will make that into UChars to put into the target string. You can return a long and it'll be treated as a single Unicode codepoint (One UChar for BMP, 2 for higher planes). You can also return an array of either of these types to specify a string in a readable, but unicode friendly format, e.g. array(Espa, 0x00F1 /* LATIN SMALL LETTER N WITH TILDE */, ol) would be equivalent to Espa\xC3\xB1ol. The same is true for fromUCallback() with the exception that the values being returned are assumed to be in the target encoding. For longs this means a single byte unsigned char which is appended to the target as-is. Similarly strings are appended as-is. As for input parameters: for toUCallback, $source and $codeUnits are still in their original encoding and presented as-is for that encoding. For fromUCallback(), the $source/$codePoint are in Unicode (UChar/UTF16 internall) and can't be directly offered to PHP without running into endianness issues. So the codepoint is provided as a single UChar32 (avoiding the surrogate problem in the process), and source is given as a series of UChar32 codepoints in a numerically indexed array. I'll add a section about callback input/return types to clarify this. 7. Link to ICU API from the RFC would be helpful for reviewers and later docs, I think. Added! -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
With the exception of renaming the UConverter::UCNV_* constants to remove the UCNV_ prefix, I believe I've addressed the concerns thus far. ((Waiting to hear if anyone else wants to weigh in on the contants)) The RFC has been updated accordingly.+ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php