Re: Resolving charset names with Encode

2004-10-24 Thread Nick Ing-Simmons
Bjoern Hoehrmann [EMAIL PROTECTED] writes:
> Hi,
>
>   What is currently the best way to resolve charset names to use them
> with Encode.pm? I would have expected that e.g.
>
>   Encode::decode('ebcdic-cp-us', '')
>
> would just work, but it does not appear to know that alias. Then I
> tried to use I18N::Charset as in
There are two parts to the problem:

1. The actual encoding map must exist.
2. There must be an alias from the name to the map.

Encode's charset list is largely based on the tables on the Unicode website,
with some additions, from its old Tcl/Tk roots, for rendering to various
font encodings.

EBCDIC encodings are probably a bit weak compared to, say, ICU (IBM being
_the_ EBCDIC shop and, if I remember correctly, the originator of ICU).

As with most open source projects, work gets done by volunteers with an
interest in, or need for, the function.

If you have authoritative tables and names, I am sure Dan would accept
patches to add charsets.


>   Encode::decode(I18N::Charset::enco_charset_name('ebcdic-cp-us'), '')
>
> which also fails. Then I tried something simpler using the cp037
> alias
>
>   Encode::decode('cp037', '')
>   Encode::decode(I18N::Charset::enco_charset_name('cp037'), '')
>
> both of which fail, too. In order to use the encoding with Encode it
> seems I have to use CP37, which is not registered in the IANA registry...
> So this does not seem to work very well.

The name Encode uses for an encoding was intended to follow this order
of preference:

1. What the main users (native writers/speakers) of the encoding call it.
2. MIME name
3. IANA name
4. De-facto name
5. Some other name? (But such an encoding seems obscure!)

But with the intent of having IANA and national-standard names as aliases.
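Which name Encode treats as canonical for a given label, and whether it knows the label at all, can be checked with Encode::resolve_alias, which returns the canonical name or a false value. A small sketch against a current Encode; the label list is arbitrary, and the 2004 releases discussed in this thread may answer differently:

```perl
use strict;
use warnings;
use Encode ();

# Map a few charset labels to the canonical names Encode uses internally.
# resolve_alias() returns a false value for labels it cannot resolve.
my %canonical;
for my $label ('latin1', 'ISO-8859-1', 'cp37', 'no-such-charset') {
    $canonical{$label} = Encode::resolve_alias($label) || '(unknown)';
}

printf "%-16s => %s\n", $_, $canonical{$_} for sort keys %canonical;
```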

> It works better in other cases
> such as
>
>   Encode::decode('l1', '')   # fails
>   Encode::decode(I18N::Charset::enco_charset_name('l1'), '') # works
>
> Now I would have hoped there is a foo() in I18N::Charset that I could
> use as in
>
>   foreach my $name (I18N::Charset::foo())
>   {
>     my $alias = I18N::Charset::enco_charset_name($name);
>     Encode::Alias::define_alias($name => $alias) if defined $alias;
>   }
>
> that would make
>
>   Encode::decode('l1', '');
>
> work, but it seems that there is no such routine... What could be done
> to improve this? Ideally I would like to reduce my code to deal with
> this stuff to at most
>
>   use I18N::Charset qw(...);
>
> preferably less.
>
> regards.
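Since the foo() routine above does not exist, one workaround, sketched here under the assumption that the application keeps its own table of the labels it must accept, is to register those labels with Encode::Alias's define_alias at startup. The table entries are illustrative and are not taken from I18N::Charset; cp37 is the name Encode gives EBCDIC code page 037, as noted above.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);
use Encode::Alias;    # exports define_alias()

# Hypothetical application-local table: label seen in the wild => Encode name.
my %extra_aliases = (
    'ebcdic-cp-us' => 'cp37',          # IANA alias for EBCDIC code page 037
    'l1'           => 'iso-8859-1',    # IANA alias for ISO-8859-1
);

for my $label (sort keys %extra_aliases) {
    my $target = $extra_aliases{$label};
    # Only register an alias if Encode actually provides the target map.
    if (find_encoding($target)) {
        define_alias($label => $target);
    }
    else {
        warn "Encode has no encoding named '$target'; skipping '$label'\n";
    }
}

# "Hello" encoded in EBCDIC code page 037
my $hello = decode('ebcdic-cp-us', "\xC8\x85\x93\x93\x96");
print "$hello\n";
```

Newer aliases take precedence over older ones in Encode::Alias, so an application table like this can also override a mapping Encode already ships with.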



RE: Resolving charset names with Encode

2004-10-24 Thread Martin 'Kingpin' Thurn
  It seems to me that the main problem is that Encode does not use IANA
registered names.  And ebcdic-cp-us didn't work because of a bug in
I18N::Charset (sorry about that).
  The proper solution IMO is to use the add_enco_alias() function of
I18N::Charset.  In the meantime, I have studied the Encode documentation and
I have added some default aliases, and I will release a new version of
I18N::Charset soon.

 - - Martin

 -Original Message-
 From: Bjoern Hoehrmann [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 20, 2004 15:53
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Resolving charset names with Encode






Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Rafael Garcia-Suarez
Dan Kogai wrote:
> This makes perl-5.8.6 happy but the problem is that I have made
> Encode::utf8 so that it accepts fallback values like Encode::XS (upon
> the request by Bjoern Hoehrmann via RT).  Encode::utf8 used to return
> immediately at partial character but now Encode::RETURN_ON_ERR is
> required, meaning those who installed Encode-2.07 on older perl are in
> trouble w/ PerlIO.  So I am looking for a solution which does that
> without tweaking PerlIO::encoding.

Welcome to backward compatibility hell :)

> I just want Encode::utf8->decode() to make sure Encode::RETURN_ON_ERR is
> on when the caller is PerlIO::encoding...

Or, one could backport PerlIO::encoding (with your patch) to CPAN and
require this latest version for Encode 2.08.


Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Dan Kogai
On Oct 24, 2004, at 18:34, Rafael Garcia-Suarez wrote:
> Welcome to backward compatibility hell :)

Hell it was, but it seems I have come up with a way out (yay).

>> I just want Encode::utf8->decode() to make sure Encode::RETURN_ON_ERR
>> is on when the caller is PerlIO::encoding...
> Or, one could backport PerlIO::encoding (with your patch) to CPAN and
> require this latest version for Encode 2.08.

That was what first came to mind, but I found it was not enough to
coerce Encode::RETURN_ON_ERR, since $PerlIO::encoding::fallback is open
to the public (even documented!).

So far ->renew() is only used by PerlIO (and is meaningful only when
the object is Encode::Unicode).  In other words, you can tell it's
PerlIO that is calling you if the object is renewed.

The following patch does that.  The new Encode::utf8->decode() checks
$self->renewed and, if it is true, sets Encode::RETURN_ON_ERR.  Here is
the patch, or you can wait for Encode-2.08.

Thankfully Encode::XS needs no real ->renew, so it is left as is
(a dummy ->renewed() was introduced just to be safe).
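The idea can be modelled in a few lines of plain Perl. The class below is a hypothetical stand-in, not Encode's code: renew() hands out a private clone with a counter bumped, renewed() reports that counter as a number, and the helper effective_check() (also hypothetical) shows how a decoder could add Encode::RETURN_ON_ERR to its CHECK flags only when it is working on a renewed, presumably PerlIO-owned, object.

```perl
use strict;
use warnings;
use Encode ();    # only needed for the RETURN_ON_ERR constant

package My::Encoding;    # hypothetical stand-in for an Encode::Encoding subclass

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Hand out a private copy, counting how many times it has been renewed.
sub renew {
    my $self  = shift;
    my $clone = bless {%$self} => ref($self);
    $clone->{renewed}++;
    return $clone;
}

# Return 0 rather than undef so callers can treat the value as a number.
sub renewed { return $_[0]->{renewed} || 0 }

# CHECK flags a decoder would use: a renewed object is assumed to belong
# to PerlIO, so a partial character should simply stop decoding.
sub effective_check {
    my ($self, $check) = @_;
    $check |= Encode::RETURN_ON_ERR() if $self->renewed;
    return $check;
}

package main;

my $shared = My::Encoding->new(Name => 'demo');
my $io     = $shared->renew;    # what PerlIO::encoding would do
my $io2    = $io->renew;
```

The shared object returned by a name lookup is never marked, so ordinary decode() callers keep whatever CHECK value they asked for.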

Dan the Encode Maintainer
diff -ruN ext/Encode-2.07/Encode.xs ext/Encode/Encode.xs
--- ext/Encode-2.07/Encode.xs   Sat Oct 23 04:37:13 2004
+++ ext/Encode/Encode.xs   Sun Oct 24 20:31:06 2004
@@ -252,14 +252,6 @@
 PROTOTYPES: DISABLE
 void
-Method_renew(obj)
-SV *   obj
-CODE:
-{
-XSRETURN(1);
-}
-
-void
 Method_decode_xs(obj,src,check = 0)
 SV *   obj
 SV *   src
@@ -270,6 +262,28 @@
 U8 *s = (U8 *) SvPV(src, slen);
 U8 *e = (U8 *) SvEND(src);
 SV *dst = newSV(slen>0?slen:1); /* newSV() abhors 0 -- inaba */
+
+/*
+ * PerlO check -- we assume the object is of PerlIO if renewed
+ * and if so, we set RETURN_ON_ERR for partial character
+ */
+int renewed = 0;
+dSP; ENTER; SAVETMPS;
+PUSHMARK(sp);
+XPUSHs(obj);
+PUTBACK;
+if (call_method("renewed",G_SCALAR) == 1) {
+   SPAGAIN;
+   renewed = POPi;
+   PUTBACK;
+#if 0
+   fprintf(stderr, "renewed == %d\n", renewed);
+#endif
+   if (renewed){ check |= ENCODE_RETURN_ON_ERR; }
+}
+FREETMPS; LEAVE;
+/* end PerlIO check */
+
 SvPOK_only(dst);
 SvCUR_set(dst,0);
 if (SvUTF8(src)) {
@@ -397,6 +411,14 @@
 {
 XSRETURN(1);
 }
+
+int
+Method_renewed(obj)
+SV *obj
+CODE:
+RETVAL = 0;
+OUTPUT:
+RETVAL
 void
 Method_name(obj)
diff -ruN ext/Encode-2.07/Unicode/Unicode.pm ext/Encode/Unicode/Unicode.pm
--- ext/Encode-2.07/Unicode/Unicode.pm  Sat Oct 23 04:37:17 2004
+++ ext/Encode/Unicode/Unicode.pm   Sun Oct 24 20:38:16 2004
@@ -46,7 +46,7 @@
 my $self = shift;
 $BOM_Unknown{$self->name} or return $self;
 my $clone = bless { %$self } => ref($self);
-$clone->{clone} = 1; # so the caller knows it is renewed.
+$clone->{clone}++ # so the caller knows it is renewed.
 return $clone;
 }

diff -ruN ext/Encode-2.07/lib/Encode/Encoding.pm ext/Encode/lib/Encode/Encoding.pm
--- ext/Encode-2.07/lib/Encode/Encoding.pm  Sat Oct 23 04:37:13 2004
+++ ext/Encode/lib/Encode/Encoding.pm   Sun Oct 24 20:25:13 2004
@@ -5,6 +5,7 @@

 require Encode;
+sub DEBUG { 0 }
 sub Define
 {
 my $obj = shift;
@@ -16,7 +17,18 @@
 sub name  { return shift->{'Name'} }
-sub renew { return $_[0] }
+# sub renew { return $_[0] }
+
+sub renew {
+my $self = shift;
+my $clone = bless { %$self } => ref($self);
+$clone->{renewed}++; # so the caller can see it
+DEBUG and warn $clone->{renewed};
+return $clone;
+}
+
+sub renewed{ return $_[0]->{renewed} || 0 }
+
 *new_sequence = \&renew;
 sub needs_lines { 0 };
@@ -167,24 +179,28 @@
 Predefined As:
-  sub renew { return $_[0] }
+  sub renew {
+my $self = shift;
+my $clone = bless { %$self } => ref($self);
+$clone->{renewed}++;
+return $clone;
+  }
 This method reconstructs the encoding object if necessary.  If you need
 to store the state during encoding, this is where you clone your object.
-Here is an example:
-
-  sub renew {
-  my $self = shift;
-  my $clone = bless { %$self } => ref($self);
-  $clone->{clone} = 1; # so the caller can see it
-  return $clone;
-  }
-
-Since most encodings are stateless the default behavior is just return
-itself as shown above.

 PerlIO ALWAYS calls this method to make sure it has its own private
 encoding object.
+
+=item -E<gt>renewed
+
+Predefined As:
+
+  sub renewed { $_[0]->{renewed} || 0 }
+
+Tells whether the object is renewed (and how many times).  Some
+modules emit C<Use of uninitialized value in null operation> warning
+unless the value is numeric so return 0 for false.
 =item -E<gt>perlio_ok()


Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Dan Kogai
On Oct 24, 2004, at 20:50, Dan Kogai wrote:
> The following patch does that.  The new Encode::utf8->decode() checks
> $self->renewed and if so it sets Encode::RETURN_ON_ERR.  Here is the
> patch or you can wait for Encode-2.08.

One patch to Unicode/Unicode.xs was missing and Unicode/Unicode.pm was
garbled.  Here we go again: the patch against 2.07.  Forget the
previous patch.

Or wait for Encode-2.08
Dan the Encode Maintainer
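The Unicode.xs part of this patch is what makes renewing matter for BOM-sniffing encodings: a renewed UTF-16 object stores the byte order it learned from the BOM, so later chunks of the same stream, which carry no BOM, are decoded the same way. A small sketch against a current Encode, assuming the renewed-object behaviour introduced here is still in place; the two chunks are made-up sample data:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# "UTF-16" (without BE/LE) has to learn its byte order from a BOM.
my $stream = find_encoding('UTF-16')->renew;    # private clone, as PerlIO gets

# First chunk: little-endian BOM (FF FE), then "A" as 41 00.
my $chunk1 = "\xFF\xFEA\x00";
my $first  = $stream->decode($chunk1);

# Second chunk of the same stream: no BOM, just "B" as 42 00.
# The renewed object is expected to reuse the little-endian order.
my $chunk2 = "B\x00";
my $second = $stream->decode($chunk2);
```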
diff -ruN ext/Encode-2.07/Encode.xs ext/Encode/Encode.xs
--- ext/Encode-2.07/Encode.xs   Sat Oct 23 04:37:13 2004
+++ ext/Encode/Encode.xs   Sun Oct 24 20:31:06 2004
@@ -252,14 +252,6 @@
 PROTOTYPES: DISABLE
 void
-Method_renew(obj)
-SV *   obj
-CODE:
-{
-XSRETURN(1);
-}
-
-void
 Method_decode_xs(obj,src,check = 0)
 SV *   obj
 SV *   src
@@ -270,6 +262,28 @@
 U8 *s = (U8 *) SvPV(src, slen);
 U8 *e = (U8 *) SvEND(src);
 SV *dst = newSV(slen>0?slen:1); /* newSV() abhors 0 -- inaba */
+
+/*
+ * PerlO check -- we assume the object is of PerlIO if renewed
+ * and if so, we set RETURN_ON_ERR for partial character
+ */
+int renewed = 0;
+dSP; ENTER; SAVETMPS;
+PUSHMARK(sp);
+XPUSHs(obj);
+PUTBACK;
+if (call_method("renewed",G_SCALAR) == 1) {
+   SPAGAIN;
+   renewed = POPi;
+   PUTBACK;
+#if 0
+   fprintf(stderr, "renewed == %d\n", renewed);
+#endif
+   if (renewed){ check |= ENCODE_RETURN_ON_ERR; }
+}
+FREETMPS; LEAVE;
+/* end PerlIO check */
+
 SvPOK_only(dst);
 SvCUR_set(dst,0);
 if (SvUTF8(src)) {
@@ -397,6 +411,14 @@
 {
 XSRETURN(1);
 }
+
+int
+Method_renewed(obj)
+SV *obj
+CODE:
+RETVAL = 0;
+OUTPUT:
+RETVAL
 void
 Method_name(obj)
diff -ruN ext/Encode-2.07/Unicode/Unicode.pm ext/Encode/Unicode/Unicode.pm
--- ext/Encode-2.07/Unicode/Unicode.pm  Sat Oct 23 04:37:17 2004
+++ ext/Encode/Unicode/Unicode.pm   Sun Oct 24 21:20:22 2004
@@ -46,7 +46,7 @@
 my $self = shift;
 $BOM_Unknown{$self->name} or return $self;
 my $clone = bless { %$self } => ref($self);
-$clone->{clone} = 1; # so the caller knows it is renewed.
+$clone->{renewed}++; # so the caller knows it is renewed.
 return $clone;
 }

diff -ruN ext/Encode-2.07/Unicode/Unicode.xs ext/Encode/Unicode/Unicode.xs
--- ext/Encode-2.07/Unicode/Unicode.xs  Sat Oct 23 04:37:21 2004
+++ ext/Encode/Unicode/Unicode.xs   Sun Oct 24 21:20:22 2004
@@ -1,5 +1,5 @@
 /*
- $Id: Unicode.xs,v 2.0 2004/05/16 20:55:16 dankogai Exp $
+ $Id: Unicode.xs,v 2.0 2004/05/16 20:55:16 dankogai Exp dankogai $
  */

 #define PERL_NO_GET_CONTEXT
@@ -97,7 +97,7 @@
 U8 endian   = *((U8 *)SvPV_nolen(attr("endian", 6)));
 int size    = SvIV(attr("size",   4));
 int ucs2    = SvTRUE(attr("ucs2",   4));
-int clone   = SvTRUE(attr("clone",  5));
+int renewed = SvTRUE(attr("renewed",  7));
 SV *result  = newSVpvn("",0);
 STRLEN ulen;
 U8 *s = (U8 *)SvPVbyte(str,ulen);
@@ -124,7 +124,7 @@
}
 #if 1
/* Update endian for next sequence */
-   if (clone) {
+   if (renewed) {
hv_store((HV *)SvRV(obj),"endian",6,newSVpv((char *)&endian,1),0);
}
 #endif
@@ -200,7 +200,7 @@
 U8 endian   = *((U8 *)SvPV_nolen(attr("endian", 6)));
 int size    = SvIV(attr("size",   4));
 int ucs2    = SvTRUE(attr("ucs2",   4));
-int clone   = SvTRUE(attr("clone",  5));
+int renewed = SvTRUE(attr("renewed",  7));
 SV *result  = newSVpvn("",0);
 STRLEN ulen;
 U8 *s = (U8 *)SvPVutf8(utf8,ulen);
@@ -211,7 +211,7 @@
enc_pack(aTHX_ result,size,endian,BOM_BE);
 #if 1
/* Update endian for next sequence */
-   if (clone){
+   if (renewed){
hv_store((HV *)SvRV(obj),"endian",6,newSVpv((char *)&endian,1),0);
}
 #endif
diff -ruN ext/Encode-2.07/lib/Encode/Encoding.pm ext/Encode/lib/Encode/Encoding.pm
--- ext/Encode-2.07/lib/Encode/Encoding.pm  Sat Oct 23 04:37:13 2004
+++ ext/Encode/lib/Encode/Encoding.pm   Sun Oct 24 20:25:13 2004
@@ -5,6 +5,7 @@

 require Encode;
+sub DEBUG { 0 }
 sub Define
 {
 my $obj = shift;
@@ -16,7 +17,18 @@
 sub name  { return shift->{'Name'} }
-sub renew { return $_[0] }
+# sub renew { return $_[0] }
+
+sub renew {
+my $self = shift;
+my $clone = bless { %$self } => ref($self);
+$clone->{renewed}++; # so the caller can see it
+DEBUG and warn $clone->{renewed};
+return $clone;
+}
+
+sub renewed{ return $_[0]->{renewed} || 0 }
+
 *new_sequence = \&renew;
 sub needs_lines { 0 };
@@ -167,24 +179,28 @@
 Predefined As:
-  sub renew { return $_[0] }
+  sub renew {
+my $self = shift;
+my $clone = bless { %$self } => ref($self);
+$clone->{renewed}++;
+return $clone;
+  }
 This method reconstructs the encoding object if necessary.  If you need
 to store the state during encoding, this is where you clone your object.
-Here is an example:
-
-  sub renew {
-  my $self = shift;
-  my $clone = bless { %$self } => ref($self);
-  $clone->{clone} = 1; # so the caller can see it
-  return 

[Encode] 2.08 released

2004-10-24 Thread Dan Kogai
Porters,
> On Oct 24, 2004, at 20:50, Dan Kogai wrote:
>> The following patch does that.  The new Encode::utf8->decode() checks
>> $self->renewed and if so it sets Encode::RETURN_ON_ERR.  Here is the
>> patch or you can wait for Encode-2.08.
> One patch to Unicode/Unicode.xs was missing and Unicode/Unicode.pm was
> garbled. Here we go again, the patch against 2.07.  Forget the
> previous patch.
>
> Or wait for Encode-2.08
And here comes Encode-2.08.  If you are by any chance using 
Encode-2.07, upgrade RIGHT NOW!

=head1 Tested

As follows:

 Perl 5.8.3 on Mac OS X v10.3.5 (/usr/bin/perl, post-built as in CPAN)
 Perl 5.8.5 on Mac OS X v10.3.5 (post-built)
            on FreeBSD 4.10-STABLE (post-built)
 bleadperl  on Mac OS X v10.3.5 (integrally built w/ whole perl dist)
            on FreeBSD 4.10-STABLE (integrally built)

=head1 Availability

http://www.dan.co.jp/~dankogai/cpan/Encode-2.08.tar.gz
or CPAN near you

=head1 Changes

$Revision: 2.8 $ $Date: 2004/10/24 13:00:29 $
! Encode.xs lib/Encode/Encoding.pm  Unicode/Unicode.{pm,xs}
  Resolved the Encode::utf8 fallbacks vs. PerlIO::encoding issue that
  was introduced in 2.07.  This is done by making use of the ->renew()
  method, which used to be used only by Encode::Unicode.  A ->renewed()
  method was also introduced to fetch the value thereof.
  Message-Id: [EMAIL PROTECTED]

=head1 Epilogue

Enjoy!

Dan the Encode Maintainer


Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes:
> On Oct 23, 2004, at 01:04, Bjoern Hoehrmann wrote:
>> C12a in Unicode 4.0.1 notes
>>
>> [...]
>>   For example, in UTF-8 every code unit of the form 110xxxxx must be
>>   followed by a code unit of the form 10xxxxxx. A sequence such as
>>   110xxxxx 0xxxxxxx is ill-formed and must never be generated. When
>>   faced with this ill-formed code unit sequence while transforming or
>>   interpreting text, a conformant process must treat the first code unit
>>   110xxxxx as an illegally terminated code unit sequence--for example,
>>   by signaling an error, filtering the code unit out, or representing
>>   the code unit with a marker such as U+FFFD
>> [...]
>> [snip]
>
> Okay, you win.  You have convinced me that Encode::utf8 should behave
> the same as Encode::XS (UCM-based encodings).  And the patch to make
> it that way is deceptively simple, as follows:

I think \xF6r is indeed wrong.

But, as Dan said at the start, \xF6 on its own (say, as octet 1023
in a 0..1023, 1024-octet buffer) is not a failure.
Changing that will cause problems for the :encoding() layer, as buffer
boundaries can occur in the middle of characters.
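The buffer-boundary case can be reproduced with Encode's documented Encode::FB_QUIET mode, which returns whatever it could decode and leaves the unprocessed tail (here, a split character) in the source variable for the next round. A sketch against a current Encode, with an artificially tiny 3-octet read size; the read size and sample text are arbitrary choices:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Simulate reading a UTF-8 stream through a tiny fixed-size buffer, so that
# multi-byte characters are regularly split across reads.
my $text   = "Bj\x{F6}rn \x{3042}\x{3044}";    # 2- and 3-octet characters
my $octets = encode('UTF-8', $text);

my ($pending, $decoded) = ('', '');
for (my $pos = 0; $pos < length $octets; $pos += 3) {
    $pending .= substr($octets, $pos, 3);      # "read" up to 3 more octets
    # FB_QUIET: decode what is complete; the split tail stays in $pending.
    $decoded .= decode('UTF-8', $pending, Encode::FB_QUIET());
}
```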



> ===
> RCS file: Encode.xs,v
> retrieving revision 2.0
> diff -u -r2.0 Encode.xs
> --- Encode.xs   2004/05/16 20:55:15 2.0
> +++ Encode.xs   2004/10/22 18:00:29
> @@ -297,7 +297,7 @@
>  U8 skip = UTF8SKIP(s);
>  if ((s + skip) > e) {
>  /* Partial character - done */
> -   break;
> +   goto decode_utf8_fallback;
>  }
>  else if (is_utf8_char(s)) {
>  /* Whole char is good */
> @@ -313,6 +313,7 @@
>  /* Invalid start byte */
>  }
>  /* If we get here there is something wrong with alleged UTF-8 */
> +decode_utf8_fallback:
>  if (check & ENCODE_DIE_ON_ERR){
>  Perl_croak(aTHX_ ERR_DECODE_NOMAP, "utf8", (UV)*s);
>  XSRETURN(0);
>
> ===

> The most decisive comment of yours is this:
>
>> holds true and I expect that
>>
>>   my $x = "Bj\xF6rn"; # as well as "Bj\xF6r" and "Bj\xF6"
>>   decode("utf-8", $x, Encode::FB_CROAK);
>>
>> croaks.
>
> Which apparently it did not.  Thank you for being so persistent on this
> problem.  I'd be honored to add your name to the AUTHORS file for this.
>
> I will $Encode::VERSION++ as soon as I am done w/ the test suites and
> Tel's patch.  This time I will be careful not to screw up
> (maint|blead)perl, so give me some time before the update is ready (but
> I won't keep you waiting for too long since the 5.8.6 deadline is soon).
>
>> Your statement about \xF6\x80\x80\x80 is interesting; Encode::is_utf8
>> is documented as
>>
>> [...]
>>   is_utf8(STRING [, CHECK])
>> [INTERNAL] Tests whether the UTF-8 flag is turned on in the STRING.
>> If CHECK is true, also checks the data in STRING for being
>> well-formed UTF-8. Returns true if successful, false otherwise.
>> [...]
>>
>> And D36 in Unicode 4.0.1 is very clear that
>>
>> [...]
>>   As a consequence of the well-formedness conditions specified in Table
>>   3-6, the following byte values are disallowed in UTF-8: C0-C1, F5-FF.
>> [...]

> That's because perl's notion of Unicode is broader than that of
> unicode.org.  So far Unicode.org's mapping only spans U+0000 to
> U+10FFFF, while perl's goes well beyond that (up to MAX_UINT).  See
> Camel 3 for details.
>
> And I think we can leave this :)

Dan the Encode Maintainer
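For a stand-alone decode() call, as opposed to a PerlIO layer, Bjoern's expectation is the behaviour one would test for: with Encode::FB_CROAK, all three of his inputs should die, the trailing lone \xF6 included, because nothing signals that more input is coming. A sketch against a current Encode, using the strict 'UTF-8' encoding rather than Perl's lax 'utf8':

```perl
use strict;
use warnings;
use Encode qw(decode);

# Latin-1 octets that are not well-formed UTF-8 (0xF6 never starts a valid sequence).
my @inputs = ("Bj\xF6rn", "Bj\xF6r", "Bj\xF6");

my %croaked;
for my $octets (@inputs) {
    my $copy = $octets;    # decode() with a CHECK value may modify its argument
    my $ok   = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    $croaked{$octets} = $ok ? 0 : 1;
    printf "%-16s %s\n", join(' ', map { sprintf '%02X', ord } split //, $octets),
        $ok ? 'decoded' : 'croaked';
}
```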



Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-24 Thread Dan Kogai
On Oct 25, 2004, at 03:01, Nick Ing-Simmons wrote:
> But as Dan said at the start \xF6 on its own (say as 1023 octet
> in a 0..1023 1024-octet buffer is not a fail.
> Changing that will make :encoding() layer have problems as buffer
> boundaries can occur in the middle of characters.

Right.  Encode-2.07 indeed had the problem, causing bleadperl to fail
ext/PerlIO/t/encoding.t, test 14.  Encode-2.08 corrected the problem
by checking whether the caller is PerlIO and, if so, setting
Encode::RETURN_ON_ERR so that it breaks out of the loop in the partial
character case.

I believe I have checked & tested enough, but I would appreciate it if
you guys took a look, especially at Encode.xs and t/fallback.t.

Dan the Encode Maintainer
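ext/PerlIO/t/encoding.t itself is not reproduced here; the following is only a rough smoke test in its spirit: read a long run of three-octet characters through an :encoding(UTF-8) layer and check that nothing is lost or mangled when PerlIO's buffer boundaries split characters. Whether, and where, a boundary actually falls inside a character depends on buffer sizes, so a pass is a sanity check, not proof.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text   = "\x{3042}" x 5000;         # HIRAGANA LETTER A, 3 octets each in UTF-8
my $octets = encode('UTF-8', $text);    # 15000 octets, not a multiple of typical buffer sizes

# Read the octets back through the :encoding layer from an in-memory file.
open my $in, '<:encoding(UTF-8)', \$octets or die "open: $!";
my $read_back = do { local $/; <$in> };
close $in;
```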


Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes:
> On Oct 24, 2004, at 06:41, Rafael Garcia-Suarez wrote:
>> Dan Kogai wrote:
>>> Within less than 24hrs I resorted to releasing version 2.07.  What the
>>> heck.  5.8.6 is soon
>>
>> I applied 2.07 to bleadperl, and it looks like something is broken in
>> PerlIO::encoding.
>> More precisely, ext/PerlIO/t/encoding.t fails test 14, which tests
>> open(F,'<:encoding(utf-8)',$threebyte).
>
> The easiest solution is the patch below:
>
> --- ext/PerlIO/encoding/encoding.pm.dist   Sat May 24 00:38:36 2003
> +++ ext/PerlIO/encoding/encoding.pm        Sun Oct 24 13:38:45 2004
> @@ -12,7 +12,7 @@
>   use XSLoader ();
>   XSLoader::load(__PACKAGE__, $VERSION);
>
> -our $fallback = Encode::PERLQQ()|Encode::WARN_ON_ERR();
> +our $fallback = Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::RETURN_ON_ERR();
>
>   1;
>   __END__
>
> This makes perl-5.8.6 happy but the problem is that I have made
> Encode::utf8 so that it accepts fallback values like Encode::XS (upon
> the request by Bjoern Hoehrmann via RT).

That is worthwhile, but Encode::XS already has a mechanism for partial
characters: return any complete characters and leave the partial
character in the input SV.

> Encode::utf8 used to return
> immediately at partial character but now Encode::RETURN_ON_ERR is
> required, meaning those who installed Encode-2.07 on older perl are in
> trouble w/ PerlIO.  So I am looking for a solution which does that
> without tweaking PerlIO::encoding.

But in my humble opinion a partial character is NOT an error.
The snag is that, with that flag set, when a REAL error comes along
it will just return :-(
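The snag is easy to see with a stand-alone decode() call: under Encode::FB_QUIET (which is RETURN_ON_ERR) a genuinely malformed byte stops decoding exactly the way a truncated character does, so a caller looking only at the return value and the leftover octets cannot tell the two apart. A short sketch against a current Encode; the two sample inputs are arbitrary:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Case 1: a truncated character at the end (more input might complete it).
my $partial      = "Bj\xC3";
my $from_partial = decode('UTF-8', $partial, Encode::FB_QUIET());

# Case 2: a real error (0xF6 is never valid in UTF-8), followed by more text.
my $broken      = "Bj\xF6rn";
my $from_broken = decode('UTF-8', $broken, Encode::FB_QUIET());

# In both cases decoding simply stops after "Bj" and the rest stays behind.
printf "partial: decoded %d chars, %d octets left\n", length $from_partial, length $partial;
printf "broken:  decoded %d chars, %d octets left\n", length $from_broken,  length $broken;
```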


> I just want Encode::utf8->decode() to make sure Encode::RETURN_ON_ERR is
> on when the caller is PerlIO::encoding...

Which would be a change (maybe an improvement, I am not sure), but a change.

Existing :encoding uses may be expecting Encode to croak/warn/replace
bad characters.



> Dan the Encode Maintainer



Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Rafael Garcia-Suarez [EMAIL PROTECTED] writes:
> Dan Kogai wrote:
>> This makes perl-5.8.6 happy but the problem is that I have made
>> Encode::utf8 so that it accepts fallback values like Encode::XS (upon
>> the request by Bjoern Hoehrmann via RT).  Encode::utf8 used to return
>> immediately at partial character but now Encode::RETURN_ON_ERR is
>> required, meaning those who installed Encode-2.07 on older perl are in
>> trouble w/ PerlIO.  So I am looking for a solution which does that
>> without tweaking PerlIO::encoding.
>
> Welcome to backward compatibility hell :)
>
>> I just want Encode::utf8->decode() to make sure Encode::RETURN_ON_ERR is
>> on when the caller is PerlIO::encoding...
>
> Or, one could backport PerlIO::encoding (with your patch) to CPAN and
> require this latest version for Encode 2.08.

You could. But I designed the two together, and partial characters
were not errors; they were distinguished in a way that is part of
Encode::XS behaviour.



Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes:
> On Oct 24, 2004, at 18:34, Rafael Garcia-Suarez wrote:
>> Welcome to backward compatibility hell :)
>
> Hell it was but seems like I came up with a way out (yay).
>
>>> I just want Encode::utf8->decode() to make sure Encode::RETURN_ON_ERR
>>> is on when the caller is PerlIO::encoding...
>>
>> Or, one could backport PerlIO::encoding (with your patch) to CPAN and
>> require this latest version for Encode 2.08.
>
> That was what came across my mind first but I found it was not good
> enough to coerce Encode::RETURN_ON_ERR since $PerlIO::encoding::fallback
> is open to the public (even documented!).
>
> So far ->renew() is only used by PerlIO (and is meaningful only when
> the object is Encode::Unicode).

And certain other bits of code written by its original author ;-)

> In other words, you can tell it's
> PerlIO that is calling you if the object is renewed.
>
> The following patch does that.  The new Encode::utf8->decode() checks
> $self->renewed and if so it sets Encode::RETURN_ON_ERR.  Here is the
> patch or you can wait for Encode-2.08.

Will try and find time to do a proper patch to the utf-8 decoder.



> Thankfully Encode::XS needs no real ->renew so it is left as is
> (dummy ->renewed() was introduced just to be safe).
>
> Dan the Encode Maintainer
