Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nicholas Clark

On Mon, Nov 04, 2002 at 08:11:04PM +0900, Dan Kogai wrote:
> NC and porters,
> 
> First of all, this is a great patch.  Not only does it optimize the 
> resulting shlibs, it also seems to consume less memory during compilation.

Thanks. I wasn't actually trying to reduce memory usage during compilation
(either running the perl script or running the C compiler).

The only change explicitly aimed at memory and CPU usage for the perl
script was this one:
-   # We have a single long line. Split it at convenient commas.
-   $definition =~ s/(.{74,77},)/$1\n/g;
-   print $fh "$definition };\n\n";
+  # We have a single long line. Split it at convenient commas.
+  print $fh $1, "\n" while $definition =~ /\G(.{74,77},)/gcs;
+  print $fh substr ($definition, pos $definition), " };\n";

and I was bad and didn't actually benchmark its effects.
[instead of doing everything in memory, re-copying the remainder of the
string every time the s///g adds a newline, the revised version prints the
sections of the string out as it goes (and lets the IO system worry about
aggregating the sections into one stream)]
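
To make the difference concrete, here is a toy standalone version of the two
approaches, with a made-up $definition and output going to STDOUT rather
than the generated .c file (not the real enc2xs code):

    my $definition = join '', map { sprintf '0x%02x,', $_ % 256 } 0 .. 9999;

    # old way: each substitution re-copies the tail of the string, so the
    # total work grows roughly quadratically with length($definition)
    my $copy = $definition;
    $copy =~ s/(.{74,77},)/$1\n/g;
    print $copy, " };\n";

    # new way: walk the string with \G, printing each chunk as we go;
    # nothing is copied except the chunk currently being printed
    print $1, "\n" while $definition =~ /\G(.{74,77},)/gcs;
    print substr ($definition, pos $definition), " };\n";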

> Thank you, NC.

It's not a problem. It allowed me to put off doing other stuff :-)
[such as actually writing the book review I am supposed to be doing for
http://london.pm.org/reviews/ ]

On Mon, Nov 04, 2002 at 01:42:58PM +, Nick Ing-Simmons wrote:
> Dan Kogai <[EMAIL PROTECTED]> writes:
> >On Monday, Nov 4, 2002, at 19:17 Asia/Tokyo, Nick Ing-Simmons wrote:
> >> Someone could/should write a generic test that pushes all codepoints
> >> supported by a .ucm file both ways through the generated encoder
> >> and checks for correctness. This would be a pointless thing to do
> >> as part of perl's "make test" as once the "compiler" works it works,
> >> but would be useful for folk working on the compile process.
> >
> >That is already in as t/rt.pl.  Since the test takes a long time (30 
> >seconds on my PowerBook G4 800MHz) it is not part of the standard 'make 
> >test' suite.  The NC patch passes all of it.
> 
> Excellent. 

Mr Burns couldn't have put it better. :-)

On Mon, Nov 04, 2002 at 08:19:57PM +0900, Dan Kogai wrote:
> On Monday, Nov 4, 2002, at 20:11 Asia/Tokyo, Dan Kogai wrote:
> > oh wait!  Encode.xs remains unchanged so Encode::* may still work
> 
> Confirmed.  The NC patch works w/ preexisting shlibs.

Good. It would have been worrying if it had not. The idea was not to change
any of the internal data structures visible to any code anywhere, just to
change how the U8 strings they point to are arranged.
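
One way to picture the rearrangement: entries can share storage whenever one
byte string already occurs inside a pooled block. A toy sketch of that
pooling idea (made-up data, nothing like the real enc2xs internals):

    my @strings = ('ABC', 'BC', 'XY');   # byte strings the entries point to
    my $block   = '';                    # one shared pool
    my @offset;
    for my $s (@strings) {
        my $pos = index $block, $s;      # reuse an existing occurrence...
        if ($pos < 0) {
            $pos = length $block;        # ...or append a new one
            $block .= $s;
        }
        push @offset, $pos;              # each entry keeps (offset, length)
    }
    # $block is now "ABCXY": "BC" shares storage inside "ABC", and every
    # entry still sees exactly the bytes it saw before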

Nicholas Clark
-- 
z-code better than perl?  http://www.perl.org/advocacy/spoofathon/



Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes:
>On Monday, Nov 4, 2002, at 19:17 Asia/Tokyo, Nick Ing-Simmons wrote:
>> Someone could/should write a generic test that pushes all codepoints
>> supported by a .ucm file both ways through the generated encoder
>> and checks for correctness. This would be a pointless thing to do
>> as part of perl's "make test" as once the "compiler" works it works,
>> but would be useful for folk working on the compile process.
>
>That is already in as t/rt.pl.  Since the test takes a long time (30 
>seconds on my PowerBook G4 800MHz) it is not part of the standard 'make 
>test' suite.  The NC patch passes all of it.

Excellent. 

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/




Re: [Encode] HEADS-UP: NC patch will be in

2002-11-04 Thread Dan Kogai
On Monday, Nov 4, 2002, at 20:11 Asia/Tokyo, Dan Kogai wrote:

> oh wait!  Encode.xs remains unchanged so Encode::* may still work


Confirmed.  The NC patch works w/ preexisting shlibs.


>perl -MEncode -e 'print Encode->VERSION, "\n"'
1.81 # not released, of course!
>perl -MEncode::HanExtra -e 1
>


Dan the Encode Maintainer




Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Dan Kogai
On Monday, Nov 4, 2002, at 19:17 Asia/Tokyo, Nick Ing-Simmons wrote:

> Someone could/should write a generic test that pushes all codepoints
> supported by a .ucm file both ways through the generated encoder
> and checks for correctness. This would be a pointless thing to do
> as part of perl's "make test" as once the "compiler" works it works,
> but would be useful for folk working on the compile process.


That is already in as t/rt.pl.  Since the test takes a long time (30 
seconds on my PowerBook G4 800MHz) it is not part of the standard 'make 
test' suite.  The NC patch passes all of it.

Dan the Encode Maintainer



[Encode] HEADS-UP: NC patch will be in

2002-11-04 Thread Dan Kogai
NC and porters,

  First of all, this is a great patch.  Not only does it optimize the 
resulting shlibs, it also seems to consume less memory during compilation.

On Monday, Nov 4, 2002, at 12:26 Asia/Tokyo, [EMAIL PROTECTED] wrote:
> Nicholas Clark <[EMAIL PROTECTED]> wrote:
> :I've been experimenting with how enc2xs builds the C tables that turn into the
> :shared objects. enc2xs is building tables (arrays of struct encpage_t) which
> :in turn have pointers to blocks of bytes.
>
> Great, you seem to be getting some excellent results.

Worked absolutely fine on my PowerBook G4, too.

Before:
  208948 Encode/Byte/Byte.bundle
 1984416 Encode/CN/CN.bundle
   30076 Encode/EBCDIC/EBCDIC.bundle
   33728 Encode/Encode.bundle
 2590420 Encode/JP/JP.bundle
 2208996 Encode/KR/KR.bundle
   39720 Encode/Symbol/Symbol.bundle
 1940288 Encode/TW/TW.bundle
   17892 Encode/Unicode/Unicode.bundle

After:
  178220 Encode/Byte/Byte.bundle
 1085116 Encode/CN/CN.bundle
   25336 Encode/EBCDIC/EBCDIC.bundle
   33604 Encode/Encode.bundle
 1308568 Encode/JP/JP.bundle
 1209804 Encode/KR/KR.bundle
   34896 Encode/Symbol/Symbol.bundle
 1059040 Encode/TW/TW.bundle
   17892 Encode/Unicode/Unicode.bundle


> I have also wondered whether the .ucm files are needed after these
> have been built; if not, we should consider supplying with perl only
> the optimised table data if that could give us a space saving in the
> distribution - it would cut build time significantly as well as
> allowing us to consider algorithms that take much longer over the
> table optimisation, since they need be run only once when we
> integrate updated .ucm files.


A trivial yet effective patch would be to strip all the comments therein.
That should save a dramatic amount of space, but *.ucm is, in a way, a
source, so I am not sure if I should go for it.
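
Assuming the comments follow the usual '#'-to-end-of-line .ucm convention, a
toy one-liner would do the stripping (hypothetical path, and it only drops
whole-line comments):

    perl -i.bak -ne 'print unless /^\s*#/' Encode/*.ucm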

Anyway, I am pretty much for integrating the NC patch, not just because it
reduces shlib sizes but also because it appears safer for compilers (one of
the optimizer features (AGGREGATE_TABLES) was dropped during the dev phase
of perl 5.8 for the sake of djgpp and other low-memory platforms).
Unfortunately I am at my parents' place this week (to finish the book I am
writing -- away from the kids), so I do not have as many resources for
extensive tests (the FreeBSD box I was using here at my parents' just died
(physically) the day before I came :-( ).

Another concern is that, since the patch changes the internal structure of
the shlibs, CPANized Encode::* modules need to be rebuilt as well, so the
released version needs to print a warning about that -- oh wait!
Encode.xs remains unchanged so Encode::* may still work

Thank you, NC.

Dan the Encode Maintainer



Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nick Ing-Simmons
Nicholas Clark <[EMAIL PROTECTED]> writes:
>I've not looked at what the
>Encode regression tests actually do, so I don't know how thoroughly they
>check whether the transformations are actually correct. In other words,
>done correctly this approach *will* generate the same transformation tables
>as before, and although I *think* I'm doing it correctly (without the -O;
>patches welcome) I'm not certain of this.

Someone could/should write a generic test that pushes all codepoints 
supported by a .ucm file both ways through the generated encoder 
and checks for correctness. This would be a pointless thing to do 
as part of perl's "make test" as once the "compiler" works it works,
but would be useful for folk working on the compile process.
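
A toy sketch of such a round-trip test, assuming Encode's encode/decode
interface and ICU-style mapping lines of the form <U0041> \x41 |0 in the
.ucm file (this is not the actual t/rt.pl):

    use strict;
    use Encode qw(encode decode);

    my ($name, $ucm) = @ARGV;   # e.g. iso-8859-2 ucm/8859-2.ucm
    open my $fh, '<', $ucm or die "$ucm: $!";
    while (<$fh>) {
        # check only the round-trippable mappings, marked |0
        next unless /^<U([0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|0\b/;
        my ($hex, $esc) = ($1, $2);
        my $uni = chr hex $hex;
        (my $oct = $esc) =~ s/\\x([0-9A-Fa-f]{2})/chr hex $1/ge;
        warn "encode broken for U+$hex\n" unless encode($name, $uni) eq $oct;
        warn "decode broken for U+$hex\n" unless decode($name, $oct) eq $uni;
    }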

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/




Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nick Ing-Simmons
<[EMAIL PROTECTED]> writes:
>Nicholas Clark <[EMAIL PROTECTED]> wrote:
>:I've been experimenting with how enc2xs builds the C tables that turn into the
>:shared objects. enc2xs is building tables (arrays of struct encpage_t) which
>:in turn have pointers to blocks of bytes.
>
>Great, you seem to be getting some excellent results.
>
>I have also wondered whether the .ucm files are needed after these
>have been built; if not, we should consider supplying with perl only
>the optimised table data if that could give us a space saving in the
>distribution - it would cut build time significantly as well as
>allowing us to consider algorithms that take much longer over the
>table optimisation, since they need be run only once when we
>integrate updated .ucm files.

The main reason the "compile" step was left in was that the tables are
different for UTF-EBCDIC machines. Otherwise just shipping the .C files used 
to generate the .so-s is a good idea - I was also struggling to come up 
with a "data" format that would be as efficient in space/time as the code
version.
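
For illustration, a toy sketch of where a packed "data" format loses
(hypothetical page layout, not a worked-out proposal): the blob has to be
re-parsed into live structures on every load, which is exactly the step
the compiled C tables avoid.

    # build time: flatten made-up (min, max, offset, length) pages
    my @pages = ([0x41, 0x5a, 0, 26], [0x61, 0x7a, 26, 26]);
    my $blob  = join '', map { pack 'CCNN', @$_ } @pages;

    # load time: every process pays to unpack the blob again
    my @loaded = map { [ unpack 'CCNN', $_ ] } $blob =~ /(.{10})/sg;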

If we eliminate .ucm-s from the distribution then I think it would be 
useful to have something which will print them back out from the internal 
form.

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/