Re: [not-yet-a-PATCH] compress Encode better
On Mon, Nov 04, 2002 at 08:11:04PM +0900, Dan Kogai wrote:
> NC and porters,
>
> First of all, this is a great patch.  Not only does it optimize the
> resulting shlibs, it seems to consume less memory during compilation.

Thanks. I wasn't actually trying to reduce memory usage during compilation
(either running the perl script, or running the C compiler). The only change
where I was explicitly thinking about memory and CPU usage for the perl
script was this one:

- # We have a single long line. Split it at convenient commas.
- $definition =~ s/(.{74,77},)/$1\n/g;
- print $fh "$definition };\n\n";
+ # We have a single long line. Split it at convenient commas.
+ print $fh $1, "\n" while $definition =~ /\G(.{74,77},)/gcs;
+ print $fh substr ($definition, pos $definition), " };\n";

and I was bad and didn't actually benchmark its effects.

[instead of doing things in memory, and constantly re-copying the remainder
of the string every time the s///g adds a newline, the revised version
prints the sections of string out (and lets the IO system worry about
aggregating sections into one string)]

> Thank you, NC.

It's not a problem. It allowed me to put off doing other stuff :-)
[such as actually writing the book review I am supposed to be doing for
http://london.pm.org/reviews/ ]

On Mon, Nov 04, 2002 at 01:42:58PM +, Nick Ing-Simmons wrote:
> Dan Kogai <[EMAIL PROTECTED]> writes:
> >On Monday, Nov 4, 2002, at 19:17 Asia/Tokyo, Nick Ing-Simmons wrote:
> >> Someone could/should write a generic test that pushes all codepoints
> >> supported by a .ucm file both ways through the generated encoder
> >> and checks for correctness. This would be a pointless thing to do
> >> as part of perl's "make test" as once the "compiler" works it works,
> >> but would be useful for folk working on the compile process.
> >
> >That is already in as t/rt.pl.  Since the test takes a long time (30
> >seconds on my PowerBook G4 800MHz) it is not a part of standard 'make
> >test' suite.  The NC Patch passes all that.
>
> Excellent.

Mr Burns couldn't have put it better. :-)

On Mon, Nov 04, 2002 at 08:19:57PM +0900, Dan Kogai wrote:
> On Monday, Nov 4, 2002, at 20:11 Asia/Tokyo, Dan Kogai wrote:
> > oh wait!  Encode.xs remains unchanged so Encode::* may still work
>
> Confirmed.  The NC patch works w/ preexisting shlibs.

Good. It would have been worrying if it had not. The idea was not to change
any of the internal data structures visible to any code anywhere, just to
change how the U8 strings they point to were arranged.

Nicholas Clark
--
z-code better than perl?        http://www.perl.org/advocacy/spoofathon/
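The bracketed aside above is the whole point of the diff, and the two idioms
are easy to compare side by side in a minimal sketch. The subroutine names
below are invented for illustration (this is not the enc2xs code itself),
and the "|| 0" guard on pos() is an addition to cover the degenerate case of
a string too short to produce any chunk:

    use strict;

    # Old approach: s///g rewrites the whole (possibly very long) string in
    # memory, re-copying everything after the match site each time a newline
    # is inserted, and only prints it once at the end.
    sub emit_via_substitution {
        my ($fh, $definition) = @_;
        $definition =~ s/(.{74,77},)/$1\n/g;
        print $fh "$definition };\n\n";
    }

    # New approach: walk the string with \G and print each 74-77 character
    # chunk as it is found, so the string itself is never rewritten; whatever
    # is left after the last full chunk is printed via substr/pos.
    sub emit_incrementally {
        my ($fh, $definition) = @_;
        print $fh $1, "\n" while $definition =~ /\G(.{74,77},)/gcs;
        print $fh substr($definition, pos($definition) || 0), " };\n";
    }

    # Example use:
    #   emit_incrementally(\*STDOUT, join ',', map { sprintf "0x%02x", $_ } 0..255);

Both produce the same output; the second just hands each chunk straight to
the IO layer instead of shuffling the tail of the string around in memory.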
Re: [not-yet-a-PATCH] compress Encode better
Dan Kogai <[EMAIL PROTECTED]> writes:
>On Monday, Nov 4, 2002, at 19:17 Asia/Tokyo, Nick Ing-Simmons wrote:
>> Someone could/should write a generic test that pushes all codepoints
>> supported by a .ucm file both ways through the generated encoder
>> and checks for correctness. This would be a pointless thing to do
>> as part of perl's "make test" as once the "compiler" works it works,
>> but would be useful for folk working on the compile process.
>
>That is already in as t/rt.pl.  Since the test takes a long time (30
>seconds on my PowerBook G4 800MHz) it is not a part of standard 'make
>test' suite.  The NC Patch passes all that.

Excellent.
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
Re: [Encode] HEADS-UP: NC patch will be in
On Monday, Nov 4, 2002, at 20:11 Asia/Tokyo, Dan Kogai wrote:
> oh wait!  Encode.xs remains unchanged so Encode::* may still work

Confirmed.  The NC patch works w/ preexisting shlibs.

>perl -MEncode -e 'print Encode->VERSION, "\n"'
1.81   # not released, of course!
>perl -MEncode::HanExtra -e 1
>

Dan the Encode Maintainer
Re: [not-yet-a-PATCH] compress Encode better
On Monday, Nov 4, 2002, at 19:17 Asia/Tokyo, Nick Ing-Simmons wrote:
> Someone could/should write a generic test that pushes all codepoints
> supported by a .ucm file both ways through the generated encoder
> and checks for correctness. This would be a pointless thing to do
> as part of perl's "make test" as once the "compiler" works it works,
> but would be useful for folk working on the compile process.

That is already in as t/rt.pl.  Since the test takes a long time (30
seconds on my PowerBook G4 800MHz) it is not a part of standard 'make
test' suite.  The NC Patch passes all that.

Dan the Encode Maintainer
[Encode] HEADS-UP: NC patch will be in
NC and porters,

First of all, this is a great patch.  Not only does it optimize the
resulting shlibs, it seems to consume less memory during compilation.

On Monday, Nov 4, 2002, at 12:26 Asia/Tokyo, [EMAIL PROTECTED] wrote:
> Nicholas Clark <[EMAIL PROTECTED]> wrote:
> :I've been experimenting with how enc2xs builds the C tables that turn into the
> :shared objects. enc2xs is building tables (arrays of struct encpage_t) which
> :in turn have pointers to blocks of bytes.
>
> Great, you seem to be getting some excellent results.

Worked absolutely fine on my PowerBook G4, too.

Before:
  208948 Encode/Byte/Byte.bundle
 1984416 Encode/CN/CN.bundle
   30076 Encode/EBCDIC/EBCDIC.bundle
   33728 Encode/Encode.bundle
 2590420 Encode/JP/JP.bundle
 2208996 Encode/KR/KR.bundle
   39720 Encode/Symbol/Symbol.bundle
 1940288 Encode/TW/TW.bundle
   17892 Encode/Unicode/Unicode.bundle

After:
  178220 Encode/Byte/Byte.bundle
 1085116 Encode/CN/CN.bundle
   25336 Encode/EBCDIC/EBCDIC.bundle
   33604 Encode/Encode.bundle
 1308568 Encode/JP/JP.bundle
 1209804 Encode/KR/KR.bundle
   34896 Encode/Symbol/Symbol.bundle
 1059040 Encode/TW/TW.bundle
   17892 Encode/Unicode/Unicode.bundle

> I have also wondered whether the .ucm files are needed after these
> have been built; if not, we should consider supplying with perl only
> the optimised table data if that could give us a space saving in the
> distribution - it would cut build time significantly as well as
> allowing us to consider algorithms that take much longer over the
> table optimisation, since they need be run only once when we
> integrate updated .ucm files.

A trivial yet effective patch would be to strip all the comments therein.
That should save a dramatic amount of space, but *.ucm is, in a way, a
source, so I am not sure if I should go for it.

Anyway, I am pretty much for integrating the NC patch, not just because it
reduces shlib sizes but also because it appears safer on the compiler (one
of the optimizer features (AGGREGATE_TABLES) was dropped during the dev
phase of perl 5.8 for the sake of djgpp and other low-memory platforms).
Unfortunately I am at my parents' place this week (to finish the book I am
writing -- away from the kids), so I do not have as many resources for
extensive tests (the FreeBSD box I was using here at my parents' just died
(physically) the day before I came :-( ).

Another concern is that since it changes the internal structure of the
shlibs, CPANized Encode::* modules need to be rebuilt as well, so the
released version needs to print a warning on that -- oh wait!  Encode.xs
remains unchanged so Encode::* may still work

Thank you, NC.

Dan the Encode Maintainer
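The comment-stripping idea Dan floats above amounts to a one-liner. This is
only a sketch (it was not an applied patch): it assumes the usual .ucm
convention that full-line comments start with '#', and the ucm/*.ucm path
is illustrative rather than taken from the distribution layout:

    perl -i.orig -ne 'print unless /^\s*#/' ucm/*.ucm

Only whole lines are dropped here, deliberately, so the header and mapping
lines stay byte-for-byte identical; handling trailing comments would need a
more careful pattern.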
Re: [not-yet-a-PATCH] compress Encode better
Nicholas Clark <[EMAIL PROTECTED]> writes:
>I've not looked at what the
>Encode regression tests actually do, so I don't know how thoroughly they
>check whether the transformations are actually correct. In other words,
>done correctly this approach *will* generate the same transformation tables
>as before, and although I *think* I'm doing it correctly (without the -O;
>patches welcome) I'm not certain of this.

Someone could/should write a generic test that pushes all codepoints
supported by a .ucm file both ways through the generated encoder
and checks for correctness. This would be a pointless thing to do
as part of perl's "make test" as once the "compiler" works it works,
but would be useful for folk working on the compile process.
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
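An outline of the kind of generic test being proposed here might look like
the following. It is not t/rt.pl; the .ucm parsing is deliberately naive
(only |0 round-trip mappings, one <U...> entry per line), and it assumes
that the file's <code_set_name> resolves to an encoding Encode already
knows about:

    #!/usr/bin/perl
    # Sketch of a .ucm round-trip check -- an outline, not t/rt.pl.
    use strict;
    use Encode;

    my $ucm = shift or die "usage: $0 file.ucm\n";
    open my $fh, '<', $ucm or die "$ucm: $!\n";

    my ($name, @map);
    while (<$fh>) {
        $name = $1 if /<code_set_name>\s+"([^"]+)"/;
        # Keep only |0 entries -- the true round-trip mappings.
        push @map, [$1, $2]
            if /^<U([0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|0/;
    }
    defined $name or die "no <code_set_name> in $ucm\n";
    my $enc = Encode::find_encoding($name)
        or die "Encode does not know about '$name'\n";

    my $bad = 0;
    for my $m (@map) {
        my ($u, $esc) = @$m;
        my $char = chr hex $u;
        (my $octets = $esc) =~ s/\\x([0-9A-Fa-f]{2})/chr hex $1/ge;
        $bad++, warn "U+$u does not round trip\n"
            unless $enc->encode($char) eq $octets
               and $enc->decode($octets) eq $char;
    }
    printf "%d mappings checked, %d failures\n", scalar @map, $bad;

Running it over every .ucm in the distribution and every generated encoding
would be slow, which is exactly why something like this belongs with the
compile process rather than in "make test".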
Re: [not-yet-a-PATCH] compress Encode better
<[EMAIL PROTECTED]> writes:
>Nicholas Clark <[EMAIL PROTECTED]> wrote:
>:I've been experimenting with how enc2xs builds the C tables that turn into the
>:shared objects. enc2xs is building tables (arrays of struct encpage_t) which
>:in turn have pointers to blocks of bytes.
>
>Great, you seem to be getting some excellent results.
>
>I have also wondered whether the .ucm files are needed after these
>have been built; if not, we should consider supplying with perl only
>the optimised table data if that could give us a space saving in the
>distribution - it would cut build time significantly as well as
>allowing us to consider algorithms that take much longer over the
>table optimisation, since they need be run only once when we
>integrate updated .ucm files.

The main reason the "compile" was left in was that the tables are different
for UTF-EBCDIC machines.  Otherwise just shipping the .C files used to
generate the .so-s is a good idea - I was also struggling to come up with a
"data" format that would be as efficient in space/time as the code version.

If we eliminate .ucm-s from the distribution then I think it would be
useful to have something which will print them out from the internal form.
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/