Dan Sugalski <[EMAIL PROTECTED]> writes:

>> Hmm. Suppose that I have a system that is friendly to 80 byte
>> records.  I want to output "meaningful" strings, so I want to
>> partition a buffer into 80-ish byte substrings, but preserve any
>> graphemes (i.e., store the data in a legible format).
>>
>> How would I do that?
>
> You don't. Or if you do, you do it with a lot of pain, sweat, and
> annoying hard work. 80 bytes gets you somewhere between three and 80
> graphemes (and three may be a *high* estimate--there may be
> circumstances where 80 bytes is insufficient for *one* grapheme).
>
> This isn't something that can be made generically easy.

It's no worse than implementing word wrap.  Someone will of course
implement it as a generic routine, something along the lines of:

my @line = breakunicodestringintobytebufferchunks(
    string           => $string,
    chunksize        => 80,
    keeptogether     => 'graphemes',
    extremelongparts => 'split',
    # 'split' will try to split it at a mostly-reasonable
    #   place if possible, similar to word wrap that looks
    #   for syllable boundaries.
    # 'truncate' would do the same but drop the second part,
    #   rather than putting it in the next line.
    # 'skip' would drop the whole grapheme out.
    # 'allow' would create a line longer (in bytes) than
    #   the chunksize, which is what a lot of word wrap
    #   algorithms do, but would not work if you really
    #   have to fit in a fixed-byte-size buffer.  It would
    #   of course put the thing on a line by itself though,
    #   to minimize the overflow.
);

There are reasons for doing this, e.g. if you've got Unicode text to
send via a network protocol with an octet-oriented RFC, or if you're
interacting with some legacy C code that has fixed-size buffers.
Someone will write the routine to do as well as can be expected, and
it'll be put on the CPAN, and people who need this sort of thing will
use it.
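
As a rough illustration of how small the core of such a routine could
be, here is a minimal sketch in Perl 5.  The name chunk_by_graphemes
is hypothetical, and the sketch rests on several assumptions: the
target encoding is UTF-8, \X is an adequate definition of "grapheme",
and only the 'allow' policy is handled (an oversized grapheme gets a
chunk to itself and may exceed the limit).  It is not the full
interface described above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: split $string into chunks whose UTF-8 encoding
# is at most $chunksize bytes, never breaking inside a grapheme
# cluster.  A single grapheme larger than $chunksize is placed in a
# chunk of its own (the 'allow' policy); that chunk will overflow.
sub chunk_by_graphemes {
    my ($string, $chunksize) = @_;
    my @chunks;
    my ($current, $bytes) = ('', 0);

    # \X matches one extended grapheme cluster.
    for my $grapheme ($string =~ /(\X)/g) {
        my $len = length encode('UTF-8', $grapheme);

        # Start a new chunk if this grapheme would not fit.
        if ($bytes > 0 && $bytes + $len > $chunksize) {
            push @chunks, $current;
            ($current, $bytes) = ('', 0);
        }
        $current .= $grapheme;
        $bytes   += $len;
    }
    push @chunks, $current if $bytes > 0;
    return @chunks;
}
```

Called as chunk_by_graphemes($string, 80), this returns a list of
character strings; each one still has to be encoded to UTF-8 before it
goes into the byte buffer.  The 'split', 'truncate' and 'skip'
policies would each replace the point where an oversized grapheme is
appended.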

I don't think the language needs to be designed around it though.

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"[EMAIL PROTECTED]/ --";$\=$ ;-> ();print$/
