Re: [Chicken-users] UTF-8 support in eggs

Alex Shinn Thu, 10 Jul 2014 18:29:32 -0700

On Fri, Jul 11, 2014 at 7:20 AM, Oleg Kolosov <bazur...@gmail.com> wrote:

> On 07/09/14 09:00, Alex Shinn wrote:
> > The clean way to handle this is to duplicate the useful string
> > APIs for bytevectors.  This could be done without code duplication
> > with the use of functors, though compiler assistance may be
> > needed for efficiency (e.g. for inlined procedures).  Even without
> > code duplication there would be an increase in the core library
> > size, though we could probably move most utilities to external
> > libraries (how often do you need regexps that operate on binary
> > data?).
>
> Considering the Chibi Scheme size numbers from your other mail, I would
> hardly call this a huge price for the benefit received, even for my
> specific embedded use cases.
>

Note Chibi factors out all but a few string utilities into
separate libraries, i.e. the Chibi core is smaller than the
Chicken core.  The size increase for Chicken would thus
be correspondingly larger, though still likely very small.

> > The bigger issue from the performance perspective is existing
> > idioms that use indexes, which can degrade to quadratic behavior
> > in the worst case no matter how much you optimize (without hacks
> > that slow down normal usage).  So people would have to learn to
> > take substrings where appropriate to avoid the start/end parameters
> > to all SRFI 13 functions, or we would need to deprecate SRFI 13
> > in favor of a cursor-oriented API (planned for R7RS).
>
> Do you have some examples on how to avoid performance degradation and
> not use string indexes?

Just don't use string indexes - they're not useful.  Passing
and returning cursors (byte offsets into strings) is all you need. [*]
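
To make the index/cursor difference concrete, here is a minimal
sketch in Python (chosen only for illustration, it is not Chicken or
Chibi code). It assumes the string is stored as valid UTF-8 bytes; all
function names are hypothetical. Converting a character index to a
byte offset needs a scan from the start of the string, so a loop over
indexes is quadratic, while advancing a cursor only touches the bytes
of one character.

```python
# Sketch only: assumes `buf` holds valid UTF-8. Names are hypothetical.

def cursor_next(buf: bytes, cur: int) -> int:
    """Advance a byte-offset cursor past one UTF-8 encoded character."""
    cur += 1
    # Continuation bytes have the bit pattern 10xxxxxx; skip them.
    while cur < len(buf) and (buf[cur] & 0xC0) == 0x80:
        cur += 1
    return cur

def index_to_cursor(buf: bytes, index: int) -> int:
    """Convert a character index to a byte offset by scanning from the
    start of the buffer: cost grows linearly with the index."""
    cur = 0
    for _ in range(index):
        cur = cursor_next(buf, cur)
    return cur

def chars_by_index(buf: bytes, length: int):
    """Index-style iteration: every step rescans from the start,
    so visiting all characters is quadratic in the string length."""
    for i in range(length):
        start = index_to_cursor(buf, i)
        yield buf[start:cursor_next(buf, start)].decode("utf-8")

def chars_by_cursor(buf: bytes):
    """Cursor-style iteration: each step only reads the bytes of the
    current character, so visiting all characters is linear."""
    cur = 0
    while cur < len(buf):
        nxt = cursor_next(buf, cur)
        yield buf[cur:nxt].decode("utf-8")
        cur = nxt
```

Both generators yield the same characters; only the cost differs. An
API that passes and returns cursors never needs index_to_cursor at all.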

In the more common cases, just using string ports, string-map,
or loop syntax hides the underlying iteration (a good loop macro
has potential to be faster than manual iteration).

> How about more complex formatting like
> outputting numbers with padding? I guess these should be handled with
> something like fmt (or chibi.show).

Well, this is completely orthogonal to utf8, but probably the
most important performance hack for combinator formatters
is Chicken's define-compiler-syntax.

-- 
Alex

[*] There are very few exceptions; the only one I'm aware of is
Boyer-Moore. However, string search on utf8 bytes is faster than
on UTF-32 codepoints, so the trick is to just provide string search as
part of an API and let implementations optimize accordingly.
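
As a rough illustration of searching the utf8 bytes directly, here is
a small Python sketch (again illustration only; utf8_search is a
hypothetical name, and the built-in bytes.find stands in for whatever
search algorithm an implementation would pick). It relies on both
arguments being valid UTF-8: because lead bytes and continuation
bytes never overlap, a byte-level match always starts on a character
boundary, so the result can be used directly as a cursor.

```python
# Sketch only: assumes both arguments are valid UTF-8.

def utf8_search(haystack: bytes, needle: bytes, start: int = 0) -> int:
    """Return the byte offset (cursor) of the first match of `needle`
    at or after `start`, or -1 if there is none. No decoding to
    codepoints takes place; the search runs on raw bytes."""
    return haystack.find(needle, start)

haystack = "χαίρετε, κόσμε".encode("utf-8")
needle = "κόσμε".encode("utf-8")
pos = utf8_search(haystack, needle)
# `pos` is a byte offset, not a character index; slicing at it
# still yields valid UTF-8 because matches fall on boundaries.
match_text = haystack[pos:pos + len(needle)].decode("utf-8")
```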

_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
