I'll throw in my two bits here.
I'm not personally decided whether utf-8 in core would be an
improvement. I don't have enough background or knowledge of
the internals to contribute to that decision.
I can offer this, however:
I have found that I have to use utf-8 support in every project
I've written in Chicken. I do so, and have only had a problem
when the utf-8 egg did not map a procedure from core properly.
I'm getting by just fine with the current state of affairs, and
I do have a certain nostalgic love of ASCII. If I *could* get
away with only having ASCII, I would. This has not been true
in practice.
My experience with numbers is slightly different: I do sometimes
need word-level calculation that depends on the underlying machine
implementation of character- and pointer-sized integers. I use the
fx versions of the arithmetic procedures when I rely on this, but
mainly I have found I must intentionally subvert the numeric tower
to get a specific behavior. This has never been true when I've
dealt with characters.
FWIW,
-Alan
On Sun, Jan 27, 2013 at 10:43:41AM +0900, Ivan Raikov wrote:
Hi Alex,
Yes, I would have thought that more people would be interested in
having UTF-8 support in core Chicken (or at least wide-char compatible
srfi-14). I have changed the title of this thread to reflect the subject
more accurately :-)
Personally, I think that adding UTF-8 in core is much better than the
hacks I had to do in mbox, and is a no-brainer considering the benchmark
results you have below. But I am sure that opinions vary on this
subject...
Can you post your bounds-check patches to srfi-14 on the mailing list,
and/or create a ticket for it? Hopefully there will be more responses this
time.
Ivan
On Sat, Jan 26, 2013 at 1:42 PM, Alex Shinn alexsh...@gmail.com
wrote:
On Wed, Jan 23, 2013 at 5:09 PM, Alex Shinn alexsh...@gmail.com
wrote:
On Wed, Jan 23, 2013 at 3:45 PM, Ivan Raikov
ivan.g.rai...@gmail.com wrote:
Yes, I ran into this when I was adding UTF-8 support to mbox... If
you were to add wide char support in srfi-14, is there a way to
quantify the performance penalty?
To add the bounds check so it doesn't error? Practically
nothing.
To branch to a separate path for a wide-char table if
the bounds check fails? Same cost if the input is ASCII.
For efficient handling in the case of Unicode input...
how small/fast do you want it?
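A minimal sketch of the bounds check and the branch described above, in plain Scheme. All of the names here (make-latin1-table, make-contains?, ascii-letter?) are hypothetical and this is not the srfi-14 reference code; it only illustrates why ASCII input pays for a single extra comparison.

```scheme
;; Hypothetical names throughout; not the srfi-14 reference implementation.
;; A 256-entry table covers Latin-1; anything wider is handed to `wide?`.
(define (make-latin1-table pred)
  (let ((table (make-vector 256 #f)))
    (do ((i 0 (+ i 1)))
        ((= i 256) table)
      (vector-set! table i (pred (integer->char i))))))

(define (make-contains? table wide?)
  (lambda (ch)
    (let ((n (char->integer ch)))
      (if (< n 256)               ; the bounds check: one comparison
          (vector-ref table n)    ; fast path, taken for all ASCII input
          (wide? ch)))))          ; separate path, only for wide chars

;; Example: ASCII letters in the table, every wider char rejected.
(define ascii-letter?
  (make-contains?
   (make-latin1-table
    (lambda (c)
      (or (and (char<=? #\a c) (char<=? c #\z))
          (and (char<=? #\A c) (char<=? c #\Z)))))
   (lambda (ch) #f)))
```

With this shape, a call such as (ascii-letter? (integer->char #x3BB)) falls through to the wide path and returns a boolean instead of indexing past the end of the table.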
I've never met such stony silence in response to an offer to do work...
I ran the following simple char-set-contains? benchmark with
a few variations:
  (time
   (do ((i 0 (+ i 1)))
       ((= i 1))
     (do ((j 0 (+ j 1)))
         ((= j 256))
       (char-set-contains? char-set:letter (integer->char j)))))
This is what most people are concerned about for speed, as
the boolean and construction operations are less common.
The results:
;; reference implementation
;; 0.312s CPU time, 1/2059 GCs (major/minor)
;; fixed reference implementation (no error but no support for non-latin-1)
;; 0.257s CPU time, 1/1706 GCs (major/minor)
;; utf8-srfi-14 with full Unicode char-set:letter
;; 0.243s CPU time, 0/1526 GCs (major/minor)
;; utf8-srfi-14 with ASCII-only char-set:letter
;; 0.242s CPU time, 0/1526 GCs (major/minor)
I was able to add the check and make the reference
implementation faster because I fixed the common case -
it was optimized for checking for 0 instead of 1.
Even with the enormous and complex definition of a
Unicode letter, utf8-srfi-14 is faster than srfi-14.
As for what we want in Chicken, the answer depends
on what you're optimizing for. utf8-srfi-14 will always
win for space, and generally for speed as well.
If the biggest concern is code-size, then you might want
to borrow the char-set definition from irregex and use
that as a fallback for non-latin-1 chars in the srfi-14
reference impl. This would have the same perf as
srfi-14 for latin-1, yet still support full Unicode and not
increase the size of the Chicken distribution.
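To make the fallback idea concrete, here is a rough sketch of a compact non-latin-1 lookup. It assumes a sorted vector of non-overlapping (lo . hi) code-point ranges searched by bisection; that representation is an assumption for illustration and is not taken from irregex's source. The names ranges-contain? and sample-ranges are hypothetical, and the ranges are a tiny sample, nowhere near the full Unicode letter set.

```scheme
;; Assumed representation: a vector of (lo . hi) code-point pairs,
;; sorted and non-overlapping. Not taken from irregex's source.
(define (ranges-contain? ranges n)
  (let loop ((lo 0) (hi (- (vector-length ranges) 1)))
    (and (<= lo hi)
         (let* ((mid (quotient (+ lo hi) 2))
                (r (vector-ref ranges mid)))
           (cond ((< n (car r)) (loop lo (- mid 1)))   ; search left half
                 ((> n (cdr r)) (loop (+ mid 1) hi))   ; search right half
                 (else #t))))))                        ; n is inside r

;; Illustrative sample only.
(define sample-ranges
  (vector (cons #x0391 #x03A1)     ; GREEK CAPITAL ALPHA .. RHO
          (cons #x03B1 #x03C9)     ; GREEK SMALL ALPHA .. OMEGA
          (cons #x0410 #x044F)))   ; CYRILLIC CAPITAL A .. SMALL YA
```

A lookup such as (ranges-contain? sample-ranges (char->integer ch)) would only be reached when the latin-1 bounds check fails, so the table path for latin-1 input is untouched. The range vector costs two integers per contiguous run of code points.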
--
Alex
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
--
my personal website: http://c0redump.org/