Very good points. Let me add more inline noise.

On Wed, May 13, 2015 at 11:25 PM, Matt Gushee <m...@gushee.net> wrote:

> Hi, Moritz--
>
> On Thu, Apr 16, 2015 at 2:35 PM, Moritz Heidkamp <
> mor...@twoticketsplease.de> wrote:
>
>>
>> sorry for the late reply, got busy :-)
>>
>
> And I'm sorry for the even later reply, got scared :-)
> No, really! It's stupid, but I am often scared of people's reactions when
> I make even mildly critical remarks (I hasten to point out that that
> feeling has nothing to do with any behavior I've observed on this mailing
> list - just my own neurosis, I guess).
>

I have the same issue, but only when I myself make overly critical remarks.
Afraid of the retorts, perhaps.


>
>> > On Sat, Mar 28, 2015 at 5:33 AM, Moritz Heidkamp <
>> mor...@twoticketsplease.de
>> >> wrote:
>>
>
>
>> > Maybe in that case it would be good if the API doc said something like:
>> > "comparse is compatible with UTF-8, but many of the built-in
>> combinators do
>> > not work with UTF-8 characters, so you may need to construct your own.
>> For
>> > example: ..."
>>
>> Sure, we can do that! I didn't mention it so far as this property is
>> implicit with how CHICKEN core strings work.
>
>
> I think you are assuming too much background knowledge. Maybe in a perfect
> world, everybody would get a CS degree, then learn all the fundamentals of
> Scheme, then master the Chicken core, then start working with extensions
> and building practical software ... but of course that's often not how it
> works in reality. And in my opinion - as someone with no formal education
> in the field, but who came to Scheme with a few years of practical
> experience in other languages - I think Scheme in general, and any given
> implementation, has a really steep learning curve. The documentation is
> good in that almost everything you need to know is covered somewhere, but
> it can also be really hard to find the information due to the extreme
> modularity and very bazaar-like culture of Scheme - docs are split up
> between r*rs and an implementation manual and extension docs written by
> different people following very different conventions. Additionally - and
> this is not anyone's fault in particular, more a side-effect of the small
> community working on Chicken - since the documentation as a whole is not
> rigorously maintained, contradictions inevitably creep in, and there are
> often several ways to do the same or similar things, with no indication of
> which is best or recommended.
>
> Sorry to rant ... I'm rather passionate about documentation. Must be my
> Python background ;-)
>

I believe I complained about this once. But I've since stopped complaining,
because I had no useful ideas on how to improve things. If you already know
what you need, things like Chickadee are awesome. If you are trying to
figure out a strange behavior, I've since learned to go back to the R5RS
docs and, if not there, look up 'deviations from the standard'. If not
there, I'll stumble through the documentation like a cockroach on meth.
Sometimes I manage to find useful information, sometimes I don't. I'd never
find out about CHICKEN core string behavior, for instance.

One idea I came up with, but never implemented, was to record a list of
'documentation cache misses'. It would go like this: every time I failed to
find something in the documentation, I'd record what it was. If I really
couldn't find it, I'd ask around here or on #chicken. If no one could find
it, then I'd mark it to be inserted somewhere (or updated).


>
>
>> It's not quite that simple: Characters may be encoded in many ways,
>> UTF-8 is far from the only widely used one and not ideal in all cases,
>> e.g. the algorithmic complexity of some operations on UTF-8 encoded
>> strings is objectively worse than those on UTF-32 encoded strings. And
>> it will remain so even till the year 2050. No guarantees on what
>> happens after that, though!
>>
>
> I'm certainly aware of different encodings - I started programming when I
> lived in Japan in the 90s, before Unicode 1.0 was finalized, and there were
> 3 major encodings in common use just for Japanese. But there is such a
> thing as 'reasonable defaults' - if you can't support all the encodings,
> what is more interoperable than UTF-8?
>

Nothing that I'm aware of. Making it the default can hurt some Japanese
users, I'm told, because encoding conversion between (???? - forgot what
the encoding was) and UTF-8 can be lossy.

Many implementations use UTF-16 internally (Win32 API, Java). Maybe they are
on to something, performance-wise.
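
To make the complexity point concrete, here is a rough sketch in Python
(not CHICKEN; the two function names are made up for illustration, and the
UTF-8 walker assumes well-formed input). Finding the n-th code point in
UTF-32 is a direct jump, while in UTF-8 you have to walk from the start.
UTF-16 sits in between: fixed-width only until a character outside the BMP
shows up.

```python
import struct

def nth_codepoint_utf32(data: bytes, n: int) -> str:
    """Fixed width: the n-th code point lives at byte offset 4 * n."""
    (cp,) = struct.unpack_from("<I", data, 4 * n)
    return chr(cp)

def nth_codepoint_utf8(data: bytes, n: int) -> str:
    """Variable width: scan from the start, one lead byte at a time."""
    i = 0
    count = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1          # 0xxxxxxx
        elif lead >> 5 == 0b110:
            width = 2          # 110xxxxx
        elif lead >> 4 == 0b1110:
            width = 3          # 1110xxxx
        else:
            width = 4          # 11110xxx
        if count == n:
            return data[i:i + width].decode("utf-8")
        i += width
        count += 1
    raise IndexError(n)

# One character each of 1, 2, 3 and 4 UTF-8 bytes, then one more ASCII char.
s = "a\u00e9\u65e5\U0001F600z"
utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-le")
utf32 = s.encode("utf-32-le")

# 5 code points, but 6 UTF-16 code units: the emoji needs a surrogate pair.
utf16_units = len(utf16) // 2

print(len(s), len(utf8), utf16_units, len(utf32))
print(nth_codepoint_utf8(utf8, 3) == nth_codepoint_utf32(utf32, 3))
```

So UTF-16 buys constant-time indexing only if you pretend surrogate pairs
don't exist, which is exactly the kind of shortcut that mangles data later.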


>
>
>> > I'm of the opinion (shared by many I18n experts, if I'm not mistaken)
>> > that a high-level language in the 21st century should have in its core
>> > a rock-solid character abstraction that is never, ever conflated with
>> > a byte.
>>
>> The character abstraction actually is rock-solid even in CHICKEN 4
>> already: A character object represents a Unicode codepoint in an
>> encoding independent way.
>>
>
> Okay, but that isn't reflected very well in the core API.
>
> Here's what bugs me about the laissez-faire approach to strings (i.e.
> "yes, the underlying implementation is Unicode-aware, but you are free to
> treat strings as sequences of bytes"): I think it's very bad practice from
> the standpoint of promoting adoption of the language.
>

Treating strings as a sequence of bytes is actually ok if you aren't
manipulating the content: for example, if you are only doing I/O with it,
and you assemble it correctly again afterwards. If you are actually doing
string manipulation, you're screwed if you call a function that thinks it
is just a sequence of bytes.
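
A minimal sketch of what I mean, in Python rather than CHICKEN (the helper
name try_decode is made up for illustration). Passing the bytes through
untouched is harmless; cutting them at a byte offset is not.

```python
text = "na\u00efve caf\u00e9"   # "naive cafe" with accents: 10 characters
data = text.encode("utf-8")      # 12 bytes: the two accented letters take 2 each

def try_decode(raw: bytes):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

# Pass-through (read, store, send): nothing is lost.
round_trip = try_decode(bytes(data))

# "Take the first 3 characters" done on bytes cuts the third one in half.
cut_as_bytes = try_decode(data[:3])

# The same operation done on characters is fine.
cut_as_chars = text[:3]

print(len(text), len(data))
print(round_trip == text)
print(cut_as_bytes)
print(cut_as_chars)
```

Here the damage is at least detected because decoding fails. A function
that silently keeps the half character is the worse case.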


> And *that* matters, IMHO, because programming languages tend to either
> grow or die (though it appears to me that, unusually, Scheme as a whole is
> gradually declining, but Chicken is kind of in a holding pattern).
>

As is Racket.

But I feel that the general trend for the Lisp family is upwards, thanks to
Clojure.


> And - to simplify a bit - people/companies planning to build real
> applications choose languages based on their whole ecosystems (core
> language + tools + platform support + libraries). And if you're concerned
> about interoperability in a global context, the fact that a language does
> nothing to enforce the use of one or more interoperable string encodings is
> surely a big black mark. IOW, you can't trust that any given Chicken
> extension is Unicode-aware.
>
>
That sucks. Your data could get mangled without you knowing about it.

I wish Unicode strings could be explicitly marked as such, and known
'unsafe' functions would at least generate compiler warnings.


— Stephen
_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
