Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

Ludovic Courtès Wed, 03 Apr 2013 13:11:56 -0700

Mark H Weaver <m...@netris.org> skribis:

> l...@gnu.org (Ludovic Courtès) writes:
>> Woow, well thought out.  The semantics seem good.  (It’s interesting to
>> see how BOMs complicate things, but that’s life, I guess.)
>>
>> The patch looks good to me.  The test suite is nice.  It doesn’t seem to
>> cover all the corner cases listed above, but that can be added later on
>> perhaps?
>
> Yes, the tests are still a work-in-progess, but I've added quite a few
> more since you last looked.


Nice.

>> Perhaps the text above could be added to the manual,
>
> In the attached patch, I've added a new node to the "Input and Output"
> section.

Perfect.

>>> +{
>>> +  scm_t_port *pt = SCM_PTAB_ENTRY (port);
>>> +  int result;
>>> +  int i = 0;
>>> +
>>> +  while (i < len && scm_peek_byte_or_eof (port) == bytes[i])
>>> +    {
>>> +      pt->read_pos++;
>>> +      i++;
>>> +    }
>>> +
>>> +  result = (i == len);
>>> +
>>> +  while (i > 0)
>>> +    scm_unget_byte (bytes[--i], port);
>>> +
>>> +  return result;
>>> +}
>>
>> Should it be scm_get_byte_or_eof given that scm_unget_byte is used later?
>
> Yes.  Bytes are only consumed if are equal to bytes[i], so an EOF will
> never be consumed or passed to scm_unget_byte.
>
>> What if pt->read_buf_size == 1?  What if there’s data in saved_read_buf?
>
> All of those details are handled by 'scm_peek_byte_or_eof', which is
> guaranteed to leave 'pt->read_pos' pointing at the byte that's returned
> (if not EOF).  Therefore, it's always safe to increment that pointer by
> one (but no more than one) after calling 'scm_peek_byte_or_eof' if it
> returned non-EOF.
>
> Look at the code for 'scm_peek_byte_or_eof' and this will be clear.
> Also note that you did the same thing in 'scm_utf8_codepoint' :)

Ah yes, indeed.

[...]

> Here's the new patch.  Any more suggestions?

Not from me!  OK to commit as far as I’m concerned.

Thank you!

Ludo’.

Re: [PATCH] Improve handling of Unicode byte-order marks (BOMs)

Reply via email to