RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)
Lars Kristan wrote:
> I never said it doesn't violate any existing rules. Stating that it
> does, doesn't help a bit. Rules can be changed. Assuming we understand
> the consequences. And that is what we should be discussing. By stating
> what shoul
Kenneth Whistler wrote:
> I do not think this is a proposal to amend UTF-8 to allow
> invalid sequences. So we should get that off the table.
I hope you are right.
> Apparently Lars is currently using PUA U+E080..U+E0FF
> (or U+EE80..U+EEFF ?) for this purpose, enabling the round-tripping
> of
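
A rough sketch of the kind of round-tripping described above, for readers who want something concrete. It is not Lars's actual code. It assumes the first of the two ranges mentioned (U+E080..U+E0FF), and the handler name "pua_escape" and both function names are made up for illustration. Each byte that cannot be decoded as UTF-8 is mapped to U+E000 plus the byte value, and mapped back on output.

```python
import codecs

PUA_BASE = 0xE000  # assumption: byte 0xNN (0x80..0xFF) maps to U+E0NN


def _pua_handler(exc):
    # Called by the UTF-8 decoder for each undecodable run of bytes.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Replace every offending byte with its private-use stand-in.
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


# "pua_escape" is a hypothetical handler name, not a standard one.
codecs.register_error("pua_escape", _pua_handler)


def decode_with_pua(raw: bytes) -> str:
    """Decode UTF-8, keeping invalid bytes as U+E080..U+E0FF."""
    return raw.decode("utf-8", errors="pua_escape")


def encode_with_pua(text: str) -> bytes:
    """Inverse of decode_with_pua: stand-ins become raw bytes again."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xE080 <= cp <= 0xE0FF:
            out.append(cp - PUA_BASE)
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)


raw = b"abc\xff\xc3\xa9\x80"
text = decode_with_pua(raw)
print(repr(text))
print(encode_with_pua(text) == raw)
```

Note the weakness that the thread keeps returning to: input that already contains a genuine U+E080..U+E0FF character in valid UTF-8 will not survive the round trip, because the encoder cannot tell it apart from a stand-in. Python's built-in "surrogateescape" error handler takes a similar approach using lone surrogates instead of PUA code points.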
Philippe Verdy wrote:
> An alternative can then be a mixed encoding selection:
> - choose a legacy encoding that will most often be able to represent
> valid filenames without loss of information (for example ISO-8859-1,
> or Cp1252).
> - encode the filename with it.
> - try to decode it with a *
Kenneth Whistler scripsit:
> Storage of UNIX filenames on Windows databases, for example,
> can be done with BINARY fields, which correctly capture the
> identity of them as what they are: an unconvertible array of
> byte values, not a convertible string in some particular
> code page.
This solut
Peter Kirk scripsit:
> I notice that Elaine is here proposing a HEBREW SAMARITAN PUNCTUATION
> WORD DIVIDER - and this should be in the BMP as Samaritan is a script in
> modern list. But there is already in the pipeline a PHOENICIAN WORD
> SEPARATOR, provisionally U+1091F, and already defined U
John Cowan wrote:
OpenType is a trademark of Microsoft and a proprietary font format
jointly developed by Microsoft and Adobe.
The question is, is it an open standard? That is, is anyone free to
create OpenType fonts, OpenType font tools, OpenType font renderers?
Is the documentation freely ava
John Hudson scripsit:
> OpenType is a trademark of Microsoft and a proprietary font format
> jointly developed by Microsoft and Adobe.
The question is, is it an open standard? That is, is anyone free to
create OpenType fonts, OpenType font tools, OpenType font renderers?
Is the documentation f
Lars,
I'm going to step in here, because this argument seems to
be generating more heat than light.
> I never said it doesn't violate any existing rules. Stating that it does,
> doesn't help a bit. Rules can be changed.
> I ask you to step back and try to see the big picture.
First, I'm going
On 06/12/2004 22:41, E. Keown wrote:
...
1.
Proposal to add Samaritan Pointing to the UCS
http://www.lashonkodesh.org/samarpro.pdf
WG2 number: N2748
I notice that Elaine is here proposing a HEBREW SAMARITAN PUNCTUATION
WORD DIVIDER - and this should be in the BMP as Samaritan is a script in
m
Philippe continued:
> As if Unicode had to be bound on
> architectural constraints such as the requirement of representing code units
> (which are architectural for a system) only as 16-bit or 32-bit units,
Yes, it does. By definition. In the standard.
> ignoring the fact that technologies do
I know what you mean here:
most Linux/Unix filesystems (as well as many legacy filesystems for Windows
and MacOS...) do not track the encoding with which filenames were encoded
and, depending on local user preferences when that user created that fi
From: "D. Starner" <[EMAIL PROTECTED]>
(Sorry for sending this twice, Marcin.)
"Marcin 'Qrczak' Kowalczyk" writes:
UTF-8 is poorly suitable for internal processing of strings in a
modern programming language (i.e. one which doesn't already have a
pile of legacy functions working on bytes, but whic
At 09:50 PM 12/6/2004, John Hudson wrote:
I don't know. I try to avoid politics, if possible. The significance of
what I'm saying is that you have made a good start in your proposal, that
it has some shortcomings, and that I hope to be able to help put something
more complete together.
It wou
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
Yes, and pigs could fly, if they had big enough wings.
Once again, this is a creative comment. As if Unicode had to be bound on
architectural constraints such as the requirement of representing code units
(which are architectural for a system) only as
At 11:52 PM 12/6/2004, Jony Rosenne wrote:
In chapter 8, regarding Hebrew, the standard says:
Positioning. Marks may combine with vowels and other points, and there are
complex typographic rules for positioning these combinations.
I understand that this sentence should be regarded as being normativ
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of E. Keown
> In the so-called 'deprecated' block, the 2nd Hebrew
> block in the BMP, are composed Hebrew points which I
> plan to go on using. And I expect everyone else to go
> on using them also, all Hebraists. We think they are
E. Keown wrote:
In the so-called 'deprecated' block, the 2nd Hebrew
block in the BMP, are composed Hebrew points which I
plan to go on using. And I expect everyone else to go
on using them also, all Hebraists. We think they are
needed for 'text representation' of shin and sin.
It really is a be
> Yes, and pigs could fly, if they had big enough wings.
An 8-foot wingspan should do it. For a picture of said flying pig see:
http://www.cincinnati.com/bigpiggig/profile_091700.html
http://www.cincinnati.com/bigpiggig/images/pig091700.jpg
Rick
Thanks to Peter Constable, John Hudson, Tom Gewecke, Christopher Fynn, and
others, for taking the time to address my question.
Gary
---
Gary Grosso
Arbortext, Inc.
Ann Arbor, MI, USA
Philippe stated, and I need to correct:
> UTF-24 already exists as an encoding form (it is identical to UTF-32), if
> you just consider that encoding forms just need to be able to represent a
> valid code range within a single code unit.
This is false.
Unicode encoding forms exist by virtue of
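
For reference, a small sketch of what the three encoding forms the standard does define look like in terms of code units. The function name is made up, and the example code point is the provisional U+1091F mentioned elsewhere in this thread. Surrogate code points are not handled.

```python
def code_unit_counts(code_point: int) -> dict:
    """Count the code units needed for one code point in each encoding form."""
    ch = chr(code_point)
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit code units
    }


print(code_unit_counts(0x41))     # a BMP letter
print(code_unit_counts(0x1091F))  # a supplementary-plane code point
```

Only UTF-32 uses a single code unit for every code point; the other two are variable-length.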
Richard Cook wrote:
> Well, why stop with words, my lord? Why not just encode all sentences,
> paragraphs, pages, chapters, books, libraries, or your higher level
> unit of choice, for that matter.
> ...
> Whether you choose to associate a single glyph with your private-use
> code point, or an en
Doug Ewell replied:
> Actually the Unicode Technical Committee. But you are correct: it is up
> to the UTC to decide whether they want to redefine UTF-8 to permit
> invalid sequences, which are to be interpreted as unknown characte
Doug Ewell wrote:
> John Cowan wrote:
>
> > Windows filesystems do know what encoding they use. But a filename on
> > a Unix(oid) file system is a mere sequence of octets, of which only 00
> > and 2F are interpreted. (Filenam
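
To make the "mere sequence of octets" point concrete, here is a minimal sketch that checks only the two octets named in the quote. The function name is hypothetical, and the check deliberately says nothing about encoding, since on this view the filesystem does not either.

```python
def is_plausible_unix_name(component: bytes) -> bool:
    """Check one path component against the two interpreted octets."""
    if not component:
        return False  # an empty component cannot name a file
    # 0x00 terminates the name and 0x2F ('/') separates components.
    return b"\x00" not in component and b"\x2f" not in component


print(is_plausible_unix_name(b"caf\xe9.txt"))    # Latin-1 bytes, not valid UTF-8
print(is_plausible_unix_name(b"caf\xc3\xa9.txt"))  # the same name in UTF-8
print(is_plausible_unix_name(b"a/b"))
```

Both of the first two names pass, which is exactly the difficulty: nothing in the name records which encoding produced it.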
On 07/12/2004 07:52, Jony Rosenne wrote:
...
Consequently, there is not and cannot be anything wrong with Unicode (at least
in this respect) and it does support "ANY sequence of Hebrew vowels and
consonants".
I do maintain that in some cases the typographic process would require out
of band assistance
Elaine in Vancouver
Dear Mark:
Thanks, I guess.
> This is the one I'm going to comment on, since it's
> the one I know best.
> I know that Michael Everson and I are working on a
> Samaritan proposal,
It appears to me that my proposal came first, no? By
some months...I have some mate
Elaine Keown
Vancouver
Dear Philippe and Lists:
> In all your searches and in your proposals, did you
> try to segregate the proposed additional characters
> into two separate categories: those needed
> for inclusion within many modern studies, and those
The Samaritan marks are sti
From: "D. Starner" <[EMAIL PROTECTED]>
If you're talking about a language that hides the structure of strings
and has no problem with variable length data, then it wouldn't matter
what the internal processing of the string looks like. You'd need to
use iterators and discourage the use of arbitrary
On Dec 5, 2004, at 07:02 PM, Doug Ewell wrote:
A word-based encoding for English could automatically assume spaces
where they are appropriate. The sentence:
"What means this, my lord?"
would have seven encodable elements: the five words, the comma, and the
question mark. Spaces would be automatic
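
A toy sketch of the element count in the example above, assuming a very simple rule: runs of word characters are words, and any other non-space character is its own element. The function names and the spacing rule (a space before every word except the first, none before punctuation) are assumptions made for illustration, not part of Doug's description.

```python
import re


def to_elements(sentence: str) -> list:
    """Split a sentence into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def from_elements(elements: list) -> str:
    """Rebuild the text, inserting the spaces that were never encoded."""
    out = ""
    for element in elements:
        # Assumed rule: a space goes before each word except the first.
        if out and re.fullmatch(r"\w+", element):
            out += " "
        out += element
    return out


elements = to_elements("What means this, my lord?")
print(len(elements), elements)
print(from_elements(elements))
```

This rule is enough for the sample sentence but would need more cases for quotation marks, hyphens, and the like.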
In chapter 8, regarding Hebrew, the standard says:
Positioning. Marks may combine with vowels and other points, and there are
complex typographic rules for positioning these combinations.
I understand that this sentence should be regarded as being normative.
Clause 4.3 uses the word "tend".
Ch