Re: RTL PUA?
On 8/20/2011 6:44 PM, Doug Ewell wrote:
> Would that really be a better default? I thought the main RTL needs for the PUA would be for unencoded scripts, not for even more Arabic letters. (How many more are there anyway?) In any case, either 'R' or 'AL' as the Plane 16 default would be an improvement over having 'L' for the entire PUA.

The best default would be an explicit "PU" - undefined behavior in the absence of a private agreement.

However, it helps to remember why the PUAs exist to begin with. The demand came from East Asian character sets, which had long had such private use areas. In their case, the issue of properties did not seriously arise, because the vast bulk of private characters were ideographs. I bet this remains true, and so the original motivation for the suggestion of "L" as the default would still apply - no matter how unsatisfactory this is from a formal point of view.

If maintaining the "L" default were to fail on the cliff of political correctness (or the "fairness" argument that has been made), the only proper solution is to use a value of "unknown" (i.e. the hypothetical PU value) for all private use code points. There are some properties where stability guarantees prevent adding a new value. In that case, the documentation should point out that the intended effect was to have a PU value, but that for historical / stability reasons, the tables contain a different entry.

Suggesting a "structure" on the private use area, by suggesting different default properties, ipso facto makes the PUA less private. That should be a non-starter.

A./
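For readers who want to see the default under discussion, it can be inspected directly. This is only a minimal sketch, assuming Python's standard `unicodedata` module (its data reflects whichever Unicode version the interpreter ships with); the names `PUA_RANGES` and `pua_bidi_classes` are hypothetical and not from the thread.

```python
import unicodedata

# The three private use ranges: the BMP PUA and Planes 15 and 16.
PUA_RANGES = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Plane 15": (0xF0000, 0xFFFFD),
    "Plane 16": (0x100000, 0x10FFFD),
}

def pua_bidi_classes():
    """Return the Bidi_Class values found at both ends of each PUA range."""
    result = {}
    for name, (first, last) in PUA_RANGES.items():
        result[name] = {unicodedata.bidirectional(chr(cp)) for cp in (first, last)}
    return result

for name, classes in pua_bidi_classes().items():
    print(name, sorted(classes))
```

There is no "PU" value for the lookup to return, which is the point made above: the table has to carry some existing value, and 'L' is the one it carries.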
Re: RTL PUA?
Would that really be a better default? I thought the main RTL needs for the PUA would be for unencoded scripts, not for even more Arabic letters. (How many more are there anyway?) In any case, either 'R' or 'AL' as the Plane 16 default would be an improvement over having 'L' for the entire PUA. --Original Message-- From: Richard Wordingham Sender: unicode-bou...@unicode.org To: unicode@unicode.org Subject: Re: RTL PUA? Sent: Aug 20, 2011 19:18 On Sun, 21 Aug 2011 00:21:28 + "Doug Ewell" wrote: > The more I think of it, the more I like the idea of reassigning the > default BC of Plane 16 to 'R'. What would the arguments against this > be? BC of 'AL'? Richard. -- Doug Ewell • d...@ewellic.org Sent via BlackBerry by AT&T
Re: RTL PUA?
On Sun, 21 Aug 2011 00:21:28 + "Doug Ewell" wrote: > The more I think of it, the more I like the idea of reassigning the > default BC of Plane 16 to 'R'. What would the arguments against this > be? BC of 'AL'? Richard.
Re: Code pages and Unicode
On Fri, 19 Aug 2011 17:03:41 -0700 Ken Whistler wrote: > O.k., so apparently we have awhile to go before we have to start > worrying about the Y2K or IPv4 problem for Unicode. Call me again in > the year 2851, and we'll still have 5 years left to design a new > scheme and plan for the transition. ;-) It'll be much easier to extend UTF-16 if there are still enough contiguous points available. Set that wake-up call for 2790, or whenever plane 13 (better, plane 12) is about to come into use. Richard.
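The ceiling that any UTF-16 extension would have to get around follows from the surrogate arithmetic. A minimal sketch of that arithmetic only (it does not attempt to show any extension scheme); the function name `to_surrogates` is illustrative.

```python
HIGH_BASE = 0xD800       # first high (lead) surrogate
LOW_BASE = 0xDC00        # first low (trail) surrogate
SURROGATES_EACH = 0x400  # 1024 high and 1024 low surrogates

def to_surrogates(cp):
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return HIGH_BASE + (offset >> 10), LOW_BASE + (offset & 0x3FF)

# 1024 * 1024 pairs cover exactly 16 supplementary planes of 65,536 points,
# which together with the BMP gives the 17 planes ending at U+10FFFF.
supplementary = SURROGATES_EACH * SURROGATES_EACH
planes = supplementary // 0x10000
print(supplementary, planes)
print([hex(unit) for unit in to_surrogates(0x10FFFF)])
```

Since every surrogate pair is already spoken for, an extension would presumably have to build new sequences out of code points that are still unassigned, which is why a contiguous free range matters.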
Re: RTL PUA?
The more I think of it, the more I like the idea of reassigning the default BC of Plane 16 to 'R'. What would the arguments against this be? -- Doug Ewell • d...@ewellic.org Sent via BlackBerry by AT&T
Re: RTL PUA?
On Fri, 19 Aug 2011 22:14:17 +0700 Martin Hosken wrote: > Therefore, I would suggest that a carefully allocated set of columns > for non L directionality PUA characters be encoded. This PUA doesn't > have to be big, with probably 1 column allocated per directionality. > I'm no expert in the bidi algorithm, but my guess is that we only > need a maximum of 5 columns and perhaps much less. The Ancient Egyptian hieroglyphic script, an RTL script of Bidi mirrored characters, has a Unicode LTR script of Bidi unmirrored characters for its modern representation that currently contains 1071 characters. However, I've seen a quote of about 6,000 different Graeco-Roman period hieroglyphs, so actually writing hieroglyphic Ancient Egyptian in plain text would need about 24 columns of characters. > I would value some input from Bidi experts on this. I hope my input helps in the meantime. I for one couldn't say whether the characters of the demotic Egyptian script should have a Bidi-class of R or AL. Richard.
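The figure of 24 is consistent with reading "column" as a block of 256 code points. That reading is an assumption, since neither message defines the term. Under that assumption the arithmetic can be sketched as follows; `columns_needed` is a hypothetical helper.

```python
import math

# Assumption, not stated in the thread: one "column" holds 256 code points.
CODE_POINTS_PER_COLUMN = 256

def columns_needed(characters, per_column=CODE_POINTS_PER_COLUMN):
    """Number of whole columns required to hold the given character count."""
    return math.ceil(characters / per_column)

print(columns_needed(6000))  # Graeco-Roman period estimate
print(columns_needed(1071))  # currently encoded hieroglyphs
```

Under the same assumption, the 1071 characters already encoded would alone fill the suggested maximum of 5 columns.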
Re: Code pages and Unicode
It sounds like you’re trying to encode glyphs or glyph fragments, not characters. There is a virtually endless repertoire of “shapes” that could be encoded, but unless each of these is a character actually used in a writing system (not just hypothetically), it’s probably not appropriate for a character encoding. -- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell From: srivas sinnathurai Sent: Saturday, August 20, 2011 3:35 To: Christoph Päper Cc: unicode@unicode.org Subject: Re: Code pages and Unicode About the research works. I alone (with my colleagues) am researching the fact that Sumerian is Tamil / Tamil is Sumerian. This requires quite a lot of space. Additionally, I do research on the Tamil alphabet as based on scientific definitions: it only represents the mechanical parts, i.e. it only represents the places of articulation as alphabet and is not sound based. And what is called a mathematical multiplier theory on expanding the alphabets leads to not just long-mathematics (nedung kaNaku), but also to extra long mathematics. This is just a sample requirement from me and my colleagues. How many others are there who would require Unicode support? Do you think allocating 32,000 to the code page model would help? Regards Sinnathurai
Timetable for PDAM 1.2, the next PDAM (presumably PDAM 1.3), and DAM 1 of ISO/IEC 10646:2012
At the last SC2/WG2 meeting this June in Helsinki, according to resolution M58.24 (see http://std.dkuug.dk/JTC1/SC2/WG2/docs/n4104.pdf ), there will be "a discussion list and teleconferencing facilities to arrive at dispositions to ballot comments, and issuing of any PDAM ballots (within the scope of current SC2 projects and its subdivisions), between WG2 face to face meetings".

According to the discussions, the intent was that the eight-month gap between WG2 meetings caused by the prolonged DAM/DIS period gives an opportunity to resolve PDAM issues through the usual commenting process within this interval, and to create a new edition of the PDAM, which in turn is to be commented on before the next meeting, where the outcome of the disposition is progressed to DAM.

Now we have PDAM 1.2 (L2/11-316 = SC2 N4201, derived from WG2 N4107 http://std.dkuug.dk/JTC1/SC2/WG2/docs/n4107.pdf ). In accordance with the issues mentioned above, it states:

> Status:
> In accordance with Resolution M17.06 adopted at the SC 2 Plenary
> Meeting held in Helsinki, Finland, 2011-06-10, this document is
> circulated to the SC 2 national bodies for a second PDAM ballot for a
> 3-month period. Please vote and comment via the Electronic balloting
> system as soon as possible but not later than 2011-10-29.

If I understand it correctly, this means:
- On 2011-10-29, all comments are collected.
- A few days later, the comments will be published on the WG2 list.
- At the same time, the discussion list starts working, and by a reasonable date, maybe two weeks later, the dispositions (at least those which are unanimous) are collected, and the editor updates PDAM 1.2 into a PDAM 1.3.

A reasonable point in time for the publication of such a PDAM 1.3, as far as I presume, is the end of November 2011. Then, this PDAM is to be commented on in the usual way in time for the next WG2 meeting, scheduled for 2012-02-13/17 in Mountain View, CA, USA.

Thus, the commenting time for PDAM 1.3 is less than three months.
- Is this in line with the ISO rules?
- Otherwise, is it intended to squeeze the whole disposition process for PDAM 1.2 into two weeks, so that PDAM 1.3 can be published before 2011-11-13 to allow a three-month period before the start of the February meeting?

- Karl
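The intervals in question can be checked with a few lines of date arithmetic. A minimal sketch, assuming "end of November" means 2011-11-30 (my reading, not a date given in any WG2 document); the variable and function names are illustrative.

```python
from datetime import date

ballot_close = date(2011, 10, 29)         # PDAM 1.2 ballot deadline
meeting_start = date(2012, 2, 13)         # first day of the WG2 meeting
presumed_pdam13 = date(2011, 11, 30)      # assumption: "end of November"
three_months_before = date(2011, 11, 13)  # latest date for a 3-month period

def days_between(start, end):
    """Whole days from start to end."""
    return (end - start).days

# Commenting time if PDAM 1.3 appears at the end of November.
print(days_between(presumed_pdam13, meeting_start))
# Time left for the PDAM 1.2 dispositions if a 3-month period is required.
print(days_between(ballot_close, three_months_before))
# Length of that 3-month period in days.
print(days_between(three_months_before, meeting_start))
```

This gives 75 days of commenting time for an end-of-November PDAM 1.3, against 15 days for dispositions if the full three months (92 days) have to be preserved.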
Re: Code pages and Unicode
About the research works. I alone (with my colleagues) am researching the fact that Sumerian is Tamil / Tamil is Sumerian. This requires quite a lot of space. Additionally, I do research on the Tamil alphabet as based on scientific definitions: it only represents the mechanical parts, i.e. it only represents the places of articulation as alphabet and is not sound based. And what is called a mathematical multiplier theory on expanding the alphabets leads to not just long-mathematics (nedung kaNaku), but also to extra long mathematics. This is just a sample requirement from me and my colleagues. How many others are there who would require Unicode support? Do you think allocating 32,000 to the code page model would help? Regards Sinnathurai On 20 August 2011 09:31, Christoph Päper wrote: > Mark Davis ☕: > > > Under the original design principles of Unicode, the goal was a bit more > limited; we envisioned […] a generative mechanism for infrequent CJK > ideographs, > > I'd still like having that as an option.
Re: Code pages and Unicode
Mark Davis ☕: > Under the original design principles of Unicode, the goal was a bit more > limited; we envisioned […] a generative mechanism for infrequent CJK > ideographs, I'd still like having that as an option.