Re: about starting off

2002-09-19 Thread Edward H Trager


One more thing:

If you compile PHP with GD and FreeType2 support, you can generate .png
graphics with nicely antialiased text in many scripts on the fly.  Just
feed UTF-8 strings directly to the ImageFTText() function.  Take a look at
my test script at php.net under the ImageFTText() documentation
(http://www.php.net/manual/en/function.imagefttext.php) to see what you
can (and can't) do.

Of course this technique is useful for displaying non-Latin scripts
without having to worry about whether your users are using a supported
browser and have the necessary fonts.


On Thu, 19 Sep 2002, Edward H Trager wrote:

>
> Hi, Roslyn,
>
> The tools you choose might to some extent depend on your development
> environment.  Plan on using UTF-8 encoding for everything, so calculate
> database column widths that are wide enough to hold the UTF-8 strings.
> If you are using PHP on GNU/Linux or another *NIX environment, the
> following tools will certainly get you started in the right direction:
>
>  -- Yudit (www.yudit.org).
> This is a fantastic Unicode editor.  It has keyboard maps for just
> about every language imaginable, has correct shaping for Arabic and
> a number of Indic scripts, and even some handwriting recognition for
> Kanji/Hanzi.  Command-line tools are also provided for converting
> files in different encodings.  Of course UTF-8 is supported.
>
>  -- Latest version of Mozilla (www.mozilla.org).  Mozilla provides very
> good support for rendering a lot of scripts and is very
> standards-compliant, maybe the most standards-compliant
> browser available.
>
>  -- Edith (www.zfc.nl) is a possibly little-known editor for X11.  It is
> *not* Unicode-aware at all, but it has lots of other indispensable features
> for coding and development, such as regex-based searching and
> replacement, column-wise cut-and-paste, etc.
>
> What I do is type all non-ASCII strings in Yudit and save the file, write
> the ASCII PHP code in Edith (substitute your favorite editor here), open
> up the UTF-8 Yudit file in another Edith window, and copy and paste in the
> UTF-8 strings (which look awful in a non-unicode-aware editor, but a
> good editor doesn't mess with them).
>
>
> On Thu, 19 Sep 2002, roslyn jose wrote:
>
> >
> > hi,
> >
> > im new to unicode, and am working on a project in php/postgresql. i need
> > some info on how to start off with unicode. i went thro the web site and
> > only saw explanations on what it is, its char set,etc. do i need to
> > download or install anything to work with unicode, pls let me know soon.
> > and also once downloaded do i need to import any classes or files when
> > working with it, as im scripting in php and html. thanx
> >
> > regards,
> >
> > roslyn
> >
> >
> >
>
>
>





Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

2002-09-19 Thread William Overington

Kenneth Whistler wrote, as part of a longer response to my original posting.

>William Overington asked:

[snip]

>> I wonder if consideration could please be given as to whether this matter
>> should be left unregulated or whether some level of regulation should be
>> used.

>I think this should depend first on a determination of whether there
>is a demonstrated need for an actual representation of these sequences --
>which ought to be determined by the people responsible for the
>data stores which might contain them, namely the online bibliographic
>community.

[further remarks here snipped]

Actually, "this matter" to which I was intending to refer was as follows,
being more general than just the romanization of Cyrillic characters.

quote

It seems to me that this matter of sequences of combining characters being
used to give glyphs where different meanings are needed other than just
locally and that glyphs for such meanings are only correctly displayed if a
particular rendering system or a particular font are used touches at the
roots of the Unicode system.

It seems to me that the glyphs for such sequences are being left as if they
were a Private Use Area unregulated system.  I recognize that fonts have
glyph variations in that, say, an Arial letter b looks different to a
Bookman Old Style letter b, yet in that case the meaning is the same.

I wonder if consideration could please be given as to whether this matter
should be left unregulated or whether some level of regulation should be
used.

end quote

In another post in the same thread, Ken states as follows.

quote

But that wasn't my point. There is no particular evidence
that the ALA-LC conventions with the dot above the graphic
ligature ties is in widespread use for romanizations of these
particular languages, that I can see. So the *urgency* of
solving this problem isn't there, unless the LC/library/bibliographic
community comes to the UTC and indicates that they have a data interchange
problem with USMARC records using ANSEL that requires a clear
representation solution in Unicode.

end quote

The problem I would like to discuss is whether, under the present state of
the rules, any bibliographic community would need to approach the Unicode
Consortium over such a matter, and, if they would not need to do so, whether
it would be better to seek to change the rules now.

It is convenient to consider the situation in relation to the romanization
of Cyrillic characters, yet similar considerations may well also apply to
topics such as the Byzantine legal texts, and there may be other topics to
which they apply.

For example, please suppose that there were a committee called the
Romanization of Cyrillic Committee.  Suppose that that committee were to
have various meetings and decide that, for a ts romanization ligature, the
sequence

t U+FE20 s U+FE21

suits them fine, and that, for the ts with a dot above romanization
ligature, the sequence

t U+FE20 s U+FE21 U+0307

suits them fine, and that it then publishes a list of assignments and
example glyphs.  The glyph for the ts with a dot above ligature in that
publication has the dot above the curved line, centred horizontally.  It is
only later that someone with expert knowledge of the Unicode standard sees
the published list and notices that the glyph shown in the document is, in
fact, not the way that the glyph should appear according to the Unicode
standard.  By this time, many copies of the document have been published and
sent to libraries around the world!  Databases have started to be converted
to what that publication may well be calling "the new Unicode based system".
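
The two sequences in this example can be inspected with a short script.
What follows is only an illustrative sketch in Python (a language that does
not otherwise appear in this thread), using the standard unicodedata module.
The names TS_LIGATURE, TS_LIGATURE_DOT and describe are hypothetical.  The
sketch lists each code point with its character name and canonical combining
class, and checks whether NFC normalization leaves the sequence unchanged.

```python
import unicodedata

# The two sequences from the example, written with escapes so the
# combining marks stay visible in the source.
TS_LIGATURE = "t\uFE20s\uFE21"
TS_LIGATURE_DOT = "t\uFE20s\uFE21\u0307"


def describe(seq):
    """Return (code point, character name, combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in seq
    ]


for label, seq in [("ts ligature", TS_LIGATURE),
                   ("ts ligature with dot above", TS_LIGATURE_DOT)]:
    print(label)
    for code_point, name, ccc in describe(seq):
        print(f"  {code_point}  ccc={ccc:3d}  {name}")
    # Compare the sequence with its NFC-normalized form.
    unchanged = unicodedata.normalize("NFC", seq) == seq
    print("  unchanged by NFC:", unchanged)
```

Nothing in this output says where a renderer should place the dot relative
to the ligature tie.  That depends on the font and the rendering system,
which is the concern raised above.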

This might sound impossible, yet what is the present alternative?  There is
no way to formally register such sequences with the Unicode Consortium!

I suggest that it might be a good idea to have an infrastructure whereby the
Unicode Consortium registers sequences of combining characters and example
glyphs, categorized as to application.

This would have potentially far reaching benefits.

Suppose, for example, that such an infrastructure existed, and that there is
a mathematician, M, and a font designer, F, who do not know each other.

M is writing a research paper on a particular branch of mathematics, where
one of the key reference papers was written by an author whose name is
written in Cyrillic characters, yet which name also has a romanized version.
M finds that that romanization needs a character to represent the ts
romanization ligature.  M is keen to insert the ts ligature in a form
compatible with the standard bibliographic method for romanization of
Cyrillic names.  How can M, who is using a word processor to prepare the
research paper, insert that character into the document?

Fortunately, M finds that the word processor has available various special
characters and finds a ts ligature and inserts it in the document.  Behind
the scenes the wordprocessor softw

Re: about starting off

2002-09-19 Thread Barry Caplan

Roslyn,

I am working on a PostgreSQL database too - I haven't yet gotten to extensively testing 
the Unicode aspects, but be sure to set the character set of the database to Unicode 
when you create it. Otherwise all is probably lost - I don't know that you can simply 
change the character set later, and if you have to dump and import the data, you'd have 
to do some sort of conversion. Why bother making extra work for yourself?


As for the code in PHP (I am using Perl myself, and something similar applies), every 
time you manipulate text (every time!) get used to asking yourself if you (or PHP) are 
making any assumptions that one byte is the same as one character. The answer needs to 
be no, but will often be yes. Reconciling these issues is the bulk of making Unicode 
work for you.
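
The byte-versus-character assumption can be seen in a few lines. This is a sketch in 
Python rather than PHP or Perl, and the sample strings and the function name 
count_chars_and_bytes are hypothetical; it only illustrates that a UTF-8 string's byte 
count and character count differ for non-ASCII text, and that cutting at a byte offset 
can split a character.

```python
# Sketch only: one character is not always one byte in UTF-8.

def count_chars_and_bytes(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


for sample in ["hello", "na\u00efve", "Привет", "日本語"]:
    chars, nbytes = count_chars_and_bytes(sample)
    print(f"{sample!r}: {chars} characters, {nbytes} bytes")

# Truncating at a byte offset can cut a multi-byte character in half.
raw = "日本語".encode("utf-8")
cut = raw[:4]  # three bytes of the first character plus one of the second
print(cut.decode("utf-8", errors="replace"))
```

Code that truncates, pads, or indexes by byte count is where this assumption usually 
hides.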

Barry Caplan
Publisher, www.i18n.com


On Thu, 19 Sep 2002, roslyn jose wrote:

>>
>> hi,
>>
>> im new to unicode, and am working on a project in php/postgresql. i need
>> some info on how to start off with unicode. i went thro the web site and
>> only saw explanations on what it is, its char set,etc. do i need to
>> download or install anything to work with unicode, pls let me know soon.
>> and also once downloaded do i need to import any classes or files when
>> working with it, as im scripting in php and html. thanx
>>
>> regards,
>>
>> roslyn
>>





Re: about starting off

2002-09-19 Thread Edward H Trager


Hi, Roslyn,

The tools you choose might to some extent depend on your development
environment.  Plan on using UTF-8 encoding for everything, so calculate
database column widths that are wide enough to hold the UTF-8 strings.
If you are using PHP on GNU/Linux or another *NIX environment, the
following tools will certainly get you started in the right direction:

 -- Yudit (www.yudit.org).
This is a fantastic Unicode editor.  It has keyboard maps for just
about every language imaginable, has correct shaping for Arabic and
a number of Indic scripts, and even some handwriting recognition for
Kanji/Hanzi.  Command-line tools are also provided for converting
files in different encodings.  Of course UTF-8 is supported.

 -- Latest version of Mozilla (www.mozilla.org).  Mozilla provides very
good support for rendering a lot of scripts and is very
standards-compliant, maybe the most standards-compliant
browser available.

 -- Edith (www.zfc.nl) is a possibly little-known editor for X11.  It is
*not* Unicode-aware at all, but it has lots of other indispensable features
for coding and development, such as regex-based searching and
replacement, column-wise cut-and-paste, etc.

What I do is type all non-ASCII strings in Yudit and save the file, write
the ASCII PHP code in Edith (substitute your favorite editor here), open
up the UTF-8 Yudit file in another Edith window, and copy and paste in the
UTF-8 strings (which look awful in a non-unicode-aware editor, but a
good editor doesn't mess with them).
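
On the point above about calculating column widths, here is a rough sketch
of the arithmetic.  It is written in Python rather than PHP, and the
function names and sample values are hypothetical.  It relies on the fact
that UTF-8 as currently defined uses at most 4 bytes per code point.
Whether a given database measures column widths in bytes or in characters
is an assumption to check against that database's documentation.

```python
# Sketch only: estimate the storage needed for UTF-8 strings.

def utf8_len(text):
    """Number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def widest_utf8(strings):
    """Largest UTF-8 byte length among the given strings (0 if empty)."""
    return max((utf8_len(s) for s in strings), default=0)


def worst_case_bytes(max_chars, bytes_per_char=4):
    """Upper bound in bytes for a field limited to max_chars code points."""
    return max_chars * bytes_per_char


samples = ["Roslyn", "Привет", "日本語"]
for s in samples:
    print(f"{s!r}: {len(s)} characters, {utf8_len(s)} bytes")
print("widest sample:", widest_utf8(samples), "bytes")
print("worst case for 50 characters:", worst_case_bytes(50), "bytes")
```

Measuring real sample data gives a tighter figure than the worst case, but
the worst case is the safe bound if widths are counted in bytes.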


On Thu, 19 Sep 2002, roslyn jose wrote:

>
> hi,
>
> im new to unicode, and am working on a project in php/postgresql. i need
> some info on how to start off with unicode. i went thro the web site and
> only saw explanations on what it is, its char set,etc. do i need to
> download or install anything to work with unicode, pls let me know soon.
> and also once downloaded do i need to import any classes or files when
> working with it, as im scripting in php and html. thanx
>
> regards,
>
> roslyn
>
>
>






about starting off

2002-09-19 Thread roslyn jose
hi,
im new to unicode, and am working on a project in php/postgresql. i need some info on how to start off with unicode. i went thro the web site and only saw explanations on what it is, its char set,etc. do i need to download or install anything to work with unicode, pls let me know soon. and also once downloaded do i need to import any classes or files when working with it, as im scripting in php and html. thanx
regards,
roslyn