Re: about starting off
One more thing: if you compile PHP with GD and FreeType2 support, you can
generate .png graphics with nicely antialiased text in many scripts on the
fly. Just feed UTF-8 strings directly to the ImageFTText() function. Take a
look at my test script at php.net under the ImageFTText() documentation
(http://www.php.net/manual/en/function.imagefttext.php) to see what you can
(and can't) do.

Of course, this technique is useful for displaying non-Latin scripts without
having to worry about whether your users are using a supported browser and
have the necessary fonts.

On Thu, 19 Sep 2002, Edward H Trager wrote:

> Hi, Roslyn,
>
> The tools you choose might to some extent depend on your development
> environment. Using PHP on GNU/Linux or another *NIX environment, the
> following tools will certainly get you started in the right direction.
> Plan on using UTF-8 encoding for everything: so you need to calculate
> database column widths that will be wide enough to support the UTF-8
> strings:
>
> -- Yudit (www.yudit.org).
>    This is a fantastic Unicode editor. It has keyboard maps for just
>    about every language imaginable, has correct shaping for Arabic and
>    a number of Indic scripts, and even some handwriting recognition for
>    Kanji/Hanzi. Command-line tools are also provided for converting
>    files in different encodings. Of course UTF-8 is supported.
>
> -- Latest version of Mozilla (www.mozilla.org). Mozilla provides very
>    good support for rendering a lot of scripts and is very
>    standards-compliant, maybe the most standards-compliant
>    browser available.
>
> -- Edith (www.zfc.nl) is a possibly little-known editor for X11. It is
>    *not* unicode aware at all, but it has lots of other indispensable
>    features for coding and development, such as regex-based searching and
>    replacement, column-wise cut-and-paste, etc.
>
> What I do is type all non-ASCII strings in Yudit and save the file, write
> the ASCII PHP code in Edith (substitute your favorite editor here), open
> up the UTF-8 Yudit file in another Edith window, and copy and paste in the
> UTF-8 strings (which look awful in a non-unicode-aware editor, but a
> good editor doesn't mess with them).
>
> On Thu, 19 Sep 2002, roslyn jose wrote:
>
> > hi,
> >
> > im new to unicode, and am working on a project in php/postgresql. i need
> > some info on how to start off with unicode. i went thro the web site and
> > only saw explanations on what it is, its char set,etc. do i need to
> > download or install anything to work with unicode, pls let me know soon.
> > and also once downloaded do i need to import any classes or files when
> > working with it, as im scripting in php and html. thanx
> >
> > regards,
> >
> > roslyn
> >
> > -
> > Do you Yahoo!?
> > New DSL Internet Access from SBC & Yahoo!
Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)
Kenneth Whistler wrote, as part of a longer response to my original posting.

>William Overington asked:

[snip]

>> I wonder if consideration could please be given as to whether this matter
>> should be left unregulated or whether some level of regulation should be
>> used.

>I think this should depend first on a determination of whether there
>is a demonstrated need for an actual representation of these sequences --
>which ought to be determined by the people responsible for the
>data stores which might contain them, namely the online bibliographic
>community.

[further remarks here snipped]

Actually, "this matter" to which I was intending to refer was as follows,
being more general than just the romanization of Cyrillic characters.

quote

It seems to me that this matter of sequences of combining characters being
used to give glyphs where different meanings are needed other than just
locally and that glyphs for such meanings are only correctly displayed if a
particular rendering system or a particular font are used touches at the
roots of the Unicode system. It seems to me that the glyphs for such
sequences are being left as if they were a Private Use Area unregulated
system. I recognize that fonts have glyph variations in that, say, an Arial
letter b looks different to a Bookman Old Style letter b, yet in that case
the meaning is the same.

I wonder if consideration could please be given as to whether this matter
should be left unregulated or whether some level of regulation should be
used.

end quote

In another post in the same thread, Ken states as follows.

quote

But that wasn't my point. There is no particular evidence that the ALA-LC
conventions with the dot above the graphic ligature ties is in widespread use
for romanizations of these particular languages, that I can see.
So the *urgency* of solving this problem isn't there, unless the
LC/library/bibliographic community comes to the UTC and indicates that they
have a data interchange problem with USMARC records using ANSEL that requires
a clear representation solution in Unicode.

end quote

The question on which I am seeking discussion is whether, in the present
state of the rules, there would be any need for any bibliographic community
to approach the Unicode Consortium over such a matter, and, if they would not
need to do so, whether it would be better to seek to change the rules now.

It is convenient to consider the situation in relation to the romanization of
Cyrillic characters, yet similar considerations may well also apply to topics
such as the Byzantine legal texts. There may well be other topics to which
similar considerations apply.

For example, please suppose that there were a committee called the
Romanization of Cyrillic Committee. Suppose that that committee were to hold
various meetings and decide that, for a ts romanization ligature,
t U+FE20 s U+FE21 suits them fine, and that, for the ts with a dot above
romanization ligature, t U+FE20 s U+FE21 U+0307 suits them fine, and were
then to publish a list of assignments and example glyphs. The glyph for the
ts with a dot above ligature in that publication has the dot above the curved
line, centred horizontally. It is only later that someone with expert
knowledge of the Unicode standard sees the published list and notices that
the glyph shown in the document is, in fact, not the way that the glyph
should appear according to the Unicode standard. By this time, many copies of
the document have been published and sent to libraries around the world!
Databases have started to be converted to what that publication may well be
calling "the new Unicode based system".

This might sound impossible, yet what is the present alternative?
There is no way to formally register such sequences with the Unicode
Consortium!

I suggest that it might be a good idea to have an infrastructure whereby the
Unicode Consortium registers sequences of combining characters and example
glyphs, categorized as to application. This would have potentially
far-reaching benefits.

Suppose, for example, that such an infrastructure existed, and that there is
a mathematician, M, and a font designer, F, who do not know each other.

M is writing a research paper on a particular branch of mathematics, where
one of the key reference papers was written by an author whose name is
written in Cyrillic characters, yet which name also has a romanized version.
M finds that that romanization needs a character to represent the ts
romanization ligature. How can M, who is using a word processor to prepare
the research paper, insert that character into the document? M is keen to
insert the ts ligature in a form compatible with the standard bibliographic
method for romanization of Cyrillic names. Fortunately, M finds that the
word processor has available various special characters and finds a ts
ligature and inserts it in the document. Behind the scenes the word processor
softw
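For readers who want to see what the two sequences named in this message
contain, here is a minimal Python sketch. Python and the helper name
describe() are choices made for illustration only and are not part of the
original posting. It lists the code points, character names, and canonical
combining classes of t U+FE20 s U+FE21 and t U+FE20 s U+FE21 U+0307, using
the standard unicodedata module.

```python
import unicodedata

# The two sequences from the hypothetical committee's list in the message.
TS_LIGATURE = "t\uFE20s\uFE21"
TS_LIGATURE_DOT = "t\uFE20s\uFE21\u0307"


def describe(sequence):
    """Return (code point, character name, combining class) per character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.combining(ch))
        for ch in sequence
    ]


if __name__ == "__main__":
    for label, seq in (("ts", TS_LIGATURE), ("ts with dot above", TS_LIGATURE_DOT)):
        print(label)
        for code_point, name, ccc in describe(seq):
            print("  %s  %-30s ccc=%d" % (code_point, name, ccc))
```

The sketch only shows which characters each sequence is made of. It says
nothing about where a rendering system or font places the dot relative to
the ligature tie, which is the question raised in the message.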
Re: about starting off
Roslyn,

I am working on a PostgreSQL database too - I haven't yet gotten to
extensively testing the Unicode aspects, but be sure to set the character set
of the database to Unicode when you create it. Otherwise all is probably
lost - I don't know that you can simply change the char set later, and if you
have to dump and import the data, you'd have to do some sort of conversion.
Why bother making extra work for yourself?

As for the code in PHP (I am using Perl myself and something similar
applies): every time you manipulate text (every time!), get used to asking
yourself whether you (or PHP) are making any assumption that one byte is the
same as one character. The answer needs to be no, but will often be yes.
Reconciling these issues is the bulk of making Unicode work for you.

Barry Caplan
Publisher, www.i18n.com

On Thu, 19 Sep 2002, roslyn jose wrote:

>> hi,
>>
>> im new to unicode, and am working on a project in php/postgresql. i need
>> some info on how to start off with unicode. i went thro the web site and
>> only saw explanations on what it is, its char set,etc. do i need to
>> download or install anything to work with unicode, pls let me know soon.
>> and also once downloaded do i need to import any classes or files when
>> working with it, as im scripting in php and html. thanx
>>
>> regards,
>>
>> roslyn
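The byte-versus-character assumption described in this message can be shown
with a small sketch. It is written in Python purely for illustration (the
thread itself concerns PHP and Perl), and the function names are
hypothetical. It compares the number of characters (code points) in a string
with the number of UTF-8 bytes, and shows a truncation that does not cut a
multi-byte character in half.

```python
def char_and_byte_lengths(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


def truncate_utf8(text, max_bytes):
    """Truncate text to at most max_bytes UTF-8 bytes without splitting a character."""
    clipped = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops an incomplete multi-byte sequence left at the cut.
    return clipped.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    sample = "h\u00e9llo"  # "héllo": five characters, six UTF-8 bytes
    print(char_and_byte_lengths(sample))
    print(truncate_utf8(sample, 2))
```

Cutting the sample at two bytes with a plain byte slice would leave half of
the "é" behind; the sketch returns only the "h" instead. Note that a code
point is still not always what a reader perceives as one character, for
example when combining marks are involved.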
Re: about starting off
Hi, Roslyn,

The tools you choose might to some extent depend on your development
environment. Using PHP on GNU/Linux or another *NIX environment, the
following tools will certainly get you started in the right direction.
Plan on using UTF-8 encoding for everything, so you need to calculate
database column widths that will be wide enough to support the UTF-8
strings.

-- Yudit (www.yudit.org).
   This is a fantastic Unicode editor. It has keyboard maps for just
   about every language imaginable, has correct shaping for Arabic and
   a number of Indic scripts, and even some handwriting recognition for
   Kanji/Hanzi. Command-line tools are also provided for converting
   files in different encodings. Of course UTF-8 is supported.

-- Latest version of Mozilla (www.mozilla.org). Mozilla provides very
   good support for rendering a lot of scripts and is very
   standards-compliant, maybe the most standards-compliant
   browser available.

-- Edith (www.zfc.nl) is a possibly little-known editor for X11. It is
   *not* Unicode-aware at all, but it has lots of other indispensable
   features for coding and development, such as regex-based searching and
   replacement, column-wise cut-and-paste, etc.

What I do is type all non-ASCII strings in Yudit and save the file, write
the ASCII PHP code in Edith (substitute your favorite editor here), open
up the UTF-8 Yudit file in another Edith window, and copy and paste in the
UTF-8 strings (which look awful in a non-Unicode-aware editor, but a
good editor doesn't mess with them).

On Thu, 19 Sep 2002, roslyn jose wrote:

> hi,
>
> im new to unicode, and am working on a project in php/postgresql. i need
> some info on how to start off with unicode. i went thro the web site and
> only saw explanations on what it is, its char set,etc. do i need to
> download or install anything to work with unicode, pls let me know soon.
> and also once downloaded do i need to import any classes or files when
> working with it, as im scripting in php and html. thanx
>
> regards,
>
> roslyn
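The advice above about calculating column widths for UTF-8 strings can be
sketched roughly as follows. This is a Python illustration with hypothetical
function names and made-up sample strings, not something from the original
message. It assumes the width is counted in bytes; whether a given database
counts a column width in bytes or in characters is not stated in the thread
and should be checked for the system in use.

```python
def max_utf8_bytes(samples):
    """Return the largest UTF-8 byte length among the sample strings."""
    return max((len(s.encode("utf-8")) for s in samples), default=0)


def worst_case_bytes(max_chars, bytes_per_char=4):
    """Upper bound on bytes for max_chars code points.

    UTF-8 uses at most 4 bytes per code point, hence the default.
    """
    return max_chars * bytes_per_char


if __name__ == "__main__":
    # Illustrative sample values: ASCII, Greek, and Japanese text.
    samples = [
        "abc",
        "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",
        "\u65e5\u672c\u8a9e",
    ]
    print(max_utf8_bytes(samples))
    print(worst_case_bytes(8))
```

Measuring representative data gives a realistic width, while the worst-case
figure gives a safe upper bound when the data cannot be sampled in advance.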
about starting off
hi,

im new to unicode, and am working on a project in php/postgresql. i need
some info on how to start off with unicode. i went thro the web site and
only saw explanations on what it is, its char set,etc. do i need to
download or install anything to work with unicode, pls let me know soon.
and also once downloaded do i need to import any classes or files when
working with it, as im scripting in php and html. thanx

regards,

roslyn