Re: Hyphenation foundry [was: Re: proposed font project]
Simon Pepping wrote:
> Hi Clay,
>
> On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
>> It would also be good to develop some sort of hyphenation foundry...
>
> I think it is time to create a project for the hyphenation files at
> SourceForge. The project should be a home for all sorts of accessories
> to FOP, or even to FO processors in general.
>
> Do you want to participate? Do you know a nice name?

Hy-pe
Hy-Phi

Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Hyphenation foundry [was: Re: proposed font project]
J.Pietschmann wrote:
> Simon Pepping wrote:
>> I think it is time to create a project for the hyphenation files at
>> SourceForge. The project should be a home for all sorts of accessories
>> to FOP, or even to FO processors in general.
>>
>> Do you want to participate? Do you know a nice name?
>
> Well, sf.net would appeal to a larger body of developers, I think, and
> is certainly easier to manage for small projects, but we can also ask on
> jakarta-commons and xml-commons, or even declare it a FOP (or XML
> Graphics) subproject.
>
> Anyway, I just uploaded http://cvs.apache.org/~pietsch/t.tar.gz, which
> contains several pieces of unfinished work I produced last year:
> - Utilities to generate tables for the Unicode line break property

Does Character.UnicodeBlock provide any of this functionality?

Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html
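Peter's question can be checked in a few lines. The premise of this minimal sketch is that `java.lang.Character` exposes a code point's block (`Character.UnicodeBlock.of`) and its General_Category (`Character.getType`), but no accessor for the UAX #14 Line_Break property. The class name `BlockVsLineBreak` is hypothetical and used only for illustration.

```java
// Minimal sketch (hypothetical class): what java.lang.Character reports
// for two kinds of space character.
class BlockVsLineBreak {
    // Unicode block membership, e.g. BASIC_LATIN for 'A'.
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    // General_Category as a Character constant, e.g. SPACE_SEPARATOR (Zs).
    // Neither this nor the block identifies the UAX #14 line-break class.
    static int generalCategoryOf(int codePoint) {
        return Character.getType(codePoint);
    }
}
```

Both U+0020 SPACE and U+00A0 NO-BREAK SPACE come back as `SPACE_SEPARATOR`, yet UAX #14 assigns them different line-break classes (SP and GL). That distinction would still have to come from tables generated from the Unicode data files.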
Hyphenation foundry [was: Re: proposed font project]
Hi Clay,

On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
> It would also be good to develop some sort of hyphenation foundry...

I think it is time to create a project for the hyphenation files at
SourceForge. The project should be a home for all sorts of accessories to
FOP, or even to FO processors in general.

Do you want to participate? Do you know a nice name?

Regards,
Simon
--
Simon Pepping
home page: http://www.leverkruid.nl
Re: Hyphenation foundry [was: Re: proposed font project]
On Jun 16, 2004, at 12:20 PM, Simon Pepping wrote:
> Hi Clay,

Hi Simon!

> On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
>> It would also be good to develop some sort of hyphenation foundry...
>
> I think it is time to create a project for the hyphenation files at
> SourceForge. The project should be a home for all sorts of accessories
> to FOP, or even to FO processors in general.
>
> Do you want to participate? Do you know a nice name?
>
> Regards,
> Simon

Sure! I'd love to participate! I don't know how yet, though...

Ideas for names? I guess it depends on how 'we' want to position this
foundry. Is the foundry geared toward FOP users?

* fopstuff
* fop-stuff
* fostuff
* fo-stuff
* xslfostuff
* xsl-fo-stuff
* foptoys
* fop-toys
* fotoys
* fo-toys
* xslfotoys
* xsl-fo-toys
* fopaccessories
* fop-accessories
* foaccessories
* fo-accessories
* xslfoaccessories
* xsl-fo-accessories
* fopperipherals
* fop-peripherals
* foperipherals
* fo-peripherals
* xslfoperipherals
* xsl-fo-peripherals

I don't have a particular favorite, but with so many candidates it
wouldn't be very helpful if I didn't 'choose' one or two. I like the ones
*with* the hyphen (no pun intended! ;-), which makes them easier to read:

* xsl-fo-toys
* xsl-fo-stuff

In addition, since we want it to be of broader use (i.e., not just FOP),
I think we'd want to use one of the 'fo' or 'xsl-fo' prefixes (with or
without hyphens) over the 'fop'-based ones.

Hope this helps!

Web Maestro Clay
[EMAIL PROTECTED]
---
There are only 10 kinds of people in the world: those who understand
binary and those who don't.
Re: Hyphenation foundry [was: Re: proposed font project]
Simon Pepping wrote:
> I think it is time to create a project for the hyphenation files at
> SourceForge. The project should be a home for all sorts of accessories
> to FOP, or even to FO processors in general.
>
> Do you want to participate? Do you know a nice name?

Well, sf.net would appeal to a larger body of developers, I think, and is
certainly easier to manage for small projects, but we can also ask on
jakarta-commons and xml-commons, or even declare it a FOP (or XML Graphics)
subproject.

Anyway, I just uploaded http://cvs.apache.org/~pietsch/t.tar.gz, which
contains several pieces of unfinished work I produced last year:
- Utilities to generate tables for the Unicode line break property.
- A class keeping a line break state according to TR14, which should be
  easier to use than java.text.BreakIterator for FOP.
- A Java port of MySpell.
- An attempt at providing a layered hierarchy for spell checking and
  hyphenation interfaces.
- A Java port of the link grammar parser (incomplete, badly designed,
  buggy and without the approval of the original authors; *please* use it
  only for personal study and don't redistribute it).
- An attempt at a morphological analyzer for German words.

Somehow, the simple port of patgen as well as other attempts at
simplifying the current FOP hyphenator are missing; I hope I remember to
upload them tomorrow.

If someone wants some problems to chew on:
- Implementation of an optimized trie, ternary tree or PATRICIA tree.
  Issues here: the FOP implementation packs both tree construction and
  retrieval into a single class, while the data structure is WORM (write
  once, read many). Furthermore, while it is fast, it could be implemented
  with much less memory, especially peak memory during construction. I
  ultimately concluded that compiling the data into Java bytecode would be
  best. Consider inserting the words WORD and WORM. A PATRICIA tree would
  collapse this to

      root: WOR
        - leaf D
        - leaf M

  In order to map this, the root node gets an operation "match string"
  with the string WOR, leading to the subtree. Statistical compression
  could optimize the set of necessary operations, like switch array,
  match 2-char string, match 3-char string, match n-char string, etc.
  This may utilize BCEL.
- Institutionalized alphabet transformation. This is somewhat of a
  generalization of the hyphenation character classes. Java uses 16-bit
  characters, but in many languages it is rare that more than 256
  characters are actually used in words. TeX/PatGen also map the
  characters onto the numbers 1..N (N at most 256), folding character
  classification into the process. Mapping chars onto bytes saves almost
  half the memory. Because there are languages which require more than
  256 characters, at least two implementations of the trie/whatever
  holding the patterns are necessary, one where the keys are byte
  sequences, another with char sequences. Too bad generics aren't ready
  yet, but if the data is byte-compiled into a Java class, the compiler
  may analyze the patterns and decide whether bytes are sufficient.
  Stuff like Unicode character normalization should probably be folded
  into the classification/alphabet transformation too. It would be too
  bad if hyphenation failed because someone decided to use unnormalized
  characters like the FI LIGATURE (U+FB01).
- API design. We need a hierarchy of interfaces which allows polymorphism
  at various levels:
  + Hyphenator implementations: pattern hyphenator, dictionary
    hyphenator, composite hyphenator (delegates to a collection of child
    hyphenators).
  + Pattern hyphenator: pattern storage implementations: HashTable (very
    easy to understand but slow), R/W trie, optimized WORM class, ...
  + Dictionary hyphenator: dictionary ...
  For reuse in interactive applications, R/W storage may be useful (user
  dictionaries).
- Generalized line breaking strategies. Possible strategies:
  + naive: break before the first non-space after a space
  + TR14
  + break before any character
  + pattern-, regexp- or dictionary-based
- Other ideas: an API for processing the Unicode data files. Optimized
  compilation of Unicode properties into Java class data: select the
  properties you want and get them. Use this to get the latest Unicode
  data into your Java applications rather than the outdated stuff in the
  JRE.

J.Pietschmann
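The WORD/WORM collapse in the trie item of J.Pietschmann's message can be tried out in a few lines. This is a minimal sketch of a radix (compressed) trie. It is not FOP's ternary tree and not the bytecode compiler proposed in the message, and `RadixTree` and its methods are hypothetical names. Strictly speaking, a PATRICIA tree branches on bits. This character-level variant uses the same path-compression idea: edges carry whole strings, so inserting WORD and then WORM splits the single edge `WORD` into a shared `WOR` edge with leaves `D` and `M`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a radix (path-compressed) trie.
class RadixTree {
    private static final class Node {
        boolean terminal;                                   // a stored word ends here
        final Map<Character, Edge> edges = new TreeMap<>(); // keyed by first char of label
    }

    private static final class Edge {
        String label;
        Node target;
        Edge(String label, Node target) { this.label = label; this.target = target; }
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        int i = 0;
        while (i < word.length()) {
            Edge edge = node.edges.get(word.charAt(i));
            if (edge == null) {
                // No edge starts with this char: hang the whole remainder off a new leaf.
                Node leaf = new Node();
                leaf.terminal = true;
                node.edges.put(word.charAt(i), new Edge(word.substring(i), leaf));
                return;
            }
            int common = commonPrefix(edge.label, word, i); // at least 1, by the map key
            if (common < edge.label.length()) {
                // Split the edge: keep the shared prefix, push the rest one level down.
                Node mid = new Node();
                String rest = edge.label.substring(common);
                mid.edges.put(rest.charAt(0), new Edge(rest, edge.target));
                edge.label = edge.label.substring(0, common);
                edge.target = mid;
            }
            node = edge.target;
            i += common;
        }
        node.terminal = true;
    }

    boolean contains(String word) {
        Node node = root;
        int i = 0;
        while (i < word.length()) {
            Edge edge = node.edges.get(word.charAt(i));
            if (edge == null || !word.startsWith(edge.label, i)) return false;
            i += edge.label.length();
            node = edge.target;
        }
        return node.terminal;
    }

    // Renders edge labels as nested brackets, e.g. "[WOR[D, M]]".
    String dump() { return dump(root); }

    private static String dump(Node node) {
        if (node.edges.isEmpty()) return "";
        StringBuilder sb = new StringBuilder("[");
        String sep = "";
        for (Edge e : node.edges.values()) {
            sb.append(sep).append(e.label).append(dump(e.target));
            sep = ", ";
        }
        return sb.append(']').toString();
    }

    private static int commonPrefix(String label, String word, int from) {
        int n = 0;
        while (n < label.length() && from + n < word.length()
                && label.charAt(n) == word.charAt(from + n)) {
            n++;
        }
        return n;
    }
}
```

After inserting `WORD` and `WORM`, `dump()` returns `[WOR[D, M]]`. `contains("WOR")` is false because no word ends at the inner node. A WORM-oriented variant could freeze this structure into flat arrays once construction is finished, keeping only the retrieval half of the code.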
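The alphabet-transformation item can be illustrated with a small class that normalizes input, assigns each remaining character a code 1..N, and reports whether byte keys would suffice. This is a sketch under stated assumptions. The class `Alphabet` is hypothetical. NFKC is one possible choice of normalization form. Lowercasing stands in for TeX-style character classes; the message does not say which folding FOP would use.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of an alphabet transformation for pattern keys.
class Alphabet {
    private final Map<Character, Integer> codes = new LinkedHashMap<>();

    // Assigns codes 1..N in order of first appearance in the normalized words;
    // 0 stays free, e.g. for use as a terminator.
    Alphabet(Iterable<String> words) {
        for (String w : words) {
            for (char c : normalize(w).toCharArray()) {
                codes.putIfAbsent(c, codes.size() + 1);
            }
        }
    }

    // NFKC folds compatibility characters such as U+FB01 (LATIN SMALL LIGATURE FI)
    // into "fi"; lowercasing is an assumed stand-in for character classes.
    static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
    }

    int size() { return codes.size(); }

    // If every code fits in an unsigned byte, a byte[]-keyed trie is enough;
    // otherwise a char[]-keyed implementation would be needed.
    boolean fitsInBytes() { return codes.size() <= 255; }

    // Returns the encoded word (read codes back with & 0xFF), or null if the
    // word contains a character outside the alphabet.
    byte[] encodeAsBytes(String word) {
        if (!fitsInBytes()) throw new IllegalStateException("alphabet needs char keys");
        String n = normalize(word);
        byte[] out = new byte[n.length()];
        for (int i = 0; i < n.length(); i++) {
            Integer code = codes.get(n.charAt(i));
            if (code == null) return null;
            out[i] = (byte) code.intValue();
        }
        return out;
    }
}
```

Built from the words `fish` and `office`, the alphabet has 7 codes, and a word spelled with the FI LIGATURE (`\uFB01sh`) encodes to the same bytes as `fish`. The decision between byte and char keys is made once, from the data, as the message suggests a compiler could do.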
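The API-design item can be sketched as a small interface hierarchy. The names `Hyphenator`, `PatternStorage`, `HashMapPatternStorage`, `PatternHyphenator`, `DictionaryHyphenator` and `CompositeHyphenator` are hypothetical and are not FOP's actual classes. The pattern matching follows Liang's TeX algorithm in its simplest form: every substring of the dotted word is looked up, and an odd maximum value between two letters permits a break. TeX's minimum fragment lengths are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interfaces; these names are illustrative, not FOP's API.
interface Hyphenator {
    // Break offsets k ("break before word.charAt(k)"), or null if the
    // hyphenator knows nothing about the word.
    List<Integer> hyphenate(String word);
}

// Storage is separate so a HashMap, an R/W trie or a compiled WORM class
// could be swapped in without touching the hyphenation logic.
interface PatternStorage {
    int[] lookup(String letters); // inter-letter values, or null if absent
}

// The "very easy to understand but slow" variant.
class HashMapPatternStorage implements PatternStorage {
    private final Map<String, int[]> patterns = new HashMap<>();

    // Parses a TeX-style pattern such as "hy3ph" into "hyph" plus values.
    void add(String pattern) {
        StringBuilder letters = new StringBuilder();
        int[] values = new int[pattern.length() + 1];
        for (char c : pattern.toCharArray()) {
            if (c >= '0' && c <= '9') values[letters.length()] = c - '0';
            else letters.append(c);
        }
        patterns.put(letters.toString(), Arrays.copyOf(values, letters.length() + 1));
    }

    public int[] lookup(String letters) { return patterns.get(letters); }
}

// Liang-style matching: an odd maximum value between two letters allows a break.
// Minimum left/right fragment lengths (TeX's hyphenmin) are omitted for brevity.
class PatternHyphenator implements Hyphenator {
    private final PatternStorage storage;
    PatternHyphenator(PatternStorage storage) { this.storage = storage; }

    public List<Integer> hyphenate(String word) {
        String w = "." + word + ".";              // '.' marks the word edges
        int[] v = new int[w.length() + 1];        // v[p]: value before w.charAt(p)
        for (int i = 0; i < w.length(); i++) {
            for (int j = i + 1; j <= w.length(); j++) {
                int[] p = storage.lookup(w.substring(i, j));
                if (p == null) continue;
                for (int k = 0; k < p.length; k++) v[i + k] = Math.max(v[i + k], p[k]);
            }
        }
        List<Integer> breaks = new ArrayList<>();
        for (int k = 1; k < word.length(); k++) {
            if (v[k + 1] % 2 == 1) breaks.add(k); // word.charAt(k) == w.charAt(k + 1)
        }
        return breaks;
    }
}

// User dictionary with explicit entries such as "ta-ble".
class DictionaryHyphenator implements Hyphenator {
    private final Map<String, List<Integer>> entries = new HashMap<>();

    void add(String hyphenated) {
        List<Integer> breaks = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : hyphenated.toCharArray()) {
            if (c == '-') breaks.add(word.length());
            else word.append(c);
        }
        entries.put(word.toString(), breaks);
    }

    public List<Integer> hyphenate(String word) { return entries.get(word); }
}

// Delegates to its children in order and returns the first non-null answer.
class CompositeHyphenator implements Hyphenator {
    private final List<Hyphenator> children;
    CompositeHyphenator(List<Hyphenator> children) { this.children = children; }

    public List<Integer> hyphenate(String word) {
        for (Hyphenator h : children) {
            List<Integer> result = h.hyphenate(word);
            if (result != null) return result;
        }
        return null;
    }
}
```

With only the pattern `a1b`, `table` hyphenates as ta-ble (offset 2). Adding the inhibiting pattern `2bl` removes that break. A composite that consults a `DictionaryHyphenator` holding `ta-ble` first still returns offset 2 for `table`. Meanwhile, `cable` falls through to the pattern hyphenator and gets no break.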
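Several of the generalized line breaking strategies in the list fit behind one small interface. In this sketch, `LineBreakStrategy` and its implementations are hypothetical names. The naive and break-anywhere rules follow the descriptions in the list. The third class simply delegates to `java.text.BreakIterator.getLineInstance`, shown only as the JDK baseline the message compares against, not as a complete TR14 implementation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical interface: offsets i where a line may start (a break before text.charAt(i)).
interface LineBreakStrategy {
    List<Integer> breakOpportunities(String text);
}

// "Naive": break before the first non-space after a space.
class NaiveLineBreakStrategy implements LineBreakStrategy {
    public List<Integer> breakOpportunities(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            if (text.charAt(i - 1) == ' ' && text.charAt(i) != ' ') result.add(i);
        }
        return result;
    }
}

// "Break before any character": every interior position is allowed.
// Works on chars, so it would also split surrogate pairs.
class AnyCharLineBreakStrategy implements LineBreakStrategy {
    public List<Integer> breakOpportunities(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) result.add(i);
        return result;
    }
}

// Baseline: the JDK's line BreakIterator (not claimed to be a full TR14 implementation).
class JdkLineBreakStrategy implements LineBreakStrategy {
    public List<Integer> breakOpportunities(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text); // resets the iterator to the first boundary (offset 0)
        List<Integer> result = new ArrayList<>();
        // Collect interior boundaries; the final boundary at text.length() is skipped.
        for (int b = it.next(); b != BreakIterator.DONE && b < text.length(); b = it.next()) {
            result.add(b);
        }
        return result;
    }
}
```

For `"to be  or"` (two spaces before `or`), the naive strategy yields offsets 3 and 7. A pattern-, regexp- or dictionary-based strategy would be one more implementation of the same interface.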
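For the "API for processing the Unicode data files" idea, the first step is reading the semicolon-separated format of the Unicode Character Database. This sketch parses lines shaped like those in `LineBreak.txt`: a code point or `start..end` range, a semicolon, the property value, and an optional `#` comment. It answers lookups by binary search. `LineBreakTable` is a hypothetical name. A compiler into Java class data, as proposed in the message, would presumably emit these ranges as static arrays instead of parsing them at run time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parse "start..end;CLASS # comment" lines into sorted ranges.
class LineBreakTable {
    private final List<int[]> ranges = new ArrayList<>();   // {start, end}, inclusive
    private final List<String> classes = new ArrayList<>();

    void addLine(String line) {
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (data.isEmpty()) return;                          // comment or blank line
        String[] fields = data.split(";");
        String[] bounds = fields[0].trim().split("\\.\\.");
        int start = Integer.parseInt(bounds[0], 16);
        int end = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : start;
        ranges.add(new int[] {start, end});
        classes.add(fields[1].trim());
    }

    // Binary search; assumes lines were added in ascending code point order.
    String classOf(int codePoint) {
        int lo = 0, hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int[] r = ranges.get(mid);
            if (codePoint < r[0]) hi = mid - 1;
            else if (codePoint > r[1]) lo = mid + 1;
            else return classes.get(mid);
        }
        return null; // not listed in the parsed lines
    }
}
```

Fed the lines `0020;SP`, `0041..005A;AL` and `00A0;GL`, the table reports SP for U+0020, AL for `Q` and GL for U+00A0, and returns null for anything not covered by those lines.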
Re: proposed font project
On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
> Paul Tremblay said:
>> I have been hard at work collecting high-quality fonts from the web,
>> generating XML metrics from these fonts, creating fragments that one
>> can plug into the fop configuration file, and writing short fop-xml
>> files to display the qualities of these fonts. I think that FOP could
>> benefit from having a central place where one could download these
>> fonts and metrics files to get almost instant fonts. Does anyone else
>> feel this would be helpful? I had planned to make the fonts and
>> metrics files available on SourceForge. Any thoughts?
>>
>> Paul
>
> I was just thinking it would be good to add FONT resources to the FOP
> Resources or Fonts pages. This sounds like a fine idea to me. In
> addition, it might fit into the XML Graphics spinoff currently being
> discussed.
>
> It would also be good to develop some sort of hyphenation foundry...

I still intend to create a SourceForge project for the hyphenation files
that do not comply with the Apache license, and therefore cannot be made
available with the FOP code. Font resources with the same licensing
problem would naturally go there too.

Regards,
Simon
--
Simon Pepping
home page: http://www.leverkruid.nl
Re: proposed font project
Paul Tremblay said:
> I have been hard at work collecting high-quality fonts from the web,
> generating XML metrics from these fonts, creating fragments that one can
> plug into the fop configuration file, and writing short fop-xml files to
> display the qualities of these fonts. I think that FOP could benefit
> from having a central place where one could download these fonts and
> metrics files to get almost instant fonts. Does anyone else feel this
> would be helpful? I had planned to make the fonts and metrics files
> available on SourceForge. Any thoughts?
>
> Paul

I was just thinking it would be good to add FONT resources to the FOP
Resources or Fonts pages. This sounds like a fine idea to me. In addition,
it might fit into the XML Graphics spinoff currently being discussed.

It would also be good to develop some sort of hyphenation foundry...
--
Clay Leeds - [EMAIL PROTECTED]
Web Developer - Medata, Inc. - http://www.medata.com
PGP Public Key: https://mail.medata.com/pgp/cleeds.asc