Re: Hyphenation foundry [was: Re: proposed font project]

2004-06-17 Thread Peter B. West
Simon Pepping wrote:
> Hi Clay,
> On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
>> It would also be good to develop some sort of hyphenation foundry...
>
> I think it is time to create a project for the hyphenation files at
> Sourceforge. The project should be a home for all sorts of accessories
> to FOP, or even to FO processors in general. Do you want to
> participate? Do you know a nice name?

Hy-pe
Hy-Phi
Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Hyphenation foundry [was: Re: proposed font project]

2004-06-17 Thread Peter B. West
J.Pietschmann wrote:
> Simon Pepping wrote:
>> I think it is time to create a project for the hyphenation files at
>> Sourceforge. The project should be a home for all sorts of accessories
>> to FOP, or even to FO processors in general. Do you want to
>> participate? Do you know a nice name?
>
> Well, sf.net would appeal to a larger body of developers, I think,
> and is certainly easier to manage for small projects, but we
> can also ask on jakarta-commons, xml-commons and even declare it
> a FOP (or XML graphics) subproject.
> Anyway, I just uploaded
>  http://cvs.apache.org/~pietsch/t.tar.gz
> which contains various unfinished things I produced last year:
> - Utilities to generate tables for the Unicode line break property

Does Character.UnicodeBlock provide any of this functionality?
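For reference, a small Java sketch of what `Character.UnicodeBlock` reports: it only tells you which named block a character belongs to. A letter, a space and a hyphen-minus all fall in BASIC_LATIN, although TR14 gives them different line break classes (AL, SP and HY), so the block alone cannot stand in for the line break property. The class and helper names are illustrative only.

```java
/** Illustrative only: what Character.UnicodeBlock can and cannot tell us. */
public class UnicodeBlockDemo {

    /** Returns the block name for c, e.g. "BASIC_LATIN", or "NONE" if c is in no block. */
    static String blockOf(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == null ? "NONE" : block.toString();
    }

    public static void main(String[] args) {
        // 'A' (TR14 class AL), ' ' (SP) and '-' (HY) share one block, so the
        // block cannot drive line breaking decisions on its own.
        char[] samples = {'A', ' ', '-', '\u3042'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %s%n", (int) c, blockOf(c));
        }
    }
}
```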
Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html


Hyphenation foundry [was: Re: proposed font project]

2004-06-16 Thread Simon Pepping
Hi Clay,

On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
> It would also be good to develop some sort of hyphenation foundry...

I think it is time to create a project for the hyphenation files at
Sourceforge. The project should be a home for all sorts of accessories
to FOP, or even to FO processors in general. Do you want to
participate? Do you know a nice name?

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl





Re: Hyphenation foundry [was: Re: proposed font project]

2004-06-16 Thread Clay Leeds
On Jun 16, 2004, at 12:20 PM, Simon Pepping wrote:
> Hi Clay,
Hi Simon!
> On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
>> It would also be good to develop some sort of hyphenation foundry...
> I think it is time to create a project for the hyphenation files at
> Sourceforge. The project should be a home for all sorts of accessories
> to FOP, or even to FO processors in general. Do you want to
> participate? Do you know a nice name?
> Regards, Simon

Sure! I'd love to participate! I don't know how yet, though...
Ideas for names? I guess it depends on how 'we' want to position this 
foundry. Is the foundry geared toward FOP users?

* fopstuff
* fop-stuff
* fostuff
* fo-stuff
* xslfostuff
* xsl-fo-stuff
* foptoys
* fop-toys
* fotoys
* fo-toys
* xslfotoys
* xsl-fo-toys
* fopaccessories
* fop-accessories
* foaccessories
* fo-accessories
* xslfoaccessories
* xsl-fo-accessories
* fopperipherals
* fop-peripherals
* foperipherals
* fo-peripherals
* xslfoperipherals
* xsl-fo-peripherals
I don't have a particular favorite, although since there are so many, 
it wouldn't be very helpful if I didn't 'choose' one or two. I like the 
ones *with* the hyphen (no pun intended! ;-) -- which makes it easier 
to read):

* xsl-fo-toys
* xsl-fo-stuff
In addition, since we want it to be of broader use (i.e., not just 
FOP), I would think we'd want to use one of the 'fo' or 'xsl-fo' 
prefixes (with or without hyphens) over the 'fop' based ones.

Hope this helps!
Web Maestro Clay [EMAIL PROTECTED]
---
There are only 10 kinds of people in the world: those who understand 
binary and those who don't.



Re: Hyphenation foundry [was: Re: proposed font project]

2004-06-16 Thread J.Pietschmann
Simon Pepping wrote:
> I think it is time to create a project for the hyphenation files at
> Sourceforge. The project should be a home for all sorts of accessories
> to FOP, or even to FO processors in general. Do you want to
> participate? Do you know a nice name?

Well, sf.net would appeal to a larger body of developers, I think,
and is certainly easier to manage for small projects, but we
can also ask on jakarta-commons, xml-commons and even declare it
a FOP (or XML graphics) subproject.
Anyway, I just uploaded
 http://cvs.apache.org/~pietsch/t.tar.gz
which contains various unfinished things I produced last year:
- Utilities to generate tables for the Unicode line break property
- A class keeping a line break state according to TR14, which should
  be easier to use than java.text.BreakIterator for FOP
- A Java port of MySpell
- An attempt at providing a layered hierarchy for spell checking
 and hyphenation interfaces.
- A Java port of the link grammar parser (incomplete, badly designed,
 buggy and without the approval of the original authors; *please* use
 it only for personal study, don't redistribute).
- An attempt at a morphological analyzer for german words.
Somehow, the simple port of patgen, as well as other attempts at
simplifying the current FOP hyphenator, are missing; I hope I
remember to upload them tomorrow.
If someone wants some problems to chew on:
- Implementation of an optimized trie, ternary search tree or PATRICIA tree.
 Issues here: the FOP implementation packs both tree construction and
 retrieval into a single class, while the data structure is WORM (write
 once, read many). Furthermore, while it is fast, it could be implemented
 with much less memory, especially peak memory during construction. I
 ultimately concluded that compiling the data into Java bytecode would be
 best. Consider inserting the words WORD and WORM. A PATRICIA tree would
 collapse this to
   root: WOR - leaf D
             - leaf M
 To map this, the root node gets a "match string" operation with the
 string WOR, leading to the subtree. Statistical compression could pick
 the cheapest operation for each node: switch array, match 2-char string,
 match 3-char string, match n-char string, etc. This may utilize BCEL.
- Institutionalized alphabet transformation. This is somewhat of a
 generalization of the hyphenation character classes. Java uses 16-bit
 characters, but in many languages it is rare for more than 256
 distinct characters to actually be used in words. TeX/PatGen also map
 the characters onto the numbers 1..N (N <= 256), folding character
 classification into the process. Mapping chars onto bytes saves almost
 half the memory. Because some languages require more than 256
 characters, at least two implementations of the trie/whatever
 holding the patterns are necessary: one where the keys are byte
 sequences, another with char sequences. Too bad generics aren't ready
 yet, but if the data is compiled into a Java class, the compiler
 may analyze the patterns and decide whether bytes are sufficient.
 Stuff like Unicode character normalization should probably be folded
 into the classification/alphabet transformation too. It would be too
 bad if hyphenation failed because someone decided to use unnormalized
 characters like FI LIGATURE.
- API design. We need a hierarchy of interfaces which allows polymorphism
 at various levels:
  + Hyphenator
    implementations: pattern hyphenator, dictionary hyphenator,
    composite hyphenator (delegates to a collection of child
    hyphenators)
  + Pattern hyphenator - pattern storage
    implementations: hash table (very easy to understand but slow),
    R/W trie, optimized WORM class, ...
  + Dictionary hyphenator - dictionary ...
 For reuse in interactive applications, R/W storage may be useful (user
 dictionaries).
- Generalized line breaking strategies. Possible strategies
 + naive, break before the first non-space after a space
 + TR14
 + break before any character
 + pattern, regexp or dictionary based
- Other ideas: an API for processing the Unicode data files. Optimized
 compilation of Unicode properties into Java class data: select the
 properties you want, get them. Use this to get the latest Unicode data
 into your Java applications rather than the outdated data in the
 JRE.
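The PATRICIA-tree item above can be illustrated with a minimal radix (compressed) trie in Java. This is a hypothetical sketch, not FOP's ternary-tree-based hyphenation trie: every edge carries a string label, and inserting WORD and then WORM splits the shared edge so the root holds a single WOR edge leading to the leaves D and M, as in the diagram. It is a read/write structure; a compiled WORM form as discussed would only need the lookup half.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical radix (PATRICIA-style) tree; a sketch, not FOP's hyphenation trie. */
public class RadixTree {

    private static final class Node {
        final Map<Character, Edge> edges = new TreeMap<>(); // keyed by first char of label
        boolean terminal; // true if an inserted key ends here
    }

    private static final class Edge {
        String label;
        Node target;

        Edge(String label, Node target) {
            this.label = label;
            this.target = target;
        }
    }

    private final Node root = new Node();

    public void insert(String key) {
        Node node = root;
        int i = 0;
        while (i < key.length()) {
            Edge edge = node.edges.get(key.charAt(i));
            if (edge == null) {
                // No edge shares the next character: hang the whole rest on one edge.
                Node leaf = new Node();
                leaf.terminal = true;
                node.edges.put(key.charAt(i), new Edge(key.substring(i), leaf));
                return;
            }
            int common = 0; // length of the shared prefix of label and remaining key
            while (common < edge.label.length() && i + common < key.length()
                    && edge.label.charAt(common) == key.charAt(i + common)) {
                common++;
            }
            if (common < edge.label.length()) {
                // Split the edge: WORD + WORM gives WOR -> {D, M}.
                Node middle = new Node();
                middle.edges.put(edge.label.charAt(common),
                        new Edge(edge.label.substring(common), edge.target));
                edge.label = edge.label.substring(0, common);
                edge.target = middle;
            }
            node = edge.target;
            i += common;
        }
        node.terminal = true;
    }

    public boolean contains(String key) {
        Node node = root;
        int i = 0;
        while (i < key.length()) {
            Edge edge = node.edges.get(key.charAt(i));
            if (edge == null || !key.startsWith(edge.label, i)) {
                return false;
            }
            i += edge.label.length();
            node = edge.target;
        }
        return node.terminal;
    }

    /** Renders edge labels with children in parentheses, e.g. "WOR(D M)". */
    public String dump() {
        return dump(root);
    }

    private static String dump(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Edge e : node.edges.values()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.label);
            if (!e.target.edges.isEmpty()) {
                sb.append('(').append(dump(e.target)).append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RadixTree tree = new RadixTree();
        tree.insert("WORD");
        tree.insert("WORM");
        System.out.println(tree.dump()); // prints WOR(D M)
    }
}
```

The printed `WOR(D M)` is the collapsed form from the diagram: one shared edge, then one single-letter edge per word.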
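The alphabet-transformation item can be sketched as follows, under a few assumptions: words are first normalized with `java.text.Normalizer` using NFKC (which turns U+FB01 LATIN SMALL LIGATURE FI into "fi") and lower-cased, then each distinct character gets a code from 1 to 255 so a pattern store could be keyed on bytes. The class name, the NFKC choice and the 255-code limit are illustrative; note that `java.text.Normalizer` only appeared in Java 6, after this thread.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical alphabet transformation: normalized characters -> codes 1..255. */
public class Alphabet {
    private static final int MAX_CODES = 255; // code 0 stays free, e.g. as a terminator
    private final Map<Character, Integer> codes = new HashMap<>();

    /** Folds compatibility characters and case before any mapping happens. */
    static String normalize(String word) {
        // NFKC turns U+FB01 (LATIN SMALL LIGATURE FI) into "fi".
        return Normalizer.normalize(word, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
    }

    /** Encodes a word as one byte per normalized char, assigning codes on first sight. */
    public byte[] encode(String word) {
        String s = normalize(word);
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            Integer code = codes.get(c);
            if (code == null) {
                if (codes.size() == MAX_CODES) {
                    // Such a language would need the char-keyed variant of the pattern store.
                    throw new IllegalStateException("alphabet needs more than " + MAX_CODES + " codes");
                }
                code = codes.size() + 1;
                codes.put(c, code);
            }
            out[i] = (byte) code.intValue(); // 128..255 wrap negative; read back with & 0xFF
        }
        return out;
    }

    public int size() {
        return codes.size();
    }
}
```

Encoding "\uFB01nal" yields five bytes for "final", so the ligature no longer hides the "fi" from the hyphenation patterns.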
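The API-design item lends itself to a sketch of the interface hierarchy. Everything below is hypothetical (none of these types exist in FOP): a `Hyphenator` interface with pattern, dictionary and composite implementations, and a `PatternStore` interface so the pattern storage can be swapped; here a `HashMap` stands in for the "hash table" option. The pattern hyphenator follows Frank Liang's TeX scheme in its simplest form: collect the digit values of every matching pattern, keep the maximum at each position, and allow a break where the value is odd. Words are assumed to be lower-cased already, and the minimum fragment length of 2 is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hyphenator hierarchy; none of these types exist in FOP. */
public class HyphenationApi {

    /** Top level: returns break offsets, e.g. [2] for "ta-ble". */
    public interface Hyphenator {
        int[] hyphenate(String word);
    }

    /** Pluggable pattern storage: hash map, R/W trie, compiled WORM class, ... */
    public interface PatternStore {
        int[] get(String letters); // digit values of a pattern, or null
    }

    /** Simplest store: a HashMap from pattern letters to digit values. */
    public static final class HashPatternStore implements PatternStore {
        private final Map<String, int[]> map = new HashMap<>();

        /** Adds a TeX-style pattern such as "a1b" (letters "ab", values [0, 1, 0]). */
        public void add(String pattern) {
            StringBuilder letters = new StringBuilder();
            int[] digits = new int[pattern.length() + 1];
            for (char c : pattern.toCharArray()) {
                if (Character.isDigit(c)) digits[letters.length()] = c - '0';
                else letters.append(c);
            }
            map.put(letters.toString(), Arrays.copyOf(digits, letters.length() + 1));
        }

        public int[] get(String letters) { return map.get(letters); }
    }

    /** Liang-style pattern hyphenator: an odd value between two letters allows a break. */
    public static final class PatternHyphenator implements Hyphenator {
        private static final int LEFT_MIN = 2, RIGHT_MIN = 2; // arbitrary minimum fragment sizes
        private final PatternStore store;

        public PatternHyphenator(PatternStore store) { this.store = store; }

        public int[] hyphenate(String word) { // word assumed to be lower-case already
            String w = "." + word + ".";      // dots mark the word boundaries, as in TeX patterns
            int[] v = new int[w.length() + 1];
            for (int i = 0; i < w.length(); i++) {
                for (int j = i + 1; j <= w.length(); j++) {
                    int[] d = store.get(w.substring(i, j));
                    if (d == null) continue;
                    for (int k = 0; k < d.length; k++) v[i + k] = Math.max(v[i + k], d[k]);
                }
            }
            List<Integer> points = new ArrayList<>();
            for (int p = LEFT_MIN; p <= word.length() - RIGHT_MIN; p++) {
                if (v[p + 1] % 2 == 1) points.add(p); // v[p + 1] lies just before word.charAt(p)
            }
            return points.stream().mapToInt(Integer::intValue).toArray();
        }
    }

    /** Dictionary hyphenator: explicit entries such as "ta-ble". */
    public static final class DictionaryHyphenator implements Hyphenator {
        private final Map<String, int[]> entries = new HashMap<>();

        public void add(String hyphenated) {
            List<Integer> points = new ArrayList<>();
            for (int i = 0, pos = 0; i < hyphenated.length(); i++) {
                if (hyphenated.charAt(i) == '-') points.add(pos); else pos++;
            }
            entries.put(hyphenated.replace("-", ""), points.stream().mapToInt(Integer::intValue).toArray());
        }

        public int[] hyphenate(String word) { return entries.getOrDefault(word, new int[0]); }
    }

    /** Composite: delegates to its children; the first non-empty answer wins. */
    public static final class CompositeHyphenator implements Hyphenator {
        private final List<Hyphenator> children;

        public CompositeHyphenator(Hyphenator... children) { this.children = Arrays.asList(children); }

        public int[] hyphenate(String word) {
            for (Hyphenator h : children) {
                int[] points = h.hyphenate(word);
                if (points.length > 0) return points;
            }
            return new int[0];
        }
    }
}
```

With the pattern `a1b` and the dictionary entry `ta-ble`, a composite of (dictionary, patterns) returns `[2]` both for "table" (from the dictionary) and for "cabs" (from the pattern, giving ca-bs). Putting the dictionary first lets user exceptions override the patterns.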
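The generalized line-breaking strategies could share one small interface. In this hypothetical Java sketch, each strategy returns the offsets at which a new line may begin. The naive and break-anywhere strategies follow the list directly; as a stand-in for TR14 it delegates to the JDK's `BreakIterator.getLineInstance`, whose rules resemble TR14 but are not guaranteed to implement it exactly. A pattern, regexp or dictionary based strategy would plug in the same way.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical pluggable line breaking strategies. */
public class LineBreaking {

    /** Returns offsets at which a new line may start (excluding 0 and text.length()). */
    public interface LineBreakStrategy {
        List<Integer> breakOpportunities(String text);
    }

    /** Naive: break before the first non-space character that follows a space. */
    public static final class NaiveStrategy implements LineBreakStrategy {
        public List<Integer> breakOpportunities(String text) {
            List<Integer> result = new ArrayList<>();
            for (int i = 1; i < text.length(); i++) {
                if (text.charAt(i - 1) == ' ' && text.charAt(i) != ' ') result.add(i);
            }
            return result;
        }
    }

    /** Break before any character; steps by UTF-16 unit, so surrogate pairs are not respected. */
    public static final class AnyCharStrategy implements LineBreakStrategy {
        public List<Integer> breakOpportunities(String text) {
            List<Integer> result = new ArrayList<>();
            for (int i = 1; i < text.length(); i++) result.add(i);
            return result;
        }
    }

    /** Delegates to the JDK's line BreakIterator (similar to, but not guaranteed to be, TR14). */
    public static final class JdkStrategy implements LineBreakStrategy {
        private final Locale locale;

        public JdkStrategy(Locale locale) { this.locale = locale; }

        public List<Integer> breakOpportunities(String text) {
            BreakIterator it = BreakIterator.getLineInstance(locale);
            it.setText(text);
            List<Integer> result = new ArrayList<>();
            for (int b = it.next(); b != BreakIterator.DONE; b = it.next()) {
                if (b < text.length()) result.add(b); // skip the end-of-text boundary
            }
            return result;
        }
    }
}
```

For "ab cd  ef" the naive strategy yields offsets 3 and 7: runs of spaces stay on the preceding line, and the break falls before the next word.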
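For the Unicode-data item (and the line break property tables mentioned earlier in the mail), here is a hypothetical sketch of compiling `LineBreak.txt`-style lines from the Unicode Character Database into a sorted range table with binary-search lookup. The assumed input format is `code point or range ; class # comment`; the sample lines in `main` are abbreviated, not copied from a particular Unicode release, and mapping unlisted code points to `XX` is a simplification of the file's real default rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical compiler of LineBreak.txt-style data into a compact range table. */
public class LineBreakTable {
    private final int[] starts;     // first code point of each range
    private final int[] ends;       // last code point of each range (inclusive)
    private final String[] classes; // line break class of each range, e.g. "AL"

    private LineBreakTable(int[] starts, int[] ends, String[] classes) {
        this.starts = starts;
        this.ends = ends;
        this.classes = classes;
    }

    /** Parses "0041..005A;AL # ..." or "0020 ; SP" lines; assumes ascending, non-overlapping ranges. */
    public static LineBreakTable parse(List<String> lines) {
        List<int[]> ranges = new ArrayList<>();
        List<String> names = new ArrayList<>();
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) continue; // comment-only or blank line
            String[] fields = line.split(";");
            String cps = fields[0].trim();
            int dots = cps.indexOf("..");
            int start = Integer.parseInt(dots < 0 ? cps : cps.substring(0, dots), 16);
            int end = dots < 0 ? start : Integer.parseInt(cps.substring(dots + 2), 16);
            ranges.add(new int[] {start, end});
            names.add(fields[1].trim());
        }
        int n = ranges.size();
        int[] s = new int[n];
        int[] e = new int[n];
        for (int i = 0; i < n; i++) {
            s[i] = ranges.get(i)[0];
            e[i] = ranges.get(i)[1];
        }
        return new LineBreakTable(s, e, names.toArray(new String[0]));
    }

    /** Binary search over the ranges; unlisted code points get "XX" (a simplification). */
    public String classOf(int codePoint) {
        int lo = 0;
        int hi = starts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (codePoint < starts[mid]) hi = mid - 1;
            else if (codePoint > ends[mid]) lo = mid + 1;
            else return classes[mid];
        }
        return "XX";
    }

    public static void main(String[] args) {
        LineBreakTable t = parse(Arrays.asList(
                "# abbreviated sample in LineBreak.txt syntax",
                "0020;SP # SPACE",
                "002D;HY # HYPHEN-MINUS",
                "0030..0039     ; NU # DIGIT ZERO..DIGIT NINE",
                "0041..005A;AL # LATIN CAPITAL LETTER A..Z"));
        System.out.println(t.classOf('A') + " " + t.classOf(' ') + " " + t.classOf('-')); // prints AL SP HY
    }
}
```

Emitting the `starts`/`ends`/`classes` arrays as Java source (or bytecode) would be one way to realize the "select the properties you want, get them" idea.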
J.Pietschmann


Re: proposed font project

2004-06-01 Thread Simon Pepping
On Sat, May 29, 2004 at 10:02:37PM -0700, Clay Leeds wrote:
> Paul Tremblay said:
>> I have been hard at work collecting high-quality fonts from the web,
>> generating XML metrics from these fonts, creating fragments that one
>> can plug into the fop configuration file, and writing short fop-xml files
>> to display the qualities of these fonts.
>>
>> I think that FOP could benefit from having a central place where
>> one could download these fonts and metrics files to get almost instant
>> fonts.
>>
>> Does anyone else feel this would be helpful? I had planned to make the
>> fonts and metrics files available on Sourceforge. Any thoughts?
>>
>> Paul
>
> I was just thinking it would be good to add FONT resources to the FOP
> Resources or Fonts pages. This sounds like a fine idea to me. In addition,
> it might fit into the XML Graphics spinoff currently being discussed.
>
> It would also be good to develop some sort of hyphenation foundry...

I still intend to create a SourceForge project for the hyphenation
files that do not comply with the Apache license, and therefore cannot
be made available with the FOP code. Font resources with the same
licensing problem would naturally go there too.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl





Re: proposed font project

2004-05-30 Thread Clay Leeds
Paul Tremblay said:
> I have been hard at work collecting high-quality fonts from the web,
> generating XML metrics from these fonts, creating fragments that one
> can plug into the fop configuration file, and writing short fop-xml files
> to display the qualities of these fonts.
>
> I think that FOP could benefit from having a central place where
> one could download these fonts and metrics files to get almost instant
> fonts.
>
> Does anyone else feel this would be helpful? I had planned to make the
> fonts and metrics files available on Sourceforge. Any thoughts?
>
> Paul

I was just thinking it would be good to add FONT resources to the FOP
Resources or Fonts pages. This sounds like a fine idea to me. In addition,
it might fit into the XML Graphics spinoff currently being discussed.

It would also be good to develop some sort of hyphenation foundry...

-- 
Clay Leeds - [EMAIL PROTECTED]
Web Developer - Medata, Inc. - http://www.medata.com
PGP Public Key: https://mail.medata.com/pgp/cleeds.asc

