Hi,

I wanted to let anyone using HFST tools for morphology know about a tool
that I have been preparing as the last phase of my GSoC project:
hfst-fst2strings. It plays a role similar to lt-expand, in that it dumps
the transductions (input/output string pairs) recognized by a transducer.
It can evaluate and filter flag diacritics, and it can restrict the dump
to paths whose surface or output form is shorter than a given length
and/or matches a given prefix. For example, one could use it like:

hfst-fst2strings -ef -c 0 -P <lemma> <transducer>

to extract strings whose analysis begins with <lemma> while evaluating and
stripping out any flag diacritics. Results are less than ideal for
transducers that produce compounds, since a compounding loop can make the
set of strings enormous or even infinite. However, the "-l" and "-L" flags
for limiting the input/output strings to a specified length can help deal
with that, and I have a couple of other ideas that would be simple to
implement and would make the tool even more useful.
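To make the idea concrete, here is a minimal Python sketch of what such a dump does: a depth-first walk over a transducer that evaluates and strips flag diacritics, prunes on an output prefix, and uses input/output length bounds so that a cyclic (compounding) transducer still yields a finite list. This is not the hfst-fst2strings code or the HFST API. The dict-based transducer, the function names expand/apply_flag, and the toy lexicon are hypothetical; flag semantics are simplified (P, C, U, R, D with positive values only, flags assumed identical on both sides), and lengths are counted in symbols.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical toy representation (NOT the HFST API):
# state -> list of (input_symbol, output_symbol, target_state); "" is epsilon.
Arc = Tuple[str, str, int]
Fst = Dict[int, List[Arc]]
EPS = ""


def is_flag(sym: str) -> bool:
    """Treat symbols shaped like @OP.FEATURE[.VALUE]@ as flag diacritics."""
    return len(sym) > 2 and sym.startswith("@") and sym.endswith("@") and "." in sym


def apply_flag(sym: str, flags: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return updated feature settings, or None if the flag blocks the path.

    Simplified semantics (assumption): only P, C, U, R and D with positive
    values; negative (@N...@) settings are not modelled.
    """
    op, feat, *rest = sym.strip("@").split(".")
    val = rest[0] if rest else None
    cur = flags.get(feat)
    if op == "P":  # positive set
        return {**flags, feat: val}
    if op == "C":  # clear
        return {k: v for k, v in flags.items() if k != feat}
    if op == "U":  # unify: succeed if unset or already equal
        return {**flags, feat: val} if cur in (None, val) else None
    if op == "R":  # require (any value if none given)
        ok = cur is not None if val is None else cur == val
        return flags if ok else None
    if op == "D":  # disallow (any value if none given)
        hit = cur is not None if val is None else cur == val
        return None if hit else flags
    raise ValueError(f"unsupported flag diacritic: {sym}")


def expand(fst: Fst, finals: Set[int], start: int = 0, max_in: int = 10,
           max_out: int = 10, prefix: str = "") -> List[Tuple[str, str]]:
    """Enumerate (input, output) pairs, evaluating and stripping flags.

    The prefix is matched on the output side; the length bounds are what
    keep cyclic (e.g. compounding) transducers from running forever.
    """
    found: Set[Tuple[str, str]] = set()

    def dfs(state: int, ins: Tuple[str, ...], outs: Tuple[str, ...],
            flags: Dict[str, str], on_path: FrozenSet[tuple]) -> None:
        key = (state, frozenset(flags.items()), len(ins), len(outs))
        if key in on_path:  # epsilon/flag cycle that makes no progress
            return
        on_path = on_path | {key}
        out_str = "".join(outs)
        # Prune as soon as the output can no longer match the prefix.
        if not (out_str.startswith(prefix) or prefix.startswith(out_str)):
            return
        if state in finals and out_str.startswith(prefix):
            found.add(("".join(ins), out_str))
        for in_sym, out_sym, target in fst.get(state, []):
            if is_flag(in_sym):  # evaluate the flag, never emit it
                new_flags = apply_flag(in_sym, flags)
                if new_flags is not None:
                    dfs(target, ins, outs, new_flags, on_path)
                continue
            new_ins = ins + ((in_sym,) if in_sym != EPS else ())
            new_outs = outs + ((out_sym,) if out_sym != EPS else ())
            if len(new_ins) <= max_in and len(new_outs) <= max_out:
                dfs(target, new_ins, new_outs, flags, on_path)

    dfs(start, (), (), {}, frozenset())
    return sorted(found)


# Toy compounding analyser (hypothetical): surface symbols in, analyses out.
# "pup" may only appear as a non-initial compound part (@R.CMP.ON@).
TOY: Fst = {
    0: [("cat", "cat", 1), ("dog", "dog", 1), ("@R.CMP.ON@", "@R.CMP.ON@", 5)],
    1: [(EPS, "+N", 2), ("@P.CMP.ON@", "@P.CMP.ON@", 3)],
    3: [(EPS, "+", 0)],  # compound boundary loops back: infinite language
    5: [("pup", "pup", 1)],
}

print(expand(TOY, {2}, max_in=2, max_out=4, prefix="dog"))
# → [('dog', 'dog+N'), ('dogcat', 'dog+cat+N'), ('dogdog', 'dog+dog+N'), ('dogpup', 'dog+pup+N')]
```

Without the length bounds the walk over TOY would never finish, and loosening them makes the number of compounds grow quickly with each extra part, which is the compounding problem the length limits are meant to contain.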

A caveat is that the tool is part of HFST3 (only available from SVN),
which does not yet have the lexc and twol tools, so the package as a whole
is not yet ready to simply replace the current toolchain for those who
build their transducers that way.

--Brian Croom
_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff