[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703670#action_12703670 ]
Felipe Sánchez Martínez commented on LUCENE-1284:
-------------------------------------------------
Hi,
I think that the tool's reliance on an external free/open-source package to
pre-process the files to be indexed should not be an obstacle for the
community to benefit from it; the world is pretty heterogeneous ;).
Furthermore, those external tools are not required at search time.
> Felipe, although Java equivalents of those command-line tools don't exist
> currently, do you think one could implement them in Java (and release them
> under ASL)?
This year the Apertium project is in the Google Summer of Code. A student will
port the lttoolbox package to Java. Note that the tool I contributed also uses
the Apertium tagger, and that the tagger will not be ported; fortunately, the
use of the tagger is optional. The Java version of lttoolbox will be released
under the GPL; I am not sure whether they will agree to dual-license it.
--
Felipe
> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1284
> URL: https://issues.apache.org/jira/browse/LUCENE-1284
> Project: Lucene - Java
> Issue Type: New Feature
> Environment: New feature developed under GNU/Linux, but it should
> work on any other Java-compliant platform
> Reporter: Felipe Sánchez Martínez
> Assignee: Otis Gospodnetic
> Attachments: apertium-morph.0.9.0.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org). Morphological information is used to
> index new documents and to process smarter queries in which morphological
> attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for
> the open-source machine translation platform Apertium (http://apertium.org)
> and, optionally, the part-of-speech taggers developed for it. Currently there
> are morphological dictionaries available for Spanish, Catalan, Galician,
> Portuguese, Aranese, Romanian, French and English. In addition, new
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish,
> Danish, Welsh, Polish and Italian, among others; we hope more language pairs
> will be added to the Apertium machine translation platform in the near future.