[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698732#action_12698732 ]

Felipe Sánchez Martínez commented on LUCENE-1284:
-------------------------------------------------

Hi Otis,

The Java code I contributed is dual-licensed under the ASL and GPLv2. The Apertium
tools and data are GPLv2.


>  Why are they in pairs? Is that simply for the translation part of Apertium, 
> and  something that's ignored when you use the pair for Lucene and 
> morphological analysis?

Yes, they are language pairs because of the translation. If you are not 
interested in translation (as is our case), you can use any language pair 
that contains the language you are interested in. Choose the pair with the 
largest number of lemmata, which is probably the one with the highest version number.
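A minimal Java sketch of that rule of thumb follows. It assumes, hypothetically, that the installed pairs are known by package names of the form `apertium-<l1>-<l2>-<version>` with a purely numeric dotted version (e.g. `apertium-es-ca-1.0.6`). Since the lemma count is not visible from the name, the sketch falls back on the version number. `PairChooser` and `choosePair` are illustrative names, not part of the contributed classes.

```java
import java.util.List;
import java.util.Optional;

public class PairChooser {

    // Compares dotted numeric versions component by component, so "1.0.10" > "1.0.9".
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    // Hypothetical naming scheme "apertium-<l1>-<l2>-<version>": returns the
    // highest-versioned pair containing the requested language, if any.
    static Optional<String> choosePair(List<String> packages, String lang) {
        String best = null;
        String bestVersion = null;
        for (String p : packages) {
            String[] parts = p.split("-");
            if (parts.length != 4 || !parts[0].equals("apertium")) continue; // not a pair name
            if (!parts[1].equals(lang) && !parts[2].equals(lang)) continue;  // wrong languages
            if (best == null || compareVersions(parts[3], bestVersion) > 0) {
                best = p;
                bestVersion = parts[3];
            }
        }
        return Optional.ofNullable(best);
    }
}
```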

> Do you mind replacing the deprecated Hits object in the Searcher class?

Which class should I use instead?

> Could you explain why the removal of multiword expressions is needed?

Multiword units need to be removed from the dictionary mainly because their 
purpose is to ensure that certain expressions are translated correctly into the 
target language. This is not specific to Spanish and should be done for every language.
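To make the idea concrete, here is a minimal Java sketch that drops multiword entries from an in-memory word list. It is only an illustration: it does not read real lttoolbox dictionaries, and both the `Entry` class and the rule "an entry is a multiword unit if its surface form or lemma contains a space" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiwordFilter {

    // Hypothetical, simplified dictionary entry: a surface form and its lemma.
    static final class Entry {
        final String surface;
        final String lemma;

        Entry(String surface, String lemma) {
            this.surface = surface;
            this.lemma = lemma;
        }
    }

    // Treats an entry as a multiword unit if either side contains an internal space
    // (e.g. "a pesar de").
    static boolean isMultiword(Entry e) {
        return e.surface.trim().contains(" ") || e.lemma.trim().contains(" ");
    }

    // Keeps only single-word entries, preserving their original order.
    static List<Entry> removeMultiwords(List<Entry> entries) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (!isMultiword(e)) kept.add(e);
        }
        return kept;
    }
}
```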


> So these are a few command-line tools that end up marking up the input text 
> with POS? 

Yes. 
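For reference, as far as I know the analyser (lt-proc) writes each word as a lexical unit `^surface/analysis1/analysis2$`, where each analysis is a lemma followed by tags in angle brackets and unknown words are marked with `*`. A minimal Java sketch that reads such markup back into lemmas and tags follows. The class name and the example tags are illustrative. It assumes analyser output in which the surface form comes first, with no escaped `^`, `$`, `/` or `<` characters in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamParser {

    // One reading of a word: a lemma plus its morphological tags.
    static final class Analysis {
        final String lemma;
        final List<String> tags;

        Analysis(String lemma, List<String> tags) {
            this.lemma = lemma;
            this.tags = tags;
        }

        @Override
        public String toString() {
            return lemma + tags;
        }
    }

    // One lexical unit "^...$" and one tag "<...>".
    private static final Pattern UNIT = Pattern.compile("\\^([^$]*)\\$");
    private static final Pattern TAG = Pattern.compile("<([^>]*)>");

    // Returns, for every lexical unit in the stream, its list of analyses.
    // Unknown words ("*xyz") yield a single analysis with no tags.
    static List<List<Analysis>> parse(String stream) {
        List<List<Analysis>> units = new ArrayList<>();
        Matcher m = UNIT.matcher(stream);
        while (m.find()) {
            String[] parts = m.group(1).split("/");
            List<Analysis> analyses = new ArrayList<>();
            // parts[0] is the surface form; the remaining parts are analyses.
            for (int i = 1; i < parts.length; i++) {
                String a = parts[i];
                int lt = a.indexOf('<');
                String lemma = lt < 0 ? a : a.substring(0, lt);
                if (lemma.startsWith("*")) lemma = lemma.substring(1);
                List<String> tags = new ArrayList<>();
                Matcher t = TAG.matcher(a);
                while (t.find()) tags.add(t.group(1));
                analyses.add(new Analysis(lemma, tags));
            }
            units.add(analyses);
        }
        return units;
    }
}
```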

> I seem to be missing some libraries and can't compile Apertium locally to 
> check what this marked-up file looks like.

You need to install lttoolbox; you can download it from the Apertium web page.

> But my main question here is whether there are Java equivalents of these 
> command-line tools,

Unfortunately, no :(
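Until such equivalents exist, one workaround is to call the command-line tools from Java as external processes. Below is a minimal sketch using the standard `ProcessBuilder` API. The command passed in (for example `lt-proc` with a compiled dictionary such as `es-ca.automorf.bin`) is an assumption about the local installation, and `ExternalAnalyser` is a hypothetical helper, not part of the contributed classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ExternalAnalyser {

    // Pipes `input` through an external command and returns its standard output.
    // The command is an assumption about the local setup,
    // e.g. List.of("lt-proc", "es-ca.automorf.bin").
    static String run(List<String> command, String input) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show the tool's errors on our stderr
        Process p = pb.start();

        // Write stdin on a separate thread so a full stdout pipe cannot deadlock us.
        Thread writer = new Thread(() -> {
            try (OutputStream out = p.getOutputStream()) {
                out.write(input.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                // The process may exit before consuming all input; its exit status is checked below.
            }
        });
        writer.start();

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            in.transferTo(buf);
        }
        writer.join();

        int status = p.waitFor();
        if (status != 0) {
            throw new IOException(command + " exited with status " + status);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Writing the input on its own thread matters because the tool may fill the output pipe before it has read all of its input.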

Regards.
--
Felipe

> Set of Java classes that allow the Lucene search engine to use morphological 
> information developed for the Apertium open-source machine translation 
> platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1284
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: New feature developed under GNU/Linux, but it should 
> work on any other Java-compliant platform
>            Reporter: Felipe Sánchez Martínez
>            Assignee: Otis Gospodnetic
>         Attachments: apertium-morph.0.9.0.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological 
> information developed for the Apertium open-source machine translation 
> platform (http://www.apertium.org). Morphological information is used to 
> index new documents and to process smarter queries in which morphological 
> attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for 
> the open-source machine translation platform Apertium (http://apertium.org) 
> and, optionally, the part-of-speech taggers developed for it. Currently there 
> are morphological dictionaries available for Spanish, Catalan, Galician, 
> Portuguese, Aranese, Romanian, French and English. In addition, new 
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish, 
> Danish, Welsh, Polish and Italian, among others; we hope more language pairs 
> will be added to the Apertium machine translation platform in the near future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

