[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Dybizbański updated LUCENE-2341:
---------------------------------------

    Attachment: LUCENE-2341.diff

Dawid, I'm attaching a patch with the suggested changes:

1. MorfologikAnalyzer no longer uses a LowerCaseFilter. When IStemmer.lookup 
returns an empty list for a token in its original case, a second lookup is made 
for the lowercased token. I hope the test case reflects your intentions.

2. I've added a MorfologikPOSAttributeImpl class that exposes the 
morphosyntactic annotation of each lemma, obtained with WordData.getTag(). A 
test gives potential users a short illustration. Two notes here:
  a) Since MorfologikPOSAttribute might go unused, I've implemented it in terms 
of CharSequence (not String), to avoid converting each POS tag to a String 
prematurely.
  b) Currently the default POS (for a non-lemmatized token) is an empty String; 
a null value might be more distinctive if empty POS tags were allowed for 
lemmas.
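
For readers who have not opened the diff, the fallback described in item 1 can 
be sketched roughly as follows. This is an illustration only, not the code in 
the patch: CaseFallbackLookup is a hypothetical name, the Function parameter is 
a stand-in for IStemmer.lookup, and lemmas are plain Strings here rather than 
WordData entries.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Illustrative sketch of the "retry in lowercase" lookup (hypothetical class). */
final class CaseFallbackLookup {
    private CaseFallbackLookup() {}

    /**
     * Looks the token up exactly as written; only if that yields nothing,
     * retries with the lowercased form.
     *
     * @param dictionary hypothetical stand-in for IStemmer.lookup
     * @param token      the token in its original case
     * @return lemmas for the cased token, else for the lowercased token,
     *         else an empty list
     */
    static List<String> lookup(Function<String, List<String>> dictionary, String token) {
        List<String> lemmas = dictionary.apply(token);
        if (!lemmas.isEmpty()) {
            return lemmas; // the cased form is known, e.g. a proper noun
        }
        String lower = token.toLowerCase(Locale.ROOT);
        if (lower.equals(token)) {
            return lemmas; // already lowercase, a second lookup cannot help
        }
        return dictionary.apply(lower);
    }
}
```

With a toy dictionary that knows "Warszawa" and "dom", looking up "Warszawa" 
hits on the first try, while a sentence-initial "Dom" misses and is then found 
as "dom".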
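
The trade-off in notes 2a and 2b can be sketched in a similar way. PosTagHolder 
below is a hypothetical, Lucene-free stand-in and not the 
MorfologikPOSAttributeImpl from the patch; it only shows a tag being kept as a 
CharSequence until a caller asks for a String, and an empty default standing for 
"no lemma found".

```java
/** Illustrative holder for a POS tag (hypothetical, not the class in the patch). */
final class PosTagHolder {
    // Empty default: used for tokens that were not lemmatized (note 2b).
    private CharSequence tag = "";

    /** Stores the tag as-is, without forcing a conversion to String (note 2a). */
    void setTag(CharSequence tag) {
        this.tag = (tag == null) ? "" : tag;
    }

    /** Returns the tag; callers that need a String call toString() themselves. */
    CharSequence getTag() {
        return tag;
    }

    /** True if a non-empty tag is present. */
    boolean hasTag() {
        return tag.length() > 0;
    }

    /** Resets to the default, as would be done before processing the next token. */
    void clear() {
        tag = "";
    }
}
```

In this sketch an empty tag cannot be told apart from "not lemmatized", which is 
the ambiguity note 2b refers to; a null default would separate the two cases at 
the cost of null checks in callers.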

BTW, the patch deletes one line from dev-tools/eclipse/dot.classpath 
(<classpathentry kind="src" path="modules/queries/src/test"/>) - was that 
intentional?


> explore morfologik integration
> ------------------------------
>
>                 Key: LUCENE-2341
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2341
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Robert Muir
>            Assignee: Dawid Weiss
>         Attachments: LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.diff, 
> LUCENE-2341.diff, LUCENE-2341.patch, morfologik-fsa-1.5.2.jar, 
> morfologik-polish-1.5.2.jar, morfologik-stemming-1.5.2.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


