[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michał Dybizbański updated LUCENE-2341:
---------------------------------------
Attachment: LUCENE-2341.diff
Dawid, I'm attaching a patch with the suggested changes:
1. MorfologikAnalyzer no longer uses a LowerCaseFilter. When IStemmer.lookup
returns an empty list for a token in its original case, a second lookup is made
for its lowercased form. I hope the test case reflects your intentions.
2. I've added a MorfologikPOSAttributeImpl class that provides the morphosyntactic
annotation for each lemma, obtained with WordData.getTag(). A test gives potential
users a short overview of how to use it. Two notes here:
a) Since MorfologikPOSAttribute might go unused, I've implemented it in terms
of CharSequence (and not String), so that each POS tag is not prematurely
converted to a String.
b) Currently the default POS (for a token that could not be lemmatized) is an
empty String; however, a null value might be more distinctive if empty POS tags
were allowed for lemmas.
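Points 1 and 2 above can be illustrated with a small, self-contained Java sketch. It is not the code from the attached patch: Stemmer and Entry are hypothetical stand-ins for morfologik's IStemmer and WordData (so the sketch runs without the morfologik jars), it does not model Lucene's attribute API, lowercasing with Locale.ROOT is an assumption, and the dictionary entries and tags are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch (not the code from LUCENE-2341.diff) of two behaviours:
 * 1. look a token up as-is and retry lowercased only when nothing was found;
 * 2. keep POS tags as CharSequence so no String is built unless someone asks.
 */
public class MorfologikLookupSketch {

    /** Hypothetical stand-in for morfologik's WordData: a lemma plus its POS tag. */
    public static final class Entry {
        public final CharSequence stem;
        public final CharSequence tag;

        public Entry(CharSequence stem, CharSequence tag) {
            this.stem = stem;
            this.tag = tag;
        }
    }

    /** Hypothetical stand-in for morfologik's IStemmer: unknown words yield an empty list. */
    public interface Stemmer {
        List<Entry> lookup(CharSequence word);
    }

    /** Looks the token up with its original casing first; lowercases only as a fallback. */
    public static List<Entry> lookupWithFallback(Stemmer stemmer, String token) {
        List<Entry> entries = stemmer.lookup(token);
        if (entries.isEmpty()) {
            // Assumption: Locale.ROOT lowercasing; a real analyzer may choose differently.
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.equals(token)) {
                entries = stemmer.lookup(lower);
            }
        }
        return entries;
    }

    /** POS tag of the i-th lemma, or "" (the default described above) when there is none. */
    public static CharSequence posTag(List<Entry> entries, int i) {
        return i < entries.size() ? entries.get(i).tag : "";
    }

    /** Map-backed Stemmer used only for the demo and tests. */
    public static Stemmer mapStemmer(Map<String, List<Entry>> dict) {
        return word -> dict.getOrDefault(word.toString(), Collections.<Entry>emptyList());
    }

    public static void main(String[] args) {
        Map<String, List<Entry>> dict = new HashMap<>();
        // Illustrative entries; the tags only imitate a colon-separated tag style.
        dict.put("kota", Collections.singletonList(new Entry("kot", "subst:sg:gen:m2")));
        dict.put("Warszawa", Collections.singletonList(new Entry("Warszawa", "subst:sg:nom:f")));
        Stemmer stemmer = mapStemmer(dict);

        for (String token : new String[] {"Kota", "Warszawa", "xyz"}) {
            List<Entry> entries = lookupWithFallback(stemmer, token);
            CharSequence stem = entries.isEmpty() ? token : entries.get(0).stem;
            // toString() on the tag is only triggered here, by string concatenation.
            System.out.println(token + " -> " + stem + " [" + posTag(entries, 0) + "]");
        }
    }
}
```

Running main prints "Kota -> kot [subst:sg:gen:m2]", "Warszawa -> Warszawa [subst:sg:nom:f]" and "xyz -> xyz []". Trying the original casing first lets capitalized dictionary entries (such as proper nouns) match directly, while the lowercase retry still covers sentence-initial words; returning tags as CharSequence means no String is created for callers that never read them.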
BTW, the patch deletes one line from dev-tools/eclipse/dot.classpath
(<classpathentry kind="src" path="modules/queries/src/test"/>). Was that
intentional?
> explore morfologik integration
> ------------------------------
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Robert Muir
> Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.diff,
> LUCENE-2341.diff, LUCENE-2341.patch, morfologik-fsa-1.5.2.jar,
> morfologik-polish-1.5.2.jar, morfologik-stemming-1.5.2.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option
> for users.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira