[
https://issues.apache.org/jira/browse/LUCENE-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-6833:
--------------------------------
Description:
This is a follow-up to Uwe's work on LUCENE-6774.
This patch updates the code to use Morfologik stemming version 2.0.1, which
removes the "automatic" lookup of classpath-relative dictionary resources in
favor of an explicit InputStream or URL. User code is therefore explicitly
responsible for providing these resources, handling missing files, etc.
Morfologik had no "default" dictionaries other than the Polish one, so I
also removed a number of attributes from the filter code that were, to me,
confusing.
* {{MorfologikFilterFactory}} now accepts an (optional) {{dictionary}}
attribute which contains an explicit name of the dictionary resource to load.
The resource is loaded with a {{ResourceLoader}} passed to the {{inform(..)}}
method, so the final location depends on the resource loader.
* There is no way to load the dictionary and its metadata separately (doing so
isn't useful at all).
* If the {{dictionary}} attribute is missing, the filter loads the Polish
dictionary by default (since most people would be using Morfologik for stemming
Polish anyway).
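
The lookup rules in the list above can be sketched roughly as follows. This is only an illustration, not the code from the patch: {{DictionaryAttributeSketch}}, its {{Loader}} interface (a stand-in for a resource loader), the in-memory loader and the {{polish.dict}} default name are all hypothetical. Only the {{dictionary}} attribute name comes from the description.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are Lucene or Morfologik classes.
final class DictionaryAttributeSketch {
  // The attribute name is taken from the issue description.
  static final String DICTIONARY_ATTRIBUTE = "dictionary";
  // Placeholder name for the bundled Polish dictionary (an assumption).
  static final String DEFAULT_DICTIONARY = "polish.dict";

  /** Stand-in for a resource loader; decides where a name actually resolves. */
  interface Loader {
    InputStream openResource(String name) throws IOException;
  }

  /** Returns the explicit attribute value, or the default if it is missing. */
  static String resolveDictionaryResource(Map<String, String> args) {
    String name = args.get(DICTIONARY_ATTRIBUTE);
    return name == null ? DEFAULT_DICTIONARY : name;
  }

  /** Opens the resolved resource; a missing resource surfaces as IOException. */
  static InputStream openDictionary(Map<String, String> args, Loader loader)
      throws IOException {
    return loader.openResource(resolveDictionaryResource(args));
  }

  /** Toy loader backed by a map, used here instead of a classpath or file loader. */
  static Loader inMemoryLoader(Map<String, byte[]> resources) {
    return name -> {
      byte[] data = resources.get(name);
      if (data == null) {
        throw new IOException("Resource not found: " + name);
      }
      return new ByteArrayInputStream(data);
    };
  }

  public static void main(String[] unused) throws IOException {
    Map<String, byte[]> resources = new HashMap<>();
    resources.put(DEFAULT_DICTIONARY, "pl".getBytes(StandardCharsets.UTF_8));
    resources.put("custom.dict", "custom".getBytes(StandardCharsets.UTF_8));
    Loader loader = inMemoryLoader(resources);

    Map<String, String> explicit = new HashMap<>();
    explicit.put(DICTIONARY_ATTRIBUTE, "custom.dict");
    try (InputStream in = openDictionary(explicit, loader)) {
      System.out.println("explicit attribute: " + in.available() + " bytes");
    }
    try (InputStream in = openDictionary(new HashMap<>(), loader)) {
      System.out.println("default fallback: " + in.available() + " bytes");
    }
  }
}
```

Keeping name resolution separate from the loader mirrors the point made above: the attribute only names the resource, and the loader decides where that name ends up pointing.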
This patch is *not* backward compatible, but it attempts to provide useful
feedback on initialization: if any of the removed attributes are used, the
feedback points to this JIRA issue, so it should be clear what to change and how.
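
One way such initialization feedback could look is sketched below. It is not the code from the patch: the class name, the exception type and the two removed attribute names are placeholders (the description does not list the removed attributes), and only the issue URL is taken from this message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not the actual MorfologikFilterFactory code.
final class RemovedAttributeCheckSketch {
  static final String ISSUE_URL = "https://issues.apache.org/jira/browse/LUCENE-6833";

  // Placeholder names only; the real removed attributes are not listed here.
  static final Set<String> REMOVED_ATTRIBUTES =
      new LinkedHashSet<>(Arrays.asList("old-attribute-one", "old-attribute-two"));

  /** Fails fast if any removed attribute is present, pointing at the issue. */
  static void rejectRemovedAttributes(Map<String, String> args) {
    // Sorted copy, so the message is stable regardless of map ordering.
    Set<String> offending = new TreeSet<>(args.keySet());
    offending.retainAll(REMOVED_ATTRIBUTES);
    if (!offending.isEmpty()) {
      throw new IllegalArgumentException(
          "Attributes no longer supported: " + offending
              + ". See " + ISSUE_URL + " for what to change.");
    }
  }

  public static void main(String[] unused) {
    // Accepted: only the supported attribute is present.
    rejectRemovedAttributes(Collections.singletonMap("dictionary", "custom.dict"));
    try {
      rejectRemovedAttributes(Collections.singletonMap("old-attribute-one", "x"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```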
> Upgrade morfologik to version 2.0.1, simplify MorfologikFilter's dictionary
> lookup
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-6833
> URL: https://issues.apache.org/jira/browse/LUCENE-6833
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: Trunk