kuromoji dictionary could be more compact
-----------------------------------------
Key: LUCENE-3699
URL: https://issues.apache.org/jira/browse/LUCENE-3699
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Attachments: LUCENE-3699.patch
Reading through the ipadic documentation, I realized we are storing a lot of
redundant information. For example, the connection costs for bigram weights are
based on POS+inflection data, so it's redundant to also encode POS and
inflection data separately for each entry.

With the patch, dictionary access is also faster and simpler, and
TokenInfoDictionary is 1.5MB smaller.
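A minimal Java sketch of the idea, assuming (as described above) that each connection-cost context ID is determined by an entry's POS+inflection data. The class `CompactDictSketch`, its methods, and the sample IDs, costs, and feature strings are all hypothetical; they are not taken from the patch or from Kuromoji's actual classes. Each entry keeps only its surface form, context IDs, and word cost, while POS and inflection strings are stored once per left context ID and recovered through that ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the LUCENE-3699 patch): entries carry no POS or
// inflection fields; those strings live in a side table indexed by left ID.
class CompactDictSketch {

    // One dictionary entry, without its own POS/inflection data.
    static final class Entry {
        final String surface;
        final int leftId;   // row index into the connection-cost matrix
        final int rightId;  // column index into the connection-cost matrix
        final int wordCost;

        Entry(String surface, int leftId, int rightId, int wordCost) {
            this.surface = surface;
            this.leftId = leftId;
            this.rightId = rightId;
            this.wordCost = wordCost;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    // Shared side table: POS+inflection features, stored once per left ID.
    private final String[] featuresByLeftId;

    CompactDictSketch(int numLeftIds) {
        featuresByLeftId = new String[numLeftIds];
    }

    /** Adds an entry and returns its word ID (its index in the entry list). */
    int addEntry(String surface, int leftId, int rightId, int wordCost, String posAndInflection) {
        String existing = featuresByLeftId[leftId];
        if (existing == null) {
            featuresByLeftId[leftId] = posAndInflection;
        } else if (!existing.equals(posAndInflection)) {
            // The layout relies on each left ID implying exactly one feature set.
            throw new IllegalArgumentException("left id " + leftId + " already maps to " + existing);
        }
        entries.add(new Entry(surface, leftId, rightId, wordCost));
        return entries.size() - 1;
    }

    /** Recovers POS+inflection for a word through its left context ID. */
    String getPosAndInflection(int wordId) {
        return featuresByLeftId[entries.get(wordId).leftId];
    }

    /** Number of feature strings actually stored (one per used left ID). */
    int storedFeatureCount() {
        int count = 0;
        for (String f : featuresByLeftId) {
            if (f != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CompactDictSketch dict = new CompactDictSketch(16);
        // Made-up IDs, costs and feature strings, for illustration only.
        dict.addEntry("tokyo", 3, 3, 3000, "noun,proper,*,*");
        int kyoto = dict.addEntry("kyoto", 3, 3, 3100, "noun,proper,*,*");
        dict.addEntry("taberu", 7, 7, 5000, "verb,independent,ichidan,base");
        System.out.println(dict.getPosAndInflection(kyoto)); // noun,proper,*,*
        System.out.println(dict.storedFeatureCount());       // 2
    }
}
```

Here three entries share two stored feature strings; on a full dictionary the saving grows with the number of entries per context ID. The check in `addEntry` makes the underlying assumption explicit: if two entries with the same left ID ever carried different POS/inflection data, this layout would lose information.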