[ https://issues.apache.org/jira/browse/SOLR-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169371#comment-13169371 ]
Robert Muir commented on SOLR-2968:
-----------------------------------

yeah but the HunspellDictionary really is ridiculous if you try to use a large dictionary with it, even without cutting over to an FST it could probably be improved.

for minority languages without really nice dictionaries it probably doesn't matter much, but for the languages with really nice dictionaries you also tend to have language-specific options available.

just another crazy idea: I don't know how much of morfologik is dependent upon Polish itself, but if it already knows how to compile ispell/hunspell into an efficient form and work with it, maybe we should just be seeing if we can 'generalize' that and work it from that angle.

> Hunspell very high memory use when loading dictionary
> -----------------------------------------------------
>
>                 Key: SOLR-2968
>                 URL: https://issues.apache.org/jira/browse/SOLR-2968
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 3.5
>            Reporter: Maciej Lisiewski
>            Priority: Minor
>
> Hunspell stemmer requires gigantic (for the task) amounts of memory to load
> dictionary/rules files.
> For example, loading a 4.5 MB Polish dictionary (with an empty index!) will cause
> the whole core to crash with various out of memory errors unless you set max heap
> size close to 2GB or more.
> By comparison, Stempel using the same dictionary file works just fine with 1/8
> of that (and possibly lower values as well).
> Sample error log entries:
> http://pastebin.com/fSrdd5W1
> http://pastebin.com/Lmi0re7Z
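To give a rough feel for why a compiled form such as an FST might help with the memory use discussed in the comment, here is a minimal sketch in Java. It is not Lucene's FST and not how HunspellDictionary or morfologik store words: it uses a plain prefix trie as a simplified stand-in, and the class name `PrefixSharingSketch`, its methods, and the four sample words are all hypothetical. It only compares how many characters are stored when each word form is kept as its own string against how many trie nodes are needed once common prefixes are shared. A real FST also shares suffixes and uses a far more compact encoding than one object per node.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a plain prefix trie, not Lucene's FST.
class PrefixSharingSketch {

    /** One trie node; children are keyed by the next character. */
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a word ends at this node
    }

    /** Inserts every word into a fresh trie and returns its root. */
    static Node buildTrie(List<String> words) {
        Node root = new Node();
        for (String word : words) {
            Node current = root;
            for (char c : word.toCharArray()) {
                // Reuse the existing child for a shared prefix, else create one.
                current = current.children.computeIfAbsent(c, k -> new Node());
            }
            current.isWord = true;
        }
        return root;
    }

    /** Counts the nodes below the given node (the node itself is not counted). */
    static int countNodes(Node node) {
        int total = 0;
        for (Node child : node.children.values()) {
            total += 1 + countNodes(child);
        }
        return total;
    }

    /** Total characters stored if every word is kept as its own string. */
    static int countChars(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from any real dictionary.
        List<String> words = Arrays.asList("stem", "stems", "stemmed", "stemming");
        System.out.println("characters stored flat: " + countChars(words));
        System.out.println("trie nodes: " + countNodes(buildTrie(words)));
    }
}
```

For these four sample words the flat count is 24 characters and the trie needs 11 nodes. In this naive form each node is a Java object with its own `HashMap`, so it would not actually use less heap than the strings; the sketch shows only the sharing idea, not a memory-efficient layout.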