[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017403#comment-14017403 ]

vivek commented on LUCENE-2899:
-------------------------------

I followed this link to integrate: https://wiki.apache.org/solr/OpenNLP

Installation
For English language testing: Until LUCENE-2899 is committed:
1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training
. . .

I completed the first two steps but got the following error while executing the third:

common.compile-core:
    [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
    [javac] warning: [path] bad path element /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar: no such file or directory
    [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
    [javac] super(Version.LUCENE_44, input);
    [javac] ^
    [javac] symbol: variable LUCENE_44
    [javac] location: class Version
    [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
    [javac] super(input);
    [javac] ^
    [javac] constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
    [javac] (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
    [javac] constructor Tokenizer.Tokenizer() is not applicable
    [javac] (actual and formal argument lists differ in length)
    [javac] 2 errors
    [javac] 1 warning

I am really stuck on how to get past this step. I spent my entire day trying to fix this but could not make any progress. Could someone please help?
Add OpenNLP Analysis capabilities as a module
---------------------------------------------

Key: LUCENE-2899
URL: https://issues.apache.org/jira/browse/LUCENE-2899
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-2899-RJN.patch, LUCENE-2899.patch, OpenNLPFilter.java, OpenNLPTokenizer.java

Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does:
* Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
* NamedEntity recognition as a TokenFilter

We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position.

I'd propose it go under: modules/analysis/opennlp

--
This message was sent by Atlassian JIRA (v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries
[ https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560912#action_12560912 ]

vivek commented on LUCENE-855:
------------------------------

Any plans to have this part of Lucene 2.3?

MemoryCachedRangeFilter to boost performance of Range queries
-------------------------------------------------------------

Key: LUCENE-855
URL: https://issues.apache.org/jira/browse/LUCENE-855
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.1
Reporter: Andy Liu
Assignee: Otis Gospodnetic
Attachments: contrib-filters.tar.gz, FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, MemoryCachedRangeFilter.patch, MemoryCachedRangeFilter_1.4.patch, TestRangeFilterPerformanceComparison.java, TestRangeFilterPerformanceComparison.java

Currently RangeFilter uses TermEnum and TermDocs to find documents that fall within the specified range. This requires iterating through every single term in the index and can get rather slow for large document sets.

MemoryCachedRangeFilter reads all (docId, value) pairs of a given field, sorts them by value, and stores them in a SortedFieldCache. During bits(), binary searches are used to find the start and end indices of the lower and upper bound values. The BitSet is populated with all the docId values that fall between the start and end indices.

TestMemoryCachedRangeFilterPerformance creates a 100K RAMDirectory-backed index with random date values within a 5 year range. Executing bits() 1000 times on standard RangeQuery using random date intervals took 63904ms. Using MemoryCachedRangeFilter, it took 876ms. The performance increase is less dramatic when a field has fewer unique terms or the index has fewer documents.

Currently MemoryCachedRangeFilter only works with numeric values (values are stored in a long[] array) but it can be easily changed to support Strings.
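The sort-then-binary-search idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the attached patches: the class name SortedRangeFilterSketch and its methods are hypothetical, the input is assumed to be one long value per document indexed by docId, and both range bounds are treated as inclusive.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in for the sorted cache described in the issue; not Lucene API.
class SortedRangeFilterSketch {
    private final long[] sortedValues; // field values in ascending order
    private final int[] docIds;        // docIds[i] is the document holding sortedValues[i]

    // valueByDocId[d] is the numeric field value of document d (an assumption of this sketch).
    SortedRangeFilterSketch(long[] valueByDocId) {
        int n = valueByDocId.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort document ids by their field value (the "warmup" step).
        Arrays.sort(order, (a, b) -> Long.compare(valueByDocId[a], valueByDocId[b]));
        sortedValues = new long[n];
        docIds = new int[n];
        for (int i = 0; i < n; i++) {
            docIds[i] = order[i];
            sortedValues[i] = valueByDocId[order[i]];
        }
    }

    // Index of the first value >= key.
    private int lowerBound(long key) {
        int lo = 0, hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Index of the first value > key.
    private int upperBound(long key) {
        int lo = 0, hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Two binary searches locate the slice; every docId in the slice is set in the result.
    BitSet bits(long lower, long upper) {
        BitSet result = new BitSet(docIds.length);
        int start = lowerBound(lower);
        int end = upperBound(upper);
        for (int i = start; i < end; i++) {
            result.set(docIds[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {50, 10, 30, 20, 40}; // docs 0..4
        SortedRangeFilterSketch filter = new SortedRangeFilterSketch(values);
        System.out.println(filter.bits(20, 40)); // prints {2, 3, 4}
    }
}
```

Each call to bits() costs two binary searches plus one pass over the matching slice, so the per-query work no longer depends on the number of terms in the index.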
A side benefit of storing the values as longs is that there is no longer any need to make the values lexicographically comparable, i.e. padding numeric values with zeros.

The downside of using MemoryCachedRangeFilter is that there is a fairly significant memory requirement, so it is designed to be used in situations where range filter performance is critical and memory consumption is not an issue. The memory requirements are: (sizeof(int) + sizeof(long)) * numDocs.

MemoryCachedRangeFilter also requires a warmup step which can take a while to run on large datasets (it took 40s to run on a 3M document corpus). Warmup can be called explicitly, or it is called automatically the first time MemoryCachedRangeFilter is applied to a given field.

So in summary, MemoryCachedRangeFilter can be useful when:
- Performance is critical
- Memory is not an issue
- Field contains many unique numeric values
- Index contains a large number of documents

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
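To put the memory formula from the description into numbers, here is a small worked example. It assumes a 4-byte int and an 8-byte long, as in Java, and it ignores array headers and any other JVM overhead; the class name RangeFilterMemoryEstimate is hypothetical.

```java
// Hypothetical helper that evaluates (sizeof(int) + sizeof(long)) * numDocs.
class RangeFilterMemoryEstimate {
    static long estimateBytes(long numDocs) {
        // One int docId plus one long value per document.
        return (long) (Integer.BYTES + Long.BYTES) * numDocs;
    }

    public static void main(String[] args) {
        // The 3M document corpus mentioned in the description.
        System.out.println(estimateBytes(3_000_000L)); // prints 36000000
    }
}
```

For the 3M document corpus mentioned in the description this comes to 36,000,000 bytes, roughly 34 MiB, for each cached field.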
[jira] Commented: (LUCENE-390) Contribution: LuceneIndexAccessor
[ https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694 ]

vivek commented on LUCENE-390:
------------------------------

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I see this library has been updated since the last comments in this Jira: http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:
1) We have two writer threads (they run at the same time every 5 minutes) that write to temporary indexes
2) The two temporary indexes are then merged into a master index, using another IndexWriter
3) Currently, we open a searcher for every new search, but we want to be able to cache the searcher and get a new one only if there is a change in the indexes (like every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already exists in the Lucene package that I'm not aware of.

Contribution: LuceneIndexAccessor
---------------------------------

Key: LUCENE-390
URL: https://issues.apache.org/jira/browse/LUCENE-390
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: unspecified
Environment: Operating System: other Platform: Other
Reporter: Maik Schreiber
Assignee: Lucene Developers
Priority: Minor
Attachments: lucene-indexaccess-0.2.0.zip

As per this post: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL PROTECTED] I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The Apache Software Foundation. Please note that it won't compile out of the box, but that should be fairly easy to fix using a CVS version of Lucene. Also it makes use of Log4J. I'm fine with moving the classes to any package you like.
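Requirement 3 in the comment above (reuse a cached searcher and reopen it only when the index has changed) can be sketched generically. This is a minimal illustration under assumptions, not the LuceneIndexAccessor code and not Lucene API: the class VersionedResourceCache is hypothetical, and how the index version is read and how a searcher is opened are passed in as placeholders.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

// Hypothetical holder: caches one resource and reopens it only when the version changes.
class VersionedResourceCache<T> {
    private final LongSupplier currentVersion; // assumption: reports the index's current version
    private final LongFunction<T> opener;      // assumption: opens a searcher for that version
    private long cachedVersion;
    private T cached;

    VersionedResourceCache(LongSupplier currentVersion, LongFunction<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Returns the cached resource, reopening it first if the version has moved on.
    synchronized T get() {
        long version = currentVersion.getAsLong();
        if (cached == null || version != cachedVersion) {
            cached = opener.apply(version);
            cachedVersion = version;
        }
        return cached;
    }

    public static void main(String[] args) {
        AtomicLong indexVersion = new AtomicLong(1); // stands in for the real index version
        VersionedResourceCache<String> cache =
                new VersionedResourceCache<>(indexVersion::get, v -> "searcher@" + v);
        System.out.println(cache.get()); // prints searcher@1
        System.out.println(cache.get()); // prints searcher@1 (reused, not reopened)
        indexVersion.incrementAndGet();  // simulates the 5-minute merge
        System.out.println(cache.get()); // prints searcher@2
    }
}
```

The sketch leaves out one thing a real implementation needs: the old searcher must be closed only after searches that are still using it have finished, for example by reference counting.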
[jira] Issue Comment Edited: (LUCENE-390) Contribution: LuceneIndexAccessor
[ https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694 ]

vivash edited comment on LUCENE-390 at 1/16/08 2:33 PM:
--------------------------------------------------------

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I see this library has been updated since the last comments in this Jira: http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251 (the package was last updated 09/2007: http://myhardshadow.com/indexaccessorapi/ )

We have quite a similar requirement:
1) We have two writer threads (they run at the same time every 5 minutes) that write to temporary indexes
2) The two temporary indexes are then merged into a master index, using another IndexWriter
3) Currently, we open a searcher for every new search, but we want to be able to cache the searcher and get a new one only if there is a change in the indexes (like every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already exists in the Lucene package that I'm not aware of.

was (Author: vivash):

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I see this library has been updated since the last comments in this Jira: http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:
1) We have two writer threads (they run at the same time every 5 minutes) that write to temporary indexes
2) The two temporary indexes are then merged into a master index, using another IndexWriter
3) Currently, we open a searcher for every new search, but we want to be able to cache the searcher and get a new one only if there is a change in the indexes (like every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already exists in the Lucene package that I'm not aware of.
Contribution: LuceneIndexAccessor
---------------------------------

Key: LUCENE-390
URL: https://issues.apache.org/jira/browse/LUCENE-390
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: unspecified
Environment: Operating System: other Platform: Other
Reporter: Maik Schreiber
Assignee: Lucene Developers
Priority: Minor
Attachments: lucene-indexaccess-0.2.0.zip

As per this post: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL PROTECTED] I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The Apache Software Foundation. Please note that it won't compile out of the box, but that should be fairly easy to fix using a CVS version of Lucene. Also it makes use of Log4J. I'm fine with moving the classes to any package you like.