[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2014-06-04 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017403#comment-14017403
 ] 

vivek commented on LUCENE-2899:
---

I followed the instructions at https://wiki.apache.org/solr/OpenNLP to 
integrate OpenNLP:

Installation

For English language testing: Until LUCENE-2899 is committed:

1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training
.
.
. 
I followed the first two steps but got the following errors while executing step 3:

common.compile-core:
[javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
[javac] warning: [path] bad path element /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar: no such file or directory
[javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
[javac] super(Version.LUCENE_44, input);
[javac]  ^
[javac]   symbol:   variable LUCENE_44
[javac]   location: class Version
[javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
[javac] super(input);
[javac] ^
[javac] constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
[javac]   (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
[javac] constructor Tokenizer.Tokenizer() is not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] 2 errors
[javac] 1 warning

I'm really stuck on how to get past this step. I spent an entire day trying to 
fix this but made no progress. Could someone please help?
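
For reference, the second error above says that the Tokenizer base class on 
this checkout only offers Tokenizer() and Tokenizer(AttributeFactory), so a 
subclass can no longer call super(input) with a Reader. Below is a minimal, 
self-contained Java sketch of that constructor pattern. It does not use 
Lucene; SketchTokenizer, SketchWhitespaceTokenizer and the setReader/nextToken 
methods here are hypothetical stand-ins written for illustration, and whether 
the patch should be adapted this way is an assumption, not something the 
thread confirms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a tokenizer base class that, like the one in the
// compiler output, has a no-argument constructor and no Reader constructor.
abstract class SketchTokenizer {
    protected Reader input;

    protected SketchTokenizer() {
    }

    // The reader is handed over after construction instead of via super(input).
    public void setReader(Reader reader) {
        this.input = reader;
    }

    // Returns the next token, or null when the input is exhausted.
    public abstract String nextToken();
}

// Hypothetical subclass that splits on whitespace.
class SketchWhitespaceTokenizer extends SketchTokenizer {
    SketchWhitespaceTokenizer() {
        super(); // a super(input) call would not compile against this base class
    }

    @Override
    public String nextToken() {
        StringBuilder token = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (token.length() > 0) {
                        break; // end of the current token
                    }
                } else {
                    token.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token.length() == 0 ? null : token.toString();
    }
}

class TokenizerSketchDemo {
    public static void main(String[] args) {
        SketchWhitespaceTokenizer tokenizer = new SketchWhitespaceTokenizer();
        tokenizer.setReader(new StringReader("sentence detection as a tokenizer"));
        for (String t = tokenizer.nextToken(); t != null; t = tokenizer.nextToken()) {
            System.out.println(t);
        }
    }
}
```

The first error (no LUCENE_44 in Version) looks like the same kind of 
mismatch between the patch and the checked-out source, but which constant the 
checkout expects is not visible in the output above.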


 Add OpenNLP Analysis capabilities as a module
 -

 Key: LUCENE-2899
 URL: https://issues.apache.org/jira/browse/LUCENE-2899
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: LUCENE-2899-RJN.patch, LUCENE-2899.patch, 
 OpenNLPFilter.java, OpenNLPTokenizer.java


 Now that OpenNLP is an ASF project and has a nice license, it would be nice 
 to have a submodule (under analysis) that exposed capabilities for it. Drew 
 Farris, Tom Morton and I have code that does:
 * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
 would have to change slightly to buffer tokens)
 * NamedEntity recognition as a TokenFilter
 We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
 either payloads (PartOfSpeechAttribute?) on a token or at the same position.
 I'd propose it go under:
 modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2008-01-20 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560912#action_12560912
 ] 

vivek commented on LUCENE-855:
--

Any plans to have this part of Lucene 2.3?

 MemoryCachedRangeFilter to boost performance of Range queries
 -

 Key: LUCENE-855
 URL: https://issues.apache.org/jira/browse/LUCENE-855
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Andy Liu
Assignee: Otis Gospodnetic
 Attachments: contrib-filters.tar.gz, FieldCacheRangeFilter.patch, 
 FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, 
 FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, 
 FieldCacheRangeFilter.patch, MemoryCachedRangeFilter.patch, 
 MemoryCachedRangeFilter_1.4.patch, TestRangeFilterPerformanceComparison.java, 
 TestRangeFilterPerformanceComparison.java


 Currently RangeFilter uses TermEnum and TermDocs to find documents that fall 
 within the specified range.  This requires iterating through every single 
 term in the index and can get rather slow for large document sets.
 MemoryCachedRangeFilter reads all docId, value pairs of a given field, 
 sorts by value, and stores in a SortedFieldCache.  During bits(), binary 
 searches are used to find the start and end indices of the lower and upper 
 bound values.  The BitSet is populated by all the docId values that fall in 
 between the start and end indices.
 TestMemoryCachedRangeFilterPerformance creates a 100K RAMDirectory-backed 
 index with random date values within a 5 year range.  Executing bits() 1000 
 times on standard RangeQuery using random date intervals took 63904ms.  Using 
 MemoryCachedRangeFilter, it took 876ms.  The performance increase is less 
 dramatic when a field has fewer unique terms or the index contains fewer 
 documents.
 Currently MemoryCachedRangeFilter only works with numeric values (values are 
 stored in a long[] array) but it can be easily changed to support Strings.  A 
 side benefit of storing the values as longs is that there's no longer any 
 need to make the values lexicographically comparable, i.e. by padding 
 numeric values with zeros.
 The downside of using MemoryCachedRangeFilter is there's a fairly significant 
 memory requirement.  So it's designed to be used in situations where range 
 filter performance is critical and memory consumption is not an issue.  The 
 memory requirements are: (sizeof(int) + sizeof(long)) * numDocs.  
 MemoryCachedRangeFilter also requires a warmup step which can take a while to 
 run in large datasets (it took 40s to run on a 3M document corpus).  Warmup 
 can be called explicitly or is automatically called the first time 
 MemoryCachedRangeFilter is applied using a given field.
 So in summary, MemoryCachedRangeFilter can be useful when:
 - Performance is critical
 - Memory is not an issue
 - Field contains many unique numeric values
 - Index contains a large number of documents
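
The approach described above (sort the docId/value pairs once, then 
binary-search both bounds for each range) can be sketched in plain Java as 
follows. This is an illustration under stated assumptions, not the code from 
the attached patches: SortedRangeSketch and its methods are hypothetical 
names, the input is a simple long[] indexed by docId rather than a real 
index, and both bounds are treated as inclusive. The estimatedBytes helper 
just evaluates the (sizeof(int) + sizeof(long)) * numDocs formula from the 
description.

```java
import java.util.Arrays;
import java.util.BitSet;

// Illustrative sketch only; names and structure are assumptions.
class SortedRangeSketch {
    private final long[] sortedValues; // field values in ascending order
    private final int[] docIds;        // docIds[i] is the document holding sortedValues[i]

    // valueByDoc[d] is the numeric field value of document d.
    SortedRangeSketch(long[] valueByDoc) {
        int n = valueByDoc.length;
        Integer[] order = new Integer[n];
        for (int d = 0; d < n; d++) {
            order[d] = d;
        }
        // "Warmup": sort the doc ids by their field value, once.
        Arrays.sort(order, (a, b) -> Long.compare(valueByDoc[a], valueByDoc[b]));
        sortedValues = new long[n];
        docIds = new int[n];
        for (int i = 0; i < n; i++) {
            docIds[i] = order[i];
            sortedValues[i] = valueByDoc[order[i]];
        }
    }

    // Index of the first value >= key (or > key when strict is true).
    private int firstIndex(long key, boolean strict) {
        int lo = 0;
        int hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            boolean goRight = strict ? sortedValues[mid] <= key : sortedValues[mid] < key;
            if (goRight) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Sets one bit per document whose value lies in [lower, upper].
    BitSet bits(long lower, long upper) {
        BitSet result = new BitSet(docIds.length);
        int start = firstIndex(lower, false); // first value >= lower
        int end = firstIndex(upper, true);    // first value > upper
        for (int i = start; i < end; i++) {
            result.set(docIds[i]);
        }
        return result;
    }

    // (sizeof(int) + sizeof(long)) * numDocs, i.e. 12 bytes per document.
    static long estimatedBytes(int numDocs) {
        return (long) (Integer.BYTES + Long.BYTES) * numDocs;
    }

    public static void main(String[] args) {
        SortedRangeSketch sketch = new SortedRangeSketch(new long[] {50, 10, 30, 20, 40});
        System.out.println(sketch.bits(20, 40));
        System.out.println(estimatedBytes(3_000_000) + " bytes for 3M documents");
    }
}
```

With the formula above, the 3M document corpus mentioned in the description 
would need about 36 MB for the two arrays, before any object overhead.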

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-390) Contribution: LuceneIndexAccessor

2008-01-16 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694
 ] 

vivek commented on LUCENE-390:
--

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I 
see this library has been updated since the last comments in this Jira:

 
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:

1) We have two writer threads (they run at the same time every 5 minutes) that 
write to temporary indexes
2) The two temporary indexes are then merged into a master index, using 
another IndexWriter
3) Currently, we open a searcher for every new search, but we want to be able to 
cache the searcher and get a new one only if there is a change in the indexes 
(for example, every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already 
exists in the Lucene package that I'm not aware of.
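
The caching requirement in point 3 can be sketched roughly as follows. This 
is a minimal illustration in plain Java, not the IndexAccessor code: 
CachedSearcherSketch, the version supplier and the opener are hypothetical 
names, and the sketch assumes there is some cheap way to obtain a number 
that changes whenever the master index changes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative sketch only; S stands for whatever searcher type is in use.
class CachedSearcherSketch<S> {
    private final LongSupplier indexVersion; // assumed: changes when the index changes
    private final Supplier<S> opener;        // assumed: opens a fresh searcher
    private S cached;
    private long cachedVersion;
    private boolean hasCached;

    CachedSearcherSketch(LongSupplier indexVersion, Supplier<S> opener) {
        this.indexVersion = indexVersion;
        this.opener = opener;
    }

    // Returns the cached searcher, reopening only if the index version moved.
    synchronized S get() {
        long current = indexVersion.getAsLong();
        if (!hasCached || current != cachedVersion) {
            cached = opener.get();
            cachedVersion = current;
            hasCached = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong(1);  // stand-in for the index version
        AtomicInteger opens = new AtomicInteger();
        CachedSearcherSketch<String> cache = new CachedSearcherSketch<>(
                version::get, () -> "searcher-" + opens.incrementAndGet());

        System.out.println(cache.get()); // opens the first searcher
        System.out.println(cache.get()); // reuses it, index unchanged
        version.incrementAndGet();       // simulate the 5-minute merge
        System.out.println(cache.get()); // opens a new searcher
    }
}
```

The sketch deliberately leaves out closing the old searcher. Searches that 
are still running against it would need some form of reference counting 
before it can be closed safely, which is the kind of bookkeeping a shared 
accessor would have to take on.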

 Contribution: LuceneIndexAccessor
 -

 Key: LUCENE-390
 URL: https://issues.apache.org/jira/browse/LUCENE-390
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Maik Schreiber
Assignee: Lucene Developers
Priority: Minor
 Attachments: lucene-indexaccess-0.2.0.zip


 As per this post:
 http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL 
 PROTECTED]
 I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The
 Apache Software Foundation.
 Please note that it won't compile out of the box, but that should be fairly easy
 to fix using a CVS version of Lucene. Also it makes use of Log4J.
 I'm fine with moving the classes to any package you like.




[jira] Issue Comment Edited: (LUCENE-390) Contribution: LuceneIndexAccessor

2008-01-16 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559694#action_12559694
 ] 

vivash edited comment on LUCENE-390 at 1/16/08 2:33 PM:
---

Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? I 
see this library has been updated since the last comments in this Jira:

 
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

(the package was last updated 09/2007: http://myhardshadow.com/indexaccessorapi/ )

We have quite a similar requirement:

1) We have two writer threads (they run at the same time every 5 minutes) that 
write to temporary indexes
2) The two temporary indexes are then merged into a master index, using 
another IndexWriter
3) Currently, we open a searcher for every new search, but we want to be able to 
cache the searcher and get a new one only if there is a change in the indexes 
(for example, every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already 
exists in the Lucene package that I'm not aware of.

  was (Author: vivash):
Is there any plan to incorporate this IndexAccessor package in Lucene 2.3? 
I see this library has been updated since the last comments in this Jira:

 
http://www.gossamer-threads.com/lists/lucene/java-user/53117?search_string=LuceneIndexAccessor%20;#53251

We have quite a similar requirement:

1) We have two writer threads (they run at the same time every 5 minutes) that 
write to temporary indexes
2) The two temporary indexes are then merged into a master index, using 
another IndexWriter
3) Currently, we open a searcher for every new search, but we want to be able to 
cache the searcher and get a new one only if there is a change in the indexes 
(for example, every 5 minutes).

I think IndexAccessor is a good addition, unless something similar already 
exists in the Lucene package that I'm not aware of.
  
 Contribution: LuceneIndexAccessor
 -

 Key: LUCENE-390
 URL: https://issues.apache.org/jira/browse/LUCENE-390
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Maik Schreiber
Assignee: Lucene Developers
Priority: Minor
 Attachments: lucene-indexaccess-0.2.0.zip


 As per this post:
 http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200505.mbox/[EMAIL 
 PROTECTED]
 I'm attaching the LuceneIndexAccessor source here. Copyright is now 2005 The
 Apache Software Foundation.
 Please note that it won't compile out of the box, but that should be fairly easy
 to fix using a CVS version of Lucene. Also it makes use of Log4J.
 I'm fine with moving the classes to any package you like.



