from:"Erik Hatcher \(JIRA\)"

[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-14 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833552#action_12833552
 ] 

Erik Hatcher commented on LUCENE-1941:
--

Uwe - patch looks good.  Go for it!

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-1941.patch, LUCENE-1941.patch


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.
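
One way the reported symptom could arise, sketched without any Lucene classes: if a running score starts at 0 and each payload is folded in with Math.min, a positive payload can never raise it. The class and method names below are hypothetical, and the "first payload seeds the score" guard is only one possible remedy, not necessarily what the attached patch does.

```java
// Hypothetical stand-in for a "minimum payload" scoring function.
// None of these names come from Lucene; they only illustrate the arithmetic.
class MinPayloadSketch {

    // Naive fold: the accumulator starts at 0, so min(0, positive) stays 0.
    static float naiveMin(float[] payloads) {
        float score = 0f;
        for (float p : payloads) {
            score = Math.min(score, p);
        }
        return score;
    }

    // Guarded fold: the first payload seeds the accumulator.
    static float guardedMin(float[] payloads) {
        float score = 0f;
        int seen = 0;
        for (float p : payloads) {
            score = (seen == 0) ? p : Math.min(score, p);
            seen++;
        }
        return score;
    }

    public static void main(String[] args) {
        float[] single = {2.0f};
        System.out.println(naiveMin(single));   // prints 0.0
        System.out.println(guardedMin(single)); // prints 2.0
    }
}
```

With a single payload of 2.0 the naive fold reports 0.0, which matches the behavior named in the issue title.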

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-12 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832932#action_12832932
 ] 

Erik Hatcher commented on LUCENE-1941:
--

Feel free to adjust this issue to whichever Lucene version makes sense.  I 
don't have bandwidth at the moment to address this myself.

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
 Fix For: 2.9.2, 3.0.1, 3.1


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.


[jira] Commented: (LUCENE-2238) deprecate ChineseAnalyzer

2010-01-28 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806239#action_12806239
 ] 

Erik Hatcher commented on LUCENE-2238:
--

+1

 deprecate ChineseAnalyzer
 -

 Key: LUCENE-2238
 URL: https://issues.apache.org/jira/browse/LUCENE-2238
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2238.patch


 The ChineseAnalyzer, ChineseTokenizer, and ChineseFilter (not the smart one, 
 or CJK) index Chinese text as individual characters and remove English 
 stopwords, etc.
 In my opinion we should simply deprecate all of this in favor of 
 StandardAnalyzer, StandardTokenizer, and StopFilter, which do the same 
 thing.


[jira] Resolved: (LUCENE-2231) my lucene project is able to search single time how can make it as long as i can

2010-01-22 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-2231.
--

Resolution: Not A Problem

Please ask support questions on the java-user list.  Also (bias noted here), 
the book Lucene in Action will help you out immensely with these getting 
started questions.

 my lucene project is able to search single time how can make it as long as i 
 can
 

 Key: LUCENE-2231
 URL: https://issues.apache.org/jira/browse/LUCENE-2231
 Project: Lucene - Java
  Issue Type: Wish
Affects Versions: 2.9.1
Reporter: sameeuddin Mohammed
Priority: Critical
   Original Estimate: 5h
  Remaining Estimate: 5h

 I am using Lucene with NetBeans 6.5. When I execute my project, the search 
 shows results only the first time; the next time there are no results. I also 
 want to know how to match lower case and upper case as the same, and, when I 
 have a word such as "simpletext", how to search for only "simple". 
 Please reply as soon as possible.


[jira] Commented: (LUCENE-2198) support protected words in Stemming TokenFilters

2010-01-13 Thread Erik Hatcher (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12799839#action_12799839
]

Erik Hatcher commented on LUCENE-2198:
--

+1 on the StemAttribute approach. I've just encountered this exact need in
some custom code I've been reviewing, where the decision to stem or not is
dynamic per term (with the approach I'm looking at using a custom term type
string and a custom stem filter).

support protected words in Stemming TokenFilters

Key: LUCENE-2198
URL: https://issues.apache.org/jira/browse/LUCENE-2198
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Affects Versions: 3.0
Reporter: Robert Muir
Priority: Minor

This is from LUCENE-1515
I propose that all stemming TokenFilters have an 'exclusion set' that
bypasses any stemming for words in this set.
Some stemming tokenfilters have this, some do not.
This would be one way for Karl to implement his new Swedish stemmer (as a
text file of ignore words).
Additionally, it would remove duplication between Lucene and Solr, as they
reimplement SnowballFilter since it does not have this functionality.
Finally, I think this is a pretty common use case, where people want to
ignore things like proper nouns in the stemming.
As an alternative design I considered a case where we generalized this to
CharArrayMap (and ignoring words would mean mapping them to themselves),
which would also provide a mechanism to override the stemming algorithm. But
I think this is too expert, could be its own filter, and the only example of
this I can find is in the Dutch stemmer.
So I think we should just provide ignore with CharArraySet, but if you feel
otherwise please comment.
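
The exclusion-set idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: a java.util.Set stands in for CharArraySet, toyStem is a made-up stemmer that strips a trailing "s", and none of the names are Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: terms in the exclusion set bypass stemming entirely.
class ProtectedWordsSketch {

    // Toy stemmer used only for illustration: strips a trailing "s".
    static String toyStem(String term) {
        return (term.length() > 1 && term.endsWith("s"))
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Terms found in the exclusion set pass through unchanged.
    static List<String> stemAll(List<String> terms, Set<String> exclusions) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(exclusions.contains(term) ? term : toyStem(term));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> exclusions = new HashSet<>(Arrays.asList("james"));
        System.out.println(stemAll(Arrays.asList("james", "reads", "books"), exclusions));
        // prints [james, read, book]
    }
}
```

The proper noun survives untouched while the other terms are stemmed, which is the "ignore" behavior the description asks for.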


[jira] Created: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2009-10-02 Thread Erik Hatcher (JIRA)

MinPayloadFunction returns 0 when only one payload is present
-

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9
Reporter: Erik Hatcher


In some experiments with payload scoring through PayloadTermQuery, I'm seeing 0 
returned when using MinPayloadFunction.  I believe there is a bug there.  No 
time at the moment to flesh out a unit test, but wanted to report it for 
tracking.


[jira] Commented: (LUCENE-1938) Precedence query parser using the contrib/queryparser framework

2009-10-01 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761405#action_12761405
 ] 

Erik Hatcher commented on LUCENE-1938:
--

Yes, let's just remove the old PrecedenceQueryParser (which was just an 
experiment by me - is anyone actually using it?)

 Precedence query parser using the contrib/queryparser framework
 ---

 Key: LUCENE-1938
 URL: https://issues.apache.org/jira/browse/LUCENE-1938
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 2.9
Reporter: Adriano Crestani
Assignee: Adriano Crestani
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-1938.patch


 Extend the current StandardQueryParser on contrib so it supports boolean 
 precedence


[jira] Created: (LUCENE-1850) Update overview example code

2009-08-24 Thread Erik Hatcher (JIRA)

Update overview example code


 Key: LUCENE-1850
 URL: https://issues.apache.org/jira/browse/LUCENE-1850
 Project: Lucene - Java
  Issue Type: Task
  Components: Examples, Javadocs
Reporter: Erik Hatcher
 Fix For: 2.9


See http://lucene.apache.org/java/2_4_1/api/core/overview-summary.html - need 
to update for non-deprecated best-practices/recommended API usage.

Also, double-check that the demo app works as documented.


[jira] Resolved: (LUCENE-1806) Add args to test-macro

2009-08-14 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-1806.
--

Resolution: Fixed

Done, thanks Jason.

 Add args to test-macro
 --

 Key: LUCENE-1806
 URL: https://issues.apache.org/jira/browse/LUCENE-1806
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1806.patch

   Original Estimate: 0.03h
  Remaining Estimate: 0.03h

 Add passing args to JUnit.  (Like Solr and mainly for debugging).  


[jira] Commented: (LUCENE-1800) QueryParser should use reusable token streams

2009-08-13 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742781#action_12742781
 ] 

Erik Hatcher commented on LUCENE-1800:
--

Does anyone use PrecedenceQueryParser?   It was an experiment tossed out there, 
but I've not heard of anyone using it for real.  

 QueryParser should use reusable token streams
 -

 Key: LUCENE-1800
 URL: https://issues.apache.org/jira/browse/LUCENE-1800
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 2.9

 Attachments: LUCENE-1800.patch, LUCENE-1800_analyzingQP.patch


 Just like indexing, the query parser should use reusable token streams


[jira] Resolved: (LUCENE-1405) Support for new Resources model in ant 1.7 in Lucene ant task.

2009-06-19 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-1405.
--

Resolution: Fixed

Przemyslaw - apologies for the delay in addressing this valuable patch.  It's 
now been tested and committed.  I also added a comment to example.xml showing 
how to run the index task from a source checkout.

 Support for new Resources model in ant 1.7 in Lucene ant task.
 --

 Key: LUCENE-1405
 URL: https://issues.apache.org/jira/browse/LUCENE-1405
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.3.2
Reporter: Przemyslaw Sztoch
Assignee: Erik Hatcher
 Fix For: 2.9

 Attachments: lucene-ant1.7-newresources.patch


 Ant Task for Lucene should use modern Resource model (not only FileSet child 
 element).
 There is a patch with required changes.
 Supported by old (ant 1.6) and new (ant 1.7) resources model:
 <index> <!-- Lucene Ant Task -->
   <fileset ... />
 </index>
 Supported only by new (ant 1.7) resources model:
 <index> <!-- Lucene Ant Task -->
   <filelist ... />
 </index>
 <index> <!-- Lucene Ant Task -->
   <userdefinied-filesource ... />
 </index>


[jira] Assigned: (LUCENE-1405) Support for new Resources model in ant 1.7 in Lucene ant task.

2009-06-12 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-1405:


Assignee: Erik Hatcher

 Support for new Resources model in ant 1.7 in Lucene ant task.
 --

 Key: LUCENE-1405
 URL: https://issues.apache.org/jira/browse/LUCENE-1405
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.3.2
Reporter: Przemyslaw Sztoch
Assignee: Erik Hatcher
 Fix For: 2.9

 Attachments: lucene-ant1.7-newresources.patch


 Ant Task for Lucene should use modern Resource model (not only FileSet child 
 element).
 There is a patch with required changes.
 Supported by old (ant 1.6) and new (ant 1.7) resources model:
 <index> <!-- Lucene Ant Task -->
   <fileset ... />
 </index>
 Supported only by new (ant 1.7) resources model:
 <index> <!-- Lucene Ant Task -->
   <filelist ... />
 </index>
 <index> <!-- Lucene Ant Task -->
   <userdefinied-filesource ... />
 </index>


[jira] Issue Comment Edited: (LUCENE-1629) contrib intelligent Analyzer for Chinese

2009-05-13 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708912#action_12708912
 ] 

Erik Hatcher edited comment on LUCENE-1629 at 5/13/09 5:58 AM:
---

My initial thought is to move the copy excluding {noformat} **/*.java and 
**/*.html{noformat}  to the compile macro.   In the ancient past, Ant 
actually used to do this automatically with javac.



  was (Author: ehatcher):
My initial thought is to move the copy excluding **/*.java and **/*.html 
to the compile macro.   In the ancient past, Ant actually used to do this 
automatically with javac.


  
 contrib intelligent Analyzer for Chinese
 

 Key: LUCENE-1629
 URL: https://issues.apache.org/jira/browse/LUCENE-1629
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Affects Versions: 2.4.1
 Environment: for java 1.5 or higher, lucene 2.4.1
Reporter: Xiaoping Gao
Assignee: Michael McCandless
 Fix For: 2.9

 Attachments: analysis-data.zip, bigramdict.mem, 
 build-resources.patch, coredict.mem, LUCENE-1629-java1.4.patch


 I wrote an Analyzer for Apache Lucene for analyzing sentences in the Chinese 
 language. It's called imdict-chinese-analyzer; the project on Google Code 
 is here: http://code.google.com/p/imdict-chinese-analyzer/
 In Chinese, 我是中国人 (I am Chinese) should be tokenized as 我 (I), 是 (am), 
 中国人 (Chinese), not 我 是中 国人. So the analyzer must handle each sentence 
 properly, or there will be misunderstandings everywhere in the index 
 constructed by Lucene, and the accuracy of the search engine will be 
 seriously affected!
 Although there are two analyzer packages in the Apache repository which can 
 handle Chinese, ChineseAnalyzer and CJKAnalyzer, they take each character or 
 every two adjoining characters as a single word. This is obviously not true 
 in reality, and this strategy also increases the index size and hurts 
 performance badly.
 The algorithm of imdict-chinese-analyzer is based on the Hidden Markov Model 
 (HMM), so it can tokenize Chinese sentences in a really intelligent way. 
 Tokenization accuracy of this model is above 90% according to the paper 
 "HHMM-based Chinese Lexical analyzer ICTCLAL", while other analyzers' is about 
 60%.
 As imdict-chinese-analyzer is really fast and intelligent, I want to 
 contribute it to the Apache Lucene repository.
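
To make the two existing strategies concrete, here is a small plain-Java sketch of per-character tokens and of tokens built from every two adjoining characters. It does not use the Lucene analyzers themselves; the class name is hypothetical, and treating "every two adjoining characters" as overlapping pairs is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two naive tokenization strategies.
class CjkNgramSketch {

    // One token per character (the per-character strategy).
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            out.add(text.substring(i, i + 1));
        }
        return out;
    }

    // One token per pair of adjoining characters (overlapping pairs assumed).
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // The five-character sentence from the description, written with
        // Unicode escapes so the source file stays ASCII-only.
        String sentence = "\u6211\u662f\u4e2d\u56fd\u4eba";
        System.out.println(unigrams(sentence).size()); // prints 5
        System.out.println(bigrams(sentence).size());  // prints 4
    }
}
```

Neither strategy yields the three-word segmentation given in the description, which is the gap a dictionary- or model-based analyzer is meant to close.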


[jira] Commented: (LUCENE-1629) contrib intelligent Analyzer for Chinese

2009-05-13 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708912#action_12708912
 ] 

Erik Hatcher commented on LUCENE-1629:
--

My initial thought is to move the copy excluding **/*.java and **/*.html to 
the compile macro.   In the ancient past, Ant actually used to do this 
automatically with javac.



 contrib intelligent Analyzer for Chinese
 

 Key: LUCENE-1629
 URL: https://issues.apache.org/jira/browse/LUCENE-1629
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Affects Versions: 2.4.1
 Environment: for java 1.5 or higher, lucene 2.4.1
Reporter: Xiaoping Gao
Assignee: Michael McCandless
 Fix For: 2.9

 Attachments: analysis-data.zip, bigramdict.mem, 
 build-resources.patch, coredict.mem, LUCENE-1629-java1.4.patch


 I wrote an Analyzer for Apache Lucene for analyzing sentences in the Chinese 
 language. It's called imdict-chinese-analyzer; the project on Google Code 
 is here: http://code.google.com/p/imdict-chinese-analyzer/
 In Chinese, 我是中国人 (I am Chinese) should be tokenized as 我 (I), 是 (am), 
 中国人 (Chinese), not 我 是中 国人. So the analyzer must handle each sentence 
 properly, or there will be misunderstandings everywhere in the index 
 constructed by Lucene, and the accuracy of the search engine will be 
 seriously affected!
 Although there are two analyzer packages in the Apache repository which can 
 handle Chinese, ChineseAnalyzer and CJKAnalyzer, they take each character or 
 every two adjoining characters as a single word. This is obviously not true 
 in reality, and this strategy also increases the index size and hurts 
 performance badly.
 The algorithm of imdict-chinese-analyzer is based on the Hidden Markov Model 
 (HMM), so it can tokenize Chinese sentences in a really intelligent way. 
 Tokenization accuracy of this model is above 90% according to the paper 
 "HHMM-based Chinese Lexical analyzer ICTCLAL", while other analyzers' is about 
 60%.
 As imdict-chinese-analyzer is really fast and intelligent, I want to 
 contribute it to the Apache Lucene repository.


[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662475#action_12662475
 ] 

Erik Hatcher commented on LUCENE-1314:
--

{quote}
Is there a way with ant to only test one test case?
Tried:
ant -Dtestcase=org.apache.lucene.index.TestIndexReaderReopen test-core which 
according to the Wiki http://wiki.apache.org/lucene-java/HowToContribute should 
work. 
{quote}

The value of the testcase parameter fits in this way **/${testcase}.java in 
common-build.xml, so in your case it'd be -Dtestcase=TestIndexReaderReopen
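
A rough illustration of why the simple class name works and the fully qualified name does not. Java's own glob matcher is used here as a stand-in for Ant's pattern matching (the two are not identical in general), and the source path is a hypothetical layout chosen for the example.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical sketch: expands **/${testcase}.java and checks a source path.
class TestcasePatternSketch {

    // Returns true when the expanded pattern selects the given source file.
    static boolean selects(String testcase, String sourceFile) {
        PathMatcher matcher = FileSystems.getDefault()
                .getPathMatcher("glob:**/" + testcase + ".java");
        Path path = Paths.get(sourceFile);
        return matcher.matches(path);
    }

    public static void main(String[] args) {
        // Assumed location of the test source, for illustration only.
        String file = "src/test/org/apache/lucene/index/TestIndexReaderReopen.java";
        System.out.println(selects("TestIndexReaderReopen", file));
        System.out.println(selects("org.apache.lucene.index.TestIndexReaderReopen", file));
    }
}
```

The first call matches because the pattern ends in the file's simple name; the second does not, because no file is literally named org.apache.lucene.index.TestIndexReaderReopen.java.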


 IndexReader.clone
 -

 Key: LUCENE-1314
 URL: https://issues.apache.org/jira/browse/LUCENE-1314
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.3.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, 
 LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, 
 LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, lucene-1314.patch, 
 lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
 lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
 lucene-1314.patch, lucene-1314.patch, lucene-1314.patch


 Based on discussion 
 http://www.nabble.com/IndexReader.reopen-issue-td18070256.html.  The problem 
 is reopen returns the same reader if there are no changes, so if docs are 
 deleted from the new reader, they are also reflected in the previous reader 
 which is not always desired behavior.


[jira] Commented: (LUCENE-1387) Add LocalLucene

2008-12-19 Thread Erik Hatcher (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12658062#action_12658062
]

Erik Hatcher commented on LUCENE-1387:
--

I've taken some quick peeks into the code, run the unit tests, nicely packaged
and presented!

A couple of thoughts:

* Maybe the Filters should be using the DocIdSet API rather than the deprecated
BitSet stuff? We can refactor that after it is committed, I suppose, but it's
not something we want to leave like that.

* DistanceQuery is awkwardly named. It's not a Query (it doesn't extend Query);
it's a POJO with helpers. Maybe DistanceQueryFactory? (but it creates a Filter also)

* CartesianPolyFilter is not a Filter (but CartesianShapeFilter is)

I think this looks good enough to commit as well, just noting the above for
cosmetic refactoring consideration after the code is in.

Add LocalLucene
---

Key: LUCENE-1387
URL: https://issues.apache.org/jira/browse/LUCENE-1387
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Grant Ingersoll
Priority: Minor
Attachments: spatial-lucene.zip, spatial.tar.gz, spatial.zip

Local Lucene (Geo-search) has been donated to the Lucene project, per
https://issues.apache.org/jira/browse/INCUBATOR-77. This issue is to handle
the Lucene portion of integration.
See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene


[jira] Commented: (LUCENE-1061) Adding a factory to QueryParser to instantiate query instances

2008-08-28 Thread Erik Hatcher (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12626499#action_12626499
]

Erik Hatcher commented on LUCENE-1061:
--

Michael - you are a machine!

+1 to the subclassing approach and your general patch.

What might be even more interesting is to make the newXXX methods return Query
instead of a specific type. I'm not sure if that would work in all cases
(surely not for BooleanQuery), but might for most of 'em.

For example, what if newTermQuery(Term term) returned a Query instead of a
TermQuery? That'd add a fair bit more flexibility, as long as none of the
calling code needed a specific type of Query.

The hoops we jump through because we're in Java sheesh. :)
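
To illustrate the point about return types, here is a sketch with toy classes. These are stand-ins, not Lucene's Query, TermQuery or QueryParser, and BoostedQuery and BoostingParser are invented for the example. Because the factory method is declared to return the base type, a subclass can hand back any kind of query.

```java
// Toy stand-ins for a query hierarchy and parser; these are not Lucene's classes.
class QueryFactorySketch {

    interface Query {
        String describe();
    }

    static class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
        public String describe() { return "term(" + term + ")"; }
    }

    // Hypothetical custom query that wraps another query.
    static class BoostedQuery implements Query {
        final Query inner;
        final float boost;
        BoostedQuery(Query inner, float boost) { this.inner = inner; this.boost = boost; }
        public String describe() { return inner.describe() + "^" + boost; }
    }

    static class Parser {
        // Declared to return the base type, so subclasses may return any Query.
        protected Query newTermQuery(String term) { return new TermQuery(term); }

        Query parse(String input) { return newTermQuery(input.trim()); }
    }

    static class BoostingParser extends Parser {
        @Override
        protected Query newTermQuery(String term) {
            return new BoostedQuery(new TermQuery(term), 2.0f);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Parser().parse("lucene").describe());         // prints term(lucene)
        System.out.println(new BoostingParser().parse("lucene").describe()); // prints term(lucene)^2.0
    }
}
```

Had newTermQuery been declared to return TermQuery, the override in BoostingParser would not compile, which is the flexibility trade-off raised above.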

Adding a factory to QueryParser to instantiate query instances
--

Key: LUCENE-1061
URL: https://issues.apache.org/jira/browse/LUCENE-1061
Project: Lucene - Java
Issue Type: Improvement
Components: QueryParser
Affects Versions: 2.3
Reporter: John Wang
Assignee: Michael McCandless
Fix For: 2.4

Attachments: LUCENE-1061.patch, lucene_patch.txt

With the new efforts with Payload and scoring functions, it would be nice to
plug in custom query implementations while using the same QueryParser.
Included is a patch with some refactoring of the QueryParser to take a factory
that produces query instances.


[jira] Commented: (LUCENE-1061) Adding a factory to QueryParser to instantiate query instances

2008-08-27 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12626055#action_12626055
 ] 

Erik Hatcher commented on LUCENE-1061:
--

What's wrong with just subclassing QueryParser and overriding the desired 
methods?   Either way someone wanting to provide custom Query implementations 
will be writing effectively the same code, just with more indirection with this 
method.

 Adding a factory to QueryParser to instantiate query instances
 --

 Key: LUCENE-1061
 URL: https://issues.apache.org/jira/browse/LUCENE-1061
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.3
Reporter: John Wang
 Fix For: 2.4

 Attachments: lucene_patch.txt


 With the new efforts with Payload and scoring functions, it would be nice to 
 plug in custom query implementations while using the same QueryParser.
 Included is a patch with some refactoring of the QueryParser to take a factory 
 that produces query instances.


[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.

2008-08-14 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622476#action_12622476
 ] 

Erik Hatcher commented on LUCENE-1343:
--

{quote}
Unit tests are the best way to document the many ways this thing can work.
{quote}

gets a judge's score of 11 from me.  Gold for Lance for Quote of the Day.

 A replacement for ISOLatin1AccentFilter that does a more thorough job of 
 removing diacritical marks or non-spacing modifiers.
 -

 Key: LUCENE-1343
 URL: https://issues.apache.org/jira/browse/LUCENE-1343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Reporter: Robert Haschart
Priority: Minor
 Attachments: normalizer.jar, UnicodeCharUtil.java, 
 UnicodeNormalizationFilter.java, UnicodeNormalizationFilterFactory.java


 The ISOLatin1AccentFilter takes Unicode characters that have diacritical 
 marks and replaces them with a version of that character with the diacritical 
 mark removed.  For example é becomes e.  However another equally valid way of 
 representing an accented character in Unicode is to have the unaccented 
 character followed by a non-spacing modifier character (like this:  é  ).
 The ISOLatin1AccentFilter doesn't handle the accents in decomposed Unicode 
 characters at all.  Additionally, there are some instances where a word will 
 contain what looks like an accented character but is actually considered to 
 be a separate unaccented character, such as  Ł , which, to make searching 
 easier, you want to fold onto the Latin-1 lookalike version,  L .
 The UnicodeNormalizationFilter can filter out accents and diacritical marks 
 whether they occur as composed characters or decomposed characters. It can 
 also handle cases where, as described above, characters that look like they 
 have diacritics (but don't) are to be folded onto the letter that they look 
 like ( Ł  -> L ).
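
The two steps described here can be sketched with the JDK's java.text.Normalizer. This is not the attached UnicodeNormalizationFilter; the class name and the small lookalike table are assumptions made for the example.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: strip combining marks, then fold known lookalikes.
class AccentFoldingSketch {

    // Assumed lookalike table for characters that have no Unicode decomposition.
    private static final Map<Character, Character> LOOKALIKES = new HashMap<>();
    static {
        LOOKALIKES.put('\u0141', 'L'); // capital L with stroke
        LOOKALIKES.put('\u0142', 'l'); // small l with stroke
    }

    static String fold(String input) {
        // NFD splits composed characters into a base character plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop the non-spacing modifier
            }
            Character mapped = LOOKALIKES.get(c);
            out.append(mapped != null ? mapped : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00e9"));             // composed e-acute, prints e
        System.out.println(fold("e\u0301"));            // decomposed e-acute, prints e
        System.out.println(fold("\u0141\u00f3d\u017a")); // prints Lodz
    }
}
```

Composed and decomposed input end up identical after folding, and the L with stroke only folds because of the explicit table, since normalization alone leaves it unchanged.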


[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552905
 ] 

Erik Hatcher commented on LUCENE-1095:
--

I believe QueryParser has been fixed since that first change I made, mentioned 
by Steven, to account for positions returned from an Analyzer.  So maybe all 
is well with fixing StopFilter now.  Unit tests needed :)

 StopFilter should have option to incr positionIncrement after stop word
 ---

 Key: LUCENE-1095
 URL: https://issues.apache.org/jira/browse/LUCENE-1095
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 I've seen this come up on the mailing list a few times in the last month, so 
 I'm filing a known bug/improvement around it...
 StopFilter should have an option that if set, records how many stop words are 
 skipped in a row, and then sets that value as the positionIncrement on the 
 next token that StopFilter does return.
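
A plain-Java sketch of the proposed option, not tied to Lucene's TokenStream API. The Token class is hypothetical, and the sketch assumes every incoming term carries a position increment of 1, so the emitted increment is 1 plus the number of stop words skipped just before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a stop filter that preserves position gaps.
class StopPositionSketch {

    // Hypothetical token type: a term plus its position increment.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "/" + positionIncrement;
        }
    }

    static List<Token> filter(List<String> terms, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0; // stop words dropped in a row since the last emitted token
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;
                continue;
            }
            out.add(new Token(term, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "and"));
        List<String> terms = Arrays.asList("the", "quick", "and", "the", "lazy", "fox");
        System.out.println(filter(terms, stops)); // prints [quick/2, lazy/3, fox/1]
    }
}
```

Keeping the gaps means a phrase query can tell "quick lazy" apart from text where two words were removed in between.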


[jira] Commented: (LUCENE-167) [PATCH] QueryParser not handling queries containing AND and OR

2007-12-06 Thread Erik Hatcher (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549046
]

Erik Hatcher commented on LUCENE-167:
-

The PrecedenceQueryParser is in the contrib/miscellaneous codebase (in Lucene's
repo) and in the released miscellaneous JAR. But it has some issues that are
documented in the test case, so it is definitely not ready for prime time.

[PATCH] QueryParser not handling queries containing AND and OR
--

Key: LUCENE-167
URL: https://issues.apache.org/jira/browse/LUCENE-167
Project: Lucene - Java
Issue Type: Bug
Components: QueryParser
Affects Versions: unspecified
Environment: Operating System: Linux
Platform: PC
Reporter: Morus Walter
Assignee: Erik Hatcher
Attachments: LuceneTest.java, QueryParser.jj.patch, QueryParser.patch

The QueryParser does not seem to handle boolean queries containing AND and OR
operators correctly. For example,
a AND b OR c AND d gets parsed as +a +b +c +d.
The attached patch fixes this by changing the vector of boolean clauses into a
vector of vectors of boolean clauses in the addClause method of the query
parser. A new sub-vector is created whenever an explicit OR operator is used.
Queries using explicit AND/OR are grouped by precedence of AND over OR. That
is,
a OR b AND c becomes a OR (b AND c).
Queries using implicit AND/OR (depending on the default operator) are handled
as before (so one can still use a +b -c to create one boolean query, where b is
required, c forbidden and a optional).
It's less clear how a query using both explicit AND/OR and implicit operators
should be handled.
Since the patch groups on explicit OR operators, a query
a OR b c is read as a (b c)
whereas
a AND b c is read as +a +b c
(given that the default operator OR is used).
There's one issue left:
The old query parser reads a query
`a OR NOT b' as `a -b', which is the same as `a AND NOT b'.
The modified query parser reads this as `a (-b)'.
While this looks better (at least to me), it does not produce the result of
a OR NOT b. Instead the (-b) part seems to be silently dropped.
While I understand that this query is illegal (just searching for one negative
term), I don't think that silently dropping this part is an appropriate way to
deal with that. But I don't think that's a query parser issue.
The only question is whether the query parser should take care of that.
I attached the patch (made against 1.3rc3 but working for 1.3final as well)
and a test program.
The test program parses a number of queries with the default-or and default-and
operators and reparses the result of the toString method of the created query.
It outputs the initial query, the parsed query with default or, the reparsed
query, the parsed query with default and, and its reparsed query.
If called with a -q option, it also runs the queries against an index
consisting of all documents containing one or none of a b c or d.
Using an unpatched and a patched version of Lucene in the classpath, one can
look at the effect of the patch in detail.
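
The grouping idea described above (a new sub-vector on every explicit OR, with AND binding tighter) can be sketched roughly as follows. This is not the attached patch: the class and method names are hypothetical, only explicit AND/OR between plain terms are handled, and the rendered notation is an assumption chosen to resemble Lucene's +term output.

```java
import java.util.ArrayList;
import java.util.List;

final class PrecedenceGroupingSketch {
  /** Splits a query such as "a AND b OR c AND d" into OR-separated groups of AND-ed terms. */
  static List<List<String>> group(String query) {
    List<List<String>> groups = new ArrayList<>();
    List<String> current = new ArrayList<>();
    groups.add(current);
    for (String token : query.trim().split("\\s+")) {
      if (token.equals("OR")) {
        // An explicit OR starts a new sub-list, like the new sub-vector in the description.
        current = new ArrayList<>();
        groups.add(current);
      } else if (!token.equals("AND")) {
        current.add(token);
      }
    }
    return groups;
  }

  /** Renders each group as an optional clause; terms inside a multi-term group are required. */
  static String render(List<List<String>> groups) {
    StringBuilder sb = new StringBuilder();
    for (List<String> g : groups) {
      if (sb.length() > 0) sb.append(' ');
      if (g.size() == 1) {
        sb.append(g.get(0));
      } else {
        sb.append('(');
        for (int i = 0; i < g.size(); i++) {
          if (i > 0) sb.append(' ');
          sb.append('+').append(g.get(i));
        }
        sb.append(')');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(render(group("a AND b OR c AND d"))); // prints (+a +b) (+c +d)
    System.out.println(render(group("a OR b AND c")));       // prints a (+b +c)
  }
}
```

Implicit operators and NOT are deliberately left out, since the description itself notes that their handling is less clear.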


[jira] Commented: (LUCENE-1049) Simple toString() for BooleanFilter

2007-11-09 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541497
 ] 

Erik Hatcher commented on LUCENE-1049:
--

Jason - the patch looks like it is generated backwards (minus signs, not 
plusses).  

 Simple toString() for BooleanFilter
 ---

 Key: LUCENE-1049
 URL: https://issues.apache.org/jira/browse/LUCENE-1049
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Reporter: Jason Calabrese
Priority: Trivial
 Attachments: patch.txt


 While working with BooleanFilter I wanted a basic toString() for debugging.
 This is what I came up with.  It works OK for me.


[jira] Assigned: (LUCENE-961) RegexCapabilities is not Serializable

2007-07-18 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-961:
---

Assignee: Erik Hatcher

 RegexCapabilities is not Serializable
 -

 Key: LUCENE-961
 URL: https://issues.apache.org/jira/browse/LUCENE-961
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.2
Reporter: Konrad Rokicki
Assignee: Erik Hatcher
Priority: Minor

 The class RegexQuery is marked Serializable by its super class, but it 
 contains a RegexCapabilities which is not Serializable. Thus attempting to 
 serialize the query results in an exception. 
 Making RegexCapabilities serializable should be no problem since its 
 subclasses contain only serializable classes (java.util.regex.Pattern and 
 org.apache.regexp.RE).
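
The failure mode described above can be reproduced without Lucene. In this sketch the nested classes are hypothetical stand-ins for RegexQuery and RegexCapabilities; only java.io serialization is real. A Serializable object holding a non-Serializable field fails at write time, and marking the field's class Serializable is enough to fix it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationSketch {
  /** Hypothetical stand-in for a capabilities holder that is not Serializable. */
  static class PlainCapabilities {
    String pattern = "luc.*";
  }

  /** The same holder after the suggested fix. */
  static class SerializableCapabilities implements Serializable {
    private static final long serialVersionUID = 1L;
    String pattern = "luc.*";
  }

  /** Hypothetical stand-in for a query that is Serializable but holds a capabilities field. */
  static class QueryLike implements Serializable {
    private static final long serialVersionUID = 1L;
    final Object capabilities;

    QueryLike(Object capabilities) {
      this.capabilities = capabilities;
    }
  }

  /** Returns true if the whole object graph can be written with Java serialization. */
  static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false; // some object in the graph does not implement Serializable
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(canSerialize(new QueryLike(new PlainCapabilities())));        // prints false
    System.out.println(canSerialize(new QueryLike(new SerializableCapabilities()))); // prints true
  }
}
```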


[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases

2007-06-01 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500715
 ] 

Erik Hatcher commented on LUCENE-898:
-

It may still work ok, but my hunch is that changes to the QueryParser have made 
this javascript code more deprecated than anything.  

Even if we removed it from svn, it historically would still be there in case 
anyone really needed it.   

Again, I am +1 for removing it entirely after running it by the java-user list 
to see if anyone desires it.

 contrib/javascript is not packaged into releases
 

 Key: LUCENE-898
 URL: https://issues.apache.org/jira/browse/LUCENE-898
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Reporter: Hoss Man
Priority: Trivial

 The contrib/javascript directory is (apparently) a collection of JavaScript 
 utilities for Lucene, but it has no build files or any mechanism to 
 package it, so it is excluded from releases.


[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases

2007-05-31 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500453
 ] 

Erik Hatcher commented on LUCENE-898:
-

My vote is to remove the javascript contrib area entirely.  It doesn't really 
do much that is useful.  I'd be surprised if anyone really uses it.

 contrib/javascript is not packaged into releases
 

 Key: LUCENE-898
 URL: https://issues.apache.org/jira/browse/LUCENE-898
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Reporter: Hoss Man
Priority: Trivial

 The contrib/javascript directory is (apparently) a collection of JavaScript 
 utilities for Lucene, but it has no build files or any mechanism to 
 package it, so it is excluded from releases.


[jira] Commented: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-05-29 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499740
 ] 

Erik Hatcher commented on LUCENE-885:
-

PQP was a hack I made long ago, mainly to show how QP could possibly be 
improved. I'm fine with that class being removed altogether, or the failing 
tests commented out.  I don't use that class personally.

 clean up build files so contrib tests are run more easily
 -

 Key: LUCENE-885
 URL: https://issues.apache.org/jira/browse/LUCENE-885
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: LUCENE-885.patch, LUCENE-885.patch


 Per mailing list discussion...
 http://www.nabble.com/Tests%2C-Contribs%2C-and-Releases-tf3768924.html#a10655448
 Tests for contribs should be run when ant test is used, and the existing test 
 target should be renamed to test-core.


[jira] Commented: (LUCENE-889) Standard tokenizer with punctuation output

2007-05-25 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499085
 ] 

Erik Hatcher commented on LUCENE-889:
-

This patch concerns me.  This changes default behavior in a very basic and 
commonly used piece of Lucene.  At the very least this should be made entirely 
optional and off by default.  

Thoughts?

 Standard tokenizer with punctuation output
 --

 Key: LUCENE-889
 URL: https://issues.apache.org/jira/browse/LUCENE-889
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.1
Reporter: Karl Wettin
Priority: Trivial
 Attachments: standard.patch, test.patch


 This patch adds punctuation (comma, period, question mark and exclamation 
 point)  tokens as output from the StandardTokenizer, and filters them out in 
 the StandardFilter.
 (I needed them for text classification reasons.)


[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

2007-05-03 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493570
 ] 

Erik Hatcher commented on LUCENE-874:
-

Do note that Solr can be embedded: http://wiki.apache.org/solr/EmbeddedSolr
And there are improvements to this in the works too.

 Automatic reopen of IndexSearcher/IndexReader
 -

 Key: LUCENE-874
 URL: https://issues.apache.org/jira/browse/LUCENE-874
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: João Fonseca
Priority: Minor

 To improve performance, a single instance of IndexSearcher should be used. 
 However, if the index is updated, it's hard to close/reopen it, because 
 multiple threads may be accessing it at the same time.
 Lucene should include an out-of-the-box solution to this problem. Either a 
 new class should be implemented to manage this behaviour (singleton 
 IndexSearcher, plus detection of a modified index, plus safely closing and 
 reopening the IndexSearcher) or this could be handled behind the scenes by the 
 IndexSearcher class.
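
One way such a manager class could work is reference counting: threads acquire the current searcher, release it when done, and a swapped-out searcher is closed only after its last user releases it. The sketch below is generic over any Closeable and does not use Lucene; the SearcherHolderSketch name and its methods are hypothetical, and detecting a modified index is left out.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

final class SearcherHolderSketch<T extends Closeable> {
  /** A resource plus a count of its users; the holder itself counts as one user. */
  static final class Ref<T extends Closeable> {
    final T resource;
    private final AtomicInteger count = new AtomicInteger(1);

    Ref(T resource) {
      this.resource = resource;
    }

    /** Adds a user unless the resource has already been closed. */
    boolean tryIncrement() {
      while (true) {
        int c = count.get();
        if (c == 0) return false;
        if (count.compareAndSet(c, c + 1)) return true;
      }
    }

    /** Removes a user and closes the resource when the last one leaves. */
    void decrement() throws IOException {
      if (count.decrementAndGet() == 0) resource.close();
    }
  }

  private volatile Ref<T> current;

  SearcherHolderSketch(T initial) {
    current = new Ref<>(initial);
  }

  /** Returns the current resource with its user count incremented. */
  Ref<T> acquire() {
    while (true) {
      Ref<T> ref = current;
      if (ref.tryIncrement()) return ref;
      // Lost a race with swap(); retry against the new current reference.
    }
  }

  void release(Ref<T> ref) throws IOException {
    ref.decrement();
  }

  /** Installs a fresh resource; the old one closes once all of its users have released it. */
  synchronized void swap(T fresh) throws IOException {
    Ref<T> old = current;
    current = new Ref<>(fresh);
    old.decrement();
  }
}
```

A search thread would call acquire(), search using ref.resource, and call release(ref) in a finally block; an indexing thread would call swap() with a newly opened searcher after updating the index.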


[jira] Closed: (LUCENE-707) Lucene Java Site docs

2007-03-20 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher closed LUCENE-707.
---


Applied, thanks George!

 Lucene Java Site docs
 -

 Key: LUCENE-707
 URL: https://issues.apache.org/jira/browse/LUCENE-707
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
 Environment: N/A
Reporter: Grant Ingersoll
 Assigned To: Grant Ingersoll
Priority: Minor
 Attachments: lucene.apache.org.patch


 It would be really nice if the Java site docs were consistent with the rest 
 of the Lucene family (namely, with navigation tabs, etc.) so that one can 
 easily go between Nutch, Hadoop, etc.


[jira] Assigned: (LUCENE-707) Lucene Java Site docs

2007-03-20 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-707:
---

Assignee: Erik Hatcher  (was: Grant Ingersoll)

 Lucene Java Site docs
 -

 Key: LUCENE-707
 URL: https://issues.apache.org/jira/browse/LUCENE-707
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
 Environment: N/A
Reporter: Grant Ingersoll
 Assigned To: Erik Hatcher
Priority: Minor
 Attachments: lucene.apache.org.patch


 It would be really nice if the Java site docs were consistent with the rest 
 of the Lucene family (namely, with navigation tabs, etc.) so that one can 
 easily go between Nutch, Hadoop, etc.


[jira] Commented: (LUCENE-446) FunctionQuery - score based on field value

2007-03-18 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481944
 ] 

Erik Hatcher commented on LUCENE-446:
-

+1 to FunctionQuery being brought into Lucene proper.

 FunctionQuery - score based on field value
 --

 Key: LUCENE-446
 URL: https://issues.apache.org/jira/browse/LUCENE-446
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 1.9
Reporter: Yonik Seeley
 Attachments: function.zip, function.zip


 FunctionQuery can return a score based on a field's value or on its ordinal 
 value.
 FunctionFactory subclasses define the details of the function.  There is 
 currently a LinearFloatFunction (a line specified by slope and intercept).
 Field values are typically obtained from FieldValueSourceFactory.  
 Implementations include FloatFieldSource, IntFieldSource, and OrdFieldSource.
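
As a rough illustration of the linear function mentioned above, the score for a document is slope times the field value plus intercept. The sketch does not use the classes from function.zip; FloatSource and Linear are hypothetical stand-ins for a field value source and LinearFloatFunction.

```java
final class LinearFunctionSketch {
  /** Hypothetical stand-in for a field value source: one float per document id. */
  interface FloatSource {
    float value(int doc);
  }

  /** A line specified by slope and intercept, applied to the source value. */
  static final class Linear {
    private final FloatSource source;
    private final float slope;
    private final float intercept;

    Linear(FloatSource source, float slope, float intercept) {
      this.source = source;
      this.slope = slope;
      this.intercept = intercept;
    }

    /** Score for one document: slope * fieldValue + intercept. */
    float score(int doc) {
      return slope * source.value(doc) + intercept;
    }
  }

  public static void main(String[] args) {
    final float[] popularity = {1f, 2f, 4f}; // made-up field values for docs 0..2
    Linear f = new Linear(doc -> popularity[doc], 2f, 0.5f);
    System.out.println(f.score(2)); // prints 8.5
  }
}
```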


[jira] Commented: (LUCENE-805) New Lucene Demo

2007-02-15 Thread Erik Hatcher (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473419
]

Erik Hatcher commented on LUCENE-805:
-

The examples from Lucene in Action are freely available and Otis and I are fine
with assigning the ASL to them (it's currently unspecified but implicitly ASL'd).
If these would be useful, at least Indexer.java and Searcher.java, which
are better demos than the current demo application, we're free to use them as a
starter. All the code could be contributed if folks are OK with that.

In fact, maybe Otis and I should do the 2nd edition codebase within the Lucene
svn somewhere so that it serves as a built-in example.

New Lucene Demo
---

Key: LUCENE-805
URL: https://issues.apache.org/jira/browse/LUCENE-805
Project: Lucene - Java
Issue Type: Improvement
Components: Examples
Reporter: Grant Ingersoll
Assigned To: Grant Ingersoll
Priority: Minor

The much maligned demo, while useful, could use a breath of fresh air. This
issue is to start collecting requirements about what people would like to see
in a demo and what they don't like in the current one.
Ideas (not necessarily in order of importance):
1. More in-depth tutorial explaining indexing/searching
2. Multilingual support/demonstration
3. Better demonstration of querying capabilities: Spans, Phrases, Wildcards,
Filters, sorting, etc.
4. Dealing with different content types and pointers to resources
5. Wiki use cases links -- I think it would be cool to solicit people to
contribute use cases to the docs.
6. Demonstration of contrib packages, esp. Highlighter
7. Performance issues/factors/tradeoffs. Lucene lessons learned and best
practices
Advanced tutorials:
1. Hadoop + Lucene
2. Writing custom analyzers/filters/tokenizers
3. Changing Scoring
4. Payloads (when they are committed)
Please contribute what else you would like to see. I may be able to address
some of these issues for my ApacheCon talk, but not all of them.


[jira] Resolved: (LUCENE-797) Query for searching document whose title starts with ...

2007-02-07 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-797.
-

Resolution: Invalid

The java-user e-mail list is the appropriate forum to ask questions.  The issue 
tracker is used for tracking bugs and feature enhancements.

If you did not tokenize the title, you could use a prefix query (title*) with 
QueryParser (though you will likely want to lowercase, and index a tokenized 
title into another field for full-text search capability).  

QueryParser does not currently support the SpanQuery classes, but with a 
SpanQuery you could find terms at the beginning of a field.
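
To illustrate why an untokenized, lowercased title works with a prefix query, here is a sketch that uses a sorted set in place of a Lucene index. TitlePrefixSketch is hypothetical; the point is only that all whole-title terms sharing a prefix sit next to each other in sorted order, so a prefix match is a short range scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

final class TitlePrefixSketch {
  // Each title is kept as one lowercased term, like an untokenized field.
  private final TreeSet<String> titles = new TreeSet<>();

  void index(String title) {
    titles.add(title.toLowerCase(Locale.ROOT));
  }

  /** Returns all indexed titles that start with the given text, in sorted order. */
  List<String> startingWith(String prefix) {
    String p = prefix.toLowerCase(Locale.ROOT);
    List<String> hits = new ArrayList<>();
    for (String t : titles.tailSet(p)) {
      if (!t.startsWith(p)) break; // past the block of terms sharing the prefix
      hits.add(t);
    }
    return hits;
  }

  public static void main(String[] args) {
    TitlePrefixSketch idx = new TitlePrefixSketch();
    idx.index("Lucene in Action");
    idx.index("Lucene Basics");
    idx.index("Learning Lucene");
    System.out.println(idx.startingWith("Lucene")); // prints [lucene basics, lucene in action]
  }
}
```

"Learning Lucene" is not returned because the match is against the whole title as one term, which is the behaviour the question asks for.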

 Query for searching document whose title starts with ...
 

 Key: LUCENE-797
 URL: https://issues.apache.org/jira/browse/LUCENE-797
 Project: Lucene - Java
  Issue Type: Task
  Components: QueryParser
Reporter: diasp

 Do you know the correct syntax for QueryParser to search all documents whose 
 field 'title' starts with a selected text?
 Thank you for your help.


[jira] Commented: (LUCENE-645) Highligter fails to include non-token at end of string to be highlighted

2006-08-03 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-645?page=comments#action_12425643 ] 

Erik Hatcher commented on LUCENE-645:
-

There is some commented out code in Highlighter.java:

    if (lastEndOffset < text.length())
      newText.append(encoder.encodeText(text.substring(lastEndOffset)));

uncommenting that code fixes this issue.

I've added this test to HighlighterTest.java:

  public void testOffByOne() throws IOException {
    TermQuery query= new TermQuery( new Term( "data", "help" ));
    Highlighter hg = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer( query ));
    hg.setTextFragmenter( new NullFragmenter() );

    String match = null;
    match = hg.getBestFragment( new StandardAnalyzer(), "data", "help me [54-65]");
    assertEquals("<B>help</B> me [54-65]", match);
  }

all tests pass even with that code uncommented.   I'll commit if there are no 
objections.

 Highligter fails to include non-token at end of string to be highlighted
 

 Key: LUCENE-645
 URL: http://issues.apache.org/jira/browse/LUCENE-645
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 1.9
 Environment: Red Hat Linux, Java 1.5
 Windows Java 1.5
Reporter: Andrew Palmer
Priority: Minor

 The following code extract shows the problem:
    TermQuery query= new TermQuery( new Term( "data", "help" )); 
    Highlighter hg = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer( query ));
    hg.setTextFragmenter( new NullFragmenter() );
    
    String match = null;
    try {
      match = hg.getBestFragment( new StandardAnalyzer(), "data", "help me [54-65]" );
    } catch (IOException e) {
      e.printStackTrace();
    }
    System.out.println( match );
 The system outputs 
 <B>help</B> me [54-65
 would expect 
 <B>help</B> me [54-65]
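
The missing closing bracket comes from text that lies after the last token. A simplified sketch of the mechanism, without Lucene: the highlighter copies the text between tokens as it goes, so anything after the final token's end offset is lost unless it is appended separately. TailAppendSketch, the hard-coded token offsets and the appendTail flag are all assumptions made for illustration, and the encoder step is omitted.

```java
final class TailAppendSketch {
  /**
   * Rebuilds the text from token offsets, wrapping tokens equal to term in B tags.
   * Each entry of tokenOffsets is {start, end} with an exclusive end.
   */
  static String highlight(String text, int[][] tokenOffsets, String term, boolean appendTail) {
    StringBuilder out = new StringBuilder();
    int lastEndOffset = 0;
    for (int[] off : tokenOffsets) {
      int start = off[0];
      int end = off[1];
      out.append(text, lastEndOffset, start); // non-token text before this token
      String token = text.substring(start, end);
      if (token.equals(term)) {
        out.append("<B>").append(token).append("</B>");
      } else {
        out.append(token);
      }
      lastEndOffset = end;
    }
    // Without this step, non-token text after the last token is dropped.
    if (appendTail && lastEndOffset < text.length()) {
      out.append(text.substring(lastEndOffset));
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String text = "help me [54-65]";
    int[][] offsets = {{0, 4}, {5, 7}, {9, 14}}; // assumed tokens: help, me, 54-65
    System.out.println(highlight(text, offsets, "help", false)); // prints <B>help</B> me [54-65
    System.out.println(highlight(text, offsets, "help", true));  // prints <B>help</B> me [54-65]
  }
}
```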

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Closed: (LUCENE-637) class HitDoc should be inner class or in its own .java file

2006-07-27 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-637?page=all ]

Erik Hatcher closed LUCENE-637.
---

Resolution: Invalid

 class HitDoc should be inner class or in its own .java file
 ---

 Key: LUCENE-637
 URL: http://issues.apache.org/jira/browse/LUCENE-637
 Project: Lucene - Java
  Issue Type: Wish
Affects Versions: 2.0.0
Reporter: alan ezust

 Why is class HitDoc tacked onto the end of Hits.java like that? I'd like to 
 use it.


[jira] Assigned: (LUCENE-415) Merge error during add to index (IndexOutOfBoundsException)

2006-06-24 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-415?page=all ]

Erik Hatcher reassigned LUCENE-415:
---

Assign To: Yonik Seeley  (was: Lucene Developers)

 Merge error during add to index (IndexOutOfBoundsException)
 ---

  Key: LUCENE-415
  URL: http://issues.apache.org/jira/browse/LUCENE-415
  Project: Lucene - Java
 Type: Bug

   Components: Index
 Versions: 1.4
  Environment: Operating System: Linux
 Platform: Other
 Reporter: Daniel Quaroni
 Assignee: Yonik Seeley


 I've been batch-building indexes, and I've built a couple hundred indexes with 
 a total of around 150 million records.  This only happened once, so it's 
 probably impossible to reproduce, but anyway... I was building an index with 
 around 9.6 million records, and towards the end I got this:
 java.lang.IndexOutOfBoundsException: Index: 54, Size: 24
     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
     at java.util.ArrayList.get(ArrayList.java:322)
     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151)
     at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149)
     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
     at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:52)
     at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:294)
     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:254)
     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:93)
     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
     at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:458)
     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:310)
     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:294)


[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

2006-05-04 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377933 ] 

Erik Hatcher commented on LUCENE-436:
-

Please, everyone, let's keep this discussion technical and factual and avoid 
making degrading statements to one another.  It doesn't help the situation to 
have such negative tones being used. The discussion aspect of this should be 
moved to java-dev anyway, and leave JIRA comments for details on patches 
attached and other technical details related directly towards resolving this 
issue.

 [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
 

  Key: LUCENE-436
  URL: http://issues.apache.org/jira/browse/LUCENE-436
  Project: Lucene - Java
 Type: Improvement

   Components: Index
 Versions: 1.4
  Environment: Solaris JVM 1.4.1
 Linux JVM 1.4.2/1.5.0
 Windows not tested
 Reporter: kieran
  Attachments: FixedThreadLocal.java, Lucene-436-TestCase.tar.gz, 
 TermInfosReader.java, ThreadLocalTest.java

 We've been experiencing terrible memory problems on our production search 
 server, running lucene (1.4.3).
 Our live app regularly opens new indexes and, in doing so, releases old 
 IndexReaders for garbage collection.
 But...there appears to be a memory leak in 
 org.apache.lucene.index.TermInfosReader.java.
 Under certain conditions (possibly related to JVM version, although I've 
 personally observed it under both Linux JVM 1.4.2_06 and 1.5.0_03, and SunOS 
 JVM 1.4.1) the ThreadLocal member variable, enumerators, doesn't get 
 garbage-collected when the TermInfosReader object is gc-ed.
 Looking at the code in TermInfosReader.java, there's no reason why it 
 _shouldn't_ be gc-ed, so I can only presume (and I've seen this suggested 
 elsewhere) that there could be a bug in the garbage collector of some JVMs.
 I've seen this problem briefly discussed; in particular at the following URL:
   http://java2.5341.com/msg/85821.html
 The patch that Doug recommended, which is included in lucene-1.4.3 doesn't 
 work in our particular circumstances. Doug's patch only clears the 
 ThreadLocal variable for the thread running the finalizer (my knowledge of 
 java breaks down here - I'm not sure which thread actually runs the 
 finalizer). In our situation, the TermInfosReader is (potentially) used by 
 more than one thread, meaning that Doug's patch _doesn't_ allow the affected 
 JVMs to correctly collect garbage.
 So...I've devised a simple patch which, from my observations on linux JVMs 
 1.4.2_06, and 1.5.0_03, fixes this problem.
 Kieran
 PS Thanks to daniel naber for pointing me to jira/lucene
 @@ -19,6 +19,7 @@
  import java.io.IOException;
  import org.apache.lucene.store.Directory;
 +import java.util.Hashtable;
  /** This stores a monotonically increasing set of <Term, TermInfo> pairs in a
   * Directory.  Pairs are accessed either by Term or by ordinal position the
 @@ -29,7 +30,7 @@
    private String segment;
    private FieldInfos fieldInfos;
 -  private ThreadLocal enumerators = new ThreadLocal();
 +  private final Hashtable enumeratorsByThread = new Hashtable();
    private SegmentTermEnum origEnum;
    private long size;
 @@ -60,10 +61,10 @@
    }
    private SegmentTermEnum getEnum() {
 -    SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
 +    SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
      if (termEnum == null) {
        termEnum = terms();
 -      enumerators.set(termEnum);
 +      enumeratorsByThread.put(Thread.currentThread(), termEnum);
      }
      return termEnum;
    }
 @@ -195,5 +196,15 @@
    public SegmentTermEnum terms(Term term) throws IOException {
      get(term);
      return (SegmentTermEnum)getEnum().clone();
 +  }
 +
 +  /* some jvms might have trouble gc-ing enumeratorsByThread */
 +  protected void finalize() throws Throwable {
 +    try {
 +      // make sure gc can clear up.
 +      enumeratorsByThread.clear();
 +    } finally {
 +      super.finalize();
 +    }
    }
  }
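
The pattern in the patch, a table keyed by the current thread instead of a ThreadLocal, can be shown in isolation. The sketch below is a generic, hypothetical PerThreadCacheSketch rather than the patched TermInfosReader; it uses an explicit clear() where the patch uses finalize(). One trade-off to keep in mind: the table holds strong references to Thread objects and their values until it is cleared.

```java
import java.util.Hashtable;
import java.util.function.Supplier;

final class PerThreadCacheSketch<V> {
  // One value per thread, owned by this object rather than by the threads themselves.
  private final Hashtable<Thread, V> valuesByThread = new Hashtable<>();

  /** Returns the calling thread's value, creating it on first use. */
  V get(Supplier<V> factory) {
    Thread t = Thread.currentThread();
    V value = valuesByThread.get(t);
    if (value == null) {
      value = factory.get();
      valuesByThread.put(t, value);
    }
    return value;
  }

  /** Drops every thread's value at once, which a ThreadLocal cannot do from one thread. */
  void clear() {
    valuesByThread.clear();
  }

  int size() {
    return valuesByThread.size();
  }

  public static void main(String[] args) throws InterruptedException {
    PerThreadCacheSketch<Object> cache = new PerThreadCacheSketch<>();
    cache.get(Object::new);
    Thread worker = new Thread(() -> cache.get(Object::new));
    worker.start();
    worker.join();
    System.out.println(cache.size()); // prints 2
    cache.clear();
    System.out.println(cache.size()); // prints 0
  }
}
```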
 TermInfosReader.java, full source:
 ==
 package org.apache.lucene.index;
 /**
  * Copyright 2004 The Apache Software Foundation
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
  * You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
 import java.io.IOException;
 import org.apache.lucene.store.Directory;
 import java.util.Hashtable;
 /** This stores a monotonically increasing

[jira] Commented: (LUCENE-555) Index Corruption

2006-04-24 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-555?page=comments#action_12375996 ] 

Erik Hatcher commented on LUCENE-555:
-

Could you share a test case that demonstrates this issue?

 Index Corruption
 

  Key: LUCENE-555
  URL: http://issues.apache.org/jira/browse/LUCENE-555
  Project: Lucene - Java
 Type: Bug

   Components: Index
 Versions: 1.9
  Environment: Linux FC4, Java 1.4.9
 Reporter: dan
 Priority: Critical


 Index Corruption
  output
 java.io.FileNotFoundException: ../_aki.fnm (No such file or directory)
     at java.io.RandomAccessFile.open(Native Method)
     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
     at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425)
     at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434)
     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324)
     at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:56)
     at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:144)
     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129)
     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:110)
     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:674)
     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
  input
 - I open an index, I read, I write, I optimize, and eventually the above 
 happens. The index is unusable.
 - This has happened to me somewhere between 20 and 30 times now - on indexes 
 of different shapes and sizes.
 - I don't know the reason. But, the following requirement applies regardless.
  requirement
 - Like all modern database programs, there has to be a way to repair an 
 index. Period.


[jira] Closed: (LUCENE-527) Bug in the TermDocs.freq() method?

2006-03-20 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-527?page=all ]
 
Erik Hatcher closed LUCENE-527:
---

Resolution: Invalid

 Bug in  the TermDocs.freq() method?
 ---

  Key: LUCENE-527
  URL: http://issues.apache.org/jira/browse/LUCENE-527
  Project: Lucene - Java
 Type: Bug
 Versions: 1.9
  Environment: Scientific linux
 Reporter: Håkon T. Bommen


 I believe I get incorrect data from the TermDocs.freq() method. The attached 
 code demonstrates this. Document one has the correct term count. In documents 
 zero and two, the terms stored and indexed are reported to occur once in both 
 documents. This is incorrect.
 // LuceneTest.java
 import org.apache.lucene.analysis.Analyzer;
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.queryParser.ParseException;
 import org.apache.lucene.document.*;
 import org.apache.lucene.index.*;
 import org.apache.lucene.search.*;
 import org.apache.lucene.queryParser.QueryParser;
 import org.apache.lucene.store.RAMDirectory;
 import org.apache.lucene.store.Directory;
 public class LuceneTest{
   public LuceneTest(){}
   public static void main(String[] args){
     IndexWriter writer;
     IndexReader reader;
     Searcher searcher;
     Document doc;
     Directory dir = new RAMDirectory();
     try{
       // create index
       writer = new IndexWriter(dir, new StandardAnalyzer(), true);
       doc = new Document();
       doc.add(new Field("title", "Doc 0", Field.Store.YES, Field.Index.TOKENIZED));
       doc.add(new Field("contents", "Text Text and more Text", Field.Store.NO, Field.Index.TOKENIZED));
       writer.addDocument(doc);
       doc = new Document();
       doc.add(new Field("title", "Doc 1", Field.Store.YES, Field.Index.TOKENIZED));
       doc.add(new Field("contents", "This text is not stored, only indexed.", Field.Store.NO, Field.Index.TOKENIZED));
       writer.addDocument(doc);
       doc = new Document();
       doc.add(new Field("title", "Doc 2", Field.Store.YES, Field.Index.TOKENIZED));
       doc.add(new Field("contents", "Text Text Text Text", Field.Store.NO, Field.Index.TOKENIZED));
       writer.addDocument(doc);
       writer.close();
       // search
       searcher = new IndexSearcher(dir);
       reader = IndexReader.open(dir);
       QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
       Query query = qp.parse("stored and indexed text");
       String[] terms = {"stored", "indexed", "text"};
       Hits queryHits = searcher.search(query);
       // print results
       System.out.println("Found " + queryHits.length() + " hits.");
       for(int i=0; i<queryHits.length(); i++){
         doc = queryHits.doc(i);
         System.out.println("*** " + doc.get("title") + " ***");
         int docID = queryHits.id(i);
         for (int j=0; j<terms.length; j++){
           TermDocs td = reader.termDocs(new Term("contents", terms[j]));
           // position the enumeration at docID (the return value is not checked)
           td.skipTo(docID);
           System.out.println("Term '" + terms[j] + "' occurs " +
               td.freq() + " time(s) in document nr. " + docID);
         }
       }
     }catch(Exception e){System.out.println("Darn");}
   }
 }
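
The issue was closed as Invalid without a stated reason. One plausible explanation, offered here as an assumption rather than the committers' verdict, is that TermDocs.skipTo(target) advances to the first entry whose document number is greater than or equal to target, so freq() can describe a later document when the term is absent from the requested one. The self-contained sketch below models that behaviour with a hypothetical PostingsCursor class (it is not the Lucene API) and shows the check on doc() that avoids misreading the frequency.

```java
// Hypothetical stand-in for a postings enumeration; NOT the Lucene TermDocs API.
final class PostingsCursor {
    private final int[] docs;   // ascending document numbers containing the term
    private final int[] freqs;  // term frequency in each of those documents
    private int pos = -1;       // -1 means "not positioned yet"

    PostingsCursor(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Moves forward (never backward) to the first entry with doc >= target; false if none. */
    boolean skipTo(int target) {
        if (pos < 0) pos = 0;
        while (pos < docs.length && docs[pos] < target) pos++;
        return pos < docs.length;
    }

    int doc()  { return docs[pos]; }
    int freq() { return freqs[pos]; }

    /** Frequency of the term in exactly docId, or 0 when the term is absent there. */
    static int freqIn(PostingsCursor cursor, int docId) {
        if (cursor.skipTo(docId) && cursor.doc() == docId) {
            return cursor.freq();
        }
        return 0;
    }

    public static void main(String[] args) {
        // "stored" occurs once, and only in document 1 (as in the report's test data).
        int[] docs = {1};
        int[] freqs = {1};
        for (int docId = 0; docId < 3; docId++) {
            int f = freqIn(new PostingsCursor(docs, freqs), docId);
            System.out.println("doc " + docId + ": freq " + f);
        }
    }
}
```

Under this model, calling freq() straight after skipTo(0) would report the frequency for document 1, which is consistent with the "occurs once" output described for document zero in the report.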

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2006-03-07 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12369342 ] 

Erik Hatcher commented on LUCENE-330:
-

The patch from FilteredQueryPatch1.txt has been applied and committed.  Thanks 
for the fix, Paul!

 [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
 

  Key: LUCENE-330
  URL: http://issues.apache.org/jira/browse/LUCENE-330
  Project: Lucene - Java
 Type: Improvement
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
 Priority: Minor
  Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, 
 FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, 
 SkipFilter.java, SkipFilter.java

 This improves performance of FilteredQuery by not calling score() 
 on documents that do not pass the filter. 
 This passes the current tests for FilteredQuery, but these tests 
 have not been adapted/extended.


[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2006-03-05 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12368934 ] 

Erik Hatcher commented on LUCENE-330:
-

Paul - it is unfortunate that we've let this patch sit for as long as it has.  
I've just run into issues with FilteredQuery myself and am looking to apply 
your patches in the hope that they address the problem I've seen with 
FilteredQuery instances nested within a BooleanQuery.  There is a comment in 
some of your code that this doesn't work with BooleanQuery, though.  Since the 
code has changed and your patches no longer apply cleanly, could you advise on 
which patches are the latest and how to apply them to trunk?  Many thanks!

 [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
 

  Key: LUCENE-330
  URL: http://issues.apache.org/jira/browse/LUCENE-330
  Project: Lucene - Java
 Type: Improvement
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
 Priority: Minor
  Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, 
 FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, 
 SkipFilter.java, SkipFilter.java

 This improves performance of FilteredQuery by not calling score() 
 on documents that do not pass the filter. 
 This passes the current tests for FilteredQuery, but these tests 
 have not been adapted/extended.


[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2006-03-05 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12368948 ] 

Erik Hatcher commented on LUCENE-330:
-

I manually applied that patch (prior to my first comment actually) as 
automatically applying didn't work.  I just committed another test to 
TestFilteredQuery, which fails with this patch with this error:

java.lang.IndexOutOfBoundsException: Not a valid hit number: 0
at org.apache.lucene.search.Hits.hitDoc(Hits.java:134)
at org.apache.lucene.search.Hits.id(Hits.java:116)
at 
org.apache.lucene.search.TestFilteredQuery.testBoolean(TestFilteredQuery.java:139)

I'm fairly confident I applied the patch correctly, though I suppose it's 
possible I missed something.  

Here's an inlined version of the diff I have locally of FilteredQuery:

$ svn diff FilteredQuery.java
Index: FilteredQuery.java
===
--- FilteredQuery.java  (revision 383339)
+++ FilteredQuery.java  (working copy)
@@ -34,6 +34,7 @@
  * <p>Created: Apr 20, 2004 8:58:29 AM
  *
  * @author  Tim Jones
+ * @author  Paul Elschot
  * @since   1.4
  * @version $Id$
  * @see CachingWrapperFilter
@@ -75,22 +76,42 @@
   // return this query
   public Query getQuery() { return FilteredQuery.this; }
 
-  // return a scorer that overrides the enclosed query's score if
-  // the given hit has been filtered out.
-  public Scorer scorer (IndexReader indexReader) throws IOException {
+  // return a filtering scorer
+   public Scorer scorer (IndexReader indexReader) throws IOException {
 final Scorer scorer = weight.scorer (indexReader);
 final BitSet bitset = filter.bits (indexReader);
 return new Scorer (similarity) {
 
-  // pass these methods through to the enclosed scorer
-  public boolean next() throws IOException { return scorer.next(); }
+  public boolean next() throws IOException {
+do {
+  if (! scorer.next()) {
+return false;
+  }
+} while (! bitset.get(scorer.doc()));
+/* When skipTo() is allowed on scorer it should be used here
+ * in combination with bitset.nextSetBit(...)
+ * See the while loop in skipTo() below.
+ */
+return true;
+  }
   public int doc() { return scorer.doc(); }
-  public boolean skipTo (int i) throws IOException { return scorer.skipTo(i); }
 
-  // if the document has been filtered out, set score to 0.0
-  public float score() throws IOException {
-return (bitset.get(scorer.doc())) ? scorer.score() : 0.0f;
-  }
+  public boolean skipTo(int i) throws IOException {
+if (! scorer.skipTo(i)) {
+  return false;
+}
+while (! bitset.get(scorer.doc())) {
+  int nextFiltered = bitset.nextSetBit(scorer.doc() + 1);
+  if (nextFiltered == -1) {
+return false;
+  } else if (! scorer.skipTo(nextFiltered)) {
+return false;
+  }
+}
+return true;
+   }
+  
+  public float score() throws IOException { return scorer.score(); }
 
   // add an explanation about whether the document was filtered
   public Explanation explain (int i) throws IOException {

What am I missing?
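
The next() loop in the diff above can be illustrated in isolation. The following is a minimal sketch, not the committed FilteredQuery code: a hypothetical FilteredDocIterator walks an ascending array of matching document numbers (standing in for the wrapped scorer) and skips any document whose bit is not set in a java.util.BitSet.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; the array stands in for the wrapped Scorer.
final class FilteredDocIterator {
    private final int[] matches; // ascending doc numbers matched by the inner query
    private final BitSet bits;   // filter: a set bit means the document passes
    private int pos = -1;

    FilteredDocIterator(int[] matches, BitSet bits) {
        this.matches = matches;
        this.bits = bits;
    }

    /** Advances to the next match that passes the filter; false when exhausted. */
    boolean next() {
        do {
            pos++;
            if (pos >= matches.length) {
                return false;
            }
        } while (!bits.get(matches[pos]));
        return true;
    }

    int doc() { return matches[pos]; }

    /** Collects every document that both matches and passes the filter. */
    static List<Integer> collect(int[] matches, BitSet bits) {
        List<Integer> out = new ArrayList<>();
        FilteredDocIterator it = new FilteredDocIterator(matches, bits);
        while (it.next()) {
            out.add(it.doc());
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(3);
        bits.set(4);
        System.out.println(collect(new int[] {0, 1, 2, 4, 7}, bits)); // [1, 4]
    }
}
```

Because filtered-out documents never surface from next(), a caller only ever asks for the score of documents that pass the filter, which is the saving the issue description refers to.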

 [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
 

  Key: LUCENE-330
  URL: http://issues.apache.org/jira/browse/LUCENE-330
  Project: Lucene - Java
 Type: Improvement
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
 Priority: Minor
  Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, 
 FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, 
 SkipFilter.java, SkipFilter.java

 This improves performance of FilteredQuery by not calling score() 
 on documents that do not pass the filter. 
 This passes the current tests for FilteredQuery, but these tests 
 have not been adapted/extended.


[jira] Commented: (LUCENE-330) [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2006-03-05 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-330?page=comments#action_12368953 ] 

Erik Hatcher commented on LUCENE-330:
-

> Could you be more specific?

The new TestFilteredQuery shows the details of the failure; with the patch I 
supplied applied, it produces the stack trace in my last comment.  Those are 
all the specifics I have.

> The patch contains my name as @author, could that be removed?

Sure, no problem.  I was simply being true to the patch you provided earlier 
in this issue, but I'd be happy to remove it if this patch gets committed.


 [PATCH] Use filter bits for next() and skipTo() in FilteredQuery
 

  Key: LUCENE-330
  URL: http://issues.apache.org/jira/browse/LUCENE-330
  Project: Lucene - Java
 Type: Improvement
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
 Priority: Minor
  Attachments: FilteredQuery.java, FilteredQuery.java, FilteredQuery.java, 
 FilteredQuery.java, FilteredQueryPatch1.txt, IndexSearcherPatch2.txt, 
 SkipFilter.java, SkipFilter.java

 This improves performance of FilteredQuery by not calling score() 
 on documents that do not pass the filter. 
 This passes the current tests for FilteredQuery, but these tests 
 have not been adapted/extended.


[jira] Updated: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-01-27 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-413?page=all ]

Erik Hatcher updated LUCENE-413:


Attachment: (was: BooleanScorer2.java)

 [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
 -

  Key: LUCENE-413
  URL: http://issues.apache.org/jira/browse/LUCENE-413
  Project: Lucene - Java
 Type: Bug
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
  Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, 
 DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, 
 NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, 
 SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt, TestSpansPatch1.txt

 From Erik's post at java-dev:

       [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
       [java]     at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
       [java]     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
 ...

 and my answer:

 Probably nrMatchers is increased too often in score() by calling score() 
 more than once.
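
The hypothesis above can be reproduced in a simplified model. The sketch below is an illustration built on assumptions, not the BooleanScorer2 source: a hypothetical CoordModel keeps a coordination table with one slot per possible number of matching clauses and counts a match each time a clause is scored. Scoring the same clauses twice for one document pushes the counter past the end of the table.

```java
// Hypothetical model of the suspected failure; names are illustrative only.
final class CoordModel {
    private final float[] coordFactors; // indexed by number of matching clauses
    private int nrMatchers;             // matches counted for the current document

    CoordModel(int maxCoord) {
        coordFactors = new float[maxCoord + 1];
        for (int i = 0; i <= maxCoord; i++) {
            coordFactors[i] = (float) i / maxCoord; // overlap / maxOverlap
        }
    }

    void initDoc() { nrMatchers = 0; }      // reset before scoring a new document
    void clauseScored() { nrMatchers++; }   // side effect of scoring one clause
    float coordFactor() { return coordFactors[nrMatchers]; }

    /** Scores all clauses of one document the given number of times. */
    static float score(int clauses, int timesScored) {
        CoordModel c = new CoordModel(clauses);
        c.initDoc();
        for (int t = 0; t < timesScored; t++) {
            for (int i = 0; i < clauses; i++) {
                c.clauseScored();
            }
        }
        return c.coordFactor();
    }

    public static void main(String[] args) {
        System.out.println(score(3, 1)); // all three clauses counted once
        try {
            score(3, 2); // counted twice: index 6 in a table of length 4
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

In this model the fix would be either to reset the counter before every scoring pass or to make the counting idempotent per document; which of these the actual patches chose is not stated in this message.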


[jira] Updated: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-01-27 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-413?page=all ]

Erik Hatcher updated LUCENE-413:


Attachment: (was: BooleanScorer2Patch20050721.txt)

 [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
 -

  Key: LUCENE-413
  URL: http://issues.apache.org/jira/browse/LUCENE-413
  Project: Lucene - Java
 Type: Bug
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
  Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, 
 DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, 
 NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, 
 SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt, TestSpansPatch1.txt

 From Erik's post at java-dev:

       [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
       [java]     at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
       [java]     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
 ...

 and my answer:

 Probably nrMatchers is increased too often in score() by calling score() 
 more than once.


[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-01-27 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12364232 ] 

Erik Hatcher commented on LUCENE-413:
-

I ran into one issue after applying all of these patches:

[javac] /Users/erik/dev/lucene/src/java/org/apache/lucene/search/BooleanQuery.java:337: cannot find symbol
[javac] symbol  : constructor BooleanScorer2(org.apache.lucene.search.Similarity,int)
[javac] location: class org.apache.lucene.search.BooleanScorer2
[javac]   BooleanScorer2 result = new BooleanScorer2(similarity,


The code in BooleanQuery was this:

  BooleanScorer2 result = new BooleanScorer2(similarity,
 minNrShouldMatch);

I'm not sure where the mismatch came in.  I removed the 2nd parameter to the 
non-existent BooleanScorer2 constructor to get the compile to work.  What am I 
missing?



 [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
 -

  Key: LUCENE-413
  URL: http://issues.apache.org/jira/browse/LUCENE-413
  Project: Lucene - Java
 Type: Bug
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
  Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, 
 DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, 
 NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, 
 SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt

 From Erik's post at java-dev:

       [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
       [java]     at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
       [java]     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
 ...

 and my answer:

 Probably nrMatchers is increased too often in score() by calling score() 
 more than once.


[jira] Updated: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

2006-01-27 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-413?page=all ]

Erik Hatcher updated LUCENE-413:


Comment: was deleted

 [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
 -

  Key: LUCENE-413
  URL: http://issues.apache.org/jira/browse/LUCENE-413
  Project: Lucene - Java
 Type: Bug
   Components: Search
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: paul.elschot
 Assignee: Lucene Developers
  Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, 
 DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, 
 NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, 
 SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt

 From Erik's post at java-dev:

       [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
       [java]     at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
       [java]     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
 ...

 and my answer:

 Probably nrMatchers is increased too often in score() by calling score() 
 more than once.


[jira] Resolved: (LUCENE-490) JavaCC 4.0 fails to generate QueryParser.java

2006-01-25 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-490?page=all ]
 
Erik Hatcher resolved LUCENE-490:
-

Fix Version: 1.9
 Resolution: Fixed
  Assign To: Erik Hatcher

Patch applied, thanks!

 JavaCC 4.0 fails to generate QueryParser.java
 -

  Key: LUCENE-490
  URL: http://issues.apache.org/jira/browse/LUCENE-490
  Project: Lucene - Java
 Type: Bug
   Components: QueryParser
 Versions: CVS Nightly - Specify date in submission
 Reporter: Steven Rowe
 Assignee: Erik Hatcher
 Priority: Minor
  Fix For: 1.9
  Attachments: QueryParser.jj.patch

 When generating the Java source for QueryParser via the ant task 
 'javacc-QueryParser' against Subversion trunk (updated Jan. 25, 2006), JavaCC 
 4.0 gives the following error:
 javacc-QueryParser:
[javacc] Java Compiler Compiler Version 4.0 (Parser Generator)
[javacc] (type javacc with no arguments for help)
[javacc] Reading from file 
 [...]/src/java/org/apache/lucene/queryParser/QueryParser.jj . . .
[javacc] org.javacc.parser.ParseException: Encountered  at line 754, 
 column 3.
[javacc] Was expecting one of:
[javacc] STRING_LITERAL ...
[javacc]  ...
[javacc] 
[javacc] Detected 1 errors and 0 warnings.
 BUILD FAILED


[jira] Closed: (LUCENE-490) JavaCC 4.0 fails to generate QueryParser.java

2006-01-25 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-490?page=all ]
 
Erik Hatcher closed LUCENE-490:
---


 JavaCC 4.0 fails to generate QueryParser.java
 -

  Key: LUCENE-490
  URL: http://issues.apache.org/jira/browse/LUCENE-490
  Project: Lucene - Java
 Type: Bug
   Components: QueryParser
 Versions: CVS Nightly - Specify date in submission
 Reporter: Steven Rowe
 Assignee: Erik Hatcher
 Priority: Minor
  Fix For: 1.9
  Attachments: QueryParser.jj.patch

 When generating the Java source for QueryParser via the ant task 
 'javacc-QueryParser' against Subversion trunk (updated Jan. 25, 2006), JavaCC 
 4.0 gives the following error:
 javacc-QueryParser:
[javacc] Java Compiler Compiler Version 4.0 (Parser Generator)
[javacc] (type javacc with no arguments for help)
[javacc] Reading from file 
 [...]/src/java/org/apache/lucene/queryParser/QueryParser.jj . . .
[javacc] org.javacc.parser.ParseException: Encountered  at line 754, 
 column 3.
[javacc] Was expecting one of:
[javacc] STRING_LITERAL ...
[javacc]  ...
[javacc] 
[javacc] Detected 1 errors and 0 warnings.
 BUILD FAILED


[jira] Commented: (LUCENE-489) Wildcard Queries with leading *

2006-01-24 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-489?page=comments#action_12363831 ] 

Erik Hatcher commented on LUCENE-489:
-

There are term rotation techniques that allow for efficient wildcard querying.  
For example, the word "cat" can be indexed as "cat", "$cat", "t$ca", and 
"at$c".  For a query of "a*", the search can be rotated to search for "a*".
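
A minimal sketch of the rotation idea follows. It is an illustration under stated assumptions, not Lucene code: the class name TermRotation is hypothetical, an end-of-term marker "$" is assumed to be appended before rotating, and only patterns with exactly one "*" are handled. The pattern is rotated so that the wildcard ends up last, which turns the lookup into an ordinary prefix match against the rotated terms.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of term rotation; not part of Lucene.
final class TermRotation {
    private static final char END = '$'; // assumed end-of-term marker

    /** All rotations of term + '$', e.g. cat -> cat$, at$c, t$ca, $cat. */
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    /**
     * Rewrites a pattern containing exactly one '*' into the prefix to look up
     * among the rotated terms: X*Y becomes Y$X (followed by an implicit '*').
     */
    static String rotatedPrefix(String pattern) {
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("expected exactly one '*': " + pattern);
        }
        return pattern.substring(star + 1) + END + pattern.substring(0, star);
    }

    /** True if any rotation of the term starts with the rotated query prefix. */
    static boolean matches(String term, String pattern) {
        String prefix = rotatedPrefix(pattern);
        for (String rotation : rotations(term)) {
            if (rotation.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(rotations("cat"));      // [cat$, at$c, t$ca, $cat]
        System.out.println(rotatedPrefix("*at"));  // at$
        System.out.println(matches("cat", "*at")); // true
        System.out.println(matches("cat", "*ca")); // false
    }
}
```

The trade-off is index size: every term of length n contributes n + 1 rotated entries, in exchange for answering a leading-wildcard query with a prefix scan.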

 Wildcard Queries with leading *
 -

  Key: LUCENE-489
  URL: http://issues.apache.org/jira/browse/LUCENE-489
  Project: Lucene - Java
 Type: Wish
   Components: QueryParser
 Reporter: Peter Schäfer


 It would be nice to have wildcard queries with a leading wildcard (? or 
 *).
 I'm aware that this is a well-known issue, and I do understand the reasons 
 behind it,
 but try explaining that to our end-users ... :-(


[jira] Commented: (LUCENE-489) Wildcard Queries with leading *

2006-01-24 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-489?page=comments#action_12363833 ] 

Erik Hatcher commented on LUCENE-489:
-

FYI - it would not actually be possible to override getWildcardQuery to reverse 
a "*foo" query term.  The parser rejects "*foo" before getWildcardQuery is ever 
reached, so supporting it would require a change to the .jj grammar.

 Wildcard Queries with leading *
 -

  Key: LUCENE-489
  URL: http://issues.apache.org/jira/browse/LUCENE-489
  Project: Lucene - Java
 Type: Wish
   Components: QueryParser
 Reporter: Peter Schäfer


 It would be nice to have wildcard queries with a leading wildcard (? or 
 *).
 I'm aware that this is a well-known issue, and I do understand the reasons 
 behind it,
 but try explaining that to our end-users ... :-(


[jira] Closed: (LUCENE-477) Build an index which allows me to broswe by category.

2005-12-06 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-477?page=all ]
 
Erik Hatcher closed LUCENE-477:
---

Resolution: Invalid

Yes, please bring this topic to the user list rather than JIRA

 Build an index which allows me to broswe by category.
 -

  Key: LUCENE-477
  URL: http://issues.apache.org/jira/browse/LUCENE-477
  Project: Lucene - Java
 Type: Task
   Components: Index
 Versions: 1.4
  Environment: JDK 1.4, Windows 2003, Tomcat 5.0.28
 Reporter: Mark Dos Santos


 Hello there,
 I have a collection of documents that I am using Lucene to build an index 
 for, and a JSP app to search my documents. This all works great. 
 I believe Lucene is an amazing product, but that's a whole other topic. 
 Anyway, maybe it's my lack of experience in building indexes, but I am having 
 trouble coming up with an index that mimics Verity's parametric 
 index.  My documents all have a category path (I have over 50,000 
 docs).  A document can be at any level of the category path, and the same 
 path can have many different documents. For example, document x has the 
 category path "USA//New Jersey//Trenton//09890" and document y has the 
 category path "USA//New Jersey//Trenton//09890".
 Basically, I would like to build an index using Lucene where, if a search 
 brings back those two documents, I can retrieve the distinct category path 
 for them.  Of course I can loop through the results and build a vector with 
 only the unique paths, but that would obviously take too long when I get, 
 let's say, 1 results from my search.
 So the question, I guess, is: how can I build an index that would provide 
 this functionality?  If anyone has any suggestions I would greatly 
 appreciate it.
 Thanks,
 Mark


[jira] Closed: (LUCENE-324) org.apache.lucene.analysis.cn.ChineseTokenizer missing offset decrement

2005-12-04 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-324?page=all ]
 
Erik Hatcher closed LUCENE-324:
---

Assign To: (was: Erik Hatcher)

 org.apache.lucene.analysis.cn.ChineseTokenizer missing offset decrement
 ---

  Key: LUCENE-324
  URL: http://issues.apache.org/jira/browse/LUCENE-324
  Project: Lucene - Java
 Type: Bug
   Components: Analysis
 Versions: unspecified
  Environment: Operating System: All
 Platform: All
 Reporter: Ray Tsang
 Priority: Trivial
  Fix For: 1.9
  Attachments: ChineseTokenizerTest.java, 
 chinese_tokenizer-missing_offset.patch

 Apparently, in ChineseTokenizer, offset should be decremented like bufferIndex 
 when Character is OTHER_LETTER.  This directly affects the startOffset and 
 endOffset values.
 This is critical for the Highlighter to work correctly, because the Highlighter 
 marks matching text based on these offset values.


[jira] Closed: (LUCENE-288) [patch] better support gcj compilation

2005-12-01 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-288?page=all ]
 
Erik Hatcher closed LUCENE-288:
---


 [patch] better support gcj compilation
 --

  Key: LUCENE-288
  URL: http://issues.apache.org/jira/browse/LUCENE-288
  Project: Lucene - Java
 Type: Bug
   Components: Search
 Versions: 1.4
  Environment: Operating System: All
 Platform: Other
 Reporter: Andi Vajda
  Attachments: 15411.txt

 In order to work around http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15411, the
 attached patch is necessary.


[jira] Commented: (LUCENE-470) Refactoring and slight extension of regex testing code.

2005-11-24 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-470?page=comments#action_12358455 ] 

Erik Hatcher commented on LUCENE-470:
-

Paul - I committed your changes, thanks!   I did have to add String in front 
of the declaration of the FN variable though :)

 Refactoring and slight extension of regex testing code.
 ---

  Key: LUCENE-470
  URL: http://issues.apache.org/jira/browse/LUCENE-470
  Project: Lucene - Java
 Type: Test
   Components: Search
 Versions: CVS Nightly - Specify date in submission
 Reporter: paul.elschot
  Fix For: CVS Nightly - Specify date in submission
  Attachments: TestRegexQuery.java




[jira] Resolved: (LUCENE-461) StandardTokenizer splitting all of Korean words into separate characters

2005-11-12 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-461?page=all ]
 
Erik Hatcher resolved LUCENE-461:
-

Fix Version: 1.9
 Resolution: Fixed

These patches have been applied, thanks! 

There is one thing to note: the token type emitted changes from "<CJK>" to 
"<CJ>".  It is possible that folks have written code that relies on the old 
type, but this token type is brittle because it is derived from the JavaCC 
grammar definition.  I view this as an acceptable break in full backwards 
compatibility because it is unlikely that anyone is using that token type.

 StandardTokenizer splitting all of Korean words into separate characters
 

  Key: LUCENE-461
  URL: http://issues.apache.org/jira/browse/LUCENE-461
  Project: Lucene - Java
 Type: Bug
   Components: Analysis
  Environment: Analyzing Korean text with Apache Lucene, esp. with 
 StandardAnalyzer.
 Reporter: Cheolgoo Kang
 Priority: Minor
  Fix For: 1.9
  Attachments: StandardTokenizer_KoreanWord.patch, 
 TestStandardAnalyzer_KoreanWord.patch

 StandardTokenizer splits all Korean words into separate character 
 tokens. For example, "안녕하세요" is one Korean word that means "Hello", but 
 StandardAnalyzer separates it into five tokens: "안", "녕", "하", "세", "요".


[jira] Closed: (LUCENE-461) StandardTokenizer splitting all of Korean words into separate characters

2005-11-12 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-461?page=all ]
 
Erik Hatcher closed LUCENE-461:
---


 StandardTokenizer splitting all of Korean words into separate characters
 

  Key: LUCENE-461
  URL: http://issues.apache.org/jira/browse/LUCENE-461
  Project: Lucene - Java
 Type: Bug
   Components: Analysis
  Environment: Analyzing Korean text with Apache Lucene, esp. with 
 StandardAnalyzer.
 Reporter: Cheolgoo Kang
 Priority: Minor
  Fix For: 1.9
  Attachments: StandardTokenizer_KoreanWord.patch, 
 TestStandardAnalyzer_KoreanWord.patch

 StandardTokenizer splits all those Korean words into separate character 
 tokens. For example, "안녕하세요" is one Korean word that means "Hello", but 
 StandardAnalyzer separates it into five tokens: "안", "녕", "하", "세", "요".

[jira] Commented: (LUCENE-452) PrefixQuery is missing the equals() method

2005-10-12 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-452?page=comments#action_12331878 ] 

Erik Hatcher commented on LUCENE-452:
-

Thank you for this patch!  I'm in the process of applying it right now.   Your 
use of the boost factor was nice to see, but points out that we have ignored it 
in other .equals methods (e.g. WildcardQuery).  If you're interested, we'd 
accept patches to correct all of the other .equals methods to incorporate the 
boost factor :)

 PrefixQuery is missing the equals() method
 --

  Key: LUCENE-452
  URL: http://issues.apache.org/jira/browse/LUCENE-452
  Project: Lucene - Java
 Type: Improvement
 Versions: 1.9
 Reporter: Guillaume Blain
 Priority: Minor
  Attachments: PrefixQuery.java

 The PrefixQuery inherits java.lang.Object's default equals method. This 
 makes it hard to get tests of PrefixFilter working, or any other task 
 requiring equals to work properly (insertion in a Set, etc.). The equals 
 method should be very similar, if not identical except for class casting, 
 to the equals() of TermQuery. 
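
 For illustration, here is a minimal sketch of an equals()/hashCode() pair 
 that also takes the boost factor into account, as the comment above 
 suggests. PrefixQuerySketch and its fields are hypothetical stand-ins, not 
 the attached PrefixQuery.java or the code that was committed.

```java
// Hypothetical stand-in for a prefix query; NOT the actual Lucene source.
final class PrefixQuerySketch {
    private final String field;   // field the prefix applies to (assumed name)
    private final String prefix;  // prefix text (assumed name)
    private float boost = 1.0f;   // same default as an unboosted query

    PrefixQuerySketch(String field, String prefix) {
        this.field = field;
        this.prefix = prefix;
    }

    void setBoost(float boost) { this.boost = boost; }
    float getBoost() { return boost; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrefixQuerySketch)) return false;
        PrefixQuerySketch other = (PrefixQuerySketch) o;
        // Compare the boost bit-for-bit so equals() stays consistent with hashCode().
        return Float.floatToIntBits(boost) == Float.floatToIntBits(other.boost)
            && field.equals(other.field)
            && prefix.equals(other.prefix);
    }

    @Override
    public int hashCode() {
        int h = Float.floatToIntBits(boost);
        h = 31 * h + field.hashCode();
        h = 31 * h + prefix.hashCode();
        return h;
    }
}
```

 With both methods defined, two queries with the same field, prefix and 
 boost collapse to one entry in a HashSet, which is the kind of use the 
 description mentions.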

[jira] Closed: (LUCENE-444) StandardTokenizer loses Korean characters

2005-10-05 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-444?page=all ]
 
Erik Hatcher closed LUCENE-444:
---


I'm closing this issue... but some unit tests would be nice to go along with 
this too, eventually :)

 StandardTokenizer loses Korean characters
 -

  Key: LUCENE-444
  URL: http://issues.apache.org/jira/browse/LUCENE-444
  Project: Lucene - Java
 Type: Bug
   Components: Analysis
 Reporter: Cheolgoo Kang
 Priority: Minor
  Fix For: 1.9
  Attachments: StandardTokenizer_Korean.patch

 While using StandardAnalyzer, esp. StandardTokenizer, with a Korean text 
 stream, StandardTokenizer ignores the Korean characters. This is because the 
 definition of the CJK token in the StandardTokenizer.jj JavaCC file doesn't 
 cover the range of Korean syllables described in the Unicode character map.
 This patch adds one line, 0xAC00~0xD7AF (the Korean syllables range), to the 
 StandardTokenizer.jj code.
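
 As a rough illustration of the range in question, the check below tests 
 whether a char falls inside 0xAC00~0xD7AF. HangulRangeSketch is a 
 hypothetical helper written for this note; the actual fix is a one-line 
 change to the JavaCC grammar, not Java code like this.

```java
// Hypothetical helper; the real change lives in StandardTokenizer.jj.
final class HangulRangeSketch {
    // Bounds taken from the issue description (0xAC00~0xD7AF).
    static final char FIRST = '\uAC00';
    static final char LAST = '\uD7AF';

    // True when c lies inside the Korean syllables range named in the issue.
    static boolean isKoreanSyllable(char c) {
        return c >= FIRST && c <= LAST;
    }

    // Counts how many chars of the text fall inside that range.
    static int countKoreanSyllables(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isKoreanSyllable(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }
}
```

 A grammar whose CJK token definition lacks this range has no rule that 
 matches such characters, which fits the behaviour reported here.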

[jira] Closed: (LUCENE-429) Little improvement for SimpleHTMLEncoder

2005-09-23 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-429?page=all ]
 
Erik Hatcher closed LUCENE-429:
---

Resolution: Fixed

 Little improvement for SimpleHTMLEncoder
 

  Key: LUCENE-429
  URL: http://issues.apache.org/jira/browse/LUCENE-429
  Project: Lucene - Java
 Type: Improvement
   Components: Examples
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: Stefan Wachter
 Priority: Minor


 The SimpleHTMLEncoder could be improved slightly: all characters with code >=
 128 should be encoded as character entities. The reason is that the encoder
 does not know the encoding that is used for the response. Therefore it is 
 safer to encode all characters beyond ASCII as character entities.
 Here is the necessary modification of SimpleHTMLEncoder:
     default:
       if (c < 128) {
         result.append(c);
       } else {
         result.append("&#").append((int)c).append(";");
       }
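
 To show the proposed default branch in context, here is a small 
 self-contained sketch. HtmlEntitySketch is a hypothetical class written for 
 this note and is not the SimpleHTMLEncoder source; the four named entities 
 it handles are an assumption about what a simple encoder would escape.

```java
// Illustrative sketch only; NOT Lucene's SimpleHTMLEncoder.
final class HtmlEntitySketch {
    static String encode(String text) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"': result.append("&quot;"); break;
                case '&': result.append("&amp;"); break;
                case '<': result.append("&lt;"); break;
                case '>': result.append("&gt;"); break;
                default:
                    if (c < 128) {
                        // Plain ASCII is safe in any ASCII-compatible response encoding.
                        result.append(c);
                    } else {
                        // Everything else becomes a decimal character reference.
                        result.append("&#").append((int) c).append(";");
                    }
            }
        }
        return result.toString();
    }
}
```

 Like the snippet in the description, this works one UTF-16 char at a time, 
 so characters outside the Basic Multilingual Plane would need code point 
 iteration to produce a single valid reference.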

[jira] Resolved: (LUCENE-437) SnowballFilter loses token position offset

2005-09-22 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-437?page=all ]
 
Erik Hatcher resolved LUCENE-437:
-

Fix Version: unspecified
 Resolution: Fixed

Yonik - thanks for the patch!  It has been applied.

 SnowballFilter loses token position offset
 --

  Key: LUCENE-437
  URL: http://issues.apache.org/jira/browse/LUCENE-437
  Project: Lucene - Java
 Type: Bug
   Components: Analysis
 Versions: CVS Nightly - Specify date in submission
 Reporter: Yonik Seeley
 Assignee: Erik Hatcher
  Fix For: unspecified
  Attachments: yonik_snowballfix.txt

 SnowballFilter doesn't set the token position increment (and thus it defaults 
 to 1).
 This also affects SnowballAnalyzer since it uses SnowballFilter.
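
 The general shape of the problem can be sketched as follows. TokenSketch 
 and StemFilterSketch are hypothetical classes made up for this note; they 
 are not the Lucene Token API or the attached patch. The point is only that 
 a filter which builds a new token has to carry the increment over 
 explicitly, otherwise the default of 1 applies.

```java
// Hypothetical minimal token; NOT org.apache.lucene.analysis.Token.
final class TokenSketch {
    final String text;
    final int startOffset;
    final int endOffset;
    final int positionIncrement;

    // Convenience constructor: the increment defaults to 1, as in the issue.
    TokenSketch(String text, int startOffset, int endOffset) {
        this(text, startOffset, endOffset, 1);
    }

    TokenSketch(String text, int startOffset, int endOffset, int positionIncrement) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = positionIncrement;
    }
}

// Hypothetical filter step that swaps in stemmed text.
final class StemFilterSketch {
    // Lossy variant: the new token silently falls back to increment 1.
    static TokenSketch replaceTextLossy(TokenSketch in, String stemmed) {
        return new TokenSketch(stemmed, in.startOffset, in.endOffset);
    }

    // Preserving variant: the increment of the incoming token is copied.
    static TokenSketch replaceText(TokenSketch in, String stemmed) {
        return new TokenSketch(stemmed, in.startOffset, in.endOffset, in.positionIncrement);
    }
}
```

 A token that arrives with an increment of 0 (for example one stacked on 
 the previous position) keeps that value only in the preserving variant.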

[jira] Updated: (LUCENE-429) Little improvement for SimpleHTMLEncoder

2005-09-22 Thread Erik Hatcher (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-429?page=all ]

Erik Hatcher updated LUCENE-429:


Bugzilla Id:   (was: 36333)
  Component: Examples
 (was: Other)
Description: 
The SimpleHTMLEncoder could be improved slightly: all characters with code >=
128 should be encoded as character entities. The reason is that the encoder
does not know the encoding that is used for the response. Therefore it is safer
to encode all characters beyond ASCII as character entities.

Here is the necessary modification of SimpleHTMLEncoder:

   default:
     if (c < 128) {
       result.append(c);
     } else {
       result.append("&#").append((int)c).append(";");
     }

  was:
The SimpleHTMLEncoder could be improved slightly: all characters with code >=
128 should be encoded as character entities. The reason is that the encoder
does not know the encoding that is used for the response. Therefore it is safer
to encode all characters beyond ASCII as character entities.

Here is the necessary modification of SimpleHTMLEncoder:

   default:
     if (c < 128) {
       result.append(c);
     } else {
       result.append("&#").append((int)c).append(";");
     }

Environment: 
Operating System: other
Platform: Other

  was:
Operating System: other
Platform: Other

  Assign To: (was: Lucene Developers)

 Little improvement for SimpleHTMLEncoder
 

  Key: LUCENE-429
  URL: http://issues.apache.org/jira/browse/LUCENE-429
  Project: Lucene - Java
 Type: Improvement
   Components: Examples
 Versions: CVS Nightly - Specify date in submission
  Environment: Operating System: other
 Platform: Other
 Reporter: Stefan Wachter
 Priority: Minor


 The SimpleHTMLEncoder could be improved slightly: all characters with code >=
 128 should be encoded as character entities. The reason is that the encoder
 does not know the encoding that is used for the response. Therefore it is 
 safer to encode all characters beyond ASCII as character entities.
 Here is the necessary modification of SimpleHTMLEncoder:
     default:
       if (c < 128) {
         result.append(c);
       } else {
         result.append("&#").append((int)c).append(";");
       }

[jira] Commented: (LUCENE-438) add Token.setTermText(), remove final

2005-09-22 Thread Erik Hatcher (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-438?page=comments#action_12330239 ] 

Erik Hatcher commented on LUCENE-438:
-

Yes, please elaborate on why you need to subclass Token.

 add Token.setTermText(), remove final
 -

  Key: LUCENE-438
  URL: http://issues.apache.org/jira/browse/LUCENE-438
  Project: Lucene - Java
 Type: Improvement
 Versions: CVS Nightly - Specify date in submission
 Reporter: Yonik Seeley
 Priority: Minor
  Attachments: yonik_Token.txt

 The Token class should be more friendly to classes not in its package:
   1) add setTermText()
   2) remove final from class and toString()
   3) add clone()
 Support for (1):
   TokenFilters in the same package as Token are able to do things like 
t.termText = t.termText.toLowerCase(); which is more efficient, but more 
 importantly less error prone.  Without the ability to change *only* the term 
 text, a new Token must be created, and one must remember to set all the 
 properties correctly.  This exact issue caused this bug:
 http://issues.apache.org/jira/browse/LUCENE-437
 Support for (2):
   Removing final allows one to subclass Token.  I didn't see any performance 
 impact after removing final.
 I can go into more detail on why I want to subclass Token if anyone is 
 interested.
 Support for (3):
   - support for a synonym TokenFilter, where one needs to make two tokens 
 from one (same args that support (1), and esp important if instance is a 
 subclass of Token).
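
 Points (1) and (3) can be sketched roughly as below. MutableTokenSketch is 
 a hypothetical class made up for this note, not the attached 
 yonik_Token.txt patch and not the Lucene Token API; the field names are 
 assumptions.

```java
// Hypothetical token with a text setter and clone(); NOT Lucene's Token.
class MutableTokenSketch implements Cloneable {
    private String termText;
    private final int startOffset;
    private final int endOffset;
    private int positionIncrement = 1;

    MutableTokenSketch(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // (1) Change only the term text; every other property stays as it was.
    void setTermText(String termText) { this.termText = termText; }
    String termText() { return termText; }

    void setPositionIncrement(int inc) { this.positionIncrement = inc; }
    int getPositionIncrement() { return positionIncrement; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }

    // (3) Copy the token, including fields added by any subclass.
    @Override
    public MutableTokenSketch clone() {
        try {
            return (MutableTokenSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: class is Cloneable
        }
    }
}
```

 A synonym-style filter could then clone the incoming token, call 
 setTermText() on the copy and set its position increment to 0, without 
 having to remember which other properties need copying.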
