Hudson build is back to normal: Lucene-trunk #904

2009-07-29 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/904/changes



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Build failed in Hudson: Lucene-trunk #902

2009-07-29 Thread Uwe Schindler
This seems to be fixed now. But there is something completely wrong with
Clover:

If you look at the Clover reports, a lot of classes show 0% code
coverage even though tests exist for them (e.g. my new NumericRange
code). Also, *all* contribs show 0%.

After thinking about it a little, it seems that the coverage report is
built not from the normal test run but from the results of the test-tag
run. This would explain why NumericRange and Spatial appear to have no
tests as far as Clover is concerned.

Does anybody know how to fix this? Maybe coverage should be disabled
for the test run in test-tag? What can be changed in build.xml to do this?

I do not have Clover installed locally, so I cannot try this out.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Tuesday, July 28, 2009 12:13 PM
 To: java-dev@lucene.apache.org
 Subject: Re: Build failed in Hudson: Lucene-trunk #902
 
 Hmm... the build looks like it failed because of some odd clover
 licensing issue:
 
[clover] Sorry, you are not licensed to instrument files in the package
 ''.
 
 Anyone have any ideas?
 
 Mike
 
 On Mon, Jul 27, 2009 at 11:26 PM, Apache Hudson
 Server <hud...@hudson.zones.apache.org> wrote:
  See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/902/changes
 
  Changes:
 
  [uschindler] LUCENE-1754: JavaDoc updates
 
  [mikemccand] LUCENE-1754: EMPTY_DOCIDSET subclasses DocIdSet directly
 
  [mikemccand] LUCENE-1754: just use EMPTY_DOCIDSET.iterator() instead of
 new EmptyDocIdSetIterator
 
  [mikemccand] LUCENE-1595: don't use SortField.AUTO; deprecate
 LineDocMaker & EnwikiDocMaker
 
  [mikemccand] LUCENE-1754: add EmptyDocIdSetIterator
 
  [mikemccand] LUCENE-1754: update back-compat test
 
  [mikemccand] LUCENE-1754: BooleanQuery detects up front if it won't
 match any docs and returns null from its scorer() instead of
 NonMatchingScorer
 
  --
  [...truncated 21062 lines...]
   [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
   [javadoc] 1 error
   [javadoc] 32 warnings
       [jar] Building jar:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/contrib/spatial/lucene-spatial-2009-07-28_02-04-46-
 javadoc.jar
      [echo] Building spellchecker...
 
  javadocs:
   [javadoc] Generating Javadoc
   [javadoc] Javadoc execution
   [javadoc] Loading source files for package
 org.apache.lucene.search.spell...
   [javadoc] Constructing Javadoc information...
   [javadoc] javadoc: warning - Error reading file:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-spellchecker/../package-list
   [javadoc] Standard Doclet version 1.5.0_14
   [javadoc] Building tree for all the packages and classes...
   [javadoc] Building index for all the packages and classes...
   [javadoc] Building index for all classes...
   [javadoc] javadoc: error - Error while reading file
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/contrib/spellchecker/src/java/overview.html
   [javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-spellchecker/stylesheet.css...
   [javadoc] Note: Custom tags that could override future standard tags:
 @todo. To avoid potential overrides, use at least one period character
 (.) in custom tag names.
   [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
   [javadoc] 1 error
   [javadoc] 1 warning
       [jar] Building jar:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/contrib/spellchecker/lucene-spellchecker-2009-07-
 28_02-04-46-javadoc.jar
      [echo] Building surround...
 
  javadocs:
   [javadoc] Generating Javadoc
   [javadoc] Javadoc execution
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.surround.parser...
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.surround.query...
   [javadoc] Constructing Javadoc information...
   [javadoc] javadoc: warning - Error reading file:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-surround/../package-list
   [javadoc] Standard Doclet version 1.5.0_14
   [javadoc] Building tree for all the packages and classes...
   [javadoc] Building index for all the packages and classes...
   [javadoc] Building index for all classes...
   [javadoc] javadoc: error - Error while reading file
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/contrib/surround/src/java/overview.html
   [javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-surround/stylesheet.css...
   [javadoc] Note: Custom tags that could override future standard tags:
 @todo. To avoid potential overrides, use at least one period character
 (.) in 

[jira] Created: (LUCENE-1765) incorrect doc description of fielded query syntax

2009-07-29 Thread solrize (JIRA)
incorrect doc description of fielded query syntax
-

 Key: LUCENE-1765
 URL: https://issues.apache.org/jira/browse/LUCENE-1765
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.4.1
 Environment: lucene.apache.org docs
Reporter: solrize
Priority: Minor


http://lucene.apache.org/java/2_4_1/queryparsersyntax.html#Fields says:

  You can search any field by typing the field name followed by a colon : and 
then the term you are looking for. 

This is slightly incomplete, since what follows the field name can be a more 
complex query, not necessarily a single term.  For example, 

title:(do it right)

seemed to work when I tried it.  It would be good if the doc were updated to 
describe the syntax exactly.

Also, documentation should be one of the components selectable in bug reports.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-1690) Morelikethis queries are very slow compared to other search types

2009-07-29 Thread Richard Marr (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736525#action_12736525
 ] 

Richard Marr commented on LUCENE-1690:
--

There's also another problem I've just noticed. Please ignore the latest patch.

 Morelikethis queries are very slow compared to other search types
 -

 Key: LUCENE-1690
 URL: https://issues.apache.org/jira/browse/LUCENE-1690
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.4.1
Reporter: Richard Marr
Priority: Minor
 Attachments: LruCache.patch, LUCENE-1690.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 The MoreLikeThis object performs term frequency lookups for every query.  
 From my testing that's what seems to take up the majority of time for 
 MoreLikeThis searches.  
 For some (I'd venture many) applications it's not necessary for term 
 statistics to be looked up every time. A fairly naive opt-in caching 
 mechanism tied to the life of the MoreLikeThis object would allow 
 applications to cache term statistics for the duration that suits them.
 I've got this working in my test code. I'll put together a patch file when I 
 get a minute. From my testing this can improve performance by a factor of 
 around 10.
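
 As an illustration of the opt-in caching idea described above, here is a
 minimal, self-contained sketch. It is not the attached LruCache.patch and
 it does not use any Lucene API: the class name TermStatsCache, the
 docFreq method, and the lookup function standing in for the expensive
 index lookup are all hypothetical. It only shows how a small LRU cache
 whose lifetime is tied to its owning object can avoid repeated lookups
 for the same term.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical opt-in LRU cache of per-term document frequencies (not Lucene API). */
class TermStatsCache {
    private final Map<String, Integer> cache;
    // Stands in for the expensive per-term index lookup (an assumption of this sketch).
    private final ToIntFunction<String> lookup;
    private int misses = 0;

    TermStatsCache(final int maxEntries, ToIntFunction<String> lookup) {
        this.lookup = lookup;
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the least recently used term.
        this.cache = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached statistic, calling the expensive lookup only on a miss. */
    int docFreq(String term) {
        Integer cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        misses++;
        int df = lookup.applyAsInt(term);
        cache.put(term, df);
        return df;
    }

    /** Number of times the underlying lookup was actually invoked. */
    int misses() {
        return misses;
    }
}
```

 Because the cache lives and dies with the object that owns it, an
 application decides how stale its term statistics may become simply by
 choosing how long it keeps that object around.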




Re: Build failed in Hudson: Lucene-trunk #902

2009-07-29 Thread Michael McCandless
I'm guessing it was the empty source file I accidentally left in for
LUCENE-1754, that Hoss removed (thanks!). I think clover saw that as
an attempt to instrument a source in the empty-string package.

I'm unfamiliar w/ how to configure clover, but I agree we should make
sure it's testing coverage for our normal unit tests.  Rather than
turn it off for test-tag, can we measure coverage of all tests
(test-tag, test-core, test-contrib)?

Is there someone familiar w/ clover who can look into this?

Mike


[jira] Created: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)
Add Thread-Safety note to IndexWriter JavaDoc
-

 Key: LUCENE-1766
 URL: https://issues.apache.org/jira/browse/LUCENE-1766
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 2.9


IndexWriter Javadocs should contain a note about thread-safety. This is already 
mentioned on the wiki FAQ page, but such essential information should be part 
of the module documentation too.





[jira] Assigned: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1766:
--

Assignee: Michael McCandless




[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1766:
---

Attachment: LUCENE-1766.patch

Tweaked the wording... Simon if this looks OK to you I'll commit shortly!




[jira] Commented: (LUCENE-1758) improve arabic analyzer: light8 -> light10

2009-07-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736548#action_12736548
 ] 

Michael McCandless commented on LUCENE-1758:


bq. perhaps both this and LUCENE-1628 should include LowerCaseFilter.

That seems reasonable?

 improve arabic analyzer: light8 -> light10
 --

 Key: LUCENE-1758
 URL: https://issues.apache.org/jira/browse/LUCENE-1758
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1758.patch, LUCENE-1758.txt


 Someone mentioned on the java user list that the arabic analysis was not as 
 good as they would like.
 This patch adds the لل- prefix (light10 algorithm versus light8 algorithm).
 In the light10 paper, this improves precision from .390 to .413
 They mention this is not statistically significant, but it makes linguistic 
 sense and at least has been shown not to hurt.
 In the future, I hope openrelevance will allow us to try some more 
 approaches. 




[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736551#action_12736551
 ] 

Simon Willnauer commented on LUCENE-1766:
-

looks good to me.




[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736556#action_12736556
 ] 

Uwe Schindler commented on LUCENE-1766:
---

By the way: Do we have a TS note for IndexReader?




[jira] Resolved: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1766.


Resolution: Fixed

OK thanks Simon!




[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736557#action_12736557
 ] 

Simon Willnauer commented on LUCENE-1766:
-

We don't afaik.




[jira] Reopened: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-1766:



I'll add to IndexReader & IndexSearcher as well.




[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736614#action_12736614
 ] 

Michael McCandless commented on LUCENE-1763:


How about we:

  * Simply change the methods.  Yes it's technically a break in back-compat, 
but since they are package private, and so advanced (I think very few people 
have customized their merge policy/scheduler), a compile time error on upgrade 
seems fine.

  * Make the APIs public (perhaps add a unit test, outside of oal.index 
package, asserting that all that's required is in fact public)

  * Mark the APIs as subject to change.

 MergePolicy should require an IndexWriter upon construction
 ---

 Key: LUCENE-1763
 URL: https://issues.apache.org/jira/browse/LUCENE-1763
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9


 MergePolicy does not require an IW upon construction, but requires one to be 
 passed as method arg to various methods. This gives the impression as if a 
 single MP instance can be shared across various IW instances, which is not 
 true for all MPs (if at all). In addition, LogMergePolicy uses the IW 
 instance passed to these methods inconsistently, and is currently exposed to 
 potential NPEs.
 This issue will change MP to require an IW instance, however for back-compat 
 reasons the following changes will be made:
 # A new MP ctor w/ IW as arg will be introduced. Additionally, for 
 back-compat a default ctor will also be declared which will assign null to 
 the member IW.
 # Methods that require IW will be deprecated, and new ones will be declared.
 #* For back-compat, the new ones will not be made abstract, but will throw 
 UOE, with a comment that they will become abstract in 3.0.
 # All current MP impls will move to use the member instance.
 # The code which calls MP methods will continue to use the deprecated 
 methods, passing an IW even though it won't be necessary -- this is strictly 
 for back-compat.
 In 3.0, we'll remove the deprecated default ctor and methods, and change the 
 code to not call the IW method variants anymore.
 I hope that I didn't leave anything out. I'm sure I'll find out when I work 
 on the patch :).
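
 The core idea in the description above (a policy bound to one writer at
 construction time, with methods that no longer take the writer as an
 argument) can be sketched as follows. This is only an illustration under
 stated assumptions: SketchWriter, SketchMergePolicy, SegmentCountPolicy
 and shouldMerge are hypothetical names invented for the sketch, not the
 actual MergePolicy or IndexWriter API and not the eventual patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for IndexWriter; it only tracks segment sizes. */
class SketchWriter {
    final List<Integer> segmentSizes = new ArrayList<>();
}

/** Hypothetical policy that is bound to exactly one writer at construction. */
abstract class SketchMergePolicy {
    protected final SketchWriter writer;

    protected SketchMergePolicy(SketchWriter writer) {
        // Rejecting null up front avoids NPEs later inside the policy methods.
        if (writer == null) {
            throw new IllegalArgumentException("writer must not be null");
        }
        this.writer = writer;
    }

    /** No writer argument: the policy always consults the writer it was built with. */
    abstract boolean shouldMerge();
}

/** Hypothetical concrete policy: merge once the segment count exceeds a limit. */
class SegmentCountPolicy extends SketchMergePolicy {
    private final int maxSegments;

    SegmentCountPolicy(SketchWriter writer, int maxSegments) {
        super(writer);
        this.maxSegments = maxSegments;
    }

    @Override
    boolean shouldMerge() {
        return writer.segmentSizes.size() > maxSegments;
    }
}
```

 With this shape, a single policy instance cannot accidentally be shared
 across writers, because the association is fixed when the policy is
 created rather than chosen again on every call.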




[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1766:
---

Attachment: LUCENE-1766.patch

IndexReader & IndexSearcher as well.




[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736617#action_12736617
 ] 

Shai Erera commented on LUCENE-1763:


I don't mind doing that ... but note that LMP's methods are public (it 
overrides them and declares them public), so I was thinking that someone could 
potentially have written their own LMP (no one can write their own MP today). But 
if you're fine w/ me doing that, it's fine by me as well.

BTW - I don't need to come up w/ new names after all, since adding the 
same method w/o the IW arg already changes its signature. But I agree that having just 
the right form makes more sense.




[jira] Updated: (LUCENE-1752) incorrect snippet returned with SpanScorer

2009-07-29 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-1752:
---

Fix Version/s: 2.9

I'd like to set the fix version to 2.9. With the patch, the highlighter works 
perfectly in our production environment.

 incorrect snippet returned with SpanScorer
 --

 Key: LUCENE-1752
 URL: https://issues.apache.org/jira/browse/LUCENE-1752
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9
Reporter: Koji Sekiguchi
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1752.patch


 This problem was reported by my customer. They are using Solr 1.3 and 
 uni-gram, but it can be reproduced with Lucene 2.9 and WhitespaceAnalyzer.
 {panel:title=Query}
 (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
 {panel}
 The snippet we expected is:
 {panel}
 x y z <B>a</B> <B>b</B> <B>c</B> <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
 {panel}
 but we got:
 {panel}
 x y z <B>a</B> b c <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
 {panel}
 Program to reproduce the problem:
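 {code}
 import java.io.StringReader;
 import org.apache.lucene.analysis.Analyzer;
 import org.apache.lucene.analysis.CachingTokenFilter;
 import org.apache.lucene.analysis.WhitespaceAnalyzer;
 import org.apache.lucene.queryParser.QueryParser;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.search.highlight.Highlighter;
 import org.apache.lucene.search.highlight.Scorer;
 import org.apache.lucene.search.highlight.SpanScorer;

 public class TestHighlighter {
   static final String CONTENT = "x y z a b c d e f g b c g";
   // Phrase queries: the escaped quotes are part of the query string.
   static final String PH1 = "\"a b c d\"";
   static final String PH2 = "\"b c g\"";
   static final String F1 = "f1";
   static final String F2 = "f2";
   static final String F1C = F1 + ":";
   static final String F2C = F2 + ":";
   static final String QUERY_STRING =
     "(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
     + F1C + PH2 + " OR " + F2C + PH2 + ")";
   static Analyzer analyzer = new WhitespaceAnalyzer();

   public static void main(String[] args) throws Exception {
     QueryParser qp = new QueryParser( F1, analyzer );
     Query query = qp.parse( QUERY_STRING );
     CachingTokenFilter stream = new CachingTokenFilter(
       analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
     Scorer scorer = new SpanScorer( query, F1, stream, false );
     Highlighter h = new Highlighter( scorer );
     System.out.println( "query : " + QUERY_STRING );
     System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
   }
 }
 {code}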

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1766:


Attachment: LUCENE-1766.patch

Added a small but important fact about the synchronization object.
Everything else looks good to me!

 Add Thread-Safety note to IndexWriter JavaDoc
 -

 Key: LUCENE-1766
 URL: https://issues.apache.org/jira/browse/LUCENE-1766
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
 LUCENE-1766.patch


 IndexWriter Javadocs should contain a note about thread-safety. This is 
 already mentioned on the wiki FAQ page, but such essential information 
 should be part of the module documentation too.




[jira] Commented: (LUCENE-1752) incorrect snippet returned with SpanScorer

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736629#action_12736629
 ] 

Mark Miller commented on LUCENE-1752:
-

Thanks Koji - I had forgotten about this one. I'll commit it in a bit.

 incorrect snippet returned with SpanScorer
 --

 Key: LUCENE-1752
 URL: https://issues.apache.org/jira/browse/LUCENE-1752
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9
Reporter: Koji Sekiguchi
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1752.patch






[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2009-07-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736634#action_12736634
 ] 

Robert Muir commented on LUCENE-1460:
-

Michael, sorry to leave it incomplete, I think I am not the best for the 
remaining ones.

For example I am a little intimidated by things such as this note in 
ShingleMatrix: 
{code}
  * This method exists in order to avoid recursive calls to the method
  * as the complexity of a fairly small matrix then easily would require
  * a gigabyte sized stack per thread.
{code}
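
The note quoted above is about trading recursion for iteration. As a rough illustration only (this is not ShingleMatrix code; the class and method names are hypothetical), an index array advanced like an odometer can enumerate every combination of one entry per column, so the call stack stays at a constant depth no matter how many columns there are:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not taken from ShingleMatrix.
class ColumnCombinations {
  /** Returns every way of picking one entry from each column, without recursion. */
  static List<List<String>> enumerate(List<List<String>> columns) {
    List<List<String>> result = new ArrayList<>();
    for (List<String> column : columns) {
      if (column.isEmpty()) {
        return result; // an empty column means no combination exists
      }
    }
    int[] index = new int[columns.size()]; // current pick per column
    while (true) {
      List<String> pick = new ArrayList<>();
      for (int c = 0; c < columns.size(); c++) {
        pick.add(columns.get(c).get(index[c]));
      }
      result.add(pick);
      // Advance like an odometer: bump the last column, carry leftwards.
      int c = columns.size() - 1;
      while (c >= 0 && ++index[c] == columns.get(c).size()) {
        index[c] = 0;
        c--;
      }
      if (c < 0) {
        return result; // every column wrapped around: done
      }
    }
  }
}
```

The state that recursion would keep on the stack lives in the small index array instead, which is the general idea behind the quoted comment.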


 Change all contrib TokenStreams/Filters to use the new TokenStream API
 --

 Key: LUCENE-1460
 URL: https://issues.apache.org/jira/browse/LUCENE-1460
 Project: Lucene - Java
  Issue Type: Task
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1460.patch, lucene-1460.patch, lucene-1460.patch, 
 lucene-1460.patch, LUCENE-1460_contrib_partial.txt, 
 LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, 
 LUCENE-1460_core.txt, LUCENE-1460_partial.txt


 Now that we have the new TokenStream API (LUCENE-1422) we should change all 
 contrib modules to use it.




[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736643#action_12736643
 ] 

Michael McCandless commented on LUCENE-1763:


I think subclassing LMP is also extremely advanced, ie, it's OK to make an 
exception to our back-compat policy.

 MergePolicy should require an IndexWriter upon construction
 ---

 Key: LUCENE-1763
 URL: https://issues.apache.org/jira/browse/LUCENE-1763
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9


 MergePolicy does not require an IW upon construction, but requires one to be 
 passed as method arg to various methods. This gives the impression as if a 
 single MP instance can be shared across various IW instances, which is not 
 true for all MPs (if at all). In addition, LogMergePolicy uses the IW 
 instance passed to these methods inconsistently, and is currently exposed to 
 potential NPEs.
 This issue will change MP to require an IW instance, however for back-compat 
 reasons the following changes will be made:
 # A new MP ctor w/ IW as arg will be introduced. Additionally, for 
 back-compat a default ctor will also be declared which will assign null to 
 the member IW.
 # Methods that require IW will be deprecated, and new ones will be declared.
 #* For back-compat, the new ones will not be made abstract, but will throw 
 UOE, with a comment that they will become abstract in 3.0.
 # All current MP impls will move to use the member instance.
 # The code which calls MP methods will continue to use the deprecated 
 methods, passing an IW even though it won't be necessary -- this is strictly 
 for back-compat.
 In 3.0, we'll remove the deprecated default ctor and methods, and change the 
 code to not call the IW method variants anymore.
 I hope that I didn't leave anything out. I'm sure I'll find out when I work 
 on the patch :).
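
The back-compat plan in the list above can be sketched roughly as follows. This is only an illustration of the pattern: StubWriter, StubMergePolicy and findMerges are hypothetical stand-ins, not the signatures in the actual patch.

```java
// Hypothetical stand-ins that mirror the shape of the plan, not Lucene's API.
class StubWriter {}

abstract class StubMergePolicy {
  protected final StubWriter writer;

  /** New constructor: the policy is bound to exactly one writer. */
  protected StubMergePolicy(StubWriter writer) {
    this.writer = writer;
  }

  /** @deprecated kept for back-compat only; leaves the member writer null. */
  @Deprecated
  protected StubMergePolicy() {
    this(null);
  }

  /** @deprecated old variant that receives the writer as an argument. */
  @Deprecated
  abstract String findMerges(StubWriter w);

  /** New variant; not abstract yet so that existing subclasses still compile. */
  String findMerges() {
    throw new UnsupportedOperationException("will become abstract in 3.0");
  }
}

/** A subclass written before the change: it compiles unchanged. */
class LegacyPolicy extends StubMergePolicy {
  @Override
  String findMerges(StubWriter w) {
    return "legacy";
  }
}

/** An updated subclass that uses the member instance. */
class BoundPolicy extends StubMergePolicy {
  BoundPolicy(StubWriter writer) {
    super(writer);
  }

  @Override
  String findMerges(StubWriter w) {
    return findMerges(); // the deprecated variant delegates to the new one
  }

  @Override
  String findMerges() {
    return writer == null ? "unbound" : "bound";
  }
}
```

Calling the new no-arg variant on LegacyPolicy throws UnsupportedOperationException, which is the runtime price of not making the new methods abstract before 3.0.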




[jira] Commented: (LUCENE-1695) Update the Highlighter to use the new TokenStream API

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736646#action_12736646
 ] 

Mark Miller commented on LUCENE-1695:
-

So without further objection, I'm going to commit this so that I can finish the 
'make spanscorer the default' issue.

 Update the Highlighter to use the new TokenStream API
 -

 Key: LUCENE-1695
 URL: https://issues.apache.org/jira/browse/LUCENE-1695
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 2.9

 Attachments: LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch, 
 LUCENE-1695.patch







Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera
Hi

I think such methods are useful for a Lucene app that needs to roll back a
single document delete. Today, IndexReader offers undeleteAll(), which is a
bit extreme. There are two scenarios for this that I know of:
1) (recently showed up on the user list) I'd like to synchronize documents
on disk and in the index. So if I have a document in the index which I want
to delete, and also a file on the file system (corresponding to an ID or
something), and the file delete fails, I may want to undelete that document.
This has alternatives, but an undeleteDocument would still be useful in this
case.

2) ParallelReader allows one to add a document to two indexes, some fields
to one index and others to the second index, and then read those indexes in
parallel. Such applications will need to delete documents sometimes, and an
undeleteDocument will be useful if a transactional delete is needed: i.e.,
if the first delete succeeds and the second fails, undo the first delete.

3) ParallelReader doesn't support deleteDocument well currently - i.e., if
one of the deletes fails, some readers will be left w/ the document and some
won't (I think this is a bug).
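
To make scenario 2 concrete, here is a minimal sketch of a delete across several indexes that undoes itself on failure. Everything in it is an assumption for illustration: DeletableIndex and its undelete method model the proposed API, and InMemoryIndex is a toy stand-in, not an IndexReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface: undelete(String) models the proposed method.
interface DeletableIndex {
  void delete(String id) throws Exception;
  void undelete(String id) throws Exception;
}

// Toy in-memory stand-in used only to exercise the sketch.
class InMemoryIndex implements DeletableIndex {
  final Set<String> live = new HashSet<>();
  final Set<String> deleted = new HashSet<>();
  boolean failOnDelete;

  InMemoryIndex(String... ids) {
    live.addAll(Arrays.asList(ids));
  }

  @Override
  public void delete(String id) {
    if (failOnDelete) {
      throw new IllegalStateException("simulated delete failure");
    }
    if (live.remove(id)) {
      deleted.add(id);
    }
  }

  @Override
  public void undelete(String id) {
    if (deleted.remove(id)) {
      live.add(id);
    }
  }
}

class TransactionalDelete {
  /** Deletes id from every index, or undoes the deletes already applied and rethrows. */
  static void deleteFromAll(List<DeletableIndex> indexes, String id) throws Exception {
    List<DeletableIndex> done = new ArrayList<>();
    try {
      for (DeletableIndex index : indexes) {
        index.delete(id);
        done.add(index);
      }
    } catch (Exception failure) {
      // Compensate in reverse order so that all indexes agree again.
      for (int i = done.size() - 1; i >= 0; i--) {
        done.get(i).undelete(id);
      }
      throw failure;
    }
  }
}
```

With undeleteAll() as the only tool, the compensation step would also revive unrelated deletes, which is why a per-document undelete is being asked for.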

What do you think?

Shai


[jira] Updated: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1763:
---

Attachment: LUCENE-1763.patch

Adds a ctor w/ IndexWriter to MergePolicy, LogMergePolicy, and its extensions.
Fixed tests and IndexWriter code
Fixed tags

All tests pass

 MergePolicy should require an IndexWriter upon construction
 ---

 Key: LUCENE-1763
 URL: https://issues.apache.org/jira/browse/LUCENE-1763
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1763.patch






[jira] Updated: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1749:
-

Attachment: LUCENE-1749.patch

checkpoint: refactored the sanity checking code into a utility class and wrote 
tests specifically for it to prove it finds insane stuff.

TODO:
* clean up the api, make it less clunky (and not static)
** return structured data showing exactly which combinations in FieldCache are 
insane
* javadocs
* figure out why previously mentioned tests are breaking (need help with this 
one ... don't know enough about the code these tests exercise)

 FieldCache introspection API
 

 Key: LUCENE-1749
 URL: https://issues.apache.org/jira/browse/LUCENE-1749
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
Priority: Minor
 Fix For: 2.9

 Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
 LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch


 FieldCache should expose an Expert level API for runtime introspection of the 
 FieldCache to provide info about what is in the FieldCache at any given 
 moment.  We should also provide utility methods for sanity checking that the 
 FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...
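
As a rough sketch of the first check in the list above (same reader/field cached under different types), assuming a simplified, hypothetical CacheEntry model rather than the real FieldCache internals:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of a cache entry; the real FieldCache internals differ.
class CacheEntry {
  final String readerId;
  final String field;
  final String type;

  CacheEntry(String readerId, String field, String type) {
    this.readerId = readerId;
    this.field = field;
    this.type = type;
  }
}

class CacheSanity {
  /** Returns the "reader/field" keys that are cached under more than one type. */
  static Set<String> conflictingTypes(List<CacheEntry> entries) {
    Map<String, Set<String>> typesByKey = new TreeMap<>();
    for (CacheEntry e : entries) {
      String key = e.readerId + "/" + e.field;
      typesByKey.computeIfAbsent(key, k -> new TreeSet<>()).add(e.type);
    }
    Set<String> insane = new TreeSet<>();
    for (Map.Entry<String, Set<String>> byKey : typesByKey.entrySet()) {
      if (byKey.getValue().size() > 1) {
        insane.add(byKey.getKey());
      }
    }
    return insane;
  }
}
```

The reader/subreader check would need the reader hierarchy as extra input, so it is left out of this sketch.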




[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736662#action_12736662
 ] 

Uwe Schindler commented on LUCENE-1567:
---

Just a question: Will it be possible in the future to specify some type of 
schema for the query parser, so that it automatically creates a 
NumericRangeQuery for the different numeric types? One could then index a 
numeric value (double, float, long, int) using NumericField, and the query 
parser would know which type the field has and correctly create a 
NumericRangeQuery for strings like [1.567..*] or (1.787..19.5]. 
NumericRangeQuery also supports the rewrite modes; only some type of schema 
support is missing.

I ask this, because someone asked on java-user for such a feature in query 
parser.

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Michael Busch
 Fix For: 2.9

 Attachments: lucene-1567.patch, 
 lucene_1567_adriano_crestani_07_13_2009.patch, 
 lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
 lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
 lucene_trunk_FlexQueryParser_2009july15_v6.patch, 
 lucene_trunk_FlexQueryParser_2009july16_v7.patch, 
 lucene_trunk_FlexQueryParser_2009july23_v8.patch, 
 lucene_trunk_FlexQueryParser_2009july27_v9.patch, 
 lucene_trunk_FlexQueryParser_2009july28_v10.patch, 
 lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
 QueryParser_restructure_meetup_june2009_v2.pdf, 
 wiki_switching_to_the_new_query_parser.txt


 From "New flexible query parser" thread by Michael Busch:
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is an Apache committer and PMC member
 on the Tuscany project and is getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
     AND
    /   \
   A     B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. This layer is in fact a
 configurable chain of processors. Each processor can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
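
The three layers described above can be pictured with a toy pipeline. This is a sketch under heavy assumptions: QNode, Pipeline and their methods are hypothetical names, the "parser" understands only 'x AND y', and the "builder" emits a string instead of Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// All names are hypothetical; they only mirror the three layers described above.
class QNode {
  final String value; // operator name or term text
  final List<QNode> children = new ArrayList<>();

  QNode(String value) {
    this.value = value;
  }
}

class Pipeline {
  /** Layer 1 (parsing), reduced to a single syntax: "x AND y". */
  static QNode parse(String text) {
    String[] parts = text.split(" AND ");
    if (parts.length == 1) {
      return new QNode(parts[0]);
    }
    QNode and = new QNode("AND");
    for (String part : parts) {
      and.children.add(new QNode(part));
    }
    return and;
  }

  /** Layer 2: one processor in the chain; it lower-cases leaf terms. */
  static QNode lowerCaseTerms(QNode node) {
    if (node.children.isEmpty()) {
      return new QNode(node.value.toLowerCase(Locale.ROOT));
    }
    QNode copy = new QNode(node.value);
    for (QNode child : node.children) {
      copy.children.add(lowerCaseTerms(child));
    }
    return copy;
  }

  /** Layer 3: builds the final query, here just a string in '+a +b' form. */
  static String build(QNode node) {
    if (node.children.isEmpty()) {
      return node.value;
    }
    StringBuilder sb = new StringBuilder();
    for (QNode child : node.children) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append('+').append(build(child));
    }
    return sb.toString();
  }

  /** Runs the layers in order: parse, apply each processor, build. */
  static String run(String text, List<UnaryOperator<QNode>> processors) {
    QNode tree = parse(text);
    for (UnaryOperator<QNode> processor : processors) {
      tree = processor.apply(tree);
    }
    return build(tree);
  }
}
```

Swapping in a different parse method (say, for an 'AND(a,b)' syntax) would leave the processors and the builder untouched, which is the separation of syntax from semantics that the description argues for.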
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow to attach resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, 

[jira] Commented: (LUCENE-1748) getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736674#action_12736674
 ] 

Mark Miller commented on LUCENE-1748:
-

This is going to require a patch to the 2.4 back compat branch to pass tests.

 getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract
 --

 Key: LUCENE-1748
 URL: https://issues.apache.org/jira/browse/LUCENE-1748
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.4, 2.4.1
 Environment: all
Reporter: Hugh Cayless
Assignee: Mark Miller
 Fix For: 2.9, 3.0, 3.1

 Attachments: LUCENE-1748.patch


 I just spent a long time tracking down a bug resulting from upgrading to 
 Lucene 2.4.1 on a project that implements some SpanQuerys of its own and was 
 written against 2.3.  Since the project's SpanQuerys didn't implement 
 getPayloadSpans, the call to that method went to SpanQuery.getPayloadSpans 
 which returned null and caused a NullPointerException in the Lucene code, far 
 away from the actual source of the problem.  
 It would be much better for this kind of thing to show up at compile time, I 
 think.
 Thanks!
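
The failure mode described above can be reproduced in miniature. The classes below are hypothetical stand-ins (not SpanQuery itself) that contrast a null-returning default with an abstract method:

```java
// Hypothetical stand-ins; names are illustrative only.
abstract class LenientQuery {
  /** Default that returns null: subclasses compile without overriding it. */
  Object payloadSpans() {
    return null;
  }
}

abstract class StrictQuery {
  /** Abstract: a subclass that forgets this method fails to compile. */
  abstract Object payloadSpans();
}

/** Written against the older API; silently inherits the null default. */
class OldCustomQuery extends LenientQuery {}

/** Must implement the method, otherwise the compiler rejects the class. */
class NewCustomQuery extends StrictQuery {
  @Override
  Object payloadSpans() {
    return "spans";
  }
}

class SpanConsumer {
  /** Library-side code that assumes a non-null result. */
  static int describe(LenientQuery query) {
    // Throws NullPointerException here, far from the subclass that caused it.
    return query.payloadSpans().toString().length();
  }
}
```

With the lenient base class the mistake only shows up at runtime inside the consumer; with the strict one, removing payloadSpans() from NewCustomQuery is a compile error.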




[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1766:
---

Attachment: LUCENE-1766.patch

OK another rev!

I backed away from giving particulars on how one should synchronize and just 
said, generically, to use your own (non-Lucene) objects instead.

 Add Thread-Safety note to IndexWriter JavaDoc
 -

 Key: LUCENE-1766
 URL: https://issues.apache.org/jira/browse/LUCENE-1766
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
 LUCENE-1766.patch, LUCENE-1766.patch


 IndexWriter Javadocs should contain a note about thread-safety. This is 
 already mentioned on the wiki FAQ page, but such essential information 
 should be part of the module documentation too.




backwards compat tests

2009-07-29 Thread Mark Miller
Is there a wiki page on how to handle updating the back compat tests? I
found some mail regarding it, but most of what I found was older. The latest
I saw talked about the separate branch, and updating that branch with fixes
if you need to - but I see now it seems to work with tags?
Do I update the branch, tag it with the current date, then update the build
file to point to the new tag (compatibility.tag)?

-- 
- Mark

http://www.lucidimagination.com


Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
+1

Though not by docID (since they aren't reliable in context of
IndexWriter)... and it should be undeleteDocuments (with an s) since
it could affect more than one doc.

Mike

On Wed, Jul 29, 2009 at 10:55 AM, Shai Erera <ser...@gmail.com> wrote:
 Hi

 I think such methods are useful for a Lucene app that needs to roll back a
 single document delete. Today, IndexReader offers undeleteAll(), which is a
 bit extreme. There are two scenarios for this that I know of:
 1) (recently showed up on the user list) I'd like to synchronize documents
 on disk and in the index. So if I have a document in the index which I want
 to delete, and also a file on the file system (corresponding to an ID or
 something), and the file delete fails, I may want to undelete that document.
 This has alternatives, but an undeleteDocument would still be useful in this
 case.

 2) ParallelReader allows one to add a document to two indexes, some fields
 to one index and others to the second index, and then read those indexes in
 parallel. Such applications will need to delete documents sometimes, and an
 undeleteDocument will be useful if a transactional delete is needed: i.e.,
 if the first delete succeeds and the second fails, undo the first delete.

 3) ParallelReader doesn't support deleteDocument well currently - i.e., if
 one of the deletes fails, some readers will be left w/ the document and some
 won't (I think this is a bug).

 What do you think?

 Shai





Re: backwards compat tests

2009-07-29 Thread Michael McCandless
I think it's not documented anywhere... roughly these are the steps:

  * Make mods to tags/lucene_2_4_.../* so ant test-tag passes

  * Use svn switch to switch that tags checkout from a tag to the
2_4 back compat branch

  * Commit from that dir & plant a new tag

  * Update common-build.xml to point to the new tag

  * Maybe run ant test-tag again and confirm everything passes

  * Commit at the top level

Mike

On Wed, Jul 29, 2009 at 12:23 PM, Mark Miller <markrmil...@gmail.com> wrote:
 Is there a wiki page on how to handle updating the back compat tests? I
 found some mail regarding it, but most of what I found was older. The latest
 I saw talked about the separate branch, and updating that branch with fixes
 if you need to - but I see now it seems to work with tags?
 Do I update the branch, tag it with the current date, then update the build
 file to point to the new tag (compatibility.tag)?

 --
 - Mark

 http://www.lucidimagination.com






[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736696#action_12736696
 ] 

Simon Willnauer commented on LUCENE-1766:
-

Looks good. Using a private final Object is a general best practice rather 
than something Lucene- or module-specific.

simon
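
For readers who have not seen the idiom, the "private final Object" practice mentioned above looks roughly like this. SharedWriter is a hypothetical stand-in for a shared resource such as an IndexWriter; the point is only that the application locks on its own object and never on the library instance.

```java
// Hypothetical stand-in for a shared resource; not a Lucene class.
class SharedWriter {
  private int docs;

  void add() {
    docs++;
  }

  int count() {
    return docs;
  }
}

class Indexer {
  // Private lock: no outside code (and no library internals) can synchronize on it.
  private final Object lock = new Object();
  private final SharedWriter writer = new SharedWriter();

  void addBatch(int n) {
    synchronized (lock) { // deliberately not synchronized (writer)
      for (int i = 0; i < n; i++) {
        writer.add();
      }
    }
  }

  int count() {
    synchronized (lock) {
      return writer.count();
    }
  }

  /** Runs several threads against one Indexer and returns the final count. */
  static int runConcurrently(int threads, int perThread) {
    Indexer indexer = new Indexer();
    Thread[] pool = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      pool[t] = new Thread(() -> indexer.addBatch(perThread));
      pool[t].start();
    }
    for (Thread thread : pool) {
      try {
        thread.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return indexer.count();
  }
}
```

Because the lock is private, it cannot collide with whatever monitors the wrapped object uses internally.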

 Add Thread-Safety note to IndexWriter JavaDoc
 -

 Key: LUCENE-1766
 URL: https://issues.apache.org/jira/browse/LUCENE-1766
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
 LUCENE-1766.patch, LUCENE-1766.patch


 IndexWriter Javadocs should contain a note about thread-safety. This is 
 already mentioned on the wiki FAQ page, but such essential information 
 should be part of the module documentation too.




[jira] Resolved: (LUCENE-1752) incorrect snippet returned with SpanScorer

2009-07-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved LUCENE-1752.
-

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Thanks Koji!

 incorrect snippet returned with SpanScorer
 --

 Key: LUCENE-1752
 URL: https://issues.apache.org/jira/browse/LUCENE-1752
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9
Reporter: Koji Sekiguchi
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1752.patch






[jira] Created: (LUCENE-1767) Add sizeof to OpenBitSet

2009-07-29 Thread Jason Rutherglen (JIRA)
Add sizeof to OpenBitSet


 Key: LUCENE-1767
 URL: https://issues.apache.org/jira/browse/LUCENE-1767
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9


Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage when 
many OpenBitSets are cached (as in Solr).




[jira] Updated: (LUCENE-1767) Add sizeof to OpenBitSet

2009-07-29 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1767:
-

Attachment: LUCENE-1767.patch

Added sizeOf method

 Add sizeof to OpenBitSet
 

 Key: LUCENE-1767
 URL: https://issues.apache.org/jira/browse/LUCENE-1767
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1767.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage 
 when many OpenBitSets are cached (as in Solr).




[jira] Commented: (LUCENE-1767) Add sizeof to OpenBitSet

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736724#action_12736724
 ] 

Simon Willnauer commented on LUCENE-1767:
-

Jason, I would expect a sizeOf method to return the size of the bitset itself 
(what #size() returns). Maybe you can find another name for that method. I also 
think you can safely leave the constants out - once you leave those out, this 
method is almost identical to #capacity / #size.

I'm not sure whether such a method would confuse users / developers rather 
than help them. If we add it, I would go for a very meaningful name like 
allocatedBytes.

simon
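
A minimal sketch of the idea under discussion, assuming a bitset backed by a long[] as OpenBitSet is. The class name WordBitSet is hypothetical, allocatedBytes is only the name suggested above, and none of this is taken from the attached patch. JVM object and array header overhead is ignored, in line with the suggestion to leave the constants out.

```java
// Hypothetical sketch, not the LUCENE-1767 patch: a long[]-backed bitset
// that reports how many bytes its backing array occupies.
final class WordBitSet {
    private final long[] bits;

    WordBitSet(long numBits) {
        // One 64-bit word per 64 bits, rounded up.
        bits = new long[(int) ((numBits + 63) >>> 6)];
    }

    /** Number of bits the backing array can hold. */
    long capacity() {
        return (long) bits.length << 6;
    }

    /** Bytes used by the backing array only; JVM header overhead is ignored. */
    long allocatedBytes() {
        return (long) bits.length * 8L;
    }
}
```

With the constants left out, the byte count is simply capacity divided by eight, which is why the two methods end up so close to each other.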

 Add sizeof to OpenBitSet
 

 Key: LUCENE-1767
 URL: https://issues.apache.org/jira/browse/LUCENE-1767
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1767.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage 
 when many OBS' are cached (such as Solr).




[jira] Commented: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736732#action_12736732
 ] 

Mark Miller commented on LUCENE-1749:
-

bq. figure out why previously mentioned tests are breaking (need help with this
one ... don't know enough about the code these tests exercise)

Eh - it's yucky. There are parts where the tests are passing the top level
reader (say to a collector) when they should be using the sub readers. I fixed
one :)
But then there is more - I looked at a couple of more difficult ones that also
pass the top level reader for the test.

And then there is explain - IndexSearcher passes the top level reader to the 
weight explain, and valuesourcequery will get a fieldcache based on that 
reader. I guess that one is a bug.

And there are probably a few other similar things...

 FieldCache introspection API
 

 Key: LUCENE-1749
 URL: https://issues.apache.org/jira/browse/LUCENE-1749
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
Priority: Minor
 Fix For: 2.9

 Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
 LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch


 FieldCache should expose an Expert level API for runtime introspection of the 
 FieldCache to provide info about what is in the FieldCache at any given 
 moment.  We should also provide utility methods for sanity checking that the 
 FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...
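
The first sanity check in the list above could look roughly like the following. This is a sketch under assumptions: CacheEntry, CacheSanity and findTypeConflicts are illustrative names, not the API from the attached patches, and cache entries are reduced to a reader key, a field name and a type name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: CacheEntry and findTypeConflicts are illustrative names,
// not the API added by the LUCENE-1749 patches.
final class CacheEntry {
    final Object readerKey; // identifies the reader that owns the entry
    final String field;
    final String type;      // e.g. "int", "String"

    CacheEntry(Object readerKey, String field, String type) {
        this.readerKey = readerKey;
        this.field = field;
        this.type = type;
    }
}

final class CacheSanity {
    /** Returns "reader/field" keys that are cached under more than one type. */
    static List<String> findTypeConflicts(List<CacheEntry> entries) {
        Map<String, Set<String>> typesByKey = new HashMap<String, Set<String>>();
        List<String> order = new ArrayList<String>(); // keeps first-seen order
        for (CacheEntry e : entries) {
            String key = e.readerKey + "/" + e.field;
            Set<String> types = typesByKey.get(key);
            if (types == null) {
                types = new HashSet<String>();
                typesByKey.put(key, types);
                order.add(key);
            }
            types.add(e.type);
        }
        List<String> conflicts = new ArrayList<String>();
        for (String key : order) {
            if (typesByKey.get(key).size() > 1) {
                conflicts.add(key);
            }
        }
        return conflicts;
    }
}
```

The reader/subreader check would need the reader hierarchy as well, which this sketch leaves out.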




[jira] Resolved: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1766.


Resolution: Fixed

 Add Thread-Safety note to IndexWriter JavaDoc
 -

 Key: LUCENE-1766
 URL: https://issues.apache.org/jira/browse/LUCENE-1766
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
 LUCENE-1766.patch, LUCENE-1766.patch


 IndexWriter Javadocs should contain a note about thread-safety. This is 
 already mentioned on the wiki FAQ page but such an essential information 
 should be part of the module documentation too.




[jira] Commented: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736750#action_12736750
 ] 

Mark Miller commented on LUCENE-1749:
-

bq. And then there is explain - IndexSearcher passes the top level reader to 
the weight explain, and valuesourcequery will get a fieldcache based on that 
reader. I guess that one is a bug.

I don't even know what to do about this one. All I can think is that you pump
out an explain for each sub reader - but that's pretty unhelpful.

Perhaps the best we can do is javadoc the extra requirements that may be needed 
when you use explain?

 FieldCache introspection API
 

 Key: LUCENE-1749
 URL: https://issues.apache.org/jira/browse/LUCENE-1749
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
Priority: Minor
 Fix For: 2.9

 Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
 LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch


 FieldCache should expose an Expert level API for runtime introspection of the 
 FieldCache to provide info about what is in the FieldCache at any given 
 moment.  We should also provide utility methods for sanity checking that the 
 FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...




RE: backwards compat tests

2009-07-29 Thread Uwe Schindler
I do it that way:

-  Check out the backwards branch (not the tag) to
trunk/tags/lucene_2_4_back_compat_tests. I keep this checkout there all the
time and update it regularly together with trunk.

-  Place and leave a build.properties file with the following line
in your trunk dir: tag=lucene_2_4_back_compat_tests

-  You can then test using ant test / test-tag and so on; the Java
property pins the tag directory to your branch checkout. The good thing is
that you always have the latest revision of the branch and can modify and
commit it directly.

-  If everything is OK, create a tag from your checked-out branch (svn
copy …) and then update the main common-build.xml

 

I was always wondering: why do we need tags for the backwards tests? Why not
just automatically check out the branch at the revision equal to the current
trunk revision for testing (what I did manually)? Currently we always have to
create a new tag after each commit to the backwards branch, which is somewhat
strange (OK, that way you pin the revision used for testing this trunk
checkout, but if you check out the backwards branch at the same revision
number that trunk currently has, it would always be correctly related).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, July 29, 2009 6:24 PM
To: java-dev@lucene.apache.org
Subject: backwards compat tests

 

Is there a wiki page on how to handle updating the back compat tests? I
found some mail regarding it, but most of what I found was older. The latest
I saw talked about the separate branch, and updating that branch with fixes
if you need to - but I see now it seems to work with tags?

 

Do I update the branch, tag it with the current date, then update the build
file to point to the new tag (compatibility.tag)?

-- 
- Mark

http://www.lucidimagination.com



Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera
Yes of course. I meant to create an undeleteDoc variant for every deleteDoc.
So if IndexWriter has deleteDocuments(Term), I will add
undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add
undeleteDocument(int).

It is up to the caller to make sure whatever he undeletes was indeed
deleted, i.e., if you reader.deleteDocument(4) and then
reader.undeleteDocument(4), you should make sure that 4 represents the same
document.

In fact, I think it might be useful to restrict the undeleteDoc methods to
the same reader instance with which they were deleted? It's easy to do by
checking if deletedDocs does not contain any of the docs passed to the
undelete method. The rationale is that I believe the best use case for these
undelete methods is a mini undo of the last delete. Using the same
reader instance you're guaranteed that the document is still deleted
between delete() and undelete().
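
As a rough illustration of that restriction, here is a sketch that tracks only the deletes made through one instance and refuses to undelete anything else. DeletionTracker and its methods are hypothetical stand-ins, not IndexReader code; a real reader's deletedDocs would also contain deletions that were already in the index when it was opened.

```java
import java.util.BitSet;

// Hypothetical sketch of the proposed semantics; DeletionTracker is not a Lucene class.
final class DeletionTracker {
    // Only deletes made through this instance are recorded here.
    private final BitSet deletedDocs = new BitSet();

    /** Marks a document deleted; returns true if it was live before. */
    boolean deleteDocument(int docId) {
        boolean wasLive = !deletedDocs.get(docId);
        deletedDocs.set(docId);
        return wasLive;
    }

    /**
     * Undoes a delete made through this same instance.
     * Refuses to undelete a document this instance does not see as deleted.
     */
    void undeleteDocument(int docId) {
        if (!deletedDocs.get(docId)) {
            throw new IllegalStateException("doc " + docId + " is not deleted in this reader");
        }
        deletedDocs.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }
}
```

Calling deleteDocument(4) followed by undeleteDocument(4) on the same tracker restores the document, while undeleteDocument(7) without a prior delete fails fast.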

Also, since I can only open the index for write once, whether by IndexWriter
or IndexReader w/ readOnly=false, we can guarantee that an undelete followed
by delete is safe?

Shai

On Wed, Jul 29, 2009 at 7:26 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 +1

 Though not by docID (since they aren't reliable in context of
 IndexWriter)... and it should be undeleteDocuments (with an s) since
 it could affect more than one doc.

 Mike

 On Wed, Jul 29, 2009 at 10:55 AM, Shai Erera ser...@gmail.com wrote:
  Hi
 
  I think such methods are useful for a Lucene app which needs to roll back a
  single document delete. Today, IndexReader offers undeleteAll(), which is a
  bit extreme. There are two scenarios for this that I know of:
  1) (recently showed up on the user list) I'd like to synchronize documents
  on disk and in the index. So if I have a document in the index which I want
  to delete, and also a file on the file system (corresponding to an ID or
  something), and the file delete fails, I may want to undelete that document.
  This has alternatives, but an undeleteDocument would still be useful in this
  case.
 
  2) ParallelReader allows one to add a document to two indexes, some fields
  to one index and others to the second index, and then read those indexes in
  parallel. Such applications will need to delete documents sometimes, and an
  undeleteDocument will be useful if a transactional delete is needed: i.e.,
  if the first delete succeeds and the second fails, undo the first delete.
 
  3) ParallelReader doesn't support deleteDocument well currently - i.e., if
  one of the deletes fails, some readers will be left w/ the document and some
  won't (this is, I think, a bug).
 
  What do you think?
 
  Shai
 





[jira] Updated: (LUCENE-1762) Slightly more readable code in Token/TermAttributeImpl

2009-07-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1762:
--

Description: 
No big deal. 

growTermBuffer(int newSize) was using correct but slightly hard-to-follow
code.

The method returned either null, as a hint to the upstream code that the
current termBuffer has enough space, or the reallocated buffer.

This patch simplifies the logic by making this method only reallocate the
buffer, nothing more.
It reduces the number of if(null) checks in a few methods and reduces the
amount of code.
All tests pass.

This also adds tests for the new basic attribute impls (copies of the Token 
tests).
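
To make the described change concrete, here is a small sketch of the simplified shape: the grow method only reallocates when needed and returns nothing, so callers no longer test for null. TermBufferSketch and its members are a hypothetical reconstruction for illustration and are not the code from the patch; in particular the growth policy here is just "allocate exactly newSize".

```java
// Hypothetical reconstruction for illustration; not the code from the patch.
final class TermBufferSketch {
    private char[] termBuffer = new char[10];
    private int termLength;

    /** Ensures capacity for newSize chars; any existing content is discarded. */
    private void growTermBuffer(int newSize) {
        if (termBuffer.length < newSize) {
            termBuffer = new char[newSize];
        }
    }

    /** Replaces the term text, growing the buffer only when needed. */
    void setTermBuffer(char[] src, int offset, int length) {
        growTermBuffer(length); // the caller needs no null check
        System.arraycopy(src, offset, termBuffer, 0, length);
        termLength = length;
    }

    String term() {
        return new String(termBuffer, 0, termLength);
    }

    int capacity() {
        return termBuffer.length;
    }
}
```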

  was:
No big deal. 

growTermBuffer(int newSize) was using correct, but slightly hard to follow 
code. 

the method was returning null as a hint that the current termBuffer has enough 
space to the upstream code or reallocated buffer.

this patch simplifies logic   making this method to only reallocate buffer, 
nothing more.  
It reduces number of if(null) checks in a few methods and reduces amount of 
code. 
all tests pass.

Summary: Slightly more readable code in Token/TermAttributeImpl  (was: 
Slightly more readable code in TermAttributeImpl )

 Slightly more readable code in Token/TermAttributeImpl
 --

 Key: LUCENE-1762
 URL: https://issues.apache.org/jira/browse/LUCENE-1762
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.9
Reporter: Eks Dev
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1762-bw.patch, LUCENE-1762.patch, 
 LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch


 No big deal. 
 growTermBuffer(int newSize) was using correct, but slightly hard to follow 
 code. 
 the method was returning null as a hint that the current termBuffer has 
 enough space to the upstream code or reallocated buffer.
 this patch simplifies logic   making this method to only reallocate buffer, 
 nothing more.  
 It reduces number of if(null) checks in a few methods and reduces amount of 
 code. 
 all tests pass.
 This also adds tests for the new basic attribute impls (copies of the Token 
 tests).




[jira] Closed: (LUCENE-1762) Slightly more readable code in Token/TermAttributeImpl

2009-07-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-1762.
-

Resolution: Fixed

Committed revision: 799025
This is without CHANGES.txt updates, because nothing was changed that is 
visible to the outside :-)

Thanks Eks!

 Slightly more readable code in Token/TermAttributeImpl
 --

 Key: LUCENE-1762
 URL: https://issues.apache.org/jira/browse/LUCENE-1762
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.9
Reporter: Eks Dev
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1762-bw.patch, LUCENE-1762.patch, 
 LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch


 No big deal. 
 growTermBuffer(int newSize) was using correct, but slightly hard to follow 
 code. 
 the method was returning null as a hint that the current termBuffer has 
 enough space to the upstream code or reallocated buffer.
 this patch simplifies logic   making this method to only reallocate buffer, 
 nothing more.  
 It reduces number of if(null) checks in a few methods and reduces amount of 
 code. 
 all tests pass.
 This also adds tests for the new basic attribute impls (copies of the Token 
 tests).




Re: backwards compat tests

2009-07-29 Thread Shai Erera
Uwe - I asked this question a while ago on LUCENE-1529 and this is an answer
Mike gave:
http://issues.apache.org/jira/browse/LUCENE-1529?focusedCommentId=12699177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12699177

I think it's related to what you ask

Shai

On Wed, Jul 29, 2009 at 10:01 PM, Uwe Schindler u...@thetaphi.de wrote:

  I do it that way:

 -  Check out the backwards branch (not the tag) to
 trunk/tags/lucene_2_4_back_compat_tests. I keep this checkout there all the
 time and update it regularly together with trunk.

 -  Place and leave a build.properties file with the following
 line in your trunk dir: “tag=lucene_2_4_back_compat_tests”

 -  You can then test using ant test / test-tag and so on; the Java
 property pins the tag directory to your branch checkout. The good thing is
 that you always have the latest revision of the branch and can modify and
 commit it directly.

 -  If everything is OK, create a tag from your checked-out branch (svn
 copy …) and then update the main common-build.xml

 I was always wondering: why do we need tags for the backwards tests? Why
 not just automatically check out the branch at the revision equal to the
 current trunk revision for testing (what I did manually)? Currently we always
 have to create a new tag after each commit to the backwards branch, which is
 somewhat strange (OK, that way you pin the revision used for testing this
 trunk checkout, but if you check out the backwards branch at the same revision
 number that trunk currently has, it would always be correctly related).

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
   --

 *From:* Mark Miller [mailto:markrmil...@gmail.com]
 *Sent:* Wednesday, July 29, 2009 6:24 PM
 *To:* java-dev@lucene.apache.org
 *Subject:* backwards compat tests



 Is there a wiki page on how to handle updating the back compat tests? I
 found some mail regarding it, but most of what I found was older. The latest
 I saw talked about the separate branch, and updating that branch with fixes
 if you need to - but I see now it seems to work with tags?



 Do I update the branch, tag it with the current date, then update the build
 file to point to the new tag (compatibility.tag)?

 --
 - Mark

 http://www.lucidimagination.com



Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 3:05 PM, Shai Erera ser...@gmail.com wrote:
 Yes of course. I meant to create an undeleteDoc variant for every deleteDoc.
 So if IndexWriter has deleteDocuments(Term), I will add
 undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add
 undeleteDocument(int).

OK.

 It is up to the caller to make sure whatever he undeletes was indeed
 deleted, i.e., if you reader.deleteDocument(4) and then
 reader.undeleteDocument(4), you should make sure that 4 represents the same
 document.

Presumably in IndexReader we can return int count (how many deleted),
but in IndexWriter it's void.

 In fact, I think it might be useful to restrict the undeleteDoc methods to
 the same reader instance with which they were deleted? It's easy to do by
 checking if deletedDocs does not contain any of the docs passed to the
 undelete method. The rationale is that I believe the best use case for these
 undelete methods is a mini undo of the last delete. Using the same
 reader instance you're guaranteed that the document is still deleted
 between delete() and undelete().

That might be too restrictive?  Ie, this is the best use case we can
picture today, but others could come up with different use cases, and
there's no technical reason for such a restriction?

undeleteAll doesn't have such a restriction.

 Also, since I can only open the index for write once, whether by IndexWriter
 or IndexReader w/ readOnly=false, we can guarantee that an undelete followed
 by delete is safe?

Or the undelete methods in IndexReader could just acquire the write lock?

Mike




Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera

 Or the undelete methods in IndexReader could just acquire the write lock?


I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete
a document, no? And then I'll need to acquire the write lock, just like any
other write operation done through IndexReader, right?

Or do you suggest we allow this for readOnly IndexReaders too?

That might be too restrictive?


Yes - I pointed that out just as a safety measure. However, sometimes
(especially following the 'agile' guidelines) it's better to develop
something for a problem we know exists, rather than trying to over-engineer
for something we 'think might exist'. If a good use case is presented
in the future which requires the undelete to work also in readers that did
not do the delete themselves, we can change that behavior then, no?

Maybe I'll start to work on it and we can decide that as we go? There's no
point making decisions now, when we don't know if it is a major thing to
support or not. Maybe it can be supported 'for free', and then it won't be a
question at all.

Shai

On Wed, Jul 29, 2009 at 10:58 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 undeleteAll doesn't have such a restriction.



Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 4:06 PM, Shai Erera ser...@gmail.com wrote:
 Or the undelete methods in IndexReader could just acquire the write lock?

 I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete
 a document, no? And then I'll need to acquire the write lock, just like any
 other write operation done through IndexReader, right?

 Or do you suggest we allow this for readOnly IndexReaders too?

Right, you'll definitely need to acquire the write lock for undeleteDoc.

 That might be too restrictive?

 Yes - I pointed that out just as a safety measure. However, sometimes
 (especially following the 'agile' guidelines) it's better to develop
 something for a problem we know exists, rather than trying to over-engineer
 for something we 'think might exist'. If a good use case is presented
 in the future which requires the undelete to work also in readers that did
 not do the delete themselves, we can change that behavior then, no?

 Maybe I'll start to work on it and we can decide that as we go? There's no
 point making decisions now, when we don't know if it is a major thing to
 support or not. Maybe it can be supported 'for free', and then it won't be a
 question at all.

I agree!  There's no need to decide now.  So let's defer.

Mike




Re: backwards compat tests

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 4:31 PM, Uwe Schindler u...@thetaphi.de wrote:

 My suggestion was to write the build script in a way that it checks out the
 branch with the same revision number as the current base dir (trunk).

I think this would work, as long as we always commit top-level and
back-compat tag in one transaction (commit)?

(And, even if we don't do it as one commit, the risk that someone
happens to do a checkout between the two commits is presumably
negligible).

 Alternatively instead of putting a tag name into common-build.xml, it could
 be the revision number. So it would check out …/branches/
 lucene_2_4_back_compat_tests with the revision given in common-build.

This would also be better than what we have today (saves the extra
svn copy step), but if we can make the first approach work that's
even better!

Mike




RE: backwards compat tests

2009-07-29 Thread Uwe Schindler
  My suggestion was to write the build script in a way that it checks out the
  branch with the same revision number as the current base dir (trunk).
 
 I think this would work, as long as we always commit top-level and
 back-compat tag in one transaction (commit)?
 
 (And, even if we don't do it as one commit, the risk that someone
 happens to do a checkout between the two commits is presumably
 negligible).

I think if you first commit in backwards-branch and then in trunk, you never
get an inconsistent state. The trunk revision is lower than the new branch
revision, so nothing changes, as a trunk checkout and test-tag would run the
tests from its current revision (that did not change).

This is the same as now. You can modify the bw-branch and create a new tag,
but as trunk's common-build is not updated, nobody would see it.

You only get an inconsistent state if you have run test-tag before and have
a current checkout of the bw-branch. If you then do svn update on the
bw-branch you will update this to last revision. But if you do this, you
will also update trunk (otherwise it would not make sense).

There is only one problem: if you have already checked out the branch at a
specific revision and then update trunk, the next test run will use the old
tests (because the dir already exists; currently it would check out a new tag
because the dir name changed). Because of this, test-tag should also do an svn
update to the current trunk's revision.

  Alternatively instead of putting a tag name into common-build.xml, it could
  be the revision number. So it would check out …/branches/
  lucene_2_4_back_compat_tests with the revision given in common-build.
 
 This would also be better than what we have today (saves the extra
 svn copy step), but if we can make the first approach work that's
 even better!

I suggest two variables in common-build.xml:
- backwards-branch or backwards-branch-url (must be changed when 3.0 is out
and 3.1 starts in trunk).
- backwards-revision

The same problem with trunk updated and the branch still available also
happens here. So each run of test-tag should first do an svn update to the
revision from the config (maybe with the possibility to switch this off, or to
only update, never downgrade).





[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736851#action_12736851
 ] 

Mark Miller commented on LUCENE-1486:
-

If we don't have a clear path for this very soon I think we should pull it from 
this release.

 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Fix For: 2.9

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, TestComplexPhraseQuery.java


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept  for much of the query 
 parser syntax. Examples from the JUnit test include:
   checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works

   checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
 Code plus Junit test to follow...




[jira] Commented: (LUCENE-625) Query auto completer

2009-07-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736858#action_12736858
 ] 

Jason Rutherglen commented on LUCENE-625:
-

Karl, did you ever proceed on this patch?  I'm interested in adding autosuggest 
to Solr.

 Query auto completer
 

 Key: LUCENE-625
 URL: https://issues.apache.org/jira/browse/LUCENE-625
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Karl Wettin
Priority: Minor
 Attachments: autocomplete_0.0.1.tar.gz, autocomplete_20060730.tar.gz


 A trie that helps users to type in their query. Made for AJAX, works great 
 with ruby on rails common scripts http://script.aculo.us/. Similar to the 
 Google labs suggester.
 Trained by user queries. Optimizable. Uses an in memory corpus. Serializable.




[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-29 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736879#action_12736879
 ] 

Michael Busch commented on LUCENE-1567:
---

{quote}
Could you also please fix the javadocs? When I'm building the javadocs I'm 
getting a lot of warnings about not found references.
{quote}

The warnings occur because you put links to the new contrib queryparser into 
the core queryparser. That doesn't work as the contribs are not in the 
classpath of the core, so I think we should remove those links and change them 
just to plain text.

Also, please make sure to add to the main build.xml appropriate entries for the 
javadocs, otherwise the All javadocs will not contain the contrib QP classes.

There are also some TODOs in the docs; especially in top-level places, such as 
the package.html of your new package, we should not have TODOs in the docs. 
Please fix that soon, 2.9 is coming quickly. 

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Michael Busch
 Fix For: 2.9

 Attachments: lucene-1567.patch, 
 lucene_1567_adriano_crestani_07_13_2009.patch, 
 lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
 lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
 lucene_trunk_FlexQueryParser_2009july15_v6.patch, 
 lucene_trunk_FlexQueryParser_2009july16_v7.patch, 
 lucene_trunk_FlexQueryParser_2009july23_v8.patch, 
 lucene_trunk_FlexQueryParser_2009july27_v9.patch, 
 lucene_trunk_FlexQueryParser_2009july28_v10.patch, 
 lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
 QueryParser_restructure_meetup_june2009_v2.pdf, 
 wiki_switching_to_the_new_query_parser.txt


 From the "New flexible query parser" thread by Michael Busch:
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate the syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
      AND
     /   \
    A     B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. This layer is in fact a
 configurable chain of processors. Each processor can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
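
 The three layers described above can be sketched roughly as follows. This is
 only an illustration under stated assumptions: every class and method name
 here (Node, QueryPipelineSketch, parse, process, build) is hypothetical and is
 not taken from the attached patches, the "parser" only understands flat
 'x AND y' input, and the "builder" emits a '+a +b' string instead of real
 Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical tree node: an operator name ("AND") or a single term.
final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node k : kids) {
            children.add(k);
        }
    }
}

final class QueryPipelineSketch {

    // Layer 1 (syntax): turn text into a tree. Only flat "x AND y AND ..." is handled.
    static Node parse(String text) {
        String[] parts = text.trim().split("\\s+AND\\s+");
        if (parts.length == 1) {
            return new Node(parts[0]);
        }
        Node and = new Node("AND");
        for (String p : parts) {
            and.children.add(new Node(p));
        }
        return and;
    }

    // Layer 2 (semantics): run the tree through a configurable chain of processors.
    static Node process(Node root, List<UnaryOperator<Node>> chain) {
        Node current = root;
        for (UnaryOperator<Node> processor : chain) {
            current = processor.apply(current);
        }
        return current;
    }

    // Example processor: lower-case leaf terms and leave operator nodes alone.
    static Node lowercaseTerms(Node n) {
        if (n.children.isEmpty()) {
            return new Node(n.label.toLowerCase(Locale.ROOT));
        }
        Node copy = new Node(n.label);
        for (Node c : n.children) {
            copy.children.add(lowercaseTerms(c));
        }
        return copy;
    }

    // Layer 3 (building): emit a '+a +b' style string as a stand-in for Query objects.
    static String build(Node n) {
        if (n.children.isEmpty()) {
            return n.label;
        }
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(build(c));
        }
        return sb.toString();
    }

    // Wires the three layers together with a one-element processor chain.
    static String run(String text) {
        List<UnaryOperator<Node>> chain = new ArrayList<>();
        chain.add(QueryPipelineSketch::lowercaseTerms);
        return build(process(parse(text), chain));
    }

    public static void main(String[] args) {
        System.out.println(run("A AND B")); // prints "+a +b"
    }
}
```

 Swapping in a different parse method would change the accepted syntax while
 the processor chain and builder stay untouched, which is the separation the
 description argues for.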
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow resource bundles to be attached. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the 

[jira] Updated: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1749:


Attachment: LUCENE-1749.patch

Updates:

* merged in updated ram usage estimator code
* updated most failing tests to work without creating top level FieldCaches
* removed offending calls to explain - I left nocommit comments here - 
depending on what we decide, we could turn off the subreader check for these
* Turned off the subreader check for stress sort test - it sorts in back compat 
mode and compares to the new mode - so it loads both on purpose.
* I don't remember if I touched anything else.

tests pass now

 FieldCache introspection API
 

 Key: LUCENE-1749
 URL: https://issues.apache.org/jira/browse/LUCENE-1749
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
Priority: Minor
 Fix For: 2.9

 Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
 LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, 
 LUCENE-1749.patch


 FieldCache should expose an Expert level API for runtime introspection of the 
 FieldCache to provide info about what is in the FieldCache at any given 
 moment.  We should also provide utility methods for sanity checking that the 
 FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...
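
The first kind of check in the list above could look something like the sketch below. It assumes a hypothetical, simplified view of a cache entry (reader id, field name, type name); these classes are invented for illustration and are not the API proposed in the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one cache entry; not a Lucene class.
final class CacheEntrySketch {
    final String readerId;
    final String field;
    final String type; // e.g. "int", "String", or a parser class name

    CacheEntrySketch(String readerId, String field, String type) {
        this.readerId = readerId;
        this.field = field;
        this.type = type;
    }
}

final class FieldCacheSanitySketch {

    // Returns one message per (reader, field) pair that is cached under more than one type.
    static List<String> findTypeConflicts(List<CacheEntrySketch> entries) {
        Map<String, Set<String>> typesByKey = new LinkedHashMap<>();
        for (CacheEntrySketch e : entries) {
            String key = e.readerId + "/" + e.field;
            typesByKey.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(e.type);
        }
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> en : typesByKey.entrySet()) {
            if (en.getValue().size() > 1) {
                problems.add(en.getKey() + " cached as " + en.getValue());
            }
        }
        return problems;
    }
}
```

The reader/subreader check would need the reader hierarchy as an extra input, so it is left out of this sketch.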

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-625) Query auto completer

2009-07-29 Thread Karl Wettin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736923#action_12736923
 ] 

Karl Wettin commented on LUCENE-625:


bq. Karl, did you ever proceed on this patch? I'm interested in adding 
autosuggest to Solr.

I used this patch for a few things a couple of years ago. If I recall 
everything right I ended up using the bootstrapped apriori corpus of LUCENE-626 
as training data the last time. Made the corpus rather small, speedy and still 
relevant for most users.

But the major caveat is that this patch is a trie and is thus a precise 
forward only thing. So that might not fit all use cases. It might be easier to 
get things going using an index with ngrams of untokenized user queries (i.e. 
including whitespace) or subject-like fields. 
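
To illustrate the "forward only" caveat, here is a minimal trie sketch. It is not the code from the attached tarballs; the class QueryTrieSketch and its methods are hypothetical. A query can only be found by typing it from its first character, so a word in the middle of a stored query never matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical prefix trie over whole user queries (whitespace included).
final class QueryTrieSketch {

    private static final class TrieNode {
        final Map<Character, TrieNode> next = new TreeMap<>();
        boolean endOfQuery;
    }

    private final TrieNode root = new TrieNode();

    // Stores one complete query, character by character.
    void add(String query) {
        TrieNode n = root;
        for (char c : query.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new TrieNode());
        }
        n.endOfQuery = true;
    }

    // Returns all stored queries that start with the prefix, in sorted order.
    List<String> complete(String prefix) {
        TrieNode n = root;
        for (char c : prefix.toCharArray()) {
            n = n.next.get(c);
            if (n == null) {
                return new ArrayList<>();
            }
        }
        List<String> out = new ArrayList<>();
        collect(n, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk that rebuilds each stored query below the given node.
    private static void collect(TrieNode n, StringBuilder path, List<String> out) {
        if (n.endOfQuery) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, TrieNode> e : n.next.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```

With "lucene query", "lucene index" and "solr query" stored, complete("luc") returns both Lucene queries, while complete("query") returns nothing. An ngram index over the same strings would match in that second case.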

But I really prefer user queries, as using only the last n queries will make it 
sensitive to trends. That will, however, require quite a bit of data to work 
well: hundreds of thousands of user queries, in my experience.

Not sure if this was an answer to your question.. : )

 Query auto completer
 

 Key: LUCENE-625
 URL: https://issues.apache.org/jira/browse/LUCENE-625
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Karl Wettin
Priority: Minor
 Attachments: autocomplete_0.0.1.tar.gz, autocomplete_20060730.tar.gz


 A trie that helps users to type in their query. Made for AJAX, works great 
 with ruby on rails common scripts http://script.aculo.us/. Similar to the 
 Google labs suggester.
 Trained by user queries. Optimizable. Uses an in memory corpus. Serializable.




[jira] Updated: (LUCENE-1695) Update the Highlighter to use the new TokenStream API

2009-07-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1695:


Attachment: LUCENE-1695.patch

To trunk

 Update the Highlighter to use the new TokenStream API
 -

 Key: LUCENE-1695
 URL: https://issues.apache.org/jira/browse/LUCENE-1695
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 2.9

 Attachments: LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch, 
 LUCENE-1695.patch, LUCENE-1695.patch







[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-29 Thread Luis Alves (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736965#action_12736965
 ] 

Luis Alves commented on LUCENE-1486:


My understanding is that with New flexible query parser (LUCENE-1567),
the old QueryParser classes will be deprecated in 2.9
and removed in 3.0 (or moved to contrib in 3.0).

This change will also make ComplexPhraseQueryParser deprecated
because it currently extends the old queryparser.

ComplexPhraseQueryParser was not part of any lucene release
and was only checked in 2 months ago in trunk.

For the reasons above I think we should re-implement this functionality
using the new flexible query parser.

3.0 and 2.9 releases will be very similar 
but 3.0 will have all deprecated APIs removed (at least this is my 
understanding).

In my view the path should be:
- Wait for LUCENE-1567 to be in trunk
- re-implement this feature using the New flexible query parser
- and probably do it using a super set of the current syntax with a new 
TextParser.

I'm not sure if I'll have the time to write a compatible implementation of
ComplexPhraseQueryParser before the 2.9 release :(

I'm currently working on 1567 to finalize the patch,
cleaning up javadocs and some small clean up to the APIs.

I'll try to work on ComplexPhraseQueryParser,
once lucene-1567 is in the trunk.

So in my view, ComplexPhraseQueryParser depends on 1567, 
and will require some extra work after 1567 is in the trunk.

I think we have the following options:
# We could wait until 1567 is in trunk, and then wait for a compatible 
implementation of ComplexPhraseQueryParser built on 1567, before we release 
2.9. (This would still remove the current ComplexPhraseQueryParser class and 
provide this feature through the LuceneQueryParserHelper class, or through a 
new TextParser named complexphrase.)
# We can release 2.9 with only 1567, but that will require 
ComplexPhraseQueryParser to be removed from trunk, or at least deprecated in 
2.9, and then re-implemented in 3.X using the new flexible query parser APIs.

I hope this helps :)



 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Fix For: 2.9

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, TestComplexPhraseQuery.java


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept  for much of the query 
 parser syntax. Examples from the Junit test include:
   checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works
   
   checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases are bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases are not supported
 Code plus Junit test to follow...




[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736966#action_12736966
 ] 

Mark Miller commented on LUCENE-1486:
-

Okay thanks. I think we should pull it for 2.9.

 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Fix For: 2.9

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, TestComplexPhraseQuery.java






RE: [jira] Commented: (LUCENE-1764) SampleComparable doesn't work well in contrib/remote tests

2009-07-29 Thread Chris Hostetter

: SortField.equals() and hashCode() contain a hint:
: 
:   /** Returns true if <code>o</code> is equal to this.  If a
:    *  {@link SortComparatorSource} (deprecated) or {@link
:    *  FieldCache.Parser} was provided, it must properly
:    *  implement equals (unless a singleton is always used). */
: 
: Maybe we should make this more visible, have it cover all the different
: SortField comparators/parsers, and place it in the setter methods for
: parsers and comparators.

SortField doesn't seem like the right place at all -- people constructing 
instances of SortField, or calling setter methods of SortField shouldn't 
have to care about this at all -- it's people who extend 
SortComparatorSource or FieldCache.Parser who need to be aware of these 
issues, so shouldn't the class level javadocs for those packages spell it 
out?

(ideally those abstract classes would declare hashCode and equals as 
abstract to *force* people to implement them ... but that ship has sailed)
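
For anyone extending those classes, the pattern looks roughly like the sketch below. To keep it self-contained it does not extend any Lucene type: ScaledIntParserSketch and ParserCacheSketch are hypothetical stand-ins for a custom parser and for a cache keyed on it. The point is only that two separately constructed but equivalent instances must compare equal, otherwise every new instance causes a fresh cache load.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser-like class; a real one would extend the Lucene type instead.
final class ScaledIntParserSketch {
    private final int scale;

    ScaledIntParserSketch(int scale) {
        this.scale = scale;
    }

    int parse(String text) {
        return Integer.parseInt(text) * scale;
    }

    // Two parsers with the same configuration are interchangeable, so they are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ScaledIntParserSketch)) {
            return false;
        }
        return scale == ((ScaledIntParserSketch) o).scale;
    }

    // Must agree with equals: equal parsers produce the same hash code.
    @Override
    public int hashCode() {
        return Integer.hashCode(scale);
    }
}

// Hypothetical cache keyed on the parser, counting how often values are actually loaded.
final class ParserCacheSketch {
    private final Map<ScaledIntParserSketch, int[]> cache = new HashMap<>();
    int loads = 0;

    int[] get(ScaledIntParserSketch parser, String[] raw) {
        return cache.computeIfAbsent(parser, p -> {
            loads++;
            int[] out = new int[raw.length];
            for (int i = 0; i < raw.length; i++) {
                out[i] = p.parse(raw[i]);
            }
            return out;
        });
    }
}
```

Without the two overrides, a second "new ScaledIntParserSketch(10)" would miss the cache and the values would be loaded (and held in memory) twice.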




-Hoss





[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-29 Thread Luis Alves (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736982#action_12736982
 ] 

Luis Alves commented on LUCENE-1567:


Hi Uwe,
{quote}
Will it be possible to specify some type of schema for the query parser in 
future, to automatically create NumericRangeQuery for different numeric types? 
It would then be possible to index a numeric value (double,float,long,int) 
using NumericField and then the query parser knows, which type of field this is 
and so it correctly creates a NumericRangeQuery for strings like [1.567..*] 
or (1.787..19.5]. NumericRangeQuery also supports the rewrite modes, only 
some type of schema support is missing.
{quote}

I think this is doable.
I don't think there is a way to determine from the index whether a field is 
numeric, so the user will have to configure the FieldConfig objects in the 
ConfigHandler. But once that is done, the rest will not be that difficult to 
implement.
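
As a rough sketch of what such user-supplied configuration might amount to, under the assumption that it boils down to a field-name-to-numeric-type mapping: the NumericSchemaSketch class below is hypothetical and is not the FieldConfig/ConfigHandler API from the patch. It only shows how a declared type lets a textual range bound be converted to the right Number, with "*" treated as an open bound.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field numeric schema; not part of the LUCENE-1567 patch.
final class NumericSchemaSketch {

    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    private final Map<String, NumericType> types = new HashMap<>();

    // The user declares the type, since it cannot be read back from the index.
    void declare(String field, NumericType type) {
        types.put(field, type);
    }

    // Converts one range bound to the declared type; "*" means open-ended and maps to null.
    Number parseBound(String field, String text) {
        if ("*".equals(text)) {
            return null;
        }
        NumericType type = types.get(field);
        if (type == null) {
            throw new IllegalArgumentException("no numeric type declared for field " + field);
        }
        switch (type) {
            case INT:
                return Integer.valueOf(text);
            case LONG:
                return Long.valueOf(text);
            case FLOAT:
                return Float.valueOf(text);
            default:
                return Double.valueOf(text);
        }
    }
}
```

A builder for range nodes could then pass the two converted bounds, plus the inclusive/exclusive flags taken from the brackets, on to whichever numeric range query is used.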

Can you create a new JIRA issue with the description of the feature,
so we can discuss the details there?
I'll try to implement that once we agree on all the details.



 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Michael Busch
 Fix For: 2.9

 Attachments: lucene-1567.patch, 
 lucene_1567_adriano_crestani_07_13_2009.patch, 
 lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
 lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
 lucene_trunk_FlexQueryParser_2009july15_v6.patch, 
 lucene_trunk_FlexQueryParser_2009july16_v7.patch, 
 lucene_trunk_FlexQueryParser_2009july23_v8.patch, 
 lucene_trunk_FlexQueryParser_2009july27_v9.patch, 
 lucene_trunk_FlexQueryParser_2009july28_v10.patch, 
 lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
 QueryParser_restructure_meetup_june2009_v2.pdf, 
 wiki_switching_to_the_new_query_parser.txt



[jira] Updated: (LUCENE-1758) improve arabic analyzer: light8 -> light10

2009-07-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1758:


Attachment: LUCENE-1758.patch

Add LowerCaseFilter, and replace the "TODO: more tests" comment with some tests.

 improve arabic analyzer: light8 -> light10
 --

 Key: LUCENE-1758
 URL: https://issues.apache.org/jira/browse/LUCENE-1758
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1758.patch, LUCENE-1758.patch, LUCENE-1758.txt


 Someone mentioned on the java user list that the arabic analysis was not as 
 good as they would like.
 This patch adds the لل- prefix (light10 algorithm versus light8 algorithm).
 In the light10 paper, this improves precision from .390 to .413
 They mention this is not statistically significant, but it makes linguistic 
 sense and at least has been shown not to hurt.
 In the future, I hope openrelevance will allow us to try some more 
 approaches. 




[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-07-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1628:


Attachment: LUCENE-1628.patch

Add LowerCaseFilter, consistent with the Arabic analyzer; it's user-friendly for 
the common case where there is also some English text.


 Persian Analyzer
 

 Key: LUCENE-1628
 URL: https://issues.apache.org/jira/browse/LUCENE-1628
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, 
 LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt


 A simple Persian analyzer.
 I measured TREC scores with the benchmark package below, against 
 http://ece.ut.ac.ir/DBRG/Hamshahri/ :
 SUMMARY               SimpleAnalyzer  PersianAnalyzer
   Search Seconds:              0.012            0.004
   DocName Seconds:             0.020            0.011
   Num Points:                981.015          987.692
   Num Good Points:            33.738           36.123
   Max Good Points:            36.185           36.185
   Average Precision:           0.374            0.481
   MRR:                         0.667            0.833
   Recall:                      0.905            0.998
   Precision At 1:              0.585            0.754
   Precision At 2:              0.531            0.715
   Precision At 3:              0.513            0.646
   Precision At 4:              0.496            0.646
   Precision At 5:              0.486            0.631
   Precision At 6:              0.487            0.621
   Precision At 7:              0.479            0.593
   Precision At 8:              0.465            0.577
   Precision At 9:              0.458            0.573
   Precision At 10:             0.460            0.566
   Precision At 11:             0.453            0.572
   Precision At 12:             0.453            0.562
   Precision At 13:             0.445            0.554
   Precision At 14:             0.438            0.549
   Precision At 15:             0.438            0.542
   Precision At 16:             0.438            0.538
   Precision At 17:             0.429            0.533
   Precision At 18:             0.429            0.527
   Precision At 19:             0.419            0.525
   Precision At 20:             0.415            0.518
