date:20091113

Hudson build is back to normal: Lucene-trunk #1008

2009-11-13 Thread Apache Hudson Server

See 



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2052) Scan method signatures and add varargs where possible

2009-11-13 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2052:


Attachment: LUCENE-2052_fa.patch

Uwe, here is one I found that got skipped in LUCENE-1987.

> Scan method signatures and add varargs where possible
> -
>
> Key: LUCENE-2052
> URL: https://issues.apache.org/jira/browse/LUCENE-2052
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-2052.patch, LUCENE-2052.patch, 
> LUCENE-2052_fa.patch
>
>
> I changed a lot of signatures, but there may be more. The important ones like 
> MultiReader and MultiSearcher are already done. This applies also to contrib. 
> Varargs do not break backwards compatibility; they stay arrays as before.
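
To illustrate the compatibility claim in the description, here is a minimal sketch. VarargsDemo and count are hypothetical names used only for illustration; they are not Lucene classes. A varargs parameter is still an array inside the method, so callers that pass an explicit array keep working after a signature changes from String[] to String...

```java
// Hypothetical stand-in for a method whose signature was changed to varargs;
// this is not the MultiReader or MultiSearcher API.
class VarargsDemo {
    // Before the change this would have been: static int count(String[] names)
    // Inside the method, names is still a plain String[].
    static int count(String... names) {
        return names.length;
    }

    public static void main(String[] args) {
        // Existing callers that pass an explicit array still compile and run.
        System.out.println(count(new String[] {"a", "b"}));
        // New callers may list the arguments directly.
        System.out.println(count("a", "b", "c"));
        // Calling with no arguments passes an empty array, not null.
        System.out.println(count());
    }
}
```

The sketch only covers callers. Subclasses that override such a method are a separate question and are not shown here.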

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (LUCENE-2065) Java 5 port phase II

2009-11-13 Thread Kay Kay (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-2065:


Fix Version/s: 3.1

> Java 5 port phase II 
> -
>
> Key: LUCENE-2065
> URL: https://issues.apache.org/jira/browse/LUCENE-2065
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.1
> Environment: Java 5 
>Reporter: Kay Kay
> Fix For: 3.1
>
> Attachments: LUCENE-2065.patch
>
>
> LUCENE-1257 addresses the public API changes (generics, mainly) and other 
> j.u.c. package changes related to the API. The changes are frozen and 
> closed for 3.0. This would be a placeholder JIRA for 3.0+ versions to address 
> the pending changes (tests for generics etc.) and any other internal API 
> changes as necessary. 


[jira] Updated: (LUCENE-2065) Java 5 port phase II

2009-11-13 Thread Kay Kay (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-2065:


Attachment: LUCENE-2065.patch

across src/test

> Java 5 port phase II 
> -
>
> Key: LUCENE-2065
> URL: https://issues.apache.org/jira/browse/LUCENE-2065
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.1
> Environment: Java 5 
>Reporter: Kay Kay
> Attachments: LUCENE-2065.patch

[jira] Created: (LUCENE-2066) Add Highlighter test for RegexQuery

2009-11-13 Thread Mark Miller (JIRA)

Add Highlighter test for RegexQuery
---

 Key: LUCENE-2066
 URL: https://issues.apache.org/jira/browse/LUCENE-2066
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Reporter: Mark Miller
Priority: Minor





[jira] Issue Comment Edited: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1281#action_1281
 ] 

Mark Miller edited comment on LUCENE-2064 at 11/14/09 1:05 AM:
---

bq.  Introducing dependencies on other contribs is not feasible just for being 
supported by the highlighter.

Oh, it's feasible :) We already depend on the only contrib that currently has a 
multiterm query - regex - and memory index. But it looks like the regex 
dependency snuck in there while working on the spanregexquery support - I don't 
think it's actually needed anymore - we should remove it. It's only a build 
dependency, so it's not actually a big deal - just annoying if it happened to 
keep growing.

*edit*

Hmm - actually, it looks like we can't avoid those dependencies after all - not 
if we want to test those queries - looks like the contrib dependency on regex 
stays anyway. Forgot it's just there for the tests now.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.patch, LUCENE-2064.txt
>
>
> In order to support MultiTermQuery subclasses the Highlighter component 
> applies instanceof checks for concrete classes from the lucene core. This 
> prevents classes like RegexQuery in contrib from being supported. Introducing 
> dependencies on other contribs is not feasible just for being supported by 
> the highlighter.
> While the instanceof checks and subsequent casts will hopefully go away 
> in the future, for supporting more multi-term queries I have an 
> alternative approach using a fake IndexReader that uses a RewriteMethod to 
> force the MTQ to pass the field name to the given reader without doing any 
> real work. It is easier to explain once you see the patch - I will upload 
> shortly.
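
The fake-reader idea in the description can be sketched roughly as follows. This is not the attached patch: TermSource, RewritableQuery and FieldCapturingSource are hypothetical stand-ins for IndexReader, MultiTermQuery and the fake reader, reduced to the one call that matters. The query is handed a reader that records which field it asks about and returns no terms, so the field name is learned without an instanceof check on the query class.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical minimal reader interface; the real IndexReader is far larger.
interface TermSource {
    Iterator<String> terms(String field);
}

// Hypothetical query type that reveals its field only by asking a reader for terms.
interface RewritableQuery {
    void rewrite(TermSource reader);
}

// Records the first field a query asks for and returns no terms,
// so the rewrite does no real work.
class FieldCapturingSource implements TermSource {
    private String field;

    @Override
    public Iterator<String> terms(String field) {
        if (this.field == null) {
            this.field = field;
        }
        return Collections.emptyIterator();
    }

    String capturedField() {
        return field;
    }
}

class FieldCaptureDemo {
    // Learns the query's field without knowing its concrete class.
    static String fieldOf(RewritableQuery query) {
        FieldCapturingSource fake = new FieldCapturingSource();
        query.rewrite(fake);
        return fake.capturedField();
    }

    public static void main(String[] args) {
        // Stand-in for something like a regex query on the "title" field.
        RewritableQuery regexLike = reader -> reader.terms("title");
        System.out.println(fieldOf(regexLike));
    }
}
```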


[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1282#action_1282
 ] 

Mark Miller commented on LUCENE-2064:
-

Nice Uwe! - good idea.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.patch, LUCENE-2064.txt

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1281#action_1281
 ] 

Mark Miller commented on LUCENE-2064:
-

bq.  Introducing dependencies on other contribs is not feasible just for being 
supported by the highlighter.

Oh, it's feasible :) We already depend on the only contrib that currently has a 
multiterm query - regex - and memory index. But it looks like the regex 
dependency snuck in there while working on the spanregexquery support - I don't 
think it's actually needed anymore - we should remove it. It's only a build 
dependency, so it's not actually a big deal - just annoying if it happened to 
keep growing.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.patch, LUCENE-2064.txt

[jira] Updated: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2064:
--

Attachment: LUCENE-2064.patch

Here is the solution with an empty MemoryIndex. This seems to be the quickest 
solution.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.patch, LUCENE-2064.txt

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1276#action_1276
 ] 

Mark Miller commented on LUCENE-2064:
-

As I said, thinking about it, I don't think we can end up fixing it in a better 
way. We can't force older impls out there to implement what we need - sure we 
can fix it in core easily enough, but it's a real hassle to do this in another way 
that doesn't require outside multitermquery impls to change - we are going to 
have to fall back to this anyway with any current plans. So might as well nix 
those plans for now. I'd prefer our "futurebetterhighlighter" prompt any 
changes that require so much hassle. It's probably best just to stick with this 
method.

I'd just make it so the rest of the IndexReader methods act as if the thing is 
empty - letting it throw a null pointer exception and catching it makes those 
impls unhighlightable when they likely could be.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1274#action_1274
 ] 

Uwe Schindler commented on LUCENE-2064:
---

bq. I agree with the exception, but which reader are you talking about. 

You are right, there is no IR available when instantiating (Smart)FakeReader. 
So the options are to catch the NPE, to implement all methods of IR to return 
something valid, or to use an empty static final unmodifiable MemoryIndex as 
the delegate of the FakeReader.
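
The delegate option could look roughly like the sketch below. It rests on stated assumptions: SimpleReader, EmptyReader and DelegatingFakeReader are hypothetical types invented for illustration, and EmptyReader stands in for the empty MemoryIndex mentioned above. The fake intercepts only the call it cares about and forwards everything else to a shared, immutable empty reader, so no method can fail with a NullPointerException.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical two-method reader interface, for illustration only.
interface SimpleReader {
    List<String> terms(String field);
    int numDocs();
}

// A shared, immutable reader that behaves like an empty index.
final class EmptyReader implements SimpleReader {
    static final EmptyReader INSTANCE = new EmptyReader();

    private EmptyReader() {}

    @Override
    public List<String> terms(String field) {
        return Collections.emptyList();
    }

    @Override
    public int numDocs() {
        return 0;
    }
}

// Records the requested field in terms(); every call is answered by the empty delegate.
class DelegatingFakeReader implements SimpleReader {
    private final SimpleReader delegate = EmptyReader.INSTANCE;
    private String field;

    @Override
    public List<String> terms(String field) {
        this.field = field;
        return delegate.terms(field);
    }

    @Override
    public int numDocs() {
        return delegate.numDocs();
    }

    String capturedField() {
        return field;
    }
}
```

Because the delegate is immutable, a single static instance can be shared by every fake reader.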

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt

[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

2009-11-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1273#action_1273
 ] 

Jason Rutherglen commented on LUCENE-2047:
--

Also, if the per SR delete queue were implemented, we could expose
the callback, and allow users to delete by doc id, edit norms
(and in the future, update field caches) for a particular
IndexReader.  We'd pass the reader via a callback that resembles
IndexReaderWarmer, then deletes, norms updates, etc, could be
performed like they can be today with a non-readonly IR.

> IndexWriter should immediately resolve deleted docs to docID in 
> near-real-time mode
> ---
>
> Key: LUCENE-2047
> URL: https://issues.apache.org/jira/browse/LUCENE-2047
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs.  This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path.  And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.
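
The difference between buffering a delete and resolving it in the foreground can be sketched with a toy model. DeleteResolutionSketch and all of its methods are hypothetical and are not IndexWriter internals: a map plays the role of the postings and a BitSet plays the role of the deleted-docs bit vector. The point is only that terms resolved at delete time leave nothing for the flush (and therefore the reopen) to do.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; names and structure are assumptions, not Lucene code.
class DeleteResolutionSketch {
    private final Map<String, int[]> postings = new HashMap<>();
    private final List<String> bufferedTerms = new ArrayList<>();
    private final BitSet deletedDocs = new BitSet();

    void index(String term, int... docIds) {
        postings.put(term, docIds);
    }

    // Default path: remember the term; docIDs are resolved at flush time.
    void deleteBuffered(String term) {
        bufferedTerms.add(term);
    }

    // NRT-style path: readers are pooled, so resolve to docIDs right away.
    void deleteImmediately(String term) {
        resolve(term);
    }

    // Resolves all buffered terms and returns how many had to be resolved.
    int flushDeletes() {
        int resolved = bufferedTerms.size();
        for (String term : bufferedTerms) {
            resolve(term);
        }
        bufferedTerms.clear();
        return resolved;
    }

    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    private void resolve(String term) {
        int[] docs = postings.get(term);
        if (docs == null) {
            return;
        }
        for (int doc : docs) {
            deletedDocs.set(doc);
        }
    }
}
```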


[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1269#action_1269
 ] 

Simon Willnauer commented on LUCENE-2064:
-

As I stated above, I see this as a neat workaround until we fix this class / 
contrib eventually. It won't hurt performance or break any compatibility; it's 
hidden deep inside the abysses of Highlighter. Most importantly, it adds a little 
functionality to the highlighter component, which I believe a lot of people 
are still using.

I will add another patch tomorrow which catches the exception.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1264#action_1264
 ] 

Simon Willnauer commented on LUCENE-2064:
-

bq. Or simply use the passed in reader as delegate of FakeReader, then it will 
behave correctly for all methods.

I agree with the exception, but which reader are you talking about?

Btw. I was close to naming it SmartFakeReader :D

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1262#action_1262
 ] 

Mark Miller commented on LUCENE-2064:
-

I still think getFields on multitermquery is a better option than a Fieldable 
interface. But if we would drop back to this method anyway, I see no reason to 
do anything with field and multitermquery at all really - unless another use case 
prompts it.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt

[jira] Issue Comment Edited: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1256#action_1256
 ] 

Uwe Schindler edited comment on LUCENE-2064 at 11/14/09 12:12 AM:
--

One comment on the patch:
If a MTQ subclass does something special during rewrite or in its 
FilteredTermEnum and calls any other method of FakeReader, it throws an NPE. You 
should catch this exception and in this case fall back to extracting no terms.

*EDIT*

Or simply use the passed in reader as delegate of FakeReader, then it will 
behave correctly for all methods.
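
The catch-and-fall-back suggestion could be sketched as below. SafeTermExtraction and extractOrEmpty are hypothetical names, and the Supplier is an assumed stand-in for "rewrite the query against the fake reader and collect its terms"; none of this is taken from the patch. If the query touches a reader method the fake does not support, the highlighter simply gets no terms for that query.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only.
class SafeTermExtraction {
    // termExtractor stands in for rewriting the query against the fake reader.
    static Set<String> extractOrEmpty(Supplier<Set<String>> termExtractor) {
        try {
            return termExtractor.get();
        } catch (NullPointerException e) {
            // The query called something the fake reader cannot answer:
            // fall back to extracting no terms instead of failing.
            return Collections.emptySet();
        }
    }
}
```

The trade-off raised elsewhere in this thread applies: a query that trips the exception is never highlighted, whereas a delegate that behaves like an empty index might still let it work.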

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1256#action_1256
 ] 

Uwe Schindler commented on LUCENE-2064:
---

One comment on the patch:
If a MTQ subclass does something special during rewrite or in its 
FilteredTermEnum and calls any other method of FakeReader, it throws an NPE. You 
should catch this exception and in this case fall back to extracting no terms.

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt

[jira] Updated: (LUCENE-2064) Highlighter should support all MultiTermQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2064:
--

Summary: Highlighter should support all MultiTermQuery subclasses without 
casts  (was: Highlighter should support all MultiFieldQuery subclasses without 
casts)

Maybe we should add this patch for 3.0 to not break anything after upgrading to 
3.0. As it is completely internal to Highlighter, it would not break anything. 
Requiring a new method in 3.0, which should be 2.9-compatible and add no new 
functionality, would not be good.

In 3.1 we could add a Fieldable interface that defines getField and everybody 
(could) implement it. If not, we could still use this fallback.
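
The 3.1 idea could take a shape like the following sketch. The Fieldable interface here is the hypothetical one floated in this comment, declared locally for illustration. FieldLookup and the fallback function are likewise assumptions, with the function standing in for the fake-reader approach. Queries that opt in are asked directly; all others go through the fallback.

```java
import java.util.function.Function;

// Hypothetical interface as proposed in the comment above; declared here for illustration.
interface Fieldable {
    String getField();
}

// Hypothetical helper showing the "interface first, fallback otherwise" order.
class FieldLookup {
    // fallback stands in for the fake-reader approach used for older implementations.
    static String fieldOf(Object query, Function<Object, String> fallback) {
        if (query instanceof Fieldable) {
            return ((Fieldable) query).getField();
        }
        return fallback.apply(query);
    }
}
```

With this arrangement, existing query implementations keep working unchanged, and implementing the interface is optional.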

> Highlighter should support all MultiTermQuery subclasses without casts
> --
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt
>
>
> In order to support MultiTermQuery subclasses the Highlighter component 
> applies instanceof checks for concrete classes from the lucene core. This 
> prevents classes like RegexQuery in contrib from being supported. Introducing 
> dependencies on other contribs is not feasible just for being supported by 
> the highlighter.
> While the instanceof checks and subsequent casts will hopefully go away at 
> some point, to support more multi-term queries now I have an alternative 
> approach: a fake IndexReader that uses a RewriteMethod to force the MTQ to 
> pass the field name to the given reader without doing any real work. It is 
> easier to explain once you see the patch - I will upload it shortly.


[jira] Updated: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader... readers)

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1589:
--

Summary: IndexWriter.addIndexesNoOptimize(IndexReader... readers)  (was: 
IndexWriter.addIndexesNoOptimize(IndexReader[] readers))

Updated for Java 5 and consistency in IndexWriter

> IndexWriter.addIndexesNoOptimize(IndexReader... readers)
> 
>
> Key: LUCENE-1589
> URL: https://issues.apache.org/jira/browse/LUCENE-1589
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1589.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Similar to IndexWriter.addIndexesNoOptimize(Directory[] dirs)
> but for IndexReaders. This will be used to flush cloned ram
> indexes to disk for near realtime indexing.


[jira] Updated: (LUCENE-2052) Scan method signatures and add varargs where possible

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2052:
--

Attachment: LUCENE-2052.patch

Updated patch, as there may be a backwards compatibility problem (but only if 
you recompile!) if you try to override a varargs method with an array param 
(which is not possible). Because of that, I removed the varargs from 
docFreq(Term[]) again.

Also added an entry to CHANGES.txt.
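
As a rough illustration of why changing an array parameter to varargs is source-compatible for callers, here is a minimal sketch. VarargsDemo and totalLength are hypothetical names, not Lucene code; the subclass-override case that the comment above warns about is deliberately not exercised here.

```java
// Hypothetical stand-in for an API method; this is not Lucene code.
final class VarargsDemo {

    // Previously this could have been declared as totalLength(String[] terms).
    // With varargs the parameter is still a String[] inside the method body.
    static int totalLength(String... terms) {
        int total = 0;
        for (String term : terms) {
            total += term.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] existing = {"lucene", "java"};
        // An old call site that passes an array still compiles unchanged.
        System.out.println(totalLength(existing));
        // New call sites may pass the elements directly, or nothing at all.
        System.out.println(totalLength("lucene", "java"));
        System.out.println(totalLength());
    }
}
```

Subclasses that redeclare such a method are the sensitive part, which is presumably why a method meant to be overridden, like docFreq, was left with its array parameter.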

> Scan method signatures and add varargs where possible
> -
>
> Key: LUCENE-2052
> URL: https://issues.apache.org/jira/browse/LUCENE-2052
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-2052.patch, LUCENE-2052.patch
>
>
> I changed a lot of signatures, but there may be more. The important ones like 
> MultiReader and MultiSearcher are already done. This applies also to contrib. 
> Varargs are no backwards break, they stay arrays as before.


[jira] Updated: (LUCENE-1488) multilingual analyzer based on icu

2009-11-13 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1488:


Lucene Fields: [New, Patch Available]  (was: [New])
Fix Version/s: 3.1
 Assignee: Robert Muir
   Issue Type: New Feature  (was: Wish)
  Summary: multilingual analyzer based on icu  (was: issues with 
standardanalyzer on multilingual text)

setting a fix version, setting a correct description of the issue

> multilingual analyzer based on icu
> --
>
> Key: LUCENE-1488
> URL: https://issues.apache.org/jira/browse/LUCENE-1488
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: ICUAnalyzer.patch, LUCENE-1488.patch, LUCENE-1488.patch, 
> LUCENE-1488.patch, LUCENE-1488.txt, LUCENE-1488.txt
>
>
> The standard analyzer in lucene is not exactly unicode-friendly with regards 
> to breaking text into words, especially with respect to non-alphabetic 
> scripts.  This is because it is unaware of unicode bounds properties.
> I actually couldn't figure out how the Thai analyzer could possibly be 
> working until i looked at the jflex rules and saw that codepoint range for 
> most of the Thai block was added to the alphanum specification. defining the 
> exact codepoint ranges like this for every language could help with the 
> problem but you'd basically be reimplementing the bounds properties already 
> stated in the unicode standard. 
> in general it looks like this kind of behavior is bad in lucene for even 
> latin, for instance, the analyzer will break words around accent marks in 
> decomposed form. While most latin letter + accent combinations have composed 
> forms in unicode, some do not. (this is also an issue for asciifoldingfilter 
> i suppose). 
> I've got a partially tested standardanalyzer that uses icu Rule-based 
> BreakIterator instead of jflex. Using this method you can define word 
> boundaries according to the unicode bounds properties. After getting it into 
> some good shape I'd be happy to contribute it for contrib but I wonder if 
> there's a better solution so that out of box lucene will be more friendly to 
> non-ASCII text. Unfortunately it seems jflex does not support use of these 
> properties such as [\p{Word_Break = Extend}] so this is probably the major 
> barrier.
> Thanks,
> Robert
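
To make the boundary-based approach in the description concrete, here is a minimal sketch of word segmentation driven by a break iterator instead of hand-written codepoint ranges. It assumes the JDK's java.text.BreakIterator as a stand-in for the ICU rule-based BreakIterator named in the issue; it is not the patch's code, and WordBreakDemo and words are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: uses the JDK break iterator, not ICU and not the attached patch.
final class WordBreakDemo {

    /** Splits text at word boundaries and keeps segments containing a letter or digit. */
    static List<String> words(String text) {
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ROOT);
        boundaries.setText(text);
        List<String> result = new ArrayList<String>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (containsLetterOrDigit(segment)) {
                result.add(segment);
            }
        }
        return result;
    }

    // Punctuation and whitespace segments are reported too, so they are filtered out.
    private static boolean containsLetterOrDigit(String segment) {
        for (int i = 0; i < segment.length(); ) {
            int codePoint = segment.codePointAt(i);
            if (Character.isLetterOrDigit(codePoint)) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(words("hello, world"));
    }
}
```

The point of the design is that the boundary rules live in the iterator, so no per-language codepoint ranges need to be maintained in the tokenizer grammar.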


[jira] Updated: (LUCENE-2052) Scan method signatures and add varargs where possible

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2052:
--

Attachment: LUCENE-2052.patch

Here is a patch that adds more varargs where it makes sense (e.g. the 
MultiSearcher ctor to pass Searchables, adding more than one sub query, merging 
boolean queries and so on - everywhere the array is not meant as an array but 
rather as an unlimited list of parameters).

If somebody finds anything in addition, speak up loudly!

> Scan method signatures and add varargs where possible
> -
>
> Key: LUCENE-2052
> URL: https://issues.apache.org/jira/browse/LUCENE-2052
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-2052.patch
>
>

Re: Junit4

2009-11-13 Thread Erick Erickson

OK thanks for adding me to the ACL. I'll have it tomorrow sometime. Does
anyone object to deprecating LuceneTestCase with notations to use
LuceneTestCaseJ4?

I tried two approaches, both work. Both allow you to use LuceneTestCaseJ4
rather than LuceneTestCase as a superclass, with the caveat you have to use
the proper annotations with the J4 variant.

The difference is that for one approach, I copied LuceneTestCase to
LuceneTestCaseJ4 and hacked. The other approach was extracting the meat of
LuceneTestCase to a common class, and using that class as a member of both
variants, delegating to avoid code duplication.

Personally, I think it'll be cleanest to just clone LuceneTestCase and NOT
extract to common. Eventually LuceneTestCase will fade away, enhancements
should be made to the J4 variant as needed. But if folks have strong
opinions, let me know.

Best
Erick

On Fri, Nov 13, 2009 at 5:02 PM, Chris Hostetter
wrote:

> : putting too many irons in the fire, especially non-critical ones. I don't
> : see a way to assign it to myself, either I'm missing something or I'm just
> : underprivileged, so if someone would go ahead and assign it to me I'll
> : work on it post 3.0.
>
> Jira's ACLs prevent issues from being assigned to people who aren't listed
> in the "Contributors" group.  The policy has been to add people to that
> list (for issue assignment) on request, so I hooked you up.
>
> (NOTE: if anyone else has issues they're actively working on and would
> like to be flagged as a "Contributor" in Jira so that the issues can be
> assigned directly to you for tracking purposes, please speak up)
>
>
>
> -Hoss
>
>

[jira] Commented: (LUCENE-1257) Port to Java5

2009-11-13 Thread Kay Kay (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1231#action_1231
 ] 

Kay Kay commented on LUCENE-1257:
-

|  Further updates of tests and internal APIs may follow for 3.1 in a new issue

LUCENE-2065 is now in place for 3.1 to address the remaining changes.

> Port to Java5
> -
>
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis, Examples, Index, Other, Query/Scoring, 
> QueryParser, Search, Store, Term Vectors
>Affects Versions: 3.0
>Reporter: Cédric Champeau
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: instantiated_fieldable.patch, 
> LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
> LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
> LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
> LUCENE-1257-CompoundFileReaderWriter.patch, 
> LUCENE-1257-ConcurrentMergeScheduler.patch, 
> LUCENE-1257-DirectoryReader.patch, 
> LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
> LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
> LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, 
> LUCENE-1257-IndexDeleter.patch, 
> LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
> LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, 
> LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
> LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
> LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_contrib_ant.patch, 
> LUCENE-1257_contrib_benchmark.patch, LUCENE-1257_contrib_benchmark_2.patch, 
> LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_contrib_memory.patch, 
> LUCENE-1257_contrib_misc.patch, LUCENE-1257_contrib_smartcn.patch, 
> LUCENE-1257_heavy.patch, LUCENE-1257_heavy.patch, 
> LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_lucil.patch, 
> LUCENE-1257_lucli.patch, LUCENE-1257_messages.patch, 
> LUCENE-1257_more_unnecessary_casts.patch, 
> LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, 
> LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_demo.patch, 
> LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, 
> LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_precendence_parser.patch, LUCENE-1257_queryParser_jj.patch, 
> LUCENE-1257_swing_wikipedia_wordnet_xmlqp.patch, 
> LUCENE-1257_unnecessary_casts.patch, LUCENE-1257_unnnecessary_casts_2.patch, 
> lucene1257surround1.patch, lucene1257surround1.patch, 
> shinglematrixfilter_generified.patch
>
>
> For my needs I've updated Lucene so that it uses Java 5 constructs. I know 
> Java 5 migration had been planned for 2.1 at some point in the past, but I 
> don't know when it is planned now. This patch against the trunk includes:
> - most obvious generics usage (there are tons of usages of sets, ... Those 
> which are commonly used have been generified)
> - PriorityQueue generification
> - replacement of indexed for loops with for-each constructs
> - removal of unnecessary unboxing
> In my opinion the code is much more readable with those features (you 
> actually *know* what is stored in collections when reading the code, without 
> needing to look up field definitions every time) and it simplifies many 
> algorithms.
> Note that this patch also includes an interface for the Query class. This has 
> been done for my company's needs for building custom Query classes which add 
> some behaviour to the base Lucene queries. It prevents multiple unnecessary 
> casts. I know this introduction is not wanted by the team, but it really 
> makes our developments easier to maintain. If you don't want to use this, 
> replace all /Queriable/ calls with standard /Query/.
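
For readers unfamiliar with the constructs listed in the description, the following is a small before/after sketch of the kind of change it describes (generics plus for-each replacing a raw collection, an indexed loop and a cast). Java5PortDemo and its methods are hypothetical names and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; not code from the LUCENE-1257 patches.
final class Java5PortDemo {

    // Pre-Java 5 style: raw List, indexed loop, explicit cast on every element.
    @SuppressWarnings("rawtypes")
    static int totalLengthRaw(List terms) {
        int total = 0;
        for (int i = 0; i < terms.size(); i++) {
            String term = (String) terms.get(i);
            total += term.length();
        }
        return total;
    }

    // Java 5 style: the element type is visible in the signature, no cast needed.
    static int totalLength(List<String> terms) {
        int total = 0;
        for (String term : terms) {
            total += term.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<String>();
        terms.add("index");
        terms.add("search");
        System.out.println(totalLengthRaw(terms));
        System.out.println(totalLength(terms));
    }
}
```

Both methods compute the same value; the generified one simply moves the type check from run time to compile time.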


[jira] Created: (LUCENE-2065) Java 5 port phase II

2009-11-13 Thread Kay Kay (JIRA)

Java 5 port phase II 
-

 Key: LUCENE-2065
 URL: https://issues.apache.org/jira/browse/LUCENE-2065
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1
 Environment: Java 5 
Reporter: Kay Kay


LUCENE-1257 addresses the public API changes (mainly generics) and other 
j.u.c. package changes related to the API. Those changes are frozen and closed 
for 3.0. This is a placeholder JIRA issue for versions after 3.0 to address the 
pending changes (tests for generics etc.) and any other internal API changes 
as necessary.


[jira] Commented: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1217#action_1217
 ] 

Mark Miller commented on LUCENE-2064:
-

It is clever - personally I think I'd prefer the getFields method - this is 
kind of a hack to get the field - though a pretty clever hack. I suppose we 
could make the argument that this can tide us over - but it will only take a 
couple of minutes to add getFields as well.

I think Simon may argue that this will work in more cases by default - whereas 
external queries would have to implement the getFields method. Which is a good 
point. I would still prefer something cleaner, but perhaps that makes this 
worth it nonetheless. It would probably make sense to fall back to this if 
getFields returned an empty set anyway - which almost makes it not even worth 
doing getFields, as things don't get any cleaner ...

We definitely want the MultiTermQuery clone - that's for sure - Uwe recently 
mentioned that as well and I'd been meaning to get around to it myself.

> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt
>
>

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1214#action_1214
 ] 

Uwe Schindler commented on LUCENE-2064:
---

bq. I think there was some discussion on the list about allowing a getField() 
or something for multitermqueries, but in the meantime, this patch will work 
for now, and it's internal to the highlighter so it's not like it would have to 
be deprecated later.

We discussed that at ApacheCon. My idea was a Fieldable interface that 
provides getField, so the highlighter only has to check instanceof Fieldable. 
And cloning is really the simplest way to create a copy of an MTQ, as *all* 
Query instances are cloneable (because Query implements Cloneable).
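
The interface idea in this comment can be sketched roughly as follows. All types here are hypothetical and self-contained; the comment calls the interface Fieldable, while the sketch uses the made-up name HasField to avoid suggesting it is an existing Lucene type. Returning null stands for "use the fallback".

```java
// Hypothetical types for illustration only; none of these are Lucene classes.
interface HasField {
    String getField();
}

abstract class SketchQuery {
}

// A query from some other module that opts in by implementing the interface.
final class SketchRegexQuery extends SketchQuery implements HasField {
    private final String field;

    SketchRegexQuery(String field) {
        this.field = field;
    }

    public String getField() {
        return field;
    }
}

// A query that does not expose its field.
final class SketchOpaqueQuery extends SketchQuery {
}

final class FieldLookup {
    /** Returns the field if the query exposes one, or null so the caller can fall back. */
    static String fieldOf(SketchQuery query) {
        if (query instanceof HasField) {
            return ((HasField) query).getField();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fieldOf(new SketchRegexQuery("title")));
        System.out.println(fieldOf(new SketchOpaqueQuery()));
    }
}
```

With this shape the caller checks one interface instead of a list of concrete classes, so queries in other modules can take part without the highlighter depending on them.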

> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt
>
>

[jira] Issue Comment Edited: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777686#action_12777686
 ] 

Uwe Schindler edited comment on LUCENE-2064 at 11/13/09 10:42 PM:
--

This is cool.

Highlighter & MTQ is broken in 2.9.1. This patch looks completely broken, but 
it isn't - and my mind was also broken when I first saw the patch - because of 
that. This patch is cooler than all the heavy committing during ApacheCon.

+1 for 3.0 with this patch. That was what I wanted to say with my complete 
nonsense comment.

  was (Author: thetaphi):
This is cool.

Highlighter & MTQ is broken in 2.9.1. This patch looks completely broken - and 
my mind was also broken when I first saw the patch - because of that. This 
patch is cooler than all the heavy committing during ApacheCon.

+1 for 3.0 with this patch. That was what I wanted to say with my complete 
nonsense comment.
  
> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt
>
>

[jira] Updated: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2064:
--

Comment: was deleted

(was: This is cool.

Highlighter & MTQ is broken in 2.9.1 - why not put this into 3.0 it is not more 
broken than highlighter at all, but it would prevent people from more broken in 
3.0.

Thinking about this patch, but it should really make it into 3.0, because it 
would help people, that migrated to 3.0, and highlighter would be broken more 
without it.

+1 for 3.0)

> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt
>
>

[jira] Updated: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2064:
--

Fix Version/s: (was: 3.1)
   3.0

Sorry, I was somehow like completely drunk; I laughed so much about this 
patch! Only an UnexpectedSuccessException is missing from it.

> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.0
>
> Attachments: LUCENE-2064.txt
>
>

Re: Junit4

2009-11-13 Thread Chris Hostetter

: putting too many irons in the fire, especially non-critical ones. I don't
: see a way to assign it to myself, either I'm missing something or I'm just
: underprivileged, so if someone would go ahead and assign it to me I'll
: work on it post 3.0.

Jira's ACLs prevent issues from being assigned to people who aren't listed 
in the "Contributors" group.  The policy has been to add people to that 
list (for issue assignment) on request, so I hooked you up.

(NOTE: if anyone else has issues they're actively working on and would 
like to be flagged as a "Contributor" in Jira so that the issues can be 
assigned directly to you for tracking purposes, please speak up)



-Hoss



[jira] Commented: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777686#action_12777686
 ] 

Uwe Schindler commented on LUCENE-2064:
---

This is cool.

Highlighter & MTQ is broken in 2.9.1. This patch looks completely broken - and 
my mind was also broken when I first saw the patch - because of that. This 
patch is cooler than all the heavy committing during ApacheCon.

+1 for 3.0 with this patch. That was what I wanted to say with my complete 
nonsense comment.

> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2064.txt
>
>

[jira] Assigned: (LUCENE-2037) Allow Junit4 tests in our environment.

2009-11-13 Thread Hoss Man (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned LUCENE-2037:


Assignee: Erick Erickson

It's all yours Erick.

> Allow Junit4 tests in our environment.
> --
>
> Key: LUCENE-2037
> URL: https://issues.apache.org/jira/browse/LUCENE-2037
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Affects Versions: 3.1
> Environment: Development
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Minor
> Fix For: 3.1
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate 
> Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should 
> have to be rewritten. We should start this for the 3.1 release so we can get 
> a clean 3.0 out smoothly.
> It's probably worthwhile to convert a small set of tests as an exemplar.


[jira] Commented: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777684#action_12777684
 ] 

Robert Muir commented on LUCENE-2064:
-

Uwe, I agree, I think you should set it to 3.0.

I think there was some discussion on the list about allowing a getField() or 
something for multitermqueries, but in the meantime, this patch will work for 
now, and it's internal to the highlighter so it's not like it would have to be 
deprecated later.


> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2064.txt
>
>

[jira] Commented: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777682#action_12777682
 ] 

Uwe Schindler commented on LUCENE-2064:
---

This is cool.

Highlighter & MTQ is broken in 2.9.1, so why not put this into 3.0? The patch 
does not make the highlighter any more broken than it already is, and it would 
keep people from running into more breakage in 3.0.

I am still thinking about this patch, but it should really make it into 3.0, 
because it would help people who migrate to 3.0; the highlighter would be more 
broken without it.

+1 for 3.0

> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2064.txt
>
>
> In order to support MultiTermQuery subclasses the Highlighter component 
> applies instanceof checks for concrete classes from the lucene core. This 
> prevents classes like RegexQuery in contrib from being supported. Introducing 
> dependencies on other contribs is not feasible just for being supported by 
> the highlighter.
> While the instanceof checks and subsequent casts might hopefully go away 
> somehow in the future, for supporting more multi-term queries I have an 
> alternative approach using a fake IndexReader that uses a RewriteMethod to 
> force the MTQ to pass the field name to the given reader without doing any 
> real work. It is easier to explain once you see the patch - I will upload 
> shortly.


[jira] Commented: (LUCENE-2039) Regex support and beyond in JavaCC QueryParser

2009-11-13 Thread Adriano Crestani (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777667#action_12777667
 ] 

Adriano Crestani commented on LUCENE-2039:
--

{quote}
This is pretty much what I suggested above. We can extend the queryparser 
without breaking the backwards compatibility just by adding some code which is 
aware of the fieldname scheme. Even this could be extendable. FieldNames are 
terms and therefore they can not contain unescaped special chars like : { ] ... 
I would not even hard code the separator into the query parser but have the 
field name processed by something pluggable. So If somebody wants to have a 
regex extension they could use re\:field: or re\:: or re_field:
Escaping a field is easy, just like you would do it with a term.
More interesting is that we do not change any syntax or add any special 
character, yet we can ship a default implementation for extensions. This could 
be a whole API which takes care of creating and escaping the field name, 
building the query once it is passed to the extension, etc.
In a first step we can resolve the extension; the second step calls the 
extension and builds the query. If no extension is registered the query parser 
works like in previous versions, so it is all up to the user.
{quote}

+1 :)

{quote}
I agree with you that we should wrap the information in a class so that we do 
not need to change the method signature if something has to be changed in the 
future. Instead we just add a new member to the wrapper though.
{quote}

A Map should solve this problem
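
The two-step lookup described in the first quote could be sketched roughly as
below. All names (ParserExtension, ExtensionRegistry, RegexExtension) are
hypothetical, queries are modelled as plain strings, and the '_' delimiter is
just one of the schemes mentioned above (re_field:); this is not the
QueryParser API or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the QueryParser API.
interface ParserExtension {
    // Builds a query (modelled here as a plain string) from the field and raw text.
    String parse(String field, String text);
}

class RegexExtension implements ParserExtension {
    public String parse(String field, String text) {
        return "RegexQuery(" + field + ":" + text + ")";
    }
}

class ExtensionRegistry {
    private final Map<String, ParserExtension> extensions =
            new HashMap<String, ParserExtension>();
    private final char delimiter; // pluggable, not hard coded into the parser

    ExtensionRegistry(char delimiter) {
        this.delimiter = delimiter;
    }

    void register(String key, ParserExtension extension) {
        extensions.put(key, extension);
    }

    // Step 1: resolve the extension from the field name.
    // Step 2: call it. With no match, behave like the unextended parser.
    String buildQuery(String fieldName, String text) {
        int split = fieldName.indexOf(delimiter);
        if (split > 0) {
            ParserExtension extension = extensions.get(fieldName.substring(0, split));
            if (extension != null) {
                return extension.parse(fieldName.substring(split + 1), text);
            }
        }
        return "TermQuery(" + fieldName + ":" + text + ")";
    }
}
```

With "re" registered, buildQuery("re_title", "luc.*") is routed to the
extension, while a field such as "my_field" with no "my" extension falls
through to the default path, so existing queries keep working.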

> Regex support and beyond in JavaCC QueryParser
> --
>
> Key: LUCENE-2039
> URL: https://issues.apache.org/jira/browse/LUCENE-2039
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Reporter: Simon Willnauer
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2039.patch
>
>
> Since the early days the standard query parser has been limited to the queries 
> living in core; adding other queries or extending the parser in any way 
> always forced people to change the grammar file and regenerate. Even if you 
> change the grammar you have to be extremely careful how you modify the parser 
> so that other parts of the standard parser are not affected by customisation 
> changes. Eventually you had to live with all the limitations the current 
> parser has, like tokenizing on whitespace before a tokenizer / analyzer has 
> the chance to look at the tokens. 
> I was thinking about how to overcome the limitation and add regex support to 
> the query parser without introducing any dependency to core. I added a new 
> special character that basically prevents the parser from interpreting any of 
> the characters enclosed in the new special characters. I chose the forward 
> slash '/' as the delimiter so that everything in between two forward slashes 
> is basically escaped and ignored by the parser. All chars embedded within 
> forward slashes are treated as one token even if it contains other special 
> chars like * []?{} or whitespace. This token is subsequently passed to a 
> pluggable "parser extension" which builds a query from the embedded string. I 
> do not interpret the embedded string in any way but leave all the subsequent 
> work to the parser extension. Such an extension could be another full 
> featured query parser itself or simply a ctor call for regex query. The 
> interface remains quite simple but makes the parser extensible in an easy way 
> compared to modifying the javaCC sources.
> The downside of this patch is clearly that I introduce a new special char 
> into the syntax, but I guess that would not be that much of a deal as it is 
> reflected in the escape method. It would truly be nice to have more 
> than one extension and have this even more flexible, so treat this patch as a 
> kickoff.
> Another way of solving the problem with RegexQuery would be to move the JDK 
> version of regex into the core and simply have another method like:
> {code}
> protected Query newRegexQuery(Term t) {
>   ... 
> }
> {code}
> which I would like better as it would be more consistent with the idea of the 
> query parser to be a very strict and defined parser.
> I will upload a patch in a second which implements the extension based 
> approach. I guess I will add a second patch with regex in core soon too.


[jira] Commented: (LUCENE-2063) Use thread pool in ConcurrentMergeScheduler

2009-11-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777666#action_12777666
 ] 

Jason Rutherglen commented on LUCENE-2063:
--

Right, this isn't the highest priority; however, try running
LUCENE-1313 and you'll see a lot of unique thread objects being
created because of the small segment merges happening quickly on
the RAMDir. 

It can also potentially be used for LUCENE-2047, or a follow on
async patch.

> Use thread pool in ConcurrentMergeScheduler
> ---
>
> Key: LUCENE-2063
> URL: https://issues.apache.org/jira/browse/LUCENE-2063
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 3.1
>
>
> Currently it looks like CMS creates a new thread object for each
> merge, which may not be expensive anymore on Java5+ JVMs,
> however we can fairly simply implement the Java5 thread pooling.
> Also I'm thinking we may be interested in using thread pools for
> other tasks in IndexWriter (such as LUCENE-2047 performing
> deletes in the background). 


[jira] Updated: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2064:


Attachment: LUCENE-2064.txt

This is the patch. Please let me know if I missed something, especially related 
to the removed copyMultiTermQuery method, which I replaced with a clone call. 
All tests pass.

 

> Highlighter should support all MultiFieldQuery subclasses without casts
> ---
>
> Key: LUCENE-2064
> URL: https://issues.apache.org/jira/browse/LUCENE-2064
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2064.txt
>
>
> In order to support MultiTermQuery subclasses the Highlighter component 
> applies instanceof checks for concrete classes from the lucene core. This 
> prevents classes like RegexQuery in contrib from being supported. Introducing 
> dependencies on other contribs is not feasible just for being supported by 
> the highlighter.
> While the instanceof checks and subsequent casts might hopefully go away 
> somehow in the future, for supporting more multi-term queries I have an 
> alternative approach using a fake IndexReader that uses a RewriteMethod to 
> force the MTQ to pass the field name to the given reader without doing any 
> real work. It is easier to explain once you see the patch - I will upload 
> shortly.


[jira] Created: (LUCENE-2064) Highlighter should support all MultiFieldQuery subclasses without casts

2009-11-13 Thread Simon Willnauer (JIRA)

Highlighter should support all MultiFieldQuery subclasses without casts
---

 Key: LUCENE-2064
 URL: https://issues.apache.org/jira/browse/LUCENE-2064
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 2.9.1
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.1


In order to support MultiTermQuery subclasses the Highlighter component applies 
instanceof checks for concrete classes from the lucene core. This prevents 
classes like RegexQuery in contrib from being supported. Introducing 
dependencies on other contribs is not feasible just for being supported by the 
highlighter.

While the instanceof checks and subsequent casts might hopefully go away 
somehow in the future, for supporting more multi-term queries I have an 
alternative approach using a fake IndexReader that uses a RewriteMethod to 
force the MTQ to pass the field name to the given reader without doing any real 
work. It is easier to explain once you see the patch - I will upload shortly.
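
Until the patch is attached, here is a rough illustration of the fake-reader
idea. It uses hypothetical stand-in types (TermSource, MultiTermLikeQuery,
PrefixLikeQuery, FieldCapturingReader, FieldNameProbe); none of them are
Lucene classes, and the real patch may look quite different. The sketch only
shows how a probe reader can learn a query's field name by recording what the
query asks for during rewrite, with no instanceof checks or casts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins, NOT Lucene classes: they only model the idea.
interface TermSource {
    // Returns the terms of a field; a real reader would walk the index.
    List<String> terms(String field);
}

abstract class MultiTermLikeQuery {
    // Each subclass knows its own field and asks the reader for terms on rewrite.
    abstract void rewrite(TermSource reader);
}

class PrefixLikeQuery extends MultiTermLikeQuery {
    private final String field;
    private final String prefix;
    private int matches; // number of terms matched during the last rewrite

    PrefixLikeQuery(String field, String prefix) {
        this.field = field;
        this.prefix = prefix;
    }

    @Override
    void rewrite(TermSource reader) {
        matches = 0;
        for (String term : reader.terms(field)) {
            if (term.startsWith(prefix)) {
                matches++;
            }
        }
    }
}

// The "fake reader": records which field was requested and returns no terms,
// so the rewrite does no real work.
class FieldCapturingReader implements TermSource {
    private final List<String> requestedFields = new ArrayList<String>();

    public List<String> terms(String field) {
        requestedFields.add(field);
        return Collections.emptyList();
    }

    List<String> requestedFields() {
        return requestedFields;
    }
}

class FieldNameProbe {
    // Returns the first field the query touches, or null if it touches none.
    static String fieldOf(MultiTermLikeQuery query) {
        FieldCapturingReader probe = new FieldCapturingReader();
        query.rewrite(probe);
        List<String> fields = probe.requestedFields();
        return fields.isEmpty() ? null : fields.get(0);
    }
}
```

The point of the design is that the highlighter would only depend on the
abstract query type, so a contrib query such as RegexQuery would need no
special handling.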



[jira] Commented: (LUCENE-2063) Use thread pool in ConcurrentMergeScheduler

2009-11-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777645#action_12777645
 ] 

Simon Willnauer commented on LUCENE-2063:
-

One of the issues with this is that the code depends on Thread instead of 
Runnable / Callable. I have looked into this class too and figured that simply 
exchanging the merger threads with runnables would not work. 
A way of doing this is to subclass ExecutorService where we can expose the 
required functionality on a pooled basis.
Besides this, I'm not sure there is any benefit performance- or resource-wise, 
as merge threads are not needed as frequently as other threads. Thread pools 
make a lot of sense once you use threads frequently, but keeping those merge 
threads around might not be required. Not sure if this is really needed though.

simon

> Use thread pool in ConcurrentMergeScheduler
> ---
>
> Key: LUCENE-2063
> URL: https://issues.apache.org/jira/browse/LUCENE-2063
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 3.1
>
>
> Currently it looks like CMS creates a new thread object for each
> merge, which may not be expensive anymore on Java5+ JVMs,
> however we can fairly simply implement the Java5 thread pooling.
> Also I'm thinking we may be interested in using thread pools for
> other tasks in IndexWriter (such as LUCENE-2047 performing
> deletes in the background). 


[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

2009-11-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777638#action_12777638
 ] 

Jason Rutherglen commented on LUCENE-2047:
--

bq. It's strange that anything here is needed

I was obtaining the segment infos synced, had a small block of
unsynced code, then synced obtaining the sometimes defunct
readers. Fixed that part, then the errors went away!

bq. the sync(IW) is in fact necessary? 

I'm hoping we can do the deletes unsynced, which will make this
patch a net performance gain because we're allowing multiple
threads to delete concurrently (whereas today we're performing
them synced at flush time, i.e. the current patch is merely
shifting the term/query lookup cost from flush to
deleteDocument).

bq. buffer the deleted docIDs into DW's deletesInRAM.docIDs

I'll need to step through this, as it's a little strange to me
how DW knows the doc id to cache for a particular SR, i.e. how
are they mapped to an SR? Oh there's the DW.remapDeletes method?
Hmm...

Couldn't we save off a per SR BV for the update doc rollback
case, merging the special updated doc BV into the SR's deletes
on successful flush, throwing them away on failure?  Memory is
less of a concern with the paged BV from the pending LUCENE-1526
patch.  On a delete by query with many hits, I'm concerned about
storing too many doc id Integers in BufferedDeletes. 

Without syncing, new deletes could arrive, and we'd need to
queue them, and apply them to new segments, or newly merged
segments because we're not locking the segments. Otherwise some
deletes could be lost. 

A possible solution is, deleteDocument would synchronously add
the delete query/term to a queue per SR and return.
Asynchronously (i.e. in background threads) the deletes could be
applied. Merging would aggregate the incoming SR's queued
deletes (as they haven't been applied yet) into the merged
reader's delete queue. On flush we'd wait for these queued
deletes to be applied.  After flush, the queues would be clear
and we'd start over.  And because the delete queue is per reader,
it would be thrown away with the closed reader. 
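
The queue-per-reader proposal above could be sketched roughly as follows.
ReaderDeleteQueue and its methods are hypothetical names, not IndexWriter or
SegmentReader APIs, and the background thread is left out: applyPending()
stands for the work such a thread would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of "a delete queue per SR"; not IndexWriter code.
class ReaderDeleteQueue {
    private final ConcurrentLinkedQueue<String> pending =
            new ConcurrentLinkedQueue<String>();
    private final List<String> applied = new ArrayList<String>();

    // deleteDocument path: only enqueue the term and return.
    void enqueue(String term) {
        pending.add(term);
    }

    // Background path: drain queued terms; returns how many were applied.
    synchronized int applyPending() {
        int count = 0;
        String term;
        while ((term = pending.poll()) != null) {
            applied.add(term); // a real reader would mark matching docIDs deleted
            count++;
        }
        return count;
    }

    // Merge path: deletes not yet applied on the source readers carry over
    // to the queue of the merged reader.
    static ReaderDeleteQueue merge(ReaderDeleteQueue a, ReaderDeleteQueue b) {
        ReaderDeleteQueue merged = new ReaderDeleteQueue();
        merged.pending.addAll(a.pending);
        merged.pending.addAll(b.pending);
        return merged;
    }

    // Flush path: apply whatever is still queued so no delete is lost.
    synchronized int flush() {
        return applyPending();
    }

    int pendingCount() {
        return pending.size();
    }

    synchronized int appliedCount() {
        return applied.size();
    }
}
```

Because the queue lives on the reader object, dropping a closed reader also
drops its queue, which matches the last point above.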

> IndexWriter should immediately resolve deleted docs to docID in 
> near-real-time mode
> ---
>
> Key: LUCENE-2047
> URL: https://issues.apache.org/jira/browse/LUCENE-2047
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs.  This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path.  And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.


[jira] Created: (LUCENE-2063) Use thread pool in ConcurrentMergeScheduler

2009-11-13 Thread Jason Rutherglen (JIRA)

Use thread pool in ConcurrentMergeScheduler
---

 Key: LUCENE-2063
 URL: https://issues.apache.org/jira/browse/LUCENE-2063
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.9.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 3.1


Currently it looks like CMS creates a new thread object for each
merge, which may not be expensive anymore on Java5+ JVMs,
however we can fairly simply implement the Java5 thread pooling.
Also I'm thinking we may be interested in using thread pools for
other tasks in IndexWriter (such as LUCENE-2047 performing
deletes in the background). 
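
As a minimal sketch of what Java 5 thread pooling for merges might look like,
assuming merges can be expressed as Runnable tasks (which is not how CMS is
structured today): PooledMergeRunner is a hypothetical name, and only the
java.util.concurrent calls are real APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not ConcurrentMergeScheduler's code.
class PooledMergeRunner {
    private final ExecutorService pool;
    private final AtomicInteger completed = new AtomicInteger();

    PooledMergeRunner(int maxThreads) {
        // Worker threads are reused across merges instead of created per merge.
        this.pool = Executors.newFixedThreadPool(maxThreads);
    }

    // Queues one merge; it runs as soon as a pooled thread is free.
    void submitMerge(final Runnable merge) {
        pool.execute(new Runnable() {
            public void run() {
                try {
                    merge.run();
                } finally {
                    completed.incrementAndGet();
                }
            }
        });
    }

    // Stops accepting merges, waits up to the timeout for queued and running
    // ones, and returns how many have finished.
    int close(long timeoutSeconds) {
        pool.shutdown();
        try {
            pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return completed.get();
    }
}
```

The same pool could in principle be handed other IndexWriter background work,
such as the deletes discussed in LUCENE-2047.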


[jira] Created: (LUCENE-2062) Bulgarian Analyzer

2009-11-13 Thread Robert Muir (JIRA)

Bulgarian Analyzer
--

 Key: LUCENE-2062
 URL: https://issues.apache.org/jira/browse/LUCENE-2062
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Fix For: 3.1
 Attachments: LUCENE-2062.patch

Someone asked about Bulgarian analysis on solr-user today... 
http://www.lucidimagination.com/search/document/e1e7a5636edb1db2/non_english_languages
I was surprised we did not have anything.

This analyzer implements the algorithm specified here: 
http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf

In the measurements there, this improves MAP by approx. 34%.



[jira] Updated: (LUCENE-2062) Bulgarian Analyzer

2009-11-13 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2062:


Attachment: LUCENE-2062.patch

> Bulgarian Analyzer
> --
>
> Key: LUCENE-2062
> URL: https://issues.apache.org/jira/browse/LUCENE-2062
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2062.patch
>
>
> Someone asked about Bulgarian analysis on solr-user today... 
> http://www.lucidimagination.com/search/document/e1e7a5636edb1db2/non_english_languages
> I was surprised we did not have anything.
> This analyzer implements the algorithm specified here: 
> http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf
> In the measurements there, this improves MAP by approx. 34%.


[jira] Resolved: (LUCENE-2024) "ant dist" no longer generates md5's for the top-level artifacts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2024.
---

Resolution: Fixed

Committed revision: 835889

Thanks to all!
At ApacheCon, the others told me to open an issue in INFRA to get an account; I 
could then also take over the clover update. I'll try.

> "ant dist" no longer generates md5's for the top-level artifacts
> 
>
> Key: LUCENE-2024
> URL: https://issues.apache.org/jira/browse/LUCENE-2024
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.9.1, 2.9.2, 3.0
>Reporter: Michael McCandless
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2024.patch, LUCENE-2024.patch
>
>
> Mark hit this for 2.9.0, and I just hit it again for 2.9.1.  It used to 
> work...


[jira] Commented: (LUCENE-2024) "ant dist" no longer generates md5's for the top-level artifacts

2009-11-13 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777530#action_12777530
 ] 

Michael McCandless commented on LUCENE-2024:


OK looks like hudson is using ant 1.7.1.  So we're good!

> "ant dist" no longer generates md5's for the top-level artifacts
> 
>
> Key: LUCENE-2024
> URL: https://issues.apache.org/jira/browse/LUCENE-2024
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.9.1, 2.9.2, 3.0
>Reporter: Michael McCandless
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2024.patch, LUCENE-2024.patch
>
>
> Mark hit this for 2.9.0, and I just hit it again for 2.9.1.  It used to 
> work...


[jira] Updated: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance

2009-11-13 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2061:
---

Attachment: LUCENE-2061.patch

Attached first cut python script nrtBench.py.

You have to edit the constants up top to point to both a Wiki XML
export and a Wiki line file.  It uses the XML export to build up the
base index, and then the line file to do the "live" indexing.

It first runs a baseline, redline searching with 9 (default) threads,
and reports the net qps.  (You'll have to write a queries.txt w/ the
queries to test).  Then it steps through NRT reopen rates of every
0.1, 1.0, 2.5, 5.0 seconds X indexing rate of 1, 10, 100, 1000 per sec
(using 2 indexing threads), and then redlines the search threads,
comparing their search throughput to the baseline.
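
To make the size of the sweep concrete, here is a small sketch of the run
matrix described above. NrtRunMatrix is a hypothetical name written in Java
for illustration; it is not taken from nrtBench.py, and the
percent-of-baseline formula is an assumption about how the comparison might
be reported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the run matrix; not code from nrtBench.py.
class NrtRunMatrix {
    static final double[] REOPEN_SECONDS = {0.1, 1.0, 2.5, 5.0};
    static final int[] DOCS_PER_SECOND = {1, 10, 100, 1000};

    // One entry per (reopen interval, indexing rate) combination.
    static List<String> runs() {
        List<String> runs = new ArrayList<String>();
        for (double reopen : REOPEN_SECONDS) {
            for (int rate : DOCS_PER_SECOND) {
                runs.add("reopen=" + reopen + "s index=" + rate + "/sec");
            }
        }
        return runs;
    }

    // Search throughput of an NRT run relative to the baseline, in percent.
    static double percentOfBaseline(double baselineQps, double nrtQps) {
        return 100.0 * nrtQps / baselineQps;
    }
}
```

Four reopen rates times four indexing rates gives 16 NRT runs on top of the
baseline run.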


> Create benchmark & approach for testing Lucene's near real-time performance
> ---
>
> Key: LUCENE-2061
> URL: https://issues.apache.org/jira/browse/LUCENE-2061
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-2061.patch
>
>
> With the improvements to contrib/benchmark in LUCENE-2050, it's now
> possible to create compelling algs to test indexing & searching
> throughput against a periodically reopened near-real-time reader from
> the IndexWriter.
> Coming out of the discussions in LUCENE-1526, I think to properly
> characterize NRT, we should measure net search throughput as a
> function of both reopen rate (ie how often you get a new NRT reader
> from the writer) and indexing rate.  We should also separately measure
> pure adds vs updates (deletes + adds); the latter is much more work
> for Lucene.
> This can help apps make capacity decisions... and can help us test
> performance of pending improvements for NRT (eg LUCENE-1313,
> LUCENE-2047).


[jira] Commented: (LUCENE-2024) "ant dist" no longer generates md5's for the top-level artifacts

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777519#action_12777519
 ] 

Uwe Schindler commented on LUCENE-2024:
---

bq. But: what version is installed on hudson?

Can you check this for me, I still have no hudson account... :-( Thanks!

> "ant dist" no longer generates md5's for the top-level artifacts
> 
>
> Key: LUCENE-2024
> URL: https://issues.apache.org/jira/browse/LUCENE-2024
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.9.1, 2.9.2, 3.0
>Reporter: Michael McCandless
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2024.patch, LUCENE-2024.patch
>
>
> Mark hit this for 2.9.0, and I just hit it again for 2.9.1.  It used to 
> work...


[jira] Created: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance

2009-11-13 Thread Michael McCandless (JIRA)

Create benchmark & approach for testing Lucene's near real-time performance
---

 Key: LUCENE-2061
 URL: https://issues.apache.org/jira/browse/LUCENE-2061
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor


With the improvements to contrib/benchmark in LUCENE-2050, it's now
possible to create compelling algs to test indexing & searching
throughput against a periodically reopened near-real-time reader from
the IndexWriter.

Coming out of the discussions in LUCENE-1526, I think to properly
characterize NRT, we should measure net search throughput as a
function of both reopen rate (ie how often you get a new NRT reader
from the writer) and indexing rate.  We should also separately measure
pure adds vs updates (deletes + adds); the latter is much more work
for Lucene.

This can help apps make capacity decisions... and can help us test
performance of pending improvements for NRT (eg LUCENE-1313,
LUCENE-2047).




[jira] Commented: (LUCENE-2024) "ant dist" no longer generates md5's for the top-level artifacts

2009-11-13 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777517#action_12777517
 ] 

Michael McCandless commented on LUCENE-2024:


bq. Mark/Mike, can you test this on your computers?

It worked!  And I also just verified on trunk "ant dist" fails to produce the 
sha1/md5 digests.
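
For reference, the digest files in question hold a hex checksum of each
artifact. A minimal sketch of computing one with the JDK's
java.security.MessageDigest follows; ArtifactDigest is a hypothetical helper
name, and this is not how the Ant build produces its files.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the build itself relies on Ant, not on this class.
class ArtifactDigest {
    // Returns the lowercase hex digest of the data for "MD5" or "SHA-1".
    static String hex(String algorithm, byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes print as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported digest: " + algorithm, e);
        }
    }
}
```

Comparing such a value against the published .md5 or .sha1 file is how a
downloader would verify an artifact.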

bq. Is this a problem? 1.7.0 is now also 2 years old and as we go to Java 1.5 
we could also raise our build requirements.

I think this is reasonable.

But: what version is installed on hudson?

> "ant dist" no longer generates md5's for the top-level artifacts
> 
>
> Key: LUCENE-2024
> URL: https://issues.apache.org/jira/browse/LUCENE-2024
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.9.1, 2.9.2, 3.0
>Reporter: Michael McCandless
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2024.patch, LUCENE-2024.patch
>
>
> Mark hit this for 2.9.0, and I just hit it again for 2.9.1.  It used to 
> work...


[jira] Updated: (LUCENE-2024) "ant dist" no longer generates md5's for the top-level artifacts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2024:
--

Attachment: LUCENE-2024.patch

The file output pattern needs ANT 1.7.0. Attached is a patch that raises the 
version number in BUILD.txt, too (the website also needs to be updated).

Is this a problem? 1.7.0 is now also 2 years old and as we go to Java 1.5 we 
could also raise our build requirements.

> "ant dist" no longer generates md5's for the top-level artifacts
> 
>
> Key: LUCENE-2024
> URL: https://issues.apache.org/jira/browse/LUCENE-2024
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.9.1, 2.9.2, 3.0
>Reporter: Michael McCandless
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2024.patch, LUCENE-2024.patch
>
>
> Mark hit this for 2.9.0, and I just hit it again for 2.9.1.  It used to 
> work...


[jira] Updated: (LUCENE-2050) Improve contrib/benchmark for testing near-real-time search performance

2009-11-13 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2050:
---

Attachment: LUCENE-2050.patch

Whoops, last patch was missing added files -- this one should fix it.

> Improve contrib/benchmark for testing near-real-time search performance
> ---
>
> Key: LUCENE-2050
> URL: https://issues.apache.org/jira/browse/LUCENE-2050
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2050.patch, LUCENE-2050.patch, LUCENE-2050.patch
>
>
> It's not easy to test NRT performance right now w/ contrib/benchmark.
> I've made some initial fixes to improve this:
>   * Added new '&', that can follow any task within a serial sequence,
> to "background" the task (just like a shell).  The test runs in
> the BG, and then at the end of all serial tasks, any still running
> BG tasks are stopped & joined.
>   * Added WaitTask that simply waits; useful for controlling how long
> the BG'd tasks get to run.
>   * Added RollbackIndex task, which is real handy for using a given
> index for an NRT test, doing a bunch of updates, then reverting it
> all so your next run uses the same starting index
>   * Fixed the existing NearRealTimeReaderTask to simply periodically
> open the new reader (previously it was also running a fixed
> search), and removed its own threading (since & can do that
> now). It periodically wakes up, opens the new reader, and swaps it
> into the PerfRunData, at the schedule you specify.  I switched all
> usage of PerfRunData's get/setIndexReader APIs to use ref
> counting.
> With these changes you can now make some very simple but powerful
> algs, eg:
> {code}
> OpenIndex
> {
>   NearRealtimeReader(0.5) &
>   # Warm
>   Search
>   { "Index1" AddDoc > : * : 100/sec &
>   [ { "Search" Search > : * ] : 4 &
>   Wait(30.0)
> }
> CloseReader
> RollbackIndex
> RepSumByName
> {code}
> This alg first opens the IndexWriter, then spawns the BG thread to
> reopen the NRT reader twice per second, does one warming Search (in
> the FG), spawns a new thread to index documents at the rate of 100 per
> second, then spawns 4 search threads that do as many searches as they
> can.  We then wait for 30 seconds, then stop all the threads, revert
> the index, and report.
> The patch is a work in progress -- it generally works, but there're a
> few nocommits, and, we may want to improve reporting (though I think
> that's a separate issue).
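
The "background, then stop & join" behaviour described for '&' can be
illustrated with plain threads. This is only a minimal sketch under stated
assumptions: BackgroundTask and SerialSequenceSketch are hypothetical names,
not the contrib/benchmark classes, and the loop body is a placeholder for
real work such as adding a document or running a search.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a benchmark task that was followed by '&'.
class BackgroundTask implements Runnable {
  private final AtomicBoolean stopRequested = new AtomicBoolean(false);
  private final AtomicLong iterations = new AtomicLong();
  private final CountDownLatch firstIteration = new CountDownLatch(1);
  private final Thread thread = new Thread(this);

  void startInBackground() {
    thread.start();
  }

  public void run() {
    // Keep working until the enclosing sequence asks this task to stop.
    while (!stopRequested.get()) {
      iterations.incrementAndGet(); // placeholder for "add a doc" or "search"
      firstIteration.countDown();
      Thread.yield();
    }
  }

  void awaitFirstIteration() {
    try {
      firstIteration.await();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(ie);
    }
  }

  void stopAndJoin() {
    stopRequested.set(true);
    try {
      thread.join();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(ie);
    }
  }

  long iterations() {
    return iterations.get();
  }

  boolean isAlive() {
    return thread.isAlive();
  }
}

// Hypothetical driver: starts the background tasks, runs a foreground step,
// then stops and joins whatever is still running.
class SerialSequenceSketch {
  static BackgroundTask[] run(int backgroundCount) {
    BackgroundTask[] tasks = new BackgroundTask[backgroundCount];
    for (int i = 0; i < tasks.length; i++) {
      tasks[i] = new BackgroundTask();
      tasks[i].startInBackground();
    }
    // Foreground step: stands in for Wait(30.0); here it only waits until
    // every background task has completed one iteration.
    for (BackgroundTask task : tasks) {
      task.awaitFirstIteration();
    }
    // End of the serial sequence: stop and join the background tasks.
    for (BackgroundTask task : tasks) {
      task.stopAndJoin();
    }
    return tasks;
  }
}
```

Using a stop flag plus join (rather than interrupting the threads) is a choice
made for this sketch only; the patch may handle shutdown differently.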


[jira] Updated: (LUCENE-2050) Improve contrib/benchmark for testing near-real-time search performance

2009-11-13 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2050:
---

Attachment: LUCENE-2050.patch

Patch attached.  I think it's ready to commit.

I also have a Python script that tests NRT performance, sequencing through 
combinations of reopen rate X indexing rate; I'll open a new issue for that.

> Improve contrib/benchmark for testing near-real-time search performance
> ---
>
> Key: LUCENE-2050
> URL: https://issues.apache.org/jira/browse/LUCENE-2050
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2050.patch, LUCENE-2050.patch


[jira] Created: (LUCENE-2060) CMS should default its maxThreadCount to 1 (not 3)

2009-11-13 Thread Michael McCandless (JIRA)

CMS should default its maxThreadCount to 1 (not 3)
--

 Key: LUCENE-2060
 URL: https://issues.apache.org/jira/browse/LUCENE-2060
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.0


From rough experience, I think the current default of 3 is too large.  I think 
we get the most bang for the buck going from 0 to 1.

I think this will especially impact optimize on an index with many segments -- 
in this case the MergePolicy happily exposes concurrency (multiple pending 
merges), and CMS will happily launch 3 threads to carry that out.


[jira] Updated: (LUCENE-2024) "ant dist" no longer generates md5's for the top-level artifacts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2024:
--

Attachment: LUCENE-2024.patch

Here is a patch that uses the most recent checksum task. I have to check what 
the minimum ant version is now. It is simpler now. I also added sha1 sums.

It is also faster than before. Mark/Mike, can you test this on your computers?

> "ant dist" no longer generates md5's for the top-level artifacts
> 
>
> Key: LUCENE-2024
> URL: https://issues.apache.org/jira/browse/LUCENE-2024
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.9.1, 2.9.2, 3.0
>Reporter: Michael McCandless
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2024.patch
>
>
> Mark hit this for 2.9.0, and I just hit it again for 2.9.1.  It used to 
> work...


[jira] Assigned: (LUCENE-2024) "ant dist" no longer generates md5's for the top-level artifacts

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2024:
-

Assignee: Uwe Schindler

> "ant dist" no longer generates md5's for the top-level artifacts
> 
>
> Key: LUCENE-2024
> URL: https://issues.apache.org/jira/browse/LUCENE-2024
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.9.1, 2.9.2, 3.0
>Reporter: Michael McCandless
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
>
> Mark hit this for 2.9.0, and I just hit it again for 2.9.1.  It used to 
> work...


[jira] Resolved: (LUCENE-1257) Port to Java5

2009-11-13 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1257.
---

Resolution: Fixed

Closed for 3.0. Further updates of tests and internal APIs may follow for 3.1 
in a new issue.

> Port to Java5
> -
>
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis, Examples, Index, Other, Query/Scoring, 
> QueryParser, Search, Store, Term Vectors
>Affects Versions: 3.0
>Reporter: Cédric Champeau
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: instantiated_fieldable.patch, 
> LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
> LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
> LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
> LUCENE-1257-CompoundFileReaderWriter.patch, 
> LUCENE-1257-ConcurrentMergeScheduler.patch, 
> LUCENE-1257-DirectoryReader.patch, 
> LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
> LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
> LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, 
> LUCENE-1257-IndexDeleter.patch, 
> LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
> LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, 
> LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
> LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
> LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_contrib_ant.patch, 
> LUCENE-1257_contrib_benchmark.patch, LUCENE-1257_contrib_benchmark_2.patch, 
> LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_contrib_memory.patch, 
> LUCENE-1257_contrib_misc.patch, LUCENE-1257_contrib_smartcn.patch, 
> LUCENE-1257_heavy.patch, LUCENE-1257_heavy.patch, 
> LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_lucil.patch, 
> LUCENE-1257_lucli.patch, LUCENE-1257_messages.patch, 
> LUCENE-1257_more_unnecessary_casts.patch, 
> LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, 
> LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_demo.patch, 
> LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, 
> LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_precendence_parser.patch, LUCENE-1257_queryParser_jj.patch, 
> LUCENE-1257_swing_wikipedia_wordnet_xmlqp.patch, 
> LUCENE-1257_unnecessary_casts.patch, LUCENE-1257_unnnecessary_casts_2.patch, 
> lucene1257surround1.patch, lucene1257surround1.patch, 
> shinglematrixfilter_generified.patch
>
>
> For my needs I've updated Lucene so that it uses Java 5 constructs. I know 
> Java 5 migration had been planned for 2.1 someday in the past, but don't know 
> when it is planned now. This patch against the trunk includes:
> - most obvious generics usage (there are tons of usages of sets, ... Those 
> which are commonly used have been generified)
> - PriorityQueue generification
> - replacement of indexed for loops with for-each constructs
> - removal of unnecessary unboxing
> The code is in my opinion much more readable with those features (you 
> actually *know* what is stored in collections reading the code, without the 
> need to look up field definitions every time) and it simplifies many 
> algorithms.
> Note that this patch also includes an interface for the Query class. This has 
> been done for my company's needs for building custom Query classes which add 
> some behaviour to the base Lucene queries. It prevents multiple unnecessary 
> casts. I know this introduction is not wanted by the team, but it really 
> makes our developments easier to maintain. If you don't want to use this, 
> replace all /Queriable/ calls with standard /Query/.
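
To make the kind of change listed above concrete, here is a small before/after
sketch. It is not taken from any of the attached patches; Java5PortSketch and
both method names are made up for illustration. The first method uses a raw
List, an indexed loop and a cast; the second uses generics and a for-each
loop.

```java
import java.util.List;

// Hypothetical example class; not part of Lucene.
class Java5PortSketch {

  // Pre-Java 5 style: raw collection, indexed loop, explicit cast.
  @SuppressWarnings("rawtypes")
  static int totalLengthOld(List terms) {
    int total = 0;
    for (int i = 0; i < terms.size(); i++) {
      String term = (String) terms.get(i);
      total += term.length();
    }
    return total;
  }

  // Java 5 style: the element type is visible in the signature and the
  // for-each loop needs no index and no cast.
  static int totalLengthNew(List<String> terms) {
    int total = 0;
    for (String term : terms) {
      total += term.length();
    }
    return total;
  }
}
```

Both methods return the same value for the same input; the difference is that
the second one tells the reader (and the compiler) what the list contains.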


RE: Build failed in Hudson: Lucene-trunk #1007

2009-11-13 Thread Uwe Schindler

Oh I see:

The checksum task in ant now also supports writing the filename into the md5 file:
http://ant.apache.org/manual/CoreTasks/checksum.html

Use the "format" attribute.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Friday, November 13, 2009 12:36 PM
> To: java-dev@lucene.apache.org
> Subject: RE: Build failed in Hudson: Lucene-trunk #1007
> 
> > On Fri, Nov 13, 2009 at 5:35 AM, Uwe Schindler  wrote:
> > > I cannot reproduce it here, too (Sol, Win32). By the way, you modified
> > the
> > > test in trunk to be more verbose, but the failing one was in test-tag
> > (but
> > > the tests are identical).
> >
> > I'll add verbosity on BW branch, but not cut a new tag.
> >
> > > I got the release artefact build process running, files look good
> here.
> > On
> > > my solaris box with ant 1.7.0 also all md5 hashes are created. What
> > exactly
> > > was your and Mark's problem?
> >
> > Weird -- for me "ant dist" failed to create the md5s for the top
> > level artifacts (but did for all the maven artifacts).  I'm on ant
> > 1.71
> 
> For the maven artifacts they are created by the maven task (and a sha1,
> too). For the top-level artifacts there is an extra build target, but this
> task looks strange. The difference is, that this task appends the filename
> to the hash in the file. The maven artifacts only have the hash in the
> sha1
> and md5 file. Maybe we should change that task to simply call the checksum
> task in ant and *not* modify the resulting file. Why do we need that file
> name inside the md5 file?
> 
> Also this task takes very long time (about 10 minutes for the 25 MB ZIP
> file). I think we should raise the buffer size.
> 
> Uwe

[jira] Commented: (LUCENE-2039) Regex support and beyond in JavaCC QueryParser

2009-11-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777465#action_12777465
 ] 

Simon Willnauer commented on LUCENE-2039:
-

Luis,
{quote}
syntax:
extension:fieldname:"syntax"

examples:
regexp:title:"/blah[a-z]+[0-9]+/" <- regexp extension, title index field
complex_phrase:title:"(sun OR sunny) sky" <- complex_phrase extension, 
title index field

regexp_phrase::"/blah[a-z]+[0-9]+/" <- regexp extension, default field
complex_phrase::"(sun OR sunny) sky" <- complex_phrase extension, default 
field

title:"blah" <- regular field query
{quote}

This is pretty much what I suggested above. We can extend the query parser 
without breaking backwards compatibility just by adding some code which is 
aware of the field name scheme. Even this could be extensible. Field names are 
terms and therefore they cannot contain unescaped special chars like : { ] ... 
I would not even hard-code the separator into the query parser but have the 
field name processed by something pluggable. So if somebody wants to have a 
regex extension they could use re\:field: or re\:: or re_field: 
Escaping a field is easy, just like you would do it with a term. 
More interesting is that we do not change any syntax and add no special 
character, but we can still ship a default implementation for extensions. 
This could be a whole API which takes care of creating and escaping 
the field name, building the query once it is passed to the extension etc. 
In a first step we resolve the extension; the second step calls the 
extension and builds the query. If no extension is registered the query parser 
works like in previous versions, so it is all up to the user.

@Adriano:
{quote}
The only part I disagree is when you pass the fieldname to the extension 
parser, I wouldn't implement that on the contrib parser, because it assumes the 
syntax always has field names. Anyway, for the core QP, I see the reason why 
you pass the fieldname
{quote}

You need the field to create your query in the extension; the field will always 
be set to either the default field or the explicitly defined field in the 
query. No reason why we should not pass it.
I agree with you that we should wrap the information in a class so that we do 
not need to change the method signature if something has to be changed in the 
future. Instead we just add a new member to the wrapper.
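
The two-step idea discussed above (resolve the extension from the field name,
then let the extension build the query from a wrapper object) could look
roughly like the sketch below. Everything in it is an assumption made for
illustration: ExtensionQuery, ParserExtension, ExtensionRegistry and
RegexpExtension are hypothetical names, the ':' separator is hard-coded only
to keep the example short, and a String stands in for a real Query.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: new members can be added later without changing
// the ParserExtension method signature.
final class ExtensionQuery {
  final String field;
  final String rawText;

  ExtensionQuery(String field, String rawText) {
    this.field = field;
    this.rawText = rawText;
  }
}

// Hypothetical extension contract; a String stands in for a Lucene Query.
interface ParserExtension {
  String parse(ExtensionQuery query);
}

// Hypothetical example extension.
class RegexpExtension implements ParserExtension {
  public String parse(ExtensionQuery query) {
    return "regexp(" + query.field + ", " + query.rawText + ")";
  }
}

// Hypothetical registry that implements the two steps.
class ExtensionRegistry {
  private final Map<String, ParserExtension> extensions =
      new HashMap<String, ParserExtension>();
  private final String defaultField;

  ExtensionRegistry(String defaultField) {
    this.defaultField = defaultField;
  }

  void register(String key, ParserExtension extension) {
    extensions.put(key, extension);
  }

  // fieldSpec is the already unescaped field token, e.g. "regexp:title",
  // "regexp:" (default field) or plain "title".
  String resolve(String fieldSpec, String text) {
    int separator = fieldSpec.indexOf(':');
    if (separator >= 0) {
      // Step 1: resolve the extension from the part before the separator.
      ParserExtension extension = extensions.get(fieldSpec.substring(0, separator));
      if (extension != null) {
        String field = fieldSpec.substring(separator + 1);
        if (field.length() == 0) {
          field = defaultField;
        }
        // Step 2: hand the wrapped field and text to the extension.
        return extension.parse(new ExtensionQuery(field, text));
      }
    }
    // No extension registered: behave like a regular field query.
    return "term(" + fieldSpec + ", " + text + ")";
  }
}
```

With a RegexpExtension registered under "regexp" and a default field "body",
resolve("regexp:", "/x/") yields "regexp(body, /x/)", while
resolve("title", "blah") falls through to "term(title, blah)".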


> Regex support and beyond in JavaCC QueryParser
> --
>
> Key: LUCENE-2039
> URL: https://issues.apache.org/jira/browse/LUCENE-2039
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Reporter: Simon Willnauer
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2039.patch
>
>
> Since the early days the standard query parser was limited to the queries 
> living in core; adding other queries or extending the parser in any way 
> always forced people to change the grammar file and regenerate. Even if you 
> change the grammar you have to be extremely careful how you modify the parser 
> so that other parts of the standard parser are not affected by customisation 
> changes. Eventually you had to live with all the limitations the current 
> parser has, like tokenizing on whitespace before a tokenizer / analyzer has 
> the chance to look at the tokens. 
> I was thinking about how to overcome the limitation and add regex support to 
> the query parser without introducing any dependency to core. I added a new 
> special character that basically prevents the parser from interpreting any of 
> the characters enclosed in the new special characters. I chose the forward 
> slash '/' as the delimiter so that everything in between two forward slashes 
> is basically escaped and ignored by the parser. All chars embedded within 
> forward slashes are treated as one token even if it contains other special 
> chars like * []?{} or whitespace. This token is subsequently passed to a 
> pluggable "parser extension" which builds a query from the embedded string. I 
> do not interpret the embedded string in any way but leave all the subsequent 
> work to the parser extension. Such an extension could be another full 
> featured query parser itself or simply a ctor call for regex query. The 
> interface remains quite simple but makes the parser extensible in an easy way 
> compared to modifying the javaCC sources.
> The downside of this patch is clearly that I introduce a new special char 
> into the syntax but I guess that would not be that much of a deal as it is 
> reflected in the escape method though. It would truly be nice to have more 
> than one extension and have this even more flexible so treat this patch as a 
>

RE: Build failed in Hudson: Lucene-trunk #1007

2009-11-13 Thread Uwe Schindler

> On Fri, Nov 13, 2009 at 5:35 AM, Uwe Schindler  wrote:
> > I cannot reproduce it here, too (Sol, Win32). By the way, you modified
> the
> > test in trunk to be more verbose, but the failing one was in test-tag
> (but
> > the tests are identical).
> 
> I'll add verbosity on BW branch, but not cut a new tag.
> 
> > I got the release artefact build process running, files look good here.
> On
> > my solaris box with ant 1.7.0 also all md5 hashes are created. What
> exactly
> > was your and Mark's problem?
> 
> Weird -- for me "ant dist" failed to create the md5s for the top
> level artifacts (but did for all the maven artifacts).  I'm on ant
> 1.71

For the maven artifacts they are created by the maven task (and a sha1,
too). For the top-level artifacts there is an extra build target, but this
task looks strange. The difference is, that this task appends the filename
to the hash in the file. The maven artifacts only have the hash in the sha1
and md5 file. Maybe we should change that task to simply call the checksum
task in ant and *not* modify the resulting file. Why do we need that file
name inside the md5 file?

Also this task takes very long time (about 10 minutes for the 25 MB ZIP
file). I think we should raise the buffer size.

Uwe
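
For reference, the two file layouts and the buffer-size point from the message
above can be sketched in plain Java. This is an illustration only, not the
build target or the ant task: ChecksumSketch and its methods are hypothetical,
and the "hash *name" layout shown is the one the md5sum tool uses for binary
files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of the Lucene build.
class ChecksumSketch {

  // Streams the input through the digest using a caller-chosen buffer size.
  // A larger buffer means fewer read calls for big artifacts.
  static String hexDigest(InputStream in, String algorithm, int bufferSize) {
    try {
      MessageDigest digest = MessageDigest.getInstance(algorithm);
      byte[] buffer = new byte[bufferSize];
      int read;
      while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (IOException e) {
      throw new RuntimeException(e);
    } catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e);
    }
  }

  // Layout with the hash only, as described for the maven artifacts.
  static String hashOnlyLine(String hex) {
    return hex;
  }

  // Layout with the file name appended, in md5sum's binary-mode style.
  static String md5sumLine(String hex, String fileName) {
    return hex + " *" + fileName;
  }
}
```

Calling hexDigest with "MD5" or "SHA-1" and, say, a 64 KB buffer covers both
kinds of checksum file mentioned in the thread.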


[jira] Updated: (LUCENE-2053) When thread is interrupted we should throw a clear exception

2009-11-13 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2053:
---

Attachment: LUCENE-2053.patch

Attached patch.

> When thread is interrupted we should throw a clear exception
> 
>
> Key: LUCENE-2053
> URL: https://issues.apache.org/jira/browse/LUCENE-2053
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2053.patch
>
>
> This is the 3.0 followon from LUCENE-1573.  We should throw a dedicated 
> exception, not just RuntimeException.
> Recent discussion from java-dev "Thread.interrupt()" subject: 
> http://www.lucidimagination.com/search/document/8423f9f0b085034e/thread_interrupt
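
A dedicated exception for this case might look like the sketch below. The
names InterruptedDuringIndexingException and InterruptSketch are hypothetical
and chosen only for this example; the attached patch may name and structure
things differently. The exception is unchecked so that existing method
signatures would not need to change, and restoring the interrupt flag before
throwing is a choice made here, not something the issue text specifies.

```java
// Hypothetical dedicated exception; unchecked, wraps the original cause.
class InterruptedDuringIndexingException extends RuntimeException {
  InterruptedDuringIndexingException(InterruptedException cause) {
    super(cause);
  }
}

// Hypothetical caller showing where the dedicated exception replaces a
// plain RuntimeException.
class InterruptSketch {
  static void pause(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException ie) {
      // Restore the interrupt status so code further up can still see it.
      Thread.currentThread().interrupt();
      throw new InterruptedDuringIndexingException(ie);
    }
  }
}
```

Callers can then catch the dedicated type specifically instead of having to
inspect the cause of a generic RuntimeException.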


Re: Build failed in Hudson: Lucene-trunk #1007

2009-11-13 Thread Mark Miller


Same here - worked for maven, not for others. Was on Linux.

- Mark

http://www.lucidimagination.com (mobile)

On Nov 13, 2009, at 5:58 AM, Michael McCandless wrote:

> On Fri, Nov 13, 2009 at 5:35 AM, Uwe Schindler wrote:
> > I cannot reproduce it here, too (Sol, Win32). By the way, you modified the
> > test in trunk to be more verbose, but the failing one was in test-tag (but
> > the tests are identical).
>
> I'll add verbosity on BW branch, but not cut a new tag.
>
> > I got the release artefact build process running, files look good here. On
> > my solaris box with ant 1.7.0 also all md5 hashes are created. What exactly
> > was your and Mark's problem?
>
> Weird -- for me "ant dist" failed to create the md5s for the top
> level artifacts (but did for all the maven artifacts).  I'm on ant
> 1.71
>
> Mike


[jira] Commented: (LUCENE-1257) Port to Java5

2009-11-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777454#action_12777454
 ] 

Simon Willnauer commented on LUCENE-1257:
-

We're gonna reopen it in 3.0 anyway, so just go ahead and close for now!

> Port to Java5
> -
>
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis, Examples, Index, Other, Query/Scoring, 
> QueryParser, Search, Store, Term Vectors
>Affects Versions: 3.0
>Reporter: Cédric Champeau
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0


[jira] Commented: (LUCENE-1257) Port to Java5

2009-11-13 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777449#action_12777449
 ] 

Robert Muir commented on LUCENE-1257:
-

bq. If we find further Java5 conversions after release of 3.0, we can open new 
issues.
Uwe, I agree, I think you should close the issue.

> Port to Java5
> -
>
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis, Examples, Index, Other, Query/Scoring, 
> QueryParser, Search, Store, Term Vectors
>Affects Versions: 3.0
>Reporter: Cédric Champeau
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: instantiated_fieldable.patch, 
> LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
> LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
> LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
> LUCENE-1257-CompoundFileReaderWriter.patch, 
> LUCENE-1257-ConcurrentMergeScheduler.patch, 
> LUCENE-1257-DirectoryReader.patch, 
> LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
> LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
> LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, 
> LUCENE-1257-IndexDeleter.patch, 
> LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
> LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, 
> LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
> LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
> LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_contrib_ant.patch, 
> LUCENE-1257_contrib_benchmark.patch, LUCENE-1257_contrib_benchmark_2.patch, 
> LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_contrib_memory.patch, 
> LUCENE-1257_contrib_misc.patch, LUCENE-1257_contrib_smartcn.patch, 
> LUCENE-1257_heavy.patch, LUCENE-1257_heavy.patch, 
> LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_lucil.patch, 
> LUCENE-1257_lucli.patch, LUCENE-1257_messages.patch, 
> LUCENE-1257_more_unnecessary_casts.patch, 
> LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, 
> LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_demo.patch, 
> LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, 
> LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_precendence_parser.patch, LUCENE-1257_queryParser_jj.patch, 
> LUCENE-1257_swing_wikipedia_wordnet_xmlqp.patch, 
> LUCENE-1257_unnecessary_casts.patch, LUCENE-1257_unnnecessary_casts_2.patch, 
> lucene1257surround1.patch, lucene1257surround1.patch, 
> shinglematrixfilter_generified.patch
>
>
> For my needs I've updated Lucene so that it uses Java 5 constructs. I know
> Java 5 migration had been planned for 2.1 at some point in the past, but I
> don't know when it is planned now. This patch against the trunk includes:
> - most obvious generics usage (there are tons of usages of sets, ... Those
> which are commonly used have been generified)
> - PriorityQueue generification
> - replacement of indexed for loops with for-each constructs
> - removal of unnecessary unboxing
> The code is, in my opinion, much more readable with those features (you
> actually *know* what is stored in collections when reading the code, without
> needing to look up field definitions every time) and it simplifies many
> algorithms.
> Note that this patch also includes an interface for the Query class. This has
> been done for my company's needs for building custom Query classes which add
> some behaviour to the base Lucene queries. It prevents multiple unnecessary
> casts. I know this introduction is not wanted by the team, but it really
> makes our development easier to maintain. If you don't want to use this,
> replace all /Queriable/ calls with standard /Query/.
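
The kinds of changes listed in the description can be illustrated with a small sketch. This is not code from any of the attached patches; the class and method names (Java5PortSketch, sumLegacy, sumGenerified) are made up for illustration. It shows the same loop written with a raw collection, an indexed loop, a cast and explicit unboxing, and then with generics, for-each and auto-unboxing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; not taken from the Lucene sources or the patches.
class Java5PortSketch {

    // Pre-Java 5 style: raw List, indexed loop, cast and explicit unboxing.
    @SuppressWarnings("rawtypes")
    static int sumLegacy(List values) {
        int sum = 0;
        for (int i = 0; i < values.size(); i++) {
            Integer boxed = (Integer) values.get(i);
            sum += boxed.intValue();
        }
        return sum;
    }

    // Java 5 style: generics, for-each loop and auto-unboxing.
    static int sumGenerified(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(1);
        values.add(2);
        values.add(3);
        System.out.println(sumLegacy(values) + " " + sumGenerified(values));
    }
}
```

In the generified version the element type is visible in the signature, which is the readability gain the description refers to.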


Re: Build failed in Hudson: Lucene-trunk #1007

2009-11-13 Thread Michael McCandless

On Fri, Nov 13, 2009 at 5:35 AM, Uwe Schindler wrote:
> I cannot reproduce it here either (Sol, Win32). By the way, you modified the
> test in trunk to be more verbose, but the failing one was in test-tag (but
> the tests are identical).

I'll add verbosity on the BW branch, but not cut a new tag.

> I got the release artefact build process running, files look good here. On
> my Solaris box with ant 1.7.0, all md5 hashes are also created. What exactly
> was your and Mark's problem?

Weird -- for me "ant dist" failed to create the md5s for the top
level artifacts (but did for all the maven artifacts).  I'm on ant
1.7.1

Mike


[jira] Commented: (LUCENE-2051) Contrib Analyzer Setters should be deprecated and replace with ctor arguments

2009-11-13 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777445#action_12777445
 ] 

Simon Willnauer commented on LUCENE-2051:
-

This would also include deprecating some of the constructors. I will attach a 
patch which adds / deprecates ctors for all analyzers having those setters too.

simon

> Contrib Analyzer Setters should be deprecated and replace with ctor arguments
> -
>
> Key: LUCENE-2051
> URL: https://issues.apache.org/jira/browse/LUCENE-2051
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9.1
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 3.0
>
>
> Some analyzers in contrib provide setters for stopword / stem exclusion sets
> / hashtables etc. Those setters should be deprecated as they yield unexpected
> behaviour. They work by setting the reusable token stream instance
> to null in a thread-local cache, which only affects the token stream in the
> current thread. Analyzers themselves should be immutable, except for the
> thread local.
> I will attach a patch soon.
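
The behaviour described above can be reproduced with a small model in plain Java. This is only a sketch under assumptions: SetterAnalyzerSketch, CtorAnalyzerSketch and SetterAnalyzerDemo are hypothetical stand-ins, not Lucene classes, and the "reusable stream" is reduced to the stop set it was built with. The setter clears the cache of the calling thread only, so a worker thread that already cached its stream keeps filtering with the old stop words.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an analyzer with a setter; not a Lucene class.
class SetterAnalyzerSketch {
    private Set<String> stopWords;
    // Per-thread cache of the reusable "stream" (modelled as the stop set it uses).
    private final ThreadLocal<Set<String>> cachedStream = new ThreadLocal<Set<String>>();

    SetterAnalyzerSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Setter in the style proposed for deprecation: resets the calling thread's cache only.
    void setStopWords(Set<String> stopWords) {
        this.stopWords = stopWords;
        cachedStream.set(null);
    }

    // Returns the stop set used by the current thread's reusable stream.
    Set<String> stopWordsOfReusableStream() {
        Set<String> cached = cachedStream.get();
        if (cached == null) {
            cached = stopWords;
            cachedStream.set(cached);
        }
        return cached;
    }
}

// Constructor-argument alternative: configuration is fixed at construction time.
class CtorAnalyzerSketch {
    private final Set<String> stopWords;

    CtorAnalyzerSketch(Set<String> stopWords) {
        this.stopWords = Collections.unmodifiableSet(new HashSet<String>(stopWords));
    }

    Set<String> stopWords() {
        return stopWords;
    }
}

class SetterAnalyzerDemo {
    // True if the worker thread still uses the old stop words after the setter call
    // while the calling thread already sees the new ones.
    static boolean workerKeepsStaleStopWords() {
        final Set<String> oldStops = Collections.singleton("the");
        final Set<String> newStops = Collections.singleton("a");
        final SetterAnalyzerSketch analyzer = new SetterAnalyzerSketch(oldStops);
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<Set<String>> task = new Callable<Set<String>>() {
                public Set<String> call() {
                    return analyzer.stopWordsOfReusableStream();
                }
            };
            worker.submit(task).get();        // worker caches a stream built with oldStops
            analyzer.setStopWords(newStops);  // clears the cache of the calling thread only
            return worker.submit(task).get() == oldStops
                    && analyzer.stopWordsOfReusableStream() == newStops;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
    }
}
```

With the constructor-argument variant there is no later configuration change that could leave per-thread caches out of sync.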


Re: Build failed in Hudson: Lucene-trunk #1007

2009-11-13 Thread Simon Willnauer

I tried to reproduce the error on 64-bit / 32-bit Linux (Gentoo /
Ubuntu); no failure here either.  I also ran it on different JVMs (1.5,
1.6), no result :(

simon

On Fri, Nov 13, 2009 at 11:10 AM, Michael McCandless wrote:
> Hurm, this was the failure:
>
>    [junit] Testcase:
> testMaxBufferedDocsChange(org.apache.lucene.index.TestIndexWriterMergePolicy):
>   FAILED
>    [junit] null
>    [junit] junit.framework.AssertionFailedError
>    [junit]     at
> org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:234)
>    [junit]     at
> org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:164)
>    [junit]     at
> org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:206)
>    [junit]
>
> I can't repro off current trunk, on OpenSolaris, though.  The test
> changes maxBufferedDocs & mergeFactor on an open IW, then checks if
> the changes "took", but is hitting an assert failure because there are
> too many segments on one level.  Odd.
>
> Mike

RE: Build failed in Hudson: Lucene-trunk #1007

2009-11-13 Thread Uwe Schindler

I cannot reproduce it here either (Sol, Win32). By the way, you modified the
test in trunk to be more verbose, but the failing one was in test-tag (but
the tests are identical).

I got the release artefact build process running, files look good here. On
my Solaris box with ant 1.7.0, all md5 hashes are also created. What exactly
was your and Mark's problem?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, November 13, 2009 11:10 AM
> To: java-dev@lucene.apache.org
> Subject: Re: Build failed in Hudson: Lucene-trunk #1007
> 
> Hurm, this was the failure:
> 
> [junit] Testcase:
> testMaxBufferedDocsChange(org.apache.lucene.index.TestIndexWriterMergePolicy):  FAILED
> [junit] null
> [junit] junit.framework.AssertionFailedError
> [junit]   at
> org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:234)
> [junit]   at
> org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:164)
> [junit]   at
> org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:206)
> [junit]
> 
> I can't repro off current trunk, on OpenSolaris, though.  The test
> changes maxBufferedDocs & mergeFactor on an open IW, then checks if
> the changes "took", but is hitting an assert failure because there are
> too many segments on one level.  Odd.
> 
> Mike

[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

2009-11-13 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777436#action_12777436
 ] 

Michael McCandless commented on LUCENE-2047:


{quote}
we'd need to
prevent the deletion of the SR's files we're deleting from, even
if that SR is no longer live. 
{quote}

It's strange that anything here is needed, because, when you check a
reader out from the pool, it's incRef'd, which should mean the files
need no protection.  Something strange is up... could it be that when
you checkout that reader to do deletions, it wasn't already open, and
then on trying to open it, its files were already deleted?  (In which
case, that segment has been merged away, and, the merge has committed,
ie already carried over all deletes, and so you should instead be
deleting against that merged segment).
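
The reference-counting argument above can be sketched as follows. This is a minimal model, assuming a reader whose files may only be removed once the last reference is released; RefCountedReaderSketch and its methods are hypothetical names and do not reproduce the SegmentReader or reader pool code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a ref-counted reader; not Lucene code.
class RefCountedReaderSketch {
    // The creator (for example a pool) holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean filesReleased = false;

    // Checking a reader out takes an extra reference; fails if it was already released.
    void incRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("reader already released");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    // Dropping the last reference is the only point at which files may go away.
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            filesReleased = true;
        }
    }

    boolean filesReleased() {
        return filesReleased;
    }
}
```

In this model a checked-out reader cannot lose its files, which is why the failure discussed above points to the reader not having been open (and ref-counted) at checkout time.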

So I think the sync(IW) is in fact necessary?  Note that the current
approach (deferring resolving term -> docIDs until flush time) also
sync(IW)'d, so we're not really changing that, here.  Though I agree
it would be nice to not have to sync(IW).  Really what we need to sync
on is "any merge that is merging this segment away and now wants to
commit".  That's actually a very narrow event so someday (separate
issue) if we could refine the sync'ing to that, it should be a good
net throughput improvement for updateDocument.

{quote}
What happens to
documents that need to be deleted but are still in the RAM
buffer?
{quote}

Ahh, yes.  We must still buffer for this case, and resolve these
deletes against the newly flushed segment.  I think we need a separate
buffer that tracks pending delete terms only against the RAM buffer?

Also, instead of actually setting the bits in SR's deletedDocs, I
think you should buffer the deleted docIDs into DW's
deletesInRAM.docIDs?  Ie, we do the resolution of Term/Query -> docID,
but buffer the docIDs we resolved to.  This is necessary for
correctness in exceptional situations, eg if you do a bunch of
updateDocuments, then DW hits an aborting exception (meaning its RAM
buffer may be corrupt) then DW currently discards the RAM buffer, but,
leaves previously flushed segments intact, so that if you then commit,
you have a consistent index.  Ie, in that situation, we don't want the
docs deleted by updateDocument calls to be committed to the index, so
we need to buffer them.
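
A rough sketch of that buffering idea, under stated assumptions: BufferedDocIDDeletesSketch is a hypothetical class, the BitSet stands in for a segment's deleted-docs bits, and flush/abort are simplified to single method calls. Resolved docIDs are only recorded; they reach the bit set on a successful flush and are dropped on an abort.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model; not IndexWriter or DocumentsWriter code.
class BufferedDocIDDeletesSketch {
    // Stands in for the deleted-docs bits of an already flushed segment.
    private final BitSet deletedDocs = new BitSet();
    // docIDs already resolved from a Term/Query, but not yet applied.
    private final List<Integer> pendingDocIDs = new ArrayList<Integer>();

    // Resolution has happened elsewhere; only remember the resulting docID.
    void bufferDelete(int docID) {
        pendingDocIDs.add(docID);
    }

    // Successful flush: apply the buffered docIDs to the bit set.
    void flush() {
        for (int docID : pendingDocIDs) {
            deletedDocs.set(docID);
        }
        pendingDocIDs.clear();
    }

    // Aborting exception: discard the buffer, previously applied deletes stay intact.
    void abort() {
        pendingDocIDs.clear();
    }

    boolean isDeleted(int docID) {
        return deletedDocs.get(docID);
    }

    int pendingCount() {
        return pendingDocIDs.size();
    }
}
```

The point of the separation is that an abort never has to undo bits that were already set; it only forgets docIDs that were never applied.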


> IndexWriter should immediately resolve deleted docs to docID in 
> near-real-time mode
> ---
>
> Key: LUCENE-2047
> URL: https://issues.apache.org/jira/browse/LUCENE-2047
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs.  This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path.  And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.


Re: Build failed in Hudson: Lucene-trunk #1007

2009-11-13 Thread Michael McCandless

Hurm, this was the failure:

[junit] Testcase:
testMaxBufferedDocsChange(org.apache.lucene.index.TestIndexWriterMergePolicy):  
FAILED
[junit] null
[junit] junit.framework.AssertionFailedError
[junit] at
org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:234)
[junit] at
org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:164)
[junit] at
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:206)
[junit]

I can't repro off current trunk, on OpenSolaris, though.  The test
changes maxBufferedDocs & mergeFactor on an open IW, then checks if
the changes "took", but is hitting an assert failure because there are
too many segments on one level.  Odd.

Mike

On Thu, Nov 12, 2009 at 10:20 PM, Apache Hudson Server wrote:
> See 
>
> Changes:
>
> [rmuir] LUCENE-2059: allow TrecContentSource not to change the docname
>
> [rmuir] LUCENE-2058: specify trec_eval output file from commandline
>
> [markrmiller] update IndexReader javadoc concerning readonly
>
> [uschindler] Remove dead code from old fake norms. FieldNormModifier now 
> creates the fake itsself.
>
> --
> [...truncated 15986 lines...]
>    [junit] -  ---
>    [junit] Testsuite: org.apache.lucene.search.TestSetNorm
>    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.546 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestSimilarity
>    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.666 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestSimpleExplanations
>    [junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 20.991 sec
>    [junit]
>    [junit] Testsuite: 
> org.apache.lucene.search.TestSimpleExplanationsOfNonMatches
>    [junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 1.661 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestSloppyPhraseQuery
>    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 3.417 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestSort
>    [junit] Tests run: 22, Failures: 0, Errors: 0, Time elapsed: 14.419 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
>    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.652 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
>    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 8.154 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
>    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 1.033 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestTermScorer
>    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.559 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestTermVectors
>    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 2.795 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestThreadSafe
>    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.324 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
>    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 8.671 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
>    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.708 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
>    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.648 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.TestWildcard
>    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.225 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
>    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 182.08 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.function.TestDocValues
>    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.314 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
>    [junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 2.966 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
>    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.841 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadNearQuery
>    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.353 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadTermQuery
>    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 4.237 sec
>    [junit]
>    [junit] Testsuite: org.apache.lucene.search.spans.TestBasics
>    [junit] T

Re: 3.0 or 3.0.0 version number

2009-11-13 Thread Michael McCandless

It should be 3.0.0, I think?

Mike

On Fri, Nov 13, 2009 at 4:12 AM, Uwe Schindler wrote:
> I looked into the archive; we are quite inconsistent with release artifact
> version numbers. Should I call the next release 3.0.0 or 3.0?
>
> Uwe


3.0 or 3.0.0 version number

2009-11-13 Thread Uwe Schindler

I looked into the archive; we are quite inconsistent with release artifact
version numbers. Should I call the next release 3.0.0 or 3.0?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





[jira] Commented: (LUCENE-1257) Port to Java5

2009-11-13 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777419#action_12777419
 ] 

Uwe Schindler commented on LUCENE-1257:
---

Robert: Is there anything else in contrib to convert? If not, I would close this
issue for now.

If we find further Java5 conversions after release of 3.0, we can open new 
issues.

> Port to Java5
> -
>
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis, Examples, Index, Other, Query/Scoring, 
> QueryParser, Search, Store, Term Vectors
>Affects Versions: 3.0
>Reporter: Cédric Champeau
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: instantiated_fieldable.patch, 
> LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
> LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
> LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
> LUCENE-1257-CompoundFileReaderWriter.patch, 
> LUCENE-1257-ConcurrentMergeScheduler.patch, 
> LUCENE-1257-DirectoryReader.patch, 
> LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
> LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
> LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, 
> LUCENE-1257-IndexDeleter.patch, 
> LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
> LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, 
> LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
> LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
> LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_contrib_ant.patch, 
> LUCENE-1257_contrib_benchmark.patch, LUCENE-1257_contrib_benchmark_2.patch, 
> LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_contrib_memory.patch, 
> LUCENE-1257_contrib_misc.patch, LUCENE-1257_contrib_smartcn.patch, 
> LUCENE-1257_heavy.patch, LUCENE-1257_heavy.patch, 
> LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_lucil.patch, 
> LUCENE-1257_lucli.patch, LUCENE-1257_messages.patch, 
> LUCENE-1257_more_unnecessary_casts.patch, 
> LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, 
> LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_demo.patch, 
> LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, 
> LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_precendence_parser.patch, LUCENE-1257_queryParser_jj.patch, 
> LUCENE-1257_swing_wikipedia_wordnet_xmlqp.patch, 
> LUCENE-1257_unnecessary_casts.patch, LUCENE-1257_unnnecessary_casts_2.patch, 
> lucene1257surround1.patch, lucene1257surround1.patch, 
> shinglematrixfilter_generified.patch
>
>
> For my needs I've updated Lucene so that it uses Java 5 constructs. I know
> Java 5 migration had been planned for 2.1 at some point in the past, but I
> don't know when it is planned now. This patch against the trunk includes:
> - most obvious generics usage (there are tons of usages of sets, ... Those
> which are commonly used have been generified)
> - PriorityQueue generification
> - replacement of indexed for loops with for-each constructs
> - removal of unnecessary unboxing
> The code is, in my opinion, much more readable with those features (you
> actually *know* what is stored in collections when reading the code, without
> needing to look up field definitions every time) and it simplifies many
> algorithms.
> Note that this patch also includes an interface for the Query class. This has
> been done for my company's needs for building custom Query classes which add
> some behaviour to the base Lucene queries. It prevents multiple unnecessary
> casts. I know this introduction is not wanted by the team, but it really
> makes our development easier to maintain. If you don't want to use this,
> replace all /Queriable/ calls with standard /Query/.
