[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725500#action_12725500
 ] 

Mark Harwood commented on LUCENE-1720:
--

bq. Maybe we can benchmark this approach

See 
http://www.nabble.com/Improving-TimeLimitedCollector-td24174758.html#a24229185
The figures were produced by TestTimeLimitedIndexReader, which is part of this 
Jira issue, so you can try benchmarks on your own indexes.

bq. if it slows down queries due to the Thread.currentThread and hash lookup

This lookup only happens when threads start or stop timed activities and when a 
timed-out state exists; all other method invocations on TimeLimitedIndexReader, 
e.g. termDocs.next(), simply test a volatile boolean that indicates whether any 
timeout has occurred. This seems to be fast in my benchmarks.
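
To make the cost model concrete, here is a minimal sketch of that kind of check 
(names are hypothetical and this is not the attached ActivityTimeMonitor):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a background watchdog thread (omitted here) sets the volatile flag
// when any registered deadline passes, so the per-call check on the hot path
// is normally just one volatile read.
public class TimeMonitorSketch {
  private static final Map<Thread, Long> deadlines = new ConcurrentHashMap<Thread, Long>();
  private static volatile boolean anActivityTimedOut = false; // set by the watchdog

  public static void start(long maxTimeMillis) { // calling thread begins a timed activity
    deadlines.put(Thread.currentThread(), System.currentTimeMillis() + maxTimeMillis);
  }

  public static void stop() { // calling thread ends its timed activity
    deadlines.remove(Thread.currentThread());
  }

  // called by wrapped reader methods such as termDocs.next()
  public static void checkForTimeout() {
    if (!anActivityTimedOut) {
      return; // common case: no Thread.currentThread() call, no hash lookup
    }
    Long deadline = deadlines.get(Thread.currentThread());
    if (deadline != null && System.currentTimeMillis() > deadline) {
      throw new IllegalStateException("activity timed out"); // stands in for ActivityTimedOutException
    }
  }
}
{code}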

bq. maybe we can .. change the Lucene API such that we pass in an argument to 
the IndexReader methods where the timeout may be checked 

The current design uses static methods, which removes the need to pass a timeout 
object as context everywhere. The downside of this approach is that a single 
client thread is unable to time more than one activity at once, which we thought 
was a reasonable trade-off. See 
http://www.nabble.com/Re%3A-Improving-TimeLimitedCollector-p24234976.html
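
Against the hypothetical sketch above, the calling pattern would look roughly 
like this (runQueryAndRetrieveDocs is a stand-in for any timed reader activity):

{code}
TimeMonitorSketch.start(5000);   // this thread gets a single 5-second budget
try {
  runQueryAndRetrieveDocs();     // wrapped reader calls hit checkForTimeout()
} finally {
  TimeMonitorSketch.stop();      // always clear this thread's deadline
}
{code}

Because the deadline is keyed on the calling thread, a second start() from the 
same thread would simply replace the first budget - the trade-off noted above.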

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches, 
> e.g. the document retrieval phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies are detected quickly, 
> before the last "collect" stage of query processing)
> Uses a new utility timeout class that is independent of IndexReader.
> The initial contribution includes a performance test class, but I have not had 
> time as yet to work up a formal JUnit test.
> TimeLimitedIndexReader is coded for JDK 1.5 but this can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May closed LUCENE-1723.



> KeywordTokenizer does not properly set the end offset
> -
>
> Key: LUCENE-1723
> URL: https://issues.apache.org/jira/browse/LUCENE-1723
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 2.4.1
>Reporter: Dima May
>Priority: Minor
> Fix For: 2.9
>
> Attachments: AnalyzerBug.java
>
>
> KeywordTokenizer sets the Token's term length attribute but appears to omit 
> the end offset. The issue was discovered while using a highlighter with the 
> KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating 
> the bug. 
> Below is a JUnit test (source is also attached) that exercises various 
> analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
> successfully wraps the text with the highlight tags, such as 
> "<B>thetext</B>". When using KeywordAnalyzer the tags appear before the text, 
> for example: "<B></B>thetext". 
> Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
> using NewKeywordAnalyzer the tags are properly placed around the text. The 
> NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
> the end offset for the returned Token. NewKeywordAnalyzer utilizes 
> KeywordTokenizer to produce a proper token.
> Unless there is an objection I will gladly post a patch in the very near 
> future.
> -
> package lucene;
> import java.io.IOException;
> import java.io.Reader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.KeywordTokenizer;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.analysis.StopAnalyzer;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.search.highlight.WeightedTerm;
> import org.junit.Test;
> import static org.junit.Assert.*;
> public class AnalyzerBug {
>   @Test
>   public void testWithHighlighting() throws IOException {
>   String text = "thetext";
>   WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
>   Highlighter highlighter = new Highlighter(new 
> SimpleHTMLFormatter(
>   "<B>", "</B>"), new QueryScorer(terms));
>   Analyzer[] analazers = { new StandardAnalyzer(), new 
> SimpleAnalyzer(),
>   new StopAnalyzer(), new WhitespaceAnalyzer(),
>   new NewKeywordAnalyzer(), new KeywordAnalyzer() 
> };
>   // Analyzers pass except KeywordAnalyzer
>   for (Analyzer analazer : analazers) {
>   String highighted = 
> highlighter.getBestFragment(analazer,
>   "CONTENT", text);
>   assertEquals("Failed for " + 
> analazer.getClass().getName(), ""
>   + text + "", highighted);
>   System.out.println(analazer.getClass().getName()
>   + " passed, value highlighted: " + 
> highighted);
>   }
>   }
> }
> class NewKeywordAnalyzer extends KeywordAnalyzer {
>   @Override
>   public TokenStream reusableTokenStream(String fieldName, Reader reader)
>   throws IOException {
>   Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
>   if (tokenizer == null) {
>   tokenizer = new NewKeywordTokenizer(reader);
>   setPreviousTokenStream(tokenizer);
>   } else
>   tokenizer.reset(reader);
>   return tokenizer;
>   }
>   @Override
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>   return new NewKeywordTokenizer(reader);
>   }
> }
> class NewKeywordTokenizer extends KeywordTokenizer {
>   public NewKeywordTokenizer(Reader input) {
>   super(input);
>   }
>   @Override
>   public Token next(Token t) throws IOException {
>   Token result = super.next(t);
>   if (result != null) {
>   result.setEndOffset(result.termLength());
>   }
>   return result;
>   }
> }

-- 
This message is automatically generated by JIRA.

[jira] Resolved: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May resolved LUCENE-1723.
--

   Resolution: Fixed
Fix Version/s: 2.9

> KeywordTokenizer does not properly set the end offset
> -
>
> Key: LUCENE-1723
> URL: https://issues.apache.org/jira/browse/LUCENE-1723
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 2.4.1
>Reporter: Dima May
>Priority: Minor
> Fix For: 2.9
>
> Attachments: AnalyzerBug.java
>
>
> KeywordTokenizer sets the Token's term length attribute but appears to omit 
> the end offset. The issue was discovered while using a highlighter with the 
> KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating 
> the bug. 
> Below is a JUnit test (source is also attached) that exercises various 
> analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
> successfully wraps the text with the highlight tags, such as 
> "<B>thetext</B>". When using KeywordAnalyzer the tags appear before the text, 
> for example: "<B></B>thetext". 
> Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
> using NewKeywordAnalyzer the tags are properly placed around the text. The 
> NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
> the end offset for the returned Token. NewKeywordAnalyzer utilizes 
> KeywordTokenizer to produce a proper token.
> Unless there is an objection I will gladly post a patch in the very near 
> future.
> -
> package lucene;
> import java.io.IOException;
> import java.io.Reader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.KeywordTokenizer;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.analysis.StopAnalyzer;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.search.highlight.WeightedTerm;
> import org.junit.Test;
> import static org.junit.Assert.*;
> public class AnalyzerBug {
>   @Test
>   public void testWithHighlighting() throws IOException {
>   String text = "thetext";
>   WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
>   Highlighter highlighter = new Highlighter(new 
> SimpleHTMLFormatter(
>   "<B>", "</B>"), new QueryScorer(terms));
>   Analyzer[] analazers = { new StandardAnalyzer(), new 
> SimpleAnalyzer(),
>   new StopAnalyzer(), new WhitespaceAnalyzer(),
>   new NewKeywordAnalyzer(), new KeywordAnalyzer() 
> };
>   // Analyzers pass except KeywordAnalyzer
>   for (Analyzer analazer : analazers) {
>   String highighted = 
> highlighter.getBestFragment(analazer,
>   "CONTENT", text);
>   assertEquals("Failed for " + 
> analazer.getClass().getName(), ""
>   + text + "", highighted);
>   System.out.println(analazer.getClass().getName()
>   + " passed, value highlighted: " + 
> highighted);
>   }
>   }
> }
> class NewKeywordAnalyzer extends KeywordAnalyzer {
>   @Override
>   public TokenStream reusableTokenStream(String fieldName, Reader reader)
>   throws IOException {
>   Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
>   if (tokenizer == null) {
>   tokenizer = new NewKeywordTokenizer(reader);
>   setPreviousTokenStream(tokenizer);
>   } else
>   tokenizer.reset(reader);
>   return tokenizer;
>   }
>   @Override
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>   return new NewKeywordTokenizer(reader);
>   }
> }
> class NewKeywordTokenizer extends KeywordTokenizer {
>   public NewKeywordTokenizer(Reader input) {
>   super(input);
>   }
>   @Override
>   public Token next(Token t) throws IOException {
>   Token result = super.next(t);
>   if (result != null) {
>   result.setEndOffset(result.termLength());
>   }
> 

[jira] Commented: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725460#action_12725460
 ] 

Dima May commented on LUCENE-1723:
--

Verified! You are absolutely correct, the bug has been fixed on the latest 
trunk. The next method in the KeywordTokenizer now sets the start and end 
offsets:

   reusableToken.setStartOffset(input.correctOffset(0));
   reusableToken.setEndOffset(input.correctOffset(upto));

I will resolve and close the ticket. Sorry for the trouble and thank you for 
the prompt attention. 


> KeywordTokenizer does not properly set the end offset
> -
>
> Key: LUCENE-1723
> URL: https://issues.apache.org/jira/browse/LUCENE-1723
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 2.4.1
>Reporter: Dima May
>Priority: Minor
> Attachments: AnalyzerBug.java
>
>
> KeywordTokenizer sets the Token's term length attribute but appears to omit 
> the end offset. The issue was discovered while using a highlighter with the 
> KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating 
> the bug. 
> Below is a JUnit test (source is also attached) that exercises various 
> analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
> successfully wraps the text with the highlight tags, such as 
> "<B>thetext</B>". When using KeywordAnalyzer the tags appear before the text, 
> for example: "<B></B>thetext". 
> Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
> using NewKeywordAnalyzer the tags are properly placed around the text. The 
> NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
> the end offset for the returned Token. NewKeywordAnalyzer utilizes 
> KeywordTokenizer to produce a proper token.
> Unless there is an objection I will gladly post a patch in the very near 
> future.
> -
> package lucene;
> import java.io.IOException;
> import java.io.Reader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.KeywordTokenizer;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.analysis.StopAnalyzer;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.search.highlight.WeightedTerm;
> import org.junit.Test;
> import static org.junit.Assert.*;
> public class AnalyzerBug {
>   @Test
>   public void testWithHighlighting() throws IOException {
>   String text = "thetext";
>   WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
>   Highlighter highlighter = new Highlighter(new 
> SimpleHTMLFormatter(
>   "<B>", "</B>"), new QueryScorer(terms));
>   Analyzer[] analazers = { new StandardAnalyzer(), new 
> SimpleAnalyzer(),
>   new StopAnalyzer(), new WhitespaceAnalyzer(),
>   new NewKeywordAnalyzer(), new KeywordAnalyzer() 
> };
>   // Analyzers pass except KeywordAnalyzer
>   for (Analyzer analazer : analazers) {
>   String highighted = 
> highlighter.getBestFragment(analazer,
>   "CONTENT", text);
>   assertEquals("Failed for " + 
> analazer.getClass().getName(), ""
>   + text + "", highighted);
>   System.out.println(analazer.getClass().getName()
>   + " passed, value highlighted: " + 
> highighted);
>   }
>   }
> }
> class NewKeywordAnalyzer extends KeywordAnalyzer {
>   @Override
>   public TokenStream reusableTokenStream(String fieldName, Reader reader)
>   throws IOException {
>   Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
>   if (tokenizer == null) {
>   tokenizer = new NewKeywordTokenizer(reader);
>   setPreviousTokenStream(tokenizer);
>   } else
>   tokenizer.reset(reader);
>   return tokenizer;
>   }
>   @Override
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>   return new NewKeywordTokenizer(reader);
>   }
> }
> class NewKeywordTokenizer extends KeywordToke

[jira] Commented: (LUCENE-1653) Change DateTools to not create a Calendar in every call to dateToString or timeToString

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725456#action_12725456
 ] 

Shai Erera commented on LUCENE-1653:


In 3.0, when we move to Java 5, we can make Resolution an enum and then use a 
switch statement on the passed-in Resolution. But performance-wise I don't think 
it would make such a big difference, as we're already comparing instances, which 
should be relatively fast.

How will moving the logic of timeToString, stringToDate and round to Resolution 
make the code tighter? Resolution would still need to check its instance type in 
order to execute the right code. Unless we subclass Resolution internally and 
have each subclass implement just the section of these 3 methods that it needs?
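
For illustration, a hedged sketch of that subclass-per-constant idea once enums 
are available (hypothetical code; rounding is shown as plain millisecond 
arithmetic, whereas DateTools actually rounds via Calendar fields):

{code}
// Sketch: each Resolution constant carries its own rounding logic, so callers
// need no if-else chains or instance checks.
public enum Resolution {
  HOUR {
    long round(long time) { return (time / HOUR_MILLIS) * HOUR_MILLIS; }
  },
  DAY {
    long round(long time) { return (time / DAY_MILLIS) * DAY_MILLIS; }
  };

  static final long HOUR_MILLIS = 60L * 60 * 1000;
  static final long DAY_MILLIS = 24 * HOUR_MILLIS;

  abstract long round(long time);
}
{code}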

> Change DateTools to not create a Calendar in every call to dateToString or 
> timeToString
> ---
>
> Key: LUCENE-1653
> URL: https://issues.apache.org/jira/browse/LUCENE-1653
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Shai Erera
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1653.patch, LUCENE-1653.patch
>
>
> DateTools creates a Calendar instance on every call to dateToString and 
> timeToString. Specifically:
> # timeToString calls Calendar.getInstance on every call.
> # dateToString calls timeToString(date.getTime()), which then instantiates a 
> new Date(). I think we should change the order of the calls, or not have each 
> call the other.
> # round(), which is called from timeToString (after creating a Calendar 
> instance) creates another (!) Calendar instance ...
> Seems that if we synchronize the methods and create the Calendar instance 
> once (static), it should solve it.
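
A minimal sketch of that synchronized static Calendar proposal (illustrative 
only, assuming GMT as DateTools does; this is not the attached patch):

{code}
import java.util.Calendar;
import java.util.TimeZone;

// Sketch: one static Calendar reused by every call, guarded by synchronization
// because Calendar is not thread-safe.
public class DateToolsSketch {
  private static final Calendar CALENDAR =
      Calendar.getInstance(TimeZone.getTimeZone("GMT"));

  public static synchronized long roundToDay(long time) {
    CALENDAR.setTimeInMillis(time);
    CALENDAR.set(Calendar.HOUR_OF_DAY, 0); // clear everything below DAY resolution
    CALENDAR.set(Calendar.MINUTE, 0);
    CALENDAR.set(Calendar.SECOND, 0);
    CALENDAR.set(Calendar.MILLISECOND, 0);
    return CALENDAR.getTimeInMillis();
  }
}
{code}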

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725448#action_12725448
 ] 

Robert Muir commented on LUCENE-1723:
-

Dima, have you tried your test against the latest lucene trunk?

I got these results:
{noformat}
org.apache.lucene.analysis.standard.StandardAnalyzer passed, value highlighted: 
<B>thetext</B>
org.apache.lucene.analysis.SimpleAnalyzer passed, value highlighted: 
<B>thetext</B>
org.apache.lucene.analysis.StopAnalyzer passed, value highlighted: 
<B>thetext</B>
org.apache.lucene.analysis.WhitespaceAnalyzer passed, value highlighted: 
<B>thetext</B>
org.apache.lucene.analysis.NewKeywordAnalyzer passed, value highlighted: 
<B>thetext</B>
org.apache.lucene.analysis.KeywordAnalyzer passed, value highlighted: 
<B>thetext</B>
{noformat}

maybe you can verify the same?

> KeywordTokenizer does not properly set the end offset
> -
>
> Key: LUCENE-1723
> URL: https://issues.apache.org/jira/browse/LUCENE-1723
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 2.4.1
>Reporter: Dima May
>Priority: Minor
> Attachments: AnalyzerBug.java
>
>
> KeywordTokenizer sets the Token's term length attribute but appears to omit 
> the end offset. The issue was discovered while using a highlighter with the 
> KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating 
> the bug. 
> Below is a JUnit test (source is also attached) that exercises various 
> analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
> successfully wraps the text with the highlight tags, such as 
> "<B>thetext</B>". When using KeywordAnalyzer the tags appear before the text, 
> for example: "<B></B>thetext". 
> Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
> using NewKeywordAnalyzer the tags are properly placed around the text. The 
> NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
> the end offset for the returned Token. NewKeywordAnalyzer utilizes 
> KeywordTokenizer to produce a proper token.
> Unless there is an objection I will gladly post a patch in the very near 
> future.
> -
> package lucene;
> import java.io.IOException;
> import java.io.Reader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.KeywordTokenizer;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.analysis.StopAnalyzer;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.search.highlight.WeightedTerm;
> import org.junit.Test;
> import static org.junit.Assert.*;
> public class AnalyzerBug {
>   @Test
>   public void testWithHighlighting() throws IOException {
>   String text = "thetext";
>   WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
>   Highlighter highlighter = new Highlighter(new 
> SimpleHTMLFormatter(
>   "<B>", "</B>"), new QueryScorer(terms));
>   Analyzer[] analazers = { new StandardAnalyzer(), new 
> SimpleAnalyzer(),
>   new StopAnalyzer(), new WhitespaceAnalyzer(),
>   new NewKeywordAnalyzer(), new KeywordAnalyzer() 
> };
>   // Analyzers pass except KeywordAnalyzer
>   for (Analyzer analazer : analazers) {
>   String highighted = 
> highlighter.getBestFragment(analazer,
>   "CONTENT", text);
>   assertEquals("Failed for " + 
> analazer.getClass().getName(), ""
>   + text + "", highighted);
>   System.out.println(analazer.getClass().getName()
>   + " passed, value highlighted: " + 
> highighted);
>   }
>   }
> }
> class NewKeywordAnalyzer extends KeywordAnalyzer {
>   @Override
>   public TokenStream reusableTokenStream(String fieldName, Reader reader)
>   throws IOException {
>   Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
>   if (tokenizer == null) {
>   tokenizer = new NewKeywordTokenizer(reader);
>   setPreviousTokenStream(tokenizer);
>   } else
>   tokenizer.reset(reader);

[jira] Commented: (LUCENE-1653) Change DateTools to not create a Calendar in every call to dateToString or timeToString

2009-06-29 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725447#action_12725447
 ] 

David Smiley commented on LUCENE-1653:
--

I'm looking through DateTools now and can't help but want to clean it up some.  
One thing I see that is odd is the use of a Calendar in 
timeToString(long, Resolution).  The first two lines look like this right now:
{code}
calInstance.setTimeInMillis(round(time, resolution));
Date date = calInstance.getTime();
{code}

Instead, it can simply be:
{code}
Date date = new Date(round(time, resolution));
{code}

Secondly... I think a good deal of logic in the other methods can be cleaned up; 
the bunch of if-else statements there is a bad code smell.  Most of the logic of 
3 of those methods could be put into Resolution and be made tighter.

> Change DateTools to not create a Calendar in every call to dateToString or 
> timeToString
> ---
>
> Key: LUCENE-1653
> URL: https://issues.apache.org/jira/browse/LUCENE-1653
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Shai Erera
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1653.patch, LUCENE-1653.patch
>
>
> DateTools creates a Calendar instance on every call to dateToString and 
> timeToString. Specifically:
> # timeToString calls Calendar.getInstance on every call.
> # dateToString calls timeToString(date.getTime()), which then instantiates a 
> new Date(). I think we should change the order of the calls, or not have each 
> call the other.
> # round(), which is called from timeToString (after creating a Calendar 
> instance) creates another (!) Calendar instance ...
> Seems that if we synchronize the methods and create the Calendar instance 
> once (static), it should solve it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May updated LUCENE-1723:
-

Description: 
KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating the 
bug. 

Below is a JUnit test (source is also attached) that exercises various 
analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
successfully wraps the text with the highlight tags, such as "<B>thetext</B>". 
When using KeywordAnalyzer the tags appear before the text, for example: 
"<B></B>thetext". 

Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
using NewKeywordAnalyzer the tags are properly placed around the text. The 
NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
the end offset for the returned Token. NewKeywordAnalyzer utilizes 
KeywordTokenizer to produce a proper token.

Unless there is an objection I will gladly post a patch in the very near future.

-
package lucene;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.WeightedTerm;
import org.junit.Test;
import static org.junit.Assert.*;

public class AnalyzerBug {

@Test
public void testWithHighlighting() throws IOException {
String text = "thetext";
WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };

Highlighter highlighter = new Highlighter(new 
SimpleHTMLFormatter(
"<B>", "</B>"), new QueryScorer(terms));

Analyzer[] analazers = { new StandardAnalyzer(), new 
SimpleAnalyzer(),
new StopAnalyzer(), new WhitespaceAnalyzer(),
new NewKeywordAnalyzer(), new KeywordAnalyzer() 
};

// Analyzers pass except KeywordAnalyzer
for (Analyzer analazer : analazers) {
String highighted = 
highlighter.getBestFragment(analazer,
"CONTENT", text);
assertEquals("Failed for " + 
analazer.getClass().getName(), ""
+ text + "", highighted);
System.out.println(analazer.getClass().getName()
+ " passed, value highlighted: " + 
highighted);
}
}
}

class NewKeywordAnalyzer extends KeywordAnalyzer {

@Override
public TokenStream reusableTokenStream(String fieldName, Reader reader)
throws IOException {
Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
if (tokenizer == null) {
tokenizer = new NewKeywordTokenizer(reader);
setPreviousTokenStream(tokenizer);
} else
tokenizer.reset(reader);
return tokenizer;
}

@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
return new NewKeywordTokenizer(reader);
}
}

class NewKeywordTokenizer extends KeywordTokenizer {
public NewKeywordTokenizer(Reader input) {
super(input);
}

@Override
public Token next(Token t) throws IOException {
Token result = super.next(t);
if (result != null) {
result.setEndOffset(result.termLength());
}
return result;
}
}


  was:
KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating the 
bug. 

Below is a JUnit test (source is also attached) that exercises various 
analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
successfully wraps the text with the highlight tags, such as "<B>thetext</B>". 
When using KeywordAnalyzer the tags appear before the text, for example: 
"<B></B>thetext". 

Pl

[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May updated LUCENE-1723:
-

Description: 
KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating the 
bug. 

Below is a JUnit test (source is also attached) that exercises various 
analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
successfully wraps the text with the highlight tags, such as "<B>thetext</B>". 
When using KeywordAnalyzer the tags appear before the text, for example: 
"<B></B>thetext". 

Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
using NewKeywordAnalyzer the tags are properly placed around the text. The 
NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
the end offset for the returned Token. NewKeywordAnalyzer utilizes 
KeywordTokenizer to produce a proper token.

-
package lucene;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.WeightedTerm;
import org.junit.Test;
import static org.junit.Assert.*;

public class AnalyzerBug {

@Test
public void testWithHighlighting() throws IOException {
String text = "thetext";
WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };

Highlighter highlighter = new Highlighter(new 
SimpleHTMLFormatter(
"<B>", "</B>"), new QueryScorer(terms));

Analyzer[] analazers = { new StandardAnalyzer(), new 
SimpleAnalyzer(),
new StopAnalyzer(), new WhitespaceAnalyzer(),
new NewKeywordAnalyzer(), new KeywordAnalyzer() 
};

// Analyzers pass except KeywordAnalyzer
for (Analyzer analazer : analazers) {
String highighted = 
highlighter.getBestFragment(analazer,
"CONTENT", text);
assertEquals("Failed for " + 
analazer.getClass().getName(), ""
+ text + "", highighted);
System.out.println(analazer.getClass().getName()
+ " passed, value highlighted: " + 
highighted);
}
}
}

class NewKeywordAnalyzer extends KeywordAnalyzer {

@Override
public TokenStream reusableTokenStream(String fieldName, Reader reader)
throws IOException {
Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
if (tokenizer == null) {
tokenizer = new NewKeywordTokenizer(reader);
setPreviousTokenStream(tokenizer);
} else
tokenizer.reset(reader);
return tokenizer;
}

@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
return new NewKeywordTokenizer(reader);
}
}

class NewKeywordTokenizer extends KeywordTokenizer {
public NewKeywordTokenizer(Reader input) {
super(input);
}

@Override
public Token next(Token t) throws IOException {
Token result = super.next(t);
if (result != null) {
result.setEndOffset(result.termLength());
}
return result;
}
}


  was:
KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating the 
bug. 

Below is a JUnit test that exercises various analyzers via a Highlighter 
instance. Every analyzer but the KeywordAnalyzer successfully wraps the text 
with the highlight tags, such as "<B>thetext</B>". When using KeywordAnalyzer 
the tags appear before the text, for example: "<B></B>thetext". 

Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
using NewKeywordAnalyzer the tags are 

[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May updated LUCENE-1723:
-

Attachment: AnalyzerBug.java

> KeywordTokenizer does not properly set the end offset
> -
>
> Key: LUCENE-1723
> URL: https://issues.apache.org/jira/browse/LUCENE-1723
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 2.4.1
>Reporter: Dima May
>Priority: Minor
> Attachments: AnalyzerBug.java
>
>
> KeywordTokenizer sets the Token's term length attribute but appears to omit 
> the end offset. The issue was discovered while using a highlighter with the 
> KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating 
> the bug. 
> Below is a JUnit test that exercises various analyzers via a Highlighter 
> instance. Every analyzer but the KeywordAnalyzer successfully wraps the text 
> with the highlight tags, such as "<B>thetext</B>". When using KeywordAnalyzer 
> the tags appear before the text, for example: "<B></B>thetext". 
> Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
> using NewKeywordAnalyzer the tags are properly placed around the text. The 
> NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
> the end offset for the returned Token. NewKeywordAnalyzer utilizes 
> KeywordTokenizer to produce a proper token.
> -
> package lucene;
> import java.io.IOException;
> import java.io.Reader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.analysis.KeywordTokenizer;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.analysis.StopAnalyzer;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.search.highlight.WeightedTerm;
> import org.junit.Test;
> import static org.junit.Assert.*;
> public class AnalyzerBug {
>   @Test
>   public void testWithHighlighting() throws IOException {
>   String text = "thetext";
>   WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
>   Highlighter highlighter = new Highlighter(new 
> SimpleHTMLFormatter(
>   "<B>", "</B>"), new QueryScorer(terms));
>   Analyzer[] analazers = { new StandardAnalyzer(), new 
> SimpleAnalyzer(),
>   new StopAnalyzer(), new WhitespaceAnalyzer(),
>   new NewKeywordAnalyzer(), new KeywordAnalyzer() 
> };
>   // Analyzers pass except KeywordAnalyzer
>   for (Analyzer analazer : analazers) {
>   String highighted = 
> highlighter.getBestFragment(analazer,
>   "CONTENT", text);
>   assertEquals("Failed for " + 
> analazer.getClass().getName(), ""
>   + text + "", highighted);
>   System.out.println(analazer.getClass().getName()
>   + " passed, value highlighted: " + 
> highighted);
>   }
>   }
> }
> class NewKeywordAnalyzer extends KeywordAnalyzer {
>   @Override
>   public TokenStream reusableTokenStream(String fieldName, Reader reader)
>   throws IOException {
>   Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
>   if (tokenizer == null) {
>   tokenizer = new NewKeywordTokenizer(reader);
>   setPreviousTokenStream(tokenizer);
>   } else
>   tokenizer.reset(reader);
>   return tokenizer;
>   }
>   @Override
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>   return new NewKeywordTokenizer(reader);
>   }
> }
> class NewKeywordTokenizer extends KeywordTokenizer {
>   public NewKeywordTokenizer(Reader input) {
>   super(input);
>   }
>   @Override
>   public Token next(Token t) throws IOException {
>   Token result = super.next(t);
>   if (result != null) {
>   result.setEndOffset(result.termLength());
>   }
>   return result;
>   }
> }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)
KeywordTokenizer does not properly set the end offset
-

 Key: LUCENE-1723
 URL: https://issues.apache.org/jira/browse/LUCENE-1723
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.4.1
Reporter: Dima May
Priority: Minor
 Attachments: AnalyzerBug.java

KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer propagating the 
bug. 

Below is a JUnit test that exercises various analyzers via a Highlighter 
instance. Every analyzer but the KeywordAnalyzer successfully wraps the text 
with the highlight tags, such as "<B>thetext</B>". When using KeywordAnalyzer 
the tags appear before the text, for example: "<B></B>thetext". 

Please note NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
using NewKeywordAnalyzer the tags are properly placed around the text. The 
NewKeywordTokenizer overrides the next method of the KeywordTokenizer setting 
the end offset for the returned Token. NewKeywordAnalyzer utilizes 
KeywordTokenizer to produce a proper token.

-
package lucene;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.WeightedTerm;
import org.junit.Test;
import static org.junit.Assert.*;

public class AnalyzerBug {

@Test
public void testWithHighlighting() throws IOException {
String text = "thetext";
WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };

Highlighter highlighter = new Highlighter(new 
SimpleHTMLFormatter(
"<B>", "</B>"), new QueryScorer(terms));

Analyzer[] analazers = { new StandardAnalyzer(), new 
SimpleAnalyzer(),
new StopAnalyzer(), new WhitespaceAnalyzer(),
new NewKeywordAnalyzer(), new KeywordAnalyzer() 
};

// Analyzers pass except KeywordAnalyzer
for (Analyzer analazer : analazers) {
String highighted = 
highlighter.getBestFragment(analazer,
"CONTENT", text);
assertEquals("Failed for " + 
analazer.getClass().getName(), ""
+ text + "", highighted);
System.out.println(analazer.getClass().getName()
+ " passed, value highlighted: " + 
highighted);
}
}
}

class NewKeywordAnalyzer extends KeywordAnalyzer {

@Override
public TokenStream reusableTokenStream(String fieldName, Reader reader)
throws IOException {
Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
if (tokenizer == null) {
tokenizer = new NewKeywordTokenizer(reader);
setPreviousTokenStream(tokenizer);
} else
tokenizer.reset(reader);
return tokenizer;
}

@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
return new NewKeywordTokenizer(reader);
}
}

class NewKeywordTokenizer extends KeywordTokenizer {
public NewKeywordTokenizer(Reader input) {
super(input);
}

@Override
public Token next(Token t) throws IOException {
Token result = super.next(t);
if (result != null) {
result.setEndOffset(result.termLength());
}
return result;
}
}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725386#action_12725386
 ] 

Jason Rutherglen commented on LUCENE-1720:
--

Maybe we can benchmark this approach to see if it slows down
queries due to the Thread.currentThread and hash lookup? As
this would go into 3.0 (?) maybe we can look at how to change
the Lucene API such that we pass in an argument to the
IndexReader methods where the timeout may be checked for?

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches, 
> e.g. the document retrieval phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies are detected quickly, 
> before the last "collect" stage of query processing)
> Uses a new utility timeout class that is independent of IndexReader.
> The initial contribution includes a performance test class, but I have not had 
> time as yet to work up a formal JUnit test.
> TimeLimitedIndexReader is coded for JDK 1.5 but this can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1705:
--

Attachment: (was: TestIndexWriterDelete.patch)

> Add deleteAllDocuments() method to IndexWriter
> --
>
> Key: LUCENE-1705
> URL: https://issues.apache.org/jira/browse/LUCENE-1705
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Index
>Affects Versions: 2.4
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: DeleteAllFlushDocCountFix.patch, 
> IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the 
> IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using 
> deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large 
> existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a 
> commit(), as far as index visibility goes (new IndexReader opening would get 
> the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice 
> to finally see this added such that the IndexWriter would not need to be 
> closed to perform the "clear" as this seems to be the general recommendation 
> for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been 
> received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons I would ever need to close 
> an IndexWriter and reopen a new one.
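
For context, the two calling patterns compare roughly like this (a hedged 
sketch; assumes a Directory dir and Analyzer analyzer in scope, and uses 
deleteAll(), the method name used in the comments on this issue):

{code}
// Old pattern: close and recreate the writer just to clear the index.
writer.close();
writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

// Proposed pattern: clear in place, keeping the same writer open.
writer.deleteAll();   // drops buffered and committed documents, aborts merges
writer.commit();      // a newly opened IndexReader now sees an empty index
{code}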

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1705:
--

Attachment: DeleteAllFlushDocCountFix.patch

Here's a patch that fixes the deleteAll() + updateDocument() issue.

I just needed to set the FlushDocCount to 0 after aborting the outstanding 
documents.

> Add deleteAllDocuments() method to IndexWriter
> --
>
> Key: LUCENE-1705
> URL: https://issues.apache.org/jira/browse/LUCENE-1705
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Index
>Affects Versions: 2.4
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: DeleteAllFlushDocCountFix.patch, 
> IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the 
> IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using 
> deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large 
> existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a 
> commit(), as far as index visibility goes (new IndexReader opening would get 
> the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice 
> to finally see this added such that the IndexWriter would not need to be 
> closed to perform the "clear" as this seems to be the general recommendation 
> for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been 
> received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons I would ever need to close 
> an IndexWriter and reopen a new one.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1566:


Attachment: LUCENE-1566.patch

I was able to reproduce the bug on my machine using several JVMs. The attached 
patch is what I have ready by now; I thought I would get it out there as soon as 
possible for discussion.
Tests pass on my side!
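
For reference, the workaround described below amounts to something like the 
following chunked-read loop (a minimal sketch under assumed names, not the 
attached patch):

{code}
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: split one huge read into bounded reads so that no single
// RandomAccessFile.read call asks for hundreds of megabytes at once.
public class ChunkedReadSketch {
  private static final int CHUNK_SIZE = 1 << 21; // 2 MB per read

  public static void readFully(RandomAccessFile file, byte[] b, int offset, int len)
      throws IOException {
    while (len > 0) {
      int read = file.read(b, offset, Math.min(len, CHUNK_SIZE));
      if (read < 0) {
        throw new EOFException("read past EOF");
      }
      offset += read;
      len -= read;
    }
  }
}
{code}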

> Large Lucene index can hit false OOM due to Sun JRE issue
> -
>
> Key: LUCENE-1566
> URL: https://issues.apache.org/jira/browse/LUCENE-1566
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: LUCENE-1566.patch
>
>
> This is not a Lucene issue, but I want to open this so future google
> diggers can more easily find it.
> There's this nasty bug in Sun's JRE:
>   http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
> The gist seems to be, if you try to read a large (eg 200 MB) number of
> bytes during a single RandomAccessFile.read call, you can incorrectly
> hit OOM.  Lucene does this, with norms, since we read in one byte per
> doc per field with norms, as a contiguous array of length maxDoc().
> The workaround was a custom patch to do large file reads as several
> smaller reads.
> Background here:
>   http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1705:
--

Attachment: TestIndexWriterDelete.patch

Here's a patch to TestIndexWriterDelete that shows the problem.

After the deleteAll(), a document is added and a document is updated; the added 
document gets indexed, but the updated document does not.

> Add deleteAllDocuments() method to IndexWriter
> --
>
> Key: LUCENE-1705
> URL: https://issues.apache.org/jira/browse/LUCENE-1705
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Index
>Affects Versions: 2.4
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch, 
> TestIndexWriterDelete.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the 
> IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using 
> deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large 
> existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a 
> commit(), as far as index visibility goes (new IndexReader opening would get 
> the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice 
> to finally see this added such that the IndexWriter would not need to be 
> closed to perform the "clear" as this seems to be the general recommendation 
> for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been 
> received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons I would ever need to close 
> an IndexWriter and reopen a new one.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1706) Site search powered by Lucene/Solr

2009-06-29 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll closed LUCENE-1706.
---

   Resolution: Fixed
Lucene Fields:   (was: [New])

> Site search powered by Lucene/Solr
> --
>
> Key: LUCENE-1706
> URL: https://issues.apache.org/jira/browse/LUCENE-1706
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1706.patch, LUCENE-1706.patch
>
>
> For a number of years now, the Lucene community has been criticized for not 
> eating our own "dog food" when it comes to search. My company has built and 
> hosts a site search (http://www.lucidimagination.com/search) that is powered 
> by Apache Solr and Lucene, and we'd like to donate its use to the Lucene 
> community. Additionally, it allows one to search all of the Lucene content 
> from a single place, including web, wiki, JIRA and mail archives. See also 
> http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
> You can see it live on Mahout, Tika and Solr.
> Lucid has a fault-tolerant setup with replication and failover, as well as 
> monitoring services in place. We are committed to maintaining and expanding 
> the search capabilities on the site.
> The following patch adds a skin to the Forrest site that enables the Lucene 
> site to search Lucene-only content using Lucene/Solr. When a search is 
> submitted, it automatically selects the Lucene facet such that only Lucene 
> content is searched. From there, users can then narrow/broaden their search 
> criteria.
> I plan on committing in 3 or 4 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Reopened: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith reopened LUCENE-1705:
---


Looks like I found an issue with this.

The deleteAll() method isn't resetting the nextDocID on the DocumentsWriter (or 
some similar behaviour), so the following sequence will result in an error:
* deleteAll()
* updateDocument("5", doc)
* commit()

This results in a delete for doc "5" getting buffered, but with a very high 
"maxDocId". At the same time, the doc is added; however, the following will then 
occur on commit:
* flush segments to disk
* doc "5" is now in a segment on disk
* run deletes
* doc "5" is now blacklisted from the segment

Will work on fixing this and post a new patch (along with an updated test case).

(I was worried I was missing an edge case.)
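
As code, the failing sequence looks roughly like this (a sketch; field and term 
names are illustrative):

{code}
// Sketch: the buffered delete-by-term carries a stale, very high max doc id,
// so after the flush it knocks out the freshly updated document.
writer.deleteAll();
Document doc = new Document();
doc.add(new Field("id", "5", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.updateDocument(new Term("id", "5"), doc); // buffers a delete plus an add
writer.commit(); // flush writes doc "5", then the stale delete removes it
// expected: 1 document in the index; actual (before the fix): 0
{code}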

> Add deleteAllDocuments() method to IndexWriter
> --
>
> Key: LUCENE-1705
> URL: https://issues.apache.org/jira/browse/LUCENE-1705
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Index
>Affects Versions: 2.4
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch
>
>
> Ideally, there would be a deleteAllDocuments() or clear() method on the 
> IndexWriter
> This method should have the same performance and characteristics as:
> * currentWriter.close()
> * currentWriter = new IndexWriter(..., create=true,...)
> This would greatly optimize a delete all documents case. Using 
> deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large 
> existing index.
> IndexWriter.deleteAllDocuments() should have the same semantics as a 
> commit(), as far as index visibility goes (new IndexReader opening would get 
> the empty index)
> I see this was previously asked for in LUCENE-932, however it would be nice 
> to finally see this added such that the IndexWriter would not need to be 
> closed to perform the "clear" as this seems to be the general recommendation 
> for working with an IndexWriter now
> deleteAllDocuments() method should:
> * abort any background merges (they are pointless once a deleteAll has been 
> received)
> * write new segments file referencing no segments
> This method would remove one of the final reasons I would ever need to close 
> an IndexWriter and reopen a new one.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725200#action_12725200
 ] 

Shai Erera commented on LUCENE-1720:


bq. I'm not familiar with the proposal to pass around a Timeout object

On the email thread I offered to add to QueryWeight a scorer(IndexSearcher, 
boolean, boolean, Timeout) method in order to pass a Timeout object down to the 
Scorer, and also to create a TimeLimitedQuery. But that's no longer needed.

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725197#action_12725197
 ] 

Mark Harwood commented on LUCENE-1720:
--

bq. any custom Scorer which does a lot of work, but uses IndexReader for that, 
will be stopped, even if the Scorer's developer did not implement a Timeout 
mechanism. Right?

Correct. I'm not familiar with the proposal to pass around a Timeout object but 
I get the idea and the code here would certainly avoid that overhead.

bq. We can clear it when the timed-out threads' Set's size() is 0?

Yes, that would work.


> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement

2009-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1722:


Attachment: LUCENE-1722.txt

patch file

> SmartChineseAnalyzer javadoc improvement
> 
>
> Key: LUCENE-1722
> URL: https://issues.apache.org/jira/browse/LUCENE-1722
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Priority: Minor
> Attachments: LUCENE-1722.txt
>
>
> Chinese -> English, and corrections to match reality (removes several javadoc 
> warnings)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725183#action_12725183
 ] 

Shai Erera commented on LUCENE-1720:


bq. With only a boolean it could be hard to know precisely when to clear it, no?

We can clear it when the timed-out threads' Set's size() is 0?

I agree that this issue is mostly about IndexReader (hence the name), and that 
the IndexWriter scenario is weaker. But a utility class together w/ the 
TimeLimitedIndexReader example can help someone write a TimeLimitedIndexWriter 
very easily, and/or reuse this utility elsewhere.

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725182#action_12725182
 ] 

Eks Dev commented on LUCENE-1720:
-

Sure, I just wanted to "sharpen the definition" of what is a Lucene core issue 
and what we can leave to end users. It is not only about time; it is really 
about canceling search requests (or, even better, general activities).

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement

2009-06-29 Thread Robert Muir (JIRA)
SmartChineseAnalyzer javadoc improvement


 Key: LUCENE-1722
 URL: https://issues.apache.org/jira/browse/LUCENE-1722
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor


Chinese -> English, and corrections to match reality (removes several javadoc 
warnings)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725176#action_12725176
 ] 

Mark Harwood commented on LUCENE-1720:
--

bq. Oh, I did not mean to skip this check.

But the check is on a variable with a yes/no state, and we need to cater for 
more than one simultaneous timeout condition in play. With only a boolean it 
could be hard to know precisely when to clear it, no?

bq. Mark here wanted to provide a much more generalized way of stopping any 
other activity, not just search

To be fair, I think the use case for IndexWriter is weaker. With a reader you 
have multiple users all issuing different queries and you want them all to 
share nicely with each other. In index writing it's typically a single batch 
system indexing docs, so there's no "fairness" to mediate. Breaking the timeout 
logic out into a utility class seems like a good idea anyway.

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725172#action_12725172
 ] 

Shai Erera commented on LUCENE-1720:


bq. ... quickly testing a single volatile boolean, "anActivityHasTimedOut".

Oh, I did not mean to skip this check. After anActivityHasTimedOut is true, 
instead of comparing Thread.currentThread() to firstAnticipatedThreadToFail, 
check if Thread.currentThread() is in the failed HashSet of threads, or 
something like that.

I totally agree this should be kept and used that way, and it's probably better 
than numberOfTimedOutThreads since we don't need to inc/dec the latter every 
failure, just set a boolean flag and test it.

bq. Imo, the problem can be reformulated as "provide the possibility to cancel 
running queries on a best effort basis, with or without returning the results 
collected so far".

That's where we started from, but Mark here wanted to provide a much more 
generalized way of stopping any other activity, not just search. With this 
utility class, someone can implement a TimeLimitedIndexWriter which times out 
indexing, merging etc. Search is just one operation which will be covered as 
well.
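
As a loose sketch of that idea (ActivityTimeMonitor's start/stop method names 
here are assumptions based on the attached ActivityTimeMonitor.java, and a real 
implementation would need the writer's long-running internals to poll the 
monitor):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class TimeLimitedIndexWriter {
  private final IndexWriter delegate;

  public TimeLimitedIndexWriter(IndexWriter delegate) {
    this.delegate = delegate;
  }

  public void addDocument(Document doc, long timeoutMillis) throws IOException {
    ActivityTimeMonitor.start(timeoutMillis); // register this thread's deadline (assumed API)
    try {
      delegate.addDocument(doc); // internals would check the monitor and throw on timeout
    } finally {
      ActivityTimeMonitor.stop(); // deregister the timed activity (assumed API)
    }
  }
}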

I also think that TimeLimitingCollector already provides a way to "cancel 
running queries on a best effort basis", so someone interested in just that 
doesn't need TimeLimitedIndexReader. However, this approach seems much simpler 
if you want to ensure queries are stopped ASAP, w/o passing a Timeout object 
around or anything. This approach also guarantees (I think) that any custom 
Scorer which does a lot of work, but uses IndexReader for that, will be 
stopped, even if the Scorer's developer did not implement a Timeout mechanism. 
Right?

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725168#action_12725168
 ] 

Eks Dev commented on LUCENE-1720:
-

It is maybe late for this issue, but worth thinking about: we could change the 
semantics of this problem completely. Imo, the problem can be reformulated as 
"provide the possibility to cancel running queries on a best effort basis, 
with or without returning the results collected so far".

That would leave timer management to the end users and keep the issue focused 
on the "Lucene core" part ... Timeout management can then be provided as an 
example somewhere: "How to implement timeout management using ..."

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725164#action_12725164
 ] 

Mark Harwood commented on LUCENE-1720:
--

Currently the class hinges on a "fast fail" mechanism whereby the many calls 
checking for a timeout are very quickly testing a single volatile boolean, 
"anActivityHasTimedOut".
99.99% of calls are expected to fail this test (nothing has timed out) and 
return quickly - I was reluctant to add any HashSet lookup etc. in there to 
determine failure.

With that as a guiding principle, maybe the solution is to change

volatile boolean anActivityHasTimedOut;

into

volatile int numberOfTimedOutThreads;

which would cater for >1 error condition at once. The fast-fail check then 
becomes:

if (numberOfTimedOutThreads > 0)
{
    // slow path: only entered once some activity somewhere has timed out
    if (timedOutThreads.contains(Thread.currentThread()))
    {
        timedOutThreads.remove(Thread.currentThread());
        numberOfTimedOutThreads = timedOutThreads.size();
        // in practice an ActivityTimedOutException (see attachments)
        throw new RuntimeException("activity timed out");
    }
}
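
(This assumes timedOutThreads is a thread-safe Set - the timeout monitor 
thread adds entries while worker threads remove themselves - which keeps the 
hot path down to a single volatile int read.)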

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: customizing lucene formula

2009-06-29 Thread Grant Ingersoll

See the Payloads functionality along with the BoostingTermQuery.
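
For example, a rough sketch (the scorePayload signature below follows the 
Lucene 2.4-era API and may differ in your version; the one-byte payload 
encoding and the "terms" field are illustrative assumptions):

import org.apache.lucene.search.DefaultSimilarity;

// Similarity that turns each term occurrence's indexed payload into a score
// factor; BoostingTermQuery is what triggers the scorePayload() calls.
public class PayloadSimilarity extends DefaultSimilarity {
  public float scorePayload(String fieldName, byte[] payload, int offset,
      int length) {
    // assumes the analyzer stored a one-byte weight as each token's payload
    return payload == null ? 1.0f : (float) payload[offset];
  }
}

// Usage:
//   searcher.setSimilarity(new PayloadSimilarity());
//   Query q = new BoostingTermQuery(new Term("terms", "word"));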


On Jun 28, 2009, at 6:23 PM, B0DYLANG wrote:

Thanks for your response. What I want to do is add a function like log to the 
well-known Lucene formula; this function will take its argument from the 
already indexed data. For example, if we add a field like this:

new Field("terms", "word,100;word,300", ...);

then when the score is returned, the second word will have a higher score than 
the first one.

Grant Ingersoll-6 wrote:


The source code is available. I'd start with the Similarity class and see if 
it can be used. Before that, however, you might describe what it is you are 
interested in doing. Perhaps there is an alternate way that doesn't involve 
editing the source.


On Jun 26, 2009, at 4:31 AM, B0DYLANG wrote:

Dears,

I want to add some arguments to the Lucene formula, or override it. Is there a 
means of doing so? Thanks for your response.
--
View this message in context:
http://www.nabble.com/customizing-lucene-formula-tp24216772p24216772.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org





--
View this message in context: 
http://www.nabble.com/customizing-lucene-formula-tp24216772p24246152.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725144#action_12725144
 ] 

Shai Erera commented on LUCENE-1720:


In stop(), shouldn't the 'else' part be reached only if the 
firstAnticipatedThreadToFail == Thread.currentThread()? Currently, if no thread 
has timed out, and I'm not the firstAnticipatedThreadToFail, the code will 
still look for a new candidate, and probably find the same 
firstAnticipatedThreadToFail. Right?

Also, even though it's somewhat mentioned in the class, we don't support 
multiple timing-out threads, and I'm not sure that's good. Currently, if two 
threads time out, and the thread calling checkTimeOutIsThisThread() is not 
firstAnticipatedThreadToFail, it will continue processing. That may not be 
good if the other thread is busy-waiting somewhere and may not call 
checkTimeOutIsThisThread() for a long time.

What if we change firstAnticipatedThreadToFail to a HashSet and call 
contains()? It's slower than '==', but safer, and safety is also an important 
aspect of this utility. TimeoutThread can add all the timed-out threads to 
this HashSet when it detects a timeout has occurred (by iterating over all the 
'registered' threads and their expected timeout times, comparing them to the 
current time). What do you think?

> TimeLimitedIndexReader and associated utility class
> ---
>
> Key: LUCENE-1720
> URL: https://issues.apache.org/jira/browse/LUCENE-1720
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: ActivityTimedOutException.java, 
> ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
> TimeLimitedIndexReader.java
>
>
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches 
> e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly 
> before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as 
> yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-06-29 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725141#action_12725141
 ] 

Tim Smith commented on LUCENE-1721:
---

I suppose even that approach would cause problems if segments merge between 
getting the segment number/local doc pair and actually asking for the delete.

> IndexWriter to allow deletion by doc ids
> 
>
> Key: LUCENE-1721
> URL: https://issues.apache.org/jira/browse/LUCENE-1721
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. 
> It makes sense for cases where a "query" has been executed beforehand, and 
> later, that query needs to be applied in order to delete the matched 
> documents.
> More information here: 
> http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-06-29 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725140#action_12725140
 ] 

Tim Smith commented on LUCENE-1721:
---

How about a delete method on the IndexWriter that takes a segment number and a 
document id?

It would also be required to add methods to the IndexReader to get the segment 
number and local document id for a docid, but this should then work just fine.
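
A hypothetical sketch of what those additions could look like (none of this 
exists in Lucene; the names are illustrative only, and as the follow-up above 
notes, a concurrent merge can invalidate the pair):

// Hypothetical API sketch; not part of Lucene, names illustrative only.
public interface SegmentAwareDeletion {

  // IndexReader side: resolve a top-level docId into a (segment, local id)
  // address, valid only until that segment is merged away.
  String segmentNameOf(int docId);
  int segmentLocalDocIdOf(int docId);

  // IndexWriter side: buffer a delete for that (segment, local doc id) pair.
  void deleteDocument(String segmentName, int segmentLocalDocId);
}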

> IndexWriter to allow deletion by doc ids
> 
>
> Key: LUCENE-1721
> URL: https://issues.apache.org/jira/browse/LUCENE-1721
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. 
> It makes sense for cases where a "query" has been executed beforehand, and 
> later, that query needs to be applied in order to delete the matched 
> documents.
> More information here: 
> http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-06-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725109#action_12725109
 ] 

Michael McCandless commented on LUCENE-1721:


This is a frequently requested feature, and I agree it'd be useful, but the 
problem is that a docID is in general not usable in the context of a writer, 
since docIDs shift when segments that have deletions are merged (for example, 
a doc numbered 100 becomes 98 once a merge reclaims two deleted docs below it).

> IndexWriter to allow deletion by doc ids
> 
>
> Key: LUCENE-1721
> URL: https://issues.apache.org/jira/browse/LUCENE-1721
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>
> It would be great if IndexWriter would allow for deletion by doc ids as well. 
> It makes sense for cases where a "query" has been executed beforehand, and 
> later, that query needs to be applied in order to delete the matched 
> documents.
> More information here: 
> http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-06-29 Thread Shay Banon (JIRA)
IndexWriter to allow deletion by doc ids


 Key: LUCENE-1721
 URL: https://issues.apache.org/jira/browse/LUCENE-1721
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon


It would be great if IndexWriter would allow for deletion by doc ids as well. 
It makes sense for cases where a "query" has been executed beforehand, and 
later, that query needs to be applied in order to delete the matched documents.

More information here: 
http://www.nabble.com/Delete-by-docId-in-IndexWriter-td24239930.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org