date:20090406

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-1575:
---

Attachment: LUCENE-1575.8.patch

Added JustCompileSearch, JustCompileSearchFunction and JustCompileSearchSpans
that extend/implement all abstract classes/interfaces in o.a.l.s, o.a.l.s.s and
o.a.l.s.f. Those are not unit tests per se; however, if anyone changes the
interfaces/abstract classes in a way that breaks back-compat, we'll know it
right away. I think that in general this is something good to have for Lucene
overall, but I only took care of the search.* packages in this patch.
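
The idea behind such compile-only classes can be sketched roughly as below. This is an illustration, not code from the patch: AbstractCollectorLike, JustCompileExample and JustCompileDemo are hypothetical names, with AbstractCollectorLike standing in for any abstract class in the search packages.

```java
// Hypothetical stand-in for an abstract class in o.a.l.s (not a real Lucene type).
abstract class AbstractCollectorLike {
    abstract void collect(int doc);
    abstract void setDocBase(int docBase);
}

// Compile-only subclass: every abstract method is overridden and simply throws.
// If a new abstract method is added to AbstractCollectorLike, this class stops
// compiling, which flags the back-compat break at build time.
final class JustCompileExample extends AbstractCollectorLike {
    static final String MSG = "just-compile class: not meant to be called";

    @Override
    void collect(int doc) {
        throw new UnsupportedOperationException(MSG);
    }

    @Override
    void setDocBase(int docBase) {
        throw new UnsupportedOperationException(MSG);
    }
}

final class JustCompileDemo {
    public static void main(String[] args) {
        try {
            new JustCompileExample().collect(0);
        } catch (UnsupportedOperationException e) {
            // Expected: the class exists only to be compiled.
            System.out.println(e.getMessage());
        }
    }
}
```

The value is entirely at compile time; nothing in such a class is expected to run in a test.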

Refactoring Lucene collectors (HitCollector and extensions)
---

Key: LUCENE-1575
URL: https://issues.apache.org/jira/browse/LUCENE-1575
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Shai Erera
Assignee: Michael McCandless
Fix For: 2.9

Attachments: LUCENE-1575.1.patch, LUCENE-1575.2.patch,
LUCENE-1575.3.patch, LUCENE-1575.4.patch, LUCENE-1575.5.patch,
LUCENE-1575.6.patch, LUCENE-1575.7.patch, LUCENE-1575.8.patch,
LUCENE-1575.patch, LUCENE-1575.patch, LUCENE-1575.patch, PerfTest.java,
sortBench5.py, sortCollate5.py

This issue is a result of a recent discussion we've had on the mailing list.
You can read the thread
[here|http://www.nabble.com/Is-TopDocCollector%27s-collect()-implementation-correct--td22557419.html].
We have agreed to do the following refactoring:
* Rename MultiReaderHitCollector to Collector, with the purpose that it will
be the base class for all Collector implementations.
* Deprecate HitCollector in favor of the new Collector.
* Introduce new methods in IndexSearcher that accept Collector, and deprecate
those that accept HitCollector.
** Create a final class HitCollectorWrapper, and use it in the deprecated
methods in IndexSearcher, wrapping the given HitCollector.
** HitCollectorWrapper will be marked deprecated, so we can remove it in 3.0,
when we remove HitCollector.
** It will remove any instanceof checks that currently exist in IndexSearcher
code.
* Create a new (abstract) TopDocsCollector, which will:
** Leave collect and setNextReader unimplemented.
** Introduce protected members PriorityQueue and totalHits.
** Introduce a single protected constructor which accepts a PriorityQueue.
** Implement topDocs() and getTotalHits() using the PQ and totalHits members.
These can be used as-are by extending classes, as well as be overridden.
** Introduce a new topDocs(start, howMany) method which will be used as a
convenience method when implementing a search application which allows paging
through search results. It will also attempt to improve the memory
allocation, by allocating a ScoreDoc[] of the requested size only.
* Change TopScoreDocCollector to extend TopDocsCollector, use the topDocs()
and getTotalHits() implementations as they are from TopDocsCollector. The
class will also be made final.
* Change TopFieldCollector to extend TopDocsCollector, and make the class
final. Also implement topDocs(start, howMany).
* Change TopFieldDocCollector (deprecated) to extend TopDocsCollector,
instead of TopScoreDocCollector. Implement topDocs(start, howMany)
* Review other places where HitCollector is used, such as in Scorer,
deprecate those places and use Collector instead.
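
The topDocs(start, howMany) paging idea from the list above could look roughly like this. It is a simplified sketch under stated assumptions, not the patch: Hit and PagingTopHits are hypothetical names, java.util.PriorityQueue stands in for Lucene's own PriorityQueue, and the score is still passed to collect for brevity.

```java
import java.util.PriorityQueue;

// Hypothetical, simplified stand-in for a scored document (not Lucene's ScoreDoc).
final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

// Hypothetical collector that keeps the best maxSize hits and serves pages.
final class PagingTopHits {
    // Min-heap on score: the weakest of the current top hits sits at the head.
    private final PriorityQueue<Hit> pq =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private final int maxSize;
    private int totalHits;

    PagingTopHits(int maxSize) {
        if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be > 0");
        this.maxSize = maxSize;
    }

    void collect(int doc, float score) {
        totalHits++;
        if (pq.size() < maxSize) {
            pq.add(new Hit(doc, score));
        } else if (score > pq.peek().score) {
            pq.poll();                      // evict the weakest hit
            pq.add(new Hit(doc, score));
        }
    }

    int getTotalHits() { return totalHits; }

    // Returns the hits ranked [start, start + howMany), best first, in an array
    // sized to the page rather than to the whole queue. Drains the queue.
    Hit[] topDocs(int start, int howMany) {
        int size = pq.size();
        if (start < 0 || start >= size || howMany <= 0) {
            return new Hit[0];
        }
        int count = Math.min(howMany, size - start);
        Hit[] page = new Hit[count];
        // The heap pops worst-first, so first drop the hits ranked after the page.
        for (int i = size - start - count; i > 0; i--) {
            pq.poll();
        }
        // Fill back to front so index 0 holds the best hit of the page.
        for (int i = count - 1; i >= 0; i--) {
            page[i] = pq.poll();
        }
        return page;
    }
}
```

Only the requested page is allocated; hits ranked better than the page stay in the queue and are never copied.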
Additionally, the following proposal was made w.r.t. decoupling score from
collect():
* Change collect to accept only a doc Id (unbased).
* Introduce a setScorer(Scorer) method.
* If during collect the implementation needs the score, it can call
scorer.score().
If we do this, then we need to review all places in the code where
collect(doc, score) is called, and check whether a Scorer can be passed. Also
this raises a few questions:
* What if during collect() Scorer is null? (i.e., not set) - is it even
possible?
* I noticed that many (if not all) of the collect() implementations discard
the document if its score is not greater than 0. Doesn't it mean that score
is needed in collect() always?
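
The proposed decoupling can be sketched as follows. All types here are hypothetical stand-ins (ScoreSource, DocCollector, ArrayScorer and the two collectors are not Lucene classes), and the final API in the patch may differ; the point is only that a collector asks for the score when it needs one.

```java
// Hypothetical stand-in for a Scorer that can report the current doc's score.
interface ScoreSource {
    float score();
}

// Hypothetical stand-in for the proposed Collector shape.
abstract class DocCollector {
    abstract void setScorer(ScoreSource scorer);
    abstract void collect(int doc);   // doc id only; no score argument
}

// Needs no score at all, so it never touches the scorer.
final class CountingCollector extends DocCollector {
    int count;
    @Override void setScorer(ScoreSource scorer) { /* score not needed */ }
    @Override void collect(int doc) { count++; }
}

// Needs the score, so it asks the scorer on demand and drops non-positive scores.
final class PositiveScoreCollector extends DocCollector {
    private ScoreSource scorer;
    int accepted;
    @Override void setScorer(ScoreSource scorer) { this.scorer = scorer; }
    @Override void collect(int doc) {
        if (scorer == null) {
            throw new IllegalStateException("setScorer was not called");
        }
        if (scorer.score() > 0.0f) {
            accepted++;
        }
    }
}

// Toy scorer backed by an array; 'current' is the doc being collected.
final class ArrayScorer implements ScoreSource {
    private final float[] scores;
    int current;
    ArrayScorer(float[] scores) { this.scores = scores; }
    @Override public float score() { return scores[current]; }
}

final class SetScorerDemo {
    // Mimics the caller: set the scorer once, then collect doc by doc.
    static void drive(DocCollector c, ArrayScorer s, int numDocs) {
        c.setScorer(s);
        for (int doc = 0; doc < numDocs; doc++) {
            s.current = doc;
            c.collect(doc);
        }
    }

    public static void main(String[] args) {
        float[] scores = {1.5f, 0.0f, -1.0f, 0.2f};
        PositiveScoreCollector p = new PositiveScoreCollector();
        drive(p, new ArrayScorer(scores), scores.length);
        System.out.println(p.accepted);
    }
}
```

In this sketch a collector that filters on score pays for the score() call, while a pure counting collector does not, which is the trade-off the questions above are probing.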
Open issues:
* The name for Collector
* TopDocsCollector was mentioned on the thread as TopResultsCollector, but
that was when we thought to call Collector ResultsCollector. Since we decided
(so far) on Collector, I think TopDocsCollector makes sense, because of its
TopDocs output.
* Decoupling score from collect().
I will post a patch a bit later, as this is expected to be a very large
patch. I will split it into 2: (1) code patch (2) test cases (moving to use
Collector instead of HitCollector, as well as testing the new topDocs(start,
howMany) method).
There might be even a 3rd patch which handles the setScorer thing in
Collector (maybe even a different issue?)

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696020#action_12696020
]

Shai Erera commented on LUCENE-1575:

I'm using the latest version which sorts by that random field (the output
includes the prints of best, avg. and sum, so I'm sure of that). Also, the
times I reported are the 'best' time. I launch the JRE like you posted with
those args: -Xms1024M -Xmx1024M -Xbatch -server.

I reran now, and the results are consistent.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696017#action_12696017
]

Michael McCandless commented on LUCENE-1575:

Mark and Shai, are you guys using the latest version of the bench (that sorts by
a random int field)? Are you using the best time for your results? How are
you launching the JRE?

bq. BTW, if you look at Mike's table above, it's a black and white thing: the
1.5 JRE really likes this patch and 1.6 really hates it. Maybe we should not move
to 1.6 then?

Actually, for my run on Linux, the patch was faster for both the 1.5 and 1.6 JREs.


[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Mark Miller (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696084#action_12696084
]

Mark Miller commented on LUCENE-1575:
-

I just used the defaults for cmd line - I can give it another go ensuring
server and more RAM. I used the latest perf code provided by Mike and the
latest patch.

I didn't look at the numbers too closely - my plan was to do a quick profile
with each, but eyeballing runs with each over and over, they were approx the
same (both best and avg), so I skipped the profiling.


[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696101#action_12696101
 ] 

Michael McCandless commented on LUCENE-1575:



I ran 2 more JREs under linux:

||OS||JRE||Trunk||Patch||%tg||
||Linux|1.7.0 ea|333 ms|320 ms|{color:green}3.9%{color}|
||Linux|IBM JRE 1.5.0|401 ms|352 ms|{color:green}12.2%{color}|



[jira] Created: (LUCENE-1587) RangeQuery equals method does not compare collator property fully

2009-04-06 Thread Mark Platvoet (JIRA)

RangeQuery equals method does not compare collator property fully
-

 Key: LUCENE-1587
 URL: https://issues.apache.org/jira/browse/LUCENE-1587
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.4.1
Reporter: Mark Platvoet
Priority: Minor


The equals method in the range query has the collator comparison implemented as:
(this.collator != null && ! this.collator.equals(other.collator))

When _this.collator = null_ and _other.collator = someCollator_ this method
will incorrectly assume they are equal.

So adding something like
|| (this.collator == null && other.collator != null)
would fix the problem.
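
A null-safe comparison covering both cases might look like the sketch below. CollatorEquality and sameCollator are hypothetical names used only for illustration; they are not part of RangeQuery or of any attached patch.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorEquality {
    // True when both collators are null, or both are non-null and equal.
    // Unlike a check that only looks at the left side, this also reports a
    // mismatch when the left collator is null and the right one is not.
    static boolean sameCollator(Collator a, Collator b) {
        return a == null ? b == null : a.equals(b);
    }

    public static void main(String[] args) {
        Collator someCollator = Collator.getInstance(Locale.ENGLISH);
        System.out.println(sameCollator(null, someCollator));
        System.out.println(sameCollator(someCollator, someCollator));
    }
}
```

An equals method would then return false whenever sameCollator(this.collator, other.collator) is false.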




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-1588) Update Spatial Lucene sort to use FieldComparatorSource

2009-04-06 Thread patrick o'leary (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated LUCENE-1588:


Attachment: LUCENE-1588.patch

Deprecate DistanceSortSource and add DistanceFieldComparator.
Updated the test case to use DistanceFieldComparator.

Usage
{code}
// Create a distance sort.
// The radius filter has already performed the distance calculations,
// so pass in the filter to reuse the results.
DistanceFieldComparatorSource dsort =
    new DistanceFieldComparatorSource(dq.distanceFilter);
Sort sort = new Sort(new SortField("foo", dsort, false));

// Perform the search, using the term query, the serial chain filter, and the
// distance sort
Hits hits = searcher.search(customScore, dq.getFilter(), sort);
{code}

If nobody objects I'll apply this later today

 Update Spatial Lucene sort to use FieldComparatorSource
 ---

 Key: LUCENE-1588
 URL: https://issues.apache.org/jira/browse/LUCENE-1588
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Affects Versions: 2.9
Reporter: patrick o'leary
Assignee: patrick o'leary
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1588.patch


 Update distance sorting to use FieldComparator sorting as opposed to 
 SortComparator


[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696145#action_12696145
]

Shai Erera commented on LUCENE-1575:

So how do we proceed? It looks like we get inconsistent results, sometimes with
the same OS and JRE, just on a different machine. Perhaps the test is too synthetic,
although it does capture the essence of the changes. Mike, can you post your
Wikipedia index somewhere so I can download it, run your previous queries and
compare the results?


[jira] Updated: (LUCENE-1587) RangeQuery equals method does not compare collator property fully

2009-04-06 Thread Mark Miller (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1587:


Attachment: LUCENE-1587.patch

 RangeQuery equals method does not compare collator property fully
 -

 Key: LUCENE-1587
 URL: https://issues.apache.org/jira/browse/LUCENE-1587
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.4.1
Reporter: Mark Platvoet
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1587.patch



[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696160#action_12696160
 ] 

Michael McCandless commented on LUCENE-1575:


bq. So how do we proceed?

The results are definitely highly varying...

It seems like I'm the only one seeing sizable performance loss with the patch,
and then only with 64-bit JREs (on OS X and Windows Server 2004 x64).

Mark, when you saw no performance loss on 64-bit Linux, was the JRE
64-bit?

If so, then maybe we should simply proceed with the patch as is.
These differences are clearly Java ghosts and there's not much we can
do about that.

The index is a little too large (2.6G) to schlepp around -- instead,
here's the alg I used to create it:

{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer

doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker

merge.policy=org.apache.lucene.index.LogDocMergePolicy

docs.file=/Volumes/External/lucene/wiki.txt
doc.stored = false
doc.term.vector = false
doc.add.log.step=1000
max.field.length=2147483647

directory=FSDirectory
autocommit=false
compound=false
ram.flush.mb = 128
doc.maker.forever = false

work.dir=/lucene/work

{ "Rounds"
  ResetSystemErase
  { "BuildIndex"
    - CreateIndex
    { "AddDocs" AddDoc > : *
    - CloseIndex
  }
  NewRound
} : 1

RepSumByPrefRound BuildIndex
{code}



[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-06 Thread Mark Miller (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696178#action_12696178
]

Mark Miller commented on LUCENE-1575:
-

Yes, both 64-bit versions - openjdk 6 and sun java 1.5. I appeared to be
getting the same results with both JVMs, patched or not. I figured I'd try
a bit of profiling, since I have a 64-bit setup, but it doesn't appear I'd learn
much. I'm going to try a bit more testing tonight for the heck of it - I've got
sun 1.6 and a 32-bit 1.5 I could check with as well.


[jira] Commented: (LUCENE-1313) Realtime Search

2009-04-06 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696186#action_12696186
 ] 

Jason Rutherglen commented on LUCENE-1313:
--

bq. So this has no external dependencies, right?

Yes.

{quote}I'd be very interested to compare (benchmark) this approach
vs solely LUCENE-1516.{quote}

Is the .alg using the NearRealtimeReader from LUCENE-1516 our
best measure of realtime performance?

{quote} 
the transactional restriction could/should layer on
top of this performance optimization for near-realtime search?
{quote}

The transactional system should be able to support both methods.
Perhaps a non-locking setting would allow the same RealtimeIndex
class to support both modes of operation?

 Realtime Search
 ---

 Key: LUCENE-1313
 URL: https://issues.apache.org/jira/browse/LUCENE-1313
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
 lucene-1313.patch, lucene-1313.patch, lucene-1313.patch


 Realtime search with transactional semantics.  
 Possible future directions:
   * Optimistic concurrency
   * Replication
 Encoding each transaction into a set of bytes by writing to a RAMDirectory 
 enables replication.  It is difficult to replicate using other methods 
 because while the document may easily be serialized, the analyzer cannot.
 I think this issue can hold realtime benchmarks which include indexing and 
 searching concurrently.


[jira] Commented: (LUCENE-1313) Realtime Search

2009-04-06 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696277#action_12696277
]

Jason Rutherglen commented on LUCENE-1313:
--

We'll need to integrate the RAM-based indexer into IndexWriter
to carry over the deletes to the RAM index while it's copied to
disk. This is similar to IndexWriter.commitMergedDeletes
carrying deletes over at the segment reader level based on a
comparison of the current reader and the cloned reader.
Otherwise there are redundant deletions to the disk index using
IW.deleteDocuments, which can be unnecessarily expensive. To make
it external we would need to do the delete by doc id genealogy.


[jira] Created: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-06 Thread Jason Rutherglen (JIRA)

IndexWriter.addIndexesNoOptimize(IndexReader[] readers)
---

 Key: LUCENE-1589
 URL: https://issues.apache.org/jira/browse/LUCENE-1589
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 2.9


Similar to IndexWriter.addIndexesNoOptimize(Directory[] dirs)
but for IndexReaders. This will be used to flush cloned ram
indexes to disk for near realtime indexing.


Re: HitCollector#collect(int,float,Collection<Query>)

2009-04-06 Thread Shai Erera

Hi Karl,

LUCENE-1575 refactors HitCollector by separating the score from document
collection. So if we were to introduce this type of method (that you
suggest), it would be through a setQueries(Collection<Query>) method.

Maybe you could first verify that your use case makes sense, is efficient etc.,
before we make this change. Adding a setQueries to Collector (the new name of
HC) shouldn't be a problem since we can always add an empty-impl method, not
affecting back-compat. However, I wonder where it would be called from, and
whether it makes sense to create that Collection object and pass it around
while knowing that most collectors will not use it.
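
To illustrate the empty-impl point, here is a rough sketch. Every name in it is hypothetical (ExtensibleCollector, LegacyCounter, QueryAwareCollector), and Collection<String> stands in for a collection of matching queries; it is not the Collector from LUCENE-1575.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical base class standing in for the new Collector.
abstract class ExtensibleCollector {
    abstract void collect(int doc);

    // Added later with an empty body: existing subclasses keep compiling
    // and simply ignore the call.
    void setQueries(Collection<String> matchingQueries) { }
}

// Written before setQueries existed; unaffected by its addition.
final class LegacyCounter extends ExtensibleCollector {
    int count;
    @Override void collect(int doc) { count++; }
}

// Opts in by overriding the new hook.
final class QueryAwareCollector extends ExtensibleCollector {
    Collection<String> lastQueries;
    int count;
    @Override void setQueries(Collection<String> matchingQueries) {
        lastQueries = matchingQueries;
    }
    @Override void collect(int doc) { count++; }
}

final class EmptyImplDemo {
    public static void main(String[] args) {
        QueryAwareCollector aware = new QueryAwareCollector();
        aware.setQueries(Arrays.asList("title:lucene", "body:collector"));
        aware.collect(3);
        System.out.println(aware.lastQueries.size());
    }
}
```

The sketch does not answer the open question of who would call setQueries, or how often the collection would have to be built.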

Is it something that you could perhaps implement by extending Collector (and
some other classes), and calling setQueries from your extending code? Today,
as far as I remember, only Scorer calls collect(), and I'm not sure if Scorer
has the information about the matching queries. Even if it does, extending it
and calling setQueries from the extension seems more reasonable than adding
such a call to every query execution, which also means instantiating a new
Collection<Query> for every search (unless we provide an API on
IndexSearcher which allows you to pass such an object).

What do you think?

On Tue, Apr 7, 2009 at 3:21 AM, Karl Wettin <karl.wet...@gmail.com> wrote:

> How crazy would it be to refactor HitCollector so it also accepts the
> matching queries?
>
> Let's ignore my use case (not sure it makes sense yet, it's related to
> finding a threshold between probably interesting and definitely not
> interesting results of huge OR-statements, but I really have to try it out
> before I can say if it's any good) and just focus on the speed impact. If I
> cleared and reused the Collection passed down to the HitCollector then it
> shouldn't really slow things down, right? And if I reused the collections in
> my TopDocsCollector as low scoring results were pushed down then it shouldn't
> have to be expensive there either. Or?
>
>
> karl

