[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527304
 ] 

Michael McCandless commented on LUCENE-847:
---

> Looks like some anomalous tests. Last night I checked twice, but
> today results are: 58 to 48 in favor of Concurrent. I am going to
> assume my first results were invalid. Sorry for the noise and
> thanks for the great patch.

OK, phew!

> Has passed quite a few stress tests I run on my app without any
> problems so far.

I'm glad to hear that :)  Thanks for being such an early adopter!

> Do both merge policies allow for a closer to constant add time or is
> it just the Concurrent policy?

Not sure I understand the question -- you mean addDocument?  Yes it's
only ConcurrentMergeScheduler that should keep addDocument calls
constant time, because SerialMergeScheduler will hijack the addDocument
thread to do its merges.
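
To make the difference concrete, here is a minimal sketch of the two behaviours. It is not the code from the patch: SketchMergeScheduler, SketchSerialScheduler, SketchConcurrentScheduler and MergeThreadDemo are hypothetical names invented for illustration, and the merge itself is just a Runnable. It only shows which thread ends up doing the merge work.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a merge scheduler; not Lucene's real MergeScheduler API.
interface SketchMergeScheduler {
    void merge(Runnable mergeWork);
}

// Serial flavor: the merge runs on the calling thread, so the caller
// (think: the thread inside addDocument) waits until the merge is finished.
final class SketchSerialScheduler implements SketchMergeScheduler {
    @Override
    public void merge(Runnable mergeWork) {
        mergeWork.run();
    }
}

// Concurrent flavor: the merge is handed to a background thread and the
// caller returns right away.
final class SketchConcurrentScheduler implements SketchMergeScheduler {
    @Override
    public void merge(Runnable mergeWork) {
        Thread t = new Thread(mergeWork, "sketch-merge-thread");
        t.setDaemon(true);
        t.start();
    }
}

final class MergeThreadDemo {
    // Reports the name of the thread that actually executed the merge work.
    static String threadThatMerged(SketchMergeScheduler scheduler) {
        final String[] name = new String[1];
        final CountDownLatch done = new CountDownLatch(1);
        scheduler.merge(() -> {
            name[0] = Thread.currentThread().getName();
            done.countDown();
        });
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for merge", e);
        }
        return name[0];
    }

    public static void main(String[] args) {
        System.out.println("serial merge ran on: "
                + threadThatMerged(new SketchSerialScheduler()));
        System.out.println("concurrent merge ran on: "
                + threadThatMerged(new SketchConcurrentScheduler()));
    }
}
```

With the serial flavor the reported thread is the caller's own, which is why its add time grows with the size of the merge; with the concurrent flavor it is the background thread.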

> Factor merge policy out of IndexWriter
> --
>
> Key: LUCENE-847
> URL: https://issues.apache.org/jira/browse/LUCENE-847
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Steven Parkes
>Assignee: Steven Parkes
> Fix For: 2.3
>
> Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, 
> LUCENE-847.patch.txt, LUCENE-847.take3.patch, LUCENE-847.take4.patch, 
> LUCENE-847.take5.patch, LUCENE-847.take6.patch, LUCENE-847.take7.patch, 
> LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, 
> making it possible for apps to choose a custom merge policy and for easier 
> experimenting with merge policy variants.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527300
 ] 

Mark Miller commented on LUCENE-847:


Looks like some anomalous tests. Last night I checked twice, but today results 
are: 58 to 48 in favor of Concurrent. I am going to assume my first results 
were invalid. Sorry for the noise and thanks for the great patch. Has passed 
quite a few stress tests I run on my app without any problems so far. Do both 
merge policies allow for a closer to constant add time or is it just the 
Concurrent policy?




Oracle-Lucene integration (OJVMDirectory and Lucene Domain Index) - LONG

2007-09-13 Thread J. Delgado
I'm very happy to announce the partial rework and extension to LUCENE-724
(Oracle-Lucene Integration), primarily based on new requirements from
LendingClub.com, who commissioned the work from Marcelo Ochoa, the contributor
of the original patch (great job Marcelo!). As LendingClub.com's contribution
to the Lucene community, we have posted the code on a public CVS repository
(SourceForge), as explained below.

Here at Lending Club (www.lendingclub.com) we have very specific needs
regarding the indexing of both structured and unstructured data, most of it
transactional in nature and sitting in our Oracle 10gR2 DB, with a highly
complex schema. Our "ranking" of loans in the inventory includes components
of exact, textual and hardcore mathematical calculations including time,
amount and spatial constraints. This integration of Lucene into Oracle as a
Domain Index will now allow us to query this inventory in real-time. Going
against the Lucene index, created on "synthetic documents" comprised of
fields being populated from diverse tables (user data store), eliminates the
need to create very complex joins to link data from different tables at
query time. This, along with the support of the full Lucene query language,
makes this a great alternative to:

   1. Using Lucene outside the database, which requires "crawling" the
   data and storing the index outside the database, losing all the benefits of
   a fully transactional system and a secure environment.
   2. Using Oracle Text, which is very powerful but lacks the
   extensibility and flexibility that Lucene offers (for example, being able to
   query the index directly from the Java layer or implementing our own ranking
   algorithm), though to be completely fair some of it is addressed in the new
   Oracle DB 11g version.

If anyone is interested in learning more about how we are going to use this
within Lending Club, please drop me a line. BTW, please make sure you check us out:
"Lending Club (http://www.lendingclub.com/), the rapidly growing
people-to-people (P2P) lending service that launched as a Facebook
application in May 2007, today announced the public availability of its
services with the launch of LendingClub.com. Lending Club connects lenders
and borrowers based upon shared affinities, enabling them to bypass banks to
secure better interest rates on loans"... more about the announcement here
http://www.sys-con.com/read/428678.htm. We have seen many entrepreneurs
applying for loans and being helped by regular people to build their
business with the money obtained at very low interest.

OK, without further marketing stuff (sorry for that), here is the original
note sent to me by Marcelo that summarizes all the new cool functionalities:

OJVMDirectory, a Lucene Integration running inside the Oracle JVM is going
one step further.

This new release includes:

   - Synchronized with the latest Lucene 2.2.0 production release.
   - Replaced the in-memory storage, which used a Vector-based implementation,
   with direct BLOB I/O, reducing memory usage for large indexes.
   - Support for user data stores: you are no longer limited to indexing one
   column at a time (a limit of the Data Cartridge API on 10g); you can now
   index multiple columns of the base table and columns of related tables
   joined together.
   - User data stores can be customized by the user: by writing a
   simple Java class, users can control which columns are indexed, the padding
   used, or any other processing done before the document-adding step.
   - There is a DefaultUserDataStore which gets all columns of the query
   and builds a Lucene Document with Fields representing each database
   column. These fields are automatically padded if they hold NUMBER data or
   rounded if they hold DATE data, for example.
   - The lcontains() SQL operator supports the full Lucene QueryParser syntax
   to provide access to all indexed columns; see the examples below.
   - Support for the DOMAIN_INDEX_SORT and FIRST_ROWS hints: if
   you want rows ordered by the lscore() operator (ascending or descending),
   the optimizer hint will assume that the Lucene Domain Index returns rowids
   in the proper order, avoiding an inline view to sort them.
   - Automatic index synchronization using AQ callbacks.
   - The Lucene Domain Index creates extra tables named IndexName$T and an
   Oracle AQ named IndexName$Q, with its storage table IndexName$QT, in the
   user's schema, so you can alter the storage preferences if you want.
   - The ojvm project is in SourceForge.net CVS, so anybody can get it and
   collaborate ;)
   - Tested against 10gR2 and 11g databases.


Some sample usages:

create table t2 (
 f4 number primary key,
 f5 VARCHAR2(200));
create table t1 (
 f1 number,
 f2 CLOB,
 f3 number,
 CONSTRAINT t1_t2_fk FOREIGN KEY (f3)
 REFERENCES t2(f4) ON DELETE cascade);
create index it1 on t1(f3) indextype is lucene.LuceneIndex
 parameters('Analyzer:org.apache.lucene.analysis.SimpleAnalyzer;ExtraCols:f2');

alter index it1
parameters('ExtraCols:f2,t2.f5;ExtraTabs:t2;WhereCondition:t1.f3=

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527297
 ] 

Michael McCandless commented on LUCENE-847:
---

> I have to triple check, but at first glance, my app's performance
> halved using the ConcurrentMergeScheduler on a recent Core Duo with
> 2 GB RAM (as compared to the SerialMergeScheduler). Seems unexpected?

Whoa, that's certainly unexpected!  I'll go re-run my perf test.




[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527295
 ] 

Michael McCandless commented on LUCENE-847:
---


> Today, applications use multiple threads on IndexWriter to get some
> concurrency on document parsing. With this patch, applications that
> want concurrent merges would simply use ConcurrentMergeScheduler,
> no?

True.  OK I will make SerialMergeScheduler.merge serialized.  This way
only one merge can happen at a time even when the application is using
multiple threads.





[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527289
 ] 

Mark Miller commented on LUCENE-847:


I have to triple check, but at first glance, my app's performance halved using 
the ConcurrentMergeScheduler on a recent Core Duo with 2 GB RAM (as compared to 
the SerialMergeScheduler). Seems unexpected?




[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527286
 ] 

Ning Li commented on LUCENE-847:


> This was actually intentional: I thought it fine if the application is
> sending multiple threads into IndexWriter to allow merges to run
> concurrently.  Because, the application can always back down to a
> single thread to get everything serialized if that's really required?

Today, applications use multiple threads on IndexWriter to get some concurrency 
on document parsing. With this patch, applications that want concurrent merges 
would simply use ConcurrentMergeScheduler, no?

Or a rename since it doesn't really serialize merges?




[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2007-09-13 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527264
 ] 

Sean Timm commented on LUCENE-997:
--

Here are some additional details on the changes.

New files:
TimeLimitedCollector.java

Extends HitCollector and detects timeouts, throwing a 
TimeLimitedCollector.TimeExceeded exception when the time limit is exceeded.

TimerThread.java

TimerThread provides a pseudo-clock service to all searching threads, so 
that they can count elapsed time with less overhead than repeatedly calling 
System.currentTimeMillis.  A single thread should be created to be used for all 
searches.
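
For readers who have not seen the pseudo-clock idea before, a rough sketch follows. It is not the TimerThread from the patch: SketchTimerThread, its resolution parameter and the use of a volatile counter are assumptions made for illustration. One background thread wakes up at a fixed interval and bumps a counter; searching threads read the counter instead of calling System.currentTimeMillis on every hit.

```java
// Hypothetical coarse clock; the attached TimerThread.java may differ in detail.
final class SketchTimerThread extends Thread {
    private final long resolutionMillis;
    // Written only by this thread, read by any number of searching threads.
    private volatile long elapsedMillis;

    SketchTimerThread(long resolutionMillis) {
        super("sketch-timer");
        this.resolutionMillis = resolutionMillis;
        setDaemon(true); // do not keep the JVM alive just for the clock
    }

    @Override
    public void run() {
        while (true) {
            try {
                Thread.sleep(resolutionMillis);
            } catch (InterruptedException e) {
                return; // interrupting the thread stops the clock
            }
            // Single writer, so this read-modify-write on a volatile is safe.
            elapsedMillis += resolutionMillis;
        }
    }

    // Cheap read: approximate milliseconds since the timer was started.
    long getMilliseconds() {
        return elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        SketchTimerThread timer = new SketchTimerThread(10);
        timer.start();
        Thread.sleep(100);
        System.out.println("approximate elapsed ms: " + timer.getMilliseconds());
    }
}
```

The trade-off is accuracy: the reading is only as fine as the chosen resolution, and it drifts if the timer thread is scheduled late, which is usually acceptable for a search timeout.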

Modified Files:
Hits.java

Added partial result flag.

IndexSearcher.java

Catches TimeLimitedCollector.TimeExceeded, sets the partial results flag on 
TopDocs and estimates what the total hit count would have been had we not 
timed out partway through.  Returns TopDocs with partial results.

Searcher.java

Added methods to set and get the timeout parameters.  This implementation 
decision has the limitation of only permitting a single timeout value per 
Searcher instance (of which there is usually only one).  However, this greatly 
minimizes the number of search methods that would need to be added.  In 
practice, I have not needed the functionality to change the timeout settings on 
a per query basis.

TopFieldDocCollector.java

Uses TimeLimitedCollector functionality.

TopDocCollector.java

Uses TimeLimitedCollector functionality and exposes it to child class 
TopFieldDocCollector.

TopDocs.java

Added partial results flag.  Note, TopFieldDocs extends this class and 
inherits the new functionality.
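
The overall flow described above (collector throws on timeout, searcher catches and flags the result as partial) can be sketched roughly as follows. This is an illustration only, not the patch: SketchTimeLimitedCollector, SketchSearchResult and the injected clock are hypothetical, documents are plain ints, and the hit-count estimation step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical collector that gives up once the allowed time has passed.
final class SketchTimeLimitedCollector {
    static final class TimeExceeded extends RuntimeException {
        TimeExceeded(String message) {
            super(message);
        }
    }

    private final LongSupplier clockMillis; // e.g. a coarse timer thread's reading
    private final long deadlineMillis;
    final List<Integer> docs = new ArrayList<>();

    SketchTimeLimitedCollector(LongSupplier clockMillis, long timeAllowedMillis) {
        this.clockMillis = clockMillis;
        this.deadlineMillis = clockMillis.getAsLong() + timeAllowedMillis;
    }

    void collect(int doc) {
        if (clockMillis.getAsLong() > deadlineMillis) {
            throw new TimeExceeded("time limit exceeded before doc " + doc);
        }
        docs.add(doc);
    }
}

// Hypothetical result holder carrying the partial-results flag.
final class SketchSearchResult {
    final List<Integer> docs;
    final boolean partial;

    private SketchSearchResult(List<Integer> docs, boolean partial) {
        this.docs = docs;
        this.partial = partial;
    }

    // "Searches" docs 0..maxDoc-1 and returns whatever was collected in time.
    static SketchSearchResult search(int maxDoc, LongSupplier clockMillis, long timeAllowedMillis) {
        SketchTimeLimitedCollector collector =
                new SketchTimeLimitedCollector(clockMillis, timeAllowedMillis);
        boolean partial = false;
        try {
            for (int doc = 0; doc < maxDoc; doc++) {
                collector.collect(doc);
            }
        } catch (SketchTimeLimitedCollector.TimeExceeded e) {
            partial = true; // keep what we have and flag it
        }
        return new SketchSearchResult(collector.docs, partial);
    }

    public static void main(String[] args) {
        SketchSearchResult r = search(1000, System::currentTimeMillis, 50);
        System.out.println("collected " + r.docs.size() + " docs, partial=" + r.partial);
    }
}
```

Passing the clock in as a parameter is a choice made here so the timeout path can be exercised with a fake clock; the description above instead keeps the timeout settings on the Searcher.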

> Add search timeout support to Lucene
> 
>
> Key: LUCENE-997
> URL: https://issues.apache.org/jira/browse/LUCENE-997
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Sean Timm
>Priority: Minor
> Attachments: LuceneTimeoutTest.java, timeout.patch
>
>
> This patch is based on Nutch-308. 
> This patch adds support for a maximum search time limit. After this time is 
> exceeded, the search thread is stopped, partial results (if any) are returned 
> and the total number of results is estimated.
> This patch tries to minimize the overhead related to time-keeping by using a 
> version of safe unsynchronized timer.
> This was also discussed in an e-mail thread.
> http://www.nabble.com/search-timeout-tf3410206.html#a9501029




[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

2007-09-13 Thread Sean Timm (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Timm updated LUCENE-997:
-

Attachment: LuceneTimeoutTest.java

Simple test case.  Run by passing in the index directory as an argument.




[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

2007-09-13 Thread Sean Timm (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Timm updated LUCENE-997:
-

Attachment: timeout.patch

Patch against trunk revision 575451.




[jira] Created: (LUCENE-997) Add search timeout support to Lucene

2007-09-13 Thread Sean Timm (JIRA)
Add search timeout support to Lucene


 Key: LUCENE-997
 URL: https://issues.apache.org/jira/browse/LUCENE-997
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Sean Timm
Priority: Minor


This patch is based on Nutch-308. 

This patch adds support for a maximum search time limit. After this time is 
exceeded, the search thread is stopped, partial results (if any) are returned 
and the total number of results is estimated.

This patch tries to minimize the overhead related to time-keeping by using a 
version of safe unsynchronized timer.

This was also discussed in an e-mail thread.
http://www.nabble.com/search-timeout-tf3410206.html#a9501029




[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527258
 ] 

Michael McCandless commented on LUCENE-847:
---

> Hmm, it's actually possible to have concurrent merges with
> SerialMergeScheduler.

This was actually intentional: I thought it fine if the application is
sending multiple threads into IndexWriter to allow merges to run
concurrently.  Because, the application can always back down to a
single thread to get everything serialized if that's really required?





[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527239
 ] 

Ning Li commented on LUCENE-847:


Hmm, it's actually possible to have concurrent merges with SerialMergeScheduler.

Making SerialMergeScheduler.merge synchronize on SerialMergeScheduler will 
serialize all merges. A merge can still be concurrent with a ram flush.

Making SerialMergeScheduler.merge synchronize on IndexWriter will serialize all 
merges and ram flushes.
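
A small sketch of the two locking choices, for illustration only. SketchWriter and SketchSerialMergeScheduler are hypothetical stand-ins, not the classes in the patch, and the merge is just a Runnable. The demo starts several threads and records how many merges were ever running at once when the merge method synchronizes on the scheduler.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the writer; a RAM flush takes the writer's monitor.
final class SketchWriter {
    synchronized void flushRam(Runnable flushWork) {
        flushWork.run();
    }
}

final class SketchSerialMergeScheduler {
    // Choice 1: lock the scheduler. Merges exclude each other, but a flush
    // (which locks the writer, a different object) can still run alongside.
    synchronized void merge(Runnable mergeWork) {
        mergeWork.run();
    }

    // Choice 2: lock the writer. Merges and RAM flushes all exclude each other.
    void mergeLockingWriter(SketchWriter writer, Runnable mergeWork) {
        synchronized (writer) {
            mergeWork.run();
        }
    }
}

final class SerializedMergeDemo {
    // Returns the largest number of merges observed running at the same time.
    static int maxOverlap(int threads) {
        final SketchSerialMergeScheduler scheduler = new SketchSerialMergeScheduler();
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger maxSeen = new AtomicInteger();
        final Runnable mergeWork = () -> {
            int now = running.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(5); // pretend to merge for a little while
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            running.decrementAndGet();
        };
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> scheduler.merge(mergeWork));
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max merges running at once: " + maxOverlap(4));
    }
}
```

Because every call has to take the scheduler's monitor first, the observed overlap stays at one no matter how many application threads call in.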




[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527227
 ] 

Michael McCandless commented on LUCENE-847:
---

Ahh, good catch.  Will fix!




[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527224
 ] 

Ning Li commented on LUCENE-847:


Access of mergeThreads in ConcurrentMergeScheduler.merge() should be 
synchronized.




[jira] Commented: (LUCENE-996) Parsing mixed inclusive/exclusive range queries

2007-09-13 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527204
 ] 

Hoss Man commented on LUCENE-996:
-


so this changes the query syntax such that foo:{a TO z] and foo:[a TO z} are 
now legal ... the querysyntax docs should be modified to mention this in the 
patch as well.

one hitch: this seems to break backwards compatibility for anyone who has 
previously subclassed QueryParser and overridden the getRangeQuery(String, 
String, String, boolean) method ... if someone defines that method in their 
query parser, it will now never be called -- even if they don't take advantage 
of the new syntax.

off the top of my head, one way to remain backwards compatible is to have a 
deprecated getRangeQuery(String, String, String, boolean) method which does the 
same thing it currently does, and have the new getRangeQuery(String, String, 
String, boolean, boolean) method call it if the booleans have the same value 
... if they don't have the same value then do the new stuff.  document that 
people subclassing QueryParser that want to override RangeQueries only need to 
override the double boolean method.
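
a rough sketch of that delegation idea, to make it concrete. this is not the real QueryParser: SketchQueryParser and LegacySubclassParser are made-up names, and the methods return a String instead of a Query so the example stands on its own.

```java
// Hypothetical parser base class; queries are plain Strings for illustration.
class SketchQueryParser {
    /** @deprecated override the five-argument method instead. */
    @Deprecated
    protected String getRangeQuery(String field, String lower, String upper,
                                   boolean inclusive) {
        // Same behaviour as before the change: one flag covers both bounds.
        return format(field, lower, upper, inclusive, inclusive);
    }

    protected String getRangeQuery(String field, String lower, String upper,
                                   boolean includeLower, boolean includeUpper) {
        if (includeLower == includeUpper) {
            // Old-style range: route through the old method so that existing
            // subclass overrides are still called.
            return getRangeQuery(field, lower, upper, includeLower);
        }
        // Mixed bounds are only expressible with the new syntax.
        return format(field, lower, upper, includeLower, includeUpper);
    }

    private static String format(String field, String lower, String upper,
                                 boolean includeLower, boolean includeUpper) {
        return field + ":" + (includeLower ? "[" : "{") + lower + " TO " + upper
                + (includeUpper ? "]" : "}");
    }
}

// A subclass written against the old API, overriding only the old method.
class LegacySubclassParser extends SketchQueryParser {
    @Override
    protected String getRangeQuery(String field, String lower, String upper,
                                   boolean inclusive) {
        return "legacy(" + super.getRangeQuery(field, lower, upper, inclusive) + ")";
    }
}

final class RangeQueryCompatDemo {
    public static void main(String[] args) {
        SketchQueryParser legacy = new LegacySubclassParser();
        // Old-style range still reaches the subclass override.
        System.out.println(legacy.getRangeQuery("foo", "a", "z", true, true));
        // Mixed range takes the new path.
        System.out.println(legacy.getRangeQuery("foo", "a", "z", false, true));
    }
}
```

the point of the sketch: as long as the parser itself only ever calls the five-argument method, an existing override of the four-argument method keeps working for foo:[a TO z] and foo:{a TO z}, and only the mixed forms bypass it.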

> Parsing mixed inclusive/exclusive range queries
> ---
>
> Key: LUCENE-996
> URL: https://issues.apache.org/jira/browse/LUCENE-996
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.2
>Reporter: Andrew Schurman
>Priority: Minor
> Attachments: lucene-996.patch
>
>
> The current query parser doesn't handle parsing a range query (i.e. 
> ConstantScoreRangeQuery) with mixed inclusive/exclusive bounds.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-09-13 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: spanhighlighter11.patch

Thanks a lot Andy. As I suspected, the issue is that the conversion from 
PhraseQuery to SpanQuery is inexact. I have updated the code to handle this 
case though. If a PhraseQuery has 0 slop then the created Span query will now 
force an inorder match. This should be a nice improvement to the PhraseQuery to 
SpanQuery approximation.

Patch with fix and new junit test attached.

patch 11

- Mark

> Extend contrib Highlighter to properly support phrase queries and span queries
> --
>
> Key: LUCENE-794
> URL: https://issues.apache.org/jira/browse/LUCENE-794
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Mark Miller
>Priority: Minor
> Attachments: CachedTokenStream.java, CachedTokenStream.java, 
> CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, 
> Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, 
> Highlighter.java, HighlighterTest.java, HighlighterTest.java, 
> HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, 
> QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, 
> QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, 
> spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter2.patch, 
> spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
> spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
> spanhighlighter_patch_4.zip, SpanHighlighterTest.java, 
> SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, 
> WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
> package that scores just like QueryScorer, but scores a 0 for Terms that did 
> not cause the Query hit. This gives 'actual' hit highlighting for the range 
> of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
> to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-996) Parsing mixed inclusive/exclusive range queries

2007-09-13 Thread Andrew Schurman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schurman updated LUCENE-996:
---

Attachment: lucene-996.patch

Potential fix for revision 574260.

> Parsing mixed inclusive/exclusive range queries
> ---
>
> Key: LUCENE-996
> URL: https://issues.apache.org/jira/browse/LUCENE-996
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.2
> Reporter: Andrew Schurman
> Priority: Minor
> Attachments: lucene-996.patch
>
>
> The current query parser doesn't handle parsing a range query (i.e. 
> ConstantScoreRangeQuery) with mixed inclusive/exclusive bounds.




[jira] Created: (LUCENE-996) Parsing mixed inclusive/exclusive range queries

2007-09-13 Thread Andrew Schurman (JIRA)
Parsing mixed inclusive/exclusive range queries
---

Key: LUCENE-996
URL: https://issues.apache.org/jira/browse/LUCENE-996
Project: Lucene - Java
Issue Type: Improvement
Components: QueryParser
Affects Versions: 2.2
Reporter: Andrew Schurman
Priority: Minor


The current query parser doesn't handle parsing a range query (i.e. 
ConstantScoreRangeQuery) with mixed inclusive/exclusive bounds.
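To make "mixed inclusive/exclusive bounds" concrete, here is a minimal, self-contained sketch. It assumes the usual range syntax, where square brackets mark an inclusive bound and curly braces an exclusive one, so `[a TO c}` would include `a` but exclude `c`. The class `MixedRangeSketch` and its methods are hypothetical names for illustration only; this is not the QueryParser grammar and not the attached patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a toy range parser, not Lucene's QueryParser. */
final class MixedRangeSketch {
    // Assumption: '[' / ']' mean inclusive, '{' / '}' mean exclusive,
    // and the two ends of one range may use different bracket kinds.
    private static final Pattern RANGE =
        Pattern.compile("([\\[{])\\s*(\\S+)\\s+TO\\s+(\\S+?)\\s*([\\]}])");

    final String lower;
    final String upper;
    final boolean includeLower;
    final boolean includeUpper;

    MixedRangeSketch(String lower, String upper,
                     boolean includeLower, boolean includeUpper) {
        this.lower = lower;
        this.upper = upper;
        this.includeLower = includeLower;
        this.includeUpper = includeUpper;
    }

    /** Parses text such as "[a TO c}" into bounds plus one flag per end. */
    static MixedRangeSketch parse(String text) {
        Matcher m = RANGE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a range: " + text);
        }
        return new MixedRangeSketch(m.group(2), m.group(3),
                m.group(1).equals("["), m.group(4).equals("]"));
    }

    /** Tests a term against the range using plain String ordering. */
    boolean contains(String term) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        return (includeLower ? lo >= 0 : lo > 0)
            && (includeUpper ? hi <= 0 : hi < 0);
    }

    public static void main(String[] args) {
        MixedRangeSketch r = parse("[a TO c}");
        System.out.println(r.contains("a") + " " + r.contains("c"));
    }
}
```

The point of the sketch is only that each end carries its own inclusive flag, which is what a parser needs to track separately to support mixed bounds.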




[jira] Commented: (LUCENE-941) Benchmark alg line - {[AddDoc(4000)]: 4} : * - causes an infinite loop

2007-09-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527139
 ] 

Michael McCandless commented on LUCENE-941:
---

Doron, are you working on this one?  I think we want to release 2.3 pretty soon,
and this one is marked with a 2.3 fix version.

> Benchmark alg line -  {[AddDoc(4000)]: 4} : * - causes an infinite loop
> ---
>
> Key: LUCENE-941
> URL: https://issues.apache.org/jira/browse/LUCENE-941
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/benchmark
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Priority: Minor
> Fix For: 2.3
>
>
> Background in 
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg10831.html 
> The line  
>{[AddDoc(4000)]: 4} : * 
> causes an infinite loop because the parallel sequence would mask the 
> exhaustion from the outer sequential sequence.
> To fix this, the DocMaker exhaustion check should be modified to rely on the
> doc maker instance only, and to be reset when the inputs are being reset.
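The proposed fix can be sketched roughly as follows. This is a hedged illustration, not the contrib/benchmark code: `ExhaustibleDocSource`, `nextDoc`, `isExhausted`, `resetInputs` and `runRepeating` are hypothetical names. The assumption shown is that the exhaustion flag lives on the doc source instance itself, so any enclosing sequence can see it regardless of how tasks are nested, and that the flag is cleared when the inputs are reset.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a doc maker; not the contrib/benchmark API. */
final class ExhaustibleDocSource {
    private final List<String> docs;
    private int next = 0;
    // Exhaustion is tracked per source instance, not per task or sequence.
    private boolean exhausted = false;

    ExhaustibleDocSource(List<String> docs) {
        if (docs.isEmpty()) {
            throw new IllegalArgumentException("need at least one doc");
        }
        this.docs = docs;
    }

    /** Returns the next doc, wrapping around; flags exhaustion after the last one. */
    synchronized String nextDoc() {
        String doc = docs.get(next);
        next = (next + 1) % docs.size();
        if (next == 0) {
            exhausted = true;
        }
        return doc;
    }

    synchronized boolean isExhausted() {
        return exhausted;
    }

    /** Resetting the inputs also resets the exhaustion flag. */
    synchronized void resetInputs() {
        next = 0;
        exhausted = false;
    }

    /**
     * Outer "repeat until exhausted" loop around an inner batch of adds.
     * The outer loop consults the source directly, so the inner batch
     * cannot hide the exhaustion from it. Returns the docs consumed.
     */
    static int runRepeating(ExhaustibleDocSource source, int innerCount) {
        int consumed = 0;
        while (!source.isExhausted()) {
            for (int i = 0; i < innerCount; i++) {
                source.nextDoc();
                consumed++;
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        ExhaustibleDocSource source =
            new ExhaustibleDocSource(Arrays.asList("d1", "d2", "d3"));
        System.out.println(runRepeating(source, 2));
    }
}
```

The inner batch here is sequential to keep the sketch short; the methods are synchronized so the same flag could also be checked from several threads.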




[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-09-13 Thread Andy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527134
 ] 

Andy Liu commented on LUCENE-794:
-

Ah, I wasn't crazy.  I had the test data wrong.  Here's the code I'm using to 
produce the failing result:

String text = "y z x y z a b";

Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser("body", analyzer);
Query query = parser.parse("\"x y z\"");

CachingTokenFilter tokenStream = new CachingTokenFilter(
    analyzer.tokenStream("body", new StringReader(text)));
Highlighter highlighter = new Highlighter(
    new SpanScorer(query, "body", tokenStream));
highlighter.setTextFragmenter(new NullFragmenter());
tokenStream.reset();

String result = highlighter.getBestFragments(tokenStream, text, 1, "...");
System.out.println(result);

This produces:

y z x y z a b

The beginning y and z shouldn't be highlighted.

If I change the beginning y and z to x and y, I get the correct result:

"x y x y z a b" => x y x y z a b

Here are a couple of other failing results:

"z x y z a b" => z x y z a b
"z a x y z a b" => z a x y z a b

FYI, I'm using the latest version of Lucene.
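For reference, the behaviour expected in the cases above can be written down as a small standalone sketch that does not use Lucene at all. It assumes whitespace tokenization and exact term matching, and it marks only tokens that fall inside a complete occurrence of the phrase. The class name `PhraseHighlightSketch` and the `<B>` markup are choices made for illustration, not the Highlighter's or SpanScorer's implementation.

```java
import java.util.Arrays;

/** Illustrative only: marks tokens that belong to a full phrase match. */
final class PhraseHighlightSketch {

    static String highlight(String text, String phrase) {
        String[] tokens = text.trim().split("\\s+");
        String[] terms = phrase.trim().split("\\s+");
        boolean[] hit = new boolean[tokens.length];

        // Slide a window over the tokens; flag a window only if every
        // term of the phrase matches in order.
        for (int start = 0; start + terms.length <= tokens.length; start++) {
            String[] window =
                Arrays.copyOfRange(tokens, start, start + terms.length);
            if (Arrays.equals(window, terms)) {
                Arrays.fill(hit, start, start + terms.length, true);
            }
        }

        // Rebuild the text, wrapping only the flagged tokens.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Leading "y z" is not part of a full "x y z" match, so it stays plain.
        System.out.println(highlight("y z x y z a b", "x y z"));
    }
}
```

Under these assumptions, the leading y and z in "y z x y z a b" are left unmarked because they are not part of any complete "x y z" occurrence.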

> Extend contrib Highlighter to properly support phrase queries and span queries
> --
>
> Key: LUCENE-794
> URL: https://issues.apache.org/jira/browse/LUCENE-794
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Other
> Reporter: Mark Miller
> Priority: Minor
> Attachments: CachedTokenStream.java, CachedTokenStream.java, 
> CachedTokenStream.java, DefaultEncoder.java, Encoder.java, Formatter.java, 
> Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, 
> Highlighter.java, HighlighterTest.java, HighlighterTest.java, 
> HighlighterTest.java, HighlighterTest.java, MemoryIndex.java, 
> QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, 
> QuerySpansExtractor.java, SimpleFormatter.java, spanhighlighter.patch, 
> spanhighlighter10.patch, spanhighlighter2.patch, spanhighlighter3.patch, 
> spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, 
> spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, 
> SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, 
> SpanScorer.java, WeightedSpanTerm.java
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
> package that scores just like QueryScorer, but scores a 0 for Terms that did 
> not cause the Query hit. This gives 'actual' hit highlighting for the range 
> of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
> to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.
