[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2011-01-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986267#action_12986267
 ] 

Uwe Schindler commented on LUCENE-2010:
---

Looks fine to me! It's indeed quite simple. Will test this later.

Do you want to fix the rest of the tests and remove the test-only 
keepAllSegments method? At the least, this method should be hidden behind a 
package-private accessor or, if that is not possible, marked @lucene.internal.

> Remove segments with all documents deleted in commit/flush/close of 
> IndexWriter instead of waiting until a merge occurs.
> 
>
> Key: LUCENE-2010
> URL: https://issues.apache.org/jira/browse/LUCENE-2010
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2010.patch
>
>
> I do not know if this is a bug in 2.9.0, but it seems that segments with all 
> documents deleted are not automatically removed:
> {noformat}
> 4 of 14: name=_dlo docCount=5
>   compound=true
>   hasProx=true
>   numFiles=2
>   size (MB)=0.059
>   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 
> 2009-09-21 10:25:09, os=SunOS,
>  os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, 
> source=flush}
>   has deletions [delFileName=_dlo_1.del]
>   test: open reader.OK [5 deleted docs]
>   test: fields..OK [136 fields]
>   test: field norms.OK [136 fields]
>   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
>   test: stored fields...OK [0 total field count; avg ? fields per doc]
>   test: term vectorsOK [0 total vector count; avg ? term/freq vector 
> fields per doc]
> {noformat}
> Shouldn't such segments be removed automatically during the next 
> commit/close of IndexWriter?
> *Mike McCandless:*
> Lucene doesn't actually short-circuit this case, ie, if every single doc in a 
> given segment has been deleted, it will still merge it [away] like normal, 
> rather than simply dropping it immediately from the index, which I agree 
> would be a simple optimization. Can you open a new issue? I would think IW 
> can drop such a segment immediately (ie not wait for a merge or optimize) on 
> flushing new deletes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene & Google Summer of Code 2011

2011-01-24 Thread Simon Willnauer
On Mon, Jan 24, 2011 at 11:04 PM, Michael Busch  wrote:
> Oh my god, Uwe, I was hoping you would never write a "sophisticated™
> backwards® compatibility layer" again!

LOL - we all did :)
>
>  Michael
>
> On 1/24/11 12:39 PM, Uwe Schindler wrote:
>>
>> +1
>>
>> I also have an idea from the attributes and TokenStream policeman. So I
>> could even help mentoring.
>>
>> Uwe
>>
>>
>>
>> "Simon Willnauer"  schrieb:
>>
>>> hey folks,
>>>
>>> Google recently announced GSoC 2011, and mentoring organizations can
>>> start submitting applications by the end of February
>>>
>>> (http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
>>> I wonder if we should participate this year again? I think we have
>>> plenty of work to do, and it's a great opportunity to get fresh blood
>>> into the project on both ends, Solr & Lucene.  I already have a couple
>>> of tasks / projects in mind though...
>>>
>>> Thoughts?
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>> --
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, 28213 Bremen
>> http://www.thetaphi.de
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene & Google Summer of Code 2011

2011-01-24 Thread Simon Willnauer
On Mon, Jan 24, 2011 at 11:58 PM, Grant Ingersoll  wrote:
> GSOC has been a great boon to Mahout.  +1 on us doing it.  Note, committers 
> should subscribe to code-awards@a.o to get on the list to coordinate efforts, 
> as the ASF only gets a certain number of slots.

Ah, good to know! Thanks, Grant!
>
>
> On Jan 24, 2011, at 3:29 PM, Simon Willnauer wrote:
>
>> hey folks,
>>
>> Google recently announced GSoC 2011, and mentoring organizations can
>> start submitting applications by the end of February
>> (http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
>> I wonder if we should participate this year again? I think we have
>> plenty of work to do, and it's a great opportunity to get fresh blood
>> into the project on both ends, Solr & Lucene.  I already have a couple
>> of tasks / projects in mind though...
>>
>> Thoughts?
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-24 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986248#action_12986248
 ] 

Karl Wright commented on LUCENE-2868:
-

It occurs to me that the common class that gets created in IndexSearcher and 
passed around should probably be given a more appropriate name, like 
QueryContext.  That way people will feel free to extend it, in time, to hold 
all sorts of query-local data.  Thoughts?
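
For illustration, a minimal sketch of what such a class could look like - the 
name, the map-based storage, and the accessors here are all assumptions, not 
anything defined by the current patches:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a QueryContext: created once per query in
// IndexSearcher and passed around, so callers can attach arbitrary
// query-local state (for example, cached rewritten queries or
// TermStates) to it.
public class QueryContext {
  private final Map<Object, Object> data = new HashMap<Object, Object>();

  public void put(Object key, Object value) {
    data.put(key, value);
  }

  public Object get(Object key) {
    return data.get(key);
  }
}
{code}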


> It should be easy to make use of TermState; rewritten queries should be 
> shared automatically
> 
>
> Key: LUCENE-2868
> URL: https://issues.apache.org/jira/browse/LUCENE-2868
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Query/Scoring
>Reporter: Karl Wright
> Attachments: lucene-2868.patch, query-rewriter.patch
>
>
> When you have the same query in a query hierarchy multiple times, tremendous 
> savings can now be had if the user knows enough to share the rewritten 
> queries in the hierarchy, due to the TermState addition.  But this is clumsy 
> and requires a lot of coding by the user to take advantage of.  Lucene should 
> be smart enough to share the rewritten queries automatically.
> This can be most readily (and powerfully) done by introducing a new method to 
> Query.java:
> Query rewriteUsingCache(IndexReader indexReader)
> ... and including a caching implementation right in Query.java which would 
> then work for all.  Of course, all callers would want to use this new method 
> rather than the current rewrite().
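
As a rough sketch of the idea (not the attached patch - the cache handling 
below is an assumption, and the issue itself proposes keeping the caching 
implementation inside Query.java):

{code}
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative stand-ins only; not the real Lucene classes.
interface IndexReader {}

abstract class Query {
  public abstract Query rewrite(IndexReader reader) throws IOException;

  // Entry point matching the proposed signature: a fresh cache is
  // created per top-level rewrite, and compound queries would pass the
  // same cache down to their clauses.
  public Query rewriteUsingCache(IndexReader reader) throws IOException {
    return rewriteUsingCache(reader, new IdentityHashMap<Query, Query>());
  }

  // When the same Query instance appears several times in a query
  // tree, rewrite it once and reuse the result.
  public Query rewriteUsingCache(IndexReader reader,
                                 Map<Query, Query> cache) throws IOException {
    Query rewritten = cache.get(this);
    if (rewritten == null) {
      rewritten = rewrite(reader);
      cache.put(this, rewritten);
    }
    return rewritten;
  }
}
{code}

An IdentityHashMap fits here because the sharing described above is per query 
instance, not per equals()-equality.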

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-24 Thread Nick Pellow (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pellow updated LUCENE-2666:


Attachment: checkindex-out.txt

Hi Michael, 

Thanks for looking at that log.
I ran CheckIndex on the corrupt index - and have attached the output here. It 
doesn't appear to have detected any problems.

Do you think this problem could be caused by a cache not being flushed 
correctly?

Cheers,
Nick

> ArrayIndexOutOfBoundsException when iterating over TermDocs
> ---
>
> Key: LUCENE-2666
> URL: https://issues.apache.org/jira/browse/LUCENE-2666
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.0.2
>Reporter: Shay Banon
> Attachments: checkindex-out.txt
>
>
> A user got this very strange exception, and I managed to get the index that 
> it happens on. Basically, iterating over the TermDocs causes an AIOOB 
> exception. I easily reproduced it using the FieldCache which does exactly 
> that (the field in question is indexed as numeric). Here is the exception:
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>   at 
> org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
>   at 
> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
>   at 
> org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
>   at TestMe.main(TestMe.java:56)
> It happens on the following segment: _26t docCount: 914 delCount: 1 
> delFileName: _26t_1.del
> And as you can see, it smells like a corner case (it fails for document 
> number 912, the AIOOB happens from the deleted docs). The code to recreate it 
> is simple:
> FSDirectory dir = FSDirectory.open(new File("index"));
> IndexReader reader = IndexReader.open(dir, true);
> IndexReader[] subReaders = reader.getSequentialSubReaders();
> for (IndexReader subReader : subReaders) {
> Field field = 
> subReader.getClass().getSuperclass().getDeclaredField("si");
> field.setAccessible(true);
> SegmentInfo si = (SegmentInfo) field.get(subReader);
> System.out.println("--> " + si);
> if (si.getDocStoreSegment().contains("_26t")) {
> // this is the problematic one...
> System.out.println("problematic one...");
> FieldCache.DEFAULT.getLongs(subReader, "__documentdate", 
> FieldCache.NUMERIC_UTILS_LONG_PARSER);
> }
> }
> Here is the result of a check index on that segment:
>   8 of 10: name=_26t docCount=914
> compound=true
> hasProx=true
> numFiles=2
> size (MB)=1.641
> diagnostics = {optimize=false, mergeFactor=10, 
> os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, 
> lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, 
> os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
> has deletions [delFileName=_26t_1.del]
> test: open reader.OK [1 deleted docs]
> test: fields..OK [32 fields]
> test: field norms.OK [32 fields]
> test: terms, freq, prox...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>   at 
> org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
>   at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>   at TestMe.main(TestMe.java:47)
> test: stored fields...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>   at 
> org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
>   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>   at TestMe.main(TestMe.java:47)
> test: term vectorsERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>   at 
> org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>   at 
> org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.j

[jira] Updated: (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-01-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1972:
---

Fix Version/s: (was: 3.1)

> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> -
>
> Key: SOLR-1972
> URL: https://issues.apache.org/jira/browse/SOLR-1972
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Shawn Heisey
>Priority: Minor
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.
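
For illustration, a standalone sketch of the bookkeeping this would take - a 
fixed-size ring buffer of the most recent request times with a nearest-rank 
percentile; the class name, window handling, and percentile method are all 
assumptions, not existing Solr code:

{code}
import java.util.Arrays;

// Keeps the N most recent request times, so early slow (cold-cache)
// queries eventually age out of the statistics.
public class RecentRequestTimes {
  private final long[] times;
  private long count; // total requests recorded

  public RecentRequestTimes(int size) { // e.g. 1024 or 4096
    this.times = new long[size];
  }

  public synchronized void record(long elapsedMillis) {
    times[(int) (count++ % times.length)] = elapsedMillis;
  }

  // Nearest-rank percentile over the retained window: pct=50 is the
  // median, pct=95 and pct=99 the requested tail latencies.
  public synchronized long percentile(double pct) {
    int n = (int) Math.min(count, (long) times.length);
    if (n == 0) {
      return 0;
    }
    long[] sorted = Arrays.copyOf(times, n);
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(pct / 100.0 * n);
    return sorted[Math.max(0, rank - 1)];
  }
}
{code}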

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2314) replicate/index.jsp UI does not work with repeaters (both master and slave)

2011-01-24 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986155#action_12986155
 ] 

Hoss Man commented on SOLR-2314:


will: if you can try out the patch i've attached to SOLR-2320 and let me know 
if that solves the problem you were trying to describe, it would be much 
appreciated.

> replicate/index.jsp UI does not work with repeaters (both master and slave)
> ---
>
> Key: SOLR-2314
> URL: https://issues.apache.org/jira/browse/SOLR-2314
> Project: Solr
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 1.4.1
> Environment: jdk 1.6.0.23  ; both jetty and jboss/tomcat. 
>Reporter: will milspec
>Priority: Minor
>
> Summary:
> ==
> - Admin UI replication/index.jsp checks for master or slave with the 
> following code:
>if ("true".equals(detailsMap.get("isSlave"))) 
> -  if slave, replication/index.jsp displays the "Master" and "Poll 
> Intervals", etc. sections (everything up to "Cores")
> - if false, replication/index.jsp does not display the "Master", "Poll 
> Intervals" sections 
> -This "slave check/UI difference" works correctly if the solrconfig.xml has a 
>  "slave" but not "master" section or vice versa
> Expected results:
> ==
> Same UI difference would occur in the following scenario:
>a) solrconfig.xml has both master and slave entries
>b) use java.properties (-Dsolr.enable.master -Dsolr.enable.slave) to set 
> "master" or "slave" at runtime
> *OR*
> c) use solrcore.properties  to set "master" and "slave" at runtime
> Actual results:
> ==
> If solrconfig.xml has both master and slave entries, replication/index.jsp 
> shows both "master" and "slave" section regardless of system.properties or 
> solrcore.properties

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2320) ReplicationHandler doesn't return master details unless it's also configured as a slave

2011-01-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2320:
---

Attachment: SOLR-2320.patch

enhanced the test to also look at the details command for a repeater.

this required some rather invasive refactoring of the test in order to be able 
to construct a SolrInstance which was both a master and a slave -- but i think 
on the whole the test is improved (logic added for dealing with arbitrary 
"solrconfig-${name}.xml" files in SolrInstance constructor, and a lot of config 
file copying was refactored into common methods)

If there are no objections, i'll commit ASAP.

> ReplicationHandler doesn't return master details unless it's also configured 
> as a slave
> ---
>
> Key: SOLR-2320
> URL: https://issues.apache.org/jira/browse/SOLR-2320
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 1.4, 1.4.1
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2320.patch, SOLR-2320.patch, SOLR-2320.patch
>
>
> While investigating SOLR-2314 i found a bug which seems to be the opposite of 
> the behavior described there -- so i'm filing a separate bug to track it.
> if ReplicationHandler is only configured as a master, "command=details" 
> requests won't include the "master" section.  that section is only output if 
> it is also configured as a slave.
> the method responsible for the details command generates the "master" details 
> just fine, but the code to add it to the response seems to have erroneously 
> been nested inside an if that only evaluates to true if there is a non-null 
> SnapPuller (ie: it's also a slave)
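
In simplified form, the control flow described above amounts to something like 
this (a sketch with stand-in names, not the actual ReplicationHandler source):

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the reported bug shape in the details command.
class DetailsSketch {
  Map<String, Object> details(boolean isMaster, Object snapPuller,
                              Object masterDetails, Object slaveDetails) {
    Map<String, Object> response = new LinkedHashMap<String, Object>();
    // Buggy shape: "master" was only added inside the null check, so a
    // master-only node (snapPuller == null) never reported it:
    //   if (snapPuller != null) {
    //     response.put("slave", slaveDetails);
    //     response.put("master", masterDetails); // wrongly nested
    //   }
    // Fixed shape: each section is added under its own condition.
    if (isMaster) {
      response.put("master", masterDetails);
    }
    if (snapPuller != null) {
      response.put("slave", slaveDetails);
    }
    return response;
  }
}
{code}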

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2336) waitFLush, waitSearcher on optimize does not come back immediately

2011-01-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2336.


Resolution: Duplicate

SOLR-2018

> waitFLush, waitSearcher on optimize does not come back immediately
> --
>
> Key: SOLR-2336
> URL: https://issues.apache.org/jira/browse/SOLR-2336
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Bill Bell
>
> {code}
> http://localhost:8983/solr/provs/update?stream.body=%3Coptimize%20waitFlush=%22false%22%20waitSearcher=%22false%22/%3E
> {code}
> This sits there for 5 minutes while it is optimizing. Then it returns a 
> result QTime. I thought waitFlush, waitSearcher was supposed to return 
> immediately?
> Bill

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2320) ReplicationHandler doesn't return master details unless it's also configured as a slave

2011-01-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2320:
---

Attachment: SOLR-2320.patch

Updated patch that includes a really trivial test of the details command (which 
fails w/o the previously mentioned fix)

> ReplicationHandler doesn't return master details unless it's also configured 
> as a slave
> ---
>
> Key: SOLR-2320
> URL: https://issues.apache.org/jira/browse/SOLR-2320
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 1.4, 1.4.1
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2320.patch, SOLR-2320.patch
>
>
> While investigating SOLR-2314 i found a bug which seems to be the opposite of 
> the behavior described there -- so i'm filing a separate bug to track it.
> if ReplicationHandler is only configured as a master, "command=details" 
> requests won't include the "master" section.  that section is only output if 
> it is also configured as a slave.
> the method responsible for the details command generates the "master" details 
> just fine, but the code to add it to the response seems to have erroneously 
> been nested inside an if that only evaluates to true if there is a non-null 
> SnapPuller (ie: it's also a slave)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2457) QueryNode implementors should override equals method

2011-01-24 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986116#action_12986116
 ] 

Adriano Crestani commented on LUCENE-2457:
--

Hi Shai,

Please don't close it; this is a nice feature, mainly for automated tests. I 
haven't had time to give attention to it yet, but keep it in the open list for 
now, so we don't forget it :)

> QueryNode implementors should override equals method
> 
>
> Key: LUCENE-2457
> URL: https://issues.apache.org/jira/browse/LUCENE-2457
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Reporter: Adriano Crestani
>Priority: Minor
> Fix For: 3.2, 4.0
>
>
> Discussed on thread: http://markmail.org/thread/gjqk35t7e3y4fo5j
> "QueryNode(s) are data objects, and it makes sense to override
> their equals method. But before, we need to define what is a QueryNode
> equality. Should two nodes be considered equal if they represent
> syntactically or semantically the same query? e.g. an ORQueryNode created
> from the query "a OR b" will not have the same children ordering as the
> query "b OR a", so they are syntactically not equal, but they are
> semantically equal, because the order of the OR operands (usually) does not
> matter when the query is executed. I say it usually does not matter, because
> it's up to the Query object implementation built from that ORQueryNode
> object, for this reason, I vote for defining that two query nodes should be
> equals if they are syntactically equal.
> I also vote for excluding query node tags from the equality check, because
> they are not meant to represent the query structure, but to attach extra
> info to the node, which is usually used for communication between
> processors."
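
A small sketch of what syntactic equality that excludes tags could look like - 
the node shape below is invented for illustration and is not the actual 
QueryNode API:

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative node: equality is purely syntactic (same label, same
// children in the same order) and deliberately ignores tags, which
// only carry metadata between processors.
class NodeSketch {
  final String label;
  final List<NodeSketch> children;
  final Map<String, Object> tags = new HashMap<String, Object>();

  NodeSketch(String label, List<NodeSketch> children) {
    this.label = label;
    this.children = children;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof NodeSketch)) return false;
    NodeSketch other = (NodeSketch) o;
    // tags intentionally excluded from the comparison
    return label.equals(other.label) && children.equals(other.children);
  }

  @Override
  public int hashCode() { // must stay consistent with equals
    return 31 * label.hashCode() + children.hashCode();
  }
}
{code}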

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2317) Slaves have leftover index.xxxxx directories, and leftover files in index/ directory

2011-01-24 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986115#action_12986115
 ] 

Bill Bell commented on SOLR-2317:
-

Another issue - when there are leftover files in the directory and I do a 
delta-index, it seems to automatically do an optimize=true. This causes the 
delta index to take 5 minutes (the same as an optimize).

The optimize does not help anything since these files are orphaned.

Thanks.

> Slaves have leftover index.xxxxx directories, and leftover files in index/ 
> directory
> 
>
> Key: SOLR-2317
> URL: https://issues.apache.org/jira/browse/SOLR-2317
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1
>Reporter: Bill Bell
>
> When replicating, we are getting leftover files on slaves. Some slaves are 
> getting index.xxxxx directories with files leftover. And more concerning, the 
> index/ directory has leftover files from previous replication runs.
> This is a pain to keep cleaning up.
> Bill

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2336) waitFLush, waitSearcher on optimize does not come back immediately

2011-01-24 Thread Bill Bell (JIRA)
waitFLush, waitSearcher on optimize does not come back immediately
--

 Key: SOLR-2336
 URL: https://issues.apache.org/jira/browse/SOLR-2336
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Bill Bell


{code}

http://localhost:8983/solr/provs/update?stream.body=%3Coptimize%20waitFlush=%22false%22%20waitSearcher=%22false%22/%3E

{code}

This sits there for 5 minutes while it is optimizing. Then it returns a result 
QTime. I thought waitFlush, waitSearcher was supposed to return immediately?

Bill


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2584) Concurrency issues in SegmentInfo.files() could lead to ConcurrentModificationException

2011-01-24 Thread Alexander Kanarsky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986092#action_12986092
 ] 

Alexander Kanarsky commented on LUCENE-2584:


Thanks Shai and Michael!

> Concurrency issues in SegmentInfo.files() could lead to 
> ConcurrentModificationException
> ---
>
> Key: LUCENE-2584
> URL: https://issues.apache.org/jira/browse/LUCENE-2584
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9, 2.9.1, 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2
>Reporter: Alexander Kanarsky
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 2.9.5, 3.0.4, 3.1
>
> Attachments: LUCENE-2584-branch_3x.patch, 
> LUCENE-2584-lucene-2_9.patch, LUCENE-2584-lucene-3_0.patch, LUCENE-2584.patch
>
>
> A multi-threaded call to files() in SegmentInfo could lead to a 
> ConcurrentModificationException if one thread has not yet finished adding to 
> the ArrayList (files) while another thread has already obtained it as 
> cached (see below). This is a rare exception, but it would be nice to fix. I 
> see the code is no longer problematic in the trunk (and other branches ported 
> from flex_1458); it looks like it was fixed while implementing post-3.x 
> features. The fix for the 3.x and 2.9.x branches could be the same - create 
> the files set first and populate it, and then assign it to the member 
> variable at the end of the method. This will resolve the issue. I could 
> prepare the patch for 2.9.4 and 3.x, if needed.
> --
> INFO: [19] webapp= path=/replication params={command=fetchindex&wt=javabin} 
> status=0 QTime=1
> Jul 30, 2010 9:13:05 AM org.apache.solr.core.SolrCore execute
> INFO: [19] webapp= path=/replication params={command=details&wt=javabin} 
> status=0 QTime=24
> Jul 30, 2010 9:13:05 AM org.apache.solr.handler.ReplicationHandler doFetch
> SEVERE: SnapPull failed
> java.util.ConcurrentModificationException
> at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
> at java.util.AbstractList$Itr.next(AbstractList.java:343)
> at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
> at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:826)
> at 
> org.apache.lucene.index.DirectoryReader$ReaderCommit.<init>(DirectoryReader.java:916)
> at 
> org.apache.lucene.index.DirectoryReader.getIndexCommit(DirectoryReader.java:856)
> at 
> org.apache.solr.search.SolrIndexReader.getIndexCommit(SolrIndexReader.java:454)
> at 
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:261)
> at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
> at 
> org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:146)
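
The fix described in the report is the usual build-fully-then-publish idiom; a 
simplified sketch (the field name and population step are placeholders, not 
the actual SegmentInfo code):

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch of the described fix. The bug pattern: the member variable was
// assigned first and then populated, so a concurrent reader could
// iterate a half-built list and hit ConcurrentModificationException.
class SegmentInfoSketch {
  private volatile List<String> cachedFiles; // volatile added for safety

  List<String> files() {
    List<String> files = cachedFiles;
    if (files != null) {
      return files; // already fully populated
    }
    files = new ArrayList<String>();
    // ... populate the list completely here ...
    cachedFiles = files; // publish only after population is finished
    return files;
  }
}
{code}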

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-1240) Numerical Range faceting

2011-01-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1240.


Resolution: Fixed

Thanks for the poke yonik, i totally forgot i re-opened this.

committed the one line change to both 3x and trunk

> Numerical Range faceting
> 
>
> Key: SOLR-1240
> URL: https://issues.apache.org/jira/browse/SOLR-1240
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Gijs Kunze
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, 
> SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, 
> SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.use-nl.patch
>
>
> Faceting numerical ranges using many facet.query arguments leads to 
> unmanageably large queries as the number of fields you facet over increases. 
> Adding the same faceting parameter for numbers that already exists for dates 
> should fix this.
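
For comparison, the two request styles side by side (a hedged example: the 
facet.range parameter names below mirror the existing date-faceting 
parameters and match what numeric range faceting ultimately used; the field 
and buckets are made up):

{noformat}
# One facet.query per bucket - grows with every field and bucket:
q=*:*&facet=true
  &facet.query=price:[0 TO 10]
  &facet.query=price:[10 TO 20]
  &facet.query=price:[20 TO 30]

# Equivalent numeric range faceting with a single parameter set:
q=*:*&facet=true
  &facet.range=price
  &facet.range.start=0
  &facet.range.end=30
  &facet.range.gap=10
{noformat}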

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene & Google Summer of Code 2011

2011-01-24 Thread Grant Ingersoll
GSOC has been a great boon to Mahout.  +1 on us doing it.  Note, committers 
should subscribe to code-awards@a.o to get on the list to coordinate efforts, 
as the ASF only gets a certain number of slots.


On Jan 24, 2011, at 3:29 PM, Simon Willnauer wrote:

> hey folks,
> 
> Google recently announced GSoC 2011, and mentoring organizations can
> start submitting applications by the end of February
> (http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
> I wonder if we should participate this year again? I think we have
> plenty of work to do, and it's a great opportunity to get fresh blood
> into the project on both ends, Solr & Lucene.  I already have a couple
> of tasks / projects in mind though...
> 
> Thoughts?
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 

--
Grant Ingersoll
http://www.lucidimagination.com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene & Google Summer of Code 2011

2011-01-24 Thread Michael Busch
Oh my god, Uwe, I was hoping you would never write a "sophisticated™ 
backwards® compatibility layer" again!


 Michael

On 1/24/11 12:39 PM, Uwe Schindler wrote:

+1

I also have an idea from the attributes and TokenStream policeman. So I could 
even help mentoring.

Uwe



"Simon Willnauer"  schrieb:


hey folks,

Google recently announced GSoC 2011, and mentoring organizations can
start submitting applications by the end of February
(http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
I wonder if we should participate this year again? I think we have
plenty of work to do, and it's a great opportunity to get fresh blood
into the project on both ends, Solr & Lucene.  I already have a couple
of tasks / projects in mind though...

Thoughts?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2011-01-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2010:
---

Attachment: LUCENE-2010.patch

Patch.

The change itself is very simple -- I added a pruneDeletedSegments method to 
SegmentInfos, and I call that on commit in IW and IR.

But, then, various tests assume that Lucene doesn't do this :)  EG asserting 
maxDoc(), docFreq(), etc... so I had to fix those up...
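
For illustration, the prune step amounts to something like the following (a 
simplified model with stand-in types, not the patch itself):

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified model of the change: on commit, drop any segment whose
// deletion count has reached its document count, instead of waiting
// for a merge to fold it away.
class SegmentInfosSketch {
  static class Segment {
    final int docCount;
    int delCount;
    Segment(int docCount, int delCount) {
      this.docCount = docCount;
      this.delCount = delCount;
    }
  }

  final List<Segment> segments = new ArrayList<Segment>();

  void pruneDeletedSegments() {
    for (Iterator<Segment> it = segments.iterator(); it.hasNext();) {
      Segment s = it.next();
      if (s.delCount == s.docCount) {
        it.remove(); // every doc deleted: segment can be dropped now
      }
    }
  }
}
{code}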

> Remove segments with all documents deleted in commit/flush/close of 
> IndexWriter instead of waiting until a merge occurs.
> 
>
> Key: LUCENE-2010
> URL: https://issues.apache.org/jira/browse/LUCENE-2010
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2010.patch
>
>
> I do not know if this is a bug in 2.9.0, but it seems that segments with all 
> documents deleted are not automatically removed:
> {noformat}
> 4 of 14: name=_dlo docCount=5
>   compound=true
>   hasProx=true
>   numFiles=2
>   size (MB)=0.059
>   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 
> 2009-09-21 10:25:09, os=SunOS,
>  os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, 
> source=flush}
>   has deletions [delFileName=_dlo_1.del]
>   test: open reader.OK [5 deleted docs]
>   test: fields..OK [136 fields]
>   test: field norms.OK [136 fields]
>   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
>   test: stored fields...OK [0 total field count; avg ? fields per doc]
>   test: term vectorsOK [0 total vector count; avg ? term/freq vector 
> fields per doc]
> {noformat}
> Shouldn't such segments be removed automatically during the next 
> commit/close of IndexWriter?
> *Mike McCandless:*
> Lucene doesn't actually short-circuit this case, ie, if every single doc in a 
> given segment has been deleted, it will still merge it [away] like normal, 
> rather than simply dropping it immediately from the index, which I agree 
> would be a simple optimization. Can you open a new issue? I would think IW 
> can drop such a segment immediately (ie not wait for a merge or optimize) on 
> flushing new deletes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] Created: (LUCENE-2886) Adaptive Frame Of Reference

2011-01-24 Thread Paul Elschot
Any idea on how this compares to the vector split encoding here:
http://puma.isti.cnr.it/publichtml/section_cnr_isti/cnr_isti_2010-TR-016.html
?

Regards,
Paul Elschot

On Monday 24 January 2011 19:32:44 Renaud Delbru (JIRA) wrote:
> Adaptive Frame Of Reference 
> 
> 
>  Key: LUCENE-2886
>  URL: https://issues.apache.org/jira/browse/LUCENE-2886
>  Project: Lucene - Java
>   Issue Type: New Feature
>   Components: Codecs
> Reporter: Renaud Delbru
>  Fix For: 4.0
> 
> 
> We could test the implementation of the Adaptive Frame Of Reference [1] on 
> the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be 
> done, as this implementation is working on the old lucene-1458 branch. 
> I will attach a tarball containing a running version (with tests) of the AFOR 
> implementation, as well as the implementations of PFOR and of Simple64 
> (simple family codec working on 64bits word) that has been used in the 
> experiments in [1].
> 
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen resolved LUCENE-1726.
--

Resolution: Won't Fix

> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Assignee: Michael McCandless
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
> LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
> LUCENE-1726.trunk.test.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I think we will want to do something like what field cache does
> with CreationPlaceholder for IndexWriter.readerPool. Otherwise
> we have the (I think somewhat problematic) issue of all other
> readerPool.get* methods waiting for an SR to warm.
> It would be good to implement this for 2.9.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1767) Add sizeof to OpenBitSet

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1767.


Resolution: Won't Fix

Won't be working on these and they're old

> Add sizeof to OpenBitSet
> 
>
> Key: LUCENE-1767
> URL: https://issues.apache.org/jira/browse/LUCENE-1767
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-1767.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage 
> when many OBS' are cached (such as Solr).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1711) Field meta-data

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1711.


Resolution: Won't Fix

Won't be working on these and they're old

> Field meta-data
> ---
>
> Key: LUCENE-1711
> URL: https://issues.apache.org/jira/browse/LUCENE-1711
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Allow user-defined meta-data per Field. This would be stored by
> FieldInfos.write. Not sure about how to merge different values.
> The actual typed value should be a Map available
> from Field. 
> The functionality can be used for a variety of purposes
> including trie, schemas, CSF, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1533) Deleted documents as a Filter or top level Query

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1533.


Resolution: Won't Fix

Won't be working on these and they're old

> Deleted documents as a Filter or top level Query
> 
>
> Key: LUCENE-1533
> URL: https://issues.apache.org/jira/browse/LUCENE-1533
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> In exploring alternative and perhaps faster ways to implement the
> deleted documents functionality, the idea of filtering the deleted
> documents at a higher level came up. This system would save on
> checking the deleted docs BitVector of each doc read from the posting
> list by SegmentTermDocs. This is equivalent to an AND NOT deleted
> docs query.
> If the patch improves the speed of indexes with delete documents,
> many core unit tests will need to change, or alternatively the
> functionality provided by this patch can be an IndexReader option.
> I'm thinking the first implementation will be a Filter in
> IndexSearcher. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1537) InstantiatedIndexReader.clone

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1537.


Resolution: Won't Fix

Won't be working on these and they're old

> InstantiatedIndexReader.clone
> -
>
> Key: LUCENE-1537
> URL: https://issues.apache.org/jira/browse/LUCENE-1537
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Assignee: Karl Wettin
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-1537.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> This patch will implement IndexReader.clone for InstantiatedIndexReader.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1526.


Resolution: Won't Fix

Won't be working on these and they're old

> For near real-time search, use paged copy-on-write BitVector impl
> -
>
> Key: LUCENE-1526
> URL: https://issues.apache.org/jira/browse/LUCENE-1526
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-1526.patch, LUCENE-1526.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SegmentReader currently uses a BitVector to represent deleted docs.
> When performing rapid clone (see LUCENE-1314) and delete operations,
> performing a copy on write of the BitVector can become costly because
> the entire underlying byte array must be created and copied. A way to
> make this clone delete process faster is to implement tombstones, a
> term coined by Marvin Humphrey. Tombstones represent new deletions
> plus the incremental deletions from previously reopened readers in
> the current reader. 
> The proposed implementation of tombstones is to accumulate deletions
> into an int array represented as a DocIdSet. With LUCENE-1476,
> SegmentTermDocs iterates over deleted docs using a DocIdSet rather
> than accessing the BitVector by calling get. This allows a BitVector
> and a set of tombstones to by ANDed together as the current reader's
> delete docs. 
> A tombstone merge policy needs to be defined to determine when to
> merge tombstone DocIdSets into a new deleted docs BitVector as too
> many tombstones would eventually be detrimental to performance. A
> probable implementation will merge tombstones based on the number of
> tombstones and the total number of documents in the tombstones. The
> merge policy may be set in the clone/reopen methods or on the
> IndexReader. 
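
A rough standalone sketch of the tombstone idea (BitSet and a sorted int[] 
stand in for BitVector and the tombstone DocIdSet; the merge policy is 
omitted):

{code}
import java.util.Arrays;
import java.util.BitSet;

// Sketch: effective deletions are the union of the segment's existing
// deleted-docs bitmap and a small sorted "tombstone" array of docIDs
// deleted since the reader was (re)opened, so cloning a reader copies
// only the cheap tombstone array, not the whole bitmap.
class TombstoneDeletesSketch {
  private final BitSet baseDeletes;  // shared, rewritten only on merge
  private final int[] tombstones;    // sorted newly-deleted docIDs

  TombstoneDeletesSketch(BitSet baseDeletes, int[] tombstones) {
    this.baseDeletes = baseDeletes;
    this.tombstones = tombstones;
  }

  boolean isDeleted(int docID) {
    return baseDeletes.get(docID)
        || Arrays.binarySearch(tombstones, docID) >= 0;
  }
}
{code}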

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1485.


Resolution: Won't Fix

Won't be working on these and they're old

> Use OpenBitSet instead of BitVector in SegmentReader
> 
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: TestDeletedDocsSpeed.java
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Tried out BitVector.get vs OpenBitSet.get; here are the results, in 
> milliseconds, which are about the same after running 25 times.  It is 
> assumed that implementing DocIdSetIterator in SegmentTermDocs will speed 
> things up more.
> bit set size: 10,485,760
> set bits count: 524,032
> openbitset: 68
> bitvector: 89
> 24% speed increase.
> I will implement a patch that adds the WriteableBitSet interface and make a 
> subclass of OpenBitSet that is writeable to disk.  We're working on an 
> isSparse method for OpenBitSet.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1473) Implement standard Serialization across Lucene versions

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1473.


Resolution: Won't Fix

Won't be working on these and they're old

> Implement standard Serialization across Lucene versions
> ---
>
> Key: LUCENE-1473
> URL: https://issues.apache.org/jira/browse/LUCENE-1473
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, 
> LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, 
> lucene-contrib-remote.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, 
> serialVersionUID needs to be added to classes that implement 
> java.io.Serializable.  java.io.Externalizable may be implemented in classes 
> for faster performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1471) Faster MultiSearcher.search merge docs

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1471.


Resolution: Won't Fix

Won't be working on these and they're old

> Faster MultiSearcher.search merge docs 
> ---
>
> Key: LUCENE-1471
> URL: https://issues.apache.org/jira/browse/LUCENE-1471
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1471.patch, multisearcher.patch, 
> multisearcher.take2.patch, multisearcher.take3.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> MultiSearcher.search places sorted search results from individual searchers 
> into a PriorityQueue.  This can be made more efficient by taking advantage 
> of the fact that the results returned are already sorted.  
> The proposed solution places the sub-searcher results iterator into a custom 
> PriorityQueue that produces the sorted ScoreDocs.
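
The proposal is essentially a k-way merge of already-sorted lists; a 
standalone sketch with plain score arrays standing in for the real ScoreDoc 
plumbing:

{code}
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// K-way merge of per-searcher results that are already sorted by
// descending score. The heap holds at most one entry per searcher
// ({searcherIndex, offset}) instead of every hit.
class MergeSortedHitsSketch {
  static float[] merge(final float[][] perSearcherScores, int topN) {
    PriorityQueue<int[]> pq = new PriorityQueue<int[]>(
        Math.max(1, perSearcherScores.length),
        new Comparator<int[]>() {
          public int compare(int[] a, int[] b) {
            return Float.compare(perSearcherScores[b[0]][b[1]],
                                 perSearcherScores[a[0]][a[1]]);
          }
        });
    for (int i = 0; i < perSearcherScores.length; i++) {
      if (perSearcherScores[i].length > 0) {
        pq.add(new int[] {i, 0});
      }
    }
    float[] merged = new float[topN];
    int n = 0;
    while (n < topN && !pq.isEmpty()) {
      int[] top = pq.poll();
      merged[n++] = perSearcherScores[top[0]][top[1]];
      if (top[1] + 1 < perSearcherScores[top[0]].length) {
        pq.add(new int[] {top[0], top[1] + 1}); // advance that searcher
      }
    }
    return (n == topN) ? merged : Arrays.copyOf(merged, n);
  }
}
{code}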

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1336) Distributed Lucene using Hadoop RPC based RMI with dynamic classloading

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1336.


Resolution: Won't Fix

Won't be working on these and they're old

> Distributed Lucene using Hadoop RPC based RMI with dynamic classloading
> ---
>
> Key: LUCENE-1336
> URL: https://issues.apache.org/jira/browse/LUCENE-1336
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/*
>Affects Versions: 2.3.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: lucene-1336.patch, lucene-1336.patch, lucene-1336.patch
>
>
> A Hadoop RPC based RMI system for use with Lucene Searchable.  It keeps the 
> application logic on the client side, removing the need to deploy 
> application logic to the Lucene servers, and removes the need to provision 
> new code to potentially hundreds of servers for every application logic 
> change.  
> The use case is any deployment requiring Lucene on many servers.  This system 
> provides the added advantage of allowing custom Query and Filter classes (or 
> other classes) to be defined on for example a development machine and 
> executed on the server without deploying the custom classes to the servers 
> first.  This can save a lot of time and effort in provisioning, restarting 
> processes.  In the future this patch will include an IndexWriterService 
> interface which will enable document indexing.  This will allow subclasses of 
> Analyzer to be dynamically loaded onto a server as documents are added by the 
> client.
> Hadoop RPC is more scalable than Sun's RMI implementation because it uses 
> non-blocking sockets.  Hadoop RPC is also far easier to understand and customize 
> if needed as it is embodied in 2 main class files 
> org.apache.hadoop.ipc.Client and org.apache.hadoop.ipc.Server.  
> Features include automatic dynamic classloading.  The dynamic classloading 
> enables newly compiled client classes inheriting core objects such as Query 
> or Filter to be used to query the server without first deploying the code to 
> the server.  
> RMI's dynamic classloading is not used in practice because it is hard to 
> set up, requiring the new code to be placed in jar files on a web server on 
> the client.  It also requires custom system properties to be set up, as well 
> as Java security manager configuration.  
> The dynamic classloading in Hadoop RMI for Lucene uses RMI to load the 
> classes.  Custom serialization and deserialization manages the classes and 
> the class versions on the server and client side.  New class files are 
> automatically detected and loaded using ClassLoader.getResourceAsStream and 
> so this system does not require creating a JAR file.  The same networking 
> system used for the remote method invocation is used for loading classes 
> over the network.  This removes the necessity of a separate 
> web server dedicated to the task and makes deployment a few lines of code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1476.


Resolution: Won't Fix

Won't be working on these and they're old

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> ---
>
> Key: LUCENE-1476
> URL: https://issues.apache.org/jira/browse/LUCENE-1476
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Attachments: hacked-deliterator.patch, LUCENE-1476.patch, 
> LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, 
> quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff, 
> quasi_iterator_deletions_r3.diff, searchdeletes.alg, sortBench2.py, 
> sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1477) Pluggable SegmentReader.deletedDocs

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1477.


Resolution: Won't Fix

Won't be working on these and they're old

> Pluggable SegmentReader.deletedDocs
> ---
>
> Key: LUCENE-1477
> URL: https://issues.apache.org/jira/browse/LUCENE-1477
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
> Environment: Unix
>Reporter: Jason Rutherglen
>Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Expose a set method in SegmentReader that allows setting the deletedDocs 
> variable.  For realtime indexing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1332) Enable reader and binary fields in InstantiatedIndexWriter

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1332.


Resolution: Won't Fix

Won't be working on these and they're old

> Enable reader and binary fields in InstantiatedIndexWriter
> --
>
> Key: LUCENE-1332
> URL: https://issues.apache.org/jira/browse/LUCENE-1332
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.3.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Attachments: lucene-1332.patch
>
>
> Currently InstantiatedIndexWriter does not support fields with a Reader or 
> that are binary.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1319) Allow user configurable buffersize for RAMDirectory

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1319.


Resolution: Won't Fix

Won't be working on these and they're old

> Allow user configurable buffersize for RAMDirectory
> ---
>
> Key: LUCENE-1319
> URL: https://issues.apache.org/jira/browse/LUCENE-1319
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Affects Versions: 2.3.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Attachments: lucene-1319.patch, lucene-1319.patch
>
>
> Currently RAMDirectory via RAMOutputStream has a package protected value of 
> 1024 as the buffer size for use in RAMFile.  This issue proposes adding a 
> single constructor to RAMDirectory allowing the user to specify the buffer 
> size.  
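
The API shape could be as simple as the sketch below (illustrative only; the 
real work in the patch is plumbing the value through RAMOutputStream/RAMFile):

{code}
// Sketch of the proposed constructor; today the 1024-byte block size is
// hard-coded in RAMOutputStream.
public class BufferedRAMDirectorySketch {
  public static final int DEFAULT_BUFFER_SIZE = 1024;

  private final int bufferSize;

  public BufferedRAMDirectorySketch() {
    this(DEFAULT_BUFFER_SIZE);
  }

  public BufferedRAMDirectorySketch(int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
    }
    this.bufferSize = bufferSize;  // would be handed to each new RAMFile
  }

  public int getBufferSize() {
    return bufferSize;
  }
}
{code}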

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1317) Add InstantiatedIndexWriter.addIndexes(IndexReader[] readers)

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1317.


Resolution: Won't Fix

Won't be working on these, and they're old

> Add InstantiatedIndexWriter.addIndexes(IndexReader[] readers)
> -
>
> Key: LUCENE-1317
> URL: https://issues.apache.org/jira/browse/LUCENE-1317
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Jason Rutherglen
>Assignee: Karl Wettin
>
> Enable InstantiatedIndexWriter to have IndexReaders passed in like 
> IndexWriter and merged into the index.  
> Karl mentioned:
> bq: It's doable. The simplest solution I can think of is to reconstruct all 
> the documents in one single enumeration of the source index and then add them 
> to the writer. I'm however not certain this is the best way nor if 
> InstantiatedIndexWriter is the place for the code.
> How would the documents be reconstructed without creating a lot of overhead?  
> It seems like InstantiatedIndexWriter is the right place, given it is 
> presumably more efficient to recreate all the IndexReaders and then commit?  
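
For illustration, the naive reconstruct-and-re-add approach could look like the 
sketch below (stored fields only, which is exactly the overhead concern; it 
assumes the usual addDocument()/commit() surface):

{code}
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.instantiated.InstantiatedIndexWriter;

// Naive sketch: re-add each live document from the source readers.
// Only stored fields survive, so unstored content would be lost.
public class InstantiatedAddIndexesSketch {
  public static void addIndexes(InstantiatedIndexWriter writer, IndexReader[] readers)
      throws IOException {
    for (IndexReader reader : readers) {
      for (int docId = 0; docId < reader.maxDoc(); docId++) {
        if (reader.isDeleted(docId)) {
          continue;                            // skip deleted documents
        }
        Document doc = reader.document(docId); // stored fields only
        writer.addDocument(doc);
      }
    }
    writer.commit();
  }
}
{code}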

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory)

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1313.


Resolution: Won't Fix

Won't be working on these, and they're old

> Near Realtime Search (using a built in RAMDirectory)
> 
>
> Key: LUCENE-1313
> URL: https://issues.apache.org/jira/browse/LUCENE-1313
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, 
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, 
> lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, TestLuceneNRT.java
>
>
> Enable near realtime search in Lucene without external
> dependencies. When RAM NRT is enabled, the implementation adds a
> RAMDirectory to IndexWriter. Flushes go to the ramdir unless
> there is no available space. Merges are completed in the ram
> dir until there is no more available ram. 
> IW.optimize and IW.commit flush the ramdir to the primary
> directory; all other operations try to keep segments in ram
> until there is no more space.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1292) Tag Index

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1292.


Resolution: Won't Fix

Won't be working on these, and they're old

> Tag Index
> -
>
> Key: LUCENE-1292
> URL: https://issues.apache.org/jira/browse/LUCENE-1292
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Affects Versions: 2.3.1
>Reporter: Jason Rutherglen
> Attachments: lucene-1292.patch
>
>
> The problems the tag index solves are slow field cache loading, slow range 
> queries, and reindexing an entire document to update fields that are not 
> tokenized.  
> The tag index holds untokenized terms with a docfreq of 1 in a term 
> dictionary like index file.  The file also stores the docs per term, similar 
> to LUCENE-1278.  The index also has a transaction log and in memory index for 
> realtime updates to the tags.  The transaction log is periodically merged 
> into the existing tag term dictionary index file.
> The TagIndexReader extends IndexReader and is unified with a regular index by 
> ParallelReader.  There is a doc id to terms skip pointer file for the 
> IndexReader.document method.  This file contains a pointer for looking up the 
> terms for a document.  
> There is a higher level class that encapsulates writing a document with tag 
> fields to IndexWriter and TagIndexWriter.  This requires a hook into 
> IndexWriter to coordinate doc ids and flushing segments to disk.  
> The writer class could be as simple as:
> {code}
> public class TagIndexWriter {
>   
>   public void add(Term term, DocIdSetIterator iterator) {
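>     // associate the tag term with every doc the iterator enumerates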
>   }
>   
>   public void delete(Term term, DocIdSetIterator iterator) {
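>     // remove the tag term from every doc the iterator enumerates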
>   }
> }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1289) Make FieldDocSortedHitQueue and methods public, make FieldSortedHitQueue.fillFields public

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1289.


Resolution: Won't Fix

Won't be working on these, and they're old

> Make FieldDocSortedHitQueue and methods public, make 
> FieldSortedHitQueue.fillFields public 
> ---
>
> Key: LUCENE-1289
> URL: https://issues.apache.org/jira/browse/LUCENE-1289
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.3.1
>Reporter: Jason Rutherglen
>Priority: Minor
>
> In implementing a custom MultiSearcher-like class, we need public access.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1761) Command line Solr check software

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1761.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Command line Solr check software
> -
>
> Key: SOLR-1761
> URL: https://issues.apache.org/jira/browse/SOLR-1761
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
> Fix For: Next
>
> Attachments: SOLR-1761.patch, SOLR-1761.patch
>
>
> I'm in need of a command line tool that Nagios and the like can execute to 
> verify a Solr server is working... Basically it'll be a jar with apps that 
> return error codes if a given criterion isn't met.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1617) Cache and merge facets per segment

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1617.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Cache and merge facets per segment
> --
>
> Key: SOLR-1617
> URL: https://issues.apache.org/jira/browse/SOLR-1617
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>
> Spinoff from SOLR-1308.  We'll enable per-segment facet caching and merging 
> which will allow near realtime faceted searching.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1619) Cache documents by their internal ID

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1619.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Cache documents by their internal ID
> 
>
> Key: SOLR-1619
> URL: https://issues.apache.org/jira/browse/SOLR-1619
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>
> Currently documents are cached by their Lucene docid, however we can instead 
> cache them using their schema derived unique id.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1618) Merge docsets on segment merge

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1618.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Merge docsets on segment merge
> --
>
> Key: SOLR-1618
> URL: https://issues.apache.org/jira/browse/SOLR-1618
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>
> When SOLR-1308 is implemented, we can save some time when creating new 
> docsets by merging them in RAM as segments are merged (similar to LUCENE-1785)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1614) Search in Hadoop

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1614.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Search in Hadoop
> 
>
> Key: SOLR-1614
> URL: https://issues.apache.org/jira/browse/SOLR-1614
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>
> What's the use case? Sometimes queries are expensive (such as
> regex) or one has indexes located in HDFS that then need to be
> searched on. By leveraging Hadoop, these non-time-sensitive
> queries may be executed without dynamically deploying the
> indexes to new Solr servers. 
> We'll download the index out of HDFS (assuming they're zipped),
> perform the queries in a batch on the index shard, then merge
> the results either using a Solr query results priority queue, or
> simply using Hadoop's built in merge sorting. 
> The query file will be encoded in JSON format, (ID, query,
> numresults,fields). The shards file will simply contain newline
> delimited paths (HDFS or otherwise). The output can be a Solr
> encoded results file per query.
> I'm hoping to add an actual Hadoop unit test.
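
The merge step could be a simple bounded priority queue over all shard hits; a 
plain-Java sketch (hit tuples here are hypothetical {score, docId} pairs, not 
Solr classes):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: keep the global top-N hits across all shard result sets using a
// min-heap on score, then sort best-first at the end.
public class BatchResultMerger {
  public static List<double[]> topN(List<List<double[]>> shardHits, int n) {
    PriorityQueue<double[]> heap = new PriorityQueue<double[]>(n,
        new Comparator<double[]>() {
          public int compare(double[] a, double[] b) {
            return Double.compare(a[0], b[0]);  // min-heap on score
          }
        });
    for (List<double[]> hits : shardHits) {
      for (double[] hit : hits) {
        if (heap.size() < n) {
          heap.offer(hit);
        } else if (hit[0] > heap.peek()[0]) {
          heap.poll();                          // drop the current weakest hit
          heap.offer(hit);
        }
      }
    }
    List<double[]> merged = new ArrayList<double[]>(heap);
    Collections.sort(merged, new Comparator<double[]>() {
      public int compare(double[] a, double[] b) {
        return Double.compare(b[0], a[0]);      // best score first
      }
    });
    return merged;
  }
}
{code}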

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1502) Add form to perform updates

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1502.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Add form to perform updates
> ---
>
> Key: SOLR-1502
> URL: https://issues.apache.org/jira/browse/SOLR-1502
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>
> A convenience UI to perform updates via the Web UI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1609) Create a cache implementation that limits itself to a given RAM size

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1609.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Create a cache implementation that limits itself to a given RAM size
> 
>
> Key: SOLR-1609
> URL: https://issues.apache.org/jira/browse/SOLR-1609
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>
> This is a spinoff from the unrelated SOLR-1308. We can limit the
> cache sizes by estimated RAM usage. I think in some cases this
> is a better approach when compared with using soft references as
> this will effectively limit the cache RAM used. Soft references
> will utilize the max heap before divesting themselves of excessive
> cached items, which in some cases may not be the desired
> behavior.
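
A minimal sketch of the idea (illustrative only, not Solr's SolrCache API; the 
per-entry size estimate is the hard part and is left to the caller here):

{code}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: an LRU map that evicts by estimated RAM cost instead of entry count.
public class RamBoundedCache<K, V> {
  public interface SizeEstimator<K, V> {
    long bytesUsed(K key, V value);
  }

  private final LinkedHashMap<K, V> map =
      new LinkedHashMap<K, V>(16, 0.75f, true);  // access order for LRU
  private final long maxRamBytes;
  private final SizeEstimator<K, V> estimator;
  private long ramBytesUsed;

  public RamBoundedCache(long maxRamBytes, SizeEstimator<K, V> estimator) {
    this.maxRamBytes = maxRamBytes;
    this.estimator = estimator;
  }

  public synchronized V get(K key) {
    return map.get(key);
  }

  public synchronized void put(K key, V value) {
    V old = map.put(key, value);
    if (old != null) {
      ramBytesUsed -= estimator.bytesUsed(key, old);
    }
    ramBytesUsed += estimator.bytesUsed(key, value);
    // evict least-recently-used entries until we're under the RAM budget
    Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
    while (ramBytesUsed > maxRamBytes && it.hasNext()) {
      Map.Entry<K, V> eldest = it.next();
      if (eldest.getKey().equals(key)) {
        continue;                               // never evict the newest entry
      }
      ramBytesUsed -= estimator.bytesUsed(eldest.getKey(), eldest.getValue());
      it.remove();
    }
  }
}
{code}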

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1606) Integrate Near Realtime

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1606.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Integrate Near Realtime 
> 
>
> Key: SOLR-1606
> URL: https://issues.apache.org/jira/browse/SOLR-1606
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
> Attachments: SOLR-1606.patch
>
>
> We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1457) Deploy shards from HDFS into local cores

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1457.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Deploy shards from HDFS into local cores
> 
>
> Key: SOLR-1457
> URL: https://issues.apache.org/jira/browse/SOLR-1457
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
> Attachments: hadoop-0.19.0-core.jar, SOLR-1457.patch, SOLR-1457.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> This issue extends CoreAdminHandler to allow installation of new cores from 
> HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1495) Store UnInvertedField on disk

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1495.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Store UnInvertedField on disk
> -
>
> Key: SOLR-1495
> URL: https://issues.apache.org/jira/browse/SOLR-1495
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3, 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>
> There are a couple of reasons for this: NRT and avoiding OOMs.
> Users who don't know better easily run into OOMs when generating
> facets on fields with numerous terms.  
> Creating the UIF on disk prior to making use of it means the
> user may know upfront the memory cost of their faceting
> operation (as opposed to after, which leads to OOMs and
> unexpected behavior).
> For NRT it means amortizing the cost of creating UIFs into
> segment creation, as opposed to creating them dynamically as
> queries arrive.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1477) Search on multi-tier cores

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1477.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Search on multi-tier cores
> --
>
> Key: SOLR-1477
> URL: https://issues.apache.org/jira/browse/SOLR-1477
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
> Attachments: SOLR-1477.patch, SOLR-1477.patch, SOLR-1477.patch, 
> SOLR-1477.patch, SOLR-1477.patch
>
>
> Search on cores in the container, using distributed search.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1438) Timeout distributed query stage get fields

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1438.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Timeout distributed query stage get fields
> --
>
> Key: SOLR-1438
> URL: https://issues.apache.org/jira/browse/SOLR-1438
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In a distributed query, timeouts work for PURPOSE_GET_TOP_IDS
> but we need them for PURPOSE_GET_FIELDS (obtaining the document
> data). We'll reuse the timeAllowed parameter and pass it to the
> shards during the get fields distributed request.
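
For illustration, the forwarding could be as small as this (a sketch, not the 
actual patch):

{code}
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ShardRequest;

// Sketch: copy timeAllowed from the original request onto the get-fields
// shard request so PURPOSE_GET_FIELDS times out like PURPOSE_GET_TOP_IDS.
public class GetFieldsTimeoutSketch {
  public static void forwardTimeAllowed(ShardRequest sreq, SolrParams original) {
    String timeAllowed = original.get(CommonParams.TIME_ALLOWED);
    if (timeAllowed != null) {
      sreq.params.set(CommonParams.TIME_ALLOWED, timeAllowed);
    }
  }
}
{code}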

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1405) Show the index files in the web UI

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1405.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Show the index files in the web UI
> --
>
> Key: SOLR-1405
> URL: https://issues.apache.org/jira/browse/SOLR-1405
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
> Attachments: data menu.png, Index file list.png, SOLR-1405.patch, 
> SOLR-1405.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be great to view the actual index files from the web console.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1374) When a test fails, display the test file in the console via ant junit

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1374.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> When a test fails, display the test file in the console via ant junit
> -
>
> Key: SOLR-1374
> URL: https://issues.apache.org/jira/browse/SOLR-1374
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When a test fails, it would be great if the junit test output file were 
> displayed in the terminal.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1308) Cache docsets at the SegmentReader level

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1308.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Cache docsets at the SegmentReader level
> 
>
> Key: SOLR-1308
> URL: https://issues.apache.org/jira/browse/SOLR-1308
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Solr caches docsets at the top-level Multi*Reader. After a
> commit, the filter/docset caches are flushed. Reloading the
> cache in near realtime (i.e. commits every 1s - 2min)
> unnecessarily consumes IO resources when reloading the filters,
> especially for largish indexes.
> We'll cache docsets at the SegmentReader level. The cache key
> will include the reader.
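
For illustration, the keying could look like the sketch below (not Solr's 
SolrCache API; eviction when a segment's reader is closed is omitted):

{code}
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;

// Sketch: docsets are keyed by (segment core key, query), so an unchanged
// segment keeps its cached docsets across commits.
public class PerSegmentDocSetCache<D> {
  private static final class Key {
    final Object coreKey;  // identity key of the segment reader's core
    final Query query;

    Key(IndexReader segmentReader, Query query) {
      this.coreKey = segmentReader.getFieldCacheKey();
      this.query = query;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof Key)) return false;
      Key k = (Key) other;
      return coreKey == k.coreKey && query.equals(k.query);
    }

    @Override
    public int hashCode() {
      return 31 * System.identityHashCode(coreKey) + query.hashCode();
    }
  }

  private final ConcurrentHashMap<Key, D> cache = new ConcurrentHashMap<Key, D>();

  public D get(IndexReader segmentReader, Query query) {
    return cache.get(new Key(segmentReader, query));
  }

  public void put(IndexReader segmentReader, Query query, D docSet) {
    cache.put(new Key(segmentReader, query), docSet);
  }
}
{code}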

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-1278) Near Realtime Search Replication

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-1278.
--

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Near Realtime Search Replication
> 
>
> Key: SOLR-1278
> URL: https://issues.apache.org/jira/browse/SOLR-1278
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java), search, update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Next
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Lucene 2.9 and later offer near realtime search (LUCENE-1516,
> LUCENE-1313). For SOLR this means integrating
> IndexWriter.getReader and adding a way to replicate newly
> created segments that may not exist on the file system to other
> SOLR servers in an efficient way. 
> I don't think replicating documents as-is would be optimal, as it
> requires re-analyzing on the slaves, which we should seek to
> avoid.
> Issues:
> * Replicate using the existing Java based non-script system that
> uses HTTP or create a protocol that uses sockets?
> * Lucene needs a more efficient way of adding these segments
> (LUCENE-1738)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-576) Make DocSetHitCollector public

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-576.
-

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Make DocSetHitCollector public
> --
>
> Key: SOLR-576
> URL: https://issues.apache.org/jira/browse/SOLR-576
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Priority: Minor
>
> Make org.apache.solr.search.DocSetHitCollector public for use by other code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-569) SimpleFacet binarysearch optimization

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-569.
-

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> SimpleFacet binarysearch optimization
> -
>
> Key: SOLR-569
> URL: https://issues.apache.org/jira/browse/SOLR-569
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Priority: Minor
>
> Looks like the SimpleFacets.getFieldCacheCounts method could have a small optimization:
> {noformat}
> startTermIndex = Arrays.binarySearch(terms,prefix,nullStrComparator);
> if (startTermIndex<0) startTermIndex=-startTermIndex-1;
> // find the end term.  \uffff isn't a legal unicode char, but only compareTo
> // is used, so it should be fine, and is guaranteed to be bigger than legal chars.
> endTermIndex = Arrays.binarySearch(terms,prefix+"\uffff\uffff\uffff\uffff",nullStrComparator);
> endTermIndex = -endTermIndex-1;
> {noformat} 
> to:
> {noformat}
> startTermIndex = Arrays.binarySearch(terms,prefix,nullStrComparator);
> if (startTermIndex<0) startTermIndex=-startTermIndex-1;
> // find the end term.  \uffff isn't a legal unicode char, but only compareTo
> // is used, so it should be fine, and is guaranteed to be bigger than legal chars.
> endTermIndex = Arrays.binarySearch(terms, startTermIndex,
> terms.length, prefix+"\uffff\uffff\uffff\uffff",nullStrComparator);
> endTermIndex = -endTermIndex-1;
> {noformat} 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-568) MultiDocSet

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-568.
-

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> MultiDocSet
> ---
>
> Key: SOLR-568
> URL: https://issues.apache.org/jira/browse/SOLR-568
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
> Attachments: solr.568.5.8.2008.patch
>
>
> Analogous to MultiReader or MultiSearcher.  For use with implementations that 
> cache doc sets at the SegmentReader level.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (SOLR-567) SolrCore Pluggable

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-567.
-

Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> SolrCore Pluggable
> --
>
> Key: SOLR-567
> URL: https://issues.apache.org/jira/browse/SOLR-567
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
> Attachments: solr-567.patch, solr-567.patch
>
>
> SolrCore needs to be an abstract class with the existing functionality in a 
> subclass.  SolrIndexSearcher the same.  It seems that most of the Searcher 
> methods in SolrIndexSearcher are not used.  The new abstract class need only 
> have the methods used by the other Solr classes.  This will allow other 
> indexing and search implementations to reuse the other parts of Solr.  Any 
> other classes that have functionality specific to the Solr implementation of 
> indexing and replication such as SolrConfig can be made abstract.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2138) Allow custom index readers when using IndexWriter.getReader

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-2138.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Allow custom index readers when using IndexWriter.getReader
> ---
>
> Key: LUCENE-2138
> URL: https://issues.apache.org/jira/browse/LUCENE-2138
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 3.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2138.patch
>
>
> This is needed for backwards compatible support with Solr, and is a spin-off 
> from SOLR-1606.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1738) IndexWriter.addIndexes without syncing

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1738.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> IndexWriter.addIndexes without syncing
> --
>
> Key: LUCENE-1738
> URL: https://issues.apache.org/jira/browse/LUCENE-1738
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1738.patch, LUCENE-1738.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When LUCENE-1313 is completed, it would be good to have a way to
> replicate segments from one IndexWriter to another.
> * Callback on successful flush (maybe for other events as well?)
> * Ability to access files for a segment (which would presumably
> be read from the IW ramdir), then copy them to a temporary
> serialized ramdir (or equivalent as ramdir uses extra space in
> blocks, whereas we'll already know the size of the files before
> we write them).
> * On the receiving end, we may be able to use
> addIndexesNoOptimize(Directory[]), however this would entail
> each directory having an extraneous segment_N file for each
> replicated update (so we may want another format). 
> * It will rely on having a new public version of SegmentInfo. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2083) Use ReadWriteLock in IndexWriter

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-2083.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Use ReadWriteLock in IndexWriter
> 
>
> Key: LUCENE-2083
> URL: https://issues.apache.org/jira/browse/LUCENE-2083
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
>
> Doing a small patch to make sure things don't break in a big
> way, we'll use RRWL replacing some of the global synchronized
> locks in IndexWriter. 
> We'll read lock during operations that for example delete from a
> segment, and write lock when we're changing the main segment
> infos collection (i.e. we're swapping in new segments after a
> merge, or flushing a new segment). 
> I want to implement this, see if any tests break. 
> Spin off from LUCENE-2047.
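
For illustration, the locking scheme in java.util.concurrent terms (a sketch, 
not IndexWriter's actual code):

{code}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: per-segment operations share the read lock; swapping the
// segmentInfos collection itself takes the exclusive write lock.
public class SegmentInfosLockSketch {
  private final ReadWriteLock rwl = new ReentrantReadWriteLock();

  public void deleteFromSegment(Runnable deleteOp) {
    rwl.readLock().lock();     // many such operations may run concurrently
    try {
      deleteOp.run();
    } finally {
      rwl.readLock().unlock();
    }
  }

  public void swapSegments(Runnable swapOp) {
    rwl.writeLock().lock();    // exclusive: flushing or committing a merge
    try {
      swapOp.run();
    } finally {
      rwl.writeLock().unlock();
    }
  }
}
{code}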

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2071) Allow updating of IndexWriter SegmentReaders

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-2071.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Allow updating of IndexWriter SegmentReaders
> 
>
> Key: LUCENE-2071
> URL: https://issues.apache.org/jira/browse/LUCENE-2071
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2071.patch
>
>
> This discussion kind of started in LUCENE-2047.  Basically, we'll allow users 
> to perform document deletes and norms updates on SegmentReaders that are 
> handled by IndexWriter.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2063) Use thread pool in ConcurrentMergeScheduler

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-2063.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Use thread pool in ConcurrentMergeScheduler
> ---
>
> Key: LUCENE-2063
> URL: https://issues.apache.org/jira/browse/LUCENE-2063
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
>
> Currently it looks like CMS creates a new thread object for each
> merge, which may not be expensive anymore on Java5+ JVMs,
> but we can fairly simply implement Java5 thread pooling.
> Also I'm thinking we may be interested in using thread pools for
> other tasks in IndexWriter (such as LUCENE-2047 performing
> deletes in the background). 
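
The Java5 pooling itself is small; a sketch (illustrative names, not CMS's 
actual code):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: reuse pooled threads across merges instead of new Thread() each time.
public class PooledMergeSchedulerSketch {
  // bounded roughly like CMS's max thread count
  private final ExecutorService mergePool = Executors.newFixedThreadPool(3);

  public void scheduleMerge(Runnable doMerge) {
    mergePool.execute(doMerge);
  }

  public void close() {
    mergePool.shutdown();  // let queued merges finish, then stop the threads
  }
}
{code}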

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1917) ShingleFilter include words

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1917.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> ShingleFilter include words
> ---
>
> Key: LUCENE-1917
> URL: https://issues.apache.org/jira/browse/LUCENE-1917
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> By default ShingleFilter creates shingles (i.e. combines tokens
> into a single token) from all tokens. For the purpose of, for
> example, indexing stop words as shingles without creating
> shingles out of every word, we can supply an include-words
> CharArraySet to ShingleFilter that determines the tokens to
> shingle. 
> This is similar to Nutch CommonGrams and SOLR-908. SOLR-908
> does not utilize the new token attribute API, and I figured this
> functionality is more suitable being a part of Lucene. 
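
Usage might look like the sketch below; setIncludeWords() is the hypothetical 
API this issue proposes (today's ShingleFilter shingles every token):

{code}
import java.util.Arrays;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.shingle.ShingleFilter;

public class IncludeWordShinglesSketch {
  public static TokenStream wrap(TokenStream input) {
    // e.g. stop words that should participate in shingles
    CharArraySet includeWords =
        new CharArraySet(Arrays.asList("the", "of", "to"), true);
    ShingleFilter shingles = new ShingleFilter(input, 2);  // bigram shingles
    shingles.setIncludeWords(includeWords);  // hypothetical setter, per this issue
    return shingles;
  }
}
{code}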

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1746) Improve ParallelMultiSearcher

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1746.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Improve ParallelMultiSearcher
> -
>
> Key: LUCENE-1746
> URL: https://issues.apache.org/jira/browse/LUCENE-1746
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 3.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> * As we're going to Java5, we can use the java.util.concurrent
> thread pool. The thread pool size can default to the number of
> processors.
> * We can optimize usage of readers where small segments are
> searched sequentially and larger segments are searched in
> parallel
> * Need a plan for how Collector.setNextReader works when
> parallelized (i.e. where do we add synchronization without
> creating a bottleneck?)
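
For the first bullet, a plain java.util.concurrent sketch (illustrative names; 
one task per sub-searcher):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: shared pool sized to the processor count; the caller decides what
// each per-searcher task actually does.
public class ParallelSearchSketch {
  private final ExecutorService pool =
      Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

  public <T> List<T> searchAll(List<Callable<T>> perSearcherTasks)
      throws InterruptedException, ExecutionException {
    List<T> results = new ArrayList<T>();
    for (Future<T> future : pool.invokeAll(perSearcherTasks)) {
      results.add(future.get());  // blocks until that sub-search completes
    }
    return results;
  }
}
{code}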

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1674) Add IndexReaderFactory to IndexWriter

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1674.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Add IndexReaderFactory to IndexWriter
> -
>
> Key: LUCENE-1674
> URL: https://issues.apache.org/jira/browse/LUCENE-1674
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: 4.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> With LUCENE-1516 (IndexWriter.getReader), we take over the
> instantiating of IndexReaders, which prevents users who have
> implemented custom IndexReader subclasses from using them.
> The patch will create an IndexWriter.setReaderFactory method and
> a IndexReaderFactory class that allows custom creation of the
> internal readers created by IndexWriter.
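
Roughly, the hook could look like this (hypothetical API; neither this class 
nor IndexWriter.setReaderFactory exists yet):

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

// Sketch: IndexWriter would call the factory wherever it currently news up
// its internal readers.
public abstract class IndexReaderFactorySketch {
  public abstract IndexReader open(Directory dir) throws IOException;

  /** Default factory preserving the current behavior. */
  public static final IndexReaderFactorySketch DEFAULT = new IndexReaderFactorySketch() {
    @Override
    public IndexReader open(Directory dir) throws IOException {
      return IndexReader.open(dir);
    }
  };
}
{code}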

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1671) FSDirectory internally caches and clones FSIndexInput

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1671.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> FSDirectory internally caches and clones FSIndexInput
> -
>
> Key: LUCENE-1671
> URL: https://issues.apache.org/jira/browse/LUCENE-1671
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: 4.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The patch will fix this small problem where if FSDirectory.openInput is 
> called, a new unnecessary file descriptor is opened (whereas an 
> IndexInput.clone would work).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1667) ConcurrentMergeScheduler use a thread pool (per directory)

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1667.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> ConcurrentMergeScheduler use a thread pool (per directory)
> --
>
> Key: LUCENE-1667
> URL: https://issues.apache.org/jira/browse/LUCENE-1667
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Modify ConcurrentMergeScheduler to use a thread pool per merge target 
> directory.  Add settings for the thread pool.  For use with LUCENE-1313.  
> We may want to wait to implement this in 3.0 when we can reuse 
> ThreadPoolExecutor in Java 1.5.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1577) Benchmark of different in RAM realtime techniques

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1577.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> Benchmark of different in RAM realtime techniques
> -
>
> Key: LUCENE-1577
> URL: https://issues.apache.org/jira/browse/LUCENE-1577
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1577.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> A place to post code that benchmarks the differences in the speed of indexing 
> and searching using different realtime techniques.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader... readers)

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed LUCENE-1589.


Resolution: Won't Fix

Sorry if this spams things, but it's unlikely that I'll work on these.

> IndexWriter.addIndexesNoOptimize(IndexReader... readers)
> 
>
> Key: LUCENE-1589
> URL: https://issues.apache.org/jira/browse/LUCENE-1589
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1589.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Similar to IndexWriter.addIndexesNoOptimize(Directory[] dirs)
> but for IndexReaders. This will be used to flush cloned ram
> indexes to disk for near realtime indexing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2321) Upgrade Jetty to 6.1.26 in standard distro

2011-01-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2321:
---

Fix Version/s: 4.0
   3.1

> Upgrade Jetty to 6.1.26 in standard distro
> -
>
> Key: SOLR-2321
> URL: https://issues.apache.org/jira/browse/SOLR-2321
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4.1
>Reporter: Bill Bell
> Fix For: 3.1, 4.0
>
>
> Upgrade distro to 6.1.26.
> http://dist.codehaus.org/jetty/jetty-6.1.26/
> The main bug that is causing trouble:
> [JETTY-547] - Jetty should rely on socket.shutdownOutput() to close sockets

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-24 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985975#action_12985975
 ] 

Jason Rutherglen commented on LUCENE-1574:
--

What size segments is the benchmark deleting against?  Maybe we're 
underestimating the speed of arraycopy; e.g., it's really a hardware operation 
that could be optimized?

> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
> Key: LUCENE-1574
> URL: https://issues.apache.org/jira/browse/LUCENE-1574
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1574.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> PooledSegmentReader pools the underlying byte arrays of deleted docs and 
> norms for realtime search.  It is designed for use with IndexReader.clone 
> which can create many copies of byte arrays, which are of the same length for 
> a given segment.  When pooled they can be reused which could save on memory.  
> Do we want to benchmark the memory usage comparison of PooledSegmentReader vs 
> GC?  Many times GC is enough for these smaller objects.
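
For illustration, the pooling itself can be tiny (a sketch only; see the 
attached patch for the real PooledSegmentReader):

{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: within one segment, deleted-docs and norms arrays share a single
// length, so they can be recycled across reader clones instead of re-allocated.
public class ByteArrayPool {
  private final int length;  // fixed array length for this segment
  private final Deque<byte[]> free = new ArrayDeque<byte[]>();

  public ByteArrayPool(int length) {
    this.length = length;
  }

  public synchronized byte[] acquire() {
    byte[] array = free.poll();
    return array != null ? array : new byte[length];
  }

  public synchronized void release(byte[] array) {
    if (array.length == length) {
      free.push(array);      // keep it around for the next clone
    }
  }
}
{code}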

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene & Google Summer of Code 2011

2011-01-24 Thread Simon Willnauer
On Mon, Jan 24, 2011 at 9:33 PM, Michael McCandless
 wrote:
> Big +1.
>
> We need all the help we can get...
>
> Should we make a wiki page where we can post/iterate on the ideas?

done - http://wiki.apache.org/lucene-java/SummerOfCode2011
>
> Mike
>
> On Mon, Jan 24, 2011 at 3:29 PM, Simon Willnauer
>  wrote:
>> hey folks,
>>
>> Google has announced GSoC 2011 lately and mentoring organizations can
>> start submitting applications by the end of feb
>> (http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
>> I wonder if we should participate this year again? I think we have
>> plenty of work to do and it's a great opportunity to get fresh blood
>> into the project on both ends Solr & Lucene.  I already have a couple
>> of tasks / projects in mind though...
>>
>> Thoughts?
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene & Google Summer of Code 2011

2011-01-24 Thread Uwe Schindler
+1

I also have an idea from the attributes and TokenStream policeman. So I could 
even help mentoring.

Uwe



"Simon Willnauer"  schrieb:

>hey folks,
>
>Google has announced GSoC 2011 lately and mentoring organizations can
>start submitting applications by the end of feb
>(http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
>I wonder if we should participate this year again? I think we have
>plenty of work to do and it's a great opportunity to get fresh blood
>into the project on both ends Solr & Lucene.  I already have a couple
>of tasks / projects in mind though...
>
>Thoughts?
>
>-
>To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>For additional commands, e-mail: dev-h...@lucene.apache.org

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2856) Create IndexWriter event listener, specifically for merges

2011-01-24 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2856:
-

Attachment: LUCENE-2856.patch

Here's an update; there's one nocommit, as I'm not sure how we want to capture 
an exception and rethrow it (as a Throwable).  Adding the reason a flush occurred 
requires quite a bit of refactoring that we can probably leave for later if 
it's needed.  Updated to trunk, and all tests pass.

> Create IndexWriter event listener, specifically for merges
> --
>
> Key: LUCENE-2856
> URL: https://issues.apache.org/jira/browse/LUCENE-2856
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
> Attachments: LUCENE-2856.patch, LUCENE-2856.patch, LUCENE-2856.patch, 
> LUCENE-2856.patch, LUCENE-2856.patch
>
>
> The issue will allow users to monitor merges occurring within IndexWriter 
> using a callback event listener.  This can be used by external 
> applications such as Solr to monitor large segment merges.
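
The listener surface could be as small as the sketch below (method names are 
illustrative; the attached patches define the real API):

{code}
// Sketch of a merge event listener; an application like Solr could register
// one and, for instance, log when estimatedMergeBytes crosses a threshold.
public interface MergeEventListener {
  void mergeStarted(String mergedSegmentName, long estimatedMergeBytes);

  void mergeFinished(String mergedSegmentName, long durationMillis);

  void mergeFailed(String mergedSegmentName, Throwable cause);
}
{code}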

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1574:
---

Attachment: LUCENE-1574.patch

Attached rough patch.

At least one test fails.

And, I haven't yet seen that this is in fact worthwhile.  The rough benchmark I 
have (which hits other issues so the results aren't conclusive yet) doesn't 
show much difference w/ this patch.  I think this patch may only be worthwhile 
at insane reopen rates, which I think in practice is rarely a legitimate use 
case (even though many apps start off thinking it is).

> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
> Key: LUCENE-1574
> URL: https://issues.apache.org/jira/browse/LUCENE-1574
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1574.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> PooledSegmentReader pools the underlying byte arrays of deleted docs and 
> norms for realtime search.  It is designed for use with IndexReader.clone 
> which can create many copies of byte arrays, which are of the same length for 
> a given segment.  When pooled they can be reused which could save on memory.  
> Do we want to benchmark the memory usage comparison of PooledSegmentReader vs 
> GC?  Many times GC is enough for these smaller objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2887) Remove/deprecate IndexReader.undeleteAll

2011-01-24 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985967#action_12985967
 ] 

Andrzej Bialecki  commented on LUCENE-2887:
---

The MultiPassIndexSplitter won't work without this API. It can be modified to 
use a subclass of IndexReader & term / postings enumerators to restore the 
current behavior, or it can be rewritten entirely as a (fabled) 
SinglePassIndexSplitter. ;)

> Remove/deprecate IndexReader.undeleteAll
> 
>
> Key: LUCENE-2887
> URL: https://issues.apache.org/jira/browse/LUCENE-2887
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1
>
>
> This API is rather dangerous in that it's "best effort" since it can only 
> un-delete docs that have not yet been merged away or dropped (as of 
> LUCENE-2010).
> Given that it exposes impl details of how Lucene prunes deleted docs, I think 
> we should remove this API.
> Are there legitimate use cases?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Lucene & Google Summer of Code 2011

2011-01-24 Thread karl.wright
A nice idea.  I've always wondered about this, because for me "summer" and 
"code" do not go together very well. ;-)
Karl

-Original Message-
From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com] 
Sent: Monday, January 24, 2011 3:30 PM
To: dev@lucene.apache.org
Subject: Lucene & Google Summer of Code 2011

hey folks,

Google has announced GSoC 2011 lately and mentoring organizations can
start submitting applications by the end of feb
(http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
I wonder if we should participate this year again? I think we have
plenty of work to do and its a great opportunity to get fresh blood
into the project on both ends Solr & Lucene.  I already have a couple
of tasks / projects in mind though...

Thoughts?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene & Google Summer of Code 2011

2011-01-24 Thread Michael McCandless
Big +1.

We need all the help we can get...

Should we make a wiki page where we can post/iterate on the ideas?

Mike

On Mon, Jan 24, 2011 at 3:29 PM, Simon Willnauer
 wrote:
> hey folks,
>
> Google recently announced GSoC 2011, and mentoring organizations can
> start submitting applications by the end of February
> (http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
> I wonder if we should participate again this year? I think we have
> plenty of work to do, and it's a great opportunity to get fresh blood
> into the project on both ends, Solr & Lucene.  I already have a couple
> of tasks / projects in mind, though...
>
> Thoughts?
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session

2011-01-24 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch reassigned LUCENE-2881:
-

Assignee: Michael Busch

> Track FieldInfo per segment instead of per-IW-session
> -
>
> Key: LUCENE-2881
> URL: https://issues.apache.org/jira/browse/LUCENE-2881
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: Realtime Branch, CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Michael Busch
> Fix For: Realtime Branch, CSF branch, 4.0
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global 
> field naming / ordering. IW carries FI instances over from previous segments, 
> which also carries over field properties like isIndexed etc. While consistent 
> field ordering per IW session appears to be important for bulk merging of 
> stored fields etc., carrying over the other properties can become problematic 
> with Lucene's Codec support: codecs that rely on consistent properties in FI 
> will fail if those properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment 
> and field (using the field id within the file name). Yet, if a segment has no 
> DocValues indexed but a previous segment in the same IW session had DocValues, 
> FieldInfo#docValues will be true, since those values are reused from previous 
> segments.
> We already work around this "limitation" in SegmentInfo with properties like 
> hasVectors or hasProx, which is really something we should manage per Codec & 
> Segment. Ideally, FieldInfo would be managed per Segment and Codec such that 
> its properties are valid per segment. It also seems necessary to bind 
> FieldInfoS to SegmentInfo logically, since it's really just per-segment 
> metadata.
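One possible shape of that split, as a purely hypothetical sketch (none of 
these names exist in the codebase): keep the global name-to-number mapping 
shared for consistent ordering, while the boolean properties live with the 
segment:

{noformat}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: shared field numbering, per-segment field properties.
class GlobalFieldNumbers {                       // shared across the IW session
  private final Map<String,Integer> numbers = new HashMap<String,Integer>();
  synchronized int numberOf(String field) {      // keeps ordering consistent for bulk merges
    Integer n = numbers.get(field);
    if (n == null) { n = numbers.size(); numbers.put(field, n); }
    return n;
  }
}

class PerSegmentFieldInfo {                      // owned by one SegmentInfo, never carried over
  final String name;
  final int number;
  boolean isIndexed, hasDocValues;               // valid for this segment only
  PerSegmentFieldInfo(String name, int number) { this.name = name; this.number = number; }
}
{noformat}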

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene & Google Summer of Code 2011

2011-01-24 Thread Simon Willnauer
hey folks,

Google recently announced GSoC 2011, and mentoring organizations can
start submitting applications by the end of February
(http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline).
I wonder if we should participate again this year? I think we have
plenty of work to do, and it's a great opportunity to get fresh blood
into the project on both ends, Solr & Lucene.  I already have a couple
of tasks / projects in mind, though...

Thoughts?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2887) Remove/deprecate IndexReader.undeleteAll

2011-01-24 Thread Michael McCandless (JIRA)
Remove/deprecate IndexReader.undeleteAll


 Key: LUCENE-2887
 URL: https://issues.apache.org/jira/browse/LUCENE-2887
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1


This API is rather dangerous in that it's "best effort" since it can only 
un-delete docs that have not yet been merged away or dropped (as of 
LUCENE-2010).

Given that it exposes impl details of how Lucene prunes deleted docs, I think 
we should remove this API.

Are there legitimate use cases?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session

2011-01-24 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985959#action_12985959
 ] 

Simon Willnauer commented on LUCENE-2881:
-

bq. Yeah I agree. Hmm maybe I can spend some hours tonight on this, otherwise I 
don't think I'll have much time before Thursday.
Michael, if you start something I can work on it tomorrow for a while too. That 
way we can get it in quickly ;) 

> Track FieldInfo per segment instead of per-IW-session
> -
>
> Key: LUCENE-2881
> URL: https://issues.apache.org/jira/browse/LUCENE-2881
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: Realtime Branch, CSF branch, 4.0
>Reporter: Simon Willnauer
> Fix For: Realtime Branch, CSF branch, 4.0
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global 
> field naming / ordering. IW carries FI instances over from previous segments, 
> which also carries over field properties like isIndexed etc. While consistent 
> field ordering per IW session appears to be important for bulk merging of 
> stored fields etc., carrying over the other properties can become problematic 
> with Lucene's Codec support: codecs that rely on consistent properties in FI 
> will fail if those properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment 
> and field (using the field id within the file name). Yet, if a segment has no 
> DocValues indexed but a previous segment in the same IW session had DocValues, 
> FieldInfo#docValues will be true, since those values are reused from previous 
> segments.
> We already work around this "limitation" in SegmentInfo with properties like 
> hasVectors or hasProx, which is really something we should manage per Codec & 
> Segment. Ideally, FieldInfo would be managed per Segment and Codec such that 
> its properties are valid per segment. It also seems necessary to bind 
> FieldInfoS to SegmentInfo logically, since it's really just per-segment 
> metadata.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2011-01-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2010:
--

Assignee: Michael McCandless

> Remove segments with all documents deleted in commit/flush/close of 
> IndexWriter instead of waiting until a merge occurs.
> 
>
> Key: LUCENE-2010
> URL: https://issues.apache.org/jira/browse/LUCENE-2010
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
>
> I do not know if this is a bug in 2.9.0, but it seems that segments with all 
> documents deleted are not automatically removed:
> {noformat}
> 4 of 14: name=_dlo docCount=5
>   compound=true
>   hasProx=true
>   numFiles=2
>   size (MB)=0.059
>   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 
> 2009-09-21 10:25:09, os=SunOS,
>  os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, 
> source=flush}
>   has deletions [delFileName=_dlo_1.del]
>   test: open reader.OK [5 deleted docs]
>   test: fields..OK [136 fields]
>   test: field norms.OK [136 fields]
>   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
>   test: stored fields...OK [0 total field count; avg ? fields per doc]
>   test: term vectorsOK [0 total vector count; avg ? term/freq vector 
> fields per doc]
> {noformat}
> Shouldn't such segments be removed automatically during the next 
> commit/close of IndexWriter?
> *Mike McCandless:*
> Lucene doesn't actually short-circuit this case, ie, if every single doc in a 
> given segment has been deleted, it will still merge it [away] like normal, 
> rather than simply dropping it immediately from the index, which I agree 
> would be a simple optimization. Can you open a new issue? I would think IW 
> can drop such a segment immediately (ie not wait for a merge or optimize) on 
> flushing new deletes.
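For reference, the short-circuit amounts to something like the following sketch 
(invented minimal types, not the actual IndexWriter code):

{noformat}
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: at flush/commit time, drop segments whose docs are all deleted.
class DropFullyDeleted {
  static class SegInfo { int docCount, delCount; }

  static void drop(List<SegInfo> segments) {
    for (Iterator<SegInfo> it = segments.iterator(); it.hasNext();) {
      SegInfo si = it.next();
      if (si.delCount == si.docCount) {  // every doc in the segment is deleted
        it.remove();                     // drop it now instead of waiting for a merge/optimize
      }
    }
  }
}
{noformat}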

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2884) StandardCodec sometimes supplies skip pointers past EOF

2011-01-24 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2884.
-

   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Michael McCandless

Thanks for tracking this down, Mike!


> StandardCodec sometimes supplies skip pointers past EOF
> ---
>
> Key: LUCENE-2884
> URL: https://issues.apache.org/jira/browse/LUCENE-2884
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Codecs
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> Pretty sure this is 4.0-only:
> I added an assertion; the test to reproduce is:
> ant test-core -Dtestcase=TestPayloadNearQuery -Dtestmethod=testMinFunction 
> -Dtests.seed=4841190615781133892:3888521539169738727 -Dtests.multiplier=3
> {noformat}
> [junit] Testcase: 
> testMinFunction(org.apache.lucene.search.payloads.TestPayloadNearQuery):  
> FAILED
> [junit] invalid skip pointer: 404, length=337
> [junit] junit.framework.AssertionFailedError: invalid skip pointer: 404, 
> length=337
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059)
> [junit] at 
> org.apache.lucene.index.codecs.MultiLevelSkipListReader.init(MultiLevelSkipListReader.java:176)
> [junit] at 
> org.apache.lucene.index.codecs.standard.DefaultSkipListReader.init(DefaultSkipListReader.java:50)
> [junit] at 
> org.apache.lucene.index.codecs.standard.StandardPostingsReader$SegmentDocsAndPositionsAndPayloadsEnum.advance(StandardPostingsReader.java:742)
> [junit] at 
> org.apache.lucene.search.spans.TermSpans.skipTo(TermSpans.java:72)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



StopTokenizer Proposal

2011-01-24 Thread Fernando Wasylyszyn
Hi everybody. I am a developer and researcher working for Snoop Consulting 
S.R.L. in Argentina, especially on projects related to information retrieval and 
machine learning.
Working on a project for Yell Argentina (Yellow Pages), I developed what I 
call a StopTokenizer.

Problem:

We developed a small "suggest engine" to be included in the project. This 
suggest engine shouldn't generate suggestions for a set of stopwords (for 
example: "for"). So we added a StopFilter with a predefined set of stopwords 
(including "for"). The problem arose when we tested the engine with prefixes 
that match a stopword. For example, we tested "for" expecting "forsaken" to be 
returned as a result, and it did not happen.

Solution:

We implemented the StopTokenizer. This tokenizer takes an input string from a 
Reader, a set of characters to be used as delimiters, and a set of stopwords. 
The text is tokenized using the delimiters. Then it analyzes each token and 
decides whether the token is a stopword not only based on a predefined set of 
stopwords (like the StopFilter does) but also based on:

1) The position of the token: if the text is "for sending" and whitespace is a 
delimiter, then "for" is recognized as a stopword. The same holds if the text 
is "this is for sending".
2) The characters surrounding the token: if the text is "for " (note the 
trailing whitespace), then the token is recognized as a stopword; but if the 
text is "for" (no surrounding whitespace), then the token is NOT recognized as 
a stopword, so that "forsaken" can still be retrieved as a suggestion (see the 
sketch below).
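A sketch of that rule, as a hypothetical helper (not the actual StopTokenizer 
code):

{noformat}
import java.util.Set;

// Sketch: a token counts as a stopword only if the user has "closed" it with a delimiter.
class StopwordRule {
  static boolean isStopword(String token, boolean followedByDelimiter, Set<String> stopwords) {
    // "for " -> stopword (drop it); a bare trailing "for" -> keep it as a prefix,
    // so suggestions like "forsaken" can still match.
    return stopwords.contains(token) && followedByDelimiter;
  }
}
{noformat}

For the input "for sending", isStopword("for", true, stops) is true; for the 
bare query "for", isStopword("for", false, stops) is false.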

We think that this tokenizer, used for query analysis and combined with a 
StopFilter for indexing, can be useful for the community.
Comments and ideas are welcome!

Thank you.

Cheers.
Fernando.


  

[jira] Deleted: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler deleted LUCENE-2418:
--


> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.
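For illustration, the proposed tolerant behavior could look roughly like this 
(hypothetical names and plain java.nio locking; not the actual NativeFSLock 
code):

{noformat}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Sketch: obtain() really tries to lock; release() tolerates a failed delete.
class TolerantLock {
  private final File path;
  private RandomAccessFile raf;
  private FileLock lock;

  TolerantLock(File path) { this.path = path; }

  boolean obtain() throws Exception {
    raf = new RandomAccessFile(path, "rw");  // a pre-existing file alone must not block us
    lock = raf.getChannel().tryLock();       // actually attempt the OS-level lock
    return lock != null;
  }

  void release() throws Exception {
    lock.release();
    raf.close();
    if (!path.delete()) {                    // allowed to fail: the file may be held or gone
      path.deleteOnExit();                   // best-effort cleanup, for 'niceness' only
    }
  }
}
{noformat}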

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985933#action_12985933
 ] 

Shai Erera commented on LUCENE-2418:


bq. I can delete it; would this be an option?

Yes. I don't see how we can resolve it, and leaving it open doesn't help much. 
So if that's the only option, then please delete it.

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2418:
---

Issue Type: Improvement  (was: Sub-task)
Parent: (was: LUCENE-2421)

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-24 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated LUCENE-2868:


Attachment: lucene-2868.patch

Here's my take on the patch, including the ability to cache weight objects.



> It should be easy to make use of TermState; rewritten queries should be 
> shared automatically
> 
>
> Key: LUCENE-2868
> URL: https://issues.apache.org/jira/browse/LUCENE-2868
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Query/Scoring
>Reporter: Karl Wright
> Attachments: lucene-2868.patch, query-rewriter.patch
>
>
> When you have the same query in a query hierarchy multiple times, tremendous 
> savings can now be had if the user knows enough to share the rewritten 
> queries in the hierarchy, due to the TermState addition.  But this is clumsy 
> and requires a lot of coding by the user to take advantage of.  Lucene should 
> be smart enough to share the rewritten queries automatically.
> This can be most readily (and powerfully) done by introducing a new method to 
> Query.java:
> Query rewriteUsingCache(IndexReader indexReader)
> ... and including a caching implementation right in Query.java, which would 
> then work for all queries. Of course, all callers would want to use this new 
> method rather than the current rewrite().
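A hedged sketch of what such a cache could look like (hypothetical class, keyed 
by reader identity since a rewrite is only valid for the reader it was computed 
against):

{noformat}
import java.io.IOException;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;

// Sketch: share rewritten queries across a query hierarchy automatically.
class RewriteCache {
  private final Map<IndexReader, Map<Query, Query>> cache =
      new IdentityHashMap<IndexReader, Map<Query, Query>>();

  synchronized Query rewriteUsingCache(IndexReader reader, Query q) throws IOException {
    Map<Query, Query> perReader = cache.get(reader);
    if (perReader == null) {
      perReader = new HashMap<Query, Query>();
      cache.put(reader, perReader);
    }
    Query rewritten = perReader.get(q);
    if (rewritten == null) {
      rewritten = q.rewrite(reader);         // the expensive step, done once per (reader, query)
      perReader.put(q, rewritten);
    }
    return rewritten;
  }
}
{noformat}

Entries would of course have to be evicted when a reader is closed; the sketch 
omits that.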

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985926#action_12985926
 ] 

Robert Muir commented on LUCENE-2418:
-

converting it to a subtask didn't help either... sorry :)

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2418:


Issue Type: Sub-task  (was: Bug)
Parent: LUCENE-2421

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985923#action_12985923
 ] 

Uwe Schindler commented on LUCENE-2418:
---

I can delete it; would this be an option?

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera reassigned LUCENE-2418:
--

Assignee: (was: Shai Erera)

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985920#action_12985920
 ] 

Shai Erera commented on LUCENE-2418:


OK, last thing to try (unassign), in the hope it will make the close/resolve 
links re-appear, but nada.

So in case someone takes a look at it in the future:

THIS ISSUE IS A DUP OF 2421 AND NEEDS TO BE CLOSED AS SUCH !!!

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level

2011-01-24 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2236.
-

Resolution: Fixed

Committed revision 1062927.

Thanks Doron!


> Similarity can only be set per index, but I may want to adjust scoring 
> behaviour at a field level
> -
>
> Key: LUCENE-2236
> URL: https://issues.apache.org/jira/browse/LUCENE-2236
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Query/Scoring
>Affects Versions: 3.0
>Reporter: Paul taylor
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2236.patch, LUCENE-2236.patch, LUCENE-2236.patch, 
> LUCENE-2236.patch, LUCENE-2236.patch
>
>
> Similarity can only be set per index, but I may want to adjust scoring 
> behaviour at a field level. To facilitate this, could we make the field name 
> available to all score methods?
> Currently it is only passed to some, such as lengthNorm(), but not to others, 
> such as tf().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2418) NativeFSLock should allow for the existence of the lock file, if it was released successfully but fails to delete

2011-01-24 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2418:
---

Fix Version/s: (was: 4.0)

For some reason this issue cannot be resolved (resolve/close links don't appear 
in my JIRA). Removed the Fix Version so that it doesn't bug us.

> NativeFSLock should allow for the existence of the lock file, if it was 
> released successfully but fails to delete
> -
>
> Key: LUCENE-2418
> URL: https://issues.apache.org/jira/browse/LUCENE-2418
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
>
> When running JUnit tests, sometimes NativeFSLock.release() throws an 
> exception because it cannot delete the test lock file. After some 
> investigation it seems that NativeFSLock should relax its policy around the 
> existence of the lock file (whether the regular or the test one):
> * Even if it's the slimmest of chances, two JVMs can draw the same random 
> lock file (as happened during the JUnit tests), and then one of them will fail 
> to delete it, because the file will already have been deleted by the other 
> JVM, and File.delete() returns false if the file does not exist.
> * Between the time the lock is released and delete() is attempted, some 
> external process, like an antivirus, may hold the file and prevent its deletion.
> Unlike SimpleFSLock, the existence of the native lock file should not prevent 
> one from obtaining it. Therefore, the following changes are proposed:
> * release() is allowed to fail to delete the lock file.
> * obtain() should not return false if the lock file exists - it should really 
> attempt to obtain it.
> * in acquireTestLock(), if the lock file still exists after release() is 
> called, we'll retry the delete a few ms later and, if that fails, call 
> deleteOnExit.
> ** The only reason to do that is for 'niceness' -- we don't want to pollute 
> the filesystem w/ random lock files. W/ the regular lock file there's no 
> problem, because the next obtain() will operate on the same lock file, always.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSSION] Trunk and Stable release strategy

2011-01-24 Thread Shai Erera
Glad that we reached consensus on this one so quickly :).

Another thing - I think it'd also make sense to stop fixing bugs on 3.0 once
we release 3.1. That way, we can have bug-fix releases for 2.9 and the latest
released 3.x. We then have two options for 3.0.4 (not yet released):
1) Release it (as it includes some bug fixes) and say "this will be the last
3.0.x release".
2) Don't release it and say "any bugs found in 3.0 are either fixed in 3.1
or will be fixed in 3.1.x".

I personally don't have any strong feelings about either option, but option
#2 involves much less effort :).

Shai

On Mon, Jan 24, 2011 at 3:50 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> +1
>
> Mike
>
> On Mon, Jan 24, 2011 at 8:32 AM, Robert Muir  wrote:
> > On Mon, Jan 24, 2011 at 4:07 AM, Shai Erera  wrote:
> >> This will allow us to release 3x as frequent as we want, hold on w/
> trunk as
> >> much as we want, and at some point cut over to 4.0 and think about the
> next
> >> big things we'd like to bring to Lucene.
> >>
> >
> > +1, this way development is simple: we are always working on the next
> > big release in trunk (for incompatible changes), and port compatible
> > changes back to the next minor release.
> > this seems to be working now, so lets stick with what works.
> > when we release 4.0, 3.x goes into bugfix-mode like 2.9 is now, we
> > start working on 5.0 in trunk, and open up branch_4x to backport
> > compatible changes (e.g. for 4.1)
> >
> > also, I think its simpler to users: no confusion such as 4.1 not being
> > able to read 3.4 indexes or similar silliness: to the users the
> > versions are still completely sequential.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] Assigned: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

2011-01-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1076:
--

Assignee: Michael McCandless

I'll take a crack at this.  It's compelling now that we always bulk-merge doc 
stores...

> Allow MergePolicy to select non-contiguous merges
> -
>
> Key: LUCENE-1076
> URL: https://issues.apache.org/jira/browse/LUCENE-1076
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.3
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1076.patch
>
>
> I started work on this but with LUCENE-1044 I won't make much progress
> on it for a while, so I want to checkpoint my current state/patch.
> For backwards compatibility we must leave the default MergePolicy as
> selecting contiguous merges.  This is necessary because some
> applications rely on "temporal monotonicity" of doc IDs, which means
> even though merges can re-number documents, the renumbering will
> always reflect the order in which the documents were added to the
> index.
> Still, for those apps that do not rely on this, we should offer a
> MergePolicy that is free to select the best merges regardless of
> whether they are contiguous.  This requires fixing IndexWriter to
> accept such a merge, and fixing LogMergePolicy to optionally allow
> it the freedom to do so.
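To illustrate the freedom such a policy gains, here is a toy selector (not 
LogMergePolicy) that picks the N smallest segments wherever they sit in the 
index:

{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy illustration: merge candidates chosen purely by size; adjacency is deliberately ignored.
class NonContiguousSelector {
  static List<Integer> selectMerge(final long[] segmentSizes, int mergeFactor) {
    List<Integer> order = new ArrayList<Integer>();
    for (int i = 0; i < segmentSizes.length; i++) order.add(i);
    Collections.sort(order, new Comparator<Integer>() {  // smallest segments first
      public int compare(Integer a, Integer b) {
        long x = segmentSizes[a], y = segmentSizes[b];
        return x < y ? -1 : (x > y ? 1 : 0);
      }
    });
    return order.subList(0, Math.min(mergeFactor, order.size()));
  }
}
{noformat}

Merging such a set renumbers docs out of insertion order, which is exactly why 
it must stay opt-in.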

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2886) Adaptive Frame Of Reference

2011-01-24 Thread Renaud Delbru (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renaud Delbru updated LUCENE-2886:
--

Attachment: lucene-afor.tar.gz

Tarball containing a Maven project with source code and unit tests for:
- AFOR1
- AFOR2
- FOR
- PFOR Non Compulsive
- Simple64
- a basic tool for debugging IntBlock codecs.

It also includes the lucene-1458 snapshot dependencies that are necessary to 
compile the code and run the tests.

> Adaptive Frame Of Reference 
> 
>
> Key: LUCENE-2886
> URL: https://issues.apache.org/jira/browse/LUCENE-2886
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Codecs
>Reporter: Renaud Delbru
> Fix For: 4.0
>
> Attachments: lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on 
> the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be 
> done, as this implementation works against the old lucene-1458 branch.
> I will attach a tarball containing a running version (with tests) of the AFOR 
> implementation, as well as the implementations of PFOR and of Simple64 
> (a Simple-family codec working on 64-bit words) that were used in the 
> experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf
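For readers unfamiliar with the technique: plain FOR stores every value of a 
block as a delta from the block minimum using one fixed bit width, and AFOR 
adapts the frame length/width across regions of the list. A toy computation of 
the frame itself (illustration only, not the attached code):

{noformat}
// Toy frame-of-reference: every value v in the block is stored as (v - min) in 'bits' bits.
class ForToy {
  static int[] frameOf(int[] block) {
    int min = block[0], max = block[0];
    for (int v : block) { if (v < min) min = v; if (v > max) max = v; }
    int bits = 32 - Integer.numberOfLeadingZeros(max - min);  // width of the largest delta
    return new int[] { min, Math.max(bits, 1) };              // the frame: reference + width
  }
}
{noformat}

For the block {100, 103, 101} the frame is (min=100, 2 bits), so the stored 
deltas are 0, 3, 1.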

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2886) Adaptive Frame Of Reference

2011-01-24 Thread Renaud Delbru (JIRA)
Adaptive Frame Of Reference 


 Key: LUCENE-2886
 URL: https://issues.apache.org/jira/browse/LUCENE-2886
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Codecs
Reporter: Renaud Delbru
 Fix For: 4.0


We could test the implementation of the Adaptive Frame Of Reference [1] on the 
lucene-4.0 branch.
I am providing the source code of its implementation. Some work needs to be 
done, as this implementation works against the old lucene-1458 branch.
I will attach a tarball containing a running version (with tests) of the AFOR 
implementation, as well as the implementations of PFOR and of Simple64 (a 
Simple-family codec working on 64-bit words) that were used in the experiments 
in [1].

[1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


