from:"Jason Rutherglen"

Re: Ant command for installing Lucene and Solr Maven dependencies locally?

2012-10-29 Thread Jason Rutherglen

Any way to make it skip tests?

On Mon, Oct 29, 2012 at 12:55 PM, Tommaso Teofili
 wrote:
> 'ant run-maven-build' should do the trick.
> Tommaso
>
> 2012/10/29 Jason Rutherglen 
>>
>> I have used 'ant generate-maven-artifacts' to generate the Maven
>> artifacts.  Is there a target to install them locally?
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>

Re: Count of keys of an FST

2012-06-28 Thread Jason Rutherglen

Thanks, it's done!  :)

https://issues.apache.org/jira/browse/CASSANDRA-4324

On Thu, Jun 28, 2012 at 9:36 AM, Dawid Weiss
wrote:

> Let me know if you need that snippet of code to count the keys; or try
> it yourself -- should be good practice :)
>
> Dawid
>
> On Thu, Jun 28, 2012 at 3:32 PM, Jason Rutherglen
>  wrote:
> > I looked at the sources and didn't see a key count.
> >
> > Thanks Dawid and Mike.
> >
> > On Thu, Jun 28, 2012 at 6:37 AM, Michael McCandless
> >  wrote:
> >>
> >> I believe node and arc count are stored, but not key count.  But check
> >> the sources to be sure!
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Wed, Jun 27, 2012 at 4:53 PM, Dawid Weiss
> >>  wrote:
> >> > If you need the count with constant time then yes, you should store it
> >> > separately. You could also make a transducer that would store it at
> >> > the root node as side-effect of values associated with keys, but it's
> >> > kind of ugly.
> >> >
> >> > Please check the fst header though -- I'm not sure, maybe Mike wrote
> >> > it so that the node count/ keys count is in there.
> >> >
> >> > Dawid
> >> >
> >> > On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
> >> >  wrote:
> >> >> Sounds like I should just count as the keys are added and store the count
> >> >> separately.
> >> >>
> >> >> On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss
> >> >> 
> >> >> wrote:
> >> >>>
> >> >>> I don't think there is one that you could use out of the box... but
> >> >>> maybe I'm wrong and it's stored in the header somewhere (don't have
> >> >>> the source in front of me).
> >> >>>
> >> >>> To calculate it by hand the worst case is that you'll need a recursive
> >> >>> traversal, which would mean O(number of stored states) with
> >> >>> intermediate count caches or O(number of keys) without any caches and
> >> >>> memory overhead (just recursive traversal).
> >> >>>
> >> >>> Dawid
> >> >>>
> >> >>> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
> >> >>>  wrote:
> >> >>> > The FST class has a number of methods that return counts, which one
> >> >>> > returns the total number of keys that have been encoded into the FST?

Re: Count of keys of an FST

2012-06-28 Thread Jason Rutherglen

I looked at the sources and didn't see a key count.

Thanks Dawid and Mike.

On Thu, Jun 28, 2012 at 6:37 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> I believe node and arc count are stored, but not key count.  But check
> the sources to be sure!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jun 27, 2012 at 4:53 PM, Dawid Weiss
>  wrote:
> > If you need the count with constant time then yes, you should store it
> > separately. You could also make a transducer that would store it at
> > the root node as side-effect of values associated with keys, but it's
> > kind of ugly.
> >
> > Please check the fst header though -- I'm not sure, maybe Mike wrote
> > it so that the node count/ keys count is in there.
> >
> > Dawid
> >
> > On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
> >  wrote:
> >> Sounds like I should just count as the keys are added and store the count
> >> separately.
> >>
> >> On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss <
> dawid.we...@cs.put.poznan.pl>
> >> wrote:
> >>>
> >>> I don't think there is one that you could use out of the box... but
> >>> maybe I'm wrong and it's stored in the header somewhere (don't have
> >>> the source in front of me).
> >>>
> >>> To calculate it by hand the worst case is that you'll need a recursive
> >>> traversal, which would mean O(number of stored states) with
> >>> intermediate count caches or O(number of keys) without any caches and
> >>> memory overhead (just recursive traversal).
> >>>
> >>> Dawid
> >>>
> >>> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
> >>>  wrote:
> >>> > The FST class has a number of methods that return counts, which one
> >>> > returns the total number of keys that have been encoded into the FST?

Re: Count of keys of an FST

2012-06-27 Thread Jason Rutherglen

Sounds like I should just count as the keys are added and store the count
separately.
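
For comparison with a separately stored count, the recursive traversal with intermediate count caches that Dawid describes in the quoted message can be sketched roughly as follows. This is a minimal illustration on a toy acyclic automaton, not Lucene's FST API: the State class, arcTo, countKeys, and sample are hypothetical names introduced only for this example, and arc labels are left out because counting does not need them.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class FstKeyCountSketch {
    /** Toy automaton state; a hypothetical stand-in, not Lucene's FST node format. */
    static final class State {
        final boolean isFinal;                          // true if a key ends at this state
        final List<State> targets = new ArrayList<>();  // outgoing arcs, labels omitted

        State(boolean isFinal) { this.isFinal = isFinal; }

        State arcTo(State target) { targets.add(target); return this; }
    }

    /** Counts accepted keys; the cache ensures each state is expanded only once. */
    static long countKeys(State state, Map<State, Long> cache) {
        Long cached = cache.get(state);
        if (cached != null) return cached;
        long count = state.isFinal ? 1 : 0;
        for (State target : state.targets) count += countKeys(target, cache);
        cache.put(state, count);
        return count;
    }

    /** Assumes the automaton is acyclic, as a minimal automaton over a finite key set is. */
    static long countKeys(State root) {
        return countKeys(root, new IdentityHashMap<>());
    }

    /** Keys "ab", "ac", "bb", "bc": both first letters lead to the same shared state. */
    static State sample() {
        State end = new State(true);
        State mid = new State(false).arcTo(end).arcTo(end);
        return new State(false).arcTo(mid).arcTo(mid);
    }

    public static void main(String[] args) {
        System.out.println(countKeys(sample())); // prints 4
    }
}
```

With the cache the work is proportional to the number of states and arcs; dropping the cache trades that memory for work proportional to the number of keys. Counting at build time avoids both costs.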

On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss
wrote:

> I don't think there is one that you could use out of the box... but
> maybe I'm wrong and it's stored in the header somewhere (don't have
> the source in front of me).
>
> To calculate it by hand the worst case is that you'll need a recursive
> traversal, which would mean O(number of stored states) with
> intermediate count caches or O(number of keys) without any caches and
> memory overhead (just recursive traversal).
>
> Dawid
>
> On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
>  wrote:
> > The FST class has a number of methods that return counts, which one
> > returns the total number of keys that have been encoded into the FST?

Re: [jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-06-09 Thread Jason Rutherglen

Bill, which patch is working for you?  It is difficult to follow! :)

On Sat, Jun 9, 2012 at 1:02 AM, William Bell  wrote:

> I am not sure what the issue is.
>
> This is working for me...
>
> On Fri, Jun 8, 2012 at 8:35 AM, Jason Rutherglen (JIRA) 
> wrote:
> >
> >[
> https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291803#comment-13291803]
> >
> > Jason Rutherglen commented on SOLR-2242:
> > 
> >
> > Terrance, can you post a patch to the Jira?  It makes sense to start
> > this Jira off non-distributed, and add a distributed version in another
> > Jira issue...
> >
> >> Get distinct count of names for a facet field
> >> -
> >>
> >> Key: SOLR-2242
> >> URL: https://issues.apache.org/jira/browse/SOLR-2242
> >> Project: Solr
> >>  Issue Type: New Feature
> >>  Components: Response Writers
> >>Affects Versions: 4.0
> >>Reporter: Bill Bell
> >>Priority: Minor
> >> Fix For: 4.0
> >>
> >> Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch,
> SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch,
> SOLR-2242.patch, SOLR-2242.shard.withtests.patch,
> SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch,
> SOLR.2242.solr3.1.patch
> >>
> >>
> >> When returning facet.field= you will get a list of
> matches for distinct values. This is normal behavior. This patch tells you
> how many distinct values you have (# of rows). Use with limit=-1 and
> mincount=1.
> >> The feature is called "namedistinct". Here is an example:
> >>
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
> >>
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
> >>
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
> >> This currently only works on facet.field.
> >> {code}
> >> 
> >>   
> >> 14
> >> 31 name="19.95">111 name="179.99">11 name="279.95">11 name="350.0">111 name="649.99">11
> >>   
> >> 
> >> {code}
> >> Several people use this to get the group.field count (the # of groups).
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA administrators:
> > https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>

[jira] [Resolved] (SOLR-2569) Enable facile moving of cores

2012-06-08 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen resolved SOLR-2569.


Resolution: Won't Fix

> Enable facile moving of cores
> -
>
> Key: SOLR-2569
> URL: https://issues.apache.org/jira/browse/SOLR-2569
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore, replication (java)
>Affects Versions: 4.0
>    Reporter: Jason Rutherglen
>
> Spin-off from this thread: 
> http://search-lucene.com/m/5CO7Z1oOrh6/elastic+search&subj=Solr+vs+ElasticSearch

[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-06-08 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291803#comment-13291803
 ] 

Jason Rutherglen commented on SOLR-2242:


Terrance, can you post a patch to the Jira?  It makes sense to start this Jira 
off non-distributed, and add a distributed version in another Jira issue...

> Get distinct count of names for a facet field
> -
>
> Key: SOLR-2242
> URL: https://issues.apache.org/jira/browse/SOLR-2242
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
>Affects Versions: 4.0
>Reporter: Bill Bell
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch, 
> SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
> SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
> SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch
>
>
> When returning facet.field= you will get a list of matches for 
> distinct values. This is normal behavior. This patch tells you how many 
> distinct values you have (# of rows). Use with limit=-1 and mincount=1.
> The feature is called "namedistinct". Here is an example:
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
> http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
> This currently only works on facet.field.
> {code}
> 
>   
> 14
> 31 name="19.95">111 name="179.99">111 name="329.95">111 name="479.95">111
>   
> 
> {code} 
> Several people use this to get the group.field count (the # of groups).

Re: Merge IO throttling

2012-04-05 Thread Jason Rutherglen

Thanks Mike.

On Thu, Apr 5, 2012 at 11:55 AM, Michael McCandless
 wrote:
> Yes, in trunk: FSDirectory.setMaxMergeWriteMBPerSec.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Apr 5, 2012 at 11:54 AM, Jason Rutherglen
>  wrote:
>> Has any type of IO rate limiting been implemented for merges?
>>

Re: Error when running 'ant generate-maven-artifacts'

2012-04-04 Thread Jason Rutherglen

I ran 'ant ivy-bootstrap', then 'ant generate-maven-artifacts'.

At the end of the latter, this is the error:

m2-deploy-solr-parent-pom:
[artifact:install-provider] Installing provider:
org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7:runtime
java.lang.OutOfMemoryError: PermGen space
PermGen space

On Wed, Apr 4, 2012 at 9:09 AM, Uwe Schindler  wrote:
> Hi,
>
> Read BUILD.txt: we moved the whole Lucene build to Apache Ivy dependency
> management (LUCENE-3930). You have to install Apache Ivy (there is a
> bootstrap task for it).
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
>> Sent: Wednesday, April 04, 2012 3:58 PM
>> To: dev@lucene.apache.org
>> Subject: Error when running 'ant generate-maven-artifacts'
>>
>> I am getting the following error when running 'ant
> generate-maven-artifacts':
>>
>> Buildfile: /Users/jasonrutherglen/src/LUCENE-TRUNK/build.xml
>>
>> generate-maven-artifacts:
>>
>> filter-pom-templates:
>>      [copy] Copying 42 files to
>> /Users/jasonrutherglen/src/LUCENE-TRUNK/lucene/build/poms
>>
>> install-maven-tasks:
>>
>> BUILD FAILED
>> /Users/jasonrutherglen/src/LUCENE-TRUNK/build.xml:95: The following error
>> occurred while executing this line:
>> /Users/jasonrutherglen/src/LUCENE-TRUNK/lucene/common-build.xml:847:
>> Problem: failed to create task or type
>> antlib:org.apache.ivy.ant:cachepath
>> Cause: The name is undefined.
>> Action: Check the spelling.
>> Action: Check that any custom tasks/types have been declared.
>> Action: Check that any <presetdef>/<macrodef> declarations have taken place.
>> No types or tasks have been defined in this namespace yet
>>
>> This appears to be an antlib declaration.
>> Action: Check that the implementing library exists in one of:
>>         -/usr/share/ant/lib
>>         -/Users/jasonrutherglen/.ant/lib
>>         -a directory added on the command line with the -lib argument
>>
>>
>> Total time: 0 seconds
>>

Error when running 'ant generate-maven-artifacts'

2012-04-04 Thread Jason Rutherglen

I am getting the following error when running 'ant generate-maven-artifacts':

Buildfile: /Users/jasonrutherglen/src/LUCENE-TRUNK/build.xml

generate-maven-artifacts:

filter-pom-templates:
 [copy] Copying 42 files to
/Users/jasonrutherglen/src/LUCENE-TRUNK/lucene/build/poms

install-maven-tasks:

BUILD FAILED
/Users/jasonrutherglen/src/LUCENE-TRUNK/build.xml:95: The following
error occurred while executing this line:
/Users/jasonrutherglen/src/LUCENE-TRUNK/lucene/common-build.xml:847:
Problem: failed to create task or type
antlib:org.apache.ivy.ant:cachepath
Cause: The name is undefined.
Action: Check the spelling.
Action: Check that any custom tasks/types have been declared.
Action: Check that any <presetdef>/<macrodef> declarations have taken place.
No types or tasks have been defined in this namespace yet

This appears to be an antlib declaration.
Action: Check that the implementing library exists in one of:
-/usr/share/ant/lib
-/Users/jasonrutherglen/.ant/lib
-a directory added on the command line with the -lib argument


Total time: 0 seconds

[jira] [Commented] (SOLR-2698) Enhance CoreAdmin STATUS command to return index size

2012-03-28 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240498#comment-13240498
 ] 

Jason Rutherglen commented on SOLR-2698:


+1 This'd be useful.

> Enhance CoreAdmin STATUS command to return index size
> -
>
> Key: SOLR-2698
> URL: https://issues.apache.org/jira/browse/SOLR-2698
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.0
>Reporter: Yury Kats
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2698.patch, SOLR-2698.patch
>
>
> CoreAdmin STATUS command returns all kinds of index info for all cores on the 
> server, except for the index size.
> However, indexSize can be retrieved for an individual core via a 
> /replication&command=details request.
> I have N Solr servers, running M cores each. My application is monitoring
> the status of all cores, including their index size.
> As it stands today, I need to issue N status requests plus N*M replication
> requests to get all the information I need.
> If the STATUS command returned indexSize, the number of requests would be just N.
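
The request arithmetic in the description above can be worked through with a small sketch. The method names and the sample values of N and M are hypothetical, chosen only to make the difference concrete.

```java
class StatusRequestCountSketch {
    /** N status requests plus one replication details request per core (N*M). */
    static long requestsToday(long servers, long coresPerServer) {
        return servers + servers * coresPerServer;
    }

    /** If STATUS also reported indexSize, one request per server would be enough. */
    static long requestsWithIndexSizeInStatus(long servers) {
        return servers;
    }

    public static void main(String[] args) {
        long n = 4;   // assumed number of Solr servers
        long m = 10;  // assumed number of cores per server
        System.out.println(requestsToday(n, m));              // 4 + 4*10 = 44
        System.out.println(requestsWithIndexSizeInStatus(n)); // 4
    }
}
```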

Re: Maven artifacts not working?

2012-03-20 Thread Jason Rutherglen

Steven, Thanks!

On Tue, Mar 20, 2012 at 12:15 PM, Steven A Rowe  wrote:
> Hi Jason,
>
> I switched nightly maven artifact deployment to the Apache Snapshot 
> repository; Jenkins no longer hosts maven snapshot artifacts.
>
> For more details, see <http://wiki.apache.org/lucene-java/NightlyBuilds>,
> dev-tools/maven/README.maven in your local working copy, and 
> <https://issues.apache.org/jira/browse/LUCENE-3825>.
>
> Steve
>
> -----Original Message-----
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Tuesday, March 20, 2012 10:46 AM
> To: dev@lucene.apache.org
> Subject: Maven artifacts not working?
>
> This link seems to not work:
>
> https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts
>

Maven artifacts not working?

2012-03-20 Thread Jason Rutherglen

This link seems to not work:

https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts

[jira] [Commented] (SOLR-3221) Make Shard handler threadpool configurable

2012-03-08 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225788#comment-13225788
 ] 

Jason Rutherglen commented on SOLR-3221:


+1 Long overdue.

> Make Shard handler threadpool configurable
> --
>
> Key: SOLR-3221
> URL: https://issues.apache.org/jira/browse/SOLR-3221
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.6
>Reporter: Greg Bowyer
>Assignee: Erick Erickson
>  Labels: distributed, http, shard
> Attachments: SOLR-3221-3x_branch.patch
>
>
> From profiling of monitor contention, as well as observations of the
> 95th and 99th response times for nodes that perform distributed search
> (or "aggregator" nodes) it would appear that the HttpShardHandler code
> currently does a suboptimal job of managing outgoing shard level
> requests.
> Presently the code contained within lucene 3.5's SearchHandler and
> Lucene trunk / 3x's ShardHandlerFactory create arbitrary threads in
> order to service distributed search requests. This is done presently to
> limit the size of the threadpool such that it does not consume resources
> in deployment configurations that do not use distributed search.
> This unfortunately has two impacts on the response time if the node
> coordinating the distribution is under high load.
> The usage of the MaxConnectionsPerHost configuration option results in
> aggressive activity on semaphores within HttpCommons; it has been
> observed that the aggregator can have a response time far greater than
> that of the searchers. The above monitor contention would appear to
> suggest that in some cases it's possible for liveness issues to occur and
> for simple queries to be starved of resources simply due to a lack of
> attention from the viewpoint of context switching.
> With the HttpCommons connection being hotly contended, as mentioned above,
> the fair, queue-based configuration eliminates this, at the cost of
> throughput.
> This patch aims to make the threadpool largely configurable allowing for
> those using solr to choose the throughput vs latency balance they
> desire.

Re: Plans to add functions to results of groups

2012-03-07 Thread Jason Rutherglen

It is a fairly typical use case due to the availability of aggregation
functions in combination with GROUP BY in SQL. Conceptually, given the
work that has already been completed with Lucene's group by
functionality these may be simple add ons.

A couple of features that would effectively duplicate SQL GROUP BY:

1. Group by multiple fields (eg, combine per doc fields into one
unique key and group by the key)
2. Aggregation functions on a single field.  These can be implemented
as an interface that evaluates each document in a group and outputs a
final value.  COUNT, COUNT DISTINCT, and AVG each return a single
numeric value.
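
The two features above could be sketched along these lines. This is only an illustration in plain Java over in-memory maps, not Lucene's grouping module: GroupAggregator, Count, CountDistinct, Avg, aggregate, and the doc helper are all hypothetical names, and the field names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

class GroupAggregationSketch {
    /** Hypothetical interface: sees every document of one group, then yields one number. */
    interface GroupAggregator {
        void collect(Map<String, Object> doc);
        double result();
    }

    static final class Count implements GroupAggregator {
        private long count;
        public void collect(Map<String, Object> doc) { count++; }
        public double result() { return count; }
    }

    static final class CountDistinct implements GroupAggregator {
        private final String field;
        private final Set<Object> seen = new HashSet<>();
        CountDistinct(String field) { this.field = field; }
        public void collect(Map<String, Object> doc) {
            Object value = doc.get(field);
            if (value != null) seen.add(value);   // documents without the field are skipped
        }
        public double result() { return seen.size(); }
    }

    static final class Avg implements GroupAggregator {
        private final String field;
        private double sum;
        private long count;
        Avg(String field) { this.field = field; }
        public void collect(Map<String, Object> doc) {
            Object value = doc.get(field);
            if (value instanceof Number) {        // non-numeric or missing values are skipped
                sum += ((Number) value).doubleValue();
                count++;
            }
        }
        public double result() { return count == 0 ? 0.0 : sum / count; }
    }

    /** Feature 1: the values of several fields are combined into one group key. */
    static Map<List<Object>, Double> aggregate(List<Map<String, Object>> docs,
                                               List<String> groupFields,
                                               Supplier<GroupAggregator> factory) {
        Map<List<Object>, GroupAggregator> perGroup = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            List<Object> key = new ArrayList<>();
            for (String field : groupFields) key.add(doc.get(field));
            perGroup.computeIfAbsent(key, k -> factory.get()).collect(doc);
        }
        Map<List<Object>, Double> results = new LinkedHashMap<>();
        perGroup.forEach((key, aggregator) -> results.put(key, aggregator.result()));
        return results;
    }

    static Map<String, Object> doc(String brand, String color, double price) {
        Map<String, Object> d = new HashMap<>();
        d.put("brand", brand);
        d.put("color", color);
        d.put("price", price);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs =
            List.of(doc("a", "red", 10.0), doc("a", "red", 20.0), doc("b", "red", 5.0));
        System.out.println(aggregate(docs, List.of("brand", "color"), () -> new Avg("price")));
        System.out.println(aggregate(docs, List.of("brand", "color"), Count::new));
    }
}
```

Only the aggregated value per group is returned, not the per-group document list, which matches the original question quoted below.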

Thanks for pointing out LUCENE-3444, that is a great direction.

On Wed, Mar 7, 2012 at 5:06 PM, Martijn v Groningen
 wrote:
> I haven't seen an issue describing this. Something like this was available
> in the SOLR-236 patches, but never got committed.
> I started to create a second pass collector that counts the distinct values
> of a particular field for the top N groups in LUCENE-3444.
>
> I think there might be a need for a more general approach for this kind of
> functionality that uses the ValueSource concept in the queries module.
>
> Martijn
>
> On 7 March 2012 07:03, Jason Rutherglen  wrote:
>>
>> Are there plans to add the ability to apply functions (eg, sum,
>> average, distinct, or custom functions) to group'd documents.  Such
>> that the document list per group is not returned, instead the result
>> of the function is.
>>
>
>
>
> --
> Kind regards,
>
> Martijn van Groningen

[jira] [Commented] (SOLR-3141) Deprecate OPTIMIZE command in Solr

2012-02-19 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211467#comment-13211467
 ] 

Jason Rutherglen commented on SOLR-3141:


-1 Serious over/under-engineering.  

> Deprecate OPTIMIZE command in Solr
> --
>
> Key: SOLR-3141
> URL: https://issues.apache.org/jira/browse/SOLR-3141
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 3.5
>Reporter: Jan Høydahl
>  Labels: force, optimize
> Fix For: 3.6
>
>
> Background: LUCENE-3454 renames optimize() as forceMerge(). Please read that 
> issue first.
> Now that optimize() is rarely necessary anymore, and renamed in Lucene APIs, 
> what should be done with Solr's ancient optimize command?

[jira] [Commented] (SOLR-2593) A new core admin action 'split' for splitting index

2012-02-13 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207407#comment-13207407
 ] 

Jason Rutherglen commented on SOLR-2593:


Is there a patch for this issue available?  If not it's fine.

> A new core admin action 'split' for splitting index
> ---
>
> Key: SOLR-2593
> URL: https://issues.apache.org/jira/browse/SOLR-2593
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
> Fix For: 4.0
>
>
> If an index is too large/hot it would be desirable to split it out to another
> core.
> This core may eventually be replicated out to another host.
> There can be multiple strategies:
> * random split of x or x% 
> * fq="user:johndoe"
> example :
> action=split&split=20percent&newcore=my_new_index
> or
> action=split&fq=user:johndoe&newcore=john_doe_index

[jira] [Commented] (LUCENE-3756) Don't allow IndexWriterConfig setters to chain

2012-02-08 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204160#comment-13204160
 ] 

Jason Rutherglen commented on LUCENE-3756:
--

+1 I agree with Mike.

> Don't allow IndexWriterConfig setters to chain
> --
>
> Key: LUCENE-3756
> URL: https://issues.apache.org/jira/browse/LUCENE-3756
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>
> Spinoff from LUCENE-3736.
> I don't like that IndexWriterConfig's setters are chainable; it
> results in code in our tests like this:
> {noformat}
> IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig( 
> TEST_VERSION_CURRENT, new 
> MockAnalyzer(random)).setMaxBufferedDocs(2).setMergePolicy(newLogMergePolicy()));
> {noformat}
> I think in general we should avoid chaining since it encourages hard
> to read code (code is already hard enough to read!).
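
For contrast with the chained test code quoted above, an unchained style might look like the following. This is a sketch on a hypothetical ToyWriterConfig class, not Lucene's IndexWriterConfig; the setters return void, so each setting has to sit on its own line.

```java
class UnchainedConfigSketch {
    /** Hypothetical stand-in for a writer configuration; NOT Lucene's IndexWriterConfig. */
    static final class ToyWriterConfig {
        private int maxBufferedDocs = -1;
        private String mergePolicy = "default";

        // Returning void (rather than this) is what prevents chaining.
        void setMaxBufferedDocs(int maxBufferedDocs) { this.maxBufferedDocs = maxBufferedDocs; }
        void setMergePolicy(String mergePolicy) { this.mergePolicy = mergePolicy; }

        int getMaxBufferedDocs() { return maxBufferedDocs; }
        String getMergePolicy() { return mergePolicy; }
    }

    /** One setting per statement instead of one long chained expression. */
    static ToyWriterConfig build() {
        ToyWriterConfig config = new ToyWriterConfig();
        config.setMaxBufferedDocs(2);
        config.setMergePolicy("log");
        return config;
    }

    public static void main(String[] args) {
        ToyWriterConfig config = build();
        System.out.println(config.getMaxBufferedDocs() + " " + config.getMergePolicy());
    }
}
```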

[jira] [Commented] (LUCENE-3759) Support joining a distributed environment.

2012-02-07 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203118#comment-13203118
 ] 

Jason Rutherglen commented on LUCENE-3759:
--

+1 Nice, distributed join will be super useful.

> Support joining a distributed environment.
> --
>
> Key: LUCENE-3759
> URL: https://issues.apache.org/jira/browse/LUCENE-3759
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/join
>Reporter: Martijn van Groningen
>
> Add two more methods in JoinUtil to support joining in a distributed manner.
> * Method to retrieve all from values.
> * Method to create a TermsQuery based on a set of from terms.
> With these two methods distributed joining can be supported following these 
> steps:
> # Retrieve from values from each shard
> # Merge the retrieved from values. 
> # Create a TermsQuery based on the merged from terms and send this query to 
> all shards. 
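
The three steps above can be sketched in plain Java over in-memory "shards". This is an illustration under stated assumptions, not the JoinUtil API proposed in the issue: Shard, fromValues, matchTerms, and join are hypothetical names, and a simple set-membership check stands in for a TermsQuery.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class DistributedJoinSketch {
    /** Toy shard holding documents as field -> value maps; a stand-in for a real index. */
    static final class Shard {
        private final List<Map<String, String>> docs;

        Shard(List<Map<String, String>> docs) { this.docs = docs; }

        /** Step 1: collect the "from" values of documents matching the from-side query. */
        Set<String> fromValues(String fromField, Predicate<Map<String, String>> fromQuery) {
            Set<String> values = new HashSet<>();
            for (Map<String, String> doc : docs) {
                String value = doc.get(fromField);
                if (value != null && fromQuery.test(doc)) values.add(value);
            }
            return values;
        }

        /** Step 3, per shard: documents whose "to" field is one of the merged terms. */
        List<Map<String, String>> matchTerms(String toField, Set<String> terms) {
            List<Map<String, String>> hits = new ArrayList<>();
            for (Map<String, String> doc : docs) {
                String value = doc.get(toField);
                if (value != null && terms.contains(value)) hits.add(doc);
            }
            return hits;
        }
    }

    static List<Map<String, String>> join(List<Shard> shards, String fromField, String toField,
                                          Predicate<Map<String, String>> fromQuery) {
        // Step 2: merge the from values retrieved from every shard.
        Set<String> merged = new HashSet<>();
        for (Shard shard : shards) merged.addAll(shard.fromValues(fromField, fromQuery));

        // Step 3: send the merged terms to all shards and gather the hits.
        List<Map<String, String>> hits = new ArrayList<>();
        for (Shard shard : shards) hits.addAll(shard.matchTerms(toField, merged));
        return hits;
    }

    public static void main(String[] args) {
        Shard first = new Shard(List.of(
            Map.of("type", "review", "productId", "p1"),
            Map.of("type", "product", "id", "p2")));
        Shard second = new Shard(List.of(
            Map.of("type", "product", "id", "p1"),
            Map.of("type", "review", "productId", "p2")));
        // Products that have at least one review, even when the review lives on another shard.
        System.out.println(join(List.of(first, second), "productId", "id",
            doc -> "review".equals(doc.get("type"))));
    }
}
```

The merged term set has to travel to every shard, so the cost of the second round grows with the number of distinct from values.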

[jira] [Commented] (LUCENE-3734) Allow customizing/subclassing of DirectoryReader

2012-01-31 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197074#comment-13197074
 ] 

Jason Rutherglen commented on LUCENE-3734:
--

The issues mentioned were brought up in LUCENE-3498 and LUCENE-3497 thus 
yielding a +1 from me.

> Allow customizing/subclassing of DirectoryReader
> 
>
> Key: LUCENE-3734
> URL: https://issues.apache.org/jira/browse/LUCENE-3734
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/index
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.0
>
> Attachments: LUCENE-3734.patch
>
>
> DirectoryReader is final and has only static factory methods. It is not 
> possible to subclass it in any way.
> The problem is mainly Solr, as Solr accesses directory(), IndexCommits,... 
> and therefore cannot work on abstract IndexReader anymore. This should be 
> changed, by e.g. handling reopening in the IRFactory, also versions, 
> commits,... Currently it's not possible to implement any other IRFactory that
> returns something else.
> On the other hand, it should be possible to "wrap" a DirectoryReader /
> CompositeReader to handle filtering of collection based information
> (subreaders, reopening hooks,...). This can be done by making DirectoryReader
> abstract and letting DirectoryReader.open return an internal hidden class
> "StandardDirectoryReader". This is similar to the relationship between
> IndexReader and hidden DirectoryReader in the past.
> DirectoryReader will have final implementations of most methods like getting
> document stored fields, global docFreq and other statistics, but allows
> hooking into doOpenIfChanged. Also it should not be limited to SegmentReaders
> as children - any AtomicReader is fine. This allows users to create e.g. a
> Directory-based ParallelReader (see LUCENE-3736) that supports reopen and
> (partially) commits.
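
A rough sketch of the shape described above: an abstract reader with final accessors, a static factory that returns a hidden concrete class, and a reopen hook. Every name here is hypothetical and the "index" is reduced to a version number; this is not the code in the attached patch.

```java
abstract class DirectoryReaderSketch {

    private final long version;

    protected DirectoryReaderSketch(long version) {
        this.version = version;
    }

    // Final: every subclass reports its version the same way.
    final long getVersion() {
        return version;
    }

    // Hook for subclasses: return a new reader if the index has moved past
    // this reader's version, or null if nothing changed.
    protected abstract DirectoryReaderSketch doOpenIfChanged(long latestVersion);

    // Static factory: callers never see the concrete class.
    static DirectoryReaderSketch open(long version) {
        return new Standard(version);
    }

    // The hidden default implementation.
    private static final class Standard extends DirectoryReaderSketch {
        Standard(long version) {
            super(version);
        }

        @Override
        protected DirectoryReaderSketch doOpenIfChanged(long latestVersion) {
            return latestVersion > getVersion() ? new Standard(latestVersion) : null;
        }
    }

    public static void main(String[] args) {
        DirectoryReaderSketch reader = open(1);
        System.out.println(reader.doOpenIfChanged(1) == null);
        System.out.println(reader.doOpenIfChanged(2).getVersion());
    }
}
```

A wrapping or filtering reader would extend the abstract class and override only the hook, which is the flexibility the description asks for.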


[jira] [Commented] (LUCENE-3602) Add join query to Lucene

2012-01-19 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189509#comment-13189509
 ] 

Jason Rutherglen commented on LUCENE-3602:
--

Just following up on the per-segment terms collection.  Join is going to be 
used as a filter in most cases (?).  Filters can be applied per-segment (unlike 
scoring queries).  So it seems possible to avoid the BRH creation by using the 
DTI?

> Add join query to Lucene
> 
>
> Key: LUCENE-3602
> URL: https://issues.apache.org/jira/browse/LUCENE-3602
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/join
>Reporter: Martijn van Groningen
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3602-3x.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also be
> available in Lucene.


[jira] [Commented] (LUCENE-3602) Add join query to Lucene

2012-01-19 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189501#comment-13189501
 ] 

Jason Rutherglen commented on LUCENE-3602:
--

{quote}The terms collected in the first phase are from many segments{quote}

Why is that necessary?

{quote}Caching can be improved in the second phase as you described, by saving 
a bitset per fromTerm?{quote}

Possibly, only for terms with a high number of documents.  Or we can use a 
faster to decode (less compressed) posting codec.

bq. The JoinUtil is between 2 and 3 times faster than Solr's JoinQuery with
this data set on my dev machine

Interesting, thanks for sharing.



> Add join query to Lucene
> 
>
> Key: LUCENE-3602
> URL: https://issues.apache.org/jira/browse/LUCENE-3602
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/join
>Reporter: Martijn van Groningen
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3602-3x.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also be
> available in Lucene.


[jira] [Commented] (LUCENE-3602) Add join query to Lucene

2012-01-19 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189375#comment-13189375
 ] 

Jason Rutherglen commented on LUCENE-3602:
--

I was reviewing this issue for use in cases where Solr's join implementation
may not be the right choice.

In this Lucene Join implementation, a new BytesRefHash is built per query (and 
cannot be reused).  This could generate a fair amount of garbage quickly.  

Also the sort compare using BRH is per byte (not as cheap as an ord compare).
We can probably use DocTermsIndex to replace the use of BytesRefHash by
comparing DTI's ords.  That way we avoid copying the bytes into a BRH per
query, and the comparison would be faster.

Additionally, for a join with many terms, the number of postings could become a 
factor in performance.  Because we are not caching bitsets like Solr does, it 
seems like an excellent occasion for a faster less-compressed codec.

Further, to save on term seeking, if the term state was cached (eg, the file 
pointers into the posting list), the iteration on the terms dict would be 
removed.

Granted all this requires more RAM, however in many cases (eg, mine), that 
would be acceptable.
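
To illustrate the ord-versus-bytes point, here is a small sketch in plain Java. It does not use BytesRefHash or DocTermsIndex; the sorted term array and the per-document ordinal array are hypothetical stand-ins, and the ordinals are only meaningful within the one sorted list that assigned them.

```java
import java.nio.charset.StandardCharsets;

final class OrdCompareSketch {

    // Unsigned byte-by-byte comparison: the cost grows with the shared prefix length.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Ordinal comparison: docToOrd[doc] is the position of the doc's term in the
    // sorted, de-duplicated term list, so comparing two ints is enough.
    static int compareOrds(int[] docToOrd, int docA, int docB) {
        return Integer.compare(docToOrd[docA], docToOrd[docB]);
    }

    public static void main(String[] args) {
        byte[][] sortedTerms = {
            "apple".getBytes(StandardCharsets.UTF_8),
            "apricot".getBytes(StandardCharsets.UTF_8),
            "banana".getBytes(StandardCharsets.UTF_8)
        };
        int[] docToOrd = {2, 0, 1, 0}; // hypothetical per-document ordinals

        // Both comparisons must order every pair of documents the same way.
        for (int a = 0; a < docToOrd.length; a++) {
            for (int b = 0; b < docToOrd.length; b++) {
                int byBytes = Integer.signum(
                    compareBytes(sortedTerms[docToOrd[a]], sortedTerms[docToOrd[b]]));
                int byOrd = Integer.signum(compareOrds(docToOrd, a, b));
                if (byBytes != byOrd) {
                    throw new AssertionError("orderings disagree for docs " + a + ", " + b);
                }
            }
        }
        System.out.println("ord and byte comparisons agree");
    }
}
```

The trade-off is the one noted above: the ordinal array has to be held in RAM, in exchange for a constant-time compare and no per-query copy of the term bytes.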

> Add join query to Lucene
> 
>
> Key: LUCENE-3602
> URL: https://issues.apache.org/jira/browse/LUCENE-3602
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/join
>Reporter: Martijn van Groningen
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also be
> available in Lucene.


[jira] [Commented] (SOLR-3044) Incrementally deprecate NamedList & replace with typesafe API

2012-01-18 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188549#comment-13188549
 ] 

Jason Rutherglen commented on SOLR-3044:


+1 NamedList is an oldie, but not a goodie.

> Incrementally deprecate NamedList & replace with typesafe API
> -
>
> Key: SOLR-3044
> URL: https://issues.apache.org/jira/browse/SOLR-3044
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.6, 4.0
>Reporter: Simon Willnauer
>Priority: Critical
>
> The first thing I can think of when I see how and where NamedList is used in
> Solr is "if you have a hammer in your hands, every problem looks like a
> nail". IMO, and I know others think the same way, the use of NamedList has
> been way over the top for a long time. However, the biggest issue here is the
> massive use of this class all over the place, which has several problems.
> Here is a list, just to name a few:
> * no type safety
> * produces lots of garbage
> * makes things hard to refactor
> * binds everything strongly to Solr and is contra modularization 
> * code is hardly readable - one example is all the distributed request / 
> response processing
> * requires autoboxing of primitives all over the place
> * some processing is O(N^2) where O(N) is possible
> * requires tons of instanceof conditions
> * ...
> Yet this task is not simple nor is it possible to do this in a single patch. 
> I think the target of this issue and all its subtasks will be 5.0 but we need 
> to start doing it to eventually clean up the API enough to get rid of all the 
> issues I named above.
> One way of starting would be to create a couple of subtasks like:
> * Refactor ResponseWriters to pass in a StreamWriter, similar to what XML or
> JSON APIs (Jackson / StAX) do, and let the ResponseObject write itself based on
> the StreamWriter impl.
> * Refactor configuration and resourceloading to use some libraries that are 
> specialized to do that.
> * Deprecate SearchComponent methods that accept named list in favor of a 
> typesafe API
> I think we should start doing this; it's time to move on here!


[jira] [Commented] (LUCENE-3602) Add join query to Lucene

2012-01-16 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187370#comment-13187370
 ] 

Jason Rutherglen commented on LUCENE-3602:
--

Sweet!  How would join work in distributed mode?  That would be very useful
for BigData projects.

> Add join query to Lucene
> 
>
> Key: LUCENE-3602
> URL: https://issues.apache.org/jira/browse/LUCENE-3602
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/join
>Reporter: Martijn van Groningen
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
> LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also be
> available in Lucene.


[jira] [Commented] (SOLR-2702) Add support for NRTCachingDirectory

2011-12-30 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177885#comment-13177885
 ] 

Jason Rutherglen commented on SOLR-2702:


This issue only needs to add configuration options, given that
NRTCachingDirectoryFactory is already in trunk.

> Add support for NRTCachingDirectory
> ---
>
> Key: SOLR-2702
> URL: https://issues.apache.org/jira/browse/SOLR-2702
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
>
> would be nice to have this option for the new NRT support


SolrJ commit still has flush parameter

2011-12-29 Thread Jason Rutherglen

SolrJ commit still has the flush parameter.  It should be removed, and
softcommit should be added.


[jira] [Commented] (LUCENE-3654) Optimize BytesRef comparator to use Unsafe long based comparison (when possible)

2011-12-20 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173853#comment-13173853
 ] 

Jason Rutherglen commented on LUCENE-3654:
--

+1 There are 3 other MAJOR Apache projects that have already integrated this 
efficiency.  It's completely silly not to use it.

> Optimize BytesRef comparator to use Unsafe long based comparison (when 
> possible)
> 
>
> Key: LUCENE-3654
> URL: https://issues.apache.org/jira/browse/LUCENE-3654
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index, core/search
>Reporter: Shay Banon
> Attachments: LUCENE-3654.patch
>
>
> Inspired by the Google Guava UnsignedBytes lexicographical comparator, which
> uses Unsafe to do long based comparisons over the bytes instead of one by one
> (which yields 2-4x better perf), use similar logic in BytesRef comparator. The
> code was adapted to support offset/length.
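
The idea of comparing eight bytes at a time can be sketched without Unsafe. The version below swaps in ByteBuffer.getLong and Long.compareUnsigned (Java 8+), so it illustrates the technique only; it is not the attached patch, and it makes no claim about matching the quoted 2-4x figure. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class LongCompareSketch {

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        ByteBuffer ba = ByteBuffer.wrap(a); // big-endian by default
        ByteBuffer bb = ByteBuffer.wrap(b);
        int minLen = Math.min(aLen, bLen);
        int i = 0;
        // Compare 8 bytes at a time. Big-endian longs compared as unsigned values
        // order the same way as the underlying unsigned bytes.
        for (; i + 8 <= minLen; i += 8) {
            long la = ba.getLong(aOff + i);
            long lb = bb.getLong(bOff + i);
            if (la != lb) {
                return Long.compareUnsigned(la, lb);
            }
        }
        // Compare the remaining tail (fewer than 8 bytes) one byte at a time.
        for (; i < minLen; i++) {
            int diff = (a[aOff + i] & 0xFF) - (b[bOff + i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] x = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        byte[] y = "abcdefghik".getBytes(StandardCharsets.US_ASCII);
        System.out.println(compare(x, 0, x.length, y, 0, y.length) < 0);
    }
}
```

The offset and length parameters mirror the "adapted to support offset/length" remark, since a BytesRef points into a slice of a larger array.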


[jira] [Commented] (LUCENE-3654) Optimize BytesRef comparator to use Unsafe long based comparison (when possible)

2011-12-17 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171716#comment-13171716
 ] 

Jason Rutherglen commented on LUCENE-3654:
--

Nice.  I mentioned this on the dev list over a month ago (it originally came up
on the Hadoop list).  Good to see it get into Lucene, though I am curious where
the speed improvement will show up for Lucene.

> Optimize BytesRef comparator to use Unsafe long based comparison (when 
> possible)
> 
>
> Key: LUCENE-3654
> URL: https://issues.apache.org/jira/browse/LUCENE-3654
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index, core/search
>Reporter: Shay Banon
> Attachments: LUCENE-3654.patch
>
>
> Inspired by the Google Guava UnsignedBytes lexicographical comparator, which
> uses Unsafe to do long based comparisons over the bytes instead of one by one
> (which yields 2-4x better perf), use similar logic in BytesRef comparator. The
> code was adapted to support offset/length.


[jira] [Commented] (LUCENE-3602) Add join query to Lucene

2011-12-12 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167959#comment-13167959
 ] 

Jason Rutherglen commented on LUCENE-3602:
--

Maybe we can (in another issue) move bit set filter caching into SearchManager,
for use by Lucene Join (here), and others.  At the same time we could make
bitset filtering per-segment, a fundamental improvement over the existing (old)
Solr code.
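
A minimal sketch of what per-segment bitset caching could look like, using only java.util classes. The class, the segment key, and the compute callback are all hypothetical; nothing here is SearchManager or Solr code.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

final class PerSegmentFilterCacheSketch {

    // segment key -> (filter name -> matching docs within that segment).
    // Weak keys let an entry go away once its segment is no longer referenced.
    private final Map<Object, Map<String, BitSet>> cache = new WeakHashMap<>();

    // Counts how often a bitset had to be computed, for illustration only.
    int computeCount = 0;

    synchronized BitSet get(Object segmentKey, String filterName,
                            Function<Object, BitSet> compute) {
        Map<String, BitSet> perSegment =
            cache.computeIfAbsent(segmentKey, k -> new HashMap<>());
        BitSet bits = perSegment.get(filterName);
        if (bits == null) {
            bits = compute.apply(segmentKey);
            computeCount++;
            perSegment.put(filterName, bits);
        }
        return bits;
    }

    public static void main(String[] args) {
        PerSegmentFilterCacheSketch cache = new PerSegmentFilterCacheSketch();
        Object oldSegment = new Object();
        Object newSegment = new Object();
        Function<Object, BitSet> compute = seg -> {
            BitSet bits = new BitSet();
            bits.set(0); // pretend doc 0 matches the filter
            return bits;
        };
        cache.get(oldSegment, "inStock", compute);
        // After a reopen, the unchanged segment is a cache hit;
        // only the new segment needs a fresh bitset.
        cache.get(oldSegment, "inStock", compute);
        cache.get(newSegment, "inStock", compute);
        System.out.println(cache.computeCount);
    }
}
```

With a cache keyed on the whole index, every reopen would invalidate all bitsets; keying on the segment limits recomputation to segments that are actually new.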

> Add join query to Lucene
> 
>
> Key: LUCENE-3602
> URL: https://issues.apache.org/jira/browse/LUCENE-3602
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/join
>Reporter: Martijn van Groningen
> Attachments: LUCENE-3602.patch, LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also be
> available in Lucene.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3622) separate IndexDocValues interface from implementation

2011-12-09 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166305#comment-13166305
 ] 

Jason Rutherglen commented on LUCENE-3622:
--

+1 to the function naming being completely off.  That's the naming that should
change.

> separate IndexDocValues interface from implementation
> -
>
> Key: LUCENE-3622
> URL: https://issues.apache.org/jira/browse/LUCENE-3622
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
> Attachments: LUCENE-3622.patch
>
>
> Currently the o.a.l.index.values contains both the abstract apis and 
> Lucene40's current implementation.
> I think we should move the implementation underneath Lucene40Codec, leaving 
> only the abstract apis.
> For example, simpletext might have a different implementation, and we might
> make an int8 implementation
> underneath preflexcodec to support norms.


[jira] [Commented] (LUCENE-3602) Add join query to Lucene

2011-11-29 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159335#comment-13159335
 ] 

Jason Rutherglen commented on LUCENE-3602:
--

Great to see this moving out of Solr and getting new eyes on it (with added 
improvements)!

> Add join query to Lucene
> 
>
> Key: LUCENE-3602
> URL: https://issues.apache.org/jira/browse/LUCENE-3602
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/join
>Reporter: Martijn van Groningen
> Attachments: LUCENE-3602.patch
>
>
> Solr has had a (pseudo) join query for a while now. I think this should also be
> available in Lucene.


[jira] [Commented] (LUCENE-3587) Attempting to link to Java SE JavaDocs is competely unreliable

2011-11-23 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156300#comment-13156300
 ] 

Jason Rutherglen commented on LUCENE-3587:
--

+1 on the patch.  Javadoc external links should not destroy a build.

> Attempting to link to Java SE JavaDocs is competely unreliable
> --
>
> Key: LUCENE-3587
> URL: https://issues.apache.org/jira/browse/LUCENE-3587
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Hoss Man
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3587.3x.patch, 
> LUCENE-3587.keep-javadoc-link.3x.patch, 
> LUCENE-3587.keep-javadoc-link.trunk.patch, LUCENE-3587.trunk.patch
>
>
> As noted several times since Oracle bought Sun, the canonical links to the 
> Java SE JavaDocs have been unreliable and frequently cause warnings.
> Since we choose to fail the build on javadoc warnings, this is a serious 
> problem for anyone trying to build from source if/when the package-list we 
> reference in our common-build.xml is not available. 
> We should eliminate this dependency.


[jira] [Commented] (SOLR-2889) Implement Adaptive Replacement Cache

2011-11-10 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148269#comment-13148269
 ] 

Jason Rutherglen commented on SOLR-2889:


Simon and Yonik, re-read what you wrote, have fun.

> Implement Adaptive Replacement Cache
> 
>
> Key: SOLR-2889
> URL: https://issues.apache.org/jira/browse/SOLR-2889
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 3.4
>Reporter: Shawn Heisey
>Priority: Minor
>
> Currently Solr's caches are LRU, which doesn't look at hitcount to decide 
> which entries are most important.  There is a method that takes both 
> frequency and time of cache hits into account:
> http://en.wikipedia.org/wiki/Adaptive_Replacement_Cache
> If it's feasible, this could be a good addition to Solr/Lucene.
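
To make the limitation concrete, here is a small LRU sketch built on java.util.LinkedHashMap in access order. It is not Solr's cache implementation and it is not ARC; it only shows how a plain LRU ignores hit counts, so a heavily used entry can be pushed out by a short burst of one-off entries. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruSketch<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    LruSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, oldest first
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insert: drop the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("hot", "used all the time");
        for (int i = 0; i < 100; i++) {
            cache.get("hot"); // 100 hits, but LRU only remembers recency
        }
        cache.put("scan1", "used once");
        cache.put("scan2", "used once");
        // The two one-off entries have evicted the frequently hit one.
        System.out.println(cache.containsKey("hot"));
    }
}
```

A policy that also tracks frequency, as the linked ARC article describes, is meant to keep an entry like "hot" resident through such a scan.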


[jira] [Commented] (SOLR-2889) Implement Adaptive Replacement Cache

2011-11-10 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148065#comment-13148065
 ] 

Jason Rutherglen commented on SOLR-2889:


Yonik, Take a step back.  No analyzers are in Solr, and the caching and other 
'parts' will be moved out.  It's reasonable to expect that process to happen on 
new additions to what is a singular project.

> Implement Adaptive Replacement Cache
> 
>
> Key: SOLR-2889
> URL: https://issues.apache.org/jira/browse/SOLR-2889
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 3.4
>Reporter: Shawn Heisey
>Priority: Minor
>
> Currently Solr's caches are LRU, which doesn't look at hitcount to decide 
> which entries are most important.  There is a method that takes both 
> frequency and time of cache hits into account:
> http://en.wikipedia.org/wiki/Adaptive_Replacement_Cache
> If it's feasible, this could be a good addition to Solr/Lucene.


[jira] [Commented] (SOLR-2889) Implement Adaptive Replacement Cache

2011-11-10 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148003#comment-13148003
 ] 

Jason Rutherglen commented on SOLR-2889:


+1 - Put it in Lucene and NOT Solr.  thanks.  

When this is implemented, using Google collections should be developed as well 
(which appropriately jettisons the cache values before OOM), ala the previously 
created though not committed SOLR-1513.

> Implement Adaptive Replacement Cache
> 
>
> Key: SOLR-2889
> URL: https://issues.apache.org/jira/browse/SOLR-2889
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 3.4
>Reporter: Shawn Heisey
>Priority: Minor
>
> Currently Solr's caches are LRU, which doesn't look at hitcount to decide 
> which entries are most important.  There is a method that takes both 
> frequency and time of cache hits into account:
> http://en.wikipedia.org/wiki/Adaptive_Replacement_Cache
> If it's feasible, this could be a good addition to Solr/Lucene.


Perhaps more efficient byte[] comparisons

2011-10-31 Thread Jason Rutherglen

"...benchmarks show it as being 2x more CPU-efficient than the
equivalent pure-Java implementation..."

https://issues.apache.org/jira/browse/HADOOP-7761


[jira] [Commented] (SOLR-2849) Solr maven dependencies: logging

2011-10-24 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134357#comment-13134357
 ] 

Jason Rutherglen commented on SOLR-2849:


{quote}As an aside, it's unfortunate to see all those velocity dependencies.  
It even depends on struts -- seriously?!  I hope solritas gets put back into a 
contrib sometime: SOLR-2588{quote}

+1, move it out!

> Solr maven dependencies: logging
> 
>
> Key: SOLR-2849
> URL: https://issues.apache.org/jira/browse/SOLR-2849
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0
>Reporter: David Smiley
>Priority: Trivial
>
> I was looking at my maven based project's Solr-core dependencies (trunk), and 
> observed some issues that I think should be fixed in Solr's maven poms. I ran 
> {{mvn dependency:tree}} -- the output is further below.  There are two 
> changes I see needed, related to logging:
> * slf4j-jdk14 should be runtime scope, and optional.
> * httpclient depends on commons-logging.  Exclude this dependency from the 
> httpclient dependency, and add a dependency on jcl-over-slf4j with compile 
> scope.
> * Zookeeper depends on Log4j, unfortunately. There is an issue to change this 
> to SLF4J: ZOOKEEPER-850. In the mean time we should exclude it and use 
> log4j-over-slf4j with compile scope, at the solrj pom.
> As an aside, it's unfortunate to see all those velocity dependencies.  It 
> even depends on struts -- seriously?!  I hope solritas gets put back into a 
> contrib sometime: SOLR-2588
> Steve, if you'd like me to create the patch, I will.
> {code}
> [INFO] +- org.apache.solr:solr-core:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.solr:solr-solrj:jar:4.0-SNAPSHOT:compile
> [INFO] |  |  \- org.apache.zookeeper:zookeeper:jar:3.3.3:compile
> [INFO] |  | +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  | |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  | | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  | \- jline:jline:jar:0.9.94:compile
> [INFO] |  +- org.apache.solr:solr-noggit:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- 
> org.apache.lucene:lucene-analyzers-phonetic:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.lucene:lucene-highlighter:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.lucene:lucene-memory:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.lucene:lucene-misc:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.lucene:lucene-queryparser:jar:4.0-SNAPSHOT:compile
> [INFO] |  |  \- org.apache.lucene:lucene-sandbox:jar:4.0-SNAPSHOT:compile
> [INFO] |  | \- jakarta-regexp:jakarta-regexp:jar:1.4:compile
> [INFO] |  +- org.apache.lucene:lucene-spatial:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.lucene:lucene-suggest:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.lucene:lucene-grouping:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.solr:solr-commons-csv:jar:4.0-SNAPSHOT:compile
> [INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
> [INFO] |  +- commons-fileupload:commons-fileupload:jar:1.2.1:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-logging:commons-logging:jar:1.0.4:compile
> [INFO] |  +- commons-io:commons-io:jar:1.4:compile
> [INFO] |  +- org.apache.velocity:velocity:jar:1.6.4:compile
> [INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
> [INFO] |  |  \- oro:oro:jar:2.0.8:compile
> [INFO] |  +- org.apache.velocity:velocity-tools:jar:2.0:compile
> [INFO] |  |  +- commons-beanutils:commons-beanutils:jar:1.7.0:compile
> [INFO] |  |  +- commons-digester:commons-digester:jar:1.8:compile
> [INFO] |  |  +- commons-chain:commons-chain:jar:1.1:compile
> [INFO] |  |  +- commons-validator:commons-validator:jar:1.3.1:compile
> [INFO] |  |  +- dom4j:dom4j:jar:1.1:compile
> [INFO] |  |  +- sslext:sslext:jar:1.2-0:compile
> [INFO] |  |  +- org.apache.struts:struts-core:jar:1.3.8:compile
> [INFO] |  |  |  \- antlr:antlr:jar:2.7.2:compile
> [INFO] |  |  +- org.apache.struts:struts-taglib:jar:1.3.8:compile
> [INFO] |  |  \- org.apache.struts:struts-tiles:jar:1.3.8:compile
> [INFO] |  +- org.slf4j:slf4j-jdk14:jar:1.6.1:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:runtime
> {code}


[jira] [Updated] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-09 Thread Jason Rutherglen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3498:
-

Attachment: LUCENE-3498.patch

Removed SolrIndexReaderFactory; SolrCoreAware is used instead for the exact
same purpose.

Nice.

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch, LUCENE-3498.patch, LUCENE-3498.patch, 
> LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Updated] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-09 Thread Jason Rutherglen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3498:
-

Attachment: LUCENE-3498.patch

Added a new [backwards compatibility] SolrIndexReaderFactory that accepts a 
Solr core as a parameter.

All tests pass.

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch, LUCENE-3498.patch, LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Resolved] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-09 Thread Jason Rutherglen (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen resolved LUCENE-3493.
--

Resolution: Unresolved

LUCENE-3498 supersedes this issue as a 100% workable solution.

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>    Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


[jira] [Commented] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-09 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123697#comment-13123697
 ] 

Jason Rutherglen commented on LUCENE-3498:
--

The current Solr IndexReaderFactory can be deprecated and / or replaced with 
one that accepts a Solr core.

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch, LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Commented] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-09 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123695#comment-13123695
 ] 

Jason Rutherglen commented on LUCENE-3498:
--

Also, the patch is against 3.x.

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch, LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Updated] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-09 Thread Jason Rutherglen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3498:
-

Attachment: LUCENE-3498.patch

Added support for attaching the new Lucene IndexReaderFactory to a Solr index 
at the solrconfig.xml level.  All tests pass.

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch, LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Commented] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-07 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123250#comment-13123250
 ] 

Jason Rutherglen commented on LUCENE-3498:
--

Simon, I think you'd be surprised at how few people use many of the current 
[uber-complex] features of Lucene (readerTermsIndexDivisor, termIndexInterval, 
and mergedSegmentWarmer are great examples, and they are not so complex in IWC 
alone).  For the people who do use this, the factory system is a lot more 
user-friendly than subclassing.

A protected method in IW doesn't take into account opening a DR from a DR; to 
do that, please commit LUCENE-3497.  If that gets in, we can open an issue to 
add a protected method to IW.

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Commented] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-07 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123232#comment-13123232
 ] 

Jason Rutherglen commented on LUCENE-3498:
--

I think the Solr one could be made better, by passing in the core that's 
creating it.  In that regard, Solr's can be improved rather than nuked.

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Updated] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-07 Thread Jason Rutherglen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3498:
-

Attachment: LUCENE-3498.patch

Here's a first cut and all tests pass!

> IndexReaderFactory for Lucene
> -
>
> Key: LUCENE-3498
> URL: https://issues.apache.org/jira/browse/LUCENE-3498
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 3.4, 4.0
>    Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3498.patch
>
>
> An IndexReaderFactory can be used by IndexWriter and DirectoryReader to 
> enable subclasses of DR to be instantiated by Lucene, automatically.


[jira] [Created] (LUCENE-3498) IndexReaderFactory for Lucene

2011-10-07 Thread Jason Rutherglen (Created) (JIRA)

IndexReaderFactory for Lucene
-

 Key: LUCENE-3498
 URL: https://issues.apache.org/jira/browse/LUCENE-3498
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/index
Affects Versions: 3.4, 4.0
Reporter: Jason Rutherglen
Priority: Minor


An IndexReaderFactory can be used by IndexWriter and DirectoryReader to enable 
subclasses of DR to be instantiated by Lucene, automatically.
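
A minimal sketch of the idea, assuming a far simpler API than Lucene's.  
SketchReader, AuditingReader, SketchReaderFactory and SketchWriter are 
hypothetical stand-ins, not classes from the patch.  The only point is that 
the writer asks a factory for its reader, so a reader subclass can come back 
without anyone subclassing the writer.

```java
// Hypothetical stand-ins for illustration only; none of these are Lucene classes.
class SketchReader {
    final int version;

    SketchReader(int version) {
        this.version = version;
    }

    String describe() {
        return "SketchReader v" + version;
    }
}

// A custom reader that the application wants the library to instantiate.
class AuditingReader extends SketchReader {
    AuditingReader(int version) {
        super(version);
    }

    @Override
    String describe() {
        return "AuditingReader v" + version;
    }
}

// The factory: the single place where the concrete reader type is chosen.
interface SketchReaderFactory {
    SketchReader newReader(int version);
}

// The writer never calls "new SketchReader" itself; it delegates to the factory.
class SketchWriter {
    private final SketchReaderFactory factory;
    private int version = 0;

    SketchWriter(SketchReaderFactory factory) {
        this.factory = factory;
    }

    void commit() {
        version++;
    }

    SketchReader getReader() {
        return factory.newReader(version);
    }
}
```

With this shape, new SketchWriter(AuditingReader::new) hands back 
AuditingReader instances from getReader(), and the default behaviour is just 
new SketchWriter(SketchReader::new).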


[jira] [Commented] (LUCENE-3497) Make DirectoryReader protected methods non-final

2011-10-07 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123102#comment-13123102
 ] 

Jason Rutherglen commented on LUCENE-3497:
--

I'll open another issue for the factory method of enabling Lucene to open 
custom DirectoryReaders.

> Make DirectoryReader protected methods non-final
> 
>
> Key: LUCENE-3497
> URL: https://issues.apache.org/jira/browse/LUCENE-3497
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3497.patch
>
>
> DirectoryReader has protected methods that are overridden and made final.  
> This is silly because it prevents other classes from overriding 
> DirectoryReader.  The methods are doOpenIfChanged(*) and a handful of related 
> variables that are private.


[jira] [Updated] (LUCENE-3497) Make DirectoryReader protected methods non-final

2011-10-07 Thread Jason Rutherglen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3497:
-

Attachment: LUCENE-3497.patch

The patch makes most of what is private in DR protected.  Protected is like 
private, except that subclasses and classes in the same package can also 
access protected methods and variables.
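
For anyone who wants the access rules spelled out, here is a small sketch.  
BaseReader and CustomReader are made-up names, not the DirectoryReader code 
touched by the patch.

```java
// Hypothetical classes for illustration; not Lucene code.
class BaseReader {
    private int privateCount = 1;      // visible only inside BaseReader
    protected int protectedCount = 2;  // visible to subclasses and to the same package

    // Protected and non-final, so a subclass may override it.
    protected int doOpen() {
        return protectedCount;
    }
}

class CustomReader extends BaseReader {
    @Override
    protected int doOpen() {
        // return privateCount;  // would not compile: privateCount is private to BaseReader
        return protectedCount + 40;  // fine: protected members are inherited
    }

    int open() {
        return doOpen();
    }
}
```

Marking doOpen() final in BaseReader would turn the override in CustomReader 
into a compile error, which is the situation the issue describes.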

> Make DirectoryReader protected methods non-final
> 
>
> Key: LUCENE-3497
> URL: https://issues.apache.org/jira/browse/LUCENE-3497
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3497.patch
>
>
> DirectoryReader has protected methods that are overridden and made final.  
> This is silly because it prevents other classes from overriding 
> DirectoryReader.  The methods are doOpenIfChanged(*) and a handful of related 
> variables that are private.


[jira] [Created] (LUCENE-3497) Make DirectoryReader protected methods non-final

2011-10-07 Thread Jason Rutherglen (Created) (JIRA)

Make DirectoryReader protected methods non-final


 Key: LUCENE-3497
 URL: https://issues.apache.org/jira/browse/LUCENE-3497
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.4, 4.0
Reporter: Jason Rutherglen
Priority: Blocker


DirectoryReader has protected methods that are overridden and made final.  This 
is silly because it prevents other classes from overriding DirectoryReader.  
The methods are doOpenIfChanged(*) and a handful of related variables that are 
private.


[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-07 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122858#comment-13122858
 ] 

Jason Rutherglen commented on LUCENE-1536:
--

+1

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122506#comment-13122506
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

Uwe, check this out on the 3.3 version, what do you think? :)

{code}
@Override
public final IndexReader reopen() throws CorruptIndexException, IOException {
  // Preserve current readOnly
  return doReopen(readOnly, null);
}

@Override
public final IndexReader reopen(boolean openReadOnly) throws CorruptIndexException, IOException {
  return doReopen(openReadOnly, null);
}

@Override
public final IndexReader reopen(final IndexCommit commit) throws CorruptIndexException, IOException {
  return doReopen(true, commit);
}
{code}
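
The shape I'd rather see is roughly the following.  This is a sketch with 
hypothetical classes (TemplateReader, TaggedReader), not the 3.3 source: the 
public entry point can stay final as long as the method it delegates to is 
protected and non-final.

```java
// Hypothetical classes for illustration; not the Lucene 3.3 source.
class TemplateReader {
    // Final: subclasses cannot replace the public entry point.
    public final TemplateReader reopen() {
        return doReopen();
    }

    // Protected and non-final: subclasses decide which reader type comes back.
    protected TemplateReader doReopen() {
        return new TemplateReader();
    }
}

class TaggedReader extends TemplateReader {
    @Override
    protected TemplateReader doReopen() {
        return new TaggedReader();
    }
}
```

Calling reopen() on a TaggedReader then yields another TaggedReader, while 
the base class keeps its own behaviour.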

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>Reporter: Jason Rutherglen
>Priority: Blocker
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122095#comment-13122095
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

One way to solve all of this without subclassing is to move the 
IndexReaderFactory to Lucene and integrate it into IW and DR.  

That will be much cleaner than forcing users to subclass, which is a monstrous 
pain and will generate excessive, unnecessary code in the end.

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>Reporter: Jason Rutherglen
>Priority: Blocker
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122089#comment-13122089
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

Uwe, I tried your idea.  It doesn't work!  Here's why: DR.writeLock and 
DR.segmentInfos are private.  That means the code I'd have to duplicate 
(because the useful methods aren't protected) cannot access these private 
variables.  Of course one can use reflection but that's just 'atrocious'.  :)
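
To show what the reflection route looks like, here is a sketch.  LockedReader 
and ReflectionPeek are hypothetical; the field merely borrows the name 
segmentInfos and holds a plain String, which is an assumption made to keep 
the example self-contained.

```java
import java.lang.reflect.Field;

// Hypothetical class standing in for a reader with a private field.
class LockedReader {
    private final String segmentInfos = "segments_1";
}

final class ReflectionPeek {
    private ReflectionPeek() {}

    // Reads a private String field by name, bypassing the compiler's access check.
    static String readPrivateString(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);  // may be refused by a security manager or module rules
            return (String) field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + fieldName, e);
        }
    }
}
```

It works, but the field name is a string the compiler never checks, so a 
rename in the library breaks the caller only at runtime.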

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>Reporter: Jason Rutherglen
>Priority: Blocker
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122050#comment-13122050
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

bq. Solr's NRT does not rely on a custom IndexReader

Yikes, logically the custom reader functionality should!

{quote}properly override doOpenIfChanged, else it would be a bug{quote}

It's a bug because there's no way to implement that today.  The DirectoryReader 
is created deep inside of IW.getReader (and there's no way to re-implement its 
functionality either, because of private variable access).  

I think we need a protected method for creating the reader in IW.  I think, 
though, that this becomes almost endless because I don't think there's a way 
to implement a custom IW in Solr.

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>Reporter: Jason Rutherglen
>Priority: Blocker
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122033#comment-13122033
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

Uwe, I'd like to agree with you, however I cannot (because then I wouldn't have 
had to create an issue!).  Look at the DR.doOpen* methods.  They're private.  
There's no reason for them to be.  They need to be protected; that's in the 
next patch.  Fairly simple.  The follow-on to this is overriding IW to return 
custom readers.  I had an issue and patch for that a while back.  It's best to 
implement both here, as Solr's NRT in Lucene 4.x will show the same problem!

I think you're right, it looks like this *could* be done by overriding 
doOpenIfChanged*; however, it doesn't make sense to duplicate code!

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>Reporter: Jason Rutherglen
>Priority: Blocker
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122019#comment-13122019
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

The patch shows the bug only, which needs a test in Solr.  The next patch will 
show the fix etc.  A Lucene test makes sense as well.  

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>Reporter: Jason Rutherglen
>Priority: Blocker
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


Re: PagedBytes additional method

2011-10-06 Thread Jason Rutherglen

I try not to without having a patch somewhat prepared!

On Thu, Oct 6, 2011 at 11:38 AM, Simon Willnauer
 wrote:
> why don't you open an issue for this?
>
> thanks,
>
> simon
>
> On Thu, Oct 6, 2011 at 5:33 PM, Jason Rutherglen
>  wrote:
>> PagedBytes is great!  Even better would be a couple of additional
>> methods, one to write it out to an IndexOutput and the other for the
>> total bytes used.

PagedBytes additional method

2011-10-06 Thread Jason Rutherglen

PagedBytes is great!  Even better would be a couple of additional
methods, one to write it out to an IndexOutput and the other for the
total bytes used.
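
Roughly what I have in mind, as a sketch only.  PagedBytesSketch is a 
hypothetical, much simplified class, not Lucene's PagedBytes, and a plain 
java.io.OutputStream stands in for IndexOutput.  The method names length, 
bytesAllocated and writeTo are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified paged byte store; not Lucene's PagedBytes.
final class PagedBytesSketch {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();
    private int upto;  // write position inside the last page

    PagedBytesSketch(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
        this.upto = pageSize;  // forces a page allocation on the first append
    }

    void append(byte[] src) {
        int offset = 0;
        while (offset < src.length) {
            if (upto == pageSize) {  // last page is full (or no page yet)
                pages.add(new byte[pageSize]);
                upto = 0;
            }
            int chunk = Math.min(pageSize - upto, src.length - offset);
            System.arraycopy(src, offset, pages.get(pages.size() - 1), upto, chunk);
            upto += chunk;
            offset += chunk;
        }
    }

    // Number of bytes actually written.
    long length() {
        return pages.isEmpty() ? 0 : (long) (pages.size() - 1) * pageSize + upto;
    }

    // Number of bytes held by allocated pages, including unused tail space.
    long bytesAllocated() {
        return (long) pages.size() * pageSize;
    }

    // Streams the written bytes, page by page, without copying them first.
    void writeTo(OutputStream out) throws IOException {
        for (int i = 0; i < pages.size(); i++) {
            int len = (i == pages.size() - 1) ? upto : pageSize;
            out.write(pages.get(i), 0, len);
        }
    }

    // Convenience wrapper around writeTo for in-memory use.
    byte[] toByteArray() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Whether "total bytes used" should mean written bytes or allocated bytes is an 
open question, so the sketch exposes both.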


[jira] [Updated] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3493:
-

Attachment: LUCENE-3493.patch

Patch with unit test demonstrating the bug.  

The fix required in Lucene also happens to be in the patch.

I'll post another patch showing the Lucene fix, which allows fixing the bug on 
the Solr side.

> Solr reopen on a custom reader doesn't work
> ---
>
> Key: LUCENE-3493
> URL: https://issues.apache.org/jira/browse/LUCENE-3493
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.4
>Reporter: Jason Rutherglen
>Priority: Blocker
> Attachments: LUCENE-3493.patch
>
>
> When a custom index reader is used with Solr and reopen, the custom reader 
> vanishes after the reopen.  It's a bug.


[jira] [Created] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Created) (JIRA)

Solr reopen on a custom reader doesn't work
---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker


When a custom index reader is used with Solr and reopen, the custom reader 
vanishes after the reopen.  It's a bug.
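
A stripped-down illustration of the symptom, using hypothetical classes 
(PlainReader, CustomizedReader) rather than the Solr or Lucene code: when the 
base class builds the reopened reader itself, the subclass type is lost.

```java
// Hypothetical classes for illustration; not the Solr or Lucene code.
class PlainReader {
    // The base class constructs the reopened reader itself.
    PlainReader reopen() {
        return new PlainReader();
    }

    String label() {
        return "plain";
    }
}

class CustomizedReader extends PlainReader {
    // reopen() is inherited unchanged, so it still builds a PlainReader.
    @Override
    String label() {
        return "custom";
    }
}
```

Here new CustomizedReader().reopen() returns a PlainReader, so every 
customization is gone after the first reopen.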


[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager

2011-10-05 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121556#comment-13121556
 ] 

Jason Rutherglen commented on LUCENE-3488:
--

bq. Aren't you used to pushback on your code / ideas by now?

Mark, I missed this, it's particularly funny given this issue isn't mine!  
Please stay on topic! (Sorry Simon, nice work!)

> Factor out SearcherManager from NRTManager
> --
>
> Key: LUCENE-3488
> URL: https://issues.apache.org/jira/browse/LUCENE-3488
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.5, 4.0
>Reporter: Simon Willnauer
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3488.patch, LUCENE-3488.patch
>
>
> Currently we have NRTManager and SearcherManager while NRTManager contains a 
> big piece of the code that is already in SearcherManager. Users are kind of 
> forced to use NRTManager if they want to have SearcherManager goodness with 
> NRT. The integration into NRTManager also forces you to maintain two 
> instances even if you know you always want deletes. To me NRTManager tries to 
> do more than necessary and mixes lots of responsibilities ie. handling 
> searchers and handling indexing generations. NRTManager should use a 
> SearcherManager by aggregation rather than duplicate a lot of logic. 
> SearcherManager should have a NRT and Directory based implementation users 
> can simply choose from.
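
The "aggregation rather than duplication" idea in the description above can be pictured with a small standalone sketch. It is not the LUCENE-3488 patch and does not use Lucene classes: `SearcherHolder`, `GenerationTracker`, and the `Supplier` source are hypothetical names, and the real NRTManager blocks while waiting for a generation, whereas this sketch simply reopens synchronously.

```java
import java.util.function.Supplier;

// Hypothetical sketch; S stands in for a searcher type.
class SearcherHolder<S> {
    // The pluggable source, e.g. "open from a directory" or "open from a writer".
    private final Supplier<S> source;
    private volatile S current;

    SearcherHolder(Supplier<S> source) {
        this.source = source;
        this.current = source.get();
    }

    S acquire() {
        return current;
    }

    // Swap in a fresh searcher from whichever source was plugged in.
    synchronized void maybeReopen() {
        current = source.get();
    }
}

// Tracks indexing generations only and delegates all searcher handling.
class GenerationTracker<S> {
    private final SearcherHolder<S> holder;
    private long indexingGen;   // bumped on every change
    private long searchingGen;  // generation visible to the current searcher

    GenerationTracker(SearcherHolder<S> holder) {
        this.holder = holder;
    }

    synchronized long recordChange() {
        return ++indexingGen;
    }

    // Returns a searcher that reflects at least the given generation.
    synchronized S waitFor(long gen) {
        if (gen > searchingGen) {
            holder.maybeReopen();
            searchingGen = indexingGen;
        }
        return holder.acquire();
    }
}
```

The point of the split is that the holder knows nothing about generations, and the tracker knows nothing about how a searcher is opened.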


[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager

2011-10-05 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121381#comment-13121381
 ] 

Jason Rutherglen commented on LUCENE-3488:
--

bq. OK, let's get back to this code. None of the comments have been related to 
this!

+1 to the code! 

> Factor out SearcherManager from NRTManager
> --
>
> Key: LUCENE-3488
> URL: https://issues.apache.org/jira/browse/LUCENE-3488
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.5, 4.0
>Reporter: Simon Willnauer
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3488.patch
>
>
> Currently we have NRTManager and SearcherManager while NRTManager contains a 
> big piece of the code that is already in SearcherManager. Users are kind of 
> forced to use NRTManager if they want to have SearcherManager goodness with 
> NRT. The integration into NRTManager also forces you to maintain two 
> instances even if you know you always want deletes. To me NRTManager tries to 
> do more than necessary and mixes lots of responsibilities ie. handling 
> searchers and handling indexing generations. NRTManager should use a 
> SearcherManager by aggregation rather than duplicate a lot of logic. 
> SearcherManager should have a NRT and Directory based implementation users 
> can simply choose from.


[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager

2011-10-05 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121369#comment-13121369
 ] 

Jason Rutherglen commented on LUCENE-3488:
--

bq. manager classes leaner 

Leaner, better, modularized, pluggable, etc ... etc.  

SolrCore is *final*.  I remember having that debate a while back with Chris 
Hostetter.  Why Solr needs to be monolithic, I don't know.  Attempts to fix 
that have met, and continue to meet, with pushback.  That much is evident!

> Factor out SearcherManager from NRTManager
> --
>
> Key: LUCENE-3488
> URL: https://issues.apache.org/jira/browse/LUCENE-3488
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.5, 4.0
>Reporter: Simon Willnauer
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3488.patch
>
>
> Currently we have NRTManager and SearcherManager while NRTManager contains a 
> big piece of the code that is already in SearcherManager. Users are kind of 
> forced to use NRTManager if they want to have SearcherManager goodness with 
> NRT. The integration into NRTManager also forces you to maintain two 
> instances even if you know you always want deletes. To me NRTManager tries to 
> do more than necessary and mixes lots of responsibilities ie. handling 
> searchers and handling indexing generations. NRTManager should use a 
> SearcherManager by aggregation rather than duplicate a lot of logic. 
> SearcherManager should have a NRT and Directory based implementation users 
> can simply choose from.


[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager

2011-10-05 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121350#comment-13121350
 ] 

Jason Rutherglen commented on LUCENE-3488:
--

Atrocious, or perhaps horrible, describes:

Lines 1041 - 1345 of [1].  Saying patches are welcome is ludicrous when these 
issues were brought up in SOLR-2193, which gave way to SOLR-2565, which ended 
up being 155k plus several additional patches. :)  A redesign could / should 
have yielded much better results.  I didn't.  

1. 
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/core/SolrCore.java?view=markup

> Factor out SearcherManager from NRTManager
> --
>
> Key: LUCENE-3488
> URL: https://issues.apache.org/jira/browse/LUCENE-3488
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.5, 4.0
>Reporter: Simon Willnauer
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3488.patch
>
>
> Currently we have NRTManager and SearcherManager while NRTManager contains a 
> big piece of the code that is already in SearcherManager. Users are kind of 
> forced to use NRTManager if they want to have SearcherManager goodness with 
> NRT. The integration into NRTManager also forces you to maintain two 
> instances even if you know you always want deletes. To me NRTManager tries to 
> do more than necessary and mixes lots of responsibilities ie. handling 
> searchers and handling indexing generations. NRTManager should use a 
> SearcherManager by aggregation rather than duplicate a lot of logic. 
> SearcherManager should have a NRT and Directory based implementation users 
> can simply choose from.


[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager

2011-10-05 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120984#comment-13120984
 ] 

Jason Rutherglen commented on LUCENE-3488:
--

bq. has been around for 7 years

That's far too long.  Hence the push for modules.

> Factor out SearcherManager from NRTManager
> --
>
> Key: LUCENE-3488
> URL: https://issues.apache.org/jira/browse/LUCENE-3488
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.5, 4.0
>Reporter: Simon Willnauer
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3488.patch
>
>
> Currently we have NRTManager and SearcherManager while NRTManager contains a 
> big piece of the code that is already in SearcherManager. Users are kind of 
> forced to use NRTManager if they want to have SearcherManager goodness with 
> NRT. The integration into NRTManager also forces you to maintain two 
> instances even if you know you always want deletes. To me NRTManager tries to 
> do more than necessary and mixes lots of responsibilities ie. handling 
> searchers and handling indexing generations. NRTManager should use a 
> SearcherManager by aggregation rather than duplicate a lot of logic. 
> SearcherManager should have a NRT and Directory based implementation users 
> can simply choose from.


Solr IndexReaderFactory doesn't return the correct reader on reopen

2011-10-05 Thread Jason Rutherglen

This is due to this code in DirectoryReader.  The AlternateDirectoryTest
class is way too minimal in its testing as well...  We probably
need to allow doReopen to be overridden in a subclass.

private synchronized DirectoryReader doReopen(SegmentInfos infos,
    boolean doClone, boolean openReadOnly)
    throws CorruptIndexException, IOException {
  DirectoryReader reader;
  if (openReadOnly) {
    reader = new ReadOnlyDirectoryReader(directory, infos, subReaders,
        starts, normsCache, doClone, termInfosIndexDivisor,
        readerFinishedListeners);
  } else {
    reader = new DirectoryReader(directory, infos, subReaders, starts,
        normsCache, false, doClone, termInfosIndexDivisor,
        readerFinishedListeners);
  }
  return reader;
}
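
To illustrate why a hard-coded `new` inside the reopen path loses a custom reader, and how an overridable hook would keep it, here is a minimal standalone sketch. It does not use Lucene classes; `BaseReader`, `CustomReader`, and `newInstance` are hypothetical names chosen for illustration, not the fix that was eventually committed.

```java
// Hypothetical stand-ins, not Lucene classes.
class BaseReader {
    final long version;

    BaseReader(long version) {
        this.version = version;
    }

    // Reopen delegates construction to an overridable factory method
    // instead of hard-coding "new BaseReader(...)".
    final BaseReader reopen(long newVersion) {
        if (newVersion == version) {
            return this; // nothing changed, keep the current instance
        }
        return newInstance(newVersion);
    }

    // Subclasses override this so that a reopen yields their own type.
    protected BaseReader newInstance(long newVersion) {
        return new BaseReader(newVersion);
    }
}

class CustomReader extends BaseReader {
    CustomReader(long version) {
        super(version);
    }

    @Override
    protected BaseReader newInstance(long newVersion) {
        return new CustomReader(newVersion);
    }
}
```

With the hook in place, `new CustomReader(1).reopen(2)` returns another `CustomReader`; without it, the base class would silently hand back its own type.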


[jira] [Commented] (LUCENE-3486) Add SearcherLifetimeManager, so you can retrieve the same searcher you previously used

2011-10-05 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120960#comment-13120960
 ] 

Jason Rutherglen commented on LUCENE-3486:
--

bq. Looks good. I noticed you marked close() with @Override. Are we on Java 6 
in 3.x?

@Override is all over the place in Solr!?

> Add SearcherLifetimeManager, so you can retrieve the same searcher you 
> previously used
> --
>
> Key: LUCENE-3486
> URL: https://issues.apache.org/jira/browse/LUCENE-3486
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3486.patch, LUCENE-3486.patch
>
>
> The idea is similar to SOLR-2809 (adding searcher leases to Solr).
> This utility class sits above whatever your source is for "the
> current" searcher (eg NRTManager, SearcherManager, etc.), and records
> (holds a reference to) each searcher in recent history.
> The idea is to ensure that when a user does a follow-on action (clicks
> next page, drills down/up), or when two or more searcher invocations
> within a single user search need to happen against the same searcher
> (eg in distributed search), you can retrieve the same searcher you
> used "last time".
> I think with the new searchAfter API (LUCENE-2215), doing follow-on
> searches on the same searcher is more important, since the "bottom"
> (score/docID) held for that API can easily shift when a new searcher
> is opened.
> When you do a "new" search, you record the searcher you used with the
> manager, and it returns to you a long token (currently just the
> IR.getVersion()), which you can later use to retrieve the same
> searcher.
> Separately you must periodically call prune(), to prune the old
> searchers, ideally from the same thread / at the same time that
> you open a new searcher.
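
The record / retrieve / prune cycle described above can be sketched roughly as follows. This is a standalone illustration, not the LUCENE-3486 patch: `LifetimeManagerSketch` and its method signatures are hypothetical, the version is passed in explicitly instead of being read from a reader, and reference counting and closing of searchers are left out.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; S stands in for a searcher type.
class LifetimeManagerSketch<S> {
    private static final class Entry<S> {
        final S searcher;
        final long recordedAtMillis;

        Entry(S searcher, long recordedAtMillis) {
            this.searcher = searcher;
            this.recordedAtMillis = recordedAtMillis;
        }
    }

    private final Map<Long, Entry<S>> byToken = new ConcurrentHashMap<>();

    // Remember the searcher under its version and hand the version back as the token.
    long record(S searcher, long version, long nowMillis) {
        byToken.putIfAbsent(version, new Entry<>(searcher, nowMillis));
        return version;
    }

    // Returns the searcher recorded under the token, or null if it was pruned.
    S acquire(long token) {
        Entry<S> e = byToken.get(token);
        return e == null ? null : e.searcher;
    }

    // Drops every searcher recorded more than maxAgeMillis ago; returns how many were dropped.
    int prune(long nowMillis, long maxAgeMillis) {
        int removed = 0;
        Iterator<Entry<S>> it = byToken.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().recordedAtMillis > maxAgeMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

A caller that gets `null` back from `acquire` has to decide what to do: fall back to the current searcher or tell the user the session expired.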


[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager

2011-10-05 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120941#comment-13120941
 ] 

Jason Rutherglen commented on LUCENE-3488:
--

Great!  Next step is integrating it into Solr and nuking the current atrocious 
Solr code.  :)

> Factor out SearcherManager from NRTManager
> --
>
> Key: LUCENE-3488
> URL: https://issues.apache.org/jira/browse/LUCENE-3488
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.5, 4.0
>Reporter: Simon Willnauer
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3488.patch
>
>
> Currently we have NRTManager and SearcherManager while NRTManager contains a 
> big piece of the code that is already in SearcherManager. Users are kind of 
> forced to use NRTManager if they want to have SearcherManager goodness with 
> NRT. The integration into NRTManager also forces you to maintain two 
> instances even if you know you always want deletes. To me NRTManager tries to 
> do more than necessary and mixes lots of responsibilities ie. handling 
> searchers and handling indexing generations. NRTManager should use a 
> SearcherManager by aggregation rather than duplicate a lot of logic. 
> SearcherManager should have a NRT and Directory based implementation users 
> can simply choose from.


[jira] [Commented] (SOLR-2809) searcher leases

2011-10-04 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120223#comment-13120223
 ] 

Jason Rutherglen commented on SOLR-2809:


SOLR-2778 is the issue that seeks to clean up and modularize the distributed 
search code.  

> searcher leases
> ---
>
> Key: SOLR-2809
> URL: https://issues.apache.org/jira/browse/SOLR-2809
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>
> Leases/reservations on searcher instances would give us the ability to use 
> the same searcher across phases of a distributed search, or for clients to 
> send multiple requests and have them hit a consistent/unchanging view of the 
> index. The latter requires something extra to ensure that the load balancer 
> contacts the same replicas as before.


[jira] [Commented] (SOLR-2809) searcher leases

2011-10-04 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120217#comment-13120217
 ] 

Jason Rutherglen commented on SOLR-2809:


{quote}no need to modify any of the guts of the complex 
SolrCore.getSearcher(){quote}

That code should be removed entirely, preferably now in 4.x.

The unique id per searcher idea would work, however it needs to also implement 
a retry when a given id no longer exists.

Still, this would be best implemented in the context of rewriting distributed 
search and the getSearcher code.  Otherwise it is layering hacked-up code on 
top of more hacked-up code.  It's a mess to debug and change later.
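
The "unique id per searcher plus retry" idea might look roughly like the sketch below. It is standalone and hypothetical: `LeasedSearchSketch`, its two phases, and the `betweenPhases` hook (a stand-in for network latency or a concurrent commit) are invented for illustration and are not Solr code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: look up a searcher by id, retry the whole query if the id is gone.
class LeasedSearchSketch {
    private final Map<Long, String> viewsById = new ConcurrentHashMap<>();
    private volatile long currentId;

    // Publish a new searcher view and return its id.
    long publish(long id, String view) {
        viewsById.put(id, view);
        currentId = id;
        return id;
    }

    void expire(long id) {
        viewsById.remove(id);
    }

    // Phase 1 picks the current searcher id.
    long phaseOne() {
        return currentId;
    }

    // Phase 2 must run against the same id; null signals that the lease is gone.
    String phaseTwo(long id) {
        return viewsById.get(id);
    }

    // Runs both phases, restarting from phase 1 when the id expired in between.
    String search(int maxAttempts, Runnable betweenPhases) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long id = phaseOne();
            betweenPhases.run();
            String view = phaseTwo(id);
            if (view != null) {
                return view;
            }
        }
        throw new IllegalStateException("searcher expired on every attempt");
    }
}
```

Note that the retry restarts from phase 1 rather than repeating only the failed phase, since results from the old view cannot be mixed with the new one.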

> searcher leases
> ---
>
> Key: SOLR-2809
> URL: https://issues.apache.org/jira/browse/SOLR-2809
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>
> Leases/reservations on searcher instances would give us the ability to use 
> the same searcher across phases of a distributed search, or for clients to 
> send multiple requests and have them hit a consistent/unchanging view of the 
> index. The latter requires something extra to ensure that the load balancer 
> contacts the same replicas as before.


[jira] [Commented] (SOLR-2809) searcher leases

2011-10-04 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120211#comment-13120211
 ] 

Jason Rutherglen commented on SOLR-2809:


{quote}
if we support distributed stats (idf etc), then this assumption could break 
down in a lot of cases, because the stats from the first phase don't make sense 
with regards to the documents being scored in the second phase.
{quote}

They would make sense.  Though it's like discussing the wind since RT isn't 
completed.  Sounds like you're thinking of a general retry?  That's a good 
idea, it would need to retry the entire distributed query in all phases.

That functionality should be modular and should not be 'pre-baked / canned' 
into Solr.  [Again] a simple policy class would suffice here.

> searcher leases
> ---
>
> Key: SOLR-2809
> URL: https://issues.apache.org/jira/browse/SOLR-2809
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>
> Leases/reservations on searcher instances would give us the ability to use 
> the same searcher across phases of a distributed search, or for clients to 
> send multiple requests and have them hit a consistent/unchanging view of the 
> index. The latter requires something extra to ensure that the load balancer 
> contacts the same replicas as before.


[jira] [Commented] (SOLR-2595) Split and migrate indexes

2011-10-04 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120189#comment-13120189
 ] 

Jason Rutherglen commented on SOLR-2595:


How will splitting occur on an index that is actively being updated?

> Split and migrate indexes
> -
>
> Key: SOLR-2595
> URL: https://issues.apache.org/jira/browse/SOLR-2595
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, replication (java), SolrCloud
>Reporter: Shalin Shekhar Mangar
> Fix For: 4.0
>
>
> When a shard's index grows too large or a shard becomes too loaded, it 
> should be possible to split parts of a shard's index and migrate/merge to a 
> less loaded node.


[jira] [Commented] (SOLR-2809) searcher leases

2011-10-04 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120187#comment-13120187
 ] 

Jason Rutherglen commented on SOLR-2809:


In RT the searchers are cheap.  The easiest approach would be to record the 
segments and max doc ids used to satisfy phase 1 of a given distributed query, 
then send that signature back in subsequent phases.  The retry would only be 
necessary in the infrequent case that a merge has occurred.

With NRT it's probably best to implement a searcher policy similar to index 
deletion policy.  Then any timeout / searcher removal system can be implemented 
by the user, rather than dictated by Solr.  

The described searcher management system belongs in a module in Lucene rather 
than Solr, probably in one of Mike's new classes.
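
One possible reading of the "signature" idea above, sketched without any Lucene or Solr classes: phase 1 records segment names with their max doc ids, and later phases check whether the current view can still serve that signature. `ViewSignature`, `satisfiableBy`, and `SearcherRetentionPolicy` are hypothetical names, and the matching rule (segments may grow but must not disappear) is an assumption about how append-only RT segments would behave.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the view used in phase 1, described as segment name -> max doc id.
final class ViewSignature {
    private final Map<String, Integer> maxDocBySegment;

    ViewSignature(Map<String, Integer> maxDocBySegment) {
        this.maxDocBySegment =
            Collections.unmodifiableMap(new LinkedHashMap<>(maxDocBySegment));
    }

    // A later view can serve this signature if every recorded segment still exists
    // and holds at least as many docs. A merge removes segments and forces a retry.
    boolean satisfiableBy(Map<String, Integer> currentSegments) {
        for (Map.Entry<String, Integer> e : maxDocBySegment.entrySet()) {
            Integer currentMaxDoc = currentSegments.get(e.getKey());
            if (currentMaxDoc == null || currentMaxDoc < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}

// User-supplied rule for when an old searcher may be dropped,
// analogous in spirit to an index deletion policy.
interface SearcherRetentionPolicy {
    boolean shouldRelease(long ageMillis, int openSearchers);
}
```

Keeping the retention rule behind an interface is what lets the timeout or removal strategy be supplied by the user instead of being fixed by the framework.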

> searcher leases
> ---
>
> Key: SOLR-2809
> URL: https://issues.apache.org/jira/browse/SOLR-2809
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>
> Leases/reservations on searcher instances would give us the ability to use 
> the same searcher across phases of a distributed search, or for clients to 
> send multiple requests and have them hit a consistent/unchanging view of the 
> index. The latter requires something extra to ensure that the load balancer 
> contacts the same replicas as before.


Re: Lucene / Solr 4.x release

2011-10-03 Thread Jason Rutherglen

Will the bulk postings only help PFOR?

On Sun, Oct 2, 2011 at 2:28 PM, Uwe Schindler  wrote:
> ...And flexible stored fields + TV, so the file format is completely flexible.
>
> Uwe
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
>
>
>
> Jason Rutherglen  schrieb:
>>
>> I asked this a little while ago, and figured I'd ask again.  It seemed
>> like the important remaining issue was the bulk postings iterator?  Is
>> that still true?  Thanks!
>>
>> 
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>


Lucene / Solr 4.x release

2011-10-02 Thread Jason Rutherglen

I asked this a little while ago, and figured I'd ask again.  It seemed
like the important remaining issue was the bulk postings iterator?  Is
that still true?  Thanks!


[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109599#comment-13109599
 ] 

Jason Rutherglen commented on LUCENE-3441:
--

It would be great to know what the cost of (re)opening a new LTR is, along 
with an explanation of what it's doing underneath.

> Add NRT support to LuceneTaxonomyReader
> ---
>
> Key: LUCENE-3441
> URL: https://issues.apache.org/jira/browse/LUCENE-3441
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
> LuceneTaxonomyWriter, you cannot have the reader updated, like 
> IndexReader/Writer. In order to do that we need to do the following:
> # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
> LuceneTaxonomyWriter.
> # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
> # Change LTR.refresh() to return an LTR, rather than void. This is actually 
> not strictly related to that issue, but since we'll need to modify refresh() 
> impl, I think it'll be good to change its API as well. Since all of facet API 
> is @lucene.experimental, no backwards issues here (and the sooner we do it, 
> the better).


[jira] [Commented] (SOLR-2778) Revise distributed code inside SearchComponents

2011-09-19 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108316#comment-13108316
 ] 

Jason Rutherglen commented on SOLR-2778:


Sweet-ness.com!

> Revise distributed code inside SearchComponents
> ---
>
> Key: SOLR-2778
> URL: https://issues.apache.org/jira/browse/SOLR-2778
> Project: Solr
>  Issue Type: Improvement
>Reporter: Martijn van Groningen
>
> The distributed code inside search components such as QueryComponent and 
> FacetComponent is complex. By structuring responsibilities
> the code becomes less complex and easier to understand. There is already a 
> start for this that was part of distributed grouping (SOLR-2066).
> The following concepts were developed inside QueryComponent for SOLR-2066:
> * ShardRequestFactory is responsible for creating requests to shards in the 
> cluster based on the incoming request from the client.
> * ShardResultTransformer. Transforming a NamedList response from the client 
> in for example SearchGroup or TopDocs instance.
> * ShardResponseProcessor. Basically merges the shard responses. The 
> ShardResponseProcessor uses a ShardResultTransformer to transform the shard 
> response into a native structure (SearchGroup / TopGroups).
> These concepts are now only used for distributed grouping, but I think can 
> also be used for non grouped distributed search.
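
The three roles listed above could be expressed as small interfaces along these lines. This is only a sketch of the division of responsibilities: the names carry a `Sketch` suffix because they are hypothetical and are not the signatures used in SOLR-2066, and the "merge" step here just combines plain score lists into a global top-N.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical interfaces mirroring the three roles; not Solr's actual signatures.
interface ShardRequestFactorySketch<Q, R> {
    // Builds one request per shard from the incoming client query.
    List<R> createRequests(Q clientQuery, List<String> shards);
}

interface ShardResultTransformerSketch<RAW, T> {
    // Turns a raw shard response into a native structure.
    T transform(RAW shardResponse);
}

// Plays the "response processor" role: merges per-shard scores into a global top-N.
class TopScoresProcessor<RAW> {
    private final ShardResultTransformerSketch<RAW, List<Float>> transformer;

    TopScoresProcessor(ShardResultTransformerSketch<RAW, List<Float>> transformer) {
        this.transformer = transformer;
    }

    List<Float> process(List<RAW> shardResponses, int topN) {
        List<Float> all = new ArrayList<>();
        for (RAW raw : shardResponses) {
            all.addAll(transformer.transform(raw));
        }
        Collections.sort(all, Collections.reverseOrder()); // highest score first
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }
}
```

Because the processor only depends on the transformer interface, the same merge logic can serve grouped and non-grouped searches with different transformers plugged in.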


[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-15 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105474#comment-13105474
 ] 

Jason Rutherglen commented on SOLR-2066:


+1 on "Concepts that can also be used for non grouped distributed searches" in 
a separate issue.  The Solr distributed search code is overly complicated.

> Search Grouping: support distributed search
> ---
>
> Key: SOLR-2066
> URL: https://issues.apache.org/jira/browse/SOLR-2066
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch
>
>
> Support distributed field collapsing / search grouping.


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-09-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104165#comment-13104165
 ] 

Jason Rutherglen commented on LUCENE-3433:
--

Here's another thread discussing MMap'ing and field caches, where the consensus 
is against it:

http://www.lucidimagination.com/search/document/70623ef5879bca38/fst_and_fieldcache#45006a7fe2847c09
 posted in "1969-12-31 19:00" :)

> Random access non RAM resident IndexDocValues (CSF)
> ---
>
> Key: LUCENE-3433
> URL: https://issues.apache.org/jira/browse/LUCENE-3433
> Project: Lucene - Java
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> There should be a way to get specific IndexDocValues by going through the 
> Directory rather than loading all of the values into memory.
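
As a rough picture of what "going through the Directory" could mean for fixed-width values, the sketch below reads one 8-byte value per lookup straight from a file instead of loading the whole column. It uses plain `java.nio` rather than Lucene's Directory API; `OnDiskLongValues` and `writeTemp` are hypothetical helpers, and the on-disk layout (consecutive big-endian longs, one per doc) is an assumption made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: fixed-width long values, one per doc, read on demand from a file.
class OnDiskLongValues implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer scratch = ByteBuffer.allocate(Long.BYTES);

    OnDiskLongValues(Path file) {
        try {
            channel = FileChannel.open(file, StandardOpenOption.READ);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the values as consecutive 8-byte big-endian longs to a new temp file.
    static Path writeTemp(long[] values) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
            for (long v : values) {
                buf.putLong(v);
            }
            Path file = Files.createTempFile("docvalues", ".bin");
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the single value stored at byte offset docId * 8; nothing else is loaded.
    synchronized long get(int docId) {
        try {
            scratch.clear();
            long pos = (long) docId * Long.BYTES;
            while (scratch.hasRemaining()) {
                int n = channel.read(scratch, pos + scratch.position());
                if (n < 0) {
                    throw new IndexOutOfBoundsException("docId " + docId);
                }
            }
            scratch.flip();
            return scratch.getLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Every lookup costs a positioned read, so the trade-off against a RAM-resident array is per-access latency versus heap usage; the operating system's page cache decides how expensive each read really is.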


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-09-13 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104079#comment-13104079
 ] 

Jason Rutherglen commented on LUCENE-3433:
--

This is somewhat funny, as it seems the opinion has changed on MMap'ing and the 
potential for page faults:

http://www.lucidimagination.com/search/document/8951a336dffa9535/storing_and_loading_the_fst_directly_from_disk#8951a336dffa9535

> Random access non RAM resident IndexDocValues (CSF)
> ---
>
> Key: LUCENE-3433
> URL: https://issues.apache.org/jira/browse/LUCENE-3433
> Project: Lucene - Java
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> There should be a way to get specific IndexDocValues by going through the 
> Directory rather than loading all of the values into memory.


[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-09-09 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101391#comment-13101391
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

There are many important use cases for immediate / zero delay index readers.

I'm not sure if people realize it, but one of the major gains from this issue 
is the ability to obtain a reader after every indexed document.  

In this case, instead of performing an array copy of the RT data structures, we 
will queue the changes and then apply them to the new reader.  For arrays like term 
freqs, we will use a temp hash map of the changes made since the main array was 
created (when the hash map grows too large we can perform a full array copy).
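
To illustrate the queued-changes idea, here is a minimal sketch. It is not 
taken from the patch: the class name OverlayIntArray and the maxOverlay 
threshold are hypothetical, and the array is assumed to have a fixed length.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a shared int[] snapshot plus a small map of later changes. */
final class OverlayIntArray {
    private int[] base;                                             // snapshot that older readers may still hold
    private final Map<Integer, Integer> changes = new HashMap<>();  // writes made since the snapshot
    private final int maxOverlay;                                   // assumed threshold that triggers a full copy

    OverlayIntArray(int[] base, int maxOverlay) {
        this.base = base;
        this.maxOverlay = maxOverlay;
    }

    /** Record a change without touching the shared snapshot. */
    void set(int index, int value) {
        changes.put(index, value);
        if (changes.size() > maxOverlay) {
            compact();
        }
    }

    /** Reads consult the overlay first, then fall back to the snapshot. */
    int get(int index) {
        Integer changed = changes.get(index);
        return changed != null ? changed : base[index];
    }

    int overlaySize() {
        return changes.size();
    }

    /** Fold the queued changes into a fresh copy; the old array is left untouched. */
    private void compact() {
        int[] copy = base.clone();
        for (Map.Entry<Integer, Integer> e : changes.entrySet()) {
            copy[e.getKey()] = e.getValue();
        }
        base = copy;
        changes.clear();
    }
}
```

The point of the design is that opening a reader only pays for the small map, 
while the full array copy is deferred until the map grows past the threshold.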



> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
> LUCENE-2312.patch, LUCENE-2312.patch
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915


Re: Regarding Transaction logging

2011-09-08 Thread Jason Rutherglen

This isn't a new problem.  Databases have been around for what, 30+ years?

On Thu, Sep 8, 2011 at 11:01 AM, Simon Willnauer
 wrote:
> On Thu, Sep 8, 2011 at 4:21 PM, Jason Rutherglen
>  wrote:
>> The delete by query is solved by recording the primary / UID of the
>> document(s) deleted.  It's only expensive if the transaction log
>> implementation is not designed properly.  :)
>
> phew I don't think this is realistic. I mean this could be a lot of
> documents and looking up a lot of primary keys, plus you need to know
> what the primary key is and you somehow need to do this async. I don't
> consider this as an option.
>
> simon
>>
>> On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer
>>  wrote:
>>> hey folks,
>>>
>>> we already have transaction logging on Solr side so I should have
>>> started this discussion earlier. However, I want to bring this up to
>>> the list since I think this is a very valuable feature also for plain
>>> Lucene users and eventually this should also be available to them. I
>>> don't think this needs to be a core feature at all but I think we need
>>> to provide the necessary hooks in Lucene core to make this reliable
>>> and consistent. I have a couple of concerns that, with the current
>>> extension mechanism we provide on the IndexWriter side, this feature
>>> can only be implemented in a sub-optimal way on the Solr side (or basically
>>> on top of Lucene), but let me elaborate on this a little.
>>>
>>> IndexWriter doesn't provide any transaction guarantees, nor does it
>>> give any guarantees on the order. So if you index two versions of a
>>> document with the same delete key you can't tell which one wins unless
>>> you prevent IW from seeing those two documents at the same time, i.e. by
>>> locking before you hit IW. This is basically what other implementations
>>> do, like ElasticSearch, which uses locks assigned to buckets in an array
>>> selected based on the del terms hash. However this gets a little more
>>> complex once you get to DeleteQueries where you can't tell which
>>> document is affected so they might be misplaced in the transaction log
>>> if the order doesn't match the order the IW sees. Under the hood IW
>>> does maintain such an order inside the DocumentsWriterDeleteQueue
>>> which could be utilized to provide a total ordering that IMO should be
>>> reflected in the transaction log.
>>>
>>> Before I propose ways this could be implemented, I
>>> want to check if others think we should provide more reliable ways for
>>> users with the need for durability and consistent recovery.
>>>
>>> simon
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>>
>


Re: Regarding Transaction logging

2011-09-08 Thread Jason Rutherglen

Delete by query is solved by recording the primary key / UID of the
document(s) deleted.  It's only expensive if the transaction log
implementation is not designed properly.  :)
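
As a rough sketch of what I mean (not Solr's or Lucene's actual transaction
log): the log stores the UIDs a delete by query resolved to, so replay never
has to re-run the query. The class name, the "ADD "/"DEL " entry format, and
the in-memory list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: a log that records resolved UIDs instead of delete queries. */
final class UidTransactionLog {
    private final List<String> entries = new ArrayList<>();

    void logAdd(String uid) {
        entries.add("ADD " + uid);
    }

    /** The caller resolves the query to UIDs first; only the UIDs are logged. */
    void logDeleteByQuery(List<String> matchedUids) {
        for (String uid : matchedUids) {
            entries.add("DEL " + uid);
        }
    }

    /** Replaying the entries in order rebuilds the set of live UIDs. */
    Set<String> replay() {
        Set<String> live = new LinkedHashSet<>();
        for (String entry : entries) {
            String uid = entry.substring(4);   // both prefixes are 4 characters long
            if (entry.startsWith("ADD ")) {
                live.add(uid);
            } else {
                live.remove(uid);
            }
        }
        return live;
    }
}
```

The sketch leaves out how the UIDs are looked up and how the log is made
durable, and those are the parts that decide the real cost.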

On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer
 wrote:
> hey folks,
>
> we already have transaction logging on Solr side so I should have
> started this discussion earlier. However, I want to bring this up to
> the list since I think this is a very valuable feature also for plain
> Lucene users and eventually this should also be available to them. I
> don't think this needs to be a core feature at all but I think we need
> to provide the necessary hooks in Lucene core to make this reliable
> and consistent. I have a couple of concerns that, with the current
> extension mechanism we provide on the IndexWriter side, this feature
> can only be implemented in a sub-optimal way on the Solr side (or basically
> on top of Lucene), but let me elaborate on this a little.
>
> IndexWriter doesn't provide any transaction guarantees, nor does it
> give any guarantees on the order. So if you index two versions of a
> document with the same delete key you can't tell which one wins unless
> you prevent IW from seeing those two documents at the same time, i.e. by
> locking before you hit IW. This is basically what other implementations
> do, like ElasticSearch, which uses locks assigned to buckets in an array
> selected based on the del terms hash. However this gets a little more
> complex once you get to DeleteQueries where you can't tell which
> document is affected so they might be misplaced in the transaction log
> if the order doesn't match the order the IW sees. Under the hood IW
> does maintain such an order inside the DocumentsWriterDeleteQueue
> which could be utilized to provide a total ordering that IMO should be
> reflected in the transaction log.
>
> Before I propose ways this could be implemented, I
> want to check if others think we should provide more reliable ways for
> users with the need for durability and consistent recovery.
>
> simon
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (SOLR-2748) autocommit commits too many times

2011-09-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099722#comment-13099722
 ] 

Jason Rutherglen commented on SOLR-2748:


Seeing all of the bugs related to the Solr NRT code, I can't help but wonder 
why the 4.x version of the project needs to be backward compatible.  

I also wonder why it's not using IndexReaderWarmer, which was ostensibly created 
precisely for Solr's usage (yet it's not used in Solr and never has been).

> autocommit commits too many times
> -
>
> Key: SOLR-2748
> URL: https://issues.apache.org/jira/browse/SOLR-2748
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Attachments: SOLR-2748.patch
>
>
> autocommit seems to commit more frequently than configured.


[jira] [Commented] (SOLR-2700) transaction logging

2011-09-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099239#comment-13099239
 ] 

Jason Rutherglen commented on SOLR-2700:


This is going to be amazing. I wonder if other projects already 
implemented these features years ago?

> transaction logging
> ---
>
> Key: SOLR-2700
> URL: https://issues.apache.org/jira/browse/SOLR-2700
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, 
> SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch
>
>
> A transaction log is needed for durability of updates, for a more performant 
> realtime-get, and for replaying updates to recovering peers.


[jira] [Commented] (SOLR-2700) transaction logging

2011-09-07 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099138#comment-13099138
 ] 

Jason Rutherglen commented on SOLR-2700:


I'm not sure how this feature makes any sense. The documents are already being 
serialized to disk, e.g., to the docstore by StoredFieldsWriter.  Now the system 
will be serializing the exact same documents twice, which is extremely 
redundant.  

> transaction logging
> ---
>
> Key: SOLR-2700
> URL: https://issues.apache.org/jira/browse/SOLR-2700
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, 
> SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch
>
>
> A transaction log is needed for durability of updates, for a more performant 
> realtime-get, and for replaying updates to recovering peers.


DirectoryReader package protected?

2011-09-06 Thread Jason Rutherglen

I was browsing the code and noticed that DirectoryReader is package-private.
Why is this?  SegmentReader, for example, is not.


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-05 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097257#comment-13097257
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

Ok, solved the above comment by taking the sorted ord array and building a new 
reverse array from that... 
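
Roughly, the reverse array is just the inverse permutation of the sorted ords. 
This is a sketch rather than the code in the patch; the class and method names 
are hypothetical, and it assumes term ids are dense (0..n-1).

```java
/** Hypothetical sketch: invert "sorted position -> term id" into "term id -> sorted position". */
final class ReverseOrds {
    static int[] invert(int[] sortedTermIds) {
        int[] termIdToOrd = new int[sortedTermIds.length];
        for (int ord = 0; ord < sortedTermIds.length; ord++) {
            // The term id stored at sorted position `ord` maps back to `ord`.
            termIdToOrd[sortedTermIds[ord]] = ord;
        }
        return termIdToOrd;
    }
}
```

This is a single linear pass over an array that has already been sorted, so 
it adds little on top of the sort itself.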

> Add non-desctructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, 
> LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-05 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097246#comment-13097246
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

I started integrating the patch into LUCENE-2312.  I think the main 
functionality missing is a reverse int[] that maps a term id to its position in 
the sorted ords array.  That array would be used for implementing the RT version 
of DocTermsIndex, where the lookup goes doc id -> term id -> sorted term ord.  
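
A sketch of that lookup chain, assuming two plain int[]s are available. The 
names docToTermId and termIdToOrd are hypothetical; this is not the 
DocTermsIndex API.

```java
/** Hypothetical sketch of the doc id -> term id -> sorted ord lookup. */
final class DocOrdLookup {
    /** Two array reads resolve a document to the sort position of its term. */
    static int ord(int[] docToTermId, int[] termIdToOrd, int doc) {
        return termIdToOrd[docToTermId[doc]];
    }

    /** Documents can then be compared by ord alone, without touching term bytes. */
    static int compareDocs(int[] docToTermId, int[] termIdToOrd, int docA, int docB) {
        return Integer.compare(ord(docToTermId, termIdToOrd, docA),
                               ord(docToTermId, termIdToOrd, docB));
    }
}
```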

> Add non-desctructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, 
> LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-02 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096231#comment-13096231
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

Simon, I think your patch should be in a different issue, eg, sorted bytes ref 
hash view or something.

> Add non-desctructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, 
> LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].


[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-02 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096108#comment-13096108
 ] 

Jason Rutherglen commented on LUCENE-3199:
--

Simon,

In summary, this uses the BytesRefHash sort, performing array copies and
then merging [sorting] them into a new copy / view. 

Array copies are fast and, counterintuitively, generate far less garbage than
objects (in Java). 

Instead of creating term 'segments' that would be merged while iterating the
terms enum, we'll be generating static point-in-time terms dict views. These
will be useful for enabling DocTermsIndex field caches for RT, the only
remaining design 'challenge' for RT / LUCENE-2312. Because there is a terms
hash, we can seek exactly to the term rather than perform an [optimized] seek to
the term.
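
To make the point-in-time view concrete, here is a simplified sketch. It is
not the BytesRefHash implementation: terms are held in a plain String[] indexed
by term id, String.compareTo stands in for byte-wise term order, and the class
name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: build a sorted view of term ids without reordering the terms themselves. */
final class SortedTermView {
    /** Returns the ids 0..count-1 ordered by their term; termsById is left untouched. */
    static int[] sortedIds(String[] termsById, int count) {
        Integer[] boxed = new Integer[count];
        for (int id = 0; id < count; id++) {
            boxed[id] = id;
        }
        // Sort the ids by the terms they point at, not by the id values.
        Arrays.sort(boxed, (a, b) -> termsById[a].compareTo(termsById[b]));
        int[] sorted = new int[count];
        for (int i = 0; i < count; i++) {
            sorted[i] = boxed[i];
        }
        return sorted;
    }
}
```

Each call yields a fresh int[] that a reader can hold on to while indexing
continues. The boxing is only there to keep the sketch short; it is exactly
the kind of garbage a primitive sort over int[] would avoid.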

> Add non-desctructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch, 
> LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].


[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-02 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3199:
-

Attachment: LUCENE-3199.patch

This is a minor update when compared with the last patch.  

It adds the option of pruning the [oversized] int[] returned by the compact 
method.  

Added are additional unit tests.

> Add non-desctructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch, LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].


[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

2011-09-01 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3199:
-

Attachment: LUCENE-3199.patch

Here's a version of this issue.  Added are a couple of new methods to 
TestBytesRefHash to test the new frozen compact and sorting functionality of 
BytesRefHash.

This is being posted now because it's useful in relation to LUCENE-2312 and a 
terms dictionary that is composed of int[]s of term ids sorted by term.

> Add non-desctructive sort to BytesRefHash
> -
>
> Key: LUCENE-3199
> URL: https://issues.apache.org/jira/browse/LUCENE-3199
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-3199.patch
>
>
> Currently the BytesRefHash is destructive.  We can add a method that returns 
> a non-destructively generated int[].


[jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-09-01 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2312:
-

Fix Version/s: (was: Realtime Branch)
Affects Version/s: 4.0 (was: Realtime Branch)

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 4.0
>    Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
> LUCENE-2312.patch, LUCENE-2312.patch
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915


[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-09-01 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095412#comment-13095412
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

I'll post a new patch shortly that fixes bugs and adds a bit more to the
functionality.

The benchmark results are interesting. Array copies are very fast and I don't see
any problems with them; the median time is 2 ms. Adding tens of thousands of
terms to the concurrent skip list map is expensive, which I think is to
be expected. The strategy of amortizing that cost by creating int[]s sorted by
term will probably be more performant than CSLM. 

The sorted int[] terms can be merged just like segments, thus RT becomes a way
to remove the [NRT] cost of merging [numerous] postings lists. The int[] terms
can be merged in the background so that raw indexing speed is not affected.
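
As a sketch of the merge step (not the code in the patch): two int[]s of term
ids, each already sorted by term, are combined with a standard two-way merge.
The class name and the String[] used to resolve ids to terms are hypothetical
simplifications.

```java
/** Hypothetical sketch: merge two term-id arrays that are each already sorted by term. */
final class SortedTermMerge {
    static int[] merge(int[] a, int[] b, String[] termsById) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            // Take whichever id points at the smaller term; ties favor the first array.
            if (termsById[a[i]].compareTo(termsById[b[j]]) <= 0) {
                out[k++] = a[i++];
            } else {
                out[k++] = b[j++];
            }
        }
        while (i < a.length) {
            out[k++] = a[i++];
        }
        while (j < b.length) {
            out[k++] = b[j++];
        }
        return out;
    }
}
```

The merge is linear in the combined length and only reads its inputs, so it
can run on a background thread against arrays that readers are still using.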

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
> LUCENE-2312.patch, LUCENE-2312.patch
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915


[jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-08-30 Thread Jason Rutherglen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2312:
-

Attachment: LUCENE-2312.patch

Here's a new patch that incrementally adds field cache and norms values, 
meaning that as documents are added / indexed, norms and field cache values are 
automatically created.  Field cache values are only appended if the field cache 
has already been created.  

The field cache functionality needs to be completed for all types.

We probably need to get the indexing lock while the field cache value is 
initially being created (eg, the terms enumeration).

We're more or less feature complete now. 

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
> LUCENE-2312.patch, LUCENE-2312.patch
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915


[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-08-24 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090764#comment-13090764
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

The benchmark plan is to compare the speed of NRT vs. RT.  

Index documents in a single thread; in a 2nd thread, open a reader and perform a 
query.  It would be nice to synchronize the point / max doc at which RT and NRT 
open new readers, to additionally verify the correctness of the directly 
comparable search results.  To make the test fair, the concurrent merge scheduler 
should be turned off in the NRT test.

The hypothesis is that array copying, even on large [RT] indexes, is no big deal 
compared with the excessive segment merging with NRT.

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
> LUCENE-2312.patch
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915


[jira] [Commented] (SOLR-2700) transaction logging

2011-08-24 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090722#comment-13090722
 ] 

Jason Rutherglen commented on SOLR-2700:


Typically a transaction log is configured to be written to a different hard drive 
than the indexes / database.

> transaction logging
> ---
>
> Key: SOLR-2700
> URL: https://issues.apache.org/jira/browse/SOLR-2700
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Attachments: SOLR-2700.patch, SOLR-2700.patch, SOLR-2700.patch, 
> SOLR-2700.patch, SOLR-2700.patch
>
>
> A transaction log is needed for durability of updates, for a more performant 
> realtime-get, and for replaying updates to recovering peers.

