Re: Joins Debug info is missing when we hit Filter Cache

2013-09-03 Thread Kranti Parisa
Even if it's not in production, Join debug info (similar to what other components
like mlt, highlight, facets, etc. provide) could be very useful whether or not we
hit the caches.

I can create a Jira ticket and write more details, but wanted to hear some
opinions first.

Thanks & Regards,
Kranti K Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Sep 4, 2013 at 2:24 AM, Kranti Parisa wrote:

> I am sure this doesn't exist today, but just wondering about your thoughts.
>
> When we use Join queries (the first time, or without hitting the Filter Cache)
> and say debug=true, we are able to see a good amount of debug info in the
> response.
>
> Do we have any plans to support this debug info even when we hit the Filter
> Cache? I believe this information will be helpful with/without hitting the
> caches.
>
> Consider this use case: in production, a request comes in and builds the
> Filter Cache for a Join query, and at some point in time we want to run that
> query manually with debug turned on; currently we can't see a bunch of very
> useful stats/numbers.
>
>
> Thanks & Regards,
> Kranti K Parisa
> http://www.linkedin.com/in/krantiparisa
>
>


Joins Debug info is missing when we hit Filter Cache

2013-09-03 Thread Kranti Parisa
I am sure this doesn't exist today, but just wondering about your thoughts.

When we use Join queries (the first time, or without hitting the Filter Cache)
and say debug=true, we are able to see a good amount of debug info in the
response.

Do we have any plans to support this debug info even when we hit the Filter
Cache? I believe this information will be helpful with/without hitting the
caches.

Consider this use case: in production, a request comes in and builds the
Filter Cache for a Join query, and at some point in time we want to run that
query manually with debug turned on; currently we can't see a bunch of very
useful stats/numbers.
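
For concreteness, a request of the shape described might look like this
(collection layout and field names hypothetical):

    .../select?q=*:*&fq={!join from=parent_id to=id}type:book&debugQuery=true

On the first run the join fq populates the Filter Cache and the debug section
carries the join info; on a repeat run the fq is answered from the cache and
that debug info disappears.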


Thanks & Regards,
Kranti K Parisa
http://www.linkedin.com/in/krantiparisa


[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-09-03 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757487#comment-13757487
 ] 

Tommaso Teofili commented on SOLR-5201:
---

bq. unless the AnalysisEngine constructor/factory/provider does something 
special to keep track of them, they won't know anything about each other

ok, so I think we can go with the second option of having the 
_UIMAUpdateRequestProcessorFactory_ serve the _AnalysisEngine_ to 
_UIMAUpdateRequestProcessors_.
I'll post a patch later today.

> UIMAUpdateRequestProcessor should reuse the AnalysisEngine
> --
>
> Key: SOLR-5201
> URL: https://issues.apache.org/jira/browse/SOLR-5201
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - UIMA
>Affects Versions: 4.4
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, 
> SOLR-5201-ae-cache-only-single-request_branch_4x.patch
>
>
> As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
> UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request 
> which is bad for performance therefore it'd be nice if such AEs could be 
> reused whenever that's possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable

2013-09-03 Thread asif (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757468#comment-13757468
 ] 

asif commented on SOLR-4956:


Just to put things in perspective - I ran a few more tests on our setup. 

We buffer up to 1000 documents before we post to the cloud every 5-10 seconds - 
on average I notice 2-4 times higher CPU on the replicas when 
maxBufferedAddsPerServer is set to 10. 

The fact that 1000 documents are sent as 100 separate requests of 10 documents 
each might explain the higher load on the replicas. 

When we raise it to about 1000, CPU usage is more or less in line.

> make maxBufferedAddsPerServer configurable
> --
>
> Key: SOLR-4956
> URL: https://issues.apache.org/jira/browse/SOLR-4956
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.3, 5.0
>Reporter: Erick Erickson
>
> Anecdotal users-list evidence indicates that in high-throughput situations, 
> the default of 10 docs/batch for inter-shard batching can generate 
> significant CPU load. See the thread titled "Sharding and Replication" on 
> June 19th, but the gist is below.
> I haven't poked around, but it's a little surprising on the surface that Asif 
> is seeing this kind of difference. So I'm wondering if this change indicates 
> some other underlying issue. Regardless, this seems like it would be good to 
> investigate.
> Here's the gist of Asif's experience from the thread:
> It's a completely practical problem - we are exploring Solr to build a 
> real-time analytics/data solution for a system handling about 1000 qps. We 
> have various metrics that are stored as different collections on the cloud,
> which means a very high volume of writes. The cloud also needs to support
> about 300-400 qps.
> We initially tested with a single Solr node on a 16 core / 24 GB box for a
> single metric. We saw that writes were not an issue at all - Solr was
> handling them extremely well. We were also able to achieve about 200 qps 
> from a single node.
> When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
> usage on the replicas. Up to 10 cores were getting used for writes on the
> replicas. Hence my concern with respect to batch updates for the replicas.
> BTW, I altered maxBufferedAddsPerServer to 1000 - and now CPU usage is very 
> similar to the single-node installation.
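
A sketch of the client-side batching Asif describes (SolrJ 4.x; the endpoint,
collection name, and surrounding loop are hypothetical):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/metrics");
    List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();
    // ... fill buffer from the incoming stream ...
    if (buffer.size() >= 1000) {   // client posts in batches of ~1000 docs
      server.add(buffer);          // one request to the cloud; the receiving
      buffer.clear();              // shard then forwards to replicas in groups
    }                              // of maxBufferedAddsPerServer (default 10)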

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-03 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757466#comment-13757466
 ] 

Hoss Man commented on SOLR-2548:


bq. This bit in SimpleFacets.getFacetFieldCounts bothers me:
...
bq. It seems like if the user doesn't specify anything for FACET_THREADS, they 
wind up spawning as many threads as there are facet fields specified

I haven't reviewed the patch, but based on the snippet you posted I suspect you 
are reading that bit correctly.

If FACET_THREADS isn't specified, or if it's specified and equals the default 
value of 0, then the directExecutor is used and _no_ threads should be spawned 
at all -- the value of maxThreads shouldn't matter at that point; instead the 
existing request thread should process all of them sequentially.

I'm guessing you should change the patch back.

Side comments...

1) Something sketchy probably does happen if a user passes in a negative value 
-- it looks like that's the case where facetExecutor will be used with an 
unlimited number of threads ... that may actually have been intentional -- that 
if you say facet.threads=-1 every facet.field should get its own thread, no 
matter how many there are, but if that is intentional I'd love to see a comment 
there making that obvious (and a test showing that it works).

2) Can you please fix that Integer.parseInt(..."0")) to just use 
params.getInt(..., 0)? That way the correct error message will be returned if 
it's not an int (and it's easier to read)
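
Concretely, point 2 would turn the quoted snippet into something like:

    int maxThreads = req.getParams().getInt(FacetParams.FACET_THREADS, 0);
    Executor executor = maxThreads == 0 ? directExecutor : facetExecutor;
    maxThreads = maxThreads <= 0 ? Integer.MAX_VALUE : maxThreads;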



> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: term collection frequence in lucene 3.6.2?

2013-09-03 Thread jiangwen jiang
Thanks, Mike


2013/9/3 Michael McCandless 

> 3.6.x doesn't track this statistic, but 4.x does:
> TermsEnum.totalTermFreq().
>
> In 3.6.x you could visit every doc, summing up the .freq(), but this is
> slowish.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Sep 3, 2013 at 4:19 AM, jiangwen jiang 
> wrote:
> > Hi, guys.
> >
> > Term collection frequency (which means how many times a particular term
> > appears across all documents) - does this data exist in Lucene 3.6.2?
> >
> > For example:
> > doc1 contains terms: T1 T2 T3 T1 T1
> > doc2 contains terms: T1 T4 T4
> > 
> >
> > T1 appears 4 times in all documents, so the term collection freq of T1 is 4
> >
> > Thanks for your help
> > Regards
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
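
For reference, the 3.6.x workaround Mike describes might be sketched like this
(assumes an open IndexReader; slowish, as he notes):

    // Lucene 3.6.x has no stored collection-frequency statistic, so sum
    // freq() over every document that contains the term.
    TermDocs termDocs = reader.termDocs(new Term("content", "T1"));
    long collectionFreq = 0;
    while (termDocs.next()) {
      collectionFreq += termDocs.freq();
    }
    termDocs.close();
    // On 4.x this is simply TermsEnum.totalTermFreq().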


[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757419#comment-13757419
 ] 

ASF subversion and git services commented on LUCENE-3069:
-

Commit 1519909 from [~billy] in branch 'dev/branches/lucene3069'
[ https://svn.apache.org/r1519909 ]

LUCENE-3069: javadocs

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5200) HighFreqTerms has confusing behavior with -t option

2013-09-03 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5200:
---

 Summary: HighFreqTerms has confusing behavior with -t option
 Key: LUCENE-5200
 URL: https://issues.apache.org/jira/browse/LUCENE-5200
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/other
Reporter: Robert Muir


{code}
 * HighFreqTerms class extracts the top n most frequent terms
 * (by document frequency) from an existing Lucene index and reports their
 * document frequency.
 * 
 * If the -t flag is given, both document frequency and total tf (total
 * number of occurrences) are reported, ordered by descending total tf.
{code}

Problem #1:
It's tricky what happens with -t: if you ask for the top-100 terms, it requests 
the top-100 terms (by docFreq), then re-sorts that top N by totalTermFreq.

So it's not really the top 100 most frequently occurring terms.

Problem #2: 
Using the -t option can be confusing and slow: the reported docFreq includes 
deletions, but totalTermFreq does not (it actually walks postings lists if 
there is even one deletion).

I think this is a relic from the 3.x days when Lucene did not support this 
statistic. I think we should just always output both TermsEnum.docFreq() and 
TermsEnum.totalTermFreq(), and -t just determines the comparator of the PQ.
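
A sketch of the always-output-both behavior proposed above (4.x API; the field
name is hypothetical and reader is an open IndexReader):

    Terms terms = MultiFields.getTerms(reader, "body");
    TermsEnum te = terms.iterator(null);
    BytesRef term;
    while ((term = te.next()) != null) {
      // report both statistics for every term; -t would only pick
      // which one the PQ uses as its comparator
      System.out.println(term.utf8ToString()
          + " docFreq=" + te.docFreq()
          + " totalTermFreq=" + te.totalTermFreq());
    }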


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2548) Multithreaded faceting

2013-09-03 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2548:
-

Attachment: SOLR-2548.patch

Latest version that does two things:

1> does the max-thread change I commented on earlier.

2> puts in some checking to ensure that if multiple threads try to uninvert the 
same field at the same time, it'll only be loaded once. I used a simple 
wait/sleep loop here since this method is called from several places and it 
looks like a real pain to try to do a Future or whatever.
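
The guard in 2> is roughly this shape (hypothetical names, not the actual patch
code; assumes cache is a thread-safe map):

    private final Set<String> inProgress =
        Collections.synchronizedSet(new HashSet<String>());

    UnInvertedField getUninverted(String field) throws InterruptedException {
      while (!inProgress.add(field)) {  // false => another thread owns the load
        Thread.sleep(10);               // the simple wait/sleep loop
      }
      try {
        UnInvertedField uif = cache.get(field);  // may have just been loaded
        if (uif == null) {
          uif = uninvert(field);                 // the expensive part
          cache.put(field, uif);
        }
        return uif;
      } finally {
        inProgress.remove(field);
      }
    }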

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757323#comment-13757323
 ] 

Erick Erickson commented on SOLR-2548:
--

This bit in SimpleFacets.getFacetFieldCounts bothers me:

int maxThreads = 
Integer.parseInt(req.getParams().get(FacetParams.FACET_THREADS, "0"));
Executor executor = maxThreads == 0 ? directExecutor : facetExecutor;
maxThreads = maxThreads <= 0? Integer.MAX_VALUE : maxThreads;


It seems like if the user doesn't specify anything for FACET_THREADS, they wind 
up spawning as many threads as there are facet fields specified. Probably not a 
real problem given this list will be fairly small, but it seems more true to 
the old default behavior if it's changed to something like

int maxThreads = 
Integer.parseInt(req.getParams().get(FacetParams.FACET_THREADS, "1"));
Executor executor = maxThreads == 1 ? directExecutor : facetExecutor;
maxThreads = maxThreads < 1 ? Integer.MAX_VALUE : maxThreads;


Or am I seeing things that aren't there?

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter

2013-09-03 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5212.


Resolution: Not A Problem

Naomi: if you have questions/confusion/problems using Solr, please ask on the 
solr-user mailing list, and only file bugs once there is confirmation of a 
problem in Solr itself.

In particular, your initial report is confusing for a few reasons...

1) You mentioned the value of "qs" being set based on the number of bigrams -- 
however, there isn't anything in your comments to suggest anything even 
remotely related to the "qs" param is coming into play here.  "qs" specifies 
the query slop property of any phrase queries created due to explicit phrase 
queries in the input query string -- nothing in your example input or example 
debug output suggests any PhraseQueries are ever getting built.

2) The number you seem to be commenting on in each case is the minNrShouldMatch 
on each of the top-level BooleanQueries produced from your input -- since your 
configured mm is {{6<-1 6<90%}}, the smallest minNrShouldMatch value that will 
ever be programmatically assigned is "6", but all of your example queries have 
fewer than 6 clauses, so instead the minNrShouldMatch used in each case is the 
total number of query clauses -- i.e. in each case, where you have N "SHOULD" 
clauses in the final query, all N clauses must match (that is the ~2, ~3, and 
~5 you are seeing in the debug output).

---

Please start a thread on the solr-user mailing list, providing all of the 
details you included in this issue, along with some specifics about what you 
expect/desire to have happen and how the actual behavior you are observing 
differs from those expectations.

> bad qs and mm when using edismax for field with CJKBigramFilter 
> 
>
> Key: SOLR-5212
> URL: https://issues.apache.org/jira/browse/SOLR-5212
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.4
>Reporter: Naomi Dushay
>Priority: Critical
>
> When I have a field using CJKBigramFilter, a mysterious qs value (or what I 
> take as qs, because it shows as ~x after the first DisjunctionMaxQuery) 
> appears in my parsed query.  The qs value that appears is the minimum of:
>   the mm setting, and the number of bigrams in the query string.
> This makes no sense from a retrieval standpoint.  It could possibly make 
> sense to adjust the ps value, but certainly not the qs.  Moreover, changing 
> the mm setting via an HTTP param can affect the qs, but sending in a qs 
> parameter has no effect on the qs in the parsed query.
> If I use a field in qf that has only bigrams, then qs is set to MIN(original 
> mm setting, number of bigrams in query string)
> arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
>旧小说   is 3 chars, so 2 bigrams
> debugQuery
>   {!qf=cjk_bi_search pf= pf2= pf3=}旧小说
>   {!qf=cjk_bi_search pf= pf2= pf3=}旧小说
>   (+DisjunctionMaxQuerycjk_bi_search:旧小 
> cjk_bi_search:小说)~2))~0.01) ())/no_coord
>   +(((cjk_bi_search:旧小 
> cjk_bi_search:小说)~2))~0.01 ()
> If I use a field in qf that has only unigrams, then qs is set to MIN(original 
> mm setting, number of unigrams in query string)
> arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
>旧小说   is 3 chars, so 3 unigrams
> debugQuery
>   {!qf=cjk_uni_search pf= pf2= pf3=}旧小说
>   {!qf=cjk_uni_search pf= pf2= pf3=}旧小说
>   (+DisjunctionMaxQuerycjk_uni_search:旧 
> cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord
>   +(((cjk_uni_search:旧 cjk_uni_search:小 
> cjk_uni_search:说)~3))~0.01 ()
> If I use a field in qf that has both bigrams and unigrams, then qs is set to 
> MIN(original mm setting, number of bigrams + unigrams in query string). 
> arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
>旧小说   is 3 chars, so 3 unigrams + 2 bigrams = 5
> debugQuery
>   {!qf=cjk_both_pub_search pf= pf2= 
> pf3=}旧小说
>   {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说
>   (+DisjunctionMaxQuerycjk_both_search:旧 
> cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
> cjk_both_search:说)~5))~0.01) ())/no_coord
>   +(((cjk_both_search:旧 
> cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
> cjk_both_search:说)~5))~0.01 ()
> I am running Solr 4.4.  I have fields defined like so:
> [schema XML stripped in transit; the surviving attributes show CJK fieldTypes 
> with positionIncrementGap="1" autoGeneratePhraseQueries="false", 
> ICUTransformFilters id="Traditional-Simplified" and id="Katakana-Hiragana", 
> and a CJKBigramFilter with hiragana="true" katakana="true" hangul="true" and 
> outputUnigrams="true"/"false" per type]

Solr 5.0 roadmap added to wiki

2013-09-03 Thread Shawn Heisey
I added a roadmap section to the Solr5.0 page.  At this moment I can
only think of one major thing we are planning for the 5.0 release, and I
put it on there.

https://wiki.apache.org/solr/Solr5.0

If that should be a separate page rather than part of the main 5.0 page,
I'm perfectly OK with it being changed.

Thanks,
Shawn

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter

2013-09-03 Thread Naomi Dushay (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naomi Dushay updated SOLR-5212:
---

Description: 
When I have a field using CJKBigramFilter, a mysterious qs value (or what I 
take as qs, because it shows as ~x after the first DisjunctionMaxQuery) appears 
in my parsed query.  The qs value that appears is the minimum of:
  the mm setting, and the number of bigrams in the query string.

This makes no sense from a retrieval standpoint.  It could possibly make sense 
to adjust the ps value, but certainly not the qs.  Moreover, changing the mm 
setting via an HTTP param can affect the qs, but sending in a qs parameter has 
no effect on the qs in the parsed query.

If I use a field in qf that has only bigrams, then qs is set to MIN(original mm 
setting, number of bigrams in query string)

arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 2 bigrams

debugQuery
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01) ())/no_coord
+(((cjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01 ()


If I use a field in qf that has only unigrams, then qs is set to MIN(original 
mm setting, number of unigrams in query string)

arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams

debugQuery
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_uni_search:旧 
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord
+(((cjk_uni_search:旧 cjk_uni_search:小 
cjk_uni_search:说)~3))~0.01 ()


If I use a field in qf that has both bigrams and unigrams, then qs is set to 
MIN(original mm setting, number of bigrams + unigrams in query string). 

arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams + 2 bigrams = 5

debugQuery
{!qf=cjk_both_pub_search pf= pf2= 
pf3=}旧小说
{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01) ())/no_coord
+(((cjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01 ()




I am running Solr 4.4.  I have fields defined like so:

[schema XML stripped in transit; per the fragments surviving elsewhere in this 
thread: three CJK fieldTypes with positionIncrementGap="1" and 
autoGeneratePhraseQueries="false", ICUTransformFilters 
id="Traditional-Simplified" and id="Katakana-Hiragana", and a CJKBigramFilter 
with outputUnigrams toggled per type]

The request handler uses edismax:

[requestHandler XML stripped in transit; surviving values: edismax, *:*, 
6<-1 6<90% (the mm), 1, 0]

  was:
When I have a field using CJKBigramFilter, a mysterious qs value (or what I 
take as qs, because it shows as ~x after the first DisjunctionMaxQuery) appears 
in my parsed query.  The qs value that appears is the minimum of:
  the mm setting, and the number of bigrams in the query string.

This makes no sense from a retrieval standpoint.  It could possibly make sense 
to adjust the ps value, but certainly not the qs.

If I use a field in qf that has only bigrams, then qs is set to MIN(original mm 
setting, number of bigrams in query string)

arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 2 bigrams

debugQuery
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01) ())/no_coord
+(((cjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01 ()


If I use a field in qf that has only unigrams, then qs is set to MIN(original 
mm setting, number of unigrams in query string)

arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams

debugQuery
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_uni_search:旧 
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord
+(((cjk_uni_search:旧 cjk_uni_search:小 
cjk_uni_search:说)~3))~0.01 ()


If I use a field in qf that has both bigrams and unigrams, then qs is set to 
MIN(original mm setting, number of bigrams + unigrams in query string). 

arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams + 2 bigrams = 5

debugQuery
{!qf=cjk_both_pub_search pf= pf2= 
pf3=}旧小说
{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01) ())/no_coord
+(((cjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01 ()




I am running Solr 4.4.  I have fields defined like so:

[schema XML stripped in transit]

[jira] [Updated] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter

2013-09-03 Thread Naomi Dushay (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naomi Dushay updated SOLR-5212:
---

Description: 
When I have a field using CJKBigramFilter, a mysterious qs value (or what I 
take as qs, because it shows as ~x after the first DisjunctionMaxQuery) appears 
in my parsed query.  The qs value that appears is the minimum of:
  the mm setting, and the number of bigrams in the query string.

This makes no sense from a retrieval standpoint.  It could possibly make sense 
to adjust the ps value, but certainly not the qs.

If I use a field in qf that has only bigrams, then qs is set to MIN(original mm 
setting, number of bigrams in query string)

arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 2 bigrams

debugQuery
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01) ())/no_coord
+(((cjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01 ()


If I use a field in qf that has only unigrams, then qs is set to MIN(original 
mm setting, number of unigrams in query string)

arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams

debugQuery
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_uni_search:旧 
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord
+(((cjk_uni_search:旧 cjk_uni_search:小 
cjk_uni_search:说)~3))~0.01 ()


If I use a field in qf that has both bigrams and unigrams, then qs is set to 
MIN(original mm setting, number of bigrams + unigrams in query string). 

arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams + 2 bigrams = 5

debugQuery
{!qf=cjk_both_pub_search pf= pf2= 
pf3=}旧小说
{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01) ())/no_coord
+(((cjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01 ()




I am running Solr 4.4.  I have fields defined like so:

[schema XML stripped in transit; per the fragments surviving elsewhere in this 
thread: three CJK fieldTypes with positionIncrementGap="1" and 
autoGeneratePhraseQueries="false", ICUTransformFilters 
id="Traditional-Simplified" and id="Katakana-Hiragana", and a CJKBigramFilter 
with outputUnigrams toggled per type]

The request handler uses edismax:

[requestHandler XML stripped in transit; surviving values: edismax, *:*, 
6<-1 6<90% (the mm), 1, 0]

  was:
When I have a field using CJKBigramFilter, a mysterious qs value appears in my 
parsed query.  The qs value that appears is the minimum of:
  the mm setting, and the number of bigrams in the query string.

If I use a field in qf that has only bigrams, then qs is set to MIN(original mm 
setting, number of bigrams in query string)

arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 2 bigrams

debugQuery
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01) ())/no_coord
+(((cjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01 ()


If I use a field in qf that has only unigrams, then qs is set to MIN(original 
mm setting, number of unigrams in query string)

arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams

debugQuery
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_uni_search:旧 
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord
+(((cjk_uni_search:旧 cjk_uni_search:小 
cjk_uni_search:说)~3))~0.01 ()


If I use a field in qf that has both bigrams and unigrams, then qs is set to 
MIN(original mm setting, number of bigrams + unigrams in query string). 

arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams + 2 bigrams = 5

debugQuery
{!qf=cjk_both_pub_search pf= pf2= 
pf3=}旧小说
{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01) ())/no_coord
+(((cjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01 ()




I am running Solr 4.4.  I have fields defined like so:

[schema XML stripped in transit; per the fragments surviving elsewhere in this 
thread: three CJK fieldTypes with positionIncrementGap="1" and 
autoGeneratePhraseQueries="false", ICUTransformFilters 
id="Traditional-Simplified" and id="Katakana-Hiragana", and a CJKBigramFilter 
with outputUnigrams toggled per type]

The request handler uses edismax:

[requestHandler XML stripped in transit; surviving values: edismax, *:*, 
6<-1 6<90% (the mm), 1, 0]

> bad qs and mm when using edismax for field with CJKBigramFilter 
> 
>
>

[jira] [Created] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter

2013-09-03 Thread Naomi Dushay (JIRA)
Naomi Dushay created SOLR-5212:
--

 Summary: bad qs and mm when using edismax for field with 
CJKBigramFilter 
 Key: SOLR-5212
 URL: https://issues.apache.org/jira/browse/SOLR-5212
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.4
Reporter: Naomi Dushay
Priority: Critical


When I have a field using CJKBigramFilter, a mysterious qs value appears in my 
parsed query.  The qs value that appears is the minimum of:
  the mm setting, and the number of bigrams in the query string.

If I use a field in qf that has only bigrams, then qs is set to MIN(original mm 
setting, number of bigrams in query string)

arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 2 bigrams

debugQuery
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
{!qf=cjk_bi_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01) ())/no_coord
+(((cjk_bi_search:旧小 
cjk_bi_search:小说)~2))~0.01 ()


If I use a field in qf that has only unigrams, then qs is set to MIN(original 
mm setting, number of unigrams in query string)

arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams

debugQuery
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
{!qf=cjk_uni_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_uni_search:旧 
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord
+(((cjk_uni_search:旧 cjk_uni_search:小 
cjk_uni_search:说)~3))~0.01 ()


If I use a field in qf that has both bigrams and unigrams, then qs is set to 
MIN(original mm setting, number of bigrams + unigrams in query string). 

arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams + 2 bigrams = 5

debugQuery
{!qf=cjk_both_pub_search pf= pf2= 
pf3=}旧小说
{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说
(+DisjunctionMaxQuerycjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01) ())/no_coord
+(((cjk_both_search:旧 
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 
cjk_both_search:说)~5))~0.01 ()




I am running Solr 4.4.  I have fields defined like so:

[schema XML stripped in transit; per the fragments surviving elsewhere in this 
thread: three CJK fieldTypes with positionIncrementGap="1" and 
autoGeneratePhraseQueries="false", ICUTransformFilters 
id="Traditional-Simplified" and id="Katakana-Hiragana", and a CJKBigramFilter 
with outputUnigrams toggled per type]

The request handler uses edismax:

[requestHandler XML stripped in transit; surviving values: edismax, *:*, 
6<-1 6<90% (the mm), 1, 0]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4909) Solr and IndexReader Re-opening on Replication Slave

2013-09-03 Thread Michael Garski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Garski updated SOLR-4909:
-

Attachment: SOLR-4909_confirm_keys.patch

I've updated the patch to include the initial directory opened via the core's 
indexReaderFactory, and included a test that verifies the value of the core 
cache key's hash code after a commit.

> Solr and IndexReader Re-opening on Replication Slave
> 
>
> Key: SOLR-4909
> URL: https://issues.apache.org/jira/browse/SOLR-4909
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java), search
>Affects Versions: 4.3
>Reporter: Michael Garski
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4909_confirm_keys.patch, SOLR-4909-demo.patch, 
> SOLR-4909_fix.patch, SOLR-4909.patch, SOLR-4909_v2.patch, SOLR-4909_v3.patch
>
>
> I've been experimenting with caching filter data per segment in Solr using a 
> CachingWrapperFilter & FilteredQuery within a custom query parser (as 
> suggested by [~yo...@apache.org] in SOLR-3763) and encountered situations 
> where the value of getCoreCacheKey() on the AtomicReader for each segment can 
> change for a given segment on disk when the searcher is reopened. As 
> CachingWrapperFilter uses the value of the segment's getCoreCacheKey() as the 
> key in the cache, there are situations where the data cached on that segment 
> is not reused when the segment on disk is still part of the index. This 
> affects the Lucene field cache and field value caches as well as they are 
> cached per segment.
> When Solr first starts it opens the searcher's underlying DirectoryReader in 
> StandardIndexReaderFactory.newReader by calling 
> DirectoryReader.open(indexDir, termInfosIndexDivisor), and the reader is 
> subsequently reopened in SolrCore.openNewSearcher by calling 
> DirectoryReader.openIfChanged(currentReader, writer.get(), true). The act of 
> reopening the reader with the writer when it was first opened without a 
> writer results in the value of getCoreCacheKey() changing on each of the 
> segments even though some of the segments have not changed. Depending on the 
> role of the Solr server, this has different effects:
> * On a SolrCloud node or free-standing index and search server the segment 
> cache is invalidated during the first DirectoryReader reopen - subsequent 
> reopens use the same IndexWriter instance and as such the value of 
> getCoreCacheKey() on each segment does not change so the cache is retained. 
> * For a master-slave replication set up the segment cache invalidation occurs 
> on the slave during every replication as the index is reopened using a new 
> IndexWriter instance which results in the value of getCoreCacheKey() changing 
> on each segment when the DirectoryReader is reopened using a different 
> IndexWriter instance.
> I can think of a few approaches to alter the re-opening behavior to allow 
> reuse of segment level caches in both cases, and I'd like to get some input 
> on other ideas before digging in:
> * To change the cloud node/standalone first commit issue it might be possible 
> to create the UpdateHandler and IndexWriter before the DirectoryReader, and 
> use the writer to open the reader. There is a comment in the SolrCore 
> constructor by [~yo...@apache.org] that the searcher should be opened before 
> the update handler so that may not be an acceptable approach. 
> * To change the behavior of a slave in a replication set up, one solution 
> would be to not open a writer from the SnapPuller when the new index is 
> retrieved if the core is enabled as a slave only. The writer is needed on a 
> server configured as a master & slave that is functioning as a replication 
> repeater so downstream slaves can see the changes in the index and retrieve 
> them.
> I'll attach a unit test that demonstrates the behavior of reopening the 
> DirectoryReader and its effects on the value of getCoreCacheKey. My 
> assumption is that the behavior of Lucene during the various reader reopen 
> operations is correct and that the changes are necessary on the Solr side of 
> things.
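
The reopen sequence described above, sketched (Lucene 4.x API; the directory,
analyzer, and version constant are assumptions):

    DirectoryReader r1 = DirectoryReader.open(dir);   // opened without a writer
    Object k1 = r1.leaves().get(0).reader().getCoreCacheKey();

    IndexWriter w = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_43, analyzer));
    DirectoryReader r2 = DirectoryReader.openIfChanged(r1, w, true);
    if (r2 != null) {
      Object k2 = r2.leaves().get(0).reader().getCoreCacheKey();
      // k1 != k2 here even for unchanged segments, which is what invalidates
      // the per-segment caches keyed on getCoreCacheKey()
    }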

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757113#comment-13757113
 ] 

ASF subversion and git services commented on SOLR-5206:
---

Commit 1519858 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1519858 ]

SOLR-5206: Fixed OpenExchangeRatesOrgProvider to use refreshInterval correctly

> OpenExchangeRatesOrgProvider never refreshes the rates
> --
>
> Key: SOLR-5206
> URL: https://issues.apache.org/jira/browse/SOLR-5206
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.4
>Reporter: Catalin
>Priority: Critical
> Fix For: 4.4
>
> Attachments: fixRefresh.patch, SOLR-5206.patch
>
>
> The OpenExchangeRatesOrgProvider never reloads the rates after the initial 
> load, no matter what refreshInterval is set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates

2013-09-03 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5206.


   Resolution: Fixed
Fix Version/s: (was: 4.4)
   5.0
   4.5
 Assignee: Hoss Man

Thanks for reporting this Catalin.

> OpenExchangeRatesOrgProvider never refreshes the rates
> --
>
> Key: SOLR-5206
> URL: https://issues.apache.org/jira/browse/SOLR-5206
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.4
>Reporter: Catalin
>Assignee: Hoss Man
>Priority: Critical
> Fix For: 4.5, 5.0
>
> Attachments: fixRefresh.patch, SOLR-5206.patch
>
>
> The OpenExchangeRatesOrgProvider never reloads the rates after the initial 
> load, no matter what refreshInterval is set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757137#comment-13757137
 ] 

ASF subversion and git services commented on SOLR-5206:
---

Commit 1519865 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1519865 ]

SOLR-5206: Fixed OpenExchangeRatesOrgProvider to use refreshInterval correctly 
(merge r1519858)

> OpenExchangeRatesOrgProvider never refreshes the rates
> --
>
> Key: SOLR-5206
> URL: https://issues.apache.org/jira/browse/SOLR-5206
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.4
>Reporter: Catalin
>Priority: Critical
> Fix For: 4.4
>
> Attachments: fixRefresh.patch, SOLR-5206.patch
>
>
> The OpenExchangeRatesOrgProvider never reloads the rates after the initial 
> load, no matter what refreshInterval is set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757111#comment-13757111
 ] 

Robert Muir commented on LUCENE-5188:
-

nice idea!

> Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
> ---
>
> Key: LUCENE-5188
> URL: https://issues.apache.org/jira/browse/LUCENE-5188
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5188.patch
>
>
> The way CompressingStoredFieldsFormat works is that it first decompresses 
> data and then consults the StoredFieldVisitor. This is a bit wasteful in case 
> documents are big and only the first field of a document is of interest so 
> maybe we could decompress and consult the StoredFieldVicitor in a more 
> streaming fashion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757109#comment-13757109
 ] 

Dawid Weiss commented on LUCENE-5197:
-

> And from a big O perspective, this might be just fine.

He, he. From a big-O perspective the cost of RUE vs. custom code is negligible 
;)

Anyway, fine -- I still think it'd be a nice addition to RUE to allow selective 
counting (to exclude certain fields from the equation) and it'd be a perfect 
use case to apply here. But it can be used in other places (like in tests to 
count static memory usage held by a class, etc.).
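
For reference, the RUE call under discussion
(org.apache.lucene.util.RamUsageEstimator, 4.x):

    long bytes = RamUsageEstimator.sizeOf(someObject);  // reflective walk of the whole object graph

The selective counting suggested above would let a caller exclude chosen fields
from that walk.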

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors

2013-09-03 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5188:
-

Attachment: LUCENE-5188.patch

Here is a patch that slices large chunks (>= twice the configured chunk size) 
into several LZ4 blocks (of chunkSize bytes each). The LZ4 blocks will be 
decompressed as needed so that you don't end up decompressing everything if you 
only need the first field of your document.

A nice side effect of this patch is that it also reduces memory pressure when 
working with big documents (LUCENE-4955): since big documents are sliced into 
fixed-size blocks, it is no longer necessary to allocate a byte[] the size of 
the whole document (potentially several MB) to decompress it.
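
The slicing idea, illustrated very loosely (toy code, with java.util.zip
standing in for LZ4; not the patch itself):

    import java.io.ByteArrayOutputStream;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.Deflater;

    // Compress a big document as independent fixed-size blocks so a reader
    // can stop decompressing after the blocks it actually needs.
    static List<byte[]> sliceAndCompress(byte[] doc, int chunkSize) {
      List<byte[]> blocks = new ArrayList<byte[]>();
      for (int off = 0; off < doc.length; off += chunkSize) {
        int len = Math.min(chunkSize, doc.length - off);
        Deflater d = new Deflater();
        d.setInput(doc, off, len);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[8192];
        while (!d.finished()) {
          out.write(tmp, 0, d.deflate(tmp));
        }
        d.end();
        blocks.add(out.toByteArray());  // each block decompresses on its own
      }
      return blocks;
    }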

> Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
> ---
>
> Key: LUCENE-5188
> URL: https://issues.apache.org/jira/browse/LUCENE-5188
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5188.patch
>
>
> The way CompressingStoredFieldsFormat works is that it first decompresses 
> data and then consults the StoredFieldVisitor. This is a bit wasteful in case 
> documents are big and only the first field of a document is of interest so 
> maybe we could decompress and consult the StoredFieldVicitor in a more 
> streaming fashion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-09-03 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757041#comment-13757041
 ] 

Kranti Parisa commented on SOLR-4465:
-

Anyone applied this patch on 4.4 branch?

> Configurable Collectors
> ---
>
> Key: SOLR-4465
> URL: https://issues.apache.org/jira/browse/SOLR-4465
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.1
>Reporter: Joel Bernstein
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch 
> was generated and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in the solrconfig.xml using the new 
> collectorFactory element. For example:
> [two collectorFactory XML elements stripped in transit]
> The elements above define two collector factories. The first one is the 
> "default" collectorFactory. The class attribute points to 
> org.apache.solr.handler.component.CollectorFactory, which implements logic 
> that returns the default TopScoreDocCollector and TopFieldCollector. 
> To create your own collectorFactory you must subclass the default 
> CollectorFactory and at a minimum override the getCollector method to return 
> your new collector. 
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default 
> collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs 
> collector. For example:
> cl.docs=
> The above param points to a named collectorFactory in the solrconfig.xml to 
> construct the collector. The docs collectorFactorys must return a collector 
> that extends the TopDocsCollector base class. Docs collectors are responsible 
> for collecting the doclist.
> You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For 
> example:
> cl.docs=\{! sort=mycustomesort\}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default 
> collectorFactory to create the docs collector.
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the 
> "cl.analytic" parameter. Analytic collectors are designed to collect 
> something else besides the doclist. Typically this would be some type of 
> custom analytic. For example:
> cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs 
> collectors, "sum" points to a named collectorFactory in the solrconfig.xml. 
> You can specify any number of analytic collectors by adding additional 
> cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend 
> DelegatingCollector. 
> A sample analytic collector is provided in the patch through the 
> org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups 
> by a field and sums a column of floats. The sum collector is not designed to 
> be a fully functional sum function but to be a proof of concept for pluggable 
> analytics through delegating collectors.
> You can send parameters to analytic collectors with solr local param syntax.
> For example:
> cl.analytic=\{! id=1 groupby=field1 column=field2\}sum
> The "id" parameter is mandatory for analytic collectors and is used to 
> identify the output from the collector. In this example the "groupby" and 
> "column" params tell the sum collector which field to group by and sum.
> Analytic collectors are passed a reference to the ResponseBuilder and can 
> place maps with analytic output directly into the SolrQueryResponse with the 
> add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the 
> outgoing response. The response will include a list named cl.analytic.<id>, 
> where id is the value specified in the local param.
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates 
> the results from each of the shards during distributed search. The "default" 
> CollectoryFactory implements the default merge logic for merging documents 
> from each shard. If you define a different docs collector you can override 
> the default merge method to merge documents in accordance with how they are 
> collected at the shard level.
> With analytic collectors, you'll need to override the merge method
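
Pulling the description together, a full request using the hypothetical
collector names above would carry parameters like:

    cl=true&cl.docs={! sort=mycustomesort}mycollector&cl.analytic={! id=1 groupby=field1 column=field2}sum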

[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-09-03 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757058#comment-13757058
 ] 

Joel Bernstein commented on SOLR-4465:
--

This ticket has been split into smaller tickets with a different design. See 
the related issues for more info.

> Configurable Collectors
> ---
>
> Key: SOLR-4465
> URL: https://issues.apache.org/jira/browse/SOLR-4465
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.1
>Reporter: Joel Bernstein
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
> SOLR-4465.patch, SOLR-4465.patch
>
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch 
> was generated and tested with Solr 4.1.
> This is how the patch functions:
> Collectors are plugged into Solr in the solrconfig.xml using the new 
> collectorFactory element. For example:
> [two collectorFactory XML elements stripped in transit]
> The elements above define two collector factories. The first one is the 
> "default" collectorFactory. The class attribute points to 
> org.apache.solr.handler.component.CollectorFactory, which implements logic 
> that returns the default TopScoreDocCollector and TopFieldCollector. 
> To create your own collectorFactory you must subclass the default 
> CollectorFactory and at a minimum override the getCollector method to return 
> your new collector. 
> The parameter "cl" turns on pluggable collectors:
> cl=true
> If cl is not in the parameters, Solr will automatically use the default 
> collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs 
> collector. For example:
> cl.docs=
> The above param points to a named collectorFactory in the solrconfig.xml to 
> construct the collector. The docs collectorFactorys must return a collector 
> that extends the TopDocsCollector base class. Docs collectors are responsible 
> for collecting the doclist.
> You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For 
> example:
> cl.docs=\{! sort=mycustomesort\}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default 
> collectorFactory to create the docs collector.
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the 
> "cl.analytic" parameter. Analytic collectors are designed to collect 
> something else besides the doclist. Typically this would be some type of 
> custom analytic. For example:
> cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs 
> collectors, "sum" points to a named collectorFactory in the solrconfig.xml. 
> You can specify any number of analytic collectors by adding additional 
> cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend 
> DelegatingCollector. 
> A sample analytic collector is provided in the patch through the 
> org.apache.solr.handler.component.SumCollectorFactory.
> This collectorFactory provides a very simple DelegatingCollector that groups 
> by a field and sums a column of floats. The sum collector is not designed to 
> be a fully functional sum function but to be a proof of concept for pluggable 
> analytics through delegating collectors.
> You can send parameters to analytic collectors with solr local param syntax.
> For example:
> cl.analytic=\{! id=1 groupby=field1 column=field2\}sum
> The "id" parameter is mandatory for analytic collectors and is used to 
> identify the output from the collector. In this example the "groupby" and 
> "column" params tell the sum collector which field to group by and sum.
> Analytic collectors are passed a reference to the ResponseBuilder and can 
> place maps with analytic output directly into the SolrQueryResponse with the 
> add() method.
> Maps that are placed in the SolrQueryResponse are automatically added to the 
> outgoing response. The response will include a list named cl.analytic.<id>, 
> where id is the value specified in the local param.
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates 
> the results from each of the shards during distributed search. The "default" 
> CollectoryFactory implements the default merge logic for merging documents 
> from each shard. If you define a different docs collector you can override 
> the default merge method to merge documents in accordance with how they are 
> collected at the shard level.

[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Areek Zillur (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757021#comment-13757021
 ] 

Areek Zillur commented on LUCENE-5197:
--

>Can you elaborate on this? Where was it incorrect?
  - by incorrect I meant walking up undesirable member variables in an 
object. In hindsight, I would say that was a bad choice of wording. I think the 
correct word would be inflexible.

I do like the RamUsageEstimator "aware"-ness idea. I think that along with some 
kind of filtering mechanism in RamUsageEstimator would be perfect.

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2013-09-03 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756970#comment-13756970
 ] 

Yonik Seeley commented on SOLR-5211:


bq. I'm observing an interesting behavior. If I delete one parent doc, 
ToParentBJQ doesn't stick orphan children to the next parent, but it happens 
after optimize! 

This seems fine - deleting a parent doc and not the children results in 
undefined behavior.

bq.  it seems ToParentBJQ doesn't mind deletes in parents filter.

Right - that seems fine too.

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 5.0
>Reporter: Mikhail Khludnev
> Fix For: 4.5, 5.0
>
>
> if I have a parent with children in the index, I can send an update omitting 
> the children. As a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757009#comment-13757009
 ] 

Robert Muir commented on LUCENE-5197:
-

{quote}
What you're doing is actually a skewed view – it measures certain fields 
selectively.
{quote}

And from a big O perspective, this might be just fine.

The way I see it, this would be a way to see how much RAM the Lucene segment 
needs for someone's content.
Things like the terms index and docvalues fields grow according to the content 
in different ways: e.g. how large/how many terms you have, how they share 
prefixes, how many documents you have, and so on.

The "skew" is just boring constants pulled out of the equation, even if its 2KB 
or so, its not interesting at all since its just a constant cost independent of 
the content.

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756884#comment-13756884
 ] 

Robert Muir commented on LUCENE-5189:
-

It really doesn't work: it's definitely a blocker for me!

This leaves the general api (FieldInfo.attributes and SegmentInfo.attributes) 
broken for codecs, and only hacks a specific implementation that uses them.

With or without the current boolean, if a numeric docvalues impl puts something 
in FieldInfo.attributes during an update, it will go into a black hole, because 
FieldInfos is write-once per-segment (and not per-commit). Same goes with 
SegmentInfo.attributes.


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates

2013-09-03 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5206:
---

Attachment: SOLR-5206.patch

Bahg!

That's terrible .. thanks for reporting this and identifying the fix, Catalin.

I've attached a patch including Catalin's fix (along with a bit of refactoring 
to match) and a fix so that the "reload" test actually tests something useful.

doing more testing now.
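
For illustration, the shape of the staleness check involved (field and method
names here are hypothetical; the actual patch may differ):

{code}
// Hypothetical sketch: reload when the cached rates are older than
// refreshInterval, instead of only loading them once on first use.
private void reloadIfStale() {
  long intervalMillis = refreshIntervalMinutes * 60L * 1000L;
  if (rates == null
      || System.currentTimeMillis() - ratesLoadedAt > intervalMillis) {
    reload();                                   // fetch fresh rates
    ratesLoadedAt = System.currentTimeMillis(); // remember when we loaded
  }
}
{code}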

> OpenExchangeRatesOrgProvider never refreshes the rates
> --
>
> Key: SOLR-5206
> URL: https://issues.apache.org/jira/browse/SOLR-5206
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.4
>Reporter: Catalin
>Priority: Critical
> Fix For: 4.4
>
> Attachments: fixRefresh.patch, SOLR-5206.patch
>
>
> The OpenExchangeRatesOrgProvider never reloads the rates after the initial 
> load, no matter what refreshInterval is set.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2013-09-03 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756947#comment-13756947
 ] 

Mikhail Khludnev commented on SOLR-5211:


I'm observing an interesting behavior. If I delete one parent doc, ToParentBJQ 
doesn't stick orphan children to the next parent, but it happens after 
optimize! It seems ToParentBJQ doesn't mind deletes in the parents filter. Isn't 
it a separate LUCENE issue?

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 5.0
>Reporter: Mikhail Khludnev
> Fix For: 4.5, 5.0
>
>
> if I have a parent with children in the index, I can send an update omitting 
> the children. As a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Areek Zillur (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756905#comment-13756905
 ] 

Areek Zillur edited comment on LUCENE-5197 at 9/3/13 6:50 PM:
--

Attached a patch removing redundant null checks as Adrien suggested.

First of all, I wanted to thank everybody for their valuable inputs.

The reasons why I chose to have an explicit method to calculate the heap size 
rather than using the RamUsageEstimator have already surfaced in the discussion 
above (slow for many objects, incorrect for some types of objects).
It would be nice to have an API to call from (Solr admin, for example) to 
estimate the current index heap size.

I do understand the concern regarding getting "out of sync with implementation 
changes"; I mainly took only the codecs into account for the size estimation, 
such that higher-level APIs need not implement the method.

The suggested modified RamUsageEstimator sounds nice, but as far as I 
understand, wouldn't the logic implementing the "excluded objects" change 
just as much, while being more implicit than the proposed solution above? 


  was (Author: areek):
Attached a patch removing redundant null checks as Adrien suggested.

First of all wanted to thank everybody for their valuable inputs.

The reasons why I choose to have an explicit method to calculate the heap size 
rather than using the RAMUsageEstimator already has surfaced in the discussion 
above (slow for many objects, incorrect for some type of objects).
It would be nice to have an API to call from (solr Admin for example) to 
estimate the current index heap size.

I do understand the concern regarding the "out of sync with implementation 
changes" concern, I mainly took into account the 
codecs for the size estimation only such that higher level APIs need not to 
implement the method.

The suggested modified RAMUsageEstimator sounds nice, but as far as I 
understand would not the logic in implementing the "excluded objects" change 
just as much? while being more implicit than the proposed solution above? 

  
> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5197:
-

Attachment: LUCENE-5197.patch

Attached a patch removing redundant null checks as Adrien suggested.

First of all, I wanted to thank everybody for their valuable inputs.

The reasons why I chose to have an explicit method to calculate the heap size 
rather than using the RamUsageEstimator have already surfaced in the discussion 
above (slow for many objects, incorrect for some types of objects).
It would be nice to have an API to call from (Solr admin, for example) to 
estimate the current index heap size.

I do understand the concern regarding getting "out of sync with implementation 
changes"; I mainly took only the codecs into account for the size estimation, 
such that higher-level APIs need not implement the method.

The suggested modified RamUsageEstimator sounds nice, but as far as I 
understand, wouldn't the logic implementing the "excluded objects" change 
just as much, while being more implicit than the proposed solution above? 


> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756896#comment-13756896
 ] 

Robert Muir commented on LUCENE-5189:
-

By the way: the "general" issue is that for updates, its unfortunately not 
enough to concern ourselves with data, we have to worry about metadata too:

I see at least 4 problems (and i have not thought about it completely):
# FieldInfo.attributes: these "writes" by the NumericDocValues impl will be 
completely discarded during update, because its per-segment, not per-commit.
# SegmentInfo.attributes: same as the above
# Field doesn't exist in FieldInfo at all (because the segment the update 
applies to happens to have no values for the field).
# Field exists in FieldInfo, but is incomplete (because the segment the update 
applies to had, say, a stored-only or stored+indexed value for the field, but no 
dv one).

PerFieldDVF is just one implementation that happens to use #1. Fixing it is 
fixing the symptom; that's why I say we really need to instead fix the disease, 
or things will get very ugly.

The only reason you don't see more problems with #1 and #2 is that currently 
they're not used very much (only by PerField and back-compat). If we had more 
codecs exercising the APIs, you would be seeing these problems already.

A perfectly good solution would be to remove these APIs completely for public 
use (which would solve #1 and #2). PerField(PF/DVF) could write its own .per 
file instead. Back compat cruft could then use these now-internal-only-APIs 
(and it wont matter since they dont support updates), or we could implement 
their hacks in another way.

But this still leaves issues like #3 and #4.

Adding a boolean 'isFieldUpdate' doesn't really solve anything, and it totally 
breaks the whole concept of the codec being unaware of updates.

It is the wrong direction.


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756931#comment-13756931
 ] 

Shai Erera commented on LUCENE-5189:


OK, so now I get your point. The problem is that we pass the Codec FI.attributes 
with, say, an attribute 'foo=bar'. The Codec, unaware that this is an update, 
looks at the given numericFields and decides to encode them using method 
"bar2", so it encodes into the attributes 'foo=bar2', but those attributes get 
lost because they're not rewritten to FIS. Do I understand correctly?

Of course, we could say that since the Codec has to peek into 
SWS.isFieldUpdate, thereby making it updates-aware, it should not encode stuff 
in a different format, but SWS.isFieldUpdate is not enough to enforce that.

I don't think that gen'ing FIS solves the problem of obtaining the right DVF in 
the first place. Sure, after we do that, the Codec can put whatever attributes 
that it wants, they will be recorded in the new FIS.gen.

But maybe we can solve these two problems by gen'ing FIS:

* Add FieldInfo.dvGen. The Codec will receive the FieldInfos with their dvGen 
bumped up.
* Codec can choose to look at FI.dvGen and pull the right DVF e.g. like 
PerField does.
** Or it can choose to completely ignore it, and always write updates using the 
new format.
* Codec is free to record whatever attributes it wants on this FI. Since we gen 
FIS, they will be recorded and used by the reader.

What do you think?
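
For concreteness, a tiny sketch of the first bullet (hypothetical; none of
this is committed API, and the real FieldInfo has much more to it):

{code}
// Sketch of the proposal above -- a per-field docvalues generation,
// recorded in gen'd FieldInfos, that a codec could consult per update.
public final class FieldInfo {
  // -1 means no docvalues updates yet; bumped on each update
  private long dvGen = -1;

  public long getDocValuesGen() { return dvGen; }

  void setDocValuesGen(long gen) { this.dvGen = gen; }
}
{code}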

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756924#comment-13756924
 ] 

Dawid Weiss commented on LUCENE-5197:
-

> incorrect for some type of objects

Can you elaborate on this? Where was it incorrect?

> would not the logic in implementing the "excluded objects" change just as 
> much?

I think it'd be much simpler to exclude the types of objects we know spin out of 
control -- loggers, thread locals, thread references -- and leave the remaining 
stuff accounted for. After all, if it's referenced it does take space on the heap, 
so the figure is correct. What you're doing is actually a skewed view -- it 
measures certain fields selectively.

I was also thinking in terms of tests -- one can create a sanity test which 
will create a small index, measure its RAM usage and then fail if it seems "too 
large" (because a thread local or some other field was accounted for). I don't 
see a way to do such a sanity check for per-class handcrafted code (unless you 
want to test against RamUsageEstimator, which would duplicate the effort 
anyway).

Let me stress again that I'm not against your patch, I just have a gut feeling 
it'll be a recurring theme of new issues in Jira.

Yet another idea sprang to my mind -- perhaps if speed is an issue and certain 
types of objects can efficiently calculate their RAM usage (FSTs), we could 
make RamUsageEstimator "aware" of such objects by introducing an interface like:
{code}
interface IKnowMySize {
  public long /* super. :) */ sizeMe();
}
{code}

Jokes aside, this could be implemented for classes which indeed have a complex 
structure and the rest (arrays, etc.) could be counted efficiently by walking 
the reference graph. Just a thought.
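
A rough sketch of how the walk could consult such an interface;
RamUsageEstimator.sizeOf is real, while the interface and the short-circuit
are the hypothetical part:

{code}
import org.apache.lucene.util.RamUsageEstimator;

// Sketch: let objects that can compute their own size (e.g. FSTs)
// short-circuit the walk; fall back to the generic graph walk otherwise.
final class SizeAware {
  static long estimateRam(Object o) {
    if (o instanceof IKnowMySize) {
      return ((IKnowMySize) o).sizeMe();
    }
    return RamUsageEstimator.sizeOf(o); // generic object-graph estimate
  }
}
{code}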

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756885#comment-13756885
 ] 

Dawid Weiss commented on LUCENE-5197:
-

> Also it totally breaks down if it hits certain objects like a ThreadLocal.

That's why I suggested a visitor pattern: you could tune it not to enter such 
variables. Also note that if there are lots of objects then the object 
representation overhead itself will be significant and will vary depending on 
each VM, its settings, etc; a specific snippet of code to estimate each 
object's memory use may be faster but it'll be either a nightmare to maintain 
or it'll be a very rough approximation.

I think it'd be better to try to make RUE faster/more flexible. Like Shai 
mentioned -- if it's not a performance-critical API then the difference will 
not be at all significant.
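
To make the exclusion idea concrete, a sketch of the kind of filter a tunable
walk could apply (hypothetical; RamUsageEstimator has no such hook as-is):

{code}
import java.lang.reflect.Field;

// Sketch: skip reference fields whose types are known to spin out of
// control when walking the object graph.
final class RamWalkFilter {
  static boolean shouldFollow(Field f) {
    Class<?> t = f.getType();
    return !(ThreadLocal.class.isAssignableFrom(t)
          || Thread.class.isAssignableFrom(t)
          || t.getName().startsWith("org.slf4j.")); // loggers
  }
}
{code}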

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756863#comment-13756863
 ] 

Shai Erera commented on LUCENE-5189:


I don't understand the problem that you raise. Until then, I think that 
SWS.isFieldUpdate is fine. It works, it's simple, and most importantly, it 
allows me to move forward. Let's discuss how to improve it even further, but I 
don't think this is a blocker. We can always improve that later on.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5211) updating parent as childless makes old children orphans

2013-09-03 Thread Mikhail Khludnev (JIRA)
Mikhail Khludnev created SOLR-5211:
--

 Summary: updating parent as childless makes old children orphans
 Key: SOLR-5211
 URL: https://issues.apache.org/jira/browse/SOLR-5211
 Project: Solr
  Issue Type: Sub-task
  Components: update
Affects Versions: 4.5, 5.0
Reporter: Mikhail Khludnev


if I have a parent with children in the index, I can send an update omitting 
the children. As a result the old children become orphaned. 
I suppose the separate \_root_ field makes much trouble. I propose to extend 
the notion of uniqueKey and let it span across blocks, which makes updates 
unambiguous.  
WDYT? Would you like to see a test that proves this issue?
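
To illustrate the scenario (nested-document XML in the style of the 4.5
block-join work; ids are arbitrary):

{noformat}
<add>
  <doc>
    <field name="id">parent-1</field>
    <doc><field name="id">child-1</field></doc>
  </doc>
</add>

<!-- later: re-send parent-1 with no nested docs; child-1 stays in the
     index with _root_=parent-1, i.e. an orphan -->
{noformat}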

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API

2013-09-03 Thread Tim Vaillancourt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756859#comment-13756859
 ] 

Tim Vaillancourt commented on SOLR-5208:


Thanks Erick.

I agree on RELOAD - I'm not sure if that makes sense or not either, but thought 
of it randomly while listing those commands :). I'll make a new JIRA to discuss 
if that is a good idea or not.

Tim

> Support for the setting of core.properties key/values at create-time on 
> Collections API
> ---
>
> Key: SOLR-5208
> URL: https://issues.apache.org/jira/browse/SOLR-5208
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Tim Vaillancourt
>Assignee: Erick Erickson
> Attachments: SOLR-5208.patch
>
>
> As discussed on e-mail thread "Sharing SolrCloud collection configs 
> w/overrides" 
> (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides),
>  Erick brought up a neat solution using HTTP params at create-time for the 
> Collection API.
> Essentially, this request is for functionality that allows the setting of 
> variables (core.properties) on the Collections API CREATE command.
> Erick's idea:
> "Maybe it's as simple as allowing more params for creation like
> collection.coreName where each param of the form collection.blah=blort
> gets an entry in the properties file blah=blort?..."
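
To make the idea concrete, the kind of request this would enable
(collection.configName is an existing parameter; the collection.blah
pass-through follows Erick's sketch and is not a released API):

{noformat}
/admin/collections?action=CREATE&name=coll1&numShards=2
    &collection.configName=sharedConfig
    &collection.blah=blort   <-- would land in each core's core.properties as blah=blort
{noformat}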

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate

2013-09-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756861#comment-13756861
 ] 

Mark Miller commented on SOLR-5209:
---

Right, but the sub-commands are just the wrapper calls - except the shard 
commands - those are new. The delete core one is mostly about cleanup, if I 
remember right. 

The problem is, the overseer and zk do not own the state. The individual cores 
do basically. Mostly that's due to historical stuff. We intend to change that, 
but it's no small feat. Until that is done, I think this is much trickier to 
get right than it looks. 

> cores/action=UNLOAD of last replica removes shard from clusterstate
> ---
>
> Key: SOLR-5209
> URL: https://issues.apache.org/jira/browse/SOLR-5209
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Attachments: SOLR-5209.patch
>
>
> The problem we saw was that unloading the only replica of a shard deleted 
> that shard's info from the clusterstate. Once it was gone, there was no 
> easy way to re-create the shard (other than dropping and re-creating the 
> whole collection's state).
> This seems like a bug?
> Overseer.java around line 600 has a comment and commented out code:
> // TODO TODO TODO!!! if there are no replicas left for the slice, and the 
> slice has no hash range, remove it
> // if (newReplicas.size() == 0 && slice.getRange() == null) {
> // if there are no replicas left for the slice remove it
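
Filled out, the commented-out guard would read something like this (sketch
only; variable names are illustrative, and whether to enable it is exactly
what is under discussion here):

{code}
// Sketch of the commented-out intent in Overseer.java: only drop a slice
// when it has no replicas left AND carries no hash range, so a ranged
// shard survives the unload of its last core.
if (newReplicas.size() == 0 && slice.getRange() == null) {
  newSlices.remove(slice.getName()); // remove the empty, range-less slice
}
{code}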

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3852) Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere in ZK tree

2013-09-03 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3852.


   Resolution: Fixed
Fix Version/s: 5.0
   4.5
 Assignee: Hoss Man

> Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere 
> in ZK tree
> 
>
> Key: SOLR-3852
> URL: https://issues.apache.org/jira/browse/SOLR-3852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0-BETA
> Environment: Tomcat 6, external zookeeper-3.3.5 
>Reporter: Vadim Kisselmann
>Assignee: Hoss Man
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-3852.patch
>
>
> Original bug description indicated that when using Solr with embedded ZK 
> everything was fine, but with an external ZK you'd get an 
> ArrayIndexOutOfBoundsException.
> Crux of the problem is some bad assumptions about any ZK node containing data 
> -- the ZookeeperInfoServlet powering the tree view of the Cloud Admin UI 
> assumed that any data would be utf8 text.
> If you are using external ZK, and other systems are writing data into ZK, 
> then you are more likely to see this problem, because those other systems 
> might be writing binary data into ZK nodes -- if you are using ZK embedded 
> in solr, or using solr with its own private (external) ZK instance, then you 
> would only see this problem if you explicitly put binary files into solr 
> configs and upconfig them into ZK.
> 
> One workaround for people encountering this problem when using Solr with a 
> ZK instance shared by other tools is to make sure you use a "chroot" path 
> when pointing Solr at ZK, so that it won't know about any other paths in your 
> ZK tree that might have binary data...
> https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot
> If you are having this problem because you put binary files into your own 
> config dir (i.e. images for velocity or something like that) then there is no 
> straightforward workaround.
> Example stack trace for this bug...
> {noformat}
> 43242 [qtp965223859-14] WARN  org.eclipse.jetty.servlet.ServletHandler 
> /solr/zookeeper
> java.lang.ArrayIndexOutOfBoundsException: 213
> at 
> org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:620)
> at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:168)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:303)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
> ...
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:228)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:106)
> {noformat}
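
The shape of the fix, as a sketch (hypothetical code; the actual change to
ZookeeperInfoServlet may differ), is to decode defensively instead of
assuming utf8:

{code}
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Sketch only: render ZK node data as text when it really is UTF-8,
// otherwise fall back to a placeholder instead of throwing.
static String displayData(byte[] data) {
  CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT);
  try {
    return decoder.decode(ByteBuffer.wrap(data)).toString();
  } catch (CharacterCodingException e) {
    return "(binary data, " + data.length + " bytes)";
  }
}
{code}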

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3852) Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere in ZK tree

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756855#comment-13756855
 ] 

ASF subversion and git services commented on SOLR-3852:
---

Commit 1519779 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1519779 ]

SOLR-3852: Fixed ZookeeperInfoServlet so that the SolrCloud Admin UI pages will 
work even if ZK contains nodes with data which are not utf8 text (merge 
r1519763)

> Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere 
> in ZK tree
> 
>
> Key: SOLR-3852
> URL: https://issues.apache.org/jira/browse/SOLR-3852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0-BETA
> Environment: Tomcat 6, external zookeeper-3.3.5 
>Reporter: Vadim Kisselmann
> Attachments: SOLR-3852.patch
>
>
> Original bug description indicated that when using Solr with embedded ZK 
> everything was fine, but with an external ZK you'd get an 
> ArrayIndexOutOfBoundsException.
> Crux of the problem is some bad assumptions about any ZK node containing data 
> -- the ZookeeperInfoServlet powering the tree view of the Cloud Admin UI 
> assumed that any data would be utf8 text.
> If you are using external ZK, and other systems are writing data into ZK, 
> then you are more likely to see this problem, because those other systems 
> might be writing binary data into ZK nodes -- if you are using ZK embedded 
> in solr, or using solr with its own private (external) ZK instance, then you 
> would only see this problem if you explicitly put binary files into solr 
> configs and upconfig them into ZK.
> 
> One workaround for people encountering this problem when using Solr with a 
> ZK instance shared by other tools is to make sure you use a "chroot" path 
> when pointing Solr at ZK, so that it won't know about any other paths in your 
> ZK tree that might have binary data...
> https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot
> If you are having this problem because you put binary files into your own 
> config dir (i.e. images for velocity or something like that) then there is no 
> straightforward workaround.
> Example stack trace for this bug...
> {noformat}
> 43242 [qtp965223859-14] WARN  org.eclipse.jetty.servlet.ServletHandler 
> /solr/zookeeper
> java.lang.ArrayIndexOutOfBoundsException: 213
> at 
> org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:620)
> at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:168)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:303)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
> ...
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:228)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:106)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API

2013-09-03 Thread Tim Vaillancourt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756831#comment-13756831
 ] 

Tim Vaillancourt edited comment on SOLR-5208 at 9/3/13 5:50 PM:


Shalin: good point on the SPLITSHARD. To be consistent, are there any other 
places this is needed?

* Core API (already there).
* Collections API "CREATE": discussed here.
* Collections API "SPLITSHARD": Thanks Shalin!.
* Collections API "CREATEALIAS"(?): An alias shouldn't have it's own properties 
AFAIK, but calling that out.
* Collections API "RELOAD"(?): I'm not sure if the Core API functionality does 
this, but adding this to RELOAD would allow changing of properties 
post-create-time. Without this you'd need to DELETE/CREATE to change 
properties, or bypass.

Tim


  was (Author: tvaillancourt):
Shalin: good point on the SPLITSHARD. To be consistent, are there any other 
places this is needed?

* Core API (already there).
* Collections API CREATE: discussed here.
* Collections API SPLITSHARD: Thanks Shalin!.
* Collections API CREATEALIAS(?): An alias shouldn't have its own properties 
AFAIK, but calling that out.
* Collections API RELOAD(?): I'm not sure if the Core API functionality does 
this, but adding this to RELOAD would allow changing of properties 
post-create-time. Without this you'd need to DELETE/CREATE to change 
properties, or bypass.

Tim

  
> Support for the setting of core.properties key/values at create-time on 
> Collections API
> ---
>
> Key: SOLR-5208
> URL: https://issues.apache.org/jira/browse/SOLR-5208
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Tim Vaillancourt
>Assignee: Erick Erickson
> Attachments: SOLR-5208.patch
>
>
> As discussed on e-mail thread "Sharing SolrCloud collection configs 
> w/overrides" 
> (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides),
>  Erick brought up a neat solution using HTTP params at create-time for the 
> Collection API.
> Essentially, this request is for functionality that allows the setting of 
> variables (core.properties) on the Collections API CREATE command.
> Erick's idea:
> "Maybe it's as simple as allowing more params for creation like
> collection.coreName where each param of the form collection.blah=blort
> gets an entry in the properties file blah=blort?..."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API

2013-09-03 Thread Tim Vaillancourt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756831#comment-13756831
 ] 

Tim Vaillancourt edited comment on SOLR-5208 at 9/3/13 5:51 PM:


Thanks for the patch Erick!

Shalin: good point on the SPLITSHARD. To be consistent, are there any other 
places this is needed?

* Core API (already there).
* Collections API "CREATE": discussed here.
* Collections API "SPLITSHARD": Thanks Shalin!.
* Collections API "CREATEALIAS"(?): An alias shouldn't have it's own properties 
AFAIK, but calling that out.
* Collections API "RELOAD"(?): I'm not sure if the Core API functionality does 
this, but adding this to RELOAD would allow changing of properties 
post-create-time. Without this you'd need to DELETE/CREATE to change 
properties, or bypass.

Tim


  was (Author: tvaillancourt):
Shalin: good point on the SPLITSHARD. To be consistent, are there any other 
places this is needed?

* Core API (already there).
* Collections API "CREATE": discussed here.
* Collections API "SPLITSHARD": Thanks Shalin!.
* Collections API "CREATEALIAS"(?): An alias shouldn't have it's own properties 
AFAIK, but calling that out.
* Collections API "RELOAD"(?): I'm not sure if the Core API functionality does 
this, but adding this to RELOAD would allow changing of properties 
post-create-time. Without this you'd need to DELETE/CREATE to change 
properties, or bypass.

Tim

  
> Support for the setting of core.properties key/values at create-time on 
> Collections API
> ---
>
> Key: SOLR-5208
> URL: https://issues.apache.org/jira/browse/SOLR-5208
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Tim Vaillancourt
>Assignee: Erick Erickson
> Attachments: SOLR-5208.patch
>
>
> As discussed on e-mail thread "Sharing SolrCloud collection configs 
> w/overrides" 
> (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides),
>  Erick brought up a neat solution using HTTP params at create-time for the 
> Collection API.
> Essentially, this request is for functionality that allows the setting of 
> variables (core.properties) on the Collections API CREATE command.
> Erick's idea:
> "Maybe it's as simple as allowing more params for creation like
> collection.coreName where each param of the form collection.blah=blort
> gets an entry in the properties file blah=blort?..."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API

2013-09-03 Thread Tim Vaillancourt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756831#comment-13756831
 ] 

Tim Vaillancourt commented on SOLR-5208:


Shalin: good point on the SPLITSHARD. To be consistent, are there any other 
places this is needed?

* Core API (already there).
* Collections API CREATE: discussed here.
* Collections API SPLITSHARD: Thanks Shalin!.
* Collections API CREATEALIAS(?): An alias shouldn't have its own properties 
AFAIK, but calling that out.
* Collections API RELOAD(?): I'm not sure if the Core API functionality does 
this, but adding this to RELOAD would allow changing of properties 
post-create-time. Without this you'd need to DELETE/CREATE to change 
properties, or bypass.

Tim


> Support for the setting of core.properties key/values at create-time on 
> Collections API
> ---
>
> Key: SOLR-5208
> URL: https://issues.apache.org/jira/browse/SOLR-5208
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Tim Vaillancourt
>Assignee: Erick Erickson
> Attachments: SOLR-5208.patch
>
>
> As discussed on e-mail thread "Sharing SolrCloud collection configs 
> w/overrides" 
> (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides),
>  Erick brought up a neat solution using HTTP params at create-time for the 
> Collection API.
> Essentially, this request is for functionality that allows the setting of 
> variables (core.properties) on the Collections API CREATE command.
> Erick's idea:
> "Maybe it's as simple as allowing more params for creation like
> collection.coreName where each param of the form collection.blah=blort
> gets an entry in the properties file blah=blort?..."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate

2013-09-03 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-5209:
-

Assignee: Mark Miller

> cores/action=UNLOAD of last replica removes shard from clusterstate
> ---
>
> Key: SOLR-5209
> URL: https://issues.apache.org/jira/browse/SOLR-5209
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Attachments: SOLR-5209.patch
>
>
> The problem we saw was that unloading the only replica of a shard deleted 
> that shard's info from the clusterstate. Once it was gone, there was no 
> easy way to re-create the shard (other than dropping and re-creating the 
> whole collection's state).
> This seems like a bug?
> Overseer.java around line 600 has a comment and commented out code:
> // TODO TODO TODO!!! if there are no replicas left for the slice, and the 
> slice has no hash range, remove it
> // if (newReplicas.size() == 0 && slice.getRange() == null) {
> // if there are no replicas left for the slice remove it

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate

2013-09-03 Thread Daniel Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756827#comment-13756827
 ] 

Daniel Collins edited comment on SOLR-5209 at 9/3/13 5:46 PM:
--

Ok, my bad, I wasn't clear enough.  At the user-level there is collections API 
and core API, and yes one is just a wrapper around the other.  But at the 
Overseer level, we seem to have various different sub-commands (not sure what 
the correct terminology for them is!): {{create_shard}}, {{removeshard}}, 
{{createcollection}}, {{removecollection}}, {{deletecore}}, etc.  I appreciate 
this is probably historical code, but since we have these other methods, it 
felt like deletecore was overstepping its bounds  now :)

Could submit an extra patch, but wasn't sure of the historical nature of this 
code, hence just a comment first to get an opinion/discussion.

  was (Author: dancollins):
Ok, my bad, I wasn't clear enough.  At the user-level there is collections 
API and core API, and yes one is just a wrapper around the other.  But at the 
Overseer level, we seem to have various different sub-commands (not sure what 
the correct terminology for them is!): create_shard, removeshard, 
createcollection, removecollection, deletecore, etc.  I appreciate this is 
probably historical code, but since we have these other methods, it felt like 
deletecore was overstepping its bounds  now :)

Could submit an extra patch, but wasn't sure of the historical nature of this 
code, hence just a comment first to get an opinion/discussion.
  
> cores/action=UNLOAD of last replica removes shard from clusterstate
> ---
>
> Key: SOLR-5209
> URL: https://issues.apache.org/jira/browse/SOLR-5209
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Attachments: SOLR-5209.patch
>
>
> The problem we saw was that unloading the only replica of a shard deleted 
> that shard's info from the clusterstate. Once it was gone, there was no 
> easy way to re-create the shard (other than dropping and re-creating the 
> whole collection's state).
> This seems like a bug?
> Overseer.java around line 600 has a comment and commented out code:
> // TODO TODO TODO!!! if there are no replicas left for the slice, and the 
> slice has no hash range, remove it
> // if (newReplicas.size() == 0 && slice.getRange() == null) {
> // if there are no replicas left for the slice remove it

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3852) Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere in ZK tree

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756809#comment-13756809
 ] 

ASF subversion and git services commented on SOLR-3852:
---

Commit 1519763 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1519763 ]

SOLR-3852: Fixed ZookeeperInfoServlet so that the SolrCloud Admin UI pages will 
work even if ZK contains nodes with data which are not utf8 text

> Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere 
> in ZK tree
> 
>
> Key: SOLR-3852
> URL: https://issues.apache.org/jira/browse/SOLR-3852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0-BETA
> Environment: Tomcat 6, external zookeeper-3.3.5 
>Reporter: Vadim Kisselmann
> Attachments: SOLR-3852.patch
>
>
> Original bug description indicated that when using Solr with embedded ZK 
> everything was fine, but with an external ZK you'd get an 
> ArrayIndexOutOfBoundsException.
> Crux of the problem is some bad assumptions about any ZK node containing data 
> -- the ZookeeperInfoServlet powering the tree view of the Cloud Admin UI 
> assumed that any data would be utf8 text.
> If you are using external ZK, and other systems are writing data into ZK, 
> then you are more likely to see this problem, because those other systems 
> might be writing binary data into ZK nodes -- if you are using ZK embedded 
> in solr, or using solr with its own private (external) ZK instance, then you 
> would only see this problem if you explicitly put binary files into solr 
> configs and upconfig them into ZK.
> 
> One workaround for people encountering this problem when using Solr with a 
> ZK instance shared by other tools is to make sure you use a "chroot" path 
> when pointing Solr at ZK, so that it won't know about any other paths in your 
> ZK tree that might have binary data...
> https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot
> If you are having this problem because you put binary files into your own 
> config dir (i.e. images for velocity or something like that) then there is no 
> straightforward workaround.
> Example stack trace for this bug...
> {noformat}
> 43242 [qtp965223859-14] WARN  org.eclipse.jetty.servlet.ServletHandler 
> /solr/zookeeper
> java.lang.ArrayIndexOutOfBoundsException: 213
> at 
> org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:620)
> at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:168)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:303)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339)
> ...
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:228)
> at 
> org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:106)
> {noformat}
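
(In practice the chroot workaround mentioned above just means appending a
subtree to the zkHost string, for example (host names illustrative):

{noformat}
-DzkHost=zk1:2181,zk2:2181,zk3:2181/solr
{noformat}

so the tree view never descends into foreign paths that may hold binary data.)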

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate

2013-09-03 Thread Daniel Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756827#comment-13756827
 ] 

Daniel Collins commented on SOLR-5209:
--

Ok, my bad, I wasn't clear enough. At the user level there are the collections 
API and the core API, and yes, one is just a wrapper around the other. But at the 
Overseer level, we seem to have various different sub-commands (not sure what 
the correct terminology for them is!): create_shard, removeshard, 
createcollection, removecollection, deletecore, etc. I appreciate this is 
probably historical code, but since we have these other methods, it felt like 
deletecore was overstepping its bounds now :)

I could submit an extra patch, but I wasn't sure of the historical nature of this 
code, hence just a comment first to get an opinion/discussion.
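
For reference, the commented-out guard in Overseer that the issue description
(quoted below) mentions boils down to something like this (a sketch with
simplified types, not the actual Overseer code):

{noformat}
import java.util.Map;

class SliceCleanupSketch {
  /** Drop a slice only when it has no replicas left AND carries no hash range,
      i.e. it was never part of the collection's hash partitioning. */
  static void maybeRemoveSlice(Map<String, Object> slices, String sliceName,
                               Map<String, Object> newReplicas, Object range) {
    if (newReplicas.isEmpty() && range == null) {
      slices.remove(sliceName); // never drop a ranged slice just because it is empty
    }
  }
}
{noformat}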

> cores/action=UNLOAD of last replica removes shard from clusterstate
> ---
>
> Key: SOLR-5209
> URL: https://issues.apache.org/jira/browse/SOLR-5209
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Attachments: SOLR-5209.patch
>
>
> The problem we saw was that unloading the only replica of a shard deleted 
> that shard's info from the clusterstate. Once it was gone, there was no 
> easy way to re-create the shard (other than dropping and re-creating the 
> whole collection's state).
> This seems like a bug?
> Overseer.java around line 600 has a comment and commented out code:
> // TODO TODO TODO!!! if there are no replicas left for the slice, and the 
> slice has no hash range, remove it
> // if (newReplicas.size() == 0 && slice.getRange() == null) {
> // if there are no replicas left for the slice remove it

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API

2013-09-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756843#comment-13756843
 ] 

Erick Erickson commented on SOLR-5208:
--

Tim:

Use At Your Own Risk!!! I just did a _very_ quick test on it, didn't even write 
any tests yet. If you're brave it'd be great to see if it fixes your problem.

About the other commands:

SPLITSHARD - I expect so; it has to create another core, so we have to at least 
copy the properties file over (which I'm not sure we do either).

CREATEALIAS - I doubt it. This shouldn't affect the core.properties.

RELOAD - That's interesting, hadn't really thought about that. It seems 
possible to shoot yourself in the foot here though. I'm also not sure whether 
reload already writes out the core.properties file or not.

> Support for the setting of core.properties key/values at create-time on 
> Collections API
> ---
>
> Key: SOLR-5208
> URL: https://issues.apache.org/jira/browse/SOLR-5208
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Tim Vaillancourt
>Assignee: Erick Erickson
> Attachments: SOLR-5208.patch
>
>
> As discussed on e-mail thread "Sharing SolrCloud collection configs 
> w/overrides" 
> (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides),
>  Erick brought up a neat solution using HTTP params at create-time for the 
> Collection API.
> Essentially, this request is for functionality that allows the setting of 
> variables (core.properties) on the Collections API CREATE command.
> Erick's idea:
> "Maybe it's as simple as allowing more params for creation like
> collection.coreName where each param of the form collection.blah=blort
> gets an entry in the properties file blah=blort?..."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-09-03 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756769#comment-13756769
 ] 

Hoss Man commented on SOLR-5201:


bq. The second option sounds nice but I wonder if that would cause a problem 
with multiple configurations (2 update chains with 2 different configurations 
of UIMAUpdateRequestProcessorFactory),

It depends on what kinds of problems you are worried about ... each 
UIMAUpdateRequestProcessorFactory instance (ie: one instance per chain) should 
have its own AnalysisEngine using its own configuration ... unless the 
AnalysisEngine constructor/factory/provider does something special to keep 
track of them, they won't know anything about each other.

If you *want* them to know about each other (ie: to share an AnalysisEngine 
between chains, or between chains in different SolrCores) then something a lot 
more special-case would need to be done.
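
To illustrate, per-factory reuse could look roughly like this (class name and
wiring are hypothetical, not the actual patch; note that a single
AnalysisEngine is not thread-safe for process(), so real code would also need
to synchronize calls or pool CASes):

{noformat}
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.resource.ResourceInitializationException;

public class CachedEngineFactory {
  private final AnalysisEngineDescription description;
  private volatile AnalysisEngine engine;

  public CachedEngineFactory(AnalysisEngineDescription description) {
    this.description = description;
  }

  /** Builds one AnalysisEngine per factory instance (i.e. per update chain)
      and hands the same instance to every processor it creates. */
  public AnalysisEngine getEngine() throws ResourceInitializationException {
    AnalysisEngine e = engine;
    if (e == null) {
      synchronized (this) {
        if (engine == null) {
          engine = UIMAFramework.produceAnalysisEngine(description);
        }
        e = engine;
      }
    }
    return e;
  }
}
{noformat}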

> UIMAUpdateRequestProcessor should reuse the AnalysisEngine
> --
>
> Key: SOLR-5201
> URL: https://issues.apache.org/jira/browse/SOLR-5201
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - UIMA
>Affects Versions: 4.4
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, 
> SOLR-5201-ae-cache-only-single-request_branch_4x.patch
>
>
> As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
> UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, 
> which is bad for performance; therefore it'd be nice if such AEs could be 
> reused whenever that's possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756766#comment-13756766
 ] 

Han Jiang commented on LUCENE-5199:
---

Thanks Rob! Yeah, I just hit another failure around TestSortDocValues. :)

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5023) deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj

2013-09-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756776#comment-13756776
 ] 

Mark Miller commented on SOLR-5023:
---

I reviewed the patch the other day; it looks good, but it still needs a test 
that uses the new code.
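
From the SolrJ side, the code such a test would exercise looks roughly like
this (a sketch; the setter name is assumed from the patch):

{noformat}
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class UnloadWithInstanceDir {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    CoreAdminRequest.Unload unload = new CoreAdminRequest.Unload(false); // deleteIndex flag
    unload.setCoreName("core1");
    unload.setDeleteInstanceDir(true); // the new option this issue adds (name assumed)
    unload.process(server);
    server.shutdown();
  }
}
{noformat}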

> deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj
> -
>
> Key: SOLR-5023
> URL: https://issues.apache.org/jira/browse/SOLR-5023
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.2.1
>Reporter: Lyubov Romanchuk
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5023.patch
>
>
> deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload 
> CoreAdminRequest

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756773#comment-13756773
 ] 

ASF subversion and git services commented on LUCENE-5199:
-

Commit 1519756 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1519756 ]

LUCENE-5199: don't use old codec components mixed in with new ones when using 
-Ds

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756757#comment-13756757
 ] 

Shai Erera commented on LUCENE-5199:


Thanks Rob!

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate

2013-09-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756786#comment-13756786
 ] 

Mark Miller commented on SOLR-5209:
---

bq. given we have the collections API to do that

We don't actually have the collections API to do that - it's simply a thin 
candy wrapper around SolrCore admin calls. Everything is driven by SolrCores 
being added or removed. There is work being done to migrate towards something 
where the collections API is actually large and in charge, but currently it's 
just a sugar wrapper.

> cores/action=UNLOAD of last replica removes shard from clusterstate
> ---
>
> Key: SOLR-5209
> URL: https://issues.apache.org/jira/browse/SOLR-5209
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Christine Poerschke
> Attachments: SOLR-5209.patch
>
>
> The problem we saw was that unloading the only replica of a shard deleted 
> that shard's info from the clusterstate. Once it was gone, there was no 
> easy way to re-create the shard (other than dropping and re-creating the 
> whole collection's state).
> This seems like a bug?
> Overseer.java around line 600 has a comment and commented out code:
> // TODO TODO TODO!!! if there are no replicas left for the slice, and the 
> slice has no hash range, remove it
> // if (newReplicas.size() == 0 && slice.getRange() == null) {
> // if there are no replicas left for the slice remove it

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756775#comment-13756775
 ] 

ASF subversion and git services commented on LUCENE-5199:
-

Commit 1519757 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1519757 ]

LUCENE-5199: don't use old codec components mixed in with new ones when using 
-Ds

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5142) Block Indexing / Join Improvements

2013-09-03 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756542#comment-13756542
 ] 

Mikhail Khludnev edited comment on SOLR-5142 at 9/3/13 4:28 PM:


I have a subject for consideration: 
right now the unique key field is required for child documents as well, but it 
doesn't enforce anything (it's explicitly asserted at 
https://svn.apache.org/viewvc?view=revision&revision=r1519679). I suppose that 
uniqueness is provided by the parents and the \_root_ field. Don't you feel the 
unique key should be optional for child documents? 

  was (Author: mkhludnev):
I have a subject for consideration: 
right now unique key is required for children documents, however, I suppose 
that uniqueness is provided by parents and \_root_ field. Don't you feel unique 
key should be optional for children documents? 
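
For concreteness, this is the kind of block update the proposal would permit:
child documents carrying no uniqueKey of their own, with overwrite driven by
the parent's id via \_root_ (illustrative only):

{noformat}
<add>
  <doc>
    <field name="id">parent-1</field>
    <field name="type">parent</field>
    <doc><field name="name">child A</field></doc>
    <doc><field name="name">child B</field></doc>
  </doc>
</add>
{noformat}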
  
> Block Indexing / Join Improvements
> --
>
> Key: SOLR-5142
> URL: https://issues.apache.org/jira/browse/SOLR-5142
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> Follow-on main issue for general block indexing / join improvements

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1519516 - /lucene/board-reports/2013/board-report-september.txt

2013-09-03 Thread Chris Hostetter

: Subject: svn commit: r1519516 -
: /lucene/board-reports/2013/board-report-september.txt

+1


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756750#comment-13756750
 ] 

Robert Muir commented on LUCENE-5199:
-

I will commit it; I am worried Billy will hit other problems, because in 
general old codec components should not be "mixed in".

When we test an old codec, it should always be "the whole codec", to 
realistically simulate the backwards format...

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756743#comment-13756743
 ] 

Shai Erera commented on LUCENE-5199:


Thanks Rob. I can commit it later, but feel free to commit it if you get to it 
before me.

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756748#comment-13756748
 ] 

Michael McCandless commented on LUCENE-5199:


+1

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate

2013-09-03 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756714#comment-13756714
 ] 

Shalin Shekhar Mangar commented on SOLR-5209:
-

I think this deserves another look. We now have the deleteshard API, which can 
be used to completely remove a slice from the cluster state. We should remove 
this trappy behaviour.
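
That is, removing a slice would become a deliberate call rather than a side
effect of UNLOAD, e.g. (URL illustrative):

{noformat}
http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=mycollection&shard=shard1
{noformat}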

> cores/action=UNLOAD of last replica removes shard from clusterstate
> ---
>
> Key: SOLR-5209
> URL: https://issues.apache.org/jira/browse/SOLR-5209
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Christine Poerschke
> Attachments: SOLR-5209.patch
>
>
> The problem we saw was that unloading the only replica of a shard deleted 
> that shard's info from the clusterstate. Once it was gone, there was no 
> easy way to re-create the shard (other than dropping and re-creating the 
> whole collection's state).
> This seems like a bug?
> Overseer.java around line 600 has a comment and commented out code:
> // TODO TODO TODO!!! if there are no replicas left for the slice, and the 
> slice has no hash range, remove it
> // if (newReplicas.size() == 0 && slice.getRange() == null) {
> // if there are no replicas left for the slice remove it

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5023) deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj

2013-09-03 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-5023:
---

Assignee: Shalin Shekhar Mangar  (was: Mark Miller)

> deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj
> -
>
> Key: SOLR-5023
> URL: https://issues.apache.org/jira/browse/SOLR-5023
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.2.1
>Reporter: Lyubov Romanchuk
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5023.patch
>
>
> deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload 
> CoreAdminRequest

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5023) deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj

2013-09-03 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756732#comment-13756732
 ] 

Shalin Shekhar Mangar commented on SOLR-5023:
-

Patch looks good. I'll commit shortly.

> deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj
> -
>
> Key: SOLR-5023
> URL: https://issues.apache.org/jira/browse/SOLR-5023
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.2.1
>Reporter: Lyubov Romanchuk
>Assignee: Mark Miller
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5023.patch
>
>
> deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload 
> CoreAdminRequest

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API

2013-09-03 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756710#comment-13756710
 ] 

Shalin Shekhar Mangar commented on SOLR-5208:
-

+1 for adding this to splitshard and createshard as well.

> Support for the setting of core.properties key/values at create-time on 
> Collections API
> ---
>
> Key: SOLR-5208
> URL: https://issues.apache.org/jira/browse/SOLR-5208
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Tim Vaillancourt
>Assignee: Erick Erickson
> Attachments: SOLR-5208.patch
>
>
> As discussed on e-mail thread "Sharing SolrCloud collection configs 
> w/overrides" 
> (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides),
>  Erick brought up a neat solution using HTTP params at create-time for the 
> Collection API.
> Essentially, this request is for functionality that allows the setting of 
> variables (core.properties) on the Collections API CREATE command.
> Erick's idea:
> "Maybe it's as simple as allowing more params for creation like
> collection.coreName where each param of the form collection.blah=blort
> gets an entry in the properties file blah=blort?..."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5199:


Attachment: LUCENE-5199.patch

Here is the correct fix. The brokenness was just in 
TestRuleSetupAndRestoreClassEnv.

The current approach is no good; Han will hit many other issues testing, because 
we should _never_ mix in old deprecated codecs with new ones... it's not 
supported.

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5199.patch, LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-5177) test covers overwrite true/false for block updates

2013-09-03 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev closed SOLR-5177.
--


well done!

> test covers overwrite true/false for block updates 
> ---
>
> Key: SOLR-5177
> URL: https://issues.apache.org/jira/browse/SOLR-5177
> Project: Solr
>  Issue Type: Sub-task
>Affects Versions: 4.5, 5.0
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5177.patch
>
>
> DUH2 uses the \_root_ field to support overwrite for block updates. I want to 
> contribute this test, which asserts the current functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756670#comment-13756670
 ] 

Han Jiang commented on LUCENE-5199:
---

Thanks Shai!

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756687#comment-13756687
 ] 

Robert Muir commented on LUCENE-5178:
-

Can this commit please be reverted? 

The change makes the test API so complicated for something that cannot happen: 
you cannot have "unsupported fields"; it's all or none.

This is a bug in LuceneTestCase; it should not do this when someone uses 
-Dtests.postingsformat.

> doc values should expose missing values (or allow configurable defaults)
> 
>
> Key: LUCENE-5178
> URL: https://issues.apache.org/jira/browse/LUCENE-5178
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch
>
>
> DocValues should somehow allow a configurable default per-field.
> Possible implementations include setting it on the field in the document or 
> registration of an IndexWriter callback.
> If we don't make the default configurable, then another option is to have 
> DocValues fields keep track of whether a value was indexed for that document 
> or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5177) test covers overwrite true/false for block updates

2013-09-03 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-5177.


   Resolution: Fixed
Fix Version/s: 5.0
   4.5

> test covers overwrite true/false for block updates 
> ---
>
> Key: SOLR-5177
> URL: https://issues.apache.org/jira/browse/SOLR-5177
> Project: Solr
>  Issue Type: Sub-task
>Affects Versions: 4.5, 5.0
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5177.patch
>
>
> DUH2 uses the \_root_ field to support overwrite for block updates. I want to 
> contribute this test, which asserts the current functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189.patch

Patch improves RAM accounting in BufferedDeletes and FrozenBD. I added 
NumericUpdate.sizeInBytes() so most of the logic is done there. BD adds two 
constants: one for adding an entry to the outer map (includes the inner map's 
OBJ_HEADER) and one for adding an actual update to the inner map (excludes the 
map's OBJ_HEADER). Only the pointers are taken into account.
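
Roughly, the two constants amount to shallow entry sizes along these lines (a
sketch built on RamUsageEstimator's constants; the exact breakdown in the patch
may differ):

{noformat}
import org.apache.lucene.util.RamUsageEstimator;

class UpdatesRamSketch {
  // adding a Term entry to the outer map: the entry object (header plus
  // key/value/next pointers) plus the header of the freshly created inner map
  static final long BYTES_PER_OUTER_ENTRY =
      RamUsageEstimator.NUM_BYTES_OBJECT_HEADER
      + 3 * RamUsageEstimator.NUM_BYTES_OBJECT_REF
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // inner map header

  // adding an update to an inner map: entry object and pointers only, since the
  // String field and the NumericUpdate are covered by NumericUpdate.sizeInBytes()
  static final long BYTES_PER_INNER_ENTRY =
      RamUsageEstimator.NUM_BYTES_OBJECT_HEADER
      + 3 * RamUsageEstimator.NUM_BYTES_OBJECT_REF;
}
{noformat}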

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reopened LUCENE-5199:
-


This is not necessary: the only codecs that don't support this aren't 
instantiated with per-field docvalues (unless there is a bug in the facet/ tests 
with what they're doing).

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756678#comment-13756678
 ] 

Robert Muir commented on LUCENE-5189:
-

{quote}
Patch adds per-field support. I currently do that by adding a boolean 
'isFieldUpdate' to SegWriteState which is set to true only by 
ReaderAndLiveDocs. PerFieldDVF then peeks into that boolean and if it's true, 
it reads the format name from FieldInfo.attributes() instead of relying on 
Codec.getPerFieldDVF(). If we'll eventually gen FieldInfos, there won't be a 
need for this boolean as PerFieldDVF will get that from FI.dvGen.
{quote}

We can't really move forward with this boolean: it only attacks the symptom 
(puts a HACK in per-field) without fixing the disease (the codec API).

In general, if a codec needs to write to and read from 
FieldInfos/SegmentInfos.attributes, it doesn't work here: this API needs to be 
fixed.


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756676#comment-13756676
 ] 

Robert Muir commented on LUCENE-5197:
-

RAMUsageEstimator is quite slow when you need to run it on lots of objects 
(e.g. the codec tree here).

Also it totally breaks down if it hits certain objects like a ThreadLocal.
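
For context, the call in question is the reflective deep walk, e.g. (a toy
example of the kind of measurement being discussed):

{noformat}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.util.RamUsageEstimator;

public class SizeOfDemo {
  public static void main(String[] args) {
    Map<String, long[]> m = new HashMap<String, long[]>();
    m.put("field", new long[1024]);
    // reflective deep walk of the object graph: accurate, but far too slow
    // to run over something like a whole codec tree on every request
    System.out.println(RamUsageEstimator.sizeOf(m) + " bytes");
  }
}
{noformat}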

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756650#comment-13756650
 ] 

Shai Erera commented on LUCENE-5189:


Thanks for the review Mike. I nuked the two unused methods, and I like 
SegmentCommitInfo, so changed the nocommit text.

I changed the nocommit in SIPC to a TODO. Don't think we need to tackle it in 
this issue.

I'm working on improving the RAM accounting. I want to add 
NumericUpdate.sizeInBytes() and count it per update that is actually added. 
Then, because it's a Map<Term,Map<String,NumericUpdate>> and the Term and 
String are both in NumericUpdate already (and will be accounted for in its 
calculation), only their PTR needs to be taken into account. Also, the 
sizeInBytes should grow by a new entry to the outer map only when one is 
actually added, same for the inner map. Therefore I don't think we can have a 
single constant here, but instead maybe two: one for every Entry added to the 
outer map and one for every Entry added to the inner map. I 
think, because I need to compute the shallow sizes only (since Term and String 
are accounted for in NumericUpdate), it's a single constant per 
Entry?

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5177) test covers overwrite true/false for block updates

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756649#comment-13756649
 ] 

ASF subversion and git services commented on SOLR-5177:
---

Commit 1519694 from [~yo...@apache.org] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1519694 ]

SOLR-5177: tests - add overwrite test for block join

> test covers overwrite true/false for block updates 
> ---
>
> Key: SOLR-5177
> URL: https://issues.apache.org/jira/browse/SOLR-5177
> Project: Solr
>  Issue Type: Sub-task
>Affects Versions: 4.5, 5.0
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Attachments: SOLR-5177.patch
>
>
> DUH2 uses \_root_ field to support overwrite for block updates. I want to 
> contribute this test, which asserts the current functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API

2013-09-03 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5208:
-

Attachment: SOLR-5208.patch

Here's a proof-of-concept patch. It creates a core.properties file that has any 
property.key=value pair specified on the collection.create line reproduced as 
key=value. Mark was spot-on that it's just a matter of passing the params 
through to core creation.
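
For illustration (hypothetical collection and property names), a CREATE call
like

http://localhost:8983/solr/admin/collections?action=CREATE&name=coll1&numShards=2&property.my.custom.prop=someValue

would, under this patch, leave my.custom.prop=someValue in each created core's
core.properties.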

[~markrmil...@gmail.com] is that what you had in mind?

Several questions though.

1> Is copying the property.key=value necessary in 
CollectionsHandler.handleCreateShard?

2> Similarly, should the property.key=value stuff be done in 
OverseerCollectionProcessor.createShard? What about splitShard? Just going by 
all the params.set calls that "look kinda like create", it seems possible at least.



> Support for the setting of core.properties key/values at create-time on 
> Collections API
> ---
>
> Key: SOLR-5208
> URL: https://issues.apache.org/jira/browse/SOLR-5208
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Tim Vaillancourt
>Assignee: Erick Erickson
> Attachments: SOLR-5208.patch
>
>
> As discussed on e-mail thread "Sharing SolrCloud collection configs 
> w/overrides" 
> (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides),
>  Erick brought up a neat solution using HTTP params at create-time for the 
> Collection API.
> Essentially, this request is for a functionality that allows the setting of 
> variables (core.properties) on Collections API CREATE command.
> Erick's idea:
> "Maybe it's as simple as allowing more params for creation like
> collection.coreName where each param of the form collection.blah=blort
> gets an entry in the properties file blah=blort?..."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5199.


   Resolution: Fixed
Fix Version/s: 4.5
   5.0

Committed to trunk and 4x.

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.0, 4.5
>
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756633#comment-13756633
 ] 

ASF subversion and git services commented on LUCENE-5199:
-

Commit 1519690 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1519690 ]

LUCENE-5199: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check 
the actual DocValuesFormat used per-field

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2013-09-03 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5199:
---

Summary: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check 
the actual DocValuesFormat used per-field  (was: Improve 
LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
DocValuesFormat in user per-field)

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat used per-field
> ---
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756626#comment-13756626
 ] 

ASF subversion and git services commented on LUCENE-5199:
-

Commit 1519685 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1519685 ]

LUCENE-5199: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check 
the actual DocValuesFormat used per-field

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat in user per-field
> --
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5177) test covers overwrite true/false for block updates

2013-09-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756610#comment-13756610
 ] 

ASF subversion and git services commented on SOLR-5177:
---

Commit 1519679 from [~yo...@apache.org] in branch 'dev/trunk'
[ https://svn.apache.org/r1519679 ]

SOLR-5177: tests - add overwrite test for block join

> test covers overwrite true/false for block updates 
> ---
>
> Key: SOLR-5177
> URL: https://issues.apache.org/jira/browse/SOLR-5177
> Project: Solr
>  Issue Type: Sub-task
>Affects Versions: 4.5, 5.0
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Attachments: SOLR-5177.patch
>
>
> DUH2 uses \_root_ field to support overwrite for block updates. I want to 
> contribute this test, which asserts the current functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756595#comment-13756595
 ] 

Uwe Schindler commented on LUCENE-5197:
---

From my perspective: usage of RAMUsageEstimator is to be preferred; everything
else gets outdated very fast (especially since we have no idea about caches
like FieldCache being used). In production RAMDirectory is not used, things
like MMap or NIOFSDir have no heap usage, and the default codec is also not
much, so I see no reason not to trust the official RAM usage as reported by
RAMUsageEstimator.
The memory-usage counting of IndexWriter is very different from what we have on
the IndexReader side. The accounting done on the IndexWriter side is much more
under the control of the Lucene code and is very fine-grained, but stuff like
the proposed changes in FixedBitSet is just nonsense to me. RAMUsageEstimator
can estimate FixedBitSet very accurately (that's just easy and in my opinion
100% correct).
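
A minimal sketch of what that estimation looks like (assuming the Lucene 4.x
RamUsageEstimator API):

{noformat}
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.RamUsageEstimator;

// A FixedBitSet is essentially an object header, an int and a long[],
// so reflective estimation is near-exact here:
FixedBitSet bits = new FixedBitSet(1 << 20);
long bytes = RamUsageEstimator.sizeOf(bits);
System.out.println(RamUsageEstimator.humanReadableUnits(bytes));
{noformat}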

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-03 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-3069:
--

Attachment: LUCENE-3069.patch

The uploaded patch should show all the changes against trunk: I added two 
different implementations of the term dict, and refactored PostingsBaseFormat 
to plug in non-block-based term dicts.

I'm still working on the javadocs, and maybe we should rename that 'temp' 
package to something like 'fstterms'?



> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756556#comment-13756556
 ] 

Dawid Weiss commented on LUCENE-5197:
-

This can be done with the same implementation -- a tree visitor could collect 
partial sums and aggregate them for the object hierarchy in the form of a tree 
rather than sum it all up. 

This is sort-of implemented here (although I kept the same implementation of 
RamUsageEstimator; a visitor pattern would be more elegant, I think):
https://github.com/dweiss/java-sizeof/blob/master/src/main/java/com/carrotsearch/sizeof/ObjectTree.java
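
For a sense of the visitor shape being suggested, a purely hypothetical
interface (not the linked code):

{noformat}
// Instead of returning one summed long, the traversal reports each
// object's shallow size and its subtree total, which yields a
// per-object breakdown tree.
interface RamVisitor {
  void enter(Object node, long shallowBytes);
  void leave(Object node, long subtreeBytes);
}
{noformat}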

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756555#comment-13756555
 ] 

Shai Erera commented on LUCENE-5199:


Core and Facet tests pass (the only users of this API). I think it's ready to 
commit. We can add more formats as Jenkins hunts them down (if there are any).

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat in user per-field
> --
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field

2013-09-03 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5199:
---

Attachment: LUENE-5199.patch

Added {{String... fields}} to LTC.defaultCodecSupportsDocsWithField and tested 
that the DVF returned for each field is not "Lucene40/41/42". Are there more 
DVFs I should add?

I also changed all tests that call this method to pass the list of fields they 
use up front.
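
Roughly, the check now looks something like this (a sketch based on the
description above; the committed code may differ):

{noformat}
import org.apache.lucene.util._TestUtil;

public static boolean defaultCodecSupportsDocsWithField(String... fields) {
  for (String field : fields) {
    // name of the DocValuesFormat actually used for this field
    String dvFormat = _TestUtil.getDocValuesFormat(field);
    if (dvFormat.equals("Lucene40") || dvFormat.equals("Lucene41")
        || dvFormat.equals("Lucene42")) {
      return false; // these DVFs cannot represent missing numeric values
    }
  }
  return true;
}
{noformat}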

> Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
> DocValuesFormat in user per-field
> --
>
> Key: LUCENE-5199
> URL: https://issues.apache.org/jira/browse/LUCENE-5199
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUENE-5199.patch
>
>
> On LUCENE-5178 Han reported the following test failure:
> {noformat}
> [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
>[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
>[junit4]>   less than 10 ([8)
>[junit4]>   less than or equal to 10 (]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...> but was:<...(0)
>[junit4]>   less than 10 ([28)
>[junit4]>   less than or equal to 10 (2]8)
>[junit4]>   over 90 (8)
>[junit4]>   9...>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
>[junit4]>  at 
> org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
>[junit4]>  at java.lang.Thread.run(Thread.java:722)
> {noformat}
> which can be reproduced with
> {noformat}
> tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
> -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
> -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
> -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
> {noformat}
> It seems that the Codec that is picked is a Lucene45Codec with 
> Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
> should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
> and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756552#comment-13756552
 ] 

Michael McCandless commented on LUCENE-5197:


This would be a great addition!  And I agree if we can somehow do this with 
RUE, being able to restrict where it's allowed to "crawl", that would be nice.

Separately, I wonder if we could get a breakdown of the RAM usage ... sort of 
like Explanation, which returns both the value and a String (English) 
description of how the value was computed.  But this can come later ... just a 
ramBytesUsed returning long is a great start.

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756551#comment-13756551
 ] 

Shai Erera commented on LUCENE-5178:


Opened LUCENE-5199.

> doc values should expose missing values (or allow configurable defaults)
> 
>
> Key: LUCENE-5178
> URL: https://issues.apache.org/jira/browse/LUCENE-5178
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch
>
>
> DocValues should somehow allow a configurable default per-field.
> Possible implementations include setting it on the field in the document or 
> registration of an IndexWriter callback.
> If we don't make the default configurable, then another option is to have 
> DocValues fields keep track of whether a value was indexed for that document 
> or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field

2013-09-03 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5199:
--

 Summary: Improve LuceneTestCase.defaultCodecSupportsDocsWithField 
to check the actual DocValuesFormat in user per-field
 Key: LUCENE-5199
 URL: https://issues.apache.org/jira/browse/LUCENE-5199
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/test
Reporter: Shai Erera
Assignee: Shai Erera


On LUCENE-5178 Han reported the following test failure:

{noformat}
[junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<<
   [junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0)
   [junit4]>   less than 10 ([8)
   [junit4]>   less than or equal to 10 (]8)
   [junit4]>   over 90 (8)
   [junit4]>   9...> but was:<...(0)
   [junit4]>   less than 10 ([28)
   [junit4]>   less than or equal to 10 (2]8)
   [junit4]>   over 90 (8)
   [junit4]>   9...>
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
   [junit4]>at 
org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
   [junit4]>at java.lang.Thread.run(Thread.java:722)
{noformat}

which can be reproduced with

{noformat}
tcase=TestRangeAccumulator -Dtests.method=testMissingValues 
-Dtests.seed=815B6AA86D05329C -Dtests.slow=true -Dtests.postingsformat=Lucene41 
-Dtests.locale=ca -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
{noformat}

It seems that the Codec that is picked is a Lucene45Codec with 
Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
and check that the actual DVF used for each field supports it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5142) Block Indexing / Join Improvements

2013-09-03 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756542#comment-13756542
 ] 

Mikhail Khludnev commented on SOLR-5142:


I have a subject for consideration: 
right now a unique key is required for children documents; however, I suppose 
that uniqueness is provided by the parents and the \_root_ field. Don't you 
feel the unique key should be optional for children documents? 
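
To illustrate the proposal (hypothetical field names), children inside a block
would then need no id of their own:

{noformat}
<add>
  <doc>
    <field name="id">parent-1</field>
    <doc><field name="name">child-a</field></doc> <!-- no uniqueKey -->
    <doc><field name="name">child-b</field></doc>
  </doc>
</add>
{noformat}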

> Block Indexing / Join Improvements
> --
>
> Key: SOLR-5142
> URL: https://issues.apache.org/jira/browse/SOLR-5142
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> Follow-on main issue for general block indexing / join improvements

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5175) Don't reorder children document

2013-09-03 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-5175:
---

Issue Type: Sub-task  (was: Bug)
Parent: SOLR-5142

> Don't reorder children document
> ---
>
> Key: SOLR-5175
> URL: https://issues.apache.org/jira/browse/SOLR-5175
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5175.patch
>
>
> AddUpdateCommand reverses children documents that causes failure of 
> BJQParserTest.testGrandChildren() discussed in SOLR-5168  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-09-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756536#comment-13756536
 ] 

Shai Erera commented on LUCENE-5178:


I see. I think this can also happen if you use RandomCodec and it draws 
Lucene42DVF? So in this case, with this seed, it trips if you set 
postingsformat, but I'm not sure that in general this assume() is correct.

The ugly part of having a test call _TestUtil.getDVF(field) (or we wrap it in 
a nice method) is that the test will need to decide up front on all the fields 
it uses, and if there's a mistake, the error may happen in the future and be 
harder to debug (i.e. spotting that the test uses a different field than what 
it passed to assume()). But I don't think that asserting the Codec is the 
right test here, so this has to change.

> doc values should expose missing values (or allow configurable defaults)
> 
>
> Key: LUCENE-5178
> URL: https://issues.apache.org/jira/browse/LUCENE-5178
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch
>
>
> DocValues should somehow allow a configurable default per-field.
> Possible implementations include setting it on the field in the document or 
> registration of an IndexWriter callback.
> If we don't make the default configurable, then another option is to have 
> DocValues fields keep track of whether a value was indexed for that document 
> or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5177) test covers overwrite true/false for block updates

2013-09-03 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-5177:
---

Issue Type: Sub-task  (was: Test)
Parent: SOLR-5142

> test covers overwrite true/false for block updates 
> ---
>
> Key: SOLR-5177
> URL: https://issues.apache.org/jira/browse/SOLR-5177
> Project: Solr
>  Issue Type: Sub-task
>Affects Versions: 4.5, 5.0
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Attachments: SOLR-5177.patch
>
>
> DUH2 uses \_root_ field to support overwrite for block updates. I want to 
> contribute this test, which asserts the current functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-09-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756523#comment-13756523
 ] 

Michael McCandless commented on LUCENE-5178:


It sounds like we need to check the actual DVFormat for that field 
(_TestUtil.getDocValuesFormat("field")) and then test whether that format 
supports missing values.

I think this failure can only happen if you explicitly set 
-Dtests.postingsformat, because then we make an anon subclass of Lucene45 
(TestRuleSetupAndRestoreClassEnv.java at line 194) ... so it sounds like in 
general we should not be using defaultCodecSupportsDocsWithField() but rather 
something like defaultDVFormatSupportsDocsWithField(String field) ...

> doc values should expose missing values (or allow configurable defaults)
> 
>
> Key: LUCENE-5178
> URL: https://issues.apache.org/jira/browse/LUCENE-5178
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch
>
>
> DocValues should somehow allow a configurable default per-field.
> Possible implementations include setting it on the field in the document or 
> registration of an IndexWriter callback.
> If we don't make the default configurable, then another option is to have 
> DocValues fields keep track of whether a value was indexed for that document 
> or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: term collection frequence in lucene 3.6.2?

2013-09-03 Thread Michael McCandless
3.6.x doesn't track this statistic, but 4.x does: TermsEnum.totalTermFreq().

In 3.6.x you could visit every doc, summing up the .freq(), but this is slowish.
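
Something along these lines (a 3.6.x sketch; it costs O(docFreq) per term):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;

  // Sum the within-document freq() over every document containing the term:
  static long totalTermFreq(IndexReader reader, Term term) throws IOException {
    long sum = 0;
    TermDocs td = reader.termDocs(term);
    try {
      while (td.next()) {
        sum += td.freq();
      }
    } finally {
      td.close();
    }
    return sum;
  }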

Mike McCandless

http://blog.mikemccandless.com


On Tue, Sep 3, 2013 at 4:19 AM, jiangwen jiang  wrote:
> Hi, guys.
>
> Term collection frequency (which means how many times a particular term
> appears in all documents): does this data exist in Lucene 3.6.2?
>
> for example:
> doc1 contains terms: T1 T2 T3 T1 T1
> doc2 contains terms: T1 T4 T4
>
> T1 appears 4 times in all documents, so the term collection freq of T1 is 4
>
> Thanks for your help
> Regards

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189.patch

Patch adds per-field support. I currently do that by adding a boolean 
'isFieldUpdate' to SegWriteState, which is set to true only by 
ReaderAndLiveDocs. PerFieldDVF then peeks at that boolean and, if it's true, 
reads the format name from FieldInfo.attributes() instead of relying on 
Codec.getPerFieldDVF(). If we eventually gen FieldInfos, there won't be a 
need for this boolean, as PerFieldDVF will get that from FI.dvGen.
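
In pseudo-form, the read-side branch looks roughly like this (names
approximated from the description above, not taken from the patch):

{noformat}
// inside PerFieldDocValuesFormat's reader, per field:
final String formatName = state.isFieldUpdate
    ? fieldInfo.getAttribute(PER_FIELD_FORMAT_KEY)          // recorded at write time
    : getDocValuesFormatForField(fieldInfo.name).getName(); // normal path
{noformat}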

So far all Codecs work. I had to remove an assert from SimpleText which tested 
that all fields read from the file are in state.fieldInfos, but it doesn't use 
that information otherwise; it was only an assert. And SegCoreReader now 
passes each DVProducer only the fields it needs to read.

Added some tests too.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756494#comment-13756494
 ] 

Michael McCandless commented on LUCENE-5189:


Patch looks great!  I reviewed some of the remaining nocommits:

{quote}
// nocommit no one calls this method, why do we have it? and if we need it, do 
we need one for docValuesGen too?
public void setDelGen(long delGen) {
{quote}

Nuke it!  We only use .advanceNextWriteDelGen (and the patch adds this
for DVs too).

{quote}
// nocommit no one calls this, remove?
void clearDelGen() {
{quote}

Nuke it!

bq. class ReadersAndLiveDocs { // nocommit (RENAME) to ReaderAndUpdates?

+1 for ReaderAndUpdates

{quote}
// nocommit why do we do that, vs relying on TrackingDir.getCreatedFiles(),
// like we do for updates?
{quote}

That's a good question ... I'm not sure.  We in fact already use
TrackingDirWrapper (in ReadersAndLiveDocs.writeLiveDocs)... so we
could in theory record those files in SIPC and remove
LiveDocsFormat.files().  Maybe make this a TODO though?

{quote}
// nocommit: review!
final static int BYTES_PER_NUMERIC_UPDATE = BYTES_PER_DEL_TERM + 
2*RamUsageEstimator.NUM_BYTES_OBJECT_REF + RamUsageEstimator.NUM_BYTES_INT + 
RamUsageEstimator.NUM_BYTES_LONG;
{quote}

I think it makes sense to start from BYTES_PER_DEL_TERM, but then
instead of mapping to value Integer we map to value
Map<String,NumericUpdate>, whose per-Term RAM usage is something like:

{noformat}
  PTR (for LinkedHashMap, since it must link each entry to the next?)

  Map
HEADER
PTR (to array)
3 INT
1 FLOAT

  for each occupied Entry
PTR (from Map's entries array) * 2 (overhead for load factor)
HEADER
PTR * 2 (key, value)

String key
  HEADER
  INT
  PTR (to char[])
  
  ARRAY_HEADER + 2 * length-of-string (field)

NumericUpdate value
  HEADER
  PTR (to Term; ram already accounted for)
  PTR (to String; ram already accounted for)
  PTR (to Long value) + HEADER + 8 (long)
  INT
{noformat}

The thing is, this is so hairy ... that I think maybe we should
instead use RamUsageEstimator to "calibrate" this?  Ie, make a
standalone test that keeps adding Term + fields into this structure
and measures the RAM with RUE?  Do this on 32 bit and on 64 bit JVM,
and then conditionalize the constants.  You'll still need to add in
bytes according to field/term lengths...
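
A standalone calibration could look roughly like this (Long stands in for
NumericUpdate; the measured figure includes the Term keys' contents, whose
lengths would be subtracted out per field/term):

{noformat}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.index.Term;
import org.apache.lucene.util.RamUsageEstimator;

public class CalibrateUpdatesRam {
  public static void main(String[] args) {
    Map<Term,Map<String,Long>> m = new LinkedHashMap<Term,Map<String,Long>>();
    long empty = RamUsageEstimator.sizeOf(m);
    for (int i = 0; i < 1000; i++) {
      Map<String,Long> inner = new HashMap<String,Long>();
      inner.put("field", 1L);
      m.put(new Term("field", "term" + i), inner);
    }
    // average growth per outer entry on this JVM (32- vs 64-bit differs)
    long perEntry = (RamUsageEstimator.sizeOf(m) - empty) / 1000;
    System.out.println("~" + perEntry + " bytes per outer entry");
  }
}
{noformat}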

bq. +public class SegmentInfoPerCommit { // nocommit (RENAME) to SegmentCommit?

Not sure about that rename ... since this class is just the "metadata"
about a commit, not an "actual" commit.  Maybe SegmentCommitInfo?


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 792 - Failure!

2013-09-03 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/792/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9482 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/bin/java 
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=7732BB689312EA08 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -classpath 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.0.10.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-fileupload-1.2.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-lang-2.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/concurrentlinkedhashmap-lru-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/dom4j-1.6.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/guava-14.0.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-annotations-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-auth-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-common-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-hdfs-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/joda-time-2.2.jar:/Users/jenkins/wor
