Re: Joins Debug info is missing when we hit Filter Cache
Even if it's not production, Joins Debug info (similar to other components like mlt, highlight, facets, etc.) could be very useful with/without hitting caches. I can create a Jira ticket and write more details, but wanted to hear opinions first. Thanks & Regards, Kranti K Parisa http://www.linkedin.com/in/krantiparisa On Wed, Sep 4, 2013 at 2:24 AM, Kranti Parisa wrote: > I am sure this doesn't exist today, but just wondering about your thoughts. > > When we use Join queries (the first time, or without hitting the Filter Cache) and > say debug=true, we are able to see a good amount of debug info in the > response. > > Do we have any plans of supporting this debug info even when we hit the > Filter Cache? I believe that this information will be helpful with/without > hitting the caches. > > Consider this use case: in production, a request comes in and builds the > Filter Cache for a Join Query, and at some point we want to run that > query manually with debug turned on -- we can't see a bunch of very useful > stats/numbers. > > > Thanks & Regards, > Kranti K Parisa > http://www.linkedin.com/in/krantiparisa > >
Joins Debug info is missing when we hit Filter Cache
I am sure this doesn't exist today, but just wondering about your thoughts. When we use Join queries (the first time, or without hitting the Filter Cache) and say debug=true, we are able to see a good amount of debug info in the response. Do we have any plans of supporting this debug info even when we hit the Filter Cache? I believe that this information will be helpful with/without hitting the caches. Consider this use case: in production, a request comes in and builds the Filter Cache for a Join Query, and at some point we want to run that query manually with debug turned on -- we can't see a bunch of very useful stats/numbers. Thanks & Regards, Kranti K Parisa http://www.linkedin.com/in/krantiparisa
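For anyone who wants to reproduce the scenario, a minimal SolrJ 4.x sketch; the core URL, field names, and join condition below are made up for illustration:

{code}
// Sketch: run a join query with debugQuery=true (SolrJ 4.x).
// The URL, fields, and join condition are hypothetical.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JoinDebugDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("{!join from=parent_id to=id}color:red");
    q.set("debugQuery", "true");
    QueryResponse rsp = server.query(q);
    // First (cold) run: the join debug info is present here.
    // Re-run once the join result is cached and the join details are gone --
    // the behavior reported in this thread.
    System.out.println(rsp.getDebugMap());
    server.shutdown();
  }
}
{code}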
[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757487#comment-13757487 ] Tommaso Teofili commented on SOLR-5201: --- bq. unless the AnalysisEngine constructor/factory/provider does something special to keep track of them, they won't know anything about each other ok, so I think we can go with the second option of having the _UIMAUpdateRequestProcessorFactory_ serve the _AnalysisEngine_ to _UIMAUpdateRequestProcessors_. I'll post a patch later today. > UIMAUpdateRequestProcessor should reuse the AnalysisEngine > -- > > Key: SOLR-5201 > URL: https://issues.apache.org/jira/browse/SOLR-5201 > Project: Solr > Issue Type: Improvement > Components: contrib - UIMA >Affects Versions: 4.4 >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.5, 5.0 > > Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, > SOLR-5201-ae-cache-only-single-request_branch_4x.patch > > > As reported in http://markmail.org/thread/2psiyl4ukaejl4fx > UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, > which is bad for performance; therefore it'd be nice if such AEs could be > reused whenever that's possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
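For readers following along, the second option amounts to the factory lazily creating one AnalysisEngine and handing it to every processor. A rough sketch -- class names and wiring are illustrative, not the actual patch, and it deliberately ignores the thread-safety of AnalysisEngine.process(), which the real patch has to address:

{code}
// Sketch of option 2: the factory owns a single AnalysisEngine shared by all
// UIMAUpdateRequestProcessors. Names and wiring are illustrative only.
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;

public class SharedAEFactory {
  private final AnalysisEngineDescription description; // from the configured descriptor
  private AnalysisEngine ae;

  public SharedAEFactory(AnalysisEngineDescription description) {
    this.description = description;
  }

  public synchronized AnalysisEngine getAnalysisEngine() throws Exception {
    if (ae == null) {
      ae = UIMAFramework.produceAnalysisEngine(description); // created once, reused
    }
    return ae;
  }
}
{code}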
[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable
[ https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757468#comment-13757468 ] asif commented on SOLR-4956: Just to put things in perspective - I ran a few more tests on our setup. We buffer up to 1000 documents before we post to the cloud every 5-10 seconds. On average I notice 2-4 times higher CPU on the replicas when maxBufferedAddsPerServer is set to 10. The fact that 1000 documents are sent as 100 different requests of 10 documents each might explain the higher load on the replicas. When we alter it to about 1000, CPU usage is more or less in line. > make maxBufferedAddsPerServer configurable > -- > > Key: SOLR-4956 > URL: https://issues.apache.org/jira/browse/SOLR-4956 > Project: Solr > Issue Type: Improvement >Affects Versions: 4.3, 5.0 >Reporter: Erick Erickson > > Anecdotal user's list evidence indicates that in high-throughput situations, > the default of 10 docs/batch for inter-shard batching can generate > significant CPU load. See the thread titled "Sharding and Replication" on > June 19th, but the gist is below. > I haven't poked around, but it's a little surprising on the surface that Asif > is seeing this kind of difference. So I'm wondering if this change indicates > some other underlying issue. Regardless, this seems like it would be good to > investigate. > Here's the gist of Asif's experience from the thread: > It's a completely practical problem - we are exploring Solr to build a real > time analytics/data solution for a system handling about 1000 qps. We have > various metrics that are stored as different collections on the cloud, > which means very high amount of writes. The cloud also needs to support > about 300-400 qps. > We initially tested with a single Solr node on a 16 core / 24 GB box for a > single metric. We saw that writes were not an issue at all - Solr was > handling it extremely well. We were also able to achieve about 200 qps from > a single node. > When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU > usage on the replicas. Up to 10 cores were getting used for writes on the > replicas. Hence my concern with respect to batch updates for the replicas. > BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is > very similar to single node installation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
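The request arithmetic behind that observation, as a sketch:

{code}
// Sketch: how a 1000-document buffer is split into inter-shard requests.
public class BatchMath {
  public static void main(String[] args) {
    int bufferedDocs = 1000;
    for (int maxBufferedAddsPerServer : new int[] {10, 1000}) {
      // ceiling division: number of requests per flush
      int requests = (bufferedDocs + maxBufferedAddsPerServer - 1) / maxBufferedAddsPerServer;
      // 10   -> 100 requests per flush (the reported 2-4x replica CPU case)
      // 1000 -> 1 request per flush (CPU roughly in line with a single node)
      System.out.println(maxBufferedAddsPerServer + " docs/request => " + requests + " requests");
    }
  }
}
{code}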
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757466#comment-13757466 ] Hoss Man commented on SOLR-2548: bq. This bit in SimpleFacets.getFacetFieldCounts bothers me: ... bq. It seems like if the user doesn't specify anything for FACET_THREADS, they wind up spawning as many threads as there are facet fields specified I haven't reviewed the patch, but based on the snippet you posted i suspect you are reading that bit correctly. If FACET_THREADS isn't specified, or if it's specified and equals the default value of 0, then the directExecutor is used and _no_ threads should be spawned at all -- the value of maxThreads shouldn't matter at that point, instead the existing request thread should process all of them sequentially. I'm guessing you should change the patch back. Side comments... 1) Something sketchy probably does happen if a user passes in a negative value -- it looks like that's the case when facetExecutor will be used with an unlimited number of threads ... that may actually have been intentional -- that if you say facet.threads=-1 every facet.field should get its own thread, no matter how many there are, but if that is intentional i'd love to see a comment there making that obvious. (and a test showing that it works). 2) can you please fix that Integer.parseInt(..."0")) to just use params.getInt(...,0) ... that way the correct error message will be returned if it's not an int (and it's easier to read) > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
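Putting Hoss's two side comments together, the snippet would become something like this (a sketch, not committed code):

{code}
// Sketch combining both suggestions: params.getInt gives a proper error
// message on non-integers, 0 keeps the direct (same-thread) executor as the
// default, and negative values keep meaning "unbounded threads".
int maxThreads = req.getParams().getInt(FacetParams.FACET_THREADS, 0);
Executor executor = maxThreads == 0 ? directExecutor : facetExecutor;
maxThreads = maxThreads < 0 ? Integer.MAX_VALUE : maxThreads;
{code}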
Re: term collection frequence in lucene 3.6.2?
Thanks, Mike 2013/9/3 Michael McCandless > 3.6.x doesn't track this statistic, but 4.x does: > TermsEnum.totalTermFreq(). > > In 3.6.x you could visit every doc, summing up the .freq(), but this is > slowish. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Sep 3, 2013 at 4:19 AM, jiangwen jiang > wrote: > > Hi, guys. > > > > Term collection frequency (which means how many times a particular term > > appears in all documents) -- does this data exist in Lucene 3.6.2? > > > > for example: > > doc1 contains terms: T1 T2 T3 T1 T1 > > doc2 contains terms: T1 T4 T4 > > > > > > T1 appears 4 times in all documents, so the term collection freq of T1 is 4 > > > > Thanks for your help > > Regards > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
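Both approaches Mike describes, sketched; the field and term names are made up, and the two halves of course target different codebases:

{code}
// Lucene 4.x: the statistic is tracked by the index.
Terms terms = MultiFields.getTerms(reader, "body");
TermsEnum te = terms.iterator(null);
if (te.seekExact(new BytesRef("t1"))) {
  long ttf = te.totalTermFreq(); // 4 for T1 in the example above
}

// Lucene 3.6.x: visit every doc and sum freq() -- correct but slow.
TermDocs td = reader36.termDocs(new Term("body", "t1"));
long ttf36 = 0;
while (td.next()) {
  ttf36 += td.freq();
}
{code}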
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757419#comment-13757419 ] ASF subversion and git services commented on LUCENE-3069: - Commit 1519909 from [~billy] in branch 'dev/branches/lucene3069' [ https://svn.apache.org/r1519909 ] LUCENE-3069: javadocs > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds a FST from the entire term > not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5200) HighFreqTerms has confusing behavior with -t option
Robert Muir created LUCENE-5200: --- Summary: HighFreqTerms has confusing behavior with -t option Key: LUCENE-5200 URL: https://issues.apache.org/jira/browse/LUCENE-5200 Project: Lucene - Core Issue Type: Bug Components: modules/other Reporter: Robert Muir {code} * HighFreqTerms class extracts the top n most frequent terms * (by document frequency) from an existing Lucene index and reports their * document frequency. * * If the -t flag is given, both document frequency and total tf (total * number of occurrences) are reported, ordered by descending total tf. {code} Problem #1: It's tricky what happens with -t: if you ask for the top-100 terms, it requests the top-100 terms (by docFreq), then resorts the top-N by totalTermFreq. So it's not really the top 100 most frequently occurring terms. Problem #2: Using the -t option can be confusing and slow: the reported docFreq includes deletions, but totalTermFreq does not (it actually walks postings lists if there is even one deletion). I think this is a relic from 3.x days when Lucene did not support this statistic. I think we should just always output both TermsEnum.docFreq() and TermsEnum.totalTermFreq(), and -t just determines the comparator of the PQ. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
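The proposed fix in code form, roughly -- a sketch of the comparator switch, not the eventual patch:

{code}
// Sketch: always carry both stats; -t only changes the PQ's sort order.
import java.util.Comparator;
import java.util.PriorityQueue;

class TermStats {
  final String term;
  final int docFreq;        // TermsEnum.docFreq(), includes deleted docs
  final long totalTermFreq; // TermsEnum.totalTermFreq()

  TermStats(String term, int docFreq, long totalTermFreq) {
    this.term = term; this.docFreq = docFreq; this.totalTermFreq = totalTermFreq;
  }

  static PriorityQueue<TermStats> newQueue(int topN, final boolean byTotalTermFreq) {
    // min-heap: the smallest term by the chosen stat is evicted first
    return new PriorityQueue<TermStats>(topN, new Comparator<TermStats>() {
      @Override public int compare(TermStats a, TermStats b) {
        return byTotalTermFreq
            ? Long.compare(a.totalTermFreq, b.totalTermFreq)   // -t given
            : Integer.compare(a.docFreq, b.docFreq);           // default
      }
    });
  }
}
{code}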
[jira] [Updated] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2548: - Attachment: SOLR-2548.patch Latest version that does two things: 1> does the max thread change I commented on earlier. 2> puts in some checking to ensure that if multiple threads try to uninvert the same field at the same time, it'll only be loaded once. I used a simple wait/sleep loop here since this method is called from several places and it looks like a real pain to try to do a Future or whatever. > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch, SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
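The load-once guard described in point 2, as a standalone sketch of the pattern; the cache and uninvert calls are placeholders, not the patch code:

{code}
// Sketch: only one thread uninverts a given field; concurrent callers wait
// and then pick the result up from the cache.
import java.util.HashSet;
import java.util.Set;

abstract class UninvertOnce {
  private final Set<String> inProgress = new HashSet<String>();

  Object getUninverted(String field) throws InterruptedException {
    synchronized (inProgress) {
      while (inProgress.contains(field)) {
        inProgress.wait();                 // someone else is loading it
      }
      Object cached = cacheGet(field);
      if (cached != null) return cached;   // loaded while we waited
      inProgress.add(field);               // we do the loading
    }
    try {
      Object loaded = uninvert(field);     // the expensive part, outside the lock
      cachePut(field, loaded);
      return loaded;
    } finally {
      synchronized (inProgress) {
        inProgress.remove(field);
        inProgress.notifyAll();
      }
    }
  }

  // placeholders for the real cache and uninversion logic
  abstract Object cacheGet(String field);
  abstract void cachePut(String field, Object value);
  abstract Object uninvert(String field);
}
{code}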
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757323#comment-13757323 ] Erick Erickson commented on SOLR-2548: -- This bit in SimpleFacets.getFacetFieldCounts bothers me: int maxThreads = Integer.parseInt(req.getParams().get(FacetParams.FACET_THREADS, "0")); Executor executor = maxThreads == 0 ? directExecutor : facetExecutor; maxThreads = maxThreads <= 0? Integer.MAX_VALUE : maxThreads; It seems like if the user doesn't specify anything for FACET_THREADS, they wind up spawning as many threads as there are facet fields specified. Probably not a real problem given this list will be fairly small, but it seems more true to the old default behavior if it's changed to something like int maxThreads = Integer.parseInt(req.getParams().get(FacetParams.FACET_THREADS, "1")); Executor executor = maxThreads == 1 ? directExecutor : facetExecutor; maxThreads = maxThreads < 1 ? Integer.MAX_VALUE : maxThreads; Or am I seeing things that aren't there? > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Assignee: Erick Erickson >Priority: Minor > Labels: facet > Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, > SOLR-2548.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
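For readers without the source handy, the directExecutor in that snippet is the usual run-on-the-calling-thread executor, i.e. roughly:

{code}
// The "direct" executor runs each task on the calling thread, so choosing it
// means no extra threads regardless of the later maxThreads value.
import java.util.concurrent.Executor;

static final Executor directExecutor = new Executor() {
  @Override
  public void execute(Runnable task) {
    task.run();
  }
};
{code}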
[jira] [Resolved] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter
[ https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-5212. Resolution: Not A Problem Naomi: if you have questions/confusion/problems using Solr, please ask on the solr-user mailing list and only file Bugs once there is confirmation of a problem in solr itself. In particular your initial report is confusing for a few reasons... 1) you mentioned the value of "qs" is being set based on the number of bigrams -- however there isn't anything in your comments to suggest anything even remotely related to the "qs" param is coming into play here. "qs" specifies the query slop property of any phrase queries created due to explicit phrase queries in the input query string -- nothing in your example input or example debug output suggests any PhraseQueries are ever getting built. 2) the number you seem to be commenting on in each case is the minNrShouldMatch on each of the top-level BooleanQueries produced from your input -- since your configured mm is {{6<-1 6<90%}} the smallest minNrShouldMatch value that will ever be programmatically assigned is "6", but all of your example queries have fewer than 6 clauses, so instead the minNrShouldMatch used in each case is the total number of query clauses -- i.e.: in each case, where you have N "SHOULD" clauses in the final query, all N clauses must match. --- Please start a thread on the solr-user mailing list, providing all of the details you included in this issue, along with some specifics about what you expect/desire to have happen and how the actual behavior you are observing differs from those expectations. > bad qs and mm when using edismax for field with CJKBigramFilter > > > Key: SOLR-5212 > URL: https://issues.apache.org/jira/browse/SOLR-5212 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 4.4 >Reporter: Naomi Dushay >Priority: Critical > > When I have a field using CJKBigramFilter, a mysterious qs value (or what I > take as qs, because it shows as ~x after the first DisjunctionMaxQuery) > appears in my parsed query. The qs value that appears is the minimum of: > mm setting, number of bigrams in query string. > This makes no sense, from a retrieval standpoint. It could possibly make > sense to adjust the ps value, but certainly not the qs. Moreover, changing > the mm setting via an HTTP param can affect the qs, but sending in a qs > parameter has no effect on the qs in the parsed query. > If I use a field in qf that has only bigrams, then qs is set to MIN(original > mm setting, number of bigrams in query string) > arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说 >旧小说 is 3 chars, so 2 bigrams > debugQuery > {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 > {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 > (+DisjunctionMaxQuerycjk_bi_search:旧小 > cjk_bi_search:小说)~2))~0.01) ())/no_coord > +(((cjk_bi_search:旧小 > cjk_bi_search:小说)~2))~0.01 () > If I use a field in qf that has only unigrams, then qs is set to MIN(original > mm setting, number of unigrams in query string) > arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说 >旧小说 is 3 chars, so 3 unigrams > debugQuery > {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 > {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 > (+DisjunctionMaxQuerycjk_uni_search:旧 > cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord > +(((cjk_uni_search:旧 cjk_uni_search:小 > cjk_uni_search:说)~3))~0.01 () > If I use a field in qf that has both bigrams and unigrams, then qs is set to > MIN(original mm setting, number of bigrams + unigrams in query string). 
> arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说 >旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5 > debugQuery > {!qf=cjk_both_pub_search pf= pf2= > pf3=}旧小说 > {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 > (+DisjunctionMaxQuerycjk_both_search:旧 > cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 > cjk_both_search:说)~5))~0.01) ())/no_coord > +(((cjk_both_search:旧 > cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 > cjk_both_search:说)~5))~0.01 () > I am running Solr 4.4. I have fields defined like so: > positionIncrementGap="1" autoGeneratePhraseQueries="false"> > > > > id="Traditional-Simplified"/> > id="Katakana-Hiragana"/> > > hiragana="true" katakana="true" hangul="true" outputUnigrams="true" /> > > > positionIncrementGap="1" autoGeneratePhraseQueries="false"> > > > > id="Traditional-Simplified"/> > id="Katakana-Hiragana"/> > > hiragana="true" katakana="true" hangul="true" outputUnigrams="false" /> >
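Hoss's point 2 as arithmetic -- my reading of how Solr resolves mm=6<-1 6<90%, so treat the rounding details as a sketch:

{code}
// Sketch: resolving minNrShouldMatch for mm = "6<-1 6<90%".
// Conditionals only apply when there are MORE than 6 optional clauses; at or
// below 6, every SHOULD clause is required -- hence the ~2, ~3 and ~5 seen in
// the debug output for the 2-, 3- and 5-clause queries above.
static int minShouldMatch(int optionalClauses) {
  if (optionalClauses <= 6) {
    return optionalClauses;          // all clauses required
  }
  // above 6 the last matching conditional wins: 90%, truncated
  return (optionalClauses * 90) / 100;
}
{code}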
Solr 5.0 roadmap added to wiki
I added a roadmap section to the Solr5.0 page. At this moment I can only think of one major thing we are planning for the 5.0 release, and I put it on there. https://wiki.apache.org/solr/Solr5.0 If that should be a separate page rather than part of the main 5.0 page, I'm perfectly OK with it being changed. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter
[ https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naomi Dushay updated SOLR-5212: --- Description: When I have a field using CJKBigramFilter, a mysterious qs value (or what I take as qs, because it shows as ~x after the first DisjunctionMaxQuery) appears in my parsed query. The qs value that appears is the minimum of: mm setting, number of bigrams in query string. This makes no sense, from a retrieval standpoint. It could possibly make sense to adjust the ps value, but certainly not the qs. Moreover, changing the mm setting via an HTTP param can affect the qs, but sending in a qs parameter has no effect on the qs in the parsed query. If I use a field in qf that has only bigrams, then qs is set to MIN(original mm setting, number of bigrams in query string) arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 2 bigrams debugQuery {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01) ())/no_coord +(((cjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01 () If I use a field in qf that has only unigrams, then qs is set to MIN(original mm setting, number of unigrams in query string) arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams debugQuery {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord +(((cjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01 () If I use a field in qf that has both bigrams and unigrams, then qs is set to MIN(original mm setting, number of bigrams + unigrams in query string). arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5 debugQuery {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01) ())/no_coord +(((cjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01 () I am running Solr 4.4. I have fields defined like so: The request handler uses edismax: edismax *:* 6<-1 6<90% 1 0 was: When I have a field using CJKBigramFilter, a mysterious qs value (or what I take as qs, because it shows as ~x after the first DisjunctionMaxQuery) appears in my parsed query. The qs value that appears is the minimum of: mm setting, number of bigrams in query string. This makes no sense, from a retrieval standpoint. It could possibly make sense to adjust the ps value, but certainly not the qs. 
If I use a field in qf that has only bigrams, then qs is set to MIN(original mm setting, number of bigrams in query string) arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 2 bigrams debugQuery {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01) ())/no_coord +(((cjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01 () If I use a field in qf that has only unigrams, then qs is set to MIN(original mm setting, number of unigrams in query string) arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams debugQuery {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord +(((cjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01 () If I use a field in qf that has both bigrams and unigrams, then qs is set to MIN(original mm setting, number of bigrams + unigrams in query string). arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5 debugQuery {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01) ())/no_coord +(((cjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01 () I am running Solr 4.4. I have fields defined like so:
[jira] [Updated] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter
[ https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naomi Dushay updated SOLR-5212: --- Description: When I have a field using CJKBigramFilter, a mysterious qs value (or what I take as qs, because it shows as ~x after the first DisjunctionMaxQuery) appears in my parsed query. The qs value that appears is the minimum of: mm setting, number of bigrams in query string. This makes no sense, from a retrieval standpoint. It could possibly make sense to adjust the ps value, but certainly not the qs. If I use a field in qf that has only bigrams, then qs is set to MIN(original mm setting, number of bigrams in query string) arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 2 bigrams debugQuery {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01) ())/no_coord +(((cjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01 () If I use a field in qf that has only unigrams, then qs is set to MIN(original mm setting, number of unigrams in query string) arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams debugQuery {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord +(((cjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01 () If I use a field in qf that has both bigrams and unigrams, then qs is set to MIN(original mm setting, number of bigrams + unigrams in query string). arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5 debugQuery {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01) ())/no_coord +(((cjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01 () I am running Solr 4.4. I have fields defined like so: The request handler uses edismax: edismax *:* 6<-1 6<90% 1 0 was: When I have a field using CJKBigramFilter, a mysterious qs value appears in my parsed query. The qs value that appears is the minimum of: mm setting, number of bigrams in query string. If I use a field in qf that has only bigrams, then qs is set to MIN(original mm setting, number of bigrams in query string) arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 2 bigrams debugQuery {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01) ())/no_coord +(((cjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01 () If I use a field in qf that has only unigrams, then qs is set to MIN(original mm setting, number of unigrams in query string) arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams debugQuery {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord +(((cjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01 () If I use a field in qf that has both bigrams and unigrams, then qs is set to MIN(original mm setting, number of bigrams + unigrams in query string). 
arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5 debugQuery {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01) ())/no_coord +(((cjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01 () I am running Solr 4.4. I have fields defined like so: The request handler uses edismax: edismax *:* 6<-1 6<90% 1 0 > bad qs and mm when using edismax for field with CJKBigramFilter > > >
[jira] [Created] (SOLR-5212) bad qs and mm when using edismax for field with CJKBigramFilter
Naomi Dushay created SOLR-5212: -- Summary: bad qs and mm when using edismax for field with CJKBigramFilter Key: SOLR-5212 URL: https://issues.apache.org/jira/browse/SOLR-5212 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.4 Reporter: Naomi Dushay Priority: Critical When I have a field using CJKBigramFilter, a mysterious qs value appears in my parsed query. The qs value that appears is the minimum of: mm setting, number of bigrams in query string. If I use a field in qf that has only bigrams, then qs is set to MIN(original mm setting, number of bigrams in query string) arg sent in:q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 2 bigrams debugQuery {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 {!qf=cjk_bi_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01) ())/no_coord +(((cjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01 () If I use a field in qf that has only unigrams, then qs is set to MIN(original mm setting, number of unigrams in query string) arg sent in:q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams debugQuery {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 {!qf=cjk_uni_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord +(((cjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01 () If I use a field in qf that has both bigrams and unigrams, then qs is set to MIN(original mm setting, number of bigrams + unigrams in query string). arg sent in:q={!qf=cjk_both_search pf= pf2= pf3=}旧小说 旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5 debugQuery {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 {!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说 (+DisjunctionMaxQuerycjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01) ())/no_coord +(((cjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01 () I am running Solr 4.4. I have fields defined like so: The request handler uses edismax: edismax *:* 6<-1 6<90% 1 0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4909) Solr and IndexReader Re-opening on Replication Slave
[ https://issues.apache.org/jira/browse/SOLR-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Garski updated SOLR-4909: - Attachment: SOLR-4909_confirm_keys.patch I've updated the patch to include the initial directory opened via the core's indexReaderFactory & included a test that verifies the value of the core cache key's hash code after a commit. > Solr and IndexReader Re-opening on Replication Slave > > > Key: SOLR-4909 > URL: https://issues.apache.org/jira/browse/SOLR-4909 > Project: Solr > Issue Type: Improvement > Components: replication (java), search >Affects Versions: 4.3 >Reporter: Michael Garski > Fix For: 4.5, 5.0 > > Attachments: SOLR-4909_confirm_keys.patch, SOLR-4909-demo.patch, > SOLR-4909_fix.patch, SOLR-4909.patch, SOLR-4909_v2.patch, SOLR-4909_v3.patch > > > I've been experimenting with caching filter data per segment in Solr using a > CachingWrapperFilter & FilteredQuery within a custom query parser (as > suggested by [~yo...@apache.org] in SOLR-3763) and encountered situations > where the value of getCoreCacheKey() on the AtomicReader for each segment can > change for a given segment on disk when the searcher is reopened. As > CachingWrapperFilter uses the value of the segment's getCoreCacheKey() as the > key in the cache, there are situations where the data cached on that segment > is not reused when the segment on disk is still part of the index. This > affects the Lucene field cache and field value caches as well, since they are > cached per segment. > When Solr first starts it opens the searcher's underlying DirectoryReader in > StandardIndexReaderFactory.newReader by calling > DirectoryReader.open(indexDir, termInfosIndexDivisor), and the reader is > subsequently reopened in SolrCore.openNewSearcher by calling > DirectoryReader.openIfChanged(currentReader, writer.get(), true). The act of > reopening the reader with the writer when it was first opened without a > writer results in the value of getCoreCacheKey() changing on each of the > segments even though some of the segments have not changed. Depending on the > role of the Solr server, this has different effects: > * On a SolrCloud node or free-standing index and search server the segment > cache is invalidated during the first DirectoryReader reopen - subsequent > reopens use the same IndexWriter instance and as such the value of > getCoreCacheKey() on each segment does not change so the cache is retained. > * For a master-slave replication setup the segment cache invalidation occurs > on the slave during every replication as the index is reopened using a new > IndexWriter instance which results in the value of getCoreCacheKey() changing > on each segment when the DirectoryReader is reopened using a different > IndexWriter instance. > I can think of a few approaches to alter the re-opening behavior to allow > reuse of segment level caches in both cases, and I'd like to get some input > on other ideas before digging in: > * To change the cloud node/standalone first commit issue it might be possible > to create the UpdateHandler and IndexWriter before the DirectoryReader, and > use the writer to open the reader. There is a comment in the SolrCore > constructor by [~yo...@apache.org] that the searcher should be opened before > the update handler so that may not be an acceptable approach. > * To change the behavior of a slave in a replication setup, one solution > would be to not open a writer from the SnapPuller when the new index is > retrieved if the core is enabled as a slave only. 
The writer is needed on a > server configured as a master & slave that is functioning as a replication > repeater so downstream slaves can see the changes in the index and retrieve > them. > I'll attach a unit test that demonstrates the behavior of reopening the > DirectoryReader and its effects on the value of getCoreCacheKey. My > assumption is that the behavior of Lucene during the various reader reopen > operations is correct and that the changes are necessary on the Solr side of > things. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
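The first approach Michael lists, sketched against the 4.x API (directory and config setup elided):

{code}
// Sketch: open the first DirectoryReader from the IndexWriter so later
// openIfChanged calls share segment readers (stable getCoreCacheKey()).
IndexWriter writer = new IndexWriter(dir, iwc);
DirectoryReader reader = DirectoryReader.open(writer, true); // NRT open
// ... after updates/commits:
DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer, true);
if (newReader != null) {               // null means nothing changed
  reader.close();
  reader = newReader;                  // unchanged segments keep their cache keys
}
{code}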
[jira] [Commented] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates
[ https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757113#comment-13757113 ] ASF subversion and git services commented on SOLR-5206: --- Commit 1519858 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1519858 ] SOLR-5206: Fixed OpenExchangeRatesOrgProvider to use refreshInterval correctly > OpenExchangeRatesOrgProvider never refreshes the rates > -- > > Key: SOLR-5206 > URL: https://issues.apache.org/jira/browse/SOLR-5206 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.4 >Reporter: Catalin >Priority: Critical > Fix For: 4.4 > > Attachments: fixRefresh.patch, SOLR-5206.patch > > > The OpenExchangeRatesOrgProvider never reloads the rates after the initial > load, no matter what refreshInterval is set. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates
[ https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-5206. Resolution: Fixed Fix Version/s: (was: 4.4) 5.0 4.5 Assignee: Hoss Man Thanks for reporting this Catalin. > OpenExchangeRatesOrgProvider never refreshes the rates > -- > > Key: SOLR-5206 > URL: https://issues.apache.org/jira/browse/SOLR-5206 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.4 >Reporter: Catalin >Assignee: Hoss Man >Priority: Critical > Fix For: 4.5, 5.0 > > Attachments: fixRefresh.patch, SOLR-5206.patch > > > The OpenExchangeRatesOrgProvider never reloads the rates after the initial > load, no matter what refreshInterval is set. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates
[ https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757137#comment-13757137 ] ASF subversion and git services commented on SOLR-5206: --- Commit 1519865 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1519865 ] SOLR-5206: Fixed OpenExchangeRatesOrgProvider to use refreshInterval correctly (merge r1519858) > OpenExchangeRatesOrgProvider never refreshes the rates > -- > > Key: SOLR-5206 > URL: https://issues.apache.org/jira/browse/SOLR-5206 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.4 >Reporter: Catalin >Priority: Critical > Fix For: 4.4 > > Attachments: fixRefresh.patch, SOLR-5206.patch > > > The OpenExchangeRatesOrgProvider never reloads the rates after the initial > load, no matter what refreshInterval is set. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
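Schematically, what the fix in those commits has to guarantee -- field names are illustrative, not the committed patch:

{code}
// Sketch of the intended behavior: check the interval on access and reload
// the rates once it has elapsed, the path the broken code never took.
private long lastReloadMillis;
private long refreshIntervalMillis; // derived from the refreshInterval setting

synchronized void reloadIfExpired() {
  long now = System.currentTimeMillis();
  if (now - lastReloadMillis > refreshIntervalMillis) {
    reload();                 // re-fetch rates from openexchangerates.org
    lastReloadMillis = now;
  }
}
{code}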
[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
[ https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757111#comment-13757111 ] Robert Muir commented on LUCENE-5188: - nice idea! > Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors > --- > > Key: LUCENE-5188 > URL: https://issues.apache.org/jira/browse/LUCENE-5188 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5188.patch > > > The way CompressingStoredFieldsFormat works is that it first decompresses > data and then consults the StoredFieldVisitor. This is a bit wasteful in case > documents are big and only the first field of a document is of interest, so > maybe we could decompress and consult the StoredFieldVisitor in a more > streaming fashion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757109#comment-13757109 ] Dawid Weiss commented on LUCENE-5197: - > And from a big O perspective, this might be just fine. He, he. From a big-O perspective the cost of RUE vs. custom code is negligible ;) Anyway, fine -- I still think it'd be a nice addition to RUE to allow selective counting (to exclude certain fields from the equation) and it'd be a perfect use case to apply here. But it can be used in other places (like in tests to count static memory usage held by a class, etc.). > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
[ https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-5188: - Attachment: LUCENE-5188.patch Here is a patch that slices large chunks (>= twice the configured chunk size) into several LZ4 blocks (of chunkSize bytes each). The LZ4 blocks will be decompressed as needed so that you don't end up decompressing everything if you only need the first field of your document. A nice side-effect of this patch is that it reduces memory pressure as well when working with big documents (LUCENE-4955): since big documents are sliced into fixed-size blocks, it is no longer necessary to allocate a byte[] the size of the document (potentially several MB) to decompress it. > Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors > --- > > Key: LUCENE-5188 > URL: https://issues.apache.org/jira/browse/LUCENE-5188 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5188.patch > > > The way CompressingStoredFieldsFormat works is that it first decompresses > data and then consults the StoredFieldVisitor. This is a bit wasteful in case > documents are big and only the first field of a document is of interest, so > maybe we could decompress and consult the StoredFieldVisitor in a more > streaming fashion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
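The slicing idea, schematically; the helper names are invented, and the real logic lives in CompressingStoredFieldsReader:

{code}
// Schematic: a large document is stored as a sequence of chunkSize-byte LZ4
// blocks; blocks are decompressed only as the StoredFieldVisitor asks for
// more, so a visitor that stops after the first field skips the rest.
byte[] block = new byte[chunkSize];
int remaining = docLength;
while (remaining > 0 && visitorWantsMore(visitor)) {   // invented helpers
  int len = Math.min(chunkSize, remaining);
  decompressBlock(in, block, len);                     // one LZ4 block
  feedFields(visitor, block, len);                     // visitor may stop early
  remaining -= len;
}
{code}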
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757041#comment-13757041 ] Kranti Parisa commented on SOLR-4465: - Has anyone applied this patch on the 4.4 branch? > Configurable Collectors > --- > > Key: SOLR-4465 > URL: https://issues.apache.org/jira/browse/SOLR-4465 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.1 >Reporter: Joel Bernstein > Fix For: 4.5, 5.0 > > Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch > > > This ticket provides a patch to add pluggable collectors to Solr. This patch > was generated and tested with Solr 4.1. > This is how the patch functions: > Collectors are plugged into Solr in the solrconfig.xml using the new > collectorFactory element. For example: > > > The elements above define two collector factories. The first one is the > "default" collectorFactory. The class attribute points to > org.apache.solr.handler.component.CollectorFactory, which implements logic > that returns the default TopScoreDocCollector and TopFieldCollector. > To create your own collectorFactory you must subclass the default > CollectorFactory and at a minimum override the getCollector method to return > your new collector. > The parameter "cl" turns on pluggable collectors: > cl=true > If cl is not in the parameters, Solr will automatically use the default > collectorFactory. > *Pluggable Doclist Sorting With the Docs Collector* > You can specify two types of pluggable collectors. The first type is the docs > collector. For example: > cl.docs= > The above param points to a named collectorFactory in the solrconfig.xml to > construct the collector. The docs collectorFactories must return a collector > that extends the TopDocsCollector base class. Docs collectors are responsible > for collecting the doclist. > You can specify only one docs collector per query. > You can pass parameters to the docs collector using local params syntax. For > example: > cl.docs=\{! sort=mycustomesort\}mycollector > If cl=true and a docs collector is not specified, Solr will use the default > collectorFactory to create the docs collector. > *Pluggable Custom Analytics With Delegating Collectors* > You can also specify any number of custom analytic collectors with the > "cl.analytic" parameter. Analytic collectors are designed to collect > something else besides the doclist. Typically this would be some type of > custom analytic. For example: > cl.analytic=sum > The parameter above specifies an analytic collector named sum. Like the docs > collectors, "sum" points to a named collectorFactory in the solrconfig.xml. > You can specify any number of analytic collectors by adding additional > cl.analytic parameters. > Analytic collector factories must return Collector instances that extend > DelegatingCollector. > A sample analytic collector is provided in the patch through the > org.apache.solr.handler.component.SumCollectorFactory. > This collectorFactory provides a very simple DelegatingCollector that groups > by a field and sums a column of floats. The sum collector is not designed to > be a fully functional sum function but to be a proof of concept for pluggable > analytics through delegating collectors. 
> You can send parameters to analytic collectors with solr local param syntax. > For example: > cl.analytic=\{! id=1 groupby=field1 column=field2\}sum > The "id" parameter is mandatory for analytic collectors and is used to > identify the output from the collector. In this example the "groupby" and > "column" params tell the sum collector which field to group by and sum. > Analytic collectors are passed a reference to the ResponseBuilder and can > place maps with analytic output directly into the SolrQueryResponse with the > add() method. > Maps that are placed in the SolrQueryResponse are automatically added to the > outgoing response. The response will include a list named cl.analytic.<id>, > where id is specified in the local param. > *Distributed Search* > The CollectorFactory also has a method called merge(). This method aggregates > the results from each of the shards during distributed search. The "default" > CollectorFactory implements the default merge logic for merging documents > from each shard. If you define a different docs collector you can override > the default merge method to merge documents in accordance with how they are > collected at the shard level. > With analytic collectors, you'll need to override the merge method
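The SumCollectorFactory described above suggests a delegating collector of roughly this shape; the grouping is elided, the field name is taken from the example params, and the class wiring is illustrative rather than the patch code. The FieldCache calls match the Lucene 4.x API:

{code}
// Sketch: forwards every hit to the wrapped collector while summing a float
// field -- the doclist is still collected normally by the delegate.
import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

public class SumCollector extends Collector {
  private final Collector delegate;
  private FieldCache.Floats values;
  private double sum;

  public SumCollector(Collector delegate) { this.delegate = delegate; }

  @Override public void setScorer(Scorer scorer) throws IOException {
    delegate.setScorer(scorer);
  }

  @Override public void setNextReader(AtomicReaderContext context) throws IOException {
    values = FieldCache.DEFAULT.getFloats(context.reader(), "field2", false);
    delegate.setNextReader(context);
  }

  @Override public void collect(int doc) throws IOException {
    sum += values.get(doc);   // the analytic (per-group accumulation elided)
    delegate.collect(doc);
  }

  @Override public boolean acceptsDocsOutOfOrder() {
    return delegate.acceptsDocsOutOfOrder();
  }

  public double getSum() { return sum; }
}
{code}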
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757058#comment-13757058 ] Joel Bernstein commented on SOLR-4465: -- This ticket has been split into smaller tickets with a different design. See the related issues for more info. > Configurable Collectors > --- > > Key: SOLR-4465 > URL: https://issues.apache.org/jira/browse/SOLR-4465 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 4.1 >Reporter: Joel Bernstein > Fix For: 4.5, 5.0 > > Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, > SOLR-4465.patch, SOLR-4465.patch > > > This ticket provides a patch to add pluggable collectors to Solr. This patch > was generated and tested with Solr 4.1. > This is how the patch functions: > Collectors are plugged into Solr in the solrconfig.xml using the new > collectorFactory element. For example: > > > The elements above define two collector factories. The first one is the > "default" collectorFactory. The class attribute points to > org.apache.solr.handler.component.CollectorFactory, which implements logic > that returns the default TopScoreDocCollector and TopFieldCollector. > To create your own collectorFactory you must subclass the default > CollectorFactory and at a minimum override the getCollector method to return > your new collector. > The parameter "cl" turns on pluggable collectors: > cl=true > If cl is not in the parameters, Solr will automatically use the default > collectorFactory. > *Pluggable Doclist Sorting With the Docs Collector* > You can specify two types of pluggable collectors. The first type is the docs > collector. For example: > cl.docs= > The above param points to a named collectorFactory in the solrconfig.xml to > construct the collector. The docs collectorFactories must return a collector > that extends the TopDocsCollector base class. Docs collectors are responsible > for collecting the doclist. > You can specify only one docs collector per query. > You can pass parameters to the docs collector using local params syntax. For > example: > cl.docs=\{! sort=mycustomesort\}mycollector > If cl=true and a docs collector is not specified, Solr will use the default > collectorFactory to create the docs collector. > *Pluggable Custom Analytics With Delegating Collectors* > You can also specify any number of custom analytic collectors with the > "cl.analytic" parameter. Analytic collectors are designed to collect > something else besides the doclist. Typically this would be some type of > custom analytic. For example: > cl.analytic=sum > The parameter above specifies an analytic collector named sum. Like the docs > collectors, "sum" points to a named collectorFactory in the solrconfig.xml. > You can specify any number of analytic collectors by adding additional > cl.analytic parameters. > Analytic collector factories must return Collector instances that extend > DelegatingCollector. > A sample analytic collector is provided in the patch through the > org.apache.solr.handler.component.SumCollectorFactory. > This collectorFactory provides a very simple DelegatingCollector that groups > by a field and sums a column of floats. 
The sum collector is not designed to > be a fully functional sum function but to be a proof of concept for pluggable > analytics through delegating collectors. > You can send parameters to analytic collectors with solr local param syntax. > For example: > cl.analytic=\{! id=1 groupby=field1 column=field2\}sum > The "id" parameter is mandatory for analytic collectors and is used to > identify the output from the collector. In this example the "groupby" and > "column" params tell the sum collector which field to group by and sum. > Analytic collectors are passed a reference to the ResponseBuilder and can > place maps with analytic output directly into the SolrQueryResponse with the > add() method. > Maps that are placed in the SolrQueryResponse are automatically added to the > outgoing response. The response will include a list named cl.analytic.<id>, > where id is specified in the local param. > *Distributed Search* > The CollectorFactory also has a method called merge(). This method aggregates > the results from each of the shards during distributed search. The "default" > CollectorFactory implements the default merge logic for merging documents > from each shard. If you define a different docs collector you can override > the default merge method to merge documents in accordance with how they are > collected at the shard level.
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757021#comment-13757021 ] Areek Zillur commented on LUCENE-5197: -- >Can you elaborate on this? Where was it incorrect? - by incorrect I meant walking up undesirable member variables in an object. In hindsight, I would say that was a bad choice of wording. I think the correct word would be inflexible. I do like the RamUsageEstimator "aware"-ness idea. I think that, along with some kind of filtering mechanism in RamUsageEstimator, would be perfect. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
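For reference, the existing RUE entry points being discussed; the selective/filtered counting is a proposal, not current API:

{code}
// RamUsageEstimator reflectively walks an object graph and sums sizes.
import org.apache.lucene.util.RamUsageEstimator;

long bytes = RamUsageEstimator.sizeOf(new long[1024]);           // deep estimate
System.out.println(RamUsageEstimator.humanReadableUnits(bytes)); // e.g. "8 KB"
{code}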
[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans
[ https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756970#comment-13756970 ] Yonik Seeley commented on SOLR-5211: bq. I'm observing an interesting behavior. If I delete one parent doc, ToParentBJQ doesn't stick orphan children to the next parent, but it happens after optimize! This seems fine - deleting a parent doc and not the children results in undefined behavior. bq. it seems ToParentBJQ doesn't mind deletes in parents filter. Right - that seems fine too. > updating parent as childless makes old children orphans > --- > > Key: SOLR-5211 > URL: https://issues.apache.org/jira/browse/SOLR-5211 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.5, 5.0 >Reporter: Mikhail Khludnev > Fix For: 4.5, 5.0 > > > If I have a parent with children in the index, I can send an update omitting > the children; as a result the old children become orphaned. > I suppose the separate \_root_ field causes much of the trouble. I propose to > extend the notion of uniqueKey and let it span across blocks, which would make > updates unambiguous. > WDYT? Would you like to see a test that proves this issue? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
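The contract implied by "undefined behavior" here is that a block is always updated as a whole. In SolrJ terms, using the 4.5-era child-document support (ids made up):

{code}
// Sketch: with block join, re-send the parent WITH all of its children on
// every update; sending the parent alone strands the old children.
SolrInputDocument parent = new SolrInputDocument();
parent.addField("id", "p1");
SolrInputDocument child = new SolrInputDocument();
child.addField("id", "p1c1");
parent.addChildDocument(child); // omit this on a later update -> "p1c1" is orphaned
server.add(parent);
server.commit();
{code}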
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757009#comment-13757009 ] Robert Muir commented on LUCENE-5197: - {quote} What you're doing is actually a skewed view -- it measures certain fields selectively. {quote} And from a big O perspective, this might be just fine. The way I see it, this would be a way to see how much RAM the Lucene segment needs for someone's content. Things like the terms index and docvalues fields grow according to the content in different ways: e.g. how large/how many terms you have, how they share prefixes, how many documents you have, and so on. The "skew" is just boring constants pulled out of the equation; even if it's 2KB or so, it's not interesting at all since it's just a constant cost independent of the content. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756884#comment-13756884 ] Robert Muir commented on LUCENE-5189: - It really doesn't work: it's definitely a blocker for me! This leaves the general API (FieldInfo.attributes and SegmentInfo.attributes) broken for codecs, and only hacks a specific implementation that uses them. With or without the current boolean, if a numeric docvalues impl puts something in FieldInfo.attributes during an update, it will go into a black hole, because FieldInfos is write-once per-segment (and not per-commit). Same goes with SegmentInfo.attributes. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes is immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get-go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons for that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and of itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
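To make the "black hole" concrete, the failure mode described here looks roughly like this (an illustrative snippet, not code from the patch; the attribute key is invented):

{code}
// Inside a hypothetical DocValuesConsumer that is unknowingly writing a
// field update rather than a fresh segment:
fieldInfo.putAttribute("myCodec.dvFormat", "v2"); // recorded in memory only

// Because FieldInfos (.fnm) is written once per segment and not per commit,
// this attribute is never re-persisted for the updated commit: a reader
// opening that commit will never see "myCodec.dvFormat" and may choose the
// wrong decoder.
{code}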
[jira] [Updated] (SOLR-5206) OpenExchangeRatesOrgProvider never refreshes the rates
[ https://issues.apache.org/jira/browse/SOLR-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5206: --- Attachment: SOLR-5206.patch Bahg! That's terrible ... thanks for reporting this and identifying the fix, Catalin. I've attached a patch including Catalin's fix (along with a bit of refactoring the match) and a fix so that the "reload" test actually tests something useful. Doing more testing now. > OpenExchangeRatesOrgProvider never refreshes the rates > -- > > Key: SOLR-5206 > URL: https://issues.apache.org/jira/browse/SOLR-5206 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.4 >Reporter: Catalin >Priority: Critical > Fix For: 4.4 > > Attachments: fixRefresh.patch, SOLR-5206.patch > > > The OpenExchangeRatesOrgProvider never reloads the rates after the initial > load, no matter what refreshInterval is set. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans
[ https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756947#comment-13756947 ] Mikhail Khludnev commented on SOLR-5211: I'm observing an interesting behavior. If I delete one parent doc, ToParentBJQ doesn't stick orphan children to the next parent, but it happens after optimize! It seems ToParentBJQ doesn't mind deletes in the parents filter. Isn't it a separate LUCENE issue? > updating parent as childless makes old children orphans > --- > > Key: SOLR-5211 > URL: https://issues.apache.org/jira/browse/SOLR-5211 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.5, 5.0 >Reporter: Mikhail Khludnev > Fix For: 4.5, 5.0 > > > If I have a parent with children in the index, I can send an update omitting > the children. As a result the old children become orphaned. > I suppose the separate \_root_ field causes much of the trouble. I propose to extend > the notion of uniqueKey and let it span across blocks, which makes updates > unambiguous. > WDYT? Would you like to see a test that proves this issue? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756905#comment-13756905 ] Areek Zillur edited comment on LUCENE-5197 at 9/3/13 6:50 PM: -- Attached a patch removing redundant null checks as Adrien suggested. First of all, I wanted to thank everybody for their valuable input. The reasons why I chose to have an explicit method to calculate the heap size rather than using the RamUsageEstimator have already surfaced in the discussion above (slow for many objects, incorrect for some types of objects). It would be nice to have an API to call (from the Solr admin, for example) to estimate the current index heap size. I do understand the "out of sync with implementation changes" concern; I mainly took the codecs into account for the size estimation only, such that higher-level APIs need not implement the method. The suggested modified RamUsageEstimator sounds nice, but as far as I understand, wouldn't the logic implementing the "excluded objects" change just as much, while being more implicit than the proposed solution above? was (Author: areek): Attached a patch removing redundant null checks as Adrien suggested. First of all, I wanted to thank everybody for their valuable input. The reasons why I chose to have an explicit method to calculate the heap size rather than using the RamUsageEstimator have already surfaced in the discussion above (slow for many objects, incorrect for some types of objects). It would be nice to have an API to call (from the Solr admin, for example) to estimate the current index heap size. I do understand the "out of sync with implementation changes" concern; I mainly took the codecs into account for the size estimation only, such that higher-level APIs need not implement the method. The suggested modified RamUsageEstimator sounds nice, but as far as I understand, wouldn't the logic implementing the "excluded objects" change just as much, while being more implicit than the proposed solution above? > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5197: - Attachment: LUCENE-5197.patch Attached a patch removing redundant null checks as Adrien suggested. First of all, I wanted to thank everybody for their valuable input. The reasons why I chose to have an explicit method to calculate the heap size rather than using the RamUsageEstimator have already surfaced in the discussion above (slow for many objects, incorrect for some types of objects). It would be nice to have an API to call (from the Solr admin, for example) to estimate the current index heap size. I do understand the "out of sync with implementation changes" concern; I mainly took the codecs into account for the size estimation only, such that higher-level APIs need not implement the method. The suggested modified RamUsageEstimator sounds nice, but as far as I understand, wouldn't the logic implementing the "excluded objects" change just as much, while being more implicit than the proposed solution above? > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756896#comment-13756896 ] Robert Muir commented on LUCENE-5189: - By the way: the "general" issue is that for updates, it's unfortunately not enough to concern ourselves with data; we have to worry about metadata too: I see at least 4 problems (and I have not thought about it completely): # FieldInfo.attributes: these "writes" by the NumericDocValues impl will be completely discarded during update, because it's per-segment, not per-commit. # SegmentInfo.attributes: same as the above # Field doesn't exist in FieldInfo at all: (because the segment the update applies to happens to have no values for the field) # Field exists in FieldInfo, but is incomplete: (because the segment the update applies to had, say, a stored-only or stored+indexed value for the field, but no dv one). PerFieldDVF is just one implementation that happens to use #1. Fixing it is fixing the symptom; that's why I say we really need to instead fix the disease, or things will get very ugly. The only reason you don't see more problems with #1 and #2 is that currently they are not used very much (only by PerField and back-compat). If we had more codecs exercising the APIs, you would be seeing these problems already. A perfectly good solution would be to remove these APIs completely for public use (which would solve #1 and #2). PerField(PF/DVF) could write its own .per file instead. Back-compat cruft could then use these now-internal-only APIs (and it won't matter since they don't support updates), or we could implement their hacks in another way. But this still leaves issues like #3 and #4. Adding a boolean 'isFieldUpdate' doesn't really solve anything, and it totally breaks the whole concept of the codec being unaware of updates. It is the wrong direction. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes is immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get-go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons for that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and of itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756931#comment-13756931 ] Shai Erera commented on LUCENE-5189: OK, so now I get your point. The problem is that we pass to Codec FI.attributes with say an attribute 'foo=bar'. The Codec, unaware that this is an update, looks at the given numericFields and decides to encode them using method "bar2", so it encodes into the attributes 'foo=bar2', but those attributes get lost because they're not rewritten to FIS. Do I understand correctly? Of course, we could say that since the Codec has to peek into SWS.isFieldUpdate, thereby making it updates-aware, it should not encode stuff in a different format, but SWS.isFieldUpdate is not enough to enforce that. I don't think that gen'ing FIS solves the problem of obtaining the right DVF in the first place. Sure, after we do that, the Codec can put whatever attributes it wants; they will be recorded in the new FIS.gen. But maybe we can solve these two problems by gen'ing FIS: * Add FieldInfo.dvGen. The Codec will receive the FieldInfos with their dvGen bumped up. * Codec can choose to look at FI.dvGen and pull the right DVF e.g. like PerField does. ** Or it can choose to completely ignore it, and always write updates using the new format. * Codec is free to record whatever attributes it wants on this FI. Since we gen FIS, they will be recorded and used by the reader. What do you think? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes is immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get-go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons for that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and of itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756924#comment-13756924 ] Dawid Weiss commented on LUCENE-5197: - > incorrect for some types of objects Can you elaborate on this? Where was it incorrect? > wouldn't the logic implementing the "excluded objects" change just as > much? I think it'd be much simpler to exclude the types of objects we know spin out of control - loggers, thread locals, thread references - and leave the remaining stuff accounted for. After all, if it's referenced it does take space on the heap, so the figure is correct. What you're doing is actually a skewed view -- it measures certain fields selectively. I was also thinking in terms of tests -- one can create a sanity test which will create a small index, measure its RAM usage and then fail if it seems "too large" (because a thread local or some other field was accounted for). I don't see a way to do such a sanity check for per-class handcrafted code (unless you want to test against RamUsageEstimator, which would duplicate the effort anyway). Let me stress again that I'm not against your patch, I just have a gut feeling it'll be a recurring theme of new issues in Jira. Yet another idea sprang to my mind -- perhaps if speed is an issue and certain types of objects can efficiently calculate their RAM usage (FSTs), we could make RamUsageEstimator "aware" of such objects by introducing an interface like: {code} interface IKnowMySize { public long /* super. :) */ sizeMe(); } {code} Jokes aside, this could be implemented for classes which indeed have a complex structure and the rest (arrays, etc.) could be counted efficiently by walking the reference graph. Just a thought. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch, LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
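As a rough sketch of how that "aware"-ness could compose with the exclusion filtering discussed above; all names here are invented for illustration and none of this is proposed RamUsageEstimator API:

{code}
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;

// Sketch: walk the object graph the way RamUsageEstimator does, but
// (a) prune "runaway" types such as ThreadLocal and Thread, and
// (b) short-circuit on objects that can report their own size (FSTs etc.).
public final class FilteringSizeEstimator {

  public interface SelfSizing { // the "IKnowMySize" idea from the comment
    long sizeInBytes();
  }

  public static long sizeOf(Object root) {
    return measure(root, new IdentityHashMap<Object, Object>());
  }

  private static long measure(Object o, IdentityHashMap<Object, Object> seen) {
    if (o == null || seen.put(o, o) != null) return 0;             // null or already counted
    if (o instanceof ThreadLocal || o instanceof Thread) return 0; // excluded types
    if (o instanceof SelfSizing) return ((SelfSizing) o).sizeInBytes();

    long size = 16; // crude per-object header estimate; real code would align properly
    if (o.getClass().isArray()) {
      if (o instanceof Object[]) {
        for (Object element : (Object[]) o) size += measure(element, seen);
      } else {
        size += 8L * Array.getLength(o); // upper bound for primitive arrays
      }
      return size;
    }
    for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        if (Modifier.isStatic(f.getModifiers())) continue;
        if (f.getType().isPrimitive()) { size += 8; continue; } // upper bound per field
        try {
          f.setAccessible(true);
          size += measure(f.get(o), seen);
        } catch (Exception e) {
          // inaccessible field: skip it rather than fail the whole measurement
        }
      }
    }
    return size;
  }
}
{code}

A sanity test of the kind described in the comment could then assert that sizeOf(reader) for a tiny index stays under some generous bound, and fail if a runaway reference ever starts being counted.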
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756885#comment-13756885 ] Dawid Weiss commented on LUCENE-5197: - > Also it totally breaks down if it hits certain objects like a ThreadLocal. That's why I suggested a visitor pattern; you could tune it not to enter such variables. Also note that if there are lots of objects then the object representation overhead itself will be significant and will vary depending on each VM, its settings, etc; a specific snippet of code to estimate each object's memory use may be faster but it'll be either a nightmare to maintain or a very rough approximation. I think it'd be better to try to make RUE faster/more flexible. Like Shai mentioned -- if it's not a performance-critical API then the difference will not be at all significant. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756863#comment-13756863 ] Shai Erera commented on LUCENE-5189: I don't understand the problem that you raise. Until I do, I think that SWS.isFieldUpdate is fine. It works, it's simple, and most importantly, it allows me to move forward. Let's discuss how to improve it even further, but I don't think this is a blocker. We can always improve that later on. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes is immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get-go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons for that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and of itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5211) updating parent as childless makes old children orphans
Mikhail Khludnev created SOLR-5211: -- Summary: updating parent as childless makes old children orphans Key: SOLR-5211 URL: https://issues.apache.org/jira/browse/SOLR-5211 Project: Solr Issue Type: Sub-task Components: update Affects Versions: 4.5, 5.0 Reporter: Mikhail Khludnev If I have a parent with children in the index, I can send an update omitting the children. As a result the old children become orphaned. I suppose the separate \_root_ field causes much of the trouble. I propose to extend the notion of uniqueKey and let it span across blocks, which makes updates unambiguous. WDYT? Would you like to see a test that proves this issue? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API
[ https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756859#comment-13756859 ] Tim Vaillancourt commented on SOLR-5208: Thanks Erick. I agree on RELOAD - I'm not sure if that makes sense or not either, but thought of it randomly while listing those commands :). I'll make a new JIRA to discuss if that is a good idea or not. Tim > Support for the setting of core.properties key/values at create-time on > Collections API > --- > > Key: SOLR-5208 > URL: https://issues.apache.org/jira/browse/SOLR-5208 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Tim Vaillancourt >Assignee: Erick Erickson > Attachments: SOLR-5208.patch > > > As discussed on e-mail thread "Sharing SolrCloud collection configs > w/overrides" > (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides), > Erick brought up a neat solution using HTTP params at create-time for the > Collection API. > Essentially, this request is for a functionality that allows the setting of > variables (core.properties) on Collections API CREATE command. > Erick's idea: > "Maybe it's as simple as allowing more params for creation like > collection.coreName where each param of the form collection.blah=blort > gets an entry in the properties file blah=blort?..." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
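To make the proposed syntax concrete, a CREATE call under this scheme might look like the following; collection.configName is an existing create-time param, while collection.blah=blort is the hypothetical pass-through property from the description (this is the proposal under discussion, not a released API):

{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=coll1&numShards=2&collection.configName=sharedConfig&collection.blah=blort
{noformat}

Each core.properties file created for the collection's cores would then contain the extra entry blah=blort.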
[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate
[ https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756861#comment-13756861 ] Mark Miller commented on SOLR-5209: --- Right, but the sub-commands are just the wrapper calls - except the shard commands - those are new. The delete core one is mostly about cleanup, if I remember right. The problem is, the overseer and zk do not own the state. The individual cores do, basically. Mostly that's due to historical stuff. We intend to change that, but it's no small feat. Until that is done, I think this is much trickier to get right than it looks. > cores/action=UNLOAD of last replica removes shard from clusterstate > --- > > Key: SOLR-5209 > URL: https://issues.apache.org/jira/browse/SOLR-5209 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Christine Poerschke >Assignee: Mark Miller > Attachments: SOLR-5209.patch > > > The problem we saw was that unloading of an only replica of a shard deleted > that shard's info from the clusterstate. Once it was gone then there was no > easy way to re-create the shard (other than dropping and re-creating the > whole collection's state). > This seems like a bug? > Overseer.java around line 600 has a comment and commented out code: > // TODO TODO TODO!!! if there are no replicas left for the slice, and the > slice has no hash range, remove it > // if (newReplicas.size() == 0 && slice.getRange() == null) { > // if there are no replicas left for the slice remove it -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3852) Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere in ZK tree
[ https://issues.apache.org/jira/browse/SOLR-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-3852. Resolution: Fixed Fix Version/s: 5.0 4.5 Assignee: Hoss Man > Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere > in ZK tree > > > Key: SOLR-3852 > URL: https://issues.apache.org/jira/browse/SOLR-3852 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0-BETA > Environment: Tomcat 6, external zookeeper-3.3.5 >Reporter: Vadim Kisselmann >Assignee: Hoss Man > Fix For: 4.5, 5.0 > > Attachments: SOLR-3852.patch > > > Original bug description indicated that when using Solr with embedded ZK > everything was fine, but with an external ZK you'd get an > ArrayIndexOutOfBoundsException. > Crux of the problem is some bad assumptions about any ZK node containing data > -- the ZookeeperInfoServlet powering the tree view of the Cloud Admin UI > assumed that any data would be utf8 text. > If you are using external ZK, and other systems are writing data into ZK, > then you are more likely to see this problem, because those other systems > might be writing binary data into ZK nodes -- if you are using ZK embedded > in solr, or using solr with its own private (external) ZK instance, then you > would only see this problem if you explicitly put binary files into solr > configs and upconfig them into ZK. > > One workaround for people encountering this problem when using Solr with a > ZK instance shared by other tools is to make sure you use a "chroot" path > when pointing Solr at ZK, so that it won't know about any other paths in your > ZK tree that might have binary data... > https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot > If you are having this problem because you put binary files into your own > config dir (ie: images for velocity or something like that) then there is no > straightforward workaround. > Example stack trace for this bug... > {noformat} > 43242 [qtp965223859-14] WARN org.eclipse.jetty.servlet.ServletHandler > /solr/zookeeper > java.lang.ArrayIndexOutOfBoundsException: 213 > at > org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:620) > at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:168) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:303) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339) > ... > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:228) > at > org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:106) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
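For reference, the "chroot" workaround mentioned in the description just means appending a path to the zkHost string so that Solr only ever sees that subtree of the ZK tree; the host names below are illustrative:

{noformat}
-DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr
{noformat}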
[jira] [Commented] (SOLR-3852) Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere in ZK tree
[ https://issues.apache.org/jira/browse/SOLR-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756855#comment-13756855 ] ASF subversion and git services commented on SOLR-3852: --- Commit 1519779 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1519779 ] SOLR-3852: Fixed ZookeeperInfoServlet so that the SolrCloud Admin UI pages will work even if ZK contains nodes with data which are not utf8 text (merge r1519763) > Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere > in ZK tree > > > Key: SOLR-3852 > URL: https://issues.apache.org/jira/browse/SOLR-3852 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0-BETA > Environment: Tomcat 6, external zookeeper-3.3.5 >Reporter: Vadim Kisselmann > Attachments: SOLR-3852.patch > > > Original bug description indicated that when using Solr with embedded ZK > everything was fine, but with an external ZK you'd get an > ArrayIndexOutOfBoundsException. > Crux of the problem is some bad assumptions about any ZK node containing data > -- the ZookeeperInfoServlet powering the tree view of the Cloud Admin UI > assumed that any data would be utf8 text. > If you are using external ZK, and other systems are writing data into ZK, > then you are more likely to see this problem, because those other systems > might be writing binary data into ZK nodes -- if you are using ZK embedded > in solr, or using solr with its own private (external) ZK instance, then you > would only see this problem if you explicitly put binary files into solr > configs and upconfig them into ZK. > > One workaround for people encountering this problem when using Solr with a > ZK instance shared by other tools is to make sure you use a "chroot" path > when pointing Solr at ZK, so that it won't know about any other paths in your > ZK tree that might have binary data... > https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot > If you are having this problem because you put binary files into your own > config dir (ie: images for velocity or something like that) then there is no > straightforward workaround. > Example stack trace for this bug... > {noformat} > 43242 [qtp965223859-14] WARN org.eclipse.jetty.servlet.ServletHandler > /solr/zookeeper > java.lang.ArrayIndexOutOfBoundsException: 213 > at > org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:620) > at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:168) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:303) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339) > ... > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:228) > at > org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:106) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API
[ https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756831#comment-13756831 ] Tim Vaillancourt edited comment on SOLR-5208 at 9/3/13 5:50 PM: Shalin: good point on the SPLITSHARD. To be consistent, are there any other places this is needed? * Core API (already there). * Collections API "CREATE": discussed here. * Collections API "SPLITSHARD": Thanks Shalin! * Collections API "CREATEALIAS"(?): An alias shouldn't have its own properties AFAIK, but calling that out. * Collections API "RELOAD"(?): I'm not sure if the Core API functionality does this, but adding this to RELOAD would allow changing of properties post-create-time. Without this you'd need to DELETE/CREATE to change properties, or bypass. Tim was (Author: tvaillancourt): Shalin: good point on the SPLITSHARD. To be consistent, are there any other places this is needed? * Core API (already there). * Collections API CREATE: discussed here. * Collections API SPLITSHARD: Thanks Shalin! * Collections API CREATEALIAS(?): An alias shouldn't have its own properties AFAIK, but calling that out. * Collections API RELOAD(?): I'm not sure if the Core API functionality does this, but adding this to RELOAD would allow changing of properties post-create-time. Without this you'd need to DELETE/CREATE to change properties, or bypass. Tim > Support for the setting of core.properties key/values at create-time on > Collections API > --- > > Key: SOLR-5208 > URL: https://issues.apache.org/jira/browse/SOLR-5208 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Tim Vaillancourt >Assignee: Erick Erickson > Attachments: SOLR-5208.patch > > > As discussed on e-mail thread "Sharing SolrCloud collection configs > w/overrides" > (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides), > Erick brought up a neat solution using HTTP params at create-time for the > Collection API. > Essentially, this request is for a functionality that allows the setting of > variables (core.properties) on Collections API CREATE command. > Erick's idea: > "Maybe it's as simple as allowing more params for creation like > collection.coreName where each param of the form collection.blah=blort > gets an entry in the properties file blah=blort?..." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API
[ https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756831#comment-13756831 ] Tim Vaillancourt edited comment on SOLR-5208 at 9/3/13 5:51 PM: Thanks for the patch Erick! Shalin: good point on the SPLITSHARD. To be consistent, are there any other places this is needed? * Core API (already there). * Collections API "CREATE": discussed here. * Collections API "SPLITSHARD": Thanks Shalin! * Collections API "CREATEALIAS"(?): An alias shouldn't have its own properties AFAIK, but calling that out. * Collections API "RELOAD"(?): I'm not sure if the Core API functionality does this, but adding this to RELOAD would allow changing of properties post-create-time. Without this you'd need to DELETE/CREATE to change properties, or bypass. Tim was (Author: tvaillancourt): Shalin: good point on the SPLITSHARD. To be consistent, are there any other places this is needed? * Core API (already there). * Collections API "CREATE": discussed here. * Collections API "SPLITSHARD": Thanks Shalin! * Collections API "CREATEALIAS"(?): An alias shouldn't have its own properties AFAIK, but calling that out. * Collections API "RELOAD"(?): I'm not sure if the Core API functionality does this, but adding this to RELOAD would allow changing of properties post-create-time. Without this you'd need to DELETE/CREATE to change properties, or bypass. Tim > Support for the setting of core.properties key/values at create-time on > Collections API > --- > > Key: SOLR-5208 > URL: https://issues.apache.org/jira/browse/SOLR-5208 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Tim Vaillancourt >Assignee: Erick Erickson > Attachments: SOLR-5208.patch > > > As discussed on e-mail thread "Sharing SolrCloud collection configs > w/overrides" > (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides), > Erick brought up a neat solution using HTTP params at create-time for the > Collection API. > Essentially, this request is for a functionality that allows the setting of > variables (core.properties) on Collections API CREATE command. > Erick's idea: > "Maybe it's as simple as allowing more params for creation like > collection.coreName where each param of the form collection.blah=blort > gets an entry in the properties file blah=blort?..." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API
[ https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756831#comment-13756831 ] Tim Vaillancourt commented on SOLR-5208: Shalin: good point on the SPLITSHARD. To be consistent, are there any other places this is needed? * Core API (already there). * Collections API CREATE: discussed here. * Collections API SPLITSHARD: Thanks Shalin! * Collections API CREATEALIAS(?): An alias shouldn't have its own properties AFAIK, but calling that out. * Collections API RELOAD(?): I'm not sure if the Core API functionality does this, but adding this to RELOAD would allow changing of properties post-create-time. Without this you'd need to DELETE/CREATE to change properties, or bypass. Tim > Support for the setting of core.properties key/values at create-time on > Collections API > --- > > Key: SOLR-5208 > URL: https://issues.apache.org/jira/browse/SOLR-5208 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Tim Vaillancourt >Assignee: Erick Erickson > Attachments: SOLR-5208.patch > > > As discussed on e-mail thread "Sharing SolrCloud collection configs > w/overrides" > (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides), > Erick brought up a neat solution using HTTP params at create-time for the > Collection API. > Essentially, this request is for a functionality that allows the setting of > variables (core.properties) on Collections API CREATE command. > Erick's idea: > "Maybe it's as simple as allowing more params for creation like > collection.coreName where each param of the form collection.blah=blort > gets an entry in the properties file blah=blort?..." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate
[ https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-5209: - Assignee: Mark Miller > cores/action=UNLOAD of last replica removes shard from clusterstate > --- > > Key: SOLR-5209 > URL: https://issues.apache.org/jira/browse/SOLR-5209 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Christine Poerschke >Assignee: Mark Miller > Attachments: SOLR-5209.patch > > > The problem we saw was that unloading of an only replica of a shard deleted > that shard's info from the clusterstate. Once it was gone then there was no > easy way to re-create the shard (other than dropping and re-creating the > whole collection's state). > This seems like a bug? > Overseer.java around line 600 has a comment and commented out code: > // TODO TODO TODO!!! if there are no replicas left for the slice, and the > slice has no hash range, remove it > // if (newReplicas.size() == 0 && slice.getRange() == null) { > // if there are no replicas left for the slice remove it -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate
[ https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756827#comment-13756827 ] Daniel Collins edited comment on SOLR-5209 at 9/3/13 5:46 PM: -- Ok, my bad, I wasn't clear enough. At the user-level there is collections API and core API, and yes one is just a wrapper around the other. But at the Overseer level, we seem to have various different sub-commands (not sure what the correct terminology for them is!): {{create_shard}}, {{removeshard}}, {{createcollection}}, {{removecollection}}, {{deletecore}}, etc. I appreciate this is probably historical code, but since we have these other methods, it felt like deletecore was overstepping its bounds now :) Could submit an extra patch, but wasn't sure of the historical nature of this code, hence just a comment first to get an opinion/discussion. was (Author: dancollins): Ok, my bad, I wasn't clear enough. At the user-level there is collections API and core API, and yes one is just a wrapper around the other. But at the Overseer level, we seem to have various different sub-commands (not sure what the correct terminology for them is!): create_shard, removeshard, createcollection, removecollection, deletecore, etc. I appreciate this is probably historical code, but since we have these other methods, it felt like deletecore was overstepping its bounds now :) Could submit an extra patch, but wasn't sure of the historical nature of this code, hence just a comment first to get an opinion/discussion. > cores/action=UNLOAD of last replica removes shard from clusterstate > --- > > Key: SOLR-5209 > URL: https://issues.apache.org/jira/browse/SOLR-5209 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Christine Poerschke >Assignee: Mark Miller > Attachments: SOLR-5209.patch > > > The problem we saw was that unloading of an only replica of a shard deleted > that shard's info from the clusterstate. Once it was gone then there was no > easy way to re-create the shard (other than dropping and re-creating the > whole collection's state). > This seems like a bug? > Overseer.java around line 600 has a comment and commented out code: > // TODO TODO TODO!!! if there are no replicas left for the slice, and the > slice has no hash range, remove it > // if (newReplicas.size() == 0 && slice.getRange() == null) { > // if there are no replicas left for the slice remove it -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3852) Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere in ZK tree
[ https://issues.apache.org/jira/browse/SOLR-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756809#comment-13756809 ] ASF subversion and git services commented on SOLR-3852: --- Commit 1519763 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1519763 ] SOLR-3852: Fixed ZookeeperInfoServlet so that the SolrCloud Admin UI pages will work even if ZK contains nodes with data which are not utf8 text > Admin UI - Cloud Tree ArrayIndexOutOfBoundsException if binary files anywhere > in ZK tree > > > Key: SOLR-3852 > URL: https://issues.apache.org/jira/browse/SOLR-3852 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0-BETA > Environment: Tomcat 6, external zookeeper-3.3.5 >Reporter: Vadim Kisselmann > Attachments: SOLR-3852.patch > > > Original bug description indicated that when using Solr with embedded ZK > everything was fine, but with an external ZK you'd get an > ArrayIndexOutOfBoundsException. > Crux of the problem is some bad assumptions about any ZK node containing data > -- the ZookeeperInfoServlet powering the tree view of the Cloud Admin UI > assumed that any data would be utf8 text. > If you are using external ZK, and other systems are writing data into ZK, > then you are more likely to see this problem, because those other systems > might be writing binary data into ZK nodes -- if you are using ZK embedded > in solr, or using solr with its own private (external) ZK instance, then you > would only see this problem if you explicitly put binary files into solr > configs and upconfig them into ZK. > > One workaround for people encountering this problem when using Solr with a > ZK instance shared by other tools is to make sure you use a "chroot" path > when pointing Solr at ZK, so that it won't know about any other paths in your > ZK tree that might have binary data... > https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot > If you are having this problem because you put binary files into your own > config dir (ie: images for velocity or something like that) then there is no > straightforward workaround. > Example stack trace for this bug... > {noformat} > 43242 [qtp965223859-14] WARN org.eclipse.jetty.servlet.ServletHandler > /solr/zookeeper > java.lang.ArrayIndexOutOfBoundsException: 213 > at > org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:620) > at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:168) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:303) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339) > at > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:339) > ... > org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:228) > at > org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:106) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate
[ https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756827#comment-13756827 ] Daniel Collins commented on SOLR-5209: -- Ok, my bad, I wasn't clear enough. At the user-level there is collections API and core API, and yes one is just a wrapper around the other. But at the Overseer level, we seem to have various different sub-commands (not sure what the correct terminology for them is!): create_shard, removeshard, createcollection, removecollection, deletecore, etc. I appreciate this is probably historical code, but since we have these other methods, it felt like deletecore was overstepping its bounds now :) Could submit an extra patch, but wasn't sure of the historical nature of this code, hence just a comment first to get an opinion/discussion. > cores/action=UNLOAD of last replica removes shard from clusterstate > --- > > Key: SOLR-5209 > URL: https://issues.apache.org/jira/browse/SOLR-5209 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Christine Poerschke >Assignee: Mark Miller > Attachments: SOLR-5209.patch > > > The problem we saw was that unloading of an only replica of a shard deleted > that shard's info from the clusterstate. Once it was gone then there was no > easy way to re-create the shard (other than dropping and re-creating the > whole collection's state). > This seems like a bug? > Overseer.java around line 600 has a comment and commented out code: > // TODO TODO TODO!!! if there are no replicas left for the slice, and the > slice has no hash range, remove it > // if (newReplicas.size() == 0 && slice.getRange() == null) { > // if there are no replicas left for the slice remove it -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API
[ https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756843#comment-13756843 ] Erick Erickson commented on SOLR-5208: -- Tim: Use At Your Own Risk!!! I just did a _very_ quick test on it, didn't even write any tests yet. If you're brave it'd be great to see if it fixes your problem. About the other commands: SPLITSHARD - I expect so, it has to create another core so we have to at least copy the properties file over (which I'm not sure we do either). CREATEALIAS - I doubt it. This shouldn't affect the core.properties. RELOAD - That's interesting, hadn't really thought about that. It seems possible to shoot yourself in the foot here though. I'm also not sure whether the reload already writes out the core.properties file or not. > Support for the setting of core.properties key/values at create-time on > Collections API > --- > > Key: SOLR-5208 > URL: https://issues.apache.org/jira/browse/SOLR-5208 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Tim Vaillancourt >Assignee: Erick Erickson > Attachments: SOLR-5208.patch > > > As discussed on e-mail thread "Sharing SolrCloud collection configs > w/overrides" > (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides), > Erick brought up a neat solution using HTTP params at create-time for the > Collection API. > Essentially, this request is for a functionality that allows the setting of > variables (core.properties) on Collections API CREATE command. > Erick's idea: > "Maybe it's as simple as allowing more params for creation like > collection.coreName where each param of the form collection.blah=blort > gets an entry in the properties file blah=blort?..." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756769#comment-13756769 ] Hoss Man commented on SOLR-5201: bq. The second option sounds nice but I wonder if that would cause a problem with multiple configurations (2 update chains with 2 different configurations of UIMAUpdateRequestProcessorFactory), It depends on what kinds of problems you are worried about ... each UIMAUpdateRequestProcessorFactory instance (ie: one instance per chain) should have its own AnalysisEngine using its own configuration ... unless the AnalysisEngine constructor/factory/provider does something special to keep track of them, they won't know anything about each other. If you *want* them to know about each other (ie: to share an AnalysisEngine between chains, or between chains in different SolrCores) then something a lot more special-cased would need to be done. > UIMAUpdateRequestProcessor should reuse the AnalysisEngine > -- > > Key: SOLR-5201 > URL: https://issues.apache.org/jira/browse/SOLR-5201 > Project: Solr > Issue Type: Improvement > Components: contrib - UIMA >Affects Versions: 4.4 >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 4.5, 5.0 > > Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, > SOLR-5201-ae-cache-only-single-request_branch_4x.patch > > > As reported in http://markmail.org/thread/2psiyl4ukaejl4fx > UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request > which is bad for performance therefore it'd be nice if such AEs could be > reused whenever that's possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
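A minimal sketch of that second option, assuming one factory instance per chain; createEngine() and createProcessor() are hypothetical stand-ins for the UIMA bootstrap and processor construction code the factory already contains:
{code:java}
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
import org.apache.uima.analysis_engine.AnalysisEngine;

// Sketch: because Solr creates one factory per chain, caching the engine on
// the factory yields one AnalysisEngine per configuration for free; sharing
// across chains or SolrCores would need something more special-cased.
public abstract class SharedEngineProcessorFactorySketch extends UpdateRequestProcessorFactory {

  private AnalysisEngine engine; // lazily built from this chain's configuration

  /** Hypothetical: builds the AE from this factory's UIMA configuration. */
  protected abstract AnalysisEngine createEngine();

  /** Hypothetical: wraps the shared engine in the chain's processor. */
  protected abstract UpdateRequestProcessor createProcessor(AnalysisEngine engine,
      UpdateRequestProcessor next);

  private synchronized AnalysisEngine engine() {
    if (engine == null) {
      engine = createEngine();
    }
    return engine;
  }

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
      UpdateRequestProcessor next) {
    return createProcessor(engine(), next); // reuse instead of building an AE per request
  }
}
{code}
A real patch also has to settle whether a single engine may be used by concurrent requests, which is presumably the difference between the two attached patch variants.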
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756766#comment-13756766 ] Han Jiang commented on LUCENE-5199: --- Thanks Rob! Yeah, I just hit another failure around TestSortDocValues. :) > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5023) deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj
[ https://issues.apache.org/jira/browse/SOLR-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756776#comment-13756776 ] Mark Miller commented on SOLR-5023: --- I reviewed the patch the other day, looks good, but it still needs a test that uses the new code. > deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj > - > > Key: SOLR-5023 > URL: https://issues.apache.org/jira/browse/SOLR-5023 > Project: Solr > Issue Type: Improvement > Components: multicore >Affects Versions: 4.2.1 >Reporter: Lyubov Romanchuk >Assignee: Shalin Shekhar Mangar > Fix For: 4.5, 5.0 > > Attachments: SOLR-5023.patch > > > deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload > CoreAdminRequest -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756773#comment-13756773 ] ASF subversion and git services commented on LUCENE-5199: - Commit 1519756 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1519756 ] LUCENE-5199: don't use old codec components mixed in with new ones when using -Ds > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756757#comment-13756757 ] Shai Erera commented on LUCENE-5199: Thanks Rob! > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate
[ https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756786#comment-13756786 ] Mark Miller commented on SOLR-5209: --- bq. given we have the collections API to do that We don't actually have the collections API to do that - it's simply a thin candy wrapper around SolrCore admin calls. Everything is driven by SolrCores being added or removed. There is work being done to migrate towards something where the collections API is actually large and in charge, but currently it's just a sugar wrapper. > cores/action=UNLOAD of last replica removes shard from clusterstate > --- > > Key: SOLR-5209 > URL: https://issues.apache.org/jira/browse/SOLR-5209 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Christine Poerschke > Attachments: SOLR-5209.patch > > > The problem we saw was that unloading of an only replica of a shard deleted > that shard's info from the clusterstate. Once it was gone then there was no > easy way to re-create the shard (other than dropping and re-creating the > whole collection's state). > This seems like a bug? > Overseer.java around line 600 has a comment and commented out code: > // TODO TODO TODO!!! if there are no replicas left for the slice, and the > slice has no hash range, remove it > // if (newReplicas.size() == 0 && slice.getRange() == null) { > // if there are no replicas left for the slice remove it -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756775#comment-13756775 ] ASF subversion and git services commented on LUCENE-5199: - Commit 1519757 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1519757 ] LUCENE-5199: don't use old codec components mixed in with new ones when using -Ds > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5142) Block Indexing / Join Improvements
[ https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756542#comment-13756542 ] Mikhail Khludnev edited comment on SOLR-5142 at 9/3/13 4:28 PM: I have a subject for consideration: right now the unique key field is required for child documents too, but it doesn't enforce anything. It's explicitly asserted at https://svn.apache.org/viewvc?view=revision&revision=r1519679 I suppose that uniqueness is provided by the parents and the \_root_ field. Don't you feel the unique key should be optional for child documents? was (Author: mkhludnev): I have a subject for consideration: right now the unique key is required for child documents; however, I suppose that uniqueness is provided by the parents and the \_root_ field. Don't you feel the unique key should be optional for child documents? > Block Indexing / Join Improvements > -- > > Key: SOLR-5142 > URL: https://issues.apache.org/jira/browse/SOLR-5142 > Project: Solr > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 4.5, 5.0 > > > Follow-on main issue for general block indexing / join improvements -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
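For context, a small SolrJ sketch of the block update in question (assuming the 4.5-era addChildDocument API; 'server' is any configured SolrServer):
{code:java}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

// The parent's uniqueKey is propagated into the hidden _root_ field of every
// document in the block, so re-adding the parent overwrites the whole block;
// the child's own uniqueKey is the requirement being questioned above.
static void indexBlock(SolrServer server) throws Exception {
  SolrInputDocument parent = new SolrInputDocument();
  parent.addField("id", "parent-1");

  SolrInputDocument child = new SolrInputDocument();
  child.addField("id", "parent-1-child-1"); // currently still required

  parent.addChildDocument(child);
  server.add(parent);
}
{code}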
Re: svn commit: r1519516 - /lucene/board-reports/2013/board-report-september.txt
: Subject: svn commit: r1519516 - : /lucene/board-reports/2013/board-report-september.txt +1 -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756750#comment-13756750 ] Robert Muir commented on LUCENE-5199: - I will commit it; I am worried Billy will hit other problems, because in general old codec components should not be "mixed in". When we test an old codec, it should always be "the whole codec" to realistically simulate the backwards format... > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756743#comment-13756743 ] Shai Erera commented on LUCENE-5199: Thanks Rob. I can commit it later, but feel free to commit it if you get to it before me. > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756748#comment-13756748 ] Michael McCandless commented on LUCENE-5199: +1 > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5209) cores/action=UNLOAD of last replica removes shard from clusterstate
[ https://issues.apache.org/jira/browse/SOLR-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756714#comment-13756714 ] Shalin Shekhar Mangar commented on SOLR-5209: - I think this deserves another look. We have the deleteshard API now which can be used to completely remove a slice from the cluster state. We should remove this trappy behaviour. > cores/action=UNLOAD of last replica removes shard from clusterstate > --- > > Key: SOLR-5209 > URL: https://issues.apache.org/jira/browse/SOLR-5209 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Christine Poerschke > Attachments: SOLR-5209.patch > > > The problem we saw was that unloading of an only replica of a shard deleted > that shard's info from the clusterstate. Once it was gone then there was no > easy way to re-create the shard (other than dropping and re-creating the > whole collection's state). > This seems like a bug? > Overseer.java around line 600 has a comment and commented out code: > // TODO TODO TODO!!! if there are no replicas left for the slice, and the > slice has no hash range, remove it > // if (newReplicas.size() == 0 && slice.getRange() == null) { > // if there are no replicas left for the slice remove it -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5023) deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj
[ https://issues.apache.org/jira/browse/SOLR-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-5023: --- Assignee: Shalin Shekhar Mangar (was: Mark Miller) > deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj > - > > Key: SOLR-5023 > URL: https://issues.apache.org/jira/browse/SOLR-5023 > Project: Solr > Issue Type: Improvement > Components: multicore >Affects Versions: 4.2.1 >Reporter: Lyubov Romanchuk >Assignee: Shalin Shekhar Mangar > Fix For: 4.5, 5.0 > > Attachments: SOLR-5023.patch > > > deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload > CoreAdminRequest -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5023) deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj
[ https://issues.apache.org/jira/browse/SOLR-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756732#comment-13756732 ] Shalin Shekhar Mangar commented on SOLR-5023: - Patch looks good. I'll commit shortly. > deleteInstanceDir is added to CoreAdminHandler but can't be passed with solrj > - > > Key: SOLR-5023 > URL: https://issues.apache.org/jira/browse/SOLR-5023 > Project: Solr > Issue Type: Improvement > Components: multicore >Affects Versions: 4.2.1 >Reporter: Lyubov Romanchuk >Assignee: Mark Miller > Fix For: 4.5, 5.0 > > Attachments: SOLR-5023.patch > > > deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload > CoreAdminRequest -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API
[ https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756710#comment-13756710 ] Shalin Shekhar Mangar commented on SOLR-5208: - +1 for adding this to splitshard and createshard as well. > Support for the setting of core.properties key/values at create-time on > Collections API > --- > > Key: SOLR-5208 > URL: https://issues.apache.org/jira/browse/SOLR-5208 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Tim Vaillancourt >Assignee: Erick Erickson > Attachments: SOLR-5208.patch > > > As discussed on e-mail thread "Sharing SolrCloud collection configs > w/overrides" > (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides), > Erick brought up a neat solution using HTTP params at create-time for the > Collection API. > Essentially, this request is for a functionality that allows the setting of > variables (core.properties) on Collections API CREATE command. > Erick's idea: > "Maybe it's as simple as allowing more params for creation like > collection.coreName where each param of the form collection.blah=blort > gets an entry in the properties file blah=blort?..." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5199: Attachment: LUCENE-5199.patch Here is the correct fix: the brokenness was just in TestRuleSetupAndRestoreClassEnv. The current approach is no good; Han will hit many other issues in testing, because we should _never_ mix in old deprecated codecs with new ones... it's not supported. > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5199.patch, LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-5177) test covers overwrite true/false for block updates
[ https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev closed SOLR-5177. -- well done! > test covers overwrite true/false for block updates > --- > > Key: SOLR-5177 > URL: https://issues.apache.org/jira/browse/SOLR-5177 > Project: Solr > Issue Type: Sub-task >Affects Versions: 4.5, 5.0 >Reporter: Mikhail Khludnev > Labels: patch, test > Fix For: 4.5, 5.0 > > Attachments: SOLR-5177.patch > > > DUH2 uses \_root_ field to support overwrite for block updates. I want to > contribute this test, which assert the current functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756670#comment-13756670 ] Han Jiang commented on LUCENE-5199: --- Thanks Shai! > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)
[ https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756687#comment-13756687 ] Robert Muir commented on LUCENE-5178: - Can this commit please be reverted? The change makes the test API so complicated for something that cannot happen: you cannot have "unsupported fields"; it's all or none. This is a bug in LuceneTestCase; it should not do this when someone uses -Dtests.postingsformat. > doc values should expose missing values (or allow configurable defaults) > > > Key: LUCENE-5178 > URL: https://issues.apache.org/jira/browse/LUCENE-5178 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch > > > DocValues should somehow allow a configurable default per-field. > Possible implementations include setting it on the field in the document or > registration of an IndexWriter callback. > If we don't make the default configurable, then another option is to have > DocValues fields keep track of whether a value was indexed for that document > or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5177) test covers overwrite true/false for block updates
[ https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5177. Resolution: Fixed Fix Version/s: 5.0 4.5 > test covers overwrite true/false for block updates > --- > > Key: SOLR-5177 > URL: https://issues.apache.org/jira/browse/SOLR-5177 > Project: Solr > Issue Type: Sub-task >Affects Versions: 4.5, 5.0 >Reporter: Mikhail Khludnev > Labels: patch, test > Fix For: 4.5, 5.0 > > Attachments: SOLR-5177.patch > > > DUH2 uses \_root_ field to support overwrite for block updates. I want to > contribute this test, which assert the current functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5189: --- Attachment: LUCENE-5189.patch Patch improves RAM accounting in BufferedDeletes and FrozenBD. I added NumericUpdate.sizeInBytes() so most of the logic is done there. BD adds two constants - for adding to outer Map (includes inner map OBJ_HEADER) and adding an actual update to inner map (excludes map's OBJ_HEADER). Only the pointers are taken into account. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
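As a rough illustration of that accounting (a sketch only; the field layout and exact charges here are assumptions layered on Lucene's RamUsageEstimator, not the patch itself):
{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.util.RamUsageEstimator;

// Sketch: the update reports its own deep size (it owns the Term, the field
// String and the value), so the enclosing maps only need to charge per-entry
// overhead plus one object reference per pointer to these objects.
class NumericUpdateSketch {
  Term term;    // term the update applies to (owned, counted here)
  String field; // updated DV field (owned, counted here)
  long value;

  long sizeInBytes() {
    long size = RamUsageEstimator.NUM_BYTES_OBJECT_HEADER;
    size += 2 * RamUsageEstimator.NUM_BYTES_OBJECT_REF;               // term + field refs
    size += term.bytes().length;                                      // term bytes
    size += term.field().length() * RamUsageEstimator.NUM_BYTES_CHAR; // term's field chars
    size += field.length() * RamUsageEstimator.NUM_BYTES_CHAR;        // field name chars
    size += RamUsageEstimator.NUM_BYTES_LONG;                         // the value
    return size;
  }
}
{code}
The two BufferedDeletes constants then cover only map-entry overhead: one charged when a new entry goes into the outer Map (including the inner map's object header) and one for each update added to an inner map.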
[jira] [Reopened] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reopened LUCENE-5199: - This is not necessary. The only codecs that don't support this aren't instantiated with per-field docvalues (unless there is a bug in facet/ tests with what it's doing). > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756678#comment-13756678 ] Robert Muir commented on LUCENE-5189: - {quote} Patch adds per-field support. I currently do that by adding a boolean 'isFieldUpdate' to SegWriteState which is set to true only by ReaderAndLiveDocs. PerFieldDVF then peeks into that boolean and if it's true, it reads the format name from FieldInfo.attributes() instead of relying on Codec.getPerFieldDVF(). If we'll eventually gen FieldInfos, there won't be a need for this boolean as PerFieldDVF will get that from FI.dvGen. {quote} We can't really move forward with this boolean: it only attacks the symptom (puts a HACK in per-field) without fixing the disease (the codec API). In general, if a codec needs to write to and read from FieldInfos/SegmentInfos.attributes, it doesn't work here: this API needs to be fixed. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
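For readers following along, a sketch of what the criticized "peek" amounts to in spirit (the attribute key and method shape are illustrative, not the actual PerFieldDVF internals):
{code:java}
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.index.FieldInfo;

// Sketch: on a field update, resolve the DV format from the name that was
// recorded in the FieldInfo attributes at indexing time, instead of asking
// the codec's per-field mapping again.
static DocValuesFormat resolveFormat(FieldInfo fi, boolean isFieldUpdate,
    DocValuesFormat fromCodec) {
  if (isFieldUpdate) {
    String name = fi.getAttribute("PerFieldDocValuesFormat.format"); // assumed key
    if (name != null) {
      return DocValuesFormat.forName(name);
    }
  }
  return fromCodec; // normal path: the codec's per-field choice
}
{code}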
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756676#comment-13756676 ] Robert Muir commented on LUCENE-5197: - RAMUsageEstimator is quite slow when you need to run it on lots of objects (e.g. the codec tree here). Also it totally breaks down if it hits certain objects like a ThreadLocal. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756650#comment-13756650 ] Shai Erera commented on LUCENE-5189: Thanks for the review Mike. I nuked the two unused methods, and I like SegmentCommitInfo, so I changed the nocommit text. I changed the nocommit in SIPC to a TODO; I don't think we need to tackle it in this issue. I'm working on improving the RAM accounting. I want to add NumericUpdate.sizeInBytes() and count the size per update that is actually added. Then, because it's a nested Map (keyed by the Term and the field String), and the Term and String are both in NumericUpdate already (and will be accounted for in its calculation), only their pointers need to be taken into account. Also, sizeInBytes should grow by a new entry to the outer map only when one is actually added, and the same for the inner map. Therefore I don't think we can have a single constant here, but instead maybe two: one for every Entry added to the outer map and one for every Entry added to the inner map. I think, because I need to compute the shallow sizes only (since Term and String are accounted for in NumericUpdate), it's a single constant per Entry? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5177) test covers overwrite true/false for block updates
[ https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756649#comment-13756649 ] ASF subversion and git services commented on SOLR-5177: --- Commit 1519694 from [~yo...@apache.org] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1519694 ] SOLR-5177: tests - add overwrite test for block join > test covers overwrite true/false for block updates > --- > > Key: SOLR-5177 > URL: https://issues.apache.org/jira/browse/SOLR-5177 > Project: Solr > Issue Type: Sub-task >Affects Versions: 4.5, 5.0 >Reporter: Mikhail Khludnev > Labels: patch, test > Attachments: SOLR-5177.patch > > > DUH2 uses \_root_ field to support overwrite for block updates. I want to > contribute this test, which assert the current functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5208) Support for the setting of core.properties key/values at create-time on Collections API
[ https://issues.apache.org/jira/browse/SOLR-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-5208: - Attachment: SOLR-5208.patch Here's a Proof-of-Concept patch. It creates a core.properties file that has any property.key=value pair specified on the collection.create line reproduced as key=value. Mark was spot-on that it's just a matter of passing the params through to core creation. [~markrmil...@gmail.com], is that what you had in mind? Several questions though. 1> Is copying the property.key=value necessary in CollectionsHandler.handleCreateShard? 2> Similarly, should the property.key=value stuff be done in OverseerCollectionProcessor.createShard? What about splitShard? Just going by all the params.set calls that "look kinda like create", it seems possible at least. > Support for the setting of core.properties key/values at create-time on > Collections API > --- > > Key: SOLR-5208 > URL: https://issues.apache.org/jira/browse/SOLR-5208 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Tim Vaillancourt >Assignee: Erick Erickson > Attachments: SOLR-5208.patch > > > As discussed on e-mail thread "Sharing SolrCloud collection configs > w/overrides" > (http://search-lucene.com/m/MUWXu1DIsqY1&subj=Sharing+SolrCloud+collection+configs+w+overrides), > Erick brought up a neat solution using HTTP params at create-time for the > Collection API. > Essentially, this request is for a functionality that allows the setting of > variables (core.properties) on Collections API CREATE command. > Erick's idea: > "Maybe it's as simple as allowing more params for creation like > collection.coreName where each param of the form collection.blah=blort > gets an entry in the properties file blah=blort?..." -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
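For anyone wanting to try the PoC, a hedged SolrJ sketch of the create call (the collection name and property are made up; the property.key=value convention is the one the patch describes):
{code:java}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// Create a collection whose cores each get blah=blort in core.properties.
static void createWithProps(SolrServer server) throws Exception {
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("action", "CREATE");
  params.set("name", "mycollection");
  params.set("numShards", 2);
  params.set("property.blah", "blort"); // reproduced as blah=blort in core.properties
  QueryRequest request = new QueryRequest(params);
  request.setPath("/admin/collections");
  server.request(request);
}
{code}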
[jira] [Resolved] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5199. Resolution: Fixed Fix Version/s: 4.5 5.0 Committed to trunk and 4x. > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 5.0, 4.5 > > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756633#comment-13756633 ] ASF subversion and git services commented on LUCENE-5199: - Commit 1519690 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1519690 ] LUCENE-5199: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5199: --- Summary: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field (was: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field) > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat used per-field > --- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756626#comment-13756626 ] ASF subversion and git services commented on LUCENE-5199: - Commit 1519685 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1519685 ] LUCENE-5199: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat in user per-field > -- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5177) test covers overwrite true/false for block updates
[ https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756610#comment-13756610 ] ASF subversion and git services commented on SOLR-5177: --- Commit 1519679 from [~yo...@apache.org] in branch 'dev/trunk' [ https://svn.apache.org/r1519679 ] SOLR-5177: tests - add overwrite test for block join > test covers overwrite true/false for block updates > --- > > Key: SOLR-5177 > URL: https://issues.apache.org/jira/browse/SOLR-5177 > Project: Solr > Issue Type: Sub-task >Affects Versions: 4.5, 5.0 >Reporter: Mikhail Khludnev > Labels: patch, test > Attachments: SOLR-5177.patch > > > DUH2 uses the \_root_ field to support overwrite for block updates. I want to > contribute this test, which asserts the current functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756595#comment-13756595 ] Uwe Schindler commented on LUCENE-5197: --- From my perspective, usage of RAMUsageEstimator is to be preferred; everything else gets outdated very fast (especially since we have no idea about caches like FieldCache being used). In production RAMDirectory is not used, and directories like MMapDirectory or NIOFSDirectory have no heap usage; the default codec does not use much heap either, so I see no reason not to trust the official RAM usage as reported by RAMUsageEstimator. The memory usage accounting of IndexWriter is quite different from what we have on the IndexReader side: the accounting done on the IndexWriter side is much more under the control of the Lucene code and is very fine-grained, but the proposed changes to FixedBitSet are just nonsense to me. RAMUsageEstimator can estimate a FixedBitSet very accurately (that case is easy and, in my opinion, 100% correct). > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
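To make the RAMUsageEstimator approach concrete, here is a minimal sketch (not from the issue itself; the bit count and names are illustrative) of measuring a FixedBitSet's heap footprint:
{noformat}
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.RamUsageEstimator;

public class FixedBitSetRamSketch {
  public static void main(String[] args) {
    // One bit per document; 10M docs is an arbitrary example size.
    FixedBitSet liveDocs = new FixedBitSet(10000000);
    // Reflective walk of the object graph: backing long[] plus headers.
    long bytes = RamUsageEstimator.sizeOf(liveDocs);
    // Expect roughly 10M / 8 bytes for the long[] plus a small constant.
    System.out.println("FixedBitSet(10M bits) uses ~" + bytes + " bytes");
  }
}
{noformat}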
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3069: -- Attachment: LUCENE-3069.patch The uploaded patch should show all the changes against trunk: I added two different term dictionary implementations, and refactored PostingsBaseFormat to plug in non-block-based term dicts. I'm still working on the javadocs, and maybe we should rename that 'temp' package to something like 'fstterms'? > Lucene should have an entirely memory resident term dictionary > -- > > Key: LUCENE-3069 > URL: https://issues.apache.org/jira/browse/LUCENE-3069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Han Jiang > Labels: gsoc2013 > Fix For: 5.0, 4.5 > > Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, > LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch > > > FST based TermDictionary has been a great improvement yet it still uses a > delta codec file for scanning to terms. Some environments have enough memory > available to keep the entire FST based term dict in memory. We should add a > TermDictionary implementation that encodes all needed information for each > term into the FST (custom fst.Output) and builds an FST from the entire term, > not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
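For readers unfamiliar with the underlying structure, here is a minimal sketch (illustrative only; API as of the 4.x line, details may vary by release) of building an FST that maps whole terms to a numeric output, which is the kind of mapping such a memory-resident term dictionary relies on:
{noformat}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class FstTermDictSketch {
  public static void main(String[] args) throws Exception {
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
    IntsRef scratch = new IntsRef();
    // Terms must be added in sorted order; the Long output could encode
    // e.g. a postings-file pointer or a term ordinal.
    builder.add(Util.toIntsRef(new BytesRef("apple"), scratch), Long.valueOf(1));
    builder.add(Util.toIntsRef(new BytesRef("banana"), scratch), Long.valueOf(2));
    FST<Long> fst = builder.finish();
    Long out = Util.get(fst, new BytesRef("banana")); // -> 2
    System.out.println(out);
  }
}
{noformat}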
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756556#comment-13756556 ] Dawid Weiss commented on LUCENE-5197: - This can be done with the same implementation -- a tree visitor could collect partial sums and aggregate them over the object hierarchy as a tree, rather than summing it all up. This is sort of implemented here (although I kept the same implementation of RamUsageEstimator; a visitor pattern would be more elegant, I think). https://github.com/dweiss/java-sizeof/blob/master/src/main/java/com/carrotsearch/sizeof/ObjectTree.java > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
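A minimal sketch of the tree-aggregation idea (hypothetical types; this is not the java-sizeof API): each measured object becomes a node carrying its shallow size, and subtree totals are aggregated bottom-up instead of being summed globally.
{noformat}
import java.util.ArrayList;
import java.util.List;

// Hypothetical node produced by a size-accounting visitor.
class SizeNode {
  final String label;       // e.g. class or field name
  final long shallowBytes;  // bytes used by this object alone
  final List<SizeNode> children = new ArrayList<SizeNode>();

  SizeNode(String label, long shallowBytes) {
    this.label = label;
    this.shallowBytes = shallowBytes;
  }

  // Total for this node plus everything reachable below it.
  long totalBytes() {
    long sum = shallowBytes;
    for (SizeNode child : children) {
      sum += child.totalBytes();
    }
    return sum;
  }
}
{noformat}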
[jira] [Commented] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756555#comment-13756555 ] Shai Erera commented on LUCENE-5199: Core and Facet tests pass (only users of this API). I think it's ready to commit. We can add more formats as Jenkins hunts them down (if there are any). > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat in user per-field > -- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5199: --- Attachment: LUENE-5199.patch Added {{String... fields}} to LTC.defaultCodecSupportsDocsWithField and tested that the DVF returned for each field is not "Lucene40/41/42". Are there more DVFs I should add? I also changed all tests that call this method to pass the list of fields they use up front. > Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual > DocValuesFormat in user per-field > -- > > Key: LUCENE-5199 > URL: https://issues.apache.org/jira/browse/LUCENE-5199 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUENE-5199.patch > > > On LUCENE-5178 Han reported the following test failure: > {noformat} > [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< >[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) >[junit4]> less than 10 ([8) >[junit4]> less than or equal to 10 (]8) >[junit4]> over 90 (8) >[junit4]> 9...> but was:<...(0) >[junit4]> less than 10 ([28) >[junit4]> less than or equal to 10 (2]8) >[junit4]> over 90 (8) >[junit4]> 9...> >[junit4]> at > __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) >[junit4]> at > org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) >[junit4]> at java.lang.Thread.run(Thread.java:722) > {noformat} > which can be reproduced with > {noformat} > tcase=TestRangeAccumulator -Dtests.method=testMissingValues > -Dtests.seed=815B6AA86D05329C -Dtests.slow=true > -Dtests.postingsformat=Lucene41 -Dtests.locale=ca > -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 > {noformat} > It seems that the Codec that is picked is a Lucene45Codec with > Lucene42DVFormat, which does not support docsWithFields for numericDV. We > should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields > and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
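A minimal sketch of what the described helper could look like (a hypothetical reconstruction, not the actual patch; _TestUtil.getDocValuesFormat is the lookup mentioned later in this thread):
{noformat}
// Instead of inspecting the default Codec, check the DocValuesFormat
// actually chosen for each field the test is going to use.
public static boolean defaultCodecSupportsDocsWithField(String... fields) {
  for (String field : fields) {
    String dvFormat = _TestUtil.getDocValuesFormat(field);
    // The Lucene40/41/42 DV formats cannot represent missing values for numeric DV.
    if (dvFormat.equals("Lucene40") || dvFormat.equals("Lucene41") || dvFormat.equals("Lucene42")) {
      return false;
    }
  }
  return true;
}
{noformat}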
[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size
[ https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756552#comment-13756552 ] Michael McCandless commented on LUCENE-5197: This would be a great addition! And I agree if we can somehow do this with RUE, being able to restrict where it's allowed to "crawl", that would be nice. Separately, I wonder if we could get a breakdown of the RAM usage ... sort of like Explanation, which returns both the value and a String (English) description of how the value was computed. But this can come later ... just a ramBytesUsed returning long is a great start. > Add a method to SegmentReader to get the current index heap memory size > --- > > Key: LUCENE-5197 > URL: https://issues.apache.org/jira/browse/LUCENE-5197 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/index >Reporter: Areek Zillur > Attachments: LUCENE-5197.patch > > > It would be useful to at least estimate the index heap size being used by > Lucene. Ideally a method exposing this information at the SegmentReader level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
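A minimal sketch of the API shape being discussed (hypothetical names; nothing here is committed):
{noformat}
import java.util.ArrayList;
import java.util.List;

// The proposed starting point: a single accounting method.
interface RamAccountable {
  long ramBytesUsed();
}

// A possible later refinement, analogous to Explanation for scores:
// the value plus a human-readable description of how it was computed.
class RamExplanation {
  final long bytes;
  final String description;
  final List<RamExplanation> details = new ArrayList<RamExplanation>();

  RamExplanation(long bytes, String description) {
    this.bytes = bytes;
    this.description = description;
  }
}
{noformat}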
[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)
[ https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756551#comment-13756551 ] Shai Erera commented on LUCENE-5178: Opened LUCENE-5199. > doc values should expose missing values (or allow configurable defaults) > > > Key: LUCENE-5178 > URL: https://issues.apache.org/jira/browse/LUCENE-5178 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch > > > DocValues should somehow allow a configurable default per-field. > Possible implementations include setting it on the field in the document or > registration of an IndexWriter callback. > If we don't make the default configurable, then another option is to have > DocValues fields keep track of whether a value was indexed for that document > or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field
Shai Erera created LUCENE-5199: -- Summary: Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat in user per-field Key: LUCENE-5199 URL: https://issues.apache.org/jira/browse/LUCENE-5199 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Shai Erera Assignee: Shai Erera On LUCENE-5178 Han reported the following test failure: {noformat} [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues <<< [junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<...(0) [junit4]> less than 10 ([8) [junit4]> less than or equal to 10 (]8) [junit4]> over 90 (8) [junit4]> 9...> but was:<...(0) [junit4]> less than 10 ([28) [junit4]> less than or equal to 10 (2]8) [junit4]> over 90 (8) [junit4]> 9...> [junit4]>at __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) [junit4]>at org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) [junit4]>at java.lang.Thread.run(Thread.java:722) {noformat} which can be reproduced with {noformat} tcase=TestRangeAccumulator -Dtests.method=testMissingValues -Dtests.seed=815B6AA86D05329C -Dtests.slow=true -Dtests.postingsformat=Lucene41 -Dtests.locale=ca -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 {noformat} It seems that the Codec that is picked is a Lucene45Codec with Lucene42DVFormat, which does not support docsWithFields for numericDV. We should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields and check that the actual DVF used for each field supports it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5142) Block Indexing / Join Improvements
[ https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756542#comment-13756542 ] Mikhail Khludnev commented on SOLR-5142: I have a subject for consideration: right now a unique key is required for child documents; however, I suppose that uniqueness is already provided by the parent and the \_root_ field. Don't you feel the unique key should be optional for child documents? > Block Indexing / Join Improvements > -- > > Key: SOLR-5142 > URL: https://issues.apache.org/jira/browse/SOLR-5142 > Project: Solr > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 4.5, 5.0 > > > Follow-on main issue for general block indexing / join improvements -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5175) Don't reorder children document
[ https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-5175: --- Issue Type: Sub-task (was: Bug) Parent: SOLR-5142 > Don't reorder children document > --- > > Key: SOLR-5175 > URL: https://issues.apache.org/jira/browse/SOLR-5175 > Project: Solr > Issue Type: Sub-task > Components: update >Reporter: Mikhail Khludnev > Labels: patch, test > Fix For: 4.5, 5.0 > > Attachments: SOLR-5175.patch > > > AddUpdateCommand reverses children documents that causes failure of > BJQParserTest.testGrandChildren() discussed in SOLR-5168 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)
[ https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756536#comment-13756536 ] Shai Erera commented on LUCENE-5178: I see. I think this can also happen if you use RandomCodec and it draws Lucene42DVF? So in this case, with this seed, it trips if you set postingsformat, but I'm not sure that this assume() is correct in general. The ugly part of having a test call _TestUtil.getDocValuesFormat(field) (or a nice method wrapping it) is that the test will need to decide up front on all the fields it uses, and if there's a mistake, the error may only surface later and be harder to debug (i.e. spotting that the test uses a different field than what it passed to assume()). But I don't think that asserting the Codec is the right test here, so this has to change. > doc values should expose missing values (or allow configurable defaults) > > > Key: LUCENE-5178 > URL: https://issues.apache.org/jira/browse/LUCENE-5178 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch > > > DocValues should somehow allow a configurable default per-field. > Possible implementations include setting it on the field in the document or > registration of an IndexWriter callback. > If we don't make the default configurable, then another option is to have > DocValues fields keep track of whether a value was indexed for that document > or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5177) test covers overwrite true/false for block updates
[ https://issues.apache.org/jira/browse/SOLR-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-5177: --- Issue Type: Sub-task (was: Test) Parent: SOLR-5142 > test covers overwrite true/false for block updates > --- > > Key: SOLR-5177 > URL: https://issues.apache.org/jira/browse/SOLR-5177 > Project: Solr > Issue Type: Sub-task >Affects Versions: 4.5, 5.0 >Reporter: Mikhail Khludnev > Labels: patch, test > Attachments: SOLR-5177.patch > > > DUH2 uses the \_root_ field to support overwrite for block updates. I want to > contribute this test, which asserts the current functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)
[ https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756523#comment-13756523 ] Michael McCandless commented on LUCENE-5178: It sounds like we need to check the actual DVFormat for that field (_TestUtil.getDocValuesFormat("field")) and then test whether that format supports missing values. I think this failure can only happen if you explicitly set -Dtests.postingsformat, because then we make an anon subclass of Lucene45 (TestRuleSetupAndRestoreClassEnv.java at line 194) ... so it sounds like in general we should not be using defaultCodecSupportsDocsWithField() but rather something like defaultDVFormatSupportsDocsWithField(String field) ... > doc values should expose missing values (or allow configurable defaults) > > > Key: LUCENE-5178 > URL: https://issues.apache.org/jira/browse/LUCENE-5178 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 5.0, 4.5 > > Attachments: LUCENE-5178.patch, LUCENE-5178_reintegrate.patch > > > DocValues should somehow allow a configurable default per-field. > Possible implementations include setting it on the field in the document or > registration of an IndexWriter callback. > If we don't make the default configurable, then another option is to have > DocValues fields keep track of whether a value was indexed for that document > or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: term collection frequency in Lucene 3.6.2?
3.6.x doesn't track this statistic, but 4.x does: TermsEnum.totalTermFreq(). In 3.6.x you could visit every doc that contains the term, summing up the .freq(), but this is slowish. Mike McCandless http://blog.mikemccandless.com On Tue, Sep 3, 2013 at 4:19 AM, jiangwen jiang wrote: > Hi guys, > > Term collection frequency (meaning how many times a particular term > appears across all documents): does this data exist in Lucene 3.6.2? > > For example: > doc1 contains terms: T1 T2 T3 T1 T1 > doc2 contains terms: T1 T4 T4 > > > T1 appears 4 times in all documents, so the term collection freq of T1 is 4. > > Thanks for your help > Regards - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
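A minimal sketch of the 3.6.x workaround Mike describes (field and term names are illustrative):
{noformat}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class CollectionFreq {
  // Lucene 3.6.x: sum the within-document freq() over every document
  // containing the term. In 4.x this is simply TermsEnum.totalTermFreq().
  public static long collectionFreq(IndexReader reader, String field, String text)
      throws IOException {
    long sum = 0;
    TermDocs td = reader.termDocs(new Term(field, text));
    try {
      while (td.next()) {
        sum += td.freq();
      }
    } finally {
      td.close();
    }
    return sum;
  }
}
{noformat}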
[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5189: --- Attachment: LUCENE-5189.patch Patch adds per-field support. I currently do that by adding a boolean 'isFieldUpdate' to SegWriteState, which is set to true only by ReaderAndLiveDocs. PerFieldDVF then peeks at that boolean and, if it's true, reads the format name from FieldInfo.attributes() instead of relying on Codec.getPerFieldDVF(). If we eventually gen FieldInfos, there won't be a need for this boolean, as PerFieldDVF will get that from FI.dvGen. So far all Codecs work. I had to remove an assert from SimpleText which checked that all fields read from the file are in state.fieldInfos, but it doesn't otherwise use that information, only the assert. And SegCoreReader now passes to each DVProducer only the fields it needs to read. Added some tests too. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes is immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons for that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and of itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
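A hypothetical sketch of the per-field resolution described above (names and key value are illustrative, not the actual patch): on a field update, the DocValuesFormat is resolved from the name recorded in the field's attributes at original write time, instead of consulting the Codec's per-field mapping.
{noformat}
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.index.FieldInfo;

class PerFieldUpdateResolution {
  // Key under which PerFieldDocValuesFormat records its per-field format name
  // (value shown here is illustrative).
  static final String PER_FIELD_FORMAT_KEY = "PerFieldDocValuesFormat.format";

  DocValuesFormat formatFor(FieldInfo field, boolean isFieldUpdate) {
    if (isFieldUpdate) { // the patch carries this flag on SegWriteState
      // Read the format name recorded when the field was originally written.
      String formatName = field.getAttribute(PER_FIELD_FORMAT_KEY);
      return DocValuesFormat.forName(formatName);
    }
    // Stand-in for the Codec's normal per-field choice.
    return DocValuesFormat.forName("Lucene45");
  }
}
{noformat}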
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756494#comment-13756494 ] Michael McCandless commented on LUCENE-5189: Patch looks great! I reviewed some of the remaining nocommits:

{quote} // nocommit no one calls this method, why do we have it? and if we need it, do we need one for docValuesGen too? public void setDelGen(long delGen) { {quote} Nuke it! We only use .advanceNextWriteDelGen (and the patch adds this for DVs too).

{quote} // nocommit no one calls this, remove? void clearDelGen() { {quote} Nuke it!

bq. class ReadersAndLiveDocs { // nocommit (RENAME) to ReaderAndUpdates? +1 for ReaderAndUpdates

{quote} // nocommit why do we do that, vs relying on TrackingDir.getCreatedFiles(), // like we do for updates? {quote} That's a good question ... I'm not sure. We in fact already use TrackingDirWrapper (in ReadersAndLiveDocs.writeLiveDocs) ... so we could in theory record those files in SIPC and remove LiveDocsFormat.files(). Maybe make this a TODO though?

{quote} // nocommit: review! final static int BYTES_PER_NUMERIC_UPDATE = BYTES_PER_DEL_TERM + 2*RamUsageEstimator.NUM_BYTES_OBJECT_REF + RamUsageEstimator.NUM_BYTES_INT + RamUsageEstimator.NUM_BYTES_LONG; {quote} I think it makes sense to start from BYTES_PER_DEL_TERM, but then instead of mapping to a value Integer we map to a value Map whose per-Term RAM usage is something like:

{noformat}
PTR (for LinkedHashMap, since it must link each entry to the next?)

Map
  HEADER
  PTR (to array)
  3 INT
  1 FLOAT

for each occupied Entry
  PTR (from Map's entries array) * 2 (overhead for load factor)
  HEADER
  PTR * 2 (key, value)

  String key
    HEADER
    INT
    PTR (to char[])
    ARRAY_HEADER + 2 * length-of-string (field)

  NumericUpdate value
    HEADER
    PTR (to Term; ram already accounted for)
    PTR (to String; ram already accounted for)
    PTR (to Long value) + HEADER + 8 (long)
    INT
{noformat}

The thing is, this is so hairy ... that I think maybe we should instead use RamUsageEstimator to "calibrate" this? I.e., make a standalone test that keeps adding Term + fields into this structure and measures the RAM with RUE? Do this on a 32-bit and on a 64-bit JVM, and then conditionalize the constants. You'll still need to add in bytes according to field/term lengths ...

bq. +public class SegmentInfoPerCommit { // nocommit (RENAME) to SegmentCommit? Not sure about that rename ... since this class is just the "metadata" about a commit, not an "actual" commit. Maybe SegmentCommitInfo? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes is immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons for that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and of itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
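A minimal sketch of the calibration test Mike suggests (everything here is illustrative; a Long stands in for the patch's NumericUpdate value):
{noformat}
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.index.Term;
import org.apache.lucene.util.RamUsageEstimator;

public class NumericUpdateRamCalibration {
  public static void main(String[] args) {
    // The structure under discussion: Term -> (field -> update value).
    Map<Term, Map<String, Long>> updates = new LinkedHashMap<Term, Map<String, Long>>();
    for (int i = 1; i <= 100000; i++) {
      Map<String, Long> perField = new LinkedHashMap<String, Long>();
      perField.put("dvField", Long.valueOf(i));
      updates.put(new Term("id", "term" + i), perField);
      if (i % 20000 == 0) {
        // Measure actual heap usage and derive a per-entry constant,
        // instead of hand-computing headers and pointers for each JVM.
        long bytes = RamUsageEstimator.sizeOf(updates);
        System.out.println(i + " entries: " + (bytes / i) + " bytes/entry");
      }
    }
  }
}
{noformat}
Running this on both 32-bit and 64-bit JVMs (with and without compressed oops) would give the conditionalized constants the comment asks for.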
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 792 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/792/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9482 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/bin/java -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=7732BB689312EA08 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -classpath /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.0.10.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/luc
ene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-fileupload-1.2.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-lang-2.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/concurrentlinkedhashmap-lru-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/dom4j-1.6.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/guava-14.0.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-annotations-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-auth-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-common-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-hdfs-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/joda-time-2.2.jar:/Users/jenkins/wor