Re: Collections API
https://issues.apache.org/jira/browse/SOLR-5510

I don't fully understand why this is happening, but the workaround is to add the genericCoreNodeNames="${genericCoreNodeNames:true}" attribute to the cores element in your solr.xml file.

On Tue, Nov 26, 2013 at 10:10 PM, Steve Molloy wrote:
> I'm trying to reconcile our fork with 4.6 tag and I'm getting weird
> behaviour in Collections API, more specifically in ZkController's
> preRegister method after calling the create method of the collections API.
> When it checks if a slice has a replica for current node name, there is
> never any because at this stage, the slice has no replica. This is the new
> code that seems to be causing my issue, I can force the "autoCreated" to be
> always true to avoid the issue, but would like a cleaner way if there is
> one.
>
> if (cd.getCloudDescriptor().getCollectionName() != null &&
>     cd.getCloudDescriptor().getCoreNodeName() != null) {
>   // we were already registered
>   if (zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())) {
>     DocCollection coll =
>         zkStateReader.getClusterState().getCollection(cd.getCloudDescriptor().getCollectionName());
>     if (!"true".equals(coll.getStr("autoCreated"))) {
>       Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId());
>       if (slice != null) {
> ==>     if (slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) {
>           log.info("core_removed This core is removed from ZK");
>           throw new SolrException(ErrorCode.NOT_FOUND, coreNodeName + " is removed");
>         }
>       }
>     }
>   }
> }
>
> Thanks.
> Steve
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5513) automate SOLR core creation when installation is done...
Deep Kumar Lakharia created SOLR-5513:
-----------------------------------------

             Summary: automate SOLR core creation when installation is done...
                 Key: SOLR-5513
                 URL: https://issues.apache.org/jira/browse/SOLR-5513
             Project: Solr
          Issue Type: New Feature
          Components: clients - C#
         Environment: Windows, Ubuntu
            Reporter: Deep Kumar Lakharia
             Fix For: 4.4, 4.3

Automate SOLR core creation when installation is done...as of now we are creating the core manually

--
This message was sent by Atlassian JIRA (v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5512) Optimize DocValuesFacets
[ https://issues.apache.org/jira/browse/SOLR-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-5512:
------------------------------
    Attachment: SOLR-5512.patch

Patch. It also fixes a few unrelated bugs that got in the way:
* BitDocSet's DISI's cost() method doesn't prorate for the size of the segment relative to the whole thing (since the bitset is unfortunately top-level)
* If you hit an exception during faceting, the stack trace is completely lost because of a bug in SimpleFacets (from SOLR-2548)
[jira] [Created] (SOLR-5512) Optimize DocValuesFacets
Robert Muir created SOLR-5512:
---------------------------------

             Summary: Optimize DocValuesFacets
                 Key: SOLR-5512
                 URL: https://issues.apache.org/jira/browse/SOLR-5512
             Project: Solr
          Issue Type: Improvement
            Reporter: Robert Muir

This works well in the general case (esp. with huge numbers of unique values), but the SortedSetDocValuesAccumulator in lucene/facets does the algorithm better for typical cases (a smaller number of unique values relative to the size of the document set).

In this case, it collects directly with per-segment local ords, then remaps as a second step. So this is a lot less remapping.

It's too bad the code is separate at the moment; for now let's steal the heuristic.
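The "collect with per-segment local ords, then remap as a second step" idea from the SOLR-5512 description can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the code from the patch: it uses plain arrays instead of Lucene's doc values classes, and the names LocalOrdFacetSketch, countSegment and localToGlobal are hypothetical.

```java
import java.util.Arrays;

// Hypothetical, Lucene-free sketch: count with segment-local ordinals first,
// then remap each distinct local ordinal to its global ordinal once.
final class LocalOrdFacetSketch {

    /**
     * @param docOrds       local ordinal of the facet value for each matching doc
     * @param localToGlobal localToGlobal[localOrd] is the global ordinal of that value
     * @param globalCounts  accumulator indexed by global ordinal
     */
    static void countSegment(int[] docOrds, int[] localToGlobal, int[] globalCounts) {
        // Step 1: collect directly against local ordinals (no remapping per document).
        int[] localCounts = new int[localToGlobal.length];
        for (int ord : docOrds) {
            localCounts[ord]++;
        }
        // Step 2: remap once per distinct local ordinal instead of once per document.
        for (int ord = 0; ord < localCounts.length; ord++) {
            if (localCounts[ord] != 0) {
                globalCounts[localToGlobal[ord]] += localCounts[ord];
            }
        }
    }

    public static void main(String[] args) {
        // Assumed global values: apple=0, banana=1, cherry=2.
        int[] globalCounts = new int[3];
        // Segment A only contains {apple, cherry}: local 0 -> global 0, local 1 -> global 2.
        countSegment(new int[] {0, 1, 1, 0, 1}, new int[] {0, 2}, globalCounts);
        // Segment B only contains {banana, cherry}: local 0 -> global 1, local 1 -> global 2.
        countSegment(new int[] {0, 0, 1}, new int[] {1, 2}, globalCounts);
        System.out.println(Arrays.toString(globalCounts)); // prints [2, 2, 4]
    }
}
```

With few unique values relative to the number of documents, step 2 touches far fewer entries than remapping every document's ordinal, which matches the "a lot less remapping" remark.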
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834527#comment-13834527 ]

ASF subversion and git services commented on SOLR-5509:
-------------------------------------------------------

Commit 1546286 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1546286 ]

SOLR-5509: Do not retry to yourself.

> ChaosMonkeyNothingIsSafeTest rare fail.
> ---------------------------------------
>
>                 Key: SOLR-5509
>                 URL: https://issues.apache.org/jira/browse/SOLR-5509
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 5.0, 4.7
>
>         Attachments: cmns-test-cloud-off-by1-control-2.log
>
>
> {noformat}
>    [junit4]   2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} {add=[50086 (1452880907553734656)]} 0 142
>    [junit4]   2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 (1452880908206997504)]} 0 254
>    [junit4]   2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} {delete=[50086 (-1452880908537298944)]} 0 2
>    [junit4]   2> 42327 T131 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {delete=[50086 (-1452880908542541824)]} 0 1
>    [junit4]   2> 42622 T132 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={update.distrib=TOLEADER&wt=javabin&version=2} {add=[50086 (1452880908850823168)]} 0 1
>    [junit4]   2> 42623 T48 C22 P57136 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086]} 0 1223
>    [junit4]   2> ## Only in cloudDocList: [{id=50086}]
>    [junit4]   2> cloudClient :{numFound=1,start=0,docs=[SolrDocument{id=50086, _version_=1452880908850823168}]}
> h
> {noformat}
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834528#comment-13834528 ]

ASF subversion and git services commented on SOLR-5509:
-------------------------------------------------------

Commit 1546287 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546287 ]

SOLR-5509: Do not retry to yourself.
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834514#comment-13834514 ]

Mark Miller commented on SOLR-5509:
-----------------------------------

Interesting - as soon as I did that, I caught a fail. Need to think about it a bit.

The phantom add is coming from the leader to itself and then getting distributed.
[jira] [Comment Edited] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834501#comment-13834501 ]

Mark Miller edited comment on SOLR-5509 at 11/28/13 4:14 AM:
-------------------------------------------------------------

Bah - this bug all but disappeared when I started sending the from node param on all requests - that is pretty odd, it wouldn't affect any defensive checks here that I can see. Might be rolling 100 heads in a row though. Anyway, better to not hide adding the param behind the log level - always adding and seeing if I can find this fail again.

was (Author: markrmil...@gmail.com):
Bah - this all bug disappeared when I started sending the from node param on all requests - that is pretty odd, it wouldn't affect any defensive checks here that I can see. Might be rolling 100 heads in a row though. Anyway, better to not hide adding the param behind the log level - always adding and seeing if I can find this fail again.
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834501#comment-13834501 ]

Mark Miller commented on SOLR-5509:
-----------------------------------

Bah - this all bug disappeared when I started sending the from node param on all requests - that is pretty odd, it wouldn't affect any defensive checks here that I can see. Might be rolling 100 heads in a row though. Anyway, better to not hide adding the param behind the log level - always adding and seeing if I can find this fail again.
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834500#comment-13834500 ]

ASF subversion and git services commented on SOLR-5509:
-------------------------------------------------------

Commit 1546279 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546279 ]

SOLR-5509: Always add a param for 'from' node.
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834499#comment-13834499 ]

ASF subversion and git services commented on SOLR-5509:
-------------------------------------------------------

Commit 1546278 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1546278 ]

SOLR-5509: Always add a param for 'from' node.
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834494#comment-13834494 ]

ASF subversion and git services commented on SOLR-5488:
-------------------------------------------------------

Commit 1546263 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1546263 ]

SOLR-5488: Added more test info output. Somehow lost some of what I did yesterday

> Fix up test failures for Analytics Component
> --------------------------------------------
>
>                 Key: SOLR-5488
>                 URL: https://issues.apache.org/jira/browse/SOLR-5488
>             Project: Solr
>          Issue Type: Bug
>   Affects Versions: 5.0, 4.7
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch
>
>
> The analytics component has a few test failures, perhaps
> environment-dependent. This is just to collect the test fixes in one place
> for convenience when we merge back into 4.x
[jira] [Updated] (LUCENE-5285) FastVectorHighlighter copies segments scores when splitting segments across multi-valued fields
[ https://issues.apache.org/jira/browse/LUCENE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5285:
--------------------------------
    Attachment: LUCENE-5285.patch

Ah! += yeah. This fixes it and improves the test so it would notice the difference.

> FastVectorHighlighter copies segments scores when splitting segments across
> multi-valued fields
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5285
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5285
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Nik Everett
>            Priority: Minor
>         Attachments: LUCENE-5285.patch, LUCENE-5285.patch
>
>
> FastVectorHighlighter copies segments scores when splitting segments across
> multi-valued fields. This is only a problem when you want to sort the
> fragments by score. Technically BaseFragmentsBuilder (line 261 in my copy of
> the source) does the copying.
> Rather than copying the score I _think_ it'd be more right to pull that
> copying logic into a protected method that child classes (such as
> ScoreOrderFragmentsBuilder) can override to do more intelligent things.
> Exactly what that means isn't clear to me at the moment.
[jira] [Resolved] (SOLR-5506) Support DocValues in (ICU)CollationField
[ https://issues.apache.org/jira/browse/SOLR-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-5506.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 4.7
                   5.0

> Support DocValues in (ICU)CollationField
> ----------------------------------------
>
>                 Key: SOLR-5506
>                 URL: https://issues.apache.org/jira/browse/SOLR-5506
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Robert Muir
>             Fix For: 5.0, 4.7
>
>         Attachments: SOLR-5506.patch
>
>
> These field types don't support DV...
[jira] [Commented] (SOLR-5506) Support DocValues in (ICU)CollationField
[ https://issues.apache.org/jira/browse/SOLR-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834413#comment-13834413 ]

ASF subversion and git services commented on SOLR-5506:
-------------------------------------------------------

Commit 1546247 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1546247 ]

SOLR-5506: Support docValues in (ICU)CollationField
[jira] [Updated] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType
[ https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated SOLR-5354:
-----------------------------
    Attachment: SOLR-5354.patch

Thanks Robert and Hoss for the reviews.

New patch incorporating Hoss's suggestions, details below:

bq. call me paranoid, but i really dislike distrib tests that only use the query() method to ensure that the distrib response is the same as the control response – could we please add some assertions that use queryServer() to prove the docs are coming back in the right order in the distrib test?

As Hoss suggested on #solr IRC, I changed {{BasicDistributedSearchTestCase#query()}} to return the {{QueryResponse}} (instead of void), so that {{queryServer()}} doesn't have to be called separately from the random server vs. control check that {{query()}} performs. I then added a new method to {{SolrTestCaseJ4}}: {{assertFieldValues(doclist,fieldName,values)}}, and now check that {{id}} field values are in the expected order using this new method.

bq. the test should really sanity check that multi-level sorts (eg: "payload asc, id desc") are working properly

Added.

bq. we should be really clear & careful in the javadocs for FieldType.marshalSortValue and FieldType.unmarshalSortValue – in your patch they refer to "a value of this FieldType" but that's not actually what they operate on. They operate on the values used by the FieldComparator returned by the SortField for this FieldType (ie: SortableDoubleField's toObject returns a Double, but the marshal method operates on ByteRef)

Reworded.

bq. I'm confused why we still need comparatorNatural() and its use for REWRITEABLE. Why not actually rewrite() the SortField using the local IndexSearcher and then wrap the rewritten SortField's FieldComparator using comparatorFieldComparator() just like any other SortField? Since we're only ever going to compare the raw values on the coordinator it shouldn't matter if we rewrite in terms of the local IndexSearcher - it's the best we can do, and that seems safer than assuming REWRITABLE == function and trusting comparatorNatural. (ie: consider someone who writes a custom FieldType that uses REWRITABLE)

Fixed - {{comparatorNatural()}} is gone.

bq. don't the marshal methods in StrField, TextField, and CollationField need null checks (for the possibilities of docs w/o a value in the sort field?)

Yes, they do, {{ICUCollationField}} too - added.

bq. do we even have any existing tests of distributed sorting on strings & numerics using sortMissingLast / sortMissingFirst to be sure we don't break that?

No, I couldn't find any, so I added tests for {{sortMissingFirst}} and {{sortMissingLast}} on both {{SortableIntField}} and {{StringField}} on the existing {{TestDistributedSearch}} class. I'll add testing for a Trie field too before I commit, not in the patch yet.

> Distributed sort is broken with CUSTOM FieldType
> ------------------------------------------------
>
>                 Key: SOLR-5354
>                 URL: https://issues.apache.org/jira/browse/SOLR-5354
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>   Affects Versions: 4.4, 4.5, 5.0
>            Reporter: Jessica Cheng
>            Assignee: Steve Rowe
>              Labels: custom, query, sort
>         Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch
>
>
> We added a custom field type to allow an indexed binary field type that
> supports search (exact match), prefix search, and sort as unsigned bytes
> lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator
> accomplishes what we want, and even though the name of the comparator
> mentions UTF8, it doesn't actually assume so and just does byte-level
> operation, so it's good. However, when we do this across different nodes, we
> run into an issue where in QueryComponent.doFieldSortValues:
>           // Must do the same conversion when sorting by a
>           // String field in Lucene, which returns the terms
>           // data as BytesRef:
>           if (val instanceof BytesRef) {
>             UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
>             field.setStringValue(spare.toString());
>             val = ft.toObject(field);
>           }
> UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually
> UTF8. I did a hack where I specified our own field comparator to be
> ByteBuffer based to get around that instanceof check, but then the field
> value gets transformed into BYTEARR in JavaBinCodec, and when it's
> unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a
> ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator,
> which decides to give me comparatorNatural in the else of the TODO for
> CUSTOM, which barfs because byte[] are not Comparable...
> From Chris Hostetter:
> I'm not very fami
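The "sort as unsigned bytes lexicographical compare" behaviour mentioned in the SOLR-5354 report, and the reason a raw byte[] needs an explicit comparator (arrays do not implement Comparable), could be sketched as below. This is an illustration only, not the comparator Lucene or the patch uses; the class name UnsignedBytesSortSketch and its method are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: compare byte arrays lexicographically, treating each
// byte as unsigned (0..255) rather than as Java's signed byte (-128..127).
final class UnsignedBytesSortSketch {

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7F.
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // A shared prefix sorts before the longer array.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[][] values = {
            {(byte) 0x80}, {0x7F}, {0x01, 0x02}, {0x01}
        };
        // Arrays.sort(values) without a comparator would throw ClassCastException,
        // because byte[] is not Comparable.
        Arrays.sort(values, UnsignedBytesSortSketch::compareUnsigned);
        for (byte[] v : values) {
            System.out.println(Arrays.toString(v));
        }
        // prints [1], [1, 2], [127], [-128] (0x80 prints as -128 but sorts last)
    }
}
```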
[jira] [Commented] (SOLR-5506) Support DocValues in (ICU)CollationField
[ https://issues.apache.org/jira/browse/SOLR-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834394#comment-13834394 ]

ASF subversion and git services commented on SOLR-5506:
-------------------------------------------------------

Commit 1546245 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1546245 ]

SOLR-5506: Support docValues in (ICU)CollationField
[jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5463: --- Attachment: SOLR-5463__straw_man.patch Added simple support for sorts involving score, and added randomized testing of multi-level sorts, both in single node and distributed modes. next up i'm going to look into improving the serialization of the totem to make it work better with strings and CUSTOM SortFields -- which requires leveraging the improvements sarowe is working on in SOLR-5354. > Provide cursor/token based "searchAfter" support that works with arbitrary > sorting (ie: "deep paging") > -- > > Key: SOLR-5463 > URL: https://issues.apache.org/jira/browse/SOLR-5463 > Project: Solr > Issue Type: New Feature >Reporter: Hoss Man >Assignee: Hoss Man > Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, > SOLR-5463__straw_man.patch > > > I'd like to revist a solution to the problem of "deep paging" in Solr, > leveraging an HTTP based API similar to how IndexSearcher.searchAfter works > at the lucene level: require the clients to provide back a token indicating > the sort values of the last document seen on the previous "page". This is > similar to the "cursor" model I've seen in several other REST APIs that > support "pagnation" over a large sets of results (notable the twitter API and > it's "since_id" param) except that we'll want something that works with > arbitrary multi-level sort critera that can be either ascending or descending. > SOLR-1726 laid some initial ground work here and was commited quite a while > ago, but the key bit of argument parsing to leverage it was commented out due > to some problems (see comments in that issue). It's also somewhat out of > date at this point: at the time it was commited, IndexSearcher only supported > searchAfter for simple scores, not arbitrary field sorts; and the params > added in SOLR-1726 suffer from this limitation as well. 
> --- > I think it would make sense to start fresh with a new issue with a focus on > ensuring that we have deep paging which: > * supports arbitrary field sorts in addition to sorting by score > * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
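The cursor idea in the SOLR-5463 description (the client hands back the sort values of the last document it saw, and the server returns only documents that sort after them) can be sketched in plain Java. This is only an in-memory illustration under stated assumptions, not code from the patch: the class CursorPagingSketch, the Doc type, the price/id sort and the pageAfter method are all hypothetical names, and no Solr or Lucene API is used.

```java
import java.util.ArrayList;
import java.util.List;

class CursorPagingSketch {
    /** Hypothetical document, reduced to the two values it is sorted on. */
    static final class Doc {
        final String id;
        final int price;

        Doc(String id, int price) {
            this.id = id;
            this.price = price;
        }
    }

    /** Multi-level sort: price descending, then id ascending as a unique tie-breaker. */
    static int compare(Doc a, Doc b) {
        int byPrice = Integer.compare(b.price, a.price);
        return byPrice != 0 ? byPrice : a.id.compareTo(b.id);
    }

    /**
     * Returns up to `rows` documents that sort strictly after `cursor`.
     * A null cursor means "start from the first page".
     */
    static List<Doc> pageAfter(List<Doc> docs, Doc cursor, int rows) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(CursorPagingSketch::compare);
        List<Doc> page = new ArrayList<>();
        for (Doc d : sorted) {
            if (page.size() == rows) {
                break;
            }
            if (cursor == null || compare(d, cursor) > 0) {
                page.add(d);
            }
        }
        return page;
    }

    /** Convenience for inspecting a page: its ids, in order. */
    static List<String> ids(List<Doc> page) {
        List<String> out = new ArrayList<>();
        for (Doc d : page) {
            out.add(d.id);
        }
        return out;
    }
}
```

A caller would pass the last Doc of each page as the cursor for the next request. Because the cursor carries sort values rather than an offset, the cost of a page does not depend on how deep into the result set it is, which is the point of the proposal; the unique tie-breaker is what keeps documents with equal primary sort values from being skipped or repeated.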
[jira] [Commented] (LUCENE-5285) FastVectorHighlighter copies segments scores when splitting segments across multi-valued fields
[ https://issues.apache.org/jira/browse/LUCENE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834375#comment-13834375 ] Adrien Grand commented on LUCENE-5285: -- I think the patch is good. I'm just a bit confused by WeightedFragListBuilderTest, shouldn't the "=" in {{totalSubInfoBoost = subInfo.getBoost();}} be a "+=" actually? > FastVectorHighlighter copies segments scores when splitting segments across > multi-valued fields > --- > > Key: LUCENE-5285 > URL: https://issues.apache.org/jira/browse/LUCENE-5285 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nik Everett >Priority: Minor > Attachments: LUCENE-5285.patch > > > FastVectorHighlighter copies segments scores when splitting segments across > multi-valued fields. This is only a problem when you want to sort the > fragments by score. Technically BaseFragmentsBuilder (line 261 in my copy of > the source) does the copying. > Rather than copying the score I _think_ it'd be more right to pull that > copying logic into a protected method that child classes (such as > ScoreOrderFragmentsBuilder) can override to do more intelligent things. > Exactly what that means isn't clear to me at the moment.
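The difference Adrien is asking about can be seen in a tiny standalone Java sketch. It is not the test code from the patch: BoostSumSketch and both method names are made up for illustration, and it only shows what "=" and "+=" compute inside a loop, without claiming which one the test intends.

```java
class BoostSumSketch {
    /** Plain assignment in the loop: each iteration overwrites, so only the last boost survives. */
    static float withAssign(float[] boosts) {
        float total = 0f;
        for (float b : boosts) {
            total = b;
        }
        return total;
    }

    /** Compound assignment in the loop: the boosts are accumulated into a sum. */
    static float withPlusAssign(float[] boosts) {
        float total = 0f;
        for (float b : boosts) {
            total += b;
        }
        return total;
    }
}
```

For the boosts 1.0, 2.0 and 0.5 the first method yields 0.5 and the second 3.5, so a variable named like a total would normally be expected to use "+=".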
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834318#comment-13834318 ] Furkan KAMACI commented on SOLR-5332: - I just gave an example use case for that option. What I mean is: EdgeNGram could gain that option, or the option could be removed from WordDelimiter too; it depends on which is the better choice. Of course, WordDelimiter having the option does not mean other filters should have it too. However, the two have similar use cases, and WordDelimiter already has the option. On the other hand, this issue is a duplicate of another one, as I mentioned in my comment. Its description also has some problems, as I mentioned, so we should not rely on it directly as a use case. I implemented this wish for the community because some people need and want it (I do not use it in my current applications). It is up to us to decide whether to use it or not. > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4, 4.5, 4.5.1, 4.6 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first URL has "1" at the end, which is shorter than the allowed > min gram size. In the second URL the user name is longer than the max gram > size (27 characters). 
> It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that the "1" and "someveryandverylongusername" tokens are also added to the > index. > Best, > Alex
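The gram-size problem in the SOLR-5332 description can be reproduced with a simplified model of front edge n-grams. The sketch below is not Lucene's EdgeNGramTokenFilter; EdgeNGramSketch and its methods are hypothetical names, and the second method only illustrates what the requested "preserve original" behaviour might look like. It assumes the URL has already been split into tokens such as "1" and "someveryandverylongusername".

```java
import java.util.ArrayList;
import java.util.List;

class EdgeNGramSketch {
    /** Front edge n-grams: every prefix of the token whose length is in [minGram, maxGram]. */
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    /**
     * Hypothetical "preserve original" variant: also emits the whole token
     * when it is shorter than minGram or longer than maxGram, i.e. when no
     * generated gram is equal to it.
     */
    static List<String> edgeNGramsPreservingOriginal(String token, int minGram, int maxGram) {
        List<String> grams = edgeNGrams(token, minGram, maxGram);
        if (token.length() < minGram || token.length() > maxGram) {
            grams.add(token);
        }
        return grams;
    }
}
```

With min 2 and max 25, the token "1" yields no grams at all, and the 27-character user name yields prefixes only up to its first 25 characters ("someveryandverylonguserna"), which is why the full strings from the report cannot match. The preserving variant adds the original token in exactly those two cases.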
[jira] [Commented] (SOLR-5504) We need better testing for SolrCmdDistributor retry logic.
[ https://issues.apache.org/jira/browse/SOLR-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834313#comment-13834313 ] ASF subversion and git services commented on SOLR-5504: --- Commit 1546225 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1546225 ] SOLR-5504: Windows can throw a ConnectException when Linux throws a SocketException - also add a bit more testing. > We need better testing for SolrCmdDistributor retry logic. > -- > > Key: SOLR-5504 > URL: https://issues.apache.org/jira/browse/SOLR-5504 > Project: Solr > Issue Type: Test >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: SOLR-5504.patch
[jira] [Commented] (SOLR-5504) We need better testing for SolrCmdDistributor retry logic.
[ https://issues.apache.org/jira/browse/SOLR-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834306#comment-13834306 ] ASF subversion and git services commented on SOLR-5504: --- Commit 1546224 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1546224 ] SOLR-5504: Windows can throw a ConnectException when Linux throws a SocketException - also add a bit more testing. > We need better testing for SolrCmdDistributor retry logic. > -- > > Key: SOLR-5504 > URL: https://issues.apache.org/jira/browse/SOLR-5504 > Project: Solr > Issue Type: Test >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: SOLR-5504.patch
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834285#comment-13834285 ] Robert Muir commented on SOLR-5332: --- Just because WordDelimiterFilter has an option doesn't mean other filters should have it; it's hardly a model citizen. Probably even more reason to really think about what is happening and question if it's the right thing to do. For the use case described in the issue, a separate field suffices and is likely more flexible and just as efficient. I admit I don't fully understand what James is doing. I'm just saying I don't think our filters need options like "preserve" or "inject" because I see generally no value versus just using another field: it's typically just users who don't understand that the underlying cost in an inverted index is the same. > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4, 4.5, 4.5.1, 4.6 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first URL has "1" at the end, which is shorter than the allowed > min gram size. In the second URL the user name is longer than the max gram > size (27 characters). 
> It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that the "1" and "someveryandverylongusername" tokens are also added to the > index. > Best, > Alex
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834223#comment-13834223 ] Furkan KAMACI commented on SOLR-5332: - Actually, the same situation exists in WordDelimiterFilterFactory. It splits words into new ones but still has a preserveOriginal capability too. > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4, 4.5, 4.5.1, 4.6 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first URL has "1" at the end, which is shorter than the allowed > min gram size. In the second URL the user name is longer than the max gram > size (27 characters). > It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that the "1" and "someveryandverylongusername" tokens are also added to the > index. > Best, > Alex
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834141#comment-13834141 ] James Dyer commented on SOLR-5332: -- There is, if a user enters 2 keywords and one matches an edge n-gram while the other matches an original keyword. Our case involves book contributors. If a book has 2 contributors, John Smith & Edward Jones, we want the user to get a result if they query "edward jones" or "e jones" or "ed jones", but not "edward smith" nor "e smith", etc. The only solution I could come up with involved a combination of edge n-grams and the original keywords in the same field. I think there are valid use cases for this, though perhaps not very many. > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4, 4.5, 4.5.1, 4.6 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first URL has "1" at the end, which is shorter than the allowed > min gram size. In the second URL the user name is longer than the max gram > size (27 characters). > It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that the "1" and "someveryandverylongusername" tokens are also added to the > index. 
> Best, > Alex
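The matching behaviour James describes for book contributors can be modelled in a few lines of Java. This is a behavioural sketch only, under several assumptions: it is not how his modified filter or a Lucene phrase query is implemented, the names ContributorMatchSketch, matchesValue and matchesAny are hypothetical, prefix matching stands in for edge n-grams with a minimum gram size of 1, and checking one contributor value at a time stands in for the phrase-with-slop trick that keeps matches from spanning values.

```java
import java.util.List;
import java.util.Locale;

class ContributorMatchSketch {
    /**
     * True when the query terms match consecutive words of a single value,
     * each term being either a prefix of the word (edge n-gram match) or the
     * whole word (original keyword match).
     */
    static boolean matchesValue(String value, String query) {
        String[] words = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int start = 0; start + terms.length <= words.length; start++) {
            boolean all = true;
            for (int i = 0; i < terms.length; i++) {
                if (!words[start + i].startsWith(terms[i])) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    /** A multi-valued field matches when any single value matches; terms never span two values. */
    static boolean matchesAny(List<String> values, String query) {
        for (String value : values) {
            if (matchesValue(value, query)) {
                return true;
            }
        }
        return false;
    }
}
```

For the contributors "John Smith" and "Edward Jones", the queries "edward jones", "e jones" and "ed jones" match, while "edward smith" and "e smith" do not, because their terms would have to come from two different values.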
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834135#comment-13834135 ] Robert Muir commented on SOLR-5332: --- James, but the issue is still the same. There are no savings from doing this in the same field! So to me it's clearer to query on foo_exact:whatever if you want an exact match versus doing it in a roundabout way with a sloppy phrase query. > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4, 4.5, 4.5.1, 4.6 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first URL has "1" at the end, which is shorter than the allowed > min gram size. In the second URL the user name is longer than the max gram > size (27 characters). > It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that the "1" and "someveryandverylongusername" tokens are also added to the > index. > Best, > Alex
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834125#comment-13834125 ] ASF subversion and git services commented on SOLR-5509: --- Commit 1546190 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1546190 ] SOLR-5509: fix possible npe > ChaosMonkeyNothingIsSafeTest rare fail. > --- > > Key: SOLR-5509 > URL: https://issues.apache.org/jira/browse/SOLR-5509 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: cmns-test-cloud-off-by1-control-2.log > > > {noformat} >[junit4] 2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {add=[50086 (1452880907553734656)]} 0 142 >[junit4] 2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 > (1452880908206997504)]} 0 254 >[junit4] 2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {delete=[50086 (-1452880908537298944)]} 0 2 >[junit4] 2> 42327 T131 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {delete=[50086 (-1452880908542541824)]} 0 1 >[junit4] 2> 42622 T132 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update > params={update.distrib=TOLEADER&wt=javabin&version=2} {add=[50086 > (1452880908850823168)]} 0 1 >[junit4] 2> 42623 T48 C22 P57136 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {add=[50086]} 0 1223 >[junit4] 2> ## Only in cloudDocList: [{id=50086}] >[junit4] 2> cloudClient > :{numFound=1,start=0,docs=[SolrDocument{id=50086, > _version_=1452880908850823168}]} > h > {noformat} -- This message was sent by 
Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834124#comment-13834124 ] ASF subversion and git services commented on SOLR-5509: --- Commit 1546189 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1546189 ] SOLR-5509: fix possible npe > ChaosMonkeyNothingIsSafeTest rare fail. > --- > > Key: SOLR-5509 > URL: https://issues.apache.org/jira/browse/SOLR-5509 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: cmns-test-cloud-off-by1-control-2.log > > > {noformat} >[junit4] 2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {add=[50086 (1452880907553734656)]} 0 142 >[junit4] 2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 > (1452880908206997504)]} 0 254 >[junit4] 2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {delete=[50086 (-1452880908537298944)]} 0 2 >[junit4] 2> 42327 T131 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {delete=[50086 (-1452880908542541824)]} 0 1 >[junit4] 2> 42622 T132 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update > params={update.distrib=TOLEADER&wt=javabin&version=2} {add=[50086 > (1452880908850823168)]} 0 1 >[junit4] 2> 42623 T48 C22 P57136 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {add=[50086]} 0 1223 >[junit4] 2> ## Only in cloudDocList: [{id=50086}] >[junit4] 2> cloudClient > :{numFound=1,start=0,docs=[SolrDocument{id=50086, > _version_=1452880908850823168}]} > h > {noformat} -- This message was sent by Atlassian JIRA 
(v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834122#comment-13834122 ] ASF subversion and git services commented on SOLR-5509: --- Commit 1546186 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1546186 ] SOLR-5509: better debug logging > ChaosMonkeyNothingIsSafeTest rare fail. > --- > > Key: SOLR-5509 > URL: https://issues.apache.org/jira/browse/SOLR-5509 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: cmns-test-cloud-off-by1-control-2.log > > > {noformat} >[junit4] 2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {add=[50086 (1452880907553734656)]} 0 142 >[junit4] 2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 > (1452880908206997504)]} 0 254 >[junit4] 2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {delete=[50086 (-1452880908537298944)]} 0 2 >[junit4] 2> 42327 T131 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {delete=[50086 (-1452880908542541824)]} 0 1 >[junit4] 2> 42622 T132 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update > params={update.distrib=TOLEADER&wt=javabin&version=2} {add=[50086 > (1452880908850823168)]} 0 1 >[junit4] 2> 42623 T48 C22 P57136 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {add=[50086]} 0 1223 >[junit4] 2> ## Only in cloudDocList: [{id=50086}] >[junit4] 2> cloudClient > :{numFound=1,start=0,docs=[SolrDocument{id=50086, > _version_=1452880908850823168}]} > h > {noformat} -- This message was sent by Atlassian JIRA 
(v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834123#comment-13834123 ] ASF subversion and git services commented on SOLR-5509: --- Commit 1546187 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1546187 ] SOLR-5509: better debug logging > ChaosMonkeyNothingIsSafeTest rare fail. > --- > > Key: SOLR-5509 > URL: https://issues.apache.org/jira/browse/SOLR-5509 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: cmns-test-cloud-off-by1-control-2.log > > > {noformat} >[junit4] 2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {add=[50086 (1452880907553734656)]} 0 142 >[junit4] 2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 > (1452880908206997504)]} 0 254 >[junit4] 2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {delete=[50086 (-1452880908537298944)]} 0 2 >[junit4] 2> 42327 T131 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {delete=[50086 (-1452880908542541824)]} 0 1 >[junit4] 2> 42622 T132 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update > params={update.distrib=TOLEADER&wt=javabin&version=2} {add=[50086 > (1452880908850823168)]} 0 1 >[junit4] 2> 42623 T48 C22 P57136 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} > {add=[50086]} 0 1223 >[junit4] 2> ## Only in cloudDocList: [{id=50086}] >[junit4] 2> cloudClient > :{numFound=1,start=0,docs=[SolrDocument{id=50086, > _version_=1452880908850823168}]} > h > {noformat} -- This message was sent by 
Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1690) JSONKeyValueTokenizerFactory -- JSON Tokenizer
[ https://issues.apache.org/jira/browse/SOLR-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834119#comment-13834119 ] Prashant Saraswat commented on SOLR-1690: - @Ryan McKinley: Many thanks for attaching the patch here. It is most useful. @Hoss Man: Consider this use case. Take your favorite ecommerce site (say newegg.com, ebay.com etc). Notice that they have some kind of category hierarchy. Each category has category attributes (say Brand) with category-sensitive possible values (Apple/Samsung for cell phones and Sharp/Samsung for HDTVs). In these cases the number of category-specific attributes is in the tens of thousands. It is not realistically possible to create a schema in which every category-specific attribute is mapped to a Solr field. However, you can store the category-specific attributes per category as a JSON string. Now, you do need to filter by category-specific attributes. Say you are searching for HDTVs and you only want to see those manufactured by Samsung. As is, Solr will not allow you to search in a field which looks like this: {"name":"Brand", "value":"Samsung"}. Something like fq=categoryattribute:"name":"brand","value":"samsung" (properly escaped) doesn't work. Enter the awesome jsontokenizer written by Ryan McKinley. This allows the same field to be indexed as JSON, and something like fq=categoryattribute:"name:brand" AND categoryattribute:"value:Samsung" works. Happy to provide more information if needed. Also happy to take the slap if I'm missing something obvious here. > JSONKeyValueTokenizerFactory -- JSON Tokenizer > -- > > Key: SOLR-1690 > URL: https://issues.apache.org/jira/browse/SOLR-1690 > Project: Solr > Issue Type: New Feature > Components: Schema and Analysis >Reporter: Ryan McKinley >Priority: Minor > Attachments: SOLR-1690-JSONKeyValueTokenizerFactory.patch, > noggit-1.0-A1.jar > > > Sometimes it is nice to group structured data into a single field. 
> This (rough) patch, takes JSON input and indexes tokens based on the key > values pairs in the json. > {code:xml|title=schema.xml} > > omitNorms="true"> > > hierarchicalKey="false"/> > > > > > > > > > > {code} > Given text: > {code} > { "hello": "world", "rank":5 } > {code} > indexed as two tokens: > || term position |1 | 2 | > || term text |hello:world | rank:5 | > || term type |word | word | > || source start,end | 12,17 | 27,28 | -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
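The key/value tokenization described in SOLR-1690 (one "key:value" token per JSON pair) can be approximated with a short Java sketch. It is not the code from the attached patch: the class JsonKeyValueTokensSketch and its tokens method are hypothetical, and to stay self-contained it takes an already parsed, flat map instead of parsing JSON text as the patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class JsonKeyValueTokensSketch {
    /**
     * Emits one "key:value" token per entry, in the map's iteration order.
     * Assumes a flat object; nested objects and arrays are out of scope here.
     */
    static List<String> tokens(Map<String, Object> fields) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            out.add(entry.getKey() + ":" + entry.getValue());
        }
        return out;
    }
}
```

Given an insertion-ordered map equivalent to { "hello": "world", "rank":5 }, this produces the two tokens hello:world and rank:5, matching the table in the issue description. A filter on a single attribute then becomes a plain term lookup on one of these tokens.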
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834109#comment-13834109 ] Mark Miller commented on SOLR-5509: --- Better retry logging: {noformat} [junit4] 2> 67370 T27 C26 P44590 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={CONTROL=TRUE&wt=javabin&version=2} {add=[50060 (1452887946460921856)]} 0 172 [junit4] 2> 67695 T80 C29 P36244 oasc.SolrException.log ERROR forwarding update to http://127.0.0.1:50810/collection1/ failed - retrying ... retries: 1 add{,id=50060}:org.apache.solr.common.SolrException: Can not find: /collection1/update [junit4] 2> 68200 T80 C29 P36244 oasc.SolrException.log ERROR forwarding update to http://127.0.0.1:50810/collection1/ failed - retrying ... retries: 2 add{,id=50060}:org.apache.solr.common.SolrException: Can not find: /collection1/update [junit4] 2> 69877 T112 C30 P40511 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:36244/collection1/&wt=javabin&version=2&update.distrib=FROMLEADER} {add=[50060 (1452887949111721984)]} 0 3 [junit4] 2> 69878 T80 C29 P36244 oasc.SolrException.log ERROR forwarding update to http://127.0.0.1:50810/collection1/ failed - retrying ... 
retries: 3 add{,id=50060}:org.apache.http.conn.HttpHostConnectException: Connection to http://127.0.0.1:50810 refused [junit4] 2> 69978 T152 C28 P34789 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:36244/collection1/&wt=javabin&version=2&update.distrib=FROMLEADER} {add=[50060 (1452887949111721984)]} 0 104 [junit4] 2> 69978 T81 C29 P36244 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50060 (1452887949111721984)]} 0 1610 [junit4] 2> 70106 T26 C26 P44590 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={CONTROL=TRUE&wt=javabin&version=2} {delete=[50060 (-1452887949358137344)]} 0 1 [junit4] 2> 70112 T151 C28 P34789 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:36244/collection1/&wt=javabin&version=2&update.distrib=FROMLEADER} {delete=[50060 (-1452887949361283072)]} 0 1 [junit4] 2> 70112 T114 C30 P40511 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:36244/collection1/&wt=javabin&version=2&update.distrib=FROMLEADER} {delete=[50060 (-1452887949361283072)]} 0 1 [junit4] 2> 70113 T81 C29 P36244 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {delete=[50060 (-1452887949361283072)]} 0 5 [junit4] 2> 70386 T116 C30 P40511 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:36244/collection1/&wt=javabin&version=2&update.distrib=FROMLEADER} {add=[50060 (1452887949646495744)]} 0 2 [junit4] 2> 70386 T150 C28 P34789 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:36244/collection1/&wt=javabin&version=2&update.distrib=FROMLEADER} {add=[50060 (1452887949646495744)]} 0 2 [junit4] 2> 70387 T78 C29 P36244 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update 
params={wt=javabin&version=2&update.distrib=TOLEADER} {add=[50060 (1452887949646495744)]} 0 7 [junit4] 2> 70388 T80 C29 P36244 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50060]} 0 3001 [junit4] 2> ## Only in cloudDocList: [{id=50060}] [junit4] 2>cloudClient :{numFound=1,start=0,docs=[SolrDocument{id=50060, _version_=1452887949646495744}]} {noformat} > ChaosMonkeyNothingIsSafeTest rare fail. > --- > > Key: SOLR-5509 > URL: https://issues.apache.org/jira/browse/SOLR-5509 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: cmns-test-cloud-off-by1-control-2.log > > > {noformat} >[junit4] 2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {add=[50086 (1452880907553734656)]} 0 142 >[junit4] 2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 > (1452880908206997504)]} 0 254 >[junit4] 2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish > [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} > {delete=[50086 (-1452880908537298944)]} 0 2 >[junit4] 2> 42327 T131 C27 P60411 oasup.LogUpda
[jira] [Updated] (SOLR-5505) LoggingInfoStream not usabe in a multi-core setup
[ https://issues.apache.org/jira/browse/SOLR-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shikhar Bhushan updated SOLR-5505: -- Attachment: SOLR-5505.patch Attaching patch against trunk. It does something different from what I proposed earlier: a) The {{LoggingInfoStream}} constructor takes the slf4j {{Logger}} instance to be used as a constructor param. b) {{SolrIndexConfig}} checks if there is a "loggerName" configuration attribute on the "infoStream" tag, and if so this is used as the name for the {{Logger}}. Otherwise, the previous default of the {{LoggingInfoStream}} class name is used. This will enable users to manage the log output using their logging subsystem, e.g. the formatting pattern, which log file to write to, etc. c) Additionally, I removed logging of the thread name from within {{LoggingInfoStream}}, since this is commonly configured at the level of the formatting pattern for a logger. > LoggingInfoStream not usabe in a multi-core setup > - > > Key: SOLR-5505 > URL: https://issues.apache.org/jira/browse/SOLR-5505 > Project: Solr > Issue Type: Bug >Affects Versions: 4.6 >Reporter: Shikhar Bhushan > Attachments: SOLR-5505.patch > > > {{LoggingInfoStream}} that was introduced in SOLR-4977 does not log any core > context. > Previously this was possible by encoding this into the infoStream's file path. > This means in a multi-core setup it is very hard to distinguish between the > infoStream messages for different cores. > {{LoggingInfoStream}} should be automatically configured to prepend the core > name to log messages.
[jira] [Created] (SOLR-5511) Provide a more customizable explain
Jamie Johnson created SOLR-5511:
Summary: Provide a more customizable explain
Key: SOLR-5511
URL: https://issues.apache.org/jira/browse/SOLR-5511
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Affects Versions: 4.5.1
Reporter: Jamie Johnson
It would be great to be able to choose which items are returned when using explain. For instance, there are cases where tf is needed but fieldnorm, idf, etc. are not. I'm not sure whether this requires more or less processing, but it could certainly save some on the transfer and make the client's life easier by returning only the items they are interested in.
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834074#comment-13834074 ]
James Dyer commented on SOLR-5332:
We have a use case where we use a modified version of EdgeNGramFilter to "preserve the original". The field we used this on is multi-valued. We change all user queries against the field to phrases with slop to prevent partial matches across values. But our users also want to be able to enter sub-strings on this field. (Because all queries are phrase queries, wildcards are not an option.) So had this functionality existed, we would have been spared having to implement it ourselves. (I didn't contribute the code because I couldn't imagine it had broad applicability. But it seems that with this issue, at least a few others out there have cases for it as well.)
> Add "preserve original" setting to the EdgeNGramFilterFactory
> Key: SOLR-5332
> URL: https://issues.apache.org/jira/browse/SOLR-5332
> Project: Solr
> Issue Type: Wish
> Affects Versions: 4.4, 4.5, 4.5.1, 4.6
> Reporter: Alexander S.
>
> Hi, as described here:
> http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
> the problem is that if you have these 2 strings to index:
> 1. facebook.com/someuser.1
> 2. facebook.com/someveryandverylongusername
> and the edge ngram filter factory with min and max gram size settings 2 and 25, search requests for these urls will fail.
> But search requests for:
> 1. facebook.com/someuser
> 2. facebook.com/someveryandverylonguserna
> will work properly.
> It's because the first url has "1" at the end, which is shorter than the allowed min gram size. In the second url the user name is longer than the max gram size (27 characters).
> Would be good to have a "preserve original" option that will add the original string to the index if it does not fit the allowed gram size, so that the "1" and "someveryandverylongusername" tokens will also be added to the index.
> Best,
> Alex
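The behaviour requested above can be illustrated with a small standalone sketch. It is not the Lucene `EdgeNGramFilter` and not James Dyer's modified filter; `edgeNGrams` and its `preserveOriginal` flag are hypothetical names, and the sketch assumes the text has already been split into the tokens "1" and "someveryandverylongusername" named in the issue.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSketch {

    /**
     * Returns the leading-edge n-grams of token with lengths minGram..maxGram.
     * If preserveOriginal is true and the token's own length falls outside
     * that range, the whole token is added as well.
     */
    public static List<String> edgeNGrams(String token, int minGram, int maxGram,
                                          boolean preserveOriginal) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        boolean outsideRange = token.length() < minGram || token.length() > maxGram;
        if (preserveOriginal && outsideRange) {
            grams.add(token);
        }
        return grams;
    }

    public static void main(String[] args) {
        String longName = "someveryandverylongusername"; // 27 characters
        System.out.println(edgeNGrams("1", 2, 25, false));                         // []
        System.out.println(edgeNGrams("1", 2, 25, true));                          // [1]
        System.out.println(edgeNGrams(longName, 2, 25, false).contains(longName)); // false
        System.out.println(edgeNGrams(longName, 2, 25, true).contains(longName));  // true
    }
}
```

With min 2 and max 25, the token "1" produces no grams at all and the 27-character user name is cut off at 25 characters, which is why an exact search for either fails unless the original token is kept.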
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834068#comment-13834068 ] ASF subversion and git services commented on LUCENE-5339: - Commit 1546167 from [~mikemccand] in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1546167 ] LUCENE-5339: nocommits > Simplify the facet module APIs > -- > > Key: LUCENE-5339 > URL: https://issues.apache.org/jira/browse/LUCENE-5339 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-5339.patch, LUCENE-5339.patch > > > I'd like to explore simplifications to the facet module's APIs: I > think the current APIs are complex, and the addition of a new feature > (sparse faceting, LUCENE-5333) threatens to add even more classes > (e.g., FacetRequestBuilder). I think we can do better. > So, I've been prototyping some drastic changes; this is very > early/exploratory and I'm not sure where it'll wind up but I think the > new approach shows promise. > The big changes are: > * Instead of *FacetRequest/Params/Result, you directly instantiate > the classes that do facet counting (currently TaxonomyFacetCounts, > RangeFacetCounts or SortedSetDVFacetCounts), passing in the > SimpleFacetsCollector, and then you interact with those classes to > pull labels + values (topN under a path, sparse, specific labels). > * At index time, no more FacetIndexingParams/CategoryListParams; > instead, you make a new SimpleFacetFields and pass it the field it > should store facets + drill downs under. If you want more than > one CLI you create more than one instance of SimpleFacetFields. > * I added a simple schema, where you state which dimensions are > hierarchical or multi-valued. From this we decide how to index > the ordinals (no more OrdinalPolicy). > Sparse faceting is just another method (getAllDims), on both taxonomy > & ssdv facet classes. 
> I haven't created a common base class / interface for all of the > search-time facet classes, but I think this may be possible/clean, and > perhaps useful for drill sideways. > All the new classes are under oal.facet.simple.*. > Lots of things that don't work yet: drill sideways, complements, > associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834063#comment-13834063 ]
Mark Miller commented on SOLR-5509:
Who sent you version 1452880908850823168?
> ChaosMonkeyNothingIsSafeTest rare fail.
> Key: SOLR-5509
> URL: https://issues.apache.org/jira/browse/SOLR-5509
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Mark Miller
> Assignee: Mark Miller
> Fix For: 5.0, 4.7
> Attachments: cmns-test-cloud-off-by1-control-2.log
>
> {noformat}
> [junit4] 2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} {add=[50086 (1452880907553734656)]} 0 142
> [junit4] 2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 (1452880908206997504)]} 0 254
> [junit4] 2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} {delete=[50086 (-1452880908537298944)]} 0 2
> [junit4] 2> 42327 T131 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {delete=[50086 (-1452880908542541824)]} 0 1
> [junit4] 2> 42622 T132 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={update.distrib=TOLEADER&wt=javabin&version=2} {add=[50086 (1452880908850823168)]} 0 1
> [junit4] 2> 42623 T48 C22 P57136 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086]} 0 1223
> [junit4] 2> ## Only in cloudDocList: [{id=50086}]
> [junit4] 2> cloudClient :{numFound=1,start=0,docs=[SolrDocument{id=50086, _version_=1452880908850823168}]}
> h
> {noformat}
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834061#comment-13834061 ] ASF subversion and git services commented on SOLR-5509: Commit 1546166 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1546166 ] SOLR-5509: Beef up retry logging
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834055#comment-13834055 ] ASF subversion and git services commented on SOLR-5509: Commit 1546164 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1546164 ] SOLR-5509: Beef up retry logging
[jira] [Commented] (SOLR-5510) genericCoreNodeNames="${genericCoreNodeNames:false}" and old style solr.xml fails to create collection
[ https://issues.apache.org/jira/browse/SOLR-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834052#comment-13834052 ]
Mark Miller commented on SOLR-5510:
coreNodeName is different from genericCoreNodeNames; it is important because it lets you start up a new node and have it replace a node that died.
> genericCoreNodeNames="${genericCoreNodeNames:false}" and old style solr.xml fails to create collection
> Key: SOLR-5510
> URL: https://issues.apache.org/jira/browse/SOLR-5510
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.6
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Minor
>
> See this for some more details: https://gist.github.com/serba/1fe113e78ae7e01a4f58
> This is a regression caused by SOLR-5311.
> There are two reasons why a core does not have a reference in clusterstate:
> # It is starting up for the first time (core creation)
> # Somebody invoked a DELETEREPLICA when the node itself was down
> We needed to differentiate these two because for #1 the registration should succeed and for #2 the registration should fail.
> The only way to do that was to check for the presence of the attribute coreNodeName in core.properties. In case #1 it would be absent and in case #2 it would be present.
> But when genericCoreNodeNames="${genericCoreNodeNames:false}", ZkController#getCoreNodeName(getCoreNodeName) behaves similarly in both cases, hence the failure.
[jira] [Commented] (SOLR-5510) genericCoreNodeNames="${genericCoreNodeNames:false}" and old style solr.xml fails to create collection
[ https://issues.apache.org/jira/browse/SOLR-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834048#comment-13834048 ] Mark Miller commented on SOLR-5510: The genericCoreNodeNames option is there for back compatibility with those who have core names based on IP addresses from the first few SolrCloud releases. It doesn't 'create' coreNodeNames - it's an option to get names like node_1, node_2, rather than 127.0.0.1__solr, 127.0.0.2_8881_solr, etc.
[jira] [Commented] (SOLR-5510) genericCoreNodeNames="${genericCoreNodeNames:false}" and old style solr.xml fails to create collection
[ https://issues.apache.org/jira/browse/SOLR-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834032#comment-13834032 ] Noble Paul commented on SOLR-5510: I feel the solution would be to eliminate the genericCoreNodeNames option. There should be only one way to create a coreNodeName, and that should be from the Overseer.
[jira] [Updated] (SOLR-5510) genericCoreNodeNames="${genericCoreNodeNames:false}" and old style solr.xml fails to create collection
[ https://issues.apache.org/jira/browse/SOLR-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5510: Summary: genericCoreNodeNames="${genericCoreNodeNames:false}" and old style solr.xml fails to create collection (was: genericCoreNodeNames="${genericCoreNodeNames:true}" and old style solr.xml fails to create collection)
[jira] [Created] (SOLR-5510) genericCoreNodeNames="${genericCoreNodeNames:true}" and old style solr.xml fails to create collection
Noble Paul created SOLR-5510:
Summary: genericCoreNodeNames="${genericCoreNodeNames:true}" and old style solr.xml fails to create collection
Key: SOLR-5510
URL: https://issues.apache.org/jira/browse/SOLR-5510
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
See this for some more details: https://gist.github.com/serba/1fe113e78ae7e01a4f58
This is a regression caused by SOLR-5311.
There are two reasons why a core does not have a reference in clusterstate:
# It is starting up for the first time (core creation)
# Somebody invoked a DELETEREPLICA when the node itself was down
We needed to differentiate these two because for #1 the registration should succeed and for #2 the registration should fail.
The only way to do that was to check for the presence of the attribute coreNodeName in core.properties. In case #1 it would be absent and in case #2 it would be present.
But when genericCoreNodeNames="${genericCoreNodeNames:false}", ZkController#getCoreNodeName(getCoreNodeName) behaves similarly in both cases, hence the failure.
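The decision described in this report can be sketched in a few lines. This is an illustration only, not the code in ZkController: `PreRegisterSketch`, `decide` and `Decision` are hypothetical names, a plain `java.util.Properties` object stands in for core.properties, and "core_node1" is a made-up value.

```java
import java.util.Properties;

public class PreRegisterSketch {

    /** Hypothetical outcome of the registration check. */
    public enum Decision { REGISTER, REJECT_REMOVED }

    /**
     * Decides what to do with a core at registration time.
     * coreProps stands in for the contents of core.properties.
     */
    public static Decision decide(Properties coreProps, boolean replicaInClusterState) {
        if (replicaInClusterState) {
            return Decision.REGISTER; // the cluster state already knows this replica
        }
        boolean hasCoreNodeName = coreProps.getProperty("coreNodeName") != null;
        // Absent: first startup (core creation), so registration should succeed.
        // Present: the replica was deleted while the node was down, so it should fail.
        return hasCoreNodeName ? Decision.REJECT_REMOVED : Decision.REGISTER;
    }

    public static void main(String[] args) {
        Properties fresh = new Properties();
        Properties deletedWhileDown = new Properties();
        deletedWhileDown.setProperty("coreNodeName", "core_node1");

        System.out.println(decide(fresh, false));            // REGISTER
        System.out.println(decide(deletedWhileDown, false)); // REJECT_REMOVED
    }
}
```

The check only works while the presence of coreNodeName really separates the two cases. If a name is produced for a brand-new core as well, which is what the report says happens with genericCoreNodeNames set to false, both cases look the same and the new core is wrongly rejected.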
[jira] [Updated] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-5509: Attachment: cmns-test-cloud-off-by1-control-2.log
[jira] [Commented] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
[ https://issues.apache.org/jira/browse/SOLR-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834021#comment-13834021 ] Mark Miller commented on SOLR-5509: This involves an add, a delete, and then an add. In every fail I've seen, an add may actually be an update - any id over 5 might also be added by another thread. The first add hits the control and the cloud shard. The delete does the same. Then a TOLEADER add pops back up - nothing went to control, so this came out of nowhere.
[jira] [Created] (SOLR-5509) ChaosMonkeyNothingIsSafeTest rare fail.
Mark Miller created SOLR-5509: - Summary: ChaosMonkeyNothingIsSafeTest rare fail. Key: SOLR-5509 URL: https://issues.apache.org/jira/browse/SOLR-5509 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, 4.7 {noformat} [junit4] 2> 41386 T28 C21 P57194 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} {add=[50086 (1452880907553734656)]} 0 142 [junit4] 2> 42009 T133 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086 (1452880908206997504)]} 0 254 [junit4] 2> 42323 T27 C21 P57194 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&CONTROL=TRUE&version=2} {delete=[50086 (-1452880908537298944)]} 0 2 [junit4] 2> 42327 T131 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {delete=[50086 (-1452880908542541824)]} 0 1 [junit4] 2> 42622 T132 C27 P60411 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={update.distrib=TOLEADER&wt=javabin&version=2} {add=[50086 (1452880908850823168)]} 0 1 [junit4] 2> 42623 T48 C22 P57136 oasup.LogUpdateProcessor.finish [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[50086]} 0 1223 [junit4] 2> ## Only in cloudDocList: [{id=50086}] [junit4] 2>cloudClient :{numFound=1,start=0,docs=[SolrDocument{id=50086, _version_=1452880908850823168}]} h {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833990#comment-13833990 ]
Dawid Weiss commented on LUCENE-5212:
We inspected the code with Robert Muir and came to the conclusion that this bug may also affect SSE machines. Since hotspot code is very complex I also contacted Vladimir and he kindly replied that yes, this is the case.
{quote}
And you are correct the problem also affects SSE for which we can generate 16 bytes vectors which are larger than 8 bytes stack frame alignment we have. Actually in the bug there were only 16 bytes vectors max. I think it may be related to register masks we use on AVX machine, they are larger to be able to map 32 bytes vectors. But again with SSE and 16 bytes vectors you can hit the same 8024830 problem but maybe much less frequently.
{quote}
> java 7u40 causes sigsegv and corrupt term vectors
> Key: LUCENE-5212
> URL: https://issues.apache.org/jira/browse/LUCENE-5212
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: crashFaster.patch, crashFaster2.0.patch, hs_err_pid32714.log, jenkins.txt
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833975#comment-13833975 ] ASF subversion and git services commented on LUCENE-5339: Commit 1546129 from [~mikemccand] in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1546129 ] LUCENE-5339: address some nocommits
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833923#comment-13833923 ] ASF subversion and git services commented on LUCENE-5339: Commit 1546097 from [~mikemccand] in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1546097 ] LUCENE-5339: factor out base classes for int/float taxonomy aggregates
Re: Similarity - No Match
The decimal similarity gets translated into a number of characters, based on the term length. For a four-character term it will be 1, 2, 3, or 4, which correspond to 0.25, 0.50, 0.75, or 1.00. Your 0.6 is getting rounded up to 0.75, which means three-quarters, or three out of four characters, must match. With 0.5, only two out of four characters must match. (Note: This is not a precise description of fuzzy matching, but close enough to explain the issue here.) Also, decimal similarity for fuzzy query is deprecated in favor of specifying the edit distance, so you should be using ~1 or ~2; only 0, 1, and 2 are supported. -- Jack Krupansky From: Fabian Vigna Sent: Wednesday, November 27, 2013 9:55 AM To: dev@lucene.apache.org Subject: Similarity - No Match Hello everybody, The case I have pending pertains to BANK REFAH. If you enter BANK REFHA inverting the last two letters, it does not find a match with Similarity 6. It does find it with similarity 5. (REFHA~0.6 AND BANK~0.6) My question is: Why just inverting the last 2 letters it does not find a match? Thanks! Fabian
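The threshold effect described in this reply can be illustrated with a small sketch. It is not Lucene's code. It assumes plain Levenshtein distance (so swapping two adjacent letters costs two edits), a similarity of 1 - distance / term length, and a match only when that similarity is strictly greater than the requested value. The class and method names are made up for illustration, and percentages are used instead of decimals to keep the arithmetic exact.

```java
final class FuzzyThresholdSketch {

    /** Plain Levenshtein distance: a swap of two adjacent letters counts as two edits. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /**
     * Assumed rule (not taken from Lucene): similarity = 1 - distance / length,
     * and the term matches only if similarity is strictly above the threshold.
     * Integer percentages avoid floating-point ties at exactly 0.6.
     */
    static boolean matches(String queryTerm, String indexedTerm, int minSimilarityPercent) {
        int length = Math.min(queryTerm.length(), indexedTerm.length());
        int distance = editDistance(queryTerm, indexedTerm);
        return (length - distance) * 100 > minSimilarityPercent * length;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("REFHA", "REFAH")); // 2
        System.out.println(matches("REFHA", "REFAH", 60));  // false
        System.out.println(matches("REFHA", "REFAH", 50));  // true
    }
}
```

Under these assumptions REFHA sits exactly on the 0.6 boundary (3 of 5 characters unchanged), so it is rejected at 0.6 and accepted at 0.5. Specifying an edit distance (~1 or ~2) sidesteps the rounding question entirely.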
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833874#comment-13833874 ] Erick Erickson commented on SOLR-5488: -- I noticed that the Facets testing was also using the indexOf operations, so I fixed that up. I also changed all the assert statements to dump the raw response on failure, so we can see the actual data we're working with. At least I think I got them all; there are a _lot_ of them, which is good. We'll see how this bakes. > Fix up test failures for Analytics Component > > > Key: SOLR-5488 > URL: https://issues.apache.org/jira/browse/SOLR-5488 > Project: Solr > Issue Type: Bug >Affects Versions: 5.0, 4.7 >Reporter: Erick Erickson >Assignee: Erick Erickson > Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch > > > The analytics component has a few test failures, perhaps > environment-dependent. This is just to collect the test fixes in one place > for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833869#comment-13833869 ] ASF subversion and git services commented on SOLR-5488: --- Commit 1546074 from [~erickoerickson] in branch 'dev/trunk' [ https://svn.apache.org/r1546074 ] SOLR-5488: Changing Facets testing to use DOM rather than string operations > Fix up test failures for Analytics Component > > > Key: SOLR-5488 > URL: https://issues.apache.org/jira/browse/SOLR-5488 > Project: Solr > Issue Type: Bug >Affects Versions: 5.0, 4.7 >Reporter: Erick Erickson >Assignee: Erick Erickson > Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch > > > The analytics component has a few test failures, perhaps > environment-dependent. This is just to collect the test fixes in one place > for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
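For readers who have not seen the patch, the idea of asserting through the DOM instead of through string operations, with the raw response included in the failure message, might look roughly like the sketch below. It uses only JDK classes. The response fragment, the XPath expressions and the class and method names are hypothetical and are not taken from the SOLR-5488 patch or from the real analytics output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomAssertSketch {

    // Hypothetical response fragment, for illustration only.
    static final String SAMPLE_RESPONSE =
            "<response><lst name=\"stats\">"
            + "<double name=\"sum\">12.5</double>"
            + "<double name=\"mean\">2.5</double>"
            + "</lst></response>";

    /** Parses the XML and evaluates the XPath as a number (NaN if nothing matches). */
    static double readNumber(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + xpath + " on: " + xml, e);
        }
    }

    /** Fails with the raw response in the message, so the actual data is visible. */
    static void assertNumber(String xml, String xpath, double expected) {
        double actual = readNumber(xml, xpath);
        // Written as !(… <= …) so that NaN (no matching node) also fails.
        if (!(Math.abs(actual - expected) <= 1e-9)) {
            throw new AssertionError("expected " + expected + " at " + xpath
                    + " but got " + actual + "; raw response: " + xml);
        }
    }

    public static void main(String[] args) {
        assertNumber(SAMPLE_RESPONSE, "//lst[@name='stats']/double[@name='mean']", 2.5);
        System.out.println("ok");
    }
}
```

Compared with indexOf on the response string, this does not depend on attribute order or whitespace, which is one plausible source of environment-dependent failures.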
Similarity - No Match
Hello everybody, The case I have pending pertains to BANK REFAH. If you enter BANK REFHA, inverting the last two letters, it does not find a match with similarity 0.6 (REFHA~0.6 AND BANK~0.6). It does find it with similarity 0.5. My question is: why does inverting just the last two letters prevent a match? Thanks! Fabian
[jira] [Updated] (SOLR-3702) String concatenation function
[ https://issues.apache.org/jira/browse/SOLR-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kudryavtsev updated SOLR-3702: - Attachment: SOLR-3702.patch Test added > String concatenation function > - > > Key: SOLR-3702 > URL: https://issues.apache.org/jira/browse/SOLR-3702 > Project: Solr > Issue Type: New Feature > Components: query parsers >Affects Versions: 4.0-ALPHA >Reporter: Ted Strauss > Attachments: SOLR-3702.patch, SOLR-3702.patch > > > Related to https://issues.apache.org/jira/browse/SOLR-2526 > Add query function to support concatenation of Strings. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833812#comment-13833812 ] Aaron Daubman commented on LUCENE-4100: --- Thanks for the update [~spo] - My particular use-case seems tailor made for this. I have several decently large Solr instances (10-30G indices), all of which run in read-only mode and are created ~2x a day via a snapshot process that rolls the index out to load-balanced servers. Several of these instances routinely match 30-80% (custom MLT-like queries) of the 2-25M docs in the index per-query, so efficient scoring would be a huge win here. I already have to patch and custom-build Solr for our use (until I get around to creating the required tests to have SOLR-2052 accepted) and am wondering if you have any thoughts/guidance on trying out your patch? The main use-case is from a custom extension of QueryComponent that overrides prepare() and essentially builds up a custom boosted boolean query and uses rb.setQueryString and rb.setFilters... > Maxscore - Efficient Scoring > > > Key: LUCENE-4100 > URL: https://issues.apache.org/jira/browse/LUCENE-4100 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/query/scoring, core/search >Affects Versions: 4.0-ALPHA >Reporter: Stefan Pohl > Labels: api-change, patch, performance > Fix For: 4.7 > > Attachments: contrib_maxscore.tgz, maxscore.patch > > > At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient > algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, > that I find deserves more attention among Lucene users (and developers). > I implemented a proof of concept and did some performance measurements with > example queries and lucenebench, the package of Mike McCandless, resulting in > very significant speedups. > This ticket is to get the discussion started on including the implementation > into Lucene's codebase. 
Because the technique requires awareness of it > from the Lucene user/developer, it seems best for it to become a contrib/module > package so that it can be chosen consciously. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833811#comment-13833811 ] Robert Muir commented on SOLR-5332: --- Why not just use another field? It's the same cost as this setting either way, except that it works today and we don't have to maintain it. Additionally, you keep more control: you can control boosting etc. across the different fields. > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4, 4.5, 4.5.1, 4.6 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first url has "1" at the end, which is shorter than the allowed > min gram size. In the second url the user name is longer than the max gram > size (27 characters). > It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that "1" and "someveryandverylongusername" tokens will also be added to the > index. > Best, > Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
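The behaviour reported in the issue can be reproduced with a few lines of plain Java. This is a simplified model of edge n-gram generation, not the EdgeNGramFilterFactory code, and the "preserve original" variant is only a guess at what the requested option would do. All class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeNGramSketch {

    /** Prefixes of the token with lengths minGram..maxGram (simplified model). */
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    /** Hypothetical "preserve original": also emit tokens that fall outside the gram sizes. */
    static List<String> edgeNGramsPreservingOriginal(String token, int minGram, int maxGram) {
        List<String> grams = edgeNGrams(token, minGram, maxGram);
        if (token.length() < minGram || token.length() > maxGram) {
            grams.add(token);
        }
        return grams;
    }

    public static void main(String[] args) {
        // "1" is shorter than minGram, so no gram is produced at all.
        System.out.println(edgeNGrams("1", 2, 25));
        // The 27-character name is cut off at 25 characters.
        List<String> grams = edgeNGrams("someveryandverylongusername", 2, 25);
        System.out.println(grams.get(grams.size() - 1));
        // With the hypothetical option, both originals are kept.
        System.out.println(edgeNGramsPreservingOriginal("1", 2, 25));
    }
}
```

The alternative suggested in the comment, indexing the same value into a second field without the n-gram filter, gets the same tokens into the index without any change to the filter.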
Re: VOTE: RC0 Release apache-solr-ref-guide-4.6.pdf
I noticed a couple of small typos and inconsistencies that I've fixed, but I don't think they warrant a respin. They're more for appearance than for any factual problems. +1 Sorry for the delay from me - I've been traveling for holidays. On Tue, Nov 26, 2013 at 4:22 AM, Jan Høydahl wrote: > * Page 5: Screenshots with 4.0.0-beta texts > * Page 165: Links to 4.0.0 version of JavaDoc (now fixed in Confluence) > * Page 204: Table - group.func - "Supported only in Sol4r 4.0." (should be > "Supported since Solr 4.0.") (now fixed in Confluence) > * Page 308: Strange xml code box layout, why all the whitespace? > > But these are minors, so here's my +1 > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > 25. nov. 2013 kl. 19:34 skrev Chris Hostetter : > >> >> Please VOTE to release the following as apache-solr-ref-guide-4.6.pdf ... >> >> https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.6-RC0/ >> >> $ cat apache-solr-ref-guide-4.6.pdf.sha1 >> 7ad494c5a3cdc085e01a54d507ae33a75cc319e6 apache-solr-ref-guide-4.6.pdf >> >> >> >> >> -Hoss >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833711#comment-13833711 ] ASF subversion and git services commented on LUCENE-5339: - Commit 1546008 from [~mikemccand] in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1546008 ] LUCENE-5339: Gilad's feedback, improve javadocs > Simplify the facet module APIs > -- > > Key: LUCENE-5339 > URL: https://issues.apache.org/jira/browse/LUCENE-5339 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-5339.patch, LUCENE-5339.patch > > > I'd like to explore simplifications to the facet module's APIs: I > think the current APIs are complex, and the addition of a new feature > (sparse faceting, LUCENE-5333) threatens to add even more classes > (e.g., FacetRequestBuilder). I think we can do better. > So, I've been prototyping some drastic changes; this is very > early/exploratory and I'm not sure where it'll wind up but I think the > new approach shows promise. > The big changes are: > * Instead of *FacetRequest/Params/Result, you directly instantiate > the classes that do facet counting (currently TaxonomyFacetCounts, > RangeFacetCounts or SortedSetDVFacetCounts), passing in the > SimpleFacetsCollector, and then you interact with those classes to > pull labels + values (topN under a path, sparse, specific labels). > * At index time, no more FacetIndexingParams/CategoryListParams; > instead, you make a new SimpleFacetFields and pass it the field it > should store facets + drill downs under. If you want more than > one CLI you create more than one instance of SimpleFacetFields. > * I added a simple schema, where you state which dimensions are > hierarchical or multi-valued. From this we decide how to index > the ordinals (no more OrdinalPolicy). 
> Sparse faceting is just another method (getAllDims), on both taxonomy > & ssdv facet classes. > I haven't created a common base class / interface for all of the > search-time facet classes, but I think this may be possible/clean, and > perhaps useful for drill sideways. > All the new classes are under oal.facet.simple.*. > Lots of things that don't work yet: drill sideways, complements, > associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833695#comment-13833695 ] Michael McCandless commented on LUCENE-5339:
bq. Been away from the issue for some time and it looks like a major progress, Chapeau à lui
Thanks Gilad, bit by bit...
bq. LabelAndValue & FacetResult use instanceof checks in their equals method - is that a must?
Hmm, I don't know how to implement .equals without an instanceof check?
bq. FacetResult has a member called childCount - I think it's the number of categories/path/labels that were encountered. The current jdocs "How many labels were populated under the requested path" reveals implementation (population). Perhaps exchange populated with encountered?
Fixed.
bq. FloatRange and DoubleRange uses Math.nextUp/Down for infinity as the ranges are always inclusive. Perhaps these constants for float and double could be static final.
Well, .nextUp and .nextAfter. But, what constants? The number is computed differently for each range (from the provided min and max)...
bq. TaxonomyFacetSumFloatAssociations and TaxonomyFacetSumValueSource reuse a LOT of code, can they extend one another? perhaps extract a common super for both?
Well, they differ in the source of the value to aggregate (per doc vs per ord), but then the other methods are nearly the same except for the int/float difference... in fact, Fast/TaxoFacetCounts getTopChildren is also the same. I'll add a TODO...
bq. In TaxonomyFacets the parents array is saved, I could not see where it's being used (and I think it's not used even in the older taxonomy-facet implementation).
Ooh, good catch: I removed it.
{quote} FacetConfig confuses me a bit, as it's very much aware of the Taxonomy, on another it handles all the kinds of the facets. 
Perhaps FacetConfig.build() could be split up, allowing each FacetField.Type a build() method of its own, rather than every types' building being done in the same method. It will also bring a common parent class to all FacetField types, which I also like. As such, the taxonomy part, with processFacetFields() could be moved to its respective Facet implementation. {quote} I'm on the fence on whether FacetsConfig should hold the taxonomyWriter, vs you must pass it to the build method ... I like the idea of moving the build logic into each FacetField impl, but I want to keep it simple for the app (i.e. the app should not have to invoke N build methods). > Simplify the facet module APIs > -- > > Key: LUCENE-5339 > URL: https://issues.apache.org/jira/browse/LUCENE-5339 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-5339.patch, LUCENE-5339.patch > > > I'd like to explore simplifications to the facet module's APIs: I > think the current APIs are complex, and the addition of a new feature > (sparse faceting, LUCENE-5333) threatens to add even more classes > (e.g., FacetRequestBuilder). I think we can do better. > So, I've been prototyping some drastic changes; this is very > early/exploratory and I'm not sure where it'll wind up but I think the > new approach shows promise. > The big changes are: > * Instead of *FacetRequest/Params/Result, you directly instantiate > the classes that do facet counting (currently TaxonomyFacetCounts, > RangeFacetCounts or SortedSetDVFacetCounts), passing in the > SimpleFacetsCollector, and then you interact with those classes to > pull labels + values (topN under a path, sparse, specific labels). > * At index time, no more FacetIndexingParams/CategoryListParams; > instead, you make a new SimpleFacetFields and pass it the field it > should store facets + drill downs under. 
If you want more than > one CLI you create more than one instance of SimpleFacetFields. > * I added a simple schema, where you state which dimensions are > hierarchical or multi-valued. From this we decide how to index > the ordinals (no more OrdinalPolicy). > Sparse faceting is just another method (getAllDims), on both taxonomy > & ssdv facet classes. > I haven't created a common base class / interface for all of the > search-time facet classes, but I think this may be possible/clean, and > perhaps useful for drill sideways. > All the new classes are under oal.facet.simple.*. > Lots of things that don't work yet: drill sideways, complements, > associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
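The point in the reply above about Math.nextUp and Math.nextAfter can be illustrated as follows. This is a guess at the general technique, not the FloatRange/DoubleRange code from the branch: an exclusive bound is nudged by one representable double so that the stored range is always inclusive, which is also why the adjusted values depend on each range's own min and max and cannot be shared constants. The class name and constructor signature are hypothetical.

```java
final class InclusiveRangeSketch {

    final double minInclusive;
    final double maxInclusive;

    /** Normalizes possibly-exclusive bounds into an always-inclusive range. */
    InclusiveRangeSketch(double min, boolean includeMin, double max, boolean includeMax) {
        // Smallest double strictly greater than min.
        this.minInclusive = includeMin ? min : Math.nextUp(min);
        // Largest double strictly smaller than max.
        this.maxInclusive = includeMax ? max : Math.nextAfter(max, Double.NEGATIVE_INFINITY);
    }

    boolean accept(double value) {
        return value >= minInclusive && value <= maxInclusive;
    }

    public static void main(String[] args) {
        InclusiveRangeSketch open = new InclusiveRangeSketch(0.0, false, 5.0, false);
        System.out.println(open.accept(0.0)); // false: lower bound was exclusive
        System.out.println(open.accept(2.5)); // true
        System.out.println(open.accept(5.0)); // false: upper bound was exclusive
    }
}
```

With both bounds stored as inclusive, the per-document check is a plain pair of comparisons regardless of how the range was declared.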
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833604#comment-13833604 ] Benjamin Brandmeier commented on SOLR-2366: --- I'm also interested in this. Currently I'm using lots of facet.query parameters. That works; however, I guess it could be done more simply, and maybe even more efficiently, with multiple range gap values. > Facet Range Gaps > > > Key: SOLR-2366 > URL: https://issues.apache.org/jira/browse/SOLR-2366 > Project: Solr > Issue Type: Improvement >Reporter: Grant Ingersoll >Assignee: Shalin Shekhar Mangar >Priority: Minor > Fix For: 4.6 > > Attachments: SOLR-2366.patch, SOLR-2366.patch > > > There really is no reason why the range gap for date and numeric faceting > needs to be evenly spaced. For instance, if and when SOLR-1581 is completed > and one were doing spatial distance calculations, one could facet by function > into 3 different sized buckets: walking distance (0-5KM), driving distance > (5KM-150KM) and everything else (150KM+), for instance. We should be able to > quantize the results into arbitrarily sized buckets. > (Original syntax proposal removed, see discussion for concrete syntax) -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
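What "arbitrarily sized buckets" means for the walking/driving example in the issue description can be shown with a short sketch. It says nothing about the syntax or implementation discussed in SOLR-2366. It only assumes a sorted array of bucket boundaries, with each bucket including its lower bound and excluding its upper bound, and a last open-ended bucket. The names are invented for illustration.

```java
import java.util.Arrays;

final class VariableGapSketch {

    /**
     * Index of the bucket a value falls into. With boundaries {5, 150}:
     * bucket 0 is below 5, bucket 1 is [5, 150), bucket 2 is 150 and above.
     */
    static int bucketOf(double value, double[] sortedBoundaries) {
        int idx = Arrays.binarySearch(sortedBoundaries, value);
        // Exact hit on a boundary: the value starts the next bucket.
        // Otherwise binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    /** Counts how many values land in each bucket. */
    static int[] countPerBucket(double[] values, double[] sortedBoundaries) {
        int[] counts = new int[sortedBoundaries.length + 1];
        for (double value : values) {
            counts[bucketOf(value, sortedBoundaries)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] boundariesKm = {5, 150};                  // walking | driving | everything else
        double[] distancesKm = {1.2, 4.9, 5.0, 42, 150, 900};
        System.out.println(Arrays.toString(countPerBucket(distancesKm, boundariesKm))); // [2, 2, 2]
    }
}
```

The facet.query workaround mentioned in the comment expresses the same three buckets as three separate queries. A single list of boundaries states the intent once.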
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833596#comment-13833596 ] Gilad Barkai commented on LUCENE-5339: -- Been away from the issue for some time and it looks like major progress, Chapeau à lui {{LabelAndValue}} & {{FacetResult}} use {{instanceof}} checks in their {{equals}} method - is that a must? {{FacetResult}} has a member called {{childCount}} - I think it's the number of categories/path/labels that were encountered. The current jdocs "How many labels were populated under the requested path" reveals implementation (population). Perhaps exchange populated with encountered? {{FloatRange}} and {{DoubleRange}} use {{Math.nextUp/Down}} for infinity as the ranges are always inclusive. Perhaps these constants for float and double could be static final. {{TaxonomyFacetSumFloatAssociations}} and {{TaxonomyFacetSumValueSource}} reuse a LOT of code, can they extend one another? Perhaps extract a common super for both? In {{TaxonomyFacets}} the parents array is saved, but I could not see where it's being used (and I think it's not used even in the older taxonomy-facet implementation). {{FacetConfig}} confuses me a bit: on one hand it's very much aware of the Taxonomy, and on the other it handles all the kinds of facets. Perhaps {{FacetConfig.build()}} could be split up, allowing each {{FacetField.Type}} a build() method of its own, rather than every type's building being done in the same method. It will also bring a common parent class to all FacetField types, which I also like. As such, the taxonomy part, with {{processFacetFields()}}, could be moved to its respective Facet implementation. 
> Simplify the facet module APIs > -- > > Key: LUCENE-5339 > URL: https://issues.apache.org/jira/browse/LUCENE-5339 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-5339.patch, LUCENE-5339.patch > > > I'd like to explore simplifications to the facet module's APIs: I > think the current APIs are complex, and the addition of a new feature > (sparse faceting, LUCENE-5333) threatens to add even more classes > (e.g., FacetRequestBuilder). I think we can do better. > So, I've been prototyping some drastic changes; this is very > early/exploratory and I'm not sure where it'll wind up but I think the > new approach shows promise. > The big changes are: > * Instead of *FacetRequest/Params/Result, you directly instantiate > the classes that do facet counting (currently TaxonomyFacetCounts, > RangeFacetCounts or SortedSetDVFacetCounts), passing in the > SimpleFacetsCollector, and then you interact with those classes to > pull labels + values (topN under a path, sparse, specific labels). > * At index time, no more FacetIndexingParams/CategoryListParams; > instead, you make a new SimpleFacetFields and pass it the field it > should store facets + drill downs under. If you want more than > one CLI you create more than one instance of SimpleFacetFields. > * I added a simple schema, where you state which dimensions are > hierarchical or multi-valued. From this we decide how to index > the ordinals (no more OrdinalPolicy). > Sparse faceting is just another method (getAllDims), on both taxonomy > & ssdv facet classes. > I haven't created a common base class / interface for all of the > search-time facet classes, but I think this may be possible/clean, and > perhaps useful for drill sideways. > All the new classes are under oal.facet.simple.*. > Lots of things that don't work yet: drill sideways, complements, > associations, sampling, partitions, etc. 
This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Affects Version/s: 4.6 4.5 4.5.1 > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4, 4.5, 4.5.1, 4.6 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first url has "1" at the end, which is shorter than the allowed > min gram size. In the second url the user name is longer than the max gram > size (27 characters). > It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that "1" and "someveryandverylongusername" tokens will also be added to the > index. > Best, > Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833582#comment-13833582 ] Furkan KAMACI commented on SOLR-5332: - [~aheaven] if you change the Fix Version/s to the next release, this issue can be considered. > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first url has "1" at the end, which is shorter than the allowed > min gram size. In the second url the user name is longer than the max gram > size (27 characters). > It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that "1" and "someveryandverylongusername" tokens will also be added to the > index. > Best, > Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Affects Version/s: 4.4 > Add "preserve original" setting to the EdgeNGramFilterFactory > - > > Key: SOLR-5332 > URL: https://issues.apache.org/jira/browse/SOLR-5332 > Project: Solr > Issue Type: Wish >Affects Versions: 4.4 >Reporter: Alexander S. > > Hi, as described here: > http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html > the problem is that if you have these 2 strings to index: > 1. facebook.com/someuser.1 > 2. facebook.com/someveryandverylongusername > and the edge ngram filter factory with min and max gram size settings 2 and > 25, search requests for these urls will fail. > But search requests for: > 1. facebook.com/someuser > 2. facebook.com/someveryandverylonguserna > will work properly. > It's because the first url has "1" at the end, which is shorter than the allowed > min gram size. In the second url the user name is longer than the max gram > size (27 characters). > It would be good to have a "preserve original" option that adds the > original string to the index if it does not fit the allowed gram size, so > that "1" and "someveryandverylongusername" tokens will also be added to the > index. > Best, > Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833573#comment-13833573 ] Stefan Pohl commented on LUCENE-4100: - Thanks for your interest, [~daubman]. [~rcmuir] accurately describes the current status and challenges here. Quite a few major and even more minor steps still have to be made to eventually arrive at anything general-purpose and user-ready. However, given the feedback that I got so far (there seem to be many people who don't have extremely high NRT requirements and who rebuild their whole indexes every few hours/days anyway), I still think that it would be worthwhile to have this as-is (with the limitation to static indexes) in a separate contrib package that users are discouraged from using unless they really know what they are doing. Adding a few exceptions to guard users against wrong behaviour/misuse might also be useful. As long as I, as the reporter, don't close the issue, it won't 'bit rot', don't worry ;) I also heard quite some interest when I mentioned it at this year's Lucene/Solr-Revolutions.EU conference. > Maxscore - Efficient Scoring > > > Key: LUCENE-4100 > URL: https://issues.apache.org/jira/browse/LUCENE-4100 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs, core/query/scoring, core/search >Affects Versions: 4.0-ALPHA >Reporter: Stefan Pohl > Labels: api-change, patch, performance > Fix For: 4.7 > > Attachments: contrib_maxscore.tgz, maxscore.patch > > > At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient > algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, > that I find deserves more attention among Lucene users (and developers). > I implemented a proof of concept and did some performance measurements with > example queries and lucenebench, the package of Mike McCandless, resulting in > very significant speedups. 
> This ticket is to get started the discussion on including the implementation > into Lucene's codebase. Because the technique requires awareness about it > from the Lucene user/developer, it seems best to become a contrib/module > package so that it consciously can be chosen to be used. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-2366: --- Assignee: Shalin Shekhar Mangar > Facet Range Gaps > > > Key: SOLR-2366 > URL: https://issues.apache.org/jira/browse/SOLR-2366 > Project: Solr > Issue Type: Improvement >Reporter: Grant Ingersoll >Assignee: Shalin Shekhar Mangar >Priority: Minor > Fix For: 4.6 > > Attachments: SOLR-2366.patch, SOLR-2366.patch > > > There really is no reason why the range gap for date and numeric faceting > needs to be evenly spaced. For instance, if and when SOLR-1581 is completed > and one were doing spatial distance calculations, one could facet by function > into 3 different sized buckets: walking distance (0-5KM), driving distance > (5KM-150KM) and everything else (150KM+), for instance. We should be able to > quantize the results into arbitrarily sized buckets. > (Original syntax proposal removed, see discussion for concrete syntax) -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org