[jira] [Commented] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand
[ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335405#comment-14335405 ]

Dallan Quass commented on SOLR-6581:

Any idea how much slower the numeric collapse/expand implementation is than string collapse/expand with the top_fc hint? I'm trying to decide whether I should re-index my int collapse field as a string. (I don't care about real-time performance.)

Efficient DocValues support and numeric collapse field implementations for Collapse and Expand
--
Key: SOLR-6581
URL: https://issues.apache.org/jira/browse/SOLR-6581
Project: Solr
Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
Fix For: 5.0, Trunk
Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff

The 4x implementations of the CollapsingQParserPlugin and the ExpandComponent are optimized to work with a top-level FieldCache. Top-level FieldCaches have a very fast docID to top-level ordinal lookup. Fast access to the top-level ordinals allows for very high performance field collapsing on high-cardinality fields.

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top-level FieldCache is no longer in regular use. Instead, all top-level caches are accessed through MultiDocValues.

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the default approach when collapsing on String fields.

2) Provides an option to use a top-level FieldCache if the performance of MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a new hint parameter. If the hint parameter is set to top_fc then the top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=TOP_FC}
{code}

3) Adds numeric collapse field implementations.

4) Resolves issue SOLR-6066.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
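To illustrate why a fast docID-to-ordinal lookup matters for collapsing, here is a minimal sketch in plain Java. It is not Solr's implementation: the class name, the `ords` array (a stand-in for a top-level ordinal lookup), and the `collapse` method are all hypothetical. It keeps the highest-scoring document per group ordinal in a single pass over the matching documents.

```java
import java.util.Arrays;

public class OrdinalCollapseSketch {
    /**
     * ords[doc] is the group ordinal of each document (a hypothetical stand-in
     * for a top-level ordinal lookup); scores[doc] is that document's score.
     * Returns, for every ordinal, the docID of its highest-scoring document,
     * or -1 if no document has that ordinal.
     */
    static int[] collapse(int[] ords, float[] scores, int ordCount) {
        int[] head = new int[ordCount];
        Arrays.fill(head, -1);
        for (int doc = 0; doc < ords.length; doc++) {
            int ord = ords[doc]; // one array read per document
            if (head[ord] == -1 || scores[doc] > scores[head[ord]]) {
                head[ord] = doc;
            }
        }
        return head;
    }

    public static void main(String[] args) {
        int[] ords = {0, 1, 0, 2, 1};
        float[] scores = {1.0f, 0.5f, 2.0f, 0.7f, 0.9f};
        System.out.println(Arrays.toString(collapse(ords, scores, 3))); // [2, 4, 3]
    }
}
```

Because the per-document work is one array lookup and one comparison, the cost of the ordinal lookup dominates, which is presumably why the choice between MultiDocValues and a top-level FieldCache matters on high-cardinality fields.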
[jira] Updated: (SOLR-1069) CSV document and field boosting support
[ https://issues.apache.org/jira/browse/SOLR-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dallan Quass updated SOLR-1069:
---
Attachment: CSVLoader.java
            CSVRequestHandler.java.diff

FWIW, I made a few changes to CSVRequestHandler.java, which mainly involve extracting CSVLoader into a separate public class and making a few variables/functions visible outside the package. The attached files show the changes I made. Doing this allowed me to create a subclass of CSVLoader that does boosting:

    public class BoostingCSVRequestHandler extends ContentStreamHandlerBase {

      protected ContentStreamLoader newLoader(SolrQueryRequest req, UpdateRequestProcessor processor) {
        return new BoostingCSVLoader(req, processor);
      }

      // SolrInfoMBeans methods

      @Override
      public String getDescription() {
        return "boost CSV documents";
      }

      @Override
      public String getVersion() {
        return ""; // literal not preserved in the archived message
      }

      @Override
      public String getSourceId() {
        return ""; // literal not preserved in the archived message
      }

      @Override
      public String getSource() {
        return ""; // literal not preserved in the archived message
      }
    }

    class BoostingCSVLoader extends CSVLoader {
      int boostFieldNum;

      BoostingCSVLoader(SolrQueryRequest req, UpdateRequestProcessor processor) {
        super(req, processor);
      }

      // Returns a copy of a without the element at pos.
      private String[] removeElement(String[] a, int pos) {
        String[] n = new String[a.length - 1];
        if (pos > 0) System.arraycopy(a, 0, n, 0, pos);
        if (pos < n.length) System.arraycopy(a, pos + 1, n, pos, n.length - pos);
        return n;
      }

      @Override
      protected void prepareFields() {
        // Find the "boost" column and hide it from the regular field handling.
        boostFieldNum = -1;
        for (int i = 0; i < fieldnames.length; i++) {
          if (fieldnames[i].equals("boost")) {
            boostFieldNum = i;
            break;
          }
        }
        if (boostFieldNum >= 0) {
          fieldnames = removeElement(fieldnames, boostFieldNum);
        }
        super.prepareFields();
      }

      public void addDoc(int line, String[] vals) throws IOException {
        templateAdd.indexedId = null;
        SolrInputDocument doc = new SolrInputDocument();
        if (boostFieldNum >= 0) {
          // Use the boost column as the document boost, then drop it from the row.
          float boost = Float.parseFloat(vals[boostFieldNum]);
          doc.setDocumentBoost(boost);
          vals = removeElement(vals, boostFieldNum);
        }
        doAdd(line, vals, doc, templateAdd);
      }
    }

CSV document and field boosting support
---
Key: SOLR-1069
URL: https://issues.apache.org/jira/browse/SOLR-1069
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
Attachments: CSVLoader.java, CSVRequestHandler.java.diff

It would be good if the CSV loader could do document and field boosting. I believe this could be handled via additional special columns that are tacked on, such as doc.boost and field.name.boost, which are then filled in with boost values on a per-row basis. Obviously, this approach would prevent someone from having an actual column named field.name.boost, so maybe we can make that configurable as well.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
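The column-stripping step in the comment above can be tried outside Solr. The following is a minimal, self-contained sketch: the class name, the `findBoostColumn` helper, and the sample row are hypothetical, and the column name `boost` is taken from the posted code rather than from any Solr convention.

```java
import java.util.Arrays;

public class BoostColumnSketch {
    /** Returns a copy of a without the element at pos. */
    static String[] removeElement(String[] a, int pos) {
        String[] n = new String[a.length - 1];
        System.arraycopy(a, 0, n, 0, pos);                    // elements before pos
        System.arraycopy(a, pos + 1, n, pos, n.length - pos); // elements after pos
        return n;
    }

    /** Index of the column named "boost" in the header, or -1 if absent. */
    static int findBoostColumn(String[] fieldnames) {
        return Arrays.asList(fieldnames).indexOf("boost");
    }

    public static void main(String[] args) {
        String[] fieldnames = {"id", "title", "boost"};
        String[] vals = {"42", "hello", "2.5"};

        int boostCol = findBoostColumn(fieldnames);
        float boost = 1.0f; // neutral boost when no boost column is present
        if (boostCol >= 0) {
            boost = Float.parseFloat(vals[boostCol]);
            fieldnames = removeElement(fieldnames, boostCol);
            vals = removeElement(vals, boostCol);
        }
        System.out.println(Arrays.toString(fieldnames) + " " + Arrays.toString(vals) + " boost=" + boost);
    }
}
```

Stripping the column from both the header and each row keeps the remaining values aligned with the remaining field names, which is what lets the unmodified loader logic handle the rest of the row.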
[jira] Issue Comment Edited: (SOLR-1069) CSV document and field boosting support
[ https://issues.apache.org/jira/browse/SOLR-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838526#action_12838526 ]

Dallan Quass edited comment on SOLR-1069 at 2/25/10 8:33 PM:
-
FWIW, I made a few changes to CSVRequestHandler.java, which mainly involve extracting CSVLoader into a separate public class and making a few variables/functions visible outside the package. The attached files show the changes I made. Doing this allowed me to create a subclass of CSVLoader that does boosting:

{code}
public class BoostingCSVRequestHandler extends ContentStreamHandlerBase {

  protected ContentStreamLoader newLoader(SolrQueryRequest req, UpdateRequestProcessor processor) {
    return new BoostingCSVLoader(req, processor);
  }

  // SolrInfoMBeans methods

  @Override
  public String getDescription() {
    return "boost CSV documents";
  }

  @Override
  public String getVersion() {
    return ""; // literal not preserved in the archived message
  }

  @Override
  public String getSourceId() {
    return ""; // literal not preserved in the archived message
  }

  @Override
  public String getSource() {
    return ""; // literal not preserved in the archived message
  }
}

class BoostingCSVLoader extends CSVLoader {
  int boostFieldNum;

  BoostingCSVLoader(SolrQueryRequest req, UpdateRequestProcessor processor) {
    super(req, processor);
  }

  // Returns a copy of a without the element at pos.
  private String[] removeElement(String[] a, int pos) {
    String[] n = new String[a.length - 1];
    if (pos > 0) System.arraycopy(a, 0, n, 0, pos);
    if (pos < n.length) System.arraycopy(a, pos + 1, n, pos, n.length - pos);
    return n;
  }

  @Override
  protected void prepareFields() {
    // Find the "boost" column and hide it from the regular field handling.
    boostFieldNum = -1;
    for (int i = 0; i < fieldnames.length; i++) {
      if (fieldnames[i].equals("boost")) {
        boostFieldNum = i;
        break;
      }
    }
    if (boostFieldNum >= 0) {
      fieldnames = removeElement(fieldnames, boostFieldNum);
    }
    super.prepareFields();
  }

  public void addDoc(int line, String[] vals) throws IOException {
    templateAdd.indexedId = null;
    SolrInputDocument doc = new SolrInputDocument();
    if (boostFieldNum >= 0) {
      // Use the boost column as the document boost, then drop it from the row.
      float boost = Float.parseFloat(vals[boostFieldNum]);
      doc.setDocumentBoost(boost);
      vals = removeElement(vals, boostFieldNum);
    }
    doAdd(line, vals, doc, templateAdd);
  }
}
{code}

was (Author: dallanq):
FWIW, I made a few changes to CSVRequestHandler.java, which mainly involve extracting CSVLoader into a separate public class and making a few variables/functions visible outside the package. The attached files show the changes I made. Doing this allowed me to create a subclass of CSVLoader that does boosting:

    public class BoostingCSVRequestHandler extends ContentStreamHandlerBase {

      protected ContentStreamLoader newLoader(SolrQueryRequest req, UpdateRequestProcessor processor) {
        return new BoostingCSVLoader(req, processor);
      }

      // SolrInfoMBeans methods

      @Override
      public String getDescription() {
        return "boost CSV documents";
      }

      @Override
      public String getVersion() {
        return ""; // literal not preserved in the archived message
      }

      @Override
      public String getSourceId() {
        return ""; // literal not preserved in the archived message
      }

      @Override
      public String getSource() {
        return ""; // literal not preserved in the archived message
      }
    }

    class BoostingCSVLoader extends CSVLoader {
      int boostFieldNum;

      BoostingCSVLoader(SolrQueryRequest req, UpdateRequestProcessor processor) {
        super(req, processor);
      }

      private String[] removeElement(String[] a, int pos) {
        String[] n = new String[a.length - 1];
        if (pos > 0) System.arraycopy(a, 0, n, 0, pos);
        if (pos < n.length) System.arraycopy(a, pos + 1, n, pos, n.length - pos);
        return n;
      }

      @Override
      protected void prepareFields() {
        boostFieldNum = -1;
        for (int i = 0; i < fieldnames.length; i++) {
          if (fieldnames[i].equals("boost")) {
            boostFieldNum = i;
            break;
          }
        }
        if (boostFieldNum >= 0) {
          fieldnames = removeElement(fieldnames, boostFieldNum);
        }
        super.prepareFields();
      }

      public void addDoc(int line, String[] vals) throws IOException {
        templateAdd.indexedId = null;
        SolrInputDocument doc = new SolrInputDocument();
        if (boostFieldNum >= 0) {
          float boost = Float.parseFloat(vals[boostFieldNum]);
          doc.setDocumentBoost(boost);
          vals = removeElement(vals, boostFieldNum);
        }
        doAdd(line, vals, doc, templateAdd);
      }
    }

CSV document and field boosting support
---
Key: SOLR-1069
URL: https://issues.apache.org/jira/browse/SOLR-1069
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
Attachments: CSVLoader.java, CSVRequestHandler.java.diff

It would be good if
[jira] Created: (SOLR-1795) Subclassing QueryComponent for fetching results from a database
Subclassing QueryComponent for fetching results from a database
---
Key: SOLR-1795
URL: https://issues.apache.org/jira/browse/SOLR-1795
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Dallan Quass

This is a request to change the access on a few fields from package to public.

I've subclassed QueryComponent to allow me to fetch results from a database (based upon the stored uniqueKey field) instead of from the shards. The only stored fields in Solr are the uniqueKey field and whatever fields I might need for sorting.

To do this I've overridden QueryComponent.finishStage so that after executing the query, SolrDocuments are created with the uniqueKey field. A later component populates the rest of the fields in the documents by reading them from a database.

{code}
public void finishStage(ResponseBuilder rb) {
  if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
    // Create SolrDocument's from the ShardDoc's
    boolean returnScores = (rb.getFieldFlags() & SolrIndexSearcher.GET_SCORES) != 0;
    for (ShardDoc sdoc : rb.resultIds.values()) {
      SolrDocument doc = new SolrDocument();
      doc.setField(UNIQUE_KEY_FIELDNAME, sdoc.id);
      if (returnScores && sdoc.score != null) {
        doc.setField("score", sdoc.score);
      }
      rb._responseDocs.set(sdoc.positionInResponse, doc);
    }
  }
}
{code}

Everything works fine, but the ResponseBuilder variables *resultIds* and *_responseDocs*, and the ShardDoc variables *id*, *score*, and *positionInResponse*, currently all have package visibility. I needed to modify the core Solr files to change their visibility to public so that I could access them in the function above. Is there any chance that they could be changed to public in a future version of Solr, or somehow made accessible outside the package?

If people are interested, I could post the QueryComponent subclass and database component that I wrote. But it gets a bit involved because the QueryComponent subclass also handles parsing the query just at the main Solr server, and sending serialized parsed queries to the shards. (Query parsing in my environment is pretty CPU- and memory-intensive, so I do it just at the main server instead of on the shards.)
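The second half of the approach described above, a later component that fills in the remaining fields from a database, can be sketched without any Solr classes. Everything here is an assumption made for illustration: documents are modelled as plain maps, the "database" is an in-memory map keyed by the unique key, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DbBackedResultsSketch {
    /**
     * Builds one document per id, in result order, and copies in whatever
     * fields the database stand-in holds for that id. Ids with no row keep
     * only their "id" field.
     */
    static List<Map<String, Object>> populate(List<String> ids, Map<String, Map<String, Object>> db) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String id : ids) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", id); // the only field the search step provides
            Map<String, Object> row = db.get(id);
            if (row != null) {
                doc.putAll(row); // remaining fields come from the database
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> db = new HashMap<>();
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("title", "first document");
        db.put("doc1", row);

        List<String> ids = new ArrayList<>();
        ids.add("doc1");
        ids.add("doc2");
        System.out.println(populate(ids, db));
    }
}
```

A real component would presumably batch the lookups into a single query per page of results rather than fetching row by row.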
[jira] Updated: (SOLR-1795) Subclassing QueryComponent for fetching results from a database
[ https://issues.apache.org/jira/browse/SOLR-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dallan Quass updated SOLR-1795:
---
Description:
This is a request to change the access on a few fields from package to public.

I've subclassed QueryComponent to allow me to fetch results from a database (based upon the stored uniqueKey field) instead of from the shards. The only stored fields in Solr are the uniqueKey field and whatever fields I might need for sorting.

To do this I've overridden QueryComponent.finishStage so that after executing the query, SolrDocuments are created with the uniqueKey field. A later component populates the rest of the fields in the documents by reading them from a database.

{code}
public void finishStage(ResponseBuilder rb) {
  if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
    // Create SolrDocument's from the ShardDoc's
    boolean returnScores = (rb.getFieldFlags() & SolrIndexSearcher.GET_SCORES) != 0;
    for (ShardDoc sdoc : rb.resultIds.values()) {
      SolrDocument doc = new SolrDocument();
      doc.setField("id", sdoc.id);
      if (returnScores && sdoc.score != null) {
        doc.setField("score", sdoc.score);
      }
      rb._responseDocs.set(sdoc.positionInResponse, doc);
    }
  }
}
{code}

Everything works fine, but the ResponseBuilder variables *resultIds* and *_responseDocs*, and the ShardDoc variables *id*, *score*, and *positionInResponse*, currently all have package visibility. I needed to modify the core Solr files to change their visibility to public so that I could access them in the function above. Is there any chance that they could be changed to public in a future version of Solr, or somehow made accessible outside the package?

If people are interested, I could post the QueryComponent subclass and database component that I wrote. But it gets a bit involved because the QueryComponent subclass also handles parsing the query just at the main Solr server, and sending serialized parsed queries to the shards. (Query parsing in my environment is pretty CPU- and memory-intensive, so I do it just at the main server instead of on the shards.)

was:
This is a request to change the access on a few fields from package to public.

I've subclassed QueryComponent to allow me to fetch results from a database (based upon the stored uniqueKey field) instead of from the shards. The only stored fields in Solr are the uniqueKey field and whatever fields I might need for sorting.

To do this I've overridden QueryComponent.finishStage so that after executing the query, SolrDocuments are created with the uniqueKey field. A later component populates the rest of the fields in the documents by reading them from a database.

{code}
public void finishStage(ResponseBuilder rb) {
  if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
    // Create SolrDocument's from the ShardDoc's
    boolean returnScores = (rb.getFieldFlags() & SolrIndexSearcher.GET_SCORES) != 0;
    for (ShardDoc sdoc : rb.resultIds.values()) {
      SolrDocument doc = new SolrDocument();
      doc.setField(UNIQUE_KEY_FIELDNAME, sdoc.id);
      if (returnScores && sdoc.score != null) {
        doc.setField("score", sdoc.score);
      }
      rb._responseDocs.set(sdoc.positionInResponse, doc);
    }
  }
}
{code}

Everything works fine, but the ResponseBuilder variables *resultIds* and *_responseDocs*, and the ShardDoc variables *id*, *score*, and *positionInResponse*, currently all have package visibility. I needed to modify the core Solr files to change their visibility to public so that I could access them in the function above. Is there any chance that they could be changed to public in a future version of Solr, or somehow made accessible outside the package?

If people are interested, I could post the QueryComponent subclass and database component that I wrote. But it gets a bit involved because the QueryComponent subclass also handles parsing the query just at the main Solr server, and sending serialized parsed queries to the shards. (Query parsing in my environment is pretty CPU- and memory-intensive, so I do it just at the main server instead of on the shards.)

Subclassing QueryComponent for fetching results from a database
---
Key: SOLR-1795
URL: https://issues.apache.org/jira/browse/SOLR-1795
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Dallan Quass

This is a request to change the access on a few fields from package to public.

I've subclassed QueryComponent to allow me to fetch results from a database (based upon the stored uniqueKey field) instead of from the shards. The only stored field in solr is the uniqueKey field, and whatever
[jira] Updated: (SOLR-1795) Subclassing QueryComponent for fetching results from a database
[ https://issues.apache.org/jira/browse/SOLR-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dallan Quass updated SOLR-1795:
---
Priority: Minor (was: Major)

Subclassing QueryComponent for fetching results from a database
---
Key: SOLR-1795
URL: https://issues.apache.org/jira/browse/SOLR-1795
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Dallan Quass
Priority: Minor

This is a request to change the access on a few fields from package to public.

I've subclassed QueryComponent to allow me to fetch results from a database (based upon the stored uniqueKey field) instead of from the shards. The only stored fields in Solr are the uniqueKey field and whatever fields I might need for sorting.

To do this I've overridden QueryComponent.finishStage so that after executing the query, SolrDocuments are created with the uniqueKey field. A later component populates the rest of the fields in the documents by reading them from a database.

{code}
public void finishStage(ResponseBuilder rb) {
  if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
    // Create SolrDocument's from the ShardDoc's
    boolean returnScores = (rb.getFieldFlags() & SolrIndexSearcher.GET_SCORES) != 0;
    for (ShardDoc sdoc : rb.resultIds.values()) {
      SolrDocument doc = new SolrDocument();
      doc.setField("id", sdoc.id);
      if (returnScores && sdoc.score != null) {
        doc.setField("score", sdoc.score);
      }
      rb._responseDocs.set(sdoc.positionInResponse, doc);
    }
  }
}
{code}

Everything works fine, but the ResponseBuilder variables *resultIds* and *_responseDocs*, and the ShardDoc variables *id*, *score*, and *positionInResponse*, currently all have package visibility. I needed to modify the core Solr files to change their visibility to public so that I could access them in the function above. Is there any chance that they could be changed to public in a future version of Solr, or somehow made accessible outside the package?

If people are interested, I could post the QueryComponent subclass and database component that I wrote. But it gets a bit involved because the QueryComponent subclass also handles parsing the query just at the main Solr server, and sending serialized parsed queries to the shards. (Query parsing in my environment is pretty CPU- and memory-intensive, so I do it just at the main server instead of on the shards.)
[jira] Created: (SOLR-1202) Field facets missing in distributed queries when facet.sort=index and facet.mincount>0
Field facets missing in distributed queries when facet.sort=index and facet.mincount>0
--
Key: SOLR-1202
URL: https://issues.apache.org/jira/browse/SOLR-1202
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 1.4
Reporter: Dallan Quass
Fix For: 1.4

On line 385 of FacetComponent.java, the line:

    if (counts[i].count < dff.minCount) break;

will cause some facets not to be returned when facet.sort=index and facet.mincount>0. With index sorting the counts are not in descending order, so a facet below the minimum can be followed by facets that meet it. To fix this, you could add a condition that checks whether the facets are sorted by count, and break only in that case.
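The difference between the two sort orders can be shown with a small stand-alone sketch. This is a model of the suggested fix, not the actual FacetComponent code: the class, the `filter` method, and the parallel `terms`/`counts` arrays are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FacetMinCountSketch {
    /**
     * Returns the terms whose count is at least minCount. Stopping at the
     * first low count is only safe when the entries are sorted by descending
     * count; in index (term) order, later entries may still qualify.
     */
    static List<String> filter(String[] terms, int[] counts, int minCount, boolean sortedByCount) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] < minCount) {
                if (sortedByCount) {
                    break;    // every later count is also too small
                }
                continue;     // index order: skip this term and keep scanning
            }
            out.add(terms[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"}; // index order
        int[] counts = {3, 0, 5};
        System.out.println(filter(terms, counts, 1, false)); // [apple, cherry]
        // Treating index-ordered data as count-sorted reproduces the bug:
        System.out.println(filter(terms, counts, 1, true));  // [apple]
    }
}
```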
[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
[ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12372983 ]

Dallan Quass commented on LUCENE-413:
-
That fixed it! I applied the patch in DisjunctionSumScorerPatch5.txt and used the posted SpanScorer, and I'm no longer experiencing the array index out of bounds problem. Thank you, Paul!

[PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
-
Key: LUCENE-413
URL: http://issues.apache.org/jira/browse/LUCENE-413
Project: Lucene - Java
Type: Bug
Components: Search
Versions: CVS Nightly - Specify date in submission
Environment: Operating System: other Platform: Other
Reporter: paul.elschot
Assignee: Lucene Developers
Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, DisjunctionSumScorerPatch5.txt, DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, SpanNearQueryPatch1.txt, SpanScorer.java, SpanScorerTestPatch1.txt, TestSpansAdvanced.java, TestSpansAdvanced2.java

From Erik's post at java-dev:

[java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
[java] at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
[java] at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
...

and my answer:

Probably nrMatchers is increased too often in score() by calling score() more than once.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-413) [PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
[ http://issues.apache.org/jira/browse/LUCENE-413?page=comments#action_12372615 ]

Dallan Quass commented on LUCENE-413:
-
Sorry for cluttering up the comments here. I wanted to add that the part of the query exhibiting the error in the second case is a boolean query with three SpanTermQuery clauses, all Occur.SHOULD. If I change the SpanTermQuery clauses to plain TermQuery clauses, the problem goes away.

[PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans
-
Key: LUCENE-413
URL: http://issues.apache.org/jira/browse/LUCENE-413
Project: Lucene - Java
Type: Bug
Components: Search
Versions: CVS Nightly - Specify date in submission
Environment: Operating System: other Platform: Other
Reporter: paul.elschot
Assignee: Lucene Developers
Attachments: DisjunctionSumScorerPatch3.txt, DisjunctionSumScorerPatch4.txt, DisjunctionSumScorerPatch5.txt, DisjunctionSumScorerTestPatch1.txt, NearSpansOrdered.java, NearSpansOrderedBugHuntPatch1.txt, NearSpansUnordered.java, SpanNearQueryPatch1.txt, SpanScorerTestPatch1.txt

From Erik's post at java-dev:

[java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
[java] at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
[java] at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
...

and my answer:

Probably nrMatchers is increased too often in score() by calling score() more than once.
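The suspected failure mode (a matcher counter that keeps growing when score() is called more than once for the same document) can be reproduced with a simplified model. This is not Lucene's BooleanScorer2 code: the class below, its methods, and the way the coordination factors are computed are assumptions made purely to illustrate how a three-clause query could end up indexing position 4 of a four-element array.

```java
public class CoordinatorSketch {
    private final float[] coordFactors; // one entry per possible match count, 0..maxCoord
    private int nrMatchers = 0;

    CoordinatorSketch(int maxCoord) {
        coordFactors = new float[maxCoord + 1];
        for (int i = 0; i <= maxCoord; i++) {
            coordFactors[i] = (float) i / maxCoord;
        }
    }

    void initDoc() { nrMatchers = 0; }      // must run once per scored document
    void clauseMatched() { nrMatchers++; }  // called for each matching clause
    float coordFactor() { return coordFactors[nrMatchers]; }

    /** Simulates scoring one document twice without resetting the counter. */
    static boolean overflowsWithoutReset(int clauses) {
        CoordinatorSketch c = new CoordinatorSketch(clauses);
        for (int i = 0; i < clauses; i++) {
            c.clauseMatched();              // first score() call: all clauses match
        }
        c.coordFactor();                    // fine: index equals the clause count
        c.clauseMatched();                  // second score() call, no initDoc() in between
        try {
            c.coordFactor();                // index is now clauses + 1
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflowsWithoutReset(3)); // true
    }
}
```

With three clauses the array has indices 0 to 3, so a fourth increment yields index 4, which matches the index reported in the stack trace above.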