[jira] [Updated] (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)
[ https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wooden updated SOLR-1837:
--
    Attachment: SOLR-1837.patch

Added a simple UI for the document reconstructor. When a core is selected in the management console, a new "DocInspector" link appears. Enter a document ID and the handler will be triggered. Applies to Solr 4.10.4.

> Reconstruct a Document (stored fields, indexed fields, payloads)
>
> Key: SOLR-1837
> URL: https://issues.apache.org/jira/browse/SOLR-1837
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis, web gui
> Affects Versions: 1.5
> Environment: All
> Reporter: Trey Grainger
> Priority: Minor
> Labels: admin, indexed, luke, payload, reconstruct, stored
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-1837.patch, SOLR-1837.patch, SOLR-1837_WithHandler.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> One Solr feature I've been sorely in need of is the ability to inspect an
> index for any particular document. While the analysis page is good when you
> have specific content and a specific field/type you want to test the
> analysis process for, once a document is indexed it is not currently possible
> to easily see what is actually sitting in the index.
> One can use the Lucene Index Browser (Luke), but this has several limitations
> (GUI only, doesn't understand the Solr schema, doesn't display many non-text
> fields in human-readable format, doesn't show payloads, some bugs lead to
> missing terms, exposes features dangerous to use in a production Solr
> environment, slow or difficult to check from a remote location, etc.). The
> document reconstruction feature of Luke provides the base for what can become
> a much more powerful tool when coupled with Solr's understanding of a schema,
> however.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
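To make the SOLR-1837 reconstruction idea concrete, here is a minimal Java sketch over a hypothetical in-memory inverted index (field -> term -> document ID -> positions). It is not Lucene's API and not the patch's SolrDocReconstructor. It rebuilds one document's indexed tokens in position order and, as an assumption modeled on the sample output later in this thread, joins terms that share a position with " | ". It also hints at why reconstruction is slow: an inverted index maps terms to documents, so every term of every field must be visited to find those belonging to one document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model for illustration only (not Lucene's API):
 * index = field -> term -> docId -> positions of that term in the doc's field.
 */
class ReconstructSketch {

    /** Returns field -> tokens in position order for one document; fields with no terms are omitted. */
    static Map<String, List<String>> reconstruct(
            Map<String, Map<String, Map<Integer, List<Integer>>>> index, int docId) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Map<Integer, List<Integer>>>> field : index.entrySet()) {
            // position -> terms at that position; TreeMap keeps positions ordered
            TreeMap<Integer, List<String>> byPosition = new TreeMap<>();
            // The expensive part: every term in the field is visited, because the
            // index answers "which docs contain this term", not "which terms are in this doc".
            for (Map.Entry<String, Map<Integer, List<Integer>>> term : field.getValue().entrySet()) {
                List<Integer> positions = term.getValue().get(docId);
                if (positions == null) {
                    continue; // term does not occur in this document
                }
                for (int pos : positions) {
                    byPosition.computeIfAbsent(pos, p -> new ArrayList<>()).add(term.getKey());
                }
            }
            if (byPosition.isEmpty()) {
                continue;
            }
            // Terms stacked at one position (synonyms, numeric precision terms) are joined.
            List<String> tokens = new ArrayList<>();
            for (List<String> terms : byPosition.values()) {
                tokens.add(String.join(" | ", terms));
            }
            result.put(field.getKey(), tokens);
        }
        return result;
    }
}
```

For a document whose `text` field was indexed as "quick fox" with "fast" injected as a synonym at position 0, and with each field's term map sorted (as a term dictionary is), `reconstruct(index, 1)` yields `text=[fast | quick, fox]`.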
[jira] [Updated] (SOLR-4310) If groups.ngroups is specified, the docList's numFound should be the number of groups
[ https://issues.apache.org/jira/browse/SOLR-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wooden updated SOLR-4310:
--
    Attachment: SOLR-4310_4.patch

Added another test class to check a few more use cases, both single-core and distributed. It also includes test cases that depend on SOLR-2894; these are commented out. Fixed an issue where numFound (the group count) would be 0 in a single-core edge case. Our tests suggest that distributed search works fine with the SOLR-4310 patch, unless Amit can recall the problem he alluded to previously.

On a semi-related note: while writing the additional tests, we noticed some inconsistent behavior around rows. It is neither caused by the SOLR-4310 patch nor pertinent to its purpose; it is just something we discovered. On a single core with group.limit > 1 and group.main, setting rows=10 returns 10 _documents_. A distributed setup with the same parameters returns 10 _groups_. A commented-out failing test case is included in the patch. If others can confirm it, we can open a new JIRA ticket for it.

> If groups.ngroups is specified, the docList's numFound should be the number of groups
>
> Key: SOLR-4310
> URL: https://issues.apache.org/jira/browse/SOLR-4310
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 4.1
> Reporter: Amit Nithian
> Assignee: Hoss Man
> Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4310_2.patch, SOLR-4310_3.patch, SOLR-4310_4.patch, SOLR-4310.patch
>
> If you group by a field, the response may look like this:
>   138
>   1
>   267038365
>   Larry's Grand Ole Garage Country Dance - Pure Country
> and if you specify group.main then the doclist becomes the result and you
> lose all context of the number of groups. If you want to keep your response
> format backwards compatible with clients (i.e. clients who don't know about
> the grouped format), setting group.main=true solves this BUT the numFound is
> the number of raw matches instead of the number of groups. This may have
> downstream consequences.
> I'd like to propose that if the user specifies ngroups=true then when
> creating the returned DocSlice, set the numFound to be the number of groups
> instead of the number of raw matches to keep the response consistent with
> what the user would expect.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
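The SOLR-4310 proposal and the rows discrepancy can both be modeled in a few lines. This is a hypothetical Java 16+ sketch, not Solr's grouping code: `groups` maps each group value to its matching document IDs in rank order, `groupLimit` plays the role of group.limit, `rowsCountGroups` switches between rows limiting documents (the single-core behavior reported above) and rows limiting groups (the distributed behavior reported above), and `ngroups` selects whether numFound reports the group count (the proposal) or the raw match count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of group.main flattening; names are illustrative, not Solr classes. */
class GroupMainSketch {

    /** The flattened doclist plus the numFound that would be reported with it. */
    record FlatResult(List<String> docs, int numFound) {}

    static FlatResult flatten(Map<String, List<String>> groups, int groupLimit, int rows,
                              boolean rowsCountGroups, boolean ngroups) {
        List<String> docs = new ArrayList<>();
        int rawMatches = 0;
        int groupsTaken = 0;
        for (List<String> members : groups.values()) {
            rawMatches += members.size(); // every match counts, even if not returned
            if (rowsCountGroups && groupsTaken >= rows) {
                continue; // rows caps the number of groups
            }
            groupsTaken++;
            for (String doc : members.subList(0, Math.min(groupLimit, members.size()))) {
                if (!rowsCountGroups && docs.size() >= rows) {
                    break; // rows caps the number of documents
                }
                docs.add(doc);
            }
        }
        // Proposal: with ngroups=true, report the number of groups instead of raw matches.
        int numFound = ngroups ? groups.size() : rawMatches;
        return new FlatResult(docs, numFound);
    }
}
```

With groups A=[a1, a2, a3], B=[b1], C=[c1, c2] and groupLimit=2, rows=2 returns [a1, a2] when rows counts documents but [a1, a2, b1] when it counts groups; with `ngroups` enabled numFound is 3 rather than 6.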
[jira] [Commented] (SOLR-4310) If groups.ngroups is specified, the docList's numFound should be the number of groups
[ https://issues.apache.org/jira/browse/SOLR-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702504#comment-13702504 ]

John Wooden commented on SOLR-4310:
---

Amit: Could you elaborate on the remaining issues with distributed search?

> If groups.ngroups is specified, the docList's numFound should be the number of groups
>
> Key: SOLR-4310
> URL: https://issues.apache.org/jira/browse/SOLR-4310
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 4.1
> Reporter: Amit Nithian
> Assignee: Hoss Man
> Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4310_2.patch, SOLR-4310_3.patch, SOLR-4310.patch
>
> If you group by a field, the response may look like this:
>   138
>   1
>   267038365
>   Larry's Grand Ole Garage Country Dance - Pure Country
> and if you specify group.main then the doclist becomes the result and you
> lose all context of the number of groups. If you want to keep your response
> format backwards compatible with clients (i.e. clients who don't know about
> the grouped format), setting group.main=true solves this BUT the numFound is
> the number of raw matches instead of the number of groups. This may have
> downstream consequences.
> I'd like to propose that if the user specifies ngroups=true then when
> creating the returned DocSlice, set the numFound to be the number of groups
> instead of the number of raw matches to keep the response consistent with
> what the user would expect.
[jira] [Updated] (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)
[ https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wooden updated SOLR-1837:
--
    Attachment: SOLR-1837_WithHandler.patch

I've updated this patch to use a request handler rather than a JSP. The patch is also confirmed to work with 4.2.1. Performance is still quite slow; the SolrDocReconstructor class hasn't changed much since the prior version.

-- How to use --
1. Add the handler to your config:
2. Sample call: /solr/coreX/admin/docinspector?documentid=12345
3. Wait. The time required varies with the size of the document and of the index. A large document in a large index may allow enough time for a doughnut & coffee run.
4. Sample output:
   0 x 12345 12345 true 16 test 2013-07-03T19:06:42.069Z 12345
   28 | 0 | 0 | 0
   17 | 0 | 0 | 0
   16 | 0 | 0 | 0
   test
   2013-07-03T19:06:42.069Z | 2013-07-03T19:06:42.048Z | 2013-07-03T19:05:40.096Z | 2013-07-03T14:46:48.064Z | 2013-06-01T13:49:27.424Z | 2004-11-03T19:53:47.776Z | 1970-01-01T00:00:00Z | 1970-01-01T00:00:00Z

> Reconstruct a Document (stored fields, indexed fields, payloads)
>
> Key: SOLR-1837
> URL: https://issues.apache.org/jira/browse/SOLR-1837
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis, web gui
> Affects Versions: 1.5
> Environment: All
> Reporter: Trey Grainger
> Priority: Minor
> Labels: admin, indexed, luke, payload, reconstruct, stored
> Fix For: 4.4
>
> Attachments: SOLR-1837.patch, SOLR-1837_WithHandler.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> One Solr feature I've been sorely in need of is the ability to inspect an
> index for any particular document. While the analysis page is good when you
> have specific content and a specific field/type you want to test the
> analysis process for, once a document is indexed it is not currently possible
> to easily see what is actually sitting in the index.
> One can use the Lucene Index Browser (Luke), but this has several limitations
> (GUI only, doesn't understand the Solr schema, doesn't display many non-text
> fields in human-readable format, doesn't show payloads, some bugs lead to
> missing terms, exposes features dangerous to use in a production Solr
> environment, slow or difficult to check from a remote location, etc.). The
> document reconstruction feature of Luke provides the base for what can become
> a much more powerful tool when coupled with Solr's understanding of a schema,
> however.
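A client for the DocInspector handler only needs the core name and the `documentid` parameter from the sample call. The following Java 11+ sketch builds that URL and sends the request with `java.net.http.HttpClient`. The base URL `http://localhost:8983/solr`, the class name, and the ten-minute timeout are assumptions; the timeout is generous because reconstructing a large document can take a long time. It also assumes the handler is registered at `/admin/docinspector` as in the sample call.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Hypothetical client helper for the patch's /admin/docinspector handler. */
class DocInspectorClient {

    /** Builds e.g. http://localhost:8983/solr/coreX/admin/docinspector?documentid=12345 */
    static URI buildUri(String solrBaseUrl, String core, String documentId) {
        // Drop a trailing slash so the path separator is not doubled.
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        // The document ID is form-encoded; the core name is assumed to be a plain identifier.
        String id = URLEncoder.encode(documentId, StandardCharsets.UTF_8);
        return URI.create(base + "/" + core + "/admin/docinspector?documentid=" + id);
    }

    /** Sends a GET and returns the raw response body; the long timeout allows for slow reconstruction. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofMinutes(10))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `fetch(buildUri("http://localhost:8983/solr", "coreX", "12345"))` against a core with the patch installed would issue the same request as the sample call.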
[jira] [Commented] (SOLR-3030) StopTermTypesFilter and Factory allows filtering based on the TermTypeAttribute
[ https://issues.apache.org/jira/browse/SOLR-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601675#comment-13601675 ]

John Wooden commented on SOLR-3030:
---

Thanks for pointing this out. We've now switched to using TypeTokenFilter.

> StopTermTypesFilter and Factory allows filtering based on the TermTypeAttribute
>
> Key: SOLR-3030
> URL: https://issues.apache.org/jira/browse/SOLR-3030
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Affects Versions: 4.0-ALPHA
> Reporter: Monica Skidmore
> Priority: Trivial
> Labels: features, newbie, patch
> Attachments: SOLR-3030 4.0.patch, SOLR-3030.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> This filter will remove terms based on the TermTypeAttribute, using a list of
> StopTermTypes from the "StopTermTypes" file specified by the user as an
> attribute of the filter factory in their schema.
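Both the proposed StopTermTypesFilter and TypeTokenFilter drop tokens according to their type. The core logic can be modeled without Lucene. In this Java 16+ sketch, a hypothetical `Token` record stands in for Lucene's attribute API. The stop-types list is read one type per line, skipping blank lines and lines starting with `#` (an assumption borrowed from the usual Solr word-list convention). An optional whitelist mode keeps only the listed types. Unlike a real token filter, the sketch does not adjust position increments for removed tokens. The types `<ALPHANUM>` and `<NUM>` used in the example are ones StandardTokenizer assigns.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical model of type-based token filtering; not Lucene's TokenFilter API. */
class TypeFilterSketch {

    /** Stand-in for a token with its type attribute. */
    record Token(String text, String type) {}

    /** Parses a stop-types file body: one type per line, blank lines and # comments skipped. */
    static Set<String> parseTypes(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .collect(Collectors.toSet());
    }

    /**
     * With useWhitelist=false, tokens whose type is listed are dropped (stop-type behavior);
     * with useWhitelist=true, only tokens whose type is listed are kept.
     */
    static List<Token> filter(List<Token> tokens, Set<String> types, boolean useWhitelist) {
        return tokens.stream()
                .filter(t -> types.contains(t.type()) == useWhitelist)
                .collect(Collectors.toList());
    }
}
```

With a stop-types list containing `<NUM>`, the tokens "solr" (`<ALPHANUM>`), "4" (`<NUM>`), "rocks" (`<ALPHANUM>`) are reduced to "solr" and "rocks"; in whitelist mode only "4" survives.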
[jira] [Updated] (SOLR-3030) StopTermTypesFilter and Factory allows filtering based on the TermTypeAttribute
[ https://issues.apache.org/jira/browse/SOLR-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wooden updated SOLR-3030:
--
    Attachment: SOLR-3030 4.0.patch

Updated this patch for 4.0.0

> StopTermTypesFilter and Factory allows filtering based on the TermTypeAttribute
>
> Key: SOLR-3030
> URL: https://issues.apache.org/jira/browse/SOLR-3030
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Affects Versions: 4.0-ALPHA
> Reporter: Monica Skidmore
> Priority: Trivial
> Labels: features, newbie, patch
> Attachments: SOLR-3030 4.0.patch, SOLR-3030.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> This filter will remove terms based on the TermTypeAttribute, using a list of
> StopTermTypes from the "StopTermTypes" file specified by the user as an
> attribute of the filter factory in their schema.