[jira] [Updated] (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2015-04-14 Thread John Wooden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Wooden updated SOLR-1837:
--
Attachment: SOLR-1837.patch

Added a simple UX for the document reconstructor. When a core is selected 
within the management console, a new "DocInspector" link appears. Provide a 
Document ID and the handler will be triggered.

Applies to Solr 4.10.4

> Reconstruct a Document (stored fields, indexed fields, payloads)
> 
>
> Key: SOLR-1837
> URL: https://issues.apache.org/jira/browse/SOLR-1837
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis, web gui
>Affects Versions: 1.5
> Environment: All
>Reporter: Trey Grainger
>Priority: Minor
>  Labels: admin, indexed, luke, payload, reconstruct, stored
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-1837.patch, SOLR-1837.patch, 
> SOLR-1837_WithHandler.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> One Solr feature I've been sorely in need of is the ability to inspect an 
> index for any particular document.  While the analysis page is good when you 
> have specific content and a specific field/type you want to test the 
> analysis process for, once a document is indexed it is not currently possible 
> to easily see what is actually sitting in the index.
> One can use the Lucene Index Browser (Luke), but this has several limitations 
> (GUI only, doesn't understand the Solr schema, doesn't display many non-text 
> fields in human readable format, doesn't show payloads, some bugs lead to 
> missing terms, exposes features dangerous to use in a production Solr 
> environment, slow or difficult to check from a remote location, etc.).  The 
> document reconstruction feature of Luke provides the base for what can become 
> a much more powerful tool when coupled with Solr's understanding of a schema, 
> however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4310) If groups.ngroups is specified, the docList's numFound should be the number of groups

2013-07-16 Thread John Wooden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Wooden updated SOLR-4310:
--

Attachment: SOLR-4310_4.patch

Added another test class to check a few more use cases, both single-core and 
distributed.  It also contains test cases that rely on SOLR-2894; these are 
commented out.

Fixed an issue where numFound (group count) would be 0 for a single-core edge 
case.

Our tests suggest that distributed search works fine with the 4310 patch, 
unless Amit recalls the issue he alluded to previously.

On a semi-related note:

When writing the additional tests, we noticed some inconsistent behavior around 
rows. It is neither a result of the 4310 patch nor pertinent to 4310's purpose, 
just something we discovered.

On a single core with group.limit > 1 and group.main, setting rows=10 will 
return 10 _documents_. A distributed setup with the same params will return 10 
_groups_. A commented-out failing test case is included in the patch. If others 
can confirm it, we can open a new JIRA ticket for it.
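
For anyone who wants to check the rows behavior by hand, here is a minimal sketch in Python. It only builds the request URL and counts what comes back. The host, core name and the group field "category" are placeholders, not values from the patch, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode


def grouped_main_query(base_url, group_field, rows=10, group_limit=2):
    """Build a select URL whose grouped result is flattened via group.main."""
    params = [
        ("q", "*:*"),
        ("group", "true"),
        ("group.field", group_field),       # placeholder field name
        ("group.limit", str(group_limit)),  # more than one doc per group
        ("group.main", "true"),             # flatten groups into one doc list
        ("rows", str(rows)),
    ]
    return base_url + "/select?" + urlencode(params)


def count_docs_and_groups(docs, group_field):
    """Return (number of documents, number of distinct group values)."""
    return len(docs), len({doc[group_field] for doc in docs})


if __name__ == "__main__":
    # Hypothetical host and core; adjust to your own setup.
    print(grouped_main_query("http://localhost:8983/solr/coreX", "category"))
```

Sending the same URL to a single core and to a distributed setup, then running count_docs_and_groups on each returned doc list, should show whether rows was applied to documents or to groups.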

> If groups.ngroups is specified, the docList's numFound should be the number 
> of groups
> -
>
> Key: SOLR-4310
> URL: https://issues.apache.org/jira/browse/SOLR-4310
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.1
>Reporter: Amit Nithian
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4310_2.patch, SOLR-4310_3.patch, SOLR-4310_4.patch, 
> SOLR-4310.patch
>
>
> If you group by a field, the response may look like this:
>   matches: 138
>   ngroups: 1
>   groupValue: 267038365
>   doclist: Larry's Grand Ole Garage Country Dance - Pure Country
> and if you specify group.main then the doclist becomes the result and you 
> lose all context of the number of groups. If you want to keep your response 
> format backwards compatible with clients (i.e. clients who don't know about 
> the grouped format), setting group.main=true solves this, but numFound is 
> then the number of raw matches instead of the number of groups. This may 
> have downstream consequences.
> I'd like to propose that if the user specifies group.ngroups=true, then when 
> creating the returned DocSlice, numFound is set to the number of groups 
> instead of the number of raw matches, keeping the response consistent with 
> what the user would expect.
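
The proposed numFound rule can be illustrated with a toy model in plain Python. This is not Solr code and not taken from the patch. The function name, the flag names and the field "event_id" are assumptions made for the example. Only the numbers (138 matches, 1 group) come from the description above.

```python
def num_found(matches, group_field, group_main=False, ngroups=False):
    """Toy model of the proposed rule.

    matches: list of dicts, one per raw matching document.
    With group.main and group.ngroups both set, report the number of
    distinct groups; otherwise report the number of raw matches.
    """
    if group_main and ngroups:
        return len({doc[group_field] for doc in matches})
    return len(matches)


if __name__ == "__main__":
    # 138 raw matches that all fall into a single group.
    matches = [{"event_id": 267038365} for _ in range(138)]
    print(num_found(matches, "event_id", group_main=True))                # 138
    print(num_found(matches, "event_id", group_main=True, ngroups=True))  # 1
```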

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-4310) If groups.ngroups is specified, the docList's numFound should be the number of groups

2013-07-08 Thread John Wooden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702504#comment-13702504
 ] 

John Wooden commented on SOLR-4310:
---

Amit: Could you elaborate on the remaining issues with distributed search?

> If groups.ngroups is specified, the docList's numFound should be the number 
> of groups
> -
>
> Key: SOLR-4310
> URL: https://issues.apache.org/jira/browse/SOLR-4310
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.1
>Reporter: Amit Nithian
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4310_2.patch, SOLR-4310_3.patch, SOLR-4310.patch
>




[jira] [Updated] (SOLR-1837) Reconstruct a Document (stored fields, indexed fields, payloads)

2013-07-08 Thread John Wooden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Wooden updated SOLR-1837:
--

Attachment: SOLR-1837_WithHandler.patch

I've updated this patch to use a handler rather than JSP. Patch is also 
confirmed working with 4.2.1.

Performance is still quite slow. The SolrDocReconstructor class hasn't changed 
much since the prior version.

-- How to use --

1. Add the handler to your config:



2. Sample call:

/solr/coreX/admin/docinspector?documentid=12345

3. Wait. Time required varies by size of document and index. A large document 
in a large index may allow enough time for a doughnut & coffee run.

4. Sample output:


   
  0
  x
   
   12345
   
  
 12345
 true
 16
 test
 2013-07-03T19:06:42.069Z
  
  
 12345
 28 | 0 | 0 | 0
 17 | 0 | 0 | 0
 16 | 0 | 0 | 0
 test
 2013-07-03T19:06:42.069Z | 2013-07-03T19:06:42.048Z 
| 2013-07-03T19:05:40.096Z | 2013-07-03T14:46:48.064Z | 
2013-06-01T13:49:27.424Z | 2004-11-03T19:53:47.776Z | 1970-01-01T00:00:00Z | 
1970-01-01T00:00:00Z
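
The sample call from step 2 can also be built programmatically. The sketch below is in Python and only constructs the URL. The path and the documentid parameter come from the steps above, while the base URL (host and port) and the function name are assumptions for illustration.

```python
from urllib.parse import urlencode


def docinspector_url(core, document_id, base_url="http://localhost:8983/solr"):
    """Build the docinspector request URL for one document in one core."""
    query = urlencode({"documentid": document_id})
    return f"{base_url}/{core}/admin/docinspector?{query}"


if __name__ == "__main__":
    # Hypothetical host; core name and ID match the sample call above.
    print(docinspector_url("coreX", 12345))
```

Given how slow the handler can be, any client fetching this URL would likely want a generous timeout.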
  
   


> Reconstruct a Document (stored fields, indexed fields, payloads)
> 
>
> Key: SOLR-1837
> URL: https://issues.apache.org/jira/browse/SOLR-1837
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis, web gui
>Affects Versions: 1.5
> Environment: All
>Reporter: Trey Grainger
>Priority: Minor
>  Labels: admin, indexed, luke, payload, reconstruct, stored
> Fix For: 4.4
>
> Attachments: SOLR-1837.patch, SOLR-1837_WithHandler.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h




[jira] [Commented] (SOLR-3030) StopTermTypesFilter and Factory allows filtering based on the TermTypeAttribute

2013-03-13 Thread John Wooden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601675#comment-13601675
 ] 

John Wooden commented on SOLR-3030:
---

Thanks for pointing this out. We've now switched to using TypeTokenFilter.

> StopTermTypesFilter and Factory allows filtering based on the 
> TermTypeAttribute
> ---
>
> Key: SOLR-3030
> URL: https://issues.apache.org/jira/browse/SOLR-3030
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 4.0-ALPHA
>Reporter: Monica Skidmore
>Priority: Trivial
>  Labels: features, newbie, patch
> Attachments: SOLR-3030 4.0.patch, SOLR-3030.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This filter will remove terms based on the TermTypeAttribute, using a list of 
> StopTermTypes from the "StopTermTypes" file specified by the user as an 
> attribute of the filter factory in their schema.
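
The filtering idea described above can be sketched in plain Python. This is a model of the behavior only, not the Lucene or Solr API and not the code from the patch. The function names and the file format (one type per line, '#' for comments) are assumptions. The type strings "<ALPHANUM>" and "<NUM>" are the ones StandardTokenizer assigns.

```python
def load_stop_types(lines):
    """Parse stop types, one per line; blank lines and '#' comments are skipped.

    The file format is an assumption made for this sketch.
    """
    types = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            types.add(line)
    return types


def filter_by_type(tokens, stop_types):
    """Drop every (term, type) pair whose type is in stop_types."""
    return [(term, typ) for term, typ in tokens if typ not in stop_types]


if __name__ == "__main__":
    stop_types = load_stop_types(["# drop numeric tokens", "<NUM>", ""])
    tokens = [("solr", "<ALPHANUM>"), ("42", "<NUM>"), ("index", "<ALPHANUM>")]
    print(filter_by_type(tokens, stop_types))
```

As the comment above notes, the stock TypeTokenFilter covers the same ground, so this model is only meant to show what is being filtered.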




[jira] [Updated] (SOLR-3030) StopTermTypesFilter and Factory allows filtering based on the TermTypeAttribute

2012-12-04 Thread John Wooden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Wooden updated SOLR-3030:
--

Attachment: SOLR-3030 4.0.patch

Updated this patch for 4.0.0

> StopTermTypesFilter and Factory allows filtering based on the 
> TermTypeAttribute
> ---
>
> Key: SOLR-3030
> URL: https://issues.apache.org/jira/browse/SOLR-3030
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 4.0-ALPHA
>Reporter: Monica Skidmore
>Priority: Trivial
>  Labels: features, newbie, patch
> Attachments: SOLR-3030 4.0.patch, SOLR-3030.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
