[jira] [Commented] (SOLR-5244) Full Search Result Export

Kranti Parisa (JIRA) Tue, 17 Sep 2013 14:36:47 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769993#comment-13769993
 ]


Kranti Parisa commented on SOLR-5244:
-------------------------------------

Joel,

In one of my emails to the dev group, I asked the following question:
-----
I am sure this doesn't exist today, but just wondering about your thoughts.

When we use Join queries (for the first time, or without hitting the Filter 
Cache) and set debug=true, we see a good amount of debug info in the response. 

Do we have any plans to support this debug info even when we hit the Filter 
Cache? I believe this information would be helpful whether or not we hit the 
caches.

Consider this use case: in production, a request comes in and populates the 
Filter Cache for a Join query. If we later run that query manually with debug 
turned on, it hits the cache and we can't see a bunch of very useful 
stats/numbers. 
---

Will ExportQParserPlugin save the BitSet into the request context even when we 
hit the caches? Saving the BitSets into the request context is very helpful 
when we do Joins because, when we write the response, we want to specify for 
each document which cores it matched for the given criteria/filters.

So I think it would also be a good idea to support an extra local param in the 
new join implementations (SOLR-4787), say matchFlag="true". If it is true, save 
the BitSet into the request context (even in the case of a cache hit). By 
default it can be "false", so that we don't need to keep the BitSet in memory.

Example response:
<doc>
<long name="id">111</long>
<str name="title">my title</str>
<arr name="joinMatches">
<str>coreA</str>
<str>coreB</str>
</arr>
</doc>

I was able to achieve that by saving the BitSet into the join debug info, but I 
was not able to cover the cache-hit case. I think your idea of saving it into 
the request context makes more sense.
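
To make the proposal concrete, here is a rough, self-contained Java sketch of 
the idea. It is not Solr code: the class JoinMatchSketch, its methods, and the 
map standing in for the request context are all hypothetical, and the doc ids 
are plain ints standing in for internal Lucene doc ids (not the "id" field 
shown in the example response). It only shows how per-core BitSets, kept when 
matchFlag is true, could be turned into the joinMatches list at response time.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names exist in Solr.
class JoinMatchSketch {

    // Stand-in for the request context: core name -> docs matched by that join.
    private final Map<String, BitSet> matchesByCore = new LinkedHashMap<>();

    // Keep the BitSet only when matchFlag is true. With the proposed default
    // of "false", nothing extra is held in memory.
    void recordJoin(String coreName, BitSet matchedDocs, boolean matchFlag) {
        if (matchFlag) {
            matchesByCore.put(coreName, (BitSet) matchedDocs.clone());
        }
    }

    // At response-writing time: list the cores whose join matched this doc.
    List<String> joinMatches(int docId) {
        List<String> cores = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : matchesByCore.entrySet()) {
            if (entry.getValue().get(docId)) {
                cores.add(entry.getKey());
            }
        }
        return cores;
    }

    public static void main(String[] args) {
        BitSet fromCoreA = new BitSet();
        fromCoreA.set(3);
        fromCoreA.set(7);
        BitSet fromCoreB = new BitSet();
        fromCoreB.set(7);

        JoinMatchSketch context = new JoinMatchSketch();
        context.recordJoin("coreA", fromCoreA, true);
        context.recordJoin("coreB", fromCoreB, true);

        // Doc 7 was matched by both joins, doc 3 only by the coreA join.
        System.out.println(context.joinMatches(7));
        System.out.println(context.joinMatches(3));
    }
}
```

Because the BitSet is recorded independently of how the join result was 
obtained, the same lookup would work whether the filter was computed fresh or 
served from the Filter Cache, which is the case I could not cover through the 
debug info.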

Your thoughts?
                
> Full Search Result Export
> -------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-5244.patch
>
>
> It would be great if Solr could efficiently export entire search result sets 
> without scoring or ranking documents. This would allow external systems to 
> perform rapid bulk imports from Solr. It also provides a possible platform 
> for exporting results to support distributed join scenarios within Solr.
> This ticket provides a patch that has two pluggable components:
> 1) ExportQParserPlugin: which is a post filter that gathers a BitSet with 
> document results and does not delegate to ranking collectors. Instead it puts 
> the BitSet on the request context.
> 2) BinaryExportWriter: an output writer that iterates the BitSet and prints 
> the entire result as a binary stream. A header is provided at the beginning 
> of the stream so external clients can self configure.
> Note:
> These two components will be sufficient for a non-distributed environment. 
> For distributed export a new Request handler will need to be developed.
> After applying the patch and building the dist or example, you can register 
> the components through the following changes to solrconfig.xml
> Register export contrib libraries:
> <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
>  
> Register the "export" queryParser with the following line:
>  
> <queryParser name="export" 
> class="org.apache.solr.export.ExportQParserPlugin"/>
>  
> Register the "xbin" writer:
>  
> <queryResponseWriter name="xbin" 
> class="org.apache.solr.export.BinaryExportWriter"/>
>  
> The following query will perform the export:
> {code}
> http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
> {code}
> Initial patch supports export of four data-types:
> 1) Single value trie int, long and float
> 2) Binary doc values.
> The numerics are currently exported from the FieldCache and the Binary doc 
> values can be in memory or on disk.
> Since this is designed to export very large result sets efficiently, stored 
> fields are not used for the export.
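
The export path described in the quoted issue (gather a BitSet, then stream it 
with a header clients can read first) can be sketched roughly as below. This is 
an illustration under stated assumptions, not the patch's actual code or wire 
format: the class BitSetExportSketch, the IntValues interface, and the header 
layout (field name followed by a record count) are all made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical sketch only; the real BinaryExportWriter format is not shown here.
class BitSetExportSketch {

    // Stand-in for a per-document value source such as the FieldCache.
    interface IntValues {
        int get(int docId);
    }

    static byte[] export(BitSet docs, String fieldName, IntValues values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Assumed header: field name, then the number of records to expect.
            out.writeUTF(fieldName);
            out.writeInt(docs.cardinality());
            // Walk the set bits in doc id order; no scoring or ranking involved.
            for (int doc = docs.nextSetBit(0); doc >= 0; doc = docs.nextSetBit(doc + 1)) {
                out.writeInt(values.get(doc));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A call such as BitSetExportSketch.export(docs, "join_i", lookup) would return 
the header followed by one 4-byte int per matching document.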

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
