[jira] [Updated] (SOLR-5244) Exporting Full Sorted Result Sets

Joel Bernstein (JIRA) Wed, 13 Aug 2014 14:10:30 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joel Bernstein updated SOLR-5244:
---------------------------------

    Attachment: SOLR-5244.patch

Made the /export init properties invariants and added distrib=false to ensure 
that the requests aren't distributed when in SolrCloud mode.

In the initial release, developers can create SolrCloud-aware clients that query 
each node in the cluster and merge the results in any way they see fit. 
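
As a rough illustration of such a client, the sketch below builds one /export 
URL per node and merges per-node results that are already sorted. The node 
addresses, the function names, and the in-memory lists standing in for parsed 
responses are all assumptions made for the example; the actual response 
format of the handler is not shown here.

```python
import heapq
from urllib.parse import urlencode


def export_url(node_base, collection, fields, sort):
    """Build an /export URL for a single node.

    Mirrors the syntax given in the issue description. `node_base` is a
    hypothetical address such as "http://node1:8983".
    """
    params = {"q": "*:*", "fl": ",".join(fields), "sort": sort}
    return f"{node_base}/solr/{collection}/export?{urlencode(params)}"


def merge_sorted_desc(streams, field):
    """K-way merge of per-node streams, each already sorted descending on `field`.

    heapq.merge is lazy, so only one pending row per stream is held in memory.
    """
    return heapq.merge(*streams, key=lambda doc: doc[field], reverse=True)


# Hypothetical node addresses; a real client would discover them from the cluster.
nodes = ["http://node1:8983", "http://node2:8983"]
urls = [export_url(n, "collection1", ["a", "b", "c"], "a desc,b desc") for n in nodes]

# Stand-ins for the rows each node would return for sort=a desc.
node1_rows = [{"a": 9}, {"a": 5}, {"a": 1}]
node2_rows = [{"a": 8}, {"a": 5}, {"a": 2}]
merged = list(merge_sorted_desc([node1_rows, node2_rows], "a"))
```

Because each node returns its slice in sort order, the client never has to 
re-sort; it only compares the head of each stream.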

> Exporting Full Sorted Result Sets
> ---------------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0, 4.10
>
>         Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch, 
> SOLR-5244.patch, SOLR-5244.patch, SOLR-5244.patch, SOLR-5244.patch, 
> SOLR-5244.patch
>
>
> This ticket allows Solr to export full sorted result sets. A new export 
> request handler has been created that sets up the default writer type 
> (SortingResponseWriter) and the required rank query (ExportQParserPlugin). 
> The syntax is:
> {code}
> /solr/collection1/export?q=*:*&fl=a,b,c&sort=a desc,b desc
> {code}
> This capability will open up Solr for a whole range of uses that were 
> typically done using aggregation engines like Hadoop. For example:
> *Large Distributed Joins*
> A client outside of Solr calls two different Solr collections, each of which 
> returns its results sorted by a join key. The client iterates through both 
> streams and performs a merge join.
> *Fully Distributed Field Collapsing/Grouping*
> A client outside of Solr makes individual calls to all the servers in a 
> single collection, each of which returns results sorted by the collapse key. 
> The client merge joins the sorted lists on the collapse key to perform the 
> field collapse.
> *High Cardinality Distributed Aggregation*
> A client outside of Solr makes individual calls to all the servers in a 
> single collection and sorts on a high cardinality field. The client then 
> merge joins the sorted lists to perform the high cardinality aggregation.
> *Large Scale Time Series Rollups*
> A client outside Solr makes individual calls to all servers in a collection 
> and sorts on time dimensions. The client merge joins the sorted result sets 
> and rolls up the time dimensions as it iterates through the data.
> In these scenarios Solr is being used as a distributed sorting engine. 
> Developers can write clients that take advantage of this sorting capability 
> in any way they wish.
> *Session Analysis and Aggregation*
> A client outside Solr makes individual calls to all servers in a collection 
> and sorts on the sessionID. The client merge joins the sorted results and 
> aggregates sessions as it iterates through the results.
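
The merge join that these scenarios rely on can be sketched as follows. This 
is a minimal illustration, assuming both inputs are iterables of dicts already 
sorted ascending on the join key; the function name, field names, and sample 
rows are hypothetical and not part of the patch.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key):
    """Inner merge join of two iterables already sorted ascending on `key`.

    Rows are grouped by key on each side; only the right-hand rows for the
    current key are buffered, so memory use stays small for streamed input.
    """
    get = itemgetter(key)
    left_groups = groupby(left, key=get)
    right_groups = groupby(right, key=get)
    done = (None, None)  # sentinel returned when a side is exhausted
    lk, lg = next(left_groups, done)
    rk, rg = next(right_groups, done)
    while lg is not None and rg is not None:
        if lk < rk:
            lk, lg = next(left_groups, done)    # left key has no match, skip it
        elif lk > rk:
            rk, rg = next(right_groups, done)   # right key has no match, skip it
        else:
            right_rows = list(rg)               # buffer one key's worth of rows
            for l_row in lg:
                for r_row in right_rows:
                    yield {**l_row, **r_row}
            lk, lg = next(left_groups, done)
            rk, rg = next(right_groups, done)


# Stand-ins for two exported streams sorted on the join key "id".
orders = [{"id": 1, "item": "x"}, {"id": 2, "item": "y"},
          {"id": 2, "item": "z"}, {"id": 4, "item": "w"}]
users = [{"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}, {"id": 4, "name": "di"}]
joined = list(merge_join(orders, users, "id"))
```

If the streams are exported with a descending sort, as in the syntax example 
in the description, the two comparisons would need to be reversed.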



--
This message was sent by Atlassian JIRA
(v6.2#6252)

