[jira] [Updated] (SOLR-5244) Exporting Full Sorted Result Sets

Joel Bernstein (JIRA) Wed, 13 Aug 2014 11:34:25 -0700

     [ https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joel Bernstein updated SOLR-5244:
---------------------------------

    Description: 
This ticket allows Solr to export full sorted result sets. A new export request 
handler has been created that sets up the default writer type 
(SortingResponseWriter) and the required rank query (ExportQParserPlugin). The 
syntax is:

{code}
/solr/collection1/export?q=*:*&fl=a,b,c&sort=a desc,b desc
{code}
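
As a rough illustration of the syntax above, a client could assemble the 
request URL like this. The helper name build_export_url and the base URL 
http://localhost:8983/solr are assumptions made for the example and are not 
part of the patch; only the /export path and the q, fl and sort parameters 
come from this ticket.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_export_url(base_url, collection, query, fields, sort):
    """Build a URL for the /export handler (helper name is illustrative)."""
    # urlencode percent-encodes the values, so the spaces in the sort
    # specification are safe to send over HTTP.
    params = urlencode({"q": query, "fl": ",".join(fields), "sort": sort})
    return f"{base_url}/{collection}/export?{params}"


# Assumed local Solr instance; adjust host, port and collection as needed.
url = build_export_url("http://localhost:8983/solr", "collection1",
                       "*:*", ["a", "b", "c"], "a desc,b desc")

# Decoding the query string recovers the parameters from the example above.
decoded = parse_qs(urlsplit(url).query)
```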

This capability opens up Solr to a whole range of uses that have typically 
been handled by aggregation engines such as Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr queries two different Solr collections, each of which 
returns its results sorted by a join key. The client iterates through both 
streams and performs a merge join.
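
A minimal sketch of the client-side merge join, assuming each stream has 
already been parsed into dicts and is sorted ascending on the join key (a 
descending export would need the comparisons reversed). The function name 
merge_join and the sample field names are hypothetical; no Solr client API 
is implied.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key):
    """Inner-join two iterables of dicts, both sorted ascending on `key`."""
    get = itemgetter(key)
    # Group consecutive rows sharing a key so duplicate keys join correctly.
    lgroups = groupby(left, key=get)
    rgroups = groupby(right, key=get)
    lk, lg = next(lgroups, (None, None))
    rk, rg = next(rgroups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            lk, lg = next(lgroups, (None, None))
        elif lk > rk:
            rk, rg = next(rgroups, (None, None))
        else:
            # Only the rows for the current key are held in memory.
            right_rows = list(rg)
            for lrow in lg:
                for rrow in right_rows:
                    yield {**lrow, **rrow}
            lk, lg = next(lgroups, (None, None))
            rk, rg = next(rgroups, (None, None))


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "z"}]
right = [{"id": 2, "b": "p"}, {"id": 2, "b": "q"},
         {"id": 3, "b": "r"}, {"id": 4, "b": "s"}]
joined = list(merge_join(left, right, "id"))
```

Because both inputs arrive sorted, the join makes a single pass over each 
stream and never has to buffer more than one key's worth of rows.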

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr calls each server in a single collection 
individually, and each server returns its results sorted by the collapse key. 
The client merge joins the sorted lists on the collapse key to perform the 
field collapse.
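
One way a client might do this is sketched below, under the assumptions that 
every server's stream is sorted ascending on the collapse key and that the 
group head is the document with the highest value of some ranking field. The 
k-way merge uses heapq.merge from the Python standard library; the names 
collapse, group_id and score are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def collapse(shard_streams, collapse_key, rank_field):
    """Yield one document per collapse key: the one ranked highest."""
    get_key = itemgetter(collapse_key)
    # Lazily merge the per-server sorted streams into one sorted stream.
    merged = heapq.merge(*shard_streams, key=get_key)
    for _, group in groupby(merged, key=get_key):
        yield max(group, key=itemgetter(rank_field))


shard1 = [{"group_id": "a", "score": 1}, {"group_id": "b", "score": 5}]
shard2 = [{"group_id": "a", "score": 3}, {"group_id": "c", "score": 2}]
heads = list(collapse([shard1, shard2], "group_id", "score"))
```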

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.
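
The aggregation step could look roughly like the following, assuming streams 
sorted ascending on the aggregation field. Since equal keys are adjacent 
after the merge, only one running count and sum is held at a time, regardless 
of how many distinct keys exist. The names aggregate, user and bytes are 
hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def aggregate(shard_streams, key_field, value_field):
    """Yield (key, count, total) per distinct key across all streams."""
    get_key = itemgetter(key_field)
    merged = heapq.merge(*shard_streams, key=get_key)
    for key, group in groupby(merged, key=get_key):
        count = 0
        total = 0
        for doc in group:
            count += 1
            total += doc[value_field]
        yield key, count, total


shard1 = [{"user": "u1", "bytes": 10}, {"user": "u3", "bytes": 7}]
shard2 = [{"user": "u1", "bytes": 5}, {"user": "u2", "bytes": 1}]
totals = list(aggregate([shard1, shard2], "user", "bytes"))
```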

*Large Scale Time Series Rollups*

A client outside Solr makes individual calls to all servers in a collection and 
sorts on time dimensions. The client merge joins the sorted result sets and 
rolls up the time dimensions as it iterates through the data.
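
As an illustration, suppose the time dimensions are stored in separate year, 
month and day fields and each server sorts ascending on all three. These 
field names, and the function name rollup, are assumptions for the sketch. 
The client can then emit month and year totals in one pass:

```python
import heapq
from itertools import groupby


def time_key(doc):
    # Must match the sort order requested from every server.
    return (doc["year"], doc["month"], doc["day"])


def rollup(shard_streams, value_field):
    """Yield ("month", y, m, total) rows, then a ("year", y, None, total)."""
    merged = heapq.merge(*shard_streams, key=time_key)
    for year, year_docs in groupby(merged, key=lambda d: d["year"]):
        year_total = 0
        for month, month_docs in groupby(year_docs, key=lambda d: d["month"]):
            month_total = sum(d[value_field] for d in month_docs)
            year_total += month_total
            yield ("month", year, month, month_total)
        yield ("year", year, None, year_total)


shard1 = [{"year": 2014, "month": 1, "day": 1, "hits": 1},
          {"year": 2014, "month": 2, "day": 1, "hits": 2}]
shard2 = [{"year": 2014, "month": 1, "day": 5, "hits": 10},
          {"year": 2015, "month": 1, "day": 1, "hits": 100}]
rows = list(rollup([shard1, shard2], "hits"))
```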

*Session Analysis and Aggregation*

A client outside Solr makes individual calls to all servers in a collection and 
sorts on the sessionID. The client merge joins the sorted results and 
aggregates sessions as it iterates through the results.

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.

  was:
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically 
done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr calls two different Solr collections and returns the 
results sorted by a join key. The client iterates through both streams and 
performs a merge join.

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr makes individual calls to all the servers in a single 
collection and returns results sorted by the collapse key. The client merge 
joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.

*Large Scale Time Series Rollups*

A client outside Solr makes individual calls to all servers in a collection and 
sorts on time dimensions. The client merge joins the sorted result sets and 
rolls up the time dimensions as it iterates through the data.

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.

*Session Analysis and Aggregation*

A client outside Solr makes individual calls to all servers in a collection and 
sorts on the sessionID. The client merge joins the sorted results and 
aggregates sessions as it iterates through the results.


> Exporting Full Sorted Result Sets
> ---------------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0, 4.10
>
>         Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch, 
> SOLR-5244.patch, SOLR-5244.patch, SOLR-5244.patch, SOLR-5244.patch


