[
https://issues.apache.org/jira/browse/SOLR-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234128#comment-15234128
]
Joel Bernstein edited comment on SOLR-8962 at 4/10/16 2:02 PM:
---------------------------------------------------------------
Looks good.
One thing we can consider in future implementations is a fork/join merge
sort. The gatherNodes function is going to return a stream of Tuples that
contains long runs of pre-sorted Tuples, because the /export handler returns
the nodes already sorted. But because the traversal is done in batches, the
stream as a whole is a sequence of sorted runs rather than one fully sorted
sequence. I suspect this will work nicely with a fork/join merge sort, plus we
get the threading. In my testing, sorting scales really nicely in parallel
because the memory locality of sorts is very tight, so you don't become memory
bound.
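
To make the idea concrete, here is a minimal sketch of merging pre-sorted runs with Java's fork/join framework. It is not the code in the attached patch and not an existing Solr class: the names RunMergeSort, findRuns and MergeRuns are hypothetical, and plain long values stand in for Tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration only; longs stand in for Solr Tuples.
final class RunMergeSort {

    // Split the input into maximal ascending runs (the pre-sorted batches).
    static List<long[]> findRuns(long[] values) {
        List<long[]> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= values.length; i++) {
            if (i == values.length || values[i] < values[i - 1]) {
                runs.add(Arrays.copyOfRange(values, start, i));
                start = i;
            }
        }
        return runs;
    }

    // Standard two-way merge of two ascending arrays.
    static long[] merge(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    // Fork/join task: merge runs[lo, hi) by merging each half in parallel.
    static final class MergeRuns extends RecursiveTask<long[]> {
        private final List<long[]> runs;
        private final int lo;
        private final int hi;

        MergeRuns(List<long[]> runs, int lo, int hi) {
            this.runs = runs;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected long[] compute() {
            if (hi - lo == 0) return new long[0];
            if (hi - lo == 1) return runs.get(lo);
            int mid = (lo + hi) >>> 1;
            MergeRuns left = new MergeRuns(runs, lo, mid);
            MergeRuns right = new MergeRuns(runs, mid, hi);
            left.fork();                      // merge the left half asynchronously
            long[] r = right.compute();       // merge the right half in this thread
            return merge(left.join(), r);
        }
    }

    // Sort by detecting the existing runs and merging them on the common pool.
    static long[] sort(long[] values) {
        List<long[]> runs = findRuns(values);
        return ForkJoinPool.commonPool().invoke(new MergeRuns(runs, 0, runs.size()));
    }
}
```

Because the runs are already ordered, the only work left is merging, and each pair of runs can be merged on a separate thread.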
> Add sort Streaming Expression
> -----------------------------
>
> Key: SOLR-8962
> URL: https://issues.apache.org/jira/browse/SOLR-8962
> Project: Solr
> Issue Type: New Feature
> Reporter: Joel Bernstein
> Priority: Critical
> Fix For: 6.1
>
> Attachments: SOLR-8962.patch
>
>
> The sort Streaming Expression does an in-memory sort of the Tuples returned
> by its underlying stream. This is intended to be used for sorting sets
> gathered during local graph traversals. This will make it easy to gather sets
> during a traversal and use all of the sort-based set operations (merge,
> innerJoin, outerJoin, reduce, complement, intersect).
> This will be particularly useful with the gatherNodes expression (SOLR-8925).
> Sample syntax:
> {code}
> intersect(
>     sort(gatherNodes(...), "fieldA asc"),
>     sort(gatherNodes(...), "fieldA asc"),
>     on)
> {code}
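
For readers unfamiliar with why the sort-based set operations need sorted input, the following is a rough sketch of the pattern in the sample syntax above: buffer and sort each set in memory, then intersect the two in a single merge-style pass. It is an illustration under assumptions, not the patch or Solr's stream classes: SortedIntersectSketch and its methods are hypothetical, and String node ids stand in for Tuples keyed on fieldA.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; Strings stand in for Tuples keyed on one field.
final class SortedIntersectSketch {

    // In-memory sort: copy the whole input, then order it ascending.
    static List<String> sortAsc(List<String> unsorted) {
        List<String> copy = new ArrayList<>(unsorted);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Merge-style intersect: both inputs must already be ascending.
    // Emits the entries of left whose key also occurs in right.
    static List<String> intersect(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                i++;                    // left key is smaller, no match for it
            } else if (cmp > 0) {
                j++;                    // right key is smaller, skip it
            } else {
                out.add(left.get(i));   // keys match, keep the left entry
                i++;
            }
        }
        return out;
    }
}
```

The single pass only works because both inputs are in the same order, which is what wrapping each gathered set in a sort provides.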