[
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785486#comment-15785486
]
Joel Bernstein edited comment on SOLR-9636 at 12/29/16 3:15 PM:
----------------------------------------------------------------
I added a new NullStream to test the performance of exporting and sorting on a
high-cardinality field. High-cardinality exporting and sorting is an important
real-world use case for supporting distributed joins on primary keys. The query
looks like this:
{code}
parallel(collection2, workers=7, sort="count desc",
         null(search(collection1,
                     q=*:*,
                     fl="id",
                     sort="id desc",
                     qt="/export",
                     wt="javabin",
                     partitionKeys=id)))
{code}
Notice the new *null* function, which consumes the tuples and returns a count to
verify the number of tuples processed.
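The idea behind the null function can be sketched outside Solr as a sink that drains a tuple stream and reports only how many tuples it saw. The Python below is an illustrative analogy under that assumption, not the NullStream implementation; `fake_export` and `null_sink` are hypothetical names invented for this sketch.

```python
from typing import Dict, Iterable, Iterator


def fake_export(n: int) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for an /export result set: n tuples with unique ids."""
    for i in range(n):
        yield {"id": str(i)}


def null_sink(tuples: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Read every tuple, discard it, and return one tuple carrying the count."""
    count = 0
    for _ in tuples:
        count += 1
    return {"count": count}


if __name__ == "__main__":
    print(null_sink(fake_export(1000)))
```

Because the sink does no work per tuple, the time it takes to drain the stream is dominated by whatever produces the tuples, which is what makes it useful for measuring the export side alone.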
The test query sorts on the id field, which has a unique value in each
record. Again, performance was impressive:
* With json: 1,210,000 Tuples per second.
* With javabin: 1,350,000 Tuples per second.
So the ExportWriter does not slow down when sorting on a high-cardinality field.
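To put the two figures side by side, the relative difference between javabin and json can be worked out directly from the numbers reported above. This is only arithmetic on those two measurements; `relative_gain` is a helper name chosen for this sketch.

```python
# Throughput figures as reported in the comment above (tuples per second).
JSON_TUPLES_PER_SEC = 1_210_000
JAVABIN_TUPLES_PER_SEC = 1_350_000


def relative_gain(baseline: float, candidate: float) -> float:
    """Fractional increase of candidate over baseline."""
    return (candidate - baseline) / baseline


gain = relative_gain(JSON_TUPLES_PER_SEC, JAVABIN_TUPLES_PER_SEC)
print(f"javabin vs json: {gain:.1%} more tuples per second")
```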
Going forward, the NullStream will be useful for testing the raw performance of
the ExportWriter in isolation. This will help developers diagnose where the
bottleneck is if distributed joins aren't performing as expected.
For example, if a join is slow but the same export using the NullStream is
fast, then you can be sure that the bottleneck is not in the ExportWriter and
is likely in the join stream.
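The diagnostic reasoning above can be written down as a small decision helper. This is a sketch only: `likely_bottleneck` and the `min_acceptable_tps` threshold are hypothetical, and what counts as "slow" is an assumption the caller has to supply.

```python
def likely_bottleneck(export_tps: float, join_tps: float, min_acceptable_tps: float) -> str:
    """Classify where to look first, given throughput in tuples per second.

    export_tps: throughput of the export wrapped in the null function.
    join_tps: throughput of the full join over the same export.
    min_acceptable_tps: hypothetical threshold below which a run counts as slow.
    """
    if join_tps >= min_acceptable_tps:
        return "join is not slow"
    if export_tps >= min_acceptable_tps:
        # Slow join but fast export: the export side is ruled out.
        return "likely the join stream"
    # Both are slow: the export path cannot be ruled out.
    return "export path not ruled out"


if __name__ == "__main__":
    print(likely_bottleneck(export_tps=1_350_000, join_tps=200_000, min_acceptable_tps=1_000_000))
```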
> Add support for javabin for /stream, /sql internode communication
> -----------------------------------------------------------------
>
> Key: SOLR-9636
> URL: https://issues.apache.org/jira/browse/SOLR-9636
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9636.patch
>
>
> currently it uses json, which is verbose and slow