[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785486#comment-15785486
 ] 

Joel Bernstein edited comment on SOLR-9636 at 12/29/16 3:15 PM:
----------------------------------------------------------------

I added a new NullStream to test the performance of exporting and sorting on a 
high-cardinality field. High-cardinality exporting and sorting is an important 
real-world use case for supporting distributed joins on primary keys. The query 
looks like this:
 
{code}
parallel(collection2, workers=7, sort="count desc", 
      null(search(collection1, 
                   q=*:*, 
                   fl="id", 
                   sort="id desc", 
                   qt="/export", 
                   wt="javabin", 
                   partitionKeys=id)))
{code}

Notice the new *null* function, which eats the tuples and returns a count to 
verify the number of tuples processed.

The test query sorts on the id field, which has a unique value in each 
record. Again, performance was impressive:

* With json: 1,210,000 tuples per second.
* With javabin: 1,350,000 tuples per second.

So the ExportWriter doesn't slow down when sorting on a high-cardinality field 
(javabin was roughly 12% faster than json in this test).

Going forward, the NullStream will be useful for testing the raw performance of 
the ExportWriter in isolation. This will help developers diagnose where the 
bottleneck is if distributed joins aren't performing as expected.

For example, if a join is slow but the same export using the NullStream is 
fast, then you can be sure that the bottleneck is not in the ExportWriter and 
is likely in the join stream.
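
As a rough sketch of that comparison: the innerJoin expression and the 
collection3 name below are illustrative assumptions, not part of the test 
above. Suppose the slow query is a parallel join such as:

{code}
parallel(collection2, workers=7, sort="id desc", 
      innerJoin(search(collection1, 
                       q=*:*, 
                       fl="id", 
                       sort="id desc", 
                       qt="/export", 
                       wt="javabin", 
                       partitionKeys=id), 
                search(collection3, 
                       q=*:*, 
                       fl="id", 
                       sort="id desc", 
                       qt="/export", 
                       wt="javabin", 
                       partitionKeys=id), 
                on="id"))
{code}

Each side of the join can then be run on its own, wrapped in *null* exactly as 
in the test query, for example for the collection3 side:

{code}
parallel(collection2, workers=7, sort="count desc", 
      null(search(collection3, 
                   q=*:*, 
                   fl="id", 
                   sort="id desc", 
                   qt="/export", 
                   wt="javabin", 
                   partitionKeys=id)))
{code}

If both exports finish quickly while the join does not, the time is being 
spent in the join rather than in the export.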





> Add support for javabin for /stream, /sql internode communication
> -----------------------------------------------------------------
>
>                 Key: SOLR-9636
>                 URL: https://issues.apache.org/jira/browse/SOLR-9636
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: master (7.0), 6.4
>
>         Attachments: SOLR-9636.patch
>
>
> Currently it uses JSON, which is verbose and slow.


