[ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
---------------------------------
    Description: 
Currently the topic() function doesn't accept a partitionKeys parameter like 
the search() function does. This means the topic() function can't be wrapped by 
the parallel() function to run across worker nodes.

It would be useful to support parallelizing the topic() function because it 
would provide a general-purpose, parallelized approach for processing batches 
of data as they enter the index.

For example, this would allow a classify() function to be wrapped around a 
topic() function to classify documents in parallel across worker nodes.

Sample syntax:

{code}
parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
{code}

The example above would send a daemon to worker nodes that would classify all 
new documents returned by the topic() function. The update function would send 
the output of classify() to a SolrCloud collection for indexing.

The partitionKeys parameter would ensure that each worker receives its own 
partition of the results returned by the topic() function, which is what 
allows the classify() function to run in parallel.
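
As a rough illustration of the partitioning idea, the sketch below splits a 
batch of new documents across workers by hashing the partition key. It is not 
Solr code: partition_for, worker_slice, the document dicts, and the CRC32 hash 
are all hypothetical stand-ins chosen for the example, and the hash Solr 
actually applies to partitionKeys may differ.

```python
import zlib


def partition_for(key: str, num_workers: int) -> int:
    """Map a partition key value to a worker index.

    CRC32 is a stand-in hash for this sketch, not necessarily what Solr uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


def worker_slice(docs, key_field, worker_id, num_workers):
    """Return only the documents whose key hashes to the given worker."""
    return [
        doc for doc in docs
        if partition_for(doc[key_field], num_workers) == worker_id
    ]


if __name__ == "__main__":
    # Hypothetical batch of newly indexed documents, partitioned on "id".
    new_docs = [{"id": f"doc-{i}"} for i in range(10)]
    num_workers = 3
    for worker_id in range(num_workers):
        ids = [d["id"] for d in worker_slice(new_docs, "id", worker_id, num_workers)]
        print(worker_id, ids)
```

Because every key maps to exactly one worker, the slices are disjoint and 
together cover the whole batch, so each document would be classified once.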

  was:
Currently the topic() function doesn't accept a partitionKeys parameter like 
the search() function does. This means the topic() function can't be wrapped by 
the parallel() function to run across worker nodes.

It would be useful to support parallelizing the topic() function because it 
would provide a general-purpose, parallelized approach for processing batches 
of data as they enter the index.

For example, this would allow a classify() function to be wrapped around a 
topic() function to classify documents in parallel across worker nodes.

Sample syntax:

{code}
parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
{code}

The example above would send a daemon out to worker nodes that would classify 
all new documents returned by the topic() function. The update function would 
send the output of classify() to a SolrCloud collection for indexing.

> Add the partitionKeys parameter to the topic() Streaming Expression
> -------------------------------------------------------------------
>
>                 Key: SOLR-9240
>                 URL: https://issues.apache.org/jira/browse/SOLR-9240
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>
> Currently the topic() function doesn't accept a partitionKeys parameter like 
> the search() function does. This means the topic() function can't be wrapped 
> by the parallel() function to run across worker nodes.
> It would be useful to support parallelizing the topic() function because it 
> would provide a general-purpose, parallelized approach for processing 
> batches of data as they enter the index.
> For example, this would allow a classify() function to be wrapped around a 
> topic() function to classify documents in parallel across worker nodes.
> Sample syntax:
> {code}
> parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
> {code}
> The example above would send a daemon to worker nodes that would classify all 
> new documents returned by the topic() function. The update function would 
> send the output of classify() to a SolrCloud collection for indexing.
> The partitionKeys parameter would ensure that each worker receives its own 
> partition of the results returned by the topic() function, which is what 
> allows the classify() function to run in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
