[
https://issues.apache.org/jira/browse/SOLR-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cassandra Targett updated SOLR-8965:
------------------------------------
Component/s: streaming expressions
> Add Path reduce operation to aggregate paths in a session
> ---------------------------------------------------------
>
> Key: SOLR-8965
> URL: https://issues.apache.org/jira/browse/SOLR-8965
> Project: Solr
> Issue Type: New Feature
> Components: streaming expressions
> Reporter: Joel Bernstein
> Priority: Major
>
> Session aggregation can be hard to do at scale, but MapReduce makes it
> straightforward. Now that we have MapReduce, it would be good to add some
> session aggregations to the base library.
> The Path reduce operation can be used with the *reduce* function to
> concatenate the path taken in a session into a single field. These path
> records can then be added to another SolrCloud collection using the update
> stream. Once they have been consolidated in that collection, aggregations
> can be run on the paths using the RollupStream.
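As a rough illustration of what the Path reduce operation described above would compute, here is a minimal Python sketch. It is not Solr code: the function name `path_reduce`, the `>` separator, and the output field name `path` are assumptions made for the example. It relies on the input already being sorted by sessionId and timestamp, as in the proposed search sort.

```python
from itertools import groupby
from operator import itemgetter


def path_reduce(records, by="sessionId", field="pageId", sep=">"):
    """Collapse a stream sorted by `by` into one record per session.

    Hypothetical stand-in for the proposed path operation: the values of
    `field` are concatenated in arrival order into a single "path" field.
    """
    # groupby only merges adjacent records, so the input must be sorted
    # by the grouping key (and by timestamp within each session).
    for key, group in groupby(records, key=itemgetter(by)):
        yield {by: key, "path": sep.join(str(r[field]) for r in group)}


# Sample log records, already sorted by sessionId and then timestamp.
logs = [
    {"sessionId": "s1", "timestamp": 1, "pageId": "home"},
    {"sessionId": "s1", "timestamp": 2, "pageId": "search"},
    {"sessionId": "s1", "timestamp": 3, "pageId": "cart"},
    {"sessionId": "s2", "timestamp": 1, "pageId": "home"},
    {"sessionId": "s2", "timestamp": 4, "pageId": "cart"},
]

paths = list(path_reduce(logs))
```

Each resulting record holds one session's full path in a single field, which is the shape that could then be written to another collection.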
> A HashRollupStream could also be developed to aggregate the paths as they are
> reduced. The HashRollupStream would keep all the paths in a hash map during
> the aggregation, so it would not require the paths to arrive in sorted order.
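The hash-map idea behind the proposed HashRollupStream can be sketched in a few lines of Python. This is only an illustration under assumptions: `hash_rollup` is a hypothetical name, the aggregation shown is a simple count per distinct path, and the record layout (a `path` field) is invented for the example.

```python
from collections import Counter


def hash_rollup(path_records, over="path"):
    """Count occurrences of each distinct path using an in-memory hash map.

    Because every bucket lives in the map until the stream ends, records
    for the same path may arrive in any order.
    """
    counts = Counter()
    for rec in path_records:
        counts[rec[over]] += 1
    return dict(counts)


# Reduced path records arriving in no particular order.
reduced = [
    {"sessionId": "s1", "path": "home>cart"},
    {"sessionId": "s2", "path": "home>search>cart"},
    {"sessionId": "s3", "path": "home>cart"},
]

path_counts = hash_rollup(reduced)
```

The trade-off is memory: every distinct path stays in the map until the aggregation finishes, whereas a rollup over sorted input only needs to hold the current bucket.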
> Sample syntax:
> {code}
> reduce(search(logs, q="*:*", sort="sessionId asc, timestamp asc", ...),
>        by="sessionId",
>        path(field="pageId"))
> {code}
> This would parallelize well by partitioning on the sessionId, since each
> session's records would then be handled by a single worker.
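To make the partitioning point concrete, here is a small Python sketch of hashing records to workers by sessionId and reducing each shard independently. It is an illustration only: the helper names, the CRC32-based hash, and the `>` separator are assumptions and do not reflect how Solr actually assigns tuples to workers.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def worker_for(session_id, num_workers):
    """Pick a worker with a stable hash (CRC32 is an arbitrary choice here)."""
    return zlib.crc32(session_id.encode("utf-8")) % num_workers


def partition(records, num_workers, key="sessionId"):
    """Route every record of a given session to the same shard."""
    shards = [[] for _ in range(num_workers)]
    for rec in records:
        shards[worker_for(rec[key], num_workers)].append(rec)
    return shards


def reduce_shard(shard):
    """Sort one shard locally and build the path for each of its sessions."""
    ordered = sorted(shard, key=itemgetter("sessionId", "timestamp"))
    return {
        sid: ">".join(r["pageId"] for r in group)
        for sid, group in groupby(ordered, key=itemgetter("sessionId"))
    }


# Unsorted, interleaved log records from two sessions.
logs = [
    {"sessionId": "s2", "timestamp": 1, "pageId": "home"},
    {"sessionId": "s1", "timestamp": 2, "pageId": "search"},
    {"sessionId": "s1", "timestamp": 1, "pageId": "home"},
    {"sessionId": "s2", "timestamp": 5, "pageId": "cart"},
]

shards = partition(logs, num_workers=3)

# Each shard can be reduced on its own; the results never overlap
# because no session is split across shards.
merged = {}
for shard in shards:
    merged.update(reduce_shard(shard))
```

Since a session never spans two shards, the per-worker results can simply be combined without any further merging of partial paths.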