[https://issues.apache.org/jira/browse/SOLR-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Joel Bernstein updated SOLR-8965:
---------------------------------
    Description: 
Session aggregation can be hard to do at scale. MapReduce, of course, makes this 
easy. Now that we have MapReduce, it would be good to add some session 
aggregations to the base library. 

The Path reduce operation can be used with the *reduce* function to concatenate 
the path taken in a session into a single field. These path records can then be 
added to another SolrCloud collection using the update stream. Once they have 
been consolidated in another collection, aggregations can be run on the paths 
using the RollupStream.
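To make the proposed semantics concrete, here is a minimal Java sketch of what a Path reduce could do, assuming tuples arrive sorted by sessionId, then timestamp (as in sort="sessionId, timestamp"). It is not Solr code: the Event and SessionPath records, the reduceByPath method, and the "->" path separator are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the proposed Path reduce, NOT Solr's API.
// Assumes the input is already sorted by sessionId, then timestamp.
public class PathReduceSketch {

    // Hypothetical stand-in for a search() tuple.
    public record Event(String sessionId, long timestamp, String pageId) {}

    // Hypothetical output record: one per session, path in a single field.
    public record SessionPath(String sessionId, String path) {}

    public static List<SessionPath> reduceByPath(List<Event> sorted) {
        List<SessionPath> out = new ArrayList<>();
        String currentSession = null;
        List<String> pages = new ArrayList<>();
        for (Event e : sorted) {
            // A new sessionId closes the previous group (like by="sessionId").
            if (currentSession != null && !currentSession.equals(e.sessionId())) {
                // The "->" separator is an assumption.
                out.add(new SessionPath(currentSession, String.join("->", pages)));
                pages.clear();
            }
            currentSession = e.sessionId();
            pages.add(e.pageId());
        }
        // Emit the last group, if the input was non-empty.
        if (currentSession != null) {
            out.add(new SessionPath(currentSession, String.join("->", pages)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("s1", 1L, "home"),
            new Event("s1", 2L, "search"),
            new Event("s1", 3L, "product"),
            new Event("s2", 1L, "home"),
            new Event("s2", 5L, "cart"));
        for (SessionPath p : reduceByPath(events)) {
            System.out.println(p.sessionId() + " " + p.path());
        }
    }
}
```

Because the input is sorted, the sketch only holds one session's pages in memory at a time; each SessionPath could then be sent on to the update stream as its own record.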

A HashRollupStream could also be developed to aggregate the paths as they are 
reduced. The HashRollupStream would keep all the paths in a hash map during the 
aggregation, so it would not require the paths to be received in order.
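A rough sketch of the HashRollupStream idea, assuming the aggregation is a simple count of sessions per distinct path (the proposal does not fix the metric). HashRollupSketch, PathRecord and countPaths are hypothetical names, not existing Solr classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the HashRollupStream idea, NOT an existing Solr class.
// Assumption: the aggregation is a count of sessions per distinct path.
public class HashRollupSketch {

    // Hypothetical path record, as emitted by a Path reduce step.
    public record PathRecord(String sessionId, String path) {}

    // Every distinct path is a key in the hash map, so records can arrive
    // in any order; no sort on the path field is needed.
    public static Map<String, Long> countPaths(Iterable<PathRecord> records) {
        Map<String, Long> counts = new HashMap<>();
        for (PathRecord r : records) {
            counts.merge(r.path(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<PathRecord> unordered = List.of(
            new PathRecord("s1", "home->cart"),
            new PathRecord("s2", "home->search"),
            new PathRecord("s3", "home->cart"));
        System.out.println(countPaths(unordered).get("home->cart")); // prints 2
    }
}
```

The trade-off is memory: the map holds one entry per distinct path for the whole aggregation, whereas a sorted rollup only needs to track the current group.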

Sample syntax:

{code}
reduce(search(logs, q="*:*", sort="sessionId, timestamp", ...),
       by="sessionId",
       path(field="pageId"))
{code}

This would work great in parallel by partitioning on the sessionId, since all 
tuples for a given session would then be reduced by the same worker.
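A minimal sketch of generic hash partitioning on sessionId, assuming each worker receives the tuples whose key hashes to its index. It does not reproduce how Solr actually assigns partitions; SessionPartitionSketch, Event, workerFor and partition are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Generic hash-partitioning illustration; Solr's own partitioning scheme may differ.
public class SessionPartitionSketch {

    // Hypothetical stand-in for a search() tuple.
    public record Event(String sessionId, long timestamp, String pageId) {}

    // Same sessionId -> same hash -> same worker index.
    static int workerFor(String sessionId, int numWorkers) {
        return Math.floorMod(sessionId.hashCode(), numWorkers);
    }

    public static List<List<Event>> partition(List<Event> events, int numWorkers) {
        List<List<Event>> workers = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            workers.add(new ArrayList<>());
        }
        // Appending in input order keeps each partition in the original sort order.
        for (Event e : events) {
            workers.get(workerFor(e.sessionId(), numWorkers)).add(e);
        }
        return workers;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("s1", 1L, "home"), new Event("s2", 1L, "home"),
            new Event("s1", 2L, "cart"), new Event("s3", 1L, "search"));
        List<List<Event>> parts = partition(events, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("worker " + i + ": " + parts.get(i));
        }
    }
}
```

Because each partition preserves the input order, a stream sorted by sessionId, timestamp stays sorted within every worker, which is what a sorted reduce by sessionId needs.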





  was:
One of the things it's tricky to do at scale is session aggregation. MapReduce 
of course makes this easy. Now that we have MapReduce it would be good to add 
some session aggregations to the base library. 

The Path reduce operation can be used with the *reduce* function to concatenate 
the path taken in a session into a single field. These path records can then be 
added to another SolrCloud collection using the update stream. Once they have 
been consolidated in another collection aggregations can be run on the paths 
using the RollupStream.

A HashRollupStream could also be developed to aggregate the paths as they are 
reduced. The HashRollupStream would keep all the paths in a hash map during the 
aggregation so it would not require the paths to be received in order.

sample syntax:

{code}
reduce(search(logs, q="*:*", sort="sessionId, timestamp", ...),
       by="sessionId",
       path(field="pageId"))
{code}

This would work great in parallel by partitioning on the sessionId.






> Add Path reduce operation to aggregate paths in a session
> ---------------------------------------------------------
>
>                 Key: SOLR-8965
>                 URL: https://issues.apache.org/jira/browse/SOLR-8965
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
