[ 
https://issues.apache.org/jira/browse/PHOENIX-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991313#comment-13991313
 ] 

Andrew Purtell commented on PHOENIX-838:
----------------------------------------

After PHOENIX-971, there might be a middle tier, with sufficient resources for 
tracking and buffering streaming results, suitable to host this sort of 
function.

> Continuous queries
> ------------------
>
>                 Key: PHOENIX-838
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-838
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>
> Support continuous queries. 
> As a coprocessor application, Phoenix is well positioned to observe
> mutations and treat those observations as an event stream. 
> Continuous queries are persistent queries that run server side, typically 
> expressed as structured queries using some extensions for defining a bounded 
> subset of a potentially unbounded tuple stream. A Phoenix user could create a 
> materialized view using WINDOW and other OLAP extensions to SQL discussed on 
> PHOENIX-154 to define time- or tuple-based sliding windows, possibly 
> partitioned, and an aggregating or filtering operation over those windows. 
> This would trigger instantiation of a long running distributed task on the 
> cluster for incrementally maintaining the view. ("Task" is meant here as a 
> logical notion; it may not be a separate thread of execution.) As the task 
> receives observer events and performs work, it would update state in memory 
> for on-demand retrieval. For state reconstruction after failure the WAL could 
> be overloaded with in-window event history and/or the in-memory state could 
> be periodically checkpointed into shadow stores in the region.
> Users would pick up the latest state maintained by the continuous query by 
> querying the view, or perhaps Phoenix can do this transparently on any query 
> if the optimizer determines equivalence.
> This could be an important feature for Phoenix. Generally Phoenix and HBase 
> are meant to handle high data volumes that overwhelm other data management 
> options, so even subsets of the full data may present scale challenges. Many 
> use cases mix ad hoc or exploratory full table scans with aggregates, 
> rollups, or sampling queries over a subset or sample. The user wishes the 
> latter queries to run as fast as possible. If that work can be done inline 
> with the process of initially persisting mutations then we trade some memory 
> and CPU resources up front to eliminate significant IO time later that would 
> otherwise dominate.
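
The incremental maintenance described in the issue could be sketched roughly as
follows. This is only an illustration in Python, not Phoenix or HBase code: the
class SlidingWindowSum and its methods are hypothetical names, the window is
time-based and unpartitioned, events are assumed to arrive in non-decreasing
timestamp order, and a plain list stands in for a checkpoint written to a
shadow store.

```python
from collections import deque


class SlidingWindowSum:
    """Hypothetical time-based sliding window keeping a running sum."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.events = deque()  # (timestamp_ms, value) pairs still in the window
        self.total = 0

    def observe(self, timestamp_ms, value):
        """Handle one observed mutation; assumes non-decreasing timestamps."""
        self.events.append((timestamp_ms, value))
        self.total += value
        self._evict(timestamp_ms)

    def _evict(self, now_ms):
        # Drop events that have slid out of the window and undo their
        # contribution, so the aggregate is never recomputed from scratch.
        while self.events and self.events[0][0] <= now_ms - self.window_ms:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def state(self):
        """On-demand retrieval of the current in-memory aggregate."""
        return {"count": len(self.events), "sum": self.total}

    def checkpoint(self):
        """Snapshot of in-window history (stand-in for a shadow store write)."""
        return list(self.events)

    @classmethod
    def restore(cls, window_ms, snapshot):
        """Rebuild state after a failure by replaying checkpointed events."""
        window = cls(window_ms)
        for timestamp_ms, value in snapshot:
            window.observe(timestamp_ms, value)
        return window


window = SlidingWindowSum(window_ms=1000)
window.observe(0, 5)
window.observe(500, 3)
window.observe(1200, 2)  # the event at t=0 falls out of the 1000 ms window
recovered = SlidingWindowSum.restore(1000, window.checkpoint())
```

Each observed event costs a small amount of memory and CPU at write time, and
reading the aggregate afterwards requires no scan, which is the trade-off the
issue description argues for. A partitioned variant would presumably keep one
such window per partition key.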



--
This message was sent by Atlassian JIRA
(v6.2#6252)
