[
https://issues.apache.org/jira/browse/PHOENIX-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991313#comment-13991313
]
Andrew Purtell commented on PHOENIX-838:
----------------------------------------
After PHOENIX-971, there may be a middle tier with sufficient resources for
tracking and buffering streaming results, which would make it a suitable place
to host this sort of function.
> Continuous queries
> ------------------
>
> Key: PHOENIX-838
> URL: https://issues.apache.org/jira/browse/PHOENIX-838
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Andrew Purtell
>
> Support continuous queries.
> As a coprocessor application, Phoenix is well positioned to observe
> mutations and treat those observations as an event stream.
> Continuous queries are persistent queries that run server side, typically
> expressed as structured queries using some extensions for defining a bounded
> subset of a potentially unbounded tuple stream. A Phoenix user could create a
> materialized view using WINDOW and other OLAP extensions to SQL discussed on
> PHOENIX-154 to define time- or tuple-based sliding windows, possibly
> partitioned, and an aggregating or filtering operation over those windows.
> This would trigger instantiation of a long running distributed task on the
> cluster for incrementally maintaining the view. ("Task" is meant here as a
> logical notion; it may not be a separate thread of execution.) As the task
> receives observer events and performs work, it would update state in memory
> for on-demand retrieval. For state reconstruction after failure the WAL could
> be overloaded with in-window event history and/or the in-memory state could
> be periodically checkpointed into shadow stores in the region.
> Users would pick up the latest state maintained by the continuous query by
> querying the view, or perhaps Phoenix can do this transparently on any query
> if the optimizer determines equivalence.
> This could be an important feature for Phoenix. Generally Phoenix and HBase
> are meant to handle high data volumes that overwhelm other data management
> options, so even subsets of the full data may present scale challenges. Many
> use cases mix ad hoc or exploratory full table scans with aggregates,
> rollups, or sampling queries over a subset or sample. The user wishes the
> latter queries to run as fast as possible. If that work can be done inline
> with the process of initially persisting mutations, then we trade some memory
> and CPU resources up front to eliminate significant IO time later that would
> otherwise dominate.
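As a toy illustration of the in-memory, incrementally maintained windowed state
the description above envisions (all names hypothetical; plain Java, not
Phoenix internals), a time-based sliding window aggregate could look like:

```java
import java.util.ArrayDeque;

// Minimal sketch: a time-based sliding window that incrementally counts
// observed events, evicting entries that fall outside the window. A real
// continuous-query task would be fed from coprocessor observer hooks and
// would checkpoint or replay from the WAL for recovery, as discussed above.
class SlidingWindowCount {
    private final long windowMillis;
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCount(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Called once per observed mutation, with the event's timestamp.
    void observe(long eventTimeMillis) {
        timestamps.addLast(eventTimeMillis);
        evict(eventTimeMillis);
    }

    // Drop events that have aged out of the window, measured from "now".
    private void evict(long nowMillis) {
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
    }

    // On-demand retrieval of the current windowed aggregate.
    long count(long nowMillis) {
        evict(nowMillis);
        return timestamps.size();
    }

    public static void main(String[] args) {
        SlidingWindowCount w = new SlidingWindowCount(5_000); // 5-second window
        w.observe(1_000);
        w.observe(4_000);
        w.observe(7_500);                    // the t=1000 event has aged out
        System.out.println(w.count(7_500));  // 2
        System.out.println(w.count(20_000)); // 0: everything has aged out
    }
}
```

Maintaining the aggregate incrementally on write is what lets a later read of
the view return in constant time instead of rescanning the underlying rows.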
--
This message was sent by Atlassian JIRA
(v6.2#6252)