[jira] [Commented] (LENS-251) Support to query streaming data sources

Aniruddha Gangopadhyay (JIRA) Mon, 02 Feb 2015 04:08:40 -0800

    [ 
https://issues.apache.org/jira/browse/LENS-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301190#comment-14301190
 ]


Aniruddha Gangopadhyay commented on LENS-251:
---------------------------------------------

For a storage that does not have partitions, your understanding is correct. 

For partitioned storage tables, here is my understanding of how the update 
period information is used during query execution. Please correct me if I am 
wrong. 
1) A query with a time range is fired (the from and to dates are available).
2) Based on the difference between to and from, the coarsest registered 
update period is chosen. (So if the difference between to and from is > 1 
year and a yearly update period is registered, the query will be fired 
against the yearly partitioned table.)
3) If there are multiple storages with the chosen update period, the storage 
(native) table is chosen based on cost. 
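
The three steps above can be sketched roughly as follows. This is only an 
illustration of the selection logic as described in this comment, not the 
actual Lens implementation: the names StorageTable, PERIOD_LENGTHS, 
choose_update_period and choose_storage_table are hypothetical, the period 
lengths are approximations, and a span exactly equal to a period length is 
assumed to qualify for that period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical update periods with approximate lengths, coarsest first.
PERIOD_LENGTHS = [
    ("YEARLY", timedelta(days=365)),
    ("MONTHLY", timedelta(days=30)),
    ("DAILY", timedelta(days=1)),
    ("HOURLY", timedelta(hours=1)),
]


@dataclass
class StorageTable:
    """Hypothetical stand-in for a storage (native) table."""
    name: str
    update_periods: frozenset  # update periods registered for this table
    cost: float                # assumed relative query cost


def choose_update_period(start, end, registered):
    """Step 2: pick the coarsest registered period that fits in the range."""
    span = end - start
    for period, length in PERIOD_LENGTHS:
        if period in registered and span >= length:
            return period
    return None


def choose_storage_table(start, end, tables):
    """Steps 2 and 3: choose a period, then the cheapest table that has it."""
    registered = set().union(*(t.update_periods for t in tables))
    period = choose_update_period(start, end, registered)
    candidates = [t for t in tables if period in t.update_periods]
    best = min(candidates, key=lambda t: t.cost) if candidates else None
    return period, best


# Step 1: a query arrives with a from and a to date.
tables = [
    StorageTable("s1", frozenset({"YEARLY"}), 5.0),
    StorageTable("s2", frozenset({"YEARLY", "DAILY"}), 2.0),
    StorageTable("s3", frozenset({"DAILY"}), 1.0),
]
period, table = choose_storage_table(
    datetime(2013, 1, 1), datetime(2015, 1, 1), tables
)
```

Note that nothing in this selection looks at whether the chosen table 
actually holds partitions for the queried range.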

Given how step 2 works, and since no time range is associated with the 
native tables, we might end up querying a storage table that has no data for 
the queried range (it is quite possible that no partitions for that time range 
were ever added to it) just because it has an acceptable update period. A less 
optimal update period (a finer grain) may have partitions registered for the 
queried time range, but with the existing model we will end up returning an 
empty result set. 
That is why we need information about the time range for which data is present 
in a particular storage table (partitioned or non-partitioned), so that 
unsuitable storage tables are eliminated before step 2. For partitioned 
tables, this time range can be inferred from the already registered 
partitions, but to keep things uniform across storages, it makes more sense to 
have the time range as a first-class construct. 
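
A minimal sketch of the proposed elimination, assuming each storage table 
carries an explicit time range for which data is present. All names here 
(StorageTable, data_start, data_end, eliminate_by_time_range, 
range_from_partitions) are hypothetical and not part of the Lens API. The 
sketch also assumes that a table qualifies only if its range fully covers the 
queried range, and that registered partitions are contiguous; both are 
simplifications.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StorageTable:
    """Hypothetical storage table with a first-class time range."""
    name: str
    data_start: Optional[datetime]  # None means no data is present
    data_end: Optional[datetime]


def covers(table, start, end):
    """True if the table's time range fully contains [start, end]."""
    if table.data_start is None or table.data_end is None:
        return False
    return table.data_start <= start and end <= table.data_end


def eliminate_by_time_range(tables, start, end):
    """Drop tables with no data for the queried range, before step 2."""
    return [t for t in tables if covers(t, start, end)]


def range_from_partitions(partition_starts, period_length):
    """Infer a partitioned table's range from its registered partitions.

    Assumes the partitions are contiguous; gaps are not detected.
    """
    if not partition_starts:
        return None, None
    return min(partition_starts), max(partition_starts) + period_length


tables = [
    StorageTable("yearly_tbl", datetime(2010, 1, 1), datetime(2013, 1, 1)),
    StorageTable("daily_tbl", datetime(2014, 1, 1), datetime(2015, 2, 1)),
    StorageTable("empty_tbl", None, None),
]
# Only tables that hold data for the queried range go on to step 2.
survivors = eliminate_by_time_range(
    tables, datetime(2014, 6, 1), datetime(2015, 1, 1)
)
```

With this filter in place, the coarser table is skipped when it has no data 
for the range, and the finer-grained table is considered instead of returning 
an empty result set.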

As for the overhead of managing partitions (for any update period), this is 
inherent in the model. Ideally, it should have nothing to do with the 
time-range-based change. 

 

> Support to query streaming data sources
> ---------------------------------------
>
>                 Key: LENS-251
>                 URL: https://issues.apache.org/jira/browse/LENS-251
>             Project: Apache Lens
>          Issue Type: New Feature
>          Components: cube
>            Reporter: Sharad Agarwal
>
> For certain stores that allow streaming ingestion, to make the data 
> available immediately for querying, we need the notion of streaming update 
> period or some such.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
