[
https://issues.apache.org/jira/browse/LENS-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301190#comment-14301190
]
Aniruddha Gangopadhyay commented on LENS-251:
---------------------------------------------
For a storage that does not have partitions, your understanding is correct.
For partitioned storage tables, here is my understanding of how the update
period information is used during query execution. Please correct me if I am
wrong.
1) A query with a time range is fired (the from and to dates are available).
2) Based on the difference between to and from, the coarsest-grained
registered update period is chosen. (So if the difference between to and from
is more than 1 year and a yearly update period is registered, the query is
fired against the yearly partitioned table.)
3) If multiple storages have the chosen update period, the storage (native)
table is chosen based on cost.
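The three steps above can be sketched roughly as follows. This is only an
illustration of the understanding described in this comment, not Lens code:
the StorageTable class, the UPDATE_PERIODS list, the fixed period lengths and
the choose_storage_table function are all hypothetical names and
simplifications made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical update periods, ordered from coarsest to finest.
# The lengths are rough approximations used only for this sketch.
UPDATE_PERIODS = [
    ("YEARLY", timedelta(days=365)),
    ("MONTHLY", timedelta(days=30)),
    ("DAILY", timedelta(days=1)),
    ("HOURLY", timedelta(hours=1)),
]


@dataclass
class StorageTable:
    """Hypothetical stand-in for a storage (native) table."""
    name: str
    update_periods: frozenset  # update periods registered on this table
    cost: float                # lower is cheaper


def choose_storage_table(tables, start, end):
    """Steps 1-3: pick the coarsest usable update period, then the cheapest table."""
    span = end - start  # step 1: the query carries a from/to time range
    registered = set().union(*(t.update_periods for t in tables))
    # Step 2: walk from coarsest to finest and stop at the first period that
    # is registered somewhere and fits inside the queried span.
    for period, length in UPDATE_PERIODS:
        if period in registered and span >= length:
            # Step 3: among tables with that period, choose by cost.
            candidates = [t for t in tables if period in t.update_periods]
            return period, min(candidates, key=lambda t: t.cost)
    return None


tables = [
    StorageTable("hdfs_fact", frozenset({"YEARLY"}), cost=10.0),
    StorageTable("db_fact", frozenset({"YEARLY", "DAILY"}), cost=5.0),
    StorageTable("stream_fact", frozenset({"DAILY"}), cost=1.0),
]
period, table = choose_storage_table(
    tables, datetime(2013, 1, 1), datetime(2015, 1, 1))
```

Note that nothing in this selection looks at whether the chosen table actually
holds data for the queried range.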
Considering how step 2 works, since there is no notion of a time range
associated with the native tables, we might end up querying a storage table
that has no data in it (it is quite possible that no partitions for that time
range were ever added to it) just because it fits our initial criterion of an
acceptable update period. A less optimal update period (a finer grain) may
well have partitions registered for the queried time range, but because of the
existing model, we will end up showing an empty result set.
That's why information about the time range for which data is present in a
particular storage table (partitioned or non-partitioned) is required, so that
storage tables with no data for the queried range are eliminated before step 2.
In case of partitioned tables, this time range can very well be inferred from
the already registered partitions, but in order to maintain uniformity across
storages, it would make more sense to have time range as a first-class
construct.
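As a rough illustration of that elimination step, here is a minimal Python
sketch. It is not Lens code: the StorageTable class, its data_start and
data_end fields, and the eliminate_by_time_range function are hypothetical
names invented for the example. It also assumes a table is kept only if its
time range fully contains the queried range; keeping tables that merely
overlap the range would be another possible rule.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StorageTable:
    """Hypothetical storage table carrying an explicit time range."""
    name: str
    update_period: str
    data_start: datetime  # assumed: earliest time for which data is present
    data_end: datetime    # assumed: latest time for which data is present


def eliminate_by_time_range(tables, start, end):
    """Keep only tables whose time range contains the whole queried range.

    Intended to run before the update period is chosen (step 2), so that an
    empty coarse-grained table cannot shadow a finer-grained one with data.
    """
    return [t for t in tables if t.data_start <= start and end <= t.data_end]


tables = [
    # Yearly table whose data stops before the queried range begins.
    StorageTable("yearly_fact", "YEARLY",
                 datetime(2008, 1, 1), datetime(2013, 1, 1)),
    # Daily table that does hold data for the queried range.
    StorageTable("daily_fact", "DAILY",
                 datetime(2013, 1, 1), datetime(2015, 2, 1)),
]
survivors = eliminate_by_time_range(
    tables, datetime(2013, 6, 1), datetime(2015, 1, 1))
```

In this example the queried range is longer than a year, so the yearly table
would have been picked on update period alone, but it is dropped here because
its time range does not cover the query, leaving only the daily table for the
later steps.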
As for the overhead of managing partitions (for any update period), this is
something that exists innately in the model. It ideally should not have
anything to do with the time range based change.
> Support to query streaming data sources
> ---------------------------------------
>
> Key: LENS-251
> URL: https://issues.apache.org/jira/browse/LENS-251
> Project: Apache Lens
> Issue Type: New Feature
> Components: cube
> Reporter: Sharad Agarwal
>
> For certain stores that allow streaming ingestion, to make the data
> available immediately for querying, we need the notion of streaming update
> period or some such.