Puneet Gupta created LENS-1309:
----------------------------------

             Summary: Add capability to specify that "Future Partitions" should 
not be considered while answering queries
                 Key: LENS-1309
                 URL: https://issues.apache.org/jira/browse/LENS-1309
             Project: Apache Lens
          Issue Type: Improvement
            Reporter: Puneet Gupta


Use case:

Let's say we have a fact A which has DAILY and HOURLY update periods. 
We have partitioned the fact on pt (process time) and et (event arrival 
time).
Assume today is Sep 9th and, while processing data for the 23rd (last) hour 
of Sep 8th (i.e., pt=2016-09-08-23), we find a few records with an event time 
of Sep 9th, 0th hour (due to clock synchronization issues, fraudulent data, 
etc.). This leads to partitions such as pt=2016-09-08-23 and et=2016-09-09-00 
at the HOUR level, and pt=2016-09-08 and et=2016-09-09 at the DAY level.

This makes the system believe that DAY-level data for the 9th is available for 
event time queries (as the timeline does not consider pt for event time 
queries). This leads to wrong query output, since the day partition 
pt=2016-09-08 and et=2016-09-09 holds only a very small part of the 9th's 
data. The major chunk of DAY data for the 9th will only be created on the 
morning of the 10th (pt=2016-09-09 and et=2016-09-09). In this case LENS 
answers the query from the DAY update period for Sep 9th, while it should have 
used HOURLY data for the 9th.
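
The failure mode can be reproduced with a small model. The sketch below is 
not LENS code: the partition tuples and the naive_pick helper are 
hypothetical, and it assumes a resolver that prefers the DAY update period 
whenever any DAY partition exists for the requested et, without looking at pt.

```python
# Hypothetical model of the scenario above; not the actual LENS resolver.
# Each registered partition is (update_period, pt, et).
partitions = [
    ("HOURLY", "2016-09-08-23", "2016-09-08-23"),
    ("HOURLY", "2016-09-08-23", "2016-09-09-00"),  # stray future-et records
    ("DAILY", "2016-09-08", "2016-09-08"),
    ("DAILY", "2016-09-08", "2016-09-09"),  # holds only the stray records
]


def naive_pick(et_day):
    """Pick an update period for an event-time day, ignoring pt entirely."""
    daily = [p for p in partitions if p[0] == "DAILY" and p[2] == et_day]
    if daily:
        return "DAILY", daily
    hourly = [
        p for p in partitions
        if p[0] == "HOURLY" and p[2].startswith(et_day)
    ]
    return "HOURLY", hourly


# The nearly empty DAY partition is selected for Sep 9th.
period, chosen = naive_pick("2016-09-09")
```

In this model the only DAY partition returned for Sep 9th is the one written 
under pt=2016-09-08, which is exactly the partition that holds just the stray 
records.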

Expose a query-level config that enforces semantics ensuring LENS considers 
et partitions only if they are <= the most recent pt partition. Future 
partitions should be ignored at the higher granularity (DAY), and the query 
should instead be answered from lower-granularity data (HOUR). This should 
also apply to lookahead.
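
One way to read the proposed semantics is sketched below. This is an 
illustration only, not an implementation from LENS: the function name 
drop_future_partitions is hypothetical, partition values are assumed to be 
zero-padded strings (yyyy-MM-dd or yyyy-MM-dd-HH) so that string comparison 
matches chronological order, and the example assumes hourly processing has 
reached pt=2016-09-09-14 while the latest daily pt is still 2016-09-08.

```python
def drop_future_partitions(et_parts, latest_pt):
    """Keep only et partitions that are not later than the latest pt partition.

    Assumes et_parts and latest_pt share one granularity and a zero-padded
    format, so lexicographic order equals chronological order.
    """
    return [et for et in et_parts if et <= latest_pt]


# DAY level: the latest processed day is Sep 8th, so et=2016-09-09 is "future".
daily_et = ["2016-09-08", "2016-09-09"]
usable_daily = drop_future_partitions(daily_et, latest_pt="2016-09-08")

# HOUR level: hours of Sep 9th up to the latest processed hour stay usable.
hourly_et = ["2016-09-09-00", "2016-09-09-13", "2016-09-09-14", "2016-09-09-15"]
usable_hourly = drop_future_partitions(hourly_et, latest_pt="2016-09-09-14")
```

With the DAY partition for Sep 9th filtered out, a query over Sep 9th has no 
usable DAY data and would have to be answered from the remaining HOUR 
partitions.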



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
