[ 
https://issues.apache.org/jira/browse/LENS-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481121#comment-14481121
 ] 

Jaideep Dhok commented on LENS-481:
-----------------------------------

We select the fact with the fewest partitions on the assumption that fewer 
partitions will lead to less I/O for the MR job, but ideally I believe table 
weight is the right way to deal with this. If table weight is set correctly, 
then the least partition resolver is not necessary.

*We should have a flag to disable the least partition resolver to give users 
more control*

With partial data enabled, the current implementation has these issues:
# The higher granularity (coarser) fact gets picked even if it has *no* data. 
For example, for a year-long query range, the partition count for a monthly 
fact with 0 partitions becomes 12 once partial data is enabled, and a daily 
fact with _one_ partition becomes 365. So the monthly fact gets selected, even 
though that makes no sense: we know the user will get an empty result.
# If we count only actual partitions and ignore the missing ones, there is one 
more issue. Let's say the daily fact has partitions for day 1, day 2, day 100 
and day 200, and the monthly fact has partitions for months 1, 2, 3, 4, 5 and 
6. In this case the daily fact has fewer partitions, but the monthly fact has 
more complete data, so the monthly fact should be selected.
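
The two cases above can be reproduced with a small sketch. This is not Lens code: the Fact class, the three pick_* functions and the "coverage" ratio are hypothetical names invented for illustration, and picking by coverage is only one possible reading of "more complete data", not something the resolver is known to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical stand-in for a fact table; not a Lens class."""
    name: str
    requested: int  # partitions the query range maps to at this granularity
    present: int    # how many of those partitions are actually registered


def pick_by_requested(facts):
    # Behaviour described in the issue: fewest requested partitions wins.
    return min(facts, key=lambda f: f.requested)


def pick_by_present(facts):
    # Count only registered partitions and skip facts that have none.
    # Raises ValueError if no fact has any partition.
    candidates = [f for f in facts if f.present > 0]
    return min(candidates, key=lambda f: f.present)


def pick_by_coverage(facts):
    # Assumption: "more complete" means the largest present/requested ratio.
    return max(facts, key=lambda f: f.present / f.requested)


# Issue 1: year-long range, monthly fact is empty, daily fact has one partition.
case1 = [Fact("monthly", requested=12, present=0),
         Fact("daily", requested=365, present=1)]

# Issue 2: daily has days 1, 2, 100, 200; monthly has months 1 to 6.
case2 = [Fact("monthly", requested=12, present=6),
         Fact("daily", requested=365, present=4)]

print(pick_by_requested(case1).name)  # monthly (empty result for the user)
print(pick_by_present(case1).name)    # daily
print(pick_by_present(case2).name)    # daily (4 < 6, but far less complete)
print(pick_by_coverage(case2).name)   # monthly (6/12 vs 4/365)
```

Counting only present partitions fixes the first case but still picks the daily fact in the second, which is why a raw partition count alone does not look sufficient.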


> LeastPartitionResolver should check for actual partitions and not requested 
> partitions when lens.cube.query.fail.if.data.partial=false
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LENS-481
>                 URL: https://issues.apache.org/jira/browse/LENS-481
>             Project: Apache Lens
>          Issue Type: Bug
>            Reporter: Angad Singh
>            Assignee: Rajat Khandelwal
>
> Currently LeastPartitionResolver prunes facts based on the desired/requested 
> partitions of facts (based on granularity). Consider a cube with 2 facts - 
> one at daily granularity and one at monthly. Suppose the monthly fact is 
> registered with no partitions and the daily fact has 1 partition. When you 
> query on a year-long time range, the LeastPartitionResolver spuriously prunes 
> the daily fact (even though it had data) and selects the monthly fact (which 
> doesn't have any data), and runs the query (giving no output).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
