[
https://issues.apache.org/jira/browse/LENS-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481121#comment-14481121
]
Jaideep Dhok commented on LENS-481:
-----------------------------------
We select the fact with the fewest partitions on the assumption that fewer
partitions means less I/O for the MR job, but ideally I believe table weight
is the right way to deal with this. If table weight is set correctly, then the
least partition resolver is not necessary.
*We should have a flag to disable the least partition resolver, to give users
more control.*
With partial data enabled, the current implementation has these issues:
# The higher granularity fact gets picked even if it has *no* data. For example,
for a year-long query with partial data enabled, the partition count for a
monthly fact with 0 partitions becomes 12, and for a daily fact with _one_
partition it becomes 365. So the monthly fact gets selected, even though that
makes no sense: we know the user will get an empty result.
# If we count only actual partitions and ignore the missing ones, there is one
more issue. Say the daily fact has partitions for days 1, 2, 100 and 200, and
the monthly fact has partitions for months 1 through 6. In this case the daily
fact has fewer partitions, but the monthly fact has more complete data, so the
monthly fact should be selected.
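
To make the two cases concrete, here is a small sketch comparing three selection rules for a year-long query. It is only an illustration: the function names, the dict layout, and the "coverage" rule (actual partitions divided by requested partitions) are assumptions of mine, not Lens code and not a proposed patch.

```python
# Illustrative only: hypothetical helpers, not the Lens LeastPartitionResolver.
# "requested" = partitions a year-long range asks for at that granularity.
# "actual"    = partitions that are really registered for the fact.

def pick_by_requested(facts):
    """Current behaviour as described: fewest requested partitions wins."""
    return min(facts, key=lambda f: f["requested"])["name"]

def pick_by_actual(facts):
    """Count only existing partitions, skipping facts that have none."""
    non_empty = [f for f in facts if f["actual"] > 0]
    if not non_empty:
        return None
    return min(non_empty, key=lambda f: f["actual"])["name"]

def pick_by_coverage(facts):
    """Assumed alternative: prefer the fact covering the largest share
    of the requested range (actual / requested)."""
    non_empty = [f for f in facts if f["actual"] > 0]
    if not non_empty:
        return None
    return max(non_empty, key=lambda f: f["actual"] / f["requested"])["name"]

# Case 1: monthly fact has no partitions, daily fact has one.
case1 = [
    {"name": "monthly", "requested": 12, "actual": 0},
    {"name": "daily", "requested": 365, "actual": 1},
]

# Case 2: daily has days 1, 2, 100, 200; monthly has months 1 through 6.
case2 = [
    {"name": "monthly", "requested": 12, "actual": 6},
    {"name": "daily", "requested": 365, "actual": 4},
]

for label, facts in (("case 1", case1), ("case 2", case2)):
    print(label,
          "requested:", pick_by_requested(facts),
          "actual:", pick_by_actual(facts),
          "coverage:", pick_by_coverage(facts))
```

Counting requested partitions picks the empty monthly fact in case 1, and counting actual partitions picks the sparse daily fact in case 2. The coverage rule is just one way to express "more complete data"; table weight, as noted above, may still be the better lever.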
> LeastPartitionResolver should check for actual partitions and not requested
> partitions when lens.cube.query.fail.if.data.partial=false
> --------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LENS-481
> URL: https://issues.apache.org/jira/browse/LENS-481
> Project: Apache Lens
> Issue Type: Bug
> Reporter: Angad Singh
> Assignee: Rajat Khandelwal
>
> Currently LeastPartitionResolver prunes facts based on the desired/requested
> partitions of facts (based on granularity). Consider a cube with 2 facts -
> one at daily granularity and one at monthly. Suppose the monthly fact is
> registered with no partitions and the daily fact has 1 partition. When you
> query on a year-long time range, the LeastPartitionResolver spuriously prunes
> the daily fact (even though it had data) and selects the monthly fact (which
> doesn't have any data), and runs the query (giving no output).