[
https://issues.apache.org/jira/browse/HIVE-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384917#comment-14384917
]
Mostafa Mokhtar commented on HIVE-10122:
----------------------------------------
[~sershe]
Ok, now I understand what is going on, this is the problematic metastore query
{code}
SELECT
A0.PART_NAME AS NUCORDER0
FROM
PARTITIONS A0
LEFT OUTER JOIN
TBLS B0 ON A0.TBL_ID = B0.TBL_ID
LEFT OUTER JOIN
DBS C0 ON B0.DB_ID = C0.DB_ID
WHERE
C0.NAME = 'tpcds_bin_partitioned_orc_30000'
AND B0.TBL_NAME = 'store_sales'
ORDER BY NUCORDER0;
{code}
Then the remaining ones use the PART_NAME and PART_ID from the previous query
{code}
select
PARTITIONS.PART_ID
from
PARTITIONS
inner join
TBLS ON PARTITIONS.TBL_ID = TBLS.TBL_ID
and TBLS.TBL_NAME = 'store_sales'
inner join
DBS ON TBLS.DB_ID = DBS.DB_ID
and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
where
PARTITIONS.PART_NAME in ('ss_sold_date_sk=2450816' ,
'ss_sold_date_sk=2450817');
{code}
{code}
select
PARTITIONS.PART_ID,
SDS.SD_ID,
SDS.CD_ID,
SERDES.SERDE_ID,
PARTITIONS.CREATE_TIME,
PARTITIONS.LAST_ACCESS_TIME,
SDS.INPUT_FORMAT,
SDS.IS_COMPRESSED,
SDS.IS_STOREDASSUBDIRECTORIES,
SDS.LOCATION,
SDS.NUM_BUCKETS,
SDS.OUTPUT_FORMAT,
SERDES.NAME,
SERDES.SLIB
from
PARTITIONS
left outer join
SDS ON PARTITIONS.SD_ID = SDS.SD_ID
left outer join
SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID
where
PART_ID in (59203 , 58422)
order by PART_NAME asc;
{code}
If filters are on the partitioned column only as in
{code}
select ss_item_sk rowcount from store_sales where ss_sold_date_sk between
2450816 and 2450817 ;
{code}
Then PARTITIONS table is queried with a partition filter.
{code}
select
PARTITIONS.PART_ID
from
PARTITIONS
inner join
TBLS ON PARTITIONS.TBL_ID = TBLS.TBL_ID
and TBLS.TBL_NAME = 'store_sales'
inner join
DBS ON TBLS.DB_ID = DBS.DB_ID
and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
inner join
PARTITION_KEY_VALS FILTER0 ON FILTER0.PART_ID = PARTITIONS.PART_ID
and FILTER0.INTEGER_IDX = 0
where
((((case
when
TBLS.TBL_NAME = 'store_sales'
and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
then
cast(FILTER0.PART_KEY_VAL as decimal (21 , 0 ))
else null
end) >= 2450816)
and ((case
when
TBLS.TBL_NAME = 'store_sales'
and DBS.NAME = 'tpcds_bin_partitioned_orc_30000'
then
cast(FILTER0.PART_KEY_VAL as decimal (21 , 0 ))
else null
end) <= 2450817)));
{code}
For 2K partitions there is no measurable performance difference between
{code}
explain select ss_item_sk rowcount from store_sales where ss_sold_date_sk
between 2450816 and 2450817 ;
{code}
and
{code}
explain select ss_item_sk rowcount from store_sales where ss_sold_date_sk
between 2450816 and 2450817 and ss_ticket_number > 100000000 and ss_item_sk >
50;
{code}
> Hive metastore filter-by-expression is broken for non-partition expressions
> ---------------------------------------------------------------------------
>
> Key: HIVE-10122
> URL: https://issues.apache.org/jira/browse/HIVE-10122
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.14.0, 1.0.0, 1.1.0
> Reporter: Sergey Shelukhin
>
> See
> https://issues.apache.org/jira/browse/HIVE-10091?focusedCommentId=14382413&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14382413
> These two lines of code
> {noformat}
> // Replace virtual columns with nulls. See javadoc for details.
> prunerExpr = removeNonPartCols(prunerExpr, extractPartColNames(tab),
> partColsUsedInFilter);
> // Remove all parts that are not partition columns. See javadoc for
> details.
> ExprNodeDesc compactExpr = compactExpr(prunerExpr.clone());
> {noformat}
> are supposed to take care of this; I see there were bunch of changes to this
> code over some time, and now it appears to be broken.
> Thanks to [~thejas] for info.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)