[ https://issues.apache.org/jira/browse/IMPALA-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948745#comment-16948745 ]
Tim Armstrong commented on IMPALA-9035: --------------------------------------- [~Freedom007] will you submit a patch to Impala? Would be great to get this improvement. > Simplify cast string to timestamp > --------------------------------- > > Key: IMPALA-9035 > URL: https://issues.apache.org/jira/browse/IMPALA-9035 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Reporter: 徐洲 > Priority: Critical > > Some BI tools generate compatible sqls to cast string to timestamp: > cast(unix_timestamp('timestr', 'fmt') as timestamp) or > cast(unix_timestamp('timestr') as timestamp) > The internal type change in impala: > StringVal->TimestampValue->BigIntVal→TimestampValue→TimestampVal > In earlier 2.x versions, casting TimestampValue to BigIntVal using libc > functions which is more expensive than current design with CCTZ. > However, the cast to BIgIntVal seems to be redundant. We can simplify > cast(unix_timestamp('timestr', 'fmt') as timestamp) to > to_timestamp('timestr', 'fmt'), simplify cast(unix_timestamp('timestr') as > timestamp) to cast('timestr' as timestamp). > I managed to modify the fe to support such expr rewriting. > Benefiting from the rewriting, the query time cost is reduced from 2.52s to > 2.02s. > Here is the sql: > SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM > `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) > WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= > cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS > TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY > `t2`.`phy_category1_name` order by d0; > Query: SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` > FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = > `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= > cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS > TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY > `t2`.`phy_category1_name` order by d0 > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org