[ 
https://issues.apache.org/jira/browse/IMPALA-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948745#comment-16948745
 ] 

Tim Armstrong commented on IMPALA-9035:
---------------------------------------

[~Freedom007] will you submit a patch to Impala? Would be great to get this 
improvement.

> Simplify cast string to timestamp
> ---------------------------------
>
>                 Key: IMPALA-9035
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9035
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: 徐洲
>            Priority: Critical
>
> Some BI tools generate compatible sqls to cast string to timestamp:
> cast(unix_timestamp('timestr', 'fmt') as timestamp) or 
> cast(unix_timestamp('timestr') as timestamp)
> The internal type change in impala: 
> StringVal->TimestampValue->BigIntVal→TimestampValue→TimestampVal
> In earlier 2.x versions,  casting TimestampValue to BigIntVal using libc 
> functions which is more expensive than current design with CCTZ.
> However, the cast to BIgIntVal seems to be redundant. We can simplify 
> cast(unix_timestamp('timestr', 'fmt') as timestamp) to 
> to_timestamp('timestr', 'fmt'), simplify cast(unix_timestamp('timestr') as 
> timestamp) to cast('timestr' as timestamp).
> I managed to modify the fe to support such expr rewriting.
> Benefiting from the rewriting, the query time cost is reduced from 2.52s to 
> 2.02s.
> Here is the sql:
> SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM 
> `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) 
> WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= 
> cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS 
> TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY 
> `t2`.`phy_category1_name` order by d0;
> Query: SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` 
> FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = 
> `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= 
> cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS 
> TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY 
> `t2`.`phy_category1_name` order by d0
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to