[GitHub] [hudi] rshanmugam1 commented on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-03-31 Thread GitBox


rshanmugam1 commented on issue #2609:
URL: https://github.com/apache/hudi/issues/2609#issuecomment-811582823


   Thanks very much, Sudha and team.
   
   I will look in that direction and confirm whether that is the cause. If it 
is, any pointers on how to fix it, or any reference to how it was fixed in the 
Facebook version of Presto, would be helpful, so that we can try the same in 
Trino.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] rshanmugam1 commented on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-03-01 Thread GitBox


rshanmugam1 commented on issue #2609:
URL: https://github.com/apache/hudi/issues/2609#issuecomment-788314281


   @pengzhiwei2018
   
   It is a COW table only. Even after commenting out the annotation 
(`//@UseRecordReaderFromInputFormat`), I don't see any improvement in my case.








[GitHub] [hudi] rshanmugam1 commented on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-02-27 Thread GitBox


rshanmugam1 commented on issue #2609:
URL: https://github.com/apache/hudi/issues/2609#issuecomment-787403720


   @lw309637554 thanks for your response.
   
   **_1. On the first attempt Parquet takes 23 secs, but Hudi takes 40 secs. I 
see metadata init costing some time in the log._**
   Yes, the 2 major costs are in metadata loading. Is that expected, or can 
anything be optimized?
   
   About 20 sec is spent in this section:
   `2021-02-27T04:27:24.714Z INFO hive-hive-18 
org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Total paths to process 
after hoodie filter 691`
   `2021-02-27T04:27:45.364Z INFO hive-hive-17 
org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Reading hoodie metadata 
from path s3a://my-test-bucket/tmp/ramesh/hudi_0_7_cl2/sample_data'=`
   
   Another 15 sec or so goes here:
   `2021-02-27T04:27:46.360Z INFO hive-hive-17 
org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Total paths to process 
after hoodie filter 623`
   `2021-02-27T04:28:02.931Z DEBUG query-execution-16 
io.prestosql.execution.StageStateMachine Stage 
20210227_042722_00016_9dket.2 is SCHEDULED`
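   
   As a rough cross-check of those two gaps, the elapsed time between 
consecutive log lines can be computed from their timestamps. This is only a 
sketch: the timestamps are copied from the log lines quoted above, while the 
labels and the helper names (`parse`, `gaps`) are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the quoted log lines; labels are illustrative only.
events = [
    ("paths after hoodie filter (691)", "2021-02-27T04:27:24.714Z"),
    ("reading hoodie metadata", "2021-02-27T04:27:45.364Z"),
    ("paths after hoodie filter (623)", "2021-02-27T04:27:46.360Z"),
    ("stage SCHEDULED", "2021-02-27T04:28:02.931Z"),
]


def parse(ts):
    """Parse an ISO-like UTC timestamp with millisecond precision."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    out = []
    for (prev_label, prev_ts), (label, ts) in zip(events, events[1:]):
        delta = (parse(ts) - parse(prev_ts)).total_seconds()
        out.append((prev_label, label, delta))
    return out


for start, end, seconds in gaps(events):
    print(f"{seconds:7.3f}s  {start} -> {end}")
```

   The first and third gaps are the two sections called out above; the short 
middle gap is the time between the two log groups.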
   
   **2. On the second attempt Parquet is very fast; maybe Presto supports a 
local cache for the Parquet format.**
   It does seem like local caching. I will look into how the Presto local 
cache works.
   
   **3. Also, the Parquet and Hudi table results are not equal?**
   Both are the same dataset. Sorry, the order of the results was not 
maintained. The Hudi dataset has 151 fewer rows because duplicate rows were 
eliminated during ingestion.
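   
   One way to check a row-count difference like this without depending on 
result order is a multiset comparison of the two result sets. The sketch below 
uses made-up toy rows, not the actual query output, and `diff_rows` is a 
hypothetical helper name.

```python
from collections import Counter


def diff_rows(parquet_rows, hudi_rows):
    """Rows (with counts) present in parquet_rows but missing from hudi_rows.

    Order is ignored; duplicates are counted, so a row that appears twice in
    Parquet and once in Hudi shows up with a count of 1.
    """
    return Counter(parquet_rows) - Counter(hudi_rows)


# Toy data for illustration only: one duplicate row exists on the Parquet side.
parquet_rows = [("a", 1), ("b", 2), ("b", 2), ("c", 3)]
hudi_rows = [("c", 3), ("b", 2), ("a", 1)]

extra = diff_rows(parquet_rows, hudi_rows)
print(sum(extra.values()), dict(extra))
```

   If the only difference is deduplication, the total of the counts should 
equal the gap in row counts between the two tables.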
   


