rshanmugam1 commented on issue #2609:
URL: https://github.com/apache/hudi/issues/2609#issuecomment-787403720
@lw309637554 Thanks for your response.
**_1. about first attempt parquet is 23 secs, but hudi is 40 secs. i see
metadata init cost some time in the log._**
Yes, the two biggest time costs are both in metadata loading. Is that
expected, or can anything be optimized?
About 20 sec is spent between these two log lines:
`2021-02-27T04:27:24.714Z INFO hive-hive-18
org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Total paths to process
after hoodie filter 691`
`2021-02-27T04:27:45.364Z INFO hive-hive-17
org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Reading hoodie metadata
from path s3a://my-test-bucket/tmp/ramesh/hudi_0_7_cl2/sample_data'=`
Another 15 sec or so goes between these two:
`2021-02-27T04:27:46.360Z INFO hive-hive-17
org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Total paths to process
after hoodie filter 623`
`2021-02-27T04:28:02.931Z DEBUG query-execution-16
io.prestosql.execution.StageStateMachine Stage
20210227_042722_00016_9dket.2 is SCHEDULED`
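For anyone who wants to check the gaps, they can be computed from the timestamps in the log lines above. This is only a rough sketch in Python. The helper name `elapsed_seconds` is my own and is not part of Hudi or Presto, and it assumes the timestamps are UTC with millisecond precision, as they appear in the log.

```python
from datetime import datetime

# Timestamp layout as it appears in the log lines above (assumed UTC, "Z" suffix).
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps (hypothetical helper)."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


# Gap between "Total paths to process ... 691" and "Reading hoodie metadata".
first_gap = elapsed_seconds("2021-02-27T04:27:24.714Z", "2021-02-27T04:27:45.364Z")

# Gap between "Total paths to process ... 623" and the stage being SCHEDULED.
second_gap = elapsed_seconds("2021-02-27T04:27:46.360Z", "2021-02-27T04:28:02.931Z")

print(f"first gap: {first_gap:.3f} s, second gap: {second_gap:.3f} s")
```

The two gaps together account for most of the 40 secs seen on the Hudi side.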
**2. about second attempt parquet is very fast, maybe presto support the
parquet format local cache.**
This does look like local caching. I will look into how Presto's local
cache works.
**3. also parquet and hudi table result is not equal?**
Both are the same dataset. Sorry, the order of the results was not
maintained. The Hudi dataset has 151 fewer rows because duplicate rows were
eliminated during ingestion.