VitoMakarevich opened a new issue, #9848:
URL: https://github.com/apache/hudi/issues/9848

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   Recently we enabled the metadata table for one big table. In the test environment this table has ~700k partitions, and a typical insert touches ~600 of them. It is also worth noting that the problem is even more pronounced when the timeline server is enabled.
   The issue shows up as a slow `getting small files` stage - e.g. for 600 insert-affected partitions (600 tasks in this stage) it takes ~2-3 minutes, and for 12k affected partitions (12k tasks) it takes ~40 minutes.
   Important details (a configuration sketch follows this list):
   1. Metadata is enabled
   2. Timeline server is enabled
   3. Metadata table `hfile` size: about 40 MB
   4. Number of log files is minimal: "hoodie.metadata.compact.max.delta.commits" = "1"
   5. Only the file-listing index is enabled in the metadata table
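   For reference, a minimal sketch of the writer configuration in question (the exact option set is an assumption on my side, and the table name, `df`, and `basePath` are placeholders rather than our real job):
   ```scala
   // Sketch of the relevant writer options for the table described above.
   // Only the options relevant to this issue are shown.
   val hudiOptions = Map(
     "hoodie.table.name"                         -> "big_table",  // placeholder name
     "hoodie.datasource.write.operation"         -> "insert",
     "hoodie.metadata.enable"                    -> "true",       // metadata table on
     "hoodie.embed.timeline.server"              -> "true",       // embedded timeline server on
     "hoodie.metadata.compact.max.delta.commits" -> "1",          // compact the MDT every commit
     // only the file-listing index is used; other MDT indexes stay off
     "hoodie.metadata.index.column.stats.enable" -> "false",
     "hoodie.metadata.index.bloom.filter.enable" -> "false"
   )

   // `df` and `basePath` are whatever the actual job uses; shown only to anchor the options.
   df.write.format("hudi").options(hudiOptions).mode("append").save(basePath)
   ```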
   I dug into this quite a bit and enabled detailed logging, and this is what I think may be the issue:
   In the logs for "Metadata read for %s keys took [baseFileRead, logMerge] %s ms", half of the values are single-digit numbers and half are > 1000, e.g. `Metadata read for 1 keys took [baseFileRead, logMerge] [0, 12456] ms`.
   There are also logs like
   `Updating metadata metrics (basefile_read.totalDuration=12155)`
   `Updating metadata metrics (lookup_files.totalDuration=12156)`
   So my suspicion is that, given such a large metadata HFile (~40 MB), the HFile lookup is suboptimal. Are you aware of any similar issue (in 0.12.1 or 0.12.2)? As far as I can see, the code seeks up to a partition path; could it be that the readers are somehow not being reused well, so these 40 MB files are seeked again and again thousands of times?
   As a workaround we turned off the embedded timeline server; as I understand it, the same lookups are then done on the executors, which is less problematic since the parallelism is much higher.
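   For completeness, the workaround is just one extra option on top of the sketch above (again only a sketch, assuming the same write path):
   ```scala
   // Workaround we applied: keep the metadata table on, but stop routing the
   // file-index lookups through the driver-side embedded timeline server.
   val workaroundOptions = hudiOptions ++ Map(
     "hoodie.embed.timeline.server" -> "false"
   )

   df.write.format("hudi").options(workaroundOptions).mode("append").save(basePath)
   ```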
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   Generating a large set of partitions (enough for a 30-40 MB metadata HFile) and then running an insert into thousands of partitions should reproduce it, as in the sketch below.
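   A rough sketch of such a reproduction (partition counts, table name, record key, and path are illustrative placeholders):
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions._

   val spark = SparkSession.builder()
     .appName("hudi-mdt-small-files-repro")
     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     .getOrCreate()

   val basePath = "s3a://some-bucket/hudi/repro_table" // placeholder path

   // Step 1: seed the table with a very large number of partitions so the
   // metadata table's files index grows to tens of MB.
   val seed = spark.range(0L, 700000L)
     .withColumn("part", col("id") % 700000)
     .withColumn("value", rand())

   // Step 2: insert into thousands of existing partitions in one commit and
   // time the "getting small files" stage.
   val incremental = spark.range(0L, 12000L)
     .withColumn("part", col("id") % 12000)
     .withColumn("value", rand())

   Seq(seed, incremental).foreach { df =>
     df.write.format("hudi")
       .option("hoodie.table.name", "repro_table")
       .option("hoodie.datasource.write.recordkey.field", "id")
       .option("hoodie.datasource.write.partitionpath.field", "part")
       .option("hoodie.datasource.write.operation", "insert")
       .option("hoodie.metadata.enable", "true")
       .option("hoodie.metadata.compact.max.delta.commits", "1")
       .mode("append")
       .save(basePath)
   }
   ```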
   
   **Expected behavior**
   
   `getting small files` should not take so long. Without the metadata table and the embedded timeline server it takes ~40 sec to check 12k partitions, while with metadata it is 40+ minutes.
   
   **Environment Description**
   
   * Hudi version : 0.12.1-0.12.2
   
   * Spark version : 3.3.0-3.3.1
   
   * Hive version :
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : yes
   
   
   **Additional context**
   I can try to put together a reproduction if you are not already aware of an issue like this.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   

