[ 
https://issues.apache.org/jira/browse/HUDI-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsha Teja Kanna updated HUDI-3066:
------------------------------------
    Description: 
After the 'metadata table' is enabled, file listing takes a long time.

If metadata is also enabled on the reader side, each file listing task takes 
even longer.

Existing tables (COW) have inline clustering enabled and have many replace 
commits.

Logs suggest the delay is in the resetFileGroupsReplaced function of 
view.AbstractTableFileSystemView.

 

2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms to 
read  136 instants, 9731 replaced file groups
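
To compare this timing across tables or runs, the log line above can be reduced to per-instant figures. The following is only a rough sketch: the helper name parse_replaced_timing is hypothetical, and the regular expression assumes the message layout of the single sample quoted in this report, which may differ in other Hudi versions.

```python
import re

# Matches the "Took N ms to read N instants, N replaced file groups" message.
# The layout is assumed from the one sample line in this report.
PATTERN = re.compile(
    r"Took\s+(?P<ms>\d+)\s+ms\s+to\s+read\s+(?P<instants>\d+)\s+instants,\s+"
    r"(?P<groups>\d+)\s+replaced\s+file\s+groups"
)


def parse_replaced_timing(line):
    """Return (elapsed_ms, instants, replaced_file_groups), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return int(m["ms"]), int(m["instants"]), int(m["groups"])


sample = (
    "2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: "
    "Took 4118 ms to read  136 instants, 9731 replaced file groups"
)

ms, instants, groups = parse_replaced_timing(sample)
# Average cost per replace instant and average replaced file groups per instant.
print(f"{ms / instants:.1f} ms per instant, "
      f"{groups / instants:.1f} replaced file groups per instant")
```

For the sample line this works out to roughly 30 ms per instant, so the total grows with the number of replace commits on the timeline.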

Sample file listing tasks

!Screen Shot 2021-12-18 at 6.16.29 PM.png!

 

  was:
After the 'metadata table' is enabled, file listing takes a long time.

Existing tables (COW) have inline clustering enabled and have many replace 
commits.

Logs suggest the delay is in the resetFileGroupsReplaced function of 
view.AbstractTableFileSystemView.

 

2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms to 
read  136 instants, 9731 replaced file groups

Sample file listing tasks

!Screen Shot 2021-12-18 at 6.16.29 PM.png!

 


> Very slow file listing after enabling metadata for existing tables in 0.10.0 
> release
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-3066
>                 URL: https://issues.apache.org/jira/browse/HUDI-3066
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: EMR 6.4.0
> Hudi version : 0.10.0
>            Reporter: Harsha Teja Kanna
>            Priority: Critical
>              Labels: performance
>         Attachments: Screen Shot 2021-12-18 at 6.16.29 PM.png
>
>
> After the 'metadata table' is enabled, file listing takes a long time.
> If metadata is also enabled on the reader side, each file listing task takes 
> even longer.
> Existing tables (COW) have inline clustering enabled and have many replace 
> commits.
> Logs suggest the delay is in the resetFileGroupsReplaced function of 
> view.AbstractTableFileSystemView.
>  
> 2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms 
> to read  136 instants, 9731 replaced file groups
> Sample file listing tasks
> !Screen Shot 2021-12-18 at 6.16.29 PM.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
