[ 
https://issues.apache.org/jira/browse/FLINK-33508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796058#comment-17796058
 ] 

Anika Kelhanka commented on FLINK-33508:
----------------------------------------

Proposed approach for updating Flink's History Server logic so that it can 
fetch logs and data from multiple directories at a time, using a path with 
wildcards (i.e., a glob pattern) on HadoopFileSystem locations:

1. Flink's {{HistoryServerArchiveFetcher}} class currently uses 
HadoopFileSystem's {{listStatus}} API method, which does not resolve 
patterns/wildcards in the history server file path.
2. Introduce a new method {{globStatus(Path pathPattern)}} in Flink's 
{{FileSystem}} API.
3. Implement the new method in Flink's HadoopFileSystem class such that it 
internally calls [Hadoop's globStatus 
method|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L2217].
4. Finally, point {{HistoryServerArchiveFetcher}} to the new {{globStatus()}} 
API instead of {{listStatus()}} for HadoopFileSystem.
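
The difference between listing a single directory and resolving a glob can be 
sketched as follows. This is only an illustration under stated assumptions: it 
uses the JDK's {{java.nio}} glob matcher as a local stand-in rather than Flink's 
or Hadoop's FileSystem classes, and the class name {{GlobStatusSketch}}, the 
directory layout ({{cluster-a}}, {{cluster-b}}, {{completed-jobs}}) and the 
{{demo()}} helper are all hypothetical. In the actual change, step 3 would 
delegate to Hadoop's {{FileSystem.globStatus(Path)}} instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GlobStatusSketch {

    // Hypothetical stand-in for the proposed globStatus(Path pathPattern):
    // returns every path under root whose path relative to root matches the glob.
    // The real implementation would call Hadoop's FileSystem.globStatus instead.
    static List<Path> globStatus(Path root, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.map(root::relativize)
                    .filter(matcher::matches)
                    .map(root::resolve)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway layout with two per-cluster archive directories and one
    // unrelated directory, then returns how many directories the pattern resolves to.
    static int demo() {
        try {
            Path root = Files.createTempDirectory("history-server");
            Files.createDirectories(root.resolve("cluster-a/completed-jobs"));
            Files.createDirectories(root.resolve("cluster-b/completed-jobs"));
            Files.createDirectories(root.resolve("scratch/tmp"));
            return globStatus(root, "cluster-*/completed-jobs").size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("matched archive directories: " + demo());
    }
}
```

A plain directory listing would treat {{cluster-*/completed-jobs}} as a literal 
path and find nothing, whereas the glob variant expands it to one archive 
directory per cluster, which is what the fetcher needs in a multi-cluster setup.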

> Support for wildcard paths in Flink History Server for multi cluster 
> environment
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-33508
>                 URL: https://issues.apache.org/jira/browse/FLINK-33508
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Jayadeep Jayaraman
>            Assignee: Jayadeep Jayaraman
>            Priority: Major
>              Labels: pull-request-available
>
> In cloud environments, users typically create multiple ephemeral clusters 
> and want a single history server to look at historical jobs.
> To implement this, the history server needs to support wildcard paths; this 
> change adds that support.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)