[ 
https://issues.apache.org/jira/browse/FLINK-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191887#comment-16191887
 ] 

ASF GitHub Bot commented on FLINK-7643:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/4776

    [FLINK-7643] [core] Rework FileSystem loading to use factories

    ## What is the purpose of the change
    
    This change reworks the loading and instantiation of File System objects 
(including file systems supported via Hadoop) to use factories. 
    
    This makes sure that configurations (Flink and possibly Hadoop) are loaded 
once (on TaskManager / JobManager startup) and file system instances are 
properly reused by scheme and authority. That way, this change 
    
    This change is also a prerequisite for an extensible file system loading 
mechanism via a service framework.
    
    ## Brief change log
    
      - The special-case configuration of the `FileSystem` class to set the 
"default file system scheme" is extended to a generic configuration call.
      - The directory of directly supported file systems is changed from 
classes (instantiated via reflection) to factories.
      - These factories are also configured when the `FileSystem` is configured.
      - The Hadoop file system factory loads the Hadoop configuration once when 
being configured and applies it to all subsequently instantiated file systems.
      - File systems supported via Hadoop are now properly cached and not 
reloaded, reinstantiated, and reconfigured on each access.
      - This also removes a lot of legacy code for finding Hadoop file 
system implementations.
      - The `FileSystem` class is much cleaner now because a lot of the Hadoop 
FS
      - All file systems now eagerly initialize their settings, rather than 
dividing that between the constructor and the `initialize()` method.
      - This also factors out a lot of the special treatment of Hadoop file 
systems and simply makes the Hadoop File System factory the default fallback 
factory.
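
The loading scheme in the change log above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `FileSystemRegistrySketch`, `Fs`, `FsFactory`, and `CountingFactory` are hypothetical names, and the configuration is assumed to be a plain string map. It shows factories that are configured once, instances that are cached by scheme and authority, and a fallback factory for schemes without a registered factory.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; these are not Flink's actual classes. */
final class FileSystemRegistrySketch {

    /** Hypothetical stand-in for a file system instance. */
    interface Fs {
        String describe();
    }

    /** Hypothetical factory: configured once, then asked to create instances. */
    interface FsFactory {
        void configure(Map<String, String> config);

        Fs create(URI uri);
    }

    /** Toy factory that counts calls so the caching behavior is observable. */
    static final class CountingFactory implements FsFactory {
        int configureCalls;
        int createCalls;
        private Map<String, String> config = new HashMap<>();

        @Override
        public void configure(Map<String, String> config) {
            configureCalls++;
            this.config = new HashMap<>(config);
        }

        @Override
        public Fs create(URI uri) {
            createCalls++;
            final String description =
                    uri.getScheme() + "://" + uri.getAuthority() + " " + config;
            return () -> description;
        }
    }

    private final Map<String, FsFactory> factories = new HashMap<>();
    private final Map<String, Fs> cache = new HashMap<>();
    private final FsFactory fallback;

    FileSystemRegistrySketch(FsFactory fallback) {
        this.fallback = fallback;
    }

    synchronized void register(String scheme, FsFactory factory) {
        factories.put(scheme, factory);
    }

    /** Meant to be called once at process startup; also drops stale instances. */
    synchronized void configure(Map<String, String> config) {
        for (FsFactory factory : factories.values()) {
            factory.configure(config);
        }
        fallback.configure(config);
        cache.clear();
    }

    /** Returns the cached instance for the URI's scheme and authority. */
    synchronized Fs get(URI uri) {
        if (uri.getScheme() == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
        String key = uri.getScheme() + "://" + authority;
        Fs fs = cache.get(key);
        if (fs == null) {
            // Schemes without a registered factory go to the fallback factory.
            fs = factories.getOrDefault(uri.getScheme(), fallback).create(uri);
            cache.put(key, fs);
        }
        return fs;
    }
}
```

In this sketch, two lookups such as `hdfs://nn1:8020/a` and `hdfs://nn1:8020/b` return the same instance, while a different authority or scheme creates a new one.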
    
    ## Verifying this change
    
    Reworked some tests to cover the behavior of this change:
      - `flink-core/src/test/java/org/apache/flink/configuration/FilesystemSchemeConfigTest.java`
      - `flink-runtime/src/test/java/org/apache/flink/runtime/taskmanager/TaskManagerConfigurationTest.java`
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (**yes** / no)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / 
**no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
    
    *Note:* The breaking changes made to the `@Public` class `FileSystem` do not 
affect methods that are meant for users, only the setup configuration.
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink fs_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4776
    
----
commit ba312e137c7af1d2c331c5231b5b0ae3e0401549
Author: Stephan Ewen <[email protected]>
Date:   2017-10-02T12:34:27Z

    [FLINK-7643] [core] Misc. cleanups in FileSystem
    
      - Simplify access to local file system
      - Use a fair lock for all FileSystem.get() operations
      - Robust fallback to local fs for default scheme (avoids URI parsing error 
on Windows)
      - Deprecate 'getDefaultBlockSize()'
      - Deprecate create(...) with block sizes and replication factor, which is 
not applicable to many FS
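
For readers unfamiliar with the term, a "fair lock" in Java usually means a `java.util.concurrent.locks.ReentrantLock` created with the fairness flag set, which favors the longest-waiting thread when the lock is released. The sketch below only illustrates that idea; `FairLockSketch` and `withLock` are hypothetical names, and the commit message does not say how the lock is actually wired into `FileSystem.get()`.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Illustrative sketch only; not the code from the commit. */
final class FairLockSketch {

    // 'true' requests the fair ordering policy: waiting threads acquire
    // the lock roughly in arrival order instead of barging.
    private static final ReentrantLock LOCK = new ReentrantLock(true);

    /** Runs the action while holding the lock and always releases it. */
    static <T> T withLock(Supplier<T> action) {
        LOCK.lock();
        try {
            return action.get();
        } finally {
            LOCK.unlock();
        }
    }

    static boolean isFair() {
        return LOCK.isFair();
    }

    static boolean isLocked() {
        return LOCK.isLocked();
    }
}
```

Fair locks typically trade some throughput for more predictable waiting times, which can matter when many threads request file systems at once.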

commit 8130d874b8b823f22964f435bf1a1d1bd39774d6
Author: Stephan Ewen <[email protected]>
Date:   2017-10-02T14:25:18Z

    [FLINK-7643] [core] Rework FileSystem loading to use factories
    
    This makes sure that configurations are loaded once and file system 
instances are
    properly reused by scheme and authority.
    
    This also factors out a lot of the special treatment of Hadoop file systems 
and simply
    makes the Hadoop File System factory the default fallback factory.

commit c652f1322044f9715a0d94fa21ec853769be9a78
Author: Stephan Ewen <[email protected]>
Date:   2017-10-02T14:30:07Z

    [FLINK-7643] [core] Drop eager checks for file system support.
    
    Some places validate if the file URIs are resolvable on the client. This 
leads to
    problems when file systems are not accessible from the client, when the 
full libraries for
    the file systems are not present on the client (for example often the case 
in cloud setups),
    or when the configuration on the client is different from the 
nodes/containers that will
    execute the application.

----


> Configure FileSystems only once
> -------------------------------
>
>                 Key: FLINK-7643
>                 URL: https://issues.apache.org/jira/browse/FLINK-7643
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.4.0
>            Reporter: Ufuk Celebi
>            Assignee: Stephan Ewen
>
> HadoopFileSystem always reloads GlobalConfiguration, which potentially leads 
> to a lot of noise in the logs, because this happens on each checkpoint.
> Instead, file systems should be configured once upon process startup, when 
> the configuration is loaded.
> This will also increase efficiency of checkpoints, as it avoids redundant 
> parsing for each data chunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
