[ 
https://issues.apache.org/jira/browse/HADOOP-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-15358:
--------------------------------------
    Description: 
Methods of SFTPFileSystem operate on poolable ChannelSftp instances. Some of 
these methods are chained together, so a single compound action ends up 
establishing multiple connections to the SFTP server. The affected methods are 
listed below:
 # mkdirs method
The public mkdirs method acquires a new ChannelSftp from the pool [1] and then 
recursively creates directories, checking beforehand whether each directory 
exists by calling the exists method [2], which delegates to the 
getFileStatus(ChannelSftp channel, Path file) method [3], and so on until a 
FileStatus instance is returned [4]. The resource leak occurs in the 
getWorkingDirectory method, which calls the getHomeDirectory method [5], which 
in turn establishes a new connection to the SFTP server instead of reusing the 
already established one. Because mkdirs is recursive, this results in a large 
number of connections being created.
 # open method [6]. This method returns an FSDataInputStream that wraps an 
SFTPInputStream, which does not return the acquired ChannelSftp instance to 
the pool but closes it instead [7]. As a result, another connection to the 
SFTP server has to be established when the next method is called on the 
FileSystem instance.
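
The mkdirs call chain from item 1 can be illustrated with a minimal sketch. 
This is not the Hadoop code: SketchChannel, SketchConnector and 
MkdirsChainSketch are hypothetical names, and the sketch assumes that every 
connect() call opens a fresh connection, which simplifies away the reuse logic 
of the real pool. It only contrasts a helper that opens its own connection 
with one that works on the caller's channel.

```java
/** Hypothetical stand-in for an SFTP channel; not the JSch ChannelSftp API. */
final class SketchChannel {
  String home() {
    return "/home/user";
  }
}

/** Hypothetical connector that only counts how many connections were opened. */
final class SketchConnector {
  int opened = 0;

  SketchChannel connect() {
    opened++;
    return new SketchChannel();
  }
}

final class MkdirsChainSketch {
  /** Leaky shape: opens its own connection although the caller holds one. */
  static String homeLeaky(SketchConnector connector) {
    return connector.connect().home();
  }

  /** Reusing shape: answers the same question on the caller's channel. */
  static String homeReusing(SketchChannel channel) {
    return channel.home();
  }

  /** Walks depth path components like a recursive mkdirs; returns connections opened. */
  static int mkdirs(int depth, boolean reuse) {
    SketchConnector connector = new SketchConnector();
    SketchChannel channel = connector.connect(); // the one connection mkdirs needs
    walk(connector, channel, depth, reuse);
    return connector.opened;
  }

  private static void walk(SketchConnector connector, SketchChannel channel,
                           int depth, boolean reuse) {
    if (depth == 0) {
      return;
    }
    // The existence check for this component resolves the home directory first.
    String home = reuse ? homeReusing(channel) : homeLeaky(connector);
    if (home.isEmpty()) {
      throw new IllegalStateException("unexpected empty home directory");
    }
    walk(connector, channel, depth - 1, reuse);
  }

  public static void main(String[] args) {
    System.out.println("leaky:   " + mkdirs(5, false));
    System.out.println("reusing: " + mkdirs(5, true));
  }
}
```

In this sketch the leaky variant opens one extra connection per path 
component, while the reusing variant stays at a single connection regardless 
of depth. Passing the already acquired channel down the call chain is one 
possible way to get that behaviour.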


[1] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658

[2] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321

[3] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202

[4] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290

[5] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640

[6] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504

[7] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123
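
The effect described for the open method in item 2 can be sketched in the same 
spirit. PooledChannel, ChannelPool, SketchStream and OpenCloseSketch are 
hypothetical names, not Hadoop or JSch classes, and the sketch assumes that 
the channel is disposed of when the stream is closed. It compares a stream 
that disconnects its channel with one that hands the channel back to the pool.

```java
import java.io.Closeable;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a pooled SFTP channel; not the JSch ChannelSftp API. */
final class PooledChannel {
  boolean connected = true;

  void disconnect() {
    connected = false;
  }
}

/** Hypothetical pool that counts how many connections it had to open. */
final class ChannelPool {
  private final Deque<PooledChannel> idle = new ArrayDeque<>();
  int opened = 0;

  PooledChannel acquire() {
    PooledChannel channel = idle.poll();
    if (channel == null) {
      opened++; // nothing idle, so a new connection is needed
      channel = new PooledChannel();
    }
    return channel;
  }

  void release(PooledChannel channel) {
    if (channel.connected) {
      idle.push(channel);
    }
  }
}

/** Stream-like wrapper; returnToPool selects between the two close behaviours. */
final class SketchStream implements Closeable {
  private final ChannelPool pool;
  private final PooledChannel channel;
  private final boolean returnToPool;

  SketchStream(ChannelPool pool, boolean returnToPool) {
    this.pool = pool;
    this.channel = pool.acquire();
    this.returnToPool = returnToPool;
  }

  @Override
  public void close() {
    if (returnToPool) {
      pool.release(channel); // channel stays usable for the next caller
    } else {
      channel.disconnect();  // channel is lost to the pool
    }
  }
}

final class OpenCloseSketch {
  /** Opens and closes n streams in sequence; returns connections opened. */
  static int openMany(int n, boolean returnToPool) {
    ChannelPool pool = new ChannelPool();
    for (int i = 0; i < n; i++) {
      try (SketchStream stream = new SketchStream(pool, returnToPool)) {
        // A real caller would read from the stream here.
      }
    }
    return pool.opened;
  }

  public static void main(String[] args) {
    System.out.println("closing channel:   " + openMany(4, false));
    System.out.println("returning to pool: " + openMany(4, true));
  }
}
```

With the closing variant every stream costs a new connection in this sketch, 
whereas the returning variant keeps reusing the first one.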

  was:
Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some 
methods of SFTPFileSystem are chained together resulting in establishing 
multiple connections to the SFTP server to accomplish one compound action, 
those methods are listed below:
 # mkdirs method
the public mkdirs method acquires a new ChannelSftp [from the 
pool|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658]]
and then recursively creates directories, checking for the directory existence 
beforehand by calling the method 
[exists|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321]
] which delegates to the getFileStatus(ChannelSftp channel, Path file) 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202]]
 and so on until it ends up in returning the [FilesStatus 
instance|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290]].
 The resource leakage occurs in the method getWorkingDirectory which calls the 
getHomeDirectory 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640]]
 which in turn establishes a new connection to the sftp server instead of using 
an already created connection. As the mkdirs method is recursive this results 
in creating a huge number of connections.
 # open 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504]].
 This method returns an instance of FSDataInputStream which consumes 
SFTPInputStream instance which doesn't return an acquired ChannelSftp instance 
back to the pool but instead it 
[closes|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123]]
 it. This leads to establishing another connection to an SFTP server when the 
next method is called on the FileSystem instance.

 


> SFTPConnectionPool connections leakage
> --------------------------------------
>
>                 Key: HADOOP-15358
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15358
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0
>            Reporter: Mikhail Pryakhin
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
