[jira] [Commented] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044793#comment-13044793 ]

Hudson commented on HDFS-1965:

Integrated in Hadoop-Hdfs-22-branch #61 (See [https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/61/])

> IPCs done using block token-based tickets can't reuse connections
> -----------------------------------------------------------------
>
>          Key: HDFS-1965
>          URL: https://issues.apache.org/jira/browse/HDFS-1965
>      Project: Hadoop HDFS
>   Issue Type: Bug
>   Components: security
>     Reporter: Todd Lipcon
>     Assignee: Todd Lipcon
>     Priority: Critical
>      Fix For: 0.22.0
>  Attachments: hdfs-1965-0.22.txt, hdfs-1965.txt, hdfs-1965.txt, hdfs-1965.txt
>
> This is the reason that TestFileConcurrentReaders has been failing a lot.
> Reproducing a comment from HDFS-1057:
> The test has a thread which continually re-opens the file which is being written to. Since the file is in the middle of being written, it makes an RPC to the DataNode in order to determine the visible length of the file. This RPC is authenticated using the block token which came back in the LocatedBlocks object as the security ticket.
> When this RPC hits the IPC layer, it looks at its existing connections and sees none that can be re-used, since the block token differs between the two requesters. Hence, it reconnects, and we end up with hundreds or thousands of IPC connections to the datanode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
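[Editor's note] The failure mode described above can be sketched in a few lines: if the IPC connection-cache key includes the per-block security ticket, two requests to the same DataNode with different block tokens can never share a connection. This is an illustrative model, not the actual Hadoop IPC code; all class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ConnectionCacheSketch {
    // Key modeled on the behavior described in the issue: remote address + ticket.
    static final class ConnectionId {
        final String address;
        final String ticket; // stand-in for the serialized block token
        ConnectionId(String address, String ticket) {
            this.address = address;
            this.ticket = ticket;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof ConnectionId)) return false;
            ConnectionId other = (ConnectionId) o;
            return address.equals(other.address) && ticket.equals(other.ticket);
        }
        @Override public int hashCode() { return Objects.hash(address, ticket); }
    }

    final Map<ConnectionId, Object> connections = new HashMap<>();

    // Returns an existing connection only when address AND ticket match.
    Object getConnection(String address, String ticket) {
        return connections.computeIfAbsent(
            new ConnectionId(address, ticket), k -> new Object());
    }

    public static void main(String[] args) {
        ConnectionCacheSketch cache = new ConnectionCacheSketch();
        // Same DataNode, two different block tokens: two distinct connections.
        Object c1 = cache.getConnection("dn1:50020", "token-for-block-1");
        Object c2 = cache.getConnection("dn1:50020", "token-for-block-2");
        assert c1 != c2 : "different block tokens defeat connection reuse";
        // Same ticket reuses the cached connection.
        Object c3 = cache.getConnection("dn1:50020", "token-for-block-1");
        assert c1 == c3 : "same ticket reuses the cached connection";
    }
}
```

Under this model, a reader that fetches a fresh block token on every re-open creates a fresh connection on every re-open, which is exactly the unbounded connection growth the test exhibits.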
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038076#comment-13038076 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1965:

Okay, I'm fine with it since it is only a temporary fix. +1, the 0.22 patch looks good.
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038072#comment-13038072 ]

Todd Lipcon commented on HDFS-1965:

I think in trunk it's not possible, since the connection is only lazily opened by the actual RPC to the DataNode. Then it won't close, since there's a call outstanding. In 0.22, it's possible that it will open one connection for the getProtocolVersion() call and a second one for the actual RPC. Unless I'm missing something, that should only be an efficiency issue and not a correctness issue. Do you agree?
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038055#comment-13038055 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1965:

A question came up: by setting maxidletime to 0, is there a race condition where the timeout occurs before the first call, i.e. the proxy is closed before the first call?
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037684#comment-13037684 ]

Todd Lipcon commented on HDFS-1965:

Nicholas: can you please take a quick look at the 0.22 patch?
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037355#comment-13037355 ]

Hudson commented on HDFS-1965:

Integrated in Hadoop-Hdfs-trunk #673 (See [https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/673/])

HDFS-1965. IPCs done using block token-based tickets can't reuse connections. Contributed by Todd Lipcon.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125605
Files:
* /hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/DFSTestUtil.java
* /hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/security/token/block/TestBlockToken.java
* /hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java
* /hadoop/hdfs/trunk/CHANGES.txt
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037215#comment-13037215 ]

Hadoop QA commented on HDFS-1965:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12479975/hdfs-1965-0.22.txt
against trunk revision 1125605.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/607//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037214#comment-13037214 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1965:

Hey Todd, please wait for Hadoop QA before committing the patch. It sometimes catches unexpected problems.
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037209#comment-13037209 ]

Hadoop QA commented on HDFS-1965:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12479967/hdfs-1965.txt
against trunk revision 1125217.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
  org.apache.hadoop.hdfs.TestHDFSTrash
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/605//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/605//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/605//console
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037205#comment-13037205 ]

Hudson commented on HDFS-1965:

Integrated in Hadoop-Hdfs-trunk-Commit #677 (See [https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/677/])

HDFS-1965. IPCs done using block token-based tickets can't reuse connections. Contributed by Todd Lipcon.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125605
Files:
* /hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/DFSTestUtil.java
* /hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/security/token/block/TestBlockToken.java
* /hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java
* /hadoop/hdfs/trunk/CHANGES.txt
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037195#comment-13037195 ]

Todd Lipcon commented on HDFS-1965:

Committed to trunk after re-running the test. It doesn't apply directly to 0.22. Let me format a patch there and upload it soon.
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037113#comment-13037113 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1965:

Okay, you mean this is a temporary fix. Sounds good. Some comments on the patch:

- Instead of changing the method to public, we could add a utility method, say in {{DFSTestUtil}}, for invoking the package-private method.
{code}
+  /** Public only for tests */
+  public static ClientDatanodeProtocol createClientDatanodeProtocolProxy(
{code}
- How about putting {{confWithNoIpcIdle}} as a member field?
- Please use {{CommonConfigurationKeysPublic.IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY}} for "ipc.client.connection.maxidletime".
- Please add a comment saying that this is a temporary fix and the corresponding code should be removed once {{stopProxy(..)}} is fixed.
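[Editor's note] The {{DFSTestUtil}} suggestion above is an instance of the same-package test-shim pattern: instead of widening a package-private factory method to public, a helper in the same package (but shipped under the test source tree) re-exports it for tests only. A minimal sketch follows; {{ProtocolFactory}}, {{TestShim}}, and the String return type are illustrative stand-ins, not the real {{ClientDatanodeProtocol}} plumbing.

```java
// Production class: the factory method stays package-private.
class ProtocolFactory {
    static String createClientDatanodeProtocolProxy(String datanode) {
        return "proxy->" + datanode; // placeholder for real proxy construction
    }
}

// Test utility in the same package but under src/test (a DFSTestUtil-style
// helper): tests get access, production visibility is unchanged.
public class TestShim {
    public static String createProxyForTest(String datanode) {
        return ProtocolFactory.createClientDatanodeProtocolProxy(datanode);
    }

    public static void main(String[] args) {
        assert "proxy->dn1:50020".equals(createProxyForTest("dn1:50020"));
    }
}
```

The benefit is that the production API surface does not grow just to satisfy a test, which is exactly the concern raised in the review comment.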
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037104#comment-13037104 ]

Todd Lipcon commented on HDFS-1965:

bq. Todd, just saw your comments. I think this is the real bug: we should fix stopProxy(..) instead of changing max idle time.

Yes, you're probably right. But maybe we can use this as a stop-gap for 0.22 while we work on the stopProxy fix in trunk? I'm afraid the stopProxy stuff will be complicated - that IPC code is kind of spaghetti.
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037095#comment-13037095 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1965:

> Turns out the reason that RPC.stopProxy isn't effective in "real life" is that the WritableRpcEngine "Client" objects are cached in ClientCache with keys that aren't tied to principals. So, stopProxy doesn't actually cause the connection to disconnect. I'm not sure if that's a bug or by design.

Todd, just saw your comments. I think this is the real bug: we should fix {{stopProxy(..)}} instead of changing max idle time.
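[Editor's note] The quoted observation can be sketched as follows: if the RPC engine hands every proxy a shared, cached Client keyed by something coarse (here, just the socket factory), then "stopping" one proxy merely drops a reference, and the underlying connection stays open until its idle timeout. This is an illustrative model of the behavior described, not the actual ClientCache code; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientCacheSketch {
    static final class Client {
        int refCount = 0;
        boolean connected = true;
        // Only the last reference actually tears down connections.
        void stop() {
            if (--refCount == 0) connected = false;
        }
    }

    final Map<String, Client> cache = new HashMap<>();

    // Cache key ignores the user/ticket, so all proxies share one Client.
    Client getClient(String socketFactoryKey) {
        Client c = cache.computeIfAbsent(socketFactoryKey, k -> new Client());
        c.refCount++;
        return c;
    }

    public static void main(String[] args) {
        ClientCacheSketch sketch = new ClientCacheSketch();
        Client c1 = sketch.getClient("default-socket-factory");
        Client c2 = sketch.getClient("default-socket-factory");
        assert c1 == c2 : "both proxies share the cached Client";
        c1.stop();           // one proxy's stopProxy(..)
        assert c1.connected; // connection (and its idle timer) still alive
    }
}
```

Under this model, per-proxy stopProxy calls cannot release per-token connections, which is consistent with the test still leaking connections despite the finally-block.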
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037090#comment-13037090 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1965:

It seems that the reasons {{TestFileConcurrentReader}} fails are:
- The test opens many files within a short period of time, say a few seconds.
- {{DFSClient}} creates a proxy for each open.
- Since the default ipc.client.connection.maxidletime is 10 seconds, the proxies are not yet closed.
- Therefore, {{TestFileConcurrentReader}} fails with runtime exceptions (out of descriptors?).

Todd, do you agree?

*Questions*: We already have {{RPC.stopProxy(cdp)}} in a finally-block. Why is the resource still not released? Is it because {{TestFileConcurrentReader}} opens files so fast that the finally-block is not yet reached? Or does {{RPC.stopProxy(..)}} not work?
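[Editor's note] The analysis above implies a simple back-of-the-envelope model: if every open creates a fresh connection and connections linger for maxidletime before closing, the steady-state connection count is roughly the open rate times the idle time. The numbers below are illustrative, not measurements from the test.

```java
public class IdleConnectionModel {
    // Steady-state live connections when each open creates a new connection
    // that lingers for maxIdleTimeMs before being closed.
    static long steadyStateConnections(long opensPerSecond, long maxIdleTimeMs) {
        return opensPerSecond * maxIdleTimeMs / 1000;
    }

    public static void main(String[] args) {
        // e.g. 100 re-opens/sec with the default 10s idle time -> ~1000 live
        // sockets, easily enough to exhaust file descriptors in a test.
        assert steadyStateConnections(100, 10_000) == 1000;
        // With maxidletime set to 0 (the stop-gap fix) connections close
        // as soon as they go idle, so none accumulate.
        assert steadyStateConnections(100, 0) == 0;
    }
}
```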
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036704#comment-13036704 ]

Hadoop QA commented on HDFS-1965:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12479878/hdfs-1965.txt
against trunk revision 1125217.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
  org.apache.hadoop.hdfs.TestHDFSTrash
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/599//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/599//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/599//console
[jira] [Commented] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036614#comment-13036614 ] Hadoop QA commented on HDFS-1965:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12479859/hdfs-1965.txt
against trunk revision 1125145.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The applied patch generated 29 javac compiler warnings (more than the trunk's current 28 warnings).
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
        org.apache.hadoop.hdfs.TestHDFSTrash
    +1 contrib tests. The patch passed contrib unit tests.
    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/596//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/596//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/596//console

This message is automatically generated.

> IPCs done using block token-based tickets can't reuse connections
> -----------------------------------------------------------------
>
>                 Key: HDFS-1965
>                 URL: https://issues.apache.org/jira/browse/HDFS-1965
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>         Attachments: hdfs-1965.txt
>
>
> This is the reason that TestFileConcurrentReaders has been failing a lot.
> Reproducing a comment from HDFS-1057:
> The test has a thread which continually re-opens the file which is being written to. Since the file's in the middle of being written, it makes an RPC to the DataNode in order to determine the visible length of the file. This RPC is authenticated using the block token which came back in the LocatedBlocks object as the security ticket.
> When this RPC hits the IPC layer, it looks at its existing connections and sees none that can be re-used, since the block token differs between the two requesters. Hence, it reconnects, and we end up with hundreds or thousands of IPC connections to the datanode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
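The failure mode in the issue description can be sketched with a toy model. This is not Hadoop's actual IPC code (the names and key shape here are illustrative assumptions): it only shows why caching connections under a key that includes the per-block security ticket makes every fresh block token miss the cache and open a new connection.

```python
# Illustrative model (NOT the real Hadoop IPC client): connections are
# cached under a key that includes the ticket, so two calls with
# different block tokens can never share a connection.
connections = {}  # (address, protocol, ticket) -> connection object

def get_connection(address, protocol, ticket):
    """Return the cached connection for this key, creating it on a miss."""
    key = (address, protocol, ticket)
    if key not in connections:
        connections[key] = object()  # stand-in for a real TCP connection
    return connections[key]

# Re-opening the file repeatedly yields a fresh block token each time,
# so every lookup misses the cache and the connection count keeps growing
# even though all calls target the same datanode.
for i in range(1000):
    get_connection("dn1:50020", "ClientDatanodeProtocol", f"block-token-{i}")

print(len(connections))  # 1000
```

Under this model, reuse only happens when address, protocol, and ticket all match, which is exactly why a per-block token defeats it.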
[jira] [Commented] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036554#comment-13036554 ] Todd Lipcon commented on HDFS-1965:
------------------------------------

I implemented option (b) and have a test case that shows that it fixes the problem... BUT: the real DFSInputStream code seems to call RPC.stopProxy() after it uses the proxy, which should also avoid this issue. Doing so in my test case makes the case pass without any other fix. So there's still some mystery.
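Continuing the toy model above, the effect of explicitly stopping the proxy after each call can be sketched as follows. The function names are hypothetical stand-ins (not the real RPC.stopProxy() signature): the point is only that tearing the connection down immediately after the call prevents per-token connections from accumulating.

```python
# Illustrative sketch (assumed names, not Hadoop's RPC API): if the caller
# explicitly stops the proxy after each call, the cached connection is
# dropped instead of lingering, so the leak cannot happen.
connections = {}  # (address, ticket) -> connection object

def get_connection(address, ticket):
    return connections.setdefault((address, ticket), object())

def stop_proxy(address, ticket):
    # Close and forget the connection, as RPC.stopProxy(proxy) would.
    connections.pop((address, ticket), None)

for i in range(1000):
    token = f"block-token-{i}"
    get_connection("dn1:50020", token)  # make the RPC
    stop_proxy("dn1:50020", token)      # tear down right after the call

print(len(connections))  # 0
```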
[jira] [Commented] (HDFS-1965) IPCs done using block token-based tickets can't reuse connections
[ https://issues.apache.org/jira/browse/HDFS-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036531#comment-13036531 ] Todd Lipcon commented on HDFS-1965:
------------------------------------

I can think of a couple of possible solutions:

a) Make the methods that operate on a block take an additional parameter to contain block tokens, rather than using the normal token selector mechanism that scopes credentials on a per-connection basis. This has the advantage that we can even re-use an IPC connection across different blocks.

b) When the client creates an IPC proxy to a DN, it can explicitly configure the maxIdleTime to 0 so that we don't leave connections hanging around after the call completes. This is less efficient than option (a) above, but it probably doesn't matter much for this use case.
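Option (b) can be sketched with a small model of idle-connection reaping. The class and field names below are assumptions for illustration, not the Hadoop IPC client's real internals: a connection remembers when it last carried a call, and a maxIdleTime of 0 means the cleanup pass reaps it as soon as the call completes.

```python
import time

# Toy model of option (b) (assumed names, not the real Hadoop classes):
# with max_idle_time=0, every connection is considered idle the moment
# its call finishes, so per-token connections never pile up.
class Connection:
    def __init__(self, max_idle_time):
        self.max_idle_time = max_idle_time
        self.last_activity = time.monotonic()

    def call(self):
        self.last_activity = time.monotonic()

    def is_idle(self, now):
        return now - self.last_activity >= self.max_idle_time

connections = {}  # (address, ticket) -> Connection

def do_rpc(address, token, max_idle_time):
    conn = connections.setdefault((address, token),
                                  Connection(max_idle_time))
    conn.call()
    # Reap idle connections, as the client's cleanup thread would.
    now = time.monotonic()
    for key in [k for k, c in connections.items() if c.is_idle(now)]:
        del connections[key]

for i in range(100):
    do_rpc("dn1:50020", f"block-token-{i}", max_idle_time=0)

print(len(connections))  # 0
```

With a positive max_idle_time the same loop would leave one connection per token in the cache, which is the leak described in the issue.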