[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903152#comment-14903152 ]

Hadoop QA commented on HDFS-9095:
-

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761698/HDFS-9095.001.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / cc2b473 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12605/console |

This message was automatically generated.

> RPC client should fail gracefully when the connection is timed out or reset
> ---
>
> Key: HDFS-9095
> URL: https://issues.apache.org/jira/browse/HDFS-9095
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Attachments: HDFS-9095.000.patch, HDFS-9095.001.patch
>
>
> The RPC client should fail gracefully when the connection is timed out or
> reset instead of bailing out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
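The issue asks that a timed-out or reset connection fail gracefully instead of bailing out. One common way to do this is to complete every outstanding call with an error status, which lets each caller decide whether to retry. The sketch below is hypothetical and is not the code in HDFS-9095.000.patch or HDFS-9095.001.patch. It assumes an illustrative {{Status}} struct and a {{PendingCalls}} table keyed by call id. libhdfspp's real types differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical status type for this sketch; not libhdfspp's real Status class.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Error(std::string msg) { return {false, std::move(msg)}; }
};

using RpcCallback = std::function<void(const Status &)>;

// Hypothetical table of outstanding RPC calls, keyed by call id.
class PendingCalls {
 public:
  void Add(std::int32_t call_id, RpcCallback cb) {
    assert(cb && "every call needs a completion handler");
    pending_[call_id] = std::move(cb);
  }

  // A response for call_id arrived: complete that call successfully.
  void Complete(std::int32_t call_id) {
    auto it = pending_.find(call_id);
    if (it == pending_.end()) return;  // unknown or already-failed call
    RpcCallback cb = std::move(it->second);
    pending_.erase(it);
    cb(Status::OK());
  }

  // The socket timed out or was reset: rather than aborting the process,
  // complete every outstanding call with the error so callers can react.
  void FailAll(const Status &reason) {
    std::map<std::int32_t, RpcCallback> failed;
    failed.swap(pending_);  // handlers may call Add() while we iterate
    for (auto &entry : failed) entry.second(reason);
  }

  std::size_t size() const { return pending_.size(); }

 private:
  std::map<std::int32_t, RpcCallback> pending_;
};
```

Because the table is swapped out before any handler runs, a handler invoked from {{FailAll()}} can immediately re-issue its call with {{Add()}}, for example against a new connection.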
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903134#comment-14903134 ]

Haohui Mai commented on HDFS-9095:
--

Thanks [~James Clampffer] and [~bobhansen] for the reviews. The v1 patch changes {{CMAKE_CURRENT_SOURCE_DIR}} to {{CMAKE_CURRENT_LIST_DIR}}.
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901053#comment-14901053 ]

Haohui Mai commented on HDFS-9095:
--

bq. You may want to use CMAKE_CURRENT_LIST_DIR rather than CMAKE_CURRENT_SOURCE_DIR as a more stable root directory.

I don't understand why it's an issue here. I've not seen many people use {{CMAKE_CURRENT_LIST_DIR}} in practice. ${CMAKE_CURRENT_SOURCE_DIR} points to {{hadoop-hdfs-project/hadoop-hdfs-client/src/main/native/libhdfspp}}. When can it be a problem?

bq. Following the experiences learned from the Java client, should the server address be passed in with the options (eventually, they will probably all be loaded from the same XML files at startup)?

No. It's important to make the distinction here. Options specifically mean tunable parameters, while server addresses are input for the RPC library.

bq. In RpcConnection methods, should we be calling into the handler while holding the lock on the engine state? If any method there does synchronous I/O or hangs for any reason, the whole RPC system locks up.

bq. Can we have assertions that the lock is held in RpcConnection rather than comments stating that it should be?

This is a known issue coming from https://github.com/haohui/libhdfspp/issues/39. Please feel free to file jiras to fix it.

bq. In RpcConnectionImpl, should options_ and next_layer_ be const?

{{next_layer_}} cannot be const, but {{options_}} should be. Will fix it.
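Haohui separates options (tunable parameters) from server addresses (input to the RPC library), and he notes that {{options_}} can be const while {{next_layer_}} cannot. A small C++ sketch can illustrate both points. {{Options}}, {{Endpoint}}, {{FakeSocket}} and {{RpcConnectionSketch}} are hypothetical names, and the option fields and their default values are purely illustrative. libhdfspp's actual {{RpcConnectionImpl}} template may look quite different.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tunables an operator might adjust; values are illustrative.
// Server addresses deliberately do not live here.
struct Options {
  int rpc_timeout_ms = 30000;
  int max_retries = 3;
};

// Hypothetical endpoint type standing in for a real socket endpoint.
struct Endpoint {
  std::string host;
  int port;
};

// Hypothetical stand-in for the transport layer; connecting mutates it.
class FakeSocket {
 public:
  void Connect(const Endpoint &ep) {
    peer_ = ep.host + ":" + std::to_string(ep.port);
  }
  const std::string &peer() const { return peer_; }

 private:
  std::string peer_;
};

template <class NextLayer>
class RpcConnectionSketch {
 public:
  explicit RpcConnectionSketch(const Options &options) : options_(options) {}

  // Server addresses are an input to the call, not a tunable option.
  void Connect(const std::vector<Endpoint> &servers) {
    assert(!servers.empty());
    next_layer_.Connect(servers.front());  // sketch tries only the first server
  }

  const Options &options() const { return options_; }
  const NextLayer &next_layer() const { return next_layer_; }

 private:
  const Options options_;  // fixed for the lifetime of the connection
  NextLayer next_layer_;   // cannot be const: connecting and I/O change it
};
```

Declaring {{options_}} const documents that tunables are fixed once the connection is built. The trade-off is that the class loses copy assignment, which is usually acceptable for a connection object.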
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900971#comment-14900971 ]

Bob Hansen commented on HDFS-9095:
--

You may want to use CMAKE_CURRENT_LIST_DIR rather than CMAKE_CURRENT_SOURCE_DIR as a more stable root directory.

I'm glad you've started adding some logging and the beginnings of an options architecture. I was going to file another Jira for both of those (I probably still will, to make space for more full-featured efforts).

Following the experiences learned from the Java client, should the server address be passed in with the options (eventually, they will probably all be loaded from the same XML files at startup)?

In RpcConnection methods, should we be calling into the handler while holding the lock on the engine state? If any method there does synchronous I/O or hangs for any reason, the whole RPC system locks up.

Can we have assertions that the lock is held in RpcConnection rather than comments stating that it should be?

In RpcConnectionImpl, should options_ and next_layer_ be const?
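Bob asks for an assertion that the lock is held, in place of a comment saying it should be. A common C++ idiom handles this. The helper takes a reference to the caller's {{std::unique_lock}} and checks {{owns_lock()}} and {{mutex()}} against the engine's mutex. The class and method names below are hypothetical and only illustrate the idiom.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical engine whose state is guarded by a single mutex.
class LockedEngineSketch {
 public:
  // Public entry point: acquires the lock, then calls the locked helper.
  void Enqueue(const std::string &request) {
    std::unique_lock<std::mutex> lock(engine_state_lock_);
    EnqueueLocked(lock, request);
  }

  std::size_t queued() {
    std::lock_guard<std::mutex> lock(engine_state_lock_);
    return requests_.size();
  }

 private:
  // Instead of a "caller must hold the lock" comment, the helper demands a
  // unique_lock that owns this engine's mutex; the assert fires in debug
  // builds if a caller passes an unlocked or unrelated lock.
  void EnqueueLocked(const std::unique_lock<std::mutex> &lock,
                     const std::string &request) {
    assert(lock.owns_lock() && lock.mutex() == &engine_state_lock_);
    (void)lock;  // avoid an unused-parameter warning when NDEBUG strips the assert
    requests_.push_back(request);
  }

  std::mutex engine_state_lock_;
  std::vector<std::string> requests_;
};
```

If the build uses Clang, its thread-safety analysis attributes (such as {{guarded_by}} and {{requires_capability}}) offer a compile-time alternative to this runtime check.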
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901293#comment-14901293 ]

James Clampffer commented on HDFS-9095:
---

Agree with Bob about making the CMakeLists as robust as possible; otherwise +1 on the patch. Getting the basics of logging in is very nice as well.

Re: In RpcConnection methods, should we be calling into the handler while holding the lock on the engine state? If any method there does synchronous I/O or hangs for any reason, the whole RPC system locks up.

This was done to avoid using a std::recursive_mutex, because right now that handler only gets called from OnRecvCompleted. I don't think the handler is going to change much unless we start using multiple connections from a single RpcEngine. Lock contention is one of the things I hope to start profiling soon; if the overhead is negligible, I'll switch that back to a recursive_mutex and grab the lock in the handler as well (I'll file a jira if that's the case).
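Bob's concern about calling handlers under the engine lock, and James's reason for avoiding {{std::recursive_mutex}}, can both be addressed with a technique different from the one in the patch. The engine collects the ready handlers while holding the lock, releases the lock, and only then invokes them. The sketch below is hypothetical. {{HandlerEngineSketch}} and {{AddPending}} are illustrative names, {{OnRecvCompleted}} here is only a stand-in, and none of it reflects libhdfspp's actual RpcEngine.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical engine: handlers are detached under the lock but run after it
// is released, so a handler that blocks or re-enters the engine cannot
// deadlock it, and a plain (non-recursive) std::mutex suffices.
class HandlerEngineSketch {
 public:
  using Handler = std::function<void()>;

  void AddPending(Handler h) {
    assert(h && "handler must be callable");
    std::lock_guard<std::mutex> lock(state_lock_);
    pending_.push_back(std::move(h));
  }

  // Stand-in for OnRecvCompleted: take the ready handlers while locked...
  void OnRecvCompleted() {
    std::vector<Handler> ready;
    {
      std::lock_guard<std::mutex> lock(state_lock_);
      ready.swap(pending_);
    }
    // ...then run them with the lock released; a handler may call
    // AddPending() here without deadlocking.
    for (auto &h : ready) h();
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> lock(state_lock_);
    return pending_.size();
  }

 private:
  std::mutex state_lock_;
  std::vector<Handler> pending_;
};
```

The cost of this design is that engine state can change between releasing the lock and running a handler. Handlers therefore must not assume the engine still looks the way it did when they were queued.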
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901244#comment-14901244 ]

Bob Hansen commented on HDFS-9095:
--

Re: CMAKE_CURRENT_LIST_DIR vs. CMAKE_CURRENT_SOURCE_DIR: According to ye olde [StackOverflow|http://stackoverflow.com/questions/15662497/in-cmake-what-is-the-difference-between-cmake-current-source-dir-and-cmake-curr], it becomes more of an issue when files are included across directories (as some of the protobuf stuff is). CMAKE_CURRENT_LIST_DIR is the directory of the file currently being processed, including an include()d .cmake file, while CMAKE_CURRENT_SOURCE_DIR stays at the directory of the enclosing CMakeLists.txt. The difference is what led to hours of angst in HDFS-9025, where the cwd was under the CMakeLists.txt. It's not a super-big deal, but once bitten, twice shy.

Re: Options - what you have here is a good start; we can discuss an architectural solution under HDFS-9117.
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803187#comment-14803187 ]

Hadoop QA commented on HDFS-9095:
-

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12756394/HDFS-9095.000.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / 58d1a02 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12508/console |

This message was automatically generated.