[jira] [Commented] (HADOOP-18567) LogThrottlingHelper: the dependent recorder is not triggered correctly
[ https://issues.apache.org/jira/browse/HADOOP-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1762#comment-1762 ]

Erik Krogen commented on HADOOP-18567:
--------------------------------------

Thanks for fixing up the versions [~ste...@apache.org]! My mistake.

> LogThrottlingHelper: the dependent recorder is not triggered correctly
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-18567
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18567
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 3.3.4
>            Reporter: Chengbing Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
> The current implementation of {{LogThrottlingHelper}} works well most of the
> time, but it misses one case, which appears quite common in production code:
> if the dependent recorder was not suppressed before the primary one is
> triggered on the next period, then the next logging of the dependent recorder
> will be unexpectedly suppressed.
> {code:java}
> helper = new LogThrottlingHelper(LOG_PERIOD, "foo", timer);
> assertTrue(helper.record("foo", 0).shouldLog());
> assertTrue(helper.record("bar", 0).shouldLog());
> // Both should log once the period has elapsed
> assertTrue(helper.record("foo", LOG_PERIOD).shouldLog());
> assertTrue(helper.record("bar", LOG_PERIOD).shouldLog()); // <--- This assertion will now fail
> {code}
> Note if we call {{helper.record("bar", LOG_PERIOD * 2)}}, as the
> existing test cases do, it will work as expected.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
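The primary/dependent recorder pattern described above can be sketched in a few lines. This is a simplified, hypothetical model (class and method names {{SimpleLogThrottler}}, {{record}}, {{periodMs}} are invented for illustration), not Hadoop's actual {{LogThrottlingHelper}}; it shows the behavior the failing assertion expects, where a dependent recorder may log once per period opened by the primary, regardless of what happened in the prior period.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical sketch of the primary/dependent recorder pattern,
// NOT Hadoop's actual LogThrottlingHelper. The primary recorder ("foo")
// decides when a new log period starts; each dependent recorder ("bar") may
// log at most once per period opened by the primary.
class SimpleLogThrottler {
    private final long periodMs;
    private final String primary;
    private long periodStart = Long.MIN_VALUE;          // no period open yet
    private final Map<String, Long> lastLogged = new HashMap<>();

    SimpleLogThrottler(long periodMs, String primary) {
        this.periodMs = periodMs;
        this.primary = primary;
    }

    /** Returns true if the record at time nowMs should be logged. */
    boolean record(String name, long nowMs) {
        if (name.equals(primary)) {
            if (periodStart == Long.MIN_VALUE || nowMs >= periodStart + periodMs) {
                periodStart = nowMs;                    // primary opens a new period
                lastLogged.put(primary, nowMs);
                return true;
            }
            return false;                               // still within current period
        }
        // Dependent recorder: log once per period opened by the primary,
        // whether or not it was suppressed in the previous period.
        Long last = lastLogged.get(name);
        if (last == null || last < periodStart) {
            lastLogged.put(name, nowMs);
            return true;
        }
        return false;
    }
}
```

With this model, the sequence from the issue description (both recorders logging at time 0 and again at exactly LOG_PERIOD) succeeds, while a second dependent record within the same period is still suppressed.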
[jira] [Updated] (HADOOP-18567) LogThrottlingHelper: the dependent recorder is not triggered correctly
[ https://issues.apache.org/jira/browse/HADOOP-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-18567:
---------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
[jira] [Updated] (HADOOP-18567) LogThrottlingHelper: the dependent recorder is not triggered correctly
[ https://issues.apache.org/jira/browse/HADOOP-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-18567:
---------------------------------
    Fix Version/s: 3.3.6
[jira] [Updated] (HADOOP-18567) LogThrottlingHelper: the dependent recorder is not triggered correctly
[ https://issues.apache.org/jira/browse/HADOOP-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-18567:
---------------------------------
    Fix Version/s: 3.4.0
[jira] [Resolved] (HADOOP-18446) Add a re-queue metric to RpcMetrics.java to quantify the number of re-queue RPCs
[ https://issues.apache.org/jira/browse/HADOOP-18446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved HADOOP-18446.
----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

> Add a re-queue metric to RpcMetrics.java to quantify the number of re-queue RPCs
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-18446
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18446
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
> Add a reQueue metric to RpcMetrics.java to quantify the number of RPCs
> re-queued. The Observer NameNode re-queues an RPC when processing of the call
> must be postponed, but there is currently no metric quantifying the number of
> re-queued RPCs, so we should add one.
[jira] [Commented] (HADOOP-18315) Fix 3.3 build problems caused by backport of HADOOP-11867.
[ https://issues.apache.org/jira/browse/HADOOP-18315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566555#comment-17566555 ]

Erik Krogen commented on HADOOP-18315:
--------------------------------------

Looks like this was fixed in HADOOP-18322; is there anything left to be done here?

> Fix 3.3 build problems caused by backport of HADOOP-11867.
> ----------------------------------------------------------
>
>                 Key: HADOOP-18315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18315
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 3.3.5
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
[jira] [Commented] (HADOOP-18322) Yetus build failure in branch-3.3.
[ https://issues.apache.org/jira/browse/HADOOP-18322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566554#comment-17566554 ]

Erik Krogen commented on HADOOP-18322:
--------------------------------------

[~mthakur] it looks like this was resolved in [commit 7eb1c908a0c8ae1f5ab4dcb662affccef7dba8c0|https://github.com/apache/hadoop/commit/7eb1c908a0c8ae1f5ab4dcb662affccef7dba8c0]. Can we close this?

> Yetus build failure in branch-3.3.
> ----------------------------------
>
>                 Key: HADOOP-18322
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18322
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, fs
>            Reporter: Mukund Thakur
>            Assignee: Mukund Thakur
>            Priority: Critical
>
> {noformat}
> [ERROR] The build could not read 1 project -> [Help 1]
> [ERROR]
> [ERROR] The project org.apache.hadoop:hadoop-benchmark:3.4.0-SNAPSHOT
> (/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-4517/src/hadoop-tools/hadoop-benchmark/pom.xml)
> has 1 error
> [ERROR] Non-resolvable parent POM for org.apache.hadoop:hadoop-benchmark:3.4.0-SNAPSHOT:
> Could not find artifact org.apache.hadoop:hadoop-project:pom:3.4.0-SNAPSHOT and
> 'parent.relativePath' points at wrong local POM @ line 22, column 11 ->
> {noformat}
>
> https://github.com/apache/hadoop/pull/4517#issuecomment-117019
[jira] [Commented] (HADOOP-18315) Fix 3.3 build problems caused by backport of HADOOP-11867.
[ https://issues.apache.org/jira/browse/HADOOP-18315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17560666#comment-17560666 ]

Erik Krogen commented on HADOOP-18315:
--------------------------------------

{{branch-3.3.4}} was [cut on 6/22/22|https://github.com/apache/hadoop/commit/a1ce2fc44bccddabca2b92f2e38e4c31ad90d349], but [HADOOP-11867 was backported on 6/23/22|https://github.com/apache/hadoop/commit/5c348c41ab8ddb81146355570856e61e8d129a1e]. This affects 3.3.5, not 3.3.4. Updated the version appropriately.
[jira] [Updated] (HADOOP-18315) Fix 3.3 build problems caused by backport of HADOOP-11867.
[ https://issues.apache.org/jira/browse/HADOOP-18315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-18315:
---------------------------------
    Affects Version/s: 3.3.5
                           (was: 3.3.4)
[jira] [Updated] (HADOOP-18315) Fix 3.3 build problems caused by backport of HADOOP-11867.
[ https://issues.apache.org/jira/browse/HADOOP-18315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-18315:
---------------------------------
    Fix Version/s: (was: 3.3.4)
[jira] [Commented] (HADOOP-17127) Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and processingTime
[ https://issues.apache.org/jira/browse/HADOOP-17127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158483#comment-17158483 ]

Erik Krogen commented on HADOOP-17127:
--------------------------------------

Cherry-picked to branch-3.3, and pushed to branch-3.2, branch-3.1, and branch-2.10 based on the supplied patches. Thanks [~Jim_Brennan]!

> Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and processingTime
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-17127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17127
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: HADOOP-17127-branch-2.10.001.patch,
> HADOOP-17127-branch-3.1.001.patch, HADOOP-17127-branch-3.2.001.patch,
> HADOOP-17127.001.patch, HADOOP-17127.002.patch
>
> While making an internal change to use {{TimeUnit.MICROSECONDS}} instead of
> {{TimeUnit.MILLISECONDS}} for rpc details, we found that we also had to
> modify this code in DecayRpcScheduler.addResponseTime() to initialize
> {{queueTime}} and {{processingTime}} with the correct units:
> {noformat}
> long queueTime = details.get(Timing.QUEUE, TimeUnit.MILLISECONDS);
> long processingTime = details.get(Timing.PROCESSING, TimeUnit.MILLISECONDS);
> {noformat}
> If we change these to use {{RpcMetrics.TIMEUNIT}} it is simpler.
> We also found one test case in TestRPC that was assuming the units were
> milliseconds.
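The pattern behind this fix, reading all durations through one shared unit constant rather than hard-coding {{TimeUnit.MILLISECONDS}} at each call site, can be sketched as below. {{MetricsUnits}} and its {{get}} method are hypothetical stand-ins (the real code is {{RpcMetrics.TIMEUNIT}} and {{ProcessingDetails.get}}); only {{java.util.concurrent.TimeUnit}} is the actual JDK API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the pattern used by the fix: store elapsed times in
// one base unit (nanoseconds) and read them through a single shared unit
// constant, so changing the constant changes every reader consistently.
class MetricsUnits {
    // Analogous to RpcMetrics.TIMEUNIT in Hadoop; hard-coding
    // TimeUnit.MILLISECONDS at call sites instead is what the fix removes.
    static final TimeUnit TIMEUNIT = TimeUnit.MICROSECONDS;

    // Stand-in for a details accessor: convert the stored nanoseconds
    // into whatever unit the caller requests.
    static long get(long elapsedNanos, TimeUnit unit) {
        return unit.convert(elapsedNanos, TimeUnit.NANOSECONDS);
    }
}
```

A call site then reads `MetricsUnits.get(elapsedNanos, MetricsUnits.TIMEUNIT)`, and a test that hard-codes milliseconds (like the TestRPC case mentioned above) would break as soon as the constant changes, which is exactly the mismatch the issue describes.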
[jira] [Updated] (HADOOP-17127) Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and processingTime
[ https://issues.apache.org/jira/browse/HADOOP-17127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-17127:
---------------------------------
    Fix Version/s: 3.3.1
                   2.10.1
                   3.2.2
                   3.1.4
[jira] [Updated] (HADOOP-17127) Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and processingTime
[ https://issues.apache.org/jira/browse/HADOOP-17127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-17127:
---------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)
[jira] [Commented] (HADOOP-17127) Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and processingTime
[ https://issues.apache.org/jira/browse/HADOOP-17127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157561#comment-17157561 ]

Erik Krogen commented on HADOOP-17127:
--------------------------------------

LGTM, thanks [~Jim_Brennan]! This looks good from a consistency standpoint and has no user-facing impact. I just committed this to {{trunk}}.
[jira] [Commented] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987312#comment-16987312 ]

Erik Krogen commented on HADOOP-16700:
--------------------------------------

Just a heads up: I also backported this down to branch-2.10, to get it into all of the versions with HADOOP-16266 present.

> RpcQueueTime may be negative when the response has to be sent later
> -------------------------------------------------------------------
>
>                 Key: HADOOP-16700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16700
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: xuzq
>            Assignee: xuzq
>            Priority: Minor
>             Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1, 2.11.0
>
>         Attachments: HADOOP-16700-trunk-001.patch, HADOOP-16700.002.patch
>
> RpcQueueTime may be negative when the response has to be sent later.
[jira] [Updated] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-16700:
---------------------------------
    Fix Version/s: 2.11.0
                   2.10.1
[jira] [Updated] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-16700:
---------------------------------
    Fix Version/s: 3.2.2
                   3.1.4
[jira] [Updated] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HADOOP-16700:
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)
[jira] [Commented] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978542#comment-16978542 ]

Erik Krogen commented on HADOOP-16700:
--------------------------------------

The checkstyle issues are because the fields are not private, but this fits with how {{Call}} is used overall. I am +1 on the new patch. I just committed this to trunk. Thanks for the contribution [~xuzq_zander]!
[jira] [Commented] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975457#comment-16975457 ]

Erik Krogen commented on HADOOP-16700:
--------------------------------------

Thanks for the explanation [~xuzq_zander]! It definitely seems like a valid issue.

I took a look at the v001 patch. By the way, please follow the [patch naming conventions|https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute#HowToContribute-Namingyourpatch] -- it should be {{HADOOP-16700.001.patch}} (period instead of hyphen before the version, and you don't need to specify a branch when it is trunk).

The general approach seems sound to me. I am concerned about all of the changes you've made to the signatures of methods, removing {{receiveTime}}. First off, {{Server}} is a public interface, so we should not make breaking changes to its API. To introduce a new method here, you need to keep the old one but mark it as {{@Deprecated}}. Second off, this change seems unrelated to this JIRA? If that is the case, we should keep it separate.

My only other comment is that we should update the comments here within {{Call}}:
{code}
long timestampNanos; // time received when response is null
                     // time served when response is not null
long responseTimestampNanos;
{code}
I think that it would now be more accurate to say:
{code}
long timestampNanos;         // time the call was received
long responseTimestampNanos; // time the call was served
{code}
Let me know if you think that isn't correct.
[jira] [Comment Edited] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975457#comment-16975457 ]

Erik Krogen edited comment on HADOOP-16700 at 11/15/19 10:58 PM:
-----------------------------------------------------------------

Thanks for the explanation [~xuzq_zander]! Very helpful. It definitely seems like a valid issue.

I took a look at the v001 patch. By the way, please follow the [patch naming conventions|https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute#HowToContribute-Namingyourpatch] -- it should be {{HADOOP-16700.001.patch}} (period instead of hyphen before the version, and you don't need to specify a branch when it is trunk).

The general approach seems sound to me. I am concerned about all of the changes you've made to the signatures of methods, removing {{receiveTime}}. First off, {{Server}} is a public interface, so we should not make breaking changes to its API. To introduce a new method here, you need to keep the old one but mark it as {{@Deprecated}}. Second off, this change seems unrelated to this JIRA? If that is the case, we should keep it separate.

My only other comment is that we should update the comments here within {{Call}}:
{code}
long timestampNanos; // time received when response is null
                     // time served when response is not null
long responseTimestampNanos;
{code}
I think that it would now be more accurate to say:
{code}
long timestampNanos;         // time the call was received
long responseTimestampNanos; // time the call was served
{code}
Let me know if you think that isn't correct.

(The previous revision of this comment differed only in the first paragraph.)
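The comment suggestion above (separate "received" and "served" timestamp fields) can be illustrated with a minimal sketch. {{CallTiming}} and {{queueTimeNanos}} are hypothetical names invented here, not Hadoop's actual {{Call}} class; the field names echo the snippet quoted in the review. The point is that queue time is always measured against the original receive time, so serving a deferred response later cannot push it negative.

```java
// Hypothetical sketch (not Hadoop's actual Call class) of the idea behind the
// fix: keep the receive time and the serve time in separate fields. If one
// shared field recording the receive time were overwritten when a deferred
// response was finally sent, a later (processingStart - timestamp)
// computation could come out negative.
class CallTiming {
    long timestampNanos;          // time the call was received
    long responseTimestampNanos;  // time the call was served

    // Queue time is always measured against the original receive time.
    long queueTimeNanos(long processingStartNanos) {
        return processingStartNanos - timestampNanos;
    }
}
```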
[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973629#comment-16973629 ]

Erik Krogen commented on HADOOP-16683:
--------------------------------------

Hey folks, should this be backported to older 3.x lines to match HADOOP-16580?

> Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16683
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16683
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.3.0
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch,
> HADOOP-16683.003.patch
>
> Follow-up patch on HADOOP-16580.
> We successfully disabled the retry in case of an AccessControlException, which
> has resolved some of the cases, but in other cases the AccessControlException is
> wrapped inside another IOException and you can only get the original
> exception by calling getCause().
> Let's add this extra case as well.
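The extra case the issue describes, matching an exception either directly or one level of wrapping via {{getCause()}}, can be sketched generically. {{RetryPolicies}} and {{isOrWraps}} are hypothetical names for illustration, not the actual {{FailoverOnNetworkExceptionRetry}} code, and a generic exception type stands in for Hadoop's {{AccessControlException}}.

```java
import java.io.IOException;

// Hypothetical sketch of the check added by this issue: treat an exception as
// matching if it either is, or directly wraps (via getCause()), the target
// type. Not the actual FailoverOnNetworkExceptionRetry implementation.
class RetryPolicies {
    static boolean isOrWraps(Throwable t, Class<? extends Throwable> type) {
        return type.isInstance(t)
            || (t.getCause() != null && type.isInstance(t.getCause()));
    }
}
```

A retry policy using such a predicate would fail over immediately (no retry) when, say, an IOException wrapping an access-control failure arrives, instead of only recognizing the unwrapped form.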
[jira] [Commented] (HADOOP-16700) RpcQueueTime may be negative when the response has to be sent later
[ https://issues.apache.org/jira/browse/HADOOP-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971746#comment-16971746 ] Erik Krogen commented on HADOOP-16700: -- Hi [~xuzq_zander], can you more fully describe the issue? What causes the queue time to be negative? Under what situations does this occur?
[jira] [Commented] (HADOOP-16694) Use Objects requireNonNull Where Appropriate
[ https://issues.apache.org/jira/browse/HADOOP-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971716#comment-16971716 ] Erik Krogen commented on HADOOP-16694: -- Yeah, great points [~ste...@apache.org]. I agree. > Use Objects requireNonNull Where Appropriate > > > Key: HADOOP-16694 > URL: https://issues.apache.org/jira/browse/HADOOP-16694 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 3.2 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HADOOP-16694.1.patch, HADOOP-16694.2.patch, > HADOOP-16694.3.patch > > > https://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#requireNonNull-T- -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16694) Use Objects requireNonNull Where Appropriate
[ https://issues.apache.org/jira/browse/HADOOP-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970414#comment-16970414 ] Erik Krogen commented on HADOOP-16694: -- By the way, I updated the title to reflect that it is {{requireNonNull}} instead of {{requireNull}}. I was very confused when I first saw this come in with the old title :)
[jira] [Updated] (HADOOP-16694) Use Objects requireNonNull Where Appropriate
[ https://issues.apache.org/jira/browse/HADOOP-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16694: - Summary: Use Objects requireNonNull Where Appropriate (was: Use Objects requireNull Where Appropriate)
[jira] [Commented] (HADOOP-16695) Make LogThrottlingHelper thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-16695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970367#comment-16970367 ] Erik Krogen commented on HADOOP-16695: -- Awesome, thanks [~zhangchen]! > Make LogThrottlingHelper thread-safe > > > Key: HADOOP-16695 > URL: https://issues.apache.org/jira/browse/HADOOP-16695 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > > HADOOP-15726 introduced the {{LogThrottlingHelper}}, but this class is not > thread-safe, which limits its usage scenarios; this Jira will try to improve > it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16694) Use Objects requireNull Where Appropriate
[ https://issues.apache.org/jira/browse/HADOOP-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970366#comment-16970366 ] Erik Krogen commented on HADOOP-16694: -- I wonder if it's also reasonable to replace instances of {{Preconditions.checkNotNull}} with {{Objects.requireNonNull}} to reduce our exposure to Guava a bit? But I guess the shading work makes that a less necessary endeavor.
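The swap discussed in this thread is usually mechanical, since both Guava's {{Preconditions.checkNotNull}} and the JDK's {{Objects.requireNonNull}} throw {{NullPointerException}} with an optional message and return the checked value. A minimal sketch (the {{register}} method is an illustrative example, not Hadoop code):

```java
import java.util.Objects;

// Demonstrates replacing a Guava precondition with the JDK-only equivalent.
public class RequireNonNullDemo {

    static String register(String name) {
        // Before (Guava): return Preconditions.checkNotNull(name, "name is null");
        // After (JDK only), same throw-with-message-and-return-value contract:
        return Objects.requireNonNull(name, "name is null");
    }

    public static void main(String[] args) {
        System.out.println(register("datanode-1")); // passes through non-null values
        try {
            register(null);
            throw new AssertionError("expected NullPointerException");
        } catch (NullPointerException expected) {
            System.out.println("rejected null: " + expected.getMessage());
        }
    }
}
```

One behavioral difference worth noting: Guava's variant supports lazy {{%s}} message formatting, so call sites using that form need a small rewrite (e.g. to the {{Supplier<String>}} overload of {{requireNonNull}}).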
[jira] [Commented] (HADOOP-16656) Document FairCallQueue configs in core-default.xml
[ https://issues.apache.org/jira/browse/HADOOP-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967933#comment-16967933 ] Erik Krogen commented on HADOOP-16656: -- Hi [~weichiu], [~smeng] -- it looks like this broke {{hadoop.conf.TestCommonConfigurationFields}}. You can see it failing in the precommit runs for this JIRA, and I see it now failing on trunk. I checked out trunk before this commit went in and the test succeeds. Can you please get this fixed? > Document FairCallQueue configs in core-default.xml > -- > > Key: HADOOP-16656 > URL: https://issues.apache.org/jira/browse/HADOOP-16656 > Project: Hadoop Common > Issue Type: Task > Components: conf, documentation >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16656.001.patch, HADOOP-16656.002.patch > > > So far those callqueue / scheduler / faircallqueue -related configurations > are only documented in FairCallQueue.md in 3.3.0: > https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-project-dist/hadoop-common/FairCallQueue.html#Full_List_of_Configurations > (Thanks Akira for uploading this.) > Goal: Document those configs in core-default.xml as well to make it easier > for users(admins) to find and use. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16581) ValueQueue does not trigger an async refill when number of values falls below watermark
[ https://issues.apache.org/jira/browse/HADOOP-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935957#comment-16935957 ] Erik Krogen commented on HADOOP-16581: -- Thanks [~iwasakims]! I had prepared the same patch and thought I included it in the commit I pushed, but it seems it was sitting in my local directory only 😓 I just pushed up the fix. > ValueQueue does not trigger an async refill when number of values falls below > watermark > --- > > Key: HADOOP-16581 > URL: https://issues.apache.org/jira/browse/HADOOP-16581 > Project: Hadoop Common > Issue Type: Bug > Components: common, kms >Affects Versions: 2.7.4, 3.2.0 >Reporter: Yuval Degani >Assignee: Yuval Degani >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: HADOOP-16581-branch-2.001.addendum.patch > > > The ValueQueue facility was designed to cache EDEKs for KMS KeyProviders so > that EDEKs could be served quickly, while the cache is replenished in a > background thread. > The existing code for triggering an asynchronous refill is only triggered > when a key queue becomes empty, rather than when it falls below the > configured watermark. > This is a relatively minor fix in the main code, however, most of the tests > require some changes as they verify the previous unintended behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16581) ValueQueue does not trigger an async refill when number of values falls below watermark
[ https://issues.apache.org/jira/browse/HADOOP-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16581: - Resolution: Fixed Status: Resolved (was: Patch Available)
[jira] [Commented] (HADOOP-16581) ValueQueue does not trigger an async refill when number of values falls below watermark
[ https://issues.apache.org/jira/browse/HADOOP-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934591#comment-16934591 ] Erik Krogen commented on HADOOP-16581: -- Also backported to branch-3.2, branch-3.1, and branch-2. There was a minor change I needed to make to the test case in branch-2 due to the lack of Java 8 support (removed the lambda, made some parameters final). Thanks [~yuvaldeg]!
[jira] [Updated] (HADOOP-16581) ValueQueue does not trigger an async refill when number of values falls below watermark
[ https://issues.apache.org/jira/browse/HADOOP-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16581: - Fix Version/s: (was: 3.2.1) 3.2.2 3.1.4 3.3.0 2.10.0
[jira] [Commented] (HADOOP-16581) ValueQueue does not trigger an async refill when number of values falls below watermark
[ https://issues.apache.org/jira/browse/HADOOP-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934568#comment-16934568 ] Erik Krogen commented on HADOOP-16581: -- Just merged this to trunk based on +1 from Anu and myself. Thanks for the contribution [~yuvaldeg]!
[jira] [Commented] (HADOOP-16581) ValueQueue does not trigger an async refill when number of values falls below watermark
[ https://issues.apache.org/jira/browse/HADOOP-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933525#comment-16933525 ] Erik Krogen commented on HADOOP-16581: -- Got a clean Jenkins run. +1 from me, will give one more day before committing for anyone else to chime in. Ping [~aengineer] since it looks like I messed up the tag in my last comment.
[jira] [Commented] (HADOOP-16581) ValueQueue does not trigger an async refill when number of values falls below watermark
[ https://issues.apache.org/jira/browse/HADOOP-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932659#comment-16932659 ] Erik Krogen commented on HADOOP-16581: -- The change LGTM, ping [~xiaochen] who I see has some previous commits in this file, and [~anu] / [~daryn] / [~kihwal] who I believe have some experience with the KMS.
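The intent of the HADOOP-16581 fix discussed in the messages above (refill when the cached-value count drops below the watermark, not only when it reaches zero) can be sketched with a toy queue. This mirrors the behavior described in the JIRA, not the real {{ValueQueue}} code, and the refill is synchronous here where the real one is asynchronous:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of a watermarked value cache: take() triggers a refill as soon as
// the remaining count falls below (capacity * watermark), instead of waiting
// for the queue to empty out as the pre-fix code effectively did.
public class WatermarkQueueDemo {
    private final Queue<Integer> values = new ArrayDeque<>();
    private final int capacity;
    private final float watermark; // e.g. 0.5f => refill when below 50% full
    int refillsTriggered = 0;

    WatermarkQueueDemo(int capacity, float watermark) {
        this.capacity = capacity;
        this.watermark = watermark;
        refill();
    }

    private void refill() { // synchronous stand-in for the async refiller thread
        refillsTriggered++;
        while (values.size() < capacity) {
            values.add(values.size());
        }
    }

    Integer take() {
        Integer v = values.poll();
        if (values.size() < capacity * watermark) {
            refill(); // the buggy version only refilled when values.isEmpty()
        }
        return v;
    }

    public static void main(String[] args) {
        WatermarkQueueDemo q = new WatermarkQueueDemo(10, 0.5f);
        for (int i = 0; i < 6; i++) {
            q.take(); // the 6th take drops the count to 4, below the watermark of 5
        }
        if (q.refillsTriggered < 2) {
            throw new AssertionError("refill should fire below watermark, not only at empty");
        }
        System.out.println("refills triggered: " + q.refillsTriggered);
    }
}
```

With the old empty-only trigger, consumers would have had to drain all ten values before any refill began, defeating the point of a background replenisher.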
[jira] [Updated] (HADOOP-16531) Log more detail for slow RPC
[ https://issues.apache.org/jira/browse/HADOOP-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16531: - Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this to trunk. Thanks a lot for the contribution [~zhangchen]! > Log more detail for slow RPC > > > Key: HADOOP-16531 > URL: https://issues.apache.org/jira/browse/HADOOP-16531 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16531.001.patch > > > The current implementation only logs the processing time > {code:java} > if ((rpcMetrics.getProcessingSampleCount() > minSampleSize) && > (processingTime > threeSigma)) { > LOG.warn("Slow RPC : {} took {} {} to process from client {}", > methodName, processingTime, RpcMetrics.TIMEUNIT, call); > rpcMetrics.incrSlowRpc(); > } > {code} > We need to log more detail to help locate the problem (e.g. how long it > takes to request the lock, hold the lock, or do other things) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
[ https://issues.apache.org/jira/browse/HADOOP-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-15565: - Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) > ViewFileSystem.close doesn't close child filesystems and causes FileSystem > objects leak. > > > Key: HADOOP-15565 > URL: https://issues.apache.org/jira/browse/HADOOP-15565 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-15565.0001.patch, HADOOP-15565.0002.patch, > HADOOP-15565.0003.patch, HADOOP-15565.0004.patch, HADOOP-15565.0005.patch, > HADOOP-15565.0006.bak, HADOOP-15565.0006.patch, HADOOP-15565.0007.patch, > HADOOP-15565.0008.patch > > > ViewFileSystem.close() does nothing but remove itself from FileSystem.CACHE. > Its child filesystems are cached in FileSystem.CACHE and shared by all > the ViewFileSystem instances. We couldn't simply close all the child > filesystems because it would break the semantics of FileSystem.newInstance(). > We might add an inner cache to ViewFileSystem and let it cache all the child > filesystems. The child filesystems are then no longer shared. When the > ViewFileSystem is closed, we close all the child filesystems in the inner > cache. The ViewFileSystem is still cached by FileSystem.CACHE, so there won't > be too many FileSystem instances. > FileSystem.CACHE caches the ViewFileSystem instance, and the other > instances (the child filesystems) are cached in the inner cache. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
[ https://issues.apache.org/jira/browse/HADOOP-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924441#comment-16924441 ] Erik Krogen commented on HADOOP-15565: -- The v8 patch LGTM, thanks a lot [~LiJinglun]! I just committed this to trunk.
[jira] [Commented] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
[ https://issues.apache.org/jira/browse/HADOOP-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923545#comment-16923545 ] Erik Krogen commented on HADOOP-15565: -- Thanks for the detailed explanations [~LiJinglun]! They are very helpful; it all makes sense now. It looks like the v007 patch is identical to the v006 patch – did you upload the wrong file? {code:java} ± diff HADOOP-15565.0006.patch HADOOP-15565.0006.patch | wc -l 0 {code} While I'm at it, there is one more thing I noticed: In {{TestChRootedFileSystem#getChildFileSystem()}}, can we use [{{Objects.equals}}|https://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#equals-java.lang.Object-java.lang.Object-] instead of manually doing a null + equality check? I think it should make this cleaner.
[jira] [Comment Edited] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
[ https://issues.apache.org/jira/browse/HADOOP-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923545#comment-16923545 ] Erik Krogen edited comment on HADOOP-15565 at 9/5/19 3:41 PM: -- Thanks for the detailed explanations [~LiJinglun]! They are very helpful; it all makes sense now. It looks like the v007 patch is identical to the v006 patch – did you upload the wrong file? {code:java} ± diff HADOOP-15565.0006.patch HADOOP-15565.0007.patch | wc -l 0 {code} While I'm at it, there is one more thing I noticed: In {{TestChRootedFileSystem#getChildFileSystem()}}, can we use [{{Objects.equals}}|https://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#equals-java.lang.Object-java.lang.Object-] instead of manually doing a null + equality check? I think it should make this cleaner. was (Author: xkrogen): Thanks for the detailed explanations [~LiJinglun]! They are very helpful; it all makes sense now. It looks like the v007 patch is identical to the v006 patch – did you upload the wrong file? {code:java} ± diff HADOOP-15565.0006.patch HADOOP-15565.0006.patch | wc -l 0 {code} While I'm at it, there is one more thing I noticed: In {{TestChRootedFileSystem#getChildFileSystem()}}, can we use [{{Objects.equals}}|https://docs.oracle.com/javase/8/docs/api/java/util/Objects.html#equals-java.lang.Object-java.lang.Object-] instead of manually doing a null + equality check? I think it should make this cleaner.
[jira] [Commented] (HADOOP-16531) Log more detail for slow RPC
[ https://issues.apache.org/jira/browse/HADOOP-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923512#comment-16923512 ] Erik Krogen commented on HADOOP-16531: -- Thanks [~zhangchen]! This seems like a nice improvement and great leverage of the {{ProcessingDetails}} we added in HADOOP-16266. +1 from me. I'll give others some time to look and commit tomorrow morning PDT.
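The improvement discussed for HADOOP-16531 is to log a per-phase timing breakdown for a slow RPC rather than a single processing-time number. A minimal sketch of that idea follows; the {{Timing}} enum values and formatting are illustrative (Hadoop's real {{ProcessingDetails}} tracks a similar set of phases such as queue time, lock wait, and lock held time):

```java
import java.util.EnumMap;
import java.util.Map;

// Builds the kind of detail line a slow-RPC warning could emit, so an operator
// can see WHERE the time went (e.g. lock contention vs. actual processing).
public class SlowRpcLogDemo {
    enum Timing { QUEUE, LOCKWAIT, LOCKSHARED, LOCKEXCLUSIVE, PROCESSING }

    static String detailLine(String method, Map<Timing, Long> detailsMs) {
        StringBuilder sb = new StringBuilder("Slow RPC: ").append(method).append(" details:");
        for (Map.Entry<Timing, Long> e : detailsMs.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue()).append("ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Timing, Long> details = new EnumMap<>(Timing.class);
        details.put(Timing.QUEUE, 2L);
        details.put(Timing.LOCKWAIT, 850L); // the interesting part: lock contention
        details.put(Timing.PROCESSING, 40L);
        String line = detailLine("addBlock", details);
        if (!line.contains("LOCKWAIT=850ms")) {
            throw new AssertionError("detail line should include per-phase timings");
        }
        System.out.println(line);
    }
}
```

With only the aggregate number, this call would simply read "took 892ms"; the breakdown makes it obvious the time was spent waiting on a lock rather than doing work.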
[jira] [Commented] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
[ https://issues.apache.org/jira/browse/HADOOP-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922760#comment-16922760 ] Erik Krogen commented on HADOOP-15565: -- Hi [~LiJinglun], sorry for my delayed response, I just returned from vacation recently. Here is another review. Things look great overall, the comments are mostly minor. * Can we make the log message in {{ViewFileSystem}} L119 more informative? If you were only to look at the logs and not the code, "close failed" would be confusing. * Can you add some comments on {{ViewFileSystem}} L292 explaining why the cache can be immutable, on {{InnerCache}} explaining why this cache is necessary (maybe just a reference to this JIRA), and on {{InnerCache.Key}} describing why it is okay to use a simple key here (as we discussed previously, no need for UGI)? * The tests in {{TestViewFileSystemHdfs}} LGTM, but I don't think they are HDFS-specific. Can we put them in {{ViewFileSystemBaseTest}}? Also you have one typo, {{testViewFilsSystemInnerCache}} should be {{testViewFileSystemInnerCache}} * Can you describe why the changes in {{TestViewFsDefaultValue}} are necessary? * Can you explain why the changes in {{TestViewFileSystemDelegationTokenSupport}} are necessary? Same for {{TestViewFileSystemDelegation}} -- it seems like the old way of returning the created {{fs}} was cleaner? I also don't understand the need for changes in {{testSanity()}} -- does the string comparison no longer work? > ViewFileSystem.close doesn't close child filesystems and causes FileSystem > objects leak. 
> > > Key: HADOOP-15565 > URL: https://issues.apache.org/jira/browse/HADOOP-15565 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HADOOP-15565.0001.patch, HADOOP-15565.0002.patch, > HADOOP-15565.0003.patch, HADOOP-15565.0004.patch, HADOOP-15565.0005.patch, > HADOOP-15565.0006.patch > > > ViewFileSystem.close() does nothing but remove itself from FileSystem.CACHE. > Its child filesystems are cached in FileSystem.CACHE and shared by all > the ViewFileSystem instances. We couldn't simply close all the child > filesystems because that would break the semantics of FileSystem.newInstance(). > We might add an inner cache to ViewFileSystem, letting it cache all the child > filesystems. The child filesystems are then no longer shared. When a > ViewFileSystem is closed we close all the child filesystems in its inner > cache. The ViewFileSystem is still cached by FileSystem.CACHE so there won't > be too many FileSystem instances. > FileSystem.CACHE caches the ViewFileSystem instance and the other > instances (the child filesystems) are cached in the inner cache. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
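The inner-cache design described above can be sketched in a few lines. This is a hedged, self-contained illustration, not the actual Hadoop implementation: the `Key` deliberately uses only scheme and authority (per the review discussion, no UGI is needed because all children of one ViewFileSystem share the same UGI), and per-child close failures are noted but swallowed so one bad child does not block the rest.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Minimal sketch of a per-ViewFileSystem inner cache of child filesystems,
// keyed by (scheme, authority) only, closed along with its owner.
class InnerCache {
  static final class Key {
    private final String scheme;
    private final String authority;

    Key(String scheme, String authority) {
      this.scheme = scheme;
      this.authority = authority;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return Objects.equals(scheme, k.scheme)
          && Objects.equals(authority, k.authority);
    }

    @Override
    public int hashCode() {
      return Objects.hash(scheme, authority);
    }
  }

  private final Map<Key, Closeable> map = new HashMap<>();

  Closeable get(Key key) {
    return map.get(key);
  }

  void put(Key key, Closeable fs) {
    map.put(key, fs);
  }

  // Close every cached child; note (but swallow) per-child failures so one
  // failing child does not prevent the others from being closed.
  void closeAll() {
    for (Closeable fs : map.values()) {
      try {
        fs.close();
      } catch (IOException e) {
        System.out.println("Failed to close a child filesystem: " + e.getMessage());
      }
    }
    map.clear();
  }
}
```

A real implementation would also need to consider concurrent access, as raised in the review below; this single-threaded sketch ignores that.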
[jira] [Commented] (HADOOP-16268) Allow custom wrapped exception to be thrown by server if RPC call queue is filled up
[ https://issues.apache.org/jira/browse/HADOOP-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922600#comment-16922600 ] Erik Krogen commented on HADOOP-16268: -- [~crh] sorry for the delay, I just returned from vacation. I just committed this to trunk. Thanks for the contribution! > Allow custom wrapped exception to be thrown by server if RPC call queue is > filled up > > > Key: HADOOP-16268 > URL: https://issues.apache.org/jira/browse/HADOOP-16268 > Project: Hadoop Common > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16268.001.patch, HADOOP-16268.002.patch, > HADOOP-16268.003.patch, HADOOP-16268.004.patch > > > In the current implementation of callqueue manager, > "CallQueueOverflowException" exceptions are always wrapping > "RetriableException". Through configs servers should be allowed to throw > custom exceptions based on new use cases. > In CallQueueManager.java for backoff the below is done > {code:java} > // ideally this behavior should be controllable too. > private void throwBackoff() throws IllegalStateException { > throw CallQueueOverflowException.DISCONNECT; > } > {code} > Since CallQueueOverflowException only wraps RetriableException clients would > end up hitting the same server for retries. In use cases that router supports > these overflowed requests could be handled by another router that shares the > same state thus distributing load across a cluster of routers better. In the > absence of any custom exception, current behavior should be supported. > In CallQueueOverflowException class a new Standby exception wrap should be > created. 
Something like the below > {code:java} >static final CallQueueOverflowException KEEPALIVE = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY), > RpcStatusProto.ERROR); > static final CallQueueOverflowException DISCONNECT = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > static final CallQueueOverflowException DISCONNECT2 = > new CallQueueOverflowException( > new StandbyException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16268) Allow custom wrapped exception to be thrown by server if RPC call queue is filled up
[ https://issues.apache.org/jira/browse/HADOOP-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16268: - Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Allow custom wrapped exception to be thrown by server if RPC call queue is > filled up > > > Key: HADOOP-16268 > URL: https://issues.apache.org/jira/browse/HADOOP-16268 > Project: Hadoop Common > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16268.001.patch, HADOOP-16268.002.patch, > HADOOP-16268.003.patch, HADOOP-16268.004.patch > > > In the current implementation of callqueue manager, > "CallQueueOverflowException" exceptions are always wrapping > "RetriableException". Through configs servers should be allowed to throw > custom exceptions based on new use cases. > In CallQueueManager.java for backoff the below is done > {code:java} > // ideally this behavior should be controllable too. > private void throwBackoff() throws IllegalStateException { > throw CallQueueOverflowException.DISCONNECT; > } > {code} > Since CallQueueOverflowException only wraps RetriableException clients would > end up hitting the same server for retries. In use cases that router supports > these overflowed requests could be handled by another router that shares the > same state thus distributing load across a cluster of routers better. In the > absence of any custom exception, current behavior should be supported. > In CallQueueOverflowException class a new Standby exception wrap should be > created. 
Something like the below > {code:java} >static final CallQueueOverflowException KEEPALIVE = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY), > RpcStatusProto.ERROR); > static final CallQueueOverflowException DISCONNECT = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > static final CallQueueOverflowException DISCONNECT2 = > new CallQueueOverflowException( > new StandbyException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15726) Create utility to limit frequency of log statements
[ https://issues.apache.org/jira/browse/HADOOP-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922595#comment-16922595 ] Erik Krogen commented on HADOOP-15726: -- Hi [~zhangchen], if I recall correctly, the read locks weren't done in this patch because the implementation of {{LogThrottlingHelper}} is not thread-safe, and the read lock variables are modified in a concurrent fashion. If you want to make enhancements to support read locks, I would be happy to review. > Create utility to limit frequency of log statements > --- > > Key: HADOOP-15726 > URL: https://issues.apache.org/jira/browse/HADOOP-15726 > Project: Hadoop Common > Issue Type: Improvement > Components: common, util >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.4, 3.1.2 > > Attachments: HADOOP-15726.000.patch, HADOOP-15726.001.patch, > HADOOP-15726.002.patch, HADOOP-15726.003.patch, > HDFS-15726-branch-3.0.003.patch > > > There is a common pattern of logging a behavior that is normally extraneous. > Under some circumstances, such a behavior becomes common, flooding the logs > and making it difficult to see what else is going on in the system. Under > such situations it is beneficial to limit how frequently the extraneous > behavior is logged, while capturing some summary information about the > suppressed log statements. > This is currently implemented in {{FSNamesystemLock}} (in HDFS-10713). We > have additional use cases for this in HDFS-13791, so this is a good time to > create a common utility for different sites to share this logic. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
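The throttling pattern described above (log at most once per period, while summarizing what was suppressed) can be sketched minimally. This is an illustrative, single-threaded toy, not the actual {{LogThrottlingHelper}} API, and it shows why thread safety matters: the mutable counters below would race under concurrent callers.

```java
// Minimal single-threaded sketch of log throttling: emit at most one log
// statement per period and count how many were suppressed in between.
class SimpleLogThrottler {
  private final long periodMs;
  private long lastLogTimeMs = Long.MIN_VALUE; // sentinel: nothing logged yet
  private long suppressedCount = 0;

  SimpleLogThrottler(long periodMs) {
    this.periodMs = periodMs;
  }

  /** Returns true if the caller should emit its log statement now. */
  boolean shouldLog(long nowMs) {
    if (lastLogTimeMs == Long.MIN_VALUE || nowMs - lastLogTimeMs >= periodMs) {
      lastLogTimeMs = nowMs;
      suppressedCount = 0;
      return true;
    }
    suppressedCount++;
    return false;
  }

  /** Number of statements suppressed since the last emitted log. */
  long getSuppressedCount() {
    return suppressedCount;
  }
}
```

A caller checks `shouldLog(now)` before each statement and, when it returns true, can include `getSuppressedCount()` from just before the reset in the emitted message to summarize the gap.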
[jira] [Commented] (HADOOP-16268) Allow custom wrapped exception to be thrown by server if RPC call queue is filled up
[ https://issues.apache.org/jira/browse/HADOOP-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914353#comment-16914353 ] Erik Krogen commented on HADOOP-16268: -- +1 on the latest patch > Allow custom wrapped exception to be thrown by server if RPC call queue is > filled up > > > Key: HADOOP-16268 > URL: https://issues.apache.org/jira/browse/HADOOP-16268 > Project: Hadoop Common > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HADOOP-16268.001.patch, HADOOP-16268.002.patch, > HADOOP-16268.003.patch, HADOOP-16268.004.patch > > > In the current implementation of callqueue manager, > "CallQueueOverflowException" exceptions are always wrapping > "RetriableException". Through configs servers should be allowed to throw > custom exceptions based on new use cases. > In CallQueueManager.java for backoff the below is done > {code:java} > // ideally this behavior should be controllable too. > private void throwBackoff() throws IllegalStateException { > throw CallQueueOverflowException.DISCONNECT; > } > {code} > Since CallQueueOverflowException only wraps RetriableException clients would > end up hitting the same server for retries. In use cases that router supports > these overflowed requests could be handled by another router that shares the > same state thus distributing load across a cluster of routers better. In the > absence of any custom exception, current behavior should be supported. > In CallQueueOverflowException class a new Standby exception wrap should be > created. 
Something like the below > {code:java} >static final CallQueueOverflowException KEEPALIVE = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY), > RpcStatusProto.ERROR); > static final CallQueueOverflowException DISCONNECT = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > static final CallQueueOverflowException DISCONNECT2 = > new CallQueueOverflowException( > new StandbyException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16268) Allow custom wrapped exception to be thrown by server if RPC call queue is filled up
[ https://issues.apache.org/jira/browse/HADOOP-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913569#comment-16913569 ] Erik Krogen commented on HADOOP-16268: -- LGTM thanks CR! +1 pending Jenkins. > Allow custom wrapped exception to be thrown by server if RPC call queue is > filled up > > > Key: HADOOP-16268 > URL: https://issues.apache.org/jira/browse/HADOOP-16268 > Project: Hadoop Common > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HADOOP-16268.001.patch, HADOOP-16268.002.patch, > HADOOP-16268.003.patch > > > In the current implementation of callqueue manager, > "CallQueueOverflowException" exceptions are always wrapping > "RetriableException". Through configs servers should be allowed to throw > custom exceptions based on new use cases. > In CallQueueManager.java for backoff the below is done > {code:java} > // ideally this behavior should be controllable too. > private void throwBackoff() throws IllegalStateException { > throw CallQueueOverflowException.DISCONNECT; > } > {code} > Since CallQueueOverflowException only wraps RetriableException clients would > end up hitting the same server for retries. In use cases that router supports > these overflowed requests could be handled by another router that shares the > same state thus distributing load across a cluster of routers better. In the > absence of any custom exception, current behavior should be supported. > In CallQueueOverflowException class a new Standby exception wrap should be > created. 
Something like the below > {code:java} >static final CallQueueOverflowException KEEPALIVE = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY), > RpcStatusProto.ERROR); > static final CallQueueOverflowException DISCONNECT = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > static final CallQueueOverflowException DISCONNECT2 = > new CallQueueOverflowException( > new StandbyException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16268) Allow custom wrapped exception to be thrown by server if RPC call queue is filled up
[ https://issues.apache.org/jira/browse/HADOOP-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912498#comment-16912498 ] Erik Krogen commented on HADOOP-16268: -- Great [~crh]! Thanks for the new changes. I have some small comments, mostly on the tests. * For the comment on {{IPC_CALLQUEUE_SERVER_FAILOVER_ENABLE}}, it seems that there is already a block of similar configs above with a general comment explaining how the namespacing works. If we push this key into that same block, I think we can remove this comment and rely on that one? * My IDE gives me a few warnings about {{testInsertionWithFailover}}: ** {{Exception}} is never thrown; you can remove the {{throws}} ** You can use diamond-typing for {{new FairCallQueue<>()}} ** {{p2}} is never used * {{testInsertionWithFailover}} is great, but a bit long. Can we refactor a method like: {code} private void addToQueueAndVerify(Schedulable call, int expectedQueue0, int expectedQueue1, int expectedQueue2) { Mockito.reset(fcq); fcq.add(call); Mockito.verify(fcq, times(expectedQueue0)).offerQueue(0, call); Mockito.verify(fcq, times(expectedQueue1)).offerQueue(1, call); Mockito.verify(fcq, times(expectedQueue2)).offerQueue(2, call); } {code} > Allow custom wrapped exception to be thrown by server if RPC call queue is > filled up > > > Key: HADOOP-16268 > URL: https://issues.apache.org/jira/browse/HADOOP-16268 > Project: Hadoop Common > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HADOOP-16268.001.patch, HADOOP-16268.002.patch > > > In the current implementation of callqueue manager, > "CallQueueOverflowException" exceptions are always wrapping > "RetriableException". Through configs servers should be allowed to throw > custom exceptions based on new use cases. > In CallQueueManager.java for backoff the below is done > {code:java} > // ideally this behavior should be controllable too. 
> private void throwBackoff() throws IllegalStateException { > throw CallQueueOverflowException.DISCONNECT; > } > {code} > Since CallQueueOverflowException only wraps RetriableException clients would > end up hitting the same server for retries. In use cases that router supports > these overflowed requests could be handled by another router that shares the > same state thus distributing load across a cluster of routers better. In the > absence of any custom exception, current behavior should be supported. > In CallQueueOverflowException class a new Standby exception wrap should be > created. Something like the below > {code:java} >static final CallQueueOverflowException KEEPALIVE = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY), > RpcStatusProto.ERROR); > static final CallQueueOverflowException DISCONNECT = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > static final CallQueueOverflowException DISCONNECT2 = > new CallQueueOverflowException( > new StandbyException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16268) Allow custom wrapped exception to be thrown by server if RPC call queue is filled up
[ https://issues.apache.org/jira/browse/HADOOP-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911496#comment-16911496 ] Erik Krogen commented on HADOOP-16268: -- Thanks for the explanation [~crh]. I thought that {{throwBackoff()}} was only called when {{shouldBackoff()}} returned true, but did not notice this code block which catches the {{IllegalStateException}}: {code} try { return putRef.get().add(e); } catch (CallQueueOverflowException ex) { // queue provided a custom exception that may control if the client // should be disconnected. throw ex; } catch (IllegalStateException ise) { throwBackoff(); } {code} I still think it makes sense to update {{FairCallQueue#add()}} as well. Even if the feature is primarily intended for use by the router, it should be consistent across different queue implementations. This may become useful when reading from standby nodes as well. > Allow custom wrapped exception to be thrown by server if RPC call queue is > filled up > > > Key: HADOOP-16268 > URL: https://issues.apache.org/jira/browse/HADOOP-16268 > Project: Hadoop Common > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HADOOP-16268.001.patch > > > In the current implementation of callqueue manager, > "CallQueueOverflowException" exceptions are always wrapping > "RetriableException". Through configs servers should be allowed to throw > custom exceptions based on new use cases. > In CallQueueManager.java for backoff the below is done > {code:java} > // ideally this behavior should be controllable too. > private void throwBackoff() throws IllegalStateException { > throw CallQueueOverflowException.DISCONNECT; > } > {code} > Since CallQueueOverflowException only wraps RetriableException clients would > end up hitting the same server for retries. 
In use cases that router supports > these overflowed requests could be handled by another router that shares the > same state thus distributing load across a cluster of routers better. In the > absence of any custom exception, current behavior should be supported. > In CallQueueOverflowException class a new Standby exception wrap should be > created. Something like the below > {code:java} >static final CallQueueOverflowException KEEPALIVE = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY), > RpcStatusProto.ERROR); > static final CallQueueOverflowException DISCONNECT = > new CallQueueOverflowException( > new RetriableException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > static final CallQueueOverflowException DISCONNECT2 = > new CallQueueOverflowException( > new StandbyException(TOO_BUSY + " - disconnecting"), > RpcStatusProto.FATAL); > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
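The behavior requested above, i.e. a config flag choosing which wrapped exception an overflowing call queue throws, can be sketched as follows. The exception classes and the flag are illustrative stand-ins (Hadoop defines its own {{RetriableException}}, {{StandbyException}}, and configuration keys); the point is only the selection logic: a retriable wrap keeps the client retrying the same server, while a standby wrap pushes it to fail over to another one.

```java
// Hedged sketch of config-controlled call-queue overflow behavior.
// Stand-in exception types; Hadoop's real classes live in org.apache.hadoop.ipc.
class RetriableException extends Exception {
  RetriableException(String msg) {
    super(msg);
  }
}

class StandbyException extends Exception {
  StandbyException(String msg) {
    super(msg);
  }
}

class CallQueueBackoff {
  static final String TOO_BUSY = "Server too busy";

  private final boolean failoverEnabled; // hypothetical config flag

  CallQueueBackoff(boolean failoverEnabled) {
    this.failoverEnabled = failoverEnabled;
  }

  // Choose the wrapped exception: StandbyException triggers client failover,
  // RetriableException keeps the client retrying this server.
  Exception overflowException() {
    return failoverEnabled
        ? new StandbyException(TOO_BUSY + " - disconnecting")
        : new RetriableException(TOO_BUSY);
  }
}
```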
[jira] [Updated] (HADOOP-16391) Duplicate values in rpcDetailedMetrics
[ https://issues.apache.org/jira/browse/HADOOP-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16391: - Resolution: Fixed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) I fixed up the last whitespace issue for you and committed this to trunk. Thanks a lot for the contribution [~BilwaST]! > Duplicate values in rpcDetailedMetrics > -- > > Key: HADOOP-16391 > URL: https://issues.apache.org/jira/browse/HADOOP-16391 > Project: Hadoop Common > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16391-001.patch, HADOOP-16391-002.patch, > HADOOP-16391-003.patch, image-2019-06-25-20-30-15-395.png, screenshot-1.png, > screenshot-2.png > > > In RpcDetailedMetrics, init is called twice: once for deferredRpcRates > and once for the rates metrics, which causes duplicate values in RM and NM > metrics. > !image-2019-06-25-20-30-15-395.png! -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16391) Duplicate values in rpcDetailedMetrics
[ https://issues.apache.org/jira/browse/HADOOP-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907568#comment-16907568 ] Erik Krogen commented on HADOOP-16391: -- Hi [~BilwaST], thanks for continuing to work on this :) The checkstyle and whitespace issues seem legitimate. Besides those, I had one thought regarding the changes to {{MutableRate}}. The new constructor basically makes it no longer a {{MutableRate}}, just a {{MutableStat}} – the only difference between the two is that {{MutableRate}} enforces the "Ops"/"Time" names and this constructor removes that convention. Instead, would it make more sense to leave {{MutableRate}} unchanged and then do this: {code:java|title=MutableRatesWithAggregation} metric = new MutableRate(name + typePrefix, name + typePrefix, false); {code} The output will look basically the same, with names like "GetLongDeferredNumOps" instead of "GetLongNumDeferredOps", and seems a change more in line with the intent of {{MutableRate}}. What do you think? Also, the new test looks great, but can we also add an assertion that there is a call for the deferred method name as well? You also have a typo: "Deferrred" (three r's) > Duplicate values in rpcDetailedMetrics > -- > > Key: HADOOP-16391 > URL: https://issues.apache.org/jira/browse/HADOOP-16391 > Project: Hadoop Common > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: HADOOP-16391-001.patch, HADOOP-16391-002.patch, > image-2019-06-25-20-30-15-395.png, screenshot-1.png, screenshot-2.png > > > In RpcDetailedMetrics, init is called twice: once for deferredRpcRates > and once for the rates metrics, which causes duplicate values in RM and NM > metrics. > !image-2019-06-25-20-30-15-395.png! -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906628#comment-16906628 ] Erik Krogen commented on HADOOP-16459: -- Thanks [~elgoiri]! Just committed this to branch-2. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.0.4, 3.2.1, 3.1.3 > > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-2.002.patch, > HADOOP-16266-branch-3.0.000.patch, HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Resolution: Fixed Fix Version/s: 2.10.0 Status: Resolved (was: Patch Available) > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.2.1, 3.1.3 > > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-2.002.patch, > HADOOP-16266-branch-3.0.000.patch, HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16266) Add more fine-grained processing time metrics to the RPC layer
[ https://issues.apache.org/jira/browse/HADOOP-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16266: - Fix Version/s: 2.10.0 > Add more fine-grained processing time metrics to the RPC layer > -- > > Key: HADOOP-16266 > URL: https://issues.apache.org/jira/browse/HADOOP-16266 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Christopher Gregorian >Assignee: Erik Krogen >Priority: Minor > Labels: rpc > Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HADOOP-16266.001.patch, HADOOP-16266.002.patch, > HADOOP-16266.003.patch, HADOOP-16266.004.patch, HADOOP-16266.005.patch, > HADOOP-16266.006.patch, HADOOP-16266.007.patch, HADOOP-16266.008.patch, > HADOOP-16266.009.patch, HADOOP-16266.010.patch, > HADOOP-16266.011-followon.patch, HADOOP-16266.011.patch > > > Splitting off of HDFS-14403 to track the first part: introduces more > fine-grained measuring of how a call's processing time is split up. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15565) ViewFileSystem.close doesn't close child filesystems and causes FileSystem objects leak.
[ https://issues.apache.org/jira/browse/HADOOP-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906520#comment-16906520 ] Erik Krogen commented on HADOOP-15565: -- Thanks for working on this [~LiJinglun]! Seems like a good problem to fix and I think the idea is sound. I didn't get a chance to look at the tests yet, but I have some comments from an initial review of the production code: * I don't think publicly exposing {{cacheSize()}} on {{FileSystem}} is a great idea. Can we make it package-private, and if it is needed in non-package-local tests, use a test utility to export it publicly? * Is there a chance the cache will be accessed in a multi-threaded way? If so we need to harden it for concurrent access. Looks like it will only work in a single-threaded fashion currently. If the FS instances are actually all created on startup, then I think we should explicitly populate the cache on startup. * I agree that swallowing exceptions on child FS close is the right move, but probably we should at least put them at INFO level? * This seems less strict than {{FileSystem.CACHE}} when checking for equality; it doesn't use the {{UserGroupInformation}} at all. I think this is safe because the cache is local to a single {{ViewFileSystem}}, so all of the inner cached instances must share the same UGI, but please help me to confirm. * We can use {{Objects.hash()}} for the {{hashCode()}} method of {{Key}}. * On {{ViewFileSystem}} L257, you shouldn't initialize {{fs}} -- you can just declare it: {{FileSystem fs;}} (this allows the compiler to help ensure that you remember to initialize it later) > ViewFileSystem.close doesn't close child filesystems and causes FileSystem > objects leak. 
> > > Key: HADOOP-15565 > URL: https://issues.apache.org/jira/browse/HADOOP-15565 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HADOOP-15565.0001.patch, HADOOP-15565.0002.patch, > HADOOP-15565.0003.patch, HADOOP-15565.0004.patch, HADOOP-15565.0005.patch > > > ViewFileSystem.close() does nothing but remove itself from FileSystem.CACHE. > Its child filesystems are cached in FileSystem.CACHE and shared by all > the ViewFileSystem instances. We couldn't simply close all the child > filesystems because that would break the semantics of FileSystem.newInstance(). > We might add an inner cache to ViewFileSystem, letting it cache all the child > filesystems. The child filesystems are then no longer shared. When a > ViewFileSystem is closed we close all the child filesystems in its inner > cache. The ViewFileSystem is still cached by FileSystem.CACHE so there won't > be too many FileSystem instances. > FileSystem.CACHE caches the ViewFileSystem instance and the other > instances (the child filesystems) are cached in the inner cache. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906420#comment-16906420 ] Erik Krogen commented on HADOOP-16459: -- Re-visiting this branch-2 backport now that HDFS-14204 has been completed -- see [^HADOOP-16266-branch-2.002.patch]. I was able to use the branch-3.0 patch as a cherry-pick, and made the Java 7 compatibility fixes discussed in my [previous comment|https://issues.apache.org/jira/secure/EditComment!default.jspa?id=13246892&commentId=16892216]. [~vagarychen] or [~elgoiri], care to take another look? > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.0.4, 3.2.1, 3.1.3 > > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-2.002.patch, > HADOOP-16266-branch-3.0.000.patch, HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Attachment: HADOOP-16266-branch-2.002.patch > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.0.4, 3.2.1, 3.1.3 > > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-2.002.patch, > HADOOP-16266-branch-3.0.000.patch, HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Fix Version/s: 3.1.3 3.2.1 3.0.4 > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.0.4, 3.2.1, 3.1.3 > > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896554#comment-16896554 ] Erik Krogen commented on HADOOP-16459: -- I noticed that with HDFS-12943 in branch-3.0 (as part of HDFS-14573), it can be a direct cherry pick. So I think it will make everyone's life easier if I wait until HDFS-14204 is completed to backport HDFS-12943 to branch-2. I committed to branch-3.2, branch-3.1 and branch-3.0 for now. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16266) Add more fine-grained processing time metrics to the RPC layer
[ https://issues.apache.org/jira/browse/HADOOP-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16266: - Fix Version/s: 3.1.3 3.2.1 3.0.4 > Add more fine-grained processing time metrics to the RPC layer > -- > > Key: HADOOP-16266 > URL: https://issues.apache.org/jira/browse/HADOOP-16266 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Christopher Gregorian >Assignee: Erik Krogen >Priority: Minor > Labels: rpc > Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HADOOP-16266.001.patch, HADOOP-16266.002.patch, > HADOOP-16266.003.patch, HADOOP-16266.004.patch, HADOOP-16266.005.patch, > HADOOP-16266.006.patch, HADOOP-16266.007.patch, HADOOP-16266.008.patch, > HADOOP-16266.009.patch, HADOOP-16266.010.patch, > HADOOP-16266.011-followon.patch, HADOOP-16266.011.patch > > > Splitting off of HDFS-14403 to track the first part: introduces more > fine-grained measuring of how a call's processing time is split up. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16245: - Resolution: Fixed Fix Version/s: 3.1.3 3.2.1 3.3.0 3.0.4 2.10.0 Status: Resolved (was: Patch Available) > Enabling SSL within LdapGroupsMapping can break system SSL configs > -- > > Key: HADOOP-16245 > URL: https://issues.apache.org/jira/browse/HADOOP-16245 > Project: Hadoop Common > Issue Type: Bug > Components: common, security >Affects Versions: 2.9.1, 2.8.4, 2.7.6, 3.1.1, 3.0.3 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HADOOP-16245.000.patch, HADOOP-16245.001.patch, > HADOOP-16245.002.patch, HADOOP-16245.003.patch, HADOOP-16245.004.patch > > > When debugging an issue where one of our server components was unable to > communicate with other components via SSL, we realized that LdapGroupsMapping > sets its SSL configurations globally, rather than scoping them to the HTTP > clients it creates. 
> {code:title=LdapGroupsMapping} > DirContext getDirContext() throws NamingException { > if (ctx == null) { > // Set up the initial environment for LDAP connectivity > Hashtable env = new Hashtable(); > env.put(Context.INITIAL_CONTEXT_FACTORY, > com.sun.jndi.ldap.LdapCtxFactory.class.getName()); > env.put(Context.PROVIDER_URL, ldapUrl); > env.put(Context.SECURITY_AUTHENTICATION, "simple"); > // Set up SSL security, if necessary > if (useSsl) { > env.put(Context.SECURITY_PROTOCOL, "ssl"); > if (!keystore.isEmpty()) { > System.setProperty("javax.net.ssl.keyStore", keystore); > } > if (!keystorePass.isEmpty()) { > System.setProperty("javax.net.ssl.keyStorePassword", keystorePass); > } > if (!truststore.isEmpty()) { > System.setProperty("javax.net.ssl.trustStore", truststore); > } > if (!truststorePass.isEmpty()) { > System.setProperty("javax.net.ssl.trustStorePassword", > truststorePass); > } > } > env.put(Context.SECURITY_PRINCIPAL, bindUser); > env.put(Context.SECURITY_CREDENTIALS, bindPassword); > env.put("com.sun.jndi.ldap.connect.timeout", > conf.get(CONNECTION_TIMEOUT, > String.valueOf(CONNECTION_TIMEOUT_DEFAULT))); > env.put("com.sun.jndi.ldap.read.timeout", conf.get(READ_TIMEOUT, > String.valueOf(READ_TIMEOUT_DEFAULT))); > ctx = new InitialDirContext(env); > } > {code} > Notice the {{System.setProperty()}} calls, which will change settings > JVM-wide. This causes issues for other SSL clients, which may rely on the > default JVM truststore being used. This behavior was initially introduced by > HADOOP-8121, and extended to include the truststore configurations in > HADOOP-12862. > The correct approach is to use a mechanism which is scoped to the LDAP > requests only. The right approach appears to be to use the > {{java.naming.ldap.factory.socket}} parameter to set the socket factory to a > custom SSL socket factory which correctly sets the key and trust store > parameters. See an example [here|https://stackoverflow.com/a/4615497/4979203]. 
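A scoped socket factory along the lines of the Stack Overflow example linked in the description might look like the sketch below. This is an illustrative, hypothetical class, not the committed Hadoop implementation: it delegates to the default `SSLContext` to stay self-contained, whereas a real fix would initialize the context with key managers and trust managers built from the configured stores. JNDI instantiates the class reflectively via the `java.naming.ldap.factory.socket` environment property, so no JVM-wide `System.setProperty()` calls are needed.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical scoped factory. The LDAP environment would reference it as:
//   env.put("java.naming.ldap.factory.socket",
//       ScopedLdapSslSocketFactory.class.getName());
public class ScopedLdapSslSocketFactory extends SSLSocketFactory {
  private final SSLSocketFactory delegate;

  public ScopedLdapSslSocketFactory() throws Exception {
    // A real implementation would build the SSLContext from the configured
    // key/trust stores here, scoped to this factory only.
    delegate = SSLContext.getDefault().getSocketFactory();
  }

  // JNDI looks up this static method reflectively.
  public static SocketFactory getDefault() {
    try {
      return new ScopedLdapSslSocketFactory();
    } catch (Exception e) {
      throw new IllegalStateException("Unable to create LDAP socket factory", e);
    }
  }

  @Override public String[] getDefaultCipherSuites() {
    return delegate.getDefaultCipherSuites();
  }
  @Override public String[] getSupportedCipherSuites() {
    return delegate.getSupportedCipherSuites();
  }
  @Override public Socket createSocket(Socket s, String host, int port,
      boolean autoClose) throws IOException {
    return delegate.createSocket(s, host, port, autoClose);
  }
  @Override public Socket createSocket(String host, int port) throws IOException {
    return delegate.createSocket(host, port);
  }
  @Override public Socket createSocket(String host, int port,
      InetAddress localHost, int localPort) throws IOException {
    return delegate.createSocket(host, port, localHost, localPort);
  }
  @Override public Socket createSocket(InetAddress host, int port)
      throws IOException {
    return delegate.createSocket(host, port);
  }
  @Override public Socket createSocket(InetAddress address, int port,
      InetAddress localAddress, int localPort) throws IOException {
    return delegate.createSocket(address, port, localAddress, localPort);
  }

  public static void main(String[] args) {
    SocketFactory f = ScopedLdapSslSocketFactory.getDefault();
    if (f == null) {
      throw new AssertionError("factory should be constructible");
    }
    System.out.println("ok");
  }
}
```

Because the key and trust store configuration lives entirely inside the factory, other SSL clients in the same JVM keep seeing the default truststore.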
[jira] [Commented] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894055#comment-16894055 ] Erik Krogen commented on HADOOP-16245: -- Thanks [~vagarychen]! just committed this to trunk, branch-3.2, branch-3.1, branch-3.0, branch-2. > Enabling SSL within LdapGroupsMapping can break system SSL configs > -- > > Key: HADOOP-16245 > URL: https://issues.apache.org/jira/browse/HADOOP-16245 > Project: Hadoop Common > Issue Type: Bug > Components: common, security >Affects Versions: 2.9.1, 2.8.4, 2.7.6, 3.1.1, 3.0.3 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16245.000.patch, HADOOP-16245.001.patch, > HADOOP-16245.002.patch, HADOOP-16245.003.patch, HADOOP-16245.004.patch > > > When debugging an issue where one of our server components was unable to > communicate with other components via SSL, we realized that LdapGroupsMapping > sets its SSL configurations globally, rather than scoping them to the HTTP > clients it creates. 
> {code:title=LdapGroupsMapping} > DirContext getDirContext() throws NamingException { > if (ctx == null) { > // Set up the initial environment for LDAP connectivity > Hashtable env = new Hashtable(); > env.put(Context.INITIAL_CONTEXT_FACTORY, > com.sun.jndi.ldap.LdapCtxFactory.class.getName()); > env.put(Context.PROVIDER_URL, ldapUrl); > env.put(Context.SECURITY_AUTHENTICATION, "simple"); > // Set up SSL security, if necessary > if (useSsl) { > env.put(Context.SECURITY_PROTOCOL, "ssl"); > if (!keystore.isEmpty()) { > System.setProperty("javax.net.ssl.keyStore", keystore); > } > if (!keystorePass.isEmpty()) { > System.setProperty("javax.net.ssl.keyStorePassword", keystorePass); > } > if (!truststore.isEmpty()) { > System.setProperty("javax.net.ssl.trustStore", truststore); > } > if (!truststorePass.isEmpty()) { > System.setProperty("javax.net.ssl.trustStorePassword", > truststorePass); > } > } > env.put(Context.SECURITY_PRINCIPAL, bindUser); > env.put(Context.SECURITY_CREDENTIALS, bindPassword); > env.put("com.sun.jndi.ldap.connect.timeout", > conf.get(CONNECTION_TIMEOUT, > String.valueOf(CONNECTION_TIMEOUT_DEFAULT))); > env.put("com.sun.jndi.ldap.read.timeout", conf.get(READ_TIMEOUT, > String.valueOf(READ_TIMEOUT_DEFAULT))); > ctx = new InitialDirContext(env); > } > {code} > Notice the {{System.setProperty()}} calls, which will change settings > JVM-wide. This causes issues for other SSL clients, which may rely on the > default JVM truststore being used. This behavior was initially introduced by > HADOOP-8121, and extended to include the truststore configurations in > HADOOP-12862. > The correct approach is to use a mechanism which is scoped to the LDAP > requests only. The right approach appears to be to use the > {{java.naming.ldap.factory.socket}} parameter to set the socket factory to a > custom SSL socket factory which correctly sets the key and trust store > parameters. See an example [here|https://stackoverflow.com/a/4615497/4979203]. 
[jira] [Commented] (HADOOP-16452) Increase ipc.maximum.data.length default from 64MB to 128MB
[ https://issues.apache.org/jira/browse/HADOOP-16452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893060#comment-16893060 ] Erik Krogen commented on HADOOP-16452: -- {quote} Breaking the BRs into multiple messages is a possible (and probably better) solution {quote} I agree, and I see you already linked HDFS-11313 with a proposal to do this. {quote} The block report processing logic releases the NN lock every 4 milliseconds. (BlockManager.BlockReportProcessingThread#processQueue) {quote} This is only for IBRs. A single volume FBR is still processed under a lock without release; HDFS-14657 proposes to fix this. But it is an issue for now. {quote} Additionally, when you get into this sort of situation, the only real solution is to increase the limit, so it probably makes sense to bump this to 128MB by default. If the cluster is running with this many blocks, then the NN heap is probably big enough to accommodate the larger report size anyway. {quote} +1 > Increase ipc.maximum.data.length default from 64MB to 128MB > --- > > Key: HADOOP-16452 > URL: https://issues.apache.org/jira/browse/HADOOP-16452 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Affects Versions: 2.6.0 >Reporter: Wei-Chiu Chuang >Priority: Major > > Reason for bumping the default: > Denser DataNodes are common. It is not uncommon to find a DataNode with > 7 > million blocks these days. > With such a high number of blocks, the block report message can exceed the > 64mb limit (defined by ipc.maximum.data.length). The block reports are > rejected, causing missing blocks in HDFS. We had to double this configuration > value in order to work around the issue. > We are seeing an increasing number of these cases. I think it's time to > revisit some of these default values as the hardware evolves. 
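The workaround the reporter describes (doubling the limit) amounts to a core-site.xml override; with the 128MB value proposed here it would look like the following sketch (value in bytes):

```xml
<!-- core-site.xml: raise the maximum RPC message size so that large
     full block reports from dense DataNodes are not rejected. -->
<property>
  <name>ipc.maximum.data.length</name>
  <!-- 128 MB; the previous default was 67108864 (64 MB) -->
  <value>134217728</value>
</property>
```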
[jira] [Commented] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893048#comment-16893048 ] Erik Krogen commented on HADOOP-16459: -- The checkstyle warnings are carry-overs: they match the surrounding style and exist in the initial trunk patch as well, so I don't think it's worth fixing them here. The failing tests are all flaky tests and not related. Ping [~vagarychen], [~elgoiri] to see if anyone can help out with a review. Also [~csun] FYI > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892216#comment-16892216 ] Erik Krogen edited comment on HADOOP-16459 at 7/24/19 10:07 PM: I've put up a branch-2 patch as well. It has two additional modifications from the branch-3.0 patch: * A lambda for {{GenericTestUtils.waitFor()}} is replaced with an anonymous subclass * The {{RpcScheduler}} interface can no longer have default methods, since branch-2 uses Java 7. Unfortunately Java 7 has no way to emulate this behavior, so if users have a custom {{RpcScheduler}} implementation, it will break with this change. Our [compatibility policy|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html] states that it is acceptable for us to make this breaking change in a minor version release since this interface is marked as {{LimitedPrivate}} / {{Evolving}}. Edit: v000 patch for branch-2 was old and still had compilation issues. I just put up v001 with the correct version. My mistake. was (Author: xkrogen): I've put up a branch-2 patch as well. It has two additional modifications from the branch-3.0 patch: * A lambda for {{GenericTestUtils.waitFor()}} is replaced with an anonymous subclass * The {{RpcScheduler}} interface can no longer have default methods, since branch-2 uses Java 7. Unfortunately Java 7 has no way to emulate this behavior, so if users have a custom {{RpcScheduler}} implementation, it will break with this change. Our [compatibility policy|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html] states that it is acceptable for us to make this breaking change in a minor version release since this interface is marked as {{LimitedPrivate}} / {{Evolving}}. 
> Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Attachment: HADOOP-16266-branch-2.001.patch > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-2.001.patch, HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892216#comment-16892216 ] Erik Krogen commented on HADOOP-16459: -- I've put up a branch-2 patch as well. It has two additional modifications from the branch-3.0 patch: * A lambda for {{GenericTestUtils.waitFor()}} is replaced with an anonymous subclass * The {{RpcScheduler}} interface can no longer have default methods, since branch-2 uses Java 7. Unfortunately Java 7 has no way to emulate this behavior, so if users have a custom {{RpcScheduler}} implementation, it will break with this change. Our [compatibility policy|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html] states that it is acceptable for us to make this breaking change in a minor version release since this interface is marked as {{LimitedPrivate}} / {{Evolving}}. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-3.0.000.patch, HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
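For reference, the lambda-to-anonymous-class rewrite mentioned in the first bullet looks like the sketch below. This is a minimal, self-contained illustration: the stand-in `waitFor` and the JDK `Supplier` are assumptions, since the real `GenericTestUtils.waitFor` takes a Guava `Supplier<Boolean>` and lives in the Hadoop test utilities.

```java
import java.util.function.Supplier;

public class WaitForJava7Example {

  // Stand-in for GenericTestUtils.waitFor: poll the condition until it
  // passes, or throw once the timeout elapses.
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + waitForMillis;
    while (!check.get()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Timed out waiting for condition");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Java 8+ call site: a lambda is enough.
    waitFor(() -> true, 10, 1000);

    // Java 7 call site: the same condition as an anonymous subclass,
    // which is the rewrite the branch-2 patch applies.
    waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return Boolean.TRUE;
      }
    }, 10, 1000);

    System.out.println("both call sites completed");
  }
}
```

The two call sites are semantically identical; only the syntax differs, which is why this part of the backport carries no behavioral risk.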
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Attachment: HADOOP-16266-branch-2.000.patch > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-2.000.patch, > HADOOP-16266-branch-3.0.000.patch, HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892198#comment-16892198 ] Erik Krogen commented on HADOOP-16459: -- I should note that all of the patches attached include the follow-on commit to fix the issue that was discovered after HADOOP-16266 was committed. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892188#comment-16892188 ] Erik Krogen edited comment on HADOOP-16459 at 7/24/19 9:18 PM: --- Attached branch-3.0 patch. The merge difference was smaller than I expected; no logic changes were necessary. The signatures of some methods that appeared in the vicinity of changes were different, but not in a way that affected this patch. One notable difference was that {{TestConsistentReadsObserver}} doesn't yet exist in branch-3.0, so the modifications to that test were excluded. If HDFS-14573 is finalized before this goes in, those changes should be brought back. was (Author: xkrogen): Attached branch-3.0 patch. The merge difference was smaller than I expected; no logic changes were necessary. The signatures of some methods that appeared in the vicinity of changes were different, but not in a way that affected this patch. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892188#comment-16892188 ] Erik Krogen commented on HADOOP-16459: -- Attached branch-3.0 patch. The merge difference was smaller than I expected; no logic changes were necessary. The signatures of some methods that appeared in the vicinity of changes were different, but not in a way that affected this patch. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Attachment: HADOOP-16266-branch-3.0.000.patch > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-3.0.000.patch, > HADOOP-16266-branch-3.1.000.patch, HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Status: Patch Available (was: Open) Attached branch-3.2 and branch-3.1 patches, both of which had only very minor merge conflicts (imports). {{branch-3.0}} has some non-trivial differences; I am working on a patch now. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Attachment: HADOOP-16266-branch-3.2.000.patch HADOOP-16266-branch-3.1.000.patch > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16266-branch-3.1.000.patch, > HADOOP-16266-branch-3.2.000.patch > > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16459: - Description: We would like to target pulling HADOOP-16266, an important operability enhancement and prerequisite for HDFS-14403, into branch-2. It's only present in trunk now so we also need to backport through the 3.x lines. was:We would like to target pulling HADOOP-16266, an important operability enhancement and prerequisite for HDFS-14403, into branch-2. > Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the > RPC layer" to branch-2 > > > Key: HADOOP-16459 > URL: https://issues.apache.org/jira/browse/HADOOP-16459 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > We would like to target pulling HADOOP-16266, an important operability > enhancement and prerequisite for HDFS-14403, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-16459) Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2
Erik Krogen created HADOOP-16459: Summary: Backport [HADOOP-16266] "Add more fine-grained processing time metrics to the RPC layer" to branch-2 Key: HADOOP-16459 URL: https://issues.apache.org/jira/browse/HADOOP-16459 Project: Hadoop Common Issue Type: Improvement Reporter: Erik Krogen Assignee: Erik Krogen We would like to target pulling HADOOP-16266, an important operability enhancement and prerequisite for HDFS-14403, into branch-2. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891927#comment-16891927 ] Erik Krogen commented on HADOOP-16245: -- Attached v004 patch to address checkstyle/whitespace issues. > Enabling SSL within LdapGroupsMapping can break system SSL configs > -- > > Key: HADOOP-16245 > URL: https://issues.apache.org/jira/browse/HADOOP-16245 > Project: Hadoop Common > Issue Type: Bug > Components: common, security >Affects Versions: 2.9.1, 2.8.4, 2.7.6, 3.1.1, 3.0.3 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HADOOP-16245.000.patch, HADOOP-16245.001.patch, > HADOOP-16245.002.patch, HADOOP-16245.003.patch, HADOOP-16245.004.patch > > > When debugging an issue where one of our server components was unable to > communicate with other components via SSL, we realized that LdapGroupsMapping > sets its SSL configurations globally, rather than scoping them to the HTTP > clients it creates. 
> {code:title=LdapGroupsMapping}
> DirContext getDirContext() throws NamingException {
>   if (ctx == null) {
>     // Set up the initial environment for LDAP connectivity
>     Hashtable env = new Hashtable();
>     env.put(Context.INITIAL_CONTEXT_FACTORY,
>         com.sun.jndi.ldap.LdapCtxFactory.class.getName());
>     env.put(Context.PROVIDER_URL, ldapUrl);
>     env.put(Context.SECURITY_AUTHENTICATION, "simple");
>     // Set up SSL security, if necessary
>     if (useSsl) {
>       env.put(Context.SECURITY_PROTOCOL, "ssl");
>       if (!keystore.isEmpty()) {
>         System.setProperty("javax.net.ssl.keyStore", keystore);
>       }
>       if (!keystorePass.isEmpty()) {
>         System.setProperty("javax.net.ssl.keyStorePassword", keystorePass);
>       }
>       if (!truststore.isEmpty()) {
>         System.setProperty("javax.net.ssl.trustStore", truststore);
>       }
>       if (!truststorePass.isEmpty()) {
>         System.setProperty("javax.net.ssl.trustStorePassword", truststorePass);
>       }
>     }
>     env.put(Context.SECURITY_PRINCIPAL, bindUser);
>     env.put(Context.SECURITY_CREDENTIALS, bindPassword);
>     env.put("com.sun.jndi.ldap.connect.timeout", conf.get(CONNECTION_TIMEOUT,
>         String.valueOf(CONNECTION_TIMEOUT_DEFAULT)));
>     env.put("com.sun.jndi.ldap.read.timeout", conf.get(READ_TIMEOUT,
>         String.valueOf(READ_TIMEOUT_DEFAULT)));
>     ctx = new InitialDirContext(env);
>   }
> {code}
> Notice the {{System.setProperty()}} calls, which will change settings JVM-wide. This causes issues for other SSL clients, which may rely on the default JVM truststore being used. This behavior was initially introduced by HADOOP-8121, and extended to include the truststore configurations in HADOOP-12862.
> The correct approach is to use a mechanism which is scoped to the LDAP requests only. The right approach appears to be to use the {{java.naming.ldap.factory.socket}} parameter to set the socket factory to a custom SSL socket factory which correctly sets the key and trust store parameters. See an example [here|https://stackoverflow.com/a/4615497/4979203].
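The fix direction described above — a socket factory scoped to the LDAP requests via {{java.naming.ldap.factory.socket}} — can be sketched roughly as follows. This is an illustration, not the actual HADOOP-16245 patch: the class name {{LdapSslSocketFactory}} and the use of the JVM-default {{SSLContext}} are assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;
import javax.net.ssl.SSLContext;

// Sketch: instead of mutating JVM-wide javax.net.ssl.* system properties,
// wrap an SSLContext in a SocketFactory and hand its class name to JNDI via
// the "java.naming.ldap.factory.socket" environment property. JNDI loads the
// class by name and invokes the static getDefault() method reflectively, so
// that method (plus the createSocket overloads) is the contract the naming
// services expect. A real factory must also be a public class so JNDI can
// instantiate it; it is non-public here only for the self-contained sketch.
class LdapSslSocketFactory extends SocketFactory {

  private final SocketFactory delegate;

  LdapSslSocketFactory() throws Exception {
    // The real fix would initialize this SSLContext from the LDAP-specific
    // keystore/truststore configuration; the JVM default is used here so
    // the sketch runs without external key material.
    SSLContext ctx = SSLContext.getDefault();
    this.delegate = ctx.getSocketFactory();
  }

  // JNDI's LDAP provider looks up exactly this static method.
  public static SocketFactory getDefault() {
    try {
      return new LdapSslSocketFactory();
    } catch (Exception e) {
      throw new IllegalStateException("Unable to build LDAP socket factory", e);
    }
  }

  @Override
  public Socket createSocket(String host, int port) throws IOException {
    return delegate.createSocket(host, port);
  }

  @Override
  public Socket createSocket(String host, int port, InetAddress localHost,
      int localPort) throws IOException {
    return delegate.createSocket(host, port, localHost, localPort);
  }

  @Override
  public Socket createSocket(InetAddress host, int port) throws IOException {
    return delegate.createSocket(host, port);
  }

  @Override
  public Socket createSocket(InetAddress address, int port,
      InetAddress localAddress, int localPort) throws IOException {
    return delegate.createSocket(address, port, localAddress, localPort);
  }
}
```

With such a factory, the {{System.setProperty()}} calls in {{getDirContext()}} would be replaced by something like {{env.put("java.naming.ldap.factory.socket", LdapSslSocketFactory.class.getName())}}, keeping the key/trust store selection local to the LDAP connections.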
[jira] [Updated] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16245: - Attachment: HADOOP-16245.004.patch
[jira] [Commented] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891402#comment-16891402 ] Erik Krogen commented on HADOOP-16245: -- Thanks [~jojochuang], I agree with your assessment. Spent a little more time testing this today, and realize that while the tests we ran previously verified that the LDAP SSL configs no longer broke other system-wide SSL configs, LDAP SSL also wasn't working properly with the v002 patch; the class name being passed via reflection was wrong, and a few APIs on {{SocketFactory}} were not implemented as expected by the Java naming services. I've fixed these issues, and _actually_ verified that this solves the issue without breaking LDAP SSL, in the v003 patch.
[jira] [Comment Edited] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891402#comment-16891402 ] Erik Krogen edited comment on HADOOP-16245 at 7/23/19 9:43 PM: --- Thanks [~jojochuang], I agree with your assessment. Spent a little more time testing this today, and realized that while the tests we ran previously verified that the LDAP SSL configs no longer broke other system-wide SSL configs, LDAP SSL also wasn't working properly with the v002 patch. The class name being passed via reflection was wrong, and a few APIs on {{SocketFactory}} were not implemented as expected by the Java naming services. I've fixed these issues, and _actually_ verified that this solves the issue without breaking LDAP SSL, in the v003 patch. [~vagarychen], can you take another look? was (Author: xkrogen): Thanks [~jojochuang], I agree with your assessment. Spent a little more time testing this today, and realize that while the tests we ran previously verified that the LDAP SSL configs no longer broke other system-wide SSL configs, LDAP SSL also wasn't working properly with the v002 patch; the class name being passed via reflection was wrong, and a few APIs on {{SocketFactory}} were not implemented as expected by the Java naming services. I've fixed these issues, and _actually_ verified that this solves the issue without breaking LDAP SSL, in the v003 patch. [~vagarychen], can you take another look? 
[jira] [Comment Edited] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891402#comment-16891402 ] Erik Krogen edited comment on HADOOP-16245 at 7/23/19 9:43 PM: --- Thanks [~jojochuang], I agree with your assessment. Spent a little more time testing this today, and realize that while the tests we ran previously verified that the LDAP SSL configs no longer broke other system-wide SSL configs, LDAP SSL also wasn't working properly with the v002 patch; the class name being passed via reflection was wrong, and a few APIs on {{SocketFactory}} were not implemented as expected by the Java naming services. I've fixed these issues, and _actually_ verified that this solves the issue without breaking LDAP SSL, in the v003 patch. [~vagarychen], can you take another look? was (Author: xkrogen): Thanks [~jojochuang], I agree with your assessment. Spent a little more time testing this today, and realize that while the tests we ran previously verified that the LDAP SSL configs no longer broke other system-wide SSL configs, LDAP SSL also wasn't working properly with the v002 patch; the class name being passed via reflection was wrong, and a few APIs on {{SocketFactory}} were not implemented as expected by the Java naming services. I've fixed these issues, and _actually_ verified that this solves the issue without breaking LDAP SSL, in the v003 patch. 
[jira] [Updated] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16245: - Attachment: HADOOP-16245.003.patch
[jira] [Updated] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16245: - Attachment: (was: HADOOP-16245.003.patch)
[jira] [Updated] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16245: - Attachment: HADOOP-16245.003.patch
[jira] [Commented] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890471#comment-16890471 ] Erik Krogen commented on HADOOP-16245: -- Thanks [~vagarychen]! I expanded the description a bit more. Let me know what you think. I was unable to reproduce the TestIPC failure locally; I believe it is unrelated.
[jira] [Updated] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16245: - Attachment: HADOOP-16245.002.patch > Enabling SSL within LdapGroupsMapping can break system SSL configs > -- > > Key: HADOOP-16245 > URL: https://issues.apache.org/jira/browse/HADOOP-16245 > Project: Hadoop Common > Issue Type: Bug > Components: common, security > Affects Versions: 2.9.1, 2.8.4, 2.7.6, 3.1.1, 3.0.3 > Reporter: Erik Krogen > Assignee: Erik Krogen > Priority: Major > Attachments: HADOOP-16245.000.patch, HADOOP-16245.001.patch, HADOOP-16245.002.patch > > > When debugging an issue where one of our server components was unable to communicate with other components via SSL, we realized that LdapGroupsMapping sets its SSL configurations globally, rather than scoping them to the HTTP clients it creates.
[jira] [Commented] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890292#comment-16890292 ] Erik Krogen commented on HADOOP-16245: -- I've now tested this on one of our live clusters and confirmed that I was able to configure {{LdapGroupsMapping}} without negatively impacting other SSL connections, fixing the issue discussed here. I rebased the patch and cleaned up the documentation for v001. I think it should be ready for commit now. I can't think of a good way to test this in a unit test, so I haven't added one for now.
[jira] [Updated] (HADOOP-16245) Enabling SSL within LdapGroupsMapping can break system SSL configs
[ https://issues.apache.org/jira/browse/HADOOP-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16245: - Attachment: HADOOP-16245.001.patch
[jira] [Updated] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16418: - Fix Version/s: 3.3.0 > Fix checkstyle and findbugs warnings in hadoop-dynamometer > -- > > Key: HADOOP-16418 > URL: https://issues.apache.org/jira/browse/HADOOP-16418 > Project: Hadoop Common > Issue Type: Bug > Components: tools > Reporter: Masatake Iwasaki > Assignee: Erik Krogen > Priority: Minor > Fix For: 3.3.0 > > Attachments: HADOOP-16418.000.patch, HADOOP-16418.001.patch, HADOOP-16418.002.patch, HADOOP-16418.003.patch, HADOOP-16418.004.patch > >
[jira] [Updated] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16418: - Resolution: Fixed Status: Resolved (was: Patch Available) Just committed this to trunk. Thanks for the help [~iwasakims]! I learned some new things about checkstyle and findbugs thanks to you :)
[jira] [Commented] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882448#comment-16882448 ] Erik Krogen commented on HADOOP-16418: -- Fixed those last two checkstyle warnings in v004.
[jira] [Updated] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16418: - Attachment: HADOOP-16418.004.patch
[jira] [Commented] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882330#comment-16882330 ] Erik Krogen commented on HADOOP-16418: -- I added the suppression for the JavadocStyle check before the comment as you suggested. Nice find. I fixed the {{BlockInfo}} FindBugs warning and also cleaned up that class a bit; there was quite a bit of dead code hanging around. I pulled out some helper methods to address the long-method warning. For the {{AMOptions}} constructor, I don't see an issue with the parameter count -- the constructor is only used internally to create the object -- so I added a suppression and marked the constructor private.
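As a rough illustration of the pattern described in the comment above (the class and field names here are invented, not the actual Dynamometer {{AMOptions}}): a wide constructor is made private, annotated to suppress the checkstyle ParameterNumber check, and reached only through a public factory method. Note that a {{"checkstyle:..."}} value in {{@SuppressWarnings}} only takes effect when checkstyle's SuppressWarningsHolder module and SuppressWarningsFilter are enabled in the build.

```java
/**
 * Hypothetical sketch, not the real AMOptions class: the many-argument
 * constructor is private and suppressed, so callers go through a factory.
 */
public final class AmOptionsSketch {
  private final int masterMemoryMb;
  private final int masterVcores;
  private final String jarPath;
  private final boolean launchNameNode;

  // Suppression is honored only if checkstyle's SuppressWarningsHolder
  // and SuppressWarningsFilter are configured for the project.
  @SuppressWarnings("checkstyle:parameternumber")
  private AmOptionsSketch(int masterMemoryMb, int masterVcores,
      String jarPath, boolean launchNameNode) {
    this.masterMemoryMb = masterMemoryMb;
    this.masterVcores = masterVcores;
    this.jarPath = jarPath;
    this.launchNameNode = launchNameNode;
  }

  /** Public entry point; callers never see the wide constructor. */
  public static AmOptionsSketch withDefaults() {
    return new AmOptionsSketch(2048, 1, "dynamometer.jar", true);
  }

  public int getMasterMemoryMb() { return masterMemoryMb; }
  public int getMasterVcores() { return masterVcores; }
}
```

Keeping the wide constructor private means the checkstyle warning never applies to public API surface, which is the rationale given in the comment above.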
[jira] [Updated] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16418: - Attachment: HADOOP-16418.003.patch
[jira] [Updated] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16418: - Attachment: (was: HADOOP-16418.003.patch)
[jira] [Updated] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16418: - Attachment: HADOOP-16418.003.patch
[jira] [Commented] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881593#comment-16881593 ] Erik Krogen commented on HADOOP-16418: -- Seems like the unit test failure is legitimate, though separate from this JIRA; I've created HDFS-14640 to track it.
[jira] [Updated] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HADOOP-16418: - Attachment: HADOOP-16418.002.patch
[jira] [Comment Edited] (HADOOP-16418) Fix checkstyle and findbugs warnings in hadoop-dynamometer
[ https://issues.apache.org/jira/browse/HADOOP-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881501#comment-16881501 ] Erik Krogen edited comment on HADOOP-16418 at 7/9/19 7:52 PM: -- I successfully got rid of the findbugs warnings and one of the checkstyle warnings. The other 5 checkstyle warnings outstanding all originate from a single Javadoc comment that has some XML tags in it. They're wrapped inside {{}} and {{@code}} but checkstyle complains regardless. I tried adding a {{@SuppressWarnings("checkstyle:javadocstyle")}} but it didn't fix the issue. Unless anyone has a better idea of how to fix this, I think we can live with those few checkstyle warnings remaining. Attached v002 patch, which removes the added SuppressWarnings annotation that didn't work. was (Author: xkrogen): I successfully got rid of the findbugs warnings and one of the checkstyle warnings. The other 5 checkstyle warnings outstanding all originate from a single Javadoc comment that has some XML tags in it. They're wrapped inside {{}} and {{@code}} but checkstyle complains regardless. I tried adding a {{@SuppressWarnings("checkstyle:javadocstyle")}} but it didn't fix the issue. Unless anyone has a better idea of how to fix this, I think we can live with those few checkstyle warnings remaining.