[jira] [Created] (HDFS-16940) Provide HTTP API health endpoints to simplify monitoring

2023-03-03 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16940:


 Summary: Provide HTTP API health endpoints to simplify monitoring
 Key: HDFS-16940
 URL: https://issues.apache.org/jira/browse/HDFS-16940
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.4.0, 3.3.9
Reporter: Steve Vaughan
Assignee: Steve Vaughan


Provide HTTP endpoints that provide 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16932) Mockito causing ClassCastException

2023-02-22 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16932:


 Summary: Mockito causing ClassCastException
 Key: HDFS-16932
 URL: https://issues.apache.org/jira/browse/HDFS-16932
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.4.0
 Environment: Running in the Hadoop development environment in docker, 
running mvn.
Reporter: Steve Vaughan


Running tests in TestBalancerRPCDelay fails because of ClassCastExceptions 
introduced by Mockito.  As an example, in this stack trace note that the 
RedundancyMonitor is calling "isRunning" but incorrectly ends up being routed 
to getBlocks (which returns BlocksWithLocations) via TestBalancer and a Mockito 
Spy.  This ultimately is reported as a failure during the shutdown process.



{{Exception in thread "RedundancyMonitor" java.lang.ClassCastException: java.lang.Boolean cannot be cast to org.apache.hadoop.hdfs.server.protocol.BlocksWithLocations}}
{{        at org.apache.hadoop.hdfs.server.balancer.TestBalancer$2.answer(TestBalancer.java:1865)}}
{{        at org.apache.hadoop.hdfs.server.balancer.TestBalancer$2.answer(TestBalancer.java:1858)}}
{{        at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39)}}
{{        at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96)}}
{{        at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)}}
{{        at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35)}}
{{        at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61)}}
{{        at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49)}}
{{        at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108)}}
{{        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$MockitoMock$1070381809.isRunning(Unknown Source)}}
{{        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:5155)}}
{{        at java.lang.Thread.run(Thread.java:750)}}






[jira] [Created] (HDFS-16905) Provide default hadoop.log.dir for tests

2023-02-02 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16905:


 Summary: Provide default hadoop.log.dir for tests
 Key: HDFS-16905
 URL: https://issues.apache.org/jira/browse/HDFS-16905
 Project: Hadoop HDFS
  Issue Type: Test
  Components: hdfs-client
Affects Versions: 3.4.0, 3.3.5, 3.3.9
 Environment: Tested using the Hadoop development environment Docker 
image and an IDE on Mac
Reporter: Steve Vaughan
 Fix For: 3.4.0, 3.3.9


Provide a default directory configuration for test logging






[jira] [Created] (HDFS-16904) Close webhdfs during the teardown

2023-02-02 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16904:


 Summary: Close webhdfs during the teardown
 Key: HDFS-16904
 URL: https://issues.apache.org/jira/browse/HDFS-16904
 Project: Hadoop HDFS
  Issue Type: Test
  Components: hdfs
Affects Versions: 3.4.0, 3.3.5, 3.3.9
 Environment: Tested using the Hadoop development environment Docker 
image.
Reporter: Steve Vaughan
 Fix For: 3.4.0, 3.3.9


The teardown for the tests shuts down the cluster, but leaves HDFS open.






[jira] [Created] (HDFS-16768) KMS should have it's own Kerberos principal

2022-09-11 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16768:


 Summary: KMS should have it's own Kerberos principal
 Key: HDFS-16768
 URL: https://issues.apache.org/jira/browse/HDFS-16768
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: kms
Affects Versions: 3.4.0, 3.3.9
 Environment: Demonstrated using the trunk code base on UBI 8 under 
Java 11.
Reporter: Steve Vaughan
Assignee: Steve Vaughan


Starting the KMS service without first running `kinit` fails when using HDFS to 
store the keys, throwing:
{noformat}
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
{noformat}
with the following underlying cause:
 
{noformat}
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
 at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:179)
 at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:392)
{noformat}
In addition, it would be valuable to have the automatic keytab-based refresh 
that UserGroupInformation provides.

I'm proposing 2 new configuration settings that define the principal and 
keytab to use for KMS. If they are provided, the login should be initialized 
as part of the server startup using the UserGroupInformation methods that 
support reloading.
 






[jira] [Resolved] (HDFS-16755) TestQJMWithFaults.testUnresolvableHostName() can fail due to unexpected host resolution

2022-09-01 Thread Steve Vaughan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Vaughan resolved HDFS-16755.
--
Resolution: Fixed

> TestQJMWithFaults.testUnresolvableHostName() can fail due to unexpected host 
> resolution
> ---
>
> Key: HDFS-16755
> URL: https://issues.apache.org/jira/browse/HDFS-16755
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running using both Maven Surefire and an IDE results in 
> a test failure.  Switching the name to "bogus.invalid" results in the 
> expected behavior, which depends on an UnknownHostException.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Tests that want to use an unresolvable address may actually resolve in some 
> environments.  Replacing host names like "bogus" with an IETF RFC 2606 domain 
> name avoids the issue.
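
The idea in the description above can be sketched roughly as follows. This is only an illustration, not the test code from the patch: the class and method names are hypothetical, and the only facts taken from the thread are the host names "bogus" and "bogus.invalid" and the reliance on an UnknownHostException. RFC 2606 reserves the ".invalid" top-level domain for names that are meant never to resolve.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Hadoop.
final class UnresolvableHostSketch {
  // A name under the reserved ".invalid" TLD, as mentioned in the issue.
  static final String UNRESOLVABLE_HOST = "bogus.invalid";

  // True when the name sits under the reserved ".invalid" TLD.
  static boolean usesReservedInvalidTld(String host) {
    return host.toLowerCase(Locale.ROOT).endsWith(".invalid");
  }

  // Attempts a lookup; the result for a bare name such as "bogus"
  // depends on the local resolver and search domains.
  static boolean resolves(String host) {
    try {
      InetAddress.getByName(host);
      return true;
    } catch (UnknownHostException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("bogus under .invalid: " + usesReservedInvalidTld("bogus"));
    System.out.println(UNRESOLVABLE_HOST + " under .invalid: "
        + usesReservedInvalidTld(UNRESOLVABLE_HOST));
    System.out.println(UNRESOLVABLE_HOST + " resolves: " + resolves(UNRESOLVABLE_HOST));
  }
}
```

A bare name like "bogus" may be completed by a DNS search domain or a wildcard record, which is why its lookup result varies between environments.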






[jira] [Reopened] (HDFS-16755) TestQJMWithFaults.testUnresolvableHostName() can fail due to unexpected host resolution

2022-09-01 Thread Steve Vaughan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Vaughan reopened HDFS-16755:
--

Backporting to branch-3.3 to avoid a test failure

> TestQJMWithFaults.testUnresolvableHostName() can fail due to unexpected host 
> resolution
> ---
>
> Key: HDFS-16755
> URL: https://issues.apache.org/jira/browse/HDFS-16755
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running using both Maven Surefire and an IDE results in 
> a test failure.  Switching the name to "bogus.invalid" results in the 
> expected behavior, which depends on an UnknownHostException.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Tests that want to use an unresolvable address may actually resolve in some 
> environments.  Replacing host names like "bogus" with an IETF RFC 2606 domain 
> name avoids the issue.






[jira] [Created] (HDFS-16753) WebHDFSHandler should reject non-compliant requests

2022-08-30 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16753:


 Summary: WebHDFSHandler should reject non-compliant requests
 Key: HDFS-16753
 URL: https://issues.apache.org/jira/browse/HDFS-16753
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.4.0, 3.3.9
 Environment: Tested using both Maven Surefire and an IDE to 
demonstrate that the fix correctly rejects the invalid request with a 400 
status code.
Reporter: Steve Vaughan
Assignee: Steve Vaughan


When the nnId is not provided to the WebHDFSClient, the request uses null to 
generate a URI using a host name of "null" to construct a DFSClient instance.  
In environments where the host name "null" doesn't resolve, the test passes due 
to the unresolvable name.  If the host name "null" does resolve, then this 
results in repeated attempts through the retry mechanism, eventually causing a 
timeout and a failed test result.

This change makes the parameter a precondition for constructing the DFSClient. 
When the parameter is missing, an exception is thrown, the request is 
rejected, and the expected 400 status code is returned.
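
A minimal sketch of the failure mode and the precondition, under stated assumptions: the class, method names, and the "hdfs://" URI shape are hypothetical stand-ins, not the WebHDFS handler code. It shows how concatenating a null id yields the host name "null", and how a precondition check can be mapped to a 400 status instead.

```java
import java.net.URI;

// Hypothetical stand-in for the request handling described above.
final class NnIdPreconditionSketch {
  static final int SC_OK = 200;
  static final int SC_BAD_REQUEST = 400;

  // Without a check, string concatenation turns a null id into the host "null".
  static String naiveHost(String nnId) {
    return URI.create("hdfs://" + nnId).getHost();
  }

  // With the precondition, a missing id is rejected before any client is built.
  static URI checkedUri(String nnId) {
    if (nnId == null || nnId.isEmpty()) {
      throw new IllegalArgumentException("Missing required namenode id");
    }
    return URI.create("hdfs://" + nnId);
  }

  // Maps the precondition failure to an HTTP status code.
  static int handle(String nnId) {
    try {
      checkedUri(nnId);
      return SC_OK;
    } catch (IllegalArgumentException e) {
      return SC_BAD_REQUEST;
    }
  }

  public static void main(String[] args) {
    System.out.println("naive host for null id: " + naiveHost(null));
    System.out.println("status for null id: " + handle(null));
  }
}
```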






[jira] [Created] (HDFS-16740) Mini cluster test flakiness

2022-08-23 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16740:


 Summary: Mini cluster test flakiness
 Key: HDFS-16740
 URL: https://issues.apache.org/jira/browse/HDFS-16740
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, test
Affects Versions: 3.4.0, 3.3.9
Reporter: Steve Vaughan
Assignee: Steve Vaughan


Mini clusters used during HDFS unit tests are reporting test failures that do 
not appear to be directly related to submitted changes.  The failures result 
either from interactions between tests run in parallel, or from tests that 
share common disk space.  In all cases, the tests can be run individually and 
serially without any errors.  Addressing this issue will simplify future 
submissions by eliminating the confusion introduced by these unrelated test 
failures.

We can apply lessons learned from TestRollingUpgrade, which was recently 
patched to unblock a submission.  The fixes involved changing the HDFS 
configuration to use temporary disk space for each individual test, and using 
try-with-resources to ensure that clusters were shut down cleanly.
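
The two fixes can be illustrated with a small self-contained sketch. FakeCluster is a hypothetical stand-in for a mini cluster, not the MiniDFSCluster API; the point is only the pattern of one temporary directory per test plus try-with-resources for shutdown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; FakeCluster stands in for a real mini cluster.
final class IsolatedClusterSketch {
  static final class FakeCluster implements AutoCloseable {
    final Path baseDir;       // storage private to one test
    boolean running = true;

    FakeCluster(Path baseDir) {
      this.baseDir = baseDir;
    }

    @Override
    public void close() {
      running = false;
      try {
        Files.deleteIfExists(baseDir); // directory is empty in this sketch
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  // Each call gets a fresh temporary directory, so tests cannot collide on disk.
  static FakeCluster start(String testName) {
    try {
      return new FakeCluster(Files.createTempDirectory(testName));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // try-with-resources closes the cluster even if the test body throws.
  static boolean runAndCheckShutdown(String testName) {
    FakeCluster seen = null;
    try (FakeCluster cluster = start(testName)) {
      seen = cluster;
      // test body would go here
    }
    return !seen.running;
  }

  public static void main(String[] args) {
    System.out.println("shut down cleanly: " + runAndCheckShutdown("demo"));
  }
}
```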






[jira] [Resolved] (HDFS-16687) RouterFsckServlet replicates code from DfsServlet base class

2022-08-23 Thread Steve Vaughan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Vaughan resolved HDFS-16687.
--
Fix Version/s: 3.3.9
   Resolution: Fixed

> RouterFsckServlet replicates code from DfsServlet base class
> 
>
> Key: HDFS-16687
> URL: https://issues.apache.org/jira/browse/HDFS-16687
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> RouterFsckServlet replicates the method "getUGI(HttpServletRequest request, 
> Configuration conf)" from DfsServlet instead of just extending DfsServlet.






[jira] [Reopened] (HDFS-16687) RouterFsckServlet replicates code from DfsServlet base class

2022-08-22 Thread Steve Vaughan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Vaughan reopened HDFS-16687:
--

Backporting to 3.3

> RouterFsckServlet replicates code from DfsServlet base class
> 
>
> Key: HDFS-16687
> URL: https://issues.apache.org/jira/browse/HDFS-16687
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> RouterFsckServlet replicates the method "getUGI(HttpServletRequest request, 
> Configuration conf)" from DfsServlet instead of just extending DfsServlet.






[jira] [Reopened] (HDFS-4043) Namenode Kerberos Login does not use proper hostname for host qualified hdfs principal name.

2022-08-22 Thread Steve Vaughan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Vaughan reopened HDFS-4043:
-

Adding a backport to branch-3.3

> Namenode Kerberos Login does not use proper hostname for host qualified hdfs 
> principal name.
> 
>
> Key: HDFS-4043
> URL: https://issues.apache.org/jira/browse/HDFS-4043
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.0.3-alpha, 
> 3.4.0, 3.3.9
> Environment: CDH4U1 on Ubuntu 12.04
>Reporter: Ahad Rana
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>   Original Estimate: 24h
>  Time Spent: 50m
>  Remaining Estimate: 23h 10m
>
> The Namenode uses the loginAsNameNodeUser method in NameNode.java to login 
> using the hdfs principal. This method in turn invokes SecurityUtil.login with 
> a hostname (last parameter) obtained via a call to InetAddress.getHostName. 
> This call does not always return the fully qualified host name, and thus 
> causes the namenode login to fail due to Kerberos's inability to find a 
> matching hdfs principal in the hdfs.keytab file. Instead it should use 
> InetAddress.getCanonicalHostName. This is consistent with what is used 
> internally by SecurityUtil.java to log in other services, such as the 
> DataNode. 






[jira] [Created] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error

2022-07-29 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16702:


 Summary: MiniDFSCluster should report cause of exception in 
assertion error
 Key: HDFS-16702
 URL: https://issues.apache.org/jira/browse/HDFS-16702
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
 Environment: Tests running in the Hadoop dev environment image.
Reporter: Steve Vaughan


When the MiniDFSCluster detects that an exception caused an exit, it should 
include that exception as the cause for the AssertionError that it throws.  The 
current AssertionError simply reports the message "Test resulted in an unexpected 
exit" and provides a stack trace to the location of the check for an exit 
exception.
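
A minimal sketch of the proposed behavior, assuming a hypothetical check method rather than the actual MiniDFSCluster code. It relies on the standard `AssertionError(String, Throwable)` constructor so that the original exception shows up as the cause in the stack trace.

```java
// Hypothetical sketch; the method name and parameter are illustrative.
final class ExitAssertionSketch {
  // Throws an AssertionError that carries the exit exception as its cause.
  static void assertNoExit(Throwable exitException) {
    if (exitException != null) {
      throw new AssertionError("Test resulted in an unexpected exit", exitException);
    }
  }

  public static void main(String[] args) {
    try {
      assertNoExit(new IllegalStateException("simulated exit"));
    } catch (AssertionError e) {
      System.out.println(e.getMessage() + ", caused by: " + e.getCause());
    }
  }
}
```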






[jira] [Created] (HDFS-16691) Use quorum instead of requiring full JN set for NN format

2022-07-25 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16691:


 Summary: Use quorum instead of requiring full JN set for NN format
 Key: HDFS-16691
 URL: https://issues.apache.org/jira/browse/HDFS-16691
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
 Environment: Demonstrated in a Kubernetes environment running Java 11. 
 Using an HA configuration:
 # Start new cluster, but short 1 JN (minimum quorum, and the missing JN won’t 
resolve). VERIFY:
- NN formats the 2 existing JN and stabilizes
- Messages show sync between JN-0 and JN-1, and NN -> JN
 # Scale JN stateful set to add missing JN.  NOTE: Requires HDFS-16690
Reporter: Steve Vaughan


Currently a format request fails if any of the JournalNodes is unresolvable.  
For dynamic cluster environments where a JournalNode may not be available 
during the initial formatting step but JournalNodes can self-heal, it makes 
sense to allow the format to succeed when a quorum of JournalNodes is available.
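
The quorum condition can be sketched as a simple majority check. This is an illustration only; the class and method names are hypothetical and do not come from the Hadoop code base.

```java
// Hypothetical sketch of a majority-based quorum check.
final class QuorumSketch {
  // Smallest number of nodes that forms a strict majority of the set.
  static int majority(int totalNodes) {
    return totalNodes / 2 + 1;
  }

  // True when enough JournalNodes are reachable to form a quorum.
  static boolean hasQuorum(int reachableNodes, int totalNodes) {
    return reachableNodes >= majority(totalNodes);
  }

  public static void main(String[] args) {
    System.out.println("2 of 3 reachable: " + hasQuorum(2, 3));
    System.out.println("1 of 3 reachable: " + hasQuorum(1, 3));
  }
}
```

With 3 JournalNodes configured and 1 unresolvable, the 2 remaining nodes still satisfy the check, which matches the scenario in the Environment field.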






[jira] [Created] (HDFS-16690) Automatically format new unformatted JournalNodes using JournalNodeSyncer

2022-07-25 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16690:


 Summary: Automatically format new unformatted JournalNodes using 
JournalNodeSyncer 
 Key: HDFS-16690
 URL: https://issues.apache.org/jira/browse/HDFS-16690
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node
 Environment: Demonstrated in a Kubernetes environment running Java 11.
 # Start new cluster, but short 1 JN (minimum quorum, and the missing JN won’t 
resolve). VERIFY:

 - NN formats the 2 existing JN and stabilizes.  NOTE: Formatting using just a 
quorum will be a separate submission
 - Messages show sync between JN-0 and JN-1, and NN -> JN.

 # Scale JN stateful set to add missing JN. VERIFY:

 - New JN starts
 - All other JN and all NN report IP address change (IP Address resolution).  
NOTE: requires HADOOP-18365 and HDFS-16688
 - Messages show sync between all JN, and NN -> JN
 - New JN is formatted at least once (possibly by multiple other JN)
 - New JN storage directory is formatted only once
 - New JN joins cluster (lastWriterEpoch is non-zero)
Reporter: Steve Vaughan


If an unformatted JournalNode is added to an existing JournalNode set, 
instances of the JournalNodeSyncer are unable to sync to the new node.  When a 
sync receives a JournalNotFormattedException, we can initiate a format 
operation, and then retry the synchronization.
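
The format-then-retry flow can be sketched as below. Everything here is a hypothetical stand-in: the RemoteJournal interface and FakeJournal class are invented for illustration, and the exception is a local class that only borrows the name used in the description, not the Hadoop type or the InterQJournalProtocol API.

```java
// Hypothetical sketch of "sync, format on failure, then retry".
final class FormatThenSyncSketch {
  // Local stand-in named after the exception mentioned in the issue.
  static final class JournalNotFormattedException extends Exception {
    JournalNotFormattedException(String message) {
      super(message);
    }
  }

  // Hypothetical view of a peer JournalNode.
  interface RemoteJournal {
    void sync() throws JournalNotFormattedException;
    void format();
  }

  // Tries to sync; on an unformatted peer, formats once and retries once.
  static boolean syncWithFormat(RemoteJournal peer) {
    try {
      peer.sync();
      return true;
    } catch (JournalNotFormattedException first) {
      peer.format();
      try {
        peer.sync();
        return true;
      } catch (JournalNotFormattedException second) {
        return false;
      }
    }
  }

  // In-memory fake used to exercise the flow.
  static final class FakeJournal implements RemoteJournal {
    boolean formatted;
    int formatCalls;
    int successfulSyncs;

    @Override
    public void sync() throws JournalNotFormattedException {
      if (!formatted) {
        throw new JournalNotFormattedException("journal is not formatted");
      }
      successfulSyncs++;
    }

    @Override
    public void format() {
      formatted = true;
      formatCalls++;
    }
  }

  public static void main(String[] args) {
    FakeJournal peer = new FakeJournal();
    System.out.println("synced: " + syncWithFormat(peer)
        + ", format calls: " + peer.formatCalls);
  }
}
```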

Conceptually this means that the JournalNodes and their data can be managed 
independently from the rest of the system, as the JournalNodes will incorporate 
new JournalNode instances.  Once the new JournalNode is formatted, it can 
participate in shared edits from the NameNodes. 

I've been testing an update to the InterQJournalProtocol to add a format call 
like that used by the NameNode.  Current tests include starting an HA cluster 
from scratch, but with 2 JournalNode instances.  Once the cluster is up, I can 
add the 3rd JournalNode (which is unformatted), and the other 2 JournalNodes 
will eventually attempt to sync, which results in a format operation and a 
subsequent sync.






[jira] [Created] (HDFS-16688) Unresolved Hosts during startup are not synced by JournalNodes

2022-07-25 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16688:


 Summary: Unresolved Hosts during startup are not synced by 
JournalNodes
 Key: HDFS-16688
 URL: https://issues.apache.org/jira/browse/HDFS-16688
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node
 Environment: Running in Kubernetes using Java 11, with an HA 
configuration.
Reporter: Steve Vaughan


During the JournalNode startup, it builds the list of servers in the 
JournalNode set, ignoring hostnames that cannot be resolved.  In environments 
with dynamic IP address allocations this means that the JournalNodeSyncer will 
never sync with hosts that aren't resolvable during startup.






[jira] [Created] (HDFS-16687) RouterFsckServlet replicates code from DfsServlet base class

2022-07-25 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16687:


 Summary: RouterFsckServlet replicates code from DfsServlet base 
class
 Key: HDFS-16687
 URL: https://issues.apache.org/jira/browse/HDFS-16687
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: federation
Reporter: Steve Vaughan


RouterFsckServlet replicates the method "getUGI(HttpServletRequest request, 
Configuration conf)" from DfsServlet instead of just extending DfsServlet.






[jira] [Created] (HDFS-16686) GetJournalEditServlet fails to authorize valid Kerberos request

2022-07-25 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16686:


 Summary: GetJournalEditServlet fails to authorize valid Kerberos 
request
 Key: HDFS-16686
 URL: https://issues.apache.org/jira/browse/HDFS-16686
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node
 Environment: Running in Kubernetes using Java 11 in an HA 
configuration.  JournalNodes run on separate pods and have their own Kerberos 
principal "jn/@".
Reporter: Steve Vaughan


GetJournalEditServlet uses request.getRemoteUser() to determine the 
remoteShortName for Kerberos authorization, which fails to match when the 
JournalNode uses its own Kerberos principal (e.g. jn/@).

This can be fixed by using the UserGroupInformation provided by the base 
DfsServlet class using the getUGI(request, conf) call.






[jira] [Created] (HDFS-16685) DataNode registration fails because getHostName returns an IP address

2022-07-25 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16685:


 Summary: DataNode registration fails because getHostName returns 
an IP address
 Key: HDFS-16685
 URL: https://issues.apache.org/jira/browse/HDFS-16685
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
 Environment: Run in Kubernetes using Java 11.  
Reporter: Steve Vaughan


The call to dnAddress.getHostName() can return an IP address encoded as a 
string, which is then rejected because unresolved addresses can result in 
performance impacts due to repetitive DNS lookups later.  We can detect when 
this situation occurs, and perform a DNS reverse name lookup to fix the issue.
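
One way to picture the detection and reverse lookup, as a rough sketch under stated assumptions: the class and method names are hypothetical, the check only covers dotted IPv4 text, and this is not the NameNode registration code. `InetAddress.getByName` parses a valid IP literal without a DNS query, and `getCanonicalHostName` then attempts the reverse lookup, falling back to the textual address if none is available.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

// Hypothetical sketch; IPv4-only detection for illustration.
final class HostNameFixSketch {
  private static final Pattern IPV4_TEXT =
      Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

  // True when the "host name" is really dotted IPv4 text.
  static boolean looksLikeIpv4(String host) {
    return IPV4_TEXT.matcher(host).matches();
  }

  // Leaves real names untouched; tries a reverse lookup for IP text.
  static String preferHostName(String host) {
    if (!looksLikeIpv4(host)) {
      return host;
    }
    try {
      return InetAddress.getByName(host).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return host; // keep the original value if nothing better is known
    }
  }

  public static void main(String[] args) {
    System.out.println(preferHostName("127.0.0.1"));
    System.out.println(preferHostName("datanode-0.example.invalid"));
  }
}
```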

Bouncing a DataNode in a managed environment results in a new IP address 
allocation, and the new instance fails to register with the NameNode.






[jira] [Created] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host

2022-07-25 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16684:


 Summary: Exclude self from JournalNodeSyncer when using a bind host
 Key: HDFS-16684
 URL: https://issues.apache.org/jira/browse/HDFS-16684
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node
 Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
Reporter: Steve Vaughan


The JournalNodeSyncer will include the local instance in syncing when using a 
bind host (e.g. 0.0.0.0).  There is a mechanism that is supposed to exclude the 
local instance, but it doesn't recognize the meta-address as a local address.

Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log 
attempts to sync with itself as part of the normal syncing rotation.  For an HA 
configuration running 3 JournalNodes, the "other" list used by the 
JournalNodeSyncer will include 3 proxies.
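
A small sketch of how the meta-address could be recognized, assuming a hypothetical helper rather than the JournalNodeSyncer's actual exclusion logic. It uses the standard `InetAddress.isAnyLocalAddress()` and `isLoopbackAddress()` checks; a fuller check would also compare against the addresses of the local network interfaces.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch; not the JournalNodeSyncer implementation.
final class BindHostSketch {
  // True for the wildcard bind address (e.g. 0.0.0.0) or a loopback address.
  static boolean isWildcardOrLoopback(String host) {
    try {
      InetAddress address = InetAddress.getByName(host);
      return address.isAnyLocalAddress() || address.isLoopbackAddress();
    } catch (UnknownHostException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("0.0.0.0 treated as self: " + isWildcardOrLoopback("0.0.0.0"));
    System.out.println("192.0.2.1 treated as self: " + isWildcardOrLoopback("192.0.2.1"));
  }
}
```

With a check like this, an HA set of 3 JournalNodes would leave 2 peers in the "other" list instead of 3.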






[jira] [Created] (HDFS-16625) Unit tests aren't checking for PMDK availability

2022-06-06 Thread Steve Vaughan (Jira)
Steve Vaughan created HDFS-16625:


 Summary: Unit tests aren't checking for PMDK availability
 Key: HDFS-16625
 URL: https://issues.apache.org/jira/browse/HDFS-16625
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 3.4.0, 3.3.4
Reporter: Steve Vaughan


There are unit tests that require native PMDK libraries but don't check 
whether the library is available, resulting in unsuccessful tests.  Adding the 
following in the test setup addresses the problem.
{code:java}
assumeTrue("Requires PMDK", NativeIO.POSIX.isPmdkAvailable());
{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
