[jira] [Commented] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2019-11-22 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980612#comment-16980612
 ] 

Eric Yang commented on HDFS-14820:
--

The read buffer size is usually slightly smaller than the MTU size.  An 8k buffer works 
well on a jumbo-frame network, which is a must-have for 10Gb+ network cards.  
Most systems default to an MTU of 1500, so a default value close to 
1400 may be more sensible; it avoids suddenly slowing the default setup down to 
a third of what the network is capable of.
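
For illustration only (this is not one of the attached patches), a minimal self-contained 
sketch of passing an explicit buffer size that stays below a typical 1500-byte MTU instead 
of relying on the 8 KB default; the class name and constant value are assumptions:

{code:java}
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.OutputStream;

public class SmallRequestBuffer {
  // Assumed value for illustration: the readBlock request header is small,
  // so a buffer well under one MTU (1500 bytes) is plenty.
  private static final int READ_REQUEST_BUFFER_SIZE = 512;

  // Wrap the peer's output stream with an explicitly sized buffer instead of
  // relying on BufferedOutputStream's default 8192-byte buffer.
  public static DataOutputStream wrap(OutputStream peerOut) {
    return new DataOutputStream(
        new BufferedOutputStream(peerOut, READ_REQUEST_BUFFER_SIZE));
  }
}
{code}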

>  The default 8KB buffer of 
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14820.001.patch, HDFS-14820.002.patch, 
> HDFS-14820.003.patch
>
>
> This issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
> ExtendedBlock block,
> Token<BlockTokenIdentifier> blockToken,
> long startOffset, long len,
> boolean verifyChecksum,
> String clientName,
> Peer peer, DatanodeID datanodeID,
> PeerCache peerCache,
> CachingStrategy cachingStrategy,
> int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>   peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>   verifyChecksum, cachingStrategy);
> }
> public BufferedOutputStream(OutputStream out) {
> this(out, 8192);
> }
> {code}
> The Sender#readBlock parameters (block, blockToken, clientName, startOffset, len, 
> verifyChecksum, cachingStrategy) do not need such a big buffer, 
> so I think the BufferedOutputStream buffer size should be reduced.






[jira] [Updated] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-28 Thread Eric Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDFS-14730:
-
Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk.  Thank you [~zhangchen].

> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to 
> deprecate it to avoid misuse.






[jira] [Commented] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-28 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961343#comment-16961343
 ] 

Eric Yang commented on HDFS-14730:
--

+1 for patch 002.  Will commit to trunk if no objections.

[~zhangchen] Thank you for the patch.


> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to 
> deprecate it to avoid misuse.






[jira] [Commented] (HDDS-1701) Move dockerbin script to libexec

2019-10-23 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957965#comment-16957965
 ] 

Eric Yang commented on HDDS-1701:
-

[~cxorm] We cannot call it bin/docker because that would be a prohibited trademark use 
under the [Docker Inc trademark 
guidelines|https://www.docker.com/legal/trademark-guidelines].  The scripts are 
used for Docker image startup and for configuration conversion during 
bootstrap.  They are referenced by the Dockerfile.  There is no need to add 
another script that calls the scripts in libexec.

> Move dockerbin script to libexec
> 
>
> Key: HDDS-1701
> URL: https://issues.apache.org/jira/browse/HDDS-1701
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: YiSheng Lien
>Priority: Major
>
> Ozone tarball structure contains a new bin script directory called dockerbin. 
>  These utility scripts can be relocated to OZONE_HOME/libexec because they are 
> internal binaries that are not intended to be executed directly by users or 
> shell scripts.






[jira] [Commented] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent

2019-10-21 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956312#comment-16956312
 ] 

Eric Yang commented on HDDS-1847:
-

[~chris.t...@gmail.com] Hadoop 3.3.0+ has changed back to using 
hadoop.http.authentication.kerberos.keytab for securing the HTTP protocol with 
Kerberos.  Hadoop unified the SPNEGO settings to make sure that all HTTP ports 
are secured by one global setting.  Ozone is departing from Hadoop, so some 
changes may not apply while others are worth considering.  There are 
three usability improvements that might make Ozone Kerberos configuration 
easier to use.
This ticket focuses on three problems in the Ozone Kerberos config names 
(a naming sketch follows the list):

1. Datanode keytab files and principal names are inconsistent.  The SPNEGO keys 
are prefixed with hdds, but Ozone datanodes still use the dfs prefix.  It 
may be useful to separate the Ozone-deployed datanode config from HDFS to 
prevent confusion.
2. The datanode SPNEGO keytab key is suffixed with keytab (resembling the Hadoop 
convention), but other Ozone processes use the keytab.file suffix.
3. Should all SPNEGO keytab keys use the same prefix, as in Hadoop, to prevent 
programming errors?
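
To make the inconsistency concrete, here is a hypothetical sketch of what a uniform 
hdds.datanode prefix could look like; these key names are illustrative only and are 
not existing Ozone configs:

{code:xml}
<!-- Hypothetical, illustrative key names only; not current Ozone configuration. -->
<property>
  <name>hdds.datanode.kerberos.principal</name>
  <value>dn/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hdds.datanode.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/dn.service.keytab</value>
</property>
<property>
  <name>hdds.datanode.http.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hdds.datanode.http.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
{code}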


> Datanode Kerberos principal and keytab config key looks inconsistent
> 
>
> Key: HDDS-1847
> URL: https://issues.apache.org/jira/browse/HDDS-1847
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Eric Yang
>Assignee: Chris Teoh
>Priority: Major
>  Labels: newbie
>
> Ozone Kerberos configuration can be very confusing:
> | config name | Description |
> | hdds.scm.kerberos.principal | SCM service principal |
> | hdds.scm.kerberos.keytab.file | SCM service keytab file |
> | ozone.om.kerberos.principal | Ozone Manager service principal |
> | ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
> | hdds.scm.http.kerberos.principal | SCM service spnego principal |
> | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
> | ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
> | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
> | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
> | hdds.datanode.http.kerberos.principal | Datanode spnego principal |
> | dfs.datanode.kerberos.principal | Datanode service principal |
> | dfs.datanode.keytab.file | Datanode service keytab file |
> The prefix are very different for each of the datanode configuration.  It 
> would be nice to have some consistency for datanode.






[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-10-04 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944755#comment-16944755
 ] 

Eric Yang commented on HDFS-2470:
-

[~weichiu] You might need to backport HDFS-14890, if you intend to apply this 
patch to branch-3.1.

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, 
> HDFS-2470.09.patch, HDFS-2470.branch-3.1.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.






[jira] [Commented] (HDFS-14890) Setting permissions on name directory fails on non posix compliant filesystems

2019-10-04 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944688#comment-16944688
 ] 

Eric Yang commented on HDFS-14890:
--

Thank you [~swagle] for the patch.
Thank you [~elgoiri] [~hirik] for the reviews.

I just committed this to trunk and branch-3.2.
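
For context, a minimal sketch of the general approach, i.e. only applying POSIX 
permissions when the underlying filesystem supports them so formatting can proceed 
on Windows; this is an illustration with assumed names, not the committed patch:

{code:java}
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class DirPermissionUtil {
  // Apply POSIX permissions only when the default filesystem exposes the
  // "posix" attribute view; on non-POSIX filesystems (e.g. NTFS) skip the call
  // instead of letting Files.setPosixFilePermissions throw
  // UnsupportedOperationException.
  public static void setPermissionsIfSupported(Path dir,
      Set<PosixFilePermission> perms) throws IOException {
    if (FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
      Files.setPosixFilePermissions(dir, perms);
    }
  }
}
{code}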

> Setting permissions on name directory fails on non posix compliant filesystems
> --
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting in Windows machine. Found 
> below related exception in logs. 
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  






[jira] [Updated] (HDFS-14890) Setting permissions on name directory fails on non posix compliant filesystems

2019-10-04 Thread Eric Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDFS-14890:
-
   Fix Version/s: 3.2.2
  3.3.0
Hadoop Flags: Reviewed
Release Note: - Fixed namenode/journal startup on Windows.
Target Version/s: 3.3.0, 3.2.2
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> Setting permissions on name directory fails on non posix compliant filesystems
> --
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting in Windows machine. Found 
> below related exception in logs. 
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  






[jira] [Commented] (HDFS-14890) HDFS NameNode and JournalNode are not starting in Windows

2019-10-04 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944679#comment-16944679
 ] 

Eric Yang commented on HDFS-14890:
--

Updated the title to reflect the original issue.  +1 for patch 01, which addresses the 
regression from HDFS-2470.

> HDFS NameNode and JournalNode are not starting in Windows
> -
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting in Windows machine. Found 
> below related exception in logs. 
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  






[jira] [Updated] (HDFS-14890) HDFS NameNode and JournalNode are not starting in Windows

2019-10-04 Thread Eric Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDFS-14890:
-
Summary: HDFS NameNode and JournalNode are not starting in Windows  (was: 
HDFS is not starting in Windows)

> HDFS NameNode and JournalNode are not starting in Windows
> -
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting in Windows machine. Found 
> below related exception in logs. 
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  






[jira] [Commented] (HDFS-14845) Request is a replay (34) error in httpfs

2019-09-23 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936213#comment-16936213
 ] 

Eric Yang commented on HDFS-14845:
--

[~Prabhu Joseph] Thank you for the patch.  Patch 004 looks good to me.  
[~aajisaka] Let us know if this looks good on your end.  Thanks

> Request is a replay (34) error in httpfs
> 
>
> Key: HDFS-14845
> URL: https://issues.apache.org/jira/browse/HDFS-14845
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.3.0
> Environment: Kerberos and ZKDelgationTokenSecretManager enabled in 
> HttpFS
>Reporter: Akira Ajisaka
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: HDFS-14845-001.patch, HDFS-14845-002.patch, 
> HDFS-14845-003.patch, HDFS-14845-004.patch
>
>
> We are facing "Request is a replay (34)" error when accessing to HDFS via 
> httpfs on trunk.
> {noformat}
> % curl -i --negotiate -u : "https://:4443/webhdfs/v1/?op=liststatus"
> HTTP/1.1 401 Authentication required
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 271
> HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34))
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> (snip)
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 413
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /webhdfs/v1/. Reason:
> GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {noformat}






[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-09-23 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936045#comment-16936045
 ] 

Eric Yang commented on HDFS-14461:
--

[~hexiaoqiao] Thank you for the patch.  Patch 005 looks good to me.

+1

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-14461.001.patch, HDFS-14461.002.patch, 
> HDFS-14461.003.patch, HDFS-14461.004.patch, HDFS-14461.005.patch
>
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) 
> at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) 
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 
> Caused by: org.apache.hadoop.security.KerberosAuthException: failure to 
> login: for principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
> 

[jira] [Comment Edited] (HDFS-14845) Request is a replay (34) error in httpfs

2019-09-20 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934589#comment-16934589
 ] 

Eric Yang edited comment on HDFS-14845 at 9/20/19 5:17 PM:
---

[~Prabhu Joseph] Thank you for the patch.  I tested with the two sets of 
configuration below, and both work as long as I define 
hadoop.http.authentication.signature.secret.file.

{code:xml}
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/host1.example@example.com</value>
</property>

<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>hadoop.http.authentication.signature.secret.file</name>
  <value>${httpfs.config.dir}/httpfs-signature.secret</value>
</property>

<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilterInitializer,org.apache.hadoop.security.HttpCrossOriginFilterInitializer</value>
</property>

<property>
  <name>hadoop.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.kerberos.principal</name>
  <value>nn/host1.example@example.com</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hdfs.service.keytab</value>
</property>
{code}

Backward compatible config also works:
{code:xml}
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>httpfs.authentication.signature.secret.file</name>
  <value>${httpfs.config.dir}/httpfs-signature.secret</value>
</property>

<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilterInitializer,org.apache.hadoop.security.HttpCrossOriginFilterInitializer</value>
</property>

<property>
  <name>httpfs.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>httpfs.authentication.kerberos.principal</name>
  <value>HTTP/host-1.example@example.com</value>
</property>

<property>
  <name>httpfs.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.kerberos.principal</name>
  <value>nn/host-1.example@example.com</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hdfs.service.keytab</value>
</property>
{code}

When httpfs.authentication.signature.secret.file is undefined in 
httpfs-site.xml, the HttpFS server fails to start:

{code}
Exception in thread "main" java.io.IOException: Unable to initialize 
WebAppContext
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1198)
at 
org.apache.hadoop.fs.http.server.HttpFSServerWebServer.start(HttpFSServerWebServer.java:154)
at 
org.apache.hadoop.fs.http.server.HttpFSServerWebServer.main(HttpFSServerWebServer.java:187)
Caused by: java.lang.RuntimeException: Undefined property: signature.secret.file
at 
org.apache.hadoop.fs.http.server.HttpFSAuthenticationFilter.getConfiguration(HttpFSAuthenticationFilter.java:95)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:160)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180)
at 
org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139)
at 
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:881)
at 
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:349)
at 
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406)
at 
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368)
at 
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
at 
org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
at 
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:427)
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:394)
  

[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-17 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931942#comment-16931942
 ] 

Eric Yang commented on HDFS-14609:
--

It would be great to have a follow-up for HDFS-14461.  I have reservations about 
giving a +1 because I don't have full visibility into whether the two patches 
would do the right thing together.  Tentatively +1.

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, 
> HDFS-14609.006.patch
>
>
> We worked on router based federation security as part of HDFS-13532. We kept 
> it compatible with the way namenode works. However with HADOOP-16314 and 
> HDFS-16354 in trunk, auth filters seem to have been changed, causing tests to 
> fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.






[jira] [Commented] (HDFS-14845) Request is a replay (34) error in httpfs

2019-09-17 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931795#comment-16931795
 ] 

Eric Yang commented on HDFS-14845:
--

[~Prabhu Joseph] Thank you for patch 002.

{quote}But most of the testcases related to HttpFSServerWebServer (eg: 
TestHttpFSServer) requires more changes as they did not use HttpServer2 and so 
the filter initializers are not called, instead it uses a Test Jetty Server 
with HttpFSServerWebApp which are failing as the filter won't have any configs.

Please let me know if we can handle this in a separate improvement Jira.{quote}

All HttpFS unit tests are passing on my system.  Which test requires a separate 
ticket?

{quote}Have changed the HttpFSAuthenticationFilter$getConfiguration to honor 
the hadoop.http.authentication configs which will be overridden by 
httpfs.authentication configs.{quote}

Patch 2 works with this configuration:

{code:xml}
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/host-1.example@example.com</value>
</property>

<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilterInitializer,org.apache.hadoop.security.HttpCrossOriginFilterInitializer</value>
</property>

<property>
  <name>httpfs.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>hadoop.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>httpfs.authentication.kerberos.principal</name>
  <value>HTTP/host-1.example@example.com</value>
</property>

<property>
  <name>httpfs.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.kerberos.principal</name>
  <value>nn/host-1.example@example.com</value>
</property>

<property>
  <name>httpfs.hadoop.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hdfs.service.keytab</value>
</property>
{code}

It doesn't work when the configuration omits httpfs.hadoop.authentication.type, 
httpfs.authentication.kerberos.keytab and 
httpfs.hadoop.authentication.kerberos.principal.  The HttpFS server doesn't start 
when these configs are missing.  I think some logic to map the configuration is 
missing in patch 002.

> Request is a replay (34) error in httpfs
> 
>
> Key: HDFS-14845
> URL: https://issues.apache.org/jira/browse/HDFS-14845
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.3.0
> Environment: Kerberos and ZKDelgationTokenSecretManager enabled in 
> HttpFS
>Reporter: Akira Ajisaka
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: HDFS-14845-001.patch, HDFS-14845-002.patch
>
>
> We are facing "Request is a replay (34)" error when accessing to HDFS via 
> httpfs on trunk.
> {noformat}
> % curl -i --negotiate -u : "https://:4443/webhdfs/v1/?op=liststatus"
> HTTP/1.1 401 Authentication required
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 271
> HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34))
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> (snip)
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 413
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /webhdfs/v1/. Reason:
> GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {noformat}






[jira] [Commented] (HDFS-14845) Request is a replay (34) error in httpfs

2019-09-14 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929793#comment-16929793
 ] 

Eric Yang commented on HDFS-14845:
--

[~Prabhu Joseph] Would it be possible for HttpFSAuthenticationFilter to be only a 
parameter-passing filter that triggers filter initialization, like 
ProxyUserAuthenticationFilterInitializer, and then internally route all doGet and 
doPost calls to the initialized filter?

1. If httpfs.authentication.* is not defined, fall back to the default 
behavior so it stays consistent with hadoop.http.authentication.type (see the 
sketch after this list).
2. This gives the appearance that if httpfs.authentication.type is configured to 
use a custom filter, the system responds consistently with the rest of the Hadoop 
web endpoints.
3. If httpfs.authentication.type=kerberos, HttpFSAuthenticationFilter is a 
combination of Kerberos + DelegationToken + proxy support.
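
A minimal sketch of the fallback lookup in point 1, using a plain java.util.Properties 
helper; the class and method names here are assumptions for illustration, not the 
actual HttpFS implementation:

{code:java}
import java.util.Properties;

public class AuthConfigResolver {
  // Prefer the HttpFS-specific key; fall back to the matching
  // hadoop.http.authentication.* key when the HttpFS key is absent.
  public static String resolve(Properties conf, String suffix, String defaultValue) {
    String httpfsValue = conf.getProperty("httpfs.authentication." + suffix);
    if (httpfsValue != null) {
      return httpfsValue;
    }
    return conf.getProperty("hadoop.http.authentication." + suffix, defaultValue);
  }
}
{code}

For example, resolve(conf, "type", "simple") would return "kerberos" under either 
naming scheme shown in the configurations earlier in this thread.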


> Request is a replay (34) error in httpfs
> 
>
> Key: HDFS-14845
> URL: https://issues.apache.org/jira/browse/HDFS-14845
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.3.0
> Environment: Kerberos and ZKDelgationTokenSecretManager enabled in 
> HttpFS
>Reporter: Akira Ajisaka
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: HDFS-14845-001.patch
>
>
> We are facing "Request is a replay (34)" error when accessing to HDFS via 
> httpfs on trunk.
> {noformat}
> % curl -i --negotiate -u : "https://:4443/webhdfs/v1/?op=liststatus"
> HTTP/1.1 401 Authentication required
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 271
> HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34))
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> (snip)
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 413
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /webhdfs/v1/. Reason:
> GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {noformat}






[jira] [Commented] (HDFS-14845) Request is a replay (34) error in httpfs

2019-09-13 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929312#comment-16929312
 ] 

Eric Yang commented on HDFS-14845:
--

[~Prabhu Joseph] Thank you for the patch.  Correct me if I am mistaken: this 
patch will restore HttpFSAuthenticationFilter instead of enforcing custom 
filters.  When a user selects JWTAuthenticationFilter, it would not apply 
to the HttpFS server, which may not meet user expectations.  A more proper 
solution is to nullify HttpFSAuthenticationFilter and map the authFilter 
initialization to the standard filter initializer only.

[~aajisaka] Let us know whether you really intend to use JWTAuthenticationFilter 
with HttpFS, or you only want HttpFSAuthenticationFilter.  Thanks

> Request is a replay (34) error in httpfs
> 
>
> Key: HDFS-14845
> URL: https://issues.apache.org/jira/browse/HDFS-14845
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.3.0
> Environment: Kerberos and ZKDelgationTokenSecretManager enabled in 
> HttpFS
>Reporter: Akira Ajisaka
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: HDFS-14845-001.patch
>
>
> We are facing "Request is a replay (34)" error when accessing to HDFS via 
> httpfs on trunk.
> {noformat}
> % curl -i --negotiate -u : "https://:4443/webhdfs/v1/?op=liststatus"
> HTTP/1.1 401 Authentication required
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 271
> HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34))
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> (snip)
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 413
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /webhdfs/v1/. Reason:
> GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {noformat}






[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-11 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927797#comment-16927797
 ] 

Eric Yang commented on HDFS-14609:
--

[~zhangchen] Thank you for the patch.  Once the checkstyle issue is fixed, 
the rest looks good to me.

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch
>
>
> We worked on router based federation security as part of HDFS-13532. We kept 
> it compatible with the way namenode works. However with HADOOP-16314 and 
> HDFS-16354 in trunk, auth filters seem to have been changed, causing tests to 
> fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.






[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test

2019-09-03 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921541#comment-16921541
 ] 

Eric Yang commented on HDDS-1554:
-

[~arp] Closer examination shows that:

{code}
mvn -T 1C clean install -DskipTests=true -Pdist -Dtar -DskipShade 
-Pit,docker-build -Ddocker.image=apache/ozone:0.5.0-SNAPSHOT
{code}

This does not work because the skipTests flag is set.

{code}
mvn test -Pit -Ddocker.image=apache/ozone:0.5.0-SNAPSHOT 
{code}

This also doesn't work because the tests are bound to the integration-test 
phase; running only the test phase does not trigger the integration tests.

The proper command looks like either of the following examples:

{code}
mvn clean install -Pit,docker-build
mvn verify -Pit -Ddocker.image=apache/ozone:0.5.0-SNAPSHOT 
{code}

Hope this clarifies the usage of the Maven commands for these integration tests.

If the commands are too cumbersome, we can remove the "it" profile.  I would prefer to 
avoid the docker-build and docker.image parameters, but they are mandatory today 
because the dist module supports three ways of using docker images.  Hence it 
is necessary to specify from the top level which image to use.
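
For background, integration tests like these are typically bound to the integration-test 
and verify phases via the Maven failsafe plugin, roughly as sketched below; this is an 
illustrative snippet, not the exact Ozone pom.xml:

{code:xml}
<!-- Illustrative failsafe binding; the real Ozone pom.xml may differ. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <!-- integration-test runs the ITs; verify fails the build on IT errors -->
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}

This is why "mvn verify" or "mvn install" picks up the tests, while "mvn test" stops 
before the integration-test phase.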

> Create disk tests for fault injection test
> --
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch, 
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch, 
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch, 
> HDDS-1554.012.patch, HDDS-1554.013.patch, HDDS-1554.014.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - Read/Only test
>  ## Repeat Scenario 1
>  ## Mount data disk as read only
>  ## Try to write data to Ozone cluster
>  ## Validate error message is correct
>  ## Shutdown cluster
>  # Scenario 3 - Corruption test
>  ## Repeat Scenario 2
>  ## Shutdown cluster
>  ## Modify data disk data
>  ## Restart cluster
>  ## Validate error message for read from corrupted data
>  ## Validate error message for write to corrupted volume






[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-08-30 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919754#comment-16919754
 ] 

Eric Yang commented on HDFS-14461:
--

[~ayushtkn] Some test enhancement is happening in HDFS-14609.  The 
prerequisites need to be addressed first; then the Hadoop RBF project can tackle the 
RBF Kerberos-related test cases.

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-14461.001.patch, HDFS-14461.002.patch
>
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) 
> at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) 
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 
> Caused by: org.apache.hadoop.security.KerberosAuthException: failure to 
> login: for principal: router/localh...@example.com from keytab 
> 

[jira] [Comment Edited] (HDDS-1554) Create disk tests for fault injection test

2019-08-28 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918264#comment-16918264
 ] 

Eric Yang edited comment on HDDS-1554 at 8/29/19 4:04 AM:
--

[~arp] The test is written to run by specifying the "it" profile.

{code}
mvn -T 1C clean install -DskipTests=true -Pdist -Dtar -DskipShade 
-Pit,docker-build -Ddocker.image=apache/ozone:0.5.0-SNAPSHOT{code}


was (Author: eyang):
[~arp] The test is written to run by specifying the "it" profile.

{code}
mvn -T 1C clean install -DskipTests=true -Pdist -Dtar -DskipShade 
-P,itdocker-build -Ddocker.image=apache/ozone:0.5.0-SNAPSHOT{code}

> Create disk tests for fault injection test
> --
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch, 
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch, 
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch, 
> HDDS-1554.012.patch, HDDS-1554.013.patch, HDDS-1554.014.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - Read/Only test
>  ## Repeat Scenario 1
>  ## Mount data disk as read only
>  ## Try to write data to Ozone cluster
>  ## Validate error message is correct
>  ## Shutdown cluster
>  # Scenario 3 - Corruption test
>  ## Repeat Scenario 2
>  ## Shutdown cluster
>  ## Modify data disk data
>  ## Restart cluster
>  ## Validate error message for read from corrupted data
>  ## Validate error message for write to corrupted volume



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test

2019-08-28 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918264#comment-16918264
 ] 

Eric Yang commented on HDDS-1554:
-

[~arp] The test is written to run by specifying the "it" profile.

{code}
mvn -T 1C clean install -DskipTests=true -Pdist -Dtar -DskipShade 
-P,itdocker-build -Ddocker.image=apache/ozone:0.5.0-SNAPSHOT{code}

> Create disk tests for fault injection test
> --
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch, 
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch, 
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch, 
> HDDS-1554.012.patch, HDDS-1554.013.patch, HDDS-1554.014.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - Read/Only test
>  ## Repeat Scenario 1
>  ## Mount data disk as read only
>  ## Try to write data to Ozone cluster
>  ## Validate error message is correct
>  ## Shutdown cluster
>  # Scenario 3 - Corruption test
>  ## Repeat Scenario 2
>  ## Shutdown cluster
>  ## Modify data disk data
>  ## Restart cluster
>  ## Validate error message for read from corrupted data
>  ## Validate error message for write to corrupted volume



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-27 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916891#comment-16916891
 ] 

Eric Yang commented on HDFS-2470:
-

[~arp] Sorry for the confusion.  The patch is fine.  Further analysis revealed 
that my cluster automation script was flawed.  The patch is working fine with 
HBase.  

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, HDFS-2470.09.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-26 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916216#comment-16916216
 ] 

Eric Yang commented on HDFS-2470:
-

[~swagle] Thank you for patch 09.  Unfortunately, this patch breaks HBase for 
some reason.  HBase does not show the exact error, but it fails to start the 
HBase Region server.  It appears that an exception is thrown, and the error 
manifests in HBase as a ZooKeeper ACL exception:

{code}
2019-08-26 14:45:42,597 WARN  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 client.ZooKeeperSaslClient: Could not login: the client is being asked for a 
password, but the Zookeeper client code does not currently support obtaining a 
password from the user. Make sure that the client is configured to use a ticket 
cache (using the JAAS configuration setting 'useTicketCache=true)' and restart 
the client. If you still get this message after that, the TGT in the ticket 
cache has expired and must be manually refreshed. To do so, first determine if 
you are using a password or a keytab. If the former, run kinit in a Unix shell 
in the environment of the user who is running this Zookeeper client using the 
command 'kinit ' (where  is the name of the client's Kerberos 
principal). If the latter, do 'kinit -k -t  ' (where  is 
the name of the Kerberos principal, and  is the location of the keytab 
file). After manually refreshing your cache, restart this client. If you 
continue to see this message after manually refreshing your cache, ensure that 
your KDC host's clock is in sync with this host's clock.
2019-08-26 14:45:42,598 WARN  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: SASL configuration failed: 
javax.security.auth.login.LoginException: No password provided Will continue 
connection to Zookeeper server without SASL authentication, if Zookeeper server 
allows it.
2019-08-26 14:45:42,598 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Opening socket connection to server 
eyang-4.vpc.cloudera.com/10.65.53.170:2181
2019-08-26 14:45:42,598 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Socket connection established to 
eyang-4.vpc.cloudera.com/10.65.53.170:2181, initiating session
2019-08-26 14:45:42,601 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Session establishment complete on server 
eyang-4.vpc.cloudera.com/10.65.53.170:2181, sessionid = 0x200010a127c0070, 
negotiated timeout = 6
2019-08-26 14:45:45,659 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] ipc.RpcServer: 
Stopping server on 16020
2019-08-26 14:45:45,659 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] 
token.AuthenticationTokenSecretManager: Stopping leader election, because: 
SecretManager stopping
2019-08-26 14:45:45,660 INFO  [RpcServer.listener,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: stopping
2019-08-26 14:45:45,660 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopped
2019-08-26 14:45:45,660 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopping
2019-08-26 14:45:45,660 FATAL 
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] 
regionserver.HRegionServer: ABORTING region server 
eyang-3.vpc.cloudera.com,16020,1566855941147: Initialization of RS failed.  
Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:819)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:772)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:744)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:889)
at java.lang.Thread.run(Thread.java:748)
{code}

When the patch is removed, HBase was able to start successfully.  I dug pretty 
deep into the HBase source code, but StorageDirectory is not used in the code 
base.  I have validated that the Datanode directory default permission is not 
changed by patch 09.  More study is required to understand the root cause of 
the incompatibility.

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: 

[jira] [Comment Edited] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-26 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916216#comment-16916216
 ] 

Eric Yang edited comment on HDFS-2470 at 8/26/19 11:15 PM:
---

[~swagle] Thank you for patch 09.  Unfortunately, this patch breaks HBase for 
some reason.  HBase does not show the exact error, but it fails to start the 
HBase Region server.  It appears that an exception is thrown, and the error 
manifests in HBase as a ZooKeeper ACL exception:

{code}
2019-08-26 14:45:42,597 WARN  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 client.ZooKeeperSaslClient: Could not login: the client is being asked for a 
password, but the Zookeeper client code does not currently support obtaining a 
password from the user. Make sure that the client is configured to use a ticket 
cache (using the JAAS configuration setting 'useTicketCache=true)' and restart 
the client. If you still get this message after that, the TGT in the ticket 
cache has expired and must be manually refreshed. To do so, first determine if 
you are using a password or a keytab. If the former, run kinit in a Unix shell 
in the environment of the user who is running this Zookeeper client using the 
command 'kinit ' (where  is the name of the client's Kerberos 
principal). If the latter, do 'kinit -k -t  ' (where  is 
the name of the Kerberos principal, and  is the location of the keytab 
file). After manually refreshing your cache, restart this client. If you 
continue to see this message after manually refreshing your cache, ensure that 
your KDC host's clock is in sync with this host's clock.
2019-08-26 14:45:42,598 WARN  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: SASL configuration failed: 
javax.security.auth.login.LoginException: No password provided Will continue 
connection to Zookeeper server without SASL authentication, if Zookeeper server 
allows it.
2019-08-26 14:45:42,598 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Opening socket connection to server 
eyang-4.vpc.cloudera.com/10.65.53.170:2181
2019-08-26 14:45:42,598 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Socket connection established to 
eyang-4.vpc.cloudera.com/10.65.53.170:2181, initiating session
2019-08-26 14:45:42,601 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Session establishment complete on server 
eyang-4.vpc.cloudera.com/10.65.53.170:2181, sessionid = 0x200010a127c0070, 
negotiated timeout = 6
2019-08-26 14:45:45,659 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] ipc.RpcServer: 
Stopping server on 16020
2019-08-26 14:45:45,659 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] 
token.AuthenticationTokenSecretManager: Stopping leader election, because: 
SecretManager stopping
2019-08-26 14:45:45,660 INFO  [RpcServer.listener,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: stopping
2019-08-26 14:45:45,660 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopped
2019-08-26 14:45:45,660 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopping
2019-08-26 14:45:45,660 FATAL 
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] 
regionserver.HRegionServer: ABORTING region server 
eyang-3.vpc.cloudera.com,16020,1566855941147: Initialization of RS failed.  
Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:819)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:772)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:744)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:889)
at java.lang.Thread.run(Thread.java:748)
{code}

When the patch is removed, HBase was not able to start successfully.  I dug 
pretty deep into the HBase source code, but StorageDirectory is not used in 
the code base.  I have validated that the Datanode directory default 
permission is not changed by patch 09.  More study is required to understand 
the root cause of the incompatibility.


was (Author: eyang):
[~swagle] Thank you for patch 09.  Unfortunately, this patch breaks HBase for 
some reason.  HBase does not show the exact error, but it fails to start the 
HBase Region server.  It appears that an exception is thrown, and the error 
manifests in HBase as a ZooKeeper ACL exception:

{code}
2019-08-26 14:45:42,597 WARN  

[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-20 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911779#comment-16911779
 ] 

Eric Yang commented on HDFS-2470:
-

[~swagle] Thank you for the patch.  Patch 08 looks good to me.  Pending Jenkins 
results.

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-15 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908481#comment-16908481
 ] 

Eric Yang commented on HDFS-2470:
-

[~swagle] Thank you for patch 07.  I think the right fix is to not set the 
root directory permission.  HDFS is smart about creating subdirectories under 
the root directory.  The root directory is a system-admin-defined location, 
and the provisioning system should initialize it with the proper ownership 
and permissions.  Without setting the root directory permission, the solution 
is more generic and works for both /tmp and /tmp/namenode.

Would it be safer to pass in a default permission of 0700 instead of null for 
the constructors that do not accept a permission parameter?  In the past, 
files and directories were created based on the user's umask.  This caused 
all files to be readable by anyone on a standard Linux installation.  For 
HDFS, the hdfs user would want to keep all data private, unless explicitly 
required by a very old version of short-circuit read.  Hence, it might be 
useful to pass a default permission so we can skip the null check and ensure 
data is secured by default unless explicitly allowed.
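
A minimal sketch of that idea, assuming a simplified constructor shape (this 
is not the actual StorageDirectory code; the class and field names are 
hypothetical):

{code:java}
import org.apache.hadoop.fs.permission.FsPermission;

// Hedged sketch: constructors that do not take a permission fall back to 0700
// instead of null, so callers stay secure by default without a null check.
class StorageDirectorySketch {
  private static final FsPermission DEFAULT_PERM = new FsPermission((short) 0700);

  private final FsPermission permission;

  StorageDirectorySketch() {
    this(DEFAULT_PERM);
  }

  StorageDirectorySketch(FsPermission permission) {
    this.permission = (permission != null) ? permission : DEFAULT_PERM;
  }
}
{code}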

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14375) DataNode cannot serve BlockPool to multiple NameNodes in the different realm

2019-08-15 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908389#comment-16908389
 ] 

Eric Yang commented on HDFS-14375:
--

[~Jihyun.Cho] The first log line indicates that the IPC server authenticated 
dn/testhost1@test1.com to access the Datanode running as 
dn/testhost1@test2.com.

The problem is the second log line, in ServiceAuthorizationManager.  It looks 
like an incorrect optimization made a long time ago [on this 
line|https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java#L120].
  The original code compared the [short 
username|https://github.com/apache/hadoop/commit/c3fdd289cf26fa3bb9c0d2d9f906eba769ddd789#diff-90193e5349be2122d5ed915ba38c957dL123].

The original code ensures dn/testhost1@test1.com and 
dn/testhost2@test2.com both map to the same user via auth_to_local rules.  
The current implementation compares the raw principals, which skips 
auth_to_local rule mapping and fails authorization incorrectly.
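
For illustration, a hedged sketch of the short-name comparison described 
above; the class and method are hypothetical and only show the 
auth_to_local-aware check, not the real ServiceAuthorizationManager logic:

{code:java}
import java.io.IOException;

import org.apache.hadoop.security.authentication.util.KerberosName;

// Hedged sketch: compare auth_to_local short names instead of raw principals,
// so dn/host@TEST1.COM and dn/host@TEST2.COM can both resolve to the same user.
public class ShortNameAuthorizeSketch {
  static boolean sameServiceUser(String clientPrincipal, String servicePrincipal)
      throws IOException {
    String client = new KerberosName(clientPrincipal).getShortName();
    String service = new KerberosName(servicePrincipal).getShortName();
    return client.equals(service);
  }
}
{code}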

> DataNode cannot serve BlockPool to multiple NameNodes in the different realm
> 
>
> Key: HDFS-14375
> URL: https://issues.apache.org/jira/browse/HDFS-14375
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.1.1
>Reporter: Jihyun Cho
>Assignee: Jihyun Cho
>Priority: Major
> Attachments: authorize.patch
>
>
> Let me explain the environment for a description.
> {noformat}
> KDC(TEST1.COM) <-- Cross-realm trust -->  KDC(TEST2.COM)
>| |
> NameNode1 NameNode2
>| |
>-- DataNodes (federated) --
> {noformat}
> We configured the secure clusters and federated them.
> * Principal
> ** NameNode1 : nn/_h...@test1.com 
> ** NameNode2 : nn/_h...@test2.com 
> ** DataNodes : dn/_h...@test2.com 
> But DataNodes could not connect to NameNode1 with below error.
> {noformat}
> WARN 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization failed for dn/hadoop-datanode.test@test2.com 
> (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol: this service is only 
> accessible by dn/hadoop-datanode.test@test1.com
> {noformat}
> We have avoided the error with attached patch.
> The patch checks only using {{username}} and {{hostname}} except {{realm}}.
> I think there is no problem. Because if realms are different and no 
> cross-realm setting, they cannot communication each other. If you are worried 
> about this, please let me know.
> In the long run, it would be better if I could set multiple realms for 
> authorize. Like this;
> {noformat}
> 
>   dfs.namenode.kerberos.trust-realms
>   TEST1.COM,TEST2.COM
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-14 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907425#comment-16907425
 ] 

Eric Yang commented on HDFS-2470:
-

[~swagle] Defaulting to 700 is generally a good idea.  StorageDirectory is 
also used by the datanode, and there is a [legacy version of HDFS 
short-circuit 
read|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html]
 that allows the datanode storage directory permission to be controlled via 
the dfs.datanode.data.dir.perm config.  Using a 700 default may create an 
incompatible change for applications that depend on the legacy HDFS 
short-circuit read defaults.  Directory permissions can default to 700 once 
the code logic checks all permission configs against dirType to ensure we 
don't regress.
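
As an illustration of preserving that legacy default, a hedged sketch (the 
helper class is hypothetical; only the dfs.datanode.data.dir.perm key and its 
700 default reflect existing behavior):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsPermission;

// Hedged sketch: honor the legacy short-circuit read permission config before
// falling back to a hard-coded default, to avoid an incompatible change.
public class DataDirPermSketch {
  static FsPermission dataDirPermission(Configuration conf) {
    return new FsPermission(conf.get("dfs.datanode.data.dir.perm", "700"));
  }
}
{code}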

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, HDFS-2470.06.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-13 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906523#comment-16906523
 ] 

Eric Yang commented on HDFS-2470:
-

Thank you [~swagle] for the patch.  The namenode directory is created with 
700 permission; however, I think there are still bugs in the implementation.  
A few questions about patch 006:

in 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java:
{code}
+  if (permission != null) {
+Set permissions = EnumSet.of(OWNER_READ,
+OWNER_WRITE, OWNER_EXECUTE);
+Files.setPosixFilePermissions(root.toPath(), permissions);
+Files.setPosixFilePermissions(curDir.toPath(), permissions);
+  }
{code}

# It looks like even when the permission variable is passed in, the 
hard-coded "permissions" set is used.  This logic doesn't seem right.  I 
think you want to map the numeric value of the permission variable to the 
corresponding PosixFilePermission enums (see the sketch after this list).
# Can we avoid passing null as a parameter to StorageDirectory?  If it has 
not been defined, would it be possible to compute the default permission 
(dfs.*.storage.dir.perm) from dirType?
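
To illustrate point 1, a hedged sketch of mapping the configured permission 
to PosixFilePermission enums (the helper is hypothetical and assumes the 
sticky bit is not set):

{code:java}
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

import org.apache.hadoop.fs.permission.FsPermission;

// Hedged sketch: convert an FsPermission such as 700 or 755 into the
// equivalent java.nio permission set instead of a hard-coded owner-only set.
public class PermissionMappingSketch {
  static Set<PosixFilePermission> toPosix(FsPermission perm) {
    // FsPermission#toString() yields the 9-character symbolic form, e.g. "rwx------".
    return PosixFilePermissions.fromString(perm.toString());
  }
}
{code}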

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, HDFS-2470.06.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14375) DataNode cannot serve BlockPool to multiple NameNodes in the different realm

2019-08-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905573#comment-16905573
 ] 

Eric Yang commented on HDFS-14375:
--

{quote}I think the main issue is DataNode only authorize its own realm, even if 
the realms are set cross-realm trust.
 To solve this issue, clientPrincipal should be checked multiple cross-realms 
in authorize method.
{quote}
The authorize method looks into [krbInfo to find the hostname from the 
service principal and find a 
match|https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java#L109].
 If a client accesses the datanode and passes authentication negotiation, the 
client ticket cache will contain the datanode hostname. The Hadoop code does 
not inspect the realm part of the principal name in the authorize method, but 
merely validates that the client ticket cache contains the hostname of the 
datanode. One way to validate cross-realm authentication is to look at the 
klist output and make sure that:
{code:java}
klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs-d...@example.com

Valid starting   Expires  Service principal
08/12/2019 19:28:17  08/13/2019 19:28:17  krbtgt/example@example.com
renew until 08/19/2019 19:28:17
08/12/2019 20:37:49  08/13/2019 19:28:17  
HTTP/datanode.example2@example2.com
renew until 08/19/2019 19:28:17
{code}
In this example, the ticket cache contains the user's own krbtgt and also a 
granted service ticket for a host in a different realm.

> DataNode cannot serve BlockPool to multiple NameNodes in the different realm
> 
>
> Key: HDFS-14375
> URL: https://issues.apache.org/jira/browse/HDFS-14375
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.1.1
>Reporter: Jihyun Cho
>Assignee: Jihyun Cho
>Priority: Major
> Attachments: authorize.patch
>
>
> Let me explain the environment for a description.
> {noformat}
> KDC(TEST1.COM) <-- Cross-realm trust -->  KDC(TEST2.COM)
>| |
> NameNode1 NameNode2
>| |
>-- DataNodes (federated) --
> {noformat}
> We configured the secure clusters and federated them.
> * Principal
> ** NameNode1 : nn/_h...@test1.com 
> ** NameNode2 : nn/_h...@test2.com 
> ** DataNodes : dn/_h...@test2.com 
> But DataNodes could not connect to NameNode1 with below error.
> {noformat}
> WARN 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization failed for dn/hadoop-datanode.test@test2.com 
> (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol: this service is only 
> accessible by dn/hadoop-datanode.test@test1.com
> {noformat}
> We have avoided the error with attached patch.
> The patch checks only using {{username}} and {{hostname}} except {{realm}}.
> I think there is no problem. Because if realms are different and no 
> cross-realm setting, they cannot communication each other. If you are worried 
> about this, please let me know.
> In the long run, it would be better if I could set multiple realms for 
> authorize. Like this;
> {noformat}
> 
>   dfs.namenode.kerberos.trust-realms
>   TEST1.COM,TEST2.COM
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14375) DataNode cannot serve BlockPool to multiple NameNodes in the different realm

2019-08-09 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904232#comment-16904232
 ] 

Eric Yang commented on HDFS-14375:
--

This looks like a configuration issue in the KDC server for cross-realm 
trust.  Please verify that the krbtgt/test1@test2.com principal has been 
added for cross-realm trust to work, and vice versa for bi-directional trust.  
You will also need to make sure Hadoop's auth_to_local rules map the remote 
realm to the same dn user.  UserGroupInformation.getShortName() should be 
invoked to resolve the user name instead of manually parsing the principal 
name.  Otherwise, auth_to_local rules are skipped, and losing the 
hierarchical information often results in privilege-escalation security holes.
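
A hedged sketch of the auth_to_local idea; the RULE strings are illustrative 
examples for the realms in this issue, not a recommended production mapping:

{code:java}
import org.apache.hadoop.security.authentication.util.KerberosName;

// Hedged sketch: both realms map to the same local user "dn", so code that
// resolves the short name treats dn/_HOST@TEST1.COM and dn/_HOST@TEST2.COM alike.
public class AuthToLocalSketch {
  public static void main(String[] args) throws Exception {
    KerberosName.setRules(
        "RULE:[2:$1@$0](dn@TEST1.COM)s/.*/dn/\n"
        + "RULE:[2:$1@$0](dn@TEST2.COM)s/.*/dn/\n"
        + "DEFAULT");
    System.out.println(new KerberosName("dn/host1@TEST1.COM").getShortName()); // dn
    System.out.println(new KerberosName("dn/host2@TEST2.COM").getShortName()); // dn
  }
}
{code}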

> DataNode cannot serve BlockPool to multiple NameNodes in the different realm
> 
>
> Key: HDFS-14375
> URL: https://issues.apache.org/jira/browse/HDFS-14375
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.1.1
>Reporter: Jihyun Cho
>Assignee: Jihyun Cho
>Priority: Major
> Attachments: authorize.patch
>
>
> Let me explain the environment for a description.
> {noformat}
> KDC(TEST1.COM) <-- Cross-realm trust -->  KDC(TEST2.COM)
>| |
> NameNode1 NameNode2
>| |
>-- DataNodes (federated) --
> {noformat}
> We configured the secure clusters and federated them.
> * Principal
> ** NameNode1 : nn/_h...@test1.com 
> ** NameNode2 : nn/_h...@test2.com 
> ** DataNodes : dn/_h...@test2.com 
> But DataNodes could not connect to NameNode1 with below error.
> {noformat}
> WARN 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization failed for dn/hadoop-datanode.test@test2.com 
> (auth:KERBEROS) for protocol=interface 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol: this service is only 
> accessible by dn/hadoop-datanode.test@test1.com
> {noformat}
> We have avoided the error with attached patch.
> The patch checks only using {{username}} and {{hostname}} except {{realm}}.
> I think there is no problem. Because if realms are different and no 
> cross-realm setting, they cannot communication each other. If you are worried 
> about this, please let me know.
> In the long run, it would be better if I could set multiple realms for 
> authorize. Like this;
> {noformat}
> 
>   dfs.namenode.kerberos.trust-realms
>   TEST1.COM,TEST2.COM
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1554) Create disk tests for fault injection test

2019-08-09 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1554:

Attachment: HDDS-1554.014.patch

> Create disk tests for fault injection test
> --
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch, 
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch, 
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch, 
> HDDS-1554.012.patch, HDDS-1554.013.patch, HDDS-1554.014.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - Read/Only test
>  ## Repeat Scenario 1
>  ## Mount data disk as read only
>  ## Try to write data to Ozone cluster
>  ## Validate error message is correct
>  ## Shutdown cluster
>  # Scenario 3 - Corruption test
>  ## Repeat Scenario 2
>  ## Shutdown cluster
>  ## Modify data disk data
>  ## Restart cluster
>  ## Validate error message for read from corrupted data
>  ## Validate error message for write to corrupted volume



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test

2019-08-09 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904077#comment-16904077
 ] 

Eric Yang commented on HDDS-1554:
-

[~arp] Thank you for the review.
{quote}ITDiskReadOnly#testReadOnlyDiskStartup - The following block of code can 
probably be removed, since it's really testing that the cluster is read-only in 
safe mode. We have unit tests for that:
{quote}
Correct me if I am wrong, but the tests are not exactly the same. This test 
triggers validation from the Ozone client's point of view. The unit test 
TestVolumeSet#testFailedVolume is written for the server side. The smoke test 
covers the positive case to ensure a volume can be created, but not when the 
disk is in read-only mode. I think there is value in testing the client-side 
response to get better coverage. Thoughts?
{quote}ITDiskReadOnly#testUpload - do we need to wait for safe mode exit after 
restarting the cluster? Also I think this test is essentially the same as the 
previous one.
{quote}
Safe mode validation is skipped here because Ozone exits on a read-only disk; 
the extra wait time would only be a formality. In reality, it would be better 
to keep the Ozone daemon running but keep the file system in a safe or 
degraded mode that prevents write operations. This would be useful for 
disaster recovery, where a system admin may want to prevent further damage to 
the disk while still recovering data from Ozone buckets. This test is 
designed to pass both when running in read-only mode and with the exit 
strategy; both designs are valid.  The test is more useful if the Ozone 
daemons don't exit on a read-only disk.  I intend to add a download test to 
ITDiskReadOnly as well, if read-only mode can be implemented.
{quote}ITDiskCorruption#addCorruption:72 - looks like we have a hard-coded 
path. Should we get from configuration instead?
{quote}
Thank you for the suggestion.  In patch 014 I made an adjustment so the Maven 
project build directory can be customized.  The test uses 
${buildDirectory}/data/meta to store metadata, where ${buildDirectory} 
defaults to Maven's ${project.build.directory}, and it corrupts the data file 
there. Placing the data file in the Maven build directory is a good way to 
ensure that mvn clean resets the state of the data file cleanly. When this is 
configured externally, an external mechanism must be developed to reset the 
data file state.
{quote}ITDiskCorruption#testUpload - The corruption implementation is bit of a 
heavy hammer, it is replacing the content of all meta files. Is it possible to 
make it reflect real-world corruption where a part of the file may be 
corrupted. Also we should probably restart the cluster after corrupting RocksDB 
meta files.
{quote}
If Ozone is restarted after metadata corruption, it will fall into the same 
code path that is unable to open RocksDB and will fail to start. This would 
make the corruption upload test exercise the same code path as 
ITDiskReadOnly#testReadOnlyDiskStartup, and the test would have no purpose. 
The test purposefully corrupts metadata files without a restart, to ensure a 
safety mechanism will be built to protect metadata integrity. One possible 
design is a background thread that checks RocksDB health. In the test, we can 
shorten the check interval to almost immediate, to verify that an upload is 
not successful when metadata corruption happens, and that Ozone prevents 
further corruption by entering safe mode or degraded mode.
{quote}ITDiskCorruption#testDownload:161 - should we just remove the assertTrue 
since it is no-op?
{quote}
The intent is to ensure an IOException is thrown for the test assertion to 
pass. It is better written for clarity:
{code:java}
Assert.assertTrue("Download File test passed.", e instanceof IOException);
{code}

Patch 014 also includes the improved assertTrue statements.

> Create disk tests for fault injection test
> --
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch, 
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch, 
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch, 
> HDDS-1554.012.patch, HDDS-1554.013.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - 

[jira] [Commented] (HDDS-1554) Create disk tests for fault injection test

2019-08-08 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903436#comment-16903436
 ] 

Eric Yang commented on HDDS-1554:
-

[~arp] The tests are written to run in the integration-test phase; try:

{code}
mvn verify -Pit,docker-build
{code}

> Create disk tests for fault injection test
> --
>
> Key: HDDS-1554
> URL: https://issues.apache.org/jira/browse/HDDS-1554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1554.001.patch, HDDS-1554.002.patch, 
> HDDS-1554.003.patch, HDDS-1554.004.patch, HDDS-1554.005.patch, 
> HDDS-1554.006.patch, HDDS-1554.007.patch, HDDS-1554.008.patch, 
> HDDS-1554.009.patch, HDDS-1554.010.patch, HDDS-1554.011.patch, 
> HDDS-1554.012.patch, HDDS-1554.013.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current plan for fault injection disk tests are:
>  # Scenario 1 - Read/Write test
>  ## Run docker-compose to bring up a cluster
>  ## Initialize scm and om
>  ## Upload data to Ozone cluster
>  ## Verify data is correct
>  ## Shutdown cluster
>  # Scenario 2 - Read/Only test
>  ## Repeat Scenario 1
>  ## Mount data disk as read only
>  ## Try to write data to Ozone cluster
>  ## Validate error message is correct
>  ## Shutdown cluster
>  # Scenario 3 - Corruption test
>  ## Repeat Scenario 2
>  ## Shutdown cluster
>  ## Modify data disk data
>  ## Restart cluster
>  ## Validate error message for read from corrupted data
>  ## Validate error message for write to corrupted volume



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-08-05 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900435#comment-16900435
 ] 

Eric Yang commented on HDFS-2470:
-

[~swagle] Thank you for the patch.

1. The File API is riddled with misbehavior for serious filesystem work.  For 
creating directories and setting file permissions on the newly created 
directories, use the 
[Files|https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html] 
API instead.

  {code}
  import java.nio.file.Files;
  import java.nio.file.attribute.PosixFilePermission;
  import java.nio.file.attribute.PosixFilePermissions;
  import java.util.EnumSet;
  import java.util.Set;
  import static java.nio.file.attribute.PosixFilePermission.OWNER_EXECUTE;
  import static java.nio.file.attribute.PosixFilePermission.OWNER_READ;
  import static java.nio.file.attribute.PosixFilePermission.OWNER_WRITE;

  ...

  // curDir is assumed to be a java.io.File; OWNER_EXECUTE is needed to traverse the directory.
  Set<PosixFilePermission> permissions =
      EnumSet.of(OWNER_READ, OWNER_WRITE, OWNER_EXECUTE);
  Files.createDirectory(curDir.toPath(),
      PosixFilePermissions.asFileAttribute(permissions));
  {code}

  I am not sure about setting the permission on the root of the working 
directory.  The working directory could be /tmp/namenode, and that would 
accidentally make /tmp readable and writable only by the hdfs user and fail.

2. javax.annotation.Nullable is a problematic annotation.  Findbugs uses this 
annotation, but it prevents code from working with signed content on JDK 9.  
See HADOOP-16463 for details.  It would be nice to use 
findbugsExcludeFile.xml to declare that the variable is nullable.

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-07-30 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896267#comment-16896267
 ] 

Eric Yang commented on HDFS-14461:
--

[~hexiaoqiao] The test cases fail on my system the same way Jenkins reported.  
Please make sure that your .m2 Maven cache is cleared so your test results 
are accurate.  
TestRouterWithSecureStartup#testStartupWithoutSpnegoPrincipal tests for an 
invalid SPNEGO principal setup by unsetting the 
dfs.web.authentication.kerberos.keytab configuration.  The test case can be 
updated to look for hadoop.http.authentication.kerberos.principal because 
SecurityConfUtil has been updated to use the globally consistent 
configuration for referencing the SPNEGO keytab setup.

TestRouterFaultTolerant#testWriteWithFailedSubcluster also failed because the 
test case is written for simple security.  SecurityConfUtil turns on SPNEGO 
authentication for the HTTP protocol when this patch is applied. This causes 
the client to be unable to talk to the namenode to get block locations if it 
does not send the authentication negotiation header.

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-14461.001.patch, HDFS-14461.002.patch
>
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) 
> at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  at 
> 

[jira] [Updated] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-29 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1833:

   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

> RefCountedDB printing of stacktrace should be moved to trace logging
> 
>
> Key: HDDS-1833
> URL: https://issues.apache.org/jira/browse/HDDS-1833
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie
> Fix For: 0.5.0
>
> Attachments: HDDS-1833.01.patch, HDDS-1833.02.patch, 
> HDDS-1833.03.patch, HDDS-1833.04.patch
>
>
> RefCountedDB logs the stackTrace for both increment and decrement, this 
> pollutes the logs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-29 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895395#comment-16895395
 ] 

Eric Yang commented on HDDS-1833:
-

+1 Thank you [~swagle] for the patch.  Patch 004 looks good to me.  Committing 
shortly.

> RefCountedDB printing of stacktrace should be moved to trace logging
> 
>
> Key: HDDS-1833
> URL: https://issues.apache.org/jira/browse/HDDS-1833
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-1833.01.patch, HDDS-1833.02.patch, 
> HDDS-1833.03.patch, HDDS-1833.04.patch
>
>
> RefCountedDB logs the stackTrace for both increment and decrement, this 
> pollutes the logs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894172#comment-16894172
 ] 

Eric Yang commented on HDDS-1833:
-

[~swagle] Sorry, I don't think that is true.  

From the [Java 
spec|https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.4]:

{quote}
Example 15.12.4.1-2. Evaluation Order During Method Invocation

As part of an instance method invocation (§15.12), there is an expression that 
denotes the object to be invoked. This expression appears to be fully evaluated 
before any part of any argument expression to the method invocation is 
evaluated.{quote}

ExceptionUtils.getStackTrace() is fully evaluated before trace method 
invocation.
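
In that light, a hedged sketch of the kind of guard the patch is aiming for 
(the logger, method, and message are illustrative, not the exact RefCountedDB 
code):

{code:java}
import org.apache.commons.lang3.exception.ExceptionUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hedged sketch: the stack-trace argument is evaluated before trace() is
// invoked, so the isTraceEnabled() guard is what skips the capture when TRACE is off.
public class RefCountedDBLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(RefCountedDBLoggingSketch.class);

  void incrementRef(String containerDBPath) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("IncRef {} by thread {}\n{}", containerDBPath,
          Thread.currentThread().getName(),
          ExceptionUtils.getStackTrace(new Throwable()));
    }
  }
}
{code}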

> RefCountedDB printing of stacktrace should be moved to trace logging
> 
>
> Key: HDDS-1833
> URL: https://issues.apache.org/jira/browse/HDDS-1833
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-1833.01.patch, HDDS-1833.02.patch, 
> HDDS-1833.03.patch
>
>
> RefCountedDB logs the stackTrace for both increment and decrement, this 
> pollutes the logs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13734) Add Heapsize variables for HDFS daemons

2019-07-26 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894069#comment-16894069
 ] 

Eric Yang commented on HDFS-13734:
--

[~bdscheller] Sorry, I agree with [~aw].  HDFS_*_OPTS is preferred for a 
number of reasons, such as setting -Xms and GC policy flags.  Giving only an 
-Xmx flag without optimizing the other flags may create other problems for 
novice users and complicates config management.  The YARN_*_HEAPSIZE 
variables are not good examples to follow.

> Add Heapsize variables for HDFS daemons
> ---
>
> Key: HDFS-13734
> URL: https://issues.apache.org/jira/browse/HDFS-13734
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, journal-node, namenode
>Affects Versions: 3.0.3
>Reporter: Brandon Scheller
>Priority: Major
>
> Currently there are no variables to set HDFS daemon heapsize differently. 
> While still possible through adding the -Xmx to HDFS_*DAEMON*_OPTS, this is 
> not intuitive for this relatively common setting.
> YARN currently has these separate YARN_*DAEMON*_HEAPSIZE variables supported 
> so it seems natural for HDFS too.
> It also looks like HDFS use to have this for namenode with 
> HADOOP_NAMENODE_INIT_HEAPSIZE
> This JIRA is to have these configurations added/supported



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-07-25 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893294#comment-16893294
 ] 

Eric Yang edited comment on HDFS-14461 at 7/26/19 3:43 AM:
---

[~elgoiri] I think it is premature to start using PRs.  I have outlined a 
number of shortcomings of using PRs on the dev mailing list.  We may want to 
wait for some of the outstanding issues to close before recommending PRs.
[~hexiaoqiao]
{quote}
1. is there any other way to wait keys until persisted rather than 
Thread.sleep(1000)?
{quote}

{code}
while (!file.exists()) {}
{code}

But it would be nicer if you do it the way that [~crh] suggested.
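
For illustration, a bounded variant of that wait (a hedged sketch; the helper 
name and 50 ms poll interval are arbitrary choices, not from the patch):

{code:java}
import java.io.File;
import java.io.IOException;

// Hedged sketch: poll for the generated keytab with a timeout instead of a
// fixed Thread.sleep(1000) or an unbounded busy loop.
public final class WaitForFileSketch {
  static void waitForFile(File file, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!file.exists()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IOException("Timed out waiting for " + file);
      }
      Thread.sleep(50);
    }
  }
}
{code}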

{quote}
2. do we need to define configuration item `hadoop.http.authentication.*` at 
CommonConfigurationKeys?
{quote}

I think it goes to: CommonConfigurationKeysPublic.java.  Some keys are already 
there.  I think it's nice but optional.

{quote}
3. I am confused how TestRouterHttpDelegationToken test passed when pre-commit 
HDFS-13972?
{quote}

I think it was testing with anonymous allowed which implicitly passed through, 
but I can't be sure.

{quote}
4. it seems that NoAuthFilter is not effective anymore, and I try to delete it.
{quote}

Ok


was (Author: eyang):
[~elgoiri] I think it is premature to start using PRs.  I have outlined a 
number of shortcomings of using PRs on the dev mailing list.  We may want to 
wait for some of the outstanding issues to close before recommending PRs.
[~hexiaoqiao]
{quote}
1. is there any other way to wait keys until persisted rather than 
Thread.sleep(1000)?
{quote}

{code}
while (!file.exists()) {}
{code}

{quote}
2. do we need to define configuration item `hadoop.http.authentication.*` at 
CommonConfigurationKeys?
{quote}

I think it goes to: CommonConfigurationKeysPublic.java.  Some keys are already 
there.  I think it's nice but optional.

{quote}
3. I am confused how TestRouterHttpDelegationToken test passed when pre-commit 
HDFS-13972?
{quote}

I think it was testing with anonymous allowed which implicitly passed through, 
but I can't be sure.

{quote}
4. it seems that NoAuthFilter is not effective anymore, and I try to delete it.
{quote}

Ok

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-14461.001.patch
>
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) 
> at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> 

[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-07-25 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893294#comment-16893294
 ] 

Eric Yang commented on HDFS-14461:
--

[~elgoiri] I think it is premature to start using PRs.  I have outlined a number 
of shortcomings of using PRs in the dev mailing list.  We may want to wait for 
some of the outstanding issues to close before recommending PRs.
[~hexiaoqiao]
{quote}
1. is there any other way to wait keys until persisted rather than 
Thread.sleep(1000)?
{quote}

{code}
while (!file.exists()) {}
{code}
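
A slightly more defensive variant of that loop, as a sketch (the class name, 
method name, and timeout are placeholders), so the test cannot spin forever if 
the keytab never appears:

{code:java}
import java.io.File;

final class WaitForFile {
  // Poll until the file exists or the timeout expires, instead of busy-spinning.
  static boolean waitFor(File file, long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!file.exists() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    return file.exists();
  }
}
{code}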

{quote}
2. do we need to define configuration item `hadoop.http.authentication.*` at 
CommonConfigurationKeys?
{quote}

I think it goes to: CommonConfigurationKeysPublic.java.  Some keys are already 
there.  I think it's nice but optional.

{quote}
3. I am confused how TestRouterHttpDelegationToken test passed when pre-commit 
HDFS-13972?
{quote}

I think it was testing with anonymous access allowed, which implicitly passed 
through, but I can't be sure.

{quote}
4. it seems that NoAuthFilter is not effective anymore, and I try to delete it.
{quote}

Ok

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-14461.001.patch
>
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) 
> at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  at 
> 

[jira] [Commented] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-25 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893208#comment-16893208
 ] 

Eric Yang commented on HDDS-1833:
-

[~swagle] Thank you for the patch.  Generating the full stack trace may take 
more compute cycles.  If this is a frequently called API, I would recommend 
keeping the if statement so that the stack trace is computed only when trace 
logging is turned on.  If this is not a frequently called API, patch 3 looks 
good to me.
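
For the frequently-called case, a minimal sketch of the guarded form (the 
variable names are assumed from the snippets elsewhere in this thread, not 
verified against the actual ReferenceCountedDB fields):

{code:java}
// The full stack trace is only materialized when TRACE logging is enabled.
if (LOG.isTraceEnabled()) {
  LOG.trace("DecRef {} to refCnt {}, stackTrace: {}", containerDBPath,
      referenceCount.get(), ExceptionUtils.getStackTrace(new Throwable()));
}
{code}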

> RefCountedDB printing of stacktrace should be moved to trace logging
> 
>
> Key: HDDS-1833
> URL: https://issues.apache.org/jira/browse/HDDS-1833
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-1833.01.patch, HDDS-1833.02.patch, 
> HDDS-1833.03.patch
>
>
> RefCountedDB logs the stackTrace for both increment and decrement, this 
> pollutes the logs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-25 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892989#comment-16892989
 ] 

Eric Yang commented on HDDS-1833:
-

[~swagle] Thank you for patch 002.  Unfortunately, this doesn't quite work as I 
intended.  Sorry for the misleading suggestion in option 1.  The output may turn 
out to be:

{code}
java.lang.Thread.getStackTrace(Thread.java:1559)
{code}

h3. Option 2 

This will give a one-liner output of the current stack:

{code}
LOG.trace("DecRef {} to refCnt {}, stackTrace: {}", containerDBPath,
referenceCount.get(), new Throwable().getStackTrace());
{code}

Output:

{code}
DecRef /test to 0, stackTrace: 
org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB. 
decrementReference(ReferenceCountedDB.java:64)
{code}

h3. Option 3

If you want a full stack trace, the solution is:

{code}
import org.apache.commons.lang3.exception.ExceptionUtils;
...
LOG.trace("DecRef {} to refCnt {}, stackTrace: {}", containerDBPath,
referenceCount.get(), ExceptionUtils.getStackTrace(new Throwable()));
{code}

Option 2 is better for identifying which line of code the call originated from.
Option 3 is useful for listing the full stack.

> RefCountedDB printing of stacktrace should be moved to trace logging
> 
>
> Key: HDDS-1833
> URL: https://issues.apache.org/jira/browse/HDDS-1833
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-1833.01.patch, HDDS-1833.02.patch
>
>
> RefCountedDB logs the stackTrace for both increment and decrement, this 
> pollutes the logs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1857) YARN fails on mapreduce in Kerberos enabled cluster

2019-07-24 Thread Eric Yang (JIRA)
Eric Yang created HDDS-1857:
---

 Summary: YARN fails on mapreduce in Kerberos enabled cluster
 Key: HDDS-1857
 URL: https://issues.apache.org/jira/browse/HDDS-1857
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Eric Yang


When Ozone is configured as a secure cluster, running a mapreduce job on secure 
YARN produces this error message:

{code}
2019-07-23 19:33:12,168 INFO retry.RetryInvocationHandler: 
com.google.protobuf.ServiceException: java.io.IOException: DestHost:destPort 
eyang-1.openstacklocal:9862 , LocalHost:localPort 
eyang-1.openstacklocal/172.26.111.17:0. Failed on local exception: 
java.io.IOException: Couldn't set up IO streams: 
java.util.ServiceConfigurationError: org.apache.hadoop.security.SecurityInfo: 
Provider org.apache.hadoop.yarn.server.RMNMSecurityInfoClass not a subtype, 
while invoking $Proxy13.submitRequest over 
nodeId=null,nodeAddress=eyang-1.openstacklocal:9862 after 9 failover attempts. 
Trying to failover immediately.
2019-07-23 19:33:12,174 ERROR ha.OMFailoverProxyProvider: Failed to connect to 
OM. Attempted 10 retries and 10 failovers
2019-07-23 19:33:12,176 ERROR client.OzoneClientFactory: Couldn't create 
protocol class org.apache.hadoop.ozone.client.rpc.RpcClient exception: 
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
at 
org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.(BasicOzoneClientAdapterImpl.java:137)
at 
org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.(BasicOzoneClientAdapterImpl.java:101)
at 
org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.(BasicOzoneClientAdapterImpl.java:86)
at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:34)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.lambda$createAdapter$1(OzoneClientAdapterFactory.java:66)
at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:116)
at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:62)
at 
org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:98)
at 
org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:144)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3338)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:136)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3387)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3355)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:497)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:245)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:481)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
at 
org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
Caused by: java.io.IOException: DestHost:destPort eyang-1.openstacklocal:9862 , 
LocalHost:localPort eyang-1.openstacklocal/172.26.111.17:0. Failed on local 
exception: java.io.IOException: Couldn't set up IO streams: 
java.util.ServiceConfigurationError: org.apache.hadoop.security.SecurityInfo: 
Provider 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.security.LocalizerSecurityInfo
 not a subtype
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 

[jira] [Commented] (HDDS-1094) Performance testing infrastructure : Special handling for zero-filled chunks on the Datanode

2019-07-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892115#comment-16892115
 ] 

Eric Yang commented on HDDS-1094:
-

HDDS-1772 implements some tests that fill up the datanode disk, which might be 
useful here.
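
On the detection question raised in the description below, a minimal sketch of 
one way to check whether a chunk buffer is zero-filled (the class and method 
names are hypothetical, not part of the Ozone code base):

{code:java}
import java.nio.ByteBuffer;

public final class ZeroChunkUtil {
  private ZeroChunkUtil() { }

  // Returns true if the chunk buffer contains only zero bytes; writeChunk
  // could then skip the local filesystem write, and readChunk could return
  // a zero-filled buffer without touching the disk.
  public static boolean isZeroFilled(ByteBuffer data) {
    ByteBuffer buf = data.duplicate();
    while (buf.hasRemaining()) {
      if (buf.get() != 0) {
        return false;
      }
    }
    return true;
  }
}
{code}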

> Performance testing infrastructure : Special handling for zero-filled chunks 
> on the Datanode
> 
>
> Key: HDDS-1094
> URL: https://issues.apache.org/jira/browse/HDDS-1094
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Supratim Deka
>Priority: Major
>
> Goal:
> Make Ozone chunk Read/Write operations CPU/network bound for specially 
> constructed performance micro benchmarks.
> Remove disk bandwidth and latency constraints - running ozone data path 
> against extreme low-latency & high throughput storage will expose performance 
> bottlenecks in the flow. But low-latency storage(NVME flash drives, Storage 
> class memory etc) is expensive and availability is limited. Is there a 
> workaround which achieves similar running conditions for the software without 
> actually having the low latency storage? At least for specially constructed 
> datasets -  for example zero-filled blocks (*not* zero-length blocks).
> Required characteristics of the solution:
> No changes in Ozone client, OM and SCM. Changes limited to Datanode, Minimal 
> footprint in datanode code.
> Possible High level Approach:
> The ChunkManager and ChunkUtils can enable writeChunk for zero-filled chunks 
> to be dropped without actually writing to the local filesystem. Similarly, if 
> readChunk can construct a zero-filled buffer without reading from the local 
> filesystem whenever it detects a zero-filled chunk. Specifics of how to 
> detect and record a zero-filled chunk can be discussed on this jira. Also 
> discuss how to control this behaviour and make it available only for internal 
> testing.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891993#comment-16891993
 ] 

Eric Yang edited comment on HDDS-1833 at 7/24/19 5:16 PM:
--

We usually don't need the conditional check unless the log statement contains 
expensive computation before it can be printed.  In this case, we are simply 
concatenating two strings.  Using the LOG.trace statement alone and letting the 
logger suppress the output may be good enough, unless we want to include the 
stacktrace.  new Exception().printStackTrace(); will print the stacktrace to the 
.out file.  This might result in related log statements being split between the 
.out file and the .log file, which could be hard to correlate.  In general 
Hadoop does not log anything to the .out file other than the current ulimit 
configuration.  I would suggest using the slf4j feature to render the stacktrace 
in the same log statement.

{code}
if (LOG.isTraceEnabled()) {
  LOG.trace("DecRef {} to refCnt {}, stacktrace: {}", containerDBPath,
   referenceCount.get(), Thread.currentThread().getStackTrace());
}
{code}


was (Author: eyang):
We usually don't need the conditional check unless the log statement contains 
expensive computation before it can be printed.  In this case, we are simply 
concatenating two strings.  Using the LOG.trace statement alone and letting the 
logger suppress the output is good enough.  new Exception().printStackTrace(); 
will print the stacktrace to the .out file.  This means we get duplicated output 
in the logs as well as on stdout, which is redirected to the .out log file.  In 
general Hadoop does not log anything to the .out file other than the current 
ulimit configuration.  I would suggest skipping the new 
Exception().printStackTrace() statement altogether.

> RefCountedDB printing of stacktrace should be moved to trace logging
> 
>
> Key: HDDS-1833
> URL: https://issues.apache.org/jira/browse/HDDS-1833
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-1833.01.patch
>
>
> RefCountedDB logs the stackTrace for both increment and decrement, this 
> pollutes the logs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1833) RefCountedDB printing of stacktrace should be moved to trace logging

2019-07-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891993#comment-16891993
 ] 

Eric Yang commented on HDDS-1833:
-

We usually don't need the conditional check unless the log statement contains 
expensive computation before it can be printed.  In this case, we are simply 
concatenating two strings.  Using the LOG.trace statement alone and letting the 
logger suppress the output is good enough.  new Exception().printStackTrace(); 
will print the stacktrace to the .out file.  This means we get duplicated output 
in the logs as well as on stdout, which is redirected to the .out log file.  In 
general Hadoop does not log anything to the .out file other than the current 
ulimit configuration.  I would suggest skipping the new 
Exception().printStackTrace() statement altogether.

> RefCountedDB printing of stacktrace should be moved to trace logging
> 
>
> Key: HDDS-1833
> URL: https://issues.apache.org/jira/browse/HDDS-1833
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-1833.01.patch
>
>
> RefCountedDB logs the stackTrace for both increment and decrement, this 
> pollutes the logs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent

2019-07-23 Thread Eric Yang (JIRA)
Eric Yang created HDDS-1847:
---

 Summary: Datanode Kerberos principal and keytab config key looks 
inconsistent
 Key: HDDS-1847
 URL: https://issues.apache.org/jira/browse/HDDS-1847
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Eric Yang


Ozone Kerberos configuration can be very confusing:

| config name | Description |
| hdds.scm.kerberos.principal | SCM service principal |
| hdds.scm.kerberos.keytab.file | SCM service keytab file |
| ozone.om.kerberos.principal | Ozone Manager service principal |
| ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
| hdds.scm.http.kerberos.principal | SCM service spnego principal |
| hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
| ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
| ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
| hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
| hdds.datanode.http.kerberos.principal | Datanode spnego principal |
| dfs.datanode.kerberos.principal | Datanode service principal |
| dfs.datanode.keytab.file | Datanode service keytab file |

The prefixes are very different for each of the datanode configuration keys.  It 
would be nice to have some consistency for the datanode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891178#comment-16891178
 ] 

Eric Yang edited comment on HDFS-14461 at 7/23/19 4:14 PM:
---

[~hexiaoqiao] {quote}SecurityConfUtil#initSecurity does not set principal or 
keytab currently. I try to reference to corresponding SPNEGO principal and 
test.keytab and throws another exception as following,{quote}

I think "Authentication required" is caused by the caller did not send the 
authentication header.

{code}
conf.set(DFSConfigKeys.DFS_WEBHDFS_AUTHENTICATION_FILTER_KEY,
NoAuthFilter.class.getName());
{code}

The code above sets dfs.web.authentication.filter to the no-authentication 
filter.  This is what turns off the SPNEGO filter.  You should configure it to 
use AuthenticationFilter, ProxyUserAuthenticationFilter, or AuthFilter to get a 
proper SPNEGO setup.
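
As a sketch, the same conf.set pattern as the snippet above but pointing at a 
SPNEGO-capable filter (the AuthFilter package name here is an assumption based 
on trunk; adjust to whatever is on the test classpath):

{code:java}
// Switches WebHDFS back to the SPNEGO-capable AuthFilter instead of NoAuthFilter.
conf.set(DFSConfigKeys.DFS_WEBHDFS_AUTHENTICATION_FILTER_KEY,
    org.apache.hadoop.hdfs.web.AuthFilter.class.getName());
{code}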

HADOOP-16314 and HADOOP-16354 are designed to inspect 
hadoop.http.filter.initializers, and if AuthenticationFilter or 
ProxyUserAuthenticationFilter is set in the config, they switch to AuthFilter 
because HDFS uses AuthFilter to issue delegation tokens.  You were closer to 
getting successful authentication when you got "Authentication required".  The 
caller side must send a valid SPNEGO negotiation header that looks like this:

{code}
Authorization: Negotiate [base64 hex string of user tgt]
{code}

Example code for generating the token for the Kerberos authentication negotiate 
header is available in the hadoop-common 
TestKerberosAuthenticationHandler#testRequestWithAuthorization test case.
Please make sure both the server side and client side configuration have 
Kerberos turned on, otherwise the client may not send the required header for 
authentication.


was (Author: eyang):
[~hexiaoqiao] {quote}SecurityConfUtil#initSecurity does not set principal or 
keytab currently. I try to reference to corresponding SPNEGO principal and 
test.keytab and throws another exception as following,{quote}

I think "Authentication required" is caused by the caller did not send the 
authentication header.

{code}
conf.set(DFSConfigKeys.DFS_WEBHDFS_AUTHENTICATION_FILTER_KEY,
NoAuthFilter.class.getName());
{code}

The code above sets dfs.web.authentication.filter to the no-authentication 
filter.  This is what turns off the SPNEGO filter.  You should configure it to 
use AuthenticationFilter, ProxyUserAuthenticationFilter, or AuthFilter to get a 
proper SPNEGO setup.

HADOOP-16314 and HADOOP-16354 are designed to inspect 
hadoop.http.filter.initializers, and if AuthenticationFilter or 
ProxyUserAuthenticationFilter is set in the config, they switch to AuthFilter 
because HDFS uses AuthFilter to issue delegation tokens.  You were closer to 
getting successful authentication when you got "Authentication required".  The 
caller side must send a valid SPNEGO negotiation header that looks like this:

{code}
Authorization: Negotiate [base64 hex string of user tgt]
{code}

Example code for generating the token for the Kerberos authentication negotiate 
header is available in the hadoop-common 
TestKerberosAuthenticationHandler#testRequestWithAuthorization test case.
Please make sure both the server side and client side configuration have 
Kerberos turned on, otherwise the client may not send the required header for 
authentication.

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: He Xiaoqiao
>Priority: Major
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to 

[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891178#comment-16891178
 ] 

Eric Yang commented on HDFS-14461:
--

[~hexiaoqiao] {quote}SecurityConfUtil#initSecurity does not set principal or 
keytab currently. I try to reference to corresponding SPNEGO principal and 
test.keytab and throws another exception as following,{quote}

I think "Authentication required" is caused by the caller did not send the 
authentication header.

{code}
conf.set(DFSConfigKeys.DFS_WEBHDFS_AUTHENTICATION_FILTER_KEY,
NoAuthFilter.class.getName());
{code}

The code above sets dfs.web.authentication.filter to the no-authentication 
filter.  This is what turns off the SPNEGO filter.  You should configure it to 
use AuthenticationFilter, ProxyUserAuthenticationFilter, or AuthFilter to get a 
proper SPNEGO setup.

HADOOP-16314 and HADOOP-16354 are designed to inspect 
hadoop.http.filter.initializers, and if AuthenticationFilter or 
ProxyUserAuthenticationFilter is set in the config, they switch to AuthFilter 
because HDFS uses AuthFilter to issue delegation tokens.  You were closer to 
getting successful authentication when you got "Authentication required".  The 
caller side must send a valid SPNEGO negotiation header that looks like this:

{code}
Authorization: Negotiate [base64 hex string of user tgt]
{code}

Example code for generating the token for the Kerberos authentication negotiate 
header is available in the hadoop-common 
TestKerberosAuthenticationHandler#testRequestWithAuthorization test case.
Please make sure both the server side and client side configuration have 
Kerberos turned on, otherwise the client may not send the required header for 
authentication.
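
For reference, a minimal JGSS sketch of building such a header value (this is 
not the hadoop-common test code; it is an illustrative standalone snippet, and 
it must run inside Subject.doAs() with a logged-in Kerberos subject to produce a 
real token):

{code:java}
import java.util.Base64;
import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.GSSName;
import org.ietf.jgss.Oid;

public final class SpnegoHeaderSketch {
  // Builds the value for "Authorization: Negotiate <token>" against HTTP@host.
  public static String negotiateHeaderValue(String host) throws Exception {
    GSSManager manager = GSSManager.getInstance();
    GSSName serverName = manager.createName("HTTP@" + host,
        GSSName.NT_HOSTBASED_SERVICE);
    GSSContext context = manager.createContext(serverName,
        new Oid("1.3.6.1.5.5.2") /* SPNEGO mechanism OID */, null,
        GSSContext.DEFAULT_LIFETIME);
    context.requestMutualAuth(true);
    byte[] token = context.initSecContext(new byte[0], 0, 0);
    context.dispose();
    return "Negotiate " + Base64.getEncoder().encodeToString(token);
  }
}
{code}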

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: He Xiaoqiao
>Priority: Major
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) 
> at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at 

[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-20 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889399#comment-16889399
 ] 

Eric Yang commented on HDDS-1712:
-

[~anu]  {quote}Case in point when you told me that Ozone is full of findbugs 
issues and checkstyle issues. When I asked you to compare with Hadoop you ran 
away, because like this it was blatantly false.{quote}

With regard to findbugs issues, Hadoop does not require the Findbugs jar file on 
the classpath at runtime.  Most of Hadoop's findbugs exclusions were added to 
deal with object serialization generated by protobuf codegen.  Those bugs were 
flagged manually because of codegen and unfortunate compatibility constraints 
around keeping up with FSImage mutations, and exclusions are only used as a last 
resort.  Ozone uses annotations to suppress findbugs rather quickly, and its 
bugs are not at the same level of difficulty as the hard-to-solve ones in 
Hadoop.  The usage is very different.  Why is having Findbugs on the classpath 
not good?  Findbugs depends on an older XML parser, which has CVE 
vulnerabilities.  If we don't need the jar file on the classpath, please remove 
it from the runtime.  It is hard to identify how people would misuse 
vulnerabilities when a collection of them is hidden in the software.  Due 
diligence would help to keep security bugs down.  I offered the patches, and 
Marton said it's good to fix them.  Whether you accept or reject the patches is 
your choice.  If you allow sudo in the container, you will only end up with more 
code that does remote root downloads and execution at runtime.  This makes Ozone 
more unpredictable and dangerous, and it will be hard to clean up later.

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.hadoop-docker-ozone.patch, 
> HDDS-1712.001.patch, HDDS-1712.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-19 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889280#comment-16889280
 ] 

Eric Yang commented on HDDS-1712:
-

{quote}I am -1; on this patch and wasteful discussion. As I have clearly said 
many times; these are to be treated as examples and documentation, not as part 
of the product. Unless there is a change in that status, I am not willing to 
commit this patch.{quote}

With all due respect, I cannot agree that this is just examples and 
documentation.  According to the [Alpha 
cluster|https://hadoop.apache.org/ozone/docs/0.4.0-alpha/runningviadocker.html] 
documentation, this is the first thing that you ask people to try.  Whether you 
try Ozone from the binary release or build from source, the Ozone-runner image 
is used in all paths.  Hence, according to the Ozone website, there is no path 
that avoids the vulnerable docker image.  Although there is a way to set things 
up manually with the tarball binary without running the smoke test, that path is 
not documented in any known material.  Hence, this vulnerable docker image puts 
everyone who tries Ozone at risk.  [Security is 
mandatory|https://www.apache.org/foundation/how-it-works.html#philosophy] is 
one of Apache's guiding principles.  Please be considerate of others: at minimum 
fully document the tarball instructions to avoid the mistake, or simply polish 
the code to a more presentable state before release.

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.hadoop-docker-ozone.patch, 
> HDDS-1712.001.patch, HDDS-1712.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-19 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1712:

Status: Patch Available  (was: Reopened)

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.hadoop-docker-ozone.patch, 
> HDDS-1712.001.patch, HDDS-1712.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-19 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reopened HDDS-1712:
-

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.hadoop-docker-ozone.patch, 
> HDDS-1712.001.patch, HDDS-1712.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-19 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889252#comment-16889252
 ] 

Eric Yang commented on HDDS-1712:
-

[~elek] HDDS-1712.001.hadoop-docker-ozone.patch and HDDS-1712.002.patch together 
should remove sudo to make the Ozone-runner image less powerful.

I can only get 33 out of 110 test cases to pass on my own test machine without 
the patch.
When the patch is applied, the same result appears in the smoke test report.

I don't have an s3 account to validate whether the s3 test cases would pass.  
Please help with the verification.  Thanks

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.hadoop-docker-ozone.patch, 
> HDDS-1712.001.patch, HDDS-1712.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-19 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1712:

Attachment: HDDS-1712.002.patch

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.hadoop-docker-ozone.patch, 
> HDDS-1712.001.patch, HDDS-1712.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-19 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1712:

Attachment: HDDS-1712.001.hadoop-docker-ozone.patch

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.hadoop-docker-ozone.patch, 
> HDDS-1712.001.patch, HDDS-1712.002.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-19 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889200#comment-16889200
 ] 

Eric Yang commented on HDDS-1773:
-

Patch 005 fixes some minor directory configuration errors and the core-site.xml 
setup.

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch, 
> HDDS-1773.003.patch, HDDS-1773.004.patch, HDDS-1773.005.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-19 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1773:

Attachment: HDDS-1773.005.patch

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch, 
> HDDS-1773.003.patch, HDDS-1773.004.patch, HDDS-1773.005.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-19 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1773:

Attachment: HDDS-1773.004.patch

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch, 
> HDDS-1773.003.patch, HDDS-1773.004.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1828) smoke test core-site.xml is confusing to user

2019-07-18 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned HDDS-1828:
---

Assignee: Xiaoyu Yao

> smoke test core-site.xml is confusing to user
> -
>
> Key: HDDS-1828
> URL: https://issues.apache.org/jira/browse/HDDS-1828
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Xiaoyu Yao
>Priority: Major
>
> In smoke test, Hadoop code is placed in /opt/hadoop, and Ozone code is placed 
> in /opt/ozone.
> There are two copies of core-site.xml, one in $HADOOP_CONF_DIR, and another 
> one in $OZONE_CONF_DIR.  When user look at the copy in $OZONE_CONF_DIR, 
> core-site.xml is empty.  This may lead to assumption that hadoop is running 
> with local file system.  Most application will reference to core-site.xml on 
> the classpath.  Hence, it depends on how the application is carefully setup 
> to avoid using $OZONE_CONF_DIR as a single node Hadoop.  It may make sense to 
> symlink $OZONE_CONF_DIR to $HADOOP_CONF_DIR to prevent mistakes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1828) smoke test core-site.xml is confusing to user

2019-07-18 Thread Eric Yang (JIRA)
Eric Yang created HDDS-1828:
---

 Summary: smoke test core-site.xml is confusing to user
 Key: HDDS-1828
 URL: https://issues.apache.org/jira/browse/HDDS-1828
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Eric Yang


In the smoke test, Hadoop code is placed in /opt/hadoop, and Ozone code is 
placed in /opt/ozone.
There are two copies of core-site.xml, one in $HADOOP_CONF_DIR and another in 
$OZONE_CONF_DIR.  When a user looks at the copy in $OZONE_CONF_DIR, 
core-site.xml is empty.  This may lead to the assumption that hadoop is running 
with the local file system.  Most applications will reference the core-site.xml 
on the classpath.  Hence, it depends on how carefully the application is set up 
to avoid treating $OZONE_CONF_DIR as a single-node Hadoop configuration.  It may 
make sense to symlink $OZONE_CONF_DIR to $HADOOP_CONF_DIR to prevent mistakes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1826) External Ozone client throws exception when accessing data in docker container

2019-07-18 Thread Eric Yang (JIRA)
Eric Yang created HDDS-1826:
---

 Summary: External Ozone client throws exception when accessing 
data in docker container
 Key: HDDS-1826
 URL: https://issues.apache.org/jira/browse/HDDS-1826
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Eric Yang


The external Ozone client has trouble accessing Ozone data hosted in a docker 
container when data replication is set to 3.  This RPC error message is thrown:

{code}
Error while calling command 
(org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler@2903c6ff): 
org.apache.hadoop.ipc.RemoteException(com.google.protobuf.InvalidProtocolBufferException):
 While parsing a protocol message, the input ended unexpectedly in the middle 
of a field.  This could mean either than the input has been truncated or that 
an embedded message misreported its own length.
at 
com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:70)
at 
com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:728)
at 
com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
at 
com.google.protobuf.CodedInputStream.readRawVarint32(CodedInputStream.java:378)
at 
com.google.protobuf.CodedInputStream.readEnum(CodedInputStream.java:343)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneAclInfo.(OzoneManagerProtocolProtos.java:42318)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneAclInfo.(OzoneManagerProtocolProtos.java:42236)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneAclInfo$1.parsePartialFrom(OzoneManagerProtocolProtos.java:42366)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneAclInfo$1.parsePartialFrom(OzoneManagerProtocolProtos.java:42361)
at 
com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$VolumeInfo.(OzoneManagerProtocolProtos.java:21457)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$VolumeInfo.(OzoneManagerProtocolProtos.java:21376)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$VolumeInfo$1.parsePartialFrom(OzoneManagerProtocolProtos.java:21501)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$VolumeInfo$1.parsePartialFrom(OzoneManagerProtocolProtos.java:21496)
at 
com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$CreateVolumeRequest.(OzoneManagerProtocolProtos.java:23836)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$CreateVolumeRequest.(OzoneManagerProtocolProtos.java:23783)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$CreateVolumeRequest$1.parsePartialFrom(OzoneManagerProtocolProtos.java:23887)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$CreateVolumeRequest$1.parsePartialFrom(OzoneManagerProtocolProtos.java:23882)
at 
com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OMRequest.(OzoneManagerProtocolProtos.java:2023)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OMRequest.(OzoneManagerProtocolProtos.java:1935)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OMRequest$1.parsePartialFrom(OzoneManagerProtocolProtos.java:2607)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OMRequest$1.parsePartialFrom(OzoneManagerProtocolProtos.java:2602)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at 
org.apache.hadoop.ipc.RpcWritable$ProtobufWrapper.readFrom(RpcWritable.java:125)
at 
org.apache.hadoop.ipc.RpcWritable$Buffer.getValue(RpcWritable.java:187)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:514)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

at 

[jira] [Commented] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888291#comment-16888291
 ] 

Eric Yang commented on HDDS-1773:
-

[~elek] Patch 003 adds a throttle-acid.sh script that periodically throttles the 
datanode container's read IO: throttled for 1 second, then unthrottled for 10 
seconds.  The README file contains instructions on how to use this script as a 
privileged user.  Does this work for you?

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch, 
> HDDS-1773.003.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-18 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1773:

Attachment: HDDS-1773.003.patch

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch, 
> HDDS-1773.003.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-07-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888099#comment-16888099
 ] 

Eric Yang commented on HDFS-14461:
--

[~hexiaoqiao] welcome.

> RBF: Fix intermittently failing kerberos related unit test
> --
>
> Key: HDFS-14461
> URL: https://issues.apache.org/jira/browse/HDFS-14461
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: CR Hota
>Assignee: Fengnan Li
>Priority: Major
>
> TestRouterHttpDelegationToken#testGetDelegationToken fails intermittently. It 
> may be due to some race condition before using the keytab that's created for 
> testing.
>  
> {code:java}
>  Failed
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken
>  Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/1e9ab9cc/images/16x16/red.png! 
> #26721|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/] )
>  [Took 89 
> ms.|https://builds.apache.org/job/PreCommit-HDFS-Build/26721/testReport/org.apache.hadoop.hdfs.server.federation.security/TestRouterHttpDelegationToken/testGetDelegationToken/history]
>   
>  Error Message
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED
> h3. Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.security.KerberosAuthException: failure to login: for 
> principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) 
> at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.setup(TestRouterHttpDelegationToken.java:99)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) 
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 
> Caused by: org.apache.hadoop.security.KerberosAuthException: failure to 
> login: for principal: router/localh...@example.com from keytab 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/target/test/data/SecurityConfUtil/test.keytab
>  javax.security.auth.login.LoginException: Integrity check on decrypted field 
> failed (31) - PREAUTH_FAILED at 
> 

[jira] [Comment Edited] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888095#comment-16888095
 ] 

Eric Yang edited comment on HDDS-1712 at 7/18/19 3:56 PM:
--

[~elek] Your output seems to indicate multiple datanode pods.  This looks 
different from what I would have expected; shouldn't it look like this:

{code}
$ kubectl get pod
NAME READY   STATUSRESTARTS   AGE
datanode-0   3/3 Running   0  11m
om-0 1/1 Running   0  11m
s3g-01/1 Running   0  11m
scm-01/1 Running   0  11m
{code}

Where the datanode-0 pod has 3 instances running?  We can take this offline in 
HDDS-1825.  However, I think it is not fair to ask the sudo-removal patch to 
include fully working smoke test code on a kubernetes cluster, because the 
kubernetes cluster code is incomplete.  I can include a patch for the smoke test 
to work with the docker-compose cluster, if you are open to this.  Thoughts?


was (Author: eyang):
[~elek] Your output seems to indicate multiple datanode pods.  This looks 
different than what I would expected, shouldn't it look like this:

{code}
{code}
$ kubectl get pod
NAME READY   STATUSRESTARTS   AGE
datanode-0   3/3 Running   0  11m
om-0 1/1 Running   0  11m
s3g-01/1 Running   0  11m
scm-01/1 Running   0  11m
{code}

Where datanode-0 pod has 3 instances running?  We can take this offline in 
HDDS-1825.  However, I think it is not fair to ask removal of sudo patch to 
include a full working smoke test code on kubernetes cluster because the 
kubernetes cluster code is incomplete.  I can include patch for smoke test to 
work with docker-compose cluster, if you are open to this.  Thoughts?

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888095#comment-16888095
 ] 

Eric Yang commented on HDDS-1712:
-

[~elek] Your output seems to indicate multiple datanode pods.  This looks 
different from what I would have expected; shouldn't it look like this:

{code}
$ kubectl get pod
NAME READY   STATUSRESTARTS   AGE
datanode-0   3/3 Running   0  11m
om-0 1/1 Running   0  11m
s3g-01/1 Running   0  11m
scm-01/1 Running   0  11m
{code}

Where the datanode-0 pod has 3 instances running?  We can take this offline in 
HDDS-1825.  However, I think it is not fair to ask the sudo-removal patch to 
include fully working smoke test code on a kubernetes cluster, because the 
kubernetes cluster code is incomplete.  I can include a patch for the smoke test 
to work with the docker-compose cluster, if you are open to this.  Thoughts?

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1825) Kubernetes deployment starts only one data node by default

2019-07-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888081#comment-16888081
 ] 

Eric Yang commented on HDDS-1825:
-

I am not familiar with Kubernetes, but the expected output is supposed to look 
like this:

{code}
$ kubectl get pod
NAME READY   STATUSRESTARTS   AGE
datanode-0   3/3 Running   0  11m
om-0 1/1 Running   0  11m
s3g-01/1 Running   0  11m
scm-01/1 Running   0  11m
{code}
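
For what it's worth, with the StatefulSet from the wiki example the three datanodes 
would normally appear as three separate pods (datanode-0 through datanode-2).  A 
purely illustrative way to request that many replicas (the StatefulSet name is an 
assumption) is:

{code}
# illustrative only: ask Kubernetes for three datanode replicas
kubectl scale statefulset datanode --replicas=3
{code}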

> Kubernetes deployment starts only one data node by default
> --
>
> Key: HDDS-1825
> URL: https://issues.apache.org/jira/browse/HDDS-1825
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
>
> By following [Ozone 
> wiki|https://cwiki.apache.org/confluence/display/HADOOP/Deploy+Ozone+to+Kubernetes]
>  to deploy Ozone on Kubernetes, the default deployment result looks like this:
> {code}
> $ kubectl get pod
> NAME READY   STATUSRESTARTS   AGE
> datanode-0   0/1 Pending   0  11m
> om-0 0/1 Pending   0  11m
> s3g-00/1 Pending   0  11m
> scm-00/1 Pending   0  11m
> {code}
> There should be three datanodes for Ozone to work.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1825) Kubernetes deployment starts only one data node by default

2019-07-18 Thread Eric Yang (JIRA)
Eric Yang created HDDS-1825:
---

 Summary: Kubernetes deployment starts only one data node by default
 Key: HDDS-1825
 URL: https://issues.apache.org/jira/browse/HDDS-1825
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Eric Yang


By following [Ozone 
wiki|https://cwiki.apache.org/confluence/display/HADOOP/Deploy+Ozone+to+Kubernetes]
 to deploy Ozone on Kubernetes, the default deployment result looks like this:

{code}
$ kubectl get pod
NAME READY   STATUSRESTARTS   AGE
datanode-0   0/1 Pending   0  11m
om-0 0/1 Pending   0  11m
s3g-00/1 Pending   0  11m
scm-00/1 Pending   0  11m
{code}

There should be three datanodes for Ozone to work.




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-17 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887469#comment-16887469
 ] 

Eric Yang commented on HDDS-1773:
-

[~elek] {quote}I am sorry to say, but I have different opinion (as I tried to 
explain earlier). Sometimes it's notable, sometimes it's not.{quote}

Btw, cgroup can be controlled manually to give greater precision over the number 
of IOs committed to disk as a group operation.  For example:

{code}
echo ":  " > /cgrp/blkio.throttle.io_serviced
{code}

This throttle configuration can be changed on a time interval to produce slow 
and intermittent IO at a fixed cadence.  Would this work with your train of 
thought?

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1771) Add slow IO disk test to fault injection test

2019-07-17 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887407#comment-16887407
 ] 

Eric Yang commented on HDDS-1771:
-

{quote}Neither HDDS-1773 nor HDDS-1771 test the problems what I described 
here.{quote}

I am sorry, but the problem statement in the test methodology was flawed.  One slow 
IO read or write out of 100 is statistically insignificant and is masked by OS and 
application caches, which I have [explained in an HDDS-1773 
comment|https://issues.apache.org/jira/browse/HDDS-1773?focusedCommentId=16882206=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16882206].
  If you are unable to see past the flaws in the problem statement, then we 
have reached another impasse.

{quote}I can execute it, but what do you expect? What is required for a green 
build? If the disk speed parameters are low the tests will be failed all time 
time. If they are high it will be passed.{quote}

This test is a good way to find the default threshold for the slowest disk that can 
be supported by the current code base.  These default numbers may start to fail over 
time as the Ozone code base becomes more complex.  When that happens, we can 
identify which patch triggered the performance regression and improve the disk 
health calculation based on heuristics.  I think there is some value in running 
these tests nightly.

> Add slow IO disk test to fault injection test
> -
>
> Key: HDDS-1771
> URL: https://issues.apache.org/jira/browse/HDDS-1771
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, 
> HDDS-1771.003.patch
>
>
> In fault injection test, one possible simulation is to create slow disk IO.  
> This test can assist in developing a set of timing profiles that works for 
> Ozone cluster.  When we write to a file, the data travels across a bunch of 
> buffers and caches before it is effectively written to the disk.  By 
> controlling cgroup blkio rate in Linux Kernel, we can simulate slow disk 
> read, write.  Docker provides the following parameters to control cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to read/write test with docker compose file as 
> parameters to test the timing profiles.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-17 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887394#comment-16887394
 ] 

Eric Yang commented on HDDS-1712:
-

{quote}Yes, AFAIK it's fine. If you have any error message, let me know. Happy 
to help. (But maybe not in this jira, but using the usual channels...)

(I am just wondering: If you can't deploy, how do you know how does it work? 
How do you know if it's wrong...){quote}

According to the kubectl output, the pod configuration does not have 3 datanodes:

{code}
$ kubectl get pod
NAME READY   STATUSRESTARTS   AGE
datanode-0   0/1 Pending   0  11m
om-0 0/1 Pending   0  11m
s3g-00/1 Pending   0  11m
scm-00/1 Pending   0  11m
{code}
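
A quick way to see why these pods stay in Pending (purely illustrative; any of the 
pod names above would do) is:

{code}
kubectl describe pod datanode-0
{code}

The Events section at the end of the describe output usually states the scheduling 
reason.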

ozone.replication is not set to 1; how does this work?

The pod configuration in JSON format indicates there are no environment variables 
for CORE-SITE.XML.  How does this work?

{code}
$ kubectl get pod -o json 
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"annotations": {
"prdatanodeetheus.io/path": "/prom",
"prdatanodeetheus.io/port": "9882",
"prdatanodeetheus.io/scrape": "true"
},
"creationTimestamp": "2019-07-17T19:30:41Z",
"generateName": "datanode-",
"labels": {
"app": "ozone",
"component": "datanode",
"controller-revision-hash": "datanode-5f4d6556b8",
"statefulset.kubernetes.io/pod-name": "datanode-0"
},
"name": "datanode-0",
"namespace": "default",
"ownerReferences": [
{
"apiVersion": "apps/v1",
"blockOwnerDeletion": true,
"controller": true,
"kind": "StatefulSet",
"name": "datanode",
"uid": "449168e5-c9b9-443c-b65b-475a97e64710"
}
],
"resourceVersion": "99413",
"selfLink": "/api/v1/namespaces/default/pods/datanode-0",
"uid": "46f1ad81-312e-4e33-b0a7-8496937511bd"
},
"spec": {
"affinity": {
"podAntiAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": [
{
"labelSelector": {
"matchExpressions": [
{
"key": "component",
"operator": "In",
"values": [
"datanode"
]
}
]
},
"topologyKey": "kubernetes.io/hostname"
}
]
}
},
"containers": [
{
"args": [
"ozone",
"datanode"
],
"envFrom": [
{
"configMapRef": {
"name": "config"
}
}
],
"image": "eyang/ozone:0.5.0-SNAPSHOT",
"imagePullPolicy": "IfNotPresent",
"name": "datanode",
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/data",
"name": "data"
},
{
"mountPath": 
"/var/run/secrets/kubernetes.io/serviceaccount",
"name": "default-token-phlhw",
"readOnly": true
}
]
}
],
"dnsPolicy": "ClusterFirst",
"enableServiceLinks": true,
"hostname": "datanode-0",
"priority": 0,
"restartPolicy": "Always",
"schedulerName": "default-scheduler",

[jira] [Commented] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-17 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887388#comment-16887388
 ] 

Eric Yang commented on HDDS-1773:
-

{quote} As I wrote earlier: random read/write failure/slowness.

Let's say one read request form every 100 requests is significant slower than 
the others due to a disk error (I would call it intermittent disk slowness) 
This can't be reproduced with this approach.{quote}

In my [previous 
comment|https://issues.apache.org/jira/browse/HDDS-1773?focusedCommentId=16882206=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16882206],
 I explained that one slow read out of every 100 IOs is not noticeable due to OS and 
application caching.  Unless a constant throttle is applied for a period of time, 
the JVM will barely observe one slow IO.  However, if there are bad sectors in the 
simulated disk, the statistical difference over a period of time can be measured 
and quantified.  This tool can help improve the Ozone code for monitoring disk 
health.  Does this explanation clarify the approach that was taken?

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-17 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887282#comment-16887282
 ] 

Eric Yang commented on HDDS-1712:
-

{quote}Did you try it out, or is this your expectation?{quote}

I tried it out using the [Deploy Ozone to 
Kubernetes|https://cwiki.apache.org/confluence/display/HADOOP/Deploy+Ozone+to+Kubernetes]
 instructions on the Ozone wiki.  No cluster deployed successfully.  Are these 
instructions up to date?

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14461) RBF: Fix intermittently failing kerberos related unit test

2019-07-17 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887266#comment-16887266
 ] 

Eric Yang commented on HDFS-14461:
--

[~hexiaoqiao] This is not related to HADOOP-16354 or HADOOP-16314.  The error 
message indicates there is a race condition between MiniKdc startup and 
router.start().  The master key for MiniKdc has not yet been written in the 
setup() method, while the router is already attempting to start up and log in to 
Kerberos.

If I put a one second sleep after 

{code}
Configuration conf = SecurityConfUtil.initSecurity();
{code}

With that change the router server starts properly.  However, the test fails with:

{code}
[ERROR] 
testGetDelegationToken(org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken)
  Time elapsed: 1.551 s  <<< ERROR!
java.io.IOException: Security enabled but user not authenticated by filter
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:551)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$800(WebHdfsFileSystem.java:136)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:898)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:864)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:663)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:701)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:697)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1749)
at 
org.apache.hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken.testGetDelegationToken(TestRouterHttpDelegationToken.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 

[jira] [Commented] (HDDS-1771) Add slow IO disk test to fault injection test

2019-07-17 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887186#comment-16887186
 ] 

Eric Yang commented on HDDS-1771:
-

{quote}With an always slow disk the scm can't be started therefore there 
couldn't be any in-flight connections. {quote}

Not true.  Even with a slow disk, it is possible to start scm.  In the case 
where disk IO is barely sufficient, scm can start and write data to disk buffers 
(application-side cache); it only starts to degrade after some IO operations.  
In that case, an IOException may be thrown once the scm disk is detected as the 
bottleneck.

{quote}It's not a ready to use test, I can't schedule it to run every 
night.{quote}

Not true; try creating a maven job and running:

{code}
mvn -f pom.ozone.xml clean verify -Dmaven.javadoc.skip=true 
-Pit,docker-build,dist
{code}

This command works when the HDDS-1554 and HDDS-1771 patches are both applied.
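
For a nightly run, a plain cron entry would be enough; the schedule and checkout 
path below are placeholder assumptions:

{code}
# illustrative crontab entry: run the fault injection suite at 02:00 every night
0 2 * * * cd /path/to/hadoop && mvn -f pom.ozone.xml clean verify -Dmaven.javadoc.skip=true -Pit,docker-build,dist
{code}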

{quote}I think the real question (at least for me) is that how the 
intermittent/random read/write failures/slowness are handled, but this approach 
can't test these questions.{quote}

Based on our meeting, and on your feedback about not conflating slow disks with 
intermittent failures, we have a separate ticket, HDDS-1773, for intermittent 
failure.  Do you wish to combine both tickets now, or continue to discuss them 
separately?

> Add slow IO disk test to fault injection test
> -
>
> Key: HDDS-1771
> URL: https://issues.apache.org/jira/browse/HDDS-1771
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, 
> HDDS-1771.003.patch
>
>
> In fault injection test, one possible simulation is to create slow disk IO.  
> This test can assist in developing a set of timing profiles that works for 
> Ozone cluster.  When we write to a file, the data travels across a bunch of 
> buffers and caches before it is effectively written to the disk.  By 
> controlling cgroup blkio rate in Linux Kernel, we can simulate slow disk 
> read, write.  Docker provides the following parameters to control cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to read/write test with docker compose file as 
> parameters to test the timing profiles.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886653#comment-16886653
 ] 

Eric Yang commented on HDDS-1712:
-

[~elek] core-site.xml is required because fs.defaultName needs to be specified.  
If there is no core-site.xml with the volume and bucket in the URL, then the test 
code does not test Ozone.

[~anu] Doesn't the Ozone quick start guide refer to using docker-compose to start 
the cluster?  That puts the Docker image on the critical path for most users trying 
it out.  Why ask people to try it out with docker if you have no intention of 
finishing what you started?

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1771) Add slow IO disk test to fault injection test

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886645#comment-16886645
 ] 

Eric Yang commented on HDDS-1771:
-

{quote} But can you please defined what is the expected behavior? It's not 
clear (for me) from the tests. I assume that a good test should have some kind 
of assertions. What is the assertion here?{quote}

The existing ITReadWrite tests are supposed to pass unless the user-defined 
rate is too slow for normal operations.  When that happens, there should be 
an error message in the logs or UI reporting unhealthy disks/nodes.

{quote}What is your expectation in case of a very slow hard disk? To drop 
client connections? (If I understood well, this is what you mentioned.). To 
throw an IOException?{quote}

An IOException may be thrown on a connection that is in flight.  If the connection 
has not been established yet, it may fail with connection refused or service 
unavailable exceptions.

The HA logic and disk health detection logic haven't been implemented yet.  Those 
tests can be added later; this JIRA can serve as a tuning knob for testing slow 
disks to find the minimum IO rate required for normal operation.

> Add slow IO disk test to fault injection test
> -
>
> Key: HDDS-1771
> URL: https://issues.apache.org/jira/browse/HDDS-1771
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, 
> HDDS-1771.003.patch
>
>
> In fault injection test, one possible simulation is to create slow disk IO.  
> This test can assist in developing a set of timing profiles that works for 
> Ozone cluster.  When we write to a file, the data travels across a bunch of 
> buffers and caches before it is effectively written to the disk.  By 
> controlling cgroup blkio rate in Linux Kernel, we can simulate slow disk 
> read, write.  Docker provides the following parameters to control cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to read/write test with docker compose file as 
> parameters to test the timing profiles.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reopened HDDS-1712:
-

Reopening because security is important.

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886364#comment-16886364
 ] 

Eric Yang commented on HDDS-1712:
-

{quote}2. grep for OZONE-SITE instead of CORE-SITE? The workflow is very 
similar to the docker-compose clusters just using kubernetes configmap instead 
of env files.{quote}

The Kubernetes configmap on trunk looks like this:
{code}
data:
  OZONE-SITE.XML_hdds.datanode.dir: /data/storage
  OZONE-SITE.XML_ozone.scm.datanode.id.dir: /data
  OZONE-SITE.XML_ozone.metadata.dirs: /data/metadata
  OZONE-SITE.XML_ozone.scm.block.client.address: scm-0.scm
  OZONE-SITE.XML_ozone.om.address: om-0.om
  OZONE-SITE.XML_ozone.scm.client.address: scm-0.scm
  OZONE-SITE.XML_ozone.scm.names: scm-0.scm
  OZONE-SITE.XML_ozone.enabled: "true"
  LOG4J.PROPERTIES_log4j.rootLogger: INFO, stdout
  LOG4J.PROPERTIES_log4j.appender.stdout: org.apache.log4j.ConsoleAppender
  LOG4J.PROPERTIES_log4j.appender.stdout.layout: org.apache.log4j.PatternLayout
  LOG4J.PROPERTIES_log4j.appender.stdout.layout.ConversionPattern: '%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n'
{code}

There is no core-site.xml generated.  How can the test case be valid?
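
For comparison, if the test really exercised o3fs, I would expect the configmap to 
carry CORE-SITE entries roughly like the sketch below.  This is a hypothetical 
example only: the volume and bucket names are placeholders, not values taken from 
the existing k8s examples.

{code}
  CORE-SITE.XML_fs.o3fs.impl: org.apache.hadoop.fs.ozone.OzoneFileSystem
  CORE-SITE.XML_fs.defaultFS: o3fs://bucket1.vol1
{code}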

{quote}If I understood well we can agree that the mentioned statement was not 
true and kubernetes examples doesn't use replication factor 1.{quote}

I can agree that a replication factor of 1 does not apply to the k8s tests, but that 
doesn't change the fact that the current k8s tests use an invalid core-site.xml, so 
the passing test results are questionable.

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886360#comment-16886360
 ] 

Eric Yang commented on HDDS-1712:
-

{quote}It's not enforced. As you can add additional mount, the uid lines also 
can be removed.{quote}

Not enforcing security is exactly what went wrong in the evolution of the smoke 
test code.  Why are we still arguing against enforcing it?

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886343#comment-16886343
 ] 

Eric Yang commented on HDDS-1712:
-

{quote}See my comment in the pull request, this is an independent problem. Even 
without sudo I can do the same (use ubuntu image + mount host path){quote}

Please demonstrate.  Can you still do it if -u ${UID}:${GID} is enforced and the 
UID does not have sudo access, even when host mount paths are permissively allowed?  
The docker -u flag and the mount paths can be audited before source code is 
committed.  By implementing a few simple procedures, we make the Ozone docker image 
more secure and reduce the abuse of root power.  We should not give users the false 
impression that we start the container as -u hadoop and then go behind their back 
to run a sudo curl install.  Otherwise, it breaks users' trust in the Ozone-runner 
image.
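
As a minimal sketch of what enforcing the flag looks like (the image name, mount, 
and command below are placeholders, not the actual compose setup):

{code}
# illustrative only: run the Ozone image as the calling host user, with no sudo inside the container
docker run --rm -u "$(id -u):$(id -g)" -v "$PWD/data:/data" <ozone-image> ozone version
{code}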

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886332#comment-16886332
 ] 

Eric Yang edited comment on HDDS-1712 at 7/16/19 5:36 PM:
--

[~elek] 

{quote}Definitely not. This patch breaks something which works currently. If 
some of the mentioned points makes harder to post a proper, fully functional 
patch, please fix that issue in advance . Thanks a lot.{quote}

This is quite disappointing.  The two-branch arrangement makes it impossible to 
provide a fully functional patch upfront.  The docker image must be committed and a 
version produced before the subsequent patch can reference that docker image.  It 
is not possible to provide fully functional patches unless a commit and build tag 
have been made.

In your own code change, you have done exactly this in HDDS-1799.  You are 
committing pull request 4 without a fully functional pull request 1105.  If you 
give yourself a lower standard because you are in control of the source code, why 
do you ask a higher standard of others?  You should not hold others to a double 
standard you cannot meet yourself.

I will provide a second patch for review, but it will not be the exact code to be 
committed, because of the two-phase commit issue in the current code structure.  
Would you be open to a 99% functional second patch?

{quote}I am not sure about kubernetes. Can you please prove this statement (for 
kubernetes).{quote}

I can't find the required core-site.xml values in the k8s examples.

{code}
$ pwd
/home/eyang/test/hadoop/hadoop-ozone/dist/src/main/k8s/examples
[eyang@localhost examples]$ grep -R CORE-SITE *
[eyang@localhost examples]$
{code}

How does the Kubernetes test work if core-site.xml contains no configuration?  
Please walk me through how the config files are generated for Kubernetes.


was (Author: eyang):
[~elek] 

{quote}Definitely not. This patch breaks something which works currently. If 
some of the mentioned points makes harder to post a proper, fully functional 
patch, please fix that issue in advance . Thanks a lot.{quote}

This is quite disappointing.  Two branches arrangement makes it not possible to 
provide fully functional patch upfront.  The docker image must be committed, 
and produced a version, then the sequent patch can reference to the docker 
image.  It is not possible to provide a fully functional patches, unless a 
commit and build tag has been made.

In your own code change, you have done exactly this in HDDS-1799.  You are 
committing pull request 4 without a fully functional pull request 1105.  If you 
give yourself a lower standard because you are in control of the source code.  
Why do you ask higher standard from others?  You should not use the double 
standard on others if you can not meet your own terms.

I will provide a second patch for review, but it will not be the exact code to 
be commit because of the two phase commit issues in current code structure.  
Would you be open to 99% functional patch for the second patch?

{quote}I am not sure about kubernetes. Can you please prove this statement (for 
kubernetes).{quote}

{code}$ pwd
/home/eyang/test/hadoop/hadoop-ozone/dist/src/main/k8s/examples
[eyang@localhost examples]$ grep -R CORE-SITE *
[eyang@localhost examples]${code}

How does Kubernetes test work, if core-site.xml contain no configuration?  
Please educate me the process of config files to be generated for Kubernetes.

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886332#comment-16886332
 ] 

Eric Yang commented on HDDS-1712:
-

[~elek] 

{quote}Definitely not. This patch breaks something which works currently. If 
some of the mentioned points makes harder to post a proper, fully functional 
patch, please fix that issue in advance . Thanks a lot.{quote}

This is quite disappointing.  The two-branch arrangement makes it impossible to 
provide a fully functional patch upfront.  The docker image must be committed and a 
version produced before the subsequent patch can reference that docker image.  It 
is not possible to provide fully functional patches unless a commit and build tag 
have been made.

In your own code change, you have done exactly this in HDDS-1799.  You are 
committing pull request 4 without a fully functional pull request 1105.  If you 
give yourself a lower standard because you are in control of the source code, why 
do you ask a higher standard of others?  You should not hold others to a double 
standard you cannot meet yourself.

I will provide a second patch for review, but it will not be the exact code to be 
committed, because of the two-phase commit issue in the current code structure.  
Would you be open to a 99% functional second patch?

{quote}I am not sure about kubernetes. Can you please prove this statement (for 
kubernetes).{quote}

{code}
$ pwd
/home/eyang/test/hadoop/hadoop-ozone/dist/src/main/k8s/examples
[eyang@localhost examples]$ grep -R CORE-SITE *
[eyang@localhost examples]$
{code}

How does the Kubernetes test work if core-site.xml contains no configuration?  
Please walk me through how the config files are generated for Kubernetes.

> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1771) Add slow IO disk test to fault injection test

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886298#comment-16886298
 ] 

Eric Yang edited comment on HDDS-1771 at 7/16/19 4:58 PM:
--

[~elek] This test helps to develop a set of timing profiles for disk IO rates.  
If the disk is too slow, it would be helpful to detect the IO problem and 
present an informative error message to the system administrator for 
troubleshooting.  This can save time in problem determination.

If disk performance is in a degraded mode, this test can help us develop Ozone 
reliability logic to throttle client connections and save CPU cycles for other 
tasks, like replicating rocksdb to other disks or blacklisting metadata disks.


was (Author: eyang):
[~elek] This test helps to develop a set of timing profiles for disk IO rates.  
If the disk is too slow, it would be helpful to detect the IO problems and 
present informative error message to system administrator for troubleshoots.  
This can save time in problem determination.

If the disk performance is degraded mode, this test can help to develop Ozone 
reliability logic to throttle client connections and save cpu cycles to perform 
other tasks like replications and black lists.

> Add slow IO disk test to fault injection test
> -
>
> Key: HDDS-1771
> URL: https://issues.apache.org/jira/browse/HDDS-1771
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, 
> HDDS-1771.003.patch
>
>
> In fault injection test, one possible simulation is to create slow disk IO.  
> This test can assist in developing a set of timing profiles that works for 
> Ozone cluster.  When we write to a file, the data travels across a bunch of 
> buffers and caches before it is effectively written to the disk.  By 
> controlling cgroup blkio rate in Linux Kernel, we can simulate slow disk 
> read, write.  Docker provides the following parameters to control cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to read/write test with docker compose file as 
> parameters to test the timing profiles.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1771) Add slow IO disk test to fault injection test

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886298#comment-16886298
 ] 

Eric Yang commented on HDDS-1771:
-

[~elek] This test helps to develop a set of timing profiles for disk IO rates.  
If the disk is too slow, it would be helpful to detect the IO problem and 
present an informative error message to the system administrator for 
troubleshooting.  This can save time in problem determination.

If disk performance is in a degraded mode, this test can help us develop Ozone 
reliability logic to throttle client connections and save CPU cycles for other 
tasks, like replication and blacklisting.

> Add slow IO disk test to fault injection test
> -
>
> Key: HDDS-1771
> URL: https://issues.apache.org/jira/browse/HDDS-1771
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, 
> HDDS-1771.003.patch
>
>
> In fault injection test, one possible simulation is to create slow disk IO.  
> This test can assist in developing a set of timing profiles that works for 
> Ozone cluster.  When we write to a file, the data travels across a bunch of 
> buffers and caches before it is effectively written to the disk.  By 
> controlling cgroup blkio rate in Linux Kernel, we can simulate slow disk 
> read, write.  Docker provides the following parameters to control cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to read/write test with docker compose file as 
> parameters to test the timing profiles.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1712) Remove sudo access from Ozone docker image

2019-07-16 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886283#comment-16886283
 ] 

Eric Yang commented on HDDS-1712:
-

[~elek] 
Root can jailbreak from the container when mounting host-level files is allowed, 
such as mounting /etc/passwd, /proc, or /sys/fs.  Pull Request #1053 demonstrates 
the danger of giving the hadoop user unrestricted root privileges: by writing a 
line into the /etc/passwd file, the hadoop user can install a root user on the 
host.  The hadoop user has the power to create chaos when too many privileges are 
given.  We can remove the risk by giving it non-root access in the container.

The hadoop user is given sudo access for binary installation during the test run.  
That package installation logic can instead happen during the compile or package 
phase of the maven build cycle.  Removing the sudo access will force developers to 
rethink how to instrument tests into the running container more efficiently, 
without the duplicated downloads of the test framework from the internet in the 
current smoke test.  If we can expand on the idea of building the docker image 
after tarball creation (HDDS-1495) rather than the current runner image layout, 
then forward progress would be easier.  I find it difficult to take a reactive 
approach, removing the sudo requirement and making the current smoke test work 
with ozone-runner or hadoop-runner, because:

# The sudo code is in a separate branch from the smoke test.  I cannot make smoke 
test changes in this ticket because the smoke test logic resides in another branch.
# There are many binary downloads and installations during the test run.  It takes 
quite a long time to repeatedly install binaries, and on a flaky internet 
connection the test cases fail more often because the test framework cannot be 
installed than because the tests themselves fail.
# The current smoke tests and Kubernetes cluster work with a replication factor of 
1, and many tests use an empty core-site.xml, so the disk operations are not 
distributed.  Hence, I find the current smoke test confusing because the test 
parameters are invalid.
# On-demand configuration changes are needed: maven resource templating allows 
environment variables to be modified prior to the start of a test run.  There is a 
mismatch between the test-generated volume and bucket and the core-site.xml 
configuration.  The bucket creation sequence, configuration file generation, and 
daemon startup happen in no specific order.  The current tests are masking problems 
because an empty configuration leads to using the local disk, which allowed some 
tests to pass.

Properly addressing those problems is a much longer conversation.  That is my 
reasoning for narrowing the scope of this patch to the first step of removing the 
root power.  Would you be open to fixing the smoke test in a follow-up ticket?



> Remove sudo access from Ozone docker image
> --
>
> Key: HDDS-1712
> URL: https://issues.apache.org/jira/browse/HDDS-1712
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1712.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone docker image is given unlimited sudo access to hadoop user.  This poses 
> a security risk where host level user uid 1000 can attach a debugger to the 
> container process to obtain root access.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884241#comment-16884241
 ] 

Eric Yang commented on HDDS-1773:
-

Patch 002 provides setup-acid.sh and cleanup-acid.sh to generate a faulty disk.  
These scripts require admin privileges to create the faulty virtual disk.  The 
README file contains step-by-step instructions on how to run the ITAcid test case 
to exercise Ozone on the faulty disk.
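
Assuming the failsafe plugin drives these integration tests (as with the other 
fault injection tests in this series), running just this one would look roughly 
like:

{code}
# illustrative only: run the ITAcid integration test against the prepared faulty disk
mvn -f pom.ozone.xml verify -Pit -Dit.test=ITAcid
{code}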

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1773:

Attachment: HDDS-1773.002.patch

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch, HDDS-1773.002.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1773) Add intermittent IO disk test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884164#comment-16884164
 ] 

Eric Yang commented on HDDS-1773:
-

{quote}I agree that it's easy. The problem is that it can't simulate a certain 
type of disk failures.{quote}

Can you give an example?

> Add intermittent IO disk test to fault injection test
> -
>
> Key: HDDS-1773
> URL: https://issues.apache.org/jira/browse/HDDS-1773
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1773.001.patch
>
>
> Disk errors can also be simulated by setting cgroup blkio rate to 0 while 
> Ozone cluster is running.  
> This test will be added to corruption test project and this test will only be 
> performed if there is write access into host cgroup to control the throttle 
> of disk IO.
> Expected result:
> When datanode becomes irresponsive due to slow io, scm must flag the node as 
> unhealthy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1774) Add disk hang test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884161#comment-16884161
 ] 

Eric Yang commented on HDDS-1774:
-

Patch 001 is based on HDDS-1772 patch 3.  This patch adds a disk hang test 
that throttles datanode data disk availability and then runs the standard 
upload and download tests.
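
For context, the throttling itself comes down to near-zero blkio limits on the 
datanode container's data device; the sketch below assumes cgroup v1 with the 
cgroupfs driver, and device 8:16 and the container name ozone_datanode_1 are 
placeholders:

{code}
DEV="8:16"
CID=$(docker inspect -f '{{.Id}}' ozone_datanode_1)
# 1 byte/s effectively hangs the disk (writing 0 would remove the limit)
echo "$DEV 1" > /sys/fs/cgroup/blkio/docker/$CID/blkio.throttle.read_bps_device
echo "$DEV 1" > /sys/fs/cgroup/blkio/docker/$CID/blkio.throttle.write_bps_device
# restore
echo "$DEV 0" > /sys/fs/cgroup/blkio/docker/$CID/blkio.throttle.read_bps_device
echo "$DEV 0" > /sys/fs/cgroup/blkio/docker/$CID/blkio.throttle.write_bps_device
{code}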

> Add disk hang test to fault injection test
> --
>
> Key: HDDS-1774
> URL: https://issues.apache.org/jira/browse/HDDS-1774
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1774.001.patch
>
>
> When disk is corrupted, the disk may show behavior of hang in accessing data. 
>  One of the simulation that can be performed is to set disk IO throughput to 
> 0 bytes/sec to simulate disk hang.  Ozone file system client can detect disk 
> access timeout, and proceed to read/write data to another datanode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1774) Add disk hang test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1774:

Attachment: HDDS-1774.001.patch

> Add disk hang test to fault injection test
> --
>
> Key: HDDS-1774
> URL: https://issues.apache.org/jira/browse/HDDS-1774
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1774.001.patch
>
>
> When disk is corrupted, the disk may show behavior of hang in accessing data. 
>  One of the simulation that can be performed is to set disk IO throughput to 
> 0 bytes/sec to simulate disk hang.  Ozone file system client can detect disk 
> access timeout, and proceed to read/write data to another datanode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1772) Add disk full test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884084#comment-16884084
 ] 

Eric Yang commented on HDDS-1772:
-

Rebased patch 003 onto HDDS-1771 patch 003.

> Add disk full test to fault injection test
> --
>
> Key: HDDS-1772
> URL: https://issues.apache.org/jira/browse/HDDS-1772
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1772.001.patch, HDDS-1772.002.patch, 
> HDDS-1772.003.patch
>
>
> In Read-only test, one of the simulation to verify is the data disk becomes 
> full.  This can be tested by using a small Docker data disk to simulate disk 
> full.  When data disk is full, Ozone should continue to operate, and provide 
> read access to Ozone file system.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1772) Add disk full test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1772:

Attachment: HDDS-1772.003.patch

> Add disk full test to fault injection test
> --
>
> Key: HDDS-1772
> URL: https://issues.apache.org/jira/browse/HDDS-1772
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1772.001.patch, HDDS-1772.002.patch, 
> HDDS-1772.003.patch
>
>
> In Read-only test, one of the simulation to verify is the data disk becomes 
> full.  This can be tested by using a small Docker data disk to simulate disk 
> full.  When data disk is full, Ozone should continue to operate, and provide 
> read access to Ozone file system.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1771) Add slow IO disk test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884016#comment-16884016
 ] 

Eric Yang commented on HDDS-1771:
-

Patch 3 is rebased onto HDDS-1554 patch 13.
The rates can be customized with:

{code}
mvn clean verify -Ddisk.read.bps=1mb -Ddisk.read.iops=120 -Ddisk.write.bps=300k 
-Ddisk.write.iops=30 -Pit,docker-build
{code}

This will exercise the test with:

# read rate: 1mb/s, read ops: 120/s
# write rate: 300k/s, write ops: 30/s
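
Roughly speaking, those properties map onto the Docker device throttling flags 
listed in the issue description; the device path and image name below are 
placeholders, not the values wired into the compose files:

{code}
docker run \
  --device-read-bps=/dev/sdb:1mb \
  --device-read-iops=/dev/sdb:120 \
  --device-write-bps=/dev/sdb:300k \
  --device-write-iops=/dev/sdb:30 \
  <ozone-datanode-image> ...
{code}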

> Add slow IO disk test to fault injection test
> -
>
> Key: HDDS-1771
> URL: https://issues.apache.org/jira/browse/HDDS-1771
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, 
> HDDS-1771.003.patch
>
>
> In fault injection test, one possible simulation is to create slow disk IO.  
> This test can assist in developing a set of timing profiles that works for 
> Ozone cluster.  When we write to a file, the data travels across a bunch of 
> buffers and caches before it is effectively written to the disk.  By 
> controlling cgroup blkio rate in Linux Kernel, we can simulate slow disk 
> read, write.  Docker provides the following parameters to control cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to read/write test with docker compose file as 
> parameters to test the timing profiles.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1771) Add slow IO disk test to fault injection test

2019-07-12 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDDS-1771:

Attachment: HDDS-1771.003.patch

> Add slow IO disk test to fault injection test
> -
>
> Key: HDDS-1771
> URL: https://issues.apache.org/jira/browse/HDDS-1771
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
> Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, 
> HDDS-1771.003.patch
>
>
> In fault injection test, one possible simulation is to create slow disk IO.  
> This test can assist in developing a set of timing profiles that works for 
> Ozone cluster.  When we write to a file, the data travels across a bunch of 
> buffers and caches before it is effectively written to the disk.  By 
> controlling cgroup blkio rate in Linux Kernel, we can simulate slow disk 
> read, write.  Docker provides the following parameters to control cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to read/write test with docker compose file as 
> parameters to test the timing profiles.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


