[jira] [Commented] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
[ https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972177#comment-16972177 ] Stephen O'Donnell commented on HDDS-2446: -

I think there could be an argument for merging DatanodeDetails and DatanodeInfo into a single object, but that would likely be a very large change and I'm not sure it's the best idea either.

{quote}
Just a thought, what if we get the state of all available datanodes at the start of the ReplicationManager cycle? We can avoid multiple lookups for the same datanode.
{quote}

I had considered this, but it doesn't really gain us anything, because:

1. We would need to store the state in a HashMap or similar structure, so we would still pay the price of a lookup per container.
2. The cached data could change part way through a run.

In order to make decisions about how to handle any ContainerReplica, we are going to need to know the NodeStatus (health and OpState) going forward, and I think it's cleaner and more efficient if we reference the DatanodeInfo directly within it. The alternative is to pass the NodeManager object into anything that needs to deal with the replicas and do a lookup per container via the NodeManager. That would not be terrible, but both DatanodeDetails and DatanodeInfo are tied very closely to registration in SCM, so we should be able to control how a DatanodeInfo gets created.

> ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
>
> Key: HDDS-2446
> URL: https://issues.apache.org/jira/browse/HDDS-2446
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Affects Versions: 0.5.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The ContainerReplica object is used by the SCM to track containers reported
> by the datanodes.
> The current fields stored in ContainerReplica are:
> {code}
> final private ContainerID containerID;
> final private ContainerReplicaProto.State state;
> final private DatanodeDetails datanodeDetails;
> final private UUID placeOfBirth;
> {code}
> Now that we have introduced decommission and maintenance mode, the
> replication manager (and potentially other parts of the code) needs to know
> the status of the replica in terms of IN_SERVICE, DECOMMISSIONING,
> DECOMMISSIONED etc to make replication decisions.
> The DatanodeDetails object does not carry this information; however, the
> DatanodeInfo object extends DatanodeDetails and does carry the required
> information.
> As DatanodeInfo extends DatanodeDetails, any place which needs a
> DatanodeDetails can accept a DatanodeInfo instead.
> In this Jira I propose we change the DatanodeDetails stored in
> ContainerReplica to DatanodeInfo.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
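The substitution argument above can be sketched as follows. This is a minimal stand-in for the real Ozone classes (field and method names here are illustrative, not the actual Ozone API): because DatanodeInfo IS-A DatanodeDetails, a ContainerReplica holding a DatanodeInfo still satisfies every caller that only needs a DatanodeDetails, while replication code gains direct access to the node state.

```java
// Minimal stand-ins for the real Ozone classes (shapes assumed for illustration).
class DatanodeDetails {
    final String uuid;
    DatanodeDetails(String uuid) { this.uuid = uuid; }
}

// DatanodeInfo extends DatanodeDetails and additionally carries node state.
class DatanodeInfo extends DatanodeDetails {
    final String opState; // e.g. IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED
    DatanodeInfo(String uuid, String opState) {
        super(uuid);
        this.opState = opState;
    }
    String getOpState() { return opState; }
}

// ContainerReplica holding a DatanodeInfo instead of a DatanodeDetails.
class ContainerReplica {
    private final DatanodeInfo datanodeInfo;
    ContainerReplica(DatanodeInfo info) { this.datanodeInfo = info; }

    // Existing callers that only need a DatanodeDetails keep working,
    // since DatanodeInfo IS-A DatanodeDetails.
    DatanodeDetails getDatanodeDetails() { return datanodeInfo; }

    // New capability: replication decisions can read the op state directly,
    // with no NodeManager lookup per container.
    String getOpState() { return datanodeInfo.getOpState(); }
}

class ReplicaDemo {
    public static void main(String[] args) {
        ContainerReplica replica =
            new ContainerReplica(new DatanodeInfo("uuid-1", "DECOMMISSIONING"));
        System.out.println(replica.getOpState()); // DECOMMISSIONING
    }
}
```

The trade-off discussed in the comment falls out of this shape: the per-container state lives on the object itself, rather than behind a per-container NodeManager lookup.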
[jira] [Commented] (HDFS-14955) RBF: getQuotaUsage() on mount point should return global quota.
[ https://issues.apache.org/jira/browse/HDFS-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972193#comment-16972193 ] Hadoop QA commented on HDFS-14955: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 46s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 23s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 45s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14955 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985594/HDFS-14955.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 73de4827f87d 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b988487 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28292/testReport/ | | Max. process+thread count | 2763 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28292/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF: getQuotaUsage() on mount point should return global quota. > --- > > Key:
[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972201#comment-16972201 ] Hadoop QA commented on HDFS-13811: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 52s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 5s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterFaultTolerant | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-13811 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985596/HDFS-13811.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 648dea06f22d 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b988487 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28293/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28293/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28293/testReport/ | | Max. process+thread count
[jira] [Created] (HDFS-14980) diskbalancer query command always tries to contact to port 9867
Nilotpal Nandi created HDFS-14980: -
Summary: diskbalancer query command always tries to contact to port 9867
Key: HDFS-14980
URL: https://issues.apache.org/jira/browse/HDFS-14980
Project: Hadoop HDFS
Issue Type: Bug
Components: diskbalancer
Reporter: Nilotpal Nandi

The diskbalancer query command always tries to connect to port 9867, even when the datanode IPC port is different. In this setup, the datanode IPC port is set to 20001. The diskbalancer report command works fine and connects to IPC port 20001:

{noformat}
hdfs diskbalancer -report -node 172.27.131.193
19/11/12 08:58:55 INFO command.Command: Processing report command
19/11/12 08:58:57 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
19/11/12 08:58:57 INFO block.BlockTokenSecretManager: Setting block keys
19/11/12 08:58:57 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
19/11/12 08:58:58 INFO command.Command: Reporting volume information for DataNode(s). These DataNode(s) are parsed from '172.27.131.193'.
Processing report command
Reporting volume information for DataNode(s). These DataNode(s) are parsed from '172.27.131.193'.
[172.27.131.193:20001] - : 3 volumes with node data density 0.05.
[DISK: volume-/dataroot/ycloud/dfs/NEW_DISK1/] - 0.15 used: 39343871181/259692498944, 0.85 free: 220348627763/259692498944, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
[DISK: volume-/dataroot/ycloud/dfs/NEW_DISK2/] - 0.15 used: 39371179986/259692498944, 0.85 free: 220321318958/259692498944, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
[DISK: volume-/dataroot/ycloud/dfs/dn/] - 0.19 used: 49934903670/259692498944, 0.81 free: 209757595274/259692498944, isFailed: False, isReadOnly: False, isSkip: False, isTransient: False.
{noformat}

But the diskbalancer query command fails and tries to connect to port 9867 (the default port).
{noformat}
hdfs diskbalancer -query 172.27.131.193
19/11/12 06:37:15 INFO command.Command: Executing "query plan" command.
19/11/12 06:37:16 INFO ipc.Client: Retrying connect to server: /172.27.131.193:9867. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/11/12 06:37:17 INFO ipc.Client: Retrying connect to server: /172.27.131.193:9867. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
..
..
..
19/11/12 06:37:25 ERROR tools.DiskBalancerCLI: Exception thrown while running DiskBalancerCLI.
{noformat}

Expectation: the diskbalancer query command should work without the datanode IPC port having to be specified explicitly.
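The reported behaviour can be modelled in isolation. The helper below is hypothetical (it is not the actual DiskBalancerCLI code): it assumes that when the node argument carries no explicit port, a hard-coded default of 9867 is used instead of the configured datanode IPC port, which is consistent with the logs above.

```java
// Hypothetical model of how a CLI might resolve a datanode address argument.
// Mirrors the reported symptom: with no explicit port, a hard-coded default
// (9867) is used rather than the configured IPC port (20001 in this setup).
class DatanodeAddressSketch {
    static final int DEFAULT_IPC_PORT = 9867;

    // Returns "host:port", appending the default port when none is given.
    static String resolve(String nodeArg) {
        if (nodeArg.contains(":")) {
            return nodeArg; // caller supplied an explicit port
        }
        return nodeArg + ":" + DEFAULT_IPC_PORT;
    }

    public static void main(String[] args) {
        System.out.println(resolve("172.27.131.193"));       // 172.27.131.193:9867
        System.out.println(resolve("172.27.131.193:20001")); // 172.27.131.193:20001
    }
}
```

If this model matches the CLI's behaviour, passing the address as host:port (e.g. `hdfs diskbalancer -query 172.27.131.193:20001`) may serve as a workaround until the command is fixed to pick up the configured IPC port.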
[jira] [Commented] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index
[ https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972250#comment-16972250 ] Feng Yuan commented on HDFS-14617: --

In loadINodeSectionInParallel:

{code:java}
new Runnable() {
  @Override
  public void run() {
    ...
    prog.setCount(Phase.LOADING_FSIMAGE, currentStep, totalLoaded.get());
    ...
  }
}
{code}

Why is this setCount call not made outside of the sub-thread function?

> Improve fsimage load time by writing sub-sections to the fsimage index
> --
>
> Key: HDFS-14617
> URL: https://issues.apache.org/jira/browse/HDFS-14617
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 2.10.0, 3.3.0
>
> Attachments: HDFS-14617.001.patch, ParallelLoading.svg,
> SerialLoading.svg, dirs-single.svg, flamegraph.parallel.svg,
> flamegraph.serial.svg, inodes.svg
>
> Loading an fsimage is basically a single threaded process. The current
> fsimage is written out in sections, eg iNode, iNode_Directory, Snapshots,
> Snapshot_Diff etc. Then at the end of the file, an index is written that
> contains the offset and length of each section. The image loader code uses
> this index to initialize an input stream to read and process each section. It
> is important that one section is fully loaded before another is started, as
> the next section depends on the results of the previous one.
> What I would like to propose is the following:
> 1. When writing the image, we can optionally output sub_sections to the
> index.
> That way, a given section would effectively be split into several
> sections, eg:
> {code:java}
> inode_section offset 10 length 1000
>   inode_sub_section offset 10 length 500
>   inode_sub_section offset 510 length 500
>
> inode_dir_section offset 1010 length 1000
>   inode_dir_sub_section offset 1010 length 500
>   inode_dir_sub_section offset 1510 length 500
> {code}
> Here you can see we still have the original section index, but then we also
> have sub-section entries that cover the entire section. Then a processor can
> either read the full section in serial, or read each sub-section in parallel.
> 2. In the Image Writer code, we should set a target number of sub-sections,
> and then based on the total inodes in memory, it will create that many
> sub-sections per major image section. I think the only sections worth doing
> this for are inode, inode_reference, inode_dir and snapshot_diff. All others
> tend to be fairly small in practice.
> 3. If there are under some threshold of inodes (eg 10M) then don't bother
> with the sub-sections, as a serial load only takes a few seconds at that scale.
> 4. The image loading code can then have a switch to enable 'parallel loading'
> and a 'number of threads' where it uses the sub-sections, or if not enabled
> falls back to the existing logic to read the entire section in serial.
> Working with a large image of 316M inodes and 35GB on disk, I have a proof of
> concept of this change working, allowing just inode and inode_dir to be
> loaded in parallel, but I believe inode_reference and snapshot_diff can be
> made parallel with the same technique.
> Some benchmarks I have are as follows:
> {code:java}
> Threads     1    2    3    4
> ----------------------------
> inodes    448  290  226  189
> inode_dir 326  211  170  161
> Total     927  651  535  488   (MD5 calculation about 100 seconds)
> {code}
> The above table shows the time in seconds to load the inode section and the
> inode_directory section, and then the total load time of the image.
> With 4 threads using the above technique, we are able to more than halve the
> load time of the two sections. With the patch in HDFS-13694 it would take a
> further 100 seconds off the run time, going from 927 seconds to 388, which is
> a significant improvement. Adding more threads beyond 4 has diminishing
> returns, as there are some synchronized points in the loading code to protect
> the in-memory structures.
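The loading scheme proposed above can be sketched as a toy model (this is not the HDFS code: here a "sub-section" is just an offset/length pair and "loading" merely counts bytes, but the structure, splitting a section into sub-sections and processing them on a fixed thread pool while still completing each section before the next starts, follows the proposal):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class SubSectionLoaderSketch {
    // A sub-section, analogous to an (offset, length) entry in the image index.
    static class SubSection {
        final long offset, length;
        SubSection(long offset, long length) { this.offset = offset; this.length = length; }
    }

    // Split a section into roughly equal sub-sections, as the image writer would.
    static List<SubSection> split(long offset, long length, int parts) {
        List<SubSection> subs = new ArrayList<>();
        long chunk = length / parts;
        for (int i = 0; i < parts; i++) {
            long start = offset + i * chunk;
            long len = (i == parts - 1) ? length - i * chunk : chunk;
            subs.add(new SubSection(start, len));
        }
        return subs;
    }

    // "Load" every sub-section in parallel; here loading just counts bytes.
    static long loadInParallel(List<SubSection> subs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong loaded = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        for (SubSection s : subs) {
            futures.add(pool.submit(() -> loaded.addAndGet(s.length)));
        }
        for (Future<?> f : futures) {
            f.get(); // the next section must not start before this one completes
        }
        pool.shutdown();
        return loaded.get();
    }

    public static void main(String[] args) throws Exception {
        // Mirrors the inode_section example: offset 10, length 1000, 4 sub-sections.
        List<SubSection> subs = split(10, 1000, 4);
        System.out.println(loadInParallel(subs, 4)); // 1000
    }
}
```

The join on the futures is what preserves the "one section fully loaded before the next starts" invariant while still parallelising within a section.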
[jira] [Created] (HDDS-2460) Default checksum type is wrong in description
Attila Doroszlai created HDDS-2460: --
Summary: Default checksum type is wrong in description
Key: HDDS-2460
URL: https://issues.apache.org/jira/browse/HDDS-2460
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Client
Reporter: Attila Doroszlai

Default client checksum type is CRC32, but the config item's description says it's SHA256 (leftover from HDDS-1149). The description should be updated to match the actual default value.

{code:title=https://github.com/apache/hadoop-ozone/blob/a6f80c096b5320f50b6e9e9b4ba5f7c7e3544385/hadoop-hdds/common/src/main/resources/ozone-default.xml#L1489-L1497}
<property>
  <name>ozone.client.checksum.type</name>
  <value>CRC32</value>
  <tag>OZONE, CLIENT, MANAGEMENT</tag>
  <description>
    The checksum type [NONE/ CRC32/ CRC32C/ SHA256/ MD5] determines which
    algorithm would be used to compute checksum for chunk data.
    Default checksum type is SHA256.
  </description>
</property>
{code}
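The implied fix is a one-word change to the last sentence of the description so it names the actual default. A sketch of the corrected snippet (not taken from a committed patch):

```xml
<description>
  The checksum type [NONE/ CRC32/ CRC32C/ SHA256/ MD5] determines which
  algorithm would be used to compute checksum for chunk data.
  Default checksum type is CRC32.
</description>
```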
[jira] [Commented] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
[ https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972318#comment-16972318 ] Stephen O'Donnell commented on HDDS-2446: -

I looked into the code a bit more to double-check this area. The only place outside of tests where a DatanodeInfo object gets created is via SCMNodeManager.register() -> nodeStateManager.addNode(), which creates the new DatanodeInfo.

So far as I can tell, nothing cleans a registered node (DatanodeDetails or DatanodeInfo) out of SCM except a restart - it will remember all nodes which have previously registered with it. If a node re-registers, the above chain of calls will give a NodeAlreadyExists exception on registration, which is caught, and a success is still returned to the DN. If a node goes dead, then all its containers will be purged, but if it re-registers without being dead, the containers will still be present, referencing the old DatanodeInfo object, which will not have changed.

One thing we could do is purge the container list on re-registration, as the register command should have a container report which must be processed anyway.

As an aside, I wonder if there is a bug in the re-registration process - the way SCM checks if a node has already registered is to look it up by UUID. If a DN is stopped and changes its IP or hostname, but retains the UUID, then it will 'register' successfully but the DatanodeDetails information will not be updated if any of it has changed.
{code}
public RegisteredCommand register(
    DatanodeDetails datanodeDetails, NodeReportProto nodeReport,
    PipelineReportsProto pipelineReportsProto) {
  InetAddress dnAddress = Server.getRemoteIp();
  if (dnAddress != null) {
    // Mostly called inside an RPC, update ip and peer hostname
    datanodeDetails.setHostName(dnAddress.getHostName());
    datanodeDetails.setIpAddress(dnAddress.getHostAddress());
  }
  try {
    String dnsName;
    String networkLocation;
    datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
    if (useHostname) {
      dnsName = datanodeDetails.getHostName();
    } else {
      dnsName = datanodeDetails.getIpAddress();
    }
    networkLocation = nodeResolve(dnsName);
    if (networkLocation != null) {
      datanodeDetails.setNetworkLocation(networkLocation);
    }
    nodeStateManager.addNode(datanodeDetails);
    clusterMap.add(datanodeDetails);
    addEntryTodnsToUuidMap(dnsName, datanodeDetails.getUuidString());
    // Updating Node Report, as registration is successful
    processNodeReport(datanodeDetails, nodeReport);
    LOG.info("Registered Data node : {}", datanodeDetails);
  } catch (NodeAlreadyExistsException e) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Datanode is already registered. Datanode: {}",
          datanodeDetails.toString());
    }
  }
  return RegisteredCommand.newBuilder().setErrorCode(ErrorCode.success)
      .setDatanode(datanodeDetails)
      .setClusterID(this.scmStorageConfig.getClusterID())
      .build();
}
{code}

We should probably open another Jira to confirm whether this bug is really there, but we may need to look at re-registration for maintenance mode anyway, as that will involve a node going dead, NOT clearing its replicas out, and then registering again.
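The suspected re-registration bug can be modelled in isolation. The toy registry below (illustrative names only; it stands in for the UUID-keyed node map behind nodeStateManager) shows the pattern being described: a second register call with a changed hostname but the same UUID is treated as "already exists", so the stored details are never refreshed.

```java
import java.util.HashMap;
import java.util.Map;

class ReRegistrationSketch {
    static class NodeDetails {
        final String uuid;
        String hostName;
        NodeDetails(String uuid, String hostName) {
            this.uuid = uuid;
            this.hostName = hostName;
        }
    }

    // Registry keyed by UUID, standing in for the node map in SCM.
    static final Map<String, NodeDetails> nodes = new HashMap<>();

    // Mirrors the described behaviour: an existing UUID short-circuits
    // registration ("NodeAlreadyExists" is caught and success is still
    // returned), so updated host/IP details are silently dropped.
    static boolean register(NodeDetails dn) {
        if (nodes.containsKey(dn.uuid)) {
            return false; // already registered; stored details untouched
        }
        nodes.put(dn.uuid, dn);
        return true;
    }

    public static void main(String[] args) {
        register(new NodeDetails("uuid-1", "host-a"));
        // The DN restarts with a new hostname but the same UUID...
        register(new NodeDetails("uuid-1", "host-b"));
        // ...and the registry still holds the stale hostname.
        System.out.println(nodes.get("uuid-1").hostName); // host-a
    }
}
```

If the real register() follows this shape, the fix would presumably be to update the stored details (and reprocess the node report) on the already-exists path rather than ignoring it.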
[jira] [Commented] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index
[ https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972323#comment-16972323 ] Stephen O'Donnell commented on HDFS-14617: --

[~Feng Yuan] The progress object (prog) is updated by each thread as it completes loading the inodes for the sub-section, which means anything using or monitoring the prog object can see the progress being made across all loading threads. I think this is used by the webUI to report startup progress. Therefore I think the call must be made in the sub-thread.
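The explanation above can be illustrated with a toy progress object (illustrative names; the real StartupProgress machinery is more involved): because each worker updates the shared counter as it loads, any observer, such as the web UI, sees partial progress while loading is still running. If the count were only set after the threads joined, the observer would see nothing until the very end.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ProgressSketch {
    // Shared counter, standing in for prog.setCount(...) on the progress object.
    static final AtomicLong loaded = new AtomicLong();

    static void loadInParallel(int threads, int inodesPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < inodesPerThread; i++) {
                    // Updating inside the worker makes partial progress visible
                    // to any concurrent observer (e.g. a web UI poller).
                    loaded.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        loadInParallel(4, 1000);
        System.out.println(loaded.get()); // 4000
    }
}
```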
[jira] [Comment Edited] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
[ https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972318#comment-16972318 ] Stephen O'Donnell edited comment on HDDS-2446 at 11/12/19 12:16 PM:

I looked into the code a bit more to double-check this area. The only place outside of tests where a DatanodeInfo object gets created is via SCMNodeManager.register() -> nodeStateManager.addNode(), which creates the new DatanodeInfo.

So far as I can tell, nothing cleans a registered node (DatanodeDetails or DatanodeInfo) out of SCM except a restart - it will remember all nodes which have previously registered with it. If a node re-registers, the above chain of calls will give a NodeAlreadyExists exception on registration, which is caught, and a success is still returned to the DN. If a node goes dead, then all its containers will be purged, but if it re-registers without being dead, the containers will still be present, referencing the old DatanodeInfo object, which will not have changed.

One thing we could do is purge the container list on re-registration, as the register command should have a container report which must be processed anyway.

As an aside, I wonder if there is a bug in the re-registration process - the way SCM checks if a node has already registered is to look it up by UUID. If a DN is stopped and changes its IP or hostname, but retains the UUID, then it will 'register' successfully but the DatanodeDetails information will not be updated if any of it has changed.
{code}
public RegisteredCommand register(
    DatanodeDetails datanodeDetails, NodeReportProto nodeReport,
    PipelineReportsProto pipelineReportsProto) {
  InetAddress dnAddress = Server.getRemoteIp();
  if (dnAddress != null) {
    // Mostly called inside an RPC, update ip and peer hostname
    datanodeDetails.setHostName(dnAddress.getHostName());
    datanodeDetails.setIpAddress(dnAddress.getHostAddress());
  }
  try {
    String dnsName;
    String networkLocation;
    datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
    if (useHostname) {
      dnsName = datanodeDetails.getHostName();
    } else {
      dnsName = datanodeDetails.getIpAddress();
    }
    networkLocation = nodeResolve(dnsName);
    if (networkLocation != null) {
      datanodeDetails.setNetworkLocation(networkLocation);
    }
    // <<- This will throw NodeAlreadyExists on re-registration, which
    //     means the nodeReport below is also not processed.
    nodeStateManager.addNode(datanodeDetails);
    clusterMap.add(datanodeDetails);
    addEntryTodnsToUuidMap(dnsName, datanodeDetails.getUuidString());
    // Updating Node Report, as registration is successful
    processNodeReport(datanodeDetails, nodeReport);
    LOG.info("Registered Data node : {}", datanodeDetails);
  } catch (NodeAlreadyExistsException e) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Datanode is already registered. Datanode: {}",
          datanodeDetails.toString());
    }
  }
  return RegisteredCommand.newBuilder().setErrorCode(ErrorCode.success)
      .setDatanode(datanodeDetails)
      .setClusterID(this.scmStorageConfig.getClusterID())
      .build();
}
{code}
We should probably open another Jira if this bug is potentially there, but we may need to look at re-registration for maintenance mode anyway, as that will involve a node going dead, NOT clearing its replicas out, and then registering again.
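The suspected re-registration gap can be illustrated with a toy registry keyed by UUID. This is a plain-Java sketch under stated assumptions - `NodeRegistry` and its fields are hypothetical stand-ins, not the real SCMNodeManager API - showing how an existing-UUID path that returns success without touching the stored details would leave a changed hostname stale:

```java
import java.util.HashMap;
import java.util.Map;

class NodeRegistry {
  // Hypothetical stand-in for SCM's node table, keyed by datanode UUID.
  private final Map<String, String> hostnameByUuid = new HashMap<>();

  /** Mirrors the suspected behaviour: an existing UUID short-circuits the
      update, so a changed hostname is silently ignored and success is
      still reported to the datanode. */
  public boolean register(String uuid, String hostname) {
    if (hostnameByUuid.containsKey(uuid)) {
      return true; // "NodeAlreadyExists" swallowed; details not refreshed
    }
    hostnameByUuid.put(uuid, hostname);
    return true;
  }

  public String hostnameOf(String uuid) {
    return hostnameByUuid.get(uuid);
  }

  public static void main(String[] args) {
    NodeRegistry r = new NodeRegistry();
    r.register("uuid-1", "host-a");
    r.register("uuid-1", "host-b"); // DN restarted with a new hostname
    // The registry still reports the stale hostname:
    System.out.println(r.hostnameOf("uuid-1")); // prints "host-a"
  }
}
```

A fix along the lines discussed above would update the stored details (and process the accompanying container report) on the already-registered path instead of returning early.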
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972334#comment-16972334 ] Istvan Fajth commented on HDFS-14882: - Hi [~hexiaoqiao], patch-10 looks good to me, sorry for the slow response, I got a bit overwhelmed with other stuff and couldn't get back here faster. > Consider DataNode load when #getBlockLocation > - > > Key: HDFS-14882 > URL: https://issues.apache.org/jira/browse/HDFS-14882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, > HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, > HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch, > HDFS-14882.009.patch, HDFS-14882.010.patch, HDFS-14882.suggestion > > > Currently, we consider the load of a datanode in #chooseTarget for writers, > but not for readers. Thus, the transfer slots of a datanode can be occupied > by #BlockSender instances serving readers, the disk/network can become heavily > loaded, and we then hit slow-node exceptions. IIRC the same case has been > reported several times. Based on that, I propose to consider load for readers > the same way #chooseTarget does for writers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
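The proposal - prefer less-loaded datanodes when returning read locations - can be sketched as a minimal selection over replica load. This is an illustrative plain-Java model, not the actual HDFS-14882 patch: `Replica` and `activeTransfers` are hypothetical stand-ins for `DatanodeInfo` and its transceiver count.

```java
import java.util.Comparator;
import java.util.List;

class ReadReplicaChooser {
  // Hypothetical replica descriptor; the real code would consult
  // DatanodeInfo load metrics maintained by the NameNode.
  record Replica(String host, int activeTransfers) {}

  /** Pick the least-loaded replica instead of the first one returned,
      mirroring what #chooseTarget already does for writers. */
  static Replica leastLoaded(List<Replica> replicas) {
    return replicas.stream()
        .min(Comparator.comparingInt(Replica::activeTransfers))
        .orElseThrow();
  }

  public static void main(String[] args) {
    List<Replica> locs = List.of(
        new Replica("dn1", 40), new Replica("dn2", 3), new Replica("dn3", 17));
    System.out.println(leastLoaded(locs).host()); // dn2
  }
}
```

In practice the ordering would likely be a sort of all returned locations rather than a single pick, so the client still has fallbacks if the first choice is unreachable.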
[jira] [Commented] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index
[ https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972347#comment-16972347 ] Feng Yuan commented on HDFS-14617: -- [~sodonnell] OK, I understand it now. I have another question: why is parallel loading disabled when compression is enabled? {code:java} "Parallel Image loading is not supported when {} is set to" + " true. Parallel loading will be disabled." {code} Thanks for your replies. > Improve fsimage load time by writing sub-sections to the fsimage index > -- > > Key: HDFS-14617 > URL: https://issues.apache.org/jira/browse/HDFS-14617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 2.10.0, 3.3.0 > > Attachments: HDFS-14617.001.patch, ParallelLoading.svg, > SerialLoading.svg, dirs-single.svg, flamegraph.parallel.svg, > flamegraph.serial.svg, inodes.svg > > > Loading an fsimage is basically a single threaded process. The current > fsimage is written out in sections, eg iNode, iNode_Directory, Snapshots, > Snapshot_Diff etc. Then at the end of the file, an index is written that > contains the offset and length of each section. The image loader code uses > this index to initialize an input stream to read and process each section. It > is important that one section is fully loaded before another is started, as > the next section depends on the results of the previous one. > What I would like to propose is the following: > 1. When writing the image, we can optionally output sub_sections to the > index.
That way, a given section would effectively be split into several > sub-sections, eg:
> {code:java}
> inode_section offset 10 length 1000
>   inode_sub_section offset 10 length 500
>   inode_sub_section offset 510 length 500
>
> inode_dir_section offset 1010 length 1000
>   inode_dir_sub_section offset 1010 length 500
>   inode_dir_sub_section offset 1510 length 500
> {code}
> Here you can see we still have the original section index, but then we also > have sub-section entries that cover the entire section. Then a processor can > either read the full section in serial, or read each sub-section in parallel. > 2. In the Image Writer code, we should set a target number of sub-sections, > and then based on the total inodes in memory, it will create that many > sub-sections per major image section. I think the only sections worth doing > this for are inode, inode_reference, inode_dir and snapshot_diff. All others > tend to be fairly small in practice. > 3. If there are under some threshold of inodes (eg 10M) then don't bother > with the sub-sections, as a serial load only takes a few seconds at that scale. > 4. The image loading code can then have a switch to enable 'parallel loading' > and a 'number of threads', where it uses the sub-sections, or if not enabled > falls back to the existing logic to read the entire section in serial. > Working with a large image of 316M inodes and 35GB on disk, I have a proof of > concept of this change working, allowing just inode and inode_dir to be > loaded in parallel, but I believe inode_reference and snapshot_diff can be > made parallel with the same technique. > Some benchmarks I have are as follows:
> {code:java}
> Threads      1    2    3    4
> inodes     448  290  226  189
> inode_dir  326  211  170  161
> Total      927  651  535  488  (MD5 calculation about 100 seconds)
> {code}
> The above table shows the time in seconds to load the inode section and the > inode_directory section, and then the total load time of the image.
> With 4 threads using the above technique, we are able to more than halve the > load time of the two sections. With the patch in HDFS-13694 it would take a > further 100 seconds off the run time, going from 927 seconds to 388, which is > a significant improvement. Adding more threads beyond 4 has diminishing > returns, as there are some synchronized points in the loading code that protect > the in-memory structures.
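The loader-side idea - read each sub-section on its own thread, but finish one major section completely before starting the next - can be sketched with plain offsets and an ExecutorService. This is illustrative only; the real loader parses the fsimage protobuf streams, and `SubSection` here is a hypothetical stand-in for an index entry:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSectionLoader {
  // Hypothetical sub-section index entry: (offset, length) within the image.
  record SubSection(long offset, long length) {}

  /** Load every sub-section of one major section in parallel, returning the
      total bytes "processed". Returning only after every future completes
      gives the barrier: the next major section must wait for this one. */
  static long loadSection(List<SubSection> subs, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Long>> tasks = subs.stream()
          .map(s -> (Callable<Long>) () -> s.length()) // stand-in for real parsing
          .toList();
      long total = 0;
      for (Future<Long> f : pool.invokeAll(tasks)) {
        total += f.get();
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<SubSection> subs = List.of(
        new SubSection(10, 500), new SubSection(510, 500));
    System.out.println(loadSection(subs, 2)); // 1000
  }
}
```

This shape also hints at the compression question above: each worker must be able to seek independently to its sub-section offset, which a single non-splittable compressed stream does not allow.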
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972358#comment-16972358 ] Lisheng Sun commented on HDFS-14648: {quote} 2) The line newDeadNodes.retainAll(deadNodes.values()); does not look correct; it will make newDeadNodes the same as the old deadNodes.
{code:java}
+  public synchronized Set<DatanodeInfo> getDeadNodesToDetect() {
+    // remove the dead nodes which don't have any inputstream first
+    Set<DatanodeInfo> newDeadNodes = new HashSet<>();
+    for (HashSet<DatanodeInfo> datanodeInfos : dfsInputStreamNodes.values()) {
+      newDeadNodes.addAll(datanodeInfos);
+    }
+
+    newDeadNodes.retainAll(deadNodes.values());
+
+    for (DatanodeInfo datanodeInfo : deadNodes.values()) {
+      if (!newDeadNodes.contains(datanodeInfo)) {
+        deadNodes.remove(datanodeInfo);
+      }
+    }
+    return newDeadNodes;
+  }
{code}
{quote} Thanks [~linyiqun] for the thorough review comments. Indeed, newDeadNodes should be the same as the old deadNodes in DeadNodeDetector#clearAndGetDetectedDeadNodes. I have updated the patch and uploaded the v008 patch. Thank you a lot. [~linyiqun] > DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, > HDFS-14648.006.patch, HDFS-14648.007.patch > > > This Jira constructs the DeadNodeDetector state machine model. The functions it > implements are as follows: > # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode > of the block is found to be inaccessible, put the DataNode into > DeadNodeDetector#deadnode. (HDFS-14649 will optimize this part.) Because when a > DataNode is not accessible, it is likely that the replica has been removed > from the DataNode. Therefore, it needs to be confirmed by re-probing and > requires higher-priority processing.
> # DeadNodeDetector will periodically probe the nodes in > DeadNodeDetector#deadnode; if the access is successful, the node will be > removed from DeadNodeDetector#deadnode. Continuous detection of the dead nodes > is necessary: a DataNode may need to rejoin the cluster after a service > restart/machine repair, and it could be permanently excluded if there were > no added probe mechanism. > # DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each > DFSInputstream. When the DFSInputstream is closed, they are removed from > DeadNodeDetector#dfsInputStreamNodes. > # Every time we get the global deadnode list, we update DeadNodeDetector#deadnode. > The new DeadNodeDetector#deadnode equals the intersection of the old > DeadNodeDetector#deadnode and the DataNodes referenced by > DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is turned off by default. When it is > off, each DFSInputstream still uses its own local deadnode list. > # This feature has been used in the XIAOMI production environment for a long > time. It reduced HBase read stalls due to node hangs. > # Just turn on the DeadNodeDetector switch and you can use it directly. There are no > other restrictions. If you don't want to use DeadNodeDetector, just turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
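The intersection step being discussed - the new global deadnode set is the old set intersected with the nodes still referenced by open input streams - can be modelled with plain sets. Illustrative only: strings stand in for DatanodeInfo, and `retainAll` replaces the quoted remove-while-iterating loop.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DeadNodeSets {
  /** Keep only dead nodes still referenced by some open input stream:
      newDeadNodes = union(streams' nodes) ∩ deadNodes, and entries of
      deadNodes outside that intersection are dropped. */
  static Set<String> clearAndGetDetectedDeadNodes(
      Map<String, Set<String>> dfsInputStreamNodes, Set<String> deadNodes) {
    Set<String> referenced = new HashSet<>();
    for (Set<String> nodes : dfsInputStreamNodes.values()) {
      referenced.addAll(nodes);
    }
    referenced.retainAll(deadNodes);   // intersection with old dead set
    deadNodes.retainAll(referenced);   // drop dead nodes no stream uses
    return referenced;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> streams = new HashMap<>();
    streams.put("stream1", new HashSet<>(List.of("dn1", "dn2")));
    // dn3 is dead but no longer referenced by any open stream:
    Set<String> dead = new HashSet<>(List.of("dn2", "dn3"));
    System.out.println(clearAndGetDetectedDeadNodes(streams, dead)); // [dn2]
    System.out.println(dead); // [dn2]
  }
}
```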
[jira] [Updated] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14648: --- Attachment: HDFS-14648.008.patch > DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, > HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch > > > This Jira constructs DeadNodeDetector state machine model. The function it > implements as follow: > # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode > of the block is found to inaccessible, put the DataNode into > DeadNodeDetector#deadnode.(HDFS-14649) will optimize this part. Because when > DataNode is not accessible, it is likely that the replica has been removed > from the DataNode.Therefore, it needs to be confirmed by re-probing and > requires a higher priority processing. > # DeadNodeDetector will periodically detect the Node in > DeadNodeDetector#deadnode, If the access is successful, the Node will be > moved from DeadNodeDetector#deadnode. Continuous detection of the dead node > is necessary. The DataNode need rejoin the cluster due to a service > restart/machine repair. The DataNode may be permanently excluded if there is > no added probe mechanism. > # DeadNodeDetector#dfsInputStreamNodes Record the DFSInputstream using > DataNode. When the DFSInputstream is closed, it will be moved from > DeadNodeDetector#dfsInputStreamNodes. > # Every time get the global deanode, update the DeadNodeDetector#deadnode. > The new DeadNodeDetector#deadnode Equals to the intersection of the old > DeadNodeDetector#deadnode and the Datanodes are by > DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is turned off by default. 
When it is > closed, each DFSInputstream still uses its own local deadnode. > # This feature has been used in the XIAOMI production environment for a long > time. Reduced hbase read stuck, due to node hangs. > # Just open the DeadNodeDetector switch and you can use it directly. No > other restrictions. Don't want to use DeadNodeDetector, just close it. > {code:java} > if (sharedDeadNodesEnabled && deadNodeDetector == null) { > deadNodeDetector = new DeadNodeDetector(name); > deadNodeDetectorThr = new Daemon(deadNodeDetector); > deadNodeDetectorThr.start(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14442: -- Attachment: HDFS-14442.003.patch > Disagreement between HAUtil.getAddressOfActive and > RpcInvocationHandler.getConnectionId > --- > > Key: HDFS-14442 > URL: https://issues.apache.org/jira/browse/HDFS-14442 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, > HDFS-14442.003.patch > > > While working on HDFS-14245, we noticed a discrepancy in some proxy-handling > code. > The description of {{RpcInvocationHandler.getConnectionId()}} states: > {code} > /** >* Returns the connection id associated with the InvocationHandler instance. >* @return ConnectionId >*/ > ConnectionId getConnectionId(); > {code} > It does not make any claims about whether this connection ID will be an > active proxy or not. Yet in {{HAUtil}} we have: > {code} > /** >* Get the internet address of the currently-active NN. This should rarely > be >* used, since callers of this method who connect directly to the NN using > the >* resulting InetSocketAddress will not be able to connect to the active NN > if >* a failover were to occur after this method has been called. >* >* @param fs the file system to get the active address of. >* @return the internet address of the currently-active NN. >* @throws IOException if an error occurs while resolving the active NN. >*/ > public static InetSocketAddress getAddressOfActive(FileSystem fs) > throws IOException { > if (!(fs instanceof DistributedFileSystem)) { > throw new IllegalArgumentException("FileSystem " + fs + " is not a > DFS."); > } > // force client address resolution. 
> fs.exists(new Path("/")); > DistributedFileSystem dfs = (DistributedFileSystem) fs; > DFSClient dfsClient = dfs.getClient(); > return RPC.getServerAddress(dfsClient.getNamenode()); > } > {code} > Where the call {{RPC.getServerAddress()}} eventually terminates into > {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> > {{RPC.getConnectionIdForProxy()}} -> > {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making > an incorrect assumption that {{RpcInvocationHandler}} will necessarily return > an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a > counter-example to this, since the current connection ID may be pointing at, > for example, an Observer NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
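The disagreement can be modelled with a proxy that simply remembers its current target: getConnectionId() reports whatever endpoint the proxy last talked to, which under an observer-read provider need not be the active NN. Toy classes only, not the real RPC layer:

```java
class ProxyTarget {
  static class HaProxy {
    private final String active;   // address of the actual active NN
    private final String current;  // endpoint of the current connection
    HaProxy(String active, String current) {
      this.active = active;
      this.current = current;
    }
    // Like RpcInvocationHandler#getConnectionId(): no "active" guarantee.
    String getConnectionId() { return current; }
    String activeAddress() { return active; }
  }

  public static void main(String[] args) {
    // ObserverReadProxyProvider-style state: reads go to an observer.
    HaProxy p = new HaProxy("nn-active:8020", "nn-observer:8020");
    // HAUtil.getAddressOfActive() effectively returns this value,
    // which here is NOT the active NN's address:
    System.out.println(p.getConnectionId()); // nn-observer:8020
  }
}
```

The fix would be for callers needing the active NN to resolve it explicitly rather than trusting the current connection ID.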
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: HDFS-14528.006.patch > Failover from Active to Standby Failed > > > Key: HDFS-14528 > URL: https://issues.apache.org/jira/browse/HDFS-14528 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, > HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.2.Patch, > ZKFC_issue.patch > > > *In a cluster with more than one Standby namenode, manual failover throws an > exception in some cases.* > *When trying to execute the failover command from active to standby,* > *._/hdfs haadmin -failover nn1 nn2, the below Exception is thrown_* > Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on > connection exception: java.net.ConnectException: Connection refused > This is encountered in the following cases: > Scenario 1: > Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby) > When trying to manually fail over from NN1 to NN2 while NN3 is down, the exception is > thrown. > Scenario 2: > Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby) > ZKFCs - ZKFC1, ZKFC2, ZKFC3 > When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is > down, the exception is thrown.
[jira] [Commented] (HDFS-14978) In-place Erasure Coding Conversion
[ https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972384#comment-16972384 ] Wei-Chiu Chuang commented on HDFS-14978: bq. What is the client behavior during the CAS operation OP_SWAP_BLOCK_LIST This operation is atomic. Semantically, it is similar to truncating the file to zero length, and then append the file with erasure coded blocks. Assuming both files are not open. A getBlockLocations() call for the $src prior to swapBlockList() gets the replicated block list. Once a client has the located blocks list, it has the block tokens too and it should be able to read without problems, even though the namespace has changed. > In-place Erasure Coding Conversion > -- > > Key: HDFS-14978 > URL: https://issues.apache.org/jira/browse/HDFS-14978 > Project: Hadoop HDFS > Issue Type: New Feature > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: In-place Erasure Coding Conversion.pdf > > > HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses > encoding algorithms to reduce disk space usage while retaining redundancy > necessary for data recovery. It was a huge amount of work but it is just > getting adopted after almost 2 years. > One usability problem that’s blocking users from adopting HDFS Erasure Coding > is that existing replicated files have to be copied to an EC-enabled > directory explicitly. Renaming a file/directory to an EC-enabled directory > does not automatically convert the blocks. Therefore users typically perform > the following steps to erasure-code existing files: > {noformat} > Create $tmp directory, set EC policy at it > Distcp $src to $tmp > Delete $src (rm -rf $src) > mv $tmp $src > {noformat} > There are several reasons why this is not popular: > * Complex. 
The process involves several steps: distcp data to a temporary > destination; delete source file; move destination to the source path. > * Availability: there is a short period where nothing exists at the source > path, and jobs may fail unexpectedly. > * Overhead. During the copy phase, there is a point in time where all of > source and destination files exist at the same time, exhausting disk space. > * Not snapshot-friendly. If a snapshot is taken prior to performing the > conversion, the source (replicated) files will be preserved in the cluster > too. Therefore, the conversion actually increase storage space usage. > * Not management-friendly. This approach changes file inode number, > modification time and access time. Erasure coded files are supposed to store > cold data, but this conversion makes data “hot” again. > * Bulky. It’s either all or nothing. The directory may be partially erasure > coded, but this approach simply erasure code everything again. > To ease data management, we should offer a utility tool to convert replicated > files to erasure coded files in-place. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
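The proposed in-place alternative hinges on an atomic block-list swap. A minimal sketch of the intended semantics, under stated assumptions: `FileInode` is a hypothetical stand-in, and the real operation would run under the namesystem write lock and log an OP_SWAP_BLOCK_LIST edit rather than use `synchronized`.

```java
import java.util.List;

class SwapBlockListSketch {
  // Hypothetical inode holding a block list.
  static class FileInode {
    List<String> blocks;
    FileInode(List<String> blocks) { this.blocks = blocks; }
  }

  /** Atomically exchange the block lists of src (replicated) and tmp (EC).
      Done in one step, a reader sees either the old list or the new list,
      never a mixture - semantically truncate-to-zero plus append, but
      without the window where the file is empty. */
  static synchronized void swapBlockList(FileInode src, FileInode tmp) {
    List<String> old = src.blocks;
    src.blocks = tmp.blocks;
    tmp.blocks = old;
  }

  public static void main(String[] args) {
    FileInode src = new FileInode(List.of("blk_rep_1", "blk_rep_2"));
    FileInode tmp = new FileInode(List.of("blk_ec_1"));
    swapBlockList(src, tmp);
    System.out.println(src.blocks); // [blk_ec_1]
  }
}
```

Because only the block-list reference moves, the inode number, modification time, and snapshot linkage of $src are untouched, which is exactly the management-friendliness the distcp approach lacks.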
[jira] [Commented] (HDFS-14612) SlowDiskReport won't update when SlowDisks is always empty in heartbeat
[ https://issues.apache.org/jira/browse/HDFS-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972486#comment-16972486 ] Haibin Huang commented on HDFS-14612: - [~weichiu], I have updated this patch, would you help review it? Thank you. > SlowDiskReport won't update when SlowDisks is always empty in heartbeat > --- > > Key: HDFS-14612 > URL: https://issues.apache.org/jira/browse/HDFS-14612 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haibin Huang >Assignee: Haibin Huang >Priority: Major > Attachments: HDFS-14612-001.patch, HDFS-14612-002.patch, > HDFS-14612-003.patch, HDFS-14612-004.patch, HDFS-14612.patch > > > I found that SlowDiskReport won't update when slowDisks is always empty in > org.apache.hadoop.hdfs.server.blockmanagement.*handleHeartbeat*; this may > leave an outdated SlowDiskReport staying in the jmx of the namenode until the next > time slowDisks isn't empty. So I think the method > *checkAndUpdateReportIfNecessary()* should be called first when we want to > get the jmx information about SlowDiskReport; this keeps the > SlowDiskReport in jmx always valid. > > There are also some incorrect object references in > org.apache.hadoop.hdfs.server.datanode.fsdataset.*DataNodeVolumeMetrics*
> {code:java}
> // Based on writeIoRate
> public long getWriteIoSampleCount() {
>   return syncIoRate.lastStat().numSamples();
> }
>
> public double getWriteIoMean() {
>   return syncIoRate.lastStat().mean();
> }
>
> public double getWriteIoStdDev() {
>   return syncIoRate.lastStat().stddev();
> }
> {code}
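The mis-referenced metric can be modelled with two plain counters. The sketch below is a toy model, not the real DataNodeVolumeMetrics class: `Rate` stands in for the rate-tracking stats, and the fix is simply pointing the getWriteIo* accessors at the write-rate tracker instead of the sync-rate one.

```java
class VolumeMetricsSketch {
  // Minimal stand-in for a rate-tracking stat object.
  static class Rate {
    private final long samples;
    Rate(long samples) { this.samples = samples; }
    long numSamples() { return samples; }
  }

  private final Rate syncIoRate = new Rate(7);
  private final Rate writeIoRate = new Rate(42);

  // Buggy version (as quoted above): the write accessor reads the
  // sync tracker, so write metrics silently report sync-IO numbers.
  long getWriteIoSampleCountBuggy() {
    return syncIoRate.numSamples();
  }

  // Corrected version: the write accessor reads the write tracker.
  long getWriteIoSampleCount() {
    return writeIoRate.numSamples();
  }

  public static void main(String[] args) {
    VolumeMetricsSketch m = new VolumeMetricsSketch();
    System.out.println(m.getWriteIoSampleCountBuggy()); // 7  (wrong source)
    System.out.println(m.getWriteIoSampleCount());      // 42 (right source)
  }
}
```

The same one-line change would apply to getWriteIoMean() and getWriteIoStdDev().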
[jira] [Assigned] (HDDS-2461) Logging by ChunkUtils is misleading
[ https://issues.apache.org/jira/browse/HDDS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek reassigned HDDS-2461: - Assignee: Marton Elek > Logging by ChunkUtils is misleading > --- > > Key: HDDS-2461 > URL: https://issues.apache.org/jira/browse/HDDS-2461 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > > During a k8s-based test I found a lot of log messages like: > {code:java} > 2019-11-12 14:27:13 WARN ChunkManagerImpl:209 - Duplicate write chunk > request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} > {code} > I was very surprised, as at ChunkManagerImpl:209 there was no similar line. > It turned out that it's logged by ChunkUtils, but ChunkUtils uses the logger of > ChunkManagerImpl.
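The fix pattern is for each class to log under its own name, so the class:line shown in the output matches the file that emitted the message. A minimal sketch using java.util.logging instead of the SLF4J loggers the real code uses; the nested classes are illustrative stand-ins:

```java
import java.util.logging.Logger;

class LoggerOwnership {
  static class ChunkManagerImpl {
    static final Logger LOG =
        Logger.getLogger(ChunkManagerImpl.class.getName());
  }

  static class ChunkUtils {
    // The fix: ChunkUtils declares its own logger instead of borrowing
    // ChunkManagerImpl's, so its warnings are attributed correctly.
    static final Logger LOG =
        Logger.getLogger(ChunkUtils.class.getName());
  }

  public static void main(String[] args) {
    // Each logger carries its owning class's name.
    System.out.println(ChunkUtils.LOG.getName().endsWith("ChunkUtils"));
  }
}
```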
[jira] [Created] (HDDS-2461) Logging by ChunkUtils is misleading
Marton Elek created HDDS-2461: - Summary: Logging by ChunkUtils is misleading Key: HDDS-2461 URL: https://issues.apache.org/jira/browse/HDDS-2461 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Marton Elek During a k8s based test I found a lot of log message like: {code:java} 2019-11-12 14:27:13 WARN ChunkManagerImpl:209 - Duplicate write chunk request. Chunk overwrite without explicit request. ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} {code} I was very surprised as at ChunkManagerImpl:209 there was no similar lines. It turned out that it's logged by ChunkUtils but it's used the logger of ChunkManagerImpl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2461) Logging by ChunkUtils is misleading
[ https://issues.apache.org/jira/browse/HDDS-2461?focusedWorklogId=341966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341966 ] ASF GitHub Bot logged work on HDDS-2461: Author: ASF GitHub Bot Created on: 12/Nov/19 15:22 Start Date: 12/Nov/19 15:22 Worklog Time Spent: 10m Work Description: elek commented on pull request #144: HDDS-2461. Logging by ChunkUtils is misleading URL: https://github.com/apache/hadoop-ozone/pull/144 ## What changes were proposed in this pull request? During a k8s based test I found a lot of log message like: ``` 2019-11-12 14:27:13 WARN ChunkManagerImpl:209 - Duplicate write chunk request. Chunk overwrite without explicit request. ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} ``` I was very surprised as at `ChunkManagerImpl:209` there was no related lines. It turned out that it's logged by `ChunkUtils` but it's used the logger of `ChunkManagerImpl`. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2461 ## How was this patch tested? I deployed a new version from Ozone to the kubernetes cluster. But I also added a new test method TestChunkUtil to have at least one unit test method which uses the logger. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 341966) Remaining Estimate: 0h Time Spent: 10m > Logging by ChunkUtils is misleading > --- > > Key: HDDS-2461 > URL: https://issues.apache.org/jira/browse/HDDS-2461 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > During a k8s based test I found a lot of log message like: > {code:java} > 2019-11-12 14:27:13 WARN ChunkManagerImpl:209 - Duplicate write chunk > request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} > {code} > I was very surprised as at ChunkManagerImpl:209 there was no similar lines. > It turned out that it's logged by ChunkUtils but it's used the logger of > ChunkManagerImpl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2461) Logging by ChunkUtils is misleading
[ https://issues.apache.org/jira/browse/HDDS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2461: - Labels: pull-request-available (was: ) > Logging by ChunkUtils is misleading > --- > > Key: HDDS-2461 > URL: https://issues.apache.org/jira/browse/HDDS-2461 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > > During a k8s based test I found a lot of log message like: > {code:java} > 2019-11-12 14:27:13 WARN ChunkManagerImpl:209 - Duplicate write chunk > request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='A9UrLxiEUN_testdata_chunk_4465025, offset=0, len=1024} > {code} > I was very surprised as at ChunkManagerImpl:209 there was no similar lines. > It turned out that it's logged by ChunkUtils but it's used the logger of > ChunkManagerImpl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2415) Completely disable tracer if hdds.tracing.enabled=false
[ https://issues.apache.org/jira/browse/HDDS-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek updated HDDS-2415: -- Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Completely disable tracer if hdds.tracing.enabled=false > --- > > Key: HDDS-2415 > URL: https://issues.apache.org/jira/browse/HDDS-2415 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: perfomance, pull-request-available > Fix For: 0.5.0 > > Attachments: allocations.png > > Time Spent: 20m > Remaining Estimate: 0h > > There is a config setting to enable/disable OpenTracing-based distributed > tracing in Ozone ({{hdds.tracing.enabled}}). However, setting it to false > does not prevent tracer initialization, which causes unnecessary object > allocations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
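The gating described above can be sketched as a factory that only builds a real tracer when the flag is on and otherwise hands back a shared no-op instance, so disabled clusters pay no per-span allocation cost. This toy `Tracer` interface is a hypothetical stand-in for the OpenTracing API, not Ozone's actual TracingUtil:

```java
class TracerGate {
  // Minimal tracer interface standing in for io.opentracing.Tracer.
  interface Tracer {
    String startSpan(String name);
  }

  // One shared no-op instance: nothing is initialized or allocated per span.
  static final Tracer NOOP = name -> "noop";

  // Stand-in for the full tracer construction (reporters, samplers, etc.).
  static Tracer realTracer() {
    return name -> "span:" + name;
  }

  /** Only build the real tracer when hdds.tracing.enabled-style config
      is true; otherwise skip initialization entirely. */
  static Tracer create(boolean tracingEnabled) {
    return tracingEnabled ? realTracer() : NOOP;
  }

  public static void main(String[] args) {
    System.out.println(create(false).startSpan("write")); // noop
    System.out.println(create(true).startSpan("write"));  // span:write
  }
}
```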
[jira] [Work logged] (HDDS-2415) Completely disable tracer if hdds.tracing.enabled=false
[ https://issues.apache.org/jira/browse/HDDS-2415?focusedWorklogId=341985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341985 ] ASF GitHub Bot logged work on HDDS-2415: Author: ASF GitHub Bot Created on: 12/Nov/19 15:46 Start Date: 12/Nov/19 15:46 Worklog Time Spent: 10m Work Description: elek commented on pull request #128: HDDS-2415. Completely disable tracer if hdds.tracing.enabled=false URL: https://github.com/apache/hadoop-ozone/pull/128 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 341985) Time Spent: 20m (was: 10m) > Completely disable tracer if hdds.tracing.enabled=false > --- > > Key: HDDS-2415 > URL: https://issues.apache.org/jira/browse/HDDS-2415 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: perfomance, pull-request-available > Fix For: 0.5.0 > > Attachments: allocations.png > > Time Spent: 20m > Remaining Estimate: 0h > > There is a config setting to enable/disable OpenTracing-based distributed > tracing in Ozone ({{hdds.tracing.enabled}}). However, setting it to false > does not prevent tracer initialization, which causes unnecessary object > allocations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?focusedWorklogId=341992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341992 ] ASF GitHub Bot logged work on HDDS-1868: Author: ASF GitHub Bot Created on: 12/Nov/19 15:54 Start Date: 12/Nov/19 15:54 Worklog Time Spent: 10m Work Description: nandakumar131 commented on pull request #23: HDDS-1868. Ozone pipelines should be marked as ready only after the leader election is complete. URL: https://github.com/apache/hadoop-ozone/pull/23 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 341992) Time Spent: 3h 50m (was: 3h 40m) > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, > HDDS-1868.03.patch, HDDS-1868.04.patch, HDDS-1868.05.patch, HDDS-1868.06.patch > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Ozone pipeline on create and restart, start in allocated state. They are > moved into open state after all the pipeline have reported to it. However, > this potentially can lead into an issue where the pipeline is still not ready > to accept any incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. 
[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDDS-1868: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch, > HDDS-1868.03.patch, HDDS-1868.04.patch, HDDS-1868.05.patch, HDDS-1868.06.patch > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Ozone pipeline on create and restart, start in allocated state. They are > moved into open state after all the pipeline have reported to it. However, > this potentially can lead into an issue where the pipeline is still not ready > to accept any incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2462) Add jq dependency in how to contribute docs
Istvan Fajth created HDDS-2462: -- Summary: Add jq dependency in how to contribute docs Key: HDDS-2462 URL: https://issues.apache.org/jira/browse/HDDS-2462 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Istvan Fajth Docker based tests are using JQ to parse JMX pages of different processes, but the documentation does not mention it as a dependency. Add it to CONTRIBUTION.MD in the "Additional requirements to execute different type of tests" section. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972566#comment-16972566 ] Hadoop QA commented on HDFS-14648: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 52s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 112 unchanged - 1 fixed = 113 total (was 113) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 13s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 54s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}171m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-client | | | org.apache.hadoop.hdfs.protocol.DatanodeInfo is incompatible with expected argument type String in org.apache.hadoop.hdfs.DeadNodeDetector.clearAndGetDetectedDeadNodes() At DeadNodeDetector.java:argument type String in org.apache.hadoop.hdfs.DeadNodeDetector.clearAndGetDetectedDeadNodes() At DeadNodeDetector.java:[line 165] | | Failed junit tests | hadoop.hdfs.server.namenode.TestReencryption | |
[jira] [Updated] (HDDS-2462) Add jq dependency in Contribution guideline
[ https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDDS-2462: --- Summary: Add jq dependency in Contribution guideline (was: Add jq dependency in how to contribute docs) > Add jq dependency in Contribution guideline > --- > > Key: HDDS-2462 > URL: https://issues.apache.org/jira/browse/HDDS-2462 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Priority: Major > > Docker based tests are using JQ to parse JMX pages of different processes, > but the documentation does not mention it as a dependency. > Add it to CONTRIBUTION.MD in the "Additional requirements to execute > different type of tests" section. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972571#comment-16972571 ] Yiqun Lin commented on HDFS-14648: -- Thanks [~leosun08], the patch almost looks good now, only some minor comments: *DFSInputStream.java* 1. There is one additional place where we can add the addNodeToDeadNodeDetector call, in {{createBlockReader}}: {code:java} boolean createBlockReader(LocatedBlock block, long offsetInBlock, LocatedBlock[] targetBlocks, BlockReaderInfo[] readerInfos, ... } else { //TODO: handles connection issues DFSClient.LOG.warn("Failed to connect to " + dnInfo.addr + " for " + "block" + block.getBlock(), e); // re-fetch the block in case the block has been moved fetchBlockAt(block.getStartOffset()); addToLocalDeadNodes(dnInfo.info); // < } } {code} *DeadNodeDetector.java* 1. Can you address this comment that was missed? {quote}1. Can we comment the name as Client context name + /** + * Client context name. + */ + private String name; {quote} 2. We can use containsKey to check: {code:java} public boolean isDeadNode(DatanodeInfo datanodeInfo) { return deadNodes.containsKey(datanodeInfo.getDatanodeUuid()); } {code} Also, we can remove by key in method clearAndGetDetectedDeadNodes: {code:java} for (DatanodeInfo datanodeInfo : deadNodes.values()) { if (!newDeadNodes.contains(datanodeInfo)) { deadNodes.remove(datanodeInfo.getDatanodeUuid()); } } {code} 3. We can periodically call clearAndGetDetectedDeadNodes to keep the deadNodes list refreshed. I think the deadNodes list can become a little stale when the local dead node is cleared in the DFSInputStream. {code:java} public void run() { while (true) { clearAndGetDetectedDeadNodes(); LOG.debug("Current detector state {}, the detected nodes: {}.", state, deadNodes.values()); switch (state) { {code} 4. I don't fully get this. Why do we still call this in the latest patch? Can you explain? 
{noformat} newDeadNodes.retainAll(deadNodes.values()); {noformat} *TestDFSClientDetectDeadNodes.java* 1. Can you rename the unit test from {{TestDFSClientDetectDeadNodes}} to {{TestDeadNodeDetection}}? And simplify the comment to this: {noformat} +/** + * Tests for dead node detection in DFSClient. + */ +public class TestDeadNodeDetection { {noformat} Two other names updated: * testDetectDeadNodeInBackground --> testDeadNodeDetectionInBackground * testDeadNodeMultipleDFSInputStream --> testDeadNodeDetectionInMultipleDFSInputStream 2. No need to call {{ThreadUtil.sleepAtLeastIgnoreInterrupts(10 * 1000L);}} I think. 3. Can we extract the DFSClient here? I see we call getDFSClient() many times. {code:java} assertEquals(1, din1.getDFSClient().getDeadNodes(din1).size()); assertEquals(1, din1.getDFSClient().getClientContext() {code} > DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, > HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch > > > This Jira constructs the DeadNodeDetector state machine model. The functions it > implements are as follows: > # When a DFSInputStream is opened, a BlockReader is opened. If some DataNode > of the block is found to be inaccessible, put the DataNode into > DeadNodeDetector#deadnode (HDFS-14649 will optimize this part). Because when the > DataNode is not accessible, it is likely that the replica has been removed > from the DataNode. Therefore, it needs to be confirmed by re-probing and > requires higher priority processing. > # DeadNodeDetector will periodically probe each Node in > DeadNodeDetector#deadnode. If the access is successful, the Node will be > removed from DeadNodeDetector#deadnode. Continuous detection of the dead node > is necessary. The DataNode may need to rejoin the cluster after a service > restart/machine repair; without this added probe mechanism the DataNode would be > permanently excluded. > # DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each > DFSInputStream. When the DFSInputStream is closed, its entries are removed from > DeadNodeDetector#dfsInputStreamNodes. > # Every time the global deadnode list is fetched, update DeadNodeDetector#deadnode. > The new DeadNodeDetector#deadnode equals the intersection of the old > DeadNodeDetector#deadnode and the DataNodes referenced by > DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is turned off by default. When it is
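The bookkeeping suggested in the review above (keying {{deadNodes}} by datanode UUID, membership via {{containsKey}}, pruning stale entries by key, and taking the intersection with the still-referenced nodes) can be sketched as follows. {{DeadNodeSketch}} and its String-based node handles are simplified stand-ins for illustration, not the actual HDFS types:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the DeadNodeDetector bookkeeping discussed above:
// maps datanode UUID -> human-readable node description.
public class DeadNodeSketch {
    private final Map<String, String> deadNodes = new ConcurrentHashMap<>();

    public void addDeadNode(String uuid, String description) {
        deadNodes.put(uuid, description);
    }

    // Membership check via containsKey on the UUID key.
    public boolean isDeadNode(String uuid) {
        return deadNodes.containsKey(uuid);
    }

    // Keep only nodes still referenced by some open input stream: the new
    // dead set is the intersection of the old dead set and the currently
    // referenced nodes. Removal is by key, not by value.
    public Set<String> clearAndGetDetectedDeadNodes(Set<String> referencedUuids) {
        deadNodes.keySet().removeIf(uuid -> !referencedUuids.contains(uuid));
        return new HashSet<>(deadNodes.keySet());
    }
}
```

Removing via {{keySet().removeIf}} sidesteps the type-mismatch trap of calling {{remove}} with the value object on a map keyed by UUID strings.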
[jira] [Updated] (HDDS-2462) Add jq dependency in Contribution guideline
[ https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2462: - Labels: pull-request-available (was: ) > Add jq dependency in Contribution guideline > --- > > Key: HDDS-2462 > URL: https://issues.apache.org/jira/browse/HDDS-2462 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Priority: Major > Labels: pull-request-available > > Docker based tests are using JQ to parse JMX pages of different processes, > but the documentation does not mention it as a dependency. > Add it to CONTRIBUTION.MD in the "Additional requirements to execute > different type of tests" section. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2462) Add jq dependency in Contribution guideline
[ https://issues.apache.org/jira/browse/HDDS-2462?focusedWorklogId=342001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342001 ] ASF GitHub Bot logged work on HDDS-2462: Author: ASF GitHub Bot Created on: 12/Nov/19 16:08 Start Date: 12/Nov/19 16:08 Worklog Time Spent: 10m Work Description: fapifta commented on pull request #145: HDDS-2462. Add jq dependency in Contribution guideline URL: https://github.com/apache/hadoop-ozone/pull/145 ## What changes were proposed in this pull request? Documentation update, add jq dependency into the Contribution Guideline in the "Additional requirements to execute different type of tests" section ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2462 ## How was this patch tested? Doc change, no tests needed as far as I can tell. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 342001) Remaining Estimate: 0h Time Spent: 10m > Add jq dependency in Contribution guideline > --- > > Key: HDDS-2462 > URL: https://issues.apache.org/jira/browse/HDDS-2462 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Docker based tests are using JQ to parse JMX pages of different processes, > but the documentation does not mention it as a dependency. > Add it to CONTRIBUTION.MD in the "Additional requirements to execute > different type of tests" section. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2462) Add jq dependency in Contribution guideline
[ https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth reassigned HDDS-2462: -- Assignee: Istvan Fajth > Add jq dependency in Contribution guideline > --- > > Key: HDDS-2462 > URL: https://issues.apache.org/jira/browse/HDDS-2462 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Docker based tests are using JQ to parse JMX pages of different processes, > but the documentation does not mention it as a dependency. > Add it to CONTRIBUTION.MD in the "Additional requirements to execute > different type of tests" section. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972577#comment-16972577 ] Hadoop QA commented on HDFS-14442: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 51s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 22 unchanged - 0 fixed = 24 total (was 22) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 30s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}185m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics | | | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN | | | hadoop.hdfs.server.datanode.TestDataNodeReconfiguration | | | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped | | | hadoop.hdfs.server.datanode.checker.TestDatasetVolumeCheckerTimeout | | | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.hdfs.server.mover.TestStorageMover | | | hadoop.hdfs.server.datanode.TestDataNodeLifeline | | | hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped | | | hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting | | | hadoop.hdfs.server.datanode.TestBlockRecovery | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.server.blockmanagement.TestPendingReconstruction | | | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy | | | hadoop.hdfs.server.datanode.TestBatchIbr | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14442 | | JIRA Patch U
[jira] [Updated] (HDDS-2456) Add explicit base image version for images derived from ozone-runner
[ https://issues.apache.org/jira/browse/HDDS-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek updated HDDS-2456: -- Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Add explicit base image version for images derived from ozone-runner > > > Key: HDDS-2456 > URL: https://issues.apache.org/jira/browse/HDDS-2456 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: docker >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{ozone-om-ha}} and {{ozonescripts}} build images based on > {{apache/ozone-runner}}. > Problem: They do not specify base image versions, so it defaults to > {{latest}}. If a new {{ozone-runner}} image is published on Docker Hub, > developers needs to manually pull the {{latest}} image for it to take effect > on these derived images. > Solution: Use explicit base image version (defined by > {{OZONE_RUNNER_VERSION}} variable in {{.env}} file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2456) Add explicit base image version for images derived from ozone-runner
[ https://issues.apache.org/jira/browse/HDDS-2456?focusedWorklogId=342014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342014 ] ASF GitHub Bot logged work on HDDS-2456: Author: ASF GitHub Bot Created on: 12/Nov/19 16:23 Start Date: 12/Nov/19 16:23 Worklog Time Spent: 10m Work Description: elek commented on pull request #139: HDDS-2456. Add explicit base image version for images derived from ozone-runner URL: https://github.com/apache/hadoop-ozone/pull/139 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 342014) Time Spent: 20m (was: 10m) > Add explicit base image version for images derived from ozone-runner > > > Key: HDDS-2456 > URL: https://issues.apache.org/jira/browse/HDDS-2456 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: docker >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > {{ozone-om-ha}} and {{ozonescripts}} build images based on > {{apache/ozone-runner}}. > Problem: They do not specify base image versions, so it defaults to > {{latest}}. If a new {{ozone-runner}} image is published on Docker Hub, > developers needs to manually pull the {{latest}} image for it to take effect > on these derived images. > Solution: Use explicit base image version (defined by > {{OZONE_RUNNER_VERSION}} variable in {{.env}} file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972622#comment-16972622 ] Hadoop QA commented on HDFS-14528: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 6s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 18m 28s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 57s{color} | {color:orange} root: The patch generated 11 new + 36 unchanged - 0 fixed = 47 total (was 36) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 8s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 46s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 22s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 57s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}214m 58s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy | | | hadoop.hdfs.TestFileChecksumCompositeCrc | | | hadoop.hdfs.TestErasureCodingPolicies | | | hadoop.hdfs.TestDecommissionWithStriped | | | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.TestFileAppend2 | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14528 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985622/HDFS-14528.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite
[jira] [Updated] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14648: --- Attachment: HDFS-14648.009.patch > DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, > HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, > HDFS-14648.009.patch > > > This Jira constructs the DeadNodeDetector state machine model. It implements the following functionality: > # When a DFSInputStream is opened, a BlockReader is opened. If a DataNode holding the block is found to be inaccessible, that DataNode is put into DeadNodeDetector#deadnode (HDFS-14649 will optimize this part). When a DataNode is not accessible, it is likely that the replica has been removed from it, so this needs to be confirmed by re-probing, with higher-priority processing. > # DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode; if a probe succeeds, the node is removed from DeadNodeDetector#deadnode. Continuous probing of dead nodes is necessary because a DataNode may rejoin the cluster after a service restart or machine repair; without such a probe mechanism it would be permanently excluded. > # DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each DFSInputStream. When a DFSInputStream is closed, its entries are removed from DeadNodeDetector#dfsInputStreamNodes. > # Every time the global dead-node set is fetched, DeadNodeDetector#deadnode is updated: the new DeadNodeDetector#deadnode equals the intersection of the old DeadNodeDetector#deadnode and the DataNodes referenced by DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is off by default. When it is off, each DFSInputStream still uses its own local dead-node set. > # This feature has been used in the XIAOMI production environment for a long time and has reduced HBase read stalls caused by node hangs. > # Just turn on the DeadNodeDetector switch to use it; there are no other restrictions. If you don't want to use DeadNodeDetector, simply turn it off. > {code:java} > if (sharedDeadNodesEnabled && deadNodeDetector == null) { > deadNodeDetector = new DeadNodeDetector(name); > deadNodeDetectorThr = new Daemon(deadNodeDetector); > deadNodeDetectorThr.start(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
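The lifecycle above (a shared dead set, periodic re-probing, per-stream bookkeeping, and the intersection rule on update) can be sketched as follows. The class and method names here are hypothetical simplifications for illustration, not the actual HDFS-14648 implementation:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Minimal sketch (hypothetical names): streams report unreachable nodes into
// one shared set, and a periodic probe removes nodes that answer again.
class SharedDeadNodeTracker {
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
    // stream id -> nodes that stream is reading from
    private final Map<String, Set<String>> streamNodes = new ConcurrentHashMap<>();

    // A stream found this node unreachable: record it in the shared dead set.
    void reportDead(String streamId, String node) {
        streamNodes.computeIfAbsent(streamId, k -> ConcurrentHashMap.newKeySet())
                .add(node);
        deadNodes.add(node);
    }

    // Periodic probe: a node that responds again leaves the dead set, so a
    // restarted or repaired DataNode is not excluded forever.
    void probe(Predicate<String> isAlive) {
        deadNodes.removeIf(isAlive);
    }

    // On stream close, drop its node references and shrink the dead set to
    // nodes still referenced by some open stream (the intersection rule).
    void closeStream(String streamId) {
        streamNodes.remove(streamId);
        deadNodes.removeIf(
                n -> streamNodes.values().stream().noneMatch(s -> s.contains(n)));
    }

    Set<String> getDeadNodes() {
        return deadNodes;
    }
}
```

The real detector additionally runs the probe in a background Daemon thread (as the enabling snippet above shows) and is guarded by the default-off switch.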
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972624#comment-16972624 ] Lisheng Sun commented on HDFS-14648: Thanks [~linyiqun] for the good comments. I updated the patch per your comments and uploaded the v009 patch. Thank you. > DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, > HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, > HDFS-14648.009.patch > > > This Jira constructs the DeadNodeDetector state machine model. It implements the following functionality: > # When a DFSInputStream is opened, a BlockReader is opened. If a DataNode holding the block is found to be inaccessible, that DataNode is put into DeadNodeDetector#deadnode (HDFS-14649 will optimize this part). When a DataNode is not accessible, it is likely that the replica has been removed from it, so this needs to be confirmed by re-probing, with higher-priority processing. > # DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode; if a probe succeeds, the node is removed from DeadNodeDetector#deadnode. Continuous probing of dead nodes is necessary because a DataNode may rejoin the cluster after a service restart or machine repair; without such a probe mechanism it would be permanently excluded. > # DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each DFSInputStream. When a DFSInputStream is closed, its entries are removed from DeadNodeDetector#dfsInputStreamNodes. > # Every time the global dead-node set is fetched, DeadNodeDetector#deadnode is updated: the new DeadNodeDetector#deadnode equals the intersection of the old DeadNodeDetector#deadnode and the DataNodes referenced by DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is off by default. When it is off, each DFSInputStream still uses its own local dead-node set. > # This feature has been used in the XIAOMI production environment for a long time and has reduced HBase read stalls caused by node hangs. > # Just turn on the DeadNodeDetector switch to use it; there are no other restrictions. If you don't want to use DeadNodeDetector, simply turn it off. > {code:java} > if (sharedDeadNodesEnabled && deadNodeDetector == null) { > deadNodeDetector = new DeadNodeDetector(name); > deadNodeDetectorThr = new Daemon(deadNodeDetector); > deadNodeDetectorThr.start(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14981) BlockStateChange logging is exceedingly verbose
Nick Dimiduk created HDFS-14981: --- Summary: BlockStateChange logging is exceedingly verbose Key: HDFS-14981 URL: https://issues.apache.org/jira/browse/HDFS-14981 Project: Hadoop HDFS Issue Type: Bug Components: logging Reporter: Nick Dimiduk On a moderately loaded cluster, name node logs are flooded with entries of {{INFO BlockStateChange...}}, to the tune of ~30 lines per millisecond. This provides operators with little to no usable information. I suggest reducing this log message to {{DEBUG}}. Perhaps this information (and other logging related to it) should be directed to a dedicated block-audit.log file that can be queried, rotated on a separate schedule from the log of the main process. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
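Both suggestions above can be expressed in the NameNode's log4j.properties; the {{BlockStateChange}} logger name is the one the NameNode uses for these messages, while the appender name, file path, and rotation sizes below are illustrative. A sketch:

```properties
# Option 1: simply quiet these messages in the main NameNode log.
# log4j.logger.BlockStateChange=WARN

# Option 2 (sketch): route BlockStateChange to its own file so it can be
# queried and rotated independently of the main process log.
log4j.logger.BlockStateChange=INFO,blockstate
log4j.additivity.BlockStateChange=false
log4j.appender.blockstate=org.apache.log4j.RollingFileAppender
log4j.appender.blockstate.File=${hadoop.log.dir}/block-audit.log
log4j.appender.blockstate.MaxFileSize=256MB
log4j.appender.blockstate.MaxBackupIndex=20
log4j.appender.blockstate.layout=org.apache.log4j.PatternLayout
log4j.appender.blockstate.layout.ConversionPattern=%d{ISO8601} %p %m%n
```

Setting {{additivity}} to false keeps the routed messages out of the main log entirely.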
[jira] [Commented] (HDFS-14981) BlockStateChange logging is exceedingly verbose
[ https://issues.apache.org/jira/browse/HDFS-14981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972640#comment-16972640 ] Wei-Chiu Chuang commented on HDFS-14981: I think this is done by HDFS-6860. > BlockStateChange logging is exceedingly verbose > --- > > Key: HDFS-14981 > URL: https://issues.apache.org/jira/browse/HDFS-14981 > Project: Hadoop HDFS > Issue Type: Bug > Components: logging >Reporter: Nick Dimiduk >Priority: Major > > On a moderately loaded cluster, name node logs are flooded with entries of > {{INFO BlockStateChange...}}, to the tune of ~30 lines per millisecond. This > provides operators with little to no usable information. I suggest reducing > this log message to {{DEBUG}}. Perhaps this information (and other logging > related to it) should be directed to a dedicated block-audit.log file that > can be queried, rotated on a separate schedule from the log of the main > process. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972642#comment-16972642 ] hemanthboyina commented on HDFS-14922: -- [~elgoiri] can you push the patch forward > On StartUp , Snapshot modification time got changed > --- > > Key: HDFS-14922 > URL: https://issues.apache.org/jira/browse/HDFS-14922 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, > HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch > > > Snapshot modification time got changed on namenode restart -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: (was: HDFS-14528.006.patch) > Failover from Active to Standby Failed > > > Key: HDFS-14528 > URL: https://issues.apache.org/jira/browse/HDFS-14528 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, > HDFS-14528.005.patch, HDFS-14528.2.Patch, ZKFC_issue.patch > > > *In a cluster with more than one Standby namenode, manual failover throws an > exception in some cases.* > *When trying to execute the failover command from active to standby,* > *_./hdfs haadmin -failover nn1 nn2_, *the below exception is thrown:* > Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on > connection exception: java.net.ConnectException: Connection refused > This is encountered in the following cases: > Scenario 1: > Namenodes - NN1(Active), NN2(Standby), NN3(Standby) > When trying to manually failover from NN1 to NN2, if NN3 is down an exception is > thrown. > Scenario 2: > Namenodes - NN1(Active), NN2(Standby), NN3(Standby) > ZKFCs - ZKFC1, ZKFC2, ZKFC3 > When trying to manually failover from NN1 to NN3, if NN3's ZKFC (ZKFC3) is > down an exception is thrown. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-14981) BlockStateChange logging is exceedingly verbose
[ https://issues.apache.org/jira/browse/HDFS-14981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk resolved HDFS-14981. - Resolution: Duplicate Yep, I think you're right. Thanks for the pointer [~weichiu]. > BlockStateChange logging is exceedingly verbose > --- > > Key: HDFS-14981 > URL: https://issues.apache.org/jira/browse/HDFS-14981 > Project: Hadoop HDFS > Issue Type: Bug > Components: logging >Reporter: Nick Dimiduk >Priority: Major > > On a moderately loaded cluster, name node logs are flooded with entries of > {{INFO BlockStateChange...}}, to the tune of ~30 lines per millisecond. This > provides operators with little to no usable information. I suggest reducing > this log message to {{DEBUG}}. Perhaps this information (and other logging > related to it) should be directed to a dedicated block-audit.log file that > can be queried, rotated on a separate schedule from the log of the main > process. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14922) On StartUp, snapshot modification time got changed
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14922: --- Summary: On StartUp, snapshot modification time got changed (was: On StartUp , Snapshot modification time got changed) > On StartUp, snapshot modification time got changed > -- > > Key: HDFS-14922 > URL: https://issues.apache.org/jira/browse/HDFS-14922 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, > HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch > > > Snapshot modification time got changed on namenode restart -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14922) Prevent snapshot modification time got change on startup
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14922: --- Summary: Prevent snapshot modification time got change on startup (was: On StartUp, snapshot modification time got changed) > Prevent snapshot modification time got change on startup > > > Key: HDFS-14922 > URL: https://issues.apache.org/jira/browse/HDFS-14922 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, > HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch > > > Snapshot modification time got changed on namenode restart -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14922) Prevent snapshot modification time got change on startup
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14922: --- Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > Prevent snapshot modification time got change on startup > > > Key: HDFS-14922 > URL: https://issues.apache.org/jira/browse/HDFS-14922 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, > HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch > > > Snapshot modification time got changed on namenode restart -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14922) Prevent snapshot modification time got change on startup
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972694#comment-16972694 ] Íñigo Goiri commented on HDFS-14922: Thanks [~hemanthboyina] for the patch and [~virajith] for checking. Committed to trunk. > Prevent snapshot modification time got change on startup > > > Key: HDFS-14922 > URL: https://issues.apache.org/jira/browse/HDFS-14922 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, > HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch > > > Snapshot modification time got changed on namenode restart -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14528: -- Attachment: HDFS-14528.006.patch > Failover from Active to Standby Failed > > > Key: HDFS-14528 > URL: https://issues.apache.org/jira/browse/HDFS-14528 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, > HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.2.Patch, > ZKFC_issue.patch > > > *In a cluster with more than one Standby namenode, manual failover throws an > exception in some cases.* > *When trying to execute the failover command from active to standby,* > *_./hdfs haadmin -failover nn1 nn2_, *the below exception is thrown:* > Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on > connection exception: java.net.ConnectException: Connection refused > This is encountered in the following cases: > Scenario 1: > Namenodes - NN1(Active), NN2(Standby), NN3(Standby) > When trying to manually failover from NN1 to NN2, if NN3 is down an exception is > thrown. > Scenario 2: > Namenodes - NN1(Active), NN2(Standby), NN3(Standby) > ZKFCs - ZKFC1, ZKFC2, ZKFC3 > When trying to manually failover from NN1 to NN3, if NN3's ZKFC (ZKFC3) is > down an exception is thrown. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972701#comment-16972701 ] Chen Liang commented on HDFS-14655: --- We have this fix in our deployment. One thing I found is that it prints a ton of WARN {{java.util.concurrent.CancellationException}} entries in the NN logs. Can we make a fix to suppress these warnings? > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655-05.patch, > HDFS-14655-06.patch, HDFS-14655-07.patch, HDFS-14655-08.patch, > HDFS-14655-branch-2-01.patch, HDFS-14655-branch-2-02.patch, > HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. 
| EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
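The {{OutOfMemoryError}} in the stack trace above comes from unbounded thread creation when calls to a down JournalNode back up. As an illustration of the general mitigation only (this is not the actual HDFS-14655 patch, and the class name is hypothetical), a pool with a fixed thread cap, a bounded queue, and a caller-runs rejection policy turns thread exhaustion into back-pressure:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a capped pool cannot fail with "unable to create new
// native thread" no matter how many tasks are submitted; once the bounded
// queue fills, excess work runs on the submitting thread instead.
class BoundedLoggerExecutor {
    static int runTasks(int nTasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2,                  // hard cap: at most 2 worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(8),                // bounded queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

Every submitted task still completes; the bound only limits concurrency, which is why this pattern avoids the crash without dropping work.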
[jira] [Commented] (HDFS-14955) RBF: getQuotaUsage() on mount point should return global quota.
[ https://issues.apache.org/jira/browse/HDFS-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972704#comment-16972704 ] Íñigo Goiri commented on HDFS-14955: Thanks [~LiJinglun] for the patch. * Update the javadoc for {{aggregateQuota()}}. * I think we can skip most of the for loop right before if this is a mount point. > RBF: getQuotaUsage() on mount point should return global quota. > --- > > Key: HDFS-14955 > URL: https://issues.apache.org/jira/browse/HDFS-14955 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-14955.001.patch > > > When getQuotaUsage() on a mount point path, the quota part should be the > global quota. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14922) Prevent snapshot modification time got change on startup
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972707#comment-16972707 ] Hudson commented on HDFS-14922: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17634 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17634/]) HDFS-14922. Prevent snapshot modification time got change on startup. (inigoiri: rev 40150da1e12a41c2e774fe2a277ddc3988bed239) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSnapshotOp.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshot.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectorySnapshottableFeature.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java > Prevent snapshot modification time got change on startup > > > Key: HDFS-14922 > URL: https://issues.apache.org/jira/browse/HDFS-14922 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, > HDFS-14922.003.patch, HDFS-14922.004.patch, HDFS-14922.005.patch > > > Snapshot modification time got changed on namenode restart -- This 
message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14442: -- Attachment: (was: HDFS-14442.003.patch) > Disagreement between HAUtil.getAddressOfActive and > RpcInvocationHandler.getConnectionId > --- > > Key: HDFS-14442 > URL: https://issues.apache.org/jira/browse/HDFS-14442 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, > HDFS-14442.003.PATCH > > > While working on HDFS-14245, we noticed a discrepancy in some proxy-handling > code. > The description of {{RpcInvocationHandler.getConnectionId()}} states: > {code} > /** >* Returns the connection id associated with the InvocationHandler instance. >* @return ConnectionId >*/ > ConnectionId getConnectionId(); > {code} > It does not make any claims about whether this connection ID will be an > active proxy or not. Yet in {{HAUtil}} we have: > {code} > /** >* Get the internet address of the currently-active NN. This should rarely > be >* used, since callers of this method who connect directly to the NN using > the >* resulting InetSocketAddress will not be able to connect to the active NN > if >* a failover were to occur after this method has been called. >* >* @param fs the file system to get the active address of. >* @return the internet address of the currently-active NN. >* @throws IOException if an error occurs while resolving the active NN. >*/ > public static InetSocketAddress getAddressOfActive(FileSystem fs) > throws IOException { > if (!(fs instanceof DistributedFileSystem)) { > throw new IllegalArgumentException("FileSystem " + fs + " is not a > DFS."); > } > // force client address resolution. 
> fs.exists(new Path("/")); > DistributedFileSystem dfs = (DistributedFileSystem) fs; > DFSClient dfsClient = dfs.getClient(); > return RPC.getServerAddress(dfsClient.getNamenode()); > } > {code} > Where the call {{RPC.getServerAddress()}} eventually terminates into > {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> > {{RPC.getConnectionIdForProxy()}} -> > {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making > an incorrect assumption that {{RpcInvocationHandler}} will necessarily return > an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a > counter-example to this, since the current connection ID may be pointing at, > for example, an Observer NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14442: -- Attachment: HDFS-14442.003.PATCH > Disagreement between HAUtil.getAddressOfActive and > RpcInvocationHandler.getConnectionId > --- > > Key: HDFS-14442 > URL: https://issues.apache.org/jira/browse/HDFS-14442 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, > HDFS-14442.003.PATCH > > > While working on HDFS-14245, we noticed a discrepancy in some proxy-handling > code. > The description of {{RpcInvocationHandler.getConnectionId()}} states: > {code} > /** >* Returns the connection id associated with the InvocationHandler instance. >* @return ConnectionId >*/ > ConnectionId getConnectionId(); > {code} > It does not make any claims about whether this connection ID will be an > active proxy or not. Yet in {{HAUtil}} we have: > {code} > /** >* Get the internet address of the currently-active NN. This should rarely > be >* used, since callers of this method who connect directly to the NN using > the >* resulting InetSocketAddress will not be able to connect to the active NN > if >* a failover were to occur after this method has been called. >* >* @param fs the file system to get the active address of. >* @return the internet address of the currently-active NN. >* @throws IOException if an error occurs while resolving the active NN. >*/ > public static InetSocketAddress getAddressOfActive(FileSystem fs) > throws IOException { > if (!(fs instanceof DistributedFileSystem)) { > throw new IllegalArgumentException("FileSystem " + fs + " is not a > DFS."); > } > // force client address resolution. 
> fs.exists(new Path("/")); > DistributedFileSystem dfs = (DistributedFileSystem) fs; > DFSClient dfsClient = dfs.getClient(); > return RPC.getServerAddress(dfsClient.getNamenode()); > } > {code} > Where the call {{RPC.getServerAddress()}} eventually terminates into > {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> > {{RPC.getConnectionIdForProxy()}} -> > {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making > an incorrect assumption that {{RpcInvocationHandler}} will necessarily return > an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a > counter-example to this, since the current connection ID may be pointing at, > for example, an Observer NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
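The assumption flagged above can be illustrated outside Hadoop with a plain {{java.lang.reflect.Proxy}}. The sketch below (all names hypothetical, not the real HDFS types) shows an invocation handler that, like {{ObserverReadProxyProvider}}, may point at different backends over time, so the "current target" seen through the handler is not necessarily the active node:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyTargetDemo {

    // Hypothetical stand-in for a NameNode RPC interface.
    public interface NameNodeProtocol {
        String whoAmI();
    }

    // Like ObserverReadProxyProvider, this handler may point at different
    // backends over time; "the current target" is whatever it last selected,
    // not necessarily the active node.
    public static class SwitchingHandler implements InvocationHandler {
        public volatile NameNodeProtocol current;

        public SwitchingHandler(NameNodeProtocol initial) {
            this.current = initial;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args)
                throws Throwable {
            return method.invoke(current, args);
        }
    }

    public static NameNodeProtocol wrap(SwitchingHandler handler) {
        return (NameNodeProtocol) Proxy.newProxyInstance(
                ProxyTargetDemo.class.getClassLoader(),
                new Class<?>[] {NameNodeProtocol.class},
                handler);
    }

    public static void main(String[] args) {
        NameNodeProtocol active = () -> "active";
        NameNodeProtocol observer = () -> "observer";
        SwitchingHandler handler = new SwitchingHandler(observer);
        NameNodeProtocol proxy = wrap(handler);

        // Asking the proxy "which server are you connected to?" reports the
        // observer even though an active node exists -- the same unstated
        // assumption HAUtil.getAddressOfActive makes about getConnectionId().
        System.out.println(proxy.whoAmI()); // prints "observer"
        handler.current = active;
        System.out.println(proxy.whoAmI()); // prints "active"
    }
}
```

A caller that treats the handler's current target as "the active server" gets a stale or wrong answer the moment the handler switches, which is exactly the disagreement this issue describes.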
[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972714#comment-16972714 ] Hadoop QA commented on HDFS-14442: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-14442 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-14442 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985651/HDFS-14442.003.PATCH | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28299/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Disagreement between HAUtil.getAddressOfActive and > RpcInvocationHandler.getConnectionId > --- > > Key: HDFS-14442 > URL: https://issues.apache.org/jira/browse/HDFS-14442 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, > HDFS-14442.003.PATCH > > > While working on HDFS-14245, we noticed a discrepancy in some proxy-handling > code. > The description of {{RpcInvocationHandler.getConnectionId()}} states: > {code} > /** >* Returns the connection id associated with the InvocationHandler instance. 
>* @return ConnectionId >*/ > ConnectionId getConnectionId(); > {code} > It does not make any claims about whether this connection ID will be an > active proxy or not. Yet in {{HAUtil}} we have: > {code} > /** >* Get the internet address of the currently-active NN. This should rarely > be >* used, since callers of this method who connect directly to the NN using > the >* resulting InetSocketAddress will not be able to connect to the active NN > if >* a failover were to occur after this method has been called. >* >* @param fs the file system to get the active address of. >* @return the internet address of the currently-active NN. >* @throws IOException if an error occurs while resolving the active NN. >*/ > public static InetSocketAddress getAddressOfActive(FileSystem fs) > throws IOException { > if (!(fs instanceof DistributedFileSystem)) { > throw new IllegalArgumentException("FileSystem " + fs + " is not a > DFS."); > } > // force client address resolution. > fs.exists(new Path("/")); > DistributedFileSystem dfs = (DistributedFileSystem) fs; > DFSClient dfsClient = dfs.getClient(); > return RPC.getServerAddress(dfsClient.getNamenode()); > } > {code} > Where the call {{RPC.getServerAddress()}} eventually terminates into > {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> > {{RPC.getConnectionIdForProxy()}} -> > {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making > an incorrect assumption that {{RpcInvocationHandler}} will necessarily return > an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a > counter-example to this, since the current connection ID may be pointing at, > for example, an Observer NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravuri Sushma sree updated HDFS-14442: -- Attachment: (was: HDFS-14442.003.PATCH) > Disagreement between HAUtil.getAddressOfActive and > RpcInvocationHandler.getConnectionId > --- > > Key: HDFS-14442 > URL: https://issues.apache.org/jira/browse/HDFS-14442 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch > > > While working on HDFS-14245, we noticed a discrepancy in some proxy-handling > code. > The description of {{RpcInvocationHandler.getConnectionId()}} states: > {code} > /** >* Returns the connection id associated with the InvocationHandler instance. >* @return ConnectionId >*/ > ConnectionId getConnectionId(); > {code} > It does not make any claims about whether this connection ID will be an > active proxy or not. Yet in {{HAUtil}} we have: > {code} > /** >* Get the internet address of the currently-active NN. This should rarely > be >* used, since callers of this method who connect directly to the NN using > the >* resulting InetSocketAddress will not be able to connect to the active NN > if >* a failover were to occur after this method has been called. >* >* @param fs the file system to get the active address of. >* @return the internet address of the currently-active NN. >* @throws IOException if an error occurs while resolving the active NN. >*/ > public static InetSocketAddress getAddressOfActive(FileSystem fs) > throws IOException { > if (!(fs instanceof DistributedFileSystem)) { > throw new IllegalArgumentException("FileSystem " + fs + " is not a > DFS."); > } > // force client address resolution. 
> fs.exists(new Path("/")); > DistributedFileSystem dfs = (DistributedFileSystem) fs; > DFSClient dfsClient = dfs.getClient(); > return RPC.getServerAddress(dfsClient.getNamenode()); > } > {code} > Where the call {{RPC.getServerAddress()}} eventually terminates into > {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> > {{RPC.getConnectionIdForProxy()}} -> > {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making > an incorrect assumption that {{RpcInvocationHandler}} will necessarily return > an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a > counter-example to this, since the current connection ID may be pointing at, > for example, an Observer NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14959) [SBNN read] access time should be turned off
[ https://issues.apache.org/jira/browse/HDFS-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14959: --- Fix Version/s: 3.2.2 3.1.4 3.3.0 > [SBNN read] access time should be turned off > > > Key: HDFS-14959 > URL: https://issues.apache.org/jira/browse/HDFS-14959 > Project: Hadoop HDFS > Issue Type: Task > Components: documentation >Reporter: Wei-Chiu Chuang >Assignee: Chao Sun >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > Both Uber and Didi shared that access time has to be switched off to avoid > spiky NameNode RPC process time. If access time is not off entirely, > getBlockLocations RPCs have to update access time and must access the active > NameNode. (that's my understanding. haven't checked the code) > We should record this as a best practice in our doc. > (If you are on the ASF slack, check out this thread > https://the-asf.slack.com/archives/CAD7C52Q3/p1572033324008600) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
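For reference, the switch the issue recommends maps to a single {{hdfs-site.xml}} property: {{dfs.namenode.accesstime.precision}} controls how often access times are persisted (default one hour, 3600000 ms), and setting it to 0 disables access-time updates entirely. A sketch of the change:

```xml
<!-- hdfs-site.xml: a precision of 0 disables access-time updates, so
     getBlockLocations calls need not write to the active NameNode.
     The default is 3600000 (one hour). -->
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>0</value>
</property>
```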
[jira] [Resolved] (HDFS-14959) [SBNN read] access time should be turned off
[ https://issues.apache.org/jira/browse/HDFS-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-14959. Resolution: Fixed Merged the PR to trunk and cherry-picked the commit to branch-3.2 and branch-3.1. Thanks [~csun]! > [SBNN read] access time should be turned off > > > Key: HDFS-14959 > URL: https://issues.apache.org/jira/browse/HDFS-14959 > Project: Hadoop HDFS > Issue Type: Task > Components: documentation >Reporter: Wei-Chiu Chuang >Assignee: Chao Sun >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > Both Uber and Didi shared that access time has to be switched off to avoid > spiky NameNode RPC process time. If access time is not off entirely, > getBlockLocations RPCs have to update access time and must access the active > NameNode. (that's my understanding. haven't checked the code) > We should record this as a best practice in our doc. > (If you are on the ASF slack, check out this thread > https://the-asf.slack.com/archives/CAD7C52Q3/p1572033324008600) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972735#comment-16972735 ] Hadoop QA commented on HDFS-14648: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 35s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} hadoop-hdfs-project: The patch generated 0 new + 112 unchanged - 1 fixed = 112 total (was 113) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 28s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}182m 49s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy | | | hadoop.hdfs.tools.TestDFSZKFailoverController | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14648 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985638/HDFS-14648.009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 40af05af70d8 4.15.0-6
[jira] [Commented] (HDFS-14959) [SBNN read] access time should be turned off
[ https://issues.apache.org/jira/browse/HDFS-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972736#comment-16972736 ] Hudson commented on HDFS-14959: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17636 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17636/]) HDFS-14959: [SBNN read] access time should be turned off (#1706) (weichiu: rev 97ec34e117af71e1a9950b8002131c45754009c7) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ObserverNameNode.md > [SBNN read] access time should be turned off > > > Key: HDFS-14959 > URL: https://issues.apache.org/jira/browse/HDFS-14959 > Project: Hadoop HDFS > Issue Type: Task > Components: documentation >Reporter: Wei-Chiu Chuang >Assignee: Chao Sun >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > Both Uber and Didi shared that access time has to be switched off to avoid > spiky NameNode RPC process time. If access time is not off entirely, > getBlockLocations RPCs have to update access time and must access the active > NameNode. (that's my understanding. haven't checked the code) > We should record this as a best practice in our doc. > (If you are on the ASF slack, check out this thread > https://the-asf.slack.com/archives/CAD7C52Q3/p1572033324008600) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2463) Remove unnecessary getServiceInfo calls
Xiaoyu Yao created HDDS-2463: Summary: Remove unnecessary getServiceInfo calls Key: HDDS-2463 URL: https://issues.apache.org/jira/browse/HDDS-2463 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.4.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao OzoneManagerProtocolClientSideTranslatorPB.java Lines 766-772 have multiple impl.getServiceInfo() calls, which can be reduced by adding a local variable. {code:java} resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream() .map(ServiceInfo::getProtobuf) .collect(Collectors.toList())); if (impl.getServiceInfo().getCaCertificate() != null) { resp.setCaCertificate(impl.getServiceInfo().getCaCertificate()); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
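The proposed change is simply hoisting the repeated lookup into a local variable. A minimal sketch of the shape of the fix, using hypothetical stand-in classes (the real OM types are not reproduced here), with a counter to make the reduction visible:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: these stubs mimic the shape of the OM code so the
// effect of hoisting the call into a local variable can be demonstrated.
public class GetServiceInfoRefactor {

    public static class ServiceInfoStub {
        List<String> getServiceInfoList() { return Arrays.asList("om", "scm"); }
        String getCaCertificate() { return "ca-cert"; }
    }

    public static class ImplStub {
        final AtomicInteger lookups = new AtomicInteger();

        ServiceInfoStub getServiceInfo() {
            lookups.incrementAndGet(); // pretend this is the expensive call
            return new ServiceInfoStub();
        }
    }

    // After the refactor: one getServiceInfo() call, reused for both the
    // service list and the CA certificate check.
    public static int buildResponse(ImplStub impl) {
        ServiceInfoStub info = impl.getServiceInfo(); // single lookup
        List<String> services = info.getServiceInfoList();
        String caCert = info.getCaCertificate();
        if (caCert != null) {
            // resp.setCaCertificate(caCert) in the real code
        }
        return impl.lookups.get(); // number of lookups performed
    }

    public static void main(String[] args) {
        System.out.println(buildResponse(new ImplStub())); // prints 1
    }
}
```

The original snippet performs the lookup three times on the same code path; the local variable brings that down to one.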
[jira] [Work logged] (HDDS-2462) Add jq dependency in Contribution guideline
[ https://issues.apache.org/jira/browse/HDDS-2462?focusedWorklogId=342146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342146 ] ASF GitHub Bot logged work on HDDS-2462: Author: ASF GitHub Bot Created on: 12/Nov/19 20:57 Start Date: 12/Nov/19 20:57 Worklog Time Spent: 10m Work Description: anuengineer commented on pull request #145: HDDS-2462. Add jq dependency in Contribution guideline URL: https://github.com/apache/hadoop-ozone/pull/145 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 342146) Time Spent: 20m (was: 10m) > Add jq dependency in Contribution guideline > --- > > Key: HDDS-2462 > URL: https://issues.apache.org/jira/browse/HDDS-2462 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Docker based tests are using JQ to parse JMX pages of different processes, > but the documentation does not mention it as a dependency. > Add it to CONTRIBUTION.MD in the "Additional requirements to execute > different type of tests" section. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2462) Add jq dependency in Contribution guideline
[ https://issues.apache.org/jira/browse/HDDS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer resolved HDDS-2462. Fix Version/s: 0.5.0 Resolution: Fixed Committed to Master branch. > Add jq dependency in Contribution guideline > --- > > Key: HDDS-2462 > URL: https://issues.apache.org/jira/browse/HDDS-2462 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Docker based tests are using JQ to parse JMX pages of different processes, > but the documentation does not mention it as a dependency. > Add it to CONTRIBUTION.MD in the "Additional requirements to execute > different type of tests" section. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2463) Reduce unnecessary getServiceInfo calls
[ https://issues.apache.org/jira/browse/HDDS-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2463: - Summary: Reduce unnecessary getServiceInfo calls (was: Remove unnecessary getServiceInfo calls) > Reduce unnecessary getServiceInfo calls > --- > > Key: HDDS-2463 > URL: https://issues.apache.org/jira/browse/HDDS-2463 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > > OzoneManagerProtocolClientSideTranslatorPB.java Line 766-772 has multiple > impl.getServiceInfo() which can be reduced by adding a local variable. > {code:java} > > resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream() > .map(ServiceInfo::getProtobuf) > .collect(Collectors.toList())); > if (impl.getServiceInfo().getCaCertificate() != null) { > resp.setCaCertificate(impl.getServiceInfo().getCaCertificate()); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2463) Reduce unnecessary getServiceInfo calls
[ https://issues.apache.org/jira/browse/HDDS-2463?focusedWorklogId=342180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342180 ] ASF GitHub Bot logged work on HDDS-2463: Author: ASF GitHub Bot Created on: 12/Nov/19 21:32 Start Date: 12/Nov/19 21:32 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #146: HDDS-2463. Reduce unnecessary getServiceInfo calls. Contributed by Xi… URL: https://github.com/apache/hadoop-ozone/pull/146 …aoyu Yao. ## What changes were proposed in this pull request? Reduce unnecessary getServiceInfo calls. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2463 ## How was this patch tested? Run Ozone RPC related unit tests and acceptance tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 342180) Remaining Estimate: 0h Time Spent: 10m > Reduce unnecessary getServiceInfo calls > --- > > Key: HDDS-2463 > URL: https://issues.apache.org/jira/browse/HDDS-2463 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > OzoneManagerProtocolClientSideTranslatorPB.java Line 766-772 has multiple > impl.getServiceInfo() calls, which can be reduced by adding a local variable. 
> {code:java} > > resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream() > .map(ServiceInfo::getProtobuf) > .collect(Collectors.toList())); > if (impl.getServiceInfo().getCaCertificate() != null) { > resp.setCaCertificate(impl.getServiceInfo().getCaCertificate()); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2463) Reduce unnecessary getServiceInfo calls
[ https://issues.apache.org/jira/browse/HDDS-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2463: - Labels: pull-request-available (was: ) > Reduce unnecessary getServiceInfo calls > --- > > Key: HDDS-2463 > URL: https://issues.apache.org/jira/browse/HDDS-2463 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > > OzoneManagerProtocolClientSideTranslatorPB.java Line 766-772 has multiple > impl.getServiceInfo() which can be reduced by adding a local variable. > {code:java} > > resp.addAllServiceInfo(impl.getServiceInfo().getServiceInfoList().stream() > .map(ServiceInfo::getProtobuf) > .collect(Collectors.toList())); > if (impl.getServiceInfo().getCaCertificate() != null) { > resp.setCaCertificate(impl.getServiceInfo().getCaCertificate()); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call
Attila Doroszlai created HDDS-2464: -- Summary: Avoid unnecessary allocations for FileChannel.open call Key: HDDS-2464 URL: https://issues.apache.org/jira/browse/HDDS-2464 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Datanode Reporter: Attila Doroszlai Assignee: Attila Doroszlai {{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}. Vararg array elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, Set, FileAttribute...)}}. We can call the latter directly instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
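Both overloads named above are in the JDK: {{FileChannel#open(Path, OpenOption...)}} copies its vararg array into a new {{HashSet}} and then delegates to {{FileChannel#open(Path, Set, FileAttribute...)}}. A sketch of the proposed direction, calling the {{Set}}-based overload directly with a reusable {{EnumSet}} so no per-call set allocation is needed (file names here are illustrative, not Ozone code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;

public class FileChannelOpenDemo {

    // Built once and reused: the varargs overload would rebuild an equivalent
    // HashSet from the option array on every open() call.
    private static final EnumSet<StandardOpenOption> WRITE_OPTIONS =
            EnumSet.of(StandardOpenOption.CREATE, StandardOpenOption.WRITE);

    // Writes a small payload via the Set-based overload and returns the
    // resulting file size.
    public static long writeHello(Path target) throws IOException {
        try (FileChannel ch = FileChannel.open(target, WRITE_OPTIONS)) {
            ch.write(ByteBuffer.wrap("hello".getBytes()));
        }
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunk", ".tmp");
        System.out.println(writeHello(tmp)); // prints 5
        Files.delete(tmp);
    }
}
```

On a hot path such as chunk writes, hoisting the option set into a shared {{EnumSet}} avoids both the vararg array and the {{HashSet}} allocation per call.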
[jira] [Updated] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions
[ https://issues.apache.org/jira/browse/HDDS-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDDS-2105: - Description: Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214 There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when HDDS-2007 is committed). They contain some redundant logic and unnecessarily increase code paths. Goal: Merge those functions into fewer ones. was: Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214 There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when HDDS-2007 is committed). They contains some redundant logic and unnecessarily increases code paths. Goal: Merge those functions into one or two. Work will begin after HDDS-2007 is committed. > Merge OzoneClientFactory#getRpcClient functions > --- > > Key: HDDS-2105 > URL: https://issues.apache.org/jira/browse/HDDS-2105 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > > Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214 > There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when > HDDS-2007 is committed). They contain some redundant logic and unnecessarily > increase code paths. > Goal: Merge those functions into fewer ones. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions
[ https://issues.apache.org/jira/browse/HDDS-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2105: - Labels: pull-request-available (was: ) > Merge OzoneClientFactory#getRpcClient functions > --- > > Key: HDDS-2105 > URL: https://issues.apache.org/jira/browse/HDDS-2105 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > > Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214 > There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when > HDDS-2007 is committed). They contains some redundant logic and unnecessarily > increases code paths. > Goal: Merge those functions into fewer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call
[ https://issues.apache.org/jira/browse/HDDS-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2464: - Labels: pull-request-available (was: ) > Avoid unnecessary allocations for FileChannel.open call > --- > > Key: HDDS-2464 > URL: https://issues.apache.org/jira/browse/HDDS-2464 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > {{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}. Vararg array > elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, > Set, FileAttribute...)}}. We can call the latter > directly instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions
[ https://issues.apache.org/jira/browse/HDDS-2105?focusedWorklogId=342234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342234 ] ASF GitHub Bot logged work on HDDS-2105: Author: ASF GitHub Bot Created on: 12/Nov/19 22:14 Start Date: 12/Nov/19 22:14 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #148: HDDS-2105. Merge OzoneClientFactory#getRpcClient functions URL: https://github.com/apache/hadoop-ozone/pull/148 ## What changes were proposed in this pull request? There are in total 6 overloaded `OzoneClientFactory#getRpcClient` functions now. Some of them are not used or just used once. Remove/merge some of them. (Should be fine to simply remove public function without deprecating at this moment since ozone is still in alpha?) ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2105 ## How was this patch tested? Rerun all existing tests, since this is just a straightforward refactoring. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 342234) Remaining Estimate: 0h Time Spent: 10m > Merge OzoneClientFactory#getRpcClient functions > --- > > Key: HDDS-2105 > URL: https://issues.apache.org/jira/browse/HDDS-2105 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214 > There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when > HDDS-2007 is committed). They contains some redundant logic and unnecessarily > increases code paths. > Goal: Merge those functions into fewer. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call
[ https://issues.apache.org/jira/browse/HDDS-2464?focusedWorklogId=342233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342233 ] ASF GitHub Bot logged work on HDDS-2464: Author: ASF GitHub Bot Created on: 12/Nov/19 22:14 Start Date: 12/Nov/19 22:14 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #147: HDDS-2464. Avoid unnecessary allocations for FileChannel.open call URL: https://github.com/apache/hadoop-ozone/pull/147 ## What changes were proposed in this pull request? `ChunkUtils` calls `FileChannel#open(Path, OpenOption...)`. Vararg array elements are then added to a new `HashSet` to be passed to `FileChannel#open(Path, Set, FileAttribute...)`. We can call the latter directly instead. https://issues.apache.org/jira/browse/HDDS-2464 ## How was this patch tested? Ran `TestChunkUtils`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 342233) Remaining Estimate: 0h Time Spent: 10m > Avoid unnecessary allocations for FileChannel.open call > --- > > Key: HDDS-2464 > URL: https://issues.apache.org/jira/browse/HDDS-2464 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}. Vararg array > elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, > Set, FileAttribute...)}}. We can call the latter > directly instead. 
[jira] [Work started] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions
[ https://issues.apache.org/jira/browse/HDDS-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2105 started by Siyao Meng. > Merge OzoneClientFactory#getRpcClient functions > --- > > Key: HDDS-2105 > URL: https://issues.apache.org/jira/browse/HDDS-2105 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214 > There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when > HDDS-2007 is committed). They contain some redundant logic and unnecessarily > increase the number of code paths. > Goal: Merge those functions into fewer.
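The funneling pattern behind this refactor can be sketched as below. This is not the real `OzoneClientFactory` API — the actual methods take a `Configuration` and return an `OzoneClient`; the `String` return type, the port `9862`, and the `"default"` service id here are stand-ins chosen only to keep the sketch self-contained. The point is the shape: every convenience overload delegates to one canonical method, so there is a single code path.

```java
// Hypothetical sketch: collapsing several getRpcClient overloads into one
// canonical method that thin convenience wrappers delegate to.
public final class OzoneClientFactorySketch {

    public static String getRpcClient(String omHost) {
        // Assumed defaults; the real factory reads these from configuration.
        return getRpcClient(omHost, 9862, "default");
    }

    public static String getRpcClient(String omHost, int omPort) {
        return getRpcClient(omHost, omPort, "default");
    }

    // Single code path: every overload funnels into this method.
    public static String getRpcClient(String omHost, int omPort, String serviceId) {
        return serviceId + "://" + omHost + ":" + omPort;
    }

    public static void main(String[] args) {
        System.out.println(getRpcClient("om1"));
    }
}
```

With this structure, fixing a bug or adding validation in the canonical method covers all entry points at once, which is the redundancy the issue aims to remove.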
[jira] [Updated] (HDDS-2464) Avoid unnecessary allocations for FileChannel.open call
[ https://issues.apache.org/jira/browse/HDDS-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2464: --- Status: Patch Available (was: Open) > Avoid unnecessary allocations for FileChannel.open call > --- > > Key: HDDS-2464 > URL: https://issues.apache.org/jira/browse/HDDS-2464 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {{ChunkUtils}} calls {{FileChannel#open(Path, OpenOption...)}}. Vararg array > elements are then added to a new {{HashSet}} to call {{FileChannel#open(Path, > Set, FileAttribute...)}}. We can call the latter > directly instead.
[jira] [Created] (HDFS-14982) Backport HADOOP-16152 to branch-3.1
Siyao Meng created HDFS-14982: - Summary: Backport HADOOP-16152 to branch-3.1 Key: HDFS-14982 URL: https://issues.apache.org/jira/browse/HDFS-14982 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.1.3 Reporter: Siyao Meng Assignee: Siyao Meng HADOOP-16152. Upgrade Eclipse Jetty version to 9.4.x
[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs
[ https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972859#comment-16972859 ] Wei-Chiu Chuang commented on HDFS-14884: Sorry, I missed it. I'll review it for sure. > Add sanity check that zone key equals feinfo key while setting Xattrs > - > > Key: HDFS-14884 > URL: https://issues.apache.org/jira/browse/HDFS-14884 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, hdfs >Affects Versions: 2.11.0 >Reporter: Mukul Kumar Singh >Assignee: Yuval Degani >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.11.0 > > Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, > HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch > > > Currently, it is possible to set an extended attribute where the zone key is > not the same as the feinfo key. This jira will add a precondition check before setting > it.
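The proposed precondition amounts to a simple equality check between the encryption zone's key name and the key name inside the file's `FileEncryptionInfo` xattr. The sketch below is a hedged illustration only — the method and class names are invented, and the actual patch wires the check into the NameNode's xattr-setting path rather than a standalone helper.

```java
// Hypothetical sketch of the sanity check: before persisting an encryption
// xattr, verify the FileEncryptionInfo key name matches the zone key name.
public class ZoneKeySanityCheck {

    static void checkZoneKeyMatches(String zoneKeyName, String feInfoKeyName) {
        if (!zoneKeyName.equals(feInfoKeyName)) {
            throw new IllegalArgumentException(
                "FileEncryptionInfo key " + feInfoKeyName
                + " does not match encryption zone key " + zoneKeyName);
        }
    }

    public static void main(String[] args) {
        checkZoneKeyMatches("zoneKey1", "zoneKey1"); // matching keys pass silently
        try {
            checkZoneKeyMatches("zoneKey1", "otherKey");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected mismatched key");
        }
    }
}
```

Rejecting the mismatch up front prevents a file from ending up undecryptable with the zone's key, which is the failure mode the issue guards against.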
[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972872#comment-16972872 ] Hadoop QA commented on HDFS-14528: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 37s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 36s{color} | {color:orange} root: The patch generated 3 new + 36 unchanged - 0 fixed = 39 total (was 36) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 56s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 41s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}230m 5s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.tools.TestObserverManualFailover | | | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer | | | hadoop.hdfs.server.balancer.TestBalancer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14528 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985649/HDFS-14528.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5c78eec29cb6 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874 ] Bharat Viswanadham commented on HDDS-2356: -- Hi [~timmylicheng] Thanks for sharing the logs. I see an abort multipart upload request for the key plc_1570863541668_9278 once complete multipart upload failed. 2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | op=COMPLETE_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationIn fo=[], multipartList=[partNumber: 1 5626 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085" 5627 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158" . . 5911 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258" 5912 ]} | ret=FAILURE | INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278 2019-11-08 20:08:24,963 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ABORT_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo= []} | ret=SUCCESS | And after that still, allocateBlock is continuing for the key because the entry from openKeyTable is not removed by abortMultipartUpload request.(Abort removed only entry which has been created during initiateMPU request, so that is the reason after some time you see the NO_SUCH_MULTIPART_UPLOAD error during commitMultipartUploadKey, as we removed entry from MultipartInfo table. 
(But the strange thing I have observed is that the clientID does not match any of the names in the part list, even though the last component of each partName should be the clientID.) Also, from the OM audit log I see only partNumber 1 followed by a list of part names; it should show a partName/partNumber pair for each part, so some of the log may be truncated. # If you can confirm what parts OM has for this key, you can get this from listParts (but this must be done before the abort request). # Check in the OM audit log what part list we received for this key; I am not sure whether it is truncated in the uploaded log. On my cluster the audit logs look like below. {code:java} 2019-11-12 14:57:18,580 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,967 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,974 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 2019-11-12 14:57:54,154 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 6 localID: 103127415126327331 } blockCommitSequenceId: 18 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" ipAddress: "10.65.49.251" hostName: "bh-ozone-3.vpc.cloudera.com" ports { name: 
"RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" networkLocation: "/default-rack" } members { uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" ipAddress: "10.65.51.23" hostName: "bh-ozone-4.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" networkLocation: "/default-rack" } members { uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b" ipAddress: "10.65.53.160" hostName: "bh-ozone-2.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b" networkLocation: "/default-rack" } state: PIPELINE_OPEN type: RATIS factor: T
[jira] [Resolved] (HDFS-14792) [SBN read] StandbyNode does not come out of safemode while adding new blocks.
[ https://issues.apache.org/jira/browse/HDFS-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14792. Fix Version/s: 2.10.1 Resolution: Fixed This turned out to be related to the same race condition between edits {{OP_ADD_BLOCK}} and IBRs as HDFS-14941. We do not see any delays in leaving safemode on StandbyNode after the HDFS-14941 fix. Closing this as fixed. > [SBN read] StandbyNode does not come out of safemode while adding new blocks. > > > Key: HDFS-14792 > URL: https://issues.apache.org/jira/browse/HDFS-14792 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Priority: Major > Fix For: 2.10.1 > > > During startup, StandbyNode reports that it needs an additional X blocks to reach > the threshold 1.., where X changes up and down. > This is because with fast tailing, SBN adds new blocks from edits while DNs > have not reported replicas yet. Being in SafeMode, SBN counts new blocks > towards the threshold and can stay in SafeMode for a long time. > By design, the purpose of startup SafeMode is to disallow modifications of > the namespace and blocks map until all DN replicas are reported.
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:17 PM: - Hi [~timmylicheng] Thanks for sharing the logs. I see an abort multipart upload request for the key plc_1570863541668_9278 once complete multipart upload failed. {code:java} 2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | op=COMPLETE_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationIn fo=[], multipartList=[partNumber: 1 5626 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085" 5627 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158" . . 5911 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258" 5912 ]} | ret=FAILURE | INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey: plc_1570863541668_9278 2019-11-08 20:08:24,963 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ABORT_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo= []} {code} And after that still, allocateBlock is continuing for the key because the entry from openKeyTable is not removed by abortMultipartUpload request.(Abort removed only entry which has been created during initiateMPU request, so that is the reason after some time you see the NO_SUCH_MULTIPART_UPLOAD error during commitMultipartUploadKey, as we removed entry from MultipartInfo table. 
(But the strange thing I have observed is that the clientID does not match any of the names in the part list, even though the last component of each partName should be the clientID.) Also, from the OM audit log I see only partNumber 1 followed by a list of part names; it should show a partName/partNumber pair for each part, so some of the log may be truncated. # If you can confirm what parts OM has for this key, you can get this from listParts (but this must be done before the abort request). # Check in the OM audit log what part list we received for this key; I am not sure whether it is truncated in the uploaded log. On my cluster the audit logs look like below: for completeMultipartUpload I can see both partNumber and partName, whereas the uploaded log does not show this. {code:java} 2019-11-12 14:57:18,580 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,967 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | ret=SUCCESS | 2019-11-12 14:57:53,974 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS | 2019-11-12 14:57:54,154 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 6 localID: 103127415126327331 } blockCommitSequenceId: 18 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid: 
"a5fba53c-3aa9-48a7-8272-34c606f93bc6" ipAddress: "10.65.49.251" hostName: "bh-ozone-3.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" networkLocation: "/default-rack" } members { uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" ipAddress: "10.65.51.23" hostName: "bh-ozone-4.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" networkLocation: "/default-rack" } members { uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b" ipAddress: "10.65.53.160" hostName: "bh-ozone-2.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDA
[jira] [Comment Edited] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972895#comment-16972895 ] Siyao Meng edited comment on HDFS-14283 at 11/12/19 11:20 PM: -- [~leosun08] I discussed with [~weichiu]. I'm fine with ditching the sorting logic on the server side so that we don't need to make any server side changes in this patch. One reason is that in most cases there will only be one cached replica for a block. We will simply allow the client to prefer the cached replica with a configuration option then. was (Author: smeng): [~leosun08] I discussed with [~weichiu]. I'm fine with ditching the sorting logic on the server side so that we don't need to make any server side changed in this patch. One reason is that in most cases there will only be one cached replica for a block. We will simply allow the client to prefer the cached replica with a configuration option then. > DFSInputStream to prefer cached replica > --- > > Key: HDFS-14283 > URL: https://issues.apache.org/jira/browse/HDFS-14283 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 > Environment: HDFS Caching >Reporter: Wei-Chiu Chuang >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, > HDFS-14283.003.patch, HDFS-14283.004.patch > > > HDFS Caching offers performance benefits. However, currently NameNode does > not treat cached replica with higher priority, so HDFS caching is only useful > when cache replication = 3, that is to say, all replicas are cached in > memory, so that a client doesn't randomly pick an uncached replica. > HDFS-6846 proposed to let NameNode give higher priority to cached replica. > Changing a logic in NameNode is always tricky so that didn't get much > traction. Here I propose a different approach: let client (DFSInputStream) > prefer cached replica. 
> A {{LocatedBlock}} object already contains cached replica location so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
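The client-side preference discussed here can be sketched as below. All names are illustrative — the real logic would live inside `DFSInputStream#getBestNodeDNAddrPair()` operating on `DatanodeInfo[]` arrays from `LocatedBlock`, not on strings — but the selection rule is the one proposed: when the (assumed) config flag is on, scan the block's locations and pick the first one that also appears among the cached locations, otherwise fall back to the existing order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hedged sketch of the HDFS-14283 idea: when choosing a datanode for a
// block, prefer one that holds the replica cached in memory.
public class PreferCachedReplica {

    static String chooseDatanode(List<String> locations,
                                 Set<String> cachedLocations,
                                 boolean preferCached) {
        if (preferCached) {
            for (String dn : locations) {
                if (cachedLocations.contains(dn)) {
                    return dn; // first location holding a cached replica wins
                }
            }
        }
        return locations.get(0); // fall back to the normally sorted order
    }

    public static void main(String[] args) {
        List<String> locs = Arrays.asList("dn1", "dn2", "dn3");
        Set<String> cached = Set.of("dn2");
        System.out.println(chooseDatanode(locs, cached, true));
    }
}
```

Because `LocatedBlock` already carries the cached-replica locations, this needs no NameNode change — which is the appeal of the client-side approach over HDFS-6846.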
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:22 PM: - Hi [~timmylicheng] Thanks for sharing the logs. I see an abort multipart upload request for the key plc_1570863541668_9278 after the complete multipart upload failed.
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 | op=COMPLETE_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[], multipartList=[partNumber: 1 5626 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085" 5627 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158" . . 5911 partName: "/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258" 5912 ]} | ret=FAILURE | INVALID_PART org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s325d55ad283aa400af464c76d713c07ad bucket: ozone-test key: plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO | OMAudit | user=root | ip=9.134.50.210 | op=ABORT_MULTIPART_UPLOAD {volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test, key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]}
{code}
And after that, allocateBlock still continues for the key, because the key's entry in the openKeyTable is not removed by the abortMultipartUpload request. (Abort removes only the entry created during the initiateMPU request; that is why, after some time, you see the NO_SUCH_MULTIPART_UPLOAD error during commitMultipartUploadKey, as the entry was removed from the MultipartInfo table.)
(But the strange thing I have observed is that the clientID does not match any of the names in the part list; the last segment of a partName is the clientID.) And from the OM audit log, I see partNumber 1 and then a list of multipart names; I am not sure if some of the log is truncated here, as it should show both part name and partNumber.
# Please confirm what parts OM has for this key; you can get this from listParts (but this should be done before the abort request).
# Check in the OM audit log what part list we get for this key; I am not sure whether it is truncated in the uploaded log.
On my cluster the audit logs look like below, where for completeMultipartUpload I can see both partNumber and partName (whereas I don't see this in the uploaded log):
{code:java}
2019-11-12 14:57:18,580 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=0, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS |
2019-11-12 14:57:53,967 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[]} | ret=SUCCESS |
2019-11-12 14:57:53,974 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE, keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS |
2019-11-12 14:57:54,154 | INFO | OMAudit | user=root | ip=10.65.53.160 | op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 6 localID: 103127415126327331 } blockCommitSequenceId: 18 } offset: 0 length: 5242880 createVersion: 0 pipeline { leaderID: "" members { uuid:
"a5fba53c-3aa9-48a7-8272-34c606f93bc6" ipAddress: "10.65.49.251" hostName: "bh-ozone-3.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6" networkLocation: "/default-rack" } members { uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" ipAddress: "10.65.51.23" hostName: "bh-ozone-4.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDALONE" value: 9859 } networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d" networkLocation: "/default-rack" } members { uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b" ipAddress: "10.65.53.160" hostName: "bh-ozone-2.vpc.cloudera.com" ports { name: "RATIS" value: 9858 } ports { name: "STANDA
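For readers following the INVALID_PART discussion above: conceptually, OM records a part name per part number as parts are committed, and completeMultipartUpload fails if the client-supplied list does not match what OM recorded. The sketch below is a hypothetical, self-contained illustration of that check; the class and method names are invented and this is not the actual OM code.

```java
import java.util.Map;

// Hypothetical sketch of the complete-multipart validation described above:
// every (partNumber, partName) pair the client sends must match the entry OM
// recorded when the part was committed, otherwise the upload fails.
public class PartListValidator {

    // Returns null on success, or an error code such as "INVALID_PART".
    public static String validate(Map<Integer, String> clientParts,
                                  Map<Integer, String> omRecordedParts) {
        for (Map.Entry<Integer, String> e : clientParts.entrySet()) {
            String recorded = omRecordedParts.get(e.getKey());
            if (recorded == null || !recorded.equals(e.getValue())) {
                return "INVALID_PART";
            }
        }
        return null;
    }
}
```

This is why listParts output (taken before the abort, while the MultipartInfo entry still exists) is the useful thing to compare against the part list in the audit log.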
[jira] [Created] (HDDS-2465) S3 Multipart upload failing
Bharat Viswanadham created HDDS-2465: Summary: S3 Multipart upload failing Key: HDDS-2465 URL: https://issues.apache.org/jira/browse/HDDS-2465 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham When I run the attached Java program, I get the below error during completeMultipartUpload. {code:java} ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), S3 Extended Request ID: 7tnVbqgc4bgb at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867) at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464) at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96){code} When I debug, it appears the request has not been received by the S3Gateway, and I don't see any trace of it in the audit log. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2465) S3 Multipart upload failing
[ https://issues.apache.org/jira/browse/HDDS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2465: - Attachment: MPU.java > S3 Multipart upload failing > --- > > Key: HDDS-2465 > URL: https://issues.apache.org/jira/browse/HDDS-2465 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Priority: Major > Attachments: MPU.java > > > When I run attached java program, facing below error, during > completeMultipartUpload. > {code:java} > ERROR StatusLogger No Log4j 2 configuration file found. Using default > configuration (logging only errors to the console), or user programmatically > provided configurations. Set system property 'log4j2.debug' to show Log4j 2 > internal initialization logging. See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2ERROR StatusLogger No Log4j 2 > configuration file found. Using default configuration (logging only errors to > the console), or user programmatically provided configurations. Set system > property 'log4j2.debug' to show Log4j 2 internal initialization logging. 
See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2Exception in thread "main" > com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: > Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: > c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), > S3 Extended Request ID: 7tnVbqgc4bgb at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) > at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867) at > com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464) > at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96){code} > When I debug it is not the request is not been received by S3Gateway, and I > don't see any trace of this in audit log. 
[jira] [Commented] (HDDS-2465) S3 Multipart upload failing
[ https://issues.apache.org/jira/browse/HDDS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972951#comment-16972951 ] Bharat Viswanadham commented on HDDS-2465: -- cc [~elek] > S3 Multipart upload failing > --- > > Key: HDDS-2465 > URL: https://issues.apache.org/jira/browse/HDDS-2465 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Priority: Major > Attachments: MPU.java > > > When I run attached java program, facing below error, during > completeMultipartUpload. > {code:java} > ERROR StatusLogger No Log4j 2 configuration file found. Using default > configuration (logging only errors to the console), or user programmatically > provided configurations. Set system property 'log4j2.debug' to show Log4j 2 > internal initialization logging. See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2ERROR StatusLogger No Log4j 2 > configuration file found. Using default configuration (logging only errors to > the console), or user programmatically provided configurations. Set system > property 'log4j2.debug' to show Log4j 2 internal initialization logging. 
See > https://logging.apache.org/log4j/2.x/manual/configuration.html for > instructions on how to configure Log4j 2Exception in thread "main" > com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: > Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: > c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), > S3 Extended Request ID: 7tnVbqgc4bgb at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) > at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921) at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867) at > com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464) > at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96){code} > When I debug it is not the request is not been received by S3Gateway, and I > don't see any trace of this in audit log. 
[jira] [Updated] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14648: --- Attachment: HDFS-14648.010.patch
> DeadNodeDetector basic model
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, HDFS-14648.009.patch, HDFS-14648.010.patch
>
> This Jira constructs the DeadNodeDetector state machine model. The functions it implements are as follows:
> # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode of the block is found to be inaccessible, the DataNode is put into DeadNodeDetector#deadnode (HDFS-14649 will optimize this part). When a DataNode is not accessible, it is likely that the replica has been removed from it; therefore, this needs to be confirmed by re-probing and requires higher-priority processing.
> # DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode; if the access is successful, the node is removed from DeadNodeDetector#deadnode. Continuous detection of dead nodes is necessary: a DataNode may rejoin the cluster after a service restart or machine repair, and it would be permanently excluded if there were no such probe mechanism.
> # DeadNodeDetector#dfsInputStreamNodes records which DataNodes each DFSInputstream is using. When the DFSInputstream is closed, its entries are removed from DeadNodeDetector#dfsInputStreamNodes.
> # Every time the global deadnode set is fetched, DeadNodeDetector#deadnode is updated: the new DeadNodeDetector#deadnode equals the intersection of the old DeadNodeDetector#deadnode and the DataNodes referenced by DeadNodeDetector#dfsInputStreamNodes.
> # DeadNodeDetector has a switch that is turned off by default. When it is off, each DFSInputstream still uses its own local deadnode set.
> # This feature has been used in the XIAOMI production environment for a long time. It reduced HBase read stalls caused by hanging nodes.
> # Just turn on the DeadNodeDetector switch and you can use it directly; there are no other restrictions. If you don't want to use DeadNodeDetector, just turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
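The state machine described in the issue above can be illustrated with a stripped-down, self-contained model. The names here are hypothetical and the real implementation lives in the HDFS-14648 patches; this sketch only shows the three moving parts: streams register the nodes they failed on, a periodic probe rescues recovered nodes, and the global dead set shrinks to the intersection with nodes still referenced by open streams.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of a shared dead-node detector (hypothetical sketch, not
// the HDFS-14648 code). Nodes and streams are plain strings here.
public class DeadNodeModel {
    private final Set<String> deadNodes = new HashSet<>();
    private final Map<String, Set<String>> streamNodes = new HashMap<>();

    // A stream reports a DataNode it failed to read from.
    public synchronized void addDeadNode(String stream, String node) {
        deadNodes.add(node);
        streamNodes.computeIfAbsent(stream, s -> new HashSet<>()).add(node);
    }

    // Periodic probe: a node that answers again leaves the dead set,
    // so a restarted or repaired DataNode is not excluded forever.
    public synchronized void onProbeSuccess(String node) {
        deadNodes.remove(node);
    }

    // When a stream closes, drop its registrations and shrink the global
    // dead set to nodes still referenced by some open stream.
    public synchronized void closeStream(String stream) {
        streamNodes.remove(stream);
        Set<String> referenced = new HashSet<>();
        for (Set<String> nodes : streamNodes.values()) {
            referenced.addAll(nodes);
        }
        deadNodes.retainAll(referenced);
    }

    public synchronized boolean isDead(String node) {
        return deadNodes.contains(node);
    }
}
```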
[jira] [Updated] (HDFS-14612) SlowDiskReport won't update when SlowDisks is always empty in heartbeat
[ https://issues.apache.org/jira/browse/HDFS-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibin Huang updated HDFS-14612: Attachment: HDFS-14612-005.patch
> SlowDiskReport won't update when SlowDisks is always empty in heartbeat
> Key: HDFS-14612
> URL: https://issues.apache.org/jira/browse/HDFS-14612
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Haibin Huang
> Assignee: Haibin Huang
> Priority: Major
> Attachments: HDFS-14612-001.patch, HDFS-14612-002.patch, HDFS-14612-003.patch, HDFS-14612-004.patch, HDFS-14612-005.patch, HDFS-14612.patch
>
> I found that the SlowDiskReport won't update when slowDisks is always empty in org.apache.hadoop.hdfs.server.blockmanagement.*handleHeartbeat*; this may lead to an outdated SlowDiskReport staying in the jmx of the namenode until the next time slowDisks isn't empty. So I think the method *checkAndUpdateReportIfNecessary()* should be called first when we want to get the jmx information about the SlowDiskReport; this keeps the SlowDiskReport in jmx always valid.
>
> There are also some incorrect object references in org.apache.hadoop.hdfs.server.datanode.fsdataset.*DataNodeVolumeMetrics*:
> {code:java}
> // Based on writeIoRate
> public long getWriteIoSampleCount() {
>   return syncIoRate.lastStat().numSamples();
> }
> public double getWriteIoMean() {
>   return syncIoRate.lastStat().mean();
> }
> public double getWriteIoStdDev() {
>   return syncIoRate.lastStat().stddev();
> }
> {code}
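The core of the proposed fix, refreshing the report before serving it from JMX rather than relying on a non-empty heartbeat to refresh it, can be sketched as follows. This is a hypothetical simplification with invented names, not the actual NameNode code.

```java
// Hypothetical sketch of the proposed behavior: a staleness check runs on the
// JMX read path, so an outdated slow-disk report is dropped even when no
// subsequent heartbeat carried slow-disk data.
public class SlowDiskReportHolder {
    private final long reportValidityMs;
    private String reportJson = "{}";
    private long lastUpdateMs;

    public SlowDiskReportHolder(long reportValidityMs) {
        this.reportValidityMs = reportValidityMs;
    }

    // Called from heartbeat handling when slow disks are reported.
    public synchronized void update(String json, long nowMs) {
        this.reportJson = json;
        this.lastUpdateMs = nowMs;
    }

    // JMX getter: check and invalidate an outdated report first,
    // instead of only updating when slowDisks is non-empty.
    public synchronized String getReportForJmx(long nowMs) {
        if (nowMs - lastUpdateMs > reportValidityMs) {
            reportJson = "{}";   // outdated entries are dropped
        }
        return reportJson;
    }
}
```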
[jira] [Commented] (HDFS-14969) Fix HDFS client unnecessary failover log printing
[ https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972962#comment-16972962 ] Xudong Cao commented on HDFS-14969: --- cc [~xkrogen] [~vagarychen] [~shv] [~weichiu] I feel it's not good to remove the entire log. The more appropriate way is to update the logic to be aware of how many NNs are configured. We may need to add a new method to the FailoverProxyProvider interface such as getProxiesCount(), and then implement it in all subclasses. What do you think? However, after HDFS-14963 is merged in the future, I feel this problem will be greatly alleviated.
> Fix HDFS client unnecessary failover log printing
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.1.3
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Minor
>
> In a multi-NameNode scenario, suppose there are 3 NNs and the 3rd is the ANN. When a client starts an rpc with the 1st NN, it is silent when failing over from the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 3rd NN it prints some unnecessary logs; in some scenarios these logs can be very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby.
Visit > https://s.apache.org/sbnn-error > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459) > ...{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
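The proposal above (a new getProxiesCount() on FailoverProxyProvider plus a comparison against the failover count in RetryInvocationHandler) boils down to a small predicate. The sketch below is a self-contained illustration of that logic; getProxiesCount() does not exist yet, and the class here is invented for illustration.

```java
// Hypothetical sketch of the proposed logic: suppress the failover log while
// there are still untried NameNodes, and only print once the client has
// cycled through all of them without finding the active NN.
public class FailoverLogPolicy {
    private final int proxiesCount;   // would come from the proposed getProxiesCount()

    public FailoverLogPolicy(int proxiesCount) {
        this.proxiesCount = proxiesCount;
    }

    // failoverCount: how many times this invocation has already failed over.
    public boolean shouldLogFailover(int failoverCount) {
        // With N NNs, the first N-1 failovers are expected while searching
        // for the active NN; only log beyond that.
        return failoverCount >= proxiesCount - 1;
    }
}
```

With 3 configured NNs this stays silent through the two expected failovers and logs only if the client keeps failing over after trying every NN.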
[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing
[ https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972962#comment-16972962 ] Xudong Cao edited comment on HDFS-14969 at 11/13/19 2:34 AM: - cc [~xkrogen] [~vagarychen] [~shv] [~weichiu] I feel it's not good to remove the entire log. The more appropriate way is to update the logic to be aware of how many NNs are configured. We may need to add a new method to the FailoverProxyProvider interface such as getProxiesCount() and implement it in all subclasses. Then We can compare the current failover count and the total number of NNs in RetryInvocationHandler to determine whether to print the failover log. What do you think? However, after the HDFS-14963 is merged in the future, I feel that this problem will be greatly alleviated. was (Author: xudongcao): cc [~xkrogen] [~vagarychen] [~shv] [~weichiu] I feel it's not good to remove the entire log. The more appropriate way is to update the logic to be aware of how many NNs are configured. We may need to add a new method to the FailoverProxyProvider interface such as getProxiesCount() , and then implement it in all subclasses. What do you think? However, after the HDFS-14963 is merged in the future, I feel that this problem will be greatly alleviated. 
> Fix HDFS client unnecessary failover log printing > - > > Key: HDFS-14969 > URL: https://issues.apache.org/jira/browse/HDFS-14969 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.1.3 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > In multi-NameNodes scenario, suppose there are 3 NNs and the 3rd is ANN, and > then a client starts rpc with the 1st NN, it will be silent when failover > from the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd > NN, it prints some unnecessary logs, in some scenarios, these logs will be > very numerous: > {code:java} > 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category READ is not supported in state standby. Visit > https://s.apache.org/sbnn-error > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459) > ...{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14283: --- Attachment: HDFS-14283.005.patch > DFSInputStream to prefer cached replica > --- > > Key: HDFS-14283 > URL: https://issues.apache.org/jira/browse/HDFS-14283 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 > Environment: HDFS Caching >Reporter: Wei-Chiu Chuang >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, > HDFS-14283.003.patch, HDFS-14283.004.patch, HDFS-14283.005.patch > > > HDFS Caching offers performance benefits. However, currently NameNode does > not treat cached replica with higher priority, so HDFS caching is only useful > when cache replication = 3, that is to say, all replicas are cached in > memory, so that a client doesn't randomly pick an uncached replica. > HDFS-6846 proposed to let NameNode give higher priority to cached replica. > Changing a logic in NameNode is always tricky so that didn't get much > traction. Here I propose a different approach: let client (DFSInputStream) > prefer cached replica. > A {{LocatedBlock}} object already contains cached replica location so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
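The proposed client-side preference can be sketched independently of HDFS: given a block's locations and the subset reported as cached, pick a cached, non-excluded node first. This is a hypothetical simplification of what a change to DFSInputStream#getBestNodeDNAddrPair() might do, with plain strings standing in for DatanodeInfo.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: prefer a replica the block reports as cached,
// falling back to the first non-dead location otherwise.
public class ReplicaChooser {
    public static String choose(List<String> locations,
                                Set<String> cachedLocations,
                                Set<String> deadNodes) {
        String fallback = null;
        for (String loc : locations) {
            if (deadNodes.contains(loc)) {
                continue;                // skip locally dead nodes
            }
            if (cachedLocations.contains(loc)) {
                return loc;              // cached replica wins
            }
            if (fallback == null) {
                fallback = loc;
            }
        }
        return fallback;                 // may be null if all nodes are dead
    }
}
```

The point of the proposal is that LocatedBlock already carries the cached-location list, so no NameNode change is required for this preference.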
[jira] [Comment Edited] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972968#comment-16972968 ] Lisheng Sun edited comment on HDFS-14283 at 11/13/19 3:05 AM: -- Thanks for [~smeng] [~weichiu] [~ayushtkn] for good suggestions. i updated the patch and uploaded the v005 patch. Could you mind review it? Thank you a lot. [~weichiu][~ayushtkn] [~smeng] was (Author: leosun08): Thanks for [~smeng] [~weichiu] for good suggestions. i updated the patch and uploaded the v005 patch. Could you mind review it? Thank you a lot. [~weichiu] [~smeng] > DFSInputStream to prefer cached replica > --- > > Key: HDFS-14283 > URL: https://issues.apache.org/jira/browse/HDFS-14283 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 > Environment: HDFS Caching >Reporter: Wei-Chiu Chuang >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, > HDFS-14283.003.patch, HDFS-14283.004.patch, HDFS-14283.005.patch > > > HDFS Caching offers performance benefits. However, currently NameNode does > not treat cached replica with higher priority, so HDFS caching is only useful > when cache replication = 3, that is to say, all replicas are cached in > memory, so that a client doesn't randomly pick an uncached replica. > HDFS-6846 proposed to let NameNode give higher priority to cached replica. > Changing a logic in NameNode is always tricky so that didn't get much > traction. Here I propose a different approach: let client (DFSInputStream) > prefer cached replica. > A {{LocatedBlock}} object already contains cached replica location so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972968#comment-16972968 ] Lisheng Sun commented on HDFS-14283: Thanks for [~smeng] [~weichiu] for good suggestions. i updated the patch and uploaded the v005 patch. Could you mind review it? Thank you a lot. [~weichiu] [~smeng] > DFSInputStream to prefer cached replica > --- > > Key: HDFS-14283 > URL: https://issues.apache.org/jira/browse/HDFS-14283 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 > Environment: HDFS Caching >Reporter: Wei-Chiu Chuang >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, > HDFS-14283.003.patch, HDFS-14283.004.patch, HDFS-14283.005.patch > > > HDFS Caching offers performance benefits. However, currently NameNode does > not treat cached replica with higher priority, so HDFS caching is only useful > when cache replication = 3, that is to say, all replicas are cached in > memory, so that a client doesn't randomly pick an uncached replica. > HDFS-6846 proposed to let NameNode give higher priority to cached replica. > Changing a logic in NameNode is always tricky so that didn't get much > traction. Here I propose a different approach: let client (DFSInputStream) > prefer cached replica. > A {{LocatedBlock}} object already contains cached replica location so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972980#comment-16972980 ] Yiqun Lin commented on HDFS-14648: -- The latest patch looks great; some more comments:
*ClientContext.java*
We need a method to stop the dead node detector thread and call it in DFSClient#close.
{code:java}
/**
 * Close dead node detector thread.
 */
public void stopDeadNodeDetectorThread() {
  if (deadNodeDetectorThr != null) {
    deadNodeDetectorThr.interrupt();
    try {
      deadNodeDetectorThr.join(3000);
    } catch (InterruptedException e) {
      LOG.warn("Encountered exception while waiting to join on dead node detector thread.", e);
    }
  }
}

public synchronized void close() throws IOException {
  if (clientRunning) {
    ...
    // close dead node detector thread
    clientContext.stopDeadNodeDetectorThread();
  }
}
{code}
*DFSInputStream.java*
I haven't seen the call {{dfsClient.addNodeToDeadNodeDetector}} added in method {{createBlockReader}} under this class.
*DFSStripedInputStream.java*
Can we remove dfsClient.addNodeToDeadNodeDetector in this class? It's not expected to enable dead node detection in EC mode.
{code:java}
fetchBlockAt(block.getStartOffset());
- addToDeadNodes(dnInfo.info);
+ addToLocalDeadNodes(dnInfo.info);
+ dfsClient.addNodeToDeadNodeDetector(this, dnInfo.info); <=== to be removed
}
{code}
Can we also fix this whitespace warning?
{noformat}
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDeadNodeDetection.java:113: public void testDeadNodeDetectionInMultipleDFSInputStream()
{noformat}
Everything else looks good to me now.
> DeadNodeDetector basic model > > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, > HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, > HDFS-14648.009.patch, HDFS-14648.010.patch > > > This Jira constructs DeadNodeDetector state machine model. The function it > implements as follow: > # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode > of the block is found to inaccessible, put the DataNode into > DeadNodeDetector#deadnode.(HDFS-14649) will optimize this part. Because when > DataNode is not accessible, it is likely that the replica has been removed > from the DataNode.Therefore, it needs to be confirmed by re-probing and > requires a higher priority processing. > # DeadNodeDetector will periodically detect the Node in > DeadNodeDetector#deadnode, If the access is successful, the Node will be > moved from DeadNodeDetector#deadnode. Continuous detection of the dead node > is necessary. The DataNode need rejoin the cluster due to a service > restart/machine repair. The DataNode may be permanently excluded if there is > no added probe mechanism. > # DeadNodeDetector#dfsInputStreamNodes Record the DFSInputstream using > DataNode. When the DFSInputstream is closed, it will be moved from > DeadNodeDetector#dfsInputStreamNodes. > # Every time get the global deanode, update the DeadNodeDetector#deadnode. > The new DeadNodeDetector#deadnode Equals to the intersection of the old > DeadNodeDetector#deadnode and the Datanodes are by > DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is turned off by default. When it is > closed, each DFSInputstream still uses its own local deadnode. 
> # This feature has been used in the XIAOMI production environment for a long time. It reduced HBase read stalls caused by node hangs.
> # Just turn on the DeadNodeDetector switch and you can use it directly; there are no other restrictions. If you don't want to use DeadNodeDetector, just turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
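Step 4 of the model above amounts to a set intersection: the new global deadnode set keeps only nodes still referenced by some open DFSInputStream. A minimal sketch, assuming String node IDs for simplicity (the method and class names are illustrative, not the actual DeadNodeDetector fields):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of step 4: the new deadnode set is the intersection
// of the old deadnode set and the DataNodes currently referenced by open
// input streams. Not actual DeadNodeDetector code.
public class DeadNodeIntersection {

    static Set<String> updateDeadNodes(Set<String> oldDeadNodes,
                                       Set<String> referencedNodes) {
        // copy so the caller's set is not mutated, then intersect
        Set<String> updated = new HashSet<>(oldDeadNodes);
        updated.retainAll(referencedNodes);
        return updated;
    }

    public static void main(String[] args) {
        Set<String> dead = new HashSet<>(Set.of("dn1", "dn2", "dn3"));
        Set<String> referenced = new HashSet<>(Set.of("dn2", "dn3", "dn4"));
        // dn1 is no longer referenced by any stream, so it drops out
        System.out.println(updateDeadNodes(dead, referenced));
    }
}
```

The effect is that a dead node whose streams have all been closed stops being tracked globally, which keeps the shared set from growing without bound.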
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972990#comment-16972990 ] Lisheng Sun commented on HDFS-14648: Hi [~linyiqun]
{quote}
DFSInputStream.java
I haven't seen the call dfsClient.addNodeToDeadNodeDetector added in method createBlockReader under this class.
{quote}
I do not find the method DFSInputStream#createBlockReader; createBlockReader should be in DFSStripedInputStream.
[jira] [Comment Edited] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972990#comment-16972990 ] Lisheng Sun edited comment on HDFS-14648 at 11/13/19 3:52 AM: -- Hi [~linyiqun]
{quote}
DFSInputStream.java
I haven't seen the call dfsClient.addNodeToDeadNodeDetector added in method createBlockReader under this class.
{quote}
I do not find the method DFSInputStream#createBlockReader; createBlockReader should be in DFSStripedInputStream. Please correct me if I am wrong. Thank you.
[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model
[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972994#comment-16972994 ] Yiqun Lin commented on HDFS-14648: -- [~leosun08], sorry for the confusion, you are right. Please remove this change in DFSStripedInputStream and address the other comments. Thanks.