[jira] [Resolved] (HDDS-1015) Cleanup snapshot repository settings

2019-10-24 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-1015.
--
Resolution: Won't Fix

> Cleanup snapshot repository settings
> 
>
> Key: HDDS-1015
> URL: https://issues.apache.org/jira/browse/HDDS-1015
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDDS-1015.00.patch
>
>
> Now we can clean up the snapshot repository settings from hadoop-hdds/pom.xml 
> and hadoop-ozone/pom.xml.
> Since we have moved our dependencies from Hadoop 3.2.1-SNAPSHOT to 3.2.0 as 
> part of HDDS-993, we no longer require them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2365) TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky

2019-10-24 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2365 started by Attila Doroszlai.
--
> TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky
> ---
>
> Key: HDDS-2365
> URL: https://issues.apache.org/jira/browse/HDDS-2365
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>
> TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky, failing in 
> CI intermittently:
> * 
> https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2360-9pxww/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt
> * 
> https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2352-cxhw9/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2365) TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky

2019-10-24 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2365:
--

 Summary: TestRatisPipelineProvider#testCreatePipelinesDnExclude is 
flaky
 Key: HDDS-2365
 URL: https://issues.apache.org/jira/browse/HDDS-2365
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky, failing in CI 
intermittently:

* 
https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2360-9pxww/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt
* 
https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2352-cxhw9/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959434#comment-16959434
 ] 

Hadoop QA commented on HDFS-14730:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 450 unchanged - 7 fixed = 450 total (was 457) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 43s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14730 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983981/HDFS-14730.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 6ad3d5d935c6 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0db0f1e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28174/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28174/testReport/ |
| Max. process+thread count | 2836 (vs. ulimit of 5500) |

[jira] [Updated] (HDFS-14933) Fixing a typo in documentation of Observer NameNode

2019-10-24 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14933:
--
Attachment: HDFS-14933.001.patch
Status: Patch Available  (was: Open)

> Fixing a typo in documentation of Observer NameNode
> --
>
> Key: HDFS-14933
> URL: https://issues.apache.org/jira/browse/HDFS-14933
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: HDFS-14933.001.patch
>
>
> Fix a typo in the documentation of Observer NameNode:
> https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
> This 
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period</name>
>   <value>10s</value>
> </property>
> {code}
> should be changed to 
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period.backoff-max</name>
>   <value>10s</value>
> </property>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14933) Fixing a typo in documentation of Observer NameNode

2019-10-24 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14933:
--
Description: 
Fix a typo in the documentation of Observer NameNode:
https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html

This 
{code}
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>10s</value>
</property>
{code}

should be changed to 
{code}
<property>
  <name>dfs.ha.tail-edits.period.backoff-max</name>
  <value>10s</value>
</property>
{code}


  was:
Fix a typo in the documentation of Observer NameNode.


This 
{code}
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>10s</value>
</property>
{code}

should be changed to 
{code}
<property>
  <name>dfs.ha.tail-edits.period.backoff-max</name>
  <value>10s</value>
</property>
{code}



> Fixing a typo in documentation of Observer NameNode
> --
>
> Key: HDFS-14933
> URL: https://issues.apache.org/jira/browse/HDFS-14933
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
>
> Fix a typo in the documentation of Observer NameNode:
> https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
> This 
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period</name>
>   <value>10s</value>
> </property>
> {code}
> should be changed to 
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period.backoff-max</name>
>   <value>10s</value>
> </property>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14933) Fixing a typo in documentation of Observer NameNode

2019-10-24 Thread Xieming Li (Jira)
Xieming Li created HDFS-14933:
-

 Summary: Fixing a typo in documentation of Observer NameNode
 Key: HDFS-14933
 URL: https://issues.apache.org/jira/browse/HDFS-14933
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Xieming Li
Assignee: Xieming Li


Fix a typo in the documentation of Observer NameNode.


This 
{code}
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>10s</value>
</property>
{code}

should be changed to 
{code}
<property>
  <name>dfs.ha.tail-edits.period.backoff-max</name>
  <value>10s</value>
</property>
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2364) Add an OM metric to find the false positive rate for keyMayExist

2019-10-24 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2364:
---

 Summary: Add an OM metric to find the false positive rate for keyMayExist
 Key: HDDS-2364
 URL: https://issues.apache.org/jira/browse/HDDS-2364
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Affects Versions: 0.5.0
Reporter: Mukul Kumar Singh


Add an OM metric to find the false positive rate for keyMayExist.
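
A minimal sketch of how such a metric could be derived, with hypothetical names 
(not the actual OM metrics code): count how often keyMayExist reports that a 
key may exist, and how often that report turns out to be wrong on the 
definitive lookup.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; real OM metrics would be registered via Hadoop's
// metrics system rather than kept in plain fields.
public class KeyMayExistMetrics {
  // times keyMayExist() returned true
  private final AtomicLong mayExistHits = new AtomicLong();
  // ...of which the key was actually absent (a bloom-filter false positive)
  private final AtomicLong falsePositives = new AtomicLong();

  public void recordLookup(boolean mayExist, boolean actuallyExists) {
    if (mayExist) {
      mayExistHits.incrementAndGet();
      if (!actuallyExists) {
        falsePositives.incrementAndGet();
      }
    }
  }

  // false positive rate = falsePositives / mayExistHits
  public double falsePositiveRate() {
    long hits = mayExistHits.get();
    return hits == 0 ? 0.0 : (double) falsePositives.get() / hits;
  }
}
{code}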



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2355:

Priority: Blocker  (was: Critical)

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: 
> WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in 
> default WriteCommitted mode). If it is not due to corruption, the WAL must be 
> emptied before changing the WritePolicy.
> om_1    |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-3807) hadoop.http.filter.initializers not applied to webhdfs urls

2019-10-24 Thread Clay B. (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clay B. resolved HDFS-3807.
---
Target Version/s:   (was: )
  Resolution: Duplicate

> hadoop.http.filter.initializers not applied to webhdfs urls
> ---
>
> Key: HDFS-3807
> URL: https://issues.apache.org/jira/browse/HDFS-3807
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 0.23.3
>Reporter: Thomas Graves
>Priority: Major
>
> I was messing with the HTTP filters and noticed that they don't get applied 
> when going to the webhdfs URIs.  This might also apply to the other internal 
> namenode servlets.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2355:

Fix Version/s: 0.5.0

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: 
> WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in 
> default WriteCommitted mode). If it is not due to corruption, the WAL must be 
> emptied before changing the WritePolicy.
> om_1    |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959419#comment-16959419
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/25/19 4:25 AM:


The above is an issue in OM which might happen randomly, when another handler 
thread in OM is updating the partInfo map while the flush thread commits those 
entries. (During commit we convert OmMultipartInfo to proto, and during this 
conversion we will see the above error.)

The above configs are not related to OM; they are for the SCM end.

{quote}However, writing fails due to no more blocks being allocated. I guess my 
cluster cannot keep up with the writing.
{quote}
We can check the SCM logs to see why no more blocks are being allocated. This 
exception will also be received by OM.

 


was (Author: bharatviswa):
The above is an issue in OM which might happen randomly, when another handler 
thread in OM is updating the partInfo map while the flush thread commits those 
entries. (During commit we convert OmMultipartInfo to proto, and during this 
conversion we will see the above error.)

The above configs are not related to OM; they are for the SCM end.

{quote}However, writing fails due to no more blocks being allocated. I guess my 
cluster cannot keep up with the writing.
{quote}
We can check the SCM logs to see why no more blocks are being allocated. This 
exception will also be received by OM.

 

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone 
> to a path on VM0, reading data from VM0's local disk and writing to the mount 
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and 
> contains ~50,000 files.
> The writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. 
> Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing 
> errors related to multipart upload. This error eventually causes the writing 
> to terminate and OM to be shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush thread OMDoubleBufferFlushThread 
> encountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959419#comment-16959419
 ] 

Bharat Viswanadham commented on HDDS-2356:
--

The above is an issue in OM which might happen randomly, when another handler 
thread in OM is updating the partInfo map while the flush thread commits those 
entries. (During commit we convert OmMultipartInfo to proto, and during this 
conversion we will see the above error.)

The above configs are not related to OM; they are for the SCM end.

{quote}However, writing fails due to no more blocks being allocated. I guess my 
cluster cannot keep up with the writing.
{quote}
We can check the SCM logs to see why no more blocks are being allocated. This 
exception will also be received by OM.
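
The failure mode is easy to reproduce in isolation: TreeMap iteration is 
fail-fast, so any structural modification while getProto iterates the part map 
throws. A minimal standalone sketch (not OM code; in OM the modification comes 
from a second handler thread rather than from the iterating thread itself):
{code:java}
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

public class TreeMapCmeDemo {
  public static void main(String[] args) {
    TreeMap<Integer, String> parts = new TreeMap<>();
    parts.put(1, "part-1");
    try {
      // TreeMap.forEach checks modCount after each entry; any structural
      // change during iteration throws ConcurrentModificationException,
      // the same frame seen in the stack trace above (TreeMap.forEach).
      parts.forEach((k, v) -> parts.put(k + 1, v));
    } catch (ConcurrentModificationException e) {
      System.out.println("Reproduced: " + e);
    }
  }
}
{code}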

 

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone 
> to a path on VM0, reading data from VM0's local disk and writing to the mount 
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and 
> contains ~50,000 files.
> The writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. 
> Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing 
> errors related to multipart upload. This error eventually causes the writing 
> to terminate and OM to be shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush thread OMDoubleBufferFlushThread 
> encountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959407#comment-16959407
 ] 

Li Cheng commented on HDDS-2356:


Quick update. I tried giving Ozone more handlers (like 10+ times more) and no 
longer see this error; see the properties below. However, writing fails due to 
no more blocks being allocated. I guess my cluster cannot keep up with the 
writing.

 


<property>
  <name>ozone.scm.handler.count.key</name>
  <value>128</value>
  <tag>OZONE, MANAGEMENT, PERFORMANCE</tag>
  <description>
    The number of RPC handler threads for each SCM service endpoint.
    The default is appropriate for small clusters (tens of nodes).
    Set a value that is appropriate for the cluster size. Generally, HDFS
    recommends RPC handler count is set to 20 * log2(Cluster Size) with an
    upper limit of 200. However, SCM will not have the same amount of
    traffic as Namenode, so a value much smaller than that will work well too.
  </description>
</property>
<property>
  <name>ozone.om.handler.count.key</name>
  <value>256</value>
  <tag>OM, PERFORMANCE</tag>
  <description>
    The number of RPC handler threads for OM service endpoints.
  </description>
</property>
<property>
  <name>dfs.container.ratis.num.container.op.executors</name>
  <value>128</value>
  <tag>OZONE, RATIS, PERFORMANCE</tag>
  <description>
    Number of executors that will be used by Ratis to execute
    container ops (10 by default).
  </description>
</property>
<property>
  <name>dfs.container.ratis.num.write.chunk.threads</name>
  <value>512</value>
  <tag>OZONE, RATIS, PERFORMANCE</tag>
  <description>
    Maximum number of threads in the thread pool that Ratis
    will use for writing chunks (60 by default).
  </description>
</property>

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone 
> to a path on VM0, reading data from VM0's local disk and writing to the mount 
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and 
> contains ~50,000 files.
> The writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. 
> Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing 
> errors related to multipart upload. This error eventually causes the writing 
> to terminate and OM to be shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush thread OMDoubleBufferFlushThread 
> encountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2344) Add immutable entries into the DoubleBuffer for Volume requests.

2019-10-24 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia updated HDDS-2344:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~xyao] for the review and [~bharat] for the contribution. This has been 
committed to master.

> Add immutable entries into the DoubleBuffer for Volume requests.
> -
>
> Key: HDDS-2344
> URL: https://issues.apache.org/jira/browse/HDDS-2344
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> OMVolumeCreateRequest.java L159:
> {code:java}
> omClientResponse =
>  new OMVolumeCreateResponse(omVolumeArgs,volumeList, 
> omResponse.build());{code}
>  
> We add this to the double buffer, and the double-buffer flush thread running 
> in the background picks it up, converts it to protobuf and then to a byte 
> array, and writes it to the RocksDB tables. Since this conversion is done 
> without acquiring any lock, if any other request changes the internal 
> structure (like the ACL list) of OmVolumeArgs, we might get a 
> ConcurrentModificationException.
>  
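
A minimal sketch of the idea in this ticket, with illustrative names (not the 
actual OM classes): take a defensive, unmodifiable copy of the mutable state at 
enqueue time, so the flush thread serializes a stable view even if later 
requests mutate the live OmVolumeArgs.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: an immutable snapshot handed to the double buffer
// instead of the live, mutable args object.
final class VolumeArgsSnapshot {
  private final String volume;
  private final List<String> acls;

  VolumeArgsSnapshot(String volume, List<String> liveAcls) {
    this.volume = volume;
    // Defensive copy: the background flush thread can iterate this list
    // safely while handler threads keep mutating the original.
    this.acls = Collections.unmodifiableList(new ArrayList<>(liveAcls));
  }

  String getVolume() { return volume; }
  List<String> getAcls() { return acls; }
}
{code}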



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Improve datanode write failure log

2019-10-24 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Description: 
Logs like the following don't reveal the true cause of the write failure.

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR

  was:
Logs like the following don't reveal the true cause of the write failure.

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR


> Improve datanode write failure log
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Logs like the following don't reveal the true cause of the write failure.
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR
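
A hypothetical sketch (illustrative names, not the actual Ozone datanode code) 
of the kind of improvement implied: keep the one-line audit message but attach 
the underlying exception, so the real cause behind CONTAINER_INTERNAL_ERROR 
shows up in the log.
{code:java}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WriteFailureLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(WriteFailureLogging.class);

  void logFailedOp(String op, String traceId, String result, IOException cause) {
    // Passing 'cause' as the final argument makes SLF4J append its stack
    // trace, instead of logging only the generic result code.
    LOG.warn("Operation: {} : Trace ID: {} : Message: {} : Result: {}",
        op, traceId, cause.getMessage(), result, cause);
  }
}
{code}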



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2363) Improve datanode write failure log

2019-10-24 Thread Sammi Chen (Jira)
Sammi Chen created HDDS-2363:


 Summary: Improve datanode write failure log
 Key: HDDS-2363
 URL: https://issues.apache.org/jira/browse/HDDS-2363
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Sammi Chen
Assignee: Sammi Chen


Logs like the following don't reveal the true cause of the write failure.

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959390#comment-16959390
 ] 

Hadoop QA commented on HDFS-14908:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} HDFS-14908 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14908 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28175/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, 
> HDFS-14908.003.patch, Test.java, TestV2.java, TestV3.java
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is the prefix of the open files. We should check whether the filter path 
> is the parent/ancestor of the open files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-24 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959388#comment-16959388
 ] 

Jinglun edited comment on HDFS-14908 at 10/25/19 3:29 AM:
--

Hi [~hexiaoqiao], thanks for your nice comments, and sorry for my late 
response. Your demo is much simpler, but it needs a minor change. When a super 
user calls listOpenFiles, the parameter path might end with a slash, like 
'/user/hdfs_admin/demo/'. So before calling the code below we must normalize 
the path first. The normalization step would introduce some overhead.
{code:java}
(fullPathName.startsWith(path) && (fullPathName.equals(path) || 
fullPathName.charAt(path.length() - 1) == Path.SEPARATOR_CHAR)){code}
For the random tests, I added a new method startsWithAndCharAt() that includes 
the normalization step, to benchmark the demo.

 
{code:java}
  public static boolean startsWithAndCharAt(String path, String parent) {
if (path.length() > 1 && path.charAt(path.length() - 1) == '/') {
  path = path.substring(0, path.length() - 1);
}
return path.startsWith(parent) && (path.equals(parent)
|| path.charAt(parent.length() - 1) == '/');
  }

{code}
And here is the result:

*Case 1:*

path starts with parent and neither path nor parent end with '/'

 
||Time||100,000,000||
|isParent|7,888ms|
|startsWithAndCharAt|8,850ms|
|startsWith|7,877ms|

*Case 2:*

path doesn't start with parent and neither path nor parent end with '/'
||Time||10,000,000,000||
|isParent|2,391ms|
|startsWithAndCharAt|2,362ms|
|startsWith|2,384ms|

*Case 4:*

path starts with parent and both path and parent end with '/'
||Time||100,000,000||
|isParent|7,882ms|
|startsWithAndCharAt|11,118ms|
|startsWith|7,803ms|

Test commands are:
{quote}java -Xmx512m Test 1 case1

java -Xmx512m Test 100 case2

java -Xmx512m Test 1 case4
{quote}
Test file is TestV3.java


was (Author: lijinglun):
Hi [~hexiaoqiao], thanks for your nice comments, and sorry for my late 
response. Your demo is much simpler, but it needs a minor change. When a super 
user calls listOpenFiles, the parameter path might end with a slash, like 
'/user/hdfs_admin/demo/'. So before calling the code below we must normalize 
the path first. The normalization step would introduce some overhead.
(fullPathName.startsWith(path) && (fullPathName.equals(path) || 
fullPathName.charAt(path.length() - 1) == Path.SEPARATOR_CHAR))
For the random tests, I added a new method startsWithAndCharAt() that includes 
the normalization step, to benchmark the demo.

 
{code:java}
  public static boolean startsWithAndCharAt(String path, String parent) {
if (path.length() > 1 && path.charAt(path.length() - 1) == '/') {
  path = path.substring(0, path.length() - 1);
}
return path.startsWith(parent) && (path.equals(parent)
|| path.charAt(parent.length() - 1) == '/');
  }

{code}
And here is the result:

*Case 1:* 

path starts with parent and neither path nor parent end with '/'

 
||Time||100,000,000||
|isParent|7,888ms|
|startsWithAndCharAt|8,850ms|
|startsWith|7,877ms|

*Case 2:*

path doesn't start with parent and neither path nor parent end with '/'
||Time||10,000,000,000||
|isParent|2,391ms|
|startsWithAndCharAt|2,362ms|
|startsWith|2,384ms|

*Case 4:*

path starts with parent and both path and parent end with '/'
||Time||100,000,000||
|isParent|7,882ms|
|startsWithAndCharAt|11,118ms|
|startsWith|7,803ms|

Test commands are:
{quote}java -Xmx512m Test 1 case1

java -Xmx512m Test 100 case2

java -Xmx512m Test 1 case4
{quote}
Test file is TestV3.java

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, 
> HDFS-14908.003.patch, Test.java, TestV2.java, TestV3.java
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is the prefix of the open files. We should check whether the filter path 
> is the parent/ancestor of the open files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-24 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-14908:
---
Attachment: TestV3.java

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, 
> HDFS-14908.003.patch, Test.java, TestV2.java, TestV3.java
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is the prefix of the open files. We should check whether the filter path 
> is the parent/ancestor of the open files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2344) Add immutable entries into the DoubleBuffer for Volume requests.

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2344?focusedWorklogId=333851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333851
 ]

ASF GitHub Bot logged work on HDDS-2344:


Author: ASF GitHub Bot
Created on: 25/Oct/19 03:28
Start Date: 25/Oct/19 03:28
Worklog Time Spent: 10m 
  Work Description: dineshchitlangia commented on pull request #71: 
HDDS-2344. Add immutable entries into the DoubleBuffer for Volume requests.
URL: https://github.com/apache/hadoop-ozone/pull/71

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333851)
Time Spent: 40m  (was: 0.5h)

> Add immutable entries into the DoubleBuffer for Volume requests.
> -
>
> Key: HDDS-2344
> URL: https://issues.apache.org/jira/browse/HDDS-2344
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> OMVolumeCreateRequest.java L159:
> {code:java}
> omClientResponse =
>  new OMVolumeCreateResponse(omVolumeArgs,volumeList, 
> omResponse.build());{code}
>  
> We add this to the double buffer, and the double-buffer flush thread running 
> in the background picks it up, converts it to protobuf and then to a byte 
> array, and writes it to the RocksDB tables. Since this conversion is done 
> without acquiring any lock, if any other request changes the internal 
> structure (like the ACL list) of OmVolumeArgs, we might get a 
> ConcurrentModificationException.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-24 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959388#comment-16959388
 ] 

Jinglun commented on HDFS-14908:


Hi [~hexiaoqiao], thanks for your nice comments, and sorry for my late 
response. Your demo is much simpler, but it needs a minor change. When a super 
user calls listOpenFiles, the parameter path might end with a slash, like 
'/user/hdfs_admin/demo/'. So before calling the code below we must normalize 
the path first. The normalization step would introduce some overhead.
(fullPathName.startsWith(path) && (fullPathName.equals(path) || 
fullPathName.charAt(path.length() - 1) == Path.SEPARATOR_CHAR))
For the random tests, I added a new method startsWithAndCharAt() that includes 
the normalization step, to benchmark the demo.

 
{code:java}
  public static boolean startsWithAndCharAt(String path, String parent) {
if (path.length() > 1 && path.charAt(path.length() - 1) == '/') {
  path = path.substring(0, path.length() - 1);
}
return path.startsWith(parent) && (path.equals(parent)
|| path.charAt(parent.length() - 1) == '/');
  }

{code}
And here is the result:

*Case 1:* 

path starts with parent and neither path nor parent end with '/'

 
||Time||100,000,000||
|isParent|7,888ms|
|startsWithAndCharAt|8,850ms|
|startsWith|7,877ms|

*Case 2:*

path doesn't start with parent and neither path nor parent end with '/'
||Time||10,000,000,000||
|isParent|2,391ms|
|startsWithAndCharAt|2,362ms|
|startsWith|2,384ms|

*Case 4:*

path starts with parent and both path and parent end with '/'
||Time||100,000,000||
|isParent|7,882ms|
|startsWithAndCharAt|11,118ms|
|startsWith|7,803ms|

Test commands are:
{quote}java -Xmx512m Test 1 case1

java -Xmx512m Test 100 case2

java -Xmx512m Test 1 case4
{quote}
Test file is TestV3.java

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, 
> HDFS-14908.003.patch, Test.java, TestV2.java
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is the prefix of the open files. We should check whether the filter path 
> is the parent/ancestor of the open files.
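
To make the prefix-vs-parent distinction concrete, a minimal sketch (not the 
patch itself; the boundary check here is written against a parent with no 
trailing slash, so the index differs from the demo quoted in the comments): a 
plain startsWith check wrongly matches sibling paths that merely share a 
string prefix.
{code:java}
public class PrefixVsParent {
  static boolean isPrefix(String parent, String path) {
    return path.startsWith(parent);
  }

  static boolean isParent(String parent, String path) {
    if (parent.equals("/")) {          // root is an ancestor of every path
      return path.startsWith("/");
    }
    if (parent.endsWith("/")) {        // normalize the trailing slash
      parent = parent.substring(0, parent.length() - 1);
    }
    // Require a path-separator boundary right after the parent prefix.
    return path.startsWith(parent)
        && (path.equals(parent) || path.charAt(parent.length()) == '/');
  }

  public static void main(String[] args) {
    System.out.println(isPrefix("/a/b", "/a/bc/f")); // true: false positive
    System.out.println(isParent("/a/b", "/a/bc/f")); // false: correct
    System.out.println(isParent("/a/b/", "/a/b/f")); // true: slash normalized
  }
}
{code}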



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2360) Update Ratis snapshot to d6d58d0

2019-10-24 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2360.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Update Ratis snapshot to d6d58d0
> 
>
> Key: HDDS-2360
> URL: https://issues.apache.org/jira/browse/HDDS-2360
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client, Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update Ratis dependency version to snapshot 
> [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix 
> memory issues (RATIS-726, RATIS-728).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2360) Update Ratis snapshot to d6d58d0

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2360?focusedWorklogId=333846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333846
 ]

ASF GitHub Bot logged work on HDDS-2360:


Author: ASF GitHub Bot
Created on: 25/Oct/19 03:13
Start Date: 25/Oct/19 03:13
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #83: 
HDDS-2360. Update Ratis snapshot to d6d58d0
URL: https://github.com/apache/hadoop-ozone/pull/83

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333846)
Time Spent: 20m  (was: 10m)

> Update Ratis snapshot to d6d58d0
> 
>
> Key: HDDS-2360
> URL: https://issues.apache.org/jira/browse/HDDS-2360
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client, Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update Ratis dependency version to snapshot 
> [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix 
> memory issues (RATIS-726, RATIS-728).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"

2019-10-24 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959385#comment-16959385
 ] 

Xieming Li commented on HDFS-14917:
---

[~elgoiri] [~tasanuma]
Thank you for your reviews.

> Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
> ---
>
> Key: HDFS-14917
> URL: https://issues.apache.org/jira/browse/HDFS-14917
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Fix For: 3.3.0
>
> Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, 
> image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, 
> image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, 
> image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png
>
>
> This is a really simple UI change proposal:
>  The icon of the "Decommissioned & dead" datanode could be improved. It can be 
> changed from !image-2019-10-21-18-05-19-160.png|width=31,height=28! to 
> !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that:
>  # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> used for all statuses starting with "decommission" on dfshealth.html, 
>  # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> differentiated from icon " !image-2019-10-21-18-13-54-427.png! " on 
> federationhealth.html
> |*DataNode Information Legend (now)*
>  dfshealth.html#tab-datanode 
> |!image-2019-10-21-17-49-10-635.png|width=516,height=55!|
> |*DataNode Information Legend (proposed)*
>  dfshealth.html#tab-datanode 
> |!image-2019-10-21-18-03-53-914.png|width=589,height=60!|
> |*NameService Legend*
>  
> federationhealth.html#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2361) Ozone Manager init & start command prints out unnecessary line in the beginning.

2019-10-24 Thread YiSheng Lien (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YiSheng Lien reassigned HDDS-2361:
--

Assignee: YiSheng Lien

> Ozone Manager init & start command prints out unnecessary line in the 
> beginning.
> 
>
> Key: HDDS-2361
> URL: https://issues.apache.org/jira/browse/HDDS-2361
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Aravindan Vijayan
>Assignee: YiSheng Lien
>Priority: Major
>
> {code}
> [root@avijayan-om-1 ozone-0.5.0-SNAPSHOT]# bin/ozone --daemon start om
> Ozone Manager classpath extended by
> {code}
> We could probably print this line only when extra elements are added to the 
> OM classpath, or skip printing this line altogether.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-24 Thread Chen Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-14730:
--
Attachment: (was: HDFS-14730.002.patch)

> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to 
> deprecate it to avoid misuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-24 Thread Chen Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-14730:
--
Attachment: HDFS-14730.002.patch

> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to 
> deprecate it to avoid misuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959358#comment-16959358
 ] 

Hadoop QA commented on HDFS-14927:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 28m 
54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  6m 
44s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983977/HDFS-14927.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e53f74ffb535 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b41394e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28173/testReport/ |
| Max. process+thread count | 3599 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28173/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: Add metrics for async callers thread 

[jira] [Commented] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"

2019-10-24 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959356#comment-16959356
 ] 

Hudson commented on HDFS-14917:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17572 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17572/])
HDFS-14917. Change the ICON of "Decommissioned & dead" datanode on (tasanuma: 
rev 0db0f1e3990c4bf93ca8db41858860da6537a9bf)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css


> Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
> ---
>
> Key: HDFS-14917
> URL: https://issues.apache.org/jira/browse/HDFS-14917
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Fix For: 3.3.0
>
> Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, 
> image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, 
> image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, 
> image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png
>
>
> This is a really simple UI change proposal:
>  The icon of the "Decommissioned & dead" datanode could be improved. It can be 
> changed from    !image-2019-10-21-18-05-19-160.png|width=31,height=28! to   
> !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that,
>  #  icon "  !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> used for all statuses starting with "decommission" on dfshealth.html, 
>  #  icon "  !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> differentiated from icon "  !image-2019-10-21-18-13-54-427.png! " on 
> federationhealth.html
> |*DataNode Information Legend (now)*
>  dfshealth.html#tab-datanode 
> |!image-2019-10-21-17-49-10-635.png|width=516,height=55!|
> |*DataNode Information Legend (proposed)*
>   dfshealth.html#tab-datanode 
> |!image-2019-10-21-18-03-53-914.png|width=589,height=60!|
> |*NameService Legend*
>  
> federationhealth.htm#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14902) RBF: NullPointer When Misconfigured

2019-10-24 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-14902:

Parent: HDFS-14603
Issue Type: Sub-task  (was: Improvement)

> RBF: NullPointer When Misconfigured
> ---
>
> Key: HDFS-14902
> URL: https://issues.apache.org/jira/browse/HDFS-14902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: Takanobu Asanuma
>Priority: Minor
> Attachments: HDFS-14902.001.patch
>
>
> Admittedly the server was misconfigured, but this should be handled a bit more 
> elegantly.
> {code:none}
> 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled 
> exception updating NN registration for null:null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
>   at 
> org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
>   at 
> org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
>   at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
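
One way to make this more elegant, sketched below with hypothetical names (not
the actual patch), is to validate the resolved service address before building
the membership record, so a misconfiguration fails with a descriptive message
instead of a bare NullPointerException:

{code:java}
// Sketch only: fail fast with a descriptive message instead of an NPE when
// the heartbeat service could not resolve an address for the namenode.
// The class and method names are hypothetical, not the actual patch.
final class AddressChecks {
  static String requireConfigured(String address, String description) {
    if (address == null || address.isEmpty()) {
      throw new IllegalArgumentException("Misconfigured namenode: missing "
          + description + "; check the RBF heartbeat configuration");
    }
    return address;
  }
}
{code}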



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2344) Add immutable entries in to the DoubleBuffer for Volume requests.

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2344?focusedWorklogId=333819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333819
 ]

ASF GitHub Bot logged work on HDDS-2344:


Author: ASF GitHub Bot
Created on: 25/Oct/19 01:32
Start Date: 25/Oct/19 01:32
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #71: 
HDDS-2344. Add immutable entries in to the DoubleBuffer for Volume requests.
URL: https://github.com/apache/hadoop-ozone/pull/71
 
 
   
   ## What changes were proposed in this pull request?
   
   OMVolumeCreateRequest.java L159:
   
   omClientResponse =
new OMVolumeCreateResponse(omVolumeArgs,volumeList, omResponse.build());

   
   We add this entry to the double buffer, and the double-buffer flush thread 
running in the background picks it up, converts it to protobuf and then to a 
byte array, and writes it to the RocksDB tables. This conversion is done 
without acquiring any lock, so if any other request changes the internal 
structure (like the ACL list) of OmVolumeArgs in the meantime, we might get a 
ConcurrentModificationException.
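
   As a rough illustration of the fix direction (deepCopy() here is a 
hypothetical helper name, not necessarily the method this patch adds), the 
request can hand an immutable snapshot to the double buffer:

{code:java}
// Sketch: queue a private copy so the background flush thread serializes
// data that no concurrent request can mutate. deepCopy() stands in for a
// helper that clones mutable internals such as the ACL list.
OmVolumeArgs snapshot = omVolumeArgs.deepCopy();
omClientResponse =
    new OMVolumeCreateResponse(snapshot, volumeList, omResponse.build());
{code}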
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2344
   
   
   
   ## How was this patch tested?
   
   Ran TestOzoneRpcClient, which exercises this code path, and also added a new 
test for the clone method.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333819)
Time Spent: 0.5h  (was: 20m)

> Add immutable entries in to the DoubleBuffer for Volume requests.
> -
>
> Key: HDDS-2344
> URL: https://issues.apache.org/jira/browse/HDDS-2344
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> OMVolumeCreateRequest.java L159:
> {code:java}
> omClientResponse =
>  new OMVolumeCreateResponse(omVolumeArgs,volumeList, 
> omResponse.build());{code}
>  
> We add this entry to the double buffer, and the double-buffer flush thread 
> running in the background picks it up, converts it to protobuf and then to a 
> byte array, and writes it to the RocksDB tables. This conversion is done 
> without acquiring any lock, so if any other request changes the internal 
> structure (like the ACL list) of OmVolumeArgs in the meantime, we might get a 
> ConcurrentModificationException.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2344) Add immutable entries in to the DoubleBuffer for Volume requests.

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2344?focusedWorklogId=333818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333818
 ]

ASF GitHub Bot logged work on HDDS-2344:


Author: ASF GitHub Bot
Created on: 25/Oct/19 01:31
Start Date: 25/Oct/19 01:31
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #71: 
HDDS-2344. Add immutable entries in to the DoubleBuffer for Volume requests.
URL: https://github.com/apache/hadoop-ozone/pull/71
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333818)
Time Spent: 20m  (was: 10m)

> Add immutable entries in to the DoubleBuffer for Volume requests.
> -
>
> Key: HDDS-2344
> URL: https://issues.apache.org/jira/browse/HDDS-2344
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> OMVolumeCreateRequest.java L159:
> {code:java}
> omClientResponse =
>  new OMVolumeCreateResponse(omVolumeArgs,volumeList, 
> omResponse.build());{code}
>  
> We add this entry to the double buffer, and the double-buffer flush thread 
> running in the background picks it up, converts it to protobuf and then to a 
> byte array, and writes it to the RocksDB tables. This conversion is done 
> without acquiring any lock, so if any other request changes the internal 
> structure (like the ACL list) of OmVolumeArgs in the meantime, we might get a 
> ConcurrentModificationException.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"

2019-10-24 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-14917:

Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for your contribution, [~risyomei], and thanks for 
your review, [~elgoiri].

> Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
> ---
>
> Key: HDFS-14917
> URL: https://issues.apache.org/jira/browse/HDFS-14917
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Fix For: 3.3.0
>
> Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, 
> image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, 
> image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, 
> image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png
>
>
> This is a really simple UI change proposal:
>  The icon of the "Decommissioned & dead" datanode could be improved. It can be 
> changed from    !image-2019-10-21-18-05-19-160.png|width=31,height=28! to   
> !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that,
>  #  icon "  !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> used for all statuses starting with "decommission" on dfshealth.html, 
>  #  icon "  !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> differentiated from icon "  !image-2019-10-21-18-13-54-427.png! " on 
> federationhealth.html
> |*DataNode Information Legend (now)*
>  dfshealth.html#tab-datanode 
> |!image-2019-10-21-17-49-10-635.png|width=516,height=55!|
> |*DataNode Information Legend (proposed)*
>   dfshealth.html#tab-datanode 
> |!image-2019-10-21-18-03-53-914.png|width=589,height=60!|
> |*NameService Legend*
>  
> federationhealth.htm#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959348#comment-16959348
 ] 

Íñigo Goiri commented on HDFS-14927:


The name {{getAsyncCallerServiceThreadPoolJson()}} seems a little specific.
Isn't there anything a little more general and descriptive?
In any case, please go ahead with the unit test.

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch
>
>
> It would be good to add some monitoring on the async caller thread pool that 
> handles fan-out RPC client requests, so we know its utilization and when to 
> bump up dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14638) [Dynamometer] Fix scripts to refer to current build structure

2019-10-24 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-14638:

Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> [Dynamometer] Fix scripts to refer to current build structure
> -
>
> Key: HDFS-14638
> URL: https://issues.apache.org/jira/browse/HDFS-14638
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, test
>Reporter: Erik Krogen
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.3.0
>
>
> The scripts within the Dynamometer build dirs all refer to the old 
> distribution structure with a single {{bin}} directory and a single {{lib}} 
> directory. We need to update them to refer to the Hadoop-standard layout.
> Also as pointed out by [~pingsutw]:
> {quote}
> Due to the rename of dynamometer to hadoop-dynamometer in hadoop-tools,
> the scripts still use the old jar name
> {code}
> "$hadoop_cmd" jar "${script_pwd}"/lib/dynamometer-infra-*.jar 
> org.apache.hadoop.tools.dynamometer.Client "$@"
> {code}
> We should rename these jars inside the scripts
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Leon Gao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959331#comment-16959331
 ] 

Leon Gao commented on HDFS-14927:
-

[~elgoiri] Please let me know if the change makes sense to you; then I will add 
some UTs.

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch
>
>
> It would be good to add some monitoring on the async caller thread pool that 
> handles fan-out RPC client requests, so we know its utilization and when to 
> bump up dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Leon Gao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959320#comment-16959320
 ] 

Leon Gao commented on HDFS-14927:
-

Submitting a patch and updating the ticket name, as the executorService just 
handles async fan-out calls, unlike the RPC client connection pool.

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch
>
>
> It would be good to add some monitoring on the async caller thread pool that 
> handles fan-out RPC client requests, so we know its utilization and when to 
> bump up dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Leon Gao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959321#comment-16959321
 ] 

Leon Gao commented on HDFS-14927:
-

Example metrics:

"AsyncCallerServiceThreadPool" : "{\"active\":0,\"total\":218,\"max\":19191}"

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch
>
>
> It would be good to add some monitoring on the async caller thread pool that 
> handles fan-out RPC client requests, so we know its utilization and when to 
> bump up dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Leon Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Gao updated HDFS-14927:

Attachment: HDFS-14927.001.patch
Status: Patch Available  (was: Reopened)

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch
>
>
> It would be good to add some monitoring on the async caller thread pool that 
> handles fan-out RPC client requests, so we know its utilization and when to 
> bump up dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14923) Remove dead code from HealthMonitor

2019-10-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959319#comment-16959319
 ] 

Wei-Chiu Chuang commented on HDFS-14923:


+1

> Remove dead code from HealthMonitor
> ---
>
> Key: HDFS-14923
> URL: https://issues.apache.org/jira/browse/HDFS-14923
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Minor
> Attachments: HDFS-14923.001.patch
>
>
> Digging into the ZKFC source code, I found the following dead code:
> {code}
> public void removeCallback(Callback cb) {
>callbacks.remove(cb);
> }
> public synchronized void removeServiceStateCallback(ServiceStateCallback cb) {
>serviceStateCallbacks.remove(cb);
> }
> synchronized HAServiceStatus getLastServiceStatus() {
>return lastServiceState;
> }
> {code}
> These methods are unused and should be deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Leon Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Gao updated HDFS-14927:

Summary: RBF: Add metrics for async callers thread pool  (was: RBF: Add 
metrics for active RPC client threads for async calls)

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>
> It would be good to add some monitoring on the active RPC client threads that 
> handle fan-out RPC client requests, so we know their utilization and when to 
> bump up dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-24 Thread Leon Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Gao updated HDFS-14927:

Description: It would be good to add some monitoring on the async caller 
thread pool that handles fan-out RPC client requests, so we know its 
utilization and when to bump up dfs.federation.router.client.thread-size  (was: 
It would be good to add some monitoring on the active RPC client threads that 
handle fan-out RPC client requests, so we know their utilization and when to 
bump up dfs.federation.router.client.thread-size)

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>
> It would be good to add some monitoring on the async caller thread pool that 
> handles fan-out RPC client requests, so we know its utilization and when to 
> bump up dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14638) [Dynamometer] Fix scripts to refer to current build structure

2019-10-24 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959303#comment-16959303
 ] 

Hudson commented on HDFS-14638:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17571 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17571/])
HDFS-14638. [Dynamometer] Fix scripts to refer to current build (weichiu: rev 
b41394eec8552f419aefe452b3fdb8ff2506b9d1)
* (edit) 
hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-blockgen/src/main/bash/generate-block-lists.sh
* (edit) 
hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-workload/src/main/bash/start-workload.sh
* (edit) 
hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra/src/main/bash/start-dynamometer-cluster.sh


> [Dynamometer] Fix scripts to refer to current build structure
> -
>
> Key: HDFS-14638
> URL: https://issues.apache.org/jira/browse/HDFS-14638
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, test
>Reporter: Erik Krogen
>Assignee: Takanobu Asanuma
>Priority: Major
>
> The scripts within the Dynamometer build dirs all refer to the old 
> distribution structure with a single {{bin}} directory and a single {{lib}} 
> directory. We need to update them to refer to the Hadoop-standard layout.
> Also as pointed out by [~pingsutw]:
> {quote}
> Due to the rename of dynamometer to hadoop-dynamometer in hadoop-tools,
> the scripts still use the old jar name
> {code}
> "$hadoop_cmd" jar "${script_pwd}"/lib/dynamometer-infra-*.jar 
> org.apache.hadoop.tools.dynamometer.Client "$@"
> {code}
> We should rename these jars inside the scripts
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-24 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan reassigned HDDS-2355:
---

Assignee: Aravindan Vijayan

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Critical
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared 
> txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). 
> If it is not due to corruption, the WAL must be emptied before changing the 
> WritePolicy.
> om_1    |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-24 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2355:

Priority: Critical  (was: Major)

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Priority: Critical
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared 
> txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). 
> If it is not due to corruption, the WAL must be emptied before changing the 
> WritePolicy.
> om_1    |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14931) hdfs crypto commands limit column width

2019-10-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959277#comment-16959277
 ] 

Hadoop QA commented on HDFS-14931:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m  3s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDistributedFileSystem |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14931 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983959/HDFS-14931.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c9d33fd85c1e 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a1b4eeb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28172/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28172/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Assigned] (HDDS-2273) Avoid buffer copying in GrpcReplicationService

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-2273:
---

Assignee: Attila Doroszlai  (was: Tsz-wo Sze)

> Avoid buffer copying in GrpcReplicationService
> --
>
> Key: HDDS-2273
> URL: https://issues.apache.org/jira/browse/HDDS-2273
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> In GrpcOutputStream, data is written to a ByteArrayOutputStream and then 
> copied into a ByteString.
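
One possible direction, sketched below under the assumption that protobuf's
ByteString.newOutput() is acceptable here (an idea, not the committed fix):
accumulate the data in a ByteString.Output, which can produce the ByteString
without another full copy of the buffer.

{code:java}
import com.google.protobuf.ByteString;
import java.io.IOException;

// Sketch: avoid the ByteArrayOutputStream -> byte[] -> ByteString round trip
// by writing into ByteString.Output directly.
class NoCopyBuffer {
  static ByteString wrap(byte[] chunk, int len) throws IOException {
    ByteString.Output out = ByteString.newOutput(len);
    out.write(chunk, 0, len);          // chunk/len stand for the buffered data
    return out.toByteString();         // no extra byte[] copy at this step
  }
}
{code}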



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2272) Avoid buffer copying in GrpcReplicationClient

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDDS-2272:
---

Assignee: Attila Doroszlai  (was: Tsz-wo Sze)

> Avoid buffer copying in GrpcReplicationClient
> -
>
> Key: HDDS-2272
> URL: https://issues.apache.org/jira/browse/HDDS-2272
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> In StreamDownloader.onNext, the CopyContainerResponseProto data is copied to 
> a byte[] and then written out to the stream.
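
A sketch of the same idea on the download side (field and variable names are
assumptions): a ByteString can write itself straight to the target stream,
skipping the intermediate byte[].

{code:java}
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: stream the received ByteString to the output directly instead of
// copying it into an intermediate byte[] first.
class NoCopyDownload {
  static void writeChunk(ByteString data, OutputStream out) throws IOException {
    data.writeTo(out); // writes the bytes without materializing a byte[]
  }
}
{code}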



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2071) Support filters in ozone insight point

2019-10-24 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2071:
-
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~elek] for the contribution. I've merged the fix to master. 

> Support filters in ozone insight point
> --
>
> Key: HDDS-2071
> URL: https://issues.apache.org/jira/browse/HDDS-2071
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With Ozone insight we can print out all the logs / metrics of one specific 
> component (e.g. scm.node-manager).
> It would be great to support additional filtering capabilities where the 
> output is filtered based on specific keys.
> For example, to print out all of the logs related to one datanode or related 
> to one type of RPC request.
> The filter should be a key-value map (e.g. --filter 
> datanode=sjdhfhf,rpc=createChunk) which can be defined in the ozone insight 
> CLI.
> As we have no option to add additional tags to the logs (it may be supported 
> by log4j2 but not with slf4j), the first implementation can be based on 
> pattern matching.
> For example, SCMNodeManager.processNodeReport contains trace/debug logs 
> which include the " [datanode={}]" part. This formatting convention can be 
> used to print out only the related information. 
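
As an illustration of the pattern-matching approach (a sketch only; the merged
implementation may differ), a log line can be kept when it carries every
requested "[key=value]" marker:

{code:java}
import java.util.Map;

// Sketch: keep a log line only if it contains all "[key=value]" markers
// built from the --filter key/value pairs.
class InsightLogFilter {
  static boolean matches(String logLine, Map<String, String> filters) {
    return filters.entrySet().stream().allMatch(e ->
        logLine.contains("[" + e.getKey() + "=" + e.getValue() + "]"));
  }
}
{code}

With that, --filter datanode=sjdhfhf,rpc=createChunk would keep only the lines
carrying both markers.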



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2071) Support filters in ozone insight point

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2071?focusedWorklogId=333731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333731
 ]

ASF GitHub Bot logged work on HDDS-2071:


Author: ASF GitHub Bot
Created on: 24/Oct/19 21:55
Start Date: 24/Oct/19 21:55
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #4: HDDS-2071. 
Support filters in ozone insight point
URL: https://github.com/apache/hadoop-ozone/pull/4
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333731)
Time Spent: 1h 50m  (was: 1h 40m)

> Support filters in ozone insight point
> --
>
> Key: HDDS-2071
> URL: https://issues.apache.org/jira/browse/HDDS-2071
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With Ozone insight we can print out all the logs / metrics of one specific 
> component (e.g. scm.node-manager).
> It would be great to support additional filtering capabilities where the 
> output is filtered based on specific keys.
> For example, to print out all of the logs related to one datanode or related 
> to one type of RPC request.
> The filter should be a key-value map (e.g. --filter 
> datanode=sjdhfhf,rpc=createChunk) which can be defined in the ozone insight 
> CLI.
> As we have no option to add additional tags to the logs (it may be supported 
> by log4j2 but not with slf4j), the first implementation can be based on 
> pattern matching.
> For example, SCMNodeManager.processNodeReport contains trace/debug logs 
> which include the " [datanode={}]" part. This formatting convention can be 
> used to print out only the related information. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2322) DoubleBuffer flush termination and OM shutdown's after that.

2019-10-24 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2322 started by Bharat Viswanadham.

> DoubleBuffer flush termination and OM shutdown's after that.
> 
>
> Key: HDDS-2322
> URL: https://issues.apache.org/jira/browse/HDDS-2322
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> om1_1       | 2019-10-18 00:34:45,317 [OMDoubleBufferFlushThread] ERROR      
> - Terminating with exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> om1_1       | java.util.ConcurrentModificationException
> om1_1       | at 
> java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1660)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> om1_1       | at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> om1_1       | at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
> om1_1       | at 
> org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup.getProtobuf(OmKeyLocationInfoGroup.java:65)
> om1_1       | at 
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> om1_1       | at 
> java.base/java.util.Collections$2.tryAdvance(Collections.java:4745)
> om1_1       | at 
> java.base/java.util.Collections$2.forEachRemaining(Collections.java:4753)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> om1_1       | at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> om1_1       | at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
> om1_1       | at 
> org.apache.hadoop.ozone.om.helpers.OmKeyInfo.getProtobuf(OmKeyInfo.java:362)
> om1_1       | at 
> org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:37)
> om1_1       | at 
> org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:31)
> om1_1       | at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
> om1_1       | at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
> om1_1       | at 
> org.apache.hadoop.ozone.om.response.key.OMKeyCreateResponse.addToDBBatch(OMKeyCreateResponse.java:58)
> om1_1       | at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:139)
> om1_1       | at 
> java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> om1_1       | at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:137)
> om1_1       | at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2356.
--
Resolution: Fixed

This will be fixed as part of HDDS-2322.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone 
> to a path on VM0, reading data from VM0's local disk and writing to the mount 
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and 
> contains ~50,000 files. 
> The writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. 
> Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing 
> errors related to multipart upload. This error eventually causes the writing 
> to terminate and OM to be shut down. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959238#comment-16959238
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/24/19 9:44 PM:


This will be fixed as part of HDDS-2322. Thank you, [~timmylicheng], for 
reporting this issue.


was (Author: bharatviswa):
This will be fixed as part of HDDS-2322.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone 
> to a path on VM0, reading data from VM0's local disk and writing to the mount 
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and 
> contains ~50,000 files. 
> The writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. 
> Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing 
> errors related to multipart upload. This error eventually causes the writing 
> to terminate and OM to be shut down. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2356:
-
Fix Version/s: 0.5.0

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone 
> to a path on VM0, reading data from VM0's local disk and writing to the mount 
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and 
> contains ~50,000 files. 
> The writing is slow (1 GB in ~10 minutes) and it stops after around 4 GB. 
> Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing 
> errors related to multipart upload. This error eventually causes the writing 
> to terminate and OM to be shut down. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2362) XCeiverClientManager issues

2019-10-24 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth reassigned HDDS-2362:
--

Assignee: Istvan Fajth

> XCeiverClientManager issues
> ---
>
> Key: HDDS-2362
> URL: https://issues.apache.org/jira/browse/HDDS-2362
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>
> These issues were revealed while reviewing the XCeiverClientManager and the 
> clients.
> - Secure clients are not released properly, so the reference counting does 
> not work with secure clients.
> - Even though we have reference counting for the clients, the cache can evict 
> and remove client instances with active references, as eviction is not 
> connected with the reference counts.
> - isUseRatis, getFactor and getType do not really belong in this class.
> - Having acquireClient and acquireClientForRead plus release methods of the 
> same kind seems to be a bad smell; we might separate the two concerns, 
> especially because reads use the gRPC client while writes use the Ratis 
> client, as far as I know.
> - Pipelines leak from the clients themselves; the pipelines are not modified 
> in these code paths, but it would be better if we could hide the pipeline and 
> not serve it to callers, or serve an immutable version.
> - ContainerProtocolCalls looks like logic extracted into a utility class that 
> may better be placed in the client itself: in all cases the client is 
> obtained from the XCeiverClientManager and then given to one of 
> ContainerProtocolCalls' methods, which calls sendCommandAsync on the client. 
> That indirection does not seem necessary; we could encapsulate all the 
> protobuf message creation and provide response data from the client.
> - ContainerOperationClient acquires the client twice from the 
> XCeiverClientManager in the createContainer call, but releases it only once.
> - We can try to get rid of some of the synchronization by eliminating some of 
> the state in the clients and the manager, and replacing it with some 
> polymorphism.
> I will go through this list one by one and create JIRAs, using this one as an 
> umbrella JIRA, so that we can create PRs one by one; or if needed, we can 
> consolidate the whole thing into one PR at the end but review it one by one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2362) XCeiverClientManager issues

2019-10-24 Thread Istvan Fajth (Jira)
Istvan Fajth created HDDS-2362:
--

 Summary: XCeiverClientManager issues
 Key: HDDS-2362
 URL: https://issues.apache.org/jira/browse/HDDS-2362
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Istvan Fajth


These issues were revealed while reviewing the XCeiverClientManager and the 
clients.

- secure clients are not released properly, so the reference counting does not 
work for secure clients
- even though we have reference counting for the clients, the cache can evict 
and remove client instances that still have active references, because eviction 
is not connected to the reference counts
- isUseRatis, getFactor and getType do not really belong in this class
- acquireClient, acquireClientForRead and the matching release methods look 
like a bad smell; we might separate the two concerns, especially because reads 
use the gRPC client while writes use the Ratis client, as far as I know
- pipelines leak from the clients themselves; the pipelines are not modified in 
these code paths, but it would be better to hide the pipeline and not hand it 
out to callers, or to serve an immutable view
- ContainerProtocolCalls looks like logic extracted into a utility class that 
could live in the client itself: in every case the client is obtained from the 
XCeiverClientManager and then passed to one of ContainerProtocolCalls' methods, 
which calls sendCommandAsync on the client. That indirection does not seem 
necessary; we can encapsulate all the protobuf message creation and provide 
response data from the client.
- ContainerOperationClient acquires the client twice from the 
XCeiverClientManager in the createContainer call, but releases it only once
- we can try to get rid of some of the synchronization by eliminating some of 
the state in the clients and the manager, and replacing it with polymorphism.

I will go through this list one by one and create JIRAs one by one, using this 
one as an umbrella JIRA, so that we can create PRs one by one or, if needed, 
consolidate the whole thing into one PR at the end but review it piece by piece.
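
For the cache-eviction bullet above, here is a minimal, self-contained sketch 
of the failure mode. This is not the actual XCeiverClientManager code; 
NaiveClientCache and RefCountedClient are made-up names for illustration:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a closeable client with a reference count.
class RefCountedClient {
  final String pipelineId;
  final AtomicInteger refs = new AtomicInteger();
  volatile boolean closed;

  RefCountedClient(String pipelineId) { this.pipelineId = pipelineId; }
  void close() { closed = true; }
}

// A cache that, like the one described above, evicts without consulting the
// reference count. This demonstrates the bug, it does not fix it.
public class NaiveClientCache {
  private final Map<String, RefCountedClient> cache = new HashMap<>();

  synchronized RefCountedClient acquire(String pipelineId) {
    RefCountedClient c = cache.computeIfAbsent(pipelineId, RefCountedClient::new);
    c.refs.incrementAndGet();
    return c;
  }

  synchronized void release(RefCountedClient c) {
    c.refs.decrementAndGet();
  }

  synchronized void evict(String pipelineId) {
    RefCountedClient c = cache.remove(pipelineId);
    if (c != null) {
      c.close(); // BUG: should close only once c.refs.get() == 0
    }
  }

  public static void main(String[] args) {
    NaiveClientCache cache = new NaiveClientCache();
    RefCountedClient c = cache.acquire("pipeline-1");
    cache.evict("pipeline-1");     // cache pressure elsewhere evicts the entry
    System.out.println(c.closed);  // true: the caller still holds a closed client
  }
}
{code}

Tying eviction to the reference count (close on eviction only when the count 
is zero, otherwise defer the close to the last release) is the usual way out.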



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14932) XCeiverClientManager issues

2019-10-24 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth resolved HDFS-14932.
-
Resolution: Invalid

Sorry, I meant to create this one in the HDDS project; closing this one, as I 
cannot move it.

> XCeiverClientManager issues
> ---
>
> Key: HDFS-14932
> URL: https://issues.apache.org/jira/browse/HDFS-14932
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Priority: Major
>
> These issues were revealed while reviewing the XCeiverClientManager and the 
> clients.
> - secure clients are not released properly, so the reference counting does 
> not work for secure clients
> - even though we have reference counting for the clients, the cache can evict 
> and remove client instances that still have active references, because 
> eviction is not connected to the reference counts
> - isUseRatis, getFactor and getType do not really belong in this class
> - acquireClient, acquireClientForRead and the matching release methods look 
> like a bad smell; we might separate the two concerns, especially because 
> reads use the gRPC client while writes use the Ratis client, as far as I know
> - pipelines leak from the clients themselves; the pipelines are not modified 
> in these code paths, but it would be better to hide the pipeline and not hand 
> it out to callers, or to serve an immutable view
> - ContainerProtocolCalls looks like logic extracted into a utility class that 
> could live in the client itself: in every case the client is obtained from 
> the XCeiverClientManager and then passed to one of ContainerProtocolCalls' 
> methods, which calls sendCommandAsync on the client. That indirection does 
> not seem necessary; we can encapsulate all the protobuf message creation and 
> provide response data from the client.
> - ContainerOperationClient acquires the client twice from the 
> XCeiverClientManager in the createContainer call, but releases it only once
> - we can try to get rid of some of the synchronization by eliminating some of 
> the state in the clients and the manager, and replacing it with polymorphism.
> I will go through this list one by one and create JIRAs one by one, using 
> this one as an umbrella JIRA, so that we can create PRs one by one or, if 
> needed, consolidate the whole thing into one PR at the end but review it 
> piece by piece.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14932) XCeiverClientManager issues

2019-10-24 Thread Istvan Fajth (Jira)
Istvan Fajth created HDFS-14932:
---

 Summary: XCeiverClientManager issues
 Key: HDFS-14932
 URL: https://issues.apache.org/jira/browse/HDFS-14932
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Istvan Fajth


These issues were revealed while reviewing the XCeiverClientManager and the 
clients.

- secure clients are not released properly, so the reference counting does not 
work for secure clients
- even though we have reference counting for the clients, the cache can evict 
and remove client instances that still have active references, because eviction 
is not connected to the reference counts
- isUseRatis, getFactor and getType do not really belong in this class
- acquireClient, acquireClientForRead and the matching release methods look 
like a bad smell; we might separate the two concerns, especially because reads 
use the gRPC client while writes use the Ratis client, as far as I know
- pipelines leak from the clients themselves; the pipelines are not modified in 
these code paths, but it would be better to hide the pipeline and not hand it 
out to callers, or to serve an immutable view
- ContainerProtocolCalls looks like logic extracted into a utility class that 
could live in the client itself: in every case the client is obtained from the 
XCeiverClientManager and then passed to one of ContainerProtocolCalls' methods, 
which calls sendCommandAsync on the client. That indirection does not seem 
necessary; we can encapsulate all the protobuf message creation and provide 
response data from the client.
- ContainerOperationClient acquires the client twice from the 
XCeiverClientManager in the createContainer call, but releases it only once
- we can try to get rid of some of the synchronization by eliminating some of 
the state in the clients and the manager, and replacing it with polymorphism.

I will go through this list one by one and create JIRAs one by one, using this 
one as an umbrella JIRA, so that we can create PRs one by one or, if needed, 
consolidate the whole thing into one PR at the end but review it piece by piece.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log

2019-10-24 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959206#comment-16959206
 ] 

Erik Krogen commented on HDFS-14775:


I only took a quick look but it seems like a good change. Two minor things:
* In {{FSNamesystemLock}} L167 we fetch and store the current time; we should 
re-use that value below at L169 rather than re-fetching it
* The indentation on {{FSNamesystemLock}} L179 seems off?
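
For the first point, the suggestion is simply to read the clock once and reuse 
the value. A generic before/after sketch, not the actual FSNamesystemLock code 
(the names here are illustrative):

{code:java}
// Illustration of the review comment: fetch the timer once and reuse the
// value, instead of calling it twice a few lines apart.
class LockTimestampExample {
  interface Timer { long monotonicNow(); }

  // Before: two clock reads that can disagree by a few milliseconds.
  long heldIntervalBefore(Timer timer, long lockAcquireTime) {
    long interval = timer.monotonicNow() - lockAcquireTime;
    recordLongestHeld(timer.monotonicNow(), interval); // redundant second read
    return interval;
  }

  // After: one read, reused for both the interval and the recorded timestamp.
  long heldIntervalAfter(Timer timer, long lockAcquireTime) {
    long now = timer.monotonicNow();
    long interval = now - lockAcquireTime;
    recordLongestHeld(now, interval);
    return interval;
  }

  void recordLongestHeld(long timestamp, long interval) { /* bookkeeping */ }
}
{code}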

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, 
> HDFS-14775.003.patch, HDFS-14775.004.patch
>
>
> HDFS-13946 improved the log for longest read/write lock held time, it's very 
> useful improvement.
> In some condition, we need to locate the detailed call information(user, ip, 
> path, etc.) for longest lock holder, but the default throttle interval(10s) 
> is too long to find the corresponding audit log. I think we should add the 
> timestamp for the {{longestWriteLockHeldStackTrace}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2361) Ozone Manager init & start command prints out unnecessary line in the beginning.

2019-10-24 Thread Aravindan Vijayan (Jira)
Aravindan Vijayan created HDDS-2361:
---

 Summary: Ozone Manager init & start command prints out unnecessary 
line in the beginning.
 Key: HDDS-2361
 URL: https://issues.apache.org/jira/browse/HDDS-2361
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Aravindan Vijayan


{code}
[root@avijayan-om-1 ozone-0.5.0-SNAPSHOT]# bin/ozone --daemon start om
Ozone Manager classpath extended by
{code}

We could probably print this line only when extra elements are added to the OM 
classpath, or skip printing it altogether.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2360) Update Ratis snapshot to d6d58d0

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2360?focusedWorklogId=333704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333704
 ]

ASF GitHub Bot logged work on HDDS-2360:


Author: ASF GitHub Bot
Created on: 24/Oct/19 20:31
Start Date: 24/Oct/19 20:31
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #83: HDDS-2360. 
Update Ratis snapshot to d6d58d0
URL: https://github.com/apache/hadoop-ozone/pull/83
 
 
   ## What changes were proposed in this pull request?
   
   Update Ratis dependency version to snapshot 
[d6d58d0](https://github.com/apache/incubator-ratis/commit/d6d58d0), to fix 
memory issues ([RATIS-726](https://issues.apache.org/jira/browse/RATIS-726), 
[RATIS-728](https://issues.apache.org/jira/browse/RATIS-728)).
   
   Thanks @szetszwo and @bshashikant for the fixes, and @mukul1987 for creating 
the snapshot release.
   
   https://issues.apache.org/jira/browse/HDDS-2360
   
   ## How was this patch tested?
   
   Tested with Freon using 1MB and 16MB keys.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333704)
Remaining Estimate: 0h
Time Spent: 10m

> Update Ratis snapshot to d6d58d0
> 
>
> Key: HDDS-2360
> URL: https://issues.apache.org/jira/browse/HDDS-2360
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client, Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update Ratis dependency version to snapshot 
> [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix 
> memory issues (RATIS-726, RATIS-728).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2360) Update Ratis snapshot to d6d58d0

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2360:
-
Labels: pull-request-available  (was: )

> Update Ratis snapshot to d6d58d0
> 
>
> Key: HDDS-2360
> URL: https://issues.apache.org/jira/browse/HDDS-2360
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client, Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> Update Ratis dependency version to snapshot 
> [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix 
> memory issues (RATIS-726, RATIS-728).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959200#comment-16959200
 ] 

Bharat Viswanadham commented on HDDS-2356:
--

The issue is that the entries added are currently mutable. For Volume/Bucket 
this was fixed as part of HDDS-2344 and HDDS-2343.

There is already a Jira, HDDS-2322, which shows a similar stack trace when 
doing key creation.
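
The stack trace above is the classic symptom of iterating a mutable map that 
is shared with a writer thread. A standalone reproduction that assumes nothing 
about the OM code beyond what the trace shows (it usually throws within 
milliseconds):

{code:java}
import java.util.TreeMap;
import java.util.concurrent.ConcurrentModificationException;

// A shared mutable TreeMap is iterated by one thread (like the double buffer
// flush thread serializing OmMultipartKeyInfo) while another keeps adding
// entries to the same instance (like the request path committing parts).
public class DoubleBufferCmeDemo {
  public static void main(String[] args) throws InterruptedException {
    TreeMap<Integer, String> partKeyInfoMap = new TreeMap<>();
    partKeyInfoMap.put(1, "part-1");

    Thread writer = new Thread(() -> {
      for (int i = 2; i < 1_000_000; i++) {
        partKeyInfoMap.put(i, "part-" + i); // mutates during iteration
      }
    });
    writer.start();

    try {
      while (writer.isAlive()) {
        partKeyInfoMap.forEach((k, v) -> { }); // stand-in for getProto()
      }
      System.out.println("not reproduced this run; retry");
    } catch (ConcurrentModificationException e) {
      System.out.println("reproduced: " + e);
    }
    writer.join();
    // Fix direction per the comment above: hand the flush thread an immutable
    // (or defensively copied) snapshot instead of the live map.
  }
}
{code}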

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path 
> on VM0, while reading data from VM0's local disk and writing to the mount path. 
> The dataset has files of various sizes, from 0 bytes to GB-level, and contains 
> around 50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related to Multipart upload. This error eventually causes the writing to 
> terminate and OM to shut down. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.

2019-10-24 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959196#comment-16959196
 ] 

Hudson commented on HDFS-14910:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17570 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17570/])
HDFS-14910. Rename Snapshot with Pre Descendants Fail With (github: rev 
a1b4eebcc92976a9fb78ad5d3ab70c52cc0a5fa7)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java


> Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
> 
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14492) Snapshot memory leak

2019-10-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14492:
---
Fix Version/s: 3.2.2
   3.1.4

> Snapshot memory leak
> 
>
> Key: HDFS-14492
> URL: https://issues.apache.org/jira/browse/HDFS-14492
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.6.0
> Environment: CDH5.14.4
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> We recently examined the NameNode heap dump of a big, heavy snapshot user, 
> trying to trim some fat, and sure enough we found a memory leak in it: when 
> snapshots are removed, the corresponding data structures are not removed.
> This cluster has 586 million file system objects (286 million files, 287 
> million blocks, 13 million directories), using around 132gb of heap.
> While only 44.5 million files have snapshotted copies 
> (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have 
> FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies 
> at some point in the past, but after the snapshots are removed, those data 
> structures are still kept in the heap.
> INode$Feature = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, 
> FileDiffList = 24 bytes. That may not sound like a lot, but it adds up quickly in 
> large clusters like this. In this cluster, a whopping 13.8gb of memory could 
> have been saved:  ((32.5 + 32 + 24) bytes * (211997769 -  44572380) =~ 
> 13.8gb) if not for this bug. That is more than 10% of savings in heap size.
> Heap histogram for reference:
> {noformat}
> num #instances #bytes class name
>  --
>  1: 286418254 27496152384 org.apache.hadoop.hdfs.server.namenode.INodeFile
>  2: 28737 18388622528 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>  3: 227899550 17144816120 [B
>  4: 287324031 13769408616 
> [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
>  5: 71352116 12353841568 [Ljava.lang.Object;
>  6: 286322650 9170335840 
> [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>  7: 235632329 7658462416 
> [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>  8: 4 7046430816 [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>  9: 211997769 6783928608 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature
>  10: 211997769 5087946456 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList
>  11: 76586261 3780468856 [I
>  12: 44572380 3209211360 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy
>  13: 58634517 2345380680 java.util.ArrayList
>  14: 44572380 2139474240 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff
>  15: 76582416 1837977984 org.apache.hadoop.hdfs.server.namenode.AclFeature
>  16: 12907668 1135874784 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory{noformat}
> [~szetszwo] [~arpaga] [~smeng] [~shashikant]  any thoughts?
> I am thinking that inside 
> AbstractINodeDiffList#deleteSnapshotDiff() , in addition to cleaning up file 
> diffs, it should also remove FileWithSnapshotFeature. I am not familiar with 
> the snapshot implementation, so any guidance is greatly appreciated.
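
For reference, the savings arithmetic in the description checks out; a short 
replay of the numbers quoted above:

{code:java}
// Replays the estimate from the description: per-inode overhead times the
// number of inodes whose snapshot data structures were never cleaned up.
public class SnapshotLeakEstimate {
  public static void main(String[] args) {
    double perInodeBytes = 32.5 + 32 + 24; // feature ref + FileWithSnapshotFeature + FileDiffList
    long leakedInodes = 211_997_769L - 44_572_380L;
    double gib = perInodeBytes * leakedInodes / (1024.0 * 1024 * 1024);
    System.out.printf("%.1f GiB%n", gib); // prints 13.8, matching the ~13.8gb above
  }
}
{code}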



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.

2019-10-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14910:
---
Fix Version/s: 3.2.2
   3.1.4

> Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
> 
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14492) Snapshot memory leak

2019-10-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14492:
---
Fix Version/s: (was: 3.1.4)
   3.3.0

> Snapshot memory leak
> 
>
> Key: HDFS-14492
> URL: https://issues.apache.org/jira/browse/HDFS-14492
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.6.0
> Environment: CDH5.14.4
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.3.0
>
>
> We recently examined the NameNode heap dump of a big, heavy snapshot user, 
> trying to trim some fat, and sure enough we found a memory leak in it: when 
> snapshots are removed, the corresponding data structures are not removed.
> This cluster has 586 million file system objects (286 million files, 287 
> million blocks, 13 million directories), using around 132gb of heap.
> While only 44.5 million files have snapshotted copies 
> (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have 
> FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies 
> at some point in the past, but after the snapshots are removed, those data 
> structures are still kept in the heap.
> INode$Feature = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, 
> FileDiffList = 24 bytes. That may not sound like a lot, but it adds up quickly in 
> large clusters like this. In this cluster, a whopping 13.8gb of memory could 
> have been saved:  ((32.5 + 32 + 24) bytes * (211997769 -  44572380) =~ 
> 13.8gb) if not for this bug. That is more than 10% of savings in heap size.
> Heap histogram for reference:
> {noformat}
> num #instances #bytes class name
>  --
>  1: 286418254 27496152384 org.apache.hadoop.hdfs.server.namenode.INodeFile
>  2: 28737 18388622528 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>  3: 227899550 17144816120 [B
>  4: 287324031 13769408616 
> [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
>  5: 71352116 12353841568 [Ljava.lang.Object;
>  6: 286322650 9170335840 
> [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>  7: 235632329 7658462416 
> [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>  8: 4 7046430816 [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>  9: 211997769 6783928608 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature
>  10: 211997769 5087946456 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList
>  11: 76586261 3780468856 [I
>  12: 44572380 3209211360 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy
>  13: 58634517 2345380680 java.util.ArrayList
>  14: 44572380 2139474240 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff
>  15: 76582416 1837977984 org.apache.hadoop.hdfs.server.namenode.AclFeature
>  16: 12907668 1135874784 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory{noformat}
> [~szetszwo] [~arpaga] [~smeng] [~shashikant]  any thoughts?
> I am thinking that inside 
> AbstractINodeDiffList#deleteSnapshotDiff() , in addition to cleaning up file 
> diffs, it should also remove FileWithSnapshotFeature. I am not familiar with 
> the snapshot implementation, so any guidance is greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.

2019-10-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14910:
---
Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~ayushtkn] for the review and [~inigoiri] for raising this issue. I'll 
also cherry-pick the commit to the lower branches.

> Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
> 
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 3.3.0
>
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14450) Erasure Coding: decommissioning datanodes cause replicate a large number of duplicate EC internal blocks

2019-10-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14450:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Closing this one as a dup. Thanks [~ferhui] for the confirmation, and thanks 
[~wuweiwei] for raising the issue.

> Erasure Coding: decommissioning datanodes cause replicate a large number of 
> duplicate EC internal blocks
> 
>
> Key: HDFS-14450
> URL: https://issues.apache.org/jira/browse/HDFS-14450
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ec
>Reporter: Wu Weiwei
>Assignee: Wu Weiwei
>Priority: Major
> Attachments: HDFS-14450-000.patch
>
>
> {code:java}
> //  [WARN] [RedundancyMonitor] : Failed to place enough replicas, still in 
> need of 2 to reach 167 (unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All 
> required storage types are unavailable:  unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> In a large-scale cluster, decommissioning large-scale datanodes cause EC 
> block groups to replicate a large number of duplicate internal blocks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-24 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Status: Patch Available  (was: Open)

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   
> yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   
> yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   
> yptio
>   nzon
>   e3
> {noformat}
> The output ends up looking really ugly, like this, when the path is long. 
> It also makes it very difficult to pipe the output into other utilities, 
> such as awk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-24 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Attachment: HDFS-14931.001.patch

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   
> yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   
> yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   
> yptio
>   nzon
>   e3
> {noformat}
> The output ends up looking really ugly, like this, when the path is long. 
> It also makes it very difficult to pipe the output into other utilities, 
> such as awk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14931) hdfs crypto commands limit column width

2019-10-24 Thread Eric Badger (Jira)
Eric Badger created HDFS-14931:
--

 Summary: hdfs crypto commands limit column width
 Key: HDFS-14931
 URL: https://issues.apache.org/jira/browse/HDFS-14931
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


{noformat}
foo@bar$ hdfs crypto -listZones
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
  yptio
  nzon
  e1
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
  yptio
  nzon
  e2
/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
  yptio
  nzon
  e3
{noformat}
The output ends up looking really ugly, like this, when the path is long. 
It also makes it very difficult to pipe the output into other utilities, 
such as awk.
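
As a hedged illustration only (this is not the HDFS-14931 patch, and 
ListZonesPlain is a made-up name), emitting one tab-separated record per zone 
avoids both the wrapping and the awk problem:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// One zone per line, tab-separated: no column-width wrapping, trivially
// consumable by awk/cut. Sample data mirrors the report above.
public class ListZonesPlain {
  public static void main(String[] args) {
    Map<String, String> zones = new LinkedHashMap<>();
    zones.put("/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1",
        "encryptionzone1");
    zones.put("/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2",
        "encryptionzone2");
    zones.forEach((path, key) -> System.out.println(path + "\t" + key));
    // e.g. hdfs crypto -listZones | awk -F'\t' '{print $1}'
  }
}
{code}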



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.

2019-10-24 Thread Jitendra Nath Pandey (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDDS-2283:
---
Fix Version/s: 0.5.0

> Container creation on datanodes take time because of Rocksdb option creation.
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-2283.00.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Container creation on datanodes takes around 300ms, mostly due to rocksdb 
> creation. Rocksdb creation takes a considerable time and this needs to be 
> optimized.
> Creating one rocksdb per disk should be enough, and each container can be a 
> table inside that rocksdb.
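
A hedged sketch of that direction, not the committed fix (PerDiskDbPool is a 
made-up name), using the RocksJava API: open one DB per disk once, after which 
creating a container reduces to a cheap column-family creation instead of a 
full DB open:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class PerDiskDbPool {
  static { RocksDB.loadLibrary(); }

  private final Map<String, RocksDB> dbPerDisk = new ConcurrentHashMap<>();
  private final Options options = new Options().setCreateIfMissing(true);

  // One RocksDB instance per disk, opened lazily and reused.
  RocksDB dbForDisk(String diskPath) {
    return dbPerDisk.computeIfAbsent(diskPath, p -> {
      try {
        return RocksDB.open(options, p + "/container-meta");
      } catch (RocksDBException e) {
        throw new IllegalStateException("cannot open DB on " + p, e);
      }
    });
  }

  // Container creation becomes a column-family creation in the shared DB.
  ColumnFamilyHandle createContainer(String diskPath, long containerId)
      throws RocksDBException {
    byte[] name = ("container-" + containerId).getBytes();
    return dbForDisk(diskPath).createColumnFamily(new ColumnFamilyDescriptor(name));
  }
}
{code}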



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Jitendra Nath Pandey (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959180#comment-16959180
 ] 

Jitendra Nath Pandey commented on HDDS-2356:


This seems very similar to HDDS-2355.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path 
> on VM0, while reading data from VM0's local disk and writing to the mount path. 
> The dataset has files of various sizes, from 0 bytes to GB-level, and contains 
> around 50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related to Multipart upload. This error eventually causes the writing to 
> terminate and OM to shut down. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions

2019-10-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959178#comment-16959178
 ] 

Hadoop QA commented on HDFS-14284:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m  6s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterFaultTolerant |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14284 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983954/HDFS-14284.007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cff374255ed5 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2eba2624 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28171/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28171/testReport/ |
| Max. process+thread count | 2765 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28171/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Created] (HDDS-2360) Update Ratis snapshot to d6d58d0

2019-10-24 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2360:
--

 Summary: Update Ratis snapshot to d6d58d0
 Key: HDDS-2360
 URL: https://issues.apache.org/jira/browse/HDDS-2360
 Project: Hadoop Distributed Data Store
  Issue Type: Task
  Components: Ozone Client, Ozone Datanode
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


Update Ratis dependency version to snapshot 
[d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix 
memory issues (RATIS-726, RATIS-728).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14284) RBF: Log Router identifier when reporting exceptions

2019-10-24 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14284:
-
Attachment: HDFS-14284.007.patch

> RBF: Log Router identifier when reporting exceptions
> 
>
> Key: HDFS-14284
> URL: https://issues.apache.org/jira/browse/HDFS-14284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch, 
> HDFS-14284.003.patch, HDFS-14284.004.patch, HDFS-14284.005.patch, 
> HDFS-14284.006.patch, HDFS-14284.007.patch
>
>
> The typical setup is to use multiple Routers through 
> ConfiguredFailoverProxyProvider.
> In a regular HA Namenode setup, it is easy to know which NN was used.
> However, in RBF, any Router can be the one reporting the exception, and it is 
> hard to know which one it was.
> We should have a way to identify which Router/Namenode was the one triggering 
> the exception.
> This would also apply to Observer Namenodes.
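
A minimal sketch of one possible shape for this, not the actual patch 
(RouterExceptionTagger is a made-up name): wrap IOExceptions leaving the 
Router with the Router's hostname and the target nameservice, so the client 
log shows which hop failed:

{code:java}
import java.io.IOException;
import java.net.InetAddress;

public class RouterExceptionTagger {
  private final String routerId;

  public RouterExceptionTagger() throws IOException {
    // Identify this Router instance; a configured ID would work equally well.
    this.routerId = InetAddress.getLocalHost().getHostName();
  }

  public <T> T invoke(String nsId, Invocation<T> call) throws IOException {
    try {
      return call.run();
    } catch (IOException e) {
      // Re-throw with the Router/nameservice identity prepended.
      throw new IOException("Router " + routerId + " -> nameservice " + nsId
          + ": " + e.getMessage(), e);
    }
  }

  interface Invocation<T> { T run() throws IOException; }
}
{code}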



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log

2019-10-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959106#comment-16959106
 ] 

Íñigo Goiri commented on HDFS-14775:


[~xkrogen], [~shv], thoughts?

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, 
> HDFS-14775.003.patch, HDFS-14775.004.patch
>
>
> HDFS-13946 improved the log for longest read/write lock held time, it's very 
> useful improvement.
> In some condition, we need to locate the detailed call information(user, ip, 
> path, etc.) for longest lock holder, but the default throttle interval(10s) 
> is too long to find the corresponding audit log. I think we should add the 
> timestamp for the {{longestWriteLockHeldStackTrace}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.

2019-10-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959103#comment-16959103
 ] 

Íñigo Goiri commented on HDFS-14928:


Thanks [~risyomei] for bringing up the proposals.
Given that we are putting the icon next to the text, I don't think there is a 
need for the legend.
The legend would make sense if we removed the text.
I prefer the version without the legend (proposal 1), with the text next to 
the icon (new for the DN).

Another option would be to put the icon after the text instead of before.

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> NN_orig.png, NN_with_legend.png, NN_wo_legend.png, RBF_orig.png, 
> RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2301) Write path: Reduce read contention in rocksDB

2019-10-24 Thread Aravindan Vijayan (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959100#comment-16959100
 ] 

Aravindan Vijayan commented on HDDS-2301:
-

[~sdeka] Here are some useful RocksDB configs in OM

* Enable RocksDB metrics - *ozone.metastore.rocksdb.statistics=ALL*
* Enable RocksDB logging - *rocksdb.logging.enabled=true*
* Enable RocksDB DEBUG logging - *rocksdb.logging.level=DEBUG*

> Write path: Reduce read contention in rocksDB
> -
>
> Key: HDDS-2301
> URL: https://issues.apache.org/jira/browse/HDDS-2301
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Rajesh Balamohan
>Assignee: Supratim Deka
>Priority: Major
>  Labels: performance
> Attachments: om_write_profile.png
>
>
> Benchmark: 
>  
>  A simple benchmark which creates 100s and 1000s of keys (empty directories) in 
> OM. This is done in a tight loop with multiple threads on the client side to 
> put enough load on the CPU. Note that the intention is to understand the 
> bottlenecks in OM (intentionally avoiding interactions with SCM & DN).
> Observation:
>  -
>  During the write path, Ozone checks {{OMFileRequest.verifyFilesInPath}}. This 
> internally calls {{omMetadataManager.getKeyTable().get(dbKeyName)}} for every 
> write operation. This turns out to be expensive and chokes the write path.
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java#L155]
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java#L63]
> In most cases, a directory creation is a fresh entry. In such cases, it would 
> be good to try {{RocksDB::keyMayExist}}.
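
A hedged sketch of that suggestion using the RocksJava API; note the exact 
keyMayExist signature varies across RocksJava versions (the Holder variant 
below is from the 6.x line), and the paths/keys are illustrative:

{code:java}
import org.rocksdb.Holder;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class KeyMayExistCheck {
  static { RocksDB.loadLibrary(); }

  public static void main(String[] args) throws RocksDBException {
    try (Options opts = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(opts, "/tmp/keytable-demo")) {
      byte[] dbKeyName = "/vol/bucket/dir1/".getBytes();

      Holder<byte[]> value = new Holder<>();
      if (!db.keyMayExist(dbKeyName, value)) {
        // Definitely absent (bloom filter + memtable): the common fresh-entry
        // case, so skip the expensive get() and proceed with the create.
        System.out.println("key absent, create directly");
      } else {
        // "Maybe present": fall back to an authoritative read.
        byte[] existing = value.getValue() != null ? value.getValue()
            : db.get(dbKeyName);
        System.out.println(existing == null ? "bloom false positive" : "exists");
      }
    }
  }
}
{code}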



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2301) Write path: Reduce read contention in rocksDB

2019-10-24 Thread Aravindan Vijayan (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959100#comment-16959100
 ] 

Aravindan Vijayan edited comment on HDDS-2301 at 10/24/19 5:51 PM:
---

[~sdeka] Here are some useful RocksDB configs in OM

* Enable RocksDB metrics - *ozone.metastore.rocksdb.statistics=ALL*
* Enable RocksDB logging - *hadoop.hdds.db.rocksdb.logging.enabled=true*
* Enable RocksDB DEBUG logging - *hadoop.hdds.db.rocksdb.logging.level=DEBUG*


was (Author: avijayan):
[~sdeka] Here are some useful RocksDB configs in OM

* Enable RocksDB metrics - *ozone.metastore.rocksdb.statistics=ALL*
* Enable RocksDB logging - *rocksdb.logging.enabled=true*
* Enable RocksDB DEBUG logging - *rocksdb.logging.level=DEBUG*

> Write path: Reduce read contention in rocksDB
> -
>
> Key: HDDS-2301
> URL: https://issues.apache.org/jira/browse/HDDS-2301
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Rajesh Balamohan
>Assignee: Supratim Deka
>Priority: Major
>  Labels: performance
> Attachments: om_write_profile.png
>
>
> Benchmark: 
>  
>  A simple benchmark which creates 100s and 1000s of keys (empty directories) in 
> OM. This is done in a tight loop with multiple threads on the client side to 
> put enough load on the CPU. Note that the intention is to understand the 
> bottlenecks in OM (intentionally avoiding interactions with SCM & DN).
> Observation:
>  -
>  During the write path, Ozone checks {{OMFileRequest.verifyFilesInPath}}. This 
> internally calls {{omMetadataManager.getKeyTable().get(dbKeyName)}} for every 
> write operation. This turns out to be expensive and chokes the write path.
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java#L155]
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java#L63]
> In most cases, a directory creation is a fresh entry. In such cases, it would 
> be good to try {{RocksDB::keyMayExist}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"

2019-10-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959097#comment-16959097
 ] 

Íñigo Goiri commented on HDFS-14917:


OK, let's follow up in HDFS-14928.
+1 on  [^HDFS-14917.patch].

> Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
> ---
>
> Key: HDFS-14917
> URL: https://issues.apache.org/jira/browse/HDFS-14917
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, 
> image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, 
> image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, 
> image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png
>
>
> This is a really simple UI change proposal:
>  The icon of "Decommissioned & dead" datanode could be improved. It can be 
> changed from    !image-2019-10-21-18-05-19-160.png|width=31,height=28! to   
> !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that,
>  #  icon "  !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> used for all status starts with "decommission" on dfshealth.html, 
>  #  icon "  !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be 
> differentiated with icon "  !image-2019-10-21-18-13-54-427.png! " on 
> federationhealth.html
> |*DataNode Infomation Legend (now)*
>  dfshealth.html#tab-datanode 
> |!image-2019-10-21-17-49-10-635.png|width=516,height=55!|
> |*DataNode* *Infomation* *Legend (proposed)*
>   dfshealth.html#tab-datanode 
> |!image-2019-10-21-18-03-53-914.png|width=589,height=60!|
> |*NameService Legend*
>  
> federationhealth.htm#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-24 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDDS-2356:
---

Assignee: Bharat Viswanadham

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path 
> on VM0, while reading data from VM0's local disk and writing to the mount path. 
> The dataset has files of various sizes, from 0 bytes to GB-level, and contains 
> around 50,000 files. 
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I 
> look at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors 
> related to Multipart upload. This error eventually causes the writing to 
> terminate and OM to shut down. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
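
For context, the failure mode in the stack trace is the fail-fast behavior of 
java.util.TreeMap#forEach: any structural modification of the map while it is 
being iterated throws ConcurrentModificationException. Below is a minimal 
single-threaded sketch (plain Java, not Ozone code) that reproduces the same 
exception; in the report above, the race is between the OM double-buffer flush 
thread serializing OmMultipartKeyInfo and another handler committing a part 
concurrently:

{code:java}
import java.util.TreeMap;

/** Minimal reproduction of the fail-fast behavior; not Ozone code. */
public class TreeMapCmeDemo {
  public static void main(String[] args) {
    TreeMap<Integer, String> partKeyInfoList = new TreeMap<>();
    partKeyInfoList.put(1, "part-1");
    partKeyInfoList.put(2, "part-2");
    // TreeMap#forEach records modCount up front and re-checks it after every
    // callback, so a structural modification mid-iteration is detected.
    partKeyInfoList.forEach((partNumber, partName) -> {
      // Simulates another thread committing a new part while the flush
      // thread is serializing the multipart key info.
      partKeyInfoList.put(3, "part-3"); // -> ConcurrentModificationException
    });
  }
}
{code}

The usual remedies for this pattern are to guard the map with a lock shared by 
both code paths, or to snapshot/copy it before serialization.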



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14925) rename operation should check nest snapshot

2019-10-24 Thread Junwang Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958990#comment-16958990
 ] 

Junwang Zhao commented on HDFS-14925:
-

[~sodonnell] thanks for your reply.
{code:java}
hadoop fs -mv /project/folder/test /other_project/folder/
{code}
⬆️ will not be denied; what should be denied is ⬇️:
{code:java}
hadoop fs -mv /project /other_project/folder
{code}
because /project is a snapshot root.

In your case, because /project has a snapshot, validateRenameSource already 
makes sure the rename is denied.

What I'm fixing is that if /project doesn't have a snapshot, but it is 
snapshottable, the mv operation currently won't be denied.

 

To make it clearer, you can try the following:
{code:java}
hdfs dfs -mkdir /dir1
hdfs dfs -mkdir /dir2
hdfs dfsadmin -allowSnapshot /dir1
hdfs dfsadmin -allowSnapshot /dir2
hdfs dfs -createSnapshot /dir1 snap1
hdfs dfs -mv /dir2 /dir1/
hdfs dfs -createSnapshot /dir1/dir2 snap2{code}
This results in a nested snapshot.
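
For illustration, here is a self-contained sketch of the kind of check being 
proposed (all names are hypothetical, not the actual patch): deny a rename 
when the source subtree is snapshottable and the destination has a 
snapshottable ancestor, even if no snapshot has been taken yet:

{code:java}
import java.util.HashSet;
import java.util.Set;

/** Self-contained model of the proposed rename check; not HDFS code. */
public class NestedSnapshotCheck {
  // Paths marked snapshottable via "dfsadmin -allowSnapshot".
  static final Set<String> SNAPSHOTTABLE = new HashSet<>();

  /** True if src itself or any descendant of src is snapshottable. */
  static boolean subtreeSnapshottable(String src) {
    for (String s : SNAPSHOTTABLE) {
      if (s.equals(src) || s.startsWith(src + "/")) {
        return true;
      }
    }
    return false;
  }

  /** True if dst or any ancestor of dst is snapshottable. */
  static boolean ancestorSnapshottable(String dst) {
    for (String d = dst; !d.isEmpty();
         d = d.substring(0, Math.max(d.lastIndexOf('/'), 0))) {
      if (SNAPSHOTTABLE.contains(d)) {
        return true;
      }
    }
    return false;
  }

  static void validateRename(String src, String dst) {
    if (subtreeSnapshottable(src) && ancestorSnapshottable(dst)) {
      throw new IllegalArgumentException("Rename of " + src + " into " + dst
          + " would create a nested snapshottable directory");
    }
  }

  public static void main(String[] args) {
    SNAPSHOTTABLE.add("/dir1");
    SNAPSHOTTABLE.add("/dir2");
    validateRename("/dir2", "/dir1"); // throws, matching the scenario above
  }
}
{code}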

> rename operation should check nest snapshot
> ---
>
> Key: HDFS-14925
> URL: https://issues.apache.org/jira/browse/HDFS-14925
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Junwang Zhao
>Priority: Major
>
> When we do a rename operation, if the src directory or any of its descendants
> is snapshottable and the dst directory or any of its ancestors is 
> snapshottable, 
> we consider this a nested snapshot, which should be denied.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958975#comment-16958975
 ] 

Ayush Saxena commented on HDFS-14882:
-

Thanx [~hexiaoqiao] for the patch v008 LGTM +1

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, 
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch
>
>
> Currently, we consider the load of a datanode in #chooseTarget for writers, 
> but we do not consider it for readers. Thus, the process slots of a datanode 
> can be occupied by #BlockSender instances serving readers, the disk/network 
> become busy, and we then hit slow-node exceptions. IIRC the same case has 
> been reported several times. Based on this, I propose to consider load for 
> readers the same way #chooseTarget does for writers.
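
As a rough illustration of the idea only (not the attached patch; DatanodeInfo 
here is a simplified stand-in type), a read path could order a block's replicas 
so that lightly loaded datanodes are tried first:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified sketch of load-aware replica ordering; not the actual patch. */
public class SortLocationsByLoad {
  record DatanodeInfo(String host, int xceiverCount) {}

  /** Order replicas so the least-loaded datanodes are tried first. */
  static List<DatanodeInfo> sortByLoad(List<DatanodeInfo> locations) {
    List<DatanodeInfo> sorted = new ArrayList<>(locations);
    // xceiverCount approximates the active read/write streams on the node.
    sorted.sort(Comparator.comparingInt(DatanodeInfo::xceiverCount));
    return sorted;
  }

  public static void main(String[] args) {
    List<DatanodeInfo> locs = List.of(
        new DatanodeInfo("dn1", 40),
        new DatanodeInfo("dn2", 3),
        new DatanodeInfo("dn3", 12));
    System.out.println(sortByLoad(locs)); // dn2, dn3, dn1
  }
}
{code}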



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.

2019-10-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958970#comment-16958970
 ] 

Ayush Saxena commented on HDFS-14910:
-

Thanx [~weichiu] for the PR.

fix LGTM +1

> Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
> 
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2188) Implement LocatedFileStatus & getFileBlockLocations to provide node/localization information to Yarn/Mapreduce

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2188?focusedWorklogId=333509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333509
 ]

ASF GitHub Bot logged work on HDDS-2188:


Author: ASF GitHub Bot
Created on: 24/Oct/19 15:17
Start Date: 24/Oct/19 15:17
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #1631: 
HDDS-2188 : Implement LocatedFileStatus & getFileBlockLocations to pr…
URL: https://github.com/apache/hadoop/pull/1631#discussion_r338635266
 
 

 ##
 File path: 
hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java
 ##
 @@ -626,6 +640,16 @@ public FileStatus getFileStatus(Path f) throws 
IOException {
 return fileStatus;
   }
 
+  public BlockLocation[] getFileBlockLocations(FileStatus fileStatus,
 
 Review comment:
   add @Override
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333509)
Time Spent: 1h 10m  (was: 1h)

> Implement LocatedFileStatus & getFileBlockLocations to provide 
> node/localization information to Yarn/Mapreduce
> --
>
> Key: HDDS-2188
> URL: https://issues.apache.org/jira/browse/HDDS-2188
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For applications like Hive/MapReduce to take advantage of the data locality 
> in Ozone, Ozone should return the location of the Ozone blocks. This is 
> needed for better read performance for Hadoop Applications.
> {code}
> if (file instanceof LocatedFileStatus) {
>   blkLocations = ((LocatedFileStatus) file).getBlockLocations();
> } else {
>   blkLocations = fs.getFileBlockLocations(file, 0, length);
> }
> {code}
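
For reference, the consumer-side pattern in the snippet above can be written 
as the helper below; the class and method names (LocalityLookup, locationsFor) 
are illustrative, not the Ozone implementation:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;

/** Illustrative helper showing the split-computation pattern above. */
public class LocalityLookup {
  static BlockLocation[] locationsFor(FileSystem fs, FileStatus file)
      throws IOException {
    if (file instanceof LocatedFileStatus) {
      // The listing already carried locations: no extra RPC needed.
      return ((LocatedFileStatus) file).getBlockLocations();
    }
    // Otherwise ask the filesystem. The base FileSystem implementation
    // returns a single "localhost" pseudo-location, which is why a scheme
    // like o3fs must override getFileBlockLocations to expose real locality.
    return fs.getFileBlockLocations(file, 0, file.getLen());
  }
}
{code}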



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14880) Balancer sequence of statistics & exit message is not correct

2019-10-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958962#comment-16958962
 ] 

Ayush Saxena commented on HDFS-14880:
-

V0003 LGTM +1
 [~vinayakumarb] any comments?

> Balancer sequence of statistics & exit message is not correct
> -
>
> Key: HDFS-14880
> URL: https://issues.apache.org/jira/browse/HDFS-14880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 3.1.1, 3.2.1
> Environment: Run the balancer tool in cluster.
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14880.0001.patch, HDFS-14880.0002.patch, 
> HDFS-14880.0003.patch
>
>
> Actual:
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
> The cluster is balanced. Exiting...
> Sep 27, 2019 5:13:15 PM   0   0 B  0 B
>   0 B
> Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds
> Done!
> Expected: The exit message should appear after logging all the balancer 
> movement statistics data.
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
> Sep 27, 2019 5:13:15 PM   0   0 B  0 B
>   0 B
> The cluster is balanced. Exiting...
> Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds
> Done!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14880) Balancer sequence of statistics & exit message is not correct

2019-10-24 Thread Renukaprasad C (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958947#comment-16958947
 ] 

Renukaprasad C edited comment on HDFS-14880 at 10/24/19 2:56 PM:
-

Thanks [~ayushtkn] for the quick review. 
I have made the changes & submitted the patch. Please review.
There is a test failure - TestRenameWithSnapshots; it is not related to the 
patch I have submitted.


was (Author: prasad-acit):
Thanks [~ayushtkn] for quick review. 

> Balancer sequence of statistics & exit message is not correct
> -
>
> Key: HDFS-14880
> URL: https://issues.apache.org/jira/browse/HDFS-14880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 3.1.1, 3.2.1
> Environment: Run the balancer tool in cluster.
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14880.0001.patch, HDFS-14880.0002.patch, 
> HDFS-14880.0003.patch
>
>
> Actual:
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
> The cluster is balanced. Exiting...
> Sep 27, 2019 5:13:15 PM   0   0 B  0 B
>   0 B
> Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds
> Done!
> Expected: The exit message should appear after logging all the balancer 
> movement statistics data.
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
> Sep 27, 2019 5:13:15 PM   0   0 B  0 B
>   0 B
> The cluster is balanced. Exiting...
> Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds
> Done!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14880) Balancer sequence of statistics & exit message is not correct

2019-10-24 Thread Renukaprasad C (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958947#comment-16958947
 ] 

Renukaprasad C commented on HDFS-14880:
---

Thanks [~ayushtkn] for the quick review. 

> Balancer sequence of statistics & exit message is not correct
> -
>
> Key: HDFS-14880
> URL: https://issues.apache.org/jira/browse/HDFS-14880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 3.1.1, 3.2.1
> Environment: Run the balancer tool in cluster.
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
> Attachments: HDFS-14880.0001.patch, HDFS-14880.0002.patch, 
> HDFS-14880.0003.patch
>
>
> Actual:
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
> The cluster is balanced. Exiting...
> Sep 27, 2019 5:13:15 PM   0   0 B  0 B
>   0 B
> Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds
> Done!
> Expected: The exit message should appear after logging all the balancer 
> movement statistics data.
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
> Sep 27, 2019 5:13:15 PM   0   0 B  0 B
>   0 B
> The cluster is balanced. Exiting...
> Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds
> Done!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2359:
-
Labels: pull-request-available  (was: )

> Seeking randomly in a key with more than 2 blocks of data leads to 
> inconsistent reads
> -
>
> Key: HDDS-2359
> URL: https://issues.apache.org/jira/browse/HDDS-2359
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
>
> During Hive testing we found the following exception:
> {code}
> TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> ... 16 more
> Caused by: java.io.IOException: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
> ... 18 more
> Caused by: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:835)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:74)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361)
> ... 24 more
> Caused by: java.io.IOException: Error reading file: 
> o3fs://hive.warehouse.vc0136.halxg.cloudera.com:9862/data/inventory/delta_001_001_/bucket_0
> at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1283)
> at 
> 

[jira] [Work logged] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2359?focusedWorklogId=333473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333473
 ]

ASF GitHub Bot logged work on HDDS-2359:


Author: ASF GitHub Bot
Created on: 24/Oct/19 14:42
Start Date: 24/Oct/19 14:42
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on pull request #82: HDDS-2359. 
Seeking randomly in a key with more than 2 blocks of data leads to inconsistent 
reads
URL: https://github.com/apache/hadoop-ozone/pull/82
 
 
   
   
   ## What changes were proposed in this pull request?
   The issue shows up when we first seek to an offset and read, then seek to a 
different offset and read again, with both reads covering an overlapping set of 
chunks. After a seek, the chunkPosition inside each BlockInputStream is not 
reset to 0: the chunk containing the seek offset is read correctly, but for the 
subsequent chunks the remaining length to read is computed as 0, so all reads 
from those chunks return nothing. The solution is to reset the position of all 
subsequent chunks in all subsequent blocks to 0 after a seek, so that reading 
starts from the beginning of each of those chunks (see the sketch below).
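
   A self-contained sketch of the fix idea (hypothetical names such as 
MultiChunkStream; not the actual Ozone BlockInputStream/ChunkInputStream 
classes):

{code:java}
/** Hypothetical sketch of the seek fix; not the actual Ozone input stream. */
public class MultiChunkStream {
  private final long[] chunkLength;    // length of each chunk in the key
  private final long[] chunkPosition;  // current read offset within each chunk

  MultiChunkStream(long... lengths) {
    chunkLength = lengths;
    chunkPosition = new long[lengths.length];
  }

  void seek(long pos) {
    long start = 0;
    for (int i = 0; i < chunkLength.length; i++) {
      if (pos < start + chunkLength[i]) {
        chunkPosition[i] = pos - start;  // chunk containing the seek offset
        // The fix: reset all later chunks, so their remaining readable
        // length is non-zero when the read advances past chunk i.
        for (int j = i + 1; j < chunkLength.length; j++) {
          chunkPosition[j] = 0;
        }
        return;
      }
      start += chunkLength[i];
    }
    throw new IllegalArgumentException("Seek past end of key: " + pos);
  }

  public static void main(String[] args) {
    MultiChunkStream s = new MultiChunkStream(4_000_000, 4_000_000, 4_000_000);
    s.seek(5_000_000); // lands in chunk 1; chunk 2 is reset to offset 0
  }
}
{code}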
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-2359
   
   ## How was this patch tested?
   The patch was tested by adding unit tests that reliably reproduce the 
issue. It was also deployed on the real cluster where the issue was first 
discovered, and the fix was verified there.
   
   Thanks @fapifta for discovering the issue and helping to verify the fix. 
Thanks @bharatviswa504 and @hanishakoneru for their contributions to the fix.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333473)
Remaining Estimate: 0h
Time Spent: 10m

> Seeking randomly in a key with more than 2 blocks of data leads to 
> inconsistent reads
> -
>
> Key: HDDS-2359
> URL: https://issues.apache.org/jira/browse/HDDS-2359
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During Hive testing we found the following exception:
> {code}
> TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: error iterating
> at 
> 

[jira] [Work logged] (HDDS-2357) Add replication factor option to new Freon tests

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2357?focusedWorklogId=333439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333439
 ]

ASF GitHub Bot logged work on HDDS-2357:


Author: ASF GitHub Bot
Created on: 24/Oct/19 14:20
Start Date: 24/Oct/19 14:20
Worklog Time Spent: 10m 
  Work Description: arp7 commented on pull request #79: HDDS-2357. Add 
replication factor option to new Freon tests
URL: https://github.com/apache/hadoop-ozone/pull/79
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333439)
Time Spent: 20m  (was: 10m)

> Add replication factor option to new Freon tests
> 
>
> Key: HDDS-2357
> URL: https://issues.apache.org/jira/browse/HDDS-2357
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: freon
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> New Freon generators (OCKG and OKG) use a fixed replication factor of 3.  
> Sometimes it's useful to be able to test single-node replication.  The goal 
> of this task is to add a command-line option to specify the replication factor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2357) Add replication factor option to new Freon tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2357:

Labels:   (was: pull-request-available)

> Add replication factor option to new Freon tests
> 
>
> Key: HDDS-2357
> URL: https://issues.apache.org/jira/browse/HDDS-2357
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: freon
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> New Freon generators (OCKG and OKG) use a fixed replication factor of 3.  
> Sometimes it's useful to be able to test single-node replication.  The goal 
> of this task is to add a command-line option to specify the replication factor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2357) Add replication factor option to new Freon tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2357:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

+1 I've committed this via GitHub. Thanks for the contribution [~adoroszlai].

> Add replication factor option to new Freon tests
> 
>
> Key: HDDS-2357
> URL: https://issues.apache.org/jira/browse/HDDS-2357
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: freon
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> New Freon generators (OCKG and OKG) use a fixed replication factor of 3.  
> Sometimes it's useful to be able to test single-node replication.  The goal 
> of this task is to add a command-line option to specify the replication factor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2358) Change to replication factor THREE in acceptance tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2358:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

+1 I've committed this via GitHub. Thanks for catching this [~adoroszlai].

> Change to replication factor THREE in acceptance tests
> --
>
> Key: HDDS-2358
> URL: https://issues.apache.org/jira/browse/HDDS-2358
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Acceptance test clusters are currently configured with a replication factor 
> of ONE.  This way the [test 
> succeeds|https://elek.github.io/ozone-ci-q4/pr/pr-hdds-2305-c92ks/acceptance/summary.html]
>  in spite of Ratis leader election problems (note "term 1464"):
> {noformat:title=https://raw.githubusercontent.com/elek/ozone-ci-q4/master/pr/pr-hdds-2305-c92ks/acceptance/docker-ozones3-ozones3-s3-scm.log}
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: start FollowerState
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.FollowerState: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13-FollowerState was 
> interrupted: java.lang.InterruptedException: sleep interrupted
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RaftServerImpl: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13: changes role from  
> FOLLOWER to FOLLOWER at term 1464 for 
> recognizeCandidate:5ce55bf6-dbcc-40fb-8fb4-6e78032f4b8c
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: shutdown FollowerState
> {noformat}
> The goal of this change is to configure a factor of THREE, to allow the 
> acceptance tests to catch such issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2358) Change to replication factor THREE in acceptance tests

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-2358:

Labels:   (was: pull-request-available)

> Change to replication factor THREE in acceptance tests
> --
>
> Key: HDDS-2358
> URL: https://issues.apache.org/jira/browse/HDDS-2358
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Acceptance test clusters are currently configured with a replication factor 
> of ONE.  This way the [test 
> succeeds|https://elek.github.io/ozone-ci-q4/pr/pr-hdds-2305-c92ks/acceptance/summary.html]
>  in spite of Ratis leader election problems (note "term 1464"):
> {noformat:title=https://raw.githubusercontent.com/elek/ozone-ci-q4/master/pr/pr-hdds-2305-c92ks/acceptance/docker-ozones3-ozones3-s3-scm.log}
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: start FollowerState
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.FollowerState: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13-FollowerState was 
> interrupted: java.lang.InterruptedException: sleep interrupted
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RaftServerImpl: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13: changes role from  
> FOLLOWER to FOLLOWER at term 1464 for 
> recognizeCandidate:5ce55bf6-dbcc-40fb-8fb4-6e78032f4b8c
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: shutdown FollowerState
> {noformat}
> The goal of this change is to configure a factor of THREE, to allow the 
> acceptance tests to catch such issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2358) Change to replication factor THREE in acceptance tests

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2358?focusedWorklogId=333438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333438
 ]

ASF GitHub Bot logged work on HDDS-2358:


Author: ASF GitHub Bot
Created on: 24/Oct/19 14:18
Start Date: 24/Oct/19 14:18
Worklog Time Spent: 10m 
  Work Description: arp7 commented on pull request #78: HDDS-2358. Change 
to replication factor THREE in acceptance tests
URL: https://github.com/apache/hadoop-ozone/pull/78
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333438)
Time Spent: 20m  (was: 10m)

> Change to replication factor THREE in acceptance tests
> --
>
> Key: HDDS-2358
> URL: https://issues.apache.org/jira/browse/HDDS-2358
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Acceptance test clusters are currently configured with a replication factor 
> of ONE.  This way the [test 
> succeeds|https://elek.github.io/ozone-ci-q4/pr/pr-hdds-2305-c92ks/acceptance/summary.html]
>  in spite of Ratis leader election problems (note "term 1464"):
> {noformat:title=https://raw.githubusercontent.com/elek/ozone-ci-q4/master/pr/pr-hdds-2305-c92ks/acceptance/docker-ozones3-ozones3-s3-scm.log}
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: start FollowerState
> datanode_2  | 2019-10-15 03:18:06,953 INFO impl.FollowerState: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13-FollowerState was 
> interrupted: java.lang.InterruptedException: sleep interrupted
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RaftServerImpl: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13: changes role from  
> FOLLOWER to FOLLOWER at term 1464 for 
> recognizeCandidate:5ce55bf6-dbcc-40fb-8fb4-6e78032f4b8c
> datanode_2  | 2019-10-15 03:18:07,090 INFO impl.RoleInfo: 
> 749b19c7-0772-44d2-8efd-0664e6aa0748: shutdown FollowerState
> {noformat}
> The goal of this change is to configure a factor of THREE, to allow the 
> acceptance tests to catch such issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1228) Chunk Scanner Checkpoints

2019-10-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1228:

Labels:   (was: pull-request-available)

> Chunk Scanner Checkpoints
> -
>
> Key: HDDS-1228
> URL: https://issues.apache.org/jira/browse/HDDS-1228
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Supratim Deka
>Assignee: Attila Doroszlai
>Priority: Critical
> Fix For: 0.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Checkpoint the progress of the chunk verification scanner.
> Save the checkpoint persistently to support scanner resume from checkpoint - 
> after a datanode restart.
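
A minimal sketch of one way to persist such a checkpoint durably (illustrative 
only, not the HDDS implementation): write the value to a temp file and 
atomically rename it over the previous one, so a crash never leaves a torn 
checkpoint:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Illustrative checkpoint persistence; not the HDDS implementation. */
public class ScannerCheckpoint {
  private final Path file;

  ScannerCheckpoint(Path file) {
    this.file = file;
  }

  /** Write to a temp file, then atomically rename over the old checkpoint. */
  void save(long lastCompletedScanMillis) throws IOException {
    Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
    Files.write(tmp, Long.toString(lastCompletedScanMillis)
        .getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Returns 0 when no checkpoint exists, i.e. scan from the beginning. */
  long load() throws IOException {
    if (!Files.exists(file)) {
      return 0L;
    }
    return Long.parseLong(
        new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim());
  }

  public static void main(String[] args) throws IOException {
    ScannerCheckpoint cp =
        new ScannerCheckpoint(Paths.get("/tmp/chunk-scanner.checkpoint"));
    cp.save(System.currentTimeMillis());
    System.out.println("resume scan after: " + cp.load());
  }
}
{code}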



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


