[jira] [Comment Edited] (HDFS-12859) Admin command resetBalancerBandwidth

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802455#comment-17802455
 ] 

Shilun Fan edited comment on HDFS-12859 at 1/4/24 7:58 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Admin command resetBalancerBandwidth
> 
>
> Key: HDFS-12859
> URL: https://issues.apache.org/jira/browse/HDFS-12859
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Major
> Attachments: 
> 0003-HDFS-12859-Admin-command-resetBalancerBandwidth.patch, 
> 0004-HDFS-12859-Admin-command-resetBalancerBandwidth.patch, HDFS-12859.patch
>
>
> We can already set the balancer bandwidth dynamically using the command 
> setBalancerBandwidth. The value set this way is not persistent and is not 
> stored in the configuration file, so different datanodes may keep their 
> different default or former settings from configuration.
> When we developed a scheduled balancer task that runs at midnight every 
> day, we set a larger bandwidth for it and hoped to reset the value after it 
> finished. However, we found it difficult to restore the different settings 
> on different datanodes, as the setBalancerBandwidth command can only push 
> the same value to all datanodes. If we want a unique setting for every 
> datanode, we have to restart the datanodes.
> So it would be useful to have a command to synchronize the setting with the 
> configuration file. 
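
For context, a minimal sketch of the pattern the report describes, using the
existing DistributedFileSystem#setBalancerBandwidth API (the NameNode URI and
bandwidth values are assumptions; the proposed resetBalancerBandwidth command
does not exist yet):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class BalancerBandwidthSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(URI.create("hdfs://nameservice1"), conf)) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // Before the midnight balancer run: push a large value to ALL datanodes.
      dfs.setBalancerBandwidth(100L * 1024 * 1024); // 100 MB/s

      // ... balancer runs here ...

      // Afterwards we can only push another uniform value; restoring each
      // datanode's own configured default is the gap this issue targets.
      dfs.setBalancerBandwidth(10L * 1024 * 1024); // 10 MB/s, uniform again
    }
  }
}
{code}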






[jira] [Comment Edited] (HDFS-12657) Operations based on inode id must not fallback to the path

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802459#comment-17802459
 ] 

Shilun Fan edited comment on HDFS-12657 at 1/4/24 7:58 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Operations based on inode id must not fallback to the path
> --
>
> Key: HDFS-12657
> URL: https://issues.apache.org/jira/browse/HDFS-12657
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.5.0
>Reporter: Daryn Sharp
>Priority: Major
>
> HDFS-6294 added the ability for some path-based operations to specify an 
> optional inode id to mimic file descriptors.  If an inode id is provided and 
> it exists, it replaces the provided path.  If it doesn't exist, it has the 
> broken behavior of falling back to the supplied path.  A supplied inode id 
> must be authoritative.  An FNF (FileNotFoundException) should be thrown if 
> the inode does not exist.  
> (HDFS-10745 changed from string paths to IIPs but preserved the same broken 
> semantics)
> This is broken since an operation specifying an inode for a deleted and 
> recreated path will operate on the newer inode.  If another client recreates 
> the path, the operation is likely to fail for other reasons such as lease 
> checks.  However a multi-threaded client has a single lease id.  If thread1 
> creates a file, it's somehow deleted, thread2 recreates the path, then 
> further operations in thread1 may conflict with thread2 and corrupt the state 
> of the file.
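
A hypothetical sketch (not HDFS internals) of the authoritative-inode rule the
report calls for; the resolver map is a stand-in for the namespace:

{code:java}
import java.io.FileNotFoundException;
import java.util.Map;

final class InodeResolverSketch {
  private final Map<Long, String> inodeIdToPath; // stand-in for the namespace

  InodeResolverSketch(Map<Long, String> inodeIdToPath) {
    this.inodeIdToPath = inodeIdToPath;
  }

  /** If an inode id was supplied, it is authoritative: no path fallback. */
  String resolve(long inodeId, String suppliedPath) throws FileNotFoundException {
    if (inodeId > 0) {
      String path = inodeIdToPath.get(inodeId);
      if (path == null) {
        // The broken behavior was to return suppliedPath here, which can
        // silently bind the operation to a recreated file's newer inode.
        throw new FileNotFoundException("No inode with id " + inodeId);
      }
      return path;
    }
    return suppliedPath; // no inode id supplied: ordinary path-based operation
  }
}
{code}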






[jira] [Comment Edited] (HDFS-12652) INodeAttributesProvider#getAttributes(): Avoid multiple conversions of path components byte[][] to String[] when requesting INode attributes

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802460#comment-17802460
 ] 

Shilun Fan edited comment on HDFS-12652 at 1/4/24 7:58 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> INodeAttributesProvider#getAttributes(): Avoid multiple conversions of path 
> components byte[][] to String[] when requesting INode attributes
> 
>
> Key: HDFS-12652
> URL: https://issues.apache.org/jira/browse/HDFS-12652
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>
> {{INodeAttributesProvider#getAttributes}} needs the path components passed 
> in to be an array of Strings, whereas the INode and related layers maintain 
> path components as an array of byte[]. So these layers have to convert each 
> byte[] component of the path back into a String, multiple times, when 
> requesting INode attributes from the provider. 
> That is, the path "/a/b/c" requires calling the attribute provider with: 
> (1) "", (2) "", "a", (3) "", "a", "b", (4) "", "a", "b", "c". Every single 
> one of those strings is freshly (re)converted from a byte[]. If a file 
> listing is done on a huge directory containing hundreds of millions of 
> files, these repeated redundant conversions of byte[][] to String[] create 
> lots of tiny garbage objects, occupying memory and hurting performance. It 
> would be better to avoid creating redundant copies of path component 
> strings.
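
A minimal sketch of the convert-once idea: turn the byte[][] components into
String[] a single time and reuse prefixes of that array for every
getAttributes call. DFSUtil.bytes2String is the real helper in HDFS; the
sketch uses plain UTF-8 decoding to stay self-contained.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PathComponentsSketch {
  static String[] toStrings(byte[][] components) {
    String[] names = new String[components.length];
    for (int i = 0; i < components.length; i++) {
      // In HDFS this would be DFSUtil.bytes2String(components[i]).
      names[i] = new String(components[i], StandardCharsets.UTF_8);
    }
    return names;
  }

  public static void main(String[] args) {
    byte[][] components = { "".getBytes(StandardCharsets.UTF_8),
        "a".getBytes(StandardCharsets.UTF_8), "b".getBytes(StandardCharsets.UTF_8),
        "c".getBytes(StandardCharsets.UTF_8) };
    String[] names = toStrings(components); // converted exactly once
    for (int depth = 1; depth <= names.length; depth++) {
      // Each provider call sees a prefix view; no byte[] is decoded again.
      System.out.println(Arrays.asList(names).subList(0, depth));
    }
  }
}
{code}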






[jira] [Comment Edited] (HDFS-12597) Add CryptoOutputStream to WebHdfsFileSystem create call.

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802461#comment-17802461
 ] 

Shilun Fan edited comment on HDFS-12597 at 1/4/24 7:58 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Add CryptoOutputStream to WebHdfsFileSystem create call.
> 
>
> Key: HDFS-12597
> URL: https://issues.apache.org/jira/browse/HDFS-12597
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption, kms, webhdfs
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
>







[jira] [Comment Edited] (HDFS-12953) XORRawDecoder.doDecode throws NullPointerException

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802454#comment-17802454
 ] 

Shilun Fan edited comment on HDFS-12953 at 1/4/24 7:58 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> XORRawDecoder.doDecode throws NullPointerException
> --
>
> Key: HDFS-12953
> URL: https://issues.apache.org/jira/browse/HDFS-12953
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Aihua Xu
>Priority: Major
> Attachments: HDFS-12953.test.patch
>
>
> Thanks [~danielpol] for the report on HDFS-12860.
> {noformat}
> 17/11/30 04:19:55 INFO mapreduce.Job: map 0% reduce 0%
> 17/11/30 04:20:01 INFO mapreduce.Job: Task Id : 
> attempt_1512036058655_0003_m_02_0, Status : FAILED
> Error: java.lang.NullPointerException
> at 
> org.apache.hadoop.io.erasurecode.rawcoder.XORRawDecoder.doDecode(XORRawDecoder.java:83)
> at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:106)
> at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
> at 
> org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:423)
> at 
> org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
> at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:382)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:318)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at 
> org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:563)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {noformat}
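
The NPE surfaces deep inside the XOR decoder. A hypothetical input-validation
guard (an illustration, not the committed fix) would fail fast with a clear
message instead:

{code:java}
import java.nio.ByteBuffer;

final class DecodeArgsCheckSketch {
  static void check(ByteBuffer[] inputs, int[] erasedIndexes,
      ByteBuffer[] outputs, int numDataUnits) {
    if (inputs == null || erasedIndexes == null || outputs == null) {
      throw new IllegalArgumentException("decode arguments must not be null");
    }
    int present = 0;
    for (ByteBuffer b : inputs) {
      if (b != null) present++;
    }
    // XOR can reconstruct at most one erased unit and needs all others present.
    if (erasedIndexes.length != 1 || present < numDataUnits) {
      throw new IllegalArgumentException("XOR decode needs exactly 1 erasure and "
          + numDataUnits + " present units, got " + erasedIndexes.length
          + " erasures and " + present + " present");
    }
    for (ByteBuffer out : outputs) {
      if (out == null) {
        throw new IllegalArgumentException("output buffer must not be null");
      }
    }
  }
}
{code}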






[jira] [Comment Edited] (HDFS-13177) Investigate and fix DFSStripedOutputStream handling of DSQuotaExceededException

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802450#comment-17802450
 ] 

Shilun Fan edited comment on HDFS-13177 at 1/4/24 7:57 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Investigate and fix DFSStripedOutputStream handling of 
> DSQuotaExceededException
> ---
>
> Key: HDFS-13177
> URL: https://issues.apache.org/jira/browse/HDFS-13177
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Priority: Major
>
> This is the DFSStripedOutputStream equivalent of HDFS-13164.






[jira] [Comment Edited] (HDFS-13285) Improve runtime for TestReadStripedFileWithMissingBlocks#testReadFileWithMissingBlocks

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802448#comment-17802448
 ] 

Shilun Fan edited comment on HDFS-13285 at 1/4/24 7:57 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Improve runtime for 
> TestReadStripedFileWithMissingBlocks#testReadFileWithMissingBlocks 
> ---
>
> Key: HDFS-13285
> URL: https://issues.apache.org/jira/browse/HDFS-13285
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ajay Kumar
>Priority: Major
>
> TestReadStripedFileWithMissingBlocks#testReadFileWithMissingBlocks takes 
> anywhere between 2 and 4 minutes depending on the host machine. This Jira 
> intends to make it leaner.
> cc: [~elgoiri]






[jira] [Comment Edited] (HDFS-12969) DfsAdmin listOpenFiles should report files by type

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802453#comment-17802453
 ] 

Shilun Fan edited comment on HDFS-12969 at 1/4/24 7:57 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> DfsAdmin listOpenFiles should report files by type
> --
>
> Key: HDFS-12969
> URL: https://issues.apache.org/jira/browse/HDFS-12969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Manoj Govindassamy
>Assignee: Hemanth Boyina
>Priority: Major
> Attachments: HDFS-12969.001.patch, HDFS-12969.002.patch, 
> HDFS-12969.003.patch
>
>
> HDFS-11847 introduced a new option, {{-blockingDecommission}}, to the 
> existing command {{dfsadmin -listOpenFiles}}. But the reporting done by the 
> command doesn't differentiate the files based on their type (like blocking 
> decommission). In order to change the reporting style, the proto format 
> used for the base command has to be updated to carry additional fields, 
> which is better done in a new jira outside of HDFS-11847. This jira is to 
> track the end-to-end enhancements needed for the dfsadmin -listOpenFiles 
> console output.
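
One plausible shape for the typed reporting (an assumption, not the committed
design): an explicit open-file type that both the proto and the console
output could carry.

{code:java}
enum OpenFileTypeSketch {
  ALL_OPEN_FILES,          // default listing
  BLOCKING_DECOMMISSION;   // open files that block datanode decommission

  static OpenFileTypeSketch fromFlag(boolean blockingDecommission) {
    return blockingDecommission ? BLOCKING_DECOMMISSION : ALL_OPEN_FILES;
  }
}
{code}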






[jira] [Comment Edited] (HDFS-13064) Httpfs should return json instead of html when writing to a file without Content-Type

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802451#comment-17802451
 ] 

Shilun Fan edited comment on HDFS-13064 at 1/4/24 7:57 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Httpfs should return json instead of html when writing to a file without 
> Content-Type
> --
>
> Key: HDFS-13064
> URL: https://issues.apache.org/jira/browse/HDFS-13064
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.0.0
>Reporter: fang zhenyi
>Assignee: fang zhenyi
>Priority: Minor
> Attachments: HDFS-13064.001.patch, HDFS-13064.002.patch
>
>
> When I create an HDFS file, I get the following response.
>  
> {code:java}
> zdh102:~ # curl -i -X PUT 
> "http://10.43.183.103:14000/webhdfs/v1/2.txt?op=CREATE&user.name=hdfs&data=true";
> HTTP/1.1 400 Bad Request
> Server: Apache-Coyote/1.1
> Set-Cookie: 
> hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1516901333684&s=wYqDlu/ovRxay9d6I6UmoH77KKI=";
>  Path=/; Expires= , 25- -2018 17:28:53 GMT; HttpOnly
> Content-Type: text/html;charset=utf-8
> Content-Language: en
> Content-Length: 1122
> Date: Thu, 25 Jan 2018 07:28:53 GMT
> Connection: close
> Apache Tomcat/7.0.82 - Error report
> HTTP Status 400 - Data upload requests must have content-type set to 
> 'application/octet-stream'
> type: Status report
> message: Data upload requests must have content-type set to 
> 'application/octet-stream'
> description: The request sent by the client was syntactically incorrect.
> Apache Tomcat/7.0.82
> zdh102:~ # 
> {code}
>  
>  






[jira] [Comment Edited] (HDFS-13035) Owner should be allowed to set xattr if not already set.

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802452#comment-17802452
 ] 

Shilun Fan edited comment on HDFS-13035 at 1/4/24 7:57 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Owner should be allowed to set xattr if not already set.
> 
>
> Key: HDFS-13035
> URL: https://issues.apache.org/jira/browse/HDFS-13035
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Rushabh Shah
>Priority: Major
>
> Motivation: This is needed to support encryption zones on WebhdfsFileSystem.
> For writing into an EZ directory, the webhdfs client will encrypt data on 
> the client side and will always write into the /.reserved/raw/ directory, 
> so that the datanode will not encrypt it, since it is writing to the 
> /.reserved/raw directory.
> But then we have to somehow set the crypto-related x-attrs on the file.
> Currently only the superuser is allowed to set x-attrs.
> So I am proposing that the file owner (and the superuser) should be allowed 
> to set crypto-related x-attrs if they are not already set.
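
A minimal sketch of the client-side step this would enable, assuming the
crypto xattr name below (the exact attribute name is an assumption here).
XAttrSetFlag.CREATE makes the call fail if the xattr already exists, matching
the "if not already set" condition.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.XAttrSetFlag;

public class RawXAttrSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/.reserved/raw/zone/file");
    byte[] encryptionInfo = new byte[0]; // placeholder for the real feinfo bytes
    // With XAttrSetFlag.CREATE this throws if the xattr already exists, so an
    // owner could set it once but never overwrite an existing value.
    fs.setXAttr(file, "raw.hdfs.crypto.encryption.info", encryptionInfo,
        EnumSet.of(XAttrSetFlag.CREATE));
  }
}
{code}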






[jira] [Comment Edited] (HDFS-13243) Get CorruptBlock because of calling close and sync at the same time

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802449#comment-17802449
 ] 

Shilun Fan edited comment on HDFS-13243 at 1/4/24 7:57 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Get CorruptBlock because of calling close and sync at the same time
> ---
>
> Key: HDFS-13243
> URL: https://issues.apache.org/jira/browse/HDFS-13243
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2, 3.2.0
>Reporter: Zephyr Guo
>Assignee: daimin
>Priority: Critical
> Attachments: HDFS-13243-v1.patch, HDFS-13243-v2.patch, 
> HDFS-13243-v3.patch, HDFS-13243-v4.patch, HDFS-13243-v5.patch, 
> HDFS-13243-v6.patch
>
>
> An HDFS file might get broken because of corrupt block(s) produced by 
> calling close and sync at the same time.
> When the close call is not successful, the UC block's status changes to 
> COMMITTED, and if a sync request is then popped from the queue and 
> processed, the sync operation changes the last block's length.
> After that, the DataNode reports all received blocks to the NameNode, which 
> checks the block length of all COMMITTED blocks. But the block length 
> already differs between what is recorded in NameNode memory and what the 
> DataNode reported; consequently, the last block is marked as corrupted 
> because of the inconsistent length.
>  
> {panel:title=Log in my hdfs}
> 2018-03-05 04:05:39,261 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocate blk_1085498930_11758129\{UCState=UNDER_CONSTRUCTION, 
> truncateBlock=null, primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
>  
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW],
>  
> ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
>  for 
> /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> 2018-03-05 04:05:39,760 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> fsync: 
> /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
>  for DFSClient_NONMAPREDUCE_1077513762_1
> 2018-03-05 04:05:39,761 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
>  
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW],
>  
> ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
>  is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in 
> file 
> /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.0.0.220:50010 is added to 
> blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
>  
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW],
>  
> ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
>  size 2054413
> 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1085498930 added as corrupt on 
> 10.0.0.219:50010 by 
> hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com/10.0.0.219 because block is 
> COMMITTED and reported length 2054413 does not match length in block map 
> 141232
> 2018-03-05 04:05:39,762 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1085498930 added as corrupt on 
> 10.0.0.218:50010 by 
> hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is 
> COMMITTED and reported length 2054413 does not match length in block map 
> 141232
> 2018-03-05 04:05:40,162 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
>  
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc1

[jira] [Comment Edited] (HDFS-13287) TestINodeFile#testGetBlockType results in NPE when run alone

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802447#comment-17802447
 ] 

Shilun Fan edited comment on HDFS-13287 at 1/4/24 7:56 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> TestINodeFile#testGetBlockType results in NPE when run alone
> 
>
> Key: HDFS-13287
> URL: https://issues.apache.org/jira/browse/HDFS-13287
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Minor
> Attachments: HDFS-13287.01.patch
>
>
> When TestINodeFile#testGetBlockType is run by itself, it results in the 
> following error:
> {code:java}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.218 
> s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestINodeFile
> [ERROR] 
> testGetBlockType(org.apache.hadoop.hdfs.server.namenode.TestINodeFile)  Time 
> elapsed: 0.023 s  <<< ERROR!
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager.getPolicyInfoByID(ErasureCodingPolicyManager.java:220)
> at 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager.getByID(ErasureCodingPolicyManager.java:208)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(INodeFile.java:207)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.<init>(INodeFile.java:266)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestINodeFile.createStripedINodeFile(TestINodeFile.java:112)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestINodeFile.testGetBlockType(TestINodeFile.java:299)
> 
> {code}






[jira] [Comment Edited] (HDFS-13656) Logging more info when client completes file error

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802445#comment-17802445
 ] 

Shilun Fan edited comment on HDFS-13656 at 1/4/24 7:56 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker.

> Logging more info when client completes file error
> --
>
> Key: HDFS-13656
> URL: https://issues.apache.org/jira/browse/HDFS-13656
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-13656.001.patch, error-hdfs.png
>
>
> We found the following error when the dfs client completes a file.
> !error-hdfs.png!
> Currently the error log is too simple and does not provide enough info for 
> debugging. This error fails the write operation and ultimately caused our 
> critical tasks to fail. It would be good to print the retry count and the 
> file path in {{DFSOutputStream#completeFile}}. In addition, the log level 
> should be raised from INFO to WARN so that it won't be easily ignored by 
> users.
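
A sketch of the richer log line the report asks for (message text and
variable names are illustrative, not the committed patch):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class CompleteFileLogSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(CompleteFileLogSketch.class);

  static void logRetry(String src, long fileId, int retriesLeft) {
    // WARN instead of INFO, with the path and remaining retries included.
    LOG.warn("Unable to close file {} (fileId={}); retrying, {} retries left.",
        src, fileId, retriesLeft);
  }
}
{code}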






[jira] [Comment Edited] (HDFS-13689) NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802444#comment-17802444
 ] 

Shilun Fan edited comment on HDFS-13689 at 1/4/24 7:56 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode
> ---
>
> Key: HDFS-13689
> URL: https://issues.apache.org/jira/browse/HDFS-13689
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Erik Krogen
>Priority: Major
>
> {{NameNodeRpcServer#getEditsFromTxid}} currently decides which transactions 
> are able to be served, i.e. which transactions are durable, using the 
> following logic:
> {code}
> long syncTxid = log.getSyncTxId();
> // If we haven't synced anything yet, we can only read finalized
> // segments since we can't reliably determine which txns in in-progress
> // segments have actually been committed (e.g. written to a quorum of 
> JNs).
> // If we have synced txns, we can definitely read up to syncTxid since
> // syncTxid is only updated after a transaction is committed to all
> // journals. (In-progress segments written by old writers are already
> // discarded for us, so if we read any in-progress segments they are
> // guaranteed to have been written by this NameNode.)
> boolean readInProgress = syncTxid > 0;
> {code}
> This assumes that the NameNode serving this request is the current 
> writer/active NameNode, which may not be true in the ObserverNode situation. 
> Since {{selectInputStreams}} now has a {{onlyDurableTxns}} flag, which, if 
> enabled, will only return durable/committed transactions, we can instead 
> leverage this to provide the same functionality. We should utilize this to 
> avoid consistency issues when serving this request from the ObserverNode.
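
A hedged sketch of the suggested direction: let the journal layer decide
durability via the onlyDurableTxns flag instead of the syncTxid heuristic.
JournalView and its methods are illustrative stand-ins, not HDFS classes.

{code:java}
import java.util.List;

interface JournalView {
  /** Returns txns at or after fromTxId; with onlyDurableTxns, only committed ones. */
  List<Long> selectTxns(long fromTxId, boolean inProgressOk, boolean onlyDurableTxns);
}

final class GetEditsSketch {
  static List<Long> getEditsFromTxid(JournalView log, long txid) {
    // No "syncTxid > 0" check: correctness no longer assumes this node is the
    // active writer, so an ObserverNode can serve the call safely.
    return log.selectTxns(txid, true, true);
  }
}
{code}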






[jira] [Updated] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-3570:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
> space
> 
>
> Key: HDFS-3570
> URL: https://issues.apache.org/jira/browse/HDFS-3570
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HDFS-3570.003.patch, HDFS-3570.2.patch, 
> HDFS-3570.aash.1.patch
>
>
> Report from a user here: 
> https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ,
>  post archived at http://pastebin.com/eVFkk0A0
> This user had a specific DN with large non-DFS usage among its 
> dfs.data.dirs, and very little DFS usage (which is computed against total 
> possible capacity). 
> The Balancer apparently only looks at DFS usage and fails to consider that 
> non-DFS usage may also be high on a DN/cluster. Hence it thinks that if a 
> DN reports only 8% DFS usage, it has a lot of free space to write more 
> blocks, which isn't true, as shown by this user's case. It went on 
> scheduling writes to the DN to balance it out, but the DN simply couldn't 
> accept any more blocks given the state of its disks.
> I think it would be better if we _computed_ the actual utilization as 
> {{(capacity - actual remaining space)/(capacity)}}, as opposed to the 
> current {{(dfs used)/(capacity)}}. Thoughts?
> This isn't very critical, however, because it is rare to see DN space being 
> used for non-DN data, but it does expose a valid bug.
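
A worked comparison of the two utilization formulas from the report (the
capacity numbers are made up for illustration): with heavy non-DFS usage,
dfsUsed/capacity looks nearly empty while the disk is actually almost full.

{code:java}
public class BalancerUtilizationSketch {
  public static void main(String[] args) {
    double capacity = 10_000;  // GB, total disk capacity
    double dfsUsed = 800;      // GB used by HDFS blocks (8%)
    double nonDfsUsed = 8_700; // GB used by non-DFS data
    double remaining = capacity - dfsUsed - nonDfsUsed; // 500 GB actually free

    double current = dfsUsed / capacity;                 // 0.08 -> "mostly empty"
    double proposed = (capacity - remaining) / capacity; // 0.95 -> "nearly full"

    System.out.printf("current=%.2f proposed=%.2f%n", current, proposed);
  }
}
{code}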






[jira] [Commented] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802564#comment-17802564
 ] 

Shilun Fan commented on HDFS-3570:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
> space
> 
>
> Key: HDFS-3570
> URL: https://issues.apache.org/jira/browse/HDFS-3570
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HDFS-3570.003.patch, HDFS-3570.2.patch, 
> HDFS-3570.aash.1.patch
>
>
> Report from a user here: 
> https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ,
>  post archived at http://pastebin.com/eVFkk0A0
> This user had a specific DN with large non-DFS usage among its 
> dfs.data.dirs, and very little DFS usage (which is computed against total 
> possible capacity). 
> The Balancer apparently only looks at DFS usage and fails to consider that 
> non-DFS usage may also be high on a DN/cluster. Hence it thinks that if a 
> DN reports only 8% DFS usage, it has a lot of free space to write more 
> blocks, which isn't true, as shown by this user's case. It went on 
> scheduling writes to the DN to balance it out, but the DN simply couldn't 
> accept any more blocks given the state of its disks.
> I think it would be better if we _computed_ the actual utilization as 
> {{(capacity - actual remaining space)/(capacity)}}, as opposed to the 
> current {{(dfs used)/(capacity)}}. Thoughts?
> This isn't very critical, however, because it is rare to see DN space being 
> used for non-DN data, but it does expose a valid bug.






[jira] [Commented] (HDFS-4319) FSShell copyToLocal creates files with the executable bit set

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802563#comment-17802563
 ] 

Shilun Fan commented on HDFS-4319:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> FSShell copyToLocal creates files with the executable bit set
> -
>
> Key: HDFS-4319
> URL: https://issues.apache.org/jira/browse/HDFS-4319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha
>Reporter: Colin McCabe
>Priority: Minor
>
> With the default value of {{fs.permissions.umask-mode}}, {{022}},  {{FSShell 
> copyToLocal}} creates files with the executable bit set.
> If, on the other hand, you change {{fs.permissions.umask-mode}} to something 
> like {{133}}, you encounter a different problem.  When you use 
> {{copyToLocal}} to create directories, they don't have the executable bit 
> set, meaning they do not have search permission.
> Since HDFS doesn't allow the executable bit to be set on files, it seems 
> illogical to add it in when using {{copyToLocal}}.  This is also a 
> regression, since branch 1 did not have this problem.
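
A worked example of the mode arithmetic in plain Java. Deriving the buggy
file mode from the 0777 directory base is my assumption to explain the
reported behavior, not something stated in the issue.

{code:java}
public class UmaskSketch {
  public static void main(String[] args) {
    int fileBase = 0666; // what a new local *file* should start from
    int dirBase = 0777;  // what a new local *directory* should start from

    // Default umask 022: applying it to the directory base for files would
    // give the reported bug (0755 instead of 0644, executable bit set).
    System.out.printf("file, umask 022: correct %o, buggy %o%n",
        fileBase & ~022, dirBase & ~022);

    // Umask 133: directories lose their execute (search) bits entirely.
    System.out.printf("dir,  umask 133: %o (no search permission)%n",
        dirBase & ~0133);
  }
}
{code}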






[jira] [Updated] (HDFS-4389) Non-HA DFSClients do not attempt reconnects

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-4389:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Non-HA DFSClients do not attempt reconnects
> ---
>
> Key: HDFS-4389
> URL: https://issues.apache.org/jira/browse/HDFS-4389
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs-client
>Affects Versions: 2.0.0-alpha, 3.0.0-alpha1
>Reporter: Daryn Sharp
>Priority: Major
>
> The HA retry policy implementation appears to have broken non-HA 
> {{DFSClient}} connect retries.  The ipc 
> {{Client.Connection#handleConnectionFailure}} used to perform 45 connection 
> attempts, but now it consults a retry policy.  For non-HA proxies, the policy 
> does not handle {{ConnectException}}.
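
A hedged sketch of one way to restore the old behavior: a client-side policy
that retries connection failures up to the 45 attempts the report mentions.
RetryPolicies is the real Hadoop class; wiring this into non-HA DFSClient
proxy creation is the assumption.

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

final class NonHaRetrySketch {
  static RetryPolicy connectRetryPolicy() {
    // Retry the connection (e.g. on ConnectException) up to 45 times,
    // sleeping 1 second between attempts.
    return RetryPolicies.retryUpToMaximumCountWithFixedSleep(45, 1, TimeUnit.SECONDS);
  }
}
{code}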






[jira] [Updated] (HDFS-4319) FSShell copyToLocal creates files with the executable bit set

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-4319:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> FSShell copyToLocal creates files with the executable bit set
> -
>
> Key: HDFS-4319
> URL: https://issues.apache.org/jira/browse/HDFS-4319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha
>Reporter: Colin McCabe
>Priority: Minor
>
> With the default value of {{fs.permissions.umask-mode}}, {{022}},  {{FSShell 
> copyToLocal}} creates files with the executable bit set.
> If, on the other hand, you change {{fs.permissions.umask-mode}} to something 
> like {{133}}, you encounter a different problem.  When you use 
> {{copyToLocal}} to create directories, they don't have the executable bit 
> set, meaning they do not have search permission.
> Since HDFS doesn't allow the executable bit to be set on files, it seems 
> illogical to add it in when using {{copyToLocal}}.  This is also a 
> regression, since branch 1 did not have this problem.






[jira] [Updated] (HDFS-7368) Support HDFS specific 'shell' on command 'hdfs dfs' invocation

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-7368:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Support HDFS specific 'shell' on command 'hdfs dfs' invocation
> --
>
> Key: HDFS-7368
> URL: https://issues.apache.org/jira/browse/HDFS-7368
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
> Attachments: HDFS-7368-001.patch
>
>
> * *hadoop fs* is the generic implementation for all filesystem 
> implementations, but some of the operations are supported only in some 
> filesystems. Ex: snapshot commands, acl commands, xattr commands.
> * *hdfs dfs* is recommended in all hdfs-related docs in current releases.
> In the current code, both *hdfs dfs* and *hadoop fs* point to the hadoop 
> common implementation of FSShell.
> It would be better to have an HDFS-specific extension of FSShell which 
> includes HDFS-only commands in the future.
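
A hypothetical sketch of the proposed direction: an HDFS-specific shell that
extends the common FsShell entry point. The class name and wiring are
assumptions; FsShell and ToolRunner are real Hadoop classes.

{code:java}
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class HdfsShellSketch extends FsShell {
  // HDFS-only commands (snapshot, acl, xattr helpers, ...) would be
  // registered here on top of the generic FsShell command set.

  public static void main(String[] argv) throws Exception {
    System.exit(ToolRunner.run(new HdfsShellSketch(), argv));
  }
}
{code}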






[jira] [Updated] (HDFS-7550) Minor followon cleanups from HDFS-7543

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-7550:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Minor followon cleanups from HDFS-7543
> --
>
> Key: HDFS-7550
> URL: https://issues.apache.org/jira/browse/HDFS-7550
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7550.001.patch
>
>
> The commit of HDFS-7543 crossed paths with these comments:
> FSDirMkdirOp.java
> in #mkdirs, you removed the final String srcArg = src. This should be left 
> in. Many IDEs will whine about making assignments to formal args and that's 
> why it was put in in the first place.
> FSDirRenameOp.java
> #renameToInt, dstIIP (and resultingStat) could benefit from final's.
> FSDirXAttrOp.java
> I'm not sure why you've moved the call to getINodesInPath4Write and 
> checkXAttrChangeAccess inside the writeLock.
> FSDirStatAndListing.java
> The javadoc for the @param src needs to be changed to reflect that it's an 
> INodesInPath, not a String. Nit: it might be better to rename the 
> INodesInPath arg from src to iip.
> #getFileInfo4DotSnapshot is now unused since you in-lined it into 
> #getFileInfo.






[jira] [Updated] (HDFS-7408) Add a counter in the log that shows the number of block reports processed

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-7408:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Add a counter in the log that shows the number of block reports processed
> -
>
> Key: HDFS-7408
> URL: https://issues.apache.org/jira/browse/HDFS-7408
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Suresh Srinivas
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-7408.001.patch
>
>
> It would be great if the info log for block report processing printed how 
> many block reports have been processed so far. This can be useful for 
> debugging when the namenode is unresponsive, especially during startup, to 
> understand whether datanodes are sending block reports multiple times.
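
An illustrative sketch (names are assumptions): a monotonic counter included
in the per-report log line, so startup logs reveal duplicate reports.

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class BlockReportCounterSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(BlockReportCounterSketch.class);
  private static final AtomicLong PROCESSED = new AtomicLong();

  static void onBlockReportProcessed(String datanode, long numBlocks) {
    long total = PROCESSED.incrementAndGet();
    LOG.info("Processed block report from {} with {} blocks ({} reports total)",
        datanode, numBlocks, total);
  }
}
{code}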






[jira] [Commented] (HDFS-7408) Add a counter in the log that shows the number of block reports processed

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802559#comment-17802559
 ] 

Shilun Fan commented on HDFS-7408:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Add a counter in the log that shows the number of block reports processed
> -
>
> Key: HDFS-7408
> URL: https://issues.apache.org/jira/browse/HDFS-7408
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Suresh Srinivas
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-7408.001.patch
>
>
> It would be great if the info log for block report processing printed how 
> many block reports have been processed so far. This can be useful for 
> debugging when the namenode is unresponsive, especially during startup, to 
> understand whether datanodes are sending block reports multiple times.






[jira] [Commented] (HDFS-8065) Erasure coding: Support truncate at striped group boundary

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802554#comment-17802554
 ] 

Shilun Fan commented on HDFS-8065:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Erasure coding: Support truncate at striped group boundary
> --
>
> Key: HDFS-8065
> URL: https://issues.apache.org/jira/browse/HDFS-8065
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Rakesh Radhakrishnan
>Priority: Major
> Attachments: HDFS-8065-00.patch, HDFS-8065-01.patch
>
>
> We can support truncate at the striped group boundary first.






[jira] [Commented] (HDFS-8115) Make PermissionStatusFormat public

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802552#comment-17802552
 ] 

Shilun Fan commented on HDFS-8115:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Make PermissionStatusFormat public
> --
>
> Key: HDFS-8115
> URL: https://issues.apache.org/jira/browse/HDFS-8115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: HDFS-8115.1.patch
>
>
> Implementations of {{INodeAttributeProvider}} are required to provide an 
> implementation of the {{getPermissionLong()}} method. Unfortunately, the 
> long permission format is an encoding of the user, group and mode, with 
> each field converted to an int using {{SerialNumberManager}}, which is 
> package-private.
> Thus it would be nice to make the {{PermissionStatusFormat}} enum public 
> (and also make the {{toLong()}} static method public) so that user-supplied 
> implementations of {{INodeAttributeProvider}} may use it.
> This would also make it more consistent with {{AclStatusFormat}}, which I 
> guess has been made public for the same reason.
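
A hedged sketch of the consumer side: a custom provider has to produce the
packed long permission. With PermissionStatusFormat#toLong() public (the
change this issue requests), the commented line would replace the
placeholder. PermissionStatus and FsPermission are real public classes.

{code:java}
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.fs.permission.PermissionStatus;

final class ProviderPermissionSketch {
  static long encode(PermissionStatus status) {
    // Desired, once public:
    //   return PermissionStatusFormat.toLong(status);
    // Today this packing is impossible outside the package because the
    // user/group serial numbers come from the package-private
    // SerialNumberManager; the mode bits below are the only public part.
    return status.getPermission().toShort();
  }

  public static void main(String[] args) {
    PermissionStatus status =
        new PermissionStatus("hdfs", "hadoop", new FsPermission((short) 0644));
    System.out.println(encode(status));
  }
}
{code}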






[jira] [Updated] (HDFS-8065) Erasure coding: Support truncate at striped group boundary

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-8065:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Erasure coding: Support truncate at striped group boundary
> --
>
> Key: HDFS-8065
> URL: https://issues.apache.org/jira/browse/HDFS-8065
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Rakesh Radhakrishnan
>Priority: Major
> Attachments: HDFS-8065-00.patch, HDFS-8065-01.patch
>
>
> We can support truncate at the striped group boundary first.






[jira] [Updated] (HDFS-7702) Move metadata across namenode - Effort to a real distributed namenode

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-7702:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Move metadata across namenode - Effort to a real distributed namenode
> -
>
> Key: HDFS-7702
> URL: https://issues.apache.org/jira/browse/HDFS-7702
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ray Zhang
>Assignee: Ray Zhang
>Priority: Major
> Attachments: Namespace Moving Tool Design Proposal.pdf
>
>
> Implement a tool that can show the in-memory namespace tree structure with 
> weights (sizes), and an API that can move metadata across different 
> namenodes. The purpose is to move data efficiently and quickly, without 
> moving blocks on the datanodes.






[jira] [Updated] (HDFS-7902) Expose truncate API for libwebhdfs

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-7902:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Expose truncate API for libwebhdfs
> --
>
> Key: HDFS-7902
> URL: https://issues.apache.org/jira/browse/HDFS-7902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: native, webhdfs
>Affects Versions: 2.7.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Major
> Attachments: HDFS-7902.001.patch, HDFS-7902.002.patch
>
>
> As Colin suggested in HDFS-7838, we will add truncate support for libwebhdfs.






[jira] [Updated] (HDFS-8115) Make PermissionStatusFormat public

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-8115:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Make PermissionStatusFormat public
> --
>
> Key: HDFS-8115
> URL: https://issues.apache.org/jira/browse/HDFS-8115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: HDFS-8115.1.patch
>
>
> Implementations of {{INodeAttributeProvider}} are required to provide an 
> implementation of the {{getPermissionLong()}} method. Unfortunately, the 
> long permission format is an encoding of the user, group and mode, with 
> each field converted to an int using {{SerialNumberManager}}, which is 
> package-private.
> Thus it would be nice to make the {{PermissionStatusFormat}} enum public 
> (and also make the {{toLong()}} static method public) so that user-supplied 
> implementations of {{INodeAttributeProvider}} may use it.
> This would also make it more consistent with {{AclStatusFormat}}, which I 
> guess has been made public for the same reason.






[jira] [Commented] (HDFS-8538) Change the default volume choosing policy to AvailableSpaceVolumeChoosingPolicy

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802550#comment-17802550
 ] 

Shilun Fan commented on HDFS-8538:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Change the default volume choosing policy to 
> AvailableSpaceVolumeChoosingPolicy
> ---
>
> Key: HDFS-8538
> URL: https://issues.apache.org/jira/browse/HDFS-8538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: hdfs-8538.001.patch
>
>
> Datanodes with different-sized disks almost always want the available-space 
> policy. Users with homogeneous disks are unaffected.
> Since this code has baked for a while, let's change it to be the default.
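
A minimal sketch of opting in today via configuration; the issue proposes
making this value the default. The key and class name are the standard
Hadoop ones as far as I know; verify against your release.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class VolumePolicySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.datanode.fsdataset.volume.choosing.policy",
        "org.apache.hadoop.hdfs.server.datanode.fsdataset."
            + "AvailableSpaceVolumeChoosingPolicy");
    System.out.println(conf.get("dfs.datanode.fsdataset.volume.choosing.policy"));
  }
}
{code}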






[jira] [Updated] (HDFS-8538) Change the default volume choosing policy to AvailableSpaceVolumeChoosingPolicy

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-8538:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Change the default volume choosing policy to 
> AvailableSpaceVolumeChoosingPolicy
> ---
>
> Key: HDFS-8538
> URL: https://issues.apache.org/jira/browse/HDFS-8538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: hdfs-8538.001.patch
>
>
> Datanodes with different-sized disks almost always want the available-space 
> policy. Users with homogeneous disks are unaffected.
> Since this code has baked for a while, let's change it to be the default.






[jira] [Commented] (HDFS-8430) Erasure coding: compute file checksum for striped files (stripe by stripe)

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802551#comment-17802551
 ] 

Shilun Fan commented on HDFS-8430:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Erasure coding: compute file checksum for striped files (stripe by stripe)
> --
>
> Key: HDFS-8430
> URL: https://issues.apache.org/jira/browse/HDFS-8430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Walter Su
>Assignee: Kai Zheng
>Priority: Major
> Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduced a distributed file checksum algorithm. It's designed 
> for replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so it can work for 
> striped block groups.
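
For context, the public entry point whose behavior this issue extends.
FileSystem#getFileChecksum is the real API; the path is illustrative.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StripedChecksumSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // For a replicated file this already works; making it work stripe by
    // stripe for erasure-coded files is the point of HDFS-8430.
    FileChecksum checksum = fs.getFileChecksum(new Path("/ec/striped-file"));
    System.out.println(checksum);
  }
}
{code}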






[jira] [Updated] (HDFS-8430) Erasure coding: compute file checksum for striped files (stripe by stripe)

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-8430:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Erasure coding: compute file checksum for striped files (stripe by stripe)
> --
>
> Key: HDFS-8430
> URL: https://issues.apache.org/jira/browse/HDFS-8430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Walter Su
>Assignee: Kai Zheng
>Priority: Major
> Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduced a distributed file checksum algorithm. It's designed 
> for replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so it can work for 
> striped block groups.






[jira] [Commented] (HDFS-9821) HDFS configuration should accept friendly time units

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802548#comment-17802548
 ] 

Shilun Fan commented on HDFS-9821:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> HDFS configuration should accept friendly time units
> 
>
> Key: HDFS-9821
> URL: https://issues.apache.org/jira/browse/HDFS-9821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Xiaobing Zhou
>Priority: Major
>
> HDFS configuration keys that define time intervals use units inconsistently 
> (hours, seconds, milliseconds).
> Not all keys have the unit as part of their name, and related keys may use 
> different units, e.g. {{dfs.blockreport.intervalMsec}} accepts msec while 
> {{dfs.blockreport.initialDelay}} accepts seconds. Milliseconds is rarely a 
> useful time unit, which makes these values hard to parse when reading 
> config files.
> We can either
> # Let existing keys accept friendly units, e.g. 100ms, 60s, 5m, 1d, 6w, 
> etc. This can be done compatibly since there will be no conflict with 
> existing valid configuration. If no suffix is specified, just default to 
> the current time unit.
> # Just deprecate the existing keys and define new ones that accept friendly 
> units.
> We would continue to use fine-grained time units (usually ms) internally in 
> code and also accept the "ms" option for tests.
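
A sketch of the friendly-units pattern using Configuration#getTimeDuration,
the Hadoop Common API that implements exactly this suffix-or-default-unit
behavior (the key name below is illustrative):

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeUnitConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("example.report.interval", "5m"); // friendly suffix
    // Parsed as 5 minutes = 300000 ms; a bare number would be taken as ms.
    long intervalMs =
        conf.getTimeDuration("example.report.interval", 60_000L, TimeUnit.MILLISECONDS);
    System.out.println(intervalMs); // 300000
  }
}
{code}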






[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-8893:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> DNs with failed volumes stop serving during rolling upgrade
> ---
>
> Key: HDFS-8893
> URL: https://issues.apache.org/jira/browse/HDFS-8893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh Shah
>Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker 
> to each of their volumes. If one of the volumes is bad, this will fail. When 
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately we had one failed volume on all 3 of the datanodes which were 
> holding the replicas.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, 
> the DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN 
> datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:947)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}
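> A sketch of one possible hardening, writing the marker per volume and 
> tolerating failures rather than aborting the handshake (editorial; 
> {{writeMarker}} is a hypothetical helper, and this is not the committed fix):
> {code:java}
> // Editorial sketch: one read-only disk should not block the rolling-upgrade
> // handshake; skip the bad volume and keep going.
> void setRollingUpgradeMarkers(List<StorageDirectory> dirs) {
>   for (StorageDirectory sd : dirs) {
>     try {
>       writeMarker(sd);  // hypothetical per-volume marker write
>     } catch (IOException e) {
>       LOG.warn("Skipping rolling upgrade marker on failed volume " + sd, e);
>     }
>   }
> }
> {code}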



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9821) HDFS configuration should accept friendly time units

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-9821:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> HDFS configuration should accept friendly time units
> 
>
> Key: HDFS-9821
> URL: https://issues.apache.org/jira/browse/HDFS-9821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Xiaobing Zhou
>Priority: Major
>
> HDFS configuration keys that define time intervals use units inconsistently 
> (hours, seconds, milliseconds).
> Not all keys have the unit as part of their name, and related keys may use 
> different units, e.g. {{dfs.blockreport.intervalMsec}} accepts msec while 
> {{dfs.blockreport.initialDelay}} accepts seconds. Milliseconds is rarely a 
> useful time unit, which makes these values hard to parse when reading config 
> files.
> We can either
> # Let existing keys use friendly units, e.g. 100ms, 60s, 5m, 1d, 6w etc. This 
> can be done compatibly since there will be no conflict with existing valid 
> configuration. If no suffix is specified, just default to the current time 
> unit.
> # Just deprecate the existing keys and define new ones that accept friendly 
> units.
> We would continue to use fine-grained time units (usually ms) internally in 
> code and also accept the "ms" option for tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9940) Balancer should not use property dfs.datanode.balance.max.concurrent.moves

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802547#comment-17802547
 ] 

Shilun Fan commented on HDFS-9940:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Balancer should not use property dfs.datanode.balance.max.concurrent.moves
> --
>
> Key: HDFS-9940
> URL: https://issues.apache.org/jira/browse/HDFS-9940
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
>
> It is very confusing for both the Balancer and the Datanode to use the same 
> property {{dfs.datanode.balance.max.concurrent.moves}}. It is especially so 
> for the Balancer because the property has "datanode" in its name. Many 
> customers forget to set the property for the Balancer.
> Change the Balancer to use a new property, 
> {{dfs.balancer.max.concurrent.moves}}.
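> A minimal sketch of reading the new key while keeping the old one as a 
> fallback (editorial; treating the legacy key as the fallback default is an 
> assumption, not necessarily what the patch does):
> {code:java}
> // Editorial sketch: prefer the balancer-named key; fall back to the legacy
> // datanode-named key, then to a hard default (assumed compatibility rule).
> int getMaxConcurrentMoves(Configuration conf) {
>   int legacy = conf.getInt("dfs.datanode.balance.max.concurrent.moves", 5);
>   return conf.getInt("dfs.balancer.max.concurrent.moves", legacy);
> }
> {code}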



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9940) Balancer should not use property dfs.datanode.balance.max.concurrent.moves

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-9940:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Balancer should not use property dfs.datanode.balance.max.concurrent.moves
> --
>
> Key: HDFS-9940
> URL: https://issues.apache.org/jira/browse/HDFS-9940
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
>
> It is very confusing for both the Balancer and the Datanode to use the same 
> property {{dfs.datanode.balance.max.concurrent.moves}}. It is especially so 
> for the Balancer because the property has "datanode" in its name. Many 
> customers forget to set the property for the Balancer.
> Change the Balancer to use a new property, 
> {{dfs.balancer.max.concurrent.moves}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10274) Move NameSystem#isInStartupSafeMode() to BlockManagerSafeMode

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802544#comment-17802544
 ] 

Shilun Fan commented on HDFS-10274:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Move NameSystem#isInStartupSafeMode() to BlockManagerSafeMode
> -
>
> Key: HDFS-10274
> URL: https://issues.apache.org/jira/browse/HDFS-10274
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
> Attachments: HDFS-10274-01.patch
>
>
> To reduce the number of methods in the Namesystem interface and for a 
> cleaner refactor, it's better to move {{isInStartupSafeMode()}} to 
> BlockManager and BlockManagerSafeMode, as most of the callers are in 
> BlockManager. That removes one more piece of interface overhead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10237) Support specifying checksum type in WebHDFS/HTTPFS writers

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10237:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Support specifying checksum type in WebHDFS/HTTPFS writers
> --
>
> Key: HDFS-10237
> URL: https://issues.apache.org/jira/browse/HDFS-10237
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 2.8.0
>Reporter: Harsh J
>Priority: Minor
> Attachments: HDFS-10237.000.patch, HDFS-10237.001.patch, 
> HDFS-10237.002.patch, HDFS-10237.002.patch
>
>
> Currently you cannot set a desired checksum type on a WebHDFS or HTTPFS 
> writer, as you can with the regular DFS writer (done via HADOOP-8240).
> This JIRA covers the changes necessary to bring the same ability to WebHDFS 
> and HTTPFS.
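> For reference, a sketch of what the regular DFS writer already allows via 
> HADOOP-8240 (editorial; the values shown are arbitrary examples):
> {code:java}
> // Editorial sketch: pick CRC32C with 512 bytes per checksum on a DFS write;
> // the JIRA asks for an equivalent knob over WebHDFS/HTTPFS.
> Options.ChecksumOpt opt =
>     new Options.ChecksumOpt(DataChecksum.Type.CRC32C, 512);
> FSDataOutputStream out = fs.create(path,
>     FsPermission.getFileDefault(), EnumSet.of(CreateFlag.CREATE),
>     4096, (short) 3, 128L * 1024 * 1024, null, opt);
> {code}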



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10274) Move NameSystem#isInStartupSafeMode() to BlockManagerSafeMode

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10274:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Move NameSystem#isInStartupSafeMode() to BlockManagerSafeMode
> -
>
> Key: HDFS-10274
> URL: https://issues.apache.org/jira/browse/HDFS-10274
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
> Attachments: HDFS-10274-01.patch
>
>
> To reduce the number of methods in the Namesystem interface and for a 
> cleaner refactor, it's better to move {{isInStartupSafeMode()}} to 
> BlockManager and BlockManagerSafeMode, as most of the callers are in 
> BlockManager. That removes one more piece of interface overhead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10237) Support specifying checksum type in WebHDFS/HTTPFS writers

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802545#comment-17802545
 ] 

Shilun Fan commented on HDFS-10237:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Support specifying checksum type in WebHDFS/HTTPFS writers
> --
>
> Key: HDFS-10237
> URL: https://issues.apache.org/jira/browse/HDFS-10237
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 2.8.0
>Reporter: Harsh J
>Priority: Minor
> Attachments: HDFS-10237.000.patch, HDFS-10237.001.patch, 
> HDFS-10237.002.patch, HDFS-10237.002.patch
>
>
> Currently you cannot set a desired checksum type on a WebHDFS or HTTPFS 
> writer, as you can with the regular DFS writer (done via HADOOP-8240).
> This JIRA covers the changes necessary to bring the same ability to WebHDFS 
> and HTTPFS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10364) Log current node in reversexml tool when parse failed

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802543#comment-17802543
 ] 

Shilun Fan commented on HDFS-10364:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Log current node in reversexml tool when parse failed
> -
>
> Key: HDFS-10364
> URL: https://issues.apache.org/jira/browse/HDFS-10364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Trivial
> Attachments: HDFS-10364.01.patch
>
>
> Sometimes we want to modify the XML before converting it back. If an error 
> happens, it's hard to find out where. Logging the node being parsed at the 
> point of failure would be helpful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10364) Log current node in reversexml tool when parse failed

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10364:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Log current node in reversexml tool when parse failed
> -
>
> Key: HDFS-10364
> URL: https://issues.apache.org/jira/browse/HDFS-10364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Trivial
> Attachments: HDFS-10364.01.patch
>
>
> Sometimes we want to modify the XML before converting it back. If an error 
> happens, it's hard to find out where. Logging the node being parsed at the 
> point of failure would be helpful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10429) DataStreamer interrupted warning always appears when using CLI upload file

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10429:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> DataStreamer interrupted warning  always appears when using CLI upload file
> ---
>
> Key: HDFS-10429
> URL: https://issues.apache.org/jira/browse/HDFS-10429
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Zhiyuan Yang
>Priority: Minor
> Attachments: HDFS-10429.1.patch, HDFS-10429.2.patch, 
> HDFS-10429.3.patch
>
>
> Every time I use 'hdfs dfs -put' upload file, this warning is printed:
> {code:java}
> 16/05/18 20:57:56 WARN hdfs.DataStreamer: Caught exception
> java.lang.InterruptedException
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1245)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:871)
>   at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:519)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:696)
> {code}
> The reason is this: originally, DataStreamer::closeResponder always printed a 
> warning about InterruptedException; since HDFS-9812, 
> DFSOutputStream::closeImpl always forces threads to close, which causes an 
> InterruptedException.
> A simple fix is to log at debug level instead of warning level.
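> A sketch of that one-line change in {{DataStreamer#closeResponder}} 
> (editorial; the exact patch may differ):
> {code:java}
> // Editorial sketch: the interrupt is expected during close, so demote the
> // log from warn to debug instead of alarming every CLI upload.
> try {
>   response.join(joinTimeoutMs);
> } catch (InterruptedException e) {
>   LOG.debug("Thread interrupted while joining responder", e);  // was warn
> }
> {code}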



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10529) Df reports incorrect usage when appending less than block size

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10529:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Df reports incorrect usage when appending less than block size
> --
>
> Key: HDFS-10529
> URL: https://issues.apache.org/jira/browse/HDFS-10529
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Pranav Prakash
>Assignee: Pranav Prakash
>Priority: Minor
>  Labels: datanode, fs, hdfs
> Attachments: HDFS-10529.000.patch
>
>
> Steps to recreate the issue:
> 1. Create a 100MB file on an HDFS cluster with a 128MB block size and 
> replication factor 3.
> 2. Append 100MB to the file.
> 3. Df reports around 900MB even though it should only be around 600MB.
> Looking at the blocks confirms that df is incorrect, as there exist only two 
> blocks on each DN -- a 128MB block and a 72MB block.
> This issue seems to arise because BlockPoolSlice does not account for the 
> delta increase in dfsUsage when an append happens to a partially-filled 
> block, and instead naively adds the total block size. For instance, in the 
> example scenario, when the block is "filled" from 100 to 128MB, 
> addFinalizedBlock() in BlockPoolSlice adds the size of the newly created 
> block into the total instead of accounting for the difference/delta in block 
> size between old and new.  This has the effect of double-counting the old 
> partially-filled block: it is counted once when it is first created (in the 
> example scenario, when the 100MB file is created) and again when it becomes 
> part of the filled block (in the example scenario, when the 128MB block is 
> formed from the initial 100MB block). Thus the perceived size becomes 100MB + 
> 128MB + 72MB = 300MB for each DN, or 900MB across the cluster.
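> A minimal sketch of the delta accounting the description suggests 
> (editorial; the names are illustrative, not the actual BlockPoolSlice code):
> {code:java}
> // Editorial sketch: when a partially-filled block is finalized after an
> // append, charge only the growth, not the whole new block size again.
> void onFinalizeAfterAppend(long oldBlockBytes, long newBlockBytes) {
>   dfsUsage.addAndGet(newBlockBytes - oldBlockBytes);  // not +newBlockBytes
> }
> {code}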



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10529) Df reports incorrect usage when appending less than block size

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802540#comment-17802540
 ] 

Shilun Fan commented on HDFS-10529:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Df reports incorrect usage when appending less than block size
> --
>
> Key: HDFS-10529
> URL: https://issues.apache.org/jira/browse/HDFS-10529
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Pranav Prakash
>Assignee: Pranav Prakash
>Priority: Minor
>  Labels: datanode, fs, hdfs
> Attachments: HDFS-10529.000.patch
>
>
> Steps to recreate the issue:
> 1. Create a 100MB file on an HDFS cluster with a 128MB block size and 
> replication factor 3.
> 2. Append 100MB to the file.
> 3. Df reports around 900MB even though it should only be around 600MB.
> Looking at the blocks confirms that df is incorrect, as there exist only two 
> blocks on each DN -- a 128MB block and a 72MB block.
> This issue seems to arise because BlockPoolSlice does not account for the 
> delta increase in dfsUsage when an append happens to a partially-filled 
> block, and instead naively adds the total block size. For instance, in the 
> example scenario, when the block is "filled" from 100 to 128MB, 
> addFinalizedBlock() in BlockPoolSlice adds the size of the newly created 
> block into the total instead of accounting for the difference/delta in block 
> size between old and new.  This has the effect of double-counting the old 
> partially-filled block: it is counted once when it is first created (in the 
> example scenario, when the 100MB file is created) and again when it becomes 
> part of the filled block (in the example scenario, when the 128MB block is 
> formed from the initial 100MB block). Thus the perceived size becomes 100MB + 
> 128MB + 72MB = 300MB for each DN, or 900MB across the cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10743) MiniDFSCluster test runtimes can be drastically reduce

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10743:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> MiniDFSCluster test runtimes can be drastically reduce
> --
>
> Key: HDFS-10743
> URL: https://issues.apache.org/jira/browse/HDFS-10743
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: HDFS-10743.001.patch, HDFS-10743.002.patch, 
> HDFS-10743.003.patch
>
>
> {{MiniDFSCluster}} tests have excessive runtimes.  The main problem appears 
> to be the heartbeat interval.  The NN may have to wait up to 3s (the default 
> value) for all DNs to heartbeat, triggering registration, so the NN can go 
> active.  Tests that repeatedly restart the NN are severely affected.
> Example of varying heartbeat intervals for {{TestFSImageWithAcl}}:
> * 3s = ~70s -- (disgusting, which is why I investigated)
> * 1s = ~27s
> * 500ms = ~17s -- (had to hack DNConf for millisecond precision)
> That's a 4x improvement in runtime.
> 17s is still excessively long for what the test does.  Further areas to 
> explore when running tests:
> * Reduce the numerous sleep intervals in the DN's {{BPServiceActor}}.
> * Ensure heartbeats and initial BRs are sent immediately upon 
> (re)registration.
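> A minimal sketch of shrinking the heartbeat interval in a test (editorial; 
> this only covers the whole-second case the description mentions):
> {code:java}
> // Editorial sketch: drop dfs.heartbeat.interval to 1s before starting the
> // MiniDFSCluster so NN activation does not wait ~3s per (re)start.
> Configuration conf = new HdfsConfiguration();
> conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L);  // seconds
> MiniDFSCluster cluster =
>     new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
> {code}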



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10848) Move hadoop-hdfs-native-client module into hadoop-hdfs-client

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10848:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Move hadoop-hdfs-native-client module into hadoop-hdfs-client
> -
>
> Key: HDFS-10848
> URL: https://issues.apache.org/jira/browse/HDFS-10848
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Akira Ajisaka
>Assignee: Huafeng Wang
>Priority: Major
> Attachments: HDFS-10848.001.patch
>
>
> When a patch changes the hadoop-hdfs-client module, Jenkins does not pick up 
> the tests in the native code. That way we overlooked a test failure when 
> committing a patch (e.g. HDFS-10844).
> [~aw] said in HDFS-10844,
> bq. Ideally, all of this native code would be hdfs-client. Then when a change 
> is made to that code, this code will also get tested.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10854) Remove createStripedFile and addBlockToFile by creating real EC files

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802536#comment-17802536
 ] 

Shilun Fan commented on HDFS-10854:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Remove createStripedFile and addBlockToFile by creating real EC files
> -
>
> Key: HDFS-10854
> URL: https://issues.apache.org/jira/browse/HDFS-10854
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding, test
>Affects Versions: 3.0.0-alpha2
>Reporter: Zhe Zhang
>Assignee: Sammi Chen
>Priority: Major
>
> {{DFSTestUtil#createStripedFile}} and {{addBlockToFile}} were developed 
> before we completed the EC client. They were used to test the {{NameNode}} 
> EC logic while the client was unable to really create/read/write EC files.
> They are causing confusion in other issues about the {{NameNode}}. For 
> example, in one of the patches under {{HDFS-10301}}, 
> {{testProcessOverReplicatedAndMissingStripedBlock}} fails because in the test 
> we fake a block report from a DN with a randomly generated storage ID. The 
> DN itself is never aware of that storage. This is not possible in a real 
> production environment.
> {code}
> DatanodeStorage storage =
>     new DatanodeStorage(UUID.randomUUID().toString());
> StorageReceivedDeletedBlocks[] reports = DFSTestUtil
>     .makeReportForReceivedBlock(block,
>         ReceivedDeletedBlockInfo.BlockStatus.RECEIVING_BLOCK, storage);
> for (StorageReceivedDeletedBlocks report : reports) {
>   ns.processIncrementalBlockReport(dn.getDatanodeId(), report);
> }
> {code}
> Now that we have a fully functional EC client, we should remove the old 
> testing logic and use logic similar to the non-EC tests (creating real files 
> and emulating blocks that are missing / corrupt).
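> A minimal sketch of creating a real striped file in a test instead of faking 
> NN metadata (editorial; the policy name is an assumption and depends on the 
> cluster's enabled policies):
> {code:java}
> // Editorial sketch: make a real EC file with DFSTestUtil, then corrupt or
> // remove replicas the same way the non-EC tests do.
> DistributedFileSystem dfs = cluster.getFileSystem();
> Path dir = new Path("/ec");
> dfs.mkdirs(dir);
> dfs.setErasureCodingPolicy(dir, "RS-6-3-1024k");  // assumed policy name
> DFSTestUtil.createFile(dfs, new Path(dir, "f"),
>     10 * 1024 * 1024, (short) 1, 0L);
> {code}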



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10616) Improve performance of path handling

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10616:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Improve performance of path handling
> 
>
> Key: HDFS-10616
> URL: https://issues.apache.org/jira/browse/HDFS-10616
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Major
> Attachments: 2.6-2.7.1-heap.png
>
>
> Path handling in the namesystem and directory is very inefficient.  The path 
> is repeatedly resolved, decomposed into path components, recombined into a 
> full path, and parsed again, throughout the system.  This is directly 
> inefficient for general performance, and indirectly via unnecessary pressure 
> on young gen GC.
> The namesystem should only operate on paths, parsing each once into inodes, 
> and the directory should only operate on inodes.
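> A minimal sketch of the parse-once idea (editorial; the names are 
> illustrative, not the actual FSDirectory API):
> {code:java}
> // Editorial sketch: resolve the string path into inodes a single time,
> // then pass the resolved form through the rest of the operation.
> INodesInPath iip = fsd.resolvePathOnce(pc, src);  // hypothetical helper
> checkPermission(pc, iip);                // operate on inodes...
> setModTime(iip.getLastINode(), now());   // ...never re-parse the string
> {code}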



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10848) Move hadoop-hdfs-native-client module into hadoop-hdfs-client

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802537#comment-17802537
 ] 

Shilun Fan commented on HDFS-10848:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Move hadoop-hdfs-native-client module into hadoop-hdfs-client
> -
>
> Key: HDFS-10848
> URL: https://issues.apache.org/jira/browse/HDFS-10848
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Akira Ajisaka
>Assignee: Huafeng Wang
>Priority: Major
> Attachments: HDFS-10848.001.patch
>
>
> When a patch changes the hadoop-hdfs-client module, Jenkins does not pick up 
> the tests in the native code. That way we overlooked a test failure when 
> committing a patch (e.g. HDFS-10844).
> [~aw] said in HDFS-10844,
> bq. Ideally, all of this native code would be hdfs-client. Then when a change 
> is made to that code, this code will also get tested.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10854) Remove createStripedFile and addBlockToFile by creating real EC files

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10854:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Remove createStripedFile and addBlockToFile by creating real EC files
> -
>
> Key: HDFS-10854
> URL: https://issues.apache.org/jira/browse/HDFS-10854
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding, test
>Affects Versions: 3.0.0-alpha2
>Reporter: Zhe Zhang
>Assignee: Sammi Chen
>Priority: Major
>
> {{DFSTestUtil#createStripedFile}} and {{addBlockToFile}} were developed 
> before we completed the EC client. They were used to test the {{NameNode}} 
> EC logic while the client was unable to really create/read/write EC files.
> They are causing confusion in other issues about the {{NameNode}}. For 
> example, in one of the patches under {{HDFS-10301}}, 
> {{testProcessOverReplicatedAndMissingStripedBlock}} fails because in the test 
> we fake a block report from a DN with a randomly generated storage ID. The 
> DN itself is never aware of that storage. This is not possible in a real 
> production environment.
> {code}
> DatanodeStorage storage =
>     new DatanodeStorage(UUID.randomUUID().toString());
> StorageReceivedDeletedBlocks[] reports = DFSTestUtil
>     .makeReportForReceivedBlock(block,
>         ReceivedDeletedBlockInfo.BlockStatus.RECEIVING_BLOCK, storage);
> for (StorageReceivedDeletedBlocks report : reports) {
>   ns.processIncrementalBlockReport(dn.getDatanodeId(), report);
> }
> {code}
> Now that we have a fully functional EC client, we should remove the old 
> testing logic and use logic similar to the non-EC tests (creating real files 
> and emulating blocks that are missing / corrupt).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11024) Add rate metrics for block recovery work

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11024:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Add rate metrics for block recovery work
> 
>
> Key: HDFS-11024
> URL: https://issues.apache.org/jira/browse/HDFS-11024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.3
>Reporter: Andrew Wang
>Priority: Major
>
> As discussed on HDFS-6682, admins currently have very little introspection 
> into how fast recovery work is progressing on the cluster. It'd be useful to 
> have rate metrics for the pending queue on the NN, and possibly DN-side too.
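> A minimal sketch of what such a metric could look like with metrics2 
> (editorial; the class, metric name, and placement are assumptions, not the 
> actual NN metrics source):
> {code:java}
> // Editorial sketch: a metrics2 rate metric fed each time recovery work is
> // scheduled, giving admins ops/sec plus average batch size.
> @Metrics(context = "dfs")
> class RecoveryWorkMetrics {
>   @Metric("Blocks scheduled for recovery per computation round")
>   MutableRate blockRecoveryWork;
>
>   void scheduled(int numBlocks) {
>     blockRecoveryWork.add(numBlocks);
>   }
> }
> {code}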



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11020) Add more doc for HDFS transparent encryption

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11020:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Add more doc for HDFS transparent encryption
> 
>
> Key: HDFS-11020
> URL: https://issues.apache.org/jira/browse/HDFS-11020
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, encryption, fs
>Reporter: Yi Liu
>Assignee: Jennica Pounds
>Priority: Minor
>
> We need a correct version of OpenSSL, one which supports hardware 
> acceleration of AES-CTR.
> Let's add more documentation about how to configure the correct OpenSSL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10919) Provide admin/debug tool to dump out info of a given block

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802534#comment-17802534
 ] 

Shilun Fan commented on HDFS-10919:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Provide admin/debug tool to dump out info of a given block
> --
>
> Key: HDFS-10919
> URL: https://issues.apache.org/jira/browse/HDFS-10919
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Yongjun Zhang
>Assignee: Shweta
>Priority: Major
>
> We have fsck to find out the blocks associated with a file, which is nice. 
> Sometimes, when we see trouble with a specific block, we'd like to collect 
> info about that block, such as
> * what file this block belongs to,
> * where the replicas of this block are located,
> * whether the block is EC coded;
> * if a block is EC coded, whether it's a data block or a parity block,
> * if a block is EC coded, what the codec is,
> * if a block is EC coded, what the block group is,
> * for the block group, what the other blocks are.
> Creating this jira to provide such a util, as dfsadmin or a debug tool (see 
> the sketch after this list).
> Thanks.
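> A minimal sketch of a starting point using only the public FileSystem API 
> (editorial; the block-to-file reverse lookup would need NameNode-side 
> support and is not shown):
> {code:java}
> // Editorial sketch: list each block of a file with its replica hosts.
> FileStatus st = fs.getFileStatus(path);
> for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
>   System.out.println("offset=" + loc.getOffset()
>       + " len=" + loc.getLength()
>       + " hosts=" + String.join(",", loc.getHosts()));
> }
> {code}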



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10919) Provide admin/debug tool to dump out info of a given block

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-10919:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Provide admin/debug tool to dump out info of a given block
> --
>
> Key: HDFS-10919
> URL: https://issues.apache.org/jira/browse/HDFS-10919
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Yongjun Zhang
>Assignee: Shweta
>Priority: Major
>
> We have fsck to find out the blocks associated with a file, which is nice. 
> Sometimes, when we see trouble with a specific block, we'd like to collect 
> info about that block, such as
> * what file this block belongs to,
> * where the replicas of this block are located,
> * whether the block is EC coded;
> * if a block is EC coded, whether it's a data block or a parity block,
> * if a block is EC coded, what the codec is,
> * if a block is EC coded, what the block group is,
> * for the block group, what the other blocks are.
> Creating this jira to provide such a util, as dfsadmin or a debug tool.
> Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11024) Add rate metrics for block recovery work

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802532#comment-17802532
 ] 

Shilun Fan commented on HDFS-11024:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Add rate metrics for block recovery work
> 
>
> Key: HDFS-11024
> URL: https://issues.apache.org/jira/browse/HDFS-11024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.3
>Reporter: Andrew Wang
>Priority: Major
>
> As discussed on HDFS-6682, admins currently have very little introspection 
> into how fast recovery work is progressing on the cluster. It'd be useful to 
> have rate metrics for the pending queue on the NN, and possibly DN-side too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11039) Expose more configuration properties to hdfs-default.xml

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802531#comment-17802531
 ] 

Shilun Fan commented on HDFS-11039:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Expose more configuration properties to hdfs-default.xml
> 
>
> Key: HDFS-11039
> URL: https://issues.apache.org/jira/browse/HDFS-11039
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, newbie
>Reporter: Yi Liu
>Assignee: Jennica Pounds
>Priority: Minor
>
> There are some configuration properties for HDFS that have not been exposed 
> in hdfs-default.xml.
> It would be convenient for Hadoop users/admins if we added them to 
> hdfs-default.xml.
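> For reference, the shape of an hdfs-default.xml entry (editorial; the 
> property shown is only a placeholder, not one of the missing keys):
> {code:xml}
> <!-- Editorial sketch: each exposed key gets a name, its default value,
>      and a description. -->
> <property>
>   <name>dfs.example.property</name>
>   <value>default-value</value>
>   <description>Explain what the property controls and its default.</description>
> </property>
> {code}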



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11039) Expose more configuration properties to hdfs-default.xml

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11039:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Expose more configuration properties to hdfs-default.xml
> 
>
> Key: HDFS-11039
> URL: https://issues.apache.org/jira/browse/HDFS-11039
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, newbie
>Reporter: Yi Liu
>Assignee: Jennica Pounds
>Priority: Minor
>
> There are some configuration properties for HDFS that have not been exposed 
> in hdfs-default.xml.
> It would be convenient for Hadoop users/admins if we added them to 
> hdfs-default.xml.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11091) Implement a getTrashRoot that does not fall-back

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802528#comment-17802528
 ] 

Shilun Fan commented on HDFS-11091:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Implement a getTrashRoot that does not fall-back
> 
>
> Key: HDFS-11091
> URL: https://issues.apache.org/jira/browse/HDFS-11091
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Yuanbo Liu
>Priority: Major
>
> From HDFS-10756's 
> [discussion|https://issues.apache.org/jira/browse/HDFS-10756?focusedCommentId=15623755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15623755]:
> {{getTrashRoot}} is supposed to return the trash dir, taking encryption 
> zones into account. But if an error is encountered (e.g. an access control 
> exception), it falls back to the default trash dir.
> Although there is a warning message about this, it is still somewhat 
> surprising behavior. The fallback was added by HDFS-9799 for compatibility 
> reasons. This jira proposes adding a getTrashRoot that throws, which will 
> actually be more user-friendly.
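> A minimal sketch of a strict variant (editorial; {{getTrashRootStrict}}, 
> {{ezLookup}}, and {{userName}} are hypothetical names, not a proposed API):
> {code:java}
> // Editorial sketch: surface the lookup error instead of silently falling
> // back to the default trash dir.
> public Path getTrashRootStrict(Path path) throws IOException {
>   EncryptionZone ez = ezLookup(path);  // propagates the exception
>   return ez != null
>       ? new Path(ez.getPath(), FileSystem.TRASH_PREFIX + "/" + userName)
>       : super.getTrashRoot(path);
> }
> {code}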



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11063) Set NameNode RPC server handler thread name with more descriptive information about the RPC call.

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802530#comment-17802530
 ] 

Shilun Fan commented on HDFS-11063:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Set NameNode RPC server handler thread name with more descriptive information 
> about the RPC call.
> -
>
> Key: HDFS-11063
> URL: https://issues.apache.org/jira/browse/HDFS-11063
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chris Nauroth
>Priority: Major
>
> We often run {{jstack}} on a NameNode process as a troubleshooting step if it 
> is suffering high load or appears to be hanging.  By reading the stack trace, 
> we can identify if a caller is blocked inside an expensive operation.  This 
> would be even more helpful if we updated the RPC server handler thread name 
> with more descriptive information about the RPC call.  This could include the 
> calling user, the called RPC method, and the most significant argument to 
> that method (most likely the path).
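> A minimal sketch of the idea (editorial; {{user}}, {{method}}, {{path}}, and 
> {{invokeRpc}} are hypothetical, not the actual Server handler code):
> {code:java}
> // Editorial sketch: decorate the handler thread name for the duration of
> // the call so jstack output is self-describing, then always restore it.
> Thread handler = Thread.currentThread();
> String saved = handler.getName();
> handler.setName(saved + " [" + user + " " + method + " " + path + "]");
> try {
>   return invokeRpc(call);
> } finally {
>   handler.setName(saved);
> }
> {code}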



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11066) Improve test coverage for ISA-L native coder

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11066:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Improve test coverage for ISA-L native coder
> 
>
> Key: HDFS-11066
> URL: https://issues.apache.org/jira/browse/HDFS-11066
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Huafeng Wang
>Priority: Major
>  Labels: hdfs-ec-3.0-nice-to-have
>
> Some issues were introduced but not found in time due to the lack of 
> necessary Jenkins support for the ISA-L related build options. We should 
> re-enable the ISA-L related build options in the Jenkins system, so as to 
> ensure the quality of the related native code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11063) Set NameNode RPC server handler thread name with more descriptive information about the RPC call.

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11063:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Set NameNode RPC server handler thread name with more descriptive information 
> about the RPC call.
> -
>
> Key: HDFS-11063
> URL: https://issues.apache.org/jira/browse/HDFS-11063
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chris Nauroth
>Priority: Major
>
> We often run {{jstack}} on a NameNode process as a troubleshooting step if it 
> is suffering high load or appears to be hanging.  By reading the stack trace, 
> we can identify if a caller is blocked inside an expensive operation.  This 
> would be even more helpful if we updated the RPC server handler thread name 
> with more descriptive information about the RPC call.  This could include the 
> calling user, the called RPC method, and the most significant argument to 
> that method (most likely the path).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11091) Implement a getTrashRoot that does not fall-back

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11091:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Implement a getTrashRoot that does not fall-back
> 
>
> Key: HDFS-11091
> URL: https://issues.apache.org/jira/browse/HDFS-11091
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Yuanbo Liu
>Priority: Major
>
> From HDFS-10756's 
> [discussion|https://issues.apache.org/jira/browse/HDFS-10756?focusedCommentId=15623755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15623755]:
> {{getTrashRoot}} is supposed to return the trash dir considering encryption 
> zone. But if there's an error encountered (e.g. access control exception), it 
> falls back to the default trash dir.
> Although there is a warning message about this, it is still a somewhat 
> surprising behavior. The fall back was added by HDFS-9799 for compatibility 
> reasons. This jira is to propose we add a getTrashRoot that throws, which 
> will actually be more user-friendly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11109) ViewFileSystem Df command should work even when the backing NameServices are down

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11109:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> ViewFileSystem Df command should work even when the backing NameServices are 
> down
> -
>
> Key: HDFS-11109
> URL: https://issues.apache.org/jira/browse/HDFS-11109
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>  Labels: viewfs
>
> With HDFS-11058, the Df command works well with ViewFileSystem. A federated 
> cluster can be backed by several NameServers, with each managing its own 
> NameSpace. Even when some of the NameServers are down, the federated cluster 
> will continue to work well for the NameServers that are alive.
> But the {{hadoop fs -df}} command, when run against the federated cluster, 
> expects all the backing NameServers to be up and running. Else, the command 
> errors out with an exception.
> It would be preferable to have the federated cluster commands highly 
> available, to match the NameSpace partition availability.
> {noformat}
> #hadoop fs -df -h /
> df: Call From manoj-mbp.local/172.16.3.66 to localhost:52001 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> {noformat}
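> A minimal sketch of a more tolerant aggregation (editorial; whether df 
> should skip or annotate unreachable nameservices is a design choice, and 
> this is not the committed behavior):
> {code:java}
> // Editorial sketch: sum df stats across the mount points' child
> // filesystems, skipping nameservices that are down.
> long capacity = 0, used = 0;
> for (FileSystem child : viewFs.getChildFileSystems()) {
>   try {
>     FsStatus s = child.getStatus();
>     capacity += s.getCapacity();
>     used += s.getUsed();
>   } catch (IOException e) {
>     System.err.println("Skipping unreachable nameservice: " + e.getMessage());
>   }
> }
> {code}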



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11232) System.err should be System.out

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802524#comment-17802524
 ] 

Shilun Fan commented on HDFS-11232:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> System.err should be System.out
> ---
>
> Key: HDFS-11232
> URL: https://issues.apache.org/jira/browse/HDFS-11232
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Ethan Li
>Priority: Trivial
> Attachments: HDFS-11232.001.patch, HDFS-11232.002.patch
>
>
> In 
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java,
> {{System.err.println("Generating new cluster id:");}} is used. I think it 
> should be {{System.out.println(...)}} since this is not an error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11232) System.err should be System.out

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11232:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> System.err should be System.out
> ---
>
> Key: HDFS-11232
> URL: https://issues.apache.org/jira/browse/HDFS-11232
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Ethan Li
>Priority: Trivial
> Attachments: HDFS-11232.001.patch, HDFS-11232.002.patch
>
>
> In 
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java,
> {{System.err.println("Generating new cluster id:");}} is used. I think it 
> should be {{System.out.println(...)}} since this is not an error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11203) Rename support during re-encrypt EDEK

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11203:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Rename support during re-encrypt EDEK
> -
>
> Key: HDFS-11203
> URL: https://issues.apache.org/jira/browse/HDFS-11203
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: encryption
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
>
> Currently HDFS-10899 disables renames within an EZ if it's under 
> re-encryption (similar to the current cross-zone rename checks).
> We'd like to support rename in the long run, so the cluster is fully 
> functioning during re-encryption.
> The reason rename is particularly difficult is:
> - We want to re-encrypt all files under an EZ in one pass, without missing 
> any.
> - We want to iterate through the files and keep track of where we are (i.e. a 
> cursor), so in case of an NN failover/crash, we can resume from fsimage/edits.
> - We cannot guarantee the namespace is not changed during re-encryption. 
> Newly created files automatically have new EDEKs, and deleted files we don't 
> care about. But if a file is renamed from behind the cursor to before it, it 
> may be missed in the re-encryption (see the sketch below).
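> A minimal sketch of one way to close that gap (editorial; the names are 
> hypothetical and this is not the actual design):
> {code:java}
> // Editorial sketch: when a rename jumps from the unprocessed region
> // (behind the cursor) to the processed one, handle the file eagerly so
> // the one-pass scan still covers everything.
> void onRename(String src, String dst, ReencryptionCursor cursor) {
>   if (cursor.isBehind(src) && !cursor.isBehind(dst)) {
>     reencryptEdek(dst);  // hypothetical eager re-encryption
>   }
> }
> {code}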



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11260) Slow writer threads are not stopped

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802523#comment-17802523
 ] 

Shilun Fan commented on HDFS-11260:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Slow writer threads are not stopped
> ---
>
> Key: HDFS-11260
> URL: https://issues.apache.org/jira/browse/HDFS-11260
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
> Environment: CDH5.8.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> If a DataNode receives a transferred block, it tries to stop the writer to 
> the same block. However, this may not work, and we saw the following error 
> message and stacktrace.
> Fundamentally, the assumption of {{ReplicaInPipeline#stopWriter}} is wrong. 
> It assumes the writer thread must be a DataXceiver thread, which can be 
> interrupted and terminates afterwards. However, an IPC thread may also be 
> the writer thread by calling initReplicaRecovery, and it ignores the 
> interrupt and does not terminate.
> {noformat}
> 2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:135)
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052)
> 2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in BlockReceiver constructor. Cause is
> 2016-12-16 19:58:56,168 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> sj1dra082.corp.adobe.com:50010:DataXceiver error processing WRITE_BLOCK 
> operation  src: /10.10.0.80:44105 dst: /10.10.0.82:50010
> java.io.IOException: Join on writer thread Thread[IPC Server handler 6 on 
> 50020,5,main] timed out
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:212)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1579)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:195)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:669)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> There is also a logic error in FsDatasetImpl#createTemporary: if the 
> code in the synchronized block executes for more than 60 seconds (in theory), 
> it could throw an exception without trying to stop the existing slow writer.
> We saw an FsDatasetImpl#createTemporary call fail after nearly 10 minutes, 
> and it's unclear why yet. It's my understanding that the code intends to stop 
> slow writers after 1 minute by default. Some code rewrite is probably needed 
> to get the logic right.
> {noformat}
> 2016-12-16 23:12:24,636 WARN 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Unable 
> to stop existing writer for block 
> BP-1527842723-10.0.0.180-1367984731269:blk_4313782210_1103780331023 after 
> 568320 miniseconds.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11203) Rename support during re-encrypt EDEK

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802525#comment-17802525
 ] 

Shilun Fan commented on HDFS-11203:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Rename support during re-encrypt EDEK
> -
>
> Key: HDFS-11203
> URL: https://issues.apache.org/jira/browse/HDFS-11203
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: encryption
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
>
> Currently HDFS-10899 disables renames within an EZ if it's under 
> re-encryption (similar to the current cross-zone rename checks).
> We'd like to support rename in the long run, so the cluster is fully 
> functioning during re-encryption.
> The reason rename is particularly difficult is:
> - We want to re-encrypt all files under an EZ in one pass, without missing 
> any.
> - We want to iterate through the files and keep track of where we are (i.e. a 
> cursor), so in case of an NN failover/crash, we can resume from fsimage/edits.
> - We cannot guarantee the namespace is not changed during re-encryption. 
> Newly created files automatically have new EDEKs, and deleted files we don't 
> care about. But if a file is renamed from behind the cursor to before it, it 
> may be missed in the re-encryption.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11356) figure out what to do about hadoop-hdfs-project/hadoop-hdfs/src/main/native

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802520#comment-17802520
 ] 

Shilun Fan commented on HDFS-11356:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> figure out what to do about hadoop-hdfs-project/hadoop-hdfs/src/main/native
> ---
>
> Key: HDFS-11356
> URL: https://issues.apache.org/jira/browse/HDFS-11356
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, documentation
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-11356.001.patch
>
>
> The move of code during the hdfs-client-native creation caused all sorts of 
> loose ends, and this is just another one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11350) Document the missing settings relevant to DataNode disk IO statistics

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802521#comment-17802521
 ] 

Shilun Fan commented on HDFS-11350:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Document the missing settings relevant to DataNode disk IO statistics
> -
>
> Key: HDFS-11350
> URL: https://issues.apache.org/jira/browse/HDFS-11350
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11350.001.patch
>
>
> HDFS-11299 and HDFS-11339 introduced some new settings for the profiling 
> hooks that expose latency statistics around DataNode disk IO. These settings 
> are intended for users but are not documented. In total there are three 
> relevant settings:
> 1. dfs.datanode.enable.fileio.profiling
> 2. dfs.datanode.enable.fileio.fault.injection
> 3. dfs.datanode.fileio.profiling.sampling.fraction
> Of these, only {{dfs.datanode.enable.fileio.fault.injection}} does not need 
> to be documented; a sketch of possible entries for the other two follows.
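For illustration, the two user-facing settings could be documented in hdfs-default.xml along these lines; the descriptions and default values here are a sketch, not committed text:

{code:xml}
<!-- Sketch of possible hdfs-default.xml entries; wording and defaults are
     illustrative, not the committed documentation. -->
<property>
  <name>dfs.datanode.enable.fileio.profiling</name>
  <value>false</value>
  <description>Enables the profiling hooks that collect latency statistics
  around DataNode disk IO.</description>
</property>
<property>
  <name>dfs.datanode.fileio.profiling.sampling.fraction</name>
  <value>0.0</value>
  <description>Fraction of file IO events to sample when profiling is
  enabled; 0.0 samples nothing, 1.0 samples every operation.</description>
</property>
{code}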



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11161) Incorporate Baidu Yun BOS file system implementation

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11161:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Incorporate Baidu Yun BOS file system implementation
> 
>
> Key: HDFS-11161
> URL: https://issues.apache.org/jira/browse/HDFS-11161
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Faen Zhang
>Priority: Major
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> Baidu Yun ( https://cloud.baidu.com/ ) is one of the top-tier cloud computing 
> providers. Baidu Yun BOS is widely used among China's cloud users, but 
> currently it is not easy to access data stored on BOS from a user's 
> Hadoop/Spark application, because Hadoop has no native support for BOS.
> This work aims to integrate Baidu Yun BOS with Hadoop. With simple 
> configuration, Spark/Hadoop applications can read/write data from BOS without 
> any code change, narrowing the gap between users' applications and data 
> storage, as has been done for S3 and Aliyun OSS in Hadoop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11350) Document the missing settings relevant to DataNode disk IO statistics

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11350:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Document the missing settings relevant to DataNode disk IO statistics
> -
>
> Key: HDFS-11350
> URL: https://issues.apache.org/jira/browse/HDFS-11350
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11350.001.patch
>
>
> HDFS-11299 and HDFS-11339 introduced some new settings for the profiling 
> hooks that expose latency statistics around DataNode disk IO. These settings 
> are intended for users but are not documented. In total there are three 
> relevant settings:
> 1. dfs.datanode.enable.fileio.profiling
> 2. dfs.datanode.enable.fileio.fault.injection
> 3. dfs.datanode.fileio.profiling.sampling.fraction
> Of these, only {{dfs.datanode.enable.fileio.fault.injection}} does not need 
> to be documented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11381) PreCommit TestDataNodeOutlierDetectionViaMetrics failure

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802519#comment-17802519
 ] 

Shilun Fan commented on HDFS-11381:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> PreCommit TestDataNodeOutlierDetectionViaMetrics failure
> 
>
> Key: HDFS-11381
> URL: https://issues.apache.org/jira/browse/HDFS-11381
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0-alpha4
>Reporter: John Zhuge
>Assignee: Arpit Agarwal
>Priority: Minor
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18285/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> {noformat}
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.457 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics
> testOutlierIsDetected(org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics)
>   Time elapsed: 0.27 sec  <<< FAILURE!
> java.lang.AssertionError: 
> Expected: is <1>
>  but: was <0>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics.testOutlierIsDetected(TestDataNodeOutlierDetectionViaMetrics.java:85)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11398) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails intermittently

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11398:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails 
> intermittently
> 
>
> Key: HDFS-11398
> URL: https://issues.apache.org/jira/browse/HDFS-11398
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-11398-reproduce.patch, HDFS-11398.001.patch, 
> HDFS-11398.002.patch, failure.log
>
>
> The test {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} 
> still fails intermittently in trunk after HDFS-11316. The stack infos:
> {code}
> testUnderReplicationAfterVolFailure(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure)
>   Time elapsed: 95.021 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics:
> Timestamp: 2017-02-07 07:00:34,193
> 
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:511)
> at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:276)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure(TestDataNodeVolumeFailure.java:412)
> {code}
> I looked into this and found there is a chance that the value 
> {{UnderReplicatedBlocksCount}} will no longer be > 0. The following is my 
> analysis:
> The test {{TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure}} 
> uses file creation to trigger the disk error checking. The related code:
> {code}
> Path file1 = new Path("/test1");
> DFSTestUtil.createFile(fs, file1, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file1, (short)3);
> // Fail the first volume on both datanodes
> File dn1Vol1 = new File(dataDir, "data"+(2*0+1));
> File dn2Vol1 = new File(dataDir, "data"+(2*1+1));
> DataNodeTestUtils.injectDataDirFailure(dn1Vol1, dn2Vol1);
> Path file2 = new Path("/test2");
> DFSTestUtil.createFile(fs, file2, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file2, (short)3);
> {code}
> This leads to one problem: if the cluster is busy, it can take a long time 
> for the replication of file2 to reach the desired value. During this time, 
> the under-replicated blocks of file1 can also be re-replicated in the 
> cluster. Once that happens, the condition {{underReplicatedBlocks > 0}} will 
> never be satisfied.
> This can be reproduced in my local environment.
> Actually, we can use an easier way, {{DataNodeTestUtils.waitForDiskError}}, 
> to replace this; it runs fast and is more reliable. A sketch follows.
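A sketch of that replacement, assuming {{DataNodeTestUtils}} helpers shaped roughly like {{getVolume(DataNode, File)}} and {{waitForDiskError(DataNode, FsVolumeSpi)}}; the exact API may differ:

{code:java}
// Sketch only: replace the file2 create/waitReplication round trip with an
// explicit wait for the disk checker on each failed volume.
DataNodeTestUtils.injectDataDirFailure(dn1Vol1, dn2Vol1);
DataNode dn0 = cluster.getDataNodes().get(0);
DataNode dn1 = cluster.getDataNodes().get(1);
DataNodeTestUtils.waitForDiskError(dn0,
    DataNodeTestUtils.getVolume(dn0, dn1Vol1));
DataNodeTestUtils.waitForDiskError(dn1,
    DataNodeTestUtils.getVolume(dn1, dn2Vol1));
{code}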



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11260) Slow writer threads are not stopped

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11260:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Slow writer threads are not stopped
> ---
>
> Key: HDFS-11260
> URL: https://issues.apache.org/jira/browse/HDFS-11260
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
> Environment: CDH5.8.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> If a DataNode receives a transferred block, it tries to stop the writer to 
> the same block. However, this may not work, and we saw the following error 
> message and stacktrace.
> Fundamentally, the assumption of {{ReplicaInPipeline#stopWriter}} is wrong. 
> It assumes the writer thread must be a DataXceiver thread, which can be 
> interrupted and terminates afterwards. However, an IPC thread may also be 
> the writer thread via initReplicaRecovery, and it ignores the interrupt and 
> does not terminate.
> {noformat}
> 2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:135)
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052)
> 2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in BlockReceiver constructor. Cause is
> 2016-12-16 19:58:56,168 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> sj1dra082.corp.adobe.com:50010:DataXceiver error processing WRITE_BLOCK 
> operation  src: /10.10.0.80:44105 dst: /10.10.0.82:50010
> java.io.IOException: Join on writer thread Thread[IPC Server handler 6 on 
> 50020,5,main] timed out
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:212)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1579)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:195)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:669)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> There is also a logic error in FsDatasetImpl#createTemporary: if the code in 
> the synchronized block executes for more than 60 seconds (in theory), it 
> could throw an exception without trying to stop the existing slow writer.
> We saw an FsDatasetImpl#createTemporary call fail after nearly 10 minutes, 
> and it's unclear why yet. My understanding is that the code intends to stop 
> slow writers after 1 minute by default. Some code rewriting is probably 
> needed to get the logic right.
> {noformat}
> 2016-12-16 23:12:24,636 WARN 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Unable 
> to stop existing writer for block 
> BP-1527842723-10.0.0.180-1367984731269:blk_4313782210_1103780331023 after 
> 568320 miniseconds.
> {noformat}
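A minimal, self-contained model of the failure mode described above; the thread below stands in for an IPC handler and is not actual Hadoop code:

{code:java}
// stopWriter() interrupts the writer and joins with a timeout, which only
// works if the writer honors the interrupt. A loop that swallows
// InterruptedException, like an IPC handler, never exits, so the join
// times out exactly as in the stack trace above.
public class StopWriterSketch {
  public static void main(String[] args) throws Exception {
    Thread writer = new Thread(() -> {
      while (true) {
        try {
          Thread.sleep(1000); // stand-in for CallQueueManager.take()
        } catch (InterruptedException e) {
          // Swallowed: the thread keeps running instead of terminating.
        }
      }
    }, "ipc-handler-sketch");
    writer.setDaemon(true);
    writer.start();

    writer.interrupt();      // what ReplicaInPipeline#stopWriter does
    writer.join(2000);       // join with timeout
    System.out.println("writer alive after join: " + writer.isAlive()); // true
  }
}
{code}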



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11397) TestThrottledAsyncChecker#testCancellation timed out

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11397:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> TestThrottledAsyncChecker#testCancellation timed out
> 
>
> Key: HDFS-11397
> URL: https://issues.apache.org/jira/browse/HDFS-11397
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0-alpha4
>Reporter: John Zhuge
>Assignee: Manjunath Anand
>Priority: Minor
> Attachments: HDFS-11397-V01.patch
>
>
> {noformat}
> Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 61.153 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker
> testCancellation(org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker)
>   Time elapsed: 60.033 sec  <<< ERROR!
> java.lang.Exception: test timed out after 6 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker.testCancellation(TestThrottledAsyncChecker.java:114)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11381) PreCommit TestDataNodeOutlierDetectionViaMetrics failure

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11381:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> PreCommit TestDataNodeOutlierDetectionViaMetrics failure
> 
>
> Key: HDFS-11381
> URL: https://issues.apache.org/jira/browse/HDFS-11381
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0-alpha4
>Reporter: John Zhuge
>Assignee: Arpit Agarwal
>Priority: Minor
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18285/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> {noformat}
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.457 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics
> testOutlierIsDetected(org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics)
>   Time elapsed: 0.27 sec  <<< FAILURE!
> java.lang.AssertionError: 
> Expected: is <1>
>  but: was <0>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.junit.Assert.assertThat(Assert.java:865)
>   at org.junit.Assert.assertThat(Assert.java:832)
>   at 
> org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics.testOutlierIsDetected(TestDataNodeOutlierDetectionViaMetrics.java:85)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11356) figure out what to do about hadoop-hdfs-project/hadoop-hdfs/src/main/native

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11356:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> figure out what to do about hadoop-hdfs-project/hadoop-hdfs/src/main/native
> ---
>
> Key: HDFS-11356
> URL: https://issues.apache.org/jira/browse/HDFS-11356
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, documentation
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-11356.001.patch
>
>
> The move of code as part of the hdfs-client-native creation left all sorts 
> of loose ends, and this is just another one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11423) Allow ReplicaInfo to be subclassed outside of package

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11423:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Allow ReplicaInfo to be subclassed outside of package
> -
>
> Key: HDFS-11423
> URL: https://issues.apache.org/jira/browse/HDFS-11423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha2
>Reporter: Joe Pallas
>Priority: Minor
> Attachments: HDFS-11423.001.patch
>
>
> The constructor for {{ReplicaInfo}} is package-scope instead of protected.  
> This means that an alternative dataset implementation in its own package 
> cannot create Replica classes that inherit from ReplicaInfo the way 
> {{LocalReplica}} does (or the way {{ProvidedReplica}} does in HDFS-9806).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11398) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails intermittently

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802516#comment-17802516
 ] 

Shilun Fan commented on HDFS-11398:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails 
> intermittently
> 
>
> Key: HDFS-11398
> URL: https://issues.apache.org/jira/browse/HDFS-11398
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-11398-reproduce.patch, HDFS-11398.001.patch, 
> HDFS-11398.002.patch, failure.log
>
>
> The test {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} 
> still fails intermittently in trunk after HDFS-11316. The stack infos:
> {code}
> testUnderReplicationAfterVolFailure(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure)
>   Time elapsed: 95.021 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics:
> Timestamp: 2017-02-07 07:00:34,193
> 
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:511)
> at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:276)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure(TestDataNodeVolumeFailure.java:412)
> {code}
> I looked into this and found there is a chance that the value 
> {{UnderReplicatedBlocksCount}} will no longer be > 0. The following is my 
> analysis:
> The test {{TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure}} 
> uses file creation to trigger the disk error checking. The related code:
> {code}
> Path file1 = new Path("/test1");
> DFSTestUtil.createFile(fs, file1, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file1, (short)3);
> // Fail the first volume on both datanodes
> File dn1Vol1 = new File(dataDir, "data"+(2*0+1));
> File dn2Vol1 = new File(dataDir, "data"+(2*1+1));
> DataNodeTestUtils.injectDataDirFailure(dn1Vol1, dn2Vol1);
> Path file2 = new Path("/test2");
> DFSTestUtil.createFile(fs, file2, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file2, (short)3);
> {code}
> This leads to one problem: if the cluster is busy, it can take a long time 
> for the replication of file2 to reach the desired value. During this time, 
> the under-replicated blocks of file1 can also be re-replicated in the 
> cluster. Once that happens, the condition {{underReplicatedBlocks > 0}} will 
> never be satisfied.
> This can be reproduced in my local environment.
> Actually, we can use an easier way, {{DataNodeTestUtils.waitForDiskError}}, 
> to replace this; it runs fast and is more reliable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11392) FSPermissionChecker#checkSubAccess should support inodeattribute provider

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11392:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> FSPermissionChecker#checkSubAccess should support inodeattribute provider
> -
>
> Key: HDFS-11392
> URL: https://issues.apache.org/jira/browse/HDFS-11392
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: John Zhuge
>Priority: Minor
>
> HDFS-6826 added this TODO in {{FSPermissionChecker#checkSubAccess}}:
> {code:title=FSPermissionChecker#checkSubAccess}
> //TODO have to figure this out with inodeattribute provider
> INodeAttributes inodeAttr =
> getINodeAttrs(components, pathIdx, d, snapshotId);
> {code}
> If an INode attribute provider is in play, it always incorrectly returns the 
> attributes of the subtree root even when it descends multiple levels down 
> the subtree, because the components array is for the root.
> {code:title=FSPermissionChecker#getINodeAttrs}
>   private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx,
>   INode inode, int snapshotId) {
> INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId);
> if (getAttributesProvider() != null) {
>   String[] elements = new String[pathIdx + 1];
>   for (int i = 0; i < elements.length; i++) {
> elements[i] = DFSUtil.bytes2String(pathByNameArr[i]);
>   }
>   inodeAttrs = getAttributesProvider().getAttributes(elements, 
> inodeAttrs);
> }
> return inodeAttrs;
>   }
> {code}
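One possible direction, sketched here only for illustration (not a committed fix): extend the parent's path elements with each child's local name while descending, so the provider sees the real per-inode path instead of the subtree root's components:

{code:java}
// Illustrative only: build the element array incrementally while walking
// the subtree, then ask the provider with the child's full path.
private INodeAttributes getSubtreeINodeAttrs(String[] parentElements,
    INode child, int snapshotId) {
  String[] elements =
      java.util.Arrays.copyOf(parentElements, parentElements.length + 1);
  elements[elements.length - 1] = child.getLocalName();
  INodeAttributes attrs = child.getSnapshotINode(snapshotId);
  if (getAttributesProvider() != null) {
    attrs = getAttributesProvider().getAttributes(elements, attrs);
  }
  return attrs;
}
{code}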



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11485) HttpFS should warn about weak SSL ciphers

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802511#comment-17802511
 ] 

Shilun Fan commented on HDFS-11485:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> HttpFS should warn about weak SSL ciphers
> -
>
> Key: HDFS-11485
> URL: https://issues.apache.org/jira/browse/HDFS-11485
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 2.9.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>
> HDFS-11418 sets a list of default ciphers that contains a few weak ciphers 
> in order to maintain backwards compatibility. In addition, users can select 
> weak ciphers via the env var {{HTTPFS_SSL_CIPHERS}}. It'd be nice to get 
> warnings about the weak ciphers; a sketch follows.
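A hedged sketch of what such a warning could look like; the weak-cipher names and the plain stderr logging are assumptions for illustration, not HttpFS code:

{code:java}
// Illustrative sketch: warn when the configured cipher list intersects a
// known-weak set. Cipher names and logging style are assumptions.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WeakCipherWarnSketch {
  private static final Set<String> WEAK = new HashSet<>(Arrays.asList(
      "SSL_RSA_WITH_3DES_EDE_CBC_SHA", "TLS_RSA_WITH_AES_128_CBC_SHA"));

  public static void warnOnWeakCiphers(String cipherList) {
    for (String cipher : cipherList.split(",")) {
      if (WEAK.contains(cipher.trim())) {
        System.err.println("WARN: weak SSL cipher enabled: " + cipher.trim());
      }
    }
  }

  public static void main(String[] args) {
    warnOnWeakCiphers(
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,SSL_RSA_WITH_3DES_EDE_CBC_SHA");
  }
}
{code}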



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11464) Improve the selection in choosing storage for blocks

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11464:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Improve the selection in choosing storage for blocks
> 
>
> Key: HDFS-11464
> URL: https://issues.apache.org/jira/browse/HDFS-11464
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-11464.001.patch, HDFS-11464.002.patch, 
> HDFS-11464.003.patch, HDFS-11464.004.patch, HDFS-11464.005.patch
>
>
> Currently the logic for choosing storage for blocks is not good. It always 
> uses the first valid storage of a given StorageType (see 
> {{DataNodeDescriptor#chooseStorage4Block}}). This is not a good selection: 
> it means blocks will always be written to the same (first) volume while 
> other valid volumes are never chosen. This problem was brought up by this 
> comment ( 
> https://issues.apache.org/jira/browse/HDFS-9807?focusedCommentId=15878382&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15878382
>  )
> One solution from me:
> * First, based on the existing storages in one node, extract all the valid 
> storages into a collection.
> * Then, shuffle the order of these valid storages to get a new collection.
> * Finally, take the first storage from the new collection.
> These steps would be executed in {{DataNodeDescriptor#chooseStorage4Block}} 
> and replace the current logic (see the sketch after this description). I 
> think this improvement can be done as a subtask under HDFS-11419. Any 
> further comments are welcome.
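A minimal, self-contained sketch of those three steps over a simplified storage model; the real change would operate on DatanodeStorageInfo inside {{DatanodeDescriptor#chooseStorage4Block}}:

{code:java}
// Minimal model of the proposed selection: collect the valid storages,
// shuffle them, take the first.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.ToLongFunction;

public class ChooseStorageSketch {
  static String chooseStorage(List<String> storages, long blockSize,
      ToLongFunction<String> remainingBytes) {
    List<String> valid = new ArrayList<>();
    for (String s : storages) {
      if (remainingBytes.applyAsLong(s) >= blockSize) { // step 1: keep valid
        valid.add(s);
      }
    }
    Collections.shuffle(valid);                         // step 2: randomize
    return valid.isEmpty() ? null : valid.get(0);       // step 3: take first
  }

  public static void main(String[] args) {
    List<String> volumes = List.of("vol1", "vol2", "vol3");
    System.out.println(chooseStorage(volumes, 10L, s -> 100L));
  }
}
{code}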



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11455) Fix javac warnings in HDFS that caused by deprecated FileSystem APIs

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802514#comment-17802514
 ] 

Shilun Fan commented on HDFS-11455:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Fix javac warnings in HDFS that caused by deprecated FileSystem APIs
> 
>
> Key: HDFS-11455
> URL: https://issues.apache.org/jira/browse/HDFS-11455
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11455.001.patch, HDFS-11455.002.patch, 
> HDFS-11455.003.patch
>
>
> Many javac warnings appeared after the FileSystem APIs that promote 
> inefficient call patterns were deprecated in HADOOP-13321. The relevant 
> warnings:
> {code}
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[320,18]
>  [deprecation] isFile(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1409,18]
>  [deprecation] isFile(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java:[778,19]
>  [deprecation] isDirectory(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java:[787,20]
>  [deprecation] isDirectory(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java:[834,18]
>  [deprecation] isFile(Path) in FileSystem has been deprecated
> {code}
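The usual mechanical fix is to go through {{FileSystem#getFileStatus}} instead of the deprecated predicates. A sketch, with the FileNotFoundException handling needed to preserve the old false-on-missing-path behavior:

{code:java}
// Sketch of the typical replacement for the deprecated predicates.
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class DeprecationFixSketch {
  static boolean isFile(FileSystem fs, Path p) throws IOException {
    try {
      return fs.getFileStatus(p).isFile();       // instead of fs.isFile(p)
    } catch (FileNotFoundException e) {
      return false;
    }
  }

  static boolean isDirectory(FileSystem fs, Path p) throws IOException {
    try {
      return fs.getFileStatus(p).isDirectory();  // instead of fs.isDirectory(p)
    } catch (FileNotFoundException e) {
      return false;
    }
  }
}
{code}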



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11464) Improve the selection in choosing storage for blocks

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802512#comment-17802512
 ] 

Shilun Fan commented on HDFS-11464:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Improve the selection in choosing storage for blocks
> 
>
> Key: HDFS-11464
> URL: https://issues.apache.org/jira/browse/HDFS-11464
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-11464.001.patch, HDFS-11464.002.patch, 
> HDFS-11464.003.patch, HDFS-11464.004.patch, HDFS-11464.005.patch
>
>
> Currently the logic for choosing storage for blocks is not good. It always 
> uses the first valid storage of a given StorageType (see 
> {{DataNodeDescriptor#chooseStorage4Block}}). This is not a good selection: 
> it means blocks will always be written to the same (first) volume while 
> other valid volumes are never chosen. This problem was brought up by this 
> comment ( 
> https://issues.apache.org/jira/browse/HDFS-9807?focusedCommentId=15878382&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15878382
>  )
> One solution from me:
> * First, based on the existing storages in one node, extract all the valid 
> storages into a collection.
> * Then, shuffle the order of these valid storages to get a new collection.
> * Finally, take the first storage from the new collection.
> These steps would be executed in {{DataNodeDescriptor#chooseStorage4Block}} 
> and replace the current logic. I think this improvement can be done as a 
> subtask under HDFS-11419. Any further comments are welcome.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11455) Fix javac warnings in HDFS that caused by deprecated FileSystem APIs

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11455:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Fix javac warnings in HDFS that caused by deprecated FileSystem APIs
> 
>
> Key: HDFS-11455
> URL: https://issues.apache.org/jira/browse/HDFS-11455
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Attachments: HDFS-11455.001.patch, HDFS-11455.002.patch, 
> HDFS-11455.003.patch
>
>
> Many javac warnings appeared after the FileSystem APIs that promote 
> inefficient call patterns were deprecated in HADOOP-13321. The relevant 
> warnings:
> {code}
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[320,18]
>  [deprecation] isFile(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1409,18]
>  [deprecation] isFile(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java:[778,19]
>  [deprecation] isDirectory(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java:[787,20]
>  [deprecation] isDirectory(Path) in FileSystem has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestQuotaByStorageType.java:[834,18]
>  [deprecation] isFile(Path) in FileSystem has been deprecated
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11485) HttpFS should warn about weak SSL ciphers

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11485:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> HttpFS should warn about weak SSL ciphers
> -
>
> Key: HDFS-11485
> URL: https://issues.apache.org/jira/browse/HDFS-11485
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 2.9.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>
> HDFS-11418 sets a list of default ciphers that contains a few weak ciphers 
> in order to maintain backwards compatibility. In addition, users can select 
> weak ciphers via the env var {{HTTPFS_SSL_CIPHERS}}. It'd be nice to get 
> warnings about the weak ciphers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11489) KMS should warning about weak SSL ciphers

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11489:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> KMS should warning about weak SSL ciphers
> -
>
> Key: HDFS-11489
> URL: https://issues.apache.org/jira/browse/HDFS-11489
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: kms
>Affects Versions: 2.9.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>
> HADOOP-14083 sets a list of default ciphers that contains a few weak ciphers 
> in order to maintain backwards compatibility. In addition, users can select 
> weak ciphers via the env var {{KMS_SSL_CIPHERS}}. It'd be nice to get 
> warnings about the weak ciphers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11510) Revamp erasure coding user documentation

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11510:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Revamp erasure coding user documentation
> 
>
> Key: HDFS-11510
> URL: https://issues.apache.org/jira/browse/HDFS-11510
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
>  Labels: hdfs-ec-3.0-nice-to-have
>
> After we finish more of the must-do EC changes targeted for 3.0, it'd be good 
> to take a fresh look at the EC documentation to make sure it's comprehensive, 
> particularly how to choose a good erasure coding policy for your cluster and 
> how to enable policies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11510) Revamp erasure coding user documentation

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802508#comment-17802508
 ] 

Shilun Fan commented on HDFS-11510:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Revamp erasure coding user documentation
> 
>
> Key: HDFS-11510
> URL: https://issues.apache.org/jira/browse/HDFS-11510
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
>  Labels: hdfs-ec-3.0-nice-to-have
>
> After we finish more of the must-do EC changes targeted for 3.0, it'd be good 
> to take a fresh look at the EC documentation to make sure it's comprehensive, 
> particularly how to choose a good erasure coding policy for your cluster and 
> how to enable policies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11542) Fix RawErasureCoderBenchmark decoding operation

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11542:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Fix RawErasureCoderBenchmark decoding operation
> ---
>
> Key: HDFS-11542
> URL: https://issues.apache.org/jira/browse/HDFS-11542
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha2
>Reporter: László Bence Nagy
>Priority: Minor
>  Labels: test
>
> There are some issues with the decode operation in the 
> *RawErasureCoderBenchmark.java* file. The decoding method is called like 
> this: *decoder.decode(decodeInputs, ERASED_INDEXES, outputs);*.
> Using an RS 6+3 configuration, it could be called correctly with these 
> parameters: *decode([ d0, NULL, d2, d3, NULL, d5, p0, NULL, p2 ], [ 1, 4, 7 ], 
> [ -, -, - ])*. The indexes 1, 4, 7 are in the *ERASED_INDEXES* array, so in 
> the *decodeInputs* array the values at those indexes are set to NULL, while 
> all other data and parity packets are present in the array. The *outputs* 
> array's length is 3; the d1, d4 and p1 packets should be reconstructed into 
> it. This would be the right behavior.
> Right now this example is instead called like this: *decode([ d0, d1, d2, d3, 
> d4, d5, -, -, - ], [ 1, 4, 7 ], [ -, -, - ])*. So it has two main problems 
> with the *decodeInputs* array. Firstly, the packets are not set to NULL where 
> they should be, based on the *ERASED_INDEXES* array. Secondly, it does not 
> have any parity packets for decoding.
> The first problem is easy to solve: the values at the proper indexes need to 
> be set to NULL. The latter is a little harder, because right now multiple 
> rounds of encode operations are done one after another and, similarly, 
> multiple decode operations are called one by one. Encode and decode pairs 
> should be called one after another so that the encoded parity packets can be 
> used in the *decodeInputs* array as a parameter for decode. (Of course, their 
> performance should still be measured separately.)
> Moreover, there is one more problem in this file. Right now it works with RS 
> 6+3 and the *ERASED_INDEXES* array is fixed to *[ 6, 7, 8 ]*, so the three 
> parity packets need to be reconstructed. This means that no real decode 
> performance is measured, because no data packet needs to be reconstructed 
> (even if the decode works properly); effectively, only new parity packets 
> need to be encoded. The exact implementation depends on the underlying 
> erasure coding plugin, but the point is that data packets should also be 
> erased to measure real decode performance.
> In addition to this, more RS configurations (not just 6+3) could be measured 
> as well, to be able to compare them. A sketch of correct input preparation 
> follows.
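A sketch of preparing the decode inputs correctly for RS 6+3. Buffer types are simplified to byte[] and the decoder call is left commented out, since it depends on the coder plugin in use:

{code:java}
// Sketch: lay out all data and parity buffers in decodeInputs, null out
// the erased indexes, and size outputs to the number of erasures.
public class DecodeInputSketch {
  public static void main(String[] args) {
    int dataUnits = 6, parityUnits = 3, cellSize = 1024;
    byte[][] data = new byte[dataUnits][cellSize];
    byte[][] parity = new byte[parityUnits][cellSize]; // filled by encode()

    int[] erasedIndexes = {1, 4, 7};                   // d1, d4, p1

    byte[][] decodeInputs = new byte[dataUnits + parityUnits][];
    for (int i = 0; i < dataUnits; i++) {
      decodeInputs[i] = data[i];
    }
    for (int i = 0; i < parityUnits; i++) {
      decodeInputs[dataUnits + i] = parity[i];
    }
    for (int idx : erasedIndexes) {
      decodeInputs[idx] = null;                        // mark the erasures
    }

    byte[][] outputs = new byte[erasedIndexes.length][cellSize];
    // decoder.decode(decodeInputs, erasedIndexes, outputs);
    System.out.println("prepared inputs; " + outputs.length
        + " packets to reconstruct");
  }
}
{code}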



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11489) KMS should warning about weak SSL ciphers

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802510#comment-17802510
 ] 

Shilun Fan commented on HDFS-11489:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> KMS should warning about weak SSL ciphers
> -
>
> Key: HDFS-11489
> URL: https://issues.apache.org/jira/browse/HDFS-11489
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: kms
>Affects Versions: 2.9.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>
> HADOOP-14083 sets a list of default ciphers that contains a few weak ciphers 
> in order to maintain backwards compatibility. In addition, users can select 
> weak ciphers via the env var {{KMS_SSL_CIPHERS}}. It'd be nice to get 
> warnings about the weak ciphers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11542) Fix RawErasureCoderBenchmark decoding operation

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802507#comment-17802507
 ] 

Shilun Fan commented on HDFS-11542:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Fix RawErasureCoderBenchmark decoding operation
> ---
>
> Key: HDFS-11542
> URL: https://issues.apache.org/jira/browse/HDFS-11542
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha2
>Reporter: László Bence Nagy
>Priority: Minor
>  Labels: test
>
> There are some issues with the decode operation in the 
> *RawErasureCoderBenchmark.java* file. The decoding method is called like 
> this: *decoder.decode(decodeInputs, ERASED_INDEXES, outputs);*.
> Using an RS 6+3 configuration, it could be called correctly with these 
> parameters: *decode([ d0, NULL, d2, d3, NULL, d5, p0, NULL, p2 ], [ 1, 4, 7 ], 
> [ -, -, - ])*. The indexes 1, 4, 7 are in the *ERASED_INDEXES* array, so in 
> the *decodeInputs* array the values at those indexes are set to NULL, while 
> all other data and parity packets are present in the array. The *outputs* 
> array's length is 3; the d1, d4 and p1 packets should be reconstructed into 
> it. This would be the right behavior.
> Right now this example is instead called like this: *decode([ d0, d1, d2, d3, 
> d4, d5, -, -, - ], [ 1, 4, 7 ], [ -, -, - ])*. So it has two main problems 
> with the *decodeInputs* array. Firstly, the packets are not set to NULL where 
> they should be, based on the *ERASED_INDEXES* array. Secondly, it does not 
> have any parity packets for decoding.
> The first problem is easy to solve: the values at the proper indexes need to 
> be set to NULL. The latter is a little harder, because right now multiple 
> rounds of encode operations are done one after another and, similarly, 
> multiple decode operations are called one by one. Encode and decode pairs 
> should be called one after another so that the encoded parity packets can be 
> used in the *decodeInputs* array as a parameter for decode. (Of course, their 
> performance should still be measured separately.)
> Moreover, there is one more problem in this file. Right now it works with RS 
> 6+3 and the *ERASED_INDEXES* array is fixed to *[ 6, 7, 8 ]*, so the three 
> parity packets need to be reconstructed. This means that no real decode 
> performance is measured, because no data packet needs to be reconstructed 
> (even if the decode works properly); effectively, only new parity packets 
> need to be encoded. The exact implementation depends on the underlying 
> erasure coding plugin, but the point is that data packets should also be 
> erased to measure real decode performance.
> In addition to this, more RS configurations (not just 6+3) could be measured 
> as well, to be able to compare them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11543) Test multiple erasure coding implementations

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802506#comment-17802506
 ] 

Shilun Fan commented on HDFS-11543:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Test multiple erasure coding implementations
> 
>
> Key: HDFS-11543
> URL: https://issues.apache.org/jira/browse/HDFS-11543
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha2
>Reporter: László Bence Nagy
>Priority: Minor
>  Labels: test
>
> Potentially, multiple native erasure coding plugins will be available to be 
> used from HDFS later on. These plugins should be tested as well. For example, 
> the *NativeRSRawErasureCoderFactory* class - which is used for instantiating 
> the native ISA-L plugin's encoder and decoder objects - is used in 5 test 
> files under the 
> *hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/*
>  directory. The files are:
> - *TestDFSStripedInputStream.java*
> - *TestDFSStripedOutputStream.java*
> - *TestDFSStripedOutputStreamWithFailure.java*
> - *TestReconstructStripedFile.java*
> - *TestUnsetAndChangeDirectoryEcPolicy.java*
> Other erasure coding plugins should be tested in these cases as well, in a 
> nice way (not, for example, by making a new file for every new erasure 
> coding plugin). For this purpose [parameterized 
> tests|https://github.com/junit-team/junit4/wiki/parameterized-tests] might be 
> used.
> This is also true for the 
> *hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/rawcoder/*
>  directory where this approach could be used for example for the 
> interoperability tests (when it is checked that certain erasure coding 
> implementations are compatible with each other by doing the encoding and 
> decoding operations with different plugins and verifying their results). The 
> plugin pairs which should be tested could be the parameters for the 
> parameterized tests.
> The parameterized test is just an idea; there can be other solutions as well. 
> A minimal sketch follows.
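A minimal JUnit 4 parameterized sketch; the factory names below are placeholders for whichever coder implementations end up being tested, and the test body is intentionally left as an outline:

{code:java}
// Minimal JUnit 4 parameterized-test sketch.
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestCoderImplsSketch {
  @Parameterized.Parameters(name = "{0}")
  public static Collection<Object[]> factories() {
    return Arrays.asList(new Object[][] {
        {"RSRawErasureCoderFactory"},
        {"NativeRSRawErasureCoderFactory"},
    });
  }

  private final String factoryName;

  public TestCoderImplsSketch(String factoryName) {
    this.factoryName = factoryName;
  }

  @Test
  public void testEncodeDecodeRoundTrip() {
    // Instantiate the coder via factoryName and run an encode/decode round
    // trip here; body omitted to keep the sketch short.
  }
}
{code}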



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11543) Test multiple erasure coding implementations

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11543:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Test multiple erasure coding implementations
> 
>
> Key: HDFS-11543
> URL: https://issues.apache.org/jira/browse/HDFS-11543
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha2
>Reporter: László Bence Nagy
>Priority: Minor
>  Labels: test
>
> Potentially, multiple native erasure coding plugins will be available to be 
> used from HDFS later on. These plugins should be tested as well. For example, 
> the *NativeRSRawErasureCoderFactory* class - which is used for instantiating 
> the native ISA-L plugin's encoder and decoder objects - is used in 5 test 
> files under the 
> *hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/*
>  directory. The files are:
> - *TestDFSStripedInputStream.java*
> - *TestDFSStripedOutputStream.java*
> - *TestDFSStripedOutputStreamWithFailure.java*
> - *TestReconstructStripedFile.java*
> - *TestUnsetAndChangeDirectoryEcPolicy.java*
> Other erasure coding plugins should be tested in these cases as well, in a 
> nice way (not, for example, by making a new file for every new erasure 
> coding plugin). For this purpose [parameterized 
> tests|https://github.com/junit-team/junit4/wiki/parameterized-tests] might be 
> used.
> This is also true for the 
> *hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/rawcoder/*
>  directory where this approach could be used for example for the 
> interoperability tests (when it is checked that certain erasure coding 
> implementations are compatible with each other by doing the encoding and 
> decoding operations with different plugins and verifying their results). The 
> plugin pairs which should be tested could be the parameters for the 
> parameterized tests.
> The parameterized test is just an idea; there can be other solutions as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11584) TestErasureCodeBenchmarkThroughput#testECReadWrite fails intermittently

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-11584:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> TestErasureCodeBenchmarkThroughput#testECReadWrite fails intermittently
> ---
>
> Key: HDFS-11584
> URL: https://issues.apache.org/jira/browse/HDFS-11584
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Priority: Blocker
>  Labels: flaky-test
> Attachments: TestECCodeBenchmark.fail.log
>
>
> TestErasureCodeBenchmarkThroughput.testECReadWrite has been failing 
> intermittently. Attached logs from the recent precheckin failure run.
> https://builds.apache.org/job/PreCommit-HADOOP-Build/11907/testReport/org.apache.hadoop.hdfs/TestErasureCodeBenchmarkThroughput/testECReadWrite/
> https://builds.apache.org/job/PreCommit-HADOOP-Build/11888/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure110/testAddBlockWhenNoSufficientDataBlockNumOfNodes/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11584) TestErasureCodeBenchmarkThroughput#testECReadWrite fails intermittently

2024-01-03 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802497#comment-17802497
 ] 

Shilun Fan commented on HDFS-11584:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> TestErasureCodeBenchmarkThroughput#testECReadWrite fails intermittently
> ---
>
> Key: HDFS-11584
> URL: https://issues.apache.org/jira/browse/HDFS-11584
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Priority: Blocker
>  Labels: flaky-test
> Attachments: TestECCodeBenchmark.fail.log
>
>
> TestErasureCodeBenchmarkThroughput.testECReadWrite has been failing 
> intermittently. Attached logs from the recent precheckin failure run.
> https://builds.apache.org/job/PreCommit-HADOOP-Build/11907/testReport/org.apache.hadoop.hdfs/TestErasureCodeBenchmarkThroughput/testECReadWrite/
> https://builds.apache.org/job/PreCommit-HADOOP-Build/11888/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure110/testAddBlockWhenNoSufficientDataBlockNumOfNodes/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


