[jira] [Commented] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state

2021-06-09 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360489#comment-17360489
 ] 

tomscut commented on HDFS-16057:


Thanks [~weichiu] for pointing that out. I updated the unit tests based on your 
suggestions. If you are free, please help take a look.

> Make sure the order for location in ENTERING_MAINTENANCE state
> --
>
> Key: HDFS-16057
> URL: https://issues.apache.org/jira/browse/HDFS-16057
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We use a comparator to sort locations in getBlockLocations(), and the expected 
> order is: live -> stale -> entering_maintenance -> decommissioned.
> But networktopology.sortByDistance() will disrupt that order. We should also 
> filter out nodes in state AdminStates.ENTERING_MAINTENANCE before calling 
> networktopology.sortByDistance().
>  
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
> {code:java}
> DatanodeInfoWithStorage[] di = lb.getLocations();
> // Move decommissioned/stale datanodes to the bottom
> Arrays.sort(di, comparator);
> // Sort nodes by network distance only for located blocks
> int lastActiveIndex = di.length - 1;
> while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
>   --lastActiveIndex;
> }
> int activeLen = lastActiveIndex + 1;
> if(nonDatanodeReader) {
>   networktopology.sortByDistanceUsingNetworkLocation(client,
>   lb.getLocations(), activeLen, createSecondaryNodeSorter());
> } else {
>   networktopology.sortByDistance(client, lb.getLocations(), activeLen,
>   createSecondaryNodeSorter());
> }
> {code}
>  






[jira] [Updated] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state

2021-06-09 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut updated HDFS-16057:
---
Description: 
We use a comparator to sort locations in getBlockLocations(), and the expected 
order is: live -> stale -> entering_maintenance -> decommissioned.

But networktopology.sortByDistance() will disrupt that order. We should also 
filter out nodes in state AdminStates.ENTERING_MAINTENANCE before calling 
networktopology.sortByDistance().

 

org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
{code:java}
DatanodeInfoWithStorage[] di = lb.getLocations();
// Move decommissioned/stale datanodes to the bottom
Arrays.sort(di, comparator);

// Sort nodes by network distance only for located blocks
int lastActiveIndex = di.length - 1;
while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
  --lastActiveIndex;
}
int activeLen = lastActiveIndex + 1;
if(nonDatanodeReader) {
  networktopology.sortByDistanceUsingNetworkLocation(client,
  lb.getLocations(), activeLen, createSecondaryNodeSorter());
} else {
  networktopology.sortByDistance(client, lb.getLocations(), activeLen,
  createSecondaryNodeSorter());
}
{code}
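A minimal sketch of the idea, assuming the existing isInactive() check in 
DatanodeManager is simply widened so ENTERING_MAINTENANCE nodes also stay 
outside the range handed to the distance sort (illustrative only, not 
necessarily the committed patch):
{code:java}
// Illustrative sketch: treat entering-maintenance nodes as "inactive" too, so
// the while-loop above keeps them (like decommissioned/stale nodes) outside the
// activeLen range that networktopology.sortByDistance() is allowed to reorder.
private boolean isInactive(DatanodeInfo datanode) {
  return datanode.isDecommissioned()
      || datanode.isEnteringMaintenance()
      || (avoidStaleDataNodesForRead && datanode.isStale(staleInterval));
}
{code}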
 

  was:
We use compactor to sort locations in getBlockLocations(), and the expected 
result is: live -> stale -> entering_maintenance -> decommissioned.

But the networktopology. SortByDistance() will disrupt the order. We should 
also filtered out node in sate  AdminStates.ENTERING_MAINTENANCE before 
networktopology. SortByDistance().

 

org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
{code:java}
DatanodeInfoWithStorage[] di = lb.getLocations();
// Move decommissioned/stale datanodes to the bottom
Arrays.sort(di, comparator);

// Sort nodes by network distance only for located blocks
int lastActiveIndex = di.length - 1;
while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
  --lastActiveIndex;
}
int activeLen = lastActiveIndex + 1;
if(nonDatanodeReader) {
  networktopology.sortByDistanceUsingNetworkLocation(client,
  lb.getLocations(), activeLen, createSecondaryNodeSorter());
} else {
  networktopology.sortByDistance(client, lb.getLocations(), activeLen,
  createSecondaryNodeSorter());
}
{code}
 


> Make sure the order for location in ENTERING_MAINTENANCE state
> --
>
> Key: HDFS-16057
> URL: https://issues.apache.org/jira/browse/HDFS-16057
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We use a comparator to sort locations in getBlockLocations(), and the expected 
> order is: live -> stale -> entering_maintenance -> decommissioned.
> But networktopology.sortByDistance() will disrupt that order. We should also 
> filter out nodes in state AdminStates.ENTERING_MAINTENANCE before calling 
> networktopology.sortByDistance().
>  
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
> {code:java}
> DatanodeInfoWithStorage[] di = lb.getLocations();
> // Move decommissioned/stale datanodes to the bottom
> Arrays.sort(di, comparator);
> // Sort nodes by network distance only for located blocks
> int lastActiveIndex = di.length - 1;
> while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
>   --lastActiveIndex;
> }
> int activeLen = lastActiveIndex + 1;
> if(nonDatanodeReader) {
>   networktopology.sortByDistanceUsingNetworkLocation(client,
>   lb.getLocations(), activeLen, createSecondaryNodeSorter());
> } else {
>   networktopology.sortByDistance(client, lb.getLocations(), activeLen,
>   createSecondaryNodeSorter());
> }
> {code}
>  






[jira] [Comment Edited] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor

2021-06-09 Thread Chenren Shao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360122#comment-17360122
 ] 

Chenren Shao edited comment on HDFS-14099 at 6/9/21, 9:53 PM:
--

Hi, all. 

I found that Hadoop cannot process multi-frame zstd files. We applied this 
patch and were still not able to process them; the error message is the same 
as the one posted here.

 

I attached the problematic file 
[here|https://drive.google.com/file/d/12oGYQL63jmSBDwFi208jDNzFSP_CrraL/view?usp=sharing]
 and the issue can be reproduced by reading it via Spark. The file was created 
by essentially running `cat file1.zst file2.zst > output.zst`. You can run 
`zstd -d output.zst` to decompress it without any issue, but spark.read fails. 
Reading file1.zst or file2.zst individually with Spark works fine.


was (Author: cshao239):
Hi, all. 

I found that hadoop cannot process multi-frame zstd files and we applied this 
patch and still was not able to process it. The error message is the same as 
the one posted here.

 

I will try to attach the problematic file here and we can reproduce the issue 
by reading it via spark. This file was created by essentially running `cat 
file1.zst file2.zst > output.zst`. You can run `zstd -d output.zst` to 
decompress it without any issue, but spark.read will cause problem. Spark read 
of file1.zst and file2.zst doesn't have problem.

> Unknown frame descriptor when decompressing multiple frames in 
> ZStandardDecompressor
> 
>
> Key: HDFS-14099
> URL: https://issues.apache.org/jira/browse/HDFS-14099
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Hadoop Version: hadoop-3.0.3
> Java Version: 1.8.0_144
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-14099-trunk-001.patch, HDFS-14099-trunk-002.patch, 
> HDFS-14099-trunk-003.patch
>
>
> We need to use the ZSTD compression algorithm in Hadoop, so I wrote a simple 
> demo like this for testing.
> {code:java}
> // code placeholder
> while ((size = fsDataInputStream.read(bufferV2)) > 0 ) {
>   countSize += size;
>   if (countSize == 65536 * 8) {
> if(!isFinished) {
>   // finish a frame in zstd
>   cmpOut.finish();
>   isFinished = true;
> }
> fsDataOutputStream.flush();
> fsDataOutputStream.hflush();
>   }
>   if(isFinished) {
> LOG.info("Will resetState. N=" + n);
> // reset the stream and write again
> cmpOut.resetState();
> isFinished = false;
>   }
>   cmpOut.write(bufferV2, 0, size);
>   bufferV2 = new byte[5 * 1024 * 1024];
>   n++;
> }
> {code}
>  
> And I used "*hadoop fs -text*" to read this file and it failed. The error is 
> shown below.
> {code:java}
> Exception in thread "main" java.lang.InternalError: Unknown frame descriptor
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native
>  Method)
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
> at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
> at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
> at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {code}
>  
> So I had to look into the code, including the JNI part, and found this bug.
> The *ZSTD_initDStream(stream)* method may be called twice within the same *Frame*.
> The first call is in *ZStandardDecompressor.c*.
> {code:java}
> if (size == 0) {
> (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, 
> JNI_TRUE);
> size_t result = dlsym_ZSTD_initDStream(stream);
> if (dlsym_ZSTD_isError(result)) {
> THROW(env, "java/lang/InternalError", 
> dlsym_ZSTD_getErrorName(result));
> return 

[jira] [Commented] (HDFS-15971) Make mkstemp cross platform

2021-06-09 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360371#comment-17360371
 ] 

Jim Brennan commented on HDFS-15971:


[~gautham], thanks for your response. I have upgraded everything on my linux VM 
based on what is in the centos 7 docker image, and I have been able to get 
native hadoop to compile.

I think improving the Windows build experience is a worthy goal. I am a bit 
concerned about how much of an upgrade this is on the linux (gcc) side, going 
from 4.8.5 to 9.3.0. Maybe this is OK to do in trunk, but if the intention is 
to push this back to earlier branch-3 branches, that might be a problem for 
groups using those branches. I'm not sure what the guidelines are for upgrading 
the build environment in minor releases. As you say, maybe it is worth it in 
this case.

Note that I can build trunk with 4.8.5 if I revert the change to this line in 
build_libhdfs_test in hadoop-hdfs-native-client/.../src/CMakeLists.txt:
{code:java}
add_executable("${NAME}_${LIBRARY}" $ 
$ ${FILES})
{code}
But I think that just prevents it from building the x_platform_obj.
 I'm not sure if it was building the x_platform stuff prior to this commit, or 
if it was just silently failing with 4.8.5?
{quote}*I'm also puzzled by this failing check. This did not fail on 4.8.5 
before this commit (and the warning message hasn't been updated either):*
 This is coming from here. Let me take a look at this and get back to you. 
Nevertheless, we don't face this issue when GCC is upgraded to 9.3.0.
{quote}
I am curious to see what you find. I would like to understand better what 
specifically in this commit is causing this check to fail with gcc 4.8.5. It 
wasn't failing before, and you didn't modify this check.

> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: https://issues.apache.org/jira/browse/HDFS-15971
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: Dockerfile_centos_7, build-log.zip, commit-details.txt
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> mkstemp isn't available in Visual C++. Need to make it cross platform.






[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=609363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609363
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 20:10
Start Date: 09/Jun/21 20:10
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#discussion_r648647841



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache getDnCache() {
+return dnCache;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated long.
+   */
+  public long getNameserviceAggregatedLong(
+  DatanodeReportType type, ToLongFunction f){
+long size = 0;
+try {
+  size = Arrays.stream(dnCache.get(type)).mapToLong(f).sum();

Review comment:
   Extract the get(type)?

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache getDnCache() {
+return dnCache;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated long.
+   */
+  public long getNameserviceAggregatedLong(
+  DatanodeReportType type, ToLongFunction f){
+long size = 0;
+try {
+  size = Arrays.stream(dnCache.get(type)).mapToLong(f).sum();
+} catch (ExecutionException e) {
+  LOG.debug("Cannot get " + type + " nodes", e.getMessage());
+}
+return size;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated Integer.
+   */
+  public int getNameserviceAggregatedInt(
+  DatanodeReportType type, Predicate f){
+int size = 0;
+try {
+  Arrays.stream(dnCache.get(DatanodeReportType.LIVE)).filter(f).count();

Review comment:
   We are not updating size at all, are we?
   It is also not the most intuitive code to read; maybe extract a little.
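   A small sketch of what the corrected helper could look like (illustrative 
   only; the generic parameters that the mail archive stripped are re-added 
   here as assumptions):
{code:java}
// Illustrative sketch addressing the comment above: actually return the
// filtered count, and use the requested report type instead of hard-coding LIVE.
public int getNameserviceAggregatedInt(
    DatanodeReportType type, Predicate<DatanodeInfo> f) {
  try {
    return (int) Arrays.stream(dnCache.get(type)).filter(f).count();
  } catch (ExecutionException e) {
    LOG.debug("Cannot get {} nodes", type, e);
    return 0;
  }
}
{code}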

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/NamenodeBeanMetrics.java
##
@@ -500,6 +498,8 @@ private String getNodesImpl(final DatanodeReportType type) {
   LOG.error("Cannot get {} nodes, subclusters timed out responding", type);
 } catch (IOException e) {
   LOG.error("Cannot get " + type + " nodes", e);
+} catch (ExecutionException e) {
+  LOG.error("Cannot get " + type + " nodes", e);

Review comment:
   Do we support logger {}?

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache getDnCache() {

Review comment:
   Add a javadoc explaining the purpose.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java
##
@@ -372,12 +420,69 @@ private static void setStateStoreVersions(
 
   @Override
   public long getTotalCapacity() {
-return getNameserviceAggregatedLong(MembershipStats::getTotalSpace);
+return getNameserviceAggregatedLong(
+DatanodeReportType.LIVE, DatanodeInfo::getCapacity);
+  }
+
+  public LoadingCache getDnCache() {
+return dnCache;
+  }
+
+  /**
+   * Get the aggregated value for a DatanodeReportType and
+   * a method for all nameservices.
+   * @param type a DatanodeReportType
+   * @param f Method reference
+   * @return Aggregated long.
+   */
+  public long 

[jira] [Created] (HDFS-16061) DFTestUtil.waitReplication can produce false positives

2021-06-09 Thread Ahmed Hussein (Jira)
Ahmed Hussein created HDFS-16061:


 Summary: DFTestUtil.waitReplication can produce false positives
 Key: HDFS-16061
 URL: https://issues.apache.org/jira/browse/HDFS-16061
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein


While checking the intermittent failure in 
TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault described in HDFS-15146, I 
found that the implementation of waitReplication is incorrect.
In the last iteration, when {{correctReplFactor}} is {{false}}, the thread 
sleeps for 1 second and a {{TimeoutException}} is then thrown without checking 
whether the replication completed during that last second.
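A rough sketch of the intended loop shape (illustrative only; 
{{checkReplication}} stands in for the existing per-block check inside 
{{waitReplication}}):
{code:java}
// Illustrative sketch: re-check the replication factor after the final sleep
// instead of throwing TimeoutException unconditionally on the last iteration.
long deadline = Time.monotonicNow() + timeout;
boolean correctReplFactor = checkReplication(fs, file, replFactor); // assumed helper
while (!correctReplFactor) {
  if (Time.monotonicNow() > deadline) {
    throw new TimeoutException("Timed out waiting for " + file
        + " to reach " + replFactor + " replicas");
  }
  Thread.sleep(1000);
  correctReplFactor = checkReplication(fs, file, replFactor); // check again after sleeping
}
{code}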







[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor

2021-06-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360243#comment-17360243
 ] 

Hadoop QA commented on HDFS-14099:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
44s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
 1s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
32s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
56s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 54s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
32s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 22m 
48s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m 
21s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
53s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 21m 53s{color} 
| 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/619/artifact/out/diff-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt{color}
 | {color:red} root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 2 new + 1998 unchanged - 0 
fixed = 2000 total (was 1998) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m  
8s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 19m  8s{color} 
| 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/619/artifact/out/diff-compile-javac-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt{color}
 | {color:red} root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with 
JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 2 new + 1873 
unchanged - 0 fixed = 1875 total (was 1873) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has 

[jira] [Reopened] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk

2021-06-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein reopened HDFS-15671:
--

> TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
> --
>
> Key: HDFS-15671
> URL: https://issues.apache.org/jira/browse/HDFS-15671
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Priority: Major
> Attachments: TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault.log
>
>
> qbt report shows failures on TestBalancer
> {code:bash}
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault
> Failing for the past 1 build (Since Failed#317 )
> Took 45 sec.
> Error Message
> Timed out waiting for /tmp.txt to reach 20 replicas
> Stacktrace
> java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to 
> reach 20 replicas
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:829)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:319)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:865)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2193)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-06-09 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359902#comment-17359902
 ] 

Viraj Jasani edited comment on HDFS-15982 at 6/9/21, 3:46 PM:
--

{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems that we don't follow this everywhere.

The DFS client path (NameNode -> FSNameSystem#delete -> FSDirDeleteOp#delete -> 
deleteInternal -> delete -> deleteAllowed) doesn't follow this, and I just 
quickly tested HttpFS with WebHdfs and LocalFS; the semantic is not followed 
there either. For a non-existing file, FS#delete returns false.

Although I agree that delete(path) should return true for a non-existing path, 
if we change this behaviour (as part of a separate Jira), it will be another 
incompatible change.

 

Edit: It seems even Abfs returns false when deleting a non-existent path:

 
{code:java}
try {
  abfsStore.delete(qualifiedPath, recursive);
  return true;
} catch (AzureBlobFileSystemException ex) {
  checkException(f, ex, AzureServiceErrorCode.PATH_NOT_FOUND);
  return false;
}
{code}
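For context, the behaviour being described is easy to confirm from user code 
(hypothetical snippet, the path is a placeholder):
{code:java}
// Hypothetical check of the observed semantics: on HDFS/LocalFS today, deleting
// a path that does not exist returns false rather than true.
FileSystem fs = FileSystem.get(new Configuration());
boolean deleted = fs.delete(new Path("/no/such/path"), true);
System.out.println("delete returned " + deleted); // prints false, no exception
{code}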
 

 


was (Author: vjasani):
{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems that we don't follow this everywhere.

DFS client (NameNode -> FSNameSystem#delete) doesn't follow this and I just 
quickly tested Http FS with WebHdfs and LocalFS, and this semantic is not 
followed. For non existing file, FS#delete returns false.

Although I agree that delete(path) should return true for non-existing path, if 
we change this behaviour (as part of separate Jira), it will be an incompatible 
change.

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, httpfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and only be removed after the trash 
> interval has elapsed. Currently, data is removed from the system directly 
> [this behavior should be the same as the CLI cmd].
> This can be helpful when a user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.






[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-06-09 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359902#comment-17359902
 ] 

Viraj Jasani edited comment on HDFS-15982 at 6/9/21, 3:46 PM:
--

{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems that we don't follow this everywhere.

The DFS client path (NameNode -> FSNameSystem#delete -> FSDirDeleteOp#delete -> 
deleteInternal -> delete -> deleteAllowed) doesn't follow this, and I just 
quickly tested HttpFS with WebHdfs and LocalFS; the semantic is not followed 
there either. For a non-existing file, FS#delete returns false.

Although I agree that delete(path) should return true for a non-existing path, 
if we change this behaviour (as part of a separate Jira), it will be another 
incompatible change.

 

Edit: It seems even Abfs returns false when deleting a non-existent path:
{code:java}
try {
  abfsStore.delete(qualifiedPath, recursive);
  return true;
} catch (AzureBlobFileSystemException ex) {
  checkException(f, ex, AzureServiceErrorCode.PATH_NOT_FOUND);
  return false;
}
{code}


was (Author: vjasani):
{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems that we don't follow this everywhere.

DFS client (NameNode -> FSNameSystem#delete -> FSDirDeleteOp#delete -> 
deleteInternal -> delete -> deleteAllowed) doesn't follow this and I just 
quickly tested Http FS with WebHdfs and LocalFS, and this semantic is not 
followed. For non existing file, FS#delete returns false.

Although I agree that delete(path) should return true for non-existing path, if 
we change this behaviour (as part of separate Jira), it will be another 
incompatible change.

 

Edit: Even Abfs returns false while deleting non-existent path it seems

 
{code:java}
try {
  abfsStore.delete(qualifiedPath, recursive);
  return true;
} catch (AzureBlobFileSystemException ex) {
  checkException(f, ex, AzureServiceErrorCode.PATH_NOT_FOUND);
  return false;
}
{code}
 

 

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, httpfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and only be removed after the trash 
> interval has elapsed. Currently, data is removed from the system directly 
> [this behavior should be the same as the CLI cmd].
> This can be helpful when a user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.






[jira] [Updated] (HDFS-15146) TestBalancerRPCDelay. testBalancerRPCDelayQpsDefault fails intermittently

2021-06-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15146:
-
Description: 
TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently when 
the number of blocks does not match the expected count. In 
{{testBalancerRPCDelay}}, it seems that some datanodes are not yet up by the 
time we fetch the block locations.

I see the following stack trace:
{code:bash}
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 39.969 
s <<< FAILURE! - in org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay
[ERROR] 
testBalancerRPCDelayQpsDefault(org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay)
  Time elapsed: 12.035 s  <<< FAILURE!
java.lang.AssertionError: Number of getBlocks should be not less than 20
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2197)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}
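One way the test setup could be hardened, sketched here for illustration only 
(not the committed patch; {{waitActive}} and {{waitReplication}} are existing 
helpers, the rest is assumed):
{code:java}
// Illustrative sketch: make sure every datanode has registered and the test
// file is fully replicated before the balancer starts counting getBlocks calls.
cluster.waitActive();                          // all DNs registered with the NN
DFSTestUtil.waitReplication(fs, filePath, (short) numOfDatanodes); // placeholder values
// only then run the balancer and assert on the observed getBlocks count
{code}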

  was:
TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently when the number 
of blocks does not match the expected. In {{testBalancerRPCDelay}}, it seems 
like some datanodes will not be up by the time we fetch the block locations.

I see the following stack trace:

{code:bash}
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 39.969 
s <<< FAILURE! - in org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay
[ERROR] 
testBalancerRPCDelayQpsDefault(org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay)
  Time elapsed: 12.035 s  <<< FAILURE!
java.lang.AssertionError: Number of getBlocks should be not less than 20
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2197)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}




> TestBalancerRPCDelay. testBalancerRPCDelayQpsDefault fails intermittently
> -
>
> Key: HDFS-15146
> URL: https://issues.apache.org/jira/browse/HDFS-15146
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: HDFS-15146-branch-2.10.001.patch, HDFS-15146.001.patch
>
>
> TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently when 
> the number of 

[jira] [Resolved] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk

2021-06-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein resolved HDFS-15671.
--
Resolution: Duplicate

duplicate of HDFS-15146

> TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
> --
>
> Key: HDFS-15671
> URL: https://issues.apache.org/jira/browse/HDFS-15671
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Priority: Major
> Attachments: TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault.log
>
>
> qbt report shows failures on TestBalancer
> {code:bash}
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault
> Failing for the past 1 build (Since Failed#317 )
> Took 45 sec.
> Error Message
> Timed out waiting for /tmp.txt to reach 20 replicas
> Stacktrace
> java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to 
> reach 20 replicas
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:829)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:319)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:865)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2193)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDFS-15146) TestBalancerRPCDelay. testBalancerRPCDelayQpsDefault fails intermittently

2021-06-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15146:
-
Summary: TestBalancerRPCDelay. testBalancerRPCDelayQpsDefault fails 
intermittently  (was: TestBalancerRPCDelay.testBalancerRPCDelay fails 
intermittently)

> TestBalancerRPCDelay. testBalancerRPCDelayQpsDefault fails intermittently
> -
>
> Key: HDFS-15146
> URL: https://issues.apache.org/jira/browse/HDFS-15146
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: HDFS-15146-branch-2.10.001.patch, HDFS-15146.001.patch
>
>
> TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently when the 
> number of blocks does not match the expected. In {{testBalancerRPCDelay}}, it 
> seems like some datanodes will not be up by the time we fetch the block 
> locations.
> I see the following stack trace:
> {code:bash}
> [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 39.969 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay
> [ERROR] 
> testBalancerRPCDelayQpsDefault(org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay)
>   Time elapsed: 12.035 s  <<< FAILURE!
> java.lang.AssertionError: Number of getBlocks should be not less than 20
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2197)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDFS-15146) TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently

2021-06-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15146:
-
Parent: HDFS-15646
Issue Type: Sub-task  (was: Bug)

> TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently
> --
>
> Key: HDFS-15146
> URL: https://issues.apache.org/jira/browse/HDFS-15146
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: HDFS-15146-branch-2.10.001.patch, HDFS-15146.001.patch
>
>
> TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently when the 
> number of blocks does not match the expected. In {{testBalancerRPCDelay}}, it 
> seems like some datanodes will not be up by the time we fetch the block 
> locations.
> I see the following stack trace:
> {code:bash}
> [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 39.969 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay
> [ERROR] 
> testBalancerRPCDelayQpsDefault(org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay)
>   Time elapsed: 12.035 s  <<< FAILURE!
> java.lang.AssertionError: Number of getBlocks should be not less than 20
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2197)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Comment Edited] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor

2021-06-09 Thread Chenren Shao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360122#comment-17360122
 ] 

Chenren Shao edited comment on HDFS-14099 at 6/9/21, 2:35 PM:
--

Hi, all. 

I found that Hadoop cannot process multi-frame zstd files. We applied this 
patch and were still not able to process them; the error message is the same 
as the one posted here.

 

I will try to attach the problematic file here so the issue can be reproduced 
by reading it via Spark. The file was created by essentially running `cat 
file1.zst file2.zst > output.zst`. You can run `zstd -d output.zst` to 
decompress it without any issue, but spark.read fails. Reading file1.zst or 
file2.zst individually with Spark works fine.


was (Author: cshao239):
Hi, all. 

I found that hadoop cannot process multi-frame files and we applied this patch 
and still was not able to process it. The error message is the same as the one 
posted here.

 

I will try to attach the problematic file here and we can reproduce the issue 
by reading it via spark. This file was created by essentially running `cat 
file1.zst file2.zst > output.zst`. You can run `zstd -d output.zst` to 
decompress it without any issue, but spark.read will cause problem. Spark read 
of file1.zst and file2.zst doesn't have problem.

> Unknown frame descriptor when decompressing multiple frames in 
> ZStandardDecompressor
> 
>
> Key: HDFS-14099
> URL: https://issues.apache.org/jira/browse/HDFS-14099
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Hadoop Version: hadoop-3.0.3
> Java Version: 1.8.0_144
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-14099-trunk-001.patch, HDFS-14099-trunk-002.patch, 
> HDFS-14099-trunk-003.patch
>
>
> We need to use the ZSTD compression algorithm in Hadoop, so I wrote a simple 
> demo like this for testing.
> {code:java}
> // code placeholder
> while ((size = fsDataInputStream.read(bufferV2)) > 0 ) {
>   countSize += size;
>   if (countSize == 65536 * 8) {
> if(!isFinished) {
>   // finish a frame in zstd
>   cmpOut.finish();
>   isFinished = true;
> }
> fsDataOutputStream.flush();
> fsDataOutputStream.hflush();
>   }
>   if(isFinished) {
> LOG.info("Will resetState. N=" + n);
> // reset the stream and write again
> cmpOut.resetState();
> isFinished = false;
>   }
>   cmpOut.write(bufferV2, 0, size);
>   bufferV2 = new byte[5 * 1024 * 1024];
>   n++;
> }
> {code}
>  
> And I used "*hadoop fs -text*" to read this file and it failed. The error is 
> shown below.
> {code:java}
> Exception in thread "main" java.lang.InternalError: Unknown frame descriptor
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native
>  Method)
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
> at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
> at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
> at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {code}
>  
> So I had to look into the code, including the JNI part, and found this bug.
> The *ZSTD_initDStream(stream)* method may be called twice within the same *Frame*.
> The first call is in *ZStandardDecompressor.c*.
> {code:java}
> if (size == 0) {
> (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, 
> JNI_TRUE);
> size_t result = dlsym_ZSTD_initDStream(stream);
> if (dlsym_ZSTD_isError(result)) {
> THROW(env, "java/lang/InternalError", 
> dlsym_ZSTD_getErrorName(result));
> return (jint) 0;
> }
> }
> {code}
> This call here is correct, but *Finished* no longer 

[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor

2021-06-09 Thread Chenren Shao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360122#comment-17360122
 ] 

Chenren Shao commented on HDFS-14099:
-

Hi, all. 

I found that Hadoop cannot process multi-frame files. We applied this patch 
and were still not able to process them; the error message is the same as the 
one posted here.

 

I will try to attach the problematic file here so the issue can be reproduced 
by reading it via Spark. The file was created by essentially running `cat 
file1.zst file2.zst > output.zst`. You can run `zstd -d output.zst` to 
decompress it without any issue, but spark.read fails. Reading file1.zst or 
file2.zst individually with Spark works fine.

> Unknown frame descriptor when decompressing multiple frames in 
> ZStandardDecompressor
> 
>
> Key: HDFS-14099
> URL: https://issues.apache.org/jira/browse/HDFS-14099
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Hadoop Version: hadoop-3.0.3
> Java Version: 1.8.0_144
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-14099-trunk-001.patch, HDFS-14099-trunk-002.patch, 
> HDFS-14099-trunk-003.patch
>
>
> We need to use the ZSTD compression algorithm in Hadoop, so I wrote a simple 
> demo like this for testing.
> {code:java}
> // code placeholder
> while ((size = fsDataInputStream.read(bufferV2)) > 0 ) {
>   countSize += size;
>   if (countSize == 65536 * 8) {
> if(!isFinished) {
>   // finish a frame in zstd
>   cmpOut.finish();
>   isFinished = true;
> }
> fsDataOutputStream.flush();
> fsDataOutputStream.hflush();
>   }
>   if(isFinished) {
> LOG.info("Will resetState. N=" + n);
> // reset the stream and write again
> cmpOut.resetState();
> isFinished = false;
>   }
>   cmpOut.write(bufferV2, 0, size);
>   bufferV2 = new byte[5 * 1024 * 1024];
>   n++;
> }
> {code}
>  
> And I used "*hadoop fs -text*" to read this file and it failed. The error is 
> shown below.
> {code:java}
> Exception in thread "main" java.lang.InternalError: Unknown frame descriptor
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native
>  Method)
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
> at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
> at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
> at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {code}
>  
> So I had to look into the code, including the JNI part, and found this bug.
> The *ZSTD_initDStream(stream)* method may be called twice within the same *Frame*.
> The first call is in *ZStandardDecompressor.c*.
> {code:java}
> if (size == 0) {
> (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, 
> JNI_TRUE);
> size_t result = dlsym_ZSTD_initDStream(stream);
> if (dlsym_ZSTD_isError(result)) {
> THROW(env, "java/lang/InternalError", 
> dlsym_ZSTD_getErrorName(result));
> return (jint) 0;
> }
> }
> {code}
> This call here is correct, but *Finished* is no longer set back to false, even 
> if there is some data (a new frame) in *CompressedBuffer* or *UserBuffer* that 
> needs to be decompressed.
> The second call is in *org.apache.hadoop.io.compress.DecompressorStream*, via 
> *decompressor.reset()*, because *Finished* is always true after a *Frame* has 
> been decompressed.
> {code:java}
> if (decompressor.finished()) {
>   // First see if there was any leftover buffered input from previous
>   // stream; if not, attempt to refill buffer.  If refill -> EOF, we're
>   // all done; else reset, fix up input buffer, and get ready for next
>   // concatenated substream/"member".
>   int nRemaining = 
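
A minimal, JDK-only sketch of the concatenated-frame behaviour expected here 
(using gzip in place of zstd, purely for illustration, since the JDK ships no 
zstd codec): when one frame/member ends and compressed bytes remain, the 
decompressor should reset and keep decoding instead of reporting EOF.
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedFramesDemo {
  public static void main(String[] args) throws IOException {
    // Two independently finished compressed members back to back, analogous
    // to what the finish()/resetState() loop above produces with zstd frames.
    ByteArrayOutputStream concatenated = new ByteArrayOutputStream();
    for (String part : new String[] {"first frame|", "second frame"}) {
      try (GZIPOutputStream gz = new GZIPOutputStream(concatenated)) {
        gz.write(part.getBytes(StandardCharsets.UTF_8));
      } // close() finishes this member; the next iteration starts a new one
    }

    // A correct decompressor does not stop at the first frame boundary: when
    // a frame ends and compressed bytes remain, it resets and keeps reading.
    ByteArrayOutputStream decoded = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(
        new ByteArrayInputStream(concatenated.toByteArray()))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) > 0) {
        decoded.write(buf, 0, n);
      }
    }
    System.out.println(decoded.toString("UTF-8")); // first frame|second frame
  }
}
{code}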

[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=609138=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609138
 ]

ASF GitHub Bot logged work on HDFS-16016:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 13:25
Start Date: 09/Jun/21 13:25
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2998:
URL: https://github.com/apache/hadoop/pull/2998#issuecomment-857691934


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 46s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 374m 46s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2998/24/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | -1 :x: |  asflicense  |   0m 46s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2998/24/artifact/out/results-asflicense.txt)
 |  The patch generated 2 ASF License warnings.  |
   |  |   | 458m 59s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
   |   | 
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
   |   | hadoop.hdfs.TestDFSShell |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
   |   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   |   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2998/24/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2998 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e50bd05e6f89 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a5ca47922104dd6a4fc820de134062fd15863215 |
   | Default Java | Private 

[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-06-09 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359902#comment-17359902
 ] 

Viraj Jasani edited comment on HDFS-15982 at 6/9/21, 11:58 AM:
---

{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems that we don't follow this everywhere.

The DFS client (NameNode -> FSNamesystem#delete) doesn't follow this, and I 
just quickly tested HttpFS with WebHdfs and LocalFS: this semantic is not 
followed there either. For a non-existing file, FS#delete returns false.

Although I agree that delete(path) should return true for a non-existing path, 
if we change this behaviour (as part of a separate Jira), it will be an 
incompatible change.


was (Author: vjasani):
{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems that we don't follow this everywhere.

The DFS client (NameNode -> FSNamesystem#delete) doesn't follow this, and I 
just quickly tested HttpFS with WebHdfs and LocalFS: this semantic is not 
followed there either. For a non-existing file, FS#delete returns false.
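
A minimal sketch (not part of the original comment) of the quick check 
described above, run against the local filesystem; the path is arbitrary and 
only assumed not to exist.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteSemanticsCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path missing = new Path("/tmp/does-not-exist-" + System.nanoTime());
    // Current behaviour on LocalFS (and on WebHdfs/HttpFS per the comment
    // above): deleting a missing path returns false rather than acting as a
    // successful no-op.
    boolean deleted = fs.delete(missing, true);
    System.out.println("delete(missing) returned " + deleted); // expected: false
  }
}
{code}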

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, httpfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and only be removed after the trash 
> interval; currently, data is removed from the system directly [this behavior 
> should be the same as the CLI command].
> This can be helpful when the user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-06-09 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359902#comment-17359902
 ] 

Viraj Jasani commented on HDFS-15982:
-

{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems that we don't follow this everywhere.

The DFS client (NameNode -> FSNamesystem#delete) doesn't follow this, and I 
just quickly tested HttpFS with WebHdfs and LocalFS: this semantic is not 
followed there either. For a non-existing file, FS#delete returns false.

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, httpfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and only be removed after the trash 
> interval; currently, data is removed from the system directly [this behavior 
> should be the same as the CLI command].
> This can be helpful when the user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.
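
A rough sketch of the CLI-parity behaviour the description asks for. 
HttpDeleteHelper and deleteWithTrash are made-up names, not part of any patch; 
only the standard Trash.moveToAppropriateTrash and FileSystem.delete calls are 
assumed.
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public final class HttpDeleteHelper {
  private HttpDeleteHelper() {}

  /**
   * Delete the given path, routing it through the caller's trash unless
   * skipTrash is requested -- mirroring what "hdfs dfs -rm" does on the CLI.
   */
  public static boolean deleteWithTrash(FileSystem fs, Path path,
      boolean skipTrash, Configuration conf) throws IOException {
    if (!skipTrash && Trash.moveToAppropriateTrash(fs, path, conf)) {
      return true;                  // parked under .Trash until the interval expires
    }
    return fs.delete(path, true);   // skipTrash requested, or trash is disabled
  }
}
{code}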



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16060) There is an inconsistent between replicas of datanodes when hardware is abnormal

2021-06-09 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359898#comment-17359898
 ] 

Hui Fei commented on HDFS-16060:


Haven't found the root cause yet.

I think the DN recalculates the checksum for the input data (but it's wrong) 
and writes it to disk.
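
A minimal sketch (not from this thread) of the chunked CRC32C verification that 
the stack trace quoted below goes through, assuming the trunk signatures of 
DataChecksum#calculateChunkedSums and #verifyChunkedSums; flipping a single 
byte reproduces the kind of ChecksumException seen in the DataNode log.
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.util.DataChecksum;

public class ChunkedChecksumDemo {
  public static void main(String[] args) throws Exception {
    // Same primitive the BlockReceiver stack trace goes through: CRC32C is
    // computed per 512-byte chunk and verified chunk by chunk.
    DataChecksum checksum =
        DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, 512);

    byte[] payload = "some block data written through the pipeline"
        .getBytes(StandardCharsets.UTF_8);
    int chunks = (payload.length + checksum.getBytesPerChecksum() - 1)
        / checksum.getBytesPerChecksum();
    ByteBuffer data = ByteBuffer.wrap(payload);
    ByteBuffer sums = ByteBuffer.allocate(chunks * checksum.getChecksumSize());

    // Compute the per-chunk sums, then verify the data against them.
    checksum.calculateChunkedSums(data, sums);
    checksum.verifyChunkedSums(data, sums, "demo-block", 0);
    System.out.println("clean replica verifies");

    // Flipping one byte of the data reproduces the kind of ChecksumException
    // ("exp: ... got: ...") reported in the DataNode log.
    payload[0] ^= 0x1;
    try {
      checksum.verifyChunkedSums(ByteBuffer.wrap(payload), sums, "demo-block", 0);
    } catch (ChecksumException expected) {
      System.out.println("corruption detected: " + expected.getMessage());
    }
  }
}
{code}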

 

> There is an inconsistent between replicas of datanodes when hardware is 
> abnormal
> 
>
> Key: HDFS-16060
> URL: https://issues.apache.org/jira/browse/HDFS-16060
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Priority: Major
>
> We found the following case in a production environment.
>  * replicas of the same block are stored on dn1 and dn2.
>  * the replicas on dn1 and dn2 are different.
>  * verifying meta & data for the replica succeeds on dn1, and the same on dn2.
> The user code is just copyFromLocal.
> We first found this error log on the datanode:
> {quote}
> 2021-05-27 04:54:20,471 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Checksum error in block 
> BP-1453431581-x.x.x.x-1531302155027:blk_13892199285_12902824176 from 
> /y.y.y.y:47960
> org.apache.hadoop.fs.ChecksumException: Checksum error: 
> DFSClient_NONMAPREDUCE_-1760730985_129 at 0 exp: 37939694 got: -1180138774
>  at 
> org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
> Method)
>  at 
> org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
>  at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
>  at 
> org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:438)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:582)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:885)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:801)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
>  at java.lang.Thread.run(Thread.java:748)
> {quote}
> After this, a new pipeline is created and then wrong data and meta are 
> written to the disk file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16060) There is an inconsistent between replicas of datanodes when hardware is abnormal

2021-06-09 Thread Hui Fei (Jira)
Hui Fei created HDFS-16060:
--

 Summary: There is an inconsistent between replicas of datanodes 
when hardware is abnormal
 Key: HDFS-16060
 URL: https://issues.apache.org/jira/browse/HDFS-16060
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Hui Fei


We found the following case in a production environment.
 * replicas of the same block are stored on dn1 and dn2.
 * the replicas on dn1 and dn2 are different.
 * verifying meta & data for the replica succeeds on dn1, and the same on dn2.

The user code is just copyFromLocal.

We first found this error log on the datanode:

{quote}

2021-05-27 04:54:20,471 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Checksum error in block 
BP-1453431581-x.x.x.x-1531302155027:blk_13892199285_12902824176 from 
/y.y.y.y:47960
org.apache.hadoop.fs.ChecksumException: Checksum error: 
DFSClient_NONMAPREDUCE_-1760730985_129 at 0 exp: 37939694 got: -1180138774
 at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native 
Method)
 at 
org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
 at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
 at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:438)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:582)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:885)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:801)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
 at java.lang.Thread.run(Thread.java:748)

{quote}

After this, a new pipeline is created and then wrong data and meta are written 
to the disk file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=609034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609034
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 08:58
Start Date: 09/Jun/21 08:58
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086#issuecomment-857518062


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 17s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  20m 10s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 100m 37s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   |   | hadoop.hdfs.server.federation.metrics.TestRBFMetrics |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpc |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3086 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 5d8ec9349326 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | 

[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=609031=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609031
 ]

ASF GitHub Bot logged work on HDFS-13671:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 08:46
Start Date: 09/Jun/21 08:46
Worklog Time Spent: 10m 
  Work Description: aajisaka commented on a change in pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#discussion_r648098136



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
##
@@ -1996,7 +1996,12 @@ private void metaSave(PrintWriter out) {
 LightWeightHashSet openFileIds = new LightWeightHashSet<>();
 for (DatanodeDescriptor dataNode :
 blockManager.getDatanodeManager().getDatanodes()) {
-  for (long ucFileId : dataNode.getLeavingServiceStatus().getOpenFiles()) {
+  // Sort open files
+  LightWeightHashSet dnOpenFiles =
+  dataNode.getLeavingServiceStatus().getOpenFiles();
+  Long[] dnOpenFileIds = new Long[dnOpenFiles.size()];
+  Arrays.sort(dnOpenFiles.toArray(dnOpenFileIds));
+  for (Long ucFileId : dnOpenFileIds) {
 INode ucFile = getFSDirectory().getInode(ucFileId);
 if (ucFile == null || ucFileId <= prevId ||
 openFileIds.contains(ucFileId)) {

Review comment:
   Filed HDFS-16059




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609031)
Time Spent: 3.5h  (was: 3h 20m)

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13671-001.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks chunk by chunk in a loop.
> Actually the first step should be the more expensive operation and should 
> take more time. However, we now always see the NN hang during the remove 
> block operation. 
> Looking into this, 
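
A generic sketch of the second step described above (names and lock are made 
up, not Hadoop's actual code): the collected blocks are removed a fixed-size 
chunk at a time so that the write lock is released between chunks.
{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustration only; names are stand-ins, not the real NameNode classes. */
class ChunkedBlockRemoval {
  private static final int BLOCK_DELETION_INCREMENT = 1000; // blocks per chunk

  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();

  /** Remove the collected blocks a chunk at a time, yielding the lock between chunks. */
  void removeBlocks(List<Long> collectedBlockIds) {
    int start = 0;
    while (start < collectedBlockIds.size()) {
      int end = Math.min(start + BLOCK_DELETION_INCREMENT, collectedBlockIds.size());
      nsLock.writeLock().lock();
      try {
        for (long blockId : collectedBlockIds.subList(start, end)) {
          removeFromBlocksMap(blockId); // the per-block work that was slow here
        }
      } finally {
        nsLock.writeLock().unlock();
      }
      start = end; // other RPC handlers can grab the lock before the next chunk
    }
  }

  private void removeFromBlocksMap(long blockId) {
    // placeholder for the BlocksMap#removeBlock call in the stack trace above
  }
}
{code}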

[jira] [Commented] (HDFS-16059) dfsadmin -listOpenFiles -blockingDecommission can miss some files

2021-06-09 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359872#comment-17359872
 ] 

Akira Ajisaka commented on HDFS-16059:
--

The regression test result:
{quote}
[ERROR] Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.864 
s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem
[ERROR] 
testGetFilesBlockingDecomInternal(org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem)
  Time elapsed: 0.305 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<4>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at 
org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem.testGetFilesBlockingDecomInternal(TestFSNamesystem.java:372)
 {quote}

> dfsadmin -listOpenFiles -blockingDecommission can miss some files
> -
>
> Key: HDFS-16059
> URL: https://issues.apache.org/jira/browse/HDFS-16059
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsadmin
>Reporter: Akira Ajisaka
>Priority: Major
> Attachments: HDFS-16059-regression-test.patch
>
>
> While reviewing HDFS-13671, I found "dfsadmin -listOpenFiles 
> -blockingDecommission" can drop some files.
> [https://github.com/apache/hadoop/pull/3065#discussion_r647396463]
> {quote}If the DataNodes have the following open files and we want to list all 
> the open files:
> DN1: [1001, 1002, 1003, ... , 2000]
>  DN2: [1, 2, 3, ... , 1000]
> At first getFilesBlockingDecom(0, "/") is called and it returns [1001, 1002, 
> ... , 2000] because it reached max size (=1000), and next 
> getFilesBlockingDecom(2000, "/") is called because the last inode Id of the 
> previous result is 2000. That way the open files of DN2 are missed.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16059) dfsadmin -listOpenFiles -blockingDecommission can miss some files

2021-06-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-16059:
-
Attachment: HDFS-16059-regression-test.patch

> dfsadmin -listOpenFiles -blockingDecommission can miss some files
> -
>
> Key: HDFS-16059
> URL: https://issues.apache.org/jira/browse/HDFS-16059
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsadmin
>Reporter: Akira Ajisaka
>Priority: Major
> Attachments: HDFS-16059-regression-test.patch
>
>
> While reviewing HDFS-13671, I found "dfsadmin -listOpenFiles 
> -blockingDecommission" can drop some files.
> [https://github.com/apache/hadoop/pull/3065#discussion_r647396463]
> {quote}If the DataNodes have the following open files and we want to list all 
> the open files:
> DN1: [1001, 1002, 1003, ... , 2000]
>  DN2: [1, 2, 3, ... , 1000]
> At first getFilesBlockingDecom(0, "/") is called and it returns [1001, 1002, 
> ... , 2000] because it reached max size (=1000), and next 
> getFilesBlockingDecom(2000, "/") is called because the last inode Id of the 
> previous result is 2000. That way the open files of DN2 are missed.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16059) dfsadmin -listOpenFiles -blockingDecommission can miss some files

2021-06-09 Thread Akira Ajisaka (Jira)
Akira Ajisaka created HDFS-16059:


 Summary: dfsadmin -listOpenFiles -blockingDecommission can miss 
some files
 Key: HDFS-16059
 URL: https://issues.apache.org/jira/browse/HDFS-16059
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: dfsadmin
Reporter: Akira Ajisaka


While reviewing HDFS-13671, I found "dfsadmin -listOpenFiles 
-blockingDecommission" can drop some files.

[https://github.com/apache/hadoop/pull/3065#discussion_r647396463]
{quote}If the DataNodes have the following open files and we want to list all 
the open files:

DN1: [1001, 1002, 1003, ... , 2000]
 DN2: [1, 2, 3, ... , 1000]

At first getFilesBlockingDecom(0, "/") is called and it returns [1001, 1002, 
... , 2000] because it reached max size (=1000), and next 
getFilesBlockingDecom(2000, "/") is called because the last inode Id of the 
previous result is 2000. That way the open files of DN2 are missed.
{quote}
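
A self-contained sketch of the paging behaviour described above (simplified 
types and a made-up method shape, not the real FSNamesystem API): it lists only 
1000 of the 2000 open files because the cursor jumps to 2000 after the first 
batch.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class ListOpenFilesPagingDemo {
  static final int MAX_BATCH = 1000;

  // Simplified stand-in for getFilesBlockingDecom(prevId, path): scan the DNs
  // in order and return up to MAX_BATCH inode ids greater than prevId.
  static List<Long> getFilesBlockingDecom(List<List<Long>> dns, long prevId) {
    List<Long> batch = new ArrayList<>();
    for (List<Long> openFiles : dns) {
      for (long id : openFiles) {
        if (id > prevId) {
          batch.add(id);
          if (batch.size() >= MAX_BATCH) {
            return batch;             // stops as soon as the batch is full
          }
        }
      }
    }
    return batch;
  }

  public static void main(String[] args) {
    List<Long> dn1 = new ArrayList<>();   // DN1: [1001 .. 2000]
    List<Long> dn2 = new ArrayList<>();   // DN2: [1 .. 1000]
    for (long i = 1001; i <= 2000; i++) dn1.add(i);
    for (long i = 1; i <= 1000; i++) dn2.add(i);
    List<List<Long>> dns = Arrays.asList(dn1, dn2);

    TreeSet<Long> seen = new TreeSet<>();
    long prevId = 0;
    List<Long> batch = getFilesBlockingDecom(dns, prevId);
    while (!batch.isEmpty()) {
      seen.addAll(batch);
      prevId = batch.get(batch.size() - 1);  // cursor = last id of the batch
      batch = getFilesBlockingDecom(dns, prevId);
    }
    // Prints 1000, not 2000: DN2's files [1..1000] are never returned because
    // the cursor moved past them after the first batch.
    System.out.println("listed " + seen.size() + " of 2000 open files");
  }
}
{code}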



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=608972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608972
 ]

ASF GitHub Bot logged work on HDFS-16039:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 07:16
Start Date: 09/Jun/21 07:16
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi opened a new pull request #3086:
URL: https://github.com/apache/hadoop/pull/3086


   Fix the inaccurate statistics of the metrics getNumLiveNodes, 
getNumDeadNodes, getNumDecommissioningNodes, getNumDecomLiveNodes, 
getNumDecomDeadNodes, getNumInMaintenanceLiveDataNodes, 
getNumInMaintenanceDeadDataNodes, and getNumEnteringMaintenanceDataNodes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 608972)
Remaining Estimate: 0h
Time Spent: 10m

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think we should take only one value (the 
> max) per ClusterID and then do the accumulation.
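
A toy sketch of the difference between plain accumulation and the proposed 
max-per-cluster-then-sum aggregation; the report class and values are made up 
for illustration.
{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RbfMetricsAggregationDemo {
  /** Made-up stand-in for a namenode membership report. */
  static final class NamenodeReport {
    final String clusterId;
    final int numLiveNodes;
    NamenodeReport(String clusterId, int numLiveNodes) {
      this.clusterId = clusterId;
      this.numLiveNodes = numLiveNodes;
    }
  }

  public static void main(String[] args) {
    // Both namenodes of each cluster report the same set of datanodes.
    List<NamenodeReport> reports = Arrays.asList(
        new NamenodeReport("cluster1", 100),  // nn1
        new NamenodeReport("cluster1", 100),  // nn2
        new NamenodeReport("cluster2", 50),   // nn1
        new NamenodeReport("cluster2", 50));  // nn2

    // Current behaviour described above: summing every NN's value double counts.
    int naiveSum = 0;
    for (NamenodeReport r : reports) {
      naiveSum += r.numLiveNodes;
    }

    // Proposed behaviour: keep one value (the max) per cluster, then accumulate.
    Map<String, Integer> maxPerCluster = new HashMap<>();
    for (NamenodeReport r : reports) {
      maxPerCluster.merge(r.clusterId, r.numLiveNodes, Math::max);
    }
    int dedupedSum = 0;
    for (int v : maxPerCluster.values()) {
      dedupedSum += v;
    }

    System.out.println("naive=" + naiveSum + " deduped=" + dedupedSum); // 300 vs 150
  }
}
{code}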



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16039:
--
Labels: pull-request-available  (was: )

> RBF:  Some indicators of RBFMetrics count inaccurately
> --
>
> Key: HDFS-16039
> URL: https://issues.apache.org/jira/browse/HDFS-16039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity
> The current statistical algorithm accumulates the indicators of all NNs, 
> which leads to inaccurate counts. I think we should take only one value (the 
> max) per ClusterID and then do the accumulation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16057?focusedWorklogId=608951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608951
 ]

ASF GitHub Bot logged work on HDFS-16057:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 06:27
Start Date: 09/Jun/21 06:27
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3084:
URL: https://github.com/apache/hadoop/pull/3084#issuecomment-857419235


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 35s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m  8s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 48s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 228m 44s |  |  hadoop-hdfs in the patch 
passed.  |
   | -1 :x: |  asflicense  |   0m 47s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3084/3/artifact/out/results-asflicense.txt)
 |  The patch generated 2 ASF License warnings.  |
   |  |   | 311m 59s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3084/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3084 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 17996b1e50ba 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 702b82e873b4f2ab68b1cb76bada0c9dbd25df1c |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3084/3/testReport/ |
   | Max. process+thread count | 3166 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3084/3/console |
   | versions | git=2.25.1 maven=3.6.3 

[jira] [Work logged] (HDFS-16054) Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16054?focusedWorklogId=608941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608941
 ]

ASF GitHub Bot logged work on HDFS-16054:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 06:13
Start Date: 09/Jun/21 06:13
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3073:
URL: https://github.com/apache/hadoop/pull/3073#issuecomment-857410973


   Merged. Thanks again, @virajjasani.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 608941)
Time Spent: 2h 10m  (was: 2h)

> Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project
> --
>
> Key: HDFS-16054
> URL: https://issues.apache.org/jira/browse/HDFS-16054
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16054) Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16054?focusedWorklogId=608939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608939
 ]

ASF GitHub Bot logged work on HDFS-16054:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 06:13
Start Date: 09/Jun/21 06:13
Worklog Time Spent: 10m 
  Work Description: tasanuma merged pull request #3073:
URL: https://github.com/apache/hadoop/pull/3073


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 608939)
Time Spent: 2h  (was: 1h 50m)

> Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project
> --
>
> Key: HDFS-16054
> URL: https://issues.apache.org/jira/browse/HDFS-16054
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16054) Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project

2021-06-09 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16054.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project
> --
>
> Key: HDFS-16054
> URL: https://issues.apache.org/jira/browse/HDFS-16054
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16054) Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project

2021-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16054?focusedWorklogId=608937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608937
 ]

ASF GitHub Bot logged work on HDFS-16054:
-

Author: ASF GitHub Bot
Created on: 09/Jun/21 06:12
Start Date: 09/Jun/21 06:12
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3073:
URL: https://github.com/apache/hadoop/pull/3073#issuecomment-857410626


   The failed tests succeeded in my local environment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 608937)
Time Spent: 1h 50m  (was: 1h 40m)

> Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project
> --
>
> Key: HDFS-16054
> URL: https://issues.apache.org/jira/browse/HDFS-16054
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org