[GitHub] [hadoop] GauthamBanasandra commented on pull request #2710: HDFS-15843. Make write cross-platform

2021-03-05 Thread GitBox


GauthamBanasandra commented on pull request #2710:
URL: https://github.com/apache/hadoop/pull/2710#issuecomment-791875636


   @smengcl could you please review my PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17552) Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17552?focusedWorklogId=561713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561713
 ]

ASF GitHub Bot logged work on HADOOP-17552:
---

Author: ASF GitHub Bot
Created on: 06/Mar/21 04:48
Start Date: 06/Mar/21 04:48
Worklog Time Spent: 10m 
  Work Description: iwasakims commented on pull request #2727:
URL: https://github.com/apache/hadoop/pull/2727#issuecomment-791872984


   @functioner You should address the checkstyle warning. I think we don't need 
the comment.
   
   
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java:61:
  public static final int IPC_CLIENT_RPC_TIMEOUT_DEFAULT = 120000; // 120 
seconds: Line is longer than 80 characters (found 81). [LineLength]
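One way to clear that LineLength warning is simply to wrap the initializer; a 
minimal sketch (the wrapping below is a suggestion, not the committed fix):

```java
public class CommonConfigurationKeysWrapExample {
  // Hypothetical reformatting: splitting the declaration keeps every line
  // under the 80-character checkstyle limit while preserving the comment.
  public static final int IPC_CLIENT_RPC_TIMEOUT_DEFAULT =
      120000; // 120 seconds
}
```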





Issue Time Tracking
---

Worklog Id: (was: 561713)
Time Spent: 9h  (was: 8h 50m)

> Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid 
> potential hang
> 
>
> Key: HADOOP-17552
> URL: https://issues.apache.org/jira/browse/HADOOP-17552
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
>     We are doing some systematic fault injection testing in Hadoop-3.2.2 and 
> when we try to run a client (e.g., `bin/hdfs dfs -ls /`) against our HDFS 
> cluster (1 NameNode, 2 DataNodes), the client gets stuck forever. After some 
> investigation, we believe that it’s a bug in `hadoop.ipc.Client` because the 
> read method of `hadoop.ipc.Client$Connection$PingInputStream` keeps 
> swallowing `java.net.SocketTimeoutException` due to the mistaken usage of the 
> `rpcTimeout` configuration in the `handleTimeout` method.
> *Reproduction*
>     Start HDFS with the default configuration. Then execute a client (we used 
> the command `bin/hdfs dfs -ls /` in the terminal). While HDFS is trying to 
> accept the client’s socket, inject a socket error (java.net.SocketException 
> or java.io.IOException), specifically at line 1402 (line 1403 or 1404 will 
> also work).
>     We have prepared reproduction scripts in a gist 
> ([https://gist.github.com/functioner/08bcd86491b8ff32860eafda8c140e24]).
> *Diagnosis*
>     When the NameNode tries to accept a client’s socket, basically there are 
> 4 steps:
>  # accept the socket (line 1400)
>  # configure the socket (line 1402-1404)
>  # make the socket a Reader (after line 1404)
>  # swallow the possible IOException in line 1350
> {code:java}
> //hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
> public void run() {
>   while (running) {
> SelectionKey key = null;
> try {
>   getSelector().select();
>   Iterator<SelectionKey> iter = 
> getSelector().selectedKeys().iterator();
>   while (iter.hasNext()) {
> key = iter.next();
> iter.remove();
> try {
>   if (key.isValid()) {
> if (key.isAcceptable())
>   doAccept(key);
>   }
> } catch (IOException e) { // line 1350
> }
> key = null;
>   }
> } catch (OutOfMemoryError e) {
>   // ...
> } catch (Exception e) {
>   // ...
> }
>   }
> }
> void doAccept(SelectionKey key) throws InterruptedException, IOException, 
> OutOfMemoryError {
>   ServerSocketChannel server = (ServerSocketChannel) key.channel();
>   SocketChannel channel;
>   while ((channel = server.accept()) != null) {   // line 1400
> channel.configureBlocking(false); // line 1402
> channel.socket().setTcpNoDelay(tcpNoDelay);   // line 1403
> channel.socket().setKeepAlive(true);  // line 1404
> 
> Reader reader = getReader();
> Connection c = connectionManager.register(channel,
> this.listenPort, this.isOnAuxiliaryPort);
> // If the connectionManager can't take it, close the connection.
> if (c == null) {
>   if (channel.isOpen()) {
> IOUtils.cleanup(null, channel);
>   }
>   connectionManager.droppedConnections.getAndIncrement();
>   

[GitHub] [hadoop] iwasakims commented on pull request #2727: HADOOP-17552. Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang

2021-03-05 Thread GitBox


iwasakims commented on pull request #2727:
URL: https://github.com/apache/hadoop/pull/2727#issuecomment-791872984


   @functioner You should address the checkstyle warning. I think we don't need 
the comment.
   
   
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java:61:
  public static final int IPC_CLIENT_RPC_TIMEOUT_DEFAULT = 120000; // 120 
seconds: Line is longer than 80 characters (found 81). [LineLength]






[GitHub] [hadoop] tomscut opened a new pull request #2748: HDFS-15879. Exclude slow nodes when choose targets for blocks

2021-03-05 Thread GitBox


tomscut opened a new pull request #2748:
URL: https://github.com/apache/hadoop/pull/2748


   JIRA: [HDFS-15879](https://issues.apache.org/jira/browse/HDFS-15879)
   
   Previously, we added monitoring for slow nodes; see 
[HDFS-11194](https://issues.apache.org/jira/browse/HDFS-11194).
   
   We can use a thread to periodically collect these slow nodes into a set, 
then use the set to filter out slow nodes when choosing targets for blocks; a 
minimal sketch of the idea follows below.
   
   This feature can be configured so it is turned on only when needed.
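A minimal sketch of that idea, assuming a periodic refresh task; the class and 
method names below are hypothetical, not the actual patch:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical tracker: periodically refreshes a snapshot of slow datanodes
// so block placement can cheaply ask isSlow(node) when choosing targets.
class SlowDataNodeTracker {
  private volatile Set<String> slowNodes = Collections.emptySet();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(long intervalMs) {
    scheduler.scheduleAtFixedRate(
        () -> slowNodes = fetchSlowNodesFromPeerMetrics(),
        intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  boolean isSlow(String datanodeUuid) {
    return slowNodes.contains(datanodeUuid);
  }

  // Placeholder: would read the slow-peer reports collected via HDFS-11194.
  private Set<String> fetchSlowNodesFromPeerMetrics() {
    return Collections.emptySet();
  }
}
```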






[GitHub] [hadoop] tomscut commented on pull request #2743: HDFS-15873. Add namenode address in logs for block report

2021-03-05 Thread GitBox


tomscut commented on pull request #2743:
URL: https://github.com/apache/hadoop/pull/2743#issuecomment-791805356


   > Makes Sense
   > +1
   
   Thanks @ayushtkn for the review.






[GitHub] [hadoop] ayushtkn commented on pull request #2745: Test YETUS-1102 (Add an option to comment to GitHub PR)

2021-03-05 Thread GitBox


ayushtkn commented on pull request #2745:
URL: https://github.com/apache/hadoop/pull/2745#issuecomment-791698381


   Hey @aajisaka 
   This looks good :-)
   Mind raising a Jira for the same so we can get this in, or are we blocked 
on something?






[jira] [Work logged] (HADOOP-17552) Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17552?focusedWorklogId=561507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561507
 ]

ASF GitHub Bot logged work on HADOOP-17552:
---

Author: ASF GitHub Bot
Created on: 05/Mar/21 18:10
Start Date: 05/Mar/21 18:10
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2727:
URL: https://github.com/apache/hadoop/pull/2727#issuecomment-791591405


   Are we ready to merge? @ferhui @iwasakims 





Issue Time Tracking
---

Worklog Id: (was: 561507)
Time Spent: 8h 50m  (was: 8h 40m)

> Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid 
> potential hang
> 
>
> Key: HADOOP-17552
> URL: https://issues.apache.org/jira/browse/HADOOP-17552
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
>     We are doing some systematic fault injection testing in Hadoop-3.2.2 and 
> when we try to run a client (e.g., `bin/hdfs dfs -ls /`) against our HDFS 
> cluster (1 NameNode, 2 DataNodes), the client gets stuck forever. After some 
> investigation, we believe that it’s a bug in `hadoop.ipc.Client` because the 
> read method of `hadoop.ipc.Client$Connection$PingInputStream` keeps 
> swallowing `java.net.SocketTimeoutException` due to the mistaken usage of the 
> `rpcTimeout` configuration in the `handleTimeout` method.
> *Reproduction*
>     Start HDFS with the default configuration. Then execute a client (we used 
> the command `bin/hdfs dfs -ls /` in the terminal). While HDFS is trying to 
> accept the client’s socket, inject a socket error (java.net.SocketException 
> or java.io.IOException), specifically at line 1402 (line 1403 or 1404 will 
> also work).
>     We have prepared reproduction scripts in a gist 
> ([https://gist.github.com/functioner/08bcd86491b8ff32860eafda8c140e24]).
> *Diagnosis*
>     When the NameNode tries to accept a client’s socket, basically there are 
> 4 steps:
>  # accept the socket (line 1400)
>  # configure the socket (line 1402-1404)
>  # make the socket a Reader (after line 1404)
>  # swallow the possible IOException in line 1350
> {code:java}
> //hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
> public void run() {
>   while (running) {
> SelectionKey key = null;
> try {
>   getSelector().select();
>   Iterator<SelectionKey> iter = 
> getSelector().selectedKeys().iterator();
>   while (iter.hasNext()) {
> key = iter.next();
> iter.remove();
> try {
>   if (key.isValid()) {
> if (key.isAcceptable())
>   doAccept(key);
>   }
> } catch (IOException e) { // line 1350
> }
> key = null;
>   }
> } catch (OutOfMemoryError e) {
>   // ...
> } catch (Exception e) {
>   // ...
> }
>   }
> }
> void doAccept(SelectionKey key) throws InterruptedException, IOException, 
> OutOfMemoryError {
>   ServerSocketChannel server = (ServerSocketChannel) key.channel();
>   SocketChannel channel;
>   while ((channel = server.accept()) != null) {   // line 1400
> channel.configureBlocking(false); // line 1402
> channel.socket().setTcpNoDelay(tcpNoDelay);   // line 1403
> channel.socket().setKeepAlive(true);  // line 1404
> 
> Reader reader = getReader();
> Connection c = connectionManager.register(channel,
> this.listenPort, this.isOnAuxiliaryPort);
> // If the connectionManager can't take it, close the connection.
> if (c == null) {
>   if (channel.isOpen()) {
> IOUtils.cleanup(null, channel);
>   }
>   connectionManager.droppedConnections.getAndIncrement();
>   continue;
> }
> key.attach(c);  // so closeCurrentConnection can get the object
> reader.addConnection(c);
>   }
> }
> {code}
>     When a SocketException occurs in line 1402 (or 1403 or 1404), the 
> server.accept() in line 1400 has finished, so we expect the following 

[GitHub] [hadoop] functioner commented on pull request #2727: HADOOP-17552. Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang

2021-03-05 Thread GitBox


functioner commented on pull request #2727:
URL: https://github.com/apache/hadoop/pull/2727#issuecomment-791591405


   Are we ready to merge? @ferhui @iwasakims 






[jira] [Commented] (HADOOP-16721) Improve S3A rename resilience

2021-03-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296038#comment-17296038
 ] 

Steve Loughran commented on HADOOP-16721:
-

PR #1 made the probe policy optional between LIST dir and HEAD object; having 
looked at a trace of a failure more closely, we have to stop doing the LIST 
calls, as Hive deleting subdirs in separate threads can break renames in other 
threads.

Instead:
* HEAD object to guarantee no rename under a file
* contract XML changed appropriately
* the test for rename under a file subdir is skipped 



> Improve S3A rename resilience
> -
>
> Key: HADOOP-16721
> URL: https://issues.apache.org/jira/browse/HADOOP-16721
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. race condition in delete/rename overlap
> If you have multiple threads on a system doing rename operations, then one 
> thread doing a delete(dest/subdir) may delete the last file under a subdir 
> and, before it has listed and recreated any parent dir marker, other threads 
> may conclude there's an empty dest dir and fail.
> This is most likely on an overloaded system with many threads executing 
> rename operations, as with parallel copying taking place there are many 
> threads to schedule and https connections to pool. 
> h3. failure reporting
> The classic {{rename(source, dest)}} operation returns {{false}} on certain 
> failures, which, while somewhat consistent with the POSIX APIs, turns out to 
> be useless for identifying the cause of problems. Applications tend to have 
> code which goes
> {code}
> if (!fs.rename(src, dest)) throw new IOException("rename failed");
> {code}
> While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
> would then need adoption across applications. We can do this in the hadoop 
> modules, but for Hive, Spark etc. it will take a long time.
> Proposed: a switch to tell S3A to stop downgrading certain failures (source 
> is dir, dest is file, src==dest, etc) into "false". This can be turned on 
> when trying to diagnose why things like Hive are failing.
> Production code: trivial 
> * change in rename(), 
> * new option
> * docs.
> Test code: 
> * need to clear this option for rename contract tests
> * need to create a new FS with this set to verify the various failure modes 
> trigger it.
>  
> If this works we should do the same for ABFS, GCS. Hey, maybe even HDFS
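For illustration, a hedged sketch of what the proposed switch might look like 
to callers; the option name below is an assumption, since the thread does not 
name the final configuration key:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameDiagnostics {
  // Hypothetical key: when set, rename() raises the underlying failure
  // instead of downgrading it to a "false" return value.
  static final String RAISE_KEY = "fs.s3a.rename.raises.exceptions";

  static void renameOrFail(Configuration conf, Path src, Path dest)
      throws IOException {
    conf.setBoolean(RAISE_KEY, true); // surface the real cause when debugging
    FileSystem fs = dest.getFileSystem(conf);
    if (!fs.rename(src, dest)) {
      // With downgrading in effect, this generic error is all a caller learns.
      throw new IOException("rename failed: " + src + " -> " + dest);
    }
  }
}
{code}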



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Work started] (HADOOP-16721) Improve S3A rename resilience

2021-03-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-16721 started by Steve Loughran.
---
> Improve S3A rename resilience
> -
>
> Key: HADOOP-16721
> URL: https://issues.apache.org/jira/browse/HADOOP-16721
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. race condition in delete/rename overlap
> If you have multiple threads on a system doing rename operations, then one 
> thread doing a delete(dest/subdir) may delete the last file under a subdir 
> and, before it has listed and recreated any parent dir marker, other threads 
> may conclude there's an empty dest dir and fail.
> This is most likely on an overloaded system with many threads executing 
> rename operations, as with parallel copying taking place there are many 
> threads to schedule and https connections to pool. 
> h3. failure reporting
> The classic {{rename(source, dest)}} operation returns {{false}} on certain 
> failures, which, while somewhat consistent with the POSIX APIs, turns out to 
> be useless for identifying the cause of problems. Applications tend to have 
> code which goes
> {code}
> if (!fs.rename(src, dest)) throw new IOException("rename failed");
> {code}
> While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
> would then need adoption across applications. We can do this in the hadoop 
> modules, but for Hive, Spark etc. it will take a long time.
> Proposed: a switch to tell S3A to stop downgrading certain failures (source 
> is dir, dest is file, src==dest, etc) into "false". This can be turned on 
> when trying to diagnose why things like Hive are failing.
> Production code: trivial 
> * change in rename(), 
> * new option
> * docs.
> Test code: 
> * need to clear this option for rename contract tests
> * need to create a new FS with this set to verify the various failure modes 
> trigger it.
>  
> If this works we should do the same for ABFS, GCS. Hey, maybe even HDFS






[jira] [Updated] (HADOOP-16721) Improve S3A rename resilience

2021-03-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16721:

Target Version/s: 3.3.1
 Description: 
h3. race condition in delete/rename overlap

If you have multiple threads on a system doing rename operations, then one 
thread doing a delete(dest/subdir) may delete the last file under a subdir 
and, before it has listed and recreated any parent dir marker, other threads 
may conclude there's an empty dest dir and fail.

This is most likely on an overloaded system with many threads executing rename 
operations, as with parallel copying taking place there are many threads to 
schedule and https connections to pool. 

h3. failure reporting
The classic {{rename(source, dest)}} operation returns {{false}} on certain 
failures, which, while somewhat consistent with the POSIX APIs, turns out to be 
useless for identifying the cause of problems. Applications tend to have code 
which goes

{code}
if (!fs.rename(src, dest)) throw new IOException("rename failed");
{code}

While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
would then need adoption across applications. We can do this in the hadoop 
modules, but for Hive, Spark etc. it will take a long time.

Proposed: a switch to tell S3A to stop downgrading certain failures (source is 
dir, dest is file, src==dest, etc) into "false". This can be turned on when 
trying to diagnose why things like Hive are failing.

Production code: trivial 
* change in rename(), 
* new option
* docs.

Test code: 
* need to clear this option for rename contract tests
* need to create a new FS with this set to verify the various failure modes 
trigger it.

 

If this works we should do the same for ABFS, GCS. Hey, maybe even HDFS

  was:

h3. race condition in delete/rename overlap

If you have multiple threads on a system doing rename operations, then one 
thread doing a delete(dest/subdir) may delete the last file under a subdir 
and, before it has listed and recreated any parent dir marker, other threads 
may conclude there's an empty dest dir and fail.

This is most likely on an overloaded system with many threads executing rename 
operations, as with parallel copying taking place there are many threads to 
schedule and https connections to pool. 

h3. failure reporting
The classic {{rename(source, dest)}} operation returns {{false}} on certain 
failures, which, while somewhat consistent with the POSIX APIs, turns out to be 
useless for identifying the cause of problems. Applications tend to have code 
which goes

{code}
if (!fs.rename(src, dest)) throw new IOException("rename failed");
{code}

While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
would then need adoption across applications. We can do this in the hadoop 
modules, but for Hive, Spark etc. it will take a long time.

Proposed: a switch to tell S3A to stop downgrading certain failures (source is 
dir, dest is file, src==dest, etc) into "false". This can be turned on when 
trying to diagnose why things like Hive are failing.

Production code: trivial 
* change in rename(), 
* new option
* docs.

Test code: 
* need to clear this option for rename contract tests
* need to create a new FS with this set to verify the various failure modes 
trigger it.

 

If this works we should do the same for ABFS, GCS. Hey, maybe even HDFS


> Improve S3A rename resilience
> -
>
> Key: HADOOP-16721
> URL: https://issues.apache.org/jira/browse/HADOOP-16721
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. race condition in delete/rename overlap
> If you have multiple threads on a system doing rename operations, then one 
> thread doing a delete(dest/subdir) may delete the last file under a subdir 
> and, before it has listed and recreated any parent dir marker, other threads 
> may conclude there's an empty dest dir and fail.
> This is most likely on an overloaded system with many threads executing 
> rename operations, as with parallel copying taking place there are many 
> threads to schedule and https connections to pool. 
> h3. failure reporting
> The classic {{rename(source, dest)}} operation returns {{false}} on certain 
> failures, which, while somewhat consistent with the POSIX APIs, turns out to 
> be useless for identifying the cause of problems. Applications tend to have 
> code which goes
> {code}
> if (!fs.rename(src, dest)) throw new IOException("rename failed");
> {code}
> While ultimately the rename/3 call needs to be made public (HADOOP-11452) it 
> would then need a 

[jira] [Updated] (HADOOP-16721) Improve S3A rename resilience

2021-03-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16721:

Description: 

h3. race condition in delete/rename overlap

If you have multiple threads on a system doing rename operations, then one 
thread doing a delete(dest/subdir) may delete the last file under a subdir 
and, before it has listed and recreated any parent dir marker, other threads 
may conclude there's an empty dest dir and fail.

This is most likely on an overloaded system with many threads executing rename 
operations, as with parallel copying taking place there are many threads to 
schedule and https connections to pool. 

h3. failure reporting
The classic {{rename(source, dest)}} operation returns {{false}} on certain 
failures, which, while somewhat consistent with the POSIX APIs, turns out to be 
useless for identifying the cause of problems. Applications tend to have code 
which goes

{code}
if (!fs.rename(src, dest)) throw new IOException("rename failed");
{code}

While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
would then need adoption across applications. We can do this in the hadoop 
modules, but for Hive, Spark etc. it will take a long time.

Proposed: a switch to tell S3A to stop downgrading certain failures (source is 
dir, dest is file, src==dest, etc) into "false". This can be turned on when 
trying to diagnose why things like Hive are failing.

Production code: trivial 
* change in rename(), 
* new option
* docs.

Test code: 
* need to clear this option for rename contract tests
* need to create a new FS with this set to verify the various failure modes 
trigger it.

 

If this works we should do the same for ABFS, GCS. Hey, maybe even HDFS

  was:

Improve rename resilience in two ways.

h3. parent dir probes

allow an option to skip the LIST for the parent and just do a HEAD object to 
make sure it is not a file. 

h3. failure reporting
The classic {{rename(source, dest)}} operation returns {{false}} on certain 
failures, which, while somewhat consistent with the POSIX APIs, turns out to be 
useless for identifying the cause of problems. Applications tend to have code 
which goes

{code}
if (!fs.rename(src, dest)) throw new IOException("rename failed");
{code}

While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
would then need adoption across applications. We can do this in the hadoop 
modules, but for Hive, Spark etc. it will take a long time.

Proposed: a switch to tell S3A to stop downgrading certain failures (source is 
dir, dest is file, src==dest, etc) into "false". This can be turned on when 
trying to diagnose why things like Hive are failing.

Production code: trivial 
* change in rename(), 
* new option
* docs.

Test code: 
* need to clear this option for rename contract tests
* need to create a new FS with this set to verify the various failure modes 
trigger it.

 

If this works we should do the same for ABFS, GCS. Hey, maybe even HDFS


> Improve S3A rename resilience
> -
>
> Key: HADOOP-16721
> URL: https://issues.apache.org/jira/browse/HADOOP-16721
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. race condition in delete/rename overlap
> If you have multiple threads on a system doing rename operations, then one 
> thread doing a delete(dest/subdir) may delete the last file under a subdir 
> and, before it has listed and recreated any parent dir marker, other threads 
> may conclude there's an empty dest dir and fail.
> This is most likely on an overloaded system with many threads executing 
> rename operations, as with parallel copying taking place there are many 
> threads to schedule and https connections to pool. 
> h3. failure reporting
> The classic {{rename(source, dest)}} operation returns {{false}} on certain 
> failures, which, while somewhat consistent with the POSIX APIs, turns out to 
> be useless for identifying the cause of problems. Applications tend to have 
> code which goes
> {code}
> if (!fs.rename(src, dest)) throw new IOException("rename failed");
> {code}
> While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
> would then need adoption across applications. We can do this in the hadoop 
> modules, but for Hive, Spark etc. it will take a long time.
> Proposed: a switch to tell S3A to stop downgrading certain failures (source 
> is dir, dest is file, src==dest, etc) into "false". This can be turned on 
> when trying to diagnose why things like Hive are failing.
> Production code: trivial 
> * change in rename(), 
> * 

[jira] [Updated] (HADOOP-16721) Improve S3A rename resilience

2021-03-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16721:

Priority: Blocker  (was: Minor)

> Improve S3A rename resilience
> -
>
> Key: HADOOP-16721
> URL: https://issues.apache.org/jira/browse/HADOOP-16721
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. race condition in delete/rename overlap
> If you have multiple threads on a system doing rename operations, then one 
> thread doing a delete(dest/subdir) may delete the last file under a subdir 
> and, before it has listed and recreated any parent dir marker, other threads 
> may conclude there's an empty dest dir and fail.
> This is most likely on an overloaded system with many threads executing 
> rename operations, as with parallel copying taking place there are many 
> threads to schedule and https connections to pool. 
> h3. failure reporting
> The classic {{rename(source, dest)}} operation returns {{false}} on certain 
> failures, which, while somewhat consistent with the POSIX APIs, turns out to 
> be useless for identifying the cause of problems. Applications tend to have 
> code which goes
> {code}
> if (!fs.rename(src, dest)) throw new IOException("rename failed");
> {code}
> While ultimately the rename/3 call needs to be made public (HADOOP-11452), it 
> would then need adoption across applications. We can do this in the hadoop 
> modules, but for Hive, Spark etc. it will take a long time.
> Proposed: a switch to tell S3A to stop downgrading certain failures (source 
> is dir, dest is file, src==dest, etc) into "false". This can be turned on 
> when trying to diagnose why things like Hive are failing.
> Production code: trivial 
> * change in rename(), 
> * new option
> * docs.
> Test code: 
> * need to clear this option for rename contract tests
> * need to create a new FS with this set to verify the various failure modes 
> trigger it.
>  
> If this works we should do the same for ABFS, GCS. Hey, maybe even HDFS






[jira] [Work logged] (HADOOP-16948) ABFS: Support single writer dirs

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16948?focusedWorklogId=561419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561419
 ]

ASF GitHub Bot logged work on HADOOP-16948:
---

Author: ASF GitHub Bot
Created on: 05/Mar/21 14:12
Start Date: 05/Mar/21 14:12
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #1925:
URL: https://github.com/apache/hadoop/pull/1925#issuecomment-791443797


   OK, I'm happy with this; test changes are in, and the next step is to merge 
and see what happens to people using the feature.
   
   Billie: +1 from me; if you are happy with it yourself then merge into trunk 
at your leisure.
   
   





Issue Time Tracking
---

Worklog Id: (was: 561419)
Time Spent: 3h 50m  (was: 3h 40m)

> ABFS: Support single writer dirs
> 
>
> Key: HADOOP-16948
> URL: https://issues.apache.org/jira/browse/HADOOP-16948
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Minor
>  Labels: abfsactive, pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This would allow some directories to be configured as single writer 
> directories. The ABFS driver would obtain a lease when creating or opening a 
> file for writing and would automatically renew the lease and release the 
> lease when closing the file.
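A hedged sketch of how a client might enable such a feature; the configuration 
keys below are assumptions based on the description, not confirmed in this 
thread:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SingleWriterDirsConfig {
  static Configuration withSingleWriterDirs() {
    Configuration conf = new Configuration();
    // Assumed keys: directories whose files are written under an
    // auto-renewed lease, and the thread count used to renew those leases.
    conf.set("fs.azure.infinite-lease.directories", "/logs,/checkpoints");
    conf.setInt("fs.azure.lease.threads", 2);
    return conf;
  }
}
{code}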






[GitHub] [hadoop] steveloughran commented on pull request #1925: HADOOP-16948. Support single writer dirs.

2021-03-05 Thread GitBox


steveloughran commented on pull request #1925:
URL: https://github.com/apache/hadoop/pull/1925#issuecomment-791443797


   OK, I'm happy with this; test changes are in, and the next step is to merge 
and see what happens to people using the feature.
   
   Billie: +1 from me; if you are happy with it yourself then merge into trunk 
at your leisure.
   
   






[jira] [Updated] (HADOOP-17563) Update Bouncy Castle to 1.68

2021-03-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HADOOP-17563:
--
Fix Version/s: 3.2.3
   3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Update Bouncy Castle to 1.68
> 
>
> Key: HADOOP-17563
> URL: https://issues.apache.org/jira/browse/HADOOP-17563
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Bouncy Castle 1.60 has a hash collision vulnerability. Let's update to 1.68.
> https://www.sourceclear.com/vulnerability-database/security/hash-collision/java/sid-6009






[jira] [Work logged] (HADOOP-17563) Update Bouncy Castle to 1.68

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17563?focusedWorklogId=561409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561409
 ]

ASF GitHub Bot logged work on HADOOP-17563:
---

Author: ASF GitHub Bot
Created on: 05/Mar/21 13:57
Start Date: 05/Mar/21 13:57
Worklog Time Spent: 10m 
  Work Description: tasanuma merged pull request #2740:
URL: https://github.com/apache/hadoop/pull/2740


   





Issue Time Tracking
---

Worklog Id: (was: 561409)
Time Spent: 1h 10m  (was: 1h)

> Update Bouncy Castle to 1.68
> 
>
> Key: HADOOP-17563
> URL: https://issues.apache.org/jira/browse/HADOOP-17563
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Bouncy Castle 1.60 has a hash collision vulnerability. Let's update to 1.68.
> https://www.sourceclear.com/vulnerability-database/security/hash-collision/java/sid-6009






[jira] [Work logged] (HADOOP-17563) Update Bouncy Castle to 1.68

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17563?focusedWorklogId=561410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561410
 ]

ASF GitHub Bot logged work on HADOOP-17563:
---

Author: ASF GitHub Bot
Created on: 05/Mar/21 13:57
Start Date: 05/Mar/21 13:57
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #2740:
URL: https://github.com/apache/hadoop/pull/2740#issuecomment-791435039


   Thanks for your review, @aajisaka!





Issue Time Tracking
---

Worklog Id: (was: 561410)
Time Spent: 1h 20m  (was: 1h 10m)

> Update Bouncy Castle to 1.68
> 
>
> Key: HADOOP-17563
> URL: https://issues.apache.org/jira/browse/HADOOP-17563
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Bouncy Castle 1.60 has a hash collision vulnerability. Let's update to 1.68.
> https://www.sourceclear.com/vulnerability-database/security/hash-collision/java/sid-6009






[GitHub] [hadoop] tasanuma commented on pull request #2740: HADOOP-17563. Update Bouncy Castle to 1.68.

2021-03-05 Thread GitBox


tasanuma commented on pull request #2740:
URL: https://github.com/apache/hadoop/pull/2740#issuecomment-791435039


   Thanks for your review, @aajisaka!






[GitHub] [hadoop] tasanuma merged pull request #2740: HADOOP-17563. Update Bouncy Castle to 1.68.

2021-03-05 Thread GitBox


tasanuma merged pull request #2740:
URL: https://github.com/apache/hadoop/pull/2740


   






[GitHub] [hadoop] haiyang1987 opened a new pull request #2747: HDFS-15877. BlockReconstructionWork should resetTargets() before BlockManager#validateReconstructionWork return false

2021-03-05 Thread GitBox


haiyang1987 opened a new pull request #2747:
URL: https://github.com/apache/hadoop/pull/2747


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
   For more details, please see 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   






[jira] [Commented] (HADOOP-17531) DistCp: Reduce memory usage on copying huge directories

2021-03-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295971#comment-17295971
 ] 

Ayush Saxena commented on HADOOP-17531:
---

Have raised HADOOP-17558 for object stores. Will try to use a fixed-size TPE 
there instead of this producer-consumer setup; a sketch of that idea follows 
below.

Regarding {{listFiles}}, I think this won't include directories, and in distCp 
we add the directories too in the sequence file. Using listFiles would miss at 
least the empty directories, and perhaps preserving attributes (-p option) on 
directories would also not work.

I will give that a try as well in HADOOP-17558 and try to include the IO 
performance stuff in the LOG as well.
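A minimal sketch of the fixed-size TPE idea (an assumption about the 
HADOOP-17558 direction, not its actual patch):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedListing {
  public static void main(String[] args) throws InterruptedException {
    // A fixed pool bounds the fan-out: at most N subtree listings run at
    // once, instead of an unbounded producer-consumer queue of pending work.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    pool.submit(() -> System.out.println("list one directory subtree here"));
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{code}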

> DistCp: Reduce memory usage on copying huge directories
> ---
>
> Key: HADOOP-17531
> URL: https://issues.apache.org/jira/browse/HADOOP-17531
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: MoveToStackIterator.patch, gc-NewD-512M-3.8ML.log
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Presently distCp uses a producer-consumer kind of setup while building the 
> listing; the input queue and output queue are both unbounded, thus the 
> listStatus result grows quite huge.
> Rel Code Part :
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635
> This does a breadth-first-traversal kind of thing (it uses a queue instead of 
> the earlier stack), so if you have files at a lower depth, it will likely 
> open up the entire tree and then start processing






[GitHub] [hadoop] ayushtkn commented on a change in pull request #2746: HDFS-15875. Check whether file is being truncated before truncate

2021-03-05 Thread GitBox


ayushtkn commented on a change in pull request #2746:
URL: https://github.com/apache/hadoop/pull/2746#discussion_r588195715



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
##
@@ -218,6 +219,65 @@ public void testSnapshotTruncateThenDeleteSnapshot() 
throws IOException {
 fs.delete(dir, true);
   }
 
+
+  /**
+   * Test truncate twice together on a file
+   */
+  @Test(timeout=90000)
+  public void testTruncateTwiceTogether() throws Exception {
+
+Path dir = new Path("/testTruncateTwiceTogether");
+fs.mkdirs(dir);
+final Path p = new Path(dir, "file");
+final byte[] data = new byte[100 * BLOCK_SIZE];
+ThreadLocalRandom.current().nextBytes(data);
+writeContents(data, data.length, p);
+
+DataNodeFaultInjector originInjector = DataNodeFaultInjector.get();

Review comment:
   Is this used somewhere?

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
##
@@ -218,6 +219,65 @@ public void testSnapshotTruncateThenDeleteSnapshot() 
throws IOException {
 fs.delete(dir, true);
   }
 
+
+  /**
+   * Test truncate twice together on a file
+   */
+  @Test(timeout=90000)
+  public void testTruncateTwiceTogether() throws Exception {
+
+Path dir = new Path("/testTruncateTwiceTogether");
+fs.mkdirs(dir);
+final Path p = new Path(dir, "file");
+final byte[] data = new byte[100 * BLOCK_SIZE];
+ThreadLocalRandom.current().nextBytes(data);
+writeContents(data, data.length, p);
+
+DataNodeFaultInjector originInjector = DataNodeFaultInjector.get();
+DataNodeFaultInjector injector = new DataNodeFaultInjector() {
+  @Override
+  public void delay() {
+try {
+  // Bigger than soft lease period.
+  Thread.sleep(65000);
+} catch (InterruptedException e) {
+  e.printStackTrace();

Review comment:
   No need to print the trace; if possible, add the exception to the method 
signature and remove this try-catch, else ignore the exception.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
##
@@ -218,6 +219,65 @@ public void testSnapshotTruncateThenDeleteSnapshot() 
throws IOException {
 fs.delete(dir, true);
   }
 
+
+  /**
+   * Test truncate twice together on a file
+   */
+  @Test(timeout=90000)
+  public void testTruncateTwiceTogether() throws Exception {
+
+Path dir = new Path("/testTruncateTwiceTogether");
+fs.mkdirs(dir);
+final Path p = new Path(dir, "file");
+final byte[] data = new byte[100 * BLOCK_SIZE];
+ThreadLocalRandom.current().nextBytes(data);
+writeContents(data, data.length, p);
+
+DataNodeFaultInjector originInjector = DataNodeFaultInjector.get();
+DataNodeFaultInjector injector = new DataNodeFaultInjector() {
+  @Override
+  public void delay() {
+try {
+  // Bigger than soft lease period.
+  Thread.sleep(65000);
+} catch (InterruptedException e) {
+  e.printStackTrace();
+}
+  }
+};
+// Delay to recovery.
+DataNodeFaultInjector.set(injector);
+
+// Truncate by using different client name.
+Thread t = new Thread(() ->
+{
+  String hdfsCacheDisableKey = "fs.hdfs.impl.disable.cache";
+  boolean originCacheDisable =
+  conf.getBoolean(hdfsCacheDisableKey, false);
+  try {
+conf.setBoolean(hdfsCacheDisableKey, true);
+FileSystem fs1 = FileSystem.get(conf);
+fs1.truncate(p, data.length-1);
+} catch (IOException e) {
+  // ignore
+} finally{
+  conf.setBoolean(hdfsCacheDisableKey, originCacheDisable);
+}
+});
+t.start();
+t.join();
+Thread.sleep(60000);
+try {
+  fs.truncate(p, data.length - 2);
+} catch (IOException e) {
+  //GenericTestUtils.assertExceptionContains("is being truncated.", e);
+}

Review comment:
   Can use LambdaTestUtils.
   
   ```
   LambdaTestUtils.intercept(RemoteException.class,
   "/testTruncateTwiceTogether/file is being truncated",
   () -> fs.truncate(p, data.length - 2));
   ```








[jira] [Work logged] (HADOOP-17527) ABFS: Fix boundary conditions in InputStream seek and skip

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17527?focusedWorklogId=561359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561359
 ]

ASF GitHub Bot logged work on HADOOP-17527:
---

Author: ASF GitHub Bot
Created on: 05/Mar/21 09:55
Start Date: 05/Mar/21 09:55
Worklog Time Spent: 10m 
  Work Description: sumangala-patki closed pull request #2698:
URL: https://github.com/apache/hadoop/pull/2698


   





Issue Time Tracking
---

Worklog Id: (was: 561359)
Time Spent: 4h 50m  (was: 4h 40m)

> ABFS: Fix boundary conditions in InputStream seek and skip
> --
>
> Key: HADOOP-17527
> URL: https://issues.apache.org/jira/browse/HADOOP-17527
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sumangala Patki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Modify AbfsInputStream seek method to throw EOF exception on seek to 
> contentLength for a non-empty file. With this change, it will no longer be 
> possible for the input stream position (as obtained by the getPos() API) to 
> be moved to contentLength manually, except after reading the last byte.






[jira] [Work logged] (HADOOP-17527) ABFS: Fix boundary conditions in InputStream seek and skip

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17527?focusedWorklogId=561358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561358
 ]

ASF GitHub Bot logged work on HADOOP-17527:
---

Author: ASF GitHub Bot
Created on: 05/Mar/21 09:54
Start Date: 05/Mar/21 09:54
Worklog Time Spent: 10m 
  Work Description: sumangala-patki commented on a change in pull request 
#2698:
URL: https://github.com/apache/hadoop/pull/2698#discussion_r588169126



##
File path: 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
##
@@ -542,7 +542,7 @@ public synchronized void seek(long n) throws IOException {
 if (n < 0) {
   throw new EOFException(FSExceptionMessages.NEGATIVE_SEEK);
 }
-if (n > contentLength) {
+if (n > 0 && n >= contentLength) {
   throw new EOFException(FSExceptionMessages.CANNOT_SEEK_PAST_EOF);

Review comment:
   seek(n=0) is allowed; n > 0 has to be specified so as to avoid throwing an 
exception on seek(0) in a 0-byte file
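To make the boundary explicit, a small self-contained sketch of the cases the 
patched check encodes (illustrative only, not the PR's tests):

```java
public class SeekBoundary {
  // Mirrors the patched checks: negative seeks are rejected; for non-empty
  // files any n >= contentLength is rejected; seek(0) on an empty file is OK.
  static boolean seekAllowed(long n, long contentLength) {
    if (n < 0) {
      return false; // FSExceptionMessages.NEGATIVE_SEEK
    }
    if (n > 0 && n >= contentLength) {
      return false; // FSExceptionMessages.CANNOT_SEEK_PAST_EOF
    }
    return true;
  }

  public static void main(String[] args) {
    assert seekAllowed(0, 0);    // empty file: seek(0) stays legal
    assert !seekAllowed(1, 0);   // empty file: past EOF
    assert seekAllowed(9, 10);   // last byte of a 10-byte file
    assert !seekAllowed(10, 10); // contentLength itself is now rejected
  }
}
```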







Issue Time Tracking
---

Worklog Id: (was: 561358)
Time Spent: 4h 40m  (was: 4.5h)

> ABFS: Fix boundary conditions in InputStream seek and skip
> --
>
> Key: HADOOP-17527
> URL: https://issues.apache.org/jira/browse/HADOOP-17527
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sumangala Patki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Modify AbfsInputStream seek method to throw EOF exception on seek to 
> contentLength for a non-empty file. With this change, it will no longer be 
> possible for the input stream position (as obtained by the getPos() API) to 
> be moved to contentLength manually, except after reading the last byte.






[GitHub] [hadoop] sumangala-patki closed pull request #2698: HADOOP-17527. ABFS: Fix boundary conditions in InputStream seek and skip

2021-03-05 Thread GitBox


sumangala-patki closed pull request #2698:
URL: https://github.com/apache/hadoop/pull/2698


   






[GitHub] [hadoop] sumangala-patki commented on a change in pull request #2698: HADOOP-17527. ABFS: Fix boundary conditions in InputStream seek and skip

2021-03-05 Thread GitBox


sumangala-patki commented on a change in pull request #2698:
URL: https://github.com/apache/hadoop/pull/2698#discussion_r588169126



##
File path: 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
##
@@ -542,7 +542,7 @@ public synchronized void seek(long n) throws IOException {
 if (n < 0) {
   throw new EOFException(FSExceptionMessages.NEGATIVE_SEEK);
 }
-if (n > contentLength) {
+if (n > 0 && n >= contentLength) {
   throw new EOFException(FSExceptionMessages.CANNOT_SEEK_PAST_EOF);

Review comment:
   seek(n=0) is allowed; n > 0 has to be specified so as to avoid throwing an 
exception on seek(0) in a 0-byte file








[jira] [Work logged] (HADOOP-17548) ABFS: Config for Mkdir overwrite

2021-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17548?focusedWorklogId=561357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561357
 ]

ASF GitHub Bot logged work on HADOOP-17548:
---

Author: ASF GitHub Bot
Created on: 05/Mar/21 09:49
Start Date: 05/Mar/21 09:49
Worklog Time Spent: 10m 
  Work Description: sumangala-patki commented on pull request #2729:
URL: https://github.com/apache/hadoop/pull/2729#issuecomment-791305493


   TEST RESULTS
   
   HNS Account Location: East US 2
   NonHNS Account Location: East US 2, Central US
   Overwrite=true
   
   ```
   HNS OAuth
   
   [INFO] Tests run: 93, Failures: 0, Errors: 0, Skipped: 0
   [INFO] Tests run: 504, Failures: 0, Errors: 0, Skipped: 70
   [WARNING] Tests run: 257, Failures: 0, Errors: 0, Skipped: 48
   
   HNS SharedKey
   
   [INFO] Tests run: 93, Failures: 0, Errors: 0, Skipped: 0
   [WARNING] Tests run: 513, Failures: 0, Errors: 0, Skipped: 26
   [WARNING] Tests run: 257, Failures: 0, Errors: 0, Skipped: 40
   
   Non-HNS SharedKey
   
   [INFO] Tests run: 93, Failures: 0, Errors: 0, Skipped: 0
   [WARNING] Tests run: 504, Failures: 0, Errors: 0, Skipped: 250
   [WARNING] Tests run: 257, Failures: 0, Errors: 0, Skipped: 40
   ```
   
   Dev Fabric (Xns account)
   Overwrite=false
   
   ```
   Tests run: 868, passed: 762, failed: 19, ignored: 87
   Errors:
   ITestAbfsNetworkStatistics, ITestAzureBlobFileSystemCheckAccess, 
ITestAzureBlobFileSystemFileStatus, ITestClientUrlScheme
   ITestFileSystemInitialization, ITestFileSystemRegistration, 
TestAbfsConfigurationFieldsValidation, ITestAbfsDelegationTokens
   ```





Issue Time Tracking
---

Worklog Id: (was: 561357)
Time Spent: 0.5h  (was: 20m)

> ABFS: Config for Mkdir overwrite
> 
>
> Key: HADOOP-17548
> URL: https://issues.apache.org/jira/browse/HADOOP-17548
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.1
>Reporter: Sumangala Patki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The call to mkdirs with overwrite set to true results in an additional call 
> to set properties (LMT update, etc) at the backend, which is not required for 
> the HDFS scenario. Moreover, mkdirs on an existing file path returns success. 
> This PR provides an option to set the overwrite parameter to false, and 
> ensures that mkdirs on a file throws an exception.
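A hedged sketch of toggling such an option; the key name below is an 
assumption, not confirmed by this thread:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class MkdirOverwriteConfig {
  static Configuration withoutMkdirOverwrite() {
    Configuration conf = new Configuration();
    // Assumed key: send overwrite=false on the store's mkdirs call so an
    // existing directory's properties are not rewritten, and mkdirs on an
    // existing file path fails instead of silently succeeding.
    conf.setBoolean("fs.azure.enable.mkdir.overwrite", false);
    return conf;
  }
}
{code}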






[GitHub] [hadoop] sumangala-patki commented on pull request #2729: HADOOP-17548. ABFS: Toggle Store Mkdirs request overwrite parameter

2021-03-05 Thread GitBox


sumangala-patki commented on pull request #2729:
URL: https://github.com/apache/hadoop/pull/2729#issuecomment-791305493


   TEST RESULTS
   
   HNS Account Location: East US 2
   NonHNS Account Location: East US 2, Central US
   Overwrite=true
   
   ```
   HNS OAuth
   
   [INFO] Tests run: 93, Failures: 0, Errors: 0, Skipped: 0
   [INFO] Tests run: 504, Failures: 0, Errors: 0, Skipped: 70
   [WARNING] Tests run: 257, Failures: 0, Errors: 0, Skipped: 48
   
   HNS SharedKey
   
   [INFO] Tests run: 93, Failures: 0, Errors: 0, Skipped: 0
   [WARNING] Tests run: 513, Failures: 0, Errors: 0, Skipped: 26
   [WARNING] Tests run: 257, Failures: 0, Errors: 0, Skipped: 40
   
   Non-HNS SharedKey
   
   [INFO] Tests run: 93, Failures: 0, Errors: 0, Skipped: 0
   [WARNING] Tests run: 504, Failures: 0, Errors: 0, Skipped: 250
   [WARNING] Tests run: 257, Failures: 0, Errors: 0, Skipped: 40
   ```
   
   Dev Fabric (Xns account)
   Overwrite=false
   
   ```
   Tests run: 868, passed: 762, failed: 19, ignored: 87
   Errors:
   ITestAbfsNetworkStatistics, ITestAzureBlobFileSystemCheckAccess, 
ITestAzureBlobFileSystemFileStatus, ITestClientUrlScheme
   ITestFileSystemInitialization, ITestFileSystemRegistration, 
TestAbfsConfigurationFieldsValidation, ITestAbfsDelegationTokens
   ```






[GitHub] [hadoop] asf-cloudbees-jenkins-ci-hadoop[bot] commented on pull request #2745: Test YETUS-1102 (Add an option to comment to GitHub PR)

2021-03-05 Thread GitBox


asf-cloudbees-jenkins-ci-hadoop[bot] commented on pull request #2745:
URL: https://github.com/apache/hadoop/pull/2745#issuecomment-791268008


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 52s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  shelldocs  |   0m  1s |  |  Shelldocs was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  6s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  shadedclient  |  13m 11s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  shellcheck  |   0m  0s |  |  No new issues.  |
   | +1 :green_heart: |  shadedclient  |  13m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  44m 15s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2745/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2745 |
   | Optional Tests | dupname asflicense codespell shellcheck shelldocs |
   | uname | Linux 3d8e89a4a2b0 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 48eab3a26f1c1338d50f0bd625962ff373c502ce |
   | Max. process+thread count | 721 (vs. ulimit of 5500) |
   | modules | C:  U:  |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2745/3/console |
   | versions | git=2.25.1 maven=3.6.3 shellcheck=0.7.0 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   






[GitHub] [hadoop] ferhui commented on pull request #2746: HDFS-15875. Check whether file is being truncated before truncate

2021-03-05 Thread GitBox


ferhui commented on pull request #2746:
URL: https://github.com/apache/hadoop/pull/2746#issuecomment-791254544


   @ayushtkn Could you please help review this? Thanks


