[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-06-09 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359902#comment-17359902
 ] 

Viraj Jasani edited comment on HDFS-15982 at 6/9/21, 3:46 PM:
--

{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems we don't follow this everywhere.

The DFS client path (NameNode -> FSNamesystem#delete -> FSDirDeleteOp#delete -> 
deleteInternal -> delete -> deleteAllowed) doesn't follow it, and I just 
quickly tested HttpFS with WebHDFS and LocalFS; the semantic is not followed 
there either. For a non-existing file, FS#delete returns false.

Although I agree that delete(path) should return true for a non-existing path, 
if we change this behaviour (as part of a separate Jira), it will be another 
incompatible change.
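As an aside, the two delete contracts can be illustrated with plain java.nio.file (an analogy, not Hadoop code): Files.delete throws for a missing path, while Files.deleteIfExists quietly returns false, which is the lenient behaviour FS#delete currently exhibits for a non-existing path:
{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteSemantics {

  // Lenient delete: returns false for a missing path, like FS#delete today.
  static boolean lenientDelete(Path p) {
    try {
      return Files.deleteIfExists(p);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path missing = Path.of("definitely-missing-file");
    System.out.println(lenientDelete(missing)); // prints: false

    try {
      // Strict delete: raises an error instead of returning false.
      Files.delete(missing);
    } catch (NoSuchFileException e) {
      System.out.println("strict delete failed for: " + e.getFile());
    }
  }
}
{code}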

 

Edit: It seems even ABFS returns false when deleting a non-existent path:
{code:java}
try {
  abfsStore.delete(qualifiedPath, recursive);
  return true;
} catch (AzureBlobFileSystemException ex) {
  checkException(f, ex, AzureServiceErrorCode.PATH_NOT_FOUND);
  return false;
}
{code}

was (Author: vjasani):
{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems we don't follow this everywhere.

The DFS client path (NameNode -> FSNamesystem#delete) doesn't follow it, and I 
just quickly tested HttpFS with WebHDFS and LocalFS; the semantic is not 
followed there either. For a non-existing file, FS#delete returns false.

Although I agree that delete(path) should return true for a non-existing path, 
if we change this behaviour (as part of a separate Jira), it will be an 
incompatible change.

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, httpfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and removed only after the trash interval 
> elapses. Currently, data is removed from the system directly. [This behavior 
> should be the same as the CLI cmd.]
> This can be helpful when a user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org





[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-06-09 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359902#comment-17359902
 ] 

Viraj Jasani edited comment on HDFS-15982 at 6/9/21, 11:58 AM:
---

{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems we don't follow this everywhere.

The DFS client path (NameNode -> FSNamesystem#delete) doesn't follow it, and I 
just quickly tested HttpFS with WebHDFS and LocalFS; the semantic is not 
followed there either. For a non-existing file, FS#delete returns false.

Although I agree that delete(path) should return true for a non-existing path, 
if we change this behaviour (as part of a separate Jira), it will be an 
incompatible change.


was (Author: vjasani):
{quote}delete(path) MUST be a no-op if the path isn't there. The way to view 
the semantics of the call is that delete(path) == true implies the path is no 
longer present.
{quote}
[~ste...@apache.org] It seems we don't follow this everywhere.

The DFS client path (NameNode -> FSNamesystem#delete) doesn't follow it, and I 
just quickly tested HttpFS with WebHDFS and LocalFS; the semantic is not 
followed there either. For a non-existing file, FS#delete returns false.




[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-05-08 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339535#comment-17339535
 ] 

Viraj Jasani edited comment on HDFS-15982 at 5/8/21, 12:27 PM:
---

{quote}Regarding the UI. If the trash interval isn't set and If I select 
NO(move to trash), It still deletes with success? Check if the behaviour is 
like that, The client may be in wrong impression that things moved to trash, 
but it actually didn't. We should have bugged him back, Trash isn't enabled.
{quote}
If the trash interval isn't set and we select NO, it does delete with success 
(as per the logic 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L1560]:
 the file moves to trash only if skiptrash is false and trashInterval > 0):
{code:java}
case DELETE: {
  Configuration conf =
      (Configuration) context.getAttribute(JspHelper.CURRENT_CONF);
  long trashInterval =
      conf.getLong(FS_TRASH_INTERVAL_KEY, FS_TRASH_INTERVAL_DEFAULT);
  if (trashInterval > 0 && !skipTrash.getValue()) {
    LOG.info("{} is {} , trying to archive {} instead of removing",
        FS_TRASH_INTERVAL_KEY, trashInterval, fullpath);
    org.apache.hadoop.fs.Path path =
        new org.apache.hadoop.fs.Path(fullpath);
    Configuration clonedConf = new Configuration(conf);
    // To avoid caching FS objects and prevent OOM issues
    clonedConf.set("fs.hdfs.impl.disable.cache", "true");
    FileSystem fs = FileSystem.get(clonedConf);
    boolean movedToTrash = Trash.moveToAppropriateTrash(fs, path,
        clonedConf);
    if (movedToTrash) {
      final String js = JsonUtil.toJsonString("boolean", true);
      return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
    }
    // Same is the behavior with Delete shell command.
    // If moveToAppropriateTrash() returns false, file deletion
    // is attempted rather than throwing Error.
    LOG.debug("Could not move {} to Trash, attempting removal", fullpath);
  }
  final boolean b = cp.delete(fullpath, recursive.getValue());
  final String js = JsonUtil.toJsonString("boolean", b);
  return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
}
{code}
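For clarity, the gating condition in the snippet above boils down to the following predicate (a hypothetical standalone helper mirroring, but not part of, NamenodeWebHdfsMethods):
{code:java}
public class TrashGate {

  /**
   * Hypothetical helper mirroring the DELETE gating above: attempt a move to
   * trash only when trash is enabled (fs.trash.interval > 0) and the caller
   * did not request skiptrash.
   */
  static boolean shouldTryTrash(long trashIntervalMinutes, boolean skipTrash) {
    return trashIntervalMinutes > 0 && !skipTrash;
  }

  public static void main(String[] args) {
    System.out.println(shouldTryTrash(0, false));   // trash disabled -> false
    System.out.println(shouldTryTrash(360, true));  // skiptrash wins -> false
    System.out.println(shouldTryTrash(360, false)); // move to trash -> true
  }
}
{code}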
I think in the UI we can provide additional info in the same modal: "These 
buttons are useful only if fs.trash.interval has been configured. Without 
setting the interval, files will be hard-deleted anyway."

I just saw HDFS-14117 on using trash with RBF: "delete the files or dirs of one 
subcluster in a cluster with multiple subclusters".

 
{quote}Router would also have issues, if the trash path resolves to different 
NS, or to some path which isn't in the Mount Table, when Default Namespace 
isn't configured.
{quote}
I tried to explore whether we can replace everything that 
Trash.moveToAppropriateTrash() offers with what ClientProtocol provides (that 
way, router and mount table resolution would be taken care of), but it seems 
almost impossible to replace all the FileSystem utilities used by Trash.

 

Considering all the discussion so far (and further feedback from Ayush and 
Wei-Chiu), my proposals for addendum PR#2976:
 * Keep the default value of skiptrash as true (thereby keeping the default 
behaviour of the HTTP API compatible with existing releases)
 * Update the docs and tests accordingly
 * Let the NamenodeWebHdfsMethods DELETE endpoint perform the "move file to 
.Trash dir" operation with skiptrash false only if the NameNode uses 
NameNodeRpcServer (i.e. the default HDFS FileSystem), because RouterRpcServer 
requires special treatment that the Trash utility does not provide today. If we 
agree on this, we need to document the option properly. This is not yet present 
in addendum PR#2976; I will add it once we have consensus.




[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-05-04 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339438#comment-17339438
 ] 

Viraj Jasani edited comment on HDFS-15982 at 5/5/21, 6:30 AM:
--

 
{quote}I was following the older jira.

[~daryn] had a comment:

You cannot or should not create a default fs and parse a path in the NN. It's 
very dangerous. Give me some time (that I don't have) and I'd likely come up 
with a nasty exploit.
{quote}
This seems interesting. Yes, this is the default fs, but it is instantiated 
only from the WebHdfs config object (which is used by all endpoints in 
NamenodeWebHdfsMethods). Is the WebHdfs server implementation used for any 
other FileSystem (from current or future viewpoints)?
{quote}Router part needs to be checked again and confirmed, Trash itself has 
issues with Router(There are Jiras). So, if Trash becomes true by default, I 
doubt delete through Router don't break or moves to some weird place.
{quote}
Oops, I was not aware of the existing concerns with router path resolution 
within Trash. I think this is a fair point for making this Jira a compatible 
change w.r.t. DELETE REST API calls. Let me provide an addendum for trunk to 
change the default value of skiptrash to true.

The branch-3.3 backport PR#2925 is pending. Shall we get it in, and then I can 
provide addenda for both trunk and branch-3.3 for a clean history?

Btw [~ayushtkn], you might also want to check the screenshots attached to this 
Jira to see how skiptrash is handled from the Web UI.


was (Author: vjasani):
{quote}I was following the older jira.

[~daryn] had a comment:
{quote}You cannot or should not create a default fs and parse a path in the NN. 
It's very dangerous. Give me some time (that I don't have) and I'd likely come 
up with a nasty exploit.
{quote}{quote}
This seems interesting. Although yes, this is default fs, but it is 
instantiated from WebHdfs config object only (which is used by all endpoints in 
NamenodeWebHdfsMethods). Is WebHdfs server implementation used for any other 
FileSystem (from current and future viewpoints)?
{quote}Router part needs to be checked again and confirmed, Trash itself has 
issues with Router(There are Jiras). So, if Trash becomes true by default, I 
doubt delete through Router don't break or moves to some weird place.
{quote}
Oops, I was not aware of existing concerns with router path resolution with 
Trash. I think this is fair point to make this Jira a compatible change w.r.t 
DELETE REST API calls. Let me provide an addendum for trunk to bring default 
value of skiptrash as true. branch-3.3 backport PR#2925 is pending. Shall we 
get it in and then I can provide addendum for both trunk and branch-3.3 for 
clean history?

Btw [~ayushtkn] you might also want to check screenshots attached on this Jira 
to take a look at how skiptrash is handled from Web UI.




[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-05-04 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338945#comment-17338945
 ] 

Viraj Jasani edited comment on HDFS-15982 at 5/4/21, 2:24 PM:
--

Thanks for the questions, [~ayushtkn].

All of the below examples assume a positive value of the fs.trash.interval 
config (i.e. trash enabled through core-site config).
{quote} * If my FileSystem is WebHdfs, If I call delete with recursive false on 
a non empty directory, will it delete now or throw me an exception? Should be a 
NO{quote}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz?op=DELETE&recursive=false"
{"boolean":true}
{code}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz?op=DELETE&recursive=false&skiptrash=true"
{"RemoteException":{"exception":"PathIsNotEmptyDirectoryException","javaClassName":"org.apache.hadoop.fs.PathIsNotEmptyDirectoryException","message":"`/xyz is non empty': Directory is not empty"}}
{code}
recursive is used by fs.delete() only and not by fs.rename(), hence moving a 
non-empty folder to trash works fine. However, for the skiptrash=true case, 
should we just return \{"boolean":false}?

 
{quote} * if my FileSystem is WebHdfs, If I call delete on a non-existing file, 
the response will be false? or Now an exception.{quote}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz1?op=DELETE&recursive=false"
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File does not exist: /xyz1"}}
{code}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz1?op=DELETE&recursive=false&skiptrash=true"
{"boolean":false}
{code}
Similarly here, for the move-to-trash case, should we return \{"boolean":false}?
{quote} * How does this trash path resolution behaves when the client is coming 
through Router?{quote}
I think the resolution should be taken care of by the FileSystem 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L1573-L1575].

For the above cases where exceptions are thrown instead of returning 
\{"boolean":false}, let me file a follow-up subtask once you confirm the 
expected behaviour, [~ayushtkn].
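On the response shape itself: a tiny sketch of how the \{"boolean":...} payload could be built (a hypothetical stand-in; the server actually uses JsonUtil.toJsonString("boolean", b)):
{code:java}
public class BooleanResponse {

  // Hypothetical stand-in for JsonUtil.toJsonString("boolean", b):
  // serializes a single boolean result as WebHDFS-style JSON.
  static String toBooleanJson(boolean b) {
    return "{\"boolean\":" + b + "}";
  }

  public static void main(String[] args) {
    System.out.println(toBooleanJson(true));  // {"boolean":true}
    System.out.println(toBooleanJson(false)); // {"boolean":false}
  }
}
{code}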

Thanks


was (Author: vjasani):
Thanks for the questions [~ayushtkn]

All of below examples include positive value of fs.trash.interval config (to 
enable trash through core-site config).
{quote} * If my FileSystem is WebHdfs, If I call delete with recursive false on 
a non empty directory, will it delete now or throw me an exception? Should be a 
NO{quote}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz?op=DELETE&recursive=false"
{"boolean":true}
{code}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz?op=DELETE&recursive=false&skiptrash=true"
{"RemoteException":{"exception":"PathIsNotEmptyDirectoryException","javaClassName":"org.apache.hadoop.fs.PathIsNotEmptyDirectoryException","message":"`/xyz is non empty': Directory is not empty"}}
{code}
I believe for skiptrash true case, we should just return \{"boolean":false} ?

 
{quote} * if my FileSystem is WebHdfs, If I call delete on a non-existing file, 
the response will be false? or Now an exception.{quote}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz1?op=DELETE&recursive=false"
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File does not exist: /xyz1"}}
{code}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz1?op=DELETE&recursive=false&skiptrash=true"
{"boolean":false}
{code}
Similarly here, for the move-to-trash case, should we return \{"boolean":false}?
{quote} * How does this trash path resolution behave when the client is coming 
through Router?{quote}
I think the resolution should be taken care of by FileSystem 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L1573-L1575].

For the above cases where exceptions are being thrown instead of returning 
\{"boolean":false}, let me file a follow-up subtask once you confirm the 
expected behaviour [~ayushtkn].

Thanks

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, httpfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-2

[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-04-26 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331801#comment-17331801
 ] 

Viraj Jasani edited comment on HDFS-15982 at 4/26/21, 12:44 PM:


{quote}[~vjasani] I think [~weichiu] is trying to say that along with the webhdfs 
API we also have to consider the behavior of the httpfs (server) delete API.
{quote}
Oh I see, my bad. Yes, this is also taken care of 
[here|https://github.com/apache/hadoop/pull/2927/files#diff-635868fc4dbae8e9368423032df25338bcde0e9bad63e0a11e5e328db7dcfbccR261].


was (Author: vjasani):
{quote}[~vjasani] I think [~weichiu] is trying to say that along with the webhdfs 
API we also have to consider the behavior of the httpfs (server) delete API.
{quote}
Oh I see, my bad. Yes, this is also taken care of 
[here|https://github.com/apache/hadoop/pull/2927/files#diff-b8d1575f2afb5b04a56f4a43eb8ed4387fbef8ebe51c4503052413a2887cf96bR747].

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and, after the trash interval time, it 
> should be removed. Currently, data is directly removed from the system. [This 
> behavior should be the same as the CLI cmd.]
> This can be helpful when a user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-04-23 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330398#comment-17330398
 ] 

Viraj Jasani edited comment on HDFS-15982 at 4/23/21, 11:39 AM:


{quote}Let's rename the JIRA subject and also the PR by replacing "Web UI" with 
"HTTP API". The HDFS "Web UI" is usually about the web portal that one can browse 
for information purposes. This JIRA is to change the "RESTful HTTP API", not 
the Web UI.
{quote}
Thanks [~liuml07], yes, the core changes are at the API level and hence I have 
updated the Jira as well as the PR title. However, in the recent revision, I have 
added a "skipTrash" option to the API, which is accessible through the UI. Please 
find the attached screenshots.
{quote}My only concern about this is that the "Trash" concept is not a part of 
the FileSystem DELETE API. Changing this behavior may break existing 
applications that assume storage will be released.
{quote}
I understand, and hence there is no attempt to change the FileSystem DELETE API 
itself; rather, the changes are limited to the NamenodeWebHdfsMethods endpoint only.
{quote}It seems counter-intuitive that one can skipTrash from the command line but 
cannot via WebHDFS. Since keeping data in Trash for a while is usually a 
good idea, I think I'm fine with this feature proposal. Ideally we can expose 
a -skipTrash parameter so users can choose.
{quote}
Thanks for this advice. The updated PR revision has this change.
{quote}When I explored, I found HDFS-14320 is all about the same idea and a similar 
implementation. Do you guys want to post there and try to collaborate to 
get this in? I did not look into that closely.
{quote}
Apologies, I was not aware of the existing Jira. Thanks for pointing this 
out. I have commented on that Jira as well. Since that Jira's last patch 
revision is more than 2 years old, I am not sure whether the patch is up to date 
or whether we will get an active response there.

 

Please find the attached screenshots and review the 
[PR|https://github.com/apache/hadoop/pull/2927] at your convenience. Thanks.
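As a rough illustration of the new option (a hypothetical client-side helper; the parameter names follow the curl examples in this thread, where skiptrash is the new optional flag):

```python
from urllib.parse import urlencode


def webhdfs_delete_url(host: str, path: str, recursive: bool = False,
                       skiptrash: bool = False) -> str:
    """Build a WebHDFS DELETE URL.

    Parameter names follow the curl examples in this thread;
    skiptrash is the new optional flag and is omitted by default,
    so existing clients keep the move-to-trash behaviour.
    """
    params = {"op": "DELETE", "recursive": str(recursive).lower()}
    if skiptrash:
        params["skiptrash"] = "true"
    return f"http://{host}/webhdfs/v1{path}?{urlencode(params)}"
```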


was (Author: vjasani):
{quote}Let's rename the JIRA subject and also the PR by replacing "Web UI" with 
"HTTP API". The HDFS "Web UI" is usually about the web portal that one can browse 
for information purposes. This JIRA is to change the "RESTful HTTP API", not 
the Web UI.
{quote}
Thanks [~liuml07], yes, the core changes are at the API level and hence I have 
updated the Jira as well as the PR title. However, in the recent revision, I have 
added a "skipTrash" option to the API, which can be accessed through the UI. Please 
find the attached screenshots.
{quote}My only concern about this is that the "Trash" concept is not a part of 
the FileSystem DELETE API. Changing this behavior may break existing 
applications that assume storage will be released.
{quote}
I understand, and hence there is no attempt to change the FileSystem DELETE API 
itself; rather, the changes are limited to the NamenodeWebHdfsMethods endpoint only.
{quote}It seems counter-intuitive that one can skipTrash from the command line but 
cannot via WebHDFS. Since keeping data in Trash for a while is usually a 
good idea, I think I'm fine with this feature proposal. Ideally we can expose 
a -skipTrash parameter so users can choose.
{quote}
Thanks for this advice. The updated PR revision has this change.
{quote}When I explored, I found HDFS-14320 is all about the same idea and a similar 
implementation. Do you guys want to post there and try to collaborate to 
get this in? I did not look into that closely.
{quote}
Apologies, I was not aware of the existing Jira. Thanks for pointing this 
out. I have commented on that Jira as well. Since that Jira's last patch 
revision is more than 2 years old, I am not sure whether the patch is up to date 
or whether we will get an active response there.

 

Please find the attached screenshots and review at your convenience. Thanks.


[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-04-23 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330398#comment-17330398
 ] 

Viraj Jasani edited comment on HDFS-15982 at 4/23/21, 11:37 AM:


{quote}Let's rename the JIRA subject and also the PR by replacing "Web UI" with 
"HTTP API". The HDFS "Web UI" is usually about the web portal that one can browse 
for information purposes. This JIRA is to change the "RESTful HTTP API", not 
the Web UI.
{quote}
Thanks [~liuml07], yes, the core changes are at the API level and hence I have 
updated the Jira as well as the PR title. However, in the recent revision, I have 
added a "skipTrash" option to the API, which can be accessed through the UI. Please 
find the attached screenshots.
{quote}My only concern about this is that the "Trash" concept is not a part of 
the FileSystem DELETE API. Changing this behavior may break existing 
applications that assume storage will be released.
{quote}
I understand, and hence there is no attempt to change the FileSystem DELETE API 
itself; rather, the changes are limited to the NamenodeWebHdfsMethods endpoint only.
{quote}It seems counter-intuitive that one can skipTrash from the command line but 
cannot via WebHDFS. Since keeping data in Trash for a while is usually a 
good idea, I think I'm fine with this feature proposal. Ideally we can expose 
a -skipTrash parameter so users can choose.
{quote}
Thanks for this advice. The updated PR revision has this change.
{quote}When I explored, I found HDFS-14320 is all about the same idea and a similar 
implementation. Do you guys want to post there and try to collaborate to 
get this in? I did not look into that closely.
{quote}
Apologies, I was not aware of the existing Jira. Thanks for pointing this 
out. I have commented on that Jira as well. Since that Jira's last patch 
revision is more than 2 years old, I am not sure whether the patch is up to date 
or whether we will get an active response there.

 

Please find the attached screenshots and review at your convenience. Thanks.


was (Author: vjasani):
{quote}Let's rename the JIRA subject and also the PR by replacing "Web UI" with 
"HTTP API". The HDFS "Web UI" is usually about the web portal that one can browse 
for information purposes. This JIRA is to change the "RESTful HTTP API", not 
the Web UI.
{quote}
Thanks [~liuml07], yes, the core changes are at the API level and hence I have 
updated the Jira as well as the PR title. However, in the recent revision, I have 
added a "skipTrash" option to the API, which can be accessed through the UI. Please 
find the attached screenshots.
{quote}My only concern about this is that the "Trash" concept is not a part of 
the FileSystem DELETE API. Changing this behavior may break existing 
applications that assume storage will be released.
{quote}
I understand, and hence there is no attempt to change the FileSystem DELETE API 
itself; rather, the changes are limited to the NamenodeWebHdfsMethods endpoint only.
{quote}It seems counter-intuitive that one can skipTrash from the command line but 
cannot via WebHDFS. Since keeping data in Trash for a while is usually a 
good idea, I think I'm fine with this feature proposal. Ideally we can expose 
a -skipTrash parameter so users can choose.
{quote}
Thanks for this advice. The updated PR revision has this change.
{quote}When I explored, I found [HDFS-14320] is all about the same idea and 
a similar implementation. Do you guys want to post there and try to 
collaborate to get this in? I did not look into that closely.
{quote}
Apologies, I was not aware of the existing Jira. Thanks for pointing this 
out. I have commented on that Jira as well. Since that Jira's last patch 
revision is more than 2 years old, I am not sure whether the patch is up to date 
or whether we will get an active response there.

 

!Screenshot 2021-04-23 at 4.19.42 PM.png!



