[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359902#comment-17359902 ]

Viraj Jasani edited comment on HDFS-15982 at 6/9/21, 3:46 PM:
--------------------------------------------------------------

{quote}delete(path) MUST be a no-op if the path isn't there. The way to view the semantics of the call is that delete(path) == true implies the path is no longer present.{quote}

[~ste...@apache.org] It seems that we don't follow this everywhere. The DFS client path (NameNode -> FSNamesystem#delete -> FSDirDeleteOp#delete -> deleteInternal -> delete -> deleteAllowed) doesn't follow it, and I just quickly tested HttpFS with WebHdfs and LocalFS; the semantic is not followed there either. For a non-existing file, FS#delete returns false. Although I agree that delete(path) should return true for a non-existing path, changing this behaviour (as part of a separate Jira) would be another incompatible change.

Edit: Even ABFS returns false when deleting a non-existent path, it seems:
{code:java}
try {
  abfsStore.delete(qualifiedPath, recursive);
  return true;
} catch (AzureBlobFileSystemException ex) {
  checkException(f, ex, AzureServiceErrorCode.PATH_NOT_FOUND);
  return false;
}
{code}
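The contract question above can be reproduced outside Hadoop; this is a minimal sketch using plain java.nio (not the Hadoop FileSystem API), illustrating the same "returns false for a missing path" behaviour that FS#delete exhibits:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeleteContractSketch {
    public static void main(String[] args) throws IOException {
        // A path that does not exist: deleteIfExists reports false rather
        // than treating the call as a successful no-op, mirroring the
        // FS#delete behaviour discussed in the comment above.
        Path missing = Paths.get("/tmp/no-such-file-" + System.nanoTime());
        System.out.println("missing deleted=" + Files.deleteIfExists(missing));

        // A path that does exist: delete succeeds and reports true.
        Path present = Files.createTempFile("delete-contract", ".tmp");
        System.out.println("present deleted=" + Files.deleteIfExists(present));
    }
}
```

Running it prints `missing deleted=false` then `present deleted=true`, which is the behaviour the comment argues is incompatible with the "true implies no longer present" reading of the contract.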
> Deleted data using HTTP API should be saved to the trash
> --------------------------------------------------------
>                 Key: HDFS-15982
>                 URL: https://issues.apache.org/jira/browse/HDFS-15982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs, hdfs-client, httpfs, webhdfs
>            Reporter: Bhavik Patel
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 2021-04-23 at 4.36.57 PM.png
>
>          Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the configured/default Trash directory and only removed after the trash interval elapses. Currently, data is removed from the system directly [this behavior should be the same as the CLI command]. This can be helpful when a user accidentally deletes data from the Web UI. Similarly, we should provide a "Skip Trash" option in the HTTP API as well, accessible through the Web UI.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339535#comment-17339535 ]

Viraj Jasani edited comment on HDFS-15982 at 5/8/21, 12:27 PM:
---------------------------------------------------------------

{quote}Regarding the UI. If the trash interval isn't set and If I select NO(move to trash), It still deletes with success? Check if the behaviour is like that, The client may be in wrong impression that things moved to trash, but it actually didn't. We should have bugged him back, Trash isn't enabled.{quote}

If the trash interval isn't set and we select NO, it does delete with success (as per the logic [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L1560]: a file moves to trash only if skiptrash is false and trashInterval > 0):
{code:java}
case DELETE: {
  Configuration conf =
      (Configuration) context.getAttribute(JspHelper.CURRENT_CONF);
  long trashInterval =
      conf.getLong(FS_TRASH_INTERVAL_KEY, FS_TRASH_INTERVAL_DEFAULT);
  if (trashInterval > 0 && !skipTrash.getValue()) {
    LOG.info("{} is {} , trying to archive {} instead of removing",
        FS_TRASH_INTERVAL_KEY, trashInterval, fullpath);
    org.apache.hadoop.fs.Path path =
        new org.apache.hadoop.fs.Path(fullpath);
    Configuration clonedConf = new Configuration(conf);
    // To avoid caching FS objects and prevent OOM issues
    clonedConf.set("fs.hdfs.impl.disable.cache", "true");
    FileSystem fs = FileSystem.get(clonedConf);
    boolean movedToTrash = Trash.moveToAppropriateTrash(fs, path, clonedConf);
    if (movedToTrash) {
      final String js = JsonUtil.toJsonString("boolean", true);
      return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
    }
    // Same is the behavior with Delete shell command.
    // If moveToAppropriateTrash() returns false, file deletion
    // is attempted rather than throwing Error.
    LOG.debug("Could not move {} to Trash, attempting removal", fullpath);
  }
  final boolean b = cp.delete(fullpath, recursive.getValue());
  final String js = JsonUtil.toJsonString("boolean", b);
  return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
}
{code}

I think in the UI we can provide additional info in the same modal: "These buttons are useful only if fs.trash.interval has been configured. Without setting the interval, files will be hard deleted anyway."

I just saw HDFS-14117 on using trash with RBF: "delete the files or dirs of one subcluster in a cluster with multiple subclusters".

{quote}Router would also have issues, if the trash path resolves to different NS, or to some path which isn't in the Mount Table, when Default Namespace isn't configured.{quote}

I tried to explore whether we can replace everything Trash.moveToAppropriateTrash() offers with what ClientProtocol provides (that way router and mount table resolution would be taken care of), but it seems almost impossible to replace all the FileSystem utilities used by Trash.

Considering all the discussion so far (and further feedback from Ayush and Wei-Chiu), my proposals for addendum PR#2976:
* Keep the default value of skiptrash true (thereby keeping the default behaviour of the HTTP API compatible with existing releases)
* Update docs and tests accordingly
* Let the NamenodeWebHdfsMethods DELETE endpoint perform the "move file to .Trash dir" operation with skiptrash false only if the NameNode uses NameNodeRpcServer (i.e. the default HDFS FileSystem), because RouterRpcServer requires special treatment that the Trash utility does not provide as of today. If we agree to this, we need to document the option properly. This is not yet present in addendum PR#2976; I will try it only once we have consensus.
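The decision the endpoint makes can be distilled into a pure function. This is a hedged sketch (a hypothetical helper, not the actual NamenodeWebHdfsMethods code), which omits the fall-back to a hard delete when moveToAppropriateTrash() returns false:

```java
public class TrashDecisionSketch {
    // Mirrors the condition in the quoted DELETE case: a trash move is
    // attempted only when fs.trash.interval > 0 AND skiptrash=false.
    // Illustrative helper only; the real endpoint additionally falls back
    // to cp.delete() if the trash move does not succeed.
    static boolean shouldMoveToTrash(long trashIntervalMinutes, boolean skipTrash) {
        return trashIntervalMinutes > 0 && !skipTrash;
    }

    public static void main(String[] args) {
        // Trash disabled: even "move to trash" requests hard-delete.
        System.out.println(shouldMoveToTrash(0, false));
        // Trash enabled and not skipped: trash move is attempted.
        System.out.println(shouldMoveToTrash(360, false));
        // Caller explicitly skipped trash: hard delete.
        System.out.println(shouldMoveToTrash(360, true));
    }
}
```

The first case is exactly the surprising one raised in the quote above: with fs.trash.interval unset (0), selecting "move to trash" still results in a hard delete.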
[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339438#comment-17339438 ]

Viraj Jasani edited comment on HDFS-15982 at 5/5/21, 6:30 AM:
--------------------------------------------------------------

{quote}I was following the older jira. [~daryn] had a comment: You cannot or should not create a default fs and parse a path in the NN. It's very dangerous. Give me some time (that I don't have) and I'd likely come up with a nasty exploit.{quote}

This seems interesting. Although yes, this is the default fs, it is instantiated from the WebHdfs config object only (which is used by all endpoints in NamenodeWebHdfsMethods). Is the WebHdfs server implementation used for any other FileSystem (from current and future viewpoints)?

{quote}Router part needs to be checked again and confirmed, Trash itself has issues with Router(There are Jiras). So, if Trash becomes true by default, I doubt delete through Router don't break or moves to some weird place.{quote}

Oops, I was not aware of the existing concerns with router path resolution within Trash. I think this is a fair point for making this Jira a compatible change w.r.t. DELETE REST API calls. Let me provide an addendum for trunk to bring the default value of skiptrash to true. The branch-3.3 backport PR#2925 is pending. Shall we get it in first, and then I can provide the addendum for both trunk and branch-3.3 for a clean history?

Btw [~ayushtkn] you might also want to check the screenshots attached to this Jira to see how skiptrash is handled from the Web UI.
[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338945#comment-17338945 ]

Viraj Jasani edited comment on HDFS-15982 at 5/4/21, 2:24 PM:
--------------------------------------------------------------

Thanks for the questions [~ayushtkn]. All of the examples below use a positive value of the fs.trash.interval config (to enable trash through the core-site config).

{quote} * If my FileSystem is WebHdfs, If I call delete with recursive false on a non empty directory, will it delete now or throw me an exception? Should be a NO{quote}

{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz?op=DELETE&recursive=false"
{"boolean":true}
{code}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz?op=DELETE&recursive=false&skiptrash=true"
{"RemoteException":{"exception":"PathIsNotEmptyDirectoryException","javaClassName":"org.apache.hadoop.fs.PathIsNotEmptyDirectoryException","message":"`/xyz is non empty': Directory is not empty"}}
{code}

recursive is used by fs.delete() only and not by fs.rename(), hence moving a non-empty folder to trash works fine. However, for the skiptrash=true case, should we just return \{"boolean":false}?

{quote} * if my FileSystem is WebHdfs, If I call delete on a non-existing file, the response will be false? or Now an exception.{quote}

{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz1?op=DELETE&recursive=false"
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File does not exist: /xyz1"}}
{code}
{code:java}
$ curl -X DELETE "http://localhost:9870/webhdfs/v1/xyz1?op=DELETE&recursive=false&skiptrash=true"
{"boolean":false}
{code}

Similarly here, for the move-to-trash case, should we return \{"boolean":false}?

{quote} * How does this trash path resolution behaves when the client is coming through Router?{quote}

I think the resolution should be taken care of by FileSystem [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L1573-L1575]. For the above cases where exceptions are being thrown instead of returning \{"boolean":false}, let me file a follow-up subtask soon once you confirm the expected behaviour [~ayushtkn]. Thanks
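The curl requests above all share the same URL shape. A small sketch of building that URL (a hypothetical helper for illustration, not the WebHdfsFileSystem client implementation; only the op, recursive, and skiptrash parameters shown here appear in the examples above):

```java
public class WebHdfsDeleteUrlSketch {
    // Builds a WebHDFS DELETE URL like the ones used in the curl examples.
    // The path must already start with "/"; host/port default to the
    // NameNode HTTP address used above (localhost:9870).
    static String deleteUrl(String host, int port, String path,
                            boolean recursive, boolean skipTrash) {
        return String.format(
            "http://%s:%d/webhdfs/v1%s?op=DELETE&recursive=%b&skiptrash=%b",
            host, port, path, recursive, skipTrash);
    }

    public static void main(String[] args) {
        System.out.println(deleteUrl("localhost", 9870, "/xyz", false, true));
    }
}
```

This reproduces the second curl example's URL: `http://localhost:9870/webhdfs/v1/xyz?op=DELETE&recursive=false&skiptrash=true`.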
[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331801#comment-17331801 ] Viraj Jasani edited comment on HDFS-15982 at 4/26/21, 12:44 PM:

{quote}[~vjasani] I think [~weichiu] is trying to say that along with the webhdfs API we also have to consider the behavior of the httpfs (server) delete API.{quote}
Oh I see, my bad. Yes, this is also taken care of [here|https://github.com/apache/hadoop/pull/2927/files#diff-635868fc4dbae8e9368423032df25338bcde0e9bad63e0a11e5e328db7dcfbccR261].

was (Author: vjasani):
{quote}[~vjasani] I think [~weichiu] is trying to say that along with the webhdfs API we also have to consider the behavior of the httpfs (server) delete API.{quote}
Oh I see, my bad. Yes, this is also taken care of [here|https://github.com/apache/hadoop/pull/2927/files#diff-b8d1575f2afb5b04a56f4a43eb8ed4387fbef8ebe51c4503052413a2887cf96bR747].
[jira] [Comment Edited] (HDFS-15982) Deleted data using HTTP API should be saved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330398#comment-17330398 ] Viraj Jasani edited comment on HDFS-15982 at 4/23/21, 11:39 AM:

{quote}Let's rename the JIRA subject and also the PR by replacing "Web UI" with "HTTP API". The HDFS "Web UI" is usually the web portal that one can browse for informational purposes. This JIRA is to change the "RESTful HTTP API", not the Web UI.{quote}
Thanks [~liuml07], yes, the core changes are at the API level, and hence I have updated the Jira and PR titles. However, in the recent revision, I have added a "skipTrash" option to the API, which is accessible through the UI. Please find the attached screenshots.
{quote}My only concern about this is that the "Trash" concept is not a part of the FileSystem DELETE API. Changing this behavior may break existing applications that assume storage will be released.{quote}
I understand, and hence there is no attempt to change the FileSystem DELETE API itself; rather, the changes are limited to the NamenodeWebHdfsMethods endpoint only.
{quote}It seems counter-intuitive that one can skipTrash from the command line but cannot when using WebHDFS. Since keeping data in Trash for a while is usually a good idea, I think I'm fine with this feature proposal. Ideally we can expose a -skipTrash parameter so users can choose.{quote}
Thanks for this advice. The updated PR revision has this change.
{quote}When I explored, I found HDFS-14320 is about the same idea with a similar implementation. Do you guys want to post there and try to collaborate to get this in? I did not look into that closely.{quote}
Apologies, I was not aware of the existing Jira. Thanks for pointing it out. I have commented on that Jira as well. Since that Jira's last patch revision is more than two years old, I am not sure whether the patch is up to date or we will get an active response there.
Please find the attached screenshots and review the [PR|https://github.com/apache/hadoop/pull/2927] at your convenience. Thanks.

was (Author: vjasani):
{quote}Let's rename the JIRA subject and also the PR by replacing "Web UI" with "HTTP API". The HDFS "Web UI" is usually the web portal that one can browse for informational purposes. This JIRA is to change the "RESTful HTTP API", not the Web UI.{quote}
Thanks [~liuml07], yes, the core changes are at the API level, and hence I have updated the Jira and PR titles. However, in the recent revision, I have added a "skipTrash" option to the API, which can be accessed through the UI. Please find the attached screenshots.
{quote}My only concern about this is that the "Trash" concept is not a part of the FileSystem DELETE API. Changing this behavior may break existing applications that assume storage will be released.{quote}
I understand, and hence there is no attempt to change the FileSystem DELETE API itself; rather, the changes are limited to the NamenodeWebHdfsMethods endpoint only.
{quote}It seems counter-intuitive that one can skipTrash from the command line but cannot when using WebHDFS. Since keeping data in Trash for a while is usually a good idea, I think I'm fine with this feature proposal. Ideally we can expose a -skipTrash parameter so users can choose.{quote}
Thanks for this advice. The updated PR revision has this change.
{quote}When I explored, I found HDFS-14320 is about the same idea with a similar implementation. Do you guys want to post there and try to collaborate to get this in? I did not look into that closely.{quote}
Apologies, I was not aware of the existing Jira. Thanks for pointing it out. I have commented on that Jira as well. Since that Jira's last patch revision is more than two years old, I am not sure whether the patch is up to date or we will get an active response there. Please find the attached screenshots and review at your convenience. Thanks.
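The behaviour discussed throughout this thread is that a DELETE should go to trash only when trash is enabled server-side (fs.trash.interval is positive) and the caller has not passed skipTrash. That decision can be sketched as a small helper; this is an illustrative sketch only, not the actual NamenodeWebHdfsMethods code, and the function names are hypothetical:

```python
def should_move_to_trash(trash_interval_minutes: float, skip_trash: bool) -> bool:
    """Trash is used only when fs.trash.interval is positive (trash
    enabled via core-site config) and the caller did not request
    skipTrash, mirroring the CLI's -skipTrash behaviour."""
    return trash_interval_minutes > 0 and not skip_trash

def handle_delete(path: str, skip_trash: bool,
                  trash_interval_minutes: float) -> str:
    # Hypothetical server-side dispatch: move to trash or delete outright.
    if should_move_to_trash(trash_interval_minutes, skip_trash):
        return f"moved {path} to trash"
    return f"deleted {path} permanently"
```

With trash enabled (for example fs.trash.interval=1440), a plain DELETE would be trashed, while adding skiptrash=true would delete permanently, matching the command-line semantics the reviewers asked for.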