[ https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vadim Spector updated SENTRY-2014:
----------------------------------
    Description:
There are at least three places in the code where HDFS paths may not be parsed correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the path portion of the URI into a single slash. This method is used when getting paths data from the HMS store. HDFS paths with duplicate slashes are perfectly legal, and the specs refer to UNIX guidelines saying that multiple slashes should be treated as a single slash. If we keep multiple slashes in the path, such a path may be incorrectly split into path entries, some of which are empty, ultimately resulting in hard-to-troubleshoot ACL problems in the field. We should not assume that the URIs fed into parsePath() have already been normalized. It's easier to fix the code.

b) NotificationProcessor.splitPath() uses the regex "/" instead of the correct "/+". While the inputs to this class _may_ be controlled by Sentry software, which _may_ normalize paths properly, it is better not to make such assumptions and just fix the code.

c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from the database with path.split("/") instead of path.split("/+"). This may result in HDFS sync failures.

  was:
There are at least three places in the code where HDFS paths may not be parsed correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the path portion of the URI into a single slash. This method is used when getting paths data from the HMS store. HDFS paths with duplicate slashes are perfectly legal, and the specs refer to UNIX guidelines saying that multiple slashes should be treated as a single slash. If we keep multiple slashes in the path, such a path may be incorrectly split into path entries, some of which are empty, ultimately resulting in hard-to-troubleshoot ACL problems in the field. We should not assume that the URIs fed into parsePath() have already been normalized. It's easier to fix the code.

b) NotificationProcessor.splitPath() uses the regex "/" instead of the correct "/+". While the inputs to this class _may_ be controlled by Sentry software, which _may_ normalize paths properly, it is better not to make such assumptions and just fix the code.

c) SentryStore


> Incorrect handling of HDFS paths with multiple slashes
> ------------------------------------------------------
>
>                 Key: SENTRY-2014
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2014
>             Project: Sentry
>          Issue Type: Bug
>            Reporter: Vadim Spector
>            Assignee: Vadim Spector
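
To illustrate the difference between the two regexes described in (b) and (c), and the slash collapsing described in (a), here is a minimal standalone sketch. It is not the Sentry code: the class PathSplitDemo, its method names, the sample path, and the namenode:8020 authority are all hypothetical and chosen only for illustration. It relies only on String.split(), String.replaceAll(), and java.net.URI from the JDK.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; none of these names come from Sentry.
public class PathSplitDemo {

    // Hypothetical helper: drop leading slashes so the first entry is not empty.
    static String stripLeadingSlashes(String path) {
        return path.replaceFirst("^/+", "");
    }

    // Splits on every single slash, so "a//b" produces an empty entry.
    static List<String> splitNaive(String path) {
        return Arrays.asList(stripLeadingSlashes(path).split("/"));
    }

    // Splits on runs of slashes, so "a//b" produces only "a" and "b".
    static List<String> splitCollapsing(String path) {
        return Arrays.asList(stripLeadingSlashes(path).split("/+"));
    }

    // Takes the path portion of a URI and collapses duplicate slashes in it.
    // java.net.URI itself leaves duplicate slashes in getPath() untouched.
    static String normalizedPath(String uri) {
        return URI.create(uri).getPath().replaceAll("/+", "/");
    }

    public static void main(String[] args) {
        String path = "/user/hive//warehouse/db1.db";
        // prints [user, hive, , warehouse, db1.db]
        System.out.println(splitNaive(path));
        // prints [user, hive, warehouse, db1.db]
        System.out.println(splitCollapsing(path));
        // prints /user/hive/warehouse/db1.db
        System.out.println(normalizedPath("hdfs://namenode:8020/user/hive//warehouse/db1.db"));
    }
}
```

The empty entry produced by the "/" split is the kind of element that could end up in a path tree and no longer match the path that HDFS itself resolves, which is consistent with the ACL and sync problems described above.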