ijokarumawak commented on a change in pull request #3483: NIFI-6275 ListHDFS 
now ignores scheme and authority when uses "Full P…
URL: https://github.com/apache/nifi/pull/3483#discussion_r319786069
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java
 ##########
 @@ -527,7 +527,7 @@ private PathFilter createPathFilter(final ProcessContext context) {
         return path -> {
             final boolean accepted;
             if (FILTER_FULL_PATH_VALUE.getValue().equals(filterMode)) {
-                accepted = filePattern.matcher(path.toString()).matches();
+                accepted = filePattern.matcher(Path.getPathWithoutSchemeAndAuthority(path).toString()).matches();
 
 Review comment:
   If this improvement could break existing user flows, then I'd like to 
discuss other approaches that make it opt-in.
   
   Different approaches give users a different experience:
   1. Current approach: if the filter regex in an existing flow contains a 
scheme or authority, the flow will no longer list the files it listed before. 
Users may wonder what went wrong, and may not notice the change unless they 
read the docs.
   2. Add a new 'Filter without Schema and Authority' property:
       - A. If we leave its default value blank and add custom validation that 
requires a value whenever the filter regex is not empty, existing ListHDFS 
processors become invalid. That gives users a chance to review their 
configuration.
       - B. If we use `false` as the default value, existing flows work as is 
and the improvement is opt-in. This is the safest approach, but the downside 
is that people may forget to enable the option.
   3. Add a new 'Full Path (without schema and authority)' filter mode, or add 
the new one and rename the existing mode's display name to 'Full Path (include 
schema and authority)'. This guarantees that existing flows work as is, while 
keeping configuration easy for new setups.
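
To make the compatibility concern in option 1 concrete, here is a minimal sketch. It uses `java.net.URI` as a stand-in for Hadoop's `Path` so that it runs without Hadoop on the classpath: `uri.toString()` plays the role of `path.toString()`, and `uri.getPath()` approximates `Path.getPathWithoutSchemeAndAuthority(path).toString()`. The class name, host, port, and regexes are made up for illustration.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class FullPathFilterDemo {

    // Old behavior (stand-in): match the regex against the whole URI string.
    static boolean matchesFullUri(Pattern filePattern, URI uri) {
        return filePattern.matcher(uri.toString()).matches();
    }

    // New behavior (stand-in): match against the path component only,
    // i.e. with scheme and authority stripped.
    static boolean matchesPathOnly(Pattern filePattern, URI uri) {
        return filePattern.matcher(uri.getPath()).matches();
    }

    public static void main(String[] args) {
        // Hypothetical file location and filter regexes.
        URI file = URI.create("hdfs://namenode:8020/data/in/a.txt");
        Pattern withScheme = Pattern.compile("hdfs://namenode:8020/data/in/.*");
        Pattern pathOnly = Pattern.compile("/data/in/.*");

        // An existing flow whose regex includes scheme and authority:
        System.out.println(matchesFullUri(withScheme, file));  // true  (before the change)
        System.out.println(matchesPathOnly(withScheme, file)); // false (after the change)

        // A regex written against the bare path behaves the other way round:
        System.out.println(matchesFullUri(pathOnly, file));    // false (before the change)
        System.out.println(matchesPathOnly(pathOnly, file));   // true  (after the change)
    }
}
```

Because `matches()` requires the whole input to match, a regex written for one form silently stops matching under the other, which is how an existing flow could stop listing files without any error.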
   
   I personally prefer option 3 above. What do you think?
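
For discussion, option 3 could look roughly like the sketch below. This is not the ListHDFS code: the enum, its constant names, and `createFilter` are hypothetical, and `java.net.URI` again stands in for Hadoop's `Path` so the example is self-contained. The point is only that the existing mode keeps its current matching, while the new mode strips scheme and authority.

```java
import java.net.URI;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class FilterModeSketch {

    // Hypothetical filter modes; the real processor defines its modes as
    // allowable values such as FILTER_FULL_PATH_VALUE.
    enum FilterMode {
        FULL_PATH_WITH_SCHEME_AND_AUTHORITY,    // existing behavior, renamed for display only
        FULL_PATH_WITHOUT_SCHEME_AND_AUTHORITY  // new opt-in behavior
    }

    // Builds a filter that applies the regex to the string form selected by the mode.
    static Predicate<URI> createFilter(FilterMode mode, Pattern filePattern) {
        return uri -> {
            final String candidate =
                    mode == FilterMode.FULL_PATH_WITHOUT_SCHEME_AND_AUTHORITY
                            ? uri.getPath()     // stand-in for getPathWithoutSchemeAndAuthority
                            : uri.toString();   // stand-in for path.toString()
            return filePattern.matcher(candidate).matches();
        };
    }

    public static void main(String[] args) {
        URI file = URI.create("hdfs://namenode:8020/data/in/a.txt");

        // Existing flow: unchanged mode, regex includes scheme and authority.
        Predicate<URI> existing = createFilter(
                FilterMode.FULL_PATH_WITH_SCHEME_AND_AUTHORITY,
                Pattern.compile("hdfs://namenode:8020/data/in/.*"));
        System.out.println(existing.test(file)); // true

        // New setup: opts in to the new mode with a shorter regex.
        Predicate<URI> fresh = createFilter(
                FilterMode.FULL_PATH_WITHOUT_SCHEME_AND_AUTHORITY,
                Pattern.compile("/data/in/.*"));
        System.out.println(fresh.test(file)); // true
    }
}
```

With this shape, an existing flow keeps listing the same files because its stored mode still matches against the full string, and only flows that explicitly select the new mode see the changed behavior.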

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
