[ 
https://issues.apache.org/jira/browse/NIFI-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Storck updated NIFI-6275:
------------------------------
    Description: 
When using the *{{Full Path}}* filter mode, the regex is applied to the URI 
returned for each file, which includes the scheme and authority (hostname, HA 
namespace, port).  For the filter to work across multiple HDFS installations 
(such as a flow retrieved from NiFi Registry and used in multiple 
environments), the regex would have to match every possible scheme and 
authority value.

To make it easier for the user, the *{{Full Path}}* filter mode's filter regex 
should only be applied to the path components of the URI, without the scheme 
and authority.  This can be done by updating the filter for *{{Full Path}}* 
mode to use 
[Path.getPathWithoutSchemeAndAuthority(Path)|https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/Path.html#getPathWithoutSchemeAndAuthority-org.apache.hadoop.fs.Path-].
This will bring the regex values in line with the other modes, since those 
are only applied to the value of *{{Path.getName()}}*.
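
The difference between the two behaviors can be sketched as follows.  This is 
only an illustration under stated assumptions, not the ListHDFS code: 
{{java.net.URI#getPath}} stands in for Hadoop's 
{{Path.getPathWithoutSchemeAndAuthority(Path)}} so the sketch runs without 
Hadoop on the classpath, whole-string matching is assumed, and the class name, 
host names and regex are hypothetical.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of NiFi or Hadoop.
public class FullPathFilterSketch {

    // Behavior described above: the regex is matched against the whole URI,
    // including scheme and authority. Whole-string matching is an assumption.
    public static boolean matchesFullUri(Pattern filter, String fileUri) {
        return filter.matcher(fileUri).matches();
    }

    // Proposed behavior: drop scheme and authority, then match the path only.
    // URI#getPath is a stand-in for Path.getPathWithoutSchemeAndAuthority(Path).
    public static boolean matchesPathOnly(Pattern filter, String fileUri) {
        String pathOnly = URI.create(fileUri).getPath();
        return filter.matcher(pathOnly).matches();
    }

    public static void main(String[] args) {
        // Hypothetical filter and file URIs from two environments.
        Pattern filter = Pattern.compile("/data/in/.*\\.csv");
        String dev = "hdfs://dev-nn:8020/data/in/a.csv";
        String prod = "hdfs://prod-ha/data/in/a.csv";

        System.out.println(matchesFullUri(filter, dev));   // false
        System.out.println(matchesPathOnly(filter, dev));  // true
        System.out.println(matchesPathOnly(filter, prod)); // true
    }
}
```

With the path-only variant, the same regex matches in both environments 
because the scheme and authority never reach the matcher.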

Migration guidance will be needed when this improvement is released.  Existing 
regex values for *{{Full Path}}* filter mode that accepted any scheme and 
authority will still work.  Those that specify a scheme and authority will 
*_not_* work and will have to be updated to specify only path components.
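
As a rough illustration of the migration impact, the sketch below checks a few 
regex values against a path that has already had its scheme and authority 
removed.  The class name, regex values and path are hypothetical, and 
whole-string matching is assumed.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not part of NiFi or Hadoop.
public class FullPathMigrationSketch {

    // Returns true if the regex matches the whole path-only string.
    // Whole-string matching is an assumption of this sketch.
    public static boolean stillMatches(String regex, String pathOnly) {
        return Pattern.compile(regex).matcher(pathOnly).matches();
    }

    public static void main(String[] args) {
        // What the filter would see once scheme and authority are removed.
        String pathOnly = "/data/in/a.csv";

        // Accepts any scheme and authority via a leading wildcard: still works.
        System.out.println(stillMatches(".*/data/in/.*\\.csv", pathOnly)); // true

        // Specifies a scheme and authority: no longer matches.
        System.out.println(
            stillMatches("hdfs://dev-nn:8020/data/in/.*\\.csv", pathOnly)); // false

        // Same filter rewritten to path components only: matches again.
        System.out.println(stillMatches("/data/in/.*\\.csv", pathOnly)); // true
    }
}
```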

  was:
When using the *{{Full Path}}* filter mode, the regex is applied to the URI 
returned for each file which includes the scheme and authority (hostname, HA 
namespace, port).  For the filter to work across multiple HDFS installations 
(such as a flow used on multiple environments that is retrieved from NiFi 
Registry), the regex filter would have to account for the scheme and authority 
by matching possible scheme and authority values.

To make it easier for the user, the *{{Full Path}}* filter mode's filter regex 
should only be applied to the path components of the URI, without the scheme 
and authority.  This can be done by updating the filter for *{{Full Path}}* 
mode to use: 
[https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/Path.html#getPathWithoutSchemeAndAuthority-org.apache.hadoop.fs.Path-|Path.getPathWithoutSchemeAndAuthority(Path)].
  This will bring the regex values in line with the other modes, since those 
are only applied to the value of *{{Path.getName()}}*.

Migration guidance will be needed when this improvement is released.  Existing 
regex values for *{{Full Path}}* filter mode that accepted any scheme and 
authority will still work. 
 Those that specify a scheme and authority will *_not_* work, and will have to 
be updated to specify only path components.


> ListHDFS with Full Path filter mode regex does not work as intended
> -------------------------------------------------------------------
>
>                 Key: NIFI-6275
>                 URL: https://issues.apache.org/jira/browse/NIFI-6275
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Documentation & Website, Extensions
>    Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.9.2
>            Reporter: Jeff Storck
>            Assignee: Jeff Storck
>            Priority: Minor
>
> When using the *{{Full Path}}* filter mode, the regex is applied to the URI 
> returned for each file, which includes the scheme and authority (hostname, HA 
> namespace, port).  For the filter to work across multiple HDFS installations 
> (such as a flow retrieved from NiFi Registry and used in multiple 
> environments), the regex would have to match every possible scheme and 
> authority value.
> To make it easier for the user, the *{{Full Path}}* filter mode's filter 
> regex should only be applied to the path components of the URI, without the 
> scheme and authority.  This can be done by updating the filter for *{{Full 
> Path}}* mode to use 
> [Path.getPathWithoutSchemeAndAuthority(Path)|https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/Path.html#getPathWithoutSchemeAndAuthority-org.apache.hadoop.fs.Path-].
> This will bring the regex values in line with the other modes, since those 
> are only applied to the value of *{{Path.getName()}}*.
> Migration guidance will be needed when this improvement is released.  
> Existing regex values for *{{Full Path}}* filter mode that accepted any 
> scheme and authority will still work.  Those that specify a scheme and 
> authority will *_not_* work and will have to be updated to specify only 
> path components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
