[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

2021-10-11 Thread GitBox


Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726491261



##
File path: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test/resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {

Review comment:
   I'm not sure how this is expected to work on Windows, as org.apache.hadoop.fs.Path uses Unix-style `"/"` separators while the tests use Java file paths, whose separators are platform-dependent. One possible solution might be using `FilenameUtils::separatorsToSystem` on `fetchEvent.getTransitUri()`; however, this needs to be checked.
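   To illustrate the idea, here is a dependency-free sketch (not the actual NiFi test code): `separatorsToSystem` below is a hand-rolled stand-in mirroring what Apache Commons IO's `FilenameUtils.separatorsToSystem` does, applied to both sides of the `endsWith` check.

```java
import java.io.File;

// Sketch of the suggested fix: normalize both operands to the platform's
// separator before comparing, so the assertion holds on Windows ("\")
// as well as on Unix ("/"). This stand-in mirrors the behaviour of
// org.apache.commons.io.FilenameUtils.separatorsToSystem.
public class SeparatorsToSystemSketch {

    static String separatorsToSystem(final String path) {
        if (path == null) {
            return null;
        }
        return File.separatorChar == '\\'
                ? path.replace('/', '\\')  // Windows
                : path.replace('\\', '/'); // Unix
    }

    public static void main(String[] args) {
        final String transitUri = "file:///work/src/test/resources/testdata/randombytes-1";
        final String file = "src/test/resources/testdata/randombytes-1";
        // Both operands end up in the same separator style before endsWith.
        System.out.println(separatorsToSystem(transitUri).endsWith(separatorsToSystem(file)));
    }
}
```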




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

2021-10-11 Thread GitBox


Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726488053



##
File path: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test/resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {
+        final File destination = new File("src/test/resources/testdata/randombytes-1");
+        final String file = destination.getAbsolutePath();
+        final String fileWithMultipliedSeparators = "/" + destination.getAbsolutePath();

Review comment:
   I think you can use `file` instead of calling `getAbsolutePath()`.








[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

2021-10-11 Thread GitBox


Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726060005



##
File path: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test/resources//testdata/randombytes-1";

Review comment:
   Minor: Would you please remove the `throws IOException`?








[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

2021-10-11 Thread GitBox


Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726059622



##
File path: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test/resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {

Review comment:
   Minor: Would you please remove the `throws IOException`?








[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

2021-10-11 Thread GitBox


Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726058729



##
File path: 
nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test/resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {

Review comment:
   I'd recommend using Hamcrest matchers for asserting that the string ends with the filename. Currently, when it fails, we can't see the difference.
   
   ```suggestion
   ...
   assertThat(fetchEvent.getTransitUri(), StringEndsWith.endsWith(file));
   ```
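   A minimal, dependency-free illustration of why the matcher helps (plain Java, no Hamcrest or JUnit on the classpath; `assertEndsWith` is a hypothetical helper that reports both values the way `StringEndsWith` would):

```java
// Illustration of the point above: assertTrue(s.endsWith(x)) can only report
// "expected true", while a matcher-style check reports both values on failure.
// assertEndsWith is a hypothetical helper, not part of JUnit or Hamcrest.
public class EndsWithFailureMessage {

    static void assertEndsWith(final String actual, final String suffix) {
        if (!actual.endsWith(suffix)) {
            throw new AssertionError("Expected a string ending with \"" + suffix
                    + "\" but was: \"" + actual + "\"");
        }
    }

    public static void main(String[] args) {
        assertEndsWith("file:///tmp/randombytes-1", "randombytes-1"); // passes silently
        try {
            assertEndsWith("file:///tmp/some-other-file", "randombytes-1");
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // shows both strings
        }
    }
}
```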









[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

2021-10-08 Thread GitBox


Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r725132050



##
File path: 
nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+        return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {

Review comment:
   What do you think of removing the Optional from the method parameter? I think it could be a plain String with a null check. If I remember correctly, Optionals are not meant to be used as parameters because they introduce a third state (null, empty, and present) instead of just null and not-null. I think passing null to this method would be fine in this case, and I could also find several examples in the code where null is passed as the FlowFile.
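   A sketch of the alternative (the method and message formatting are illustrative, not the actual AbstractHadoopProcessor code): the private method takes a nullable String and branches on a plain null check, so the no-property caller simply passes null.

```java
// Sketch of the null-check alternative to Optional<String>. Names and the
// warn-message wording are illustrative, not the real NiFi code.
public class NullablePropertyNameSketch {

    private static String buildWarning(final String uri, final String propertyName) {
        // null simply means "no property name to include in the warning"
        if (propertyName != null) {
            return "URI configured in the '" + propertyName + "' property (" + uri
                    + ") does not match the filesystem URI and will be ignored.";
        }
        return "URI configured (" + uri + ") does not match the filesystem URI and will be ignored.";
    }

    public static void main(String[] args) {
        // Caller that knows the property display name...
        System.out.println(buildWarning("hdfs://other/data", "Filename"));
        // ...and caller without one passes null instead of Optional.empty().
        System.out.println(buildWarning("hdfs://other/data", null));
    }
}
```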








[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

2021-10-08 Thread GitBox


Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r724562953



##
File path: 
nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+        return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {
+        final URI uri = new Path(rawPath).toUri();
         final URI fileSystemUri = getFileSystem().getUri();
+        final String path;
 
         if (uri.getScheme() != null) {
             if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured in the '{}' property ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", property.getDisplayName(), uri, fileSystemUri);
+                if (propertyName.isPresent()) {

Review comment:
   Could you please add a test case which covers this part? It seems that Path normalizes the paths automatically, so no difference related to that can be seen between the if and else branches.
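   The normalization being referred to can be seen with the JDK alone (Hadoop's org.apache.hadoop.fs.Path collapses duplicate separators similarly at construction time; this sketch uses `java.nio.file.Paths` as a dependency-free stand-in, behaviour shown for Unix):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Dependency-free illustration: on Unix, java.nio collapses duplicate
// separators while parsing the path string, much as Hadoop's Path
// normalizes "//" runs when it is constructed.
public class DuplicateSeparatorDemo {
    public static void main(String[] args) {
        final Path p = Paths.get("/data//nested///randombytes-1");
        System.out.println(p); // duplicate separators are gone
    }
}
```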

##
File path: 
nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
 }
 
 protected Path getNormalizedPath(final String rawPath) {

Review comment:
   This is only called from FetchHDFS line 128, where the getPath method uses the _FILENAME_ property:
   
   ```
   protected String getPath(final ProcessContext context, final FlowFile flowFile) {
       return context.getProperty(FILENAME).evaluateAttributeExpressions(flowFile).getValue();
   }
   ```
   
   I think if we call
   
   `path = getNormalizedPath(context, FILENAME, flowFile);`
   
   there, we can eliminate this method and also the Optional propertyName parameter from AbstractHadoopProcessor line 692.

##
File path: 
nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+        return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue,