Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-12-22 Thread via GitHub


exceptionfactory closed pull request #7999: NIFI-12334: Implement GCS option 
for FileResourceService
URL: https://github.com/apache/nifi/pull/7999


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-12-17 Thread via GitHub


mark-bathori commented on PR #7999:
URL: https://github.com/apache/nifi/pull/7999#issuecomment-1859116369

   Thanks @dan-s1 for the detailed review. I've updated the PR accordingly.





Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-12-15 Thread via GitHub


dan-s1 commented on code in PR #7999:
URL: https://github.com/apache/nifi/pull/7999#discussion_r1428160148


##
nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/test/java/org/apache/nifi/processors/gcp/storage/GCSFileResourceServiceTest.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.gcp.storage;
+
+import com.google.auth.oauth2.GoogleCredentials;
+import com.google.cloud.storage.Blob;
+import com.google.cloud.storage.BlobId;
+import com.google.cloud.storage.Storage;
+import com.google.cloud.storage.testing.RemoteStorageHelper;
+import org.apache.nifi.fileresource.service.api.FileResource;
+import org.apache.nifi.gcp.credentials.service.GCPCredentialsService;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processors.gcp.credentials.service.GCPCredentialsControllerService;
+import org.apache.nifi.processors.gcp.util.MockReadChannel;
+import org.apache.nifi.reporting.InitializationException;
+import org.apache.nifi.util.NoOpProcessor;
+import org.apache.nifi.util.TestRunner;
+import org.apache.nifi.util.TestRunners;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.nifi.processors.gcp.util.GoogleUtils.GCP_CREDENTIALS_PROVIDER_SERVICE;
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.reset;

Review Comment:
   ```suggestion
   ```






Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-12-15 Thread via GitHub


dan-s1 commented on code in PR #7999:
URL: https://github.com/apache/nifi/pull/7999#discussion_r1428126047


##
nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/test/java/org/apache/nifi/processors/gcp/storage/GCSFileResourceServiceTest.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.gcp.storage;
+
+import com.google.auth.oauth2.GoogleCredentials;
+import com.google.cloud.storage.Blob;
+import com.google.cloud.storage.BlobId;
+import com.google.cloud.storage.Storage;
+import com.google.cloud.storage.testing.RemoteStorageHelper;
+import org.apache.nifi.fileresource.service.api.FileResource;
+import org.apache.nifi.gcp.credentials.service.GCPCredentialsService;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processors.gcp.credentials.service.GCPCredentialsControllerService;
+import org.apache.nifi.processors.gcp.util.MockReadChannel;
+import org.apache.nifi.reporting.InitializationException;
+import org.apache.nifi.util.NoOpProcessor;
+import org.apache.nifi.util.TestRunner;
+import org.apache.nifi.util.TestRunners;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.nifi.processors.gcp.util.GoogleUtils.GCP_CREDENTIALS_PROVIDER_SERVICE;
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.reset;
+import static org.mockito.Mockito.when;
+
+public class GCSFileResourceServiceTest {
+
+    private static final String TEST_NAME = GCSFileResourceServiceTest.class.getSimpleName();
+    private static final String CONTROLLER_SERVICE = "GCPCredentialsService";
+    private static final String BUCKET = RemoteStorageHelper.generateBucketName();
+    private static final String KEY = "test-file";
+    private static final String TEST_DATA = "test-data";
+
+    @Mock
+    Storage storage;
+
+    private TestRunner runner;
+    private TestGCSFileResourceService service;
+    private AutoCloseable mocksCloseable;
+
+    @BeforeEach
+    void setup() throws InitializationException {
+        mocksCloseable = MockitoAnnotations.openMocks(this);
+
+        reset(storage);
+        mockBlob();
+        service = new TestGCSFileResourceService(storage);
+        runner = TestRunners.newTestRunner(NoOpProcessor.class);
+        runner.addControllerService(TEST_NAME, service);
+    }
+
+    @AfterEach
+    public void cleanup() throws Exception {
+        final AutoCloseable closeable = mocksCloseable;
+        mocksCloseable = null;
+        if (closeable != null) {
+            closeable.close();
+        }
+    }
+
+    @Test
+    void testValidBlob() throws InitializationException, IOException {
+        setUpService(KEY, BUCKET);
+
+        final FileResource fileResource = service.getFileResource(Collections.emptyMap());
+
+        assertFileResource(fileResource);
+    }
+
+    @Test
+    void testValidBlobUsingEL() throws IOException, InitializationException {
+        final Map<String, String> attributes = setUpServiceWithEL(KEY, BUCKET);
+
+        final FileResource fileResource = service.getFileResource(attributes);
+
+        assertFileResource(fileResource);
+    }
+
+    @Test
+    void testValidBlobUsingELButMissingAttribute() throws InitializationException {
+        runner.setValidateExpressionUsage(false);
+
+        setUpServiceWithEL(KEY, BUCKET);
+
+        assertThrows(IllegalArgumentException.class, () -> service.getFileResource(Collections.emptyMap()));
+    }
+
+    @Test
+    void testNonExistingBlob() throws InitializationException {
+        final Map<String, String> attributes = setUpServiceWithEL("invalid-key", "invalid-bucket");
+
+

Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-12-05 Thread via GitHub


exceptionfactory commented on code in PR #7999:
URL: https://github.com/apache/nifi/pull/7999#discussion_r1415731745


##
nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/main/java/org/apache/nifi/processors/gcp/fileresource/service/GCSFileResourceService.java:
##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.gcp.fileresource.service;
+
+import com.google.auth.oauth2.GoogleCredentials;
+import com.google.cloud.ReadChannel;
+import com.google.cloud.storage.Blob;
+import com.google.cloud.storage.BlobId;
+import com.google.cloud.storage.Storage;
+import com.google.cloud.storage.StorageException;
+import com.google.cloud.storage.StorageOptions;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.documentation.UseCase;
+import org.apache.nifi.annotation.lifecycle.OnDisabled;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.context.PropertyContext;
+import org.apache.nifi.controller.AbstractControllerService;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.fileresource.service.api.FileResource;
+import org.apache.nifi.fileresource.service.api.FileResourceService;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.gcp.credentials.service.GCPCredentialsService;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.processors.gcp.storage.FetchGCSObject;
+import org.apache.nifi.processors.gcp.util.GoogleUtils;
+
+import java.io.IOException;
+import java.nio.channels.Channels;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.nifi.processors.gcp.storage.StorageAttributes.BUCKET_ATTR;
+import static org.apache.nifi.processors.gcp.storage.StorageAttributes.BUCKET_DESC;
+import static org.apache.nifi.processors.gcp.storage.StorageAttributes.KEY_DESC;
+
+@Tags({"file", "resource", "gcs"})
+@SeeAlso({FetchGCSObject.class})
+@CapabilityDescription("Provides a Google Compute Storage (GCS) file resource for other components.")
+@UseCase(
+        description = "Fetch a specific file from GCS." +
+                " The service provides higher performance compared to fetch processors when the data should be moved between different storages without any transformation.",
+        configuration = """
+                "Bucket" = "${gcs.bucket}"
+                "Name" = "${filename}"
+
+                The "GCP Credentials Provider Service" property should specify an instance of the GCPCredentialsService in order to provide credentials for accessing the bucket.
+                """
+)
+public class GCSFileResourceService extends AbstractControllerService implements FileResourceService {
+
+    public static final PropertyDescriptor BUCKET = new PropertyDescriptor
+            .Builder().name("gcs-bucket")
+            .displayName("Bucket")
+            .description(BUCKET_DESC)
+            .required(true)
+            .defaultValue("${" + BUCKET_ATTR + "}")
+            .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+            .build();
+
+    public static final PropertyDescriptor KEY = new PropertyDescriptor
+            .Builder().name("gcs-key")
+            .displayName("Name")
Review Comment:
   
   ```suggestion
   .Builder()
   .name("Name")
   .displayName("Name")
   ```



##
nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/main/java/org/apache/nifi/processors/gcp/fileresource/service/GCSFileResourceService.java:
##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for 

Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-11-13 Thread via GitHub


exceptionfactory commented on PR #7999:
URL: https://github.com/apache/nifi/pull/7999#issuecomment-1808208309

   Thanks for the responses @mark-bathori and @pvillard31, that is helpful. I 
think it is worth highlighting alternative components using the `SeeAlso` 
annotation, and also including a note in the Capability Description about the 
intended use case being extremely large files. This might also be a good use of 
the newer UseCase annotations. There are tradeoffs, but having a few lines in 
the description will help point to target use cases and lessen potential 
confusion.





Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-11-08 Thread via GitHub


pvillard31 commented on PR #7999:
URL: https://github.com/apache/nifi/pull/7999#issuecomment-1802147126

   I think the end goal here is to provide very high performance when moving 
data from one object store to another without any kind of transformation. 
Right now only the Azure processors support FileResourceService, but I'd 
expect the same support to be added to the AWS and GCS processors, along with 
FileResourceService implementations for Azure and AWS, in addition to the 
existing local file system implementation and the GCS one in this PR.





Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-11-08 Thread via GitHub


mark-bathori commented on PR #7999:
URL: https://github.com/apache/nifi/pull/7999#issuecomment-1802124689

   Thanks for the question @exceptionfactory.
   
   Currently, anyone who wants to move data from one storage to another needs 
to chain a List processor into a Fetch processor and then into a Put 
processor. The fetched data is stored in the content repository before the Put 
processor can use it. The main purpose of `FileResourceService` is to skip the 
content repository and reference the data directly in the Put processor, so 
instead of e.g. `ListGCS` -> `FetchGCS` -> `PutADLS` the user can do 
`ListGCS` -> `PutADLS`. 
   This should also increase the amount of data that can be processed, because 
the data no longer has to be written to the content repository.
   `FileResourceService` already has an implementation for local file systems 
(`StandardFileResourceService`), and my PR would provide an option for 
Google Cloud Storage.
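
To make the idea concrete, here is a minimal, self-contained sketch of the direct hand-off, not the actual NiFi code. `Resource`, `ResourceService`, and `put` are hypothetical stand-ins for `FileResource`, `FileResourceService`, and a Put processor, and an in-memory map plays the role of the bucket. The point is only that the Put step streams from the source without an intermediate copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class DirectTransferSketch {

    /** Hypothetical stand-in for FileResource: an open stream plus its size. */
    record Resource(InputStream in, long size) { }

    /** Hypothetical stand-in for FileResourceService: resolves a resource from FlowFile attributes. */
    interface ResourceService {
        Resource getResource(Map<String, String> attributes) throws IOException;
    }

    /** Hypothetical "Put" step: streams the resource straight to the destination, no intermediate copy. */
    static long put(ResourceService service, Map<String, String> attributes, OutputStream destination)
            throws IOException {
        Resource resource = service.getResource(attributes);
        try (InputStream in = resource.in()) {
            return in.transferTo(destination);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory "bucket" standing in for remote object storage (assumption for the sketch).
        Map<String, byte[]> bucket = Map.of("test-file", "test-data".getBytes(StandardCharsets.UTF_8));

        // The service looks up the object named by the "filename" attribute.
        ResourceService service = attributes -> {
            byte[] data = bucket.get(attributes.get("filename"));
            if (data == null) {
                throw new IOException("no such object: " + attributes.get("filename"));
            }
            return new Resource(new ByteArrayInputStream(data), data.length);
        };

        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        long copied = put(service, Map.of("filename", "test-file"), destination);
        System.out.println(copied + " bytes copied: " + destination.toString(StandardCharsets.UTF_8));
    }
}
```

In the real flow the List processor supplies the attributes, and the Put processor asks the configured service for the resource instead of reading FlowFile content.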





[PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]

2023-11-08 Thread via GitHub


mark-bathori opened a new pull request, #7999:
URL: https://github.com/apache/nifi/pull/7999

   # Summary
   
   [NIFI-12334](https://issues.apache.org/jira/browse/NIFI-12334)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-0`
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [x] Build completed using `mvn clean install -P contrib-check`
     - [x] JDK 21
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   

