Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
exceptionfactory closed pull request #7999: NIFI-12334: Implement GCS option for FileResourceService URL: https://github.com/apache/nifi/pull/7999 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
mark-bathori commented on PR #7999: URL: https://github.com/apache/nifi/pull/7999#issuecomment-1859116369 Thanks @dan-s1 for the detailed review, I've updated the PR accordingly.
Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
dan-s1 commented on code in PR #7999: URL: https://github.com/apache/nifi/pull/7999#discussion_r1428160148

## nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/test/java/org/apache/nifi/processors/gcp/storage/GCSFileResourceServiceTest.java: ##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.gcp.storage;
+
+import com.google.auth.oauth2.GoogleCredentials;
+import com.google.cloud.storage.Blob;
+import com.google.cloud.storage.BlobId;
+import com.google.cloud.storage.Storage;
+import com.google.cloud.storage.testing.RemoteStorageHelper;
+import org.apache.nifi.fileresource.service.api.FileResource;
+import org.apache.nifi.gcp.credentials.service.GCPCredentialsService;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processors.gcp.credentials.service.GCPCredentialsControllerService;
+import org.apache.nifi.processors.gcp.util.MockReadChannel;
+import org.apache.nifi.reporting.InitializationException;
+import org.apache.nifi.util.NoOpProcessor;
+import org.apache.nifi.util.TestRunner;
+import org.apache.nifi.util.TestRunners;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.nifi.processors.gcp.util.GoogleUtils.GCP_CREDENTIALS_PROVIDER_SERVICE;
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.reset;

Review Comment:
```suggestion
```
Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
dan-s1 commented on code in PR #7999: URL: https://github.com/apache/nifi/pull/7999#discussion_r1428126047

## nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/test/java/org/apache/nifi/processors/gcp/storage/GCSFileResourceServiceTest.java: ##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.gcp.storage;
+
+import com.google.auth.oauth2.GoogleCredentials;
+import com.google.cloud.storage.Blob;
+import com.google.cloud.storage.BlobId;
+import com.google.cloud.storage.Storage;
+import com.google.cloud.storage.testing.RemoteStorageHelper;
+import org.apache.nifi.fileresource.service.api.FileResource;
+import org.apache.nifi.gcp.credentials.service.GCPCredentialsService;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processors.gcp.credentials.service.GCPCredentialsControllerService;
+import org.apache.nifi.processors.gcp.util.MockReadChannel;
+import org.apache.nifi.reporting.InitializationException;
+import org.apache.nifi.util.NoOpProcessor;
+import org.apache.nifi.util.TestRunner;
+import org.apache.nifi.util.TestRunners;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.nifi.processors.gcp.util.GoogleUtils.GCP_CREDENTIALS_PROVIDER_SERVICE;
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.reset;
+import static org.mockito.Mockito.when;
+
+public class GCSFileResourceServiceTest {
+
+    private static final String TEST_NAME = GCSFileResourceServiceTest.class.getSimpleName();
+    private static final String CONTROLLER_SERVICE = "GCPCredentialsService";
+    private static final String BUCKET = RemoteStorageHelper.generateBucketName();
+    private static final String KEY = "test-file";
+    private static final String TEST_DATA = "test-data";
+
+    @Mock
+    Storage storage;
+
+    private TestRunner runner;
+    private TestGCSFileResourceService service;
+    private AutoCloseable mocksCloseable;
+
+    @BeforeEach
+    void setup() throws InitializationException {
+        mocksCloseable = MockitoAnnotations.openMocks(this);
+
+        reset(storage);
+        mockBlob();
+        service = new TestGCSFileResourceService(storage);
+        runner = TestRunners.newTestRunner(NoOpProcessor.class);
+        runner.addControllerService(TEST_NAME, service);
+    }
+
+    @AfterEach
+    public void cleanup() throws Exception {
+        final AutoCloseable closeable = mocksCloseable;
+        mocksCloseable = null;
+        if (closeable != null) {
+            closeable.close();
+        }
+    }
+
+    @Test
+    void testValidBlob() throws InitializationException, IOException {
+        setUpService(KEY, BUCKET);
+
+        final FileResource fileResource = service.getFileResource(Collections.emptyMap());
+
+        assertFileResource(fileResource);
+    }
+
+    @Test
+    void testValidBlobUsingEL() throws IOException, InitializationException {
+        final Map<String, String> attributes = setUpServiceWithEL(KEY, BUCKET);
+
+        final FileResource fileResource = service.getFileResource(attributes);
+
+        assertFileResource(fileResource);
+    }
+
+    @Test
+    void testValidBlobUsingELButMissingAttribute() throws InitializationException {
+        runner.setValidateExpressionUsage(false);
+
+        setUpServiceWithEL(KEY, BUCKET);
+
+        assertThrows(IllegalArgumentException.class, () -> service.getFileResource(Collections.emptyMap()));
+    }
+
+    @Test
+    void testNonExistingBlob() throws InitializationException {
+        final Map<String, String> attributes = setUpServiceWithEL("invalid-key", "invalid-bucket");
+
Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
exceptionfactory commented on code in PR #7999: URL: https://github.com/apache/nifi/pull/7999#discussion_r1415731745

## nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/main/java/org/apache/nifi/processors/gcp/fileresource/service/GCSFileResourceService.java: ##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.gcp.fileresource.service;
+
+import com.google.auth.oauth2.GoogleCredentials;
+import com.google.cloud.ReadChannel;
+import com.google.cloud.storage.Blob;
+import com.google.cloud.storage.BlobId;
+import com.google.cloud.storage.Storage;
+import com.google.cloud.storage.StorageException;
+import com.google.cloud.storage.StorageOptions;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.documentation.UseCase;
+import org.apache.nifi.annotation.lifecycle.OnDisabled;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.context.PropertyContext;
+import org.apache.nifi.controller.AbstractControllerService;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.fileresource.service.api.FileResource;
+import org.apache.nifi.fileresource.service.api.FileResourceService;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.gcp.credentials.service.GCPCredentialsService;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.processors.gcp.storage.FetchGCSObject;
+import org.apache.nifi.processors.gcp.util.GoogleUtils;
+
+import java.io.IOException;
+import java.nio.channels.Channels;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.nifi.processors.gcp.storage.StorageAttributes.BUCKET_ATTR;
+import static org.apache.nifi.processors.gcp.storage.StorageAttributes.BUCKET_DESC;
+import static org.apache.nifi.processors.gcp.storage.StorageAttributes.KEY_DESC;
+
+@Tags({"file", "resource", "gcs"})
+@SeeAlso({FetchGCSObject.class})
+@CapabilityDescription("Provides a Google Compute Storage (GCS) file resource for other components.")
+@UseCase(
+        description = "Fetch a specific file from GCS." +
+                " The service provides higher performance compared to fetch processors when the data should be moved between different storages without any transformation.",
+        configuration = """
+                "Bucket" = "${gcs.bucket}"
+                "Name" = "${filename}"
+
+                The "GCP Credentials Provider Service" property should specify an instance of the GCPCredentialsService in order to provide credentials for accessing the bucket.
+                """
+)
+public class GCSFileResourceService extends AbstractControllerService implements FileResourceService {
+
+    public static final PropertyDescriptor BUCKET = new PropertyDescriptor
+            .Builder().name("gcs-bucket")
+            .displayName("Bucket")
+            .description(BUCKET_DESC)
+            .required(true)
+            .defaultValue("${" + BUCKET_ATTR + "}")
+            .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+            .build();
+
+    public static final PropertyDescriptor KEY = new PropertyDescriptor
+            .Builder().name("gcs-key")
+            .displayName("Name")

Review Comment:
```suggestion
            .Builder()
            .name("Name")
            .displayName("Name")
```

## nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/main/java/org/apache/nifi/processors/gcp/fileresource/service/GCSFileResourceService.java: ##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for
Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
exceptionfactory commented on PR #7999: URL: https://github.com/apache/nifi/pull/7999#issuecomment-1808208309 Thanks for the responses @mark-bathori and @pvillard31, that is helpful. I think it is worth highlighting alternative components using the `SeeAlso` annotation, and also including a note in the Capability Description about the intended use case being extremely large files. This might also be a good use of the newer UseCase annotations. There are tradeoffs, but having a few lines in the description will help point to target use cases and lessen potential confusion.
Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
pvillard31 commented on PR #7999: URL: https://github.com/apache/nifi/pull/7999#issuecomment-1802147126 I think the end goal here is to provide very high performance when moving data from one object store to another without any kind of transformation. Right now, support for FileResourceService exists only in the Azure processors, but I'd expect the same support to be added to the AWS and GCS processors, along with FileResourceService implementations for Azure and AWS, in addition to the local file system implementation and the GCS one in this PR.
Re: [PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
mark-bathori commented on PR #7999: URL: https://github.com/apache/nifi/pull/7999#issuecomment-1802124689 Thanks for the question @exceptionfactory. Currently, anyone who wants to move data from one storage system to another needs to chain a List processor into a Fetch processor and then into a Put processor. The fetched data is stored in the content repository before the Put processor can use it. The main purpose of `FileResourceService` is to skip the content repository and reference the data directly in the Put processor, so instead of e.g. `ListGCS` -> `FetchGCS` -> `PutADLS`, the user can do `ListGCS` -> `PutADLS`. This should also improve performance in terms of the amount of data processed, since the data no longer needs to be written to the content repository. `FileResourceService` already has an implementation for local file systems (`StandardFileResourceService`), and my PR would provide an option for Google Cloud Storage.
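To make the direct-transfer idea above concrete, here is a minimal, self-contained sketch. It deliberately does not use the NiFi API: `SourceResource`, `SourceResourceService`, `InMemoryStoreService`, and `put` are hypothetical stand-ins (loosely modeled on the roles of `FileResource` and `FileResourceService`), the `filename` attribute key is an assumption, and an in-memory map plays the part of the remote bucket. The point is only that the "Put" side reads straight from the stream handed out by the service, with no intermediate storage step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

public class DirectTransferSketch {

    /** Hypothetical stand-in for a file resource: an open stream plus its size. */
    static final class SourceResource {
        final InputStream inputStream;
        final long size;

        SourceResource(InputStream inputStream, long size) {
            this.inputStream = inputStream;
            this.size = size;
        }
    }

    /** Hypothetical stand-in for the service: resolves a resource from FlowFile attributes. */
    interface SourceResourceService {
        SourceResource getResource(Map<String, String> attributes);
    }

    /** An in-memory map plays the role of the remote bucket (assumption for illustration). */
    static final class InMemoryStoreService implements SourceResourceService {
        private final Map<String, byte[]> objects;

        InMemoryStoreService(Map<String, byte[]> objects) {
            this.objects = objects;
        }

        @Override
        public SourceResource getResource(Map<String, String> attributes) {
            String key = attributes.get("filename");
            byte[] data = (key == null) ? null : objects.get(key);
            if (data == null) {
                throw new IllegalArgumentException("No object found for key: " + key);
            }
            return new SourceResource(new ByteArrayInputStream(data), data.length);
        }
    }

    /** The "Put" side: streams from the source to the destination without staging the content in between. */
    static long put(SourceResourceService service, Map<String, String> attributes, OutputStream destination) {
        SourceResource resource = service.getResource(attributes);
        try (InputStream in = resource.inputStream) {
            return in.transferTo(destination);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> bucket =
                Collections.singletonMap("test-file", "test-data".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream destination = new ByteArrayOutputStream();

        long copied = put(new InMemoryStoreService(bucket),
                Collections.singletonMap("filename", "test-file"), destination);

        System.out.println("Copied " + copied + " bytes: "
                + new String(destination.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the List -> Fetch -> Put flow, the equivalent of `destination` would first be the content repository, and the Put step would read the data back from there; the sketch collapses those two hops into one.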
[PR] NIFI-12334: Implement GCS option for FileResourceService [nifi]
mark-bathori opened a new pull request, #7999: URL: https://github.com/apache/nifi/pull/7999

# Summary

[NIFI-12334](https://issues.apache.org/jira/browse/NIFI-12334)

# Tracking

Please complete the following tracking steps prior to pull request creation.

### Issue Tracking

- [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created

### Pull Request Tracking

- [x] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-0`
- [x] Pull Request commit message starts with Apache NiFi Jira issue number, as such `NIFI-0`

### Pull Request Formatting

- [x] Pull Request based on current revision of the `main` branch
- [x] Pull Request refers to a feature branch with one commit containing changes

# Verification

Please indicate the verification steps performed prior to pull request creation.

### Build

- [x] Build completed using `mvn clean install -P contrib-check`
  - [x] JDK 21

### Licensing

- [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
- [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files

### Documentation

- [ ] Documentation formatting appears as expected in rendered files