[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
GitHub user Wesley-Lawrence opened a pull request: https://github.com/apache/nifi/pull/2003 NIFI-4181 CSVReader and CSVRecordSetWriter can be used by just explictly declaring their columns. Thank you for submitting a contribution to Apache NiFi. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [â] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [â] Does your PR title start with NIFI- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [â] Has your PR been rebased against the latest commit within the target branch (typically master)? - [â] Is your initial contribution a single, squashed commit? ### For code changes: - [â] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder? - [â] Have you written or updated unit tests to verify your changes? - [N/A] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [N/A] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly? - [N/A] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly? - [â] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? ### For documentation related changes: - [N/A] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Wesley-Lawrence/nifi NIFI-4181 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/2003.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2003 commit cc3f6af7c4d751e813384e166aa87821ace42273 Author: Wesley-Lawrence Date: 2017-07-12T21:12:55Z NIFI-4181 CSVReader and CSVRecordSetWriter can be used by just explicitly declaring their columns. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127116107 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java --- @@ -37,6 +45,16 @@ "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled"); static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL"); +static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property", +"Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns."); + +static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder() +.name(EXPLICIT_COLUMNS_DISPLAY_NAME) +.description("Specifies the CSV columns expected as a comma separated list. Only used with the Schema Access Strategy '" + SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS.getDisplayName() + "'.") +.expressionLanguageSupported(false) --- End diff -- Is there any reason why expression language should not be supported? Using a Variable Registry for example, the header list could be set externally. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127120939 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java --- @@ -49,7 +51,7 @@ + "the values. See Controller Service's Usage for further documentation.") public class CSVReader extends SchemaRegistryService implements RecordReaderFactory { -private final AllowableValue headerDerivedAllowableValue = new AllowableValue("csv-header-derived", "Use String Fields From Header", +static final AllowableValue HEADER_DERIVED_ALLOWABLE_VALUE = new AllowableValue("csv-header-derived", "Use String Fields From Header", --- End diff -- Making this static (and capitalizing the name) aligns with the common constant / property pattern, thanks! However it also looks like it can remain private (at least that's what IntelliJ tells me ;) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127115982 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java --- @@ -37,6 +45,16 @@ "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled"); static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL"); +static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property", +"Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns."); + +static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder() +.name(EXPLICIT_COLUMNS_DISPLAY_NAME) --- End diff -- The common convention here is to set .name() to a machine-friendly name (like 'csv-explicit-columns') and set .displayName() to the user-friendly name (EXPLICIT_COLUMNS_DISPLAY_NAME) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user mattyb149 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127120413 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordSetWriter.java --- @@ -48,6 +54,7 @@ @Override protected List getSupportedPropertyDescriptors() { final List properties = new ArrayList<>(super.getSupportedPropertyDescriptors()); +properties.add(CSVUtils.EXPLICIT_COLUMNS); --- End diff -- Sorry this is the wrong spot to leave the comment but since @CapabilityDescription (line 40/46) wasn't part of the diff, I couldn't leave the comment there. Setting Explicit Columns can affect the first line written, so the CapabilityDescription text should be updated to include that. In addition, explicitly setting the output columns is subject to the same rules as if you had an output Avro schema; namely, the output field/column names have to match the input names or else there will be empty columns/fields in the output. In the general case this is covered by processor or reader/writer doc, but since this is CSV-specific I think we should make this clear. On one hand, there is the interesting feature that columns can be re-arranged by specifying the input fields in a different order in the explicit output columns; but on the other hand, if the user expects to use the writer to rename the fields (because the names are positional), that won't work. In general, I'd like this extra flexibility/power to not be too confusing for the user, or its usefulness will be overshadowed by its complexity. For example, you can use the Explicit Columns in the CSVReader to rename the columns, and in the CSVRecordSetWriter to reorder the columns, but the inverse is not true. These could remain undocumented/unsupported features, and/or we'd need very clear documentation explaining their use. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user Wesley-Lawrence commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127262710 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java --- @@ -37,6 +45,16 @@ "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled"); static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL"); +static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property", +"Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns."); + +static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder() +.name(EXPLICIT_COLUMNS_DISPLAY_NAME) --- End diff -- Good catch, will fix. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user Wesley-Lawrence commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127263093 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java --- @@ -37,6 +45,16 @@ "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled"); static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL"); +static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property", +"Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns."); + +static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder() +.name(EXPLICIT_COLUMNS_DISPLAY_NAME) +.description("Specifies the CSV columns expected as a comma separated list. Only used with the Schema Access Strategy '" + SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS.getDisplayName() + "'.") +.expressionLanguageSupported(false) --- End diff -- I didn't immediately see a use for it, but it'd be better to let people use the expression language if it's not going to cause any issues. Will change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user Wesley-Lawrence commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127263693 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java --- @@ -49,7 +51,7 @@ + "the values. See Controller Service's Usage for further documentation.") public class CSVReader extends SchemaRegistryService implements RecordReaderFactory { -private final AllowableValue headerDerivedAllowableValue = new AllowableValue("csv-header-derived", "Use String Fields From Header", +static final AllowableValue HEADER_DERIVED_ALLOWABLE_VALUE = new AllowableValue("csv-header-derived", "Use String Fields From Header", --- End diff -- I figured leaving it package-private would let others access it as I've been accessing other `AllowableValue`s out of CSVUtils. I can set this to `private` if that's preferred, just let me know =) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user Wesley-Lawrence commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127265959 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordSetWriter.java --- @@ -48,6 +54,7 @@ @Override protected List getSupportedPropertyDescriptors() { final List properties = new ArrayList<>(super.getSupportedPropertyDescriptors()); +properties.add(CSVUtils.EXPLICIT_COLUMNS); --- End diff -- Hmm. I'll take a stab at defining the new functionality in the `@capabilitydescription` in both `CSVReader` and `CSVRecordSetWriter`. I think It's worth documenting, and explaining in `CSVReader` that columns can be renamed by using "Explicit Columns". As well as explaining in `CSVRecordSetWriter` that "Explicit Columns" doesn't rename columns, but does allow for re-ordering, while reminding users that the column names have been defined by whatever RecordReader is being used. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user Wesley-Lawrence commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127294734 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java --- @@ -49,7 +51,7 @@ + "the values. See Controller Service's Usage for further documentation.") public class CSVReader extends SchemaRegistryService implements RecordReaderFactory { -private final AllowableValue headerDerivedAllowableValue = new AllowableValue("csv-header-derived", "Use String Fields From Header", +static final AllowableValue HEADER_DERIVED_ALLOWABLE_VALUE = new AllowableValue("csv-header-derived", "Use String Fields From Header", --- End diff -- Note that this hasn't changed in my latest code even though GitHub hid it, still waiting for feed back. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request #2003: NIFI-4181 CSVReader and CSVRecordSetWriter can be u...
Github user Wesley-Lawrence commented on a diff in the pull request: https://github.com/apache/nifi/pull/2003#discussion_r127294937 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordSetWriter.java --- @@ -48,6 +54,7 @@ @Override protected List getSupportedPropertyDescriptors() { final List properties = new ArrayList<>(super.getSupportedPropertyDescriptors()); +properties.add(CSVUtils.EXPLICIT_COLUMNS); --- End diff -- Alright, I modified the capability descriptors for CSVReader and CSVRecordSetWriter. Take a look and let me know what you think =) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---