[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2018-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454540#comment-16454540
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on the issue:

https://github.com/apache/nifi/pull/2003
  
@mattyb149 It looks like a lot has changed in the last 9 months.

The code I added here leveraged classes related to schema registries, but 
those classes have since moved into a more Avro-specific package 
(`nifi-avro-record-utils`), whereas the CSV code is still in a standard 
package (`nifi-standard-record-utils`). It looks like it'll take some work to 
abstract the schema-registry-specific code away from Avro so that the CSV 
reader/writers can leverage it.

Sadly, I don't have the time to dig back into NiFi right now, so I'm OK 
with closing this PR so that a more up-to-date solution can be worked on.


> CSVReader and CSVRecordSetWriter services should be able to work given an 
> explicit list of columns.
> ---
>
> Key: NIFI-4181
> URL: https://issues.apache.org/jira/browse/NIFI-4181
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Wesley L Lawrence
> Priority: Minor
> Attachments: NIFI-4181.patch
>
>
> Currently, to read or write a CSV file with *Record processors, the CSVReader 
> and CSVRecordSetWriter need to be given an Avro schema. For CSV, a simple 
> column definition could also work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2018-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454354#comment-16454354
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on the issue:

https://github.com/apache/nifi/pull/2003
  
While I think @markap14 and I agree on the correct long-term solution of 
adding a new "String Column Names" schema registry, I never got around to 
taking on that task. Shifting priorities and all.

However, if you'd like to take the work in as-is @mattyb149 (maybe as a 
stepping stone to an eventual new schema registry), I'll update the PR for you.




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2018-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454339#comment-16454339
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2003
  
@Wesley-Lawrence is this ready for "final" review? If so, do you mind 
rebasing this PR against the latest master? There are some conflicts listed 
above. Please and thanks!




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158931#comment-16158931
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2003
  
@Wesley-Lawrence I definitely think that's a reasonable approach. Sorry, 
just going through old PRs to make sure that I've followed up on everything. I 
like what's been outlined above.




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130982#comment-16130982
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on the issue:

https://github.com/apache/nifi/pull/2003
  
Yeah, I think we're on the same page.




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130856#comment-16130856
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2003
  
@Wesley-Lawrence Sorry, I must have missed the email telling me you'd 
commented on the PR, and didn't notice until now. If I understand correctly, 
you're proposing that the readers/writers get a property named "Schema Text 
Format" in addition to "Schema Text", and the Schema Text Format would tell 
the service how to parse the "Schema Text". There would be two options: 
"Avro" and "String Column Names". In addition to this, we could potentially 
also introduce a new Schema Registry as I described above. This means that for 
one-off schemas, such as when we have a CSV file and only need to use the 
schema once, we don't have to go to the trouble of creating a Schema Registry 
and adding entries to it; we'd just set the column names (optionally using 
Expression Language) in the Reader. And we would add this capability to JSON, 
etc. as well.

Does that all sound accurate? Just want to make sure that we are on the 
same page here.
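A minimal, dependency-free Java sketch of the proposed "Schema Text Format" idea might look like the following. The enum `SchemaTextFormat`, the method `parseSchemaText`, and the ordered name-to-type map used as a stand-in for a record schema are all hypothetical illustrations, not NiFi APIs. The "Avro" branch is deliberately left unimplemented, since parsing it would require the Avro library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaTextFormatSketch {

    // Hypothetical options for the proposed "Schema Text Format" property.
    enum SchemaTextFormat { AVRO, STRING_COLUMN_NAMES }

    // Parses "Schema Text" according to the selected format. An ordered map of
    // field name -> type name stands in for a real record schema.
    static Map<String, String> parseSchemaText(String schemaText, SchemaTextFormat format) {
        switch (format) {
            case STRING_COLUMN_NAMES: {
                Map<String, String> fields = new LinkedHashMap<>();
                for (String raw : schemaText.split(",")) {
                    String name = raw.trim();
                    if (name.isEmpty()) {
                        throw new IllegalArgumentException("Empty column name in: " + schemaText);
                    }
                    // Every column is treated as a String, as proposed in the thread.
                    if (fields.put(name, "string") != null) {
                        throw new IllegalArgumentException("Duplicate column name: " + name);
                    }
                }
                return fields;
            }
            case AVRO:
                // A real implementation would hand the text to an Avro schema parser;
                // omitted here to keep the sketch dependency-free.
                throw new UnsupportedOperationException("Avro parsing is not shown in this sketch");
            default:
                throw new IllegalStateException("Unknown format: " + format);
        }
    }

    public static void main(String[] args) {
        Map<String, String> schema =
                parseSchemaText("id, name, email", SchemaTextFormat.STRING_COLUMN_NAMES);
        System.out.println(schema); // {id=string, name=string, email=string}
    }
}
```

Rejecting empty and duplicate names up front is one possible design choice; a real service might instead report them through its property validation.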




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100515#comment-16100515
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on the issue:

https://github.com/apache/nifi/pull/2003
  
@markap14 @mattyb149 Is there any other feedback you two would like me to 
address?




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086162#comment-16086162
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127294937
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordSetWriter.java ---
@@ -48,6 +54,7 @@
     @Override
     protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
         final List<PropertyDescriptor> properties = new ArrayList<>(super.getSupportedPropertyDescriptors());
+        properties.add(CSVUtils.EXPLICIT_COLUMNS);
--- End diff --

Alright, I modified the capability descriptions for CSVReader and 
CSVRecordSetWriter. Take a look and let me know what you think =)




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086161#comment-16086161
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127294734
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java ---
@@ -49,7 +51,7 @@
         + "the values. See Controller Service's Usage for further documentation.")
 public class CSVReader extends SchemaRegistryService implements RecordReaderFactory {
 
-    private final AllowableValue headerDerivedAllowableValue = new AllowableValue("csv-header-derived", "Use String Fields From Header",
+    static final AllowableValue HEADER_DERIVED_ALLOWABLE_VALUE = new AllowableValue("csv-header-derived", "Use String Fields From Header",
--- End diff --

Note that this hasn't changed in my latest code, even though GitHub hid it; 
I'm still waiting for feedback.




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085956#comment-16085956
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127265959
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordSetWriter.java ---
@@ -48,6 +54,7 @@
     @Override
     protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
         final List<PropertyDescriptor> properties = new ArrayList<>(super.getSupportedPropertyDescriptors());
+        properties.add(CSVUtils.EXPLICIT_COLUMNS);
--- End diff --

Hmm. I'll take a stab at describing the new functionality in the 
`@CapabilityDescription` of both `CSVReader` and `CSVRecordSetWriter`. 

I think it's worth documenting. In `CSVReader`, I'd explain that columns 
can be renamed by using "Explicit Columns". In `CSVRecordSetWriter`, I'd 
explain that "Explicit Columns" doesn't rename columns but does allow 
reordering them, while reminding users that the column names have been 
defined by whatever RecordReader is being used.
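To make the reader-side behavior concrete, here is a hypothetical Java sketch (not NiFi's CSVReader code) of binding values to "Explicit Columns" by position, which is what effectively renames the columns. The class `ExplicitColumnsReaderSketch` and the method `read` are illustrative names, and the naive comma split assumes the data contains no quoted fields or escaped commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExplicitColumnsReaderSketch {

    // Hypothetical reader-side behavior: values are bound to the explicit
    // column names by position, so any header names in the file are ignored
    // (effectively renaming the columns).
    static List<Map<String, String>> read(List<String> lines, List<String> explicitColumns,
                                          boolean skipHeaderLine) {
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = skipHeaderLine ? 1 : 0; i < lines.size(); i++) {
            // Naive split: no support for quoted fields or escaped commas.
            String[] values = lines.get(i).split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < explicitColumns.size(); c++) {
                // Missing trailing values become null; extra values are dropped.
                record.put(explicitColumns.get(c), c < values.length ? values[c] : null);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("col_a,col_b", "1,Alice", "2,Bob");
        List<Map<String, String>> records = read(csv, Arrays.asList("id", "name"), true);
        System.out.println(records); // [{id=1, name=Alice}, {id=2, name=Bob}]
    }
}
```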




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085945#comment-16085945
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127263693
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java ---
@@ -49,7 +51,7 @@
         + "the values. See Controller Service's Usage for further documentation.")
 public class CSVReader extends SchemaRegistryService implements RecordReaderFactory {
 
-    private final AllowableValue headerDerivedAllowableValue = new AllowableValue("csv-header-derived", "Use String Fields From Header",
+    static final AllowableValue HEADER_DERIVED_ALLOWABLE_VALUE = new AllowableValue("csv-header-derived", "Use String Fields From Header",
--- End diff --

I figured leaving it package-private would let others access it, as I've 
been accessing other `AllowableValue`s from CSVUtils. I can set this to 
`private` if that's preferred, just let me know =)




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085940#comment-16085940
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127263093
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java ---
@@ -37,6 +45,16 @@
         "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled");
     static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL");
 
+    static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property",
+            "Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns.");
+
+    static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder()
+            .name(EXPLICIT_COLUMNS_DISPLAY_NAME)
+            .description("Specifies the CSV columns expected as a comma separated list. Only used with the Schema Access Strategy '" + SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS.getDisplayName() + "'.")
+            .expressionLanguageSupported(false)
--- End diff --

I didn't immediately see a use for it, but it'd be better to let people use 
Expression Language if it's not going to cause any issues. Will change.




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085938#comment-16085938
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127262710
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java ---
@@ -37,6 +45,16 @@
         "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled");
     static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL");
 
+    static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property",
+            "Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns.");
+
+    static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder()
+            .name(EXPLICIT_COLUMNS_DISPLAY_NAME)
--- End diff --

Good catch, will fix.




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085929#comment-16085929
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user Wesley-Lawrence commented on the issue:

https://github.com/apache/nifi/pull/2003
  
@markap14 Good points. I actually considered adding a schema registry at 
first, but thought it might be too much for just making CSV in/out easier.

I think using schemas falls into two categories. Case A schemas are ones 
that someone uses all the time. Having a schema registry is great then, because 
the schema is consistently defined in a single place. Case B schemas are one-off 
schemas: input or output is in some format a person doesn't typically use, and 
someone is just converting to or from an often-used Case A schema. In this case, 
the "Schema Text" property becomes really useful. Rather than cluttering your 
registry, you can just define the Case B one-off in a processor/service and 
never think about it again.

To your point, I've only solved Case B, for CSV in/out. By adding an 
"ExplicitFieldSchemaRegistry" or "ColumnSchemaRegistry" (whatever this ends up 
being named, it's going to be ugly =P), we can tackle Case A for any type.

To do this completely properly, I think we should solve this for both Case A 
and Case B, for any type.

So I personally think we should add a new schema registry for Case A, and we 
could also add a "Schema Format" property (defaulting to Avro) to 
`SchemaRegistryService` that informs processors how to interpret "Schema Text". 
That way, it's also easy to define Case B one-off schemas of any type.




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085676#comment-16085676
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2003
  
@Wesley-Lawrence I definitely agree that having to build up an entire Avro 
schema can be a pain -- especially if you're not already familiar with Avro 
schemas. And I absolutely love that you saw something that can be improved and 
jumped in to improve the user experience! However, I am a bit concerned with 
the approach taken here because while it does scratch an itch, it does so only 
for CSV data and as a result isn't really consistent.

I would like to start a discussion on how we could perhaps handle this in a 
more generic way. My first thought is to provide an alternative 
implementation of the Schema Registry. Call it 
ExplicitStringFieldSchemaRegistry for lack of a better name (I'm suggesting 
this name for the sake of the discussion, not necessarily proposing an 
implementation with this name). The idea is that there would be a 
new implementation of SchemaRegistry. In this new implementation you would add 
properties just like AvroSchemaRegistry. The name of each property would be the 
name of a schema. But the value, instead of an Avro Schema, would be a 
comma-separated list of column names, all assumed to be Strings. 
This approach gives us a few different benefits.

First, it allows the same approach to be taken with other data formats 
(flat JSON, for example). It also keeps things consistent in terms of how we 
access the schema (we can still use the Schema Name property along with the 
Schema Registry). Finally, it makes the CSV Reader more reusable, because we can 
use Expression Language to access the name of the schema, so a single CSV 
Reader can be used by many different processors spread throughout the flow.

What I would love to see at some point is a more powerful SchemaRegistry 
service that provides a Custom/Advanced UI for actually building schemas. I 
think this would be extremely powerful and useful and far easier to use. But I 
promise that I am the last person that you (or any user) wants building a UI :)

Any thoughts here?
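A rough Java sketch of the registry idea, assuming the semantics described above (property name = schema name, property value = comma-separated column names, all treated as Strings), could look like this. `StringFieldSchemaRegistrySketch`, `addSchema`, and `retrieveSchema` are hypothetical names; the sketch does not implement NiFi's actual SchemaRegistry interface, and a plain list of column names stands in for a real record schema.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class StringFieldSchemaRegistrySketch {

    // Hypothetical stand-in for the proposed registry: each entry maps a
    // schema name (the property name) to comma-separated column names
    // (the property value). All fields are treated as Strings.
    private final Map<String, List<String>> schemas = new HashMap<>();

    void addSchema(String schemaName, String commaSeparatedColumns) {
        List<String> columns = Arrays.stream(commaSeparatedColumns.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
        schemas.put(schemaName, Collections.unmodifiableList(columns));
    }

    // Looked up by name, e.g. from a "Schema Name" value that could be
    // resolved per FlowFile via Expression Language.
    Optional<List<String>> retrieveSchema(String schemaName) {
        return Optional.ofNullable(schemas.get(schemaName));
    }

    public static void main(String[] args) {
        StringFieldSchemaRegistrySketch registry = new StringFieldSchemaRegistrySketch();
        registry.addSchema("customers", "id, name, email");
        registry.addSchema("orders", "order_id, customer_id, total");
        System.out.println(registry.retrieveSchema("orders").orElse(Collections.emptyList()));
        // [order_id, customer_id, total]
    }
}
```

Because lookups go by name, one reader configured with a name expression could serve many different flows, which is the reusability benefit described above.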




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085132#comment-16085132
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127120413
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordSetWriter.java ---
@@ -48,6 +54,7 @@
     @Override
     protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
         final List<PropertyDescriptor> properties = new ArrayList<>(super.getSupportedPropertyDescriptors());
+        properties.add(CSVUtils.EXPLICIT_COLUMNS);
--- End diff --

Sorry, this is the wrong spot to leave the comment, but since 
@CapabilityDescription (line 40/46) wasn't part of the diff, I couldn't leave 
the comment there. Setting Explicit Columns can affect the first line written, 
so the CapabilityDescription text should be updated to mention that.

In addition, explicitly setting the output columns is subject to the same 
rules as if you had an output Avro schema; namely, the output field/column 
names have to match the input names or else there will be empty columns/fields 
in the output. In the general case this is covered by processor or 
reader/writer doc, but since this is CSV-specific I think we should make this 
clear. On one hand, there is the interesting feature that columns can be 
re-arranged by specifying the input fields in a different order in the explicit 
output columns; but on the other hand, if the user expects to use the writer to 
rename the fields (because the names are positional), that won't work.

In general, I'd like this extra flexibility/power to not be too confusing 
for the user, or its usefulness will be overshadowed by its complexity. For 
example, you can use the Explicit Columns in the CSVReader to rename the 
columns, and in the CSVRecordSetWriter to reorder the columns, but the inverse 
is not true. These could remain undocumented/unsupported features, and/or we'd 
need very clear documentation explaining their use.
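The writer-side pitfall can be illustrated with a hypothetical Java sketch (not the actual CSVRecordSetWriter implementation): explicit columns select record fields by name, so matching names can be reordered, while a name that does not exist in the record yields an empty column rather than a rename. The header line is assumed to come from the explicit columns, consistent with the note that they can affect the first line written; `ExplicitColumnsWriterSketch` and `write` are illustrative names, and no CSV quoting is performed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExplicitColumnsWriterSketch {

    // Hypothetical writer-side behavior: explicit columns select record
    // fields by name. Matching names can be reordered; names with no
    // matching field produce empty values instead of renaming anything.
    static List<String> write(List<Map<String, String>> records, List<String> explicitColumns) {
        List<String> lines = new ArrayList<>();
        // The header line is taken from the explicit columns.
        lines.add(String.join(",", explicitColumns));
        for (Map<String, String> record : records) {
            List<String> values = new ArrayList<>();
            for (String column : explicitColumns) {
                values.add(record.getOrDefault(column, ""));
            }
            lines.add(String.join(",", values));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "1");
        record.put("name", "Alice");
        List<Map<String, String>> records = Collections.singletonList(record);

        // Reordering works because both names exist in the record.
        System.out.println(write(records, Arrays.asList("name", "id")));
        // [name,id, Alice,1]

        // "Renaming" does not: "full_name" is not a field, so its column is empty.
        System.out.println(write(records, Arrays.asList("id", "full_name")));
        // [id,full_name, 1,]
    }
}
```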




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085130#comment-16085130
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127120939
  
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java ---
@@ -49,7 +51,7 @@
         + "the values. See Controller Service's Usage for further documentation.")
 public class CSVReader extends SchemaRegistryService implements RecordReaderFactory {
 
-    private final AllowableValue headerDerivedAllowableValue = new AllowableValue("csv-header-derived", "Use String Fields From Header",
+    static final AllowableValue HEADER_DERIVED_ALLOWABLE_VALUE = new AllowableValue("csv-header-derived", "Use String Fields From Header",
--- End diff --

Making this static (and capitalizing the name) aligns with the common 
constant / property pattern, thanks! However it also looks like it can remain 
private (at least that's what IntelliJ tells me ;)




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085131#comment-16085131
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127116107
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java
 ---
@@ -37,6 +45,16 @@
 "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled");
 static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL");
 
+static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property",
+"Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns.");
+
+static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder()
+.name(EXPLICIT_COLUMNS_DISPLAY_NAME)
+.description("Specifies the CSV columns expected as a comma separated list. Only used with the Schema Access Strategy '" + SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS.getDisplayName() + "'.")
+.expressionLanguageSupported(false)
--- End diff --

Is there any reason why expression language should not be supported? For 
example, with a Variable Registry the header list could be set externally.
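A minimal sketch of what supporting this could look like: resolve ${...} references against a variable map, then parse the resulting comma-separated column list. The substitution here is a deliberately simplified stand-in, not NiFi's Expression Language engine, and the variable name csv.columns is hypothetical. Quoted column names containing commas are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: simplified ${var} substitution followed by column-list parsing.
class ExplicitColumnsProperty {

    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value; unknown variables become empty strings here.
    static String resolveVariables(String raw, Map<String, String> variables) {
        Matcher m = VARIABLE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Split on commas, trim whitespace, and reject empty or duplicate names.
    static List<String> parseColumns(String value) {
        List<String> columns = new ArrayList<>();
        for (String part : value.split(",", -1)) {
            String name = part.trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Empty column name in: " + value);
            }
            if (columns.contains(name)) {
                throw new IllegalArgumentException("Duplicate column name: " + name);
            }
            columns.add(name);
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> registry = Map.of("csv.columns", "id, name , email");
        String resolved = resolveVariables("${csv.columns}", registry);
        System.out.println(parseColumns(resolved)); // [id, name, email]
    }
}
```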




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085129#comment-16085129
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2003#discussion_r127115982
  
--- Diff: 
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java
 ---
@@ -37,6 +45,16 @@
 "The format used by Informix when issuing the UNLOAD TO file_name command with escaping disabled");
 static final AllowableValue MYSQL = new AllowableValue("mysql", "MySQL Format", "CSV data follows the format used by MySQL");
 
+static final AllowableValue SCHEMA_ACCESS_STRATEGY_EXPLICIT_COLUMNS = new AllowableValue("csv-explicit-columns", "Use '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' Property",
+"Takes the '" + EXPLICIT_COLUMNS_DISPLAY_NAME + "' property value as the explicit definition of the CSV columns.");
+
+static final PropertyDescriptor EXPLICIT_COLUMNS = new PropertyDescriptor.Builder()
+.name(EXPLICIT_COLUMNS_DISPLAY_NAME)
--- End diff --

The common convention here is to set .name() to a machine-friendly name 
(like 'csv-explicit-columns') and set .displayName() to the user-friendly name 
(EXPLICIT_COLUMNS_DISPLAY_NAME).
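A sketch of why the split matters, using a hypothetical Descriptor class rather than NiFi's PropertyDescriptor: if saved configuration is keyed by the stable machine-friendly name, the user-facing label can be reworded later without orphaning existing settings. The display strings "Explicit Columns" and "CSV Column Names" are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: Descriptor is a hypothetical stand-in, not NiFi's PropertyDescriptor.
class PropertyNamingDemo {

    static final class Descriptor {
        final String name;        // stable key persisted with the configuration
        final String displayName; // label shown to users

        Descriptor(String name, String displayName) {
            this.name = name;
            this.displayName = displayName;
        }
    }

    // Hypothetical value; the real constant's value is not shown in the diff.
    static final String EXPLICIT_COLUMNS_DISPLAY_NAME = "Explicit Columns";

    // Version 1 of a component, and a later version with a reworded label.
    static final Descriptor V1 = new Descriptor("csv-explicit-columns", EXPLICIT_COLUMNS_DISPLAY_NAME);
    static final Descriptor V2 = new Descriptor("csv-explicit-columns", "CSV Column Names");

    static String lookup(Map<String, String> saved, Descriptor d) {
        return saved.get(d.name);
    }

    public static void main(String[] args) {
        Map<String, String> saved = new HashMap<>();
        saved.put(V1.name, "id,name,email"); // configured against version 1
        // Still found after the display name changes, because the key did not change.
        System.out.println(lookup(saved, V2)); // id,name,email
    }
}
```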




[jira] [Commented] (NIFI-4181) CSVReader and CSVRecordSetWriter services should be able to work given an explicit list of columns.

2017-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084698#comment-16084698
 ] 

ASF GitHub Bot commented on NIFI-4181:
--

GitHub user Wesley-Lawrence opened a pull request:

https://github.com/apache/nifi/pull/2003

NIFI-4181 CSVReader and CSVRecordSetWriter can be used by just explicitly 
declaring their columns.

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [✓] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [✓] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [✓] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [✓] Is your initial contribution a single, squashed commit?

### For code changes:
- [✓] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [✓] Have you written or updated unit tests to verify your changes?
- [N/A] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [N/A] If applicable, have you updated the LICENSE file, including the 
main LICENSE file under nifi-assembly?
- [N/A] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [✓] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [N/A] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Wesley-Lawrence/nifi NIFI-4181

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2003.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2003


commit cc3f6af7c4d751e813384e166aa87821ace42273
Author: Wesley-Lawrence 
Date:   2017-07-12T21:12:55Z

NIFI-4181 CSVReader and CSVRecordSetWriter can be used by just explicitly 
declaring their columns.
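For the writer side, a minimal sketch of emitting CSV in the order given by an explicit column list: the header comes from the list, missing values become empty fields, and extra record keys are ignored. Quoting follows RFC 4180 conventions. This is an illustration with hypothetical names, not NiFi's CSVRecordSetWriter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: writes records as CSV using an explicit column order.
class ExplicitColumnCsvWriter {

    // Quote fields containing a comma, quote, CR or LF; double embedded quotes.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    static String write(List<String> columns, List<Map<String, String>> records) {
        StringBuilder out = new StringBuilder();
        out.append(joinRow(columns)).append("\r\n");
        for (Map<String, String> record : records) {
            List<String> row = new ArrayList<>();
            for (String column : columns) {
                // Missing values are written as empty fields; extra keys are ignored.
                row.add(record.getOrDefault(column, ""));
            }
            out.append(joinRow(row)).append("\r\n");
        }
        return out.toString();
    }

    private static String joinRow(List<String> fields) {
        List<String> quoted = new ArrayList<>();
        for (String f : fields) {
            quoted.add(quote(f));
        }
        return String.join(",", quoted);
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("name", "Doe, Jane");
        record.put("id", "1");
        System.out.print(write(List.of("id", "name", "email"), List.of(record)));
    }
}
```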



