Re: Handling CSVs dynamically with NiFi

2023-04-06 Thread Bryan Bende
For any record reader, including CsvReader, you can choose the "Schema
Access Strategy" of "Infer Schema" and NiFi will read in all the
records and infer the schema from them.
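For a CSVReader that would look something like this (a sketch; property
names as they appear in recent NiFi releases):

    CSVReader (controller service)
      Schema Access Strategy     : Infer Schema
      Treat First Line as Header : true

No schema registry or pre-defined Avro schema is needed; the reader derives
the schema from the data it reads.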

On Thu, Apr 6, 2023 at 7:36 AM James McMahon  wrote:
>
> We have a task requiring that we transform incoming CSV files to JSON. The 
> CSVs vary in schema.
>
> There are a number of interesting flow examples out there illustrating how 
> one can set up a flow to handle the case where the CSV schema is well known 
> and fixed, but none for the generalized case.
>
> The structure of the incoming CSV files will not be known in advance in our 
> use case. Our nifi flow must be generalized because I cannot configure and 
> rely on a service that defines a specific fixed Avro schema registry. An Avro 
> schema registry seems to presume an awareness of the CSV structure in 
> advance. We don't have that luxury in this use case, with CSVs arriving from 
> many different providers and so characterized by schemas that are unknown.
>
> What is the best way to get around this challenge? Does anyone know of an 
> example where NiFi builds the schema on the fly as CSVs arrive for 
> processing, dynamically defining the Avro schema for the CSV?
>
> Thanks in advance for any thoughts.


RE: Handling CSVs dynamically with NiFi

2023-04-06 Thread Mike Sofen
Hi James,


I don’t have time to go into details, but I had nearly the same scenario and 
solved it by using NiFi as the file-processing piece only, sending on valid CSV 
files (valid as in CSV formatting), and leveraging Postgres to land the CSV data 
into pre-built staging tables; from there I did content validations and packaged 
the data into jsonb for storage in a single target table.
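A sketch of that packaging step, with hypothetical table and column names
(Postgres can fold an entire staging row into one jsonb value):

    -- payload is a jsonb column on the single target table
    INSERT INTO target_table (payload, loaded_at)
    SELECT to_jsonb(s), now()
    FROM   staging_table s;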


In my case, an external file source had to “register” a single file (to allow 
creating the matching staging table) prior to sending data. I used NiFi for 
that pre-staging step to derive the schema for the staging table for a file, and 
I used a complex stored procedure to handle a massive amount of logic around 
the contents of a file when processing the actual files prior to storing into 
the destination table.

NiFi was VERY fast and efficient in this, as was Postgres.


Mike Sofen



Re: Handling CSVs dynamically with NiFi

2023-04-06 Thread James McMahon
Thank you both very much, Bryan and Mike. Mike, had you considered the
approach mentioned by Bryan - a record reader configured to infer the schema -
and found it wasn't suitable for your use case for some reason? For instance,
perhaps you were employing a version of Apache NiFi that did not afford
access to a CsvReader or the InferAvroSchema processor?
Jim



Re: Handling CSVs dynamically with NiFi

2023-04-06 Thread Chris Sampson
For the CSVReader, there's also the "Use String Fields From Header"
Schema Access Strategy, which will treat all fields as strings if that's
more appropriate, e.g. a numeric field with a leading 0 that needs to be
retained (common for telephone numbers, for example).

To convert to JSON, consider the ConvertRecord processor with the CSVReader
and JsonRecordSetWriter. If you need to make changes to the data at the
same time, then maybe UpdateRecord or QueryRecord, etc.
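A minimal sketch of that ConvertRecord setup (property names as they appear
in recent NiFi releases; the values shown are illustrative):

    ConvertRecord
      Record Reader : CSVReader
        Schema Access Strategy     : Use String Fields From Header
        Treat First Line as Header : true
      Record Writer : JsonRecordSetWriter
        Schema Write Strategy : Do Not Write Schema
        Output Grouping        : Array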



Re: Handling CSVs dynamically with NiFi

2023-04-06 Thread Bryan Bende
Just to clarify, I would not recommend using the InferAvroSchema
processor; it is from an older approach and should be considered
deprecated if it is not already. The infer capability is built into
all of the record readers as a Schema Access Strategy.



RE: Handling CSVs dynamically with NiFi

2023-04-06 Thread Mike Sofen
Jim – that’s exactly what I did on that “pre” step – generate a schema from the 
CSVReader and use it to dynamically create the DDL SQL needed to build the 
staging table in Postgres. In my solution, there are two separate pipelines 
running – this pre step and the normal file processing.
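A rough illustration of the kind of generated DDL (table and column names
here are hypothetical; the real statement would be assembled from the schema
NiFi inferred):

    -- Staging table built from an inferred CSV schema; every column lands
    -- as text and is validated/cast later by the stored procedure.
    CREATE TABLE staging_failed_banks (
        bank_name             text,
        city                  text,
        state                 text,
        cert                  text,
        acquiring_institution text,
        closing_date          text,
        fund                  text
    );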


I used the pre step to ensure that all incoming files were from a known and 
valid source and that they conformed to the schema for that source – a very 
tidy way to ensure data quality.


Mike



Re: Handling CSVs dynamically with NiFi

2023-04-06 Thread James McMahon
Can I ask you one follow-up? I've gotten my ConvertRecord to work. I
created a CsvReader service with Schema Access Strategy of Use String
Fields From Header. I created a JsonRecordSetWriter service with Schema
Write Strategy of Do Not Write Schema.
When ConvertRecord is finished, my result looks like this sample:

[ {
  "Bank Name�" : "Almena State Bank",
  "City�" : "Almena",
  "State�" : "KS",
  "Cert�" : "15426",
  "Acquiring Institution�" : "Equity Bank",
  "Closing Date�" : "23-Oct-20",
  "Fund" : "10538"
}, {
  "Bank Name�" : "First City Bank of Florida",
  "City�" : "Fort Walton Beach",
  "State�" : "FL",
  "Cert�" : "16748",
  "Acquiring Institution�" : "United Fidelity Bank, fsb",
  "Closing Date�" : "16-Oct-20",
  "Fund" : "10537"
}, {
  "Bank Name�" : "The First State Bank",
  "City�" : "Barboursville",
  "State�" : "WV",
  "Cert�" : "14361",
  "Acquiring Institution�" : "MVB Bank, Inc.",
  "Closing Date�" : "3-Apr-20",
  "Fund" : "10536"
}]

I don't really have a schema. How can I use a combination of SplitJson and
EvaluateJsonPath to split each json object out to its own NiFi flowfile,
and to pull the json keys out to define the fields in the csv header?
I've found a few examples through research that allude to this, but they
all seem to have a fixed schema and they don't offer configurations for the
SplitJson. In a case where my json keys definition changes depending on the
flowfile, what should the JsonPath Expression be set to in the SplitJson
configuration?
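For reference, a schema-independent SplitJson configuration sketch, assuming
the content is a top-level JSON array like the sample above:

    SplitJson
      JsonPath Expression : $.*   # matches every element of the root array,
                                  # independent of the keys inside each object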



RE: Handling CSVs dynamically with NiFi

2023-04-07 Thread Mike Sofen
This is where I felt NiFi wasn’t the right tool for the job and Postgres was.  
After I imported the CSV directly into a staging table in the database (using 
NiFi), I converted the payload part of the columns into jsonb and stored that 
into the final table in a single column, with additional columns holding 
relational data (timestamps, identifiers, etc.). It was an object-relational 
data model.


THEN, using the amazingly powerful Postgres jsonb functions, I was able to 
extract the unique keys in an entire dataset or across multiple datasets (to 
build a data catalog, for example), perform a wide range of validations on 
individual keys, etc. I use the word amazing because they are not just 
powerful functions; they run surprisingly fast given the amount of string 
data they are traversing.
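A sketch of that kind of query (payload and final_table are hypothetical
names):

    -- Enumerate every distinct key used across the dataset, e.g. to build
    -- a data catalog of the fields the providers actually send.
    SELECT DISTINCT jsonb_object_keys(payload) AS key_name
    FROM   final_table;

    -- Validate a single field across all records, e.g. closing dates.
    SELECT payload->>'Closing Date' AS closing_date, count(*)
    FROM   final_table
    GROUP  BY 1;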


Mike Sofen


Re: Handling CSVs dynamically with NiFi

2023-04-07 Thread James McMahon
Very interesting, very helpful insights. Thank you again, Mike.
Late last night I decided to punt on a pure NiFi solution. I knew I could
do this easily with Groovy scripting, and I knew that was well within my
wheelhouse. So that's what I did: Groovy from an ExecuteScript processor.
I'm 90% of the way there. Just a few more refinements to get just what I
want, which I'll tackle later tonight.
Groovy is pretty cool. Flexible, easily tailored to just what you need. I
like having that flexibility. And I like having options, too: your results
have motivated me to look at using QueryRecord, etc.
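A minimal sketch of what such an ExecuteScript Groovy body can look like (a
reconstruction, not the actual script; it assumes the standard script
bindings session and REL_SUCCESS, and does no quoted-field handling):

    import groovy.json.JsonOutput
    import org.apache.nifi.processor.io.StreamCallback

    def flowFile = session.get()
    if (!flowFile) return

    flowFile = session.write(flowFile, { inputStream, outputStream ->
        // Naive CSV -> JSON: the header row supplies the keys and every
        // value stays a string, echoing "Use String Fields From Header".
        def rows = inputStream.getText('UTF-8').readLines().findAll { it.trim() }
        def header = rows.head().split(',', -1)*.trim()
        def records = rows.tail().collect { line ->
            def values = line.split(',', -1)*.trim()
            [header, values].transpose().collectEntries { k, v -> [(k): v] }
        }
        outputStream.write(JsonOutput.toJson(records).getBytes('UTF-8'))
    } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)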
Jim


Re: Handling CSVs dynamically with NiFi

2023-04-07 Thread Bryan Bende
James,

I'm not sure what the end goal is, but why do you need to use
EvaluateJsonPath and SplitJson?

Generally you don't want to split a flow file of multiple records into
one record per flow file; this is an anti-pattern that leads to poor
performance in the flow.

Thanks,

Bryan


Re: Handling CSVs dynamically with NiFi

2023-04-07 Thread James McMahon
Hello Bryan. Thank you for your question.
A downstream consumer requires the complete set in json. So that's part of
why I convert.
Other downstream tools require json input, but not the entire set. The data
needs to be routed based on certain features. Geographic location. Sales
data by zip codes. Etc, etc. Splitting the records out seemed to be a
reasonable option to route individual records.
I appreciate you taking the time to ask. You are far more familiar with
NiFi best practices than I am. If there is a better way than what I intended,
please fire away. I'd love to march down a better path if there is one.
Cheers,
Jim


RE: Handling CSVs dynamically with NiFi

2023-04-12 Thread Isha Lamboo
Hi James,

One option you can use is the QueryRecord processor. It allows you to filter 
records with a SQL-like query for any combination of fields that your 
downstream tools require. You can add one query for each different output 
required and send a copy of the main json file to each.
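A sketch of that configuration (each dynamic property becomes an output
relationship; FLOWFILE is the table name QueryRecord exposes, and the field
name is an assumption about the data):

    QueryRecord
      Record Reader : JsonTreeReader
      Record Writer : JsonRecordSetWriter
      kansas_sales  : SELECT * FROM FLOWFILE WHERE "State" = 'KS'
      florida_sales : SELECT * FROM FLOWFILE WHERE "State" = 'FL'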

This approach should work better if you have a limited number of different 
output files with many records each. If your goal is hundreds of different json 
files with a handful of records each, then splitting per row might be quicker 
than copying the entire json file that many times.

Regards,

Isha


Re: Handling CSVs dynamically with NiFi

2023-04-12 Thread James McMahon
Thank you very much Isha. This is helpful. Assuming I wanted to route to N
different output paths, does it follow that I need N different QueryRecord
processors, each tailored to filter for just one subset?
I'll have to experiment with it to develop more of a feel for how it can be
used.
Thanks again for taking a moment to reply with the suggestion.
Jim


RE: Handling CSVs dynamically with NiFi

2023-04-12 Thread Isha Lamboo
Hi James,

I’ve overlooked an even better option: PartitionRecord.

This allows you to add custom properties representing fields in the data. The 
processor will then group records into flowfiles that share the same 
combination of values for those properties. So if you have, say, “City” with a 
RecordPath of /sale/store/city, it should split the data and spit out a file 
for each city value found.

Depending on your N outputs, you may be able to do this in one go or chain a 
few processors (say “state”, then “city”, then “zipcode”).
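A sketch of that setup (reader/writer choices and the RecordPath are
illustrative):

    PartitionRecord
      Record Reader : JsonTreeReader
      Record Writer : JsonRecordSetWriter
      city          : /sale/store/city

Each output flowfile also carries a city attribute holding the partition
value, which a downstream RouteOnAttribute can act on.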

Regards,

Isha


Re: Handling CSVs dynamically with NiFi

2023-04-12 Thread James McMahon
Very cool. That sounds very promising. Thank you again Isha.
