Re: How can I View my flowfile records?

2023-09-25 Thread Chris Sampson
I completely missed the fact that this was an external python conversion
script through the ExecuteStreamCommand, but as Matt says, that will be
catered for in the new NiFi versions.

From a quick look, although I've not tested to confirm, it appears that the
existing ConvertExcelToCSVProcessor and CSVRecordSetWriter (which can now
be paired with the relatively new ExcelReader, e.g. in a ConvertRecord
processor) will both set the result flowfile's mime.type attribute as
text/csv, which would allow the expected downstream content viewer
behaviour.
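
To illustrate the mismatch discussed in this thread: the flowfile content
had already been converted to CSV, but its mime.type attribute still
described the original XLSX. Below is a minimal Python sketch, assuming a
flowfile is modelled as a plain dict of attributes plus a bytes payload.
The helper names looks_like_zip and attribute_matches_content are
hypothetical and are not NiFi APIs; the sample CSV bytes are made up.

```python
# XLSX files are ZIP archives, so they start with the ZIP local-file signature.
ZIP_MAGIC = b"PK\x03\x04"
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def looks_like_zip(content: bytes) -> bool:
    """Coarse check: does the payload begin with the ZIP signature?"""
    return content.startswith(ZIP_MAGIC)


def attribute_matches_content(attributes: dict, content: bytes) -> bool:
    """Return False when mime.type and the payload disagree about being XLSX."""
    claims_xlsx = attributes.get("mime.type") == XLSX_MIME
    return claims_xlsx == looks_like_zip(content)


# Hypothetical flowfile after the external conversion script has run:
# the content is CSV text, but the attribute was never updated.
attrs = {"filename": "Alltables.csv", "mime.type": XLSX_MIME}
csv_bytes = b"id,name\n1,alpha\n"
print(attribute_matches_content(attrs, csv_bytes))  # stale attribute

# Equivalent in spirit to setting mime.type with UpdateAttribute.
attrs["mime.type"] = "text/csv"
print(attribute_matches_content(attrs, csv_bytes))  # attribute now agrees
```

The check is deliberately coarse: it only shows that the attribute is
metadata carried alongside the content and is not recomputed when the
content changes.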

On Mon, 25 Sept 2023, 06:54 Matt Burgess,  wrote:

> I added MIME Type properties to ExecuteProcess and ExecuteStreamCommand
> so you can set it explicitly if you want [1]. They will be in the 1.24.0
> and 2.0 releases.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-12011
>
>
> On Mon, Sep 25, 2023 at 1:41 AM Joe Witt  wrote:
>
>>  Chris
>>
>> Yep. Though this case was ExecuteStreamCommand so following with
>> UpdateAttr as you mention or IdentifyMimeType would do the trick.
>>
>> Thanks
>>
>> On Sun, Sep 24, 2023 at 10:30 PM Chris Sampson 
>> wrote:
>>
>>> An UpdateAttribute could also be used to update the mime.type, e.g. to
>>> text/csv.
>>>
>>> I'd think the csv record writer should probably do this automatically
>>> though, so maybe worth a jira to correct that (I'm reasonably sure the
>>> existing json and avro writers do that, for example).
>>>
>>> On Sun, 24 Sept 2023, 23:52 James McMahon,  wrote:
>>>
>>>> That was it. I was missing the forest for the trees, yet again. I
>>>> do all the hard work and then forget to IdentifyMimeType at the end.
>>>> Thanks very much Joe.
>>>> Jim
>>>>
>>>> On Sun, Sep 24, 2023 at 6:30 PM Joe Witt  wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> Before you try to view it you can likely run it through
>>>>> IdentifyMimeType.  As you note the conversion from XLS to CSV happens but
>>>>> we still see a mime type of
>>>>> 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' so
>>>>> that is likely causing it to not even attempt to display.  So after your
>>>>> python script execution run the data through IdentifyMimeType then you can
>>>>> likely view it just fine.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sun, Sep 24, 2023 at 3:21 PM James McMahon wrote:
>>>>>
>>>>>> I sure can Joe. Here they are:
>>>>>>
>>>>>> RouteOnAttribute.Route: isExcel
>>>>>> execution.command: /usr/bin/python3
>>>>>> execution.command.args: /opt/nifi/config_resources/scripts/excelToCSV.py
>>>>>> execution.error: Empty string set
>>>>>> execution.status: 0
>>>>>> filename: Alltables.csv
>>>>>> hash.value.md5: b48840c161b645a0169e622dcb8f5083
>>>>>> hash.value.sha256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> isChild: false
>>>>>> mime.extension: .xlsx
>>>>>> mime.type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>>> parent.MD5: b48840c161b645a0169e622dcb8f5083
>>>>>> parent.SHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> path: ./
>>>>>> s3.bucket: rampart-raw-data
>>>>>> s3.encryptionStrategy: SSE_S3
>>>>>> s3.etag: b48840c161b645a0169e622dcb8f5083
>>>>>> s3.isLatest: true
>>>>>> s3.lastModified: 1672701227000
>>>>>> s3.length: 830934
>>>>>> s3.owner: b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
>>>>>> s3.sseAlgorithm: AES256
>>>>>> s3.storeClass: STANDARD
>>>>>> s3.version: null
>>>>>> sourcing.MD5: b48840c161b645a0169e622dcb8f5083
>>>>>> sourcing.SHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> sourcing.sourceMD5: b48840c161b645a0169e622dcb8f5083
>>>>>> sourcing.sourceSHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> triage.datatype: excel
>>>>>> uuid: d72ec2e9-cfbd-435e-9954-4f7fae55c550
>>>>>>
>>>>>> Thanks for any help. Perhaps my data is there but I simply can't
>>>>>> render it in the Viewer?
>>>>>> Jim
>>>>>>
>>>>>> On Sun, Sep 24, 2023 at 6:08 PM Joe Witt  wrote:
>>>>>>
>>>>>>> Jim,
>>>>>>>
>>>>>>> If a content type attribute exists and is not a type NiFi
>>>>>>> understands it will not be able to render it.  Can you show what
>>>>>>> flowfile attributes are present at the point you attempt to view it?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Sun, Sep 24, 2023 at 3:03 PM James McMahon wrote:
>>>>>>>
>>>>>>>> Hello. I have converted incoming Excel files to csv. I'd like to
>>>>>>>> look at the result, but when I select my flowfiles from the output
>>>>>>>> queue, I can only select "View as hex" - but I cannot get the display
>>>>>>>> to show me the records in the form I expect. Viewing them using the
>>>>>>>> hex display is not helpful.
>>>>>>>>
>>>>>>>> How can I fix this viewing issue?
>>>>>>>>
>>>>>>>> Here is an example of wh

Re: How can I View my flowfile records?

2023-09-25 Thread James McMahon
A couple of random questions about ConvertExcelToCSVProcessor:

Why does this processor only handle the xlsx Excel file format?  From the
Description for ConvertExcelToCSVProcessor:  "*This processor is currently
only capable of processing .xlsx (XSSF 2007 OOXML file format) Excel
documents and not older .xls (HSSF '97(-2007) file format) documents.*" I
ask because it seems unfortunate to have to develop a separate distinct
flow path to handle the .xls files that this native processor cannot. Why
was it that handling of xls Excel files was not baked into
ConvertExcelToCSVProcessor too? Do later releases lift this limitation?

What is it about this processor that required including the word Processor
in its name? It seems redundant and inconsistent with the naming convention
used for the majority of the other processors. I figure there was an
interesting reason behind this, and so wanted to ask.

I am using a slightly older version of NiFi. Does this limitation go away
in later versions?


Re: How can I View my flowfile records?

2023-09-25 Thread Joe Witt
Jim

I don't really recall the history of that specific processor, but what it
can handle is just a function of what it was coded for and the libraries it
uses.  I'm sure the older format required some other library.  That
said, I think we should consider removing that component in 2.x and
instead favor the ExcelReader [1].  It has the same noted limitation, but
I'm sure that can be addressed.

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-poi-nar/1.23.2/org.apache.nifi.excel.ExcelReader/index.html

Thanks


Re: How can I View my flowfile records?

2023-09-25 Thread David Handermann
Jim,

Regarding format support, both the Processor and the new ExcelReader are
limited to the XLSX format and do not support the older binary XLS format.
The code required to support XLS has substantial differences from the code
for XLSX. The older format could be supported through the Apache POI library,
but it has not been implemented. Feel free to file an Apache NiFi Jira
issue requesting general support for XLS. It would be helpful to describe
the use case, since the XLSX format dates back to Excel 2007.
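
For anyone who needs a separate flow path for the older files in the
meantime, the two formats can be told apart by their leading bytes: XLSX
is a ZIP container, while the binary XLS format is an OLE2 compound file.
A minimal Python sketch follows; the function name excel_container is
hypothetical, and the check is coarse, since other ZIP-based or
OLE2-based documents share the same signatures.

```python
# Leading-byte signatures of the two container formats.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # binary XLS (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # XLSX (ZIP archive)


def excel_container(header: bytes) -> str:
    """Classify a file by its first bytes as 'xls', 'xlsx', or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "xls"
    if header.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


# Only the first eight bytes are needed to make the decision.
print(excel_container(OLE2_MAGIC + b"\x00" * 8))
print(excel_container(ZIP_MAGIC + b"\x14\x00\x06\x00"))
print(excel_container(b"id,name\n1,alpha\n"))
```

A result of "xls" could route a file to an external conversion step,
while "xlsx" files could go to the ExcelReader.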

As Joe noted, the newer ExcelReader should be preferred over the Convert
Processor, which probably should be deprecated for removal in the next
major release version.

Regards,
David Handermann
