Re: How can I View my flowfile records?
I completely missed the fact that this was an external Python conversion
script run through ExecuteStreamCommand, but as Matt says, that will be
catered for in the new NiFi versions.

From a quick look, although I've not tested to confirm, it appears that both
the existing ConvertExcelToCSVProcessor and CSVRecordSetWriter (which can
now be paired with the relatively new ExcelReader, e.g. in a ConvertRecord
processor) set the result flowfile's mime.type attribute to text/csv, which
would allow the expected downstream content viewer behaviour.

On Mon, 25 Sept 2023, 06:54 Matt Burgess wrote:

> I added MIME Type properties to ExecuteProcess and ExecuteStreamCommand
> so you can set it explicitly if you want [1]. They will be in the 1.24.0
> and 2.0 releases.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-12011
>
> On Mon, Sep 25, 2023 at 1:41 AM Joe Witt wrote:
>
>> Chris
>>
>> Yep. Though this case was ExecuteStreamCommand so following with
>> UpdateAttr as you mention or IdentifyMimeType would do the trick.
>>
>> Thanks
>>
>> On Sun, Sep 24, 2023 at 10:30 PM Chris Sampson wrote:
>>
>>> An UpdateAttribute could also be used to update the mime.type, e.g. to
>>> text/csv.
>>>
>>> I'd think the csv record writer should probably do this automatically
>>> though, so maybe worth a jira to correct that (I'm reasonably sure the
>>> existing json and avro writers do that, for example).
>>>
>>> On Sun, 24 Sept 2023, 23:52 James McMahon wrote:
>>>
>>>> That was it. I was missing the forest for the trees, yet again.
>>>> I do all the hard work and then forget to IdentifyMimeType at the end.
>>>> Thanks very much Joe.
>>>> Jim
>>>>
>>>> On Sun, Sep 24, 2023 at 6:30 PM Joe Witt wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> Before you try to view it you can likely run it through
>>>>> IdentifyMimeType. As you note the conversion from XLS to CSV happens
>>>>> but we still see a mime type of
>>>>> 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
>>>>> so that is likely causing it to not even attempt to display. So after
>>>>> your python script execution run the data through IdentifyMimeType
>>>>> then you can likely view it just fine.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sun, Sep 24, 2023 at 3:21 PM James McMahon wrote:
>>>>>
>>>>>> I sure can Joe. Here they are:
>>>>>>
>>>>>> RouteOnAttribute.Route   isExcel
>>>>>> execution.command        /usr/bin/python3
>>>>>> execution.command.args   /opt/nifi/config_resources/scripts/excelToCSV.py
>>>>>> execution.error          Empty string set
>>>>>> execution.status         0
>>>>>> filename                 Alltables.csv
>>>>>> hash.value.md5           b48840c161b645a0169e622dcb8f5083
>>>>>> hash.value.sha256        4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> isChild                  false
>>>>>> mime.extension           .xlsx
>>>>>> mime.type                application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>>> parent.MD5               b48840c161b645a0169e622dcb8f5083
>>>>>> parent.SHA256            4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> path                     ./
>>>>>> s3.bucket                rampart-raw-data
>>>>>> s3.encryptionStrategy    SSE_S3
>>>>>> s3.etag                  b48840c161b645a0169e622dcb8f5083
>>>>>> s3.isLatest              true
>>>>>> s3.lastModified          1672701227000
>>>>>> s3.length                830934
>>>>>> s3.owner                 b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
>>>>>> s3.sseAlgorithm          AES256
>>>>>> s3.storeClass            STANDARD
>>>>>> s3.version               null
>>>>>> sourcing.MD5             b48840c161b645a0169e622dcb8f5083
>>>>>> sourcing.SHA256          4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> sourcing.sourceMD5       b48840c161b645a0169e622dcb8f5083
>>>>>> sourcing.sourceSHA256    4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>> triage.datatype          excel
>>>>>> uuid                     d72ec2e9-cfbd-435e-9954-4f7fae55c550
>>>>>>
>>>>>> Thanks for any help. Perhaps my data is there but I simply can't
>>>>>> render it in the Viewer?
>>>>>> Jim
>>>>>>
>>>>>> On Sun, Sep 24, 2023 at 6:08 PM Joe Witt wrote:
>>>>>>
>>>>>>> Jim,
>>>>>>>
>>>>>>> If a content type attribute exists and is not a type NiFi
>>>>>>> understands it will not be able to render it. Can you show what
>>>>>>> flowfile attributes are present at the point you attempt to view it?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Sun, Sep 24, 2023 at 3:03 PM James McMahon wrote:
>>>>>>>
>>>>>>>> Hello. I have converted incoming Excel files to csv. I'd like to
>>>>>>>> look at the result, but when I select my flowfiles from the
>>>>>>>> output queue, I can only select "View as hex" - but I cannot get
>>>>>>>> the display to show me the records in the form I expect. Viewing
>>>>>>>> them using the hex display is not helpful. How can I fix this
>>>>>>>> viewing issue? Here is an example of wh
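
The mime.type fix discussed in this message can be illustrated outside NiFi. The following is only a minimal Python sketch of the logic that an UpdateAttribute step would apply; it is not NiFi code, and the helper name `fix_mime_type` is hypothetical. The attribute values are the ones from the listing quoted above.

```python
def fix_mime_type(attributes: dict) -> dict:
    """Return a copy of the attributes whose mime.type matches a .csv filename.

    Hypothetical helper: it mimics setting mime.type to text/csv after the
    conversion, which is what the thread suggests doing with UpdateAttribute.
    """
    updated = dict(attributes)  # leave the caller's dict untouched
    if updated.get("filename", "").lower().endswith(".csv"):
        updated["mime.type"] = "text/csv"
    return updated


# Attributes as reported in the thread: the content is already CSV, but
# mime.type still describes the original .xlsx upload.
before = {
    "filename": "Alltables.csv",
    "mime.extension": ".xlsx",
    "mime.type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}
after = fix_mime_type(before)
print(after["mime.type"])  # prints text/csv
```

The sketch keys off the filename only because that attribute was already updated by the flow in this thread; IdentifyMimeType, by contrast, inspects the content itself.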
Re: How can I View my flowfile records?
A couple of random questions about ConvertExcelToCSVProcessor:

Why does this processor only handle the xlsx Excel file format? From the
Description for ConvertExcelToCSVProcessor: "*This processor is currently
only capable of processing .xlsx (XSSF 2007 OOXML file format) Excel
documents and not older .xls (HSSF '97(-2007) file format) documents.*"
I ask because it seems unfortunate to have to develop a separate, distinct
flow path to handle the .xls files that this native processor cannot. Why
was handling of xls Excel files not baked into ConvertExcelToCSVProcessor
too? Do later releases lift this limitation?

What is it about this processor that required including the word Processor
in its name? It seems redundant and inconsistent with the naming convention
used for the majority of the other processors. I figure there was an
interesting reason behind this, and so wanted to ask.

I am using a slightly older version of NiFi. Does this limitation go away
in later versions?
Re: How can I View my flowfile records?
Jim

I don't really recall the history of that specific processor, but what it
can handle is just a function of what it was coded for and the libraries it
uses. I'm sure the older format required some other library. That said, I
think we should consider removing that component in 2.x and instead favor
the ExcelReader [1]. It has the same noted limitation but I'm sure that can
be addressed.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-poi-nar/1.23.2/org.apache.nifi.excel.ExcelReader/index.html

Thanks
Re: How can I View my flowfile records?
Jim,

Regarding format support, both the Processor and the new ExcelReader are
limited to the XLSX format and do not support the older binary XLS format.
The code required to support XLS has substantial differences from the code
for XLSX. The older format could be supported through the Apache POI
library, but it has not been implemented.

Feel free to file an Apache NiFi Jira issue requesting general support for
XLS. It would be helpful to describe the use case, since the XLSX format
dates back to Excel 2007.

As Joe noted, the newer ExcelReader should be preferred over the Convert
Processor, which probably should be deprecated for removal in the next
major release version.

Regards,
David Handermann
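
For anyone who does need a separate path for legacy .xls files, the two formats can be told apart from their first bytes. The following is only a sketch under stated assumptions, not anything NiFi or Apache POI provides: the function name `sniff_excel_format` and its return labels are hypothetical, and the check for `xl/workbook.xml` relies on the conventional workbook part name rather than on following the package relationships.

```python
import io
import zipfile

# Header signature of the OLE2 Compound File container used by binary .xls.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header that opens a ZIP archive; .xlsx is a ZIP package.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_excel_format(content: bytes) -> str:
    """Return 'xlsx', 'xls-candidate', or 'unknown' for raw file bytes."""
    if content.startswith(OLE2_MAGIC):
        # OLE2 also wraps other legacy Office files, so this is only a hint.
        return "xls-candidate"
    if content.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(content)) as archive:
                # Assumption: the workbook lives at its conventional path.
                if "xl/workbook.xml" in archive.namelist():
                    return "xlsx"
        except zipfile.BadZipFile:
            pass  # truncated or corrupt archive
    return "unknown"
```

A script invoked through ExecuteStreamCommand could apply a check like this to the bytes it receives before choosing a conversion library; which library handles the .xls branch is left open here.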