Handling incoming file names that contain embedded spaces
I am using NiFi 0.6.1. I am trying to use GetFile to read in a large series of files I have preprocessed outside of NiFi from zip files using bash shell commands. GetFile is throwing errors on many of these files because the files contain embedded spaces. Is there a way to tell NiFi to handle each such filename with surrounding single quotes? Are there other processor options better suited to handle this challenge? Thank you.
Re: Handling incoming file names that contain embedded spaces
James,

I suspect there is more to the issue than the spaces. GetFile itself should be fine there. Can you share logs showing what is happening with these files? Can you share some sample filenames that it is struggling with? You can also enable debug logging for that processor which could provide some interesting details as well.

Thanks
Joe

On Wed, Dec 14, 2016 at 5:03 PM, James McMahon wrote:
> I am using NiFi 0.6.1. I am trying to use GetFile to read in a large series of files [...]
Re: Handling incoming file names that contain embedded spaces
Yes indeed Joe, it appears from the logs that there are non-ASCII Unicode characters at the beginning and end of the file name. The log shows them as odd representations of "unprintables": for example, small inverted question marks in diamonds. They are embedded in the file names by the application that created the files. I copied one and tried to paste and save it into a text file, and Notepad directed me to switch to another encoding in order to save the file name string. I was able to save it by switching to Unicode encoding.

I can't send the logs from my system, so I can only relay this in this way. Would you expect such character encoding to cause problems for GetFile? What alternatives do I have to work around this problem? Thank you once again.

On Wed, Dec 14, 2016 at 6:04 PM, Joe Witt wrote:
> I suspect there is more to the issue than the spaces. GetFile itself should be fine there. [...]
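One workaround the question points toward is renaming the files before GetFile ever sees them, since they are already preprocessed outside NiFi. The following is a minimal sketch of that idea in Python, not something proposed in the thread: the function names `ascii_safe_name` and `plan_renames` and the `_` placeholder are hypothetical, and the sketch only reports what it would rename rather than touching any file.

```python
import os


def ascii_safe_name(name, placeholder="_"):
    """Replace every non-ASCII or non-printable character with a placeholder.

    Spaces are kept, since embedded spaces were not the real problem here.
    """
    return "".join(
        ch if (ch.isascii() and ch.isprintable()) else placeholder
        for ch in name
    )


def plan_renames(directory):
    """Return (old_name, new_name) pairs for files whose names would change.

    Nothing is renamed; the caller decides whether to act on the plan.
    Name collisions after sanitizing are NOT handled in this sketch.
    """
    plan = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            safe = ascii_safe_name(entry.name)
            if safe != entry.name:
                plan.append((entry.name, safe))
    return plan


if __name__ == "__main__":
    # The notional file name from this thread, written with escapes.
    print(ascii_safe_name("\u0932\u0932\u0932+123 ABC\u07c1"))
```

Whether sanitized names are acceptable depends on whether downstream steps need the original names; if they do, the plan returned by `plan_renames` could be saved as a mapping before any rename is applied.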
Re: Handling incoming file names that contain embedded spaces
As a representative example using a random Unicode character at the front and the back of a notional file name:

[U+0932][U+0932][U+0932]+123 ABC[U+07C1]

On Wed, Dec 14, 2016 at 7:22 PM, James McMahon wrote:
> Yes indeed Joe, it appears from the logs that there are non-ASCII unicode characters preceding and at end of the file name. [...]
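When logs cannot be shared, file names can still be relayed in the `[U+XXXX]` notation used above. Here is a small Python sketch that produces that notation for every name in a directory; the helper names `describe_name` and `describe_directory` are hypothetical, and the choice to leave only printable ASCII (space through `~`) unescaped is an assumption.

```python
import os


def describe_name(name):
    """Render a file name with every non-printable-ASCII character as [U+XXXX]."""
    parts = []
    for ch in name:
        if " " <= ch <= "~":
            parts.append(ch)  # printable ASCII, including spaces, stays readable
        else:
            parts.append("[U+{:04X}]".format(ord(ch)))
    return "".join(parts)


def describe_directory(directory):
    """Return the escaped form of every entry name in a directory, sorted."""
    with os.scandir(directory) as entries:
        return sorted(describe_name(entry.name) for entry in entries)


if __name__ == "__main__":
    # The notional file name from the message above, written with escapes.
    print(describe_name("\u0932\u0932\u0932+123 ABC\u07c1"))
```

On POSIX systems Python represents file-name bytes it cannot decode as lone surrogates in the range U+DC80 to U+DCFF, so entries showing up in that range would suggest the names are not valid in the encoding Python is using for the file system.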
Re: Handling incoming file names that contain embedded spaces
Is the NiFi GetFile processor restricted to a range of characters, or is it able to handle Unicode character encoding? If it is able to map Unicode characters then there must be another problem causing my GetFile processor to throw these "unmappable characters" errors. Thank you for any thoughts.

On Thu, Dec 15, 2016 at 6:07 AM, James McMahon wrote:
> As a representative example using a random Unicode character at the front and the back of a notional file name, [U+0932][U+0932][U+0932]+123 ABC[U+07C1]
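To make "unmappable" concrete: a character is unmappable in a given charset when that charset has no encoding for it. The Python sketch below illustrates this with the notional file name from the thread. It does not reproduce what GetFile does internally, the function name `unmappable_chars` is hypothetical, and the charsets tried (`ascii`, `latin-1`, `utf-8`) are only examples, not the settings of any particular NiFi installation.

```python
from typing import List


def unmappable_chars(name: str, charset: str) -> List[str]:
    """List the code points in `name` that `charset` cannot encode."""
    bad = []
    for ch in name:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append("U+{:04X}".format(ord(ch)))
    return bad


if __name__ == "__main__":
    sample = "\u0932\u0932\u0932+123 ABC\u07c1"
    for charset in ("ascii", "latin-1", "utf-8"):
        print(charset, unmappable_chars(sample, charset))
```

If the charset in effect on the system reading the files cannot represent the characters in the names, errors of this kind would be expected regardless of the embedded spaces; whether that is what is happening here is not established by the thread.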
Re: Handling incoming file names that contain embedded spaces
James,

I think this would need to be tested and evaluated. I'm not quite sure offhand, and it also depends on the operating system involved. Any details you can provide of your own findings and environment will be helpful.

Thanks
Joe

On Thu, Dec 15, 2016 at 4:38 PM, James McMahon wrote:
> Is the NiFi GetFile processor restricted to a range of characters, or is it able to handle Unicode character encoding? [...]
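Since the answer depends on the operating system and its configuration, a short report of the encoding-related settings is one kind of environment detail that could be shared. The Python sketch below collects a few of them; the function name `environment_report` and the selection of fields are assumptions. It reports what Python sees on the host and does not inspect the JVM that runs NiFi, whose own settings (for example the `file.encoding` system property) would need to be checked separately.

```python
import locale
import os
import platform
import sys


def environment_report():
    """Collect OS and locale settings that commonly affect file-name handling."""
    return {
        "platform": platform.platform(),
        "python_filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        # Locale variables are typically unset on Windows; None means "not set".
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```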