Handling incoming file names that contain embedded spaces

2016-12-14 Thread James McMahon
I am using NiFi 0.6.1. I am trying to use GetFile to read in a large series
of files I have preprocessed outside of NiFi from zip files using bash
shell commands. GetFile is throwing errors on many of these files because
the files contain embedded spaces. Is there a way to tell NiFi to handle
each such filename with surrounding single quotes? Are there other
processor options better suited to handle this challenge? Thank you.


Re: Handling incoming file names that contain embedded spaces

2016-12-14 Thread Joe Witt
James,

I suspect there is more to the issue than the spaces.  GetFile itself
should be fine there.  Can you share logs showing what is happening
with these files?  Can you share some sample filenames that it is
struggling with?  You can also enable debug logging for that processor
which could provide some interesting details as well.

Thanks
Joe
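
Sample file names are easier to share once any hidden characters in them are made visible. The following is a minimal Python sketch, not something taken from the thread: the helper name `describe_names` and the sample file are hypothetical. It lists a directory and reports each name with non-ASCII characters escaped, plus flags for embedded spaces and non-ASCII content.

```python
import os
import tempfile


def describe_names(directory):
    """Return one (escaped_name, has_space, has_non_ascii) tuple per entry."""
    rows = []
    for name in sorted(os.listdir(directory)):
        # ascii() keeps plain characters and escapes everything non-ASCII,
        # so characters that a log viewer cannot draw become visible.
        escaped = ascii(name)
        has_space = " " in name
        has_non_ascii = any(ord(ch) > 127 for ch in name)
        rows.append((escaped, has_space, has_non_ascii))
    return rows


# Hypothetical demonstration directory containing one file with a space.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "plain name.txt"), "w").close()
    for row in describe_names(demo_dir):
        print(row)
```

If the non-ASCII flag is set on the failing files but not on files with plain spaces that load without error, that would support the suspicion that the spaces are not the cause.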

On Wed, Dec 14, 2016 at 5:03 PM, James McMahon wrote:
> I am using NiFi 0.6.1. I am trying to use GetFile to read in a large series
> of files I have preprocessed outside of NiFi from zip files using bash shell
> commands. GetFile is throwing errors on many of these files because the
> files contain embedded spaces. Is there a way to tell NiFi to handle each
> such filename with surrounding single quotes? Are there other processor
> options better suited to handle this challenge? Thank you.


Re: Handling incoming file names that contain embedded spaces

2016-12-14 Thread James McMahon
Yes indeed, Joe. The logs show non-ASCII Unicode characters at the start
and end of the file names. The log renders them as odd "unprintable"
glyphs, for example small inverted question marks in diamonds. They are
embedded in the file names by the application that created the files. When I
copied one name and tried to save it in a text file, Notepad told me to
switch to another encoding; saving worked once I switched to Unicode
encoding.

I can't send the logs from my system, so I can only describe them this way.
Would you expect such characters to cause problems for GetFile? What
alternatives do I have to work around this problem? Thank you once again.
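
One way to find out what those characters actually are is to print their code points. The sketch below is illustrative only: `explain_characters` and the sample name are hypothetical, and it relies on Python's standard `unicodedata` module. Note that a question mark inside a diamond is how U+FFFD, the replacement character, is usually drawn, which often means the viewer could not decode the underlying bytes rather than that U+FFFD itself is in the name.

```python
import unicodedata


def explain_characters(name):
    """List (code_point, unicode_name, category) for each non-ASCII character."""
    details = []
    for ch in name:
        if ord(ch) > 127:
            details.append((
                "U+%04X" % ord(ch),
                unicodedata.name(ch, "<unnamed>"),  # fallback for unnamed code points
                unicodedata.category(ch),
            ))
    return details


# Hypothetical name: a replacement character in front, a no-break space at the end.
sample = "\ufffdreport 01.txt\u00a0"
for entry in explain_characters(sample):
    print(entry)
```

Running the same function over the real file names on the affected system would show whether the characters are genuine text, invisible spacing characters, or decoding damage.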

On Wed, Dec 14, 2016 at 6:04 PM, Joe Witt wrote:

> James,
>
> I suspect there is more to the issue than the spaces.  GetFile itself
> should be fine there.  Can you share logs showing what is happening
> with these files?  Can you share some sample filenames that it is
> struggling with?  You can also enable debug logging for that processor
> which could provide some interesting details as well.
>
> Thanks
> Joe


Re: Handling incoming file names that contain embedded spaces

2016-12-15 Thread James McMahon
As a representative example, using an arbitrary Unicode character at the
front and the back of a notional file name:

[U+0932][U+0932][U+0932]+123 ABC[U+07C1]
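
If the extra characters carry no meaning, one possible workaround (an assumption, not something proposed in the thread) is to rename the files to printable ASCII before GetFile sees them. A minimal sketch using the notional name above, with `ascii_safe` as a hypothetical helper:

```python
import re


def ascii_safe(name, replacement="_"):
    """Replace each run of characters outside printable ASCII; spaces are kept."""
    cleaned = re.sub(r"[^\x20-\x7e]+", replacement, name)
    # Drop replacement markers at either end; never return an empty name.
    return cleaned.strip(replacement) or replacement


example = "\u0932\u0932\u0932+123 ABC\u07c1"
print(ascii_safe(example))  # prints "+123 ABC"
```

Different original names can collapse to the same cleaned name, so a real rename step would need a collision check before moving any file.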

On Wed, Dec 14, 2016 at 7:22 PM, James McMahon wrote:

> Yes indeed, Joe. The logs show non-ASCII Unicode characters at the start
> and end of the file names. The log renders them as odd "unprintable"
> glyphs, for example small inverted question marks in diamonds. They are
> embedded in the file names by the application that created the files. When I
> copied one name and tried to save it in a text file, Notepad told me to
> switch to another encoding; saving worked once I switched to Unicode
> encoding.
>
> I can't send the logs from my system, so I can only describe them this way.
> Would you expect such characters to cause problems for GetFile? What
> alternatives do I have to work around this problem? Thank you once again.


Re: Handling incoming file names that contain embedded spaces

2016-12-15 Thread James McMahon
Is the NiFi GetFile processor restricted to a range of characters, or is it
able to handle Unicode character encoding? If it is able to map Unicode
characters then there must be another problem causing my GetFile processor
to throw these "unmappable characters" errors. Thank you for any thoughts.
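
In general, an "unmappable character" error means that a character has no representation in the charset used for a conversion. Whether that is what happens inside GetFile is not established here. The sketch below only illustrates the mechanism in Python, using the notional name from earlier in the thread; `can_encode` is a hypothetical helper.

```python
def can_encode(name, charset):
    """Return True if every character of `name` has a mapping in `charset`."""
    try:
        name.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


example = "\u0932\u0932\u0932+123 ABC\u07c1"
for charset in ("ascii", "latin-1", "cp1252", "utf-8"):
    print(charset, can_encode(example, charset))
```

Only UTF-8 can represent this name among the four charsets tried, which is consistent with the idea that the encoding in effect, rather than the processor, decides which file names are usable.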

On Thu, Dec 15, 2016 at 6:07 AM, James McMahon wrote:

> As a representative example, using an arbitrary Unicode character at the
> front and the back of a notional file name:
>
> [U+0932][U+0932][U+0932]+123 ABC[U+07C1]


Re: Handling incoming file names that contain embedded spaces

2016-12-15 Thread Joe Witt
James,

I think this would need to be tested and evaluated.  I'm not sure
offhand, and it also depends on the operating system involved.
Any details you can provide about your own findings and environment
will be helpful.

Thanks
Joe
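
When gathering environment details, the locale and file-name encoding settings are a sensible place to start, since they differ between operating systems. A minimal sketch follows; `environment_report` is a hypothetical helper, and it reports what Python sees on the host, which may differ from what the JVM running NiFi sees.

```python
import locale
import os
import platform
import sys


def environment_report():
    """Collect settings that commonly influence how file names are encoded."""
    return {
        "os": platform.system(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),      # None when the variable is unset
        "LC_ALL": os.environ.get("LC_ALL"),  # None when the variable is unset
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this as the same user that starts NiFi is an assumption worth stating in any report, because locale variables are often set per user or per service.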

On Thu, Dec 15, 2016 at 4:38 PM, James McMahon wrote:
> Is the NiFi GetFile processor restricted to a range of characters, or is it
> able to handle Unicode character encoding? If it is able to map Unicode
> characters then there must be another problem causing my GetFile processor
> to throw these "unmappable characters" errors. Thank you for any thoughts.