Thanks Arun and Peter.  Getting that resolved will be nice.  The
performance improvement from the record reader/writer approach in all
this is pretty fantastic, so the more we can do to iron out these sorts
of edges the better.  Thanks!

On Sun, Sep 24, 2017 at 8:56 PM, Peter Wicks (pwicks) <pwi...@micron.com> wrote:
> Arun,
>
> I'm also using Ctrl+A as a delimiter and had the same problem.  I haven't had 
> time to write up a PR but it looked like a pretty easy fix to me too.
>
> I can't merge the change if you submit it, but I'd be happy to review it.
>
> --Peter
>
> -----Original Message-----
> From: Arun Manivannan [mailto:a...@arunma.com]
> Sent: Sunday, September 24, 2017 11:17 PM
> To: Dev@nifi.apache.org
> Subject: [EXT] ConvertCSVToAvro vs CSVReader - Value Delimiter
>
> Hi,
>
> The ConvertCSVToAvro processor has been having performance issues while 
> processing files larger than a GB, and it was suggested that I use 
> ConvertRecord, which leverages the RecordReader and Writer. I ran some tests 
> and it does perform well.
>
> Strangely, the CSVReader doesn't accept a Unicode escape as the value 
> delimiter - the Control-A (\u0001) character is the delimiter in my CSV.
>
> Did some analysis, and I see that a minor change needs to be made to 
> CSVUtils to unescape the delimiter, as ConvertCSVToAvro does, along with a 
> modification to SingleCharacterValidator.
>
> Please let me know if you believe this isn't an issue or if there's a 
> workaround. Otherwise, I'd be more than happy to raise an issue and submit 
> a PR for review.
>
> Best Regards,
> Arun
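
The fix discussed above is about unescaping a user-supplied delimiter string such as "\u0001" into the single Ctrl-A character before handing it to the CSV parser. A minimal, self-contained sketch of that kind of unescaping is below; this is illustrative only and is not NiFi's actual CSVUtils or ConvertCSVToAvro code, and the class and method names here are made up for the example:

```java
// Illustrative sketch (not NiFi code) of unescaping a delimiter property
// value: Java-style \uXXXX escapes and common backslash escapes are
// converted into their literal characters; other text passes through.
public class DelimiterUnescape {

    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                // \uXXXX: parse the four hex digits into a single char
                if (next == 'u' && i + 5 < s.length()) {
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 5;
                    continue;
                }
                switch (next) {
                    case 't': out.append('\t'); i++; continue;
                    case 'n': out.append('\n'); i++; continue;
                    case 'r': out.append('\r'); i++; continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Ctrl-A case from this thread: the six-character string
        // "\u0001" becomes a single control character.
        String delimiter = unescape("\\u0001");
        System.out.println(delimiter.length());        // 1
        System.out.println((int) delimiter.charAt(0)); // 1
    }
}
```

With something like this applied to the property value (and SingleCharacterValidator validating the unescaped result rather than the raw string), a literal "\u0001" in the CSVReader configuration would resolve to a one-character delimiter.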
