The reader is already supposed to support a two-byte delimiter.
Apparently, there is a bug somewhere in the state management.

The code uses a byte-based approach (as opposed to a character-based approach).

I'm guessing there is an issue in one or both of these blocks: [1][2].

I think the code just needs to be debugged to figure out why the separator
is not being recognized. It is also possible that there is an additional
problem in [3] where we prematurely detect a new line when there isn't one.
See the "newLine" uses.
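To make the suspected state problem concrete, here is a minimal sketch of byte-based matching for a two-byte delimiter that keeps its match state across buffer refills. It is not Drill's TextInput/TextReader code. The class name, its methods, and the use of "\r\n" as the example delimiter are all hypothetical. It shows the two cases worth checking while debugging: a delimiter split across two buffers must still be recognized, and a lone first byte must be treated as data rather than as a premature line end.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Drill's implementation: splits a byte stream into
// lines on a two-byte delimiter, carrying match state between chunks.
class TwoByteDelimiterSplitter {
  private final byte first;
  private final byte second;
  // True when the previous byte (possibly the last byte of the previous
  // chunk) was the first delimiter byte and is still awaiting the second.
  private boolean pendingFirst = false;
  private final ByteArrayOutputStream current = new ByteArrayOutputStream();
  private final List<String> lines = new ArrayList<>();

  TwoByteDelimiterSplitter(byte first, byte second) {
    this.first = first;
    this.second = second;
  }

  // Feeds one chunk; a chunk boundary may fall between the two delimiter bytes.
  void feed(byte[] chunk) {
    for (byte b : chunk) {
      if (pendingFirst) {
        pendingFirst = false;
        if (b == second) {
          emitLine(); // full delimiter seen: end of line
          continue;
        }
        // Lone first byte: it is data, not a line end.
        current.write(first);
      }
      if (b == first) {
        pendingFirst = true; // defer the decision until the next byte
      } else {
        current.write(b);
      }
    }
  }

  // Flushes any trailing data at end of input and returns all lines.
  List<String> finish() {
    if (pendingFirst) {
      current.write(first);
      pendingFirst = false;
    }
    if (current.size() > 0) {
      emitLine();
    }
    return lines;
  }

  private void emitLine() {
    lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
    current.reset();
  }

  public static void main(String[] args) {
    TwoByteDelimiterSplitter s = new TwoByteDelimiterSplitter((byte) '\r', (byte) '\n');
    // The "\r\n" after "a,b" is split across the two chunks;
    // the lone '\r' inside "c\rd" must stay part of the data.
    s.feed("a,b\r".getBytes(StandardCharsets.UTF_8));
    s.feed("\nc\rd\r\ne".getBytes(StandardCharsets.UTF_8));
    // prints [a,b, c\rd, e]
    System.out.println(s.finish().toString().replace("\r", "\\r"));
  }
}
```

If the real reader resets its delimiter-matching state when it refills its buffer, or decides "new line" as soon as it sees the first byte, it would show exactly the two failure modes described above.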

Hopefully that can give you some pointers on where to look.

thanks,
Jacques

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextInput.java#L274
[2]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextInput.java#L334
[3]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Oct 29, 2015 at 7:15 AM, Edmon Begoli <[email protected]> wrote:

> I can do it, but I would need a little guidance on where in the Drill code
> base to apply the fix.
>
> Ideally, someone would tell me where to look in the reader that had it
> fixed in a different context, and then suggest where to apply it.
>
> Thank you,
> Edmon
>
> On Wed, Oct 28, 2015 at 10:53 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > Jim's fix wasn't lost. It was in the context of a very different reader.
> > That reader was deprecated because there were a number of other issues and
> > performance problems with it. Those items were addressed in this reader.
> >
> > In terms of someone looking at this soon, I agree that this would be great.
> > Can someone raise their hand?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Wed, Oct 28, 2015 at 6:10 PM, Edmon Begoli <[email protected]> wrote:
> >
> > > May I please escalate this issue for 1.3 or 1.4:
> > >
> > > https://issues.apache.org/jira/browse/DRILL-3149
> > >
> > > I understand that Jim's fix was lost.
> > >
> > > Can the fix be recovered and slipped into 1.3?
> > >
> > > It is forcing us to re-format a very large volume of files to check for
> > > and remove these line terminators.
> > >
> > > Thank you,
> > > Edmon
> > >
> >
>
