Luoc,

Thanks for your reply. Can you point me to documentation about how to
switch readers?



On Fri, May 21, 2021 at 7:08 AM luoc <l...@apache.org> wrote:

> Hi Ted,
>   Since 1.16 you can query CSV files with the new version of the CSV reader
> (bound to CompliantTextBatchReader); the usage is unchanged. However, this
> reader does not support your idea. I think we could add a little code to
> enhance the reader. All the new storage and format plugins are based on
> EVF, which is more powerful and stable.
>
> > On May 20, 2021, at 10:40 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > Luoc,
> >
> > How do I use the CompliantTextBatchReader?
> >
> > How is the speed?
> >
> > Can you point me at the old CSV reader? I am not sure where it is.
> >
> >
> >
> > On Thu, May 20, 2021 at 1:09 AM luoc <l...@apache.org> wrote:
> >
> >> Hello Ted,
> >> That's a nice idea. I did a quick review of the CSV reader but did not
> >> find any settings for handling such errors. Also, we have refactored the
> >> CSV format to use EVF; please see CompliantTextBatchReader.java
> >> (it complies with the RFC 4180 standard for text/csv files).
> >>
> >>> On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>>
> >>> I have a csv file that causes an exception when read by Drill. The file
> >>> is slightly mal-formed (but R can read it).
> >>>
> >>> Interestingly, if I don't parse the header line, I don't get the exception
> >>> and the problematic embedded quotes are handled well. Likewise, deleting
> >>> the first data line (which is well-formed) causes the exception to go away.
> >>> Deleting the second data line also causes the exception to stop. Fixing the
> >>> quoting of the included quotes also fixes the problem. Swapping the lines
> >>> works like deleting the first line. Repeating the first line after the
> >>> second line still gets the exception.
> >>>
> >>> The file is this:
> >>> -------------------------
> >>> desc,name
> >>> "foo","x"
> >>> "manure called "foo"","y"
> >>> -------------
> >>>
> >>>
> >>> The exception is shown below. My thought is that if the CSV file is
> >>> considered mal-formed, we should get an error on the line that says
> >>> something along the lines of "mal-formed input". Even better would be to
> >>> allow such lines to be omitted (up to some sanity limit) or to parse it
> >>> correctly (which happens without headers being parsed).
> >>>
> >>> Anybody have any thoughts?
> >>>
> >>> Here is the R behavior (it omits the embedded quotes):
> >>>
> >>>> f = read.csv("v.csv")
> >>>> f
> >>>                desc name
> >>> 1               foo    x
> >>> 2 manure called foo    y
> >>>
> >>>
> >>> And here is the exception:
> >>>
> >>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> >>> NegativeArraySizeException Please, refer to logs for more information.
> >>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
> >>> (java.lang.NegativeArraySizeException) null
> >>> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
> >>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
> >>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
> >>> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
> >>> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
> >>> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
> >>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> >>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> >>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> >>> java.security.AccessController.doPrivileged():-2
> >>> javax.security.auth.Subject.doAs():422
> >>> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> >>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> >>> org.apache.drill.common.SelfCleaningRunnable.run():38
> >>> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> >>> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> >>> java.lang.Thread.run():748
> >>
>
>
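
For anyone following the thread, here is a minimal sketch of the kind of strict RFC 4180 quoting check discussed above, applied to the three lines of the sample file. This is not Drill code: `first_quote_error` is a hypothetical helper written only for illustration, and it assumes a comma delimiter and records that do not span lines. It reports the offset of the first quote that is neither an escaped pair (`""`) nor a proper closing quote, which is roughly the information a "mal-formed input" error would need to carry.

```python
from typing import Optional


def first_quote_error(line: str, delimiter: str = ",") -> Optional[int]:
    """Return the 0-based offset of the first RFC 4180 quoting violation,
    or None if the line is well-formed. Hypothetical helper, not Drill's API."""
    i, n = 0, len(line)
    while i < n:
        if line[i] == '"':
            # Quoted field: scan for the closing quote.
            i += 1
            closed = False
            while i < n:
                if line[i] != '"':
                    i += 1
                elif i + 1 < n and line[i + 1] == '"':
                    i += 2  # escaped quote ("") inside the field
                elif i + 1 == n or line[i + 1] == delimiter:
                    i += 1  # closing quote, followed by delimiter or end of line
                    closed = True
                    break
                else:
                    return i  # bare quote inside a quoted field
            if not closed:
                return n  # quoted field never terminated
        else:
            # Unquoted field: quotes are not allowed anywhere in it.
            while i < n and line[i] != delimiter:
                if line[i] == '"':
                    return i
                i += 1
        i += 1  # step over the delimiter
    return None


sample = [
    'desc,name',
    '"foo","x"',
    '"manure called "foo"","y"',
]
for number, text in enumerate(sample, start=1):
    offset = first_quote_error(text)
    status = "ok" if offset is None else f"mal-formed input at offset {offset}"
    print(f"line {number}: {status}")
```

With the sample file, only the third line is flagged (at the quote before `foo`); doubling the embedded quotes, as in `"manure called ""foo""","y"`, makes it pass.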
