Nice. That shows how powerful Apache Drill is.

> On May 23, 2021, at 10:18 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> I was able to test using 1.18 and find that the problem is gone. I was
> unable to do a head to head test with 1.16, however, and couldn't figure
> out how to run 1.18 on the same machines as the current 1.16 environment
> without destabilizing that 1.16 environment (collision on the plugins
> directory). I didn't want to spend a lot of time so I will stick with the
> judgment that the current behavior seems to be correct.
>
> Notably, the nested quotes are handled correctly without any quoting.
>
> Nice.
>
> On Sat, May 22, 2021 at 6:45 PM luoc <l...@apache.org> wrote:
>
>> Hi Ted,
>> You can use this reader without switching if you are using the latest
>> version (preferably 1.19.0). There are unit tests related to the compliant
>> text reader (in the `drill-java-exec` module, in the
>> `org.apache.drill.exec.store.easy.text.compliant` package).
>>
>>> On May 23, 2021, at 5:19 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>> Also, where would I find the unit tests for the compliant text reader?
>>>
>>> I have a simple enough case to write a unit test, but I can't see any
>>> reference to the class in question outside of working code.
>>>
>>>
>>> On Thu, May 20, 2021 at 7:40 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>>>
>>>> Luoc,
>>>>
>>>> How do I use the CompliantTextBatchReader?
>>>>
>>>> How is the speed?
>>>>
>>>> Can you point me at the old CSV reader? I am not sure where it is.
>>>>
>>>>
>>>> On Thu, May 20, 2021 at 1:09 AM luoc <l...@apache.org> wrote:
>>>>
>>>>> Hello Ted,
>>>>> It's a nice idea. I have done a quick review of the CSV reader, but did
>>>>> not find any settings for handling such errors. Also, we have refactored
>>>>> the CSV format using the EVF; please see CompliantTextBatchReader.java
>>>>> (it complies with the RFC 4180 standard for text/csv files).
>>>>>
>>>>>> On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>
>>>>>> I have a CSV file that causes an exception when read by Drill. The
>>>>>> file is slightly malformed (but R can read it).
>>>>>>
>>>>>> Interestingly, if I don't parse the header line, I don't get the
>>>>>> exception and the problematic embedded quotes are handled well.
>>>>>> Likewise, deleting the first data line (which is well-formed) causes
>>>>>> the exception to go away. Deleting the second data line also causes
>>>>>> the exception to stop. Fixing the quoting of the included quotes also
>>>>>> fixes the problem. Swapping the lines works like deleting the first
>>>>>> line. Repeating the first line after the second line still gets the
>>>>>> exception.
>>>>>>
>>>>>> The file is this:
>>>>>> -------------------------
>>>>>> desc,name
>>>>>> "foo","x"
>>>>>> "manure called "foo"","y"
>>>>>> -------------------------
>>>>>>
>>>>>> The exception is shown below. My thought is that if the CSV file is
>>>>>> considered malformed, we should get an error on the line that says
>>>>>> something along the lines of "malformed input". Even better would be
>>>>>> to allow such lines to be omitted (up to some sanity limit) or to
>>>>>> parse it correctly (which happens without headers being parsed).
>>>>>>
>>>>>> Anybody have any thoughts?
>>>>>>
>>>>>> Here is the R behavior (it omits the embedded quotes):
>>>>>>
>>>>>> > f = read.csv("v.csv")
>>>>>> > f
>>>>>>                desc name
>>>>>> 1               foo    x
>>>>>> 2 manure called foo    y
>>>>>>
>>>>>> And here is the exception:
>>>>>>
>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>>>> NegativeArraySizeException Please, refer to logs for more information.
>>>>>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>>>>>>   (java.lang.NegativeArraySizeException) null
>>>>>>     org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>>>>>>     org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>>>>>>     org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>>>>>>     org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>>>>>>     org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>>>>>>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>>>>>>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>>>>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>>>>>>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>>>>>>     java.security.AccessController.doPrivileged():-2
>>>>>>     javax.security.auth.Subject.doAs():422
>>>>>>     org.apache.hadoop.security.UserGroupInformation.doAs():1669
>>>>>>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>>>>>>     org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>>>>>>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>>>>>>     java.lang.Thread.run():748
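
For anyone reading the archive: the two points from the original report (a malformed row should produce a clear "malformed input" error, and fixing the quoting makes the problem go away) can be illustrated outside Drill. The following is only a minimal sketch using Python's standard `csv` module in strict mode. It is not Drill code and says nothing about how the compliant text reader is implemented. The names `parse_strict`, `MALFORMED`, and `WELL_FORMED` are made up for this illustration.

```python
import csv
import io

# The file from the original report: the second data row embeds
# unescaped double quotes inside a quoted field.
MALFORMED = 'desc,name\n"foo","x"\n"manure called "foo"","y"\n'

# The same data with the embedded quotes doubled, which is how
# RFC 4180 escapes a double quote inside a quoted field.
WELL_FORMED = 'desc,name\n"foo","x"\n"manure called ""foo""","y"\n'


def parse_strict(text):
    """Parse CSV text strictly (hypothetical helper for this sketch).

    Returns (rows, error). `rows` holds every row read before any
    failure; `error` is None on success, otherwise a message naming
    the input line on which parsing stopped.
    """
    reader = csv.reader(io.StringIO(text), strict=True)
    rows = []
    try:
        for row in reader:
            rows.append(row)
    except csv.Error as exc:
        # reader.line_num is the number of source lines consumed so far.
        return rows, f"malformed input at line {reader.line_num}: {exc}"
    return rows, None


if __name__ == "__main__":
    for label, text in (("malformed", MALFORMED), ("well-formed", WELL_FORMED)):
        rows, error = parse_strict(text)
        print(label, rows, error)
```

In strict mode the parser rejects the malformed row instead of guessing, while the version with doubled quotes parses into a field that keeps the embedded quotes.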