Nice. That shows how powerful Apache Drill is.

> On May 23, 2021, at 10:18 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
> I was able to test using 1.18 and found that the problem is gone. I was
> unable to do a head-to-head test with 1.16, however, and couldn't figure
> out how to run 1.18 on the same machines as the current 1.16 environment
> without destabilizing that 1.16 environment (collision on the plugins
> directory). I didn't want to spend a lot of time so I will stick with the
> judgment that the current behavior seems to be correct.
> 
> Notably, the nested quotes are handled correctly without any quoting.
> 
> Nice.
> 
> On Sat, May 22, 2021 at 6:45 PM luoc <l...@apache.org> wrote:
> 
>> Hi Ted,
>>  You can use this reader without switching if you are on a recent
>> version (ideally 1.19.0). The unit tests for the compliant text reader
>> are in the `drill-java-exec` module, under the
>> `org.apache.drill.exec.store.easy.text.compliant` package.
>> 
>>> On May 23, 2021, at 5:19 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> 
>>> Also, where would I find the unit tests for the compliant text reader?
>>> 
>>> I have a simple enough case to write a unit test, but I can't see any
>>> reference to the class in question outside of working code.
>>> 
>>> 
>>> On Thu, May 20, 2021 at 7:40 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> 
>>>> 
>>>> Luoc,
>>>> 
>>>> How do I use the CompliantTextBatchReader?
>>>> 
>>>> How is the speed?
>>>> 
>>>> Can you point me at the old CSV reader? I am not sure where it is.
>>>> 
>>>> 
>>>> 
>>>> On Thu, May 20, 2021 at 1:09 AM luoc <l...@apache.org> wrote:
>>>> 
>>>>> Hello Ted,
>>>>> It's a nice idea. I did a quick review of the CSV reader but did not
>>>>> find any settings for handling such errors. Also, we have refactored the
>>>>> CSV format using the EVF; please see CompliantTextBatchReader.java
>>>>> (it complies with the RFC 4180 standard for text/csv files).
>>>>> 
>>>>>> On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>> 
>>>>>> I have a CSV file that causes an exception when read by Drill. The
>>>>>> file is slightly malformed (but R can read it).
>>>>>> 
>>>>>> Interestingly, if I don't parse the header line, I don't get the
>>>>>> exception and the problematic embedded quotes are handled well.
>>>>>> Likewise, deleting the first data line (which is well-formed) causes
>>>>>> the exception to go away. Deleting the second data line also causes the
>>>>>> exception to stop. Fixing the quoting of the included quotes also fixes
>>>>>> the problem. Swapping the lines works like deleting the first line.
>>>>>> Repeating the first line after the second line still gets the exception.
>>>>>> 
>>>>>> The file is this:
>>>>>> -------------------------
>>>>>> desc,name
>>>>>> "foo","x"
>>>>>> "manure called "foo"","y"
>>>>>> -------------
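
For comparison with the file above, here is a small sketch of what its RFC 4180 form looks like, where each embedded quote is escaped by doubling it. This is Python using only the standard `csv` module, not Drill code, and the `records` list is just the sample data retyped by hand.

```python
import csv
import io

# The sample data from the message, as plain Python values (retyped by hand).
records = [
    ["desc", "name"],
    ["foo", "x"],
    ['manure called "foo"', "y"],
]

# Write every field quoted; the csv module doubles embedded quotes by default.
buf = io.StringIO(newline="")
csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(records)
well_formed = buf.getvalue()

# Show the escaped file, one line at a time.
for line in well_formed.splitlines():
    print(line)

# Reading the escaped text back recovers the original values.
parsed = list(csv.reader(io.StringIO(well_formed, newline="")))
print(parsed == records)
```

The third line comes out as `"manure called ""foo""","y"`, which is the "fixed quoting" variant that the message reports as not triggering the exception.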
>>>>>> 
>>>>>> 
>>>>>> The exception is shown below. My thought is that if the CSV file is
>>>>>> considered malformed, we should get an error on the line that says
>>>>>> something along the lines of "mal-formed input". Even better would be to
>>>>>> allow such lines to be omitted (up to some sanity limit) or to parse them
>>>>>> correctly (which happens without headers being parsed).
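
The two suggestions in the paragraph above (a clear "mal-formed input" error that names the line, and skipping bad lines up to a sanity limit) could look roughly like the Python sketch below. It is only an illustration, not Drill code: `read_csv_lenient` and `max_bad` are hypothetical names, and Python's standard `csv` module in strict mode stands in for a strict parser. It parses one physical line at a time, so quoted fields that span several lines are not handled.

```python
import csv


def read_csv_lenient(lines, max_bad=10):
    """Parse CSV lines strictly, skipping bad ones up to max_bad.

    Returns (rows, bad_line_numbers). Raises ValueError naming the
    offending line once more than max_bad lines have been rejected.
    """
    rows, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            # strict=True makes the parser raise csv.Error on bad quoting
            # instead of silently guessing what was meant.
            rows.append(next(csv.reader([line], strict=True)))
        except csv.Error as err:
            bad.append(lineno)
            if len(bad) > max_bad:
                raise ValueError(
                    f"mal-formed input at line {lineno}: {err}"
                ) from err
    return rows, bad


# The sample file from the message, one string per line.
SAMPLE = [
    'desc,name',
    '"foo","x"',
    '"manure called "foo"","y"',
]

rows, bad = read_csv_lenient(SAMPLE)
print(rows)  # the header and the well-formed data line
print(bad)   # line numbers that were skipped
```

With `max_bad=0` the same call raises a `ValueError` that points at line 3, which corresponds to the "error on the line" behavior; with the default limit, the bad line is dropped and reported instead.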
>>>>>> 
>>>>>> Anybody have any thoughts?
>>>>>> 
>>>>>> Here is the R behavior (it omits the embedded quotes):
>>>>>> 
>>>>>> > f = read.csv("v.csv")
>>>>>> > f
>>>>>>                desc name
>>>>>> 1               foo    x
>>>>>> 2 manure called foo    y
>>>>>> 
>>>>>> 
>>>>>> And here is the exception:
>>>>>> 
>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>>>> NegativeArraySizeException Please, refer to logs for more information.
>>>>>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>>>>>> (java.lang.NegativeArraySizeException) null
>>>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>>>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>>>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>>>>>> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>>>>>> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>>>>>> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>>>>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>>>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>>>>>> java.security.AccessController.doPrivileged():-2
>>>>>> javax.security.auth.Subject.doAs():422
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs():1669
>>>>>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>>>>>> org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>>>>>> java.lang.Thread.run():748
>>>>> 
>>>> 
>> 
>> 
