[Pharo-users] Re: NeoCSVReader and wrong number of fieldAccessors

jtuc...@objektfabrik.de Fri, 22 Jan 2021 03:16:11 -0800

Tim,




On 22.01.21 at 10:22, Tim Mackinnon wrote:

I’m not doing any CSV processing at the moment, but have in the past, so was interested in this thread.
@Kasper, can’t you just use #readHeader upfront, do the assertion yourself, and then proceed to loop through your records? It would seem that Neo caters for what you are suggesting, and if you want to add a helper method extension you already have the building blocks to do this?

This is a good idea. One caveat, however: #readHeader in its current implementation does 2 things:


 * read the line respecting each field (thereby, respect line breaks
   within quoted fields - perfect for this purpose)
 * update the number of Columns for further reading (assuming
   #readHeader's purpose is to interpret the header line)

This second thing is in our way, because it may influence the way the following lines will be interpreted. That is exactly why I created an issue on GitHub (https://github.com/svenvc/NeoCSV/issues/20). A method that reads a line without any side effects (other than pushing the position pointer forward to the next line) would come in handy for such scenarios. But you can always argue that this has nothing to do with CSV, because in CSV all lines have the same number of columns, each of them containing the same kind of information, and there may be exactly one header line. Anything else is just some file that may contain CSV-y stuff in it. So I am really not sure if NeoCSV should build lots of stuff for such files. I'd love to have this, but I'd understand if Sven refused to integrate it.... ;-)
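
For what it's worth, Tim's suggestion could be sketched roughly as below. This is only a sketch: the sample data and expectedColumns are made up for illustration, and it assumes the header is the first line of the stream. It also simply accepts the side effect described above, i.e. #readHeader sets the column count used for the following lines.

```smalltalk
| input reader header expectedColumns records |
"Hypothetical sample data and expected column count, for illustration only"
input := 'id,name,amount', String lf, '1,foo,100', String lf, '2,bar,200'.
expectedColumns := 3.
reader := NeoCSVReader on: input readStream.
"Reads the first line field by field; also sets the reader's column count"
header := reader readHeader.
header size = expectedColumns
	ifFalse: [ Error signal: 'Expected ', expectedColumns printString,
		' columns, found ', header size printString ].
"Only reached when the header has the expected number of columns"
records := reader upToEnd.
```

If field accessors (for example via #addIntegerField) are configured as well, they would be set up on the reader before reading the records; the check on header size stays the same.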

The only flaw I can think of is if there is no header present; then I can’t recall what Neo does. Ideally it throws an exception so you can decide what to do: potentially continue if the number of columns is what you expect and the data matches the columns, or fail with an error that a header is required. But I think you would always need to do some basic initial checks when processing CSV due to the nature of the format?

Right. You'd always have to write some specific logic for this particular file format and make NeoCSV ignore the right stuff...



Joachim

Tim

On Fri, 22 Jan 2021, at 6:42 AM, Kasper Osterbye wrote:
As it happened, I ran into the exact same scenario as Joachim just the other day, that is, the external provider of my CSV had added some new columns. In my case it manifested itself in an error that an integer field was not an integer (because new columns were added in the middle).
Reading through this whole thread leaves me with the feeling that no matter what Sven adds, there is still a risk of error. Nevertheless, my suggestion would be to add this functionality to #skipHeaders, or to make a sister method:
#assertAndSkipHeaders: numberOfColumns onFailDo: aBlock
where aBlock is given the actual number of headers.
That would give me a way to handle the error up front.
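
A minimal sketch of what such a sister method might look like, written as an extension method on NeoCSVReader. It is hypothetical and not part of NeoCSV; it is built on the existing #readHeader, so it inherits that method's behaviour of setting the reader's column count.

```smalltalk
"Hypothetical extension method; not part of NeoCSV itself"
NeoCSVReader >> assertAndSkipHeaders: numberOfColumns onFailDo: aBlock
	"Read the header line. If the number of header fields differs from
	numberOfColumns, evaluate aBlock with the actual number and answer its
	result; otherwise answer the header fields."
	| header |
	header := self readHeader.
	header size = numberOfColumns
		ifFalse: [ ^ aBlock value: header size ].
	^ header
```

A caller would pass a one-argument block, for example one that signals an error mentioning the actual column count, and would only go on to read records if the block was not evaluated.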

This will only be interesting if your data has headers, of course.

Thanks for NeoCSV which I use all the time!

Best,

Kasper


--
-----------------------------------------------------------------------
Objektfabrik Joachim Tuchel          mailto:jtuc...@objektfabrik.de
Fliederweg 1                         http://www.objektfabrik.de
D-71640 Ludwigsburg                  http://joachimtuchel.wordpress.com
Telefon: +49 7141 56 10 86 0         Fax: +49 7141 56 10 86 1
