Tim,
On 22.01.21 at 10:22, Tim Mackinnon wrote:
I’m not doing any CSV processing at the moment, but have in the past -
so was interested in this thread.
@Kasper, can’t you just use #readHeader upfront, do the assertion
yourself, and then proceed to loop through your records? It would seem
that Neo caters for what you are suggesting - and if you want to add a
helper method extension, you already have the building blocks to do
this.
This is a good idea. One caveat, however: #readHeader in its current
implementation does 2 things:
* read the line field by field (thereby respecting line breaks
within quoted fields - perfect for this purpose)
* update the number of columns for further reading (assuming
#readHeader's purpose is to interpret the header line)
This second thing is in our way, because it may influence the way the
following lines will be interpreted. That is exactly why I created an
issue on GitHub (https://github.com/svenvc/NeoCSV/issues/20).
A method that reads a line without any side effects (other than pushing
the position pointer forward to the next line) would come in handy for
such scenarios. But you can always argue that this has nothing to do
with CSV, because in CSV all lines have the same number of columns, each
of them containing the same kind of information, and there may be
exactly one header line. Anything else is just some file that may
contain CSV-y stuff in it. So I am really not sure if NeoCSV should
build lots of stuff for such files. I'd love to have this, but I'd
understand if Sven refused to integrate it.... ;-)
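For the simpler case where the first line really is the header, Tim's
suggestion might look roughly like the sketch below. The sample data and
the expected column names are made up for illustration; only #on:,
#readHeader and #upToEnd are NeoCSV's own. Note that #readHeader also
sets the field count here, which is harmless only because the header
line has the same number of columns as the data.

```smalltalk
"A minimal sketch; the data and the expected header are assumptions."
| input reader header records |
input := 'id,name,amount' , String cr , '1,foo,10' , String cr , '2,bar,20'.
reader := NeoCSVReader on: input readStream.

"Read the header line up front; this also sets the reader's field count."
header := reader readHeader.

"Do the assertion yourself before touching any data records."
header = #('id' 'name' 'amount')
	ifFalse: [ Error signal: 'Unexpected CSV header: ' , header printString ].

"Only then proceed to read the remaining records."
records := reader upToEnd.
```

For a file with extra non-CSV lines before the real header, this would
not be enough, for the reason given above.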
The only flaw I can think of is the case where no header is present: I
can’t recall what Neo does then - ideally it throws an exception so you
can decide what to do - potentially continue if the number of columns is
what you expect and the data matches the columns - or you fail with an
error that a header is required. But I think you would always need to
do some basic initial checks when processing CSV due to the nature of
the format?
Right. You'd always have to write some specific logic for this
particular file format and make NeoCSV ignore the right stuff...
Joachim
Tim
On Fri, 22 Jan 2021, at 6:42 AM, Kasper Osterbye wrote:
As it happened, I ran into the exact same scenario as Joachim just the
other day: the external provider of my CSV had added some new columns.
In my case it manifested itself as an error that an integer field was
not an integer (because the new columns were added in the middle).
Reading through this whole thread leaves me with the feeling that no
matter what Sven adds, there is still a risk of error. Nevertheless, my
suggestion would be to add this functionality to #skipHeader, or to
make a sister method:
#assertAndSkipHeaders: numberOfColumns onFailDo: aBlock
where aBlock is given the actual number of headers.
That would give me a way to handle the error up front.
This will only be interesting if your data has headers, of course.
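Such a method does not exist in NeoCSV; as a rough sketch, and assuming
it is added as an extension method on NeoCSVReader built on top of the
existing #readHeader, it could look something like this:

```smalltalk
NeoCSVReader >> assertAndSkipHeaders: numberOfColumns onFailDo: aBlock
	"Hypothetical extension, not part of NeoCSV.
	Read the header line and compare its column count with the expected one.
	On a mismatch, evaluate aBlock with the actual number of headers."

	| header |
	header := self readHeader.
	header size = numberOfColumns
		ifFalse: [ ^ aBlock value: header size ].
	^ header
```

A caller could then pass a block such as
[ :actual | Error signal: 'Expected 3 columns, got ' , actual printString ]
to fail early, before any record is converted.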
Thanks for NeoCSV which I use all the time!
Best,
Kasper
--
-----------------------------------------------------------------------
Objektfabrik Joachim Tuchel mailto:jtuc...@objektfabrik.de
Fliederweg 1 http://www.objektfabrik.de
D-71640 Ludwigsburg http://joachimtuchel.wordpress.com
Telefon: +49 7141 56 10 86 0 Fax: +49 7141 56 10 86 1