On 2020-10-17 01:38, Ryan Sleevi wrote:
On Fri, Oct 16, 2020 at 5:27 PM Jakob Bohm via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

RFC4180 section 3 explicitly warns that there are other variants and
specifications of the CSV format, and thus the full generalizations in
RFC4180 should not be exploited to their extremes.


You're referring to this section, correct?

"""
    Interoperability considerations:
       Due to lack of a single specification, there are considerable
       differences among implementations.  Implementors should "be
       conservative in what you do, be liberal in what you accept from
       others" (RFC 793 [8]) when processing CSV files.  An attempt at a
       common definition can be found in Section 2.

       Implementations deciding not to use the optional "header"
       parameter must make their own decision as to whether the header is
       absent or present.
"""


Splitting the input at newlines before parsing for quotes and commas is
a pretty common implementation strategy as illustrated by my examples of
common tools that actually do so.
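To make the disagreement concrete, here is a minimal Python sketch of the two strategies under discussion. The sample data and the function names `naive_rows` and `rfc4180_rows` are hypothetical and used only for illustration; the second function relies on Python's standard `csv` module, which tracks quoting while it reads.

```python
import csv
import io

# Hypothetical sample: the last field of the data record contains a line feed.
SAMPLE = 'name,notes\r\n"Example CA","line one\nline two"\r\n'


def naive_rows(text):
    """Split at newlines first, then at commas, ignoring quotes entirely."""
    return [line.split(",") for line in text.splitlines() if line]


def rfc4180_rows(text):
    """Let the csv module track quotes, so line breaks inside fields survive."""
    return list(csv.reader(io.StringIO(text, newline="")))


naive = naive_rows(SAMPLE)      # the quoted field is cut in two
proper = rfc4180_rows(SAMPLE)   # header row plus one data record
```

With this input the line-splitting approach yields three rows, while the quote-aware reader yields two, the second of which holds the embedded line feed intact.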


This would appear to be at fundamental odds with "be liberal in what you
accept from others" and, more specifically, to ignore the remark that
Section 2 is an admirable effort at a "common" definition, which is so
called because it minimizes such interoperability differences.

As your original statement was that the file produced was "not CSV", I
believe that's been thoroughly dispelled by highlighting that, indeed, it
does conform to the grammar set forth in RFC 4180, and is consistent with
the IANA MIME registration for CSV.

Although you also raised concern about naive and ill-informed attempts at
CSV parsing, which of course fail to parse the grammar of RFC 4180, there
are thankfully alternatives for each of those concerns. With awk, you have
FPAT. With Perl, you have Text::CSV. Of course, as the cut command is too
primitive to handle a proper grammar, there are plenty of equally
reasonable alternatives to it, all of which take far less time than the
concerns and misstatements raised on this thread, which I believe we can,
thankfully, end.


Please stop trolling. The section you quoted clearly states that section 2
of the RFC was only an *attempt* at a common definition.

Ideally, a CSV parser should be liberal in what it accepts, but a CSV
producer (which is the subject of this thread) should be conservative in
what it provides, thus avoiding the least implemented/tested aspects
of the generalized grammar in that section 2.

Putting line feeds inside CSV fields is like using bang paths in an
RFC822 To: field.  Theoretically permitted and implemented by the most
complete e-mail parsing libraries in the world, but not something that
one can expect to work for a global audience.
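As an illustration of the "conservative producer" position, the following Python sketch writes CSV while keeping line breaks out of field values. The function name `conservative_csv` and the choice of a single space as the replacement are assumptions made for this example, not anything proposed in the thread, and the substitution is lossy, since the original line structure of a field cannot be recovered afterwards.

```python
import csv
import io


def conservative_csv(rows, replacement=" "):
    """Write rows as CSV after replacing line breaks inside each field.

    The replacement string is an assumption of this sketch; a producer
    could equally choose another separator or an escape convention.
    """
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    for row in rows:
        # str.splitlines() drops the line breaks; join the pieces back
        # together so every record occupies exactly one physical line.
        writer.writerow(
            [replacement.join(str(field).splitlines()) for field in row]
        )
    return out.getvalue()


text = conservative_csv([["Example CA", "line one\nline two"]])
```

Output produced this way can be read by both quote-aware parsers and tools that split at newlines first, at the cost of flattening multi-line values.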



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
