Another option in Python is the standard csv module: read the data row by
row and check the data quality at each step.
The csv module handles different delimiters, quoted fields (including
embedded commas), and rows with a varying number of fields.

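#!/usr/bin/env python3
import csv

# newline='' lets the csv module handle line endings itself
with open('file.csv', 'r', newline='') as infile:
    # reader yields each row as a list of strings
    rows = csv.reader(infile, delimiter=',')
    for row in rows:
        # print the field count so short or long rows stand out
        print(len(row))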
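
To go one step past printing the lengths, here is a rough sketch of an actual
check. It assumes the first row is a header and that every later row should
have the same number of fields. The function name find_bad_rows and the sample
data are made up for illustration; they are not from your files.

```python
import csv
import io


def find_bad_rows(lines, delimiter=","):
    """Return (line_number, field_count) for rows that do not match the header."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        # empty input: nothing to check
        return []
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the physical line on which this row ended
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical sample: the second line has a quoted field with an embedded
# comma (still three fields), the third line is missing a field.
sample = io.StringIO(
    'id,name,city\n'
    '1,"Smith, John",Portland\n'
    '2,Jones\n'
)
print(find_bad_rows(sample))  # [(3, 2)]
```

For a real file, pass the object returned by open('file.csv', newline='')
instead of the StringIO sample.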



On Sun, May 21, 2023 at 5:42 AM Rich Shepard <rshep...@appl-ecosys.com>
wrote:

> On Sat, 20 May 2023, American Citizen wrote:
>
> > 1. using awk -F, fails when a cell contains a quoted cell with an
> > embedded comma
>
> I download .csv files from agency databases where strings are double quoted
> and contain commas within them, as well as using commas to separate
> fields.
>
> I start my gawk script with
> BEGIN { FS="," }
> and it separates (or counts, or selects) fields ignoring commas within
> quoted strings.
>
> Rich
>
>
