On Friday, March 22, 2024 3:20:08 P.M. AEDT Walter Dnes wrote:
> On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote
> 
> > Why not make the alteration one step before -- in the CSV?  There are
> > CSV abstraction tools like `q`, which gives you a SQL-like interface
> > to a csv file. Or you could write a quick transformer in python,
> > if you know the language a bit.
> 
>   I wrote a quick transformer in bash, because I know the language a
> bit <G>.  There are six "hospital regions" in Ontario.  The CSV file
> does all data for one "hospital region" April 1, 2020 to present,
> followed by same for next hospital region, etc, etc, for all 6 regions.
> My first step is to run dos2unix and sort by date, ending up with 6
> consecutive lines per day, sorted by date.  Then a "while" loop reads
> through the input CSV file.  Accumulators are added up for all 6 lines
> and a line is written out for each date.  Because I'm doing bash
> arithmetic, the data *MUST* be valid numbers, not ".".  So I *MUST* set
> missing data to somthing like zero.  Adding "." to a number causes bash
> to error out with a CSV file like so...
> 
> "2023-09-05","CENTRAL",5,2,152,6,3,1,1
> "2023-09-06","CENTRAL",5,2,136,6,3,1,1
> "2023-09-07","CENTRAL",7,1,158,8,2,1,1
> "2023-09-08","CENTRAL",7,3,154,6,3,-1,0
> "2023-09-09","CENTRAL",".",".",169,".",".",".","."
> "2023-09-10","CENTRAL",".",".",169,".",".",".","."
> "2023-09-11","CENTRAL",".",".",155,".",".",".","."
> "2023-09-12","CENTRAL",".",".",147,".",".",".","."
> 
>   The magic incantation for CSV files is to set IFS to a comma like so...
> 
> oldifs="${IFS}"
> IFS=","
> 
>   This allows parsing CSV files like so...
> 
> while read
> do
>    dataarray=(${REPLY})
> 
> ...and, kaboom, you have a fully populated array from reading one line
> of a CSV file.  The following "transformer" is my "parsehospicu" script
> that summarizes the data to "hospsum.csv".  Note: I've deleted the
> leading "shebang slash bin slash bash" line because the Gentoo mailing
> list software doesn't seem to like "executable emails".
> 
>   I suppose I could have two different versions of the...
> 
> echo "${prevdate}...
> 
> ...line inside of an if/then/else/fi construct.  It would depend on the
> date being inside the "missing data range".
> 
> =======================================================================
> 
> # Strip out missing "." that screw up the script
> sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv >
> region_hospital_icu_covid_datax.csv dos2unix -n
> region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv #
> # tail skips headers at beginning of file
> # sed deletes Row_ID, and strips out quotes
> # Output goes to file /dev/shm/temp0.txt
> tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort >
> /dev/shm/temp0.txt #
> ## Set up IFS for easier parsing
> oldifs="${IFS}"
> IFS=","
> #
> # Initialize previous line's date to enter loop smoothly
> # expando to read first line
> dataline=$( head -1 /dev/shm/temp0.txt )
> dataarray=(${dataline})
> prevdate="${dataarray[0]}"
> #
> # Zero out accumulators to enter loop smoothly
> accum2=0
> accum3=0
> accum4=0
> accum5=0
> accum6=0
> accum7=0
> accum8=0
> #
> # Remove previous hospsum.csv and open a new one for writing
> rm -rf hospsum.csv
> exec 3>hospsum.csv
> #
> # Write header line to output file
> echo
> "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_
> total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3 #
> # Main loop
> # Read the data from one line in /dev/shm/temp0.txt
> while read
> do
>    dataarray=(${REPLY})
>    if [ "${dataarray[0]}" = "${prevdate}" ]; then
> #
> # If this line's date is same as previous line's date, add amounts to
> accumulators. accum2=$(( ${accum2} + ${dataarray[2]} ))
>       accum3=$(( ${accum3} + ${dataarray[3]} ))
>       accum4=$(( ${accum4} + ${dataarray[4]} ))
>       accum5=$(( ${accum5} + ${dataarray[5]} ))
>       accum6=$(( ${accum6} + ${dataarray[6]} ))
>       accum7=$(( ${accum7} + ${dataarray[7]} ))
>       accum8=$(( ${accum8} + ${dataarray[8]} ))
>    else
> #
> # If this line's date has changed, output to hospsum.csv, update prevdate,
> # and update accumulators.  ***IMPORTANT*** "echo" TO hospsum.csv MUST BE
> # EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate***
>       echo
> "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${
> accum8}" >&3 prevdate="${dataarray[0]}"
>       accum2=${dataarray[2]}
>       accum3=${dataarray[3]}
>       accum4=${dataarray[4]}
>       accum5=${dataarray[5]}
>       accum6=${dataarray[6]}
>       accum7=${dataarray[7]}
>       accum8=${dataarray[8]}
>    fi
> done</dev/shm/temp0.txt
> #
> # Close file hospsum.csv
> exec 3<&-
> IFS="${oldifs}"


Bash can do patern substitution in variable references.

Replace  accum3=$(( ${accum3} + ${dataarray[3]} ))
with     accum3=$(( ${accum3} + ${dataarray[3]/'.'/0} ))

and similarly with the other lines and any array value of '.' will be replaced 
with a '0'


-- 
Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
     http://catb.org/~esr/faqs/smart-questions.html#intro




Reply via email to