On Friday, March 22, 2024 3:20:08 P.M. AEDT Walter Dnes wrote: > On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote > > > Why not make the alteration one step before -- in the CSV? There are > > CSV abstraction tools like `q`, which gives you a SQL-like interface > > to a csv file. Or you could write a quick transformer in python, > > if you know the language a bit. > > I wrote a quick transformer in bash, because I know the language a > bit <G>. There are six "hospital regions" in Ontario. The CSV file > does all data for one "hospital region" April 1, 2020 to present, > followed by same for next hospital region, etc, etc, for all 6 regions. > My first step is to run dos2unix and sort by date, ending up with 6 > consecutive lines per day, sorted by date. Then a "while" loop reads > through the input CSV file. Accumulators are added up for all 6 lines > and a line is written out for each date. Because I'm doing bash > arithmetic, the data *MUST* be valid numbers, not ".". So I *MUST* set > missing data to somthing like zero. Adding "." to a number causes bash > to error out with a CSV file like so... > > "2023-09-05","CENTRAL",5,2,152,6,3,1,1 > "2023-09-06","CENTRAL",5,2,136,6,3,1,1 > "2023-09-07","CENTRAL",7,1,158,8,2,1,1 > "2023-09-08","CENTRAL",7,3,154,6,3,-1,0 > "2023-09-09","CENTRAL",".",".",169,".",".",".","." > "2023-09-10","CENTRAL",".",".",169,".",".",".","." > "2023-09-11","CENTRAL",".",".",155,".",".",".","." > "2023-09-12","CENTRAL",".",".",147,".",".",".","." > > The magic incantation for CSV files is to set IFS to a comma like so... > > oldifs="${IFS}" > IFS="," > > This allows parsing CSV files like so... > > while read > do > dataarray=(${REPLY}) > > ...and, kaboom, you have a fully populated array from reading one line > of a CSV file. The following "transformer" is my "parsehospicu" script > that summarizes the data to "hospsum.csv". Note: I've deleted the > leading "shebang slash bin slash bash" line because the Gentoo mailing > list software doesn't seem to like "executable emails". > > I suppose I could have two different versions of the... > > echo "${prevdate}... > > ...line inside of an if/then/else/fi construct. It would depend on the > date being inside the "missing data range". > > ======================================================================= > > # Strip out missing "." that screw up the script > sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > > region_hospital_icu_covid_datax.csv dos2unix -n > region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv # > # tail skips headers at beginning of file > # sed deletes Row_ID, and strips out quotes > # Output goes to file /dev/shm/temp0.txt > tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > > /dev/shm/temp0.txt # > ## Set up IFS for easier parsing > oldifs="${IFS}" > IFS="," > # > # Initialize previous line's date to enter loop smoothly > # expando to read first line > dataline=$( head -1 /dev/shm/temp0.txt ) > dataarray=(${dataline}) > prevdate="${dataarray[0]}" > # > # Zero out accumulators to enter loop smoothly > accum2=0 > accum3=0 > accum4=0 > accum5=0 > accum6=0 > accum7=0 > accum8=0 > # > # Remove previous hospsum.csv and open a new one for writing > rm -rf hospsum.csv > exec 3>hospsum.csv > # > # Write header line to output file > echo > "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_ > total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3 # > # Main loop > # Read the data from one line in /dev/shm/temp0.txt > while read > do > dataarray=(${REPLY}) > if [ "${dataarray[0]}" = "${prevdate}" ]; then > # > # If this line's date is same as previous line's date, add amounts to > accumulators. accum2=$(( ${accum2} + ${dataarray[2]} )) > accum3=$(( ${accum3} + ${dataarray[3]} )) > accum4=$(( ${accum4} + ${dataarray[4]} )) > accum5=$(( ${accum5} + ${dataarray[5]} )) > accum6=$(( ${accum6} + ${dataarray[6]} )) > accum7=$(( ${accum7} + ${dataarray[7]} )) > accum8=$(( ${accum8} + ${dataarray[8]} )) > else > # > # If this line's date has changed, output to hospsum.csv, update prevdate, > # and update accumulators. ***IMPORTANT*** "echo" TO hospsum.csv MUST BE > # EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate*** > echo > "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${ > accum8}" >&3 prevdate="${dataarray[0]}" > accum2=${dataarray[2]} > accum3=${dataarray[3]} > accum4=${dataarray[4]} > accum5=${dataarray[5]} > accum6=${dataarray[6]} > accum7=${dataarray[7]} > accum8=${dataarray[8]} > fi > done</dev/shm/temp0.txt > # > # Close file hospsum.csv > exec 3<&- > IFS="${oldifs}"
Bash can do patern substitution in variable references. Replace accum3=$(( ${accum3} + ${dataarray[3]} )) with accum3=$(( ${accum3} + ${dataarray[3]/'.'/0} )) and similarly with the other lines and any array value of '.' will be replaced with a '0' -- Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/ Asking for technical help in newsgroups? Read this first: http://catb.org/~esr/faqs/smart-questions.html#intro