On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote > Why not make the alteration one step before -- in the CSV? There are > CSV abstraction tools like `q`, which gives you a SQL-like interface > to a csv file. Or you could write a quick transformer in python, > if you know the language a bit.
I wrote a quick transformer in bash, because I know the language a bit <G>. There are six "hospital regions" in Ontario. The CSV file does all data for one "hospital region" April 1, 2020 to present, followed by same for next hospital region, etc, etc, for all 6 regions. My first step is to run dos2unix and sort by date, ending up with 6 consecutive lines per day, sorted by date. Then a "while" loop reads through the input CSV file. Accumulators are added up for all 6 lines and a line is written out for each date. Because I'm doing bash arithmetic, the data *MUST* be valid numbers, not ".". So I *MUST* set missing data to somthing like zero. Adding "." to a number causes bash to error out with a CSV file like so... "2023-09-05","CENTRAL",5,2,152,6,3,1,1 "2023-09-06","CENTRAL",5,2,136,6,3,1,1 "2023-09-07","CENTRAL",7,1,158,8,2,1,1 "2023-09-08","CENTRAL",7,3,154,6,3,-1,0 "2023-09-09","CENTRAL",".",".",169,".",".",".","." "2023-09-10","CENTRAL",".",".",169,".",".",".","." "2023-09-11","CENTRAL",".",".",155,".",".",".","." "2023-09-12","CENTRAL",".",".",147,".",".",".","." The magic incantation for CSV files is to set IFS to a comma like so... oldifs="${IFS}" IFS="," This allows parsing CSV files like so... while read do dataarray=(${REPLY}) ...and, kaboom, you have a fully populated array from reading one line of a CSV file. The following "transformer" is my "parsehospicu" script that summarizes the data to "hospsum.csv". Note: I've deleted the leading "shebang slash bin slash bash" line because the Gentoo mailing list software doesn't seem to like "executable emails". I suppose I could have two different versions of the... echo "${prevdate}... ...line inside of an if/then/else/fi construct. It would depend on the date being inside the "missing data range". ======================================================================= # Strip out missing "." that screw up the script sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv dos2unix -n region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv # # tail skips headers at beginning of file # sed deletes Row_ID, and strips out quotes # Output goes to file /dev/shm/temp0.txt tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > /dev/shm/temp0.txt # ## Set up IFS for easier parsing oldifs="${IFS}" IFS="," # # Initialize previous line's date to enter loop smoothly # expando to read first line dataline=$( head -1 /dev/shm/temp0.txt ) dataarray=(${dataline}) prevdate="${dataarray[0]}" # # Zero out accumulators to enter loop smoothly accum2=0 accum3=0 accum4=0 accum5=0 accum6=0 accum7=0 accum8=0 # # Remove previous hospsum.csv and open a new one for writing rm -rf hospsum.csv exec 3>hospsum.csv # # Write header line to output file echo "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3 # # Main loop # Read the data from one line in /dev/shm/temp0.txt while read do dataarray=(${REPLY}) if [ "${dataarray[0]}" = "${prevdate}" ]; then # # If this line's date is same as previous line's date, add amounts to accumulators. accum2=$(( ${accum2} + ${dataarray[2]} )) accum3=$(( ${accum3} + ${dataarray[3]} )) accum4=$(( ${accum4} + ${dataarray[4]} )) accum5=$(( ${accum5} + ${dataarray[5]} )) accum6=$(( ${accum6} + ${dataarray[6]} )) accum7=$(( ${accum7} + ${dataarray[7]} )) accum8=$(( ${accum8} + ${dataarray[8]} )) else # # If this line's date has changed, output to hospsum.csv, update prevdate, # and update accumulators. ***IMPORTANT*** "echo" TO hospsum.csv MUST BE # EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate*** echo "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${accum8}" >&3 prevdate="${dataarray[0]}" accum2=${dataarray[2]} accum3=${dataarray[3]} accum4=${dataarray[4]} accum5=${dataarray[5]} accum6=${dataarray[6]} accum7=${dataarray[7]} accum8=${dataarray[8]} fi done</dev/shm/temp0.txt # # Close file hospsum.csv exec 3<&- IFS="${oldifs}" -- Roses are red Roses are blue Depending on their velocity Relative to you