Everyone has a different toolbox and regexp logic wired into their brain.
I prefer to break things down step by step and think ahead; it is mostly
worth it.

The following is more complex at first, but in my experience, as soon as you
sort by the first column you will probably need to sort by, or do something
with, the other columns. So turning the data into clean unquoted CSV has always
been a good investment for me.

I'd probably wire it up this universal way (sorting differently means changing
only the sort part; column operations can be done in the awk part alone):

cat sample.dat | sed "s/''/ /g;s/'//g"| sort -n -k1,1 -t , | sed "s/ /''/g" |
awk -v FS=, '{printf("'"'%s',%s,'%s','%s','%s','%s'"'\n",$1,$2,$3,$4,$5,$6)}'
'8',11,'2000-07-18','Insecta','Ephemeroptera','Leptophlebiidae''Paraleptophlebia'
'11',11,'2000-07-18','Insecta','Trichoptera','Glossosomatidae''Agapetus'
'12',11,'2000-07-18','Insecta','Diptera','Tipulidae''Tipula'
'134',41,'2004-06-07','Insecta','Plecoptera','Nemouridae''Amphinemura'
'135',3,'2004-06-07','Insecta','Ephemeroptera','Baetidae''Baetus'
'137',41,'2004-06-07','Insecta','Ephemeroptera','Baetidae''Baetis'
'138',3,'2004-06-07','Insecta','Coleoptera','Hydrophilidae''Berosus'
'139',3,'2004-06-07','Insecta','Plecoptera','Chloroperlidae''Sweltsa'
'141',41,'2004-06-07','Insecta','Plecoptera','Chloroperlidae''Suwallia'
'145',3,'2004-06-07','Insecta','Diptera','Simulidae''Prosimulium'
'148',3,'2004-06-07','Annelida','Oligochaeta','Lumbricidae''Ilyodrilus/Tubifex'
'151',3,'2006-06-15','Insecta','Diptera','Chironomidae''Eukiefferiella'
'154',41,'2004-06-07','Insecta','Coleoptera','Dytiscidae''Hydrovatus'
'155',3,'2004-06-07','Insecta','Coleoptera','Dytiscidae''Hydrovatus'
'216',SC,'2005-07-13','Insecta','Diptera','Ephydridae'''
'592',17,'2011-07-11','Annelida','Oligochaeta','Tubificidae'
'648',17,'2011-07-11','Insecta','Plecoptera','Chloroperlidae''Suwallia'
'652',17,'2011-07-11','Insecta','Plecoptera','Pteronarcidae''Pteronarcella'
'895',17,'2010-09-13','Insecta','Ephemeroptera','Baetidae''Baetis'
'899',17,'2010-09-13','Insecta','Diptera','Psychodidae''Pericoma'
'901',17,'2010-09-13','Insecta','Coleoptera','Hydrophilidae''Cymbiodyta'
'907',17,'2010-09-13','Insecta','Trichoptera','Glossosomatidae''Glossosoma'
'909',17,'2010-09-13','Insecta','Diptera','Chironomidae''Cladotanytarsus'
'914',17,'2010-09-13','Insecta','Plecoptera','Nemouridae''Zapada'
'918',17,'2010-09-13','Insecta','Trichoptera','Hydropsychidae''Hydropsyche'
'919',17,'2010-09-13','Insecta','Coleoptera','Dytiscidae''Hydroporus'
'920',17,'2010-09-13','Insecta','Trichoptera','Lepidostomatidae''Lepidostoma'
'922',17,'2010-09-13','Insecta','Coleoptera','Elmidae''Narpus'
'1120',17,'2006-06-27','Insecta','Diptera','Chironomidae''Polypedilum'
'1126',17,'2006-06-27','Insecta','Ephemeroptera','Baetidae''Baetis'
'1128',17,'2006-06-27','Insecta','Trichoptera','Brachycentridae''Brachycentrus'
'1129',17,'2006-06-27','Insecta','Diptera','Chironomidae''Tvetenia'
'2060',11,'2012-07-11','Insecta','Coleoptera','Elmidae''Narpus'
'2061',11,'2012-07-11','Insecta','Diptera','Chironomidae''Natarsia'
'2062',11,'2012-07-11','Insecta','Trichoptera','Hydroptilidae''Ochrotrichia'
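
To illustrate the point about changing only one part, here is a rough sketch of
the same pipeline split into steps. The function names (sample, unquote,
requote) and the four-row sample are made up for illustration; the rows are
copied from the output above. It assumes, like the one-liner, that the data
contains no spaces of its own and always has six columns.

```shell
# Hypothetical four-row sample, copied from the output above.
sample() {
cat <<'EOF'
'648',17,'2011-07-11','Insecta','Plecoptera','Chloroperlidae''Suwallia'
'8',11,'2000-07-18','Insecta','Ephemeroptera','Leptophlebiidae''Paraleptophlebia'
'151',3,'2006-06-15','Insecta','Diptera','Chironomidae''Eukiefferiella'
'145',3,'2004-06-07','Insecta','Diptera','Simulidae''Prosimulium'
EOF
}

# Step 1: '' becomes a space, then every remaining quote is dropped.
unquote() {
    sed "s/''/ /g;s/'//g"
}

# Step 3: put '' back for the space and re-quote all columns but the second.
requote() {
    sed "s/ /''/g" |
        awk -v FS=, '{printf("'"'%s',%s,'%s','%s','%s','%s'"'\n",$1,$2,$3,$4,$5,$6)}'
}

# Step 2 is the only part that changes: sort by the date in column 3.
sample | unquote | sort -t , -k3,3 | requote

# A column operation on the clean CSV: count rows per order (column 5).
sample | unquote | awk -F , '{n[$5]++} END {for (k in n) print k "," n[k]}' | sort
```

Only the sort line differs from the numeric sort on column 1, and the counting
step never has to deal with quotes at all.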

I really, really dislike quoted CSVs - what a waste.

Tomas

PS: I did not see any B(W) in your examples! Perhaps that is why looking for it
with sed did not work.

On Tue, 2020-03-31 at 08:53 +0900, J. Hart wrote:
> try this :
> 
> cat sample.dat | sed "s|^'\([0-9]*\)'|\1 '\1'|" | sort -n | sed 
> "s|^[0-9]* ||" | tee sample.dat.new
> 
> 
> On 03/31/2020 08:30 AM, Rich Shepard wrote:
> > sample.dat:
> > 
> > '648',17,'2011-07-11','Insecta','Plecoptera','Chloroperlidae''Suwallia'
> > '652',17,'2011-07-11','Insecta','Plecoptera','Pteronarcidae''Pteronarcella' 
> > 
> 
> _______________________________________________
> PLUG mailing list
> PLUG@pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug