> On Tue, Aug 13, 2019 at 10:47 AM Rich Shepard <rshep...@appl-ecosys.com>
> wrote:
>
> > On Tue, 13 Aug 2019, Robert Citek wrote:
> >
> > > Sounds like you used Emacs to do the equivalent of this:
> > >
> > >   < hatchery_returns-2019-08-12.csv \
> > >   tr -s '\r\n' '\n' |
> > >   sed -e 's/, /,/g;s/,$//' \
> > >   > hatchery_returns-2019-08-12.cleaned.csv
> > >
> > > Is that right?
> >
> > Robert,
> >
> > Nope.
> >
> > On the command line I ran:
> >
> > dd if=<infile> bs=1 | tr '\r' '\n' > <outfile>
> >
> > Then I put the outfile in an emacs buffer. No space at the beginning of the
> > file. Then I cleaned it by removing extraneous spaces and removing the
> > terminal comma when there were values for the last field in the line.
> >
>
> Interesting. I did a histogram on the number of fields. Is it expected
> that the number of fields is not consistent across all records?
>
> $ cat hatchery_returns-2019-08-12.csv | tr '\r' '\n' | awk -F, '{print NF}' | sort | uniq -c
>       2 0
>     100 41
>     100 53
>   10599 93
>
> FWIW, cat is much faster than dd:
Somewhere the iseek=1 (or, in the case of GNU dd, skip=1) got dropped;
cat can NOT do that.

> $ dd if=hatchery_returns-2019-08-12.csv bs=1 | tr '\r' '\n' | md5
> 12746089+0 records in
> 12746089+0 records out
> 12746089 bytes transferred in 37.538310 secs (339549 bytes/sec)
> f5450d6738a7d3242700a003266b03e0
>
> $ time -p cat hatchery_returns-2019-08-12.csv | tr '\r' '\n' | md5
> f5450d6738a7d3242700a003266b03e0
> real 1.21
> user 1.24
> sys 0.03
>
> Or did you mean to write bs=1m ?

No.

> $ dd if=hatchery_returns-2019-08-12.csv bs=1m | tr '\r' '\n' | md5
> 12+1 records in
> 12+1 records out
> 12746089 bytes transferred in 1.227313 secs (10385361 bytes/sec)
> f5450d6738a7d3242700a003266b03e0
>
> Although, I'm wondering why use dd ( or even cat ).

If you had followed the thread you would know that byte 1 of the file is
0x0A, aka LF, and the dd was there to strip that byte off the file.  The
command got mangled because I used the BSD iseek=1 syntax, and GNU dd does
not understand that.

>
> $ time -p < hatchery_returns-2019-08-12.csv tr '\r' '\n' | md5
> f5450d6738a7d3242700a003266b03e0
> real 1.20
> user 1.23
> sys 0.01
>
> Regards,
> - Robert
> _______________________________________________
> PLUG mailing list
> PLUG@pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug
>

-- 
Rod Grimes                                                 rgri...@freebsd.org
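For reference, here is a minimal sketch of what the dd step was meant to do: drop the single leading LF byte, then turn the bare CR line endings into LF. It assumes the file has exactly one stray 0x0A at the start and CR-only record separators, as described above. The file names in.csv, out.csv and out2.csv are hypothetical. skip=1 with bs=1 is understood by both BSD and GNU dd. The tail -c +2 variant is an alternative swapped in here (it was not the command used in the thread); it avoids dd's one-read-per-byte overhead that shows up in the bs=1 timing above.

```shell
#!/bin/sh
# Sketch only: in.csv, out.csv and out2.csv are hypothetical file names.
set -e

# Build a tiny sample: one stray leading LF, then CR-terminated records.
printf '\na,b,c\r1,2,3\r' > in.csv

# dd form: with bs=1, skip=1 drops exactly the first byte.
# (FreeBSD dd also accepts iseek= as a synonym for skip=; the GNU dd
# discussed in this thread did not.)  dd's statistics go to stderr.
dd if=in.csv bs=1 skip=1 2>/dev/null | tr '\r' '\n' > out.csv

# Alternative: POSIX tail -c +2 starts output at byte 2, without
# issuing one read per byte.
tail -c +2 in.csv | tr '\r' '\n' > out2.csv

# Both approaches should produce byte-identical output.
cmp out.csv out2.csv

# Field-count histogram on the cleaned file, as in the thread.
awk -F, '{print NF}' out.csv | sort | uniq -c
```

With the leading LF gone, the histogram no longer picks up an extra empty (zero-field) record for that first byte.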