> On Tue, Aug 13, 2019 at 10:47 AM Rich Shepard <rshep...@appl-ecosys.com>
> wrote:
> 
> > On Tue, 13 Aug 2019, Robert Citek wrote:
> >
> > > Sounds like you used Emacs to do the equivalent of this:
> > >
> > > < hatchery_returns-2019-08-12.csv \
> > > tr -s '\r\n' '\n' |
> > > sed -e 's/, /,/g;s/,$//' \
> > >> hatchery_returns-2019-08-12.cleaned.csv
> > >
> > > Is that right?
> >
> > Robert,
> >
> > Nope.
> >
> > On the command line I ran:
> >
> > dd if=<infile> bs=1 | tr '\r' '\n' > <outfile>
> >
> > Then I put the outfile in an emacs buffer. No space at the beginning of the
> > file. Then I cleaned it by removing extraneous spaces and removing the
> > terminal comma when there were values for the last field in the line.
> >
> 
> Interesting.  I did a histogram on the number of fields.  Is it expected
> that the number of fields is not consistent across all records?
> 
> $ cat hatchery_returns-2019-08-12.csv | tr '\r' '\n' | awk -F, '{print NF}' |
> sort | uniq -c
>    2 0
>  100 41
>  100 53
> 10599 93
> 
> FWIW, cat is much faster than dd:

Somewhere along the way the iseek=1 (or, in the case of GNU dd, skip=1)
got dropped. cat can NOT do that.

> $ dd if=hatchery_returns-2019-08-12.csv bs=1 | tr '\r' '\n' | md5
> 12746089+0 records in
> 12746089+0 records out
> 12746089 bytes transferred in 37.538310 secs (339549 bytes/sec)
> f5450d6738a7d3242700a003266b03e0
> 
> $ time -p cat hatchery_returns-2019-08-12.csv | tr '\r' '\n' | md5
> f5450d6738a7d3242700a003266b03e0
> real 1.21
> user 1.24
> sys 0.03
> 
> Or did you mean to write bs=1m ?
No.

> $ dd if=hatchery_returns-2019-08-12.csv bs=1m | tr '\r' '\n' | md5
> 12+1 records in
> 12+1 records out
> 12746089 bytes transferred in 1.227313 secs (10385361 bytes/sec)
> f5450d6738a7d3242700a003266b03e0
> 
> Although, I'm wondering why use dd ( or even cat ).

If you had followed the thread you would know that byte 1
of the file is a 0xA, aka LF, and the dd was there to rip that
byte off the file. The command got morphed because I used
the BSD iseek=1 syntax, which GNU dd does not understand.
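
For reference, a minimal sketch of the intended byte-stripping step,
not the exact command from earlier in the thread. The sample file and
output names are made up for illustration. skip=1 is the POSIX
spelling, and tail -c +2 is shown as an alternative that avoids the
bs=1 slowdown measured above.

```shell
# Hypothetical stand-in for the real CSV: one stray LF, then CR-terminated records.
dir=$(mktemp -d)
printf '\nA,B\rC,D\r' > "$dir/sample.csv"

# With bs=1, skip=1 drops exactly the first input byte before copying the rest.
dd if="$dir/sample.csv" bs=1 skip=1 2>/dev/null | tr '\r' '\n' > "$dir/cleaned_dd.csv"

# tail -c +2 starts output at byte 2, so it drops the same leading byte.
tail -c +2 "$dir/sample.csv" | tr '\r' '\n' > "$dir/cleaned_tail.csv"

# Both routes should produce the same bytes.
cmp "$dir/cleaned_dd.csv" "$dir/cleaned_tail.csv" && echo "identical"
```

Either form feeds tr only the data after the leading LF, which a plain
cat pipeline cannot do on its own.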

> 
> $ time -p < hatchery_returns-2019-08-12.csv tr '\r' '\n' | md5
> f5450d6738a7d3242700a003266b03e0
> real 1.20
> user 1.23
> sys 0.01
> 
> Regards,
> - Robert
> _______________________________________________
> PLUG mailing list
> PLUG@pdxlinux.org
> http://lists.pdxlinux.org/mailman/listinfo/plug
> 

-- 
Rod Grimes                                                 rgri...@freebsd.org