On 04/18/2013 08:41 AM, George Brink wrote:
> On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady <p...@draigbrady.com> wrote:
>
>> On 04/17/2013 02:26 PM, George Brink wrote:
>>> Hello,
>>>
>>> I have a task of extracting several "fields" from a text file. The
>>> standard `cut` tool would be a perfect fit for the job, but...
>>> In my file the '\n' character is a legal symbol inside fields, and
>>> therefore the file uses another symbol as the record separator. `cut`
>>> has a hard-coded '\n' for the record separator (I just checked the
>>> source from the coreutils-8.21 package).
>>
>> The patch would be simple, but not without compatibility cost,
>> i.e. scripts using this would immediately become incompatible
>> with any systems without this feature.
>>
>> So you'd like something like tac's -s, --separator.
>> However cut -s is taken, so we'd have to avoid the short -s at least.
>> Also tac -s takes a string rather than a single character, which gives
>> some extra credence (and complexity) to that option there.
>>
>> Also related would be supporting the -z, --zero-terminated option.
>> join, sort and uniq all have this option to use NUL as the record
>> separator; however they're all closely related, sort-dependent
>> utilities, and we're trying to unify options between them.
>>
>> If it is just a single character you want to separate on,
>> then you can always use tr to convert before processing,
>> albeit with the associated data-copying overhead:
>>
>> SEP=^
>> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>>
>> So given that cut is not special here among the text filters,
>> and a workaround is available, I'm 60:40 against
>> adding this feature.
>>
>> thanks,
>> Pádraig.
>
> Pádraig,
>
> Thank you for the alternative suggestions.
> Actually I just found yet another way to solve my problem:
>
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
>
> It works fine, but I am a little concerned about the speed. I have over
> three hundred such files, from 3MB to 30MB each, and this process should
> be run every day... I thought that by using cut (which just looks for
> delimiters) I could gain a few minutes on the whole process.
>
> Originally I thought of adding "-r, --record-delimiter=DELIM" and
> "--output-record-delimiter=DELIM" options to cut.
> Then the example above could be done with:
>
> cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat
>
> I think it is feasible and would be more convenient (and hopefully
> faster) than using a whole perl or two calls to tr.
Yes, those are the tradeoffs. awk is often suggested too as an
alternative to cut.

> Bob,
>
> I understand your desire to keep feature discussions out of the
> bug-related mailing list, but here is an extract from the README:
>
>> Mail suggestions and bug reports for these programs to
>> the address on the last line of --help output.
>
> And guess what, `cut --help` has the bug-coreutils email on the last
> line! The coreutils email is not mentioned in the README at all, while
> bug-coreutils is mentioned several times in different contexts.
> I apologize for using this mailing list inappropriately, but I did not
> know about any other mailing lists.

No worries. I saw no issue with your mails.
In future, cut --help will just point at the following URL,
which hopefully is easier to follow:
http://www.gnu.org/software/coreutils/

thanks,
Pádraig.
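[Editor's note: Pádraig's tr round-trip and the awk alternative he mentions can
both be sketched on toy data. The '^' record separator, comma field separator,
field number, and sample file below are illustrative assumptions, not taken
from the thread:]

```shell
# Two '^'-terminated records of ','-separated fields; the second field of
# each record contains an embedded newline.
printf 'a,b\nc,d^e,f\ng,h^' > records.txt

SEP=^
# cut route: swap '^' and '\n' so cut sees one record per line, take the
# second comma field, then swap back to restore the embedded newlines.
tr "$SEP"'\n' '\n'"$SEP" < records.txt | cut -d, -f2 | tr "$SEP"'\n' '\n'"$SEP"

# awk route: set the record separator directly; no swapping needed.
# (A single-character RS is portable; multi-character RS is a gawk extension.)
awk 'BEGIN { RS = "^"; ORS = "^"; FS = "," } NF { print $2 }' records.txt
```

[Both routes print the same bytes, with the embedded newline restored inside
each extracted field, which is exactly what plain cut cannot do on this input.]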