Zhao Peng wrote:
string1_string2_string3_string4.sas7bdat
abc_st_nh_num.sas7bdat
abc_st_vt_num.sas7bdat
abc_st_ma_num.sas7bdat
abcd_region_NewEngland_num.sas7bdat
abcd_region_South_num.sas7bdat
My goal is to:
1. extract string2 from each file name
2. then sort them and keep only the unique ones
3. then output them to a .txt file (one unique string2 per line)
Solution #1:
ls -1 *sas7bdat | awk -F_ '{print $2}' | sort -fu | cat -n > output.txt
Take the output of ls, one file per line (ls -1) - only files ending with sas7bdat
Feed it into awk, splitting on _, and print the 2nd field
Sort ignoring case, eliminating duplicates (sort options: -f "folds case", -u "keeps only uniques")
Number the lines (cat -n)
Put output in file named output.txt
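As a quick sanity check, the pipeline can be run end to end against the sample names from the original post (a sketch; the touch calls just recreate those filenames in a throwaway directory):

```shell
# Recreate the sample filenames from the thread in a scratch directory,
# then run Solution #1 exactly as written.
tmp=$(mktemp -d)
cd "$tmp"
touch abc_st_nh_num.sas7bdat abc_st_vt_num.sas7bdat abc_st_ma_num.sas7bdat
touch abcd_region_NewEngland_num.sas7bdat abcd_region_South_num.sas7bdat

ls -1 *sas7bdat | awk -F_ '{print $2}' | sort -fu | cat -n > output.txt
# output.txt now holds one numbered unique string2 per line: region, st
```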
Solution #2:
ls -1 *sas7bdat | sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/' | sort -fu | cat -n > output.txt
Use sed (stream editor) to break each filename into atoms separated by _,
and output the 2nd one (the \2). Regular expressions (regex) can be very
handy here: ^ matches the beginning of the string, [a-zA-Z0-9]*_ matches a
letter/number string ending with _, and the backslashed parentheses group
the patterns so the 2nd one can be extracted as \2.
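The sed variant can be exercised the same way (again a sketch against the thread's sample filenames):

```shell
# Same scratch setup, this time using sed for the field extraction
# (Solution #2). The long pipeline is wrapped with a backslash.
tmp=$(mktemp -d)
cd "$tmp"
touch abc_st_nh_num.sas7bdat abc_st_vt_num.sas7bdat abc_st_ma_num.sas7bdat
touch abcd_region_NewEngland_num.sas7bdat abcd_region_South_num.sas7bdat

ls -1 *sas7bdat | sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/' \
    | sort -fu | cat -n > output.txt
```

The result is identical to Solution #1, since both pipelines isolate the same 2nd field before sorting.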
There are many solutions to the problem, as you can see.
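For instance, one more variant (my own addition, not from the thread): cut can do the simple field split on its own, assuming the delimiter is always _:

```shell
# A third sketch (not in the original post): cut -d_ -f2 extracts the
# 2nd _-delimited field without needing awk or sed.
tmp=$(mktemp -d)
cd "$tmp"
touch abc_st_nh_num.sas7bdat abc_st_vt_num.sas7bdat
touch abcd_region_South_num.sas7bdat

ls -1 *sas7bdat | cut -d_ -f2 | sort -fu > output.txt
```

Dropping cat -n here matches the stated goal literally: one unique string2 per line, unnumbered.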
--
Dan Jenkins ([EMAIL PROTECTED])
Rastech Inc., Bedford, NH, USA --- 1-603-206-9951
*** Technical Support Excellence for over a quarter century
_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss