Zhao Peng wrote:

string1_string2_string3_string4.sas7bdat

abc_st_nh_num.sas7bdat
abc_st_vt_num.sas7bdat
abc_st_ma_num.sas7bdat
abcd_region_NewEngland_num.sas7bdat
abcd_region_South_num.sas7bdat

My goal is to:
1. Extract string2 from each file name.
2. Sort them and keep only the unique ones.
3. Output them to a .txt file (one unique string2 per line).

Solution #1:
ls -1 *sas7bdat | awk -F_ '{print $2}' | sort -fu | cat -n > output.txt

Take the output of ls, one file per line (ls -1) - only files ending with sas7bdat.
Feed it into awk, splitting on _, and print the 2nd field.
Sort, ignoring case and eliminating duplicates (sort options: -f "folds case", -u "keeps only uniques").
Number the lines (cat -n).
Put the output in a file named output.txt.
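
With the five sample filenames above, field 2 is "st" for three of them and "region" for the other two, so output.txt should come out looking something like this (cat -n pads the numbers with whitespace; exact spacing may differ):

     1  region
     2  st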

Solution #2:
ls -1 *sas7bdat | sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/' | sort -fu | cat -n > output.txt

Use sed (the stream editor) to break each filename into atoms separated by _, and output the 2nd one (the \2). Regular expressions (regex) can be very handy: ^ matches the beginning of the string, [a-zA-Z0-9]*_ matches a letter/number string ending with _, and the backslashed parentheses group the patterns so the 2nd one can be extracted.
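
As a quick sanity check, you can run the sed expression by itself against one of the sample filenames; it should print just the second atom:

echo abc_st_nh_num.sas7bdat | sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'
st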

There are many solutions to the problem, as you can see.
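
For instance (just a sketch, not tested against your files), cut can stand in for the awk or sed step, since the delimiter is a single underscore:

ls -1 *sas7bdat | cut -d_ -f2 | sort -fu | cat -n > output.txt

Here -d_ sets the delimiter and -f2 selects the 2nd field, so the rest of the pipeline stays the same.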

--
Dan Jenkins ([EMAIL PROTECTED])
Rastech Inc., Bedford, NH, USA --- 1-603-206-9951
*** Technical Support Excellence for over a quarter century

_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
