Re: extract string from filename

2006-01-13 Thread Paul Lussier
Tom Buskey <[EMAIL PROTECTED]> writes:

> Unix Shell Programming by Kochan and Wood is a classic on shell programming
>
>
> Portable Shell Programming by Blinn
> The Awk Programming Language by Aho, Weinberger and Kernighan

I'm also a big fan of Kernighan and Pike's "The UNIX Programming
Environment".  When I first saw this book I thought it was going to be
more of a C programming book explaining things like linking and
compiling under UNIX.  However, it turned out to be simply a great book
on how to get around the shell and do a variety of things in the UNIX
environment.  It is so named "The UNIX Programming Environment"
because, as we've all seen here, the shell is *programmable* :)

And, yet another plug for my all-time favorite UNIX book, "The UNIX
Philosophy" by Mike Gancarz, which has recently been updated with a
second edition (which I have not yet read), The Linux and UNIX
Philosophy.  This book does a fantastic job of explaining exactly
*why* UNIX is such a great environment, and why other competing
environments just can't compete when what you need is raw power and
flexibility.
-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Paul Lussier
[EMAIL PROTECTED] (Kevin D. Clark) writes:

> Zhao Peng writes:
>
>> I'm back, with another "extract string" question. //grin
>
>
> find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | 
> sort -u > somefile.txt

Or, to simplify this:

  find ./ -name \*sas7bdat | awk -F_ '{print $2}' |sort -u
  ls *sas7bdat | perl -F_ -ane 'print "$F[1]\n";'|sort -u
  perl -e 'opendir(DIR,"."); map { if (/sas7bdat$/) { $k = (split(/_/,$_))[1]; 
$f{$k} =1; } } readdir(DIR); map { print "$_\n";}sort keys %f;'

That last one might be a little better formatted like:

  perl -e 'opendir(DIR,".");
   map { if (/sas7bdat$/) { 
   $k = (split(/_/,$_))[1];
   $f{$k}=1; 
 }
   } readdir(DIR);
   map { print "$_\n";} sort keys %f;'

It should be rather obvious that your best bet for quick one-liners
of this sort is probably to stick with standard UNIX tools like sort,
cut, sed, awk, etc.  Perl is great for text manipulation, but as you
can see, none of the perl one-liners is nearly as concise as the shell
variants.  If speed or process overhead matters, then maybe perl is
better.  Of course, for such a small data set as you've given, the
perl versions are both harder and longer to type.
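As an aside, the awk variant above splits find's whole output line (path
included) on underscores, so it only works while the path itself contains
no extra underscore. A self-contained sketch, with printf standing in for
find's output on the thread's sample names:

```shell
# Simulated `find ./` output: the leading "./" carries no underscore,
# so field 2 is still string2 of the file name.
printf '%s\n' ./abc_st_nh_num.sas7bdat ./abcd_region_South_num.sas7bdat |
  awk -F_ '{print $2}' | sort -u
```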

hth.
-- 

Seeya,
Paul


Re: extract string from filename

2006-01-13 Thread Dan Jenkins

Zhao Peng wrote:


string1_string2_string3_string4.sas7bdat

abc_st_nh_num.sas7bdat
abc_st_vt_num.sas7bdat
abc_st_ma_num.sas7bdat
abcd_region_NewEngland_num.sas7bdat
abcd_region_South_num.sas7bdat

My goal is to :
1, extract string2 from each file name
2, then sort them and keep only unique ones
3, then output them to a .txt file. (one unique string2 per line)


Solution #1:
ls -1 *sas7bdat|awk -F_ '{print $2}'|sort -fu|cat -n >output.txt

Take output of ls, 1 file per line (ls -1) - only files ending with sas7bdat
Feed into awk, splitting on _, print the 2nd field
Sort ignoring case, eliminating duplicates (sort options: f "folds 
case", u "keeps only uniques")

Number the lines (cat -n)
Put output in file named output.txt

Solution #2:
ls -1 *sas7bdat|sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'|sort 
-fu|cat -n >output.txt
Use sed (stream editor) to break up filenames into atoms separated by _, 
and output the 2nd one (the \2). Regular expressions (regex) can be very 
handy. ^ matches beginning of string, [a-zA-Z0-9]*_ matches 
letter/number string ending with _, the backslashed parentheses groups 
the patterns, so the 2nd one can be extracted.
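For instance, running that sed expression over two of the sample names
from Zhao's original post (a sketch; printf stands in for ls) pulls out
just the second atom:

```shell
# Each name is matched as (atom1_)(atom2)_rest; \2 keeps only atom2.
printf '%s\n' abc_st_nh_num.sas7bdat abcd_region_NewEngland_num.sas7bdat |
  sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'
```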


There are many solutions to the problem, as you can see.

--
Dan Jenkins ([EMAIL PROTECTED])
Rastech Inc., Bedford, NH, USA --- 1-603-206-9951
*** Technical Support Excellence for over a quarter century



Re: extract string from filename

2006-01-13 Thread Michael ODonnell


"cat -n" will number output lines

 


Re: extract string from filename

2006-01-13 Thread Ben Scott
On 1/13/06, Ben Scott <[EMAIL PROTECTED]> wrote:
> On 1/13/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> > Is it possible to number the extracted string2?
>
> find -name \*sas7bdat -printf '%f\n' | cut -d _ -f 2 | sort | uniq | cat -n

  I forgot to mention: If the *only* files in that directory are the
ones with the interesting file names, you can just use this:

ls | cut -d _ -f 2 | sort | uniq | cat -n

-- Ben "I would flunk the quiz" Scott


Re: extract string from filename

2006-01-13 Thread Ben Scott
On 1/13/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> Is it possible to number the extracted string2?

find -name \*sas7bdat -printf '%f\n' | cut -d _ -f 2 | sort | uniq | cat -n

  Run that pipeline in the directory you are interested in.

  The find(1) command finds files, based on their name or other
filesystem attributes.

  The "-name \*sas7bdat" part finds files with file names which match
the pattern.  The backslash escapes the star, to keep the shell from
trying to interpret it, so find gets the star instead.

  The "-printf '%f\n'" part has find output just the file name, not the path.

  cut(1) is used to split input strings, as you know.  "-d _" splits
into fields, based on underscores.  "-f 2" outputs the second field
only, one per line.

  sort(1) sorts, and uniq(1) eliminates duplicate lines.

  "cat -n" numbers the output.
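To watch the stages work without needing the directory, one can feed the
tail of the pipeline the sample names directly (a sketch; printf stands in
for find's output):

```shell
# cut isolates field 2, sort groups duplicates, uniq drops them,
# and cat -n numbers what is left.
printf '%s\n' abc_st_nh_num.sas7bdat abc_st_vt_num.sas7bdat abcd_region_South_num.sas7bdat |
  cut -d _ -f 2 | sort | uniq | cat -n
```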

-- Ben "Pay attention, there's gonna be a quiz next week" Scott


Re: extract string from filename

2006-01-13 Thread Jeff Kinz
On Fri, Jan 13, 2006 at 11:40:26AM -0500, Zhao Peng wrote:
> Kevin,
> 
> Thank you very much! I really appreciate it.
> 
> I like your "find" approach, it's simple and easy to understand.
> 
> I'll also try to understand your perl approach, when I got time to start 
> learning it. (Hopefully it won't be un-fulfilled forever)
> 
> I have one more question:
> 
> Is it possible to number the extracted string2?
> 
> Say, the output file contains the following list of extracted string2:
> 
> st
> region
> local
> 
> Any idea about what command to use to number the list  to make it look 
> like below:
> 
> 1 st
> 2 region
> 3 local


Pipe the output into "pr -n -T"

This is not pr's intended use, but it will work.  The -n option means
"put numbers on the lines"; the -T option means "no page breaks".

The "-n" option appears to be missing from the FC2 man pages.
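A quick sketch of the pr variant on the three strings from the question
(GNU pr assumed; the exact number width and separator may differ between
implementations):

```shell
# -T suppresses page headers/footers; -n prefixes each line with a number.
printf 'st\nregion\nlocal\n' | pr -T -n
```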


-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

"The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding." - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco


Re: extract string from filename

2006-01-13 Thread Zhao Peng

Kevin,

Thank you very much! I really appreciate it.

I like your "find" approach, it's simple and easy to understand.

I'll also try to understand your perl approach, when I get time to start 
learning it. (Hopefully that won't remain unfulfilled forever.)


I have one more question:

Is it possible to number the extracted string2?

Say, the output file contains the following list of extracted string2:

st
region
local

Any idea about what command to use to number the list  to make it look 
like below:


1 st
2 region
3 local

Again, thank you for your help and time!

Zhao

Kevin D. Clark wrote:

Zhao Peng writes:

  

I'm back, with another "extract string" question. //grin




find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | sort -u 
> somefile.txt

or

perl -MFile::Find -e 'find(sub{$string2 = (split /_/)[1]; $seen{$string2}++; }, @ARGV); 
map { print "$_\n"; } keys(%seen)' FOLDERNAME

(which looks more readable as:

  perl -MFile::Find -e 'find(sub{ $string2 = (split /_/)[1];
  $seen{$string2}++;
 }, @ARGV);
  
 map { print "$_\n"; } keys(%seen)' \

  FOLDERNAME > somefile.txt

)

Either of which solves the problem that you describe.  (Actually, they
solve more than the problem that you describe, since it wasn't
apparent to me if you had any subdirectories here, but this is solved too.)

(substitute FOLDERNAME with your directory's name)


Honestly, the first solution I present is the way I would have solved
this problem myself.  Very fast this way.

Regards,

--kevin
  




Re: extract string from filename

2006-01-13 Thread Larry Cook

Zhao Peng wrote:

My goal is to :
1, extract string2 from each file name
2, then sort them and keep only unique ones
3, then output them to a .txt file. (one unique string2 per line)


It is really interesting how many ways there are to do things in *nix.  My 
first reaction, if this is a one time event, is to just use vi:


% ls *.sas7bdat > string2.txt
% vi string2.txt
:%s/^[^_]*_//
:%s/_.*$//
:%!sort -u
:wq

The first regex removes the first underscore and everything in front of it, 
while the second regex removes what is now the first underscore (was the 
second originally) and everything after it.  And then I do the unique sort 
right in vi.
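The same two substitutions can also be run non-interactively; this is a
sketch of Larry's edits done with sed instead of vi, on names from the
thread:

```shell
# First expression drops everything through the first underscore; the
# second drops from the (now) first underscore onward; sort -u dedupes.
printf '%s\n' abc_st_nh_num.sas7bdat abcd_region_South_num.sas7bdat |
  sed -e 's/^[^_]*_//' -e 's/_.*$//' | sort -u
```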


Larry


Re: extract string from filename

2006-01-13 Thread Bill McGonigle

On Jan 12, 2006, at 19:40, Zhao Peng wrote:

I also downloaded an e-book called "Learning Perl" (O'Reilly, 
4th Edition), and had a quick look through its Table of Contents, but did 
not find any chapter which looks likely to address any issue related 
to my question.


Good start.  Read these sections: 'A Stroll Through Perl', 'The Split 
and Join Functions', 'Lists and Arrays', 'Hashes', 'Directory Access', 
and 'File Manipulation'.


Your description is the outline of the algorithm.  Take this script 
where I've filled in the requisite perl and figure out how it works:


#!/usr/bin/perl -w
use strict;                 # show stupid errors
use warnings FATAL=>'all';  # don't let you get away with them

# I have almost 1k small files within one folder. The only pattern of
# the file names is:
my $dirname = shift;  # take the command line parameter as the directory name

opendir DIRECTORY, $dirname;
my @files = readdir(DIRECTORY);
closedir DIRECTORY;

# string1_string2_string3_string4.sas7bdat

# Note:
# 1, string2 often repeats itself across each file name
# 2, All 4 strings contain no underscores.
# 3, The 4 strings are separated by 3 underscores (as you can see)
# 4, The lengths of the 4 strings are not fixed.

my (@part_2s);  # we'll keep the second parts here
foreach my $file (@files) {
    # the directory will contain . and .. which we don't want
    next if (($file eq '.') or ($file eq '..'));

    # My goal is to:
    # 1, extract string2 from each file name
    # (don't forget to escape the . since this is a regex)
    my ($filename, $extension) = split('\.', $file);

    my @strings = split('_', $filename);
    my $part_2 = $strings[1];  # remember, arrays in perl are zero-indexed
    push(@part_2s, $part_2);   # store the data we want on the end of the array
}

# 2, keep only unique ones
# perl trick using a hash to easily get unique items
my (%temp_hash);
foreach my $part (@part_2s) {
    $temp_hash{$part} = 1;
}
my @uniques = (keys %temp_hash);

# and then sort them
my @sorted = sort { $a cmp $b } (@uniques);  # cmp for string sorting

# 3, then output them to a .txt file. (one unique string2 per line)
open OUTFILE, ">output.txt";
foreach my $item (@sorted) {
    print OUTFILE $item . "\n";
}
close OUTFILE;

When you understand each line you'll be able to solve future similar 
problems easily.  Note that Kevin's perl solution is equally valid and 
probably faster, but you're not going to grok it until you exercise 
the perl part of your brain for a while.


-Bill
-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Cell: 603.252.2606
http://www.bfccomputing.com/Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf



Re: extract string from filename

2006-01-13 Thread Tom Buskey
On 1/12/06, Ben Scott <[EMAIL PROTECTED]> wrote:
> On 1/12/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> > I'm back, with another "extract string" question. //grin
>
>   It sounds like you could use a tutorial on Unix text processing and
> command line tools, specifically, one which addresses pipes and
> redirection, as well as the standard text tools (grep, cut, sed, awk,
> etc.).  While Paul's recommendation about the O'Reilly regular
> expressions book is valid, I suspect it might be a little too focused
> on regex's and not cover some of the *other* elements you seem to be
> needing.
>
>   It's been forever for me, but I seem to recall that _Unix Power
> Tools_, also published by O'Reilly, covers all of the above and much,
> much more.  If others on this list second my suggestion, you might
> want to obtain a copy.  Alternatively, maybe list members can suggest
> alternatives?

Unix Shell Programming by Kochan and Wood is a classic on shell programming.
Portable Shell Programming by Blinn.
The Awk Programming Language by Aho, Weinberger and Kernighan.

Power Tools is excellent, but it is more of a tip book in my mind.  Not as
much as the Hack series, though.

>   There are also a number of free guides at the Linux Documentation
> Project.  See:
>
> http://www.tldp.org/guides.html
>
>   Look for anything mentioning "bash" (the Bourne-again shell) or
> scripting.  I can't speak as to how good they are, but you can't beat
> the price.

Some of them are very good.  And the examples work.

-- 
A strong conviction that something must be done is the parent of many
bad measures.
  - Daniel Webster


Re: extract string from filename

2006-01-13 Thread Ted Roche

On Jan 12, 2006, at 8:25 PM, Ben Scott wrote:


  It sounds like you could use a tutorial on Unix text processing and
command line tools, specifically, one which addresses pipes and
redirection, as well as the standard text tools (grep, cut, sed, awk,
etc.).  While Paul's recommendation about the O'Reilly regular
expressions book is valid, I suspect it might be a little too focused
on regex's and not cover some of the *other* elements you seem to be
needing.


Gee, I wonder if that would be a good topic for a meeting...

Bruce Dawson and David Berube did a presentation on regular  
expressions that helped me grasp what they were and why I'd want to  
know more. Bought the Reg Exp book on my next visit to SoftPro.


A similar kind of presentation that explained the place of sed, grep,  
awk, pipes, redirection, tee and so forth would be just as helpful.



  It's been forever for me, but I seem to recall that _Unix Power
Tools_, also published by O'Reilly, covers all of the above and much,
much more.  If others on this list second my suggestion, you might
want to obtain a copy.  Alternatively, maybe list members can suggest
alternatives?


Re: UNIX Power Tools. Third time I've heard that recommended. Guess  
I'll add that to my wish list.


Jerry Peek (http://www.oreillynet.com/pub/au/28 - a number of  
articles and book extracts linked here), one of the original authors  
of Unix Power Tools, has been running a series in Linux Magazine for  
a while now on working from the command line, including the  
inscrutable 2>&1 and other arcana.


Linux Magazine is online at http://www.linux-mag.com/ and posts their  
issues sixty days after publication at
http://www.linux-mag.com/backissues/.


Ben's other links are quite useful, too. The Answers Are Out There.  
The challenge is finding the answer you need now.



Re: extract string from filename

2006-01-12 Thread Kevin D. Clark

Zhao Peng writes:

> I'm back, with another "extract string" question. //grin


find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | sort 
-u > somefile.txt

or

perl -MFile::Find -e 'find(sub{$string2 = (split /_/)[1]; $seen{$string2}++; }, 
@ARGV); map { print "$_\n"; } keys(%seen)' FOLDERNAME

(which looks more readable as:

  perl -MFile::Find -e 'find(sub{ $string2 = (split /_/)[1];
  $seen{$string2}++;
 }, @ARGV);
  
 map { print "$_\n"; } keys(%seen)' \
  FOLDERNAME > somefile.txt

)

Either of which solves the problem that you describe.  (Actually, they
solve more than the problem that you describe, since it wasn't
apparent to me if you had any subdirectories here, but this is solved too.)

(substitute FOLDERNAME with your directory's name)


Honestly, the first solution I present is the way I would have solved
this problem myself.  Very fast this way.

Regards,

--kevin
-- 
(There are also 228 babies named Unique during the 1990s alone,
and 1 each of Uneek, Uneque, and Uneqqee.)

-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner


[but no Unix folks named their kids "uniq", apparently.  --kevin]



Re: extract string from filename

2006-01-12 Thread Python
On Thu, 2006-01-12 at 19:40 -0500, Zhao Peng wrote:
> For example:
> abc_st_nh_num.sas7bdat
> abc_st_vt_num.sas7bdat
> abc_st_ma_num.sas7bdat
> abcd_region_NewEngland_num.sas7bdat
> abcd_region_South_num.sas7bdat

You're not the only one learning here.  

I put these names into a file called str2-test-data

$ cut -d _ -f 2 str2-test-data | sort | uniq
region
st

I think that you could use:
ls | cut -d _ -f 2 | sort | uniq > str2-results.txt

-- 
Lloyd Kvam
Venix Corp



Re: extract string from filename

2006-01-12 Thread Ben Scott
On 1/12/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> I'm back, with another "extract string" question. //grin

  It sounds like you could use a tutorial on Unix text processing and
command line tools, specifically, one which addresses pipes and
redirection, as well as the standard text tools (grep, cut, sed, awk,
etc.).  While Paul's recommendation about the O'Reilly regular
expressions book is valid, I suspect it might be a little too focused
on regex's and not cover some of the *other* elements you seem to be
needing.

  It's been forever for me, but I seem to recall that _Unix Power
Tools_, also published by O'Reilly, covers all of the above and much,
much more.  If others on this list second my suggestion, you might
want to obtain a copy.  Alternatively, maybe list members can suggest
alternatives?

  There are also a number of free guides at the Linux Documentation
Project.  See:

http://www.tldp.org/guides.html

  Look for anything mentioning "bash" (the Bourne-again shell) or
scripting.  I can't speak as to how good they are, but you can't beat
the price.

  Anyway, on to your question...

> I tried to use "cut" commands, but can't even figure out how to use the
> filenames as input. Anyone care to offer me a hint?

  You'll want to pipe the output of "ls" to cut.  This should get you started:

  ls -1 | cut -d _ -f 2

  The "-1" switch to ls(1) tells it to output a single column of file
names.  Some versions of "ls" do this automagically when using
redirection, but it is best to be sure.  The "-d _" switch to cut(1)
tells cut to split fields on the underscore.  The "-f 2" selects the
second field.
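As a self-contained check of the -d/-f behavior just described (printf
stands in for "ls -1" here, using names from the thread):

```shell
# Underscore is the field delimiter (-d _); -f 2 selects the second field.
printf '%s\n' abc_st_nh_num.sas7bdat abcd_region_South_num.sas7bdat |
  cut -d _ -f 2
```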

  See also: sort(1), uniq(1)

  Hope this helps!

-- Ben "Unix plumber" Scott


Re: extract string

2006-01-12 Thread Thomas Charron
On 1/11/06, Ben Scott <[EMAIL PROTECTED]> wrote:
> > I felt really embarrassed with my stupid mistake. //blush
>
>   You think you were embarrassed?  There was a certain instance of
> someone accidentally hitting "Reply to All" to a list message which is
> still remembered to this day.  I won't mention any names because I
> don't want to get any squeegees in trouble.  ;-)

  God damned I wish I had an archive of that thread.

  Anyone have one?

  Thomas


Re: extract string

2006-01-11 Thread Kevin D. Clark

Zhao Peng <[EMAIL PROTECTED]> writes:

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think the 2nd one as an merely indication that the value for age
> column is missing, then the 3rd line will be be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

From my perspective, your file format makes it harder to parse.
If at all possible, I would suggest that if you can, you modify this
file's format.

Still, if this isn't possible, this works on your input:

perl -lne 's/,,/,MISSING/g; @F = split /,/; if (index($F[-1], "univ") != -1) { 
($u = $F[-1]) =~ y/"//d; print $u }'


Formatted more readably, this looks like this:

perl -lne 's/,,/,MISSING/g;
@F = split /,/; 

if (index($F[-1], "univ") != -1) {
  ($u = $F[-1]) =~ y/"//d;
  print $u
}'


This seems to be a reasonable solution to your problem.  I hope it
helps.


Just another Perl hacker,

--kevin
-- 
GnuPG ID: B280F24E



Re: extract string

2006-01-11 Thread Drew Van Zandt
perl

Split on the character pair ,"
Take the last element of the returned array; either remove the " at the
end, or replace the one you ate with the split.
Keep a running variable containing the largest length encountered so far.
Add 10 to be safe.  ;-)

Any regexp I have to think about for more than 30 seconds is unlikely
to be used unless it greatly improves my execution speed...and then
only if I have a LOT of data to process.  :-)

--Drew "Not showing you my crufty perl" VZ



Re: extract string

2006-01-11 Thread Thomas Charron
On 1/11/06, Bill McGonigle <[EMAIL PROTECTED]> wrote:
> On Jan 11, 2006, at 08:42, [EMAIL PROTECTED] wrote:
> > This poses an interesting problem. The "," is being used for two
> > purposes: a delimiter *AND* as a place holder.
>
> Now, for the Lazy, Perl regular expressions are a state machine of
> sorts.  I suspect you might be able to do the right thing with
> greedy/non-greedy matches.  Someone who lives and breathes regex might
> have a better handle on this.  It would take me two hours to get this
> one figured out.

  Hehe, it'd be one of those really, REALLY ugly regular expressions
that, when you stare at it long enough, looks like ASCII art in order
to make it 100%.  ;-)

  Thomas


Re: extract string

2006-01-11 Thread Ben Scott
On 1/11/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> Secondly I'm sorry for the big stir-up as to "homework problems" which
> flooded the list, since I'm origin of it.

  *Trust me*, that wasn't a "big" stir-up.  Search the list archives
for "taxes" if you want to see big ones.  The homework thread was even
more remarkable for being a debate we *haven't* had before ad nasueam.
 :)

> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a simliar sign on the top of "6" key on the
> keyboard ...

  I was wondering if it might be a transcription error.  While shell
syntax and regular expressions are very powerful, they tend to be very
cryptic as well.  That's why I spelled out what each character was.

  The ^ character is called a "caret", by the way.

> I felt really embarrassed with my stupid mistake. //blush

  You think you were embarrassed?  There was a certain instance of
someone accidentally hitting "Reply to All" to a list message which is
still remembered to this day.  I won't mention any names because I
don't want to get any squeegees in trouble.  ;-)

-- Ben


Re: extract string

2006-01-11 Thread Bill McGonigle

On Jan 11, 2006, at 08:42, [EMAIL PROTECTED] wrote:

This poses an interesting problem. The "," is being used for two 
purposes: a delimiter *AND* as a place holder.


I tried to prove to myself last night that this method would produce 
unresolvable ambiguities, but if you think like a state machine, 
character-by-character, it seems to work.


Now, for the Lazy, Perl regular expressions are a state machine of 
sorts.  I suspect you might be able to do the right thing with 
greedy/non-greedy matches.  Someone who lives and breathes regex might 
have a better handle on this.  It would take me two hours to get this 
one figured out.


This format sure makes the parser harder though, so if there's another 
way to get the data that's going to be desirable.  You can't use 
Text::CSV::Simple anymore, for instance, which gives you a 15-minute 
explicit reusable solution.


-Bill

-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Cell: 603.252.2606
http://www.bfccomputing.com/Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf



Re: extract string

2006-01-11 Thread Thomas Charron
On 1/11/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> First I really cannot be more grateful for the answers to my question
> from all of you, I appreciate your help and time. I'm especially touched
> by the outpouring of response on this list, which I have never
> experienced before anywhere else.

  I hope my little comment didn't seem mean. I was more poking fun at
the fact that if someone posted a similar post and called themselves a
Systems Administrator on a Windows network, comments similar to mine
would have come forth.  ;-)

> Secondly I'm sorry for the big stir-up as to "homework problems" which
> flooded the list, since I'm the origin of it.

  Nah, it wasn't a flood.  Trust me, once you see a flood, you'll know
it.  Usually, it's because someone says something political in nature.

> Kenny, regarding the missing column issue, let me try to explain it
> again.  Below is quoted from my original post:
> ===
> Also, if one column is missing, and "," is used to indicate that missing
> column, like the following (2nd column of 3rd line is missing):
>
> "name","age","school"
> "jerry","21","univ of Vermont"
> "jesse",,,"Dartmouth college"
> "jack","18","univ of Penn"
> "john","20","univ of south Florida"
> ===
> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think of the 2nd one as merely an indication that the value for the
> age column is missing, then the 3rd line will be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

  This is unusual, as typically a comma-delimited set of values would
simply have nothing between the commas, or a set of quotes with no data.

  Typically the line would look like this:

"jesse",,"Dartmouth college"

  Or

"jesse","","Dartmouth college"

> Paul, as to your "simplest by what measurement" question. I was thinking
> of both "easiest to remember" and "easiest to understand" when I was
> posting my question. Now I desire the "most efficient" approach. I know
> that will be my homework.

  If this is something that you will be doing repeatedly for different
file types, I'd highly suggest getting familiar with regular
expressions.  You've seen a small snippet in Kenny's example
'sed s/\"//g'.  The 's/\"//g' says to globally replace all quotes with
nothing (s = substitute, /1/2/ says 'replace everything matching 1
with 2', in this case a quote with nothing; g means globally, i.e. do
it more than just once).  Regular expressions are a powerful way to
parse text files based on a given pattern, to get at the data you want.

> Part of my primary job responsibilities is to convert raw data into SAS
> data sets. My "extract string" question comes from processing a raw data
> file in .txt format, which doesn't have any documentation, except the
> variable list. By looking at the raw data, I know that each variable is
> separated by a comma. For one particular variable (column) called
> "school", the length of some of its values is quite long (like: Univ of
> Wisconsin at Madison, Health Sci Ctr), but I don't know the definite
> length. I need to know it, because if the length I specify is not
> enough, only partial values will be read. Many of its values contain
> "univ", so I just thought if I could extract all strings containing
> "univ" from that variable (column), I would have a better chance to
> figure out the length of "school". That's why I had this question.

  Haven't even run it, but something Perl-like:

my $maxlen = 0;
while (<>) {
  /^(.*),(.*),(.*)$/;
  if (length($3) > $maxlen) {
    $maxlen = length($3);
  }
}
print "Longest string in third column is $maxlen\n";

  This would read on STDIN till it couldn't read anymore.  Each line,
it would split based on the commas (if the third column contains
commas, this won't work, because $2 or $1 would be greedy and gobble
some of the data, FYI), and check the length of the third field
against the max length.  If it's longer, assign it.  At the end, print
it out.

  This regular expression isn't great, but it's the 20-second typing
version.

  Thomas
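For comparison, the same longest-third-column question can be answered
without Perl; this is a sketch using two sample rows from the thread
(and, like the Perl above, it assumes the third column itself contains
no commas before the third field):

```shell
# cut -d , -f 3- keeps everything from the third comma-separated field
# on; awk tracks the longest line it sees (quote characters included).
printf '%s\n' '"jerry","21","univ of Vermont"' '"jack","18","univ of Penn"' |
  cut -d , -f 3- |
  awk '{ if (length($0) > max) max = length($0) } END { print max }'
```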


Re: extract string -- TIMTOWTDI

2006-01-11 Thread Paul Lussier
William D Ricker <[EMAIL PROTECTED]> writes:

>> On 1/10/06, Paul Lussier <[EMAIL PROTECTED]> wrote:
>> > > perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' <
>> > > abc.txt > def.txt
>> > Egads!

[outstanding explanation I didn't have time to write myself removed ]

> None of this is seriously obfuscatory golfing, but if someone wanted to
> say darn the cost of forking new processes off bash, 'awk/cut|grep|sed'
> is easier to read, well, I won't argue that it's easier for him/her
> to read, and they should do it that way -- unless they need to tune
> for performance.

I would, however, offer that if someone were to find
'awk/cut|grep|sed' easier to read, then that person a) wouldn't have
asked this question ;) and b) would certainly benefit from learning
perl for those times when "the cost of forking new processes off bash"
can't be ignored for some reason :) Additionally, perl offers the
benefit of a debugger which can be immensely helpful for even simple
"one liner" tasks.
-- 
Seeya,
Paul


Re: extract string

2006-01-11 Thread Kevin D. Clark

Zhao Peng writes:

> ... your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works.

It "works" but is it correct?

What happens if you pass it the following line of input?:

  "Aunivz","28","Cambridge Community College"

By your original problem description, you don't want to see "Cambridge
Community College" but there it is.

I might have overlooked something, but I believe that I have only seen
two people post correct solutions so far.

Just something to think about.

Regards,

--kevin
-- 
(There are also also 228 babies named Unique during the 1990s alone,
and 1 each of Uneekm, Uneque, and Uneqqee.)

-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner


[but no Unix folks named their kids "uniq", apparently.  --kevin]



Re: extract string

2006-01-11 Thread Paul Lussier
Zhao Peng <[EMAIL PROTECTED]> writes:

> First I really cannot be more grateful for the answers to my question
> from all of you, I appreciate your help and time. I'm especially
> touched by the outpouring of response on this list, which I have
> never experienced before anywhere else.

Zhao, this is a pretty amazing list, as you and many others have
discovered.  It's seldom I find as good, or complete, answers anywhere
else.  And most often, the ensuing discussion is more interesting,
educational, and enlightening than the original question posed.  (It's
often amusing to me when I google for an answer to a question and
within the top 10 returns from google is a reference to this list.
More amusing is when it was *I* who answered the question for someone
else which I am now asking :)

> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a similar sign to the one on top of the "6"
> key on the keyboard (so when I typed that sign, I felt strange that it
> was much smaller than /\, but didn't realize that they just are not
> the same thing), instead of forward slash and back slash. I felt
> really embarrassed with my stupid mistake. //blush

Ah, this makes so much more sense now.  So you in fact typed
something like:

  grep univ abc.txt | cut -f3 -d, | sed s/^/\>/g 

?

That still doesn't end up with a '>' in def.txt, but depending upon
exactly what you typed, I can certainly see where the use of ^ instead
of /\ could result in something like that.

For educational purposes, the use of ^ is to "anchor" following
pattern to match from the beginning of the line.  Therefore:

 sed 's/foo/bar/g'

and

 sed 's/^foo/bar/g'

are very different, since the former results in all occurrences of
'foo' being replaced with 'bar', whereas the latter only changes foo
to bar when foo is found at the beginning of the line.  The use of '$'
in a pattern does exactly the same thing, except for it anchors
patterns at the *end* of a line.
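A quick way to see the difference (the input lines here are made up for illustration):

```shell
# Without the anchor, every "foo" is replaced;
# with ^, only a "foo" at the start of the line is.
printf 'foo bar\nbar foo\n' | sed 's/foo/X/g'   # X bar / bar X
printf 'foo bar\nbar foo\n' | sed 's/^foo/X/'   # X bar / bar foo
```
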

Btw, I highly recommend reading the O'Reilly book on Regular
Expressions.  If you're going to be doing a lot of this type of data
mining, a solid understanding of regexps and mastery of perl will make
your life significantly more fun.

Also, you might want to play with with writing perl/shell scripts that
output data parseable by gnuplot which allow you to auto-generate some
rather interesting and complicated graphs of the data (I know SAS can
do all this, but I bet it's nowhere near as interesting or fun as
learning the UNIX way of doing it, and you don't need a SAS license
either ;)

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think of the 2nd one as merely an indication that the value for the
> age column is missing, then the 3rd line will be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

If you're going to be doing a lot of this type of thing, then perl
will most definitely be your best friend :)

> Paul, as to your "simplest by what measurement" question. I was
> thinking of both "easiest to remember" and "easiest to understand"
> when I was posting my question. Now I desire for "most efficient"
> approach. I know that will be my homework.

Well, again, most efficient by what measurement.  In the long run, I'm
going to bet it's in your best interests to learn perl, since it's one
tool which will allow you write rather small and arbitrarily complex
scripts which would mostly obviate the need to learn several different
tools like cut, sed, awk, comm, etc.  In fact, learning perl will
likely lead you to learn about these other tools over time as the
situation dictates, but make you vastly more productive in the short
term.  Since perl excels at textual manipulation, it's perfect for
this type of data analysis.  And, since perl, combined with gnuplot,
is simple to run from an Apache web server... Well, I'm sure your
imagination will lead you to wherever you need to go :)

Good luck, and please feel free to post more interesting questions.

-- 

Seeya,
Paul


Re: extract string

2006-01-11 Thread klussier

 -- Original message --
From: Zhao Peng <[EMAIL PROTECTED]>
> Hi All,
> 

> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a similar sign to the one on top of the "6"
> key on the keyboard (so when I typed that sign, I felt strange that it
> was much smaller than /\, but didn't realize that they just are not
> the same thing), instead of forward slash and back slash. I felt
> really embarrassed with my stupid mistake. //blush

It happens. Believe me, I have done much dumber things in my time :-)

> Kenny, regarding missing column issue, let me try to explain it again. 
> Below is quoted from my original post:

[SNIP]

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think of the 2nd one as merely an indication that the value for the
> age column is missing, then the 3rd line will be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

This poses an interesting problem. The "," is being used for two purposes: a
delimiter *AND* a place holder. Unfortunately, cut and the like will see it
as a delimiter and only a delimiter; it's what they do. I think that you may
need to use the awk line that I sent, or some of the perl one-liners, to get
just the last column. Otherwise, you will end up with empty fields.
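For illustration, using the "jesse" line quoted earlier in the thread: cut dutifully returns the (empty) third field, while awk's $NF grabs the last field no matter how many empties precede it (a sketch, not the exact command from the earlier message):

```shell
line='"jesse",,,"Dartmouth college"'
printf '%s\n' "$line" | cut -d, -f3           # prints an empty line
printf '%s\n' "$line" | awk -F, '{print $NF}' # prints "Dartmouth college"
```
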


> For one particular variable(column) called 
> "school", the length of some of its value is quite long(like: Univ of 
> Wisconsin at Madison, Health Sci Ctr), but I don't know the definite 
> length. I need to know it, because if the length I specify is not 
> enough, only partial values will be read. Many of its values contain 
> "univ", so I just thought if I could extract all strings containing 
> "univ" from that variable(column), I will have a better chance to figure 
> out the length of "school". That's why I had this question.

This is going to be another problem. Every "," that is used is going to be
seen as a delimiter, even when the school name itself contains a ",", as
there is between Madison and Health above. That means that taking just the
last field will not work either. I think that the easiest thing to do in
this case is to change the delimiter to something that is unlikely to be
found in any of the columns, like a ":".

C-Ya,
Kenny


Re: extract string

2006-01-11 Thread Jon maddog Hall
Zhao,

I am really busy right now, so I have not read all of the responses to your
problem completely, but I did notice this:


[EMAIL PROTECTED] said:
> You said that "there is an extra column in the 3rd line". I disagree with
> you from my perspective. As you can see, there are 3 commas in between
> "jesse" and "Dartmouth college". For these 3 commas, again, if we think of
> the 2nd one as merely an indication that the value for the age column is
> missing, then the 3rd line will be read as ["jesse", MISSING, "Dartmouth
> college"], not ["jesse",empty,empty, "Dartmouth college"] as you suggested.

A lot of these textual commands depend on the concept of a "field delimiter".
In your first example, it seemed clear that a possible "field delimiter" was
the comma (","), and so if you saw two commas together, it represented an
"empty" field.  Not a "missing" field, because the field was technically still
there... it just had NO data in it.  When you included the line:

 "jesse",,,"Dartmouth college"

and claimed that the middle comma represented a missing age, to a textual
based scanning program that has been told that the comma is a field separator
means that there are now four fields in the line, not just three.
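This is easy to verify by letting awk count fields with the comma as separator (the rows are the thread's sample data, minus the stray space after "jerry"):

```shell
printf '%s\n' '"jerry","21","univ of Vermont"' | awk -F, '{print NF}'  # 3
printf '%s\n' '"jesse",,,"Dartmouth college"'  | awk -F, '{print NF}'  # 4
```
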

If, from the beginning, you had shown that you meant for the comma to be used
both as a delimiter and as a piece of data, then a lot of the answers would
have been completely different (and probably considerably more complex).

md
-- 
Jon "maddog" Hall
Executive Director   Linux International(R)
email: [EMAIL PROTECTED] 80 Amherst St. 
Voice: +1.603.672.4557   Amherst, N.H. 03031-3032 U.S.A.
WWW: http://www.li.org

Board Member: Uniforum Association, USENIX Association

(R)Linux is a registered trademark of Linus Torvalds in several countries.
(R)Linux International is a registered trademark in the USA used pursuant
   to a license from Linux Mark Institute, authorized licensor of Linus
   Torvalds, owner of the Linux trademark on a worldwide basis
(R)UNIX is a registered trademark of The Open Group in the USA and other
   countries.



Re: extract string

2006-01-10 Thread Zhao Peng

Hi All,

First I really cannot be more grateful for the answers to my question
from all of you, I appreciate your help and time. I'm especially touched
by the outpouring of response on this list, which I have never
experienced before anywhere else.


Secondly I'm sorry for the big stir-up as to "homework problems" which
flooded the list, since I'm the origin of it.


Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
works. I mis-read /\ as a similar sign to the one on top of the "6" key
on the keyboard (so when I typed that sign, I felt strange that it was
much smaller than /\, but didn't realize that they just are not the same
thing), instead of forward slash and back slash. I felt really
embarrassed with my stupid mistake. //blush


Kenny, regarding missing column issue, let me try to explain it again. 
Below is quoted from my original post:



Also, if one column is missing, and "," is used to indicate that missing 
column, like the following (2nd column of 3rd line is missing):

"name","age","school"
"jerry" ,"21","univ of Vermont"
"jesse",,,"Dartmouth college"
"jack","18","univ of Penn"
"john","20","univ of south Florida"
===

You said that "there is an extra column in the 3rd line". I disagree
with you from my perspective. As you can see, there are 3 commas in
between "jesse" and "Dartmouth college". For these 3 commas, again, if
we think of the 2nd one as merely an indication that the value for the
age column is missing, then the 3rd line will be read as ["jesse",
MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
college"] as you suggested.


Paul, as to your "simplest by what measurement" question. I was thinking 
of both "easiest to remember" and "easiest to understand" when I was 
posting my question. Now I desire for "most efficient" approach. I know 
that will be my homework.


BTW,
A bit about me: I'm a junior SAS programmer at Dartmouth Medical school. 
(FYI: core strength of SAS lies in statistical analysis, I think, so you 
could say it's a statistical software, check www.sas.com). We run SAS on 
a RedHat server, but I basically knew nothing about Linux before I
started working in this position (July 2005). Fortunately, SAS
programming doesn't require much Linux knowledge. However, as you can
imagine, I at least need to know some basic Linux commands since I work
on the Linux platform.


Part of my primary job responsibilities is to convert raw data into SAS 
data sets. My "extract string" question comes from processing a raw data 
file in .txt format, which doesn't have any documentation, except the 
variable list. By looking at the raw data, I know that each variable is 
separated by a comma. For one particular variable(column) called 
"school", the length of some of its value is quite long(like: Univ of 
Wisconsin at Madison, Health Sci Ctr), but I don't know the definite 
length. I need to know it, because if the length I specify is not
enough, only partial values will be read. Many of its values contain
"univ", so I just thought if I could extract all strings containing 
"univ" from that variable(column), I will have a better chance to figure 
out the length of "school". That's why I had this question.


Thank you all again!

Zhao


Re: extract string -- TIMTOWTDI

2006-01-10 Thread William D Ricker
> On 1/10/06, Paul Lussier <[EMAIL PROTECTED]> wrote:
> > > perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' <
> > > abc.txt > def.txt
> > Egads!

That's a literal start at a Perl "bring the grep, sed, and cut-or-awk
into one process", but it's not maximally Perl-ish.  It is also
inefficient: it does the side-effecting "" removal before discarding
non-univs.  It is a literal translation of a `cut | sed | grep` pipe,
not of the `cut|grep|sed` pipe shown earlier.  Of course, if the
requirements allowed it, a `grep|cut|sed` pipe would be the best
shell implementation -- but that relies upon knowing all 'univ'
are in the desired column, which we haven't been granted.

If it weren't for the desire to drop the quotes, Perl couldn't beat
Cut's golf-score (key stroke count) on this one anyway, but we can
try to optimize expressivity while saving two of three process forks.

The Perl Motto is TIMTOWTDI: There Is More Than One Way To Do It.
(We've already seen that this is often true for BASH too.)  This is
usually a good thing, as often some are better for some requirements
than for others.

For generality in real code, I'd like to explicitly ignore the header
line on this sort of CSV file:

$ perl -F, -lane 'next if $.==1 or $F[-1] !~/univ/; print $F[-1]=~m/"(.*)"/;' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$

As with one prior posting, the '-naF,' args cause Perl to auto-split on
',' into @F on each line.  I normally use '-F, -lane' on
one-liners, since it's memorable.

The '$.==1 or' is not strictly required, since the top line of the
sample file had "school", not "university", for the column head; but
if it had "university or school" or "school/univ" on line 1, it would
be required.

$F[-1] is Perl's equivalent of AWK's $NF, referring to the last
column from the end instead of by number.  (By number is notoriously
error-prone with 0-based field counting.)  $F[-2] means last-but-one,
etc., and you can slice with them as @F[-6..-2].

Rather than remove the "" with s/"//g, I've captured what's between
them and printed that.

We can make more use of -F ... we'll split on all the punctuation.

$ perl -F'/^"|","|"$/' -lane 'next if $.==1 or $F[-1] !~/univ/i; print 
$F[-1]' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$

Of course, in some CSV files the quotes are optional, in which case
we can do

$perl -F'/^"|"?\s*,"?|"$/' -lane 'next if $.==1 or $F[-1] !~/univ/i; print 
$F[-1]' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$

Alternatively, to print **any** quoted phrase containing univ,
whether in last column or not, using the commas ..

$perl -F, -lane 'for (@F){s/"//g; print if /univ/i}' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$

Or, ignoring the commas, we can just use the quotes to capture what's
between them, but only if there's a 'univ' in between.  I started
sneaking in a /i flag to be case-insensitive above, and I'll continue
here ...

$perl -lane 'print for m{ " ( [^"]*? univ [^"]* ) " }xig' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$

[the ? isn't required but it should help efficiency.]

or

$ perl -lane 'print for grep {/univ/i} m{"([^"]*)"}g' schools.txt
univ of Vermont
univ of Penn
univ of south Florida

There's also a CPAN module or two for processing CSV files that
handles the commas and quotes in CSV files ...
  http://search.cpan.org/search?query=Text%3A%3ACSV&mode=all
Your Linux distro should have Text::CSV_XS as an apt/yum/rpm/...
module option, or grab it from CPAN and build. (It has an XS =>
.c module, so it's ripping fast, but has to be make'd.)


None of this is seriously obfuscatory golfing, but if someone wanted to
say darn the cost of forking new processes off bash, 'awk/cut|grep|sed'
is easier to read, well, I won't argue that it's easier for him/her
to read, and they should do it that way -- unless they need to tune
for performance.


-- 
/"\ Bill Ricker  N1VUX  [EMAIL PROTECTED]
\ / http://world.std.com/~wdr/   
 X  Member of the ASCII Ribbon Campaign Against HTML Mail
/ \


Re: extract string

2006-01-10 Thread Paul Lussier
Ben Scott <[EMAIL PROTECTED]> writes:

> On 1/10/06, Jon maddog Hall <[EMAIL PROTECTED]> wrote:
>> I was the senior systems administrator for Bell Labs in North Andover, MA.  I
>> got the job without ever having seen a UNIX system.
>
>   Well, really.  How many people *had* seen a UNIX system, back then?  ;-)

Two; *Kernighan* and *Ritchie* ;)  This *was* Bell Labs, right ;)

-- 

Seeya,
Paul


Re: extract string

2006-01-10 Thread Kevin D. Clark

Ben Scott writes:

>   Is there a tool that quickly and easily extracts one or more columns
> of text (separated by whitespace) from an output stream?  I'm familiar
> with the
>
>   awk '{ print $3 }'
>
> mechanism, but I've always felt that was clumsy.  I've tried to get
> cut(1) to do it in the past, but the field separator semantics appear
> to assume one and only one separator, not "whitespace" (one or more
> space or tab characters).
>
>   I get the feeling there is some command or switch I'm not aware of
> that I should be using.  This hypothetical command might work something
> like this:
>
>  ls -l | foo 3
>
> to extract just the third column (username) from the ls(1) output.

Thoughts:

0:  I've always found awk and cut to be very convenient for these
operations.  For complex things, I recommend Perl.  In particular,
awk and Perl allow for pattern separators, as you desire.

1:  You might find awk's -F option to be useful.

2:  Something like this is always fun:

   perl -F: -ane 'print join " ", @F[0,5,6]' /etc/passwd

3:  If you really want foo, how about this:

   foo() {
     if [ $# -eq 0 ] ; then
       foo 0
     else
       awk "{ print `echo "$@" | sed 's/\([0-9]*\)/\\$\1/g'` }"
     fi
   }

   I leave it to the reader to improve this if so desired.  (-:

Hope this helps,

--kevin
-- 
(There are also also 228 babies named Unique during the 1990s alone,
and 1 each of Uneekm, Uneque, and Uneqqee.)

-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner



Re: extract string

2006-01-10 Thread klussier

 -- Original message --
From: Zhao Peng <[EMAIL PROTECTED]>
> Kenny,
> 
> Thank you for your suggestion.
> 
> The following line works:
> grep univ abc.txt | cut -f3 -d, >> dev.txt.
> 
> 
> While the following line intended to remove quotes does NOT work:
> grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
> It resulted in a line starting with a ">" prompt, and no output dev.txt
> 
> Could you please double-check or modify it?

I have checked it, and it works exactly as it should. 

> Also, if one column is missing, and "," is used to indicate that missing 
> column, like the following (2nd column of 3rd line is missing):
> 
> "name","age","school"
> "jerry" ,"21","univ of Vermont"
> "jesse",,,"Dartmouth college"
> "jack","18","univ of Penn"
> "john","20","univ of south Florida"
> 
> Does the "cut" approach still apply? If not, what command would you 
> suggest to address this missing issue?
> 

A column is not missing, it is just empty. It is still delimited by a ",", so 
it is still a valid column. However, in the example above, there is an extra 
column in the 3rd line. All of the other lines have "name,age,school". Line 3 
has "name,empty,empty,school".

Now, if you know that the school is always going to be the last field, you may 
not want to use cut at all. You might want to use something like :

grep univ abc.txt | awk -F, '{print $NF}'| sed 's/\"//g'

awk takes the place of `cut` in this case by looking at the last field 
delimited by a ",".

FYI,
Kenny


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Jon maddog Hall <[EMAIL PROTECTED]> wrote:
> I was the senior systems administrator for Bell Labs in North Andover, MA.  I
> got the job without ever having seen a UNIX system.

  Well, really.  How many people *had* seen a UNIX system, back then?  ;-)

  (Sorry, couldn't resist.)

> It was that experience that led me to page through section (1) of the manual
> every six months, just to remind myself of the gold that was hidden in those
> pages.

  I learned the plurality of what I know about the shell by reading
the bash(1) man "page" (and most of that while sitting at VT-320's in
UNH computer clusters, no less).  Never underestimate the power of
RTFM.  :-)

-- Ben


Re: extract string

2006-01-10 Thread Jon maddog Hall

[EMAIL PROTECTED] said:
>> While the following line intended to remove quotes does NOT work:
>> grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
>> It resulted in a line starting with a ">" prompt, and no output dev.txt

> I can't see any reason why that should be happening.  As a matter of
> fact, I tried that exact command line on my system and it worked exactly as
> (specified|advertised|expected). 

Might this not be affected by different command interpreters?  sh vs csh vs ksh
vs bash?


>Simplest by what measurement?

> - fewest processes spawned
> - most efficient
> - least amount of typing
> - easiest to remember
> - easiest to understand
> - ability to debug
> - extensibility

Most portable?

[EMAIL PROTECTED] said:
> While it does seem like a few man page pointers would be better (more
> instructive in the long run), I have to admit I wasn't familiar with cut, so
> I've learned something from this one. 

I still remember the time that I first was learning UNIX (all capital 
letters)...

I was the senior systems administrator for Bell Labs in North Andover, MA.  I
got the job without ever having seen a UNIX system.  Of course I had programmed
on dozens of different OS systems... but there I was, late at night, trying
to solve much the same type of problem that was solved here.

After thinking about it, and wondering if I would have to write a program
to do it, I thought to myself..."I do not KNOW that UNIX has a command that
could do this, but I am willing to BET it does."  And I started paging through
section 1 of the manual... the shell commands.  Sure enough, I came to "cut(1)"
and it was exactly what I needed.  (Later on I was glad the command was not
at the back of the section, being named something like "Yet Another Cut
Command"... hmmm, maybe in a way it was) :-}

It was that experience that led me to page through section (1) of the manual
every six months, just to remind myself of the gold that was hidden in those
pages.

Warmest regards,

maddog
-- 
Jon "maddog" Hall
Executive Director   Linux International(R)
email: [EMAIL PROTECTED] 80 Amherst St. 
Voice: +1.603.672.4557   Amherst, N.H. 03031-3032 U.S.A.
WWW: http://www.li.org

Board Member: Uniforum Association, USENIX Association

(R)Linux is a registered trademark of Linus Torvalds in several countries.
(R)Linux International is a registered trademark in the USA used pursuant
   to a license from Linux Mark Institute, authorized licensor of Linus
   Torvalds, owner of the Linux trademark on a worldwide basis
(R)UNIX is a registered trademark of The Open Group in the USA and other
   countries.



Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Drew Van Zandt <[EMAIL PROTECTED]> wrote:
> While it does seem like a few man page pointers would be better (more
> instructive in the long run), I have to admit I wasn't familiar with cut, so
> I've learned something from this one.

  Since we're on the subject...

  Is there a tool that quickly and easily extracts one or more columns
of text (separated by whitespace) from an output stream?  I'm familiar
with the

  awk '{ print $3 }'

mechanism, but I've always felt that was clumsy.  I've tried to get
cut(1) to do it in the past, but the field separator semantics appear
to assume one and only one separator, not "whitespace" (one or more
space or tab characters).

  I get the feeling there is some command or switch I'm not aware of
that I should be using.  This hypothetical command might work something
like this:

 ls -l | foo 3

to extract just the third column (username) from the ls(1) output.

-- Ben


Re: extract string

2006-01-10 Thread klussier

 -- Original message --
From: Paul Lussier <[EMAIL PROTECTED]>
> [EMAIL PROTECTED] writes:
> 
> > Actually, if you are looking for only lines that contain the string "univ", 
> then you would want to grep for it:
> >
> > grep univ abc.txt | cut -f3 -d, >> dev.txt.
> 
> Why are you appending to dev.txt? (or def.txt even).  Are you assuming
> the file already exists and don't want to over-write the contents?
> 

That is exactly what I was thinking. Even if it isn't being appended to, the
result is essentially the same. Unless, of course, you want to over-write the
file; then that wouldn't work out too well.  It's better to be safe than sorry :-)
Besides, I've been doing a lot of this sort of thing in the last few days, and
the >> just sort of rolled off my fingertips.
 
> > Paul's example would give you the third field of each line, even if
> > they don't have "univ" in them. Now, if you wanted to remove the
> > quotes, then you would need something like:
> >
> >
> > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt 
> 
> yep, that should work, but no need for the >> when a simple > will do.

What? Two redirects are better than one, right :-)

C-Ya,
Kenny


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Paul Lussier <[EMAIL PROTECTED]> wrote:
> > perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' <
> > abc.txt > def.txt
>
> Egads!

  Egads?

-- Ben "As I was saying about explanation..." Scott


Re: extract string

2006-01-10 Thread Drew Van Zandt
While it does seem like a few man page pointers would be better (more
instructive in the long run), I have to admit I wasn't familiar with
cut, so I've learned something from this one.

--Drew



Re: extract string

2006-01-10 Thread Paul Lussier
Zhao Peng <[EMAIL PROTECTED]> writes:

> While the following line intended to remove quotes does NOT work:
> grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
> It resulted in a line starting with a ">" prompt, and no output dev.txt

I can't see any reason why that should be happening.  As a
matter of fact, I tried that exact command line on my system and it
worked exactly as (specified|advertised|expected).

> Could you please double-check or modify it?

Well, I think you've received lots of good help.  Perhaps you should
spend some time reading the relevant man pages and trying to
understand exactly what has been offered, so you can double-check
and/or modify it?

> Also, if one column is missing, and "," is used to indicate that
> missing column, like the following (2nd column of 3rd line is
> missing):
[...]
> Does the "cut" approach still apply? If not, what command would you
> suggest to address this missing issue?

man cut will answer this question.
-- 

Seeya,
Paul


Re: extract string

2006-01-10 Thread Michael ODonnell


> Ooo, look! - a new business model for Lugs!

I happen to like these threads and far from regarding
them as a burden I think they're a pleasant diversion
and extremely useful as learning opportunities.

But I've been asking for a long time when our IPO will
be happening; we've got more talent and longevity than
most of the scams^H^H^H^H^Hventures that got millions
during the dotcom era, and when the juices are really
flowing most VC firms appear(ed) to regard those pesky
business plans as optional, anyway...;->

 


Re: extract string

2006-01-10 Thread Paul Lussier
[EMAIL PROTECTED] writes:

> Actually, if you are looking for only lines that contain the string "univ", 
> then you would want to grep for it:
>
> grep univ abc.txt | cut -f3 -d, >> dev.txt.

Why are you appending to dev.txt? (or def.txt even).  Are you assuming
the file already exists and don't want to over-write the contents?


> Paul's example would give you the third field of each line, even if
> they don't have "univ" in them. Now, if you wanted to remove the
> quotes, then you would need something like:
>
>
> grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt 

yep, that should work, but no need for the >> when a simple > will do.
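The difference is easy to demonstrate (the temp filenames here are made up):

```shell
echo one  > /tmp/clobber.txt
echo two  > /tmp/clobber.txt    # ">" truncates first: only "two" survives
echo one  > /tmp/append.txt
echo two >> /tmp/append.txt     # ">>" appends: both lines survive
```
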

-- 

Seeya,
Paul


Re: extract string

2006-01-10 Thread Paul Lussier
Ben Scott <[EMAIL PROTECTED]> writes:

> Here's one way, as a Perl one-liner:
>
> perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' <
> abc.txt > def.txt

Egads!
-- 

Seeya,
Paul


Re: extract string

2006-01-10 Thread Paul Lussier
Zhao Peng <[EMAIL PROTECTED]> writes:

> Hi
>
> Suppose that I have a file called abc.txt, which contains the
> following 5 lines (columns are delimited by ",")
>
> "name","age","school"
> "jerry" ,"21","univ of Vermont"
> "jesse","28","Dartmouth college"
> "jack","18","univ of Penn"
> "john","20","univ of south Florida"
>
> My OS is RedHat Enterprise, how could I extract the string which
> contains "univ" and create an output file called def.txt, which only
> has 3 following lines:
>
> univ of Vermont
> univ of Penn
> univ of south Florida
>

Here are 3, pick your poison:

  awk -F, '/univ/ && gsub(/\"/,"") {print $3}' abc.txt > def.txt
  perl -F, -ane 'if (/univ/) { $F[2] =~ s/\"//g; print $F[2]};' abc.txt \
> def.txt
  grep univ abc.txt | cut -f3 -d, | sed 's/\"//g' > def.txt


> Please suggest the simplest command line approach.

Simplest by what measurement?

 - fewest processes spawned
 - most efficient
 - least amount of typing
 - easiest to remember
 - easiest to understand
 - ability to debug
 - extensibility

Simplest is a rather subjective measure...





-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Jeff Kinz
Ooo, look! A new business model for LUGs!

Achieve LUG financial independence today!

Now your LUG can achieve its financial funding goals simply by charging
25 cents for each shell scripting homework problem answered and 50 cents
for extended explanations such as rendered below. :-)

All we need now is a PayPal account. :-)

(rendered tongue at least halfway in cheek, all proceeds to
go to GNHLUG's tab at Martha's)



On Tue, Jan 10, 2006 at 01:23:14PM -0500, Ben Scott wrote:
> On 1/10/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> > While the following line, intended to remove the quotes, does NOT work:
> > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
> > It resulted in a line starting with a ">" prompt, and did not output dev.txt
> 
>   The ">" prompt indicates the shell thinks you are still in the
> middle of some shell construct, and is prompting you to finish it.  It
> usually manifests due to an unclosed quote.  Most likely, something is
> eating the backslash that appears before the double-quote in the sed
> command.  It should be
> 
>  sed s/\"//g
> 
> where the second word contains the characters letter s, a forward
> slash (/), a backslash (\), a double-quote, two forward slashes (//),
> and the letter g.  The backslash tells the shell that the following
> character (in this case, a quote) is not to be interpreted as shell
> syntax, but instead passed to the specified command "as is".  This is
> called an "escape character" or a "shell escape".
> 
>   If you're putting this shell command inside some other program or
> shell, you may find *that* program also interprets the backslash this
> way.  So you need to escape it *twice*:
> 
>  sed s/\\"//g
> 
> The characters \\ get interpreted by the first program as "literal
> backslash here".  The shell then receives a single backslash, which it
> applies to the double-quote.
> 
>   Shell escapes can get very, very messy.
> 
> -- Ben
> ___
> gnhlug-discuss mailing list
> gnhlug-discuss@mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
> 

-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

"The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding." - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> While the following line, intended to remove the quotes, does NOT work:
> grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
> It resulted in a line starting with a ">" prompt, and did not output dev.txt

  The ">" prompt indicates the shell thinks you are still in the
middle of some shell construct, and is prompting you to finish it.  It
usually manifests due to an unclosed quote.  Most likely, something is
eating the backslash that appears before the double-quote in the sed
command.  It should be

 sed s/\"//g

where the second word contains the characters letter s, a forward
slash (/), a backslash (\), a double-quote, two forward slashes (//),
and the letter g.  The backslash tells the shell that the following
character (in this case, a quote) is not to be interpreted as shell
syntax, but instead passed to the specified command "as is".  This is
called an "escape character" or a "shell escape".

  If you're putting this shell command inside some other program or
shell, you may find *that* program also interprets the backslash this
way.  So you need to escape it *twice*:

 sed s/\\"//g

The characters \\ get interpreted by the first program as "literal
backslash here".  The shell then receives a single backslash, which it
applies to the double-quote.

  Shell escapes can get very, very messy.
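
For example, each echo below shows what the shell actually hands to the
command after processing; single-quoting the whole sed expression is
often the least error-prone option:

```shell
# Three equivalent ways to hand sed the expression s/"//g:
echo s/\"//g     # backslash-escaped:  prints s/"//g
echo 's/"//g'    # single-quoted:      prints s/"//g
echo "s/\"//g"   # double-quoted:      prints s/"//g
# In practice, single-quoting sidesteps the escaping headaches above:
printf '"univ of Penn"\n' | sed 's/"//g'   # prints: univ of Penn
```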

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Whelan, Paul <[EMAIL PROTECTED]> wrote:
> Like so: cat abc.txt | cut -d, -f3

1.  Randal Schwartz likes to call that UUOC (Useless Use Of cat).  :-)
 You can just do this instead:

  cut -d, -f3 < abc.txt

If you like the input file at the start of the command line, that's legal, too:

 < abc.txt cut -d, -f3

You can read more about UUOC at: http://sial.org/code/shell/tips/useless-cat/
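
Both spellings can be checked quickly; demo.csv here is a throwaway
example file, not the OP's abc.txt:

```shell
# The position of '< demo.csv' on the command line doesn't matter to
# the shell; cut receives the same stdin either way.
printf 'a,b,c\n' > demo.csv
cut -d, -f3 < demo.csv    # prints: c
< demo.csv cut -d, -f3    # same result: c
```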

2. The above simply returns the third field.  OP appeared to want only
lines containing "univ".  So:

 cut -d, -f3 < abc.txt | grep univ

3. I'll leave the quote removal as an exercise to the reader.  ;-)

-- Ben "Pedantic" Scott  ;-)
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> how could I extract the string which
> contains "univ" and create an output file called def.txt, which only has
> 3 following lines:

Here's one way, as a Perl one-liner:

perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' <
abc.txt > def.txt

That trims out the quotes, as it appears you want.  The search for
"univ" is case-sensitive.

Broken down into a script with comments:

#!/usr/bin/perl -n
split ","; # split input fields into @_ (split at commas)
$_ = $_[2];# grab the third field, put into default workspace ($_)
s/(^")|("$)//g;# delete double-quote (") at start and/or end
print if m/univ/;  # print if contains "univ"

  HTH,

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Zhao Peng

Kenny,

Thank you for your suggestion.

The following line works:
grep univ abc.txt | cut -f3 -d, >> dev.txt.


While the following line, intended to remove the quotes, does NOT work:
grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
It resulted in a line starting with a ">" prompt, and did not output dev.txt

Could you please double-check or modify it?

Also, if one column is missing, and "," is used to indicate that missing 
column, like the following (2nd column of 3rd line is missing):


"name","age","school"
"jerry" ,"21","univ of Vermont"
"jesse",,,"Dartmouth college"
"jack","18","univ of Penn"
"john","20","univ of south Florida"

Does the "cut" approach still apply? If not, what command would you 
suggest to address this missing issue?
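
(For reference, a quick check of what cut does with the extra commas: it
keeps counting delimiters, so a fixed field number no longer points at
the school column on that line:)

```shell
# With the extra commas, field 3 of the "jesse" line is empty and the
# school has shifted over to field 4.
printf '"jesse",,,"Dartmouth college"\n' | cut -d, -f3   # prints an empty line
printf '"jesse",,,"Dartmouth college"\n' | cut -d, -f4   # prints: "Dartmouth college"
```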


Thank you again.
Zhao


[EMAIL PROTECTED] wrote:

Actually, if you are looking for only lines that contain the string "univ", 
then you would want to grep for it:

grep univ abc.txt | cut -f3 -d, >> dev.txt.

Paul's example would give you the third field of each line, even if they don't have 
"univ" in them. Now, if you wanted to remove the quotes, then you would need 
something like:


grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt 


FYI,
Kenny

 -- Original message --
From: "Whelan, Paul" <[EMAIL PROTECTED]>
  

Like so: cat abc.txt | cut -d, -f3

Thanks.

-Original Message-
From: Zhao Peng [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 10, 2006 11:51 AM

To: gnhlug-discuss@mail.gnhlug.org
Subject: extract string

Hi

Suppose that I have a file called abc.txt, which contains the following 
5 lines (columns are delimited by ",")


"name","age","school"
"jerry" ,"21","univ of Vermont"
"jesse","28","Dartmouth college"
"jack","18","univ of Penn"
"john","20","univ of south Florida"

My OS is RedHat Enterprise, how could I extract the string which 
contains "univ" and create an output file called def.txt, which only has
3 following lines:

univ of Vermont
univ of Penn
univ of south Florida

Please suggest the simplest command line approach.

Thank you.
Zhao
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


RE: extract string

2006-01-10 Thread klussier
Actually, if you are looking for only lines that contain the string "univ", 
then you would want to grep for it:

grep univ abc.txt | cut -f3 -d, >> dev.txt.

Paul's example would give you the third field of each line, even if they don't 
have "univ" in them. Now, if you wanted to remove the quotes, then you would 
need something like:


grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt 

FYI,
Kenny

 -- Original message --
From: "Whelan, Paul" <[EMAIL PROTECTED]>
> Like so: cat abc.txt | cut -d, -f3
> 
> Thanks.
> 
> -Original Message-
> From: Zhao Peng [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, January 10, 2006 11:51 AM
> To: gnhlug-discuss@mail.gnhlug.org
> Subject: extract string
> 
> Hi
> 
> Suppose that I have a file called abc.txt, which contains the following 
> 5 lines (columns are delimited by ",")
> 
> "name","age","school"
> "jerry" ,"21","univ of Vermont"
> "jesse","28","Dartmouth college"
> "jack","18","univ of Penn"
> "john","20","univ of south Florida"
> 
> My OS is RedHat Enterprise, how could I extract the string which 
> contains "univ" and create an output file called def.txt, which only has
> 3 following lines:
> 
> univ of Vermont
> univ of Penn
> univ of south Florida
> 
> Please suggest the simplest command line approach.
> 
> Thank you.
> Zhao
> ___
> gnhlug-discuss mailing list
> gnhlug-discuss@mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
> 
> ___
> gnhlug-discuss mailing list
> gnhlug-discuss@mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


RE: extract string

2006-01-10 Thread Whelan, Paul
Like so: cat abc.txt | cut -d, -f3

Thanks.

-Original Message-
From: Zhao Peng [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 10, 2006 11:51 AM
To: gnhlug-discuss@mail.gnhlug.org
Subject: extract string

Hi

Suppose that I have a file called abc.txt, which contains the following 
5 lines (columns are delimited by ",")

"name","age","school"
"jerry" ,"21","univ of Vermont"
"jesse","28","Dartmouth college"
"jack","18","univ of Penn"
"john","20","univ of south Florida"

My OS is RedHat Enterprise, how could I extract the string which 
contains "univ" and create an output file called def.txt, which only has
3 following lines:

univ of Vermont
univ of Penn
univ of south Florida

Please suggest the simplest command line approach.

Thank you.
Zhao
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss