Re: extract string from filename

2006-01-13 Thread Ted Roche

On Jan 12, 2006, at 8:25 PM, Ben Scott wrote:


  It sounds like you could use a tutorial on Unix text processing and
command line tools, specifically, one which addresses pipes and
redirection, as well as the standard text tools (grep, cut, sed, awk,
etc.).  While Paul's recommendation about the O'Reilly regular
expressions book is valid, I suspect it might be a little too focused
on regex's and not cover some of the *other* elements you seem to be
needing.


Gee, I wonder if that would be a good topic for a meeting g.

Bruce Dawson and David Berube did a presentation on Regular  
expressions that helped me grasp what they were and why I'd want to  
know more. Bought the Reg Exp book on my next visit to SoftPro s.


A similar kind of presentation that explained the place of sed, grep,  
awk, pipes, redirection, tee and so forth.



  It's been forever for me, but I seem to recall that _Unix Power
Tools_, also published by O'Reilly, covers all of the above and much,
much more.  If others on this list second my suggestion, you might
want to obtain a copy.  Alternatively, maybe list members can suggest
alternatives?


Re: UNIX Power Tools. Third time I've heard that recommended. Guess  
I'll add that to my wish list.


Jerry Peek (http://www.oreillynet.com/pub/au/28 - a number of  
articles and book extracts linked here), one of the original authors  
of Unix Power Tools, has been running a series in Linux Magazine for  
a while now on working from the command line, including the  
inscrutable 2>&1 and other arcana.


Linux Magazine is online at http://www.linux-mag.com/ and posts its
issues sixty days after publication at
http://www.linux-mag.com/backissues/.


Ben's other links are quite useful, too. The Answers Are Out There.  
The challenge is finding the answer you need now.

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Tom Buskey
On 1/12/06, Ben Scott [EMAIL PROTECTED] wrote:
On 1/12/06, Zhao Peng [EMAIL PROTECTED] wrote:
 I'm back, with another extract string question. //grin

  It sounds like you could use a tutorial on Unix text processing and
command line tools, specifically, one which addresses pipes and
redirection, as well as the standard text tools (grep, cut, sed, awk,
etc.).  While Paul's recommendation about the O'Reilly regular
expressions book is valid, I suspect it might be a little too focused
on regex's and not cover some of the *other* elements you seem to be
needing.

  It's been forever for me, but I seem to recall that _Unix Power
Tools_, also published by O'Reilly, covers all of the above and much,
much more.  If others on this list second my suggestion, you might
want to obtain a copy.  Alternatively, maybe list members can suggest
alternatives?

Unix Shell Programming by Kochan and Wood is a classic on shell programming
Portable Shell Programming by Blinn
The Awk Programming Language by Aho, Weinberger and Kernighan

Power Tools is excellent but it's more of a tip book in my mind. Not as much as the Hack series though.

  There are also a number of free guides at the Linux Documentation
Project.  See:

http://www.tldp.org/guides.html

  Look for anything mentioning bash (the Bourne-again shell) or
scripting.  I can't speak as to how good they are, but you can't beat
the price.

Some of them are very good. And the examples work.

-- 
A strong conviction that something must be done is the parent of many bad measures.
- Daniel Webster


Re: extract string from filename

2006-01-13 Thread Bill McGonigle

On Jan 12, 2006, at 19:40, Zhao Peng wrote:

I also downloaded an e-book called Learning Perl (O'Reilly, 
4th Edition), and had a quick look thru its table of contents, but did 
not find any chapter that looked likely to address any issue related 
to my question.


Good start.  Read these sections: 'A Stroll Through Perl', 'The Split 
and Join Functions', 'Lists and Arrays', 'Hashes', 'Directory Access', 
and 'File Manipulation'.


Your description is the outline of the algorithm.  Take this script 
where I've filled in the requisite perl and figure out how it works:


#!/usr/bin/perl -w
use strict;                    # show stupid errors
use warnings FATAL => 'all';   # don't let you get away with them

#I have almost 1k small files within one folder. The only pattern of
#the file names is:
my $dirname = shift; # take the command line parameter as the directory name

opendir DIRECTORY, $dirname or die "can't open $dirname: $!";
my @files = readdir(DIRECTORY);
closedir DIRECTORY;

#string1_string2_string3_string4.sas7bdat

#Note:
#1, string2 often repeats itself across file names
#2, All 4 strings contain no underscores.
#3, The 4 strings are separated by 3 underscores (as you can see)
#4, The lengths of the 4 strings are not fixed.

my (@part_2s);  # we'll keep the second parts here
foreach my $file (@files) {
    # the directory will contain . and .. which we don't want
    next if (($file eq '.') or ($file eq '..'));

    #My goal is to :
    #1, extract string2 from each file name
    # don't forget to escape the . since this is a regex
    my ($filename,$extension) = split('\.',$file);

    my @strings = split('_',$filename);
    my $part_2 = $strings[1];   # remember, arrays in perl are zero-indexed
    push(@part_2s,$part_2);     # store the data we want on the end of the array
}

#2, keep only unique ones
# perl trick using a hash to easily get unique items
my (%temp_hash);
foreach my $part (@part_2s) {
    $temp_hash{$part} = 1;
}
my @uniques = (keys %temp_hash);

# and then sort them
my @sorted = sort { $a cmp $b } (@uniques);  # cmp for string sorting

#3, then output them to a .txt file. (one unique string2 per line)
open OUTFILE, ">output.txt" or die "can't write output.txt: $!";
foreach my $item (@sorted) {
    print OUTFILE $item . "\n";
}
close OUTFILE;

When you understand each line you'll be able to solve future similar 
problems easily.  Note Kevin's perl solution is equally valid and 
probably faster, but you're not going to grok it until you exercise 
the perl part of your brain for a while.
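For comparison, here is a minimal sketch of the same algorithm in plain POSIX shell. It assumes that every file of interest ends in `.sas7bdat` and follows the `string1_string2_string3_string4` pattern. The function name `extract_string2` is hypothetical. Instead of splitting with `cut` or Perl's `split`, it trims each name with the shell's own `${var#pattern}` and `${var%%pattern}` expansions:

```shell
# extract_string2 DIR  (hypothetical helper name)
# Prints each distinct string2 found in DIR/*.sas7bdat, sorted, one per line.
extract_string2() {
    for f in "$1"/*.sas7bdat; do
        [ -e "$f" ] || continue         # no match: the glob stays literal, skip it
        name=${f##*/}                   # strip the directory part
        rest=${name#*_}                 # drop string1 and the first underscore
        printf '%s\n' "${rest%%_*}"     # drop the next underscore and everything after
    done | sort -u
}
```

Usage would be something like `extract_string2 FOLDERNAME > output.txt`. Because the trimming happens inside the shell, no extra process is started per file. Only the final `sort -u` runs as a separate command.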


-Bill
-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Cell: 603.252.2606
http://www.bfccomputing.com/Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Larry Cook

Zhao Peng wrote:

My goal is to :
1, extract string2 from each file name
2, then sort them and keep only unique ones
3, then output them to a .txt file. (one unique string2 per line)


It is really interesting how many ways there are to do things in *nix.  My 
first reaction, if this is a one time event, is to just use vi:


% ls *.sas7bdat > string2.txt
% vi string2.txt
:%s/^[^_]*_//
:%s/_.*$//
:%!sort -u
:wq

The first regex removes the first underscore and everything in front of it, 
while the second regex removes what is now the first underscore (was the 
second originally) and everything after it.  And then I do the unique sort 
right in vi.
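If you would rather not open an editor, the same two substitutions and the unique sort can be run non-interactively with sed. This is a minimal sketch that assumes the file names arrive one per line on standard input. The function name `string2_from_names` is hypothetical:

```shell
# string2_from_names (hypothetical name): the two vi substitutions plus
# ":%!sort -u", applied to file names read from stdin.
string2_from_names() {
    sed -e 's/^[^_]*_//' \
        -e 's/_.*$//' | sort -u
}
```

For example, `ls *.sas7bdat | string2_from_names > string2.txt` should leave the same result in `string2.txt` as the vi session.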


Larry
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Zhao Peng

Kevin,

Thank you very much! I really appreciate it.

I like your find approach, it's simple and easy to understand.

I'll also try to understand your perl approach, when I get time to start 
learning it. (Hopefully it won't be un-fulfilled forever)


I have one more question:

Is it possible to number the extracted string2?

Say, the output file contains the following list of extracted string2:

st
region
local

Any idea about what command to use to number the list  to make it look 
like below:


1 st
2 region
3 local

Again, thank you for your help and time!

Zhao

Kevin D. Clark wrote:

Zhao Peng writes:

  

I'm back, with another extract string question. //grin




find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | sort -u > somefile.txt

or

perl -MFile::Find -e 'find(sub{ return unless /sas7bdat$/; $string2 = (split /_/)[1]; $seen{$string2}++; }, @ARGV);
map { print "$_\n"; } sort keys(%seen)' FOLDERNAME

(which looks more readable as:

  perl -MFile::Find -e 'find(sub{ return unless /sas7bdat$/;
                                  $string2 = (split /_/)[1];
                                  $seen{$string2}++;
                                }, @ARGV);

                        map { print "$_\n"; } sort keys(%seen)' \
      FOLDERNAME > somefile.txt

)

Either of which solves the problem that you describe.  Actually, they
solve more than the problem that you describe (it wasn't apparent to
me whether you had any subdirectories here, but this is solved too).

(substitute FOLDERNAME with your directory's name)


Honestly, the first solution I present is the way I would have solved
this problem myself.  Very fast this way.

Regards,

--kevin
  


___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Jeff Kinz
On Fri, Jan 13, 2006 at 11:40:26AM -0500, Zhao Peng wrote:
 Kevin,
 
 Thank you very much! I really appreciate it.
 
 I like your find approach, it's simple and easy to understand.
 
 I'll also try to understand your perl approach, when I get time to start 
 learning it. (Hopefully it won't be un-fulfilled forever)
 
 I have one more question:
 
 Is it possible to number the extracted string2?
 
 Say, the output file contains the following list of extracted string2:
 
 st
 region
 local
 
 Any idea about what command to use to number the list  to make it look 
 like below:
 
 1 st
 2 region
 3 local


Pipe the output into pr -n -T

This is not pr's intended use, but it will work.  -n option means put
numbers on the lines, -T option means No page breaks.

The -n option appears to be missing from the FC2 man pages.


-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding. - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Ben Scott
On 1/13/06, Zhao Peng [EMAIL PROTECTED] wrote:
 Is it possible to number the extracted string2?

find -name \*sas7bdat -printf '%f\n' | cut -d _ -f 2 | sort | uniq | cat -n

  Run that pipeline in the directory you are interested in.

  The find(1) command finds files, based on their name or other
filesystem attributes.

  The -name \*sas7bdat part finds files with file names which match
the pattern.  The backslash escapes the star, to keep the shell from
trying to interpret it, so find gets the star instead.
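To see what the backslash buys you, here is a small, self-contained demonstration. The function name `glob_demo` and its temporary directory are illustrative only. With a matching file present, the shell expands the unescaped star before `echo` ever runs. The escaped star reaches `echo` literally, just as it reaches find:

```shell
# glob_demo (hypothetical name): compare an unescaped and an escaped star.
glob_demo() {
    dir=$(mktemp -d) || return 1
    : > "$dir/abc_st_nh_num.sas7bdat"     # create one empty matching file
    (
        cd "$dir" || exit 1
        echo *sas7bdat      # the shell expands this to the matching file name
        echo \*sas7bdat     # the backslash makes echo receive the literal *sas7bdat
    )
    rm -rf "$dir"
}
```

If nothing in the current directory matches, POSIX sh and bash (with default options) pass the unescaped pattern through unchanged. That is why leaving off the backslash often seems to work until the day a matching file happens to exist.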

  The -printf '%f\n' part has find output just the file name, not the path.

  cut(1) is used to split input strings, as you know.  -d _ splits
into fields, based on underscores.  -f 2 outputs the second field
only, one per line.

  sort(1) sorts, and uniq(1) eliminates duplicate lines.

  cat -n numbers the output.
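`cat -n` pads the line number to a fixed width and follows it with a tab, at least in GNU coreutils. If you want exactly the `1 st` layout Zhao showed, awk's built-in record counter `NR` produces it. This is a minimal sketch, and `number_lines` is a hypothetical name:

```shell
# number_lines (hypothetical name): prefix each input line with its line
# number and a single space, e.g. "1 st".
number_lines() {
    awk '{ print NR, $0 }'
}
```

It drops in where `cat -n` sits in the pipeline above, e.g. `... | sort | uniq | number_lines > output.txt`.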

-- Ben "Pay attention, there's gonna be a quiz next week" Scott
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Ben Scott
On 1/13/06, Ben Scott [EMAIL PROTECTED] wrote:
 On 1/13/06, Zhao Peng [EMAIL PROTECTED] wrote:
  Is it possible to number the extracted string2?

 find -name \*sas7bdat -printf '%f\n' | cut -d _ -f 2 | sort | uniq | cat -n

  I forgot to mention: If the *only* files in that directory are the
ones with the interesting file names, you can just use this:

ls | cut -d _ -f 2 | sort | uniq | cat -n

-- Ben "I would flunk the quiz" Scott
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Michael ODonnell


cat -n will number output lines

 
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Dan Jenkins

Zhao Peng wrote:


string1_string2_string3_string4.sas7bdat

abc_st_nh_num.sas7bdat
abc_st_vt_num.sas7bdat
abc_st_ma_num.sas7bdat
abcd_region_NewEngland_num.sas7bdat
abcd_region_South_num.sas7bdat

My goal is to :
1, extract string2 from each file name
2, then sort them and keep only unique ones
3, then output them to a .txt file. (one unique string2 per line)


Solution #1:
ls -1 *sas7bdat|awk -F_ '{print $2}'|sort -fu|cat -n > output.txt

Take output of ls, 1 file per line (ls -1) - only files ending with sas7bdat
Feed into awk, splitting on _, print the 2nd field
Sort ignoring case, eliminating duplicates (sort options: f folds 
case, u keeps only uniques)

Number the lines (cat -n)
Put output in file named output.txt

Solution #2:
ls -1 *sas7bdat|sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'|sort -fu|cat -n > output.txt
Use sed (stream editor) to break up filenames into atoms separated by _, 
and output the 2nd one (the \2). Regular expressions (regex) can be very 
handy. ^ matches beginning of string, [a-zA-Z0-9]*_ matches 
letter/number string ending with _, and the backslashed parentheses group
the patterns, so the 2nd one can be extracted.
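Dan's sed expression can be tried on its own against a couple of the sample names. In this sketch, `second_field_sed` is a hypothetical name. The character classes assume each string contains only letters and digits, as in Zhao's examples:

```shell
# second_field_sed (hypothetical name): \1 captures "string1_", \2 captures
# string2, and the whole line is replaced by \2.
second_field_sed() {
    sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'
}
```

A name that does not fit the pattern (say, one containing a hyphen) is not matched, so sed prints it unchanged. That is worth keeping in mind when checking the output.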


There are many solutions to the problem, as you can see.

--
Dan Jenkins ([EMAIL PROTECTED])
Rastech Inc., Bedford, NH, USA --- 1-603-206-9951
*** Technical Support Excellence for over a quarter century

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Paul Lussier
[EMAIL PROTECTED] (Kevin D. Clark) writes:

 Zhao Peng writes:

 I'm back, with another extract string question. //grin


 find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | sort -u > somefile.txt

Or, to simplify this:

  find ./ -name \*sas7bdat | awk -F_ '{print $2}' |sort -u
  ls *sas7bdat | perl -F_ -ane 'print "$F[1]\n";'|sort -u
  perl -e 'opendir(DIR,"."); map { if (/sas7bdat$/) { $k = (split(/_/,$_))[1]; $f{$k} =1; } } readdir(DIR); map { print "$_\n";}sort keys %f;'

That last one might be a little better formatted like:

  perl -e 'opendir(DIR,".");
   map { if (/sas7bdat$/) {
           $k = (split(/_/,$_))[1];
           $f{$k}=1;
         }
       } readdir(DIR);
   map { print "$_\n";} sort keys %f;'

It should be rather obvious that your best bet for quick one-liners
for this type of thing is to probably stick with standard UNIX tools
like sort, cut, sed, awk, etc.  Perl is great for text manipulation,
but as you can see, none of the perl one-liners has been nearly as
concise as the shell variants.  If speed matters, or process overhead,
then maybe perl is better.  Of course for such a small data set as
you've given, the perl versions are both harder and longer to type.

hth.
-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-13 Thread Paul Lussier
Tom Buskey [EMAIL PROTECTED] writes:

 Unix Shell Programming by Kochan and Wood is a classic on shell programming


 Portable Shell Programming by Blinn
 The Awk Programming Language by Aho, Weinberger and Kernighan

I'm also a big fan of Kernighan and Pike's _The UNIX Programming
Environment_.  When I first saw this book I thought it was going to be
more of a C programming book explaining things like linking and
compiling under UNIX. However, it turned out to be simply a great book
on how to get around the shell and do a variety of things in the UNIX
environment.  It's so named because, as
we've all seen here, the shell is *programmable* :)

And, yet another plug for my all-time favorite UNIX book, _The UNIX
Philosophy_ by Mike Gancarz, which has recently been updated in a
second edition, _Linux and the Unix Philosophy_ (which I have not yet
read).  This book does a fantastic job of explaining exactly
*why* UNIX is such a great environment, and why other competing
environments just can't compete when what you need is raw power and
flexibility.
-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


extract string from filename

2006-01-12 Thread Zhao Peng

Hi all,

I'm back, with another extract string question. //grin

I have almost 1k small files within one folder. The only pattern of the 
file names is:


string1_string2_string3_string4.sas7bdat

Note:
1, string2 often repeats itself across file names
For example:
abc_st_nh_num.sas7bdat
abc_st_vt_num.sas7bdat
abc_st_ma_num.sas7bdat
abcd_region_NewEngland_num.sas7bdat
abcd_region_South_num.sas7bdat

2, All 4 strings contain no underscores.
3, The 4 strings are separated by 3 underscores (as you can see)
4, The lengths of the 4 strings are not fixed.

My goal is to :
1, extract string2 from each file name
2, then sort them and keep only unique ones
3, then output them to a .txt file. (one unique string2 per line)

I tried to use cut commands, but can't even figure out how to use the 
filenames as input. Anyone care to offer me a hint?


I also downloaded an e-book called Learning Perl (O'Reilly, 
4th Edition), and had a quick look thru its table of contents, but did 
not find any chapter that looked likely to address any issue related to 
my question.


Thank you very much!

Zhao
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-12 Thread Ben Scott
On 1/12/06, Zhao Peng [EMAIL PROTECTED] wrote:
 I'm back, with another extract string question. //grin

  It sounds like you could use a tutorial on Unix text processing and
command line tools, specifically, one which addresses pipes and
redirection, as well as the standard text tools (grep, cut, sed, awk,
etc.).  While Paul's recommendation about the O'Reilly regular
expressions book is valid, I suspect it might be a little too focused
on regex's and not cover some of the *other* elements you seem to be
needing.

  It's been forever for me, but I seem to recall that _Unix Power
Tools_, also published by O'Reilly, covers all of the above and much,
much more.  If others on this list second my suggestion, you might
want to obtain a copy.  Alternatively, maybe list members can suggest
alternatives?

  There are also a number of free guides at the Linux Documentation
Project.  See:

http://www.tldp.org/guides.html

  Look for anything mentioning bash (the Bourne-again shell) or
scripting.  I can't speak as to how good they are, but you can't beat
the price.

  Anyway, on to your question...

 I tried to use cut commands, but can't even figure out how to use the
 filenames as input. Anyone care to offer me a hint?

  You'll want to pipe the output of ls to cut.  This should get you started:

  ls -1 | cut -d _ -f 2

  The -1 switch to ls(1) tells it to output a single column of file
names.  Some versions of ls do this automagically when using
redirection, but it is best to be sure.  The -d _ switch to cut(1)
tells cut to split fields on the underscore.  The -f 2 selects the
second field.

  See also: sort(1), uniq(1)
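One detail about uniq(1) is worth knowing: it only collapses *adjacent* duplicate lines, which is why every pipeline in this thread sorts first. A quick way to see it, with made-up sample data:

```shell
# Three made-up string2 values; the duplicate "st" lines are not adjacent.
sample() { printf 'st\nregion\nst\n'; }

sample | uniq          # prints st, region, st: uniq only drops adjacent repeats
sample | sort | uniq   # prints region, st: sorting puts the repeats together
```

`sort -u` combines the two steps into one command.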

  Hope this helps!

-- Ben "Unix plumber" Scott
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-12 Thread Python
On Thu, 2006-01-12 at 19:40 -0500, Zhao Peng wrote:
 For example:
 abc_st_nh_num.sas7bdat
 abc_st_vt_num.sas7bdat
 abc_st_ma_num.sas7bdat
 abcd_region_NewEngland_num.sas7bdat
 abcd_region_South_num.sas7bdat

You're not the only one learning here.  

I put these names into a file called str2-test-data

$ cut -d _ -f 2 str2-test-data | sort | uniq
region
st

I think that you could use:
ls | cut -d _ -f 2 | sort | uniq > str2-results.txt

-- 
Lloyd Kvam
Venix Corp

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string from filename

2006-01-12 Thread Kevin D. Clark

Zhao Peng writes:

 I'm back, with another extract string question. //grin


find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | sort -u > somefile.txt

or

perl -MFile::Find -e 'find(sub{ return unless /sas7bdat$/; $string2 = (split /_/)[1]; $seen{$string2}++; }, @ARGV);
map { print "$_\n"; } sort keys(%seen)' FOLDERNAME

(which looks more readable as:

  perl -MFile::Find -e 'find(sub{ return unless /sas7bdat$/;
                                  $string2 = (split /_/)[1];
                                  $seen{$string2}++;
                                }, @ARGV);

                        map { print "$_\n"; } sort keys(%seen)' \
      FOLDERNAME > somefile.txt

)

Either of which solves the problem that you describe.  Actually, they
solve more than the problem that you describe (it wasn't apparent to
me whether you had any subdirectories here, but this is solved too).

(substitute FOLDERNAME with your directory's name)


Honestly, the first solution I present is the way I would have solved
this problem myself.  Very fast this way.

Regards,

--kevin
-- 
(There are also 228 babies named Unique during the 1990s alone,
and 1 each of Uneek, Uneque, and Uneqqee.)

-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner


[but no Unix folks named their kids uniq, apparently.  --kevin]

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-11 Thread Jon maddog Hall
Zhao,

I am really busy right now, so I have not read all of the responses to your
problem completely, but I did notice this:


[EMAIL PROTECTED] said:
 You said that there is an extra column in the 3rd line. I disagree with
 you from my perspective. As you can see, there are 3 commas in between
 jesse and Dartmouth college. For these 3 commas, again, if we think of the
 2nd one as merely an indication that the value for age column is missing,
 then the 3rd line will be read as [jesse, MISSING, Dartmouth
 college], not [jesse,empty,empty, Dartmouth college] as you suggested.

A lot of these textual commands depend on the concept of a field delimiter.
In your first example, it seemed clear that a possible field delimiter was
the comma (,), and so if you saw two commas together, it represented an
empty field.  Not a missing field, because the field was technically still
there... it just had NO data in it.  When you included the line:

 jesse,,,Dartmouth college

and claimed that the middle comma represented a missing age: to a text-based
scanning program that has been told that the comma is a field separator,
that line has four fields, not just three.

If, from the beginning, you had shown that you meant for the comma to be used
both as a delimiter and as a piece of data, then a lot of the answers would
have been completely different (and probably considerably more complex).
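You can ask awk how many fields it sees, which makes this point concrete. The sketch below is meant to be fed the header and the disputed line from Zhao's example. `count_fields` is a hypothetical name:

```shell
# count_fields (hypothetical name): print how many comma-separated fields
# awk finds on each line. Adjacent commas delimit an empty field.
count_fields() {
    awk -F, '{ print NF }'
}
```

Fed `name,age,school` it reports 3. Fed `jesse,,,Dartmouth college` it reports 4, and the middle two of those fields are empty.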

md
-- 
Jon maddog Hall
Executive Director   Linux International(R)
email: [EMAIL PROTECTED] 80 Amherst St. 
Voice: +1.603.672.4557   Amherst, N.H. 03031-3032 U.S.A.
WWW: http://www.li.org

Board Member: Uniforum Association, USENIX Association

(R)Linux is a registered trademark of Linus Torvalds in several countries.
(R)Linux International is a registered trademark in the USA used pursuant
   to a license from Linux Mark Institute, authorized licensor of Linus
   Torvalds, owner of the Linux trademark on a worldwide basis
(R)UNIX is a registered trademark of The Open Group in the USA and other
   countries.

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-11 Thread klussier

 -- Original message --
From: Zhao Peng [EMAIL PROTECTED]
 Hi All,
 

 Kenny, your grep univ abc.txt | cut -f3 -d, | sed s/\"//g > dev.txt 
 works. I mis-read /\ as a similar sign on the top of the 6 key on the 
 keyboard(so when I typed that sign, I felt strange that it is much 
 smaller than /\, but didn't realize that they just are not the same 
 thing), instead of forward slash and back slash. I felt really 
 embarrassed with my stupid mistake. //blush

It happens. Believe me, I have done much dumber things in my time :-)

 Kenny, regarding missing column issue, let me try to explain it again. 
 Below is quoted from my original post:

[SNIP]

 You said that there is an extra column in the 3rd line. I disagree 
 with you from my perspective. As you can see, there are 3 commas in 
 between jesse and Dartmouth college. For these 3 commas, again, if 
 we think of the 2nd one as merely an indication that the value for age 
 column is missing, then the 3rd line will be read as [jesse, 
 MISSING, Dartmouth college], not [jesse,empty,empty, Dartmouth 
 college] as you suggested.

This poses an interesting problem. The , is being used for two purposes: a 
delimiter *AND* as a place holder. Unfortunately, cut and the like will see it 
as a delimiter and only a delimiter. It's what they do. I think that you may 
need to use the awk line that I sent, or some of the perl one-liners to get 
just the last column. Otherwise, you will end up with empty fields. 
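The awk line Kenny refers to isn't reproduced in this part of the thread. The following is therefore only a sketch of the idea, not his exact command. In awk, `$NF` is always the last field, no matter how many empty fields come before it. `last_field` is a hypothetical name:

```shell
# last_field (hypothetical name): print the final comma-separated field of each
# line, regardless of how many (possibly empty) fields come before it.
last_field() {
    awk -F, '{ print $NF }'
}
```

Both `jack,18,univ of Penn` and `jesse,,,Dartmouth college` yield just the school name.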


 For one particular variable(column) called 
 school, the length of some of its value is quite long(like: Univ of 
 Wisconsin at Madison, Health Sci Ctr), but I don't know the definite 
 length. I need to know it, because if the length I specify it not 
 enough, only partial values will be read. Many of its values contain 
 univ, so I just thought if I could extract all strings containing 
 univ from that variable(column), I will have a better chance to figure 
 out the length of school. That's why I had this question.

This is going to be another problem. Every , that is used is going to be seen 
as a delimiter. If the school name has a , in it, as there is between Madison 
and Health above, then taking just the last field will not work 
either. I think that the easiest thing to do in this case is to change the 
delimiter to something that is unlikely to be found in any of the columns, like 
a :. 
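Once the data uses a delimiter that never appears inside a value, the last field is reliable again. Awk can then also answer Zhao's underlying question: how long is the longest school name? This is a minimal sketch that assumes the file has already been rewritten with `:` as the delimiter, as suggested. The function name `max_last_field_len` and the `x` sample record are hypothetical:

```shell
# max_last_field_len (hypothetical name): length of the longest value in the
# last colon-separated field; prints 0 for empty input.
max_last_field_len() {
    awk -F: '{ n = length($NF); if (n > max) max = n } END { print max + 0 }'
}

# With commas instead, the embedded comma splits the name and $NF is only the tail.
# This prints " Health Sci Ctr" (note the leading space):
printf '%s\n' 'x,22,Univ of Wisconsin at Madison, Health Sci Ctr' | awk -F, '{ print $NF }'
```

Fed the colon-delimited lines `jerry:21:univ of Vermont` and `x:22:Univ of Wisconsin at Madison, Health Sci Ctr`, the function reports 44, the length of the full Wisconsin name.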

C-Ya,
Kenny
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-11 Thread Paul Lussier
Zhao Peng [EMAIL PROTECTED] writes:

 First I really cannot be more grateful for the answers to my question
 from all of you, I appreciate your help and time. I'm especially
 touched by the outpouring of response on this list., which I have
 never experienced  before anywhere else.

Zhao, this is a pretty amazing list, as you and many others have
discovered.  It's seldom I find as good, or complete, answers anywhere
else.  And most often, the ensuing discussion is more interesting,
educational, and enlightening than the original question posed.  (It's
often amusing to me when I google for an answer to a question and
within the top 10 returns from google is a reference to this list.
More amusing is when it was *I* who answered the question for someone
else which I am now asking :)

 Kenny, your grep univ abc.txt | cut -f3 -d, | sed s/\"//g > dev.txt
 works. I mis-read /\ as a similar sign on the top of the 6 key on the
 keyboard(so when I typed that sign, I felt strange that it is much
 smaller than /\, but didn't realize that they just are not the same
 thing), instead of forward slash and back slash. I felt really
 embarrassed with my stupid mistake. //blush

Ah, this makes so much more sense now.  So you in fact typed
something like:

  grep univ abc.txt | cut -f3 -d, | sed s/^/\/g 

?

That still doesn't end up with a '' in def.txt, but depending upon
exactly what you typed, I can certainly see where the use of ^ instead
of /\ could result in something like that.

For educational purposes, ^ anchors the pattern that follows it, so it
only matches at the beginning of the line.  Therefore:

 sed 's/foo/bar/g'

and

 sed 's/^foo/bar/g'

are very different, since the former results in all occurrences of
'foo' being replaced with 'bar', whereas the latter only changes foo
to bar when foo is found at the beginning of the line.  The use of '$'
in a pattern does exactly the same thing, except that it anchors
patterns at the *end* of a line.
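A quick way to see the difference, including the matching `$` anchor, is to run all three on a single made-up line. `anchor_demo` is just an illustrative name:

```shell
# anchor_demo (hypothetical name): same input, three substitutions.
anchor_demo() {
    printf 'foo foo\n' | sed 's/foo/bar/g'    # every foo:               bar bar
    printf 'foo foo\n' | sed 's/^foo/bar/g'   # only at start of line:  bar foo
    printf 'foo foo\n' | sed 's/foo$/bar/'    # only at end of line:    foo bar
}
```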

Btw, I highly recommend reading the O'Reilly book on Regular
Expressions.  If you're going to be doing a lot of this type of data
mining, a solid understanding of regexps and mastery of perl will make
your life significantly more fun.

Also, you might want to play with writing perl/shell scripts that
output data parseable by gnuplot, which lets you auto-generate some
rather interesting and complicated graphs of the data (I know SAS can
do all this, but I bet it's nowhere near as interesting or fun as learning
the UNIX way of doing it, and you don't need an SAS license either ;)

 You said that there is an extra column in the 3rd line. I disagree
 with you from my perspective. As you can see, there are 3 commas in
 between jesse and Dartmouth college. For these 3 commas, again, if
 we think of the 2nd one as merely an indication that the value for age
 column is missing, then the 3rd line will be read as [jesse,
 MISSING, Dartmouth college], not [jesse,empty,empty, Dartmouth
 college] as you suggested.

If you're going to be doing a lot of this type of thing, then perl
will most definitely be your best friend :)

 Paul, as to your simplest by what measurement question. I was
 thinking of both easiest to remember and easiest to understand
 when I was posting my question. Now I desire for most efficient
 approach. I know that will be my homework.

Well, again, most efficient by what measurement.  In the long run, I'm
going to bet it's in your best interests to learn perl, since it's one
tool which will allow you to write rather small and arbitrarily complex
scripts which would mostly obviate the need to learn several different
tools like cut, sed, awk, comm, etc.  In fact, learning perl will
likely lead you to learn about these other tools over time as the
situation dictates, but make you vastly more productive in the short
term.  Since perl excels at textual manipulation, it's perfect for
this type of data analysis.  And, since perl, combined with gnuplot,
is simple to run from an Apache web server... Well, I'm sure your
imagination will lead you to wherever you need to go :)

Good luck, and please feel free to post more interesting questions.

-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-11 Thread Kevin D. Clark

Zhao Peng writes:

 ... your grep univ abc.txt | cut -f3 -d, | sed s/\"//g > dev.txt
 works.

It works but is it correct?

What happens if you pass it the following line of input?:

  Aunivz,28,Cambridge Community College

By your original problem description, you don't want to see Cambridge
Community College but there it is.

I might have overlooked something, but I believe that I have only seen
two people post correct solutions so far.

Just something to think about.
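One way to avoid that false match is to test only the school column instead of grepping the whole line. This is a sketch with awk. It assumes plain, unquoted comma-separated fields with no commas inside values; the real data may use quotes, which this does not handle. `univ_schools` is a hypothetical name:

```shell
# univ_schools (hypothetical name): print field 3 only when field 3 itself
# contains "univ", so a name like "Aunivz" in field 1 no longer matches.
univ_schools() {
    awk -F, '$3 ~ /univ/ { print $3 }'
}
```

Given `jack,18,univ of Penn` and `Aunivz,28,Cambridge Community College`, only `univ of Penn` comes out.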

Regards,

--kevin
-- 
(There are also 228 babies named Unique during the 1990s alone,
and 1 each of Uneek, Uneque, and Uneqqee.)

-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner


[but no Unix folks named their kids uniq, apparently.  --kevin]

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string -- TIMTOWTDI

2006-01-11 Thread Paul Lussier
William D Ricker [EMAIL PROTECTED] writes:

 On 1/10/06, Paul Lussier [EMAIL PROTECTED] wrote:
   perl -ne '@_ = split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' 
   abc.txt > def.txt
  Egads!

[outstanding explanation I didn't have time to write myself removed ]

 None of this is seriously obfuscatory golfing, but if someone wanted to
 say "darn the cost of forking new processes off bash, 'awk/cut|grep|sed'
 is easier to read", well, I won't argue that it's easier for him/her
 to read, and they should do it that way -- unless they need to tune
 for performance.

I would, however, offer that if someone were to find
'awk/cut|grep|sed' easier to read, then that person a) wouldn't have
asked this question ;) and b) would certainly benefit from learning
perl for those times when the cost of forking new processes off bash
can't be ignored for some reason :) Additionally, perl offers the
benefit of a debugger which can be immensely helpful for even simple
one liner tasks.
-- 
Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-11 Thread Thomas Charron
On 1/11/06, Zhao Peng [EMAIL PROTECTED] wrote:
Hi All, First I really cannot be more grateful for the answers to my question from all of you, I appreciate your help and time. I'm especially touched
by the outpouring of response on this list, which I have never experienced before anywhere else.

 I hope my little comment didn't seem mean; I was more poking fun at the fact that if someone posted a similar post, and called themselves a Systems Administrator on a Windows network, comments similar to mine would have come forth. ;-)


Secondly I'm sorry for the big stir-up as to homework problems which flooded the list, since I'm the origin of it.


 Nah, it wasn't a flood. Trust me, once you see a flood, you'll know it. Usually, it's because someone says something political in nature.

Kenny, regarding the missing column issue, let me try to explain it again. Below is quoted from my original post:
Also, if one column is missing, and "," is used to indicate that missing column, like the following (2nd column of 3rd line is missing):
name,age,school
jerry ,21,univ of Vermont
jesse,,,Dartmouth college
jack,18,univ of Penn
john,20,univ of south Florida
===
You said that there is an extra column in the 3rd line. I disagree with you from my perspective. As you can see, there are 3 commas in between jesse and Dartmouth college. For these 3 commas, again, if
we think of the 2nd one as merely an indication that the value for the age column is missing, then the 3rd line will be read as [jesse, MISSING, Dartmouth college], not [jesse, empty, empty, Dartmouth
college] as you suggested.

 This is unusual, as typically, a comma delimited set of values would simply have nothing between the commas, or a set of quotes with no data.

 Typically the line would look like this:

"jesse",,"Dartmouth college"

 Or

"jesse","","Dartmouth college"

Paul, as to your "simplest by what measurement" question: I was thinking of both easiest to remember and easiest to understand when I was
posting my question. Now I'd like the most efficient approach. I know that will be my homework.

 If this is something that you will be doing repeatedly for different file types, I'd highly suggest getting familiar with regular expressions. You've seen a small snippet in Kenny's example 'sed s/\"//g'. The 's/\"//g' says to globally replace all quotes with nothing: s = substitute, and /1/2/ says 'replace everything matching 1 with 2', in this case a quote with nothing. The g means globally, i.e. replace every match on the line, not just the first one. Regular expressions are a powerful way to parse text files based on a given pattern, to get at the data you want.
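The effect of the trailing `g` is easy to see by running the substitution with and without it on one quoted value (the sample string is made up):

```shell
#!/bin/sh
# Without g, sed replaces only the first match on each line;
# with g, it replaces every match on the line.
printf '%s\n' '"univ of Penn"' | sed 's/"//'     # prints: univ of Penn"
printf '%s\n' '"univ of Penn"' | sed 's/"//g'    # prints: univ of Penn
```

Inside single quotes the double quote is not special to the shell, so no backslash is needed there.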


Part of my primary job responsibilities is to convert raw data into SAS data sets. My "extract string" question comes from processing a raw data
file in .txt format, which doesn't have any documentation except the variable list. By looking at the raw data, I know that each variable is separated by a comma. For one particular variable (column) called school, some of its values are quite long (like: Univ of
Wisconsin at Madison, Health Sci Ctr), but I don't know the maximum length. I need to know it, because if the length I specify is not enough, only partial values will be read. Many of its values contain
univ, so I thought if I could extract all strings containing univ from that variable (column), I would have a better chance of figuring out the length of school. That's why I asked this question.

 Haven't even run it, but something Perl-like:

my $maxlen = 0;
while (<>) { /^(.*),(.*),(.*)$/; if (length($3) > $maxlen) { $maxlen = length($3); } }
print "Longest string in third column is $maxlen\n";

 This would read from STDIN until it couldn't read anymore. For each line, it would split based on the commas (if the third column contains commas, this won't work, because $1 and $2 are greedy and would gobble some of the data, FYI), and check the length of the third field against the max length. If it's longer, record the new length. At the end, print it out.


 This regular expression isn't great, but it's the 20-second typing version.
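For comparison, here is a rough awk take on the same idea. It is a sketch, not a tested drop-in. It assumes the school is always the last field and never contains a comma, and a value like "Univ of Wisconsin at Madison, Health Sci Ctr" would break that assumption. `max_last_len` is a hypothetical name:

```shell
#!/bin/sh
# max_last_len (hypothetical name): print the length of the longest
# last comma-separated field on stdin, not counting double quotes.
max_last_len() {
    awk -F, '
        { f = $NF; gsub(/"/, "", f)          # copy the last field, drop quotes
          if (length(f) > max) max = length(f) }
        END { print max + 0 }                # adding 0 prints 0 for empty input
    '
}

# Prints 21, the length of "univ of south Florida".
printf '%s\n' \
    'jerry ,21,univ of Vermont' \
    'jesse,28,Dartmouth college' \
    'john,20,univ of south Florida' | max_last_len
```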

 Thomas


Re: extract string

2006-01-11 Thread Bill McGonigle

On Jan 11, 2006, at 08:42, [EMAIL PROTECTED] wrote:

This poses an interesting problem. The "," is being used for two 
purposes: a delimiter *AND* as a place holder.


I tried to prove to myself last night that this method would produce 
unresolvable ambiguities, but if you think like a state machine, 
character-by-character, it seems to work.


Now, for the Lazy, Perl regular expressions are a state machine of 
sorts.  I suspect you might be able to do the right thing with 
greedy/non-greedy matches.  Someone who lives and breathes regex might 
have a better handle on this.  It would take me two hours to get this 
one figured out.


This format sure makes the parser harder though, so if there's another 
way to get the data that's going to be desirable.  You can't use 
Text::CSV::Simple anymore, for instance, which gives you a 15-minute 
explicit reusable solution.


-Bill

-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Cell: 603.252.2606
http://www.bfccomputing.com/Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-11 Thread Thomas Charron
On 1/11/06, Bill McGonigle [EMAIL PROTECTED] wrote:
 On Jan 11, 2006, at 08:42, [EMAIL PROTECTED] wrote:
  This poses an interesting problem. The "," is being used for two
  purposes: a delimiter *AND* as a place holder.
 Now, for the Lazy, Perl regular expressions are a state machine of
 sorts.  I suspect you might be able to do the right thing with
 greedy/non-greedy matches.  Someone who lives and breathes regex might
 have a better handle on this.  It would take me two hours to get this
 one figured out.

 Hehe, getting it 100% right would take one of those really, REALLY ugly regular expressions that, when you stare at it long enough, looks like ASCII art. ;-)

 Thomas


Re: extract string

2006-01-11 Thread Drew Van Zandt
perl

split on the char pair ","
Take last element of returned array, either remove the " at the end or replace the one you ate with the split.
Keep a running variable containing largest length encountered so far.
Add 10 to be safe. ;-)

Any regexp I have to think about for more than 30 seconds is unlikely
to be used unless it greatly improves my execution speed...and then
only if I have a LOT of data to process. :-)
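Drew's recipe translates fairly directly to awk. When a field separator is longer than one character, awk treats it as a regular expression. This sketch rests on an assumption the archived messages cannot confirm, because their quote characters were lost: that every field in the raw file is wrapped in double quotes, so `","` only ever appears between fields. `last_quoted_field` is a hypothetical name:

```shell
#!/bin/sh
# last_quoted_field (hypothetical name): Drew's recipe in awk.
# ASSUMPTION: every field is wrapped in double quotes, e.g.
#   "jack","18","univ of Penn"
# A multi-character FS is a regular expression in awk; here it just
# matches the three characters  ","  literally.
last_quoted_field() {
    awk -F'","' '{
        f = $NF
        sub(/"$/, "", f)    # drop the closing quote left by the split
        sub(/^"/, "", f)    # a one-field line still has its opening quote
        print f
    }'
}

printf '%s\n' '"jack","18","univ of Penn"' | last_quoted_field   # prints: univ of Penn
```

Under that assumption, a line carrying the placeholder, such as `"jesse",",","Dartmouth college"`, splits into three fields with a lone `,` in the middle, so the last field still comes out as `Dartmouth college`.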

--Drew "Not showing you my crufty perl" VZ



Re: extract string

2006-01-11 Thread Kevin D. Clark

Zhao Peng [EMAIL PROTECTED] writes:

 You said that there is an extra column in the 3rd line. I disagree
 with you from my perspective. As you can see, there are 3 commas in
 between jesse and Dartmouth college. For these 3 commas, again, if
 we think of the 2nd one as merely an indication that the value for age
 column is missing, then the 3rd line will be read as [jesse,
 MISSING, Dartmouth college], not [jesse,empty,empty, Dartmouth
 college] as you suggested.

From my perspective, your file format makes the file harder to parse.
If at all possible, I would suggest that you modify this
file's format.

Still, if this isn't possible, this works on your input:

perl -lane 's/,,/,MISSING/g; @F = split /,/; if (index($F[-1], "univ") != -1) { 
($u = $F[-1]) =~ y/"//d; print $u }'


Formatted more readably, this looks like this:

perl -lne 's/,,/,MISSING/g;
@F = split /,/; 

if (index($F[-1], "univ") != -1) {
  ($u = $F[-1]) =~ y/"//d;
  print $u
}'


This seems to be a reasonable solution to your problem.  I hope it
helps.


Just another Perl hacker,

--kevin
-- 
GnuPG ID: B280F24E

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


extract string

2006-01-10 Thread Zhao Peng

Hi

Suppose that I have a file called abc.txt, which contains the following 
5 lines (columns are delimited by ",")


name,age,school
jerry ,21,univ of Vermont
jesse,28,Dartmouth college
jack,18,univ of Penn
john,20,univ of south Florida

My OS is RedHat Enterprise, how could I extract the string which 
contains univ and create an output file called def.txt, which only has 
3 following lines:


univ of Vermont
univ of Penn
univ of south Florida

Please suggest the simplest command line approach.

Thank you.
Zhao
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


RE: extract string

2006-01-10 Thread Whelan, Paul
Like so: cat abc.txt | cut -d, -f3

Thanks.

-Original Message-
From: Zhao Peng [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 10, 2006 11:51 AM
To: gnhlug-discuss@mail.gnhlug.org
Subject: extract string

Hi

Suppose that I have a file called abc.txt, which contains the following 
5 lines (columns are delimited by ,)

name,age,school
jerry ,21,univ of Vermont
jesse,28,Dartmouth college
jack,18,univ of Penn
john,20,univ of south Florida

My OS is RedHat Enterprise, how could I extract the string which 
contains univ and create an output file called def.txt, which only has

3 following lines:

univ of Vermont
univ of Penn
univ of south Florida

Please suggest the simplest command line approach.

Thank you.
Zhao
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


RE: extract string

2006-01-10 Thread klussier
Actually, if you are looking for only lines that contain the string univ, 
then you would want to grep for it:

grep univ abc.txt | cut -f3 -d, >> dev.txt.

Paul's example would give you the third field of each line, even if they don't 
have univ in them. Now, if you wanted to remove the quotes, then you would 
need something like:


grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt 

FYI,
Kenny

 -- Original message --
From: Whelan, Paul [EMAIL PROTECTED]
 Like so: cat abc.txt | cut -d, -f3
 
 Thanks.
 
 -Original Message-
 From: Zhao Peng [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, January 10, 2006 11:51 AM
 To: gnhlug-discuss@mail.gnhlug.org
 Subject: extract string
 
 Hi
 
 Suppose that I have a file called abc.txt, which contains the following 
 5 lines (columns are delimited by ,)
 
 name,age,school
 jerry ,21,univ of Vermont
 jesse,28,Dartmouth college
 jack,18,univ of Penn
 john,20,univ of south Florida
 
 My OS is RedHat Enterprise, how could I extract the string which 
 contains univ and create an output file called def.txt, which only has
 
 3 following lines:
 
 univ of Vermont
 univ of Penn
 univ of south Florida
 
 Please suggest the simplest command line approach.
 
 Thank you.
 Zhao
 ___
 gnhlug-discuss mailing list
 gnhlug-discuss@mail.gnhlug.org
 http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
 
 ___
 gnhlug-discuss mailing list
 gnhlug-discuss@mail.gnhlug.org
 http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Zhao Peng

Kenny,

Thank you for your suggestion.

The following line works:
grep univ abc.txt | cut -f3 -d, >> dev.txt.


While the following line intended to remove quotes does NOT work:
grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
It resulted in a line starts with > prompt, and not output dev.txt

Could you please double-check or modify it?

Also, if one column is missing, and "," is used to indicate that missing 
column, like the following (2nd column of 3rd line is missing):


name,age,school
jerry ,21,univ of Vermont
jesse,,,Dartmouth college
jack,18,univ of Penn
john,20,univ of south Florida

Does the cut approach still apply? If not, what command would you 
suggest to address this missing issue?


Thank you again.
Zhao


[EMAIL PROTECTED] wrote:

Actually, if you are looking for only lines that contain the string univ, 
then you would want to grep for it:

grep univ abc.txt | cut -f3 -d,  dev.txt.

Paul's example would give you the third field of each line, even if they don't have 
univ in them. Now, if you wanted to remove the quotes, then you would need 
something like:


grep univ abc.txt | cut -f3 -d, | sed s/\//g  dev.txt 


FYI,
Kenny

 -- Original message --
From: Whelan, Paul [EMAIL PROTECTED]
  

Like so: cat abc.txt | cut -d, -f3

Thanks.

-Original Message-
From: Zhao Peng [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 10, 2006 11:51 AM

To: gnhlug-discuss@mail.gnhlug.org
Subject: extract string

Hi

Suppose that I have a file called abc.txt, which contains the following 
5 lines (columns are delimited by ,)


name,age,school
jerry ,21,univ of Vermont
jesse,28,Dartmouth college
jack,18,univ of Penn
john,20,univ of south Florida

My OS is RedHat Enterprise, how could I extract the string which 
contains univ and create an output file called def.txt, which only has


3 following lines:

univ of Vermont
univ of Penn
univ of south Florida

Please suggest the simplest command line approach.

Thank you.
Zhao
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss





  


___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Zhao Peng [EMAIL PROTECTED] wrote:
 how could I extract the string which
 contains univ and create an output file called def.txt, which only has
 3 following lines:

Here's one way, as a Perl one-liner:

perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' \
abc.txt > def.txt

That trims out the quotes, as it appears you want.  The search for
univ is case-sensitive.

Broken down into a script with comments:

#!/usr/bin/perl -n
split ",";           # split input fields into @_ (split at commas)
$_ = $_[2];          # grab the third field, put into default workspace ($_)
s/(^")|("$)//g;      # delete double-quote (") at start and/or end
print if m/univ/;    # print if contains "univ"

  HTH,

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Whelan, Paul [EMAIL PROTECTED] wrote:
 Like so: cat abc.txt | cut -d, -f3

1.  Randal Schwartz likes to call that UUOC (Useless Use Of cat).  :-)
 You can just do this instead:

  cut -d, -f3 < abc.txt

If you like the input file at the start of the command line, that's legal, too:

  <abc.txt cut -d, -f3

You can read more about UUOC at: http://sial.org/code/shell/tips/useless-cat/

2. The above simply returns the third field.  OP appeared to want only
lines containing univ.  So:

 cut -d, -f3 < abc.txt | grep univ

3. I'll leave the quote removal as an exercise to the reader.  ;-)
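The three ways of feeding the file to cut can be compared side by side. This small sketch uses a throwaway temporary file and assumes mktemp is available, as it is on typical Linux systems:

```shell
#!/bin/sh
# Feed a small throwaway file to cut(1) three ways; all three print
# the same third column. Only the first starts an extra cat process.
tmp=$(mktemp) || exit 1
printf '%s\n' 'jerry ,21,univ of Vermont' 'jesse,28,Dartmouth college' > "$tmp"

cat "$tmp" | cut -d, -f3    # the "useless use of cat"
cut -d, -f3 < "$tmp"        # input redirection instead
< "$tmp" cut -d, -f3        # redirection written first, same effect

rm -f "$tmp"
```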

-- Ben "Pedantic" Scott  ;-)
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Zhao Peng [EMAIL PROTECTED] wrote:
 While the following line intended to remove quotes does NOT work:
 grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
 It resulted in a line starts with > prompt, and not output dev.txt

  The > prompt indicates the shell thinks you are still in the
middle of some shell construct, and is prompting you to finish it.  It
usually manifests due to an unclosed quote.  Most likely, something is
eating the backslash that appears before the double-quote in the sed
command.  It should be

 sed s/\"//g

where the second word contains the characters letter s, a forward
slash (/), a backslash (\), a double-quote, two forward slashes (//),
and the letter g.  The backslash tells the shell that the following
character (in this case, a quote) is not to be interpreted as shell
syntax, but instead passed to the specified command as is.  This is
called an escape character or a shell escape.

  If you're putting this shell command inside some other program or
shell, you may find *that* program also interprets the backslash this
way.  So you need to escape it *twice*:

 sed s/\\"//g

The characters \\ get interpreted by the first program as literal
backslash here.  The shell then receives a single backslash, which it
applies to the double-quote.

  Shell escapes can get very, very messy.
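One way to check which quoting styles are equivalent is to run them side by side. In this sketch all three commands hand sed the same program, `s/"//g`, and each should print `univ of Penn`:

```shell
#!/bin/sh
# Three spellings of the same sed program, s/"//g; only the shell
# quoting differs. Each line prints: univ of Penn
line='"univ of Penn"'
printf '%s\n' "$line" | sed s/\"//g      # backslash escapes the quote
printf '%s\n' "$line" | sed 's/"//g'     # inside single quotes " is not special
printf '%s\n' "$line" | sed "s/\"//g"    # inside double quotes \" is still needed
```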

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Jeff Kinz
Ooo, look! - a new business model for Lugs!

Achieve Lug financial independence today!

Now your Lug can achieve its financial funding goals simply by charging
25 cents for each shell scripting homework problem answered and 50 cents
for extended explanations such as rendered below. :-)

All we need now is a PayPal account. :-)

(rendered tongue at least halfway in cheek, all proceeds to
go to GNHLUGS tab at Martha's)



On Tue, Jan 10, 2006 at 01:23:14PM -0500, Ben Scott wrote:
 On 1/10/06, Zhao Peng [EMAIL PROTECTED] wrote:
  While the following line intended to remove quotes does NOT work:
  grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
  It resulted in a line starts with > prompt, and not output dev.txt
 
   The > prompt indicates the shell thinks you are still in the
 middle of some shell construct, and is prompting you to finish it.  It
 usually manifests due to an unclosed quote.  Most likely, something is
 eating the backslash that appears before the double-quote in the sed
 command.  It should be
 
  sed s/\"//g
 
 where the second word contains the characters letter s, a forward
 slash (/), a backslash (\), a double-quote, two forward slashes (//),
 and the letter g.  The backslash tells the shell that the following
 character (in this case, a quote) is not to be interpreted as shell
 syntax, but instead passed to the specified command as is.  This is
 called an escape character or a shell escape.
 
   If you're putting this shell command inside some other program or
 shell, you may find *that* program also interprets the backslash this
 way.  So you need to escape it *twice*:
 
  sed s/\\"//g
 
 The characters \\ get interpreted by the first program as literal
 backslash here.  The shell then receives a single backslash, which it
 applies to the double-quote.
 
   Shell escapes can get very, very messy.
 
 -- Ben
 ___
 gnhlug-discuss mailing list
 gnhlug-discuss@mail.gnhlug.org
 http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
 

-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding. - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Paul Lussier
Zhao Peng [EMAIL PROTECTED] writes:

 Hi

 Suppose that I have a file called abc.txt, which contains the
 following 5 lines (columns are delimited by ",")

 name,age,school
 jerry ,21,univ of Vermont
 jesse,28,Dartmouth college
 jack,18,univ of Penn
 john,20,univ of south Florida

 My OS is RedHat Enterprise, how could I extract the string which
 contains univ and create an output file called def.txt, which only
 has 3 following lines:

 univ of Vermont
 univ of Penn
 univ of south Florida


Here are 3, pick your poison:

  awk -F, '/univ/ && gsub(/\"/,"") {print $3}' abc.txt > def.txt
  perl -F, -ane 'if (/univ/) { $F[2] =~ s/\"//g; print $F[2]};' abc.txt \
     > def.txt
  grep univ abc.txt | cut -f3 -d, | sed 's/\"//g' > def.txt


 Please suggest the simplest command line approach.

Simplest by what measurement?

 - fewest processes spawned
 - most efficient
 - least amount of typing
 - easiest to remember
 - easiest to understand
 - ability to debug
 - extensibility

Simplest is a rather subjective approach...





-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Paul Lussier
Ben Scott [EMAIL PROTECTED] writes:

 Here's one way, as a Perl one-liner:

 perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' \
 abc.txt > def.txt

Egads!
-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Paul Lussier
[EMAIL PROTECTED] writes:

 Actually, if you are looking for only lines that contain the string univ, 
 then you would want to grep for it:

 grep univ abc.txt | cut -f3 -d, >> dev.txt.

Why are you appending to dev.txt? (or def.txt even).  Are you assuming
the file already exists and don't want to over-write the contents?


 Paul's example would give you the third field of each line, even if
 they don't have univ in them. Now, if you wanted to remove the
 quotes, then you would need something like:


 grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt 

yep, that should work, but no need for the >> when a simple > will do.
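The difference Paul is pointing at can be seen with a scratch file. This sketch again assumes mktemp is available:

```shell
#!/bin/sh
# ">" truncates the target file before writing; ">>" appends to it.
out=$(mktemp) || exit 1
echo first  > "$out"
echo second > "$out"        # file now holds only: second
echo third >> "$out"        # file now holds: second, then third
cat "$out"                  # prints "second" and "third" on two lines
rm -f "$out"
```

If the grep pipeline is run twice with `>>`, dev.txt ends up holding every line twice. With `>`, the second run simply replaces the first.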

-- 

Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Michael ODonnell


 Ooo, look! - a new business model for Lugs!

I happen to like these threads and far from regarding
them as a burden I think they're a pleasant diversion
and extremely useful as learning opportunities.

But I've been asking for a long time when our IPO will
be happening; we've got more talent and longevity than
most of the scams^H^H^H^H^Hventures that got millions
during the dotcom era, and when the juices are really
flowing most VC firms appear(ed) to regard those pesky
business plans as optional, anyway...;-)

 
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Drew Van Zandt
While it does seem like a few man page pointers would be better (more
instructive in the long run), I have to admit I wasn't familiar with
cut, so I've learned something from this one.

--Drew



Homework problems (was: extract string)

2006-01-10 Thread Ben Scott
On 1/10/06, Jeff Kinz [EMAIL PROTECTED] wrote:
 Now your Lug can achieve its financial funding goals simply by charging
 25 cents for each shell scripting homework problem answered and 50 cents
 for extended explanations such as rendered below. :-)

  I was wondering if I should raise the "Ya know, this looks an awful
lot like a homework problem to me..." question.  But I also considered
the following:

  Assume it is a homework problem.  Does it make a real difference
whether the student learns the material from the text book, this list,
or some random web page found via Google?

  That assumes the student learns the material of course.  But
students copy answers from each other, text books, and other materials
all the time.  If someone is so dumb as to copy material without
learning it, well, life will eventually teach them the folly of that,
too.  Life is a persistent teacher.

  There's also the fact that several answers have been posted, each
with varying degrees of applicability to the questions posed.  The
student would have to sort through those answers and choose the one
they thought best suited to the task at hand.  That's a useful skill,
too.

  And if the student hands in a Perl one-liner in a basic class on
shell scripting, the resulting student/instructor discussion will
doubtless be very educational.

  This is not to say I'm going to go down a list of homework
questions and provide concise, unexplained answers.  I generally try
to avoid that regardless of the audience.  I find the explanation and
reasoning behind the solution to a problem is generally far more
educational than the solution alone.

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Paul Lussier [EMAIL PROTECTED] wrote:
  perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' \
  abc.txt > def.txt

 Egads!

  Egads?

-- Ben "As I was saying about explanation..." Scott
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread klussier

 -- Original message --
From: Paul Lussier [EMAIL PROTECTED]
 [EMAIL PROTECTED] writes:
 
  Actually, if you are looking for only lines that contain the string univ, 
 then you would want to grep for it:
 
  grep univ abc.txt | cut -f3 -d, >> dev.txt.
 
 Why are you appending to dev.txt? (or def.txt even).  Are you assuming
 the file already exists and don't want to over-write the contents?
 

That is exactly what I was thinking. Even if it isn't being appended to, the 
result is essentially the same. Unless, of course, you want to over-write the 
file. Then that wouldn't work out too well. It's better to be safe than sorry  :-) 
Besides, I've been doing a lot of this sort of thing in the last few days, and 
the >> just sort of rolled off my fingertips. 
 
  Paul's example would give you the third field of each line, even if
  they don't have univ in them. Now, if you wanted to remove the
  quotes, then you would need something like:
 
 
  grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt 
 
 yep, that should work, but no need for the >> when a simple > will do.

What? Two redirects are better than one, right :-)

C-Ya,
Kenny
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Drew Van Zandt [EMAIL PROTECTED] wrote:
 While it does seem like a few man page pointers would be better (more
 instructive in the long run), I have to admit I wasn't familiar with cut, so
 I've learned something from this one.

  Since we're on the subject...

  Is there a tool that quickly and easily extracts one or more columns
of text (separated by whitespace) from an output stream?  I'm familiar
with the

  awk '{ print $3 }'

mechanism, but I've always felt that was clumsy.  I've tried to get
cut(1) to do it in the past, but the field separator semantics appear
to assume one and only one separator, not whitespace (one or more
space or tab characters).

  I get the feeling there is some command or switch I'm not aware of
that I should be using.  This hypothetical command might work something
like this:

 ls -l | foo 3

to extract just the third column (username) from the ls(1) output.
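One possible answer uses only standard tools. First squeeze each run of blanks into a single space with tr, and then cut's one-separator rule stops being a problem. This is a sketch. `nth_col` is a made-up name, chosen so that it does not shadow the existing col(1) command:

```shell
#!/bin/sh
# nth_col N (hypothetical helper): print the Nth blank-separated column
# of standard input, where columns may be separated by any run of
# spaces and/or tabs.
nth_col() {
    sed 's/^[[:blank:]]*//' |        # drop leading blanks so field 1 is not empty
        tr -s '[:blank:]' ' ' |      # turn each run of blanks into one space
        cut -d' ' -f"$1"             # now cut's single-separator rule works
}

# Example: prints "c" and then "3"
printf '  a   b\tc  d\n1 2 3\n' | nth_col 3
```

With this helper, `ls -l | nth_col 3` would print the owner column. Leading blanks are stripped first because, without that step, field 1 would come out empty.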

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


RE: Homework problems (was: extract string)

2006-01-10 Thread Brian
Answer C: Who cares? 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Ben Scott
 Sent: Tuesday, January 10, 2006 4:01 PM
 To: gnhlug-discuss@mail.gnhlug.org
 Subject: Homework problems (was: extract string)
 
   Assume it is a homework problem.  Does it make a real 
 difference whether the student learns the material from the 
 text book, this list, or some random web page found via Google?
 
 

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Jon maddog Hall

[EMAIL PROTECTED] said:
 While the following line intended to remove quotes does NOT work:
 grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt
 It resulted in a line starts with > prompt, and not output dev.txt

 I can't see any reason why that should be happening.  As a matter of
 fact, I tried that exact command line on my system and it worked exactly as
 (specified|advertised|expected). 

Might this not be affected by different command interpreters?  sh vs csh vs ksh
vs bash?


Simplest by what measurement?

 - fewest processes spawned
 - most efficient
 - least amount of typing
 - easiest to remember
 - easiest to understand
 - ability to debug
 - extensibility

Most portable?

[EMAIL PROTECTED] said:
 While it does seem like a few man page pointers would be better (more
 instructive in the long run), I have to admit I wasn't familiar with cut, so
 I've learned something from this one. 

I still remember the time that I first was learning UNIX (all capital 
letters)...

I was the senior systems administrator for Bell Labs in North Andover, MA.  I
got the job without ever having seen a UNIX system.  Of course I had programmed
on dozens of different OS systems... but there I was, late at night, trying
to solve much the same type of problem that was solved here.

After thinking about it, and wondering if I would have to write a program
to do it, I thought to myself...I do not KNOW that UNIX has a command that
could do this, but I am willing to BET it does.  And I started paging through
section 1 of the manual... the shell commands.  Sure enough, I came to cut(1)
and it was exactly what I needed.  (Later on I was glad the command was not
at the back of the section, being named something like "Yet Another Cut
Command"... hmmm, maybe in a way it was) :-}

It was that experience that led me to page through section (1) of the manual
every six months, just to remind myself of the gold that was hidden in those
pages.

Warmest regards,

maddog
-- 
Jon maddog Hall
Executive Director   Linux International(R)
email: [EMAIL PROTECTED] 80 Amherst St. 
Voice: +1.603.672.4557   Amherst, N.H. 03031-3032 U.S.A.
WWW: http://www.li.org

Board Member: Uniforum Association, USENIX Association

(R)Linux is a registered trademark of Linus Torvalds in several countries.
(R)Linux International is a registered trademark in the USA used pursuant
   to a license from Linux Mark Institute, authorized licensor of Linus
   Torvalds, owner of the Linux trademark on a worldwide basis
(R)UNIX is a registered trademark of The Open Group in the USA and other
   countries.

___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


Re: extract string

2006-01-10 Thread Ben Scott
On 1/10/06, Jon maddog Hall [EMAIL PROTECTED] wrote:
 I was the senior systems administrator for Bell Labs in North Andover, MA.  I
 got the job without ever having seen a UNIX system.

  Well, really.  How many people *had* seen a UNIX system, back then?  ;-)

  (Sorry, couldn't resist.)

 It was that experience that led me to page through section (1) of the manual
 every six months, just to remind myself of the gold that was hidden in those
 pages.

  I learned the plurality of what I know about the shell by reading
the bash(1) man page (and most of that while sitting at VT-320's in
UNH computer clusters, no less).  Never underestimate the power of
RTFM.  :-)

-- Ben


Re: extract string

2006-01-10 Thread klussier

 -- Original message --
From: Zhao Peng [EMAIL PROTECTED]
 Kenny,
 
 Thank you for your suggestion.
 
 The following line works:
 grep univ abc.txt | cut -f3 -d, > dev.txt
 
 
 While the following line, intended to remove quotes, does NOT work:
 grep univ abc.txt | cut -f3 -d, | sed s/\"//g > dev.txt
 It resulted in a line starting with a > prompt, and no output to dev.txt.
 
 Could you please double-check or modify it?

I have checked it, and it works exactly as it should. 
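For readers following along, here is a minimal, self-contained sketch of the grep | cut | sed pipeline above, run against a small hypothetical sample (the real abc.txt was never posted, so its contents here are an assumption, as is the function name extract_univ). The archive stripped the double quotes and the > redirection from the commands; the sed expression below removes double quotes, and the output goes to the terminal rather than to dev.txt.

```shell
# Sketch: lines mentioning "univ" -> 3rd comma-separated field -> quotes removed.
# The sample data and the function name are hypothetical; only the
# pipeline itself comes from the thread.
extract_univ() {
    grep univ "$1" | cut -f3 -d, | sed 's/"//g'
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
"name","age","school"
"jerry","21","univ of Vermont"
"jack","18","univ of Penn"
"john","20","univ of south Florida"
EOF

extract_univ "$sample"    # add "> dev.txt" to save the result instead
rm -f "$sample"
```

Note that grep matches univ anywhere on the line, not only in the school column, so this relies on univ appearing only in that column.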

 Also, if one column is missing, and "," is used to indicate that missing 
 column, like the following (2nd column of 3rd line is missing):
 
 name,age,school
 jerry ,21,univ of Vermont
 jesse,,,Dartmouth college
 jack,18,univ of Penn
 john,20,univ of south Florida
 
 Does the cut approach still apply? If not, what command would you 
 suggest to address this missing issue?
 

A column is not missing, it is just empty. It is still delimited by a ",", so 
it is still a valid column. However, in the example above, there is an extra 
column in the 3rd line. All of the other lines have name,age,school. Line 3 
has name,empty,empty,school.

Now, if you know that the school is always going to be the last field, you may 
not want to use cut at all. You might want to use something like:

grep univ abc.txt | awk -F, '{print $NF}' | sed 's/"//g'

awk takes the place of `cut` in this case by printing the last field 
($NF) delimited by ",".
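To see why the empty fields matter, here is a small sketch comparing cut -f3 with awk's $NF on two of Zhao's sample lines (quotes omitted, as in the archived copy; the helper names third_field and last_field are illustrative only). cut counts every comma, so the line with three commas yields an empty third field, while $NF always picks the last one.

```shell
# Compare a fixed field number (cut -f3) with the last field (awk $NF)
# when empty fields shift the column count.
third_field() { cut -f3 -d,; }
last_field()  { awk -F, '{ print $NF }'; }

sample='jerry,21,univ of Vermont
jesse,,,Dartmouth college'

echo 'cut -f3 -d, gives:'
printf '%s\n' "$sample" | third_field    # the jesse line comes out empty
echo 'awk $NF gives:'
printf '%s\n' "$sample" | last_field
```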

FYI,
Kenny


Re: Homework problems (was: extract string)

2006-01-10 Thread Travis Roy

I agree..

Does it matter? We're here to help, discuss, and answer questions.

If you feel that your answer will cause more problems in the long run, 
then don't answer.


Not having a degree, some of my best information has come from places 
like this and other sources on the internet. In fact, during my 
interview for my current job I stated that knowing who to ask and where 
to get information is far more important than knowing the information. 
It's impossible to know everything.


If we give this person the wrong answer, and they fail, well.. that's 
what you get for trusting the list. If they get it right, good for them, 
they still got the job done.


Answer C: Who cares? 




-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Ben Scott

Sent: Tuesday, January 10, 2006 4:01 PM
To: gnhlug-discuss@mail.gnhlug.org
Subject: Homework problems (was: extract string)

 Assume it is a homework problem.  Does it make a real 
difference whether the student learns the material from the 
text book, this list, or some random web page found via Google?










Re: extract string

2006-01-10 Thread Kevin D. Clark

Ben Scott writes:

   Is there a tool that quickly and easily extracts one or more columns
 of text (separated by whitespace) from an output stream?  I'm familiar
 with the

   awk '{ print $3 }'

 mechanism, but I've always felt that was clumsy.  I've tried to get
 cut(1) to do it in the past, but the field separator semantics appear
 to assume one and only one separator, not whitespace (one or more
 space or tab characters).

   I get the feeling there is some command or switch I'm not aware of
 that I should be using.  This hypothetical command might work something
 like this:

  ls -l | foo 3

 to extract just the third column (username) from the ls(1) output.

Thoughts:

0:  I've always found awk and cut to be very convenient for these
operations.  For complex things, I recommend Perl.  In particular,
awk and Perl allow for pattern separators, as you desire.

1:  You might find awk's -F option to be useful.

2:  Something like this is always fun:

   perl -F: -ane 'print join " ", @F[0,5,6]' /etc/passwd

3:  If you really want foo, how about this:

   foo() {
 if [ $# -eq 0 ] ; then
   foo 0
 else
   awk { print `echo [EMAIL PROTECTED] | sed 's/\([0-9]*\)/\\$\1/g'` 
}
 fi
  }

   I leave it to the reader to improve this if so desired.  (-:
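For comparison, here is a simpler sketch of the hypothetical foo that avoids the nested backquotes: it passes the column numbers to awk with -v and relies on awk's default field splitting, which treats any run of spaces or tabs as a single separator (the behavior cut lacks). The names foo and passwd_summary are illustrative only. The second function is an awk analogue of the Perl passwd one-liner in item 2, with field numbers shifted by one because Perl's @F is 0-based and awk's fields are 1-based.

```shell
# Hypothetical "foo N [M ...]" column extractor (a sketch, not Kevin's code).
# awk's default splitting treats any run of spaces/tabs as one separator.
foo() {
    if [ $# -eq 0 ]; then
        set -- 0                      # no arguments: print whole lines ($0)
    fi
    awk -v cols="$*" '
    BEGIN { n = split(cols, want, " ") }
    {
        out = ""
        for (i = 1; i <= n; i++)
            out = out (i > 1 ? OFS : "") $(want[i])
        print out
    }'
}

# awk analogue of the Perl /etc/passwd one-liner in item 2:
# @F[0,5,6] becomes $1, $6, $7 (login name, home directory, shell).
passwd_summary() {
    awk -F: '{ print $1, $6, $7 }' "${1:-/etc/passwd}"
}

ls -l | foo 3      # owner column of ls -l (the "total" line prints empty)
```

With no arguments, foo prints whole lines, mirroring the foo 0 default in item 3; several column numbers are joined with awk's output separator, a single space.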

Hope this helps,

--kevin
-- 
(There are also 228 babies named "Unique" during the 1990s alone,
and 1 each of Uneekm, Uneque, and Uneqqee.)

-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner



Re: Homework problems (was: extract string)

2006-01-10 Thread Jim Kuzdrall
On Tuesday 10 January 2006 04:13 pm, Brian wrote:
 Answer C: Who cares?

All of us will care when the country has to depend on the products 
of today's education system.  Get ready for it.  The standards are so 
incredibly low that these graduates will not even know the buzz words 
of technology.

The professors look the other way on all manner of things that were 
cheating when I got my education.  Copying from other students, 
lifting solutions from the Internet, putting all the test answers on 
one's programmable calculator, and other guarantees of ignorance are 
completely acceptable.

The cruel world will reveal their error when it is too late for them 
or us to recover from it.

It is not just in tech schools.  Can you imagine a cum laude English 
graduate who cannot spell or write a grammatical sentence?   Well, our 
family has just paid for one.

Jim Kuzdrall


Re: Homework problems (was: extract string)

2006-01-10 Thread Travis Roy

Jim Kuzdrall wrote:

On Tuesday 10 January 2006 04:13 pm, Brian wrote:


Answer C: Who cares?



All of us will care when the country has to depend on the products 
of today's education system.  Get ready for it.  The standards are so 
incredibly low that these graduates will not even know the buzz words 
of technology.


All education system debate aside...

How do we, as a list, tell what's a homework problem and what's a legit 
question. And if we start blocking homework questions a cheater will 
just work around that and word their question into something that seems 
like a personal or work related problem rather than a homework problem.



Just let it go, if you think it's somebody cheating then don't answer, 
or give them a vague answer or point them to places where they can learn 
about it rather than copy it off of.



Re: Homework problems (was: extract string)

2006-01-10 Thread Bill McGonigle

On Jan 10, 2006, at 18:05, Travis Roy wrote:

How do we, as a list, tell what's a homework problem and what's a 
legit question.


I think there's little substitute for knowing the membership.  Zhao is 
a programmer for Dartmouth Medical School.


-Bill

-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Cell: 603.252.2606
http://www.bfccomputing.com/Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf



RE: Homework problems (was: extract string)

2006-01-10 Thread Brian
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Jim Kuzdrall
 Sent: Tuesday, January 10, 2006 5:45 PM
 To: gnhlug-discuss@mail.gnhlug.org
 Subject: Re: Homework problems (was: extract string)
 
 
 All of us will care when the country has to depend on the 
 products of today's education system.  Get ready for it.  The 
 standards are so incredibly low that these graduates will not 
 even know the buzz words of technology.
 

Our country, as a whole, has never really depended on the cheaters and
slackers (we'll skip the shooting-fish-in-a-barrel political jokes here).
Those are the people that are sweeping our hallways, hanging bumpers on the
assembly lines, etc.

Our country participates in the World Economy, we trend toward the solutions
that provide the best cost/performance ratio, often looking globally for the
answer.

Our justice system was founded on the idea that "It is better to let 1000
guilty men go free than to convict 1 innocent man."  On a Linux/FOSS
oriented list, newsgroup, etc, I would rather answer 1000 homework
questions, than risk alienating 1 potential comrade.

And, as we've come to find out, the OP was simply asking a legitimate
question that merely appeared homeworky.  I don't have enough time to be
that judgmental and concerned about the impact on the freedom of our country
because someone is too lazy to do their own homework.  Ask a question of me,
if I can I'll answer it, if I can't, I'll try to point you in the right
direction.  If you get Extra Credit for my answer, then you owe me a dollar,
you can paypal it to: [EMAIL PROTECTED]

:)



Re: Homework problems (was: extract string)

2006-01-10 Thread Jeff Kinz
On Tue, Jan 10, 2006 at 04:01:05PM -0500, Ben Scott wrote:
 On 1/10/06, Jeff Kinz [EMAIL PROTECTED] wrote:
  Now your Lug can achieve its financial funding goals simply by charging
  25 cents for each shell scripting homework problem answered and 50 cents
  for extended explanations such as rendered below. :-)
 
   I was wondering if I should raise the "Ya know, this looks an awful
 lot like a homework problem to me..." question.  But I also considered
 the following:
 

My homework business model was simply a tongue in cheek comment
'cause I was leaving and didn't have time to add anything
substantive to the thread.

   Assume it is a homework problem.  Does it make a real difference
 whether the student learns the material from the text book, this list,
 or some random web page found via Google?
 

I've always found all of the above to be useful tools for
learning.   

Well, not always - Google hit a dry spell from 1973 to 1997 
or thereabouts   :-)  

umm - wait a minute... 

(Googles for Google founding date.. )  1998.


   And if the student hands in a Perl one-liner in a basic class on
 shell scripting, the resulting student/instructor discussion will
 doubtless be very educational.

heh heh, very!

-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding. - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco


Re: extract string

2006-01-10 Thread Paul Lussier
Ben Scott [EMAIL PROTECTED] writes:

 On 1/10/06, Jon maddog Hall [EMAIL PROTECTED] wrote:
 I was the senior systems administrator for Bell Labs in North Andover, MA.  I
 got the job without ever having seen a UNIX system.

   Well, really.  How many people *had* seen a UNIX system, back then?  ;-)

Two; *Kernighan* and *Ritchie* ;)  This *was* Bell Labs, right ;)

-- 

Seeya,
Paul


Re: Homework problems (was: extract string)

2006-01-10 Thread Thomas Charron
On 1/10/06, Bill McGonigle [EMAIL PROTECTED] wrote:
 On Jan 10, 2006, at 18:05, Travis Roy wrote:
  How do we, as a list, tell what's a homework problem and what's a
  legit question.
 I think there's little substitute for knowing the membership.  Zhao is
 a programmer for Dartmouth Medical School.

 For, or attending? ;-)

 A programmer that doesn't know how to grep and split text strings..

 Well.. Isn't..

 Thomas


Re: Homework problems (was: extract string)

2006-01-10 Thread Ben Scott
On 1/10/06, Thomas Charron [EMAIL PROTECTED] wrote:
A programmer that doesn't know how to grep and split text strings..

  Believe it or not, there are environments *other* than *nix, and a
great many well-qualified professionals have never touched *nix.  I
don't just mean doze, either.  Classic Mac, VMS, the various IBM
mainframe and mini systems, and other, less well-known worlds have
syntax and tools all their own.  I, for one, think we should be
welcoming to newcomers to the *nix world -- not scold them for being
new.

-- Ben "Used to be a DOS weenie" Scott


Re: Homework problems (was: extract string)

2006-01-10 Thread Jeff Kinz
On Tue, Jan 10, 2006 at 07:56:46PM -0500, Thomas Charron wrote:
 On 1/10/06, Bill McGonigle [EMAIL PROTECTED] wrote:
 
  On Jan 10, 2006, at 18:05, Travis Roy wrote:
   How do we, as a list, tell what's a homework problem and what's a
   legit question.
  I think there's little substitute for knowing the membership.  Zhao is
  a programmer for Dartmouth Medical School.
 
 
   For, or attending?  ;-)

Hopefully things haven't gotten so bad that programmers are now
attending medical school for their next career move.  ;-)

-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding. - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco


Re: Homework problems (was: extract string)

2006-01-10 Thread Christopher Schmidt
On Tue, Jan 10, 2006 at 07:56:46PM -0500, Thomas Charron wrote:
   A programmer that doesn't know how to grep and split text strings..
 
   Well..  Isn't..

I know of several ways to do it, but none of them would have worked as
well as the cut solution presented here. I've been working on Linux as
my primary platform for 2.5 years, I've been coding in various languages
for 5.

I'm relatively intelligent, know how to use awk, grep, and sed.

Considering the huge number of programmers who are doomed to forever
live and work in a GUI-only MSVC++ (or whatever it's called) without the
tools such as sed, grep and awk, I'd say I'm in the top 50% as far as
knowledge goes for programmers -- and I think I'm probably being
relatively modest.

The lack of knowledge of a simple command line tool to do what you want
it to does not indicate whether someone is a programmer or not. It
simply indicates one thing -- their level of experience with core *nix
tools. Lack of that is not an indication of deficiencies in their
ability to program.

I'm assuming that your post was made with tongue in cheek, but I think
it's a ridiculous statement and decided to do what all good people on
the internet do: blow it out of proportion in a rant on a mailing list
that few will ever care about. (I think I'm supposed to call you Hitler
now or something. Godwin told me that once.)

-- 
Christopher Schmidt
Web Developer




Re: Homework problems (was: extract string)

2006-01-10 Thread Bill McGonigle


On Jan 10, 2006, at 20:16, Christopher Schmidt wrote:


The lack of knowledge of a simple command line tool to do what you want
it to does not indicate whether someone is a programmer or not. It
simply indicates one thing -- their level of experience with core *nix
tools. Lack of that is not an indication of deficiencies in their
ability to program.


Right - Zhao is pretty new to unix and linux.  For those following 
along, notice he didn't say, 'is there any way to do this' but 'what's 
the most efficient way to do this'?  (paraphrasing).


What matters is not whether one has achieved enlightenment but rather 
that one is on the path to enlightenment.


-Bill
-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Cell: 603.252.2606
http://www.bfccomputing.com/Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf



Re: Homework problems (was: extract string)

2006-01-10 Thread Jeff Kinz
On Tue, Jan 10, 2006 at 08:16:47PM -0500, Christopher Schmidt wrote:
 On Tue, Jan 10, 2006 at 07:56:46PM -0500, Thomas Charron wrote:
A programmer that doesn't know how to grep and split text strings..
  
Well..  Isn't..
 
 I know of several ways to do it, but none of them would have worked as
 well as the cut solution presented here. I've been working on Linux as
 my primary platform for 2.5 years, I've been coding in various languages
 for 5.
 
 I'm relatively intelligent, know how to use awk, grep, and sed.
 
 Considering the huge number of programmers who are doomed to forever
 live and work in a GUI-only MSVC++ (or whatever it's called) without the
 tools such as sed, grep and awk, I'd say I'm in the top 50% as far as
 knowledge goes for programmers -- and I think I'm probably being
 relatively modest.
 
 The lack of knowledge of a simple command line tool to do what you want
 it to does not indicate whether someone is a programmer or not. It
 simply indicates one thing -- their level of experience with core *nix
 tools. Lack of that is not an indication of deficiencies in their
 ability to program.


Easily fixed, All we need is the appropriate man page:..

http://ars.userfriendly.org/cartoons/?id=19990216  ;-)

I have that one on the cover of my Intro to Linux slides



-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding. - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco


Re: Homework problems (was: extract string)

2006-01-10 Thread Jim Kuzdrall
On Tuesday 10 January 2006 06:05 pm, Travis Roy wrote:

 Just let it go, if you think it's somebody cheating then don't
 answer, or give them a vague answer or point them to places where
 they can learn about it rather than copy it off of.

That is my technique too.  I get to answer a lot of questions about 
IR optics and detector physics on my web site.  If it appears I am 
doing someone's term project, I start outlining how to get the answers 
to the problem rather than giving the answers.

I must confess, though, that I have a hard time keeping my mouth 
(keyboard) shut if I know the answer.

There have always been cheaters, but the American culture has 
changed dramatically in the last 40 years.  Claiming the work of 
others as your own has become completely accepted.

Jim Kuzdrall




Re: extract string -- TIMTOWTDI

2006-01-10 Thread William D Ricker
 On 1/10/06, Paul Lussier [EMAIL PROTECTED] wrote:
   perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' abc.txt > def.txt
  Egads!

That's a literal start at a Perl "bring the grep, sed and cut-or-awk
into one process", but it's not maximally Perl-ish.  It is also
inefficient: it does the side-effecting " removal before discarding
non-univs. It is a literal translation of a `cut | sed | grep` pipe,
not of the `cut|grep|sed` pipe shown earlier.  Of course, if the
requirements allowed it, a `grep|cut|sed` pipe would be the best
shell implementation -- but that relies upon knowing all 'univ'
are in the desired column, which we haven't been granted.

If it weren't for the desire to drop the quotes, Perl couldn't beat
Cut's golf-score (key stroke count) on this one anyway, but we can
try to optimize expressivity while saving two of three process forks.

The Perl Motto is TIMTOWTDI: There Is More Than One Way To Do It.
(We've already seen that this is often true for BASH too.)  This is
usually a good thing, as often some are better for some requirements
than for others.

For generalness in real code, I'd like to explicitly ignore the header
line on this sort of CSV file:

$ perl -F, -lane 'next if $.==1 or $F[-1] !~/univ/; print $F[-1]=~m/"(.*)"/;' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$

As with one prior posting, the '-naF,' args cause Perl to auto-split on
',' into @F on each line.  I normally use '-F, -lane' on
one-liners, since it's memorable.

The '$.==1 or' is not strictly required since the top line of the
sample file had "school", not "university", for the column head, but
if it had university or school or school/univ on line 1, it would
be required.

$F[-1] is Perl's equivalent to AWK's $NF, referring to the last
column instead of by number. (Referring by number is notoriously
error-prone with 0-based field counting.)  $F[-2] means last-but-one,
etc., too, and you can slice with them as @F[-6..-2].
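Several of these Perl idioms map directly onto awk. A rough awk sketch (not Bill's code; the function name univ_rows and the sample rows are hypothetical) that skips the header the way $.==1 does, tests the last field with $NF like $F[-1], and reads the next-to-last field with $(NF-1), the counterpart of $F[-2]. The header in the sample deliberately says university to show why the header test can matter.

```shell
# awk counterparts (a sketch): NR > 1 skips the header like $.==1,
# $NF is the last field like $F[-1], and $(NF-1) is last-but-one like $F[-2].
univ_rows() {
    awk -F, 'NR > 1 && $NF ~ /univ/ {
        age = $(NF - 1); school = $NF
        gsub(/"/, "", age); gsub(/"/, "", school)   # drop the quotes
        print age ": " school
    }' "$@"
}

# Hypothetical sample; the header says "university", so without NR > 1
# it would slip through the /univ/ test.
printf '%s\n' '"name","age","university"' '"jack","18","univ of Penn"' \
    '"jesse","","","Dartmouth college"' | univ_rows
```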

Rather than remove the "s with s///g, I've captured what's between
them and printed that.

We can make more use of -F ... we'll split on all the punctuation.

$ perl -F'/^"|","|"$/' -lane 'next if $.==1 or $F[-1] !~/univ/i; print $F[-1]' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$
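A rough awk counterpart of splitting on the quote-comma-quote punctuation (a sketch, not something Bill posted; the name univ_quoted is hypothetical): strip the line's leading and trailing quote, then use "," (quote, comma, quote) as the field separator. tolower() stands in for Perl's /i flag. It assumes every field is quoted.

```shell
# Assumes every field is double-quoted, e.g. "jack","18","univ of Penn".
univ_quoted() {
    awk -F'","' 'NR > 1 {
        sub(/^"/, ""); sub(/"$/, "")    # editing $0 re-splits the fields
        if (tolower($NF) ~ /univ/) print $NF
    }' "$@"
}

printf '%s\n' '"name","age","school"' '"jerry ","21","Univ of Vermont"' \
    '"jesse","","","Dartmouth college"' | univ_quoted
```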

Of course, in some CSV files the "s are optional, in which case we
can do

$ perl -F'/^"|"?\s*,"?|"$/' -lane 'next if $.==1 or $F[-1] !~/univ/i; print $F[-1]' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$
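When the quotes are optional, a comparable awk sketch (the name univ_loose is hypothetical) can make them optional in the separator regex as well. Here "?[ \t]*,"? plays the role of Perl's "?\s*,"?, since portable awk regexes have no \s, and the line's leading and trailing quotes are removed before the fields are re-split.

```shell
# Quotes optional; blanks before a comma are absorbed into the separator.
univ_loose() {
    awk -F'"?[ \t]*,"?' 'NR > 1 {
        sub(/^"/, ""); sub(/"$/, "")    # editing $0 re-splits the fields
        if (tolower($NF) ~ /univ/) print $NF
    }' "$@"
}

printf '%s\n' 'name,age,school' 'jerry ,21,univ of Vermont' \
    '"jack","18","univ of Penn"' | univ_loose
```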

Alternatively, to print **any** quoted phrase containing univ,
whether in last column or not, using the commas ..

$ perl -F, -lane 'for (@F){s/"//g; print if /univ/i}' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$
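The same "any field, not just the last" idea as an awk sketch (the name any_univ_field is hypothetical): loop over every comma-separated field, drop its quotes, and print it if it mentions univ in any letter case.

```shell
# Print any comma-separated field that mentions "univ" (any case),
# with its double quotes removed.
any_univ_field() {
    awk -F, '{
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(/"/, "", f)
            if (tolower(f) ~ /univ/) print f
        }
    }' "$@"
}

printf '%s\n' '"jerry ","21","univ of Vermont"' '"univ of Penn","x"' | any_univ_field
```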

or, ignoring the commas, just use the quotes to capture between quotes,
but only if there's a univ between.  I started sneaking in a /i flag
to be case insensitive above, and I'll continue here ...

$ perl -lane 'print for m{ " ( [^"]*? univ [^"]* ) " }xig' schools.txt
univ of Vermont
univ of Penn
univ of south Florida
$

[the ? isn't required but it should help efficiency.]

or

$ perl -lane 'print for grep {/univ/i} m{"([^"]*)"}g' schools.txt
univ of Vermont
univ of Penn
univ of south Florida

There's also a CPAN module or two for processing CSV files that
handle the commas and quotes in CSV files ...
  http://search.cpan.org/search?query=Text%3A%3ACSV&mode=all
Your Linux distro should have Text::CSV_XS as an apt/yum/rpm/...
module option, or grab it from CPAN and build it. (It has an XS =
.c module, so it is ripping fast, but has to be make'd.)


None of this is seriously obfuscatory golfing, but if someone wanted to
say "darn the cost of forking new processes off bash, 'awk/cut|grep|sed'
is easier to read", well, I won't argue that it's easier for him/her
to read, and they should do it that way -- unless they need to tune
for performance.


-- 
/\ Bill Ricker  N1VUX  [EMAIL PROTECTED]
\ / http://world.std.com/~wdr/   
 X  Member of the ASCII Ribbon Campaign Against HTML Mail
/ \


Re: extract string

2006-01-10 Thread Zhao Peng

Hi All,

First, I really could not be more grateful for the answers to my question 
from all of you; I appreciate your help and time. I'm especially touched 
by the outpouring of responses on this list, which I have never 
experienced before anywhere else.


Secondly, I'm sorry for the big stir-up about homework problems that 
flooded the list, since I started it.


Kenny, your grep univ abc.txt | cut -f3 -d, | sed s/\"//g > dev.txt 
works. I misread /\ as the similar-looking sign on top of the 6 key on the 
keyboard (so when I typed that sign, I felt strange that it is much 
smaller than /\, but didn't realize that they just are not the same 
thing), instead of a forward slash and a backslash. I felt really 
embarrassed by my stupid mistake. //blush


Kenny, regarding missing column issue, let me try to explain it again. 
Below is quoted from my original post:



Also, if one column is missing, and "," is used to indicate that missing 
column, like the following (2nd column of 3rd line is missing):

name,age,school
jerry ,21,univ of Vermont
jesse,,,Dartmouth college
jack,18,univ of Penn
john,20,univ of south Florida
===

You said that there is an extra column in the 3rd line. I disagree 
with you from my perspective. As you can see, there are 3 commas in 
between jesse and Dartmouth college. Of these 3 commas, again, if 
we think of the 2nd one as merely an indication that the value for the age 
column is missing, then the 3rd line will be read as [jesse, 
MISSING, Dartmouth college], not [jesse, empty, empty, Dartmouth 
college] as you suggested.
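Whichever reading is intended, cut and awk simply count delimiters, so they see four fields on the jesse line and three on the others. A quick sketch (the name odd_rows is hypothetical) for spotting such rows before loading the file, by comparing each line's field count (NF) with the header's. A quoted value that contains a comma would be flagged as well.

```shell
# Report rows whose comma-separated field count differs from the header's.
odd_rows() {
    awk -F, 'NR == 1 { n = NF; next }
             NF != n { print "line " NR ": " NF " fields (header has " n ")" }' "$@"
}

printf '%s\n' 'name,age,school' 'jerry ,21,univ of Vermont' \
    'jesse,,,Dartmouth college' 'jack,18,univ of Penn' | odd_rows
```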


Paul, as to your "simplest by what measurement?" question: I was thinking 
of both easiest to remember and easiest to understand when I was 
posting my question. Now I would like the most efficient approach. I know 
that will be my homework.


BTW,
A bit about me: I'm a junior SAS programmer at Dartmouth Medical School. 
(FYI: the core strength of SAS lies in statistical analysis, I think, so you 
could say it's statistical software; check www.sas.com). We run SAS on 
a RedHat server, but I basically knew nothing about Linux before I 
started working in this position (July 2005). Fortunately, SAS 
programming doesn't require much Linux knowledge. However, as you can 
imagine, I at least need to know some basic Linux commands since I work 
on a Linux platform.


Part of my primary job responsibilities is to convert raw data into SAS 
data sets. My extract string question comes from processing a raw data 
file in .txt format, which doesn't have any documentation except the 
variable list. By looking at the raw data, I know that each variable is 
separated by a comma. For one particular variable (column) called 
school, some of its values are quite long (like: Univ of 
Wisconsin at Madison, Health Sci Ctr), but I don't know the maximum 
length. I need to know it, because if the length I specify is not 
enough, only partial values will be read. Many of its values contain 
univ, so I thought that if I could extract all strings containing 
univ from that variable (column), I would have a better chance of figuring 
out the length of school. That's why I had this question.
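Since the real goal is the maximum length of school, awk can compute it directly. This is a sketch under two assumptions that the thread does not confirm: school is the last field, and it is wrapped in double quotes, so an embedded comma like the one in "Univ of Wisconsin at Madison, Health Sci Ctr" stays inside the value (a plain comma split would cut it in two). The function name max_school_len and the sample rows are hypothetical.

```shell
# Assumes school is the last field and is double-quoted on every data line.
max_school_len() {
    awk 'NR > 1 && match($0, /"[^"]*"$/) {
        v = substr($0, RSTART + 1, RLENGTH - 2)   # text between the quotes
        if (length(v) > max) { max = length(v); longest = v }
    }
    END { print max + 0 ": " longest }' "$@"
}

printf '%s\n' '"name","age","school"' \
    '"ann","30","Univ of Wisconsin at Madison, Health Sci Ctr"' \
    '"bob","25","univ of Penn"' | max_school_len
```

The number printed is the length, in characters, of the longest school value, followed by that value; it could then serve as the length specified when reading the column into SAS.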


Thank you all again!

Zhao