Re: Convert AWK regex to Python

2011-05-17 Thread J
Hello,

I have managed to get my script finished in the end by taking bits from 
everyone who answered.  Thank you so much.  The finished script looks 
like this (still not the best, but it gets the job done; once I have learned 
more Python I will probably go back and rewrite it):-

# Log file to work on
filetoread = open("/tmp/pdu.log", "r")
# Perform filtering in the log file
text = filetoread.read()
text = text.replace("G_", "")
text = text.replace(".", " ")
text = text.replace("(", " ")
filetoread.close()
# File to write output to
filetowrite = open("/tmp/pdu_filtered.log", "w")
# Write new log file
filetowrite.write(text)
filetowrite.close()
# Read new log and get required fields from it
filtered_log = open("/tmp/pdu_filtered.log", "r")
filtered_line = filtered_log.readlines()
for line in filtered_line:
    field = line.split(" ")
    field5 = field[5].rsplit("_", 1)
    print field5[0], field[14], field[22]
filtered_log.close()
print "Done"
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convert AWK regex to Python

2011-05-17 Thread AlienBaby
On May 17, 11:07 am, J <jnr.gonza...@googlemail.com> wrote:
> Hello,
>
> I have managed to get my script finished in the end by taking bits from
> everyone who answered.  Thank you so much.  The finished script looks
> like this (still not the best, but it gets the job done; once I have learned
> more Python I will probably go back and rewrite it):-
>
> # Log file to work on
> filetoread = open("/tmp/pdu.log", "r")
> # Perform filtering in the log file
> text = filetoread.read()
> text = text.replace("G_", "")
> text = text.replace(".", " ")
> text = text.replace("(", " ")
> filetoread.close()
> # File to write output to
> filetowrite = open("/tmp/pdu_filtered.log", "w")
> # Write new log file
> filetowrite.write(text)
> filetowrite.close()
> # Read new log and get required fields from it
> filtered_log = open("/tmp/pdu_filtered.log", "r")
> filtered_line = filtered_log.readlines()
> for line in filtered_line:
>     field = line.split(" ")
>     field5 = field[5].rsplit("_", 1)
>     print field5[0], field[14], field[22]
> print "Done"

You can also process the lines and write them out to the new logfile
as you read them in the first time around, rather than reading them in,
processing them, writing them out, and then reading them in and
processing them all over again:

log_file = open(old_log_file, "r")
output_file = open(new_log_file, "w")
for line in log_file:
    line = line.replace("G_", "").replace(".", " ").replace("(", " ")
    # split on single spaces so the indices match the script quoted above
    tokens = line.split(" ")
    tokens_5 = tokens[5].rsplit("_", 1)
    output_file.write('%s %s %s\n' % (tokens_5[0], tokens[14], tokens[22]))
output_file.close()
log_file.close()
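
For readers on Python 3, the same single-pass idea might be sketched as below. The names extract_fields, filter_log and SAMPLE are made up for this illustration, and the sample is the log line from this thread joined back onto one line, which is an assumption about how it looked before the mail wrapped it.

```python
def extract_fields(line):
    """Return (service, command, status) from one raw log line.

    Assumes every line has the layout of the sample in this thread,
    so the service sits in field 5 once the line is split on spaces.
    """
    cleaned = line.replace("G_", "").replace(".", " ").replace("(", " ")
    fields = cleaned.split(" ")
    service = fields[5].rsplit("_", 1)[0]  # drop the trailing "_656" part
    return service, fields[14], fields[22]


def filter_log(in_path, out_path):
    """Read in_path once and write one 'service command status' row per line."""
    with open(in_path) as log_file, open(out_path, "w") as output_file:
        for line in log_file:
            output_file.write("%s %s %s\n" % extract_fields(line))


# Sample line from this thread, joined onto a single line (assumption).
SAMPLE = (
    "2011-05-16 09:46:22,361 [Thread-4847133] PDU D "
    "G_CC_SMS_SERVICE_51408_656.O_CC_SMS_SERVICE_51408_656-ServerThread-"
    "VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX "
    "- 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 8004 "
    "Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) "
)
print(extract_fields(SAMPLE))
```

The with statement closes both files even if a malformed line raises an IndexError part way through.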


Re: Convert AWK regex to Python

2011-05-17 Thread harrismh777

J wrote:

> Hello,


Hello, J,

   This is totally off-topic, but I was wondering why you are posting 
double (even triple) messages all over the place?


   Your reply-to is set to comp.lang.pyt...@googlegroups.com, and you 
cc to python-list@python.org...  and your stuff is showing up in 
newsgroup comp.lang.python...


... did you know that all you need to do is use the newsgroup (use a 
client like SeaMonkey, or other... ) and your posts will show up in 
googlegroups, and will also be archived forever automatically?


   Your messages are duplicating, which is not only annoying on the 
surface, but also breaks the threads apart in a news client. What should 
appear (and does in googlegroups) as a single thread appears as many 
threads in the mail client; partly because of the RE: in the subject, 
and partly because of the duplication.


   Just a heads up


kind regards,
m harris


Convert AWK regex to Python

2011-05-16 Thread J
Good morning all,
Wondering if you could please help me with the following query:-
I just started learning Python last weekend after a colleague of mine 
showed me how to dramatically cut the time a Bash script takes to execute by 
re-writing it in Python.  I was amazed at how fast it ran.  I would now like to 
do the same thing with another script I have.

This other script reads a log file and using AWK it filters certain fields from 
the log and writes them to a new file.  See below the regex the script is 
executing.  I would like to re-write this regex in Python as my script is 
currently taking about 1 hour to execute on a log file with about 100,000 
lines.  I would like to cut this time down as much as possible.

cat logs/pdu_log_fe.log | awk -F\- '{print $1,$NF}' | awk -F\. '{print $1,$NF}' 
| awk '{print $1,$4,$5}' | sort | uniq | while read service command status; do 
echo Service: $service, Command: $command, Status: $status, Occurrences: `grep 
$service logs/pdu_log_fe.log | grep $command | grep $status | wc -l | awk '{ 
print $1 }'` >> logs/pdu_log_fe_clean.log; done

This AWK command gets lines which look like this:-

2011-05-16 09:46:22,361 [Thread-4847133] PDU D G_CC_SMS_SERVICE_51408_656.O_ 
CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
 - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 8004 Status: 0 
SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) 

And outputs lines like this:-

CC_SMS_SERVICE_51408 submit_resp: 0

I have tried writing the Python script myself but I am getting stuck writing 
the regex.  So far I have the following:-

#!/usr/bin/python

# Import RegEx module
import re as regex
# Log file to work on
filetoread = open('/tmp/ pdu_log.log', "r")
# File to write output to
filetowrite = file('/tmp/ pdu_log_clean.log', "w")
# Perform filtering in the log file
linetoread = filetoread.readlines()
for line in linetoread:
    filter0 = regex.sub(r"G_", "", line)
    filter1 = regex.sub(r"\.", " ", filter0)
    # Write new log file
    filetowrite.write(filter1)
filetowrite.close()
# Read new log and get required fields from it
filtered_log = open('/tmp/ pdu_log_clean.log', "r")
filtered_line = filtered_log.readlines()
for line in filtered_line:
    token = line.split(" ")
    print token[0], token[1], token[5], token[13], token[20]
print "Done"

Ugly I know but please bear in mind that I have just started learning Python 
two days ago.

I have been looking on this group and on the Internet for snippets of code that 
I could use, but so far what I have found either does not fit my needs or is too 
complicated (at least for me).

Any suggestions or advice you can give me on how to accomplish this task will be 
greatly appreciated.

On another note, can you also recommend a good no-nonsense book to learn 
Python?  I have read the book “A Byte of Python” by Swaroop C H (great 
introductory book!) and I am now reading “Dive into Python” by Mark Pilgrim.  I 
am looking for a book that explains things in simple terms and goes straight to 
the point (similar to how “A Byte of Python” was written)

Thanks in advance

Kind regards,

Junior


Re: Convert AWK regex to Python

2011-05-16 Thread Chris Angelico
On Mon, May 16, 2011 at 6:19 PM, J <jnr.gonza...@googlemail.com> wrote:
> cat logs/pdu_log_fe.log | awk -F\- '{print $1,$NF}' | awk -F\. '{print
> $1,$NF}' | awk '{print $1,$4,$5}' | sort | uniq | while read service command
> status; do echo Service: $service, Command: $command, Status: $status,
> Occurrences: `grep $service logs/pdu_log_fe.log | grep $command | grep
> $status | wc -l | awk '{ print $1 }'` >> logs/pdu_log_fe_clean.log; done

Small side point: Instead of | sort | uniq |, you could use a Python
dictionary. That'll likely speed things up somewhat!

Chris Angelico


Re: Convert AWK regex to Python

2011-05-16 Thread J
Good morning Angelico,
Do I understand correctly? Do you mean incorporating a Python dict inside the 
AWK command? How can I do this?


Re: Convert AWK regex to Python

2011-05-16 Thread Chris Angelico
On Mon, May 16, 2011 at 6:43 PM, J <jnr.gonza...@googlemail.com> wrote:
> Good morning Angelico,
> Do I understand correctly? Do you mean incorporating a Python dict inside the
> AWK command? How can I do this?

No, inside Python. What I mean is that you can achieve the same
uniqueness requirement by simply storing the intermediate data in a
dictionary and then retrieving it at the end.
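
A minimal sketch of that idea, written for Python 3: the dictionary keys do the de-duplication that sort | uniq did in the shell. The helper name unique_rows and the sample triples (including the "submit_sm:" command) are invented for illustration and are not taken from the real log.

```python
def unique_rows(rows):
    """Return each distinct row once, in first-seen order."""
    seen = {}
    for row in rows:
        seen[row] = True  # storing the same key again keeps a single entry
    return list(seen)


# Hypothetical (service, command, status) triples standing in for parsed lines.
rows = [
    ("CC_SMS_SERVICE_51408", "submit_resp:", "0"),
    ("CC_SMS_SERVICE_51408", "submit_sm:", "0"),
    ("CC_SMS_SERVICE_51408", "submit_resp:", "0"),
]

# Retrieve the stored data at the end, sorted as "sort" would have done.
for service, command, status in sorted(unique_rows(rows)):
    print(service, command, status)
```

A set would work just as well when only uniqueness matters; a dictionary becomes the better fit once a value such as a count has to be kept per key.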

Chris Angelico


Re: Convert AWK regex to Python

2011-05-16 Thread Peter Otten
J wrote:

> Good morning all,
> Wondering if you could please help me with the following query:-
> I just started learning Python last weekend after a colleague of mine
> showed me how to dramatically cut the time a Bash script takes to execute
> by re-writing it in Python.  I was amazed at how fast it ran.  I would now
> like to do the same thing with another script I have.
>
> This other script reads a log file and using AWK it filters certain fields
> from the log and writes them to a new file.  See below the regex the
> script is executing.  I would like to re-write this regex in Python as my
> script is currently taking about 1 hour to execute on a log file with
> about 100,000 lines.  I would like to cut this time down as much as
> possible.
>
> cat logs/pdu_log_fe.log | awk -F\- '{print $1,$NF}' | awk -F\. '{print
> $1,$NF}' | awk '{print $1,$4,$5}' | sort | uniq | while read service
> command status; do echo Service: $service, Command: $command, Status:
> $status, Occurrences: `grep $service logs/pdu_log_fe.log | grep $command |
> grep $status | wc -l | awk '{ print $1 }'` >> logs/pdu_log_fe_clean.log;
> done
>
> This AWK command gets lines which look like this:-
>
> 2011-05-16 09:46:22,361 [Thread-4847133] PDU D
> G_CC_SMS_SERVICE_51408_656.O_
> CC_SMS_SERVICE_51408_656-ServerThread-
> VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
> - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 8004
> Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) )
>
> And outputs lines like this:-
>
> CC_SMS_SERVICE_51408 submit_resp: 0
>
> I have tried writing the Python script myself but I am getting stuck
> writing the regex.  So far I have the following:-

For the moment forget about the implementation. The first thing you should 
do is to describe the problem as clearly as possible, in plain English.




Re: Convert AWK regex to Python

2011-05-16 Thread J
Hello Peter, Angelico,

OK, let's see. My aim is to filter out several fields from a log file and write 
them to a new log file.  The current log file, as I mentioned previously, has 
thousands of lines like this:-
2011-05-16 09:46:22,361 [Thread-4847133] PDU D G_CC_SMS_SERVICE_51408_656.O_ 
CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
 - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 8004 Status: 0 
SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) 

All the lines in the log file are similar and they all have the same length 
(same number of fields).  Most of the fields are separated by spaces, except for 
a couple of them which I am processing with AWK (removing G_ from the string, 
for example).  So in essence what I want to do is evaluate each line in the log 
file and break it down into fields which I can refer to individually and write 
to a new log file (for example selecting only fields 1, 2 and 3).

I hope this is clearer now

Regards,

Junior


Re: Convert AWK regex to Python

2011-05-16 Thread Steven D'Aprano
On Mon, 16 May 2011 03:57:49 -0700, J wrote:

> Most of the fields are separated by
> spaces, except for a couple of them which I am processing with AWK
> (removing G_ from the string, for example).  So in essence what I want
> to do is evaluate each line in the log file and break it down into
> fields which I can refer to individually and write to a new log file
> (for example selecting only fields 1, 2 and 3).

fields = line.split(' ')
output.write(fields[1] + ' ')
output.write(fields[2] + ' ')
output.write(fields[3] + '\n')
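
One detail worth checking before settling on field numbers is how the line is split. The sketch below uses a made-up fragment with doubled spaces (the variable names are for illustration only) to show that split(' ') and split() number the fields differently.

```python
# Hypothetical fragment: doubled spaces like these appear once "(" has been
# replaced by a space in the real log lines.
line = "OUT -  submit_resp:  pdu:"

by_space = line.split(" ")  # every single space is a separator
by_blank = line.split()     # any run of whitespace counts as one separator

print(by_space)
print(by_blank)
```

With split(' ') each doubled space produces an empty field, so every later index shifts by one per doubled space compared with split().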



-- 
Steven


Re: Convert AWK regex to Python

2011-05-16 Thread Peter Otten
J wrote:

> Hello Peter, Angelico,
>
> OK, let's see. My aim is to filter out several fields from a log file and
> write them to a new log file.  The current log file, as I mentioned
> previously, has thousands of lines like this:- 2011-05-16 09:46:22,361
> [Thread-4847133] PDU D G_CC_SMS_SERVICE_51408_656.O_
> CC_SMS_SERVICE_51408_656-ServerThread-
> VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
> - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 8004
> Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) )
>
> All the lines in the log file are similar and they all have the same
> length (same number of fields).  Most of the fields are separated by
> spaces, except for a couple of them which I am processing with AWK (removing
> G_ from the string, for example).  So in essence what I want to do is
> evaluate each line in the log file and break it down into fields which I
> can refer to individually and write to a new log file (for example
> selecting only fields 1, 2 and 3).
>
> I hope this is clearer now

Not much :( 

It doesn't really matter whether there are 100, 1000, or a million lines in 
the file; the important information is the structure of the file. You may be 
able to get away with a quick and dirty script consisting of just a few 
regular expressions, e. g.

import re

filename = ...

def get_service(line):
    # service name: the word that follows "G_"
    return re.compile(r"G_(\w+)").search(line).group(1)

def get_command(line):
    # command: the word that follows the first "("
    return re.compile(r"[(](\w+)").search(line).group(1)

def get_status(line):
    # status: the digits that follow "Status:"
    return re.compile(r"Status:\s+(\d+)").search(line).group(1)

with open(filename) as infile:
    for line in infile:
        print get_service(line), get_command(line), get_status(line)

but there is no guarantee that there isn't data in your file that breaks the 
implied assumptions. Also, from the shell hackery it looks like your 
ultimate goal is a kind of frequency table, which could be built 
along these lines:

freq = {}
with open(filename) as infile:
    for line in infile:
        service = get_service(line)
        command = get_command(line)
        status = get_status(line)
        key = service, command, status
        freq[key] = freq.get(key, 0) + 1

for key, occurrences in sorted(freq.iteritems()):
    print "Service: {}, Command: {}, Status: {}, Occurrences: {}".format(
        *key + (occurrences,))



Re: Convert AWK regex to Python

2011-05-16 Thread J
Thanks for the suggestions, Peter. I will give them a try.



Re: Convert AWK regex to Python

2011-05-16 Thread Giacomo Boffi
J <jnr.gonza...@googlemail.com> writes:

> cat logs/pdu_log_fe.log | awk -F\- '{print $1,$NF}' | awk -F\. '{print
> $1,$NF}' | awk '{print $1,$4,$5}' | sort | uniq | while read service command
> status; do echo Service: $service, Command: $command, Status: $status,
> Occurrences: `grep $service logs/pdu_log_fe.log | grep $command | grep
> $status | wc -l | awk '{ print $1 }'` >> logs/pdu_log_fe_clean.log; done
>
> This AWK command gets lines which look like this:-
>
> 2011-05-16 09:46:22,361 [Thread-4847133] PDU D G_CC_SMS_SERVICE_51408_656.O_
> CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
>  - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 8004 Status:
> 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) )
>
> And outputs lines like this:-
>
> CC_SMS_SERVICE_51408 submit_resp: 0


i see some discrepancies in the description of your problem

1. if i echo a properly quoted line like this above in the pipeline
   formed by the first three awk commands i get

$ echo $likethis | awk -F\- '{print $1,$NF}' \
 | awk -F\. '{print$1,$NF}'  \
 | awk '{print $1,$4,$5}'
2011 ) )
$ 
   not a triple 'service command status'

2. with regard to the final product, your script outputs lines like in

echo Service: $service, [...]

   and you say that it produces lines like

CC_SMS_SERVICE_51408 submit_resp: 


WHATEVER, the enormous run time is due to the fact that for every
output line you rescan the whole log file again and again

IF i had understood what you want, imho you should run your data
through sort and uniq -c

$ awk -F\- '{print $1,$NF}'  $file \
| awk -F\. '{print$1,$NF}'  \
| awk '{print $1,$4,$5}'| sort | uniq -c | format_program

uniq -c drops repeated lines from a sorted input AND prepends to each
line the count of equal lines in the original stream
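
For comparison, a rough Python 3 counterpart of sort | uniq -c can be built on collections.Counter from the standard library. This is only a sketch: count_rows is a name made up here, and the triples are invented stand-ins for whatever the awk stages would emit.

```python
from collections import Counter


def count_rows(rows):
    """Count identical rows in one pass over the data."""
    return Counter(rows)


# Hypothetical (service, command, status) triples, one per log line.
rows = [
    ("CC_SMS_SERVICE_51408", "submit_resp:", "0"),
    ("CC_SMS_SERVICE_51408", "submit_sm:", "0"),
    ("CC_SMS_SERVICE_51408", "submit_resp:", "0"),
]

# Sorting the items mirrors the ordering that "sort" gives before "uniq -c".
for (service, command, status), n in sorted(count_rows(rows).items()):
    print("Service: %s, Command: %s, Status: %s, Occurrences: %d"
          % (service, command, status, n))
```

Like uniq -c, this reads the data once, so the cost grows with the number of lines rather than with lines times distinct rows.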

hth
g


Re: Convert AWK regex to Python

2011-05-16 Thread Matt Berends
This doesn't directly bear upon the posted example, but I found the
following tutorial extremely helpful for learning how to parse log
files with idiomatic Python. Maybe you'll find it useful, too.

http://www.dabeaz.com/generators/

http://www.dabeaz.com/generators/Generators.pdf


Re: Convert AWK regex to Python

2011-05-16 Thread MRAB

On 16/05/2011 09:19, J wrote:
[snip]

> #!/usr/bin/python
>
> # Import RegEx module
> import re as regex
> # Log file to work on
> filetoread = open('/tmp/ pdu_log.log', "r")
> # File to write output to
> filetowrite = file('/tmp/ pdu_log_clean.log', "w")
> # Perform filtering in the log file
> linetoread = filetoread.readlines()
> for line in linetoread:
>     filter0 = regex.sub(r"G_", "", line)
>     filter1 = regex.sub(r"\.", " ", filter0)
>     # Write new log file
>     filetowrite.write(filter1)
> filetowrite.close()
> # Read new log and get required fields from it
> filtered_log = open('/tmp/ pdu_log_clean.log', "r")
> filtered_line = filtered_log.readlines()
> for line in filtered_line:
>     token = line.split(" ")
>     print token[0], token[1], token[5], token[13], token[20]
> print "Done"


[snip]

If you don't need the power of regex, it's faster to use string methods:

    filter0 = line.replace("G_", "")
    filter1 = filter0.replace(".", " ")

Actually, seeing as how you're reading all the lines in one go anyway,
it's probably faster to do this instead:

text = filetoread.read()
text = text.replace("G_", "")
text = text.replace(".", " ")
# Write new log file
filetowrite.write(text)
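
A quick way to convince yourself that the two approaches agree is to run both on the same text. The sketch below is written for Python 3 and uses a shortened, made-up fragment of the sample line; it only checks that the results match and makes no timing claim.

```python
import re

# Shortened fragment of the sample log line (assumption: real lines look similar).
line = "PDU D G_CC_SMS_SERVICE_51408_656.O_CC_SMS_SERVICE_51408_656-TRX"

# Regex version: the dot has to be escaped because "." matches any character.
with_regex = re.sub(r"\.", " ", re.sub(r"G_", "", line))

# String-method version: replace() treats both arguments as literal text.
with_methods = line.replace("G_", "").replace(".", " ")

print(with_methods)
print(with_regex == with_methods)
```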