Re: [Tutor] Question on regular expressions

2013-02-12 Thread Alan Gauld

On 12/02/13 17:43, Marcin Mleczko wrote:


but I am interested only in the second part between the 2nd start and
the end: start AnotherArbitraryAmountOfText end

What would be best, most clever way to search for that?


Best and clever are not always the same.

The simplest way, if it's a fixed string, is just to use the string split()
method. Being more 'clever', you could use the re.split() function to
handle non-constant strings. Being even more clever, you can define regexes
of increasing complexity to match the Nth appearance of a pattern. These
kinds of regex are very easy to get wrong, so you have to be very clever
to get them right.
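
A rough Python 3 sketch of the first two options, using the sample string from the question. The variable names are made up for illustration, and the delimiters are assumed to be the literal words start and end:

```python
import re

text = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"

# Fixed delimiters: keep what follows the last "start", then cut at "end".
after_last_start = text.rsplit("start", 1)[-1]
between = after_last_start.split("end", 1)[0].strip()
print(between)

# Non-constant delimiters: re.split() takes a pattern instead of a fixed string.
pieces = re.split(r"\s*(?:start|end)\s*", text)
print(pieces)
```

The re.split() call leaves an empty string at each end of the list because the text begins and ends with a delimiter, so the wanted piece is pieces[-2].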


HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Question on regular expressions

2013-02-12 Thread Peter Otten
Marcin Mleczko wrote:

 given this kind of string:
 
 start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end
 
 a search string like: r"start.*?end" would give me the entire string
 from the first start to end : start SomeArbitraryAmountOfText start
 AnotherArbitraryAmountOfText end
 
 but I am interested only in the second part between the 2nd start and
 the end: start AnotherArbitraryAmountOfText end
 
 What would be best, most clever way to search for that?
 
 Or even more general: how do I always exclude the text between the last
 start and the end tag assuming the entire text contains several
 start tags spaced by an arbitrary amount of text before the end tag?

>>> import re
>>> s = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"
>>> [t[::-1] for t in re.compile("dne(.*?)trats").findall(s[::-1])]
[' AnotherArbitraryAmountOfText ']

Ok, I'm not serious about this one -- but how about

>>> parts = (t.partition("end") for t in s.split("start"))
>>> [left for left, mid, right in parts if mid]
[' AnotherArbitraryAmountOfText ']
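
For comparison, a regex-only variant is sketched below (Python 3). It is not something proposed in the thread: it assumes the delimiters are the literal words start and end, and it uses a negative lookahead so that a match can never run across a second start. The name innermost is invented for the example.

```python
import re

s = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"

# (?:(?!start).)*? consumes one character at a time, but refuses to step
# onto a position where another "start" begins.
innermost = re.compile(r"start((?:(?!start).)*?)end", re.DOTALL)

print(innermost.findall(s))
print(innermost.findall("start a start b end start c end"))
```

Patterns like this are easy to get subtly wrong, so the split/partition version above is probably the easier one to maintain.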




Re: [Tutor] Question on regular expressions

2013-02-12 Thread Mark Lawrence

On 12/02/2013 17:43, Marcin Mleczko wrote:

Hello,

given this kind of string:

start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end

a search string like: r"start.*?end" would give me the entire string
from the first start to end : start SomeArbitraryAmountOfText start
AnotherArbitraryAmountOfText end

but I am interested only in the second part between the 2nd start and
the end: start AnotherArbitraryAmountOfText end

What would be best, most clever way to search for that?

Or even more general: how do I always exclude the text between the last
start and the end tag assuming the entire text contains several
start tags spaced by an arbitrary amount of text before the end tag?

Any ideas?

Thank you in advance. ;-)

Marcin



IMHO the best way is to use the rindex method to grab what you're after.
I don't do clever; it makes code too difficult to maintain.  So how about:


>>> a = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"
>>> b = "start "
>>> x = a.rindex(b)
>>> y = a.rindex(' end')
>>> a[x+len(b):y]
'AnotherArbitraryAmountOfText'
>>> c = "garbage in, garbage out"
>>> x = c.rindex(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
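
If the ValueError is unwelcome, str.rfind() returns -1 instead of raising. The sketch below (Python 3) wraps that in a small function; last_between and its default delimiters are hypothetical names chosen for this example, and unlike the session above it looks for the first end after the last start rather than the last end in the string.

```python
def last_between(text, start="start", end="end"):
    """Return the text between the last `start` and the next `end`, or None."""
    begin = text.rfind(start)      # -1 when `start` is absent, no exception
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)   # first `end` after that last `start`
    if stop == -1:
        return None
    return text[begin:stop].strip()


sample = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"
print(last_between(sample))
print(last_between("garbage in, garbage out"))
```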


--
Cheers.

Mark Lawrence



Re: [Tutor] Question on regular expressions

2006-05-25 Thread Andrew Robert

Hi Everyone


I did a comparison of the output between the perl and python methodology.

They do basically the same thing but the perl form seems to be more true

The python method inserts extra blank lines after each hex value line.

For example:

Original text:

def handler(signal, frame):
"""
Trap signal interrupts if they occur
"""

Converted In Perl:

def handler%28signal%2C frame%29%3A
%22%22%22
Trap signal interrupts if they occur
%22%22%22


Converted In Python:

def handler%28signal%2C frame%29%3A

%22%22%22

Trap signal interrupts if they occur

%22%22%22

Does anyone know why this might be?

Is the print statement inserting an artificial newline character?

If so, how can I remove that?


The python code I am using is:



import re,sys

for line in open(r'e:\pycode\sigh.txt','rb'):
    print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)



The file is being opened in rb mode because eventually binary files
would be opened via this method as well.



Alan Gauld wrote:
 a = open(r'e:\pycode\csums.txt','rb').readlines()

 for line in a:
print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
 
 Or just
 
 for line in open(r'e:\pycode\csums.txt','rb'):
   print.
 
 Breaking down the command, you appear to be calling an un-named function
 to act against any characters trapped by the regular expression.

 Not familiar with lambda :).
 
 You are absolutely right.
 It creates an un-named (or anonymous) function. :-)
 
 The un-named function does in-place transformation of the character to
 the established hex value.
 
 It's actually the call to re.sub() that makes it in-place.
 
 How would you reverse the process from a python point of view?
 
 Just write a reverse function for the lambda...
 
 Alan G.
 
 

- --
Thank you,
Andrew Robert
Systems Architect
Information Technologies
MFS Investment Management
Phone:   617-954-5882

E-mail:  [EMAIL PROTECTED]
Linux User Number: #201204


Re: [Tutor] Question on regular expressions

2006-05-25 Thread Kent Johnson
Andrew Robert wrote:
 The python method inserts extra blank lines after each hex value line.
 Does anyone know why this might be?
 
 Is the print statement inserting a artificial new line character?

Yes, this is a feature of print: it appends a newline unless the statement
ends with a comma. To avoid this, use sys.stdout.write() instead of print:

for line in open(r'e:\pycode\sigh.txt','rb'):
    line = re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
    sys.stdout.write(line)
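
The code in this thread is Python 2. As a point of comparison, a minimal Python 3 sketch of the same idea follows; there print() is a function and its end argument controls what gets appended. The function names and sample lines are invented for the example.

```python
import io
import sys


def copy_lines(lines, out=sys.stdout):
    """Write each line unchanged; print() is told not to add its own newline."""
    for line in lines:
        print(line, end="", file=out)


def copy_lines_with_write(lines, out=sys.stdout):
    """Same result using write(), which never appends anything."""
    for line in lines:
        out.write(line)


sample = ["%22%22%22\n", "Trap signal interrupts if they occur\n"]
buffer = io.StringIO()
copy_lines(sample, buffer)
print(repr(buffer.getvalue()))
```

Either form avoids the doubled line endings, because every line read from a file already carries its own newline.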

Kent

 
 If so, how cam I remove that?
 
 
 The python code I am using is:
 
 
 
 import re,sys
 
 for line in open(r'e:\pycode\sigh.txt','rb'):
     print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)



Re: [Tutor] Question on regular expressions

2006-05-25 Thread Andrew Robert

Great!

Taking this a little further along, I wrote the converted file to a new
file using:


import re,sys

output = open(r'e:\pycode\out_test.txt','wb')

for line in open(r'e:\pycode\sigh.txt','rb') :
    output.write( re.sub(r'([^\w\s])', lambda s: '%%%2X' %
                         ord(s.group()), line))

output.close()


Not elegant, but it's strictly for test :)


Last part and we can call it a day.

How would you modify the lambda statement to convert the hex value back
to its original value?


Do I need to incorporate base64.b16decode somehow?

The original perl code to convert back to normal is:

perl -ple 's/(?:%([0-9A-F]{2}))/chr hex $1/eg' somefile.txt



Kent Johnson wrote:
 Yes, this is a feature of print, it always inserts a newline. To avoid 
 this, use sys.stdout.write() instead of print:
 for line in open(r'e:\pycode\sigh.txt','rb'):
     line = re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
     sys.stdout.write(line)
 
 Kent
 
snip

/snip



Re: [Tutor] Question on regular expressions

2006-05-25 Thread Kent Johnson
Andrew Robert wrote:
 Taking this a little further along, I wrote the converted file to a new
 file using:
 
 
 import re,sys
 
 output = open(r'e:\pycode\out_test.txt','wb')
 
 for line in open(r'e:\pycode\sigh.txt','rb') :
 output.write( re.sub(r'([^\w\s])', lambda s: '%%%2X' %
 ord(s.group()), line))
 
 output.close()
 
 
 Not elegant but its strictly for test :)
 
 
 Last part and we can call it a day.
 
 How would you modify the lambda statement to covert a the hex value back
 to its original value?

Use int(s, 16) to convert a base 16 string to an integer, and chr() to
convert the int to a string. So something like this:
lambda s: chr(int(s.group(), 16)))
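
The two conversions can be checked on a single value. This is a small Python 3 illustration with made-up variable names; note that the string handed to int() must contain only the hex digits, so a leading % has to be stripped or kept out of the captured group first (int("%28", 16) raises ValueError).

```python
hex_digits = "28"                  # the two characters that follow the % sign
code_point = int(hex_digits, 16)   # base-16 string -> integer
char = chr(code_point)             # integer -> one-character string
print(code_point, repr(char))

# Going the other way reproduces the encoded form.
print("%%%02X" % ord(char))
```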

Kent




Re: [Tutor] Question on regular expressions

2006-05-25 Thread Andrew Robert

Hi all,

I tried:


output = open(r'e:\pycode\new_test.txt','wb')

for line in open(r'e:\pycode\out_test.txt','rb') :
    output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
    16))) % ord(s.group()), line))


This generated the traceback:

  File "E:\pycode\sample_decode_file_specials_from_hex.py", line 8
output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
16))) % ord(s.group()), line))

 ^
SyntaxError: invalid syntax


By any chance, do you see where the syntax issue is?


Kent Johnson wrote:
 Andrew Robert wrote:
snip

/snip

 Use int(s, 16) to convert a base 16 string to an integer, and chr() to
 convert the int to a string. So something like this:
 lambda s: chr(int(s.group(), 16)))
 
 Kent
 
 


Re: [Tutor] Question on regular expressions

2006-05-25 Thread Kent Johnson
Andrew Robert wrote:
 
 Hi all,
 
 I tried:
 
 
 output = open(r'e:\pycode\new_test.txt','wb')
 
 for line in open(r'e:\pycode\out_test.txt','rb') :
 output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
 16))) % ord(s.group()), line))
 
 
 This generated the traceback:
 
 File E:\pycode\sample_decode_file_specials_from_hex.py, line 8
 output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
 16))) % ord(s.group()), line))
 
  ^
 SyntaxError: invalid syntax
 
 
 By any chance, do you see where the syntax issue is?

Take out  % ord(s.group()) - the result of chr() is the actual string 
you want, not a format string.

The syntax error is caused by mismatched parentheses.

Kent



Re: [Tutor] Question on regular expressions

2006-05-25 Thread Danny Yoo


 for line in open(r'e:\pycode\out_test.txt','rb') :
output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
 16))) % ord(s.group()), line))


Let's add some whitespace.

    output.write(re.sub(r'([^\w\s])',
                        lambda s: chr(
                                      int(s.group(), 16)
                                     )
                       ) % ord(s.group()), line))

I do see at least one too many parens here, so that's something you should 
look at.

But I'd also recommend writing a helper function here.  Just because you 
can do this in one line doesn't mean you have to.  *grin* It might be 
useful to change the lambda back to a helper function.
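
A helper-function version of the decoding step might look like the sketch below (Python 3). The names PERCENT_HEX, decode_match and decode_line are invented for illustration, and the pattern assumes the encoded text uses % followed by exactly two uppercase hex digits, as in the Perl decoder quoted earlier in the thread.

```python
import re

# Capture only the two hex digits, so the % itself never reaches int().
PERCENT_HEX = re.compile(r"%([0-9A-F]{2})")


def decode_match(match):
    """Turn one '%XX' match back into the character it encodes."""
    return chr(int(match.group(1), 16))


def decode_line(line):
    """Replace every '%XX' sequence in the line with its character."""
    return PERCENT_HEX.sub(decode_match, line)


print(decode_line("def handler%28signal%2C frame%29%3A"))
```

With the pieces named and separated, each one can be tried on its own at the interactive prompt, which makes a misplaced parenthesis or argument much easier to spot.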


Re: [Tutor] Question on regular expressions

2006-05-25 Thread Andrew Robert

When I alter the code to:

import re,sys

output = open(r'e:\pycode\new_test.txt','wb')

for line in open(r'e:\pycode\out_test.txt','rb') :
   output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16)))
, line)

output.close()

I get the trace:

Traceback (most recent call last):
  File "E:\pycode\sample_decode_file_specials_from_hex.py", line 8, in ?
output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
16))) , line)
TypeError: sub() takes at least 3 arguments (2 given)

It appears that the code is not recognizing the line.

I checked the parentheses and they appear to be properly enclosed.

Any ideas?

Kent Johnson wrote:

snip

/snip
 Take out  % ord(s.group()) - the result of chr() is the actual string 
 you want, not a format string.
 
 The syntax error is caused by mismatched parentheses.
 
 Kent
 


Re: [Tutor] Question on regular expressions

2006-05-25 Thread Andrew Robert

lol..


Glutton for punishment I guess.

I tried removing the last parenthesis but I then get an error that two
arguments are passed when three are expected.



Danny Yoo wrote:
 
 
 for line in open(r'e:\pycode\out_test.txt','rb') :
output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
 16))) % ord(s.group()), line))
 
 
 Let's add some whitespace.
 
 output.write(re.sub(r'([^\w\s])',
lambda s: chr(
   int(s.group(), 16)
 )
  ) % ord(s.group()), line))
 
 I do see at least one too many parens here, so that's something you
 should look at.
 
 But I'd also recommend writing a helper function here.  Just because you
 can do this in one line doesn't mean you have to.  *grin* It might be
 useful to change the lambda back to a helper function.
 



Re: [Tutor] Question on regular expressions

2006-05-25 Thread Kent Johnson
Andrew Robert wrote:
 
 When I alter the code to:
 
 import re,sys
 
 output = open(r'e:\pycode\new_test.txt','wb')
 
 for line in open(r'e:\pycode\out_test.txt','rb') :
output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16)))
 , line)
 
 output.close()
 
 I get the trace:
 
 Traceback (most recent call last):
   File E:\pycode\sample_decode_file_specials_from_hex.py, line 8, in ?
 output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
 16))) , line)
 TypeError: sub() takes at least 3 arguments (2 given)
 
 It appears that the code is not recognizing the line.
 
 I checked the parentheses and they appear to be properly enclosed.
 
 Any ideas?

You have an argument in the wrong place. Stop trying to do everything in
one line! Put the lambda in a def'd function. Put the re.sub on its own
line. You are tripping over unnecessary complexity. I'm not going to fix
it any more.

Kent



Re: [Tutor] Question on regular expressions

2006-05-25 Thread Andrew Robert

Hi Kent,


Sorry for causing so much trouble.

I am not married to either a single or multi-line solution one way or
another.


Just a solution that works.

Based on something Danny Yoo provided, I had started with something like:



import re,base64

# Evaluate captured character as hex
def ret_hex(value):
    return base64.b16encode(value)

def ret_ascii(value):
    return base64.b16decode(value)

# Evaluate the value of whatever was matched
def eval_match(match):
    return ret_ascii(match.group(0))


out=open(r'e:\pycode\sigh.new2','wb')

# Read each line, pass any matches on line to function
for line in open(r'e:\pycode\sigh.new','rb'):
    print (re.sub('[^\w\s]',eval_match, line))



The char to hex pass works but omits the leading x.

The hex to char pass does not appear to work at all.

No error is generated. It just appears to be ignored.



Kent Johnson wrote:
 Andrew Robert wrote:
snip
/snip

 
 You have an argument in the wrong place. Stop trying to do everything in 
 one line! Put the lambda in a def'd function. Put the re.sub on it's own 
 line. You are tripping over unnecessary complexity. I'm not going to fix 
 it any more.
 
 Kent
 


Re: [Tutor] Question on regular expressions

2006-05-25 Thread Andrew Robert

Hi Everyone,


Thanks for all of your patience on this.

I finally got it to work.


Here is the completed test code showing what is going on.

Not cleaned up yet but it works for proof-of-concept purposes.



#!/usr/bin/python

import re,base64

# Evaluate captured character as hex
def ret_hex(value):
    return '%'+base64.b16encode(value)

# Evaluate the value of whatever was matched
def enc_hex_match(match):
    return ret_hex(match.group(0))

def ret_ascii(value):
    return base64.b16decode(value)

# Evaluate the value of whatever was matched
def enc_ascii_match(match):

    arg=match.group()

    # remove the artificially inserted % sign
    arg=arg[1:]

    # decode the result
    return ret_ascii(arg)

def file_encoder():
    # Read each line and pass any matches on the line to enc_hex_match
    output=open(r'e:\pycode\sigh.new','wb')
    for line in open(r'e:\pycode\sigh.txt','rb'):
        output.write( (re.sub('[^\w\s]',enc_hex_match, line)) )
    output.close()


def file_decoder():
    # Read each line and pass any matches on the line to enc_ascii_match
    output=open(r'e:\pycode\sigh.new2','wb')
    for line in open(r'e:\pycode\sigh.new','rb'):
        output.write(re.sub('%[0-9A-F][0-9A-F]',enc_ascii_match, line))
    output.close()


file_encoder()

file_decoder()
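
For readers on Python 3, the standard library's urllib.parse module offers a comparable percent-encoding round trip. The sketch below is not a drop-in replacement for the script above: quote() never escapes letters, digits or the characters _ . - ~, so a few punctuation marks that the [^\w\s] pattern would encode are left alone, and passing safe=" " is a choice made here only to keep spaces readable.

```python
from urllib.parse import quote, unquote

original = "def handler(signal, frame):"

# safe=" " leaves spaces as they are; without it quote() would emit %20.
encoded = quote(original, safe=" ")
print(encoded)

decoded = unquote(encoded)
print(decoded)
```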


Re: [Tutor] Question on regular expressions (fwd)

2006-05-25 Thread Terry Carroll
On Thu, 25 May 2006, Alan Gauld wrote:

 In general I prefer to use string formatting to convert into hex 
 format.

I'm a big fan of hexlify:

>>> from binascii import hexlify
>>> s = "abc-123"
>>> hexlify(s)
'6162632d313233'
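
The session above is Python 2, where str is a byte string. In Python 3, hexlify() wants bytes and returns bytes, so a text string has to be encoded first. A minimal sketch, assuming the text is plain ASCII:

```python
from binascii import hexlify, unhexlify

s = "abc-123"

encoded = hexlify(s.encode("ascii"))          # bytes in, bytes out
print(encoded)

decoded = unhexlify(encoded).decode("ascii")  # and back again
print(decoded)
```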





Re: [Tutor] Question on regular expressions

2006-05-25 Thread Alan Gauld
 for line in open(r'e:\pycode\out_test.txt','rb') :
output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
 16))) % ord(s.group()), line))

 This generated the traceback:

 File E:\pycode\sample_decode_file_specials_from_hex.py, line 8
output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(),
 16))) % ord(s.group()), line))

 ^
 SyntaxError: invalid syntax


 By any chance, do you see where the syntax issue is?

Andrew, This is a good place to use the Python interactive prompt.
Try the various bits in the interpreter to find out what causes the 
error.
To be honest I'd break that single line into at least 2 if not 3 lines
anyway purely from a debug and maintenance point of view.
You are in real danger of turning Python into perl here! :-)

As to your error:

output.write(
  re.sub(
    r'([^\w\s])',
    lambda s: chr(int(s.group(),16))
  ) % ord(s.group()),
  line))

The parens don't seem to match up... Or am I miscounting?

Alan G




Re: [Tutor] Question on regular expressions

2006-05-24 Thread Danny Yoo

 perl -ple "s/([^\w\s])/sprintf(q#%%%2X#, ord $1)/ge"  somefile.txt

Hi Andrew,


Give me a second.  I'm trying to understand the command line switches:

(Looking in 'perl --help'...)

   -p  assume loop like -n but print line also, like sed
   -l[octal]   enable line ending processing, specifies line terminator
   -e program  one line of program (several -e's allowed, omit programfile)

and the regular expression modifiers there --- 'g' and 'e' --- mean ... 
(reading 'perldoc perlop'...)

g   Match globally, i.e., find all occurrences.
e   Evaluate the right side as an expression.


Ok, I have a better idea of what's going on here now.  This takes a file,
and translates every character that is neither a word character nor
whitespace into a percent sign followed by its hex code.  That's a
dense one-liner.



 How would you convert this to a python equivalent using the re or 
 similar module?

The substitution on the right hand side in the Perl code actually 
is evaluated rather than literally substituted.  To get the same effect 
from Python, we pass a function off as the substituting value to re.sub().


For example, we can translate every word-like character by shifting it
one place ('a' -> 'b', 'b' -> 'c', etc...)

###
>>> import re
>>> def rot1(ch):
...     return chr((ord(ch) + 1) % 256)
...
>>> def rot1_on_match(match):
...     return rot1(match.group(0))
...
>>> re.sub(r'\w', rot1_on_match, "hello world")
'ifmmp xpsme'
###
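
Applied to the Perl one-liner, the same replacement-function idea might look like the sketch below (Python 3). The names percent_hex and encode_line are invented for the example, the input is assumed to be ASCII text, and the format is %02X rather than the %2X of the Perl original so that small values keep a leading zero.

```python
import re


def percent_hex(match):
    """Return the matched character as '%' plus two uppercase hex digits."""
    return "%%%02X" % ord(match.group(0))


def encode_line(line):
    """Encode everything that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", percent_hex, line)


print(encode_line("def handler(signal, frame):"))
```

re.sub() calls percent_hex once per match and splices its return value into the result, which is the role the /e modifier plays on the Perl side.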



 I've begun reading about using re expressions at
 http://www.amk.ca/python/howto/regex/ but I am still hazy on implementation.

The part in:

http://www.amk.ca/python/howto/regex/regex.html#SECTION00062

that talks about a replacement function is relevant to what you're 
asking.  We need to provide a replacement function to simulate the 
right-hand-side evaluation that's happening in the Perl code.



Good luck!


Re: [Tutor] Question on regular expressions

2006-05-24 Thread Karl Pflästerer
On 24 Mai 2006, [EMAIL PROTECTED] wrote:

 I have two Perl expressions


 If windows:

 perl -ple "s/([^\w\s])/sprintf(q#%%%2X#, ord $1)/ge"  somefile.txt

 If posix

 perl -ple 's/([^\w\s])/sprintf("%%%2X", ord $1)/ge'  somefile.txt



 The [^\w\s]  is a negated expression stating that any character
 a-zA-Z0-9_, space or tab is ignored.

 The () captures whatever matches and throws it into the $1 for
 processing by the sprintf

 In this case, %%%2X which is a three character hex value.

 How would you convert this to a python equivalent using the re or
 similar module?

python -c "import re, sys;print re.sub(r'([^\w\s])', lambda s: '%%%2X' %
ord(s.group()), sys.stdin.read())," < somefile

It's not as short as the Perl version (and might have problems with big
files). Python does not have useful command line switches like -p
(but you don't use Python for one-liners as much as Perl), but it does
the same, at least in this special case (Python lacks something like the
-l switch).

With bash it's a bit easier. (Maybe there's also a way with cmd.com to
write multiple lines?)

$ python -c "import re,sys
for line in sys.stdin: print re.sub(r'([^\w\s])', lambda s: '%%%2X' %
ord(s.group()), line)," < somefile


   Karl
-- 
Please do *not* send copies of replies to me.
I read the list


Re: [Tutor] Question on regular expressions

2006-05-24 Thread Andrew Robert

Wow!!..

That's awesome!


My goal was not to make it a one-liner per se..

I was simply trying to show the functionality I was trying to duplicate.

Boiling your one-liner down into a multi-line piece of code, I did:

#!c:\python24\python

import re,sys

a = open(r'e:\pycode\csums.txt','rb').readlines()

for line in a:
    print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)


Breaking down the command, you appear to be calling an un-named function
to act against any characters trapped by the regular expression.

Not familiar with lambda :).

The un-named function does in-place transformation of the character to
the established hex value.


Does this sound right?

If I then saved the altered output to a file and wanted to transform it
back to its original form, I would do the following in perl.


perl -ple 's/(?:%([0-9A-F]{2}))/chr hex $1/eg' somefile.txt

How would you reverse the process from a python point of view?


snip

/snip
Karl Pflästerer wrote:
 python -c import re, sys;print re.sub(r'([^\w\s])', lambda s: '%%%2X' % 
 ord(s.group()), sys.stdin.read()),  somefile
 
 It's not as short as the Perl version (and might have problems with big
 files). Python does not have such useful command line switches like -p
 (but you doesn't use Python so much for one liners as Perl) but it does
 the same ; at least in this special case (Python lacks something like the
 -l switch).
 
 With bash it's a bit easier. (maybe there's also a way with cmd.com to
 write multiple lines)?
 
 $ python -c import re,sys
 for line in sys.stdin: print re.sub(r'([^\w\s])', lambda s: '%%%2X' % 
 ord(s.group()), line),  somefile
 
 
Karl



Re: [Tutor] Question on regular expressions (fwd)

2006-05-24 Thread Danny Yoo
[forwarding to tutor, although it looks like Andrew's making some good 
headway from other messages]

-- Forwarded message --
Date: Wed, 24 May 2006 14:59:43 -0400
From: Andrew Robert [EMAIL PROTECTED]
To: Danny Yoo [EMAIL PROTECTED]
Subject: Re: [Tutor] Question on regular expressions


Hey Danny,

Your code put me right on track.

From that point, I crafted the following code.

What is confusing is how to take the captured character and transform it
into a 3 digit hex value.

Do you know how that might be accomplished?


#!/usr/bin/python

import re

# Evaluate captured character as hex
def ret_hex(ch):
    return chr((ord(ch) + 1 ) % 256 )

# Evaluate the value of whatever was matched
def eval_match(match):
    return ret_hex(match.group(0))

# open file
file = open(r'm:\mq\mq\scripts\sigh.txt','r')

# Read each line, pass any matches on line to function
for line in file:
    a=re.sub('[^\w\s]',eval_match, line)
    print a




Re: [Tutor] Question on regular expressions

2006-05-24 Thread Kent Johnson
Andrew Robert wrote:
 
 Wow!!..
 
 That awesome!
 
 
 My goal was not to make it a one-liner per-se..
 
 I was simply trying to show the functionality I was trying to duplicate.
 
 Boiling your one-liner down into a multi-line piece of code, I did:
 
 #!c:\python24\python
 
 import re,sys
 
 a = open(r'e:\pycode\csums.txt','rb').readlines()
 
 for line in a:

You probably want to open the file in text mode, not binary. You don't 
have to read all the lines of the file, you can iterate reading one line 
at a time. Combining these two changes, the above two lines consolidate to
for line in open(r'e:\pycode\csums.txt'):

 print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
 
 
 Breaking down the command, you appear to be calling an un-named function
 to act against any characters trapped by the regular expression.
 
 Not familiar with lamda :).

It is a way to make an anonymous function, occasionally abused to write 
Python one-liners. You could just as well spell it out:
def hexify(match):
    return '%%%2X' % ord(match.group())

print re.sub(r'([^\w\s])', hexify, line)
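
One detail of the %2X conversion is worth checking before relying on it for binary files: a width of 2 pads with a space, not a zero, so any character below 0x10 comes out as something like '% 7'. The Python 3 sketch below compares it with %02X; hex_escape is a hypothetical helper written only for this comparison.

```python
def hex_escape(ch, zero_pad=True):
    """Format one character as '%' plus its hex code, padded to two columns."""
    fmt = "%%%02X" if zero_pad else "%%%2X"
    return fmt % ord(ch)


for ch in ("(", "\x07"):
    print(repr(hex_escape(ch, zero_pad=False)), repr(hex_escape(ch)))
```

For printable punctuation the two formats agree, which is probably why the difference never showed up in the text files used in this thread. A decoder that expects exactly two hex digits after the % would not recognise the space-padded form.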

 
 The un-named function does in-place transformation of the character to
 the established hex value.
 
 
 Does this sound right?
Yes.

Kent



Re: [Tutor] Question on regular expressions

2006-05-24 Thread Alan Gauld
 a = open(r'e:\pycode\csums.txt','rb').readlines()

 for line in a:
print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), 
 line)

Or just

for line in open(r'e:\pycode\csums.txt','rb'):
   print.

 Breaking down the command, you appear to be calling an un-named 
 function
 to act against any characters trapped by the regular expression.

 Not familiar with lamda :).

You are absolutely right.
It creates an un-named (or anonymous) function. :-)

 The un-named function does in-place transformation of the character 
 to
 the established hex value.

It's actually the call to re.sub() that makes it in-place.

 How would you reverse the process from a python point of view?

Just write a reverse function for the lambda...

Alan G. 




Re: [Tutor] Question on regular expressions (fwd)

2006-05-24 Thread Alan Gauld
 Your code put me right on track.

 From that point, I crafted the following code.

 What is confusing is how to take the captured character and 
 transform it
 into a 3 digit hex value.

In general I prefer to use string formatting to convert into hex 
format.

print "%3X" % myValue

you can play around with the length specifier,
left/right formatting etc etc. Think sprintf in C...

Alan G.

