Re: [Tutor] Question on regular expressions
On 12/02/13 17:43, Marcin Mleczko wrote:

> but I am interested only in the second part between the 2nd start and the end:
> start AnotherArbitraryAmountOfText end
> What would be best, most clever way to search for that?

Best and clever are not always the same.

The simplest way, if it's a fixed string, is just to use the string split() method...

Being more 'clever', you could use the re.split() method to handle non-constant strings.

Being even more clever, you can define regexes of increasing complexity to match the Nth appearance of a pattern. These kinds of regex are very easy to get wrong, so you have to be very clever to get them right.

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Question on regular expressions
Marcin Mleczko wrote:

> given this kind of string:
> start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end
> a search string like: r"start.*?end" would give me the entire string from the first start to end:
> start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end
> but I am interested only in the second part between the 2nd start and the end:
> start AnotherArbitraryAmountOfText end
> What would be best, most clever way to search for that?
> Or even more general: how do I exclude always the text between the last start and the end tag, assuming the entire text contains several start tags spaced by an arbitrary amount of text before the end tag?

    >>> s = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"
    >>> [t[::-1] for t in re.compile("dne(.*?)trats").findall(s[::-1])]
    [' AnotherArbitraryAmountOfText ']

Ok, I'm not serious about this one -- but how about

    >>> parts = (t.partition("end") for t in s.split("start"))
    >>> [left for left, mid, right in parts if mid]
    [' AnotherArbitraryAmountOfText ']
Re: [Tutor] Question on regular expressions
On 12/02/2013 17:43, Marcin Mleczko wrote:

> Hello,
> given this kind of string:
> start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end
> a search string like: r"start.*?end" would give me the entire string from the first start to end:
> start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end
> but I am interested only in the second part between the 2nd start and the end:
> start AnotherArbitraryAmountOfText end
> What would be best, most clever way to search for that?
> Or even more general: how do I exclude always the text between the last start and the end tag, assuming the entire text contains several start tags spaced by an arbitrary amount of text before the end tag?
> Any ideas? Thank you in advance. ;-)
> Marcin

IMHO the best way is to use the rindex method to grab what you're after. I don't do clever; it makes code too difficult to maintain. So how about:

    >>> a = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"
    >>> b = "start "
    >>> x = a.rindex(b)
    >>> y = a.rindex(' end')
    >>> a[x+len(b):y]
    'AnotherArbitraryAmountOfText'
    >>> c = "garbage in, garbage out"
    >>> x = c.rindex(b)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: substring not found

--
Cheers.
Mark Lawrence
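[Editor's sketch] For anyone who does want the regex route Alan mentions, here is one possible pattern, assuming Python 3 and literal `start`/`end` markers. The names `PATTERN` and `last_block` are made up for illustration and appear nowhere in the thread. The negative lookahead stops the match from running across another `start`, so only the innermost block before `end` is captured.

```python
import re

# 'start', then characters at which no new 'start' begins, then 'end'.
# (?:(?!start).)*? is a lazy repetition guarded by a negative lookahead.
PATTERN = re.compile(r"start((?:(?!start).)*?)end", re.DOTALL)

def last_block(text):
    """Return the text between the last 'start' preceding an 'end' and that 'end'."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

s = "start SomeArbitraryAmountOfText start AnotherArbitraryAmountOfText end"
print(repr(last_block(s)))
```

Unlike rindex(), this returns None rather than raising when the markers are missing; which behaviour is preferable depends on the caller.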
Re: [Tutor] Question on regular expressions
Hi Everyone,

I did a comparison of the output between the Perl and Python methodology. They do basically the same thing, but the Perl form seems to be more true. The Python method inserts extra blank lines after each hex value line. For example:

Original text:

    def handler(signal, frame):
        """
        Trap signal interrupts if they occur
        """

Converted in Perl:

    def handler%28signal%2C frame%29%3A
        %22%22%22
        Trap signal interrupts if they occur
        %22%22%22

Converted in Python:

    def handler%28signal%2C frame%29%3A

        %22%22%22

        Trap signal interrupts if they occur

        %22%22%22

Does anyone know why this might be? Is the print statement inserting an artificial newline character? If so, how can I remove that?

The Python code I am using is:

    import re,sys
    for line in open(r'e:\pycode\sigh.txt','rb'):
        print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)

The file is being opened in rb mode because eventually binary files would be opened via this method as well.

Alan Gauld wrote:

> > a = open(r'e:\pycode\csums.txt','rb').readlines()
> > for line in a:
> >     print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
>
> Or just
>
>     for line in open(r'e:\pycode\csums.txt','rb'):
>         print ...
>
> > Breaking down the command, you appear to be calling an un-named function to act against any characters trapped by the regular expression. Not familiar with lambda :).
>
> You are absolutely right. It creates an un-named (or anonymous) function. :-)
>
> > The un-named function does in-place transformation of the character to the established hex value.
>
> It's actually the call to re.sub() that makes it in place.
>
> > How would you reverse the process from a python point of view?
>
> Just write a reverse function for the lambda...
>
> Alan G.

--
Thank you,
Andrew Robert
Systems Architect
Information Technologies
MFS Investment Management
Phone: 617-954-5882
E-mail: [EMAIL PROTECTED]
Linux User Number: #201204
Re: [Tutor] Question on regular expressions
Andrew Robert wrote:

> The python method inserts extra blank lines after each hex value line.
> Does anyone know why this might be?
> Is the print statement inserting an artificial newline character?

Yes, this is a feature of print: it always appends a newline. Each line read from the file already ends with its own newline, so you get two. To avoid this, use sys.stdout.write() instead of print:

    for line in open(r'e:\pycode\sigh.txt','rb'):
        line = re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
        sys.stdout.write(line)

Kent

> If so, how can I remove that?
> The python code I am using is:
>
>     import re,sys
>     for line in open(r'e:\pycode\sigh.txt','rb'):
>         print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
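[Editor's sketch] The doubled newline is easy to reproduce. The snippet below assumes Python 3, where print is a function (the thread itself uses the Python 2 print statement); the helper `render` and the sample lines are invented for the demonstration.

```python
import io
from contextlib import redirect_stdout

def render(lines, **print_kwargs):
    """Print each line and return everything that reached stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        for line in lines:
            print(line, **print_kwargs)
    return buf.getvalue()

# Lines read from a file keep their trailing newline.
lines = ["first\n", "second\n"]

doubled = render(lines)        # print appends its own "\n" to each line
exact = render(lines, end="")  # end="" suppresses print's newline

print(repr(doubled))
print(repr(exact))
```

In Python 3, `print(line, end="")` and `sys.stdout.write(line)` therefore give the same output for lines that already end in a newline.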
Re: [Tutor] Question on regular expressions
Great!

Taking this a little further along, I wrote the converted file to a new file using:

    import re,sys

    output = open(r'e:\pycode\out_test.txt','wb')
    for line in open(r'e:\pycode\sigh.txt','rb') :
        output.write( re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line))
    output.close()

Not elegant, but it's strictly for test :)

Last part and we can call it a day. How would you modify the lambda statement to convert the hex value back to its original value? Do I need to incorporate base64.16basedecode somehow?

The original Perl code to convert back to normal is:

    perl -ple 's/(?:%([0-9A-F]{2}))/chr hex $1/eg' somefiletxt

Kent Johnson wrote:

> Yes, this is a feature of print, it always inserts a newline. To avoid this, use sys.stdout.write() instead of print:
>
>     for line in open(r'e:\pycode\sigh.txt','rb'):
>         line = re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
>         sys.stdout.write(line)
>
> Kent

<snip>

--
Thank you,
Andrew Robert
Re: [Tutor] Question on regular expressions
Andrew Robert wrote:

> Taking this a little further along, I wrote the converted file to a new file using:
>
>     import re,sys
>     output = open(r'e:\pycode\out_test.txt','wb')
>     for line in open(r'e:\pycode\sigh.txt','rb') :
>         output.write( re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line))
>     output.close()
>
> Not elegant but its strictly for test :)
> Last part and we can call it a day. How would you modify the lambda statement to convert the hex value back to its original value?

Use int(s, 16) to convert a base 16 string to an integer, and chr() to convert the int to a string. So something like this:

    lambda s: chr(int(s.group(), 16))

Kent
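[Editor's sketch] One way to apply Kent's int()/chr() hint, assuming Python 3 strings and the %XX format produced by the encoder earlier in the thread. The function names `decode_match` and `decode` are hypothetical. The pattern captures only the two hex digits, so int() never sees the leading '%'.

```python
import re

def decode_match(match):
    # group(1) holds just the two hex digits, without the leading '%'
    return chr(int(match.group(1), 16))

def decode(text):
    # Same shape as the Perl s/%([0-9A-F]{2})/chr hex $1/eg
    return re.sub(r"%([0-9A-F]{2})", decode_match, text)

print(decode("def handler%28signal%2C frame%29%3A"))
```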
Re: [Tutor] Question on regular expressions
Hi all,

I tried:

    output = open(r'e:\pycode\new_test.txt','wb')
    for line in open(r'e:\pycode\out_test.txt','rb') :
        output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))

This generated the traceback:

      File "E:\pycode\sample_decode_file_specials_from_hex.py", line 8
        output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))
                                                                                                     ^
    SyntaxError: invalid syntax

By any chance, do you see where the syntax issue is?

Kent Johnson wrote:

> Andrew Robert wrote:
> <snip>
> Use int(s, 16) to convert a base 16 string to an integer, and chr() to convert the int to a string. So something like this:
>
>     lambda s: chr(int(s.group(), 16)))
>
> Kent

--
Thank you,
Andrew Robert
Re: [Tutor] Question on regular expressions
Andrew Robert wrote:

> Hi all,
> I tried:
>
>     output = open(r'e:\pycode\new_test.txt','wb')
>     for line in open(r'e:\pycode\out_test.txt','rb') :
>         output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))
>
> This generated the traceback:
>
>       File "E:\pycode\sample_decode_file_specials_from_hex.py", line 8
>         output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))
>                                                                                                      ^
>     SyntaxError: invalid syntax
>
> By any chance, do you see where the syntax issue is?

Take out " % ord(s.group())": the result of chr() is the actual string you want, not a format string. The syntax error is caused by mismatched parentheses.

Kent
Re: [Tutor] Question on regular expressions
> for line in open(r'e:\pycode\out_test.txt','rb') :
>     output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))

Let's add some whitespace.

    output.write(re.sub(r'([^\w\s])',
                        lambda s: chr(
                                      int(s.group(), 16)
                                     )
                       ) % ord(s.group()), line))

I do see at least one too many parens here, so that's something you should look at.

But I'd also recommend writing a helper function here. Just because you can do this in one line doesn't mean you have to. *grin* It might be useful to change the lambda back to a helper function.
Re: [Tutor] Question on regular expressions
When I alter the code to:

    import re,sys

    output = open(r'e:\pycode\new_test.txt','wb')
    for line in open(r'e:\pycode\out_test.txt','rb') :
        output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) , line)
    output.close()

I get the trace:

    Traceback (most recent call last):
      File "E:\pycode\sample_decode_file_specials_from_hex.py", line 8, in ?
        output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) , line)
    TypeError: sub() takes at least 3 arguments (2 given)

It appears that the code is not recognizing the line. I checked the parentheses and they appear to be properly enclosed.

Any ideas?

Kent Johnson wrote:

> <snip>
> Take out % ord(s.group()) - the result of chr() is the actual string you want, not a format string. The syntax error is caused by mismatched parentheses.
>
> Kent

--
Thank you,
Andrew Robert
Re: [Tutor] Question on regular expressions
lol.. Glutton for punishment I guess.

I tried removing the last parenthesis, but I then get an error that two arguments are passed when three are expected.

Danny Yoo wrote:

> > for line in open(r'e:\pycode\out_test.txt','rb') :
> >     output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))
>
> Let's add some whitespace.
>
>     output.write(re.sub(r'([^\w\s])',
>                         lambda s: chr(
>                                       int(s.group(), 16)
>                                      )
>                        ) % ord(s.group()), line))
>
> I do see at least one too many parens here, so that's something you should look at.
> But I'd also recommend writing a helper function here. Just because you can do this in one line doesn't mean you have to. *grin* It might be useful to change the lambda back to a helper function.

--
Thank you,
Andrew Robert
Re: [Tutor] Question on regular expressions
Andrew Robert wrote:

> When I alter the code to:
>
>     import re,sys
>     output = open(r'e:\pycode\new_test.txt','wb')
>     for line in open(r'e:\pycode\out_test.txt','rb') :
>         output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) , line)
>     output.close()
>
> I get the trace:
>
>     Traceback (most recent call last):
>       File "E:\pycode\sample_decode_file_specials_from_hex.py", line 8, in ?
>         output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) , line)
>     TypeError: sub() takes at least 3 arguments (2 given)
>
> It appears that the code is not recognizing the line.
> I checked the parentheses and they appear to be properly enclosed.
> Any ideas?

You have an argument in the wrong place: the parenthesis after the lambda closes the call to re.sub(), so line is passed to output.write() instead. Stop trying to do everything in one line! Put the lambda in a def'd function. Put the re.sub on its own line. You are tripping over unnecessary complexity. I'm not going to fix it any more.

Kent
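[Editor's sketch] What Kent's advice might look like for the encoding direction, assuming Python 3 strings. The names `encode_match` and `encode_line` are illustrative only. With the replacement in a named function and re.sub() on its own line, it is easy to see that all three arguments (pattern, replacement, string) sit inside the one call.

```python
import re

def encode_match(match):
    # '%' followed by the character's code as two uppercase hex digits
    return "%%%02X" % ord(match.group())

def encode_line(line):
    # re.sub(pattern, replacement, string): three arguments, one call
    return re.sub(r"[^\w\s]", encode_match, line)

print(encode_line("def handler(signal, frame):"))
```

Note that this sketch uses %02X (zero-padded) where the thread uses %2X (space-padded); the two agree for any character code of 0x10 or above.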
Re: [Tutor] Question on regular expressions
Hi Kent,

Sorry for causing so much trouble. I am not married to either a single or multi-line solution one way or another. Just a solution that works.

Based on something Danny Yoo provided, I had started with something like:

    import re,base64

    # Evaluate captured character as hex
    def ret_hex(value):
        return base64.b16encode(value)

    def ret_ascii(value):
        return base64.b16decode(value)

    # Evaluate the value of whatever was matched
    def eval_match(match):
        return ret_ascii(match.group(0))

    out=open(r'e:\pycode\sigh.new2','wb')

    # Read each line, pass any matches on line to function for
    # line in file.readlines():
    for line in open(r'e:\pycode\sigh.new','rb'):
        print (re.sub('[^\w\s]',eval_match, line))

The char to hex pass works but omits the leading x. The hex to char pass does not appear to work at all. No error is generated. It just appears to be ignored.

Kent Johnson wrote:

> Andrew Robert wrote:
> <snip>
> You have an argument in the wrong place. Stop trying to do everything in one line! Put the lambda in a def'd function. Put the re.sub on it's own line. You are tripping over unnecessary complexity. I'm not going to fix it any more.
>
> Kent

--
Thank you,
Andrew Robert
Re: [Tutor] Question on regular expressions
Hi Everyone,

Thanks for all of your patience on this. I finally got it to work.

Here is the completed test code showing what is going on. Not cleaned up yet, but it works for proof-of-concept purposes.

    #!/usr/bin/python
    import re,base64

    # Evaluate captured character as hex
    def ret_hex(value):
        return '%'+base64.b16encode(value)

    # Evaluate the value of whatever was matched
    def enc_hex_match(match):
        return ret_hex(match.group(0))

    def ret_ascii(value):
        return base64.b16decode(value)

    # Evaluate the value of whatever was matched
    def enc_ascii_match(match):
        arg=match.group()
        # remove the artificially inserted % sign
        arg=arg[1:]
        # decode the result
        return ret_ascii(arg)

    def file_encoder():
        # Read each line, pass any matches on line to function for
        # line in file.readlines():
        output=open(r'e:\pycode\sigh.new','wb')
        for line in open(r'e:\pycode\sigh.txt','rb'):
            output.write( (re.sub('[^\w\s]',enc_hex_match, line)) )
        output.close()

    def file_decoder():
        # Read each line, pass any matches on line to function for
        # line in file.readlines():
        output=open(r'e:\pycode\sigh.new2','wb')
        for line in open(r'e:\pycode\sigh.new','rb'):
            output.write(re.sub('%[0-9A-F][0-9A-F]',enc_ascii_match, line))
        output.close()

    file_encoder()
    file_decoder()
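[Editor's sketch] A compact round-trip version of the same idea, assuming Python 3 and data read in binary mode (bytes). It swaps base64.b16encode for plain %-formatting, and the function names `encode` and `decode` are not from the thread. Because '%' itself is neither a word character nor whitespace, it gets encoded as %25, which is what keeps decoding unambiguous.

```python
import re

def encode(data: bytes) -> bytes:
    # Replace every byte that is neither word-like nor whitespace with %XX.
    return re.sub(rb"[^\w\s]", lambda m: b"%%%02X" % ord(m.group()), data)

def decode(data: bytes) -> bytes:
    # Turn each %XX back into the single byte it stands for.
    return re.sub(rb"%([0-9A-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  data)

sample = b'def handler(signal, frame):\n    """100% literal"""\n'
encoded = encode(sample)
print(encoded)
print(decode(encoded) == sample)
```

Zero-padding with %02X matters once binary files are involved: a byte such as 0x01 would otherwise be written with a space in place of the leading digit and would not match the decoder's pattern.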
Re: [Tutor] Question on regular expressions (fwd)
On Thu, 25 May 2006, Alan Gauld wrote:

> In general I prefer to use string formatting to convert into hex format.

I'm a big fan of hexlify:

    >>> from binascii import hexlify
    >>> s = "abc-123"
    >>> hexlify(s)
    '6162632d313233'
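[Editor's sketch] The same call under Python 3, where (unlike the Python 2 session above) hexlify takes and returns bytes. unhexlify, from the same module, reverses it. The variable names are illustrative.

```python
from binascii import hexlify, unhexlify

raw = b"abc-123"
hexed = hexlify(raw)         # two lowercase hex digits per input byte
restored = unhexlify(hexed)  # back to the original bytes

print(hexed)
print(restored)
```

Note that hexlify converts every byte, so it is not a drop-in replacement for the selective %XX encoding discussed in this thread.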
Re: [Tutor] Question on regular expressions
> for line in open(r'e:\pycode\out_test.txt','rb') :
>     output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))
>
> This generated the traceback:
>
>       File "E:\pycode\sample_decode_file_specials_from_hex.py", line 8
>         output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))) % ord(s.group()), line))
>                                                                                                      ^
>     SyntaxError: invalid syntax
>
> By any chance, do you see where the syntax issue is?

Andrew,

This is a good place to use the Python interactive prompt. Try the various bits in the interpreter to find out what causes the error.

To be honest, I'd break that single line into at least 2 if not 3 lines anyway, purely from a debug and maintenance point of view. You are in real danger of turning Python into Perl here! :-)

As to your error:

    output.write(
        re.sub(
            r'([^\w\s])',
            lambda s: chr(int(s.group(),16))
        ) % ord(s.group()),
        line))

the parens don't seem to match up... Or am I miscounting?

Alan G
Re: [Tutor] Question on regular expressions
> perl -ple "s/([^\w\s])/sprintf(q#%%%2X#, ord $1)/ge" somefile.txt

Hi Andrew,

Give me a second. I'm trying to understand the command line switches:

(Looking in 'perl --help'...)

    -p              assume loop like -n but print line also, like sed
    -l[octal]       enable line ending processing, specifies line terminator
    -e program      one line of program (several -e's allowed, omit programfile)

and the regular expression modifiers there --- 'g' and 'e' --- mean ...

(reading 'perldoc perlop'...)

    g   Match globally, i.e., find all occurrences.
    e   Evaluate the right side as an expression.

Ok, I have a better idea of what's going on here now. This takes a file, and translates every character that is neither a word character nor whitespace into a hex string. That's a dense one-liner.

> How would you convert this to a python equivalent using the re or similar module?

The substitution on the right hand side in the Perl code actually is evaluated rather than literally substituted. To get the same effect from Python, we pass a function off as the substituting value to re.sub().

For example, we can translate every word-like character by shifting it one place ('a' -> 'b', 'b' -> 'c', etc...)

    ###
    >>> import re
    >>> def rot1(ch):
    ...     return chr((ord(ch) + 1) % 256)
    ...
    >>> def rot1_on_match(match):
    ...     return rot1(match.group(0))
    ...
    >>> re.sub(r'\w', rot1_on_match, "hello world")
    'ifmmp xpsme'
    ###

> I've begun reading about using re expressions at http://www.amk.ca/python/howto/regex/ but I am still hazy on implementation.

The part in:

    http://www.amk.ca/python/howto/regex/regex.html#SECTION00062

that talks about a replacement function is relevant to what you're asking. We need to provide a replacement function to simulate the right-hand-side evaluation that's happening in the Perl code.

Good luck!
Re: [Tutor] Question on regular expressions
On 24 May 2006, [EMAIL PROTECTED] wrote:

> I have two Perl expressions
>
> If windows:
>
>     perl -ple "s/([^\w\s])/sprintf(q#%%%2X#, ord $1)/ge" somefile.txt
>
> If posix:
>
>     perl -ple 's/([^\w\s])/sprintf("%%%2X", ord $1)/ge' somefile.txt
>
> The [^\w\s] is a negated expression stating that any character a-zA-Z0-9_, space or tab is ignored.
> The () captures whatever matches and throws it into the $1 for processing by the sprintf.
> In this case, %%%2X which is a three character hex value.
> How would you convert this to a python equivalent using the re or similar module?

    python -c "import re, sys;print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), sys.stdin.read())," < somefile

It's not as short as the Perl version (and might have problems with big files). Python does not have such useful command line switches as -p (but you don't use Python for one-liners as much as Perl), but it does the same, at least in this special case (Python lacks something like the -l switch).

With bash it's a bit easier (maybe there's also a way with cmd.com to write multiple lines?):

    $ python -c "import re,sys
    for line in sys.stdin: print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)," < somefile

Karl
--
Please do *not* send copies of replies to me.
I read the list
Re: [Tutor] Question on regular expressions
Wow!!.. That's awesome!

My goal was not to make it a one-liner per se.. I was simply trying to show the functionality I was trying to duplicate.

Boiling your one-liner down into a multi-line piece of code, I did:

    #!c:\python24\python
    import re,sys

    a = open(r'e:\pycode\csums.txt','rb').readlines()
    for line in a:
        print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)

Breaking down the command, you appear to be calling an un-named function to act against any characters trapped by the regular expression. Not familiar with lambda :).

The un-named function does in-place transformation of the character to the established hex value.

Does this sound right?

If I then saved the altered output to a file and wanted to transform it back to its original form, I would do the following in Perl:

    perl -ple 's/(?:%([0-9A-F]{2}))/chr hex $1/eg' somefiletxt

How would you reverse the process from a python point of view?

<snip>

Karl Pflästerer wrote:

>     python -c "import re, sys;print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), sys.stdin.read())," < somefile
>
> It's not as short as the Perl version (and might have problems with big files). Python does not have such useful command line switches like -p (but you doesn't use Python so much for one liners as Perl) but it does the same ; at least in this special case (Python lacks something like the -l switch).
> With bash it's a bit easier. (maybe there's also a way with cmd.com to write multiple lines)?
>
>     $ python -c "import re,sys
>     for line in sys.stdin: print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)," < somefile
>
> Karl

--
Thank you,
Andrew Robert
Re: [Tutor] Question on regular expressions (fwd)
[forwarding to tutor, although it looks like Andrew's making some good headway from other messages]

-- Forwarded message --
Date: Wed, 24 May 2006 14:59:43 -0400
From: Andrew Robert [EMAIL PROTECTED]
To: Danny Yoo [EMAIL PROTECTED]
Subject: Re: [Tutor] Question on regular expressions

Hey Danny,

Your code put me right on track. From that point, I crafted the following code.

What is confusing is how to take the captured character and transform it into a 3 digit hex value. Do you know how that might be accomplished?

    #!/usr/bin/python
    import re

    # Evaluate captured character as hex
    def ret_hex(ch):
        return chr((ord(ch) + 1 ) % 256 )

    # Evaluate the value of whatever was matched
    def eval_match(match):
        return ret_hex(match.group(0))

    # open file
    file = open(r'm:\mq\mq\scripts\sigh.txt','r')

    # Read each line, pass any matches on line to function for
    # line in file.readlines():
    for line in file:
        a=re.sub('[^\w\s]',eval_match, line)
        print a

--
Thank you,
Andrew Robert
Re: [Tutor] Question on regular expressions
Andrew Robert wrote:

> Wow!!.. That awesome!
> My goal was not to make it a one-liner per-se.. I was simply trying to show the functionality I was trying to duplicate.
> Boiling your one-liner down into a multi-line piece of code, I did:
>
>     #!c:\python24\python
>     import re,sys
>     a = open(r'e:\pycode\csums.txt','rb').readlines()
>     for line in a:

You probably want to open the file in text mode, not binary. You don't have to read all the lines of the file; you can iterate, reading one line at a time. Combining these two changes, the above two lines consolidate to

    for line in open(r'e:\pycode\csums.txt'):

>         print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)
>
> Breaking down the command, you appear to be calling an un-named function to act against any characters trapped by the regular expression.
> Not familiar with lamda :).

It is a way to make an anonymous function, occasionally abused to write Python one-liners. You could just as well spell it out:

    def hexify(match):
        return '%%%2X' % ord(match.group())

    print re.sub(r'([^\w\s])', hexify, line)

> The un-named function does in-place transformation of the character to the established hex value.
> Does this sound right?

Yes.

Kent
Re: [Tutor] Question on regular expressions
> a = open(r'e:\pycode\csums.txt','rb').readlines()
> for line in a:
>     print re.sub(r'([^\w\s])', lambda s: '%%%2X' % ord(s.group()), line)

Or just

    for line in open(r'e:\pycode\csums.txt','rb'):
        print ...

> Breaking down the command, you appear to be calling an un-named function to act against any characters trapped by the regular expression.
> Not familiar with lamda :).

You are absolutely right. It creates an un-named (or anonymous) function. :-)

> The un-named function does in-place transformation of the character to the established hex value.

It's actually the call to re.sub() that makes it in place.

> How would you reverse the process from a python point of view?

Just write a reverse function for the lambda...

Alan G.
Re: [Tutor] Question on regular expressions (fwd)
> Your code put me right on track. From that point, I crafted the following code.
> What is confusing is how to take the captured character and transform it into a 3 digit hex value.

In general I prefer to use string formatting to convert into hex format.

    print "%3X" % myValue

You can play around with the length specifier, left/right formatting, etc. Think sprintf in C...

Alan G.
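[Editor's sketch] A few variations on the length specifier Alan mentions, assuming Python 3 (the % operator behaves the same way for these cases in Python 2). The variable names are invented for the example.

```python
value = ord("(")                     # 40 decimal, 28 hex

padded_with_spaces = "%3X" % value   # width 3, right-aligned, space-padded
padded_with_zeros = "%03X" % value   # width 3, zero-padded
left_aligned = "%-3X" % value        # width 3, left-aligned
two_digits = "%02X" % ord("\t")      # small values keep a leading zero

print(repr(padded_with_spaces), repr(padded_with_zeros),
      repr(left_aligned), repr(two_digits))
```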