Re: sobering observation, python vs. perl

2016-03-20 Thread Charles T. Smith
On Thu, 17 Mar 2016 21:18:43 +0530, srinivas devaki wrote:

> please upload the log file,


Sorry, it's work stuff, can't do that, but just take any big set of files
and change the strings appropriately and the numbers should be equivalent.


> 
> and global variables in python are slow, so just keep all that in a
> function and try again. generally i get 20-30% time improvement by
> doin that.

#!/usr/bin/env python
# vim: tw=0
import sys
import re

def faster ():
    isready = re.compile ("(.*) is ready")
    relreq = re.compile (".*release_req")
    for fn in sys.argv[1:]: # logfile name
        tn = None
        with open (fn) as fd:
            for line in fd:
                #match = re.match ("(.*) is ready", line)
                match = isready.match (line)
                if match:
                    tn = match.group(1)
                    continue
                #match = re.match (".*release_req", line)
                match = relreq.match (line)
                if match:
                    #print "%s: %s" % (tn, line),
                    print tn

faster()

$ time python ./find-relreq *.out | sort -u
TestCase_F_00_P
TestCase_F_00_S
TestCase_F_01_S
TestCase_F_02_M

real    0m25.515s
user    0m25.294s
sys     0m0.136s

3 more seconds!
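
The global-versus-local effect mentioned above can be checked in isolation. This is only a minimal sketch, assuming Python 3 (the scripts in this thread use Python 2 print statements); the function names and the loop size are made up for illustration and have nothing to do with the real log files:

```python
import timeit

counter_limit = 200000  # hypothetical workload size


def count_with_globals():
    # Every access to total_global and counter_limit is a global lookup.
    global total_global
    total_global = 0
    for i in range(counter_limit):
        total_global += i
    return total_global


def count_with_locals(limit=counter_limit):
    # total and limit are locals, which CPython resolves faster.
    total = 0
    for i in range(limit):
        total += i
    return total


if __name__ == "__main__":
    for func in (count_with_globals, count_with_locals):
        seconds = timeit.timeit(func, number=20)
        print("%-20s %.3fs" % (func.__name__, seconds))
```

Both functions compute the same sum; only the kind of name lookup differs, so any gap in the timings comes from that alone.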

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: sobering observation, python vs. perl

2016-03-20 Thread Ethan Furman

On 03/17/2016 09:08 AM, Charles T. Smith wrote:
> On Thu, 17 Mar 2016 10:52:30 -0500, Tim Chase wrote:
>> Not saying this will make a great deal of difference, but these two
>> items jumped out at me.  I'd even be tempted to just use string
>> manipulations for the isready aspect as well.  Something like
>> (untested)
>
> well, I don't want to forgo REs in order to have python's numbers be better

The issue is not avoiding REs, but using Python's strengths and idioms.
Write the code in Python's style, get the same results, then compare
the times.

If you posted the data file and exact results the rest of us could try,
but as it is all we can do is offer ideas and you have to test them.


--
~Ethan~



Re: sobering observation, python vs. perl

2016-03-20 Thread Charles T. Smith
On Thu, 17 Mar 2016 15:29:47 +0000, Charles T. Smith wrote:

And for completeness, and also surprising:

time sed -n -e '/ is ready/{s///;h}' -e '/release_req/{g;p}'  *.out | sort -u
TestCase_F_00_P
TestCase_F_00_S
TestCase_F_01_S
TestCase_F_02_M

real    0m10.998s
user    0m10.885s
sys     0m0.108s

Twice as long as perl...  I guess there's no excuse for sed anymore...


Re: sobering observation, python vs. perl

2016-03-20 Thread Charles T. Smith
On Thu, 17 Mar 2016 19:08:58 +0200, Marko Rauhamaa wrote:

> "Charles T. Smith" :
> 

> 
> Compare Perl (<http://www.perlmonks.org/?node_id=98357>):
> 
>my $str = "I have a dream";
>my $find = "have";
>my $replace = "had";
>$find = quotemeta $find; # escape regex metachars if present
>$str =~ s/$find/$replace/g;
>print $str;
> 
> with Python:
> 
>print("I have a dream".replace("have", "had"))
> 
> 
> Marko

Uh... that perl is way over my head.  I admit though, that perl's
powerful substitute command is also clumsy.  The best I can do
right now is:

$v =  "I have a dream\n";
$v =~ s/have/had/;
print $v

One of the ugliest things about perl are the "silly" type
prefixes ($, @, %).  But in a python project I'm doing now,
I realized an important advantage that they bring...

I want to be able to initialize msgs to communicate with
C.  Ideally, I'd like to just specify the path to an equivalent
python instance but all intermediate instances have to
already exist - python does not have autovivification.
I implemented it but only up until the leaf node - because
python doesn't know their types.  Perl can do that, because the
prefix tells it the type.
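
For plain dict-shaped data, something close to autovivification can be had with collections.defaultdict. This is only a sketch: the names autoviv and msg and the keys are invented for illustration, and it does not address typed leaf nodes of a C message, where the type still has to come from somewhere (here it is simply whatever gets assigned):

```python
from collections import defaultdict


def autoviv():
    # Each missing key creates another autovivifying dict on first access.
    return defaultdict(autoviv)


msg = autoviv()
msg["header"]["route"]["hops"] = 3   # intermediate dicts appear on demand
msg["payload"]["flags"] = ["ack"]    # the leaf type is chosen by the assignment
```

Note that merely reading a missing key also creates it, which is the same trade-off Perl's autovivification makes.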

But, don't get me wrong, coding in python is a JOY!


Re: sobering observation, python vs. perl

2016-03-20 Thread Marko Rauhamaa
"Charles T. Smith" :

> I need the second check to also be a RE because it's not
> separate tokens.

The string "in" check doesn't care about tokens.
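
A quick illustration, assuming Python 3 and some made-up log lines (none of them come from the real data): the "in" operator is a plain substring test, so it also finds the keyword in the middle of a longer token.

```python
lines = [
    "12:00:01 sending release_req to peer\n",
    "12:00:02 state=xrelease_req_pending\n",   # embedded in a longer token
    "12:00:03 TestCase_F_00_P is ready\n",
]

# Substring test: no tokenizing, no word boundaries.
hits = [line for line in lines if "release_req" in line]
```

Here hits contains the first two lines, including the one where release_req is glued to other characters.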


Marko


Re: sobering observation, python vs. perl

2016-03-19 Thread BartC

On 17/03/2016 18:53, Marko Rauhamaa wrote:
> BartC:
>
>> sub replacewith{
>>    $s = $_[0];
>>    $t = $_[1];
>>    $u = $_[2];
>>    $s =~ s/$t/$u/;
>>    return $s;
>> }
>>
>> Although once done, the original task now looks a proper language:
>>
>> print (replacewith("I have a dream","have","had"));
>
> Now try your function with:
>
>    print (replacewith("I have a dream",".","had"));

Yeah, it needs your quotemeta line (whatever that does). But the call is
unaffected as the clutter is in the function.


--
bartc



Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 10:52:30 -0500, Tim Chase wrote:

> Not saying this will make a great deal of difference, but these two
> items jumped out at me.  I'd even be tempted to just use string
> manipulations for the isready aspect as well.  Something like
> (untested)

well, I don't want to forgo REs in order to have python's numbers be better


Re: sobering observation, python vs. perl

2016-03-19 Thread Marko Rauhamaa
"Charles T. Smith" :

> well, I don't want to forgo REs in order to have python's numbers be
> better

<http://stackoverflow.com/questions/12793562/text-processing-python-vs-perl-performance>


Marko


Re: sobering observation, python vs. perl

2016-03-19 Thread Marko Rauhamaa
"Charles T. Smith" :

> Here's the programs:
>
> #!/usr/bin/env python
> # vim: tw=0
> import sys
> import re
>
> isready = re.compile ("(.*) is ready")
> relreq = re.compile (".*release_req")
> for fn in sys.argv[1:]: # logfile name
>     tn = None
>     with open (fn) as fd:
>         for line in fd:
>             #match = re.match ("(.*) is ready", line)
>             match = isready.match (line)
>             if match:
>                 tn = match.group(1)
>             #match = re.match (".*release_req", line)
>             match = relreq.match (line)
>             if match:
>                 #print "%s: %s" % (tn, line),
>                 print tn
>
> vs.
>
> while (<>) {
>     if (/(.*) is ready/) {
>         $tn = $1;
>     }
>     elsif (/release_req/) {
>         print "$tn\n";
>     }
> }
>
> Look at those numbers:
> 1 minute for python without precompiled REs
> 1/2 minute with precompiled REs
> 5 seconds with perl.

Can't comment on the numbers but the code segments are not quite
analogous. What about this one:

#!/usr/bin/env python
# vim: tw=0
import sys
import re

isready = re.compile("(.*) is ready")
for fn in sys.argv[1:]:
    tn = None
    with open(fn) as fd:
        for line in fd:
            match = isready.match(line)
            if match:
                tn = match.group(1)
            elif "release_req" in line:
                print tn
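
One of the differences can be looked at on its own: the Python version anchors ".*release_req" with match(), while Perl's /release_req/ is an unanchored search. The sketch below assumes Python 3 and synthetic lines (the real logs are not available); the helper name count is made up. It checks that both forms select the same lines and gives a rough timing of each. The leading ".*" plausibly costs extra because the engine first consumes the whole line and then backtracks, but the numbers will vary by machine and data.

```python
import re
import timeit

relreq_match = re.compile(".*release_req").match    # pattern used in the thread
relreq_search = re.compile("release_req").search    # unanchored, like /release_req/

# Made-up sample: long lines, few of which contain the keyword.
lines = ["x" * 200 + "\n"] * 5000 + ["x" * 100 + " release_req\n"] * 50


def count(test):
    # Number of lines for which the given test callable finds a match.
    return sum(1 for line in lines if test(line))


if __name__ == "__main__":
    for name, test in [("match .*", relreq_match), ("search", relreq_search)]:
        seconds = timeit.timeit(lambda: count(test), number=5)
        print("%-9s %.3fs" % (name, seconds))
```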


Marko


Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 18:30:29 +0200, Marko Rauhamaa wrote:

> "Charles T. Smith" :
> 
>> I need the second check to also be a RE because it's not
>> separate tokens.
> 
> The string "in" check doesn't care about tokens.
> 
> 
> Marko


Ah, yes.  Okay.


Re: sobering observation, python vs. perl

2016-03-19 Thread Marko Rauhamaa
BartC :

> On 17/03/2016 18:53, Marko Rauhamaa wrote:
>> BartC :
>
>>> sub replacewith{
>>> $s = $_[0];
>>> $t = $_[1];
>>> $u = $_[2];
>>> $s =~ s/$t/$u/;
>>> return $s;
>>> }
>>>
>>> Although once done, the original task now looks a proper language:
>>>
>>> print (replacewith("I have a dream","have","had"));
>>
>> Now try your function with:
>>
>> print (replacewith("I have a dream",".","had"));
>
> Yeah, it needs your quotemeta line (whatever that does). But the call
> is unaffected as the clutter is in the function.

Well, you fell in the trap. Most perl programmers would fall in it. Same
with bash programmers, including myself.

That's why I'm wondering if Python could come to the rescue and offer a
solid alternative to bash. You have to go out of your way to get into
accidental quoting/escaping problems in Python.
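
To make that concrete, here is a small sketch (Python 3 assumed) of the same "." test case in Python. str.replace never treats its arguments as a pattern, so the trap only appears if one explicitly reaches for the re module, and re.escape plays the role of Perl's quotemeta:

```python
import re

text = "I have a dream"

# Plain string replacement: "." is just a dot, and the text contains none.
plain = text.replace(".", "had")

# Regex replacement with an unescaped pattern: "." matches any character,
# so the first character of the text is replaced.
unescaped = re.sub(".", "had", text, count=1)

# Regex replacement with the pattern escaped: behaves like the plain version.
escaped = re.sub(re.escape("."), "had", text, count=1)
```

Only the unescaped variant changes the text; the other two leave it alone because there is no literal dot to replace.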


Marko


Re: sobering observation, python vs. perl

2016-03-19 Thread Steven D'Aprano
On Fri, 18 Mar 2016 03:08 am, Charles T. Smith wrote:

> On Thu, 17 Mar 2016 10:52:30 -0500, Tim Chase wrote:
> 
>> Not saying this will make a great deal of difference, but these two
>> items jumped out at me.  I'd even be tempted to just use string
>> manipulations for the isready aspect as well.  Something like
>> (untested)
> 
> well, I don't want to forgo REs in order to have python's numbers be
> better


Even when REs are the wrong tool for the job?

"Yeah, I know I ought to be using a power drill, but all I've got is this
hammer..."




-- 
Steven



Re: sobering observation, python vs. perl

2016-03-19 Thread Marko Rauhamaa
BartC :

> I was going to suggest just using a function. But never having coded in
> Perl before, I wasn't expecting something this ugly:
>
> sub replacewith{
>$s = $_[0];
>$t = $_[1];
>$u = $_[2];
>$s =~ s/$t/$u/;
>return $s;
> }
>
> Although once done, the original task now looks a proper language:
>
> print (replacewith("I have a dream","have","had"));

Now try your function with:

   print (replacewith("I have a dream",".","had"));


Marko


Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 10:26:12 -0700, Ethan Furman wrote:

> On 03/17/2016 09:36 AM, Charles T. Smith wrote:
> 
>> Yes, your point was to forgo REs despite that they are useful.
>> I could have thought the search would have been better as:
>>
>>  'release[-.:][Rr]eq'
>>
>> or something else ... you're in a "defend python at all costs!" mode.
> 
> No, I'm in the "don't try to write <other language> in Python" mode, and
> "don't use 10lb sledge when 6oz hammer will do" mode:


Yes, fine.

I'd only like to add that the perl numbers might also improve
if the print in the loop were postponed.


Re: sobering observation, python vs. perl

2016-03-19 Thread BartC

On 17/03/2016 17:25, Charles T. Smith wrote:
> On Thu, 17 Mar 2016 19:08:58 +0200, Marko Rauhamaa wrote:
>
>> my $str = "I have a dream";
>> my $find = "have";
>> my $replace = "had";
>> $find = quotemeta $find; # escape regex metachars if present
>> $str =~ s/$find/$replace/g;
>> print $str;
>>
>> with Python:
>>
>> print("I have a dream".replace("have", "had"))
>
> Uh... that perl is way over my head.  I admit though, that perl's
> powerful substitute command is also clumsy.  The best I can do
> right now is:
>
> $v =  "I have a dream\n";
> $v =~ s/have/had/;
> print $v


I was going to suggest just using a function. But never having coded in 
Perl before, I wasn't expecting something this ugly:


sub replacewith{
   $s = $_[0];
   $t = $_[1];
   $u = $_[2];
   $s =~ s/$t/$u/;
   return $s;
}

Although once done, the original task now looks a proper language:

print (replacewith("I have a dream","have","had"));

--
Bartc


Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 18:07:12 +0200, Marko Rauhamaa wrote:

> "Charles T. Smith" :
> Ok. The LANG=C setting has a tremendous effect on the performance of
> textutils.
> 
> 
> Marko

Good to know, thank you...


Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 17:48:54 +0200, Marko Rauhamaa wrote:

> "Charles T. Smith" :
> 
>> On Thu, 17 Mar 2016 15:29:47 +0000, Charles T. Smith wrote:
>>
>> And for completeness, and also surprising:
>>
>> time sed -n -e '/ is ready/{s///;h}' -e '/release_req/{g;p}'  *.out | sort -u
>> TestCase_F_00_P
>> TestCase_F_00_S
>> TestCase_F_01_S
>> TestCase_F_02_M
>>
>> real    0m10.998s
>> user    0m10.885s
>> sys     0m0.108s
>>
>> Twice as long as perl...  I guess there's no excuse for sed anymore...
> 
> Try running the sed command again after setting:
> 
> export LANG=C
> 
> 
> Marko

Hmmm.  Interesting thought.  But...

$ locale
LANG=C
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE=C
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=


Re: sobering observation, python vs. perl

2016-03-19 Thread Tim Chase
On 2016-03-17 15:29, Charles T. Smith wrote:
> isready = re.compile ("(.*) is ready")
> relreq = re.compile (".*release_req")
> for fn in sys.argv[1:]: # logfile name
>     tn = None
>     with open (fn) as fd:
>         for line in fd:
>             #match = re.match ("(.*) is ready", line)
>             match = isready.match (line)
>             if match:
>                 tn = match.group(1)
>             #match = re.match (".*release_req", line)
>             match = relreq.match (line)
>             if match:

Note that this "match" and "if" get executed for every line

>                 #print "%s: %s" % (tn, line),
>                 print tn
>
> vs.
>
> while (<>) {
>     if (/(.*) is ready/) {
>         $tn = $1;
>     }
>     elsif (/release_req/) {

Note this else ^

>         print "$tn\n";
>     }
> }

Also, you might just test for string-presence on that second one

So what happens if your code looks something like

isready = re.compile ("(.*) is ready")
for fn in sys.argv[1:]: # logfile name
    tn = None
    with open (fn) as fd:
        for line in fd:
            match = isready.match (line)
            if match:
                tn = match.group(1)
            elif "release_req" in line:
                print tn

Not saying this will make a great deal of difference, but these two
items jumped out at me.  I'd even be tempted to just use string
manipulations for the isready aspect as well.  Something like
(untested)

IS_READY = " is ready"
REL_REQ = "release_req"
for fn in sys.argv[1:]:
  tn = None
  with open(fn) as fd:
    for line in fd:
      try:
        index = line.rindex(IS_READY)
      except ValueError:
        if REL_REQ in line:
          print tn
      else:
        tn = line[:index]

-tkc





Re: sobering observation, python vs. perl

2016-03-19 Thread Marko Rauhamaa
"Charles T. Smith" :

> On Thu, 17 Mar 2016 17:48:54 +0200, Marko Rauhamaa wrote:
>> Try running the sed command again after setting:
>> 
>> export LANG=C
>
> Hmmm.  Interesting thought.  But...
>
> $ locale
> LANG=C

Ok. The LANG=C setting has a tremendous effect on the performance of
textutils.


Marko


Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 09:21:51 -0700, Ethan Furman wrote:

>> well, I don't want to forgo REs in order to have python's numbers be 
>> better
> 
> The issue is not avoiding REs, but using Python's strengths and idioms. 
>   Write the code in Python's style, get the same results, then compare 
> the times.


Yes, your point was to forgo REs despite that they are useful.
I could have thought the search would have been better as:

'release[-.:][Rr]eq'

or something else ... you're in a "defend python at all costs!" mode.


Re: sobering observation, python vs. perl

2016-03-19 Thread Peter Otten
Charles T. Smith wrote:

> On Thu, 17 Mar 2016 10:52:30 -0500, Tim Chase wrote:
> 
>> Not saying this will make a great deal of difference, but these two
>> items jumped out at me.  I'd even be tempted to just use string
>> manipulations for the isready aspect as well.  Something like
>> (untested)
> 
> well, I don't want to forgo REs in order to have python's numbers be
> better

As has been said, for simple text processing tasks string methods are the 
preferred approach in Python. I think this is more for clarity than 
performance.

If you need regular expressions a simple way to boost performance may be to 
use the external regex module.

(By the way, if you are looking for a simple way to iterate over multiple 
files use
for line in fileinput.input():
...
)
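
Spelled out a little further, the fileinput idea might look like the sketch below (Python 3 assumed). The helper name find_release_requests and the two tiny log files are made up for the demonstration. Called without arguments, fileinput.input() reads the files named in sys.argv[1:]; here the paths are passed explicitly so the example is self-contained. As with Perl's while (<>), tn is carried across file boundaries rather than reset per file.

```python
import fileinput
import os
import tempfile


def find_release_requests(paths):
    """Yield the most recent 'is ready' name for each release_req line."""
    tn = ""
    # fileinput.input() chains all files into one stream of lines.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if " is ready" in line:
                tn = line.partition(" is ready")[0]
            elif "release_req" in line:
                yield tn


# Tiny made-up log files, only for demonstration.
tmpdir = tempfile.mkdtemp()
paths = []
for name, body in [("a.out", "T1 is ready\nrelease_req sent\n"),
                   ("b.out", "noise\nT2 is ready\ngot release_req\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write(body)
    paths.append(path)

results = list(find_release_requests(paths))
```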

Some numbers:

$ time perl find.pl data/sample*.txt > r1.txt

real    0m0.504s
user    0m0.466s
sys     0m0.036s
$ time python find.py data/sample*.txt > r2.txt

real    0m2.403s
user    0m2.339s
sys     0m0.059s
$ time python find_regex.py data/sample*.txt > r3.txt

real    0m0.693s
user    0m0.631s
sys     0m0.060s
$ time python find_no_re.py data/sample*.txt > r4.txt

real    0m0.319s
user    0m0.267s
sys     0m0.048s

Python 3 slows down things:

$ time python3 find_no_re.py data/sample*.txt > r5.txt

real    0m0.497s
user    0m0.444s
sys     0m0.051s

The scripts:
$ cat find.pl
#!/usr/bin/env perl

while (<>) {
    if (/(.*) is ready/) {
        $tn = $1;
    }
    elsif (/release_req/) {
        print "$tn\n";
    }
}
$ cat find.py
#!/usr/bin/env python
import sys
import re

def main():
    isready = re.compile ("(.*) is ready").match
    relreq = re.compile (".*release_req").match

    tn = ""
    for fn in sys.argv[1:]:
        with open(fn) as fd:
            for line in fd:
                match = isready(line)
                if match:
                    tn = match.group(1)
                elif relreq(line):
                    print(tn)

main()

$ cat find_regex.py
#!/usr/bin/env python
import sys
import regex as re
[rest the same as find.py]

$ cat find_no_re.py
#!/usr/bin/env python
import sys

def main():
    tn = ""
    for fn in sys.argv[1:]:
        with open(fn) as fd:
            for line in fd:
                if " is ready" in line:
                    tn = line.partition(" is ready")[0]
                elif "release_req" in line:
                    print(tn)

main()

The test data was generated with

$ cat make_test_data.py
#!/usr/bin/env python3
import os
import random
import shutil

from itertools import islice


def make_line_factory(words, line_length, isready):
    choice = random.choice

    def make_line():
        while True:
            line = [choice(words)]
            length = len(line[0])
            while length < line_length:
                word = choice(words)
                line.append(word)
                length += len(word) + 1
            if random.randrange(100) < isready:
                pos = random.randrange(len(line))
                line[pos:pos+1] = ["is", "ready"]
            elif random.randrange(100) < isready:
                pos = random.randrange(len(line))
                line[pos:pos] = ["release_req"]
            yield " ".join(line)

    return make_line


def main():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--words", default="/usr/share/dict/words")
    parser.add_argument("--line-length", type=int, default=80)
    parser.add_argument("--num-lines", type=eval, default=10**5)
    parser.add_argument("--num-files", type=int, default=4)
    parser.add_argument("--name-template", default="sample{:0{}}.txt")
    parser.add_argument("--data-folder", default="data")
    parser.add_argument("--remove-data-folder", action="store_true")
    parser.add_argument("--first-match-percent", type=int, default=10)
    try:
        import argcomplete
    except ImportError:
        pass
    else:
        argcomplete.autocomplete(parser)

    args = parser.parse_args()

    if args.remove_data_folder:
        shutil.rmtree(args.data_folder)
    os.mkdir(args.data_folder)

    with open(args.words) as f:
        words = [line.strip() for line in f]

    make_line = make_line_factory(
        words, args.line_length, args.first_match_percent)()

    width = len(str(args.num_files))
    for index in range(1, args.num_files+1):
        filename = os.path.join(
            args.data_folder,
            args.name_template.format(index, width))
        print(filename)
        with open(filename, "w") as f:
            for line in islice(make_line, args.num_lines):
                print(line, file=f)


if __name__ == "__main__":
    main()




Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 17:47:55 +0200, Marko Rauhamaa wrote:


> Can't comment on the numbers but the code segments are not quite
> analogous. What about this one:
> 
> #!/usr/bin/env python
> # vim: tw=0
> import sys
> import re
>
> isready = re.compile("(.*) is ready")
> for fn in sys.argv[1:]:
>     tn = None
>     with open(fn) as fd:
>         for line in fd:
>             match = isready.match(line)
>             if match:
>                 tn = match.group(1)
>             elif "release_req" in line:
>                 print tn
> 
> 
> Marko


I need the second check to also be a RE because it's not
separate tokens.  How about this change:

            match = isready.match (line)
            if match:
                tn = match.group(1)
 >              continue

            match = relreq.match (line)
            if match:
                print tn

real    0m28.737s
user    0m28.538s
sys     0m0.128s

Shaved 2 seconds off.



Re: sobering observation, python vs. perl

2016-03-19 Thread Charles T. Smith
On Thu, 17 Mar 2016 18:34:06 +0200, Marko Rauhamaa wrote:

> <http://stackoverflow.com/questions/12793562/text-processing-python-vs-perl-performance>


Okay, that was interesting.

Actually, I saw a study some years ago that concluded that python
could be both slower and faster than perl, but that perl had much
less deviation than python.  I took that and accepted it, but
was surprised now that in exactly the field of application that I've
traditionally used perl, it really is better, er... faster.

Furthermore, the really nice thing about python is its OO, but
I've really neglected looking into that with perl's OO capabilities.


Re: sobering observation, python vs. perl

2016-03-19 Thread Ethan Furman

On 03/17/2016 09:36 AM, Charles T. Smith wrote:

> Yes, your point was to forgo REs despite that they are useful.
> I could have thought the search would have been better as:
>
>  'release[-.:][Rr]eq'
>
> or something else ... you're in a "defend python at all costs!" mode.


No, I'm in the "don't try to write <other language> in Python" mode, and
"don't use 10lb sledge when 6oz hammer will do" mode:



# using `in` and printing line as each is found
real    0m1.703s
user    0m0.184s
sys     0m0.260s

# using `in` and printing lines at the end
real    0m0.217s
user    0m0.112s
sys     0m0.068s

# using 're' and printing lines at the end
real    0m0.608s
user    0m0.516s
sys     0m0.060s


As you can see, how you print has a huge impact.  Hopefully you also 
noticed that using `re` when `in` would do made the script 3 times slower.



# using `in` code
import sys
found = []
for fn in sys.argv[1:]:
   with open(fn) as fh:
      for line in fh:
         if 'timezone' in line:
            found.append(line)
print ''.join(found)

# using `re` code
import sys
import re
found = []
for fn in sys.argv[1:]:
   with open(fn) as fh:
      for line in fh:
         if re.search('timezone', line):
            found.append(line)
print ''.join(found)
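
For anyone without suitable files at hand, the `in` versus `re` comparison can be approximated in memory. This is a rough sketch only, assuming Python 3 and synthetic lines (20,000 of them, one in a hundred mentioning 'timezone'); the function names are made up and the timings will not match the figures above:

```python
import re
import timeit

# Synthetic data (an assumption): every 100th line mentions 'timezone'.
lines = [("entry %d: timezone changed\n" if i % 100 == 0
          else "entry %d: nothing to see here\n") % i
         for i in range(20000)]


def with_in():
    # Plain substring test.
    return [line for line in lines if "timezone" in line]


def with_re():
    # Same test through the regex engine.
    return [line for line in lines if re.search("timezone", line)]


if __name__ == "__main__":
    for func in (with_in, with_re):
        seconds = timeit.timeit(func, number=10)
        print("%-8s %.3fs" % (func.__name__, seconds))
```

Both functions return the same lines, so the difference in the printed times reflects only the cost of the test itself.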


--
~Ethan~


Re: sobering observation, python vs. perl

2016-03-19 Thread Anders J. Munch

Charles T. Smith:
> I've really learned to love working with python, but it's too soon
> to pack perl away.  I was amazed at how long a simple file search took
> so I ran some statistics:


Write Python in pythonic style instead of translated-from-Perl style, and the 
tables are turned:


$ cat find-rel.py
| import sys
| def main():
|     for fn in sys.argv[1:]:
|         tn = None
|         with open(fn, 'rt') as fd:
|             for line in fd:
|                 if ' is ready' in line:
|                     tn = line.split(' is ready', 1)[0]
|                 elif 'release_req' in line:
|                     print tn
| main()


$ time python find-rel.py *.out
real    0m0.647s
user    0m0.616s
sys     0m0.029s

$ time perl find-rel.pl *.out
real    0m0.935s
user    0m0.910s
sys     0m0.023s

I don't have your log files and my quickly assembled test file doesn't actually 
contain the phrase 'release_req', so my results may be misleading. Perhaps 
you'll try it and post your results?
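
One caveat when comparing results: the string-method variants are not exact drop-in replacements for the regex if a line can contain " is ready" more than once. The greedy (.*) ends at the last occurrence, while split(..., 1)[0] and partition(...)[0] stop at the first. A small sketch, assuming Python 3 and a made-up line:

```python
import re

line = "A is ready, B is ready\n"   # made-up line with two occurrences

# Greedy (.*) backtracks from the end, so it stops at the LAST " is ready".
by_regex = re.match("(.*) is ready", line).group(1)

# split(..., 1)[0] and partition(...)[0] stop at the FIRST occurrence.
by_split = line.split(" is ready", 1)[0]
by_partition = line.partition(" is ready")[0]

# rindex finds the LAST occurrence, which agrees with the regex.
by_rindex = line[:line.rindex(" is ready")]
```

For logs where the phrase appears at most once per line, all four give the same answer.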


regards, Anders



Re: sobering observation, python vs. perl

2016-03-19 Thread Rustom Mody
On Thursday, March 17, 2016 at 11:24:00 PM UTC+5:30, BartC wrote:
> On 17/03/2016 17:25, Charles T. Smith wrote:
> > On Thu, 17 Mar 2016 19:08:58 +0200, Marko Rauhamaa wrote:
> 
> >> my $str = "I have a dream";
> >> my $find = "have";
> >> my $replace = "had";
> >> $find = quotemeta $find; # escape regex metachars if present
> >> $str =~ s/$find/$replace/g;
> >> print $str;
> >>
> >> with Python:
> >>
> >> print("I have a dream".replace("have", "had"))
> 
> > Uh... that perl is way over my head.  I admit though, that perl's
> > powerful substitute command is also clumsy.  The best I can do
> > right now is:
> >
> > $v =  "I have a dream\n";
> > $v =~ s/have/had/;
> > print $v
> 
> I was going to suggest just using a function. But never having coded in 
> Perl before, I wasn't expecting something this ugly:
> 
> sub replacewith{
> $s = $_[0];
> $t = $_[1];
> $u = $_[2];

I think [untested] you can shorten those 3 lines to:
($s, $t, $u) = @_ ;

> $s =~ s/$t/$u/;
> return $s;
> }


Re: sobering observation, python vs. perl

2016-03-19 Thread Marko Rauhamaa
"Charles T. Smith" :

> On Thu, 17 Mar 2016 15:29:47 +0000, Charles T. Smith wrote:
>
> And for completeness, and also surprising:
>
> time sed -n -e '/ is ready/{s///;h}' -e '/release_req/{g;p}'  *.out | sort -u
> TestCase_F_00_P
> TestCase_F_00_S
> TestCase_F_01_S
> TestCase_F_02_M
>
> real    0m10.998s
> user    0m10.885s
> sys     0m0.108s
>
> Twice as long as perl...  I guess there's no excuse for sed anymore...

Try running the sed command again after setting:

export LANG=C


Marko


Re: sobering observation, python vs. perl

2016-03-19 Thread Ben Bacarisse
Marko Rauhamaa  writes:

> "Charles T. Smith" :
>
>> Actually, I saw a study some years ago that concluded that python
>> could be both slower and faster than perl, but that perl had much less
>> deviation than python. I took that and accepted it, but was surprised
>> now that in exactly the field of application that I've traditionally
>> used perl, it really is better, er... faster.
>>
>> Furthermore, the really nice thing about python is its OO, but I've
>> really neglected looking into that with perl's OO capabilities.
>
> I haven't had such log processing needs as you, nor has it come down to
> performance in such a way. Do use the best tool for the job.
>
> (When it comes to freely formatted logs, gleaning information from them
> is somewhat of a lost cause. I've done my best to move to rigorously
> formatted logs that are much more amenable to post processing.)
>
> Perl might be strong on its home turf, but I am a minimalist and
> reductionist -- Perl was intentionally designed to be a maximalist,
> imitating the principles of natural languages. Python has concise,
> crystal-clear semantics that are convenient to work with.
>
> Compare Perl (<http://www.perlmonks.org/?node_id=98357>):
>
>my $str = "I have a dream";
>my $find = "have";
>my $replace = "had";
>$find = quotemeta $find; # escape regex metachars if present
>$str =~ s/$find/$replace/g;
>print $str;
>
> with Python:
>
>print("I have a dream".replace("have", "had"))

If you know the strings are "have" and "had", you can just write

  print 'I have a dream' =~ s/have/had/r;

but I think your point is to show up the lack of a string (rather than
regex) replace in Perl, so the strings should be considered arbitrarily
"dangerous".  For that purpose it might have been better to give the
example as

  print("I have a dream".replace(find, replace))

for which the closest Perl match is probably

  print 'I have a dream' =~ s/\Q$find/$replace/r;

The closest to the actual line -- where you can just edit two strings
with your only concern being the end quote of the string -- would be
something like

  my $find = 'have'; print 'I have a dream' =~ s{\Q$find}'had'r

I don't want to start a language war!  I'm not saying that this is as
simple and clear as the Python, but a "compare X with Y" should try to
do the best by both X and Y.

-- 
Ben.


Re: sobering observation, python vs. perl

2016-03-18 Thread Ethan Furman

On 03/17/2016 10:35 AM, Charles T. Smith wrote:
> On Thu, 17 Mar 2016 10:26:12 -0700, Ethan Furman wrote:
>> On 03/17/2016 09:36 AM, Charles T. Smith wrote:
>>> Yes, your point was to forgo REs despite that they are useful.
>>> I could have thought the search would have been better as:
>>>
>>>   'release[-.:][Rr]eq'
>>>
>>> or something else ... you're in a "defend python at all costs!" mode.
>>
>> No, I'm in the "don't try to write <other language> in Python" mode, and
>> "don't use 10lb sledge when 6oz hammer will do" mode:
>
> Yes, fine.
>
> I'd only like to add that the perl numbers might also improve
> if the print in the loop were postponed.


Yup, might!  Try it and see.

--
~Ethan~



sobering observation, python vs. perl

2016-03-18 Thread Charles T. Smith
I've really learned to love working with python, but it's too soon
to pack perl away.  I was amazed at how long a simple file search took
so I ran some statistics:

$ time python find-rel.py
./find-relreq *.out | sort -u
TestCase_F_00_P
TestCase_F_00_S
TestCase_F_01_S
TestCase_F_02_M

real    1m4.581s
user    1m4.412s
sys     0m0.140s


$ time python find-rel.py
# modified to use precompiled REs:
TestCase_F_00_P
TestCase_F_00_S
TestCase_F_01_S
TestCase_F_02_M

real    0m29.337s
user    0m29.174s
sys     0m0.100s


$ time perl find-rel.pl
find-relreq.pl *.out | sort -u
TestCase_F_00_P
TestCase_F_00_S
TestCase_F_01_S
TestCase_F_02_M

real    0m5.009s
user    0m4.932s
sys     0m0.072s

Here's the programs:

#!/usr/bin/env python
# vim: tw=0
import sys
import re

isready = re.compile ("(.*) is ready")
relreq = re.compile (".*release_req")
for fn in sys.argv[1:]: # logfile name
    tn = None
    with open (fn) as fd:
        for line in fd:
            #match = re.match ("(.*) is ready", line)
            match = isready.match (line)
            if match:
                tn = match.group(1)
            #match = re.match (".*release_req", line)
            match = relreq.match (line)
            if match:
                #print "%s: %s" % (tn, line),
                print tn

vs.

while (<>) {
    if (/(.*) is ready/) {
        $tn = $1;
    }
    elsif (/release_req/) {
        print "$tn\n";
    }
}

Look at those numbers:
1 minute for python without precompiled REs
1/2 minute with precompiled REs
5 seconds with perl.


Re: sobering observation, python vs. perl

2016-03-18 Thread Mark Lawrence

On 17/03/2016 16:36, Charles T. Smith wrote:
> On Thu, 17 Mar 2016 09:21:51 -0700, Ethan Furman wrote:
>>> well, I don't want to forgo REs in order to have python's numbers be better
>>
>> The issue is not avoiding REs, but using Python's strengths and idioms.
>> Write the code in Python's style, get the same results, then compare
>> the times.
>
> Yes, your point was to forgo REs despite that they are useful.
> I could have thought the search would have been better as:
>
>  'release[-.:][Rr]eq'
>
> or something else ... you're in a "defend python at all costs!" mode.



I believe it is more along the lines of "In Rome, do as the Romans".

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: sobering observation, python vs. perl

2016-03-18 Thread srinivas devaki
please upload the log file,

and global variables in python are slow, so just keep all that in a
function and try again. generally i get 20-30% time improvement by
doin that.


On Thu, Mar 17, 2016 at 8:59 PM, Charles T. Smith
 wrote:
> I've really learned to love working with python, but it's too soon
> to pack perl away.  I was amazed at how long a simple file search took
> so I ran some statistics:
>
> $ time python find-rel.py
> ./find-relreq *.out | sort -u
> TestCase_F_00_P
> TestCase_F_00_S
> TestCase_F_01_S
> TestCase_F_02_M
>
> real    1m4.581s
> user    1m4.412s
> sys     0m0.140s
>
>
> $ time python find-rel.py
> # modified to use precompiled REs:
> TestCase_F_00_P
> TestCase_F_00_S
> TestCase_F_01_S
> TestCase_F_02_M
>
> real    0m29.337s
> user    0m29.174s
> sys     0m0.100s
>
>
> $ time perl find-rel.pl
> find-relreq.pl *.out | sort -u
> TestCase_F_00_P
> TestCase_F_00_S
> TestCase_F_01_S
> TestCase_F_02_M
>
> real    0m5.009s
> user    0m4.932s
> sys     0m0.072s
>
> Here's the programs:
>
> #!/usr/bin/env python
> # vim: tw=0
> import sys
> import re
>
> isready = re.compile ("(.*) is ready")
> relreq = re.compile (".*release_req")
> for fn in sys.argv[1:]: # logfile name
>     tn = None
>     with open (fn) as fd:
>         for line in fd:
>             #match = re.match ("(.*) is ready", line)
>             match = isready.match (line)
>             if match:
>                 tn = match.group(1)
>             #match = re.match (".*release_req", line)
>             match = relreq.match (line)
>             if match:
>                 #print "%s: %s" % (tn, line),
>                 print tn
>
> vs.
>
> while (<>) {
>     if (/(.*) is ready/) {
>         $tn = $1;
>     }
>     elsif (/release_req/) {
>         print "$tn\n";
>     }
> }
>
> Look at those numbers:
> 1 minute for python without precompiled REs
> 1/2 minute with precompiled REs
> 5 seconds with perl.



-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight


Re: sobering observation, python vs. perl

2016-03-18 Thread Marko Rauhamaa
"Charles T. Smith" :

> Actually, I saw a study some years ago that concluded that python
> could be both slower and faster than perl, but that perl had much less
> deviation than python. I took that and accepted it, but was surprised
> now that in exactly the field of application that I've traditionally
> used perl, it really is better, er... faster.
>
> Furthermore, the really nice thing about python is its OO, but I've
> really neglected looking into that with perl's OO capabilities.

I haven't had such log processing needs as you, nor has it come down to
performance in such a way. Do use the best tool for the job.

(When it comes to freely formatted logs, gleaning information from them
is somewhat of a lost cause. I've done my best to move to rigorously
formatted logs that are much more amenable to post processing.)

Perl might be strong on its home turf, but I am a minimalist and
reductionist -- Perl was intentionally designed to be a maximalist,
imitating the principles of natural languages. Python has concise,
crystal-clear semantics that are convenient to work with.

Compare Perl (<http://www.perlmonks.org/?node_id=98357>):

   my $str = "I have a dream";
   my $find = "have";
   my $replace = "had";
   $find = quotemeta $find; # escape regex metachars if present
   $str =~ s/$find/$replace/g;
   print $str;

with Python:

   print("I have a dream".replace("have", "had"))


Marko