Re: When are immutable tuples *essential*? Why can't you just use lists *everywhere* instead?

2007-04-20 Thread garrickp
On Apr 20, 4:37 pm, John Machin [EMAIL PROTECTED] wrote:
 One inessential but very useful thing about tuples when you have a lot
 of them is that they are allocated the minimum possible amount of
 memory. OTOH lists are created with some slack so that appending etc
 can avoid taking quadratic time.

Speaking of inessential but very useful things, I'm also a big fan of
the tuple swap...
a = 2
b = 3
(a, b) = (b, a)
print a # 3
print b # 2

As well as the simple return of multiple values from a single
function:

c_stdout, c_stdin = popen2("ls")

IMO, the biggest thing going for tuples is the syntactic sugar they
bring to Python. Doing either of these with lists or other data
structures would not be nearly as clean as it is with tuples.
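
Here is a short sketch of both idioms in Python 3 syntax. The helper
min_max is a hypothetical name used only for illustration; packing and
unpacking work the same way as in the swap above.

```python
def min_max(values):
    """Return the smallest and largest item as a 2-tuple."""
    return min(values), max(values)  # the comma is what builds the tuple


a, b = 2, 3
a, b = b, a                    # swap without a temporary variable
lo, hi = min_max([4, 1, 7])    # unpack the returned tuple into two names
print(a, b, lo, hi)            # prints: 3 2 1 7
```

The parentheses in (a, b) = (b, a) are optional; it is the commas that
do the work.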

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: catching exceptions from an except: block

2007-03-07 Thread garrickp
On Mar 7, 2:48 pm, Arnaud Delobelle [EMAIL PROTECTED] wrote:

 I'm not really thinking about this situation so let me clarify. Here
 is a simple concrete example, taking the following for the functions
 a,b,c I mention in my original post.
   - a=int
   - b=float
   - c=complex
   - x is a string
 This means I want to convert x to an int if possible, otherwise a
 float, otherwise a complex, otherwise raise CantDoIt.

 I can do:

 for f in int, float, complex:
     try:
         return f(x)
     except ValueError:
         continue
 raise CantDoIt

 But if the three things I want to do are not callable objects but
 chunks of code this method is awkward because you have to create
 functions simply in order to be able to loop over them (this is what I
 was talking about 'abusing loop constructs').  Besides I am not happy
 with the other two idioms I can think of.

 --
 Arnaud

Wouldn't it be easier to do:

if isinstance(x, int):
    # do something
elif isinstance(x, float):
    # do something
elif isinstance(x, complex):
    # do something
else:
    raise CantDoIt

or,

i = [int, float, complex]
for f in i:
    if isinstance(x, f):
        return x
else:
    raise CantDoIt
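
For comparison, since x is a string in Arnaud's example, isinstance()
tests on it would never succeed for int, float, or complex. The
conversion chain he describes could be sketched like this in Python 3.
CantDoIt comes from his post; the function name convert and the
exception's definition are assumptions made for the example.

```python
class CantDoIt(Exception):
    """Raised when no converter accepts the input."""


def convert(x):
    # Try the narrowest type first; each constructor raises ValueError
    # when the string does not parse as that type.
    for f in (int, float, complex):
        try:
            return f(x)
        except ValueError:
            continue
    raise CantDoIt(x)
```

With this, convert("3") gives an int, convert("3.5") a float, and
convert("1+2j") a complex, while convert("abc") raises CantDoIt.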



Re: catching exceptions from an except: block

2007-03-07 Thread garrickp
On Mar 7, 3:04 pm, [EMAIL PROTECTED] wrote:
 On Mar 7, 2:48 pm, Arnaud Delobelle [EMAIL PROTECTED] wrote:





  I'm not really thinking about this situation so let me clarify. Here
  is a simple concrete example, taking the following for the functions
  a,b,c I mention in my original post.
- a=int
- b=float
- c=complex
- x is a string
  This means I want to convert x to an int if possible, otherwise a
  float, otherwise a complex, otherwise raise CantDoIt.

  I can do:

  for f in int, float, complex:
      try:
          return f(x)
      except ValueError:
          continue
  raise CantDoIt

  But if the three things I want to do are not callable objects but
  chunks of code this method is awkward because you have to create
  functions simply in order to be able to loop over them (this is what I
  was talking about 'abusing loop constructs').  Besides I am not happy
  with the other two idioms I can think of.

  --
  Arnaud

 Wouldn't it be easier to do:

 if isinstance(x, int):
     # do something
 elif isinstance(x, float):
     # do something
 elif isinstance(x, complex):
     # do something
 else:
     raise CantDoIt

 or,

 i = [int, float, complex]
 for f in i:
     if isinstance(x, f):
         return x
 else:
     raise CantDoIt

I so missed the point of this. Not my day. Please ignore my post.



Re: Regex Speed

2007-02-23 Thread garrickp
On Feb 21, 10:34 am, [EMAIL PROTECTED] wrote:
 On Feb 20, 6:14 pm, Pop User [EMAIL PROTECTED] wrote:
 http://swtch.com/~rsc/regexp/regexp1.html

Going back a bit on a tangent, the author of this citation states that
any regex can be expressed as a DFA. However, while investigating this
more I appear to have found one example of a regex which breaks this
assumption.

ab+c|abd

Am I correct? Can you think of a deterministic method of computing
this expression? It would be easier with a NFA machine, but given that
the Python method of computing RE's involves pre-compiling a re
object, optimizing the matching engine would make the most sense to
me.

Here's what I have so far:

class State(object):
    def __init__(self):
        self.nextState = {}
        self.nextStateKeys = []
        self.prevState = None
        self.isMatchState = True
    def setNextState(self, chars, iNextState):
        self.nextState[chars] = iNextState
        self.nextStateKeys = self.nextState.keys()
        self.isMatchState = False
    def setPrevState(self, iPrevState):
        self.prevState = iPrevState
    def moveToNextState(self, testChar):
        if testChar in self.nextStateKeys:
            return self.nextState[testChar]
        else:
            return None

class CompiledRegex(object):
    def __init__(self, startState):
        self.startState = startState
    def match(self, matchStr):
        match_set = []
        currentStates = [self.startState]
        nextStates = [self.startState]
        for character in matchStr:
            for state in currentStates:
                nextState = state.moveToNextState(character)
                if nextState is not None:
                    nextStates.append(nextState)
                    if nextState.isMatchState:
                        print "Match!"
                        return
            currentStates = nextStates
            nextStates = [self.startState]
        print "No Match!"

def compile(regexStr):
    startState = State()
    currentState = startState
    backRefState = None
    lastChar = ""
    for character in regexStr:
        if character == "+":
            currentState.setNextState(lastChar, currentState)
        elif character == "|":
            currentState = startState
        elif character == "?":
            backRefState = currentState.prevState
        elif character == "(":
            # Implement (
            pass
        elif character == ")":
            # Implement )
            pass
        elif character == "*":
            currentState = currentState.prevState
            currentState.setNextState(lastChar, currentState)
        else:
            testRepeatState = currentState.moveToNextState(character)
            if testRepeatState is None:
                newState = State()
                newState.setPrevState(currentState)
                currentState.setNextState(character, newState)
                if backRefState is not None:
                    backRefState.setNextState(character, newState)
                    backRefState = None
                currentState = newState
            else:
                currentState = testRepeatState
        lastChar = character
    return CompiledRegex(startState)

>>> a = compile("ab+c")
>>> a.match("abc")
Match!
>>> a.match("abbc")
Match!
>>> a.match("ac")
No Match!
>>> a = compile("ab+c|abd")
>>> a.match("abc")
Match!
>>> a.match("abbc")
Match!
>>> a.match("ac")
No Match!
>>> a.match("abd")
Match!
>>> a.match("abbd")
Match!
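
For what it's worth, ab+c|abd does seem to have a deterministic machine
if the state after "ab" is kept separate from the state after "abb".
What follows is a hand-written transition table in Python 3, offered as
an illustration rather than anything taken from the cited article. The
state numbers and the name dfa_match are made up for the example, and
it tests whole-string matches only.

```python
# States: 0 start, 1 after "a", 2 after "ab", 3 after "abb...", 4 accept.
DFA = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "b"): 3, (2, "c"): 4, (2, "d"): 4,   # "abc" and "abd" end here
    (3, "b"): 3, (3, "c"): 4,                # two or more b's: only "c" ends
}
ACCEPT = {4}


def dfa_match(text):
    """Return True if the whole of text matches ab+c|abd."""
    state = 0
    for ch in text:
        state = DFA.get((state, ch))
        if state is None:        # no transition: dead state
            return False
    return state in ACCEPT
```

Here dfa_match("abbd") is False, whereas the compiler above reports a
match for "abbd", apparently because the "b" self-loop and the "d" edge
end up on the same state.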




Re: Regex Speed

2007-02-21 Thread garrickp
On Feb 20, 6:14 pm, Pop User [EMAIL PROTECTED] wrote:

 It's very hard to beat grep depending on the nature of the regex you are
 searching using. The regex engines in python/perl/php/ruby have traded
 the speed of grep/awk for the ability to do more complex searches.

 http://swtch.com/~rsc/regexp/regexp1.html

Some darned good reading. And it explains what happened fairly well.
Thanks!

  And python 2.5.2.

 2.5.2? Who needs crystal balls when you've got a time machine? Or did
 you mean 2.5? Or 1.5.2 -- say it ain't so, Joe!

2.5. I'm not entirely sure where I got that extra 2. I blame Monday.

In short... avoid using re as a sledgehammer against every problem. I
had a feeling that would be the case.



Re: Creating a daemon process in Python

2007-02-21 Thread garrickp
On Feb 21, 9:33 am, Eirikur Hallgrimsson [EMAIL PROTECTED]
wrote:
 Sakagami Hiroki wrote:
  What is the easiest way to create a daemon process in Python?

I've found it even easier to use the built-in threading module:

import time

t1 = time.time()
print "t_poc.py called at", t1

import threading

def im_a_thread():
    time.sleep(10)
    print "This is your thread speaking at", time.time()

thread = threading.Thread(target=im_a_thread)
thread.setDaemon(True)
thread.start()
t2 = time.time()
print "Time elapsed in main thread:", t2 - t1


Of course, your mileage may vary.



Re: Creating a daemon process in Python

2007-02-21 Thread garrickp
On Feb 21, 3:34 pm, Benjamin Niemann [EMAIL PROTECTED] wrote:
 That's not a daemon process (which are used to execute 'background services'
 in UNIX environments).

I had not tested this by running the script directly, and in writing a
response, I found out that the entire interpreter closed when the main
thread exited (killing the daemonic thread in the process). This is
different behavior from running the script interactively, and thus my
confusion.
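
A small sketch of that difference, written for Python 3 (where daemon
is an attribute rather than the setDaemon() call). It runs a child
interpreter twice, assuming sys.executable points at a usable Python.
The names run_child and CHILD_SOURCE are invented for the illustration.

```python
import subprocess
import sys

# Script run in the child: the worker prints only if it is allowed to finish.
CHILD_SOURCE = """
import threading, time

def worker():
    time.sleep(0.5)
    print("thread finished")

t = threading.Thread(target=worker)
t.daemon = {daemon}
t.start()
print("main thread exiting")
"""


def run_child(daemon):
    """Run the script in a fresh interpreter and return its stdout."""
    source = CHILD_SOURCE.format(daemon=daemon)
    done = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


if __name__ == "__main__":
    print(run_child(daemon=True))    # worker is killed along with the interpreter
    print(run_child(daemon=False))   # interpreter waits for the worker
```

Only the non-daemon run should include "thread finished" in its output.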

Thanks! ~Garrick



Regex Speed

2007-02-20 Thread garrickp
While creating a log parser for fairly large logs, we have run into an
issue where the processing time was unacceptable (upwards of 5 minutes
for 1-2 million lines of logs). In contrast, using the Linux tool grep
would complete the same search in a matter of seconds.

The search we used was a regex of 6 elements ORed together, with an
exclusionary set of ~3 elements. Due to the size of the files, we
decided to run these line by line, and because we need regular
expressions, we could not use more traditional string find methods.

We did pre-compile the regular expressions, and attempted tricks such
as map to remove as much overhead as possible.

With the known limitations of not being able to slurp the entire log
file into memory, and the need to use regular expressions, do you have
any ideas on how we might speed this up without resorting to system
calls (our current solution)?



Re: Regex Speed

2007-02-20 Thread garrickp
On Feb 20, 4:15 pm, John Machin [EMAIL PROTECTED] wrote:

 What is an exclusionary set? It would help enormously if you were to
 tell us what the regex actually is. Feel free to obfuscate any
 proprietary constant strings, of course.

My apologies. I don't have specifics right now, but it's something
along the line of this:

import re

error_list = re.compile(r"error|miss|issing|inval|nvalid|math")
exclusion_list = re.compile(r"No Errors Found|Premature EOF, stopping translate")

for test_text in test_file:
    if error_list.match(test_text) and not exclusion_list.match(test_text):
        # Process test_text

Yes, I know, these are not re expressions, but the requirements for
the script specified that the error list be capable of accepting
regular expressions, since these lists are configurable.
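
A runnable sketch of that loop in Python 3, with two assumptions
flagged: it uses search() rather than match(), on the guess that the
patterns may occur anywhere in a line, and interesting_lines is a
made-up name. Any iterable of lines works, including an open file
object, which is read lazily rather than slurped into memory.

```python
import re

error_list = re.compile(r"error|miss|issing|inval|nvalid|math")
exclusion_list = re.compile(r"No Errors Found|Premature EOF, stopping translate")


def interesting_lines(lines):
    """Yield lines that hit error_list and do not hit exclusion_list."""
    for line in lines:
        # search() scans the whole line; match() only tests its start.
        if error_list.search(line) and not exclusion_list.search(line):
            yield line
```

This shows the shape of the loop only; it makes no claim about closing
the gap to grep.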

 I presume you mean you didn't read the whole file into memory;
 correct? 2 million lines doesn't sound like much to me; what is the
 average line length and what is the spec for the machine you are
 running it on?

You are correct. The individual files can be anywhere from a few bytes
to 2gig. The average is around one gig, and there are a number of
files to be iterated over (an average of 4). I do not know the machine
specs, though I can safely say it is a single core machine, sub
2.5ghz, with 2gigs of RAM running linux.

 map is a built-in function, not a trick. What tricks?

I'm using the term "tricks" where I may be obfuscating the code in an
effort to make it run faster. In the case of map, getting rid of the
interpreted for loop overhead in favor of the implied C loop offered
by map.

 What system calls? Do you mean running grep as a subprocess?

Yes. While this may not seem evil in and of itself, we are trying to
get our company to adopt Python into more widespread use. I'm guessing
the limiting factor isn't python, but us python newbies missing an
obvious way to speed up the process.

 To help you, we need either (a) basic information or (b) crystal
 balls. Is it possible for you to copy & paste your code into a web
 browser or e-mail/news client? Telling us which version of Python you
 are running might be a good idea too.

Can't copy and paste code (corp policy and all that), no crystal balls
for sale, though I hope the above information helps. Also, running a
trace on the program indicated that python was spending a lot of time
looping around lines, checking for each element of the expression in
sequence.

And python 2.5.2.

Thanks!




Re: output to console and to multiple files

2007-02-16 Thread garrickp
On Feb 16, 3:28 pm, Gabriel Genellina [EMAIL PROTECTED] wrote:


 That's ok inside the same process, but the OP needs to use it from a
 subprocess or spawn.
 You have to use something like tee, working with real file handles.


I'm not particularly familiar with this, but it seems to me that if
you're trying to catch stdout/stderr from a program you can call with
(say) popen2, you could just read from the returned stdout/stderr
pipe, and then write to a series of file handles (including
sys.stdout).
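
Roughly what that might look like in Python 3, where the subprocess
module stands in for popen2 (which no longer exists there). The
function name tee_command is hypothetical, and stderr is folded into
stdout to keep the sketch short.

```python
import io
import subprocess
import sys


def tee_command(argv, sinks):
    """Run argv and copy each line of its output to every sink."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            for sink in sinks:
                sink.write(line)
    return proc.returncode


if __name__ == "__main__":
    log = io.StringIO()   # stands in for an open log file
    tee_command([sys.executable, "-c", "print('hello')"], [sys.stdout, log])
```

Anything with a write() method can be a sink, so real files opened for
writing can be mixed with sys.stdout.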

Or am I missing something? =)

~G



Re: threading and multicores, pros and cons

2007-02-14 Thread garrickp
On Feb 13, 9:07 pm, Maric Michaud [EMAIL PROTECTED] wrote:
 I've heard of a bunch of arguments to defend python's choice of GIL, but I'm
 not quite sure of their technical background, nor what is really important
 and what is not. These discussions often end in a prudent python has made a
 choice among others... which is not really convincing.

Well, INAG (I'm not a Guru), but we recently had training from a Guru.
When we brought up this question, his response was fairly simple.
Paraphrased for inaccuracy:

Some time back, a group did remove the GIL from the python core, and
implemented locks on the core code to make it threadsafe. Well, the
problem was that while it worked, the necessary locks made
single-threaded code take significantly longer to execute.

He then proceeded to show us how to achieve the same effect
(multithreading python for use on multi-core computers) using popen2
and stdio pipes.
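
This is not the trainer's code, just a minimal Python 3 sketch of the
general idea: several worker interpreters, each fed over stdin and
answering over stdout, so every worker has its own GIL. The subprocess
module stands in for popen2, and WORKER and parallel_sum_of_squares
are names invented for the example.

```python
import subprocess
import sys

# Worker script: read integers from stdin, print the sum of their squares.
WORKER = "import sys; print(sum(int(line) ** 2 for line in sys.stdin))"


def parallel_sum_of_squares(numbers, workers=4):
    """Split numbers across worker interpreters and add up their answers."""
    chunks = [numbers[i::workers] for i in range(workers)]
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         text=True)
        for _ in chunks
    ]
    # Hand every worker its input first so they all compute concurrently.
    for proc, chunk in zip(procs, chunks):
        proc.stdin.write("".join("%d\n" % n for n in chunk))
        proc.stdin.close()
    total = 0
    for proc in procs:
        total += int(proc.stdout.read())   # blocks until that worker is done
        proc.stdout.close()
        proc.wait()
    return total
```

For a job this small the process start-up cost dominates, so the
sketch only shows the plumbing.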

FWIW, ~G



Re: multi processes

2007-02-14 Thread garrickp
On Feb 14, 7:53 am, amadain [EMAIL PROTECTED] wrote:
 Hi
 Heres a poser. I want to start a program 4 times at exactly the same
 time (emulating 4 separate users starting up the same program). I am
 using pexpect to run the program from 4 separate locations across the
 network. How do I start the programs running at exactly the same time?
 I want to time how long it takes each program to complete and to show
 if any of the program initiations failed. I also want to check for
 race conditions. The program that I am running is immaterial for this
 question - it could be mysql running queries on the same database for
 example. Using threading, you call start() to start each thread but if
 I call start on each instance in turn I am not starting
 simultaneously.
 A

Standard answers about starting anything at *exactly* the same time
aside, I would expect that the easiest answer would be to have a fifth
controlling program in communication with all four, which can then
send a start message over sockets to each of the agents at the same
time.
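
As a single-machine stand-in for that controller, here is a Python 3
sketch that swaps the socket messages for a threading.Barrier inside
one process: every agent blocks at the barrier and all are released
together. The names timed_start and job are hypothetical, and a real
cross-network setup would still need the socket layer.

```python
import threading
import time


def timed_start(n_agents, job):
    """Release n_agents threads together; return (start, elapsed) per agent."""
    barrier = threading.Barrier(n_agents)
    results = [None] * n_agents   # an entry left as None means that agent failed

    def agent(index):
        barrier.wait()                      # block until every agent is ready
        started = time.perf_counter()
        job(index)
        results[index] = (started, time.perf_counter() - started)

    threads = [threading.Thread(target=agent, args=(i,))
               for i in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The start times will still differ slightly, which is the usual caveat
about "exactly the same time".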

There are several programs out there which can already do this. One
example, Grinder, is designed for this very use (creating concurrent
users for a test). It's free, uses Jython as its scripting language,
and is even capable of keeping track of your times for you. IMO, it's
worth checking out.

http://grinder.sourceforge.net



Re: division by 7 efficiently ???

2007-02-06 Thread garrickp
On Feb 1, 8:25 pm, Krypto [EMAIL PROTECTED] wrote:
 The correct answer as told to me by a person is
 (N>>3) + ((N-7*(N>>3))>>3)
 The above term always gives division by 7

Does anybody else notice that this breaks the spirit of the problem
(regardless of its accuracy)? 'N-7' uses the subtraction operator,
and is thus an invalid solution for the original question.

Build a recursive function, which uses two arbitrary numbers, say 1
and 100. Check each, times 7, and make sure that your target number,
N, is between them. Increase or decrease your arbitrary numbers as
appropriate. Now pick a random number between those two numbers, and
check it. Figure out which two the answer is between, and then check a
random number in that subset. Continue this, and you will drill down
to the correct answer, by using only *, +, >, and <.
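
A sketch of that search in Python 3, under a few assumptions: N is
non-negative, the answer wanted is the whole-number quotient, and the
bracket starts at 0 rather than 1 so that N below 7 works. The name
div7 is made up, and random.randrange does its own arithmetic
internally, so only the code written here sticks to *, +, < and >.

```python
import random


def div7(n, lo=0, hi=100):
    """Return the whole-number quotient n / 7 for n >= 0."""
    if not hi * 7 > n:
        # Upper guess too small: move the bracket up and double its top.
        return div7(n, hi, hi + hi)
    if not lo + 1 < hi:
        # lo * 7 <= n < (lo + 1) * 7, so lo is the quotient.
        return lo
    guess = random.randrange(lo + 1, hi)   # strictly between lo and hi
    if guess * 7 > n:
        return div7(n, lo, guess)          # answer is below the guess
    return div7(n, guess, hi)              # answer is at or above the guess
```

Each call keeps lo * 7 <= n < hi * 7, so the bracket can only shrink
until one candidate is left.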

I'll bet money that, since this was a programming interview, it
wasn't a check of your knowledge of obscure formulas, but rather a
check of your lateral thinking and knowledge of programming.

~G



Re: division by 7 efficiently ???

2007-02-06 Thread garrickp
On Feb 6, 4:54 pm, John Machin [EMAIL PROTECTED] wrote:
 Recursive? Bzzzt!

I would be happy to hear your alternative, which doesn't depend on
language specific tricks. Thus far, all you have suggested is using an
alternative form of the division function, which I would consider to
be outside the spirit of the question (though I have been wrong many
times before).

 Might it not be better to halve the interval at each iteration instead
 of calling a random number function? mid = (lo + hi) >> 1 looks
 permitted and cheap to me. Also you don't run the risk of it taking a
 very high number of iterations to get a result.

I had considered this, but to halve, you need to divide by 2. Using
random, while potentially increasing the number of iterations, removes
the dependency on language tricks and division.

 Did you notice the important word *efficiently* in line 1 of the spec?
 Even after ripping out recursion and random numbers, your proposed
 solution is still way off the pace.

Again, I look forward to reading your solution.

Respectfully, G.
