Re: istep() addition to itertool? (Was: Re: Printing n elements per line in a list)

2006-08-20 Thread Justin Azoff
Rhamphoryncus wrote:
> I've run into this problem a few times, and although many solutions
> have been presented specifically for printing I would like to present a
> more general alternative.

[snip interesting istep function]

> Would anybody else find this useful?  Maybe worth adding it to itertools?

yeah, but why on earth did you make it so complicated?

def istep(iterable, step):
    a=[]
    for x in iterable:
        if len(a) >= step:
            yield a
            a=[]
        a.append(x)
    if a:
        yield a
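
For reference, the same chunking idea can be sketched in present-day Python 3 with itertools.islice. This is only an illustration, not the istep that was snipped above; the sample range and the printing loop are assumptions added to tie it back to the original "n elements per line" question.

```python
from itertools import islice


def istep(iterable, step):
    """Yield successive lists of at most `step` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, step))
        if not chunk:
            return
        yield chunk


# Print three elements per line (sample data is made up for the example).
for row in istep(range(1, 8), 3):
    print(" ".join(str(n) for n in row))
```

Because it only pulls items through islice, this variant also works on iterators that have no length.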

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: trouble using \ as a string

2006-08-20 Thread Justin Azoff
OriginalBrownster wrote:
> i want this because using python I am pulling in filenames from a
> mac..thus they are / in the pathways..and i want to .split it at the
> / to obtain the filename at the end...but its proving difficult with
> this obstacle in the way.

sounds like you want
import posixpath
posixpath.basename(path)

assuming you are on a Windows box; otherwise the normal os.path.basename
will do it.
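
A small sketch of the difference, in Python 3 syntax. The paths are invented for illustration; ntpath is the standard-library module behind os.path on Windows, so importing it directly lets the comparison run on any platform.

```python
import ntpath
import posixpath

# Hypothetical path as it might arrive from a Mac or other POSIX system.
mac_path = "Users/someone/Pictures/photo.jpg"
print(posixpath.basename(mac_path))

# The two flavours only disagree when a backslash is part of a name:
# posixpath treats it as an ordinary character, ntpath as a separator.
odd_path = "dir/odd\\name.txt"
print(posixpath.basename(odd_path))
print(ntpath.basename(odd_path))
```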

-- 
- Justin



Re: variable creation

2006-08-08 Thread Justin Azoff
Alistair King wrote:
> Hei all,
>
> im trying to create a list of variables for further use:
[snip]
> this works to a certain extent but gets stuck on some loop. Im a
> beginner and am not sure where im going wrong.

You are trying to do too much in one function.  Split those loops up
into a few little ones and the program will work... or if it doesn't,
you'll know exactly where the problem is.

def get_element(pt):
    """Return None or a single element from the periodic table"""
    while 1:
        el = raw_input("Which element would you like to include? ")
        if not el: #blank answer
            return
        if el in pt:
            return pt[el]
        print "This element is not in the periodic table, please try again"

def get_elements(pt):
    elements = []
    while 1:
        el = get_element(pt)
        if not el:
            break
        elements.append(el)

    return elements

See how using two separate functions makes it easy to test?
In [10]:print get_element(pt)
Which element would you like to include? X
This element is not in the periodic table, please try again
Which element would you like to include? H
1.00794001

In [11]:print get_elements(pt)
Which element would you like to include? Z
This element is not in the periodic table, please try again
Which element would you like to include? Li
Which element would you like to include? B
Which element would you like to include? H
Which element would you like to include?
[6.9408, 10.811, 1.00794001]


Now, since the information for a single element consists of more than
just a single number, you'll probably want to make a class for them.
Once you have an object for every element, you can add them to a class
for the periodic table.
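
One possible shape for those two classes, sketched in Python 3. The class names, field names and the rounded sample weights are assumptions made for the example, not something from the original program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One entry of the table (hypothetical fields)."""
    symbol: str
    name: str
    atomic_weight: float


class PeriodicTable:
    """Looks elements up by symbol, like the `pt` mapping used above."""

    def __init__(self, elements):
        self._by_symbol = {el.symbol: el for el in elements}

    def __contains__(self, symbol):
        return symbol in self._by_symbol

    def __getitem__(self, symbol):
        return self._by_symbol[symbol]


# Illustrative sample data only.
pt = PeriodicTable([
    Element("H", "Hydrogen", 1.00794),
    Element("Li", "Lithium", 6.941),
    Element("B", "Boron", 10.811),
])

print("Li" in pt, pt["Li"].name, pt["Li"].atomic_weight)
```

Since the table supports `in` and `[]`, get_element() could keep working unchanged and simply return Element objects instead of bare numbers.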

-- 
- Justin



Re: newb question: file searching

2006-08-08 Thread Justin Azoff
[EMAIL PROTECTED] wrote:
> I've narrowed down the problem.  All the problems start when I try to
> eliminate the hidden files and directories.  Is there a better way to
> do this?


Well you almost have it, but your problem is that you are trying to do
too many things in one function.  (I bet I am starting to sound like a
broken record :-))  The four distinct things you are doing are:

* getting a list of all files in a tree
* combining a file's directory with its name to give the full path
* ignoring hidden directories
* matching files based on their extension

If you split up each of those things into their own function you will
end up with smaller, easier-to-test pieces, and separate, reusable
functions.

The core function would be basically what you already have:

import os

def get_files(directory, include_hidden=False):
    """Return an expanded list of files for a directory tree
       optionally not ignoring hidden directories"""
    for path, dirs, files in os.walk(directory):
        for fn in files:
            full = os.path.join(path, fn)
            yield full

        if not include_hidden:
            remove_hidden(dirs)

and remove_hidden is a short, but tricky function since the directory
list needs to be edited in place:

def remove_hidden(dirlist):
    """For a list containing directory names, remove
       any that start with a dot"""

    dirlist[:] = [d for d in dirlist if not d.startswith('.')]
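
To see why the slice assignment matters, here is a small sketch in Python 3 syntax. The second function, remove_hidden_rebinding, is a deliberately wrong variant invented for contrast: it only rebinds a local name, so the caller's list (the one os.walk keeps looking at) is left untouched.

```python
def remove_hidden(dirlist):
    # Slice assignment mutates the very list object the caller passed in.
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]


def remove_hidden_rebinding(dirlist):
    # Wrong on purpose: this only points the local name at a new list.
    dirlist = [d for d in dirlist if not d.startswith('.')]


dirs = ['.git', 'src', '.cache', 'docs']

remove_hidden_rebinding(dirs)
print(dirs)   # still contains the hidden entries

remove_hidden(dirs)
print(dirs)   # hidden entries are gone
```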

at this point, you can play with get_files on its own, and test
whether or not the include_hidden parameter works as expected.

For the final step, I'd use an approach that pulls out the extension
itself, and checks to see if it is in a list (or better, a set) of
allowed extensions.  Globbing (*.foo) works as well, but if you are only
ever matching on the extension, I believe this will work better.

def get_files_by_ext(directory, ext_list, include_hidden=False):
    """Return an expanded list of files for a directory tree
       where the file ends with one of the extensions in ext_list"""
    ext_list = set(ext_list)

    for fn in get_files(directory, include_hidden):
        _, ext = os.path.splitext(fn)
        ext=ext[1:] #remove dot
        if ext.lower() in ext_list:
            yield fn

notice at this point we still haven't said anything about images!  The
task of finding files by extension is pretty generic, so it shouldn't
be concerned about the actual extensions.

once that works, you can simply do

def get_images(directory, include_hidden=False):
    image_exts = ('jpg','jpeg','gif','png','bmp')
    return get_files_by_ext(directory, image_exts, include_hidden)
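
As a rough illustration of how small pieces like these can be exercised, the sketch below (Python 3 syntax) rebuilds the functions in condensed form and runs them against a throwaway directory tree. The make_tree helper and the file names are assumptions invented for the test, not part of the original code.

```python
import os
import tempfile


def remove_hidden(dirlist):
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]


def get_files(directory, include_hidden=False):
    for path, dirs, files in os.walk(directory):
        for fn in files:
            yield os.path.join(path, fn)
        if not include_hidden:
            remove_hidden(dirs)


def get_files_by_ext(directory, ext_list, include_hidden=False):
    ext_list = set(ext_list)
    for fn in get_files(directory, include_hidden):
        _, ext = os.path.splitext(fn)
        if ext[1:].lower() in ext_list:
            yield fn


def make_tree(root, relative_paths):
    """Hypothetical test helper: create empty files under root."""
    for rel in relative_paths:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()


exts = ["jpg", "png", "gif"]
with tempfile.TemporaryDirectory() as root:
    make_tree(root, ["a.JPG", "notes.txt",
                     os.path.join("sub", "b.png"),
                     os.path.join(".thumbs", "c.gif")])
    visible = sorted(os.path.relpath(p, root)
                     for p in get_files_by_ext(root, exts))
    everything = sorted(os.path.relpath(p, root)
                        for p in get_files_by_ext(root, exts, include_hidden=True))

print(visible)
print(everything)
```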

Hope this helps :-)

--
- Justin



Re: newb question: file searching

2006-08-08 Thread Justin Azoff
[EMAIL PROTECTED] wrote:
> I do appreciate the advice, but I've got a 12 line function that does
> all of that.  And it works!  I just wish I understood a particular line
> of it.

You miss the point.  The functions I posted, up until get_files_by_ext
which is the equivalent of your getFileList, total 17 actual lines.
The 5 extra lines give 3 extra features.  Maybe in a while when you
need to do a similar file search you will realize why my way is better.

[snip]
> The line I don't understand is:
> reversed(range(len(dirnames)))

This is why I wrote and documented a separate remove_hidden function:
it can be tricky.  If you broke it up into multiple lines, and added
print statements, it would be clear what it does.

l  = len(dirnames) # l is the number of elements in dirnames, e.g. 6
r  = range(l) # r contains the numbers 0,1,2,3,4,5
rv = reversed(r) # rv contains the numbers 5,4,3,2,1,0

The problem arises from how to remove elements in a list as you are
going through it. If you delete element 0, element 1 then becomes
element 0, and funny things happen.  That particular solution is
relatively simple, it just deletes elements from the end instead.  That
complicated expression arises because python doesn't have C-style
indexed for loops.  The version of remove_hidden I wrote is simpler, but
relies on the even more obscure lst[:] construct for re-assigning a list.
Both of them accomplish the same thing though, so if you wanted, you
should be able to replace those 3 lines with just

dirnames[:] = [d for d in dirnames if not d.startswith('.')]
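
A short sketch of the "funny things" (Python 3 syntax, with made-up directory names): deleting while walking forwards skips the element that slides into the freed slot, while walking the indexes backwards or rebuilding the list in place both give the expected result.

```python
names = ['.a', '.b', 'c', '.d', 'e']

# Deleting while iterating forwards: '.b' slides into slot 0 and is skipped.
forwards = list(names)
for i, d in enumerate(forwards):
    if d.startswith('.'):
        del forwards[i]

# Deleting from the end: the indexes still to be visited never shift.
backwards = list(names)
for i in reversed(range(len(backwards))):
    if backwards[i].startswith('.'):
        del backwards[i]

# Rebuilding in place with slice assignment.
rebuilt = list(names)
rebuilt[:] = [d for d in rebuilt if not d.startswith('.')]

print(forwards)
print(backwards)
print(rebuilt)
```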


-- 
- Justin



Re: technique to enter text using a mobile phone keypad (T9 dictionary-based disambiguation)

2006-08-08 Thread Justin Azoff
Petr Jakeš wrote:
> I have a standard 12-key mobile phone keypad connected to my Linux
> machine as an I2C peripheral. I would like to write code which allows
> text entry to the computer using this keypad (something like T9 on
> the mobile phones)
>
> According to http://www.yorku.ca/mack/uist01.html
> dictionary-based disambiguation comes to mind.
>
> With dictionary-based disambiguation, each key is pressed only once.
> For example, to enter "the", the user enters 8-4-3-0. The 0 key, for
> SPACE, delimits words and terminates disambiguation of the preceding
> keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings
> (see Figure 1). The system compares the possibilities to a dictionary
> of words to guess the intended word.
>
> I would like to ask some guru here to give me the direction which
> technique (Python functionality) or which strategy to use to solve
> this riddle.
>
> Thanks for your advice and comments
>
> Regards
>
> Petr Jakes

I can think of 2 approaches to this: 1) Map the numbers to parts of a
regular expression, and then use this to search through the
dictionary. 2) Pre-compute a copy of the dictionary converted to its
numerical equivalent, then just match the numbers.

The basic structure you need for both of these is simple.  For the
first method you use
keys = ['', 'abc', 'def', 'ghi']  # and so on for the remaining keys

then if you have s = '123321'
''.join(['[%s]' % keys[int(l)] for l in s])
will give you a string like
'[abc][def][ghi][ghi][def][abc]', which you can then use to match words...
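
A minimal sketch of that first approach in Python 3 syntax. It uses the standard keypad layout (2=abc ... 9=wxyz) rather than the shortened keys list above, and the tiny word list is an assumption standing in for a real dictionary file.

```python
import re

# Standard 12-key layout; 0 and 1 carry no letters.
KEYS = {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
        '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}


def digits_to_regex(digits):
    """Turn '843' into a compiled pattern like [tuv][ghi][def]."""
    return re.compile(''.join('[%s]' % KEYS[d] for d in digits))


def t9_matches(digits, words):
    """Return the words whose letters fit the key presses exactly."""
    regex = digits_to_regex(digits)
    return [w for w in words if regex.fullmatch(w)]


# Hypothetical mini dictionary.
words = ['the', 'tie', 'vie', 'than', 'hello']
print(t9_matches('843', words))
print(t9_matches('43556', words))
```

This scans the whole word list for every lookup, which is the cost the second approach avoids.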

I think the second solution would end up being faster, as long as you
have the memory - no regex work, plus, you can sort the wordlist.

The following quickly written class seems to work nicely:

import string
import bisect

letters = string.lowercase
# keypad digit for each of a..z (pqrs -> 7, wxyz -> 9)
numbers = '22233344455566677778889999'
letter_mapping = dict(zip(letters, numbers))

class phone:
    def __init__(self):
        self.read_dictionary()

    def word_as_numbers(self, word):
        nums=''
        for letter in word:
            if letter in letter_mapping:
                nums += letter_mapping[letter]
        return nums

    def read_dictionary(self):
        words = []
        for line in file("/usr/share/dict/words"):
            word = line.strip().lower()
            nums = self.word_as_numbers(word)
            words.append((nums, word))

        words.sort()
        self.dict = words

    def get_matching_words(self, number_str):
        tup = (number_str,)
        left = bisect.bisect_left(self.dict,   tup)

        for num, word in self.dict[left:]:
            if num.startswith(number_str):
                yield word
            else:
                break


It takes a second or two to read the list of words in, but matching is
instant thanks to bisect:
In [14]:%time p=phone.phone()
CPU times: user 1.65 s, sys: 0.00 s, total: 1.65 s
Wall time: 1.66

In [15]:%time list(p.get_matching_words('43556'))
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.01
Out[15]:['hello', 'hellman', "hellman's", "hello's", 'hellos']

It seems the ruby version just posted takes a similar approach, but
uses an actual tree.. using the bisect module keeps it simple.

-- 
- Justin



Re: Why do I require an elif statement here?

2006-08-06 Thread Justin Azoff
danielx wrote:
> I'm surprised no one has mentioned neat-er, more pythonic ways of doing
> this. I'm also surprised no one mentioned regular expressions. Regular
> expressions are really powerful for searching and manipulating text.
[snip]

I'm surprised you don't count my post as a neat and pythonic way of
doing this.  I'm also surprised that you mention regular expressions
after "neat" and "pythonic".  While regular expressions often serve a
purpose, they are rarely neat.

> Anyway, here's my solution, which does Not use regular expressions:
>
> def reindent(line):
>     ## we use slicing, because we don't know how long line is
>     head = line[:OLD_INDENT]
>     tail = line[OLD_INDENT:]
>     ## if line starts with Exactly so many spaces...
>     if head == whitespace*OLD_INDENT and not tail.startswith(' '):
>         return whitespace*NEW_INDENT + tail
>     else: return line # our default
[snip]

This function is broken.  Not only does it still rely on global
variables to work, it does not actually reindent lines correctly.  Your
function only changes lines that start with exactly OLD_INDENT spaces,
ignoring any lines that start with a multiple of OLD_INDENT.

-- 
- Justin



Re: Why do I require an elif statement here?

2006-08-04 Thread Justin Azoff
Jim wrote:
> Could somebody tell me why I need the elif char == '\n' in the
> following code?
> This is required in order to pick up lines with just spaces in them.
> Why doesn't
> the else: statement pick this up?

No idea.  Look at the profile of your program: for.. if.. for.. if..
else.. if..  This is NOT good.  The reason why you are having trouble
getting it to work is that you are not writing it in a way that is easy
to debug and test.  If one block of code ends up being indented halfway
across the screen it means you are doing something wrong.

This program should be split up into a handful of small functions that
each do one thing.  The following is slightly longer, but immensely
simpler.  Most importantly, it can be imported from the python shell
and each function can be tested individually.

import sys

def leading_spaces(line):
    """Return the number of leading spaces"""
    num = 0
    for char in line:
        if char != ' ':
            break
        num += 1
    return num

def change_indent(line, old, new):
    """Change the indent of this line using a ratio of old:new"""
    ws = leading_spaces(line)

    #if there was no leading whitespace,
    #or it wasn't a multiple of the old indent, do nothing
    if ws == 0 or ws % old:
        return line

    #otherwise change the indent
    new_spaces = ws/old*new
    new_indent = ' ' * new_spaces
    return new_indent + line.lstrip(' ')


def reindent(ifname, ofname, old, new):
    f = open(ifname)
    o = open(ofname, 'w')

    for line in f:
        line = change_indent(line, old, new)
        o.write(line)

    f.close()
    o.close()

if __name__ == "__main__":
    try :
        ifname, ofname, old, new = sys.argv[1:]
        old = int(old)
        new = int(new)
    except ValueError:
        print "blah"
        sys.exit(1)

    reindent(ifname, ofname, old, new)
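
To show what "each function can be tested individually" looks like in practice, here is a condensed sketch of the two helpers in Python 3 syntax (note the // for integer division, which Python 3 requires here) together with a few sample calls. The sample lines are made up for illustration.

```python
def leading_spaces(line):
    """Return the number of leading spaces."""
    return len(line) - len(line.lstrip(' '))


def change_indent(line, old, new):
    """Change the indent of this line using a ratio of old:new."""
    ws = leading_spaces(line)
    # No indent, or not a multiple of the old indent: leave the line alone.
    if ws == 0 or ws % old:
        return line
    return ' ' * (ws // old * new) + line.lstrip(' ')


# Two levels of 4 become two levels of 2.
print(repr(change_indent("        x = 1\n", 4, 2)))
# Three spaces is not a multiple of 4, so nothing changes.
print(repr(change_indent("   odd\n", 4, 2)))
# A line containing only spaces is handled like any other.
print(repr(change_indent("    \n", 4, 2)))
```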



Re: help - iter dict

2006-08-03 Thread Justin Azoff
[EMAIL PROTECTED] wrote:
> Im trying to iterate through values in a dictionary so i can find the
> closest value and then extract the key for that value...what ive done so far:
[snip]
> short time. I was trying to define a function (its my first!) so that i
> could apply to several 'dictionary's and 'exvalue's.
[snip]

If you plan on searching a single dictionary for many values, it may be
much faster to convert the dictionary into a sorted list, and use the
bisect module to find the closest value...

something like:

import bisect

class closer_finder:
    def __init__(self, dataset):
        self.dataset = dataset
        #sorted list of (value, key) pairs
        flat = [(v,k) for k,v in dataset.iteritems()]
        flat.sort()
        self.flat = flat

    def __getitem__(self, target):
        flat = self.flat
        index = bisect.bisect_right(flat, (target, ))

        #simple cases, smaller than the smallest,
        #or larger than the largest
        if index == 0:
            v,k = flat[0]
            return k,v
        elif index == len(flat):
            v,k = flat[-1]
            return k,v

        #otherwise see which of the neighbors is closest.
        leftval, leftkey   = flat[index-1]
        rightval, rightkey = flat[index]

        leftdiff  = abs(leftval - target)
        rightdiff = abs(rightval - target)

        if leftdiff <= rightdiff:
            return leftkey, leftval
        else:
            return rightkey, rightval

In [158]:sample_data
Out[158]:{'a': 1, 'c': 6, 'b': 3}

In [159]:d=closer_finder(sample_data)

In [160]:d.flat
Out[160]:[(1, 'a'), (3, 'b'), (6, 'c')]

In [161]:d[4]
Out[161]:('b', 3)
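
For comparison, when only a handful of lookups are needed, a plain linear scan is a reasonable baseline. The sketch below (Python 3 syntax) is not the bisect technique above but the simple alternative it is meant to beat on repeated searches; closest_item is a name invented for the example.

```python
def closest_item(dataset, target):
    """Return the (key, value) pair whose value is nearest to target."""
    return min(dataset.items(), key=lambda kv: abs(kv[1] - target))


sample_data = {'a': 1, 'c': 6, 'b': 3}
print(closest_item(sample_data, 4))
```

Each call walks the whole dictionary, so the sorted-list version pays off once the same data is searched many times.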

-- 
- Justin



Re: Thread Question

2006-07-28 Thread Justin Azoff
Ritesh Raj Sarraf wrote:
> I'd like to put my understanding over here and would be happy if people can
> correct me at places.

ok :-)

> So here it goes:
> Firstly the code initializes the number of threads. Then it moves on to
> initializing requestQueue() and responseQueue().
> Then it moves on to thread_pool, where it realizes that it has to execute the
> function run().
> From NUMTHREADS in the for loop, it knows how many threads it is supposed to
> execute in parallel.

right...

> So once the thread_pool is populated, it starts the threads.
> Actually, it doesn't start the threads. Instead, it puts the threads into the
> queue.

Neither.. it puts the threads into a list.  It puts the queues into
the thread - by passing them as arguments to the Thread constructor.

> Then the real iteration, about which I was talking in my earlier post, is
> done.
> The iteration happens in one go. And requestQueue.put(item) puts all the items
> from lRawData into the queue of the run().

It doesn't necessarily have to put all the items into the queue at once.
The previous line starts all the threads, which immediately start
running

while 1:
    item = request.get()


the default Queue size is infinite, but the program would still work
fine if the queue was fixed to say, 6 elements.  Now that I think of
it, it may even perform better... If you have an iterator that will
generate a very large number of items, and the function being called by
each thread is slow, the queue may end up growing to hold millions of
items and cause the system to run out of memory.

> But there, the run() already knows its limitation on the number of threads.

run() doesn't know anything about threads.  All it knows is that it can
call request.get() to get an item to work on, and response.put() when
finished.

> No, I think the above statement is wrong. The actual pool about the number of
> threads is stored by thread_pool. Once its pool (at a time 3 as per this
> example) is empty, it again requests for more threads using the requestQueue()

The pool is never empty. The program works like a bank with 3 tellers.
Each teller knows nothing about any of the other tellers, or how many
people are waiting in the line.  All they know is that when they say
"Next!" (request.get()) another person steps in front of them.  The
tellers don't move, the line moves.
tellers don't move, the line moves.

> And in function run(), when the item of lRawData is None, the thread stops.
> Then the cleanup and checks of any remaining threads is done.

Yes, since each thread doesn't know anything about the rest of the
program, when you send it an empty item it knows to quit.  It would be
analogous to the bank teller saying "Next!" and instead of a customer,
the bank manager steps forward to tell them that they can go home for
the day.
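
The bank analogy can be sketched roughly as follows, in Python 3 syntax (the module is called queue there, not Queue). This is a reconstruction under assumptions, not the code being discussed: process_all and the squaring function are invented, the request queue is capped at 6 items as suggested above, and None is used as the "go home" message, so None cannot be a real work item.

```python
import queue
import threading

NUMTHREADS = 3  # three tellers


def run(request, response, func):
    """One teller: keep calling "Next!" until the manager sends None."""
    while True:
        item = request.get()
        if item is None:
            break
        response.put((item, func(item)))


def process_all(items, func):
    request = queue.Queue(maxsize=6)   # the line holds at most 6 customers
    response = queue.Queue()
    tellers = [threading.Thread(target=run, args=(request, response, func))
               for _ in range(NUMTHREADS)]
    for t in tellers:
        t.start()
    for item in items:
        request.put(item)              # blocks while the line is full
    for _ in tellers:
        request.put(None)              # one "go home" message per teller
    for t in tellers:
        t.join()
    results = []
    while not response.empty():
        results.append(response.get())
    return results


print(sorted(process_all(range(10), lambda n: n * n)))
```

With the bounded queue, the producer loop simply waits whenever six items are already queued, which is what keeps memory use flat for very long inputs.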

> Is this all correct ?

Mostly :)  When you understand it fully, you should look at the example
I showed you before.  It is essentially the same thing, just wrapped in
a class to be reusable.

-- 
- Justin



Re: Thread Question

2006-07-27 Thread Justin Azoff
Ritesh Raj Sarraf wrote:
[snip]
> for item in list_items:
>     download_from_web(item)
>
> This way, one item is downloaded at a time.
>
> I'm planning to implement threads in my application so that multiple
> items can be downloaded concurrently. I want the thread option to be
> user-defined.
[snip]

See my post about the iterthreader module I wrote...
http://groups.google.com/group/comp.lang.python/browse_frm/thread/2ef29fae28cf44c1/

for url, result in Threader(download_from_web, list_items):
    print url, result
    #...

-- 
- Justin



Re: splitting words with brackets

2006-07-26 Thread Justin Azoff
faulkner wrote:
> er,
> ...|\[[^\]]*\]|...
> ^_^

That's why it is nice to use re.VERBOSE:

import re

def splitup(s):
    return re.findall(r'''
        \( [^\)]* \)  |
        \[ [^\]]* \]  |
        \S+
        ''', s, re.VERBOSE)

Much less error prone this way.
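
A quick usage sketch in Python 3 syntax, with an input string made up for the example. In re.VERBOSE mode whitespace outside character classes is ignored, so each alternative can sit on its own line with a comment.

```python
import re


def splitup(s):
    return re.findall(r'''
        \( [^)]* \)   |   # a parenthesised group, spaces and all
        \[ [^\]]* \]  |   # a bracketed group, spaces and all
        \S+               # otherwise any run of non-whitespace
        ''', s, re.VERBOSE)


print(splitup('a (b c) d [e f] g'))
```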

-- 
- Justin



Re: splitting words with brackets

2006-07-26 Thread Justin Azoff
Paul McGuire wrote:
> Comparative timing of pyparsing vs. re comes in at about 2ms for pyparsing,
> vs. 0.13 for re's, so about 15x faster for re's.  If psyco is used (and we
> skip the first call, which incurs all the compiling overhead), the speed
> difference drops to about 7-10x.  I did try compiling the re, but this
> didn't appear to make any difference - probably user error.

That is because of how the methods in the sre module are implemented...
Compiling a regex really just saves you a dictionary lookup.

def findall(pattern, string, flags=0):
    # snip
    return _compile(pattern, flags).findall(string)

def compile(pattern, flags=0):
    # snip
    return _compile(pattern, flags)

def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p
    #snip

-- 
- Justin



Re: Nested function scope problem

2006-07-22 Thread Justin Azoff
Josiah Manson wrote:
> I just did some timings, and found that using a list instead of a
> string for tok is significantly slower (it takes 1.5x longer). Using a
> regex is slightly faster for long strings, and slightly slower for
> short ones. So, regex wins in both brevity and speed!

I think the list.append method of building strings may only give you
speed improvements when you are adding bigger chunks of strings
together instead of 1 character at a time. also:

http://docs.python.org/whatsnew/node12.html#SECTION000121

String concatenations in statements of the form s = s + "abc" and s
+= "abc" are now performed more efficiently in certain circumstances.
This optimization won't be present in other Python implementations such
as Jython, so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to efficiently glue a large
number of strings together. (Contributed by Armin Rigo.)

I tested both, and these are my results for fairly large strings:

[EMAIL PROTECTED]:/tmp$ python /usr/lib/python2.4/timeit.py -s'import foo' 'foo.test(foo.breakLine)'
10 loops, best of 3: 914 msec per loop

[EMAIL PROTECTED]:/tmp$ python /usr/lib/python2.4/timeit.py -s'import foo' 'foo.test(foo.breakLineRE)'
10 loops, best of 3: 289 msec per loop
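
For anyone wanting to repeat the one-character-at-a-time comparison themselves, a small harness along these lines should do (Python 3 syntax). The two builder functions and the input size are assumptions made for the sketch; no timings are quoted because they depend entirely on the interpreter and machine.

```python
import timeit


def build_with_concat(chars):
    """Grow a string one character at a time with +=."""
    s = ''
    for c in chars:
        s += c
    return s


def build_with_join(chars):
    """Collect characters in a list and join once at the end."""
    parts = []
    for c in chars:
        parts.append(c)
    return ''.join(parts)


data = 'x' * 20000
for fn in (build_with_concat, build_with_join):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(fn.__name__, seconds)
```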

-- 
- Justin



Re: Nested function scope problem

2006-07-22 Thread Justin Azoff
Bruno Desthuilliers wrote:
> Justin Azoff a écrit :
> > if len(tok) > 0:
> > should be written as
> > if(tok):
> >
>
> actually, the parenthesis are useless.

yes, that's what happens when you edit something instead of typing it
over from scratch :-)

-- 
- Justin



Re: Python newbie needs constructive suggestions

2006-07-21 Thread Justin Azoff
[EMAIL PROTECTED] wrote:
> What is the idiomatically appropriate Python way to pass, as a function-type
> parameter, code that is most clearly written with a local variable?
>
> For example, map takes a function-type parameter:
>
>    map(lambda x: x+1, [5, 17, 49.5])
>
> What if, instead of just having x+1, I want an expression that is most
> clearly coded with a variable that is needed _only_ inside the lambda, e.g.
> if I wanted to use the name "one" instead of 1:
>
>    map(lambda x: (one = 1  x+one), [5, 17, 49.5])

I believe most people would just write something like this:

def something():
    #local helper function to add one to a number
    def addone(x):
        one = 1
        return x+one
    return map(addone, [5, 17, 49.5])

-- 
- Justin



Re: Nested function scope problem

2006-07-21 Thread Justin Azoff
Simon Forman wrote:
> That third option seems to work fine.

Well it does, but there are still many things wrong with it:

    if len(tok) > 0:
should be written as
    if(tok):

    tok = ''
    tok = tok + c
should be written as
    tok = []
    tok.append(c)
and later
    ''.join(tok)

anyway, the entire thing should be replaced with something like this:

import re
def breakLine(s):
    splitters = '?()|:~,'
    chars = '^ \t\n\r\f\v%s' % splitters
    regex = '''(
        (?:[%s])
        |
        (?:[%s]+))''' % (splitters, chars)
    return re.findall(regex, s,re.VERBOSE)

That should be able to be simplified even more if one were to use the
character lists built into the regex standard.
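
One way that simplification might look, sketched in Python 3 syntax: \s stands in for the hand-written list of whitespace characters, and re.escape guards the splitter characters. The function name break_line and the sample input are assumptions for the example.

```python
import re

SPLITTERS = '?()|:~,'
_escaped = re.escape(SPLITTERS)

# Either a single splitter character, or a run of anything that is
# neither whitespace nor a splitter.
TOKEN_RE = re.compile(r'[%s]|[^\s%s]+' % (_escaped, _escaped))


def break_line(s):
    return TOKEN_RE.findall(s)


print(break_line('a?(b|c):d, e'))
```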

-- 
- Justin



Re: Dictionary question

2006-07-18 Thread Justin Azoff
Brian Elmegaard wrote:
> for a, e in l[-2].iteritems():
>     # Can this be written better?
>     if a+c in l[-1]:
>         if l[-1][a+c]>x+e:
>             l[-1][a+c]=x+e
>     else:
>         l[-1][a+c]=x+e
>     #

I'd start with something like

for a, e in l[-2].iteritems():
    keytotal  = a+c
    valtotal  = x+e
    last  = l[-1]
    if keytotal in last:
        if last[keytotal] > valtotal:
            last[keytotal] = valtotal
    else:
        last[keytotal] = valtotal

Could probably simplify that even more by using min(), but I don't know
what kind of data you are expecting.
    last[keytotal] = min(last.get(keytotal), valtotal)
comes close to working - it would if you were doing max.  (For a missing
key, last.get() returns None, which compares lower than any number, so
min() would pick the None.)
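
A sketch of how the min() idea can be made to work, in Python 3 syntax: giving get() the new value as its default means a missing key simply compares the value with itself. The helper name keep_minimum and the sample pairs are invented for the example.

```python
def keep_minimum(last, key, value):
    """Store value under key unless a smaller value is already there."""
    last[key] = min(last.get(key, value), value)


best = {}
for key, value in [('a', 5), ('b', 7), ('a', 3), ('b', 9)]:
    keep_minimum(best, key, value)

print(best)
```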


-- 
- Justin



Re: Howto Determine mimetype without the file name extension?

2006-07-18 Thread Justin Azoff
Phoe6 wrote:
> Hi all,
>  I had a filesystem crash and when I retrieved the data back
> the files had random names without extension. I decided to write a
> script to determine the file extension and create a newfile with
> extension.
[...]
> but the problem with using file was it recognized both .xls (MS Excel)
> and .doc ( MS Doc) as Microsoft Word Document only. I need to separate
> the .xls and .doc files, I dont know if file will be helpful here.

You may want to try the gnome.vfs module:

info = gnome.vfs.get_file_info(filename,
    gnome.vfs.FILE_INFO_GET_MIME_TYPE)
info.mime_type #mime type

If all of your documents are .xls and .doc, you could also use one of
the cli tools that converts .doc to txt, like catdoc.  These tools will
fail on an .xls document, so you can run one and check its output: .doc
files would output a lot, .xls files would output an error or nothing.
The gnome.vfs module is probably your best bet though :-)

Additionally, I would re-organize your program a bit. something like:

import os
import re
import subprocess

types = (
    ('rtf', 'Rich Text Format data'),
    ('doc', 'Microsoft Office Document'),
    ('pdf', 'PDF'),
    ('txt', 'ASCII English text'),
)

def get_magic(filename):
    pipe=subprocess.Popen(['file',filename],stdout=subprocess.PIPE)
    output = pipe.stdout.read()
    pipe.wait()
    return output

def detext(filename):
    fileoutput = get_magic(filename)
    for ext, pattern in types:
        if pattern in fileoutput:
            return ext


def allfiles(path):
    #walk the path that was passed in, not the current directory
    for root,dirs,files in os.walk(path):
        for each in files:
            fname = os.path.join(root,each)
            yield fname

def fixnames(path):
    for fname in allfiles(path):
        extension = detext(fname)
        print fname, extension #

def main():
    path = os.getcwd()
    fixnames(path)

if __name__ == '__main__':
    main()

Short functions that just do one thing are always best.

To change that to use gnome.vfs, just change the types list to be a
dictionary like
types = {
    'application/msword': 'doc',
    'application/vnd.ms-powerpoint': 'ppt',
}

and then

def get_mime(filename):
    info = gnome.vfs.get_file_info(filename,
        gnome.vfs.FILE_INFO_GET_MIME_TYPE)
    return info.mime_type

def detext(filename):
    mime_type = get_mime(filename)
    return types.get(mime_type)

-- 
- Justin



Re: compiling 2.3.5 on ubuntu

2006-07-17 Thread Justin Azoff
Steve Holden wrote:
> I'm guessing because (s)he wants to test programs on less recent
> versions of Python. Ubuntu 5.10 was already up to Python 2.4.2, so I
> can't imagine there's anything older on Ubuntu 6.06.
>
> regards
>   Steve

Both are available...

-- 
- Justin



RFC: my iterthreader module

2006-07-17 Thread Justin Azoff
I have this iterthreader module that I've been working on for a while
now.  It is similar to itertools.imap, but it calls each function in
its own thread and uses Queues for moving the data around.  A better
name for it would probably be ithreadmap, but anyway...

The short explanation of it is if you have a loop like
for item in biglist:
    print "The value for %s is %s" % (item, slowfunc(item))
or
for item,val in ((item, slowfunc(item)) for item in biglist):
    print "The value for %s is %s" % (item, val)

you can simply rewrite it as

for item,val in iterthreader.Threader(slowfunc, biglist):
    print "The value for %s is %s" % (item, val)

and it will hopefully run faster.  The usual GIL issues still apply of
course.  You can also subclass it in various ways, but I almost
always just call it in the above manner.

So, can anyone find any obvious problems with it?  I've been meaning to
re-post [1]  it to the python cookbook, but I'd like to hear what
others think first.  I'm not aware of any other module that makes this
particular use of threading this simple.

[1] I _think_ I posted it before, but that may have just been in a
comment

import threading
import Queue

class Threader:
    def __init__(self, func=None, data=None, numthreads=2):
        if not numthreads > 0:
            raise AssertionError("numthreads should be greater than 0")

        if func:
            self.handle_input=func
        if data:
            self.get_input = lambda : data

        self._numthreads=numthreads
        self.threads = []
        self.run()


    def __iter__(self):
        return self

    def next(self):
        still_running, input, output = self.DQ.get()
        if not still_running:
            raise StopIteration
        return input, output

    def get_input(self):
        raise NotImplementedError, "You must implement get_input as a function that returns an iterable"

    def handle_input(self, input):
        raise NotImplementedError, "You must implement handle_input as a function that returns anything"

    def _handle_input(self):
        while 1:
            work_todo, input = self.Q.get()
            if not work_todo:
                break
            self.DQ.put((True, input, self.handle_input(input)))

    def cleanup(self):
        """wait for all threads to stop and tell the main iter to stop"""
        for t in self.threads:
            t.join()
        self.DQ.put((False,None,None))


    def run(self):
        self.Q=Queue.Queue()
        self.DQ=Queue.Queue()
        for x in range(self._numthreads):
            t=threading.Thread(target=self._handle_input)
            t.start()
            self.threads.append(t)

        try :
            for x in self.get_input():
                self.Q.put((True, x))
        except NotImplementedError, e:
            print e
        for x in range(self._numthreads):
            self.Q.put((False, None))

        threading.Thread(target=self.cleanup).start()
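
For readers on present-day Python 3, roughly the same calling pattern can be had from the standard library's concurrent.futures. This is a different technique swapped in for illustration, not a port of the class above; threaded_map and the sample words are names invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def threaded_map(func, items, numthreads=2):
    """Yield (item, func(item)) pairs as each call finishes.

    Results arrive in completion order, not input order.
    """
    with ThreadPoolExecutor(max_workers=numthreads) as pool:
        futures = {pool.submit(func, item): item for item in items}
        for future in as_completed(futures):
            yield futures[future], future.result()


for item, val in threaded_map(len, ["spam", "eggs", "bacon"]):
    print("The value for %s is %s" % (item, val))
```

Unlike the Queue-based class, this submits every item up front, so it suits finite lists better than unbounded iterators.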


-- 
- Justin



Re: compiling 2.3.5 on ubuntu

2006-07-16 Thread Justin Azoff
Py PY wrote:
> (Apologies if this appears twice. I posted it yesterday and it was held
> due to a 'suspicious header')
>
> I'm having a hard time trying to get a couple of tests to pass when
> compiling Python 2.3.5 on Ubuntu Server Edition 6.06 LTS. I'm sure it's
> not too far removed from the desktop edition but, clearly, I need to
> tweak something or install some missing libs.

Why are you compiling a package that is already built for you?

-- 
- Justin



Re: Deferred imports

2006-07-14 Thread Justin Azoff
Tom Plunket wrote:
> I'm using this package that I can't import on startup, instead needing
> to wait until some initialization takes place so I can set other
> things up so that I can subsequently import the package and have the
> startup needs of that package met.
[...]
> So as y'all might guess, I have the solution in this sort of thing:
>
> import os
> global myDeferredModule
>
> class MyClass:
>     def __init__(self):
>         os.environ['MY_DEFERRED_MODULE_PARAM'] = str(self)
>         global myDeferredModule
>         import myDeferredModule
>
> m = MyClass()
>
> HOWEVER, my problem now comes from the fact that other modules need to
> use this import.  So, I need to either use the above trick, or I
> need to just import the module in every function that needs it (which
> will be almost exclusively the constructors for the objects as they're
> the ones creating Pygame objects).
>
> So my question is, is 'import' heavy enough to want to avoid doing it
> every time an object is created, or is it fairly light if it's
> currently active somewhere else in the application?  I suppose that
> may depend on how many objects I'm creating, and how frequently I'm
> creating them, but if 'import' resolves to essentially
>
> if not global_imports.has_key(module):
>     do_stuff_to_import_module

Importing a module repeatedly certainly won't help the application run
any _faster_, but you may not notice the difference.  Especially if
this particular block of code does not run in a tight loop.  It does
work like you described, the specific dictionary is sys.modules:

>>> print file('a.py').read()
print "THIS IS A!"
>>> 'a' in sys.modules
False
>>> import a
THIS IS A!
>>> 'a' in sys.modules
True
>>> import a #will not execute a.py again
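
The same experiment as a self-contained script, sketched in Python 3 syntax. It writes a throwaway module into a temporary directory; the module name demo_mod_a is an assumption chosen to avoid clashing with anything real.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A module whose only job is to announce that its body ran.
    with open(os.path.join(tmp, "demo_mod_a.py"), "w") as f:
        f.write('print("THIS IS A!")\n')

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        print("demo_mod_a" in sys.modules)              # not imported yet
        first = importlib.import_module("demo_mod_a")   # runs the module body
        print("demo_mod_a" in sys.modules)              # now cached
        second = importlib.import_module("demo_mod_a")  # body does not run again
        print(first is second)
    finally:
        sys.path.remove(tmp)
```

The second import is just a dictionary lookup in sys.modules, which is why repeating it inside a constructor costs very little.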


> ...then I'm not so worried about putting it into every constructor.
> Otherwise I'll do this trick, starting myDeferredModule = None and
> only do the import if not None.

The extra check for not None probably wouldn't be much faster than the
check import does in sys.modules.  Just calling import in the
constructor will also be easier for someone else to understand :-)

 Thanks!

 -tom!

PyPy has a neat lazy-loading importer; you could look at how they
implement it.  I think it is just something like

class defer:
    def __getattr__(self, attr):
        return __import__(attr)
defer = defer()

then defer.foo will import foo for you.
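
A slightly fuller sketch of the same idea for current Python 3, using importlib.import_module. This is my own guess at a reasonable shape, not PyPy's actual implementation; the class name Defer and the caching step are assumptions.

```python
import importlib


class Defer:
    """Import a top-level module the first time it is asked for by name."""

    def __getattr__(self, name):
        # Leave special-method lookups alone so hasattr(), copy and pickle
        # see an ordinary AttributeError instead of an import failure.
        if name.startswith("__"):
            raise AttributeError(name)
        module = importlib.import_module(name)
        # __getattr__ only runs when normal lookup fails, so storing the
        # module on the instance means later accesses skip this method.
        setattr(self, name, module)
        return module


defer = Defer()
text = defer.json.dumps({"a": 1})  # json is imported here, on first use
```

After the first access the module lives in the instance dictionary, so the import machinery is not consulted again for that name.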

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: String handling and the percent operator

2006-07-13 Thread Justin Azoff
Tom Plunket wrote:
   boilerplate = \
 """
 [big string]
 """

   return boilerplate % ((module,) * 3)

 My question is, I don't like hardcoding the number of times that the
 module name should be repeated in the two return functions.  Is there
 a straightforward (inline-appropriate) way to count the number of
 '%s'es in the 'boilerplate' strings?  ...or maybe a different and more
 Pythonic way to do this?  (Maybe I could somehow use generators?)

 thx.
 -tom!

Of course..

>>> stuff = {'lang': 'python', 'page': 'typesseq-strings.html'}
>>> print """I should read the %(lang)s documentation at
... http://docs.%(lang)s.org/lib/%(page)s""" % stuff
I should read the python documentation at
http://docs.python.org/lib/typesseq-strings.html
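
Applied to the boilerplate case from the question, that could look roughly like this (Python 3 syntax). The template text is a made-up stand-in for the "[big string]", and render is a hypothetical helper name.

```python
# Hypothetical stand-in for the "[big string]" in the question.
boilerplate = """\
# %(module)s.py
import %(module)s_base

class %(module)sTest:
    pass
"""


def render(module):
    # One dictionary supplies every %(module)s, however many there are,
    # so nothing has to count the placeholders.
    return boilerplate % {"module": module}


result = render("spam")
```

Adding or removing a %(module)s in the template then needs no change to the code that fills it in.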


-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regular Expression problem

2006-07-13 Thread Justin Azoff
John Blogger wrote:
 That I want a particular tag value of one of my HTML files.

 ie: I want only the value after 'href=' in the tag 

 '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'

 here it would be 'mystylesheet.css'. I used the following regex to get
 this value(I dont know if it is good).

No matter how good it is, you should still use something that
understands HTML:

>>> from BeautifulSoup import BeautifulSoup
>>> html='<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>>> page=BeautifulSoup(html)
>>> page.link.get('href')
'mystylesheet.css'

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regular Expression problem

2006-07-13 Thread Justin Azoff
Justin  Azoff wrote:
 >>> from BeautifulSoup import BeautifulSoup
 >>> html='<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
 >>> page=BeautifulSoup(html)
 >>> page.link.get('href')
 'mystylesheet.css'

On second thought, you will probably want something like
>>> [link.get('href') for link in page.fetch('link',{'type':'text/css'})]
['mystylesheet.css']

which will properly handle multiple link tags.
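
If BeautifulSoup is not available, roughly the same thing can be sketched with the standard library's html.parser in Python 3. Note that this swaps in a different parser from the one used above; the class name StylesheetLinks and the two extra link tags are made up for the example.

```python
from html.parser import HTMLParser


class StylesheetLinks(HTMLParser):
    """Collect href values from <link type="text/css"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "link" and attrs.get("type") == "text/css" and "href" in attrs:
            self.hrefs.append(attrs["href"])


html = ('<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
        '<link href="print.css" rel="stylesheet" type="text/css">'
        '<link href="feed.xml" rel="alternate" type="application/rss+xml">')

parser = StylesheetLinks()
parser.feed(html)
parser.close()
hrefs = parser.hrefs
```

Only the two text/css links are collected; the feed link is skipped because its type does not match.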

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how can I avoid abusing lists?

2006-07-07 Thread Justin Azoff
Thomas Nelson wrote:
 This is exactly what I want to do: every time I encounter this kind of
 value in my code, increment the appropriate type by one.  Then I'd like
 to go back and find out how many of each type there were.  This way
 I've written seems simple enough and effective, but it's very ugly and
 I don't think it's the intended use of lists.  Does anyone know a
 cleaner way to have the same functionality?

 Thanks,
 THN

Just assign each type a number (type1 -> 1, type2 -> 2) and then count
the values as usual:

def count(map, it):
    d={}
    for x in it:
        x = map[x] #only difference from normal count function
        #d[x]=d.get(x,0)+1
        if x in d:
            d[x] +=1
        else:
            d[x] = 1
    return d

>>> map = {0:1, 1:1, 2:3, 3:1, 4:2}
>>> count(map, [1,1,0,4])
{1: 3, 2: 1}
>>> for x in count(map, [1,1,0,4]).items():
...  print 'type%d: %d' %x
... 
type1: 3
type2: 1
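
On a current Python 3 the same counting can be sketched with collections.Counter (which did not exist when this was written). The names type_of and count_types are mine.

```python
from collections import Counter


def count_types(type_of, values):
    # Translate each value to its type number, then let Counter tally them.
    return Counter(type_of[v] for v in values)


type_of = {0: 1, 1: 1, 2: 3, 3: 1, 4: 2}
totals = count_types(type_of, [1, 1, 0, 4])
lines = ["type%d: %d" % item for item in sorted(totals.items())]
```

Here totals holds the same counts as the dictionary returned by count() above, and lines holds the two formatted report strings.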

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Built-in Exceptions - How to Find Out Possible Errno's

2006-07-05 Thread Justin Azoff
Gregory Piñero wrote:
 Hi Guys,

 I'm sure this is documented somewhere, I just can't locate it.  Say I
 have this code:

 try:
   myfile=file('greg.txt','r')
 except IOError, error:
[...]
 So basically I'm looking for the document that tells me what possible
 errors I can catch and their numbers.

 I did find this but it doesn't have numbers and I can't tell if it's
 even what I'm looking for:
 http://docs.python.org/lib/module-errno.html

 Much thanks!

that IS the module you are looking for.

>>> help(errno)
[...]
DESCRIPTION
    The value of each symbol is the corresponding integer value,
    e.g., on most systems, errno.ENOENT equals the integer 2.
[...]
    ENODATA = 61
    ENODEV = 19
    ENOENT = 2
    ENOEXEC = 8

all those E* constants ARE the numbers.

furthermore, the object you get back from except has both the code and
the string already:
>>> e
<exceptions.IOError instance at 0xb7ddbfec>
>>> print e
[Errno 2] No such file or directory: 'foo'
>>> dir(e)
['__doc__', '__getitem__', '__init__', '__module__', '__str__', 'args',
'errno', 'filename', 'strerror']
>>> e.strerror
'No such file or directory'
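
Putting the two together, here is a small Python 3 sketch (the "except IOError, error" spelling in the question is Python 2 only). The temporary directory is just a way to get a path that is guaranteed not to exist.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "greg.txt")  # nothing was created here
    try:
        open(missing, "r")
    except OSError as error:  # IOError is an alias of OSError in Python 3
        code = error.errno                # the number, e.g. errno.ENOENT
        name = errno.errorcode[code]      # the symbolic name for that number
        message = os.strerror(code)       # the human-readable text
```

Comparing error.errno against the errno constants, rather than against literal numbers, keeps the check readable and portable.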

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: smtplib problem for newbie

2006-06-22 Thread Justin Azoff
Noah Gift wrote:
[snip]
 a = long(time.time() * 256) # use fractional seconds
 TypeError: 'module' object is not callable

Part of your program includes a file or directory that you called
'long'.  You should not reuse the names of built-ins in your programs;
doing so causes errors like the one above.

see:

>>> long('12')
12L
>>> open("long.py",'w')
<open file 'long.py', mode 'w' at 0x401e3380>
>>> import long
>>> long('12')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'module' object is not callable
>>> 


-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python to PHP Login System (HTTP Post)

2006-06-22 Thread Justin Azoff
Jeethu Rao wrote:
 You need to use httplib.
 http://docs.python.org/lib/httplib-examples.html

 Jeethu Rao

Not at all.  They need to read the documentation for urllib:

http://docs.python.org/lib/module-urllib.html
http://docs.python.org/lib/node483.html
"The following example uses the POST method instead:"

Additionally, they probably need to use cookielib, otherwise the logged
in state will not be persistent.
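
A rough sketch of how those pieces fit together on a current Python 3, where urllib and cookielib have become urllib.request, urllib.parse and http.cookiejar. The URL and the form field names are hypothetical, and the request is only built here, not sent.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://example.com/login.php"  # hypothetical login page

# The cookie jar keeps the session cookie between requests made
# through this opener, which is what keeps you logged in.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Hypothetical form fields; the real names come from the PHP login form.
form = urllib.parse.urlencode({"username": "me", "password": "secret"})

# Supplying data is what turns the request into a POST.
request = urllib.request.Request(LOGIN_URL, data=form.encode("ascii"))

# response = opener.open(request)  # not executed here; needs a real server
```

Every later page fetched with opener.open() would then send back whatever cookies the login response set.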

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: newb: comapring two strings

2006-05-18 Thread Justin Azoff
manstey wrote:
 Hi,

 Is there a clever way to see if two strings of the same length vary by
 only one character, and what the character is in both strings.

 E.g. str1="yaqtil" str2="yaqtel"

 they differ at str1[4] and the difference is ('i','e')

something like this maybe?

>>> str1='yaqtil'
>>> str2='yaqtel'
>>> set(enumerate(str1)) ^ set(enumerate(str2))
set([(4, 'e'), (4, 'i')])
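
If you want the position and both characters together, and in order, a zip-based variant may be easier to work with. This is a minimal Python 3 sketch; the function name differences is made up.

```python
def differences(s1, s2):
    """Return (index, char_in_s1, char_in_s2) for every differing position."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(s1, s2)) if a != b]


diffs = differences("yaqtil", "yaqtel")
one_apart = len(diffs) == 1  # True when exactly one character differs
```

For the example strings, diffs is a one-element list holding index 4 and the pair 'i', 'e'.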


-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: best way to determine sequence ordering?

2006-04-29 Thread Justin Azoff
John Salerno wrote:
 If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
 and then figure out if a certain element precedes another element, what
 would be the best way to do that?

 Looking at the built-in list functions, I thought I could do something like:

 if L.index('A') < L.index('D'):
  # do some stuff

This actually performs pretty well since list.index is implemented in
C.

The obvious (to me) implementation of:
def before(lst, a, b):
    for x in lst:
        if x == a:
            return True
        if x == b:
            return False

runs about 10-50 times faster than the double index method if I use
psyco.  Without psyco, it ends up being faster for the cases where a or
b appears early on in the list, and the other appears towards the end.
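
For comparison, here are the two approaches side by side in Python 3. This sketch only shows that they agree and where they behave differently; it makes no timing claims. The name before_index is made up.

```python
def before(lst, a, b):
    # Single pass: whichever of a or b shows up first decides the answer.
    for x in lst:
        if x == a:
            return True
        if x == b:
            return False
    return None  # neither element is in the list


def before_index(lst, a, b):
    # Double-index method from the question; raises ValueError if either
    # element is missing.
    return lst.index(a) < lst.index(b)


L = ['C', 'A', 'D', 'B']
single_pass = before(L, 'A', 'D')
double_index = before_index(L, 'A', 'D')
```

One difference to keep in mind: before(L, 'A', 'Z') returns True as soon as it sees 'A', while the index version raises ValueError because 'Z' is not in the list.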

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a simple regex question

2006-03-31 Thread Justin Azoff

John Salerno wrote:
 Ok, I'm stuck on another Python challenge question. Apparently what you
 have to do is search through a huge group of characters and find a
 single lowercase character that has exactly three uppercase characters
 on either side of it. Here's what I have so far:

 pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
 print re.search(pattern, mess).groups()

 Not sure if 'groups' is necessary or not.

 Anyway, this returns one matching string, but when I put this letter in
 as the solution to the problem, I get a message saying yes, but there
 are more, so assuming this means that there is more than one character
 with three caps on either side, is my RE written correctly to find them
 all? I didn't have the parentheses or + sign at first, but I added them
 to find all the possible matches, but still only one comes up.

 Thanks.

I don't believe you _need_ the parentheses or the + in that usage...

Have a look at http://docs.python.org/lib/node115.html

It should be obvious which method you need to use to find them all
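
In case it is not obvious, here is a sketch assuming the method meant is re.findall. The pattern is my own reading of "exactly three" (lookarounds reject a fourth capital on either side), and the test string is made up, not the challenge data.

```python
import re

# Exactly three capitals, one lowercase letter (captured), exactly three
# capitals; the lookarounds refuse a fourth capital on either side.
pattern = re.compile(r"(?<![A-Z])[A-Z]{3}([a-z])[A-Z]{3}(?![A-Z])")

mess = "aXYZbQRSc dABCDeFGHf kLMNoPQRt"  # made-up sample text
found = pattern.findall(mess)
```

With one group in the pattern, findall returns just the captured letters, one per match. The middle chunk is skipped because it has four capitals before the 'e'.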

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Counting number of each item in a list.

2006-03-19 Thread Justin Azoff
Bruno Desthuilliers wrote:
 And of course, I was right. My solution seems to be faster than Paul's
 one (but slower than bearophile's), be it on small, medium or large lists.

Your version is only fast on lists with a very small number of unique
elements.

If you change mklist to use
items = range(64) instead of the 9 item list and re-time it, you will get
better results:

A100 (10000 times): 7.63829684258
B100 (10000 times): 1.34028482437
C100 (10000 times): 0.812223911285

A10000 (100 times): 9.78499102592
B10000 (100 times): 1.26520299911
C10000 (100 times): 0.857560873032

A1000000 (10 times): 87.6713900566
B1000000 (10 times): 12.7302949429
C1000000 (10 times): 8.35931396484



-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: general coding issues - coding style...

2006-02-18 Thread Justin Azoff
Dylan Moreland wrote:
 I would look into one of the many Vim scripts which automatically fold
 most large blocks without the ugly {{{.

Who needs a script?
set foldmethod=indent
works pretty well for most python programs.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: append to the end of a dictionary

2006-01-25 Thread Justin Azoff
Magnus Lycka wrote:
  orderedListOfTuples = [(k,mydict[k]) for k in sorted(mydict.keys())]

orderedListOfTuples = sorted(mydict.items())

 It's great that many people try to help out on comp.lang.python,
 the community won't survive otherwise, but I think it's important
 to test answers before posting them, unless you're sure about your
 answer. A wrong answer might actually be worse than no answer at
 all. I fear that newbies will just get scared off if they get a bunch
 of replies to their questions, and most are wrong.

indeed.

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Multiway Branching

2006-01-09 Thread Justin Azoff
[EMAIL PROTECTED] wrote:
 I need to look at two-byte pairs coming from a machine, and interpret the
 meaning based on the relative values of the two bytes. In C I'd use a switch
 statement. Python doesn't have such a branching statement. I have 21
 comparisons to make, and that many if/elif/else statements is clunky and
 inefficient. Since these data are coming from an OMR scanner at 9600 bps (or
 faster if I can reset it programmatically to 38K over the serial cable), I
 want a fast algorithm.

   The data are of the form:

   if byte1 == 32 and byte2 == 32:
   row_value = 0
   elif byte1 == 36 and byte2 == 32:
   row_value = 'natural'
...
   elif byte1 == 32 and byte2 == 1:
   row_value = 5
   elif byte1 == 66 and byte2 == 32:
   row_value = 0.167

   There are two rows where the marked response equates to a string and 28
 rows where the marked response equates to an integer (1-9) or float of
 defined values.

   Suggestions appreciated.

 Rich

 --
 Richard B. Shepard, Ph.D.               |  Author of "Quantifying Environmental
 Applied Ecosystem Services, Inc. (TM)   |  Impact Assessments Using Fuzzy Logic"
 http://www.appl-ecosys.com Voice: 503-667-4517 Fax: 503-667-8863

Use a dictionary:

byte_values = {
   (32,32)  : 0,
   (36,32)  : 'natural',
   (32,1 )  : 5,
}

row_value = byte_values[byte1,byte2]
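
A slightly fuller sketch in Python 3. The four pairs are the ones shown in the question; the function name row_value and the default argument are my additions, for byte pairs the table does not know about.

```python
# The four byte pairs shown in the question; a real table would list all of them.
BYTE_VALUES = {
    (32, 32): 0,
    (36, 32): "natural",
    (32, 1): 5,
    (66, 32): 0.167,
}


def row_value(byte1, byte2, default=None):
    # One hash lookup replaces the whole if/elif chain; unknown pairs
    # fall back to the default instead of raising KeyError.
    return BYTE_VALUES.get((byte1, byte2), default)


value = row_value(36, 32)
unknown = row_value(0, 0)
```

The lookup cost stays about the same whether the table has 4 entries or 30, which an if/elif chain cannot promise.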

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Number set type

2005-12-28 Thread Justin Azoff
You could use IPy...
http://svn.23.nu/svn/repos/IPy/trunk/IPy.py is one location for it...

I wonder where you get O(n) and O(n^2) from... CIDR blocks are all
sequential.. All you need to store is the starting and ending address
or length.  Then any set operation only has to deal with 4 numbers, and
should be literally a few lines of code with no loops.
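
To illustrate the "4 numbers" point, here is a sketch using Python 3's ipaddress module (not IPy, and not available when this was written). The helper names bounds and intersect are made up.

```python
import ipaddress


def bounds(cidr):
    """Return a CIDR block as an inclusive (first, last) pair of integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def intersect(a, b):
    # Two contiguous ranges overlap exactly when the larger start
    # is not past the smaller end.
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


big = bounds("10.0.0.0/8")
small = bounds("10.1.0.0/16")
other = bounds("192.168.0.0/24")

inside = intersect(big, small)   # the /16 lies entirely inside the /8
nothing = intersect(big, other)  # disjoint blocks
```

This only covers a single block against a single block; a set made of many blocks is where the list handling comes in.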

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Number set type

2005-12-28 Thread Justin Azoff
Heiko Wundram wrote:
 Union of two IP4Ranges is simply normalizing a concatenated list of both
 IP4Range ranges. Normalizing takes O(log n)+O(n) = O(n) steps, where n is
 the number of ranges in the combined IP4Range.

I see now :-)  If the ranges are sorted, I bet you could just iterate
through both at the same time, merging intersecting ranges where
possible.

 Intersection takes O(n^2) steps in my current implementation (which I know
 is mathematically correct), where n is max(n_1,n_2) where n_1 is the number
 of ranges in the first IP4Range and n_2 the number of ranges in the second
 IP4Range respectively.

 Intersecting two IP4Ranges can be done with fewer steps, and I think it
 could be done in O(n) in the case of normalized and sorted ranges, and I
 have a few ideas of myself, but I'm currently too lazy to try to prove them
 correct.

Yes.. if they are sorted, something like this should work:

def intersection(self, other):
    ret = []
    ai=iter(self.ranges)
    bi=iter(other.ranges)
    try :
        a = ai.next()
        b = bi.next()
    except StopIteration:
        return IP4Range([])

    while 1:
        try :
            if a.intersects(b):
                ret.append(a.intersection(b))
                a = ai.next()
                b = bi.next()
            elif a.start < b.start:
                a = ai.next()
            else :
                b = bi.next()
        except StopIteration:
            break
    return IP4Range(ret)


-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Number set type

2005-12-28 Thread Justin Azoff
Justin  Azoff wrote:
 Yes.. if they are sorted, something like this should work:

Oops, that was almost right, but it would skip some ranges.

This should always work:

...
    while 1:
        try :
            if a.intersects(b):
                ret.append(a.intersection(b))
                if a.end < b.end:
                    a = ai.next()
                else :
                    b = bi.next()
            elif a.start < b.start:
                a = ai.next()
            else :
                b = bi.next()
        except StopIteration:
            break
    return RangeList(ret)
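
The same merge can be tried out on its own with plain tuples. This is a self-contained Python 3 sketch, assuming inclusive (start, end) pairs in sorted, non-overlapping lists; it does not use the IP4Range class from the thread, and intersect_ranges is a made-up name.

```python
def intersect_ranges(xs, ys):
    """Intersect two sorted lists of non-overlapping (start, end) pairs."""
    result = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        start = max(xs[i][0], ys[j][0])
        end = min(xs[i][1], ys[j][1])
        if start <= end:  # the two current ranges overlap
            result.append((start, end))
        # Advance only the range that ends first; the other one may
        # still overlap whatever comes next on the opposite side.
        if xs[i][1] < ys[j][1]:
            i += 1
        else:
            j += 1
    return result


a = [(0, 10), (20, 30)]
b = [(5, 25), (28, 40)]
merged = intersect_ranges(a, b)
```

The sample input is the kind of case the first version skipped: (5, 25) overlaps both ranges in a, so it must not be dropped after its first match.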



-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python coding contest

2005-12-27 Thread Justin Azoff
Tim Hochberg wrote:
 Note that in principle it's possible to encode the data for how to
 display a digit in one byte. Thus it's at least theoretically possible
 to condense all of the information about the string into a string that's
 10 bytes long. In practice it turns out to be hard to do that, since a
 10 byte string will generally have a representation that is longer than
 10 bytes because of the way the escape sequences get printed out. As a
 result various people seem to be encoding the data in long integers of
 one sort or another. The data is then extracted using some recipe
 involving shifts and &s.

 -tim

I have a 163 character version (on 8 lines; I haven't tried to compress
it further) that does something like that.  The string ended up being
printable enough to be included in the source unescaped.

I think for most approaches, any space you save by using a string you
lose after quoting it and using ord() to turn a character back into a
number.
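
For anyone who has not seen the integer trick, here is a generic Python 3 sketch of packing one byte of segment data per digit into a single integer and pulling it back out with a shift and a mask. The bit layout (segment a in bit 0 through g in bit 6) and the small table are illustrative assumptions, not any contest entry.

```python
# Hypothetical layout: segments a..g live in bits 0..6 of one byte.
SEGMENTS = {
    0: 0b0111111,  # a b c d e f
    1: 0b0000110,  # b c
    7: 0b0000111,  # a b c
    8: 0b1111111,  # all seven
}

# Pack the patterns for the digits 1, 7, 8 into one integer, 8 bits each.
packed = 0
for position, digit in enumerate([1, 7, 8]):
    packed |= SEGMENTS[digit] << (8 * position)


def segments_at(packed, position):
    # Shift the wanted byte down to the bottom, then mask off the rest.
    return (packed >> (8 * position)) & 0xFF


middle = segments_at(packed, 1)
```

The integer literal needs no quoting or ord(), which is where it can beat a string in a character-count contest.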

I'm sure this particular method is a dead end, but it is a very
interesting and probably unique solution :-)

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python coding contest

2005-12-26 Thread Justin Azoff
Tim Hochberg wrote:
 In the 130's is definitely possible, but I haven't heard of anyone doing
 better than that.

I have a version that is 127, but only if you strip extra whitespace
:-(

-- 
- Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python coding contest

2005-12-25 Thread Justin Azoff
>>> c=open("seven_seg.py").read()
>>> len(c)
251
>>> len(c.replace(" ",""))
152

:-)

Knowing me, I'll forget to submit it.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Some simple performace tests (long)

2005-08-06 Thread Justin Azoff
How much RAM does your machine have?
The main point is "except when a very large range is used on a
memory-starved machine".


run
x = range(10 ** 6)
and look at the memory usage of python..

what happens when you run this program:

import time

def t(func, num):
    s = time.time()
    for x in func(num):
        pass
    return time.time() - s

def run(func, num):
    times = []
    for x in range(5):
        times.append(t(func,num))
    return min(times), max(times), sum(times)/5

def main():
    x = 10 ** 6
    while 1:
        print "trying", x
        for s, f in ('xr', xrange), (' r', range):
            print s + " %.3f %.3f %.3f" % run(f, x)
        x *= 1.5
        x = int(x)


if __name__ == "__main__":
    main()


I get (columns are min/max/average):

trying 1000000
xr 0.110 0.115 0.111
 r 0.101 0.186 0.119
trying 1500000
xr 0.082 0.087 0.083
 r 0.152 0.158 0.154
trying 2250000
xr 0.124 0.138 0.128
 r 0.228 0.235 0.230
trying 3375000
xr 0.184 0.189 0.186
 r 0.344 0.352 0.346
trying 5062500
xr 0.276 0.284 0.279
 r 0.515 0.528 0.519
trying 7593750
xr 0.415 0.421 0.416
 r 0.774 0.795 0.779
trying 11390625
xr 0.623 0.634 0.626
 r 1.163 1.246 1.180
trying 17085937
xr 0.934 0.941 0.937
Killed

The "Killed" is from the Linux OOM killer killing the python process.
Notice that the xrange for that number worked fine.
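
The memory difference is easy to see directly. This sketch is for Python 3, where range behaves like the old xrange and list(range(n)) plays the part of the old range; sys.getsizeof only measures the container itself, so the real cost of the list is higher still.

```python
import sys

n = 10 ** 6

lazy = range(n)         # like Python 2's xrange: stores start, stop, step
eager = list(range(n))  # like Python 2's range: one slot per element

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)  # list object only; the ints cost extra
```

lazy_size stays a few dozen bytes no matter how big n gets, while eager_size grows with n, which is why only the list version runs a small machine out of memory.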

-- 
http://mail.python.org/mailman/listinfo/python-list