Re: find overlapping lines & output times observed

2013-05-06 Thread Oscar Benjamin
On 6 May 2013 19:39, Linsey Raaijmakers  wrote:
> I have a file like this:
> action  start  end
> 50      5321   5321
> 7       5323   5347
> 12      5339   5351
> 45      5373   5373
> 45      5420   5420
> 25      5425   5425
[snip]

Your code below suggests that your file also has an "apex" column. If
that's correct, what is it and what do you want to do with it?

[snip]
> I have a script now that identifies overlap between two actions (see bottom 
> page), but how can I change this so that it outputs all possible combinations?

I looked at that code and I don't think it does what you describe. Are
you sure that it works?

> My desired output would be:
>
> action   times observed   apex
> 50       5                5321, 5451, 5533, 5634, 5740
> 50,45    1                5533;5533
> 7        4                5347, 5689, 5688, 5845
> 7,25     2                5347;5425, 5689;5682
> 7,25,26  1                5689;5682;5690
>
> CODE:
>
> from collections import Counter
> f = open('and.txt','r');
>
> action_list = []
> onset_list = []
> apex_list = []
> offset_list = []
> action_counter = 0
> combination_list = []
>
>
> for line in f:
>   fields = line.split("\t")
>   for col in fields:
>     action = fields[0]
>     onset = fields[1]
>     apex = fields[2]
>     offset = fields[3]

The above code suggests that the file has four columns. Also since
you're not actually using the loop variable "col" you can just delete
the "for" line and unindent the rest. In fact the easiest way to do
all of this is just:

action, onset, apex, offset = line.split()

>
>   action_list.append(action)
>   onset_list.append(onset)
>   apex_list.append(apex)
>   offset_list.append(offset)
>
> action_cnvrt = map(int, action_list)
> c = Counter(action_cnvrt)
>
> filtered = list(set(action_list))

There's no need to convert this back to a list if you're just going to
iterate over it again with map.

> filtered_cnvrt = map(int, filtered)
>
> for a in filtered_cnvrt:
>   action_count = str(a)+"\t"+str(c[a])
>   print action_count

The above counts the number of times each event occurs which is one
part of your desired output.

>
> for i in range (0,len(onset_list)):
>   combination_list.append(action_list[i])
>   for j in range(0,len(apex_list)):
>     if i != j:
>       if onset_list[j] >= onset_list[i] and apex_list[j] <= apex_list[i]:
>         print action_list[j]+","+action_list[i]+'\t'+onset_list[j]+'\t'+apex_list[j]+'\t'+onset_list[i]+'\t'+apex_list[i]

What is combination_list for? It should just end up containing the
same thing as action_list if I understand correctly.

It's generally better in Python to loop directly over things rather
than using indices so, instead of something like:

  for i in range(len(onset_list)):
      print offset_list[i] - onset_list[i]

you should do something like

  for offset, onset in zip(offset_list, onset_list):
      print offset - onset

and your code will be a lot clearer.

The algorithm you are using is to loop over all events and then to
loop over all other events comparing all pairs of events. This will
not scale very well if you want to look at a large file or to compare
simultaneous occurrences of more than two events.

It looks as if your input data is ordered by the onset column. Is that
the case? If so then you can use an algorithm that just loops once
over all the events. The way the algorithm works is that you store
which events are currently active and loop through the events,
keeping track of the start time of the most recently added event and
adding and removing events as they start and stop. In pseudocode:

now = start of first event
active = [first event]
for next_starting in events (not including first):
    next_ending = event that ends soonest from active
    while start of next_starting > end of next_ending:
        report active events from now to end of next_ending
        now = end of next_ending
        remove next_ending from active
    report active events from now until start of next_starting
    now = start of next_starting
    add next_starting to active

And some more code to deal with what happens when you get to the end
of the list of events...

The report steps probably mean adding to a Counter or dict to remember
that the currently active events were active during each particular
time window.
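
Here's a rough Python sketch of that pseudocode (a sketch only: it
assumes events is a list of (action, start, end) tuples sorted by
start time, and that the same action is never active twice at once):

from collections import defaultdict
import heapq

def overlap_times(events):
    observed = defaultdict(int)  # frozenset of actions -> total time
    active = {}                  # action -> end time
    ending = []                  # heap of (end, action), soonest first
    now = None
    for action, start, end in events:
        if now is None:
            now = start
        # Close out any active events that end before this one starts.
        while ending and ending[0][0] <= start:
            end_time, ended = heapq.heappop(ending)
            observed[frozenset(active)] += end_time - now
            now = end_time
            del active[ended]
        if active:
            observed[frozenset(active)] += start - now
        now = start
        active[action] = end
        heapq.heappush(ending, (end, action))
    # Drain whatever is still active at the end of the list.
    while ending:
        end_time, ended = heapq.heappop(ending)
        observed[frozenset(active)] += end_time - now
        now = end_time
        del active[ended]
    return dict(observed)

Each key of the result is then a combination such as frozenset([7, 25])
and each value is the total time for which exactly that combination
of actions was active.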


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: object.enable() anti-pattern

2013-05-09 Thread Oscar Benjamin
On 9 May 2013 14:07, Roy Smith  wrote:
> In article <518b32ef$0$11120$c3e8...@news.astraweb.com>,
>  Steven D'Aprano  wrote:
>
>> There is no sensible use-case for creating a file without opening it.
>
> Sure there is.  Sometimes just creating the name in the file system is
> all you want to do.  That's why, for example, the unix "touch" command
> exists.

Wouldn't the code that implements the touch command just look
something like this:

f = open(filename, 'a')   # 'a' creates the file without truncating it
f.close()

Or is there some other way of creating the file that doesn't open it
(I mean in general not just in Python)?
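
For what it's worth, a rough sketch of a touch function in Python
(the function itself is made up here; os.utime also bumps the
timestamp of a file that already exists, which touch is supposed
to do):

import os

def touch(filename):
    # Append mode creates the file if needed but never truncates it.
    with open(filename, 'a'):
        pass
    # Update the access/modification times like touch does.
    os.utime(filename, None)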


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: object.enable() anti-pattern

2013-05-10 Thread Oscar Benjamin
On 10 May 2013 15:01, Roy Smith  wrote:
> In article ,
>  Robert Kern  wrote:
>
>> I'd be curious to see in-the-wild instances of the anti-pattern that
>> you are talking about, then. I think everyone agrees that entirely
>> unmotivated "enable" methods should be avoided, but I have my doubts
>> that they come up very often.
>
> As I mentioned earlier in this thread, this was a common pattern in the
> early days of C++, when exceptions were a new concept and handled poorly
> by many compilers (and, for that matter, programmers).
>
> There was a school of thought that constructors should never be able to
> fail (because the only way for a constructor to fail is to throw an
> exception).  The pattern was to always have the constructor succeed, and
> then either have a way to check to see if the newly-constructed object
> was valid, or have a separate post-construction initialization step
> which could fail.

It's not just because of exceptions. In C++ virtual method calls in a
constructor for a class A will always call the methods of class A even
if the object being constructed is actually of a subclass B because
the B part of the object isn't initialised when the A constructor is
called. There may be a better way to do this since I last used C++ but
as I remember it the two-phase pattern was a recommended way to
implement polymorphic behaviour during initialisation.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 2.7.x - problem with object.__init__() not accepting *args and **kwargs

2013-05-15 Thread Oscar Benjamin
On 15 May 2013 12:18, wzab  wrote:
> I had to implement in Python 2.7.x a system which heavily relies on
> multiple inheritance.
> Working on that, I have came to very simplistic code which isolates
> the problem:
> (The essential thing is that each base class receives all arguments
> and uses only those,
> which it understands).
>
[snip]
>
> I have found a workaround:
>
> # Class my_object added only as workaround for a problem with
> # object.__init__() not accepting any arguments.
[snip]
>
> The above works correctly, producing the same results as the first
> code in Python 2.5.2,
> but anyway it seems to me just a dirty trick...
> What is the proper way to solve that problem in Python 2.7.3?

I don't generally use super() but I did see some advice about it in
this article:
https://fuhm.net/super-harmful/

From the conclusion:
"Never use positional arguments in __init__ or __new__. Always use
keyword args, and always call them as keywords, and always pass all
keywords on to super."


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Determine actually given command line arguments

2013-05-15 Thread Oscar Benjamin
On 15 May 2013 13:52, Henry Leyh  wrote:
> On 15.05.2013 14:24, Roy Smith wrote:
>>
>> In article ,
>>   Henry Leyh  wrote:
>>
>>> Is there a simple way to determine which
>>> command line arguments were actually given on the commandline, i.e. does
>>> argparse.ArgumentParser() know which of its namespace members were
>>> actually hit during parse_args().
>>
>>
>> I think what you're looking for is sys.argv:
>>
>> $ cat argv.py
>> import sys
>> print sys.argv
>>
>> $ python argv.py foo bar
>> ['argv.py', 'foo', 'bar']
>
> Thanks, but as I wrote in my first posting I am aware of sys.argv and was
> hoping to _avoid_ using it because I'd then have to kind of re-implement a
> lot of the stuff already there in argparse, e.g. parsing sys.argv for
> short/long options, flag/parameter options etc.
>
> I was thinking of maybe some sort of flag that argparse sets on those
> optional arguments created with add_argument() that are really given on the
> command line, i.e. those that it stumbles upon them during parse_args().

I don't know about that but I imagine that you could compare values
with their defaults to see which have been changed.
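
A minimal sketch of that idea (the option names here are invented),
using ArgumentParser.get_default() to compare the parsed namespace
against the parser's defaults:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--count', type=int, default=0)
parser.add_argument('--name', default='guest')
args = parser.parse_args()

# Attributes differing from their default were presumably given on
# the command line.
given = dict((dest, value) for dest, value in vars(args).items()
             if value != parser.get_default(dest))
print(given)

Note that this can't distinguish an option that was omitted from one
that was explicitly set to its default value on the command line.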


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 2.7.x - problem with object.__init__() not accepting *args and **kwargs

2013-05-16 Thread Oscar Benjamin
On 16 May 2013 03:06, Steven D'Aprano wrote:
> On Wed, 15 May 2013 13:16:09 +0100, Oscar Benjamin wrote:
>
>
>> I don't generally use super()
>
> Then you should, especially in Python 3.
>
> If you're not using super in single-inheritance classes, then you're
> merely making your own code harder to read and write, and unnecessarily
> difficult for others to use with multiple-inheritance.
>
> If you're not using super in multiple-inheritance[1] classes, then your
> code is probably buggy.
>
> There really is no good reason to avoid super in Python 3.

I should have been clearer. I don't generally use super() because I
don't generally use Python in a very object-oriented way. My comment
was intended as a qualification of my advice rather than a suggestion
that there is something wrong with super(). I can certainly see how
that would be misinterpreted given the article I linked to:

>> but I did see some advice about it in this article:
>> https://fuhm.net/super-harmful/
>
> It's not a good article. The article started off claiming that super was
> harmful, hence the URL. He's had to back-pedal, and *hard*. The problem
> isn't that super is harmful, it is that the problem being solved --
> generalized multiple inheritance -- is inherently a fiendishly difficult
> problem to solve. Using super and cooperative multiple inheritance makes
> it a merely difficult but tractable problem.
>
> The above article is useful to see the sorts of issues that can come up
> in multiple inheritance, and perhaps as an argument for avoiding MI
> (except in the tamed versions provided by mixins or straits). But as an
> argument against super? No.

I read that article when I was trying to do something with multiple
inheritance. It was helpful to me at that time as it explained why
whatever I was trying to do (I don't remember) was never really going
to work.

>
> A much better article about super is:
>
> http://rhettinger.wordpress.com/2011/05/26/super-considered-super/

This is a good article and I read it after Ian posted it.

>
>
>> From the conclusion:
>> "Never use positional arguments in __init__ or __new__. Always use
>> keyword args, and always call them as keywords, and always pass all
>> keywords on to super."
>
> Even that advice is wrong. See Super Considered Super above.

Raymond's two suggestions for signature are:
'''
One approach is to stick with a fixed signature using positional
arguments. This works well with methods like __setitem__ which have a
fixed signature of two arguments, a key and a value. This technique is
shown in the LoggingDict example where __setitem__ has the same
signature in both LoggingDict and dict.

A more flexible approach is to have every method in the ancestor tree
cooperatively designed to accept keyword arguments and a
keyword-arguments dictionary, to remove any arguments that it needs,
and to forward the remaining arguments using **kwds, eventually
leaving the dictionary empty for the final call in the chain.
'''

The first cannot be used with object.__init__ and the second is not
what the OP wants. I think from the article that the appropriate
suggestion is to do precisely what the OP has done and make everything
a subclass of a root class that has the appropriate signature. Perhaps
instead of calling it my_object it could have a meaningful name
related to what the subclasses are actually for and then it wouldn't
seem so much like a dirty trick.
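
As a rough sketch of that pattern (the class names here are made up):
each class pops the keywords it understands and forwards the rest,
with the root class absorbing anything left over so that
object.__init__ is always called without arguments:

class Root(object):
    def __init__(self, **kwds):
        # The delegation chain stops here, before reaching object.
        super(Root, self).__init__()

class A(Root):
    def __init__(self, **kwds):
        self.a = kwds.pop('a', None)     # use what we understand...
        super(A, self).__init__(**kwds)  # ...and pass on the rest

class B(Root):
    def __init__(self, **kwds):
        self.b = kwds.pop('b', None)
        super(B, self).__init__(**kwds)

class C(A, B):
    pass

c = C(a=1, b=2)  # A and B each pick out their own argument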

> [1] To be precise: one can write mixin classes without super, and
> strictly speaking mixins are a form of multiple inheritance, but it is a
> simplified version of multiple inheritance that avoids most of the
> complications.

They're also mostly the only kind of multiple inheritance that I would
think of using.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Harmonic distortion of a input signal

2013-05-19 Thread Oscar Benjamin
On 19 May 2013 23:25,   wrote:
> How can i at least find a peek in FFT spectrum of a square wave ?
> From there i could easily build formula. Sorry for bothering but i am new to 
> Python.

Are you the same person who posted the original question?

You probably want to use numpy for this. I'm not sure if I understand
your question but here goes:

First import numpy (you may need to install this first):

>>> import numpy as np

Create a square wave signal:

>>> x = np.zeros(50)
>>> x[:25] = -1
>>> x[25:] = +1
>>> x
array([-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
       -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

Compute the magnitude spectrum:

>>> spect = abs(np.fft.fft(x)[:25])
>>> spect
array([  0.        ,  31.85194222,   0.        ,  10.67342282,
         0.        ,   6.47213595,   0.        ,   4.69726931,
         0.        ,   3.73254943,   0.        ,   3.13762901,
         0.        ,   2.7436023 ,   0.        ,   2.47213595,
         0.        ,   2.28230601,   0.        ,   2.15105461,
         0.        ,   2.06487174,   0.        ,   2.01589594,   0.        ])

Find the index of the maximum element:

>>> np.argmax(spect)
1

So the peak is the lowest non-zero frequency component of the DFT. In
Hz this corresponds to a frequency of 1/T where T is the duration of
the signal.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Harmonic distortion of a input signal

2013-05-21 Thread Oscar Benjamin
On 20 May 2013 18:23, jmfauth  wrote:
> Non sense.
>
> The discrete fft algorithm is valid only if the number of data
> points you transform does correspond to a power of 2 (2**n).

As with many of your comments about Python's unicode implementation
you are confusing performance with validity. The DFT is defined and is
a valid invertible map (barring roundoff) for complex vectors of any
integer length. It is also a valid method for understanding the
frequency content of periodic signals. The fastest FFT algorithms are
for vectors whose length is a power of 2 but the other algorithms
produce equally *valid* DFT results.
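
For example, here is a quick check (a sketch, not from my earlier
post) that numpy's FFT round-trips exactly, up to roundoff, for a
length that is not a power of 2:

import numpy as np

x = np.random.rand(50)  # 50 is not a power of 2
X = np.fft.fft(x)
print(np.allclose(x, np.fft.ifft(X).real))  # True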

In the example I posted the computation of the DFT using numpy.fft.fft
was (as far as I could tell) instantaneous. I could use timeit to
discover exactly how many microseconds it took but why when I already
have the results I wanted?

> Keywords to the problem: apodization, zero filling, convolution
> product, ...
>
> eg. http://en.wikipedia.org/wiki/Convolution

These points are not relevant to the example given.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: file I/O and arithmetic calculation

2013-05-22 Thread Oscar Benjamin
On 22 May 2013 22:05, Carlos Nepomuceno  wrote:
>
> filenames = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']
> contents  = [[[int(z) for z in y.split(',')] for y in open(x).read().split()] for x in filenames]
> s1c  = [sum([r[0] for r in f]) for f in contents]
> a1r  = [sum(f[0])/float(len(f[0])) for f in contents]
> print '\n'.join([x for x in ['File "{}" has 1st row average = {:.2f}'.format(n,a1r[i]) if s1c[i]==50 else '' for i,n in enumerate(filenames)] if x])

Do you find this code easy to read? I wouldn't write something like
this and I certainly wouldn't use it when explaining something to a
beginner.

Rather than repeated list comprehensions you should consider using a
single loop e.g.:

for filename in filenames:
# process each file

This will make the code a lot simpler.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PEP 378: Format Specifier for Thousands Separator

2013-05-22 Thread Oscar Benjamin
On 22 May 2013 23:31, Carlos Nepomuceno  wrote:
>
> I still don't understand why % benefits from literals optimization 
> ("'%d'%12345") while '{:d}'.format(12345) doesn't.

There's no reason why that optimisation can't happen in principle.
However no one has written a patch for it. Why don't you look into
what it would take to make it happen?
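
You can see the existing optimisation with the dis module (a sketch;
the exact bytecode varies between versions): CPython's peephole
optimiser folds the constant expression '%d' % 12345 into the string
'12345' at compile time, whereas the .format() version is an ordinary
method call executed at run time.

import dis

# Should show a single LOAD_CONST of the pre-computed string.
dis.dis(compile("'%d' % 12345", '<test>', 'eval'))

# Should show the attribute lookup and call happening at run time.
dis.dis(compile("'{:d}'.format(12345)", '<test>', 'eval'))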


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: file I/O and arithmetic calculation

2013-05-22 Thread Oscar Benjamin
On 23 May 2013 00:49, Carlos Nepomuceno  wrote:
>
> The code is pretty obvious to me, I mean there's no obfuscation at all.

I honestly can't tell if you're joking.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: file I/O and arithmetic calculation

2013-05-23 Thread Oscar Benjamin
On 23 May 2013 04:15, Carlos Nepomuceno  wrote:
> The last line of my noob piece can be improved. So this is it:

Most of it can be improved.

> filenames = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']
> contents  = [[[int(z) for z in y.split(',')] for y in open(x).read().split()] for x in filenames]
> s1c  = [sum([r[0] for r in f]) for f in contents]
> a1r  = [sum(f[0])/float(len(f[0])) for f in contents]
> print '\n'.join(['File "{}" has 1st row average = {:.2f}'.format(n,a1r[i]) for i,n in enumerate(filenames) if s1c[i]==50])

You're writing repeated list comprehensions that feed into one another
like this:

list2 = [func1(x) for x in list1]
list3 = [func2(y) for y in list2]
list4 = [func3(y) for y in list2]

In this case it is usually better to write a single loop

for x in list1:
    y = func1(x)
    v = func2(y)
    w = func3(y)

With that your code becomes:

filenames = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']
for filename in filenames:
    contents = [[int(z) for z in y.split(',')] for y in open(filename).read().split()]
    s1c = sum([r[0] for r in contents])
    a1r = sum(contents[0])/float(len(contents[0]))
    if s1c == 50:
        print('File "{}" has 1st row average = {:.2f}'.format(filename, a1r))

However you shouldn't really be doing open(x).read().split() part. You
should use the with statement to open the files:

with open(filename, 'rb') as inputfile:
    contents = [map(int, line.split(',')) for line in inputfile]

Of course if you don't have so many list comprehensions in your code
then your lines will be shorter and you won't feel so much pressure to
use such short variable names. It's also better to define a mean
function as it makes it clearer to read:

# Needed by the mean() function in Python 2.x
from __future__ import division

def mean(numbers):
    return sum(numbers) / len(numbers)

filenames = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']

for filename in filenames:
    with open(filename, 'rb') as inputfile:
        matrix = [map(int, line.split(',')) for line in inputfile]
    column1 = [row[0] for row in matrix]
    row1 = matrix[0]
    if mean(column1) == 50:
        print('File "{}" has 1st row average = {:.2f}'.format(filename, mean(row1)))

It's all a little easier if you use numpy:

import numpy as np

filenames = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']

for filename in filenames:
    matrix = np.loadtxt(filename, dtype=int, delimiter=',')
    column1 = matrix[:, 0]
    row1 = matrix[0, :]
    if sum(column1) == 50 * len(column1):
        print('File "{}" has 1st row average = {:.2f}'.format(filename, np.mean(row1)))

Then again in practice I wouldn't be testing for equality of the mean.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fatal Python error

2013-05-29 Thread Oscar Benjamin
On 29 May 2013 12:48, Joshua Landau  wrote:
> Hello all, again. Instead of revising like I'm meant to be, I've been
> delving into a bit of Python and I've come up with this code:

Here's a simpler example that gives similar results:

$ py -3.3
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600
32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> def broken():
...     try:
...         broken()
...     except RuntimeError:
...         broken()
...
>>> broken()
Fatal Python error: Cannot recover from stack overflow.

Current thread 0x058c:
  File "", line 3 in broken
  File "", line 3 in broken
...

Under Python 2.7.5 it just goes into an infinite loop. Under Python
3.2.5 and 3.3.2 it crashes the interpreter as shown above.

What the broken() function is doing is totally stupid: responding to a
recursion error with more recursion. However this may indicate or be
considered a bug in the 3.x interpreter.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fatal Python error

2013-05-29 Thread Oscar Benjamin
On 29 May 2013 14:02, Dave Angel  wrote:
> On 05/29/2013 08:45 AM, Oscar Benjamin wrote:
>
> More likely a bug in the 2.x interpreter.  Once inside an exception handler,
> that frame must be held somehow.  If not on the stack, then in some separate
> list.  So the logic will presumably fill memory, it just may take longer on
> 2.x .

I'm not so sure. The following gives the same behaviour in 2.7, 3.2 and 3.3:

$ cat tmp.py
def loop():
    loop()

loop()

$ py -3.2 tmp.py
Traceback (most recent call last):
  File "tmp.py", line 4, in 
loop()
  File "tmp.py", line 2, in loop
loop()
  File "tmp.py", line 2, in loop
loop()
  File "tmp.py", line 2, in loop
loop()
  File "tmp.py", line 2, in loop
...

However the following leads to a RuntimeError in 2.7 but different
stack overflow errors in 3.2 and 3.3:

$ cat tmp.py
def loop():
    try:
        (lambda: None)()
    except RuntimeError:
        pass
    loop()

loop()

$ py -2.7 tmp.py
Traceback (most recent call last):
  File "tmp.py", line 8, in 
loop()
  File "tmp.py", line 6, in loop
loop()
  File "tmp.py", line 6, in loop
loop()
  File "tmp.py", line 6, in loop
loop()
  File "tmp.py", line 6, in loop
...
RuntimeError: maximum recursion depth exceeded

$ py -3.2 tmp.py
Fatal Python error: Cannot recover from stack overflow.

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

$ py -3.3 tmp.py
Fatal Python error: Cannot recover from stack overflow.

Current thread 0x05c4:
  File "tmp.py", line 3 in loop
  File "tmp.py", line 6 in loop
  File "tmp.py", line 6 in loop
  File "tmp.py", line 6 in loop
  File "tmp.py", line 6 in loop
  File "tmp.py", line 6 in loop
  File "tmp.py", line 6 in loop
...

I would expect this to give "RuntimeError: maximum recursion depth
exceeded" in all three cases.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Harmonic distortion of a input signal

2013-06-12 Thread Oscar Benjamin
On 20 May 2013 00:36,   wrote:
> One more question. Function np.argmax returns max of non-complex numbers ?
> Because FFT array of my signal is complex.

Use abs() like in my example. This will give the absolute value of the
complex numbers:

>>> z = 1+1j
>>> z
(1+1j)
>>> abs(z)
1.4142135623730951


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Short-circuit Logic

2013-06-12 Thread Oscar Benjamin
On 30 May 2013 22:03, Carlos Nepomuceno  wrote:
>> Here's another way, mathematically equivalent (although not necessarily
>> equivalent using floating point computations!) which avoids the divide-by-
>> zero problem:
>>
>> abs(a - b) < epsilon*a
>
> That's wrong! If abs(a) < abs(a-b)/epsilon you will break the commutative law.

There is no commutative law for relative tolerance floating point
comparisons. If you want to compare with a relative tolerance then you
you should choose carefully what your tolerance is to be relative to
(and how big your relative tolerance should be).

In some applications it's obvious which of a or b you should use to
scale the tolerance but in others it is not or you should compare with
something more complex. For an example where it is obvious, when
testing numerical code I might write something like:

eps = 1e-7
true_answer = 123.4567879
estimate = myfunc(5)
assert abs(estimate - true_answer) < eps * abs(true_answer)


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Split a list into two parts based on a filter?

2013-06-13 Thread Oscar Benjamin
On 12 June 2013 19:47, Terry Reedy  wrote:
> The proper loop statement
>
> for s in songs:
> (new_songs if s.is_new() else old_songs).append(s)

I think I would just end up rewriting this as

for s in songs:
if s.is_new():
new_songs.append(s)
else:
old_songs.append(s)

but then we're back where we started. I don't think any of the
solutions posted in this thread have been better than this. If you
want to make this a nice one-liner then just put this code in a
function.
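
Something like this (a minimal sketch, assuming the songs list and
is_new() method from the thread):

def partition(songs):
    # One pass over songs, splitting on is_new().
    new_songs, old_songs = [], []
    for s in songs:
        if s.is_new():
            new_songs.append(s)
        else:
            old_songs.append(s)
    return new_songs, old_songs

new_songs, old_songs = partition(songs)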


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Don't feed the troll...

2013-06-17 Thread Oscar Benjamin
On 17 June 2013 17:35, D'Arcy J.M. Cain  wrote:
> On Mon, 17 Jun 2013 14:39:56 + (UTC)
> Grant Edwards  wrote:
>> I don't want _any_ copies from from Mailman.  I don't subscribe to
>> whatever mailing list you're talking about.  I'm reading this via an
>> NNTP server.  Keep replies in the group or on the list.
>
> And that is part of the problem.  I have always argued that gatewaying
> the mailing list to newgroups is wrong.  If this was only a mailing
> list there are many things we could do to reduce abuse but because of
> the gateway they can't be done.

There is a very simple solution used by many mailing lists which is to
set the Reply-To header to point back to the mailing list. That way
any old email client on any OS/computer/phone/website etc. has the
required button to reply to the list without CCing anyone. It also
reduces the chance of accidentally replying off-list. Anyone who wants
to reply off-list or to deliberately CC someone (as I did here) can
still do so but it will rarely happen accidentally.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Licensing

2013-06-18 Thread Oscar Benjamin
On 18 June 2013 09:56, Steven Hern  wrote:
>
> We are an educational establishment which wishes to use Python 3.3.2 – Does
> the license cover multi-users in a classroom environment?

Yes, absolutely. Many educational institutions (universities,
schools, etc.) use Python in classroom environments (the fact that it
is a classroom really makes no difference).

Here is the full license:
http://docs.python.org/3.3/license.html

And here is the relevant text (from clause 2):
'''
Subject to the terms and conditions of this License Agreement, PSF
hereby grants Licensee a nonexclusive, royalty-free, world-wide
license to reproduce, analyze, test, perform and/or display publicly,
prepare derivative works, distribute, and otherwise use Python 3.3.2
alone or in any derivative version, provided, however, that PSF’s
License Agreement and PSF’s notice of copyright, i.e., “Copyright ©
2001-2013 Python Software Foundation; All Rights Reserved” are
retained in Python 3.3.2 alone or in any derivative version prepared
by Licensee.
'''

Half of that text refers to making a derivative version of Python
(which I assume you're not intending to do). Otherwise it essentially
just says that you can use it anywhere you like for anything you want
without paying any money.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Beginner Question: 3D Models

2013-06-19 Thread Oscar Benjamin
On 19 June 2013 12:13,   wrote:
>
> I've seen some information on Blender.  Is it possible to have the entire 
> program contained within a single exe (or exe and some other files) so that 
> it can be passed around and used by others without having to install blender?

I don't know if Blender would cause problems for that but it's not
hard to install Blender generally; apparently there is a portable
version that can be simply unzipped on the target computer.

More generally, though, there are some legal issues relating to
packaging standard MSVC-compiled Python with all of its dependencies
in a single .exe file for Windows. The particular problem is the
Microsoft C runtime library. py2exe has some information about this
here:
http://www.py2exe.org/index.cgi/Tutorial

Generally Python is not designed with the intention that applications
would be packaged into a standalone executable file although a number
of projects exist to make that possible. Is it so hard for your users
to install Python and Blender if you tell them which files to download
and install?


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Beginner Question: 3D Models

2013-06-19 Thread Oscar Benjamin
On 19 June 2013 14:14,   wrote:
> This sounds similar to what I might want. So you know of any online tutorials 
> for this?

It's hard to tell what you're referring to since you haven't included
any quoted context in your message (like I have above). I'll assume
you're referring to what Fábio said.

I've already posted the link to the py2exe tutorial (I assume Fábio
used py2exe since nothing else was specified).

The legal issue I mentioned is precisely about the .dll files that
Fábio referred to. The reason that py2exe (and similar projects) do
not bundle these into the .exe is because it normally isn't legal to
distribute these files. From the tutorial:
'''
you need to check redist.txt within your Visual Studio installation to
see whether you have the legal right to redistribute this DLL. If you
do have these rights, then you have the option to bundle the C runtime
DLL with you application.
'''


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Problem with the "for" loop syntax

2013-06-20 Thread Oscar Benjamin
On 20 June 2013 04:11, Cameron Simpson  wrote:
> I use vi/vim and it both shows the matching bracket when the cursor
> is on one and also have a keystroke to bounce the curser between
> this bracket and the matching one.
>
> If you suspect you failed to close a bracket, one approach is to
> go _below_ the syntax error (or right on it) and type a closing
> bracket. Then see where the editor thinks the opening one is.

I use this technique sometimes and it works if the unclosed bracket is
still in view.

If you use vim then you can do [( i.e. type  '[' followed by '(' in
normal mode. It will jump backwards to the first unmatched opening
bracket. Use ]) to find the next unmatched closing bracket. You can
also do [{ and ]} for curly brackets. I'm not sure how to do square
brackets - [[ and ]] are used for navigating between functions.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is the argparse module so inflexible?

2013-06-27 Thread Oscar Benjamin
On 27 June 2013 22:30, Jason Swails  wrote:
>
> An alternative is, of course, to simply subclass ArgumentParser and copy
> over all of the code that catches an ArgumentError to eliminate the internal
> exception handling and instead allow them to propagate the call stack.

I would think it easier to wrap getopt than monkey-patch argparse in this way.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python adds an extra half space when reading from a string or list

2013-07-03 Thread Oscar Benjamin
On 4 July 2013 01:53, Ben Finney  wrote:
> rusi  writes:
>
>> As a good Christian I believe that Chris tried more than anyone else
>> on this list to help Nikos before talking recourse to another gem of
>> biblical wisdom:
>
>> He that spareth his rod hateth his son: but he that loveth him
>> chasteneth him betimes.
>
> Good Christian morality entails biblical encouragement to beat one's
> child with a rod, I see.
>
> Please, may I be spared encounters with good Christians.
>
> Let's end right now the insidious doctrine that beating a person -
> metaphorically or otherwise - is ever acceptable in this forum. If that
> contradicts anyone's good Christian morality, then good Christian
> morality is dead wrong and needs to be rejected.

And also, let's end this and all the related discussions about
trolling and how to deal with trolls. I can see how some are annoyed
by Νίκος and his posts but I for one am *much more* concerned/bothered
by the surrounding (often highly) unpleasant discussion by others.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coping with cyclic imports

2013-07-04 Thread Oscar Benjamin
On 4 July 2013 13:48,   wrote:
> On Tuesday, April 8, 2008 10:06:46 PM UTC+2, Torsten Bronger wrote:
[snip]
>
> If you do "import foo" inside bar and "import bar" inside foo, it will work 
> fine. By the time anything actually runs, both modules will be fully loaded 
> and will have references to each other.
>
> The problem is when instead you do "from foo import abc" and "from bar import 
> xyz". Because now each module requires the other module to already be 
> compiled (so that the name we are importing exists) before it can be compiled.
>
> from
> http://stackoverflow.com/questions/744373/circular-or-cyclic-imports-in-python

Is there some reason you're responding to a post from 5 years ago?

Or is it just a joke that you've created a cyclic import advice link
by referring to a SO question where the top answer is actually a quote
linking back to the previous post in this same thread?


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coping with cyclic imports

2013-07-05 Thread Oscar Benjamin
On 5 July 2013 02:24, Cameron Simpson  wrote:
> On 04Jul2013 16:03, Oscar Benjamin  wrote:
> |
> | Is there some reason you're responding to a post from 5 years ago?
>
> Is there some reason not to, if no newer solutions are available?

No, I was genuinely curious. My way of accessing this
forum/newsgroup/mailing list doesn't give me a way to respond to very
old posts but others seem to do it every now and again. I see now that
if you're looking at an old thread in Google Groups (rather than e.g.
the python.org archives) it makes the thread seem more like a forum
than a newsgroup or a mailing list so that it's easy and seems more
natural to respond to old posts.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to make this faster

2013-07-05 Thread Oscar Benjamin
On 5 July 2013 09:22, Helmut Jarausch  wrote:
> Hi,
>
> I have coded a simple algorithm to solve a Sudoku (probably not the first 
> one).
> Unfortunately, it takes 13 seconds for a difficult problem which is more than 
> 75 times slower
> than the same algorithm coded in C++.
> Is this to be expected or could I have made my Python version faster *** 
> without *** sacrificing readability.

It is to be expected that this kind of processing is faster in C than
in straight Python. Where Python can win in these kind of problems is
if it makes it easier to implement a better algorithm but if your code
is just a transliteration from C to Python you should expect it to run
slower. Another way that Python can win is if you can express your
problem in terms of optimised operations from e.g. the numpy library
but you're not doing that here.

Of course that does depend on your Python implementation so you could
try e.g. using PyPy/nuitka/cython etc. to speed up the core
processing.

> Profiling shows that the function find_good_cell is called (only) 45267 times 
> and this take 12.9 seconds
> CPU time (on a 3.2 GHz machine)
[snip]
>
> def find_good_cell() :
>   Best= None
>   minPoss= 10
>   for r in range(9) :
>     for c in range(9) :
>       if  Grid[r,c] > 0 : continue
>       Sq_No= (r//3)*3+c//3
>       Possibilities= 0
>       for d in range(1,10) :
>         if Row_Digits[r,d] or Col_Digits[c,d] or Sqr_Digits[Sq_No,d] : continue
>         Possibilities+= 1
>
>       if ( Possibilities < minPoss ) :
>         minPoss= Possibilities
>         Best= (r,c)
>
>   if minPoss == 0 : Best=(-1,-1)
>   return Best

My one comment is that you're not really making the most out of numpy
arrays. Numpy's ndarrays are efficient when each line of Python code
is triggering a large number of numerical computations performed over
the array. Because of their N-dimensional nature and the fact that
they are in some sense second class citizens in CPython they are often
not as good as lists for this kind of looping and indexing.

I would actually expect this program to run faster with ordinary
Python lists and lists of lists. It means that you need to change e.g.
Grid[r, c] to Grid[r][c] but really I think that the indexing syntax
is all you're getting out of numpy here.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to make this faster

2013-07-05 Thread Oscar Benjamin
On 5 July 2013 11:53, Helmut Jarausch  wrote:
> I even tried to use dictionaries instead of Numpy arrays. This version is a 
> bit
> slower then the lists of lists version (7.2 seconds instead of 6 second) but 
> still
> much faster than the Numpy array solution.

When you switched to dictionaries did you take advantage of the
sparseness by iterating over dictionary keys instead of indices? This
is the kind of thing that I meant when I said that in Python it's
often easier to implement a better algorithm than in C. What I mean is
that if Grid is a dict so that Grid[(r, c)] is the entry at row r and
column c (if it exists) then you can change a loop like:

for r in range(9):
    for c in range(9):
        if Grid[r, c] > 0: continue
        # do stuff

so that it looks like:

for r, c in Grid:
    # do stuff

If the grid is sparsely occupied then this could be a significant improvement.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to make this faster

2013-07-05 Thread Oscar Benjamin
On 5 July 2013 15:28, Helmut Jarausch  wrote:
> On Fri, 05 Jul 2013 14:41:23 +0100, Oscar Benjamin wrote:
>
>> On 5 July 2013 11:53, Helmut Jarausch  wrote:
>>> I even tried to use dictionaries instead of Numpy arrays. This version is a 
>>> bit
>>> slower then the lists of lists version (7.2 seconds instead of 6 second) 
>>> but still
>>> much faster than the Numpy array solution.
>>
>> When you switched to dictionaries did you take advantage of the
>> sparseness by iterating over dictionary keys instead of indices? This
>> is the kind of thing that I meant when I said that in Python it's
>> often easier to implement a better algorithm than in C. What I mean is
>> that if Grid is a dict so that Grid[(r, c)] is the entry at row r and
>> column c (if it exists) then you can change a loop like:
>>
>> for r in range(9):
>>     for c in range(9):
>>         if Grid[r, c] > 0: continue
>>         # do stuff
>>
>> so that it looks like:
>>
>> for r, c in Grid:
>>     # do stuff
>>
>> If the grid is sparsely occupied then this could be a significant 
>> improvement.
>>
> This gives a big speedup. Now, the time is gone down to 1.73 seconds in 
> comparison to
> original 13 seconds or the 7 seconds for the first version above.

Presumably then you're now down to the innermost loop as a bottle-neck:

  Possibilities= 0
  for d in range(1,10) :
    if Row_Digits[r,d] or Col_Digits[c,d] or Sqr_Digits[Sq_No,d] : continue
    Possibilities+= 1

If you make it so that e.g. Row_Digits[r] is a set of indices rather
than a list of bools then you can do this with something like

Possibilities = len(Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No])

or perhaps

Possibilities = len(set.union(Row_Digits[r], Col_Digits[c],
                              Sqr_Digits[Sq_No]))

which I would expect to be a little faster than looping over range
since the loop is then performed under the hood by the builtin
set-type.

> Many thanks,
> it seems hard to optimize a Python program,

It just takes practice. It's a little less obvious in Python than in
low-level languages where the bottlenecks will be and which operations
are faster/slower but optimisation always involves a certain amount of
trial and error anyway.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to make this faster

2013-07-05 Thread Oscar Benjamin
On 5 July 2013 15:48, Helmut Jarausch  wrote:
> On Fri, 05 Jul 2013 12:02:21 +, Steven D'Aprano wrote:
>
>> On Fri, 05 Jul 2013 10:53:35 +, Helmut Jarausch wrote:
>>
>>> Since I don't do any numerical stuff with the arrays, Numpy doesn't seem
>>> to be a good choice. I think this is an argument to add real arrays to
>>> Python.
>>
>> Guido's time machine strikes again:
>>
>> import array
>>
>>
>> By the way, I'm not exactly sure how you go from "I don't do numerical
>> calculations on numpy arrays" to "therefore Python should have arrays".
>
> I should have been more clear. I meant multi-dimensional arrays (2D, at least)
> Numpy is fine if I do math with matrices (without loops in python).
>
> Given that I don't like to use the old FORTRAN way (when "dynamic" arrays are 
> passed to
> functions) of indexing a 2-d array I would need a MACRO or an INLINED 
> function in Python
> or something like a META-compiler phase transforming
>
> def access2d(v,i,j,dim1) :  # doesn't work on the l.h.s.
>   return v[i*dim1+j]
>
> access2d(v,i,j,dim1) = 7# at compile time, please
>
> to
>
> v[i*dim1+j]= 7  # this, by itself, is considered ugly (the FORTRAN way)

The list of lists approach works fine for what you're doing. I don't
think that a[r][c] is that much worse than a[r, c]. It's only when you
want to do something like a[:, c] that it breaks down. In any case,
your algorithm would work better with Python's set/dict/list types
than numpy arrays.

One of the reasons that it's faster to use lists than numpy arrays (as
you found out) is precisely because the N-dimensional array logic
complicates 1-dimensional processing. I've seen discussions in Cython
and numpy about lighter-weight 1-dimensional array types for this
reason.

The other reason that numpy arrays are slower for what you're doing is
that (just like the stdlib array type Steven referred to) they use
homogeneous types in a contiguous buffer and each element is not a
Python object in its own right until you access it with e.g. a[0].
That means that the numpy array has to create a new object every time
you index into it whereas the list can simply return a new reference
to an existing object. You can get the same effect with numpy arrays
by using dtype=object but I'd still expect it to be slower for this.
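
A quick (and unscientific) way to see the cost of that boxing is to
time single-element indexing on a list against a numpy array:

import timeit

setup = "import numpy; L = list(range(81)); A = numpy.arange(81)"
# The list returns a reference to an existing int object...
print(timeit.timeit("L[40]", setup=setup))
# ...while the array creates a new scalar object on each access.
print(timeit.timeit("A[40]", setup=setup))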


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to make this faster

2013-07-05 Thread Oscar Benjamin
On 5 July 2013 16:17, Helmut Jarausch  wrote:
>
> I've tried the following version
>
> def find_good_cell() :
>   Best= None
>   minPoss= 10
>   for r,c in Grid :
>     if  Grid[(r,c)] > 0 : continue

Sorry, I think what I meant was that you should have a structure
called e.g. Remaining which is the set of all (r, c) pairs that you
want to loop over here. Then there's no need to check on each
iteration whether or not Grid[(r, c)] > 0. When I said "sparse" I
meant that you don't need to set keys in Grid unless you actually have
a value there so the test "Grid[(r, c)] > 0" would look like "(r, c)
in Grid". Remaining is the set of all (r, c) pairs not in Grid that
you update incrementally with .add() and .remove().

Then this

   for r,c in Grid :
     if  Grid[(r,c)] > 0 : continue

becomes

for r, c in Remaining:

>     Sq_No= (r//3)*3+c//3
>     Possibilities= 9-len(Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No])
>     if ( Possibilities < minPoss ) :
>       minPoss= Possibilities
>       Best= (r,c)
>
>   if minPoss == 0 : Best=(-1,-1)
>   return Best
>
> All_digits= set((1,2,3,4,5,6,7,8,9))

All_digits= set(range(1, 10))

or

All_digits = {1,2,3,4,5,6,7,8,9}

>
> def Solve(R_Cells) :
>   if  R_Cells == 0 :
>     print("\n\n++ S o l u t i o n ++\n")
>     Print_Grid()
>     return True
>
>   r,c= find_good_cell()
>   if r < 0 : return False
>   Sq_No= (r//3)*3+c//3
>
>   for d in All_digits - (Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No]) :
>     # put d into Grid
>     Grid[(r,c)]= d
>     Row_Digits[r].add(d)
>     Col_Digits[c].add(d)
>     Sqr_Digits[Sq_No].add(d)
>
>     Success= Solve(R_Cells-1)
>
>     # remove d again
>     Grid[(r,c)]= 0
>     Row_Digits[r].remove(d)
>     Col_Digits[c].remove(d)
>     Sqr_Digits[Sq_No].remove(d)
>
>     if Success :
>       Zuege.append((d,r,c))
>       return True
>
>   return False
>
> which turns out to be as fast as the previous "dictionary only version".
> Probably,  set.remove is a bit slow

No it's not and you're not using it in your innermost loops anyway.
Probably the loop I referred to isn't your bottleneck.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: xslice idea | a generator slice

2013-07-11 Thread Oscar Benjamin
On 11 July 2013 15:54, Russel Walker  wrote:
> ...oh and here is the class I made for it.
>
> class xslice(object):
>     '''
>     xslice(seq, start, stop, step) -> generator slice
>     '''
>
>     def __init__(self, seq, *stop):

Wouldn't it be better if it has the same signature(s) as itertools.islice?

>         if len(stop) > 3:
>             raise TypeError("xslice takes at most 4 arguments")
>         elif len(stop) < 0:

How would len(stop) be negative?

> raise TypeError("xslice requires atleast 2 arguments")
> else:
> start, stop, step = (((0,) + stop[:2])[-2:] +  # start, stop
>  (stop[2:] + (1,))[:1])# step
> stop = min(stop, len(seq))
> self._ind = iter(xrange(start, stop, step))
> self._seq = seq
>
> def __iter__(self):
> return self
>
> def next(self):
> return self._seq[self._ind.next()]
>
>
>
> Although now that I think about it, it probably should've just been a simple 
> generator function.

Or you can use itertools.imap:

from itertools import imap

def xslice(sequence, start_or_stop, *args):
    indices = xrange(*slice(start_or_stop, *args).indices(len(sequence)))
    return imap(sequence.__getitem__, indices)


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: xslice idea | a generator slice

2013-07-11 Thread Oscar Benjamin
On 11 July 2013 17:21, Russel Walker  wrote:
> To confess, this is the second time I've made the mistake of trying to 
> implement generator like functionality of a builtin when there already is one 
> in itertools. Need to start studying that module a bit more I think. I'm 
> looking at the docs now and I see there are actually a couple of 
> isomethings().

Your xslice (or mine) would still be better than islice when the step
size is large; islice has to iterate over all the skipped elements
which could be wasteful if the input is indexable. Also islice doesn't
support negative values for start, stop or step which xslice does.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3: dict & dict.keys()

2013-07-24 Thread Oscar Benjamin
On Jul 24, 2013 7:25 AM, "Peter Otten" <__pete...@web.de> wrote:
>
> Ethan Furman wrote:
>
> > So, my question boils down to:  in Python 3 how is dict.keys() different
> > from dict?  What are the use cases?
>
> I just grepped through /usr/lib/python3, and could not identify a single
> line where some_object.keys() wasn't either wrapped in a list (or set,
> sorted, max) call, or iterated over.
>
> To me it looks like views are a solution waiting for a problem.

What do you mean? Why would you want to create a temporary list just to
iterate over it explicitly or implicitly (set, sorted, max,...)?

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3: dict & dict.keys()

2013-07-24 Thread Oscar Benjamin
On Jul 24, 2013 2:27 PM, "Peter Otten" <__pete...@web.de> wrote:
>
> Oscar Benjamin wrote:
>
> > On Jul 24, 2013 7:25 AM, "Peter Otten" <__pete...@web.de> wrote:
> >>
> >> Ethan Furman wrote:
> >>
> >> > So, my question boils down to:  in Python 3 how is dict.keys()
> >> > different
> >> > from dict?  What are the use cases?
> >>
> >> I just grepped through /usr/lib/python3, and could not identify a single
> >> line where some_object.keys() wasn't either wrapped in a list (or set,
> >> sorted, max) call, or iterated over.
> >>
> >> To me it looks like views are a solution waiting for a problem.
> >
> > What do you mean? Why would you want to create a temporary list just to
> > iterate over it explicitly or implicitly (set, sorted, max,...)?
>
> I mean I don't understand the necessity of views when all actual usecases
> need iterators. The 2.x iterkeys()/iteritems()/itervalues() methods didn't
> create lists either.

Oh, okay. I see what you mean.

>
> Do you have 2.x code lying around where you get a significant advantage by
> picking some_dict.viewkeys() over some_dict.iterkeys()?

No. I don't think I've ever used viewkeys. I noticed it once, didn't see an
immediate use and forgot about it but...

> I could construct
> one
>
> >>> d = dict(a=1, b=2, c=3)
> >>> e = dict(b=4, c=5, d=6)
> >>> d.viewkeys() & e.viewkeys()
> set(['c', 'b'])

that might be useful.

>
> but have not seen it in the wild.

> My guess is that most non-hardcore users don't even know about viewkeys().
> By the way, my favourite idiom to iterate over the keys in both Python 2
and
> 3 is -- for example -- max(some_dict) rather than
> max(some_dict.whateverkeys()).

Agreed.

Earlier I saw that I had list(some_dict) in some code. Not sure why but
maybe because it's the same in Python 2 and 3.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unexpected results comparing float to Fraction

2013-07-30 Thread Oscar Benjamin
On 29 July 2013 17:09, MRAB  wrote:
> On 29/07/2013 16:43, Steven D'Aprano wrote:
>>
>> Comparing floats to Fractions gives unexpected results:

You may not have expected these results but as someone who regularly
uses the fractions module I do expect them.

>> # Python 3.3
>> py> from fractions import Fraction
>> py> 1/3 == Fraction(1, 3)
>> False
>>
>> but:
>>
>> py> 1/3 == float(Fraction(1, 3))
>> True

Why would you do the above? You're deliberately trying to create a
float with a value that you know is not representable by the float
type. The purpose of Fractions is precisely that they can represent
all rational values, hence avoiding these problems.

When I use Fractions my intention is to perform exact computation. I
am very careful to avoid allowing floating point imprecision to sneak
into my calculations. Mixing floats and fractions in computation is
not IMO a good use of duck-typing.

>> I expected that float-to-Fraction comparisons would convert the Fraction
>> to a float, but apparently they do the opposite: they convert the float
>> to a Fraction:
>>
>> py> Fraction(1/3)
>> Fraction(6004799503160661, 18014398509481984)
>>
>> Am I the only one who is surprised by this? Is there a general rule for
>> which way numeric coercions should go when doing such comparisons?

I would say that if type A is a strict superset of type B then the
coercion should be to type A. This is the case for float and Fraction
since any float can be represented exactly as a Fraction but the
converse is not true.
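
For example the float 0.1 is really a dyadic rational and Fraction
recovers it exactly:

>>> from fractions import Fraction
>>> Fraction(0.1)
Fraction(3602879701896397, 36028797018963968)
>>> float(Fraction(0.1)) == 0.1
True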

> I'm surprised that Fraction(1/3) != Fraction(1, 3); after all, floats
> are approximate anyway, and the float value 1/3 is more likely to be
> Fraction(1, 3) than Fraction(6004799503160661, 18014398509481984).

Refuse the temptation to guess: Fraction(float) should give the exact
value of the float. It should not give one of the countably infinite
number of other possible rational numbers that would (under a
particular rounding scheme and the floating point format in question)
round to the same float. If that is the kind of equality you would
like to test for in some particular situation then you can do so by
coercing to float explicitly.

Calling Fraction(1/3) is a misunderstanding of what the fractions
module is for and how to use it. The point is to guarantee avoiding
floating point errors; this is impossible if you use floating point
computations to initialise Fractions.

Writing Fraction(1, 3) does look a bit ugly so my preferred way to
reduce the boiler-plate in a script that uses lots of Fraction
"literals" is to do:

from fractions import Fraction as F

# 1/3 + 1/9 + 1/27 + ...
limit = F('1/3') / (1 - F('1/3'))

That's not as good as dedicated syntax but with code highlighting it's
still quite readable.


Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Read STDIN as bytes rather than a string

2012-06-19 Thread Oscar Benjamin
On 19 June 2012 00:53, Jason Friedman  wrote:

> Which leads me to another question ... how can I debug these things?
>
> $ echo 'hello' | python3 -m pdb ~/my-input.py
> /home/jason/my-input.py(2)<module>()
> -> import sys
> (Pdb) *** NameError: name 'hello' is not defined
> --
> http://mail.python.org/mailman/listinfo/python-list
>

It's difficult to debug problems that are related to reading from stdin. I
don't know of any good way, so I just end up doing things like adding print
statements and checking the output rather than using a debugger. Tools like
hd can help with checking the input/output files that you're using. If
there were a debugger it would probably need to be one with a GUI - the
only one I know is spyder but I don't think that will allow you to pipe
anything on stdin.

One thing I wanted to say is that if your script is intended to work on
Windows you'll need to use msvcrt.setmode() to disable newline translation
on stdin (I haven't tested with Python 3.x, but it's definitely necessary
with Python 2.x). See Frazil's post here:
http://stackoverflow.com/questions/2850893/reading-binary-data-from-stdin
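
Something along these lines (an untested sketch) at the top of the
script should do it in Python 2.x:

import sys

if sys.platform == 'win32':
    import msvcrt, os
    # Put stdin into binary mode so '\r\n' is not translated.
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)

data = sys.stdin.read()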
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Faster way to map numpy arrays

2012-06-25 Thread Oscar Benjamin
On 25 June 2012 08:24, Stefan Behnel  wrote:

> Saurabh Kabra, 25.06.2012 05:37:
> > I have written a script to map a 2D numpy array(A) onto another array(B) of
> > different dimension. more than one element (of array A) are summed and
> > mapped to each element of array B.  To achieve this I create a list where I
> > store the index of array A to be mapped to array B. The list is the
> > dimension of array B (if one can technically say that) and each element is
> > a list of indices to be summed. Then I parse this list with a nested loop
> > and compute each element of array B.
>
> > Because of the nested loop and the big arrays the process takes a minute or
> > so. My question is: is there a more elegant and significantly faster way of
> > doing this in python?
>
> I'm sure there's a way to do this kind of transformation more efficiently
> in NumPy. I faintly recall that you can use one array to index into
> another, something like that might do the trick already. In any case, using
> a NumPy array also for the mapping matrix sounds like a straight forward
> thing to try.
>

I can't tell from the description of the problem what you're trying to do
but for the special case of summing along one axis of a numpy array of
dimension N to produce a new numpy array of dimension N-1, there is fast
builtin support in numpy:

>>> import numpy
>>> a = numpy.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.sum()   # sum of all elements
10
>>> a.sum(axis=1)  # sum of each row
array([3, 7])
>>> a.sum(axis=0)  # sum of each column
array([4, 6])

If your problem is not in this form, you can use numpy's fancy indexing to
convert it to this form, provided the number of elements summed from A is
the same for each element of B (i.e. if each element of B is the result of
summing exactly 10 elements chosen from A).


> But you might also want to take a look at Cython. It sounds like a problem
> where a trivial Cython implementation would seriously boost the
> performance.
>
> http://docs.cython.org/src/tutorial/numpy.html


>
> Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Faster way to map numpy arrays

2012-06-26 Thread Oscar Benjamin
On 26 June 2012 04:20, Saurabh Kabra  wrote:

> Thanks guys
>
> I implemented a numpy array with fancy indices and got rid of the list and
> the loops. The time to do the mapping improved ~10x. As a matter of fact,
> the number of elements in array A to be summed and mapped was different for
> each element in B (which was the reason I was using lists). But I solved
> that problem by simply adding zero elements to make a regular 3D numpy
> array out of the list.
>

Is that good enough, or are you looking for more speedup?

Padding with zeros to create the larger-than-needed array may be less
time-efficient (and is definitely less memory efficient) than extracting
each subarray in a loop. Consider the following:

>>> import numpy
>>> a = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> af = a.flatten()
>>> af
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> indices = [[1, 7, 8], [0, 1], [8, 4]]
>>> indices = [numpy.array(inds) for inds in indices]
>>> indices
[array([1, 7, 8]), array([0, 1]), array([8, 4])]
>>> for inds in indices:
... print inds, af[inds], af[inds].sum()
...
[1 7 8] [2 8 9] 19
[0 1] [1 2] 3
[8 4] [9 5] 14

Knowing the most efficient way depends on a number of things. The first
would be whether or not the operation you're describing is repeated. If,
for example, you keep doing this for the same indices but each time
changing the array A then you should try precomputing all the indices as a
list of numpy arrays (as shown above).

On the other hand, if you're repeating with the same matrix A but different
sets of indices, you'll need to think about how you are generating the
indices.
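
One more option, not mentioned above, is numpy.add.reduceat, which sums
contiguous segments of a flat array in a single call. A sketch, assuming
you can gather the elements of A so that each group to be summed is
contiguous:

import numpy

af = numpy.array([2, 8, 9, 1, 2, 9, 5])  # elements gathered so that each
                                         # group to be summed is contiguous
starts = numpy.array([0, 3, 5])          # start index of each group
print(numpy.add.reduceat(af, starts))    # [19  3 14]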

In my experience the fastest way to do something like this would be to use
cython as suggested above by Stefan.

Oscar.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Opening multiple Files in Different Encoding

2012-07-11 Thread Oscar Benjamin
On 11 July 2012 19:15,  wrote:

> On Tuesday, July 10, 2012 11:16:08 PM UTC+5:30, Subhabrata wrote:
> > Dear Group,
> >
> > I kept a good number of files in a folder. Now I want to read all of
> > them. They are in different formats and different encoding. Using
> > listdir/glob.glob I am able to find the list but how to open/read or
> > process them for different encodings?
> >
> > If any one can help me out.I am using Python3.2 on Windows.
> >
> > Regards,
> > Subhabrata Banerjee.
> Dear Group,
>
> No generally I know the glob.glob or the encodings as I work a lot on
> non-ASCII stuff, but I recently found an interesting issue, suppose there
> are .doc,.docx,.txt,.xls,.pdf files with different encodings.


Some of the formats you have listed are not text-based. What do you mean by
the encoding of e.g. a .doc or .xls file?

My understanding is that these are binary files. You won't be able to read
them without the help of a special module (I don't know of one that can).


> 1) First I have to determine on the fly the file type.
> 2) I can not assign encoding="..." whatever be the encoding I have to read
> it.
>

Perhaps you just want to open the file as binary? The following will read
the contents of any file binary or text regardless of encoding or anything
else:

f = open('spreadsheet.xls', 'rb')
data = f.read()   # returns binary data rather than text
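
If you also need to handle plain text files alongside the binary ones, one
possible approach (just a sketch, assuming the type can be guessed from the
file extension) is to dispatch on the filename:

import os

def read_any(filename):
    ext = os.path.splitext(filename)[1].lower()
    if ext == '.txt':
        # A text file: you still need to know (or guess) its encoding.
        with open(filename, 'r', encoding='utf-8') as f:
            return f.read()
    else:
        # .doc, .xls, .pdf etc. are binary formats; return the raw bytes.
        with open(filename, 'rb') as f:
            return f.read()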


>
> Any idea. Thinking.
>
> Thanks in Advance,
> Regards,
> Subhabrata Banerjee.
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: properly catch SIGTERM

2012-07-20 Thread Oscar Benjamin
What about Kushal's suggestion above? Does the following work for you?

signal.signal(signal.SIGTERM, my_SIGTERM_handler)
signal.siginterrupt(signal.SIGTERM, flag=False)

According to the siginterrupt docs (
http://docs.python.org/library/signal.html)
"""
Change system call restart behaviour: if flag is False, system calls will
be restarted when interrupted by signal signalnum, otherwise system calls
will be interrupted. Returns nothing. Availability: Unix
'"""

Cheers,
Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: default repr?

2012-07-22 Thread Oscar Benjamin
On 22 July 2012 23:48, Dan Stromberg  wrote:

>
> If a class has defined its own __repr__ method, is there a way of getting
> the default repr output for that class anyway?
>

For new style classes you can just call object.__repr__ e.g.:

In [1]: class A(object):
   ...: pass
   ...:

In [2]: class B(object):
   ...: def __repr__(self):
   ...: return 'foo'
   ...:

In [3]: a = A()

In [4]: b = B()

In [5]: repr(a)
Out[5]: '<__main__.A object at 0x2136b10>'

In [6]: repr(b)
Out[6]: 'foo'

In [7]: object.__repr__(b)
Out[7]: '<__main__.B object at 0x2136c10>'

Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: default repr?

2012-07-23 Thread Oscar Benjamin
On 23 July 2012 01:24, Steven D'Aprano  wrote:

> On Mon, 23 Jul 2012 08:54:00 +1000, Chris Angelico wrote:
>
> > On Mon, Jul 23, 2012 at 8:48 AM, Dan Stromberg 
> > wrote:
> >> If a class has defined its own __repr__ method, is there a way of
> >> getting the default repr output for that class anyway?
>
> If the class, call it C, is a subclass of some other class (or classes),
> then there is also the repr of the parent. You can get to that by calling
> parent.__repr__(instance), although there are some subtleties.
>
> In Python 2, there are old-style or classic classes that don't inherit
> from anything else. I don't believe there is any way to get the repr of a
> classic class with no __repr__ method *except* from an instance with no
> __repr__ method. So the answer for C below will be No:
>
> # Python 2.x
> class C:
> def __repr__(self):
> return "C()"
>

You could always implement repr yourself:

def repr_oldstyle(obj):
    mod = obj.__class__.__module__
    cls = obj.__class__.__name__
    mem = '0x' + hex(id(obj))[2:].zfill(8).upper()
    return '<{0}.{1} instance at {2}>'.format(mod, cls, mem)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: argparse limitations

2012-07-27 Thread Oscar Benjamin
On 27 July 2012 15:26, Benoist Laurent  wrote:

> Hi,
>
> I'm implementing a tool in Python.
> I'd like this tool to behave like a standard unix tool, as grep for
> example.
> I chose to use the argparse module to parse the command line and I think
> I'm getting into several limitations of this module.
>
> > First Question.
> How can I configure the ArgumentParser to allow the user to give
> either an input file or to pipe the output from another program?
>
> $ mytool.py file.txt

$ cat file.txt | mytool.py
>

A better way to do that last line is:
$ mytool.py < file.txt

To answer the question, just make the first argument optional defaulting to
None. Then you can do:
if file1 is None:
    file1 = sys.stdin
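
A minimal sketch of that approach (argument and variable names are
illustrative):

import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('file1', nargs='?', default=None)  # optional positional
args = parser.parse_args()

if args.file1 is None:
    infile = sys.stdin          # input was piped or redirected
else:
    infile = open(args.file1)   # input file named on the command line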


>
>
> > Second Question.
> How can I get the nargs options working with subparser?
> Cause basically if I've got a positional argument with nargs > 1, then
> the subparsers are recognized as values for the positional argument.
>
> $ mytool.py file1.txt file2.txt foo
>
> Here foo is a command I'd like to pass to mytool but argparse considers
> it's another input file (as are file1.txt and file2.txt).
>

I haven't used subparsers in argparse but I imagine that you would call it
like:
$ mytool.py foo file1.txt file2.txt


Cheers,
Oscar.


>
> Any help would be appreciated.
> Ben.
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: argparse limitations

2012-07-31 Thread Oscar Benjamin
On Jul 31, 2012 10:32 AM, "Benoist Laurent"  wrote:
>
> Well sorry about that but it seems I was wrong.
> It was Friday evening and I guess I've not been careful.
>
> Actually when you specify nargs="?",  the doc says "One argument will be
consumed from the command line if possible, and produced as a single item".
> So you can't pass several arguments to the program.

Right below that in the docs it explains about using nargs='*' and
nargs='+'. One of those will do what you want.
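
For example (a sketch with illustrative names):

import argparse

parser = argparse.ArgumentParser()
# '*' accepts zero or more filenames; '+' would require at least one.
parser.add_argument('filenames', nargs='*')
args = parser.parse_args()
print(args.filenames)  # e.g. ['foo.txt', 'bar.txt']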

Oscar.

>
> So, to rephrase the question, how can I get an argument parser that parses
> the command-line just as Unix grep would do?
> i.e.
>
> $ echo 42 > foo.txt
> $ echo 172 >> foo.txt
> $ cp foo.txt bar.txt
> $
> $ grep 42 foo.txt
> 42
> $ grep 42 foo.txt bar.txt
> foo.txt:42
> bar.txt:42
> $ cat foo.txt | grep 42
> 42
> $ grep -c 42 foo.txt
> 1
>
>
> Cheers,
> Ben
>
>
>
>
> On Jul 27, 2012, at 7:08 PM, Benoist Laurent wrote:
>
>>
>>
>> Yes basically looks like you get it.
>> I have to further test it but my first impression is that it's correct.
>>
>> So actually the point was to use nargs="?".
>>
>> Thank you very much.
>> Ben
>>
>>
>>
>> On Jul 27, 2012, at 5:44 PM, Peter Otten wrote:
>>
>>> Benoist Laurent wrote:
>>>
 I'm implementing a tool in Python.

 I'd like this tool to behave like a standard unix tool, as grep for

 example. I chose to use the argparse module to parse the command line
and

 I think I'm getting into several limitations of this module.


> First Question.

 How can I configure the ArgumentParser to allow the user to give

 either an input file or to pipe the output from another program?


 $ mytool.py file.txt

 $ cat file.txt | mytool.py
>>>
>>>
>>> $ echo alpha > in.txt
>>> $ cat in.txt | ./mytool.py
>>> ALPHA
>>> $ cat in.txt | ./mytool.py - out.txt
>>> $ cat out.txt
>>> ALPHA
>>> $ ./mytool.py in.txt
>>> ALPHA
>>> $ ./mytool.py in.txt out2.txt
>>> $ cat out2.txt
>>> ALPHA
>>> $ cat ./mytool.py
>>> #!/usr/bin/env python
>>> assert __name__ == "__main__"
>>>
>>> import argparse
>>> import sys
>>>
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("infile", nargs="?", type=argparse.FileType("r"),
>>> default=sys.stdin)
>>> parser.add_argument("outfile", nargs="?", type=argparse.FileType("w"),
>>> default=sys.stdout)
>>> args = parser.parse_args()
>>>
>>> args.outfile.writelines(line.upper() for line in args.infile)
>>>
>>> Is that good enough?
>>>
>>>
>>> --
>>> http://mail.python.org/mailman/listinfo/python-list
>>>
>>
>> --
>> Benoist Laurent
>> Laboratoire de Biochimie Theorique / CNRS UPR 9080
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie
>> F-75005 Paris
>> Tel. +33 [0]1 58 41 51 67 or +33 [0]6 21 64 50 56
>>
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
>
> --
> Benoist Laurent
> Laboratoire de Biochimie Theorique / CNRS UPR 9080
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie
> F-75005 Paris
> Tel. +33 [0]1 58 41 51 67 or +33 [0]6 21 64 50 56
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: argparse limitations

2012-07-31 Thread Oscar Benjamin
On 31 July 2012 12:03, Benoist Laurent  wrote:

> Finally.
>
> The code I proposed doesn't work in this case: if you add any positional
> argument to one of the subparsers, then the parsing doesn't work anymore.
> The reason seems to be that argparse thinks the last argument of the first
> parser is the last but one argument.
> Hence, if a subparser takes some arguments, it fails.
>
> Example: if the "-n" argument of the foo parser is set mandatory (so
> becomes "n" instead of "-n")
>
> python toto.py foo.txt bar.txt foo 10
> usage: toto.py [-h] [fname [fname ...]] command ...
> toto.py: error: argument command: invalid choice: '10' (choose from 'foo',
> 'bar')
>

What about:

$ python toto.py foo.txt bar.txt foo -n 10

Note that contrary to what you said above, your program does not work like
a "standard unix tool". A standard command line program to do what you want
would normally look like

$ python toto.py foo -n 10 foo.txt bar.txt

or perhaps

$ python toto.py foo foo.txt bar.txt -n 10

so that the algorithm for differentiating the command 'foo' from the
filenames is well defined. How do you propose that your user enters a
filename 'foo'?

Oscar.


>
> Any solution?
>
> Cheers,
> Ben
>
>
>
> On Jul 31, 2012, at 12:37 PM, Benoist Laurent wrote:
>
> Really sorry about that.
>
> So, for the community, below is the full code for a tool that behaves like
> a Unix standard tool.
> It takes as arguments the files to process and a command.
>
> """Just to setup a command-line parser that acts just like a unix
> standard tool."""
>
> import argparse
> import sys
>
> def define_options():
>     parser = argparse.ArgumentParser()
>     parser.add_argument("fname", help="input file", nargs="*")
>
>     # create subparsers
>     subparsers = parser.add_subparsers(dest="cmd", metavar="command")
>
>     # create the parser for the "foo" command
>     get_parser = subparsers.add_parser("foo", help="foo help")
>     get_parser.add_argument("-n", help="number of foo to print",
>                             type=int, default=10)
>
>     # create the parser for the "bar" command
>     sum_parser = subparsers.add_parser("bar", help="bar help")
>
>     return parser
>
>
> if __name__ == '__main__':
>     args = define_options().parse_args()
>
>     if not args.fname:
>         content = sys.stdin.read()
>         # do something
>     else:
>         for fname in args.fname:
>             with(open(fname, "rt")) as f:
>                 content = f.read()
>                 # do somet
>
>
> Benoist
>
>
>
> On Jul 31, 2012, at 11:55 AM, Oscar Benjamin wrote:
>
> [snip]

Re: profiling and optimizing

2012-07-31 Thread Oscar Benjamin
On 31 July 2012 13:13, Rita  wrote:

> hello,
>
> I recently inherented a large python process and everything is lovely. As
> a learning experience I would like to optimize the code so I ran it thru
> the profiler
>
> python -m cProfile myscript.py
>
> It seems majority of the time is taking in the deep copy but that seems to
> come from a function (or functions) in the code. Is there a way to optimize
> that? perhaps have a C implementation of the deep copy? Would that
> be feasible?
>

I think you'll need to provide more information to get an answer to your
question. Why are you copying? What are you copying? Do you need a
deep-copy?

I don't really know what you're doing but my first approach would be to try
and reduce or eliminate the deep copies rather than implement them in c.

Oscar.


>
>
>
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: argparse limitations

2012-07-31 Thread Oscar Benjamin
On 31 July 2012 13:51, Benoist Laurent  wrote:

>
> On Jul 31, 2012, at 1:45 PM, Oscar Benjamin wrote:
>
>
>
> On 31 July 2012 12:03, Benoist Laurent  wrote:
>
>> Finally.
>>
>> The code I proposed doesn't work in this case: if you add any positional
>> argument to one of the subparsers, then the parsing doesn't work anymore.
>> The reason seems to be that argparse thinks the last argument of the
>> first parser is the last but one argument.
>> Hence, if a subparser takes some arguments, it fails.
>>
>> Example: if the "-n" argument of the foo parser is set mandatory (so
>> becomes "n" instead of "-n")
>>
>> python toto.py foo.txt bar.txt foo 10
>> usage: toto.py [-h] [fname [fname ...]] command ...
>> toto.py: error: argument command: invalid choice: '10' (choose from
>> 'foo', 'bar')
>>
>
> What about:
>
> $ python toto.py foo.txt bar.txt foo -n 10
>
> Note that contrary to what you said above, your program does not work like
> a "standard unix tool". A standard command line program to do what you want
> would normally look like
>
>
> You're right.
> But then, using argparse, I would have to add the same argument to all my
> subparsers since argparse does the work sequentially: once it recognizes
> the start of a subparser, everything that follows has to be an argument of
> this subparser.
> Hence, arguments (therefore options) from the main parser are not
> recognized anymore.
>

If the parsing after the subcommand is to be the same for each subcommand,
then don't use subparsers. You could just make the first argument be the
command name and use one parser for everything.

If the parsing is supposed to be different for different subcommands then
use subparsers and add the files argument to each subparser; you can make a
function to add the common arguments and options if you don't want to
duplicate the code.
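
A sketch of that second approach (the names here are illustrative):

import argparse

def add_common_arguments(parser):
    # Arguments and options shared by every subcommand.
    parser.add_argument('fname', nargs='*', help='input files')

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='cmd', metavar='command')

foo_parser = subparsers.add_parser('foo')
foo_parser.add_argument('n', type=int)  # the mandatory argument from above
add_common_arguments(foo_parser)

bar_parser = subparsers.add_parser('bar')
add_common_arguments(bar_parser)

args = parser.parse_args()  # e.g. toto.py foo 10 foo.txt bar.txt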

> Well I guess it is an intrinsic limitation: I think it's quite natural that
> the user can't enter a filename named as a command.
>

There is an intrinsic limitation on any parser that it must be possible to
determine the targets of the arguments uniquely. If you want the parser to
scan through and take the first argument matching 'foo' or 'bar' and parse
the remaining arguments accordingly then you already have your solution. It
just won't work if the user wants to pass in a file called 'foo' or 'bar'
(maybe that's acceptable, though).

The standard way, however, is to have a parser that takes the first
non-option argument as a subcommand name and parses the remaining arguments
according to that subcommand. Your command line users are more likely to be
able to understand how to use the program if it works that way.

Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: looking for a neat solution to a nested loop problem

2012-08-06 Thread Oscar Benjamin
Are you familiar with the itertools module?

itertools.product is designed for this purpose:
http://docs.python.org/library/itertools#itertools.product
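
For example, a minimal sketch (with do_something standing in for the real
work in your loop):

from itertools import product

def do_something(i, j):
    pass  # stand-in for the real work

# Equivalent to the nested loops, as a single flat loop:
for i, j in product(range(100), range(100)):
    do_something(i, j)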

Oscar.

On 6 August 2012 16:52, Tom P  wrote:

> consider a nested loop algorithm -
>
> for i in range(100):
> for j in range(100):
> do_something(i,j)
>
> Now, suppose I don't want to use i = 0 and j = 0 as initial values, but
> some other values i = N and j = M, and I want to iterate through all 10,000
> values in sequence - is there a neat python-like way to this? I realize I
> can do things like use a variable for k in range(1): and then derive
> values for i and j from k, but I'm wondering if there's something less
> clunky.
> --
> http://mail.python.org/mailman/listinfo/python-list
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: looking for a neat solution to a nested loop problem

2012-08-06 Thread Oscar Benjamin
On 6 August 2012 16:52, Tom P  wrote:

> consider a nested loop algorithm -
>
> for i in range(100):
> for j in range(100):
> do_something(i,j)
>
> Now, suppose I don't want to use i = 0 and j = 0 as initial values, but
> some other values i = N and j = M, and I want to iterate through all 10,000
> values in sequence - is there a neat python-like way to this? I realize I
> can do things like use a variable for k in range(1): and then derive
> values for i and j from k, but I'm wondering if there's something less
> clunky.
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Are you familiar with the itertools module?

itertools.product is designed for this purpose:
http://docs.python.org/library/itertools#itertools.product
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: looking for a neat solution to a nested loop problem

2012-08-06 Thread Oscar Benjamin
On 6 August 2012 18:14, Tom P  wrote:

> On 08/06/2012 06:18 PM, Nobody wrote:
>
>> On Mon, 06 Aug 2012 17:52:31 +0200, Tom P wrote:
>>
>>  consider a nested loop algorithm -
>>>
>>> for i in range(100):
>>>   for j in range(100):
>>>   do_something(i,j)
>>>
>>> Now, suppose I don't want to use i = 0 and j = 0 as initial values, but
>>> some other values i = N and j = M, and I want to iterate through all
>>> 10,000 values in sequence - is there a neat python-like way to this?
>>>
>>
>> for i in range(N,N+100):
>> for j in range(M,M+100):
>> do_something(i,j)
>>
>> Or did you mean something else?
>>
>
> no, I meant something else ..
>
>   j runs through range(M, 100) and then range(0,M), and i runs through
> range(N,100) and then range(0,N)
>
> .. apologies if I didn't make that clear enough.


How about range(N, 100) + range(0, N)?

Example (Python 2.x):

>>> range(3, 10)
[3, 4, 5, 6, 7, 8, 9]
>>> range(0, 3)
[0, 1, 2]
>>> range(3, 10) + range(0, 3)
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2]

In Python 3.x you'd need to do list(range(...)) + list(range(...)) or use
itertools.chain.
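
For example, a sketch of the wrapped iteration in Python 3.x (N, M and
do_something are illustrative stand-ins from the earlier posts):

from itertools import chain

N, M = 42, 17  # illustrative starting values

def do_something(i, j):
    pass       # stand-in for the real work

for i in chain(range(N, 100), range(0, N)):      # i: N..99 then 0..N-1
    for j in chain(range(M, 100), range(0, M)):  # j: M..99 then 0..M-1
        do_something(i, j)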

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Pickle file and send via socket

2012-08-08 Thread Oscar Benjamin
On 8 August 2012 16:07, lipska the kat  wrote:

> On 08/08/12 14:50, S.B wrote:
>
>> On Wednesday, August 8, 2012 3:48:43 PM UTC+3, lipska the kat wrote:
>>
>>> On 06/08/12 14:32, S.B wrote:
>>>
>>>
> [snip]
>
>
>  Thank you so much !
>> The examples are very helpful.
>> What happens if I have a regular text file I want to send via the network.
>> Do I need to read the file and then dump it into the "stargate" file
>> object?
>>
>
Lipska's example code is a good demonstration of how to use sockets and how
to pickle/unpickle objects in order to send them as bytes over a socket.

However, I don't think you really want to pickle the file object (i.e. you
don't want to use the pickle module). It seems that you want to send the
content of the file, which already is bytes, to the other computer. If
that's the case then you can read the bytes of a file with:

filename = 'myfile.txt'
fileobject = open(filename, 'rb')  # We use 'rb' to get bytes, not unicode
filecontents = fileobject.read()   # Reads the whole file as a bytes object
fileobject.close()

stargate.write(filecontents)   # Actually send the data

In the server program, you can do:

filecontents_received = stargate.read()

Now you have the bytes object and can write it to a file with

filename_server = 'myserverfile.txt'
fileobject_server = open(filename_server, 'wb')
fileobject_server.write(filecontents_received)
fileobject_server.close()

Notice the difference between the fileobject variable and the filecontents
variable. The subject of your post suggests that you want to send the
fileobject variable (which doesn't make much sense) but your last message
suggests that you're more interested in transferring the filecontents
variable.

This has already been said but if you are planning to have this server
program connected to the internet, don't use the pickle module on the data
that you receive. It would make your server vulnerable to being hacked.




> Well according to the documentation at
>
> http://docs.python.org/py3k/tutorial/inputoutput.html#reading-and-writing-files
>
> it should be straightforward to read and write pickled files
> Not sure why you want to pickle a text file over the network when you
> could just stream it between ports !
>
> however ...
>
> I'm currently getting a Unicode decode error on the first byte in the
> stream when it gets to the other end, no idea why so I guess I have to
> continue searching, read the documentation above and see if you can figure
> it out, that's what I'm doing.


Lipska, are you getting the UnicodeDecodeError during the call to
pickle.load? If stargate is opened with the binary flag there shouldn't be
any decoding during the reading of the socket. pickle expects its input to
be binary format so I guess it should only decode when trying to unpickle a
string (I assume these are simply encoded in utf-8). Try just using
stargate.read to see what data arrives. pickled data is often almost human
readable so you might be able to make an educated guess at what it's trying
to do.

Also, the code you posted looks good, but I have one suggestion. Python has
a nicer way of formatting strings using the str.format method. With this
you can replace

"I'm a " + self.name + " class spaceship and I have " + str(self.engines) +
" engines")

with

"I'm a {0} class spaceship and I have {1} engines".format(self.name,
self.engines)

The advantages of this method are that
1) You keep the template string together so it's easier to read.
2) You don't need to call str() on any of your arguments.
3) It's also more efficient (in some cases).

You can also use named parameters in the format string:

"I'm a {name} class spaceship and I have {number} engines".format(name=
self.name, number=self.engines)

There is also the older, deprecated (but not about to be removed) method:

"I'm a %s class spaceship and I have %i engines" % (self.name, self.engines)

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: save dictionary to a file without brackets.

2012-08-09 Thread Oscar Benjamin
On Aug 9, 2012 9:17 PM,  wrote:
>
> Hi,
> I have a dict() unique
> like this
> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
> and i want to print to a file without the brackets comas and semicolon in
order to obtain something like this?
> 4 5 1
> 5 4 1
> 4 4 2
> 2 3 1
> 4 3 2
> Any ideas?
> Thanks in advance

How's this?

from __future__ import print_function

output = open("out.txt", "w")

for (a, b), c in d.items():  # d is your dictionary
    print(a, b, c, file=output)

output.close()

Oscar.
> --
> http://mail.python.org/mailman/listinfo/python-list
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: save dictionary to a file without brackets.

2012-08-09 Thread Oscar Benjamin
> What do you think? is there a way to speed up the process?
> Thanks
> Giuseppe

Which part is slow? How slow is it?

A simple test to find the slow part of your code is to print messages
between the commands so that you can see how long it takes between each
message.

Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: save dictionary to a file without brackets.

2012-08-09 Thread Oscar Benjamin
On Aug 10, 2012 12:34 AM, "Giuseppe Amatulli" 
wrote:
>
> Ciao,
> is 12 minutes for 5000x5000 pixel image. half of the time is for
> reading the arrays.
> and the other half for making the loop.
> I will try again to incorporate the mask action in the loop
> and
> read the image line by line.
> Thanks
> ciao

That does seem slow. I'm sure it can be a lot faster than that.

Did you also write the code for reading the arrays? The loop can certainly
be made faster but if you can't make the array reading faster there's not
much point spending a long time trying to speed up the rest.

Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: New internal string format in 3.3

2012-08-19 Thread Oscar Benjamin
On 19 August 2012 15:09,  wrote:

> I can not give you more numbers than those I gave.
> As an end user, I noticed and experienced that my random tests
> are always slower in Py3.3 than in Py3.2 on my Windows platform.
>

Do the problems have a significant impact on any real application (rather
than random tests)?

Any significant change in implementation such as this is likely to have
both positive and negative performance costs. The important thing is how it
affects a real application as a whole.


>
> It is up to you, the core developers to give an explanation
> about this behaviour.


Unless others are unable to reproduce your observations.

If there is a big performance hit for text heavy applications then it's
worth reporting but you should focus your energy on distilling a
*meaningful* test case (rather than ranting about Americans, unicode,
latin-1 and so on).

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: New internal string format in 3.3

2012-08-19 Thread Oscar Benjamin
On Aug 19, 2012 5:22 PM,  wrote:
>
> Py 3.2.3
> >>> timeit.timeit("('aœ€'*100).replace('a', 'œ€é')")
> 4.99396356635981
>
> Py 3.3b2
> >>> timeit.timeit("('aœ€'*100).replace('a', 'œ€é')")
> 7.560455708007855
>
> Maybe not so demonstrative. It shows at least that we
> are far away from the 10-30% "announced".
>
> >>> 7.56 / 5
> 1.512
> >>> 5 / (7.56 - 5) * 100
> 195.312503

Maybe the problem is that your understanding of a percentage differs from
that of others.

I make that a 51% increase. I don't really understand what your 195 figure
is demonstrating.

Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Abuse of Big Oh notation

2012-08-20 Thread Oscar Benjamin
On Sun, 19 Aug 2012 16:42:03 -0700, Paul Rubin  wrote:

Steven D'Aprano  writes:
> Of course *if* k is constant, O(k) is constant too, but k is not
> constant. In context we are talking about string indexing and slicing.
> There is no value of k, say, k = 2, for which you can say "People will
> sometimes ask for string[2] but never ask for string[3]". That is
> absurd.

The context was parsing, e.g. recognizing a token like "a" or "foo" in a
human-written chunk of text.  Occasionally it might be "sesquipidalian"
or some even worse outlier, but one can reasonably put a fixed and
relatively small upper bound on the expected value of k.  That makes the
amortized complexity O(1), I think.


No it doesn't. It is still O(k). The point of big O notation is to
understand the asymptotic behaviour of one variable as it becomes 
large because of changes in other variables. If k is small then you 
can often guess that O(k) is small. To say that an operation is O(k), 
however, is a statement about what happens when k is big (and is not 
refuted by saying that k is typically not big).


Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: Abuse of Big Oh notation

2012-08-20 Thread Oscar Benjamin
On 20 August 2012 17:01, Paul Rubin  wrote:

> Oscar Benjamin  writes:
> > No it doen't. It is still O(k). The point of big O notation is to
> > understand the asymptotic behaviour of one variable as it becomes
> > large because of changes in other variables.
>
> Actually, two separate problems got mixed together late at night.  In
> neither case is k an independent variable that ranges over all possible
> values.  In both cases it is selected or observed by measurement (i.e.
> it is a dependent variable determined by something that is itself not
> independent).
>
> 1) Access in a rope: here, k is basically determined by the pointer size
> of the computer, which in CPython (the implementation we're discussing)
> the pointer size is 4 or 8 bytes (constants) in all instances AFAIK.  k
> should be a big enough that the pointer and allocation overhead is small
> compared to bloating the strings with UCS-2 or UCS-4, and small enough
> to not add much scan time.  It seems realistic to say k<=128 for this
> (several times smaller is probably fine).  128 is of course a constant
> and not a variable.  We are not concerned about hypothetical computers
> with billion bit pointers.
>

Okay, I see what you mean. If k is a hard-coded constant then it's not
unreasonable to say that O(k) is constant time in relation to the input
data (however big k is).

Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Class.__class__ magic trick help

2012-08-21 Thread Oscar Benjamin
On Mon, 20 Aug 2012 21:17:15 -0700 (PDT), Massimo Di Pierro 
 wrote:

Consider this code:




class SlowStorage(dict):
    def __getattr__(self,key):
        return self[key]
    def __setattr__(self,key):
        self[key]=value


class FastStorage(dict):
    def __init__(self, __d__=None, **kwargs):
        self.update(__d__,**kwargs)
    def __getitem__(self,key):
        return self.__dict__.get(key,None)
    def __setitem__(self,key,value):
        self.__dict__[key] = value
    def __delitem__(self,key):
        delattr(self,key)
    def __copy__(self):
        return Storage(self)
    def __nonzero__(self):
        return len(self.__dict__)>0
    def pop(self,key,default=None):
        if key in self:
            default = getattr(self,key)
            delattr(self,key)
        return default
    def clear(self):
        self.__dict__.clear()
    def __repr__(self):
        return repr(self.__dict__)
    def keys(self):
        return self.__dict__.keys()
    def values(self):
        return self.__dict__.values()
    def items(self):
        return self.__dict__.items()
    def iterkeys(self):
        return self.__dict__.iterkeys()
    def itervalues(self):
        return self.__dict__.itervalues()
    def iteritems(self):
        return self.__dict__.iteritems()
    def viewkeys(self):
        return self.__dict__.viewkeys()
    def viewvalues(self):
        return self.__dict__.viewvalues()
    def viewitems(self):
        return self.__dict__.viewitems()
    def fromkeys(self,S,v=None):
        return self.__dict__.fromkeys(S,v)
    def setdefault(self, key, default=None):
        try:
            return getattr(self,key)
        except AttributeError:
            setattr(self,key,default)
            return default
    def clear(self):
        self.__dict__.clear()
    def len(self):
        return len(self.__dict__)
    def __iter__(self):
        return self.__dict__.__iter__()
    def has_key(self,key):
        return key in self.__dict__
    def __contains__(self,key):
        return key in self.__dict__
    def update(self,__d__=None,**kwargs):
        if __d__:
            for key in __d__:
                kwargs[key] = __d__[key]
        self.__dict__.update(**kwargs)
    def get(self,key,default=None):
        return getattr(self,key) if key in self else default




>>> s=SlowStorage()
>>> a.x=1  ### (1)
>>> a.x### (2)
1 # ok
>>> isinstance(a,dict)
True # ok
>>> print dict(a)
{'x':1} # ok (3)


Try:

a.items()


What does that show?





>>> s=FastStorage()
>>> a.x=1  ### (4)
>>> a.x### (5)
1 # ok
>>> isinstance(a,dict)
True # ok
>>> print dict(a)
{} # not ok (6)



Lines (4) and (5) are about 10x faster than lines (1) and (2). I like
FastStorage better but while (3) behaves ok, (6) does not behave as I
want.

I intuitively understand why FastStorage cannot be cast into dict
properly.

What I do not know is how to make it do the casting properly without
losing the 10x speedup of FastStorage over SlowStorage.

Any idea?


I don't really understand what you're trying to do but since you didn't
add the __setattr__ method to FastStorage the item is not added to
the dictionary when you do a.x = 1.


Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: Class.__class__ magic trick help

2012-08-21 Thread Oscar Benjamin
On 21 August 2012 13:52, Massimo Di Pierro wrote:

> On Aug 21, 2:40 am, Oscar Benjamin  wrote:
> > On Mon, 20 Aug 2012 21:17:15 -0700 (PDT), Massimo Di Pierro
> >  wrote:
> >
> > [snip]
> >>> a.items()
> [('x', 1)]
>
> all the APIs work as expected except casting to dict.
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Sorry, I see what you're doing now. Because you've subclassed dict the dict
constructor is not using any of the python methods you have defined to
create a new dict. It is copying directly from the builtin instance that
your instance is wrapping, but you haven't actually passed your values on
to the superclass so it hasn't stored them in the builtin data structure.

Either subclass object so that your methods are called, or do something
like:

def __setitem__(self,key,value):
self.__dict__[key] = value
dict.__setitem__(self, key, value)

I still don't see the point of this but that should work.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Class.__class__ magic trick help

2012-08-21 Thread Oscar Benjamin
On 21 August 2012 14:50, Massimo Di Pierro wrote:

> Hello Oscar,
>
> thanks for your help but your proposal of adding:
>
> def __setitem__(self,key,value):
>self.__dict__[key] = value
>dict.__setitem__(self, key, value)
>
> does not help me.
>
> What I have today is a class that works like SlowStorage. I want to
> replace it with NewStorage because it is 10x faster. That is the only
> reason. NewStorage does everything I want and all the APIs work like
> SlowStorage except casting to dict.
>
> By defining __setitem__ as you propose, you solve the casting to dict
> issue but you have two unwanted effects: each key,value is store twice
> (in different places), accessing the elements becomes slower the
> SlowStprage which is my problem in the first place.
>
> The issue for me is understanding how the casting dict(obj) works and
> how to change its behavior so that is uses methods exposed by obj to
> do the casting, if all possible.
>

Then you have two options:
1) subclass object instead of dict - you're not using any of the features
of the dict superclass and the fact that it is a superclass is confusing
the dict() constructor.
2) use a different "cast" e.g. d = dict(a.items())

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Class.__class__ magic trick help

2012-08-21 Thread Oscar Benjamin
On 21 August 2012 16:19, Oscar Benjamin wrote:

>
> On Aug 21, 2012 3:42 PM, "Massimo DiPierro" 
> wrote:
> >
> > Thanks again Oscar. I cannot do that. I have tight constraints. I am not
> at liberty to modify the code that uses the class. The exposed API cannot
> change, including a.x, dict(a), and isinstance(a,dict).
> >
> > My goal is to change the definition of this class to make it faster.
> >
> > Where is in the Python source code is the casting to dict defined? Where
> can I try understand what it does?
>
> help(dict)
>
> There is no cast, there is only the dict constructor. If the dict
> constructor finds a dict instance (including from a subclass) then it will
> efficiently create the new dict from the old dict's underlying data without
> calling your methods. If you want dict(a) and isinstance(a, dict) to work then
> you need to tell the dict superclass to store the data using
> dict.__setitem__.
>
> You have three options:
> 1) use SlowStorage
> 2) duplicate the data in self and self.__dict__ (this will probably end up
> slowing down FastStorage)
> 3) change the requirement to isinstance(Mapping)
>
> Oscar
>
Okay, there is a way to solve your problem:

>>> class Storage(dict):
... def __init__(self, *args, **kwargs):
... dict.__init__(self, *args, **kwargs)
... self.__dict__ = self
...
>>> s = Storage()
>>> s
{}
>>> s.x = 1
>>> s
{'x': 1}
>>> dict(s)
{'x': 1}
>>> isinstance(s, dict)
True

But it's a stupid idea unless you know that the keys of your dict will
*never* have any of the following values:

>>> dir({})
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__',
'__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items',
'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem',
'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']

To see what goes wrong:

>>> s['items'] = [1,2,3]
>>> s.items()
Traceback (most recent call last):
  File "", line 1, in 
TypeError: 'list' object is not callable

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guarding arithmetic

2012-08-23 Thread Oscar Benjamin
On 23 August 2012 10:05, Mark Carter  wrote:

> Suppose I want to define a function "safe", which returns the argument
> passed if there is no error, and 42 if there is one. So the setup is
> something like:
>
> def safe(x):
># WHAT WOULD DEFINE HERE?
>
> print safe(666) # prints 666
> print safe(1/0) # prints 42
>
> I don't see how such a function could be defined. Is it possible?
>

It isn't possible to define a function that will do this as the function
will never be called if an exception is raised while evaluating its
arguments. Depending on what your real problem is, a context-manager might
do what you want:

>>> from contextlib import contextmanager
>>> @contextmanager
... def safe(default):
... try:
... yield default
... except:
... pass
...
>>> with safe(42) as x:
... x = 1/0
...
>>> x
42
>>> with safe(42) as x:
... x = 'qwe'
...
>>> x
'qwe'

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What do I do to read html files on my pc?

2012-08-28 Thread Oscar Benjamin
On Tue, 28 Aug 2012 03:09:11 -0700 (PDT), mikcec82 
 wrote:

f = open(fileorig, 'r')
nomefile = f.read()

for x in nomefile:
    if '' in nomefile:
        print 'NOK'
    else:
        print 'OK'


You don't need the for loop. Just do:

nomefile = f.read()
if '' in nomefile:
    print('NOK')


But this one works on characters and not on strings (i.e. in this way I
have searched not string by string, but character-by-character).


Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: class object's attribute is also the instance's attribute?

2012-08-30 Thread Oscar Benjamin
On Thu, 30 Aug 2012 05:34:51 -0700 (PDT), Marco Nawijn 
 wrote:
If you want attributes to be local to the instance, you have to 

define them in the __init__ section of the class like this:


class A(object):
    def __init__(self):
        d = 'my attribute'


Except that in this case you'd need to do:
self.d = 'my attribute'

Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: Beginners question

2012-08-30 Thread Oscar Benjamin

On Thu, 30 Aug 2012 09:23:03 -0400, Dave Angel  wrote:
I haven't discovered why sometimes the type output shows type instead of
class.  There are other ways of defining classes, however, and perhaps
this is using one of them.  Still, it is a class, and stat() is
returning an instance of that class.


Builtin types show as type and classes defined in python show as 
class (even if they inherit from builtin types).


Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: class object's attribute is also the instance's attribute?

2012-08-30 Thread Oscar Benjamin
On 30 August 2012 15:11, Marco Nawijn  wrote:
>
>
> Learned my lesson today. Don't assume you know something. Test it first
> ;). I have done quite some programming in Python, but did not know that
> class attributes are still local to the instances. It is also a little
> surprising I must say. I always considered them like static variables in
> C++ (not that I am an expert in C++).
>

Class attributes are analogous to static variables in C++ provided you only
ever assign to them as an attribute of the class.

>>> class A(object):
...   static = 5
...
>>> a = A()
>>> a.static
5
>>> A.static
5
>>> b = A()
>>> b.static
5
>>> A.static = 10
>>> a.static
10
>>> b.static
10

An instance attribute with the same name as a class attribute hides the
class attribute for that instance only.

>>> b.static = -1
>>> a.static
10
>>> b.static
-1
>>> del b.static
>>> b.static
10

This is analogous to having a local variable in a function that hides a
module level variable with the same name:

x = 10

def f1():
x = 4
print(x)

def f2():
print(x)

f2()  # 10
f1()  # 4
f2()  # still 10

If you want f1 to modify the value of x seen by f2 then you should
explicitly declare x as global in f1.

Likewise if you want to modify an attribute for all instances of a class
you should explicitly assign to the class attribute rather than an instance
attribute.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: simple client data base

2012-09-03 Thread Oscar Benjamin
On 3 September 2012 15:12, Mark R Rivet  wrote:

> Hello all, I am learning to program in python. I have a need to make a
> program that can store, retrieve, add, and delete client data such as
> name, address, social, telephone number and similar information. This
> would be a small client database for my wife who has a home accounting
> business.
>

I would use the sqlite3 module for this (if I wasn't using gmail contacts).


> I have been reading about lists, tuples, and dictionary data
> structures in python and I am confused as to which would be more
> appropriate for a simple database.
>

As already said by Chris these are the types that Python uses to represent
data in memory, rather than on disk. There are a number of ways that you
can use these to represent the information from your database. For example,
you could use a dict of dicts:

>>> contact_db = {} # empty dict
>>> contact_db['john'] = {'alias': 'john', 'name': 'John Doe', 'email': 'j...@example.com'}
>>> contact_db['dave'] = {'alias': 'dave', 'name': 'Dave Doe', 'email': 'd...@example.com'}
>>> contact_db
{'dave': {'alias': 'dave', 'name': 'Dave Doe', 'email': 'd...@example.com'},
'john': {'alias': 'john', 'name': 'John Doe', 'email': 'j...@example.com'}}
>>> contact_db['dave']
{'alias': 'dave', 'name': 'Dave Doe', 'email': 'd...@example.com'}
>>> contact_db['dave']['email']
'd...@example.com'

> I know that python has real database capabilities but I'm not there
> yet and would like to proceed with as simple a structure as possible.
>

If you don't want to use real database capabilities you could save the data
above into a csv file using the csv module:

>>> import csv
>>> with open('contacts.csv', 'wb') as f:
...   writer = csv.DictWriter(f, ['alias', 'name', 'email'])
...   writer.writerows(contact_db.values())

You can then reload the data with:

>>> with open('contacts.csv', 'rb') as f:
...   reader = csv.DictReader(f, ['alias', 'name', 'email'])
...   new_contact_db = {}
...   for row in reader:
... new_contact_db[row['alias']] = row
...
>>> new_contact_db
{'dave': {'alias': 'dave', 'name': 'Dave Doe', 'email': 'd...@example.com'},
'john': {'alias': 'john', 'name': 'John Doe', 'email': 'j...@example.com'}}
>>> contact_db == new_contact_db
True


>
> Can anyone give me some idea's or tell me which structure would be
> best to use?
>

The above method for storing the data on disk is simple but not very safe.
If you use it for your wife's business make sure that you are always
keeping backups of the file. Preferably don't overwrite the file directly
but write the data out to a separate file first and then rename the file
(to avoid loss of data if the program has an error while writing).

The obvious way to improve on the above is to use the sqlite3 module to
store the data in an sqlite3 file instead of a csv file. There is one
advantage to using the above over using an sqlite3 database which is that
the data can be edited manually as a text file or using standard
spreadsheet software if necessary.
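
For comparison, a minimal sqlite3 sketch of the same store (table and
column names are illustrative):

import sqlite3

conn = sqlite3.connect('contacts.db')
conn.execute("""CREATE TABLE IF NOT EXISTS contacts (
                    alias TEXT PRIMARY KEY, name TEXT, email TEXT)""")
conn.execute("INSERT OR REPLACE INTO contacts VALUES (?, ?, ?)",
             ('dave', 'Dave Doe', 'd...@example.com'))
conn.commit()
for row in conn.execute("SELECT alias, name, email FROM contacts"):
    print(row)
conn.close()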

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-04 Thread Oscar Benjamin
On 4 September 2012 19:07, Steven D'Aprano <
steve+comp.lang.pyt...@pearwood.info> wrote:

> On Tue, 04 Sep 2012 18:32:57 +0200, Johannes Bauer wrote:
>
> > On 04.09.2012 04:17, Steven D'Aprano wrote:
> >
> >> On average, string equality needs to check half the characters in the
> >> string.
> >
> > How do you arrive at that conclusion?
>
> Take two non-empty strings of the same length, N. If the strings are
> equal, you have to make N comparisons to be sure they are equal (you
> check all N pairs of characters). If they are unequal, you have to check
> each pair of characters up to the first pair that are different:
>
> def string_equality(astr, bstr):
> # Assumes lengths are equal.
> for achr, bchr in zip(astr, bstr):
> if achr != bchr:
> return False
> return True
>
> For the unequal case, how many comparisons do you do? Ahead of time, we
> know very little about the strings. We don't even know how many possible
> characters there are (are they English alphanumeric, ASCII, Greek, Thai,
> or from the full range of 1114111 Unicode code points?) or what their
> probability distribution is.
>
> A reasonable, conservative assumption is to calculate the largest
> possible value of the average for random strings. That largest value
> occurs when the alphabet is as small as possible, namely two characters.
> In practice, strings come from a larger alphabet, up to 1114111 different
> characters for full Unicode strings, so the average for them will be less
> than the average we calculate now.
>
> So for unequal strings, the number of comparisons is equally likely to be
> 1, 2, 3, ..., N. The average then is:


What?


>
> sum([1, 2, 3, ..., N])/N
>
> which by a bit of simple algebra works out to be (N+1)/2, or half the
> characters as I said.
>
> (Note that this average assumes the strings are completely random. In
> practice, strings tend to come from strongly non-uniform distributions of
> characters. For instance, in English texts, 'e' is much more likely than
> 'q' and most characters are vanishingly rare, and in practice, strings
> are likely to be highly non-random.)


If the strings are 'completely random' (by which I assume you mean that
each character is IID) then the probability of a match for the character at
any one index is the same as the probability for a match at any other
index. Let's say the probability for a match is p and that p < 1.

Then for the first comparison:
1) with probability (1 - p) we terminate the loop after 1 comparison.
2) With probability p we continue to the second comparison

The second comparison occurs with probability p (from 2 above) and if we
reach this point then:
1) with probability (1 - p) we terminate the loop after this second
comparison
2) With probability p we continue to the third comparison

The probability of reaching the second comparison is p and the probability
of terminating at this comparison *if we reach it* is (1-p). So the
probability from the outset that we terminate at the second comparison is
p*(1 - p).

Prob(1 comparison) = (1-p) > p*(1-p) = prob(2 comparisons)

(since p < 1)

This can easily be extended by induction or otherwise to show that the
probability of terminating after N comparisons decreases as N increases. In
fact since it decreases by a factor of p each time, even if p is 1/2 (as
would be the case if there were two equally likely characters) the
probability of continuing to N comparisons becomes vanishingly small very
quickly as N increases. In practice p is smaller for most ascii/bmp/unicode
problems so the probability vanishes even more quickly.

If you follow this through you should arrive at Johannes' result
above. Applying this to real data is obviously more complicated since
successive characters are not independent but it would be a fairly unusual
situation if you were not more likely to stop at an earlier point in the
comparison loop than a later one.
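
To make this concrete, here is a small simulation sketch (mine, not from
the thread) that estimates the average number of comparisons for random
equal-length strings over an alphabet of k equally likely characters:

import random
import string

def comparisons(a, b):
    # Count the comparisons made by a naive equality loop.
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break
    return count

def average_comparisons(n, k, trials=10000):
    alphabet = string.ascii_lowercase[:k]
    total = 0
    for _ in range(trials):
        a = ''.join(random.choice(alphabet) for _ in range(n))
        b = ''.join(random.choice(alphabet) for _ in range(n))
        total += comparisons(a, b)
    return total / float(trials)

# Even for a two-character alphabet (p = 1/2) the average stays near
# 1/(1 - p) = 2 regardless of n, i.e. O(1) rather than O(n):
print(average_comparisons(100, 2))   # ~2.0
print(average_comparisons(1000, 2))  # ~2.0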

Oscar.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-04 Thread Oscar Benjamin
On 4 September 2012 22:59, Chris Angelico  wrote:

> On Wed, Sep 5, 2012 at 2:32 AM, Johannes Bauer 
> wrote:
> > How do you arrive at that conclusion? When comparing two random strings,
> > I just derived
> >
> > n = (256 / 255) * (1 - 256 ^ (-c))
> >
> > where n is the average number of character comparisons and c. The
> > rationale as follows: The first character has to be compared in any
> > case. The second with a probability of 1/256, the third with 1/(256^2)
> > and so on.
>
> That would be for comparing two random areas of memory. Python strings
> don't have 256 options per character; and in terms of actual strings,
> there's so many possibilities. The strings that a program is going to
> compare for equality are going to use a vastly restricted alphabet;
> for a lot of cases, there might be only a few dozen plausible
> characters.
>

> But even so, it's going to scale approximately linearly with the
> string length. If they're really random, then yes, there's little
> chance that either a 1MB string or a 2MB string will be the same, but
> with real data, they might very well have a long common prefix. So
> it's still going to be more or less O(n).
>

It doesn't matter whether there are 256 possible characters or 2. String
comparisons are best case O(1) and worst case O(n). For the average case we
need to assume the distribution of the strings. Assuming random strings
(with IID characters), even if there are only 2 characters the probability
that all the characters up to the jth compare equal will still decrease
exponentially with j, giving an average case of O(1) comparisons (if the
two characters are equally likely: 1/2 + 2/4 + 3/8 + 4/16 + ... + j / (2 **
j) + ...).
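
As a rough empirical check (my sketch, not careful benchmarking), counting
the character comparisons a naive equality loop makes on random strings over
a two-character alphabet:

    import random

    def ncompares(a, b):
        # count character comparisons made by a naive equality loop
        count = 0
        for x, y in zip(a, b):
            count += 1
            if x != y:
                break
        return count

    def randstring(n):
        return ''.join(random.choice('ab') for _ in range(n))

    counts = [ncompares(randstring(1000), randstring(1000))
              for _ in range(10000)]
    print sum(counts) / float(len(counts))  # ~2, despite the 1000 characters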

For real strings common prefixes may be more likely but unless the length
of those common prefixes scales with n the average case number of
comparisons required will not be O(n).

The only way to get round the problem of O(1) string comparisons is to use
strings composed entirely from a set of 1 possible character. Not only does
this do away with all of the inherent performance problems of flexible
string representations but it results in O(0) comparison complexity (far
outstripping previous Python versions).

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-05 Thread Oscar Benjamin
On 5 September 2012 10:48, Peter Otten <__pete...@web.de> wrote:

> Chris Angelico wrote:
>
> > On Wed, Sep 5, 2012 at 6:29 PM, Peter Otten <__pete...@web.de> wrote:
> >> comparing every pair in a sample of 1000 8-char words
> >> taken from '/usr/share/dict/words'
> >>
> >> head
> >> 1: 477222 
> >> 2:  18870 **
> >> ...
>
> tail
> 1: 386633 
> 2:  74966 ***
>
>
> > Not understanding this. What are the statistics,
>
> I measured how many chars they have in common for all combinations of 1000
> words taken from /usr/share/dict/words.
>
> 477222 pairs have one char in common, 18870 pairs have two chars in common
> when compared from the *beginning* of the string.
>
> 386633 pairs have one char in common, 74966 pairs have two chars in common
> when compared from the *end* of the string.
>
> and what (if it's not obvious from the previous answer) do they prove?
>
> They demonstrate that for two words from that particular corpus it is
> likely that a[::-1] == b[::-1] has to take more (1.34 versus 1.05 on
> average) characters into account than a == b, i. e. comparing from the
> back should be slower rather than faster.
>
> If that doesn't help, here's the code ;)
>
> def count_common(a, b):
>     i = 0
>     for i, (x, y) in enumerate(zip(a, b), 1):
>         if x != y:
>             break
>     return i

This function will return 1 if the first character differs. It does not
count the number of common characters but rather the more relevant quantity
which is the number of comparisons required to decide if the strings are
equal.

It shows that, for the English words in this dictionary, the first letters
of two randomly selected 8-character words are equal ~5% of the time while
the last letters are equal ~20% of the time.

But despite the non-uniformity in the distribution of these strings,
this provides a good example of the fact that for many situations involving
real data, average case comparison complexity is O(1). This is because the
probability of stopping after N comparisons decreases exponentially with N,
so that the sequence of counts forms something loosely like a geometric
progression:

>>> cmp_forward = [477222, 18870, 2870, 435, 74, 17, 12]
>>> cmp_backward = [386633, 74966, 29698, 6536, 1475, 154, 28, 10]
>>> def ratios(seq):
... for count1, count2 in zip(seq[:-1], seq[1:]):
... yield count2 / float(count1)
...
>>> list(ratios(cmp_forward))
[0.03954134553729711, 0.15209326974032855, 0.15156794425087108,
0.17011494252873563, 0.22972972972972974, 0.7058823529411765]
>>> list(ratios(cmp_backward))
[0.19389446839767946, 0.39615292265827173, 0.22008216041484274,
0.22567319461444307, 0.10440677966101695, 0.18181818181818182,
0.35714285714285715]

A notable outlier in these sequences is for comparing the first character
of the two words which is why for this string distribution it is better to
start at the beginning than the end.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-05 Thread Oscar Benjamin
In news.gmane.comp.python.general, you wrote:
> On Wed, 05 Sep 2012 16:51:10 +0200, Johannes Bauer wrote:
> [...]
>>> You are making unjustified assumptions about the distribution of
>>> letters in the words. This might be a list of long chemical compounds
>>> where the words typically differ only in their suffix. It might be a
>>> list of people with titles:
>> 
>> Actually, I'm not. I'm stating exactly what assumptions I'm making to
>> get my calculation. I'm comparing *random* character strings or
>> bitstrings.
>
> Excuse me, you are not. You are comparing English words which are highly 
> non-random.

Evidently we have different understandings of what 'random' means. I don't
think it's unreasonable to say that strings drawn uniformly from the set of
all strings in the English language (having a given number of characters) are
random. The distribution is not uniform over the set of all possible character
strings but it is still random. I think Johannes deliberately chose these
strings to emulate a particular kind of 'real' distribution of strings that
might occur in practice.

>
>
>> You, on the other hand, are making vague assumptions which you do not
>> care for formalize and yet you claim that "the number of comparisons is
>> equally likely to be 1, 2, 3, ..., N. The average then is". Without any
>> explanation for this. At all.
>
> I will accept that my explanation was not good enough, but I strongly 
> disagree that I gave no explanation at all.
>
>
>>> Herr Professor Frederick Schmidt
>>> Herr Professor Frederick Wagner
>>> ...
>> 
>> Is your assumtion that we're comparing words that have the common prefix
>> "Herr Professor Frederick "? 
>
> No, I am pointing out that *your* assumption that most string comparisons 
> will halt close to the beginning of the string is an invalid assumption. 
> Your assumption only holds for some non-random strings.

I think you have this backwards. The case where this assumption is provably
true is precisely for random strings. To be clear, when I say 'random' in this
context I mean that each character is chosen independently from the same
probability distribution over the possible characters regardless of which
index it has in the string and regardless of what the other characters are
(IID). In this case the probability that comparison terminates at the jth
character decreases exponentially with j. This means that for large strings
the expected number of character comparisons is independent of the number of
characters in the string as the probability of reaching the later parts of the
string is too small for them to have any significant effect. This is provable
and applies regardless of how many possible characters there are and whether
or not each character is equally likely (except for the pathological case
where one character has a probability of 1).

For strings from 'real' distributions it is harder to make statements about
the 'average case' and it is possible to construct situations where the
comparison would regularly need to compare a common prefix. However, to get
asymptotic performance worse than O(1) it is not sufficient to say that there
may be a common prefix such as 'Herr' in the distribution of strings. It is
necessary that, somehow, the common prefix is likely to grow as the size of
the strings grows.

For example, the set of all strings of length N whose first N//2 characters
are always 'a' and whose remaining characters are chosen IID would lead to
O(N) performance. This is why the file paths example chosen at the start of
this thread is a good one. If a program is dealing with a number of large
strings representing file paths then it is not uncommon that many of those
paths would refer to files in the same deeply nested directory and hence
compare equal for a significant number of characters. This could lead to
average case O(N) performance.
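
As a rough illustration (my sketch, not a careful benchmark), the cost of
comparing strings that share a long common prefix grows with the length:

    import timeit

    for n in [10**3, 10**4, 10**5]:
        a = 'x' * n + 'a'  # long common prefix, differing only at the end
        b = 'x' * n + 'b'
        t = timeit.timeit(lambda: a == b, number=100000)
        print n, t  # the time grows roughly linearly with n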

I think it's appropriate to compare string comparison with dict insertion:
Best case O(1) (no hash collisions)
Worst case O(N) (collides with every key)
Average case O(1) (as long as you don't use pathological data)

The only difference with string comparison is that there are some conceivable,
non-malicious cases where the pathological data can occur (such as with file
paths). However, I suspect that out of all the different uses of python
strings these cases are a small minority.

In saying that, it's not inconceivable that someone could exploit string
comparison by providing pathological data to make normally O(1) operations
behave as O(N). If I understand correctly it was precisely this kind of
problem with dict insertion/lookup that led to the recent hash-seed security
update.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-06 Thread Oscar Benjamin

On Thu, 06 Sep 2012 06:07:38 -0400, Dave Angel  wrote:
> For random strings (as defined below), the average compare time is
> effectively unrelated to the size of the string, once the size passes
> some point.
>
> Define random string as being a selection from a set of characters,
> with replacement.  So if we pick some set of characters, say 10 (or
> 256, it doesn't really matter), the number of possible strings is
> 10**N.
>
> The likelihood of not finding a mismatch within k characters is
> (1/10)**k.  The likelihood of actually reaching the end of the random
> string is (1/10)**N.  (which is the reciprocal of the number of
> strings, naturally)
>
> If we wanted an average number of comparisons, we'd have to calculate
> a series, where each term is a probability times a value for k.
>    sum((k * 9*10**-k) for k in range(1, 10))
>
> Those terms very rapidly approach 0, so it's safe to stop after a
> few.  Looking at the first 9 items, I see a value of 1.111
>
> This may not be quite right, but the value is certainly well under 2
> for a population of 10 characters, chosen randomly.  And notice that N
> doesn't really come into it.

It's exactly right. You can obtain this result analytically from 
Johannes' formula above. Just replace 256 with 10 to get that the 
expected number of comparisons is


(10/9)*(1 - 10**(-N))

The last term shows the dependence on N and is tiny even for N=9.
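
As a quick numerical check (mine, not from Dave's post): truncating the
series at N and adding the final forced comparison for strings that match
all the way agrees with the closed form:

    N = 9
    series = sum(k * 9 * 10**-k for k in range(1, N)) + N * 10**-(N - 1)
    closed = (10 / 9.0) * (1 - 10**-N)
    print series, closed  # both ~1.11111111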

Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-07 Thread Oscar Benjamin
On 2012-09-07, Steven D'Aprano  wrote:
> 
>
> After further thought, and giving consideration to the arguments given by 
> people here, I'm now satisfied to say that for equal-length strings, 
> string equality is best described as O(N).
>
> 1) If the strings are equal, a == b will always compare all N 
>characters in each string.
>
> 2) If the strings are unequal, a == b will *at worst* compare
>all N characters.
>
> 3) Following usual practice in this field, worst case is the
>one which conventionally is meant when discussing Big Oh
>behaviour. See, for example, "Introduction To Algorithms" 
>by Cormen, Leiserson and Rivest.

Would you say, then, that dict insertion is O(N)?

>
> Also of some interest is the best case: O(1) for unequal strings (they 
> differ at the first character) and O(N) for equal strings.
>
> Also of interest is the case that has caused the majority of the 
> discussion, the average case. I am now satisfied that the average number 
> of comparisons for unequal strings is O(1). To be precise, it is bounded 
> below by 1 comparison (you always have to compare at least one pair of 
> characters) and bounded above by 2 comparisons.

I find this idea of separating into the comparison of equal strings versus the
comparison of unequal strings rather odd. If the strings you compare come from
a distribution where they are guaranteed to be equal (or unequal) then you can
just use the O(0) comparison method.

Since string comparison is only useful if the strings can be equal or unequal,
the average case depends on how often they are equal/unequal as well as the
average complexity of both. For random strings the frequency of equal strings
decreases very fast as N increases so that the comparison of random strings is
O(1).

>
> (I'm talking about the average here -- the actual number of comparisons 
> can range all the way up to N, but the average is <= 2.)
>
> If I've done the maths right, the exact value for the average is:
>
> ((M-1)*sum( (N-i)*M**i for i in range(0, N) ) + N)/(M**N)

I'm not sure where the extra N comes from ^ but otherwise good.

I would have written that as:

(1 - p) * sum(i * p**(i-1) for i in range(1, N+1))

where p is the probability of a match (1/M for M equally likely characters) or
in closed form:

    (1 - p**N * (1 + N*(1 - p))) / (1 - p)

>
> for random strings of length N taken from an alphabet of size M.
>
> For M = 2, that average approaches but never exceeds 2 as N increases; 
> for M = 3, the average approaches 1.5, for M = 4 it approaches 1.333... 
> and so forth.

It approaches 1 / (1 - p) or, if you prefer: M / (M - 1)
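
As a quick check that the sum and the closed form agree, e.g. with p = 0.5
and N = 10:

    >>> p, N = 0.5, 10
    >>> (1 - p) * sum(i * p**(i-1) for i in range(1, N+1))
    1.98828125
    >>> (1 - p**N * (1 + N*(1 - p))) / (1 - p)
    1.98828125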

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-07 Thread Oscar Benjamin
On 2012-09-07, Oscar Benjamin  wrote:
> On 2012-09-07, Steven D'Aprano  wrote:
>> 
>
> Since string comparison is only useful if the strings can be equal or unequal,
> the average case depends on how often they are equal/unequal as well as the
> average complexity of both. For random strings the frequency of equal strings
> decreases very fast as N increases so that the comparison of random strings is
> O(1).
>
>>
>> (I'm talking about the average here -- the actual number of comparisons 
>> can range all the way up to N, but the average is <= 2.)
>>
>> If I've done the maths right, the exact value for the average is:
>>
>> ((M-1)*sum( (N-i)*M**i for i in range(0, N) ) + N)/(M**N)
>
> I'm not sure where the extra N comes from ^ but otherwise good.

Ok, I see it's for the case where they're equal.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-08 Thread Oscar Benjamin
On 2012-09-08, Steven D'Aprano  wrote:
> On Fri, 07 Sep 2012 19:10:16 +0000, Oscar Benjamin wrote:
>
>> On 2012-09-07, Steven D'Aprano 
>> wrote:
>> 
>> 
>> Would you say, then, that dict insertion is O(N)?
>
> Pedantically, yes. 
>
> But since we're allowed to state (or even imply *wink*) whatever 
> assumptions we like, we're allowed to assume "in the absence of 
> significant numbers of hash collisions" and come up with amortized O(1) 
> for dict insertions and lookups.
>
> (Provided, of course, that your computer has an infinite amount of 
> unfragmented memory and the OS never starts paging your dict to disk. 
> Another unstated assumption that gets glossed over when we talk about 
> complexity analysis -- on real world computers, for big enough N, 
> *everything* is O(2**N) or worse.)
>
> Big Oh analysis, despite the formal mathematics used, is not an exact 
> science. Essentially, it is a way of bringing some vague order to hand-
> wavy estimates of complexity, and the apparent mathematical rigour is 
> built on some awfully shaky foundations. But despite that, it actually is 
> useful.
>
> Coming back to strings... given that in any real-world application, you 
> are likely to have some string comparisons on equal strings and some on 
> unequal strings, and more importantly you don't know which are which 
> ahead of time, which attitude is less likely to give you a nasty surprise 
> when you run your code?
>
> "I have many millions of 100K strings to compare against other 100K 
> strings, and string comparisons are O(1) so that will be fast."
>
> "I have many millions of 100K strings to compare against other 100K 
> strings, and string comparisons are O(N) so that will be slow, better 
> find another algorithm."

True. I can't think of a situation where I've used string comparisons
directly in any text heavy code. Rather, I would use a dict or a set (or a
regex) and hash(str) is always O(N).

>
>
> Remember too that "for small enough N, everything is O(1)". Getting hung 
> up on Big Oh is just as much a mistake as ignoring it completely.
>
>

I can't think of a situation in my own work where O(N) vs O(1) string
comparisons would cause a significant problem (except perhaps in libraries
that I use but didn't write). However, I can find a number of cases where I
compare numpy.ndarrays for equality. For example, I found

if np.all(a == b):

in some code that I recently wrote. Although np.all() short-circuits, a==b
does not so that line forces O(N) behaviour onto a situation where the average
case can be better. Unfortunately numpy doesn't seem to provide a
short-circuit equals() function. array_equal() is what I want but it does the
same as the above. In future, I'll consider using something like

def cmparray(a, b):
  return a.shape == b.shape and a.dtype == b.dtype and buffer(a) == buffer(b)

to take advantage of (what I assume are) short-circuit buffer comparisons.

>> Since string comparison is only useful if the strings can be equal or
>> unequal, the average case depends on how often they are equal/unequal as
>> well as the average complexity of both. For random strings the frequency
>> of equal strings decreases very fast as N increases so that the
>> comparison of random strings is O(1).
>
> But that is not an upper bound, and Big Oh analysis is strictly defined 
> in terms of upper bounds.

It is an upper bound, but it is an upper bound on the *expectation value*
assuming a particular distribution of inputs, rather than an upper bound on
all possible inputs.

>>> (I'm talking about the average here -- the actual number of comparisons
>>> can range all the way up to N, but the average is <= 2.)

The average is actually bounded by 1 / (1 - p) where p is the probability that
two characters match. This bound can be arbitrarily large as p approaches 1 as
would be the case if, say, one character was much more likely than others. The
particular assumption that you have made p = 1/M where M is the number of
characters is actually the *smallest* possible value of p. For non-uniform
real data (English words for example) p is significantly greater than 1/M but
in a strict bounds sense we should say that 1/M <= p <= 1.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Standard Asynchronous Python

2012-09-10 Thread Oscar Benjamin
On 2012-09-10, Dennis Lee Bieber  wrote:
> On Sun, 9 Sep 2012 20:07:51 -0400, "Dustin J. Mitchell"
> declaimed the following in
> gmane.comp.python.general:
>
>> 
>> My proposal met with near-silence, and I didn't pursue it.  Instead, I
>> did what any self-respecting hacker would do - I wrote up a framework,
>> uthreads [4], that implemented my idea.  This was initially a simple
>> trampoline scheduler, but I eventually refactored it to run atop
>> Twisted, since that's what I use.  To my knowledge, it's never been
>> used.
>>
>   So for your small attempt to hide an event-driven dispatcher, one
> has to load a massive event-driven library. Some years ago I tried to
> make sense of Twisted and failed badly. Maybe it makes sense to those
> raised on UNIX style select() (where practically anything that involved
> data transfer over some sort of channel could be tested -- but doesn't
> work as such on Windows where only network sockets can be used, file i/o
> needs to use a different call),

I think the idea behind the PEP is to facilitate modularisation of
event-driven frameworks into dispatchers and libraries that are suitable for
running within dispatchers. When you say a 'massive event-driven library' I
guess you mean something like Twisted. I don't have much experience with
Twisted but having looked at it a bit my impression is that it is so large
because it includes many components that are not essential for every user. I
guess that the reason for keeping those components in Twisted rather than as
separate projects is not so much because every user needs them but because
many of them are implemented in a way that makes them not much use outside of
Twisted.

The idea that Dustin is proposing is that in the same way that a library might
declare a subset of its API to be thread-safe, and so usable with threading
frameworks, a library could expose a PEP-XXX compliant interface for use with
a PEP-XXX compliant dispatcher. If implemented that should facilitate the
creation of minimal dispatchers and minimal standard components that can run
within those dispatchers. This would mean that it wouldn't be necessary to
make massive event-driven libraries but rather smaller interchangeable
libraries. For example, it might facilitate the creation of a Windows-specific
dispatcher that would be able to use the best underlying Windows APIs while
also benefitting from any PEP-XXX compliant libraries that would work with any
other dispatcher.

>> As I get to work on the PEP, I'd like to hear any initial reactions to the
>> idea.

I don't have much experience with the event-driven frameworks but having made
a couple of simple scripts using gevent/Twisted my experience is that learning
to use these frameworks is hard, largely because of the number of framework-
specific concepts that are needed to make simple examples work. I would expect
that giving each framework a relatively standardised interface would make them
much easier to learn.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-10 Thread Oscar Benjamin
On 2012-09-10, Steven D'Aprano  wrote:
> On Mon, 10 Sep 2012 08:59:37 +, Duncan Booth wrote:
>
>> Gelonida N  wrote:
>> 
>> so at the expense of a single dictionary
>> insertion when the string is created you can get guaranteed O(1) on all
>> the comparisons.
>
> What interning buys you is that "s == t" is an O(1) pointer compare if 
> they are equal. But if s and t differ in the last character, __eq__ will 
> still inspect every character. There is no way to tell Python "all 
> strings are interned, if s is not t then s != t as well".
>

I thought that if *both* strings were interned then a pointer comparison could
decide if they were unequal without needing to check the characters.

Have I misunderstood how intern() works?

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-10 Thread Oscar Benjamin
On 2012-09-10, Chris Angelico  wrote:
> On Tue, Sep 11, 2012 at 12:06 AM, Oscar Benjamin
> wrote:
>> On 2012-09-10, Steven D'Aprano  wrote:
>>> What interning buys you is that "s == t" is an O(1) pointer compare if
>>> they are equal. But if s and t differ in the last character, __eq__ will
>>> still inspect every character. There is no way to tell Python "all
>>> strings are interned, if s is not t then s != t as well".
>>>
>>
>> I thought that if *both* strings were interned then a pointer comparison
>> could decide if they were unequal without needing to check the characters.
>>
>> Have I misunderstood how intern() works?
>
> In a language where _all_ strings are guaranteed to be interned (such
> as Lua, I think), you do indeed gain this. Pointer inequality implies
> string inequality. But when interning is optional (as in Python), you
> cannot depend on that, unless there's some way of recognizing interned
> strings. Of course, that may indeed be the case; a simple bit flag
> "this string has been interned" would suffice, and if both strings are
> interned AND their pointers differ, THEN you can be sure the strings
> differ.
>
> I have no idea whether or not CPython version X.Y.Z does this. The
> value of such an optimization really depends on how likely strings are
> to be interned; for instance, if the compiler automatically interns
> all the names of builtins, this could be quite beneficial. Otherwise,
> probably not; most Python scripts don't bother interning anything.
>

I haven't looked at the source but my understanding was precisely that there
is an intern() bit and that not only the builtins module but all the literals
in any byte-compiled module are interned.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-10 Thread Oscar Benjamin
On 2012-09-10, Oscar Benjamin  wrote:
> On 2012-09-10, Chris Angelico  wrote:
>> On Tue, Sep 11, 2012 at 12:06 AM, Oscar Benjamin
>> wrote:
>>> On 2012-09-10, Steven D'Aprano  wrote:
>>>> What interning buys you is that "s == t" is an O(1) pointer compare if
>>>> they are equal. But if s and t differ in the last character, __eq__ will
>>>> still inspect every character. There is no way to tell Python "all
>>>> strings are interned, if s is not t then s != t as well".
>>>>
>>>
>>> I thought that if *both* strings were interned then a pointer comparison
>>> could decide if they were unequal without needing to check the characters.
>>>
>>> Have I misunderstood how intern() works?
>>
>> In a language where _all_ strings are guaranteed to be interned (such
>> as Lua, I think), you do indeed gain this. Pointer inequality implies
>> string inequality. But when interning is optional (as in Python), you
>> cannot depend on that, unless there's some way of recognizing interned
>> strings. Of course, that may indeed be the case; a simple bit flag
>> "this string has been interned" would suffice, and if both strings are
>> interned AND their pointers differ, THEN you can be sure the strings
>> differ.
>>
>> I have no idea whether or not CPython version X.Y.Z does this. The
>> value of such an optimization really depends on how likely strings are
>> to be interned; for instance, if the compiler automatically interns
>> all the names of builtins, this could be quite beneficial. Otherwise,
>> probably not; most Python scripts don't bother interning anything.
>>
>
> I haven't looked at the source but my understanding was precisely that there
> is an intern() bit and that not only the builtins module but all the literals
> in any byte-compiled module are interned.
>

s/literals/identifiers/

You can see the interned flag in the PyUnicodeObject struct here:
http://hg.python.org/cpython/file/3ffd6ad93fe4/Include/unicodeobject.h#l303

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-10 Thread Oscar Benjamin
On 2012-09-10, Dan Goodman  wrote:
> On 04/09/2012 03:54, Roy Smith wrote:
>> Let's assume you're testing two strings for equality.  You've already
>> done the obvious quick tests (i.e they're the same length), and you're
>> down to the O(n) part of comparing every character.
>>
>> I'm wondering if it might be faster to start at the ends of the strings
>> instead of at the beginning?  If the strings are indeed equal, it's the
>> same amount of work starting from either end.  But, if it turns out that
>> for real-life situations, the ends of strings have more entropy than the
>> beginnings, the odds are you'll discover that they're unequal quicker by
>> starting at the end.
>
>  From the rest of the thread, it looks like in most situations it won't 
> make much difference as typically very few characters need to be 
> compared if they are unequal.
>
> However, if you were in a situation with many strings which were almost 
> equal, the most general way to improve the situation might be store a 
> hash of the string along with the string, i.e. store (hash(x), x) and 
> then compare equality of this tuple. Almost all of the time, if the 
> strings are unequal the hash will be unequal. Or, as someone else 
> suggested, use interned versions of the strings. This is basically the 
> same solution but even better. In this case, your startup costs will be 
> higher (creating the strings) but your comparisons will always be instant.
>

Computing the hash always requires iterating over all characters in the string
so is best case O(N) where string comparison is best case (and often average
case) O(1).

Also, so far as I know the hash value once computed is stored on the string
object itself [1] and used for subsequent string comparisons so there's no
need for you to do that in your code.

Oscar

[1] http://hg.python.org/cpython/file/71d94e79b0c3/Include/unicodeobject.h#l293

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-10 Thread Oscar Benjamin
On 2012-09-10, Dan Goodman  wrote:
> On 10/09/2012 18:07, Dan Goodman wrote:
>> On 04/09/2012 03:54, Roy Smith wrote:
>>> Let's assume you're testing two strings for equality.  You've already
>>> done the obvious quick tests (i.e they're the same length), and you're
>>> down to the O(n) part of comparing every character.
>>>
>>> I'm wondering if it might be faster to start at the ends of the strings
>>> instead of at the beginning?  If the strings are indeed equal, it's the
>>> same amount of work starting from either end.  But, if it turns out that
>>> for real-life situations, the ends of strings have more entropy than the
>>> beginnings, the odds are you'll discover that they're unequal quicker by
>>> starting at the end.
>>
>>  From the rest of the thread, it looks like in most situations it won't
>> make much difference as typically very few characters need to be
>> compared if they are unequal.
>>
>> However, if you were in a situation with many strings which were almost
>> equal, the most general way to improve the situation might be store a
>> hash of the string along with the string, i.e. store (hash(x), x) and
>> then compare equality of this tuple. Almost all of the time, if the
>> strings are unequal the hash will be unequal. Or, as someone else
>> suggested, use interned versions of the strings. This is basically the
>> same solution but even better. In this case, your startup costs will be
>> higher (creating the strings) but your comparisons will always be instant.
>
> Just had another thought about this. Although it's unlikely to be 
> necessary in practice since (a) it's rarely necessary at all, and (b) 
> when it is, hashing and optionally interning seems like the better 
> approach, I had another idea that would be more general. Rather than 
> starting from the beginning or the end, why not do something like: check 
> the first and last character, then the len/2 character, then the len/4, 
> then 3*len/4, then len/8, 3*len/8, etc. You'd need to be a bit clever 
> about making sure you hit every character but I'm sure someone's already 
> got an efficient algorithm for this. You could probably even make this 
> cache efficient by working on cache line length blocks. Almost certainly 
> entirely unnecessary, but I like the original question and it's a nice 
> theoretical problem.

It's not totally theoretical in the sense that the reasoning applies to all
sequence comparisons. If you needed to compare lists of objects where the
comparison of each pair of elements was an expensive operation then you would
want to think carefully about what order you used. Also in general you can't
hash/intern all sequences.

If I was going to change the order of comparisons for all strings then I would
use a random order. This is essentially how dict gets away with claiming to
have O(1) lookup. There are sequences of inputs that can cause every possible
hash collision to occur but because the hash function acts as a kind of
randomisation filter the pathological sequences are very unlikely to occur
unless someone is going out of their way. The clever way that Python 3.3
prevents someone from even doing this on purpose is just to introduce
additional per-process randomisation.

The difference between dict lookup and string comparison is that string
comparison always compares the characters in the same order and it corresponds
to the natural ordering of the data. This means that some perfectly natural use
cases like comparing file-paths can have close to worst case behaviour. If
string/sequence comparison occurs in a random order then there can be no use
case where the likely strings would induce close to worst case behaviour
unless you really are just comparing lots of almost identical sequences.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-11 Thread Oscar Benjamin
On 11 September 2012 10:51, Duncan Booth wrote:

> Oscar Benjamin  wrote:
>
> >> What interning buys you is that "s == t" is an O(1) pointer compare
> >> if they are equal. But if s and t differ in the last character,
> >> __eq__ will still inspect every character. There is no way to tell
> >> Python "all strings are interned, if s is not t then s != t as well".
> >>
> >
> > I thought that if *both* strings were interned then a pointer
> > comparison could decide if they were unequal without needing to check
> > the characters.
> >
> > Have I misunderstood how intern() works?
> >
>
> I don't think you've misunderstood how it work, but so far as I can see the
> code doesn't attempt to short circuit the "not equal but interned" case.
> The comparison code doesn't look at interning at all, it only looks for
> identity as a shortcut.


It also doesn't seem to check if the hash values have been set. I guess the
cached hash value is only used in contexts where the hash is explicitly
desired.

That makes two optimisations that can bring worst case string comparison
down to O(1) in many contexts that are available to cpython but unused. But
then if full string comparison is already on average O(1) then the cost of
checking the interned and hash flags for every string comparison would
outweigh the benefits of avoiding the rare worst case O(N) comparisons.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: submit jobs on multi-core

2012-09-11 Thread Oscar Benjamin
On 2012-09-11, Dhananjay  wrote:
>
> Dear all,
>
> I have a python script in which I have a list of files to input one by one
> and for each file I get a number as an output.
> I used for loop to submit the file to script.
> My script uses one file at a time and returns the output.
>
> My computers has 8 cores.
> Is there any way that I could submit 8 jobs at a time and get all the
> output faster ?
> In other words, how can I modify my script so that I could submit 8 jobs
> together on 8 different processors ?
>
> I am bit new to this stuff, please suggest me some directions.
>
> Thank you.

The simplest way I've found to do this is to use something like GNU parallel.
I don't know if there's a Windows equivalent but it works well for me on linux
and you can use it for any program (not just python scripts).

From the wikipedia page:
http://en.wikipedia.org/wiki/GNU_parallel

"""
The most common usage is to replace the shell loop, for example

(for x in `cat list` ; do
  do_something $x
 done) | process_output

to the form of

cat list | parallel do_something | process_output
"""

Note that there are two basic types of parallel execution depending on whether
or not your parallel processes need to communicate with one another. I'm
assuming that you really just want to run independent jobs simultaneously.
Otherwise the other suggestions may be more relevant.
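
If you'd rather stay in Python, the stdlib multiprocessing module can do the
same thing. A rough sketch (process_file and the filename list are
placeholders for whatever your script actually does):

    from multiprocessing import Pool

    def process_file(filename):
        # replace this with whatever computes the number for one file
        return len(open(filename).read())

    if __name__ == '__main__':
        filenames = ['file1.txt', 'file2.txt']  # your list of input files
        pool = Pool(processes=8)                # one worker per core
        results = pool.map(process_file, filenames)
        print results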

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: generators as decorators simple issue

2012-09-12 Thread Oscar Benjamin
On Wed, 12 Sep 2012 03:22:31 -0700 (PDT), pyjoshsys wrote:
> The output is still not what I want. Now runtime error free, however
> the output is not what I desire.
>
> def setname(cls):
>     '''this is the proposed generator to call SetName on the object'''
>     try:
>         cls.SetName(cls.__name__)
>     except Exception as e:
>         print e
>     finally:
>         return cls


I would write the function above in one line:

    cls.name = cls.__name__



> class Trial(object):
>     '''class to demonstrate with'''
>     def __init__(self):
>         object.__init__(self)
>         self.name = None


Remove the line above. The instance attribute self.name is hiding the 
class attribute cls.name.


Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: Boolean function on variable-length lists

2012-09-12 Thread Oscar Benjamin
On 12 September 2012 14:25, Libra  wrote:

> On Wednesday, September 12, 2012 3:11:42 PM UTC+2, Steven D'Aprano wrote:
> > On Wed, 12 Sep 2012 05:48:09 -0700, Libra wrote:
>
> > > I need to implement a function that returns 1 only if all the values in
> > > a list satisfy given constraints (at least one constraint for each
> > > element in the list), and zero otherwise.
> >
> > What are the restrictions on the constraints themselves?
> > Could they be arbitrarily complicated?
> > "Item 2 must be an even number divisible by 17 and 39 with at least eight
> > digits but no greater than four million, unless today is Tuesday, in
> > which case it must be equal to six exactly."
>
> Generally the constraints are quite simple, like the one in my example.
> But I can also have 2 or more constraints for each value:
> L[0] >= 1
> L[0] <= 5
>

You can use:
    lambda x: 1 <= x and x <= 5
or
    lambda x: 1 <= x <= 5


> To complicate a little, what about constraints like:
> L[0] + L[2] >= 3


You could rewrite all your constraints as functions on the sequence of
values:
  lambda y: 1 <= y[0] <= 5
  lambda y: y[0] + y[2] >= 3

If all of your constraints are linear (like all of the ones you have shown)
then you can represent each one as a set of coefficients for a linear
projection of the list combined with a threshold value (if this last point
doesn't make sense then just ignore it).
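
For example, a minimal sketch along those lines (the constraint list is just
illustrative):

    constraints = [
        lambda y: 1 <= y[0] <= 5,
        lambda y: y[0] + y[2] >= 3,
    ]

    def check(values, constraints):
        # 1 if every constraint is satisfied, 0 otherwise
        return int(all(c(values) for c in constraints))

    print check([2, 0, 4], constraints)  # 1
    print check([9, 0, 4], constraints)  # 0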

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: using subprocess.Popen does not suppress terminal window on Windows

2012-09-13 Thread Oscar Benjamin
On Thu, 13 Sep 2012 00:27:10 -0700 (PDT), janis.judvai...@gmail.com wrote:
> I'm making a little embedded system programming IDE so I need to run
> .exe(windows only), make commands, perl & python scripts
> etc(multiplatform).  I'm using subprocess.Popen for all of them and it
> works fine except that blank console window and btw it pop's out under
> linux too.
>
> Maybe the problem is that original python script has .pyw extension, so
> it hides his own console, but I don't need that one too.
>
> P.S. If it makes a difference I'm using wxPython 2.9. & Python 2.7.2.


Perhaps wxPython is causing the problem. Does the 'terminal' look 
like a normal terminal? Does it only appear if you actually print 
something?


Oscar

--
http://mail.python.org/mailman/listinfo/python-list


Re: Re: using subprocess.Popen does not suppress terminal window on Windows

2012-09-13 Thread Oscar Benjamin
On 13 September 2012 10:22, Oscar Benjamin wrote:

> On Thu, 13 Sep 2012 00:27:10 -0700 (PDT), janis.judvai...@gmail.com wrote:
>
>> I'm making a little embedded system programming IDE so I need to
>>
> run .exe(windows only), make commands, perl & python scripts
> etc(multiplatform).  I'm using subprocess.Popen for all of them and it
> works fine except that blank console window and btw it pop's out under
> linux too.
>
>
>  Maybe the problem is that original python script has .pyw
>>
> extension, so it hides his own console, but I don't need thatone too.
>
>
>  P.S. If it makes a diffrence I'm using wxPython 2.9. & Python 2.7.2.
>>
>
> Perhaps wxPython is causing the problem. Does the 'terminal' look like a
> normal terminal? Does it only appear if you actually print something?


Are you using

app = wx.App(redirect=False)

to prevent wxPython from redirecting stdout/stderr into special wxPython
output windows?

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: using subprocess.Popen does not suppress terminal window on Windows

2012-09-13 Thread Oscar Benjamin
On 13 September 2012 13:33,  wrote:

> It looks like normal terminal to me, could You define normal?
>
> Looks like it appears only when target script prints something, but it
> shouldn't cus I'm using pipes on stdout and stderr.
>
> If anyone is interested I'm using function doPopen from here:
> http://code.google.com/p/mansos/source/browse/trunk/tools/IDE/src/helperFunctions.py
> --
> http://mail.python.org/mailman/listinfo/python-list
>

I asked about the terminal window since you mentioned that it pops up under
linux which would suggest you're not having the usual Windows console/gui
problem.

In any case, have you tried this:
http://code.activestate.com/recipes/409002-launching-a-subprocess-without-a-console-window/
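
If the Windows console is the remaining problem, the usual trick (roughly
what that recipe does; an untested sketch on my part, with a made-up command)
is to pass a STARTUPINFO that hides the window:

    import subprocess

    # Windows-only: ask CreateProcess not to show a console window.
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
    proc = subprocess.Popen(['make', 'all'],  # your command here
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            startupinfo=startupinfo)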

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: gc.get_objects()

2012-09-17 Thread Oscar Benjamin
On 2012-09-17, Matteo Boscolo  wrote:
> from my gc.get_object()
> I extract the sub system of the object that I would like to delete:
>
> this is the object:
> Class name 
> win32com.gen_py.F4503A16-F637-11D2-BD55-00500400405Bx0x1x0.ITDProperty.ITDProperty
> that is traked and the reference are:
> get_referents     
>      
> 
> RefCount 5
>   ( 0x026ACB58>,)
> RefCount 5
>   '__int__': , 
> '__module__': 'win32com.gen_py.F45
> RefCount 8
>   ITDProperty
> RefCount 9
>   
> RefCount 9
>   
> get_referrers     
>      
> 
> RefCount 15
>   'python_version': 34014192, 'defaultUnnamedArg': 
> RefCount 6
>    win32com.gen_py.F4503A16-F637-11D2-BD55-00500400405Bx0x1x0.ITDProperty.I
> RefCount 4
>   (u'ItemsListCreator', u'trick', u'pVal'), (3, 49, 
> '0', None), (16393, 10, None,
> RefCount 4
>   
> RefCount 7
>   
> RefCount 5
>  '{39AAEA35-F71F-11D2-BD59-00500400405B}':  win32com.gen_py.F4503A16-F637-
>
> how can I understand how to clean up this situation or were are the 
> references that I need to delete ?
>
>  From the cad non python script I call an in process python com object, 
> and before coming back to the cad application I need to clean up all com 
> reference, because if I do not do that I corrupt the cad application .
>
> so I urgently need to clean up all reference before coming back to the 
> cad application.
>
> any idea?
>

http://mg.pov.lt/objgraph/
http://mg.pov.lt/blog/hunting-python-memleaks.html

I have previously used the code from one of the links above to hunt down a
reference leak. It gives a graphical view of the alive references which helps
to locate the source of the ref leak.
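
For example, something along these lines (obj being one of your leaked COM
wrappers; the filename is arbitrary) draws the graph of referrers keeping it
alive:

    import objgraph
    objgraph.show_backrefs([obj], max_depth=5, filename='backrefs.png')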

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using dict as object

2012-09-19 Thread Oscar Benjamin
On 2012-09-19, Dave Angel  wrote:
> On 09/19/2012 06:24 AM, Pierre Tardy wrote:
>> All implementation I tried are much slower than a pure native dict access.
>> Each implementation have bench results in commit comment. All of them
>> are 20+x slower than plain dict!
>
> Assuming you're talking about CPython benchmarks, the dict is highly
> optimized, C code.  And when you provide your own __getitem__
> implementation in pure python, there are many attribute accesses, just
> to make the code work.
>
>> I would like to have python guys advices on how one could optimize this.
>

I agree with all of Dave's objections to this idea. It is possible, however,
to make a more efficient implementation than the one that you have:

class Namespace(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.__dict__ = self

This implementation is not really sane, though, as it doesn't hide any of the
dict methods as attributes. It does, however, demonstrate something that be a
potentially simple way of making an alternate type object in C.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using dict as object

2012-09-19 Thread Oscar Benjamin
On 2012-09-19, Pierre Tardy  wrote:
>
>>
>>  This has been proposed and discussed and even implemented many
>> times on this list and others.
>>
> I can find this question on SO
> http://stackoverflow.com/questions/4984647/accessing-dict-keys-like-an-attribute-in-python
> which is basically answered with this solution
>
> class AttributeDict(dict):
>     __getattr__ = dict.__getitem__
>     __setattr__ = dict.__setitem__
>
>
> but this does not allow recursive access, you would need to first convert
> all nested dictionaries to AttributeDict.
> a.b.c.d = 2 # fail
> a.b = dict(c=3)
> a.b.c=4 # fail

There is no way to control "recursive access" in Python. The statement

a.b.c = 2

is equivalent to the statements

o = a.b   # o = a.__getattr__('b')
o.c = 2   # o.__setattr__('c', 2)

The way that the o.c assignment is handled is determined by the type of o
regardless of the type of a. If you're looking for a way to change only the
type of a and make a custom __(set|get)attr__ work for all dicts that are
indirectly referred to then there is no solution to your problem.

Oscar

-- 
http://mail.python.org/mailman/listinfo/python-list


For Counter Variable

2012-09-23 Thread Oscar Benjamin
On Sep 23, 2012 5:42 PM, "jimbo1qaz"  wrote:
>
> Am I missing something obvious, or do I have to manually put in a counter
in the for loops? That's a very basic request, but I couldn't find anything
in the documentation.

Have you seen the enumerate function?
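
For example:

    >>> for i, item in enumerate(['a', 'b', 'c']):
    ...     print i, item
    ...
    0 a
    1 b
    2 c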

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For Counter Variable

2012-09-23 Thread Oscar Benjamin
On Sep 23, 2012 6:52 PM, "jimbo1qaz"  wrote:
>
> On Sunday, September 23, 2012 9:36:19 AM UTC-7, jimbo1qaz wrote:
> > Am I missing something obvious, or do I have to manually put in a
counter in the for loops? That's a very basic request, but I couldn't find
anything in the documentation.
>
> Ya, they should really give a better way, but for now, enumerate works
pretty well.

I can't tell who you're responding to here. It would make more sense if you
quote from the post you're replying to.

Also, maybe there is a better way. Unfortunately your post was quite vague
so this is as good a response as you can hope for. Why don't you post a
code snippet representing what you're trying to do? Then someone can tell you
a better way if there is one.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Anyone able to help on installing packages?

2012-09-23 Thread Oscar Benjamin
On Sep 23, 2012 6:56 PM, "John Mordecai Dildy"  wrote:
>
> Hello everyone out there.  Ive been trying to install packages like
distribute, nose, and virturalenv and believe me it is a hard process to
do. I tried everything I could think of to install.
>
> I have done the following:
>
> pip install "package name"
>
> easy_install "package name"

What happened when you ran those commands? Was there an error message? If
so can can you post the exact output?

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Anyone able to help on installing packages?

2012-09-23 Thread Oscar Benjamin
Please send your reply to the mailing list (python-list@python.org) rather
than privately to me.

On 23 September 2012 20:57, John Dildy  wrote:

> When I give input at the start of terminal using the command pip install
>  virtualenv:
>
> Downloading/unpacking virtualenv
>   Running setup.py egg_info for package virtualenv
>
> warning: no previously-included files matching '*' found under
> directory 'docs/_templates'
> warning: no previously-included files matching '*' found under
> directory 'docs/_build'
> Installing collected packages: virtualenv
>   Running setup.py install for virtualenv
> error: /Library/Python/2.7/site-packages/virtualenv.py: Permission
> denied
> Complete output from command /usr/bin/python -c "import
> setuptools;__file__='/var/folders/4r/jxvj6v_j5571vbjxkx_jbdy8gp/T/pip-build/virtualenv/setup.py';exec(compile(open(__file__).read().replace('\r\n',
> '\n'), __file__, 'exec'))" install --record
> /var/folders/4r/jxvj6v_j5571vbjxkx_jbdy8gp/T/pip-S9mDRc-record/install-record.txt
> --single-version-externally-managed:
> running install
>
> running build
>
> running build_py
>
> running install_lib
>
> copying build/lib/virtualenv.py -> /Library/Python/2.7/site-packages
>
> error: /Library/Python/2.7/site-packages/virtualenv.py: Permission denied
>

Your user account does not have permission to install the package in the
place where you want to install it.


>
> 
> Command /usr/bin/python -c "import
> setuptools;__file__='/var/folders/4r/jxvj6v_j5571vbjxkx_jbdy8gp/T/pip-build/virtualenv/setup.py';exec(compile(open(__file__).read().replace('\r\n',
> '\n'), __file__, 'exec'))" install --record
> /var/folders/4r/jxvj6v_j5571vbjxkx_jbdy8gp/T/pip-S9mDRc-record/install-record.txt
> --single-version-externally-managed failed with error code 1 in
> /var/folders/4r/jxvj6v_j5571vbjxkx_jbdy8gp/T/pip-build/virtualenv
> Storing complete log in /Users/jd3/Library/Logs/pip.log
>
> When I give the input of easy_install virtualenv:
>
> error: can't create or remove files in install directory
>
> The following error occurred while trying to add or remove files in the
> installation directory:
>
> [Errno 13] Permission denied:
> '/Library/Python/2.7/site-packages/test-easy-install-6258.write-test'
>

This problem is exactly the same. It doesn't matter whether you use pip or
easy_install, you need to be an administrator to install the package in
that location.  See the rest of the message:


> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
>
> /Library/Python/2.7/site-packages/
>
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation
> directory, preferably one that is listed in your PYTHONPATH environment
> variable.
>

There are two ways around this:

1) Run those commands as root. I don't use OSX but I believe the command is:
$ sudo pip install virtualenv

2)  Install into your user directory. I don't know if there's anything that
needs to be done to make this work on OSX but I can do this with:
$ pip install --user virtualenv

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: List Problem

2012-09-23 Thread Oscar Benjamin
On 23 September 2012 22:31, jimbo1qaz  wrote:

> I have a nested list. Whenever I make a copy of the list, changes in one
> affect the other, even when I use list(orig) or even copy the sublists one
> by one. I have to manually copy each cell over for it to work.
> Link to broken code: http://jimbopy.pastebay.net/1090401


There are many things wrong with that code but I can't tell what you're
referring to. Can you paste the code into your post (rather than just a
link to it)? Can you also explain what you want it to do and at what point
it does the wrong thing?
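
In the meantime, here's my guess at what's happening (a minimal sketch,
assuming your copies are shallow): list(orig) copies only the outer list, so
the sublists are still shared; copy.deepcopy copies them too:

    import copy

    orig = [[1, 2], [3, 4]]
    shallow = list(orig)         # new outer list, shared sublists
    deep = copy.deepcopy(orig)   # fully independent copy
    orig[0][0] = 99
    print shallow[0][0]  # 99 -- the change shows through the shallow copy
    print deep[0][0]     # 1  -- the deep copy is unaffected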

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Editing Inkscape SVG files with Python?

2012-09-23 Thread Oscar Benjamin
On 23 September 2012 23:53, Steven D'Aprano <
steve+comp.lang.pyt...@pearwood.info> wrote:

> I have some SVG files generated with Inkscape containing many text blocks
> (over 100). I wish to programmatically modify those text blocks using
> Python. Is there a library I should be using, or any other guidelines or
> advice anyone can give me?
>
> Googling for "python inkscape" comes up with too many hits for Inkscape's
> plugin system to be much help to me.
>

I thought for a moment that PyX would do it. I just checked their roadmap
though and SVG support is "not started":
http://pyx.sourceforge.net/roadmap.html

Since SVG files are a type of XML and you only want to modify the text
blocks can you not just use an XML library?
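
For instance, a minimal sketch with the stdlib ElementTree (the filenames and
the edit itself are made up; Inkscape text usually lives in <text>/<tspan>
elements):

    import xml.etree.ElementTree as ET

    SVG = '{http://www.w3.org/2000/svg}'
    tree = ET.parse('drawing.svg')
    for tspan in tree.getroot().iter(SVG + 'tspan'):
        if tspan.text:
            tspan.text = tspan.text.upper()  # whatever edit is needed
    tree.write('drawing-modified.svg')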

Alternatively, if you don't get an answer here it might be worth trying the
PyX-user list.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Java singletonMap in Python

2012-09-23 Thread Oscar Benjamin
On 24 September 2012 00:14, Mark Lawrence  wrote:

> Purely for fun I've been porting some code to Python and came across the
> singletonMap[1].  I'm aware that there are loads of recipes on the web for
> both singletons e.g.[2] and immutable dictionaries e.g.[3].  I was
> wondering how to combine any of the recipes to produce the best
> implementation, where to me best means cleanest and hence most
> maintainable.  I then managed to muddy the waters for myself by recalling
> the Alex Martelli Borg pattern[4].  Possibly or even probably the latter is
> irrelevant, but I'm still curious to know how you'd code this beast.
>

What exactly is wanted when an attempt is made to instantiate an instance?
Should it raise an error or return the previously created instance?

This attempt makes all calls to __new__ after the first return the same
instance:

def singleton(cls):
    instance = None
    class sub(cls):
        def __new__(cls_, *args, **kwargs):
            nonlocal instance
            if instance is None:
                instance = super(sub, cls_).__new__(cls_, *args, **kwargs)
            return instance
    sub.__name__ = cls.__name__
    return sub

@singleton
class A(object):
    pass

print(A() is A())

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list

