Re: recv_into(bytearray) complains about a "pinned buffer"

2010-02-01 Thread Andrew Dalke
On Feb 2, 12:12 am, Martin v. Loewis wrote:
> My recommendation would be to not use recv_into in 2.x, but only in 3.x.

> I don't think that's the full solution. The array module should also
> implement the new buffer API, so that it would also fail with the old
> recv_into.

Okay. But recv_into was added in 2.5 and the test case in
2.6's test_socket.py clearly allows an array there:


def testRecvInto(self):
buf = array.array('c', ' '*1024)
nbytes = self.cli_conn.recv_into(buf)
self.assertEqual(nbytes, len(MSG))
msg = buf.tostring()[:len(MSG)]
self.assertEqual(msg, MSG)

Checking the Koders and Google Code search engines, I found one project
which uses recv_into, in a file named bmpreceiver.py . It
uses an array.array("B", [0] * length) .

Clearly it was added to work with an array, and it's
being used with an array. Why shouldn't people use it
with Python 2.x?
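For anyone who wants to check this without hitting a real server, here's a
self-contained sketch using a connected local socket pair (Python 3 spelling
such as tobytes(); socket.socketpair() is POSIX-only on older versions):

```python
import array
import socket

# Create a connected pair of sockets so recv_into() can be exercised locally.
left, right = socket.socketpair()
left.sendall(b"hello")

buf = array.array("b", b"\x00" * 16)   # a writable 16-byte buffer
nbytes = right.recv_into(buf)

assert nbytes == 5
assert buf.tobytes()[:5] == b"hello"

left.close()
right.close()
```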

Andrew
da...@dalkescientific.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: recv_into(bytearray) complains about a "pinned buffer"

2010-01-31 Thread Andrew Dalke
On Feb 1, 1:04 am, Antoine Pitrou  wrote:
> The problem is that socket.recv_into() in 2.6 doesn't recognize the new
> buffer API which is needed to accept bytearray objects.
> (it does in 3.1, because the old buffer API doesn't exist anymore there)

That's about what I thought it was, but I don't know if this was a
deliberate choice or accidental.

BTW, 2.7 (freshly built from version control) also has the same
exception.

> You could open an issue on the bug tracker for this.

I've done that. It's http://bugs.python.org/issue7827 .

Cheers!
Andrew
da...@dalkescientific.com



recv_into(bytearray) complains about a "pinned buffer"

2010-01-31 Thread Andrew Dalke
In Python 2.6 I can't socket.recv_into(a byte array instance). I get a
TypeError which complains about a "pinned buffer". I have only an
inkling of what that means. Since an array.array("b") works there, and
since it works in Python 3.1.1, and since I thought the point of a
bytearray was to make things like recv_into easier, I think this
exception is a bug in Python 2.6.

I want to double check before posting it to the tracker.

Here's my reproducible test case:

Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n")
18
>>> buf = bytearray(b" " * 10)
>>> sock.recv_into(buf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: recv_into() argument 1 must be pinned buffer, not bytearray
>>>

I expected a bytearray to work there. In fact, I thought the point of
bytearray was to allow this to work.

By comparison, an array of bytes does work:

>>> import array
>>> arr = array.array("b")
>>> arr.extend(map(ord, "This is a test"))
>>> len(arr)
14
>>> sock.recv_into(arr)
14
>>> arr
array('b', [72, 84, 84, 80, 47, 49, 46, 49, 32, 51, 48, 50, 32, 70])
>>> "".join(map(chr, arr))
'HTTP/1.1 302 F'

I don't even know what a "pinned buffer" means, and searching
python.org isn't helpful.

Using a bytearray in Python 3.1.1 *does* work:

Python 3.1.1 (r311:74480, Jan 31 2010, 23:07:16)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n")
18
>>> buf = bytearray(b" " * 10)
>>> sock.recv_into(buf)
10
>>> buf
bytearray(b'HTTP/1.1 3')
>>>

Is this a bug in Python 2.6 or a deliberate choice regarding
implementation concerns I don't know about?

If it's a bug, I'll add it to the tracker.

Andrew Dalke
da...@dalkescientific.com


Re: Update the sorting mini-howto

2005-11-29 Thread Andrew Dalke
I wrote:
> Years ago I wrote the Sorting mini-howto, currently at
>
>   http://www.amk.ca/python/howto/sorting/sorting.html

Thanks to amk it's now on the Wiki at
   http://wiki.python.org/moin/HowTo/Sorting

so feel free to update it directly.

Andrew
[EMAIL PROTECTED]



Update the sorting mini-howto

2005-11-29 Thread Andrew Dalke
Years ago I wrote the Sorting mini-howto, currently at

   http://www.amk.ca/python/howto/sorting/sorting.html

I've had various people thank me for that, in person and
through email.

It's rather out of date now given the decorate-sort-undecorate
option and 'sorted' functions in Python 2.4.  Hmmm, and perhaps
also some mention of rich comparisons.

I don't particularly want to update it myself so I'm tossing it
to the winds.  Anyone here want to take care of it?  I'll
provide feedback if you want it.

Email me if you're interested.

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-31 Thread Andrew Dalke
Peter Hansen wrote:
> A scattered assortment of module-level global function names, and 
> builtins such as open(), make it extraordinarily difficult to do 
> effective and efficient automated testing with "mock" objects.

I have been able to do this by inserting my own module-scope function
that intercepts the lookup before it gets to builtins.  A problem
though is that a future (Python 3K?) Python may not allow that.

For example,

module.open = mock_open
try:
  ...
finally:
  module.open = open

By looking at the call stack it is possible to replace the built-in
open to have new behavior only when called from specific modules or
functions, but that gets to be rather hairy.

> Object-oriented solutions like Path make it near trivial to substitute a 
> mock or other specialized object which (duck typing) acts like a Path 
> except perhaps for actually writing the file to disk, or whatever other 
> difference you like.

By analogy to the other builtins, another solution is to have a
protocol by which open() dispatches to an instance defined method.

> So, for the PEP, another justification for Path is that its use can 
> encourage better use of automated testing techniques and thereby improve 
> the quality of Python software, including in the standard library.

But then what does the constructor for the file object take?

I've also heard mention that a future (Py3K era) 'open' may allow
URLs and not just a path string.

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Peter Otten wrote:
> Seems my description didn't convince you. So here's an example:

Got it.  In my test case the longest element happened to be the last
one, which is why it didn't catch the problem.

Thanks.

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Scott David Daniels wrote:
> Can I play too? How about:

Sweet!


Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Me:
>> Could make it one line shorter with
>  
>> from itertools import chain, izip, repeat
>> def fillzip(*seqs):
>> def done_iter(done=[len(seqs)]):
>> done[0] -= 1
>> if not done[0]:
>> return []
>> return repeat(None)
>> seqs = [chain(seq, done_iter()) for seq in seqs]
>> return izip(*seqs)

Peter Otten:
> that won't work because done_iter() is now no longer a generator.
> In effect you just say
> 
> seqs = [chain(seq, repeat(None)) for seq in seqs[:-1]] + [chain(seq[-1],
> [])]

It does work - I tested it.  The trick is that izip takes iter()
of the terms passed into it.  iter([]) -> an empty iterator and
iter(repeat(None)) -> the repeat(None) itself.

'Course then the name should be changed.
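The iter() claim is easy to verify directly (modern Python shown; 2.x izip
relied on the same behavior):

```python
from itertools import repeat

# iter() of an empty list builds a fresh, already-exhausted iterator...
assert list(iter([])) == []

# ...while iter() of something that is already an iterator returns it unchanged.
r = repeat(None)
assert iter(r) is r
```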

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Peter Otten wrote:
> Combining your "clever" and your "elegant" approach to something fast
> (though I'm not entirely confident it's correct):
> 
> def fillzip(*seqs):
> def done_iter(done=[len(seqs)]):
> done[0] -= 1
> if not done[0]:
> return
> while 1:
> yield None
> seqs = [chain(seq, done_iter()) for seq in seqs]
> return izip(*seqs)

Ohh, that's pretty neat passing in 'done' via a mutable default argument.

It took me a bit to even realize why it does work.  :)

Could make it one line shorter with

from itertools import chain, izip, repeat
def fillzip(*seqs):
def done_iter(done=[len(seqs)]):
done[0] -= 1
if not done[0]:
return []
return repeat(None)
seqs = [chain(seq, done_iter()) for seq in seqs]
return izip(*seqs)

Go too far on that path and the code starts looking like

from itertools import chain, izip, repeat
forever, table = repeat(None), {0: []}.get
def fillzip(*seqs):
def done_iter(done=[len(seqs)]):
done[0] -= 1
return table(done[0], forever)
return izip(*[chain(seq, done_iter()) for seq in seqs])

Now add the performance tweak

  def done_iter(done=[len(seqs)], forever=forever, table=table)

Okay, I'm over it.  :)
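For the record, itertools later grew exactly this function: izip_longest in
2.6, renamed zip_longest in 3.x. A quick sketch of the equivalent behavior:

```python
from itertools import zip_longest

# zip_longest pads the shorter sequences with fillvalue (None by default).
rows = list(zip_longest("This", "is"))
assert rows == [("T", "i"), ("h", "s"), ("i", None), ("s", None)]
```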

Andrew
[EMAIL PROTECTED]



Re: os._exit vs. sys.exit

2005-07-28 Thread Andrew Dalke
Bryan wrote:
> Why does os._exit called from a Python Timer kill the whole process while 
> sys.exit does not?  On Suse.

os._exit calls the C function _exit() which does an immediate program
termination.  See for example
  
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man2/_exit.2.html
and note the statement "can never return".

sys.exit() is identical to "raise SystemExit()".  It raises a Python
exception which may be caught at a higher level in the program stack.
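A minimal illustration of the difference on the sys.exit() side:

```python
import sys

# sys.exit() merely raises SystemExit, so it can be caught like any other
# exception; os._exit() would terminate before an except clause could run.
try:
    sys.exit(2)
except SystemExit as exc:
    code = exc.code

assert code == 2
```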

Andrew
[EMAIL PROTECTED]




Re: can list comprehensions replace map?

2005-07-28 Thread Andrew Dalke
Christopher Subich wrote:
> My  naive solution:
  ...
>for i in ilist:
>   try:
>  g = i.next()
>  count += 1
>   except StopIteration: # End of iter
>  g = None
  ...

What I didn't like about this was the extra overhead of all
the StopIteration exceptions.  Eg, 

zipfill("a", range(1000))

will raise 1000 exceptions (999 for "a" and 1 for the end of the range).

But without doing timing tests I'm not sure which approach is
fastest, and it may depend on the data set.

Since this is code best not widely used, I don't think it's something
anyone should look into either.  :)

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-28 Thread Andrew Dalke
Me:
> Here's a clever, though not (in my opinion) elegant solution
 ...
> This seems a bit more elegant, though the "replace" dictionary is
> still a bit of a hack

Here's the direct approach without using itertools.  Each list is
iterated over only once.  No test against a sequence element is ever
made (either as == or 'is') and the end of the sequence exception
is raised only once per input iterator.

The use of a list for the flag is a bit of a hack.  If the list has
1 element then it's true; if it has no elements then it's false.  By doing it
this way I don't need one extra array and one extra indexing/enumeration.

def zipfill(*seqs):
count = len(seqs)
seq_info = [(iter(seq), [1]) for seq in seqs]
while 1:
fields = []
for seq, has_data in seq_info:
if has_data:
try:
fields.append(seq.next())
except StopIteration:
fields.append(None)
del has_data[:]
count -= 1
else:
fields.append(None)
if count:
yield fields
else:
break


Hmm, it should probably yield tuple(fields)

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-28 Thread Andrew Dalke
Steven Bethard wrote:
> Here's one possible solution:
> 
> py> import itertools as it
> py> def zipfill(*lists):
> ...   max_len = max(len(lst) for lst in lists)

A limitation to this is the need to iterate over the
lists twice, which might not be possible if one of them
is a file iterator.

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
count = [len(seqs)]
def _forever(seq):
for item in seq: yield item
count[0] -= 1
while 1: yield None
seqs = [_forever(seq) for seq in seqs]
while 1:
x = [seq.next() for seq in seqs]
if count == [0]:
break
yield x

for x in zipfill("This", "is", "only", "a", "test."):
print x

This generates

['T', 'i', 'o', 'a', 't']
['h', 's', 'n', None, 'e']
['i', None, 'l', None, 's']
['s', None, 'y', None, 't']
[None, None, None, None, '.']

This seems a bit more elegant, though the "replace" dictionary is
still a bit of a hack

from itertools import repeat, chain, izip

sentinel = object()
end_of_stream = repeat(sentinel)

def zipfill(*seqs):
replace = {sentinel: None}.get
seqs = [chain(seq, end_of_stream) for seq in seqs]
for term in izip(*seqs):
for element in term:
if element is not sentinel:
break
else:
# All sentinels
break

yield [replace(element, element) for element in term]


(I originally had an "element == tuple([sentinel]*len(seqs))" check
but didn't like all the == tests it incurred.)

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-27 Thread Andrew Dalke
David Isaac wrote:
> I have been generally open to the proposal that list comprehensions
> should replace 'map', but I ran into a need for something like
> map(None,x,y)
> when len(x)>len(y).  I cannot it seems use 'zip' because I'll lose
> info from x.  How do I do this as a list comprehension? (Or,
> more generally, what is the best way to do this without 'map'?)

If you know that len(x)>=len(y) and you want the same behavior as
map() you can use itertools to synthesize a longer iterator


>>> x = [1,2,3,4,5,6]
>>> y = "Hi!"
>>> from itertools import repeat, chain
>>> zip(x, chain(y, repeat(None)))   
[(1, 'H'), (2, 'i'), (3, '!'), (4, None), (5, None), (6, None)]
>>> 

This doesn't work if you want the result to be max(len(x), len(y))
in length - the result has length len(x).

As others suggested, if you want to use map, go ahead.  It won't
disappear for a long time and even if it does it's easy to
retrofit if needed.

Andrew
[EMAIL PROTECTED]



Re: how to write a line in a text file

2005-07-26 Thread Andrew Dalke
> [EMAIL PROTECTED] wrote:
>> Well, it's what (R)DBMS are for, but plain files are not.

Steven D'Aprano wrote:
> This isn't 1970, users expect more from professional 
> programs than "keep your fingers crossed that nothing 
> bad will happen". That's why applications have multiple 
> levels of undo (and some of them even save the undo 
> history in the file) and change-tracking, and auto-save 
> and auto-backup. 

This isn't 1970.  Why does your app code work directly with
files?  Use an in-process database library (ZODB, SQLite,
BerkeleyDB, etc.) to maintain your system state and let the
library handle transactions for you.
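As a sketch of what that buys you, here's the stdlib's sqlite3 module (added
in Python 2.5) handling atomic commit/rollback; the table and column names
are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (key TEXT PRIMARY KEY, value TEXT)")

# The connection as a context manager commits on success and rolls the
# whole transaction back if the block raises.
with conn:
    conn.execute("INSERT INTO state VALUES (?, ?)", ("mode", "draft"))

row = conn.execute("SELECT value FROM state WHERE key = ?", ("mode",)).fetchone()
assert row == ("draft",)
```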

Andrew
[EMAIL PROTECTED]



Re: [path-PEP] Path inherits from basestring again

2005-07-25 Thread Andrew Dalke
> Reinhold Birkenfeld wrote:
>> Current change:
>> 
>> * Add base() method for converting to str/unicode.

Now that [:] slicing works, and returns a string,
another way to convert from path.Path to str/unicode
is path[:]

Andrew
[EMAIL PROTECTED]



Re: [path-PEP] Path inherits from basestring again

2005-07-24 Thread Andrew Dalke
Reinhold Birkenfeld wrote:
> Okay. While a path has its clear use cases and those don't need above methods,
> it may be that some brain-dead functions needs them.

"brain-dead"?

Consider this code, which I think is not atypical.

import sys

def _read_file(filename):
  if filename == "-":
# Can use '-' to mean stdin
return sys.stdin
  else:
return open(filename, "rU")


def file_sum(filename):
  total = 0
  for line in _read_file(filename):
total += int(line)
  return total

(Actually, I would probably write it

def _read_file(file):
  if isinstance(file, basestring):
    if file == "-":
      # Can use '-' to mean stdin
      return sys.stdin
    else:
      return open(file, "rU")
  return file

)

Because the current sandbox Path doesn't support
the is-equal test with strings, the above function
won't work with a filename = path.Path("-").  It
will instead raise an exception saying
  IOError: [Errno 2] No such file or directory: '-'

(Yes, the code as-is can't handle a file named '-'.
The usual workaround (and there are many programs
which support '-' as an alias for stdin) is to use "./-"

% cat > './-'
This is a file
% cat ./-
This is a file
% cat -
I'm typing directly into stdin.
^D
I'm typing directly into stdin.
% 
)


If I start using the path.Path then in order to use
this function my upstream code must be careful on
input to distinguish between filenames which are
really filenames and which are special-cased pseudo
filenames.

Often the code using the API doesn't even know which
names are special.  Even if it is documented,
the library developer may decide in the future to
extend the list of pseudo filenames to include, say,
environment variable style expansion, as
  $HOME/.config

Perhaps the library developer should have come up
with a new naming system to include both types of
file naming schemes, but that's rather overkill.

As a programmer calling the API should I convert
all my path.Path objects to strings before using it?
Or to Unicode?  How do I know which filenames will
be treated specially through time?

Is there a method to turn a path.Path into the actual
string?  str() and unicode() don't work because I
want the result to be unicode if the OS&Python build
support it, otherwise string.

Is that library example I mentioned "brain-dead"?
I don't think so.  Instead I think you are pushing
too much for purity and making changes that will
cause problems - and hard to fix problems - with
existing libraries.



Here's an example of code from an existing library
which will break in several ways if it's passed a
path object instead of a string.  It comes from
spambayes/mboxutils.py

#

This is mostly a wrapper around the various useful classes in the
standard mailbox module, to do some intelligent guessing of the
mailbox type given a mailbox argument.

+foo  -- MH mailbox +foo
+foo,bar  -- MH mailboxes +foo and +bar concatenated
+ALL  -- a shortcut for *all* MH mailboxes
/foo/bar  -- (existing file) a Unix-style mailbox
/foo/bar/ -- (existing directory) a directory full of .txt and .lorien
 files
/foo/bar/ -- (existing directory with a cur/ subdirectory)
 Maildir mailbox
/foo/Mail/bar/ -- (existing directory with /Mail/ in its path)
 alternative way of spelling an MH mailbox


def getmbox(name):
"""Return an mbox iterator given a file/directory/folder name."""

if name == "-":
return [get_message(sys.stdin)]

if name.startswith("+"):
# MH folder name: +folder, +f1,f2,f2, or +ALL
name = name[1:]
import mhlib
mh = mhlib.MH()
if name == "ALL":
names = mh.listfolders()
elif ',' in name:
names = name.split(',')
else:
names = [name]
mboxes = []
mhpath = mh.getpath()
for name in names:
filename = os.path.join(mhpath, name)
mbox = mailbox.MHMailbox(filename, get_message)
mboxes.append(mbox)
if len(mboxes) == 1:
return iter(mboxes[0])
else:
return _cat(mboxes)

if os.path.isdir(name):
# XXX Bogus: use a Maildir if /cur is a subdirectory, else a MHMailbox
# if the pathname contains /Mail/, else a DirOfTxtFileMailbox.
if os.path.exists(os.path.join(name, 'cur')):
mbox = mailbox.Maildir(name, get_message)
elif name.find("/Mail/") >= 0:
mbox = mailbox.MHMailbox(name, get_message)
else:
mbox = DirOfTxtFileMailbox(name, get_message)
else:
fp = open(name, "rb")
mbox = mailbox.PortableUnixMailbox(fp, get_message)
return iter(mbox)



It breaks with the current sandbox path because:
  - a path can't be compared to "-"
  - slicing isn't supported, as in "name = name[1:]"

note that this example uses __contains__ ("," in name)


Is this function brain-dead?  Is it reasonable that people might
want to pass a path.Path

Re: unit test nested functions

2005-07-23 Thread Andrew Dalke
Andy wrote:
> How can you unit test nested functions?

I can't think of a good way.  When I write a nested function it's because
the function uses variables from the scope of the function in which it's
embedded, which means it makes little sense to test it independent of the
larger function.

My tests in that case are only of the enclosing function.

> Or do you have to pull them out to
> unit test them, which basically means I will never use nested functions.

You don't test every line in a function by itself, right?  Nor
every loop in a function.  It should be possible to test the outer
function enough that the implementation detail - of using an inner
function - doesn't make much difference.

> Also, same thing with private member functions protected by __.  Seems
> like there is a conflict there between using these features and unit
> testing.

In that case the spec defines that the real attribute name is of
the form "_<classname>__<methodname>".  For example

>>> class Spam:
...   def __sing(self):
... print "I don't see any Vikings."
... 
>>> spam = Spam()
>>> spam._Spam__sing()
I don't see any Vikings.
>>> 

I've found though that the double-leading-underscore is overkill.
Using a single underscore is enough of a hint that the given
method shouldn't be called directly.

Then again, I don't write enough deep hierarchies where I need
to worry about a subclass implementation using the same private
name as a superclass.
Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-23 Thread Andrew Dalke
George Sakkis wrote:
> That's why phone numbers would be a subset of integers, i.e. not every
> integer would correspond to a valid number, but with the exception of
> numbers starting with zeros, all valid numbers would be an integers.

But it's that exception which violates the LSP.

With numbers, if x==y then (x,y) = (y,x) makes no difference.
If phone numbers are integers then 001... == 01... but swapping
those two numbers makes a difference.  Hence they cannot be modeled
as integers.

> Regardless, this was not my point; the point was that adding
> two phone numbers or subtracting them never makes sense semantically.

I agree. But modeling them as integers doesn't make sense either.
Your example of adding phone numbers depends on them being represented
as integers.  Since that representation doesn't work, it makes sense
that addition of phone number is suspect.

> There are (at least) two frequently used path string representations,
> the absolute and the relative to the working directory. Which one *is*
> the path? Depending on the application, one of them would be more
> natural choice than the other.

Both.  I don't know why one is more natural than the other.

>> I trust my intuition on this, I just don't know how to justify it, or
>> correct it if I'm wrong.
> 
> My intuition also happens to support subclassing string, but for
> practical reasons rather than conceptual.

As you may have read elsewhere in this thread, I give some examples
of why subclassing from string fits best with existing code.

Even if there was no code base, I think deriving from string is the
right approach.  I have a hard time figuring out why though.  I think
if the lowest level Python/C interface used a "get the filename"
interface then perhaps it wouldn't make a difference.  Which means
I'm also more guided by practical reasons than conceptual.

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
George Sakkis wrote:
> Bringing up how C models files (or anything else other than primitive types
> for that matter) is not a particularly strong argument in a discussion on
> OO design ;-)

While I have worked with C libraries which had a well-developed
OO-like interface, I take your point.

Still, I think that the C model of a file system should be a
good fit since, after all, C and Unix were developed hand in hand.  If
there weren't a good match then some of the C path APIs would be
confusing or complicated.  Since I don't see that, it suggests that
the "path is-a string" model is at least reasonable.

> Liskov substitution principle imposes a rather weak constraint

Agreed.  I used that as an example of the direction I wanted to
go.  What principles guide your intuition of what is a "is-a"
vs a "has-a"?

> Take for example the case where a PhoneNumber class is subclass
> of int. According to LSP, it is perfectly ok to add phone numbers
> together, subtract them, etc, but the result, even if it's a valid
> phone number, just doesn't make sense.

Mmm, I don't think an integer is a good model of a phone number.
For example, in the US
  00148762040828
will ring a mobile number in Sweden while
  148762040828
will give a "this isn't a valid phone number" message.

Yet both have the same base-10 representation.  (I'm not using
a syntax where leading '0' indicates an octal number. :)

> I wouldn't say more complicated, but perhaps less intuitive in a few cases, 
> e.g.:
> 
>> path(r'C:\Documents and Settings\Guest\Local Settings').split()
> ['C:\\Documents', 'and', 'Settings\\Guest\\Local', 'Settings']
> instead of
> ['C:', 'Documents and Settings', 'Guest', 'Local Settings']

That is why the path module uses a different method to split
on pathsep than to split on whitespace.  I get what you are saying, I just
think it's roughly equivalent to appealing to LSP in terms of weight.

Mmm, then there's a question of the usefulness of ".lower()" and
".expandtabs()" and similar methods.  Hmmm

> I just noted that conceptually a path is a composite object consisting of
> many properties (dirname, extension, etc.) and its string representation
> is just one of them. Still, I'm not suggesting that a 'pure' solution is
> better that a more practical that covers most usual cases.

For some reason I think that

  path.dirname()

is better than

  path.dirname

Python has properties now so the implementation of the latter is
trivial - put a @property on the line before the "def dirname(self):".

I think that the string representation of a path is so important that
it *is* the path.  The other things you call properties aren't quite
properties in my model of a path and are more like computable values.

I trust my intuition on this, I just don't know how to justify it, or
correct it if I'm wrong.

Andrew
[EMAIL PROTECTED]



Re: Something that Perl can do that Python can't?

2005-07-22 Thread Andrew Dalke
Dr. Who wrote:
> Well, I finally managed to solve it myself by looking at some code.
> The solution in Python is a little non-intuitive but this is how to get
> it:
> 
> while 1:
> line = stdout.readline()
> if not line:
> break
> print 'LINE:', line,
> 
> If anyone can do it the more Pythonic way with some sort of iteration
> over stdout, please let me know.

Python supports two different but related iterators over
lines of a file.  What you show here is the oldest way.
It reads up to the newline (or eof) and returns the line.

The newer way is

  for line in stdout:
...

which is equivalent to

  _iter = iter(stdout)
  while 1:
try:
  line = _iter.next()
except StopIteration:
  break

...

The file.__iter__() is implemented by doing
a block read and internally breaking the block
into lines.  This make the read a lot faster
because it does a single system call for the
block instead of a system call for every
character read.  The downside is that the read
can block (err, a different use of "block")
waiting for enough data.

If you want to use the for idiom and have
the guaranteed "no more than a line at a time"
semantics, try this 

  for line in iter(stdout.readline, ""):
print "LINE:", line
sys.stdout.flush()

Andrew
[EMAIL PROTECTED]



Re: Iterators from urllib2

2005-07-22 Thread Andrew Dalke
Joshua Ginsberg wrote:

>  >>> dir(ifs)
> ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',  
> 'fileno', 'fp', 'geturl', 'headers', 'info', 'next', 'read',  
> 'readline', 'readlines', 'url']
> 
> Yep. But what about in my code? I modify my code to print dir(ifs)  
> before creating the DictReader...
> 
> ['__doc__', '__init__', '__module__', '__repr__', 'close', 'fp',  
> 'geturl', 'headers', 'info', 'read', 'readline', 'url']
 ...
> Whoa! Where did the __iter__, readlines, and next attributes
> go? Ideas?

That difference comes from this code in urllib.py:addbase

class addbase:
"""Base class for addinfo and addclosehook."""

def __init__(self, fp):
self.fp = fp
self.read = self.fp.read
self.readline = self.fp.readline
if hasattr(self.fp, "readlines"): self.readlines = self.fp.readlines
if hasattr(self.fp, "fileno"): self.fileno = self.fp.fileno
if hasattr(self.fp, "__iter__"):
self.__iter__ = self.fp.__iter__
if hasattr(self.fp, "next"):
self.next = self.fp.next

It looks like the fp for your latter code
doesn't have the additional properties.  Try
adding the following debug code to figure out
what's up

print dir(ifs)
print "fp=", ifs.fp
print "dir(fp)", dir(ifs.fp)

Odds are you'll get different results.

Andrew
[EMAIL PROTECTED]



Re: Difference between " and '

2005-07-22 Thread Andrew Dalke
François Pinard wrote:
> There is no strong reason to use one and avoid the other.  Yet, while
> representing strings, Python itself has a _preference_ for single
> quotes. 

I use "double quoted strings" in almost all cases because I
think they're easier to see than 'single quoted strings'.

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
George Sakkis wrote:
> You're right, conceptually a path
> HAS_A string description, not IS_A string, so from a pure OO point of
> view, it should not inherit string.

How did you decide it's "has-a" vs. "is-a"?

All C calls use a "char *" for filenames and paths,
meaning the C model file for the filesystem says
paths are strings.

Paths as strings fit the Liskov substitution principle
in that any path object can be used any time a
string is used (eg, "loading from " + filename)

Good information hiding suggests that a better API
is one that requires less knowledge.  I haven't
seen an example of how deriving from (unicode)
string makes things more complicated than not doing so.

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
Duncan Booth wrote:
> Personally I think the concept of a specific path type is a good one, but 
> subclassing string just cries out to me as the wrong thing to do.

I disagree.  I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, "open" and "mkdir" take strings as input.  There is no
automatic coercion.

>>> class Spam:
...   def __getattr__(self, name):
... print "Want", repr(name)
... raise AttributeError, name
... 
>>> open(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
>>> import os
>>> os.mkdir(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
>>> 

The solutions to this are:
  1) make the path object be derived from str or unicode.  Doing
this does not conflict with any OO design practice (eg, Liskov
substitution).

  2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like


  if isinstance(input, basestring):
      input = open(input, "rU")
  for line in input:
      print line

I showed several places in the stdlib and in 3rd party packages
where this is used.


> In other words, to me a path represents something in a filesystem,

Being picky - or something that could be in a filesystem.

> the fact that it 
> has one, or indeed several string representations does not mean that the 
> path itself is simply a more specific type of string.

I didn't follow this.

> You should need an explicit call to convert a path to a string and that 
> forces you when passing the path to something that requires a string to 
> think whether you wanted the string relative, absolute, UNC, uri etc.

You are broadening the definition of a file path to include URIs?
That's making life more complicated.  Eg, the rules for joining
file paths may be different than the rules for joining URIs.
Consider if I have a file named "mail:[EMAIL PROTECTED]" and I
join that with "file://home/dalke/badfiles/".

Additionally, the actions done on URIs are different than on file
paths.  What should os.listdir("http://www.python.org/") do?

As I mentioned, I tried some classes which emulated file
paths.  One was something like

class TempDir:
  """removes the directory when the refcount goes to 0"""
  def __init__(self):
    self.filename = ...  # use a function from the tempfile module
  def __del__(self):
    if os.path.exists(self.filename):
      shutil.rmtree(self.filename)
  def __str__(self):
    return self.filename

I could do

  dirname = TempDir()

but then instead of

  os.mkdir(dirname)
  tmpfile = os.path.join(dirname, "blah.txt")

I needed to write it as

  os.mkdir(str(dirname))
  tmpfile = os.path.join(str(dirname), "blah.txt"))

or have two variables, one which could delete the
directory and the other for the name.  I didn't think
that was good design.


If I had derived from str/unicode then things would
have been cleaner.
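For comparison, a hedged sketch of what the str-derived version might look like (tempfile.mkdtemp both picks the name and creates the directory, so the separate os.mkdir step goes away):

```python
import os
import shutil
import tempfile

class TempDir(str):
    """A str subclass naming a temporary directory; the directory is
    removed when the object is garbage collected."""
    def __new__(cls):
        return str.__new__(cls, tempfile.mkdtemp())
    def __del__(self):
        if os.path.exists(self):
            shutil.rmtree(self)

dirname = TempDir()
tmpfile = os.path.join(dirname, "blah.txt")   # no str() wrapper needed
```

One variable serves as both the deletion handle and the name.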

Please note, btw, that some filesystems are unicode
based and others are not.  As I recall, one nice thing
about the path module is that it chooses the appropriate
base class at import time.  My "str()" example above
does not and would fail on a Unicode filesystem aware
Python build.

> It may even be that we need a hierarchy of path
> classes: URLs need similar but not identical manipulations
> to file paths, so if we want to address the failings
> of os.path perhaps we should also look at the failings 
> of urlparse at the same time.

I've found that hierarchies are rarely useful compared
to the number of times they are proposed and used.  One
of the joys to me of Python is its deemphasis of class
hierarchies.

I think the same is true here.  File paths and URIs are
sufficiently different that there are only a few bits
of commonality between them.  Consider 'split' which
for files creates (dirname, filename) while for urls
it creates (scheme, netloc, path, query, fragment)
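The difference is easy to see interactively (urlsplit lives in urllib.parse in modern Python; in the 2.x era it was the urlparse module):

```python
import os.path
from urllib.parse import urlsplit  # the 'urlparse' module in Python 2

# File path split: a 2-tuple of (dirname, filename)
print(os.path.split("/home/dalke/blah.txt"))
# ('/home/dalke', 'blah.txt')

# URL split: five named fields (scheme, netloc, path, query, fragment)
parts = urlsplit("http://www.python.org/doc/?q=1#intro")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
```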

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
Michael Hoffman wrote:
> Having path descend from str/unicode is extremely useful since I can 
> then pass a path object to any function someone else wrote without 
> having to worry about whether they were checking for basestring. I think 
> there is a widely used pattern of accepting either a basestring[1] or a 
> file-like object as a function argument, and using isinstance() to 
> figure out which it is.

Reinhold Birkenfeld wrote:
> Where do you see that pattern? IIRC it's not in the stdlib.

Here's the first place that comes to mind for me

xml.sax.saxutils

def prepare_input_source(source, base = ""):
    """This function takes an InputSource and an optional base URL and
    returns a fully resolved InputSource object ready for reading."""

    if type(source) in _StringTypes:
        source = xmlreader.InputSource(source)
    elif hasattr(source, "read"):
        f = source
        source = xmlreader.InputSource()
        source.setByteStream(f)
        if hasattr(f, "name"):
            source.setSystemId(f.name)


and xml.dom.pulldom

def parse(stream_or_string, parser=None, bufsize=None):
    if bufsize is None:
        bufsize = default_bufsize
    if type(stream_or_string) in _StringTypes:
        stream = open(stream_or_string)
    else:
        stream = stream_or_string
    if not parser:
        parser = xml.sax.make_parser()
    return DOMEventStream(stream, parser, bufsize)

Using the power of grep

aifc.py
def __init__(self, f):
    if type(f) == type(''):
        f = __builtin__.open(f, 'rb')
    # else, assume it is an open file object already
    self.initfp(f)

binhex.py
class HexBin:
    def __init__(self, ifp):
        if type(ifp) == type(''):
            ifp = open(ifp)

imghdr.py
if type(file) == type(''):
    f = open(file, 'rb')
    h = f.read(32)
else:
    location = file.tell()
    h = file.read(32)
    file.seek(location)
    f = None

mimify.py
if type(infile) == type(''):
    ifile = open(infile)
    if type(outfile) == type('') and infile == outfile:
        import os
        d, f = os.path.split(infile)
        os.rename(infile, os.path.join(d, ',' + f))
else:
    ifile = infile

wave.py
def __init__(self, f):
    self._i_opened_the_file = None
    if type(f) == type(''):
        f = __builtin__.open(f, 'rb')
        self._i_opened_the_file = f
    # else, assume it is an open file object already
    self.initfp(f)


compiler/transformer.py:

if type(file) == type(''):
    file = open(file)
return self.parsesuite(file.read())

plat-mac/applesingle.py
if type(input) == type(''):
    input = open(input, 'rb')
# Should we also test for FSSpecs or FSRefs?
header = input.read(AS_HEADER_LENGTH)

site-packages/ZODB/ExportImport.py
if file is None: file=TemporaryFile()
elif type(file) is StringType: file=open(file,'w+b')


site-packages/numarray/ndarray.py
if type(file) == type(""):
    name = 1
    file = open(file, 'wb')


site-packages/kiva/imaging/GdImageFile.py
if type(fp) == type(""):
    import __builtin__
    filename = fp
    fp = __builtin__.open(fp, "rb")
else:
    filename = ""

site-packages/reportlab/graphics/renderPM.py
if type(image.path) is type(''):
    im = _getImage().open(image.path).convert('RGB')
else:
    im = image.path.convert('RGB')


site-packages/twisted/protocols/irc.py
def __init__(self, file):
    if type(file) is types.StringType:
        self.file = open(file, 'r')

(hmm, that last one looks buggy.  It should
have an "else: self.file = file" afterwards.)


Used in the std. lib and used by many different
people.  (I excluded the Biopython libraries
in this list, btw, because I may have influenced
the use of this sort of type check.)
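All of the snippets above are variations on one idiom; a condensed, hedged sketch in modern form (the 2.x versions checked basestring or type('') instead, and the helper name here is made up):

```python
def as_input_file(source, mode="r"):
    """Accept either a filename or an object with a read() method.
    Returns (file_object, we_opened_it) so the caller knows whether
    it is responsible for closing the file."""
    if isinstance(source, str):
        return open(source, mode), True
    if hasattr(source, "read"):
        return source, False
    raise TypeError("need a filename or a file-like object")
```

The (file, flag) return is one way to avoid the bug noted above, where a function quietly does nothing for non-string input.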

Andrew
[EMAIL PROTECTED]



Re: What is different with Python ?

2005-06-15 Thread Andrew Dalke
Terry Hancock wrote:
> Of course, since children are vastly better at learning
> than adults, perhaps adults are stupid to do this. ;-)

Take learning a language.  I'm learning Swedish.  I'll
never have a native accent and 6 year olds know more
of the language than I do.  But I make much more
complicated sentences than 6 year olds.  (Doesn't mean
they are grammatically correct, but I can get my point
across given a lot of time.)

> Quantum mechanics notwithstanding, I'm not sure there
> is a "bottom" "most-reduced" level of understanding. It's
> certainly not clear that it is relevant to programming.

I agree.  That's why I make this thread branch.  I think
learning is often best taught from extending what you know
and not from some sort of top/bottom approach. I'm also
one who bristles at hierarchies.  Maybe that's why I like
Python and duck typing. :)

Some learning works by throwing yourself in the deep end.
Languages are sometimes learned that way.  The Suzuki method
extends that to music, though that's meant for kids.

> Python is actually remarkably good at solving things in a
> nearly optimal way.

Have you read Richard Gabriel's "Worse is Better" essay?
 http://www.dreamsongs.com/WIB.html
Section "2.2.4 Totally Inappropriate Data Structures"
relates how knowing the data structure for Lisp affects
the performance and seems relevant to your point.

Andrew
[EMAIL PROTECTED]



Re: What is different with Python ?

2005-06-14 Thread Andrew Dalke
Andrea Griffini wrote:
> Wow... I always get surprises from physics. For example I
> thought that no one could drop confutability requirement
> for a theory in an experimental science...

Some physicists (often mathematical physicists) propose
alternate worlds because the math is interesting.

There is a problem in physics in that we know (I was
trained as a physicist hence the "we" :) quantum mechanics
and gravity don't agree with each other.  String theory
is one attempt to reconcile the two.  One problem is
the math of string theory is hard enough that it's hard
to make a good prediction.  Another problem is the
realm where QM and GR disagree requires such high energies
that it's hard to test directly.

> I was told that
> in physics there are current theories for which there
> is no hypotetical experiment that could prove them wrong...
> (superstrings may be ? it was a name like that but I
> don't really remember).

If we had a machine that could reach Planck scale energies
then I'm pretty sure there are tests.  But we don't, by
a long shot.

Andrew Dalke



Re: What is different with Python ?

2005-06-14 Thread Andrew Dalke
Peter Maas wrote:
> Yes, but what did you notice first when you were a child - plants
> or molecules? I imagine little Andrew in the kindergarten fascinated
> by molecules and suddenly shouting "Hey, we can make plants out of
> these little thingies!" ;)

One of the first science books that really intrigued me
was a book on stars I read in 1st or 2nd grade.

As I mentioned, I didn't understand the science of biology
until I was in college.

Teaching kids is different than teaching adults.  The
latter can often take bigger steps and start from a
sound understanding of logical and intuitive thought.
"Simple" for an adult is different than for a child.

Andrew
[EMAIL PROTECTED]



Re: What is different with Python ?

2005-06-14 Thread Andrew Dalke
Andreas Kostyrka wrote:
> On Tue, Jun 14, 2005 at 12:02:29AM +, Andrea Griffini wrote:
>> Caching is indeed very important, and sometimes the difference
>> is huge.
 ...
> Easy Question:
> You've got 2 programs that are running in parallel.
> Without basic knowledge about caches, the naive answer would be that
> the programs will probably run double time. The reality is different.

Okay, I admit I'm making a comment almost solely to have
Andrea, Andreas and Andrew in the same thread.

I've seen superlinear and sublinear performance for this.
Superlinear when the problem fits into 2x cache size but not
1x cache size and is nicely decomposable, and sublinear when
the data doesn't have good processor affinity.

Do I get an A for Andre.*?  :)
 
Andrew
[EMAIL PROTECTED]



Re: "also" to balance "else" ?

2005-06-14 Thread Andrew Dalke
Terry Hancock wrote:
> No, I know what it should be.  It should be "finally".   It's already
> a keyword, and it has a similar meaning w.r.t. "try".

Except that a finally block is executed with normal and exceptional
exit, while in this case you would have 'finally' only called
when the loop exited without a break.

Andrew
[EMAIL PROTECTED]



Re: "also" to balance "else" ?

2005-06-14 Thread Andrew Dalke
Ron Adam wrote:
> True, but I think this is considerably less clear.  The current for-else 
> is IMHO reversed to how the else is used in an if statement.

As someone else pointed out, that problem could be resolved in
some Python variant by using a different name, like "at end".
Too late for anything before P3K.

> I'm asking if changing the current 'else' in a for statement to 'also'
> would make it's current behavior clearer.  It's been stated before here
> that current behavior is confusing.

"It's been stated" is the passive voice.  You are one, and I
saw a couple others.  But it isn't the same as "many people say
that the current behavior is confusing."  If memory serves, I
don't even recall an FAQ on this, while there is a FAQ regarding
the case statement.

> You are correct that the 'else' behavior could be nested in the if:break
> statement.  I think the logical non-nested grouping of code in the
> for-also-else form is easier to read.  The block in the if statement
> before the break isn't part of the loop, IMO,  being able to move it to
> after the loop makes it clear it evaluates after the loop is done.

There is a tension with code coherency.  In my version the code
that occurs a result of the condition is only in one place while
in yours its in two spots.

If all (>1) break statements in the loop have the same post-branch
code then it might make some sense.  But as I said, I don't think
it occurs all that often.

Given the Python maxim of
  There should be one-- and preferably only one --obvious way to do it.

which of these is the preferred and obvious way?

while f():
  print "Hello!"
  if g():
break
else:
  print "this is a test"
also:
  print "this is not a pipe"

 -or-

while f():
  print "Hello!"
  if g():
print "this is a test"
break
else:
  print "this is not a pipe"


I prefer the second over the first.

Which of these is preferred?

while f():
  print "Hello"
  if g():
a = 10
print "world", a
break
  if h():
a = 12
print "world",a
break

  -or-

while f():
  print "Hello"
  if g():
a = 10
break
  if h():
a = 12
break
else:  # your else, not std. python's
  print "world", a

The latter is fragile, in some sense.  Suppose I added

  if hg():
a = 14
print "there"
break

Then I have to change all of the existing code to put the
"else:" block back into the loop.

That for me makes it a big no.

>> That is ... funky.  When is it useful?
> 
> Any time you've writen code that repeats a section of code at the end of
> all the if/elif statements or sets a variable to check so you can
> conditionally execute a block of code after the if for the same purpose.

Let me clarify.  When is it useful in real code?  Most cases
I can think of have corner cases which treat some paths different
than others.


> My thinking is that this would be the type of thing that would be used
> to argue against more specialized suggestions.  ie...   No a <insert
> new suggested keyword here> isn't needed because the also-else form
> already does that.  ;-)

An argument for 'X' because it prevents people from asking for
some theoretical 'Y' isn't that strong.  Otherwise Python would
have had a goto years ago.

> An example of this might be the case statement suggestions which have
> some support and even a PEP.  The if-alif-also-else works near enough to
> a case statement to fulfill that need.  'alif' (also-if) could  be
> spelled 'case' and maybe that would be clearer as many people are
> already familiar with case statements from other languages.

Assuming you are talking about PEP 275 ("Switching on Multiple
Values"), how does this fulfill that need any better than the
existing if/elif/else chain?

> Vetoing a suggestion on grounds of it can be done in another way, is
> also not sufficient either as by that reasoning we would still be using
> assembly language.  So the question I'm asking here is can an inverse to
>   the 'else' be useful enough to be considered?

I disagree.  Given the "one -- and preferably only one -- obvious
way to do it" there is already a strong bias against language
features which exist only to do something another way but not
a notably better way.

> I'll try to find some use case examples tomorrow, it shouldn't be too
> hard.  It probably isn't the type of thing that going to make huge
> differences.  But I think it's a fairly common code pattern so shouldn't
> be too difficult to find example uses from pythons library.

My guess is that it will be hard.  There's no easy pattern
to grep for and I don't think the use case you mention comes up
often, much less often enough to need another control mechanism.

Andrew
[EMAIL PROTECTED]



Re: What is different with Python ?

2005-06-13 Thread Andrew Dalke
Andrea Griffini wrote:
> This is investigating. Programming is more similar to building
> instead (with a very few exceptions). CS is not like physics or
> chemistry or biology where you're given a result (the world)
> and you're looking for the unknown laws. In programming *we*
> are building the world. This is a huge fundamental difference!

Philosophically I disagree.  Biology and physics depends on
models of how the world works.  The success of a model depends
on how well it describes and predicts what's observed.

Programming too has its model of how things work; you've mentioned
algorithmic complexity and there are models of how humans
interact with computers.  The success depends in part on how
well it fits with those models.

In biology there's an extremely well developed body of evidence
to show the general validity of evolution.  That doesn't mean
that a biological theory of predator-prey cycles must be based
in an evolutionary model.  Physics too has its share of useful
models which aren't based on QCD or gravity; weather modeling
is one and the general term is "phenomenology."

In programming you're often given a result ("an inventory
management system") and you're looking for a solution which
combines models of how people, computers, and the given domain work.

Science also has its purely observational domains.  A
biologist friend of mine talked about one of his conferences
where the conversations range from the highly theoretical
to the "look at this sucker we caught!"

My feeling is that most scientists do not develop new fundamental
theories.  They instead explore and explain things within
existing theory.  I think programming is similar.  Both fields
may build new worlds, but success is measured by its impact
in this world.

Andrew
[EMAIL PROTECTED]



Re: "also" to balance "else" ?

2005-06-13 Thread Andrew Dalke
Ron Adam wrote:
> It occurred to me (a few weeks ago while trying to find the best way to 
> form a if-elif-else block, that on a very general level, an 'also' 
> statement might be useful.  So I was wondering what others would think 
> of it.

> for x in <iterable>:
>     BLOCK1
>     if <condition>: break   # do else block
> also:
>     BLOCK2
> else:
>     BLOCK3


For this specific case you could rewrite the code in
current Python as

for x in <iterable>:
  BLOCK1
  if <condition>:
    BLOCK3
    break
else:
  BLOCK2

In order for your proposal to be useful you would need an
example more like the following in current Python

for x in <iterable>:
  ...
  if <condition1>:
    BLOCK3
    break
  ...
  if <condition2>:
    BLOCK3
    break
else:
  BLOCK2

That is, where "BLOCK3;break" occurs multiple times in
the loop.  My intuition is that that doesn't occur often
enough to need a new syntax to simplify it.
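For reference, a runnable illustration of the existing for/else semantics that the rewrite relies on: the else block runs only when the loop finishes without hitting break.

```python
def classify(items, target):
    for x in items:
        if x == target:
            result = "hit"
            break
    else:
        result = "miss"   # runs only if the loop never hit break
    return result

print(classify([1, 2, 3], 2))   # hit   (break taken, else skipped)
print(classify([1, 2, 3], 9))   # miss  (loop completed, else ran)
```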

Can you point to some existing code that would be improved
with your also/else?

> while <condition>:
>     BLOCK1
>     if <break condition>: break   # jump to else
> also:
>     BLOCK2
> else:
>     BLOCK3
> 
> Here if the while loop ends at the while <condition>, the BLOCK2
> executes, or if the break is executed, BLOCK3 executes.

which is the same (in current Python) as


while <condition>:
  BLOCK1
  if <break condition>:
    BLOCK3
    break
else:
  BLOCK2

> In an if statement...
> 
> if <condition1>:
>     BLOCK1
> elif <condition2>:
>     BLOCK2
> elif <condition3>:
>     BLOCK3
> also:
>     BLOCK4
> else:
>     BLOCK5
> 
> Here, the also block would execute if any previous condition is true, 
> else the else block would execute.

That is ... funky.  When is it useful?

One perhaps hackish solution I've done for the rare cases when
I think your proposal is useful is

while 1:
  if <condition1>:
    BLOCK1
  elif <condition2>:
    BLOCK2
  elif <condition3>:
    BLOCK3
  else:
    # couldn't do anything
    break
  BLOCK4
  break

> I think this gives Pythons general flow control some nice symmetrical 
> and consistent characteristics that seem very appealing to me.  Anyone 
> agree?

No.  Having more ways to do control flow doesn't make for code that's
easy to read.

My usual next step after thinking (or hearing) about a new Python
language change is to look at existing code and see if there's
existing code which would be easier to read/understand and get an
idea if it's a common or rare problem.  Perhaps you could point
out a few examples along those lines?

Andrew
[EMAIL PROTECTED]



Re: What is different with Python ?

2005-06-13 Thread Andrew Dalke
Peter Maas wrote:
> I think Peter is right. Proceeding top-down is the natural way of
> learning (first learn about plants, then proceed to cells, molecules,
> atoms and elementary particles).

Why in the world is that way "natural"?  I could see how biology
could start from molecular biology - how hereditary and self-regulating
systems work at the simplest level - and using that as the scaffolding
to describe how cells and multi-cellular systems work.

Plant biology was my least favorite part of my biology classes.  In
general I didn't like the "learn the names of all these parts" approach
of biology.  Physics, with its more directly predictive view of the world,
was much more interesting.  It wasn't until college when I read some
Stephen J. Gould books that I began to understand that biology was
different than "'the mitochondria is the powerhouse of the cell', here's
the gall bladder, that plant's a dicot, this is a fossilized trilobite."

Similarly, programming is about developing algorithmic thought.
A beginner oriented programming language should focus on that, and
minimize the other details.

Restating my belief in a homologous line: proceeding from simple to
detailed is the most appropriate way of learning.  Of course in some
fields even the simplest form takes a long time to understand, but
programming isn't string theory.

Andrew
[EMAIL PROTECTED]



Re: new string function suggestion

2005-06-13 Thread Andrew Dalke
Andy wrote:
> What do people think of this?
> 
> 'prefixed string'.lchop('prefix') == 'ed string'
> 'string with suffix'.rchop('suffix') == 'string with '
> 'prefix and suffix.chop('prefix', 'suffix') == ' and '

Your use case is

> I get tired of writing stuff like:
> 
> if path.startswith('html/'):
>   path = path[len('html/'):]
> elif s.startswith('text/'):
>   path = path[len('text/'):]
> 
> It just gets tedious, and there is duplication.  Instead I could just write:
> 
> try:
>   path = path.lchop('html/')
>   path = path.lchop('text/')
> except SomeException:
>   pass

But your posted code doesn't implement your use case.  Consider
if path == "html/text/something".  Then the if/elif code sets
path to "text/something" while the lchop code sets it to "something".

One thing to consider is a function (or string method) which
is designed around the 'or' function, like this.  (Named 'lchop2'
but it doesn't give the same interface as your code.)

def lchop2(s, prefix):
  if s.startswith(prefix):
    return s[len(prefix):]
  return None

path = lchop2(path, "html/") or lchop2(path, "text/") or path


If I saw a function named "lchop" (or perhaps named "lchomp") I
would expect it to be (named 'lchomp3' so I can distinguish
between it and the other two)

def lchop3(s, prefix):
  if s.startswith(prefix):
    return s[len(prefix):]
  return s

and not raise an exception if the prefix/suffix doesn't match.
Though in this case your use case is not made any simpler.
Indeed it's uglier with either

newpath = path.lchop3("html/")
if newpath == path:
  newpath = path.lchop3("text/")
  if newpath == path:
    ...

or

if path.startswith("html/"):
  path = path.lchop3("html/")
elif path.startswith("text/"):
  path = path.lchop3("text/")
   ...



I tried finding an example in the stdlib of code that would be
improved with your proposal.  Here's something that would not
be improved, from mimify.py (it was the first grep hit I
looked at)

if prefix and line[:len(prefix)] == prefix:
    line = line[len(prefix):]
    pref = prefix
else:
    pref = ''

In your version it would be:

if prefix:
    try:
        line = line.lchop(prefix)
    except TheException:
        pref = ''
    else:
        pref = prefix
else:
    pref = ''

which is longer than the original.

From pickle.py (grepping for 'endswith(' and a context of 2)

pickle.py-if ashex.endswith('L'):
pickle.py:ashex = ashex[2:-1]
pickle.py-else:
pickle.py:ashex = ashex[2:]

this would be better with my '3' variant, as

  ashex = ashex.rchop3('L')[2:]

while your version would have to be

  try:
    ashex = ashex.rchop('L')[2:]
  except SomeException:
    ashex = ashex[2:]


Even with my '2' version it's the simpler

  ashex = (ashex.rchop2('L') or ashex)[2:]

The most common case will be for something like this

tarfile.py-if self.name.endswith(".gz"):
tarfile.py-self.name = self.name[:-3]

My "3" code handles it best

  self.name = self.name.rchop3(".gz")

Because your code throws an exception for what isn't
really an exceptional case it in essence needlessly
requires try/except/else logic instead of the simpler
if/elif logic.
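To make the comparison concrete, here is a self-contained sketch of the two non-raising variants from this post ('lchop2' and 'lchop3' are names made up here for discussion, not real string methods):

```python
def lchop2(s, prefix):
    """Return the chopped string, or None when the prefix is absent."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return None

def lchop3(s, prefix):
    """Return the chopped string, or the string unchanged."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s

path = "html/index.html"
path = lchop2(path, "html/") or lchop2(path, "text/") or path
print(path)   # index.html
```

One caveat with the '2' variant: a miss returns None, but a hit that leaves the empty string (s equal to the prefix) is also falsy in the 'or' chain.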

> Does anyone else find this to be a common need?  Has this been suggested 
> before?

To summarize:
  - I don't think it's needed that often
  - I don't think your implementation's behavior (using an
   exception) is what people would expect
  - I don't think it does what you expect

Andrew
[EMAIL PROTECTED]



Re: Dealing with marketing types...

2005-06-12 Thread Andrew Dalke
Paul Rubin wrote:
> Andrew Dalke <[EMAIL PROTECTED]> writes:
  ...
>> I found more details at
>> http://jeremy.zawodny.com/blog/archives/001866.html
>> 
>> It's a bunch of things - Perl, C, MySQL-InnoDB, MyISAM, Akamai,
>> memcached.  The linked slides say "lots of MySQL usage." 60 servers.
> 
> LM uses MySQL extensively but what I don't know is whether it serves
> up individual pages by the obvious bunch of queries like a smaller BBS
> might.  I have the impression that it's more carefully tuned than that.

The linked page links to a PDF describing the architecture.
The careful tuning comes in part from a high-performance caching
system - memcached.

>> I don't see that example as validating your statement that
>> LAMP doesn't scale for mega-numbers of hits any better than
>> whatever you might call "printing press" systems.
> 
> What example?  Slashdot?

Livejournal.  You gave it as a counter example to the LAMP
architecture used by /.

]  It seems to me that by using implementation methods that
] map more directly onto the hardware, a site with Slashdot's
] traffic levels could run on a single modest PC (maybe a laptop).
] I believe LiveJournal (which has something more like a million
] users) uses methods like that, as does ezboard. 

Since LJ uses a (highly hand-tuned) LAMP architecture, it isn't
an effective counterexample.

>  It uses way more hardware than it needs to,
> at least ten servers and I think a lot more.  If LJ is using 6x as
> many servers and taking 20x (?) as much traffic as Slashdot, then LJ
> is doing something more efficiently than Slashdot.  

I don't know where the 20x comes from.  Registered users?  I
read /. but haven't logged into it in 5+ years.  I know I
hit /. a lot more often than I do LJ (there's only one diary
I follow there).  The use is different as well; all people
hit one story / comments page, and the comments are ranked
based on reader-defined evaluations.  LJ has no one journal
that gets anywhere as many hits and there is no ranking scheme.

>> I'd say that few sites have >100k users, much less
>> daily users with personalized information. As a totally made-up
>> number, only a few dozen sites (maybe a couple hundred?) would
>> need to worry about those issues.
> 
> Yes, but for those of us interested in how big sites are put together,
> those are the types of sites we have to think about ;-).

My apologies since I know this sounds snide, but then why didn't
you (re)read the LJ architecture overview I linked to above?
That sounds like something you would have been interested in
reading and would have directly provided information that
counters what you said in your followup.

The "ibm-poop-heads" article by Ryan Tomayko gives pointers to 
several other large-scale LAMP-based web sites.  You didn't
like the Google one.  I checked a couple of the others:

  IMDB -
  http://www.findarticles.com/p/articles/mi_zdpcm/is_200408/ai_ziff130634
  As you might expect, the site is now co-located with other Amazon.com
  sites, served up from machines running Linux and Apache, but ironically,
  most of the IMDb does not use a traditional database back end. Its
  message boards are built on PostgreSQL, and certain parts of IMDb
  Pro (including its advanced search) use MySQL, but most of the site is
  built with good old Perl script.

  del.icio.us
  Took some digging but I found
  http://lists.del.icio.us/pipermail/discuss/2004-November/001421.html
  "The database gets corrupted because the machine gets power-cycled,
  not through any fault of MySQL's."

The point is that LAMP systems do scale, both down and up.  It's
a polemic against "architecture astronauts" who believe the only
way to handle large sites (and /., LJ, IMDB, and del.icio.us are
larger than all but a few sites) is with some spiffy "enterprise"
architecture framework.

> I'd say
> there's more than a few hundred of them, but it's not like there's
> millions.  And some of them really can't afford to waste so much
> hardware--look at the constant Wikipedia fundraising pitches for more
> server iron because the Wikimedia software (PHP/MySQL, natch) can't
> handle the load.

Could they have, for example, bought EnterpriseWeb-O-Rama and done
any better/cheaper?  Could they have even started the project
had they gone that route?

> Yes, of course there is [exprience in large-scale web apps]. 
> Look at the mainframe transaction systems of the 60's-70's-80's, for
> example. Look at Google.

For the mainframe apps you'll have to toss anything processed
in batch mode, like payrolls.  What had the customization level
and scale comparable to 100K+ sites of today?  ATMs?  Stock trading?

Goo

Re: Code documentation tool similar to what Ruby (on Rails?) uses

2005-06-12 Thread Andrew Dalke
Ksenia Marasanova responsded to Michele Simionato
>> >>> print "%s" % inspect.getsource(os.makedirs)
> 
> That's easy, thanks! I guess I'll submit a patch for Epydoc with the
> functionality I've mentioned :)

Before doing that, add a "cgi.escape()" to the text.  Otherwise
embedded [<>&] characters will be interpreted as HTML.
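A minimal illustration (cgi.escape was the 2.x spelling; its modern equivalent is html.escape, so the sketch below tries both):

```python
import inspect
import os
try:
    from html import escape          # Python 3
except ImportError:
    from cgi import escape           # Python 2, as in the post

source = inspect.getsource(os.makedirs)
safe = escape(source)                # <, > and & become HTML entities
```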

Andrew
[EMAIL PROTECTED]



Re: ElementTree Namespace Prefixes

2005-06-12 Thread Andrew Dalke
On Sun, 12 Jun 2005 15:06:18 +, Chris Spencer wrote:

> Does anyone know how to make ElementTree preserve namespace prefixes in 
> parsed xml files?

See the recent c.l.python thread titled "ElemenTree and namespaces"
and started "May 16 2:03pm".  One archive is at

http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/31b2e9f4a8f7338c/363f46513fb8de04?&rnum=3&hl=en

Andrew
[EMAIL PROTECTED]



Re: Dealing with marketing types...

2005-06-11 Thread Andrew Dalke
Paul Rubin replied to me:
> If you're running a web site with 100k users (about 1/3 of the size of
> Slashdot) that begins to be the range where I'd say LAMP starts
> running out of gas.

Let me elaborate a bit.  That claim of 100K from me is the
entire population of people who would use bioinformatics or
chemical informatics.  It's the extreme upper bound of the
capacity I ever expect.  It's much more likely I'll only
need to handle a few thousand users.


> I believe
> LiveJournal (which has something more like a million users) uses
> methods like that, as does ezboard.  There was a thread about it here
> a year or so ago.

I know little about it, though I read at
http://goathack.livejournal.org/docs.html
] LiveJournal source is lots of Perl mixed up with lots of MySQL

I found more details at
http://jeremy.zawodny.com/blog/archives/001866.html

It's a bunch of things - Perl, C, MySQL-InnoDB, MyISAM, Akamai,
memcached.  The linked slides say "lots of MySQL usage."
60 servers.

I don't see that example as validating your statement that
LAMP doesn't scale for mega-numbers of hits any better than
whatever you might call "printing press" systems.

> As a simple example, that article's advice of putting all fine grained
> session state into the database (so that every single browser hit sets
> off SQL queries) is crazy.

To be fair, it does say "database plus cache" though the author
suggests the place for the cache is at the HTTP level and not
at the DB level.  I would have considered something like memcached
perhaps backed by an asychronous write to a db if you want the
user state saved even after the cache is cleared/reset.

How permanent though does the history need to be?  Your
approach wipes history when the user clears the cookie and it
might not be obvious that doing so should clear the history.

In any case, the implementation cost for this is likely
higher than what you did.  I mention it to suggest an
alternative.


> As for "big", hmm, I'd say as production web sites go, 100k users is
> medium sized, Slashdot is "largish", Ebay is "big", Google is huge.

I'd say that few sites have >100k users, much less
daily users with personalized information. As a totally made-up
number, only a few dozen sites (maybe a couple hundred?) would
need to worry about those issues.

If that's indeed the case then I'll also argue that each of
them is going to have app-specific choke points which are best
hand-optimized and not framework optimized.  Is there enough
real-world experience to design an EnterpriseWeb-o-Rama (your
"printing press") which can handle those examples you gave
any better than starting off with a LAMP system and hand-caching
the parts that need it?

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dealing with marketing types...

2005-06-11 Thread Andrew Dalke
Paul Rubin wrote:
> That article makes a lot of bogus claims and is full of hype.  LAMP is
> a nice way to throw a small site together without much fuss, sort of
> like fancy xerox machines are a nice way to print a small-run
> publication without much fuss.  If you want to do something big, you
> still need an actual printing press.

In the comments the author does say he's trying to be provocative.

My question to you is - what is "something big"?  I've not been
on any project for which "LAMP" can't be used, and nor do I
expect to be.  After all, there's only about 100,000 people in
the world who might possibly be interested in using my software.  (Well,
the software I get paid to do; not, say, the couple of patches I've
sent in to Python).

I had one client consider moving from Python/CGI/flat files to
Java/WebLogic/Oracle.  The old code took nearly 10 seconds to
display a page (!).  They were convinced that they had gone past
the point where Python/CGI was useful, and they needed to use a
more scalable enterprise solution.  The conviction meant they
didn't profile the system.  With about a day of work I got the
performance down to under a second by removing some needless imports,
delaying others until they were needed, making sure all the
.pyc files existed, etc.

I could have gotten more performance switching to a persistent
Python web server and using a database instead of a bunch of
flat files in a directory, but that wasn't worth the time.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Developers Handbook

2005-06-10 Thread Andrew Dalke
Robert Kern wrote:
> There is no moderator. We are all moderators.

I am Spartacus!

We are all Kosh.

- Nicolas Bourbaki

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is pyton for me?

2005-06-10 Thread Andrew Dalke
Kent Johnson wrote:
> Where do you find check_call()? It's not in the docs and I get
>  >>> import subprocess
>  >>> subprocess.check_call
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'module' object has no attribute 'check_call'
> 
> with Python 2.4.1.

Interesting.  I got my subprocess.py from CVS.  The CVS log

revision 1.12
date: 2005/01/01 09:36:34;  author: astrand;  state: Exp;  lines: +39 -1
New subprocess utility function: check_call. Closes #1071764.

The bug tracker is
  
http://sourceforge.net/tracker/index.php?func=detail&aid=1071764&group_id=5470&atid=305470

which says it's a 2.5ism.  Oops!  Sorry about that post
from the future.  I didn't realize it.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Developers Handbook

2005-06-10 Thread Andrew Dalke
wooks wrote:
> If I had posted or invited the group to look at my full list of items
> rather than just the python book link then I could see where you are
> coming from.

Take a look at http://www.templetons.com/brad/spamterm.html
for some of the first spams and reactions thereof.

There's a 30+ year history of posts which one person thinks
is relevant or important and others find off-topic, crass,
and rude.  A rough sort of social norms - called netiquette -
have come from that experience.

> If my intention was to "spam" this NG then the complaints as they were
> phrased would  only have served to make me more determined.

The intention is to prevent it from happening in the future.

If your intention is indeed to spam the group then there
are mechanisms to stop you, including such lovely terms as
killfiles and cancelbots.  Too much of it and you might find
your account suspended.  Or have you not wondered why few
spams make it here?

If your intention is to continue posting then it's a
warning of sorts that as in every community there are social
forms to follow, and often good reasons for those forms.

Terry backed up his response explaining not only the
convention for what you were doing, but also mentioned
(briefly) why he responded in the way he did.


I personally found your original posting blunt.  I thought
it was a virus or spam.  You see, I don't do eBay and
whenever I see that term in my mail in a URL it's either
a spam or a phishing attack.  So I ignored it.  If you
really wanted to sell it then following Terry's advice
and holding to social forms would have been better for
your auction.  There's little incentive for anyone to
follow that link without knowing more about it.

> Maybe we will all learn something from each other.

Hopefully you, but not likely the others involved.  As
I said, this sort of thing has a long history and for
anyone who's been doing this for years (like me) there's
little new to learn on the topic.  

To give an idea of the history, there's even an RFC
on netiquette from 10 years ago:
  http://www.faqs.org/rfcs/rfc1855.html

The directly relevant part is

- Advertising is welcomed on some lists and Newsgroups, and abhorred
  on others!  This is another example of knowing your audience
  before you post.  Unsolicited advertising which is completely
  off-topic will most certainly guarantee that you get a lot of
  hate mail.

Most assuredly, what Terry sent you is *not* hate mail.


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is pyton for me?

2005-06-09 Thread Andrew Dalke
Mark de la Fuente wrote:
> Here is an example of the type of thing I would like to be able to do. 
> Can I do this with python?  How do I get python to execute command line
> functions?
 ...
> # simple script to create multiple sky files.
> 
> foreach hour (10 12 14)
>   gensky 3 21 $hour > sky$hour.rad
> end

Dan Bishop gave one example using os.system.  The important
thing to know is that in the shell all programs can be used
as commands while in Python there isn't a direct connection.
Instead you need to call a function which translates a
request into something which calls the command-line program.

There are several ways to do that.  In Python before 2.4
the easiest way is with os.system(), which takes the command-line
text as a string.  For example,

import os
os.system("gensky 3 21 10 > sky10.rad")

You could turn this into a Python function rather easily

import os

def gensky(hour):
  os.system("gensky 3 21 %d > sky%d.rad" % (hour, hour))

for hour in (10, 12, 14):
  gensky(hour)


Python 2.4 introduces the subprocess module which makes it
so much easier to avoid nearly all the mistakes that can
occur in using os.system().  You could replace the 'gensky'
python function with

import subprocess
def gensky(hour):
  subprocess.check_call(["gensky", "3", "21", str(hour)],
   stdout = open("sky%d.rad" % (hour,), "w"))


The main differences here are:
 - the original code didn't check the return value of os.system().
It should do this because, for example, the gensky program might
not be on the path.  The check_call does that test for me.

 - I needed to do the redirection myself.  (I wonder if the
subprocess module should allow

  if isinstance(stdout, basestring):
stdout = open(stdout, "wb")

Hmmm)
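As a sketch of that idea: a hypothetical wrapper (check_call_to_file
is my name for it, not part of the subprocess module) that accepts
either a filename or an already-open file object for the child's stdout:

```python
import subprocess

def check_call_to_file(args, stdout):
    # Hypothetical helper, not part of the subprocess module: if
    # stdout is a filename, open it on the caller's behalf and make
    # sure it gets closed afterwards.
    opened_here = isinstance(stdout, str)
    if opened_here:
        stdout = open(stdout, "wb")
    try:
        subprocess.check_call(args, stdout=stdout)
    finally:
        if opened_here:
            stdout.close()
```

With that, the gensky wrapper could be written as
check_call_to_file(["gensky", "3", "21", str(hour)], "sky%d.rad" % hour).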


> If I try and do a gensky command from the python interpreter or within a
> python.py file, I get an error message:
> 
> NameError: name ‘gensky’ is not defined

That's because Python isn't set up to search the command path
for an executable.  It only knows about variable names defined
in the given Python module or imported from another Python
module.

> If anyone has any suggestions on how to get python scripts to execute
> this sort of thing, what I should be looking at, or if there is
> something else I might consider besides python, please let me know.

You'll have to remember that Python is not a shell programming
language.  Though you might try IPython - it allows some of the
things you're looking for, though not all.

You should also read through the tutorial document on Python.org
and look at some of the Python Cookbook.  Actually, start with
  http://wiki.python.org/moin/BeginnersGuide

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: tail -f sys.stdin

2005-06-09 Thread Andrew Dalke
garabik:
> what about:
> 
> for line in sys.stdin:
> process(line)

This does not meet the OP's requirement, which was
>> I'd like to write a prog which reads one line at a time on its sys.stdin
>> and immediately processes it.
>> If there are'nt any new lines wait (block on input).

It's a subtle difference.  The implementation of iter(file)
reads a block of data at a time and breaks that into lines,
along with the logic to read another block as needed.  If
there isn't yet enough data for the block then Python will
sit there waiting.

The OP already found the right solution which is to call
the "readline()" method.

Compare the timestamps in the following

% ( echo "a" ; sleep 2 ; echo "b" ) | python -c "import sys, time\
for line in sys.stdin:\   
  print time.time(),  repr(line)"

1118335675.45 'a\n'
1118335675.45 'b\n'
% ( echo "a" ; sleep 2 ; echo "b" ) | python -c "import sys, time\
while 1:\
line = sys.stdin.readline()\
if not line: break \
print time.time(), repr(line)"
1118335678.56 'a\n'
1118335680.28 'b\n'
% 
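Written out as a reusable function, the readline() loop looks like
this (a sketch; the process callback is whatever per-line handler
you need):

```python
def process_lines(stream, process):
    # readline() blocks until a full line (or EOF) arrives, so each
    # line is handled as soon as it shows up on the stream.
    while 1:
        line = stream.readline()
        if not line:   # empty string means EOF
            break
        process(line)
```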

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Incorrect number of arguments

2005-06-09 Thread Andrew Dalke
Steven D'Aprano wrote:
> *eureka moment*
> 
> I can use introspection on the function directly to see 
> how many arguments it accepts, instead of actually 
> calling the function and trapping the exception.

For funsies, the function 'can_call' below takes a function 'f'
and returns a new function 'g'.  Calling 'g' with a set of
arguments returns True if 'f' would take the arguments,
otherwise it returns False.  See the test case for an
example of use.


import new

def noop():
    pass


def can_call(func):
    # Make a new function with the same signature
    #
    # code(argcount, nlocals, stacksize, flags, codestring, constants,
    #      names, varnames, filename, name, firstlineno, lnotab
    #      [, freevars[, cellvars]])
    code = func.func_code
    new_code = new.code(code.co_argcount,
                        code.co_nlocals,
                        noop.func_code.co_stacksize,
                        code.co_flags,
                        noop.func_code.co_code,  # don't do anything
                        code.co_consts,
                        code.co_names,
                        code.co_varnames,
                        code.co_filename,
                        "can_call_" + code.co_name,
                        code.co_firstlineno,
                        noop.func_code.co_lnotab,  # for line number info
                        code.co_freevars,
                        # Do I need to set cellvars?  Don't think so.
                        )

    # function(code, globals[, name[, argdefs[, closure]]])
    new_func = new.function(new_code, func.func_globals,
                            "can_call_" + func.func_name,
                            func.func_defaults)

    # Uses a static scope
    def can_call_func(*args, **kwargs):
        try:
            new_func(*args, **kwargs)
        except TypeError, err:
            return False
        return True
    try:
        can_call_func.__name__ = "can_call_" + func.__name__
    except TypeError:
        # Can't change the name in Python 2.3 or earlier
        pass
    return can_call_func


# test

def spam(x, y, z=4):
    raise AssertionError("Don't call me!")


can_spam = can_call(spam)

for (args, kwargs) in (
        ((1,2), {}),
        ((1,), {}),
        ((1,), {"x": 2}),
        ((), {"x": 1, "y": 2}),
        ((), {"x": 1, "z": 2}),
        ((1,2,3), {}),
        ((1,2,3), {"x": 3}),
        ):
    can_spam_result = can_spam(*args, **kwargs)
    try:
        spam(*args, **kwargs)
    except AssertionError:
        could_spam = True
    except TypeError:
        could_spam = False

    if can_spam_result == could_spam:
        continue

    print "Failure:", repr(args), repr(kwargs)
    print "Could I call spam()?", could_spam
    print "Did I think I could?", can_spam_result
    print

print "Done."


> Still a good question though. Why is it TypeError?

My guess - in most languages with types, functions are
typed not only on "is callable" but on the parameter
signature.  For example, in C


dalke% awk '{printf("%3d %s\n", NR, $0)}' tmp.c
  1 
  2 int f(int x, int y) {
  3 }
  4 
  5 int g(int x) {
  6 }
  7 
  8 main() {
  9   int (*func_ptr)(int, int);
 10   func_ptr = f;
 11   func_ptr = g;
 12 }
% cc tmp.c
tmp.c: In function `main':
tmp.c:11: warning: assignment from incompatible pointer type
% 

'Course the next question might be "then how about an
ArgumentError which is a subclasss of TypeError?"

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fast text display?

2005-06-08 Thread Andrew Dalke
Christopher Subich wrote:
> You're off by a decimal, though, an 80-column line 
> at 20ms is 4kbytes/sec.

D'oh!  Yeah, I did hundredths of a second instead of thousands.

> My guess is that any faster throughput than 
> 10kbytes/sec is getting amusing for a mud, which in theory intends for 
> most of this text to be read anyway.

Which is why I don't think you'll have a problem with any of
the standard GUI libraries.
 
> That looks quite good, except that Trolltech doesn't yet have a GPL-qt 
> for Win32. 

Cost and license weren't listed as requirements.  :)

You *did* say "hobby"; though, in post-hoc justification, I've known
people with some pretty expensive hobbies.


> See the scrolling problem in the original post, as to why I can't use it 
> as a temporary user interface. :)

Indeed, but MUDs 15 years ago could run in a terminal and display
colored text via ANSI terminal controls, letting the terminal
itself manage history and scrolling.  I had some sort of TSR for
the latter, under DOS.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fast text display?

2005-06-08 Thread Andrew Dalke
Christopher Subich wrote:
> My first requirement is raw speed; none of what I'm doing is 
> processing-intensive, so Python itself shouldn't be a problem here.

There's raw speed and then there's raw speed.  Do you want to
display, say, a megacharacter/second?

> it's highly desirable to have very fast text updates (text 
> inserted only at the end)-- any slower than 20ms/line stretches 
> usability for fast-scrolling.

Ahh, that's 400 bytes per second.  That's pretty slow.

> The second requirement is that it support text coloration.

> The third requirement is cross-platform-osity

qtextedit has all of those.  See
  http://doc.trolltech.com/3.3/qtextedit.html

Looks like LogText mode is exactly what you want
 http://doc.trolltech.com/3.3/qtextedit.html#logtextmode

] Setting the text format to LogText puts the widget in a special mode
] which is optimized for very large texts. Editing, word wrap, and rich
] text support are disabled in this mode (the widget is explicitly made
] read-only). This allows the text to be stored in a different, more
] memory efficient manner.

 and

] By using tags it is possible to change the color, bold, italic and
] underline settings for a piece of text. 

Depending on what you want, curses talking to a terminal might be
a great fit.  That's how we did MUDs back in the old days.  :)

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: split up a list by condition?

2005-06-07 Thread Andrew Dalke
Reinhold Birkenfeld wrote:
>>> So I think: Have I overlooked a function which splits up a sequence
>>> into two, based on a condition? Such as
>>> 
>>> vees, cons = split(wlist[::-1], lambda c: c in vocals)

> This is clear. I actually wanted to know if there is a function which I
> overlooked which does that, which wouldn't be a maintenance nightmare at
> all.

Not that I know of, but if there is one it should be named
"bifilter", or "difilter" if you prefer Greek roots. :)


def bifilter(test, seq):
  passes = []
  fails = []
  for term in seq:
if test(term):
  passes.append(term)
else:
  fails.append(term)
  return passes, fails


>>> bifilter("aeiou".__contains__, "This is a test")
(['i', 'i', 'a', 'e'], ['T', 'h', 's', ' ', 's', ' ', ' ', 't', 's', 't'])
>>> 

Another implementation, though in this case I cheat because I
do the test twice, is

>>> from itertools import ifilter, ifilterfalse, tee
>>> def bifilter(test, seq):
...   seq1, seq2 = tee(seq)
...   return ifilter(test, seq1), ifilterfalse(test, seq2)
... 
>>> bifilter("aeiou".__contains__, "This is another test")
(<itertools.ifilter object at 0x...>, <itertools.ifilterfalse object at 0x...>)
>>> map(list, _)
[['i', 'i', 'a', 'o', 'e', 'e'], ['T', 'h', 's', ' ', 's', ' ', 'n', 't', 'h', 
'r', ' ', 't', 's', 't']]
>>> 


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating file of size x

2005-06-06 Thread Andrew Dalke
Jan Danielsson wrote:
>Is there any way to create a file with a specified size?

Besides the simple

def make_empty_file(filename, size):
  f = open(filename, "wb")
  f.write("\0" * size)
  f.close()

?

If the file is large, try (after testing and fixing any
bugs):

def make_empty_file(filename, size, block = 32*1024):
  f = open(filename, "wb")
  written = 0
  s = "\0" * block
  for i in range(size//block):
f.write(s)
  remainder = size%block
  f.write(s[:remainder])
  f.close()

As Grant Edwards pointed out, you can do a seek(size-1)
but I don't know if it's fully portable.
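The seek() variant Grant suggested might look like this (a sketch; on
many filesystems the result is a sparse file, so the blocks may not
actually be allocated on disk):

```python
def make_file_by_seek(filename, size):
    # Seek to just before where the last byte should go and write one
    # byte; the OS extends the file to the requested size.
    f = open(filename, "wb")
    if size > 0:
        f.seek(size - 1)
        f.write(b"\0")
    f.close()
```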

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: the python way?

2005-06-06 Thread Andrew Dalke
Reinhold Birkenfeld wrote:
> To make it short, my version is:
> 
> import random
> def reinterpolate2(word, vocals='aeiouy'):
> wlist = list(word)
> random.shuffle(wlist)
> vees = [c for c in wlist[::-1] if c in vocals]
> cons = [c for c in wlist[::-1] if c not in vocals]

Why the [::-1]?  If it's randomly shuffled the order isn't important.

> short, long = sorted((cons, vees), key=len)
> return ''.join(long[i] + short[i] for i in range(len(short))) + 
> ''.join(long[len(short):])

All the cool kids are using 2.4 these days.  :)

Another way to write this is (assuming the order of characters
can be swapped)

 N = min(len(short), len(long))
 return ''.join([c1+c2 for (c1, c2) in zip(cons, vees)] +
                cons[N:] + vees[N:])

The main change here is that zip() stops when the first iterator finishes
so there's no need to write the 'for i in range(len(short))'

If the order is important then the older way is

if len(cons) >= len(vees):
short, long = vees, cons
else:
short, long = cons, vees
return ''.join([c1+c2 for (c1, c2) in zip(short, long)] +
               long[len(short):])


'Course to be one of the cool kids, another solution is to use the
roundrobin() implementation found from http://www.python.org/sf/756253

from collections import deque
def roundrobin(*iterables):
pending = deque(iter(i) for i in iterables)
while pending:
task = pending.popleft()
try:
yield task.next()
except StopIteration:
continue
pending.append(task)



With it the last line becomes

 return ''.join(roundrobin(short, long))

Anyone know if/when roundrobin() will be part of the std. lib?
The sf tracker implies that it won't be.

Andrew
[EMAIL PROTECTED]
 
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: About size of Unicode string

2005-06-06 Thread Andrew Dalke
Frank Abel Cancio Bello wrote:
> Can I get how many bytes have a string object independently of its encoding?
> Is the "len" function the right way of get it?

No.  len(unicode_string) returns the number of characters in the
unicode_string.

Number of bytes depends on how the unicode character are represented.
Different encodings will use different numbers of bytes.

>>> u = u"G\N{Latin small letter A with ring above}"
>>> u
u'G\xe5'
>>> len(u)
2
>>> u.encode("utf-8")
'G\xc3\xa5'
>>> len(u.encode("utf-8"))
3
>>> u.encode("latin1")  
'G\xe5'
>>> len(u.encode("latin1"))
2
>>> u.encode("utf16") 
'\xfe\xff\x00G\x00\xe5'
>>> len(u.encode("utf16"))
6
>>> 

> Laci look the following code:
> 
>   import urllib2
>   request = urllib2.Request(url= 'http://localhost:6000')
>   data = 'data to send\n'.encode('utf_8')
>   request.add_data(data)
>   request.add_header('content-length', str(len(data)))
>   request.add_header('content-encoding', 'UTF-8')
>   file = urllib2.urlopen(request)
> 
> Is always true that "the size of the entity-body" is "len(data)"
> independently of the encoding of "data"?

For this case it is true because the logical length of 'data'
(which is a byte string) is equal to the number of bytes in the
string, and the utf-8 encoding of a byte string with character
values in the range 0-127, inclusive, is unchanged from the
original string.

In general, e.g. if 'data' is a unicode string, no.

len() returns the logical length of 'data'.  That number does
not need to be the number of bytes used to represent 'data'.
To get the bytes you must encode the object.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Software licenses and releasing Python programs for review

2005-06-06 Thread Andrew Dalke
max:
>> For me, the fact 
>> that corporations are considered people by the law is ridiculous. 

Steven D'Aprano wrote:
> Ridiculous? I don't think so. Take, for example, Acme Inc. Acme purchases
> a new factory. Who owns the factory? The CEO? The Chairperson of the Board
> of Directors? Split in equal shares between all the directors? Split
> between all the thousands of shareholders? Society has to decide between
> these methods.

Getting off-topic for c.l.py.  Might want to move this to, for example,
the talk thread for
  http://en.wikipedia.org/wiki/Corporate_personhood
which is
  http://en.wikipedia.org/wiki/Talk:Corporate_personhood
and read also
  http://en.wikipedia.org/wiki/Corporation

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: any macro-like construct/technique/trick?

2005-06-05 Thread Andrew Dalke
Mike Meyer wrote:
> I've never tried it with python, but the C preprocessor is available
> as 'cpp' on most Unix systesm. Using it on languages other than C has
> been worthwhile on a few occasions. It would certainly seem to
> directly meet the OP's needs.

Wouldn't that prohibit using #comments in the macro-Python code?
I suppose they could be made with strings, as in


  "here is a comment"
  do_something()

but it's ... strange.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-04 Thread Andrew Dalke
Steven Bethard wrote:
> Ahh, so if I wanted the locking one I would write:
> 
>  with locking(mutex) as lock, opening(readfile) as input:
>  ...

That would make sense to me.

> There was another proposal that wrote this as:
> 
>  with locking(mutex), opening(readfile) as lock, input:
>  ...
> 
> which is what was confusing me.  Mirroring the 'as' from the import 
> statement seems reasonable.

Ahh, you're right.  That was an earlier proposal.

> 
> But it doesn't address my other concern, namely, is
> 
>  with locking(mutex), opening(readfile) as input:
>  ...
> 
> equivalent to the nested with-statements, e.g.:

I would think it's the same as

with locking(mutex):
  with opening(readfile) as input:
...

which appears to map to the first of your alternatives

> Or is it equivalent to something different, perhaps:
> 
>  _locking = locking(mutex)
>  _opening = opening(readfile)
>  _exc = (None, None, None)
>  _locking.__enter__()
>  input = _opening.__enter__()
>  try:
>  try:
>  ...
>  except:
>  _exc = sys.exc_info()
>  raise
>  finally:
>  _opening.__exit__(*exc)
>  _locking.__exit__(*exc)

That wouldn't work; consider if _opening.__enter__() raised
an exception.  The _locking.__exit__() would never be called,
which is not what anyone would expect from the intent of
this PEP.

> Or maybe:
> 
>  _locking = locking(mutex)
>  _opening = opening(readfile)
>  _exc = (None, None, None)
>  _locking.__enter__()
>  input = _opening.__enter__()

Same problem here

>  finally:
>  # same order as __enter__ calls this time!!
>  _locking.__exit__(*exc)
>  _opening.__exit__(*exc)

and the order would be wrong.  Consider multiple
statements such as

with server.opening() as connection, connection.lock(column) as C:
  C.replace("X", "Y")

The inner with depends on the outer and must be closed
in inverted order.


> And if it *is* just equivalent to the nested with-statements, how often 
> will this actually be useful?  Is it a common occurrence to need 
> multiple with-statements?  Is the benefit of saving a level of 
> indentation going to outweigh the complexity added by complicating the 
> with-statement?

Agreed.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-04 Thread Andrew Dalke
Nicolas Fleury wrote:
> I think it is simple and that the implementation is as much 
> straight-forward.  Think about it, it just means that:

Okay, I think I understand now.

Consider the following

server = open_server_connection()
with abc(server)
with server.lock()
do_something(server)

server.close()

it would be translated to

server = open_server_connection()
with abc(server):
  with server.lock():
    do_something(server)
    server.close()

when I meant for the first code example to be implemented
like this

server = open_server_connection()
with abc(server):
  with server.lock():
    do_something(server)

server.close()


(It should probably use the with-block to handle the server open
and close, but that's due to my lack of imagination in coming up
with a decent example.)

Because of the implicit indentation it isn't easy to see that
the "server.close()" is in an inner block and not at the outer
one that it appears to be in.  To understand the true scoping
a reader would need to scan the code for 'with' lines, rather
than just looking at the layout.


> Good point.  As a C++ programmer, I use RAII a lot.

And I've used it a few times in Python, before I found
out it wasn't a guaranteed behavior by the language.

> So I come to another conclusion: the indentation syntax will most of the 
> time result in a waste of space.  Typically a programmer would want its 
> with-block to end at the end of the current block.

A test for how often this is needed would be to look in existing
code for the number of try/finally blocks.  I have seen and
written some gnarly deeply stacked blocks but not often - once
a year?

That's not to say it's a good indicator.  A lot of existing code
looks like this

def get_first_line(filename):
  f = open(filename)
  return f.readline()

depending on the gc to clean up the open file.  A more ... not
correct, but at least finicky ... implementation could be

def get_first_line(filename):
  f = open(filename)
  try:
return f.readline()
  finally:
f.close()

Almost no one does that.  With the PEP perhaps the idiomatic
code would be

def get_first_line(filename):
  with open(filename) as f:
return f.readline()


(Add __enter__/__exit__ semantics to the file object?  Make
a new 'opening' function?  Don't know.)
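For what it's worth, such an 'opening' helper is easy to sketch with a
generator (this assumes the contextlib.contextmanager decorator that
arrived alongside PEP 343 in Python 2.5; it is not part of the PEP
itself):

```python
from contextlib import contextmanager

@contextmanager
def opening(filename, mode="r"):
    # Yield the open file to the with-block and guarantee close(),
    # mirroring the explicit try/finally version above.
    f = open(filename, mode)
    try:
        yield f
    finally:
        f.close()
```

get_first_line() then shrinks to a with-statement wrapped around a
single readline() call.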

What I mean by all of this is that the new PEP may encourage
more people to use indented blocks, in a way that can't be
inferred by simply looking at existing code.  In that case
your proposal, or the one written

  with abc, defg(mutex) as D, server.lock() as L:
..

may be needed.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-04 Thread Andrew Dalke
On Sat, 04 Jun 2005 10:43:48 -0600, Steven Bethard wrote:
> Ilpo Nyyssönen wrote:
>> How about this instead:
>> 
>> with locking(mutex), opening(readfile) as input:
>> ...

> I don't like the ambiguity this proposal introduces.  What is input 
> bound to?

It would use the same logic as the import statement, which already
supports an 'as' like this

>>> import sys, math, cStringIO as StringIO, xml.sax.saxutils as su
>>> 

> But the point is 
> that, whatever decision you make, I now have to *memorize* that decision.

It's the same rule so the rule would be "ahh, uses the 'as' form".

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to get name of function from within function?

2005-06-03 Thread Andrew Dalke
I'm with Steven Bethard on this; I don't know what you
(Christopher J. Bottaro) are trying to do.

Based on your example, does the following meet your needs?

>>> class Spam(object):
...   def funcA(self):
... print "A is called"
...   def __getattr__(self, name):
... if name.startswith("_"):
...   raise AttributeError, name
... f = get_function(name)
... if f is not None:
...   return f
... raise AttributeError, name
... 
>>> def get_function(name):
... return globals().get(name + "IMPL", None)
... 
>>> x = Spam()
>>> x.funcA()
A is called
>>> x.funcB()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 10, in __getattr__
AttributeError: funcB
>>> def funcBIMPL():
...   print "Calling all bees"
... 
>>> x.funcB()
Calling all bees
>>> 


Confused-ly-your's

Andrew
[EMAIL PROTECTED]


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-03 Thread Andrew Dalke
Nicolas Fleury wrote:
> There's no change in order of deletion, it's just about defining the 
> order of calls to __exit__, and they are exactly the same.

BTW, my own understanding of this proposal is still slight.
I realize a bit better that I'm not explaining myself correctly.

> As far as I 
> know, PEP343 has nothing to do with order of deletion, which is still 
> implementation-dependant.  It's not a constructor/destructor thing like 
> in C++ RAII, but __enter__/__exit__.

I'm mixing (because of my lack of full comprehension) RAII with
your proposal.

What I meant to say was in the PEP

 with locking(someMutex):
     with opening(readFilename) as input:
         with opening(writeFilename) as output:
             ...

it's very well defined when the __exit__() methods are
called and in which order.  If it's


 with locking(someMutex)
 with opening(readFilename) as input
 with opening(writeFilename) as output

with the __exit__()s called at the end of the scope (as if it
were a __del__, which it isn't) then the implementation could
still get the __exit__ order correct, by being careful.  Though
there would be no way to catch an exception raised in an __exit__.
I think.
 
>> Your approach wouldn't allow the following
> 
> No, I said making the ':' *optional*.  I totally agree supporting ':' is 
> useful.

Ahh, I think I understand.  You want both

with abc:
  with cde:
pass

and

with abc
with def

and to have the second form act somewhat like RAII in that
the __exit__() for that case is called when the scope ends.


Hmm.  My first thought is I don't like it because I'm a stodgy
old traditionalist and don't like the ambiguity of having to look
multiple tokens ahead to figure out which form is which.  

I can see that it would work.  Umm, though it's tricky.  Consider

with abc

with defg:
  with ghi
  with jkl:
1/0



The implementation would need to track all the with/as forms
in a block so they can be __exit__()ed as appropriate.  In this
case ghi.__exit__() is called after jkl.__exit__() and
before defg.__exit__().
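On a Python that implements the PEP, plain nesting pins down exactly that order; a runnable sketch (Tracker is an illustrative class, not from the PEP):

```python
# Record the order in which __exit__ methods run for nested blocks.
order = []

class Tracker:
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, tb):
        order.append(self.name)
        # Returning True suppresses the exception, so the demo
        # keeps running after the innermost block raises.
        return True

with Tracker("defg"):
    with Tracker("ghi"):
        with Tracker("jkl"):
            1/0  # raised inside the innermost block

# __exit__ runs innermost-first: jkl, then ghi, then defg.
```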

The PEP gives an easy-to-understand mapping from the proposed
change to how it could be implemented by hand in the existing
Python.  Can you do the same?
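For comparison, the PEP's mapping for a single statement can be hand-expanded roughly like this (Opening, mgr, and exc are illustrative names; this is a sketch of the expansion, not a quote of the PEP):

```python
import sys

class Opening:
    # Stand-in context manager for the expansion below.
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        return "handle:" + self.name
    def __exit__(self, exc_type, exc_value, tb):
        return False  # don't suppress exceptions

# Hand expansion of:  with Opening("data.txt") as VAR: BLOCK
mgr = Opening("data.txt")
exc = True
VAR = mgr.__enter__()
try:
    try:
        result = VAR.upper()  # BLOCK would go here
    except:
        exc = False
        if not mgr.__exit__(*sys.exc_info()):
            raise
finally:
    if exc:
        mgr.__exit__(None, None, None)
```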

> True.  But does it look as good?  Particularly the _ part?

I have no idea if the problem you propose (multiple with/as
blocks) will even exist so I can't comment on which solution
looks good.  It may not be a problem in real code, so not needing
any solution.

Andrew
[EMAIL PROTECTED]



Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-03 Thread Andrew Dalke
Nicolas Fleury wrote:
> What about making the ':' optional (and end implicitly at end of current 
> block) to avoid over-indentation?
> 
> def foo():
>  with locking(someMutex)
>  with opening(readFilename) as input
>  with opening(writeFilename) as output
>  ...
>
> would be equivalent to:
> 
> def foo():
>  with locking(someMutex)
>  with opening(readFilename) as input
>  with opening(writeFilename) as output
>  ...

Nothing in Python ends at the end of the current block.
They only end when the scope exits.  The order of deletion
is not defined, and you would change that as well.

Your approach wouldn't allow the following

with locking(mutex):
  increment_counter()

x = counter()

with locking(mutex):
  decrement_counter()

 
except by making a new block, as

if 1:
  locking(mutex)

  x = counter()

if 1:
  locking(mutex)


If the number of blocks is a problem it wouldn't be that
hard to do

with multi( locking(someMutex),
opening(readFilename),
opening(writeFilename) ) as _, input, output:
  ...

Untested sketch of an implementation


import sys

class multi(object):
  def __init__(self, *args):
self.args = args
  def __enter__(self):
results = []
for i, arg in enumerate(self.args):
  try:
results.append(arg.__enter__())
  except:
# back up through the already __entered__ args
exc = sys.exc_info()
for j in range(i-1, -1, -1):
  try:
self.args[j].__exit__(*exc)
  except:
# Need to get the new exception, to match the PEP behavior
exc = sys.exc_info()
raise exc[0], exc[1], exc[2]
return results

  def __exit__(self, type, value, traceback):
for arg in self.args[::-1]:
  try:
arg.__exit__(type, value, traceback)
  except:
type, value, traceback = sys.exc_info()

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Formatting Time

2005-06-03 Thread Andrew Dalke
Coates, Steve (ACHE) wrote:
> >>> import time
> >>> t=36100.0
> >>> time.strftime('%H:%M:%S',time.gmtime(t))
> '10:01:40'

But if t >= 24*60*60 then %H cycles back to 0:

>>> import time
>>> t=24*60*60
>>> time.strftime('%H:%M:%S',time.gmtime(t))
'00:00:00'
>>> 

Andrew
[EMAIL PROTECTED]



Re: provide 3rd party lib or not... philosophical check

2005-06-02 Thread Andrew Dalke
Maurice LING wrote:
> Just a philosophical check here. When a program is distributed, is it 
> more appropriate to provide as much of the required 3rd party libraries, 
> like SOAPpy, PLY etc etc, in the distribution itself or it is the 
> installer's onus to get that part done?

Depends on who you are delivering the software to.

I've made distributions that include everything.  Often for people
who don't want to install other software.

I've made distributions that assume several other packages were
already installed.  Often for people who are Python developers
or who were okay with simply doing the installation.

Andrew
[EMAIL PROTECTED]



Re: Two questions

2005-06-02 Thread Andrew Dalke
Steven D'Aprano wrote:
> The existence of one or two or a thousand profitable software packages
> out of the millions in existence does not invalidate my skepticism that
> some random piece of software will directly make money for the
> developer.

'Tis true.  I think (but have no numbers to back me up) that most software
in the world is developed in-house and is not distributed.  Eric Raymond
at http://www.catb.org/~esr/writings/magic-cauldron/magic-cauldron-3.html
says it's <5%

For example, all my income has been from consulting and contract work, and
none from product development.

> Even assuming that the money-making ability would be lost if the source
> code was available, which is not a given (id software open-sources old
> versions of their rendering engines, and MySQL is quite profitable and
> their software is available source code and all).

Regarding id, see section 10.3 of
 http://www.catb.org/~esr/writings/magic-cauldron/magic-cauldron-10.html

They open their software when there isn't much money to be made from it. 
See also John Carmack's comments at
 http://slashdot.org/interviews/99/10/15/1012230.shtml

> Going open-source from development day one with a game probably doesn't
> make much sense. Design by committee doesn't work particularly well, and
> for something with as much popular appeal as games, the signal to noise
> ratio would probably be very low.
 ...
> I am going to be releasing the majority of the code for Q3 soon, but
> there will still be proprietary bits that we reserve all rights to.
> We make a fairly good chunk of income from technology licensing, so
> it would take some damn good arguments to convince everyone that giving
> it all away would be a good idea.

MySQL isn't relevant; I know there are companies that make money
that way.  There are also those that don't, and there are times
when obfuscation (compiling, .pyc, etc.) changes the economic landscape
enough to bring in extra money that outweighs what is to
many the low or non-existent moral obligation to provide the
original source in an easily usable and redistributable form.

> Software rarely makes money for the developers directly. The odds are
> against any developer, hence my skepticism.

Software and restaurant startups have high failure rates.  But
people like to think they are special and can beat the odds.  Some do.
 
>>  - stock market trading companies make money in part by having
>> specialized software to help with market trading, forecasts, etc.
> 
> You are mixing up a number of seperate issues here.

Yes, I am.

> If they distribute the software externally, then they almost certainly
> have more protection from licence agreements and copyright than they get
> from merely hiding the source. If they even do hide the source code,
> which is not a given.

Ahh, I thought by "hide the source" you meant "not open source".  Of
course there are many schemes whereby purchasers also get access
to the code but don't have redistribution rights.

> In-house use of market forecasting software falls into the "carpenter's
> hammer" category, not the "make money by selling software" category.

I was actually thinking of market trading software that was sold.
It's rather absurd to hide software from yourself.

One story I heard at the Python conference in Houston ('98, I think)
was of a company that developed this sort of package.  They redid
it and in-house used the newer version but sold the older and less
capable version to other companies, including competitors.

Nothing to do with open/closed/hidden/etc. but an interesting story.

> As for selling forecasting software, well, you haven't demonstrated that
> making the source code available would harm the ability to make money
> from it. Firstly, very often the value of the software is not the
> algorithms they use (curve fitting software and extrapolation algorithms
> are hardly secret), but the data used by the algorithm. So long as you
> keep the financial data proprietary, keeping the source code secret adds
> nothing.

At the 2000 Python conference in DC, Eric Raymond was the keynote.
He presented his ideas from "The Magic Cauldron"
  http://www.catb.org/~esr/writings/magic-cauldron/magic-cauldron.html

One of the examples he gave of a program that should not be open-sourced
was from a company that developed software to optimize lumber cutting
from a tree.  In that case the value *was* the algorithm used.

Note by the way that there were several objections to his presentation.
One was to his "Give Away the Recipe, Open A Restaurant"
http://www.catb.org/~esr/writings/magic-cauldron/magic-cauldron-9.html#ss9.3

In his talk he mentioned a famous restaurant, and pointed out you could
get the recipes for the meals.  One guy from the audience said he
worked for a sister restaurant to the one cited, that they signed
NDAs, and that the published recipes often excluded a few key parts, to
make it hard to duplicate.

>>   You are the US government developing s

Re: Two questions

2005-06-02 Thread Andrew Dalke
Greg Ewing wrote:
> Hmmm... if these are GPL weapons, if you want to fire
> them at anyone you'll *have* to make them available to
> all other countries as well... not good for
> non-proliferation...

I think the source code only needs to be sent to the
country which receive the weapons.  Include a DVD with
the warhead (in a usable form for further development)
and the GPL should be satisfied.

Andrew
[EMAIL PROTECTED]



Re: Formatting Time

2005-06-02 Thread Andrew Dalke
Ognjen Bezanov wrote:
> I have a float variable representing seconds, and i want to format it
> like this:
> 
> 0:00:00  (h:mm:ss)


>>> def format_secs(t):
...   m, s = divmod(t, 60)
...   h, m = divmod(m, 60)
...   return "%d:%02d:%02d" % (h, m, s)
... 
>>> format_secs(0)
'0:00:00'
>>> format_secs(1)
'0:00:01'
>>> format_secs(59)
'0:00:59'
>>> format_secs(60)
'0:01:00'
>>> format_secs(61)
'0:01:01'
>>> format_secs(3600)
'1:00:00'
>>> format_secs(3601)
'1:00:01'
>>> format_secs(3661)
'1:01:01'
>>> format_secs(3600*100+120+58)
'100:02:58'
>>> 

Andrew
[EMAIL PROTECTED]



Re: Two questions

2005-06-02 Thread Andrew Dalke
Steven D'Aprano wrote:
> I can think of a number of reasons why somebody might want to hide their
> code. In no particular order:

> (3) You have create an incredibly valuable piece of code that will be
> worth millions, but only if nobody can see the source code. Yeah right.

 - id software makes a lot of money licensing their 3D FPS engine

 - stock market trading companies make money in part by having
specialized software to help with market trading, forecasts, etc.

> (8) You are programming a game or puzzle, and you don't want players to
> cheat by reading the source code. Consider pulling out the information
> they need to cheat and putting it in an encrypted data file instead.

  But code is data ...


> There may be other reasons for wanting to keep the code secret. Some of
> them might even be good reasons, for some value of "good".

  You are the US government developing software to design/test the
  next generation nuclear weapons system and don't want any other
  country to use it.  (GnuNuke?)

  You are a student working on a take-home example and you aren't
  allowed to work with/help anyone else

 
> If you really what to hide your code, you might like to think about
> using C-extensions instead.

Or go the Amazon/EBay/Google approach and provide only client access
to your code.

Andrew
[EMAIL PROTECTED]



Re: date and time range checking

2005-06-02 Thread Andrew Dalke
Maksim Kasimov wrote:
> there are few of a time periods, for example:
>   2005-06-08 12:30 -> 2005-06-10 15:30,
>   2005-06-12 12:30 -> 2005-06-14 15:30
> 
> and there is some date and time value:
>   2005-06-11 12:30



> what is the "pythonic" way to check is the date/time value in the given 
> periods range?


>>> import datetime
>>> t1 = datetime.datetime(2005, 6, 8, 12, 30)
>>> t2 = datetime.datetime(2005, 6, 10, 15, 30)
>>> t = datetime.datetime(2005, 6, 9, 14, 00)
>>> if t1 < t < t2:
...   print "In range"
... 
In range
>>> t = datetime.datetime(2005, 6, 8, 14, 00)
>>> if t1 < t < t2:
...   print "In range"
... 
In range
>>> t = datetime.datetime(2005, 6, 7, 14, 00)
>>> 
>>> if t1 < t < t2:
...   print "In range"
... 
>>>

If you want to use the "in" syntax

>>> class InRange:
...   def __init__(self, low, high):
... self.low = low
... self.high = high
...   def __contains__(self, obj):
... return self.low < obj < self.high
... 
>>> r = InRange(t1, t2)
>>> datetime.datetime(2005, 6, 7, 14, 00) in r
False
>>> datetime.datetime(2005, 6, 8, 14, 00) in r
True
>>> datetime.datetime(2005, 6, 9, 14, 00) in r
True
>>> datetime.datetime(2005, 6, 9, 18, 00) in r
True
>>> datetime.datetime(2005, 6, 10, 18, 00) in r
False
>>> 

Andrew
[EMAIL PROTECTED]



Re: any macro-like construct/technique/trick?

2005-06-01 Thread Andrew Dalke
Mac wrote:
> After I wrote my post I realized I didn't provide enough context of
> what I'm doing, [explanation followed]

I have a similar case in mind.  Some graph algorithms work with a
handler, which is notified about various possible events: "entered
new node", "doing a backtrack", "about to leave a node".  A general
algorithm may implement many of these.  But if the handler doesn't
care about some of the events there's still the cost of either
doing a no-op call or checking if the callback function doesn't exist.

I've considered the idea of limited macro/template support for that
case, which either removes a callback or perhaps in-lines some
user-supplied code for the given circumstance.


>it... no... wait... no good, the problem is the following case:
># real line of code
>DbgObjFoo(a,b,costly_function(c))
># real line of code

In another branch I suggested

  debug_emit(DbgObjFoo, a, b, costly_function(c))

which obviously wouldn't work for this case.  The following lightly
tested code would work (assuming appropriate debugging)

  debug_emit(DbgObjFoo, a, b, 
 Call(costly_function, c, Call(expensive_function, d)))

def debug_emit(klass, *args):
  if debug:
 emit_dbg_code(Call(klass, *args)())

class Call:
  def __init__(self, f, *args):
self.f = f
self.args = args
  def __call__(self):
args = []
for arg in self.args:
  if isinstance(arg, Call):
args.append(arg())
  else:
args.append(arg)
return self.f(*args)

There's still the overhead of making the Call objects, but it
shouldn't be that large.  You can save a smidgeon by doing

if debug:
  class Call:
... as defined earlier
else:
  def Call(f, *args): pass



Andrew
[EMAIL PROTECTED]



Re: any macro-like construct/technique/trick?

2005-06-01 Thread Andrew Dalke
Mac wrote:
> Is there a way to mimic the behaviour of C/C++'s preprocessor for
> macros?

There are no standard or commonly accepted ways of doing that.

You could do as Jordan Rastrick suggested and write your own sort
of preprocessor, or use an existing one.  With the new import
hooks you can probably make the conversion happen automatically,
though I hesitate to suggest that, as you might actually do it.
It's typically a bad idea because you're in essence creating a
new language that is similar to but not Python, making it harder
for people to understand what's going on.

>  The problem: a lot of code like this:
> 
> def foo():
> #  do some stuff
> if debug:
> emit_dbg_obj(DbgObjFoo(a,b,c))
  ...
> * the two-lines of debug conditional tend to really break up the flow
> of the surrounding code

If flow is your only concern you can have a do-nothing
function and at the top have

if debug:
  emit = emit_dbg_obj
else:
  def emit(*args, **kwargs): pass

then write all your code as

   emit(DbgObjFoo(a,b,c))

> * using
>def debug_emit(obj):
>if debug:
>emit_dbg_obj(obj)
> is a poor solution, because it *always* instantiates DbgObj*, even when
> not needed; I want to avoid such unnecessary waste

That would work as well of course.

How bad is the waste?  Is it really a problem?

Is all your code of the form

  emit(Class(constructor, args))

?  If so, your debug_emit could be made to look like

  def debug_emit(klass, *args, **kwargs):
if debug:
  emit_dbg_obj(klass(*args, **kwargs))

and used

  debug_emit(DbgObjFoo, a, b, c)
  debug_emit(DbgObjBar, d, e)

though I would use the do-nothing function version I sketched
earlier because, *ahem*, it avoids an unnecessary waste of
the extra "if debug:" check.  :)

Andrew
[EMAIL PROTECTED]



Re: pickle alternative

2005-06-01 Thread Andrew Dalke
simonwittber wrote:
> It would appear that the new version 1 format introduced in Python 2.4
> is much slower than version 0, when using the dumps function.

Interesting.  Hadn't noticed that change.  Is dump(value, StringIO()) as
slow?

Andrew
[EMAIL PROTECTED]



Re: pickle alternative

2005-06-01 Thread Andrew Dalke
simonwittber posted his test code.

I took the code from the cookbook, called it "sencode" and
added these two lines

dumps = encode
loads = decode


I then ran your test code (unchanged except that my newsreader
folded the "value = ..." line) and got

marshal enc T: 0.21
marshal dec T: 0.4
sencode enc T: 7.76
sencode dec T: 11.56

This is with Python 2.3; the stock one provided by Apple
for my Mac.

I expected the numbers to be like this because the marshal
code is used to make and read the .pyc files and is supposed
to be pretty fast.

BTW, I tried the performance approach I outlined earlier.
The numbers aren't much better

marshal enc T: 0.2
marshal dec T: 0.38
sencode2 enc T: 7.16
sencode2 dec T: 9.49


I changed the format a little bit; dicts are treated a bit
differently.


from struct import pack, unpack
from cStringIO import StringIO

class EncodeError(Exception):
pass
class DecodeError(Exception):
pass

def encode(data):
f = StringIO()
_encode(data, f.write)
return f.getvalue()

def _encode(data, write, pack = pack):
    # The original code uses the equivalent of "type(data) is list"
# I preserve that behavior

T = type(data)

if T is int:
write("I")
write(pack("!i", data))
elif T is list:
write("L")
write(pack("!L", len(data)))
# Assumes len and 'for ... in' aren't lying
for item in data:
_encode(item, write)
elif T is tuple:
write("T")
write(pack("!L", len(data)))
# Assumes len and 'for ... in' aren't lying
for item in data:
_encode(item, write)
elif T is str:
write("S")
write(pack("!L", len(data)))
write(data)
elif T is long:
s = hex(data)[2:-1]
write("B")
write(pack("!i", len(s)))
write(s)
elif T is type(None):
write("N")
elif T is float:
write("F")
write(pack("!f", data))
elif T is dict:
write("D")
write(pack("!L", len(data)))
for k, v in data.items():
_encode(k, write)
_encode(v, write)
else:
raise EncodeError((data, T))
  

def decode(s):
"""
Decode a binary string into the original Python types.
"""
buffer = StringIO(s)
return _decode(buffer.read)

def _decode(read, unpack = unpack):
code = read(1)
if code == "I":
return unpack("!i", read(4))[0]
if code == "D":
size = unpack("!L", read(4))[0]
x = [_decode(read) for i in range(size*2)]
return dict(zip(x[0::2], x[1::2]))
if code == "T":
size = unpack("!L", read(4))[0]
return tuple([_decode(read) for i in range(size)])
if code == "L":
size = unpack("!L", read(4))[0]
return [_decode(read) for i in range(size)]
if code == "N":
return None
if code == "S":
size = unpack("!L", read(4))[0]
return read(size)
if code == "F":
return unpack("!f", read(4))[0]
if code == "B":
size = unpack("!L", read(4))[0]
return long(read(size), 16)
raise DecodeError(code)



dumps = encode
loads = decode


I wonder if this could be improved by a "struct2" module
which could compile a pack/unpack format once.  Eg,

float_struct = struct2.struct("!f")

float_struct.pack(f)
return float_struct.unpack('?\x80\x00\x00')[0]
  which might be the same as
return float_struct.unpack1('?\x80\x00\x00')
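As it happens, essentially this interface was later added to the standard library as struct.Struct (Python 2.5), which compiles the format string once and reuses it:

```python
import struct

# Compile the "!f" format once; pack/unpack reuse the parsed format.
float_struct = struct.Struct("!f")

packed = float_struct.pack(1.0)
# Big-endian IEEE 754 single precision for 1.0 is 3F 80 00 00:
assert packed == b'?\x80\x00\x00'
assert float_struct.unpack(packed)[0] == 1.0
```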



Andrew
[EMAIL PROTECTED]



Re: pickle alternative

2005-05-31 Thread Andrew Dalke
simonwittber wrote:
> marshal can serialize small structures very qucikly, however, using the
> below test value:
> 
> value = [r for r in xrange(100)] +
> [{1:2,3:4,5:6},{"simon":"wittber"}]
> 
> marshal took 7.90 seconds to serialize it into a 561 length string.
> decode took 0.08 seconds.

Strange.  Here's what I found:

>>> value = [r for r in xrange(100)] +[{1:2,3:4,5:6},{"simon":"wittber"}]
>>> import time, marshal
>>> t1=time.time();s=marshal.dumps(value);t2=time.time()
>>> t2-t1
0.22474002838134766
>>> len(s)
561
>>> t1=time.time();new_value=marshal.loads(s);t2=time.time()
>>> t2-t1
0.3606879711151123
>>> new_value == value
True
>>> 

I can't reproduce your large times for marshal.dumps.  Could you
post your test code?

Andrew
[EMAIL PROTECTED]



Re: pickle alternative

2005-05-31 Thread Andrew Dalke
simonwittber wrote: 
> From the marshal documentation:
> Warning: The marshal module is not intended to be secure against
> erroneous or maliciously constructed data. Never unmarshal data
> received from an untrusted or unauthenticated source.

Ahh, I had forgotten that.  Though I can't recall what an attack
might be, I think it's because the C code hasn't been fully vetted
for unexpected error conditions.
 
> Any idea how this might be solved? The number of bytes used has to be
> consistent across platforms. I guess this means I cannot use the struct
> module?

How do you want to solve it?  Should a 64 bit machine be able to read
a data stream made on a 32 bit machine?  What about vice versa?  How
are floats interconverted?

You could preface the output stream with a description of the encoding
used: version number, size of float, size of int (which should always
be sizeof float these days, I think).  Read these then use that
information to figure out which decode/dispatch function to use.
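One possible shape for such a header, as a sketch (the version number and field choices here are assumptions, not a settled format):

```python
import struct

FORMAT_VERSION = 1

def make_header():
    # version, float size, int size -- three unsigned bytes,
    # network byte order.  8/8 assumes C doubles and 64-bit ints.
    return struct.pack("!BBB", FORMAT_VERSION, 8, 8)

def read_header(read):
    version, float_size, int_size = struct.unpack("!BBB", read(3))
    if version != FORMAT_VERSION:
        raise ValueError("unsupported format version %d" % version)
    return float_size, int_size
```

A reader would call read_header first and use the returned sizes to pick its decode/dispatch functions.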

Andrew
[EMAIL PROTECTED]



Re: pickle alternative

2005-05-31 Thread Andrew Dalke
simonwittber wrote:
> I've written a simple module which serializes these python types:
> 
> IntType, TupleType, StringType, FloatType, LongType, ListType, DictType

For simple data types consider "marshal" as an alternative to "pickle".

> It appears to work faster than pickle, however, the decode process is
> much slower (5x) than the encode process. Has anyone got any tips on
> ways I might speed this up?


    def dec_int_type(data):
        value = int(unpack('!i', data.read(4))[0])
        return value

That 'int' isn't needed -- unpack returns an int not a string
representation of the int.

BTW, your code won't work on 64 bit machines.

def enc_long_type(obj):
return "%s%s%s" % ("B", pack("!L", len(str(obj))), str(obj))

There's no need to compute str(long) twice -- for large longs
it takes a lot of work to convert to base 10.  For that matter,
it's faster to convert to hex, and the hex form is more compact.
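The hex round-trip is easy to check (a modern-Python sketch; the post's encoder uses hex(data)[2:-1] because Python 2's hex() on longs appends a trailing 'L'):

```python
n = 2**200 + 12345

# hex() is much cheaper than base-10 str() for big integers,
# and the result is shorter as well.
s = hex(n)[2:]          # strip the '0x' prefix
assert int(s, 16) == n
assert len(s) < len(str(n))
```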

Every decode you do requires several function calls.  While
less elegant, you'll likely get better performance (test it!)
if you minimize that; try something like this

import struct
from cStringIO import StringIO

def decode(data):
    return _decode(StringIO(data).read)

def _decode(read, unpack = struct.unpack):
    code = read(1)
    if not code:
        raise IOError("reached the end of the file")
    if code == "I":
        return unpack("!i", read(4))[0]
    if code == "F":
        return unpack("!f", read(4))[0]
    if code == "L":
        count = unpack("!i", read(4))[0]
        return [_decode(read) for i in range(count)]
    if code == "D":
        count = unpack("!i", read(4))[0]
        # decode count (key, value) pairs; tuples evaluate left-to-right
        return dict([(_decode(read), _decode(read)) for i in range(count)])
...



Andrew
[EMAIL PROTECTED]



Re: Exiting SocketServer: socket.error: (98, 'Address already in use')

2005-05-30 Thread Andrew Dalke
Magnus Lyckå wrote:
> Why doesn't my socket
> get released by the OS when I exit via my handle_error?

Hi Magnus,

  I wrote about this at
http://www.dalkescientific.com/writings/diary/archive/2005/04/21/using_xmlrpc.html

The reason for it is described at
  http://hea-www.harvard.edu/~fine/Tech/addrinuse.html

You can set the class variable "allow_reuse_address = True" in
your derived ExitableSocketServer to get the behaviour you
expect, at the expense of some problems mentioned in the
above URL.
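A minimal sketch of that suggestion (shown with Python 3's module name; in 2.x the module is spelled SocketServer):

```python
import socketserver

class ExitableSocketServer(socketserver.TCPServer):
    # With this class attribute set, server_bind() calls
    # setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) before bind(),
    # so restarting the server doesn't raise EADDRINUSE while
    # the old socket is still in TIME_WAIT.
    allow_reuse_address = True
```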

Andrew
[EMAIL PROTECTED]



Re: searching substrings with interpositions

2005-05-24 Thread Andrew Dalke
Claudio Grondi wrote:
> Note: code below is intended to help to clarify things only,
> so that a bunch of examples can be tested.
> If you need bugfree production quality code, maybe
> someone else can provide it.

Still not tested enough to ensure that it's bug free, but more
concise.  Here's one that implements the algorithm directly and
another that uses a regexp.  The latter should be the preferred
approach.  My intent was that the algorithm implements the given
pattern, so they should give identical results.

# Doing the work ourselves
def find_spread_substring(query, target, limit = None):
stack = []
ti = qi = 0
Nq = len(query)
Nt = len(target)
delta = 0

while ti < Nt:
# We have a match
if query[qi] == target[ti]:
stack.append( (qi, ti, delta) )
qi = qi + 1
if qi == Nq:
return [ti for (qi, ti, delta) in stack]
ti = ti + 1
delta = 0
else:
# No match
while 1:
# If we have a partial match, check if we've
# gone over the limit.
if stack:
delta = delta + 1
if limit is not None and delta > limit:
# backtrack, treating it as an invalid match
# (so retry this 'else:' block)
qi, ti, delta = stack.pop()
continue
# No backtracking needed
break
# Advance to check the next character in the target
ti = ti + 1

# Failure
return None

# Using regular expressions
import re
def find_spread_substring2(query, target, limit = None):
if limit is None:
template = "(%s).*?"
else:
template = "(%%s).{,%d}?" % (limit,)
terms = [template % c for c in query]
pattern = "".join(terms)

pat = re.compile(pattern)
m = pat.search(target)
if not m:
return None
return [m.start(i) for i in range(1, len(query)+1)]


def test():
for (q, t, limit, is_valid) in (
("1010", "10001001", None, True),
("1010", "100011", None, False),
("1010", "100010", 3, True),
("1010", "100010", 1, True),
("1010", "110", 1, False),
("1010", "0110", 2, True),
("1010", "0110", 1, False),
("1010", "010", None, False),

):
result = find_spread_substring(q, t, limit)
result2 = find_spread_substring2(q, t, limit)
if result != result2:
raise AssertionError( (result, result2) )

if result is not None:
if limit is not None:
# check that it's a proper subset
for (x, y) in zip(result[:-1], result[1:]):
# +1 because 'limit' is the maximum gap size
if (y-x) > limit+1:
raise AssertionError((q, t, limit, result, x, y))
s = "".join([t[i] for i in result])
if s != q:
raise AssertionError((q, t, limit, result, s))

if result is None and not is_valid:
pass
elif result is not None and is_valid:
pass
else:
raise AssertionError( (q, t, limit, is_valid, result) )

if __name__ == "__main__":
test()
print "All tests passed."


Andrew
[EMAIL PROTECTED]



Re: searching substrings with interpositions

2005-05-24 Thread Andrew Dalke
[EMAIL PROTECTED] wrote:
> the next step of my job is to make limits of lenght of interposed
> sequences (if someone can help me in this way i'll apreciate a lot)
> thanx everyone.

Kent Johnson had the right approach, with regular expressions.
For a bit of optimization, use non-greedy groups.  That will
give you shorter matches.

Suppose you want no more than 10 bases between terms.  You could
use this pattern.

a.{,10}?t.{,10}?c.{,10}?g.{,10}?


>>> import re
>>> pat = re.compile('a.{,10}t.{,10}c.{,10}g.{,10}?')
>>> m = pat.search("tcgaacccgtagctaatcg")
>>> m.group(0), m.start(0), m.end(0)
('aacccgtagctaatcg', 3, 19)
>>> 

>>> pat.search("tcgaacccgtagctaatttg")
<_sre.SRE_Match object at 0x9b950>
>>> pat.search("tcgaacccgtagctaag")
>>> 

If you want to know the location of each of the bases, and
you'll have less than 100 of them (I think that's the limit)
then you can use groups in the regular expression language

>>> def make_pattern(s, limit = None):
... if limit is None:
... t = ".*?"
... else:
... t = ".{,%d}?" % (limit,)
... text = []
... for c in s:
... text.append("(%s)%s" % (c, t))
... return "".join(text)
... 
>>> make_pattern("atcg")
'(a).*?(t).*?(c).*?(g).*?'
>>> make_pattern("atcg", 10)
'(a).{,10}?(t).{,10}?(c).{,10}?(g).{,10}?'
>>> pat = re.compile(make_pattern("atcg", 10))
>>> m = pat.search("tcgaacccgtagctaatttg")
>>> m
<_sre.SRE_Match object at 0x8ea70>
>>> m.groups()
('a', 't', 'c', 'g')
>>> for i in range(1, len("atcg")+1):
...   print m.group(i), m.start(i), m.end(i)
... 
a 3 4
t 9 10
c 16 17
g 27 28
>>> 



Andrew
[EMAIL PROTECTED]



Re: performance of Nested for loops

2005-05-20 Thread Andrew Dalke
querypk wrote:
> Is there a better way to code nested for loops as far as performance is
> concerned.
> 
> what better way can we write to improve the speed.
> for example:
> N=1
> for i in range(N):
>for j in range(N):
>do_job1
>for j in range(N):
>do_job2

For this case compute the range once

range_1 = range(1)
for i in range_1:
  for j in range_1:
do_job1()
  for j in range_1:
do_job2()

Using xrange(1) may be faster but you would need to test
that out for your case.
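One way to measure the difference is the stdlib timeit module (a sketch; the loop sizes and repeat counts are arbitrary).  Note that Python 2's range() built a full list on every call, so hoisting it saved more there than it does with a lazy range object:

```python
import timeit

# Re-creating range(N) on every outer iteration...
recreate = timeit.timeit(
    "for i in range(200):\n"
    "    for j in range(200): pass",
    number=20)

# ...versus building it once and reusing it.
reuse = timeit.timeit(
    "for i in r:\n"
    "    for j in r: pass",
    setup="r = range(200)",
    number=20)
```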

Andrew
[EMAIL PROTECTED]



Re: Is Python suitable for a huge, enterprise size app?

2005-05-18 Thread Andrew Dalke
Ivan Van Laningham wrote:
>  ...   Oh, it's interpreted, is it?  Interesting."  You can
> see Python going down the sewer pipes, right on their faces.

Nahh, the right answer is "It's byte-compiled, just like Java."

Andrew
[EMAIL PROTECTED]



Re: Representing ambiguity in datetime?

2005-05-17 Thread Andrew Dalke
Ron Adam wrote:
> This is a very common problem in genealogy research as well as other 
> sciences that deal with history, such as geology, geography, and archeology.
  ..
> So it seems using 0's for the missing day or month may be how to do it.

Except of course humans like to make things more complicated than that.
Some journals are published quarterly so an edition might be "Jan-Mar".
Some countries refer to week numbers, so an event might be in "week 12".

I offer no suggestions as to how to handle these cases.

Andrew
[EMAIL PROTECTED]



Re: ElemenTree and namespaces

2005-05-16 Thread Andrew Dalke
Matthew Thorley wrote:
> Does any one know if there a way to force the ElementTree module to
> print out name spaces 'correctly' rather than as ns0, ns1 etc? Or is
> there at least away to force it to include the correct name spaces in
> the output of tostring?

See http://online.effbot.org/2004_08_01_archive.htm#20040803
(starting with "more xml").  That was a response to Uche's article at
  http://www.xml.com/pub/a/2004/06/30/py-xml.html
and with a followup at
  http://www.xml.com/pub/a/2004/08/11/py-xml.html

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newbie : checking semantics

2005-05-16 Thread Andrew Dalke
Stefan Nobis wrote:
> From time to time I teach some programming (in an institution
> called "Volkshochschule" here in Germany -- inexpensive courses
> for adults). My Python course is for absolute beginners with no
> previous programming experience of any kind.

I also taught a beginning programming course.   I found that
the hard thing was learning the concept of an algorithm.  I
don't recall people getting confused over the indentation part
that much.

Python's block indentation is derived from ABC.  ABC did various
human factors studies to test how to make a language easier to
use by beginning programmers.  Combining that with my own experience
both using and teaching the language make me distrustful of anyone
else's evidence based only on anecdotal accounts.

Leaving out the ease of learning aspect, there are two other
reasons for the indentation:
  http://python.fyxm.net/doc/essays/foreword.html

] Perhaps Python's most controversial feature is its use of indentation
] for statement grouping, which derives directly from ABC. It is one of
] the language's features that is dearest to my heart. It makes Python
] code more readable in two ways. First, the use of indentation reduces
] visual clutter and makes programs shorter, thus reducing the attention
] span needed to take in a basic unit of code. Second, it allows the
] programmer less freedom in formatting, thereby enabling a more uniform
] style, which makes it easier to read someone else's code. 


> So I would appreciate optional statements to end a block
> (indentation rules may be mandatory). This comes also very handy
> in something like Python Server Pages of mod_python (where a
> comment line to explicitly end a block is sometimes needed).

See the program "pindent.py" in the Python distribution under
Tools/scripts

# This file contains a class and a main program that perform three
# related (though complimentary) formatting operations on Python
# programs.  When called as "pindent -c", it takes a valid Python
# program as input and outputs a version augmented with block-closing
# comments.  When called as "pindent -d", it assumes its input is a
# Python program with block-closing comments and outputs a commentless
# version.   When called as "pindent -r" it assumes its input is a
# Python program with block-closing comments but with its indentation
# messed up, and outputs a properly indented version.

# A "block-closing comment" is a comment of the form '# end <keyword>'
# where <keyword> is the keyword that opened the block.  If the
# opening keyword is 'def' or 'class', the function or class name may
# be repeated in the block-closing comment as well.  Here is an
# example of a program fully augmented with block-closing comments:

# def foobar(a, b):
#if a == b:
#a = a+1
#elif a < b:
#b = b-1
#if b > a: a = a-1
## end if
#else:
#print 'oops!'
## end if
# # end def foobar

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question about the id()

2005-05-16 Thread Andrew Dalke
Peter Dembinski wrote:
> So, the interpreter creates new 'point in address space' every time
> there is object-dot-method invocation in program?

Yes.  That's why some code hand-optimizes inner loops by hoisting
the bound method creation, as

data = []
data_append = data.append
for x in some_other_data:
    # work with x to make y
    data_append(y)
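A complete, runnable version of that hoisting idiom (a minimal Python 3 sketch; the function names and the doubling loop body are mine, for illustration):

```python
def plain(data):
    out = []
    for x in data:
        out.append(x * 2)        # attribute lookup on every iteration
    return out

def hoisted(data):
    out = []
    out_append = out.append      # bound method object created once
    for x in data:
        out_append(x * 2)
    return out

# Both produce the same result; timeit on your build shows the difference.
assert plain(range(5)) == hoisted(range(5)) == [0, 2, 4, 6, 8]
```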


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newbie : checking semantics

2005-05-16 Thread Andrew Dalke
Stefan Nobis wrote:
> The other point is a missing (optional) statement to end blocks
> (so you optional don't have to mark block via whitespace). IMHO
> this comes very handy in some cases (like mixing Python and HTML
> like in PSP). From my experience i also would say beginners have
> quite some problems with only whitespace marking blocks (but it
> also has some benefits).

When you say "beginners" is that people with no previous
programming experience or those who have done C/Java/etc. language
which uses {}s?


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question about the id()

2005-05-16 Thread Andrew Dalke
kyo guan wrote:
>   Can someone explain why the id() return the same value, and why
> these values are changing? Thanks you.

> >>> a = A()
> >>> id(a.f)
> 11365872
> >>> id(a.g)
> 11365872


The Python functions f and g, inside of a class A, are
unbound methods.  When accessed through an instance what's
returned is a bound method.

>>> A.f
<unbound method A.f>
>>> A().f
<bound method A.f of <__main__.A instance at 0x...>>
>>> 

In your code you do a.f, which creates a new bound method.
After the id() call its ref-count goes to zero and its
memory is freed.  Next you do a.g which creates a new
bound method.  In this case it reuses the same memory location,
which is why you get the same id.

I know Python keeps free lists for some data types.  I
suspect bound method objects are tracked this way because
they are made/destroyed so frequently.  That would increase
the likelihood of you seeing the same id value.


> >>> a.f is a.g
> False

This is the first time you have two bound methods at
the same time.  Previously a bound method was garbage
collected before the next one was created.

> >>> id(a.f), id(a.g), id(b.f), id(b.g)
> (11492408, 11492408, 11492408, 11492408)
> >>> a.f is a.g
> False
> >>> id(a.f), id(a.g), id(b.f), id(b.g)
> (11365872, 11365872, 11365872, 11365872)


The memory locations changed.  Here's a conjecture that
fits the facts and is useful to help understand.

Suppose the free list is maintained as a stack, with
the most recently freed object at the top of the stack,
which is the first to be used for the next object.

The "a.f is a.g" creates two bound methods, one at
11492408 and the other at 11365872.  Once the 'is'
is done it dec-refs the two methods, a.f first and
a.g second.  In this case the ref counts go to zero
and the memory moved to the free list.  At this point
the stack looks like

[11365872, 11492408,  ... rest of stack ... ]

You then do a.f.  This pulls from the top of the
stack so you get 11365872 again.  The id() tells
you that, and then the object gets decrefed and
put back on the stack.
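The fresh-bound-method part of this is easy to see directly; a minimal sketch (the id-reuse part is a CPython implementation detail, so it is printed rather than asserted):

```python
class A:
    def f(self):
        pass

    def g(self):
        pass

a = A()
m1 = a.f
m2 = a.f
assert m1 is not m2   # each attribute access builds a new bound-method object
assert m1 == m2       # yet they compare equal: same function, same instance

# With no reference kept alive, the freed object's memory (and hence its id)
# is often reused immediately -- often True on CPython, but not guaranteed:
print(id(a.f) == id(a.g))
```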


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Escape spaces in strings

2005-05-12 Thread Andrew Dalke
Florian Lindner wrote:
> is there a function to escape spaces and other characters in string for
> using them as a argument to unix command? In this case rsync
> (http://samba.anu.edu.au/rsync/FAQ.html#10)

It's best that you use the subprocess module and completely skip
dealing with shell escapes.  The module is standard with 2.4 but
you can get the module and use it for 2.2 or 2.3.

http://docs.python.org/lib/module-subprocess.html

http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/dist/src/Lib/subprocess.py

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unique Elements in a List

2005-05-12 Thread Andrew Dalke
Scott David Daniels wrote:
> Again polynomial, not exponential time.  Note that there is no
> polynomial time algorithm with (k < 1), since it takes O(n) time
> to read the problem.

Being a stickler (I develop software after all :) quantum computers
can do better than that.  For example, Grover's algorithm
  http://en.wikipedia.org/wiki/Grover%27s_algorithm
for searching an unsorted list solves the problem in O(N**0.5) time.

Being even more picky, I think the largest N that's been tested
so far is on the order of 5.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python features

2005-05-12 Thread Andrew Dalke
Peter Dembinski wrote:
> If you want to redirect me to Google, don't bother.  IMO ninety percent
> of writings found on WWW is just a garbage.

Sturgeon's law: Ninety percent of everything is crap. 


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: urllib download insanity

2005-05-12 Thread Andrew Dalke
Timothy Smith wrote:
> ok what i am seeing is impossible.
> i DELETED the file from my webserver, uploaded the new one. when my app 
> logs in it checks the file, if it's changed it downloads it. the 
> impossible part, is that on my pc is downloading the OLD file i've 
> deleted! if i download it via IE, i get the new file. SO, my only 
> conculsion is that urllib is caching it some where. BUT i'm already 
> calling urlcleanup(), so what else can i do?

Here are some ideas to use in your hunt.

 - If you are getting a cached local file then the returned object
will have a "name" attribute.

   result = urllib.urlopen(url)
   print result.fp.name

As far as I can tell, this will only occur if you use
a tempcache or a file URL.


  - You can force some debugging of the open calls, to see if
your program is dealing with a local file.

>>> old_open = open
>>> def my_open(*args):
...   print "opening", args
...   return old_open(*args)
... 
>>> open("/etc/passwd")
<open file '/etc/passwd', mode 'r' at 0x...>
>>> import __builtin__
>>> __builtin__.open = my_open
>>> open("/etc/passwd")
opening ('/etc/passwd',)
<open file '/etc/passwd', mode 'r' at 0x...>
>>> 

You'll may also need to change os.fdopen because that's used
by retrieve if it needs a tempfile.

If you want to see where the open is being called from,
use one of the functions in the traceback module to print
the stack trace.

  - for surety's sake, also do 

import webbrowser
webbrowser.open(url)

just before you do 

urllib.retrieve(url, filename)

This will double check that your program is using the URL you
expect it to use.

  - beyond that, check that you've got network activity,

You could check the router lights, or use a web sniffer like
ethereal, or set up a debugging proxy

  - check the headers.  If your ISP is using a cache then
it might insert a header into what it returns.  But if
it was caching then your IE view should have seen the cached
version as well.

Andrew
[EMAIL PROTECTED]
 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pyvm -- faster python

2005-05-12 Thread Andrew Dalke
Paul Rubin wrote:
> Yes, there are several Python compilers already
 ...
> It's true that CPython doesn't have a compiler and that's a serious
> deficiency.  A lot of Python language features don't play that well
> with compilation, and that's often unnecessary.  So I hope the baseline
> implementation changes to a compiled one before the language evolves
> too much more.

Years ago, presented at one of the Python conferences, was a
program to generate C code from the byte code.  It would still
make calls to the Python run-time library (just as C does to
its run-time library).

The presenter did some optimizations, like not decref at the
end of one instruction when the next immediately does an incref
to it.  The conclusion I recall was that it wasn't faster -
at best a few percent - and there was a big memory hit because
of all the duplicated code.  One thought was that the cache miss
caused some of the performance problems.

Does that count as a compiler?

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: optparse

2005-05-11 Thread Andrew Dalke
Steven Bethard wrote:
> Well one reason might be that it's easy to convert from an object's 
> attributes to a dict, while it's hard to go the other direction:
  ...
> py> options['x'], options['y']
> ('spam', 42)
> py> o = ??? # convert to object???
> ...
> py> o.x, o.y
> ('spam', 42)

"hard" == "slightly less easy"?

class Spam:
  def __init__(self, d):
self.__dict__.update(d)

then

  o = Spam(options)

or use the types module (if you have a classic class)

>>> import types
>>> class Spam: pass
... 
>>> o = types.InstanceType(Spam, {"x": 5, "y": 10})
>>> o.x
5
>>> 
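In current Python the dict-to-object direction is built in; a one-line sketch using types.SimpleNamespace:

```python
from types import SimpleNamespace

options = {"x": "spam", "y": 42}
o = SimpleNamespace(**options)   # dict keys become attributes
assert (o.x, o.y) == ("spam", 42)
# and back to a dict again:
assert vars(o) == options
```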

My guess is the original intent was to make the command-line
parameters act more like regular variables.  They are easier
to type (x.abc vs. x["abc"]) and the syntax coloring is different.


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Need a little parse help

2005-05-11 Thread Andrew Dalke
Delaney, Timothy C (Timothy) wrote:
> Remember, finalisers are not called when Python exits. So if you don't
> explicitly close the file you are *writing* to, it may not be flushed
> before being closed (by the OS because the process no longer exists).

Wrong.

% python
Python 2.3 (#1, Sep 13 2003, 00:49:11) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class ABC:
...   def __del__(self):
... print "I am here!"
... 
>>> abc = ABC()
>>> del abc
I am here!
>>> abc = ABC()
>>> import sys
>>> sys.exit()
I am here!
% 

There's documentation somewhere that describes what occurs
during Python's exit, but I can't find it right now.
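The claim is easy to check from the outside; a sketch that runs the same session in a subprocess and inspects its output (CPython clears module globals at shutdown so the __del__ fires, but the language spec does not guarantee finalizers run at exit):

```python
import subprocess
import sys

code = (
    "class ABC:\n"
    "    def __del__(self):\n"
    "        print('I am here!')\n"
    "abc = ABC()\n"
    "import sys\n"
    "sys.exit()\n"
)
out = subprocess.run([sys.executable, "-c", code],
                     capture_output=True, text=True).stdout
assert "I am here!" in out
```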

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: lists in cx_Oracle

2005-05-10 Thread Andrew Dalke
Daniel Dittmar wrote:
> Possible workarounds:
   ...
> - create a class for this purpose. Statement are created on the fly, but
> with placeholders so you don't run into the SQL Injection problem. As
> it's an object, you could cache these generated statements base on the
> size of the list
 
> It is unlikely that this can be solved at the driver level. Without
> support from the database, the driver would have to manipulate the SQL
> statement.
> And there are few predicates where a list parameter is useful. Expanding
> a list always would lead to very bizarre error messages. Expanding them
> only where useful would require a SQL parser.

Perhaps I'm missing something fundamental here.  I thought the
terms like :arg2 were already being parsed at the Python/driver
interface, to insert the right values from the Python args.

If that was so then it could be solved at the driver level pretty
easily; use the aformentioned "class for this purpose".

It sounds like you're saying that the interface is actually implemented
by passing the execute string and a database-specific dictionary-like
object; the latter created by the DB-API interface.

If so, I now understand the limitation.

Hmmm.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A Faster Way...

2005-05-10 Thread Andrew Dalke
andrea.gavana wrote:
> If I simplify the problem, suppose I have 2 lists like:
> 
> a = range(10)
> b = range(20,30)
> 
> What I would like to have, is a "union" of the 2 list in a single tuple. In
> other words (Python words...):
> 
> c = (0, 20, 1, 21, 2, 22, 3, 23, 4, 24, 5, 25, .

The 'yield' statement is very useful for this sort of thing as
well as the itertools module.  I thought the latter had something
for this already but I don't see it.  Here's an implementation

>>> import itertools
>>> def round_robin(*iterables):
...   iterables = map(iter, iterables)
...   for element in itertools.cycle(iterables):
... yield element.next()
... 
>>> tuple(round_robin(range(10), range(20, 30)))
(0, 20, 1, 21, 2, 22, 3, 23, 4, 24, 5, 25, 6, 26, 7, 27, 8, 28, 9, 29)
>>> 
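For equal-length inputs the same interleaving can also be written without an explicit generator, with zip plus itertools.chain; a minimal sketch:

```python
from itertools import chain

a = range(10)
b = range(20, 30)
# zip pairs the elements up; chain.from_iterable flattens the pairs.
c = tuple(chain.from_iterable(zip(a, b)))
assert c == (0, 20, 1, 21, 2, 22, 3, 23, 4, 24,
             5, 25, 6, 26, 7, 27, 8, 28, 9, 29)
```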

Don't know about the speed though.  Didn't have anything to
compare it to.  You mentioned you do a lot of string concatenation.
Double checking: do you know that in Python it's faster to append
the new string elements to a list and only then do a single
string concatenation of the list elements?

That is, do

terms = []
for x in data:
   s = process_the_element(x)
   terms.append(s)

s = "".join(terms)

rather than

# this is slow if there are many string concatenations
s = ""
for x in data:
  s = s + process_the_element(x)

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: control precision for str(obj) output?

2005-05-03 Thread Andrew Dalke
Mike Meyer wrote:
> Someone want to tell me the procedure for submitting FAQ entries, so I
> can do that for this?

You mean more than what already exists at
  
http://www.python.org/doc/faq/general.html#why-are-floating-point-calculations-so-inaccurate

which has a link to an even more detailed chapter in the tutorial at
  http://docs.python.org/tut/node16.html

?

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: lists in cx_Oracle

2005-05-02 Thread Andrew Dalke
Steve Holden wrote:
> Do you think this is a DB-API 3-ish kind of a thing, or would it layer 
> over DB-API 2 in a relatively platform-independent manner?
... 
> but-you-may-know-better-ly y'rs  - steve

I am a tyro at this.  I had to find some tutorials on SQL
to learn there even was an IN clause for the WHERE statement.
All told I've had about 1 hour experience using DB-API 2.

I thought this would be a common enough need that others
would have chimed in by now saying "oh yes, you just need
to XYZ" where XYZ is something cleaner than "make a new
string to execute".

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: lists in cx_Oracle

2005-05-02 Thread Andrew Dalke
infidel wrote:
> I think perhaps you are asking for something that the OCI doesn't
> provide.

But it doesn't need to be supported by the OCI.

> And really, it all boils down to the list comprehension:
> 
> in_clause = ', '.join([':id%d' % x for x in xrange(len(ids))])

And why can't the equivalent to that be supported in the
DB-API interface, so I can pass in a list/tuple and have
it just work?

> ... elegance is certainly subjective, and the above statement isn't the
> cleanest ever, but it solves your main problem while avoiding the other
> problem you mentiong (sql injection).  Seems "elegant enough" to me.

The problem I mentioned is supporting inexperienced developers
(scientists writing software without local programming support)
who, in my experience, don't know about this pitfall and are
more likely to use a close but wrong solution than this correct
one.  repr(ids) is after all much easier to write.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: lists in cx_Oracle

2005-05-02 Thread Andrew Dalke
infidel wrote:

> Something like this might work for you:
> 
> >>> ids = ['D102', 'D103', 'D107', 'D108']
> >>> in_clause = ', '.join([':id%d' % x for x in xrange(len(ids))])
> >>> sql = "select * from tablename where id in (%s)" % in_clause
> >>> import cx_Oracle as ora
> >>> con = ora.connect('foo/[EMAIL PROTECTED]')
> >>> cur = con.cursor()
> >>> cur.execute(sql, ids)

That's pretty much what I did but it seems inelegant.
I would rather do

  ids = ['D102', 'D103', 'D107', 'D108']
   .. connect and set up the cursor ..
  cursor.execute("select * from tablename where id in :ids", ids)

and if 'ids' is seen to be a list or tuple then it does
the appropriate conversion.  (I'm also fine with having
to use ()s in the SQL query, as in "id in (:ids)".)

The lack of a simple way to do this is error prone. I've seen
people do

  cursor.execute("select * from tablename where id in (%s)" % repr(ids))

because the repr of a string is close enough that it works
for expected string values.  But it opens up the possibility
of SQL injection problems.
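The safe per-element-placeholder idiom fits in one small helper; a sketch against sqlite3 (which uses '?' placeholders where cx_Oracle uses ':name' -- the table and column names here are illustrative, and only the values go through the driver's quoting):

```python
import sqlite3

def select_in(cursor, table, column, values):
    # One placeholder per value; only trusted identifiers enter the SQL text.
    marks = ", ".join("?" for _ in values)
    sql = "SELECT %s FROM %s WHERE %s IN (%s)" % (column, table, column, marks)
    return cursor.execute(sql, list(values)).fetchall()

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tablename (id TEXT)")
cur.executemany("INSERT INTO tablename VALUES (?)",
                [("D102",), ("D103",), ("D999",)])
rows = select_in(cur, "tablename", "id", ["D102", "D103", "D107"])
assert sorted(rows) == [("D102",), ("D103",)]
```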

Andrew
[EMAIL PROTECTED]
 
-- 
http://mail.python.org/mailman/listinfo/python-list

