Re: multiprocessing and freezing on Windows

2009-07-05 Thread Gabriel Genellina

On Sat, 04 Jul 2009 22:15:43 -0300, SK wrote:


To add a bit more information, I found that I needed to patch
get_command_line in multiprocessing/forking.py [...]

Is packaging with multiprocessing supposed to be this hard? If so,
some documentation is needed.


Shouldn't be so hard, I presume. You may want to post your comments and  
upload the patch to http://bugs.python.org
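
SK's patch isn't shown in the thread, so it isn't reproduced here, and whether the following avoids the need for it is not established. For reference, the hook the multiprocessing documentation provides for frozen Windows executables is multiprocessing.freeze_support(), called first thing under the main-module guard. A minimal sketch (the square worker is hypothetical):

```python
import multiprocessing


def square(x):
    """Hypothetical worker; defined at module level so child processes can import it."""
    return x * x


def main():
    # Pool work must only be started from the main process, under the guard below.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Has no effect unless the program is a frozen Windows executable,
    # where it lets the child processes start up correctly.
    multiprocessing.freeze_support()
    print(main())
```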


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Method to separate unit-test methods and data?

2009-07-05 Thread Gabriel Genellina
On Sun, 05 Jul 2009 15:48:06 -0300, Nick Daly wrote:



[test_Midpoint_mid]
none_values = ((-1, None),
              (None, -12.8))

What I haven't yet figured out, though, is how to properly override
the default class member values with values from the config file.  The
config file's data is loaded as a string instead of as a list, as I'd
want.  This causes all the tests to fail: none_values needs to be
interpreted as a list, but it is instead read as the string:

" ((-1, None),\n               (None, -12.8))"

Does anyone have any solutions for these problems?  


You could use a .py file for the configuration instead of your .cfg file.
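
A minimal sketch of the .py approach, with hypothetical names: the data lives in a module such as midpoint_data.py, and the values come back as real Python objects rather than strings. Normally the tests would simply do import midpoint_data; runpy.run_path is used here only so the example can write and load its own file.

```python
import os
import runpy
import tempfile

# Contents of a hypothetical data module, e.g. midpoint_data.py.
DATA_SOURCE = """\
none_values = ((-1, None),
               (None, -12.8))
"""


def load_py_config(path):
    """Execute a .py data file and return its public top-level names as a dict."""
    namespace = runpy.run_path(path)
    return {name: value for name, value in namespace.items()
            if not name.startswith("__")}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "midpoint_data.py")
    with open(path, "w") as f:
        f.write(DATA_SOURCE)
    config = load_py_config(path)

print(config["none_values"])  # a tuple of tuples, not a string
```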


First, is there a
known and simple way to separate unit-test data and methods into
separate files?  


Just write them in separate files? I didn't quite understand the  
question...



Secondly, if not, is there a simple way to convert
strings into other base types, like lists, dictionaries, and so forth?


eval(), but I'd just use a .py file.
People usually warn against using eval on arbitrary strings, or on
user-supplied data, but in this case it is no worse than
importing/executing the module.
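
If the .cfg file has to stay, a safer standard-library alternative to eval() (not one mentioned in the thread) is ast.literal_eval, which accepts only Python literals such as tuples, lists, numbers and None, and rejects anything else. A sketch using configparser and the section from the question:

```python
import ast
import configparser

CONFIG_TEXT = """\
[test_Midpoint_mid]
none_values = ((-1, None),
              (None, -12.8))
"""


def load_test_values(config_text, section):
    """Return {option: Python value} for one section, parsing each value as a literal."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # strip() guards against leading whitespace, which older Pythons reject here.
    return {name: ast.literal_eval(raw.strip())
            for name, raw in parser.items(section)}


values = load_test_values(CONFIG_TEXT, "test_Midpoint_mid")
print(values["none_values"])
```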



Or, am I going to have to write my own list-parsing methods?  Would
pickling help?  I haven't yet had a chance to look into if or how that
would work...  If there's anything else I can clarify about this
request, feel free to let me know.


Another alternative would be a .csv file; I use them when the test data  
comes from other parties, like a manufacturer's data sheet.
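
A sketch of the .csv route, with a hypothetical two-column data sheet. Note that the csv module hands back strings, so converting them to numbers is the loader's job (the same issue as with the .cfg file):

```python
import csv
import io

# Hypothetical excerpt of a data sheet: input value and expected result.
SAMPLE_CSV = """\
x,expected
0,0.0
1.5,3.0
-2,-4.0
"""


def load_cases(stream):
    """Return a list of (x, expected) float pairs from CSV with a header row."""
    reader = csv.DictReader(stream)
    return [(float(row["x"]), float(row["expected"])) for row in reader]


# With a real (hypothetical) file: load_cases(open("datasheet.csv", newline=""))
cases = load_cases(io.StringIO(SAMPLE_CSV))
print(cases)
```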


I've never used a pickle to store test data; how do you generate the
pickle contents in the first place? Usually I'd just use *that* code
directly, but I think pickling the resulting objects would be OK if the
process to regenerate them takes a lot of time.
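
If the objects really are expensive to regenerate, caching them with pickle takes only a few lines. A sketch; the file name is hypothetical, and pickle files should only be loaded from trusted sources, since unpickling can run arbitrary code:

```python
import os
import pickle
import tempfile


def save_cached_cases(path, cases):
    """Write test data to disk once, e.g. after an expensive computation."""
    with open(path, "wb") as f:
        pickle.dump(cases, f)


def load_cached_cases(path):
    """Read the cached test data back as the original Python objects."""
    with open(path, "rb") as f:
        return pickle.load(f)


cases = ((-1, None), (None, -12.8))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "midpoint_cases.pickle")
    save_cached_cases(path, cases)
    restored = load_cached_cases(path)
```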


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: finding most common elements between thousands of multiple arrays.

2009-07-05 Thread Peter Otten
Scott David Daniels wrote:

> Scott David Daniels wrote:

>  t = timeit.Timer('sum(part[:-1]==part[1:])',
>   'from __main__ import part')

What happens if you calculate the sum in numpy? Try

t = timeit.Timer('(part[:-1]==part[1:]).sum()',
 'from __main__ import part')
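
A self-contained version of this comparison, assuming a recent NumPy (default_rng, integers); np.count_nonzero is an extra candidate not mentioned in the thread. Current NumPy also requires a boolean mask to have the same length as the array it indexes, hence part[:-1][mask]. Timings vary by machine, so none are quoted:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Sorted random integers, sampled every 100th element, as in the thread.
data = np.sort(rng.integers(0, 100, size=1_000_000))
part = data[::100]
mask = part[:-1] == part[1:]

candidates = {
    "builtin sum": lambda: sum(mask),            # iterates element by element
    "ndarray.sum": lambda: mask.sum(),           # stays inside numpy
    "count_nonzero": lambda: np.count_nonzero(mask),
    "len(part[:-1][mask])": lambda: len(part[:-1][mask]),
}

results = {name: int(f()) for name, f in candidates.items()}
assert len(set(results.values())) == 1, results  # all methods agree

for name, f in candidates.items():
    seconds = min(timeit.repeat(f, number=20, repeat=3))
    print("%-22s %.6f s per 20 calls" % (name, seconds))
```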


Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A Bug By Any Other Name ...

2009-07-05 Thread Gary Herron

Gabriel Genellina wrote:
On Mon, 06 Jul 2009 00:28:43 -0300, Steven D'Aprano wrote:

On Mon, 06 Jul 2009 14:32:46 +1200, Lawrence D'Oliveiro wrote:


I wonder how many people have been tripped up by the fact that

++n

and

--n

fail silently for numeric-valued n.


What do you mean, "fail silently"? They do exactly what you should 
expect:

++5  # positive of a positive number is positive


I'm not sure what "bug" you're seeing. Perhaps it's your expectations
that are buggy, not Python.


Well, those expectations are taken seriously when new features are 
introduced into the language - and sometimes the feature is dismissed 
just because it would be confusing for some.
If a += 1 works, expecting ++a to have the same meaning is very 
reasonable (for those coming from languages with a ++ operator, like C 
or Java), all the more so when ++a is a perfectly valid expression.
If this issue isn't listed under the various "Python gotchas" 
articles, it should be.


Well sure, it's not unreasonable to expect ++n and --n to behave as in 
other languages, and since they don't, perhaps they should be listed as 
a "Python gotcha". 

But even so, it's quite arrogant of the OP to flaunt his ignorance of 
the language by claiming this is a bug and a failure.  It shouldn't have 
been all that hard for him to figure out what was really happening.


Gary Herron

--
http://mail.python.org/mailman/listinfo/python-list


Re: Python and webcam capture delay?

2009-07-05 Thread jack catcher (nick)

Tim Roberts wrote:

"jack catcher (nick)"  wrote:
I'm thinking of using Python for capturing and showing live webcam 
stream simultaneously between two computers via local area network. 
Operating system is Windows. I'm going to begin with the VideoCapture 
extension; no ideas about other implementations yet. Do you have any 
suggestions on how short a delay I should hope to achieve in showing the 
video? This would be part of a psychological experiment, so I would need 
to deliver the video stream with a reasonable delay (say, below 100ms).


You need to do the math on this.  Remember that a full 640x480 RGB stream
at 30 frames per second runs 28 megabytes per second.  That's more than
twice what a 100 megabit network can pump.

You can probably use Python to oversee this, but you might want to consider
using lower-level code to control the actual hardware.  If you are
targeting Windows, for example, you could write a DirectShow graph to pump
into a renderer that transmits out to a network, then another graph to
receive from the network and display it.

You can manage the network latency by adding delays in the local graph.


Thanks Tim, you're correct about the math. What is your main point about 
DirectShow: that it is generally faster and more reliable than doing the 
job high-level, or that one could use coding/decoding in DirectShow to 
speed up the transmission? I think the latter would be a great idea if 
the latency were tolerable. On the other hand, I'd like to keep things 
simple and do all the programming in Python. I've got no experience with 
DirectShow, but I guess the filters need to be programmed in C++ and 
called from Python?


Another option might be to use a lower resolution, 320x...@15fps.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Help with Sockets.

2009-07-05 Thread Gabriel Genellina
On Sun, 05 Jul 2009 23:06:30 -0300, tanner barnes wrote:


I am writing a program, and in one section there is going to be a lobby
with (for testing purposes) about 4 people in it. In the lobby there are
two TextCtrls: the first for entering your message and the second for
displaying the messages you and the other people in the lobby type. I am
trying to figure out how to get all the text entries from the users and
display them in the second one. For clarification, an example of what I
want would be something like Yahoo or Windows Live Messenger, but for more
than 2 people.


Like a chat room, IRC?
It's easy to do using a client-server architecture.
Make all the clients connect to a central server. Any time someone writes  
some text, the client sends it to the server (but does not display it).  
The server just receives text from any client, and sends the received  
messages to all connected clients (including the one that sent it  
originally).
Any book on socket programming will help; there are a few specific to
Python.
You may start with the "echo" example in the Python documentation. Make  
the networking part work first, then add the wx GUI if you want.
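
A minimal sketch of the relay server described above, using Python 3's socketserver module (it is called SocketServer in the poster's Python 2.6). Treating each newline-terminated line as one message is an assumption, and the wx client side is left out:

```python
import socketserver
import threading


class ChatServer(socketserver.ThreadingTCPServer):
    """Relay every line received from any client to all connected clients."""

    allow_reuse_address = True
    daemon_threads = True  # handler threads don't block interpreter exit

    def __init__(self, address):
        super().__init__(address, ChatHandler)
        self.clients = set()
        self.lock = threading.Lock()

    def broadcast(self, data):
        # Hold the lock while sending so messages from different clients
        # are not interleaved on the wire.
        with self.lock:
            for conn in list(self.clients):
                try:
                    conn.sendall(data)
                except OSError:
                    self.clients.discard(conn)  # peer went away


class ChatHandler(socketserver.StreamRequestHandler):
    def handle(self):
        with self.server.lock:
            self.server.clients.add(self.request)
        try:
            for line in self.rfile:          # one message per line (assumption)
                self.server.broadcast(line)  # the sender gets its own text back too
        finally:
            with self.server.lock:
                self.server.clients.discard(self.request)
```

To try it, run ChatServer(("", 5000)).serve_forever() (port 5000 is arbitrary) and connect several clients. In a wx client, the blocking socket reads belong in a separate thread, with received text handed to the GUI through wx.CallAfter.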


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Does cProfile include IO wait time?

2009-07-05 Thread Gabriel Genellina
On Sat, 04 Jul 2009 21:03:38 -0300, Matthew Wilson wrote:


I expected to see a bunch of my IO file-reading code in there, but I
don't.  So this makes me think that the profiler uses CPU time, not
clock-on-the-wall time.
I'm not an expert on python profiling, and the docs seem sparse.  Can I
rule out IO as the bottleneck here?  How do I see the IO consequences?


I don't know either - but it's easy to check.
Write a program that just reads a lot from /dev/zero, and look at its  
profile. You should be able to tell whether I/O time is included or not.
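
A sketch of that experiment (Unix-only, because of /dev/zero). It also profiles a time.sleep call as a stand-in for waiting on a slow device: if the time cProfile reports for just_wait is close to its wall-clock time, waiting is being counted. Passing a CPU-time clock such as time.process_time as the timer argument of cProfile.Profile gives a CPU-only view for comparison. The helper names are hypothetical:

```python
import cProfile
import pstats
import time


def read_zeros(total=20 * 1024 * 1024, chunk=64 * 1024):
    """Read `total` bytes from /dev/zero (Unix-only) in `chunk`-sized reads."""
    done = 0
    with open("/dev/zero", "rb") as f:
        while done < total:
            done += len(f.read(min(chunk, total - done)))
    return done


def just_wait(seconds=0.5):
    """Block without using the CPU, a stand-in for waiting on a slow device."""
    time.sleep(seconds)


def profiled_vs_wall(func, timer=None):
    """Return (cumulative time cProfile reports for func, measured wall time)."""
    prof = cProfile.Profile(timer) if timer else cProfile.Profile()
    start = time.perf_counter()
    prof.runcall(func)
    wall = time.perf_counter() - start
    # Stats keys are (file, line, function name); values end in (..., cumtime, callers).
    for (_, _, name), (_, _, _, cumtime, _) in pstats.Stats(prof).stats.items():
        if name == func.__name__:
            return cumtime, wall
    raise LookupError(func.__name__)


for func in (read_zeros, just_wait):
    profiled, wall = profiled_vs_wall(func)
    print("%-10s profiled %.3f s, wall clock %.3f s" % (func.__name__, profiled, wall))
```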


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a lot of class instances?

2009-07-05 Thread kk
Steven,

Before your post I was contemplating the merits of using
globals(). After reading your post I am totally convinced that your
suggestion, which previous posters also made, is the way to
go. At first I thought it would be limiting not to have the
instance names properly set up, but now I understand it better.

Thank you all again.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: mail

2009-07-05 Thread Banibrata Dutta
On Sat, Jul 4, 2009 at 9:51 PM,  wrote:

> Hi,
>
> I want to know whether it is possible, using Python programming, to
> extract chemical shift information about some amino acids of some protein
> from BMRB (BioMagResBank) or Ref-DB (referenced databank).
>
> Thanks,
> Amrita Kumari
> Research Fellow
> IISER Mohali
> Chandigarh
> INDIA
>

Without any real knowledge of the problem domain, but with a keyword-level
Google search, here's what one finds:
http://www.ccpn.ac.uk/api-documentation/ccpnmr/ccpnmr2.0/python/doc/api.html

http://biopython.org/wiki/Main_Page
http://chempython.org/
http://www.sschwarzer.net/

Also as Grant says, if you can create a software program in any language,
you can potentially do it with Python as well. As Python is a "batteries
included" language, chances are the included batteries would make your life
easier. Of course, your mileage may vary. If you find existing modules that do
much of what you want, you have a very good starting point.
-- 
regards,
Banibrata
http://www.linkedin.com/in/bdutta
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A Bug By Any Other Name ...

2009-07-05 Thread Gabriel Genellina
On Mon, 06 Jul 2009 00:28:43 -0300, Steven D'Aprano wrote:

On Mon, 06 Jul 2009 14:32:46 +1200, Lawrence D'Oliveiro wrote:


I wonder how many people have been tripped up by the fact that

++n

and

--n

fail silently for numeric-valued n.


What do you mean, "fail silently"? They do exactly what you should  
expect:

++5  # positive of a positive number is positive


I'm not sure what "bug" you're seeing. Perhaps it's your expectations
that are buggy, not Python.


Well, those expectations are taken seriously when new features are  
introduced into the language - and sometimes the feature is dismissed just  
because it would be confusing for some.
If a += 1 works, expecting ++a to have the same meaning is very reasonable
(for those coming from languages with a ++ operator, like C or Java), all
the more so when ++a is a perfectly valid expression.
If this issue isn't listed under the various "Python gotchas" articles, it
should be.


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Clarity vs. code reuse/generality

2009-07-05 Thread David Smith
kj wrote:
> In <7x4otsux7f@ruckus.brouhaha.com> Paul Rubin 
>  writes:
> 
>> kj  writes:
>>> sense = cmp(func(hi), func(lo))
>>> assert sense != 0, "func is not strictly monotonic in [lo, hi]"
> 
>> bisection search usually just requires the function to be continuous
>> and to have its value cross the target somewhere between the endpoints,
>> not be monotonic.
> 
> Try the algorithm I posted with lo = -pi/4, hi = 2*pi, func = cos,
> target = -1, and see what you get...
> 
>>> I regard the very special case of func(hi)==func(lo)==target as
>>> pathological (analogous to the fact that a stopped watch is "exactly
>>> right" twice a day), and not one I care to support.
> 
>> I do think you should support that case, under the "do 'nothing'
>> gracefully" principle.
> 
> You keep missing the point that this is an *internal* *helper*
> *convenience* function, meant to abstract away common logic from
> a handful of places and thus eliminate some code repetition within
> a module.  It is *not* a library function intended to be called
> from elsewhere.  So talk of "supporting" anything is beside the
> point.  Any internal use of this function that applies it to a
> non-strictly-monotonic function is, by assumption, an error.
> 
> kj

First, let me say *I got the point*.  I use asserts, but only in unit
testing where I want to test the result of some action for correctness.
In the course of programming production code, I personally don't think
they should ever be used exactly for the reasons everyone else is
pointing out.  They can be disabled with the -O option and that changes
the program's behavior in ways that could break in production.

If you insist on teaching the assert statement, teach it in the context
of writing unit-testing code.  It's an extremely valuable skill.
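
To illustrate the -O point in the setting of this thread, a sketch of a bisection helper (hypothetical, not kj's posted code) that checks its precondition with an explicit exception, which python -O cannot strip, and that needs only continuity plus a sign change of func(x) - target, as Paul Rubin noted, rather than monotonicity:

```python
import math


def bisect_root(func, lo, hi, target=0.0, tol=1e-12, max_iter=200):
    """Return x in [lo, hi] with func(x) approximately equal to target.

    Needs func to be continuous and func(x) - target to change sign
    between lo and hi; monotonicity is not required.
    """
    f_lo = func(lo) - target
    f_hi = func(hi) - target
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    # An explicit check still runs under `python -O`; an assert would not.
    if (f_lo > 0) == (f_hi > 0):
        raise ValueError("func(x) - target has the same sign at lo and hi")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = func(mid) - target
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in [mid, hi]
        else:
            hi = mid                # root lies in [lo, mid]
    return (lo + hi) / 2


print(bisect_root(math.cos, 0.0, 2.0))  # close to pi/2
```

With kj's example (cos on [-pi/4, 2*pi], target -1), this version raises ValueError: cos(x) + 1 is positive at both ends even though it touches zero at pi, so there is no sign change to bracket.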

--David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a lot of class instances?

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 19:27:25 -0700, kk wrote:

> Hi
> 
> Thank you so much for wonderful tips and suggestions.
> 
> I also found a solution to dynamic naming of the instances (I think). It
> does not sound like a very secure method but since my application will
> be just processing data one way I think it might be alright. I will
> compare to the list and dictionary methods.
> 
> globals()["Some_Instance_Name"]


You're fighting the computer instead of working with it. That's the Wrong 
Way to solve the problem -- you're doing more work than needed, for 
little or no benefit.

My bet is, you have code that looks something like this:


for i in range(N):  # N comes from somewhere else
    # Create a new variable
    globals()["Some_Instance_Name%s" % i] = instance()

# Do something with the variables
for i in range(N):
    # Look up the variable
    x = globals()["Some_Instance_Name%s" % i]
    process(x)


Am I close?

That's the Wrong Way to do it -- you're using a screwdriver to hammer a 
nail. The right way to work with an unknown number of data elements is to 
put them in a list, and process each element in the list, not to try 
giving them all unique names. The only reason for using named variables 
is so you can use the name in source code:

my_value = Some_Instance87 + Some_Instance126

But you can't do that, because you don't know how many instances there 
are, you don't know whether to write Some_Instance87 or Some_Instance125 
or Some_Instance19.


So instead, do something like this:


instances = []
for i in range(N):
    # Create a new instance and store it for later
    instances.append( instance() )

# Later on:
for x in instances:
    process(x)
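
When the names really do matter, for instance because they come from the input data, a dictionary (one of the alternatives kk mentioned) gives lookup by name without touching globals(). A sketch with a hypothetical Instance class standing in for the real one:

```python
class Instance:
    """Hypothetical stand-in for the class being instantiated."""

    def __init__(self, name):
        self.name = name


def make_instances(names):
    # One dict entry per name, instead of one global variable per name.
    return {name: Instance(name) for name in names}


instances = make_instances(["Some_Instance_Name%s" % i for i in range(3)])
for name, obj in sorted(instances.items()):
    print(name, obj.name)
```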




-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Adding the Copy Property to a Simple Histogram

2009-07-05 Thread W. eWatson

Simon Forman wrote:

On Jul 5, 9:48 pm, "W. eWatson"  wrote:

The code below produces a text window 60hx20w with a scroll bar. The
contents are something of a histogram of values from 0 to 255. If one tries
to copy the contents, Windows doesn't allow it. What needs to be done to
allow a copy and paste?

    def ShowHistogram(self):
        if not self.current_image:
            return

        if self.histogram:
            self.histogram.destroy()

        t = Toplevel( self.master )
        t.title("Histogram")
        t.bind( '', self.DestroyHistogram )
        text = Text( t, height=60, width=20 )
        scroll = Scrollbar(t, command=text.yview)
        text.configure(yscrollcommand=scroll.set)
        text.pack(side=LEFT, fill='both', expand=1)
        scroll.pack(side=RIGHT, fill=Y)
        self.histogram = t

        hist = self.current_image.histogram()
        for i in range(len(hist)):
            msg = "%5d %6d\n" % (i,hist[i])
            text.insert( END, msg )


Do you mean that the Text widget doesn't let you copy-and-paste
its contents using selection and ?  That shouldn't have
anything to do with the contents of the Text widget.

Whoops, I missed it. I'm not quite sure what I did to make it seem like it 
doesn't copy, but it does. I fooled myself. All is well.


--
   W. eWatson

 (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time)
  Obz Site:  39° 15' 7" N, 121° 2' 32" W, 2700 feet

Web Page: 

--
http://mail.python.org/mailman/listinfo/python-list


Re: finding most common elements between thousands of multiple arrays.

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 17:30:58 -0700, Scott David Daniels wrote:

> Summary: when dealing with numpy, (or any bulk <-> individual values
> transitions), try several ways that you think are equivalent and
> _measure_.

This advice is *much* more general than numpy -- it applies to any 
optimization exercise. People's intuitions about what's fast and what's 
slow are often very wrong.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A Bug By Any Other Name ...

2009-07-05 Thread Steven D'Aprano
On Mon, 06 Jul 2009 14:32:46 +1200, Lawrence D'Oliveiro wrote:

> I wonder how many people have been tripped up by the fact that
> 
> ++n
> 
> and
> 
> --n
> 
> fail silently for numeric-valued n.

What do you mean, "fail silently"? They do exactly what you should expect:


>>> ++5  # positive of a positive number is positive
5
>>> --5  # negative of a negative number is positive
5
>>> -+5  # negative of a positive number is negative
-5

So does the bitwise-not unary operator:

>>> ~~5
5
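
One way to see what the parser does with these, using the standard ast module: ++n is two nested unary-plus operations applied to n, so n itself is only read, never rebound. A small sketch (unary_chain is a hypothetical helper):

```python
import ast


def unary_chain(expr):
    """List the unary operators in an expression like '++n', outermost first."""
    node = ast.parse(expr, mode="eval").body
    ops = []
    while isinstance(node, ast.UnaryOp):
        ops.append(type(node.op).__name__)
        node = node.operand
    return ops


n = 5
print(unary_chain("++n"), ++n)   # n is only read, never rebound
print(unary_chain("--n"), --n)
n += 1                           # the increment Python actually provides
print(n)
```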


I'm not sure what "bug" you're seeing. Perhaps it's your expectations 
that are buggy, not Python.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Adding the Copy Property to a Simple Histogram

2009-07-05 Thread Simon Forman
On Jul 5, 9:48 pm, "W. eWatson"  wrote:
> The code below produces a text window 60hx20w with a scroll bar. The
> contents are something of a histogram of values from 0 to 255. If one tries
> to copy the contents, Windows doesn't allow it. What needs to be done to
> allow a copy and paste?
>
>      def ShowHistogram(self):
>          if not self.current_image:
>              return
>
>          if self.histogram:
>              self.histogram.destroy()
>
>          t = Toplevel( self.master )
>          t.title("Histogram")
>          t.bind( '', self.DestroyHistogram )
>          text = Text( t, height=60, width=20 )
>          scroll = Scrollbar(t, command=text.yview)
>          text.configure(yscrollcommand=scroll.set)
>          text.pack(side=LEFT, fill='both', expand=1)
>          scroll.pack(side=RIGHT, fill=Y)
>          self.histogram = t
>
>          hist = self.current_image.histogram()
>          for i in range(len(hist)):
>              msg = "%5d %6d\n" % (i,hist[i])
>              text.insert( END, msg )

Do you mean that the Text widget doesn't let you copy-and-paste
its contents using selection and ?  That shouldn't have
anything to do with the contents of the Text widget.
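
Selecting text and copying already works in a Tkinter Text widget (the original poster later confirmed this in the thread). If a one-click "copy all" is still wanted, here is a sketch: format_histogram builds the same lines as the loop in ShowHistogram, and copy_all_to_clipboard uses the Text widget's get method plus the clipboard_clear/clipboard_append methods that Tkinter widgets inherit. Both function names are hypothetical, and wiring the second one to a Button is left out:

```python
def format_histogram(hist):
    """Return the text ShowHistogram inserts: one 'bin count' line per bin."""
    return "".join("%5d %6d\n" % (i, count) for i, count in enumerate(hist))


def copy_all_to_clipboard(text_widget):
    """Put the entire contents of a Tkinter Text widget on the clipboard."""
    # "end-1c" leaves out the newline Tk always keeps at the end of a Text widget.
    contents = text_widget.get("1.0", "end-1c")
    text_widget.clipboard_clear()
    text_widget.clipboard_append(contents)


# In ShowHistogram, the insert loop could become:
#     text.insert(END, format_histogram(self.current_image.histogram()))
print(format_histogram([3, 0, 12]), end="")
```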
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A Bug By Any Other Name ...

2009-07-05 Thread Chris Rebert
On Sun, Jul 5, 2009 at 7:32 PM, Lawrence
D'Oliveiro wrote:
> I wonder how many people have been tripped up by the fact that
>
>    ++n
>
> and
>
>    --n
>
> fail silently for numeric-valued n.

Given that C-style for-loops are relatively infrequent in Python and
are usually written using range() when they are needed, it's probably
not that prevalent a problem.
I suppose the lexer could be changed to make ++ and -- illegal...
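
Short of changing the lexer, a lint-style check is easy to sketch with the standard tokenize module: there is no ++ or -- token, so ++n arrives as two adjacent '+' tokens. The helper name is hypothetical, and it will also flag legitimate but unusual code such as a--b:

```python
import io
import tokenize


def find_doubled_unary(source):
    """Return (row, col) positions where '++' or '--' appear with no space between."""
    hits = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (prev is not None
                and tok.type == prev.type == tokenize.OP
                and tok.string == prev.string
                and tok.string in ("+", "-")
                and tok.start == prev.end):
            hits.append(prev.start)
        prev = tok
    return hits


print(find_doubled_unary("n = 1\nm = ++n\nk = a - -b\n"))
```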

Cheers,
Chris
-- 
http://blog.rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


A Bug By Any Other Name ...

2009-07-05 Thread Lawrence D'Oliveiro
I wonder how many people have been tripped up by the fact that

++n

and

--n

fail silently for numeric-valued n.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a lot of class instances?

2009-07-05 Thread kk
Hi

Thank you so much for wonderful tips and suggestions.

I also found a solution to dynamic naming of the instances (I think).
It does not sound like a very secure method but since my application
will be just processing data one way I think it might be alright. I
will compare to the list and dictionary methods.

globals()["Some_Instance_Name"]

-- 
http://mail.python.org/mailman/listinfo/python-list


Help with Sockets.

2009-07-05 Thread tanner barnes

I am writing a program, and in one section there is going to be a lobby with
(for testing purposes) about 4 people in it. In the lobby there are two
TextCtrls: the first for entering your message and the second for displaying the
messages you and the other people in the lobby type. I am trying to figure out
how to get all the text entries from the users and display them in the second
one. For clarification, an example of what I want would be something like Yahoo or
Windows Live Messenger, but for more than 2 people.

Python version: 2.6
GUI toolkit: wxPython


--
http://mail.python.org/mailman/listinfo/python-list


Adding the Copy Property to a Simple Histogram

2009-07-05 Thread W. eWatson
The code below produces a text window 60hx20w with a scroll bar. The 
contents are something of a histogram of values from 0 to 255. If one tries 
to copy the contents, Windows doesn't allow it. What needs to be done to 
allow a copy and paste?


    def ShowHistogram(self):
        if not self.current_image:
            return

        if self.histogram:
            self.histogram.destroy()

        t = Toplevel( self.master )
        t.title("Histogram")
        t.bind( '', self.DestroyHistogram )
        text = Text( t, height=60, width=20 )
        scroll = Scrollbar(t, command=text.yview)
        text.configure(yscrollcommand=scroll.set)
        text.pack(side=LEFT, fill='both', expand=1)
        scroll.pack(side=RIGHT, fill=Y)
        self.histogram = t

        hist = self.current_image.histogram()
        for i in range(len(hist)):
            msg = "%5d %6d\n" % (i,hist[i])
            text.insert( END, msg )
--
   W. eWatson

 (121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time)
  Obz Site:  39° 15' 7" N, 121° 2' 32" W, 2700 feet

Web Page: 

--
http://mail.python.org/mailman/listinfo/python-list


Re: PEP 376

2009-07-05 Thread Lawrence D'Oliveiro
In message , Charles 
Yeomans wrote:

> On the contrary, MD5 was intended to be a cryptographic hash function,
> not a checksum.

Just like MD4 and MD2 before it. They have long since been considered 
worthless, and now MD5 has joined them.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python and webcam capture delay?

2009-07-05 Thread Tim Roberts
"jack catcher (nick)"  wrote:
>
>I'm thinking of using Python for capturing and showing live webcam 
>stream simultaneously between two computers via local area network. 
>Operating system is Windows. I'm going to begin with the VideoCapture 
>extension; no ideas about other implementations yet. Do you have any 
>suggestions on how short a delay I should hope to achieve in showing the 
>video? This would be part of a psychological experiment, so I would need 
>to deliver the video stream with a reasonable delay (say, below 100ms).

You need to do the math on this.  Remember that a full 640x480 RGB stream
at 30 frames per second runs 28 megabytes per second.  That's more than
twice what a 100 megabit network can pump.
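
The arithmetic behind those figures, as a small sketch assuming 3 bytes per pixel (24-bit RGB), no compression, and no protocol overhead on the 100 Mbit/s link:

```python
def raw_video_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Uncompressed video data rate in bytes per second (24-bit RGB by default)."""
    return width * height * bytes_per_pixel * fps


rate = raw_video_bytes_per_second(640, 480, 30)   # 27,648,000 bytes/s
link = 100e6 / 8                                   # 100 Mbit/s = 12,500,000 bytes/s
print("%.1f MB/s, %.2f times the link capacity" % (rate / 1e6, rate / link))

# At 30 fps a frame arrives every ~33 ms, so a 100 ms budget covers
# only about three frame intervals of capture, transfer, and display.
frame_interval_ms = 1000 / 30
```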

You can probably use Python to oversee this, but you might want to consider
using lower-level code to control the actual hardware.  If you are
targeting Windows, for example, you could write a DirectShow graph to pump
into a renderer that transmits out to a network, then another graph to
receive from the network and display it.

You can manage the network latency by adding delays in the local graph.
-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is my code faster with append() in a loop than with a large list?

2009-07-05 Thread MRAB

Xavier Ho wrote:
(Here's a short version of the long version below if you don't want to 
read:)


Why is version B of the code faster than version A? (Only three lines 
different)


Version A: http://pastebin.com/f14561243
Version B: http://pastebin.com/f1f657afc



I was doing the problems on Project Euler for practice with Python last 
night. Problem 12 was to find the value of the first triangular number 
that has over 500 divisors.

=

The sequence of triangle numbers is generated by adding the natural 
numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 
+ 7 = 28. The first ten terms would be:


1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

Let us list the factors of the first seven triangle numbers:

 1: 1
 3: 1,3
 6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred 
divisors?


=

My initial code looped from 1 to half the number, checking which were
divisors and storing them in a list as I found them. That would have
taken days.


My second try was factorising the number each time and counting the
divisors by adding 1 to the exponent of each prime factor and multiplying
the results together.

The code is here (Version A): http://pastebin.com/f14561243

This worked, but it took overnight to compute. Before I went to bed a
friend of mine caught me online and apparently left me a working
version that runs in under 8 seconds, with only a three-line difference.

The code is here (Version B): http://pastebin.com/f1f657afc

That was amazing. But I have no idea why his edit makes it so much
faster. I did a test to see whether append() was faster (which I
doubted) than defining a list with a large size to begin with, and I was
right:

http://pastebin.com/f4b46d0db
It shows that appending is 40x slower, as expected. But I still
can't puzzle out why his use of appending in Version B was so much
faster than mine.


Any insights would be welcome. I'm going on a family trip, though, so my
replies may be delayed.



In your version you're creating a list of (num + 1) elements, but in the
other version the list is only as long as the largest factor.

For example, for num=28, your version creates a list 29 elements long,
but the other version creates one only 7 elements long. Also, 'the time
needed to append an item to the list is "amortized constant"' (quoted
from http://effbot.org/zone/python-list.htm).

This means that your little speed test isn't representative of what's
actually happening.
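
To make the point concrete: Version A allocates and zero-fills a list of num + 1 entries for every candidate, and triangle numbers grow roughly with the square of their index, so that cost keeps climbing. The divisor count only needs the exponents of the prime factors: if num = p1**a1 * p2**a2 * ..., it has (a1 + 1) * (a2 + 1) * ... divisors. A sketch of that idea (not the pastebin code, which isn't reproduced here; the function names are hypothetical):

```python
def count_divisors(n):
    """Count the divisors of n >= 1 from its prime factorisation.

    For n = p1**a1 * p2**a2 * ..., the count is (a1 + 1) * (a2 + 1) * ...
    Only a running product is kept, so memory does not grow with n.
    """
    count = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        count *= exponent + 1
        p += 1
    if n > 1:  # whatever is left is a single prime factor with exponent 1
        count *= 2
    return count


def first_triangle_with_more_than(k):
    """Smallest triangle number with more than k divisors."""
    index, triangle = 1, 1
    while count_divisors(triangle) <= k:
        index += 1
        triangle += index
    return triangle


print(first_triangle_with_more_than(5))  # the worked example in the problem statement
```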
--
http://mail.python.org/mailman/listinfo/python-list


Re: Why is my code faster with append() in a loop than with a large list?

2009-07-05 Thread David

Xavier Ho wrote:
(Here's a short version of the long version below if you don't want to 
read:)


Why is version B of the code faster than version A? (Only three lines 
different)


Version A: http://pastebin.com/f14561243
Version B: http://pastebin.com/f1f657afc

I don't know, but here is the diff for someone who may:

1c1,2
< # This one only took 8 seconds on my machine. Wow?
---
>
> # This one took hours to compute, overnight.
28c29
< powers = [0, 0]
---
> powers = [0] * (num + 1)
32c33
< powers[factor-1] += 1
---
> powers[factor] += 1
35d35
< powers.append(0)
55c55
< n += 1
\ No newline at end of file
---
> n += 1


--
Powered by Gentoo GNU/Linux
http://linuxcrazy.com
--
http://mail.python.org/mailman/listinfo/python-list


Re: finding most common elements between thousands of multiple arrays.

2009-07-05 Thread Scott David Daniels

Scott David Daniels wrote:

... Here's a heuristic replacement for my previous frequency code:
I've tried to mark where you could fudge numbers if the run time
is at all close.


Boy, I cannot let go.  I did a bit of a test checking the cost of
calculating the number of discovered samples, and found, after:
import timeit
import numpy
original = numpy.random.randint(0, 100, (1000, 1000))
data = original.flatten()
data.sort()
part = data[::100]
t = timeit.Timer('sum(part[:-1]==part[1:])',
 'from __main__ import part')
v = timeit.Timer('len(part[:-1][part[:-1]==part[1:]])',
 'from __main__ import part')

I got:
>>> t.repeat(3, 10)
[0.58319842326318394, 0.57617574300638807, 0.57831819407238072]
>>> v.repeat(3, 1000)
[0.93933027801040225, 0.93704535073584339, 0.94096260837613954]

So, len(part[mask]) is about 60X faster per call (roughly 0.058 s vs. 0.00094 s)!  I checked:
>>> sum(part[:-1]==part[1:])
9393
>>> len(part[part[:-1]==part[1:]])
9393

That's an awful lot of matches, so I tried again with high selectivity:
data = original.flatten()  # no sorting, so runs missing
part = data[::100]

>>> t.repeat(3, 10)
[0.58641335700485797, 0.58458854407490435, 0.58872594142576418]
>>> v.repeat(3, 1000)
[0.27352554584422251, 0.27375686015921019, 0.27433291102624935]

about 200X faster

>>> len(part[part[:-1]==part[1:]])
39
>>> sum(part[:-1]==part[1:])
39

So my new version of this (compressed) code:

...
sampled = data[::stride]
matches = sampled[:-1] == sampled[1:]
candidates = sum(matches) # count identified matches
while candidates > N * 10: # 10 -- heuristic
    stride *= 2 # heuristic increase
    sampled = data[::stride]
    matches = sampled[:-1] == sampled[1:]
    candidates = sum(matches)
while candidates < N * 3: # heuristic slop for long runs
    stride //= 2 # heuristic decrease
    sampled = data[::stride]
    matches = sampled[:-1] == sampled[1:]
    candidates = sum(matches)
former = None
past = 0
for value in sampled[:-1][matches]:
    ...

is:
  ...
  sampled = data[::stride]
  candidates = sampled[:-1][sampled[:-1] == sampled[1:]]
  while len(candidates) > N * 10: # 10 -- heuristic
      stride *= 2 # heuristic increase
      sampled = data[::stride]
      candidates = sampled[:-1][sampled[:-1] == sampled[1:]]
  while len(candidates) < N * 3: # heuristic slop for long runs
      stride //= 2 # heuristic decrease
      sampled = data[::stride]
      candidates = sampled[:-1][sampled[:-1] == sampled[1:]]
  former = None
  past = 0
  for value in candidates:
      ...
This change is important, for we try several strides before
settling on a choice, meaning the optimization can be valuable.
This also means we could be pickier at choosing strides (try
more values), since checking is cheaper than before.
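
In the spirit of trying several equivalent approaches and measuring, an exact count makes a useful baseline for checking and timing the sampled heuristic. The sketch below is not Scott's method: it swaps in numpy.unique with return_counts, which the thread doesn't use, and assumes a reasonably recent NumPy (stable argsort, default_rng):

```python
import numpy as np


def most_common_exact(data, n):
    """Return (values, counts) for the n most frequent values in data."""
    values, counts = np.unique(data, return_counts=True)
    # Stable sort on negated counts: largest first, ties kept in value order.
    order = np.argsort(-counts, kind="stable")[:n]
    return values[order], counts[order]


rng = np.random.default_rng(0)
sample = rng.integers(0, 100, size=(1000, 1000))
values, counts = most_common_exact(sample.ravel(), 5)
print(list(zip(values.tolist(), counts.tolist())))
```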

Summary: when dealing with numpy (or any bulk <-> individual values
transitions), try several ways that you think are equivalent and
_measure_.  In the OODB work I did we called this "impedance mismatch,"
and it is likely some boundary transitions are _much_ faster than
others.  The builtin sum case is one of the slow ones; I am getting numpy
booleans back, rather than Python booleans, so conversions aren't going fastpath.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list


Re: memoization module?

2009-07-05 Thread Daniel Fetchinson
> Is there a memoization module for Python?  I'm looking for something
> like Mark Jason Dominus' handy Memoize module for Perl.

The Python Cookbook has several examples:

http://www.google.com/search?q=python+memoize&sitesearch=code.activestate.com

HTH,
Daniel
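
For comparison with the Cookbook recipes, a minimal dict-based memoizing decorator in the same spirit (the name memoize is just this sketch's), followed by functools.lru_cache, which the standard library provides from Python 3.2 on:

```python
import functools


def memoize(func):
    """Cache results keyed by positional arguments (which must be hashable)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        try:
            return cache[args]
        except KeyError:
            result = cache[args] = func(*args)
            return result

    wrapper.cache = cache  # exposed for inspection or clearing
    return wrapper


@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# Standard-library equivalent (Python 3.2+):
@functools.lru_cache(maxsize=None)
def fib2(n):
    return n if n < 2 else fib2(n - 1) + fib2(n - 2)


print(fib(30), fib2(30))
```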




-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown
-- 
http://mail.python.org/mailman/listinfo/python-list


Why is my code faster with append() in a loop than with a large list?

2009-07-05 Thread Xavier Ho
(Here's a short version of the long version below if you don't want to
read:)

Why is version B of the code faster than version A? (Only three lines
different)

Version A: http://pastebin.com/f14561243
Version B: http://pastebin.com/f1f657afc



I was doing the problems on Project Euler for practice with Python last
night. Problem 12 was to find the value of the first triangular number that
has over 500 divisors.
=

The sequence of triangle numbers is generated by adding the natural numbers.
So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 +
6 + 7 = 28. The first ten terms would be:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

Let us list the factors of the first seven triangle numbers:

 1: 1
 3: 1,3
 6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred
divisors?
=

My initial code looped from 1 to half the number to check which values
were divisors, storing each one in a list as I found it. That would have
taken days.

My second try was to factorise the number each time and count the
divisors by adding 1 to the power of each prime factor and multiplying
the results together.
The code is here (Version A): http://pastebin.com/f14561243
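Since the pastebin code isn't reproduced here, the sketch below is neither Version A nor Version B. It is just one straightforward way to implement the divisor-counting idea described above: if n = p1**a1 * p2**a2 * ..., then n has (a1 + 1) * (a2 + 1) * ... divisors. The function names are made up for the example.

```python
def num_divisors(n):
    """Count the divisors of n (n >= 1) via its prime factorisation."""
    count = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:       # strip out every factor p
            n //= p
            exponent += 1
        count *= exponent + 1   # p**exponent contributes (exponent + 1) choices
        p += 1                  # composites never divide n by now; harmless
    if n > 1:                   # one prime factor above sqrt(original n) remains
        count *= 2
    return count

def first_triangle_with_more_than(k):
    """Return the first triangle number with more than k divisors."""
    n, triangle = 1, 1
    while num_divisors(triangle) <= k:
        n += 1
        triangle += n           # the nth triangle number is 1 + 2 + ... + n
    return triangle
```

`first_triangle_with_more_than(5)` returns 28, matching the worked example in the problem statement; passing 500 asks the actual Problem 12 question.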

This worked, but it took overnight to compute. Before I went to bed, a
friend of mine caught me online and left me a working version that runs
in under 8 seconds and differs from mine by only three lines.
The code is here (Version B): http://pastebin.com/f1f657afc

That was amazing, but I have no idea why his edit makes it so much
faster. I ran a test to see whether append() was faster than defining a
list with a large size to begin with (I doubted it), and I was right:
http://pastebin.com/f4b46d0db
It showed appending to be 40x slower, as expected. But I still can't
puzzle out why his use of appending in Version B was so much faster
than mine.
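To compare the two list-building strategies on your own machine, a sketch like this may help. It is not the code from the pastebin test; `N` and the function names are arbitrary. Both strategies are linear overall (list.append is amortized constant time), so whatever ratio you measure, a gap of "overnight" versus 8 seconds is more likely to come from what the three changed lines compute than from append() itself. That is only a guess, since the two versions aren't reproduced here.

```python
import timeit

N = 100000

def build_with_append():
    result = []
    for i in range(N):
        result.append(i)        # grows the list as needed
    return result

def build_preallocated():
    result = [0] * N            # allocate the full size up front
    for i in range(N):
        result[i] = i
    return result

def compare(repeat=3, number=10):
    """Best-of-`repeat` timings in seconds: (append, preallocated)."""
    t_append = min(timeit.repeat(build_with_append, repeat=repeat, number=number))
    t_prealloc = min(timeit.repeat(build_preallocated, repeat=repeat, number=number))
    return t_append, t_prealloc
```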

Any insights would be welcome. I'm going on a family trip, though, so my
replies may be delayed.

Best regards,

Ching-Yun "Xavier" Ho, Technical Artist

Contact Information
Mobile: (+61) 04 3335 4748
Skype ID: SpaXe85
Email: cont...@xavierho.com
Website: http://xavierho.com/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Wrapping comments

2009-07-05 Thread Lawrence D'Oliveiro
In message , Michael 
Torrie wrote:

> Lawrence D'Oliveiro wrote:
>
>> I tried using Emacs via SSH from a Mac once. Made me run screaming for
>> the nearest Windows box
>> 
.
> 
> Interesting rant, but the problem is with the key bindings they chose to
> use in Terminal.app program.

Except for control-space, which was defined systemwide to bring up 
Spotlight. So you can't blame the Terminal app for that. The brain damage 
was more widespread than just one program.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question of style

2009-07-05 Thread Paul Rubin
Simon Forman  writes:
> BTW, Paul, kind of a tangent: I reimplemented the same algorithm but
> using tuples instead of instances (and empty tuples for "NULL"
> values.)  I was trying to mess around in the space you seemed to
> indicate existed, i.e. a better implementation using other datatypes,
> but I didn't have a clear idea what I was doing and, as I said, I
> started by simply re-implementing with a different datatype.
> 
> Much to my surprise and delight, I discovered the tuple-based BTree
> was /already/ a "persistent data type"!  It was both awesome and a bit
> of an anti-climax. :]

Cool ;-).  It also seems a bit irregular to me to require every tree
to have a node with optional children, rather than allowing trees to be
completely empty.  I think the irregularity complicated the code somewhat.
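A minimal sketch of the "allow completely empty trees" point, assuming a plain binary search tree rather than Simon's actual BTree code (which isn't shown here). With `()` as the empty tree, no node needs optional children, and because tuples are immutable every insert returns a new tree that shares its untouched subtrees with the old one. The function names are illustrative.

```python
# The empty tree is simply (); a non-empty tree is (left, key, right).
EMPTY = ()

def insert(tree, key):
    """Return a new tree that also contains key; `tree` itself is unchanged."""
    if tree == EMPTY:
        return (EMPTY, key, EMPTY)
    left, k, right = tree
    if key < k:
        return (insert(left, key), k, right)    # right subtree is shared
    if key > k:
        return (left, k, insert(right, key))    # left subtree is shared
    return tree                                 # key already present

def contains(tree, key):
    while tree != EMPTY:
        left, k, right = tree
        if key == k:
            return True
        tree = left if key < k else right
    return False

def in_order(tree):
    """Keys in sorted order."""
    if tree == EMPTY:
        return []
    left, k, right = tree
    return in_order(left) + [k] + in_order(right)
```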
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question of style

2009-07-05 Thread Paul Rubin
Steven D'Aprano  writes:
> > but I don't accept that "somethingness"
> > vs. "nothingness" is the same distinction as truth vs falsehood.
> 
> It's the distinction used by Python since the dawn of time. Python only 
> grew a bool type a few versions back.

That's true, part of the situation we have now is an artifact of that
history.

> I'm not talking about the constants True and False (nouns), but about 
> true and false values (adjectives).

But, it seems to me, the constants True and False are the only values
to which the adjectives "true" and "false" should apply.

> 
> > The idea that the "if"
> > statement selects between "somethingness" and "nothingness" rather than
> > between True and False is a bogus re-imagining of the traditional
> > function of an "if" statement 
> 
> There's nothing bogus about it.
> 
> > and has been an endless source of bugs in Python code.
> I wonder why these "endless" bugs weren't important enough to be 
> mentioned in the rationale to PEP 285:

Because adding the bool type doesn't really fix those bugs.

> describing `if x` as the "correct form" and calling scrapping such a
> feature as "crippling the language".

Certainly, changing "if" would have broken an immense amount of code
and been a completely unworkable approach.  We are using a fairly 
mature language by now; it has a culture and history that carries
certain baggage, as one should expect.

> > Look how much confusion it causes here in the newsgroup all the time.
> The only confusion is that you're causing me. Would you care to link to 
> some?

This current discussion (about bools) came from such confusion just a
few posts up in this very thread:

From: upwestdon 
Date: Fri, 3 Jul 2009 23:03:39 -0700 (PDT)
How about just:

if not (self.higher and self.lower):
    return self.higher or self.lower

That test was designed to treat None as a boolean False, without
noticing that numeric 0 is also treated as False and could make the
test do the wrong thing.  This is an extremely common type of error.
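A small illustration of that failure mode, assuming the "children" can be plain numbers. The function names and the "both present" result are made up for the example; the truthiness version mirrors the quoted test, the explicit version checks for None only.

```python
def only_child_truthy(higher, lower):
    """Meant to treat only None as 'missing', but relies on truthiness."""
    if not (higher and lower):
        return higher or lower
    return "both present"

def only_child_explicit(higher, lower):
    """Same intent, testing for None explicitly."""
    if higher is None:
        return lower
    if lower is None:
        return higher
    return "both present"

# With a child value of 0 the truthiness version goes wrong:
print(only_child_truthy(0, None))     # None: the 0 is lost
print(only_child_explicit(0, None))   # 0
print(only_child_truthy(0, 5))        # 5: behaves as if 0 were missing
print(only_child_explicit(0, 5))      # both present
```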

> > could see some value to having a generic "is_empty" predicate
> We have that already. It's spelled __bool__ or __nonzero__

That's fine, but under the "explicit is better than implicit"
principle, it's preferable to call that predicate explicitly: 
"if bool(x): ..." rather than "if x:".  Also, after many years of
fixing bugs caused by the mushing together of None and False, it seems
to me that we'd have been better off if bool(None) raised an exception
rather than returning false.  None is supposed to denote a nonexistent
value, not a false or empty value.  The only valid way to check
whether something is None should have been "if x is None".  However,
it is of course way too late to do this differently.

> Iterators are a special case, because in general they can't tell if 
> they're exhausted until they try to yield a value.

Right, it would be nice if they supported a lookahead slot, though
that would complicate a lot of __iter__ methods.  There have been various
kludgy attempts to wrap them in ways that support this, though.
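One such wrapper, as a sketch: `Peekable` is a hypothetical name, not a standard-library class. It keeps a one-item lookahead slot, so it can answer "is anything left?" by fetching, and holding on to, the next item.

```python
_MISSING = object()  # private sentinel meaning "slot is empty"

class Peekable(object):
    """Wrap any iterable and add a one-item lookahead slot."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._slot = _MISSING

    def __iter__(self):
        return self

    def __next__(self):
        if self._slot is not _MISSING:
            value, self._slot = self._slot, _MISSING
            return value
        return next(self._it)

    next = __next__  # Python 2 spelling of the iterator method

    def peek(self, default=_MISSING):
        """Return the next item without consuming it.

        Raises StopIteration when exhausted, unless a default is given."""
        if self._slot is _MISSING:
            try:
                self._slot = next(self._it)
            except StopIteration:
                if default is _MISSING:
                    raise
                return default
        return self._slot

    def __bool__(self):
        # "Anything left?" is answered by trying to fill the slot.
        try:
            self.peek()
        except StopIteration:
            return False
        return True

    __nonzero__ = __bool__  # Python 2 spelling
```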
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Wrapping comments

2009-07-05 Thread Michael Torrie
Lawrence D'Oliveiro wrote:
> I tried using Emacs via SSH from a Mac once. Made me run screaming for the 
> nearest Windows box 
> .

Interesting rant, but the problem is with the key bindings they chose to
use in the Terminal.app program. Fortunately, in Leopard most of the problems
are now fixed, or can be configured to work in a non-broken fashion.
The rest of us, in the meantime, all switched to iTerm, which, although it
had some performance issues, behaved the way we all expected terminals to
behave.

As far as terminal hell goes, I regularly find when ssh-ing to
remote boxes that backspace doesn't work.  Or it does in bash (because it's
smart enough to play games) but not in vim.  Somehow, over ssh, the key
bindings for backspace sometimes get lost or messed up.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PEP368 and pixeliterators

2009-07-05 Thread Rhodri James
On Fri, 03 Jul 2009 09:21:09 +0100, Steven D'Aprano  
 wrote:



On Thu, 02 Jul 2009 10:32:04 +0200, Joachim Strömbergson wrote:


for pixel in rgb_image:
    # swap red and blue, and set green to 0
    pixel.value = pixel.b, 0, pixel.r


The idea I'm having is that fundamentally the image is made up of a 2D
array of pixels, not rows of pixels.


A 2D array implies rows (and columns) of pixels.


But not necessarily an internal representation in which those rows or
columns are contiguous.  An efficient internal storage format might
well include margins to make transform edge cases easier, or store
the pixel components in separate arrays, or both.  I'd imagine that
quite frequently, the iterator across all pixels will in fact just
be hiding from the programmer the fact that it's really iterating
by row and then by pixel.  At Python's level of abstraction, that's
just fine, but the assumption that an image is made up of a 2D
array of pixels is not safe.
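To make that concrete, here is a sketch of such a layout. `PlanarImage` and `Pixel` are hypothetical names, not anything proposed in PEP 368: each colour component lives in its own padded grid, and iterating over "all pixels" quietly walks row by row underneath while skipping the margin. It supports the red/blue swap loop quoted above.

```python
class PlanarImage(object):
    """Hypothetical image type: one padded 2D grid per colour channel."""

    def __init__(self, width, height, margin=1):
        self.width, self.height, self.margin = width, height, margin
        padded_w = width + 2 * margin
        padded_h = height + 2 * margin
        # Separate storage per component, with a border of `margin` cells
        # around the visible area (handy for edge cases in transforms).
        self.channels = dict(
            (c, [[0] * padded_w for _ in range(padded_h)]) for c in "rgb")

    def __iter__(self):
        # Looks like "iterate over all pixels", but it is really a walk
        # by row and then by pixel that skips the margin.
        m = self.margin
        for y in range(m, m + self.height):
            for x in range(m, m + self.width):
                yield Pixel(self, x, y)


class Pixel(object):
    """A lightweight view onto one position in a PlanarImage."""

    def __init__(self, image, x, y):
        self._image, self._x, self._y = image, x, y

    def _get(self, channel):
        return self._image.channels[channel][self._y][self._x]

    @property
    def r(self):
        return self._get("r")

    @property
    def g(self):
        return self._get("g")

    @property
    def b(self):
        return self._get("b")

    @property
    def value(self):
        return (self.r, self.g, self.b)

    @value.setter
    def value(self, rgb):
        # Write each component into its own channel grid.
        for channel, component in zip("rgb", rgb):
            self._image.channels[channel][self._y][self._x] = component
```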

--
Rhodri James *-* Wildebeest Herder to the Masses
--
http://mail.python.org/mailman/listinfo/python-list


Re: pep 8 constants

2009-07-05 Thread Tim Chase

You can get giant piano keyboards that you step on, so how about a giant
computer keyboard? "I wrote 5 miles of code before lunch!" :-)


You can get/make MIDI organ pedal-boards (a friend of mine has two).  From
there it's just one small step...


Is that a two-step?  a box-step?  Count it off...slow, slow, 
quick, quick.  I think I just coded up Conway's Game of Life...


-tkc



--
http://mail.python.org/mailman/listinfo/python-list


Re: pep 8 constants

2009-07-05 Thread Rhodri James
On Fri, 03 Jul 2009 15:52:31 +0100, MRAB   
wrote:



Eric S. Johansson wrote:

Horace Blegg wrote:
I've been kinda following this. I have a cousin who is permanently
wheelchair-bound and doesn't have perfect control of her hands, but still
manages to use a computer and interact with society. However, the
idea/thought of disabled programmers was new to me/hadn't ever occurred
to me.

You say that using your hands is painful, but what about your feet?
Wouldn't it be possible to rig up some kind of foot pedal for
shift/caps lock? Kinda like the power pedal used with sewing machines,
so the hands are free to hold fabric.

I don't mean this in a condescending manner, and I apologize if you take
it as such. I'm genuinely curious if you think something like this could
work.

The way I was envisioning it working last night (and I haven't the
faintest clue how SR works, nor have I ever used SR) was that you would
hit the foot pedal, which would tell the SR program to capitalize the
first letter of the next word (a smart shift, basically, so you don't
end up doing something like ... WONderland -or- "stocks are up 1,0))%
TOday".)

Possible? Stupid?


It's not stupid.
People have used foot pedals for decades for a variety of controls. I
don't think foot pedals would work for me because when I am dictating,
I pace. Standing, sitting, I pace. With a corded headset, I'm forced to
stay within about 4 feet of the computer. But when I'm using a Bluetooth
headset, I will sometimes ramble as far as 10 or 15 feet from the
computer. It helps if I make the font larger so I can glance over and
see what kind of errors I've gotten.
I really love a Bluetooth headset with speech recognition. It's so
liberating.
Your question about foot pedals makes me think of an alternative. Would
it make sense to have a handheld keyboard which would be used for
command-and-control functionality or as an adjunct to speech
recognition use? It would have to be designed in such a way that it
doesn't aggravate a hand injury, which may not be possible. Anyway,
just thinking out loud.


You can get giant piano keyboards that you step on, so how about a giant
computer keyboard? "I wrote 5 miles of code before lunch!" :-)


You can get/make MIDI organ pedal-boards (a friend of mine has two).  From
there it's just one small step...
:-)

--
Rhodri James *-* Wildebeest Herder to the Masses
--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating alot of class instances?

2009-07-05 Thread Christian Heimes
kk wrote:
> I will be querying some data and create class instances based on the
> data I gather. But the problem as I mentioned is that I do not know
> the names and the number of the end class instances. They will be
> based on the content of the data. So how can I create class instances
> within a loop and when the loop is done how can I figure out the list
> of instances via class membership?  I can track the names by
> introducing another list but I want to understand the class side of
> things.

Do you need an exact number or just a rough statistic? To estimate
the number of instances, you can query the reference count of the
class. Since every instance usually increases the reference count by
one, it provides a good overview. Note that lots of other things, like
imports, increase the reference count, too.

>>> import sys
>>> class Example(object):
...     pass
...
>>> sys.getrefcount(Example)
5
>>> examples = list(Example() for i in range(10))
>>> examples
[<__main__.Example object at 0x7f2e5cd61110>, <__main__.Example object
at 0x7f2e5cd61150>, <__main__.Example object at 0x7f2e5cd61190>,
<__main__.Example object at 0x7f2e5cd611d0>, <__main__.Example object at
0x7f2e5cd61210>, <__main__.Example object at 0x7f2e5cd61250>,
<__main__.Example object at 0x7f2e5cd61390>, <__main__.Example object at
0x7f2e5cd613d0>, <__main__.Example object at 0x7f2e5cd61410>,
<__main__.Example object at 0x7f2e5cd61450>]
>>> sys.getrefcount(Example)
15
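If you need the exact set of live instances rather than an estimate, one option (a sketch, not the only way) is for the class to register each instance in a `weakref.WeakSet` as it is created. `Material` and `instances()` are hypothetical names. Because the set holds only weak references, it doesn't keep otherwise-unused instances alive. `weakref.WeakSet` needs Python 2.7 or later.

```python
import weakref

class Material(object):
    # Class-level registry of live instances; weak references only.
    _registry = weakref.WeakSet()

    def __init__(self, name):
        self.name = name
        Material._registry.add(self)

    @classmethod
    def instances(cls):
        """Return the currently live instances (order is arbitrary)."""
        return list(cls._registry)
```

After `mats = [Material(n) for n in ("a", "b")]`, `Material.instances()` returns those two objects; once the last reference to one of them goes away, it drops out of the registry.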
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating alot of class instances?

2009-07-05 Thread Tim Chase

For example, let's say I have

class MyMaterials:

and my instances might need to look like

material_01
material_02
or
light_01
light_02
or
Mesh_01
Mesh_02 etc


If you do not know how many instances you are going to create from the
beginning, there is no way for you to know which of the instances you mentioned
above will get created. So having names for all of the instances will not help
you since you will never know what names are "safe" to use.

On the other hand, if you have all instances in a list, you can refer to them
by index and you know exactly how many of them you have.

If you would like to get instances by some name you gave to them, maybe
something like this will work:

def get_instance(name):
    for instance in instance_list:
        if instance.name == name:
            return instance
    return None


Another option might be to use a counter in the class that keeps 
track of the number of instances:


  class MyMaterial:
      instances = 0
      def __init__(self, name):
          self.name = name
          self.instance = MyMaterial.instances
          MyMaterial.instances += 1
      def __str__(self):
          return "%s_%02i" % (self.name, self.instance)
      # repr() of a list calls each item's __repr__, so reuse __str__
      __repr__ = __str__

  m = MyMaterial("Brick")
  print m
  # "Brick_00"
  print repr([MyMaterial("Stone") for _ in range(5)])
  # "[Stone_01, Stone_02, Stone_03, Stone_04, Stone_05]"

It's not thread-safe, but it may do the trick.
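Since kk wants a separate sequence for each base name (material_01, light_01, Mesh_01, ...), one variation, sketched here, keeps one counter per base name. The class layout and the sample `some_list` are hypothetical; a lock is added because the shared counters are the part that isn't thread-safe.

```python
import itertools
import threading
from collections import defaultdict

class MyMaterial(object):
    # One independent counter per base name, each starting at 1.
    _counters = defaultdict(lambda: itertools.count(1))
    _lock = threading.Lock()

    def __init__(self, base_name):
        # Guard the shared counters so two threads can't draw the same number.
        with MyMaterial._lock:
            number = next(MyMaterial._counters[base_name])
        self.name = "%s_%02d" % (base_name, number)

    def __repr__(self):
        return self.name

# Hypothetical base names, e.g. read from the queried data.
some_list = ["material", "material", "light", "Mesh", "light"]
instances = [MyMaterial(base) for base in some_list]
print(instances)
```

This prints `[material_01, material_02, light_01, Mesh_01, light_02]`; the instances stay in an ordinary list, and the generated name is just an attribute on each one.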

-tkc



--
http://mail.python.org/mailman/listinfo/python-list


Re: memoization module?

2009-07-05 Thread Lino Mastrodomenico
2009/7/5 kj :
> Is there a memoization module for Python?  I'm looking for something
> like Mark Jason Dominus' handy Memoize module for Perl.

Check out the "memoized" class example here:

  

-- 
Lino Mastrodomenico
-- 
http://mail.python.org/mailman/listinfo/python-list


memoization module?

2009-07-05 Thread kj


Is there a memoization module for Python?  I'm looking for something
like Mark Jason Dominus' handy Memoize module for Perl.

TIA!

kj

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating alot of class instances?

2009-07-05 Thread Rickard Lindberg
> Thank you so much for the speedy and detailed help. Your replies
> really cleared out most of the cloud for me. I have one last issue to
> resolve which is something I did not articulate properly, I realize
> now. The last issue is actually automatically naming the instances.
> The reason I was using the "instance_count" is for enumerating the
> actual name of an instance.
>
> For example, let's say I have
>
> class MyMaterials:
>
> and my instances might need to look like
>
> material_01
> material_02
> or
> light_01
> light_02
> or
> Mesh_01
> Mesh_02 etc

If you do not know how many instances you are going to create from the
beginning, there is no way for you to know which of the instances you mentioned
above will get created. So having names for all of the instances will not help
you since you will never know what names are "safe" to use.

On the other hand, if you have all instances in a list, you can refer to them
by index and you know exactly how many of them you have.

If you would like to get instances by some name you gave to them, maybe
something like this will work:

def get_instance(name):
    for instance in instance_list:
        if instance.name == name:
            return instance
    return None

Note that this might very well return None if no instance with that particular
name was found.
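If lookups by name are frequent, a dictionary keyed by name avoids scanning the whole list each time. A sketch, with `Material` and the sample names made up for illustration:

```python
class Material(object):
    """Hypothetical stand-in for the instances discussed above."""
    def __init__(self, name):
        self.name = name

instance_list = [Material("material_01"), Material("light_01"), Material("Mesh_01")]

# Build the index once; each lookup is then a dict access, not a scan.
# If two instances share a name, the later one wins here, whereas the
# loop version returns the first match.
instances_by_name = dict((inst.name, inst) for inst in instance_list)

def get_instance(name):
    # Like the loop version, returns None when no instance has that name.
    return instances_by_name.get(name)
```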

-- 
Rickard Lindberg
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: XML(JSON?)-over-HTTP: How to define API?

2009-07-05 Thread vasudevram
On Jul 3, 1:11 pm, "Diez B. Roggisch"  wrote:
> Allen Fowler schrieb:
>
>
>
>
>
>
>
> >> I have an (in-development) python system that needs to shuttle events / 
> >> requests
> >> around over the network to other parts of itself.   It will also need to
> >> cooperate with a .net application running on yet a different machine.
>
> >> So, naturally I figured some sort of HTTP event / RPC type of would be a 
> >> good
> >> idea?
>
> >> Are there any modules I should know about, or guidelines I could read, that
> >> could aid me in the design of the API?    
>
> > To clarify:
>
> > Each message would be <1KB of data total, and consist of some structured 
> > object containing strings, numbers, dates, etc.
>
> > For instance there would be an "add user" request that would contain one or 
> > more User objects each having a number of properties like:
>
> > - Full Name
> > - Username
> > - Password
> > - Email addresses (a variable length array)
> > - Street Address line1
> > - Street Address line2
> > - Street Address line3
> > - City
> > - State
> > - Zip
> > - Sign Up Date
>
> >  and so on.
>
> > Since I need to work with other platforms, pickle is out...  what are the 
> > alternatives?  XML? JSON?
>
> > How should I formally define each of the valid messages and objects?
>
> > Thank you,
>
> Use XMLRPC. Implementations for both languages are available. There is
> no need for formal spec - which is a good thing. You just call the
> server, and it works.
>
> Diez

I second the suggestion of Diez to use XML-RPC. Very simple to learn
and use. Supports structs (as method arguments and method return
values) which can consist of other data types bundled together, also
supports arrays. Just check whether .NET supports XML-RPC.
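To see what structs and arrays look like on the wire without setting up a server, the sketch below marshals an "add user" call with Python 3's `xmlrpc.client` (`xmlrpclib` in Python 2; `use_builtin_types` needs Python 3.3+). The method name `add_users` and the user fields are hypothetical, loosely following the fields listed above.

```python
import datetime
import xmlrpc.client  # "xmlrpclib" in Python 2

# Hypothetical user record, shaped like the fields listed in the question.
user = {
    "full_name": "Jane Doe",
    "username": "jdoe",
    "emails": ["jane@example.com", "jdoe@example.org"],  # variable-length array
    "city": "Springfield",
    "zip": "12345",
    "sign_up_date": datetime.datetime(2009, 7, 5, 12, 0, 0),
}

# What a client would send for an add_users([...]) call: the list becomes
# an XML-RPC <array> and each dict becomes a <struct>.
request_xml = xmlrpc.client.dumps(([user],), methodname="add_users")

# What the receiving side gets back after parsing the request.
params, method = xmlrpc.client.loads(request_xml, use_builtin_types=True)
```

In a real deployment `xmlrpc.client.ServerProxy` does this marshalling for you on each call; the point here is only that nested structs, arrays and dates survive the round trip.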

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating alot of class instances?

2009-07-05 Thread kk
Hi

Thank you so much for the speedy and detailed help. Your replies
really cleared away most of the cloud for me. I have one last issue to
resolve, which is something I did not articulate properly, I realize
now. The last issue is actually automatically naming the instances.
The reason I was using the "instance_count" is for enumerating the
actual name of an instance.

For example, let's say I have

class MyMaterials:

and my instances might need to look like

material_01
material_02
or
light_01
light_02
or
Mesh_01
Mesh_02 etc

I will need to get the base names from the "some list" and create the
names accordingly from the list array.

Basically, I also need to generate the instance names based on
some_list. For example, some_list[2] might denote a name for the instance.

I will study the sample code in depth now. Maybe you have already answered
the naming issue as well; if so, please forgive the noise.

thanks




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Martin v. Löwis
> This is a good test for Python implementation bottlenecks.  Run
> that tokenizer on HTML, and see where the time goes.

I looked at it with cProfile, and the top function that comes up
for a larger document (52k) is
...validator.HTMLConformanceChecker.__iter__.

This method dispatches various validation routines, and it computes
the method names from the input over and over again, doing lots
of redundant string concatenations. It also capitalizes the element
names, even though the spelling in the original document is probably
not capitalized (but either upper-case or lower case).

In my patch below, I create a dictionary of bound methods, indexed
by (syntax) type and name, following the logic of falling back to
just type-based validation if no type/name routine exists. However,
in order to reduce the number of dictionary lookups, it will also
cache type/name pairs (both in the original spelling, and the
capitalized spelling), so that subsequent occurrences of the same
element will hit the method cache.

With this simple optimization, I get a 20% speedup on my test
case. In my document, there are no attributes - the same changes
should be made to attribute validation routines.

I don't think this has anything to do with the case statement.

Regards,
Martin

diff -r 30ba63d28b1b python/src/html5lib/filters/validator.py
--- a/python/src/html5lib/filters/validator.py	Fri Jul 03 17:47:34 2009 +0300
+++ b/python/src/html5lib/filters/validator.py	Sun Jul 05 21:10:06 2009 +0200
@@ -18,6 +18,7 @@
     # Import from the sets module for python 2.3
     from sets import Set as set
     from sets import ImmutableSet as frozenset
+import re
 import _base
 import iso639codes
 import rfc3987
@@ -265,19 +266,45 @@
         self.thingsThatDefineAnID = []
         self.thingsThatPointToAnID = []
         self.IDsWeHaveKnownAndLoved = []
+        self.validate_type = {}
+        self.validate_type_name = {}
+        r = re.compile("^validate([A-Z][^A-Z]+)([A-Z][^A-Z]+)?$")
+        for name in dir(self):
+            m = r.match(name)
+            if not m: continue
+            method = getattr(self, name)
+            if m.group(2):
+                d = self.validate_type_name.setdefault(m.group(1), {})
+                d[m.group(2)] = method
+            else:
+                self.validate_type[m.group(1)] = method
 
     def __iter__(self):
-        types = dict((v,k) for k,v in tokenTypes.iteritems())
         for token in _base.Filter.__iter__(self):
-            fakeToken = {"type": types.get(token.get("type", "-"), "-"),
-                         "name": token.get("name", "-").capitalize()}
-            method = getattr(self, "validate%(type)s%(name)s" % fakeToken, None)
+            t = token.get("type", "-")
+            n = token.get("name", "-")
+            try:
+                # try original name spelling
+                method = self.validate_type_name[t][n]
+            except KeyError:
+                # try capitalization
+                cn = n.capitalize()
+                try:
+                    method = self.validate_type_name[t][cn]
+                    # also cache original spelling
+                    self.validate_type_name[t][n] = method
+                except KeyError:
+                    # No name-specific validation, try type-specific one
+                    try:
+                        method = self.validate_type[t]
+                        # cache as name-specific as well
+                        cache = self.validate_type_name.setdefault(t, {})
+                        cache[cn] = cache[n] = method
+                    except KeyError:
+                        # no validation available
+                        method = None
             if method:
                 for t in method(token) or []: yield t
-            else:
-                method = getattr(self, "validate%(type)s" % fakeToken, None)
-                if method:
-                    for t in method(token) or []: yield t
             yield token
         for t in self.eof() or []: yield t
 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating alot of class instances?

2009-07-05 Thread Tim Chase

The solution might be dead simple but I just cannot figure out at the
moment.

For example this is what I need in the simplest form

class myclass():
    def __init__(self,name):
        self.name=name

for count,data in enumerate(some list):
    instance_count=myclass()
    instance_count.name=data

print instances


Sounds like a use for a list:

  instances = []
  for count, data in enumerate(some_list):
      # 1) camel-case is preferred for classnames
      # 2) since your __init__() expects the name,
      #    pass it in instead of setting it later
      instance = MyClass(data)
      instances.append(instance)

This can be written in a slightly more condensed-yet-readable 
list-comprehension form as:


  instances = [MyClass(data) for data in some_list]

You then have a list/array of instances you can print:

  print instances

or pick off certain items from it:

  print instances[42]

-tkc



--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating alot of class instances?

2009-07-05 Thread Andre Engels
On 7/5/09, kk  wrote:

>  I am new to Python classes and trying to figure out this particular
>  issue here. I will need to create instances of a class. But at the
>  moment I do not know how many instances I will end up having, in every
>  case it might be different. Most of the documents I read make this
>  simple class-student analogy to explain python classes, which is fine.
>  But in those examples the number and the names of the instances were
>  known and limited

That's no problem. The only limit to the number of instances of a
class you can create is your memory - and not even that if you don't
need to 'keep' the instances.

>  I will be querying some data and create class instances based on the
>  data I gather. But the problem as I mentioned is that I do not know
>  the names and the number of the end class instances. They will be
>  based on the content of the data. So how can I create class instances
>  within a loop and when the loop is done how can I figure out the list
>  of instances via class membership?  I can track the names by
>  introducing another list but I want to understand the class side of
>  things.
>
>  The solution might be dead simple but I just cannot figure out at the
>  moment.
>
>  For example this is what I need in the simplest form
>
>  class myclass():
>      def __init__(self,name):
>          self.name=name
>
>  for count,data in enumerate(some list):
>      instance_count=myclass()
>      instance_count.name=data
>
>  print instances

Okay, to solve your problem, we add a list containing all the instances:

class myclass():
    def __init__(self,name):
        self.name=name

instances = []

for count,data in enumerate(some_list):
    instance_count=myclass()
    instance_count.name=data
    instances.append(instance_count)

print instances

=

However, that won't work because myclass's __init__ takes two
parameters (self and name), so you have to pass the name when calling it:

class myclass():
    def __init__(self,name):
        self.name=name

instances = []

for count,data in enumerate(some_list):
    instance_count=myclass(data)
    instances.append(instance_count)

print instances

=

This works, but it can be done better:

First we notice that count is not used at all, so why create it?


class myclass():
    def __init__(self,name):
        self.name=name

instances = []

for data in some_list:
    instance_count=myclass(data)
    instances.append(instance_count)

print instances

=

Next, the variable instance_count is created and then used only once, on
the next line, so we can do both in one step:


class myclass():
    def __init__(self,name):
        self.name=name

instances = []

for data in some_list:
    instances.append(myclass(data))

print instances



Finally, "print instances" does not give very nice looking
information, so I would change this to:

class myclass():
    def __init__(self,name):
        self.name=name

instances = []

for data in some_list:
    instances.append(myclass(data))

print [instance.name for instance in instances]

-- 
André Engels, andreeng...@gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question of style

2009-07-05 Thread Terry Reedy

Paul Rubin wrote:


I don't know what a furphy is, but I don't accept that "somethingness"
vs. "nothingness" is the same distinction as truth vs falsehood.  True
and False are values in a specific datatype (namely bool), not
abstract qualities of arbitrary data structures.  The idea that the
"if" statement selects between "somethingness" and "nothingness"
rather than between True and False is a bogus re-imagining of the
traditional function of an "if" statement and has been an endless
source of bugs in Python code.  Look how much confusion it causes here
in the newsgroup all the time.


You appear to be confusing a specific interpretation of an abstraction 
with the abstraction itself. Or perhaps better, you seem to be confusing 
a specific example of a general process with the general process.


A boolean variable is a variable that is in one of two states -- a 
binary variable -- a variable that carries one bit of information. The 
two states are the marked state and the unmarked or default state. Which 
is to say, when we draw a distinction to distinguish two states, we mark 
one of them to distinguish one from the other. The if statement tests 
whether an object is in the marked state (or not).


Truth and falsity of propositions are one possible interpretation of 
marked and unmarked, giving us propositional logic.  But they are only 
one of many. So are in and out of a particular class, member or not 
of a set, and subset or not of a set, giving us class and set logic. So 
are closed and open, said of gates or switches, or on and off, giving us 
switching logic. So are non-zero and zero, said of numbers. Or done and 
not done, said of an algorithmic process.


Counts 0 and 1, and their representations '0' and '1', taken in 
themselves, are as good as any distinct pair as a pair of labels for the 
two distinct states. They have some computational advantages, including 
making it easy to count the number of objects in a collection in the 
marked state (and, given the total number, the number in the unmarked 
state). They have one disadvantage, though. If I say 'x = 1', do I mean 
1 versus 0 or 1 versus all possible ints? Similarly, If 'print(x)' 
prints 1, does it mean 1 versus 0 or 1 versus all other ints? 
Recognizing this, Guido decided to subclass them and give them alternate 
names. He could have chosen 'Marked' and 'Unmarked', or any of several 
other pairs, but did choose the conventional 'True' and 'False', 
referring to the common propositional interpretation. However, he 
specifically disclaimed any intention to restrict 'if' to testing 
specific logic propositions, as opposed to the general proposition 
'object is in the marked state'.
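A short sketch of those points in code. `Gate` is a made-up class that treats "open" as its marked state by defining the truth-value hook, so `if gate:` asks whether the gate is in the marked state; and because bool is a subclass of int, counting the objects in the marked state is a plain sum.

```python
class Gate(object):
    """Hypothetical gate whose 'marked' state is being open."""

    def __init__(self, is_open):
        self.is_open = is_open

    def __bool__(self):              # truth-value hook in Python 3
        return bool(self.is_open)

    __nonzero__ = __bool__           # the same hook's name in Python 2

gates = [Gate(True), Gate(False), Gate(True)]

# True and False are the ints 1 and 0 under other names, so summing
# truth values counts the gates in the marked (open) state.
open_count = sum(bool(g) for g in gates)
```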


Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Creating alot of class instances?

2009-07-05 Thread kk
Hi

I am new to Python classes and trying to figure out this particular
issue here. I will need to create instances of a class. But at the
moment I do not know how many instances I will end up having, in every
case it might be different. Most of the documents I read make this
simple class-student analogy to explain python classes, which is fine.
But in those examples the number and the names of the instances were
known and limited.

I will be querying some data and create class instances based on the
data I gather. But the problem as I mentioned is that I do not know
the names and the number of the end class instances. They will be
based on the content of the data. So how can I create class instances
within a loop and when the loop is done how can I figure out the list
of instances via class membership?  I can track the names by
introducing another list but I want to understand the class side of
things.

The solution might be dead simple but I just cannot figure out at the
moment.

For example this is what I need in the simplest form

class myclass():
    def __init__(self,name):
        self.name=name

for count,data in enumerate(some list):
    instance_count=myclass()
    instance_count.name=data

print instances


thanks



-- 
http://mail.python.org/mailman/listinfo/python-list


Method to separate unit-test methods and data?

2009-07-05 Thread Nick Daly
Hi,

I was wondering whether it's possible, and whether there are any simple
known methods, to store unit-test functions and their data in separate files.

Perhaps this is a strange request, but it does an excellent job of
modularizing code.  As far as revision control goes, it makes it
easier to discern between new or changed test cases, and changes in
the test data.

I've worked through this idea a bit and actually have a nearly working
model.  For example, when testing a class, I have a test module, which
contains a class for testing each method of the class.  This allows me
to generalize each method's parameters into each class, which can then
be overridden by the config file's data, something like as follows
(with a really arbitrary example, Python 2.5 code):


Demo code (midpoint.py):
===

class Midpoint(object):

   def __init__(self, a, b):
       self.a = a
       self.b = b

   def mid(self):
       return (self.a + self.b) / 2.0

   def sum(self):
       return (self.a + self.b)



Testing Code (test_Midpoint.py):
===

import unittest
import midpoint
import ConfigParser as configparser


# set up the config file that overrides each class's data
config = configparser.SafeConfigParser()
config.read('test_Midpoint.cfg')


class test_Midpoint_mid(unittest.TestCase):
   # setUp runs before each test; overriding __init__ would break
   # unittest, which constructs each TestCase with a methodName argument
   def setUp(self):

       # default testing values
       self.none_values = ((None, 1),
                           (0, None))

       # override the default values with the config file values
       for key, value in config.items(self.__class__.__name__):
           if value:
               setattr(self, key, value)


   # a few tests of the method
   def test_rejectNone(self):
       for tests in self.none_values:
           self.assertRaises(TypeError,
               midpoint.Midpoint(tests[0], tests[1]).mid)

# and repeat the concept for class test_Midpoint_sum



Config Code (test_Midpoint.cfg):
==

# override the default values of the test class's members

[test_Midpoint_mid]
none_values = ((-1, None),
              (None, -12.8))



What I haven't yet figured out is how to properly override
the default class member values with values from the config file.  The
config file's data is loaded as a string instead of as a list, as I'd
want.  This causes all the tests to fail: none_values needs to be
interpreted as a sequence of tuples, but it is instead read as:

" ((-1, None),\n               (None, -12.8))"

Does anyone have any solutions for these problems?  First, is there a
known and simple way to separate unit-test data and methods into
separate files?  Secondly, if not, is there a simple way to convert
strings into other base types, like lists, dictionaries, and so forth?
Or, am I going to have to write my own list-parsing methods?  Would
pickling help?  I haven't yet had a chance to look into if or how that
would work...  If there's anything else I can clarify about this
request, feel free to let me know.
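One way to turn such strings back into Python values without writing a parser is `ast.literal_eval`, which only accepts literals (numbers, strings, tuples, lists, dicts, booleans, None) and so cannot execute arbitrary code the way `eval()` can. This is a minimal sketch, not the poster's code: it is written for Python 3 (`configparser`, `read_string`), `load_overrides` is a hypothetical helper, and on Python 2 `ast.literal_eval` needs 2.6 or later.

```python
import ast
import configparser

# Inline copy of the test_Midpoint.cfg shown above, so the sketch is self-contained.
CFG_TEXT = """\
[test_Midpoint_mid]
none_values = ((-1, None),
              (None, -12.8))
"""


def load_overrides(config, section):
    """Return {option: Python value} for every option in `section`."""
    overrides = {}
    for key, raw in config.items(section):
        try:
            overrides[key] = ast.literal_eval(raw)  # "((-1, None), ...)" -> tuple
        except (ValueError, SyntaxError):
            overrides[key] = raw  # not a Python literal: keep the raw string
    return overrides


config = configparser.ConfigParser()
config.read_string(CFG_TEXT)
values = load_overrides(config, "test_Midpoint_mid")
print(values["none_values"])  # ((-1, None), (None, -12.8))
```

In the test class, the `setattr(self, key, value)` loop could then iterate over `load_overrides(config, self.__class__.__name__).items()` instead of the raw `config.items(...)`.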

Thanks for any help you can provide,
Nick


A C++ user's introduction to Python: a really good read

2009-07-05 Thread Dotan Cohen
Here is a C++ KDE programmer's take on Python:
http://majewsky.wordpress.com/2009/07/04/python-experiences-or-why-i-like-c-more/

Good read.

-- 
Dotan Cohen

http://what-is-what.com
http://gibberish.co.il


Re: Clarity vs. code reuse/generality

2009-07-05 Thread wwwayne
On Fri, 03 Jul 2009 14:34:58 GMT, Alan G Isaac 
wrote:

>On 7/3/2009 10:05 AM kj apparently wrote:

=== 8< ===

>2.
>from scipy.optimize import bisect
>def _binary_search(lo, hi, func, target, epsilon):
>    def f(x): return func(x) - target
>    return bisect(f, lo, hi, xtol=epsilon)
>
>3. If you don't want to use SciPy (why?), have them
>implement http://en.wikipedia.org/wiki/Bisection_method#Pseudo-code
>to produce their own `bisect` function.

Of course this isn't really pseudo-code, it's VB code with quite poor
comments: 

'Bisection Method

'Start loop
Do While (abs(right - left) > 2*epsilon)

  'Calculate midpoint of domain
  midpoint = (right + left) / 2

  'Find f(midpoint)
  If ((f(left) * f(midpoint)) > 0) Then
    'Throw away left half
    left = midpoint
  Else
    'Throw away right half
    right = midpoint
  End If
Loop
Return (right + left) / 2

and even just throwing away the VB code and leaving the comments does
not give a good algorithm:

'Bisection Method

'Start loop

  'Calculate midpoint of domain

  'Find f(midpoint)

    'Throw away left half

    'Throw away right half

A much better approach to teaching introductory programming in any
language at almost any level is to incorporate some "top down problem
solving", including writing a method of solution (algorithm) in some
reasonably well-defined pseudo-code that can be easily elaborated and
translated into one's target language (and, preferably, some
reasonable-sized subset of related languages). This pseudo-code should
then become comments (or equivalent) for the students to convert to
real code, as in:

Algorithm bisect (f, left, right, epsilon):

# Bisection Method to find a root of a real continuous function f(x):
#     Assume f(x) changes sign between f(left) and f(right) and
#     we want a value not further than epsilon from a real root.
#     Begin with the domain left...right.

# While the absolute value of (left - right) exceeds 2*epsilon:

#     Calculate the midpoint, mid, of the domain.

#     If the product of f(left) and f(mid) is positive:

#         Set left to mid;

#     Otherwise:

#         Set right to mid.

# Return the midpoint of left...right.

===
And adapting this approach to kj's case is straightforward.
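To illustrate how directly pseudo-code like this translates, here is one possible Python rendering (a sketch, not code from the thread), with each pseudo-code line kept as a comment. The up-front sign check is an addition that enforces the algorithm's stated assumption.

```python
def bisect(f, left, right, epsilon):
    """Find a root of a real continuous function f by bisection.

    Assumes f(left) and f(right) differ in sign; returns a value no
    further than epsilon from a real root.
    """
    if f(left) * f(right) > 0:
        raise ValueError("f(left) and f(right) must differ in sign")
    # While the absolute value of (left - right) exceeds 2*epsilon:
    while abs(left - right) > 2 * epsilon:
        # Calculate the midpoint, mid, of the domain.
        mid = (left + right) / 2.0
        # If the product of f(left) and f(mid) is positive
        # (no sign change in the left half): set left to mid;
        if f(left) * f(mid) > 0:
            left = mid
        # Otherwise: set right to mid.
        else:
            right = mid
    # Return the midpoint of left...right.
    return (left + right) / 2.0


root = bisect(lambda x: x * x - 2, 0.0, 2.0, 1e-9)  # approximates sqrt(2)
```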

Of course, what constitutes a suitable vocabulary and syntax for an
algorithm pseudo-code language depends upon the target language(s),
the tastes of the instructor, and the point in the lesson or course.
My choice is for python+COBOL (as above) initially, soon incorporating
the usual arithmetic and relational operators (and how soon and how
many at once depends upon the level of the students: for an
introductory university/college course in Computer Science or
equivalent, where everyone should have a reasonable background in
mathematics notation as a prerequisite, this should be very soon and
quite fast), arrays and subscripting, etc.

But if we were to write this algorithm or kj's in Python-like
pseudo-code, it would already *be* Python code, or very close to
it, which is why we should teach introductory programming in Python.
Very soon students would be writing algorithms that required very
little elaboration to be programs.

But without including suitable problem solving and pseudo-code
algorithm writing, there will soon come a time or an example where
students are trying to think in code instead of in their natural
language and don't have the experience and repertoire to be able to do
that well.

I hope that's not too pedantic or AR?

wayne

>hth,
>Alan Isaac


Re: Determining if a function is a method of a class within a decorator

2009-07-05 Thread Piet van Oostrum
> David Hirschfield  (DH) wrote:

>DH> Yeah, it definitely seems like having two separate decorators is the
>DH> solution. But the strange thing is that I found this snippet after some
>DH> deep googling, that seems to do something *like* what I want, though I
>DH> don't understand the descriptor stuff nearly well enough to get what's
>DH> happening:

>DH> http://stackoverflow.com/questions/306130/python-decorator-makes-function-forget-that-it-belongs-to-a-class

>DH> answer number 3, by ianb. It seems to indicate there's a way to introspect
>DH> and determine the class that the function is going to be bound to...but I
>DH> don't get it, and I'm not sure it's applicable to my case.

>DH> I'd love an explanation of what is going on in that setup, and if it isn't
>DH> usable for my situation, why not?

What that example does is not get the name of the class in the
decorator, but in the bound method that the decorator produces, at the
time that method is called. This is done simply by asking for the class of
the self parameter. Actually, they don't even do that in that example, but
it could have been done. Note also that there is an error in the code:
keyargs should be kw.

There is also something special in that code: it uses the descriptor
protocol. This is necessary for a method. The descriptor protocol for
methods defines a __get__ method that transforms the unbound method into
a bound method. That code uses this to decorate the generated bound method
object instead of decorating the unbound method.

A side effect of doing the class detection at call time is that you get
the name of the subclass if you use the method on an instance of the
subclass, not the name of the class that the method was defined in:

class D(C):
 pass

D().f(1, 2) will talk about class D, not class C.

So if you would like to do something special for bound methods,
__get__ might be the proper place to do it.
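A minimal sketch of the approach described above, assuming Python 2.6+ or 3. A decorator class implements `__get__`, so it runs every time the method is looked up on an instance and can inspect that instance's class at call time. `report_class`, `C` and `D` are made up for this example (`C` only stands in for the class from the Stack Overflow snippet, which is not reproduced here).

```python
import functools


class report_class(object):
    """Hypothetical decorator reporting the class of `self` at call time."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        if instance is None:
            # Looked up on the class itself (e.g. C.f): return the decorator.
            return self
        # Let the plain function's own descriptor build the bound method.
        bound = self.func.__get__(instance, owner)

        @functools.wraps(self.func)
        def wrapper(*args, **kw):
            # type(instance) is the class of the object the method was
            # called on, which may be a subclass of the defining class.
            print("calling %s.%s" % (type(instance).__name__, self.func.__name__))
            return bound(*args, **kw)

        return wrapper


class C(object):
    @report_class
    def f(self, a, b):
        return a + b


class D(C):
    pass


C().f(1, 2)  # prints "calling C.f"
D().f(1, 2)  # prints "calling D.f"
```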
-- 
Piet van Oostrum 
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org


Re: question of style

2009-07-05 Thread Lie Ryan
Steven D'Aprano wrote:
> On Sun, 05 Jul 2009 11:37:49 +, Lie Ryan wrote:
> 
>> Neither python's `if` nor `if` in formal logic is about testing True vs.
>> False. `if` in python and formal logic receives a statement. The
>> statement must be evaluatable to True or False, but does not have to be
>> True or False themselves. It just happens that True evaluates to True
>> and False evaluates to False.
> 
> I think your explanation is a little confused, or at least confusing.

Indeed, partially because I said "statement" when I really meant
"expression".

> Other languages don't require specific enumerable values, and instead 
> accept (e.g.) any integer, or any object, with rules for how to interpret 
> such values in such a context. 

That is what I wanted to say, except that I extended it to
formal logic (mathematical logic). Even in formal logic `if` receives
any arbitrary expression that can be, according to certain rules,
interpreted as True or False (i.e. the expressions themselves are not
required to be a boolean value).

The conclusion is that Python's `if` does not deviate from the semantics
of `if` in the mathematical sense.


Re: How Python Implements "long integer"?

2009-07-05 Thread Pedram
On Jul 5, 8:32 pm, Pedram  wrote:
> On Jul 5, 8:12 pm, a...@pythoncraft.com (Aahz) wrote:
>
>
>
> > In article 
> > <6f6be2b9-49f4-4db0-9c21-52062d8ea...@l31g2000yqb.googlegroups.com>,
>
> > Pedram   wrote:
>
> > >This time I have a simple C question!
> > >As you know, _PyLong_New returns the result of PyObject_NEW_VAR. I
> > >found PyObject_NEW_VAR in objimpl.h header file. But I can't
> > >understand the last line :( Here's the code:
>
> > >#define PyObject_NEW_VAR(type, typeobj, n) \
> > >( (type *) PyObject_InitVar( \
> > >      (PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj),
> > >(n)) ),\
> > >      (typeobj), (n)) )
>
> > >I know this will replace the PyObject_NEW_VAR(type, typeobj, n)
> > >everywhere in the code, but I can't understand the last line, which
> > >is just 'typeobj' and 'n'! What do they do? Do they make any sense in
> > >the allocation process?
>
> > Look in the code to find out what PyObject_InitVar() does -- and, more
> > importantly, what its signature is.  The clue you're missing is the
> > trailing backslash on the third line, but that should not be required if
> > you're using an editor that shows you matching parentheses.
> > --
> > Aahz (a...@pythoncraft.com)           <*>        http://www.pythoncraft.com/
>
> > "as long as we like the same operating system, things are cool." --piranha
>
> No, they wrapped the 3rd line!
>
> I'll show you the code in picture 
> below:http://lh3.ggpht.com/_35nHfALLgC4/SlDVMEl6oOI/AKg/vPWA1gttvHM...
>
> As you can see the PyObject_MALLOC has nothing to do with typeobj and
> n in line 4.

Oh! What a mistake! I got it! They're PyObject_InitVar
parameters.
Sorry, and thanks!


Re: How Python Implements "long integer"?

2009-07-05 Thread Pedram
On Jul 5, 8:12 pm, a...@pythoncraft.com (Aahz) wrote:
> In article 
> <6f6be2b9-49f4-4db0-9c21-52062d8ea...@l31g2000yqb.googlegroups.com>,
>
>
>
> Pedram   wrote:
>
> >This time I have a simple C question!
> >As you know, _PyLong_New returns the result of PyObject_NEW_VAR. I
> >found PyObject_NEW_VAR in objimpl.h header file. But I can't
> >understand the last line :( Here's the code:
>
> >#define PyObject_NEW_VAR(type, typeobj, n) \
> >( (type *) PyObject_InitVar( \
> >      (PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj),
> >(n)) ),\
> >      (typeobj), (n)) )
>
> >I know this will replace the PyObject_NEW_VAR(type, typeobj, n)
> >everywhere in the code, but I can't understand the last line, which
> >is just 'typeobj' and 'n'! What do they do? Do they make any sense in
> >the allocation process?
>
> Look in the code to find out what PyObject_InitVar() does -- and, more
> importantly, what its signature is.  The clue you're missing is the
> trailing backslash on the third line, but that should not be required if
> you're using an editor that shows you matching parentheses.
> --
> Aahz (a...@pythoncraft.com)           <*>        http://www.pythoncraft.com/
>
> "as long as we like the same operating system, things are cool." --piranha

No, they wrapped the 3rd line!

I'll show you the code in picture below:
http://lh3.ggpht.com/_35nHfALLgC4/SlDVMEl6oOI/AKg/vPWA1gttvHM/s640/Screenshot.png

As you can see the PyObject_MALLOC has nothing to do with typeobj and
n in line 4.


Re: question of style

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 06:12:25 -0700, Paul Rubin wrote:

>> There are three natural approaches to (say) re.search() for dealing
>> with failure:
>> 
>> (1) return a sentinel value like None; (2) return a matchobject which
>> tests False; (3) raise an exception.
> 
> 4. Have re.search return a bool and possible matchobject separately:
> 
>put_match_here = []
>if re.search(pat, s, target=put_match_here):
>   do_something_with(put_match_here[0])

Wow. I never for the life of me thought I'd see an experienced Python 
programmer re-implement Pascal's VAR parameters.

That is... hideous. Returning a (flag, matchobject) tuple is the height 
of beauty in comparison.
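For comparison, a minimal sketch of approach (1), which is what `re.search` actually does (it returns `None` when there is no match), next to the (flag, matchobject) tuple style. `search_with_flag` is a hypothetical helper, not part of the `re` module.

```python
import re


def search_with_flag(pattern, text):
    """Hypothetical helper: return (found, match) instead of match-or-None."""
    match = re.search(pattern, text)
    return match is not None, match


# Approach (1): re.search itself returns None on failure.
m = re.search(r"\d+", "order 42")
if m is not None:
    print(m.group())  # 42

# Tuple style: the flag and the match object come back together.
found, m = search_with_flag(r"\d+", "no digits here")
if not found:
    print("no match")
```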



-- 
Steven


Re: generation of keyboard events

2009-07-05 Thread Emile van Sebille

On 7/5/2009 8:56 AM RAM said...

> Hi,
>
> I need to start an external program and pass keyboard events like
> F1, Right arrow key, etc. to the program. I am trying to use the
> subprocess module to invoke the external program. I am able to invoke
> it, but not able to generate the keyboard events and pass them on to the
> external program.


If you're on *nix, search for python and expect and you'll find
something based apps -- I'm not sure what you do for GUIs. On Windows,
I'm sure there are ways of doing this with the win32 extensions, but I
commonly use msched from www.mjtnet.com.  I create msched scripts from
within Python and use subprocess to invoke msched and execute the
Python-generated script.
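On the *nix side, the simplest case is a console program that reads its keyboard input from stdin. The sketch below assumes that case, and also assumes the program understands xterm-style escape sequences (F1 as ESC O P, Right arrow as ESC [ C). It writes those characters to a child process through subprocess pipes, with a tiny Python child standing in for the real external program so the example is self-contained. Full-screen programs often check whether stdin is a terminal and will not behave the same with a pipe; for those, a pseudo-terminal (which is what expect and the pexpect module use) is the usual approach.

```python
import subprocess
import sys

# xterm-style escape sequences (assumption: the target program expects these).
KEYS = {
    "F1": "\x1bOP",
    "RIGHT": "\x1b[C",
}

# Stand-in for the external program: it prints the repr() of everything
# it reads from stdin, so we can see exactly what was "typed".
CHILD = "import sys; print(repr(sys.stdin.read()))"

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,  # exchange str rather than bytes
)
out, _ = proc.communicate(KEYS["F1"] + KEYS["RIGHT"] + "q\n")
print(out.strip())
```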


HTH,

Emile



> Please help me with this because I am a beginner.
>
> regards
> Sreerama V




Re: question of style

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 11:37:49 +, Lie Ryan wrote:

> Neither python's `if` nor `if` in formal logic is about testing True vs.
> False. `if` in python and formal logic receives a statement. The
> statement must be evaluatable to True or False, but does not have to be
> True or False themselves. It just happens that True evaluates to True
> and False evaluates to False.

I think your explanation is a little confused, or at least confusing.

`if` implements a two-way branch. Some languages, like Pascal and Java, 
require the switch value to take one of two specific enumerable values 
conventionally spelled TRUE and FALSE (modulo variations in case).

Other languages don't require specific enumerable values, and instead 
accept (e.g.) any integer, or any object, with rules for how to interpret 
such values in such a context. Forth, for example, branches according to 
whether the word on the stack is zero or non-zero: "nothing" or 
"something". Lisp branches according to empty list or non-empty list: 
"nothing" or "something" again.

Other languages, like Ruby, have less intuitive rules. That's their 
problem.



-- 
Steven


Re: generation of keyboard events

2009-07-05 Thread Tim Harig
On 2009-07-05, RAM  wrote:
> I need to start an external program and pass keyboard events like
> F1, Right arrow key, etc. to the program. I am trying to use the
> subprocess module to invoke the external program. I am able to invoke
> it, but not able to generate the keyboard events and pass them on to the

catb.org/esr/faqs/smart-questions.html

You have told us nothing about the environment where you are trying to
accomplish this.  GUI, CLI, Unix, Windows, etc.? So I suggest that you
check out the curses getch functions.  You can find them in the standard
library documentation at http://docs.python.org.  You should also refer to
the documentation for the C version in your system's man pages.


Re: How Python Implements "long integer"?

2009-07-05 Thread Aahz
In article <6f6be2b9-49f4-4db0-9c21-52062d8ea...@l31g2000yqb.googlegroups.com>,
Pedram   wrote:
>
>This time I have a simple C question!
>As you know, _PyLong_New returns the result of PyObject_NEW_VAR. I
>found PyObject_NEW_VAR in objimpl.h header file. But I can't
>understand the last line :( Here's the code:
>
>#define PyObject_NEW_VAR(type, typeobj, n) \
>( (type *) PyObject_InitVar( \
>  (PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj),
>(n)) ),\
>  (typeobj), (n)) )
>
>I know this will replace the PyObject_NEW_VAR(type, typeobj, n)
>everywhere in the code, but I can't understand the last line, which
>is just 'typeobj' and 'n'! What do they do? Do they make any sense in
>the allocation process?

Look in the code to find out what PyObject_InitVar() does -- and, more
importantly, what its signature is.  The clue you're missing is the
trailing backslash on the third line, but that should not be required if
you're using an editor that shows you matching parentheses.
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha


Re: How Python Implements "long integer"?

2009-07-05 Thread Pedram
Hello again,
This time I have a simple C question!
As you know, _PyLong_New returns the result of PyObject_NEW_VAR. I
found PyObject_NEW_VAR in objimpl.h header file. But I can't
understand the last line :( Here's the code:

#define PyObject_NEW_VAR(type, typeobj, n) \
( (type *) PyObject_InitVar( \
  (PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj),
(n)) ),\
  (typeobj), (n)) )

I know this will replace the PyObject_NEW_VAR(type, typeobj, n)
everywhere in the code, but I can't understand the last line, which
is just 'typeobj' and 'n'! What do they do? Do they make any sense in
the allocation process?


Re: question of style

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 03:08:16 -0700, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> > Yes, it saves a few keystrokes to say "if x:" instead of "if
>> > len(x)==0:" or even "if bool(x):",
>> 
>> It's not about saving keystrokes -- that's a furphy. It's about
>> encapsulation. Objects are in a better position to recognise when they
>> are "something" (true) or "nothing" (false) than you are.
> 
> I don't know what a furphy is, 

Is your Google broken? *wink*

http://en.wikipedia.org/wiki/Furphy


> but I don't accept that "somethingness"
> vs. "nothingness" is the same distinction as truth vs falsehood.

It's the distinction used by Python since the dawn of time. Python only 
grew a bool type a few versions back.


> True
> and False are values in a specific datatype (namely bool), not abstract
> qualities of arbitrary data structures.  

I'm not talking about the constants True and False (nouns), but about 
true and false values (adjectives).


> The idea that the "if"
> statement selects between "somethingness" and "nothingness" rather than
> between True and False is a bogus re-imagining of the traditional
> function of an "if" statement 

There's nothing bogus about it.

Some languages such as Pascal and Java require a special Boolean type for 
if-branches. Other languages, like Forth, C, Lisp and Ruby do not.

http://en.wikipedia.org/wiki/Boolean_data_type


> and has been an endless source of bugs in Python code.


I wonder why these "endless" bugs weren't important enough to be 
mentioned in the rationale to PEP 285:

http://www.python.org/dev/peps/pep-0285/

You'd think something as vital as "if x Considered Harmful" would have 
made it into the PEP, but no. Instead Guido *explicitly* stated that he 
had no intention of forcing `if` to require a bool, describing `if x` as 
the "correct form" and calling scrapping such a feature as "crippling the 
language".

 
> Look how much confusion it causes here in the newsgroup all the time.

The only confusion is that you're causing me. Would you care to link to 
some?



>> "if len(x) == 0" is wasteful. Perhaps I've passed you a list-like
>> iterable instead of a list, and calculating the actual length is O(N).
> 
> That doesn't happen in any of Python's built-in container types.

And if they were the only types possible in Python, that might be 
relevant.


> I
> could see some value to having a generic "is_empty" predicate on
> containers though, to deal with this situation.

We have that already. It's spelled __bool__ or __nonzero__, and it 
applies to any object, not just containers.
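As an illustration of the encapsulation argument, here is a minimal sketch of a hypothetical list-like class (`LazyItems` is invented for this example) whose length costs O(N) to compute but whose truth value only needs to look at the first item. Defining `__bool__` (Python 3) and aliasing it as `__nonzero__` (Python 2) lets `if x:` ask the object directly instead of computing `len(x)`.

```python
class LazyItems(object):
    """Hypothetical list-like wrapper whose length is expensive to compute."""

    def __init__(self, factory):
        # `factory` is a zero-argument callable returning a fresh iterator.
        self._factory = factory

    def __len__(self):
        return sum(1 for _ in self._factory())  # walks everything: O(N)

    def __bool__(self):  # Python 3 spelling
        for _ in self._factory():
            return True  # stop at the first item: O(1)
        return False

    __nonzero__ = __bool__  # Python 2 spelling


full = LazyItems(lambda: iter(range(10 ** 6)))
empty = LazyItems(lambda: iter([]))

if full:  # calls __bool__, which inspects a single element
    print("full has data")
if not empty:
    print("empty is empty")
```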


> Your iterable could
> support that predicate.  In fact maybe all iterables should support that
> predicate.  They don't (and can't) all support "len".

Iterators are a special case, because in general they can't tell if 
they're exhausted until they try to yield a value.


>> If you write len(x)==0 Python doesn't complain if x is a dict instead
>> of the list you were expecting. Why is it acceptable to duck-type
>> len(x) but not truth-testing?
> 
> I haven't seen the amount of bugs coming from generic "len" as from
> something-vs-nothing confusion.

Again with these alleged bugs.



-- 
Steven


generation of keyboard events

2009-07-05 Thread RAM
Hi,

I need to start an external program and pass keyboard events like
F1, Right arrow key, etc. to the program. I am trying to use the
subprocess module to invoke the external program. I am able to invoke
it, but not able to generate the keyboard events and pass them on to the
external program. Please help me with this because I am a beginner.

regards
Sreerama V


Python and webcam capture delay?

2009-07-05 Thread jack catcher (nick)

Hi,

I'm thinking of using Python for capturing and showing a live webcam
stream simultaneously between two computers via a local area network.
The operating system is Windows. I'm going to begin with the VideoCapture
extension; I have no ideas about other implementations yet. Do you have any
suggestions on how short a delay I can hope to achieve in showing the
video? This would be part of a psychological experiment, so I would need
to deliver the video stream with a reasonable delay (say, below 100ms).



Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Aahz
In article ,
Hendrik van Rooyen  wrote:
>
>But wait - maybe if he passes an iterator around - the equivalent of
>for char in input_stream...  Still no good though, unless the next call
>to the iterator is faster than an ordinary python call.

Calls to iterators created by generators are indeed faster than an
ordinary Python call, because the stack frame is already mostly set up.
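A rough way to check this on your own interpreter: the sketch below (function names are made up for the example) fetches characters either through an ordinary Python function call or by resuming a generator with `next()`, and times both with `timeit`. Timings depend on the Python version and machine, so no numbers are claimed here.

```python
import timeit

data = "x" * 100000


def next_char_function(state):
    """Ordinary function: a fresh Python frame for every character."""
    i = state[0]
    state[0] = i + 1
    return data[i]


def char_generator():
    """Generator: each next() resumes the frame that already exists."""
    for c in data:
        yield c


def consume_with_function():
    state = [0]
    return [next_char_function(state) for _ in range(len(data))]


def consume_with_generator():
    gen = char_generator()
    return [next(gen) for _ in range(len(data))]


for fn in (consume_with_function, consume_with_generator):
    print("%s: %.3f s" % (fn.__name__, timeit.timeit(fn, number=10)))
```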
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha


Re: Problems with using queue in Tkinter application

2009-07-05 Thread Icarus
On Jul 4, 11:24 am, Peter Otten <__pete...@web.de> wrote:
> Icarus wrote:
> > On Jul 4, 3:21 am, Peter Otten <__pete...@web.de> wrote:
> >> Icarus wrote:
> >> > I'm working on a serial protocol analyzer in python.  We have an
> >> > application written by someone else in MFC but we need something that
> >> > is cross platform.  I intended to implement the GUI portion in Tkinter
> >> > but am having trouble.
>
> >> > The idea is that I will read messages from the serial port and output
> >> > them to a Tkinter Text object initially.  Eventually it will have
> >> > other functionality but that's it for the short term.  I've written
> >> > this little test app to experiment with putting things on the GUI via
> >> > a Queue which is polled by the Tkinter loop.
>
> >> > On some machines this code works fine and I get whatever I type in
> >> > displayed in the Text widget.  On others I get errors like this as
> >> > soon as I start it running.
>
> >> > error in background error handler:
> >> > out of stack space (infinite loop?)
> >> > while executing
> >> > "::tcl::Bgerror {out of stack space (infinite loop?)} {-code 1 -level
> >> > 0 -errorcode NONE -errorinfo {out of stack space (infinite loop?)
> >> > while execu..."
>
> >> > I don't understand why on some machines it works exactly as expected
> >> > and on others it acts the same way Tkinter does when I call functions
> >> > directly from outside the Tkinter thread.  Does anyone have any
> >> > suggestions?  The full code as appended below.  Thanks in advance.
>
> >> > [code]
>
> >> > import Queue
>
> >> > class functionQueue:
>
> >> >     def __init__(self, root = None, timeout = 250):
>
> >> >         self.functionQueue = Queue.Queue()
> >> >         self.root = root
> >> >         self.timeout = timeout
>
> >> >         if(self.root):
> >> >             self.pop_function(root)
>
> >> >     def pop_function(self, root = None):
>
> >> >         try:
> >> >             funcArgList = self.functionQueue.get(block = False)
> >> >         except Queue.Empty:
> >> >             pass
> >> >         else:
> >> >             try:
> >> >                 funcArgList[0](*funcArgList[1])
> >> >             except:
> >> >                 try:
> >> >                     print "Failed to call function", funcArgList[0]
> >> >                 except:
> >> >                     print "Failed to call function"
>
> >> >         if(root):
> >> >             root.after(self.timeout, lambda: self.pop_function(self.root))
>
> >> >     def add_function(self, function, argList):
>
> >> >         try:
> >> >             self.functionQueue.put([function, argList])
> >> >         except:
> >> >             pass
>
> >> > if( __name__ == '__main__'):
>
> >> >     import Tkinter
> >> >     import thread
>
> >> >     text = Tkinter.Text()
> >> >     text.pack()
>
> >> >     myQueue = functionQueue(text, 50)
>
> >> >     def gui_loop():
> >> >         try:
> >> >             text.mainloop()
> >> >         except:
> >> >             import os
> >> >             os._exit(1)
>
> >> >     thread.start_new_thread(text.mainloop, ())
>
> >> >     while(True):
> >> >         usrInput = raw_input()
>
> >> >         if(usrInput == "-1"):
> >> >             import os
> >> >             os._exit(0)
>
> >> >         myQueue.add_function(text.insert, ['end', usrInput + "\n"])
> >> >         myQueue.add_function(text.see, ['end'])
>
> >> > [/code]
>
> >> I can make it work over here by putting the UI into the main thread, as
> >> suggested by http://effbot.org/zone/tkinter-threads.htm:
>
> >> import Queue
> >> import Tkinter
> >> import threading
>
> >> class FunctionQueue:
> >>     # unchanged
>
> >> def input_loop():
> >>     while True:
> >>         try:
> >>             usrInput = raw_input()
> >>         except EOFError:
> >>             break
> >>         myQueue.add_function(text.insert, ['end', usrInput + "\n"])
> >>         myQueue.add_function(text.see, ['end'])
> >>     myQueue.add_function(text.quit, [])
>
> >> if __name__ == '__main__':
> >>     text = Tkinter.Text()
> >>     text.pack()
>
> >>     myQueue = FunctionQueue(text, 50)
> >>     threading.Thread(target=input_loop).start()
> >>     text.mainloop()
>
> >> Peter
>
> > Peter, thanks for the suggestion.  I tried your code exactly on my box
> > and I still get the same results.  As soon as I run the script and
> > every time I click on the Text box I get tcl::Bgerror ... just like I
> > mentioned above.  I'm fairly certain that I'm not calling Tkinter
> > functions from any other thread but it's acting as though I am as soon
> > as I create the input thread.
> > If I comment out the input loop thread everything is fine but of
> > course that's not terribly useful as a logging box.
>
> http://bugs.python.org/issue3835
>
> Could tcl have been built without thread support on the failing machines?
>
> Peter

You had it, Peter.  I tried "import pydoc; pydoc.gui()" from the bug
report you referenced and the same thing described there occurred.
After recompiling tcl/tk with --enable-threads and replacing the
Slackware default packages with those, everything is working as I
expected.  Thanks for the help.


Re: How Python Implements "long integer"?

2009-07-05 Thread Pablo Torres N.
On Sun, Jul 5, 2009 at 04:57, Mark Dickinson wrote:
> On Jul 5, 8:38 am, Pedram  wrote:
>> Hello,
>> I'm reading about implementation of long ints in Python. I downloaded
>> the source code of CPython and will read the longobject.c, but from
>> where I should start reading this file? I mean which function is the
>> first?
>
> I don't really understand the question:  what do you mean by 'first'?
> It might help if you tell us what your aims are.

I think he means the entry point; the problem is that libraries have many.


-- 
Pablo Torres N.


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Lino Mastrodomenico
2009/7/5 Hendrik van Rooyen :
> I cannot see how you could avoid a python function call - even if he
> bites the bullet and implements my laborious scheme, he would still
> have to fetch the next character to test against, inside the current state.
>
> So if it is the function calls that is slowing him down, I cannot
> imagine a solution using less than one per character, in which
> case he is screwed no matter what he does.

A simple solution may be to read the whole input HTML file into a
string. This potentially requires lots of memory, but I suspect that
by far the most common use case for this parser is to build a DOM (or
DOM-like) tree of the whole document. This tree usually requires much
more memory than the HTML source itself.

So, if the code duplication is acceptable, I suggest keeping this
implementation for cases where the input is extremely big *AND* the
whole program will work on it in "streaming", not just the parser
itself.

Then write a simpler and faster parser for the more common case when
the data is not huge *OR* the user will keep the whole document in
memory anyway (e.g. on a tree).

Also: profile, profile a lot. HTML pages are very strange beasts and
the bottlenecks may be in innocent-looking places!

-- 
Lino Mastrodomenico


Re: How Python Implements "long integer"?

2009-07-05 Thread Pedram
On Jul 5, 5:04 pm, Mark Dickinson  wrote:

> That's shocking.  Everyone should be English. :-)

Yes, I'm trying :)

> I'd pick one operation (e.g., addition), and trace through the
> relevant functions in longobject.c.  Look at the long_as_number
> table to see where to get started.
>
> In the case of addition, that table shows that the nb_add slot is
> given by long_add.  long_add does any necessary type conversions
> (CONVERT_BINOP) and then calls either x_sub or x_add to do the real
> work.
> x_add calls _PyLong_New to allocate space for a new PyLongObject, then
> does the usual digit-by-digit-with-carry addition.  Finally, it
> normalizes the result (removes any unnecessary zeros) and returns.
>
> As far as memory allocation goes: almost all operations call
> _PyLong_New at some point.  (Except in py3k, where it's a bit more
> complicated because small integers are cached.)

Oh, I didn't see long_as_number before. I'm reading it. That was very
helpful, thanks.

> If you have more specific questions I'll have a go at answering them.
>
> Mark

Thanks a million.
I will write your name in the "Special thanks to" section of my
article (in font size 72) ;)

Pedram
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Hendrik van Rooyen
"Paul Rubin"  wrote:

> The series of tests is written that way because there is no case
> statement available.  It is essentially switching on a bunch of
> character constants and then doing some additional tests in each
> branch.
> 
> It could be that using ord(c) as an index into a list of functions
> might be faster than a dict lookup on c to get a function.  I think
> John is hoping to avoid a function call and instead get an indexed
> jump within the Python bytecode for the big function.

I agree about the ord(c). However, avoiding the function call
is, I think, right now, in the realms of wishful thinking.

I cannot see how you could avoid a python function call - even if he
bites the bullet and implements my laborious scheme, he would still 
have to fetch the next character to test against, inside the current state.

So if it is the function calls that is slowing him down, I cannot
imagine a solution using less than one per character, in which
case he is screwed no matter what he does.

But wait - maybe if he passes an iterator around - the equivalent of
for char in input_stream...
Still no good though, unless the next call to the iterator is faster
than an ordinary python call.
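Here is a minimal sketch of the "pass an iterator around" idea, under toy assumptions. Only text and tags are recognised, and the function names are hypothetical. Each state loops over a shared iterator and keeps consuming characters until it takes a transition. The per-character fetch is done by the `for` loop itself, so the only Python-level calls are one per state visit rather than one per character.

```python
def text_state(chars, out):
    """Toy 'data' state: collect text until '<', then hand over to tag_state."""
    buf = []
    for ch in chars:          # the for loop fetches characters, no Python call
        if ch == "<":
            if buf:
                out.append(("text", "".join(buf)))
            return tag_state
        buf.append(ch)
    if buf:
        out.append(("text", "".join(buf)))
    return None               # input exhausted

def tag_state(chars, out):
    """Toy 'tag' state: collect the tag name until '>'."""
    buf = []
    for ch in chars:
        if ch == ">":
            out.append(("tag", "".join(buf)))
            return text_state
        buf.append(ch)
    return None               # unterminated tag: simply stop (simplification)

def tokenize(html):
    chars = iter(html)        # one iterator shared by every state
    out = []
    state = text_state
    while state is not None:  # one Python call per state visit
        state = state(chars, out)
    return out
```

Whether this wins in practice depends on how long the runs between transitions are. Long text runs benefit the most.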

- Hendrik



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Hendrik van Rooyen
"Steven D'Aprano"  wrote:

>On Sun, 05 Jul 2009 10:12:54 +0200, Hendrik van Rooyen wrote:
>
>> Python is not C.
>
>John Nagle is an old hand at Python. He's perfectly aware of this, and 
>I'm sure he's not trying to program C in Python.
>
>I'm not entirely sure *what* he is doing, and hopefully he'll speak up 
>and say, but whatever the problem is it's not going to be as simple as 
>that.

I am well aware that John is not a newbie.
He was complaining about Python's lack of a case statement
in the context of a state machine.

The point I was trying to make is that, in any state machine, the
reasons for leaving any one state (its "state transitions") are
always a subset of all the choices that _can_ be made.

So drawing a circle round each state in a state diagram, writing
a routine that examines only the arrows leaving that circle, and
returning the destination of the chosen arrow, is a way of splitting
the job up, and it results in making only the relevant decisions at
the time they are relevant.
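A toy sketch of the "one routine per circle" idea, using a made-up machine that tracks whether we are inside a double-quoted string. Each hypothetical state function tests only the arrows that leave its own circle and returns the name of the destination state. A dispatch dict keyed by state name drives the machine, as suggested earlier in the thread.

```python
def outside(ch):
    # Only one arrow leaves this circle: an opening quote
    return "inside" if ch == '"' else "outside"

def inside(ch):
    if ch == "\\":
        return "escape"
    if ch == '"':
        return "outside"
    return "inside"

def escape(ch):
    # Whatever follows a backslash is taken literally
    return "inside"

PROTOCOL = {"outside": outside, "inside": inside, "escape": escape}

def final_state(text, start="outside"):
    state = start
    for ch in text:
        state = PROTOCOL[state](ch)
    return state
```

Storing the functions themselves as return values, instead of their names, would save one dict lookup per step at some cost in readability.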

This is in contrast to the classic C way of making one big
case statement to implement a finite state machine, which
gets its efficiency (if any) out of compiler optimisations
such as replacing a skip chain with a jump table.

I understand that it leads to a lot of what looks like 
boilerplate code, but he was looking for speed...

- Hendrik



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question of style

2009-07-05 Thread Paul Rubin
Steven D'Aprano  writes:
> > I wouldn't say Python's None is terrible,...
> No, wait, I tell a lie... re.search() sometimes bites me, because 
> sometimes it returns None and sometimes it returns a matchobject and I 
> don't use re often enough to have good habits with it yet.

re is a common source of this type of bug, but there are others.

> There are three natural approaches to (say) re.search() for dealing with 
> failure:
> 
> (1) return a sentinel value like None;
> (2) return a matchobject which tests False;
> (3) raise an exception.

4. Have re.search return a bool and possible matchobject separately:

   put_match_here = []
   if re.search(pat, s, target=put_match_here):
       do_something_with(put_match_here[0])

or alternatively (cleaner), have a new type of object which supports
search operations while self-updating with the match object:

   mt = re.match_target()
   ...
   if mt.search(pat, s):
       do_something_with(mt.match)
   if mt.search(pat2, s):
       do_another_thing_with(mt.match)
   ...

This is sort of inspired by what Perl does.  I often do something like
this because it makes it cleaner to chain a series of matches.
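Neither `target=` nor `re.match_target()` exists in the `re` module. They are proposals. Here is a minimal sketch of the second idea as a small user-level class, assuming you are willing to wrap `re.search` yourself. `MatchTarget` and `classify` are hypothetical names.

```python
import re

class MatchTarget:
    """Hypothetical helper: remembers the last match so tests chain cleanly."""

    def __init__(self):
        self.match = None

    def search(self, pattern, string, flags=0):
        self.match = re.search(pattern, string, flags)
        return self.match is not None

def classify(s):
    mt = MatchTarget()
    if mt.search(r"(\d+)-(\d+)", s):
        return ("range", int(mt.match.group(1)), int(mt.match.group(2)))
    if mt.search(r"\d+", s):
        return ("number", int(mt.match.group()))
    return ("other",)
```

In Python 3.8 and later, assignment expressions (`if (m := re.search(pat, s)):`) cover much of the same ground without a helper class.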
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How Python Implements "long integer"?

2009-07-05 Thread Mark Dickinson
On Jul 5, 1:09 pm, Pedram  wrote:
> Thanks for reply,
> Sorry I can't explain too clear! I'm not English ;)

That's shocking.  Everyone should be English. :-)

> But I want to understand the implementation of long int object in
> Python. How Python allocates memory and how it implements operations
> for this object?

I'd pick one operation (e.g., addition), and trace through the
relevant functions in longobject.c.  Look at the long_as_number
table to see where to get started.

In the case of addition, that table shows that the nb_add slot is
given by long_add.  long_add does any necessary type conversions
(CONVERT_BINOP) and then calls either x_sub or x_add to do the real
work.
x_add calls _PyLong_New to allocate space for a new PyLongObject, then
does the usual digit-by-digit-with-carry addition.  Finally, it
normalizes the result (removes any unnecessary zeros) and returns.
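To make the x_add description concrete, here is a Python model of the same steps. It is not a transcription of longobject.c. It assumes 15-bit digits stored least-significant first; the digit width is a build-time detail, and newer CPython builds commonly use 30-bit digits. All function names are illustrative.

```python
SHIFT = 15                 # assumed digit width in bits
MASK = (1 << SHIFT) - 1

def to_digits(n):
    """Split a non-negative int into little-endian base-2**SHIFT digits."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits          # zero is represented by an empty digit list

def from_digits(digits):
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return n

def x_add(a, b):
    """Add two magnitudes given as digit lists, digit by digit with carry."""
    if len(a) < len(b):
        a, b = b, a        # make a the longer operand
    result, carry = [], 0
    for i in range(len(b)):
        carry += a[i] + b[i]
        result.append(carry & MASK)
        carry >>= SHIFT
    for i in range(len(b), len(a)):
        carry += a[i]
        result.append(carry & MASK)
        carry >>= SHIFT
    result.append(carry)
    # Normalize: drop most-significant zero digits
    while result and result[-1] == 0:
        result.pop()
    return result
```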

As far as memory allocation goes: almost all operations call
_PyLong_New at some point.  (Except in py3k, where it's a bit more
complicated because small integers are cached.)

If you have more specific questions I'll have a go at answering them.

Mark
-- 
http://mail.python.org/mailman/listinfo/python-list


ANN: A new version of the Python module which wraps GnuPG has been released.

2009-07-05 Thread Vinay Sajip
A new version of the Python module which wraps GnuPG has been
released.

What Does It Do?

The gnupg module allows Python programs to make use of the
functionality provided by the Gnu Privacy Guard (abbreviated GPG or
GnuPG). Using this module, Python programs can encrypt and decrypt
data, digitally sign documents and verify digital signatures, manage
(generate, list and delete) encryption keys, using proven Public Key
Infrastructure (PKI) encryption technology based on OpenPGP.

This module is expected to be used with Python versions >= 2.4, as it
makes use of the subprocess module which appeared in that version of
Python. Development and testing have been carried out on Windows and
Ubuntu. This module is a newer version derived from earlier work by
Andrew Kuchling, Richard Jones and Steve Traugott.

A test suite using unittest is included with the source distribution.

Simple usage:

>>> import gnupg
>>> gpg = gnupg.GPG(gnupghome='/path/to/keyring/directory')
>>> gpg.list_keys()
[{
  ...
  'fingerprint': 'F819EE7705497D73E3CCEE65197D5DAC68F1AAB2',
  'keyid': '197D5DAC68F1AAB2',
  'length': '1024',
  'type': 'pub',
  'uids': ['', 'Gary Gross (A test user) ']},
 {
  ...
  'fingerprint': '37F24DD4B918CC264D4F31D60C5FEFA7A921FC4A',
  'keyid': '0C5FEFA7A921FC4A',
  'length': '1024',
  ...
  'uids': ['', 'Danny Davis (A test user) ']}]
>>> encrypted = gpg.encrypt("Hello, world!", ['0C5FEFA7A921FC4A'])
>>> str(encrypted)
'-----BEGIN PGP MESSAGE-----\nVersion: GnuPG v1.4.9 (GNU/Linux)\n
\nhQIOA/6NHMDTXUwcEAf
...
-----END PGP MESSAGE-----\n'
>>> decrypted = gpg.decrypt(str(encrypted), passphrase='secret')
>>> str(decrypted)
'Hello, world!'
>>> signed = gpg.sign("Goodbye, world!", passphrase='secret')
>>> verified = gpg.verify(str(signed))
>>> print "Verified" if verified else "Not verified"
Verified

For more information, visit http://code.google.com/p/python-gnupg/ -
as always, your feedback is most welcome (especially bug reports,
patches and suggestions for improvement). Enjoy!

Cheers

Vinay Sajip
Red Dove Consultants Ltd.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PSP Caching

2009-07-05 Thread Johnson Mpeirwe
Thanks Simon,

I got around this behavior by adding "MaxRequestsPerChild  1" (the default
value is 0) to my httpd.conf, which limits the number of requests a child
server process will handle before it dies. However, I think it is important
to keep it at 0 in a production environment.

Regards,
Johnson

On Fri, 3 Jul 2009 10:44:52 -0700 (PDT), Simon Forman wrote
> On Jul 3, 5:18 am, Johnson Mpeirwe  wrote:
> > Hello All,
> >
> > How do I stop caching of Python Server Pages (or whatever causes changes
> > in a page not to be noticed in a web browser)? I am new to developing
> > web applications in Python and after looking at implementations of PSP
> > like Spyce (which I believed introduces new unnecessary non-PSP syntax),
> > I decided to write my own PSP applications from scratch. When I modify a
> > file, I keep getting the old results until I intentionally introduce an
> > error (e.g parse error) and correct it after to have the changes
> > noticed. There's no proxy (I am working on a windows machine unplugged
> > from the network). I have Googled and no documents seem to talk about
> > this. Is there any particular mod_python directive I must set in my
> > Apache configuration to fix this?
> >
> > Any help will be highly appreciated.
> >
> > Johnson
> 
> I don't know much about caching with apache, but the answer might be 
> on this page: http://httpd.apache.org/docs/2.2/caching.html
> 
> Meanwhile, couldn't you just send apache a restart signal when you
> modify your code?
> 
> HTH,
> ~Simon
> -- 
> http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is code duplication allowed in this instance?

2009-07-05 Thread David Robinow
On Sun, Jul 5, 2009 at 5:54 AM, Lawrence
D'Oliveiro wrote:
> In message  c1d1c62d6...@y17g2000yqn.googlegroups.com>, Klone wrote:
>
>> So in this scenario is it OK to duplicate the algorithm to be tested
>> within the test codes or refactor the method such that it can be used
>> within test codes to verify itself(??).
>
> I think you should be put on the management fast-track.
 Heavens, no. He's too valuable as a managee.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How Python Implements "long integer"?

2009-07-05 Thread Pedram
On Jul 5, 1:57 pm, Mark Dickinson  wrote:
> On Jul 5, 8:38 am, Pedram  wrote:
>
> > Hello,
> > I'm reading about implementation of long ints in Python. I downloaded
> > the source code of CPython and will read the longobject.c, but from
> > where I should start reading this file? I mean which function is the
> > first?
>
> I don't really understand the question:  what do you mean by 'first'?
> It might help if you tell us what your aims are.
>
> In any case, you probably also want to look at the Include/
> longintrepr.h and Include/longobject.h files.
>
> Mark

Thanks for reply,
Sorry I can't explain too clear! I'm not English ;)
But I want to understand the implementation of long int object in
Python. How Python allocates memory and how it implements operations
for this object?
I'm reading the source code (longobject.c and, as you said,
longintrepr.h and longobject.h), but if you can help me, I would really
appreciate it.

Pedram
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
John Nagle wrote:
>Here's some actual code, from "tokenizer.py".  This is called once
> for each character in an HTML document, when in "data" state (outside
> a tag).  It's straightforward code, but look at all those
> dictionary lookups.
> 
> def dataState(self):
> data = self.stream.char()
> 
> # Keep a charbuffer to handle the escapeFlag
> if self.contentModelFlag in\
>   (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):
> if len(self.lastFourChars) == 4:
> self.lastFourChars.pop(0)
> self.lastFourChars.append(data)
> 
> # The rest of the logic
> if data == "&" and self.contentModelFlag in\
>   (contentModelFlags["PCDATA"], contentModelFlags["RCDATA"]) and
> not\
>   self.escapeFlag:
> self.state = self.states["entityData"]
> elif data == "-" and self.contentModelFlag in\
>   (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]) and
> not\
>   self.escapeFlag and "".join(self.lastFourChars) == "":
> self.escapeFlag = False
> self.tokenQueue.append({"type": "Characters", "data":data})
> elif data == EOF:
> # Tokenization ends.
> return False
> elif data in spaceCharacters:
> # Directly after emitting a token you switch back to the "data
> # state". At that point spaceCharacters are important so
> they are
> # emitted separately.
> self.tokenQueue.append({"type": "SpaceCharacters", "data":
>   data + self.stream.charsUntil(spaceCharacters, True)})
> # No need to update lastFourChars here, since the first
> space will
> # have already broken any  sequences
> else:
> chars = self.stream.charsUntil(("&", "<", ">", "-"))
> self.tokenQueue.append({"type": "Characters", "data":
>   data + chars})
> self.lastFourChars += chars[-4:]
> self.lastFourChars = self.lastFourChars[-4:]
> return True

Giving this some more thought, I'd also try to split the huge
if-elif-else block like this:

if data in string_with_all_special_characters:
    if data == '&' ...:
        ...
else:
    ...

So there are three things to improve:

- eliminate common subexpressions which you know are constant
- split the large conditional sequence as shown above
- use separate dataState() methods when inside and outside of CDATA/RCDATA
  blocks and (maybe) escaped blocks
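Here is a toy sketch of the first two points, assuming made-up content-model constants and token names. It is not html5lib's actual logic. The constant sets are built once at class creation. The common case, an ordinary character, is answered by a single membership test, and only special characters reach the detailed checks.

```python
# Hypothetical stand-ins for contentModelFlags["..."]
PCDATA, RCDATA, CDATA = "PCDATA", "RCDATA", "CDATA"

class ToyDataState:
    # Built once, instead of rebuilt inside every conditional
    PCDATA_OR_RCDATA = frozenset((PCDATA, RCDATA))
    SPECIAL = frozenset("&<>-")

    def __init__(self, content_model=PCDATA):
        self.content_model = content_model
        self.tokens = []

    def feed_char(self, ch):
        if ch not in self.SPECIAL:
            # Fast path: most characters need just this one test
            self.tokens.append(("Characters", ch))
            return
        # Slow path: only special characters get the detailed checks
        if ch == "&" and self.content_model in self.PCDATA_OR_RCDATA:
            self.tokens.append(("EntityStart", ch))
        elif ch == "<" and self.content_model == PCDATA:
            self.tokens.append(("TagOpen", ch))
        else:
            self.tokens.append(("Characters", ch))
```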

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Paul McGuire
On Jul 5, 3:12 am, "Hendrik van Rooyen"  wrote:
>
> Use a dispatch dict, and have each state return the next state.
> Then you can use strings representing state names, and
> everybody will be able to understand the code.
>
> toy example, not tested, nor completed:
>
> protocol = {"start":initialiser,"hunt":hunter,"classify":classifier,other
> states}
>
> def state_machine():
>     next_step = protocol["start"]()
>     while True:
>         next_step = protocol[next_step]()
>

I've just spent about an hour looking over this code, with a few
comments to inject into the thread here:

- To all those suggesting the OP convert to a dispatch table, be
assured that this code is well aware of this idiom.  It is used
HEAVILY at a macro level, picking through the various HTML states
(starting a tag, reading attributes, reading body, etc.).  There still
are a number of cascading if-elif's within some of these states, and
some of them *may* be candidates for further optimization.

- There is an underlying HTMLInputStream that seems to be doing some
unnecessary position bookkeeping (positionLine and positionCol).
Commenting this out increases my test speed by about 13%.  In my
ignorance, I may be removing some important behavior, but this does
not seem to be critical as I tested against a few megs of HTML
source.  Before blaming the tokenizer for everything, there may be
more performance to be wrung from the input stream processor.  For
that matter, I would guess that about 90% of all HTML files that this
code would process would easily fit in memory - in that case, the
stream processing (and all of the attendant "if I'm not at the end of
the current chunk" code) could be skipped/removed entirely.

- The HTMLInputStream's charsUntil code is an already-identified
bottleneck, and some re enhancements have been applied here to help
out.

- Run-time construction of tuple literals where the tuple members are
constants can be lifted out.  emitCurrentToken rebuilds this tuple
every time it is called (which is a lot!):

    if (token["type"] in (tokenTypes["StartTag"], tokenTypes["EndTag"],
                          tokenTypes["EmptyTag"])):

Move this tuple literal into a class constant (or if you can tolerate
it, a default method argument to gain LOAD_FAST benefits - sometimes
optimization isn't pretty).
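Here is a minimal sketch of both suggestions, assuming a stand-in `tokenTypes` dict with made-up numeric codes. Swapping the tuple for a `frozenset` is my own choice and was not proposed in the thread. All three methods return the same answer. They differ only in how much work happens per call.

```python
# Hypothetical token-type codes standing in for the tokenizer's tokenTypes
tokenTypes = {"Characters": 1, "StartTag": 3, "EndTag": 4, "EmptyTag": 5}

class Emitter:
    # Built once, when the class body runs, instead of on every call
    TAG_TYPES = frozenset((tokenTypes["StartTag"], tokenTypes["EndTag"],
                           tokenTypes["EmptyTag"]))

    def is_tag_rebuilt(self, token):
        # Original shape: three dict lookups and a tuple build per call
        return token["type"] in (tokenTypes["StartTag"], tokenTypes["EndTag"],
                                 tokenTypes["EmptyTag"])

    def is_tag_class_const(self, token):
        # One attribute lookup, then a hashed membership test
        return token["type"] in self.TAG_TYPES

    def is_tag_default_arg(self, token, _tag_types=TAG_TYPES):
        # Default bound at def time; inside the method it is a fast local
        return token["type"] in _tag_types
```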

- These kinds of optimizations are pretty small, and only make sense
if they are called frequently.  Tallying which states are called in my
test gives the following list in decreasing frequency.  Such a list
would help guide your further tuning efforts:

State                                      Calls
tagNameState                              194848
dataState                                 182179
attributeNameState                        116507
attributeValueDoubleQuotedState           114931
tagOpenState                              105556
beforeAttributeNameState                   58612
beforeAttributeValueState                  58216
afterAttributeValueState                   58083
closeTagOpenState                          50547
entityDataState                             1673
attributeValueSingleQuotedState             1098
commentEndDashState                          372
markupDeclarationOpenState                   370
commentEndState                              364
commentStartState                            362
commentState                                 362
selfClosingStartTagState                     359
doctypePublicIdentifierDoubleQuotedState     291
doctypeSystemIdentifierDoubleQuotedState     247
attributeValueUnQuotedState                  191
doctypeNameState                              32
beforeDoctypePublicIdentifierState            16
afterDoctypePublicIdentifierState             14
afterDoctypeNameState                          9
doctypeState                                   8
beforeDoctypeNameState                         8
afterDoctypeSystemIdentifierState              6
afterAttributeNameState                        5
commentStartDashState                          2
bogusCommentState                              2

For instance, I wouldn't bother doing much tuning of the
bogusCommentState.  Anything called fewer than 50,000 times in this
test doesn't look like it would be worth the trouble.
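A tally like the one above can be gathered by counting in the dispatch loop itself. Here is a minimal sketch with a `collections.Counter`, assuming a toy two-state machine. The state names and functions are hypothetical simplifications, not the tokenizer's real states.

```python
from collections import Counter

def data_state(ch):
    return "tagOpen" if ch == "<" else "data"

def tag_open_state(ch):
    return "data" if ch == ">" else "tagOpen"

STATES = {"data": data_state, "tagOpen": tag_open_state}

def run_counting(states, start, chars):
    """Drive a dict-dispatched state machine, counting visits per state."""
    tally = Counter()
    state = start
    for ch in chars:
        tally[state] += 1
        state = states[state](ch)
    return tally
```

`tally.most_common()` then yields the states in decreasing frequency, in the same shape as the list above.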


-- Paul

(Thanks to those who suggested pyparsing as an alternative, but I
think this code is already beyond pyparsing in a few respects.  For
one thing, this code works with an input stream, in order to process
large HTML files; pyparsing *only* works with an in-memory string.
This code can also take advantage of some performance short cuts,
knowing that it is parsing HTML; pyparsing's generic classes can't do
that.)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question of style

2009-07-05 Thread Lie Ryan
Paul Rubin wrote:
> Steven D'Aprano  writes:
>> "if len(x) == 0" is wasteful. Perhaps I've passed you a list-like 
>> iterable instead of a list, and calculating the actual length is O(N). 
> 
> That doesn't happen in any of Python's built-in container types.  I
> could see some value to having a generic "is_empty" predicate on
> containers though, to deal with this situation.  Your iterable could
> support that predicate.  In fact maybe all iterables should support
> that predicate.  They don't (and can't) all support "len".

That doesn't happen because all of Python's built-in containers keep
track of their own length and can determine it quickly. But outside the
builtins, certain iterables, e.g. iterators, cannot determine their own
length quickly, yet may have alternative ways to determine whether they
are empty (through "private" attributes). If you peek at such a private
attribute, you're breaking encapsulation. Although Python is a language
of consenting adults and doesn't really have true private members, code
that breaks encapsulation is prone to bugs.
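Here is a minimal Python 3 sketch of an object that knows whether it is empty without knowing its length. It is a hypothetical wrapper around any iterable. `__bool__` buffers at most one item, so `if x:` works, while `len(x)` would raise `TypeError`. The caller never has to peek at anything private.

```python
class Peekable:
    """Hypothetical wrapper: answers 'am I empty?' without computing a length."""
    _EMPTY = object()   # private marker meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._head = self._EMPTY

    def __bool__(self):
        if self._head is self._EMPTY:
            # Fetch (and keep) one item to learn whether any exist
            self._head = next(self._it, self._EMPTY)
        return self._head is not self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._head is not self._EMPTY:
            item, self._head = self._head, self._EMPTY
            return item
        return next(self._it)
```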

>>> Yes, it saves a few keystrokes to say "if x:" instead of "if
>>> len(x)==0:" or even "if bool(x):",
>> It's not about saving keystrokes -- that's a furphy. It's about
>> encapsulation. Objects are in a better position to recognise when
>> they are "something" (true) or "nothing" (false) than you are.
>
> I don't know what a furphy is, but I don't accept that "somethingness"
> vs. "nothingness" is the same distinction as truth vs falsehood.  True
> and False are values in a specific datatype (namely bool), not
> abstract qualities of arbitrary data structures.  The idea that the
> "if" statement selects between "somethingness" and "nothingness"
> rather than between True and False is a bogus re-imagining of the
> traditional function of an "if" statement and has been an endless
> source of bugs in Python code.  Look how much confusion it causes here
> in the newsgroup all the time.

Neither Python's `if` nor `if` in formal logic is about testing True vs.
False. `if` in Python and in formal logic receives a statement. The
statement must be evaluable to True or False, but does not have to be
True or False itself. It just happens that True evaluates to True
and False evaluates to False. For example, the statement:

P := `e = 2.7182818284590451`
Q := `e = m*c**2`
--
P -> Q

P -> Q evaluates to:
`e = 2.7182818284590451` -> `e = m*c**2`

Note that `e = 2.7182818284590451` is a statement, not a boolean value.
The truth value of `e = 2.7182818284590451` is determined by "calling"
(note the double quotes) `e = 2.7182818284590451`.statement_is_true(),
which when written in python syntax becomes: (e ==
2.7182818284590451).__bool__()
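For reference, here is how Python 3 turns an arbitrary object into a truth value in an `if` statement. It calls `__bool__` if defined, otherwise falls back to `__len__` (non-zero means true), and treats objects defining neither as true. The classes below are throwaway examples.

```python
class HasBool:
    def __bool__(self):
        return False

class HasLenOnly:
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

class HasNeither:
    pass

def truth(x):
    # Same protocol `if x:` uses: __bool__, then __len__, else true
    return "true" if x else "false"
```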

>> If you write len(x)==0 Python doesn't complain if x is a dict
>> instead of the list you were expecting. Why is it acceptable to
>> duck-type len(x) but not truth-testing?
> 
> I haven't seen as many bugs coming from generic "len" as from
> something-vs-nothing confusion.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
Paul Rubin wrote:
> Stefan Behnel writes:
>> You may notice that the creation of this exact tuple appears in almost all
>> of the conditionals of this method. So it is part of the bottleneck.
> 
> I don't think so.  The tuple is only created when the character has
> already matched, and for the vast majority of the chars in the input
> stream (ordinary text chars rather than html delimiters) none of them
> match.

Well, it's the second thing that happens when entering the method, it
happens several times later on when specific characters are matched, and it
also happens at the end when none of the special characters did match. So
it /is/ part of the bottleneck because the dict lookups, the tuple
creation, the "in" test and the tuple deallocation happen *twice* for
almost all characters in the stream.

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Paul Rubin
Stefan Behnel  writes:
> You may notice that the creation of this exact tuple appears in almost all
> of the conditionals of this method. So it is part of the bottleneck.

I don't think so.  The tuple is only created when the character has
already matched, and for the vast majority of the chars in the input
stream (ordinary text chars rather than html delimiters) none of them
match.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: question of style

2009-07-05 Thread Paul Rubin
Steven D'Aprano  writes:
> > Yes, it saves a few keystrokes to say "if x:" instead of "if
> > len(x)==0:" or even "if bool(x):", 
> 
> It's not about saving keystrokes -- that's a furphy. It's about 
> encapsulation. Objects are in a better position to recognise when they 
> are "something" (true) or "nothing" (false) than you are.

I don't know what a furphy is, but I don't accept that "somethingness"
vs. "nothingness" is the same distinction as truth vs falsehood.  True
and False are values in a specific datatype (namely bool), not
abstract qualities of arbitrary data structures.  The idea that the
"if" statement selects between "somethingness" and "nothingness"
rather than between True and False is a bogus re-imagining of the
traditional function of an "if" statement and has been an endless
source of bugs in Python code.  Look how much confusion it causes here
in the newsgroup all the time.

> "if len(x) == 0" is wasteful. Perhaps I've passed you a list-like 
> iterable instead of a list, and calculating the actual length is O(N). 

That doesn't happen in any of Python's built-in container types.  I
could see some value to having a generic "is_empty" predicate on
containers though, to deal with this situation.  Your iterable could
support that predicate.  In fact maybe all iterables should support
that predicate.  They don't (and can't) all support "len".
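No such predicate exists in the standard library. Here is one hedged way to sketch it in Python 3 with `functools.singledispatch`. Anything `Sized` answers via `len()`, a container type can register a cheaper test, and plain iterators are rejected because checking would consume them. `is_empty` and `LazyRange` are hypothetical names.

```python
from collections.abc import Sized
from functools import singledispatch

@singledispatch
def is_empty(obj):
    raise TypeError(f"don't know how to test {type(obj).__name__} for emptiness")

@is_empty.register(Sized)
def _(obj):
    return len(obj) == 0

class LazyRange:
    """Hypothetical container that does not implement len()."""
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def __iter__(self):
        return iter(range(self.start, self.stop))

@is_empty.register(LazyRange)
def _(obj):
    return obj.start >= obj.stop   # O(1), no counting needed
```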

> If you write len(x)==0 Python doesn't complain if x is a dict
> instead of the list you were expecting. Why is it acceptable to
> duck-type len(x) but not truth-testing?

I haven't seen as many bugs coming from generic "len" as from
something-vs-nothing confusion.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
Paul Rubin wrote:
> Stefan Behnel writes:
>>> # Keep a charbuffer to handle the escapeFlag
>>> if self.contentModelFlag in\
>>>   (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):
>> Is the tuple
>>  (contentModelFlags["CDATA"], contentModelFlags["RCDATA"])
>> constant? If that is the case, I'd cut it out into a class member ...
> 
> I think the main issue for that function comes after that if statement.
> There is a multi-way switch on a bunch of different possible character
> values.  I do agree with you that the first "if" can also be optimized.

You may notice that the creation of this exact tuple appears in almost all
of the conditionals of this method. So it is part of the bottleneck.

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
Paul Rubin wrote:
> Stefan Behnel writes:
>>> The series of tests is written that way because there is no case
>>> statement available.  It is essentially switching on a bunch of
>>> character constants and then doing some additional tests in each
>>> branch.
>> Although doing some of the tests first and then checking the input
>> conditionally might be faster here.
> 
> That is essentially what happens.  There are a bunch of tests of the
> form
>if data=='<' and [some other stuff]: ...

That's what I meant. Some of the "other stuff" is redundant enough to do it
once at the beginning of the function (or even before entering the
function, by writing specialised methods), i.e. I'd (partially) reverse the
order of the "and" operands.

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How Python Implements "long integer"?

2009-07-05 Thread Mark Dickinson
On Jul 5, 8:38 am, Pedram  wrote:
> Hello,
> I'm reading about implementation of long ints in Python. I downloaded
> the source code of CPython and will read the longobject.c, but from
> where I should start reading this file? I mean which function is the
> first?

I don't really understand the question:  what do you mean by 'first'?
It might help if you tell us what your aims are.

In any case, you probably also want to look at the Include/
longintrepr.h and Include/longobject.h files.

Mark
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is code duplication allowed in this instance?

2009-07-05 Thread Lawrence D'Oliveiro
In message , Klone wrote:

> So in this scenario is it OK to duplicate the algorithm to be tested
> within the test codes or refactor the method such that it can be used
> within test codes to verify itself(??).

I think you should be put on the management fast-track.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Paul Rubin
Stefan Behnel  writes:
> > The series of tests is written that way because there is no case
> > statement available.  It is essentially switching on a bunch of
> > character constants and then doing some additional tests in each
> > branch.
> Although doing some of the tests first and then checking the input
> conditionally might be faster here.

That is essentially what happens.  There are a bunch of tests of the
form
   if data=='<' and [some other stuff]: ...

Because of short-circuit evaluation of "and", the additional tests
only happen once the character has been matched.
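Here is a small demonstration of that short-circuit behaviour, using a hypothetical `expensive_check` that records each time it runs. The right-hand operand of `and` is evaluated only for the one character that passes the cheap comparison.

```python
calls = []

def expensive_check():
    calls.append("called")   # record every evaluation
    return True

def branch(data):
    # expensive_check() runs only when data == "<" is already True
    if data == "<" and expensive_check():
        return "tag"
    return "text"
```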
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Multi thread reading a file

2009-07-05 Thread Lawrence D'Oliveiro
In message <025ff4f1$0$20657$c3e8...@news.astraweb.com>, Steven D'Aprano wrote:

> On Sun, 05 Jul 2009 12:12:22 +1200, Lawrence D'Oliveiro wrote:
> 
>> In message 
>> <1beffd94-cfe6-4cf6-bd48-2ccac8637...@j32g2000yqh.googlegroups.com>, ryles 
>> wrote:
>> 
>> # Oh... yeah. I really *did* want 'is None' and not '== None' which
>> # iter() will do. Sorry guys!
>>> 
>>> Please don't let this happen to you too ;)
>> 
>> Strange. Others have got told off for using "== None" instead of "is
>> None", and yet it turns out Python itself does exactly the same thing.
> 
> That's not "strange", that's a bug.

It's not a bug, as Gabriel Genellina has pointed out.
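The documented behaviour of the two-argument form `iter(callable, sentinel)` is to stop when the returned value is *equal* to the sentinel. In CPython that comparison checks identity first and then falls back to `==`. Here is a minimal sketch showing the difference from an explicit `is` test, using a hypothetical object that claims to equal `None`.

```python
class EqualsNone:
    """Hypothetical object that compares equal to None without being None."""
    def __eq__(self, other):
        return other is None
    __hash__ = object.__hash__   # defining __eq__ would otherwise drop hashing

def read_until_sentinel(values, sentinel=None):
    """iter(callable, sentinel) stops at the first value == sentinel."""
    it = iter(values)
    return list(iter(lambda: next(it, sentinel), sentinel))

def read_until_identity(values, sentinel=None):
    """The same loop written with an explicit 'is' test."""
    out = []
    for value in values:
        if value is sentinel:
            break
        out.append(value)
    return out
```

For ordinary data the two agree. They diverge only for objects with an unusual `__eq__`.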

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Paul Rubin
Stefan Behnel  writes:
> > # Keep a charbuffer to handle the escapeFlag
> > if self.contentModelFlag in\
> >   (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):
> Is the tuple
>   (contentModelFlags["CDATA"], contentModelFlags["RCDATA"])
> constant? If that is the case, I'd cut it out into a class member ...

I think the main issue for that function comes after that if statement.
There is a multi-way switch on a bunch of different possible character
values.  I do agree with you that the first "if" can also be optimized.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
Paul Rubin wrote:
> Steven D'Aprano writes:
>> Yes, I'm aware of that, but that's not what John's code is doing -- he's 
>> doing a series of if expr ... elif expr tests. I don't think a case 
>> statement can do much to optimize that.
> 
> The series of tests is written that way because there is no case
> statement available.  It is essentially switching on a bunch of
> character constants and then doing some additional tests in each
> branch.

Although doing some of the tests first and then checking the input
conditionally might be faster here.

Another idea: You could exchange the methods whenever self.contentModelFlag
changes, i.e. you'd have a "dataState_CDATA", a "dataState_PCDATA" etc.
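Here is a minimal sketch of that method-swapping idea, assuming a toy tokenizer in which only `&` handling differs by content model. The class and method names are hypothetical. RCDATA is treated like PCDATA here because both allow character references. The content-model test moves from once per character to once per change of the flag.

```python
class Tokenizer:
    """Hypothetical sketch: rebind data_state whenever the content model changes."""

    def __init__(self):
        self.content_model = "PCDATA"

    @property
    def content_model(self):
        return self._content_model

    @content_model.setter
    def content_model(self, value):
        self._content_model = value
        # Choose the specialised method once, here, instead of testing
        # the flag inside data_state for every character
        self.data_state = (self.data_state_cdata if value == "CDATA"
                           else self.data_state_pcdata)

    def data_state_pcdata(self, ch):
        return "entity" if ch == "&" else "text"

    def data_state_cdata(self, ch):
        return "text"   # '&' has no special meaning in CDATA
```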


> It could be that using ord(c) as an index into a list of functions
> might be faster than a dict lookup on c to get a function.

Rather unlikely, given that calling "ord(c)" involves a dict lookup for
"ord". You might get away with the pre-initialised keywords trick, though.


> I think
> John is hoping to avoid a function call and instead get an indexed
> jump within the Python bytecode for the big function.

Hmm, yes, the actual code inside the conditionals is pretty short, so the
call overhead might hurt here.

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
Stefan Behnel wrote:
> John Nagle wrote:
>>Here's some actual code, from "tokenizer.py".  This is called once
>> for each character in an HTML document, when in "data" state (outside
>> a tag).  It's straightforward code, but look at all those
>> dictionary lookups.
>>
>> def dataState(self):
>> data = self.stream.char()
>>
>> # Keep a charbuffer to handle the escapeFlag
>> if self.contentModelFlag in\
>>   (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):
> 
> Is the tuple
> 
>   (contentModelFlags["CDATA"], contentModelFlags["RCDATA"])
> 
> constant? If that is the case, I'd cut it out into a class member (or
> module-local variable) first thing in the morning.

Ah, and there's also this little trick to make it a (fast) local variable
in that method:

def some_method(self, some_const=(1,2,3,4)):
    ...
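Whether the default-argument trick pays off is easy to measure with `timeit`. Here is a minimal Python 3 harness, assuming a stand-in `contentModelFlags` dict with made-up values. No timings are claimed here, because they vary by machine and Python version. One caveat: a caller can override a default argument, so this is an optimisation for internal methods.

```python
import timeit

# Hypothetical stand-in for the tokenizer's contentModelFlags
contentModelFlags = {"PCDATA": 0, "RCDATA": 1, "CDATA": 2}

def check_rebuilt(flag):
    # Global lookup, two dict lookups and a tuple build on every call
    return flag in (contentModelFlags["CDATA"], contentModelFlags["RCDATA"])

def check_default(flag, _cdata_like=(contentModelFlags["CDATA"],
                                     contentModelFlags["RCDATA"])):
    # Tuple built once at def time; inside, it is a fast local
    return flag in _cdata_like

def compare(number=200_000):
    """Return seconds taken by each variant for `number` calls."""
    return {name: timeit.timeit(lambda f=f: f(1), number=number)
            for name, f in (("rebuilt", check_rebuilt),
                            ("default", check_default))}
```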

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Paul Rubin
Steven D'Aprano  writes:
> Yes, I'm aware of that, but that's not what John's code is doing -- he's 
> doing a series of if expr ... elif expr tests. I don't think a case 
> statement can do much to optimize that.

The series of tests is written that way because there is no case
statement available.  It is essentially switching on a bunch of
character constants and then doing some additional tests in each
branch.

It could be that using ord(c) as an index into a list of functions
might be faster than a dict lookup on c to get a function.  I think
John is hoping to avoid a function call and instead get an indexed
jump within the Python bytecode for the big function.
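The two dispatch styles Paul compares can be sketched as follows. The handlers and the characters they react to are hypothetical. Both variants still make one Python function call per character, which is exactly the overhead John hopes to avoid. Which one is faster can only be settled by measuring:

```python
# Hypothetical handlers; a real tokenizer does more work in each branch.
def on_amp(c):
    return "entity"


def on_lt(c):
    return "tag"


def on_other(c):
    return "text"


# Variant 1: dict keyed by the character itself.
DISPATCH_DICT = {"&": on_amp, "<": on_lt}


def classify_dict(c):
    return DISPATCH_DICT.get(c, on_other)(c)


# Variant 2: list indexed by ord(c). It only covers the first 128 code
# points, so anything beyond that range goes to the default handler.
DISPATCH_LIST = [on_other] * 128
DISPATCH_LIST[ord("&")] = on_amp
DISPATCH_LIST[ord("<")] = on_lt


def classify_list(c):
    code = ord(c)
    handler = DISPATCH_LIST[code] if code < 128 else on_other
    return handler(c)
```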


Re: Adding an object to the global namespace through "f_globals": is that allowed?

2009-07-05 Thread Stef Mientki

Terry Reedy wrote:
> Stef Mientki wrote:
>> hello,
>>
>> I need to add an object's name to the global namespace.
>> The reason for this is to create an environment,
>> where you can add some kind of math environment,
>> where no Python knowledge is needed.
>>
>> The next statement works,
>> but I'm not sure if it will have any dramatic side effects,
>> other than overruling a possible object with the name A
>>
>> def some_function(...):
>>     A = object(...)
>>     sys._getframe(1).f_globals[Name] = A
>
> global name
> name = A
>
> or if name is a string var
> globals()[name] = A

Great, the last 2 lines work in most cases.
To get everything working correctly, I have to use both globals()
and f_globals, but everything seems to work perfectly now.

thanks for all the hints.
cheers,
Stef



Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 01:58:13 -0700, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> Okay, we get it. Parsing HTML 5 is a bitch. What's your point? I don't
>> see how a case statement would help you here: you're not dispatching on
>> a value, but running through a series of tests until one passes.
> 
> A case statement switch(x):... into a bunch of constant case labels
> would be able to use x as an index into a jump vector, and/or do an
> unrolled logarithmic (bisection-like) search through the tests, instead
> of a linear search.

Yes, I'm aware of that, but that's not what John's code is doing -- he's 
doing a series of if expr ... elif expr tests. I don't think a case 
statement can do much to optimize that.



-- 
Steven


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Steven D'Aprano
On Sun, 05 Jul 2009 10:12:54 +0200, Hendrik van Rooyen wrote:

> Python is not C.

John Nagle is an old hand at Python. He's perfectly aware of this, and 
I'm sure he's not trying to program C in Python.

I'm not entirely sure *what* he is doing, and hopefully he'll speak up 
and say, but whatever the problem is it's not going to be as simple as 
that.


-- 
Steven


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
John Nagle wrote:
> Paul Rubin wrote:
>> John Nagle  writes:
>>> Python doesn't have a "switch" or "case" statement, and when
>>> you need a state machine with many states, that makes for painful,
>>> slow code.  ...
>>> There's a comment in the code that it would be useful
>>> to run a few billion lines of HTML through an instrumented version
>>> of the parser to decide in which order the IF statements should be
>>> executed.  You shouldn't have to do that.
>>
>> In that particular program it would probably be better to change those
>> if/elif/elif/else constructs to dictionary lookups.  I see the program
>> already does that for some large tables.
> 
> A dictionary lookup (actually, several of them) for every
> input character is rather expensive.

Did you implement this and prove your claim in benchmarks? Taking a look at
the current implementation, I'm pretty sure a dict-based implementation
would outrun it on your first try.
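A benchmark of the kind Stefan asks for could start from a `timeit` sketch like the one below. The input text, the character classes and the function names are made up for illustration and are far simpler than the real tokenizer. The timings it prints therefore say nothing definite about that code; they only show how to compare an if/elif chain with a dict lookup on equal terms:

```python
import timeit

TEXT = "<p>Fish &amp; chips</p>\n" * 200  # illustrative input only


def count_if(text):
    # Classify each character with an if/elif chain.
    counts = {"tag": 0, "entity": 0, "newline": 0, "text": 0}
    for c in text:
        if c == "<" or c == ">":
            counts["tag"] += 1
        elif c == "&":
            counts["entity"] += 1
        elif c == "\n":
            counts["newline"] += 1
        else:
            counts["text"] += 1
    return counts


KIND = {"<": "tag", ">": "tag", "&": "entity", "\n": "newline"}


def count_dict(text, kind=KIND):
    # Classify each character with a single dict lookup.
    counts = {"tag": 0, "entity": 0, "newline": 0, "text": 0}
    for c in text:
        counts[kind.get(c, "text")] += 1
    return counts


if __name__ == "__main__":
    assert count_if(TEXT) == count_dict(TEXT)
    for fn in (count_if, count_dict):
        seconds = timeit.timeit(lambda: fn(TEXT), number=100)
        print(f"{fn.__name__}: {seconds:.3f} s")
```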

Stefan


Re: question of style

2009-07-05 Thread Steven D'Aprano
On Sat, 04 Jul 2009 23:17:21 -0700, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> Certain people -- a tiny minority -- keep trying to argue that the
>> ability to say "if obj" for arbitrary objects is somehow a bad thing,
>> and their arguments seem to always boil down to: "If you write code
>> that assumes that only bools have a truth value, then surprising things
>> will happen because all objects have a truth value."
> 
> I'd put it under the general rubric of "explicit is better than
> implicit".  

"if x" is explicit. It's difficult to see how a branch could be anything 
other than explicit, but never mind.


> The language shouldn't do silly automatic typecasts all over
> the place.  

"if x" doesn't involve a typecast. Python doesn't have typecasts, except 
possibly for the special case of myobject.__class__ = Another_Class.

If you mean Python shouldn't do silly automatic type conversions all over 
the place, I absolutely agree with you! Fortunately, testing the truth 
value of an object isn't a silly automatic type conversion.


> Yes, it saves a few keystrokes to say "if x:" instead of "if
> len(x)==0:" or even "if bool(x):", 


It's not about saving keystrokes -- that's a furphy. It's about 
encapsulation. Objects are in a better position to recognise when they 
are "something" (true) or "nothing" (false) than you are.

Given an arbitrary object x, how do you know if it's something or 
nothing? In general, you can't tell -- but the object can, provided it's 
well written. (The conspicuous exception is iterators, but that's a 
special case.)
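The iterator exception is easy to trip over. A generator cannot know whether it will produce anything without running. Generator objects define neither `__bool__` nor `__len__`, so they are always true. The function name `evens` is just for illustration:

```python
def evens(numbers):
    # A generator expression: it cannot tell whether it will yield
    # anything without actually running.
    return (n for n in numbers if n % 2 == 0)


empty_list = []
gen = evens([1, 3, 5])

print(bool(empty_list))  # False: the list knows it is empty
print(bool(gen))         # True: generators have no __bool__ or __len__,
                         # so they are true even when they yield nothing
print(list(gen))         # []
```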

"if len(x) == 0" is wasteful. Perhaps I've passed you a list-like 
iterable instead of a list, and calculating the actual length is O(N). 
Why spend all that effort to find out that the length of the object is
59,872,819 only to throw that away? My object knows when it's empty, but
instead you make foolish assumptions about the object and consequently 
write wastefully slow code.
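Here is a sketch of the kind of object Steven describes, assuming a hypothetical `LazyLines` class that iterates over the lines of a file. Counting the lines needs a full pass, but deciding whether there is any line at all needs only one read. The class therefore answers truth tests cheaply through `__bool__`, which is spelled `__nonzero__` in Python 2:

```python
class LazyLines:
    """Hypothetical list-like iterable over the lines of a file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            yield from f

    def __len__(self):
        # O(N): reads every line just to count them.
        return sum(1 for _ in self)

    def __bool__(self):
        # O(1): reads at most one line. readline() returns "" only at EOF.
        with open(self.path) as f:
            return f.readline() != ""
```

`if lines:` calls `__bool__` and reads at most one line. `if len(lines) == 0:` calls `__len__` and reads the whole file. If a class defines no `__bool__`, Python falls back to `__len__` for truth testing.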


> but if I program in a style where I
> like to think I know the type of something when I use it, I'd like the
> interpreter to let me know when I'm wrong instead of proceeding
> silently.

Oh come on now... that's a silly objection. If you want strongly-typed 
variables in Python, say so, don't pretend this is a failure of truth-
testing. If you write len(x)==0 Python doesn't complain if x is a dict 
instead of the list you were expecting. Why is it acceptable to duck-type 
len(x) but not truth-testing?
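Here is Steven's duck-typing point in miniature. Neither test below checks the type of its argument, and for the built-in containers they agree. The function names are illustrative:

```python
def empty_by_len(x):
    return len(x) == 0


def empty_by_truth(x):
    return not x


# Neither function checks the type; both accept a dict, a str, a set...
for x in ([], {}, "", (), set(), [0], {"k": 1}, "a"):
    print(type(x).__name__, empty_by_len(x), empty_by_truth(x))
```

The one behavioural difference is that `len()` raises `TypeError` for objects without a `__len__`, such as `None` or an integer. Truth testing, by contrast, accepts every object.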


-- 
Steven


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Stefan Behnel
John Nagle wrote:
> Here's some actual code, from "tokenizer.py".  This is called once
> for each character in an HTML document, when in "data" state (outside
> a tag).  It's straightforward code, but look at all those
> dictionary lookups.
> 
> def dataState(self):
>     data = self.stream.char()
> 
>     # Keep a charbuffer to handle the escapeFlag
>     if self.contentModelFlag in\
>       (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):

Is the tuple

(contentModelFlags["CDATA"], contentModelFlags["RCDATA"])

constant? If that is the case, I'd cut it out into a class member (or
module-local variable) first thing in the morning. And I'd definitely keep
the result of the "in" test in a local variable for reuse, seeing how many
times it's used in the rest of the code.
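Here is a sketch of both suggestions. It uses a hypothetical `contentModelFlags` table, a character iterator standing in for `self.stream.char()`, and placeholder branch bodies (the real method does much more). The tuple is built once at import time, and the result of the `in` test is stored in a local and reused. This assumes the flag values never change while the program runs:

```python
# Hypothetical flag table standing in for the tokenizer's contentModelFlags.
contentModelFlags = {"PCDATA": 0, "RCDATA": 1, "CDATA": 2, "PLAINTEXT": 3}

# Built once at import time instead of on every call.
ESCAPE_FLAGS = (contentModelFlags["CDATA"], contentModelFlags["RCDATA"])


class Tokenizer:
    def __init__(self, flag, chars):
        self.contentModelFlag = flag
        self.chars = iter(chars)  # stand-in for self.stream

    def dataState(self):
        data = next(self.chars)
        # Evaluate the membership test once and reuse the local result
        # in every later branch that needs it.
        escaping = self.contentModelFlag in ESCAPE_FLAGS
        if escaping and data == "-":
            return "maybe-comment-end"   # placeholder branch
        if escaping and data == "<":
            return "maybe-end-tag"       # placeholder branch
        return "text"
```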

Writing inefficient code is not something to blame the language for.

Stefan


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Paul Rubin
Steven D'Aprano  writes:
> Okay, we get it. Parsing HTML 5 is a bitch. What's your point? I don't 
> see how a case statement would help you here: you're not dispatching on a 
> value, but running through a series of tests until one passes. 

A case statement switch(x):... into a bunch of constant case labels
would be able to use x as an index into a jump vector, and/or do an
unrolled logarithmic (bisection-like) search through the tests,
instead of a linear search.
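Python has no switch statement that compiles to a jump table. The bisection half of Paul's idea, however, can be approximated with the standard `bisect` module: a sorted list of range boundaries is searched in O(log n) comparisons instead of walking an if/elif chain. The boundaries and category names below are illustrative and are not taken from the tokenizer:

```python
from bisect import bisect_right

# Sorted lower bounds of code-point ranges, and the category of each range.
BOUNDS = [0, ord("0"), ord("9") + 1, ord("A"), ord("Z") + 1, ord("a"), ord("z") + 1]
CATEGORIES = ["other", "digit", "other", "upper", "other", "lower", "other"]


def category(c):
    # bisect_right locates the last bound <= ord(c) in O(log n) comparisons.
    return CATEGORIES[bisect_right(BOUNDS, ord(c)) - 1]
```

For a dense range such as ASCII, a 128-entry list indexed by `ord(c)` gives constant-time lookup directly. Bisection pays off when the ranges are sparse or wide.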

