Re: Linux users: please run gui tests

2015-08-07 Thread David Bolen
Terry Reedy  writes:

> and report here python version, linux system, and result.
> Alteration of environment and locale is a known issue, skip that.

Using source builds on my slave (bolen-ubuntu):

Linux buildbot-ubuntu 4.1.0-x86_64-linode59 #1 SMP Mon Jun 22 10:39:23 EDT 2015 
x86_64 x86_64 x86_64 GNU/Linux
NOTE: This is a 32-bit userspace system, just with a 64-bit kernel

Python 3.6.0a0 (default:e56893df8e76, Aug  7 2015, 16:36:30) 
[GCC 4.8.4] on linux

[1/3] test_tk
[2/3] test_ttk_guionly
[3/3] test_idle
All 3 tests OK.

Python 3.5.0b4+ (3.5:b9a0165a3de8, Aug  7 2015, 16:21:51) 
[GCC 4.8.4] on linux

[1/3] test_tk
[2/3] test_ttk_guionly
[3/3] test_idle
All 3 tests OK.

Python 3.4.3+ (3.4:f5069e6e4229, Aug  7 2015, 16:38:53) 
[GCC 4.8.4] on linux

[1/3] test_tk
[2/3] test_ttk_guionly
[3/3] test_idle
All 3 tests OK.

I have also adjusted the slave to run under Xvfb so the tests should
be included going forward.

-- David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python FTP timeout value not effective

2013-09-02 Thread David Bolen
John Nagle  writes:

> Here's the relevant code:
>
> TIMEOUTSECS = 60  ## give up waiting for server after 60 seconds
> ...
> def urlopen(url,timeout=TIMEOUTSECS) :
> if url.endswith(".gz") :  # gzipped file, must decompress first
> nd = urllib2.urlopen(url,timeout=timeout) # get connection
>   ... # (NOT .gz FILE, DOESN'T TAKE THIS PATH)
> else :
>   return(urllib2.urlopen(url,timeout=timeout)) # (OPEN FAILS)
>
>
> TIMEOUTSECS used to be 20 seconds, and I increased it to 60. It didn't
> help.

I apologize if it's an obvious question, but is there any possibility that
the default value to urlopen is not being used, but some other timeout is
being supplied?  Or that somehow TIMEOUTSECS is being redefined before
being used by the urlopen definition?  Can you (or have you) verified the
actual timeout parameter value being supplied to urllib2.urlopen?

The fact that you seem to still be timing out very close to the prior 20s
timeout value seems a little curious, since there's no timeout by default
(barring an application level global socket default), so it feels like a
value being supplied.

Not sure which 2.7 you're using, but I tried the below with both 2.7.3 and
2.7.5 on Linux since they were handy, and the timeout parameter seems to be
working properly at least in a case I can simulate (xxx is a firewalled
host so the connection attempt just gets black-holed until the timeout):

 >>> import time, urllib2
 >>> def test(timeout):
 ...   print time.ctime()
 ...   try:
 ... urllib2.urlopen('ftp://xxx', timeout=timeout)
 ...   except:
 ... print 'Error'
 ...   print time.ctime()
 ... 
 >>> test(5)
 Mon Sep  2 17:36:15 2013
 Error
 Mon Sep  2 17:36:20 2013
 >>> test(20)
 Mon Sep  2 17:36:23 2013
 Error
 Mon Sep  2 17:36:44 2013
 >>> test(60)
 Mon Sep  2 17:36:50 2013
 Error
 Mon Sep  2 17:37:50 2013

It's tougher to simulate a host that artificially delays the connection
attempt but then succeeds, so maybe it's an issue related specifically to
that implementation.  Depending on how the delay is implemented (delaying
SYN response versus accepting the connection but just delaying the welcome
banner, for example), I suppose it may be tickling some very specific bug.

Since all communication essentially boils down to I/O over the socket, it
seems to me likely that those cases should still fail over time periods
related to the timeout supplied, unlike your actual results, which makes me
wonder about the actual urllib2.urlopen timeout parameter.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Migrate from Access 2010 / VBA

2012-11-27 Thread David Bolen
kgard  writes:

> I am the lone developer of db apps at a company of 350+
> employees. Everything is done in MS Access 2010 and VBA. I'm
> frustrated with the limitations of this platform and have been
> considering switching to Python. I've been experimenting with the
> language for a year or so, and feel comfortable with the basics.
(...)
> Has anyone here made this transition successfully? If so, could you
> pass along your suggestions about how to do this as quickly and
> painlessly as possible?

I went through a very similar transition a few years ago from
standalone Access databases (with GUI forms, queries and reports, as
well as replication) to a pure web application with full reporting
(albeit centrally designed and not a report designer for users).

I suppose my best overall suggestion is to migrate the data first and
independently of any other activities.  Unless your uses for Access in
terms of GUI or reporting are extremely limited, don't try to replace
your current system in one swoop, and in particular, be willing to
continue allowing Access as long as necessary for GUI/reports until
you're sure you've matched any current capabilities with an alternate
approach.  For all its warts, as a database GUI and reporting tool,
Access has a lot going for it, and it can be more complex than you may
think to replicate elsewhere.

So the first thing I would suggest is to plan and implement a
migration of the data itself.  In my case I migrated the data from
Access into PostgreSQL.  That process itself took some planning and
testing in terms of moving the data, and then correcting various bits
of the schemas and data types (if I recall, booleans didn't round-trip
properly at first), so was actually a series of conversions until I
was happy, during which time everyone was using Access as usual.

To support the migration, I created a mirror Access database to the
production version, but instead of local Jet tables, I linked all the
tables to the PostgreSQL server. All other aspects of the Access
database (e.g., forms, reports, queries) remained the same, just now
working off of the remote data.  This needed testing too - for
example, some multi-level joining in Access queries can be an issue.
In some cases it was easier for me to migrate selected Access query
logic into a database view and then replace the query in Access to use
the view.  You also need to (painfully) set any UI aspects of the
table definitions manually since the linking process doesn't set that
up, for which I used the original Access db as a model.  I ended up doing
that multiple times as I evolved the linked database, and I'll admit that
was seriously tedious.

While not required, I also wrapped up my new linked Access database
into a simple installer (InnoSetup based in my case).  Prior to this
everyone was just copying the mdb file around, but afterwards I had an
installer they just ran to be sure they had the latest version.

If you do this part carefully, for your end users, aside from
installing the new database, they see absolutely no difference, but
you now have easy central access to the data, and most importantly can
write other applications and tools against it without touching the
Access side.  It turns Access into just your GUI and reporting tool.

If you have power users that make local changes they can continue to
design additional queries or reports in their own local mdb against
the linked tables.  They'll need some extra support for updates
though, either instructions to re-link, or instructions on exporting
and importing their local changes into a newly installed version of
your master mdb.

Having done this, you are then free to start implementing, for
example, a web based application to start taking over functionality.
The nice thing is that you need not replicate everything at once, you
can start slow or with the most desirable features, letting Access
continue to handle the less common or more grungy legacy stuff at
first.  There are innumerable discussions on best web and application
frameworks, so probably not worth getting into too much.  In my case
I'm using a CherryPy/Genshi/SQLAlchemy/psycopg2 stack.

As long as you still have Access around, you'll have to take it into
consideration with schema changes, but that's not really that much
harder than any other schema migration management.  It's just another
client to the database you can run in parallel as long as you wish.
If you do change the schema, when done, just load your master Access
database, update the links, and rebuild/redistribute the installer to
your users.  Many changes (e.g., new columns with defaults) can be
backwards compatible and avoid forced upgrades.

You can operate both systems in parallel for a while even for similar
functionality (for testing if nothing else), but can then retire
functionality from Access as the web app supports it.  Ideally this
will be organic by your users preferring the web.  Selecting when to
drop Access entirely c

Re: logging time format millisecond precision decimalsign

2012-07-20 Thread David Bolen
"Alex van der Spek"  writes:

> I use this formatter in logging:
>
> formatter = logging.Formatter(fmt='%(asctime)s \t %(name)s \t %(levelname)s
> \t %(message)s')
>
> Sample output:
>
> 2012-07-19 21:34:58,382   root   INFO   Removed - C:\Users\ZDoor\Documents
>
> The time stamp has millisecond precision but the decimal separator is a
> comma.
>
> Can I change the comma (,) into a period (.) and if so how?

I do it by:

  1. Replacing the default date format string to exclude ms.
  2. Including %(msecs)03d in the format string where appropriate.  Using 'd'
     instead of 's' truncates rather than showing the full float value.

So in your case, I believe that changing your formatter creation to:

  formatter = logging.Formatter(
      fmt='%(asctime)s.%(msecs)03d \t %(name)s \t %(levelname)s \t %(message)s',
      datefmt='%Y-%m-%d %H:%M:%S')

should work.  This uses the same date format as the default, but
without ms, though of course you could also opt to make any other date
format you prefer.
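
As a quick runnable check of the result (the logger setup and the message
are just examples):

  import logging

  handler = logging.StreamHandler()
  handler.setFormatter(logging.Formatter(
      fmt='%(asctime)s.%(msecs)03d \t %(name)s \t %(levelname)s \t %(message)s',
      datefmt='%Y-%m-%d %H:%M:%S'))
  root = logging.getLogger()
  root.addHandler(handler)
  root.setLevel(logging.INFO)

  root.info('Removed - C:\\Users\\ZDoor\\Documents')
  # -> 2012-07-19 21:34:58.382   root   INFO   Removed - C:\Users\ZDoor\Documents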

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sandboxed Python: memory limits?

2011-04-07 Thread David Bolen
Chris Angelico  writes:

>So I'm hoping to restrict the script's ability to
> consume all of memory, without (preferably) ulimit/rlimiting the
> entire process (which does other things as well). But if it can't be,
> it can't be.

Just wondering, but rather than spending the energy to cap Python's
allocations internally, could similar effort instead be directed at
separating the "other things" the same process is doing?  How tightly
coupled is it?  If you could split off just the piece you need to
limit into its own process, then you get all the OS tools at your
disposal to restrict the resources of that process.

Depending on what the "other" things are, it might not be too hard to
split apart, even if you have to utilize some IPC mechanism to
coordinate among the two pieces.  Certainly might be of the same order
of magnitude of tweaking Python to limit memory internally.
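
For instance, if the piece to be limited could run as its own child process,
something like this rough sketch (the limit value and the script name are
made up) applies the cap only to the child and leaves the parent untouched:

  import resource
  import subprocess

  LIMIT = 256 * 1024 * 1024            # 256 MB address-space cap; adjust to taste

  def cap_memory():
      # runs in the child between fork and exec (POSIX only)
      resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

  proc = subprocess.Popen(['python', 'restricted_script.py'],
                          preexec_fn=cap_memory)
  proc.wait()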

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Date Parsing Question

2010-09-03 Thread David Bolen
Gavin  writes:

> python-dateutil seems to work very well if everything is in English,
> however, it does not seem to work for other languages and the
> documentation does not seem to have any information about locale
> support.

Probably because I don't think there is much built in.  You'll want to
supply your own parserinfo to the parse function.  I haven't had to
parse non-english localized date strings myself yet, but yes, the
default parserinfo used by the module is in English.

Looks like the docs don't get into it too much, but if you review the
parser.py source in the package you can see the default parserinfo
definition.

I would think defining your own (or subclass of the default) and
replacing the WEEKDAYS and MONTHS values would work (you can get
localized lists directly from the calendar module) and maybe adding to
the jump table if you want to parse longer phrases.  At first
glance, the lexer within the module does seem like there may be some
possible issues with more esoteric encodings or unicode, but just
something to stay aware of.
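
A rough sketch of that parserinfo subclass (the locale name is only an
example - use whatever locale is installed on your system - and
abbreviation/punctuation quirks in some locales may need extra handling):

  import calendar
  import locale
  from dateutil import parser

  locale.setlocale(locale.LC_TIME, 'de_DE.UTF-8')   # example locale

  class LocalizedParserInfo(parser.parserinfo):
      # parserinfo expects lists of (abbreviation, full name) pairs
      WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
      MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]  # skip empty entry 0

  print(parser.parse('14 Okt 2010', parserinfo=LocalizedParserInfo()))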

If you already have an i18n/l10n setup in your application (or need to
have finer grained control than a global locale setting), you could
instead override the lookup methods, though there's a bit more work to
do since the initial lookup tables will probably need to be created in
each of the locales you may wish to switch between.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: time between now and the next 2:30 am?

2010-07-23 Thread David Bolen
Neil Cerutti  writes:

> On 2010-07-23, Jim  wrote:
>> How can I calculate how much time is between now and the next
>> 2:30 am?  Naturally I want the system to worry about leap
>> years, etc.
>
> You need the datetime module. Specifically, a datetime and
> timedelta object.

Although it sounds like the question is to derive the timedelta value,
so it's not known up front.  That's a little trickier since you'd need
to construct the datetime object for "next 2:30 am" to subtract "now"
from to get the delta.  But that requires knowing when the next day
is, thus dealing with month endings.  Could probably use the built-in
calendar module to help with that though.

For the OP, you might also take a peek at the dateutil third party
module, and its relativedelta support, which can simplify the creation
of the "next 2:30 am" datetime object.

Your case could be handled by something like:

from datetime import datetime
from dateutil.relativedelta import relativedelta

target = datetime.now() + relativedelta(days=+1, hour=2, minute=30,
                                        second=0, microsecond=0)
remaining = target - datetime.now()

This ends up with target being a datetime instance for the next day at
2:30am, and remaining being a timedelta object representing the time
remaining, at least as of the moment of its calculation.

Note that relativedelta leaves alone any fields that aren't specified, so
since datetime.now() includes precision down to microseconds, I clear those
explicitly.  Since you really only need the date, you could also use
datetime.date.today() instead as the basis of the calculation and then
not need second/microsecond parameters to relativedelta.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: About problems that I have with learning wxPython in Macintosh

2010-07-16 Thread David Bolen
"ata.jaf"  writes:

> import wx
>
> class MainWindow(wx.Frame) :
>  def __init__(self, parent, title) :
>wx.Frame.__init__(self, parent, title=title, size=(200, 100))
>self.control = wx.TextCtrl(self, style=wx.TE_MULTILINE)
>self.CreateStatusBar()
>
>filemenu = wx.Menu()
>
>filemenu.Append(wx.ID_ABOUT, '&About', ' Information about this
> program.')
>filemenu.AppendSeparator()
>filemenu.Append(wx.ID_EXIT, 'E&xit', ' Terminate the program')
>
>menuBar = wx.MenuBar()
>menuBar.Append(filemenu, '&File')
>self.SetMenuBar(menuBar)
>self.Show(True)
>
> app = wx.App(False)
> frame = MainWindow(None, 'Sample editor')
> app.MainLoop()
>
> The menus doesn't appear in the product.
> Can anyone help me to find a tutorial that is for using wxPython on a
> Mac?

I think the menus are actually working as designed, and they are
present, but just not perhaps what or where you expected.  That's
because some of the standard IDs (e.g., wx.ID_ABOUT) and some names
(e.g., "E&xit") are adjusted under OSX to conform to that platform's
menu standard.  This is actually to your benefit, as you can use the
same wxPython code to get menus on each platform to which users on
that platform will be familiar.

So for example, ID_ABOUT and ID_EXIT are always under the Application
menu (and E&xit becomes &Quit) which is where Mac users expect them to
be.  Mac users would be quite confused if your application exited with
Command-x rather than Command-Q.

See http://wiki.wxpython.org/Optimizing%20for%20Mac%20OS%20X for a
little more information.  There are also a series of methods on wxApp
if you want finer control over this (such as SetMacAboutMenuItemId,
SetMacExitMenuItemId, SetMacPreferencesMenuItemId) but using the
standard ID_* names does it automatically.

If you're looking for your own specific menus, I'd just switch away
from the standard ids and names.  For example, if you switched the
wx.ID_* in the above to -1, you'd see them show up under the File menu
rather than relocated to the Application menu.  Although "E&xit" would
still get replaced with "&Quit".
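
A minimal standalone sketch of that difference (the menu labels are made up;
the stock-ID item gets relocated/renamed on the Mac, the -1 item stays put):

  import wx

  app = wx.App(False)
  frame = wx.Frame(None, title='Menu demo')
  filemenu = wx.Menu()
  filemenu.Append(-1, 'My Item', 'Non-stock ID: stays under File on the Mac')
  filemenu.Append(wx.ID_EXIT, 'E&xit', 'Stock ID: moved to the Application menu as Quit')
  menubar = wx.MenuBar()
  menubar.Append(filemenu, '&File')
  frame.SetMenuBar(menubar)
  frame.Show()
  app.MainLoop()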

But if you are in fact setting up a menu for an application exit, I'd
let wxPython do what it's doing, as your application will appear
"normal" to users on the Mac.

I'd also suggest moving over to the wxPython mailing list for followup
questions as there are more folks there familiar with wxPython.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 2.7 released

2010-07-05 Thread David Bolen
Martineau  writes:

> Some clarification. I meant installed 2.7 on top of 2.6.x. Doing so
> would have interfered with the currently installed version because I
> always install Python in the same directory, one named just "Python",
> to minimize the number of changes I have to make to other parts of
> the system.

That's fine, you're just making a conscious choice to only support
(yourself) a single version installed at a time.

I tend to need multiple versions around when developing, so I keep a
bunch of versions all installed in separate directories as \Python\x.y
(so I only have a single root directory).  With 2.7, my current box
has 6 Python interpreters (2.4-3.1) installed at the moment.

I use Cygwin (wouldn't try to work on a Windows system without it), so
just use bash aliases to execute the right interpreter, but a batch
file could be used with the cmd interpreter, and you could link GUI
shortcuts to that batch file.

Not sure there's a good solution to your help file link, other than
the existing Start menu links installed per Python version.  Even with
local links you'd probably want separate links per version anyway
since they're different documents.

Of course, since this started by just considering installing it to get
at a single file (which I know was since solved), it's probably an
acceptable use case for violating your standard policy and picking a
different directory name just in this case, and then blowing it away
later. :-)

>I also believe the Windows installer makes registry
> changes that also involve paths to the currently installed version,
> which again, is something I wanted to avoid until I'm  actually ready
> to commit to upgrading.

The path information installed in the registry
(Software\Python\PythonCore under HLKM or HKCU depending on
installation options) is structured according to major.minor release
(e.g., 2.6 vs. 2.7 are distinct), but you're right that Windows only
supports one file extension mapping, so by default the last Python to
be installed gets associated with .py/.pyw, etc.

But you can optionally disable this during installation.  On the
customize screen shown during installation, de-select the "Register
Extensions" option, and the active install won't change any existing
mappings and thus will have no impact on your current default installation.

> If there are better ways on Windows to accomplish this, I'd like to
> hear about them. I suppose I could use hardlinks or junctions but
> they're not well supported on most versions of Windows.

If you're still using the basic Windows command prompt or GUI links
then a batch file is the simplest way to go.  With something like
Cygwin (which I personally would never do without), then you have a
variety of techniques available including links, shell aliases, etc...

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What's the matter with docs.python.org?

2010-05-20 Thread David Bolen
Christian Mertes  writes:

> On Mi, 2010-05-19 at 16:42 -0700, Aahz wrote:
>> Also, I think you need to pass the host HTTP header to access
>> docs.python.org
>
> Look, I don't really want to read Python docs via telnet. I basically
> wanted to point out that there is strange behaviour and someone might
> feel responsible and look into it.

I think the point is that if you are going to use telnet as a
diagnostic tool you need to more accurately represent the browser.  I
just tried and using the Host header renders a completely different
response than not (presumably because the server is using virtual
hosting).  With an appropriate "Host: docs.python.org" you get the
actual documentation home page, without it you get the "page has
moved" text you saw.

It may or may not have anything to do with the original problem, but
it probably does explain the response you got when you tried to use
telnet as a test tool.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Extract a bordered, skewed rectangle from an image

2010-05-07 Thread David Bolen
"Paul Hemans"  writes:

> I am wondering whether there are any people here that have experience with 
> openCV and Python. If so, could you either give me some pointers on how to 
> approach this, or if you feel so inclined, bid on the project. There are 2 
> problems:

Can't offer actual services, but I've done image tracking and object
identification in Python with OpenCV so can suggest some approaches.

You might also try the OpenCV mailing list, though it sometimes
varies wildly in terms of S/N ratio.

And for OpenCV specifically, I definitely recommend the book "Learning
OpenCV" by O'Reilly.  It's really hard to grasp the concepts and
applications of the raw OpenCV calls from the API documentation, and I
found the book (albeit not cheap) helped me out tremendously and was
well worth it.

I'll flip the two questions since the second is quicker to answer.

> How to do this through Python into openCV? I am a newbie to Python, not 
> strong in Maths and ignorant of the usage of openCV.

After trying a few wrappers, the bulk of my experience is with the
ctypes-opencv wrapper and OpenCV 1.x (either 1.0 or 1.1pre).  Things
change a lot with the recent 2.x (which needs C++ wrappers), and I'm
not sure the various wrappers are as stable yet.  So if you don't have
a hard requirement for 2.x, I might suggest at least starting with 1.x
and ctypes-opencv, which is very robust, though I'm a little biased as
I've contributed code to the wrapper.

> How do I get openCV to ignore the contents of the label and just focus on 
> the border?

There's likely no single answer, since multiple mechanisms for
identifying features in an image exist, and you can also derive
additional heuristics based on your own knowledge of the domain space
(your own images).  Without knowing exactly what the border design
intended to make detection easy looks like, it's hard to say anything
definitive.

But in broad strokes, you'll often:

  1. Normalize the image in some way.  This can be to adjust for
 brightness from various scans to make later processing more
 consistent, or to switch spaces (to make color matching more
 effective) or even to remove color altogether if it just
 complicates matters.  You may also mask of entire portions of the
 image if you have information that says they can't possibly be
 part of what you are looking for.
  2. Attempt to remove noise.  Even when portions of an image look
 like a solid color, at the pixel level there can be many different
 variations in pixel values.  Operations such as blurring or
 smoothing help to average out those values and simplify matching
 entire regions.
  3. Attempt to identify the regions or features of interest.  Here's
 where a ton of algorithms may apply depending on your needs, but the
 simplest form to start with is basic color matching.  For edge
 detection (like that of your label border) convolutions (such as
 gradient detection) might also be ideal.  (A rough sketch of steps
 1-4 follows this list.)
  4. Process identified regions to attempt to clean them up, if
 possible weakening regions likely to be extraneous, and
 strengthening those more likely to be correct.  Morphology
 operations are one class of processing likely to help here.
  5. Select among features (if more than one) to identify the best
 match, using any knowledge you may have that can be used to
 rank them (e.g., size, position in image, etc...)
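
Here is a rough sketch of steps 1-4 using the current cv2 bindings (the
rest of this discussion used OpenCV 1.x and the ctypes-opencv wrapper, so
the call names differ; the file name and threshold values are made up and
would need tuning against your scans):

  import cv2

  img = cv2.imread('scanned_label.png')

  # 1/2. Normalize and reduce noise: smooth, then drop color if it only
  #      complicates matters
  blurred = cv2.blur(img, (5, 5))
  gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)

  # 3. Identify regions of interest: crude global threshold separating dark
  #    border/ink from the lighter background
  _, mask = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)

  # 4. Clean up: morphological open to remove small speckles
  kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
  cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

  # A crude "how much is not background" measure, usable for ranking candidates
  print(cv2.countNonZero(cleaned))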

My own processing is ball tracking in motion video, so I have some
additional data in terms of adjacent frames that helps me remove
static background information and minimize the regions under
consideration for step 3, but a single image probably won't have
that.  But given that you have scanned documents, there may be other
simplifying rules you can use, like eliminating anything too white or
too black (depending on label color).

My own flow works like:

1. Normalize each frame

   1. Blur the frame (cvSmooth with CV_BLUR, 5x5 matrix).  This
  smooths out the pixel values, improving the color conversion.
   2. Balance brightness (in RGB space).  I ended up just offsetting
  the image a fixed (x,x,x) value to maximize the RGB values.
  Found it worked better doing it in RGB before Lab conversion.
   3. Convert the image to the "Lab" color space.  I used Lab because
  the conversion process was fastest, but when frame rate isn't
  critical, HLS is likely better since hue/saturation are
  completely separate from lightness which makes for easier color
  matching.

2. Identify uninteresting regions in the current frame

   This may not apply to you, but here is where I mask out static
   information from prior background frames, based on difference
   calculations with the current frame, or very dark areas that I
   knew couldn't include what I was interested in.

   In your case, for example, if you know the label is going to show
   up fairly saturated (say it's a solid red or something), you could
   probably eliminate everything that is b

Re: Impersonating a Different Logon

2010-04-07 Thread David Bolen
Kevin Holleran  writes:

> Thanks, I was able to connect to the remote machine.  However, how do
> I query for a very specific key value?  I have to scan hundreds of
> machines and need want to reduce what I am querying.  I would like to
> be able to scan a very specific key and report on its value.

Any remote machine connection should automatically use any cached
credentials for that machine, since Windows always uses the same
credentials for a given target machine.

So if you were to access a share with the appropriate credentials,
using _winreg after that point should work.  I normally use
\\machine\ipc$ (even from the command line) which should always exist.

You can use the wrappers in the PyWin32 library (win32net) to access
and then release the share with NetUseAdd and NetUseDel.
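
A rough sketch of that sequence with pywin32 plus _winreg (the machine name,
credentials and key/value names are all made up; level-2 NetUseAdd data as
per the MSDN USE_INFO_2 structure):

  import win32net
  import _winreg                      # "winreg" in Python 3

  machine = 'TARGETPC'

  win32net.NetUseAdd(None, 2, {'remote': r'\\%s\ipc$' % machine,
                               'username': r'DOMAIN\scanuser',
                               'password': 'secret'})
  try:
      hive = _winreg.ConnectRegistry(r'\\%s' % machine,
                                     _winreg.HKEY_LOCAL_MACHINE)
      key = _winreg.OpenKey(hive, r'SOFTWARE\Vendor\Product')
      value, vtype = _winreg.QueryValueEx(key, 'Version')
      print(value)
      _winreg.CloseKey(key)
      _winreg.CloseKey(hive)
  finally:
      win32net.NetUseDel(None, r'\\%s\ipc$' % machine, 0)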

Of course, the extra step of accessing the share might or might not be
any faster than WMI, but it would have a small advantage of not
needing WMI support on the target machine - though that may be a
non-issue nowadays.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Recommend Commercial graphing library

2010-04-07 Thread David Bolen
AlienBaby  writes:

> I'd be grateful for any suggestions / pointers to something useful,

Ignoring the commercial vs. open source discussion, although it was a
few years ago, I found Chart Director (http://www.advsofteng.com/) to
work very well, with plenty of platform and language support,
including Python.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Generic singleton

2010-03-04 Thread David Bolen
Duncan Booth  writes:

> It is also *everywhere* in the Python world. Unlike Java and C++, Python 
> even has its own built-in type for singletons.
>
> If you want a singleton in Python use a module.
>
> So the OP's original examples become:
>
> --- file singleton.py ---
> foo = {}
> bar = []
>
> --- other.py ---
> from singleton import foo as s1
> from singleton import foo as s2
> from singleton import bar as s3
> from singleton import bar as s4
>
> ... and then use them as you wish.

In the event you do use a module as a singleton container, I would
advocate sticking with fully qualified names, avoiding the use of
"from" imports or any other local namespace caching of references.

Other code sharing the module may not update things as expected, e.g.:

import singleton

singleton.foo = {}

at which point you've got two objects around - one in the singleton.py
module namespace, and the s1/s2 referenced object in other.py.

If you're confident of the usage pattern of all the using code, it may
not be critical.  But consistently using "singleton.foo" (or an import
alias like s.foo) is a bit more robust, sticking with only one
namespace to reach the singleton.
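
A small demonstration of that divergence, reusing the singleton.py/other.py
files from the quoted example:

  import singleton
  import other                     # did "from singleton import foo as s1"

  singleton.foo = {'new': True}    # rebinds the name in the singleton module only

  print(other.s1)                  # still the original dict created at import time
  print(singleton.foo)             # {'new': True} - a different object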

-- David


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: listing existing windows services with python

2010-02-16 Thread David Bolen
alex23  writes:

> News123  wrote:
>> What is the best way with python to get a list of all windows services.
>>
>> As a start I would be glad to receive only the service names.
>>
>> However it would be nicer if I could get all the properties of a service
>> as well.
>
> I highly recommend Tim Golden's fantastic WMI module[1].

Another alternative is the win32service module from the pywin32
package (which IMO you'll almost certainly want available when doing
any significant Windows-specific operations) which wraps the native
win32 libraries for enumerating, querying and controlling services.

A simple loop could use EnumServicesStatus to iterate through the
services, OpenService with the SERVICE_QUERY_CONFIG flag to get a
handle to each service, and then QueryServiceConfig to retrieve
configuration information.
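
A rough sketch of that loop (opening some services may fail with access
denied depending on your privileges):

  import win32service

  hscm = win32service.OpenSCManager(None, None,
                                    win32service.SC_MANAGER_ENUMERATE_SERVICE)
  try:
      for name, display, status in win32service.EnumServicesStatus(hscm):
          hsvc = win32service.OpenService(hscm, name,
                                          win32service.SERVICE_QUERY_CONFIG)
          try:
              config = win32service.QueryServiceConfig(hsvc)
              # config follows the QUERY_SERVICE_CONFIG layout; index 3 is the binary path
              print('%s (%s): %s' % (name, display, config[3]))
          finally:
              win32service.CloseServiceHandle(hsvc)
  finally:
      win32service.CloseServiceHandle(hscm)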

Since pywin32 is a relatively thin wrapper over the win32 libraries,
pure MSDN documentation can be used for help with the calls, augmented
by any Python-related information contained in the pywin32
documentation.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Executing Commands From Windows Service

2010-02-09 Thread David Bolen
David Bolen  writes:

> Not from my past experience - the system account (LocalSystem for
> services) can be surprising, in that it's pretty much unlimited access
> to all local resources, but severely limited in a handful of cases,
> one of which is any attempt to access the network.  I can't recall for
> sure if it's an absolute block, or if in some cases you can configure
> around it (e.g., it might use a null session for remote shares which
> can be enabled through the registry on the target machine).  I've
> basically stuck "LocalSystem = no network" in my head from past
> experience.

Given it's been a few years, I decided to try some tests, and the
above is too simplistic.

The LocalSystem account runs without any local Windows credentials
(e.g., not like a logged in user), which has several consequences.
One is that you can't access any network resources that require such
credentials (like shares).  However, there's no sort of firewall
filtering or anything, so plain old TCP/IP connections are fine.
Unless, of course, the client being used also has other needs for
local Windows credentials, independent or as a pre-requisite to the
network operations.

So backing up a bit, the TCP/IP connection that plink is making is not
inherently disabled by running under LocalSystem, but it's certainly
possible that plink is trying to identify the user under which it is
operating to perhaps identify ssh keys or other local resources it
needs to operate.  You might be able to cover this with command line
options (e.g., plink supports "-i" to specify a key file to use), but
you'll also need to ensure that the file you are referencing is
readable by the LocalSystem account.

One of the other responders had a very good point about locating plink
in the first place too.  Services run beneath an environment that is
inherited from the service control manager process, and won't include
various settings that are applied to your user when logged in,
especially things like local path changes, and working directories.
Should you change the system path (via the environment settings),
you'll need to reboot for the service control manager to notice - I
don't think you can restart it without a reboot.

So it's generally safer to be very clear, and absolute when possible,
in a service for paths to external resources.

The prior advice of running the service as an identified user (e.g.,
with local credentials) is still good as it does remove most of these
issues since if you can run the script manually under that user you
know it'll work under the service.  But it's not a hard requirement.

If your script is dying such that a top level exception is being
raised you should be able to find it in the application event log.  So
that might give further information on what about the different
environment is problematic.

You can also use the win32traceutil module to help with grabbing debug
output on the fly.  Import the module in your service, which will
implicitly redirect stdout/stderr to a trace buffer.  Run the same
win32traceutil module from the command line in another window.  Then
start the service.  Any stdout/stderr will be reflected in the other
window.  Can't catch everything (suppressed exceptions, or I/O that
doesn't flow through the script's stdout/stderr), but again might help
point in the right direction.
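
A minimal sketch of that flow (the message text is just a placeholder):

  # inside the service script
  import win32traceutil       # redirects this process's stdout/stderr to the trace buffer
  print('service starting')   # now visible to the collector below

Then run the collector in a separate console (e.g. "python -m win32traceutil",
or execute win32traceutil.py from the pywin32 lib directory), start the
service, and the output appears in that console.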

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Executing Commands From Windows Service

2010-02-09 Thread David Bolen
T  writes:

> The more testing I do, I think you may be right..I was able to get it
> to work under a local admin account, and it worked under debug mode
> (which would also have been running as this user).  I'm a bit
> surprised though - I was under the assumption that LocalSystem had
> rights to access the network?

Not from my past experience - the system account (LocalSystem for
services) can be surprising, in that it's pretty much unlimited access
to all local resources, but severely limited in a handful of cases,
one of which is any attempt to access the network.  I can't recall for
sure if it's an absolute block, or if in some cases you can configure
around it (e.g., it might use a null session for remote shares which
can be enabled through the registry on the target machine).  I've
basically stuck "LocalSystem = no network" in my head from past
experience.

So you can either install your service to run under your existing
account, or create an account specifically for running your service,
granting that account just the rights it needs.

-- David






-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Executing Commands From Windows Service

2010-02-09 Thread David Bolen
T  writes:

> I have a script, which runs as a Windows service under the LocalSystem
> account, that I wish to have execute some commands.  Specifically, the
> program will call plink.exe to create a reverse SSH tunnel.  Right now
> I'm using subprocess.Popen to do so.  When I run it interactively via
> an admin account, all is well.  However, when I'm running it via
> service, no luck.  I'm assuming this is to do with the fact that it's
> trying to run under the LocalSystem account, which is failing.  What
> would be the best way around this?  Thanks!

The LocalSystem account is not, if I recall correctly, permitted to
access the network.

You'll have to install the service to run under some other account that
has appropriate access to the network.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Which version of MSVC?90.DLL's to distribute with Python 2.6 based Py2exe executables?

2009-12-29 Thread David Bolen
Jonathan Hartley  writes:

> I guess I really need an installer. Oh well.

This need not be that much of a hurdle.  Several solutions exist, such
as Inno Setup (my personal preference) and NSIS, with which it's not
hard to create a solid installer.
appreciate it too since your application (even if trivial) will
install/uninstall just like other standard applications.  Combining
py2exe with such an installer is a solid combination for deployment
under Windows.

It could also help you over time since you'll have better control, if
needed, over how future versions handle updates, menus, shortcuts,
etc.  Even if a start menu shortcut just opens up a console window
with your text-based application, it's probably easier for users than
telling them to open such a window manually, switch to the right
directory, and start your script.

You can arrange to have the redist installer run from within your
installation script, so it's a one-time hit rather than each time your
application starts.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python PIL and Vista/Windows 7 .. show() not working ...

2009-11-30 Thread David Bolen
Esmail  writes:

> I dug around in the docs and found a named parameter that I can set
> when I
> call show.
>
> Definition: im.show(self, title=None, command=None)
>
> I installed irfanview and specified it/its path in the parameter,
> but that didn't work either. It's really quite puzzling in the
> case of Vista since that's been around for quite a few years now.

But I thought everyone was sticking their fingers in their ears and
humming to try to forget Vista had been released, particularly now
that Windows 7 is out :-)

Perhaps there's an issue with the temporary file location.  I don't
have a Vista system to test on, but the show() operation writes the
image to a temporary file as returned by tempfile.mktemp(), and then
passes the name on to the external viewer.  The viewing command is
handed to os.system() with the filename embedded without any special
quoting.  So if, for example, the temporary location has spaces or
"interesting" characters, it probably won't get parsed properly.

One easy debugging step is probably to add a print just before the
os.system() call that views the image (bottom of _showxv function in
Image.py in my copy of 1.1.6).  That way at least you'll know the
exact command being used.

If that's the issue, there are various ways around it.  You could
patch PIL itself (same function) to quote the filename when it is
constructing the command.  Alternatively, the tempfile module has a
tempdir global you could set to some other temporary directory before
using the show() function (or any other code using tempfile).
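
A minimal sketch of the tempfile workaround (the directory is just an example
and must already exist):

  import tempfile
  from PIL import Image        # plain "import Image" with older PIL installs

  # point tempfile at a directory whose path has no spaces or other characters
  # that the unquoted os.system() command would mangle
  tempfile.tempdir = r'C:\Temp'

  Image.open('example.png').show()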

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python gui builders

2009-11-18 Thread David Bolen
Simon Hibbs  writes:

> I've had this problem for a few years. I've tried PythonCard,
> WxWidgets with WxDesigner, BoaConstructor, etc. None of them come
> anywhere close to PyQT/QTDesigner.

For me, the killer feature missing from all of the wx-based
designers is that they require sizer-based designs at all stages, not
even permitting a fixed layout up front as a first draft.  Or, in any
case I've found that permitted a fixed layout, it then didn't permit
easily turning that into a sizer-based layout.

From an overall design perspective, that was the feature I found most
intriguing in QTDesigner.  I could randomly drop stuff around the
window while doing an initial layout, which is especially helpful when
you aren't quite sure yet how you want the layout to look.  Then you
can select groups of objects and apply the containers to provide for
flexible layout.

I absolutely prefer sizer-based layouts for a final implementation,
but early in the design stages find it more helpful, and freeing, not
to be as tied to the containers.

With that said, for various reasons I still prefer wxPython to Qt, and
at the moment, find wxFormBuilder the best fit for my own designs
(even before the direct Python support, just using XRC).

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A "terminators' club" for clp

2009-11-15 Thread David Bolen
Terry Reedy  writes:

> r wrote:
>> On Nov 14, 4:59 am, kj  wrote:
>>> But, as I already showed, I'm out of my depth here,
>>> so I'd better shut up.
>>
>> Don't give up so easy! The idea is great, what Paul is saying is that
>> most people who read this group use newsreaders and that has nothing
>> to do with google groups. These guy's have kill filters for just this
>> sort of thing but either way the emails are on their puters so they
>> have to deal with them on an individual basis. It would be nice
>> however to clean up the Google group version and rid it of the plagues
>> of spam infestations.
>
> Anyone with a newsreader can, like me, read gmane.comp.python.general,
> which mirrors python-list, which now filters out much/most of the spam
> on c.l.p from G.g.

The same is true on some (not sure if it qualifies for many) Usenet
servers.  I use news.individual.net for example (for a modest yearly
fee as of a few years ago) and in my experience it does a great job at
filtering spam.  I'm sure there are other services that do as well.  I
don't have to manage any special filters and don't seem to see any of
the stuff in this group, for example, mentioned in this thread.

I do use gmane for a lot of other lists (including python-dev) that
aren't operated as a Usenet newsgroups and it's an excellent service.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Need cleanup advice for multiline string

2009-08-12 Thread David Bolen
Robert Dailey  writes:

> Hey guys. Being a C++ programmer, I like to keep variable definitions
> close to the location in which they will be used. This improves
> readability in many ways. However, when I have a multi-line string
> definition at function level scope, things get tricky because of the
> indents. In this case indents are serving two purposes: For syntax and
> actual text output. The tabs for function scope should not be included
> in the contents of the string. (...)

Personally I'm in the camp that something like this should be hoisted
out of the code path (whether to global scope, a dedicated message
module or configuration file is a design choice).

But if it's going to stay inline, one approach that can maintain some
of the attractive qualities of a triple quoted string is to make use
of the textwrap module:

import textwrap

def RunCommand( commandList ):
   # ...
   if returnCode:
      failMsg = textwrap.dedent('''\
         *************************************************
         The following command returned exit code [{:#x}].
         This represents failure of some form. Please review
         the command output for more details on the issue.

         {}
         *************************************************
         ''')

which removes any common leading whitespace (must be identical in terms
of any tabs/spaces).

This is still additional run-time processing, and most likely less
efficient than the joining of individual strings, but it does permit a
clean triple-quoted string so IMO is easier to read/maintain in the
source - providing the code indentation level doesn't get in the way
of the desired line length of the string.  You can also choose to
dedent the string a bit (say to the level of "failMsg") if needed
without being forced all the way back to the left margin.

You can also combine textwrap.dedent with some of the other options if
where the strings are defined makes it nicer if they still have some
indentation (say in a global Python module).  In that case, you'd most
likely just process them once when the module was imported, so any
inefficiency in textwrap.dedent is far less important.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Networked Broadcast Messaging

2009-08-11 Thread David Bolen
"squishywaf...@gmail.com"  writes:

> * Machines can come and go. Since messages are not directly sent to a
> specific IP address from our Python script, the messages are simply
> broadcasted to those who are there to listen. If nobody is subscribed
> to the message type being sent, nothing happens.

What sort of delivery guarantees are you looking for if there is in
fact a machine that is trying to listen to a particular message or
message group?  If someone is listening for a certain type of
message, is it OK if it misses one that is sent?

If you do simple direct broadcasting (e.g., UDP), you'd need your own
reliability layer above that if you cared if the message actually got
to an intended destination if that destination was present.
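
For the simple UDP case, a rough sketch (the port and message are made up,
and sender and receiver would normally run on different machines; again,
there is no delivery guarantee):

  import socket

  PORT = 50000

  # Sender: broadcast a message to whoever happens to be listening
  sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  sender.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
  sender.sendto(b'status:job-complete', ('<broadcast>', PORT))

  # Receiver: each interested machine binds the same port and waits
  receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  receiver.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
  receiver.bind(('', PORT))
  data, addr = receiver.recvfrom(4096)
  print(data)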

If you want better guarantees, you might look into a distributed
message bus like Spread (http://www.spread.org/) or perhaps a
messaging protocol like XMPP (http://xmpp.org/) through its PubSub
extension.  Both have Python interfaces, though I have no personal
experience with either.  But perhaps that will also give you some
terms or starting points for searching for other options.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: fast video encoding

2009-07-31 Thread David Bolen
gregorth  writes:

> I am a novice with video encoding. I found that few codecs support
> gray scale images. Any hints to take advantage of the fact that I only
> have gray scale images?

I don't know that there's any good way around the fact that video
encoding is simply one of the heavier CPU-bound activities you're
likely to encounter.  So I suspect that codec choice (barring letting
quality drop a bit) is going to move the bar less than picking the
beefiest CPU you can.

If I were looking to do this, I'd probably include investigating
pumping the raw camera images into an ffmpeg encoding subprocess and
let it handle all the encoding.  There's about a gazillion options and
a variety of codec options to play with for performance.  You could
grab and store a short video sample from the camera and use it as a
benchmark to compare encoding options.

ffmpeg does have a -pix_fmt option that can be used to indicate the
input pixel type - "gray" would be 8-bit, and result in a 4:0:0 image
with a YUV-based encoder, for example.  Not sure how much, if any,
impact it would have on encoding speed though.
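
A rough sketch of that pipeline (camera_frames() is a hypothetical generator
yielding one raw 8-bit grayscale frame as bytes; exact ffmpeg options vary by
version and target codec):

  import subprocess

  WIDTH, HEIGHT, FPS = 640, 480, 100

  encoder = subprocess.Popen(
      ['ffmpeg', '-f', 'rawvideo', '-pix_fmt', 'gray',
       '-s', '%dx%d' % (WIDTH, HEIGHT), '-r', str(FPS),
       '-i', '-',                      # read raw frames from stdin
       '-an', 'output.mp4'],
      stdin=subprocess.PIPE)

  for frame in camera_frames():        # hypothetical source of WIDTH*HEIGHT-byte frames
      encoder.stdin.write(frame)

  encoder.stdin.close()
  encoder.wait()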

To be honest, with your data rate, I might even consider getting
Python out of the pure encoding path once it starts - depending on the
raw network format delivered by the camera you might be able to have
ffmpeg read directly from it.  Might not be worth it, since the data
transfer is probably I/O bound, but a 640x480x1 stream at 100Hz is
still nothing to sniff at, and even managing the raw data flow in
Python might eat up some CPU that you could better allocate to the
encoding process, or require extra data copies along the way that
would be best to avoid.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Semaphore Techniques

2009-07-28 Thread David Bolen
John D Giotta  writes:

> I'm looking to run a process with a limit of 3 instances, but each
> execution is over a crontab interval. I've been investigating the
> threading module and using daemons to limit active thread objects, but
> I'm not very successful at grasping the documentation.
>
> Is it possible to do what I'm trying to do and if so anyone know of a
> useful example to get started?

Does it have to be built into the tool, or are you open to handling the
restriction right in the crontab entry?

For example, a crontab entry like:

  * * * * * test `pidof -x script.py | wc -w` -ge 4 || /script.py

should attempt to run script.py every minute (adjust period as
required) unless there are already four of them running.  And if pidof
isn't precise enough you can put anything in there that would
accurately check your processes (grep a ps listing or whatever).

This works because if the test expression is true it returns 0 which
terminates the logical or (||) expression.

There may be some variations based on cron implementation (the above
was tested against Vixie cron), but some similar mechanism should be
available.

If you wanted to build it into the tool, it can be tricky in terms of
managing shared state (the count) amongst purely sibling/cooperative
processes.  It's much easier to ensure no overlap (1 instance), but
once you want 'n' instances you need an accurate process-wide counter.
I'm not positive, but don't think Python's built-in semaphores or
shared memory objects are cross-process.  (Maybe something in
multiprocessing in recent Python versions would work, though they may
need the sharing processes to all have been executed from a parent
script)

I do believe there are some third party interfaces (posix_ipc,
shm/shm_wrapper) that would provide access to posix shared-process
objects.  A semaphore may still not work as I'm not sure you can
obtain the current count.  But you could probably do something with
a shared memory counter in conjunction with a mutex of some sort, as
long as you were careful to clean it up on exit.

Or, you could stick PIDs into the shared memory and count PIDs on
a new startup (double checking against running processes to help
protect against process failures without cleanup).

You could also use the filesystem - have a shared directory where each
process dumps its PID, after first counting how many other PIDs are in
the directory and exiting if too many.
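
A rough sketch of that last approach (the directory, limit and do_work() are
made up, the directory must already exist, and it glosses over the stale-PID,
race and cleanup issues discussed below):

  import os
  import sys

  PIDDIR, LIMIT = '/var/run/myjob', 3

  if len(os.listdir(PIDDIR)) >= LIMIT:
      sys.exit(0)                       # enough instances already running

  pidfile = os.path.join(PIDDIR, str(os.getpid()))
  open(pidfile, 'w').close()
  try:
      do_work()                         # hypothetical payload
  finally:
      os.remove(pidfile)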

Of course all of these (even with a PID check) are risky in the
presence of unexpected failures.  It would be worse with something
like C code, but it should be reasonably easy to ensure that your
script has cleanup code even on an unexpected termination, and it's
not that likely the Python interpreter itself would crash.  Then
again, something external could kill the process.  Ensuring accuracy
and cleanup of shared state can be non-trivial.

You don't mention if you can support a single master daemon, but if
you could, then it can get a little easier as it can maintain and
protect access to the state - you could have each worker process
maintain a socket connection of some sort with the master daemon so it
could detect when they terminate for the count, and it could just
reject such connections from new processes if too many are running
already.  Of course, if the master daemon goes away then nobody would
run, which may or may not be an acceptable failure mode.

All in all, unless you need the scripts to enforce this behavior even
in the presence of arbitrary use, I'd just use an appropriate crontab
entry and move on to other problems :-)

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why not enforce four space indentations in version 3.x?

2009-07-17 Thread David Bolen
Nobody  writes:

> On Thu, 16 Jul 2009 09:18:47 -0500, Tim Chase wrote:
>
>> Yes, the dictatorial "a tab always equals 8 spaces"
>
> Saying "always" is incorrect; it is more accurate to say that tab stops
> are every 8 columns unless proven otherwise, with the burden of proof
> falling on whoever wants to use something different.

I suspect Tim was referring to the Python tokenizer.  Internally,
barring the existence of one of a few Emacs/vi tab setting commands in
the file, Python always assigns the logical indentation level for a
tab to align with the next multiple-of-8 column.  This is unrelated to
how someone might choose to display such a file.

So mixing tabs and spaces and using a visual display setting of
something other than 8 for the tab size (other than one consistent
with an instruction embedded in the file) can yield a discrepancy
between what is shown on the screen and how the same code is perceived
by the Python compiler.  This in turn may cause errors or code to
execute at different indent levels than expected.  Thus, in general,
such a mixture is a bad idea, and as per this thread, no longer
permitted in a single block in Python 3.x.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why not enforce four space indentations in version 3.x?

2009-07-15 Thread David Bolen
Miles Kaufmann  writes:

> On Jul 14, 2009, at 5:06 PM, David Bolen wrote:
>> Are you sure?  It seems to restrict them in the same block, but not in
>> the entire file.  At least I was able to use both space and tab
>> indented blocks in the same file with Python 3.0 and 3.1.
>
> It seems to me that, within an indented block, Python 3.1 requires
> that you are consistent in your use of indentation characters *for
> that indentation level*.  For example, the following code seems to be
> allowed:

Um, right - in other words, what I said :-)

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why not enforce four space indentations in version 3.x?

2009-07-14 Thread David Bolen
John Nagle  writes:

>Python 3 enforces the rule that you can't mix tabs and spaces
> for indentation in the same file.  That (finally) guarantees that
> the indentation you see is what the Python parser sees.  That's
> enough to prevent non-visible indentation errors.

Are you sure?  It seems to restrict them in the same block, but not in
the entire file.  At least I was able to use both space and tab
indented blocks in the same file with Python 3.0 and 3.1.  I suspect
precluding any mixture at all at the file level would be more
intrusive, for example, when trying to combine multiple code sources
in a single file.

Not that this really changes your final point, since the major risk
of a mismatch between the parser vs. visual display is within a single
block.

>It also means that the Python parser no longer has to have
> any concept of how many spaces equal a tab.  So the problem
> is now essentially solved.

"has to have" being a future possibility at this point, since I'm
fairly sure the 3.x parser does technically still have the concept of
a tab size of 8, though now it can be an internal implementation
detail.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PDF: finding a blank image

2009-07-13 Thread David Bolen
DrLeif  writes:

> What I would like to do is have python detect a "blank" pages in a PDF
> file and remove it.  Any suggestions?

The odds are good that even a blank page is being "rendered" within
the PDF as having some small bits of data due to scanner resolution,
imperfections on the page, etc..  So I suspect you won't be able to just
look for a well-defined pattern in the resulting PDF or anything.

Unless you're using OCR, the odds are good that the scanner is
rendering the PDF as an embedded image.  What I'd probably do is
extract the image of the page, and then use image processing on it to
try to identify blank pages.  I haven't had the need to do this
myself, and tool availability would depend on platform, but for
example, I'd probably try ImageMagick's convert operation to turn the
PDF into images (like PNGs).  I think Gimp can also do a similar
conversion, but you'd probably have to script it yourself.

Once you have an image of a page, you could then use something like
OpenCV to process the page (perhaps a morphology operation to remove
small noise areas, then a threshold or non-zero counter to judge
"blankness"), or probably just something like PIL depending on
complexity of the processing needed.
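
As a rough sketch of that flow using ImageMagick's convert plus PIL (the page
index, density and thresholds are just guesses to tune against your scans):

  import subprocess
  from PIL import Image, ImageFilter

  # render one page (here page 4, 0-based index) of the PDF to a PNG
  subprocess.check_call(['convert', '-density', '150',
                         'scan.pdf[3]', 'page3.png'])

  img = Image.open('page3.png').convert('L')
  img = img.filter(ImageFilter.MedianFilter(3))     # knock out isolated specks
  dark = sum(1 for p in img.getdata() if p < 200)   # count "ink" pixels
  ink_fraction = float(dark) / (img.size[0] * img.size[1])
  print('blank' if ink_fraction < 0.005 else 'content')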

Once you identify a blank page, removing it could either be with pure
Python (there have been other posts recently about PDF libraries) or
with external tools (such as pdftk under Linux for example).

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Suppressing Implicit Chained Exceptions (Python 3.0)

2009-07-02 Thread David Bolen
"andrew cooke"  writes:

> However, when printed via format_exc(), this new exception still has the
> old exception attached via the mechanism described at
> http://www.python.org/dev/peps/pep-3134/ (this is Python 3.0).

If you're in control of the format_exc() call, I think the new chain
keyword parameter can disable this and restore the old behavior.
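
For example (a minimal sketch, assuming a Python 3 version whose traceback
functions accept the chain argument):

  import traceback

  try:
      try:
          1 / 0
      except ZeroDivisionError:
          raise ValueError('conversion failed')
  except ValueError:
      print(traceback.format_exc(chain=False))   # omits the ZeroDivisionError context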

If you're not in control of the traceback display, I'm not sure there's
an easy way to prevent it, given that displaying chained exceptions is
the default mode.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Off-topic: Usenet archiving history

2009-06-14 Thread David Bolen
Ben Finney  writes:

> David Bolen  writes:
>
>> Individual messages could include an Expires: header if they wished,
>
> Since we're already well off-topic: NNTP, HTTP, and email, and probably
> other protocols as well, all deal with messages. They are all consistent
> in defining a message [0] as having *exactly one* header.

Heh, I'm not sure it's quite as consistent as you may think,
particularly with older RFCs, which are relevant in this discussion
since we're talking about historical artifacts.

For example, while more recent mail RFCs like 2822 may specifically
talk about header fields as the "header" (singular) of the message,
the older RFC 822 instead refers to a "headers" (plural) section.

> Every time you call a field from the header “a header”, or refer to
> the plural “headers of a message”, the IETF kills a kitten. You
> don't want to hurt a kitten, do you?

Heaven forbid - though I'd think I could hold my own with the IETF.
My reference to "header" was in lieu of "header line", something that
the Usenet RFCs (1036, and the older 850) do extensively themselves.

But I'll be more careful in the future - need to ensure kitten safety!

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Off-topic: Usenet archiving history

2009-06-14 Thread David Bolen
Dennis Lee Bieber  writes:

>   Either way -- it was still a change from "expiration at some
> date"... Though since (Netcom/Mindspring)Earthlink seems to have
> subcontracted NNTP service to Giganews (or some such) it wouldn't
> surprise me to learn that service also keeps a mammoth archive...

I'm not sure it's really a change, or if it is, it certainly isn't a
change from how things were originally.  "Expiration at some date" was
never any sort of global policy for Usenet - just an aspect of an
individual news server.  Some servers held messages for long periods,
particularly for the big seven groups - it's true that alt.* and in
particular the binaries, might expire quickly.  I know I certainly ran
some servers that didn't bother expiring - or had expiration times in
years - of the big seven.  My experience post-dates the great
renaming, so I can't speak to before that, but don't think behavior
was very different.

Individual messages could include an Expires: header if they wished,
but even that was just a suggestion.  Any actual expiration was due to
local configuration on each news server, which while it could take
Expires: headers into account, was just as often driven by local
storage availability or the whims of the local news admin :-)

I think Deja News was providing web access to their archive from the
mid-90s on (so quite a while before Google even existed) so certainly
by that point everyone had access to a rather complete archive even if
messages had expired on their local server.  I think Deja was also the
first to introduce X-No-Archive.  But other archives certainly existed
pre-Deja, which I'm sure is, in large part, how Google was able to
locate and incorporate the older messages into their system after
their acquisition of the Deja archive.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Ah, ctypes

2009-06-01 Thread David Bolen
Nick Craig-Wood  writes:

> ctypes could potentially note that function types don't have enough
> references to them when passed in as arguments to C functions?  It
> might slow it down microscopically but it would fix this problem.

Except that ctypes can't know the lifetime needed for the callbacks.  If
the callbacks are only used while the called function is executing (say,
perhaps for a progress indicator or internal completion callback) then
it's safe to create the function wrapper just within the function call.
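
A sketch of the distinction (the "somelib" calls are made up, so treat
this as illustration only):

import ctypes

CALLBACK = ctypes.CFUNCTYPE(None, ctypes.c_int)

def on_event(value):
    print 'callback got', value

# Fine if the library only calls it during this one call:
#   somelib.do_work(CALLBACK(on_event))
# If the library stores the pointer for later use, keep the wrapper alive:
_registered = CALLBACK(on_event)
#   somelib.register(_registered)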

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: AOPython Question

2009-05-28 Thread David Bolen
Roastie  writes:

> I installed the AOPython module:
>
>% easy_install aopython
>
> That left an aopython-1.0.3-py2.6.egg at
> C:\mystuff\python\python_2.6.2\Lib\site-packages.

An egg is basically a ZIP file with a specific structure (you can
inspect it with common ZIP tools).  Depending on the package
easy_install is installing, it may be considered safe to install as a
single file (which Python does support importing files from).

I tend to prefer to have an actual unpacked tree myself.  If you use
the "-Z" option to easy_install, you can force it to always unpack any
eggs when installing them.

Alternatively, if you've already got the single egg, you can always
unzip it yourself.  Just rename it temporarily and unzip it into a
directory named exactly the same as the single egg file was.
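
Untested sketch of doing that with the zipfile module (substitute the
real egg name, of course):

import os, zipfile

egg = 'aopython-1.0.3-py2.6.egg'
os.rename(egg, egg + '.zip')
zf = zipfile.ZipFile(egg + '.zip')
zf.extractall(egg)              # unpack into a directory named like the egg
zf.close()
os.remove(egg + '.zip')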

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A fast way to read last line of gzip archive ?

2009-05-25 Thread David Bolen
"Barak, Ron"  writes:

> I couldn't really go with the shell utilities approach, as I have no
> say in my user environment, and thus cannot assume which binaries
> are install on the user's machine.

I suppose if you knew your target you could just supply the external
binaries to go with your application, but I agree that would probably
be more of a pain than it's worth for the performance gain in real
world time.

> I'll try and implement your last suggestion, and see if the
> performance is acceptable to (human) users.

In terms of tuning the third option a bit, I'd play with the tracking
of the final two chunks (as mentioned in my first response), perhaps
shrinking the chunk size or only processing a smaller chunk of it for
lines (assuming a reasonable line size) to minimize the final loop.
You could also try using splitlines() on the final buffer rather than
a StringIO wrapper, although that'll have a memory hit for the
constructed list but doing a small portion of the buffer would
minimize that.

I was curious what I could actually achieve, so here are three variants
that I came up with.

First, this just slightly fine-tunes the chunk tracking and then only
processes enough final data based on an anticipated maximum line length
(so if the final line is longer than that you'll only get the final
MAX_LINE bytes of that line).  I also found I got better performance
using a smaller 1024 chunk size with GZipFile.read() than a MB - not
entirely sure why although it perhaps matches the internal buffer size
better:

# last-chunk-2.py

import gzip
import sys

CHUNK_SIZE = 1024
MAX_LINE = 255

in_file = gzip.open(sys.argv[1],'r')

chunk = prior_chunk = ''
while 1:
    prior_chunk = chunk
    # Note that CHUNK_SIZE here is in terms of decompressed data
    chunk = in_file.read(CHUNK_SIZE)
    if len(chunk) < CHUNK_SIZE:
        break

if len(chunk) < MAX_LINE:
    chunk = prior_chunk + chunk

line = chunk.splitlines(True)[-1]
print 'Last:', line


On the same test set as my last post, this reduced the last-chunk
timing from about 2.7s to about 2.3s.

Now, if you're willing to play a little looser with the gzip module,
you can gain quite a bit more.  If you directly call the internal _read()
method you can bypass some of the unnecessary processing read() does, and
go back to larger I/O chunks:

# last-gzip.py

import gzip
import sys

CHUNK_SIZE = 1024*1024
MAX_LINE = 255

in_file = gzip.open(sys.argv[1],'r')

chunk = prior_chunk = ''
while 1:
    try:
        # Note that CHUNK_SIZE here is raw data size, not decompressed
        in_file._read(CHUNK_SIZE)
    except EOFError:
        if in_file.extrasize < MAX_LINE:
            chunk = chunk + in_file.extrabuf
        else:
            chunk = in_file.extrabuf
        break

    chunk = in_file.extrabuf
    in_file.extrabuf = ''
    in_file.extrasize = 0

line = chunk[-MAX_LINE:].splitlines(True)[-1]
print 'Last:', line

Note that in this case since I was able to bump up CHUNK_SIZE, I take
a slice to limit the work splitlines() has to do and the size of the
resulting list.  Using the larger CHUNK_SIZE (and it being raw size) will
use more memory, so could be tuned down if necessary.

Of course, the risk here is that you are dependent on the _read()
method, and the internal use of the extrabuf/extrasize attributes,
which is where _read() places the decompressed data.  In looking back
I'm pretty sure this code is safe at least for Python 2.4 through 3.0,
but you'd have to accept some risk in the future.

This approach got me down to 1.48s.

Then, just for the fun of it, once you're playing a little looser with
the gzip module, it's also doing work to compute the crc of the
original data for comparison with the decompressed data.  If you don't
mind so much about that (depends on what you're using the line for)
you can just do your own raw decompression with the zlib module, as in
the following code, although I still start with a GzipFile() object to
avoid having to rewrite the header processing:

# last-decompress.py

import gzip
import sys
import zlib

CHUNK_SIZE = 1024*1024
MAX_LINE = 255

decompress = zlib.decompressobj(-zlib.MAX_WBITS)

in_file = gzip.open(sys.argv[1],'r')
in_file._read_gzip_header()

chunk = prior_chunk = ''
while 1:
    buf = in_file.fileobj.read(CHUNK_SIZE)
    if not buf:
        break
    d_buf = decompress.decompress(buf)
    # We might not have been at EOF in the read() but still have no
    # decompressed data if the only remaining data was not original data
    if d_buf:
        prior_chunk = chunk
        chunk = d_buf

if len(chunk) < MAX_LINE:
    chunk = prior_chunk + chunk

line = chunk[-MAX_LINE:].splitlines(True)[-1]
print 'Last:', line

This version got me down to 1.15s.

So in summary, the more you're willing to lean on the gzip module's
internals, the faster you can get - roughly 2.3s, 1.5s and 1.15s
respectively for the three variants above, versus about 2.7s for the
original chunked approach.

-- David
--
http://mail.python.org/mailman/listinfo/python-list

Re: A fast way to read last line of gzip archive ?

2009-05-24 Thread David Bolen
"Barak, Ron"  writes:

> I thought maybe someone has a way to unzip just the end portion of
> the archive (instead of the whole archive), as only the last part is
> needed for reading the last line.

The problem is that gzip compressed output has no reliable
intermediate break points that you can jump to and just start
decompressing without having worked through the prior data.

In your specific code, using readlines() is probably not ideal as it
will create the full list containing all of the decoded file contents
in memory only to let you pick the last one.  So a small optimization
would be to just iterate through the file (directly or by calling
readline()) until you reach the last line.

However, since you don't care about the bulk of the file, but only
need to work with the final line in Python, this is an activity that
could be handled more efficiently with external tools, as you
need not involve much interpreter time to actually decompress/discard
the bulk of the file.

For example, on my system, comparing these two cases:

# last.py

import gzip
import sys

in_file = gzip.open(sys.argv[1],'r')
for line in in_file:
    pass
print 'Last:', line


# last-popen.py

import sys
from subprocess import Popen, PIPE

# Implement gzip -dc <file> | tail -1
gzip = Popen(['gzip', '-dc', sys.argv[1]], stdout=PIPE)
tail = Popen(['tail', '-1'], stdin=gzip.stdout, stdout=PIPE)
line = tail.communicate()[0]
print 'Last:', line

with an ~80MB log file compressed to about 8MB resulted in last.py
taking about 26 seconds, while last-popen took about 1.7s.  Both
resulted in the same value in "line".  As long as you have local
binaries for gzip/tail (such as Cygwin or MingW or equivalent) this
works fine on Windows systems too.

If you really want to keep everything in Python, then I'd suggest
working to optimize the "skip" portion of the task, trying to
decompress the bulk of the file as quickly as possible.  For example,
one possibility would be something like:

# last-chunk.py

import gzip
import sys
from cStringIO import StringIO

in_file = gzip.open(sys.argv[1],'r')

chunks = ['', '']
while 1:
    chunk = in_file.read(1024*1024)
    if not chunk:
        break
    del chunks[0]
    chunks.append(chunk)

data = StringIO(''.join(chunks))
for line in data:
    pass
print 'Last:', line

with the idea that you decode about a MB at a time, holding onto the
final two chunks (in case the actual final chunk turns out to be
smaller than one of your lines), and then only process those for
lines.  There's probably some room for tweaking the mechanism for
holding onto just the last two chunks, but I'm not sure it will make
a major difference in performance.

In the same environment of mine as the earlier tests, the above took
about 2.7s.  So still much slower than the external utilities in
percentage terms, but in absolute terms, a second or so may not be
critical for you compared to pure Python.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Thread-killing, round 666 (was Re: Lisp mentality vs. Python mentality)

2009-04-28 Thread David Bolen
Vsevolod  writes:

> On Apr 27, 11:31 pm, David Bolen  wrote:
>> I'm curious - do you know what happens if threading is implemented as
>> a native OS thread and it's stuck in an I/O operation that is blocked?
>> How does the Lisp interpreter/runtime gain control again in order to
>> execute the specified function?  I guess on many POSIX-ish
>> environments, internally generating a SIGALRM to interrupt a system
>> operation might work, but it would likely have portability problems.
>
> We're arguing to the old argument, who knows better, what the
> programmer wants: language implementor or the programmer himself.
> AFAIK, Python community is on former side, while Lisp one -- on the
> later. As always, there's no right answer.

Note I wasn't trying to argue anything - I was actually interested in
how the behavior is handled in Lisp.  Do you know how the Lisp
implementation of threads you spoke about handles this case?

E.g., can the Lisp implementation you are familiar with actually kill
such a thread blocked on an arbitrary external system or library call?

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: Thread-killing, round 666 (was Re: Lisp mentality vs. Python mentality)

2009-04-27 Thread David Bolen
Vsevolod  writes:

> "This should be used with caution: it is implementation-defined
> whether the thread runs cleanup forms or releases its locks first."
> This doesn't mean deprecated. It means: implementation-dependent. For
> example in SBCL: "Terminate the thread identified by thread, by
> causing it to run sb-ext:quit - the usual cleanup forms will be
> evaluated". And it works fine.

I'm curious - do you know what happens if threading is implemented as
a native OS thread and it's stuck in an I/O operation that is blocked?
How does the Lisp interpreter/runtime gain control again in order to
execute the specified function?  I guess on many POSIX-ish
environments, internally generating a SIGALRM to interrupt a system
operation might work, but it would likely have portability problems.

Or is that combination (native OS thread and/or externally blocking
I/O) prevented by the runtime somehow (perhaps by internally polling
what appears to code as blocking I/O)?  But surely if there's an
access to OS routines, the risk of blocking must be present?

That scenario is really the only rational use case I've run into for
wanting to kill a thread, since in other cases the thread can be
monitoring for an application defined way to shut down.

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: global name 'self' is not defined - noob trying to learn

2009-03-30 Thread David Bolen
mark.sea...@gmail.com writes:

> class myclass(object):
> #
> # def __new__(class_, init_val, size, reg_info):
> def __init__(self, init_val, size, reg_info):
>
> # self = object.__new__(class_)
> self.reg_info = reg_info
> print self.reg_info.message
> self.val = self

Note that here you assign self.val to be the object itself.  Are you
sure you didn't mean "self.val = init_val"?

> (...)
> def __int__(self):
> return self.val

Instead of an integer, you return the current class instance as set up
in __init__.  The __int__ method ought to return an integer.

> def __long__(self):
> return long(self.val)

And this will be infinite recursion, since long() will try to
call the __long__ method on self.val - which is the instance itself -
so you're just recursing on the __long__ method.

You can see this more clearly with:

>>> cat = myclass(0x55, 32, my_reg)
>>> int(cat)
Traceback (most recent call last):
  File "", line 1, in 
TypeError: __int__ returned non-int (type myclass)
>>>

I won't post the traceback for long(cat), as it's, well, "long" ...
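
For reference, an untested minimal sketch of just those corrected
pieces might look like:

class myclass(object):
    def __init__(self, init_val, size, reg_info):
        self.reg_info = reg_info
        self.val = init_val        # store the value, not the instance

    def __int__(self):
        return int(self.val)       # actually returns an integer now

    def __long__(self):
        return long(self.val)      # safe: self.val is no longer the instance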

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: finally successful in ods with python, just one help needed.

2009-03-14 Thread David Bolen
Krishnakant  writes:

> based on your code snippid I added a couple of lines to actually center
> align text in the merged cell in first row.

Sorry, guess I should have verified handling all the requirements :-)

I think there's two issues:

* I neglected to add the style I created to the document, so even in my
  first example, columns had a default style (not the 1in) style I thought
  I was creating.

* I don't think you want a paragraph style applied to the paragraph
  text within the cell, but to the cell as a whole.  I think if you
  just try to associate it with the text.P() element the "width" of
  the paragraph is probably just the text itself so there's nothing to
  center, although that's just a guess.

I've attached an adjusted version that does center the spanned cell
for me.  Note that I'll be the first to admit I don't necessarily
understand all the ODF style rules.  In particular, I got into a lot
of trouble trying to add my styles to the overall document styles
(e.g., ods.styles) which I think can then be edited afterwards rather
than the automatic styles (ods.automaticstyles).

The former goes into the styles.xml file whereas the latter is included
right in contents.xml.  For some reason using ods.styles kept causing
OpenOffice to crash trying to load the document, so I finally just went
with the flow and used automaticstyles.  It's closer to how OO itself
creates the spreadsheet anyway.

-- David

from odf.opendocument import OpenDocumentSpreadsheet
from odf.style import Style, TableColumnProperties, ParagraphProperties
from odf.table import Table, TableRow, TableColumn, \
  TableCell, CoveredTableCell
from odf.text import P

def make_ods():
    ods = OpenDocumentSpreadsheet()

    col = Style(name='col', family='table-column')
    col.addElement(TableColumnProperties(columnwidth='1in'))

    centered = Style(name='centered', family='table-cell')
    centered.addElement(ParagraphProperties(textalign='center'))

    ods.automaticstyles.addElement(col)
    ods.automaticstyles.addElement(centered)

    table = Table()
    table.addElement(TableColumn(numbercolumnsrepeated=3, stylename=col))
    ods.spreadsheet.addElement(table)

    # Add first row with cell spanning columns A-C
    tr = TableRow()
    table.addElement(tr)
    tc = TableCell(numbercolumnsspanned=3, stylename=centered)
    tc.addElement(P(text="ABC1"))
    tr.addElement(tc)

    # Add two more rows with non-spanning cells
    for r in (2,3):
        tr = TableRow()
        table.addElement(tr)
        for c in ('A','B','C'):
            tc = TableCell()
            tc.addElement(P(text='%s%d' % (c, r)))
            tr.addElement(tc)

    ods.save("ods-test.ods")

if __name__ == "__main__":
    make_ods()
--
http://mail.python.org/mailman/listinfo/python-list


Re: finally successful in ods with python, just one help needed.

2009-03-14 Thread David Bolen
Krishnakant  writes:

> However when I apply the same elements and attributes to the one I am
> creating with odfpy, I get "attribute not allowed " errors.
> If some one is interested to look at the code, please let me know, I can
> send an attachment off the list so that others are not forced to
> download some thing they are not concerned about.

I just tried this myself and the following creates a 3x3 spreadsheet
with the first row spanning all three columns (no special formatting
like centering or anything), using odfpy 0.8:

import sys

from odf.opendocument import OpenDocumentSpreadsheet
from odf.style import Style, TableColumnProperties
from odf.table import Table, TableRow, TableColumn, \
  TableCell, CoveredTableCell
from odf.text import P

def make_ods():
    ods = OpenDocumentSpreadsheet()

    col = Style(name='col', family='table-column')
    col.addElement(TableColumnProperties(columnwidth='1in'))

    table = Table()
    table.addElement(TableColumn(numbercolumnsrepeated=3, stylename=col))
    ods.spreadsheet.addElement(table)

    # Add first row with cell spanning columns A-C
    tr = TableRow()
    table.addElement(tr)
    tc = TableCell(numbercolumnsspanned=3)
    tc.addElement(P(text="ABC1"))
    tr.addElement(tc)
    # Uncomment this to more accurately match native file
    ##tc = CoveredTableCell(numbercolumnsrepeated=2)
    ##tr.addElement(tc)

    # Add two more rows with non-spanning cells
    for r in (2,3):
        tr = TableRow()
        table.addElement(tr)
        for c in ('A','B','C'):
            tc = TableCell()
            tc.addElement(P(text='%s%d' % (c, r)))
            tr.addElement(tc)

    ods.save("ods-test.ods")

Maybe that will give you a hint as to what is happening in your case.

Note that it appears creating such a spreadsheet directly in Calc also
adds covered table cells for those cells beneath the spanned cell, but
Calc loads a file fine without those and still lets you later split
the merge and edit the underlying cells.  So I'm not sure how required
that is as opposed to just how Calc manages its own internal structure.

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: wxPython fast and slow

2009-03-12 Thread David Bolen
iu2  writes:

> A question about CallAfter: As I understand, this function is intended
> to be used from within threads, where it queues the operation to be
> performed in the GUI queue.

I agree with the second half of the sentence but not the first.
CallAfter is intended to queue up a delayed call (via the GUI queue),
but it can be used anywhere you wish that behavior.  Yes, it's also
one of the very few functions that can be called from a thread other
than the GUI thread, but it works just as well from the GUI thread.

Or to quote its docstring:

Call the specified function after the current and pending event
handlers have been completed.  This is also good for making GUI
method calls from non-GUI threads.  Any extra positional or
keyword args are passed on to the callable when it is called.
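
As a small untested sketch of using it from the GUI thread itself to
break a long operation into steps (the 100-step limit is arbitrary):

import wx

class Frame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title='CallAfter demo')
        self.count = 0
        wx.CallAfter(self.do_step)

    def do_step(self):
        # One small piece of work per call; pending events are processed
        # between calls, so the GUI stays responsive.
        self.count += 1
        if self.count < 100:
            wx.CallAfter(self.do_step)

app = wx.App(False)
Frame().Show()
app.MainLoop()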

> How does it work in this situation? Does it queue the opreation for
> some idle time or does it perform it right away?

You can actually see the source in _core.py in your wx installation.
It always executes via a wx.PostEvent call.

> And another question, if I may, I used to make tight loops in windows
> API, planting inside them a command that processes messages from the
> GUI queue and returns when no more messages exists. Something like
> this:
>
> loop {
>   operations
>   process_gui_messages
> }
>
> The loop ran quickly and the GUI remained responsive during the loop.
> I did it on window API using a function I defined similar to this one:

I don't think there's much difference in the above and doing your
operations during one of the events.  In both cases "operations" is
going to block any further event processing so cannot be lengthy or
the GUI will feel unresponsive.  "Lengthy" varies but I'd certainly
put it in the neighborhood of small fractions of a second.

Your original code took almost 2 seconds for the "operations" part
(before getting back to processing GUI messages through the main
loop), which certainly seems too long.

> void ProcessMessages()
> {
>   while (PeekMessage()) {
> TranslateMessage(..);
> DispatchMessage(..);
>   }
> }

Not quite positive, but if you're talking about implementing this as a
nested dispatch loop (e.g., called from within an existing event), you
can do that via wxYield.  Of course, as with any nested event loop
processing, you have to be aware of possible reentrancy issues.

> This technique is not good for long loops, where the user may activate
> other long GUI opreations during the tight loop and make a mess.
> But it carries out the job well where during the time of the loop the
> user may only access to certain features, such as pressing a button to
> cancel the operation, operating the menu to exit the program, etc.
> This scheme saves some state-machine code that is required when using
> event-based programming.

Maybe - for my own part, I'm not completely convinced and tend to far
prefer avoiding nested event loop dispatching.  There are some times
when it might be unavoidable, but I tend to find it indicative that I
might want to re-examine what I am doing.

It seems to me that as long as you have to keep the "operations" step
of your loop small enough, you have to be able to divide it up.  So
you'll need some state no matter what to be able to work through each
stage of the overall "operations" in between calls to process the GUI.

At that point, whether it's a local variable within the scope of the
looping code, or just some instance variables in the object handling
the event loop seems about the same amount of state management.

For example, in your original code you could probably consider the
generator and/or 'x' your local state.  But the current step in the
movement could just as easily be an instance variable.

> Does wxPython have something like ProcessMessages?

If you just mean a way to process pending messages wxYield may be
sufficient.

If you want to take over the primary dispatch loop for the application,
normally that has been handed off to wxWidgets via wxApp.MainLoop.  However,
I believe you can build your own main dispatch loop if you want, as there
are functions in wxApp like ProcessPendingEvents, Pending, Dispatch and
so on.  You may need to explicitly continue to support Idle events in
your own loop if desired.

If you need to get into more details, it's probably better dealt with
on the wxPython mailing list.

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: wxPython fast and slow

2009-03-08 Thread David Bolen
iu2  writes:

> Indeed, but I don't think the CallAfter is necessary. I could just as
> well remove the time.sleep in the original code. I could also make a
> tight loop to replace time.sleep
> for i in range(100): pass
> and tune it to fit the speed I need.

Except that CallAfter passed control back through the event loop which is
crucial for your GUI to appear responsive in other ways.

> I haven't mention this, but I actually want something to be the same
> speed on different PC-s. So a timer seems to fit in.

Then even a time.sleep() or plain loop isn't sufficient since each may
have additional latencies depending on load.  You will probably need
to query a system clock of some type to verify when your interval has
passed.

> I just can't make it work.
> Using wx.Timer is too slow.
> Using time.sleep is fast with PyScripter active, and slow when it is
> closed.

I have to admit to thinking that perhaps you're trying to operate too
quickly if you need better resolution than wx.Timer.  Most screen
operations don't have to appear that frequently to still appear
smooth, but that's your call.

Of course, even wx.Timer may be subject to other latencies if the
system or your application is busy with other events, so it depends
on how critical precise your timing needs to be.

You might also try an idle event, implementing your own timer (using
whatever call gives you the best resolution on your platform), and
just ignoring idle events that occur more frequently than the timing
you want.  Just remember to always request a new event.  You could do
the same thing with CallAfter as well, just reschedule a new one if
the current one is faster than your preferred interval.
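
Something along these lines, as an untested sketch (time.clock() for
resolution on Windows; the 10ms interval is arbitrary):

import time
import wx

class Frame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title='Idle timing demo')
        self.interval = 0.010
        self.last = time.clock()
        self.Bind(wx.EVT_IDLE, self.on_idle)

    def on_idle(self, event):
        now = time.clock()
        if now - self.last >= self.interval:
            self.last = now
            # ... do one movement/update step here ...
        event.RequestMore()        # keep the idle events coming

app = wx.App(False)
Frame().Show()
app.MainLoop()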

-- David

--
http://mail.python.org/mailman/listinfo/python-list


Re: Using clock() in threading on Windows

2009-02-21 Thread David Bolen
"Martin v. Löwis"  writes:

> As a consequence, the half-busy loops could go away, at least
> on systems where lock timeouts can be given to the system.

I know that in some cases in the past I've had to bypass a Queue's use
of threading objects for waiting for a queue to unblock because of the
increased overhead (and latency as the timer increases) of the busy
loop.  On windows, replacing it with an implementation using
WaitForObject calls with the same timeouts I would have used with the
Queue performed much better, not unexpectedly, but was non-portable.
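
Roughly the shape of that, as a sketch using pywin32 (the 500ms timeout
is just an example):

import win32event

evt = win32event.CreateEvent(None, 0, 0, None)   # auto-reset, initially clear

# The producer thread would call win32event.SetEvent(evt) after queuing
# data; the consumer blocks without any polling loop:
rc = win32event.WaitForSingleObject(evt, 500)    # timeout in milliseconds
if rc == win32event.WAIT_OBJECT_0:
    pass        # data available - pull it from the shared structure
elif rc == win32event.WAIT_TIMEOUT:
    pass        # nothing arrived within 500ms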

The current interface to the lowest level locks in Python are
certainly generic enough to cross lots of platforms, but it would
definitely be useful if they could implement timeouts without busy
loops on those platforms where they were supported.

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: is python Object oriented??

2009-01-31 Thread David Bolen
thmpsn@gmail.com writes:

> I don't know how you would do it in C# (or Java for that matter).
>
> In C++ you can play with pointers to "get at" some memory location
> somewhere in the object. The only portable way to know the exact
> location between the beginning of the object and the desired member is
> the offsetof() macro, but as I understand it this only works for POD
> types, which means that it won't work for classes such as:
>
> class NonPOD
> {
> private:
> int a;
> int b;
> public:
> NonPOD();
> ~NonPOD();
> int C();
> };
>
> (I haven't ever actually tried it, so I'm not sure.)
>
> Nevertheless, you can play and hope for the best. For example, if the
> member you want to get at is 'b', then you can do:
>
> NonPOD obj;
> std::cout << "obj.b = " << *(int*) ((unsigned char*) &obj + sizeof
> (int)) << std::endl;
>
> and hope that the compiler didn't leave a hole between the 'a' member
> and the 'b' member.

Probably moving off topic, but I don't think you have to get anywhere
near that extreme in terms of pointers, unless you're trying to deal
with instances for which you have no source but only opaque pointers.

I haven't gotten stuck having to do this myself yet, but I believe one
commmon "hack" for the sort of class you show above is to just
"#define private public" before including the header file containing
the class definition.  No fiddling with pointers, offsets, or
whatever, just normal object access syntax past that point.

Of course, I believe such a redefinition violates the letter of the
C++ standard, but most preprocessors do it anyway.  Also, it won't
handle the case where the "private:" is not used, but the members are
just declared prior to any other definition, since a class is private
by default.

But even then, if you had to, just make a copy of the class definition
(or heck, just define a structure if it's just data elements), ensure
the private portions are public, and then cast a pointer to the old
class instance to one of your new class instance.  Assuming you're
building everything in a single compiler, the layouts should match
just fine.  Again, normal object member access, no casting or pointers
needed (beyond the initial overall object pointer cast).

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python Crashes

2009-01-15 Thread David Bolen
koranthala  writes:

> Could anyone guide me on this? I have been facing this issue for a
> day, and cannot seem to solve it.

We had a scheduling system that had a similar "once in a long while
hard Windows-process crash" which after a bunch of work to try to
track down the source, the most robust solution was just to trap the
failure and restart, as the system ran off a persistent state that was
already engineering to be robust in the case of a hardware crash
(power outage, etc...)  While I agree with others that it's most
likely a fault in an extension that gets tickled over time, as was
likely in our case, we needed all our extensions and were using latest
versions at the time.  So if your application is such that just
restarting it is practical, it may be a sufficient temporary (or not
so temporary - our system ran for years this way) workaround for you.

What you can do is execute your script from beneath control of another
script, and trap process failures, restarting the script on
non-standard exits.  This can be in addition to any top level
exception handling of the child script itself, where it can provide
more graceful support for internal failures.
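
As a rough sketch of the monitoring side (the child script name is just
an example):

import subprocess
import sys

while True:
    rc = subprocess.call([sys.executable, 'worker.py'])
    if rc == 0:
        break                       # clean shutdown - stop restarting
    print 'child exited with %r, restarting' % rc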

The trick, under Windows, is to ensure that you disable any pop-up
windows that may occur during the crash, otherwise the monitoring task
never gets a chance to get control and restart things.

With the pywin32 extension, something like:

import win32api, win32con

old_mode = win32api.SetErrorMode(win32con.SEM_FAILCRITICALERRORS |
                                 win32con.SEM_NOGPFAULTERRORBOX |
                                 win32con.SEM_NOOPENFILEERRORBOX)

Or with ctypes:

import ctypes

SEM_FAILCRITICALERRORS = 1
SEM_NOGPFAULTERRORBOX  = 2
SEM_NOOPENFILEERRORBOX = 0x8000

old_mode = ctypes.windll.kernel32.SetErrorMode(SEM_FAILCRITICALERRORS |
                                               SEM_NOGPFAULTERRORBOX |
                                               SEM_NOOPENFILEERRORBOX)

at any point prior to starting the child process will ensure that hard
process errors will silently terminate the process and return control
the parent, as well as not popping up any dialog boxes that require
intervention by a person.

Should the process exit harshly, the exit code should be fairly
clear (I forget, but I think it's in the 0xC0000000 range, maybe
0xC0000005 for a typical GPF), and you can decide on restarting the
task as opposed to just exiting normally.

This will also prevent any pop-ups in the main monitoring process.
You can restore old behavior there after starting the child by making
another call to SetErrorMode using old_mode as the argument.

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: Implementing file reading in C/Python

2009-01-14 Thread David Bolen
Johannes Bauer  writes:

> Yup, I changed the Python code to behave the same way the C code did -
> however overall it's not much of an improvement: Takes about 15 minutes
> to execute (still factor 23).

Not sure this is completely fair if you're only looking for a pure
Python solution, but to be honest, looping through a gazillion
individual bytes of information sort of begs for trying to offload
that into a library that can execute faster, while maintaining the
convenience of Python outside of the pure number crunching.

I'd assume numeric/numpy might have applicable functions, but I don't
use those libraries much, whereas I've been using OpenCV recently for
a lot of image processing work, and it has matrix/histogram support,
which seems to be a good match for your needs.
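
For the numpy route, an untested sketch of the histogram step might be
(reading one chunk here just to stand in for the existing loop):

import sys
import numpy

in_file = open(sys.argv[1], 'rb')
data = in_file.read(1024 * 1024)       # one chunk, as in the original code
counts = numpy.bincount(numpy.frombuffer(data, dtype=numpy.uint8))
most = counts.argmax()                 # most frequent byte value in the chunk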

For example, assuming the OpenCV library and ctypes-opencv wrapper, add
the following before the file I/O loop:

from opencv import *

# Histogram for each file chunk
hist = cvCreateHist([256], CV_HIST_ARRAY, [(0,256)])

then, replace (using one of your posted methods as a sample):

datamap = { }
for i in data:
    datamap[i] = datamap.get(i, 0) + 1

array = sorted([(b, a) for (a, b) in datamap.items()], reverse=True)
most = ord(array[0][1])

with:

matrix = cvMat(1, len(data), CV_8UC1, data)
cvCalcHist([matrix], hist)
most = cvGetMinMaxHistValue(hist,
                            min_val = False, max_val = False,
                            min_idx = False, max_idx = True)

should give you your results in a fraction of the time.  I didn't run
with a full size data file, but for a smaller one using smaller chunks
the OpenCV variant ran in about 1/10 of the time, and that was while
leaving all the other remaining Python code in place.

Note that it may not be identical results to some of your other
methods in the case of multiple values with the same counts, as the
OpenCV histogram min/max call will always pick the lower value in such
cases, whereas some of your code (such as above) will pick the upper
value, or your original code depended on the order of information
returned by dict.items.

This sort of small dedicated high performance choke point is probably
also perfect for something like Pyrex/Cython, although that would
require a compiler to build the extension for the histogram code.

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.0 automatic decoding of UTF16

2008-12-06 Thread David Bolen
Johannes Bauer <[EMAIL PROTECTED]> writes:

> This is very strange - when using "utf16", endianness should be detected
> automatically. When I simply truncate the trailing zero byte, I receive:

Any chance that whatever you used to "simply truncate the trailing
zero byte" also removed the BOM at the start of the file?  Without it,
utf16 wouldn't be able to detect endianness and would, I believe, fall
back to native order.
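
A quick illustration under Python 3 (the BOM bytes shown assume a
little-endian build):

data = 'hi'.encode('utf-16')         # BOM (b'\xff\xfe' here) + encoded text
print(data.decode('utf-16'))         # BOM present: endianness detected
print(data[2:].decode('utf-16-le'))  # BOM stripped: byte order must be explicit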

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: [py2exe] What to download when updating?

2008-04-27 Thread David Bolen
Gilles Ganault <[EMAIL PROTECTED]> writes:

> Hello
>
>   Out of curiosity, if I recompile a Python (wxPython) app with
> py2exe, can I have customers just download the latest .exe, or are
> there dependencies that require downloading the whole thing again?

It will depend on what you changed in your application.  The most
likely file that will change is your library.zip file since it has all
of your Python modules.  I believe that with py2exe the main exe is
typically a standard stub, so it need not change, but it can if the
top level script is named differently since it has to execute it.

The other files are binary dependencies, so you may add or remove them
during any given build process depending on what modules you may newly
import (or have removed the use of).

In the end, just comparing the prior version distribution tree to the
new version is probably simplest.  But then you'd need
to package up an installer that did the right thing on the target
system.

To be honest, just packaging it up as a new version and putting it
into a standard installer (as with InnoSetup or NSIS) and letting the
installer keep track of what to do when installing the new version on
top of an existing version is generally simplest overall, albeit
larger.

But during internal development or other special cases, I've
definitely just distributed updated library.zip files without any
problem.

-- David
--
http://mail.python.org/mailman/listinfo/python-list


Re: is there enough information?

2008-03-04 Thread David Bolen
Dennis Lee Bieber <[EMAIL PROTECTED]> writes:

> On Mon, 3 Mar 2008 08:11:43 -0500, Jean-Paul Calderone
> <[EMAIL PROTECTED]> declaimed the following in comp.lang.python:
>
>> I'm not sure, but you seem to be implying that the only way to use Windows'
>> asynchronous I/O APIs is with threads.  Actually, it is possible (and Twisted
>> allows you) to use these as well without writing a threaded application.
>>
>   I only pointed out that, on Windows, one can not use the common
> /select()/ function with files. And one rarely sees examples of coding a
> Twisted-style (emphasis on style) asynchronous callback system mixing
> files and network sockes using the Windows-specific API.
>
>   If using threads, the Windows asynchronous I/O isn't needed... let
> the thread block until the I/O completes, then transfer the data (or a
> message that the data is available) back to the main processing
> thread...

You're probably right that it's rare, but when needed, using the
Windows asynchronous/overlapping API can provide a better solution
than blocking threads depending on the needs at hand, and without
involving any callbacks or Twisted-style programming.

An example of mine is high performance serial port handling as part of
a custom FHSS wireless adapter with a serial port interface to the PC.
In this case, minimizing I/O latency was crucial since delays could
mean missing a broadcast timeslot (about 15ms) on the wireless
network.  A serial port isn't a disk file, but certainly a "file" in
the context of Windows handles.

Early implementations used independent threads for reading/writing to
the serial port and blocking during such operations, but that turned
out to have an undesirable amount of latency, and was also difficult
to interrupt when the threads were in a blocked condition.

Instead I created a single thread that had a loop using overlapped I/O
simultaneously in each direction as well as native Windows event
objects for aborting or signaling that there was additional data to be
written (the pending read I/O handled the read case).  The main loop
was just a WaitForMultipleObjects to handle any of the I/O completion
indications, requests for more I/O or aborts.  It was very high
performing (low latency) with low CPU usage - measurably better than a
multi-threaded version.
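
Something like this, as a very rough sketch of that loop's shape (the
real version issues the overlapped I/O requests and uses the events
from the OVERLAPPED structures; here they're just created locally):

import win32event

read_evt  = win32event.CreateEvent(None, 0, 0, None)
write_evt = win32event.CreateEvent(None, 0, 0, None)
send_evt  = win32event.CreateEvent(None, 0, 0, None)
abort_evt = win32event.CreateEvent(None, 0, 0, None)
handles = [read_evt, write_evt, send_evt, abort_evt]

win32event.SetEvent(abort_evt)     # just so this stand-alone sketch exits

while True:
    rc = win32event.WaitForMultipleObjects(handles, 0, win32event.INFINITE)
    which = rc - win32event.WAIT_OBJECT_0
    if which == 3:                 # abort requested
        break
    elif which == 0:               # overlapped read completed
        pass                       # hand data to the buffer, re-issue the read
    else:                          # write completed, or more data to send
        pass                       # issue the next overlapped write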

Communication with the rest of the application was through a
thread-safe bi-directional buffer object, also using native Win32
event objects.  It worked similar to a queue, but by using the native
event objects I didn't have the performance inefficiencies for reads
with timeouts of the Python objects.  The underlying Python primitives
don't have the timeout capability built in, so reads with timeouts get
implemented through checks for data interspersed with increasing
sleeps, which adds unnecessary latency.

Anyway, it worked extremely well, and was a much better fit for my
needs than a multi-threaded version with blocking I/O, without it
having to be Twisted-style.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Low-overhead GUI toolkit for Linux w/o X11?

2007-11-03 Thread David Bolen
David Bolen <[EMAIL PROTECTED]> writes:

> When I was looking for an embedded graphics library for a prior
> platform (ELAN 486, 2MB flash, 6MB RAM) under DOS, we took a look at
> these:
>
>   * GRX (http://grx.gnu.de/index.html)
(...)
> There aren't any Python wrappers for GRX, but the library is straight
> C which should be easy to wrap (manually or with something like SWIG).
> No built-in widget support at all (some sample button processing code
> in a demo module), but easy enough to implement your own if your needs
> are modest.

I had forgotten, since we didn't use it, but there is an external mGui
library (http://web.tiscalinet.it/morello/MGui/index.html) that can
layer on top of GRX to provide higher level functionality.  Of course,
it would also have to be wrapped for use from Python.

-- David

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Low-overhead GUI toolkit for Linux w/o X11?

2007-11-03 Thread David Bolen
Grant Edwards <[EMAIL PROTECTED]> writes:

> I'm looking for GUI toolkits that work with directly with the
> Linux frambuffer (no X11).  It's an embedded device with
> limited resources, and getting X out of the picture would be a
> big plus.

Sounds like a reasonably modern "embedded" system since traditionally
neither X (nor Python) would likely have even been plausible in such
environments.  Depending on the higher level GUI functionality you
require and how tight the resources really are, you might want to
consider investigating pure drawing libraries and then implement any
missing GUI elements (widgets and mouse handling) you need yourself.

When I was looking for an embedded graphics library for a prior
platform (ELAN 486, 2MB flash, 6MB RAM) under DOS, we took a look at
these:

  * GRX (http://grx.gnu.de/index.html)
  * Allegro (http://alleg.sourceforge.net/)

We ended up using GRX, primarily because it was the simplest to
develop a custom video driver for to match our platform, along with
having a simpler core.  We were under DOS but also used it with a
later generation of the platform under Linux.  Both libraries support
operation over the framebuffer in Linux.  Our app was in C++ (Python
wasn't an option), and we implemented our own buttons and text
widgets (in our case we never needed any scrolling widgets).

There aren't any Python wrappers for GRX, but the library is straight
C which should be easy to wrap (manually or with something like SWIG).
No built-in widget support at all (some sample button processing code
in a demo module), but easy enough to implement your own if your needs
are modest.

Although we didn't end up using it, Allegro is more fully featured
(actually with some non-GUI cruft too since it targets games), and
also appears to have two work-in-progress Python bindings.  Some basic
widget support in dialog processing routines.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Co-developers wanted: document markup language

2007-09-01 Thread David Bolen
Roy Smith <[EMAIL PROTECTED]> writes:

> Anybody remember Scribe?

(raising hand)

OT, but I still have a bunch of Scribe source documents from college.

Of course, as I attended CMU where it originated I suppose that's not
unusual.  Definitely pre-WYSIWYG, but one of the first to separate
presentation markup from structure (very much in line with later stuff
like SGML from IBM although I don't recall the precise timing relation
of the two), including the use of styles.

I personally liked it a lot (I think the markup syntax is easier on
the eyes than the *ML family).  If I remember correctly, for a while
there, it was reasonably common to see Scribe-like markup in
newsgroups (e.g., "@begin(flame)" and "@end(flame)" or "@b[emphasis]")
before SGML/XML/HTML became much more common ("<flame> ... </flame>").

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: HowTo Use Cython on a Windows XP Box?

2007-08-31 Thread David Bolen
David Lees <[EMAIL PROTECTED]> writes:

> Yes, you are correct in understanding my question.  I thought my post
> was clear, but I guess not.  I will go try the pyrex list.

You might also try looking for references to distutils support for
non-MS compilers, since Pyrex (and presumably Cython) uses distutils
under the covers to build the final extension.  I'm pretty sure there
is support in recent Python releases for using mingw rather than MSVC
for most extensions (there may be problems with using certain Python
APIs that depend on specific C RTL structures like files).
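
Roughly what a Pyrex setup.py looks like (the module name is
arbitrary); selecting mingw is then just a build_ext option:

# setup.py - build with:  python setup.py build_ext --compiler=mingw32
from distutils.core import setup
from distutils.extension import Extension
from Pyrex.Distutils import build_ext

setup(name='example',
      ext_modules=[Extension('example', ['example.pyx'])],
      cmdclass={'build_ext': build_ext})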

As to using VC, yes, it does have to be VC 7.1, e.g,. Visual Studio
2003.  You can't use 2005, as MS didn't maintain runtime
compatibility.  I'm sure there are a number of threads about that also
available.  If I recall correctly, VC 7.1 began to be used in the 2.4
timeframe - although it was getting discussed back when 2.3 was
getting released, based on an offer Microsoft had made to provide
copies to core developers.  The discussions are archived, but VC 6 was
definitely long in the tooth at that point.  As the development tools
aren't free, they haven't been upgraded past that point to date.  It's
unfortunate that when MS changed the main runtime DLL with VC 7 (for
the first time in a pretty long time), that they then did so
immediately again (and incompatibly) with VC 8.

At the time, there were also efforts with some success to use the free
toolkit MS made available (although I think it was sans optimizer),
but then I think that got pulled and/or it became more difficult to
find/use, but my memory is fuzzy.

You mention having VS 2005 - if so, do you also have an MSDN
subscription?  I believe you should still be able to get VS 2003 via
that route if you first started with 2005 and thus never had 2003.  If
not, the mingw approach may be your best bet.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unzip: Memory Error

2007-08-30 Thread David Bolen
I wrote:

> Here's a small example of a ZipFile subclass (tested a bit this time)
> that implements two generator methods:

Argh, not quite tested enough - one fix needed, change:

if bytes[-1] not in ('\n', '\r'):
    partial = lines.pop()

to:

if bytes[-1] not in ('\n', '\r'):
    partial = lines.pop()
else:
    partial = ''

(add the extra two lines)

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unzip: Memory Error

2007-08-30 Thread David Bolen
David Bolen <[EMAIL PROTECTED]> writes:

> If you are going to read the file data incrementally from the zip file
> (which is what my other post provided) you'll prevent the huge memory
> allocations and risk of running out of resource, but would have to
> implement your own line ending support if you then needed to process
> that data in a line-by-line mode.  Not terribly hard, but more
> complicated than my prior sample which just returned raw data chunks.

Here's a small example of a ZipFile subclass (tested a bit this time)
that implements two generator methods:

read_generator  Yields raw data from the file
readline_generator  Yields "lines" from the file (per splitlines)

It also corrects my prior code posting which didn't really skip over
the file header properly (due to the variable sized name/extra
fields).  Needs Python 2.3+ for generator support (or 2.2 with
__future__ import)

Peak memory use is set "roughly" by the optional chunk parameter.
It's rough since that's the compressed chunk size, which will grow in
memory during decompression.  And the readline generator adds further copies
for the data split into lines.

For your file processing by line, it could be used as in:

zipf = ZipFileGen('somefile.zip')

g = zipf.readline_generator('somefilename.txt')
for line in g:
    dealwithline(line)

zipf.close()

Even if not a perfect match, it should point you further in the right
direction.

-- David

  - - - - - - - - - - - - - - - - - - - - - - - - -

import zipfile
import zlib
import struct

class ZipFileGen(zipfile.ZipFile):

    def read_generator(self, name, chunk=65536):
        """Return a generator that yields file bytes for name incrementally.
        The optional chunk parameter controls the chunk size read from the
        underlying zip file.  For compressed files, the data length returned
        by the generator will be larger, as it is the decompressed version
        of a chunk.

        Note that unlike read(), this method does not preserve the internal
        file pointer and should not be mixed with write operations.  Nor does
        it verify that the ZipFile is still opened and for reading.

        Multiple generators returned by this function are not designed to be
        used simultaneously (they do not re-seek the underlying file for
        each request)."""

        zinfo = self.getinfo(name)
        compressed = (zinfo.compress_type == zipfile.ZIP_DEFLATED)
        if compressed:
            dc = zlib.decompressobj(-15)

        self.fp.seek(zinfo.header_offset)

        # Skip the file header (from zipfile.ZipFile.read())
        fheader = self.fp.read(30)
        if fheader[0:4] != zipfile.stringFileHeader:
            raise zipfile.BadZipfile, "Bad magic number for file header"

        fheader = struct.unpack(zipfile.structFileHeader, fheader)
        fname = self.fp.read(fheader[zipfile._FH_FILENAME_LENGTH])
        if fheader[zipfile._FH_EXTRA_FIELD_LENGTH]:
            self.fp.read(fheader[zipfile._FH_EXTRA_FIELD_LENGTH])

        # Process the file incrementally
        remain = zinfo.compress_size
        while remain:
            bytes = self.fp.read(min(remain, chunk))
            remain -= len(bytes)
            if compressed:
                bytes = dc.decompress(bytes)
            yield bytes

        if compressed:
            bytes = dc.decompress('Z') + dc.flush()
            if bytes:
                yield bytes


    def readline_generator(self, name, chunk=65536):
        """Return a generator that yields lines from a file within the zip
        incrementally.  Line ending detection based on splitlines(), and
        like file.readline(), the returned line does not include the line
        ending.  Efficiency not guaranteed if used with non-textual files.

        Uses a read_generator() generator to retrieve file data incrementally,
        so it inherits the limitations of that method as well, and the
        optional chunk parameter is passed to read_generator unchanged."""

        partial = ''
        g = self.read_generator(name, chunk=chunk)

        for bytes in g:
            # Break current chunk into lines
            lines = bytes.splitlines()

            # Add any prior partial line to first line
            if partial:
                lines[0] = partial + lines[0]

            # If the current chunk didn't happen to break on a line ending,
            # save the partial line for next time
            if bytes[-1] not in ('\n', '\r'):
                partial = lines.pop()

            # Then yield the lines we've identified so far
            for curline in lines:
                yield curline

        # Return any trailing data (if file didn't end in a line ending)
        if partial:
            yield partial
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a multi-tier client/server application

2007-08-30 Thread David Bolen
Jeff <[EMAIL PROTECTED]> writes:

> David: Sounds like a pretty interesting app.  Thanks for the in-depth
> description.  I went and checked out Twisted PB, and it seems
> awesome.  I may very well go with that.  How was writing code with
> it?  I may also end up using py2app, but I'm also going to have to
> support Windows, (p2exe, then), and possibly Linux.  Well, maybe not
> Linux, but I'll probably be doing most of the development in Linux, so
> I guess that counts.

I find PB very easy, but it's important to first become familiar with
Twisted (in particular Deferred's), which can have a steep, but worth
it IMO, learning curve.  PB is a thin, transparent system, so it
doesn't try to hide the fact that you are working remotely.  Being
thin, there also isn't very much to have to learn.
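
To give a feel for the flavor, a bare-bones (untested here) exchange as
two separate scripts; the "echo" method name is just an example:

# server.py
from twisted.spread import pb
from twisted.internet import reactor

class Root(pb.Root):
    def remote_echo(self, msg):        # callable from clients as "echo"
        return msg

reactor.listenTCP(8789, pb.PBServerFactory(Root()))
reactor.run()

# client.py
from twisted.spread import pb
from twisted.internet import reactor

factory = pb.PBClientFactory()
reactor.connectTCP('localhost', 8789, factory)
d = factory.getRootObject()
d.addCallback(lambda root: root.callRemote('echo', 'hello'))
d.addCallback(lambda result: reactor.stop())
reactor.run()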

For packaging, you don't have to use a single system if you are
multi-platform.  Your codebase can be common, and just have separate
setup files using py2app on OS X and py2exe on Windows.  A makefile or
equivalent can handle final distribution packaging (e.g., hdiutil for
dmg on OS X, Inno Setup, NSIS, etc... on Windows).  You'll spend some
platform-specific time getting the initial stuff setup, but then new
builds should be easy.

For Linux, depending on the level of your users you can either just
directly ship something like eggs (generated through a setup) or look
into pyInstaller, which was the old Installer package that also
supports single-exe generation for Linux.  pyInstaller also does
Windows, so if you have to support them both you could try using
pyInstaller rather than both it and py2exe.

But if you're just developing in Linux, final packaging probably isn't
very important.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unzip: Memory Error

2007-08-30 Thread David Bolen
mcl <[EMAIL PROTECTED]> writes:

> pseudo code
>
> zfhdl = zopen(zip,filename)  # Open File in Zip Archive for
> Reading
>
> while True:
> ln = zfhdl.readline()# Get nextline of file
> if not ln:   # if EOF file
>   break
> dealwithline(ln) # do whatever is necessary with
> file
> zfhdl.close
>
> That is probably over simplified, and probably wrong but you may get
> the idea of what I am trying to achieve.

Do you have to process the file as a textual line-by-line file?  Your
original post showed code that just dumped the file to the filesystem.
If you could back up one step further and describe the final operation
you need to perform it might be helpful.

If you are going to read the file data incrementally from the zip file
(which is what my other post provided) you'll prevent the huge memory
allocations and risk of running out of resource, but would have to
implement your own line ending support if you then needed to process
that data in a line-by-line mode.  Not terribly hard, but more
complicated than my prior sample which just returned raw data chunks.

Depending on your application need, it may still be simpler to just
perform an extraction of the file to temporary filesystem space (using
my prior code for example) and then open it normally.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unzip: Memory Error

2007-08-29 Thread David Bolen
mcl <[EMAIL PROTECTED]> writes:

> I am trying to unzip an 18mb zip containing just a single 200mb file
> and I get a Memory Error.  When I run the code on a smaller file 1mb
> zip, 11mb file, it works fine.
(...)
> def unzip_file_into_dir(file, dir):
>   #os.mkdir(dir, 0777)
>   zfobj = zipfile.ZipFile(file)
>   for name in zfobj.namelist():
>   if name.endswith('/'):
>   os.mkdir(os.path.join(dir, name))
>   else:
>   outfile = open(os.path.join(dir, name), 'wb')
>   outfile.write(zfobj.read(name))
>   outfile.close()

The "zfobj.read(name)" call is reading the entire file out of the zip
into a string in memory.  It sounds like it's exceeding the resources
you have available (whether overall or because the Apache runtime
environment has stricter limits).

You may want to peek at a recent message from me in the "Unable to
read large files from zip" thread, as the suggestion there may also be
suitable for your purposes.

http://groups.google.com/group/comp.lang.python/msg/de04105c170fc805?dmode=source
-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a multi-tier client/server application

2007-08-29 Thread David Bolen
Jeff <[EMAIL PROTECTED]> writes:

> reasons, not the least of which is that I've been working almost
> entirely on web apps for the past few years, and I am getting mighty
> sick of it.  A lot of that is due to the language (PHP, which I have
> since grown to hate) I had to use.  I've worked on a site for my self
> in Python (using Pylons, actually--which is excellent) which was
> vastly easier and more fun.  But I'd really like to try something
> different.

To contribute a data point on your original question - I've a similar
(structurally, not functionally) system I completed recently.

Without trying to get too mired in the thick client v. web application
debate, there were a handful of points that decided me in favor of the
thick client:

* Needed to automate QuickTime viewer for video previews and extraction
  of selected frames to serve as thumbnails on web approval page.
* Needed to control transfers to server of multiple very large files
  (hundreds of MBs to GBs at a shot)

But assuming a thick client, in terms of your original question of
components to use, here's what I've got.  My primary networking component
is Twisted.

The pieces are:

Client (OS X Cocoa application):

* PyObjC based.  Twisted for networking, Twisted's PB for the primary
  management channel, with an independent direct network connections for
  bulk file transfers.  (I needed to go Mac native for clean integration of
  QuickTime UI elements including frame extraction to thumbnails)

Server:

* Twisted for networking.  PB and raw connections for clients, web server
  through twisted.web.  Genshi for web templating, with Mochikit (might
  move to JQuery) for client-side JS/AJAX.  Twisted for email transmission
  (email construction using normal Python email package).  Small UI
  front-end module (Cocoa/PyObjC).

The client accesses server-based objects through Twisted PB, which for some
of the server objects also control session change lifetime (transactions).
So at least in my case, having a stateful connection from the client worked
out well, particularly since I needed to coordinate both database changes
as well as filesystem changes through independent file uploads, each of
which can fail independently.
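
A rough sketch of that PB arrangement (the class and method names here are
made up for illustration, not the application's actual API):

from twisted.spread import pb
from twisted.internet import reactor

class SessionRoot(pb.Root):
    """Server-side object the client retrieves and calls into."""
    def remote_begin_upload(self, filename, size):
        # ... validate, open a session/transaction, return a transfer token ...
        return 'ok'

reactor.listenTCP(8800, pb.PBServerFactory(SessionRoot()))
reactor.run()

On the client side, pb.PBClientFactory plus getRootObject() hands back a
remote reference whose callRemote('begin_upload', ...) returns a deferred,
which is where the stateful session control mentioned above comes together.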
  
Right now a single server application contains all support for client
connections as well as the web application, but I could fracture that (so
the web server was independent for example) if needed.

For the client, I package it using py2app, and put into an normal Mac
installer, and distribute as a dmg.  If it were a Windows client, I'd
probably wrap with py2exe, then Inno Setup.  The server's web server has a
restricted URL that provides access to the DMG.  The client has a Help menu
item taking users to that URL.  Clients are versioned and accepted/rejected
by the server during initial connection - from the server side I can
"retire" old client versions, at which point users get a message at signon
with a button to take them to the download page for the latest DMG.

So right now upgrades are executed manually by the user, and I can support
older clients during any transition period.  I may provide built-in support
for automatically pulling down the new image and executing its installer,
but haven't found it a hardship yet.  I probably won't bother trying to
automate smaller levels of updates.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unable to read large files from zip

2007-08-29 Thread David Bolen
Nick Craig-Wood <[EMAIL PROTECTED]> writes:

> Kevin Ar18 <[EMAIL PROTECTED]> wrote:
>> 
>>  I posted this on the forum, but nobody seems to know the solution: 
>> http://python-forum.org/py/viewtopic.php?t=5230
>> 
>>  I have a zip file that is several GB in size, and one of the files inside 
>> of it is several GB in size.  When it comes time to read the 5+GB file from 
>> inside the zip file, it fails with the following error:
>>  File "...\zipfile.py", line 491, in read bytes = 
>> self.fp.read(zinfo.compress_size)
>>  OverflowError: long int too large to convert to int
>
> That will be an number which is bigger than 2**31 == 2 GB which can't
> be converted to an int.
>
> It would be explained if zinfo.compress_size is > 2GB, eg
>
>   >>> f=open("z")
>   >>> f.read(2**31)
>   Traceback (most recent call last):
> File "", line 1, in ?
>   OverflowError: long int too large to convert to int
>
> However it would seem nuts that zipfile is trying to read > 2GB into
> memory at once!

Perhaps, but that's what the read(name) method does - returns a string
containing the contents of the selected file.  So I think this runs
into a basic issue of the maximum length of Python strings (at least
in 32bit builds, not sure about 64bit) as much as it does an issue
with the zipfile module.  Of course, the fact that the only "read"
method zipfile has is to return the entire file as a string might
be considered a design flaw.

For the OP, if you know you are going to be dealing with very large
files, you might want to implement your own individual file
extraction, since I'm guessing you don't actually need all 5+GB of the
problematic file loaded into memory in a single I/O operation,
particularly if you're just going to write it out again, which is what
your original forum code was doing.

I'd probably suggest just using the getinfo(name) method to return the
ZipInfo object for the file in question, then process the appropriate
section of the zip file directly.  E.g., just seek to the proper
offset, then read the data incrementally up to the full size from the
ZipInfo compress_size attribute.  If the files are compressed, you can
incrementally hand their data to the decompressor prior to other
processing.

E.g., instead of your original:

fileData = dataObj.read(i)
fileHndl = file(fileName,"wb")
fileHndl.write(fileData)
fileHndl.close()

something like (untested):

import struct, zlib, zipfile    # imports needed by the snippet below

CHUNK = 65536                   # I/O chunk size

fileHndl = file(fileName,"wb")

zinfo = dataObj.getinfo(i)
compressed = (zinfo.compress_type == zipfile.ZIP_DEFLATED)
if compressed:
    dc = zlib.decompressobj(-15)    # raw deflate stream (no zlib header)

# Seek past the fixed 30-byte local file header plus the variable
# length filename and extra fields to reach the member's data
dataObj.fp.seek(zinfo.header_offset + 26)
namelen, extralen = struct.unpack('<HH', dataObj.fp.read(4))
dataObj.fp.seek(zinfo.header_offset + 30 + namelen + extralen)

remain = zinfo.compress_size
while remain:
    bytes = dataObj.fp.read(min(remain, CHUNK))
    remain -= len(bytes)
    if compressed:
        bytes = dc.decompress(bytes)
    fileHndl.write(bytes)

if compressed:
    bytes = dc.decompress('Z') + dc.flush()    # dummy byte flushes the tail
    if bytes:
        fileHndl.write(bytes)

fileHndl.close()

Note the above assumes you are only reading from the zip file as it
doesn't maintain the current read() method invariant of leaving the
file pointer position unchanged, but you could add that too.  You
could also verify the file CRC along the way if you wanted to.

Might be even better if you turned the above into a generator, perhaps
as a new method on a local ZipFile subclass.  Use the above as a
read_gen method with the write() calls replaced with "yield bytes",
and your outer code could look like:

fileHndl = file(fileName,"wb")
for bytes in dataObj.read_gen(i):
    fileHndl.write(bytes)
fileHndl.close()

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How properly manage memory of this PyObject* array?? (C extension)

2006-07-19 Thread David Bolen
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:

> > *WRONG*. The object exists in and of itself. There may be one *or more*
> > references to it, via pointers, scattered about in memory; they are
> > *NOT* components of the object. A reference count is maintained inside
> > the object and manipulated by Py_INCREF etc. The Python garbage
> > collector knows *nothing* about the memory occupied by those pointers;
> 
> John - Thanks again,
> I think I just learned something that is important.  If I free memory
> associated
> with pointers to Python objects it will NOT erase the Python objects!!!
> I can merrily follow your orders to free(my_array); without worrying
> about
> nuking the Python objects too early!  Thanks for pointing out that
> reference count is maintained inside the Python object itself.

This may be clear from the thread, but since you don't mention it, you
must also have issued the Py_DECREF for each object reference in
my_array at the point that you're planning on freeing it.

Otherwise, while you will have freed up your own local memory
allocation and no longer make use of the object references (pointers)
previously there, the Python memory manager doesn't know that - it
still has a ref count for your prior references - and the objects
themselves will never be freed.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Replacing utf-8 characters

2005-10-05 Thread David Bolen
Mike <[EMAIL PROTECTED]> writes:

> What you and I typed was ascii. The value of link came from importing
> that utf-8 web page into that variable.  That is why I think it is not
> working.  But not sure what the solution is.

Are you sure you're asking what you think you are asking?  Both the
ampersand character (&) and the characters within the ampersand entity
character reference (&amp;) are ASCII.  As it turns out they are also
legal UTF-8, but I would not call a web page UTF-8 just because I saw
the sequence of characters "&amp;" within the stream.  (That's not to
say it isn't UTF-8 encoded, just that I don't think that's the issue)

I'm just guessing, but you do realize that legal HTML should quote all
uses of the ampersand character with an entity reference, since the
ampersand itself is reserved for use in such references.  This
includes URL references whether inside attributes or in the body of
the text.

So when you see something in a browser in a web page that shows a URL
that includes "&" such as for separating parameters, internally that
page is (or should be) stored with "&amp;" for that character.  Thus
if you retrieve the page in code, that's what you'll find.  It's the
browser processing that entity reference that turns it back into the
"&" for presentation.

Note that whether or not the page in question is encoded as UTF-8 is a
completely distinct question - whatever encoding the page is in would
be used to encode the characters in the entity reference (namely
"&").

I'm assuming that in scraping the page you want to reverse the process
(e.g., perform the interpretation of the entity references much as a
browser would) before using that URL for other purposes.  If so, the
string replacement you tried should handle the replacement just fine,
at least within the value of the URL as managed by your code.
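
Concretely, something along these lines (a tiny sketch with a made-up URL;
the quoted form is what appears in the page source, and the replacement
recovers what the browser shows):

link = 'http://example.com/cgi?a=1&amp;b=2'   # as scraped from the page source
url = link.replace('&amp;', '&')              # what the browser displays/uses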

You then mention it being the same when you view the contents of the
link, which isn't quite clear to me, but if that means retrieving
another copy of the link as embedded in an HTML page then yes, it'll
get quoted again since as initially, you have to quote an ampersand
as an entity reference within HTML.

What did you mean by "view the contents link"?

-- David

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Bug on Python2.3.4 [FreeBSD]?

2005-08-12 Thread David Bolen
Uwe Mayer <[EMAIL PROTECTED]> writes:

> AFAICT there seems to be a bug on FreeBSD's Python 2.3.4 open function. The
> documentation states:
> 
> > Modes 'r+', 'w+' and 'a+' open the file for updating (note that 'w+'
> > truncates the file). Append 'b' to the mode to open the file in binary
> > mode, on systems that differentiate between binary and text files (else it
> > is ignored). If the file cannot be opened, IOError is raised.   
> 
> Consider:
> 
> $ cat test
> lalala
> 
> $ python2.3
> Python 2.3.4 (#2, Jan  4 2005, 04:42:43)
> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
> Type "help", "copyright", "credits" or "license" for more information.
> >>> f = open('test', 'r+')
> >>> f.read()
> 'lalala\n'
> >>> f.write('testing')
> >>> f.close()
> >>>
> [1]+  Stopped python2.3
> $ cat test
> lalala
> 
> -> write did not work; ok

Strange, I tried this with Python 2.3.3 and 2.3.5 on two FreeBSD 4.10
systems and it seemed to append to the file properly in both cases.
Going back further, it also worked with Python 2.2.2 on a FreeBSD 4.7
system.  I don't happen to have a 2.3.4 installation, but can't
see any changes to the source for the file object between 2.3.4 and
2.3.5, for example.

~> python
Python 2.3.5 (#2, May  5 2005, 11:11:17)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('test','r+')
>>> f.read()
'lalala\n'
>>> f.write('testing')
>>> f.close()
>>>
~> cat test
lalala
testing  # no newline was present

Which version of FreeBSD are you running?  I thought it might be a
dependency on needing to seek between reads and writes on a duplex
stream (which is ANSI), but FreeBSD doesn't require that, at least
back as far as a 4.7 system I have, and I assume much earlier than
that.

One dumb question - are you absolutely sure it wasn't appending?  As
written, there's no trailing newline on the file, so your final "cat
test" would produce output where the "testing" was on the same line as
your next command prompt, and can sometimes be missed visually.

> Can anyone confirm that? Is there any other way of opening a file for
> appending instead of a+? 

Well, if you really just want appending, I'd just use "a".  It creates
the file if necessary but always appends to the end.  Of course, it's
not set up for reading, but you wouldn't need that for appending.
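
For reference, the portable form of the r+ sequence adds an explicit
positioning call when switching from reading to writing (a small sketch):

f = open('test', 'r+')
data = f.read()
f.seek(0, 2)      # ANSI C requires a seek (or similar) between read and write
f.write('testing')
f.close()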

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: debugger?

2005-07-06 Thread David Bolen
Qiangning Hong <[EMAIL PROTECTED]> writes:

(...)
> However, while I use pdb or inserting "print" statement to debug my
> apps, sometimes it is a pain.  I think I need a good GUI debugger to
> help me.  The debugger should meet _most_ of the following
> requirements:
> 
> 1. can debug wxPython applications (and other GUI lib).
> 2. an intuitive way to set/clear/enable/disable breakpoints.
> 3. can set conditional breakpoints (i.e. break when some condition satisfied).
> 4. variable watch list, namescope watching (local, global)
> 5. evaluate expression, change variable values, etc within debugging.
> 6. change the running routine, (i.e. go directly to a statement, skip
> some statements, etc)
> 7. clever way to express objects, not just a string returned by repr()
> 8. perform profiling
> 9. a clear interface.
> 10. cross-platform.
> 11. free, or better, open source.

Although we typically use unit tests and 'print' debugging, I settled
on Wing IDE as having the best debugger for the times when something
more was needed.  It's not free (pretty reasonable cost for an IDE
though), but otherwise I think would meet your other points, except
perhaps for profiling.  It's easy enough to grab an evaluation version
to try out (http://www.wingide.com).

For us, a big point was wxPython debugging, and being able to stop at
exceptions within wxPython event handlers.  Interestingly enough,
that's seems to be a tough requirement for many of the existing
debuggers because the exceptions occur in code that has been called
out to from within a C++ layer, and thus have to be caught before the
C++ layer gets a chance to clear the exception.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re:

2005-07-05 Thread David Bolen
[EMAIL PROTECTED] (Roy Smith) writes:

(...)
> We've got code coveage tools.  This is a testing tool.  You keep
> running tests and it keeps track of which lines of code are executed
> (i.e. which logic branches are taken).  One theory of testing says you
> should keep writing test cases until you've exercised every branch.  I
> don't see any reason such a tool wouldn't be useful in a big Python
> project, but I'm not aware of any.

The coverage.py module (http://www.garethrees.org/2001/12/04/python-coverage)
has worked pretty well for us.  Just run your unit test suite under its
control.

There's also http://www.softwareverify.com/pythonCoverageValidator
which is commercial.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py2exe + svn - the final drama

2005-05-06 Thread David Bolen
Timothy Smith <[EMAIL PROTECTED]> writes:

> what i do is as soon as the update is complete i close the app, but it
> still gives the error, i tried clear() after update and before it, it
> still got the same error. it's be nice to not have to fiddle around
> with the zip file, i really think making py2exe create a dir instead
> of a zip will be much better

Well, you'd still potentially have a problem if the update changed a
file in that directory that hadn't been imported yet, but now depended
on other updated files that your application had already loaded old
versions for.  That's a general problem of updating modules beneath
the executing application, and not really specific to the zip file,
although you're getting a zip importer specific error related to that
in this case.

> here what i do anyway
> 
> if (os.name == 'nt') or (os.name == 'win32'):
>     client = pysvn.Client()
>     # get current revision number
>     CurrentRev = client.info('').revision.number
>     Check = client.update('')
>     sys.path_importer_cache.clear()
>     if Check.number > CurrentRev:
>         self.Popup('Update installed, click ok and restart ',
>                    'Update installed')
>         self.Destroy()
>     else:
>         InfoMsg.Update(3, 'No Updates needed')

Ah, it's more devious than I thought.  Just pointed out the other
missing piece in his response.

Apparently there are two levels of caching that you've got to defeat
if you change the underlying zip:

1. A global set of file directory cache information for any opened
   zip file (for all files in the zip).  This is held in the zipimport
   module global _zip_directory_cache.
2. Individual file cached information within the zipimporter instance
   that is kept in the path importer cache (sys.path_importer_cache).
   Technically these are just references to the same individual entries
   being held in the dictionary from (1).

So when you cleared out (2), it still found the cached directory at
the zipimport module level and re-used that information.  But if you only
clear out (1), then the reference in (2) to the directory entries for
currently imported modules remains and still gets used.

I tried testing this with a small zip file that I first built with normal
compression on the entries, then imported one from a running interpreter,
and then rebuilt the zip without compression.  I couldn't seem to get the
precise error you were getting, but doing this gave me a decompression
error upon an attempted reload of an imported module, since the cached
information still thought it was compressed.

After clearing both sys.path_importer_cache and
zipimport._zip_directory_cache, the reload went fine.
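
In code form, the full flush after the zip file has changed amounts to
something like this (note the second cache is a private zipimport attribute,
so this is version-dependent):

import sys, zipimport

sys.path_importer_cache.clear()
zipimport._zip_directory_cache.clear()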

It's sort of unfortunate that you have to cheat with the "private"
cache clearing in this case.  It might be worth an enhancement request
to see if zipimport could know to update itself if the timestamp on
the zip file changes, but this is sort of a very specialized scenario.
Although maybe just a public way to cleanly flush import cache
information would be useful.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py2exe + svn - the final drama

2005-05-06 Thread David Bolen
Just <[EMAIL PROTECTED]> writes:

> the zipimport module has an attr called _zip_directory_cache, which is a 
> dict you can .clear(). Still, reloading modules is hairy at best, its 
> probably easiest to relaunch your app when the .zip file has changed.

Except that he's getting an error during the process exit of the
current execution, which is needed to restart.  And if he updates to a
different copy, there's the bootstrap problem of how to get it back
into the standard location for the next restart since his application
will need to have it to restart in the first place.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py2exe + svn - the final drama

2005-05-06 Thread David Bolen
Timothy Smith <[EMAIL PROTECTED]> writes:

> Timothy Smith wrote:
(...)
> >zipimport.ZipImportError: bad local file header in Z:\temp\library.zip
> >
> > not that once i have finished client.update(''), it has successfully
> > updated the zipfile, i open a dialoge box saying "click ok and
> > restart program" AFTER i click on the above error pops up and my app
> > shuts down as intended.
> >
> >ideas?
> >
> ok i have done some digging and i found this
> 
> /* Check to make sure the local file header is correct */
>   fseek(fp, file_offset, 0);
>   l = PyMarshal_ReadLongFromFile(fp);
>   if (l != 0x04034B50) {
>   /* Bad: Local File Header */
>   PyErr_Format(ZipImportError,
>"bad local file header in %s",
>archive);
>   fclose(fp);
> 
> 
> can anyone explain to me about zip file headers and why it would be
> different/incorrect and give me this error?

Are you perhaps trying to update the zip file in-place while it is still
being used by the application?  I'm not sure that's a safe operation.  A
quick peek at the same module where I think you found the above code shows
that when a zip importer instance is associated with a zip file, the
directory for that zip file is read in and cached.  So the importer is
holding onto offset information for each file based on the contents of the
zip directory at initialization time.

If you then change the file contents (such as updating it with svn), those
offsets will no longer be valid.  I then expect that during your process
exit, some bit of code is performing an extra import, which accesses the
wrong (based on the new file contents) portion of the zip file, and the
above safety check prevents it from loading an erroneous set of bytes
thinking its a valid module.

I expect you need to work on a mechanism to update the file
independently of the running copy, and then arrange to have it moved
into place for a subsequent execution.  Or find some way to have the
zip importer refresh its directory information or make a new importer
instance once the zip file is updated.

One (untested) thought ... before the update, make a copy of your
current library.zip as some other name, and adjust your sys.path to
reference that name (rather than the default pointer to the main
library.zip that py2exe initializes things with).  That should force
any future imports to access the old copy of the zip file and not the
one that svn will be updating.  Since you need to leave that zip file
copy in place through the exit (to satisfy any trailing imports),
arrange for your application to check for that copy on startup and
remove it if present.

Or, after looking through import.c handling for zip file imports,
there might be a simpler way.  ZIP imports are handled by a
zipimporter installed in sys.path_hooks, and once a specific path
element has a path hook instantiated for it (based on the sys.path
element name) it is cached in sys.path_importer_cache.

So, simply clearing out the sys.path_importer_cache entry for your main
library.zip file should cause the next import attempt to re-create a
new zipimporter instance and thus re-open the file and re-load the
directory information.

I don't know if py2exe installs the library.zip into sys.path just as
"library.zip" or with some path information, but try checking out the
keys in sys.path_importer_cache from your application when it is running.
You should find an entry (probably the only one unless you explicitly
augment sys.path yourself) for library.zip - clear out that key after
the update and see how it works.

Heck, since for you the efficiency hit is likely not an issue, just
flush all of sys.path_importer_cache and don't even worry about the
actual key name for library.zip.  So a simple:
sys.path_importer_cache.clear()
call after your update completes may do the trick.

-- David

PS: In the same way that updating the library.zip under the running
application is tricky, you might run into issues if you end up trying
to update one of the extension modules.  svn might not be able to
update it (depending on how it deals with "in use" files).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py2exe and library.zip

2005-05-05 Thread David Bolen
Timothy Smith <[EMAIL PROTECTED]> writes:

> I've got this working now, and fyi it downloads the entire zip every
> time. and svn appears to be very slow at it to.

Hmm, not what I would have expected, and certainly unfortunate for
your desired use case.

I just tried some experiments with rsync (easier to test locally than
Subversion), and found that taking an existing zip, unpacking it and
then repacking it with some rearrangement was in fact sending
everything, even though the source files were unchanged.  Since py2exe
is effectively rebuilding that library.zip on each run, that probably
is a fair representation of the generation process.

I'm not familiar enough with zip file compression, but perhaps it
includes the use of something that is file specific to seed the
compression engine, which would mean that making a new zip file even
with the same files in it might not yield precisely the same internal
compressed storage.  Both versions would be proper and decompressible,
just not binary identical even for unchanged sources.

If I disabled compression for the zip files (just did a store only),
and rebuilt the zip even with a rearranged file order, rsync was
able to detect just the changes.

So you might want to try ensuring that your py2exe generated file is
not compressing the individual modules (a verbose zip listing of the
library.zip should just show that they were "Stored").  Your
library.zip will get larger, but it should become more efficient to
transfer - hopefully as well with Subversion as I was seeing with
rsync.
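
A quick sketch of checking the generated archive (run against the produced
library.zip):

import zipfile

zf = zipfile.ZipFile('library.zip')
for zi in zf.infolist():
    if zi.compress_type == zipfile.ZIP_STORED:
        print zi.filename, 'Stored'
    else:
        print zi.filename, 'Deflated'

(If I recall correctly, py2exe's "compressed" option in setup.py controls
this, but check your version's documentation.)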

(In fact, I remember doing just something like this with a project of mine
that I was using py2exe with, and then using rsync to push out the resultant
files to remote sites - I had originally compressed the library.zip but
rsync was pushing the whole thing out, so I stopped using the compression)

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py2exe and library.zip

2005-05-05 Thread David Bolen
Peter Hansen <[EMAIL PROTECTED]> writes:

> Good point.  When I wrote that I was picturing the form of compression
> that a .tar.gz file would have, not what is actually used inside a
> .zip file which is -- quite logically now that you point it out --
> done on a file-by-file basis.  (Clearly to do otherwise would risk
> your data and make changing compressed zips highly inefficient.)

Right, and yes, .tar.gz files are very problematic for such
algorithms, such as rsync.  In fact, there was a patch made available
for gzip (never made it into the actual package I believe) that
permitted resetting the compression engine at selected block
boundaries - thus effectively bounding the "noise" generated by a
single change.  The output would grow a bit since resetting the engine
dropped overall efficiency, but you got a tremendous gain back in
terms of "rsyncability" of the file.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py2exe and library.zip

2005-05-05 Thread David Bolen
Peter Hansen <[EMAIL PROTECTED]> writes:

> Do you know that Subversion has (as I understand it) a fairly
> intelligent binary file comparison routine, and it will (again, as I
> understand it) not transmit the entire contents of the zip file but
> would actually send only the portions that have changed?  At least,
> that's if the file isn't compressed in some way that prevents this
> algorithm from working well.  (Note to self: check if zip files that
> can be in sys.path can be compressed, and if py2exe compresses them.)

Even if the files were compressed, which has a net result similar to
randomizing the contents and will certainly extend the portion that
appears "changed", the worst that would happen is that subversion
(which does use a binary delta algorithm) would end up downloading the
single file portion of the zip file rather than the smaller change
within the file.  It should still be efficient.

But to be honest, for something like the OPs purpose, it's not clear
that an SCM is needed, since all he's trying to accomplish is bring a
remote copy up to date with the central one.  For that you could just
publish a location containing the necessary files and have the users
use something like rsync directly (which is just as efficient in terms
of a binary delta) to update their own local version.

Of course, if the Subversion server is already in place so it's a
convenient server, or if more of the user base already has the client
in place, it should work just about as well.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: email: Content-Disposition and linebreaks with long filenames

2005-04-19 Thread David Bolen
Martin Körner <[EMAIL PROTECTED]> writes:

> I am using email module for creating mails with attachment (and then
> sending via smtplib).
> 
> If the name of the attachment file is longer than about 60 characters
> the filename is wrapped in the Content-Disposition header:
> 
> Content-Disposition: attachment;
>   filename="This is a sample file with a very long filename
>   0123456789.zip"
> 
> 
> This leads to a wrong attachment filename in email clients - the space
> after "filename" is not shown or the client displays a special
> character (the linebreak or tab before 0123456789.zip).

Yes, it would appear that the default Generator used by the Message
object to create a textual version of a message explicitly uses tab
(\t) as a continuation character rather than space - probably because
it looks a little nicer when printed.  Interestingly enough, the
default Header continuation character is just a plain space which
would work fine here.

I should point out that I believe this header format could be
considered correct, although I find RFC2822 a bit ambiguous on this
point.  It talks about runs of FWS (folding white space) in an
unfolding operation as being considered a single space (section
3.2.3).  However, I suppose someone might argue if "runs" includes a
single character.  I think it should, but obviously some e-mail
clients disagree :-)

(...)
> Is it possible to prevent the linebreak?

Should be - two approaches I can think of (msg below is the email.Message):

1) Create your own Header object for the specific header line rather than
   just storing it as a string via add_header.  For that specific header you
   can then override the default maximum line length.  Something like:

from email.Header import Header

cd_header = Header('attachment; filename="."',
                   header_name='Content-Disposition', maxlinelen=998)
msg['Content-Disposition'] = cd_header

   Note that because Header defaults to a space continuation character,
   you could also leave maxlinelen alone and let it break the line, but
   since it would break with a single space it would work right in clients.

2) Use your own Generator object to generate the textual version of the
   message (which is when the wrapping is occurring), and during the
   flattening process, disable (or set a longer value for) header wrapping.
   Something like:

   Assuming "fp" is an output File-like object:

from email.Generator import Generator

g = Generator(fp, maxheaderlen=998)   # or maxheaderlen=0 to disable wrapping
g.flatten(msg)


-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue.Queue-like class without the busy-wait

2005-04-01 Thread David Bolen
"Paul L. Du Bois" <[EMAIL PROTECTED]> writes:

> Has anyone written a Queue.Queue replacement that avoids busy-waiting?
> It doesn't matter if it uses os-specific APIs (eg
> WaitForMultipleObjects).  I did some googling around and haven't found
> anything so far.

This isn't a Queue.Queue replacement, but it implements a buffer
intended for inter-thread transmission, so it could be adjusted to
mimic Queue semantics fairly easily.  In fact, internally it actually
keeps write chunks in a list until read for better performance, so
just removing the coalesce process would be the first step.

It was written specifically to minimize latency (which is a
significant issue with the polling loop in the normal Python Queue
implementation) and CPU usage in support of a higher level
Win32-specific serial I/O class, so it uses Win32 events to handle the
signaling for the key events when waiting.

The fundamental issue with the native Python lock is that to be
minimalistic in what it requires from each OS, it doesn't impose a
model of being able to wait on an event signal - that's the key thing
you need to have (a timed blocking wait on some signalable construct)
to be most efficient for these operations - which is what I use the
Win32 Event for.

-- David

  - - - - - - - - - - - - - - - - - - - - - - - - -

import thread
import win32event as we

class Buffer:
"""A thread safe unidirectional data buffer used to represent data
traveling to or from the application and serial port handling threads.

This class is used as an underlying implementation mechanism by SerialIO.
Application code should not typically need to access this directly, but
can handle I/O through SerialIO.

Note that we use Windows event objects rather than Python's because
Python's OS-independent versions are not very efficient with timed waits,
imposing internal latencies and CPU usage due to looping around a basic
non-blocking construct.  We also use the lower layer thread lock rather
than threading's to minimize overhead.
"""

def __init__(self, notify=None):
self.lock = thread.allocate_lock()
self.has_data = we.CreateEvent(None,1,0,None)
self.clear()
self.notify = notify

def _coalesce(self):
if self.buflist:
self.buffer += ''.join(self.buflist)
self.buflist = []

def __len__(self):
self.lock.acquire()
self._coalesce()
result = len(self.buffer)
self.lock.release()
return result

def clear(self):
self.lock.acquire()
self.buffer = ''
self.buflist = []
self.lock.release()

def get(self, size=0, timeout=None):
"""Retrieve data from the buffer, up to 'size' bytes (unlimited if
0), but potentially less based on what is available.  If no
data is currently available, it will wait up to 'timeout' seconds
(forever if None, no blocking if 0) for some data to arrive"""

self.lock.acquire()
self._coalesce()

if not self.buffer:
# Nothing buffered, wait until something shows up (timeout
# rules match that of threading.Event)
self.lock.release()
if timeout is None:
win_timeout = we.INFINITE
else:
win_timeout = int(timeout * 1000)
rc = we.WaitForSingleObject(self.has_data, win_timeout)
self.lock.acquire()
self._coalesce()

if not size:
size = len(self.buffer)

result_len = min(size,len(self.buffer))
result = self.buffer[:result_len]
self.buffer = self.buffer[result_len:]
we.ResetEvent(self.has_data)
self.lock.release()
return result

def put_back(self,data):
self.lock.acquire()
self.buffer = data + self.buffer
self.lock.release()
we.SetEvent(self.has_data)
if self.notify:
self.notify()

def put(self, data):
self.lock.acquire()
self.buflist.append(data)
self.lock.release()
we.SetEvent(self.has_data)
if self.notify:
self.notify()
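
A quick usage sketch (not part of the original code) - one producer thread
feeding the buffer while the consumer blocks efficiently in get():

import time

buf = Buffer()

def producer():
    for i in range(5):
        time.sleep(0.5)
        buf.put('chunk %d\n' % i)

thread.start_new_thread(producer, ())

received = ''
while received.count('\n') < 5:
    received += buf.get(timeout=2.0)    # low-latency wait on the Win32 event
print received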
-- 
http://mail.python.org/mailman/listinfo/python-list


email.Message.set_charset and Content-Transfer-Encoding

2005-03-09 Thread David Bolen
I've noticed that using set_charset() on an email.Message instance will not
replace any existing Content-Transfer-Encoding header but will install one
if it isn't yet present.

Thus, if you initially create a message without a charset, it defaults to
us-ascii, and creates both Content-Type and Content-Transfer-Encoding
headers (the latter of which defaults to "7bit").

If you then later attempt to change the charset (say, to "iso-8859-1")
with set_charset(), it adjusts the Content-Type header, but leaves the
Content-Transfer-Encoding header alone, which I would think is no
longer accurate, since it is the new charset's body encoding that will
eventually be used when flattening the message, which would then no longer
match the encoding header.

It's also different than if you had passed in an iso-8859-1 charset
originally when constructing the message instance, in which case the
encoding would have been selected as quoted-printable.

The documentation for set_charset seemed to imply (at least to me)
that eventual headers generated by a generator would be affected by
the change in charset, so having it stay at 7bit was confusing.

Is anyone aware of a reason why the encoding shouldn't adjust in
response to a set_charset call similar to how a supplied charset
initially establishes it at message creation time?
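
For reference, the behavior can be reproduced in a few lines (Python 2.x
email package; the output noted in the comments is what I see per the
description above):

from email.MIMEText import MIMEText

msg = MIMEText('some body text')            # defaults to us-ascii
print msg['Content-Transfer-Encoding']      # -> 7bit
msg.set_charset('iso-8859-1')
print msg['Content-Type']                   # charset param is updated
print msg['Content-Transfer-Encoding']      # -> still 7bit, not quoted-printable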

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Gordon McMillan installer and Python 2.4

2005-03-04 Thread David Bolen
[EMAIL PROTECTED] (Svein Brekke) writes:

> Has anyone else succeded in using McMillans installer on 2.4?
> Thanks for any help.

I have a feeling that it may be related to the fact that the starter
modules (run*.exe) were built with VC6, which matches with Python
builds up to 2.3, but not 2.4 (which is built with VC7).  So you're
probably running into a mismatch between C runtime libraries.

Since it sounds like you have VS 2003 available to you, you might try
rebuilding the appropriate run*.exe (the source is in the installer
tree beneath the source directory).  VS 2003 should auto-convert the
.dsw/.dsp files but there's at least one dependency (zlib) that you
might have to build separately and handle manually (e.g., update the
project to locate your particular zlib library).

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: cannot open file in write mode, no such file or directory

2005-03-01 Thread David Bolen
[EMAIL PROTECTED] writes:

> I'm having a problem where when trying to open a file in write mode, I
> get an IOError stating no such file or directory.  I'm calling an
> external program which takes an input file and produces an output file
> repeatedly, simulating the input file separately for each replicate.
> The error occurs when trying to open the input file to write out the
> new data.  The problem is difficult to reproduce since it only shows up
> once every few thousand replicates.  I've tried using both os.system
> and os.popen to invoke the external program.  Originally I was running
> this on cygwin, but also tried under windows.

You might be hitting a race condition where the OS is still
considering the file to be in use when you get around to rewriting it,
even if the using application has just exited.  I've run into similar
problems when trying to rename temporary files under NT based systems.

The problem can be obscured because some of the Win32-specific IO
errors can turn into more generic IOError exceptions at the Python
level due to incomplete mappings available for all Win32 errors.  In
particular, a lot of Win32-layer failures turn into EINVAL errno's at
the C RTL level, which Python in turn translates to ENOENT (which is
the file not found).  So the IOError exception at the Python level can
be misleading.

Since it sounds like you can reproduce the problem relatively easily
(just run your application several thousand times), a quick check for
this condition would be to trap the IOError, delay a few seconds (say
5-10 to be absolutely sure, although in the cases I've run into 2-3 is
generally more than enough), and retry the operation.  If that
succeeds, then this might be the issue you're hitting.
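
Something along these lines (a sketch; 'input_filename' stands in for
whatever file you're rewriting):

import time

for attempt in range(2):
    try:
        f = open(input_filename, 'w')
        break
    except IOError:
        if attempt:
            raise
        time.sleep(5)     # give the OS a chance to release the file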

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


print and str subclass with tab in value

2005-02-23 Thread David Bolen
I ran into this strange behavior when noticing some missing spaces in
some debugging output.  It seems that somewhere in the print
processing, there is special handling for string contents that isn't
affected by changing how a string is represented when printed
(overriding __str__).

For example, given a class like:

class mystr(str):

def __new__(cls, value):
return str.__new__(cls, value)

def __str__(self):
return 'Test'

you get the following behavior

>>> x = strtest.mystr('foo')
>>> print x,1
Test 1
>>> print repr(x),1
'foo' 1
>>> x = strtest.mystr('foo\t')
>>> print x,1
Test1
>>> print repr(x),1
'foo\t' 1

Note the lack of a space if the string value ends in a tab, even if
that tab has nothing to do with the printed representation of a
string.

It looks like it's part of basic string output since with a plain old
string literal the tab gets output (I've replaced the literal tab with
[TAB] in the output below) but no following space.

>>> x = 'testing\t'
>>> print x,1
testing[TAB]1
>>> x = str('testing\t')
>>> print x,1
testing[TAB]1

so I'm guessing it's part of some optimization of tab handling in
print output, although a quick perusal of the Python source didn't
have anything jump out at me.

It seems to me that this is probably a buglet since I would expect
print and its softspace handling to depend on what was actually
written and not internal values - has anyone else ever run into this?

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: win32 service and sockets

2005-02-09 Thread David Bolen
Tom Brown <[EMAIL PROTECTED]> writes:

> Well, I have found that it works if I launch the client on the same
> machine as the service. It will not work from a remote machine. Any
> ideas?

Sounds like it might be an issue at the network layer rather than in
your code - perhaps a routing or filtering problem between your two
machines.  Have you verified that you do in fact have network
connectivity between the machines (such as with ping), and that you
can reach your server's port from the client (perhaps try telnetting
to the port).

Since you mentioned XP, could any of its built-in firewall support be
enabled, and perhaps blocking access to your server's port?

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Redirecting stdout/err under win32 platform

2005-02-03 Thread David Bolen
Pierre Barbier de Reuille <[EMAIL PROTECTED]> writes:

> AFAIK, there is no working bidirectionnal pipes on Windows ! The
> functions exists in order for them to claim being POSIX, but they're
> not working properly. (...)

Can you clarify what you believe doesn't work properly?  The os.popen*
functions under Windows use native CreateProcess calls to create the
child process and connect stdin/out/err handles to that child process,
so should behave properly.  (Subject of course to the same risk of
deadlocks and what not due to buffering or queued up data that any
system would have with these calls)

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: lambda

2005-01-18 Thread David Bolen
Antoon Pardon <[EMAIL PROTECTED]> writes:

> Op 2005-01-18, Simon Brunning schreef <[EMAIL PROTECTED]>:
> > On 18 Jan 2005 07:51:00 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:
> >> 3 mutating an item in a sorted list *does* *always* cause problems
> >
> > No, it doesn't. It might cause the list no longer to be sorted, but
> > that might or might no be a problem.
> 
> Than in the same vain I can say that mutating a key in a dictionary
> doesn't always cause problems either. Sure it may probably make a
> key unaccessible directly, but that might or might not be a problem.

Well, I'd definitely consider an inaccessible key as constituting a
problem, but I don't think that's a good analogy to the list case.

With the dictionary, the change can (though I do agree it does not
have to) interfere with proper operation of the dictionary, while a
list that is no longer sorted still functions perfectly well as a
list.  That is, I feel "problems" are more guaranteed with a
dictionary since we have affected base object behavior, whereas sorted
is not an inherent attribute of the base list type but something the
application is imposing at a higher level.

For example, I may choose to have an object type that is mutable (and
not worthy for use as a dictionary key) but maintains a logical
ordering so is sortable.  I see no problem with sorting a list of such
objects, and then walking that list to perform some mutation to each
of the objects, even if along the way the mutation I am doing results
in the items so touched no longer being in sorted order.  The act of
sorting was to provide me with a particular sequence of objects, but
aside from that fact, the list continues to perform perfectly well as
a list even after the mutations - just no longer delivering objects in
sorted order.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: extension module, thread safety?

2005-01-18 Thread David Bolen
Nick Coghlan <[EMAIL PROTECTED]> writes:

> Pierre Barbier de Reuille wrote:
> > Ok, I wondered why I didn't know these functions, but they are new
> > to Python 2.4 ( and I didn't take the time to look closely at Python
> > 2.4 as some modules I'm working with are still not available for
> > Python 2.4). But if it really allows to call Python code outside a
> > Python thread ... then I'll surely use that as soon as I can use
> > Python 2.4 :) Thanks for the hint :)
> 
> The Python 2.4 docs claim the functions were added in Python 2.3, even
> though they aren't documented in the 2.3.4 docs.
> 
> The 2.3 release PEP (PEP 283) confirms that PEP 311 (which added these
> functions) went in.

And even before that it was certainly possible to call into the Python
interpreter from a native thread using existing functions, albeit the
newer functions are more convenient (and perhaps more robust, I don't
know).

My earliest interaction with Python (~1999, while writing a module
that extended and embedded Python 1.5.2) used PyEval_AcquireThread()
and PyEval_ReleaseThread() to get access to a thread state from a
native C application thread (not initiated by the Python interpreter)
to allow me to call safely into an executing Python script upon
asynchronous data reception by the C code.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: extension module, thread safety?

2005-01-17 Thread David Bolen
Torsten Mohr <[EMAIL PROTECTED]> writes:

> The question came up if this is by itself thread safe,
> if some two or more threads try to change these data types,
> are the C functions by themselves are "atomic" or can they
> be interrupted be the perl interpreter and then (data types
> are in some inconsistent half-changed state) another function
> that works on these data is called?

I presume you mean "Python" and not "perl"...

If the threads under discussion are all Python threads, then by
default yes, the extension module C functions will appear to be atomic
from the perspective of the Python code.  When the Python code calls
into the extension module, the GIL (global interpreter lock) is still
being held.  Unless the extension module code explicitly releases the
GIL, no other Python threads can execute (even though those threads
are in fact implemented as native platform threads).

So in general, if you write an extension module where none of its
functions ever release the GIL, there's no way for two of its
functions to be run from different Python threads simultaneously.

Note that this restriction won't necessarily hold if there are other
ways (at the C level, or from other extension modules) to trigger code
in the extension module, since that's outside of the control of the
Python GIL.  Nor will it necessarily hold true if your extension
module calls back out into Python (as a callback, or whatever) since
once the interpreter is back in Python code the interpreter itself
will periodically release the GIL, or some other extension code that
the callback code runs may release it.

To the extent possible, it's considered good practice to release the
GIL in an extension module whenever you are doing lengthy processing
so as to permit other Python threads (that may have nothing to do with
using your extension module) to execute.  For short routines this
really isn't an issue, but if your extension module will be spending
some time managing its data, you may wish to add some internal thread
protection around that data, so that you can use your own locks rather
than depending on the GIL.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Another PythonWin Excel question

2005-01-06 Thread David Bolen
"It's me" <[EMAIL PROTECTED]> writes:

> Yes, I read about that but unfortunately I have no experience with VBA *at
> all*.  :=(

You don't really have to know VBA, but if you're going to try to
interact with COM objects from Python, you'll find it much smoother if
you at least use any available reference information for the COM
object model and interfaces you are using.

In the Excel case, that means understanding - or at least knowing how
to look in a reference - its object model, since that will tell you
exactly what parameters an Add method on a worksheet object will take
and how they work.
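
Once you've looked it up, the call from Python ends up looking something
like this (a sketch via pywin32's win32com; the object and method names
come from the Excel object model, not from anything specific in this thread):

import win32com.client

xl = win32com.client.Dispatch('Excel.Application')
xl.Visible = True
wb = xl.Workbooks.Add()
ws = wb.Worksheets.Add()      # Add(Before, After, Count, Type) per the VBA docs
ws.Name = 'FromPython'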

For excel, online documentation can be found in a VBAXL9.CHM help file
(the "9" may differ based on Excel release), but it might not always
be installed depending on what options were selected on your system.  In
my English, Office 2000 installation, for example, the files are located in:
c:\Program Files\Microsoft Office\Office\1033

You can load that file directly, or Excel itself will reference it
from within the script editor help (Tools->Macro->Visual Basic Editor,
then F1 for help).  If you look up methods or classes and have the help
installed it'll bring in the reference.

You can also find it on MSDN on the web, although it can be tricky to
navigate down to the right section - the top of the Office 2000 object
documentation should be available at:

http://msdn.microsoft.com/library/en-us/odeomg/html/deovrobjectmodelguide.asp

This is mostly reference information, but there are some higher level
discussions of overall objects (e.g., worksheets, workbooks, cells,
etc...) too.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Securing a future for anonymous functions in Python

2004-12-31 Thread David Bolen
Scott David Daniels <[EMAIL PROTECTED]> writes:

> David Bolen wrote:
> > So for example, an asynchronous sequence of operations might be like:
> > d = some_deferred_function()
> > d.addCallback(lambda x: next_function())
> > d.addCallback(lambda blah: third_function(otherargs, blah))
> > d.addCallback(lambda x: last_function())
> > which to me is more readable (in terms of seeing the sequence of
> > operations being performed in their proper order), then something like:
> > def cb_next(x):
> > return next_function()
> > def cb_third(blah, otherargs):
> > return third_function(otherargs, blah)
> > def cb_last(x):
> > return last_function()
> > d = some_deferred_function()
> > d.addCallback(cb_next)
> > d.addCallback(cb_third, otherargs)
> > d.addCallback(cb_next)
> > which has an extra layer of naming (the callback functions),
> > and
> > requires more effort to follow the flow of what is really just a simple
> > sequence of three functions being called.
> 
> But this sequence contains an error of the same form as the "fat":

"this" being which of the two scenarios you quote above?

>  while test() != False:
>   ...code...

I'm not sure I follow the "error" in this snippet...

> The right sequence using lambda is:
>   d = some_deferred_function()
>   d.addCallback(next_function)
>   d.addCallback(lambda blah: third_function(otherargs, blah))
>   d.addCallback(last_function)

By what metric are you judging "right"?

In my scenario, the functions next_function and last_function are not
written to expect any arguments, so they can't be passed straight into
addCallback because any deferred callback will automatically receive
the result of the prior deferred callback in the chain (this is how
Twisted handles asynchronous callbacks for pending operations).
Someone has to absorb that argument (either the lambda, or
next_function itself, which if it is an existing function, needs to be
handled by a wrapper, ala my second example).

Your "right" sequence simply isn't equivalent to what I wrote.
Whether or not next_function is fixable to be used this way is a
separate point, but then you're discussing two different scenarios,
and not two ways to write one scenario.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Event-Driven Woes: making wxPython and Twisted work together

2004-12-30 Thread David Bolen
Daniel Bickett <[EMAIL PROTECTED]> writes:

> My initial solution was, naturally, the wxPython support inside of the
> twisted framework. However, it has been documented by the author that
> the support is unstable at this time, and should not be used in
> full-scale applications.

Rather than the wx reactor, there's an alternate recipe that just
cranks the twisted event loop from within a timer at the wx level that
we've used very successfully.  It does have some caveats (such as a
potentially higher latency in servicing the network based on your
timer interval), but so far for our applications it hasn't been an
issue at all, so it might be something you might try.  The code was
based on http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/181780

The advantage of this approach is that no threads are necessary, and
thus there's no problem issuing wxPython calls from within twisted
callbacks or twisted calls from within wxPython event handlers.

Just include the following bits into your application object (note that
the use of "installSignalHandlers" might not be needed on all systems):

class MyApp(wx.wxApp):

(...)

def OnInit(self):

# Twisted Reactor code
reactor.startRunning(installSignalHandlers=0)
wx.EVT_TIMER(self, 99, self.OnTimer)
self.timer = wx.wxTimer(self, 99)
self.timer.Start(150, False)

(...)

def OnTimer(self, event):
reactor.runUntilCurrent()
reactor.doIteration(0)

def __del__(self):
self.timer.Stop()
reactor.stop()
wx.wxApp.__del__(self)


and you can try adjusting the timer interval for the best mix of CPU
load versus latency.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Securing a future for anonymous functions in Python

2004-12-30 Thread David Bolen
Ian Bicking <[EMAIL PROTECTED]> writes:

> The one motivation I can see for function expressions is
> callback-oriented programming, like:
> 
>get_web_page(url,
>  when_retrieved={page |
>give_page_to_other_object(munge_page(page))})

This is my primary use case for lambda's nowadays as well - typically
just to provide a way to convert the input to a callback into a call
to some other routine.  I do a lot of Twisted stuff, whose deferred
objects make heavy use of single parameter callbacks, and often you
just want to call the next method in sequence, with some minor change
(or to ignore) the last result.

So for example, an asynchronous sequence of operations might be like:

d = some_deferred_function()
d.addCallback(lambda x: next_function())
d.addCallback(lambda blah: third_function(otherargs, blah))
d.addCallback(lambda x: last_function())

which to me is more readable (in terms of seeing the sequence of
operations being performed in their proper order), then something like:

def cb_next(x):
return next_function()
def cb_third(blah, otherargs):
return third_function(otherargs, blah)
def cb_last(x):
return last_function()

d = some_deferred_function()
d.addCallback(cb_next)
d.addCallback(cb_third, otherargs)
d.addCallback(cb_next)

which has an extra layer of naming (the callback functions), and
requires more effort to follow the flow of what is really just a simple
sequence of three functions being called.

> I think this specific use case -- defining callbacks -- should be
> addressed, rather than proposing a solution to something that isn't
> necessary.  (...)

I'd be interested in this approach too, especially if it made it simpler
to handle simple manipulation of callback arguments (e.g., since I often
ignore a successful prior result in a callback in order to just move on
to the next function in sequence).

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is there a better way of listing Windows shares other than using "os.listdir"

2004-12-30 Thread David Bolen
[EMAIL PROTECTED] writes:

> I'm currently using "os.listdir" to obtain the contents of some slow Windows
> shares.  I think I've seen another way of doing this using the win32 library
> but I can't find the example anymore.

Do you want the list of files on the shares or the list of shares
itself?  If the files, you can use something like FindFiles, but I
don't expect it to be that much faster just to obtain directory names
(likely the overhead is on the network).

If you just want a list of shares, you could use NetUseEnum, which
should be pretty speedy.

(FindFiles is wrapped by win32api, and NetUseEnum by win32net, both parts
 of the pywin32 package)

Here's a short example of displaying equivalent output to the "net
use" command:

  - - - - - - - - - - - - - - - - - - - - - - - - -
import win32net

status = {0 : 'Ok',
  1 : 'Paused',
  2 : 'Disconnected',
  3 : 'Network Error',
  4 : 'Connected',
  5 : 'Reconnected'}

resume = 0
while 1:
(results, total, resume) = win32net.NetUseEnum(None, 1, resume)
for use in results:
print '%-15s %-5s %s' % (status.get(use['status'], 'Unknown'),
 use['local'],
 use['remote'])
if not resume:
break
  - - - - - - - - - - - - - - - - - - - - - - - - -

Details on the the arguments to NetUseEnum can be found in MSDN (with
any pywin32 specifics in the pywin32 documentation).

> My main problem with using "os.listdir" is that it hangs my gui application.
> The tread running the "os.listdir" appears to block all other threads when
> it calls this function.

Yes, for a GUI you need to keep your main GUI thread always responsive
(e.g., don't do any blocking operations).

There are a number of alternatives to handling a long processing task
in a GUI application, dependent on both the operation and toolkit in
use.  For wxPython, http://wiki.wxpython.org/index.cgi/LongRunningTasks
covers several of the options (and the theory behind them is generally
portable to other toolkits although implementation will change).

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Problem in threading

2004-12-30 Thread David Bolen
"It's me" <[EMAIL PROTECTED]> writes:

> It depends on what "help" means to you.   Both Windows and Unix (and it's
> variances) are considered "thread-weak" OSes.  So, using thread will come
> with some cost.   The long gone IBM OS/2 is a classic example of a
> "thread-strong" OS.
(...)

Interesting - can you clarify what you perceive as the differences
between a thread-weak and thread-strong OS?  If given the choice, I
would probably refer to Windows (at least NT based systems, let's
ignore 9x) as thread-strong, and yes, often think of Windows as
preferring thread based solutions, while Unix would often prefer
process based.

Windows is far more efficient at handling large numbers of threads
than it is processes, with much less overhead and there is lots of
flexibility in terms of managing threads and their resources.  Threads
are first class OS objects at the kernel and scheduler level (waitable
and manageable).

I can't think of anything offhand specific that OS/2 did with respect
to threads that isn't as well supported by current Win32 systems.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: where's "import" in the C sources?

2004-12-29 Thread David Bolen
Torsten Mohr <[EMAIL PROTECTED]> writes:

> i tried to find the file and line in the C sources of python
> where the command "import" is implemented.  Can anybody give
> me some hint on this?

Well, there are several levels, depending on what you are looking for.
The literal "import" syntax in a source module is translated by the
Python compiler to various IMPORT_* bytecodes, which are processed in
the main interpreter loop (see ceval.c).

They all basically bubble down to making use of the builtin __import__
method, which is obtained from the builtin module defined in
bltinmodule.c.

That in turn makes use of the import processing module whose code can
be found in import.c - which is the same source that also implements
the "imp" module to provide lower layer access to to the import
internals.
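
If it helps, you can poke at those layers from Python itself; a quick
illustrative sketch:

import imp

# The import statement compiles down to a call on the builtin __import__.
os_mod = __import__('os')

# The imp module exposes the lower-level machinery from import.c.
f, pathname, description = imp.find_module('os')
print pathname, description
if f:
    f.close()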

Now, when it comes to physically loading in a module, Python source
and compiled modules are handled by import (well, not the compiling
part), but dynamically loaded extension modules are OS specific.  You
can find the handling of such extension modules in OS-specific source
files dynload_*.c (e.g., dynload_win.c for Windows).

All of these files can be found in the dist/src/Python directory in
the Python source tree.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A completely silly question

2004-12-20 Thread David Bolen
"Fredrik Lundh" <[EMAIL PROTECTED]> writes:

> >> Well, but that's true as well for getchar() (at least in many cases of
> >> interactive input and line buffering), so in that respect I do think
> >> it's a fairly direct replacement, depending on how the OP was going to
> >> use getchar() in the application.
> >
> > The OP said "wait for a single character input". sys.stdin.read(1)
> > waits for a newline.
> 
> in the same same sentence, the OP also said that he wanted something like
> C's getchar().  if you guys are going to read only parts of the original post,
> you could at least try to read an entire sentence, before you start arguing...

Not even sure what's there to argue about - getchar() does do single
character input, so the OP's (full) original sentence seems plausible
to me, and his example was using it in a while loop which I took to
represent processing some input one character at a time.

In any event - I also gave a way (Windows-specific) to truly obtain
the single next character without any buffering, so just ignore any
controversy in the first part of the response if desired.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: threading priority

2004-12-20 Thread David Bolen
Peter Hansen <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] wrote:
> > I googled as suggested, and the answer isn't crystal clear.  My
> > impression is that the problem is that a python thread must acquire the
> > GIL in order to execute, and the strategy for deciding which thread
> > should get the GIL when multiple threads are waiting for it is not
> > based on priority.  Is that correct?
> 
> That's basically correct.  I don't actually know what
> the strategy is, though I suspect it's either not
> formally documented or explicitly not defined, though
> for a given platform there may be some non-arbitrary
> pattern...
> (...)

I expect the Python interpreter has little to say over thread
prioritization and choice of execution, although it does impose some
granularity on the rate of switching.  The GIL itself is implemented
on the lower layer lock implementation, which is taken from the native
threading implementation for the platform.

Therefore, when multiple Python threads are waiting for the GIL, which
one is going to get released will depend on when the underlying OS
satisfies the lock request from the threads, which should be based on
the OS thread scheduling system and have nothing to do with Python
per-se.

I do believe you are correct in that the Python GIL prevents thread
pre-emption by the OS (because all other Python threads are waiting on
the GIL and not in a running state), but the actual act of switching
threads at a switching point (sys.setcheckinterval()) would be an OS
only decision, and subject to whatever standard platform thread
scheduling rules were in place.

So if you were to use a platform specific method to control thread
priority, that method should be honored by the Python threads (subject
to the granularity of the system check interval for context switches).
For example, here's a Windows approach that fiddles with the thread
priority:

  - - - - - - - - - - - - - - - - - - - - - - - - -
import threading
import ctypes
import time

w32 = ctypes.windll.kernel32

THREAD_SET_INFORMATION = 0x20
THREAD_PRIORITY_ABOVE_NORMAL = 1

class DummyThread(threading.Thread):

    def __init__(self, begin, name, iterations):
        threading.Thread.__init__(self)
        self.begin = begin
        self.tid = None
        self.iterations = iterations
        self.setName(name)

    def setPriority(self, priority):
        if not self.isAlive():
            print 'Unable to set priority of stopped thread'
            return

        handle = w32.OpenThread(THREAD_SET_INFORMATION, False, self.tid)
        result = w32.SetThreadPriority(handle, priority)
        w32.CloseHandle(handle)
        if not result:
            print 'Failed to set priority of thread', w32.GetLastError()

    def run(self):
        self.tid = w32.GetCurrentThreadId()
        name = self.getName()

        self.begin.wait()
        while self.iterations:
            print name, 'running'
            start = time.time()
            while time.time() - start < 1:
                pass
            self.iterations -= 1


if __name__ == "__main__":

    start = threading.Event()

    normal = DummyThread(start, 'normal', 10)
    high   = DummyThread(start, 'high', 10)

    normal.start()
    high.start()

    # XXX - This line adjusts priority - XXX
    high.setPriority(THREAD_PRIORITY_ABOVE_NORMAL)

    # Trigger thread execution
    start.set()
  - - - - - - - - - - - - - - - - - - - - - - - - -

And the results of running this with and without the setPriority call:

Without:                With:

normal running          high running
high running            high running
normal running          high running
high running            high running
normal running          normal running
high running            high running
normal running          high running
high running            high running
normal running          high running
high running            normal running
normal running          high running
high running            high running
normal running          normal running
high running            normal running
normal running          normal running
high running            normal running
normal running          normal running
high running            normal running
normal running          normal running
high running            normal running


I'm not entirely positive why the normal thread gets occasionally
executed before the high thread is done.  It might be that the
interpreter is actually releasing the GIL in the code I've written for
the thread's run() (maybe during the I/O) which opens up an
opportunity, or it may be that Windows is boosting the other thread
occasionally to avoid starvation.  So I expect the normal thread is
getting occasional bursts of bytecode execution (the syscheckinterval).

But clearly the OS level prioritization is having an effect on how the
two threads get scheduled.

Re: A completely silly question

2004-12-17 Thread David Bolen
Mike Meyer <[EMAIL PROTECTED]> writes:

> Steven Bethard <[EMAIL PROTECTED]> writes:
> 
> > Amir Dekel wrote:
> >> What I need from the program is to wait for a single character
> >> input, something like while(getchar()) in C. All those Python
> >> modules don't make much sence to me...
> >
> > sys.stdin.read(1)
> 
> That doesn't do what he wants, because it doesn't return until you hit
> a newline.

Well, but that's true as well for getchar() (at least in many cases of
interactive input and line buffering), so in that respect I do think
it's a fairly direct replacement, depending on how the OP was going to
use getchar() in the application.

For example, compare this C program:

    #include <stdio.h>

    main()
    {
        while (1) {
            int ch = getchar();
            printf("%d ", ch);
        }
    }

with this Python equivalent:

    >>> import sys
    >>> while 1:
    ...     c = sys.stdin.read(1)
    ...     print ord(c),
    ...

When run, both produce (at least for me):

0123456789 (hit Enter here)
48 49 50 51 52 53 54 55 56 57 10

under both Unix (at least FreeBSD/Linux in my quick tests) and Windows
(whether MSVC or Cygwin/gcc).

(I don't include any output buffer flushing, since it shouldn't be
needed on an interactive terminal, but you could add that to ensure
that it isn't the output part that is being buffered - I did try it
just to be sure on the Unix side)

> The answer is system dependent. Or you can use massive overkill and
> get curses, but if you're on windows you'll have to use a third party
> curses package, and maybe wrap it

If you want to guarantee you'll get the next console character without
any waiting under Windows there's an msvcrt module that contains
functions like kbhit() and getch[e] that would probably serve.
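
Something along these lines would do it (Windows only; the choice of
quit key is arbitrary):

import msvcrt

while 1:
    ch = msvcrt.getch()     # returns the next keypress, no Enter required
    if ch == 'q':           # arbitrary way out of the loop
        break
    print ord(ch),

# msvcrt.kbhit() can be used first if you only want to poll for pending
# input rather than block.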

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Socket being garbage collected too early

2004-12-16 Thread David Bolen
Scott Robinson <[EMAIL PROTECTED]> writes:

> I have been having trouble with the garbage collector and sockets.

Are you actually getting errors or is this just theoretical?

> Unfortunately, google keeps telling me that the problem is the garbage
> collector ignoring dead (closed?) sockets instead of removing live
> ones.  My problem is
> 
> 
>   x.sock=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
>   do_stuff(x.sock)
> 
> 
> def do_stuff(sock):
>   sock_list.append(sock)
> 
> once do_stuff finishes, x.sock disappears, and I can only believe it
> is being garbage collected.

Can you clarify this?  What do you mean by "x.sock" disappears?  Are
you getting a NameError later when trying to use "x.sock"?  

x.sock is just a name binding, so it is not really involved in garbage
collection (GC applies to the objects to which names are bound). 

In this case, you need to include much more in the way of code (a
fully running, but smallest possible, snippet of code would be best),
since the above can be interpreted many ways.  At the least, it's very
important to include information about the namespace within which
those two code snippets run if anyone is likely to be able to give you
a good answer.  Also, being very precise about the error condition you
are experiencing (including actual error messages, tracebacks, etc...)
is crucial.

Is 'x' referencing a local or global object, and does that socket code
occur within a method, a function, or what?  Also, in do_stuff, where
is sock_list defined?  Is it local, global?

If, as written, sock_list is a local name to do_stuff, then that
binding is going to disappear when do_stuff completes, thus, the list
to which it is bound will be destroyed, including all references to
objects that the list may contain.  So at that point, when you return
from do_stuff, the only reference to the socket object will be in
x.sock.  But if 'x' is also local to the function/method where the
call to do_stuff is, the name binding will be removed when the
function/method returns, at which point there will be no references to
the socket object, and yes, it will be destroyed.

But if sock_list is global, and continues to exist when do_stuff
completes, then the reference it contains to the socket will keep the
socket object alive even if you remove the x.sock binding.
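
A contrived sketch of that last case (all the names here are made up):

import socket

sock_list = []                  # module level, so it outlives do_stuff()

def do_stuff(sock):
    sock_list.append(sock)      # this reference keeps the socket alive

def make_connection():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    do_stuff(s)
    # 's' goes away when this function returns, but the socket object
    # survives because sock_list still refers to it.

make_connection()
print len(sock_list)            # -> 1; the socket was not collected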

-- David

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Winge IDE Issue - an suggestions?

2004-12-16 Thread David Bolen
Mike Thompson  writes:

(...)
> WingIDE bug seemed the only explanation, although it was puzzling me
> that something so obvious could make it through their QA. Thanks again.

I haven't used ElementTree, but if it includes an extension module
(likely for performance), it's important to realize that WingIDE's
debugger specifically catches exceptions that occur on the "far" side
of an extension module.  So even if that extension module would
normally suppress the exception, thus hiding it from the original
Python code that called the extension module, Wing's debugger will
stop at it, which is different than normal runtime.

This can actually be very helpful, since for example, debuggers that
can't do this can't stop on exceptions in cases such as wxPython event
handlers, since they occur from within a C extension too.

Wing has a bunch of default locations that it ignores (that would
otherwise trigger via normal standard library calls), but for your own
applications or libraries, you need to teach it a bit, by asking it to
ignore locations you know not to be relevent to your code.  Once you
mark such a location, it is remembered in your project so it won't
bother you again.

This was discussed a bit more in depth recently in the "False
Exceptions" thread on this group.  See:

http://groups-beta.google.com/group/comp.lang.python/browse_frm/thread/f996d6554334e350/e581bea434d3d248

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: New versions breaking extensions, etc.

2004-12-15 Thread David Bolen
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:

> Can you elaborate? To me, that problem only originates from
> the OS lack of support for deleting open files. If you could
> delete a shared libary that is still in use (as you can on
> Unix), the put the new version of the DLL in the place, (...)

Note that at least on NT-based systems, you can rename the existing
file out of the way while it is in use in order to install a new
new version.

I do think however, that until more recent changes (not sure whether
in 2K versus XP) in how DLL searches work (e.g., permitting local
versions), even with that operation, if a named DLL was already
loaded into memory, it would be used by a subsequent process
attempting to reference it regardless of the state of the filesystem.
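
As a sketch of that rename-then-install approach (both paths below are
made up):

import os
import shutil

target = r'C:\SomeApp\some.dll'     # DLL currently loaded/in use
update = r'C:\Staging\some.dll'     # newly built replacement

# NT will let you rename an open DLL even though you can't delete or
# overwrite it, so move it aside and copy the new version into place.
os.rename(target, target + '.old')
shutil.copyfile(update, target)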

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: uptime for Win XP?

2004-12-13 Thread David Bolen
Andrey Ivanov <[EMAIL PROTECTED]> writes:

(...)
> Writing this script was harder than I initially thought due to
> a lack of documentation for win32all. And I still don't know what
> that bizzare_int value stands for (an error/status code?).

The pywin32 documentation tends not to duplicate information already
available via MSDN (whether in a local installation or at
msdn.microsoft.com) on the underlying Win32 API, so when in doubt,
that's where to look.  Then, the pywin32 documentation will sometimes
qualify how the Python interface maps that function.

But in particular, a general rule (as has already been posted) is that
any out parameters are aggregated along with the overall result code
into a result tuple.
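
For instance, a quick sketch with one such call (this particular one
raises an exception on failure rather than handing back the BOOL
result):

import win32api

# In the C API, GetDiskFreeSpace fills in four out parameters through
# pointers; pywin32 hands them all back as a single result tuple.
sectors, bytes_per_sector, free_clusters, total_clusters = \
    win32api.GetDiskFreeSpace('C:\\')

print 'Free bytes:', sectors * bytes_per_sector * free_clusters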

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pythonwin broke

2004-12-03 Thread David Bolen
Trent Mick <[EMAIL PROTECTED]> writes:

> It is also possible that there is some little installer bug or detail
> on your environment that is causing the problem. You could try
> ActivePython. I regularly install and uninstall ActivePython 2.3 and
> 2.4 installers and both installs are still working fine.

Just as another data point, I have all of Python 1.5.2, 2.0.1, 2.1.3,
2.2.3, 2.3.4 and 2.4 installed side by side on my Windows box, as
installed by their standard installers, without any problems.  And
that includes uninstall/reinstall cycles for patch releases of
versions older than the most recent (e.g., putting on 2.2.3 after a
2.3 variant was already installed).  The only real restriction is as
you noted - only one can own the file associations (or be associated
with the COM support for pywin32).

In case it matters, I do install everything as administrator for all
users and this is under 2K (my NT box has everything but 2.4).

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: installing 2.4

2004-12-02 Thread David Bolen
"Jive" <[EMAIL PROTECTED]> writes:

> It's only getting worse.  I went to Add/remove programs and removed 2.4.
> Now Python 2.4 numarray and Python 2.4 pywin extensions are still listed as
> installed, but I cannot remove them.

You mentioned in your first post about "copying your site package"
... did you actually make a copy or did you perhaps "move" your
site-packages directory from beneath 2.3 to under 2.4.  If so, then
the uninstall entry in the registry is not going to find the files to
be able to uninstall them.

Worst case you should be able to reinstall Python 2.3, and your
extension packages from their installer images.  Don't worry about the
uninstall list in Add/Remove programs as reinstalling the packages
will just update their entries.  That will refresh the files in your
Python 2.3 tree, and providing you don't disable the option, should
re-establish file associations and what not back to a 2.3
installation.

-- David
-- 
http://mail.python.org/mailman/listinfo/python-list


  1   2   >