Re: [Tutor] Python C extension - which method?

2018-05-05 Thread Stefan Behnel
Hi,

Brad M wrote on 04.05.2018 at 11:30:
> I want to create a C-based memory scanner for Python, and so far this is
> how I do it:
> 
> Python:
> 
> from ctypes import cdll
> mydll = cdll.LoadLibrary('hello.dll')
> print(mydll.say_something())
> 
> and hello.dll:
> 
> #include 
> __declspec(dllexport) int say_something()
> {
>     return 1980;
> }
> 
> so the printout is "1980"
> 
> Is this alright?


Depends on your needs and your C/C++ knowledge.

If you have a shared library that provides the ready-made functionality,
and accessing that native code at all is more important than calling it
very quickly (e.g. you only do a few longish-running calls into it), then
wrapping a shared library with ctypes (or preferably cffi) is a good way to
do it.
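
A minimal sketch of the ctypes route, using only the standard library. Since hello.dll is not available here, it borrows Py_GetVersion from the running interpreter's own C API (through ctypes.pythonapi) as a stand-in for say_something(). That substitution is an assumption made for illustration. The point is to declare argtypes and restype explicitly instead of relying on the default int return type.

```python
import ctypes
import sys

# Stand-in for mydll.say_something: a C function exported by the running
# CPython interpreter, so the sketch works without building a DLL first.
get_version = ctypes.pythonapi.Py_GetVersion

# Declare the C signature. Without this, ctypes assumes an int return value,
# which happens to fit say_something() but would mangle the pointer here.
get_version.argtypes = []
get_version.restype = ctypes.c_char_p

version = get_version().decode("ascii")
print(version)
```

For the DLL from the question, the same two attributes would be set on mydll.say_something, with ctypes.c_int as the return type.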

Otherwise, try either a native wrapper generator like pybind11, or write
your wrapper in Cython.

Specifically, if you are not just calling into an external library 1:1, but
need to do (or can benefit from doing) non-trivial operations in native
code, definitely use Cython.

http://cython.org


> I am aware that there is another much more complicated
> method such as this:
> 
> https://tutorialedge.net/python/python-c-extensions-tutorial/#building-and-installing-our-module

Well, yes, it exists, but I advise against wrapping C code manually that
way. It's just too cumbersome and error prone. Leave it to the experts who
have already written their tools for you.

Stefan


Disclosure: I'm a Cython core dev, so I'm biased and I absolutely know what
I'm talking about.

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] XML Programs

2018-04-17 Thread Stefan Behnel
Glen wrote on 16.04.2018 at 13:10:
> I'm writing a save-game editor for a game I play (just a project to learn).
> But I am struggling on how to structure the code, how to store the xml data
> in data structure etc,
> 
> Can anyone recommend some source I can review that reads and writes data
> from an xml file.

Here's a tutorial for the lxml package:

http://lxml.de/tutorial.html
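
For a first impression of reading and writing XML, here is a small sketch. It uses the standard library's xml.etree.ElementTree rather than lxml, so that it runs without extra packages; the basic calls shown exist in lxml.etree as well. The save-game layout and all tag and attribute names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical save-game layout; a real game's format will differ.
SAVE_XML = """<savegame>
  <player name="hero" gold="120"/>
  <inventory>
    <item id="sword" count="1"/>
    <item id="potion" count="3"/>
  </inventory>
</savegame>"""

root = ET.fromstring(SAVE_XML)

# Read: turn the inventory into a plain dict for easy inspection.
inventory = {item.get("id"): int(item.get("count"))
             for item in root.iter("item")}

# Modify: change values in the tree itself.
root.find("player").set("gold", "9999")
for item in root.iter("item"):
    if item.get("id") == "potion":
        item.set("count", "99")

# Write: serialise the edited tree back to XML text.
edited = ET.tostring(root, encoding="unicode")
print(edited)
```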

However, I'd first check if there really is no Python library yet that
handles your "game files", whatever format they may have. One of the most
important things to learn about software engineering is to know when *not*
to write code to solve a problem.

If you end up having (or wanting) to deal with the bare XML format
yourself, you may consider implementing your own XML API for your format,
so that you can nicely assign functionality to certain tags in the document
tree. See the section on "Implementing Namespaces" here:

http://lxml.de/element_classes.html

Stefan



Re: [Tutor] XML Programs

2018-04-17 Thread Stefan Behnel
leam hall wrote on 16.04.2018 at 14:54:
> On Mon, Apr 16, 2018 at 7:10 AM, Glen wrote:
>> I'm writing a save-game editor for a game I play (just a project to learn).
>> But I am struggling on how to structure the code, how to store the xml data
>> in data structure etc,
>>
>> Can anyone recommend some source I can review that reads and writes data
>> from an xml file.
> 
> A friend's comment was "life is too short for XML". I like that. Have
> you considered JSON? Taking it a step further, MongoDB (JSON) or
> SQLite (SQL)? Both are pretty common and standard.

Actually, XML is pretty common and standard. But life's definitely too
short for Mongo.

Stefan



Re: [Tutor] XML parsing

2018-03-30 Thread Stefan Behnel
Neil Cerutti wrote on 30.03.2018 at 15:50:
> On 2018-03-30, Stefan Behnel wrote:
>> I admit that I'm being a bit strict here, there are certainly
>> cases where parsing the namespace from a tag is a sensible
>> thing to do. I'm really just saying that most of the time, when
>> you feel the need to do that, it's worth reconsidering (or
>> asking) if you are doing the right thing.
> 
> Namespaces hurt my head when I try to imagine a use for them. The
> only ones I've encountered are just applied to an entire
> document, making parsing more inconvenient but providing no other
> benefit I can see.

Actually, namespaces give meaning to data, and they allow combining
different XML languages into one document. Without them, it would be
impossible to tell, for example, that what a document contains is an author
name in Dublin Core metadata and not an image subtitle. Or a base64 encoded
inline image. Or anything. They allow you to cleanly extract an SVG image
from an ODF document without implementing the OpenDocument standard first.

Namespaces enable reuse of well defined semantics, in the same way that you
would structure your code in functions and modules, instead of stuffing
everything in the main module as plain top-level code.
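
A small sketch of that idea, using the standard library's ElementTree. The Dublin Core namespace URI is the real one; the image namespace and the document itself are invented for illustration. Two elements share the local name "title", and only the namespace tells them apart.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
IMG = "http://example.com/ns/image"       # hypothetical image vocabulary

DOC = """<doc xmlns:dc="{dc}" xmlns:img="{img}">
  <dc:title>Holiday report</dc:title>
  <img:title>Sunset over the bay</img:title>
</doc>""".format(dc=DC, img=IMG)

root = ET.fromstring(DOC)

# ElementTree spells a qualified tag as "{namespace}localname".
doc_title = root.findtext("{%s}title" % DC)
image_title = root.findtext("{%s}title" % IMG)
print(doc_title, "|", image_title)
```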

Stefan



Re: [Tutor] XML parsing

2018-03-30 Thread Stefan Behnel
Asif Iqbal wrote on 30.03.2018 at 03:40:
> On Thu, Mar 29, 2018 at 3:41 PM, Peter Otten wrote:
>> Asif Iqbal wrote:
>>> On Thu, Mar 29, 2018 at 3:56 AM, Peter Otten wrote:
 Asif Iqbal wrote:
> Here is a sample xml file
>
> http://tail-f.com/ns/rest";>
>   http://networks.com/nms";>
> ALLFLEX-BLOOMINGTON
> post-staging
> full-mesh
> ALLFLEX
> http://networks.com/nms";>
>   advanced-plus
>   1000
>   true
>   true
> 
> 
> 
>
> with open('/tmp/template-metadata') as f:
>     import xml.etree.ElementTree as ET
>     root = ET.fromstring(f.read())

Don't use fromstring() here, use "parse(f).getroot()". The first loads the
whole file step by step into a string in memory, then parses it, and then
throws the string away. The second directly loads and parses the file step
by step.
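
Side by side, the two variants look roughly like this. The sketch writes a small temporary file first so that it is self-contained; the file content is a placeholder, not the data from the question.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a small stand-in file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("<config><name>demo</name></config>")
    path = tmp.name

try:
    # fromstring(): the whole file is read into one string first.
    with open(path) as f:
        root_from_string = ET.fromstring(f.read())

    # parse(): the parser reads from the file object itself.
    # Binary mode leaves encoding detection to the parser.
    with open(path, "rb") as f:
        root_from_file = ET.parse(f).getroot()
finally:
    os.remove(path)

print(root_from_string.tag, root_from_file.findtext("name"))
```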


> I also want to extract the namespace and I see this gets me the namespace
> 
>   str(root[0]).split('{')[1].split('}')[0]
> 
> Is there a better way to extract the name space?

Yes: don't. ;)

Normally, you write code *for* concrete XML namespaces, not code that
extracts arbitrary namespaces from XML. I'm saying that because people
mix this up all the time, and assume that they need to extract
information from namespaces (or even from namespace prefixes!). But without
knowing the namespace upfront, the XML data is actually meaningless. And if
you know the namespace anyway, why would you need to extract it from the data?

I admit that I'm being a bit strict here, there are certainly cases where
parsing the namespace from a tag is a sensible thing to do. I'm really just
saying that most of the time, when you feel the need to do that, it's worth
reconsidering (or asking) if you are doing the right thing.

Stefan



Re: [Tutor] Recommended Python Compiler

2017-07-31 Thread Stefan Behnel
Ben Finney wrote on 31.07.2017 at 04:24:
> You might be looking for a different compiler, maybe one which compiles
> not to Python byte code but instead to CPU machine code. There isn't
> such a thing;

The Python wiki lists several compilers for Python, although with varying
levels of language compliance and/or code compatibility*:

https://wiki.python.org/moin/PythonImplementations#Compilers

The most widely used static Python compiler is probably still Cython.

Stefan
(Cython core developer)


PS*: being compliant with the Python language does not necessarily mean it
can successfully run your existing code, as that also requires access to
third-party dependencies etc. Not all tools (aim to) provide that.

PPS: it seems that wiki list needs an update as a) many of the links did
not survive the termination of and migration away from Google Code and b)
some of the projects should now better be marked as abandoned.



Re: [Tutor] Python Optical Character Recognition

2015-09-30 Thread Stefan Behnel
Sebastian Cheung via Tutor wrote on 30.09.2015 at 10:55:
> How to read a jpg or png file into python and extract text thanks. I
> would imagine getting small symbols like ; or , to be difficult?

That depends entirely on the OCR engine, and you normally won't have to
deal with its internals very deeply.


> Any good framework like this?

Searching for "python ocr" gives me a couple of hits, including this one:

https://pypi.python.org/pypi/pytesseract

PyPI has some more matches that might be relevant:

https://pypi.python.org/pypi?%3Aaction=search&term=ocr&submit=search

Stefan



Re: [Tutor] Problem using lxml

2015-08-23 Thread Stefan Behnel
Anthony Papillion wrote on 23.08.2015 at 01:16:
> from lxml import html
> import requests
> 
> page = requests.get("http://joplin.craigslist.org/search/w4m")
> tree = html.fromstring(page.text)

While requests has its merits, this can be simplified to

tree = html.parse("http://joplin.craigslist.org/search/w4m")


> titles = tree.xpath('//a[@class="hdrlnk"]/text()')
> try:
>     for title in titles:
>         print title

This only works as long as the link tags only contain plain text, no other
tags, because "text()" selects individual text nodes in XPath. Also, using
@class="hdrlnk" will not match link tags that use class="  hdrlnk  " or
class="abc hdrlnk other".

If you want to be on the safe side, I'd use cssselect instead and then
serialise the complete text content of the link tag to a string, i.e.

from lxml.etree import tostring

for link_element in tree.cssselect("a.hdrlnk"):
    title = tostring(
        link_element,
        method="text", encoding="unicode", with_tail=False)
    print(title.strip())

Note that the "cssselect()" feature requires the external "cssselect"
package to be installed. "pip install cssselect" should handle that.
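
The two pitfalls can also be reproduced without lxml or cssselect. The following sketch deliberately swaps in the standard library's ElementTree and a made-up, well-formed snippet in place of the real page. It matches whole class tokens and collects the complete text content of each link.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for the real page.
PAGE = """<div>
  <a class="hdrlnk" href="/1">plain title</a>
  <a class="abc hdrlnk other" href="/2">title with <b>bold</b> part</a>
  <a class="other" href="/3">unrelated</a>
</div>"""

root = ET.fromstring(PAGE)

titles = []
for link in root.iter("a"):
    # Compare whole class tokens instead of the raw attribute string.
    if "hdrlnk" not in link.get("class", "").split():
        continue
    # itertext() walks all nested text nodes; link.text alone would stop
    # at the first child tag ("title with " for the second link).
    titles.append("".join(link.itertext()).strip())

print(titles)
```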


> except:
>     pass

Oh, and bare "except:" clauses are generally frowned upon because they can
easily hide bugs by also catching unexpected exceptions. Better be explicit
about the exception type(s) you want to catch.
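
As a generic illustration (the function name and inputs are hypothetical, not taken from the code above), catching only the expected exception type might look like this:

```python
def parse_price(text):
    """Return the price as a float, or None if the text is not a number."""
    try:
        return float(text.strip().lstrip("$"))
    except ValueError:
        # Only the expected failure is handled; a typo such as calling
        # text.stripp() would still surface as an AttributeError.
        return None

prices = [parse_price(t) for t in ["$12.50", "free", " 3 "]]
print(prices)
```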

Stefan




Re: [Tutor] Generate Prime Numbers

2015-05-29 Thread Stefan Behnel
Mirage Web Studio wrote on 29.05.2015 at 17:28:
> Below is a sample code i created.
> Can i better it any way?

Absolutely. Prime number generation is a very well researched and fun to
implement topic. Thus many people have done it before.

See this for algorithmic improvements:

https://pypi.python.org/pypi/pyprimes
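
To give an idea of what an algorithmic improvement can look like, here is a sketch of one classic approach, the sieve of Eratosthenes. It is not necessarily what pyprimes does internally, and the function name is my own.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    n = 2
    while n * n <= limit:
        if is_prime[n]:
            # Multiples below n*n were already crossed out by smaller primes.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
        n += 1
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(30))
```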

Stefan




Re: [Tutor] pip install lxml fails

2015-04-09 Thread Stefan Behnel
Alex Kleider wrote on 09.04.2015 at 21:49:
> On 2015-04-09 09:11, Stefan Behnel wrote:
>> All you need to do is install the "-dev" package that goes with your Python
>> installation, e.g. "python3-dev" should match Python 3.4 in current Ubuntu
>> releases.
>>
>> The reason why it's in a separate package is that many people actually
>> don't need this, e.g. when they only install plain Python packages or use
>> the Ubuntu provided binary packages that they can install via apt (e.g.
>> "sudo apt-get install python-lxml").
> 
> Thanks Brandon and Stefan.  It proved correct that I did not have
> python3-dev installed
> .. but after installing it, the pip install lxml still fails!
> 
> There's a huge amount of output but what follows might provide clues as to
> the current problem:
> [...]
> /usr/bin/ld: cannot find -lz

Same thing, you need to make zlib available to the build. Ubuntu calls the
package "zlib1g-dev". And while you're at it, make sure you also have
"libxml2-dev" and "libxslt-dev".

The official installation instructions can be found here, BTW:

http://lxml.de/installation.html

Stefan




Re: [Tutor] pip install lxml fails

2015-04-09 Thread Stefan Behnel
Alex Kleider wrote on 09.04.2015 at 17:29:
> On 2015-04-09 07:08, Brandon McCaig wrote:
>> I'm a python newbie, but it looks to me like your compiler cannot
>> find your header files, and in particular pyconfig.h.
>>
>> I tried searching my system and found a file with that name at
>> these locations:
>>
>> /home/bambams/src/pyenv/versions/2.7.9/include/python2.7/pyconfig.h
>> /home/bambams/src/pyenv/versions/3.4.2/include/python3.4m/pyconfig.h
>> /usr/include/python2.6/pyconfig.h
>> /usr/include/python2.7/pyconfig.h
>> /usr/include/python3.2mu/pyconfig.h
>>
>> Based on that I am assuming that you should have a pyconfig.h in
>> either /usr/include/python3.4m or
>> /home/alex/P3env/env/include/python3.4m. I would probably start
>> there to verify that I have that header file in a location where
>> it's expected (-I flags in the above command, or system include
>> directories). If not then I would wonder why...
> 
> I have corresponding files in /usr/include for 2.7 but not for 3*:
> alex@t61p:~/P3env/env$ ls -ld /usr/include/python*
> drwxr-xr-x 2 root root 4096 Apr  8 19:05 /usr/include/python2.7
> 
> alex@t61p:~/P3env/env$ ls
> bin  lib
> Nothing in the env file (generated by virtualenv.)
> 
> My Ubuntu 14.04 system comes with Python3 by default so it does exist:
> alex@t61p:~/P3env/env$ which python3
> /usr/bin/python3
> It's a mystery why it doesn't come with the corresponding include directory.
> 
> I'm guessing this is a system level problem that can probably only be
> solved by someone at Ubuntu or one of the lxml maintainers.

It's solved already. :)

All you need to do is install the "-dev" package that goes with your Python
installation, e.g. "python3-dev" should match Python 3.4 in current Ubuntu
releases.

The reason why it's in a separate package is that many people actually
don't need this, e.g. when they only install plain Python packages or use
the Ubuntu provided binary packages that they can install via apt (e.g.
"sudo apt-get install python-lxml").

Stefan




Re: [Tutor] Code critique

2014-10-24 Thread Stefan Behnel
Bo Morris wrote on 24.10.2014 at 14:03:
> May I please get a little instructional criticism. The code below works. It
> logs into 9 different Linux computers, runs a couple commands, and then
> transfers a file back to the server. I want to become a better Python
> coder; therefore, I was hoping for some ways to make the below code better,
> more efficient, or if I am doing something incorrectly, a correct way of
> doing it.  Thanks

A quick comment, not related to coding style: take a look at fabric. It's
made for doing these things.

http://www.fabfile.org/

Regarding your program, instead of writing long sequences of repetitive if
conditions, I would write one function for each of the different operations
and store them in a dict, mapping each host name to a function (and
multiple host names may map to the same function). Then, look up the host
name in the dict and call the corresponding function to run the right
operations on that host.

Using functions will make it easier to factor out similar code. If you look
at the different operations that you do on the different hosts, you will
notice that most of them do the same thing, just with different files.
Instead of duplicating your code for each host, extract the file names that
each host needs and then pass it into the function that reads the file from
the host. The function will then be the same for all hosts, only the input
arguments change.
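
A rough sketch of that structure follows. The host names, file paths and the fetch_file() placeholder are all hypothetical; the real transfer logic would go where the placeholder is.

```python
def fetch_file(host, remote_path):
    """Placeholder for the real SSH/transfer logic."""
    return "would copy {}:{}".format(host, remote_path)

def collect_logs(host):
    return fetch_file(host, "/var/log/app.log")

def collect_report(host):
    return fetch_file(host, "/tmp/report.csv")

# Several hosts may share one operation.
OPERATIONS = {
    "host1": collect_logs,
    "host2": collect_logs,
    "host3": collect_report,
}

def run(host):
    try:
        operation = OPERATIONS[host]
    except KeyError:
        raise ValueError("no operation configured for " + host)
    return operation(host)

results = [run(h) for h in ["host1", "host3"]]
print(results)
```

Adding a tenth host then means adding one dict entry rather than another if block.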

Stefan




Re: [Tutor] Standard Library Performance (3.4.1)

2014-10-24 Thread Stefan Behnel
Alan Gauld wrote on 24.10.2014 at 13:03:
> Not all library modules are C based however so it doesn't
> always apply. But they are usually optimised and thoroughly
> debugged so it is still worth using them rather than building
> your own.

It's worth stressing this point a bit more. Lots of people have been using
this code, and some found bugs in it that are now fixed. This means that
whatever implementation of similar functionality you can come up with
yourself will most likely have more bugs and be less generally versatile.
And even if the standard library code doesn't fit your needs, start by
taking a deep look at what the Python Package Index (PyPI) offers instead
of writing your own.

Developer time is much better spent reusing other people's code and helping
to squash the remaining bugs in it than having everyone write their own
buggy code over and over.

Stefan




Re: [Tutor] Better way to check *nix remote file age?

2014-06-27 Thread Stefan Behnel
Raúl Cumplido, 27.06.2014 12:10:
> I would recommend you to migrate your Python version for a newer one where
> you can use fabric, paramiko or other ssh tools. It would be easier.

+1

Even compiling it yourself shouldn't be too difficult on Linux.


> I would recommend also instead of doing an "ls -l" command doing something
> to retrieve only the information you need:
> 
> /bin/ls -ls | awk '{print $7,$8,$9, $10}'
> 
> Jun 27 10:36 my_file

Or run some Python code on the other side, e.g. with

python -c 'python code here'
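
For the file age question, the "python code here" part could be a one-liner like the one below. The sketch runs it locally through subprocess so that it is self-contained; sending it over ssh instead is left out, and the helper name is my own. It is written for Python 3 on the calling side.

```python
import os
import subprocess
import sys
import tempfile

# The one-liner that would be passed to "python -c" on the remote host.
# It prints the age of the given file in whole seconds.
REMOTE_CODE = (
    "import os, sys, time; "
    "print(int(time.time() - os.path.getmtime(sys.argv[1])))"
)

def file_age_seconds(path, python=sys.executable):
    """Run the one-liner locally; over ssh, prefix the command accordingly."""
    output = subprocess.check_output([python, "-c", REMOTE_CODE, path])
    return int(output.decode("ascii").strip())

# Demonstrate with a freshly created temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
try:
    age = file_age_seconds(path)
finally:
    os.remove(path)
print(age)
```

Getting a plain number back avoids parsing the date columns of ls output.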

Stefan




Re: [Tutor] Android

2014-06-04 Thread Stefan Behnel
Mario Py, 04.06.2014 07:47:
> I'm writing one small simple program that will use Tkinter.
> Once finished, will I be able to run it on android tablet?

Have a look at kivy, it supports different systems, including various
mobile devices.

http://kivy.org/

Stefan




Re: [Tutor] xml parsing from xml

2014-05-10 Thread Stefan Behnel
Stefan Behnel, 10.05.2014 10:57:
> Danny Yoo, 07.05.2014 22:39:
>> If you don't want to deal with an event-driven approach that SAX
>> emphasizes, you may still be able to do this problem with an XML-Pull
>> parser.  You mention that your input is hundreds of megabytes long, in
>> which case you probably really do need to be careful about memory
>> consumption.  See:
>>
>> https://wiki.python.org/moin/PullDom
> 
> Since the OP mentioned that the file is quite large (800 MB), not only
> memory consumption should matter but also processing time. If that is the
> case, PullDOM isn't something to recommend since it's MiniDOM based, which
> makes it quite slow overall.

To back that by some numbers, here are three memory efficient
implementations, using PullDOM, cElementTree and lxml.etree:


$ cat lx.py
from lxml.etree import iterparse, tostring

doc = iterparse('input.xml', tag='country')
root = None
for _, node in doc:
    print("--")
    print("This is the node for " + node.get('name'))
    print("--")
    print(tostring(node))
    print("\n\n")

    if root is None:
        root = node.getparent()
    else:
        sib = node.getprevious()
        if sib is not None:
            root.remove(sib)

$ cat et.py
from xml.etree.cElementTree import iterparse, tostring

doc = iterparse('input.xml')
for _, node in doc:
    if node.tag == "country":
        print("--")
        print("This is the node for " + node.get('name'))
        print("--")
        print(tostring(node))
        print("\n\n")
        node.clear()

$ cat pdom.py
from xml.dom.pulldom import START_ELEMENT, parse

doc = parse('input.xml')
for event, node in doc:
    if event == START_ELEMENT and node.localName == "country":
        doc.expandNode(node)
        print("--")
        print("This is the node for " + node.getAttribute('name'))
        print("--")
        print(node.toxml())
        print("\n\n")


I ran all three against a 400 MB XML file generated by repeating the data
snippet the OP provided. Here are the system clock timings in
minutes:seconds, on 64bit Linux, using CPython 3.4.0:

$ time python3 lx.py > /dev/null
time: 0:31
$ time python3 et.py > /dev/null
time: 3:33
$ time python3 pdom.py > /dev/null
time: 9:51


Adding to that another bit of actual tree processing, if I had to choose
between 2 minutes and well over 20 minutes processing time for my 800MB,
I'd tend to prefer the 2 minutes.

Note that the reason why cElementTree performs so poorly here is that its
serialiser is fairly slow, and the code writes the entire 400 MB of XML
back out. If the test was more like "parse 400 MB and generate CSV from
it", then it should perform similar to lxml. PullDOM/MiniDOM, on the other
hand, are slow on parsing, serialisation *and* tree processing.

Stefan




Re: [Tutor] xml parsing from xml

2014-05-10 Thread Stefan Behnel
Danny Yoo, 07.05.2014 22:39:
> If you don't want to deal with an event-driven approach that SAX
> emphasizes, you may still be able to do this problem with an XML-Pull
> parser.  You mention that your input is hundreds of megabytes long, in
> which case you probably really do need to be careful about memory
> consumption.  See:
> 
> https://wiki.python.org/moin/PullDom

Since the OP mentioned that the file is quite large (800 MB), not only
memory consumption should matter but also processing time. If that is the
case, PullDOM isn't something to recommend since it's MiniDOM based, which
makes it quite slow overall.

Stefan




Re: [Tutor] preferred httprequest library

2014-05-09 Thread Stefan Behnel
Martin A. Brown, 09.05.2014 00:54:
>  : I'm new to python but not so much to programming.  I need to 
>  : construct a set of programs to test a forms poster that has been 
>  : enhanced (it is in php).  I mostly need http get and post.  This 
>  : is a hands on set of tests and does not have to be bullet proof.
>  : 
>  : What would be a good http request library to use for this work?
> 
> There are many options.  When you can afford to suck the entire 
> remote resource into memory (as with many applications), you will 
> probably find the 'requests' library very handy.  I'd start here, 
> avoiding grabbing for the standard library tools (urllib, and 
> urllib2) unless you need that finer control.
> 
> This has a nice abstraction and, from your description, I think this 
> would be a good fit:
> 
>   http://docs.python-requests.org/en/latest/

Agreed that "requests" is a good external tool with a (mostly) nice
interface, but if you really just need to do GET/POST, there's nothing
wrong with the stdlib's urllib.request module (called urllib2 in Python 2).

https://docs.python.org/3.4/library/urllib.request.html

Basically, you just say

result = urllib.request.urlopen(some_url)

for a GET request and

result = urllib.request.urlopen(some_url, data=post_data)

for POST. It returns a file-like object that you can use to read the
response, ask for headers, etc.
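
A slightly fuller sketch, under the assumption of a local test endpoint (the URL and the form fields are made up). It only builds the requests and shows how the data argument selects the method; the actual sending is commented out because it needs a running server.

```python
import urllib.parse
import urllib.request

url = "http://localhost:8000/form.php"   # hypothetical test endpoint

# POST data must be bytes; urlencode() builds the form-encoded string.
post_data = urllib.parse.urlencode({"name": "test", "qty": 3}).encode("ascii")

get_request = urllib.request.Request(url)
post_request = urllib.request.Request(url, data=post_data)

# Passing data is what switches the request from GET to POST.
print(get_request.get_method(), post_request.get_method())

# Sending is left commented out because it needs a running server:
# with urllib.request.urlopen(post_request) as result:
#     print(result.status, result.read()[:100])
```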

Advantage is that it runs out of the box on whatever Python installation
you have. If installing external packages before running the tests is not
an issue, then the "requests" library will make your life a bit easier for
the more involved cases that you might encounter at some point in the future.

Stefan




Re: [Tutor] xml parsing from xml

2014-05-07 Thread Stefan Behnel
Neil D. Cerutti, 07.05.2014 20:04:
> On 5/7/2014 1:39 PM, Alan Gauld wrote:
>> On 07/05/14 17:56, Stefan Behnel wrote:
>>> Alan Gauld, 07.05.2014 18:11:
>>>> and ElementTree (aka etree). The documentation gives examples of both.
>>>> sax is easiest and fastest for simple XML in big files ...
>>>
>>> I wouldn't say that SAX qualifies as "easiest". Sure, if the task is
>>> something like "count number of abc tags" or "find tag xyz and get an
>>> attribute value from it", then SAX is relatively easy and also quite
>>> fast.
>>
>> That's pretty much what I said. simple task, big file. sax is easy.
>>
>> For anything else use etree.
>>
>>> BTW, ElementTree also has a SAX-like parsing mode, but comes with a
>>> simpler interface and saner parser configuration defaults.
>>
>> My experience was different. Etree is powerful but for simple
>> tasks I just found sax easier to grok. (And most of my XML parsing
>> is limited to simple extraction of a field or two.)
> 
> If I understand this task correctly it seems like a good application for
> SAX. As a state machine it could have a mere two states, assuming we aren't
> troubled about the parent nodes of Country tags.

Yep, that's the kind of thing I meant. You get started, just trying to get
one little field out of the file, then notice that you need another
one, and eventually end up writing a page full of code where a couple of
lines would have done the job. Even just safely and correctly getting the
text content of an element is surprisingly non-trivial in SAX.
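
To illustrate the text content point, here is a sketch that extracts the same field once with xml.sax and once with ElementTree. The sample document and the handler class are made up. The SAX handler has to buffer text itself because characters() may be called more than once for a single element, and it still ignores complications such as nested elements.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

XML = b"<countries><country><name>Trinidad &amp; Tobago</name></country></countries>"

class NameHandler(xml.sax.ContentHandler):
    """Collect the text of every <name> element."""
    def __init__(self):
        super().__init__()
        self.names = []
        self._chunks = None          # None means "not inside <name>"

    def startElement(self, tag, attrs):
        if tag == "name":
            self._chunks = []

    def characters(self, content):
        # The parser may deliver an element's text in several pieces,
        # so they have to be buffered and joined at the end tag.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._chunks))
            self._chunks = None

handler = NameHandler()
xml.sax.parse(io.BytesIO(XML), handler)

# The ElementTree equivalent.
et_names = [el.text for el in ET.fromstring(XML).iter("name")]
print(handler.names, et_names)
```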

It's still unclear what the OP wanted exactly, though. To me, it read more
like the task was to copy some content over from one XML file to another,
in which case doing it in ET is just trivial thanks to the tree API, but
SAX requires you to reconstruct the XML brick by brick here.


> In my own personal case, I partly prefer xml.sax simply because it ignores
> namespaces, a nice benefit in my cases. I wish I could make ElementTree do
> that.

The downside of namespace unaware parsing is that you never know what you
get. It works for some input, but it may also just fail arbitrarily, for
equally valid input.

One cool thing about ET is that it makes namespace aware processing easy by
using fully qualified tag names (one string says it all).  Most other XML
tools (including SAX) require some annoying prefix mapping setup that you
have to carry around in order to tell the processor that you are really
talking about the thing that it's showing to you.

Stefan



Re: [Tutor] xml parsing from xml

2014-05-07 Thread Stefan Behnel
Alan Gauld, 07.05.2014 18:11:
> Python comes with several XML parsers. The simplest to use are probably sax
> and ElementTree (aka etree). The documentation gives examples of both. sax
> is easiest and fastest for simple XML in big files while etree is probably
> better for more complex XML structures.

I wouldn't say that SAX qualifies as "easiest". Sure, if the task is
something like "count number of abc tags" or "find tag xyz and get an
attribute value from it", then SAX is relatively easy and also quite fast.
However, anything larger than that (i.e. *any* real task) quickly gets so
complex and complicated that it's usually faster to learn ElementTree's
iterparse() from scratch and write a working solution with it, than to
write even a half working implementation of a SAX handler for a given
problem. And that's completely ignoring the amount of time that such an
unwieldy SAX handler will cost you in terms of long term code maintenance.

BTW, ElementTree also has a SAX-like parsing mode, but comes with a simpler
interface and saner parser configuration defaults. That makes the xml.sax
package even less recommendable.
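
One reading of "SAX-like" is the event stream that iterparse() provides. A minimal sketch under that assumption, with invented sample data:

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<countries>
  <country name="Liechtenstein"><rank>1</rank></country>
  <country name="Singapore"><rank>4</rank></country>
  <country name="Panama"><rank>68</rank></country>
</countries>"""

names = []
# "end" events fire once an element, including its children, is complete.
for event, element in ET.iterparse(io.BytesIO(XML), events=("end",)):
    if element.tag == "country":
        names.append((element.get("name"), int(element.findtext("rank"))))
        # Drop the processed subtree's content to limit memory use on big files.
        element.clear()

print(names)
```

Unlike a SAX handler, the loop body gets a complete subtree to work with, so no state machine is needed.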

Stefan




Re: [Tutor] array('c')

2014-05-01 Thread Stefan Behnel
Ian D, 01.05.2014 16:38:
> I have this part of code and am unsure as to the effect of the array('c') 
> part.

The argument that you pass into the constructor is a type identifier.

https://docs.python.org/2/library/array.html#array.array

The different types are defined at the top of that page.

It seems that you are using Python 2, BTW. In Python 3, the "c" type is not
supported anymore. I guess that's because characters do not actually exist
at the C level, only bytes (which are represented by "b" and "B").


> n = len(cmd)
> a = array('c')
> a.append(chr((n >> 24) & 0xFF))
> a.append(chr((n >> 16) & 0xFF))
> a.append(chr((n >>  8) & 0xFF))
> a.append(chr(n & 0xFF))
> scratchSock.send(a.tostring() + cmd)

You don't have to pass in strings. Integers will do just fine.

The code above looks way too complicated, though. You might want to take a
look at the struct module instead, which allows you to do C level
formatting of simple and compound values.
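
With struct, the four-byte length header might be built as sketched below. The command payload is a made-up placeholder, and the sketch uses Python 3 bytes syntax, whereas the quoted code is Python 2.

```python
import struct

cmd = b"broadcast hello"      # hypothetical command payload

# ">I" = big-endian unsigned 32-bit integer, the same four bytes that the
# shift-and-mask code assembles by hand.
header = struct.pack(">I", len(cmd))
message = header + cmd

# The manual version, for comparison.
n = len(cmd)
manual = bytes([(n >> 24) & 0xFF, (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF])
print(header, header == manual)
```

The receiving side can recover the length with struct.unpack(">I", message[:4]).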

Stefan




Re: [Tutor] cdata/aml question..

2014-04-13 Thread Stefan Behnel
Peter Otten, 13.04.2014 10:56:
> from xml.etree import ElementTree as ET
> #root = ET.parse(filename).getroot()
> root = ET.fromstring(data)
> for department in root.findall(".//department"):
>     name = department.find("name").text
>     desc = department.find("desc").text

  name = department.findtext("name")
  desc = department.findtext("desc")

>     print("{}: {}".format(name, desc))
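
One practical difference shows up when a child element is missing. A small sketch with made-up data:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<departments>"
    "<department><name>Sales</name><desc>Sells things</desc></department>"
    "<department><name>IT</name></department>"
    "</departments>"
)

rows = []
for department in root.findall(".//department"):
    name = department.findtext("name")
    # findtext() returns the default for a missing child, whereas
    # department.find("desc").text would raise AttributeError on None.
    desc = department.findtext("desc", default="(no description)")
    rows.append("{}: {}".format(name, desc))

print(rows)
```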

Stefan




Re: [Tutor] c++ on python

2014-03-13 Thread Stefan Behnel
Russel Winder, 13.03.2014 17:29:
> On Thu, 2014-03-13 at 16:57 +0100, Stefan Behnel wrote:
> […]
>> The thing is: if you have to write your own wrapper anyway (trivial or
>> not), then why not write it in Cython right away and avoid the intermediate
>> plain C level?
> 
> If the task is to write an adapter (aka wrapper) then perhaps use SWIG
> which is easier for this task than writing Cython code.

Depends. SWIG is nice if you have a large API that a) you want to wrap
quickly all at once and b) that matches the tool well. Once you're beyond
the "matches the tool well" spot, however, you'll start having an
increasingly hard time pushing the tool into matching your API.

Cython has a steeper learning curve to get started (it's a programming
language, not a wrapper generator by itself, use something like XDress for
that), but is unlimited in what it allows you to do (because it's a
programming language). So things won't suddenly become harder (let alone
impossible) afterwards.


>> It's usually much nicer to work with object oriented code on both sides
>> (assuming you understand the languages on both sides), than to try to
>> squeeze them through a C-ish API bottleneck in the middle.
> 
> It could be that "object oriented" is a red herring. Without details (*)
> of what it is about the C++ code that is the connection between Python
> and C++, it is difficult to generalize.

Sure. I've seen both good and bad API designs in C++, as in any other language.


> ctypes can be a real pain when trying to call C++ from Python using
> argument values that are not primitive types. CFFI solves (currently
> much, soon most) of this problem by addressing the adapter between
> Python and C++ in a different way to that employed by ctypes. In both
> cases, both are a lot easier than writing Cython code. 

Now, that's a bit overly generalising, wouldn't you say? Even in the cases
where cffi is as simple as Cython, I'd still prefer the portability and
simplicity advantage of having statically compiled (and tested) wrapper
code over a mix of a hand written C++-to-C wrapper and some dynamically
generated glue code with its own set of runtime dependencies. But I can
certainly accept that tools like ctypes and cffi have their niche, too. If
you're comfortable with them, and they fit your needs, then sure, use them.
There isn't one tool that caters for everyone.

Stefan




Re: [Tutor] c++ on python

2014-03-13 Thread Stefan Behnel
James Chapman, 13.03.2014 17:35:
> Perhaps I should look into Cython as I'm currently working on a
> project that utilises a C API.
> 
> I've been finding that getting the data types to be exactly what the C
> API is expecting to be the hardest part.
> 
> With the original question in mind, here's an example calling into a
> C++ external C API:
> 
> (Works if compiled in Visual Studio. The DLL produced by MinGW-G++ didn't
> work.)
> 
> ---
> // main.h
> 
> #ifndef __MAIN_H__
> #define __MAIN_H__
> 
> #include <windows.h>
> 
> #define DLL_EXPORT __declspec(dllexport)
> 
> #ifdef __cplusplus
> extern "C"
> {
> #endif
> 
> int DLL_EXPORT add(int a, int b);
> 
> #ifdef __cplusplus
> }
> #endif
> 
> #endif // __MAIN_H__
> ---
> 
> ---
> //main.cpp
> 
> #include "main.h"
> 
> // a sample exported function
> int DLL_EXPORT add(int a, int b)
> {
>     return(a + b);
> }
> 
> extern "C" DLL_EXPORT BOOL APIENTRY DllMain(HINSTANCE hinstDLL, DWORD
> fdwReason, LPVOID lpvReserved)
> {
>     switch (fdwReason)
>     {
>         case DLL_PROCESS_ATTACH:
>             // attach to process
>             // return FALSE to fail DLL load
>             break;
> 
>         case DLL_PROCESS_DETACH:
>             // detach from process
>             break;
> 
>         case DLL_THREAD_ATTACH:
>             // attach to thread
>             break;
> 
>         case DLL_THREAD_DETACH:
>             // detach from thread
>             break;
>     }
>     return TRUE; // successful
> }
> ---
> 
> ---
> # -*- coding: utf-8 -*-
> # dll.py
> 
> import ctypes
> 
> 
> class DllInterface(object):
> 
>     dll_handle = None
> 
>     def __init__(self, dll_file):
>         self.dll_handle = ctypes.WinDLL(dll_file)
> 
>     def add_a_and_b(self, a=0, b=0):
>         return self.dll_handle.add(a, b)
> 
> 
> if __name__ == '__main__':
>     dll_file = 'PythonDLL.dll'
>     external_lib = DllInterface(dll_file)
>     int_a = ctypes.c_int(1)
>     int_b = ctypes.c_int(2)
>     result = external_lib.add_a_and_b(int_a, int_b)
>     print(result)

In Cython, that would essentially be

cdef extern from "main.h":
    int add(int a, int b)

print(add(1, 2))

It compiles down to C(++), i.e. it interfaces at the API level, not the ABI
level, as ctypes would.
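
To make the ABI-level point concrete: ctypes never reads main.h, so the
call signature has to be restated by hand on the Python side, and nothing
verifies it. The following is only a sketch. Since PythonDLL.dll is not
available here, it uses CPython's own C function PyNumber_Add (reached
through ctypes.pythonapi) as a stand-in for an external library function,
and it assumes the interpreter is CPython.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
# PyNumber_Add is only a stand-in here for a function from an external DLL.
py_add = ctypes.pythonapi.PyNumber_Add

# These declarations are not checked against any header file. If they do
# not match the real C signature, the call misbehaves at run time.
py_add.argtypes = [ctypes.py_object, ctypes.py_object]
py_add.restype = ctypes.py_object

result = py_add(1, 2)
print(result)
```

With Cython, the equivalent declaration is checked by the C compiler
against the header when the wrapper is built.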

Stefan




Re: [Tutor] c++ on python

2014-03-13 Thread Stefan Behnel
Alan Gauld, 12.03.2014 23:05:
> On 12/03/14 16:49, Stefan Behnel wrote:
>> Alan Gauld, 12.03.2014 10:11:
>>> If it were a library then you would have to call
>>> the individual C++ functions directly using
>>> something like ctypes, which is usually more
>>> complex.
>>
>> ctypes won't talk to C++, but Cython can do it quite easily.
> 
> I thought it would work provided the interface functions
> were declared as C functions? That might involve
> writing a wrapper around it but that is usually
> trivial if you have to compile the source anyway.

The thing is: if you have to write your own wrapper anyway (trivial or
not), then why not write it in Cython right away and avoid the intermediate
plain C level?

It's usually much nicer to work with object oriented code on both sides
(assuming you understand the languages on both sides), than to try to
squeeze them through a C-ish API bottleneck in the middle.

Stefan




Re: [Tutor] c++ on python

2014-03-12 Thread Stefan Behnel
Alan Gauld, 12.03.2014 10:11:
> If it were a library then you would have to call
> the individual C++ functions directly using
> something like ctypes, which is usually more
> complex.

ctypes won't talk to C++, but Cython can do it quite easily.

Stefan




Re: [Tutor] XML parsing when elements contain foreign characters

2014-01-09 Thread Stefan Behnel
Garry Bettle, 09.01.2014 09:50:
> I'm trying to parse some XML and I'm struggling to reference elements that
> contain foreign characters.

I skipped over Steven's response and he apparently invested quite a bit of
time in writing it up so nicely, so I can happily agree and just add one
little comment that you should generally avoid using MiniDOM for XML
processing. Instead, use the ElementTree library, which lives right next to
it in Python's standard library. It's a lot easier to use, and also
performs much better.

http://docs.python.org/library/xml.etree.elementtree.html
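
As a rough illustration of how little code ElementTree needs for this kind
of task, here is a minimal sketch. The element names (meetings, meeting,
track) and the sample values are made up for the example and are not taken
from the original XML. The non-ASCII characters are written as escapes so
that the example does not depend on the source file encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing non-ASCII text, encoded as UTF-8 bytes.
xml_bytes = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<meetings>'
    u'<meeting><track>Malm\u00f6</track></meeting>'
    u'<meeting><track>Sk\u00e5ne</track></meeting>'
    u'</meetings>'
).encode('utf-8')

# The parser decodes the bytes according to the XML declaration, so the
# text arrives as proper Unicode strings.
root = ET.fromstring(xml_bytes)
tracks = [meeting.findtext('track') for meeting in root.iter('meeting')]

# Comparing against a Unicode string then works as expected.
matches = [t for t in tracks if t == u'Malm\u00f6']
print(len(matches))
```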

Stefan




Re: [Tutor] trying to parse an xml file

2013-12-15 Thread Stefan Behnel
Steven D'Aprano, 14.12.2013 23:22:
> On Sat, Dec 14, 2013 at 09:29:00AM -0500, bruce wrote:
>> Looking at a file -->>
>> http://www.marquette.edu/mucentral/registrar/snapshot/fall13/xml/BIOL_bysubject.xml
>>
>> The file is generated via online/web url, and appears to be XML.
>>
>> However, when I use elementtree:
>>   document = ElementTree.parse( '/apps/parseapp2/testxml.xml' )
>>
>> I get an invalid error : not well-formed (invalid token):
> 
> I cannot reproduce that error. Perhaps you have inadvertently corrupted 
> the file when downloading it?

You may have missed my post, but I had already suggested this and the OP
replied in (accidentally?) private e-mail that that was the source of the
problem.

Stefan




Re: [Tutor] trying to parse an xml file

2013-12-14 Thread Stefan Behnel
bruce, 14.12.2013 15:29:
> Looking at a file -->>
> http://www.marquette.edu/mucentral/registrar/snapshot/fall13/xml/BIOL_bysubject.xml

That file looks ok to me.


> The file is generated via online/web url, and appears to be XML.
> 
> However, when I use elementtree:
>   document = ElementTree.parse( '/apps/parseapp2/testxml.xml' )
> 
> I get an invalid error : not well-formed (invalid token):

That's only a part of the error message. Could you provide the complete output?

That being said, maybe you did something wrong when you downloaded the
file? Try to get it again.

Stefan




Re: [Tutor] Pretty printing XML using LXML on Python3

2013-11-30 Thread Stefan Behnel
SM, 29.11.2013 22:21:
> On Thu, Nov 28, 2013 at 2:45 PM, eryksun wrote:
>> On Thu, Nov 28, 2013 at 2:12 PM, SM wrote:
>>> Run with Python3:
>>>
>>> $ python3 testx.py
>>> b'\n  \n  some text\n\n'
>>
>> print() first gets the object as a string. tostring() returns bytes,
>> and bytes.__str__ returns the same as bytes.__repr__.

Meaning, it's a pure matter of visual representation on the screen, not a
difference in the data.


>> You can decode
>> the bytes before printing, or instead use tounicode():
>>
>> >>> s = etree.tounicode(root, pretty_print=True)
>> >>> print(s)
>> 
>>   
>>   some text
>> 
> 
> Thank you, eryksun. using tounicode seems to work on this small piece of
> code. It still has issues with my code which is generating a big XML code.

Well, I'm sure you are not generating a large chunk of XML just to print it
on the screen, so using tostring(), as you did before, is certainly better.

However, if it's really that much output, you should serialise into a file
instead of serialising it into memory first and then writing that into a
file. So, use ElementTree.write() to write the output into a file directly.
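
A minimal sketch of writing straight to a file, using the standard
library's ElementTree so that it runs without extra packages. The element
names and the output file name are placeholders. With lxml the call looks
the same, and its write() additionally accepts pretty_print=True.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_large_document(path, n_items):
    """Build a tree and serialise it directly into the file at `path`."""
    root = ET.Element('root')
    for i in range(n_items):
        ET.SubElement(root, 'item').text = 'some text %d' % i
    # write() streams the serialised bytes into the file, so no large
    # intermediate string is created by tostring() first.
    ET.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)


out_path = os.path.join(tempfile.mkdtemp(), 'output.xml')
write_large_document(out_path, 1000)
print(os.path.getsize(out_path) > 0)
```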

Stefan




Re: [Tutor] Load Entire File into memory

2013-11-04 Thread Stefan Behnel
Amal Thomas, 04.11.2013 14:55:
> I have checked the execution time manually as well as I found it through my
> code. During execution of my code, at start, I stored my initial time(start
> time) to a variable  and at the end calculated time taken to run the code =
> end time - start time. There was a significant difference in time.

You should make sure that there are no caching effects here. Your operating
system may have loaded the file into memory (assuming that you have enough
of that) after the first read and then served it from there when you ran
the second benchmark.

So, make sure you measure the time twice for both, preferably running both
benchmarks in reverse order the second time.

That being said, it's not impossible that f.readlines() is faster than
line-wise iteration, because it knows right from the start that it will
read the entire file, so it can optimise for it (didn't check if it
actually does, this might have changed in Py3.3, for example).
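
A sketch of such a benchmark, assuming a throwaway test file that is
generated on the spot (the file and the function names are made up for the
example). Both variants are timed twice, the second time in reverse order,
so that a warm file cache cannot favour only one of them.

```python
import os
import tempfile
import time


def read_with_readlines(path):
    with open(path) as f:
        return len(f.readlines())


def read_by_iteration(path):
    with open(path) as f:
        return sum(1 for _ in f)


def timed(func, path):
    """Return (elapsed seconds, result) for a single call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return time.perf_counter() - start, result


# Create a sample file to read.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.writelines('line %d\n' % i for i in range(100000))

order = [read_with_readlines, read_by_iteration]
timings = []
for funcs in (order, order[::-1]):   # second pass runs in reverse order
    for func in funcs:
        elapsed, n_lines = timed(func, path)
        timings.append((func.__name__, elapsed, n_lines))
        print('%-20s %8.4f s  (%d lines)' % (func.__name__, elapsed, n_lines))

os.remove(path)
```

If the first run of either variant is much slower than its second run, the
difference is more likely the cache than the reading strategy.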

Stefan




Re: [Tutor] A mergesort

2013-08-31 Thread Stefan Behnel
D.V.N.Sarma డి.వి.ఎన్.శర్మ, 31.08.2013 18:30:
> I have been searching for mergesort implementations in python and came
> across this.

In case this isn't just for education and you actually want to use it, the
built-in sorting algorithm in Python (used by list.sort() and sorted()) is
a very fast mergesort variant (Timsort). Anything you could write in Python
code is bound to be slower.

Stefan




Re: [Tutor] list all links with certain extension in an html file python

2012-09-28 Thread Stefan Behnel
Santosh Kumar, 16.09.2012 09:20:
> I want to extract (no I don't want to download) all links that end in
> a certain extension.
> 
> Suppose there is a webpage, and in the head of that webpage there are
> 4 different CSS files linked to external server. Let the head look
> like this:
> 
> http://foo.bar/part1.css";>
> http://foo.bar/part2.css";>
> http://foo.bar/part3.css";>
> http://foo.bar/part4.css";>
> 
> Please note that I don't want to download those CSS, instead I want
> something like this (to stdout):
> 
> http://foo.bar/part1.css
> http://foo.bar/part1.css
> http://foo.bar/part1.css
> http://foo.bar/part1.css
> 
> Also I don't want to use external libraries.

That's too bad because lxml.html would make this really easy. See the
iterlinks() method here:

http://lxml.de/lxmlhtml.html#working-with-links

Note that this also handles links in embedded CSS code etc., although you
might not be interested in that, if the example above is representative of
your task.

Stefan




Re: [Tutor] measuring the start up time of an event-driven program

2012-07-24 Thread Stefan Behnel
Albert-Jan Roskam, 24.07.2012 11:18:
> I would like to test how long it takes for two versions of the same
> program to start up and be ready to receive commands. The program is
> SPSS version-very-old vs. SPSS version-latest.
> 
> Normally I'd just fire the program up in a subprocess and measure the
> time before and after finishing. But this kind of program is never
> finished. It's looping until infinity and waiting for events/commands. I
> tried wrapping it in a "while True" loop, and break out of the loop and
> terminate the program (using ctypes) if the retcode of the process is
> equal to zero. But that doesn't work.

Is it really looping or is it just sitting and waiting for input? You might
be able to provide some input that makes it terminate.
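
A sketch of that idea, with a small Python child process standing in for
the real program. The command line and the "quit" input are placeholders,
since what actually makes SPSS terminate is not known here.

```python
import subprocess
import sys
import time


def time_until_exit(cmd, terminating_input):
    """Start cmd, feed it input that makes it exit, and time the whole run."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd,
        input=terminating_input,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return time.perf_counter() - start, proc.returncode


# Stand-in for the real program: it starts up, waits for one line of
# input and then exits.
child = [sys.executable, '-c', 'import sys; sys.stdin.readline()']

elapsed, returncode = time_until_exit(child, 'quit\n')
print('started and terminated in %.3f s' % elapsed)
```

The measured time includes shutdown, so it is an upper bound on the
startup time rather than the startup time itself.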

Stefan



Re: [Tutor] The Best Way to go About with Self Modifying Code/Code Generation?

2012-07-08 Thread Stefan Behnel
Steven D'Aprano, 08.07.2012 15:48:
>> Hey all, I have a question on using self-modifying code/code generation
>> in Python; namely how to do it. 
> 
> I know others have already said not to do this, and to be honest I was
> going to say that same thing, but I have changed my mind. Buggrit, this is
> Python, where we are all consenting adults and you're allowed to shoot
> yourself in the foot if you want.

:)

You're right. Saying "don't do that" is easy, but giving people the freedom
to shoot themselves in the foot sometimes has accidental side effects that
end up pushing our world another bit forward. (Though usually not, but
that's ok)

Stefan



Re: [Tutor] The Best Way to go About with Self Modifying Code/Code Generation?

2012-07-07 Thread Stefan Behnel
Aaron Tp, 07.07.2012 23:19:
> I have a question on using self-modifying code/code generation
> in Python; namely how to do it.

Don't.

Seriously, the question you should ask yourself is: why do you think the
genetic algorithm (or whatever you are trying to do exactly) would come up
with any reasonable code constructs other than what you already
anticipated? And if you anticipated them, you can just write them down
statically in the code and only care about recombining them.

Make your code configurable instead, and then change the configuration. I
agree with Alan that a state machine could be a suitable approach (but it's
a bit hidden in his long answer). In any case, think in terms of modifying
data, not code.
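
To sketch what "modifying data, not code" could look like: the building
blocks are written down once as ordinary functions, and a "program" is
just a list of (operation name, operand) pairs that can be mutated or
recombined. All names below are made up for the illustration, and this is
not a complete genetic algorithm.

```python
import operator
import random

# The anticipated code constructs, written down statically.
OPERATIONS = {
    'add': operator.add,
    'sub': operator.sub,
    'mul': operator.mul,
}


def run(program, x):
    """Interpret a program, i.e. a list of (operation name, operand) pairs."""
    for op_name, operand in program:
        x = OPERATIONS[op_name](x, operand)
    return x


def mutate(program, rng):
    """Return a copy of the program with one randomly replaced step."""
    new_program = list(program)
    position = rng.randrange(len(new_program))
    new_program[position] = (rng.choice(sorted(OPERATIONS)), rng.randint(1, 9))
    return new_program


program = [('add', 2), ('mul', 3)]
print(run(program, 1))            # (1 + 2) * 3

variant = mutate(program, random.Random(0))
print(variant, run(variant, 1))
```

Only the list changes between generations. The interpreter and the
operations stay fixed, which keeps everything debuggable.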

Stefan



Re: [Tutor] Python XML for newbie

2012-07-02 Thread Stefan Behnel
Peter Otten, 02.07.2012 09:57:
> Sean Carolan wrote:
>>> Thank you, this is helpful.  Minidom is confusing, even the
>>> documentation confirms this:
>>> "The name of the functions are perhaps misleading"

Yes, I personally think that (Mini)DOM should be locked away from beginners
as far as possible.


>> Ok, so I read through these tutorials and am at least able to print
>> the XML output now.  I did this:
>>
>> doc = etree.parse('computer_books.xml')
>>
>> and then this:
>>
>> for elem in doc.iter():
>>     print elem.tag, elem.text
>>
>> Here's the data I'm interested in:
>>
>> index 1
>> field 11
>> value 9780596526740
>> datum
>>
>> How do you say, "If the field is 11, then print the next value"?  The
>> raw XML looks like this:
>>
>> <datum>
>> <index>1</index>
>> <field>11</field>
>> <value>9780470286975</value>
>> </datum>
>>
>> Basically I just want to pull all these ISBN numbers from the file.
> 
> With http://lxml.de/ you can use xpath:
> 
> $ cat computer_books.xml 
> 
> 
> 
> 1
> 11
> 9780470286975
> 
> 
> 
> $ cat read_isbn.py
> from lxml import etree
> 
> root = etree.parse("computer_books.xml")
> print root.xpath("//datum[field=11]/value/text()")
> $ python read_isbn.py 
> ['9780470286975']
> $ 

And lxml.objectify is also a nice tool for this:

  $ cat example.xml
  
   
108

 
  1
  2
  Essential System Administration
 

   
  

  $ python
  Python 2.7.3
  >>> from lxml import objectify
  >>> t = objectify.parse('example.xml')
  >>> for datum in t.iter('datum'):
  ...     if datum.field == 2:
  ...         print(datum.value)
  ...
  Essential System Administration
  >>>

It's not impossible that this is faster than the XPath version, but that
depends a lot on the data.
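
For comparison, the original question ("if the field is 11, then print the
next value") can also be answered with the standard library's ElementTree
alone. This is a minimal sketch. The datum, index, field and value tags
are the ones from this thread, while the enclosing data element is an
assumption about the rest of the file.

```python
import xml.etree.ElementTree as ET

# Small stand-in for computer_books.xml; the <data> root is hypothetical.
xml_text = """\
<data>
  <datum><index>1</index><field>11</field><value>9780470286975</value></datum>
  <datum><index>1</index><field>2</field><value>Essential System Administration</value></datum>
</data>
"""

root = ET.fromstring(xml_text)

# Keep the <value> text of every <datum> whose <field> is 11.
isbns = [
    datum.findtext('value')
    for datum in root.iter('datum')
    if datum.findtext('field') == '11'
]
print(isbns)
```

Note that plain ElementTree returns the text as strings, so the field is
compared to '11' here, whereas lxml.objectify converts it to a number.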

Stefan



Re: [Tutor] The dreaded UnicodeDecodeError... why, why, why does it still want ascii?

2012-06-06 Thread Stefan Behnel
Marc Tompkins, 06.06.2012 10:21:
> On Tue, Jun 5, 2012 at 11:22 PM, Stefan Behnel wrote:
> 
>> You can do this:
>>
>>connection = urllib2.urlopen(url)
>>tree = etree.parse(connection, my_html_parser)
>>
>> Alternatively, use fromstring() to parse from strings:
>>
>>page = urllib2.urlopen(url)
>>pagecontents = page.read()
>> html_root = etree.fromstring(pagecontents, my_html_parser)
>>
>>
> Thank you!  fromstring() did the trick for me.
> 
> Interestingly, your first suggestion - parsing straight from the connection
> without an intermediate read() - appears to create the tree successfully,
> but my first strip_tags() fails, with the error "ValueError: Input object
> has no document: lxml.etree._ElementTree".

Weird. You may want to check the parser error log to see if it has any hint.


>> See the lxml tutorial.
> 
> I did - I've been consulting it religiously - but I missed the fact that I
> was mixing strings with file-like IO, and (as you mentioned) the error
> message really wasn't helping me figure out my problem.

Yes, I think it could do better here. Reporting a parser error with an
"unprintable error message" would at least make it less likely that users
are being diverted from the actual cause of the problem.


>> Also note that there's lxml.html, which provides an
>> extended tool set for HTML processing.
> 
> I've been using lxml.etree because I'm used to the syntax, and because
> (perhaps mistakenly) I was under the impression that its parser was more
> resilient in the face of broken HTML - this page has unclosed tags all over
> the place.

Both are using the same parser and share most of their API. lxml.html is
mostly just an extension to lxml.etree with special HTML tools.

Stefan



Re: [Tutor] The dreaded UnicodeDecodeError... why, why, why does it still want ascii?

2012-06-05 Thread Stefan Behnel
Marc Tompkins, 06.06.2012 03:10:
> I'm trying to parse a webpage using lxml; every time I try, I'm
> rewarded with "UnicodeDecodeError: 'ascii' codec can't decode byte
> 0x?? in position?: ordinal not in range(128)"  (the byte value and
> the position occasionally change; the error never does.)
> 
> The page's encoding is UTF-8:
>  
> so I have tried:
> -  setting HTMLParser's encoding to 'utf-8'

That's the way to do it, although the parser should be able to figure it
out by itself, given the above content type declaration.


> Here's my current version, trying everything at once:
> 
> from __future__ import print_function
> import datetime
> import urllib2
> from lxml import etree
> url = 
> 'http://www.wpc-edi.com/reference/codelists/healthcare/claim-adjustment-reason-codes/'
> page = urllib2.urlopen(url)
> pagecontents = page.read()
> pagecontents = pagecontents.decode('utf-8')
> pagecontents = pagecontents.encode('ascii', 'ignore')
> tree = etree.parse(pagecontents,
> etree.HTMLParser(encoding='utf-8',recover=True))

parse() is meant to parse from files and file-like objects, so you are
telling it to parse from the "file path" in pagecontents, which obviously
does not exist. I admit that the error message is not helpful.

You can do this:

connection = urllib2.urlopen(url)
tree = etree.parse(connection, my_html_parser)

Alternatively, use fromstring() to parse from strings:

page = urllib2.urlopen(url)
pagecontents = page.read()
html_root = etree.fromstring(pagecontents, my_html_parser)

See the lxml tutorial. Also note that there's lxml.html, which provides an
extended tool set for HTML processing.
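
The same split between parse() and fromstring() exists in the standard
library's ElementTree, which makes for a self-contained sketch of the
difference. A small well-formed document stands in for the downloaded
page here, because the stdlib parser is an XML parser and has no HTML
recovery mode.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the bytes returned by page.read(); "caf\xc3\xa9" is UTF-8.
page_bytes = b'<html><body><p>caf\xc3\xa9</p></body></html>'

# parse() expects a file name or a file-like object, e.g. an open
# connection. A BytesIO object plays that role here.
tree = ET.parse(io.BytesIO(page_bytes))
text_from_file = tree.getroot().findtext('body/p')

# fromstring() expects the data itself and returns the root element.
root = ET.fromstring(page_bytes)
text_from_string = root.findtext('body/p')

print(text_from_file == text_from_string)
```

In both cases the undecoded bytes go to the parser, which does the
decoding itself. There is no need to decode or re-encode them first.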

Stefan



Re: [Tutor] breeds of Python .....

2012-04-01 Thread Stefan Behnel
Brett Ritter, 01.04.2012 07:19:
> On Sat, Mar 31, 2012 at 5:37 PM, Barry Drake wrote:
>> concentrate on Python3 or stay with Python2 and get into bad habits when it
>> comes to change eventually?  Apart from the print and input functions, I
>> haven't so far got a lot to re-learn.
> 
> My recommendation is to go with Python2 - most major projects haven't
> made the switch

This statement is a bit misleading because it implies that you actually
"have to make the switch" at some point. Many projects are quite happily
supporting both at the same time, be it in a single code base (e.g. helped
by the "six" module) or by using the 2to3 conversion tool.

Also, from what I see and hear, "most major projects" are at least on their
way to adapting their code base for Python 3 compatibility, and many, many
libraries and other small or large software packages are already available
for Python 3.

I don't see a major reason for a beginner to not go straight for Python 3,
and then learn the necessary Py2 quirks in addition when the need arises.

Stefan



Re: [Tutor] Python with HTML

2012-01-29 Thread Stefan Behnel
t4 techno, 28.01.2012 11:02:
> I want to make a web page which has to include some python script and html
> tags as well, am not getting how to do that .
> I searched some articles but cant understand them .
> is there anything like linking an external html file into python script ?
> 
> Can u please help for same
> waiting for your instructions or some links that can help me

As others have pointed out, your request is not very clear. I agree that
client side Python in a browser is not the best approach, but if your
question is about server side HTML generation from Python, you should
either look at one of the web frameworks (a couple of them were already
mentioned), or use a templating engine. There are plenty of them for
Python, here is a list:

http://wiki.python.org/moin/Templating

There are also plenty of web frameworks, BTW. They are better than plain
templating engines when your data becomes non-trivial and comes from a
database etc.

Depending on what you actually want to do (note that you only said *how*
you want to do it, not *what* you want to do), you should also look at
content management systems like Plone.

Hope that helps,

Stefan



Re: [Tutor] Why do you have to close files?

2012-01-26 Thread Stefan Behnel
Alan Gauld, 27.01.2012 02:16:
> with open(myfile) as aFile:
>  # use aFile

I should add that this is the shortest, safest (as in "hard to get wrong")
and most readable way to open and close a file. It's worth getting used to.
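
A short sketch of why it is hard to get wrong: the file is closed when the
block ends, even if an exception is raised inside it. The file name is a
temporary placeholder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(path, 'w') as a_file:
    a_file.write('hello\n')
# The block has ended, so the file is closed without an explicit close().
closed_after_block = a_file.closed

try:
    with open(path) as a_file:
        raise ValueError('something went wrong while using the file')
except ValueError:
    # The exception still propagates, but the file was closed on the way out.
    closed_after_error = a_file.closed

print(closed_after_block, closed_after_error)
```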

Stefan



Re: [Tutor] Text Proccessing/Command Line Redirection/XML Parsing etc in Python.

2011-11-28 Thread Stefan Behnel

Pritesh Ugrankar, 28.11.2011 07:56:

First of all, my apologies for writing this very long post.


Welcome to the list. :)



I have been through some related questions about this in Stack Overflow as
well as googled it and found that Perl and Python are the two languages
that offer most what I need. As a SAN Administrator, I have a very limited
time to learn a scripting language so I can concentrate on only one. Most
of my questions below may make you think that I prefer Perl, but its
nothing like...Just that I tried learning Perl before for doing stuff I
want to try, but am thinking now what advantages will I have if I try out
Python?


There are two anecdotes that people from both camps frequently report. With 
Perl, people write their script, and then, several months later, they come 
back, look at it, don't understand it anymore, and rewrite it. With Python, 
people write their script, forget about it over time, write it again when 
they need it, and when they happen to find the old one and compare it to 
the new one, they find that both look almost identical.


It's all in the syntax.



All my SAN Management Servers are Windows only.

Following is what I am specifically looking at:

1) Consider the following output:
symdev -sid 1234 list devs
0D62 Not Visible???:? 07C:D13 RAID-5N/A (DT) RW  187843
0D63 Not Visible???:? 08C:C11 RAID-5N/A (DT) RW  187843
0D64 Not Visible???:? 07C:C12 RAID-5N/A (DT) RW  62614
0D65 Not Visible???:? 08C:D14 RAID-5N/A (DT) RW  62614
0D66 Not Visible???:? 07C:D15 RAID-5N/A (DT) RW  31307
0D67 Not Visible???:? 08C:C13 RAID-5N/A (DT) RW  31307
0D68 Not Visible???:? 07C:C14 RAID-5N/A (DT) RW  31307

  Whats given above is only a small part of the output. There are many other
fields that appear but I have left those out for brevity.

The symdev commands generates a list of devices that can be used for SAN
Allocation.

What I want to do is, on the Windows Machines, do something like a grep or
awk so that the 10th field, which contains the size of the devices will be
filtered and I can generate an output like.

Devices of 187 GB = 3

Devices of 62 GB = 2

Devices of 31 GB = 3

Thing is, this output will differ on each storage box. Some may have 10
devices, some may have 100

I can use grep or awk for Windows, but looking at a bigger picture here.

what I want to do is do some kind of filtering of the command line output
so that it will count the type of devices and segregate them according to
their size.


That's really easy. You open the file (see the open() function) and it 
returns a file object. You can iterate over it with a for-loop, and it will 
return each line as a string. Use the split() method on the string object 
to split the string by whitespace. That returns a list of separate fields. 
Then, pick the fields you want. In code:


with open('thefile.txt') as f:
    for line in f:
        fields = line.split()
        print(fields[9])   # the 10th field, for example

If you are not reading the output from a file but from a process you 
started, take a look at the subprocess module in the standard library.


http://docs.python.org/library/subprocess.html

Also take a look at string formatting for output.

http://docs.python.org/tutorial/inputoutput.html

http://docs.python.org/library/stdtypes.html#string-formatting-operations
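
Putting these pieces together, here is a minimal sketch of the counting
itself. It rests on a few assumptions: the column spacing of the sample
lines is reconstructed, the size is the 10th whitespace-separated field,
and the size is given in MB, so that 187843 counts as "187 GB" as in the
wanted output. The function name is made up for the example.

```python
from collections import Counter


def count_devices_by_size(lines, size_field=9):
    """Count devices per size in GB, taking the size from one column."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= size_field or not fields[size_field].isdigit():
            continue  # skip headers, blank lines and anything unexpected
        size_gb = int(fields[size_field]) // 1000   # assumes sizes in MB
        counts[size_gb] += 1
    return counts


# Sample lines from the question, with the column spacing reconstructed.
sample_output = """\
0D62 Not Visible ???:? 07C:D13 RAID-5 N/A (DT) RW 187843
0D63 Not Visible ???:? 08C:C11 RAID-5 N/A (DT) RW 187843
0D64 Not Visible ???:? 07C:C12 RAID-5 N/A (DT) RW 62614
0D65 Not Visible ???:? 08C:D14 RAID-5 N/A (DT) RW 62614
0D66 Not Visible ???:? 07C:D15 RAID-5 N/A (DT) RW 31307
0D67 Not Visible ???:? 08C:C13 RAID-5 N/A (DT) RW 31307
0D68 Not Visible ???:? 07C:C14 RAID-5 N/A (DT) RW 31307
"""

counts = count_devices_by_size(sample_output.splitlines())
for size_gb in sorted(counts, reverse=True):
    print("Devices of %d GB = %d" % (size_gb, counts[size_gb]))
```

To run this against the live command instead of a saved file, the lines
could come from subprocess.check_output() called with the symdev command
line, followed by splitlines().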



Tried Perl, but I found that the syntax was a little difficult to remember.
This is again my own shortcoming as I am not a trained programmer. I only
got to work on the script after a gap of many weeks and by that time, I
forgot what the script was supposed to do, so I had to start from
scratch... Maybe commenting will help :)


Yep, that's Perl at its best.



Which language will generate Binary executable that is smaller in size and
faster?


You usually don't do that. Instead, you'd install Python on all machines 
where you need it and then just run your code there.


If you really want to go through the hassle to build a self-contained 
executable from each program you write, you will have to bundle the runtime 
for either language with it, so it won't be small.




4) I also want to try out playing with XML output... The storage commands I
use allow the output to be directed to an XML format... Is Python better
suited to this?


Absolutely. Python has ElementTree. You'll just love working with it.

http://docs.python.org/library/xml.etree.elementtree.html

A quick tutorial is here:

http://effbot.org/zone/element-index.htm



Few more questions pop up like, Which will give me more freedom and ease to
maintain ? Which scripting language is better from the employability point
of view?

I don't want to start with one language and six months or a year down the
line think "Heck, this was better in the other one"... because I really can
concentrate on only one language.


There are always certain types of probl

Re: [Tutor] Cython vs Python-C API

2011-11-15 Thread Stefan Behnel

Dario Lopez-Kästen, 15.11.2011 09:33:

On Tue, Nov 15, 2011 at 9:09 AM, Stefan Behnel wrote:


cubic spline interpolation


No, I didn't.

Stefan



Re: [Tutor] Cython vs Python-C API

2011-11-15 Thread Stefan Behnel

Jaidev Deshpande, 14.11.2011 21:30:

I need to perform cubic spline interpolation over a range of points, and I
have written the code for the same in C and in Python.

The interpolation is part of a bigger project. I want the front end for the
project to be Python. Ideally I want Python only to deal with data
visualization and i/o, and I'll leave the computationally expensive part of
the project to C extensions, which can be imported as functions into Python.

To this end, the interpolation can be handled in two ways:

1. I can either compile the C code into a module using the Python-C/C++
API, through which I can simply 'import' the required function.
2. I can use the Python code and extend it using Cython.


3. use the existing C code and wrap it with Cython, likely using NumPy to 
pass the data, I guess.


Why write yet another version of your code?

I would strongly suggest not to put C-API calls into your C code. It will 
just make it less versatile (e.g. no longer usable outside of CPython) and 
generally harder to maintain.


Stefan



Re: [Tutor] 6 random numbers

2011-10-16 Thread Stefan Behnel

ADRIAN KELLY, 16.10.2011 21:43:

anyone know how i would go about printing 6 random numbers


print(75, 45, 6, 35, 36472, 632)

Numbers were generated by typing randomly on my keyboard.

Sorry-for-taking-it-all-too-literally-ly,

Stefan



Re: [Tutor] how obsolete is 2.2?

2011-09-07 Thread Stefan Behnel

Hi,

please don't repost your question before waiting at least a little for an 
answer.


c smith, 08.09.2011 05:26:

I found a book at the local library that covers python but it's 2.2.


That's way old then. It won't teach you anything about the really 
interesting and helpful things in Python, such as generators, itertools or 
the "with" statement, extended APIs and stdlib modules, and loads of other 
goodies and enhanced features, such as metaclasses and interpreter 
configuration stuff.




I already have been using 2.7 for basic stuff and would like to know if it's
worth my time to read this book.


Likely not. Better read a recent tutorial or spend your time getting used 
to the official Python documentation.




Are there any glaring differences that would be easy to point out, or is it
too convoluted?


Tons of them, too many to even get started. You might want to take a look 
at the "what's new" pages in the Python documentation. That will give you a 
pretty good idea of major advances.




Also, am I correct in thinking that 3.0 will always be called 3.0


No. It's called Python 3 (sometimes historically named Python 3k or Python 
3000), with released versions being 3.0, 3.1.x and 3.2.x and the upcoming 
release being 3.3.




but will change over time and will always include experimental features


Well, it's the place where all current development happens, be it 
experimental or not.




while 2.x will gradually increase the 'x'


Nope. 'x' is fixed at 7, Py2.7 is the officially last release series of 
Python 2, although with an extended maintenance time frame of several years.




and the highest 'x' will indicate the most current, stable release?


That's right, both for the Py2.x and Py3.x releases.



oh, and a question on 'pickle':


Let's keep that in your other post, to let it serve a purpose.

Stefan



Re: [Tutor] gzip

2011-08-08 Thread Stefan Behnel

questions anon, 08.08.2011 01:57:

Thank you, I didn't realise that was all I needed.
Moving on to the next problem:
I would like to loop through a number of directories and decompress each
*.gz file and leave them in the same folder but the code I have written only
seems to focus on the last folder. Not sure where I have gone wrong.
Any feedback will be greatly appreciated.


import gzip
import os

MainFolder=r"D:/DSE_work/temp_samples/"

for (path, dirs, files) in os.walk(MainFolder):
    for dir in dirs:
        outputfolder=os.path.join(path,dir)
        print "the path and dirs are:", outputfolder
    for gzfiles in files:
        print gzfiles
        if gzfiles[-3:]=='.gz':
            print 'dealing with gzfiles:', dir, gzfiles
            f_in=os.path.join(outputfolder,gzfiles)
            print f_in
            compresseddata=gzip.GzipFile(f_in, "rb")
            newFile=compresseddata.read()
            f_out=open(f_in[:-3], "wb")
            f_out.write(newFile)


Note how "outputfolder" is set and reset in the first inner loop, *before* 
starting the second inner loop. Instead, build the output directory name 
once, without looping over the directories (which, as far as I understand 
your intention, you can ignore completely).


Also, see the shutil module. It has a function that efficiently copies data 
between open file(-like) objects. With that, you can avoid reading the 
whole file into memory.
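
A minimal sketch of both suggestions, assuming Python 3 (the function name gunzip_tree is made up; shutil.copyfileobj() is the copy function in question):

```python
import gzip
import os
import shutil


def gunzip_tree(main_folder):
    """Decompress every *.gz file below main_folder next to the original."""
    written = []
    for path, dirs, files in os.walk(main_folder):
        # 'path' already is the directory that contains 'files',
        # so there is no need to loop over 'dirs' at all.
        for name in files:
            if not name.endswith(".gz"):
                continue
            source = os.path.join(path, name)
            target = source[:-3]  # strip the ".gz" suffix
            with gzip.open(source, "rb") as f_in, open(target, "wb") as f_out:
                # Copies in chunks instead of reading the whole file at once.
                shutil.copyfileobj(f_in, f_out)
            written.append(target)
    return written
```

Calling gunzip_tree(r"D:/DSE_work/temp_samples/") would then walk the folder from the question once and write each decompressed file into the folder it was found in.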


Stefan



Re: [Tutor] Mainloop conflict

2011-07-30 Thread Stefan Behnel

Christopher King, 31.07.2011 04:30:

I think I'll go with threading. I've become more familiar with it.


That's ok. When used carefully, threads can be pretty helpful to gain 
concurrency in I/O tasks.


But just in case you ever feel like using them for anything else, this is 
worth a read:


http://ptolemy.eecs.berkeley.edu/publications/papers/06/problemwithThreads/
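
For the I/O case, one careful way to use a thread is to let it do nothing but read from the socket and hand the data over through a queue. This is only a sketch under that assumption (Python 3, function names made up); the GUI side would call drain() periodically, e.g. from a Tk after() callback, and never touch the socket itself.

```python
import queue
import socket
import threading


def start_reader(sock, inbox):
    """Read from sock in a background thread and put the data into inbox."""
    def read_loop():
        while True:
            data = sock.recv(4096)  # blocks, but only this thread
            if not data:            # empty result: the peer closed
                inbox.put(None)
                break
            inbox.put(data)

    thread = threading.Thread(target=read_loop, daemon=True)
    thread.start()
    return thread


def drain(inbox):
    """Return everything queued so far without blocking the caller."""
    items = []
    while True:
        try:
            items.append(inbox.get_nowait())
        except queue.Empty:
            return items


# Small demonstration with a connected socket pair instead of a real server.
left, right = socket.socketpair()
messages = queue.Queue()
reader = start_reader(left, messages)
right.sendall(b"ping")
right.close()
reader.join(5)
print(drain(messages))
left.close()
```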

Stefan



Re: [Tutor] Mainloop conflict

2011-07-29 Thread Stefan Behnel

Christopher King, 29.07.2011 17:08:

On Thursday, July 28, 2011, Dave Angel wrote:

On 07/28/2011 08:32 PM, Christopher King wrote:


Dear Tutor Dudes,
 I have a socket Gui program. The only problem is that socket.recv waits
for a response, which totally screws Tkinter I think. I tried making the
timeout extremely small (it was alright if I didn't receive anything, I
was expecting that a lot) but I think that screwed socket. Anyone have
words of wisdom.

Sincerely,
 Me


Sure:

Do the socket I/O on a separate thread.


I was afraid of that.


Understandable. However, as I already said, you don't need to do that, just 
go the obvious route.


Stefan



Re: [Tutor] Mainloop conflict

2011-07-28 Thread Stefan Behnel

Christopher King, 29.07.2011 02:32:

 I have a socket Gui program. The only problem is that socket.recv waits
for a response, which totally screws Tkinter I think. I tried making the
timeout extremely small (it was alright if I didn't receive anything, I
was expecting that a lot) but I think that screwed socket. Anyone have words
of wisdom.


Most of the GUI main loops (including tk, I believe) have a way to hook in 
additional file descriptors and sockets that can be listened on, so that 
you get a normal event when data becomes available in them.


Stefan



Re: [Tutor] Question regarding xml.dom.minidom: How do you send an unsignedByte in an wsdl request

2011-07-22 Thread Stefan Behnel

Garry Bettle, 22.07.2011 20:18:

I'm trying some calls to an wsdl API I've subscribed to.


You might find this interesting:

http://effbot.org/zone/element-soap.htm



But I'm struggling to know what they want when sending an unsignedByte in a
request.


That's just a number, plain old-fashioned decimal digits.





In ElementTree, that's just

  request_element = Element("request_tag_name_here", PriceFormat="123")
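
A slightly fuller sketch, assuming Python 3: the helper unsigned_byte() is made up, and the range 0..255 is what XML Schema defines for unsignedByte. The tag name stays a placeholder.

```python
from xml.etree.ElementTree import Element, tostring


def unsigned_byte(value):
    """Return value as the decimal text expected for an unsignedByte."""
    number = int(value)
    if not 0 <= number <= 255:
        raise ValueError("unsignedByte must be in the range 0..255")
    return str(number)


# "request_tag_name_here" is a placeholder, not a name from the real API.
request_element = Element("request_tag_name_here",
                          PriceFormat=unsigned_byte(123))
print(tostring(request_element))
```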

Stefan



Re: [Tutor] Question regarding xml.dom.minidom: How do you send an unsignedByte in an wsdl request

2011-07-22 Thread Stefan Behnel

Emile van Sebille, 22.07.2011 20:59:

You'll likely get more traction on this at
http://mail.python.org/mailman/listinfo/xml-sig


Unlikely.

Stefan



Re: [Tutor] Hello World in Python without space

2011-07-15 Thread Stefan Behnel

Richard D. Moores, 15.07.2011 23:21:

On Sun, Jul 10, 2011 at 05:05, Peter Otten wrote:


>>> help(print)

shows

print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep:  string inserted between values, default a space.
end:  string appended after the last value, default a newline.


I didn't know that printing to a file with print() was possible, so I tried

>>> print("Hello, world!", file="C:\test\test.txt")
Traceback (most recent call last):
   File "", line 1, in
builtins.AttributeError: 'str' object has no attribute 'write'
>>>

And the docs at
  tell me
"The file argument must be an object with a write(string) method; if
it is not present or None, sys.stdout will be used."

What do I do to test.txt to make it "an object with a write(string) method"?


Oh, there are countless ways to do that, e.g.

  class Writable(object):
      def __init__(self, something):
          print("Found a %s" % something)
      def write(self, s):
          print(s)

  print("Hello, world!", file=Writable("C:\\test\\test.txt"))

However, I'm fairly sure what you want is this:

  with open("C:\\test\\test.txt", "w") as file_object:
      print("Hello, world!", file=file_object)

Look up "open()" (open a file) and the "with statement" (used here 
basically as a safe way to make sure the file is closed after writing).


Also note that "\t" refers to a TAB character in Python, you used this 
twice in your file path string.
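
The following sketch shows the difference between the three spellings of the path and then writes through print() into a real file. It uses a temporary directory instead of C:\test (an assumption made so that it runs on any machine).

```python
import os
import tempfile

escaped = "C:\test\test.txt"     # "\t" is turned into a TAB character, twice
doubled = "C:\\test\\test.txt"   # backslashes escaped by doubling them
raw = r"C:\test\test.txt"        # raw string: backslashes are kept as typed

print(repr(escaped))
print(doubled == raw)

# print() needs an open file object for its file argument, not a file name.
target = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(target, "w") as file_object:
    print("Hello, world!", file=file_object)

with open(target) as file_object:
    content = file_object.read()
```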


Stefan



Re: [Tutor] Cython question

2011-07-02 Thread Stefan Behnel

Alan Gauld, 02.07.2011 15:28:

"Albert-Jan Roskam" wrote

I used cProfile to find the bottlenecks, the two Python functions
getValueChar and getValueNum. These two Python functions simply call two
equivalent C functions in a .dll (using ctypes).


The code is currently declared as Windows-only and I don't know any good 
C-level profiling code for that platform. Under Linux, once I'm sure I have 
a CPU bound problem below the Python level, I'd use valgrind and 
KCacheGrind to analyse the performance. That will include all C function 
calls (and even CPU instructions, if you want) in the call trace. Makes it 
a bit less obvious to see what Python is doing, but leads to much more 
detailed results at the C level.


It's also worth keeping in mind that all profiling attempts *always* 
interfere with the normal program execution. The results you get during a 
profiling run may not be what you'd get with profiling disabled. So, 
profiling is nice, but it doesn't replace proper benchmarking.
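
A benchmark without any profiler attached can be as simple as the following sketch, which uses the standard timeit module. The function convert() is only a hypothetical stand-in for the code under test.

```python
import timeit


def convert(values):
    """Stand-in for the code under test (hypothetical)."""
    return [float(v) for v in values]


data = [str(i) for i in range(1000)]

# repeat() runs the call several times with no profiler attached; the
# minimum is usually the most stable figure to compare between versions.
timings = timeit.repeat(lambda: convert(data), number=100, repeat=5)
best = min(timings) / 100
print("best time per call: %.6f s" % best)
```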




In that case Cython will speed up the calling loops but it can't do
anything to speed up the DLL calls, you have effectively already optimised
those functions by calling the DLL.


The problem is that these functions are called as many times as there are
VALUES in a file


It might be worth a try if you have very big data sets
because a C loop is faster than a Python loop. But don't expect order of
magnitude improvements.


Looking at the code now, it's actually worse than that. The C function call 
does not only go through ctypes, but is additionally wrapped in a method 
call. So the OP is paying the call overhead twice for each field, plus the 
method lookup and some other operations. These things can add up quite easily.


So, iff the conversion code is really a CPU bottleneck, and depending on 
how much work the C functions actually do, the current call overhead, 100 
times per record, may be a substantial part of the game. It's worth seeing 
if it can be dropped at the Python level by removing method lookup and call 
levels (i.e. by inlining the method), but if that's not enough, Cython may 
still be worth it. For one, Cython's call overhead is lower than that of 
ctypes, and if the call is only done once, and the loop is moved into 
Cython (i.e. C) entirely, the overhead will also drop substantially.


It might also be worth running the code in PyPy instead of CPython. PyPy 
will optimise a lot of the overhead away that this code contains.




So if I understand you correctly, this is not Cpu bound


I don't have enough information to comment on that.



It may still be CPU bound in that the CPU is doing all the work, but if
the CPU time is in the DLL functions rather than in the loop cython
won't help much.

CPU bound refers to the type of processing - is it lots of logic, math,
control flows etc? Or is it I/O bound - reading network, disk, or user
input? Or it might be memory bound - creating lots of in memory objects
(especially if that results in paging to disk, when it becomes I/O
bound  too!)

Knowing what is causing the bottleneck will determine how to improve
things. Use tools like TaskManager in Windows or top in *nix to see
where the time is going and what resources are being consumed. Fast code
is not always the answer.


That is very good advice. As a rule of thumb, a process monitor like top 
will tell you how much time is spent in I/O and CPU. If, during a test run 
(with profiling disabled, as that eats time, too!), your CPU usage stays 
close to 100%, your program is CPU bound. If, however, it stays lower, and 
the monitor reports a high I/O waiting time, it's I/O bound. In this case, 
I/O bound is what you want to achieve, because it means that your code is 
running faster than your hard drive can deliver the data.
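
If no process monitor is at hand, the same rule of thumb can be approximated from inside Python by comparing CPU time with wall-clock time. This is only a rough sketch; busy() and waiting() are made-up stand-ins for real work.

```python
import time


def cpu_share(func):
    """Run func() and return CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def busy():
    # Pure computation: keeps one CPU core occupied.
    total = 0
    for i in range(2000000):
        total += i * i
    return total


def waiting():
    # Stand-in for waiting on a disk or the network.
    time.sleep(0.3)


print("busy:    %.2f" % cpu_share(busy))
print("waiting: %.2f" % cpu_share(waiting))
```

A share close to 1 suggests a CPU bound run, a share close to 0 suggests that the time is spent waiting.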


Stefan



Re: [Tutor] Cython question

2011-07-02 Thread Stefan Behnel

Albert-Jan Roskam, 02.07.2011 11:49:

Some time ago I finished a sav reader for Spss .sav data files (also with the
help of some of you!):
http://code.activestate.com/recipes/577650-python-reader-for-spss-sav-files/

It works fine, but it is not fast with big files. I am thinking of implementing
two of the functions in cython (getValueChar and getValueNum).
As far as I understood it requires the functions to be re-written in a
Python-like language


"rewritten" only in the sense that you may want to apply optimisations or 
provide type hints. Cython is Python, but with language extensions that 
allow the compiler to apply static optimisations to your code.




, 'minus the memory manager'.


Erm, not sure what you mean here. Cython uses the same memory management as 
CPython.




That little piece of code is
converted to C and subsequently compiled to a .dll or .so file. The original
program listens and talks to that .dll file. A couple of questions:
-is this a correct representation of things?


More or less. Instead of "listens and talks", I'd rather say "uses". What 
you get is just another Python extension module which you can use like any 
other Python module.




-will the speed improvement be worthwhile? (pros)


Depends. If your code is I/O bound, then likely not. If the above two 
functions are true CPU bottlenecks that do some kind of calculation or data 
transformation, it's likely going to be faster in Cython.




-are there reasons not to try this? (cons)


If your performance problem is not CPU related, it may not be worth it.



-is it 'sane' to mix ctypes and cython for nonintensive and intensive
operations, respectively?


Why would you want to use ctypes if you can use Cython?

Stefan



Re: [Tutor] Python Extensions in C

2011-05-26 Thread Stefan Behnel

James Reynolds, 26.05.2011 21:34:

On Thu, May 26, 2011 at 3:07 PM, Stefan Behnel wrote:

Stefan Behnel, 26.05.2011 18:10:
  James Reynolds, 26.05.2011 17:22:



As an intellectual exercise, I wanted to try my hand at writing some
extensions in C.


This is fine for an exercise, and I hope you had fun doing this.

However, for real code, I suggest you use Cython instead. Your module
would have been substantially simpler and likely also faster.

http://cython.org


Oh, and one more thing: it makes it easier to write safe, portable and
versatile code. As others have pointed out, your code has unnecessary bugs.
It also doesn't compile in Python 3 and lacks the ability to calculate the
averages of a set or deque, for example. Instead, it only handles tuples and
lists. That reduces the usefulness of your implementation.


Thank you for point out the above. Could you kindly please point out some of
those unnecessary bugs?


Alan gave a good response here. For example, PyFloat_AsDouble() can fail, 
you need to handle that.


Your code also does not correctly sum up floating point numbers. See 
math.fsum() for that.
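
A small plain Python illustration of the difference (the explicit loop mirrors what a simple C loop over doubles does):

```python
import math

values = [0.1] * 10

# Adding one value after the other picks up a rounding error at each step.
naive = 0.0
for value in values:
    naive += value

# fsum() keeps track of the low-order bits that plain addition loses.
accurate = math.fsum(values)

print(repr(naive))
print(repr(accurate))
```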


Here's a possible Cython version of the variance() function:

  from math import fsum

  def variance(seq):
      "Calculate the variance of all FP numbers in a sequence."
      cdef double value, dsum
      cdef Py_ssize_t count = len(seq)

      dsum = fsum(seq)
      average = dsum / count

      return fsum([(average - value) ** 2 for value in seq]) / count

You can avoid the list comprehension (read: creation) in the second call by 
using a generator expression (remove the square brackets). However, this 
is likely to run slower (and requires Cython 0.15 ;), so we are trading 
memory for speed here.


If you want this implementation to run faster, you need to unfold and 
inline fsum's algorithm, which you can look up in the CPython sources.


For comparison, here is a less accurate version that does not use fsum() 
but your own algorithm:


  def variance(seq):
      cdef double value, dsum, sqsum
      cdef Py_ssize_t count = len(seq)

      dsum = 0.0
      for value in seq:
          dsum += value
      average = dsum / count

      sqsum = 0.0
      for value in seq:
          sqsum += (average - value) ** 2
      return sqsum / count

Note that both implementations cannot work with iterators as input as they 
iterate twice. If you want to support that, you can add


seq = list(seq)

at the start, which will let us trade memory for this feature. An 
additional type test condition for seq not being a list or tuple will give 
you more or less what PySequence_Fast() does.
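
In plain Python (not Cython), that combination could look like the sketch below. The empty-input check is my addition, not something discussed above.

```python
from collections import deque
from math import fsum


def variance(values):
    """Population variance of the numbers in any iterable."""
    if not isinstance(values, (list, tuple)):
        # Copy once so that the data can be iterated over twice.
        values = list(values)
    count = len(values)
    if count == 0:
        raise ValueError("variance() needs at least one value")
    average = fsum(values) / count
    return fsum((average - value) ** 2 for value in values) / count


print(variance([1.0, 2.0, 3.0, 4.0]))
print(variance(iter([1.0, 2.0, 3.0, 4.0])))
print(variance(deque([1.0, 2.0, 3.0, 4.0])))
```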




I'm not sure that it matters that it won't work in sets and deque's, so long
as the documentation is clear, no?


It will matter to the users of your code at some point.



(Which I'm still not sure how to do, just yet)


What do you mean? Doc strings? You can just add them to the module function 
struct:


http://docs.python.org/py3k/extending/extending.html#the-module-s-method-table-and-initialization-function



But, I did test it for sets and deque and it works just fine.

setx = set([])
print type(setx)
for i in r:
    setx.add(random.choice(r))
print stats.mean(setx)
dequex = deque([])
print type(dequex)
for i in r:
    dequex.append(random.choice(r))
print stats.mean(dequex)


Right. I keep forgetting that PySequence_Fast() actually has a fallback 
that copies the iterable into a tuple. So you are also trading memory here 
to make it work for all iterable types. Personally, I'd never use the 
PySequence_Fast*() API because it isn't really all that fast. It's just 
faster than normal Python iteration for tuples and lists and also avoids 
copying in that case. All other cases are worse than generic iteration, and 
it's not faster than Cython's looping code.




I'll see what I can do about making it work with P3k, I think the only thing
that would need to be changed would be "PyMODINIT_FUNC initstats(void)" I
believe. Please correct me if I'm wrong though.


You need to create a module struct and use a different name for the init 
function. Another thing that Cython code doesn't have to care about.


Stefan



Re: [Tutor] Python Extensions in C

2011-05-26 Thread Stefan Behnel

Stefan Behnel, 26.05.2011 18:10:

James Reynolds, 26.05.2011 17:22:

As an intellectual exercise, I wanted to try my hand at writing some
extensions in C.


This is fine for an exercise, and I hope you had fun doing this.

However, for real code, I suggest you use Cython instead. Your module would
have been substantially simpler and likely also faster.

http://cython.org


Oh, and one more thing: it makes it easier to write safe, portable and 
versatile code. As others have pointed out, your code has unnecessary bugs. 
It also doesn't compile in Python 3 and lacks the ability to calculate the 
averages of a set or deque, for example. Instead, it only handles tuples 
and lists. That reduces the usefulness of your implementation.


Stefan



Re: [Tutor] Python Extensions in C

2011-05-26 Thread Stefan Behnel

Rachel-Mikel ArceJaeger, 26.05.2011 17:46:

A couple small things that will help improve memory management

Rather than avg = sumall / count; return avg; Just return sumall/count
instead. Then you don't have to waste a register or assignment
operation.

Division is expensive. Avoid it when you can.

Here, for (a=0; a != count; a++) { temp =
PyFloat_AsDouble(PySequence_Fast_GET_ITEM(seq,a)); sumall += temp;
Again, save variables and operations. Write this as:

for (a=0; a != count; a++) { sumall +=
PyFloat_AsDouble(PySequence_Fast_GET_ITEM(seq,a));

Similar corrections in var()

It's cheaper when you're using powers of two to just right or
left-shift: >> or <<. Since you want to increase by a power of two, do:

(avg - PyFloat_AsDouble(PySequence_Fast_GET_ITEM(seq,a))) << 1; // This
means (...)^(2^1)

Division by powers of two is >>. Note that these only work for powers of
two.


Oh please! You are seriously insulting my C compiler here. Believe me, it's 
a *lot* smarter than this.


None of this is even faintly necessary.

Stefan



Re: [Tutor] Python Extensions in C

2011-05-26 Thread Stefan Behnel

James Reynolds, 26.05.2011 17:22:

As an intellectual exercise, I wanted to try my hand at writing some
extensions in C.


This is fine for an exercise, and I hope you had fun doing this.

However, for real code, I suggest you use Cython instead. Your module would 
have been substantially simpler and likely also faster.


http://cython.org

Stefan



Re: [Tutor] Parsing an XML document using ElementTree

2011-05-25 Thread Stefan Behnel

Sithembewena Lloyd Dube, 25.05.2011 14:40:

Thanks for all your suggestions. I read up on gzip and urllib and also
learned in the process that I could use urllib2 as its the latest form of
that library.

Herewith my solution: I don't know how elegant it is, but it works just
fine.

def get_contests():
  url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
  req = urllib2.Request(url)
  req.add_header('accept-encoding','gzip/deflate')
  opener = urllib2.build_opener()
  response = opener.open(req)

This is ok.



  compressed_data = response.read()
  compressed_stream = StringIO.StringIO(compressed_data)
  gzipper = gzip.GzipFile(fileobj=compressed_stream)
  data = gzipper.read()


This should be simplifiable to

   uncompressed_stream = gzip.GzipFile(fileobj=response)



  current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
  data_file = open(current_path, 'w')
  data_file.write(data)
  data_file.close()
  xml_data = ET.parse(open(current_path, 'r'))


And this subsequently becomes

   xml_data = ET.parse(uncompressed_stream)
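
Here is a self-contained sketch of these two steps, assuming Python 3. An in-memory buffer with made-up XML stands in for the HTTP response, since any file-like object that delivers gzip-compressed bytes will do.

```python
import gzip
import io
from xml.etree import ElementTree as ET

# Stand-in for the HTTP response (hypothetical content).
xml_bytes = b"<contests><contest><name>A v B</name></contest></contests>"
response = io.BytesIO(gzip.compress(xml_bytes))

# GzipFile decompresses on the fly while the parser reads from it,
# so no temporary file and no intermediate string are needed.
uncompressed_stream = gzip.GzipFile(fileobj=response)
xml_data = ET.parse(uncompressed_stream)

print(xml_data.getroot().tag)
```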



  contest_list = []
  for contest_parent_node in xml_data.getiterator('contest'):


Take a look at ET.iterparse().



    contest = Contest()
    for contest_child_node in contest_parent_node:
      if (contest_child_node.tag == "name" and
          contest_child_node.text is not None and contest_child_node.text != ""):
        contest.name = contest_child_node.text
      if (contest_child_node.tag == "league" and
          contest_child_node.text is not None and contest_child_node.text != ""):
        contest.league = contest_child_node.text
      if (contest_child_node.tag == "acro" and
          contest_child_node.text is not None and contest_child_node.text != ""):
        contest.acro = contest_child_node.text
      if (contest_child_node.tag == "time" and
          contest_child_node.text is not None and contest_child_node.text != ""):
        contest.time = contest_child_node.text
      if (contest_child_node.tag == "home" and
          contest_child_node.text is not None and contest_child_node.text != ""):
        contest.home = contest_child_node.text
      if (contest_child_node.tag == "away" and
          contest_child_node.text is not None and contest_child_node.text != ""):
        contest.away = contest_child_node.text


This is screaming for a simplification, such as

   for child in contest_parent_node:
       if child.tag in ('name', 'league', ...): # etc.
           if child.text:
               setattr(contest, child.tag, child.text)
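
Put together with ET.iterparse(), a runnable sketch could look like this (Python 3). The Contest class here is a hypothetical stand-in for yours, the sample XML is made up, and the field names are taken from your code.

```python
import io
from xml.etree import ElementTree as ET

FIELDS = ("name", "league", "acro", "time", "home", "away")


class Contest(object):
    """Hypothetical stand-in for the Contest class used above."""


def read_contests(source):
    contests = []
    # iterparse() hands over each element once its end tag has been read.
    for event, element in ET.iterparse(source, events=("end",)):
        if element.tag != "contest":
            continue
        contest = Contest()
        for child in element:
            # Empty or missing text is skipped, as in the original checks.
            if child.tag in FIELDS and child.text:
                setattr(contest, child.tag, child.text)
        contests.append(contest)
        element.clear()  # free the subtree that is no longer needed
    return contests


sample = io.BytesIO(
    b"<feed><contest><name>A v B</name><league>Cup</league><acro/>"
    b"</contest></feed>")
contests = read_contests(sample)
print(contests[0].name, contests[0].league)
```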


Stefan



Re: [Tutor] Parsing an XML document using ElementTree

2011-05-24 Thread Stefan Behnel

Sithembewena Lloyd Dube, 24.05.2011 11:59:

I am trying to parse an XML feed and display the text of each child node
without any success. My code in the python shell is as follows:

>>> import urllib
>>> from xml.etree import ElementTree as ET

>>> content = urllib.urlopen('http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po')
>>> xml_content = ET.parse(content)

I then check the xml_content object as follows:

>>> xml_content



Well, yes, it does return an XML document, but not what you expect:

  >>> urllib.urlopen('URL see above').read()
  "\r\n  you must add 'accept-encoding' as
  'gzip,deflate' to the header of your request\r
  \n"

Meaning, the server forces you to pass an HTTP header to the request in 
order to receive gzip compressed data. Once you have that, you must 
decompress it before passing it into ElementTree's parser. See the 
documentation on the gzip and urllib modules in the standard library.
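
As a rough sketch of the two steps in Python 3 (where the module is called urllib.request): the URL below is a placeholder and decode_body() is a made-up helper. The actual network call is left as a comment.

```python
import gzip
import urllib.request

# Hypothetical address; the feed URL from the question goes here.
url = "http://example.invalid/xmlfeed/feed"

# Step 1: ask for compressed data, as the server demands.
request = urllib.request.Request(
    url, headers={"Accept-Encoding": "gzip,deflate"})


def decode_body(raw, content_encoding):
    """Step 2: undo gzip compression if the server says it applied it."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw

# Not executed here, since it needs network access:
#   with urllib.request.urlopen(request) as response:
#       body = decode_body(response.read(),
#                          response.headers.get("Content-Encoding"))
```

The decompressed bytes can then be passed to ElementTree, e.g. through ET.fromstring(body).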


Stefan



Re: [Tutor] Problem with printing Python output to HTML Correctly

2011-05-10 Thread Stefan Behnel

Spyros Charonis, 10.05.2011 19:14:

On Tue, May 10, 2011 at 5:11 PM, Spyros Charonis wrote:

I know I posted the exact same topic a few hours ago and I do apologize for
this, but my script had a careless error, and my real issue is somewhat
different.


I would have preferred an update to the initial thread instead of a 
complete repost. A single thread makes it easier for others to read up on 
the answers when they find the thread through a web search later on. 
Duplicate information requires additional effort to find out what is 
relevant and what is not.




No need to post answers, I figured out where my mistake was.


Given that you received answers that helped you, it would only be fair to 
write a quick follow-up to let others know where the problem was and how 
you solved it, in case they encounter a similar problem one day.


Stefan



Re: [Tutor] Jokes on Python Language

2011-04-21 Thread Stefan Behnel

Ratna Banjara, 21.04.2011 13:49:

Does anybody knows jokes related to Python Language?
If the answer is yes, please do share it...


http://www.python.org/doc/humor/

Stefan



Re: [Tutor] how to optimize this code?

2011-03-27 Thread Stefan Behnel

Albert-Jan Roskam, 27.03.2011 21:57:

I made a program that reads spss data files. I ran cProfile to see if I can
optimize things (see #1 below).


First thing to note here: sort the output by "time", which refers to the 
"tottime" column. That will make it more obvious where most time is really 
spent.
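
With the pstats module, that sorting looks roughly like this. The functions program() and slow_part() are made-up stand-ins for the code being profiled.

```python
import cProfile
import io
import pstats


def slow_part():
    return sum(i * i for i in range(200000))


def program():
    # Hypothetical stand-in for the program being profiled.
    for _ in range(3):
        slow_part()


profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
# "tottime" is the time spent in a function itself, excluding its callees.
stats.sort_stats("tottime").print_stats(5)
print(report.getvalue())
```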




It seems that the function getValueNumeric is a pain spot (see #2
below). This function calls a C function in a dll for each numerical
cell value. On the basis of this limited amount of info, what could I do
to further optimize the code? I heard about psyco, but I didn't think
such tricks would be necessary as the function spssGetValueNumeric is
implemented in C already (which should be fast).


The problem is that you are using ctypes to call it. It's useful for simple 
things, but it's not usable for performance critical things, such as 
calling a C function ten million times in your example. Since you're saying 
"dll", is this under Windows? It's a bit more tricky to set up Cython on 
that platform than on pretty much all others, since you additionally need 
to install a C compiler, but if you want to go that route, it will reward 
you with a much faster way to call your C code, and will allow you to also 
speed up the code that does the calls.


That being said, see below.



## most time consuming function

  def getValueNumeric(fh, spssio, varHandle):
      numValue = ctypes.c_double()
      numValuePtr = ctypes.byref(numValue)
      retcode = spssio.spssGetValueNumeric(fh,
                                           ctypes.c_double(varHandle),
                                           numValuePtr)


You may still be able to make this code a tad faster, by avoiding the 
function name lookups on both the ctypes module and "spssio", and by using 
a constant pointer for numValue (if you're not using threads). That may not 
make enough of a difference, but it should at least be a little faster.
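
A sketch of what I mean. The class FakeSpssio is a made-up stand-in for the loaded DLL so that the example runs anywhere, and I use ctypes.pointer() here so that the stand-in can write through the pointer; with the real DLL, ctypes.byref() as in your code should serve the same purpose.

```python
import ctypes


class FakeSpssio(object):
    """Hypothetical stand-in for the loaded DLL."""

    def spssGetValueNumeric(self, fh, var_handle, value_ptr):
        value_ptr[0] = var_handle.value * 2.0  # pretend to read a cell
        return 0


def make_get_value_numeric(fh, spssio):
    # Look up everything once instead of on every call.
    get_value = spssio.spssGetValueNumeric
    c_double = ctypes.c_double
    num_value = c_double()
    # One pointer, reused for all calls. Not safe with multiple threads.
    num_value_ptr = ctypes.pointer(num_value)

    def get_value_numeric(var_handle):
        retcode = get_value(fh, c_double(var_handle), num_value_ptr)
        return retcode, num_value.value

    return get_value_numeric


get_value_numeric = make_get_value_numeric(None, FakeSpssio())
print(get_value_numeric(21.0))
```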


Stefan



Re: [Tutor] Efficiency of while versus (x)range

2011-03-17 Thread Stefan Behnel

Shane O'Connor, 17.03.2011 01:32:

In particular, I'm using Python 2.4.3 on a web server which needs to run as
fast as possible using as little memory as possible (no surprises there!).


Note that a web application involves many things outside of your own code 
that seriously impact the performance and/or resource requirements. 
Database access can be slow, excessively dynamic page generation and 
template engines can become a bottleneck, badly configured caching can eat 
your RAM and slow down your response times.


It seems very unlikely to me that the performance of loop iteration will 
make a substantial difference for you.




I'm aware that there are more significant optimizations than the above and I
will profile the code rather than prematurely optimize loops at the sake of
readability/errors but I'm still curious about the answer.


You shouldn't be. Optimising at that level is clearly the wrong place to 
start with.


That touches on a really nice feature of Python. It's a language that 
allows you to focus strongly on your desired functionality instead of 
thinking about questionable micro optimisations all over the place. 
Optimisation is something that you should start to apply when your test 
suite is large enough to catch the bugs you introduce by doing it.


Stefan



Re: [Tutor] BLAS Implementation on Python

2011-03-08 Thread Stefan Behnel

Alan Gauld, 08.03.2011 09:51:

"Knacktus" wrote


He doesn't have to write it, as it is very obvious, that no Python code
on earth (even written by Guido himself ;-)) stands a chance compared to
Fortran or C. Look at this:


There is one big proviso. The C or Fortran needs to be well written
in the first place. It's quite possible to write code in Python that
outperforms badly written C. And it is quite easy to write bad C!


It's seriously hard to write computational Python code that is faster than 
C code, though. It shifts a bit if you can take advantage of Python's 
dynamic container types, especially dicts, but that's rarely the case for 
mathematical computation code, which would usually deploy NumPy etc. in 
Python. Writing that in pure Python is bound to be incredibly slow, and 
likely still several hundred times slower than even a simple approach in C. 
There's a reason people use NumPy, C, Fortran and Cython for these things.


Remember that the OP's topic was a BLAS implementation. The BLAS libraries 
are incredibly well optimised for all sorts of platforms, including GPUs. 
They are building blocks that C programmers can basically just plug into 
their code to run hugely fast computations. There is no way Python code 
will ever be able to get anywhere close to that.


Stefan



Re: [Tutor] BLAS Implementation on Python

2011-03-07 Thread Stefan Behnel

Knacktus, 07.03.2011 14:28:

On 07.03.2011 01:50, Mahesh Narayanamurthi wrote:

Hello,

I am thinking of implementing a BLAS package in pure python. I am
wondering if this is a good idea.


My design goals are:


[2] Targeted to run on Python3

Good idea. NumPy and SciPy will be (are already?) ported.


Yes, NumPy has been ported as of version 1.5. Not sure about the overall 
status of SciPy, their FAQ isn't up to date.




[4] To serve as a reference design for
future High Performance Code in Python

The future of High Performance Python is probably PyPy.


PyPy has been "the future" ever since they started ;-). They have gotten 
faster, but it's still far from being suitable for any kind of numerical 
high performance computing. That will continue to be the domain of C, 
Fortran and Cython for a while, and potentially including LuaJIT2 for 
certain use cases (usable from Python through Lupa, BTW).


Stefan



Re: [Tutor] BLAS Implementation on Python

2011-03-07 Thread Stefan Behnel

Mahesh Narayanamurthi, 07.03.2011 01:50:

I am thinking of implementing a BLAS package in pure python. I am wondering
if this is a good idea. My design goals are:

[1] Efficient manipulation of Matrices and
 Vectors using pure python objects and
 python code.
[2] Targeted to run on Python3
[3] Extensive use of defensive programming
 style
[4] To serve as a reference design for
 future High Performance Code in Python
[5] To serve as a reference material in
 classroom courses on numerical computing
 or for hobbyist programmers


First question that comes to my mind: who would use this? The available 
packages are commonly not written in Python, but they are fast, which is 
why they are being used by Python programs. A quick web search brought up 
Tokyo, for example, which wraps BLAS in Cython:


https://github.com/tokyo/tokyo

I can see that it would be easier to teach the concepts based on code 
written in Python than based on the existing implementations. Even if there 
likely won't be a real use case, it may still be nice to have a pure Python 
implementation available, if it could serve as a drop-in replacement for 
faster implementations. I.e., you could present the inner workings based on 
the Python code, and then switch to a 'real' implementation for the actual 
computations.


However, even if you say "efficient" above (and the algorithms may well be 
efficient), don't expect it to be fast. Python is great for orchestrating 
high performance computations. It's less great for doing them in plain 
Python code.
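
To make the teaching use case a bit more concrete, here is a minimal sketch of
what a few pure Python reference routines might look like. The names follow
the usual BLAS naming (dot, axpy, gemv), but the list-based signatures are a
simplification made up for this example and not the API of any existing
package; real BLAS routines work in place on strided arrays.

```python
def dot(x, y):
    """Inner product of two equally long vectors (BLAS level 1 'dot')."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Return alpha*x + y as a new list (simplified BLAS level 1 'axpy')."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [alpha * a + b for a, b in zip(x, y)]


def gemv(matrix, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in matrix]


print(dot([1, 2, 3], [4, 5, 6]))
print(axpy(2, [1, 2], [10, 20]))
print(gemv([[1, 2], [3, 4]], [1, 1]))
```

Code like this is easy to read in a classroom, and as long as the function
signatures stay the same, the module could later be swapped for a wrapper
around a native implementation.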


Stefan



Re: [Tutor] c++ data types in python script

2011-03-06 Thread Stefan Behnel

Arthur Mc Coy, 06.03.2011 09:56:

I've used SWIG module to embed python inside c++ app.


Given that this deals with an advanced topic (C-level extensions), I find 
comp.lang.python (python-list), where you also posted this, a more 
appropriate place for discussion than the Python tutor mailing list. So I 
suggest people answer on c.l.py.


Stefan



Re: [Tutor] Accessing a DLL from python

2011-03-01 Thread Stefan Behnel

Hanlie Pretorius, 01.03.2011 13:33:

Can anyone perhaps suggest the easiest way of translating the C code
into Python, bearing in mind that I'm rather a beginner?


A beginner of what? Python? Programming in general?

The C code you posted doesn't look too complex, so you could try to 
translate it (mostly literally) into Python syntax and use Cython to wrap 
that in a binary extension.


Cython is basically Python, but it allows you to call directly into C code. 
Here's a tutorial:


http://docs.cython.org/src/tutorial/clibraries.html

Stefan



Re: [Tutor] Converting .pyd to .so

2011-02-28 Thread Stefan Behnel

j ram, 28.02.2011 18:49:

Wine is a good suggestion, but it takes up 3.53 MB. Is there a lighter
alternative?


So far, you didn't state whether the DLL actually uses Windows calls, but I
would imagine it does, and if so, you can't use it on anything but Windows
without emulating those calls, thus using Wine.


Sorry for not being more specific, the DLL actually uses Windows calls.


If it's available in source form (C? C++? What else?), you can extract the
part that's interesting to you and wrap that using Cython or ctypes (with
Cython being substantially faster than SWIG or ctypes).


The Source is C. I've heard of Cython, would Cython be a more portable
alternative?


It's likely not more portable than SWIG.



However, you didn't give us any hints about what the DLL actually does, so
we can't know if you really need to go that path or if you just failed to
find the obvious portable alternative.


The DLL wraps a device driver, and the library of the SWIG wrapped device
driver calls is invoked from a Python app. I was trying to find how this
device driver (DLL) could be used on Linux without having to re-write the
whole driver code for Linux.


Again, you're not giving us enough information - what kind of device are 
you talking about?


Generally speaking, I doubt that a Windows device driver is of any use on a 
non-Windows operating system. There are exceptions such as NDISWrapper


http://en.wikipedia.org/wiki/NDISwrapper

but they are just that: exceptions. It's hard to emulate enough of the 
Windows driver support layer to make use of a Windows driver.


Anyway, this is no longer a Python problem, so this is getting off-topic 
for this list.


Stefan



Re: [Tutor] Converting .pyd to .so

2011-02-27 Thread Stefan Behnel

fall colors, 28.02.2011 03:25:
> Stefan Behnel wrote:

Well, there's Wine, a free implementation of Windows for Unix systems. You
can either try to load the DLL using Wine and ctypes (I suspect that's the
hard way), or just run the Windows Python distribution through Wine and load
the wrapper .pyd into that.

I assume the DLL is only available in binary form?



Wine is a good suggestion, but it takes up 3.53 MB. Is there a lighter
alternative?


So far, you didn't state whether the DLL actually uses Windows calls, but I 
would imagine it does, and if so, you can't use it on anything but Windows 
without emulating those calls, thus using Wine.




The DLL is available in both source and binary form.


If it's available in source form (C? C++? What else?), you can extract the 
part that's interesting to you and wrap that using Cython or ctypes (with 
Cython being substantially faster than SWIG or ctypes).


However, you didn't give us any hints about what the DLL actually does, so 
we can't know if you really need to go that path or if you just failed to 
find the obvious portable alternative.


Stefan



Re: [Tutor] Converting .pyd to .so

2011-02-27 Thread Stefan Behnel

fall colors, 27.02.2011 20:27:

I was wondering if it would be possible to convert a .pyd file that works on
Windows into a .so file that works on Linux?

I gather that it might not be possible to convert the .pyd file if the
underlying DLL file was built with Windows API calls (Swig was used to wrap
up the DLL into a pyd file). Is there a wrapper or something that can
interface to the .pyd to make it Linux compatible?


Well, there's Wine, a free implementation of Windows for Unix systems. You 
can either try to load the DLL using Wine and ctypes (I suspect that's the 
hard way), or just run the Windows Python distribution through Wine and 
load the wrapper .pyd into that.


I assume the DLL is only available in binary form?

Stefan



Re: [Tutor] Creating xml documents

2011-02-16 Thread Stefan Behnel

allan oware, 16.02.2011 09:20:

Alright,
Am stuck with xml.dom.minidom


Interesting. Why would that be the case?



, but is there correct usage for *Node.prettyxml()
*resulting in less junky newlines and spaces ?

i.e *from*



 28.4258363792


 -6.13557141177


 46.1374243603


 6.41013674147

   

*to*
*
*

 28.477417
 -4.936117
 47.341236
 5.267412



See my reply to your original question: use ElementTree.

Stefan



Re: [Tutor] Creating xml documents

2011-02-15 Thread Stefan Behnel

Karim, 15.02.2011 17:24:

On 02/15/2011 02:48 PM, Stefan Behnel wrote:

allan oware, 15.02.2011 14:31:

Which python modules can one use to create nicely formatted xml documents ?


Depends on your exact needs, but xml.etree.ElementTree is usually a good
thing to use anyway. If you care about formatting (a.k.a. pretty
printing), look here:

http://effbot.org/zone/element-lib.htm#prettyprint


It seems stefan that version 1.3 still does not validate xml against xsd...


I have no idea what this has to do with either the OP's thread (i.e. this 
thread) or with me.


It's true, though, that ElementTree does not support validation against W3C 
XML Schema. If you need that, you can use lxml, but I do not see the OP 
needing this.


Stefan



Re: [Tutor] Creating xml documents

2011-02-15 Thread Stefan Behnel

allan oware, 15.02.2011 14:31:

Which python modules can one use to create nicely formatted xml documents ?


Depends on your exact needs, but xml.etree.ElementTree is usually a good 
thing to use anyway. If you care about formatting (a.k.a. pretty printing), 
look here:


http://effbot.org/zone/element-lib.htm#prettyprint
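
In case the link goes away: the idea is to fill in the whitespace-only text
and tail fields of each element before serialising. A minimal sketch along
those lines follows. The element names (bbox, west, ...) are made up for
illustration, and Python 3.9 and later also ship an indent() function in
xml.etree.ElementTree that does the same job.

```python
import xml.etree.ElementTree as ET


def indent(elem, level=0, step="  "):
    """Insert whitespace in place so that the serialised tree is indented."""
    pad = "\n" + level * step
    children = list(elem)
    if not children:
        return
    # Whitespace before the first child.
    if not (elem.text or "").strip():
        elem.text = pad + step
    for child in children:
        indent(child, level + 1, step)
        # Whitespace between siblings.
        if not (child.tail or "").strip():
            child.tail = pad + step
    # The last child is followed by the parent's closing tag, one level back.
    if not children[-1].tail.strip():
        children[-1].tail = pad


root = ET.Element("bbox")
for tag, value in [("west", "28.477417"), ("south", "-4.936117"),
                   ("east", "47.341236"), ("north", "5.267412")]:
    ET.SubElement(root, tag).text = value

indent(root)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```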

Stefan



Re: [Tutor] JES Jython

2011-02-07 Thread Stefan Behnel

Eun Koo, 07.02.2011 20:45:

Hi I have a problem in JES getting a solution to a function. Is there a way you 
guys can help?


If you can provide enough information for us to understand what your 
problem is, we may be able to help you. It's a matter of politeness to help 
us help you. The more work we have to put into figuring out what you want 
(or even what your words are supposed to mean), the less likely it is that 
someone will do that and help you out.


Stefan



Re: [Tutor] small ElementTree problem

2011-01-28 Thread Stefan Behnel

Alex Hall, 28.01.2011 14:25:

On 1/28/11, Stefan Behnel wrote:

Alex Hall, 28.01.2011 14:09:

On 1/28/11, Stefan Behnel wrote:

Alex Hall, 27.01.2011 23:23:

self.id = root.find("id").text
self.name = root.find("name").text


There's a findtext() method on Elements for this purpose.


I thought that was used to search for the text of an element? I want
to get the text, whatever it may be, not search for it. Or am I
misunderstanding the function?


What do you think 'find()' does? Use the Source, Luke. ;)

Here is what I am thinking:
element.find("tagname"): returns an element with the tag name, the
first element with that name to be found. You can then use the usual
properties and methods on this element.
element.findtext("text"): returns the first element found that has a
value of "text". Take this example:

some text

Now you get the root, then call:
root.find("a") #returns the "a" element
root.findtext("some text") #also returns the "a" element


Ah, ok, then you should read the documentation:

http://docs.python.org/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findtext

findtext() does what find() does, except that it returns the text value of 
the Element instead of the Element itself.


It basically spells out to "find text of element matching(path)".
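
A small sketch of the difference, using a made-up document (the tag names
and values are only for illustration):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<result><id>42</id><name>example</name></result>")

# find() returns the Element, so you have to ask for .text yourself.
name_element = root.find("name")
name_via_find = name_element.text

# findtext() does the lookup and returns the text in one step.
name_via_findtext = root.findtext("name")

# If nothing matches, find() returns None (so .text would fail), while
# findtext() returns None or the default you pass in.
missing = root.findtext("nickname")
missing_with_default = root.findtext("nickname", default="")

print(name_via_find, name_via_findtext, missing, repr(missing_with_default))
```

Note that the argument is a tag name or path in both cases, never the text
you expect to find.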

Stefan



Re: [Tutor] small ElementTree problem

2011-01-28 Thread Stefan Behnel

Alex Hall, 28.01.2011 14:09:

On 1/28/11, Stefan Behnel wrote:

Alex Hall, 27.01.2011 23:23:

self.id = root.find("id").text
self.name = root.find("name").text


There's a findtext() method on Elements for this purpose.


I thought that was used to search for the text of an element? I want
to get the text, whatever it may be, not search for it. Or am I
misunderstanding the function?


What do you think 'find()' does? Use the Source, Luke. ;)

Stefan



Re: [Tutor] small ElementTree problem

2011-01-27 Thread Stefan Behnel

Hi,

since you said that you have it working already, here are just a few 
comments on your code.


Alex Hall, 27.01.2011 23:23:

all=root.findall("list/result")
for i in all:
  self.mylist.append(Obj().parse(i))


It's uncommon to use "i" for anything but integer loop variables. And 'all' 
is not a very telling name either. I'd use something like "all_results" and 
"result_element" instead.




In Obj.parse(), the element passed in is treated like this:

def parse(data):
  root=data.getroot()


I don't understand this. You already have an Element here according to your 
code above. Why do you try to call getroot() on it? Again, "data" is not 
very telling. Giving it a better name will help you here.




  self.id = root.find("id").text
  self.name = root.find("name").text


There's a findtext() method on Elements for this purpose.



Printing the results of the above through Obj.id or Obj.name gives me
odd behavior:
print Obj.id :>None


"Obj.id" is a class attribute. "self.id" is an instance attribute. 
Different things.
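
A minimal sketch of the difference. The class below is a guess at what Obj
might look like, not your actual code:

```python
import xml.etree.ElementTree as ET


class Obj:
    id = None  # class attribute, shared by all instances as a fallback

    def parse(self, result_element):
        # Assigning through self creates an instance attribute that
        # shadows the class attribute for this one object only.
        self.id = result_element.findtext("id")
        return self


element = ET.fromstring("<result><id>42</id></result>")
parsed = Obj().parse(element)

print(Obj.id)      # the class attribute is untouched
print(parsed.id)   # the instance carries the parsed value
```

So printing Obj.id will keep showing None, no matter how many instances
have been parsed.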




Does the root change when I call find()?


No.



Why would
an Element object get used when I clearly say to use the text of the
found element?


I don't think the code snippets you showed us above are enough to answer this.

Stefan



Re: [Tutor] get xml for parsing?

2011-01-26 Thread Stefan Behnel

Alex Hall, 27.01.2011 05:01:

How would I go about getting the xml from a website through the site's
api? The url does not end in .xml since the xml is generated based on
the parameters in the url. For example:
https://api.website.com/user/me/count/10?api_key=MY_KEY
would return ten results (the count parameter) as xml. How do I
actually get this xml into my program? TIA.


The filename extension doesn't matter. If you know it's XML that you get 
back, you can use ElementTree (in the xml.etree package) or lxml (external 
package) to read it. Use the urllib2 package in the standard library to 
request the XML page, then pass the result into the XML parser (parse()).
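
Roughly like this. It is a sketch written for Python 3, where the urllib2
functionality lives in urllib.request; the function names are made up, and
the URL is whatever your API gives you.

```python
import io
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def parse_xml_stream(stream):
    """Parse XML from any file-like object and return the root Element."""
    return ET.parse(stream).getroot()


def fetch_xml(url):
    """Request the URL and feed the response straight into the parser."""
    with urlopen(url) as response:
        return parse_xml_stream(response)


# The parser does not care where the bytes come from, so the same code
# can be tried out without a network connection:
sample = io.BytesIO(b"<results><result/><result/></results>")
root = parse_xml_stream(sample)
print(root.tag, len(root.findall("result")))
```

The response object returned by urlopen() is file-like, which is all that
parse() needs.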


Stefan



Re: [Tutor] module to parse XMLish text?

2011-01-14 Thread Stefan Behnel

Wayne Werner, 15.01.2011 03:25:

On Fri, Jan 14, 2011 at 4:42 PM, Terry Carroll wrote:

On Fri, 14 Jan 2011, Karim wrote:

  from xml.etree.ElementTree import ElementTree

I don't think straight XML parsing will work on this, as it's not valid
XML; it just looks XML-like enough to cause confusion.


It's worth trying out - most (good) parsers "do the right thing" even when
they don't have strictly valid code. I don't know if xml.etree is one, but
I'm fairly sure both lxml and BeautifulSoup would probably parse it
correctly.


They wouldn't. For the first tags, the text values would either not come 
out at all or they would be read as attributes and thus lose their order 
and potentially their whitespace as well. The other tags would likely get 
parsed properly, but the parser may end up nesting them as it hasn't found 
a closing tag for the previous tags yet.


So, in any case, you'd end up with data loss and/or a structure that would 
be much harder to handle than the (relatively) simple file structure.


Stefan



Re: [Tutor] Refcount in C extensions

2011-01-14 Thread Stefan Behnel

Izz ad-Din Ruhulessin, 14.01.2011 19:49:

Thanks for your quick reply and clearing the issue up for me. Using your
answer, I rewrote the function to this:

double Py_GetAttr_DoubleFromFloat(PyObject *obj, const char *attr)


{

PyObject *get_attr, *py_float;

int has_attr;



//Check if the given object has the given attribute.


has_attr = PyObject_HasAttrString(obj, attr);

if (has_attr == False) {

return -.0;

}



//Get our attribute and convert it to a double.


get_attr = PyObject_GetAttrString(obj, attr);


Note that HasAttr() calls GetAttr() internally, so it's actually faster to 
call GetAttr() and check for an exception (and clear it). That's basically 
how HasAttr() works, except that it doesn't tell you the result if the 
attribute existed.




py_float = PyNumber_Float(get_attr);

if (py_float == NULL) {

Py_DECREF(get_attr);

Py_XDECREF(py_float);


You already know that py_float is NULL, so Py_XDECREF() is a no-op.



return -.0;

}

double output = PyFloat_AsDouble(py_float);



//Garbage collect


Py_DECREF(get_attr);

Py_XDECREF(py_float);


py_float cannot be NULL at this point, so the Py_XDECREF() will compile 
into Py_DECREF().




(False is 0)


In that case, better write 0 instead.



Regarding your Cython suggestion, as a matter of coincidence I have been
reading about it in the past few days. I'm in doubt of using it however,
because I have a lot of native C code that would require rewriting if I
switched to Cython.


No need for that, Cython can call external C code natively. So you can make 
the switch step by step.




On the other hand, your example shows that such a
one-time rewrite will pay-off big time in future development speed.


It usually does, yes. It often even pays off immediately because a rewrite 
tends to be pretty straightforward (basically, read and understand the C 
code, rip out all low-level stuff and replace the rest with simpler code), 
and while doing so, some bugs tend to disappear, the code becomes simpler, 
safer and often also faster, and new features appear while you're at it.


Stefan



Re: [Tutor] Refcount in C extensions

2011-01-14 Thread Stefan Behnel

Izz ad-Din Ruhulessin, 14.01.2011 17:52:

I am writing a Python C extension and I have some trouble understanding how
reference counting works exactly. Though I think I understand the practice
on simple operations (see my question at stackoverflow:
http://stackoverflow.com/questions/4657764/py-incref-decref-when), but on
more "complex" operations I cannot quite grasp it yet.

For example, in the following helper function, do I need to DECREF or are
all the references automatically destroyed when the function returns?

double Py_GetAttr_DoubleFromFloat(PyObject *obj, const char *attr)
{
if ((PyObject_GetAttrString(obj, attr) == False) ||
(PyObject_HasAttrString(obj, attr) == False)) {
return -.0;
}


This is C, nothing is done automatically. So you need to take care to 
properly DECREF the references. One or two references are leaked in the above.


BTW, what's "False"? Did you mean "Py_False"?



return PyFloat_AsDouble(PyNumber_Float(PyObject_GetAttrString(obj, attr)));


This leaks two references.



Please share your thoughts, thanks in advance, kind regards,


Consider taking a look at Cython. It's an extension language that lets you 
write Python code and generates C code for you. In Cython, your code above 
simply spells


cdef double Py_GetAttr_DoubleFromFloat(obj, attr):
    value = getattr(obj, attr, False)
    if value is False:
        return -.0
    return value

Note that I'm using a Python object for 'attr' for performance reasons (and 
for Python 3 portability).


I would expect that the above is at least a bit faster than your code, and, 
more importantly, it handles ref-counting correctly.


Having said that, I'd frown a bit about an API that returns either False or 
a double value ...
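
One possible alternative, sketched in plain Python: return None (or a
caller supplied default) when the attribute is missing, so that the result
is either a float or clearly "nothing". The function name and the default
parameter are invented for this example.

```python
def get_float_attr(obj, name, default=None):
    """Return obj.<name> converted to float, or default if it is missing."""
    try:
        value = getattr(obj, name)
    except AttributeError:
        return default
    # A present but non-numeric attribute still raises, which is usually
    # what you want: it is a bug in the data, not a missing value.
    return float(value)


class Point:
    def __init__(self, x):
        self.x = x


point = Point(3)
print(get_float_attr(point, "x"))
print(get_float_attr(point, "y"))
```

That way a legitimate value of 0.0 can no longer be confused with the
"not found" marker.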


Stefan



Re: [Tutor] module to parse XMLish text?

2011-01-14 Thread Stefan Behnel

Terry Carroll, 14.01.2011 03:55:

Does anyone know of a module that can parse out text with XML-like tags as
in the example below? I emphasize the "-like" in "XML-like". I don't think
I can parse this as XML (can I?).

Sample text between the dashed lines::

-
Blah, blah, blah




SOMETHING ELSE
SOMETHING DIFFERENT

-


You can't parse this as XML because it's not XML. The three initial child 
tags are not properly closed.


If the format is really as you describe, i.e. one line per tag, regular 
expressions will work nicely. Something like (untested)


  import re
  parse_tag_and_text = re.compile(
# accept a tag name and then either space+tag or '>'+text+' ]+)(?: ([^>]+)>\s*|>([^<]+)http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Equality of numbers and Strings

2011-01-10 Thread Stefan Behnel

Karim, 10.01.2011 17:07:

I am not a beginner in Python language but I discovered a hidden property
of immutable elements as Numbers and Strings.

 >>> s = 'xyz'
 >>> t = str('xyz')

 >>> id(s) == id(t)
True

Thus if I create 2 different instances of string if the string is
identical (numerically). I get the same object in py db. It could be
evident but if I do the same (same elements) with a list it will not
give the same result. Is-it because of immutable property of strings and
numbers?


AFAIR, all string literals in a module are interned by the CPython 
compiler, and short strings that look like identifiers are also interned 
(to speed up dictionary lookups, e.g. for function names). So you will get 
identical objects in these cases, although it's not a good idea to rely on 
this as it's an implementation detail of the runtime.


And the second thing that you can observe here is that str() never copies a 
string you pass in, which is reasonable behaviour for immutable objects.
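
A short sketch that separates what is guaranteed from what is not. Only
the equality checks, the sys.intern() result and the list copy are
guaranteed by the language; whether str() hands back the very same object
is a CPython detail.

```python
import sys

s = "xyz"
t = str(s)                     # CPython returns the same object here
u = "".join(["x", "y", "z"])   # built at runtime: equal, maybe not identical

print(s == t == u)             # equality always holds
print(t is s)                  # implementation detail, do not rely on it

# Explicit interning maps equal strings to one shared object.
same_after_intern = sys.intern(u) is sys.intern(s)
print(same_after_intern)

# Mutable objects are different: a copy really is a new object.
items = [1, 2, 3]
copied = list(items)
print(copied == items, copied is items)
```

If identity matters to your program, compare with == instead and leave the
sharing to the runtime.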




Thus if I create 2 different instances of string if the string is
identical (numerically).


There's no such thing as "numerically identical" strings. It's enough to 
say that they are identical as opposed to equal.


Stefan



Re: [Tutor] Equality of numbers and Strings

2011-01-10 Thread Stefan Behnel

Emile van Sebille, 10.01.2011 18:42:

On 1/10/2011 9:23 AM bob gailer said...

On 1/10/2011 11:51 AM, Emile van Sebille wrote:


well, not predictably unless you understand the specifics of the
implementation you're running under.


>>> from string import letters
>>> longstring = letters*100
>>> otherstring = letters*100
>>> id(longstring)
12491608
>>> id (otherstring)
12100288
>>> shortstring = letters[:]
>>> id(letters)
11573952
>>> id(shortstring)
11573952
>>>


In my experiment I found that using * to replicate gave different
results than using the exact literal. That is why in the program I
posted I used the equivalent of eval("'" + letters*n + "'") which gives
different results than eval("letters*n")!


Hence, not predictably.

I also found it particularly interesting that an explicit copy didn't:
shortstring = letters[:]


There's no need to copy an immutable object.

Stefan



Re: [Tutor] scraping and saving in file SOLVED

2010-12-29 Thread Stefan Behnel

Peter Otten, 29.12.2010 13:45:

   File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 430, in encode
 return self.decode().encode(encoding)


Wow, that's evil.

Stefan



Re: [Tutor] Python 2.7.1 interpreter passing function pointer as function argument and Shedskin 0.7

2010-12-28 Thread Stefan Behnel

Frank Chang, 28.12.2010 22:35:

Good afternoon. I want to thank everyone who helped me fix the global
name 'levinshtein_automata' is not defined error.
When I run the Shedskin 0.7 Python to C+++ compiler on the
same python program, I receive the error message * Error *
automata_test.py:148 : unbound identifier 'lookup_func'.


ShedSkin is very restrictive compared to CPython. It requires static types 
for variables and it doesn't support all Python features.


Is there a reason you want to use ShedSkin for your program?

Stefan



Re: [Tutor] Choice of Python

2010-12-28 Thread Stefan Behnel

Abdulhakim Haliru, 28.12.2010 13:38:

I come from a Cakephp, zend framework angle cutting through ASP.net,VB and
C# at an intermediate level.
[...]
Is python really worth the pain or should I just skip it ?


Given that you already invested your time into learning all of the above 
(which basically cover about 1 1/2 of several main corners of programming), 
I think you should really take some time off to unlearn some of the bad 
habits that these particular languages tend to teach you. Python is a truly 
good way to do that.


My advice: don't spend too much time reading books. Pick a task that sounds 
like fun to implement and give it a try with Python. Some would propose 
exercises from project Euler for this or maybe pygame, but you'll likely 
have your own idea about what's fun and what isn't.


Stefan



Re: [Tutor] Python C API - Defining New Classes with Multiple Inheritance

2010-12-25 Thread Stefan Behnel

Logan McGrath, 24.12.2010 03:14:

Hi, I've just started using the Python C API for version 2.7.1, and I've
got a question!

How do you define a new type which inherits from multiple types?


You can do this for Python classes, but not for C implemented types (which 
are single inheritance by design). Note that you can create both from C 
code, though, so you can just use a Python class if you need multiple 
inheritance and C types for everything else (e.g. for the base classes).
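
Seen from the Python side, the split looks roughly like this. The class
names are invented for the example; dict and list stand in for any two
C implemented types with their own object layout.

```python
class Describable:
    def describe(self):
        return "%s with %d entries" % (type(self).__name__, len(self))


class Countable:
    def is_empty(self):
        return len(self) == 0


# Any number of Python classes plus one C implemented base works fine.
class Registry(Describable, Countable, dict):
    pass


registry = Registry(a=1)
print(registry.describe(), registry.is_empty())

# Two C types with incompatible layouts cannot be combined (CPython).
try:
    class Impossible(dict, list):
        pass
except TypeError as exc:
    layout_error = str(exc)
else:
    layout_error = None
print(layout_error)
```

So the usual pattern is to keep the performance critical base types in C
and do the mixing in a thin Python class on top.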




I've been
browsing the source code for Python 2.7.1 but I'm having a tough time
finding examples. I see that MySQLdb defines low-level classes in the
module "_mysql" using the C API, then extends from them using Python, but I
want to keep as much of this in C as I can.

Any help would be much appreciated!


The best advice I can give is to use Cython. It's basically a Python 
compiler that generates fast C code for extension modules. So you can just 
write your classes in Python and let Cython compile them for you. That way, 
you get about the same speed as with hand written C code (often faster, 
sometimes slower), but with substantially less coding.


Stefan



Re: [Tutor] xml.etree.ElementTree.parse() against a XMLShema file

2010-12-22 Thread Stefan Behnel

Karim, 22.12.2010 22:09:

Using lxml (except for the different import) will be fully compliant with
the ET code.
Do I have to adapt it?


There are certain differences.

http://codespeak.net/lxml/compatibility.html

This page hasn't been changed for ages, but it should still be mostly accurate.



I saw your fantastic benchmarks! Why the hell lxml is not integrated into
the stdlib.
I thought they put in it things which works at best for python interest ?


I proposed it but it was rejected with the argument that it's a huge 
dependency and brings in two large C libraries that will be hard to control 
for future long-term maintenance. I think that's a reasonable objection.


Stefan



Re: [Tutor] xml.etree.ElementTree.parse() against a XMLShema file

2010-12-22 Thread Stefan Behnel

Karim, 22.12.2010 19:28:

On 12/22/2010 07:07 PM, Karim wrote:


Is somebody has an example of the way to parse an xml file against a
"grammary" file.xsd.


I found this:

http://www.velocityreviews.com/forums/t695106-re-xml-parsing-with-python.html

Stefan is it still true the limitation of etree in python 2.7.1 ?


Yes, ElementTree (which is in Python's stdlib) and lxml.etree are separate 
implementations. If you want validation, use the lxml package.


Stefan



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-22 Thread Stefan Behnel

Walter Prins, 21.12.2010 22:13:

On 21 December 2010 17:57, Alan Gauld wrote:

"Stefan Behnel" wrote


But I don't understand how uncompressing a file before parsing it can
be faster than parsing the original uncompressed file?



I didn't say "uncompressing a file *before* parsing it". I meant
uncompressing the data *while* parsing it.



Ah, ok that can work, although it does add a layer of processing
to identify compressed v uncompressed data, but if I/O is the
bottleneck then it could give an advantage.



OK my apologies, I see my previous response was already circumscribed by
later emails (which I had not read yet.)  Feel free to ignore it. :)


Not much of a reason to apologize. Especially on a newbie list like 
python-tutor, a few words more or a different way of describing things may 
help in widening the set of readers who understand and manage to follow 
other people's arguments.


Stefan



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel

David Hutto, 21.12.2010 16:11:

On Tue, Dec 21, 2010 at 10:03 AM, Stefan Behnel wrote:

I meant
uncompressing the data *while* parsing it. Just like you have to decode it
for parsing, it's just an additional step to decompress it before decoding.
Depending on the performance relation between I/O speed and decompression
speed, it can be faster to load the compressed data and decompress it into
the parser on the fly. lxml.etree (or rather libxml2) internally does that
for you, for example, if it detects compressed input when parsing from a
file.

Note that these performance differences are tricky to prove in benchmarks,


Tricky and proven, then tell me what real time, and this is in
reference to a recent c++ discussion, is python used in, and how could
it be utilized in, say, an aviation system to avoid a collision when
milliseconds are on the line?


I doubt that there are many aviation systems that send around gigabytes of 
compressed XML data milliseconds before a collision.


I even doubt that air plane collision detection is time critical anywhere 
in the milliseconds range. After all, there's a pilot who has to react to 
the collision warning, and he or she will certainly need more than a couple 
of milliseconds to react, not to mention the time that it takes for the air 
plane to adapt its flight direction. If you plan the system in a way that 
makes milliseconds count, you can just as well replace it by a 
jack-in-the-box. Oh, and that might even speed up the reaction of the pilot. ;)


So, no, if these systems ever come close to a somewhat recent state of 
technology, I wouldn't mind if they were written in Python. The CPython 
runtime is pretty predictable in its performance characteristics, after all.


Stefan



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel

Alan Gauld, 21.12.2010 15:11:

"Stefan Behnel" wrote

And I thought a 1G file was extreme... Do these people stop to think that
with XML as much as 80% of their "data" is just description (ie the tags).


As I already said, it compresses well. In run-length compressed XML
files, the tags can easily take up a negligible amount of space compared
to the more widely varying data content


I understand how compression helps with the data transmission aspect.


compress rather well). And depending on how fast your underlying storage
is, decompressing and parsing the file may still be faster than parsing a
huge uncompressed file directly.


But I don't understand how uncompressing a file before parsing it can
be faster than parsing the original uncompressed file?


I didn't say "uncompressing a file *before* parsing it". I meant 
uncompressing the data *while* parsing it. Just like you have to decode it 
for parsing, it's just an additional step to decompress it before decoding. 
Depending on the performance relation between I/O speed and decompression 
speed, it can be faster to load the compressed data and decompress it into 
the parser on the fly. lxml.etree (or rather libxml2) internally does that 
for you, for example, if it detects compressed input when parsing from a file.
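
To illustrate "decompressing while parsing", here is a minimal sketch that uses only the standard library instead of lxml. gzip.open() returns a file-like object that decompresses as it is read, and xml.etree.ElementTree.iterparse() pulls data from it piece by piece, so the uncompressed document is never written to disk or held in memory as a whole. The file name, the tag names and both helper functions are made up for this example.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def write_sample(path, count):
    """Write a gzip-compressed XML file with `count` <item> elements."""
    with gzip.open(path, "wb") as f:
        f.write(b"<root>")
        for i in range(count):
            f.write(b"<item>%d</item>" % i)
        f.write(b"</root>")


def sum_items(path):
    """Sum the <item> values, decompressing the data while parsing it."""
    total = 0
    with gzip.open(path, "rb") as f:
        # iterparse() reads from the decompressing file object in chunks.
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == "item":
                total += int(elem.text)
                elem.clear()  # drop the element's content once it is processed
    return total


# Hypothetical sample file in a temporary directory.
tmpdir = tempfile.mkdtemp()
sample_path = os.path.join(tmpdir, "sample.xml.gz")
write_sample(sample_path, 1000)
result = sum_items(sample_path)
print(result)
```

Whether this beats parsing an uncompressed file depends on the relation between I/O speed and decompression speed described above, so it needs to be measured on the actual storage.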


Note that these performance differences are tricky to prove in benchmarks, 
as repeating the benchmark usually means that the file is already cached in 
memory after the first run, so the decompression overhead will dominate in 
the second run. That's not what you will see in a clean run or for huge 
files, though.


Stefan



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel

David Hutto, 21.12.2010 13:09:

On Tue, Dec 21, 2010 at 6:59 AM, Stefan Behnel wrote:

David Hutto, 21.12.2010 12:45:


If file a.xml has simple tagged xml like, and file b.config has
tags that represent the a.xml(i.e.=) as greater tags,
does this pattern optimize the process by limiting the size of the
tags to be parsed in the xml, then converting those simpler tags that
are found to the b.config values for the simplesimple format?


In other words I'm lazy and asking for the experiment to be performed
for me(or, more importantly, if it has been), but since I'm not new to
this, if no one has a specific case, I'll timeit when I get to it.


I'm still not sure I understand what you are trying to describe here


a.xml has tags with simplistic forms, like was argued above, with,
or. b.config has variables for the simple tags in a.xml so that
  =  in b.config.

So when parsing a.xml, you parse it, then use more complex tags to
define with b.config.. I'll review the url's a little later.


Ok, I'd call that simple renaming, that's what I meant by "indirection" 
and "mapping" (basically the two concepts that computer science is all 
about ;).
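
As a rough sketch of such a renaming step: the mapping is a plain dict lookup applied to every element after parsing. The XML content, the tag names and the expand_tags() helper are invented for illustration; they only stand in for whatever a.xml and b.config would really contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of a.xml (short tags) and of b.config (short -> long).
A_XML = "<r><p><n>Ada</n></p><p><n>Bob</n></p></r>"
TAG_NAMES = {"r": "registry", "p": "person", "n": "name"}


def expand_tags(root, mapping):
    """Rename each element in place; unknown tags are left unchanged."""
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return root


root = expand_tags(ET.fromstring(A_XML), TAG_NAMES)
expanded = ET.tostring(root, encoding="unicode")
print(expanded)
```

The lookup adds one dict access per element on top of the parsing itself, so it is an extra step rather than a shortcut.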


Sure, run your own benchmarks, but don't expect anyone to be interested in 
the results. If your interest is to obfuscate the tag names, why not just 
use a binary (or less readable) format? That gives you much better 
obfuscation in the first place.


Stefan



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel

David Hutto, 21.12.2010 12:45:

If file a.xml has simple tagged xml like, and file b.config has
tags that represent the a.xml(i.e.  =) as greater tags,
does this pattern optimize the process by limiting the size of the
tags to be parsed in the xml, then converting those simpler tags that
are found to the b.config values for the simple  simple format?


In other words I'm lazy and asking for the experiment to be performed
for me(or, more importantly, if it has been), but since I'm not new to
this, if no one has a specific case, I'll timeit when I get to it.


I'm still not sure I understand what you are trying to describe here, but I 
think you want to look into the Wikipedia articles on indexing, hashing and 
compression.


http://en.wikipedia.org/wiki/Index_%28database%29
http://en.wikipedia.org/wiki/Index_%28information_technology%29
http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Data_compression

Terms like "indirection" and "mapping" also come to my mind when I try to 
make sense out of your hints.


Stefan



Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

2010-12-21 Thread Stefan Behnel

David Hutto, 21.12.2010 12:02:

On Tue, Dec 21, 2010 at 5:45 AM, Alan Gauld wrote:

8 bytes to describe an int which could be represented in
a single byte in binary (or even in CSV).


Well, "CSV" indicates that there's at least one separator character 
involved, so make that an asymptotic 2 bytes on average. But obviously, 
compression applies to CSV and other 'readable' formats as well.
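
A small sketch to put numbers on this, assuming single-digit values and a one-letter tag name "v" (both are assumptions, chosen so that one XML value takes exactly 8 bytes). It builds the same data as XML and as CSV and compresses both with zlib from the standard library.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so that runs are repeatable
values = [rng.randrange(10) for _ in range(10000)]

# "<v>7</v>" is 8 bytes per value; the tag name "v" is an assumption.
xml_raw = "".join("<v>%d</v>" % v for v in values).encode("ascii")
# One digit plus one separator: just under 2 bytes per value.
csv_raw = ",".join(str(v) for v in values).encode("ascii")

xml_packed = zlib.compress(xml_raw)
csv_packed = zlib.compress(csv_raw)

for name, raw, packed in (("xml", xml_raw, xml_packed),
                          ("csv", csv_raw, csv_packed)):
    print("%s: %d bytes raw, %d bytes compressed" % (name, len(raw), len(packed)))
```

The raw sizes are 80000 and 19999 bytes. The compressed sizes depend on the data and on the zlib version, so they are better read from the output than predicted.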




But that byte can't describe the tag


Yep, that's an argument that Alan already presented.

Stefan


