Re: documentation for the change of Python 2.5

2006-06-28 Thread Serge Orlov
On 6/28/06, bussiere [EMAIL PROTECTED] wrote:
 I've read this documentation:
 http://docs.python.org/dev/whatsnew/whatsnew25.html
 is there a way to have it in a more printable form ?

Yep: http://www.python.org/ftp/python/doc/2.5b1/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Problem with sets and Unicode strings

2006-06-27 Thread Serge Orlov
On 6/27/06, Dennis Benzinger [EMAIL PROTECTED] wrote:
 Hi!

 The following program in an UTF-8 encoded file:


 # -*- coding: UTF-8 -*-

 FIELDS = ("Fächer", )
 FROZEN_FIELDS = frozenset(FIELDS)
 FIELDS_SET = set(FIELDS)

 print u"Fächer" in FROZEN_FIELDS
 print u"Fächer" in FIELDS_SET
 print u"Fächer" in FIELDS


 gives this output


 False
 False
 Traceback (most recent call last):
   File "test.py", line 9, in ?
     print u"Fächer" in FIELDS
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
 ordinal not in range(128)


 Why do the first two print statements succeed and the third one fails
 with an exception?

Actually, all three statements fail to produce the correct result.

 Why does the use of set/frozenset remove the exception?

Because sets use a hash algorithm to find matches: the byte string and
the unicode string hash differently, so they are never compared. The last
statement directly compares a unicode string with a byte string. Byte
strings can only contain ascii characters, that's why python raises an
exception. The problem is very easy to fix: use unicode strings for
all non-ascii strings.
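
A minimal sketch of that advice, written for Python 3 (where text strings are unicode by default) rather than the Python 2 of this thread. The variable names are illustrative, and the UTF-8 encoding is the one stated in the question:

```python
# Raw UTF-8 bytes, as they might arrive from a file or the network.
raw = b"F\xc3\xa4cher"

# Decode once at the boundary so everything inside the program is text.
text = raw.decode("utf-8")

fields = (text,)
frozen_fields = frozenset(fields)

# All three lookups now compare text with text.
print(text == "F\u00e4cher")
print("F\u00e4cher" in frozen_fields)
print("F\u00e4cher" in fields)
```

The point is only that both sides of every comparison have the same type; where the decoding happens in a real program is a design choice.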


Re: Python UTF-8 and codecs

2006-06-27 Thread Serge Orlov
On 6/27/06, Mike Currie [EMAIL PROTECTED] wrote:
 I'm trying to write out files that have utf-8 characters 0x85 and 0x08 in
 them.  Every configuration I try I get a UnicodeError: 'ascii' codec can't
 decode byte 0x85 in position 255: ordinal not in range(128)

 I've tried using the codecs.open('foo.txt', 'rU', 'utf-8', errors='strict')
 and that doesn't work and I've also tried wrapping the file in an utf8_writer
 using codecs.lookup('utf8')

 Any clues?

Use unicode strings for non-ascii characters. The following program works:

import codecs

c1 = unichr(0x85)
f = codecs.open('foo.txt', 'w', 'utf-8')
f.write(c1)
f.close()

But unichr(0x85) is a control character, are you sure you want it?
What is the encoding of your data?


Re: to py or not to py ?

2006-06-27 Thread Serge Orlov
On 6/27/06, Chandrashekhar kaushik [EMAIL PROTECTED] wrote:
 HI all
 I have the following prob.
 I am to write a parallel vis application .
 I wud have by default used C++ for the same but somehow
 thought if py cud help me ..
 It does as in many things that i would otherwise have written down
 already exists ... ( like built in classes for sockets , threading etc )

 I would be doin the following type of tasks ..

 1. sending data across the network
 the data is going to be huge

 2. once data has been sent i will run some vis
 algos parallely on them and get the results

 now one thing that i wud req. is serializing my data structures so that
 they can be sent across the net.

 pyton does allow this using cPickle , but it bloats the data like anythin
 !!!
 for example a class containing 2 integers which i expect will be 8 bytes
 long ..
 cPickle.dumps returns a string thats 86 bytes wide  ( this is the binary
 version protocol 1 )

 anyway to improve serialization ??

Do it yourself using struct module.
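
A minimal sketch of what that could look like for the two-integer example. The 32-bit signed, little-endian layout and the helper names are assumptions for illustration, not something stated in the thread:

```python
import struct

# "<ii" = little-endian, two 4-byte signed integers, no padding.
PAIR = struct.Struct("<ii")

def pack_pair(a, b):
    """Serialize two integers into exactly PAIR.size bytes."""
    return PAIR.pack(a, b)

def unpack_pair(data):
    """Inverse of pack_pair: returns a tuple (a, b)."""
    return PAIR.unpack(data)

payload = pack_pair(3, -7)
print(len(payload))          # size in bytes of the packed record
print(unpack_pair(payload))  # the original pair, round-tripped
```

Fixing the byte order in the format string keeps the wire format identical on both ends of the connection, whatever the hosts' native order is.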

 also is it actually a good idea to write high perf applications in python ?

Take a look at the Mercurial http://www.selenic.com/mercurial/ sources.
It's a high performance python application. Or watch Bryan
O'Sullivan's Mercurial presentation
http://video.google.com/videoplay?docid=-7724296011317502612 where he
talks briefly about how they made it work fast.

But writing a high performance application in python requires
self-discipline and attention to detail; looking at the way you spell,
I think it will be a challenge ;)


Re: Problem with sets and Unicode strings

2006-06-27 Thread Serge Orlov
On 6/27/06, Dennis Benzinger [EMAIL PROTECTED] wrote:
 Serge Orlov wrote:
  On 6/27/06, Dennis Benzinger [EMAIL PROTECTED] wrote:
  Hi!
 
  The following program in an UTF-8 encoded file:
 
 
  # -*- coding: UTF-8 -*-
 
  FIELDS = ("Fächer", )
  FROZEN_FIELDS = frozenset(FIELDS)
  FIELDS_SET = set(FIELDS)
 
  print u"Fächer" in FROZEN_FIELDS
  print u"Fächer" in FIELDS_SET
  print u"Fächer" in FIELDS
 
 
  gives this output
 
 
  False
  False
  Traceback (most recent call last):
    File "test.py", line 9, in ?
      print u"Fächer" in FIELDS
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
  ordinal not in range(128)
 
 
  Why do the first two print statements succeed and the third one fails
  with an exception?
 
  Actually all three statements fail to produce correct result.

 So this is a bug in Python?

No.

  frozenset remove the exception?
 
  Because sets use hash algorithm to find matches, whereas the last
  statement directly compares a unicode string with a byte string. Byte
  strings can only contain ascii characters, that's why python raises an
  exception. The problem is very easy to fix: use unicode strings for
  all non-ascii strings.

 No, byte strings contain characters which are at least 8-bit wide
 http://docs.python.org/ref/types.html.

Yes, but later it's written that non-ascii characters do not have
universal meaning assigned to them. In other words if you put byte
0xE4 into a byte string, all python knows about it is that it's *some*
character. If you put character U+00E4 into a unicode string python
knows it's a latin small letter a with diaeresis. Trying to compare
*some* character with a specific character is obviously undefined.

 But I don't understand what
 Python is trying to decode and why the exception says something about
 the ASCII codec, because my file is encoded with UTF-8.

Because byte strings can come from different sources (network, files,
etc.), not only from the source of your program, python cannot assume
all of them are utf-8. It assumes they are ascii, because most
widespread text encodings are ascii-based. Actually it's a guess,
since there are utf-16, utf-32 and other non-ascii encodings. If you
want to experience life without guesses, put
sys.setdefaultencoding("undefined") into site.py


Re: Python UTF-8 and codecs

2006-06-27 Thread Serge Orlov
On 6/27/06, Mike Currie [EMAIL PROTECTED] wrote:
 Okay,

 Here is a sample of what I'm doing:


 Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
 win32
 Type "help", "copyright", "credits" or "license" for more information.
 >>> filterMap = {}
 >>> for i in range(0,255):
 ...     filterMap[chr(i)] = chr(i)
 ...
 >>> filterMap[chr(9)] = chr(136)
 >>> filterMap[chr(10)] = chr(133)
 >>> filterMap[chr(136)] = chr(9)
 >>> filterMap[chr(133)] = chr(10)

This part is incorrect, it should be:

filterMap = {}
for i in range(0,128):
    filterMap[chr(i)] = chr(i)

filterMap[chr(9)] = unichr(136)
filterMap[chr(10)] = unichr(133)
filterMap[unichr(136)] = chr(9)
filterMap[unichr(133)] = chr(10)


Re: Python UTF-8 and codecs

2006-06-27 Thread Serge Orlov
On 6/27/06, Mike Currie [EMAIL PROTECTED] wrote:
 Well,  not really.  It doesn't affect the result.  I still get the error
 message.  Did you get a different result?

Yes, the program successfully wrote the text file. Without magic abilities
to read the screen of your computer, I guess you now get an exception in
the print statement. That is because you use the legacy windows console (I
use the unicode-capable console of lightning compiler
http://www.python.org/pypi/Lightning%20Compiler to run snippets of
code). You can either change the console, comment out the print statement,
or change your program to print the unicode representation: print
repr(filteredLine)


Re: Ascii Encoding Error with UTF-8 encoder

2006-06-27 Thread Serge Orlov
On 6/27/06, Mike Currie [EMAIL PROTECTED] wrote:
 Thanks for the thorough explanation.

 What I am doing is converting data for processing that will be tab (for
 columns) and newline (for row) delimited.   Some of the data contains tabs
 and newlines so, I have to convert them to something else so the file
 integrity is good.

Usually it is done by escaping: translate tab -> \t, new line -> \n,
backslash -> \\.
Python strings already have a method to do it in just one line:
>>> s=chr(9)+chr(10)+chr(92)
>>> print s.encode("string_escape")
\t\n\\

When you're ready to convert it back, call decode("string_escape")
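
For readers on Python 3, where the string_escape codec is gone, a rough equivalent can be sketched with the unicode_escape codec. The helper names escape_field and unescape_field are made up for illustration:

```python
def escape_field(text):
    """Turn tabs, newlines and backslashes into visible escape sequences."""
    # unicode_escape yields ASCII-only bytes; decode them back to str.
    return text.encode("unicode_escape").decode("ascii")

def unescape_field(escaped):
    """Inverse of escape_field."""
    return escaped.encode("ascii").decode("unicode_escape")

original = "a\tb\nc\\d"
escaped = escape_field(original)
print(escaped)
print(unescape_field(escaped) == original)
```

Note that unicode_escape also escapes non-ascii characters, so the escaped form is safe to store in a tab- and newline-delimited file.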


 Not my idea, I've been left with the implementation however.

The idea is actually not bad as long as you know how to cope with unicode.


Re: Function to prune dictionary keys not working

2006-06-27 Thread Serge Orlov
On 6/27/06, John Machin [EMAIL PROTECTED] wrote:
 | >>> '1.00' >= 0.5
 True
 | >>> '0.33' >= 0.5
 True

 Python (correctly) does very little (guesswork-based) implicit type
 conversion.

At the same time, Python (incorrectly :) compares incomparable objects.
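
For comparison, Python 3 refuses such ordering comparisons between a string and a number and raises TypeError instead. A small sketch, where the compare helper is hypothetical and only there to catch the error:

```python
def compare(a, b):
    """Return a >= b, or a short message if the types cannot be ordered."""
    try:
        return a >= b
    except TypeError as exc:
        return "TypeError: %s" % exc

print(compare('1.00', 0.5))         # str vs float: not orderable in Python 3
print(compare(float('1.00'), 0.5))  # explicit conversion gives a real answer
print(compare(float('0.33'), 0.5))
```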


Re: nested dictionary assignment goes too far

2006-06-26 Thread Serge Orlov
On 26 Jun 2006 16:56:22 -0700, Jake Emerson [EMAIL PROTECTED] wrote:
 I'm attempting to build a process that helps me to evaluate the
 performance of weather stations. The script below operates on an MS
 Access database, brings back some data, and then loops through to pull
 out statistics. One such stat is the frequency of reports from the
 stations ('char_freq'). I have a collection of methods that operate on
 the data to return the 'char_freq' and this works great. However, when
 the process goes to insert the unique 'char_freq' into a nested
 dictionary the value gets put into ALL of the sub-keys for all of the
 weather stations.

It's a sure sign you're sharing an object. In python, unless
specifically written, an assignment-like method doesn't create copies:

>>> d = dict.fromkeys([1,2,3],[4,5,6])
>>> id(d[1]) == id(d[2])
True

Instead of

rain_raw_dict = dict.fromkeys(distinctID,{'N':-6999,'char_freq':-6999,'tip1':-6999,'tip2':-6999,'tip3':-6999,'tip4':-6999,'tip5':-6999,'tip6':-6999,'lost_rain':-6999})

you should do something like this:

defaults = {'N':-6999,'char_freq':-6999,'tip1':-6999,'tip2':-6999,'tip3':-6999,'tip4':-6999,'tip5':-6999,'tip6':-6999,'lost_rain':-6999}
rain_raw_dict = {}
for ID in [110,140,650,1440]:
    rain_raw_dict[ID] = defaults.copy()
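
The difference between the two approaches can be seen in a small self-contained sketch (Python 3 syntax). The shortened TEMPLATE dictionary and the station ids are illustrative stand-ins for the real data:

```python
TEMPLATE = {"N": -6999, "char_freq": -6999}
station_ids = [110, 140, 650]

# fromkeys stores the *same* dictionary object under every key.
shared = dict.fromkeys(station_ids, dict(TEMPLATE))
shared[110]["char_freq"] = 15
shared_values = [shared[s]["char_freq"] for s in station_ids]

# One fresh copy per key keeps the stations independent.
separate = {s: dict(TEMPLATE) for s in station_ids}
separate[110]["char_freq"] = 15
separate_values = [separate[s]["char_freq"] for s in station_ids]

print(shared_values)    # the update shows up under every station
print(separate_values)  # only station 110 changes
```

Both dict(TEMPLATE) and TEMPLATE.copy() make shallow copies, which is enough here because the values are plain integers.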


Re: Python database access

2006-06-25 Thread Serge Orlov
On 25 Jun 2006 21:19:18 -0700, arvind [EMAIL PROTECTED] wrote:
 Hi all,
 I am going to  work on Python 2.4.3 and MSSQL database server on
 Windows platform.
 But I don't know how to make the connectivity or rather which module to
 import.
 I searched for the modules in the Python library, but I couldn't find
 which module to go for.

The module you're looking for is the first result if you search for
"python mysql" on google or if you search for "mysql" on the python package
index http://www.python.org/pypi


Re: PEP 314 - requirements for Python itself

2006-06-23 Thread Serge Orlov
On 6/23/06, Mark Nottingham [EMAIL PROTECTED] wrote:
 PEP 314 introduces metadata that explains what packages are required
 by a particular package. Is there any way to express what version of
 Python itself is required?

No, but you can do it yourself:

# do not edit this file, edit actualsetup.py instead
import sys
if sys.version_info < (2, 4):
    print "Error: Python 2.4 or greater is required to use this package"
    sys.exit(1)
import actualsetup

Disclaimer: I haven't actually run or tested this code, but the idea
is to write the checking code that is compatible with very old python
versions and do the actual work in actualsetup.py


Re: PEP 314 - requirements for Python itself

2006-06-23 Thread Serge Orlov
On 6/23/06, Mark Nottingham [EMAIL PROTECTED] wrote:
 I was looking for some normal (hopefully, machine-readable) way to
 indicate it so that people can figure out the version of Python
 required before they download the package.

I'm sure writing English text like "make sure you have python 2.4
before downloading this package" is not abnormal :) How do you expect
to prevent users from downloading your package if they don't have
the python your package needs? It could be useful if there was a tool to
silently download and install python, but I'm sure it is a pain to
code and support such a tool, so nobody was crazy enough to do it.


Re: PEP 314 - requirements for Python itself

2006-06-23 Thread Serge Orlov
On 6/23/06, Mark Nottingham [EMAIL PROTECTED] wrote:
 I was thinking more about things where people can search for packages
 that need different versions of python, etc.; not so much for
 automation.

OK, now I see why you need it. I'm sure using the virtual package name
"python" to declare python dependence is logical and
non-controversial, so you can write to python-dev asking for a PEP 314
addendum. I just wanted to say that nobody is checking it right now and
*right now* run-time checking is the way to go.


Re: Porting python to a TI Processor (C64xx)

2006-06-21 Thread Serge Orlov
On 6/21/06, Roland Geibel [EMAIL PROTECTED] wrote:
 Dear all.
 We want to make python run on DSP processors (C64xx family of TI).

I don't know what C64xx is, but I believe python needs a general purpose
CPU to run.

 I've already tried to ask [EMAIL PROTECTED] (about his Python for
 arm-Linux), but didn't get an answer so far. Neither could I find it in
 the Python tree at sourceforge.

What are you trying to find in the sources? Python is just a C program
and you port it just like any C program.

Re: Python at compile - possible to add to PYTHONPATH

2006-06-21 Thread Serge Orlov
On 21 Jun 2006 15:54:56 -0700, rh0dium [EMAIL PROTECTED] wrote:
 Hi all,

 Can anyone help me out.  I would like to have python automatically look
 in a path for modules similar to editing the PYTHONPATH but do it at
 compile time so every user doesn't have to do this..

 Soo...

 I want to add /foo/bar to the PYTHONPATH build so I don't have to add
 it later on.  Is there a way to do this?

You don't need to recompile python. Just change sys.path before all
import statements.
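
A minimal sketch of that suggestion, using the /foo/bar directory from the question. The EXTRA_DIR name is just for illustration:

```python
import sys

EXTRA_DIR = "/foo/bar"

# Put the directory at the front of the search path, once, before any
# import that depends on it.
if EXTRA_DIR not in sys.path:
    sys.path.insert(0, EXTRA_DIR)

print(sys.path[0])
```

Modules located in that directory can be imported normally after these lines have run.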


Re: statically linked python

2006-06-19 Thread Serge Orlov
Ralph Butler wrote:
 Serge Orlov wrote:
  Ralph Butler wrote:
  Hi:
 
  I have searched the docs and google but have not totally figured
  out how to accomplish my task:  On a linux box, I want to compile
  and link python so that it uses no shared libraries, but does support
  import of some extra modules.  I have made a few attempts but
  with limited success.  In particular, I have tried things like
  adding -static to the compiler options in the Makefile.
 
  At one point I managed to build a python that was close to what I
  wanted, e.g. when I ran ldd python, it said:
   not a dynamic executable
  In that version, when I do some imports, e.g. sys, os, etc. they
  load fine.  But, when I try to import some other modules, e.g. time,
  they are not found.  I have tried similar procedures while also
  altering Modules/Setup.local (produced by configure) to contain:
   time timemodule.c # -lm # time operations and variables
 
  There has to be a simple, elegant way to accomplish this which I am
  simply overlooking.  Any help would be appreciated.
 
  This has nothing to do with python. glibc doesn't support loading
  shared libraries into statically linked executables. At least it didn't
  support in 2002:
  http://www.cygwin.com/ml/libc-alpha/2002-06/msg00079.html
  Since it still doesn't work most likely it is still not supported, but
  you may ask glibc developers what is the problem.
 

 I do not want to load them.  I want to statically link the code for a
 module (e.g. time) directly into the statically linked executable.
 Sorry if that was not clear.

OK, so you're asking how to make a module builtin. I haven't done that
myself, but let me give you a hint where to look: there is a list of
builtin modules, sys.builtin_module_names. If you search the whole python
source distribution for some of the names in the list, you'll get a list
of files to look at. I've just searched and found that only two
files are involved: PC\config.c and setup.py
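
A quick way to inspect that list from a running interpreter, for example to check whether a rebuilt python really has a module compiled in. The is_builtin helper is made up for illustration:

```python
import sys

def is_builtin(name):
    """True if the module is compiled into this interpreter binary."""
    return name in sys.builtin_module_names

print(is_builtin("sys"))
print(is_builtin("time"))  # depends on how this interpreter was built
print(len(sys.builtin_module_names))
```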



Re: statically linked python

2006-06-17 Thread Serge Orlov
Ralph Butler wrote:
 Hi:

 I have searched the docs and google but have not totally figured
 out how to accomplish my task:  On a linux box, I want to compile
 and link python so that it uses no shared libraries, but does support
 import of some extra modules.  I have made a few attempts but
 with limited success.  In particular, I have tried things like
 adding -static to the compiler options in the Makefile.

 At one point I managed to build a python that was close to what I
 wanted, e.g. when I ran ldd python, it said:
  not a dynamic executable
 In that version, when I do some imports, e.g. sys, os, etc. they
 load fine.  But, when I try to import some other modules, e.g. time,
 they are not found.  I have tried similar procedures while also
 altering Modules/Setup.local (produced by configure) to contain:
  time timemodule.c # -lm # time operations and variables

 There has to be a simple, elegant way to accomplish this which I am
 simply overlooking.  Any help would be appreciated.

This has nothing to do with python. glibc doesn't support loading
shared libraries into statically linked executables. At least it didn't
support it in 2002:
http://www.cygwin.com/ml/libc-alpha/2002-06/msg00079.html
Since it still doesn't work, most likely it is still not supported, but
you may ask the glibc developers what the problem is.



Re: BeautifulSoup error

2006-06-16 Thread Serge Orlov
William Xu wrote:
 Hi, all,

 This piece of code used to work well. i guess the error occurs after
 some upgrade.

 >>> import urllib
 >>> from BeautifulSoup import BeautifulSoup
 >>> url = 'http://www.google.com'
 >>> port = urllib.urlopen(url).read()
 >>> soup = BeautifulSoup()
 >>> soup.feed(port)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed

Look at the traceback: you're not calling the BeautifulSoup module! In
fact, there is no feed method in the current BeautifulSoup
documentation. Maybe it used to work well, but now it's definitely
going to fail. As I understand the documentation, you need to write

soup = BeautifulSoup(port)



Re: memory leak problem with arrays

2006-06-15 Thread Serge Orlov
sonjaa wrote:
 Serge Orlov wrote:
  sonjaa wrote:
   Serge Orlov wrote:
sonjaa wrote:
 Hi

 I'm new to programming in python and I hope that this is the problem.

 I've created a cellular automata program in python with the numpy array
 extensions. After each cycle/iteration the memory used to examine and
 change the array as determined by the transition rules is never freed.
 I've tried using del on every variable possible, but that hasn't
 worked.
   
Python keeps track of the number of references to every object. If the
object has more than one reference by the time you use del, the object
is not freed; only the number of references is decremented.
   
Print the number of references for all the objects you think should be
freed after each cycle/iteration. If it is not equal to 2, that means you
are holding extra references to those objects. You can get the number of
references to any object by calling sys.getrefcount(obj)
  
   thanks for the info. I used this on several variables/objects and
   discovered that little counters, i.e. k = k +1, have many references to
   them, up to 1+.
   Is there a way to free them?
 
  Although it looks suspicious, even if you manage to free it you will
  gain only 12 bytes. I think you should concentrate on fatter
  objects ;)


 Sent message to the NumPy forum as per Robert's suggestion.
 An update after implementing the suggestions:

 After doing this I see that iterative counters used to collect
 occurrences and nested loop counters (ii  jj) as seen in the code example
 below are the culprits, with the worst ones over 1M:

That means you have over 1M integers in your program. How did it happen
if you're using numpy arrays? If I allocate a numpy array of one
million bytes it is not using one million integers, whereas a python
list of 1M integers creates 1M integers:

>>> import numpy
>>> a = numpy.zeros((1000000,), numpy.UnsignedInt8)
>>> import sys
>>> sys.getrefcount(0)
632
>>> b=[0]*1000000
>>> sys.getrefcount(0)
1000632


But that doesn't explain why your program doesn't free memory. By the
way, are you sure you have enough memory for one iteration of your
program?



[OT] Re: Python open proxy honeypot

2006-06-15 Thread Serge Orlov
imcs ee wrote:
 On 13 Jun 2006 15:09:57 -0700, Serge Orlov [EMAIL PROTECTED] wrote:
  Alex Reinhart wrote:
  My spam folder at gmail is not growing anymore for many months (it is
  about 600-700 spams a month). Have spammers given up spamming gmail.com
  only or is it global trend?
 Gmail said messages that have been in Spam more than 30 days will be
 automatically deleted,
 so maybe the rate at which spam comes in is counterbalanced by the rate
 at which it goes out?

Yes, it is. My point was that the monthly amount is not increasing for me.
But I guess if you publish your email everywhere it is increasing:
http://egofood.blogspot.com/2006/06/well-spam-is-officially-annoying.html
20,000 a month. Wow.



Re: BeautifulSoup error

2006-06-15 Thread Serge Orlov
William Xu wrote:
 Hi, all,

 This piece of code used to work well. i guess the error occurs after
 some upgrade.

 >>> import urllib
 >>> from BeautifulSoup import BeautifulSoup
 >>> url = 'http://www.google.com'
 >>> port = urllib.urlopen(url).read()
 >>> soup = BeautifulSoup()
 >>> soup.feed(port)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
     self.rawdata = self.rawdata + data
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565:
 ordinal not in range(128)
 

 Any ideas to solve this?

According to the documentation
http://www.crummy.com/software/BeautifulSoup/documentation.html
chapter "Beautiful Soup Gives You Unicode, Dammit", Beautiful Soup fully
supports unicode, so it's probably a bug.

 version info:

 Python 2.3.5 (#2, Mar  7 2006, 12:43:17)
 [GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2

 python-beautifulsoup: 3.0.1-1

Upgrading python-beautifulsoup is a good idea, since there were two bug
fix releases after 3.0.1



Re: memory leak problem with arrays

2006-06-14 Thread Serge Orlov
sonjaa wrote:
 Hi

 I'm new to programming in python and I hope that this is the problem.

 I've created a cellular automata program in python with the numpy array
 extensions. After each cycle/iteration the memory used to examine and
 change the array as determined by the transition rules is never freed.
 I've tried using del on every variable possible, but that hasn't
 worked.

Python keeps track of the number of references to every object. If the
object has more than one reference by the time you use del, the object
is not freed; only the number of references is decremented.

Print the number of references for all the objects you think should be
freed after each cycle/iteration. If it is not equal to 2, that means you
are holding extra references to those objects. You can get the number of
references to any object by calling sys.getrefcount(obj)
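
A small sketch of that check in Python 3 syntax. The Cell class is a hypothetical stand-in for whatever object you expect to be freed. The absolute number reported by sys.getrefcount may differ between interpreter versions, so the sketch compares counts instead of relying on a fixed value:

```python
import sys

class Cell(object):
    """Hypothetical stand-in for an object you expect to be freed."""

cell = Cell()
baseline = sys.getrefcount(cell)

holder = [cell]                     # an extra reference hiding in a list
with_extra = sys.getrefcount(cell)

del holder                          # dropping the list releases that reference
after_del = sys.getrefcount(cell)

print(with_extra - baseline)        # how many extra references the list added
print(after_del - baseline)         # back to the baseline after del
```

If the count for one of your own objects stays above its baseline after an iteration, something (a list, a dict, an attribute) is still holding on to it.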



Re: memory leak problem with arrays

2006-06-14 Thread Serge Orlov
sonjaa wrote:
 Serge Orlov wrote:
  sonjaa wrote:
   Hi
  
   I'm new to programming in python and I hope that this is the problem.
  
   I've created a cellular automata program in python with the numpy array
   extensions. After each cycle/iteration the memory used to examine and
   change the array as determined by the transition rules is never freed.
   I've tried using del on every variable possible, but that hasn't
   worked.
 
  Python keeps track of the number of references to every object. If the
  object has more than one reference by the time you use del, the object
  is not freed; only the number of references is decremented.
 
  Print the number of references for all the objects you think should be
  freed after each cycle/iteration. If it is not equal to 2, that means you
  are holding extra references to those objects. You can get the number of
  references to any object by calling sys.getrefcount(obj)

 thanks for the info. I used this on several variables/objects and
 discovered that little counters, i.e. k = k +1, have many references to
 them, up to 1+.
 Is there a way to free them?

Although it looks suspicious, even if you manage to free it you will
gain only 12 bytes. I think you should concentrate on fatter
objects ;)



Re: split with * in string and ljust() puzzles

2006-06-14 Thread Serge Orlov
Sambo wrote:
 I have just (finally) realized that it is splitting and removing
 on single space but that seems useless, and split items
 1 and 2 are empty strings not spaces??

What is useless for you is worth $1,000,000 for somebody else ;)
If you have a comma separated list, '1,,2'.split(',') naturally returns
['1', '', '2']. I think you can get what you want with a simple regexp.
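
A short sketch of the different behaviours, in Python 3 syntax. The sample line is made up, since the original input is not shown in this message:

```python
import re

line = "a  b c"   # note the two spaces between a and b

by_single_space = line.split(" ")   # explicit separator keeps empty fields
by_whitespace = line.split()        # no argument: runs of whitespace collapse
by_regexp = re.split(r" +", line)   # regexp: one or more spaces as separator

print(by_single_space)
print(by_whitespace)
print(by_regexp)
print("1,,2".split(","))
```

Which one is right depends on whether an empty field carries meaning, as it does in the comma separated case.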



Re: Bundling an application with third-party modules

2006-06-14 Thread Serge Orlov
Ben Finney wrote:
 Serge Orlov [EMAIL PROTECTED] writes:

  Ben Finney wrote:
   That's a large part of my question. How can I lay out these
   modules sensibly during installation so they'll be easily
   available to, but specific to, my application?
 
  Put them in a directory "lib" next to the main module and start the
  main module with the following blurb:
  
  import sys, os
  sys.path.insert(1, os.path.join(sys.path[0],"lib"))
  

 The application consists of many separate programs to perform various
 tasks, some larger than others. There's no sensible place for a main
 module.

Perhaps I'm using my own jargon. By "main module" I mean every module
used to start any application within your project. If you want a
relocatable solution, create an empty .topdir file in the top directory
and put this blurb into every application:

import sys, os
top_dir = sys.path[0]
while True:
    if os.path.exists(os.path.join(top_dir,".topdir")):
        break
    top_dir = os.path.dirname(top_dir)
sys.path.insert(1, os.path.join(top_dir,"lib"))

I don't think you need to worry about duplication; I used this code. It
is version 1.0 and it is final :) You won't need to change it.
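
A variant of the same idea, sketched in Python 3 syntax and wrapped in a function that gives up with an error when no marker file is found instead of looping forever at the filesystem root. The find_top_dir name and the error message are made up for illustration:

```python
import os

def find_top_dir(start, marker=".topdir"):
    """Walk upwards from start until a directory containing marker is found."""
    current = os.path.abspath(start)
    while True:
        if os.path.exists(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without seeing the marker.
            raise RuntimeError("no %s found above %s" % (marker, start))
        current = parent
```

An application would then call something like sys.path.insert(1, os.path.join(find_top_dir(sys.path[0]), "lib")) at startup, as in the blurb above.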

 There probably will be a library directory for common code,
 though. Are you suggesting that the third-party libraries should go
 within the application-native library?

Not really. I was just feeling lazy to type a generic solution, so I
assumed one project == one application.

 What's a good way to get from upstream source code (some of which is
 eggs, some of which expects 'distutils' installation, and some of
 which is simple one-file modules) to a coherent set of application
 library code, that is automatable in an install script?

Well, I did it manually; it's not that time consuming if you keep in
mind that you also need to test the new version, and by testing I also
mean finding integration bugs days later. Anyway, if you feel like
automating, I think you can do something using the distutils command
"install --home=/temp/dir" and then copying to your common library
directory in a flat manner (the --home option puts files in subdirectories
that don't make sense for a bundled lib)



Re: embedded python and windows multi threading, can't get it to work

2006-06-13 Thread Serge Orlov
freesteel wrote:
 I am trying to run a python programme embedded from C++. I want to run
 the same python code concurrently in several threads. I read the manual
 on embedding, especially chapter 8, and searched for relevant info on
 google all afternoon, but I can't get this to work. What am I doing
 wrong? I use python2.4 and vc++7 (.net). The first thread seems to work
 okay, the 2nd thread crashes, but the exception information is not very
 useful:
 (An unhandled exception of type 'System.NullReferenceException'
 occurred in pyembed_test.exe

Running one interpreter in more than one thread is not supported. You
need to create one interpreter per thread using Py_NewInterpreter
(don't forget to read the "Bugs and caveats" paragraph). I hope you also
realize the interpreters won't share objects.



[OT] Re: Python open proxy honeypot

2006-06-13 Thread Serge Orlov
Alex Reinhart wrote:
 Being deluged by spam like nearly all of us (though fortunately I have a
 very good spam filter), I also hate spam as much as almost everybody. I
 know basic Python (enough to make a simple IRC bot) and I figured a good
 project to help learn Python would be to make a simple proxypot.

 I've done some research and found one already existing, written in Perl
 (http://www.proxypot.org/). However, I prefer the syntax and ease of
 Python (and Proxypot is no longer maintained, as far as I can see), so I
 decided to write my own. I have just one question:

 Is running Python's built-in smtpd, pretending to accept and forward all
 messages, enough to get me noticed by a spammer, or do I have to do
 something else to advertise my script as an open proxy?

 I'm hoping to make this proxy script distributed, in that several
 honeypots are run on different servers, and the results are then
 collected on a central server that provides statistics and a listing of
 all spammers caught. So, just out of curiosity, I'd like to know how
 many people would actually be willing to run a honeypot on their server,
 and how many are opposed to the idea (just so I know if the concept is
 even valid).

IMHO it's pretty useless: spammers are starting to use botnets, and the
more inconvenient you make it for them to use open proxies, the more of
them will move to closed botnets.

My spam folder at gmail has not been growing for many months (it is
about 600-700 spams a month). Have spammers given up spamming gmail.com
only, or is it a global trend?



[OT] Re: Python open proxy honeypot

2006-06-13 Thread Serge Orlov
Alex Reinhart wrote:
 Serge Orlov wrote:
  IMHO it's pretty useless: spammers are starting to use botnets, and the
  more inconvenient you make it for them to use open proxies, the more of
  them will move to closed botnets.
 As long as I inconvenience them, or at least catch one or two, I'll be
 satisfied.

What makes you think that spammers won't discover you're blackholing
their spam as soon as you start to make some impact on their business?
They will just skip your proxypots and move to real open proxies.

I think you'll make a bigger impact if you implement the proxy checking
software http://dsbl.org/programs in Python, so it can run on windows
too.



Re: Intermittent Failure on Serial Port (Trace Result)

2006-06-12 Thread Serge Orlov
H J van Rooyen wrote:

 Note that the point of failure is not the same place in the python file,
 but it is according to the traceback, again at a flush call...

Yes, the traceback is bogus. Maybe the error is raised during garbage
collection, although the strace you've got doesn't show that. The main
cause of the failure seems to be a workaround in Python's function
new_buffersize: it doesn't clear errno after lseek, and then this errno
pops up somewhere else. There are two places I can clearly see that
don't clear errno: file_dealloc and get_line. Obviously this stuff
needs to be fixed, so you'd better file a bug report. I'm not sure how
to work around this bug in the meantime, since it is still not clear
where this error is coming from. Try to pinpoint it. For example, if
your code relies on garbage collection to call file.close, try to close
all files in your program explicitly. That seems like a good idea anyway:
since your program is long running, errors during close are not that
significant. Instead of the standard close I'd call something like this:

import sys

def soft_close(f):
    # Close f, but only report a failure instead of propagating it.
    try:
        f.close()
    except IOError, e:
        print >>sys.stderr, "Hmm, close of file failed. Error was: %s" % e.errno

 The close failed is explicable - it seems to happen during closedown, with
 the port already broken..,

It is not clear who calls lseek right before close. lseek is called by
new_buffersize that is called by file.read. But who calls file.read
during closedown?



Re: Intermittent Failure on Serial Port

2006-06-11 Thread Serge Orlov
H J van Rooyen wrote:
 Serge Orloff wrote:

 | H J van Rooyen wrote:
 |  Traceback (most recent call last):
 |    File "portofile.py", line 232, in ?
 |      ret_val = main_routine(port, pollstruct, pfifo)
 |    File "portofile.py", line 108, in main_routine
 |      send_nak(port, timeout)  # so bad luck - comms error
 |    File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
 |      port.flush()
 |  IOError: [Errno 29] Illegal seek
 |  close failed: [Errno 29] Illegal seek
 | 
 |
 |
 |  Where can I find out what the Errno 29 really means?
 |  Is this Python, the OS or maybe hardware?
 |
 | It is from kernel: grep -w 29 `locate errno`
 | /usr/include/asm-generic/errno-base.h: #define   ESPIPE  29
 |  /* Illegal seek */
 |
 | man lseek:
 |
 | ERRORS:
 | ESPIPE fildes is associated with a pipe, socket, or FIFO.
 |
 | RESTRICTIONS:
 | Linux  specific  restrictions:  using  lseek  on  a  tty device
 | returns ESPIPE.


 Thanks for the info - so the Kernel sometimes bombs me out - does anybody know
 why the python flush sometimes calls lseek?

I thought it was your own flush method. If it is the file.flush method,
that makes the issue more complicated, since the stdlib file.flush doesn't
call lseek. I suggest you run your program under strace to log system
calls; without such a log it's pretty hard to say what's going on. The
most interesting part is the end, but make sure you have enough space
for the whole log, it's going to be big.



Re: Intermittent Failure on Serial Port

2006-06-10 Thread Serge Orlov
H J van Rooyen wrote:
 Traceback (most recent call last):
   File "portofile.py", line 232, in ?
     ret_val = main_routine(port, pollstruct, pfifo)
   File "portofile.py", line 108, in main_routine
     send_nak(port, timeout)  # so bad luck - comms error
   File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
     port.flush()
 IOError: [Errno 29] Illegal seek
 close failed: [Errno 29] Illegal seek



 Where can I find out what the Errno 29 really means?
 Is this Python, the OS or maybe hardware?

It is from the kernel: grep -w 29 `locate errno`
/usr/include/asm-generic/errno-base.h: #define   ESPIPE  29  /* Illegal seek */

man lseek:

ERRORS:
ESPIPE fildes is associated with a pipe, socket, or FIFO.

RESTRICTIONS:
Linux  specific  restrictions:  using  lseek  on  a  tty device
returns ESPIPE.
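
The same lookup can be done from Python itself. The following is only a
sketch, written for a current Python 3 on a Unix-like system (an assumption;
the thread is about Python 2.4 on Linux). It triggers the error on purpose by
seeking on a pipe and then asks the errno module for the symbolic name. The
helper name lseek_error_on_pipe is made up for this illustration.

```python
import errno
import os

def lseek_error_on_pipe():
    """Try to seek on a pipe and return the errno the OS reports (or None)."""
    read_fd, write_fd = os.pipe()
    try:
        os.lseek(read_fd, 0, os.SEEK_CUR)
    except OSError as exc:
        return exc.errno
    finally:
        # Always release both ends of the pipe
        os.close(read_fd)
        os.close(write_fd)
    return None

code = lseek_error_on_pipe()
# Map the number back to its symbolic name and its message
print(code, errno.errorcode.get(code), os.strerror(code) if code else None)
```

The numeric value is platform specific, so comparing against errno.ESPIPE is
safer than comparing against the literal 29.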



Re: Getting start/end dates given week-number

2006-06-09 Thread Serge Orlov
Tim Chase wrote:
 I've been trying to come up with a good algorithm for determining
 the starting and ending dates given the week number (as defined
 by the strftime("%W") function).

I think you missed the %U format, since later you write:

 My preference would be for a Sunday-Saturday range rather than a
 Monday-Sunday range.  Thus,

 Any thoughts/improvements/suggestions would be most welcome.

If you want to match %U:

from datetime import date, timedelta

def weekBoundaries(year, week):
    startOfYear = date(year, 1, 1)
    # Last Sunday strictly before January 1: the start of %U week 0
    week0 = startOfYear - timedelta(days=startOfYear.isoweekday())
    sun = week0 + timedelta(weeks=week)
    sat = sun + timedelta(days=6)
    return sun, sat
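
One way to convince yourself that this matches %U is to walk through every
day of a year and check that the day falls inside the range computed for its
own %U week number. This is a sketch for Python 3; week_boundaries repeats the
function above so the block is self-contained, and check_year is a helper
invented for this check.

```python
from datetime import date, timedelta

def week_boundaries(year, week):
    """Sunday and Saturday of the given %U week number."""
    start_of_year = date(year, 1, 1)
    week0 = start_of_year - timedelta(days=start_of_year.isoweekday())
    sun = week0 + timedelta(weeks=week)
    return sun, sun + timedelta(days=6)

def check_year(year):
    """True if every day of `year` lies inside the range for its %U week."""
    day = date(year, 1, 1)
    while day.year == year:
        week = int(day.strftime("%U"))
        sun, sat = week_boundaries(year, week)
        if not (sun <= day <= sat):
            return False
        day += timedelta(days=1)
    return True

print(week_boundaries(2006, 1))
print(check_year(2006))
```

Note that for week 0 the returned Sunday can lie in the previous year.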



Re: Freezing a static executable

2006-06-05 Thread Serge Orlov
Will Ware wrote:
 I am trying to freeze a static executable. I built a static Python
 executable this way:
 ./configure --disable-shared --prefix=/usr/local
 make
 make install
 Even that didn't give me a really static executable, though:

AFAIK it's not supported, because the interpreter won't be able to load
C extensions if compiled statically. There is a bootstrap issue: to
build a static python executable you need the extensions built, but to
build extensions you need python, so you need an unconventional build
procedure.

After the python build is finished you get the static library
libpython2.4.a. Then you need all the extensions you're going to use built
as .a files (I'm not even sure there is a standard way to do it). Then you
need to write a loader, like in py2exe, exemaker, pyinstaller, etc., that
will initialize the python interpreter and the extensions. Those three
pieces (libpython2.4.a, extensions, loader) can be linked as a static
executable.


 What stupid thing am I doing wrong?

You are just trying to do something nobody was really interested in
implementing.



Re: is it possible to find which process dumped core

2006-06-05 Thread Serge Orlov
su wrote:
 to find which process dumped core, at the prompt we give

 $ file core.28424

 core.28424: ELF 32-bit LSB core file of 'soffice.bin' (signal 11),
 Intel 80386, version 1 (SYSV), from 'soffice.bin'

 from this command we know 'soffice.bin' process dumped core. Now can i
 do the same using python i.e. finding which process dumped core?  if so
 how can i do it?

Parse a core file like the file command does?
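
As a first step in that direction, here is a minimal sketch (Python 3) that
only answers "is this file an ELF core dump?" by looking at the ELF header.
It does not extract the program name: that is stored in a note segment of the
core file, which this sketch does not parse. The function names is_elf_core
and file_is_core are made up for the illustration.

```python
import struct

ET_CORE = 4  # e_type value that marks a core file in the ELF header

def is_elf_core(header):
    """Return True if `header` (the first bytes of a file) is an ELF core header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return False
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte identification block
    (e_type,) = struct.unpack(byte_order + "H", header[16:18])
    return e_type == ET_CORE

def file_is_core(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return is_elf_core(f.read(18))
```

Calling file_is_core("core.28424") would then tell you whether it is worth
parsing the file any further.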



Re: Get EXE (made with py2exe) path directory name

2006-06-05 Thread Serge Orlov

Andrei B wrote:
 I need to get the absolute path name of a file that's in the same dir as
 the exe, however the Current Working Directory is changed to something
 else.


Use sys.path[0]
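
A small sketch of how that could be wrapped up (Python 3). The branch for
frozen programs is an assumption that goes beyond the one-line answer above:
it relies on the sys.frozen attribute that py2exe sets and on the executable
living in the application folder. application_dir and sibling_path are
hypothetical helper names.

```python
import os
import sys

def application_dir():
    """Best-effort guess at the directory the program was started from."""
    if getattr(sys, "frozen", False):
        # Assumption: a frozen build (py2exe sets sys.frozen) runs from the exe's folder
        return os.path.dirname(os.path.abspath(sys.executable))
    # Plain script: sys.path[0] is the script's directory ('' means the current one)
    return os.path.abspath(sys.path[0])

def sibling_path(name):
    """Absolute path of a file that sits next to the program."""
    return os.path.join(application_dir(), name)

print(sibling_path("settings.ini"))
```

Neither branch depends on the current working directory once the program is
running from a script file or an exe.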



Re: struct: type registration?

2006-06-02 Thread Serge Orlov
John Machin wrote:
 On 2/06/2006 4:18 AM, Serge Orlov wrote:
  If you want to parse binary data use pyconstruct
  http://pyconstruct.wikispaces.com/
 

 Looks promising on the legibility and functionality fronts. Can you make
 any comment on the speed?

I don't really know. I used it for parsing small data, and its performance
was acceptable. As I understand it, it is currently implemented as pure
python code using struct under the hood. The biggest concern is the
lack of comprehensive documentation; if that scares you, it's not for
you.

 Reason for asking is that Microsoft Excel
 files have this weird RK format for expressing common float values in
 32 bits (refer http://sc.openoffice.org, see under Documentation
 heading). I wrote and support the xlrd module (see
 http://cheeseshop.python.org/pypi/xlrd) for reading those files in
 portable pure Python. Below is a function that would plug straight in as
 an example of Giovanni's custom unpacker functions. Some of the files
 can be very large, and reading rather slow.

I *guess* that the *current* implementation of pyconstruct will make
parsing slightly slower. But you have to try to find out.

 from struct import unpack

 def unpack_RK(rk_str): # arg is 4 bytes
     flags = ord(rk_str[0])
     if flags & 2:
         # There's a SIGNED 30-bit integer in there!
         i, = unpack('<i', rk_str)
         i >>= 2 # div by 4 to drop the 2 flag bits
         if flags & 1:
             return i / 100.0
         return float(i)
     else:
         # It's the most significant 30 bits
         # of an IEEE 754 64-bit FP number
         d, = unpack('<d', '\0\0\0\0' + chr(flags & 252) + rk_str[1:4])
         if flags & 1:
             return d / 100.0
         return d

I had to look up what >>= means :) Since nobody except this function cares
about the internals of an RK number, you don't need to use pyconstruct to
parse at bit level. The code will be almost like what you wrote, except you
replace unpack('<d', with Construct.LittleFloat64().parse( and plug
unpack_RK into the pyconstruct framework by deriving from the Field class.
Sure, nobody is going to raise your paycheck because of this rewrite :)
The biggest benefit comes from parsing the whole data file with
pyconstruct, not individual fields.
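
For readers who want to try the RK decoding on its own, here is a sketch of
the same logic in Python 3 using only struct and bytes. It is not the xlrd or
pyconstruct code; pack_rk_int is a hypothetical helper that exists only to
build test values for the integer form.

```python
import struct

def unpack_rk(rk_bytes):
    """Decode a 4-byte RK value into a float."""
    flags = rk_bytes[0]
    if flags & 2:
        # Signed 30-bit integer stored in the upper 30 bits
        (i,) = struct.unpack("<i", rk_bytes)
        value = float(i >> 2)
    else:
        # Upper 30 bits of an IEEE 754 double; the missing low bits are zero
        padded = b"\0\0\0\0" + bytes([flags & 252]) + rk_bytes[1:4]
        (value,) = struct.unpack("<d", padded)
    # Bit 0 means the stored value must be divided by 100
    return value / 100.0 if flags & 1 else value

def pack_rk_int(number, divide_by_100=False):
    """Hypothetical helper for the examples: build an integer RK value."""
    return struct.pack("<i", (number << 2) | 2 | int(divide_by_100))

print(unpack_rk(pack_rk_int(5)))
print(unpack_rk(pack_rk_int(12345, True)))
print(unpack_rk(struct.pack("<d", 1.5)[4:]))
```

The last line feeds in the upper four bytes of the double 1.5, which is the
floating point form of an RK value with both flag bits clear.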



Re: py2exe qt4/qimage

2006-06-01 Thread Serge Orlov
aljosa wrote:
 i'm trying to convert a python script (an image resizer using PyQt4)
 to an exe, but support for the jpeg and tiff image formats is located in
 Qt4.1\plugins\imageformats (dll files), and when the script is converted
 the exe file doesn't support jpeg and tiff.

 i tried using all the file formats in the
 script:
 tmp1 = QImage('images/type.bmp')
 tmp2 = QImage('images/type.gif')
 tmp3 = QImage('images/type.jpg')
 tmp4 = QImage('images/type.png')
 tmp5 = QImage('images/type.tif')

 but it doesn't work when i convert script to exe.
 any tips on howto include jpeg and tiff image formats support in exe?

You need to bundle the plugins as data files:
http://docs.python.org/dist/node12.html



Re: struct: type registration?

2006-06-01 Thread Serge Orlov
Giovanni Bajo wrote:
 John Machin wrote:
  I am an idiot, so please be gentle with me: I don't understand why you
  are using struct.pack at all:

 Because I want to be able to parse large chunks of binary data with custom
 formatting. Did you miss the whole point of my message:

 struct.unpack("3liiSiiShh", data)

Did you want to write struct.unpack("Sheesh", data)? Seriously, the
main problem of struct is that it uses ad-hoc abbreviations for
relatively rarely[1] used function calls, and that makes it hard to
read.

If you want to parse binary data use pyconstruct
http://pyconstruct.wikispaces.com/

[1] Relative to regular expression and string formatting calls.



Re: Are ActivePython scripts compatible with Linux?

2006-05-31 Thread Serge Orlov
A.M wrote:
 I am planning to develop python applications on windows and run them on
 Linux.

 Larry Bates [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]
  Short answer: yes

A.M wrote:
 Thanks alot Larry for your comprehensive answer.

Small addition: test, test, test. This is the only way to make sure
your program works on another platform. VMware now offers a free
virtual machine emulator, vmplayer. You have no excuse not to install
linux! :) If you have a dual-core processor or an idle machine you can
even set up http://buildbot.sf.net to continuously test your source code
changes.



Re: Is anybody knows about a linkable, quick MD5/SHA1 calculator library ?

2006-05-30 Thread Serge Orlov
DurumDara wrote:
 Hi !

 I need to speed up my MD5/SHA1 calculator app that works on
 filesystem files.
 I use the Python standard modules, but I think it could be faster if
 I used C, or another module for it.

 I used FSUM before, but I got problems, because I moved into the DOS area,
 and passing parameters to the external process made me very angry (it did
 not work). You can see this here:
 http://mail.python.org/pipermail/python-win32/2006-May/004697.html

FWIW I looked into the problem: apparently fsum converts the name
back to unicode, tries to print it and silently corrupts the output.
You give it the short name XA02BB~1 of the file xAÿ and fsum prints xA

Use the python module or try another utility.



Re: why not in python 2.4.3

2006-05-29 Thread Serge Orlov
Rocco wrote:
 Also with ascii the function does not work.

Well, at least you fixed the misconfiguration ;)

Googling for 1F8B (the first two bytes of your strange python 2.4
result) gives a hint: it's the beginning of a gzip stream. Maybe urllib2 in
python 2.4 reports to the server that it supports compressed data but
doesn't decompress it when it receives the reply?
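
If you end up with such a response anyway, the two magic bytes are easy to
test for. A sketch in Python 3 (the thread itself is about Python 2.4, where
the gzip module has a different interface); maybe_gunzip is a name invented
for this example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the 1F 8B bytes mentioned above

def maybe_gunzip(payload):
    """Decompress payload if it starts with the gzip magic bytes, else return it unchanged."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload

compressed = gzip.compress(b"<rss>...</rss>")
print(compressed[:2] == GZIP_MAGIC)
print(maybe_gunzip(compressed))
print(maybe_gunzip(b"plain reply"))
```

This only hides the symptom; it does not explain why the server sent
compressed data in the first place.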



Re: saving settings

2006-05-29 Thread Serge Orlov
SuperHik wrote:
 aum wrote:
  On Mon, 29 May 2006 09:05:36 +0200, SuperHik wrote:
 
  Hi,
 
  I was wondering how to make a single .exe file, say some kind of clock,
  and be able to save some settings (alarm for example) into the same
  file? Basically make code rewrite it self...
 
  thanks!
 
  Yikes!!!
 
  I'd strongly suggest you read the doco for ConfigParser, and load/save
  your config file to/from os.path.join(os.path.expanduser("~")).
 
  Another option - save your stuff in the Windows Registry
 

 but if I copy this file on the other computer settings will be lost...

Put your program in a writable folder and save the configuration right into
that folder. Then you can transfer the whole folder. Tip: sys.path[0]
always contains the path to the directory where the __main__ module is
located.
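
A sketch of what that could look like, written for Python 3, where the module
is called configparser (ConfigParser in the Python 2 of this thread). The file
name settings.ini, the section name clock and the option name alarm are all
made up for the example.

```python
import configparser
import os

SETTINGS_NAME = "settings.ini"  # hypothetical file name

def save_settings(folder, alarm):
    """Write the alarm setting into a config file inside `folder`."""
    parser = configparser.ConfigParser()
    parser["clock"] = {"alarm": alarm}
    with open(os.path.join(folder, SETTINGS_NAME), "w") as f:
        parser.write(f)

def load_settings(folder):
    """Return the saved alarm, or None if nothing was saved yet."""
    parser = configparser.ConfigParser()
    # read() silently skips a missing file, so a first run just yields None
    parser.read(os.path.join(folder, SETTINGS_NAME))
    return parser.get("clock", "alarm", fallback=None)
```

In the program you would pass sys.path[0] as the folder, so the settings
travel together with the program when the folder is copied.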



Re: why not in python 2.4.3

2006-05-29 Thread Serge Orlov
John Machin wrote:
 On 29/05/2006 10:47 PM, Serge Orlov wrote:
  Maybe urllib2 in
  python 2.4 reports to the server that it supports compressed data but
  doesn't decompress it when receives the reply?
 

 Something funny is happening here. Others reported it working with 2.4.3
 and Rocco's original code as posted in this thread -- which works for me
 on 2.4.2, Windows XP.

It works for me too, returning raw uncompressed data.

 There was one suss thing about Rocco's problem description:
 First message ended with  d=takefeed(url)
 But next message said print rss
 Is rss == d?

Nope. If you look at the html tags, the 2.3 code returns feed generator ...
whereas the 2.4 code returns rss channel generator ... That may
explain why the 2.3 result is not compressed and the 2.4 result is
compressed, but that doesn't explain why the 2.4 result *is* compressed.
I looked at python 2.4 httplib, and I'm sure it's not the problem; quote
from httplib:

# we only want a Content-Encoding of identity since we don't
# support encodings such as x-gzip or x-deflate.

I think there is a web accelerator sitting somewhere between Rocco and
the Google server that is confused because Rocco is misinforming the web
server, saying he's using Firefox but at the same time claiming that he
cannot handle compressed data. That's why they teach little kids: don't lie :)



Re: why not in python 2.4.3

2006-05-28 Thread Serge Orlov
Rocco wrote:

 >>> import sys
 >>> sys.getdefaultencoding()
 'latin_1'

Don't change the default encoding. It should always be ascii.



Re: q - including manpages in setup.py

2006-05-28 Thread Serge Orlov
aum wrote:
 Hi,

 What is the best way to incorporate manpages in a distutils setup.py
 script?

 Is there any distro-independent way to find the most appropriate place to
 put the manpages?
 For instance, /usr/man/? /usr/share/man? /usr/local/man?
 /usr/local/share/man?

What do you mean by distro? Linux? That should be /usr/local/man, but AFAIK
some distros are misconfigured and their man doesn't search /usr/local
by default, YMMV.

 Also - I've got .html conversions of the manpages, for the benefit of OSs
 such as Windows which don't natively support manpages. What's the best
 place to put these?

your_tool --html-manual, which uses os.startfile or the webbrowser module to
invoke an html viewer. Or your_tool --man, which dumps plain text on the
screen.



Re: iteration over non-sequence ,how can I resolve it?

2006-05-28 Thread Serge Orlov
python wrote:
 To BJörn Lindqvist :
  thank you . how to write the code specifically ?Could you give a
 example?

Use Queue module:

import threading
from Queue import Queue

class PrintThread(threading.Thread):
    def __init__(self, urlList, results_queue):
        threading.Thread.__init__(self)
        self.urllist = urlList
        self.results_queue = results_queue

    def run(self):
        # Hand the result back to the main thread through the queue
        urllink = [self.urllist] * 2
        self.results_queue.put(urllink)

results = Queue()
threadList = []
for i in range(0, 2):
    thread = PrintThread("Thread" + str(i), results)
    threadList.append(thread)
    thread.start()

# One get() per started thread; get() blocks until a result arrives
for i in threadList:
    linkReturned = results.get()
    for j in linkReturned:
        print j



Re: deploying big python applications

2006-05-25 Thread Serge Orlov
AndyL wrote:
 Hi,

 let me describe how I do that today. There is standard python taken from
   python.org installed in a c:\python23 with at least dozen different
 additional python packages (e.g. SOAPpy, Twisted, wx, many smaller ones
 etc) included. Also python23.dll moved from c:\windows to c:\python23.
 This is zipped and available as over 100MB file to anyone to manually
 unzip on his/her PC. This is a one time step.

 On top of that there is 30K lines of code with over 100 .py files
 application laid out within a directory tree. Very specific for the
 domain, typical application. This again is zipped and available to
 anyone as much smaller file to unzip and use. This step is per software
 releases.

 There is one obvious drawback - I can not separate python from standard
 libraries easily.

True, python releases on windows are forward incompatible with C
extensions, so don't even think about that. I'm not even talking about
big pure python packages that could probably break because of small
subtle changes in python API between releases.

 So when upgrade to 2.4 comes, I need to reinstall all
 the packages.

Yes, but how much time will it *actually* take? I bet it's 1 hour.
Seriously, why don't you *time* it with a stopwatch? And then compare
that time to the time needed to debug the new release.

 In order to address that as well as the Linux port I
 project following structure:
   -default python.org installation or one time step on Windows
   -set of platform dependent libraries in directory A
   -set of platform independent libraries in directory B
   -application in directory C

I would suggest the same structure I described for deploying over LAN:
http://groups.google.com/group/comp.lang.python/msg/2482a93eb7115cb6?hl=en;

The only problem is that exemaker cannot find python relative to
itself, so you will have to mash exemaker, python and the application
launcher into one directory. So the layout is like this:

app/
    engine/  -- directory with your actual application
    app.exe  -- renamed exemaker.exe
    app.py  -- dispatching module, see below
    python.exe
    python24.dll
    lib -- python stdlib, etc

=== app.py ===
from engine import real_application


This way the file engine/real_application.py is platform independent.

On Linux/Unix a shell script is the equivalent of exemaker. Or a C program
like exemaker, but you will have to compile it for all platforms.



Re: Python Version Testing Tool?

2006-05-24 Thread Serge Orlov
Michael Yanowitz wrote:
 Hello:

Is there a version testing tool available for Python
 such that I can check to see if my code will still run in
 versions 2.2, 2.3, 2.4.3, and 1.1 (for example) (or whatever)
 without having to install all these different versions on my
 computer?

Such a tool will never be reliable, unless it duplicates all the work
that went into all python versions.



Re: NEWB: reverse traversal of xml file

2006-05-23 Thread Serge Orlov
manstey wrote:
 But will this work if I don't know parts in advance.

Yes, it will work as long as the highest part number in the whole file
is not very high. The algorithm only needs to store N records in memory,
where N is the highest part number in the whole file.

 I only know parts
 by reading through the file, which has 450,000 lines.

Lines or records? I created a sequence of 10,000,000 numbers, which is
equivalent to ten million records, like this:

def many_numbers():
    for n in xrange(1000000):
        for part in xrange(10):
            yield part
parts = many_numbers()

and the code processed it in 13 seconds, consuming virtually no memory.
That is the advantage of iterators and generators: you can process long
sequences without allocating a lot of memory.



Re: NEWB: reverse traversal of xml file

2006-05-22 Thread Serge Orlov
manstey wrote:
 Hi,

 I have an xml file of about 140Mb like this:

 <book>
   <record>
     ...
     <wordpartWTS>1</wordpartWTS>
   </record>
   <record>
     ...
     <wordpartWTS>2</wordpartWTS>
   </record>
   <record>
     ...
     <wordpartWTS>1</wordpartWTS>
   </record>
 </book>

 I want to traverse it from bottom to top and add another field to each
 record, <totalWordPart>1</totalWordPart>,
 which would give the highest value of wordpartWTS for each record for
 each word

 so if wordparts for the first ten records were 1 2 1 1 1 2 3 4 1 2
 I want totalWordPart to be 2 2 1 1 4 4 4 4 2 2

 I figure the easiest way to do this is to go thru the file backwards.

 Any ideas how to do this with an xml data file?

You need to iterate from the beginning and use itertools.groupby:

from itertools import groupby

def enumerate_words(parts):
    # A part number that does not increase marks the start of a new word
    word_num = 0
    prev = 0
    for part in parts:
        if prev >= part:
            word_num += 1
        prev = part
        yield word_num, part


def get_word_num(item):
    return item[0]

parts = 1,2,1,1,1,2,3,4,1,2
for word_num, word in groupby(enumerate_words(parts), get_word_num):
    parts_list = list(word)
    max_part = parts_list[-1][1]
    for word_num, part_num in parts_list:
        print max_part, part_num

prints:

2 1
2 2
1 1
1 1
4 1
4 2
4 3
4 4
2 1
2 2



Re: No math module??

2006-05-22 Thread Serge Orlov
WIdgeteye wrote:
 I have been trying to run a python program and I get the following
 error:
 Traceback (most recent call last):
   File "<string>", line 39, in ?

That doesn't look like a python program; File "<string>" means it's an
embedded script. When a script is embedded, it is the responsibility of the
caller (the blender application) to set up the correct path to modules.

   File "/home/Larry/.blender/scripts/bzflag/__init__.py", line 22, in ?
     import BZfileRead
   File "/home/Larry/.blender/scripts/bzflag/BZfileRead.py", line 24, in ?
     import BZsceneWriter
   File "/home/Larry/.blender/scripts/bzflag/BZsceneWriter.py", line 25, in ?
     import BZcommon
   File "/home/Larry/.blender/scripts/bzflag/BZcommon.py", line 24, in ?
     import math
 ImportError: No module named math


[snip]

 So what's up??:)

Try to insert
==
import sys
print sys.path, sys.version, sys.executable
==
right before the failing import math. The next step is most likely to
RTFM how to properly setup python embedded into blender. If everything
looks as described in the manual, it's a bug in blender.



Re: import woe

2006-05-19 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 hello,

 i have a problem.  i would like to import python files above and below
 my current directory.

 i'm working on /home/foo/bar/jar.py

 i would like to import /home/foo/car.py and
/home/foo/bar/far.py

 how can i do this?

$ cat ~/.bashrc
export PATH=/home/foo/:$PATH
$ cat /home/foo/application
#!/usr/bin/env python
import bar.jar
$ chmod +x /home/foo/application
$ cd /home/foo/bar
$ application
 all imports work fine ...

 ps: i want to scale, so i do not want to edit the python path

In what sense do you want to scale: working with multiple projects or
multiple versions of one project at the same time? Anyway, you are too
quick to jump to conclusions: if you don't want to edit the python path, who
will do it for you? The python path won't appear out of thin air if your
file layout is not supported out of the box.
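
For completeness, this is roughly what "editing the python path" looks like
when it is done at run time instead of in the shell. It is a sketch for
Python 3, not a recommendation over the launcher script shown above;
import_from_dir is a hypothetical helper.

```python
import importlib
import os
import sys

def import_from_dir(directory, module_name):
    """Import module_name after making sure `directory` is on sys.path."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Make the import system notice files created after it last looked
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

With the layout from the question, import_from_dir("/home/foo", "car") would
make car.py importable from /home/foo/bar/jar.py, at the price of every such
file knowing about the directory layout.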



Re: WTF? Printing unicode strings

2006-05-19 Thread Serge Orlov
Ron Garret wrote:
 In article [EMAIL PROTECTED],
  Serge Orlov [EMAIL PROTECTED] wrote:

  Ron Garret wrote:
 I'm using an OS X terminal to ssh to a Linux machine.
   
In theory it should work out of the box. OS X terminal should set
environmental variable LANG=en_US.utf-8, then ssh should transfer this
variable to Linux and python will know that your terminal is utf-8.
Unfortunately AFAIK OS X terminal doesn't set that variable and most
(all?) ssh clients don't transfer it between machines. As a workaround
you can set that variable on linux yourself. This should work in the
command line right away:
   
LANG=en_US.utf-8 python -c "print unichr(0xbd)"
   
Or put the following line in ~/.bashrc and logout/login
   
export LANG=en_US.utf-8
  
   No joy.
  
   [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
   Traceback (most recent call last):
     File "<string>", line 1, in ?
   UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
   position 0: ordinal not in range(128)
   [EMAIL PROTECTED]:~$
 
  What version of python and what shell do you run? What the following
  commands print:
 
  python -V
  echo $SHELL
  $SHELL --version

 [EMAIL PROTECTED]:~$ python -V
 Python 2.3.4
 [EMAIL PROTECTED]:~$ echo $SHELL
 /bin/bash
 [EMAIL PROTECTED]:~$ $SHELL --version
 GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
 Copyright (C) 2002 Free Software Foundation, Inc.
 [EMAIL PROTECTED]:~$

That's recent enough. I guess the distribution you're using sets LC_*
variables for no good reason. Either unset all environment variables
starting with LC_ and set the LANG variable, or override the LC_CTYPE
variable:

LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"

Should be working now :)



Re: WTF? Printing unicode strings

2006-05-19 Thread Serge Orlov
Serge Orlov wrote:
 Ron Garret wrote:
  In article [EMAIL PROTECTED],
   Serge Orlov [EMAIL PROTECTED] wrote:
 
   Ron Garret wrote:
  I'm using an OS X terminal to ssh to a Linux machine.

 In theory it should work out of the box. OS X terminal should set
 environmental variable LANG=en_US.utf-8, then ssh should transfer this
 variable to Linux and python will know that your terminal is utf-8.
 Unfortunately AFAIK OS X terminal doesn't set that variable and most
 (all?) ssh clients don't transfer it between machines. As a workaround
 you can set that variable on linux yourself. This should work in the
 command line right away:

 LANG=en_US.utf-8 python -c "print unichr(0xbd)"

 Or put the following line in ~/.bashrc and logout/login

 export LANG=en_US.utf-8
   
No joy.
   
[EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
Traceback (most recent call last):
  File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
position 0: ordinal not in range(128)
[EMAIL PROTECTED]:~$
  
   What version of python and what shell do you run? What the following
   commands print:
  
   python -V
   echo $SHELL
   $SHELL --version
 
  [EMAIL PROTECTED]:~$ python -V
  Python 2.3.4
  [EMAIL PROTECTED]:~$ echo $SHELL
  /bin/bash
  [EMAIL PROTECTED]:~$ $SHELL --version
  GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
  Copyright (C) 2002 Free Software Foundation, Inc.
  [EMAIL PROTECTED]:~$

 That's recent enough. I guess the distribution you're using sets LC_*
 variables for no good reason. Either unset all environment variables
 starting with LC_ and set the LANG variable, or override the LC_CTYPE
 variable:

 LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"

 Should be working now :)

I've pulled myself together and installed linux in VMware Player.
Apparently there is another way linux distributors can screw up. I
chose the debian 3.1 minimal network install, and after answering all the
installation questions I found that only ascii and latin-1 english
locales were installed:
$ locale -a
C
en_US
en_US.iso88591
POSIX

In 2006, I would expect a utf-8 english locale to be present even in a
minimal install. I had to edit /etc/locale.gen and run locale-gen as
root. After that python started to print unicode characters.
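
The error from this thread is easy to reproduce without touching any locale
settings, because it comes down to which codec is used for the terminal. A
sketch in Python 3 (where unichr is spelled chr); encode_for_terminal is a
helper invented for the illustration and is not what print does internally.

```python
import locale

def encode_for_terminal(text, encoding):
    """Return text encoded with `encoding`, or None if it cannot be encoded."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

half = chr(0xbd)  # the character from the thread, VULGAR FRACTION ONE HALF
for name in ("ascii", "latin-1", "utf-8"):
    print(name, encode_for_terminal(half, name))

# What the current locale settings make Python pick by default
print(locale.getpreferredencoding())
```

With an ascii locale the character cannot be encoded at all, which is exactly
the UnicodeEncodeError shown earlier in the thread.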



Re: Encode exception for chinese text

2006-05-19 Thread Serge Orlov
Vinayakc wrote:
 Hi all,

 I am new to python.

 I have written one small application which reads data from an xml file and
 tries to encode the data using the appropriate charset.
 I am facing a problem while encoding one chinese paragraph with charset
 gb2312.

 code is:

 encoded_str = str_data.encode("gb2312")

 The type of str_data is <type 'unicode'>

 The exception is:

 UnicodeEncodeError: 'gb2312' codec can't encode character u'\xa0' in
 position 0: illegal multibyte sequence

Hmm, this is a 'no-break space' at the very beginning of the text. It
looks suspiciously like a plain text utf-8 signature, which is a 'zero
width no-break space'. If you strip the first character, do you still
have encoding errors?



Re: Encode exception for chinese text

2006-05-19 Thread Serge Orlov
Vinayakc wrote:
 Yes serge, I have removed the first character but it is still giving
 encoding exception.

Then I guess this character was used as a poor man's indentation tool, at
least in the beginning of your text. It's up to you to decide what to
do with that character; you have several choices:

* edit source xml file to get rid of it
* remove it while you process your data
* replace it with ordinary space
* consider utf-8

Note, there are legitimate use cases for a no-break space. For example,
one million can be written like 1 000 000, where the spaces are
non-breakable. This prevents the number from being broken at the right
margin like this: 1 000
000

Keep that in mind when you remove or replace no-break space.
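
The "replace it with an ordinary space" choice could be sketched like this in
Python 3. The sample string is made up, and encode_gb2312 is a hypothetical
helper; passing an empty replacement gives the "remove it" choice instead.

```python
NBSP = "\u00a0"  # no-break space, the character from the error message

def encode_gb2312(text, nbsp_replacement=" "):
    """Encode text as gb2312 after replacing no-break spaces."""
    return text.replace(NBSP, nbsp_replacement).encode("gb2312")

# Hypothetical sample: a no-break space followed by two Chinese characters
sample = NBSP + "\u4e2d\u6587"
try:
    sample.encode("gb2312")
except UnicodeEncodeError as exc:
    print("direct encoding fails:", exc.reason)

print(encode_gb2312(sample).decode("gb2312"))
```

Whether a plain space is an acceptable substitute depends on the text, as the
1 000 000 example above shows.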



Re: the tostring and XML methods in ElementTree

2006-05-19 Thread Serge Orlov
George Sakkis wrote:
   I'm currently using
   (a variation of) the workaround below instead of ET.tostring and it
   works fine for me:
  
   def tostring(element, encoding=None):
       text = element.text
       if text:
           text2 = text  # unicode text is passed through unchanged
           if not isinstance(text, basestring):
               text2 = str(text)
           elif isinstance(text, str) and encoding:
               text2 = text.decode(encoding)
           element.text = text2
       s = ET.tostring(element, encoding)
       element.text = text
       return s
  
  
   Why isn't this the standard behaviour ?
 
 
  Because it wouldn't work. What if you wanted to serialize a different
  encoding than that of the strings you put into the .text fields? How is ET
  supposed to know what encoding your strings have? And how should it know
  that you didn't happily mix various different byte encodings in your strings?

 If you're mixing different encodings, no tool can help you clean up the
 mess, you're on your own. This is very different though from having
 nice utf-8 strings everywhere, asking ET.tostring explicitly to print
 them in utf-8 and getting back garbage. Isn't the most reasonable
 assumption that the input's encoding is the same with the output, or
 does this fall under the refuse the temptation to guess motto ? If
 this is the case, ET could at least accept an optional input encoding
 parameter and convert everything to unicode internally.

This is an optimization. Basically you're delaying decoding. First of
all, have you measured the impact on your program if you delay decoding?
I'm sure for many programs it doesn't matter, so what you're proposing
will just pollute their source code with an optimization they don't need.
That doesn't mean it's a bad idea in general. I'd prefer it implemented
in the python core with minimal impact on such programs, with decoding
delayed until you try to access individual characters. The code below can be
implemented without actual decoding:

utf8_text_file.write("abc".decode("utf-8") + " def".decode("utf-8"))

But this example will require decoding to be done during the split method:

a = ("abc".decode("utf-8") + " def".decode("utf-8")).split()




  Use unicode, that works *and* is portable.

 *and* it's not supported by all the 3rd party packages, databases,
 middleware, etc. you have to or want to use.

You can always call the .encode method. Granted, that could be a waste of
CPU and memory, but it works.



Re: WTF? Printing unicode strings

2006-05-18 Thread Serge Orlov
Ron Garret wrote:
 In article [EMAIL PROTECTED],
  Robert Kern [EMAIL PROTECTED] wrote:

  Ron Garret wrote:
 
   I forgot to mention:
  
   >>> sys.getdefaultencoding()
  
   'utf-8'
 
  A) You shouldn't be able to do that.

 What can I say?  I can.

  B) Don't do that.

 OK.  What should I do instead?

The exact answer depends on which OS and terminal you are using and on
what your program is supposed to do: are you going to distribute the
program, or is it just for internal use?



Re: WTF? Printing unicode strings

2006-05-18 Thread Serge Orlov
Ron Garret wrote:
 In article [EMAIL PROTECTED],
  Serge Orlov [EMAIL PROTECTED] wrote:

  Ron Garret wrote:
   In article [EMAIL PROTECTED],
Robert Kern [EMAIL PROTECTED] wrote:
  
Ron Garret wrote:
   
 I forgot to mention:

     >>> sys.getdefaultencoding()

 'utf-8'
   
A) You shouldn't be able to do that.
  
   What can I say?  I can.
  
B) Don't do that.
  
   OK.  What should I do instead?
 
  Exact answer depends on what OS and terminal you are using and what
  your program is supposed to do, are you going to distribute the program
  or it's just for internal use.

 I'm using an OS X terminal to ssh to a Linux machine.

In theory it should work out of the box. The OS X terminal should set the
environment variable LANG=en_US.utf-8, then ssh should transfer this
variable to Linux, and Python will know that your terminal is utf-8.
Unfortunately, AFAIK the OS X terminal doesn't set that variable, and most
(all?) ssh clients don't transfer it between machines. As a workaround
you can set that variable on Linux yourself. This should work on the
command line right away:

LANG=en_US.utf-8 python -c "print unichr(0xbd)"

Or put the following line in ~/.bashrc and log out and back in:

export LANG=en_US.utf-8
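
To see what Python actually picked up from the environment, a small diagnostic along these lines may help. It is only a sketch, written for Python 3 rather than the Python 2 used in this thread, and the helper name describe_encodings is made up for illustration.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the settings that decide how text is encoded for the terminal."""
    return {
        # Locale variables as passed in by the shell (None when unset).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding Python derives from the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding attached to standard output; may be None when redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%s = %r" % (name, value))
```

If LANG shows up as None after logging in over ssh, the variable was not transferred and has to be set on the remote side as described above.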



Re: WTF? Printing unicode strings

2006-05-18 Thread Serge Orlov
Ron Garret wrote:
   I'm using an OS X terminal to ssh to a Linux machine.
 
  In theory it should work out of the box. OS X terminal should set
  environment variable LANG=en_US.utf-8, then ssh should transfer this
  variable to Linux and python will know that your terminal is utf-8.
  Unfortunately AFAIK OS X terminal doesn't set that variable and most
  (all?) ssh clients don't transfer it between machines. As a workaround
  you can set that variable on linux yourself. This should work in the
  command line right away:
 
  LANG=en_US.utf-8 python -c "print unichr(0xbd)"
 
  Or put the following line in ~/.bashrc and logout/login
 
  export LANG=en_US.utf-8

 No joy.

 [EMAIL PROTECTED]:~$ LANG=en_US.utf-8 python -c "print unichr(0xbd)"
 Traceback (most recent call last):
   File "<string>", line 1, in ?
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
 position 0: ordinal not in range(128)
 [EMAIL PROTECTED]:~$

Which version of Python and which shell are you running? What do the
following commands print?

python -V
echo $SHELL
$SHELL --version



Re: newb: comapring two strings

2006-05-18 Thread Serge Orlov
manstey wrote:
 Hi,

 Is there a clever way to see if two strings of the same length vary by
 only one character, and what the character is in both strings.

 E.g. str1="yaqtil" str2="yaqtel"

 they differ at str1[4] and the difference is ('i','e')

 But if there was str1="yiqtol" and str2="yaqtel", I am not interested.

 can anyone suggest a simple way to do this?

 My next problem is, I have a list of 300,000+ words and I want to find
 every pair of such strings. I thought I would first sort on length of
 string, but how do I iterate through the following:

 str1
 str2
 str3
 str4
 str5

 so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5,
 str2 & str3, str3 & str4, str3 & str5, str4 & str5.

If your strings are pretty short you can do it like this even without
sorting by length first:

def fuzzy_keys(s):
    # Yield s with each position in turn replaced by a NUL placeholder.
    for pos in range(len(s)):
        yield s[0:pos] + chr(0) + s[pos+1:]

def fuzzy_insert(d, s):
    # Register s under each of its fuzzy keys.
    for fuzzy_key in fuzzy_keys(s):
        if fuzzy_key in d:
            strings = d[fuzzy_key]
            if type(strings) is list:
                # append, not +=, which would add the characters of s one by one
                strings.append(s)
            else:
                d[fuzzy_key] = [strings, s]
        else:
            d[fuzzy_key] = s

def gather_fuzzy_matches(d):
    # Only keys shared by two or more strings hold a list.
    for strings in d.itervalues():
        if type(strings) is list:
            yield strings

acc = {}
fuzzy_insert(acc, "yaqtel")
fuzzy_insert(acc, "yaqtil")
fuzzy_insert(acc, "oaqtil")
print list(gather_fuzzy_matches(acc))

prints

[['yaqtil', 'oaqtil'], ['yaqtel', 'yaqtil']]
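
For the first half of the question (do two given strings differ in exactly one position, and by which characters), a direct comparison is enough. The following is a minimal sketch in Python 3 syntax; the function name single_difference is just an illustrative choice.

```python
def single_difference(a, b):
    """Return (index, (char_a, char_b)) if a and b have the same length and
    differ in exactly one position; otherwise return None."""
    if len(a) != len(b):
        return None
    diffs = [(i, (x, y)) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) == 1:
        return diffs[0]
    return None


print(single_difference("yaqtil", "yaqtel"))  # (4, ('i', 'e'))
print(single_difference("yiqtol", "yaqtel"))  # None
```

Comparing every pair this way is quadratic in the number of words, which is why the dictionary of fuzzy keys is the better fit for the 300,000-word list.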



Re: arrays, even, roundup, odd round down ?

2006-05-17 Thread Serge Orlov
Lance Hoffmeyer wrote:
 So, I have been using the following to grab numbers from MS Word.  I
 discovered that there is a special
 rule being used for rounding.

 If a ??.5 is even the number is to be rounded down (20.5 = 20)
 if a ??.5 is odd the number is to be rounded up (21.5 = 22)

 Brands = ["B1","B2"]
 A1 = []
 A1 = [ re.search(r"(?m)(?s)\r%s.*?SECOND.*?(?:(\d{1,3}\.\d)\s+){2}" % i,
 target_table).group(1)  for i in Brands ]
 A1 = [int(float(str(x))+0.5) for x in A1 ]
 print A1


 Any solutions for this line with the above conditions?

Seems like a job for Decimal:

from decimal import Decimal
numbers = "20.50 21.5".split()
ZERO_PLACES = Decimal(1)
print [int(Decimal(num).quantize(ZERO_PLACES)) for num in numbers]

produces

[20, 22]
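
This works because the default decimal context rounds ties to the nearest even digit (ROUND_HALF_EVEN). If you would rather not depend on the context, the rounding mode can be passed explicitly. A small sketch in Python 3 syntax follows; the helper name round_half_even is made up here.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(text):
    """Round a decimal string to an integer, sending ties to the even neighbour."""
    return int(Decimal(text).quantize(Decimal(1), rounding=ROUND_HALF_EVEN))


print([round_half_even(n) for n in "20.50 21.5 22.5 23.4".split()])
# [20, 22, 22, 23]
```

Starting from strings matters: going through float first can already shift a value that is not exactly representable in binary.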



Re: Strange IO Error when extracting zips to a network location

2006-05-17 Thread Serge Orlov
Hari Sekhon wrote:
 Hi,
I've written a script to run on windows to extract all zips under a
 given directory path to another directory path as such:

 python extractzips.py <fetch all zips under this dir> <put all extracted
 files under this dir>

 The purpose of this script is to retrieve backup files which are
 individually zipped under a backup directory tree on a backup server.

 This scripts works nicely and has input validation etc, exiting
 gracefully and telling you if you gave a non existent start or target
 path...

 When running the script as follows

 python extractzips.py \\backupserver\backupshare\machine\folder d:\unziphere

 the script works perfectly, but if I do

 python extractzips.py \\backupserver\backupshare\machine\folder
 \\anetworkmachine\share\folder

 then it unzips a lot of files, recreating the directory tree as it goes
 but eventually fails with the traceback:

   File "extractzips.py", line 41, in zipextract
 outfile.write(zip.read(x))
 IOError: [Errno 22] Invalid argument


 But I'm sure the code is correct and the argument is passed properly,
 otherwise a hundred files before it wouldn't have extracted successfully
 using this exact same piece of code (it loops over it). It always fails
 on this same file every time. When I extract the same tree to my local
 drive it works fine without error.

 I have no idea why pushing to a network share causes an IO Error,
 shouldn't it be the same as extracting locally from our perspective?

It looks like
http://support.microsoft.com/default.aspx?scid=kb;en-us;899149 is the
answer.



Re: Python script windows servcie

2006-05-17 Thread Serge Orlov
Mivabe wrote:
 Mivabe formulated the question :
 
  Google helped me discover that it has something to do with
  'CTRL_LOGOFF_EVENT'. I know what it means but I don't know how to solve it.
  Is that something I have to configure in the script?
 
  I'm totally new to Python so maybe someone can point me in the right
  direction? :D
 
  Regards, Mivabe

 No-one who can help me or did i visit the wrong group for this
 'problem'?

Indeed. Next time you'd better ask in a windows specific list:
http://mail.python.org/mailman/listinfo/python-win32

You need to ignore CTRL_LOGOFF_EVENT. Take a look for example at
http://mail.zope.org/pipermail/zope-checkins/2005-March/029068.html



Re: distributing a app frozen by cx_freeze

2006-05-13 Thread Serge Orlov
Flavio wrote:
 Well I managed to get rid of the undefined symbol message by copying
 all qt libs to the freeze directory, the problem is that now the
 package is huge (83MB)!

 So my question is: is there a way to find out exactly which lib is
 missing ?

I haven't done that myself, but I've had an idea for discovering
dependencies in dynamic languages: run your test suite and record
which files are loaded (byte code, dlls, data files), then remove from
the list all files that you know were used only for testing. That's
it: now you know all the files you need to run your application.

On Linux you can find out which .so files are loaded by looking at the
file /proc/self/maps at the end of your test suite run. To find out
which Python bytecode files were loaded, you can use the -v option of
python; it prints all loaded files to stderr. To separate
its output from other stderr output, you can redirect sys.stderr to some
other file.

After you've done all that work, I'm not sure you need cx_freeze.
You just need to write a little startup script that sets
LD_LIBRARY_PATH and PYTHONPATH and starts your main script.
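
Here is a rough sketch of the recording step, to be called at the end of a test run. It is written for Python 3 and, as an assumption on my part, uses sys.modules instead of parsing the output of python -v; both function names are invented for this example.

```python
import os
import sys


def loaded_module_files():
    """Return the sorted file paths of all modules imported so far."""
    paths = set()
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if path:  # built-in modules have no file
            paths.add(os.path.abspath(path))
    return sorted(paths)


def loaded_shared_objects(maps_path="/proc/self/maps"):
    """Return the sorted .so files mapped into this process (Linux only)."""
    if not os.path.exists(maps_path):
        return []
    found = set()
    with open(maps_path) as maps:
        for line in maps:
            fields = line.split(None, 5)
            # The sixth field, when present, is the path of the mapped file.
            if len(fields) == 6:
                path = fields[5].strip()
                if ".so" in os.path.basename(path):
                    found.add(path)
    return sorted(found)


if __name__ == "__main__":
    for path in loaded_module_files() + loaded_shared_objects():
        print(path)
```

The two lists together are the candidates to ship; files that were loaded only by the test harness still have to be removed by hand.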



Re: cx_freeze and matplotlib

2006-05-13 Thread Serge Orlov
Flavio wrote:
 I am trying to freeze an application which imports matplotlib. It all
 works fine on the machine where it was frozen. The executable runs
 without a glitch.

 But when I move the directory containing the frozen executable and
 other libs to a new machine, I get the following error:

 Traceback (most recent call last):
   File
 "/home/fccoelho/Downloads/cx_Freeze-3.0.2/initscripts/Console.py",
 line 26, in ?
   File "epigrass.py", line 5, in ?
   File "Epigrass/manager.py", line 7, in ?
   File "Epigrass/simobj.py", line 4, in ?
   File "/usr/lib/python2.4/site-packages/matplotlib/__init__.py", line
 457, in ?
     try: return float(s)
   File "/usr/lib/python2.4/site-packages/matplotlib/__init__.py", line
 245, in wrapper
     if level not in self.levels:
   File "/usr/lib/python2.4/site-packages/matplotlib/__init__.py", line
 319, in _get_data_path
     Return the string representing the configuration dir.  If s is the
 RuntimeError: Could not find the matplotlib data files

 Matplotlib can't find its data files.

I'm not familiar with cx_freeze, but have you told cx_freeze that you
don't want to bundle matplotlib, or has cx_freeze decided that
matplotlib is not going to be bundled? The fact that matplotlib is
loaded from site-packages is pretty strange; standalone applications are
not supposed to depend on non-system packages.



Re: Install libraries only without the program itself

2006-05-11 Thread Serge Orlov
Gregor Horvath wrote:
 Hi,

 My application is a client/server in a LAN. I want to keep my programs
 .py files on a central File Server serving all clients. The clients
 should load those over the LAN every time they start the program since I
 expect that they are rapidly changing and I dont want to update each
 client seperatly.

Don't forget you can screw up running clients if you overwrite the old
version with a new one.

 On the clients there should only be python and the necessary libraries
 and third party modules (sqlobject etc.) installed.

I believe it's better to keep *everything* on the file server. Suppose
your OS is Windows and you want to keep everything in s:/tools.
The steps are:

1. Copy Python with all 3rd party modules from c:/python24 to
   s:/tools/python24-win32
2. Grab exemaker from http://effbot.org/zone/exemaker.htm and copy
   exemaker.exe to s:/tools/win32/client.exe
3. Create a little dispatcher, s:/tools/win32/client.py:

   #!s:/tools/python24-win32/python.exe
   import sys
   sys.path[0] = "s:/tools/client-1.0.0"
   import client

4. Create your first version of s:/tools/client-1.0.0/client.py:

   print "I'm a client version 1.0.0"

That's it. Now s:/tools/win32/client.exe is ready to go. I guess it's
obvious how to release version 1.0.1. If you need to support other
architectures or operating systems, you just need to create tiny
dispatchers in directories s:/tools/linux, s:/tools/macosx ...



Re: Python memory deallocate

2006-05-11 Thread Serge Orlov
Heiko Wundram wrote:
 Am Donnerstag 11 Mai 2006 15:15 schrieb [EMAIL PROTECTED]:
  I MUST find a system which deallocate memory...
  Otherwise, my application crashes not hardly  it's arrived to
  break-point system

 As was said before: as long as you keep a reference to an object, the object's
 storage _will not be_ reused by Python for any other objects (which is
 sensible, or would you like your object to be overwritten by other objects
 before you're done with them?). Besides, even if Python did free the memory
 that was used, the operating system wouldn't pick it up (in the general case)
 anyway (because of fragmentation issues), so Python keeping the memory in an
 internal free-list for new objects is a sensible choice the Python developers
 took here.

BTW, Python 2.5 now returns free memory to the OS, but if a program
running under Python 2.4 keeps allocating more memory with each new
iteration, upgrading will not help.



Re: Memory leak in Python

2006-05-11 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 I ran simulation for 128 nodes and used the following

 oo = gc.get_objects()
 print len(oo)

 on every time step the number of objects are increasing. For 128 nodes
 I had 1058177 objects.

 I think I need to revisit the code and remove the references, but how
 do I do that? I am still a newbie coder and any help will be greatly
 appreciated.

The next step is to find out which type of object contributes most to
the growth. After that, print several objects of that type that
didn't exist on iteration N-1 but exist on iteration N.
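
A sketch of both steps, in Python 3 syntax. The helper names and the Node class are stand-ins I made up; in the real program you would take the snapshots before and after one time step of the simulation.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, gc-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the types whose instance count grew the most between snapshots."""
    delta = Counter({name: after[name] - before.get(name, 0) for name in after})
    return [(name, n) for name, n in delta.most_common(top) if n > 0]


def new_objects(type_name, known_ids):
    """Return objects of the given type whose id() is not in known_ids."""
    return [obj for obj in gc.get_objects()
            if type(obj).__name__ == type_name and id(obj) not in known_ids]


class Node(object):
    """Hypothetical simulation class, used only to demonstrate the helpers."""
    def __init__(self):
        self.peers = []


leak = []
before = type_counts()
known = set(id(obj) for obj in gc.get_objects())
leak.extend(Node() for _ in range(100))  # one "time step" that keeps references
after = type_counts()

print(growth(before, after))
for obj in new_objects("Node", known)[:3]:
    print(obj, len(gc.get_referrers(obj)))
```

Once you know which objects are new, gc.get_referrers shows what is still holding on to them, which usually points at the list or dictionary that should have been cleaned up.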



Re: Use subprocesses in simple way...

2006-05-11 Thread Serge Orlov
DurumDara wrote:
 10 May 2006 04:57:17 -0700, Serge Orlov [EMAIL PROTECTED]:
  I thought md5 algorithm is pretty light, so you'll be I/O-bound, then
  why bother with multi-processor algorithm?

 This is an assessor utility.
 The program's architecture must be flexible, because I don't know,
 where it need to run (only I have a possibility to fix this: I write
 to user's guide).

 But I want to speedup my alg. with native code, and multiprocess code.
 I not tested yed, but I think that 4 subprocess quickly as one large
 process.

I believe you need to look at the Queue module. Using Queue will help you
avoid the threading hell that you're afraid of (and rightly so!). Create
two queues: one for jobs, another one for results. The main thread
submits jobs and picks up results from the results queue. As soon as the
number of results == the number of jobs, it's time to quit. Submit N
special jobs that indicate it's time to exit, where N is the number of
worker threads. Then join the worker threads from the main thread and
exit the application.
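
A minimal sketch of that scheme, written for Python 3, where the module is called queue rather than Queue. The names worker, run_jobs and STOP are illustrative, and there is no error handling: a job that raises would leave the main thread waiting forever.

```python
import queue      # named Queue in Python 2
import threading

STOP = object()  # the "special job" that tells a worker to exit


def worker(jobs, results, func):
    """Take jobs until STOP arrives, putting each outcome on the results queue."""
    while True:
        job = jobs.get()
        if job is STOP:
            break
        results.put((job, func(job)))


def run_jobs(items, func, n_workers=4):
    """Process a list of items with n_workers threads; return {item: result}."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results, func))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        jobs.put(item)
    # Pick up exactly as many results as there were jobs.
    collected = dict(results.get() for _ in items)
    # One STOP per worker, then wait for all of them to finish.
    for _ in threads:
        jobs.put(STOP)
    for t in threads:
        t.join()
    return collected


print(run_jobs(["a", "bb", "ccc"], len))
```

For the hashing utility, func would be the function that reads a file and returns its digest, and items the list of file names.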



Re: How to encode html and xml tag datas with standard python modules ?

2006-05-11 Thread Serge Orlov
DurumDara wrote:
 Hi !

 I tried this function, but it does not encode the Hungarian-specific
 characters, like áéíóüóöőúüű, i.e. the chars above chr(127).
 Does Python have a function that can encode these chars too, like in Zope?


The word "encode" is ambiguous. What do you mean? The example Fredrik
gave you does encode:

>>> import cgi
>>> cgi.escape(u"áéíóüóöőúüű").encode("ascii", "xmlcharrefreplace")
'&#225;&#233;&#237;&#243;&#252;&#243;&#246;&#337;&#250;&#252;&#369;'



Re: FTP filename escaping

2006-05-11 Thread Serge Orlov
Almad wrote:
 OK, after some investigation...problem is in non-latin characters in
 filenames on ftp.

 Yes, users should be killed for this,

It's futile, users will always find a way to crash your program :) And
you can't kill them all, there are too many of them.

 but I would like to handle it
 somehow...

It depends on what you're actually doing. Did you write the ftp server?
Or do you have any information about the server (OS etc.)? Is your client
the only client that can upload? Do you care how file names actually
look internally on the server?

 I can't figure out how it's handled by protocol, ftplib seems to just
 strip those characters...

I believe filename == sequence of bytes terminated by a newline byte. I
doubt ftplib strips bytes over 127. Even if it does, copy it to your
private module collection as ftplibng.py, fix it, and import ftplibng as
ftplib.



Re: can distutils windows installer invoke another distutils windows installer

2006-05-11 Thread Serge Orlov
timw.google wrote:
 Hi all.

 I have a package that uses other packages. I created a setup.py to use
 'try:' and import to check if some required packages are installed. I
 have the tarballs and corresponding windows installers in my sdist
 distribution, so if I untar my source distribution and do 'python
 setup.py install', the script either untars the subpackages to a tmp
 directory and does an os.system('python setup.py install') (Linux), or
 os.system(bdist_wininst installer) (win32) for the missing
 subpackage.

I believe there are two ways to handle dependencies: either you bundle
your dependencies with your package (they just live in a directory
inside your package, you don't install them) or you leave resolution of
dependencies to the application that uses your package. Handling
dependencies the way you do it (a package installs other packages) doesn't
seem like a good idea to me.



Re: ascii to latin1

2006-05-10 Thread Serge Orlov
Luis P. Mendes wrote:
 Errors occur when I assign the result of ''.join(cp for cp in de_str if
 not unicodedata.category(cp).startswith('M')) to a variable.  The same
 happens with de_str.  When I print the strings everything is ok.

 Here's a short example of data:
 115448,DAÇÃO
 117788,DA 1º DE MO Nº 2

 I used the following script to convert the data:
 # -*- coding: iso8859-15 -*-

 class Latin1ToAscii:

     def abreFicheiro(self):
         import csv
         self.reader = csv.reader(open(self.input_file, "rb"))

     def converter(self):
         import unicodedata
         self.lista_csv = []
         for row in self.reader:
             s = unicode(row[1], "latin-1")
             de_str = unicodedata.normalize("NFD", s)
             nome = ''.join(cp for cp in de_str if not \
                 unicodedata.category(cp).startswith('M'))

             linha_ascii = row[0] + "," + nome  # *
             print linha_ascii.encode("ascii")
             self.lista_csv.append(linha_ascii)


     def __init__(self):
         self.input_file = 'nome_latin1.csv'
         self.output_file = 'nome_ascii.csv'

 if __name__ == "__main__":
     f = Latin1ToAscii()
     f.abreFicheiro()
     f.converter()


 And I got the following result:
 $ python latin1_to_ascii.py
 115448,DACAO
 Traceback (most recent call last):
   File "latin1_to_ascii.py", line 44, in ?
     f.converter()
   File "latin1_to_ascii.py", line 22, in converter
     print linha_ascii.encode("ascii")
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xba' in
 position 11: ordinal not in range(128)


 The script converted the ÇÃ from the first line, but not the º from the
 second one.  Still in *, I also don't get a list as [115448,DAÇÃO] but a
 [u'115448,DAÇÃO'] element, which doesn't suit my needs.

 Would you mind telling me what should I change?

Calling this process "latin1 to ascii" was a misnomer, sorry that I
used this phrase. It should be called "latin1 to search key"; there is
no requirement that the key must be ascii, so change the corresponding
lines in your code:

linha_key = row[0] + "," + nome
print linha_key
self.lista_csv.append(linha_key.encode("latin-1"))

With regard to º, Richie already gave you food for thought. If you
want "1 DE MO" to match "1º DE MO", remove that symbol from the key
(linha_key = linha_key.translate({ord(u"º"): None})); if you don't want
such fuzzy matching, keep it.
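
Note that the translate method of unicode strings expects a table keyed by code points (integers), not by characters. A small sketch that also runs under Python 3 follows; the table and function names are made up, and dropping the feminine indicator ª as well is my own assumption, not something from your data.

```python
# Map code points (not characters) to None to delete them.
ORDINAL_MARKS = {
    ord(u"\u00ba"): None,  # masculine ordinal indicator, as in "1º"
    ord(u"\u00aa"): None,  # feminine ordinal indicator (added as an assumption)
}


def drop_ordinal_marks(key):
    """Remove ordinal indicators so that '1º DE MO' and '1 DE MO' match."""
    return key.translate(ORDINAL_MARKS)


print(drop_ordinal_marks(u"DA 1\u00ba DE MO N\u00ba 2"))  # DA 1 DE MO N 2
```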



Re: Memory leak in Python

2006-05-10 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 I am using Ubuntu Linux.

 My program is a simulation program with four classes and it mimics bit
 torrent file sharing systems on 2000 nodes. Now, each node has a lot of
 attributes and my program kind of tries to keep tabs on everything. As
 I mentioned, it's a simulation program: it starts at time T=0 and goes on
 until all nodes have received all parts of the file (BitTorrent
 concept). The ending time goes to thousands of seconds. In each second I
 process all the 2000 nodes.

Most likely you keep references to objects you don't need, so Python's
garbage collector cannot remove those objects. If you cannot figure it
out by looking at the source code, you can gather some statistics to help
you: for example, use the gc module to iterate over all objects in your
program (gc.get_objects()) and find out which types of objects grow in
number with each iteration.



Re: Use subprocesses in simple way...

2006-05-10 Thread Serge Orlov

Dara Durum wrote:

[snip design of a multi-processor algorithm]

I thought the md5 algorithm is pretty light, so you'll be I/O-bound. Why
bother with a multi-processor algorithm then?

 2.)
 Do you know command line to just like FSUM that can compute file
 hashes (MD5/SHA1), and don't have any problems with unicode alt. file
 names ?

I believe you can wrap the broken program with a simple python wrapper.
Use win32api.GetShortPathName to convert non-ascii file names to DOS
filenames.



Re: data entry tool

2006-05-10 Thread Serge Orlov
Peter wrote:
 Diez B. Roggisch wrote:
  Make it a webapp. That will guarantee to make it runnable on the list of
  OSses you gave. Use Django/TurboGears/ZOPE for the application itself-
  whichever suits you best.

 A webapp isn't feasible as most of the users are on dial up (this is in New
 Zealand and broadband isn't available for lots of people).

I don't see the connection here. Why is it not feasible?

 I was hoping for a simple tool.  Even if it only worked on Windows, it would
 be a start.  It just needs to present a form of text entry fields to the
 user, and place the data in a plain text file.

You can do it using, for example, Tkinter
http://wiki.python.org/moin/TkInter which comes with the Python
distribution for Windows.


 If python can't do this, can anyone suggest another language or approach?

Sure, you can code that application in Python. The problem is the
distribution and support of an application running on multiple platforms.
That's what a webapp will help you to avoid. Keep in mind that a standalone
application for Windows will be about 2Mb. Also keep in mind that Linux is
not a platform; it is, hmm, how to say it? A snapshot of random programs
found on the internet, so it's very hard to distribute and support programs
for it.



Re: data entry tool

2006-05-10 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 If the data to be entered is simple and textual you can even think
 about using a text only interface. The resulting program will be really
 simple, and probably good enough.

FWIW, here is the size of a "Hello, world!" program distribution using
different interfaces:

text console: 1.2Mb
web.py w/o ssl: 1.5Mb
tkinter: 2.1Mb
wxpython: 3.0Mb

Getting slimmer distributions requires some manual work.



Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

2006-05-09 Thread Serge Orlov
Dennis Lee Bieber wrote:
 On 8 May 2006 15:44:04 -0700, Serge Orlov [EMAIL PROTECTED]
 declaimed the following in comp.lang.python:


  The test program in question doesn't require a dedicated machine and it
  doesn't consume a lot of CPU resources since it sleeps most of the
  time.
 
   Yet... Do we have any evidence that other activity on the machine
 may or may not affect the situation? There is a big difference between
 leaving a machine idle for a few hours to see if it hangs, vs doing
 normal activities with a process in the background (and what about
 screen savers? network activity?)

But what if other processes actually help to trigger the bug? IMHO
both situations (idle or busy) are equal if you don't know what's going
on.



Re: Embedding Python

2006-05-09 Thread Serge Orlov
gavinpaterson wrote:
 Dear Pythoners,

 I am writing as I am having trouble embedding a Python program into a
 Win XP C++ console application.

 I have written a script file for importing and I am trying to use the
 example for Pure Embedding found in the product documentation.

 The program fails to successfully execute past the line:

 if (pFunc && PyCallable_Check(pFunc))

 I have checked the pFunc pointer at runtime and it is not null so I
 assume that the PyCallable_Check fails.

Looking at the source in "Pure Embedding", I see that it is supposed to
print an error message if PyCallable_Check fails. Did you get that
message?



Re: ascii to latin1

2006-05-09 Thread Serge Orlov
Richie Hindle wrote:
 [Serge]
  def search_key(s):
      de_str = unicodedata.normalize("NFD", s)
      return ''.join(cp for cp in de_str if not
                     unicodedata.category(cp).startswith('M'))

 Lovely bit of code - thanks for posting it!

Well, it is not so good. Please read my next message to Luis.


 You might want to use NFKD to normalize things like LATIN SMALL
 LIGATURE FI and subscript/superscript characters as well as diacritics.

IMHO it is perfectly acceptable to declare that you don't interpret those
symbols. After all, they are called *compatibility* code points. I
tried a quarter symbol: Google and MSN don't interpret it. Yahoo
doesn't support it at all.

The NFKD form is also trickier to use. It loses the semantics of characters:
for example, if you have the character digit two followed by superscript
digit two, they look like 2 to the power 2, but NFKD will convert them into
22 (twenty-two), which is wrong. So if you want to use NFKD for search,
you will have to preprocess your data, for example by inserting a space
between the twos.
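
The difference is easy to see in the interpreter. The lines below are a small illustration in Python 3 syntax; the variable names are arbitrary.

```python
import unicodedata

text = u"2\u00b2"  # DIGIT TWO followed by SUPERSCRIPT TWO

nfd = unicodedata.normalize("NFD", text)
nfkd = unicodedata.normalize("NFKD", text)
ligature = unicodedata.normalize("NFKD", u"\ufb01")  # LATIN SMALL LIGATURE FI

print(nfd == text)        # True: NFD leaves the superscript alone
print(nfkd == u"22")      # True: NFKD turns "2 squared" into "22"
print(ligature == u"fi")  # True: the ligature becomes two plain letters
```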



Re: ascii to latin1

2006-05-09 Thread Serge Orlov
Luis P. Mendes wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Richie Hindle escreveu:
  [Serge]
  def search_key(s):
      de_str = unicodedata.normalize("NFD", s)
      return ''.join(cp for cp in de_str if not
                     unicodedata.category(cp).startswith('M'))
 
  Lovely bit of code - thanks for posting it!
 
  You might want to use NFKD to normalize things like LATIN SMALL
  LIGATURE FI and subscript/superscript characters as well as diacritics.
 

 Thank you very much for your info.  It's a very good aproach.

 When I used the NFD option, I came across many errors on these and
 possibly other codes: \xba, \xc9, \xcd.

What errors? The normalize method is not supposed to give any errors. Do
you mean it doesn't work as expected? Well, I have to admit that using
normalize is a far from perfect way to implement search. The most
advanced algorithm is published by the Unicode guys:
http://www.unicode.org/reports/tr10/ If you read it, you'll understand
it's not so easy.


 I tried to use NFKD instead, and the number of errors was only about
 half a dozen, for a universe of 60+ names, on code \xbf.
 It looks like I have to do a search and substitute using regular
 expressions for these cases.  Or is there a better way to do it?

Perhaps you can use the unicode translate method to map the characters
that still give you problems to whatever you want.



Re: hyperthreading locks up sleeping threads

2006-05-08 Thread Serge Orlov

[EMAIL PROTECTED] wrote:
 Tried importing win32api instead of time and using the
 win32api.GetTickCount() and win32api.Sleep() methods.

What about win32api.SleepEx? What about

WaitForMultipleObjects
WaitForMultipleObjectsEx
WaitForSingleObject
WaitForSingleObjectEx

when the object is not expected to produce events and the function
times out?



Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

2006-05-08 Thread Serge Orlov

Delaney, Timothy (Tim) wrote:
 [EMAIL PROTECTED] wrote:

  I am a bit surprised that nobody else has tried running the short
  Python program above on a hyper-threading or dual core / dual
  processor system.

 Does it happen every time? Have you tried it on multiple machines? Is it
 possible that that one machine is having problems? Does it take the same
 amount of time each run to replicate - and if so, how long is that (give
 or take a minute)?

 Until you can answer these questions with definite answers, people are
 not going to dedicate machines *that they use* for hours on end trying
 to replicate it. And from this thread, the time required appears to be
 minutes to hours. That suggests you have a race condition which
 results in a deadlock - and of course, that is more likely to occur on a
 dual-core or dual-cpu machine, as you really have multiple threads
 executing at once.

The test program in question doesn't require a dedicated machine and it
doesn't consume a lot of CPU resources since it sleeps most of the
time.

 I'm surprised that so many people have been willing to dedicate as much
 time as they have, but then again considering the people involved it's
 not quite so surprising.

This problem doesn't require more time than any other problem in
comp.lang.python that I try to help resolve. In fact, since the OP is
not a newbie, it took less time than some newbie questions.



Re: ascii to latin1

2006-05-08 Thread Serge Orlov
Luis P. Mendes wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi,

 I'm developing a django based intranet web server that has a search page.

 Data contained in the database is mixed.  Some of the words are
 accented, some are not but they should be.  This is because the
 collection of data  began a long time ago when ascii was the only way to go.

 The problem is users have to search more than once for some word,
 because the searched word can be or not be accented.  If we consider
 that some expressions can have several letters that can be accented, the
 search effort is too much.

 I've searched the net for some kind of solution but couldn't find.  I've
 just found for the opposite.

 example:
 if the word searched is 'televisão', I want that a search by either
 'televisao', 'televisão' or even 'télévisao' (this last one doesn't
 exist in Portuguese) is successful.

 So, instead of only one search, there will be several used.

 Is there anything already coded, or will I have to try to do it all by
 myself?

You need to convert from latin1 to ascii, not from ascii to latin1. The
function below does that. Then you need to build the database index not on
the latin1 text but on the ascii text. After that, convert the user input
to ascii and search.

import unicodedata

def search_key(s):
    # Decompose accented letters, then drop the combining marks.
    de_str = unicodedata.normalize("NFD", s)
    return ''.join(cp for cp in de_str if not
                   unicodedata.category(cp).startswith('M'))

print search_key(u"televisão")
print search_key(u"télévisao")

= Result:
televisao
televisao
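
For readers on a current interpreter, here is a minimal sketch of the same idea in Python 3 syntax, where every str is already unicode. The function name is kept from the post; everything else is an assumption of this sketch, not part of the original message. It decomposes each character (NFD) and drops the combining marks, so accented and unaccented spellings map to the same key.

```python
import unicodedata


def search_key(s):
    """Return s with accents removed, suitable as an index/search key."""
    # NFD splits e.g. "ã" into "a" followed by a combining tilde.
    decomposed = unicodedata.normalize("NFD", s)
    # Combining marks have a Unicode category starting with "M".
    return "".join(
        cp for cp in decomposed
        if not unicodedata.category(cp).startswith("M")
    )


key_from_db = search_key("televisão")
key_from_user = search_key("télévisao")
```

Both the stored text and the user input go through the same function, so the comparison happens on the accent-free keys only.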

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A critic of Guido's blog on Python's lambda

2006-05-07 Thread Serge Orlov
Ken Tilton wrote:
 It is vastly more disappointing that an alleged tech genius would sniff
 at the chance to take undeserved credit for PyCells, something probably
 better than a similar project on which Adobe (your superiors at
 software, right?) has bet the ranch. This is the Grail, dude, Brooks's
 long lost Silver Bullet. And you want to pass?

 C'mon, Alex, I just want you as co-mentor for your star quality. Of
 course you won't have to do a thing, just identify for me a True Python
 Geek and she and I will take it from there.

 Here's the link in case you lost it:

  http://www.lispnyc.org/wiki.clp?page=PyCells

 :)

 peace, kenny

 ps. flaming aside, PyCells really would be amazingly good for Python.
 And so Google. (Now your job is on the line. g) k

Perhaps I'm missing something, but what's the big deal about PyCells?
Here is a 22-line barebones implementation of a spreadsheet in Python.
Later I create 2 cells "a" and "b" ("b" depends on "a") and evaluate
all the cells. The output is

a = negate(sin(pi/2)+one) = -2.0
b = negate(a)*10 = 20.0

=== spreadsheet.py ==
class Spreadsheet(dict):
    def __init__(self, **kwd):
        self.namespace = kwd
    def __getitem__(self, cell_name):
        item = self.namespace[cell_name]
        if hasattr(item, "formula"):
            return item()
        return item
    def evaluate(self, formula):
        return eval(formula, self)
    def cell(self, cell_name, formula):
        "Create a cell defined by formula"
        def evaluate_cell():
            return self.evaluate(formula)
        evaluate_cell.formula = formula
        self.namespace[cell_name] = evaluate_cell
    def cells(self):
        """Yield all cells of the spreadsheet along with current values
        and formulas"""
        for cell_name, value in self.namespace.items():
            if not hasattr(value, "formula"):
                continue
            yield cell_name, self[cell_name], value.formula

import math
def negate(x):
    return -x
sheet1 = Spreadsheet(one=1, sin=math.sin, pi=math.pi, negate=negate)
sheet1.cell("a", "negate(sin(pi/2)+one)")
sheet1.cell("b", "negate(a)*10")
for name, value, formula in sheet1.cells():
    print name, "=", formula, "=", value

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A critic of Guido's blog on Python's lambda

2006-05-07 Thread Serge Orlov
Bill Atkins wrote:
 Serge Orlov [EMAIL PROTECTED] writes:

  Ken Tilton wrote:
  It is vastly more disappointing that an alleged tech genius would sniff
  at the chance to take undeserved credit for PyCells, something probably
  better than a similar project on which Adobe (your superiors at
  software, right?) has bet the ranch. This is the Grail, dude, Brooks's
  long lost Silver Bullet. And you want to pass?
 
  C'mon, Alex, I just want you as co-mentor for your star quality. Of
  course you won't have to do a thing, just identify for me a True Python
  Geek and she and I will take it from there.
 
  Here's the link in case you lost it:
 
   http://www.lispnyc.org/wiki.clp?page=PyCells
 
  :)
 
  peace, kenny
 
  ps. flaming aside, PyCells really would be amazingly good for Python.
  And so Google. (Now your job is on the line. g) k
 
  Perhaps I'm missing something but what's the big deal about PyCells?
  Here is 22-lines barebones implementation of spreadsheet in Python,
  later I create 2 cells a and b, b depends on a and evaluate all
  the cells. The output is
 
  a = negate(sin(pi/2)+one) = -2.0
  b = negate(a)*10 = 20.0
 
  === spreadsheet.py ==
  class Spreadsheet(dict):
      def __init__(self, **kwd):
          self.namespace = kwd
      def __getitem__(self, cell_name):
          item = self.namespace[cell_name]
          if hasattr(item, "formula"):
              return item()
          return item
      def evaluate(self, formula):
          return eval(formula, self)
      def cell(self, cell_name, formula):
          "Create a cell defined by formula"
          def evaluate_cell():
              return self.evaluate(formula)
          evaluate_cell.formula = formula
          self.namespace[cell_name] = evaluate_cell
      def cells(self):
          """Yield all cells of the spreadsheet along with current values
          and formulas"""
          for cell_name, value in self.namespace.items():
              if not hasattr(value, "formula"):
                  continue
              yield cell_name, self[cell_name], value.formula
 
  import math
  def negate(x):
      return -x
  sheet1 = Spreadsheet(one=1, sin=math.sin, pi=math.pi, negate=negate)
  sheet1.cell("a", "negate(sin(pi/2)+one)")
  sheet1.cell("b", "negate(a)*10")
  for name, value, formula in sheet1.cells():
      print name, "=", formula, "=", value
 

 I hope Ken doesn't mind me answering for him, but Cells is not a
 spreadsheet (where did you get that idea?).

It's written on the page linked above, second sentence: "Think of the
slots as cells in a spreadsheet, and you've got the right idea." I'm
not claiming that my code is a full PyCells implementation.


 It does apply the basic
 idea of a spreadsheet to software - that is, instead of updating value
 when some event occurs, you specify in advance how that value can be
 computed and then you stop worrying about keeping it updated.

The result is the same. Of course, I don't track dependencies in such a
tiny barebones example. But when you retrieve a cell you will get the
same value as you would with dependency tracking. Adding dependencies
is left as an exercise.
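
To make the "same value" claim concrete, here is a minimal sketch in Python 3 syntax, trimmed from the spreadsheet class posted earlier in the thread. The variable names first and second are hypothetical, introduced only for this illustration. Every retrieval re-evaluates the formula, so a change to an input shows up in dependent cells without any dependency bookkeeping; the cost is recomputation on each read.

```python
import math


class Spreadsheet(dict):
    def __init__(self, **kwd):
        super().__init__()
        self.namespace = kwd

    def __getitem__(self, cell_name):
        item = self.namespace[cell_name]
        if hasattr(item, "formula"):
            return item()          # formulas are recomputed on every read
        return item

    def evaluate(self, formula):
        # Names inside the formula are looked up through __getitem__.
        return eval(formula, self)

    def cell(self, cell_name, formula):
        def evaluate_cell():
            return self.evaluate(formula)
        evaluate_cell.formula = formula
        self.namespace[cell_name] = evaluate_cell


sheet = Spreadsheet(one=1, sin=math.sin, pi=math.pi, negate=lambda x: -x)
sheet.cell("a", "negate(sin(pi/2)+one)")
sheet.cell("b", "negate(a)*10")

first = sheet["b"]             # computed from one == 1
sheet.namespace["one"] = 2     # change an input value
second = sheet["b"]            # "b" follows "a", which follows "one"
```

A version with real dependency tracking would cache values and invalidate them on change; the values returned would be the same, only the amount of work per read would differ.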


 Incidentally, is this supposed to be an example of Python's supposed
 aesthetic pleasantness?

Nope. This is an example that you don't need macros and
multi-statements. Ken writes: "While the absence of macros and
multi-statement lambda in Python will make coding more cumbersome". I'd
like to see Python code doing the same if the language had macros and
multi-statement lambda. Would it be simpler? More expressive?

 I find it a little hideous, even giving you
 the benefit of the doubt and pretending there are newlines between
 each function.  There's nothing like a word wrapped in pairs of
 underscores to totally ruin an aesthetic experience.

I don't think anyone who is not a master of a language can judge
readability. You're just distracted by insignificant details, they
don't matter if you code in that language for many years. I'm not going
to tell you how Lisp Cell code looks to me ;)

 P.S. Is this really a spreadsheet?  It looks like it's a flat
 hashtable...

Does it matter if it's flat or 2D?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: the tostring and XML methods in ElementTree

2006-05-07 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 Question 1: assuming the following:
  a) beforeCtag.text gets assigned a value of 'I\x92m confused'
  b) afterRoot is built using the XML() method where the input to the
 XML() method is the results of a tostring() method from beforeRoot
 Are there any settings/arguments that could have been modified that
 would have resulted in afterCtag.text being of type type 'str' and
 afterCtag.text when printed displays:
  I'm confused

 ?

str type (also known as byte string) is only suitable for ascii text.
chr(0x92) is outside of ascii so you should use unicode strings or
you\x92ll be confused :)

>>> print u"I\u2019m not confused"
I’m not confused


 Question 2: Does the fact that resultToStr is equal to resultToStr2
 mean that an encoding of utf-8 is the defacto default when no encoding
 is passed as an argument to the tostring method, or does it only mean
 that in this particular example, they happened to be the same?


No. De jure the default encoding is ascii; de facto people try to change
it, but that's not a good idea. I'm not sure how you got the strings to
be the same, but it's definitely a host-specific result. When I repeat
your interactive session I get a different resultToStr at this point:

>>> afterRoot = ElementTree.XML(resultToStr)
>>> resultToStr
'<beforeRoot><C>I&#146;m confused</C></beforeRoot>'


 3) would it be possible to construct a statement of the form

 newResult = afterCtag.text.encode(?? some argument ??)

 where newResult was the same as beforeCtag.text?  If so, what should
 the argument be to the encode method?

Dealing with unicode doesn't require you to pollute your code with
encode methods, just open the file using codecs module and then write
unicode strings directly:

import codecs
fileHandle = codecs.open('c:/output1.text', 'w', "utf-8")
fileHandle.write(u"I\u2019m not confused, because I'm using unicode")
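
The same approach in Python 3 syntax, as a minimal sketch: the built-in open() takes an encoding argument, so unicode text is written directly and the encoding to UTF-8 happens inside the file object. The temporary path and the variable names are assumptions made for this illustration.

```python
import os
import tempfile

text = "I\u2019m not confused, because I'm using unicode"

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "output1.txt")

# The file object encodes to UTF-8 on write; no .encode() calls in user code.
with open(path, "w", encoding="utf-8") as file_handle:
    file_handle.write(text)

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as raw:
    data = raw.read()
```

U+2019 ends up on disk as the three bytes E2 80 99, and decoding the file as UTF-8 gives back the original text.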

 4) what is the second character in encodedCtagtext (the character with
 an ordinal value of 194)?

That is a byte with value 194, not a character. It is the first of the
two bytes that encode unicode code point U+0092 in utf-8:

>>> '\xc2\x92'.decode("utf-8")
u'\x92'

This code point actually has no name, so you shouldn't produce it:

>>> import unicodedata
>>> unicodedata.name('\xc2\x92'.decode("utf-8"))

Traceback (most recent call last):
  File "<pyshell#40>", line 1, in -toplevel-
    unicodedata.name('\xc2\x92'.decode("utf-8"))
ValueError: no such name
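
A minimal sketch of the same check in Python 3 syntax, with one addition that is an assumption about where the \x92 came from: in the Windows-1252 code page, byte 0x92 is the right single quotation mark, which would explain how an apostrophe turned into that byte. Variable names are hypothetical.

```python
import unicodedata

# The two utf-8 bytes from the question decode to a single code point.
code_point = b"\xc2\x92".decode("utf-8")

# U+0092 is a C1 control character, so it has a category but no name.
category = unicodedata.category(code_point)
try:
    unicodedata.name(code_point)
    has_name = True
except ValueError:
    has_name = False

# Interpreting the single byte 0x92 as Windows-1252 gives a real character.
windows_char = b"\x92".decode("cp1252")
quote_name = unicodedata.name(windows_char)
```

So the text the user most likely wanted is u"I\u2019m confused", obtained by decoding the byte string with the encoding it was really written in.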

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: the tostring and XML methods in ElementTree

2006-05-07 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 O/S: Windows XP Home
 Vsn of Python: 2.4

[snip fighting with unicode character U+2019 (RIGHT SINGLE QUOTATION
MARK) ]

I don't know what console you use but if it is IDLE you'll get confused
even more because it is buggy and improperly handles that character:

>>> print repr(u'I’m confused')
u'I\x92m confused'

I'm using Lightning Compiler
http://cheeseshop.python.org/pypi/Lightning%20Compiler to run
snippets of code and in the editor tab it handles that character fine:

>>> print repr(u'I’m confused')
u'I\u2019m confused'

But in the console tab it produces the same buggy result :) It looks
like handling unicode is like rocket science :)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Embedding Python: How to run compiled(*.pyc/*.pyo) files using Python C API?

2006-05-05 Thread Serge Orlov
Shankar wrote:
 Hello,

 I am trying to run compiled Python files (*.pyc and *.pyo) using Python C
 API.

 I am using the method PyRun_FileFlags() for this purpose.

 The code snippet is as follows:-

 PyCompilerFlags myFlags;
 myFlags.cf_flags=1; // I tried all values 0, 1 and 2
 PyRun_FileFlags(script, file, Py_file_input, globals, locals, myFlags);

 But unfortunately I get the following exception:-
 DeprecationWarning: Non-ASCII character '\xf2' in file E:\test.pyc on line
 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html
 for details

Note, it's not an exception, it's a warning.


 When I run the .py file, then things work fine.
 The .py file contains only one statement,
 print "Hello World"

 Which Python C API should I use to run compiled Python files(*.pyc and
 *.pyo) in the scenario where the source file (*.py) is not present.

I believe it's PyImport_ImportModule("test")
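
The point of going through import rather than a run-file call is that the import machinery knows how to load a compiled file when the source is missing. Here is a minimal sketch of that behaviour from the Python side, in Python 3 syntax; importlib.import_module is used as the Python-level counterpart of the C call, and the module name hello_compiled_demo and the temporary directory are hypothetical.

```python
import importlib
import os
import py_compile
import sys
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "hello_compiled_demo.py")
compiled_path = os.path.join(workdir, "hello_compiled_demo.pyc")

with open(source_path, "w") as source_file:
    source_file.write("greeting = 'Hello World'\n")

# Write the bytecode next to where the source would be, then drop the source.
py_compile.compile(source_path, cfile=compiled_path)
os.remove(source_path)

# Importing by name now finds and runs the .pyc file alone.
sys.path.insert(0, workdir)
importlib.invalidate_caches()
module = importlib.import_module("hello_compiled_demo")
```

An embedding application would add the directory to sys.path in the same way before calling the C-level import.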

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Elegent solution to replacing ' and ?

2006-05-05 Thread Serge Orlov
fyleow wrote:
 I'm trying to replace the ' and " characters in the strings I get from
 feedparser so I can enter it in the database without getting errors.
 Here's what I have right now.

 self.title = entry.title.encode('utf-8')
 self.title = self.title.replace('\"', '\\\"')
 self.title = self.title.replace('\'', '\\\'')

 This works just great but is there a more elegent way to do this?  It
 looks like maybe I could use the translate method but I'm not sure.

You should let the execute method put the values into the sql statement
instead of building the statement yourself. This is wrong:

self.title = entry.title.encode('utf-8')
self.title = self.title.replace('\"', '\\\"')
self.title = self.title.replace('\'', '\\\'')
cursor.execute('select foo from bar where baz=%s ' % self.title)

This is right:

self.title = entry.title
cursor.execute("select foo from bar where baz=%s", (self.title,))

The formatting style differs between db modules, take a look at
paramstyle description in PEP 249:
http://www.python.org/dev/peps/pep-0249/
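
A minimal runnable sketch of the parameterized form, using the sqlite3 module from the standard library as a stand-in for whatever database the original poster had (that choice is an assumption). sqlite3 uses the "qmark" paramstyle, so the placeholder is ? rather than %s; the table and column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table bar (foo integer, baz text)")

# A title containing both kinds of quotes, passed through unescaped.
title = 'He said "it\'s done"'

# The driver handles quoting; values travel separately from the SQL text.
conn.execute("insert into bar (foo, baz) values (?, ?)", (1, title))
rows = conn.execute(
    "select foo from bar where baz = ?", (title,)
).fetchall()
```

No replace() calls are needed, and the stored value is exactly the original string.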

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why does built-in set not take keyword arguments?

2006-05-04 Thread Serge Orlov

Jack Diederich wrote:
 On Thu, May 04, 2006 at 02:08:30PM -0400, Steven Watanabe wrote:
  I'm trying to do something like this in Python 2.4.3:
 
  class NamedSet(set):
def __init__(self, items=(), name=''):
  set.__init__(self, items)
  self.name = name
 
  class NamedList(list):
def __init__(self, items=(), name=''):
  list.__init__(self, items)
  self.name = name
 
  I can do:
 
   mylist = NamedList(name='foo')
 
  but I can't do:
 
   myset = NamedSet(name='bar')
  TypeError: set() does not take keyword arguments
 
  How come? How would I achieve what I'm trying to do?

 setobject.c checks for keyword arguments in it's __new__ instead
 of its __init__.  I can't think of a good reason other to enforce
 inheriters to be maximally set-like.  We're all adults here so
 I'd call it a bug.  bufferobect, rangeobject, and sliceobject all
 do this too, but classmethod and staticmethod both check in tp_init.
 Go figure.

 As a work around use a function to make the set-alike.

 class NamedSet(set): pass

 def make_namedset(vals, name):
   ob = NamedSet(vals)
   ob.name = name
   return ob

 Then make_namedset as a constructor in place of NamedSet(vals, name)

Or use this work around:

class NamedSet(set):
    def __new__(cls, iterable=(), name=""):
        return super(NamedSet, cls).__new__(cls)

    def __init__(self, iterable=(), name=""):
        super(NamedSet, self).__init__(iterable)
        self.name = name
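
A short usage sketch in Python 3 syntax, assuming nothing beyond the class above: __new__ swallows the extra keyword so that set's own constructor never sees it, and __init__ fills the set and stores the name. The instance names are hypothetical.

```python
class NamedSet(set):
    def __new__(cls, iterable=(), name=""):
        # Do not forward name to set.__new__.
        return super().__new__(cls)

    def __init__(self, iterable=(), name=""):
        super().__init__(iterable)
        self.name = name


myset = NamedSet([1, 2, 2, 3], name="bar")
empty = NamedSet(name="foo")
```

Both the positional and the keyword-only call work, and the objects still behave like ordinary sets.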

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

2006-05-03 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 Below are 2 files that isolate the problem.  Note, both programs hang
 (stop responding) with hyper-threading turned on (a BIOS setting), but
 work as expected with hyper-threading turned off.

What do you mean by "stop responding"? Not responding when you press
ctrl-c? They stop printing? If you mean they stop printing, try calling
sys.stdout.flush() after each print.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

2006-05-03 Thread Serge Orlov

[EMAIL PROTECTED] wrote:
  What do you mean stop responding?

 Both threads print their thread numbers (either 1 or 2) approximately
 every 10 seconds.  However, after a while (minutes to hours) both
 programs (see above) hang!

 Pressing ctrl-c (after the printing stops) causes the threads to wake
 up from their sleep statement.  And since the sleep took more than 1
 seconds the thread number and the duration of the sleep is printed to
 the screen.

 Do you have a hyper-threading/dual/multi core CPU?  Did you try this?

I don't have such a CPU but I ran the first program anyway. It printed

C:\pyth.py
thread 1 started
sleep time: 0.01
3.63174649292e-006
8.43682646817e-005
0.000164825417756

thread 2 started
sleep time: 0.003
0.000675225482568
0.000753447714724
0.00082943502596

1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 2 1

I got bored and tried to stop it with ctrl-c but it didn't respond and
kept running and printing the numbers. I had to kill it from task
manager.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange result with math.atan2()

2006-05-02 Thread Serge Orlov
Vedran Furac wrote:
 Ben Caradoc-Davies wrote:
  Vedran Furac wrote:
  I think that this results must be the same:
  In [3]: math.atan2(-0.0,-1)
  Out[3]: -3.1415926535897931
  In [4]: math.atan2(-0,-1)
  Out[4]: 3.1415926535897931
 
  -0 is converted to 0, then to 0.0 for calculation, losing the sign. You
  might as well write 0.0 instead of -0
 
  The behaviour of atan2 conforms to the ISO C99 standard (Python is
  implemented in C). Changing the sign of the first argument changes the
  sign of the output, with no special treatment for zero.
 
  http://www.ugcs.caltech.edu/manuals/libs/mpfr-2.2.0/mpfr_22.html

 Well, here I can read:

 Special values are currently handled as described in the ISO C99 standard
 for the atan2 function (note this may change in future versions):

 * atan2(+0, -0) returns +Pi.
 * atan2(-0, -0) returns -Pi. /* wrong too */
 * atan2(+0, +0) returns +0.
 * atan2(-0, +0) returns -0. /* wrong too */
 * atan2(+0, x) returns +Pi for x < 0.
 * atan2(-0, x) returns -Pi for x < 0
   

 And the formula (also from that site):
   if x < 0, atan2(y, x) = sign(y)*(PI - atan (abs(y/x)))
   ^^^

 So, you can convert -0 to 0, but you must multiply the result with sign of
 y, which is '-' (minus).

But you miss the fact that 0 is an *integer*, not a float, and -0
doesn't exist.
Use this code until you stop passing integers to atan2:

from math import atan2 as math_atan2
def atan2(y, x):
    if (isinstance(y, int) and y == 0) or (
        isinstance(x, int) and x == 0):
        raise ValueError("Argument that is an integer zero can \
produce wrong results")
    return math_atan2(y, x)

print atan2(-0.0, -0.0)
print atan2(-0, -0)
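
A minimal sketch, in Python 3 syntax, of why the two calls from the question differ: the float literal -0.0 carries a sign bit, while the integer expression -0 is simply 0 and becomes +0.0 when converted. The variable names are hypothetical; the results assume the usual IEEE 754 floats.

```python
import math

# Float zero keeps its sign, so atan2 lands on the negative side.
from_float = math.atan2(-0.0, -1)

# Integer -0 is plain 0; converted to float it is +0.0.
from_int = math.atan2(-0, -1)

# copysign exposes the sign bit of each zero directly.
sign_of_float_zero = math.copysign(1.0, -0.0)
sign_of_int_zero = math.copysign(1.0, -0)
```

So the results are -pi and +pi respectively, exactly as in the session quoted above, and both follow the C99 rules once the argument types are taken into account.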

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: simultaneous assignment

2006-05-02 Thread Serge Orlov
John Salerno wrote:
 bruno at modulix wrote:

  Now if I may ask: what is your actual problem ?

 Ok, since you're so curious. :)

 Here's a scan of the page from the puzzle book:
 http://johnjsalerno.com/spies.png

 Basically I'm reading this book to give me little things to try out in
 Python. There's no guarantee that this puzzle is even conducive to (or
 worthy of) a programming solution.

So what you're trying to do is to run over all possible combinations?
Anyway you don't need to worry about identity, since boolean values are
immutable. In general, when you see a statement like

some_var = immutable value

you can be *sure* you're changing *only* some_var

Warning! Half-spoiler below :) Following is a function run_over_space
from my personal utils package for generating all combinations and an
example how it can be applied to your puzzle:

def decrement(point, space):
    """ Yield next point of iteration space """
    for coord in range(len(point)):
        if point[coord] > 0:
            point[coord] -= 1
            return
        else:
            point[coord] = space[coord]
            continue
    raise StopIteration

def run_over_space(space):
    """ Yield all points of iteration space.
    Space is a list of maximum values of each dimension"""
    point = space[:]
    while True:
        yield point
        decrement(point,space)

def describe_point(spy,w,x,y,z):
    if spy:
        print "Spy1 is right, ",
    else:
        print "Spy1 is wrong, ",
    print "w, x, y, z = ", w, x, y, z

for point in run_over_space([1,1,1,1,1]):
    describe_point(*point)
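
A note for current interpreters: since PEP 479 (Python 3.7), a StopIteration that escapes from a helper called inside a generator is turned into a RuntimeError, so the code above only works as posted on older versions. Below is a minimal sketch of the same countdown in Python 3 syntax that ends the generator with a plain return instead. Yielding tuples rather than the shared list is a change made for this sketch.

```python
import itertools


def run_over_space(space):
    """Yield all points of the iteration space, counting each axis down.

    space is a list of the maximum value of each dimension.
    """
    point = list(space)
    while True:
        yield tuple(point)
        for coord in range(len(point)):
            if point[coord] > 0:
                point[coord] -= 1
                break
            # This axis is exhausted: reset it and carry to the next one.
            point[coord] = space[coord]
        else:
            # Every axis was at zero, so all points have been produced.
            return


points = list(run_over_space([1, 1, 1, 1, 1]))
small = list(run_over_space([1, 1]))
```

For five boolean unknowns this produces 32 points. itertools.product(range(2), repeat=5) covers the same combinations in a different order, which is the usual choice when the order does not matter.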

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to prevent this from happening?

2006-05-01 Thread Serge Orlov
[EMAIL PROTECTED] wrote:
 Regarding this expression:  1 << x

 I had a bug in my code that made x become Very Large - much larger than
 I had intended. This caused Python, and my PC, to lock up tight as a
 drum, and it appeared that the Python task (Windows XP) was happily and
 rapidly consuming all available virtual memory.

 Presumably, Python was trying to create a really really long integer,
 just as I had asked it.

 Is there a way to put a limit on Python, much like there is a stack
 limit, so that this sort of thing can't get out of hand?

This is a general problem regardless of programming language and it's
better solved by the OS. Windows has an API for limiting resource usage
but it lacks user tools. At least I'm not aware of them, maybe *you* can
find them. There is Windows System Resource Manager
http://www.microsoft.com/technet/downloads/winsrvr/wsrm.mspx It won't
run on Windows XP, but you may take a look at its distribution CD
image. If you're lucky maybe there is a command line tool for Windows
XP.

Alternatively you can switch to a better OS ;) Any Unix-like (Mac OS X,
Linux, *BSD, etc...), they all have resource usage limiting tools out
of the box.
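
On Unix-like systems the same OS limits are reachable from Python through the resource module. The following is a minimal sketch in Python 3 syntax, assuming 64-bit Linux, where RLIMIT_AS is enforced (other systems may ignore it). The helper names and the 1 GiB figure are hypothetical. The runaway shift is executed in a child process whose address space is capped, so it fails with MemoryError instead of eating all virtual memory.

```python
import resource
import subprocess
import sys

ONE_GIB = 1024 ** 3


def limit_address_space(max_bytes):
    """Lower the soft address-space limit of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


def run_limited(code, max_bytes=ONE_GIB):
    """Run a Python snippet in a child process with capped memory."""
    return subprocess.run(
        [sys.executable, "-c", code],
        # Runs in the child just before exec, so the parent is unaffected.
        preexec_fn=lambda: limit_address_space(max_bytes),
        capture_output=True,
        text=True,
    )


# The result of this shift would need roughly 2 GB, which is over the cap.
result = run_limited("x = 1 << (16 * 10**9)")
```

The shell equivalent is ulimit -v before starting the interpreter.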

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: stdin: processing characters

2006-05-01 Thread Serge Orlov
Kevin Simmons wrote:
 Thanks for your input. I found an answer that suits my needs, not curses
 :-), but stty settings and sys.stdin.read(n) :

   import os, sys

   while 1:
       os.system("stty -icanon min 1 time 0")
       print """
       Radio computer control program.
       --
       Choose a function:
          po) Power toggle
          fq) Change frequency
          cm) Change mode
          vo) Change volume
          re) Reset
          qu) Quit
       --""",
       func = sys.stdin.read(2)
       if func == "po":
           ...
           ... rest of menu actions ...
       elif func == "qu":
           os.system("stty cooked")
           sys.exit()


Looks reasonable if you don't need portability. But you may want to
refactor it a little bit to make sure the terminal settings are always
restored:

try:
    do_all_the_work()
finally:
    os.system("stty cooked")

P.S. Maybe it's me, but when I see a call to sys.exit() I always have a
gut feeling that this function never returns. But in fact I'm wrong and
sys.exit is more reasonable: it raises an exception (SystemExit). So you
can call sys.exit() inside do_all_the_work and you can still be sure
that os.system("stty cooked") is always executed at the end.
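
A minimal sketch in Python 3 syntax that makes the order of events visible. The stty call is replaced by appending to a list so the example runs anywhere; the events list and the main function are hypothetical names.

```python
import sys

events = []


def do_all_the_work():
    events.append("working")
    sys.exit(0)                      # raises SystemExit, does not kill at once


def main():
    try:
        do_all_the_work()
    finally:
        # Stand-in for os.system("stty cooked"): runs even on sys.exit().
        events.append("terminal restored")


try:
    main()
except SystemExit as exc:
    # Caught here only so the sketch can record what happened.
    events.append("exit code %s" % exc.code)
```

The cleanup entry is recorded before the exit propagates, which is exactly the guarantee the try/finally wrapper relies on.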

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Can Python kill a child process that keeps on running?

2006-05-01 Thread Serge Orlov
I. Myself wrote:
 Suppose we spawn a child process with Popen.  I'm thinking of an
 executable file, like a compiled C program.
 Suppose it is supposed to run for one minute, but it just keeps going
 and going.  Does Python have any way to kill it?

 This is not hypothetical; I'm doing it now, and it's working pretty
 well, but I would like to be able to handle this run-on condition.  I'm
 using Windows 2000, but I want my program to be portable to linux.

On linux it's pretty easy to do, just set up an alarm signal. On windows
it's not so trivial, to the point that you cannot do it using the
python.org distribution alone: you will need to poke into the low level
C API using the win32 extensions or ctypes. AFAIK the twisted package
http://twistedmatrix.com has some code to help you. Also take a look at
the buildbot sources http://buildbot.sf.net that use twisted. Buildbot
has the same problem as you have, it needs to kill runaway or
non-responding processes.
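
For readers on a current interpreter: the standard library has since grown portable support for this, which was not available when this was written. Popen.kill() exists since Python 2.6 and the timeout argument of Popen.wait() since Python 3.3, on both Windows and Linux. A minimal sketch in Python 3 syntax follows; the helper name run_with_timeout is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Run a child process; kill it if it outlives the timeout.

    Returns the exit code, or None if the child had to be killed.
    """
    proc = subprocess.Popen(args)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()                  # reap the killed child
        return None


# A child that would run for 30 seconds is stopped after half a second.
runaway = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
# A well-behaved child simply reports its exit code.
finished = run_with_timeout([sys.executable, "-c", "pass"], 30)
```

The compiled C program from the question would take the place of the Python child processes used here.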

-- 
http://mail.python.org/mailman/listinfo/python-list

