Re: SAX unicode and ascii parsing problem

2010-11-30 Thread Stefan Behnel

goldtech, 30.11.2010 22:15:

Think I found it, for example:

line = 'my big string'
line.encode('ascii', 'ignore')

I processed the problem strings during parsing with this and it works
now.


That's not the right way of dealing with encodings, though. You should open 
the file with a well defined encoding (using codecs.open() or io.open() in 
Python >= 2.6), and then write the unicode strings into it just as you get 
them.
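
For illustration, a minimal sketch of that approach (Python 2.6+; the output
file name and encoding here are only placeholders):

    import io

    with io.open('output.csv', 'w', encoding='utf-8') as d:
        # 'line' stands for a unicode string taken straight from the SAX handler
        line = u'my big string'
        d.write(line + u'\n')   # io encodes on write; no manual .encode() needed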


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Martin v. Loewis
> This sounds like a strong prospect for how to get things working (I
> didn't realize open would accept a bytes argument for the filename),
> but I'm also interested in whether reading filenames from stdin and
> subsequently opening them is supposed to "just work" given a suitable
> encoding - like with Java which also uses unicode strings.  In Java,
> I'm told that ISO-8859-1 is supposed to "guarantee a roundtrip
> conversion".

It's the same in Python. However, as in Java, Python will *not*
necessarily use ISO-8859-1 when you pass a (Unicode) string to
open; instead, it will (as will Java) use your locale's encoding.
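
A small sketch of what that means in practice (the printed encoding is just an
example; it depends on your locale):

    import sys
    print(sys.getfilesystemencoding())   # e.g. 'utf-8' under a UTF-8 locale

    # A str filename is encoded with that encoding before reaching the OS.
    with open("caf\u00e9.txt", "w") as f:
        f.write("hello\n")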

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Martin v. Loewis
> The world does not revolve around Python.  Unix filenames have been
> encoding-agnostic long before Python was around.  If Python3 does not
> support this then it's a regression on Python's part.

Fortunately, Python 3 does support that.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Martin v. Löwis
> It'd be great if all programs used the same encoding on a given OS,
> but at least on Linux, I believe historically filenames have been
> created with different encodings.  IOW, if I pick one encoding and go
> with it, filenames written in some other encoding are likely to cause
> problems.  So I need something for which a filename is just a blob
> that shouldn't be monkeyed with.

In that case, you should use byte strings as file names, not
character strings.
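
For example (a sketch; the name is arbitrary raw bytes):

    import os

    name = b'caf\xc3\xa9.txt'          # raw bytes, no decoding involved
    with open(name, 'wb') as f:        # open() passes the bytes through as-is
        f.write(b'hello\n')
    print(os.listdir(b'.'))            # bytes in, bytes out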

Regards,
Martin

-- 
http://mail.python.org/mailman/listinfo/python-list


Regarding searching directory and to delete it with specific pattern.

2010-11-30 Thread Ramprakash Jelari thinakaran
Hi all,
I would like to search a list of directories matching a specific pattern and
delete them. How can I do it?
Example: in /home/jpr/ I have the following list of directories:
1.2.3-2, 1.2.3-10, 1.2.3-8. I would like to delete all the directories other
than 1.2.3-10, which has the highest value.
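
[Not from the thread, but one possible sketch; the base path and version
pattern are assumptions, so test with the rmtree call commented out first:]

    import os, re, shutil

    base = '/home/jpr'
    pattern = re.compile(r'^1\.2\.3-(\d+)$')

    versions = [(int(m.group(1)), name)
                for name in os.listdir(base)
                for m in [pattern.match(name)] if m]
    if versions:
        keep = max(versions)[1]                    # e.g. '1.2.3-10'
        for _, name in versions:
            if name != keep:
                shutil.rmtree(os.path.join(base, name))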


Regards,
JPR.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reading by positions plain text files

2010-11-30 Thread Tim Harig
On 2010-12-01, javivd  wrote:
> On Nov 30, 11:43 pm, Tim Harig  wrote:
>> On 2010-11-30, javivd  wrote:
>>
>> > I have a case now in wich another file has been provided (besides the
>> > database) that tells me in wich column of the file is every variable,
>> > because there isn't any blank or tab character that separates the
>> > variables, they are stick together. This second file specify the
>> > variable name and his position:
>>
>> > VARIABLE NAME      POSITION (COLUMN) IN FILE
>> > var_name_1                 123-123
>> > var_name_2                 124-125
>> > var_name_3                 126-126
>> > ..
>> > ..
>> > var_name_N                 512-513 (last positions)
>>
>> I am unclear on the format of these positions.  They do not look like
>> what I would expect from absolute references in the data.  For instance,
>> 123-123 may only contain one byte??? which could change for different
>> encodings and how you mark line endings.  Frankly, the use of the
>> world columns in the header suggests that the data *is* separated by
>> line endings rather then absolute position and the position refers to
>> the line number. In which case, you can use splitlines() to break up
>> the data and then address the proper line by index.  Nevertheless,
>> you can use file.seek() to move to an absolute offset in the file,
>> if that really is what you are looking for.
>
> I work in a survey research firm. the data im talking about has a lot
> of 0-1 variables, meaning yes or no of a lot of questions. so only one
> position of a character is needed (not byte), explaining the 123-123
> kind of positions of a lot of variables.

Then file.seek() is what you are looking for; but, you need to be aware of
line endings and encodings as indicated.  Make sure that you open the file
using whatever encoding was used when it was generated or you could have
problems with multibyte characters affecting the offsets.
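
A minimal seek() sketch along those lines (byte offsets, so the file is opened
in binary mode; the offset and length here are arbitrary):

    with open('data.txt', 'rb') as f:
        f.seek(130)        # jump to an absolute byte offset from the start
        chunk = f.read(2)  # read the two bytes starting there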
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Change one list item in place

2010-11-30 Thread Steve Holden
On 11/30/2010 8:28 PM, MRAB wrote:
> On 01/12/2010 01:08, Gnarlodious wrote:
>> This works for me:
>>
>> def sendList():
>>  return ["item0", "item1"]
>>
>> def query():
>>  l=sendList()
>>  return ["Formatting only {0} into a string".format(l[0]), l[1]]
>>
>> query()
>>
>>
>> However, is there a way to bypass the
>>
>> l=sendList()
>>
>> and change one list item in-place? Possibly a list comprehension
>> operating on a numbered item?
>>
> There's this:
> 
> return ["Formatting only {0} into a string".format(x) if i == 0 else
> x for i, x in enumerate(sendList())]
> 
> but that's too clever for its own good. Keep it simple. :-)

I quite agree. That solution is so clever it would be asking for a fight
walking into a bar in Glasgow.

However, an unpacking assignment can make everything much more
comprehensible [pun intended] by removing the index operations. The
canonical solution would be something like:

def query():
    x, y = sendList()
    return ["Formatting only {0} into a string".format(x), y]

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
PyCon 2011 Atlanta March 9-17   http://us.pycon.org/
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Albert Hopkins
On Wed, 2010-12-01 at 02:14 +, MRAB wrote:
> If the filenames are to be shown to a user then there needs to be a
> mapping between bytes and glyphs. That's an encoding. If different
> users use different encodings then exchange of textual data becomes
> difficult.

That's presentation, that's separate.  Indeed, I have my user encoding
set to UTF-8, and if there is a filename that's not valid utf-8 then my
GUI (GNOME) will show "(invalid encoding)" and even allow me to rename it
and my shell (bash) will show '?' next to the invalid "characters" (and
make it a little more challenging to rename ;)).  And I can freely copy
these "invalid" files across different (Unix) systems, because the OS
doesn't care about encoding.

But that's completely different from the actual name of the file.  Unix
doesn't care about presentation in filenames. It just cares about the
data.  There are not "glyphs" in Unix, only in the UI that runs on top
of it.

Or to put it another way, Unix's filename encoding is RAW-DATA.  It's
not "textual" data.  The fact that most filenames contain mainly
human-readable text is a convenient convention, but not required or
enforced by the OS.

>  That's where encodings which can be used globally come in.
> By the time Python 4 is released I'd be surprised if Unix hadn't
> standardised on a single encoding like UTF-8. 

I have serious doubts about that.  At least in the Linux world the
kernel wants to stay out of encoding debates (except where it has to,
like Windows filesystems). But the point is that:

The world does not revolve around Python.  Unix filenames have been
encoding-agnostic long before Python was around.  If Python3 does not
support this then it's a regression on Python's part.


-- 
http://mail.python.org/mailman/listinfo/python-list


To Thread or not to Thread....?

2010-11-30 Thread Jack Keegan
Hi there,

I'm currently writing an application to control and take measurements during
an experiment. This is to be done on an embedded computer running XPe, so I
am happy to have Python available, although I am pretty new to it.
The application basically runs as a state machine, which transitions through
its states based on inputs read in from a set of general purpose
input/output (GPIO) lines. So when a certain line is pulled low/high, do
something and move to another state. All good so far, and since I get through
the main loop pretty quickly, I can just do a read of the GPIO lines on each
pass through the loop and respond accordingly.
However, in one of the states I have to start reading in, and storing frames
from a camera. In another, I have to start reading accelerometer data from
an I2C bus (which operates at 400kHz). I haven't implemented either yet but
I would imagine that, in the case of the camera data, reading a frame would
take a large amount of time as compared to other operations. Therefore, if I
just tried to read one (or one set of) frames on each pass through the loop
then I would hold up the rest of the application. Conversely, as the I2C bus
will need to be read at such a high rate, I may not be able to get the
required data rate I need even without the camera data. This naturally leads
me to think I need to use threads.
As I am no expert in either I2C, cameras, python or threading I thought I
would chance asking for some advice on the subject. Do you think I need
threads here or would I be better off using some other method. I was
previously toying with the idea of using generators to create weightless
threads (as detailed in
http://www.ibm.com/developerworks/library/l-pythrd.html) for reading the
GPIOs. Do you think this would work in this situation?
Another option would be to write separate programs, perhaps even in C, and
spawn these in the background when needed. I'm a little torn as to which way
to go. If it makes a difference, and in case you are wondering, I will
be interfacing to the GPIOs, cameras and I2C bus through a set of C DLLs
using ctypes.
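
[For what it's worth, a rough sketch of the threaded variant; read_frame,
poll_gpio and store are hypothetical stand-ins for the real ctypes calls:]

    import threading
    import Queue                      # 'queue' in Python 3

    frames = Queue.Queue(maxsize=10)

    def camera_worker():
        while True:
            frames.put(read_frame())  # blocking read happens off the main loop

    t = threading.Thread(target=camera_worker)
    t.daemon = True                   # don't keep the process alive on exit
    t.start()

    while True:                       # the existing state-machine loop
        poll_gpio()
        try:
            store(frames.get_nowait())  # never blocks the loop
        except Queue.Empty:
            pass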

Any help or suggestions will be greatly appreciated,

Thanks very much,

Jack
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reading by positions plain text files

2010-11-30 Thread Tim Chase

On 11/30/2010 08:03 PM, javivd wrote:

On Nov 30, 11:43 pm, Tim Harig  wrote:

VARIABLE NAME  POSITION (COLUMN) IN FILE
var_name_1 123-123
var_name_2 124-125
var_name_3 126-126
..
..
var_name_N 512-513 (last positions)



and no, MRAB, it's not the similar problem (at least what i understood
of it). I have to associate the position this file give me with the
variable name this file give me for those positions.


MRAB may be referring to my reply in that thread where you can do 
something like


  OFFSETS = 'offsets.txt'
  offsets = {}
  f = file(OFFSETS)
  f.next() # throw away the headers
  for row in f:
      varname, rest = row.split()[:2]
      # sanity check
      if varname in offsets:
          print "[%s] in %s twice?!" % (varname, OFFSETS)
      if '-' not in rest: continue
      start, stop = map(int, rest.split('-'))
      offsets[varname] = slice(start, stop+1) # 0-based offsets
      #offsets[varname] = slice(start+1, stop+2) # 1-based offsets
  f.close()

  def do_something_with(data):
      # your real code goes here
      print data['var_name_2']

  for row in file('data.txt'):
      data = dict((name, row[offsets[name]]) for name in offsets)
      do_something_with(data)

There are additional robustness checks I'd include if your 
offsets file isn't controlled by you (people send me daft data).


-tkc




--
http://mail.python.org/mailman/listinfo/python-list


Re: How to initialize each multithreading Pool worker with an individual value?

2010-11-30 Thread James Mills
On Wed, Dec 1, 2010 at 7:35 AM, Valery Khamenya  wrote:
> multithreading.pool Pool has a promissing initializer argument in its
> constructor.
> However it doesn't look possible to use it to initialize each Pool's
> worker with some individual value (I'd wish to be wrong here)
>
> So, how to initialize each multithreading Pool worker with the
> individual values?
>
> The typical use case might be a connection pool, say, of 3 workers,
> where each of 3 workers has its own TCP/IP port.
>
> from multiprocessing.pool import Pool
>
> def port_initializer(_port):
>    global port
>    port = _port
>
> def use_connection(some_packet):
>    global _port
>    print "sending data over port # %s" % port
>
> if __name__ == "__main__":
>    ports=((4001,4002, 4003), )
>    p = Pool(3, port_initializer, ports) # oops... :-)
>    some_data_to_send = range(20)
>    p.map(use_connection, some_data_to_send)

I assume you are talking about multiprocessing
despite you mentioning "multithreading" in the mix.

Have a look at the source code for multiprocessing.pool
and how the Pool object works and what it does
with the initializer argument. I'm not entirely sure it
does what you expect and yes documentation on this
is lacking...

cheers
James

-- 
-- James Mills
--
-- "Problems are solved by method"
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reading by positions plain text files

2010-11-30 Thread MRAB

On 01/12/2010 02:03, javivd wrote:

On Nov 30, 11:43 pm, Tim Harig  wrote:

On 2010-11-30, javivd  wrote:


I have a case now in wich another file has been provided (besides the
database) that tells me in wich column of the file is every variable,
because there isn't any blank or tab character that separates the
variables, they are stick together. This second file specify the
variable name and his position:



VARIABLE NAME  POSITION (COLUMN) IN FILE
var_name_1 123-123
var_name_2 124-125
var_name_3 126-126
..
..
var_name_N 512-513 (last positions)


I am unclear on the format of these positions.  They do not look like
what I would expect from absolute references in the data.  For instance,
123-123 may only contain one byte??? which could change for different
encodings and how you mark line endings.  Frankly, the use of the
world columns in the header suggests that the data *is* separated by
line endings rather then absolute position and the position refers to
the line number. In which case, you can use splitlines() to break up
the data and then address the proper line by index.  Nevertheless,
you can use file.seek() to move to an absolute offset in the file,
if that really is what you are looking for.


I work in a survey research firm. the data im talking about has a lot
of 0-1 variables, meaning yes or no of a lot of questions. so only one
position of a character is needed (not byte), explaining the 123-123
kind of positions of a lot of variables.

and no, MRAB, it's not the similar problem (at least what i understood
of it). I have to associate the position this file give me with the
variable name this file give me for those positions.

thank you both and sorry for my english!


You just have to parse the second file to build a list (or dict)
containing the name, start position and end position of each variable:

variables = [("var_name_1", 123, 123), ...]

and then work through that list, extracting the data between those
positions in the first file and putting the values in another list (or
dict).

You also need to check whether the positions are 1-based or 0-based
(Python uses 0-based).
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread MRAB

On 01/12/2010 01:28, Nobody wrote:

On Tue, 30 Nov 2010 18:53:14 +0100, Peter Otten wrote:


I think this is wrong.  In Unix there is no concept of filename
encoding.  Filenames can have any arbitrary set of bytes (except '/' and
'\0').   But the filesystem itself neither knows nor cares about
encoding.


I think you misunderstood what I was trying to say. If you write a list of
filenames into files.txt, and use an encoding (ISO-8859-1, say) other than
that used by the shell to display file names (on Linux typically UTF-8 these
days) and then write a Python script exist.py that reads filenames and
checks for the files' existence,


I think you misunderstood.

In the Unix kernel, there aren't any encodings. Strings of bytes are
/just/ strings of bytes. A text file containing a list of filenames
doesn't /have/ an encoding. The filenames passed to API functions don't
/have/ an encoding.

This is why Unix filenames are case-sensitive: because there isn't any
"case". The number 65 has no more in common with the number 97 than it
does with the number 255. The fact that 65 is the ASCII code for "A" while
97 is the ASCII code for "a" doesn't come into it. Case-insensitive
filenames require knowledge of the encoding in order to determine when
filenames are "equivalent". DOS/Windows tried this and never really got it
right (it works fine on a standalone system, or within later versions of
a Windows-only ecosystem, but becomes a nightmare when files get
transferred between systems via older or non-Microsoft channels).

Python 3.x's decision to treat filenames (and environment variables) as
text even on Unix is, in short, a bug. One which, IMNSHO, will mean that
Python 2.x is still around when Python 4 is released.


If the filenames are to be shown to a user then there needs to be a
mapping between bytes and glyphs. That's an encoding. If different
users use different encodings then exchange of textual data becomes
difficult. That's where encodings which can be used globally come in.
By the time Python 4 is released I'd be surprised if Unix hadn't
standardised on a single encoding like UTF-8.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Reading by positions plain text files

2010-11-30 Thread javivd
On Nov 30, 11:43 pm, Tim Harig  wrote:
> On 2010-11-30, javivd  wrote:
>
> > I have a case now in wich another file has been provided (besides the
> > database) that tells me in wich column of the file is every variable,
> > because there isn't any blank or tab character that separates the
> > variables, they are stick together. This second file specify the
> > variable name and his position:
>
> > VARIABLE NAME      POSITION (COLUMN) IN FILE
> > var_name_1                 123-123
> > var_name_2                 124-125
> > var_name_3                 126-126
> > ..
> > ..
> > var_name_N                 512-513 (last positions)
>
> I am unclear on the format of these positions.  They do not look like
> what I would expect from absolute references in the data.  For instance,
> 123-123 may only contain one byte??? which could change for different
> encodings and how you mark line endings.  Frankly, the use of the
> world columns in the header suggests that the data *is* separated by
> line endings rather then absolute position and the position refers to
> the line number. In which case, you can use splitlines() to break up
> the data and then address the proper line by index.  Nevertheless,
> you can use file.seek() to move to an absolute offset in the file,
> if that really is what you are looking for.

I work in a survey research firm. The data I'm talking about has a lot
of 0-1 variables, meaning yes or no answers to a lot of questions, so only
one character position is needed (not a byte), explaining the 123-123
kind of positions for a lot of the variables.

And no, MRAB, it's not a similar problem (at least from what I understood
of it). I have to associate the positions this file gives me with the
variable names it gives me for those positions.

Thank you both, and sorry for my English!

J
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Change one list item in place

2010-11-30 Thread Gnarlodious
Thanks.
Unless someone has a simpler solution, I'll stick with 2 lines.

-- Gnarlie
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming games in historical linguistics with Python

2010-11-30 Thread Gnarlodious
Have you considered entering all this data into an SQLite database?
You could do fast searches based on any features you deem relevant to
the phoneme. Using an SQLite editor application you can get started
building a database right away. You can add columns as you get the
inspiration, along with any tags you want. Putting it all in database
tables can really make chaotic linguistic data seem manageable.

My own linguistics project uses mostly SQLite and a number of
OrderedDict's based on .plist files. It is all working very nicely,
although I haven't tried to deal with any phonetics (yet).

-- Gnarlie
http://Sectrum.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Change one list item in place

2010-11-30 Thread MRAB

On 01/12/2010 01:08, Gnarlodious wrote:

This works for me:

def sendList():
 return ["item0", "item1"]

def query():
 l=sendList()
 return ["Formatting only {0} into a string".format(l[0]), l[1]]

query()


However, is there a way to bypass the

l=sendList()

and change one list item in-place? Possibly a list comprehension
operating on a numbered item?


There's this:

return ["Formatting only {0} into a string".format(x) if i == 0 
else x for i, x in enumerate(sendList())]


but that's too clever for its own good. Keep it simple. :-)
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Nobody
On Tue, 30 Nov 2010 18:53:14 +0100, Peter Otten wrote:

>> I think this is wrong.  In Unix there is no concept of filename
>> encoding.  Filenames can have any arbitrary set of bytes (except '/' and
>> '\0').   But the filesystem itself neither knows nor cares about
>> encoding.
> 
> I think you misunderstood what I was trying to say. If you write a list of 
> filenames into files.txt, and use an encoding (ISO-8859-1, say) other than 
> that used by the shell to display file names (on Linux typically UTF-8 these 
> days) and then write a Python script exist.py that reads filenames and 
> checks for the files' existence, 

I think you misunderstood.

In the Unix kernel, there aren't any encodings. Strings of bytes are
/just/ strings of bytes. A text file containing a list of filenames
doesn't /have/ an encoding. The filenames passed to API functions don't
/have/ an encoding.

This is why Unix filenames are case-sensitive: because there isn't any
"case". The number 65 has no more in common with the number 97 than it
does with the number 255. The fact that 65 is the ASCII code for "A" while
97 is the ASCII code for "a" doesn't come into it. Case-insensitive
filenames require knowledge of the encoding in order to determine when
filenames are "equivalent". DOS/Windows tried this and never really got it
right (it works fine on a standalone system, or within later versions of
a Windows-only ecosystem, but becomes a nightmare when files get
transferred between systems via older or non-Microsoft channels).

Python 3.x's decision to treat filenames (and environment variables) as
text even on Unix is, in short, a bug. One which, IMNSHO, will mean that
Python 2.x is still around when Python 4 is released.

-- 
http://mail.python.org/mailman/listinfo/python-list


Intro to Python slides, was Re: how to go on learning python

2010-11-30 Thread Dan Stromberg
On Tue, Nov 30, 2010 at 6:37 AM, Xavier Heruacles  wrote:
> I'm basically a c/c++ programmer and recently come to python for some web
> development. Using django and javascript I'm afraid I can develop some web
> application now. But often I feel I'm not good at python. I don't know much
> about generators, descriptors and decorators(although I can use some of it
> to accomplish something, but I don't think I'm capable of knowing its
> internals). I find my code ugly, and it seems near everything are already
> gotten done by the libraries. When I want to do something, I just find some
> libraries or modules and then just finish the work. So I'm a bit tired of
> just doing this kind of high level scripting, only to find myself a bad
> programmer. Then my question is after one coded some kind of basic app, how
> one can keep on learning programming using python?
> Do some more interesting projects? Read more general books about
> programming? or...?
> --
> http://mail.python.org/mailman/listinfo/python-list

You could check out these slides from an Intro to Python talk I'm
giving tonight:

http://stromberg.dnsalias.org/~dstromberg/Intro-to-Python/

...perhaps especially the Further Resources section at the end.  The
Koans might be very nice for you, as might Dive Into Python.

BTW, if you're interested in Python and looking into Javascript anew,
you might look at Pyjamas.  It lets you write web apps in Python that
also run on a desktop; you can even call into Raphael from it.  Only
thing about it is it's kind of a young project compared to most Python
implementations.

PS: I mostly came from C too - knowing C can be a real advantage for a
Python programmer sometimes.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Nobody
On Mon, 29 Nov 2010 21:26:23 -0800, Dan Stromberg wrote:

> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?

Use "bytes" rather than "str". Everywhere. This means reading names from
sys.stdin.buffer (which is a binary stream) rather than sys.stdin (which
is a text stream). If you pass a "bytes" to an I/O function (e.g. open()),
it will just pass the bytes directly to the OS without any decoding.
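
A minimal sketch of that (Python 3, reading newline-terminated names):

    import sys

    for raw in sys.stdin.buffer:           # binary stream: yields bytes
        name = raw.rstrip(b'\n')           # keep the filename as raw bytes
        with open(name, 'rb') as f:        # open() passes bytes to the OS as-is
            data = f.read()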

But really, if you're writing *nix system utilities, you should probably
stick with Python 2.x until the end of time. Using 3.x will just make life
difficult for no good reason (e.g. in 3.x, os.environ also contains
Unicode strings).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to initialize each multithreading Pool worker with an individual value?

2010-11-30 Thread Dan Stromberg
On Tue, Nov 30, 2010 at 1:35 PM, Valery Khamenya  wrote:
> Hi,
>
> multithreading.pool Pool has a promissing initializer argument in its
> constructor.
> However it doesn't look possible to use it to initialize each Pool's
> worker with some individual value (I'd wish to be wrong here)
>
> So, how to initialize each multithreading Pool worker with the
> individual values?
>
> The typical use case might be a connection pool, say, of 3 workers,
> where each of 3 workers has its own TCP/IP port.
>
> from multiprocessing.pool import Pool
>
> def port_initializer(_port):
>    global port
>    port = _port
>
> def use_connection(some_packet):
>    global _port
>    print "sending data over port # %s" % port
>
> if __name__ == "__main__":
>    ports=((4001,4002, 4003), )
>    p = Pool(3, port_initializer, ports) # oops... :-)
>    some_data_to_send = range(20)
>    p.map(use_connection, some_data_to_send)

Using an initializer with multiprocessing is something I've never tried.

I have used queues with multiprocessing though, and I believe you
could use them, at least as a fallback plan if you prefer to get the
initializer to work.

If you create in the parent a queue in shared memory (multiprocessing
facilitates this nicely), and fill that queue with the values in your
ports tuple, then you could have each child in the worker pool extract
a single value from this queue so each worker can have its own, unique
port value.
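
[An untested sketch of that queue idea; the names are illustrative only:]

    from multiprocessing import Pool, Manager

    def port_initializer(port_queue):
        global port
        port = port_queue.get()            # each worker pops a different port

    def use_connection(some_packet):
        print "sending %r over port %s" % (some_packet, port)

    if __name__ == "__main__":
        manager = Manager()
        port_queue = manager.Queue()
        for p in (4001, 4002, 4003):
            port_queue.put(p)
        pool = Pool(3, port_initializer, (port_queue,))
        pool.map(use_connection, range(20))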

HTH
-- 
http://mail.python.org/mailman/listinfo/python-list


Change one list item in place

2010-11-30 Thread Gnarlodious
This works for me:

def sendList():
    return ["item0", "item1"]

def query():
    l=sendList()
    return ["Formatting only {0} into a string".format(l[0]), l[1]]

query()


However, is there a way to bypass the

l=sendList()

and change one list item in-place? Possibly a list comprehension
operating on a numbered item?

-- Gnarlie
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Dan Stromberg
On Tue, Nov 30, 2010 at 9:53 AM, Peter Otten <__pete...@web.de> wrote:
> $ ls
> $ python3
> Python 3.1.1+ (r311:74480, Nov  2 2009, 15:45:00)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> with open(b"\xe4\xf6\xfc.txt", "w") as f:
> ...     f.write("hello\n")
> ...
> 6

> $ ls
> ???.txt

This sounds like a strong prospect for how to get things working (I
didn't realize open would accept a bytes argument for the filename),
but I'm also interested in whether reading filenames from stdin and
subsequently opening them is supposed to "just work" given a suitable
encoding - like with Java which also uses unicode strings.  In Java,
I'm told that ISO-8859-1 is supposed to "guarantee a roundtrip
conversion".
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Dan Stromberg
On Tue, Nov 30, 2010 at 7:19 AM, Antoine Pitrou  wrote:
> On Mon, 29 Nov 2010 21:52:07 -0800 (PST)
> Yingjie Lan  wrote:
>> --- On Tue, 11/30/10, Dan Stromberg  wrote:
>> > In Python 3, I'm finding that I have encoding issues with
>> > characters
>> > with their high bit set.  Things are fine with strictly
>> > ASCII
>> > filenames.  With high-bit-set characters, even if I
>> > change stdin's
>> > encoding with:
>>
>> Co-ask. I have also had problems with file names in
>> Chinese characters with Python 3. I unzipped the
>> turtle demo files into the desktop folder (of
>> course, the word 'desktop' is in Chinese, it is
>> a windows XP system, localization is Chinese), then
>> all in a sudden some of the demos won't work
>> anymore. But if I move it to a folder whose
>> path contains only english characters, everything
>> comes back to normal.
>
> Can you try the latest 3.2alpha4 (*) and check if this is fixed?
> If not, then could you please open a bug on http://bugs.python.org ?
>
> (*) http://python.org/download/releases/3.2/
>
> Thank you
>
> Antoine.

I have the same problem using 3.2alpha4: the word man~ana (6
characters long) in a filename causes problems (I'm catching the
exception and skipping the file for now) despite using what I believe
is an 8-bit, all 256-bytes-are-characters encoding: iso-8859-1.  'not
sure if you wanted both of us to try this, or Yingjie alone though.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Dan Stromberg
On Tue, Nov 30, 2010 at 11:47 AM, Martin v. Loewis  wrote:
>> Does anyone know what I need to do to read filenames from stdin with
>> Python 3.1 and subsequently open them, when some of those filenames
>> include characters with their high bit set?
>
> If your files on disk use file names encoded in iso-8859-1, don't set
> your locale to a UTF-8 locale (as you apparently do), but set it to
> a locale that actually matches the encoding that you use.
>
> Regards,
> Martin
>

It'd be great if all programs used the same encoding on a given OS,
but at least on Linux, I believe historically filenames have been
created with different encodings.  IOW, if I pick one encoding and go
with it, filenames written in some other encoding are likely to cause
problems.  So I need something for which a filename is just a blob
that shouldn't be monkeyed with.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming games in historical linguistics with Python

2010-11-30 Thread Vlastimil Brom
2010/11/30 Dax Bloom :
> Hello,
>
> Following a discussion that began 3 weeks ago I would like to ask a
> question regarding substitution of letters according to grammatical
> rules in historical linguistics. I would like to automate the
> transformation of words according to complex rules of phonology and
> integrate that script in a visual environment.
> Here follows the previous thread:
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/3c55f9f044c3252f/fe7c2c82ecf0dbf5?lnk=gst&q=evolutionary+linguistics#fe7c2c82ecf0dbf5
>
> Is there a way to refer to vowels and consonants as a subcategory of
> text? Is there a function to remove all vowels? How should one create
> and order the dictionary file for the rules? How to chain several
> transformations automatically from multiple rules? Finally can anyone
> show me what existing python program or phonological software can do
> this?
>
> What function could tag syllables, the word nucleus and the codas? How
> easy is it to bridge this with a more visual environment where
> interlinear, aligned text can be displayed with Greek notations and
> braces as usual in the phonology textbooks?
>
> Best regards,
>
> Dax Bloom
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Hi,
as far as I know, there is no predefined function or library for
distinguishing vowels or consonants, but these can be simply
implemented individually according to the exact needs.

e.g. regular expressions can be used here: to remove vowels, the code
could be (example from the command prompt):

>>> import re
>>> re.sub(r"(?i)[aeiouy]", "", "This is a SAMPLE TEXT")
'Ths s  SMPL TXT'
>>>

See http://docs.python.org/library/re.html
or
http://www.regular-expressions.info/
for the regexp features.

You may eventually try the new development version regex, which adds
many interesting new features and remove some limitations
http://bugs.python.org/issue2636

In some cases regular expressions aren't really appropriate or may
become too complicated.
Sometimes a parsing library like pyparsing may be a more adequate tool:
http://pyparsing.wikispaces.com/

If the rules are simple enough, that they can be formulated for single
characters or character clusters with a regular expression, you can
model the phonological changes as a series of replacements with
matching patterns and the respective replacement patterns.
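
A tiny sketch of that idea, with invented rules purely for illustration:

    import re

    # ordered (pattern, replacement) pairs; each is applied to the whole word
    rules = [
        (r'p(?=[aeiou])', 'b'),   # invented rule: voice p before a vowel
        (r'k$',           'x'),   # invented rule: word-final k > x
    ]

    def apply_rules(word):
        for pattern, replacement in rules:
            word = re.sub(pattern, replacement, word)
        return word

    print(apply_rules('pak'))     # -> 'bax'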

For character-wise matching and replacing the regular expressions are
very effective; using lookarounds
http://www.regular-expressions.info/lookaround.html
even some combinatorics for conditional changes can be expressed;
however, i would find some more complex conditions, suprasegmentals,
morpheme boundaries etc. rather difficult to formalise this way...

hth,
  vbr
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reading by positions plain text files

2010-11-30 Thread MRAB

On 30/11/2010 21:31, javivd wrote:

Hi all,

Sorry, newbie question:

I have database in a plain text file (could be .txt or .dat, it's the
same) that I need to read in python in order to do some data
validation. In other files I read this kind of files with the split()
method, reading line by line. But split() relies on a separator
character (I think... all I know is that it's work OK).

I have a case now in wich another file has been provided (besides the
database) that tells me in wich column of the file is every variable,
because there isn't any blank or tab character that separates the
variables, they are stick together. This second file specify the
variable name and his position:


VARIABLE NAME   POSITION (COLUMN) IN FILE
var_name_1  123-123
var_name_2  124-125
var_name_3  126-126
..
..
var_name_N  512-513 (last positions)

How can I read this so each position in the file it's associated with
each variable name?


It sounds like a similar problem to this:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/53e6f41bfff6/123422d510187dc3?show_docid=123422d510187dc3
--
http://mail.python.org/mailman/listinfo/python-list


Re: how to go on learning python

2010-11-30 Thread Terry Reedy

On 11/30/2010 9:37 AM, Xavier Heruacles wrote:

I'm basically a c/c++ programmer and recently come to python for some
web development. Using django and javascript I'm afraid I can develop
some web application now. But often I feel I'm not good at python. I
don't know much about generators, descriptors and decorators(although I
can use some of it to accomplish something, but I don't think I'm
capable of knowing its internals). I find my code ugly, and it seems
near everything are already gotten done by the libraries. When I want to
do something, I just find some libraries or modules and then just finish
the work. So I'm a bit tired of just doing this kind of high level
scripting, only to find myself a bad programmer. Then my question is
after one coded some kind of basic app, how one can keep on learning
programming using python?
Do some more interesting projects? Read more general books about
programming? or...?


You can use both your old C skills and new Python skills by helping to 
develop Python by working on issues on the tracker bugs.python.org. If 
you are interested but need help getting started, ask.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Catching user switching and getting current active user from root on linux

2010-11-30 Thread James Mills
On Wed, Dec 1, 2010 at 8:54 AM, Tim Harig  wrote:
> Well you could use inotify to trigger on any changes to /var/log/wtmp.
> When a change is detected, you could check of deltas in the output of "who
> -a" to figure out what has changed since the last time wtmp triggered.

This is a good idea and you could also
make use of the following library:

http://pypi.python.org/pypi?:action=search&term=utmp&submit=search

cheers
James

-- 
-- James Mills
--
-- "Problems are solved by method"
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Catching user switching and getting current active user from root on linux

2010-11-30 Thread Tim Harig
On 2010-11-30, mpnordland  wrote:
> I have situation where I need to be able to get the current active
> user, and catch user switching eg user1 locks screen, leaves computer,
> user2 comes, and logs on.
> basically, when there is any type of user switch my script needs to
> know.

Well you could use inotify to trigger on any changes to /var/log/wtmp.
When a change is detected, you could check for deltas in the output of "who
-a" to figure out what has changed since the last time wtmp triggered.
-- 
http://mail.python.org/mailman/listinfo/python-list


Catching user switching and getting current active user from root on linux

2010-11-30 Thread mpnordland
I have situation where I need to be able to get the current active
user, and catch user switching eg user1 locks screen, leaves computer,
user2 comes, and logs on.
basically, when there is any type of user switch my script needs to
know.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reading by positions plain text files

2010-11-30 Thread Tim Harig
On 2010-11-30, javivd  wrote:
> I have a case now in wich another file has been provided (besides the
> database) that tells me in wich column of the file is every variable,
> because there isn't any blank or tab character that separates the
> variables, they are stick together. This second file specify the
> variable name and his position:
>
> VARIABLE NAME POSITION (COLUMN) IN FILE
> var_name_1123-123
> var_name_2124-125
> var_name_3126-126
> ..
> ..
> var_name_N512-513 (last positions)

I am unclear on the format of these positions.  They do not look like
what I would expect from absolute references in the data.  For instance,
123-123 may only contain one byte??? which could change for different
encodings and how you mark line endings.  Frankly, the use of the
word columns in the header suggests that the data *is* separated by
line endings rather than absolute position and the position refers to
the line number. In which case, you can use splitlines() to break up
the data and then address the proper line by index.  Nevertheless,
you can use file.seek() to move to an absolute offset in the file,
if that really is what you are looking for.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: IMAP support

2010-11-30 Thread pakalk
On 30 Lis, 22:26, Adam Tauno Williams  wrote:
> On Tue, 2010-11-30 at 13:03 -0800, pakalk wrote:
> > Please, give me an example of raw query to IMAP server?
>
> 
>
> I'm not certain what you mean by "raw query".

m = imap()
m.query('UID SORT ...') # etc.

Thanks for link :)
-- 
http://mail.python.org/mailman/listinfo/python-list


How to initialize each multithreading Pool worker with an individual value?

2010-11-30 Thread Valery Khamenya
Hi,

multithreading.pool Pool has a promising initializer argument in its
constructor.
However it doesn't look possible to use it to initialize each Pool's
worker with some individual value (I'd wish to be wrong here)

So, how to initialize each multithreading Pool worker with the
individual values?

The typical use case might be a connection pool, say, of 3 workers,
where each of 3 workers has its own TCP/IP port.

from multiprocessing.pool import Pool

def port_initializer(_port):
    global port
    port = _port

def use_connection(some_packet):
    global _port
    print "sending data over port # %s" % port

if __name__ == "__main__":
    ports=((4001,4002, 4003), )
    p = Pool(3, port_initializer, ports) # oops... :-)
    some_data_to_send = range(20)
    p.map(use_connection, some_data_to_send)


best regards
--
Valery A.Khamenya
-- 
http://mail.python.org/mailman/listinfo/python-list


Reading by positions plain text files

2010-11-30 Thread javivd
Hi all,

Sorry, newbie question:

I have a database in a plain text file (could be .txt or .dat, it's the
same) that I need to read in Python in order to do some data
validation. In other cases I read this kind of file with the split()
method, reading line by line. But split() relies on a separator
character (I think... all I know is that it works OK).

I have a case now in which another file has been provided (besides the
database) that tells me in which column of the file every variable is,
because there isn't any blank or tab character that separates the
variables; they are stuck together. This second file specifies the
variable name and its position:


VARIABLE NAME   POSITION (COLUMN) IN FILE
var_name_1  123-123
var_name_2  124-125
var_name_3  126-126
..
..
var_name_N  512-513 (last positions)

How can I read this so that each position in the file is associated with
its variable name?

Thanks a lot!!

Javier

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: IMAP support

2010-11-30 Thread Adam Tauno Williams
On Tue, 2010-11-30 at 13:03 -0800, pakalk wrote: 
> Please, give me an example of raw query to IMAP server?



I'm not certain what you mean by "raw query".

> And why do you focus on "Nevermind is so ekhm... nevermind... "??
> Cannot you just help?

This list does suffer from a case of "attitude".  Most programming
forums have that; Python "attitude" has its own special flavor.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: SAX unicode and ascii parsing problem

2010-11-30 Thread Justin Ezequiel
can't check right now but are you sure it's the parser and not
this line
d.write(csv+"\n")
that's failing?
what is d?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: SAX unicode and ascii parsing problem

2010-11-30 Thread goldtech
snip...
>
> I'm just as stumped as I was when you first asked this question 13
> minutes ago. ;-)
>
> regards
>  Steve
>
snip...

Hi Steve,

Think I found it, for example:

line = 'my big string'
line.encode('ascii', 'ignore')

I processed the problem strings during parsing with this and it works
now. Got this from:

http://stackoverflow.com/questions/2365411/python-convert-unicode-to-ascii-without-errors


Best, Lee

:^)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: IMAP support

2010-11-30 Thread pakalk
Please, give me an example of raw query to IMAP server?

And why do you focus on "Nevermind is so ekhm... nevermind... "??
Cannot you just help?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: SAX unicode and ascii parsing problem

2010-11-30 Thread Steve Holden
On 11/30/2010 3:43 PM, goldtech wrote:
> Hi,
> 
> I'm trying to parse an xml file using SAX. About half-way through a
> file I get this error:
> 
> Traceback (most recent call last):
>   File "C:\Python26\Lib\site-packages\pythonwin\pywin\framework
> \scriptutils.py", line 325, in RunScript
> exec codeObject in __main__.__dict__
>   File "E:\sc\b2.py", line 58, in 
> parser.parse(open(r'ppb5.xml'))
>   File "C:\Python26\Lib\xml\sax\expatreader.py", line 107, in parse
> xmlreader.IncrementalParser.parse(self, source)
>   File "C:\Python26\Lib\xml\sax\xmlreader.py", line 123, in parse
> self.feed(buffer)
>   File "C:\Python26\Lib\xml\sax\expatreader.py", line 207, in feed
> self._parser.Parse(data, isFinal)
>   File "C:\Python26\Lib\xml\sax\expatreader.py", line 304, in
> end_element
> self._cont_handler.endElement(name)
>   File "E:\sc\b2.py", line 51, in endElement
> d.write(csv+"\n")
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 146-147: ordinal not in range(128)
> 
> I'm using ActivePython 2.6. I trying to figure out the simplest fix.
> If there's a Python way to just take the source XML file and covert/
> process it so this will not happen - that would be best. Or should I
> just update to Python 3 ?
> 
> I tried this but nothing changed, I thought this might convert it and
> then I'd paerse the new file - didn't work:
> 
> uc = open(r'E:\sc\ppb4.xml').read().decode('utf8')
> ascii = uc.decode('ascii')
> mex9 = open( r'E:\scrapes\ppb5.xml', 'w' )
> mex9.write(ascii)
> 
> Again I'm looking for something simple even it's a few more lines of
> codes...or upgrade(?)
> 
> Thanks, appreciate any help.
> mex9.close()

I'm just as stumped as I was when you first asked this question 13
minutes ago. ;-)

regards
 Steve

-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
PyCon 2011 Atlanta March 9-17   http://us.pycon.org/
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/

-- 
http://mail.python.org/mailman/listinfo/python-list


SAX unicode and ascii parsing problem

2010-11-30 Thread goldtech
Hi,

I'm trying to parse an xml file using SAX. About half-way through a
file I get this error:

Traceback (most recent call last):
  File "C:\Python26\Lib\site-packages\pythonwin\pywin\framework
\scriptutils.py", line 325, in RunScript
exec codeObject in __main__.__dict__
  File "E:\sc\b2.py", line 58, in 
parser.parse(open(r'ppb5.xml'))
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
  File "C:\Python26\Lib\xml\sax\xmlreader.py", line 123, in parse
self.feed(buffer)
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 304, in
end_element
self._cont_handler.endElement(name)
  File "E:\sc\b2.py", line 51, in endElement
d.write(csv+"\n")
UnicodeEncodeError: 'ascii' codec can't encode characters in position
146-147: ordinal not in range(128)

I'm using ActivePython 2.6. I'm trying to figure out the simplest fix.
If there's a Python way to just take the source XML file and convert/
process it so this will not happen - that would be best. Or should I
just update to Python 3?

I tried this but nothing changed; I thought this might convert it and
then I'd parse the new file - didn't work:

uc = open(r'E:\sc\ppb4.xml').read().decode('utf8')
ascii = uc.decode('ascii')
mex9 = open( r'E:\scrapes\ppb5.xml', 'w' )
mex9.write(ascii)

Again I'm looking for something simple even if it's a few more lines of
code... or upgrade(?)

Thanks, appreciate any help.
mex9.close()
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Q] get device major/minor number

2010-11-30 Thread Dan M
On Tue, 30 Nov 2010 21:35:43 +0100, Thomas Portmann wrote:

> Thank you very much Dan, this is exactly what I was looking for.
> 
> 
> Tom

You're very welcome.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Q] get device major/minor number

2010-11-30 Thread Thomas Portmann
On Tue, Nov 30, 2010 at 9:18 PM, Dan M  wrote:
> On Tue, 30 Nov 2010 21:09:14 +0100, Thomas Portmann wrote:

>> In the example below, I would like to get the major (8) and minor (0, 1,
>> 2) numbers of /dev/sda{,1,2}. How can I get them?
>
> I think the os.major() and os.minor() calls ought to do what you want.
>
> >>> import os
> >>> s = os.stat('/dev/sda1')
> >>> os.major(s.st_rdev)
> 8
> >>> os.minor(s.st_rdev)
> 1

Thank you very much Dan, this is exactly what I was looking for.


Tom
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Ben Finney
OW Ghim Siong  writes:

> I have a big file 1.5GB in size, with about 6 million lines of
> tab-delimited data. I have to perform some filtration on the data and
> keep the good data. After filtration, I have about 5.5 million data
> left remaining. As you might already guessed, I have to read them in
> batches and I did so using .readlines(1).

Why do you need to handle the batching in your code? Perhaps you're not
aware that a file object is already an iterator for the lines of text in
the file.

> After reading each batch, I will split the line (in string format) to
> a list using .split("\t") and then check several conditions, after
> which if all conditions are satisfied, I will store the list into a
> matrix.

As I understand it, you don't need a line after moving to the next. So
there's no need to maintain a manual buffer of lines at all; please
explain if there is something additional requiring a huge buffer of
input lines.

> The code is as follows:
> -Start--
> a=open("bigfile")
> matrix=[]
> while True:
>    lines = a.readlines(1)
>    for line in lines:
>        data=line.split("\t")
>        if several_conditions_are_satisfied:
>            matrix.append(data)
>    print "Number of lines read:", len(lines), "matrix.__sizeof__:",
> matrix.__sizeof__()
>    if len(lines)==0:
>        break
> -End-

Using the file's native line iterator::

infile = open("bigfile")
matrix = []
for line in infile:
    record = line.split("\t")
    if several_conditions_are_satisfied:
        matrix.append(record)

> Results:
> Number of lines read: 461544 matrix.__sizeof__: 1694768
> Number of lines read: 449840 matrix.__sizeof__: 3435984
> Number of lines read: 455690 matrix.__sizeof__: 5503904
> Number of lines read: 451955 matrix.__sizeof__: 6965928
> Number of lines read: 452645 matrix.__sizeof__: 8816304
> Number of lines read: 448555 matrix.__sizeof__: 9918368
>
> Traceback (most recent call last):
> MemoryError

If you still get a MemoryError, you can use the ‘pdb’ module
<http://docs.python.org/library/pdb.html> to debug it interactively.

Another option is to catch the MemoryError and construct a diagnostic
message similar to the one you had above::

import sys

infile = open("bigfile")
matrix = []
for line in infile:
    record = line.split("\t")
    if several_conditions_are_satisfied:
        try:
            matrix.append(record)
        except MemoryError:
            matrix_len = len(matrix)
            sys.stderr.write(
                "len(matrix): %(matrix_len)d\n" % vars())
            raise

> I have tried creating such a matrix of equivalent size and it only
> uses 35mb of memory but I am not sure why when using the code above,
> the memory usage shot up so fast and exceeded 2GB.
>
> Any advice is greatly appreciated.

With large data sets, and the manipulation and computation you will
likely be wanting to perform, it's probably time to consider the NumPy
library <http://numpy.scipy.org/> which has much more powerful array
types, part of the SciPy library <http://www.scipy.org/>.
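
For instance, if the fields of interest are numeric, something along
these lines (a rough sketch; the column indices and the threshold are
invented) avoids the per-row Python lists entirely::

    import numpy as np

    # Load selected tab-separated columns into one compact array, then
    # filter with a boolean mask instead of a Python-level loop.
    data = np.loadtxt("bigfile", delimiter="\t", usecols=(0, 1, 2))
    kept = data[data[:, 2] > 0.5]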

-- 
 \“[It's] best to confuse only one issue at a time.” —Brian W. |
  `\  Kernighan, Dennis M. Ritchie, _The C programming language_, 1988 |
_o__)  |
Ben Finney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Q] get device major/minor number

2010-11-30 Thread Dan M
On Tue, 30 Nov 2010 21:09:14 +0100, Thomas Portmann wrote:

> Hello all,
> 
> In a script I would like to extract all device infos from block or
> character device. The "stat" function gives me most of the infos (mode,
> timestamp, user and group id, ...), however I did not find how to get
> the devices major and minor numbers. Of course I could do it by calling
> an external program, but is it possible to stay within python?
> 
> In the example below, I would like to get the major (8) and minor (0, 1,
> 2) numbers of /dev/sda{,1,2}. How can I get them?

I think the os.major() and os.minor() calls ought to do what you want.

>>> import os
>>> s = os.stat('/dev/sda1')
>>> os.major(s.st_rdev)
8
>>> os.minor(s.st_rdev)
1
>>> 

d...@dan:~$ ls -l /dev/sda1
brw-rw---- 1 root disk 8, 1 2010-11-18 05:41 /dev/sda1

-- 
http://mail.python.org/mailman/listinfo/python-list


[Q] get device major/minor number

2010-11-30 Thread Thomas Portmann
Hello all,

In a script I would like to extract all device infos from block or
character device. The "stat" function gives me most of the infos
(mode, timestamp, user and group id, ...), however I did not find how
to get the devices major and minor numbers. Of course I could do it by
calling an external program, but is it possible to stay within python?

In the example below, I would like to get the major (8) and minor (0,
1, 2) numbers of /dev/sda{,1,2}. How can I get them?

u...@host:~$ ls -l /dev/sda /dev/sda1 /dev/sda2
brw-rw---- 1 root disk 8, 0 Nov 30 19:10 /dev/sda
brw-rw---- 1 root disk 8, 1 Nov 30 19:10 /dev/sda1
brw-rw---- 1 root disk 8, 2 Nov 30 19:10 /dev/sda2
u...@host:~$ python3.1 -c 'import os
for el in ["","1","2"]: print(os.stat("/dev/sda"+el));'
posix.stat_result(st_mode=25008, st_ino=1776, st_dev=5, st_nlink=1,
st_uid=0, st_gid=6, st_size=0, st_atime=1291140641,
st_mtime=1291140640, st_ctime=1291140640)
posix.stat_result(st_mode=25008, st_ino=1780, st_dev=5, st_nlink=1,
st_uid=0, st_gid=6, st_size=0, st_atime=1291140644,
st_mtime=1291140641, st_ctime=1291140641)
posix.stat_result(st_mode=25008, st_ino=1781, st_dev=5, st_nlink=1,
st_uid=0, st_gid=6, st_size=0, st_atime=1291140644,
st_mtime=1291140641, st_ctime=1291140641)

Thanks


Tom
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Martin v. Loewis
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?

If your files on disk use file names encoded in iso-8859-1, don't set
your locale to a UTF-8 locale (as you apparently do), but set it to
a locale that actually matches the encoding that you use.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: C struct to Python

2010-11-30 Thread geremy condra
On Tue, Nov 30, 2010 at 10:57 AM, Eric Frederich
 wrote:
> I am not sure how to proceed.
> I am writing a Python interface to a C library.
> The C library uses structures.
> I was looking at the struct module but struct.unpack only seems to
> deal with data that was packed using struct.pack or some other buffer.
> All I have is the struct itself, a pointer in C.
> Is there a way to unpack directly from a memory address?
>
> Right now on the C side of things I can create a buffer of the struct
> data like so...
>
>    MyStruct ms;
>    unsigned char buffer[sizeof(MyStruct) + 1];
>    memcpy(buffer, &ms, sizeof(MyStruct));
>    return Py_BuildValue("s#", buffer, sizeof(MyStruct));
>
> Then on the Python side I can unpack it using struct.unpack.
>
> I'm just wondering if I need to jump through these hoops of packing it
> on the C side or if I can do it directly from Python.
>
> Thanks,
> ~Eric

ctypes[0] sounds like a possible solution, although if you're already
writing a C extension it might be better practice to just write a
Python object that wraps your C struct appropriately. If you're not
wedded to the C extension, though, I've had very good luck writing C
interfaces with with ctypes and a few useful decorators [1], [2].
Others prefer Cython[3], which I like for speed but which sometimes
seems to get in my way when I'm trying to interface with existing
code. There's a good, if somewhat dated, overview of a few other
strategies here[4].
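
A minimal ctypes sketch (the field names and types below are invented,
not your real MyStruct):

    import ctypes

    class MyStruct(ctypes.Structure):
        # Mirror the C struct's layout field by field.
        _fields_ = [("an_int", ctypes.c_int),
                    ("a_double", ctypes.c_double)]

    # Pretend this instance lives in memory owned by the C library.
    original = MyStruct(42, 3.14)
    address = ctypes.addressof(original)

    # Read the struct straight from that address -- no pack/unpack round trip.
    view = MyStruct.from_address(address)
    print(view.an_int, view.a_double)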

Geremy Condra

[0]: http://docs.python.org/library/ctypes.html
[1]: http://code.activestate.com/recipes/576734-c-struct-decorator/
[2]: http://code.activestate.com/recipes/576731/
[3]: http://www.cython.org/
[4]: http://www.suttoncourtenay.org.uk/duncan/accu/integratingpython.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Almost free iPod

2010-11-30 Thread iGet
I know nothing is ever free and that is true.  However, you can get
things really cheap.  Two offers I am working on right now are: (Copy
and Paste link into your web browser)

A Free iPod 64gb - http://www.YouriPodTouch4free.com/index.php?ref=6695331

Here is how it works:

You click on one of the links above, select the item you want, then
enter your email in the sign-up section.The next page it will ask
you if you want to do the offer as referral or points, I would suggest
referral. Now it is going to take you to your main page.  Here you
will need to complete a level A offer or 50 points in level B
offers.

Now you may have the question, is this legit.  Surf the internet about
these sites and you will find out that they are legit.  I will not
lie; it is hard to get the referrals needed to get the items.

A suggestion is try joining the Freebie Forums.  There are several
people at these forums doing the same thing we are doing and this may
help you get some referrals quicker.
-- 
http://mail.python.org/mailman/listinfo/python-list


C struct to Python

2010-11-30 Thread Eric Frederich
I am not sure how to proceed.
I am writing a Python interface to a C library.
The C library uses structures.
I was looking at the struct module but struct.unpack only seems to
deal with data that was packed using struct.pack or some other buffer.
All I have is the struct itself, a pointer in C.
Is there a way to unpack directly from a memory address?

Right now on the C side of things I can create a buffer of the struct
data like so...

MyStruct ms;
unsigned char buffer[sizeof(MyStruct) + 1];
memcpy(buffer, &ms, sizeof(MyStruct));
return Py_BuildValue("s#", buffer, sizeof(MyStruct));

Then on the Python side I can unpack it using struct.unpack.

I'm just wondering if I need to jump through these hoops of packing it
on the C side or if I can do it directly from Python.

Thanks,
~Eric
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using property() to extend Tkinter classes but Tkinter classes are old-style classes?

2010-11-30 Thread Robert Kern

On 11/30/10 11:00 AM, Giacomo Boffi wrote:

Terry Reedy  writes:


On 11/28/2010 3:47 PM, pyt...@bdurham.com wrote:

I had planned on subclassing Tkinter.Toplevel() using property() to wrap
access to properties like a window's title.
After much head scratching and a peek at the Tkinter.py source, I
realized that all Tkinter classes are old-style classes (even under
Python 2.7).
1. Is there a technical reason why Tkinter classes are still old-style
classes?


To not break old code. Being able to break code by upgrading all
classes in the stdlib was one of the reasons for 3.x.


In 3.x, are Tkinter classes still derived from old-style classes?


No.

[~]$ python3
Python 3.1.2 (r312:79360M, Mar 24 2010, 01:33:18)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tkinter
>>> tkinter.Tk.mro()
[<class 'tkinter.Tk'>, <class 'tkinter.Misc'>, <class 'tkinter.Wm'>, <class 'object'>]

>>>

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Using property() to extend Tkinter classes but Tkinter classes are old-style classes?

2010-11-30 Thread Hans Mulder

Giacomo Boffi wrote:

Terry Reedy  writes:


On 11/28/2010 3:47 PM, pyt...@bdurham.com wrote:

I had planned on subclassing Tkinter.Toplevel() using property() to wrap
access to properties like a window's title.
After much head scratching and a peek at the Tkinter.py source, I
realized that all Tkinter classes are old-style classes (even under
Python 2.7).
1. Is there a technical reason why Tkinter classes are still old-style
classes?

To not break old code. Being able to break code by upgrading all
classes in the stdlib was one of the reasons for 3.x.


In 3.x, are Tkinter classes still derived from old-style classes?


3.x does not provide old-style classes.

Oh, and the name Tkinter was changed to tkinter: all modules in the
standard library have lower case names in 3.x.

HTH,

-- HansM
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Peter Otten
Albert Hopkins wrote:

> On Tue, 2010-11-30 at 11:52 +0100, Peter Otten wrote:
>> Dan Stromberg wrote:
>> 
>> > I've got a couple of programs that read filenames from stdin, and then
>> > open those files and do things with them.  These programs sort of do
>> > the *ix xargs thing, without requiring xargs.
>> > 
>> > In Python 2, these work well.  Irrespective of how filenames are
>> > encoded, things are opened OK, because it's all just a stream of
>> > single byte characters.
>> 
>> I think you're wrong. The filenames' encoding as they are read from stdin
>> must be the same as the encoding used by the file system. If the file 
>> system expects UTF-8 and you feed it ISO-8859-1 you'll run into errors.
>> 
> I think this is wrong.  In Unix there is no concept of filename
> encoding.  Filenames can have any arbitrary set of bytes (except '/' and
> '\0').   But the filesystem itself neither knows nor cares about
> encoding.

I think you misunderstood what I was trying to say. If you write a list of 
filenames into files.txt, and use an encoding (ISO-8859-1, say) other than 
that used by the shell to display file names (on Linux typically UTF-8 these 
days) and then write a Python script exist.py that reads filenames and 
checks for the files' existence, 

$ python3 exist.py < files.txt

will report that a file

b'\xe4\xf6\xfc.txt' 

doesn't exist. The user looking at his editor with the encoding set to 
ISO-8859-1 seeing the line

äöü.txt

and then going to the console typing

$ ls
äöü.txt

will be confused even though everything is working correctly. 
The system may be shuffling bytes, but the user thinks in codepoints and 
sometimes assumes that codepoints and bytes are the same.

>> You always have to know either
>> 
>> (a) both the file system's and stdin's actual encoding, or
>> (b) that both encodings are the same.
>> 
>> 
> If this is true, then I think that it is wrong to do in Python3.  Any
> language should be able to deal with the filenames that the host OS
> allows.
> 
> Anyway, going on with the OP.. can you open stdin so that you can accept
> arbitrary bytes instead of strings and then open using the bytes as the
> filename? 

You can access the underlying stdin.buffer that feeds you the raw bytes with 
no attempt to shoehorn them into codepoints. You can use filenames that are 
not valid in the encoding that the system uses to display filenames:

$ ls
$ python3
Python 3.1.1+ (r311:74480, Nov  2 2009, 15:45:00)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> with open(b"\xe4\xf6\xfc.txt", "w") as f:
... f.write("hello\n")
...
6
>>>
$ ls
???.txt
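
Going the other way - reading names back from stdin as raw bytes - is the
same idea; a small sketch (assumes one file name per line on stdin):

    import os
    import sys

    # Test each incoming name without ever decoding it to str.
    for raw_name in sys.stdin.buffer.read().splitlines():
        print(raw_name, os.path.exists(raw_name))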

> I don't have that much experience with Python3 to say for sure.

Me neither.

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using property() to extend Tkinter classes but Tkinter classes are old-style classes?

2010-11-30 Thread Giacomo Boffi
Terry Reedy  writes:

> On 11/28/2010 3:47 PM, pyt...@bdurham.com wrote:
>> I had planned on subclassing Tkinter.Toplevel() using property() to wrap
>> access to properties like a window's title.
>> After much head scratching and a peek at the Tkinter.py source, I
>> realized that all Tkinter classes are old-style classes (even under
>> Python 2.7).
>> 1. Is there a technical reason why Tkinter classes are still old-style
>> classes?
>
> To not break old code. Being able to break code by upgrading all
> classes in the stdlib was one of the reasons for 3.x.

In 3.x, are Tkinter classes still derived from old-style classes?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: remote control firefox with python

2010-11-30 Thread baloan
On Nov 28, 4:22 pm, News123  wrote:
> Hi,
>
> I wondered whether there is a simpe way to
> 'remote' control fire fox with python.
>
> With remote controlling I mean:
> - enter a url in the title bar and click on it
> - create a new tab
> - enter another url click on it
> - save the html document of this page
> - Probably the most difficult one: emulate a click or 'right click' on a
> certain button or link of the current page.
> - other interesting things would be to be able to enter the master
>         password from a script
> - to enable disable proxy settings while running.
>
> The reason why I want to stay within Firefox and not use any other
> 'mechanize' frame work is, that the pages I want to automate might
> contain a lot of javascript for the construction of the actual page.
>
> Thanks in advance for any pointers ideas.

I have had some good experience with Sikuli.

http://sikuli.org/

Regards, Andreas
bal...@gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Iran slams Wiki-release as US psywar - WIKILEAKS is replacing those BIN LADEN communiques of CIA (the global ELITE) intended to threaten MASSES

2010-11-30 Thread small Pox
Iran slams Wiki-release as US psywar - WIKILEAKS is replacing those
BIN LADEN communiques of CIA (the global ELITE) intended to threaten
MASSES

CIA is the criminal agency of the global elite.

They want to destroy the middle class from the planet and also create
a global tyranny of a police state.

http://presstv.ir/detail/153128.html
http://presstv.ir/detail/153128.html
http://presstv.ir/detail/153128.html

Iran slams Wiki-release as US psywar
Mon Nov 29, 2010 12:56PM
Share | Email | Print
Iran's President Mahmoud Ahmadinejad has questioned the recent
'leaked' documents published by Wikileaks website, describing them as
part of a US "psychological warfare."


In response to a question by Press TV on Monday over the whistleblower
website's "leaks," President Mahmoud Ahmadinejad said "let me first
correct you. The material was not leaked, but rather released in an
organized effort."

"The US administration releases documents and makes a judgment based
on them. They are mostly like a psychological warfare and lack legal
basis," President Ahmadinejad told reporters on Monday.

"The documents will certainly have no political effects. Nations are
vigilant today and such moves will have no impact on international
relations," the Iranian chief executive added at the press briefing in
Tehran.

President Ahmadinejad stressed that the Wikileaks "game" is "not even
worth a discussion and that no one would waste their time analysing
them."

"The countries in the region are like friends and brothers and these
acts of mischief will not affect their relations," he concluded.

Talks with the West

The president announced that aside from Brazil and Turkey a number of
other countries may take part in the new round of talks between Iran
and the P5+1 -- Britain, China, France, Russia, the US, plus Germany.

Human rights

"They (Western powers) trample on the dignity of man, their identity
and real freedom. They infringe all of these and then they call it
human rights," Ahmadinejad said.

Earlier this month, the UN General Assembly's Third Committee accused
Iran of violating human rights regulations.

The 118-member Non-Aligned Movement and the 57-member Organization of
the Islamic Conference have condemned the resolution against the
Islamic Republic.

"In 2005, the human rights [issue] got a new mechanism in the United
Nations ... human rights was pushed away and human rights was used for
political manipulation," Secretary General of Iran's High Council for
Human Rights Mohammed Javad Larijani told Press TV following the vote
on the resolution.

This is while the United Nations Human Rights Council reviewed the US
human rights record for the first time in its history. The council
then issued a document making 228 suggestions to the US to improve its
rights record.

IAEA 'leak'

The president said that Iran has always had a positive relationship
with the International Atomic Energy Agency but criticized the UN
nuclear agency for caving under pressure from the "masters of power
and wealth."

The president said due to this pressure the IAEA has at times adopted
"unfair and illegal stances" against the Islamic Republic.

"Their recent one (IAEA report) is better than the previous ones and
is closer to the truth but still all the facts are not reflected," he
added. "Of course the latest report also has shortcomings, for example
all [of Iran's nuclear] information has been released and these are
secret and confidential documents belonging to the country."

Ahmadinejad said since Iran was following a policy of nuclear
transparency, it did not care about the leaks, but called the move
'illegal."

New world order

"The world needs order … an order in which different people form
different walks of life enjoy equal rights and proper dignity," the
president said in his opening speech before taking questions form
Iranian and foreign journalist.

The president added that the world was already on the path to setting
up this order.

Iran isolation

When asked to comment on the US and Western media claims that Iran has
become highly isolated in the region despite an active diplomacy with
Persian Gulf littoral states, the president said the remarks were part
of the "discourse of hegemony."

"In the hegemonic discourse, it seems that concepts and words take on
different meanings than those offered by dictionaries," Ahmadinejad
said.

"When they say they have isolated Iran, it means that they themselves
are isolated and when they say Iran is economically weak, it means
that it has strengthened," the president reasoned.

When they say there is a dictatorship somewhere, it means that country
is really chosen by the people and vise a versa, the president further
noted, adding, "I do not want to name names."

ZHD/HGH/SF/MMN/MB

Re: how to go on learning python

2010-11-30 Thread Alice Bevan–McGregor

Howdy Xavier!

[Apologies for the length of this; I didn't expect to write so much!]

I've been a Python programmer for many years now (having come from a 
PHP, Perl, C, and Pascal background) and I'm constantly learning new 
idioms and ways of doing things that are more "Pythonic"; cleaner, more 
efficient, or simply more beautiful.  I learn by coding, rather than by 
reading books, taking lectures, or sitting idly watching screencasts.  
I constantly try to break the problems I come up with in my head into 
smaller and smaller pieces, then write the software for those pieces in 
as elegant a method as possible.


Because of my "turtles all the way down" design philosophy, a lot of my 
spare time projects have no immediate demonstrable benefit; I code them 
for fun!  I have a folder full of hundreds of these little projects, 
the vast majority of which never see a public release.  I also collect 
little snippets of code that I come across[1] or write, and often 
experiment with performance tests[2] of small Python snippets.


Often I'll assign myself the task of doing something far outside my 
comfort zone; a recent example is writing a HTTP/1.1 web server.  I had 
no idea how to do low-level socket programming in Python, let alone how 
HTTP actually worked under-the-hood, and because my goal wasn't 
(originally) to produce a production-quality product for others it gave 
me the freedom to experiment, rewrite, and break things in as many ways 
as I wanted.  :)  I had people trying to convince me that I shouldn't 
re-invent the wheel ("just use Twisted!") though they mis-understood 
the reason for my re-invention: to learn.


It started as a toy 20-line script to dump a static HTTP/1.0 response 
on each request and has grown into a ~270 line fully HTTP/1.1 
compliant, ultra-performant multi-process HTTP server rivalling pretty 
much every other pure-Python web server I've tested.  (I still don't 
consider it production ready, though.)  Progressive enhancement as I 
came up with and implemented ideas meant that sometimes I had to 
rewrite it from scratch, but I'm quite proud of the result and have 
learned far more than I expected in the process.


While I don't necessarily study books on Python, I did reference HTTP: 
The Definitive Guide and many websites in developing that server, and I 
often use the Python Quick Reference[3] when I zone out and forget 
something basic or need to find something more advanced.


In terms of understanding how Python works, or how you can use certain 
semantics (or even better, why you'd want to!) Python Enhancement 
Proposals (PEPs) can be an invaluable resource.  For example, PEP 
318[4] defines what a decorator is, why they're useful, how they work, 
and how you can write your own.  Pretty much everything built into 
Python after Python 2.0 was first described, reasoned, and discussed in 
a PEP.


If you haven't seen this already, the Zen of Python[5] (a PEP) has many 
great guidelines.  I try to live and breathe the Zen.


So that's my story: how I learn to improve my own code.  My motto, 
"re-inventing the wheel, every time," is the short version of the 
above.  Of course, for commercial work I don't generally spend so much 
time on the nitty-gritty details; existing libraries are there for a 
reason, and, most of the time, Getting Things Done™ is more important 
than linguistic purity!  ;)


— Alice.

[1] https://github.com/GothAlice/Random/
[2] https://gist.github.com/405354
[3] http://rgruet.free.fr/PQR26/PQR2.6.html
[4] http://www.python.org/dev/peps/pep-0318/
[5] http://www.python.org/dev/peps/pep-0020/


--
http://mail.python.org/mailman/listinfo/python-list


Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Antoine Pitrou
On Tue, 30 Nov 2010 18:29:35 +0800
OW Ghim Siong  wrote:
> 
> Does anyone know why is there such a big difference memory usage when 
> storing the matrix as a list of list, and when storing it as a list of 
> string?

That's because any object has a fixed overhead (related to metadata and
allocation), so storing a matrix line as a sequence of several objects
rather than a single string makes the total overhead larger,
especially when the payload of each object is small.
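
A quick way to see the per-object cost (exact sizes vary with platform and
Python version; the sample line is made up):

    import sys

    line = "12.3\t45.6\t78.9"
    fields = line.split("\t")

    print(sys.getsizeof(line))                      # one string object
    print(sys.getsizeof(fields)                     # the list itself...
          + sum(sys.getsizeof(f) for f in fields))  # ...plus one object per field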

If you want to mitigate the issue, you could store your lines as tuples
rather than lists, since tuples have a smaller memory footprint:

matrix.append(tuple(data))

> According to __sizeof__ though, the values are the same whether 
> storing it as a list of list, or storing it as a list of string.

As mentioned by others, __sizeof__ only gives you the size of the
container, not the size of the contained values (which is where the
difference is here).

Regards

Antoine.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Antoine Pitrou
On Mon, 29 Nov 2010 21:52:07 -0800 (PST)
Yingjie Lan  wrote:
> --- On Tue, 11/30/10, Dan Stromberg  wrote:
> > In Python 3, I'm finding that I have encoding issues with
> > characters
> > with their high bit set.  Things are fine with strictly
> > ASCII
> > filenames.  With high-bit-set characters, even if I
> > change stdin's
> > encoding with:
> 
> Co-ask. I have also had problems with file names in
> Chinese characters with Python 3. I unzipped the 
> turtle demo files into the desktop folder (of
> course, the word 'desktop' is in Chinese, it is
> a windows XP system, localization is Chinese), then
> all in a sudden some of the demos won't work
> anymore. But if I move it to a folder whose 
> path contains only english characters, everything
> comes back to normal.

Can you try the latest 3.2alpha4 (*) and check if this is fixed?
If not, then could you please open a bug on http://bugs.python.org ?

(*) http://python.org/download/releases/3.2/

Thank you

Antoine.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 2.7.1

2010-11-30 Thread Antoine Pitrou
On Mon, 29 Nov 2010 15:11:28 -0800 (PST)
Spider  wrote:
> > 2.7 includes many features that were first released in Python 3.1. The 
> > faster io module ...
> 
> I understand that I/O in Python 3.0 was slower than 2.x (due to quite
> a lot of the code being in Python rather than C, I gather), and that
> this was fixed up in 3.1. So, io in 3.1 is faster than in 3.0.
> 
> Is it also true that io is faster in 2.7 than 2.6? That's what the
> release notes imply, but I wonder whether that comment has been back-
> ported from the 3.1 release notes, and doesn't actually apply to 2.7.

The `io` module, which was backported from 3.1/3.2, is faster than in
2.6, but that's not what is used by default in 2.x when calling e.g.
open() or file() (you'd have to use io.open() instead).

So, as you suspect, the speed of I/O in 2.7 hasn't changed. The `io`
module is available in 2.6/2.7 so that you can experiment with some 3.x
features without switching, and in this case it's much faster than 2.6.
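
In other words, in 2.6/2.7 you have to opt in explicitly, e.g. (just a
sketch; the file name is made up):

    import io

    # The backported io module is only used when you ask for it:
    with io.open("data.txt", "r", encoding="utf-8") as f:   # new buffered/text I/O
        text = f.read()

    with open("data.txt", "r") as f:                        # classic 2.x file object
        legacy = f.read()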

Regards

Antoine.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Tim Chase

On 11/30/2010 04:29 AM, OW Ghim Siong wrote:

a=open("bigfile")
matrix=[]
while True:
 lines = a.readlines(1)
 for line in lines:
 data=line.split("\t")
 if several_conditions_are_satisfied:
 matrix.append(data)
 print "Number of lines read:", len(lines), "matrix.__sizeof__:",
matrix.__sizeof__()
 if len(lines)==0:
 break


As others have mentiond, don't use .readlines() but use the 
file-object as an iterator instead.  This can even be rewritten 
as a simple list-comprehension:


  from csv import reader
  matrix = [data
            for data
            in reader(file('bigfile.txt', 'rb'), delimiter='\t')
            if several_conditions_are_satisfied(data)
            ]

Assuming that you're throwing away most of the data (the final 
"matrix" fits well within memory, even if the source file doesn't).


-tkc



--
http://mail.python.org/mailman/listinfo/python-list


Help: problem in setting the background colour ListBox

2010-11-30 Thread ton ph
Hi everyone,
I have a requirement to display my data in a textCtrl-like widget, but
I need the data in each row to be clickable, so that when I click a row
it fires an event and gives me the selected data value. After a long
search I found ListBox to be perfect for my use, but when I try to set
the background colour to match my application's colour scheme I am not
able to do so, though I am able to set the foreground colour.
Hope someone will guide me in solving my problem.
Thanks
-- 
http://mail.python.org/mailman/listinfo/python-list


Programming games in historical linguistics with Python

2010-11-30 Thread Dax Bloom
Hello,

Following a discussion that began 3 weeks ago I would like to ask a
question regarding substitution of letters according to grammatical
rules in historical linguistics. I would like to automate the
transformation of words according to complex rules of phonology and
integrate that script in a visual environment.
Here follows the previous thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/3c55f9f044c3252f/fe7c2c82ecf0dbf5?lnk=gst&q=evolutionary+linguistics#fe7c2c82ecf0dbf5

Is there a way to refer to vowels and consonants as a subcategory of
text? Is there a function to remove all vowels? How should one create
and order the dictionary file for the rules? How to chain several
transformations automatically from multiple rules? Finally can anyone
show me what existing python program or phonological software can do
this?
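
For the vowel question, something like this is what I have in mind (a
rough sketch; the vowel inventory is just an assumption for Latin-script
text):

    import re

    VOWELS = "aeiouAEIOU"

    def remove_vowels(word):
        # Drop every Latin vowel; real phonological rules would need a
        # language-specific inventory.
        return re.sub("[%s]" % VOWELS, "", word)

    print(remove_vowels("linguistics"))   # -> lngstcs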

What function could tag syllables, the word nucleus and the codas? How
easy is it to bridge this with a more visual environment where
interlinear, aligned text can be displayed with Greek notations and
braces as usual in the phonology textbooks?

Best regards,

Dax Bloom
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How does GC affect generator context managers?

2010-11-30 Thread Duncan Booth
Jason  wrote:

> As I understood it, when the "with" block exits, the __exit__() method
> is called immediately. This calls the next() method on the underlying
> generator, which forces it to run to completion (and raise a
> StopIteration), which includes the finally clause... right?
> 
That is true if the "with" block exits, but if the "with" block (or 
"try".."finally" block) contains "yield" you have a generator. In that case 
if you simply drop the generator on the floor the cleanup at the end of the 
"with" will still happen, but maybe not until the generator is garbage 
collected.

def foo():
    with open("foo") as foo:
        for line in foo:
            yield line

...

bar = foo()
print bar.next()
del bar # May close the file now or maybe later...

   


-- 
Duncan Booth http://kupuguy.blogspot.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Needed: Real-world examples for Python's Cooperative Multiple Inheritance

2010-11-30 Thread Sol Toure
Most of the examples presented here can use the "decorator pattern" instead,
especially the window system; for instance:
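
A rough sketch of the composition idea (the class names are invented):

    class Window(object):
        def draw(self):
            return "window"

    class ScrollBarDecorator(object):
        def __init__(self, inner):
            self.inner = inner
        def draw(self):
            return self.inner.draw() + " + scroll bar"

    class DropDownMenuDecorator(object):
        def __init__(self, inner):
            self.inner = inner
        def draw(self):
            return self.inner.draw() + " + drop-down menu"

    print(DropDownMenuDecorator(ScrollBarDecorator(Window())).draw())
    # -> window + scroll bar + drop-down menu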


On Mon, Nov 29, 2010 at 5:27 PM, Gregory Ewing
wrote:

> Paul Rubin wrote:
>
>  The classic example though is a window system, where you have a "window"
>> class, and a "scroll bar" class, and a "drop-down menu" class, etc. and
>> if you want a window with a scroll bar and a drop-down menu, you inherit
>> from all three of those classes.
>>
>
> Not in any GUI library I've ever seen. Normally there would
> be three objects involved in such an arrangement, a Window,
> a ScrollBar and a DropDownMenu, connected to each other in
> some way.
>
> --
> Greg
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
http://www.afroblend.com
African news as it happens.
-- 
http://mail.python.org/mailman/listinfo/python-list


how to go on learning python

2010-11-30 Thread Xavier Heruacles
I'm basically a C/C++ programmer and recently came to Python for some web
development. Using Django and JavaScript I can now develop some web
applications. But often I feel I'm not good at Python. I don't know much
about generators, descriptors and decorators (although I can use some of
them to accomplish things, I don't think I really know their internals).
I find my code ugly, and it seems nearly everything has already been done
by the libraries. When I want to do something, I just find some libraries
or modules and then just finish the work. So I'm a bit tired of just doing
this kind of high-level scripting, only to find myself a bad programmer.
My question is: after one has coded some kind of basic app, how can one
keep on learning programming with Python?
Do some more interesting projects? Read more general books about
programming? or...?
-- 
http://mail.python.org/mailman/listinfo/python-list


nike shoes , fashi on clothes ; brand hand bags

2010-11-30 Thread SA sada
 Dear customers, thank you for your support of our company.
Here, there's good news to tell you: The company recently
launched a number of new fashion items! ! Fashionable
and welcome everyone to come buy. If necessary, please
plut: http://www.vipshops.org ==

1) More pictures available on our website (= http://www.vipshops.org
)
2) Many colors available .
3) Perfect quality,
4) 100% safe door to door delivery,
Best reputation , Best services
Posted: 4:13 pm on November 21st
-- 
http://mail.python.org/mailman/listinfo/python-list


How does GC affect generator context managers?

2010-11-30 Thread Jason
I've been reading through the docs for contextlib and PEP 343, and
came across this:

Note that we're not guaranteeing that the finally-clause is
executed immediately after the generator object becomes unused,
even though this is how it will work in CPython.

...referring to context managers created via the
contextlib.contextmanager decorator containing cleanup code in a
"finally" clause. While I understand that Python-the-language does not
specify GC semantics, and different implementations can do different
things with that, what I don't get is how GC even relates to a context
manager created from a generator.

As I understood it, when the "with" block exits, the __exit__() method
is called immediately. This calls the next() method on the underlying
generator, which forces it to run to completion (and raise a
StopIteration), which includes the finally clause... right?

— Jason
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Albert Hopkins
On Tue, 2010-11-30 at 11:52 +0100, Peter Otten wrote:
> Dan Stromberg wrote:
> 
> > I've got a couple of programs that read filenames from stdin, and then
> > open those files and do things with them.  These programs sort of do
> > the *ix xargs thing, without requiring xargs.
> > 
> > In Python 2, these work well.  Irrespective of how filenames are
> > encoded, things are opened OK, because it's all just a stream of
> > single byte characters.
> 
> I think you're wrong. The filenames' encoding as they are read from stdin
> must be the same as the encoding used by the file system. If the file system
> expects UTF-8 and you feed it ISO-8859-1 you'll run into errors.
> 
I think this is wrong.  In Unix there is no concept of filename
encoding.  Filenames can have any arbitrary set of bytes (except '/' and
'\0').   But the filesystem itself neither knows nor cares about
encoding.

> You always have to know either
> 
> (a) both the file system's and stdin's actual encoding, or 
> (b) that both encodings are the same.
> 
> 
If this is true, then I think that it is wrong to do in Python3.  Any
language should be able to deal with the filenames that the host OS
allows.

Anyway, going on with the OP.. can you open stdin so that you can accept
arbitrary bytes instead of strings and then open using the bytes as the
filename? I don't have that much experience with Python3 to say for
sure.

-a


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: TDD in python

2010-11-30 Thread Roy Smith
In article 
<58fe3680-21f5-42f8-9341-e069cbb88...@r19g2000prm.googlegroups.com>,
 rustom  wrote:

> Looking around I found this:
> http://bytes.com/topic/python/answers/43330-unittest-vs-py-test
> where Raymond Hettinger no less says quite unequivocally that he
> prefers test.py to builtin unittest
> because it is not so heavy-weight
> 
> Is this the general consensus nowadays among pythonistas?
> [Note I tend to agree but Ive no experience so asking]

Both frameworks have their fans; I doubt you'll find any consensus.

Pick one, learn it, and use it.  What's important is that you write 
tests, write lots of tests, and write good tests.  Which framework you 
use is a detail.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: remote control firefox with python

2010-11-30 Thread Hans-Peter Jansen
On Sunday 28 November 2010, 16:22:33 News123 wrote:
> Hi,
>
>
> I wondered whether there is a simpe way to
> 'remote' control fire fox with python.
>
>
> With remote controlling I mean:
> - enter a url in the title bar and click on it
> - create a new tab
> - enter another url click on it
> - save the html document of this page
> - Probably the most difficult one: emulate a click or 'right click'
> on a certain button or link of the current page.
> - other interesting things would be to be able to enter the master
>   password from a script
> - to enable disable proxy settings while running.
>
> The reason why I want to stay within Firefox and not use any other
> 'mechanize' frame work is, that the pages I want to automate might
> contain a lot of javascript for the construction of the actual page.

If webkit based rendering in an option (since its javascript engine is 
respected by web developers nowadays..), you might want to check out  
PyQt, based on current versions of Qt. It provides very easy access to 
a full featured web browser engine without sacrificing low level 
details. All your requirements are provided easily (if you're able to 
grok the Qt documentation, e.g. ignore all C++ clutter, you're set).

I've transcoded all available QtWebKit examples to python lately, 
available here:

http://www.riverbankcomputing.com/pipermail/pyqt/2010-November/028614.html

The attachment is a tar.bz2 archive, btw.

Clicking is archived by:

webelement.evaluateJavaScript(
"var event = document.createEvent('MouseEvents');"
"event.initEvent('click', true, true);"
"this.dispatchEvent(event);"
)

Cheers,
Pete
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Peter Otten
OW Ghim Siong wrote:

> Hi all,
> 
> I have a big file 1.5GB in size, with about 6 million lines of
> tab-delimited data. I have to perform some filtration on the data and
> keep the good data. After filtration, I have about 5.5 million data left
> remaining. As you might already guessed, I have to read them in batches
> and I did so using .readlines(1). After reading each batch, I
> will split the line (in string format) to a list using .split("\t") and
> then check several conditions, after which if all conditions are
> satisfied, I will store the list into a matrix.
> 
> The code is as follows:
> -Start--
> a=open("bigfile")
> matrix=[]
> while True:
>     lines = a.readlines(1)
>     for line in lines:
>         data=line.split("\t")
>         if several_conditions_are_satisfied:
>             matrix.append(data)
>     print "Number of lines read:", len(lines), "matrix.__sizeof__:", matrix.__sizeof__()
>     if len(lines)==0:
>         break
> -End-

As Ulrich says, don't use readlines(), use

for line in a:
   ... 

that way you have only one line in memory at a time instead of the huge 
lines list.

> Results:
> Number of lines read: 461544 matrix.__sizeof__: 1694768
> Number of lines read: 449840 matrix.__sizeof__: 3435984
> Number of lines read: 455690 matrix.__sizeof__: 5503904
> Number of lines read: 451955 matrix.__sizeof__: 6965928
> Number of lines read: 452645 matrix.__sizeof__: 8816304
> Number of lines read: 448555 matrix.__sizeof__: 9918368
> 
> Traceback (most recent call last):
> MemoryError
> 
> The peak memory usage at the task manager is > 2GB which results in the
> memory error.
> 
> However, if I modify the code, to store as a list of string rather than
> a list of list by changing the append statement stated above to
> "matrix.append("\t".join(data))", then I do not run out of memory.
> 
> Results:
> Number of lines read: 461544 matrix.__sizeof__: 1694768
> Number of lines read: 449840 matrix.__sizeof__: 3435984
> Number of lines read: 455690 matrix.__sizeof__: 5503904
> Number of lines read: 451955 matrix.__sizeof__: 6965928
> Number of lines read: 452645 matrix.__sizeof__: 8816304
> Number of lines read: 448555 matrix.__sizeof__: 9918368
> Number of lines read: 453455 matrix.__sizeof__: 12552984
> Number of lines read: 432440 matrix.__sizeof__: 14122132
> Number of lines read: 432921 matrix.__sizeof__: 15887424
> Number of lines read: 464259 matrix.__sizeof__: 17873376
> Number of lines read: 450875 matrix.__sizeof__: 20107572
> Number of lines read: 458552 matrix.__sizeof__: 20107572
> Number of lines read: 453261 matrix.__sizeof__: 22621044
> Number of lines read: 413456 matrix.__sizeof__: 22621044
> Number of lines read: 166464 matrix.__sizeof__: 25448700
> Number of lines read: 0 matrix.__sizeof__: 25448700
> 
> In this case, the peak memory according to the task manager is about 1.5
> GB.
> 
> Does anyone know why is there such a big difference memory usage when
> storing the matrix as a list of list, and when storing it as a list of
> string? According to __sizeof__ though, the values are the same whether
> storing it as a list of list, or storing it as a list of string. Is

sizeof gives you the "shallow" size of the list, basically the memory to 
hold C pointers to the items in the list. A better approximation for the 
total size of a list of lists of string is

>>> from sys import getsizeof as sizeof
>>> matrix = [["alpha", "beta"], ["gamma", "delta"]]
>>> sizeof(matrix), sum(sizeof(row) for row in matrix), sum(sizeof(entry) 
for row in matrix for entry in row)
(88, 176, 179)
>>> sum(_)
443

As you can see the outer list requires only a small portion of the total 
memory, and its relative size will decrease as the matrix grows.

The above calculation may still be wrong because some of the strings could 
be identical. Collapsing identical strings into a single object is also a 
way to save memory if you have a significant number of repetitions. Try

matrix = []
with open(...) as f:
    for line in f:
        data = line.split("\t")
        if ...:
            matrix.append(map(intern, data))

to see whether it sufficiently reduces the amount of memory needed.

> there any methods how I can store all the info into a list of list? I
> have tried creating such a matrix of equivalent size and it only uses
> 35mb of memory but I am not sure why when using the code above, the
> memory usage shot up so fast and exceeded 2GB.
> 
> Any advice is greatly appreciated.
> 
> Regards,
> Jinxiang

-- 
http://mail.python.org/mailman/listinfo/python-list


ANNOUNCE: NHI1-0.10, PLMK-1.8 und libmsgque-4.8

2010-11-30 Thread Andreas Otto
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dear User,


ANNOUNCE:Major Feature Release


  libmsgque: Application-Server-Toolkit for
 C, C++, JAVA, C#, Go, TCL, PERL, PHP, PYTHON, RUBY, VB.NET
  PLMK:  Programming-Language-Microkernel
  NHI1:  Non-Human-Intelligence #1


STATEMENT
=

"It takes 2 years"
"and a team of qualified software developers"
"to implement a new programming language,"
"but it takes only 2 weeks to add a micro-kernel"
- - aotto1968


SUMMARY
===


Add support from the programming language "Go" from Google


LINKS
=

  UPDATE - PLMK definition
   >
http://openfacts2.berlios.de/wikien/index.php/BerliosProject:NHI1_-_TheKernel
  ChangeLog:
   > http://nhi1.berlios.de/theLink/changelog.htm
  libmsgque including PHP documentation:
   > http://nhi1.berlios.de/theLink/index.htm
  NHI1:
   > http://nhi1.berlios.de/
  DOWNLOAD:
   > http://developer.berlios.de/projects/nhi1/
  Go man pages:
   > reference: gomsgqueref.n
   > tutorial:  gomsgquetut.n



mfg, Andreas Otto (aotto1968)
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJM9OZsAAoJEGTcPijNG3/A+qwH/1WT3K8619eLzQ78dylS623r
qrZtHXRxieD+4GIBgkU7KbNu+LGztxasLW9upafmmF2mGcWtIFuiOEJtw6MJM+07
0X7elXM5WZkXK65dbLE5bbSfO0DHw5T6aIweogA3zjcjDbB3rSC/T6WIlZB4HNYh
nBj9xC6WMP7s/jEjs4i5FCRT6gTRzDDJbR+SXqNEEYc/z8wVKPUDfpU/6JGxl9MV
rPSUsO+YdZX0XI7+imiUYSVyt+kniL3C36kGON/qGDahscoQYFS6GdoI5XDzI0c+
jN7Q2Ecrphd5F5G/2plNLbVy4mPVd9k/I8VjXMaHLm+skT2Z4Zt7aF29A1FFw68=
=/O74
-END PGP SIGNATURE-
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread Ulrich Eckhardt
OW Ghim Siong wrote:
> I have a big file 1.5GB in size, with about 6 million lines of
> tab-delimited data.

How many fields are there an each line?

> I have to perform some filtration on the data and 
> keep the good data. After filtration, I have about 5.5 million data left
> remaining. As you might already guessed, I have to read them in batches
> and I did so using .readlines(1).

I'd have guessed differently. Typically, I would say that you read one line,
apply whatever operation you want to it and then write out the result. At
least that is the "typical" operation of filtering.

> a=open("bigfile")

I guess you are on MS Windows. There, you have different handling of textual
and non-textual files with regards to the handling of line endings.
Generally, using non-textual as input is easier, because it doesn't require
any translations. However, textual input is the default, therefore:

  a = open("bigfile", "rb")

Or, even better:

 with open("bigfile", "rb") as a:

to make sure the file is closed correctly and in time.

> matrix=[]
> while True:
> lines = a.readlines(1)
> for line in lines:

I believe you could do

  for line in a:
      # use line here

> data=line.split("\t")

Question here: How many elements does each line contain? And what is their
content? The point is that each object has its overhead, and if the content
is just e.g. an integral number or a short string, the ratio of interesting
content to overhead is rather bad! Compare this to storing a longer string
with just the overhead of a single string object instead, it should be
obvious.

> However, if I modify the code, to store as a list of string rather than
> a list of list by changing the append statement stated above to
> "matrix.append("\t".join(data))", then I do not run out of memory.

You already have the result of that join:

  matrix.append(line)

> Does anyone know why is there such a big difference memory usage when
> storing the matrix as a list of list, and when storing it as a list of
> string? According to __sizeof__ though, the values are the same whether
> storing it as a list of list, or storing it as a list of string.

I can barely believe that. How are you using __sizeof__? Why aren't you
using sys.getsizeof() instead? Are you aware that the size of a list
doesn't include the size for its content (even though it grows with the
number of elements), while the size of a string does?


> Is there any methods how I can store all the info into a list of list? I
> have tried creating such a matrix of equivalent size and it only uses
> 35mb of memory but I am not sure why when using the code above, the
> memory usage shot up so fast and exceeded 2GB.

The size of an empty list is 20 here, plus 4 per element (makes sense on a
32-bit machine), excluding the elements themselves. That means that you
have around 8M elements (25448700/4). These take around 32MB of memory,
which is what you are probably seeing. The point is that your 35mb don't
include any content, probably just a single interned integer or None, so
that all elements of your list are the same and only require memory once.
In your real-world application that is obviously not so.

My suggestions:
1. Find out what exactly is going on here, in particular why our
interpretations of the memory usage differ.
2. Redesign your code to really use a filtering design, i.e. don't keep the
whole data in memory.
3. If you still have memory issues, take a look at the array library, which
should make storage of large arrays a bit more efficient.
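
For point 3, a rough sketch of what the array module buys you (the column
index and the type code are assumptions):

    from array import array

    # Keep one numeric column as a compact C-typed array ('d' = double)
    # instead of millions of small Python string objects.
    column = array('d')
    with open("bigfile") as f:
        for line in f:
            fields = line.split("\t")
            column.append(float(fields[0]))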


Good luck!

Uli

-- 
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

2010-11-30 Thread Peter Otten
Dan Stromberg wrote:

> I've got a couple of programs that read filenames from stdin, and then
> open those files and do things with them.  These programs sort of do
> the *ix xargs thing, without requiring xargs.
> 
> In Python 2, these work well.  Irrespective of how filenames are
> encoded, things are opened OK, because it's all just a stream of
> single byte characters.

I think you're wrong. The filenames' encoding as they are read from stdin 
must be the same as the encoding used by the file system. If the file system 
expects UTF-8 and you feed it ISO-8859-1 you'll run into errors.

You always have to know either

(a) both the file system's and stdin's actual encoding, or 
(b) that both encodings are the same.

If byte strings work you are in situation (b) or just lucky. I'd guess the 
latter ;)
 
> In Python 3, I'm finding that I have encoding issues with characters
> with their high bit set.  Things are fine with strictly ASCII
> filenames.  With high-bit-set characters, even if I change stdin's
> encoding with:
> 
> import io
> STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')

I suppose you can handle (b) with

STDIN = sys.stdin.buffer

or

STDIN = io.TextIOWrapper(sys.stdin.buffer,
 encoding=sys.getfilesystemencoding())

in Python 3. I'd prefer the latter because it makes your assumptions 
explicit. (Disclaimer: I'm not sure whether I'm using the io API as Guido 
intended it)

> ...even with that, when I read a filename from stdin with a
> single-character Spanish n~, the program cannot open that filename
> because the n~ is apparently internally converted to two bytes, but
> remains one byte in the filesystem.  I decided to try ISO-8859-1 with
> Python 3, because I have a Java program that encountered a similar
> problem until I used en_US.ISO-8859-1 in an environment variable to
> set the JVM's encoding for stdin.
> 
> Python 2 shows the n~ as 0xf1 in an os.listdir('.').  Python 3 with an
> encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
> 
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?
> 
> TIA!


-- 
http://mail.python.org/mailman/listinfo/python-list


Memory issues when storing as List of Strings vs List of List

2010-11-30 Thread OW Ghim Siong

Hi all,

I have a big file 1.5GB in size, with about 6 million lines of 
tab-delimited data. I have to perform some filtration on the data and 
keep the good data. After filtration, I have about 5.5 million data left 
remaining. As you might already guessed, I have to read them in batches 
and I did so using .readlines(1). After reading each batch, I 
will split the line (in string format) to a list using .split("\t") and 
then check several conditions, after which if all conditions are 
satisfied, I will store the list into a matrix.


The code is as follows:
-Start--
a=open("bigfile")
matrix=[]
while True:
    lines = a.readlines(1)
    for line in lines:
        data=line.split("\t")
        if several_conditions_are_satisfied:
            matrix.append(data)
    print "Number of lines read:", len(lines), "matrix.__sizeof__:", matrix.__sizeof__()

    if len(lines)==0:
        break
-End-

Results:
Number of lines read: 461544 matrix.__sizeof__: 1694768
Number of lines read: 449840 matrix.__sizeof__: 3435984
Number of lines read: 455690 matrix.__sizeof__: 5503904
Number of lines read: 451955 matrix.__sizeof__: 6965928
Number of lines read: 452645 matrix.__sizeof__: 8816304
Number of lines read: 448555 matrix.__sizeof__: 9918368

Traceback (most recent call last):
MemoryError

The peak memory usage at the task manager is > 2GB which results in the 
memory error.


However, if I modify the code, to store as a list of string rather than 
a list of list by changing the append statement stated above to 
"matrix.append("\t".join(data))", then I do not run out of memory.


Results:
Number of lines read: 461544 matrix.__sizeof__: 1694768
Number of lines read: 449840 matrix.__sizeof__: 3435984
Number of lines read: 455690 matrix.__sizeof__: 5503904
Number of lines read: 451955 matrix.__sizeof__: 6965928
Number of lines read: 452645 matrix.__sizeof__: 8816304
Number of lines read: 448555 matrix.__sizeof__: 9918368
Number of lines read: 453455 matrix.__sizeof__: 12552984
Number of lines read: 432440 matrix.__sizeof__: 14122132
Number of lines read: 432921 matrix.__sizeof__: 15887424
Number of lines read: 464259 matrix.__sizeof__: 17873376
Number of lines read: 450875 matrix.__sizeof__: 20107572
Number of lines read: 458552 matrix.__sizeof__: 20107572
Number of lines read: 453261 matrix.__sizeof__: 22621044
Number of lines read: 413456 matrix.__sizeof__: 22621044
Number of lines read: 166464 matrix.__sizeof__: 25448700
Number of lines read: 0 matrix.__sizeof__: 25448700

In this case, the peak memory according to the task manager is about 1.5 GB.

Does anyone know why is there such a big difference memory usage when 
storing the matrix as a list of list, and when storing it as a list of 
string? According to __sizeof__ though, the values are the same whether 
storing it as a list of list, or storing it as a list of string. Is 
there any methods how I can store all the info into a list of list? I 
have tried creating such a matrix of equivalent size and it only uses 
35mb of memory but I am not sure why when using the code above, the 
memory usage shot up so fast and exceeded 2GB.


Any advice is greatly appreciated.

Regards,
Jinxiang
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 encoding question: Read a filename from stdin, subsequently?open that filename

2010-11-30 Thread Marc Christiansen
Dan Stromberg  wrote:
> I've got a couple of programs that read filenames from stdin, and then
> open those files and do things with them.  These programs sort of do
> the *ix xargs thing, without requiring xargs.
> 
> In Python 2, these work well.  Irrespective of how filenames are
> encoded, things are opened OK, because it's all just a stream of
> single byte characters.
> 
> In Python 3, I'm finding that I have encoding issues with characters
> with their high bit set.  Things are fine with strictly ASCII
> filenames.  With high-bit-set characters, even if I change stdin's
> encoding with:
> 
>       import io
>       STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')
> 
> ...even with that, when I read a filename from stdin with a
> single-character Spanish n~, the program cannot open that filename
> because the n~ is apparently internally converted to two bytes, but
> remains one byte in the filesystem.  I decided to try ISO-8859-1 with
> Python 3, because I have a Java program that encountered a similar
> problem until I used en_US.ISO-8859-1 in an environment variable to
> set the JVM's encoding for stdin.
> 
> Python 2 shows the n~ as 0xf1 in an os.listdir('.').  Python 3 with an
> encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
> 
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?
> 
> TIA!

Try using sys.stdin.buffer instead of sys.stdin. It gives you bytes
instead of strings. Also use byteliterals instead of stringliterals for
paths, i.e. os.listdir(b'.').

Marc
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Possible to determine number of rows affected by a SQLite update or delete command?

2010-11-30 Thread Kushal Kumaran
On Tue, Nov 30, 2010 at 2:29 PM,   wrote:
> Is there a cursor or connection property that returns the number of rows
> affected by a SQLite update or delete command?
>

The cursor has a rowcount attribute.  The documentation of the sqlite3
module says the implementation is "quirky".  You might take a look at
it and see if it fits your needs.
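
For example (a quick sketch with an in-memory database; the table and the
data are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,)])

    cur = conn.execute("UPDATE t SET x = x + 1 WHERE x > 1")
    print(cur.rowcount)   # number of rows the UPDATE touched; 2 here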

> Or, if we want this information, do we have to pre-query our database for a
> count of records that will be affected by an operation?
>

-- 
regards,
kushal
-- 
http://mail.python.org/mailman/listinfo/python-list


Possible to determine number of rows affected by a SQLite update or delete command?

2010-11-30 Thread python
Is there a cursor or connection property that returns the number
of rows affected by a SQLite update or delete command?

Or, if we want this information, do we have to pre-query our
database for a count of records that will be affected by an
operation?

Thank you,
Malcolm
-- 
http://mail.python.org/mailman/listinfo/python-list