Re: Object cleanup

2012-05-31 Thread psaff...@googlemail.com
Thanks for all the responses. It looks like none of the BeautifulSoup objects have __del__ methods, so I don't think that can be the problem. To answer your other question, guppy was the best match I came up with when looking for a memory profiler for Python (or more specifically "Heapy"): http
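
A minimal sketch of Heapy in use, assuming the usual entry point; the scraping pass marked by the comment is a placeholder:

    from guppy import hpy

    h = hpy()
    h.setrelheap()          # count only objects allocated after this point
    # ... run one scraping pass here ...
    print h.heap()          # table of live objects, largest types first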

Object cleanup

2012-05-30 Thread psaff...@googlemail.com
I am writing a screen scraping application using BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/ (which is fantastic, by the way). I have an object that has two methods, each of which loads an HTML document and scrapes out some information, putting strings from the HTML documents i
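
A minimal sketch of the shape being described, assuming the BeautifulSoup 3.x API of the time; the class, method and tag names are illustrative, not from the thread:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    class PageScraper(object):
        def __init__(self):
            self.headings = []
            self.links = []

        def scrape_headings(self, url):
            soup = BeautifulSoup(urllib2.urlopen(url).read())
            self.headings.extend(h.string for h in soup.findAll("h2"))

        def scrape_links(self, url):
            soup = BeautifulSoup(urllib2.urlopen(url).read())
            self.links.extend(a["href"] for a in soup.findAll("a", href=True))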

Overlapping region resolution

2009-05-21 Thread psaff...@googlemail.com
This may be an algorithmic question, but I'm trying to code it in Python, so... I have a list of pairwise regions, each with an integer start and end and a float data point. There may be overlaps between the regions. I want to resolve this into an ordered list with no overlapping regions. My init
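
One straightforward sketch: cut the regions at every boundary point and combine the values of whichever regions cover each elementary interval. Averaging overlapping values is an assumption; the thread's combining rule isn't shown:

    def resolve(regions):
        # regions: list of (start, end, value) tuples, possibly overlapping
        bounds = sorted(set([r[0] for r in regions] + [r[1] for r in regions]))
        out = []
        for lo, hi in zip(bounds, bounds[1:]):
            covering = [v for s, e, v in regions if s <= lo and hi <= e]
            if covering:
                out.append((lo, hi, sum(covering) / float(len(covering))))
        return out

    >>> resolve([(0, 10, 1.0), (5, 15, 3.0)])
    [(0, 5, 1.0), (5, 10, 2.0), (10, 15, 3.0)]

This is O(n^2) in the number of regions; for large inputs the covering test wants an interval tree or a sweep with a heap.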

Re: CSV performance

2009-04-29 Thread psaff...@googlemail.com
> rows = fh.read().split()
> coords = numpy.array(map(int, rows[1::3]), dtype=int)
> points = numpy.array(map(float, rows[2::3]), dtype=float)
> chromio.writelines(map(chrommap.__getitem__, rows[::3]))

My original version is about 15 seconds. This version is about 9. The chunks version posted

Multiprocessing Pool and functions with many arguments

2009-04-29 Thread psaff...@googlemail.com
I'm trying to get to grips with the multiprocessing module, having only used Parallel Python before. Based on this example: http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers - what happens if I want my "f" to take more than one argument? I want to have a list of tuples of
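
The usual workaround, as a sketch: Pool.map delivers exactly one argument, so pass tuples and unpack them in a module-level wrapper. "f" and the data below are placeholders; later Pythons (3.3+) add Pool.starmap for exactly this case:

    from multiprocessing import Pool

    def f(a, b):
        return a + b

    def f_star(args):
        # Pool.map passes a single argument, so unpack the tuple here
        return f(*args)

    if __name__ == "__main__":
        pool = Pool(processes=4)
        print pool.map(f_star, [(1, 2), (3, 4), (5, 6)])   # [3, 7, 11]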

Re: CSV performance

2009-04-27 Thread psaff...@googlemail.com
Thanks for your replies. Many apologies for not including the right information first time around. More information is below. I have tried running it just on the csv read:

    import time
    import csv

    afile = "largefile.txt"

    t0 = time.clock()
    print "working at file", afile
    reader = csv.reader(open(a

CSV performance

2009-04-27 Thread psaff...@googlemail.com
I'm using the CSV library to process a large amount of data - 28 files, each of 130MB. Just reading in the data from one file and filing it into very simple data structures (numpy arrays and a cStringIO) takes around 10 seconds. If I just slurp one file into a string, it only takes about a second,
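
A sketch of the comparison being made, timing csv.reader against a plain split of each line; the tab delimiter and filename are assumptions:

    import csv
    import time

    def time_csv(path):
        t0 = time.clock()
        for row in csv.reader(open(path), delimiter="\t"):
            pass
        return time.clock() - t0

    def time_split(path):
        t0 = time.clock()
        for line in open(path):
            fields = line.rstrip("\n").split("\t")
        return time.clock() - t0

    print time_csv("largefile.txt"), time_split("largefile.txt")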

mod_python form upload: permission denied sometimes...

2009-04-24 Thread psaff...@googlemail.com
I have a mod_python application that takes a POST file upload from a form. It works fine from my machine, other machines in my office and my home machine. It does not work from my boss's machine in a different city - he gets "You don't have permission to access this on this server". In the logs, i

Parallel processing on shared data structures

2009-03-19 Thread psaff...@googlemail.com
I'm filing 160 million data points into a set of bins based on their position. At the moment, this takes just over an hour using interval trees. I would like to parallelise this to take advantage of my quad core machine. I have some experience of Parallel Python, but PP seems to only really work fo
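
One stdlib sketch of the chunk-and-merge pattern: bin each chunk of points in a worker process, then fold the per-worker counts together. The modulo binning below is a stand-in for the interval-tree lookup, and all names are illustrative:

    from multiprocessing import Pool
    from collections import defaultdict

    def bin_chunk(points):
        counts = defaultdict(int)
        for pos in points:
            counts[pos // 1000] += 1    # stand-in for the interval-tree lookup
        return counts

    def merge(dicts):
        total = defaultdict(int)
        for d in dicts:
            for k, v in d.iteritems():
                total[k] += v
        return total

    if __name__ == "__main__":
        points = range(1000000)         # placeholder data
        n = 4
        chunks = [points[i::n] for i in range(n)]
        pool = Pool(processes=n)
        print len(merge(pool.map(bin_chunk, chunks)))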

Re: Memory efficient tuple storage

2009-03-19 Thread psaff...@googlemail.com
In the end, I used a cStringIO object to store the chromosomes - because there are only 23, I can use one character for each chromosome and represent the whole lot with a giant string and a dictionary to say what each character means. Then I used numpy arrays for the data and coordinates. This sque
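
A sketch of that encoding, with the file layout (whitespace-separated chromosome, position, value) assumed:

    import numpy
    from cStringIO import StringIO

    codes = iter("ABCDEFGHIJKLMNOPQRSTUVW")   # one character per chromosome
    chrommap = {}                             # e.g. {"chr1": "A", ...}

    chromio = StringIO()
    positions = []
    values = []
    for line in open("datafile.txt"):
        chrom, pos, val = line.split()
        if chrom not in chrommap:
            chrommap[chrom] = codes.next()
        chromio.write(chrommap[chrom])
        positions.append(int(pos))
        values.append(float(val))

    positions = numpy.array(positions, dtype=int)
    values = numpy.array(values, dtype=float)
    chromstring = chromio.getvalue()    # the i-th char names point i's chromosome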

Re: Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
Thanks for all the replies. First of all, can anybody recommend a good way to show memory usage? I tried heapy, but couldn't make much sense of the output and it didn't seem to change too much for different usages. Maybe I was just making the h.heap() call in the wrong place. I also tried getrusag
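
For the getrusage route, peak resident size is one number that is easy to read off, though its units vary by platform:

    import resource

    usage = resource.getrusage(resource.RUSAGE_SELF)
    print "peak RSS:", usage.ru_maxrss   # kilobytes on Linux, bytes on Mac OS X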

Memory efficient tuple storage

2009-03-13 Thread psaff...@googlemail.com
I'm reading in some rather large files (28 files each of 130MB). Each file is a genome coordinate (chromosome (string) and position (int)) and a data point (float). I want to read these into a list of coordinates (each a tuple of (chromosome, position)) and a list of data points. This has taught m
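
The overhead here is per-object: every tuple, int and float in a list of tuples is a boxed Python object. numpy.fromiter builds a packed array straight from a generator, with no intermediate list; a sketch, with the field order assumed:

    import numpy

    positions = numpy.fromiter(
        (int(line.split()[1]) for line in open("datafile.txt")), dtype=int)
    points = numpy.fromiter(
        (float(line.split()[2]) for line in open("datafile.txt")), dtype=float)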

Re: Which core am I running on?

2009-02-09 Thread psaff...@googlemail.com
On 9 Feb, 12:24, Gerhard Häring wrote:
> http://objectmix.com/python/631346-parallel-python.html

Hmm. In fact, this doesn't seem to work for pp. When I run the code below, it says everything is running on the one core.

    import pp
    import random
    import time
    from string import lowercase

    ncpus = 3

Too many open files

2009-02-09 Thread psaff...@googlemail.com
I'm building a pipeline involving a number of shell tools. In each case, I create a temporary file using tempfile.mkstemp() and invoke a command ("cmd < /tmp/tmpfile") on it using subprocess.Popen. At the end of each section, I call close() on the file handles and use os.remove() to delete them. Ev
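
A classic leak with mkstemp is that it returns a raw OS file descriptor as well as a path; closing a file object opened on the path does not close that descriptor. A sketch of the careful version ("cmd" is the thread's placeholder):

    import os
    import subprocess
    import tempfile

    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, "input data\n")
        os.close(fd)        # close the raw descriptor too, or each call leaks one fd
        p = subprocess.Popen("cmd < %s" % path, shell=True)
        p.wait()
    finally:
        os.remove(path)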

Re: Which core am I running on?

2009-02-09 Thread psaff...@googlemail.com
On 9 Feb, 12:24, Gerhard Häring wrote:
> Looks like I have answered a similar question once, btw. ;-)

Ah, yes - thanks. I did Google for it, but obviously didn't have the right search term. Cheers, Peter -- http://mail.python.org/mailman/listinfo/python-list

Which core am I running on?

2009-02-09 Thread psaff...@googlemail.com
Is there some way I can get at this information at run-time? I'd like to use it to tag diagnostic output dumped during runs using Parallel Python. Peter -- http://mail.python.org/mailman/listinfo/python-list
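
One Linux-only route, not from the thread: glibc's sched_getcpu() reached through ctypes:

    import ctypes

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    print "running on core", libc.sched_getcpu()   # glibc 2.6+, Linux only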

Re: subprocess.Popen stalls

2009-01-12 Thread psaff...@googlemail.com
On 12 Jan, 15:33, mk wrote:
> Better use communicate() method:

Oh yes - it's right there in the documentation. That worked perfectly. Many thanks, Peter -- http://mail.python.org/mailman/listinfo/python-list
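
A sketch of the fix being quoted: communicate() drains stdout and stderr to completion before waiting, so the child can never block on a full pipe buffer ("cmd" is a placeholder):

    import subprocess

    p = subprocess.Popen(["cmd", "arg"], stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    out, err = p.communicate()   # read both pipes fully, then reap the child
    print p.returncode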

subprocess.Popen stalls

2009-01-12 Thread psaff...@googlemail.com
I'm building a bioinformatics application using the ipcress tool: http://www.ebi.ac.uk/~guy/exonerate/ipcress.man.html I'm using subprocess.Popen to execute ipcress, which takes a group of files full of DNA sequences and returns some analysis on them. Here's a code fragment: cmd = "/usr/bin/ipcr

Re: mod_python: delay in files changing after alteration

2009-01-12 Thread psaff...@googlemail.com
On 6 Jan, 23:31, Graham Dumpleton wrote:
> Thus, any changes to modules/packages installed on sys.path require a
> full restart of Apache to ensure they are loaded by all Apache child
> worker processes.

That will be it. I'm pulling in some libraries of my own from elsewhere, which are still b

mod_python: delay in files changing after alteration

2009-01-05 Thread psaff...@googlemail.com
Maybe this is an Apache question, in which case apologies. I am running mod_python 3.3.1-3 on Apache 2.2.9-7. It works fine, but I find that when I alter a source file during development, it sometimes takes 5 seconds or so for the changes to be seen. This might sound trivial, but when debugging te

Re: Selecting a different superclass

2008-12-18 Thread psaff...@googlemail.com
On 17 Dec, 20:33, "Chris Rebert" wrote:
> superclass = TraceablePointSet if tracing else PointSet

Perfect - many thanks. Good to know I'm absolved from evil, also ;) Peter -- http://mail.python.org/mailman/listinfo/python-list
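
The suggestion in context, as a sketch; the two class names come from the quoted line, everything else is illustrative:

    tracing = True   # e.g. set from a command-line flag

    class PointSet(object):
        def __init__(self):
            self.points = []
        def add(self, point):
            self.points.append(point)

    class TraceablePointSet(PointSet):
        def add(self, point):
            print "adding", point       # record where each value came from
            PointSet.add(self, point)

    superclass = TraceablePointSet if tracing else PointSet

    class ExperimentData(superclass):   # the subclass picks up whichever base
        pass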

Selecting a different superclass

2008-12-17 Thread psaff...@googlemail.com
This might be a pure OO question, but I'm doing it in Python so I'll ask here. I'm writing a number-crunching bioinformatics application. Read lots of numbers from files; merge, median and munge; draw plots. I've found that the most critical part of this work is validation and traceability - "wher