Re: begin to parse a web page not entirely downloaded

2007-02-08 Thread k0mp
On Feb 8, 8:02 pm, Leif K-Brooks <[EMAIL PROTECTED]> wrote:
> k0mp wrote:
> > It seems to take more time when I use read(size) than just read().
> > I think in both cases urllib.urlopen retrieves the whole page.
>
> Google's home page is very small, so it's not really a great test of
> that. Here's a test downloading the first 512 bytes of an Ubuntu ISO
> (beware of wrap):
>
> $ python -m timeit -n1 -r1 "import urllib" "urllib.urlopen('http://ubuntu.cs.utah.edu/releases/6.06/ubuntu-6.06.1-desktop-i386.is...)"
> 1 loops, best of 1: 596 msec per loop

OK, you've convinced me. The fact that I didn't get better results in
my test with read(512) must be because most of the time is spent
waiting for the server's response, not transferring data over the
network.
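
One way to see that split is to time the urlopen() call (connection
plus response headers) separately from the first read; a rough sketch,
with timings that will of course vary by server:

import time
import urllib2

t0 = time.time()
f = urllib2.urlopen('http://google.com')   # connect and wait for the response headers
t1 = time.time()
data = f.read(512)                         # then pull the first 512 bytes of the body
t2 = time.time()
f.close()
print 'response latency: %.3fs  read(512): %.3fs' % (t1 - t0, t2 - t1)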



Re: begin to parse a web page not entirely downloaded

2007-02-08 Thread k0mp
On Feb 8, 8:06 pm, Björn Steinbrink <[EMAIL PROTECTED]> wrote:
> On Thu, 08 Feb 2007 10:20:56 -0800, k0mp wrote:
> > On Feb 8, 6:54 pm, Leif K-Brooks <[EMAIL PROTECTED]> wrote:
> >> k0mp wrote:
> >> > Is there a way to retrieve a web page and, before it is entirely
> >> > downloaded, begin testing whether a specific string is present, and
> >> > if so stop the download?
> >> > I believe that urllib.urlopen(url) will retrieve the whole page before
> >> > the program goes to the next statement.
>
> >> Use urllib.urlopen(), but call .read() with a smallish argument, e.g.:
>
> >>  >>> foo = urllib.urlopen('http://google.com')
> >>  >>> foo.read(512)
> >> ' ...
>
> >> foo.read(512) will return as soon as 512 bytes have been received. You
> >> can keep calling it until it returns an empty string, indicating that
> >> there's no more data to be read.
>
> > Thanks for your answer :)
>
> > I'm not sure that read() works as you say.
> > Here is a test I've done :
>
> > import urllib2
> > import re
> > import time
>
> > CHUNKSIZE = 1024
>
> > print 'f.read(CHUNK)'
> > print time.clock()
>
> > for i in range(30) :
> >     f = urllib2.urlopen('http://google.com')
> >     while True:   # read the page using a loop
> >         chunk = f.read(CHUNKSIZE)
> >         if not chunk: break
> >         m = re.search('', chunk )
> >         if m != None :
> >             break
>
> > print time.clock()
>
> > print
>
> > print 'f.read()'
> > print time.clock()
> > for i in range(30) :
> >     f = urllib2.urlopen('http://google.com')
> >     m = re.search('', f.read() )
> >     if m != None :
> >         break
>
> A fair comparison would use "pass" here. Or a while loop as in the
> other case. The way it is, it compares 30 times read(CHUNKSIZE)
> against one time read().
>
> Björn

You're right, my test was flawed. I've replaced http://google.com with
http://aol.com, and the 'break' in the second loop with 'continue'
(because when the string is found I don't want the rest of the page to
be parsed).
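
With those changes, the corrected comparison might look like this (a
sketch; PATTERN is an assumed placeholder for the string being
searched):

import urllib2
import re
import time

CHUNKSIZE = 1024
PATTERN = '</html>'   # assumed placeholder for the string being searched

print 'f.read(CHUNK)'
t0 = time.clock()
for i in range(30):
    f = urllib2.urlopen('http://aol.com')
    while True:
        chunk = f.read(CHUNKSIZE)
        if not chunk:
            break
        if re.search(PATTERN, chunk):
            break                  # found: stop downloading this page
    f.close()
print '%.2f s' % (time.clock() - t0)

print 'f.read()'
t0 = time.clock()
for i in range(30):
    f = urllib2.urlopen('http://aol.com')
    if re.search(PATTERN, f.read()):
        continue                   # page is fully read anyway; move on
print '%.2f s' % (time.clock() - t0)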

I obtain this :
f.read(CHUNK)
0.1
0.17

f.read()
0.17
0.23


f.read() is still faster than f.read(CHUNK)




Re: begin to parse a web page not entirely downloaded

2007-02-08 Thread k0mp
On Feb 8, 6:54 pm, Leif K-Brooks <[EMAIL PROTECTED]> wrote:
> k0mp wrote:
> > Is there a way to retrieve a web page and, before it is entirely
> > downloaded, begin testing whether a specific string is present, and
> > if so stop the download?
> > I believe that urllib.urlopen(url) will retrieve the whole page before
> > the program goes to the next statement.
>
> Use urllib.urlopen(), but call .read() with a smallish argument, e.g.:
>
>  >>> foo = urllib.urlopen('http://google.com')
>  >>> foo.read(512)
> ' ...
>
> foo.read(512) will return as soon as 512 bytes have been received. You
> can keep calling it until it returns an empty string, indicating that
> there's no more data to be read.

Thanks for your answer :)

I'm not sure that read() works as you say.
Here is a test I've done :

import urllib2
import re
import time

CHUNKSIZE = 1024

print 'f.read(CHUNK)'
print time.clock()

for i in range(30) :
    f = urllib2.urlopen('http://google.com')
    while True:   # read the page using a loop
        chunk = f.read(CHUNKSIZE)
        if not chunk: break
        m = re.search('', chunk )
        if m != None :
            break

print time.clock()

print

print 'f.read()'
print time.clock()
for i in range(30) :
    f = urllib2.urlopen('http://google.com')
    m = re.search('', f.read() )
    if m != None :
        break

print time.clock()


It prints this:
f.read(CHUNK)
0.1
0.31

f.read()
0.31
0.32


It seems to take more time when I use read(size) than just read().
I think in both cases urllib.urlopen retrieves the whole page.
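
One caveat with the chunked version: re.search() sees each chunk in
isolation, so a match that straddles a chunk boundary is missed.
Keeping a small overlap between reads avoids that; a sketch, with
PATTERN again an assumed placeholder:

import urllib2
import re

CHUNKSIZE = 1024
PATTERN = '</html>'          # assumed placeholder search string

f = urllib2.urlopen('http://google.com')
overlap = len(PATTERN) - 1   # enough bytes to catch a match split across reads
tail = ''
found = False
while True:
    chunk = f.read(CHUNKSIZE)
    if not chunk:
        break
    window = tail + chunk    # end of the previous chunk + the new one
    if re.search(PATTERN, window):
        found = True
        break                # stop downloading as soon as the string appears
    tail = window[-overlap:] if overlap else ''
f.close()
print 'found:', found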



begin to parse a web page not entirely downloaded

2007-02-08 Thread k0mp
Hi,

Is there a way to retrieve a web page and, before it is entirely
downloaded, begin testing whether a specific string is present, and
if so stop the download?
I believe that urllib.urlopen(url) will retrieve the whole page before
the program goes to the next statement. I suppose I could do what I
want using the socket module, but I'm sure there's a simpler way to do
it.
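
Short of raw sockets, httplib allows reading just the start of the
response and then dropping the connection; a minimal sketch of that
approach:

import httplib

conn = httplib.HTTPConnection('www.google.com')
conn.request('GET', '/')
resp = conn.getresponse()
head = resp.read(512)        # consume only the first 512 bytes of the body
conn.close()                 # closing the connection abandons the rest
print head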



how to trace threads?

2006-07-05 Thread k0mp
Hi,
First, sorry for the dumb question. I'm trying to find out how to
trace threads. There is a settrace function in the threading module,
but I can't figure out how to use it. Could someone show me a sample?

Thanks,
Kathan
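
For the record, threading.settrace() installs a trace function that
each thread started through the threading module passes to
sys.settrace() before its run() method executes; the function has the
same (frame, event, arg) signature used by sys.settrace(). A minimal
sketch:

import threading

def tracer(frame, event, arg):
    # Print 'call' events along with the name of the thread that made them.
    if event == 'call':
        print '%s: call %s' % (threading.currentThread().getName(),
                               frame.f_code.co_name)
    return tracer                # return the tracer to keep tracing nested calls

threading.settrace(tracer)       # must be installed before the threads start

def helper():
    return 42                    # a nested call, so it shows up in the trace

def worker():
    helper()

t = threading.Thread(target=worker, name='worker-1')
t.start()
t.join()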
