
Re: [Tutor] reading random line from a file

2007-07-23 Thread Tiger12506

> Significance of number 4096 :
> file is stored in blocks of size 2K/4K/8K (depending
> upon the machine). file seek for an offset goes block
> by block rather than byte by byte. Hence for file size
> < 4096 (assuming you have 4K block size), you will
> anyway end up scanning it entirely so as well load it
> up in memory.

Mmmm... It depends on the file system. FAT/FAT32 will
read as small a block as a sector size, i.e. 512 bytes. I think
I read somewhere that NTFS is 4K. Ridiculous waste, I think.

Anyway... It's dangerous to assume one way or the other about
boundaries like that. The only way that 4096 can help you is if you
only start reading on block boundaries and disable buffering at the OS
level. Otherwise, you will get double and triple buffering occurring.
Perhaps Python takes care of this, but it's doubtful. C doesn't by default,
and since Python programmers often don't have the background to
grok how the OS caches reads, it would be extra overhead for a
special case that most aren't aware of.

Mmmm... The OS will read all of those characters in anyway right? 4K.
But if you ask for the data byte by byte, it will copy it to your pointer
byte by byte from the cache instead of copying all of the memory.

Anyway... all this is making my head hurt because I can't quite remember
how it works. (When I last read about this, I didn't understand its
significance to my programming.)


> But I
> just want to add that since index creation is quite a
> laborious task (in terms of CPU/time) one should do it
> only once (or till file is changed).

Agreed, but it is still better to make the index once at program
start, rather than search through each time a line is requested.

> Thus it should be
> kept on disk and ensure that index is re-created in
> case file changes.

That's a good idea. Especially for large files.

> I would like suggestions on index
> creation.

Creating an index is easy. There are many ways. Here is one.

file_index = [0]
for line in fobj:
    file_index.append(len(line) + file_index[-1])
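For what it's worth, here is a minimal Python 3 sketch of how such an index might be used once it is built. The names build_index and random_line are made up for illustration, and it assumes the file is opened in binary mode so that len(line) counts bytes, which is what seek() expects. Unlike the loop above, it does not store the end-of-file offset, so every entry is a valid line start.

```python
import io
import random

def build_index(fobj):
    """Return the byte offset at which each line starts."""
    offsets = []
    pos = 0
    for line in fobj:
        offsets.append(pos)
        pos += len(line)
    return offsets

def random_line(fobj, offsets):
    """Seek to a uniformly chosen line start and read that one line."""
    fobj.seek(random.choice(offsets))
    return fobj.readline()

# Demo on an in-memory file; a file from open(name, "rb") should behave the same.
demo = io.BytesIO(b"one\ntwo two\nthree three three\n")
index = build_index(demo)
print(index)
print(random_line(demo, index))
```

Because the choice is made over line starts rather than over bytes, long lines get no extra weight.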

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] reading random line from a file

2007-07-20 Thread Aditya Lal

A bug:
The function random.randint(a,b) includes both ends,
i.e. b is also included. Thus for a file with a single
line (a=0, b=1) my algorithm will give an IndexError.
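A small Python 3 illustration of the off-by-one, under the assumption that the lines have already been read into a list (the list contents here are made up). random.randrange() excludes its upper bound the way range() does, so it can be used on len(lines) directly.

```python
import random

lines = ["only line\n"]

# random.randint(0, len(lines)) may return len(lines), one past the last index.
# random.randrange(len(lines)) stops before len(lines), so the index is always valid.
pos = random.randrange(len(lines))
print(pos, repr(lines[pos]))

# Two equivalent alternatives:
same_pos = random.randint(0, len(lines) - 1)
same_line = random.choice(lines)
```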

Significance of the number 4096:
A file is stored in blocks of size 2K/4K/8K (depending
upon the machine). A file seek to an offset goes block
by block rather than byte by byte. Hence for a file size
< 4096 (assuming you have a 4K block size), you will
end up scanning it entirely anyway, so you may as well load it
into memory.

Luke's suggestion for an index:
I think there is an implicit need to give equal probability
to each line. For example, suppose we are
trying to pick a "quote of the day" from a dictionary of
quotations which may contain hundreds of thousands of
quotes. We would like to see a new one on each
invocation rather than favour the longest one.
So, creating an index is the right solution. But I
just want to add that since index creation is quite a
laborious task (in terms of CPU/time), one should do it
only once (or until the file is changed). Thus it should be
kept on disk, and we should ensure that the index is re-created in
case the file changes. I would like suggestions on index
creation.
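One possible way to keep the index on disk and rebuild it only when the file changes, sketched in Python 3. Everything here is an assumption for illustration: the function names, the ".idx" suffix, the use of pickle, and the choice of (mtime, size) as the change stamp.

```python
import os
import pickle
import random
import tempfile

def load_index(path, index_path):
    """Return line-start offsets, rebuilding the saved index if the file changed."""
    stamp = (os.path.getmtime(path), os.path.getsize(path))
    try:
        with open(index_path, "rb") as f:
            saved_stamp, offsets = pickle.load(f)
        if saved_stamp == stamp:
            return offsets            # index is still current
    except (OSError, EOFError, pickle.UnpicklingError):
        pass                          # no usable index yet
    offsets, pos = [], 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    with open(index_path, "wb") as f:
        pickle.dump((stamp, offsets), f)
    return offsets

def random_line(path, offsets):
    """Read one uniformly chosen line using the offsets."""
    with open(path, "rb") as f:
        f.seek(random.choice(offsets))
        return f.readline()

with tempfile.TemporaryDirectory() as tmp:
    quotes = os.path.join(tmp, "quotes.txt")
    with open(quotes, "wb") as f:
        f.write(b"a\nbb\nccc\n")
    offsets = load_index(quotes, quotes + ".idx")
    print(offsets, random_line(quotes, offsets))
```

Comparing the size as well as the mtime is a cheap guard against edits that land within the timestamp resolution of the file system.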

--- Luke Paireepinart <[EMAIL PROTECTED]> wrote:

> bhaaluu wrote:
> > Greetings,
> > Thanks for including the complete source code!
> > It really helps to have something that works to
> look at.
> > I modified an earlier version of this to run on my
> > computer (GNU/Linux; Python 2.4.3).
> >   
> I think the best strategy for this problem would be
> to build an index of 
> the offset of the start of each line, and then
> randomly select from this 
> list.
> that makes each line equally probable, and you can
> set up your class so 
> that the index is only built on the first call to
> the function.
> -Luke
> 
> ___
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
> 


Re: [Tutor] reading random line from a file

2007-07-19 Thread Tiger12506

> I think the best strategy for this problem would be to build an index of
> the offset of the start of each line, and then randomly select from this
> list.
> that makes each line equally probable, and you can set up your class so
> that the index is only built on the first call to the function.
> -Luke

Oh fudge! I knew there was a "best-way-to-do-it". Now I'm upset cuz i didn't 
think of it first. ;-)

JS 


Re: [Tutor] reading random line from a file

2007-07-19 Thread Luke Paireepinart

bhaaluu wrote:
> Greetings,
> Thanks for including the complete source code!
> It really helps to have something that works to look at.
> I modified an earlier version of this to run on my
> computer (GNU/Linux; Python 2.4.3).
>   
I think the best strategy for this problem would be to build an index of 
the offset of the start of each line, and then randomly select from this 
list.
That makes each line equally probable, and you can set up your class so 
that the index is only built on the first call to the function.
-Luke


Re: [Tutor] reading random line from a file

2007-07-19 Thread bhaaluu

Greetings,
Thanks for including the complete source code!
It really helps to have something that works to look at.
I modified an earlier version of this to run on my
computer (GNU/Linux; Python 2.4.3).
The os module stuff is new for me, so I played
around with it... here's my modified file just in case
another beginner is interested:

#!/usr/bin/env python
"""
2007-07-19
Modified snippet from Tutor mailing list.
Example of checking to see if a file exists, using 'os.path.exists()'
Examples of using 'os.system()' with wget and mv commands.
Lookup: enumerate(), random.randint()
This is obviously NOT cross-platform. Works on GNU/Linux.
b h a a l u u at g m a i l dot c o m
"""
import os
import random

def randline():
    text = 'list_classifiers'
    if not os.path.exists(text):
        # Download the classifier list, then give it a simpler name
        os.system('wget http://cheeseshop.python.org/pypi?%3Aaction=list_classifiers')
        os.system('mv pypi\?\:action\=list_classifiers list_classifiers')
    f = file(text, 'rb')
    # Keep line i with probability 1/(i+1), so every line is equally likely
    for i,j in enumerate(f):
        if random.randint(0,i) == i:
            line = j
    f.close()
    return line

while True:
    print "\n", randline()
    answer = raw_input("Another line? [y/n]: ")
    if answer == 'n':
        break

Happy Programming!
--
bhaaluu at gmail dot com

Re: [Tutor] reading random line from a file

2007-07-19 Thread Aditya Lal

Sorry, I did not see the other thread in which this
approach has already been covered. The point Kent
raised about going into an infinite loop with a file having
a single line is very true.

Following is the corrected version (for completeness'
sake) -

import os,random

def getrandfromMem(filename) :
  fd = file(filename,'rb')
  l = fd.readlines()
  pos = random.randint(0,len(l))  # note: randint's upper bound is inclusive
  fd.close()
  return (pos,l[pos])

def getrandomline2(filename) :
  filesize = os.stat(filename)[6]
  if filesize < 4096 :  # Seek may not be very useful
    return getrandfromMem(filename)

  fd = file(filename,'rb')
  for _ in range(10) : # Try 10 times
    pos = random.randint(0,filesize)
    fd.seek(pos)
    fd.readline()  # Read and ignore
    line = fd.readline()
    if line != '' :
      break
  fd.close()

  if line != '' :
    return (pos,line)
  else :
    # All 10 tries landed in the last line; fall back to reading everything
    return getrandfromMem(filename)

getrandomline2("shaks12.txt")

Caveat: It will still skip the 1st line during random
selection if the file's size exceeds 4096 chars!


--- Aditya Lal <[EMAIL PROTECTED]> wrote:

> An alternative approach (I found the Yorick's code
> to
> be too slow for large # of calls) :
> 
> We can use file size to pick a random point in the
> file. We can read and ignore text till next new
> line.
> This will avoid outputting partial lines. Return the
> next line (which I guess is still random :)). 
> 
> Indicative code -
> 
> import os,random
> 
> def getrandomline(filename) :
>   offset = random.randint(0,os.stat(filename)[6])
>   fd = file(filename,'rb')
>   fd.seek(offset)
>   fd.readline()  # Read and ignore
>   return fd.readline()
> 
> getrandomline("shaks12.txt")
> 
> Caveat: The above code will never choose 1st line
> and
> will return '' for last line. Other than the
> boundary
> conditions it will work well (even for large files).
> 
> 
> Interestingly :
> 
> On modifying this code to take in file object rather
> than filename, the performance improved by ~50%. On
> wrapping it in a class, it further improved by ~25%.
> 
> On executing the get random line 100,000 times on
> large file (size 2707519 with 9427 lines), the class
> version finished < 5 seconds.
> 
> Platform : 2GHz Intel Core 2 Duo macBook (2GB RAM)
> running Mac OSX (10.4.10).
> 
> Output using python 2.5.1 (stackless)
> 
> Approach using enum approach : 9.55798196793 : for
> [100] iterations
> Approach using filename : 11.552863121 : for
> [10]
> iterations
> Approach using file descriptor : 5.97015094757 : for
> [10] iterations
> Approach using class : 4.46039891243 : for [10]
> iterations
> 
> Output using python 2.3.5 (default python on OSX)
> 
> Approach using enum approach : 12.2886080742 : for
> [100] iterations
> Approach using filename : 12.5682640076 : for
> [10]
> iterations
> Approach using file descriptor : 6.55952501297 : for
> [10] iterations
> Approach using class : 5.35413718224 : for [10]
> iterations
> 
> I am attaching test program FYI.
> 
> --
> Aditya
> 
> --- Nathan Coulter
> <[EMAIL PROTECTED]> wrote:
> 
> > >  ---Original Message---
> > >  From: Tiger12506 <[EMAIL PROTECTED]>
> > 
> > >  Yuck. Talk about a one shot function! Of course
> > it only reads through the
> > >  file once! You only call the function once. Put
> a
> > second print randline(f)
> > >  at the bottom of your script and see what
> happens
> > :-)
> > >  
> > >  JS
> > >  
> > 
> > *sigh*
> > 
> > #!/bin/env python
> > 
> > import os
> > import random
> > 
> > text = 'shaks12.txt'
> > if not os.path.exists(text):
> >   os.system('wget http://www.gutenberg.org/dirs/etext94/shaks12.txt')
> > 
> > def randline(f):
> >     for i,j in enumerate(file(f, 'rb')):
> >         if random.randint(0,i) == i:
> >             line = j
> >     return line
> > 
> > print randline(text)
> > print randline(text)
> > print randline(text)
> > 
> > -- 
> > Yorick
> > ___
> > Tutor maillist  -  Tutor@python.org
> > http://mail.python.org/mailman/listinfo/tutor
> > 
> 
> 
> 
>  
>

> import os
> import random
> 
> class randomline :
>   
>   def __init__(self, filename="largefile.txt") :
>   self.filesize = os.stat(filename)[6]
>   self.fd = file(filename, 'rb')
> 
>   def getline(self) :
>   offset = random.randint(0,self.filesize)
>   self.fd.seek(offset)
>   self.fd.readline()
>   line = self.fd.readline()
>   return (offset,line)
>   
>   def close(self) :
>   self.fd.close()
> 
> # Uses file name
> def getrandomline(filename) :
>   offset = random.randint(0,os.stat(filename)[6])
>   fd = file(filename, 'rb')
>   fd.seek(offset)
>   ret = (offset,fd.readline())
>   fd.close()
>   re

Re: [Tutor] reading random line from a file

2007-07-19 Thread Aditya Lal

An alternative approach (I found Yorick's code to
be too slow for a large number of calls):

We can use the file size to pick a random point in the
file. We can read and ignore text till the next newline.
This will avoid outputting partial lines. Return the
next line (which I guess is still random :)). 

Indicative code -

import os,random

def getrandomline(filename) :
  offset = random.randint(0,os.stat(filename)[6])
  fd = file(filename,'rb')
  fd.seek(offset)
  fd.readline()  # Read and ignore
  return fd.readline()

getrandomline("shaks12.txt")

Caveat: The above code will never choose the 1st line and
will return '' for the last line. Other than these boundary
conditions it will work well (even for large files). 
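One way the two boundary cases might be handled is to wrap around to the first line whenever the seek lands in the last one. This is only a Python 3 sketch with a made-up function name, not the version that was benchmarked. It assumes a non-empty file opened in binary mode.

```python
import io
import random

def random_line_by_seek(f, size):
    """Jump to a random byte, skip the rest of that line, return the next one."""
    f.seek(random.randrange(size))
    f.readline()              # discard the (probably partial) line we landed in
    line = f.readline()
    if not line:              # we landed in the last line: wrap to the first
        f.seek(0)
        line = f.readline()
    return line

demo = io.BytesIO(b"a\nbb\nccc\n")
print(random_line_by_seek(demo, len(demo.getvalue())))
```

Every line can now be returned, but the choice is still not uniform: a line is picked with probability proportional to the length of the line before it.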

Interestingly :

On modifying this code to take in file object rather
than filename, the performance improved by ~50%. On
wrapping it in a class, it further improved by ~25%.

On executing the get random line 100,000 times on
large file (size 2707519 with 9427 lines), the class
version finished < 5 seconds.

Platform : 2GHz Intel Core 2 Duo macBook (2GB RAM)
running Mac OSX (10.4.10).

Output using python 2.5.1 (stackless)

Approach using enum approach : 9.55798196793 : for [100] iterations
Approach using filename : 11.552863121 : for [10] iterations
Approach using file descriptor : 5.97015094757 : for [10] iterations
Approach using class : 4.46039891243 : for [10] iterations

Output using python 2.3.5 (default python on OSX)

Approach using enum approach : 12.2886080742 : for [100] iterations
Approach using filename : 12.5682640076 : for [10] iterations
Approach using file descriptor : 6.55952501297 : for [10] iterations
Approach using class : 5.35413718224 : for [10] iterations

I am attaching test program FYI.

--
Aditya

--- Nathan Coulter
<[EMAIL PROTECTED]> wrote:

> >  ---Original Message---
> >  From: Tiger12506 <[EMAIL PROTECTED]>
> 
> >  Yuck. Talk about a one shot function! Of course
> it only reads through the
> >  file once! You only call the function once. Put a
> second print randline(f)
> >  at the bottom of your script and see what happens
> :-)
> >  
> >  JS
> >  
> 
> *sigh*
> 
> #!/bin/env python
> 
> import os
> import random
> 
> text = 'shaks12.txt'
> if not os.path.exists(text):
>   os.system('wget http://www.gutenberg.org/dirs/etext94/shaks12.txt')
> 
> def randline(f):
>     for i,j in enumerate(file(f, 'rb')):
>         if random.randint(0,i) == i:
>             line = j
>     return line
> 
> print randline(text)
> print randline(text)
> print randline(text)
> 
> -- 
> Yorick
> ___
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
> 



 

import os
import random

class randomline :
	
	def __init__(self, filename="largefile.txt") :
		self.filesize = os.stat(filename)[6]
		self.fd = file(filename, 'rb')

	def getline(self) :
		offset = random.randint(0,self.filesize)
		self.fd.seek(offset)
		self.fd.readline()
		line = self.fd.readline()
		return (offset,line)
	
	def close(self) :
		self.fd.close()

# Uses file name
def getrandomline(filename) :
	offset = random.randint(0,os.stat(filename)[6])
	fd = file(filename, 'rb')
	fd.seek(offset)
	ret = (offset,fd.readline())
	fd.close()
	return ret

# Uses file descriptor
def getrandline(fd) :
	offset = random.randint(0,os.fstat(fd.fileno())[6])
	fd.seek(offset)
	line = fd.readline()
	return (offset,fd.readline())

# Uses enumeration
def randline(fd):
	for i,j in enumerate(fd) :
		if random.randint(0,i) == i:
			line = j
	fd.seek(0)
	return line


if __name__ == '__main__' :

	# Substitute your file name
	filename = "largefile.txt"

	# Class
	rd = randomline(filename)
	print rd.getline()
	rd.close()

	# file name
	print getrandomline(filename)

	# file descriptor
	fd = file(filename,'rb')
	print getrandline(fd)
	fd.close()

	# Using enum approach
	fd = file(filename,'rb')
	print randline(fd)
	fd.close()

	from timeit import Timer 
	t_class = Timer('rd.getline()', 'from __main__ import randomline ; rd = randomline("'+filename+'")')
	t_filename = Timer('getrandomline("'+filename+'")', 'from __main__ import getrandomline')
	t_fd = Timer('getrandline(fd)', 'from __main__ import getrandline ; fd = file("'+filename+'")')
	t_enum = Timer('randline(fd)', 'from __main__ import randline ; fd = file("'+filename+'")')

	print 'Approach using enum approach : %s : for [%d] iterations' % (str(t_enum.timeit(100)),100)
	print 'Approach using filename : %s : for [%d] iterations' % (str(t_filename.timeit(10)),10)
	print 'Approach using file descriptor : %s : for [%d] iterations' % (str(t_fd.timeit(10)),10)
	print 'Approach using class : %s : for [%d] iterations' % (str(t_class.timeit(10)),10)


Re: [Tutor] reading random line from a file

2007-07-18 Thread Nathan Coulter

>  ---Original Message---
>  From: Tiger12506 <[EMAIL PROTECTED]>

>  Yuck. Talk about a one shot function! Of course it only reads through the
>  file once! You only call the function once. Put a second print randline(f)
>  at the bottom of your script and see what happens :-)
>  
>  JS
>  

*sigh*

#!/bin/env python

import os
import random

text = 'shaks12.txt'
if not os.path.exists(text):
  os.system('wget http://www.gutenberg.org/dirs/etext94/shaks12.txt')

def randline(f):
    for i,j in enumerate(file(f, 'rb')):
        if random.randint(0,i) == i:
            line = j
    return line

print randline(text)
print randline(text)
print randline(text)

-- 
Yorick

Re: [Tutor] reading random line from a file

2007-07-18 Thread Tiger12506

> import os
> import random
>
> text = 'shaks12.txt'
> if not os.path.exists(text):
>  os.system('wget http://www.gutenberg.org/dirs/etext94/shaks12.txt')
>
> def randline(f):
>     for i,j in enumerate(file(f, 'rb')):

Alright. But put randline in a loop and you open a lot of file handles.
Thank goodness Python has GC.
Better: a separate variable, open the file at the start, close it at the end.
As written, the file is read every time you call randline. At least as far as the
line chosen.
Whereas my version reads the same line at most twice.
And it will run faster. Searching through the file's lines to find the one
whose index matches i is time-consuming.
Yes, my version will favor longer lines, but I don't think that seriously
strict randomization is necessary?
IMHO, memory and speed are more important here. (You must forgive me a 
little, I've been studying C and assembly)

I'm just proud of my function ;-)

JS 


Re: [Tutor] reading random line from a file

2007-07-18 Thread Tiger12506

Yuck. Talk about a one shot function! Of course it only reads through the 
file once! You only call the function once. Put a second print randline(f) 
at the bottom of your script and see what happens :-)

JS

> This method only keeps one line in memory, only reads through the file 
> once, and does not favor lines based on any characteristic of the line. 
> It's probably fast enough to not even bother keeping an index around:
>
> #!/bin/env python
>
> import os
> import random
>
> text = 'shaks12.txt'
> if not os.path.exists(text):
>   os.system('wget http://www.gutenberg.org/dirs/etext94/shaks12.txt')
>
> f = file(text, 'rb')
>
> def randline(f):
>   for i,j in enumerate(f):
>     if random.randint(0,i) == i:
>       line = j
>   return line
>
> print randline(f)


Re: [Tutor] reading random line from a file

2007-07-18 Thread Poor Yorick

>  ---Original Message---
>  From: Kent Johnson <[EMAIL PROTECTED]>
>  Subject: Re: [Tutor] reading random line from a file
>  Sent: 2007-07-18 10:19
>  
[SNIP]
>  
>  It probably doesn't matter, but this will pick longer lines more often
>  than short ones.
>  

This method only keeps one line in memory, only reads through the file once, 
and does not favor lines based on any characteristic of the line.  It's 
probably fast enough to not even bother keeping an index around:

#!/bin/env python 

import os
import random

text = 'shaks12.txt'
if not os.path.exists(text):
   os.system('wget http://www.gutenberg.org/dirs/etext94/shaks12.txt')

f = file(text, 'rb')

def randline(f):
   for i,j in enumerate(f):
      if random.randint(0,i) == i:
         line = j
   return line

print randline(f) 
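The loop above is what is usually called reservoir sampling with a reservoir of one: line i (counting from 0) replaces the current pick with probability 1/(i+1), which works out to every line being equally likely at the end. Below is a Python 3 sketch of the same idea with a rough empirical check. The sample lines and the trial count are arbitrary choices for illustration.

```python
import random
from collections import Counter

def randline(lines):
    """Return one item chosen uniformly while holding only one item at a time."""
    chosen = None
    for i, line in enumerate(lines):
        # Same test as random.randint(0, i) == i: true with probability 1/(i+1).
        if random.randrange(i + 1) == 0:
            chosen = line
    return chosen

# Lines of very different lengths should still come up about equally often.
sample = ["a", "bb", "ccc", "dddd"]
counts = Counter(randline(sample) for _ in range(20000))
print(counts)
```

The cost is that every call walks the whole input, which is the trade-off against keeping an index.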


--- 
Yorick

Re: [Tutor] reading random line from a file

2007-07-18 Thread Kent Johnson

Tiger12506 wrote:
> If you truly wish to kill yourself trying to make it as efficient memory as
> possible, then you can follow this example. (This is more like what I would
> write in C).
> The getrandomline function picks a random byte between the beginning and the
> end of the file, then backs up until the beginning of the line and uses
> readline to return the whole line.

It probably doesn't matter, but this will pick longer lines more often 
than short ones.

> I tested it :-)

Hmm. What happens if you run it on a file with only one line? (see below)
> 
> 
> #
> from os import stat
> from random import randint
> 
> def getrandomline(f, length):
>     pos = randint(0,length)
>     f.seek(pos)
>     while f.read(1)!='\n':
>         try:
>             f.seek(-2,1)
>         except IOError:   # This is to catch seeking before the beginning of the file
>             f.seek(0)

I think you need a break here to avoid an infinite loop.

Kent

>     return f.readline()
> 
> f = file("quotes.txt","rb")
> sizeoffile = stat("quotes.txt")[6]
> 
> while (1):
>   print getrandomline(f, sizeoffile),
> 
> f.close()
> ###

Re: [Tutor] reading random line from a file

2007-07-17 Thread Tiger12506

> wow thats pretty cool :) it's a bit above my level but it's  interesting 
> :) thanks

I'm deeply flattered! Thank you.


>> #
>> from os import stat

os.stat returns a tuple whose 7th member is the size of the file. (python 
docs)

>> from random import randint

randint(a,b) returns a random integer between a and b, inclusive

>> def getrandomline(f, length):
>>     pos = randint(0,length)

Picks a random position between the beginning and the end of the file.

>>     f.seek(pos)

Seek to that position in the file - i.e. the next character read will be at 
that position in the file.

>>     while f.read(1)!='\n':

This sets up the loop. Read a character, if it is a newline then break the 
loop

>>         try:
>>             f.seek(-2,1)

However, if it's not a newline character, that means we are in the middle of 
a line, so we move the file position two characters back from where we just 
were. Two characters because f.read(1) moves the position forward one. One 
step forward, two steps back means read character right before. Continuing 
this loop means that eventually we will back up until we meet a newline 
character, that is, the beginning of the line where our randomly chosen 
character belongs.

>> except IOError: f.seek(0)

This is a special case where randint chose a character in the first line. 
Thinking about it a bit, we realize that backing up will never find a 
newline, and loop will never break. OOPS! I just realized a mistake I made. 
There should be a break afterwards.

except IOError:
  f.seek(0)
  break

See! Anyway. When you seek before the beginning of a file, an IOError is 
raised. I caught it here and set the file position properly. (The beginning 
of the first line in this special case)

>>     return f.readline()

Since the file position is set at the beginning of a random line, the 
readline function will read that line and return it.
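For anyone following along on a newer interpreter, here is a rough Python 3 sketch of the same back-up-to-the-line-start idea. It is not the original function: it tracks the position explicitly instead of relying on an exception from seek(), the helper name line_start is made up, and it treats a trailing newline as part of the line it ends. The file is assumed to be opened in binary mode.

```python
import io
import random

def line_start(f, pos):
    """Walk backwards from byte pos to the first byte of the line containing it."""
    while pos > 0:
        f.seek(pos - 1)
        if f.read(1) == b"\n":   # the previous byte ends the preceding line
            break
        pos -= 1
    return pos

def getrandomline(f, length):
    pos = random.randrange(length)      # any valid byte offset, end excluded
    f.seek(line_start(f, pos))
    return f.readline()

demo = io.BytesIO(b"a\nbb\nccc\n")
print(getrandomline(demo, len(demo.getvalue())))
```

As with the original, a random byte is what gets chosen, so longer lines are returned more often.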

>> f = file("quotes.txt","rb")
>> sizeoffile = stat("quotes.txt")[6]

As I said above, the 7th member of the stat tuple gives the file size so 
that I can use it in randint

>> while (1):
>>   print getrandomline(f, sizeoffile),

Obviously you won't use this - it was just to keep the program running while I 
checked its memory usage.

>> f.close()
>> ###

See! Not a bit above your level. ;-)

HTH,
JS 


Re: [Tutor] reading random line from a file

2007-07-16 Thread Tiger12506

If you truly wish to kill yourself trying to make it as memory-efficient as
possible, then you can follow this example. (This is more like what I would
write in C).
The getrandomline function picks a random byte between the beginning and the
end of the file, then backs up until the beginning of the line and uses
readline to return the whole line.
I tested it :-)


#
from os import stat
from random import randint

def getrandomline(f, length):
    pos = randint(0,length)
    f.seek(pos)
    while f.read(1)!='\n':
        try:
            f.seek(-2,1)
        except IOError:   # This is to catch seeking before the beginning of the file
            f.seek(0)
    return f.readline()

f = file("quotes.txt","rb")
sizeoffile = stat("quotes.txt")[6]

while (1):
  print getrandomline(f, sizeoffile),

f.close()
###

This holds at 3,688 K mem usage, whereas with the same file (100,000 lines),
using readlines gives me 47,724 K.  Big difference. Maybe not important to
you, but I'm strange.

Hope this helps.

JS


Re: [Tutor] reading random line from a file

2007-07-16 Thread Tiger12506

Perhaps ~this~ is what you are worried about performance-wise?
Image Name        Mem Usage
---------------------------
python.exe        11,096 K

That's not too bad considering ~this~
explorer.exe      14,356 K
svchost.exe       24,000 K

And I worry about the mp3 player I wrote in C using 2,520 K
I keep thinking I could cut that down if I mess with the compiler settings 
;-)

I wouldn't worry about it too much. Reading the whole file in at once is a 
performance issue when you are dealing with millions and millions of lines 
of text. An example is DNA sequences. Or databases.

JS

> max baseman wrote:
>> cool thanks
>>
>> oh for performance eventualy i would like the file to contain many quotes
> Using readlines isn't exactly going to cause a performance bottleneck.
> I used the following code
> #make the file.py
> f = file("temp.txt","w")
> x = 100000
> while x > 0:
>     f.write("a" * 41 + "\n")  # 42 bytes per line
>     x -= 1
> f.close()
> #---
> this creates a file with a whole lot of lines of 'a's.
> 100,000 lines, to be exact, and 4,200,000 bytes.
>
> In other words, this is a fair approximation for if you had, say, 25,000
> quotes (since your quotes are likely to be, on average, longer than the
> amount of 'a's I used.)
> I think you'll agree that that's quite a few quotes.
>
> Now how long does it take to use readlines() on this file?
>
> #test performance.py
> import timeit
> string = "f = file('temp.txt','r');f.readlines();f.close()"
> temp = timeit.Timer(stmt=string)
> print "1000 iterations took: " + str(temp.timeit(1000))
> #-
> what this code does is opens, reads all the text of the file, and closes
> the file.
> We call timeit with 1000 as the argument, so it repeats this process
> 1000 times.
>
> The output of this program on my machine is:
> 1000 iterations took: 51.0771701431
>
> In other words, if you have 25,000 quotes, you could read all of them
> into memory in 51.07717/1000 (approximately)
> or 0.05107 seconds.  And I'm skeptical that you would even have that
> many quotes.
> So, like i said before, I doubt this will cause any significant
> performance problem in pretty much any normal situation.
>
> Also, by the way - please reply to me on-list so that others get the
> benefit of our conversations.
> -Luke


Re: [Tutor] reading random line from a file

2007-07-15 Thread Luke Paireepinart

max baseman wrote:
> cool thanks
>
> oh for performance eventualy i would like the file to contain many quotes
Using readlines isn't exactly going to cause a performance bottleneck.
I used the following code
#make the file.py
f = file("temp.txt","w")
x = 100000
while x > 0:
    f.write("a" * 41 + "\n")  # 42 bytes per line
    x -= 1
f.close()
#---
this creates a file with a whole lot of lines of 'a's.
100,000 lines, to be exact, and 4,200,000 bytes.

In other words, this is a fair approximation for if you had, say, 25,000 
quotes (since your quotes are likely to be, on average, longer than the 
amount of 'a's I used.)
I think you'll agree that that's quite a few quotes.

Now how long does it take to use readlines() on this file?

#test performance.py
import timeit
string = "f = file('temp.txt','r');f.readlines();f.close()"
temp = timeit.Timer(stmt=string)
print "1000 iterations took: " + str(temp.timeit(1000))
#-
what this code does is opens, reads all the text of the file, and closes 
the file.
We call timeit with 1000 as the argument, so it repeats this process 
1000 times.

The output of this program on my machine is:
1000 iterations took: 51.0771701431

In other words, if you have 25,000 quotes, you could read all of them 
into memory in 51.07717/1000 (approximately)
or 0.05107 seconds.  And I'm skeptical that you would even have that 
many quotes.
So, like I said before, I doubt this will cause any significant 
performance problem in pretty much any normal situation.
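The measurement is easy to repeat. Below is a Python 3 sketch of the same experiment. The helper names, the temporary directory, and the repeat count of 10 are choices made here for illustration; the 41-character line width is an assumption that gives 42 bytes per line, matching the 4,200,000-byte total mentioned above. Timings will of course differ from machine to machine.

```python
import os
import tempfile
import timeit

def make_file(path, n_lines=100000, width=41):
    """Write n_lines lines of 'a's, each width characters plus a newline."""
    with open(path, "w", newline="\n") as f:
        for _ in range(n_lines):
            f.write("a" * width + "\n")

def read_all(path):
    """Open the file, read every line into a list, and close it again."""
    with open(path) as f:
        return f.readlines()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "temp.txt")
    make_file(path)
    runs = 10
    seconds = timeit.timeit(lambda: read_all(path), number=runs)
    print("%d iterations took: %s" % (runs, seconds))
    print("per read: %s seconds" % (seconds / runs))
```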

Also, by the way - please reply to me on-list so that others get the 
benefit of our conversations.
-Luke

Re: [Tutor] reading random line from a file

2007-07-14 Thread Andreas Kostyrka

Well he could implement the indexing into his program and check mtimes to 
decide if he needs to reindex.

But yes, as long as the file fits into memory, readlines (or 
list(file("quotes.txt"))) makes more sense.

Andreas

-- Original Message --
Subject:    Re: [Tutor] reading random line from a file
From:   "Alan Gauld" <[EMAIL PROTECTED]>
Date:   15.07.2007 06:39


"max baseman" <[EMAIL PROTECTED]> wrote


> im writing a quick quote reader that spits out a random quote from a
> show but cant get it to pick randomly

You can either get the lines to be the same length and
use a random index to seek() to the start of the line you want.
Or you can build a separate index file which records where
each line starts and randomly select one of those. But that
requires that the quotes file is only changed programmatically
so that the index file can be rebuilt each time you add/delete
a quote. (Or you build an indexing program)

It's much easier (unless the file is huge) to just use readlines()

HTH

Alan G.



Re: [Tutor] reading random line from a file

2007-07-14 Thread Alan Gauld


"max baseman" <[EMAIL PROTECTED]> wrote


> im writing a quick quote reader that spits out a random quote from a
> show but cant get it to pick randomly

You can either get the lines to be the same length and
use a random index to seek() to the start of the line you want.
Or you can build a separate index file which records where
each line starts and randomly select one of those. But that
requires that the quotes file is only changed programmatically
so that the index file can be rebuilt each time you add/delete
a quote. (Or you build an indexing program)
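To make the first option concrete, here is a small Python 3 sketch of fixed-length records. The record width of 32 bytes, the space padding, and the function names are all assumptions chosen for the example. With every line the same length, the start of record n is simply n times the width, so no index is needed.

```python
import io
import random

RECORD = 32  # assumed fixed width in bytes, including the trailing newline

def pad(quote):
    """Encode a quote and pad it with spaces to exactly RECORD bytes."""
    data = quote.encode("utf-8")
    if len(data) > RECORD - 1:
        raise ValueError("quote too long for the record width")
    return data.ljust(RECORD - 1) + b"\n"

def random_record(f, size):
    """Seek straight to a randomly chosen record and strip its padding."""
    n = size // RECORD
    f.seek(random.randrange(n) * RECORD)
    return f.read(RECORD).rstrip().decode("utf-8")

quotes = ["Nobody expects it", "It is an ex-parrot", "Ni!"]
buf = io.BytesIO(b"".join(pad(q) for q in quotes))
print(random_record(buf, len(buf.getvalue())))
```

The obvious cost is wasted space, since every record is as wide as the longest quote allowed.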

It's much easier (unless the file is huge) to just use readlines()

HTH

Alan G.



Re: [Tutor] reading random line from a file

2007-07-14 Thread John Fouhy

On 15/07/07, max baseman <[EMAIL PROTECTED]> wrote:
> im writing a quick quote reader that spits out a random quote from a
> show but cant get it to pick randomly
> i tried
> a=randrange(820)+1
> text.readline(a)
>
> and i would prefer not having to bring evryline into the program then
> picking like

The 'fortune' program in unix/linux produces random quotes from a
quote file.  As I understand it, it builds separate index files for
each input file.  I'm not sure exactly how it works, but I would guess
the index files contain byte offsets for the start and end of each
quote.  So you would read the whole index file (which will be much
shorter than the main quote file), select one at random, then use
.seek() and .read() to read just the bytes you are interested in from
the main file.

-- 
John.

Re: [Tutor] reading random line from a file

2007-07-14 Thread Luke Paireepinart

max baseman wrote:
> im writing a quick quote reader that spits out a random quote from a  
> show but cant get it to pick randomly
> i tried
> a=randrange(820)+1
> text.readline(a)
>
> and i would prefer not having to bring evryline into the program then  
> picking like
>
> for line in text.readlines():
>   lines.append(text)
>   
You don't have to read the lines in this way.
Just do lines = text.readlines() directly.

There's no way that you can just directly read a specific line without 
reading in the rest of the file,
because Python doesn't know beforehand where newlines are located inside 
of the file.
So even if this were possible, it would still read in all of the file up 
to and including the line you want, so that it could
count the number of newlines.
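In Python 3 terms, the read-everything approach might look like the sketch below. The function name and the sample quotes are made up, and an in-memory file stands in for the real quotes file.

```python
import io
import random

def random_quote(f):
    """Read every line into a list, then pick one uniformly."""
    lines = f.readlines()
    return random.choice(lines).rstrip("\n")

# io.StringIO stands in for open("quotes.txt") here.
text = io.StringIO("first quote\nsecond quote\nthird quote\n")
print(random_quote(text))
```

random.choice raises IndexError on an empty list, so an empty file would need its own check.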

Why is it a problem to input it all at once?
-Luke