Here is a simple and quick solution -- Generate a random number "random.shuffle(x[, random])¶Shuffle the sequence x in place. The optional argument random is a 0-argument function returning a random float in [0.0, 1.0); by default, this is the function random()." http://docs.python.org/library/random.html
Multiple the random value returned by the maxnumber of your table primary key index. http://www.tizag.com/mysqlTutorial/mysqlmax.php Then use the result in your query as randID = MAX(id) * random() SELECT objectname FROM products WHERE objectID = randID Hope this help. Cheers... Threader ---------- Forwarded message ---------- From: Dennis Lee Bieber <wlfr...@ix.netcom.com> To: python-list@python.org Date: Mon, 21 Sep 2009 21:40:02 -0700 Subject: Re: [SQL] Pick random rows from SELECT? On Mon, 21 Sep 2009 10:59:38 +0200, Gilles Ganault <nos...@nospam.com> declaimed the following in gmane.comp.python.general: > Since this list is quite big and the site is the bottleneck, I'd like > to run multiple instances of this script, and figured a solution would > be to pick rows at random from the dataset, check in my local database > if this item has already been taken care of, and if not, download > details from the remote web site. > You really think making MULTIPLE, overlapping requests to a web site is going to be more efficient than just suffering the single transfer time of one large query? > If someone's done this before, should I perform the randomization in > the SQL query (SQLite using the APSW wrapper > http://code.google.com/p/apsw/), or in Python? > Pardon, I thought you implied the bottleneck is the web-site database -- I'd worry about any web-site that exposes a file-server based database to direct user access. > Here's some simplified code: > > sql = 'SELECT id,label FROM companies WHERE activity=1' > rows=list(cursor.execute(sql)) > for row in rows: > id = row[0] > label = row[1] > > print strftime("%H:%M") > url = "http://www.acme.com/details.php?id=%s" % id > req = urllib2.Request(url, None, headers) > response = urllib2.urlopen(req).read() > > name = re_name.search(response) > if name: > name = name.group(1) > sql = 'UPDATE companies SET name=? WHERE id=?' > cursor.execute(sql, (name,id) ) Ah... You mean you are retrieving the names from a local database, and then requesting web-site details based upon that name. No matter how you look at it, you appear to want to process the entire local list of companies... Multiple randomized local queries will just add to the final run-time as you start to get duplicates -- and have to reject that one to query for another random name. I'd suggest either a pool of threads -- 5-10, each reading company names from a shared QUEUE, which is populated by the main thread (remember to commit() so that you don't block on database updates by the threads). OR... determine how many companies there are, and start threads feeding them <start> and <length> (length being #names / #threads, round up -- start then being 0*length+1, 1*length+1, etc...) and use those in thread specific selects using "... limit <length> offset <start>"... This way each thread retrieves its own limited set of companies (make sure to use the same sorting criteria). -- Wulfraed Dennis Lee Bieber KD6MOG wlfr...@ix.netcom.com HTTP://wlfraed.home.netcom.com/<http://wlfraed.home.netcom.com/> ---------- Forwarded message ---------- From: greg <g...@cosc.canterbury.ac.nz> To: python-list@python.org Date: Tue, 22 Sep 2009 17:07:33 +1200 Subject: Re: Comparison of parsers in python? Nobody wrote: What I want: a tokeniser generator which can take a lex-style grammar (not > necessarily lex syntax, but a set of token specifications defined by > REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of > bytes. It must allow the syntax to be defined at run-time. > You might find my Plex package useful: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ It was written some time ago, so it doesn't know about the new bytes type yet, but it shouldn't be hard to adapt it for that if you need to. What I don't want: anything written by someone who doesn't understand the > field (i.e. anything which doesn't use a DFA). > Plex uses a DFA. -- Greg
-- http://mail.python.org/mailman/listinfo/python-list