On Thu, Jul 14, 2011 at 10:50, Jerry Schwartz <je...@gii.co.jp> wrote: > There are a couple of problems with using any database for doing this. >
you're probably right. that said.... > - Rows in a table are inherently in no particular order. That means they are > neither sorted nor random. > - Depending upon the keys you are using (an auto-increment field for example), > you might be able to select a random row; but that wouldn't guarantee that you > wouldn't get the same row twice. > > There is a way around this that sounds tempting, but it would be hideously > impractical: > this actually doesn't sound very temping to me :) > 1. Create a copy of your original table, `t2`, with an auto-increment index > `aui` (either from the original table or created at this point. > 2. Create yet another table, `t3`, with the same structure but no data in it. > 3. Select a random record from `t2` using > Â SELECT ROUND(RAND()*MAX(`t2`.`aui`)) FROM `t2` INTO @RANDREC; > 4. Copy that randomly-selected record into `t3` using > Â INSERT INTO `t3` SELECT * FROM `t2` WHERE `t2`.`aui` = @RANDREC; > 5. Remove the selected record from `t2` using > Â DELETE FROM `t2` WHERE `t2`.`aui` = @RANDREC; > 6. Loop back to step 3 until `t2` is empty. > > Here's why, tempting as it is, it wouldn't work: as the number of records in > `t2` dwindles, it will become less and less likely that @RANDREC will actually > match an existing value of `t2`.`aui`. By the time you got down to a handful > of records, it might take years to find one. > > Now if the LIMIT clause accepted variables instead of constants, you could > replace steps 3 and 4 with something like this: > > Â SELECT ROUND(RAND()*COUNT(`t2`.`aui`)) INTO @RANDSTART; > Â SELECT FROM `t2` LIMIT @RANDSTART, 1; > > And that would actually work (give or take some corner cases I haven't thought > about very hard). > i've actually done find_in_string (or whatever it was under sql server) and then a substr -1 to get all of the titles where pages were stored in a table. that said, it took me a few hours to figure it out, debug, and generate /something/ that looked right. it would've been hella easier to just write a program that did a: select data from table while( data ) { new_data = regex of data print new_data } (i wrote that in pseudo code but i'd do 10 lines in perl and probably have it done in 10 minutes) > In point of fact, that's basically what you need to do whether you do it in > MySQL or in an external program. The difference is that in an external program > you can effectively renumber your rows as you remove them, so that it won't > get harder and harder to find an actual row. Even then, a random shuffle is > likely to be time-consuming. > > Regards, > > Jerry Schwartz > Global Information Incorporated > 195 Farmington Ave. > Farmington, CT 06032 > > 860.674.8796Â / FAX: 860.674.8341 > E-mail: je...@gii.co.jp > Web site: www.the-infoshop.com > >>-----Original Message----- >>From: Adarsh Sharma [mailto:adarsh.sha...@orkash.com] >>Sent: Wednesday, July 13, 2011 9:59 AM >>To: wal...@waltertross.com >>Cc: mysql@lists.mysql.com >>Subject: Re: How to Shuffle data >> >>Do i need to move output to a CSV file & then write a Java or C code for it. if you want, you can also select your pk from your data and write the pk and data out in a new table. i've never done java and never looked at any c db api's, so ymmv. but yes, do it externally. >> >>I think a procedure can do it easily , If I know the proper function for >>that. if you need this done on the fly, an sp will be what you'll want (or memcache and an event handler but i hate plugging more db's into a project until i'm killing my current setup). however, an sp to do this will make your db slower. so, if you don't need data on the fly, write a program and a cron (or at) job to run when you sleep and be done with it. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org