Re: How to Shuffle data

shawn wilson Thu, 14 Jul 2011 08:43:10 -0700

On Thu, Jul 14, 2011 at 10:50, Jerry Schwartz <je...@gii.co.jp> wrote:
> There are a couple of problems with using any database for doing this.
>


you're probably right. that said....

> - Rows in a table are inherently in no particular order. That means they are
> neither sorted nor random.
> - Depending upon the keys you are using (an auto-increment field for example),
> you might be able to select a random row; but that wouldn't guarantee that you
> wouldn't get the same row twice.
>
> There is a way around this that sounds tempting, but it would be hideously
> impractical:
>

this actually doesn't sound very temping to me :)

> 1. Create a copy of your original table, `t2`, with an auto-increment index
> `aui` (either from the original table or created at this point.
> 2. Create yet another table, `t3`, with the same structure but no data in it.
> 3. Select a random record from `t2` using
> Ā  SELECT ROUND(RAND()*MAX(`t2`.`aui`)) FROM `t2` INTO @RANDREC;
> 4. Copy that randomly-selected record into `t3` using
> Ā  INSERT INTO `t3` SELECT * FROM `t2` WHERE `t2`.`aui` = @RANDREC;
> 5. Remove the selected record from `t2` using
> Ā  DELETE FROM `t2` WHERE `t2`.`aui` = @RANDREC;
> 6. Loop back to step 3 until `t2` is empty.
>
> Here's why, tempting as it is, it wouldn't work: as the number of records in
> `t2` dwindles, it will become less and less likely that @RANDREC will actually
> match an existing value of `t2`.`aui`. By the time you got down to a handful
> of records, it might take years to find one.
>
> Now if the LIMIT clause accepted variables instead of constants, you could
> replace steps 3 and 4 with something like this:
>
> Ā  SELECT ROUND(RAND()*COUNT(`t2`.`aui`)) INTO @RANDSTART;
> Ā  SELECT FROM `t2` LIMIT @RANDSTART, 1;
>
> And that would actually work (give or take some corner cases I haven't thought
> about very hard).
>

i've actually done find_in_string (or whatever it was under sql
server) and then a substr -1 to get all of the titles where pages were
stored in a table. that said, it took me a few hours to figure it out,
debug, and generate /something/ that looked right. it would've been
hella easier to just write a program that did a: select data from
table
while( data ) {
new_data = regex of data
print new_data
}

(i wrote that in pseudo code but i'd do 10 lines in perl and probably
have it done in 10 minutes)

> In point of fact, that's basically what you need to do whether you do it in
> MySQL or in an external program. The difference is that in an external program
> you can effectively renumber your rows as you remove them, so that it won't
> get harder and harder to find an actual row. Even then, a random shuffle is
> likely to be time-consuming.
>
> Regards,
>
> Jerry Schwartz
> Global Information Incorporated
> 195 Farmington Ave.
> Farmington, CT 06032
>
> 860.674.8796Ā / FAX: 860.674.8341
> E-mail: je...@gii.co.jp
> Web site: www.the-infoshop.com
>
>>-----Original Message-----
>>From: Adarsh Sharma [mailto:adarsh.sha...@orkash.com]
>>Sent: Wednesday, July 13, 2011 9:59 AM
>>To: wal...@waltertross.com
>>Cc: mysql@lists.mysql.com
>>Subject: Re: How to Shuffle data
>>
>>Do i need to move output to a CSV file & then write a Java or C code for it.

if you want, you can also select your pk from your data and write the
pk and data out in a new table. i've never done java and never looked
at any c db api's, so ymmv. but yes, do it externally.

>>
>>I think a procedure can do it easily , If I know the proper function for
>>that.

if you need this done on the fly, an sp will be what you'll want (or
memcache and an event handler but i hate plugging more db's into a
project until i'm killing my current setup). however, an sp to do this
will make your db slower. so, if you don't need data on the fly, write
a program and a cron (or at) job to run when you sleep and be done
with it.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Re: How to Shuffle data

Reply via email to