Did this years ago (1998?) and called it an "Nth" program. Wrote it in ASSEMBLER!!! Basically, you take your total count and divide it by the number of buckets. Of that answer, the whole part we'll call Nth and the fractional part we'll call LeftoverFraction. As I cycle through the data, I export/flag/whatever every "Nth" record, keeping a running sum of the LeftoverFraction. When that sum reaches 1 or more, I add 1 to the "Nth" for that iteration only, then subtract 1 from the LeftoverFraction sum. Rinse and repeat.

So if my division came out to "every 4.25 records", then the first 5 records I would be grabbing would be 4, 8, 12, 17, 21. See how I did that? For that 4th number, the LeftoverFraction part totalled >= 1, so I added one to the Nth factor for that iteration and set the LeftoverFraction sum back by 1.

Hence, it was a true "Nth" across the entire dataset, without rounding error skewing the selected sample as the process went through the file.

I was quite proud of that code. The asshole boss I had at the time didn't think I could do it. :-)
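
For anyone who wants to play with the idea, here is a minimal sketch of the scheme described above, in Python rather than assembler. It is an illustration, not the original code: the names nth_positions, total and buckets are made up, and it assumes the extra record gets added as soon as the leftover sum reaches 1 (>= 1), which is what the 4.25 example implies. It also keeps the leftover as an integer remainder instead of a fraction, so floating point never gets a chance to drift.

```python
def nth_positions(total, buckets):
    """Return the 1-based positions of every "Nth" record.

    total   -- number of records in the file (hypothetical name)
    buckets -- the divisor from the description (hypothetical name)
    """
    if buckets <= 0 or buckets > total:
        raise ValueError("buckets must be between 1 and total")

    # Whole part is the "Nth"; the integer remainder stands in for
    # LeftoverFraction (leftover / buckets is the fractional part).
    nth, leftover = divmod(total, buckets)

    positions = []
    pos = 0
    acc = 0  # running leftover sum, counted in units of 1/buckets
    for _ in range(buckets):
        step = nth
        acc += leftover
        if acc >= buckets:   # the leftover sum has reached 1
            step += 1        # add 1 to the "Nth" for this iteration only
            acc -= buckets   # then take 1 back off the sum
        pos += step
        positions.append(pos)
    return positions


# 85 records over 20 buckets works out to "every 4.25 records"
print(nth_positions(85, 20)[:5])   # [4, 8, 12, 17, 21]
```

With these assumptions the last position always lands exactly on the final record, since the bumps add up to the remainder of the division.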

Good luck!
--Mike


On 2014-06-13 13:23, M Jarvis wrote:

I would like to optimally shuffle my 1 million records across my <N>
buckets so that the total quantity (based on the sum of each record's
quantity field) in each bucket is balanced (as much as possible) across
buckets.



This sounds like the type of problem that has been solved before and
given a fancy algorithm name.


I see the word 'shuffle' as in 'distribute', and no mention of
'random'. I would *assume* you want the 1 million records randomly
distributed.

Also your desire for "balanced as much as possible" is relative -
balanced as in less than 2? 200? 463?

Off the top of my head I'd randomize the 1 million records then feed
them 1, 2, 3... to each bucket keeping a running total on each. Not
sure about the probabilities but something tells me you should more or
less be within 1000 of each other.... if not, do it again...

Kinda brute force but that's all I got for ya at the moment...

