Did this years ago (1998?) and called it an "Nth" program. Wrote this
in ASSEMBLER!!! Basically, you take your total count, divide by the
number of buckets, and of that answer, the whole part we'll call Nth and
the remainder we'll call LeftoverFraction. As I start to cycle through
the data, I export/flag/whatever every "Nth" record, keeping a sum of
the LeftoverFraction. When the LeftoverFraction sum reaches 1 or more, I
add 1 to the "Nth" for that iteration only, then subtract 1 from the
LeftoverFraction sum. Rinse and repeat.
So if my division came out to "every 4.25 records", then the first 5
records I'd grab would be 4, 8, 12, 17, 21. See how I did
that? For that 4th number, the LeftoverFraction part totalled >= 1, so
I added one to the Nth factor for that iteration and set the
LeftoverFraction sum back by 1.
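Here is a minimal Python sketch of that scheme, not the original assembler.
The function name nth_picks and its parameters are made up for illustration,
and tracking the leftover as an integer remainder (in units of 1/buckets)
instead of a decimal fraction is an assumption on my part, chosen so no
floating-point rounding can creep in. The 85-record, 20-bucket call is just
one way to get "every 4.25 records".

```python
def nth_picks(total_records, buckets):
    """Return the 1-based record numbers selected by the "Nth" scheme."""
    # Whole part is the base stride; the remainder is the leftover
    # fraction expressed in units of 1/buckets.
    nth, remainder = divmod(total_records, buckets)
    picks = []
    position = 0
    carry = 0  # running leftover sum, still in units of 1/buckets
    for _ in range(buckets):
        carry += remainder
        stride = nth
        if carry >= buckets:   # leftover sum has reached a whole record
            stride += 1        # stretch the stride for this iteration only
            carry -= buckets   # and take that whole record back off the sum
        position += stride
        picks.append(position)
    return picks


# 85 records into 20 buckets is "every 4.25 records".
print(nth_picks(85, 20)[:5])  # [4, 8, 12, 17, 21]
```

Because the carry never loses anything, the last pick lands exactly on the
final record (85 here), which is the "no rounding error" property.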
I was quite proud of that code. The asshole boss I had at the time
didn't think I could do it. :-)
Hence, it was a true "Nth" across the entire dataset, without rounding
error skewing the selected sample as the process went through the file.
Good luck!
--Mike
On 2014-06-13 13:23, M Jarvis wrote:
I would like to optimally shuffle my 1 million records across my <N>
buckets so that the total quantity (based on the sum of each record's
quantity field) in each bucket is balanced (as much as possible)
across buckets.
This sounds like the type of problem that has been solved before and
given a fancy algorithm name.
I see the word 'shuffle' used as in 'distribute', with no mention of
'random'. I would *assume* you want the 1 million records randomly
distributed.
Also your desire for "balanced as much as possible" is relative -
balanced as in less than 2? 200? 463?
Off the top of my head I'd randomize the 1 million records then feed
them 1, 2, 3... to each bucket keeping a running total on each. Not
sure about the probabilities but something tells me you should more or
less be within 1000 of each other.... if not, do it again...
Kinda brute force but that's all I got for ya at the moment...
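A rough Python sketch of that brute-force idea, for anyone who wants to try
it. Everything named here (deal_once, deal_until_balanced, tolerance,
max_tries, seed) is hypothetical, and the acceptable spread between buckets
is something you would have to pick yourself. It shuffles the records, deals
them round-robin while keeping a running total per bucket, and retries if
the totals are too far apart.

```python
import random


def deal_once(quantities, n_buckets, rng):
    """Shuffle record indexes, then deal them 1, 2, 3... across the buckets."""
    order = list(range(len(quantities)))
    rng.shuffle(order)
    buckets = [[] for _ in range(n_buckets)]
    totals = [0] * n_buckets  # running quantity total per bucket
    for i, rec in enumerate(order):
        b = i % n_buckets
        buckets[b].append(rec)
        totals[b] += quantities[rec]
    return buckets, totals


def deal_until_balanced(quantities, n_buckets, tolerance,
                        max_tries=100, seed=None):
    """Repeat the shuffle-and-deal until the totals are within tolerance.

    Returns (buckets, totals, spread) for the best attempt seen, where
    spread is the largest bucket total minus the smallest.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(max_tries):
        buckets, totals = deal_once(quantities, n_buckets, rng)
        spread = max(totals) - min(totals)
        if best is None or spread < best[2]:
            best = (buckets, totals, spread)
        if spread <= tolerance:
            break  # good enough, stop retrying
    return best
```

For example, deal_until_balanced(quantities, 8, tolerance=1000) would give
back the record indexes for each of 8 buckets plus the totals it reached.
It makes no promise of hitting the tolerance; it just keeps the best try.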