On Mon, May 30, 2005 at 09:25:45AM +0000, Davis Houlton wrote:
> Hi Frederik! I guess we're both a little confused :) My question is why would
> I sort AND shuffle in the same command? Are we talking sort the whole data
> set and shuffle a subset? I guess I'm having a hard time thinking why I would
> randomize via key--not saying that there aren't reasons, I'm just not sure
> what they are!

This is covered in the previous thread. The canonical example is
playing songs with albums shuffled, but with songs on each album
played together and in order.

> My premise is that shuffle is organized pretty differently than sort--the
> code I have (in addition to the code I imagine we'll need for large files)
> looks radically different than sort, if only because shuffling is vastly
> simpler.
>
> While we could graft a shuffle into sort--I must admit to have only taken a
> cursory glance at the sort source--I think we can gain greater efficiencies
> by keeping the logic paths separate. My assumption is thus the shuffling
> code will be it's own entity, whether it is in sort or shuffle.

It is true that shuffling can generally be done more efficiently than
sorting. I don't know if efficiency is a primary concern - I think
that the *ability* to handle multi-gigabyte files is important, but
since they come up so rarely, especially when the task is to shuffle
and not to sort, whether they are done in a minute or 30 minutes seems
inconsequential.

But if you are already writing something which will be able to handle
large files well, I guess I personally don't see a problem with
including it in coreutils. The only thing is that what you describe
won't be able to handle all of the use cases that I had in mind. I
would still like to see 'sort' have an option to sort based on a hash
of keys since this would cover those.

> Looking at it a different way, lets take a look at the usage of sort and
> shuffle as a card metaphor. The way I sort a deck of cards--and my rather
> simple method is far from optimum--is to first spread the cards face up out
> on a table, look for some high cards of each suit, start a pile of the four
> suits, and then as I pull additional cards, place them in the proper order in
> each suit pile. When I'm done sometime later, I'm left with the four stacks
> of cards, each suit in the proper order.
>
> When I shuffle the resulting deck, however, I use a different process.
> Granted, I could spread all the cards on the table, mix them up "domino"
> style, and then place them randomly into one, or even four stacks. That
> would be acceptable. But what I do (following the grand tradition of card
> shark wannabes everywhere) is split the deck in half. I take each deck, and
> attempt to randomly merge them together like we've all seen those Las Vegas
> dealers do on tv, and voila--I have now (in theory) randomized the deck. It's
> quicker and just as effective as the table spread method.
>
> If we are willing to ignore the imperfections of the analogy--that Vegas
> dealers shuffle their cards 7 times, that I have a tendency to mangle cards
> with improper shuffling technique, etc--my thinking is that it makes sense to
> have sort and shuffle remain separate on an intuitive level. And I admit, it
> is true, it is not hard to train a user in sort and shuffle commands. Had
> sort --random already existed, there would be no need to propose any
> separation. But if we accept as a given that the code will follow two
> different logic paths, I personally don't see maintenance gains from
> combining the two.

I hope that you aren't proposing an algorithm which is similar to
card-shuffling. That would be exactly like merge-sorting on a key
hash - i.e. no more efficient.

> I took a quick scan of the archive and it seemed like the conclusion
> was it is a good idea to keep shuffle functionality separate?

I believe it was concluded that two functionalities were needed - I
don't know what you mean by "separate".

Frederik
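
For illustration, the "sort based on a hash of keys" idea mentioned
above can be sketched in a few lines of Python. This is only a minimal
sketch, not code from coreutils or from either poster: the function
name shuffle_by_key_hash, the salt parameter, and the sample song list
are all made up for the example. It assumes the input already lists
each album's songs in track order, and it relies on Python's sorted()
being stable, so records with equal keys keep their input order.

```python
import hashlib
import os


def shuffle_by_key_hash(records, key, salt=None):
    """Order records by a salted hash of key(record).

    Records sharing a key get the same hash, so they end up adjacent;
    because sorted() is stable, they also keep their input order.
    A fresh random salt gives a different group order on each run.
    """
    if salt is None:
        salt = os.urandom(16)  # hypothetical choice of salt size

    def hashed(record):
        data = str(key(record)).encode("utf-8")
        return hashlib.sha256(salt + data).digest()

    return sorted(records, key=hashed)


# Hypothetical sample data: (album, track number, title)
songs = [
    ("Album A", 1, "A-one"),
    ("Album A", 2, "A-two"),
    ("Album B", 1, "B-one"),
    ("Album B", 2, "B-two"),
    ("Album B", 3, "B-three"),
    ("Album C", 1, "C-one"),
]

if __name__ == "__main__":
    # Albums come out in a random order; tracks stay together, in order.
    for album, track, title in shuffle_by_key_hash(songs, key=lambda s: s[0]):
        print(album, track, title)
```

Using the whole record as the key instead of the album would give an
ordinary shuffle of all songs, at the cost of a full sort rather than
a linear-time shuffle.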