On 05/04/2012 02:56 PM, Pádraig Brady wrote:
> On 01/13/2012 08:49 AM, goodman....@gmail.com wrote:
>> Hello,
>>
>> I fairly recently discovered the joys of join, but now I wonder why it
>> is limited to two files?
>>
>> In other words, I would like to do the following:
>>
>>   join file1 file2 ... fileN
>>
>> While I CAN achieve this through other methods, they are not ideal.
>> For instance, paste works with multiple files, but then I must cut out
>> the repeated key columns. The following also works, but doesn't
>> generalize to filename expansions (e.g. `join file*`):
>>
>>   join file1 file2 | join - file3 | ... | join - fileN
>>
>> As for my use case, I am working with data files containing the
>> results of multiple systems running over the same test items. I would
>> like to compare the results of all systems for each item by putting
>> them side-by-side.
>>
>> I don't know the history of the command, so I am not aware of any
>> technical or ideological reasons why it shouldn't support more than
>> two files. Any explanation appreciated!
>
> Sorry I thought I replied to this.
>
> Well it would be handy to support more than 2 files,
> but you couldn't support an arbitrary number,
> as you'd run out of file descriptors or RAM etc.
>
> Also it would complicate the implementation
> of join to handle multiple files, especially
> an arbitrary number of multiple files.
>
> So for primarily these reason you'd need to
> split the task something like:
>
>   find files |
>   while read file1; do
>     test "$file2" || read file2
>     join "$file1" "$file2" > out.tmp
>     file2=out.tmp
>   done
>
> Note depending on the size of the data,
> out.tmp might be better on a ram disk
> (which /tmp is in modern GNU/Linux distros for example).
>
> Note also that your non tmp file piped version above
> is not scalable for many files due to process limits etc.

Sigh I had previously responded:
http://lists.gnu.org/archive/html/coreutils/2012-02/msg00064.html
but the subsequent request was in a new thread.

sorry for the noise,
Pádraig.
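
For reference, a minimal sketch of the pairwise approach quoted above, written as a shell function. It assumes every input is already sorted on its first field and that mktemp is available; the name join_all is hypothetical and not part of coreutils. Each intermediate result goes to a second temporary file, because redirecting join's output onto a file it is also reading would truncate that input before join opens it.

```shell
# join_all FILE...: fold join(1) over any number of files, two at a time.
# Hypothetical helper; assumes each FILE is sorted on its first field.
join_all() {
    [ "$#" -gt 0 ] || { echo "usage: join_all FILE..." >&2; return 2; }

    # Two scratch files: the running result and the next result.
    acc=$(mktemp) || return 1
    next=$(mktemp) || { rm -f "$acc"; return 1; }

    # Seed the running result with the first file.
    cat "$1" > "$acc" || { rm -f "$acc" "$next"; return 1; }
    shift

    for f in "$@"; do
        # Never redirect onto "$acc" here: the shell would empty it
        # before join gets to read it.
        if ! join "$acc" "$f" > "$next"; then
            rm -f "$acc" "$next"
            return 1
        fi
        # Swap roles so the fresh output becomes the running result.
        tmp=$acc; acc=$next; next=$tmp
    done

    cat "$acc"
    rm -f "$acc" "$next"
}
```

With this, `join_all file*` covers the filename-expansion case from the original question. Only two temporary files and one join process exist at any time, regardless of how many inputs there are.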