On 02/24/2012 10:08 PM, Jérémy Compostella wrote:
> All,
>
> I'm interested in implementing this feature. In fact, I already made a
> quick implementation to play with.
>
> I refer to the original thread: "split behavior"
> http://lists.gnu.org/archive/html/bug-coreutils/2009-09/msg00217.html
>
> To summarise it (quick version): in the past the split command provided
> this unlimited number of split files as its default behavior. But it did
> not conform to POSIX, so it has been removed (see
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=65cbf7d1).
Just to consider POSIX again, it is fairly explicit:
"By default, the names of the output files shall be 'x', followed by a
two-character suffix from the character set as described above, starting
with "aa", "ab", "ac", and so on, and continuing until the suffix "zz",
for a maximum of 676 files."

However, I think it's incorrect to impose that arbitrary limit.
Note the spec also says:
"The -a option was added to overcome the limitation of being able to
create only 676 files."
So there doesn't seem to be an intention to limit the number of output
files. I think it's just that alternative solutions were not considered.
Also, with numeric suffixes the limit is only 100 files.

> This old behavior was:
> $ cat /var/log/messages | split -2 - /tmp/x.
> x.aa
> x.ab
> ...
> x.yz
> x.zaaa
> x.zaab
> ...
> x.zyzz
> x.zzaaaa
> x.zzaaab
>
> But others in the "split behavior" thread proposed something like:
> x.aa
> ...
> x.zz
> x.zzaa
> ...
> x.zzzz
> x.zzzzaa
>
> These two possibilities serve the same goal: once sorted alphabetically,
> the split files are in the correct order.
>
> However, the second possibility does not satisfy me, since using the
> --additional-suffix option breaks it:
> $ cat /var/log/messages | split --additional-suffix=.txt -2 - /tmp/x. && ls /tmp/x.* | sort
> x.aa.txt
> ...
> x.zy.txt
> x.zzaa.txt
> ...
> x.zztw.txt
> x.zz.txt <---- :(
> x.zztx.txt
> ...
>
> Therefore, my opinion is: the old behavior is better suited to the
> current split option set.

Good. That's what I'd prefer anyway so as to be compatible with old data sets.
Note '.' sorts before digits (-d) too, so there should be no ordering issues
with --additional-suffix=... either.

> In the "split behavior" thread it was proposed to look at the
> POSIXLY_CORRECT environment variable to activate or not the unlimited
> split files behavior. But I think it's dangerous. Indeed, it breaks the
> usual file list: x.aa ... x.zz ... vs. x.aa ... x.yz x.zaa ...
> (the x.zz file does not exist anymore). Users may be surprised and older
> scripts may fail.

We could key the new behavior on POSIXLY_CORRECT, but there is no need IMHO.
Using POSIXLY_CORRECT is undesirable and only done as a very last resort.

> Maybe adding a new option or a new argument would be fine. I was
> thinking of the following:
> * --unlimited-suffixes
> * --suffix-length=unlimited or --suffix-length=auto

If we were to add an option, --suffix-length=auto would be best IMHO.
But I don't think we even need that. Just do it by default.

> With this new option (or argument), users would keep the ability to
> select the starting suffix length. For example:
> $ cat /var/log/messages | split --suffix-length=auto --suffix-length 3 -2 - /tmp/x.
> x.aaa <--- start with suffix length = 3

No need for that functionality, I think.

cheers,
Pádraig.
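Why the old scheme copes with --additional-suffix can be checked with a short sketch. In the old widening scheme (as reconstructed from the listing above), a suffix at level k has exactly k leading 'z's followed by a letter a..y, so no suffix is a prefix of another. The alternative reuses "zz" both as a complete suffix and as the start of "zzaa", so whatever is appended after the suffix can decide the order. This is illustrative Python, not split's implementation: the function names are hypothetical, plain sorted() stands in for sort(1) in the C locale, and ignore_dots() is a crude assumption standing in for a locale whose collation ignores punctuation on the first pass, which is what the misplaced x.zz.txt in the quoted listing suggests.

```python
from itertools import count, islice, product
from string import ascii_lowercase


def old_style_suffixes():
    """Hypothetical generator for the old widening scheme: aa..yz, zaaa..zyzz, zzaaaa..."""
    for level in count():
        # Level k: k leading 'z's, then a letter a..y, then k+1 further letters.
        # Keeping 'z' out of that position is what marks the next, wider level.
        for first in ascii_lowercase[:-1]:
            for rest in product(ascii_lowercase, repeat=level + 1):
                yield "z" * level + first + "".join(rest)


def proposed_suffixes():
    """Hypothetical generator for the alternative, as read from its listing: aa..zz, zzaa..zzzz, zzzzaa..."""
    for level in count():
        for pair in product(ascii_lowercase, repeat=2):
            yield "zz" * level + "".join(pair)


def is_prefix_free(words):
    """True if no word is a prefix of another; in sorted order, checking neighbours suffices."""
    ordered = sorted(words)
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def ignore_dots(name):
    """Assumed, crude model of punctuation-insensitive locale collation."""
    return name.replace(".", "")


N = 20_000  # enough to cross both widenings of the old scheme (650 + 16900 names)
old_suffixes = list(islice(old_style_suffixes(), N))
new_suffixes = list(islice(proposed_suffixes(), N))
print(is_prefix_free(old_suffixes), is_prefix_free(new_suffixes))  # True False

old = ["x." + s + ".txt" for s in old_suffixes]
new = ["x." + s + ".txt" for s in new_suffixes]

# Code-point order, like sort(1) with LC_ALL=C: both schemes stay in order.
print(sorted(old) == old, sorted(new) == new)  # True True
# Punctuation-insensitive order: only the prefix-free old scheme survives.
print(sorted(old, key=ignore_dots) == old, sorted(new, key=ignore_dots) == new)  # True False

by_locale = sorted(new, key=ignore_dots)
i = by_locale.index("x.zz.txt")
print(by_locale[i - 1 : i + 2])  # ['x.zztw.txt', 'x.zz.txt', 'x.zztx.txt']

# '.' (46) sorts before '0' (48) and 'a' (97) in byte order.
print(ord(".") < ord("0") < ord("a"))  # True
```

In byte order both schemes happen to stay sorted with ".txt" appended, because '.' sorts before letters and digits; the breakage in the quoted listing only shows up with punctuation-insensitive collation, and the prefix-free old scheme is unaffected by either.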