On 15/12/09 22:46, Chen Guo wrote:
Hey guys,
Alright, as of now everything that we originally talked about has been
implemented; tests were done on a 980K ASCII file while limiting the buffer
size to 2 bytes (will test on binary files later). Everything works great.
Great stuff.
As of now, I've got the following syntax:
-bN or --bytes=N is original usage
-b/N or --bytes=/N splits the input into N equal-sized files
-bK/N or --bytes=K/N extracts the Kth of N equal chunks to stdout
-nN is equivalent to -b/N, and -nK/N is equivalent to -bK/N
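
For what it's worth, the three argument forms above could be told apart
with something like the following. This is only a sketch, not the code in
split.c: struct chunk_spec, parse_num and parse_chunk_spec are names made
up for illustration, and it ignores the size suffixes that -b normally
accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result of parsing a -b/-n style argument.  */
struct chunk_spec
{
  bool by_count;    /* true for "/N" and "K/N", false for plain "N" */
  unsigned long k;  /* chunk to extract, 0 when none was requested */
  unsigned long n;  /* byte count for "N", number of chunks otherwise */
};

/* Parse an unsigned decimal number at *S and advance *S past it.
   Return false if no digit is present or the value overflows.  */
bool
parse_num (const char **s, unsigned long *out)
{
  char *end;
  if (**s < '0' || **s > '9')
    return false;
  errno = 0;
  *out = strtoul (*s, &end, 10);
  if (errno != 0)
    return false;
  *s = end;
  return true;
}

/* Parse "N", "/N" or "K/N" into SPEC.  Return false for malformed
   input, a zero count, or K outside 1..N.  */
bool
parse_chunk_spec (const char *arg, struct chunk_spec *spec)
{
  assert (arg != NULL && spec != NULL);
  spec->by_count = false;
  spec->k = 0;

  if (*arg == '/')
    {
      /* "/N": split into N chunks.  */
      arg++;
      spec->by_count = true;
      return parse_num (&arg, &spec->n) && *arg == '\0' && spec->n > 0;
    }

  if (!parse_num (&arg, &spec->n))
    return false;
  if (*arg == '\0')
    return spec->n > 0;         /* plain "N": original usage */
  if (*arg != '/')
    return false;

  /* "K/N": the number read so far was K.  */
  arg++;
  spec->by_count = true;
  spec->k = spec->n;
  return (parse_num (&arg, &spec->n) && *arg == '\0'
          && spec->k >= 1 && spec->k <= spec->n);
}
```

With this, "2/4" gives by_count = true, k = 2, n = 4, while "5/4" and "/"
are rejected.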
Right. Also, a form like -n lines:4 would allow one to specify
a distribution method, which may be required. E.g. this
could be used to request round-robin distribution of lines:
-n lines-rr:4
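
To make the round-robin idea concrete, here is a rough sketch of how lines
could be dealt out to N outputs in turn. split_lines_rr is a hypothetical
name, and the byte-at-a-time loop is chosen for clarity rather than speed.
Note that it never needs to know the input size, so it would also work on
a pipe.

```c
#include <assert.h>
#include <stdio.h>

/* Copy lines from IN to the N streams in OUT, one line each in turn.
   Return 0 on success, -1 on a read or write error.  */
int
split_lines_rr (FILE *in, FILE **out, size_t n)
{
  assert (in != NULL && out != NULL && n > 0);
  size_t cur = 0;   /* index of the stream receiving the current line */
  int c;

  while ((c = getc (in)) != EOF)
    {
      if (putc (c, out[cur]) == EOF)
        return -1;
      /* After each newline, move on to the next output, wrapping around.  */
      if (c == '\n')
        cur = (cur + 1) % n;
    }
  return ferror (in) ? -1 : 0;
}
```

With five input lines and N = 2, lines 1, 3 and 5 go to the first output
and lines 2 and 4 to the second.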
I haven't handled the non-seekable file case yet, but yeah this works.
As for extracting byte-chunks to stdout, I see no other way than to
read from the file's start and begin outputting once the desired chunk
is reached.
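
A minimal sketch of that read-and-skip approach, under some assumptions:
the total size is already known (e.g. from fstat on a regular file), each
chunk gets size / N bytes, and the last chunk also takes the remainder.
chunk_bounds and extract_range are hypothetical names, not functions from
split.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Compute the half-open byte range [*START, *END) of chunk K (1-based)
   of N for an input of SIZE bytes.  Assumption: the last chunk absorbs
   the remainder of SIZE / N.  */
void
chunk_bounds (uintmax_t size, uintmax_t k, uintmax_t n,
              uintmax_t *start, uintmax_t *end)
{
  assert (n > 0 && k >= 1 && k <= n);
  uintmax_t chunk = size / n;
  *start = (k - 1) * chunk;
  *end = (k == n) ? size : k * chunk;
}

/* Read IN from its current position, discard bytes before START and
   copy bytes in [START, END) to OUT.  Return 0 on success, -1 on error.  */
int
extract_range (FILE *in, FILE *out, uintmax_t start, uintmax_t end)
{
  uintmax_t pos = 0;
  int c;

  while (pos < end && (c = getc (in)) != EOF)
    {
      if (pos >= start && putc (c, out) == EOF)
        return -1;
      pos++;
    }
  return ferror (in) ? -1 : 0;
}
```

For a 10-byte input and N = 3, this yields the ranges [0, 3), [3, 6) and
[6, 10). On a seekable file one could seek to the start offset instead of
reading and discarding.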
Also, specifying other delimiters might be useful, like:
-n nul:4
Actually, at the top of split.c I see a TODO that talks about a -t option
which specifies a CHAR or REGEX delimiter. REGEX might be
kind of complicated, but a delimiter char as a global char eol should
be trivial to implement. We can leave eol = '\n' by default, and the -t
option can override it.
Right, -t is probably more general as it would also support
the existing --lines option.
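
As a small illustration of why a configurable eol byte is cheap: record
scanning only needs the byte passed in rather than a hard-coded '\n'.
record_length is a hypothetical helper, not something in split.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the first record in BUF[0..LEN), including its
   terminating EOL byte if one is present.  If no EOL byte is found, the
   whole buffer counts as one (unterminated) record.  */
size_t
record_length (const char *buf, size_t len, char eol)
{
  assert (buf != NULL || len == 0);
  const char *p = len ? memchr (buf, eol, len) : NULL;
  return p ? (size_t) (p - buf) + 1 : len;
}
```

Since memchr takes a length, the same code handles eol = '\0' for
NUL-delimited input.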
But then this raises the question... How would you enter, say, '\0' into
the terminal? And the way I know of entering a newline is rather awkward:
-t '
'
I'd probably use escapes like the join command does.
For example, it supports: -t '\0'
For reference, bash and ksh support ANSI C quoting like $'\0',
so you could specify -t $'\0'. Also, more generally, one could do
-t $(printf '\0'), though I wouldn't depend on those being available,
and passing NULs through the command line will be problematic in any case.
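
Handling the escape inside the program sidesteps the shell problem, since
the argument arrives as the two ordinary characters backslash and 0. A
sketch of what that could look like: parse_delim is a hypothetical name,
and accepting '\n' as well is my own assumption here, added only because
the literal newline form is so awkward.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Turn a -t argument into a single delimiter byte.  Accept one literal
   character, or the two-character escapes "\0" and "\n" (the latter is
   an assumption, not something taken from join).  Return false for
   anything else.  */
bool
parse_delim (const char *arg, char *delim)
{
  assert (arg != NULL && delim != NULL);
  if (strcmp (arg, "\\0") == 0)
    *delim = '\0';
  else if (strcmp (arg, "\\n") == 0)
    *delim = '\n';
  else if (arg[0] != '\0' && arg[1] == '\0')
    *delim = arg[0];
  else
    return false;
  return true;
}
```

So -t '\0' selects NUL, -t : selects a colon, and an empty or
multi-character argument is rejected.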
And one last thing: would I be wrong to say we can't support splitting by
chunks with stdin? Barring, of course, round-robin line splitting.
Right, that's all I can see being possible for non-seekable files.
cheers,
Pádraig.