Hi all, hope everyone had happy holidays. Here's the patch in its entirety. Let me know if anything's not satisfactory.
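For anyone reviewing the K/N chunk syntax, the byte-range arithmetic behind --bytes=K/N can be sketched in isolation as below. This is only a minimal illustration of how bytes_chunk_extract computes its bounds; chunk_start and chunk_end are hypothetical helper names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: offset of the first byte
   of chunk K out of N (K counts from 1) in a file of FILE_SIZE bytes.  */
size_t
chunk_start (size_t k, size_t n, size_t file_size)
{
  assert (0 < n && 0 < k && k <= n);
  return (k - 1) * (file_size / n);
}

/* Hypothetical helper, not part of the patch: offset one past the last
   byte of chunk K.  The final chunk absorbs the division remainder.  */
size_t
chunk_end (size_t k, size_t n, size_t file_size)
{
  assert (0 < n && 0 < k && k <= n);
  return k == n ? file_size : k * (file_size / n);
}
```

With the 10-byte input used in tests/misc/split-bchunk and N = 3, these give the ranges [0,3), [3,6) and [6,10), which line up with the three expected files in that test.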
I should note that I went easy on the tests because the other split tests didn't seem all that comprehensive themselves. Please let me know if I need to be more exhaustive.

From fb783060ece188fdbcd805381d02eb3b0477d25a Mon Sep 17 00:00:00 2001
From: Chen Guo <cheng...@yahoo.com>
Date: Sun, 3 Jan 2010 11:16:09 -0800
Subject: [PATCH] split: divide file into equal sized chunks; add -r and -t options.

Extend --bytes and --lines to divide file into N equal pieces, or extract
Kth of N said pieces. Add -n/--number alias for BSD compatibility.
Add -r/--round-robin option to allow division and extraction of chunks in
round robin fashion, in support of nonseekable files.
Add -t/--term option to allow user to choose delineation character;
supports parsing C escape sequences such as \n or \xdd.
* doc/coreutils.texi: update documentation of split.
* src/split.c: (eol): new global variable.
(usage, longopts, main): new options -n/--number, -r, and -t.
(bytes_split): add max_files argument. This allows for a trivial
implementation of byte chunking, similar to BSD.
(lines_split, line_bytes_split): delineate line by global eol char
instead of '\n'.
(lines_chunk_split): new function. Split file into eol delineated chunks.
(bytes_chunk_extract): new function. Extract a chunk of file.
(lines_chunk_extract): new function. Extract an eol delineated chunk of
file.
(of_info): new struct. Used by new functions lines_rr and ofd_check to
keep track of file descriptors associated with output files.
(ofd_check): new function. Shuffle file descriptors in case output files
outnumber available file descriptors.
(lines_rr): new function. Split file into chunks in round-robin fashion.
(lines_rr_extract): new function. Extract a chunk of file, as if chunks
were created in round-robin fashion.
(chunk_parse): new function. Parses /N and K/N syntax.
(eol_parse): new function. Parses -t option argument.
* tests/Makefile.am: add new tests.
* misc/split-bchunk: new test for byte delineated chunking.
* misc/split-fail: add failure scenarios for new options.
* misc/split-l: change typo ln --version to split --version.
* misc/split-lchunk: new test for line delineated chunking.
* misc/split-rchunk: new test for round-robin chunking.
* misc/split-t: new test for user defined eol char.
---
 doc/coreutils.texi      |   57 ++++-
 src/split.c             |  595 ++++++++++++++++++++++++++++++++++++++++++++++-
 tests/Makefile.am       |    4 +
 tests/misc/split-bchunk |   46 ++++
 tests/misc/split-fail   |    8 +
 tests/misc/split-l      |    2 +-
 tests/misc/split-lchunk |   56 +++++
 tests/misc/split-rchunk |   56 +++++
 tests/misc/split-t      |   39 +++
 9 files changed, 841 insertions(+), 22 deletions(-)
 create mode 100755 tests/misc/split-bchunk
 create mode 100755 tests/misc/split-lchunk
 create mode 100755 tests/misc/split-rchunk
 create mode 100755 tests/misc/split-t

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 444dbc7..ac022f4 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -104,7 +104,7 @@
 * shuf: (coreutils)shuf invocation. Shuffling text files.
 * sleep: (coreutils)sleep invocation. Delay for a specified time.
 * sort: (coreutils)sort invocation. Sort text files.
-* split: (coreutils)split invocation. Split into fixed-size pieces.
+* split: (coreutils)split invocation. Split into pieces.
 * stat: (coreutils)stat invocation. Report file(system) status.
 * stdbuf: (coreutils)stdbuf invocation. Modify stdio buffering.
 * stty: (coreutils)stty invocation. Print/change terminal settings.
@@ -2623,7 +2623,7 @@ These commands output pieces of the input.
 @menu
 * head invocation:: Output the first part of files.
 * tail invocation:: Output the last part of files.
-* split invocation:: Split a file into fixed-size pieces.
+* split invocation:: Split a file into pieces.
 * csplit invocation:: Split a file into context-determined pieces.
 @end menu
 
@@ -2919,15 +2919,15 @@ mean either @samp{tail ./+4} or @samp{tail -n +4}.
@node split invocation -...@section @command{split}: Split a file into fixed-size pieces +...@section @command{split}: Split a file into pieces. @pindex split @cindex splitting a file into pieces @cindex pieces, splitting a file into -...@command{split} creates output files containing consecutive sections of -...@var{input} (standard input if none is given or @var{input} is -...@samp{-}). Synopsis: +...@command{split} creates output files containing consecutive or interleaved +sections of @var{input} (standard input if none is given or @var{input} +is @samp{-}). Synopsis: @example split [...@var{option}] [...@var{input} [...@var{prefix}]] @@ -2940,10 +2940,9 @@ left over for the last section), into each output file. The output files' names consist of @var{prefix} (@samp{x} by default) followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by default), such that concatenating the output files in traditional -sorted order by file name produces -the original input file. If the output file names are exhausted, -...@command{split} reports an error without deleting the output files -that it did create. +sorted order by file name produces the original input file (except +...@option{-r}). If the output file names are exhausted, @command{split} +reports an error without deleting the output files that it did create. The program accepts the following options. Also see @ref{Common options}. @@ -2959,6 +2958,13 @@ For compatibility @command{split} also supports an obsolete option syntax @optio...@var{lines}}. New scripts should use @option{-l @var{lines}} instead. +...@item -l [...@var{k}]/@var{chunks} +...@item --line...@var{k}]/@var{chunks} +If @var{k} is zero or omitted, divide @var{input} into @var{chunks} +roughly equal-sized line delineated chunks. + +If @var{k} is present and nonzero, print @var{k}th of such chunks. + @item -b @var{size} @itemx --byt...@var{size} @opindex -b @@ -2966,6 +2972,13 @@ option syntax @optio...@var{lines}}. 
New scripts should use @option{-l Put @var{size} bytes of @var{input} into each output file. @multiplierSuffixes{size} +...@item -b [...@var{k}]/@var{chunks} +...@itemx --byte...@var{k}]/@var{chunks} +If @var{k} is zero or omitted, divide @var{input} into @var{chunks} +equal-sized chunks. + +If @var{k} is present and nonzero, print @var{k}th of such chunks. + @item -C @var{size} @itemx --line-byt...@var{size} @opindex -C @@ -2975,6 +2988,30 @@ possible without exceeding @var{size} bytes. Individual lines longer than @var{size} bytes are broken into multiple files. @var{size} has the same format as for the @option{--bytes} option. +...@item -n [...@var{k}]/]...@var{chunks} +...@itemx --number [...@var{k}]/]...@var{chunks} +...@opindex -n +...@opindex --number +Same as @option{--byte...@var{k}]/@var{chunks}}, for BSD compatibility. + +...@item -r [...@var{k}]/]...@var{chunks} +...@itemx --round-robin [...@var{k}]/]...@var{chunks} +...@opindex -r +...@opindex --round-robin +If @var{k} is zero or omitted, distribute @var{input} lines round-robin +style into @var{chunks} output files. + +If @var{k} is present and nonzero, print @var{k}th of such chunks. + +...@item -t @var{char} +...@itemx --term @var{char} +...@opindex -t +...@opindex --term +Set @var{char} as the end of line character. Supports C escape sequences. +Using this option with @option{-b @var{size}} is equivalent to +...@option{-c @var{size}}, and with @option{-b [...@var{k}]/@var{chunks}} is +equivalent to @option{-l [...@var{k}]/@var{chunks}}. + @item -a @var{length} @itemx --suffix-leng...@var{length} @opindex -a diff --git a/src/split.c b/src/split.c index 5bd9ebb..b1272c4 100644 --- a/src/split.c +++ b/src/split.c @@ -17,8 +17,7 @@ /* By t...@sics.se, with rms. To do: - * Implement -t CHAR or -t REGEX to specify break characters other - than newline. */ + * Extend -t CHAR to -t REGEX */ #include <config.h> @@ -72,6 +71,9 @@ static int output_desc; output file is opened. 
*/ static bool verbose; +/* End of line character */ +static char eol; + /* For long options that have no equivalent short option, use a non-character as a pseudo short option, starting with CHAR_MAX + 1. */ enum @@ -84,8 +86,11 @@ static struct option const longopts[] = {"bytes", required_argument, NULL, 'b'}, {"lines", required_argument, NULL, 'l'}, {"line-bytes", required_argument, NULL, 'C'}, + {"number", required_argument, NULL, 'n'}, + {"round-robin", required_argument, NULL, 'r'}, {"suffix-length", required_argument, NULL, 'a'}, {"numeric-suffixes", no_argument, NULL, 'd'}, + {"term", required_argument, NULL, 't'}, {"verbose", no_argument, NULL, VERBOSE_OPTION}, {GETOPT_HELP_OPTION_DECL}, {GETOPT_VERSION_OPTION_DECL}, @@ -116,9 +121,23 @@ Mandatory arguments to long options are mandatory for short options too.\n\ fprintf (stdout, _("\ -a, --suffix-length=N use suffixes of length N (default %d)\n\ -b, --bytes=SIZE put SIZE bytes per output file\n\ + -b, --bytes=/N generate N output files\n\ + -b, --bytes=K/N print Kth of N chunks of file\n\ -C, --line-bytes=SIZE put at most SIZE bytes of lines per output file\n\ -d, --numeric-suffixes use numeric suffixes instead of alphabetic\n\ -l, --lines=NUMBER put NUMBER lines per output file\n\ + -l, --lines=/N generate N eol delineated output files\n\ + -l, --lines=K/N print Kth of N eol delineated chunks\n\ + -n, --number=N same as --bytes=/N\n\ + -n, --number=K/N same as --bytes=K/N\n\ + -r, --round-robin=N generate N eol delineated output files using\n\ + round-robin style distribution.\n\ + -r. --round-robin=K/N print Kth of N eol delineated chunk as -rN would\n\ + have generated.\n\ + -t, --term=CHAR specify CHAR as eol. This will also convert\n\ + -b to its line delineated equivalent (-C if\n\ + splitting normally, -l if splitting by\n\ + chunks). 
C escape sequences are accepted.\n\ "), DEFAULT_SUFFIX_LENGTH); fputs (_("\ --verbose print a diagnostic just before each\n\ @@ -218,13 +237,14 @@ cwrite (bool new_file_flag, const char *bp, size_t bytes) Use buffer BUF, whose size is BUFSIZE. */ static void -bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize) +bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize, uintmax_t max_files) { size_t n_read; bool new_file_flag = true; size_t to_read; uintmax_t to_write = n_bytes; char *bp_out; + uintmax_t opened = 1; do { @@ -251,7 +271,8 @@ bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize) cwrite (new_file_flag, bp_out, w); bp_out += w; to_read -= w; - new_file_flag = true; + new_file_flag = (opened++ < max_files || !max_files)? + true : false; to_write = n_bytes; } } @@ -277,10 +298,10 @@ lines_split (uintmax_t n_lines, char *buf, size_t bufsize) error (EXIT_FAILURE, errno, "%s", infile); bp = bp_out = buf; eob = bp + n_read; - *eob = '\n'; + *eob = eol; for (;;) { - bp = memchr (bp, '\n', eob - bp + 1); + bp = memchr (bp, eol, eob - bp + 1); if (bp == eob) { if (eob != bp_out) /* do not write 0 bytes! */ @@ -340,7 +361,7 @@ line_bytes_split (size_t n_bytes) bp = buf + n_buffered; if (n_buffered == n_bytes) { - while (bp > buf && bp[-1] != '\n') + while (bp > buf && bp[-1] != eol) bp--; } @@ -362,6 +383,328 @@ line_bytes_split (size_t n_bytes) free (buf); } +/* Split into NUMBER eol chunks. */ + +static void +lines_chunk_split (size_t number, char *buf, size_t bufsize, size_t file_size) +{ + size_t n_read; + size_t chunk_no = 1; + off_t chunk_end = file_size / number - 1; + off_t offset = 0; + bool new_file_flag = true; + char *bp, *bp_out, *eob; + + while (offset < file_size) + { + n_read = full_read (STDIN_FILENO, buf, bufsize); + if (n_read == SAFE_READ_ERROR) + error (EXIT_FAILURE, errno, "%s", infile); + bp = buf; + eob = buf + n_read; + + while (1) + { + /* Begin lookng for eol at last byte of chunk. */ + bp_out = (offset < chunk_end)? 
bp + chunk_end - offset : bp; + if (bp_out > eob) + bp_out = eob; + bp_out = memchr (bp_out, eol, eob - bp_out); + if (!bp_out) + { + /* Buffer exhausted. */ + cwrite (new_file_flag, bp, eob - bp); + new_file_flag = false; + offset += eob - bp; + break; + } + else + bp_out++; + + cwrite (new_file_flag, bp, bp_out - bp); + chunk_end = (++chunk_no < number)? + chunk_end + file_size / number : file_size; + new_file_flag = true; + offset += bp_out - bp; + bp = bp_out; + /* A line could have been so long that it skipped + entire chunks. */ + while (chunk_end < offset) + { + chunk_end += file_size / number; + chunk_no++; + /* Create blank file: this ensures NUMBER files are + created. */ + cwrite (true, bp, 0); + } + } + } +} + +/* Extract Nth of TOTAL chunks. */ + +static void +bytes_chunk_extract (size_t n, size_t total, char *buf, size_t bufsize, + size_t file_size) +{ + off_t start = (n == 0)? 0 : (n - 1) * (file_size / total); + off_t end = (n == total)? file_size : n * (file_size / total); + ssize_t n_read; + size_t n_write; + + while (1) + { + n_read = pread (STDIN_FILENO, buf, bufsize, start); + if (n_read < 0) + error (EXIT_FAILURE, errno, "%s", infile); + n_write = (start + n_read <= end)? n_read : end - start; + if (full_write (STDOUT_FILENO, buf, n_write) != n_write) + error (EXIT_FAILURE, errno, "output error"); + start += n_read; + if (end <= start) + return; + } +} + +/* Extract lines whose first byte is in the Nth of TOTAL chunks. */ + +static void +lines_chunk_extract (size_t n, size_t total, char* buf, size_t bufsize, + size_t file_size) +{ + ssize_t n_read; + bool end_of_chunk = false; + bool skip = true; + char *bp = buf, *bp_out = buf, *eob; + off_t start; + off_t end; + + /* For n != 1, start reading 1 byte before nth chunk of file. This is to + detect if the first byte of chunk is the first byte of a line. */ + if (n == 1) + { + start = 0; + skip = false; + } + else + start = (n - 1) * (file_size / total) - 1; + end = (n == total)? 
file_size - 1 : n * (file_size / total) - 1; + + do + { + n_read = pread (STDIN_FILENO, buf, bufsize, start); + if (n_read < 0) + error (EXIT_FAILURE, errno, "%s", infile); + bp = buf; + bp_out = buf + n_read; + eob = bp_out; + + /* Find starting point. */ + if (skip) + { + bp = memchr (buf, eol, n_read); + if (bp && bp - buf < end - start) + { + bp++; + skip = false; + } + else if (!bp && start + n_read < end) + { + start += n_read; + continue; + } + else + return; + } + + /* Find ending point. */ + if (end < start + n_read && end == file_size - 1) + end_of_chunk = true; + else if (start + n_read >= end) + { + bp_out = (buf + end - start < buf)? buf : buf + end - start; + bp_out = memchr (bp_out, eol, eob - bp_out); + if (bp_out) + { + bp_out++; + end_of_chunk = true; + } + else + bp_out = eob; + } + + if (write (STDOUT_FILENO, bp, bp_out - bp) != bp_out - bp) + error (EXIT_FAILURE, errno, "output error"); + start += n_read; + } + while (!end_of_chunk); +} + + + +typedef struct of_info +{ + char *of_name; + int ofd; +} of_t; + +/* Rotates file descriptors when we're writing to more output files than we + have available file descriptors. */ + +static void +ofd_check (of_t *ofiles, size_t i, size_t n) +{ + if (0 < ofiles[i].ofd) + return; + else + { + int fd; + int j = i - 1; + + /* Another process could have opened a file in between the calls to + close and open, so we should keep trying until open succeeds or + we've closed all of our files. */ + while (1) + { + /* Attempt to open file. */ + fd = open (ofiles[i].of_name, + O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, + (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP + | S_IROTH | S_IWOTH)); + if (-1 < fd) + break; + /* Find an open file to close. */ + while (ofiles[j].ofd < 0) + { + if (--j == 0) + j = n - 1; + /* No more open files to close, exit with failure. 
*/ + if (j == i) + error (EXIT_FAILURE, 0, "%s", ofiles[i].of_name); + } + close (ofiles[j].ofd); + } + ofiles[i].ofd = fd; + } +} + +/* Divide file into N chunks in round robin fashion. */ + +static void +lines_rr (size_t n, char *buf, size_t bufsize) +{ + of_t *ofiles = xnmalloc (n, sizeof *ofiles); + char *bp, *bp_out, *eob; + size_t n_read; + bool eof = false; + size_t i; + bool inc; + + /* Generate output file names. */ + for (i = 0; i < n; i++) + { + next_file_name (); + ofiles[i].of_name = xmalloc (strlen (outfile) + 1); + strcpy (ofiles[i].of_name, outfile); + ofiles[i].ofd = -1; + } + i = 0; + + do + { + n_read = full_read (STDIN_FILENO, buf, bufsize); + if (n_read == SAFE_READ_ERROR) + error (EXIT_FAILURE, errno, "%s", infile); + if (n_read < bufsize) + { + if (n_read == 0) + break; + eof = true; + } + bp = buf; + eob = buf + n_read; + + + while (bp != eob) + { + /* Find end of line. */ + bp_out = memchr (bp, eol, eob - bp); + if (bp_out) + { + bp_out++; + inc = true; + } + else + bp_out = eob; + + /* Secure file descriptor. */ + ofd_check (ofiles, i, n); + + if (full_write (ofiles[i].ofd, bp, bp_out - bp) != bp_out - bp) + error (EXIT_FAILURE, errno, "%s", ofiles[i].of_name); + if (inc && ++i == n) + i = 0; + bp = bp_out; + inc = false; + } + } + while (!eof); + + /* Close any open file descriptors. */ + for (i = 0; i < n; i++) + if (-1 < ofiles[i].ofd) + close (ofiles[i].ofd); +} + +/* Extract Nth of TOT eol delineated, round robin distributed chunks. */ + +static void +lines_rr_extract (uintmax_t n, uintmax_t tot, char *buf, size_t bufsize) +{ + int line_no = 1; + char *bp, *bp_out, *eob; + size_t n_read; + bool eof = false; + bool inc = false; + + do + { + n_read = full_read (STDIN_FILENO, buf, bufsize); + if (n_read == SAFE_READ_ERROR) + error (EXIT_FAILURE, errno, "%s", infile); + if (n_read != bufsize) + { + if (n_read == 0) + break; + eof = true; + } + bp = buf; + eob = buf + n_read; + + while (bp != eob) + { + /* Find end of line. 
*/ + bp_out = memchr (bp, eol, eob - bp); + if (bp_out) + { + bp_out++; + inc = true; + } + else + bp_out = eob; + + if (line_no == n + && full_write (STDOUT_FILENO, bp, bp_out - bp) != bp_out - bp) + error (EXIT_FAILURE, errno, "output error"); + if (inc) + line_no = (line_no == tot)? 1 : line_no + 1; + bp = bp_out; + inc = false; + } + } + while (!eof); +} + #define FAIL_ONLY_ONE_WAY() \ do \ { \ @@ -370,21 +713,159 @@ line_bytes_split (size_t n_bytes) } \ while (0) +/* Parse K/N syntax of chunk options. */ + +static void +chunk_parse (uintmax_t *m_units, uintmax_t *n_units, char *slash) +{ + *slash = '\0'; + if (slash != optarg + && xstrtoumax (optarg, NULL, 10, m_units, "") != LONGINT_OK + || SIZE_MAX < *m_units) + { + error (0, 0, _("%s: invalid chunk number"), optarg); + usage (EXIT_FAILURE); + } + if (xstrtoumax (++slash, NULL, 10, n_units, "") != LONGINT_OK + || *n_units == 0 || *n_units < *m_units || SIZE_MAX < *n_units) + { + error (0, 0, _("%s: invalid number of total chunks"), slash); + usage (EXIT_FAILURE); + } +} + +/* Parse eol character for -t option. 
*/ + +static void +eol_parse () +{ + if (*optarg == '\\') + switch (*(optarg+1)) + { + case 'a': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\a'; + break; + + case 'b': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\b'; + break; + + case 'f': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\f'; + break; + + case 'n': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\n'; + break; + + case 'r': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\r'; + break; + + case 't': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\t'; + break; + + case 'v': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\v'; + break; + + case '\'': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\''; + break; + + case '\"': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\"'; + break; + + case '\\': + if (*(optarg + 2) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg); + eol = '\\'; + break; + + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + { + char *term; + long int tmp; + if (xstrtol (optarg + 1, &term, 8, &tmp, "") != LONGINT_OK + || tmp < 0 || 255 < tmp ||4 + optarg < term || *term != 0) + error (EXIT_FAILURE, 0, _("%s: invalid octal esacpe sequence"), + optarg); + eol = (char) tmp; + break; + } + + case 'x': + { + char *term; + long int tmp; + if (xstrtol (optarg + 2, &term, 16, &tmp, "") != LONGINT_OK + || tmp < 0 || 255 < tmp || 4 + optarg < term || *term != 0) + error (EXIT_FAILURE, 0, _("%s: invalid hex escape 
sequence"), + optarg); + eol = (char) tmp; + break; + } + + default: + error (0, 0, _("%s: invalid escape sequence"), optarg); + usage (EXIT_FAILURE); + } + else + { + if (*(optarg + 1) != 0) + error (EXIT_FAILURE, 0, _("%s: invalid eol character"), optarg); + eol = *optarg; + } +} + + int main (int argc, char **argv) { struct stat stat_buf; enum { - type_undef, type_bytes, type_byteslines, type_lines, type_digits + type_undef, type_bytes, type_byteslines, type_lines, type_digits, + type_chunk_bytes, type_chunk_eol, type_rr } split_type = type_undef; size_t in_blk_size; /* optimal block size of input file device */ char *buf; /* file i/o buffer */ size_t page_size = getpagesize (); + uintmax_t m_units = 0; uintmax_t n_units; static char const multipliers[] = "bEGKkMmPTYZ0"; int c; int digits_optind = 0; + size_t file_size; + char *slash; + bool eol_char = false; initialize_main (&argc, &argv); set_program_name (argv[0]); @@ -404,7 +885,7 @@ main (int argc, char **argv) /* This is the argv-index of the option we will read next. */ int this_optind = optind ? 
optind : 1; - c = getopt_long (argc, argv, "0123456789C:a:b:dl:", longopts, NULL); + c = getopt_long (argc, argv, "0123456789C:a:b:c:dl:n:r:t:", longopts, NULL); if (c == -1) break; @@ -426,6 +907,13 @@ main (int argc, char **argv) case 'b': if (split_type != type_undef) FAIL_ONLY_ONE_WAY (); + slash = strchr (optarg, '/'); + if (slash) + { + split_type = type_chunk_bytes; + chunk_parse (&m_units, &n_units, slash); + break; + } split_type = type_bytes; if (xstrtoumax (optarg, NULL, 10, &n_units, multipliers) != LONGINT_OK || n_units == 0) @@ -438,6 +926,13 @@ main (int argc, char **argv) case 'l': if (split_type != type_undef) FAIL_ONLY_ONE_WAY (); + slash = strchr (optarg, '/'); + if (slash) + { + split_type = type_chunk_eol; + chunk_parse (&m_units, &n_units, slash); + break; + } split_type = type_lines; if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK || n_units == 0) @@ -459,6 +954,42 @@ main (int argc, char **argv) } break; + case 'n': + if (split_type != type_undef) + FAIL_ONLY_ONE_WAY (); + split_type = type_chunk_bytes; + slash = strchr (optarg, '/'); + if (slash) + { + chunk_parse (&m_units, &n_units, slash); + break; + } + if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK + || n_units == 0 || SIZE_MAX < n_units) + { + error (0, 0, _("%s: invalid number of chunks"), optarg); + usage (EXIT_FAILURE); + } + break; + + case 'r': + if (split_type != type_undef) + FAIL_ONLY_ONE_WAY (); + split_type = type_rr; + slash = strchr (optarg, '/'); + if (slash) + { + chunk_parse (&m_units, &n_units, slash); + break; + } + if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK + || n_units == 0 || SIZE_MAX < n_units) + { + error (0, 0, _("%s: invalid number of chunks"), optarg); + usage (EXIT_FAILURE); + } + break; + case '0': case '1': case '2': @@ -492,6 +1023,11 @@ main (int argc, char **argv) suffix_alphabet = "0123456789"; break; + case 't': + eol_parse (); + eol_char = true; + break; + case VERBOSE_OPTION: verbose = true; break; @@ 
-505,6 +1041,17 @@ main (int argc, char **argv) } } + /* Default eol to \n if none specified. */ + if (!eol_char) + eol = '\n'; + else + { + if (split_type == type_chunk_bytes) + split_type = type_chunk_eol; + if (split_type == type_bytes) + split_type = type_byteslines; + } + /* Handle default case. */ if (split_type == type_undef) { @@ -546,10 +1093,15 @@ main (int argc, char **argv) output_desc = -1; /* Get the optimal block size of input device and make a buffer. */ - if (fstat (STDIN_FILENO, &stat_buf) != 0) error (EXIT_FAILURE, errno, "%s", infile); in_blk_size = io_blksize (stat_buf); + file_size = stat_buf.st_size; + + if (split_type == type_chunk_bytes || split_type == type_chunk_eol + || split_type == type_rr) + if (file_size < n_units) + error (EXIT_FAILURE, errno, "number of chunks exceed file size"); buf = ptr_align (xmalloc (in_blk_size + 1 + page_size - 1), page_size); @@ -561,13 +1113,34 @@ main (int argc, char **argv) break; case type_bytes: - bytes_split (n_units, buf, in_blk_size); + bytes_split (n_units, buf, in_blk_size, 0); break; case type_byteslines: line_bytes_split (n_units); break; + case type_chunk_bytes: + if (m_units == 0) + bytes_split (file_size / n_units, buf, in_blk_size, n_units); + else + bytes_chunk_extract (m_units, n_units, buf, in_blk_size, file_size); + break; + + case type_chunk_eol: + if (m_units == 0) + lines_chunk_split (n_units, buf, in_blk_size, file_size); + else + lines_chunk_extract (m_units, n_units, buf, in_blk_size, file_size); + break; + + case type_rr: + if (m_units == 0) + lines_rr (n_units, buf, in_blk_size); + else + lines_rr_extract (m_units, n_units, buf, in_blk_size); + break; + default: abort (); } diff --git a/tests/Makefile.am b/tests/Makefile.am index 85503cc..89d2e40 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -228,8 +228,12 @@ TESTS = \ misc/sort-rand \ misc/sort-version \ misc/split-a \ + misc/split-bchunk \ misc/split-fail \ misc/split-l \ + misc/split-lchunk \ + misc/split-rchunk \ 
+  misc/split-t \
   misc/stat-fmt \
   misc/stat-hyphen \
   misc/stat-printf \
diff --git a/tests/misc/split-bchunk b/tests/misc/split-bchunk
new file mode 100755
index 0000000..15c0d64
--- /dev/null
+++ b/tests/misc/split-bchunk
@@ -0,0 +1,46 @@
+#!/bin/sh
+# show that splitting into 3 byte delineated chunks works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  split --version
+fi
+. $srcdir/test-lib.sh
+
+printf '1\n2\n3\n4\n5\n' > in || framework_failure
+
+split --bytes=/3 in > out || fail=1
+split --bytes=1/3 in > b1 || fail=1
+split --bytes=2/3 in > b2 || fail=1
+split --bytes=3/3 in > b3 || fail=1
+echo -n -e 1'\n'2 > exp-1
+echo -e '\n'3 > exp-2
+echo -e 4'\n'5 > exp-3
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+compare b1 exp-1 || fail=1
+compare b2 exp-2 || fail=1
+compare b3 exp-3 || fail=1
+test -f xad && fail=1
+
+# Splitting into more chunks than file size should fail.
+split --bytes=/20 in 2> /dev/null && fail=1
+
+Exit $fail
diff --git a/tests/misc/split-fail b/tests/misc/split-fail
index e36c86d..4a0c9c3 100755
--- a/tests/misc/split-fail
+++ b/tests/misc/split-fail
@@ -29,8 +29,11 @@ touch in || framework_failure
 
 split -a 0 in 2> /dev/null || fail=1
 split -b 0 in 2> /dev/null && fail=1
+split -b /0 in 2> /dev/null && fail=1
 split -C 0 in 2> /dev/null && fail=1
 split -l 0 in 2> /dev/null && fail=1
+split -l /0 in 2> /dev/null && fail=1
+split -t in 2> /dev/null && fail=1
 
 # Make sure -C doesn't create empty files.
 rm -f x?? || fail=1
@@ -64,5 +67,10 @@ split: line count option -99*... is too large
 EOF
 compare out exp || fail=1
 
+# Make sure invalid -t characters are not accepted.
+split -tab in 2> /dev/null && fail=1;
+split -t\\nb in 2> /dev/null && fail=1;
+split -t\\8 in 2> /dev/null && fail=1;
+split -t\\x1FF 2> /dev/null && fail=1;
 
 Exit $fail
diff --git a/tests/misc/split-l b/tests/misc/split-l
index fb07a27..850d5b5 100755
--- a/tests/misc/split-l
+++ b/tests/misc/split-l
@@ -18,7 +18,7 @@
 
 if test "$VERBOSE" = yes; then
   set -x
-  ln --version
+  split --version
 fi
 
 . $srcdir/test-lib.sh
diff --git a/tests/misc/split-lchunk b/tests/misc/split-lchunk
new file mode 100755
index 0000000..cb71939
--- /dev/null
+++ b/tests/misc/split-lchunk
@@ -0,0 +1,56 @@
+#!/bin/sh
+# show that splitting into 3 newline delineated chunks works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  split --version
+fi
+
+. $srcdir/test-lib.sh
+
+printf '1\n2\n3\n4\n5\n' > in || framework_failure
+
+split --lines=/3 in > out || fail=1
+split --lines=1/3 in > l1 || fail=1
+split --lines=2/3 in > l2 || fail=1
+split --lines=3/3 in > l3 || fail=1
+
+cat <<\EOF > exp-1
+1
+2
+EOF
+cat <<\EOF > exp-2
+3
+EOF
+cat <<\EOF > exp-3
+4
+5
+EOF
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+compare l1 exp-1 || fail=1
+compare l2 exp-2 || fail=1
+compare l3 exp-3 || fail=1
+test -f xad && fail=1
+
+# Splitting into more chunks than file size should fail.
+split --bytes=/20 in 2> /dev/null && fail=1
+
+Exit $fail
diff --git a/tests/misc/split-rchunk b/tests/misc/split-rchunk
new file mode 100755
index 0000000..080e6a2
--- /dev/null
+++ b/tests/misc/split-rchunk
@@ -0,0 +1,56 @@
+#!/bin/sh
+# show that splitting into 3 round-robin chunks works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  split --version
+fi
+
+. $srcdir/test-lib.sh
+
+printf '1\n2\n3\n4\n5\n' > in || framework_failure
+
+split --round-robin=/3 in > out || fail=1
+split --round-robin=1/3 in > r1 || fail=1
+split --round-robin=2/3 in > r2 || fail=1
+split --round-robin=3/3 in > r3 || fail=1
+
+cat <<\EOF > exp-1
+1
+4
+EOF
+cat <<\EOF > exp-2
+2
+5
+EOF
+cat <<\EOF > exp-3
+3
+EOF
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+compare r1 exp-1 || fail=1
+compare r2 exp-2 || fail=1
+compare r3 exp-3 || fail=1
+test -f xad && fail=1
+
+# Splitting into more chunks than file size should fail.
+split --bytes=/20 in 2> /dev/null && fail=1
+
+Exit $fail
diff --git a/tests/misc/split-t b/tests/misc/split-t
new file mode 100755
index 0000000..4fba0f2
--- /dev/null
+++ b/tests/misc/split-t
@@ -0,0 +1,39 @@
+#!/bin/sh
+# show that splitting with '\0' as the eol char works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  split --version
+fi
+
+. $srcdir/test-lib.sh
+
+echo -n -e a'\0'b'\0'c'\0'd'\0'e'\0' > in || framework_failure
+
+split -l 2 -t \\0 in > out || fail=1
+
+echo -n -e a'\0'b'\0' > exp-1
+echo -n -e c'\0'd'\0' > exp-2
+echo -n -e e'\0' > exp-3
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+test -f xad && fail=1
+
+Exit $fail
-- 
1.6.3.3
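
As a reading aid for the round-robin extraction (-r K/N), the selection logic can be sketched on an in-memory buffer as below. This is a simplified illustration and not the code in the patch: rr_extract is a hypothetical name, and unlike lines_rr_extract it assumes the whole input is already in memory and that OUT has room for IN_LEN bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: copy into OUT every line
   of IN that round-robin distribution over N chunks would assign to
   chunk K (K counts from 1).  Lines end with EOL.  Returns the number
   of bytes copied; OUT must hold at least IN_LEN bytes.  */
size_t
rr_extract (const char *in, size_t in_len, size_t k, size_t n,
            char eol, char *out)
{
  size_t line_no = 1;           /* chunk that receives the current line */
  size_t out_len = 0;
  const char *bp = in;
  const char *eob = in + in_len;

  assert (0 < n && 0 < k && k <= n);

  while (bp != eob)
    {
      /* Find end of line; an unterminated last line runs to EOB.  */
      const char *nl = memchr (bp, eol, (size_t) (eob - bp));
      const char *line_end = nl ? nl + 1 : eob;
      size_t len = (size_t) (line_end - bp);

      if (line_no == k)
        {
          memcpy (out + out_len, bp, len);
          out_len += len;
        }
      /* Advance to the next chunk, wrapping after chunk N.  */
      line_no = (line_no == n) ? 1 : line_no + 1;
      bp = line_end;
    }
  return out_len;
}
```

For the five-line input used in tests/misc/split-rchunk and N = 3, chunk 1 receives lines 1 and 4, chunk 2 receives lines 2 and 5, and chunk 3 receives line 3, which is what that test expects.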