On 07/05/2013 10:43 PM, Assaf Gordon wrote:
>
> On 07/05/2013 12:12 PM, Pádraig Brady wrote:
>> On 07/05/2013 07:04 PM, Assaf Gordon wrote:
>>> Hello,
> Regarding old discussion here:
> http://lists.gnu.org/archive/html/coreutils/2011-02/msg00030.html
>
> Attached is a patch which adds a "--repetition" option to shuf, enabling
> random number generation with repetitions.
>
I like this.
--repetition seems to be a very good interface too,
since it aligns with standard math nomenclature in regard to permutations.
I'd prefer to generalize it though, to support stdin as well as -i.
>>>
>>> Attached is an updated patch, supporting "--repetitions" with STDIN/FILE/-e
>>> (using the naive implementation ATM).
>>> e.g.
>>> $ shuf --repetitions --head-count=100 --echo Head Tail
>>> or
>>> $ shuf -r -n100 -e Head Tail
>>
>> Excellent thanks.
>>
>>> But the code is getting a bit messy, I guess from evolving features over
>>> time.
>>> I'd like to re-organize it a bit, re-factor some functions and make the
>>> code clearer - what do you think?
>>> it will make the code slightly more verbose (and slightly bigger), but
>>> shouldn't change the running performance.
>>
>> If you're getting your head around the code enough to refactor,
>> then it would be great if you could handle the TODO: item in shuf.c
>
> Attached is an updated patch, with some code cleanups (not including said
> TODO item yet).
>
> -gordon
I've split this into two patches:
1. Unrelated test improvements.
2. All the rest
Note in both patches I made adjustments to the tests like
-c=$(cat exp | wc -l) || framework_failure_
+c=$(wc -l < exp) || framework_failure_
-c=$(cat exp | sort -nu | fmt ) || framework_failure_
+c=$(sort -nu exp | paste -s -d ' ') || framework_failure_
I.e., avoid cat unless needed, and use paste rather than fmt, since paste is more general in this usage.
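To illustrate the two adjustments, here is a small sketch, assuming GNU coreutils and a made-up sample file (the 80-line `exp` below is hypothetical, not the file the test suite produces). It compares the old and new forms side by side; the trailing `-` on paste just names stdin explicitly.

```shell
# Work in a scratch directory so nothing in the current tree is touched.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1

# Hypothetical sample data: the numbers 1..40, each listed twice.
{ seq 1 40; seq 1 40; } > exp

# Line count: redirecting stdin saves the extra cat process, and wc
# prints no file name because it was never given one.
count_cat=$(cat exp | wc -l)
count_redirect=$(wc -l < exp)

# Joining: fmt re-wraps at its default width, so a long list spills
# onto several lines; paste -s always emits a single line.
fmt_lines=$(sort -nu exp | fmt | wc -l)
paste_lines=$(sort -nu exp | paste -s -d ' ' - | wc -l)

echo "cat|wc: $count_cat  wc<: $count_redirect"
echo "fmt lines: $fmt_lines  paste lines: $paste_lines"
```

With short inputs both joining forms give the same single line, so the difference only shows up once the joined list is wider than fmt's default line width.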
Also I simplified the --help a little like:
- -r, --repetitions output COUNT values, with repetitions.\n\
-with -iLO-HI, output random numbers.\n\
-with -e, stdin or FILE, output random lines.\n\
-count defaults to 1 if -n COUNT is not used.\n\
+ -r, --repetitions output COUNT items, allowing repetition.\n\
+ -n 1 is implied if not specified.\n\
I'll push the 2 attached patches soon.
thanks!
Pádraig.
From f20a3407a8ae8488b2e7434f75738b219a2320be Mon Sep 17 00:00:00 2001
From: Assaf Gordon
Date: Fri, 5 Jul 2013 14:59:44 -0600
Subject: [PATCH 1/2] tests: add more tests for shuf option combinations
* tests/misc/shuf.sh: Add tests for erroneous conditions
like multiple '-o' and '--random-source'.
---
tests/misc/shuf.sh | 29 +
1 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/tests/misc/shuf.sh b/tests/misc/shuf.sh
index 3e33b61..492fd41 100755
--- a/tests/misc/shuf.sh
+++ b/tests/misc/shuf.sh
@@ -65,4 +65,33 @@ if ! test -r unreadable; then
shuf -n1 unreadable && fail=1
fi
+# Multiple -n is accepted, should use the smallest value
+shuf -n10 -i0-9 -n3 -n20 > exp || framework_failure_
+c=$(wc -l < exp) || framework_failure_
+test "$c" -eq 3 || { fail=1; echo "Multiple -n failed">&2 ; }
+
+# Test error conditions
+
+# -i and -e must not be used together
+: | shuf -i0-9 -e A B &&
+ { fail=1; echo "shuf did not detect erroneous -e and -i usage.">&2 ; }
+# Test invalid value for -n
+: | shuf -nA &&
+ { fail=1; echo "shuf did not detect erroneous -n usage.">&2 ; }
+# Test multiple -i
+shuf -i0-9 -n10 -i8-90 &&
+ { fail=1; echo "shuf did not detect multiple -i usage.">&2 ; }
+# Test invalid range
+for ARG in '1' 'A' '1-' '1-A'; do
+ shuf -i$ARG &&
+ { fail=1; echo "shuf did not detect erroneous -i$ARG usage.">&2 ; }
+done
+
+# multiple -o are forbidden
+shuf -i0-9 -o A -o B &&
+ { fail=1; echo "shuf did not detect erroneous multiple -o usage.">&2 ; }
+# multiple random-sources are forbidden
+shuf -i0-9 --random-source A --random-source B &&
+ { fail=1; echo "shuf did not detect multiple --random-source usage.">&2 ; }
+
Exit $fail
--
1.7.7.6
From 349eda8cb0765621979d8fd8b58c21e9c5d49073 Mon Sep 17 00:00:00 2001
From: Assaf Gordon
Date: Thu, 4 Jul 2013 13:26:45 -0600
Subject: [PATCH 2/2] shuf: add --repetitions to support repetition in output
* src/shuf.c (main): Process new option. Replace input_numbers_option_used()
with a local variable. Re-organize argument processing.
(usage): Describe the new option.
(write_random_numbers): A new function to generate a
permutation of the specified input range with repetition.
(write_random_lines): Likewise for stdin and --echo.
(write_permuted_numbers): New function refactored from
write_permuted_output().
(write_permuted_lines): Likewise.
* tests/misc/shuf.sh: Add tests for --repetitions option.
* doc/coreutils.texi: Mention --repetitions, add examples.
* TODO: Mention an optimization to avoid needing to
read all of the inpu