On Tue, Jun 08, 2021 at 01:45:00AM -0400, Nathan Hartman wrote:
> In order to do some testing, I needed some test data that reproduces
> the issue; since stsp can't share the customer's 100MB XML file, and
> we'd probably want other inputs or sizes anyway, I wrote a program
> that attempts to generate such a thing. I'm attaching that program...
> 
> To build, rename to .c extension and, e.g.,
> $ gcc gen_diff_test_data.c -o gen_diff_test_data
> 
> To run it, provide two parameters:
> 
> The first is a 'seed' value like you'd provide to a pseudo random
> number generator at init time.
> 
> The second is a 'length' parameter that says how long (approximately)
> you want the output data to be. (The program nearly always overshoots
> this by a small amount.)
> 
> Rather than using the system's pseudo random number generator, this
> program includes its own implementation to ensure that users on any
> system can get the same results when using the same parameters. So if
> different people want to test with the same sets of input, you only
> have to share 2 numbers, rather than send each other files >100MB of
> useless junk.
> 
> Example: Generate two files of approx 100 MB, containing lots of
> differences and diff them:
> 
> $ gen_diff_test_data 98 100m > one.txt
> $ gen_diff_test_data 99 100m > two.txt
> $ time diff one.txt two.txt > /dev/null
> 
> With the above parameters, it takes my system's diff about 50 seconds
> to come up with something that looks reasonable at a glance; svn's
> diff has been crunching away for a while now...
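
The reproducibility idea described above (a bundled generator instead of
the system's rand(), so the same seed and length give the same bytes on
every platform) could be sketched roughly as follows. This is not the
attached program: the xorshift32 generator, the names prng_next and
emit_data, and the line format are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* xorshift32 (Marsaglia): an assumed stand-in for the tool's own
   generator. Fixed-width arithmetic keeps results identical across
   systems. The state must be nonzero. */
static uint32_t prng_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: write pseudo-random lowercase lines to `out`
   until at least `length` bytes have been written. Returns the byte
   count, which overshoots `length` by less than one line. */
static size_t emit_data(uint32_t seed, size_t length, FILE *out)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift state must not be 0 */
    size_t written = 0;

    assert(out != NULL);
    while (written < length) {
        size_t n = 1 + prng_next(&state) % 79;  /* 1..79 chars per line */
        for (size_t i = 0; i < n; i++)
            fputc('a' + (int)(prng_next(&state) % 26), out);
        fputc('\n', out);
        written += n + 1;  /* line body plus newline */
    }
    return written;
}
```

A main() that parses the two command-line parameters and calls
emit_data(seed, length, stdout) would complete the sketch. Two people
running it with the same two numbers would get byte-identical output.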

Thank you Nathan, this is incredibly useful!

Would you consider committing this tool to our repository, e.g. somewhere
within the tools/dev/ subtree?

Thanks,
Stefan
