On Tue, Jun 08, 2021 at 01:45:00AM -0400, Nathan Hartman wrote:
> In order to do some testing, I needed some test data that reproduces
> the issue; since stsp can't share the customer's 100MB XML file, and
> we'd probably want other inputs or sizes anyway, I wrote a program
> that attempts to generate such a thing. I'm attaching that program...
>
> To build, rename to .c extension and, e.g.,
> $ gcc gen_diff_test_data.c -o gen_diff_test_data
>
> To run it, provide two parameters:
>
> The first is a 'seed' value like you'd provide to a pseudo random
> number generator at init time.
>
> The second is a 'length' parameter that says how long (approximately)
> you want the output data to be. (The program nearly always overshoots
> this by a small amount.)
>
> Rather than using the system's pseudo random number generator, this
> program includes its own implementation to ensure that users on any
> system can get the same results when using the same parameters. So if
> different people want to test with the same sets of input, you only
> have to share 2 numbers, rather than send each other files >100MB of
> useless junk.
>
> Example: Generate two files of approx 100 MB, containing lots of
> differences and diff them:
>
> $ gen_diff_test_data 98 100m > one.txt
> $ gen_diff_test_data 99 100m > two.txt
> $ time diff one.txt two.txt > /dev/null
>
> With the above parameters, it takes my system's diff about 50 seconds
> to come up with something that looks reasonable at a glance; svn's
> diff has been crunching away for a while now...
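
For anyone reading without the attachment, the idea of a self-contained,
seeded generator can be sketched roughly as below. This is not Nathan's
program: the xorshift32 generator, the function names (prng_next,
parse_length, write_test_data), the word/line layout of the output, and
the reading of "k"/"m" as binary multiples are all assumptions made for
illustration. The point it shows is that using only fixed-width integer
arithmetic, instead of rand(), gives the same byte stream on every
platform for the same seed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* xorshift32: uses only fixed-width unsigned arithmetic, so the
 * sequence does not depend on the platform's rand() implementation.
 * The state must be non-zero. */
static uint32_t prng_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Parse a length such as "4096", "64k" or "100m".
 * ASSUMPTION: suffixes are binary multiples (1k = 1024 bytes).
 * Returns 0 for malformed input. */
static unsigned long long parse_length(const char *s)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);

    if (end == s)
        return 0;
    if (*end == 'k' || *end == 'K') {
        n *= 1024ULL;
        end++;
    } else if (*end == 'm' || *end == 'M') {
        n *= 1024ULL * 1024ULL;
        end++;
    }
    return *end == '\0' ? n : 0;
}

/* Write lines of pseudo-random lowercase words until at least
 * `target` bytes have been written. The current line is always
 * finished, so the output overshoots `target` by less than one line.
 * Returns the number of bytes written. */
static unsigned long long write_test_data(FILE *out, uint32_t seed,
                                          unsigned long long target)
{
    uint32_t state = seed ? seed : 1u;   /* xorshift needs non-zero state */
    unsigned long long written = 0;

    assert(out != NULL);
    while (written < target) {
        unsigned words = 1 + prng_next(&state) % 8;    /* 1..8 words   */
        for (unsigned w = 0; w < words; w++) {
            unsigned len = 1 + prng_next(&state) % 10; /* 1..10 chars  */
            for (unsigned i = 0; i < len; i++)
                fputc('a' + (int)(prng_next(&state) % 26), out);
            fputc(w + 1 < words ? ' ' : '\n', out);
            written += len + 1;
        }
    }
    return written;
}
```

A main() for such a sketch would convert argv[1] to the seed, pass
argv[2] through parse_length(), and call write_test_data(stdout, ...),
which matches the two-parameter invocation quoted above. Two people who
agree on the seed and length would then get identical files without
exchanging them.
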
Thank you Nathan, this is incredibly useful!

Would you consider committing this tool to our repository, e.g.
somewhere within the tools/dev/ subtree?

Thanks,
Stefan