On Thu, Jul 16, 2020 at 10:58 AM David Kastrup <d...@gnu.org> wrote: > > Han-Wen Nienhuys <hanw...@gmail.com> writes: > > > On Wed, Jul 15, 2020 at 10:48 PM David Kastrup <d...@gnu.org> wrote: > > > >> Well ok. But only 1000000 random numbers are being used (there is > >> another call using 10000000 instead, the choice appearing random). > >> Let's assume we have 10 processes going through 138 files each. The > >> processes are going to switch to the next output file asynchronously, so > >> with any change, there is a chance of the old number colliding with the > >> other processes' numbers, and the new number colliding. The probability > >> that a new number is different from an existing set of 9 is > >> 999991/1000000. If we do this switch 1380 times, the probability of a > >> collision during one run is 1-(999991/1000000)^1380, about 1 in 80. > > > > I don't think this analysis is correct. In order for two numbers to > > meaningfully collide, they have to be picked at the same time (since > > we rename the files afterwards.). > > And 10 numbers are picked at the same time since 10 processes work on > their temporary files at the same time.
My apologies, your analysis correct. That said, I am still skeptical of this being the explanation. 1000000 as a random number was actually used for midi output, which uses the input as a basename. So you'd need to have1000 parallel instances working on the same file to get a 50% chance of collision. The /tmp version used 10M , which -according to your calculation- would yield a 0.01% chance of colliding. Maybe something is off with the init after fork, but GUILE's random initialization also doesn't look very reliable: https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/random.c/?id=5f22d1090bef72639f2744402c0466d8dbf8f8ac#n121 if you permute bytes at distance 8 in the seed, you'll get the same seed. I'll just add a loop to retry. Note that the original version of this MR used the final destination as the basename for the temp file, which was why I didn't use a loop in the first place. (I still think that is the better design.) > > The birthday attack says you need 1000 parallel instances to get a 50% > > of collision. > > For a single occasion of collision. Not for 1300 occasions in sequence. > How about addressing my actual argument? > > >> Now if I remember correctly, there were some changes in how > >> lilypond-book worked that typically resulted in double the number of > >> processes getting spawned than asked for which would give us 19 instead > >> of 9 possibilities for collision. That would raise the probability of a > >> collision to about 1 in 40 runs. > > > > I think you misremember. The command > > > > make -jN CPU_COUNT=N > > > > will potentially spawn 2*N -1 processes: 1 x lilypond-book with N > > children and N- 1 other processes. > > So back to 1 in 80. Still not the best odds if a collision is fatal. > > -- > David Kastrup -- Han-Wen Nienhuys - hanw...@gmail.com - http://www.xs4all.nl/~hanwen