I’ve been thinking about Martin’s desire for the pretty single backslash 
approach.  I think the discussion should be about the probability of collisions 
(i.e., files that cannot be folded due to false positives).

Assuming 95 printable characters (127-32) plus ‘\n’, a total of 96 characters, 
the unconditioned probability of a given n-character string occurring on a 
line is (1/96)^n.

Here are the 3 options being discussed (please check my math!)

1. The pretty double backslash approach in the current draft
       - folds on any column and supports indents

Scan the text content to ensure no existing lines already end with a backslash 
('\') character when the subsequent line starts with a backslash ('\') 
character as the first non-space (' ') character.

    P(“\\\n[ ]*\\”)
      = sum of (1/96)^n for 3<=n<=69
      = ∑((1÷96)^x; x; 3; 69)
      = 0.0000011421783625730994152046783625731
      ~= 1 / 1,000,000

2. The not pretty single backslash approach in I-D -06
       - folds only on max-column with no support for indents

Scan the artwork to ensure no existing lines already end with a '\' character 
on the desired maximum column.

     P(“.\{$maxcol-1\}\\\n”)
       = P( (not ‘\n’ for $maxcol-1 chars), followed by a “\\\n”)
       = ((1−(1÷96))^68)×(1÷96)^2
       = 0.0000532376463396105463857306859461496
        ~= 1 / 20,000

3. Martin’s pretty single backslash approach
       - folds on any column and supports indents

Scan the artwork to ensure that no existing line ends with a '\' character 
and that no whitespace character appears on the max column.

      P(“\\\n”) + P(“.\{$maxcol\} “)
        = (1/96)^2 + ((1−(1÷96))^69)×(1÷96)
        = 0.00516608334670744635108885960932866
        ~= 1 / 200

Note that each of these assumes a long line.  The fraction of long lines in a 
given piece of text is small, 1/100?  Thus, while option #3 is two orders of 
magnitude more likely to collide than option #2, it may only be detected in 
1 / 200,000 text samples that are themselves detected as needing to be folded 
(1/5?), so maybe one in a million text samples?

Maybe an automated folding algorithm could try #3 and, only if detecting the 
precondition, switch to option #1?
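
A rough Python sketch of that fallback idea, just to make it concrete.  It is 
not taken from the draft: the function names and the returned labels are 
hypothetical, and I am assuming "on the max column" means the character at 
1-based column maxcol.

```python
def option3_collides(lines, maxcol=69):
    """True if the single-backslash scheme (#3) would be ambiguous."""
    for line in lines:
        if line.endswith("\\"):
            return True  # an existing line already ends with '\'
        if len(line) >= maxcol and line[maxcol - 1] in " \t":
            return True  # whitespace sits on the max column (assumed 1-based)
    return False


def option1_collides(lines):
    """True if the double-backslash scheme (#1) would be ambiguous."""
    for cur, nxt in zip(lines, lines[1:]):
        # line ends with '\' and the next line's first non-space char is '\'
        if cur.endswith("\\") and nxt.lstrip(" ").startswith("\\"):
            return True
    return False


def choose_strategy(text, maxcol=69):
    """Pick a folding scheme: try #3 first, fall back to #1."""
    lines = text.split("\n")
    if not any(len(line) > maxcol for line in lines):
        return "none"              # nothing needs folding
    if not option3_collides(lines, maxcol):
        return "single-backslash"  # option #3
    if not option1_collides(lines):
        return "double-backslash"  # option #1
    return "unfoldable"
```

For example, choose_strategy("x" * 80) picks the single-backslash scheme, 
while adding a line that ends in '\' pushes it to the double-backslash one.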

Kent // contributor 



_______________________________________________
netmod mailing list
netmod@ietf.org
https://www.ietf.org/mailman/listinfo/netmod
