Re: dmd 1.060 and 2.045 release

Steven Schveighoffer Mon, 10 May 2010 04:50:17 -0700

On Sun, 09 May 2010 02:11:21 -0400, Lionello Lunesu <l...@lunesu.remove.com> wrote:

I'm in the middle of moving from one city to another so don't wait for
me. I have attached the D version of the code in the wikipedia article
(including the patch for transpositions.)

It's not straightforward to drop this in (apart from it being in D),
since speller.c creates all variations on a string (=inefficient) and
uses a callback function to check if a variation is a valid symbol.

I'll have more time to look at this next week.

Several others have privately brought up this problem to Walter. He does not want to change how the symbol lookup tables work, and there is no way to iterate them. Therefore, without a way to iterate current symbols, this is the only way the algo can be written.
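
To make the generate-and-probe approach concrete, here is a rough sketch in C. It is not the code in speller.c; the names (lookup_fn, suggest_distance1, match_one), the alphabet, and the order in which edits are tried are all assumptions made for illustration. It builds every string at edit distance 1 from the unknown name and asks a callback whether each one is a known symbol, counting the probes as it goes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: nonzero if `candidate` names a known symbol. */
typedef int (*lookup_fn)(const char *candidate, void *ctx);

/* Assumed identifier alphabet: 26 + 26 + 10 + 1 = 63 characters. */
static const char ALPHABET[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";

/* Example callback for a "symbol table" holding one name, passed in ctx. */
int match_one(const char *candidate, void *ctx)
{
    return strcmp(candidate, (const char *)ctx) == 0;
}

/* Tries every string at edit distance 1 from `seed`. On success the accepted
 * variation is left in `out` and 1 is returned; otherwise `out` is emptied
 * and 0 is returned. `out` must hold strlen(seed) + 2 bytes.
 * `*probes` receives the number of callback invocations. */
int suggest_distance1(const char *seed, lookup_fn lookup, void *ctx,
                      char *out, size_t *probes)
{
    assert(seed && lookup && out && probes);
    size_t n = strlen(seed);
    size_t na = sizeof ALPHABET - 1;
    *probes = 0;

    /* Deletions: drop character i. */
    for (size_t i = 0; i < n; i++) {
        memcpy(out, seed, i);
        memcpy(out + i, seed + i + 1, n - i);   /* includes the NUL */
        ++*probes;
        if (lookup(out, ctx)) return 1;
    }
    /* Transpositions: swap characters i and i + 1. */
    for (size_t i = 0; i + 1 < n; i++) {
        memcpy(out, seed, n + 1);
        out[i] = seed[i + 1];
        out[i + 1] = seed[i];
        ++*probes;
        if (lookup(out, ctx)) return 1;
    }
    /* Substitutions: replace character i with each other letter. */
    for (size_t i = 0; i < n; i++) {
        for (size_t a = 0; a < na; a++) {
            if (ALPHABET[a] == seed[i]) continue;
            memcpy(out, seed, n + 1);
            out[i] = ALPHABET[a];
            ++*probes;
            if (lookup(out, ctx)) return 1;
        }
    }
    /* Insertions: put each letter in front of position i (or at the end). */
    for (size_t i = 0; i <= n; i++) {
        for (size_t a = 0; a < na; a++) {
            memcpy(out, seed, i);
            out[i] = ALPHABET[a];
            memcpy(out + i + 1, seed + i, n - i + 1);   /* includes the NUL */
            ++*probes;
            if (lookup(out, ctx)) return 1;
        }
    }
    out[0] = '\0';
    return 0;
}
```

With this alphabet a failed search over a 3-character name costs 443 probes, and the count grows linearly with the length of the name. A distance-2 search built by running the same step on every variation would grow roughly with the square of that, which fits the observation that long names are the slow case.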

However, according to reports on the latest beta, he's sped up the lookup times for symbols significantly enough to perceptually reduce the problem.

Because of the nature of the algorithm, the longer the invalid symbol, the slower the algorithm. It would be interesting to see a comparison between the current beta code and code that does a full iteration with very long symbols. I don't know if anyone wants to look at modifying the symbol lookup data structures to allow iteration; the difference may be perceptually insignificant, and unimportant for most developers.


A quick test on a long symbol name:

void s023456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567891(){}

void main()
{
     s123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890();
}



And on the various compilers:

2.043: 0.052s, does not suggest
2.045: 5.4s, suggests the alternative
beta: 0.8s, does not suggest

2.043 only does spell checking with a Levenshtein distance of 1; 2.045 does 2, but is extremely slow. The beta does a distance of 2, but only if the errors are close together (Walter added this as an optimization to remove one factor from the runtime complexity).

So clearly, there is still room for improvement. I think with a proper symbol iteration, we could get the time down to be as quick as 2.043 or faster *and* provide the ability to check for a complete LD of 2 where the errors are not close together. We might even be able to push the LD to 3 or 4.
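
For comparison, here is a minimal sketch in C of what the iteration-based search could look like if the symbols could be walked. The flat array of names stands in for an iterable symbol table, which dmd does not currently expose; osa_distance and closest_symbol are hypothetical names. The distance is the Levenshtein recurrence with the adjacent-transposition case added (the "optimal string alignment" variant), which is my reading of the Wikipedia version mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Edit distance counting insertions, deletions, substitutions and adjacent
 * transpositions. Returns (size_t)-1 if memory cannot be allocated. */
size_t osa_distance(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    /* Three rolling rows of the DP table: i-2, i-1 and i. */
    size_t *rows = malloc(3 * (m + 1) * sizeof *rows);
    if (!rows) return (size_t)-1;
    size_t *prev2 = rows, *prev = rows + (m + 1), *cur = rows + 2 * (m + 1);

    for (size_t j = 0; j <= m; j++) prev[j] = j;
    for (size_t i = 1; i <= n; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= m; j++) {
            size_t cost = a[i - 1] != b[j - 1];
            cur[j] = min3(prev[j] + 1,          /* deletion */
                          cur[j - 1] + 1,       /* insertion */
                          prev[j - 1] + cost);  /* substitution or match */
            if (i > 1 && j > 1 &&
                a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                size_t t = prev2[j - 2] + 1;    /* transposition */
                if (t < cur[j]) cur[j] = t;
            }
        }
        size_t *tmp = prev2; prev2 = prev; prev = cur; cur = tmp;
    }
    size_t d = prev[m];   /* after the rotation, prev is the last row */
    free(rows);
    return d;
}

/* Walks a hypothetical flat list of symbol names and returns the one closest
 * to `seed` within `max_dist`, or NULL if none qualifies. */
const char *closest_symbol(const char *seed, const char *const *symbols,
                           size_t count, size_t max_dist)
{
    assert(seed && (symbols || count == 0));
    const char *best = NULL;
    size_t best_d = max_dist + 1;
    size_t n = strlen(seed);

    for (size_t i = 0; i < count; i++) {
        size_t m = strlen(symbols[i]);
        size_t gap = n > m ? n - m : m - n;
        if (gap > max_dist) continue;   /* length gap is a lower bound */
        size_t d = osa_distance(seed, symbols[i]);
        if (d < best_d) { best = symbols[i]; best_d = d; }
    }
    return best;
}
```

Given the names "writeln", "readFileContents" and "main", looking up "raedFileContenst" with max_dist 2 returns "readFileContents": two transpositions at opposite ends of the name. The cost is one table fill per candidate symbol, so it depends on the number of symbols in scope and on the name lengths, not on the size of the alphabet, and raising max_dist to 3 or 4 changes only the cutoff. The sketch fills the whole table each time; a version that abandons a candidate as soon as the distance exceeds max_dist would be cheaper.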

I've thought about the optimization of only checking errors that are close together, and I think the counter case is the case which takes the longest -- a long symbol. Typically people use camelCase to denote symbols. What if two of the words in the symbol were misspelled by one character each (or a capitalization was forgotten)?

It may not be an issue; the spell checker is simply a nice hint, and isn't essential for determining errors.


-Steve
