On 29 May 2015 at 23:22, Lex Trotman <ele...@gmail.com> wrote: >> >>> Does it have any major drawbacks? I read it has to some kind >>> "prefix table" prior to running the search, but I guess that's >>> negligible for all reasonable search terms? >> >> I think KMP does not have drawbacks. The prefix table is of length m+1, >> m is the length of the string P for which you wish to find all occurrences >> in some string T. For example, P is "hello", m == 5, and you build a >> prefix table of length m+1 == 6. Then T may be any long string, for the >> long string you do not need any additional memory. You only need the extra >> m+1 size_t locations for the prefix table, m is length of P, the string you >> wish to search. > > Its drawback is it is more complicated, and as I said you have to > benchmark to see if its worth it, things that were worthwhile back > when KMP was invented may not be worthwhile today. >
I might add that the search time may be irrelevant compared to the time to display unless you have a huge buffer and a very small window. >> >> >> Have a great day, >> Marius Ioan Buzea >> >> >> >> >> >> >> -------------------------------------------- >> On Fri, 5/29/15, Thomas Martitz <ku...@rockbox.org> wrote: >> >> Subject: Re: [Geany-Devel] pull request on GitHub, to add >> GeanyHighlightSelectedWords, into Geany Plugins >> To: devel@lists.geany.org >> Date: Friday, May 29, 2015, 1:49 PM >> >> Am 29.05.2015 um 12:44 >> schrieb marius buzea: >> > Hello, >> > >> > >> > >> With KMP it is possible to search all occurrences of a m >> length string, into a n length string, >> > >> using O(m+n) machine operations. Next page: >> > http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm >> > describes the algorithm. >> > >> > >> > >> > The KMP works well >> with the utf-8 encoding of unicode. One property of >> utf8 is that >> > the encoding one unicode >> symbol is not a substring of another utf8 substring. >> This >> > property allows to take the utf-8 >> encoding of the string you wish to search, and to >> > find this utf8 encoding string, in the >> utf8 encoding of the text string. Geany >> uses >> > scintilla, and scintilla uses utf8 >> to encode the document it displays, and scintilla has >> > a command that gives the raw utf8 byte >> array for a [start, end) range. So, KMP >> > gives great speed for searching all >> occurrences, and may be used with the underlying >> > text representation of scintilla used by >> geany. The utf-8 encoding of a unicode >> > string of length n, is less than 6n, each >> utf8 encoding is at most 6 bytes. >> > >> > >> > >> > >> I also think that including this functionality/feature into >> Geany core would be a good choice. >> > It >> would be a small tradeoff between keeping the core small, >> and adding this new functionality, >> > but >> this is your choice. >> > >> > >> > >> > >> If you wish to extend automark, then this is good choice >> too. If you wish, and if it helps, >> > please reuse any part of the >> implementation provided here: >> > >> http://sourceforge.net/p/geanyhighlightselectedword/code/HEAD/tree/trunk/GeanyHighlightSelectedWord/GeanyHighlightSelectedWord.c >> > If needed, I would help. >> > >> > >> > >> What should I do next? Should I not do the >> pull request for GeanyHighlightSelectedWord? >> > It is okay with me. >> GeanyHighlightSelectedWord would then be still available at >> sourceforge until >> > Geany provides this >> functionality from its core, or from automark. >> > >> >> >> I wonder if this algorithm should be applied to >> all searches, and thus >> be integrated into >> scintilla. Does it have any major drawbacks? I read >> it has to some kind "prefix table" >> prior to running the search, but I >> guess >> that's negligible for all reasonable search terms? >> >> Best regards >> _______________________________________________ >> Devel mailing list >> Devel@lists.geany.org >> https://lists.geany.org/cgi-bin/mailman/listinfo/devel >> >> _______________________________________________ >> Devel mailing list >> Devel@lists.geany.org >> https://lists.geany.org/cgi-bin/mailman/listinfo/devel _______________________________________________ Devel mailing list Devel@lists.geany.org https://lists.geany.org/cgi-bin/mailman/listinfo/devel