Re: faster splitter

Andrei Alexandrescu via Digitalmars-d Tue, 31 May 2016 11:44:02 -0700

On 5/31/16 1:54 PM, qznc wrote:

On Tuesday, 31 May 2016 at 01:55:16 UTC, Andrei Alexandrescu wrote:

I agree it's difficult to characterize the behavior of substring
search with one number. There are many dimensions of variation. (But
there's no reason for an emotional response.) A few possible baselines
come to mind:


* Search a long string for a one-character string, match and fail.


There is a special version of find for searching a single char in a
string. Using a one-letter needle string is more like a user mistake
than something to optimize for.

Oh, I see what you mean - it's a string consisting only of onecharacter, not one character 'x'. I agree, but I'm just saying it's asound baseline.

* Take an English text string. Search for a substring consisting of
its last portion (e.g. 5% of the length).


How long should the english text be? A Tweet? A book? A Gigabyte of log
files?

I'm thinking of a few exponentially increasing sizes. e.g. 100, 1000,10_000, 100_000, 1_000_000.

English text means basically ASCII and no Unicode?

My understanding (and correct me if I'm wrong) is that we're discussingsearch with the default predicate, ignoring equivalent graphemerepresentations etc. So there's no need to mind encoding at all, whichmeans the text could be in any language. Foreign languages would furtherincrease the vocabulary, but that's the only effect.

* Take an English text string. Search for a substring consisting of a
fraction of the text (e.g. 3%) with additional characters prepended.
Repeat for appended.


Why the prepend/append? To force a mismatch?

Yah, and it's a good test for strings that have some equal portion yetthey are different, which seems to me puts a greater strain on thealgorithms. It is arguably a likely situation in the real world.



Thanks,

Andrei

Re: faster splitter

Reply via email to