With the same defs of `full_string` and `sub_string` (in both languages), this Python 3.8:

```python
from time import time

def count(full, sub):
    t0 = time()
    start = n = 0
    while True:
        if (spot := full.find(sub, start)) >= 0:
            start = spot + 1
            n += 1
        else:
            break
    print("Time taken: ", time() - t0, " matches: ", n)

count(full_string * 50000, sub_string)
```
runs almost exactly 10x slower (not a mere 2-4x as you were expecting) on a Linux machine than this Nim (compiled with `-d:danger`, `--passC:-flto`, etc.):

```nim
import times, strutils

proc count(full, sub: string) =
  let t0 = epochTime()
  var start, n: int  # auto-inits to 0
  while true:
    if (let spot = full.find(sub, start); spot) >= 0:
      start = spot + 1
      n += 1
    else:
      break
  echo "Time taken: ", epochTime() - t0, " matches: ", n

count(repeat(full_string, 50000), sub_string)
```

The difference from your benchmark attempt is that here we pump up the data scale with 50,000 back-to-back copies, searching one big string (in small hops).

Anyway, most things depend a lot on, well, a lot of things. Nim responds well to effort applied toward optimization. (Even Python can respond well via Cython/Pythran/etc. - _how well_ just depends.) The subculture of "write my microbenchmark in languages X, Y, Z and draw conclusions" mostly leads to misguided conclusions. What is mostly being measured is developer effort/skill/language familiarity (often limited by other arbitrary constraints like "Python without Cython" that would not apply in a real-world scenario).
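To make the "Python can respond well via Cython" point concrete, here is a minimal sketch of the same loop in Cython's pure-Python mode. This is my own illustration, not part of the benchmark above: the file name `count_cy.py`, the `cythonize -i` build step, and the typed locals are all assumptions. How much it helps depends on how much of the time is spent inside `str.find` itself rather than in the interpreter loop, which echoes the "how well just depends" caveat.

```python
# count_cy.py - hypothetical sketch of the same loop in Cython's pure-Python mode.
# Build with:  cythonize -i count_cy.py
# (Requires Cython; uncompiled it still runs as ordinary Python, since the
# `cython` shadow module ships with Cython, just without the C speedup.)
import cython
from time import time

def count(full: str, sub: str) -> None:
    t0 = time()
    # Typed locals let Cython keep the loop counters as C integers.
    start: cython.Py_ssize_t = 0
    n: cython.Py_ssize_t = 0
    spot: cython.Py_ssize_t = 0
    while True:
        spot = full.find(sub, start)
        if spot >= 0:
            start = spot + 1
            n += 1
        else:
            break
    print("Time taken: ", time() - t0, " matches: ", n)

# Usage mirrors the benchmark above, e.g.:
#   count(full_string * 50000, sub_string)
```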