Re: Speed-up my code please

Udiknedormin Fri, 29 Dec 2017 03:15:05 +0100

It's quite easy to speed it up, actually. Let's take a look at your transform 
iterator:
    
    
    proc flip(s: seq[string]): seq[string] =
      result = s  # copy
      result[0] = s[^1]
      result[^1] = s[0]
    
    proc transpose(s: seq[string]): seq[string] =
      result = s  # copy
      for i in 0 .. s.high:
        for j in 0 .. s.high:
          result[j][i] = s[i][j]
    
    iterator transform(s: seq[string]): seq[string] =
      for a in [s, s.flip]:  # copy x 3
        let
          b = a.transpose    # copy
          c = b.flip         # copy
          d = c.transpose    # copy
                             # x2
        yield a; yield b; yield c; yield d  # possibly copy x 4
      
      # total of:  copy x 9,  possibly copy x 17


Well, you take a sequence BY VALUE, yet you still don't operate in-place and 
allocate new sequences all the time. And all the strings in them, I'd add. The 
bare minimum would be to get rid of the copies you don't need:
    
    
    proc flip(s: var seq[string]) =
      swap s[0], s[^1]  # in-place
    
    proc transpose(s: var seq[string]) =
      for i in 0 .. s.high:
        for j in i+1 .. s.high:
          swap s[j][i], s[i][j]  # in-place
    
    iterator transform(s: var seq[string]): var seq[string] =
      var tmp = s  # copy
      tmp.flip     # in-place
      
      template yieldAll(c) =
        yield c; c.transpose  # in-place
        yield c; c.flip       # in-place
        yield c; c.transpose  # in-place
        yield c               # in-place
      
      yieldAll(s)  # in-place
      s = tmp      # copy
      yieldAll(s)  # in-place
    

Now a new seq is allocated (and filled, i.e. deep-copied) only twice: at var 
tmp = s and at s = tmp. You could further reduce that to one by using 
shallowCopy in the latter case. Then, if you truly want to use a single seq all 
the time, you can check whether the following would be faster:
    
    
    proc flip2(s: var seq[string]) =
      for i in 0 .. s.high:
        swap s[i][0], s[i][^1]  # in-place
    
    iterator transform(s: var seq[string]): var seq[string] =
      template yieldAll(c) =
        yield c; c.transpose  # in-place
        yield c; c.flip       # in-place
        yield c; c.transpose  # in-place
        yield c               # in-place
      
      yieldAll(s)  # in-place
      s.flip2      # in-place
      s.flip       # in-place
      yieldAll(s)  # in-place
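
Why s.flip2 followed by s.flip is enough here: after the first yieldAll, s 
holds transpose(flip(transpose(original))), which is just the original with its 
first and last columns swapped. flip2 undoes that and flip then produces the 
starting point for the second pass. A small sanity check, assuming a square 
grid of equal-length strings (the procs are repeated only so the snippet 
compiles on its own):

```nim
proc flip(s: var seq[string]) =
  swap s[0], s[^1]            # swap first and last rows

proc flip2(s: var seq[string]) =
  for i in 0 .. s.high:
    swap s[i][0], s[i][^1]    # swap first and last columns

proc transpose(s: var seq[string]) =
  for i in 0 .. s.high:
    for j in i+1 .. s.high:
      swap s[j][i], s[i][j]

let original = @["abc", "def", "ghi"]
var s = original              # copy, so `original` stays untouched

# The three mutations performed by one yieldAll pass.
s.transpose; s.flip; s.transpose
doAssert s == @["cba", "fed", "ihg"]   # original with columns 0 and 2 swapped

s.flip2
doAssert s == original                 # back to the input

s.flip
doAssert s == @["ghi", "def", "abc"]   # what the first version kept in tmp
```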
    

It may be tempting to use a 2D array (or a seq mocking one), as transpose and 
reading would benefit from memory locality (but not THAT much, I guess). On the 
other hand, flip (but not flip2!) benefits from memory scattering, since it 
only swaps two string references, so it's not that obvious. Even so, flip on a 
2D array would still be faster than flip2, thanks to locality (e.g. you can use 
copyMem instead of a swap in a loop). Still, I think it might be nice to give 
a 2D array a try.
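
A rough sketch of what such a "seq mocking a 2D array" could look like. The 
Grid type, initGrid and row are names made up for this illustration (they are 
not from your code), and the grid is assumed to be square. Whether it actually 
beats the seq[string] version is something you'd have to measure:

```nim
# Hypothetical flat square grid: cell (r, c) lives at data[r * n + c].
type Grid = object
  n: int
  data: seq[char]

proc initGrid(lines: openArray[string]): Grid =
  ## Builds a Grid from n strings of length n (squareness is assumed, not checked).
  result.n = lines.len
  result.data = newSeq[char](lines.len * lines.len)
  for r, line in lines:
    for c, ch in line:
      result.data[r * result.n + c] = ch

proc transpose(g: var Grid) =
  ## Swaps every cell above the diagonal with its mirror image, in-place.
  for r in 0 ..< g.n:
    for c in r + 1 ..< g.n:
      swap(g.data[r * g.n + c], g.data[c * g.n + r])

proc flip(g: var Grid) =
  ## Swaps the first and last rows using three block copies instead of a loop.
  if g.n < 2: return
  var tmp = newSeq[char](g.n)
  let last = (g.n - 1) * g.n          # index where the last row starts
  copyMem(addr tmp[0], addr g.data[0], g.n)
  copyMem(addr g.data[0], addr g.data[last], g.n)
  copyMem(addr g.data[last], addr tmp[0], g.n)

proc row(g: Grid, r: int): string =
  ## Copies row r out as a string, e.g. for printing or comparing.
  result = newString(g.n)
  for c in 0 ..< g.n:
    result[c] = g.data[r * g.n + c]

var g = initGrid(["abc", "def", "ghi"])
g.transpose
doAssert g.row(0) == "adg"
g.flip
doAssert g.row(0) == "cfi" and g.row(2) == "adg"
```

Note that flip here has to move 3 * n bytes, while on seq[string] it only 
swaps two references, which is the "memory scattering" advantage mentioned 
above.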

@Araq Are these seqs copied on yield? It would be nice if not, but the value 
semantics (which seem to break on let instead of var, so maybe here too?) make 
me doubt it...
