On 3/04/2015 4:27 a.m., John Colvin wrote:
On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
On 3/04/2015 12:29 a.m., John Colvin wrote:
On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
On 2/04/2015 2:52 a.m., tchaloupka wrote:
Hi,
I have a bunch of square r16 and png images which I need to flip
horizontally.

My flip method looks like this:
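void hFlip(T)(T[] data, int w)
{
  import std.algorithm : reverse;
  import std.datetime : StopWatch;
  import std.stdio : writeln;

  StopWatch sw;
  sw.start();

  // The image is square, so w is both the row width and the row count.
  foreach(int i; 0..w)
  {
    // Each row is a slice of data, so it is reversed in place.
    auto row = data[i*w..(i+1)*w];
    row.reverse();
  }

  sw.stop();
  writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
}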
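
For readers on a newer compiler, here is a minimal sketch of the same idea. It assumes a Phobos release where StopWatch lives in std.datetime.stopwatch, and it derives the row count from the buffer length instead of assuming a square image; that change is mine, not part of the original post. The int pixel type in main is only for illustration.

```d
import std.algorithm.mutation : reverse;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

/// Flips a row-major image in place; `w` is the row width in pixels.
/// Assumption: the row count is data.length / w, so the image
/// does not have to be square.
void hFlip(T)(T[] data, size_t w)
{
    auto sw = StopWatch(AutoStart.yes);

    foreach (i; 0 .. data.length / w)
    {
        // Each row is a slice into the same buffer, so reversing it
        // modifies the image directly.
        data[i * w .. (i + 1) * w].reverse();
    }

    sw.stop();
    writeln("Img flipped in: ", sw.peek.total!"msecs", "[ms]");
}

void main()
{
    // A 2x2 "image" with one int per pixel.
    int[] pixels = [1, 2,
                    3, 4];
    hFlip(pixels, 2);
    assert(pixels == [2, 1, 4, 3]);
}
```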

With the simple r16 file format it's pretty fast, but with RGB PNG
files (2048x2048) I noticed it's somewhat slow, so I tried to
compare it with C# and was pretty surprised by the results.

C#:
PNG load - 90ms
PNG flip - 10ms
PNG save - 380ms

D using dlib (http://code.dlang.org/packages/dlib):
PNG load - 500ms
PNG flip - 30ms
PNG save - 950ms

D using imageformats
(http://code.dlang.org/packages/imageformats):
PNG load - 230ms
PNG flip - 30ms
PNG save - 1100ms

I used dmd 2.067 with -release -inline -O.
C# was just a debug build with Visual Studio attached to the process for
debugging, and even with that it is much faster.

I know that System.Drawing is using Windows GDI+, that can be
used with D too, but not on linux.
If we ignore the PNG loading and saving (I haven't tried libpng
yet), even the flip method itself is 3 times slower. I don't know D
well enough to be sure whether there is a more efficient way to do
the flip. I like how the slices can be used here.

For a C# user who expects a system-level programming language to
just work as fast as possible, it is somewhat disappointing to see
that the pure D version is about 3 times slower.

Am I doing something utterly wrong?
Note that this example is not critical for me, it's just a simple
hobby script I use to move and flip some images, so I can wait. But
I'm posting it to see whether it can be brought closer to what can
be expected from a system-level programming language.

dlib:
auto im = loadPNG(name);
hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
savePNG(im, newName);

imageformats:
auto im = read_image(name);
hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
write_image(newName, im.w, im.h, im.pixels);

C# code:
static void Main(string[] args)
{
    var files = Directory.GetFiles(args[0]);

    foreach (var f in files)
    {
        var sw = Stopwatch.StartNew();
        var img = Image.FromFile(f);

        Debug.WriteLine("Img loaded in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
        sw.Restart();

        img.RotateFlip(RotateFlipType.RotateNoneFlipX);
        Debug.WriteLine("Img flipped in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
        sw.Restart();

        img.Save(Path.Combine(args[0], "test_" + Path.GetFileName(f)));
        Debug.WriteLine("Img saved in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
        sw.Stop();
    }
}


Assuming I've done it correctly, Devisualization.Image takes around
8ms in debug mode to flip horizontally using dmd, but 3ms in release.

module test;

void main() {
   import devisualization.image;
   import devisualization.image.mutable;
   import devisualization.util.core.linegraph;

   import std.stdio;

   writeln("===============\nREAD\n===============");
   Image img = imageFromFile("test/large.png");
   img = new MutableImage(img);

   import std.datetime : StopWatch;

   StopWatch sw;
   sw.start();

   foreach(i; 0 .. 1000) {
       img.flipHorizontal;
   }

   sw.stop();

   writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
}

I was planning on doing this earlier, but I discovered that a PR I
pulled, which fixed things for 2.067, broke reading of chunk types.

My bad, I forgot I had decreased the test image resolution to 256x256.
I'm totally out of the running. I have some serious work to do, by the
looks of it.

Have you considered just being able to grab an object with a changed
iteration order instead of actually doing the flip? The same goes for
transposes and 90° rotations. Sure, sometimes you do actually need to
rearrange the memory, and in a subset of those cases you need it to be
done fast, but a lot of the time you're better off* just using a
different iteration scheme (which, for ranges, should probably be part
of the type to avoid checking the scheme every iteration).

*for speed and memory reasons. Need to keep the original and the
transpose? No need for any duplicates.

Note that this is what numpy does with transposes. The .T and .transpose
methods of ndarray don't actually modify the data, they just set the
memory order** whereas the transpose function actually moves memory
around.

**using a runtime flag, which is ok for them because internal iteration
lets you only branch once on it.
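
To make the "changed iteration order" idea concrete, here is a minimal sketch built from std.range.chunks, std.range.retro and std.algorithm.map. The name hFlipView is hypothetical; it is not taken from dlib, imageformats or Devisualization.Image. Because the flip is expressed through the range types themselves, there is no runtime flag to check on each iteration.

```d
import std.algorithm : equal, map;
import std.range : chunks, retro;

/// Hypothetical helper: a lazy range of rows, each row traversed
/// right to left. No pixel is moved or copied.
auto hFlipView(T)(T[] data, size_t w)
{
    // chunks splits the flat row-major buffer into rows of w pixels;
    // retro walks each row backwards without touching memory.
    return data.chunks(w).map!(row => row.retro);
}

void main()
{
    // A 2x3 "image" stored row-major, one int per pixel.
    int[] pixels = [1, 2, 3,
                    4, 5, 6];

    auto flipped = hFlipView(pixels, 3);

    // The first row is seen in reverse order...
    assert(flipped.front.equal([3, 2, 1]));
    // ...while the underlying buffer is unchanged.
    assert(pixels == [1, 2, 3, 4, 5, 6]);
}
```

The original and the flipped view share one buffer, so keeping both costs no extra memory. Code that needs the flipped pixels laid out contiguously (for example, to hand them to an encoder) would still have to copy them out.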

I've got it down to ~12ms using dmd now. But if the image were much
bigger (let's say a height of ushort.max), I wouldn't be able to use a
little trick. But this is only because I'm using multithreading.

That would be an insanely large image. If it was square it would be a
4GiB image. I think it's safe to say that someone with images that large
will be looking for quite specialised solutions and wouldn't be
disappointed if things aren't optimally fast off-the-shelf!
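
As a quick check of the 4GiB figure, the arithmetic can be sketched as follows. It assumes one byte per pixel, which is my reading of the number rather than something stated in the thread; 3-byte RGB pixels would need three times as much.

```d
import std.stdio : writefln;

// Side length of ushort.max pixels, as in the post.
enum ulong side = ushort.max;            // 65_535
enum ulong pixelCount = side * side;     // 65_535 * 65_535

// Checked at compile time.
static assert(pixelCount == 4_294_836_225UL);

void main()
{
    enum double gib = 1024.0 * 1024.0 * 1024.0;

    // Assumption: 1 byte per pixel for the first line,
    // 3 bytes per pixel (RGB) for the second.
    writefln("1 byte/pixel:  %.2f GiB", pixelCount / gib);
    writefln("3 bytes/pixel: %.2f GiB", 3 * pixelCount / gib);
}
```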

Most image editing software could definitely not handle it. I would be very surprised if e.g. libpng can even read such a file. Although I'm pretty sure mine can ;)

Worst-case scenario, for more than ushort.max, I think it'll be a couple hundred ms.
