On Monday, 14 September 2015 at 16:33:23 UTC, Rikki Cattermole wrote:

I believe a lot of this hasn't been covered yet.

http://dpaste.dzfl.pl/f7ab2915c3e1

1) You don't need to convert char[] to string via to. That does too much work. Cast it.
2) You don't need byKey; use the foreach (key, value; aa) syntax. That way you won't go around modifying things unnecessarily.
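To make both points concrete, here is a minimal sketch. The helper name bump and the counter table are hypothetical stand-ins, not the code from the paste. One caveat worth stating: a plain cast(string) avoids the copy, but it is only safe if the char[] buffer is never written to again, which is not the case for the buffer that byLine reuses. The sketch therefore looks the key up with the char[] slice and copies with .idup only when the key is new.

```d
import std.stdio : writeln;

// Hypothetical helper: count one occurrence of key.
void bump(ref int[string] counts, const(char)[] key)
{
    // A char[] slice can be used to look up a string-keyed AA: no allocation on a hit.
    if (auto p = key in counts)
        ++*p;
    else
        counts[key.idup] = 1; // copy only when the key is actually new
    // cast(string) key would skip the copy, but the stored key would then
    // alias a buffer that may be overwritten later (as byLine does).
}

void main()
{
    int[string] counts;
    char[] buf = "abc".dup;   // stand-in for a reused line buffer
    bump(counts, buf);
    buf[] = "xyz";            // simulate the buffer being overwritten
    bump(counts, buf);
    bump(counts, "abc");

    // Point 2: iterate keys and values together instead of byKey plus a lookup.
    foreach (key, value; counts)
        writeln(key, " ", value);
}
```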

Ok, I disabled the GC and reserved a bunch of memory. It probably won't help much, actually. In fact it may make the program fail, so keep that in mind.
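For reference, the GC change amounts to something like the sketch below. The wrapper runWithoutCollections and the reserve size are assumptions for illustration; only GC.reserve, GC.disable and GC.enable come from core.memory. Note that with collections off, memory use only grows, which is how this can make a large run fail.

```d
import core.memory : GC;

// Hypothetical wrapper: run work with automatic collections switched off.
void runWithoutCollections(scope void delegate() work, size_t reserveBytes)
{
    GC.reserve(reserveBytes);  // ask the GC to grab memory from the OS up front
    GC.disable();              // suppress automatic collections (the runtime may
                               // still collect when it runs out of memory)
    scope (exit) GC.enable();  // restore normal behaviour afterwards
    work();
}

void main()
{
    // Assumption: 64 MiB is an arbitrary figure; size it to the real working set.
    runWithoutCollections({
        int[] data;
        foreach (i; 0 .. 1000)
            data ~= i;         // allocations happen without triggering collections
    }, 64 * 1024 * 1024);
}
```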

Hmm, what else.

I'm worried about that first foreach. I don't think it needs to exist as it does. I believe an input range would be far better. Use a buffer to store the Hit[]s, and hand out a slice of it for each set of them.

If the first foreach is an input range, then things become slightly easier in the second. Now you can turn that into its own input range. Also, that .array usage concerns me. Many an allocation there! Hence the input range should be what it returns.
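A rough sketch of what such an input range could look like. Everything specific here is an assumption, since it does not come from the paste: the Hit fields, the tab-separated "query<TAB>score" line format, and the names parseHit, HitGroups and hitGroups. It yields one Hit[] slice per run of consecutive lines with the same query, reusing a single buffer, so the slice is only valid until the next popFront.

```d
import std.algorithm.searching : findSplit;
import std.conv : to;
import std.range.primitives : empty, front, popFront;

// Hypothetical record; the real Hit will have other fields.
struct Hit
{
    string query;
    double score;
}

// Assumed line format: "query<TAB>score".
Hit parseHit(const(char)[] line)
{
    auto parts = line.findSplit("\t");
    return Hit(parts[0].idup, parts[2].to!double);
}

// Input range over groups of hits; lines can be File.byLine or any range of char slices.
struct HitGroups(Lines)
{
    private Lines lines;
    private Hit[] buffer;   // storage reused from group to group
    private size_t count;   // number of hits in the current group
    private Hit carry;      // first hit of the next group, read ahead
    private bool hasCarry;

    this(Lines lines)
    {
        this.lines = lines;
        popFront();         // load the first group
    }

    bool empty() const { return count == 0; }

    Hit[] front() { return buffer[0 .. count]; }

    void popFront()
    {
        count = 0;
        if (hasCarry) { append(carry); hasCarry = false; }
        for (; !lines.empty; lines.popFront())
        {
            auto hit = parseHit(lines.front);
            if (count > 0 && hit.query != buffer[0].query)
            {
                carry = hit;        // belongs to the next group
                hasCarry = true;
                lines.popFront();
                return;
            }
            append(hit);
        }
    }

    private void append(Hit h)
    {
        if (count == buffer.length)
            buffer.length = buffer.length ? buffer.length * 2 : 16;
        buffer[count++] = h;
    }
}

auto hitGroups(Lines)(Lines lines) { return HitGroups!Lines(lines); }
```

The second loop can then be written as foreach (group; hitGroups(file.byLine)) without an intermediate .array.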

The last foreach is, let's assume, a dummy. Keep in mind that stdout is expensive here. DO NOT USE. If you must write output, buffer it and write it in large quantities.
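If the output has to stay for comparison runs, one way to buffer it is to build everything in an Appender and write it once. This is only a sketch: renderCounts and the int[string] table are hypothetical, and the initial capacity is an arbitrary guess.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : stdout;

// Hypothetical helper: format the whole report in memory.
char[] renderCounts(int[string] counts)
{
    auto buf = appender!(char[])();
    buf.reserve(64 * 1024);  // assumption: arbitrary starting capacity
    foreach (key, value; counts)
        buf.formattedWrite("%s\t%d\n", key, value);
    return buf.data;
}

void main()
{
    int[string] counts = ["a": 1, "b": 2];
    // One large write instead of one small write per entry.
    stdout.rawWrite(renderCounts(counts));
}
```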


Based upon what I can see, you are definitely not able to use your CPUs to the max. There is no way that is the limiting factor here. Maybe your usage of a single core is. But not the CPUs themselves.

The thing is, you cannot use multiple threads on that first foreach loop to speed things up. No. That needs to happen all on one thread. Instead, once that thread has produced a result, you need to push it to another thread.

Perhaps one lock (mutex) plus a buffer for hits per thread. Go round robin over all the threads. If a mutex is still locked, you'll need to wait. In this situation a locked mutex means all your worker threads are working, so you can't do anything more anyway.
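A rough sketch of the round-robin hand-off. To keep it short it swaps the hand-rolled mutex plus buffer for std.concurrency mailboxes: each worker's mailbox plays the role of its locked buffer, and a bounded mailbox with OnCrowding.block makes the producer wait when a worker is still busy. The int batches, the summing worker and the name processInParallel are placeholders, not anything from the paste.

```d
import std.concurrency;

// Worker: processes batches until told to stop, then reports its total to the owner.
void worker(Tid owner)
{
    long total = 0;
    bool running = true;
    while (running)
    {
        receive(
            (immutable(int)[] batch) { foreach (x; batch) total += x; },
            (bool stop) { running = false; }
        );
    }
    send(owner, total);
}

// Producer side: runs on one thread and deals batches out round robin.
long processInParallel(immutable(int)[][] batches, size_t nWorkers)
{
    Tid[] workers;
    foreach (i; 0 .. nWorkers)
    {
        auto tid = spawn(&worker, thisTid);
        // Assumption: 8 pending messages per worker is an arbitrary limit.
        setMaxMailboxSize(tid, 8, OnCrowding.block);
        workers ~= tid;
    }

    foreach (i, batch; batches)
        send(workers[i % nWorkers], batch);   // blocks if that mailbox is full

    foreach (tid; workers)
        send(tid, true);                      // tell every worker to finish

    long grand = 0;
    foreach (_; workers)
        grand += receiveOnly!long();
    return grand;
}
```

Batches are immutable so they can cross threads without copying or locking on the data itself.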

Of course, after all this, the HDD may still be getting hit too hard. In that case I would recommend memory mapping the file, which should allow the OS to handle reading it into memory more efficiently. But you'll need to rework .byLine for that.
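Reworking .byLine for a memory-mapped file could look roughly like this, using std.mmfile and std.string.lineSplitter. The function countLines is just a placeholder for the real per-line work, and the sketch assumes the file is non-empty, valid UTF-8 text.

```d
import std.mmfile : MmFile;
import std.string : lineSplitter;

// Placeholder for the real work: walk the lines of a mapped file.
size_t countLines(string path)
{
    auto mm = new MmFile(path);            // read-only mapping of the whole file
    scope (exit) destroy(mm);              // unmap when done
    auto text = cast(const(char)[]) mm[];  // assumption: contents are valid UTF-8
    size_t n;
    foreach (line; text.lineSplitter)
    {
        // line is a slice into the mapping: no per-line copy, unlike byLine
        ++n;
    }
    return n;
}
```

Since the slices point into the mapping, anything that must outlive it still needs an .idup.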


Wow that was a lot at 4:30am! So don't take it too seriously. I'm sure somebody else will rip that to shreds!

Thanks for your suggestions! That sure is a lot of detail. I'll have to go through it carefully to understand what to do with all this. Going multithreaded sounds fun but would effectively kill off all of my spare time, so I might have to skip that. :)

Using char[] all around might be a good idea, but it doesn't seem like the string conversions are really that taxing. What are the arguments for working on char[] arrays rather than strings?

I'm aware that printing output like that is a performance killer, but it's not supposed to write anything in the final program. It's just there for me to be able to compare the results to my Python code.
