Folks, I just went through a performance comparison exercise and I thought
a summary of the results might be of interest here. A colleague is
converting some C++ code to C# to see if it's possible to maintain the
legacy high performance while enjoying the benefits of the managed world.
The core code reads from 1 to 15 text files line-by-line and parses the
contents of the lines which may look like these samples:

83;61;58;18;42;96;24;15;42;39
a1b1*0.333333333333333a2b1*0.333333333333333a3b1
a3b1*826;2*93;3*101a19b1*526;2*557;3*518

The input files often contain up to 1 million lines. Each parsed number is
used to update a cell in a large matrix that is typically hundreds wide or
high, but might be tens of thousands wide. So you can see that this is
mainly a CPU- and memory-intensive task. We know that most of the time is
spent in the tight loop that parses millions of numbers out of the input
lines. I wrote a test harness that simulated the processing in C# and
discovered the following:

   - Release or Debug build made little difference.
   - Using a compiled Regex slows things down by a factor of 5.
   - Using string Split slows things down by a factor of about 3.
   - Using Parallel.ForEach slows things down slightly.
   - Using an unmanaged buffer with unsafe unchecked pointers slows things
   down slightly.
   - The fastest way to parse the lines is with an index loop over the
   chars in the line string.

In a normal business app you would of course use Regex or string methods
for parsing because they are clear and maintainable, but in this case, where
every millisecond counts, I found that any FCL usage would blow out the time
and only a for-loop was viable.
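
To make the for-loop idea concrete, here is a minimal sketch of an index
loop over the chars of a line. It is not the translated production code:
LineParser and ParseSemicolonInts are hypothetical names, and it assumes
only the first sample format (non-negative integers separated by ';').
The a1b1*... formats would need more states in the same loop.

```csharp
using System;
using System.Collections.Generic;

public static class LineParser
{
    // Hypothetical helper: parses non-negative integers separated by ';'
    // with one pass over the chars, avoiding Split, Regex and substrings.
    // Assumption: values fit in an int (no overflow check is done here).
    public static List<int> ParseSemicolonInts(string line)
    {
        var values = new List<int>();
        int current = 0;        // number being accumulated
        bool inNumber = false;  // true once at least one digit was seen

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c >= '0' && c <= '9')
            {
                // Accumulate the digit without allocating a substring.
                current = current * 10 + (c - '0');
                inNumber = true;
            }
            else if (c == ';')
            {
                // Separator: emit the finished number, if any, and reset.
                if (inNumber) values.Add(current);
                current = 0;
                inNumber = false;
            }
            else
            {
                throw new FormatException(
                    $"Unexpected character '{c}' at index {i}.");
            }
        }

        // Emit the last number when the line does not end with ';'.
        if (inNumber) values.Add(current);
        return values;
    }
}
```

Calling LineParser.ParseSemicolonInts("83;61;58") returns a list holding
83, 61 and 58. In a real harness each value would presumably update the
matrix cell directly rather than go into a List, which saves allocations.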

Parallelism is probably useless in this case because the processing on each
worker thread is just a blink, meaning the threading overhead is heavier
than the processing it carries.

So it turns out that an old-fashioned C-style for-loop to manually parse
the lines is the fastest by a long shot. It's fragile of course, but my
colleague has translated the old well-tested C++ code directly over to C#
(it's rather ugly). This whole scenario is rather unusual and not very
applicable to LOB apps, but I thought it was worth posting anyway.

Cheers,
*Greg Keogh*

[image: image.png]

Legend for the image:

   - Regex.Match(es)
   - Regex.Match(es) with Parallel Processing (PPL)
   - String Split
   - String Split with PPL
   - For-loop
   - For-loop with PPL
   - Plain file reads with no parsing (lowest baseline)
