On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:

Greetings!

I have this code,

foreach (...)
{

  if (std.string.tolower(fext[0]) == "doc" ||
    std.string.tolower(fext[0]) == "docx" ||
    std.string.tolower(fext[0]) == "xls" ||
    std.string.tolower(fext[0]) == "xlsx" ||
    std.string.tolower(fext[0]) == "ppt" ||
    std.string.tolower(fext[0]) == "pptx")
   continue;
}

foreach (...)
{
  if (std.string.tolower(fext[0]) == "doc")
    continue;
  if (std.string.tolower(fext[0]) == "docx")
    continue;
  if (std.string.tolower(fext[0]) == "xls")
    continue;
  if (std.string.tolower(fext[0]) == "xlsx")
    continue;
  if (std.string.tolower(fext[0]) == "ppt")
    continue;
  if (std.string.tolower(fext[0]) == "pptx")
   continue;
  ...
  ...
}

thanks.

josé


Both are slow, those trying to optimize out tolower are missing the boat. A good compiler should optimize that out.

The issue is that n compare are done in those cases with each successive compare getting slower and slower. if the match is doc then it happens in one compare in both cases. If it's pptx then it is 6. So if there are many pptx matches then the code will perform slower than for doc.


This can be good or bad behavior depending on if you are guarantee that your order is representative of what actually happens(many doc's few pptx's).

To get O(1) behavior, you must use some sort of jump list. A hash table can provide such behavior. If there are many comparisons needed then at some point this will be, on average, faster.

If performance is critical and the string length is small enough then the fastest method is to use a direct lookup table.

For n character of char. It is 256^n. For n = 4, this is 4GB of memory! (This is one reason to use a hash table). Since you are not using the full ascii you could compact this a bit, say 36^n, for n = 4 it is only 1.6MB of memory. The cost is packing the strings for lookup and computing the index.

One way to speed up the loop is to also compare for what is not used. In this case you can compare just on the first char.


   if (fext[0][0]) == "d")
   {
      ...
   }
   if (fext[0][0]) == "x")
   {
      ...
   }
   if (fext[0][0]) == "p")
   {
      ...
   }

This pares down the results. In this case it is only 3 comparisions + 2 comparisons = 5 for pptx rather than 6.

You can try and develop your own hash to get optimal amortized performance.

Reply via email to