On Oct 10, 2006, at 6:25 AM, Jochen Haerdtlein wrote:
I am a PhD student working on the extended use of expression templates for solving partial differential equations.

Since compile time of huge expression template programs is a severe problem

I fear that trying to solve that problem will result in orders of magnitude decrease in compilation speed at best, after you speed it up, as compared to new competing technologies for larger projects. The benefit is that those tricks are standard and can be relied upon. If you're up for innovation, identify those things that slow compilation to a crawl in traditional C++, design a system, say based upon a lisp system, that can `compile' that fast. Map that over to a more static C++ environment, and then implement that in g++ and then try it out on your problem. If it works, propose the system as extensions to the C++ language. I think C++ would benefit from it.

Let me give you a concrete example, take:

#include <stdio.h>

template <int I, int J>
class divides {
public:
  enum { answer = I%J == 0 || divides<I,J-1>::answer };
};

template<>
template <int I>
class divides<I, 1> {
public:
  enum { answer = 0 };
};

template <int I>
class is_prime {
public:
  enum { answer = divides<I,(I-1)/2+1>::answer == 0};
};

template <>
class is_prime<1> {
public:
  enum { answer = 0 };
};

template <>
class is_prime<2> {
public:
  enum { answer = 1 };
};

template <>
class is_prime<3> {
public:
  enum { answer = 1 };
};

main() {
#define T(N) printf("Is %d a prime, %s.\n", N, is_prime<N>::answer ? "yes" : "no");
  T(2);
  T(3);
  T(4);
  T(5);
  T(6);
  T(7);
  T(8);
  T(9);
  T(10);
  T(11);
  T(12);
  T(13);
  T(14);
  T(47797);
}

for example, it takes around 3m9.947s to compile on my system. The competition, seen at the end, clocks in at 1862x faster, around 0m0.102s to compile. The problem is that the second isn't guaranteed to be solved at compile time, while the above is. If this were the only problem to so fix, one could just change g++ to evaluate all constant expressions at compile time, no matter what and do all the required inlining for this, and then one could write the above program as the one at the end, and have the guarantee, as well as cleaner and clearer code, benefits all the way around.

Just as there exists an alternative way of doing the above, I'd expect there is an alternative way of doing what you want to do, that would be 2-3 orders of magnitude faster than the current compilation speed of the current solution and hopefully remain 1-2 orders faster _after_ you sped the g++ compilation speed up using more traditional techniques.

If this is secondary to your goal or if you want to remain true to the standard, you should be able to spot bad things like linear instantiation scans in the compiler and sped up those things in straight forward ways and gain 2-10x in speed on some problems (n == 5-200 type things). Just crank up the problem size and watch for the n^2 curve and when it dominates, get a gprof at that point, and take a look at what dominates and then fix it. The usual culprit will be a linear search of a list (vector) with n things on it, where that is done n times. When you see something related to templates, say (from cp-tree.h):

/* For a static member variable template, the
   DECL_TEMPLATE_INSTANTIATIONS list contains the explicitly and
   implicitly generated instantiations of the variable.

or

/* For a function template, the DECL_TEMPLATE_SPECIALIZATIONS lists
   contains all instantiations and specializations of the function,
   including partial instantiations.

you know you are about in the right place. If you wanted to statically look for these things, a quick grep or these two finds things like:

tree
maybe_process_partial_specialization (tree type)
{
[ ... ]
          for (t = DECL_TEMPLATE_INSTANTIATIONS
(most_general_template (CLASSTYPE_TI_TEMPLATE (type)));
               t; t = TREE_CHAIN (t))

and

      if (!class_specializations_p
          && TREE_CODE (DECL_TEMPLATE_RESULT (tmpl)) == TYPE_DECL)
        sp = &DECL_TEMPLATE_INSTANTIATIONS (tmpl);
      else
        sp = &DECL_TEMPLATE_SPECIALIZATIONS (tmpl);
      head = sp;
/* Iterate through the list until we find a matching template. */
      while (*sp != NULL_TREE)
        {

I'll leave it as an exercise for the reader what algorithm would best replace these sorts of things, or even if, for any particular one, it is possible to do better.

Hope that helps.

Anyway, here is the obvious version that compiles faster:


#include <stdio.h>

static int
divides(int I, int J)
{
  if (J == 1)
    return 0;
  return I%J == 0 || divides(I,J-1);
}

static int
is_prime(int I)
{
  if (I == 1)
    return 0;
  if (I == 2 || I == 3)
    return 1;
  return divides(I,(I-1)/2+1) == 0;
}

main() {
#define T(N) printf("Is %d a prime, %s.\n", N, is_prime(N) ? "yes" : "no");
  T(2);
  T(3);
  T(4);
  T(5);
  T(6);
  T(7);
  T(8);
  T(9);
  T(10);
  T(11);
  T(12);
  T(13);
  T(14);
  T(47797);
}

Reply via email to