There is a lot of room for improvement in the preprocessor. Here is a quick benchmark of several popular C-like preprocessors on an artificial 4 MB "shader" (the Blender PBR shader concatenated 16 times). I also wanted to include D's Warp, but didn't manage to compile it:
                    time      mem      page faults
clang 3.8           0.11s     32MB     3K
gcc 5               0.063s    13MB     1.3K
tcc                 0.067s    3MB      0.4K
glslangValidator    0.63s     25MB     3K
glcpp (Mesa)        0.36s     127MB    31K
glcpp+jemalloc      0.39s     182MB    1K

Not only is glcpp significantly slower than the other C-like preprocessors (3x-6x slower), it also allocates much more memory.

This patch series improves the preprocessor in the following ways:

1. Print to an exponentially growing string buffer instead of using printf() and realloc() on each print.
2. Use Bloom filters to avoid excessive hash-table queries.
3. Create a hand-written streamlined lexer/parser that bypasses flex/bison tokenization/printing for simple cases. This one adds a lot of code, but it also greatly improves preprocessing speed.

A few benchmarks. The same 16x concatenated Blender PBR shader:

                      time      mem      page faults
glcpp                 0.36s     127MB    31K
glcpp-new             0.026s    13MB     2.7K
glcpp-new+jemalloc    0.026s    20MB     1K

A nice improvement in both speed and memory use.

A more realistic test: preprocessing my whole shader-db (more than 51K shaders from various Steam games) using a hybrid of shader-db's run and glcpp that I hacked together:

          dumped from games    default shader-db's collection
Before    27.02s               0.52s
After     2.09s                0.14s

However, some games benefit very little from this series (Talos Principle 0.45s -> 0.2s, Serious Sam 0.53s -> 0.22s, to name a few). They are heavy users of the preprocessor, and they hit the non-optimized path. It's possible to improve them too by streamlining the path that skips #if 0 ... #endif blocks. It's also possible to speed up the fast path using SIMD optimizations (Clang, for example, uses SSE to skip multiline comments).

The series passes all of Mesa's preprocessor tests. The output and error output of the preprocessor after a full shader-db run are the same as before, including line numbers in errors and so on. The only difference is that it generates a bit less trailing whitespace, but trailing whitespace doesn't really matter for a preprocessor. Other preprocessors drop trailing whitespace entirely.
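To illustrate item 1 of the list above, here is a minimal sketch of printing into an exponentially growing buffer. It is not the string_buffer code from this series: the struct and the strbuf_* names are hypothetical and use plain malloc()/realloc() rather than ralloc. The point is only that doubling the capacity makes the number of reallocations logarithmic in the output size, instead of one realloc() per print.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; illustration only, not Mesa's string_buffer. */
struct strbuf {
   char *data;      /* NUL-terminated contents */
   size_t length;   /* bytes used, not counting the NUL */
   size_t capacity; /* bytes allocated, always > length */
};

int strbuf_init(struct strbuf *sb, size_t initial_capacity)
{
   assert(sb != NULL);
   if (initial_capacity == 0)
      initial_capacity = 16;
   sb->data = malloc(initial_capacity);
   if (sb->data == NULL)
      return -1;
   sb->data[0] = '\0';
   sb->length = 0;
   sb->capacity = initial_capacity;
   return 0;
}

void strbuf_free(struct strbuf *sb)
{
   free(sb->data);
   sb->data = NULL;
   sb->length = sb->capacity = 0;
}

/* Make room for `extra` more bytes plus the NUL by doubling the capacity.
 * (Overflow of the doubling is ignored in this sketch.) */
int strbuf_reserve(struct strbuf *sb, size_t extra)
{
   size_t needed = sb->length + extra + 1;
   size_t new_capacity = sb->capacity;
   char *p;

   if (needed <= sb->capacity)
      return 0;
   while (new_capacity < needed)
      new_capacity *= 2;
   p = realloc(sb->data, new_capacity);
   if (p == NULL)
      return -1;
   sb->data = p;
   sb->capacity = new_capacity;
   return 0;
}

/* Append n raw bytes. */
int strbuf_append(struct strbuf *sb, const char *s, size_t n)
{
   if (strbuf_reserve(sb, n) != 0)
      return -1;
   memcpy(sb->data + sb->length, s, n);
   sb->length += n;
   sb->data[sb->length] = '\0';
   return 0;
}

/* Formatted append: try the free space first, grow and retry only if the
 * output was truncated. */
int strbuf_printf(struct strbuf *sb, const char *fmt, ...)
{
   size_t room = sb->capacity - sb->length;
   va_list args;
   int n;

   va_start(args, fmt);
   n = vsnprintf(sb->data + sb->length, room, fmt, args);
   va_end(args);
   if (n < 0)
      return -1;

   if ((size_t)n >= room) {
      if (strbuf_reserve(sb, (size_t)n) != 0) {
         sb->data[sb->length] = '\0'; /* drop the truncated output */
         return -1;
      }
      va_start(args, fmt);
      n = vsnprintf(sb->data + sb->length, sb->capacity - sb->length,
                    fmt, args);
      va_end(args);
      if (n < 0)
         return -1;
   }
   sb->length += (size_t)n;
   return 0;
}
```

A caller would strbuf_init() once per shader, append tokens with strbuf_append() or strbuf_printf(), and hand sb.data to the next stage at the end. When the formatted text already fits in the free space, vsnprintf() runs only once.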
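Item 2 of the list above can be sketched in the same spirit. The idea is that most identifiers in a shader are not macros, so a cheap bit test can reject them before the hash table is consulted. The filter size, the FNV-1a hash, and all names below are assumptions made for illustration; the series may use a different hash and layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u /* assumed size; must be a power of two */

/* Hypothetical filter over the names of all currently defined macros. */
struct bloom {
   uint32_t bits[BLOOM_BITS / 32];
};

/* 32-bit FNV-1a hash of the first n bytes of s. */
uint32_t bloom_hash(const char *s, size_t n)
{
   uint32_t h = 2166136261u;
   for (size_t i = 0; i < n; i++) {
      h ^= (unsigned char)s[i];
      h *= 16777619u;
   }
   return h;
}

/* Derive two bit positions from one hash value. */
void bloom_positions(const char *s, size_t n, uint32_t *a, uint32_t *b)
{
   uint32_t h = bloom_hash(s, n);
   *a = h & (BLOOM_BITS - 1);
   *b = (h >> 16) & (BLOOM_BITS - 1);
}

/* Record a name, e.g. when a #define is processed. */
void bloom_add(struct bloom *f, const char *s, size_t n)
{
   uint32_t a, b;
   bloom_positions(s, n, &a, &b);
   f->bits[a / 32] |= 1u << (a % 32);
   f->bits[b / 32] |= 1u << (b % 32);
}

/* false: the name was definitely never added, so the hash-table lookup
 *        can be skipped.
 * true:  the name may have been added; the hash table must still be
 *        consulted, since false positives are possible. */
bool bloom_may_contain(const struct bloom *f, const char *s, size_t n)
{
   uint32_t a, b;
   bloom_positions(s, n, &a, &b);
   return (f->bits[a / 32] >> (a % 32) & 1u) &&
          (f->bits[b / 32] >> (b % 32) & 1u);
}
```

Note that a plain Bloom filter cannot forget a name, so after an #undef the filter simply keeps reporting "maybe" for it. That costs an extra hash-table lookup but never gives a wrong answer.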

Vladislav Egorov (12):
  glcpp: Print preprocessor output to string_buffer
  glcpp: Avoid unnecessary strcmp()
  glcpp: Use Bloom filter before identifier search
  glcpp: Use string_buffer for continuations removal
  ralloc: Avoid calling vsnprintf() twice
  ralloc: Use strnlen() inside of strncat()
  glcpp: Skip unnecessary line continuations removal
  glcpp: Use strpbrk in the line continuations pass
  glcpp: Avoid unnecessary linear_strdup
  glcpp/tests: Allow different trailing whitespace
  glcpp: Create fast path hand-written scanner
  glcpp: Substitute trivial macros in the fast path

 src/compiler/glsl/glcpp/glcpp-lex.l      | 428 ++++++++++++++++++++++++++++++-
 src/compiler/glsl/glcpp/glcpp-parse.y    | 149 ++++++-----
 src/compiler/glsl/glcpp/glcpp.h          |  78 +++++-
 src/compiler/glsl/glcpp/pp.c             | 242 +++++++++++++----
 src/compiler/glsl/glcpp/tests/glcpp-test |   4 +-
 src/util/ralloc.c                        |  64 +++--
 6 files changed, 820 insertions(+), 145 deletions(-)

-- 
2.7.4