[Issue 14641] Use SIMD to accelerate comment lexing
https://issues.dlang.org/show_bug.cgi?id=14641

Iain Buclaw changed:

What     | Removed | Added
Priority | P1      | P4
[Issue 14641] Use SIMD to accelerate comment lexing
https://issues.dlang.org/show_bug.cgi?id=14641

Walter Bright changed:

What     | Removed | Added
Keywords |         | SIMD
Re: Use SIMD to accelerate comment lexing
On Friday, 5 June 2015 at 00:30:44 UTC, Walter Bright wrote:
> It's an interesting approach. I generally shoot for making the debug builds the fastest, because that's when people are in the edit-compile-debug loop.
>
> And the debug output needs line numbers :-)

In C++, you would not need line numbers for headers, even in debug mode, unless they contain templates and/or implementation code, which I assume is not the case for many of them.
Re: Use SIMD to accelerate comment lexing
On 6/4/2015 2:44 PM, deadalnix wrote:
> On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
>> On 6/3/2015 7:05 PM, deadalnix wrote:
>>> On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
>>>> On 6/2/2015 5:45 PM, deadalnix wrote:
>>>>> You go through the characters and look for a '/'. When you hit one, you check whether the character before it is a '*', and if so, you have the end of the comment. There are obviously various edge cases to take into account, but that is the general idea.
>>>> Line numbers have to be kept track of as well.
>>> They retrieve line numbers lazily when needed, with various mechanisms to speed up the lookup.
>> Hmm. There's no way to get the line number without counting LFs, and that means searching for them.
> Yes, the first time you query a line number, clang builds metadata about newlines by going through the file's content and finding the positions of the newlines. The process uses vector operations as well. Apparently, they think it is better to do it that way for various reasons:
> - Position tracking is more compact (and a position is embedded in every expression, declaration, and more), which reduces the memory footprint by quite a lot.
> - It makes the lexer simpler and faster.
> - You don't need to track newlines if you don't use them. If you don't emit debug info in C++ and have no errors, most line numbers are not used (not sure about D, because various language facilities like bounds checking use line numbers, but that is a win in C++).
> - Debug emission has a predictable access pattern, and the algorithm to find the line number from an offset in the file is special-cased to handle it.
> - Finding newlines can be vectorized over the whole file. It cannot be vectorized when done in parallel with lexing.
>
> Once again, I'm not sure this is a win in D, because we need line numbers more than in C++, but it seems to be a win in C++.

It's an interesting approach. I generally shoot for making the debug builds the fastest, because that's when people are in the edit-compile-debug loop.

And the debug output needs line numbers :-)
Re: Use SIMD to accelerate comment lexing
On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
> Hmm. There's no way to get the line number without counting LFs, and that means searching for them.

It would be nice if it was that simple.

EndOfLine:
    \u000D
    \u000A
    \u000D \u000A
    \u2028
    \u2029
    EndOfFile
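To make the point about EndOfLine concrete, here is a minimal C++ sketch of counting line breaks in UTF-8 source text according to the grammar quoted above. The function name `countLineBreaks` is made up for illustration and is not taken from dmd or libdparse. It assumes the input is valid UTF-8, where U+2028 and U+2029 are encoded as the bytes E2 80 A8 and E2 80 A9, and it does not count EndOfFile as a break.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not dmd or libdparse code): counts line terminators
// in UTF-8 text. CR LF is treated as a single break.
std::size_t countLineBreaks(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '\n') {
            ++n;
        } else if (c == '\r') {
            ++n;
            if (i + 1 < s.size() && s[i + 1] == '\n') ++i;  // swallow the LF of CR LF
        } else if (c == 0xE2 && i + 2 < s.size() &&
                   static_cast<unsigned char>(s[i + 1]) == 0x80 &&
                   (static_cast<unsigned char>(s[i + 2]) == 0xA8 ||   // U+2028
                    static_cast<unsigned char>(s[i + 2]) == 0xA9)) {  // U+2029
            ++n;
            i += 2;  // skip the two continuation bytes
        }
    }
    return n;
}
```

The 0xE2 branch is why a scan for LF alone is not enough: other characters also start with 0xE2, so the two following bytes have to be checked before a break is counted.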
Re: Use SIMD to accelerate comment lexing
On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
> On 6/3/2015 7:05 PM, deadalnix wrote:
>> On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
>>> On 6/2/2015 5:45 PM, deadalnix wrote:
>>>> You go through the characters and look for a '/'. When you hit one, you check whether the character before it is a '*', and if so, you have the end of the comment. There are obviously various edge cases to take into account, but that is the general idea.
>>> Line numbers have to be kept track of as well.
>> They retrieve line numbers lazily when needed, with various mechanisms to speed up the lookup.
> Hmm. There's no way to get the line number without counting LFs, and that means searching for them.

Yes, the first time you query a line number, clang builds metadata about newlines by going through the file's content and finding the positions of the newlines. The process uses vector operations as well. Apparently, they think it is better to do it that way for various reasons:
- Position tracking is more compact (and a position is embedded in every expression, declaration, and more), which reduces the memory footprint by quite a lot.
- It makes the lexer simpler and faster.
- You don't need to track newlines if you don't use them. If you don't emit debug info in C++ and have no errors, most line numbers are not used (not sure about D, because various language facilities like bounds checking use line numbers, but that is a win in C++).
- Debug emission has a predictable access pattern, and the algorithm to find the line number from an offset in the file is special-cased to handle it.
- Finding newlines can be vectorized over the whole file. It cannot be vectorized when done in parallel with lexing.

Once again, I'm not sure this is a win in D, because we need line numbers more than in C++, but it seems to be a win in C++.
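The lazy scheme described in this post could look roughly like the following C++ sketch. It is not clang's code: the class name `LineTable` and its interface are invented for illustration, it recognises only '\n' as a line terminator for brevity, and it uses a plain scalar loop where the post says clang uses vector operations. Tokens would store only a byte offset; the table of line starts is built on the first query and searched with a binary search afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class, for illustration only.
class LineTable {
public:
    explicit LineTable(std::string text) : text_(std::move(text)) {}

    // Returns the 1-based line number of the byte at `offset`.
    std::size_t lineOf(std::size_t offset) {
        assert(offset <= text_.size());
        if (!built_) build();  // the whole-file scan is paid only on first use
        // Find the first line start strictly greater than `offset`;
        // its index in the table equals the 1-based line number.
        auto it = std::upper_bound(starts_.begin(), starts_.end(), offset);
        return static_cast<std::size_t>(it - starts_.begin());
    }

    bool built() const { return built_; }

private:
    void build() {
        starts_.push_back(0);  // line 1 starts at offset 0
        for (std::size_t i = 0; i < text_.size(); ++i)
            if (text_[i] == '\n') starts_.push_back(i + 1);
        built_ = true;
    }

    std::string text_;
    std::vector<std::size_t> starts_;  // byte offset at which each line begins
    bool built_ = false;
};
```

With this layout a compilation that never reports an error or emits debug info never calls `lineOf`, so the scan never runs, which is the saving the post attributes to C++. Whether the same holds for D depends on how often line numbers are actually needed.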
Re: Use SIMD to accelerate comment lexing
On 6/4/2015 1:44 PM, Brian Schott wrote:
> It would be nice if it was that simple.
>
> EndOfLine:
>     \u000D
>     \u000A
>     \u000D \u000A
>     \u2028
>     \u2029
>     EndOfFile

Yeah, you're right.
Re: Use SIMD to accelerate comment lexing
On 6/3/2015 7:05 PM, deadalnix wrote:
> On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
>> On 6/2/2015 5:45 PM, deadalnix wrote:
>>> You go through the characters and look for a '/'. When you hit one, you check whether the character before it is a '*', and if so, you have the end of the comment. There are obviously various edge cases to take into account, but that is the general idea.
>> Line numbers have to be kept track of as well.
> They retrieve line numbers lazily when needed, with various mechanisms to speed up the lookup.

Hmm. There's no way to get the line number without counting LFs, and that means searching for them.
Re: Use SIMD to accelerate comment lexing
On 6/2/2015 4:08 PM, Manu via Digitalmars-d wrote:
> I'll wear responsibility for this, but std.simd is proving really hard for me to finish. I think in order to get something in there to start with, I need to reduce the scope to the simplest bits, get them in, then build outwards.
> It's fairly large to cover everything I think is important, and there are a few tools missing still; I can't finish without some way to know the SIMD flags fed to the compiler from the command line (some standard versions?), and it's also difficult to resolve without forceinline of some sort. I've reached situations where the compiler(s) just don't do what I want them to. Also, I think the stack of SIMD functions influences the compiler's inline heuristics, and even though the compiler decides to inline SIMD functions, the presence of many of them in an outer function seems to reduce the probability that the outer function will be inlined as it should. forceinline needs to be a hard statement to the compiler, and ideally, a forceinline call tree shouldn't improperly influence the compiler's inline decisions for outer functions.

I suggest not worrying about forceinline for the moment, and just write the code as if it existed.

> As an aside, I need a test environment for each compiler, targeting x86, x64 and ARM at least, where I can submit some code and have it run the unittests on a matrix of appropriate targets (ideally PPC and MIPS would also be included, so they can influence design decisions). Does any such test system exist? A web service to provide this would be invaluable... I don't have all those systems available to me.

Just make it work on the machine you have, and prove it out on that machine. Then worry about porting it.
Re: Use SIMD to accelerate comment lexing
On 6/2/2015 5:45 PM, deadalnix wrote:
> Well, I discussed that with clang people a while ago, and here is how they do it and what they measured:
>
> You go through the characters and look for a '/'. When you hit one, you check whether the character before it is a '*', and if so, you have the end of the comment. There are obviously various edge cases to take into account, but that is the general idea.
>
> You can find the code in Lexer::SkipBlockComment in clang/lib/Lex/Lexer.cpp
>
> Various benchmarks on their side have shown that it is desirable to reach alignment before letting vector operations kick in. They used to have an AVX implementation, but it seems to be gone now; I'm not sure why at this stage.

Line numbers have to be kept track of as well.
Re: Use SIMD to accelerate comment lexing
On Wed, 3 Jun 2015 09:08:52 +1000, Manu via Digitalmars-d digitalmars-d@puremagic.com wrote:
> As an aside, I need a test environment for each compiler, targeting x86, x64 and ARM at least, where I can submit some code and have it run the unittests on a matrix of appropriate targets (ideally PPC and MIPS would also be included, so they can influence design decisions). Does any such test system exist? A web service to provide this would be invaluable... I don't have all those systems available to me.

There are qemu images for most architectures supported by Debian: https://people.debian.org/~aurel32/qemu/

You'll need a very recent qemu, but it works better than expected. (I hope to use these for cross-compiler testing in the future. But someone first needs to implement cross-compiler testing in the dmd test suite...)
Re: Use SIMD to accelerate comment lexing
On 2015-06-03 01:08, Manu via Digitalmars-d wrote:
> It's fairly large to cover everything I think is important, and there are a few tools missing still; I can't finish without some way to know the SIMD flags fed to the compiler from the command line (some standard versions?), and it's also difficult to resolve without forceinline of some sort.

Isn't it possible to proceed without forceinline, to be able to finish the functionality? I understand that you think it's useless for performance reasons, but is it enough to get the functionality correct?

> As an aside, I need a test environment for each compiler, targeting x86, x64 and ARM at least, where I can submit some code and have it run the unittests on a matrix of appropriate targets (ideally PPC and MIPS would also be included, so they can influence design decisions). Does any such test system exist? A web service to provide this would be invaluable... I don't have all those systems available to me.

Travis CI [1] can be used for x86-64, Linux and OS X. There's also a service that uses Windows for their hosts, but I can't remember the name right now.

[1] https://travis-ci.org

--
/Jacob Carlborg
Re: Use SIMD to accelerate comment lexing
On 3 June 2015 at 17:50, Jacob Carlborg via Digitalmars-d digitalmars-d@puremagic.com wrote:
> On 2015-06-03 01:08, Manu via Digitalmars-d wrote:
>> It's fairly large to cover everything I think is important, and there are a few tools missing still; I can't finish without some way to know the SIMD flags fed to the compiler from the command line (some standard versions?), and it's also difficult to resolve without forceinline of some sort.
> Isn't it possible to proceed without forceinline, to be able to finish the functionality? I understand that you think it's useless for performance reasons, but is it enough to get the functionality correct?

The codegen is everything. Functionality is the easy part here ;)
Most things are already correct, but I can't confidently prove out the codegen.
The main blocker, though, is that I don't know what SIMD level the user requested on the command line. The library has no idea what hardware features to target without an explicit statement by the user.

>> As an aside, I need a test environment for each compiler, targeting x86, x64 and ARM at least, where I can submit some code and have it run the unittests on a matrix of appropriate targets (ideally PPC and MIPS would also be included, so they can influence design decisions). Does any such test system exist? A web service to provide this would be invaluable... I don't have all those systems available to me.
> Travis CI [1] can be used for x86-64, Linux and OS X. There's also a service that uses Windows for their hosts, but I can't remember the name right now.

I use Travis. I was thinking of something more D-specific, along the lines of DPaste...
Re: Use SIMD to accelerate comment lexing
On 3 June 2015 at 11:28, Manu via Digitalmars-d digitalmars-d@puremagic.com wrote:
> On 3 June 2015 at 17:50, Jacob Carlborg via Digitalmars-d digitalmars-d@puremagic.com wrote:
>> On 2015-06-03 01:08, Manu via Digitalmars-d wrote:
>>> It's fairly large to cover everything I think is important, and there are a few tools missing still; I can't finish without some way to know the SIMD flags fed to the compiler from the command line (some standard versions?), and it's also difficult to resolve without forceinline of some sort.
>> Isn't it possible to proceed without forceinline, to be able to finish the functionality? I understand that you think it's useless for performance reasons, but is it enough to get the functionality correct?
> The codegen is everything. Functionality is the easy part here ;)
> Most things are already correct, but I can't confidently prove out the codegen.
> The main blocker, though, is that I don't know what SIMD level the user requested on the command line. The library has no idea what hardware features to target without an explicit statement by the user.

Well, the compiler knows whether the types are supported natively at least, and you can probe this information using CTFE.
Re: Use SIMD to accelerate comment lexing
On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
> On 6/2/2015 5:45 PM, deadalnix wrote:
>> Well, I discussed that with clang people a while ago, and here is how they do it and what they measured:
>>
>> You go through the characters and look for a '/'. When you hit one, you check whether the character before it is a '*', and if so, you have the end of the comment. There are obviously various edge cases to take into account, but that is the general idea.
>>
>> You can find the code in Lexer::SkipBlockComment in clang/lib/Lex/Lexer.cpp
>>
>> Various benchmarks on their side have shown that it is desirable to reach alignment before letting vector operations kick in. They used to have an AVX implementation, but it seems to be gone now; I'm not sure why at this stage.
> Line numbers have to be kept track of as well.

They retrieve line numbers lazily when needed, with various mechanisms to speed up the lookup.
Re: Use SIMD to accelerate comment lexing
On 2 June 2015 at 19:25, Jonathan M Davis via Digitalmars-d digitalmars-d@puremagic.com wrote:
> On Tuesday, 2 June 2015 at 17:24:09 UTC, Iain Buclaw wrote:
>> On 2 June 2015 at 19:11, Jonathan M Davis via Digitalmars-d digitalmars-d@puremagic.com wrote:
>>> On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:
>>>> Just make it work with one compiler on one platform, the most convenient one. We can extend it to others later.
>>> Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way.
>> And what would the D way be?
> We have simd in the language, so we can use that, whereas Manu was talking about how he would have to do it differently in C/C++.
>
> - Jonathan M Davis

I was being deliberately quizzical because there are different takes on what you would call simd in the language, what set of types are available to you, what intrinsics are exposed (and how they are exposed), etc.
Re: Use SIMD to accelerate comment lexing
On Tuesday, 2 June 2015 at 17:54:38 UTC, Iain Buclaw wrote:
> I was being deliberately quizzical because there are different takes on what you would call simd in the language, what set of types are available to you, what intrinsics are exposed (and how they are exposed), etc.

Well, Manu would know that a lot better than I would. I hadn't even heard of SIMD before he got Walter to put it into the language, and I have yet to use it, though I do think that I need to look into it at some point to see where I could take advantage of it.

- Jonathan M Davis
Re: Use SIMD to accelerate comment lexing
On 6/2/15 5:27 AM, Manu via Digitalmars-d wrote:
> On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d digitalmars-d@puremagic.com wrote:
>> https://issues.dlang.org/show_bug.cgi?id=14641
>>
>> Manu, our resident god of vector instructions, do you want to take this on?
> How do you measure this? Is there a convenient setup that will produce a realistic test environment?

Probably building Phobos is a good baseline. If simple experiments show measurable speedup with Phobos, it's likely to be worth the effort.

-- Andrei
Re: Use SIMD to accelerate comment lexing
On Tuesday, 2 June 2015 at 12:27:38 UTC, Manu wrote:
> On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d digitalmars-d@puremagic.com wrote:
>> https://issues.dlang.org/show_bug.cgi?id=14641
>>
>> Manu, our resident god of vector instructions, do you want to take this on?
> How do you measure this? Is there a convenient setup that will produce a realistic test environment?
>
> This is more awkward in C than in D; it needs a different implementation for each compiler... Will DMD's CI build with all common C compilers to prove that the implementations are correct?

Well, I discussed that with clang people a while ago, and here is how they do it and what they measured:

You go through the characters and look for a '/'. When you hit one, you check whether the character before it is a '*', and if so, you have the end of the comment. There are obviously various edge cases to take into account, but that is the general idea.

You can find the code in Lexer::SkipBlockComment in clang/lib/Lex/Lexer.cpp

Various benchmarks on their side have shown that it is desirable to reach alignment before letting vector operations kick in. They used to have an AVX implementation, but it seems to be gone now; I'm not sure why at this stage.
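The scan described in this post could be sketched in C++ roughly as follows. This is not the code from clang's Lexer::SkipBlockComment: the function name `skipBlockComment` and its signature are made up here, `std::memchr` stands in for the vectorised search for '/', and the alignment handling and line tracking mentioned in the thread are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, for illustration only.
// `p` points just past the opening "/*"; `end` is one past the last byte.
// Returns a pointer just past the closing "*/", or nullptr if unterminated.
const char* skipBlockComment(const char* p, const char* end) {
    assert(p <= end);
    const char* cur = p;
    while (cur < end) {
        // Jump straight to the next '/'; a SIMD version would do the same
        // search 16 or 32 bytes at a time.
        const void* hit = std::memchr(cur, '/', static_cast<std::size_t>(end - cur));
        if (hit == nullptr) return nullptr;
        const char* slash = static_cast<const char*>(hit);
        // Only now look at the preceding byte. It must lie inside the comment
        // body, so the '*' of the opening "/*" cannot close "/*/".
        if (slash > p && slash[-1] == '*') return slash + 1;
        cur = slash + 1;
    }
    return nullptr;
}
```

Searching for '/' rather than '*' is a judgment about which byte is rarer in the text being skipped; the `slash > p` check is one of the edge cases the post alludes to.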
Re: Use SIMD to accelerate comment lexing
On Tuesday, 2 June 2015 at 18:20:51 UTC, Jonathan M Davis wrote:
> On Tuesday, 2 June 2015 at 17:54:38 UTC, Iain Buclaw wrote:
>> I was being deliberately quizzical because there are different takes on what you would call simd in the language, what set of types are available to you, what intrinsics are exposed (and how they are exposed), etc.
> Well, Manu would know that a lot better than I would. I hadn't even heard of SIMD before he got Walter to put it into the language, and I have yet to use it, though I do think that I need to look into it at some point to see where I could take advantage of it.
>
> - Jonathan M Davis

D's simd library is difficult to use in comparison with gcc or clang's extensions to C/C++.

bye,
Re: Use SIMD to accelerate comment lexing
On 3 June 2015 at 07:18, weaselcat via Digitalmars-d digitalmars-d@puremagic.com wrote:
> On Tuesday, 2 June 2015 at 18:20:51 UTC, Jonathan M Davis wrote:
>> On Tuesday, 2 June 2015 at 17:54:38 UTC, Iain Buclaw wrote:
>>> I was being deliberately quizzical because there are different takes on what you would call simd in the language, what set of types are available to you, what intrinsics are exposed (and how they are exposed), etc.
>> Well, Manu would know that a lot better than I would. I hadn't even heard of SIMD before he got Walter to put it into the language, and I have yet to use it, though I do think that I need to look into it at some point to see where I could take advantage of it.
>>
>> - Jonathan M Davis
> D's simd library is difficult to use in comparison with gcc or clang's extensions to C/C++.
>
> bye,

I'll wear responsibility for this, but std.simd is proving really hard for me to finish. I think in order to get something in there to start with, I need to reduce the scope to the simplest bits, get them in, then build outwards.

It's fairly large to cover everything I think is important, and there are a few tools missing still; I can't finish without some way to know the SIMD flags fed to the compiler from the command line (some standard versions?), and it's also difficult to resolve without forceinline of some sort. I've reached situations where the compiler(s) just don't do what I want them to. Also, I think the stack of SIMD functions influences the compiler's inline heuristics, and even though the compiler decides to inline SIMD functions, the presence of many of them in an outer function seems to reduce the probability that the outer function will be inlined as it should. forceinline needs to be a hard statement to the compiler, and ideally, a forceinline call tree shouldn't improperly influence the compiler's inline decisions for outer functions.

As an aside, I need a test environment for each compiler, targeting x86, x64 and ARM at least, where I can submit some code and have it run the unittests on a matrix of appropriate targets (ideally PPC and MIPS would also be included, so they can influence design decisions). Does any such test system exist? A web service to provide this would be invaluable... I don't have all those systems available to me.
Re: Use SIMD to accelerate comment lexing
On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d digitalmars-d@puremagic.com wrote:
> https://issues.dlang.org/show_bug.cgi?id=14641
>
> Manu, our resident god of vector instructions, do you want to take this on?

How do you measure this? Is there a convenient setup that will produce a realistic test environment?

This is more awkward in C than in D; it needs a different implementation for each compiler... Will DMD's CI build with all common C compilers to prove that the implementations are correct?
Re: Use SIMD to accelerate comment lexing
On 6/2/2015 5:27 AM, Manu via Digitalmars-d wrote:
> On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d digitalmars-d@puremagic.com wrote:
>> https://issues.dlang.org/show_bug.cgi?id=14641
>>
>> Manu, our resident god of vector instructions, do you want to take this on?
> How do you measure this?

Time how long the compiler takes when running a non-optimized build.

> Is there a convenient setup that will produce a realistic test environment?

Compile Phobos.

> This is more awkward in C than in D; it needs a different implementation for each compiler... Will DMD's CI build with all common C compilers to prove that the implementations are correct?

Just make it work with one compiler on one platform, the most convenient one. We can extend it to others later.
Re: Use SIMD to accelerate comment lexing
On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:
> Just make it work with one compiler on one platform, the most convenient one. We can extend it to others later.

Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way.

- Jonathan M Davis
Re: Use SIMD to accelerate comment lexing
On 2 June 2015 at 19:11, Jonathan M Davis via Digitalmars-d digitalmars-d@puremagic.com wrote:
> On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:
>> Just make it work with one compiler on one platform, the most convenient one. We can extend it to others later.
> Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way.

And what would the D way be?
Re: Use SIMD to accelerate comment lexing
On Tuesday, 2 June 2015 at 17:24:09 UTC, Iain Buclaw wrote:
> On 2 June 2015 at 19:11, Jonathan M Davis via Digitalmars-d digitalmars-d@puremagic.com wrote:
>> On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:
>>> Just make it work with one compiler on one platform, the most convenient one. We can extend it to others later.
>> Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way.
> And what would the D way be?

We have simd in the language, so we can use that, whereas Manu was talking about how he would have to do it differently in C/C++.

- Jonathan M Davis
Use SIMD to accelerate comment lexing
https://issues.dlang.org/show_bug.cgi?id=14641 Manu, our resident god of vector instructions, do you want to take this on?
[Issue 14641] New: Use SIMD to accelerate comment lexing
https://issues.dlang.org/show_bug.cgi?id=14641

Issue ID: 14641
Summary: Use SIMD to accelerate comment lexing
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: DMD
Assignee: nob...@puremagic.com
Reporter: bugzi...@digitalmars.com

We encourage use of Ddoc to document functions. But this can result in voluminous comments, which slow down the lexer. Lexing comments can be accelerated by using SIMD vector instructions. A little inline assembler in the lexer.c dmd source code would implement this.
[Issue 14641] Use SIMD to accelerate comment lexing
https://issues.dlang.org/show_bug.cgi?id=14641

briancsch...@gmail.com changed:

What | Removed | Added
CC   |         | briancsch...@gmail.com

--- Comment #1 from briancsch...@gmail.com ---
The best way to do this that I've found is to skip everything other than a set of bytes that varies based on the comment being lexed:

For /* */ comments:
    0x0a (\n)
    0x0d (\r)
    0x2a (*)
    0x2f (/)
    0xe2 (beginning of a multi-byte UTF-8 newline)

For /+ +/ comments:
    0x0a (\n)
    0x0d (\r)
    0x2b (+)
    0x2f (/)
    0xe2 (beginning of a multi-byte UTF-8 newline)

For // comments:
    0x0a (\n)
    0x0d (\r)
    0xe2 (beginning of a multi-byte UTF-8 newline)

The instruction used in libdparse to do this is pcmpestri, which requires SSE4.2 (first released in 2008 according to Wikipedia). My advice is to leave most of the logic intact and implement the assembly code such that it may advance the lexer 0 or more bytes, so that the rest of the algorithm is not disrupted on machines that don't support SSE4.2.
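A minimal C++ sketch of the "skip everything except a small set of bytes" idea from this comment, using a lookup table instead of pcmpestri. This is a scalar stand-in, not libdparse's implementation; the names `StopSet` and `skippable` are invented for illustration. It keeps the shape the comment recommends: the routine only reports how many bytes (0 or more) can be skipped, and the existing lexer logic then handles the byte it stopped on.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical type: marks the bytes at which skipping must stop.
struct StopSet {
    std::array<bool, 256> stop{};  // value-initialised: every entry starts false
    StopSet(std::initializer_list<unsigned char> bytes) {
        for (unsigned char b : bytes) stop[b] = true;
    }
};

// Returns how many leading bytes of [p, p + len) are not in the stop set.
// A result of 0 is valid and leaves the lexer where it was.
std::size_t skippable(const unsigned char* p, std::size_t len, const StopSet& set) {
    assert(p != nullptr || len == 0);
    std::size_t i = 0;
    while (i < len && !set.stop[p[i]]) ++i;
    return i;
}
```

For example, a `/* */` comment would use `StopSet{'\n', '\r', '*', '/', 0xE2}` and a `//` comment `StopSet{'\n', '\r', 0xE2}`. A SIMD build could replace the body of `skippable` while a fallback build keeps the loop, so the surrounding algorithm is the same on machines without SSE4.2.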
Re: Use SIMD to accelerate comment lexing
On Monday, 1 June 2015 at 19:38:59 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=14641
>
> Manu, our resident god of vector instructions, do you want to take this on?

libdparse does this already. I added some information to that bug report that may be useful.
Re: Use SIMD to accelerate comment lexing
On 6/1/2015 1:18 PM, Brian Schott wrote: libdparse does this already. I added some information to that bug report that may be useful. Thank you!
Re: Use SIMD to accelerate comment lexing
On Monday, 1 June 2015 at 20:18:19 UTC, Brian Schott wrote:
> On Monday, 1 June 2015 at 19:38:59 UTC, Walter Bright wrote:
>> https://issues.dlang.org/show_bug.cgi?id=14641
>>
>> Manu, our resident god of vector instructions, do you want to take this on?
> libdparse does this already. I added some information to that bug report that may be useful.

Honestly, I'm quite impressed with what I've heard you say about the lengths that you've gone to in order to optimize your lexer and parser. _Very_ cool. And all the better if dmd gets some of the same optimizations.

- Jonathan M Davis
Re: Use SIMD to accelerate comment lexing
On Monday, 1 June 2015 at 19:38:59 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=14641
>
> Manu, our resident god of vector instructions, do you want to take this on?

Looking at that code, I would think that some well-placed prefetch and non-temporal move intrinsics would do more good than anything else. CPU speculative read-ahead is not as smart as one would think it ought to be.
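As a rough illustration of the prefetch half of this suggestion, here is a C++ sketch of a byte scan that issues a software prefetch a fixed distance ahead of the read position. The function name and the 256-byte distance are assumptions made for the example, `__builtin_prefetch` is a GCC/Clang builtin (the hint is simply compiled out elsewhere), and non-temporal moves are not shown. Whether this helps at all would have to be measured, for instance with the Phobos build suggested elsewhere in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts '\n' bytes while hinting the cache ahead of the scan.
std::size_t countNewlinesPrefetched(const char* p, std::size_t len) {
    assert(p != nullptr || len == 0);
    constexpr std::size_t kAhead = 256;  // assumed prefetch distance; needs tuning
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        // One hint per 64 bytes scanned (a typical cache-line size),
        // read-only, low temporal locality.
        if (i % 64 == 0 && i + kAhead < len)
            __builtin_prefetch(p + i + kAhead, 0, 0);
#endif
        if (p[i] == '\n') ++n;
    }
    return n;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing can differ.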