Re: Software Assurance Reference Dataset
21-Jul-2014 02:29, Walter Bright writes: On 7/20/2014 3:10 PM, Dmitry Olshansky wrote: The computed goto is faster for two reasons, according to the article: 1. The switch does a bit more per iteration because of bounds checking. Now let's consider a proper implementation of a threaded-code interpreter, where the code pointer points into an array of addresses. We've been through this before and it turns out switch is slower because of an extra load. a) Switch does 1 load for the opcode, 1 load for the jump table, 1 indirect jump to advance (not even counting the bounds checking of the switch). b) Threaded code via (say) computed goto does 1 load of the opcode and 1 indirect jump, because the opcode is an address already (so there is no separate jump table). True, but I'd like to find a way that this can be done as an optimization. I found a way, but it relies on tail-call optimization, otherwise it would overflow the stack. I would rather find some way that works without the -O flag. In fact this brings up another unrelated problem with Phobos: any template-heavy library has amazingly awful speed without inlining optimization enabled _by the client_. It should be the same with C++ though. I'm certain that a forced tail call would work just fine instead of computed goto for this scenario. In fact I've measured this with LDC and the results are promising, but they only work with -O2/-O3 (where the tail call is optimized). I'd gladly dig them up if you are interested. I'm pretty reluctant to add language features that can be done as optimizations. The point is - software that only works in release build is kind of hard to develop, even more so with libraries. Thus I'm in opposition to labeling such things as optimization when they, in fact, change semantics. -- Dmitry Olshansky
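Dmitry's forced-tail-call scheme can be sketched in plain C (hypothetical opcode names; the caveat is exactly the one he raises: each handler's final call must actually be turned into a jump by sibling-call optimization, typically at -O2, or a long program overflows the stack):

```c
/* Threaded-code dispatch via tail calls: each opcode is a function whose
   last action is to call the handler of the next instruction.  With
   tail-call optimization every dispatch becomes one indirect jump and
   uses no extra stack; without it, each instruction burns a stack frame. */
typedef struct VM VM;
typedef long (*Handler)(VM *vm);

struct VM {
    const Handler *ip;  /* instruction pointer into an array of handlers */
    long acc;           /* a single accumulator register */
};

static long dispatch(VM *vm) { return (*vm->ip++)(vm); }

long op_halt(VM *vm)   { return vm->acc; }
long op_inc(VM *vm)    { vm->acc += 1; return dispatch(vm); }  /* tail call */
long op_double(VM *vm) { vm->acc *= 2; return dispatch(vm); }  /* tail call */

long run(const Handler *program, long start) {
    VM vm = { program, start };
    return dispatch(&vm);
}
```

For example, running the program { op_inc, op_inc, op_double, op_halt } from 0 yields 4.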
Re: Software Assurance Reference Dataset
On 7/23/2014 1:06 PM, Dmitry Olshansky wrote: The point is - software that only works in release build is kind of hard to develop, even more so with libraries. Thus I'm in opposition to labeling such things as optimization when they, in fact, change semantics. -release, -inline, -O and -g are all separate flags for good reasons. You can mix and match as convenient for what you're trying to accomplish.
Re: Software Assurance Reference Dataset
24-Jul-2014 00:14, Walter Bright writes: On 7/23/2014 1:06 PM, Dmitry Olshansky wrote: The point is - software that only works in release build is kind of hard to develop, even more so with libraries. Thus I'm in opposition to labeling such things as optimization when they, in fact, change semantics. -release, -inline, -O and -g are all separate flags for good reasons. You can mix and match as convenient for what you're trying to accomplish. Unless you are a library writer, and then it's the user who mixes and matches flags, and you are foobared unless it works with all of them :) -- Dmitry Olshansky
Re: Software Assurance Reference Dataset
On 7/23/2014 1:20 PM, Dmitry Olshansky wrote: Unless you are library writer, and then it's user who mixes and matches flags and you are foobared unless it works with them all :) I think it's a generally unworkable rabbit hole if we add language features to turn on/off various optimizations.
Re: Software Assurance Reference Dataset
24-Jul-2014 00:43, Walter Bright writes: On 7/23/2014 1:20 PM, Dmitry Olshansky wrote: Unless you are a library writer, and then it's the user who mixes and matches flags, and you are foobared unless it works with all of them :) I think it's a generally unworkable rabbit hole if we add language features to turn on/off various optimizations. That's true. I'm thinking of how to solve things with a minimal extension to the language that is generally useful (unlike computed goto, which is very niche and dangerous). Obviously optimizations won't solve this problem; with that said, it would be awesome to have your 2 final switch enhancements. -- Dmitry Olshansky
Re: Software Assurance Reference Dataset
Dmitry Olshansky: (unlike computed goto, which is very niche and dangerous). It's a niche that only systems languages are fit for. Even CPython (and other interpreters) are using that feature. Bye, bearophile
Re: Software Assurance Reference Dataset
24-Jul-2014 01:05, bearophile writes: Dmitry Olshansky: (unlike computed goto, which is very niche and dangerous). It's a niche that only systems languages are fit for. Even CPython (and other interpreters) are using that feature. All of the fast portable interpreters, i.e. not JITs. It's still a niche, and a feature should have uses beyond that or be better tailored for this use case. Forced tail call was my best shot - it's useful for functional programming and it works for fast interpreters. -- Dmitry Olshansky
Re: Software Assurance Reference Dataset
On 7/23/14, 1:14 PM, Walter Bright wrote: On 7/23/2014 1:06 PM, Dmitry Olshansky wrote: The point is - software that only works in release build is kind of hard to develop, even more so with libraries. Thus I'm in opposition to labeling such things as optimization when they, in fact, change semantics. -release, -inline, -O and -g are all separate flags for good reasons. You can mix and match as convenient for what you're trying to accomplish. ... and -noboundscheck. -- Andrei
Re: Software Assurance Reference Dataset
On 7/23/2014 1:53 PM, Dmitry Olshansky wrote: Obviously optimizations won't solve this problem, with that said it would be awesome to have your 2 final switch enhancements. Yes. Feel free to ping me if I forget about them. Also, I particularly like the first of them because it's not available to C/C++.
Re: Software Assurance Reference Dataset
On 7/23/14, 2:41 PM, Walter Bright wrote: On 7/23/2014 1:53 PM, Dmitry Olshansky wrote: Obviously optimizations won't solve this problem, with that said it would be awesome to have your 2 final switch enhancements. Yes. Feel free to ping me if I forget about them. bugzilla -- Andrei
Re: Software Assurance Reference Dataset
On 7/23/2014 2:42 PM, Andrei Alexandrescu wrote: On 7/23/14, 2:41 PM, Walter Bright wrote: On 7/23/2014 1:53 PM, Dmitry Olshansky wrote: Obviously optimizations won't solve this problem, with that said it would be awesome to have your 2 final switch enhancements. Yes. Feel free to ping me if I forget about them. bugzilla -- Andrei They're already there: https://issues.dlang.org/show_bug.cgi?id=13169 https://issues.dlang.org/show_bug.cgi?id=13170 But there's so much in bugzilla I often overlook things.
Re: Software Assurance Reference Dataset
On 07/22/2014 05:02 AM, Walter Bright wrote: On 7/21/2014 3:25 PM, Timon Gehr wrote: The example just uses the ST Monad which is quite similar to weakly pure statements in D. D doesn't have weakly pure statements - it has weakly pure functions. The type checker distinguishes between statements that are allowed in 'pure'-qualified functions and statements that are not.
Re: Software Assurance Reference Dataset
On Monday, 21 July 2014 at 22:10:06 UTC, bearophile wrote: Tobias Müller: Wouldn't it be more useful to have a modified/annotated return statement for that? I don't know. I see what you're saying, and it works for the quick sort example. I'm not sure it covers all tailrec cases, but it seems useful.
Re: Software Assurance Reference Dataset
My understanding is that it can be done but only with annotations or whole-program analysis. I think that's true but you don't necessarily have to annotate every function. a) possibly there's something interesting to do at the module level. I think more than one thing. E.g. a module that doesn't have any callbacks in its interface is 'interesting'. E.g. 'layering' of modules. b) Some situations are particularly dangerous and so a function annotation could be encouraged for those. E.g. if you have a recursive function without tail recursion, and the possible recursion depth is substantial, then while it is deep in its recursion, it should limit what other functions it calls. Someone could come along later and add a logging statement to it, which usually isn't dangerous but here it could be. Quicksort is an instructive example because it has the security weakness that, although you expect the stack depth to typically be O(log n), an attacker in control of the input can force it to be O(n). Of course with tail recursion that doesn't threaten stack overflow, but it suggests that there are recursion cases we think are safe, and typically don't fall over, but are actually vulnerable. Which means if we don't feel like annotating them in defense, we're being irresponsible in a way.
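The mitigation hinted at here is standard for quicksort: recurse only into the smaller partition and loop on the larger one, which bounds the stack depth at O(log n) no matter how adversarial the input (a minimal C sketch of the idea, not any particular library's implementation):

```c
/* Quicksort that recurses only on the smaller partition and iterates on
   the larger one.  At most half the remaining range is ever recursed
   into, so the stack depth is bounded by O(log n) even when an attacker
   forces worst-case partitioning. */
static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

void quicksort(int *a, int lo, int hi) {
    while (lo < hi) {
        int pivot = a[(lo + hi) / 2];
        int i = lo, j = hi;
        while (i <= j) {                 /* Hoare-style partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) swap(&a[i++], &a[j--]);
        }
        if (j - lo < hi - i) {           /* recurse into the smaller half */
            quicksort(a, lo, j);
            lo = i;                      /* ...and loop on the larger half */
        } else {
            quicksort(a, i, hi);
            hi = j;
        }
    }
}
```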
Re: Software Assurance Reference Dataset
On 7/21/14, 3:22 PM, Timon Gehr wrote: On 07/21/2014 06:40 AM, Andrei Alexandrescu wrote: On 7/20/14, 8:15 PM, Timon Gehr wrote: On 07/20/2014 10:50 PM, Walter Bright wrote: On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. Uh... D has loops, even in pure functions, So does Haskell. import Control.Monad import Control.Monad.ST import Data.STRef factorial :: Integer -> Integer factorial n = runST $ do r <- newSTRef 1 forM_ [1..n] $ \i -> modifySTRef r (*i) readSTRef r main = print $ factorial 5 -- 120 And that doesn't look awkward at all :o). -- Andrei Indeed. Code sucks. It's its own destruction. -- Andrei
Re: Software Assurance Reference Dataset
On 07/22/2014 05:49 PM, Andrei Alexandrescu wrote: On 7/21/14, 3:22 PM, Timon Gehr wrote: On 07/21/2014 06:40 AM, Andrei Alexandrescu wrote: On 7/20/14, 8:15 PM, Timon Gehr wrote: On 07/20/2014 10:50 PM, Walter Bright wrote: ... D has loops, even in pure functions, So does Haskell. import Control.Monad import Control.Monad.ST import Data.STRef factorial :: Integer -> Integer factorial n = runST $ do r <- newSTRef 1 forM_ [1..n] $ \i -> modifySTRef r (*i) readSTRef r main = print $ factorial 5 -- 120 And that doesn't look awkward at all :o). -- Andrei Indeed. Code sucks. It's its own destruction. -- Andrei It is easy to understand and works correctly, but of course this is not a canonical implementation of factorial.
Re: Software Assurance Reference Dataset
Timon Gehr: It is easy to understand and It's easy to understand because you are very intelligent Timon :-) Bye, bearophile
Re: Software Assurance Reference Dataset
On 7/20/2014 8:15 PM, Timon Gehr wrote: So does Haskell. import Control.Monad import Control.Monad.ST import Data.STRef factorial :: Integer -> Integer factorial n = runST $ do r <- newSTRef 1 forM_ [1..n] $ \i -> modifySTRef r (*i) readSTRef r Would you rather write that or: int factorial(int i) pure { auto r = 1; foreach (i; 1..n + 1) r *= i; return r; } And I'd love to see what native code is generated for the Haskell example.
Re: Software Assurance Reference Dataset
Walter Bright: Would you rather write that or: How much experience do you have in writing Haskell programs longer than 50 lines of code? I have a large respect for your experience and your intelligence, but be careful with comments like that, to not show just your ignorance... If you are a Haskell programmer you usually don't write code like that, so in practice it's not a problem. And even if sometimes you have to write awkward code, the Haskell programmers can assure you that on average the advantages given by Haskell purity far outweighs the disadvantages. On overall I prefer D over Haskell, but if you want to criticize Haskell you need much much better arguments and way bigger cannons :-) Bye, bearophile
Re: Software Assurance Reference Dataset
On Tue, Jul 22, 2014 at 11:34:20AM -0700, Walter Bright via Digitalmars-d wrote: On 7/20/2014 8:15 PM, Timon Gehr wrote: So does Haskell. import Control.Monad import Control.Monad.ST import Data.STRef factorial :: Integer -> Integer factorial n = runST $ do r <- newSTRef 1 forM_ [1..n] $ \i -> modifySTRef r (*i) readSTRef r Would you rather write that or: int factorial(int i) pure { auto r = 1; foreach (i; 1..n + 1) r *= i; return r; } And I'd love to see what native code is generated for the Haskell example. I'd rather write: int factorial(int n) pure { return reduce!((a,b) => a*b)(1, iota(2, n + 1)); } ;-) With gdc -O3, this produces almost the same code as your looped version. But I can barely read the disassembly, because it appears that gdc -O3 triggers aggressive vectorization, so there are tons of instructions manipulating xmm registers and the loop is unrolled by 8 iterations. I have this lurking suspicion that this may actually perform more poorly than the naïve version for small values of n. :-) DMD, with -release -O -inline, was unable to inline the call to reduce, but was at least able to optimize reduce+iota into the equivalent of a simple foreach loop. So it looks quite promising. We should push for more aggressive optimizations of functional-style constructs in D; I think there's a lot of potential here. T -- My program has no bugs! Only unintentional features...
Re: Software Assurance Reference Dataset
On 7/22/2014 11:44 AM, bearophile wrote: Walter Bright: Would you rather write that or: How much experience do you have in writing Haskell programs longer than 50 lines of code? Zero. I have a large respect for your experience and your intelligence, but be careful with comments like that, to not show just your ignorance... If you are a Haskell programmer you usually don't write code like that, so in practice it's not a problem. Does that imply you agree with my earlier point, then, that D doesn't need tail recursion because it can write loops instead? And even if sometimes you have to write awkward code, the Haskell programmers can assure you that on average the advantages given by Haskell purity far outweighs the disadvantages. On overall I prefer D over Haskell, but if you want to criticize Haskell you need much much better arguments and way bigger cannons :-) The point was not at all to criticize Haskell. The point was that D does not need tail recursion because D supports writing loop constructs. Showing how loops could be written using awkward monads in Haskell that Haskell programmers wouldn't write does not change that point.
Re: Software Assurance Reference Dataset
On Tuesday, 22 July 2014 at 19:12:31 UTC, H. S. Teoh via Digitalmars-d wrote: I'd rather write: int factorial(int n) pure { return reduce!((a,b) => a*b)(1, iota(2, n + 1)); } ;-) This is how we fucked UFCS.
Re: Software Assurance Reference Dataset
On 07/22/2014 08:34 PM, Walter Bright wrote: On 7/20/2014 8:15 PM, Timon Gehr wrote: So does Haskell. import Control.Monad import Control.Monad.ST import Data.STRef factorial :: Integer -> Integer factorial n = runST $ do r <- newSTRef 1 forM_ [1..n] $ \i -> modifySTRef r (*i) readSTRef r Would you rather write that or: int factorial(int i) pure { auto r = 1; foreach (i; 1..n + 1) r *= i; return r; } ... If I fix the typo, this computes the same numbers for inputs int.min up to and including 12. And I'd love to see what native code is generated for the Haskell example. Well, the _actually_ corresponding D code using BigInt and DMD is significantly slower than what GHC generates here. In any case, I don't think that this kind of mutation-heavy code is a core focus of GHC, so the assembly will probably not be nearly as well optimized as it could be. But if you'd like to compare the offers of D and Haskell a little further, consider the following cute code, which I wrote in 5 minutes and which computes the 100th natural number that is a product of powers of 2, 3 and 5 in well below a second on my machine: merge :: Ord a => [a] -> [a] -> [a] merge xs [] = xs merge [] xs = xs merge xxs@(x:xs) yys@(y:ys) = case compare x y of LT -> x : merge xs yys EQ -> x : merge xs ys GT -> y : merge xxs ys hamming :: [Integer] hamming = 1 : merge (map (*2) hamming) (merge (map (*3) hamming) (map (*5) hamming)) main = print $ last $ take 100 hamming 51931278044838873608958984375000 What do you think would be the corresponding idiomatic/fast D code? Does it outperform the Haskell code?
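One imperative shape such code usually takes (sketched here in C rather than D, and not benchmarked against the Haskell) is Dijkstra's classic three-pointer algorithm, which produces Hamming numbers in order without lazy lists; note the values overflow 64-bit integers after roughly the 11,000th element, so matching the Haskell version's Integer range would need a bignum type:

```c
#include <stdlib.h>

/* Dijkstra's algorithm: the next Hamming number is the smallest of
   2*h[i2], 3*h[i3] and 5*h[i5]; every pointer whose candidate was chosen
   advances, which also removes duplicates such as 2*3 == 3*2. */
unsigned long long hamming(int n) {
    unsigned long long *h = malloc(n * sizeof *h);
    h[0] = 1;
    int i2 = 0, i3 = 0, i5 = 0;
    for (int k = 1; k < n; k++) {
        unsigned long long x2 = 2 * h[i2], x3 = 3 * h[i3], x5 = 5 * h[i5];
        unsigned long long m = x2 < x3 ? x2 : x3;
        if (x5 < m) m = x5;
        h[k] = m;
        if (m == x2) i2++;
        if (m == x3) i3++;
        if (m == x5) i5++;
    }
    unsigned long long result = h[n - 1];
    free(h);
    return result;
}
```

For instance, hamming(10) is 12 and hamming(20) is 36.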
Re: Software Assurance Reference Dataset
On 07/22/2014 09:42 PM, Walter Bright wrote: On overall I prefer D over Haskell, but if you want to criticize Haskell you need much much better arguments and way bigger cannons :-) The point was not at all to criticize Haskell. The point was that D does not need tail recursion because D supports writing loop constructs. Tail call support is still useful. Looping is not the main reason for supporting tail calls. Eg. a tail call might be indirect and only sometimes recurse on the same function and sometimes call another function. Showing how loops could be written using awkward The point is not to criticize? :-) monads in Haskell that Haskell programmers wouldn't write does not change that point. Sure, that part of the discussion grew out of the claim that there are no loops there.
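A tiny illustration of Timon's point (made-up example): in mutual recursion neither function can be rewritten as a self-contained loop, so loop support alone doesn't remove the need for tail-call elimination; here both calls are in tail position and an optimizing compiler turns them into jumps:

```c
#include <stdbool.h>

/* Mutually recursive parity check: is_even tail-calls is_odd and vice
   versa.  There is no single function to convert into a loop; without
   tail-call elimination the stack depth grows linearly with n, and with
   it both calls compile to jumps, giving O(1) stack. */
bool is_odd(unsigned n);

bool is_even(unsigned n) { return n == 0 ? true  : is_odd(n - 1); }
bool is_odd(unsigned n)  { return n == 0 ? false : is_even(n - 1); }
```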
Re: Software Assurance Reference Dataset
On 07/22/2014 11:23 PM, Timon Gehr wrote: Looping is not the main reason for supporting tail calls. (In D!)
Re: Software Assurance Reference Dataset
Timon Gehr: What do you think would be the corresponding idiomatic/fast D code? Does it outperform the Haskell code? Look in RosettaCode for various implementations in D: http://rosettacode.org/wiki/Hamming_numbers#D A good part of the performance difference is caused by the D GC and the fact that Haskell GHC uses the GNU multiprecision library for its bigints. In the meantime we have to maintain, optimize and debug our own bigint implementation, wasting time better spent elsewhere. Bye, bearophile
Re: Software Assurance Reference Dataset
On 7/22/2014 2:23 PM, Timon Gehr wrote: On 07/22/2014 09:42 PM, Walter Bright wrote: Showing how loops could be written using awkward The point is not to criticize? :-) Yes. If you want to infer criticism of Haskell from that, go ahead, but none was intended. You can write awkward code in any language. monads in Haskell that Haskell programmers wouldn't write does not change that point. Sure, that part of the discussion grew out of the claim that there are no loops there. Bending a language over backwards to be pedantic is pointless for this discussion. Nobody claims Obfuscated C Contest entries are best (or even suggested) practices.
Re: Software Assurance Reference Dataset
On 7/22/2014 2:29 PM, bearophile wrote: A good part of the performance difference is caused by the D GC and the fact that Haskell GHC uses a the GNU multiprecision libs for its bigsint. In the meantime we have to maintain, optimize and debug our bigint implementation, wasting time better spent elsewhere. I.e. the perf difference has nothing to do with Haskell or tail recursion.
Re: Software Assurance Reference Dataset
On 7/22/2014 2:08 PM, Timon Gehr wrote: And I'd love to see what native code is generated for the Haskell example. Well, the _actually_ corresponding D code using BigInt and DMD is significantly slower than what GHC generates here. Irrelevant to the looping issue - comparing the code gen of D's BigInt to Haskell's BigInt is not the issue here.
Re: Software Assurance Reference Dataset
On 07/23/2014 12:14 AM, Walter Bright wrote: Sure, that part of the discussion grew out of the claim that there are no loops there. Bending a language over backwards Not what happened. to be pedantic Being pedantic is important enough. is pointless for this discussion. If one doesn't want pointless pedantry my suggestion would be to not make wrong claims about apparently irrelevant things, or to ignore the rebuttal to communicate silent agreement. Nobody claims Obfuscated C Contest entries are best (or even suggested) practices. Irrelevant. I'll leave here.
Re: Software Assurance Reference Dataset
On Tuesday, 22 July 2014 at 21:23:33 UTC, Timon Gehr wrote: On 07/22/2014 09:42 PM, Walter Bright wrote: The point was not at all to criticize Haskell. The point was that D does not need tail recursion because D supports writing loop constructs. Tail call support is still useful. Looping is not the main reason for supporting tail calls. Eg. a tail call might be indirect and only sometimes recurse on the same function and sometimes call another function. Indeed, I recall Matthias Felleisen wrote that OOP makes no sense without TCO because methods will indirectly call each other. See https://blogs.oracle.com/jrose/entry/tail_calls_in_the_vm#comment-1259314984000 http://www.eighty-twenty.org/index.cgi/tech/oo-tail-calls-20111001.html While I don't care much for OOP, D is supposed to support that style.
Re: Software Assurance Reference Dataset
On Sunday, 20 July 2014 at 06:06:16 UTC, bearophile wrote: Andrew Godfrey: 2) Annotations about when a function does not expect re-entrancy to be possible based on call-graph analysis. I don't understand. Assuming I know this (http://en.wikipedia.org/wiki/Reentrancy_%28computing%29) can you show an example? Sorry, by re-entrancy I just meant a statement of whether it is possible for the function to be re-entered (not whether it would be correct to do so). Thanks for the reminder, so I can exclude the complication of ISRs from my point. In the codebase I work on - application code - there are no ISRs, but a common cause of unexpected re-entrancy we have is: functions that call a Windows API which pumps messages. This produces unexpectedly deep call stacks and threatens stack overflow. But that's a complicated case that may not be feasible for the compiler to help with. So, a simpler example... The general idea here is that you've convinced yourself you need to do something which stresses usage of stack space, and you believe it is safe to do because of *something*. Ideally you can express that something so that if anyone tries to change it later, the compiler catches that. So, for example you've allocated a buffer on the stack, and your function 'A' builds up data into it, then calls another function 'X' to 'send' that data in some way. Function 'A' definitely doesn't expect to be re-entered. (It may or may not be re-entrant, in the definition of the wiki page. Doesn't matter for this.) Now the first step probably doesn't need language help: call it assertIAmNotReentered(). I think any solution probably still needs this as a backup. I say assert (i.e. a runtime check that you'd remove in a release build) because I'm assuming that if you use it a lot you wouldn't want to pay for all those checks in a release build. Now: Suppose 'X' is a Phobos function, and 'A' doesn't call anything else.
Then it seems feasible (but a lot of work) to provide an annotation (or combination of annotations) that mean: Compiler, I am not called by an ISR. You don't have to verify that. But given that, I claim that I am not re-entered. Please verify. This is a lot of work - and potentially error-prone - because when Phobos calls the C runtime or OS API's, humans would need to make decisions about whether those functions 'may callback'. For this reason I don't think we'd want to eliminate assertIAmNotReentered(). By this process, Phobos functions which may call a 'callback' back into user code (whether directly or indirectly) would be distinguished from ones which cannot (let's call it NoCallback, since I can't think of a non-terrible name right now). And then the annotation above can be enforced at least in those terms (i.e. the compiler ensures that 'X' only calls NoCallback-annotated functions). I think this is valuable, and not just to avoid stack overflow. The message-pumping case often violates higher-level correctness.
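Andrew's assertIAmNotReentered() backstop might be sketched like this in C (all names invented for illustration; in multithreaded code the flag would be _Thread_local):

```c
#include <assert.h>

/* A debug-only re-entrancy guard: a static flag set on entry and cleared
   on exit.  If anything called in between (a callback, a message pump)
   re-enters the function, the assert fires.  With NDEBUG defined the
   check disappears, matching the "assert, not release check" intent. */
static int in_send_buffer = 0;

int send_buffer(const char *data) {
    assert(!in_send_buffer && "send_buffer was unexpectedly re-entered");
    in_send_buffer = 1;
    /* ... build up the stack-allocated buffer and send `data` here; any
       re-entrant call back into send_buffer trips the assert above ... */
    int ok = (data != 0);
    in_send_buffer = 0;
    return ok;
}
```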
Re: Software Assurance Reference Dataset
On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. D has loops, even in pure functions, there's no reason not to use them. Another use case is the so-called threaded-code interpreter, which can be done with either computed gotos (and a bunch of labels) or forced tail calls (and a bunch of functions). In both cases the computed jump or tail call is indirect. http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/ The computed goto is faster for two reasons, according to the article: 1. The switch does a bit more per iteration because of bounds checking. 2. The effects of hardware branch prediction. (1) is eliminated with final switch. I know this optimization is not done with dmd, but D doesn't need a new language feature because of that. (2) with a bit more work, this could also be implemented as a compiler optimization. Both of these are worthy of bugzilla enhancement requests. I'd much rather implement improvements as better code generation rather than more language features.
Re: Software Assurance Reference Dataset
On 7/20/2014 1:50 PM, Walter Bright wrote: Both of these are worthy of bugzilla enhancement requests. Amazingly, these suddenly appeared: https://issues.dlang.org/show_bug.cgi?id=13169 https://issues.dlang.org/show_bug.cgi?id=13170
Re: Software Assurance Reference Dataset
Walter Bright: https://issues.dlang.org/show_bug.cgi?id=13169 https://issues.dlang.org/show_bug.cgi?id=13170 Are such optimizations portable and guaranteed on all D compilers? If the answer is negative, then they can't replace a _standard_ D syntax for computed gotos. Bye, bearophile
Re: Software Assurance Reference Dataset
On 7/20/2014 3:10 PM, Dmitry Olshansky wrote: The computed goto is faster for two reasons, according to the article: 1. The switch does a bit more per iteration because of bounds checking. Now let's consider a proper implementation of a threaded-code interpreter, where the code pointer points into an array of addresses. We've been through this before and it turns out switch is slower because of an extra load. a) Switch does 1 load for the opcode, 1 load for the jump table, 1 indirect jump to advance (not even counting the bounds checking of the switch). b) Threaded code via (say) computed goto does 1 load of the opcode and 1 indirect jump, because the opcode is an address already (so there is no separate jump table). True, but I'd like to find a way that this can be done as an optimization. I'm certain that a forced tail call would work just fine instead of computed goto for this scenario. In fact I've measured this with LDC and the results are promising, but they only work with -O2/-O3 (where the tail call is optimized). I'd gladly dig them up if you are interested. I'm pretty reluctant to add language features that can be done as optimizations.
Re: Software Assurance Reference Dataset
21-Jul-2014 00:50, Walter Bright writes: On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. D has loops, even in pure functions, there's no reason not to use them. Another use case is the so-called threaded-code interpreter, which can be done with either computed gotos (and a bunch of labels) or forced tail calls (and a bunch of functions). In both cases the computed jump or tail call is indirect. http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/ Actually that blog entry is wrong about it. Computed goto won't help when used as a direct replacement for switch, and you are correct in that the said code could be easily optimized with final switch. What would help a lot is a threaded-code interpreter, that is, bytecode that contains the direct addresses used in the computed goto. The computed goto is faster for two reasons, according to the article: 1. The switch does a bit more per iteration because of bounds checking. Now let's consider a proper implementation of a threaded-code interpreter, where the code pointer points into an array of addresses. We've been through this before and it turns out switch is slower because of an extra load. a) Switch does 1 load for the opcode, 1 load for the jump table, 1 indirect jump to advance (not even counting the bounds checking of the switch). b) Threaded code via (say) computed goto does 1 load of the opcode and 1 indirect jump, because the opcode is an address already (so there is no separate jump table). I'm certain that a forced tail call would work just fine instead of computed goto for this scenario. In fact I've measured this with LDC and the results are promising, but they only work with -O2/-O3 (where the tail call is optimized). I'd gladly dig them up if you are interested. -- Dmitry Olshansky
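In GNU C (GCC and Clang) the distinction Dmitry draws can be made concrete with the &&label extension: the 'bytecode' below is an array of label addresses, so dispatch is one load plus one indirect jump, with no jump-table lookup and no bounds check (a toy sketch with made-up opcodes):

```c
/* Direct-threaded interpreter: the program is an array of label
   addresses (GNU C computed goto), so advancing to the next instruction
   is `goto **ip++` -- one load and one indirect jump. */
long run_threaded(long start) {
    void *program[] = { &&op_inc, &&op_inc, &&op_double, &&op_halt };
    void **ip = program;
    long acc = start;

#define NEXT goto **ip++
    NEXT;                      /* dispatch the first instruction */
op_inc:    acc += 1; NEXT;
op_double: acc *= 2; NEXT;
op_halt:
#undef NEXT
    return acc;
}
```

run_threaded(0) executes inc, inc, double and returns 4.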
Re: Software Assurance Reference Dataset
On 7/20/2014 2:38 PM, bearophile wrote: Are such optimizations portable and guaranteed on all D compilers? If the answer is negative, then they can't replace a _standard_ D syntax for computed gotos. C'mon, bearophile. Optimizations are always implementation dependent. People rely on them for fast code. This isn't any different.
Re: Software Assurance Reference Dataset
On 07/20/2014 10:50 PM, Walter Bright wrote: On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. Uh... D has loops, even in pure functions, So does Haskell. import Control.Monad import Control.Monad.ST import Data.STRef factorial :: Integer -> Integer factorial n = runST $ do r <- newSTRef 1 forM_ [1..n] $ \i -> modifySTRef r (*i) readSTRef r main = print $ factorial 5 -- 120 there's no reason not to use them. ... But of course there are reasons to use tail calls.
Re: Software Assurance Reference Dataset
On 7/20/2014 8:15 PM, Timon Gehr wrote: On 07/20/2014 10:50 PM, Walter Bright wrote: On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. Uh... D has loops, even in pure functions, So does Haskell. import Control.Monad Uh-oh! Monad! :-)
Re: Software Assurance Reference Dataset
On 7/20/14, 8:15 PM, Timon Gehr wrote: On 07/20/2014 10:50 PM, Walter Bright wrote: On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. Uh... D has loops, even in pure functions, So does Haskell. import Control.Monad import Control.Monad.ST import Data.STRef factorial :: Integer -> Integer factorial n = runST $ do r <- newSTRef 1 forM_ [1..n] $ \i -> modifySTRef r (*i) readSTRef r main = print $ factorial 5 -- 120 And that doesn't look awkward at all :o). -- Andrei
Re: Software Assurance Reference Dataset
On Sunday, 20 July 2014 at 07:27:36 UTC, Walter Bright wrote: What about the @continuation (http://en.wikipedia.org/wiki/Continuation-passing_style )? I doubt they'll want to use that attribute, either. Especially if that can be done with AST macros :D
Re: Software Assurance Reference Dataset
Andrew Godfrey x...@y.com wrote: 1) A function annotation that means I will call myself recursively, and when I do, I expect the tail recursion optimization. I have seen code which allocates something big on the stack and depends on the optimization. So this intent should be expressible. Wouldn't it be more useful to have a modified/annotated return statement for that? Tail-recursiveness is an implementation detail; for the user of the function it's not really interesting. Except for the fact that it has bounded stack size, which is a useful property by itself and not only for tail-recursive functions. Tobi
Re: Software Assurance Reference Dataset
Tobias Müller: Wouldn't it be more useful to have a modified/annotated return statement for that? I don't know. Tail-recursiveness is an implementation detail; for the user of the function it's not really interesting. Yes, in theory a @tailrec annotation doesn't change the mangling of the function. But D has exceptions and stack traces. A stack trace for an annotated function is different because there are fewer stack frames... Except for the fact that it has bounded stack size, which is a useful property by itself and not only for tail-recursive functions. Yes, that's why I have said that a @continuation is a more general solution than @tailrec. Bye, bearophile
Re: Software Assurance Reference Dataset
On 07/21/2014 06:40 AM, Andrei Alexandrescu wrote: On 7/20/14, 8:15 PM, Timon Gehr wrote: On 07/20/2014 10:50 PM, Walter Bright wrote: On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. Uh... D has loops, even in pure functions, So does Haskell.

import Control.Monad
import Control.Monad.ST
import Data.STRef

factorial :: Integer -> Integer
factorial n = runST $ do
    r <- newSTRef 1
    forM_ [1..n] $ \i -> modifySTRef r (*i)
    readSTRef r

main = print $ factorial 5 -- 120

And that doesn't look awkward at all :o). -- Andrei Indeed. This just creates a variable, modifies it 'n' times in a loop and then reads the result.
Re: Software Assurance Reference Dataset
On 07/21/2014 06:56 AM, Walter Bright wrote: On 7/20/2014 8:15 PM, Timon Gehr wrote: On 07/20/2014 10:50 PM, Walter Bright wrote: On 7/20/2014 3:27 AM, Dmitry Olshansky wrote: Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Traditional FP languages don't have loops, and so must do recursion. Uh... D has loops, even in pure functions, So does Haskell. import Control.Monad Uh-oh! Monad! :-) The example just uses the ST Monad which is quite similar to weakly pure statements in D.
Re: Software Assurance Reference Dataset
Andrei Alexandrescu: And that doesn't look awkward at all :o). -- Andrei A related thread: http://stackoverflow.com/questions/6622524/why-is-haskell-sometimes-referred-to-as-best-imperative-language Bye, bearophile
Re: Software Assurance Reference Dataset
On 07/22/2014 12:10 AM, bearophile wrote: Except for the fact that it has bounded stack size which is a useful property by itself and not only for tailrecursive functions. Yes, that's why I have said that a @continuation is a more general solution than @tailrec. Bye, bearophile (Continuations and a tail calls are not the same kind of thing.) http://en.wikipedia.org/wiki/Continuation#First-class_continuations http://en.wikipedia.org/wiki/Delimited_continuation http://en.wikipedia.org/wiki/Tail_call
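[The distinction drawn above between continuations and tail calls can be made concrete with a small sketch. This is written in Python rather than D purely for brevity; the names `fact_cps` and `trampoline` are illustrative, not anything from the thread. In continuation-passing style every step hands its result to an explicit continuation, and if each step also returns a deferred thunk, a trampoline loop can drive the whole computation in constant stack space, with no reliance on a compiler's tail-call optimization.]

```python
def fact_cps(n, k):
    # Continuation-passing style: instead of returning, hand the result
    # to the continuation k. Each step returns a zero-argument thunk, so
    # the trampoline below drives the computation in a loop and the call
    # stack never grows with n.
    if n <= 1:
        return k(1)
    return lambda: fact_cps(n - 1, lambda r: lambda: k(n * r))

def trampoline(step):
    # Force thunks until a non-callable (final) value appears.
    while callable(step):
        step = step()
    return step

print(trampoline(fact_cps(5, lambda x: x)))  # 120
```

A plain tail call would simply jump to the next function; first-class continuations, as the linked articles note, are strictly more general, since `k` can be stored, invoked later, or invoked more than once.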
Re: Software Assurance Reference Dataset
On 7/21/2014 3:25 PM, Timon Gehr wrote: The example just uses the ST Monad which is quite similar to weakly pure statements in D. D doesn't have weakly pure statements - it has weakly pure functions.
Re: Software Assurance Reference Dataset
Andrew Godfrey: 1) A function annotation that means I will call myself recursively, and when I do, I expect the tail recursion optimization. I have seen code which allocates something big on the stack and depends on the optimization. So this intent should be expressible. A @tailrec annotation seems good, and will improve the functional usages of D. It should stop compilation with an error if the function doesn't call itself or if the compiler is not able to remove the recursive call. This means it has to be fully enforced. 2) Annotations about when a function does not expect re-entrancy to be possible based on call-graph analysis. I don't understand. Assuming I know this (http://en.wikipedia.org/wiki/Reentrancy_%28computing%29 ) can you show an example? Bye, bearophile
Re: Software Assurance Reference Dataset
Andrew Godfrey: 1) A function annotation that means I will call myself recursively, and when I do, I expect the tail recursion optimization. I have seen code which allocates something big on the stack and depends on the optimization. So this intent should be expressible. A @tailrec annotation seems good, and will improve the functional usages of D. It should stop compilation with an error if the function doesn't call itself or if the compiler is not able to remove the recursive call. This means it has to be fully enforced. Perhaps a @cps (or @continuation) annotation is better and more general. Bye, bearophile
Re: Software Assurance Reference Dataset
On 7/19/2014 11:06 PM, bearophile wrote: A @tailrec annotation seems good, and will improve the functional usages of D. It should stop compilation with an error if the function doesn't call itself or if the compiler is not able to remove the recursive call. This means it has to be fully enforced. If you want to guarantee replacement of a recursive call with a loop, just write a loop.
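["Just write a loop" amounts to performing by hand the mechanical rewrite a tail-call optimization does automatically. A minimal sketch, in Python only because CPython deliberately never eliminates tail calls, which makes the difference observable; the names `sum_to_rec`/`sum_to_loop` are illustrative:]

```python
def sum_to_rec(n, acc=0):
    # Tail-recursive form: the recursive call is the very last action,
    # so no state from this frame is needed after it returns.
    if n == 0:
        return acc
    return sum_to_rec(n - 1, acc + n)

def sum_to_loop(n):
    # The loop a tail-call optimization would produce: the parameters
    # become mutated locals and the call becomes a jump back to the test.
    acc = 0
    while n != 0:
        acc, n = acc + n, n - 1
    return acc

print(sum_to_loop(100_000))  # 5000050000; sum_to_rec(100_000) would
# raise RecursionError here, since CPython has no tail-call elimination.
```

This is exactly the guarantee a @tailrec-style annotation would pin down: that the left form is compiled as the right form, with constant stack.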
Re: Software Assurance Reference Dataset
Walter Bright: If you want to guarantee replacement of a recursive call with a loop, just write a loop. There are cases where a recursive algorithm is nicer. And people that want to use D functionally, may also be used to writing code recursively. What about the @continuation (http://en.wikipedia.org/wiki/Continuation-passing_style )? Bye, bearophile
Re: Software Assurance Reference Dataset
On 7/19/2014 11:55 PM, bearophile wrote: Walter Bright: If you want to guarantee replacement of a recursive call with a loop, just write a loop. There are cases where a recursive algorithm is nicer. And people that want to use D functionally, may also be used to writing code recursively. I doubt they'll want to use an @tailrec attribute. What about the @continuation (http://en.wikipedia.org/wiki/Continuation-passing_style )? I doubt they'll want to use that attribute, either. In any case, D supports more styles of programming than any other language I can think of. I doubt adding even more will be that helpful.
Re: Software Assurance Reference Dataset
20-Jul-2014 10:45, Walter Bright wrote: On 7/19/2014 11:06 PM, bearophile wrote: A @tailrec annotation seems good, and will improve the functional usages of D. It should stop compilation with an error if the function doesn't call itself or if the compiler is not able to remove the recursive call. This means it has to be fully enforced. If you want to guarantee replacement of a recursive call with a loop, just write a loop. Functional programming is full of simple recursion and it would be nice not to stack overflow in debug builds. Another use case is the so-called threaded-code interpreter, which can be done with either computed gotos (and a bunch of labels) or forced tail calls (and a bunch of functions). In both cases the computed jump or tail call is indirect. -- Dmitry Olshansky
Re: Software Assurance Reference Dataset
Walter Bright: I doubt they'll want to use an @tailrec attribute. In Scala there is @tailrec: http://www.scala-lang.org/api/current/scala/annotation/tailrec.html In both F# and OCaml there is the rec keyword: http://msdn.microsoft.com/en-us/library/dd233232.aspx http://caml.inria.fr/pub/docs/manual-ocaml-400/manual003.html#toc4 In Clojure there is recur (that is not an annotation): http://clojure.org/special_forms?responseToken=08ea4841337f67bb8f07663aa70b03aca#recur I think functional programmers are willing to use a @tailrec attribute if it's well designed and does what it says on the tin. What about the @continuation (http://en.wikipedia.org/wiki/Continuation-passing_style )? I doubt they'll want to use that attribute, either. I don't know. It's more general than @tailrec, but probably many C and C++ and Java programmers don't even know what it is. But it allows a programming style that in some cases is interesting (far cleaner than computed gotos). In any case, D supports more styles of programming than any other language I can think of. I doubt adding even more will be that helpful. I think a basic form of pattern matching implemented with the switch construct is a good idea for D. Bye, bearophile
Re: Software Assurance Reference Dataset
Returning to the earlier question, of helping to avoid stack overflows: I can easily think of two things the language could do. (Forgive me if D already has them - I have read a fair amount but not all the minutiae.) 1) A function annotation that means I will call myself recursively, and when I do, I expect the tail recursion optimization. I have seen code which allocates something big on the stack and depends on the optimization. So this intent should be expressible. 2) Annotations about when a function does not expect re-entrancy to be possible based on call-graph analysis. This would obviously have to restrict what it can do in order to be feasible, but wouldn't it still be useful? Besides - the cases where it is hard for the compiler to detect the possibility of re-entrancy are often cases where humans have trouble noticing it too.
Re: Software Assurance Reference Dataset
On Sunday, 13 July 2014 at 23:35:46 UTC, Walter Bright wrote: On 7/13/2014 4:04 AM, Marc Schütz schue...@gmx.net wrote: On Sunday, 13 July 2014 at 03:25:08 UTC, Walter Bright wrote: On 7/11/2014 10:28 AM, deadalnix wrote: The compiler can ensure that you hit at least every 4k or so. And it already does. Doesn't look so: int bar(int a) { int[8000] b = void; b[$-1] = a; return b[$-1]; } On Win32:

_D4foo53barFiZi comdat
        assume  CS:_D4foo53barFiZi
        push    EBP
        mov     EBP,ESP
        mov     EDX,7
L8:     sub     ESP,01000h
        test    [ESP],ESP
        dec     EDX
        jne     L8
        sub     ESP,0D04h
        lea     ECX,-8[EBP]
        mov     [ECX],EAX
        mov     EAX,-8[EBP]
        leave
        ret

It doesn't do it on Linux because gcc doesn't do it. But the capability is in the back end, and it does it for alloca(), too. Hmm... but this is using DMD, not GDC. Or do you mean that DMD doesn't do it, because GCC doesn't do it either? If so, what is the reason for this? Why shouldn't this feature be enabled on every platform?
Re: Software Assurance Reference Dataset
On 7/14/2014 3:07 AM, Marc Schütz schue...@gmx.net wrote: Hmm... but this using DMD, not GDC. Or do you mean that DMD doesn't do it, because GCC doesn't do it either? Right. If so, what is the reason for this? Why shouldn't this feature be enabled on every platform? There are often undocumented reasons why gcc generates code in certain ways, and I figured that being compatible was the best strategy. Every time I deviate, it comes back eventually as a bug report.
Re: Software Assurance Reference Dataset
On 14 July 2014 11:07, via Digitalmars-d digitalmars-d@puremagic.com wrote: On Sunday, 13 July 2014 at 23:35:46 UTC, Walter Bright wrote: On 7/13/2014 4:04 AM, Marc Schütz schue...@gmx.net wrote: On Sunday, 13 July 2014 at 03:25:08 UTC, Walter Bright wrote: On 7/11/2014 10:28 AM, deadalnix wrote: The compiler can ensure that you hit at least every 4k or so. And it already does. Doesn't look so: int bar(int a) { int[8000] b = void; b[$-1] = a; return b[$-1]; } On Win32:

_D4foo53barFiZi comdat
        assume  CS:_D4foo53barFiZi
        push    EBP
        mov     EBP,ESP
        mov     EDX,7
L8:     sub     ESP,01000h
        test    [ESP],ESP
        dec     EDX
        jne     L8
        sub     ESP,0D04h
        lea     ECX,-8[EBP]
        mov     [ECX],EAX
        mov     EAX,-8[EBP]
        leave
        ret

It doesn't do it on Linux because gcc doesn't do it. But the capability is in the back end, and it does it for alloca(), too. Hmm... but this is using DMD, not GDC. Or do you mean that DMD doesn't do it, because GCC doesn't do it either? If so, what is the reason for this? Why shouldn't this feature be enabled on every platform? For GDC, there is -fstack-protector (which is turned on by default in distributed binaries from Ubuntu). However IIRC only functions that use alloca or have static arrays of type char are actually checked. Declaring an int[100] doesn't invoke alloca, so you won't see it. The bounds checking in D alone is enough to catch most common overflowing stack bugs. Regards Iain
Re: Software Assurance Reference Dataset
On Sunday, 13 July 2014 at 03:25:08 UTC, Walter Bright wrote: On 7/11/2014 10:28 AM, deadalnix wrote: The compiler can ensure that you hit at least every 4k or so. And it already does. Doesn't look so: int foo(int a) { int[100] b = void; b[$-1] = a; return b[$-1]; } int bar(int a) { int[8000] b = void; b[$-1] = a; return b[$-1]; }

# objdump -d bigstack.o | ddemangle | less

Disassembly of section .text.int bigstack.foo(int):

int bigstack.foo(int):
   0:   55                      push   %rbp
   1:   48 8b ec                mov    %rsp,%rbp
   4:   48 81 ec a0 01 00 00    sub    $0x1a0,%rsp
   b:   48 8d 45 f4             lea    -0xc(%rbp),%rax
   f:   89 38                   mov    %edi,(%rax)
  11:   8b 45 f4                mov    -0xc(%rbp),%eax
  14:   c9                      leaveq
  15:   c3                      retq
  16:   66 90                   xchg   %ax,%ax

Disassembly of section .text.int bigstack.bar(int):

int bigstack.bar(int):
   0:   55                      push   %rbp
   1:   48 8b ec                mov    %rsp,%rbp
   4:   48 81 ec 10 7d 00 00    sub    $0x7d10,%rsp
   b:   48 8d 45 f4             lea    -0xc(%rbp),%rax
   f:   89 38                   mov    %edi,(%rax)
  11:   8b 45 f4                mov    -0xc(%rbp),%eax
  14:   c9                      leaveq
  15:   c3                      retq
  16:   66 90                   xchg   %ax,%ax

(This is with DMD master on Linux x86_64. Same with -m32.)
Re: Software Assurance Reference Dataset
On 7/13/2014 4:04 AM, Marc Schütz schue...@gmx.net wrote: On Sunday, 13 July 2014 at 03:25:08 UTC, Walter Bright wrote: On 7/11/2014 10:28 AM, deadalnix wrote: The compiler can ensure that you hit at least every 4k or so. And it already does. Doesn't look so: int bar(int a) { int[8000] b = void; b[$-1] = a; return b[$-1]; } On Win32:

_D4foo53barFiZi comdat
        assume  CS:_D4foo53barFiZi
        push    EBP
        mov     EBP,ESP
        mov     EDX,7
L8:     sub     ESP,01000h
        test    [ESP],ESP
        dec     EDX
        jne     L8
        sub     ESP,0D04h
        lea     ECX,-8[EBP]
        mov     [ECX],EAX
        mov     EAX,-8[EBP]
        leave
        ret

It doesn't do it on Linux because gcc doesn't do it. But the capability is in the back end, and it does it for alloca(), too.
Re: Software Assurance Reference Dataset
On Friday, 11 July 2014 at 17:28:39 UTC, deadalnix wrote: The compiler can ensure that you hit at least every 4k or so. It doesn't look like a very hard constraint to have a volatile load per untouched 4k of stack (which should be very rare). Sure, it should be possible to work it into the backend at the code-gen level. For fibers without recursion I think the most powerful approach would be to do whole program analysis to establish a minimum of N for the typical case (without unbounded recursion). Of course, then alloca also has to extend the stack and do bookkeeping of the additional storage taken by alloca(), or simply have a separate alloc-stack.
Re: Software Assurance Reference Dataset
On 7/11/2014 10:28 AM, deadalnix wrote: The compiler can ensure that you hit at least every 4k or so. And it already does.
Re: Software Assurance Reference Dataset
On Monday, 30 June 2014 at 08:00:37 UTC, Ola Fosheim Grøstad wrote: On Thursday, 26 June 2014 at 09:35:20 UTC, Walter Bright wrote: Stack overflows are not safety problems when a guard page is used past the end of the stack. Then, overflow checking is done in hardware. Guard pages aren't currently used for fibers, so overflows are a real danger there. But a page is only 2K? So what happens if you skip more than 2K and never touch the guard page? Does D prove that the stack pointer is never moved more than 2K-1 without a read or write in that range? The compiler can ensure that you hit at least every 4k or so. It doesn't look like a very hard constraint to have a volatile load per untouched 4k of stack (which should be very rare).
Re: Software Assurance Reference Dataset
(Sorry for the very late answer.) Walter Bright: Stack overflows are not safety problems when a guard page is used past the end of the stack. It's not a safety problem in Erlang/Rust, because those languages are designed to manage such failures in a good way. Please explain. The idea comes from the Erlang language (and perhaps Erlang has copied it from something else), and then Rust copied it (and indeed, if you look at the Influenced by list here, you see Erlang, and Rust has copied only the Erlang feature I am discussing here: http://en.wikipedia.org/wiki/Rust_language ). Erlang systems must be extremely reliable, they run telecommunication systems that must just always work, with only seconds or minutes of downtime every year. But programs contain errors and bugs, so to face this problem Erlang (and now Rust) has chosen a curious strategy. The description, see 4.3 Error handling philosophy at page 104-109, a PDF file: http://www.erlang.org/download/armstrong_thesis_2003.pdf It's also a bit explained here, at the 3. What is fault-tolerance section: http://stackoverflow.com/questions/3172542/are-erlang-otp-messages-reliable-can-messages-be-duplicated/3176864#3176864 Some more technical info: http://www.erlang.org/doc/design_principles/sup_princ.html Bye, bearophile
Re: Software Assurance Reference Dataset
(Sorry for the very delayed answers.) Walter Bright: Oh well, there goes about 90% of D programs and pretty much all use of the D runtime library! Right. On the other hand the need for such strict safety requirements that forbid stack overflows is not present in desktop software; it's for uncommon systems, with limited resources, that need to work reliably, like some car/plane electronics. So when you have such needs you are willing to throw away 90% of D libs/Phobos, and probably to not use d-runtime too :-) And in such cases it's good to have as much compile-time assurance as possible. And I suspect the subset of D programs that don't have indirection or recursion is so small as to not be worth the bother. I do know there is a class of applications, for example in critical embedded systems, where recursion and indirection, and even allocation, are not allowed. Using D in such cases would require eschewing Phobos, and some other care taken, Right. And people in such fields are looking for something safer and nicer than C + a ton of static tests (and many of them don't seem to love Ada). Andrei Alexandrescu: My understanding is that it can be done but only with annotations or whole program analysis. I agree. But people in (example) the automotive industry could be (and probably are) willing to do both. This is a specialized niche, so adding such annotations to the D language seems too much. So I've suggested to add enough static introspection (and some way to attach compile-time semantics to UDAs) to allow specialized programmers to add to D those annotations (and leave other more complex tests to external tools). H. S. Teoh: I think the compiler should be able to tell, at least for the simplest cases, whether something will definitely stop recursing. The Whiley language (http://whiley.org/ ) is able to do this, with the loop variants you write. 
deadalnix: 2) By alloca(); it is @system I'd like to use something like alloca (or much better something similar to the variable-length stack-allocated partially-library-defined arrays of C++14/C++17) even in @safe code (but not in stack-constrained code we were discussing here). We should have a page reserved at the end of the stack so we can throw when reaching it. The compiler can ensure we don't bypass it in case 1. OK. Ola Fosheim Grøstad: A compromise would be to inject runtime checks to see if there is sufficient stack space whenever you move the stack pointer Run-time tests in debug mode seem useful. Bye, bearophile
Re: Software Assurance Reference Dataset
On 7/10/2014 5:08 AM, bearophile wrote: (Sorry for the very late answer.) Walter Bright: Stack overflows are not safety problems when a guard page is used past the end of the stack. It's not a safety problem in Erlang/Rust, because those languages are designed to manage such failures in a good way. Please explain. The idea comes from the Erlang language (and perhaps Erlang has copied it from something else), and then Rust copied it (and indeed, if you look at the Influenced by list here, you see Erlang, and Rust has copied only the Erlang feature I am discussing here: http://en.wikipedia.org/wiki/Rust_language ). Erlang systems must be extremely reliable, they run telecommunication systems that must just always work, with only seconds or minutes of downtime every year. But programs contain errors and bugs, so to face this problem Erlang (and now Rust) has chosen a curious strategy. The description, see 4.3 Error handling philosophy at page 104-109, a PDF file: http://www.erlang.org/download/armstrong_thesis_2003.pdf It's also a bit explained here, at the 3. What is fault-tolerance section: http://stackoverflow.com/questions/3172542/are-erlang-otp-messages-reliable-can-messages-be-duplicated/3176864#3176864 Some more technical info: http://www.erlang.org/doc/design_principles/sup_princ.html Bye, bearophile Thanks for the links!
Re: Software Assurance Reference Dataset
On Thursday, 26 June 2014 at 09:35:20 UTC, Walter Bright wrote: Stack overflows are not safety problems when a guard page is used past the end of the stack. Then, overflow checking is done in hardware. Guard pages aren't currently used for fibers, so overflows are a real danger there. But a page is only 2K? So what happens if you skip more than 2K and never touch the guard page? Does D prove that the stack pointer is never moved more than 2K-1 without a read or write in that range? Guard pages on a flat memory model are not as safe as a segmented memory model. A compromise would be to inject runtime checks to see if there is sufficient stack space whenever you move the stack pointer and remove them when you can prove that there is enough room. E.g. collapse the checks into larger spans of stack space by propagating them upwards in the call chain. Anyway, minimizing stack space is very useful for fibers in scientific simulations or real time systems since you want to be able to run as many as you can fit into memory. Each actor/agent could be very simple so I would not rule out the ability to prove it in most cases for some domains.
Re: Software Assurance Reference Dataset
On Thursday, 26 June 2014 at 21:01:40 UTC, Araq wrote: Spark is a research language that does not work, as I've discovered and discussed with you before. It cannot be determined the max stack usage at compile time, again, this is the halting problem. What?! It's easily solvable: Forbid recursion and indirect function calls and it's guaranteed that the program only requires a fixed size stack and you can compute an upper bound of the required stack size at compile-time. Which is BTW exactly what OpenCL does as GPUs tend to have no stacks. Actually CUDA does have recursion and indirect function calls.
Re: Software Assurance Reference Dataset
Am 29.06.2014 11:07, schrieb ponce: On Thursday, 26 June 2014 at 21:01:40 UTC, Araq wrote: Spark is a research language that does not work, as I've discovered and discussed with you before. It cannot be determined the max stack usage at compile time, again, this is the halting problem. What?! It's easily solvable: Forbid recursion and indirect function calls and it's guaranteed that the program only requires a fixed size stack and you can compute an upper bound of the required stack size at compile-time. Which is BTW exactly what OpenCL does as GPUs tend to have no stacks. Actually CUDA do have recursion, and indirect function calls. Not only that, CUDA offers a quite usable C++ subset whereas OpenCL keeps us in the primitive C land. -- Paulo
Re: Software Assurance Reference Dataset
On Thursday, 26 June 2014 at 09:35:20 UTC, Walter Bright wrote: On 6/26/2014 2:19 AM, bearophile wrote: One kind of problem left is to avoid stack overflows. In general, stack overflow checking at compile time is the halting problem. It needs a runtime check. No, "is N bytes of stack sufficient" is decidable if you don't count the heap as an unlimited resource that preserves state. So it is not the halting problem. The halting problem would be "is K bytes of stack sufficient, for some K in [0, infinity)". You can also do a worst case analysis that requires you to specify too much stack, but proves that it is sufficient. E.g. set the recursive depth to the max number of nodes in a binary tree, even if you know that it will never get that deep. But since you allow calls into C code you need extra annotations and probably have more pressing issues to deal with… (like GC).
Re: Software Assurance Reference Dataset
Walter Bright: It's an interesting list, and an opportunity for D. I once said that my job was to put Coverity out of business. Even if D has wide success, I don't think D will delete all the C and C++ code out of existence, so I don't think D will ever put Coverity out of business :-) The more of these issues D can automatically prevent with @safe, the better. I think D will need better guns for that, like a more principled (formalized, written fully down in specs, eventually even proved) management of uniqueness, etc. Bye, bearophile
Re: Software Assurance Reference Dataset
Walter Bright: It's an interesting list, and an opportunity for D. I once said that my job was to put Coverity out of business. The more of these issues D can automatically prevent with @safe, the better. One kind of problem left is to avoid stack overflows. I have had several such cases in my D code (perhaps because I use fixed-sized arrays a lot). I think they can be caused by: 1) Too many large allocations in stack frames; 2) By alloca(); 3) By recursion and co-recursion chains; The first cause is probably sufficiently easy to solve mechanically, with a conservative call tree analysis of the code. This is a job for external tools. The second cause can be solved with a formal proof of the upper bounds of the arguments of alloca, or even more conservatively by disallowing alloca. The third cause, a cycle in the call graph, can be found with an external tool, but in theory it's also easy to avoid with a @no_call_cycles annotation :-) Such an annotation looks useful only if you want to avoid stack overflows, so it's better to use a more general annotation that also forbids alloca(). Something like @fixed_stack or @constant_memory. I think adding this annotation to D is a little overkill (despite SPARK and Ada analysers having done stack size analysis for many years). So I think a better solution (besides creating an external tool) is to add to D a more general feature to attach some compile-time semantics to user-defined annotations. To do this you need a trait that returns a sequence of all the functions called by a function. Bye, bearophile
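[The conservative call-tree analysis sketched in the post above fits in a few lines. This is a toy Python version (the `max_stack` helper and the example frame sizes are hypothetical; real tools such as SPARK's analyzer work on object code and must also handle indirect calls): given per-function frame sizes and a static call graph, it rejects cycles outright and otherwise returns a worst-case stack bound.]

```python
def max_stack(frames, calls, entry):
    # frames: function name -> frame size in bytes
    # calls:  function name -> list of possible callees (static call graph)
    def walk(fn, on_path):
        if fn in on_path:
            # A cycle in the call graph means recursion: no static bound.
            raise ValueError(f"recursion via {fn}: no static bound")
        deepest = 0
        for callee in calls.get(fn, []):
            deepest = max(deepest, walk(callee, on_path | {fn}))
        return frames[fn] + deepest
    return walk(entry, frozenset())

# Hypothetical example program: main calls parse and emit, both call log.
frames = {"main": 64, "parse": 4096, "emit": 256, "log": 32}
calls = {"main": ["parse", "emit"], "parse": ["log"], "emit": ["log"]}
print(max_stack(frames, calls, "main"))  # 64 + 4096 + 32 = 4192
```

This is exactly the shape of guarantee a @fixed_stack-style annotation would demand: acyclic call graph, no alloca, and a computable frame-size sum along the deepest path.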
Re: Software Assurance Reference Dataset
On Thursday, 26 June 2014 at 00:36:58 UTC, Walter Bright wrote: http://samate.nist.gov/SRD/view.php?count=20&first=0&sort=asc This is a list of security vulnerabilities in languages including C/C++. 88,737 of them (!). It's an interesting list, and an opportunity for D. I once said that my job was to put Coverity out of business. The more of these issues D can automatically prevent with @safe, the better. As expected, most of them would be automatically fixed by using a descendant of the Algol/Mesa/Cedar branch of languages. I look forward to @safe providing a similar set of guarantees. -- Paulo
Re: Software Assurance Reference Dataset
On 6/26/2014 2:19 AM, bearophile wrote: One kind of problem left is to avoid stack overflows. In general, stack overflow checking at compile time is the halting problem. It needs a runtime check. Stack overflows are not safety problems when a guard page is used past the end of the stack. Then, overflow checking is done in hardware. Guard pages aren't currently used for fibers, so overflows are a real danger there. In 64 bit code, however, one should be able to use guard pages for fibers.
Re: Software Assurance Reference Dataset
Walter Bright: In general, stack overflow checking at compile time is the halting problem. It needs a runtime check. There are several systems, including SPARK, that perform a conservative and apparently acceptable stack overflow check at compile time. If you don't agree with what I've written in my post, then please give a more detailed answer to the points I've written above. Stack overflows are not safety problems when a guard page is used past the end of the stack. It's not a safety problem in Erlang/Rust, because those languages are designed to manage such failures in a good way. In most other languages it's a safety problem, if your program execution has some importance. Bye, bearophile
Re: Software Assurance Reference Dataset
On 06/26/2014 11:35 AM, Walter Bright wrote: On 6/26/2014 2:19 AM, bearophile wrote: One kind of problem left is to avoid stack overflows. In general, stack overflow checking at compile time is the halting problem. That is irrelevant to his point because he is not suggesting to solve the general problem precisely. Analogously: In general, checking whether some variable in e.g. Python is ever assigned a string value is undecidable as well, but this does not imply we cannot have 'int' variables in D.
Re: Software Assurance Reference Dataset
On 6/26/2014 2:50 AM, bearophile wrote: Walter Bright: In general, stack overflow checking at compile time is the halting problem. It needs a runtime check. There are several systems, including SPARK, that perform a conservative and apparently acceptable stack overflow check at compile time. If you don't agree with what I've written in my post, then please give a more detailed answer to the points I've written above. Spark is a research language that does not work, as I've discovered and discussed with you before. It cannot be determined the max stack usage at compile time, again, this is the halting problem. Stack overflows are not safety problems when a guard page is used past the end of the stack. It's not a safety problem in Erlang/Rust, because those languages are designed to manage such failures in a good way. Please explain. In most other languages it's a safety problem, if your program execution has some importance. I mean safety in the sense of being a security problem, which is the context of this thread.
Re: Software Assurance Reference Dataset
On 6/26/2014 2:52 AM, Timon Gehr wrote: On 06/26/2014 11:35 AM, Walter Bright wrote: On 6/26/2014 2:19 AM, bearophile wrote: One kind of problem left is to avoid stack overflows. In general, stack overflow checking at compile time is the halting problem. That is irrelevant to his point because he is not suggesting to solve the general problem precisely. Analogously: In general, checking whether some variable in e.g. Python is ever assigned a string value is undecidable as well, but this does not imply we cannot have 'int' variables in D. When you're dealing with security issues, which is what this is about, you'll need a guarantee about stack overflow. Adding annotations is not helpful with this because they are not checkable. Again, what WORKS is a runtime check.
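[The runtime-check approach argued for above can be shown in miniature. A Python sketch (the `checked_call` helper and the frame-counted budget are illustrative inventions; a real implementation probes the stack pointer against a guard region in bytes, as the guard-page discussion elsewhere in this thread describes): probe for headroom before each frame is pushed, and fail cleanly instead of corrupting memory.]

```python
STACK_BUDGET = 400  # assumed budget, counted in frames rather than bytes

def checked_call(fn, *args, _depth=[0]):
    # Probe before pushing the frame: the software analogue of touching
    # the guard page. The mutable default [_depth] is shared across calls
    # on purpose, acting as a module-level depth counter.
    if _depth[0] >= STACK_BUDGET:
        raise RuntimeError("stack budget exhausted")
    _depth[0] += 1
    try:
        return fn(*args)
    finally:
        _depth[0] -= 1

def count(n):
    # An arbitrary recursive function routed through the check.
    return 0 if n == 0 else 1 + checked_call(count, n - 1)

print(count(300))  # 300; count(5000) raises RuntimeError cleanly instead.
```

The check costs a comparison per call but works for any input, which is the trade-off against a static bound that holds only for restricted programs.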
Re: Software Assurance Reference Dataset
Spark is a research language that does not work, as I've discovered and discussed with you before. It cannot be determined the max stack usage at compile time, again, this is the halting problem. What?! It's easily solvable: Forbid recursion and indirect function calls and it's guaranteed that the program only requires a fixed size stack and you can compute an upper bound of the required stack size at compile-time. Which is BTW exactly what OpenCL does as GPUs tend to have no stacks. In what way is Spark a research language that does not work? And how many language design issues need to be discovered until you admit that Safe-D is a research language that does not work?
Re: Software Assurance Reference Dataset
On 6/26/2014 2:01 PM, Araq wrote: Spark is a research language that does not work, as I've discovered and discussed with you before. It cannot be determined the max stack usage at compile time, again, this is the halting problem. What?! It's easily solvable: Forbid recursion and indirect function calls Oh well, there goes about 90% of D programs and pretty much all use of the D runtime library! and it's guaranteed that the program only requires a fixed size stack and you can compute an upper bound of the required stack size at compile-time. Which is BTW exactly what OpenCL does as GPUs tend to have no stacks. In what way is Spark a research language that does not work? A while back, bearophile posted here some advocacy that Spark was using its contracts to prove things about the code. I experimented with it a bit and discovered that such proofs did not go beyond the trivial. As I recall, bearophile then agreed that it was a great idea that the implementation fell far short of. And how many language design issues need to be discovered until you admit that Safe-D is a research language that does not work? I recommend that all such issues you discover be put on bugzilla, and marked with the 'safe' keyword, so they can be addressed. I think there's a big difference between "only works for trivial cases with no idea how to handle the rest" and "does not work for all cases, but there are reasonable paths to handle them". I admit that Safe D does not yet handle all the cases.
Re: Software Assurance Reference Dataset
On 06/26/2014 10:29 PM, Walter Bright wrote: On 6/26/2014 2:52 AM, Timon Gehr wrote: On 06/26/2014 11:35 AM, Walter Bright wrote: On 6/26/2014 2:19 AM, bearophile wrote: One kind of problem left is to avoid stack overflows. In general, stack overflow checking at compile time is the halting problem. That is irrelevant to his point, because he is not suggesting to solve the general problem precisely. Analogously: in general, checking whether some variable in e.g. Python is ever assigned a string value is undecidable as well, but this does not imply we cannot have 'int' variables in D. When you're dealing with security issues, which is what this is about, This is about _avoiding stack overflows_. It's written down literally in the quoted passage. you'll need a guarantee about stack overflow. Adding annotations is not helpful with this because they are not checkable. ... Not true. Basic facts: - In practice, programs are constructed explicitly to fulfill a particular purpose and, in particular, if they never overflow the stack, they usually do so for a reason. - Reasoning can be formalized, and formal proofs can be written down in such a way that a machine can check whether the proof is valid. - Annotations can include a formal proof. We've had similar discussions before. Why do we _still_ have to argue about the _possibility_ of having a system that is helpful for proving stack overflows (or other bad behaviours) absent? You can say "out of scope" or "not a priority" or "this should be realized in third-party tool support", but not "it is impossible". Again, what WORKS is a runtime check. A runtime check will not avoid the problem, which is exhaustion of stack space.
Re: Software Assurance Reference Dataset
On 6/26/14, 2:01 PM, Araq wrote: Spark is a research language that does not work, as I've discovered and discussed with you before. It cannot be determined the max stack usage at compile time, again, this is the halting problem. What?! It's easily solvable: Forbid recursion and indirect function calls I do agree that a useful subset of a language can be conservatively defined that doesn't require solving the halting problem. But that's not easy at all - it requires interprocedural analysis. Andrei
Re: Software Assurance Reference Dataset
On 6/26/14, 4:16 PM, Timon Gehr wrote: - Annotations can include a formal proof. If a function can be annotated with what other functions it calls (non-transitively), I agree that it can be guaranteed with local semantic checking that a program won't overflow. However, requiring such annotations would be onerous, and deducing them would require whole-program analysis. We've had similar discussions before. Why do we _still_ have to argue about the _possibility_ of having a system that is helpful for proving stack overflows (or other bad behaviours) absent? You can say "out of scope" or "not a priority" or "this should be realized in third-party tool support", but not "it is impossible". I also seem to reckon Walter is at the other extreme (he asserts it can't be done at all, period). My understanding is that it can be done, but only with annotations or whole-program analysis. Andrei
Re: Software Assurance Reference Dataset
On Thu, Jun 26, 2014 at 04:43:33PM -0700, Andrei Alexandrescu via Digitalmars-d wrote: On 6/26/14, 2:01 PM, Araq wrote: Spark is a research language that does not work, as I've discovered and discussed with you before. It cannot be determined the max stack usage at compile time, again, this is the halting problem. What?! It's easily solvable: Forbid recursion and indirect function calls I do agree that a useful subset of a language can be conservatively defined that doesn't require solving the halting problem. But that's not easy at all - it requires interprocedural analysis. [...] And it may not be feasible for a compiler to automatically prove. One possible approach is to have the user supply the proof of eventual termination, which can be mechanically verified by the compiler. (Checking a supplied proof for correctness is more tractable than coming up with the proof in the first place.) But to handle proofs of non-trivial code beyond just toy examples, you might end up needing a full-scale deductive proof subsystem in the compiler, which may or may not be practical for D's needs! T -- That's not a bug; that's a feature!
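The "user supplies the proof, the compiler merely checks it" idea can be illustrated with the classic termination argument: a strictly decreasing measure. Below is a small Python sketch; the `decreases` decorator and its behavior are hypothetical (not any real tool's API), and it verifies the annotation dynamically, whereas a compiler would discharge the same obligation symbolically at compile time:

```python
# Sketch: the user annotates a recursive function with a "measure" -- a
# non-negative integer over its arguments that must strictly decrease on
# every nested recursive call. Checking that the supplied measure works
# is far easier than discovering a termination argument automatically.
import functools

def decreases(measure):
    def wrap(f):
        stack = []  # measures of the calls currently active
        @functools.wraps(f)
        def g(*args):
            m = measure(*args)
            if m < 0 or (stack and m >= stack[-1]):
                raise AssertionError("measure did not strictly decrease")
            stack.append(m)
            try:
                return f(*args)
            finally:
                stack.pop()
        return g
    return wrap

@decreases(lambda n: n)
def fact(n):
    return 1 if n <= 0 else n * fact(n - 1)

print(fact(5))  # 120
```

A function whose measure fails to decrease (say, one that calls itself with the same argument) is rejected at the first offending call, which is the dynamic analogue of the proof checker refusing the annotation.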
Re: Software Assurance Reference Dataset
On 06/27/2014 01:47 AM, Andrei Alexandrescu wrote: On 6/26/14, 4:16 PM, Timon Gehr wrote: ... You can say "out of scope" or "not a priority" or "this should be realized in third-party tool support", but not "it is impossible". I also seem to reckon Walter is at the other extreme (he asserts it can't be done at all, period). I don't think it makes sense to imply that I am defending an extreme position. I wasn't discussing design choices, truth is not a continuum, and I was just objecting to the line of reasoning that went like "undecidability of the halting problem implies that formal reasoning is pointless", which is clearly untrue. My understanding is that it can be done but only with annotations or whole program analysis. ... Any way it is done, it doesn't come for free.
Re: Software Assurance Reference Dataset
On Thu, Jun 26, 2014 at 04:47:24PM -0700, Andrei Alexandrescu via Digitalmars-d wrote: On 6/26/14, 4:16 PM, Timon Gehr wrote: - Annotations can include a formal proof. If a function can be annotated with what other functions it calls (non-transitively), I agree that it can be guaranteed with local semantic checking that a program won't overflow. However requiring such annotations would be onerous, and deducing them would require whole program analysis. We've had similar discussions before. Why do we _still_ have to argue about the _possibility_ of having a system that is helpful for proving stack overflows (or other bad behaviours) absent? You can say out of scope or not a priority or this should be realized in third-party tool support but not it is impossible. I also seem to reckon Walter is in the other extreme (he asserts it can't be done at all, period). My understanding is that it can be done but only with annotations or whole program analysis. [...] I think the compiler should be able to tell, at least for the simplest cases, whether something will definitely stop recursing. Proving that something will *not* stop recursing is unsolvable in the general case, but even if we restrict it to a solvable subset, it's still far from trivial for the compiler to invent such a proof (unless we restrict it so much that it excludes too many useful algorithms). One approach is to have the user supply a proof that the compiler can then check -- in general, if a proof exists at all, it should be checkable. Such checks furthermore can probably be done fast enough so as to not adversely affect current compilation times (unless the proof is ridiculously complex, which for practical real-world applications I don't think will happen). However, absence of proof is not proof of absence; just because neither the compiler nor the user is able to come up with a proof that something will stop recursing, doesn't mean that it definitely won't stop recursing. 
So the compiler cannot definitively reject the code as definitely overflowing the stack. But we *can* make it so that the user tells the compiler "please stop compilation if neither of us can supply a proof that this function will stop recursing". But it has to be opt-in, because there will be many real-world applications that cannot be proven to stop recursing, even though in practice they always will, so we cannot make this a language default. One particular example that comes to mind is the compiler itself: as it parses the input program, there is in theory no guarantee that the input won't have an arbitrarily deep nesting, such that the AST generated by the parser will be infinitely deep, because you cannot statically prove that the input will terminate. You don't know if the input file is actually connected to the output pipe of a program that prints an infinite series of ever-deeper nested structs, for example. However, in practice, such input never occurs, so we generally don't worry about such contrived possibilities. But this does mean that there can be no proof of termination of recursion, even a user-supplied one -- because there *are* cases where the parser *will* recurse forever, even if it never happens in practice. Not accounting for that possibility invalidates the proof, so any proof will always be rejected. Therefore, it is impossible to prove that recursion in the compiler is finite, even though in practice it always is. T -- Life is all a great joke, but only the brave ever get the point. -- Kenneth Rexroth
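The parser scenario described above is exactly why practical compilers replace a termination proof with a hard depth cap: recursion depth tracks input nesting, so the parser bounds it explicitly and fails cleanly instead of overflowing the stack. A minimal sketch, using nested parentheses as a stand-in for nested struct declarations (the grammar and limit are illustrative only):

```python
# Sketch: depth-limited recursive-descent parsing. The input grammar is
# just balanced parentheses; parse() returns the index it stopped at and
# the number of groups it consumed, and refuses inputs nested deeper
# than MAX_DEPTH rather than risk exhausting the real stack.
MAX_DEPTH = 500

def parse(s, i=0, depth=0):
    if depth > MAX_DEPTH:
        raise RecursionError("nesting too deep")
    count = 0
    while i < len(s) and s[i] == "(":
        i, sub = parse(s, i + 1, depth + 1)
        if i >= len(s) or s[i] != ")":
            raise SyntaxError("unbalanced input")
        i += 1
        count += 1 + sub
    return i, count

print(parse("(()())"))  # (6, 3): three groups, whole string consumed
```

Real front ends (including DMD and Clang) take essentially this approach for deeply nested expressions and declarations: a configurable limit, not a proof.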
Re: Software Assurance Reference Dataset
On 6/26/2014 4:16 PM, Timon Gehr wrote: On 06/26/2014 10:29 PM, Walter Bright wrote: When you're dealing with security issues, which is what this is about, This is about _avoiding stack overflows_. It's written down literally in the quoted passage. Check the title of this thread, the linked issues, and bearophile's comment bringing up stack overflows. It's about security, not niceness. you'll need a guarantee about stack overflow. Adding annotations is not helpful with this because they are not checkable. ... Not true. Basic facts: - In practice, programs are constructed explicitly to fulfill a particular purpose and, in particular, if they never overflow the stack, they usually do so for a reason. - Reasoning can be formalized and formal proofs can be written down in such a way that a machine can check whether the proof is valid. - Annotations can include a formal proof. We've had similar discussions before. Why do we _still_ have to argue about the _possibility_ of having a system that is helpful for proving stack overflows (or other bad behaviours) absent? You can say "out of scope" or "not a priority" or "this should be realized in third-party tool support", but not "it is impossible". In the general case, it is impossible. And I suspect the subset of D programs that don't have indirection or recursion is so small as to not be worth the bother. I do know there is a class of applications, for example in critical embedded systems, where recursion and indirection, and even allocation, are not allowed. Using D in such cases would require eschewing Phobos, and some other care taken, but that isn't the issue here, which is security vulnerabilities. Again, what WORKS is a runtime check. A runtime check will not avoid the problem, which is exhaustion of stack space. We disagree on the problem. The problem I initiated this thread on is security vulnerabilities. Terminating a program that overflows its stack is not a security vulnerability.
As for formal proofs, I'll let slip a guilty secret - I know so little about computer science proofs I wouldn't even recognize one if I saw one, let alone have the ability to craft one. So when you advocate formal proofs, directing it at me won't accomplish anything. To get formal proofs for D in the spec, in the code, in the compiler, I cannot deliver that. People like you are going to have to step forward and do them.
Re: Software Assurance Reference Dataset
On 6/26/14, 5:29 PM, Timon Gehr wrote: On 06/27/2014 01:47 AM, Andrei Alexandrescu wrote: On 6/26/14, 4:16 PM, Timon Gehr wrote: ... You can say out of scope or not a priority or this should be realized in third-party tool support but not it is impossible. I also seem to reckon Walter is in the other extreme (he asserts it can't be done at all, period). I don't think it makes sense to imply that I am defending an extreme position. I wasn't discussing design choices, truth is not a continuum and I was just objecting to the line of reasoning that went like undecidability of the halting problem implies that formal reasoning is pointless., which is clearly untrue. I agree. My understanding is that it can be done but only with annotations or whole program analysis. ... Any way it is done, it doesn't come for free. That's not a fair characterization. Interprocedural analysis is quite a different ball game from classic semantic checking. Andrei
Re: Software Assurance Reference Dataset
On Thursday, 26 June 2014 at 09:19:05 UTC, bearophile wrote: Walter Bright: It's an interesting list, and an opportunity for D. I once said that my job was to put Coverity out of business. The more of these issues D can automatically prevent with @safe, the better. One kind of problem left is to avoid stack overflows. I have had several such cases in my D code (perhaps because I use fixed-sized arrays a lot). I think they can be caused by: 1) Too many large allocations in stack frames; if generated by the compiler, they can be made @safe. 2) By alloca(); it is @system. 3) By recursion and co-recursion chains; we should have a page reserved at the end of the stack so we can throw when reaching it. The compiler can ensure we don't bypass it in case 1.
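The reserved-page idea can be approximated in a sketch. A scripting-level example cannot map actual guard pages, so the code below substitutes an explicit frame budget with some headroom reserved (all names and numbers here are made up); a native runtime would instead mprotect a page at the end of the stack and translate the fault into a thrown Error:

```python
# Sketch: turn stack exhaustion into a catchable error before it happens.
# RESERVE_FRAMES plays the role of the guard page: we refuse to descend
# once fewer than that many frames of headroom remain, so the error
# handler itself still has room to run.
import threading

BUDGET = 200          # pretend total stack budget, in frames
RESERVE_FRAMES = 16   # headroom kept free, like a reserved guard page
_depth = threading.local()

class StackGuardError(Exception):
    pass

def guarded(f):
    def g(*args):
        d = getattr(_depth, "v", 0)
        if d > BUDGET - RESERVE_FRAMES:
            raise StackGuardError("would hit the guard page")
        _depth.v = d + 1
        try:
            return f(*args)
        finally:
            _depth.v = d  # unwind restores the headroom
    return g

@guarded
def down(n):
    return n if n == 0 else down(n - 1)

print(down(100))  # fits within the budget, returns 0
try:
    down(1000)    # would exhaust the budget
except StackGuardError as e:
    print("caught:", e)
```

The point of the headroom is the part bearophile highlights: the check must fire *before* exhaustion, or the act of throwing and unwinding would itself overflow.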
Software Assurance Reference Dataset
http://samate.nist.gov/SRD/view.php?count=20&first=0&sort=asc This is a list of security vulnerabilities in languages including C/C++. 88,737 of them (!). It's an interesting list, and an opportunity for D. I once said that my job was to put Coverity out of business. The more of these issues D can automatically prevent with @safe, the better.