Re: Follow-up post explaining research rationale
On Sunday, 15 May 2016 at 10:52:47 UTC, Timon Gehr wrote: On 15.05.2016 05:02, Joe Duarte wrote: Type systems are quite arbitrary and primitive That may apply to the popular ones. -- we could've moved to real-world types The "real world" is complex and there are bound to be some modeling limitations. I don't really see what "real-world" type is supposed to mean. a long time ago, which would be much safer and a hell of a lot more productive. How would that work/what's the difference? Here's what I think is the first big exploration of a real-world type system: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.397 I would combine it with something like Precimonious: http://www.cs.berkeley.edu/~ksen/papers/precimonious.pdf
Re: Follow-up post explaining research rationale
On Friday, 13 May 2016 at 22:13:50 UTC, QAston wrote: Mainstream PL syntax is extremely unintuitive and poorly designed by known pedagogical, epistemological, and communicative science standards. The vast majority of people who are introduced to programming do not pursue it (likely true of many fields, but programming may see a smaller share than most – this point requires a lot more context). I'm open to the possibility that the need to master the bizarre syntax of incumbent programming languages might serve as a useful filter for qualities valuable in a programmer, but I'm not sure how good or precise the filter is. Your research seems to have a very big omission: textual representation is not the only representation of programs - therefore programming doesn't have to have syntax. The first programming environment I was introduced to was an executable flowchart environment.

Quick note: I'm looking at the effects of the syntax and design of incumbent programming languages on the appeal of programming to people in general, with some specific questions concerning the appeal of programming to women (particularly elite women who have many career options). So that research track is bound to the world as I find it, and the world as I find it is a world where graphical programming languages and environments are not widely used, and where all the programming languages in wide use are plain text.

That said, I'm deeply interested in graphical and visual programming methods. I know quite a bit about them, have studied lots of historic efforts and even contemporary ones like Snap (http://snap.berkeley.edu/) and Scratch. Side note: I'm a bit of a Russophile, and I'm fascinated by the history of Soviet computing, with innovations like the Elbrus systems with tagged memory, and the ways they cloned major American computing platforms. Something I dug into last year is the DRAKON graphical programming language, which they built as part of the Buran space shuttle program. Very interesting: https://en.wikipedia.org/wiki/DRAKON

Programmers tend to be conservative in many respects. There's the assumption that programming must consist of a programmer typing plain text into a flat document that contains many levels of abstraction and many different kinds of concepts. By flat I mean that it's this continuous blob that takes no account of its content, of varying levels of abstraction, of the wildly different kinds of work and reasoning that are expressed in this run-on text file. Then a compiler takes this text and does its thing. There's very little visual representation.

Type systems are quite arbitrary and primitive -- we could've moved to real-world types a long time ago, which would be much safer and a hell of a lot more productive. Type theory imports the barbarism of legacy type systems and doesn't question the assumption that the universe is best carved into ints and floats at the source code level, instead of prices, km, or seconds. Compilers still don't know that something called lastName is a string (or better yet, a *text* type -- strings are for yo-yos), or that salesTax is a decimal. That's really simple stuff. It's nuts that with semantic names we still have to assign types, and that those types are so much coarser than the information already in the name. Our vaunted notion of type safety is based on an incredibly coarse split between ints, floats, and maybe strings or arrays. 
I think it should confuse every CS student how these distinctions came to preoccupy us, and why we don't have anything more to say about types at this point in history. So graphical programming is a huge change for a very conservative field, and probably has worse than "script kiddie" connotations for many professional programmers. And there's no compelling evidence that graphical is better than text for many use cases. It might be better for some people, and a one-to-one mapping between a powerful textual PL and a graphical form would be very interesting.
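To make the real-world types idea above a bit more concrete, here's a minimal D sketch of a unit-carrying type. The names (Quantity, Km, Seconds) are placeholders I'm making up for illustration -- real proposals like the Units paper linked earlier go much further -- but even this thin wrapper stops you from adding kilometers to seconds:

import std.stdio;

// A value tagged with a unit at the type level.
struct Quantity(string unit)
{
    double value;

    // Only quantities with the same unit can be added.
    Quantity opBinary(string op : "+")(Quantity rhs) const
    {
        return Quantity(value + rhs.value);
    }
}

alias Km      = Quantity!"km";
alias Seconds = Quantity!"s";

void main()
{
    auto trip = Km(12.5) + Km(3.0);   // fine: both are kilometers
    writeln(trip.value, " km");

    // Km(1.0) + Seconds(2.0);        // rejected at compile time
}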
Re: Follow-up post explaining research rationale
On Tuesday, 10 May 2016 at 13:40:30 UTC, Chris wrote: On Monday, 9 May 2016 at 19:09:35 UTC, Joe Duarte wrote: [snip] Let me give you a sense of the sorts of issues I'm thinking of. Here is a C sample from ProgrammingSimplified.com. It finds the frequency of characters in a string:

int main()
{
    char string[100];
    int c = 0, count[26] = {0};

    printf("Enter a string\n");
    gets(string);

    while (string[c] != '\0')
    {
        /** Considering characters from 'a' to 'z' only and ignoring others */
        if (string[c] >= 'a' && string[c] <= 'z')
            count[string[c]-'a']++;
        c++;
    }

    for (c = 0; c < 26; c++)
    {
        /** Printing only those characters whose count is at least 1 */
        if (count[c] != 0)
            printf("%c occurs %d times in the entered string.\n",c+'a',count[c]);
    }

    return 0;
}

[snap]

I went to www.programmingsimplified.com/c-program-examples and found that this was example 48 out of 59. The examples start with:

- Hello world
- Print Integer
- Addition
- Odd or Even
- Add, subtract, multiply and divide
- Check vowel
- Leap year
- Add digits
- [...]

and so on, with increasing complexity. Nobody starts with examples like the one above. More likely with number 1 in their list:

#include <stdio.h>

int main()
{
    printf("Hello world\n");
    return 0;
}

Not so difficult to understand.

You're arguing that the 32-line example on finding character frequencies in a string was too complicated? I think it might help to clarify my purpose in that post. It was to pick a random example of a simple C program to illustrate the sorts of problems programming syntax has from a cognitive science and pedagogical standpoint. For my purposes, I didn't need to choose a 6-line program, and something that short would probably undermine my ability to illustrate some things.

Note also that I saw myself as being a bit *charitable* to C by choosing that sample. For instance, I didn't use an example littered with the word "void". Void in English most commonly means invalid, canceled, or not binding, as in a voided check, a void contract (such as where one party is a minor), and "null and void" is a common usage, so starting a function declaration by declaring it void is jarring. There was a discussion that Walter linked to from the late 1980s, I believe, where people were requesting that this issue be fixed in C (Walter linked to it as background on the naming of D, I think). It's a hole in the type system and bad syntax -- I predict that it adds confusion to learning a language that uses it.

Something I've wondered is if foreigners actually have an easier time with the syntax of English programming languages. The jarring usage of terms like void, or things like dollar signs and question marks to not mark dollars or questions, might not be jarring to non-native English speakers, or non-English speakers. For them, all this might just be arbitrary tokens and they'd just have to learn what the tokens signify (and it's very interesting to think about how they and English speakers learn this). Andreas Stefik did some research where he used a randomly generated programming syntax, I think it was called Randomo, and some mainstream languages were just as hard to learn as the randomly generated one (Java or C, I think -- they're both pretty bad anyway). To non-English speakers, and especially non-Latin-alphabet-users, all of our programming languages might be randomly generated for all intents and purposes.

You gave a Hello World example above. Don't get me started on Hello World. Well, too late... Here's the deal. 
It's not just Hello World -- a ton of programming examples in online references and introductory texts, perhaps most of them, present programs that are pointless. By "pointless" I mean they do no useful work. Commonly, they repeat something back to us. So here we're typing 64-ish characters into a console or something in order to have the computer repeat back an 11-character string to us. If we have to type "Hello World" (and a bunch of other stuff) in order for the computer to display it, well we've not had a productive interaction. I think examples should quickly illustrate computers computing or otherwise doing productive things for us. Displaying a string in a console environment is also too far removed from normal computing environments like a smartphone or Windows 10/MacOS graphical desktop. If we're going to teach displaying a pop-up message, we should cut right to a windowing environment from the outset. I think the Hello World example is also extremely confusing to newbies and rational knowers. It's not clear why the main function has an integer type, since we're not doing anything with integers, why there is an empty set of parentheses after it (what are they doing there?), why we're saying "printf" when we don't want to use our printer, why there's a newline code *inside the quotes*
Re: Researcher question – what's the point of semicolons and curly braces?
On Tuesday, 3 May 2016 at 22:17:18 UTC, cym13 wrote: In my opinion putting commas at the end of a line is useless: if it happens at the end of each line (not counting closing-braces ones) then the effective quantity of information brought is null, and in more than one case chaining instructions on the same line (the only utility of commas) ends up being a bad idea. But it's part of those things that have such an inertia that you just wouldn't ever have heard of D if it hadn't had it, I think. Having commas was not decided by their intrinsic usefulness but by the choice to target the C/C++ market.

Good point that line-ending semicolons carry no information if they're on every line (I assume you meant semicolons instead of commas). An important point that I think is undocumented: text editors don't let you inhabit a new line unless you press Enter on the line above. In other words, you can't have a new line by using the down arrow or some other means. When I first learned programming, I was stumped by how a compiler was line-aware, how it knew when a line was truly ended, and what counted as a line qua line (I wrongly assumed you could down-arrow to a new line). It's an invisible character by default, and they don't tell you how text editors behave. This comes up a bit in Markdown and in how people are inconsistently defining a "hard" vs "soft" line break.

But Python sacrifices a *lot* of performance to do that. D has its own way and different goals. Being performance-friendly is one of them and that sometimes leaves you with long functions and ugly hacks. When it comes to that, having curly braces (well, any kind of delimiter really) is a great thing.

It's not clear how curly braces deliver better performance. Anything expressed with curly braces can be expressed without them -- i.e. you could design a language in which that were true. Walter mentioned the issue of redundancy, which seems reasonable, but that doesn't bear on the performance issue. A good example of a non-curly-brace compiled language is Crystal, at least last time I checked. Python loses a lot for being a text-executing interpreted language. What an interpreter does -- in comparison to a JIT compiler -- is wildly underdocumented. The standard answer to a lot of people on the web asking for an explanation is that a JIT compiles down to native code or machine code, while an interpreter just interprets the code, or sometimes you'll see "executes it directly". Big gaping hole on how it gets down to machine code. But starting with text is crippling. I love Walter's decision to have pre-compiled modules instead of text headers -- I didn't realize that C compilers were literally parsing all this text every time.

Python could get some big wins from a well-designed IR and follow-on back-end code generator, or a JIT, or some combo. This is obviously not a new idea, but no one seems willing to do it in a professional, focused, and expensive way. Unladen Swallow was weird in that it was a couple of kids -- undergrad students with no experience -- trying to build it all. It's weird how casual and half-assed a lot of software projects are. If I were trying to do this, I'd want to assemble the Avengers -- I'd want a large team of elite software developers, architects, and testers, enough to do it in a year. That's a rare setup, but it's how I would do it if I were Microsoft, Google, FB, et al -- if I were willing to spend $20 million on it, say. 
Pyjion might become something interesting, but right now it looks pretty casual and might be the kind of thing where they'll need a lot of outside open-source developer help (https://github.com/Microsoft/Pyjion). Pyston is only focused on Python 2, which is a rearview-mirror thing.

By the way, anyone should be able to create a version of C, D, or Go that doesn't use curly braces or semicolons, just by enforcing some rules about indentation and maybe line length that are already adhered to by virtue of common coding standards (e.g. blocks are typically indented; and I realize Go doesn't require semicolons). If we looked at typical code examples in almost any language like C, C#, D, Java, Swift, and we systematically encoded their meaning, reducing them down to a concise and non-redundant form, we'd find lots of redundancy and a lot of textual dead code, so to speak. This would be true even without semicolons and braces. There's still a lot for a compiler or any interpretive agent to go on.
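As a rough illustration of the transformation just described, here's an ordinary D function next to a hypothetical brace-free, semicolon-free rendering. The second form is not valid D -- it's only a sketch of the surface syntax such a variant might use:

// Ordinary D: braces and semicolons delimit the block.
int classify(int x)
{
    if (x > 0)
        return 1;
    return 0;
}

/* Hypothetical indentation-delimited rendering of the same function
   (not valid D today, shown only to illustrate the idea):

   int classify(int x)
       if (x > 0)
           return 1
       return 0
*/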
Re: Researcher question – what's the point of semicolons and curly braces?
On Tuesday, 3 May 2016 at 12:47:42 UTC, qznc wrote: The parser needs information about "blocks". Here is an example:

if (x)
  foo();
  bar();

Is bar() always executed or only if (x) is true? In other words, is bar() part of the block, which is only entered conditionally? There are three methods to communicate blocks to the compiler: curly braces, significant whitespace (Python, Haskell), or an "end" keyword (Ruby, Pascal). Which one you prefer is subjective. You mention Facebook and face recognition. I have not seen anyone try machine learning for parsing. It would probably be a fun project, but not a practical one. You wonder that understanding structured text should be a solved problem. It is. You need to use a formal language, which programming languages are. English for example is much less structured. There easily are ambiguities. For example: I saw a man on a hill with a telescope. Who has the telescope? You or the man you saw? Who is on the hill? As a programmer, I do not want to write ambiguous programs. We produce more than enough bugs without ambiguity.

Thanks for the example! So you laid out the three options for signifying blocks. Then you said which one you prefer is subjective, but that you don't want to write ambiguous programs. Do you think that the curly braces and semicolons help with that? So in your example, I figure bar's status is language-defined, and programmers will be trained in the language in the same way they are now.

I've been sketching out a new language, and there are a couple of ways I could see implementing this. First, blocks of code are separated by one or more blank lines. No blank lines are allowed in a block. An if block would have to terminate in an else statement, so I think this example just wouldn't compile. Now if we wanted two things to happen on an if hit, we could leave it the way you gave where the two things are at the same level of indentation. That's probably what I'd settle on, contingent on a lot of research, including my own studies and other researchers', though this probably isn't one of the big issues. If we wanted to make the second thing conditional on success on the first task, then I would require another indent. Either way the block wouldn't compile without an else.

I've been going through a lot of Unicode, icon fonts, and the Noun Project, looking for clean and concise representations for program logic. One of the ideas I've been working with is to leverage Unicode arrows. In most cases it's trivial aesthetic clean-up, like → instead of ->, and a lot of it could be simple autoreplace/autocomplete in tools. For if logic, you can see an example of bent arrows, and how I'd express the alternatives for your example, here: http://i1376.photobucket.com/albums/ah13/DuartePhotos/if%20block%20with%20Unicode%20arrows_zpsnuigkkxz.png
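For reference, here's how a curly-brace language like D actually treats the quoted example today -- indentation carries no meaning, so without braces only the first statement is conditional:

void foo() {}
void bar() {}

void main()
{
    bool x = false;

    if (x)
        foo();
    bar();      // always executed, regardless of indentation

    if (x)
    {
        foo();
        bar();  // executed only when x is true
    }
}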
Re: Follow-up post explaining research rationale
On Monday, 9 May 2016 at 20:17:40 UTC, ag0aep6g wrote: On 09.05.2016 at 21:09, Joe Duarte wrote: 4. We switch the person or voice from an imperative "do this" as in printf, to some sort of narrator third-person voice with "gets". "gets" is still imperative. It's short for "get string". Not saying that this is obvious, or that it's a good name.

Ah, I forgot about that! I think puts has the same abbreviation structure, right? put string... I think knowing/remembering that it's an abbreviation would make it less distracting. My calling it a shift in voice is incorrect assuming people remember what it stands for. JD
Re: Always false float comparisons
On Monday, 9 May 2016 at 09:10:19 UTC, Walter Bright wrote: Don Clugston pointed out in his DConf 2016 talk that:

float f = 1.30;
assert(f == 1.30);

will always be false since 1.30 is not representable as a float. However,

float f = 1.30;
assert(f == cast(float)1.30);

will be true. So, should the compiler emit a warning for the former case?

I think it really depends on what the warning actually says. I think people have different expectations for what that warning would be. When you say 1.30 is not representable as a float, when is the "not representable" enforced? Because it looks like the programmer just represented it in the assignment of the literal – but that's not where the warning would be, right? I mean I assume so because people need nonrational literals all the time, and this is the only way they can do it, which means it's a hole in the type system, right? There should be a decimal type to cover all these cases, like some databases have.

Would the warning say that you can't compare 1.30 to a float because 1.30 is not representable as a float? Or would it say that f was rounded upon assignment and is no longer 1.30?

Short of a decimal type, I think it would be nice to have a "float equality" operator that covered this whole class of cases, where floats that started their lives as nonrational literals and floats that have been rounded with loss of precision can be treated as equal if they're within something like .001% of each other (well, a percentage that can actually be represented as a float...). Basically equality that covers the known mutational properties of fp arithmetic. There's no way to do this right now without ranges, right? I know that ~ is for concat. I saw ~= is an operator. What does that do? The Unicode ≈ would be nice for this. I assume IEEE 754 or ISO 10967 don't cover this? I was just reading the latter (zip here: http://standards.iso.org/ittf/PubliclyAvailableStandards/c051317_ISO_IEC_10967-1_2012.zip)
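On the "treat them as equal within a tolerance" idea: short of a new operator, Phobos already ships an approximate-comparison helper in std.math, where the relative and absolute tolerances are parameters. A small sketch (the tolerances here are only illustrative, not a recommendation):

import std.math : approxEqual;
import std.stdio;

void main()
{
    float f = 1.30;

    writeln(f == 1.30);                        // false, as in Don's example

    // Roughly the "equal within some tiny percentage" comparison from the post:
    writeln(approxEqual(f, 1.30, 1e-5, 1e-8)); // true
}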
Re: Follow-up post explaining research rationale
On Monday, 9 May 2016 at 20:29:12 UTC, Joe Duarte wrote: On Monday, 9 May 2016 at 20:09:35 UTC, Adam D. Ruppe wrote: I'd also be surprised if you find an empirical gender gap after controlling for programming language syntax, too. Even if we grant that PL syntax is suboptimal, why would that result in a gender bias? But, hey, you never really know until you actually collect the data... I forgot to mention the math. You can run the model in your head. If group W has more career options than group M, W will be underrepresented in career domain A. The effect will be larger if A is less appealing than W's other options, ceteris paribus and with some starting assumptions. (But it doesn't need to be, if W has more options than M.) If aspects of career domain A are *equally frustrating* for members of groups W and M, W will still be underrepresented (and M overrepresented) if people in W have more options. So we don't even need it to be the case that bizarre programming language design disproportionately annoys women for bizarre programming language design to result in the underrepresentation of women. JD (Assuming A is included in the set of options for both groups, and is equally available to them.)
Re: Follow-up post explaining research rationale
On Monday, 9 May 2016 at 20:09:35 UTC, Adam D. Ruppe wrote: I'd also be surprised if you find an empirical gender gap after controlling for programming language syntax, too. Even if we grant that PL syntax is suboptimal, why would that result in a gender bias? But, hey, you never really know until you actually collect the data... I forgot to mention the math. You can run the model in your head. If group W has more career options than group M, W will be underrepresented in career domain A. The effect will be larger if A is less appealing than W's other options, ceteris paribus and with some starting assumptions. (But it doesn't need to be, if W has more options than M.) If aspects of career domain A are *equally frustrating* for members of groups W and M, W will still be underrepresented (and M overrepresented) if people in W have more options. So we don't even need it to be the case that bizarre programming language design disproportionately annoys women for bizarre programming language design to result in the underrepresentation of women. JD
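A toy worked example of that model, with made-up numbers and the simplifying assumption that each person picks uniformly among their options:

import std.stdio;

void main()
{
    // Equal talent pools, equal appeal of domain A, but group W has
    // 3 viable options while group M has 2. (Illustrative numbers only.)
    double poolW = 100, poolM = 100;
    double optionsW = 3, optionsM = 2;

    double wInA = poolW / optionsW;   // ~33 people from W end up in A
    double mInA = poolM / optionsM;   // 50 people from M end up in A

    writefln("W's share of domain A: %.0f%%", 100 * wInA / (wInA + mInA));
    // Prints ~40% -- underrepresentation with no difference in appeal.
}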
Follow-up post explaining research rationale
Hi all, As I mentioned on the other thread where I asked about D syntax, I'm a social scientist about to launch some studies of the effects of PL syntax on learnability, motivation to pursue programming, and differential gender effects on these factors. This is a long post – some of you wanted to know more about my research goals and rationale, and I also said I would post separately on the gender issue, so here we go... As you know, women are starkly underrepresented in software engineering roles. I'm interested in zooming back to the decisions people are making when they're 16 or 19 re: programming as a career. I'm interested in people's *first encounters* with programming, in high school or college, how men and women might differentially assess programming as a career option, and why. Let me note a few things: Someone on the other thread thought that my hypothesis was that women don't become programmers because of the semicolons and curly braces in PL syntax. That's not one of my hypotheses. I do think PL syntax is a large problem, and I have some hypotheses about how it disproportionately deters qualified women, but the issues I see go much deeper than what I've called the "punctuation noise" of semicolons and curly braces. (I definitely don't have any hypotheses about female perceptions of the aesthetics of curly braces, which some posters had inferred – none of this is about female aesthetic preferences.) Also, I don't think D is particularly problematic – it has cleaner and clearer syntax than its contemporaries (well, we'll need careful research to know if it truly is clearer to a targeted population). I plan to use D as a presumptive *clearer syntax* condition in some studies – we'll see how it goes. Lastly, I'm not approaching the gender issue from an ideological or PC Principal perspective. My work will focus mostly on cognitive science and pedagogical factors – as you'll see below, I'm interested in diversity issues from lots of angles, but I don't subscribe to the diversity ideology that is fashionable in American academia. One D-specific question I do have: Have any women ever posted here? I scoured a bunch of threads here recently and couldn't find a female poster. By this I mean a poster whose supplied name was female, where a proper name was supplied (some people just have usernames). Of course we don't really know who is posting, and there could be some George Eliot situations, but the presence/absence of self-identified women is useful enough. Women are underrepresented in programming, but the skew in online programming communities is even more extreme – we're seeing near-zero percent in lots of boards. This is not a D-specific problem. Does anyone know of occasions where women posted here? Links? Getting back to the research, recent studies have argued that one reason women are underrepresented in certain STEM fields is that smart women have more options than smart men. So think of the right tail of the bell curve, the men and women in that region on the relevant aptitudes for STEM fields. There's some evidence that smart women have a broader set of skills -- *on average* -- than equivalently smart men, perhaps including better social skills (or more interest in social interaction). This probably fits with stereotypes and intuitions a lot of people already held (lots of stereotypes are accurate, as probability distributions and so forth). I'm interested in monocultures and diversity issues in a number of domains. 
I've done some recent work on the lack of philosophical and political diversity in social science, particularly in social psychology, and how this has undermined the quality and validity of our research (here's a recent paper by me and my colleagues in Behavioral and Brain Sciences: http://dx.doi.org/10.1017/S0140525X14000430). My interest in the lack of gender diversity in programming is an entirely different research area, but there isn't much rigorous social science and cognitive psychology research on this topic, which surprised me. I think it's an important and interesting issue.

I also think a lot of the diversity efforts that are salient in tech right now are acting far too late in the cycle, sort of just waiting for women and minorities to show up. The skew starts long before people graduate with a CS degree, and I think Google, Microsoft, Apple, Facebook, et al. should think deeply about how programming language design might be contributing to these effects (especially before they roll out any more C-like programming languages).

Informally, I think what's happening in many cases is that when smart women are exposed to programming, it looks ridiculous and they think something like "Screw this – I'm going to med school", or any of a thousand permutations of that sentiment. Mainstream PL syntax is extremely unintuitive and poorly designed by known pedagogical, epistemological, and communicative science standards.
Re: Researcher question – what's the point of semicolons and curly braces?
On Tuesday, 3 May 2016 at 04:24:37 UTC, Adam D. Ruppe wrote: On Tuesday, 3 May 2016 at 03:48:09 UTC, Joe Duarte wrote: Would it be difficult to compile the clean version? You realize your bias is showing very strongly in the wording of this question, right? I don't agree the naked version is clean at all.

Fair point. I probably am biased, though I don't think an objective definition of clean as having less text or punctuation would be too controversial. Maybe compact vs verbose would be more objective, though those terms are usually used to refer to differences in amount of text/keywords, repetition, etc. (e.g. Python vs Java)
Re: Researcher question – what's the point of semicolons and curly braces?
On Tuesday, 3 May 2016 at 04:23:48 UTC, Walter Bright wrote: On 5/2/2016 8:48 PM, Joe Duarte wrote: Why are curly braces and semicolons necessary? What information do they carry that a compiler could not otherwise reliably obtain? You are correct in that they are (mostly) redundant. Some ambiguities can arise because D is not a whitespace delimited language. However, the real reasons are: 1. Redundancy in specification means the compiler can catch more 'typo' mistakes rather than having them compile successfully and then behave mysteriously. If a language has 0 redundancy, then any 8745b48%%&*&hjdsfh string would be a valid program. Redundancy is a critical feature of high reliability languages. Many languages have removed redundancy only to put it back in after bitter experience. The classic is implicit declaration of variables. 2. The redundancy also means the compiler can 'resync' itself to the input once a syntactic error is detected. 3. It's instantly familiar to those who program already in "curly brace" languages. Your point about redundancy is interesting. I assume typos aren't random, and I wonder if anyone has researched the patterns there, which could inform where PL designers would want to insert guards/redundancy with syntax. I wonder if I could dig into this with GitHub and BitBucket repos. Maybe other researchers already have. I'm also thinking that braces and semicolons might be satisfying to some (most?) programmers as an element of real or perceived rigor or safety, independent of the redundancy issue. For example, I'm a bit surprised by how popular SASS/SCSS is compared to Stylus (CSS preprocessors), given that SASS requires a lot of braces and semicolons while Stylus requires neither and has what I've been calling "clean" syntax. There could be feature differences I don't know about, but I wonder if people feel less safe with plain, unadorned text. I remember that Rob Pike explained why Go requires braces by recounting how at Google their tools sometimes lost or damaged the indentation in Python source files, breaking those programs. I would think that you'd just fix your tools in that case. People build such amazing software these days that I'm surprised there'd be any issue in nailing down software that handles text files without messing up their whitespace or other syntactic structure. I don't know, maybe this is a recurring challenge. In any case, your redundancy point stands on its own.
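To make the implicit-declaration point concrete: in D, the required declaration is exactly the redundancy that turns a typo into a compile-time error instead of a silently created new variable. A minimal sketch:

void main()
{
    int total = 0;

    // totl += 1;   // Error: undefined identifier `totl`
                    // In a language with implicit declaration, this typo
                    // would quietly create a second variable instead.

    total += 1;
}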
Researcher question – what's the point of semicolons and curly braces?
Hi all, I'm a social scientist and I'm preparing some studies on the effects of programming language syntax on learning, motivation to pursue programming, as well as any disproportionate effects that PL syntax has on the appeal of programming to women (more on the latter in a separate post). So I want to get a better idea of the rationale for various syntactical design decisions, and I'm going to ask you the same question I'll ask the Rust community: Why are curly braces and semicolons necessary? What information do they carry that a compiler could not otherwise reliably obtain?

Here's an example from the D Overview page:

class Foo
{
    int foo(Bar c) { return c.bar; }
}

class Bar
{
    int bar() { return 3; }
}

Okay, if we remove the curly braces and semicolons, we have:

class Foo
    int foo(Bar c)
        return c.bar

class Bar
    int bar()
        return 3

Would it be difficult to compile the clean version? Would there be issues with the design of the lexer/parser? I assume the compiler would recognize keywords like return (and a clean syntax could drive different rules for what statements and expressions could appear on the same line and so forth). In reality, a compiler would see the above with line ending characters terminating every line (e.g. U+000A), so it would be as line-aware as a human.

I've never built lexers or parsers, much less compilers, so maybe I'm missing a major implementation hurdle. I'm just thinking that Facebook has built software that recognizes my face in other people's pictures, so it seems like building software that understands structured text would be a solved problem. It puzzles me to see so much apparent punctuation noise in a 21st-century language (and, to be fair, Rust puzzles me for the same reasons). JD
Re: Any usable SIMD implementation?
On Saturday, 23 April 2016 at 10:40:12 UTC, Johan Engelen wrote: On Monday, 18 April 2016 at 00:27:06 UTC, Joe Duarte wrote: Someone else talked about marking "Broadwell" and other generation names. As others have said, it's better to specify features. I wanted to chime in with a couple of additional examples. Intel's transactional memory accelerating instructions (TSX) are only available on some Broadwell parts because there was a bug in the original implementation (Haswell and early Broadwell) and it's disabled on most. But the new Broadwell server chips have it, and it's a big deal for some DB workloads. Similarly, only some Skylake chips have the Software Guard Extensions (SGX), which are very powerful for creating secure enclaves on an untrusted host.

Thanks, I've seen similar comments in LLVM code. I have a question perhaps you can comment on? With LLVM, it is possible to specify something like "+sse3,-sse2" (I did not test whether this actually results in SSE3 instructions being used, but no SSE2 instructions). What should be returned when querying whether the "sse3" feature is enabled? Should __traits(targetHasFeature, "sse3") == true mean that implied features (such as sse and sse2) are also available?

If you specify SSE3, you should definitely get SSE2 and plain old SSE with it. SSE3 is a superset of SSE2 and includes all the SSE2 instructions (more than 100, I think). I'm not sure about your syntax – I thought the hyphen meant to include the option, not remove it, and I haven't seen the addition sign used for those settings. But I haven't done much with those optimization flags. You wouldn't want to exclude SSE2 support because it's becoming the bare minimum baseline for modern systems, the de facto FP unit. Windows 10 requires a CPU with SSE2, as do more and more applications on the archaic Unix-like platforms.
Re: Any usable SIMD implementation?
On Tuesday, 5 April 2016 at 10:27:46 UTC, Walter Bright wrote: Besides, I think it's a poor design to customize the app for only one SIMD type. A better idea (I've repeated this ad nauseam over the years) is to have n modules, one for each supported SIMD type. Compile and link all of them in, then detect the SIMD type at runtime and call the corresponding module. (This is how the D array ops are currently implemented.)

There are many organizations in the world that are building software in-house, where such software is targeted to modern CPU SIMD types, most typically AVX/AVX2 and crypto instructions. In these settings -- many of them scientific compute or big data center operators -- they know what servers they have, what CPU platforms they have. They don't care about portability to the past, older computers and so forth. A runtime check would make no sense for them, not for their baseline, and it would probably be a waste of time for them to design code to run on pre-AVX silicon. (AVX is not new anymore -- it's been around for a few years.) Good examples can be found on Cloudflare's blog, especially Vlad Krasnov's posts. Here's one where he accelerates Golang's crypto libraries: https://blog.cloudflare.com/go-crypto-bridging-the-performance-gap/ Companies like CF probably spend millions of dollars on electricity, and there are some workloads where AVX-optimized code can yield tangible monetary savings.

Someone else talked about marking "Broadwell" and other generation names. As others have said, it's better to specify features. I wanted to chime in with a couple of additional examples. Intel's transactional memory accelerating instructions (TSX) are only available on some Broadwell parts because there was a bug in the original implementation (Haswell and early Broadwell) and it's disabled on most. But the new Broadwell server chips have it, and it's a big deal for some DB workloads. Similarly, only some Skylake chips have the Software Guard Extensions (SGX), which are very powerful for creating secure enclaves on an untrusted host.

On the broader SIMD-as-first-class-citizen issue, I think it would be worth thinking about how to bake SIMD into the language instead of bolting it on. If I were designing a new language in 2016, I would take a fresh look at how SIMD could be baked into a language's core constructs. I'd think about new loop abstractions that could make SIMD easier to exploit, and how to nudge programmers away from serial monotonic mindsets and into more of a SIMD/FMA way of reasoning.
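For readers who haven't seen the runtime-dispatch scheme Walter describes in the quote above, here's a minimal D sketch. The addAVX/addSSE2 names are placeholders -- in the real scheme each would live in its own module compiled with the matching target features -- and the dispatch decision uses core.cpuid's per-feature queries (sse2, avx), which druntime exposes:

import core.cpuid : avx, sse2;
import std.stdio;

void addScalar(const float[] a, const float[] b, float[] dst)
{
    foreach (i; 0 .. dst.length)
        dst[i] = a[i] + b[i];
}

// Placeholder per-ISA entry points; here they just fall back to the
// scalar loop so the sketch runs anywhere.
void addAVX(const float[] a, const float[] b, float[] dst)  { addScalar(a, b, dst); }
void addSSE2(const float[] a, const float[] b, float[] dst) { addScalar(a, b, dst); }

alias AddFn = void function(const float[], const float[], float[]);

AddFn pickAdd()
{
    // core.cpuid inspects the running CPU, so the best path is picked once.
    if (avx)  return &addAVX;
    if (sse2) return &addSSE2;
    return &addScalar;
}

void main()
{
    auto add = pickAdd();
    float[] a = [1.0f, 2, 3, 4];
    float[] b = [4.0f, 3, 2, 1];
    auto dst = new float[4];
    add(a, b, dst);
    writeln(dst);   // [5, 5, 5, 5]
}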