Re: Follow-up post explaining research rationale

2016-05-15 Thread Joe Duarte via Digitalmars-d

On Sunday, 15 May 2016 at 10:52:47 UTC, Timon Gehr wrote:

On 15.05.2016 05:02, Joe Duarte wrote:

Type systems are quite arbitrary and primitive


That may apply to the popular ones.


-- we could've moved to real-world types


The "real world" is complex and there are bound to be some 
modeling limitations. I don't really see what "real-world" type 
is supposed to mean.



a long time ago, which would be much safer and a hell
of a lot more productive.


How would that work/what's the difference?


Here's what I think is the first big exploration of a real-world 
type system: 
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.397


I would combine it with something like Precimonious: 
http://www.cs.berkeley.edu/~ksen/papers/precimonious.pdf







Re: Follow-up post explaining research rationale

2016-05-14 Thread Joe Duarte via Digitalmars-d

On Friday, 13 May 2016 at 22:13:50 UTC, QAston wrote:

Mainstream PL syntax is extremely unintuitive and poorly 
designed by known pedagogical, epistemological, and 
communicative science standards. The vast majority of people 
who are introduced to programming do not pursue it (likely true 
of many fields, but programming may retain a smaller share than 
most – this point requires a lot more context). I'm open to the 
possibility that the need to master the bizarre syntax of 
incumbent programming languages might serve as a useful filter 
for qualities valuable in a programmer, but I'm not sure how 
good or precise the filter is.


Your research seems to have a very big omission: textual 
representation is not the only representation of programs - 
therefore programming doesn't have to have syntax. The first 
programming environment I was introduced to was an executable 
flowchart environment.


Quick note: I'm looking at the effects of the syntax and design 
of incumbent programming languages on the appeal of programming 
to people in general, with some specific questions concerning the 
appeal of programming to women (particularly elite women who have 
many career options).


So that research track is bound to the world as I find it, and 
the world as I find it is a world where graphical programming 
languages and environments are not widely used, and where all the 
programming languages in wide use are plain text.


That said, I'm deeply interested in graphical and visual 
programming methods. I know quite a bit about them, have studied 
lots of historic efforts and even contemporary ones like Snap 
(http://snap.berkeley.edu/) and Scratch. Side note: I'm a bit of 
a Russophile, and I'm fascinated by the history of Soviet 
computing, with innovations like the Elbrus systems with tagged 
memory, and the ways they cloned major American computing 
platforms. Something I dug into last year is the DRAKON graphical 
programming language, which they built as part of the Buran space 
shuttle program. Very interesting: 
https://en.wikipedia.org/wiki/DRAKON


Programmers tend to be conservative in many respects. There's the 
assumption that programming must consist of a programmer typing 
plain text into a flat document that contains many levels of 
abstraction and many different kinds of concepts. By flat I mean 
that it's this continuous blob that takes no account of its 
content, of varying levels of abstraction, of the wildly 
different kinds of work and reasoning that are expressed in this 
run-on text file. Then a compiler takes this text and does its 
thing. There's very little visual representation. Type systems 
are quite arbitrary and primitive -- we could've moved to 
real-world types a long time ago, which would be much safer and a 
hell of a lot more productive. Type theory imports the barbarism 
of legacy type systems and doesn't question the assumption that 
the universe is best carved into ints and floats at the source 
code level, instead of prices, km, or seconds. Compilers still 
don't know that something called lastName is a string (or better 
yet, a *text* type -- strings are for yo-yos), or that salesTax 
is a decimal. That's really simple stuff. It's nuts that with 
semantic names we still have to assign types, and that those 
types are so much coarser than the information already in the 
name. Our vaunted notion of type safety is based on an incredibly 
coarse split between ints, floats, and maybe strings or arrays. I 
think it should confuse every CS student how these distinctions 
came to preoccupy us, and why we don't have anything more to say 
about types at this point in history.
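
To sketch what I mean by real-world types – this is purely 
illustrative D, with made-up type names, not a worked-out design:

struct Price   { double amount; }  // hypothetical real-world types:
struct Km      { double value; }   // a thin wrapper per unit/concept
struct Seconds { double value; }

Price applySalesTax(Price p, double rate)
{
    return Price(p.amount * (1 + rate));
}

void main()
{
    auto subtotal = Price(19.99);
    auto total = applySalesTax(subtotal, 0.08);
    // Km distance = total;  // compile error: units can't be mixed
}

Even this crude wrapper approach catches unit mix-ups that raw 
doubles wave through; a real design would go much further than 
thin wrappers.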


So graphical programming is a huge change for a very conservative 
field, and probably has worse than "script kiddie" connotations 
for many professional programmers. And there's no compelling 
evidence that graphical is better than text for many use cases. 
It might be better for some people, and a one-to-one mapping 
between a powerful textual PL and a graphical form would be very 
interesting.


Re: Follow-up post explaining research rationale

2016-05-14 Thread Joe Duarte via Digitalmars-d

On Tuesday, 10 May 2016 at 13:40:30 UTC, Chris wrote:

On Monday, 9 May 2016 at 19:09:35 UTC, Joe Duarte wrote:
[snip]
Let me give you a sense of the sorts of issues I'm thinking 
of. Here is a C sample from ProgrammingSimplified.com. It 
finds the frequency of characters in a string:


#include <stdio.h>

int main()
{
   char string[100];
   int c = 0, count[26] = {0};

   printf("Enter a string\n");
   gets(string);  /* unsafe in real code; kept as in the original sample */

   while (string[c] != '\0')
   {
      /** Considering characters from 'a' to 'z' only
          and ignoring others */
      if (string[c] >= 'a' && string[c] <= 'z')
         count[string[c] - 'a']++;

      c++;
   }

   for (c = 0; c < 26; c++)
   {
      /** Printing only those characters
          whose count is at least 1 */
      if (count[c] != 0)
         printf("%c occurs %d times in the entered string.\n",
                c + 'a', count[c]);
   }

   return 0;
}


[snap]

I went to www.programmingsimplified.com/c-program-examples and 
found that this was example 48 out of 59. The examples start 
with:


- Hello world
- Print Integer
- Addition
- Odd or Even
- Add, subtract, multiply and divide
- Check vowel
- Leap year
- Add digits
- [...]

and so on, with increasing complexity.

Nobody starts with examples like the one above. More likely 
with number 1 in their list:


#include <stdio.h>

int main()
{
  printf("Hello world\n");
  return 0;
}

Not so difficult to understand.



You're arguing that the 32-line example on finding character 
frequencies in a string was too complicated? I think it might help 
to clarify my purpose in that post. It was to pick a random 
example of a simple C program to illustrate the sorts of problems 
programming syntax has from a cognitive science and pedagogical 
standpoint. For my purposes, I didn't need to choose a 6-line 
program, and something that short would probably undermine my 
ability to illustrate some things.


Note also that I saw myself as being a bit *charitable* to C by 
choosing that sample. For instance, I didn't use an example 
littered with the word "void". Void in English most commonly 
means invalid, canceled, or not binding, as in a voided check, a 
void contract (such as where one party is a minor), and "null and 
void" is a common usage, so starting a function declaration by 
declaring it void is jarring. There was a discussion from the 
late 1980s, I believe, that Walter linked to as background on 
the naming of D, where people were requesting that this issue 
be fixed in C. It's a hole in the type 
system and bad syntax -- I predict that it adds confusion to 
learning a language that uses it.


Something I've wondered is if foreigners actually have an easier 
time with the syntax of English programming languages. The 
jarring usage of terms like void, or things like dollar signs and 
question marks to not mark dollars or questions, might not be 
jarring to non-native English speakers, or non-English speakers. 
For them, all this might just be arbitrary tokens and they'd just 
have to learn what the tokens signify (and it's very interesting 
to think about how they and English speakers learn this). Andreas 
Stefik did some research where he used a randomly generated 
programming syntax, I think it was called Randomo, and some 
mainstream languages were just as hard to learn as the randomly 
generated one (Java or C, I think -- they're both pretty bad 
anyway). To non-English speakers, and especially 
non-Latin-alphabet-users, all of our programming languages might 
be randomly generated for all intents and purposes.


You gave a Hello World example above. Don't get me started on 
Hello World. Well, too late... Here's the deal. It's not just 
Hello World -- a ton of programming examples in online references 
and introductory texts, perhaps most of them, present programs 
that are pointless. By "pointless" I mean they do no useful work. 
Commonly, they repeat something back to us. So here we're typing 
64-ish characters into a console or something in order to have 
the computer repeat back an 11-character string to us. If we have 
to type "Hello World" (and a bunch of other stuff) in order for 
the computer to display it, well we've not had a productive 
interaction.


I think examples should quickly illustrate computers computing or 
otherwise doing productive things for us. Displaying a string in 
a console environment is also too far removed from normal 
computing environments like a smartphone or Windows 10/MacOS 
graphical desktop. If we're going to teach displaying a pop-up 
message, we should cut right to a windowing environment from the 
outset.


I think the Hello World example is also extremely confusing to 
newbies and rational knowers. It's not clear why the main 
function has an integer type, since we're not doing anything with 
integers, why there is an empty set of parentheses after it (what 
are they doing there?), why we're saying "printf" when we don't 
want to use our printer, why there's a newline code *inside the 
quotes*

Re: Researcher question – what's the point of semicolons and curly braces?

2016-05-14 Thread Joe Duarte via Digitalmars-d

On Tuesday, 3 May 2016 at 22:17:18 UTC, cym13 wrote:

In my opinion putting commas at the end of a line is useless: 
if it happens at the end of each line (not counting 
closing-braces ones) then the effective quantity of information 
brought is null, and in more than one case chaining 
instructions on the same line (the only utility of commas) ends 
up being a bad idea. But it's one of those things that carries 
such inertia that I think you would never have heard of D if it 
hadn't had them. Having commas was not decided by 
their intrinsic usefulness but by the choice to target the 
C/C++ market.


Good point that line-ending semicolons carry no information if 
they're on every line (I assume you meant semicolons instead of 
commas).


An important point that I think is rarely documented: text 
editors don't let you inhabit a new line unless you press Enter 
on the line above. In other words, you can't reach a new line by 
using the down arrow or some other means. When I first learned 
programming, I was stumped by how a compiler was line-aware, how 
it knew when a line was truly ended, and what counted as a line 
qua line (I wrongly assumed you could down-arrow to a new line). 
The line ending is an invisible character by default, and nobody 
tells you how text editors behave. This comes up a bit in 
Markdown and in how people inconsistently define a "hard" vs 
"soft" line break.
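
A toy illustration of that invisible character – plain D, nothing 
more to it:

import std.stdio;

void main()
{
    // the "line break" is just an ordinary character (U+000A)
    // stored in the text like any other
    string source = "int x;\nint y;\n";
    foreach (i, c; source)
        if (c == '\n')
            writefln("line break at byte %s", i);
}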



But Python sacrifices a *lot* of performance to do that. D has 
its own way and different goals. Being performance-friendly is 
one of them, and that sometimes gets you long functions and 
ugly hacks. When it comes to that, having curly braces (well, 
any kind of delimiter really) is a great thing.


It's not clear how curly braces deliver better performance. 
Anything expressed with curly braces can be expressed without 
them -- i.e. you could design a language in which that were true. 
Walter mentioned the issue of redundancy, which seems reasonable, 
but that doesn't bear on the performance issue. A good example of 
a non-curly brace compiled language is Crystal, at least last 
time I checked. Python loses a lot for being a text-executing 
interpreted language. What an interpreter does -- in comparison 
to a JIT compiler -- is wildly underdocumented. The standard 
answer to people on the web asking for an explanation is that a 
JIT compiles down to native code or machine code, while an 
interpreter just interprets the code, or sometimes you'll see 
"executes it directly" -- a big gaping hole on how it actually 
gets down to machine code. But starting with text is crippling. 
I love 
Walter's decision to have pre-compiled modules instead of text 
headers -- I didn't realize that C compilers were literally 
parsing all this text every time.


Python could get some big wins from a well-designed IR and 
follow-on back-end code generator, or a JIT, or some combo. This 
is obviously not a new idea, but no one seems willing to do it in 
a professional, focused, and expensive way. Unladen Swallow was 
weird in that you had a couple of kids, undergrad students who 
had no experience trying to build it all. It's weird how casual 
and half-assed a lot of software projects are. If I were trying 
to do this, I'd want to assemble the Avengers -- I'd want a large 
team of elite software developers, architects, and testers, 
enough to do it in a year. That's a rare setup, but it's how I 
would do it if I were Microsoft, Google, FB, et al -- if I were 
willing to spend $20 million on it, say. Pyjion might become 
something interesting, but right now it looks pretty casual and 
might be the kind of thing where they'll need a lot of outside 
open-source developer help (https://github.com/Microsoft/Pyjion). 
Pyston is only focused on Python 2, which is a rearview-mirror 
thing.


By the way, anyone should be able to create a version of C, D, or 
Go that doesn't use curly braces or semicolons, just by enforcing 
some rules about indentation and maybe line length that are 
already adhered to by virtue of common coding standards (e.g. 
blocks are typically indented; and I realize Go doesn't require 
semicolons). If we looked at typical code examples in almost any 
language like C, C#, D, Java, Swift, and we systematically 
encoded their meaning, reducing them down to a concise and 
non-redundant form, we'd find lots of redundancy and a lot of 
textual dead code, so to speak. This would be true even without 
semicolons and braces. There's still a lot for a compiler or any 
interpretive agent to go on.




Re: Researcher question – what's the point of semicolons and curly braces?

2016-05-13 Thread Joe Duarte via Digitalmars-d

On Tuesday, 3 May 2016 at 12:47:42 UTC, qznc wrote:


The parser needs information about "blocks". Here is an example:

  if (x)
    foo();
    bar();

Is bar() always executed or only if (x) is true? In other 
words, is bar() part of the block, which is only entered 
conditionally?


There are three methods to communicate blocks to the compiler: 
curly braces, significant whitespace (Python, Haskell), or an 
"end" keyword (Ruby, Pascal). Which one you prefer is 
subjective.


You mention Facebook and face recognition. I have not seen 
anyone try machine learning for parsing. It would probably be a 
fun project, but not a practical one.


You wonder why understanding structured text isn't already a 
solved problem. It is. You need to use a formal language, which 
programming languages are. English is much less structured, and 
ambiguities arise easily. For example:


  I saw a man on a hill with a telescope.

Who has the telescope? You or the man you saw? Who is on the 
hill?


As a programmer, I do not want to write ambiguous programs. We 
produce more than enough bugs without ambiguity.


Thanks for the example! So you laid out the three options for 
signifying blocks. Then you said which one you prefer is 
subjective, but that you don't want to write ambiguous programs. 
Do you think that the curly braces and semicolons help with that?


So in your example, I figure bar's status is language-defined, 
and programmers will be trained in the language in the same way 
they are now. I've been sketching out a new language, and there 
are a couple of ways I could see implementing this.


First, blocks of code are separated by one or more blank lines. 
No blank lines are allowed in a block. An if block would have to 
terminate in an else statement, so I think this example just 
wouldn't compile. Now if we wanted two things to happen on an if 
hit, we could leave it the way you wrote it, with the two things 
at the same level of indentation. That's probably what I'd settle 
on, contingent on a lot of research, including my own studies and 
other researchers', though this probably isn't one of the big 
issues. If we wanted to make the second thing conditional on the 
success of the first, then I would require another indent. Either 
way, the block wouldn't compile without an else.
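
Purely illustrative – this is the hypothetical syntax I've been 
sketching, not anything that compiles today:

if x
    foo()
    bar()      // same indentation: both run when x holds
else
    handle()   // the mandatory else closes the block

if x
    foo()
        bar()  // deeper indent: runs only if foo() succeeds
else
    handle()

The names are placeholders; the point is just how the indentation 
and the required else carry the block structure.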


I've been going through a lot of Unicode, icon fonts, and the 
Noun Project, looking for clean and concise representations for 
program logic. One of the ideas I've been working with is to 
leverage Unicode arrows. In most cases it's trivial aesthetic 
clean-up, like → instead of ->, and a lot of it could be simple 
autoreplace/autocomplete in tools. For if logic, you can see an 
example of bent arrows, and how I'd express the alternatives for 
your example, here: 
http://i1376.photobucket.com/albums/ah13/DuartePhotos/if%20block%20with%20Unicode%20arrows_zpsnuigkkxz.png





Re: Follow-up post explaining research rationale

2016-05-09 Thread Joe Duarte via Digitalmars-d

On Monday, 9 May 2016 at 20:17:40 UTC, ag0aep6g wrote:

Am 09.05.2016 um 21:09 schrieb Joe Duarte:
4. We switch the person or voice from an imperative "do this" 
as in
printf, to some sort of narrator third-person voice with 
"gets".


"gets" is still imperative. It's short for "get string". Not 
saying that this is obvious, or that it's a good name.


Ah, I forgot about that! I think puts has the same abbreviation 
structure, right? Put string... I think knowing/remembering that 
it's an abbreviation would make it less distracting. My calling 
it a shift in voice is incorrect, assuming people remember what 
it stands for.


JD


Re: Always false float comparisons

2016-05-09 Thread Joe Duarte via Digitalmars-d

On Monday, 9 May 2016 at 09:10:19 UTC, Walter Bright wrote:

Don Clugston pointed out in his DConf 2016 talk that:

float f = 1.30;
assert(f == 1.30);

will always be false since 1.30 is not representable as a 
float. However,


float f = 1.30;
assert(f == cast(float)1.30);

will be true.

So, should the compiler emit a warning for the former case?


I think it really depends on what the warning actually says. I 
think people have different expectations for what that warning 
would be.


When you say 1.30 is not representable as a float, when is the 
"not representable" enforced? Because it looks like the 
programmer just represented it in the assignment of the literal – 
but that's not where the warning would be, right? I assume so 
because people need decimal literals like these all the time, and 
this is the only way they can write them, which means it's a hole 
in the type system, right? There should be a decimal type to cover 
all these cases, like some databases have.


Would the warning say that you can't compare 1.30 to a float 
because 1.30 is not representable as a float? Or would it say 
that f was rounded upon assignment and is no longer 1.30?


Short of a decimal type, I think it would be nice to have a 
"float equality" operator that covered this whole class of cases, 
where floats that started their lives as decimal literals (not 
exactly representable in binary) and floats that have been 
rounded with loss of precision can be treated as equal if they're 
within something like .001% of each other (well, a percentage 
that can actually be represented as a float...). Basically, 
equality that covers the known rounding behavior of fp 
arithmetic.


There's no way to do this right now without ranges right? I know 
that ~ is for concat. I saw ~= is an operator. What does that do? 
The Unicode ≈ would be nice for this.
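
Something in the spirit of this sketch – I'm assuming Phobos's 
std.math.approxEqual does a tolerance-based comparison roughly 
like this:

import std.math : approxEqual;
import std.stdio : writeln;

void main()
{
    float f = 1.30;
    writeln(f == 1.30);              // false: the double 1.30
                                     // differs from the rounded float
    writeln(f == cast(float) 1.30);  // true: both rounded the same way
    writeln(approxEqual(f, 1.30));   // true: equal within a tolerance
}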


I assume IEEE 754 or ISO 10967 don't cover this? I was just 
reading the latter (zip here: 
http://standards.iso.org/ittf/PubliclyAvailableStandards/c051317_ISO_IEC_10967-1_2012.zip)


Re: Follow-up post explaining research rationale

2016-05-09 Thread Joe Duarte via Digitalmars-d

On Monday, 9 May 2016 at 20:29:12 UTC, Joe Duarte wrote:

On Monday, 9 May 2016 at 20:09:35 UTC, Adam D. Ruppe wrote:

I'd also be surprised if you find an empirical gender gap 
after controlling for programming language syntax, too. Even 
if we grant that PL syntax is suboptimal, why would that 
result in a gender bias? But, hey, you never really know until 
you actually collect the data...


I forgot to mention the math. You can run the model in your 
head. If group W has more career options than group M, W will 
be underrepresented in career domain A. The effect will be 
larger if A is less appealing than W's other options, ceteris 
paribus and with some starting assumptions. (But it doesn't 
need to be, if W has more options than M.)


If aspects of career domain A are *equally frustrating* for 
members of groups W and M, W will still be underrepresented 
(and M overrepresented) if people in W have more options. So we 
don't even need it to be the case that bizarre programming 
language design disproportionately annoys women for bizarre 
programming language design to result in the 
underrepresentation of women.


JD


(Assuming A is included in the set of options for both groups, 
and is equally available to them.)


Re: Follow-up post explaining research rationale

2016-05-09 Thread Joe Duarte via Digitalmars-d

On Monday, 9 May 2016 at 20:09:35 UTC, Adam D. Ruppe wrote:

I'd also be surprised if you find an empirical gender gap after 
controlling for programming language syntax, too. Even if we 
grant that PL syntax is suboptimal, why would that result in a 
gender bias? But, hey, you never really know until you actually 
collect the data...


I forgot to mention the math. You can run the model in your head. 
If group W has more career options than group M, W will be 
underrepresented in career domain A. The effect will be larger if 
A is less appealing than W's other options, ceteris paribus and 
with some starting assumptions. (But it doesn't need to be, if W 
has more options than M.)


If aspects of career domain A are *equally frustrating* for 
members of groups W and M, W will still be underrepresented (and 
M overrepresented) if people in W have more options. So we don't 
even need it to be the case that bizarre programming language 
design disproportionately annoys women for bizarre programming 
language design to result in the underrepresentation of women.


JD


Follow-up post explaining research rationale

2016-05-09 Thread Joe Duarte via Digitalmars-d

Hi all,

As I mentioned on the other thread where I asked about D syntax, 
I'm a social scientist about to launch some studies of the 
effects of PL syntax on learnability, motivation to pursue 
programming, and differential gender effects on these factors. 
This is a long post – some of you wanted to know more about my 
research goals and rationale, and I also said I would post 
separately on the gender issue, so here we go...


As you know, women are starkly underrepresented in software 
engineering roles. I'm interested in zooming back to the 
decisions people are making when they're 16 or 19 re: programming 
as a career. I'm interested in people's *first encounters* with 
programming, in high school or college, how men and women might 
differentially assess programming as a career option, and why.


Let me note a few things: Someone on the other thread thought 
that my hypothesis was that women don't become programmers 
because of the semicolons and curly braces in PL syntax. That's 
not one of my hypotheses. I do think PL syntax is a large 
problem, and I have some hypotheses about how it 
disproportionately deters qualified women, but the issues I see 
go much deeper than what I've called the "punctuation noise" of 
semicolons and curly braces. (I definitely don't have any 
hypotheses about female perceptions of the aesthetics of curly 
braces, which some posters had inferred – none of this is about 
female aesthetic preferences.)


Also, I don't think D is particularly problematic – it has 
cleaner and clearer syntax than its contemporaries (well, we'll 
need careful research to know if it truly is clearer to a 
targeted population). I plan to use D as a presumptive *clearer 
syntax* condition in some studies – we'll see how it goes. 
Lastly, I'm not approaching the gender issue from an ideological 
or PC Principal perspective. My work will focus mostly on 
cognitive science and pedagogical factors – as you'll see below, 
I'm interested in diversity issues from lots of angles, but I 
don't subscribe to the diversity ideology that is fashionable in 
American academia.


One D-specific question I do have: Have any women ever posted 
here? I scoured a bunch of threads here recently and couldn't 
find a female poster. By this I mean a poster whose supplied name 
was female, where a proper name was supplied (some people just 
have usernames). Of course we don't really know who is posting, 
and there could be some George Eliot situations, but the 
presence/absence of self-identified women is useful enough. Women 
are underrepresented in programming, but the skew in online 
programming communities is even more extreme – we're seeing 
near-zero percent in lots of boards. This is not a D-specific 
problem. Does anyone know of occasions where women posted here? 
Links?


Getting back to the research, recent studies have argued that one 
reason women are underrepresented in certain STEM fields is that 
smart women have more options than smart men. So think of the 
right tail of the bell curve, the men and women in that region on 
the relevant aptitudes for STEM fields. There's some evidence 
that smart women have a broader set of skills -- *on average* -- 
than equivalently smart men, perhaps including better social 
skills (or more interest in social interaction). This probably 
fits with stereotypes and intuitions a lot of people already held 
(lots of stereotypes are accurate when read as probability 
distributions and so forth).


I'm interested in monocultures and diversity issues in a number 
of domains. I've done some recent work on the lack of 
philosophical and political diversity in social science, 
particularly in social psychology, and how this has undermined 
the quality and validity of our research (here's a recent paper 
by me and my colleagues in Behavioral and Brain Sciences: 
http://dx.doi.org/10.1017/S0140525X14000430). My interest in the 
lack of gender diversity in programming is an entirely different 
research area, but there isn't much rigorous social science and 
cognitive psychology research on this topic, which surprised me. 
I think it's an important and interesting issue. I also think a 
lot of the diversity efforts that are salient in tech right now 
are acting far too late in the cycle, sort of just waiting for 
women and minorities to show up. The skew starts long before 
people graduate with a CS degree, and I think Google, Microsoft, 
Apple, Facebook, et al. should think deeply about how programming 
language design might be contributing to these effects 
(especially before they roll out any more C-like programming 
languages).


Informally, I think what's happening in many cases is that when 
smart women are exposed to programming, it looks ridiculous and 
they think something like "Screw this – I'm going to med school", 
or any of a thousand permutations of that sentiment.


Mainstream PL syntax is extremely unintuitive and poorly designed 
by known pedagogical, epistemological, and 

Re: Researcher question – what's the point of semicolons and curly braces?

2016-05-03 Thread Joe Duarte via Digitalmars-d

On Tuesday, 3 May 2016 at 04:24:37 UTC, Adam D. Ruppe wrote:

On Tuesday, 3 May 2016 at 03:48:09 UTC, Joe Duarte wrote:

Would it be difficult to compile the clean version?


You realize your bias is showing very strongly in the wording 
of this question, right? I don't agree the naked version is 
clean at all.


Fair point. I probably am biased, though I don't think an 
objective definition of clean as having less text or punctuation 
would be too controversial. Maybe compact vs verbose would be 
more objective, though those terms are usually used to refer to 
differences in amount of text/keywords, repetition, etc. (e.g. 
Python vs Java).


Re: Researcher question – what's the point of semicolons and curly braces?

2016-05-03 Thread Joe Duarte via Digitalmars-d

On Tuesday, 3 May 2016 at 04:23:48 UTC, Walter Bright wrote:

On 5/2/2016 8:48 PM, Joe Duarte wrote:
Why are curly braces and semicolons necessary? What information 
do they carry that a compiler could not otherwise reliably 
obtain?


You are correct in that they are (mostly) redundant. Some 
ambiguities can arise because D is not a whitespace delimited 
language. However, the real reasons are:


1. Redundancy in specification means the compiler can catch 
more 'typo' mistakes rather than having them compile 
successfully and then behave mysteriously. If a language has 0 
redundancy, then any 8745b48%%&*&hjdsfh string would be a valid 
program. Redundancy is a critical feature of high reliability 
languages.


Many languages have removed redundancy only to put it back in 
after bitter experience. The classic is implicit declaration of 
variables.


2. The redundancy also means the compiler can 'resync' itself 
to the input once a syntactic error is detected.


3. It's instantly familiar to those who program already in 
"curly brace" languages.


Your point about redundancy is interesting. I assume typos aren't 
random, and I wonder if anyone has researched the patterns there, 
which could inform where PL designers would want to insert 
guards/redundancy with syntax. I wonder if I could dig into this 
with GitHub and BitBucket repos. Maybe other researchers already 
have.
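
To make Walter's implicit-declaration case concrete (a toy 
example with hypothetical names):

int sum(int[] xs)
{
    int total = 0;
    foreach (x; xs)
        totl = total + x;  // D: "undefined identifier totl" at
                           // compile time; a language with implicit
                           // declarations would silently create a
                           // fresh variable and the bug would run
                           // "fine", returning 0
    return total;
}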


I'm also thinking that braces and semicolons might be satisfying 
to some (most?) programmers as an element of real or perceived 
rigor or safety, independent of the redundancy issue. For 
example, I'm a bit surprised by how popular SASS/SCSS is compared 
to Stylus (CSS preprocessors), given that SCSS requires a lot of 
braces and semicolons while Stylus requires neither and has what 
I've been calling "clean" syntax. There could be feature 
differences I don't know about, but I wonder if people feel less 
safe with plain, unadorned text.


I remember that Rob Pike explained why Go requires braces by 
recounting how at Google their tools sometimes lost or damaged 
the indentation in Python source files, breaking those programs. 
I would think that you'd just fix your tools in that case. People 
build such amazing software these days that I'm surprised there'd 
be any issue in nailing down software that handles text files 
without messing up their whitespace or other syntactic structure. 
I don't know, maybe this is a recurring challenge. In any case, 
your redundancy point stands on its own.


Researcher question – what's the point of semicolons and curly braces?

2016-05-02 Thread Joe Duarte via Digitalmars-d

Hi all,

I'm a social scientist and I'm preparing some studies on the 
effects of programming language syntax on learning, motivation to 
pursue programming, as well as any disproportionate effects that 
PL syntax has on the appeal of programming to women (more on the 
latter in a separate post).


So I want to get a better idea of the rationale for various 
syntactical design decisions, and I'm going to ask you the same 
question I'll ask the Rust community:


Why are curly braces and semicolons necessary? What information 
do they carry that a compiler could not otherwise reliably obtain?


Here's an example from the D Overview page:


class Foo
{
int foo(Bar c) { return c.bar; }
}

class Bar
{
int bar() { return 3; }
}


Okay, if we remove the curly braces and semicolons, we have:


class Foo

int foo(Bar c) return c.bar


class Bar

int bar() return 3


Would it be difficult to compile the clean version? Would there 
be issues with the design of the lexer/parser? I assume the 
compiler would recognize keywords like return (and a clean syntax 
could drive different rules for what statements and expressions 
could appear on the same line and so forth).


In reality, a compiler would see the above with line ending 
characters terminating every line (e.g. U+000A), so it would be 
as line-aware as a human. I've never built lexers or parsers, 
much less compilers, so maybe I'm missing a major implementation 
hurdle. I'm just thinking that Facebook has built software that 
recognizes my face in other people's pictures, so it seems like 
building software that understands structured text would be a 
solved problem. It puzzles me to see so much apparent punctuation 
noise in a 21st-century language (and, to be fair, Rust puzzles 
me for the same reasons).


JD


Re: Any usable SIMD implementation?

2016-05-02 Thread Joe Duarte via Digitalmars-d

On Saturday, 23 April 2016 at 10:40:12 UTC, Johan Engelen wrote:

On Monday, 18 April 2016 at 00:27:06 UTC, Joe Duarte wrote:


Someone else talked about marking "Broadwell" and other 
generation names. As others have said, it's better to specify 
features. I wanted to chime in with a couple of additional 
examples. Intel's transactional memory accelerating 
instructions (TSX) are only available on some Broadwell parts 
because there was a bug in the original implementation 
(Haswell and early Broadwell) and it's disabled on most. But 
the new Broadwell server chips have it, and it's a big deal 
for some DB workloads. Similarly, only some Skylake chips have 
the Secure Guard instructions (SGX), which are very powerful 
for creating secure enclaves on an untrusted host.


Thanks, I've seen similar comments in LLVM code.

I have a question perhaps you can comment on?
With LLVM, it is possible to specify something like 
"+sse3,-sse2" (I did not test whether this actually results in 
SSE3 instructions being used, but no SSE2 instructions). What 
should be returned when querying whether "sse3" feature is 
enabled?
Should __traits(targetHasFeature, "sse3") == true mean that 
implied features (such as sse and sse2) are also available?


If you specify SSE3, you should definitely get SSE2 and plain old 
SSE with it. SSE3 is a superset of SSE2 and includes all the SSE2 
instructions (more than 100, I think).


I'm not sure about your syntax – I thought the hyphen meant to 
include the option, not remove it, and I haven't seen the 
addition sign used for those settings. But I haven't done much 
with those optimization flags.


You wouldn't want to exclude SSE2 support because it's becoming 
the bare minimum baseline for modern systems, the de facto FP 
unit. Windows 10 requires a CPU with SSE2, as do more and more 
applications on the archaic Unix-like platforms.


Re: Any usable SIMD implementation?

2016-04-17 Thread Joe Duarte via Digitalmars-d

On Tuesday, 5 April 2016 at 10:27:46 UTC, Walter Bright wrote:
Besides, I think it's a poor design to customize the app for 
only one SIMD type. A better idea (I've repeated this ad 
nauseum over the years) is to have n modules, one for each 
supported SIMD type. Compile and link all of them in, then 
detect the SIMD type at runtime and call the corresponding 
module. (This is how the D array ops are currently implemented.)
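
Concretely, I take Walter's scheme to be something like this 
sketch (assuming core.cpuid exposes an avx flag, as I believe it 
does; kernel bodies elided):

import core.cpuid : avx;

// one implementation per supported SIMD level
void axpyScalar(float[] y, float[] x, float a)
{
    foreach (i, ref v; y) v += a * x[i];
}
void axpyAVX(float[] y, float[] x, float a) { /* AVX version */ }

__gshared void function(float[], float[], float) axpy;

shared static this()
{
    // detect the CPU once at startup and pick the matching module
    axpy = avx ? &axpyAVX : &axpyScalar;
}

void main()
{
    auto y = [1.0f, 2, 3], x = [4.0f, 5, 6];
    axpy(y, x, 2.0f);
}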


There are many organizations in the world that are building 
software in-house, where such software is targeted to modern CPU 
SIMD types, most typically AVX/AVX2 and crypto instructions.


In these settings -- many of them scientific compute or big data 
center operators -- they know what servers they have, what CPU 
platforms they have. They don't care about portability to the 
past, older computers and so forth. A runtime check would make no 
sense for them, not for their baseline, and it would probably be 
a waste of time for them to design code to run on pre-AVX 
silicon. (AVX is not new anymore -- it's been around for a few 
years.)


Good examples can be found on Cloudflare's blog, especially Vlad 
Krasnov's posts. Here's one where he accelerates Golang's crypto 
libraries: 
https://blog.cloudflare.com/go-crypto-bridging-the-performance-gap/


Companies like CF probably spend millions of dollars on 
electricity, and there are some workloads where AVX-optimized 
code can yield tangible monetary savings.


Someone else talked about marking "Broadwell" and other 
generation names. As others have said, it's better to specify 
features. I wanted to chime in with a couple of additional 
examples. Intel's transactional memory accelerating instructions 
(TSX) are only available on some Broadwell parts because there 
was a bug in the original implementation (Haswell and early 
Broadwell) and it's disabled on most. But the new Broadwell 
server chips have it, and it's a big deal for some DB workloads. 
Similarly, only some Skylake chips have the Secure Guard 
instructions (SGX), which are very powerful for creating secure 
enclaves on an untrusted host.


On the broader SIMD-as-first-class-citizen issue, I think it 
would be worth thinking about how to bake SIMD into the language 
instead of bolting it on. If I were designing a new language in 
2016, I would take a fresh look at how SIMD could be baked into a 
language's core constructs. I'd think about new loop abstractions 
that could make SIMD easier to exploit, and how to nudge 
programmers away from serial monotonic mindsets and into more of 
a SIMD/FMA way of reasoning.