Re: D is for Data Science
Happy Shabbat. Shalom. See ya.
Re: D is for Data Science
On Friday, 28 November 2014 at 22:57:31 UTC, CraigDillabaugh wrote:
> [...]
> You're the one that called it awesome! I don't think anyone here was overly excited about it, but we are always happy to see D get good press. Maybe the guy that wrote the article is just an average programmer, but hey, most of the programmers in the world are average programmers - so this article could appeal to that segment of the market.

It's not my call. It's on the right side, Twitter things. And about "awesome", it looks like you don't get my irony.
Re: D is for Data Science
CraigDillabaugh:
> Maybe not good by the standards of this group, but it does represent the efforts of someone doing 'real work', so I think it is worthwhile.

Perhaps part of the cause of the low quality of the code in that blog post is that the design of the D language is not "bondage" enough. This worries me a little, because most D code I see in the wild is not good, and looks more like a Java/C++ mix.

In Python culture there is stronger pressure to write Pythonic code, similar to the Python code written by all other Python programmers. In the Go culture this is even stronger: there is only one standard way to format code, and the language is simpler, so there is less room for alternative constructs (while in D you often have five ways to shoot yourself in the foot). From what I've seen, the Rust culture is more "bondage" than the D culture, in both the surface look of the code and its idioms, and I think this is good.

Bye,
bearophile
Re: D is for Data Science
On Friday, 28 November 2014 at 22:41:12 UTC, Tomer Rosenschtein wrote:
> [...]
> I understand why D is still underground. The guy use R, by miracle he suddently test a strong typed-compiled-lang and he concludes: "well, those compiled lang seem interesting...". Then Someone post this here, on reddit, on HackerNews... And Miracle! Everybody thinks it's awesome. Common...

You're the one that called it awesome! I don't think anyone here was overly excited about it, but we are always happy to see D get good press.

Maybe the guy that wrote the article is just an average programmer, but hey, most of the programmers in the world are average programmers - so this article could appeal to that segment of the market.
Re: D is for Data Science
On Friday, 28 November 2014 at 22:31:19 UTC, CraigDillabaugh wrote:
> [...]
> Maybe not good by the standards of this group, but it does represent the efforts of someone doing 'real work', so I think it is worthwhile. I would bet that 'in the wild' there is a lot more D code that looks like that than what might be considered good, idiomatic D.
>
> Craig

I understand why D is still underground. The guy uses R, by miracle he suddenly tests a strongly typed compiled language, and he concludes: "well, those compiled langs seem interesting...". Then someone posts this here, on reddit, on HackerNews... And miracle! Everybody thinks it's awesome. Come on...
Re: D is for Data Science
On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:
> Tomer Rosenschtein:
>> Awesome article. "Paper of the week" is a modest word for this.
>
> The D code is not good.
>
> Bye,
> bearophile

Maybe not good by the standards of this group, but it does represent the efforts of someone doing 'real work', so I think it is worthwhile. I would bet that 'in the wild' there is a lot more D code that looks like that than what might be considered good, idiomatic D.

Craig
Re: D is for Data Science
On Friday, 28 November 2014 at 22:18:09 UTC, Tomer Rosenschtein wrote:
> [...]
> But it was worth a reddit and hackerNews redirection: "look at that this fuckin genious who understand everthing"
>
> Btw he's not so clever but we promote this paper because we love "papers"

OMG a new blog post about D! Mazel tov. I spread it, even if the guy is stupid.
Re: D is for Data Science
On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:
> Tomer Rosenschtein:
>> Awesome article. "Paper of the week" is a modest word for this.
>
> The D code is not good.
>
> Bye,
> bearophile

But it was worth a reddit and HackerNews redirection: "look at this fuckin genius who understands everything".

Btw he's not so clever, but we promote this paper because we love "papers".
Re: D is for Data Science
Tomer Rosenschtein:
> Awesome article. "Paper of the week" is a modest word for this.

The D code is not good.

Bye,
bearophile
Re: D is for Data Science
On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby wrote:
> Just browsing reddit and found this article posted about D. Written by Andrew Pascoe of AdRoll.
> [...]
> Article: http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html
> Reddit: http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/

Awesome article. "Paper of the week" is a modest word for this.
Re: D is for Data Science
On Friday, 28 November 2014 at 12:06:06 UTC, Chris wrote:
> [...]
> Believe it or not, modeling power, often overlooked, is one of the key features of programming languages of the future. Performance can always be improved. But modeling power is hard to add, if you don't have it already. Libraries, well, if you have strong modeling power, you can roll your own very quickly. Maybe an abundance of libraries is a sign that a language lacks modeling power.

About the article, it proves two things. First, you can easily roll your own in D. Second, you have to know the language well to be able to get the most out of it without having to roll your own.[1] Either way, it improves your general understanding of programming.

[1] This includes not hesitating to ask questions on D.learn.
Re: D is for Data Science
On Tuesday, 25 November 2014 at 13:24:04 UTC, ketmar via Digitalmars-d-announce wrote:
> [...]
> D wins, 'cause i *almost* stopped ranting (not only in this NG) and just accepting it as is. well, almost as is, i'm applying alot of patches over vanilla D. this, of course, makes my code incompatible with every other D compiler out here, but luckily this is not a concern anymore.

"just accepting it as is" - Well, there's no need to do that. If there are issues, you're free to comment on them, make a feature request and/or fix them yourself. Everybody accepts any language "as is" as long as it's a mainstream language, regardless of any shortcomings or major annoyances. Your comment proves just that.

Just this week I was working on new software and I'm still amazed at how many options I have in D (and I keep discovering new options). D is always compared to C++ in terms of performance and libraries. Sure, there are more libraries (and by extension programmers) out there for C++. Performance might be better or worse, depending on the library and the programmer. However, the sheer abundance of options and modeling power in D is one of the reasons I stick with D.

I deal with problems concerning language processing (grammar, rules etc.), i.e. mapping the human mind to a machine, and D always gives me a way to model complex and intricate systems. Sometimes I look at the code and think "How would I have implemented this in C, Python or Java?" I shudder and say "No way!"

Believe it or not, modeling power, often overlooked, is one of the key features of programming languages of the future. Performance can always be improved. But modeling power is hard to add, if you don't have it already. Libraries, well, if you have strong modeling power, you can roll your own very quickly. Maybe an abundance of libraries is a sign that a language lacks modeling power.
Re: D is for Data Science
On 28 November 2014 at 06:40, Daniel Murphy via Digitalmars-d-announce wrote:
> "weaselcat" wrote in message news:rnlbybkfqokypxlgf...@forum.dlang.org...
>
>> I see array.sort is planned for future deprecation, what does "future"
>> fall under?
>
> Generally 'future deprecation' means at least 6 months after it gets turned
> into a warning. Often it's significantly longer, because nobody bothers to
> update it after six months have passed.

1 year down the line, someone notices the "deprecated, planned removal in Nov 2014" comment, and bumps the removal date to Nov 2015. :-)
Re: D is for Data Science
"weaselcat" wrote in message news:rnlbybkfqokypxlgf...@forum.dlang.org... I see array.sort is planned for future deprecation, what does "future" fall under? Generally 'future deprecation' means at least 6 months after it gets turned into a warning. Often it's significantly longer, because nobody bothers to update it after six months have passed.
Re: D is for Data Science
On Mon, 24 Nov 2014 17:10:25 -0800 Walter Bright via Digitalmars-d-announce wrote:
> I know it's a tough call. But I do see these sorts of comments regularly, and
> it is a fact that there are too many D libraries gone to seed that won't
> compile anymore, and that makes us look bad.

but D wins overall. being one of the architects in my business i was eagerly pushing D as our main development language. it's good that this thing (and some others too) happened before i succeeded. now we keep going with C++, as it fscks safety too, fscks the principle of least astonishment, almost never fixes inconsistencies, but it has a lot more libraries and i can hire a lot more programmers with it.

i'm still using D as a language for my hobbyist throw-away projects though, and D is great for such things.

D wins, 'cause i *almost* stopped ranting (not only in this NG) and am just accepting it as is. well, almost as is, i'm applying a lot of patches over vanilla D. this, of course, makes my code incompatible with every other D compiler out here, but luckily this is not a concern anymore.
Re: D is for Data Science
weaselcat:
> I see array.sort is planned for future deprecation, what does "future" fall under?

For those of us who activate warnings in dmd (because of a design mistake they are disabled by default, but hopefully this will be fixed in the future), the latest GitHub version of the compiler gives a warning if you use the built-in "sort" and "reverse". Unfortunately the library "reverse" still needs to be fixed so that it returns the array, as the built-in "reverse" does.

Bye,
bearophile
Re: D is for Data Science
On Tuesday, 25 November 2014 at 01:10:56 UTC, Walter Bright wrote:
> I know it's a tough call. But I do see these sorts of comments regularly, and it is a fact that there are too many D libraries gone to seed that won't compile anymore, and that makes us look bad.

Or this:
https://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/cmbssac

It was the endless std.logger bikeshedding that finally did me in. Even if they get it into std.experimental in the next release, I'm finally done. I cancelled my projects and pulled them off dub.

Is this a much better reason?
Re: D is for Data Science
On Tuesday, 25 November 2014 at 01:10:56 UTC, Walter Bright wrote:
> [...]
> I know it's a tough call. But I do see these sorts of comments regularly, and it is a fact that there are too many D libraries gone to seed that won't compile anymore, and that makes us look bad.

If that's the problem, it's time to go ahead with explicit support for the work done in dfix, no?

It's not a silver bullet, but it's a clear indication to potential adopters that there's a plan, and it actively signals that we definitely care about that particular issue, which is common to every language.

---
/Paolo
Re: D is for Data Science
With algorithm.sort the deciles bench from the article runs twice as fast (it's in the reddit thread).

I see array.sort is planned for future deprecation, what does "future" fall under?
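For readers who have not seen the two spellings side by side, here is a minimal sketch of calling the library sort as a function rather than relying on the built-in array property. The `deciles` helper and its nearest-rank indexing are assumptions made for illustration; this is not the code from the article or from the reddit thread, and it says nothing about the actual speedup.

```d
import std.algorithm : sort;
import std.stdio : writeln;

/// Hypothetical helper: returns nine decile cut points of `values`.
/// Sorts the array in place; the nearest-rank indexing is an assumption.
double[] deciles(double[] values)
{
    assert(values.length > 0, "need at least one value");
    sort(values); // std.algorithm.sort, called as a plain function
    double[] cuts;
    foreach (i; 1 .. 10)
        cuts ~= values[values.length * i / 10];
    return cuts;
}

void main()
{
    auto data = [10.0, 9, 8, 7, 6, 5, 4, 3, 2, 1];
    writeln(deciles(data));
}
```

Writing `sort(values)` (or `values.sort()` with explicit parentheses) makes it unambiguous that the Phobos function is meant.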
Re: D is for Data Science
On 11/24/2014 4:50 PM, Adam D. Ruppe wrote:
> I would caution against putting very much weight in Reddit opinions - there's people who will never use D and just look for excuses to justify their prejudice and there's people who think they want something, but don't really have any idea (this is common in feature requests, as I'm sure you know)
>
> That comment, in particular, seems very questionable to me. dstats at least compiles out of the box and has github activity within the last few months. It has a lot of templates, so maybe actually using it would reveal compilation problems, but at quick glance it seems to work.

I know it's a tough call. But I do see these sorts of comments regularly, and it is a fact that there are too many D libraries gone to seed that won't compile anymore, and that makes us look bad.
Re: D is for Data Science
On Tuesday, 25 November 2014 at 00:34:30 UTC, Walter Bright wrote:
> Thought I'd post this as a counterpoint to the recent "please break our code" thread.

I would caution against putting very much weight in Reddit opinions - there's people who will never use D and just look for excuses to justify their prejudice, and there's people who think they want something but don't really have any idea (this is common in feature requests, as I'm sure you know).

That comment, in particular, seems very questionable to me. dstats at least compiles out of the box and has GitHub activity within the last few months. It has a lot of templates, so maybe actually using it would reveal compilation problems, but at a quick glance it seems to work.
Re: D is for Data Science
On 11/24/2014 7:27 AM, Gary Willoughby wrote:
> Just browsing reddit and found this article posted about D.

https://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/cmbn83i

Thought I'd post this as a counterpoint to the recent "please break our code" thread.
Re: D is for Data Science
25-Nov-2014 02:43, bearophile wrote:
> Dmitry Olshansky:
>> Which is 1:1 parity. Another myth busted? ;)

> dmitry@Ubu64 ~ $ time ./my2 log
>
> real    0m0.065s
> user    0m0.042s
> sys     0m0.023s
> dmitry@Ubu64 ~ $ time ./my2 log
>
> real    0m0.063s
> user    0m0.040s
> sys     0m0.023s

Read the above more carefully. OMG. I really need to watch my fingers, and double-check :)

dmitry@Ubu64 ~ $ time ./my log
real    0m0.156s
user    0m0.130s
sys     0m0.026s
dmitry@Ubu64 ~ $ time ./my2 log
real    0m0.063s
user    0m0.040s
sys     0m0.023s

Which is quite bad. Optimizations do help but not much.

> There is still an open bug report:
> https://issues.dlang.org/show_bug.cgi?id=11810
>
> Do you want also to benchmark that byLineFast that for me is usually significantly faster than the byLine?

And it seems like byLineFast is indeed fast.

dmitry@Ubu64 ~ $ time ./my3 log
real    0m0.056s
user    0m0.031s
sys     0m0.025s
dmitry@Ubu64 ~ $ time ./my2 log
real    0m0.065s
user    0m0.041s
sys     0m0.024s

Now that I have been destroyed, the question is: who is going to make a PR of this?

-- 
Dmitry Olshansky
Re: D is for Data Science
Dmitry Olshansky:
> Which is 1:1 parity. Another myth busted? ;)

There is still an open bug report:
https://issues.dlang.org/show_bug.cgi?id=11810

Do you also want to benchmark byLineFast, which for me is usually significantly faster than byLine?

Bye,
bearophile
Re: D is for Data Science
25-Nov-2014 01:28, bearophile wrote:
> Dmitry Olshansky:
>>> Why is File.byLine so slow?
>>
>> Seems to be mostly fixed sometime ago.
>
> Really? I am not so sure.
>
> Bye,
> bearophile

I too had suspected it in the past, and then I tested it. Now I test it again; it's always easier to check than to argue.

Two minimal programs:

    //my.d:
    import std.stdio;

    void main(string[] args) {
        auto file = File(args[1], "r");
        size_t cnt=0;
        foreach(char[] line; file.byLine()) {
            cnt++;
        }
    }

    //my2.d
    import core.stdc.stdio;

    void main(string[] args) {
        char[] buf = new char[32768];
        size_t cnt;
        shared(FILE)* file = fopen(args[1].ptr, "r");
        while(fgets(buf.ptr, cast(int)buf.length, file) != null){
            cnt++;
        }
        fclose(file);
    }

In the console session below, the log file is my dmesg log replicated many times (34 megs total).

dmitry@Ubu64 ~ $ wc -l log
522240 log
dmitry@Ubu64 ~ $ du -hs log
34M log

# touch it, to have it in disk cache:
dmitry@Ubu64 ~ $ cat log > /dev/null

dmitry@Ubu64 ~ $ dmd my
dmitry@Ubu64 ~ $ dmd my2
dmitry@Ubu64 ~ $ time ./my2 log
real    0m0.062s
user    0m0.039s
sys     0m0.023s
dmitry@Ubu64 ~ $ time ./my log
real    0m0.181s
user    0m0.155s
sys     0m0.025s

~4 times in user mode, okay... Now with full optimizations, since ranges are very sensitive to optimizations:

dmitry@Ubu64 ~ $ dmd -O -release -inline my
dmitry@Ubu64 ~ $ dmd -O -release -inline my2
dmitry@Ubu64 ~ $ time ./my2 log
real    0m0.065s
user    0m0.042s
sys     0m0.023s
dmitry@Ubu64 ~ $ time ./my2 log
real    0m0.063s
user    0m0.040s
sys     0m0.023s

Which is 1:1 parity. Another myth busted? ;)

-- 
Dmitry Olshansky
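A side note on the C-style program above: it passes `args[1].ptr` straight to `fopen`, which works only because that particular string happens to be zero-terminated; D slices carry no such guarantee in general. The following is a hedged sketch of the same fgets loop wrapped in a function, using `std.string.toStringz` and a null check. The name `countLines` and the temporary demo file are hypothetical, and this is not the code that produced the timings in this thread.

```d
import core.stdc.stdio : fclose, fgets, fopen;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;
import std.string : toStringz;

/// Hypothetical helper: counts fgets reads, as the benchmark loop does.
/// A line longer than the buffer would be counted more than once.
size_t countLines(string path)
{
    auto file = fopen(path.toStringz, "r"); // explicit zero termination
    if (file is null)
        throw new Exception("cannot open " ~ path);
    scope (exit) fclose(file);

    auto buf = new char[32768];
    size_t cnt;
    while (fgets(buf.ptr, cast(int) buf.length, file) !is null)
        ++cnt;
    return cnt;
}

void main()
{
    // Demo file created only for this sketch.
    auto path = buildPath(tempDir(), "countlines_demo.txt");
    write(path, "one\ntwo\nthree\n");
    scope (exit) remove(path);
    writeln(countLines(path));
}
```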
Re: D is for Data Science
On Monday, 24 November 2014 at 23:32:14 UTC, Jay Norwood wrote:
> Is this related?
>
> https://github.com/dscience-developers/dscience

This seems good too. Why the comments in the discussion about lack of libraries?

https://github.com/kyllingstad/scid/wiki
Re: D is for Data Science
On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby wrote:
> Just browsing reddit and found this article posted about D. Written by Andrew Pascoe of AdRoll.
> [...]
> Article: http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html
> Reddit: http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/

Is this related?

https://github.com/dscience-developers/dscience
Re: D is for Data Science
On 11/24/2014 2:25 PM, Dmitry Olshansky wrote: [...] Excellent comments. Please post them on the reddit page!
Re: D is for Data Science
Dmitry Olshansky:
>> Why is File.byLine so slow?
>
> Seems to be mostly fixed sometime ago.

Really? I am not so sure.

Bye,
bearophile
Re: D is for Data Science
25-Nov-2014 00:34, weaselcat wrote:
> On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby wrote:
>> Just browsing reddit and found this article posted about D. Written by Andrew Pascoe of AdRoll.
>> [...]
>> Article: http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html

Quoting the article:

> One of the best things we can do is minimize the amount of memory we're allocating; we allocate a new char[] every time we read a line.

This is wrong. byLine reuses its buffer if it is mutable, which is the case with char[]. I recommend that authors always double-check a hypothesis before stating it in an article, especially about performance.

Observe:
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1660
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1652

And notice the warning about reusing the buffer here:
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1741

>> Reddit: http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/
>
> Why is File.byLine so slow?

Seems to be mostly fixed some time ago. It's slower than straight fgets but it's not that bad.

Also, a nearly optimal solution using C's fgets with a growable buffer is way simpler than the code outlined in the article. Or we can mmap the file too.

> Having to work around the standard library defeats the point of a standard library.

Truth be told, most of the slowdown should be in the eager split, notably with a GC allocation per line. It may also trigger a GC collection after splitting many lines, maybe even many collections.

The easy way out is to use the standard _splitter_, which is lazy and non-allocating. Which is a _2-letter_ change, and still uses a nice clean standard function.

The article was really disappointing for me because I expected to see the single-line change outlined above fix 80% of the problem elegantly. Instead I observe 100+ spooky lines that needlessly maintain 3 buffers at the same time (how scientific) instead of growing a single one to amortize the cost. And then a claim that it's nice to be able to improve speed so easily.

-- 
Dmitry Olshansky
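To make the split versus splitter point concrete, here is a small sketch under stated assumptions: the tab separator, the helper names `countFieldsEager` and `countFieldsLazy`, and the temporary demo file are all hypothetical and are not taken from the article. It only illustrates that `std.array.split` builds a new array of slices for every line, while `std.algorithm.splitter` walks the line lazily over the buffer that byLine reuses.

```d
import std.algorithm : splitter;
import std.array : split;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.range : walkLength;
import std.stdio : File, writeln;

/// Eager variant: `split` allocates an array of slices for each line.
size_t countFieldsEager(string path)
{
    size_t total;
    foreach (line; File(path).byLine())
        total += line.split('\t').length;
    return total;
}

/// Lazy variant: `splitter` yields slices of the reused line buffer
/// one at a time, without building an intermediate array.
size_t countFieldsLazy(string path)
{
    size_t total;
    foreach (line; File(path).byLine())
        total += line.splitter('\t').walkLength;
    return total;
}

void main()
{
    // Demo file created only for this sketch.
    auto path = buildPath(tempDir(), "splitter_demo.tsv");
    write(path, "a\tb\tc\nd\te\n");
    scope (exit) remove(path);

    assert(countFieldsEager(path) == countFieldsLazy(path));
    writeln(countFieldsLazy(path));
}
```

Because byLine reuses its buffer, slices produced by either variant are only valid until the next line is read; anything that must outlive the loop iteration needs a copy (for example with `.dup` or `.idup`).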
Re: D is for Data Science
On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby wrote:
> Just browsing reddit and found this article posted about D. Written by Andrew Pascoe of AdRoll.
> [...]
> Article: http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html
> Reddit: http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/

Why is File.byLine so slow? Having to work around the standard library defeats the point of a standard library.
Re: D is for Data Science - reddit discussion
I hadn't noticed that it was already posted. Sorry about that. The discussion is here: http://forum.dlang.org/thread/qeyftagcvkhjjeeba...@forum.dlang.org
D is for Data Science - reddit discussion
D is for Data Science by Andrew Pascoe http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/
D is for Data Science
Just browsing reddit and found this article posted about D. Written by Andrew Pascoe of AdRoll. From the article: "The D programming language has quickly become our language of choice on the Data Science team for any task that requires efficiency, and is now the keystone language for our critical infrastructure. Why? Because D has a lot to offer." Article: http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html Reddit: http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/