Re: D is for Data Science

2014-11-28 Thread Tomer Rosenschtein via Digitalmars-d-announce

happy shabbat.
Shallom. See ya.



Re: D is for Data Science

2014-11-28 Thread Tomer Rosenschtein via Digitalmars-d-announce
On Friday, 28 November 2014 at 22:57:31 UTC, CraigDillabaugh 
wrote:
On Friday, 28 November 2014 at 22:41:12 UTC, Tomer Rosenschtein 
wrote:
On Friday, 28 November 2014 at 22:31:19 UTC, CraigDillabaugh 
wrote:

On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:

Tomer Rosenschtein:


Awesome article.
"Paper of the week" is a modest word for this.


The D code is not good.

Bye,
bearophile


Maybe not good by the standards of this group, but it does 
represent the efforts of someone doing 'real work', so I 
think it is worthwhile.


I would bet that 'in the wild' there is a lot more D code 
that looks like that than what might be considered good, 
idiomatic D.


Craig


I understand why D is still underground. The guy uses R, by some 
miracle he suddenly tests a strongly typed compiled language and he 
concludes: "well, those compiled lang seem interesting...".

Then someone posts this here, on reddit, on HackerNews...

And miracle!

Everybody thinks it's awesome.
Come on...


You're the one that called it awesome!  I don't think anyone 
here was overly excited about it, but we are always happy to 
see D get good press.


Maybe the guy who wrote the article is just an average 
programmer, but hey most of the programmers in the world are 
average programmers - so this article could appeal to that 
segment of the market.


It's not my call. It's on the right side, the Twitter thing.
And about "awesome", it looks like you don't get my irony.



Re: D is for Data Science

2014-11-28 Thread bearophile via Digitalmars-d-announce

CraigDillabaugh:

Maybe not good by the standards of this group, but it does 
represent the efforts of someone doing 'real work', so I think 
it is worthwhile.


Perhaps part of the cause of the low quality of the code in that 
blog post is that the design of the D language is not "bondage" enough. 
This worries me a little, because most D code I see in the wild 
is not good, and looks more like a Java/C++ mix. In Python 
culture there is a stronger pressure to write Pythonic code 
similar to Python code written by all other Python programmers. 
In the Go culture this is even stronger, there's even only one 
standard way to format code, and the language is simpler so there 
is less possibility for usage of alternative constructs (while in 
D you often have five ways to shoot yourself in the foot). From what I've 
seen the Rust culture is more "bondage" than D culture, in both 
surface look of code, and idioms, and I think this is good.


Bye,
bearophile


Re: D is for Data Science

2014-11-28 Thread CraigDillabaugh via Digitalmars-d-announce
On Friday, 28 November 2014 at 22:41:12 UTC, Tomer Rosenschtein 
wrote:
On Friday, 28 November 2014 at 22:31:19 UTC, CraigDillabaugh 
wrote:

On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:

Tomer Rosenschtein:


Awesome article.
"Paper of the week" is a modest word for this.


The D code is not good.

Bye,
bearophile


Maybe not good by the standards of this group, but it does 
represent the efforts of someone doing 'real work', so I think 
it is worthwhile.


I would bet that 'in the wild' there is a lot more D code that 
looks like that than what might be considered good, idiomatic 
D.


Craig


I understand why D is still underground. The guy uses R, by some 
miracle he suddenly tests a strongly typed compiled language and he 
concludes: "well, those compiled lang seem interesting...".

Then someone posts this here, on reddit, on HackerNews...

And miracle!

Everybody thinks it's awesome.
Come on...


You're the one that called it awesome!  I don't think anyone here 
was overly excited about it, but we are always happy to see D get 
good press.


Maybe the guy who wrote the article is just an average 
programmer, but hey most of the programmers in the world are 
average programmers - so this article could appeal to that 
segment of the market.


Re: D is for Data Science

2014-11-28 Thread Tomer Rosenschtein via Digitalmars-d-announce
On Friday, 28 November 2014 at 22:31:19 UTC, CraigDillabaugh 
wrote:

On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:

Tomer Rosenschtein:


Awesome article.
"Paper of the week" is a modest word for this.


The D code is not good.

Bye,
bearophile


Maybe not good by the standards of this group, but it does 
represent the efforts of someone doing 'real work', so I think 
it is worthwhile.


I would bet that 'in the wild' there is a lot more D code that 
looks like that than what might be considered good, idiomatic D.


Craig


I understand why D is still underground. The guy uses R, by some 
miracle he suddenly tests a strongly typed compiled language and he 
concludes: "well, those compiled lang seem interesting...".

Then someone posts this here, on reddit, on HackerNews...

And miracle!

Everybody thinks it's awesome.
Come on...


Re: D is for Data Science

2014-11-28 Thread CraigDillabaugh via Digitalmars-d-announce

On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:

Tomer Rosenschtein:


Awesome article.
"Paper of the week" is a modest word for this.


The D code is not good.

Bye,
bearophile


Maybe not good by the standards of this group, but it does 
represent the efforts of someone doing 'real work', so I think it 
is worthwhile.


I would bet that 'in the wild' there is a lot more D code that 
looks like that than what might be considered good, idiomatic D.


Craig


Re: D is for Data Science

2014-11-28 Thread Tomer Rosenschtein via Digitalmars-d-announce
On Friday, 28 November 2014 at 22:18:09 UTC, Tomer Rosenschtein 
wrote:

On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:

Tomer Rosenschtein:


Awesome article.
"Paper of the week" is a modest word for this.


The D code is not good.

Bye,
bearophile


But it was worth a reddit and HackerNews redirection:

"look at this fuckin genius who understands everything"

Btw he's not so clever but we promote this paper because we 
love "papers"


OMG a new blog post about D!
Mazeltov.
I spread it, even if the guy is stupid.



Re: D is for Data Science

2014-11-28 Thread Tomer Rosenschtein via Digitalmars-d-announce

On Friday, 28 November 2014 at 22:00:21 UTC, bearophile wrote:

Tomer Rosenschtein:


Awesome article.
"Paper of the week" is a modest word for this.


The D code is not good.

Bye,
bearophile


But it was worth a reddit and HackerNews redirection:

"look at this fuckin genius who understands everything"

Btw he's not so clever but we promote this paper because we love 
"papers"




Re: D is for Data Science

2014-11-28 Thread bearophile via Digitalmars-d-announce

Tomer Rosenschtein:


Awesome article.
"Paper of the week" is a modest word for this.


The D code is not good.

Bye,
bearophile


Re: D is for Data Science

2014-11-28 Thread Tomer Rosenschtein via Digitalmars-d-announce
On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby 
wrote:

Just browsing reddit and found this article posted about D.
Written by Andrew Pascoe of AdRoll.

From the article:
"The D programming language has quickly become our language of 
choice on the Data Science team for any task that requires 
efficiency, and is now the keystone language for our critical 
infrastructure. Why? Because D has a lot to offer."


Article:
http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html

Reddit:
http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/


Awesome article.
"Paper of the week" is a modest word for this.



Re: D is for Data Science

2014-11-28 Thread Chris via Digitalmars-d-announce

On Friday, 28 November 2014 at 12:06:06 UTC, Chris wrote:
On Tuesday, 25 November 2014 at 13:24:04 UTC, ketmar via 
Digitalmars-d-announce wrote:

On Mon, 24 Nov 2014 17:10:25 -0800
Walter Bright via Digitalmars-d-announce
 wrote:

I know it's a tough call. But I do see these sorts of 
comments regularly, and it is a fact that there are too many 
D libraries gone to seed that won't compile anymore, and that 
makes us look bad.

but D wins in overall. being one of the architects in my 
business i was eagerly pushing D as our main development 
language. it's good that this thing (and some others too) 
happened before i succeeded. now we keep going with C++, as it 
fscks safety too, fscks principle of least astonishment, almost 
never fixes inconsistencies, but it has a lot more libraries and 
i can hire a lot more programmers with it. i'm still using D as 
a language for my hobbyist throw-away projects though, and D is 
great for such things. D wins, 'cause i *almost* stopped 
ranting (not only in this NG) and just accepting it as is. 
well, almost as is, i'm applying a lot of patches over vanilla 
D. this, of course, makes my code incompatible with every other 
D compiler out here, but luckily this is not a concern anymore.


"just accepting it as is" - Well, there's no need to do that. 
If there are issues, you're free to comment on them, make a 
feature request and/or fix them yourself. Everybody accepts any 
language "as is" as long as it's a mainstream language, 
regardless of any shortcomings or major annoyances. Your 
comment proves just that.


Just this week I was working on new software and I'm still 
amazed at how many options I have in D (and I keep discovering 
new options). D is always compared to C++ in terms of 
performance and libraries. Sure, there are more libraries (and 
by extension programmers) out there for C++. Performance might 
be better or worse, depending on the library and the 
programmer. However, the sheer abundance of options and 
modeling power in D is one of the reasons I stick with D. I 
deal with problems concerning language processing (grammar, 
rules etc.), i.e. mapping the human mind to a machine, and D 
always gives me a way to model complex and intricate systems. 
Sometimes I look at the code and think "How would I have 
implemented this in C, Python or Java?" I shudder and say "No 
way!" Believe it or not, modeling power, often overlooked, is 
one of the key features of programming languages of the future. 
Performance can always be improved. But modeling power is hard 
to add, if you don't have it already. Libraries, well, if you 
have strong modeling power, you can roll your own very quickly. 
Maybe an abundance of libraries is a sign that a language lacks 
modeling power.


About the article, it proves two things. First, you can easily 
roll your own in D. Second, you have to know the language well to 
be able to get the most out of it without having to roll your 
own.[1] Either way, it improves your general understanding of 
programming.


[1] This includes not hesitating to ask questions on D.learn.


Re: D is for Data Science

2014-11-28 Thread Chris via Digitalmars-d-announce
On Tuesday, 25 November 2014 at 13:24:04 UTC, ketmar via 
Digitalmars-d-announce wrote:

On Mon, 24 Nov 2014 17:10:25 -0800
Walter Bright via Digitalmars-d-announce
 wrote:

I know it's a tough call. But I do see these sorts of comments 
regularly, and it is a fact that there are too many D 
libraries gone to seed that won't compile anymore, and that 
makes us look bad.

but D wins in overall. being one of the architects in my 
business i was eagerly pushing D as our main development 
language. it's good that this thing (and some others too) 
happened before i succeeded. now we keep going with C++, as it 
fscks safety too, fscks principle of least astonishment, almost 
never fixes inconsistencies, but it has a lot more libraries and 
i can hire a lot more programmers with it. i'm still using D as 
a language for my hobbyist throw-away projects though, and D is 
great for such things. D wins, 'cause i *almost* stopped 
ranting (not only in this NG) and just accepting it as is. 
well, almost as is, i'm applying a lot of patches over vanilla 
D. this, of course, makes my code incompatible with every other 
D compiler out here, but luckily this is not a concern anymore.


"just accepting it as is" - Well, there's no need to do that. If 
there are issues, you're free to comment on them, make a feature 
request and/or fix them yourself. Everybody accepts any language 
"as is" as long as it's a mainstream language, regardless of any 
shortcomings or major annoyances. Your comment proves just that.


Just this week I was working on new software and I'm still amazed 
at how many options I have in D (and I keep discovering new 
options). D is always compared to C++ in terms of performance and 
libraries. Sure, there are more libraries (and by extension 
programmers) out there for C++. Performance might be better or 
worse, depending on the library and the programmer. However, the 
sheer abundance of options and modeling power in D is one of the 
reasons I stick with D. I deal with problems concerning language 
processing (grammar, rules etc.), i.e. mapping the human mind to 
a machine, and D always gives me a way to model complex and 
intricate systems. Sometimes I look at the code and think "How 
would I have implemented this in C, Python or Java?" I shudder 
and say "No way!" Believe it or not, modeling power, often 
overlooked, is one of the key features of programming languages 
of the future. Performance can always be improved. But modeling 
power is hard to add, if you don't have it already. Libraries, 
well, if you have strong modeling power, you can roll your own 
very quickly. Maybe an abundance of libraries is a sign that a 
language lacks modeling power.


Re: D is for Data Science

2014-11-28 Thread Iain Buclaw via Digitalmars-d-announce
On 28 November 2014 at 06:40, Daniel Murphy via Digitalmars-d-announce
 wrote:
> "weaselcat"  wrote in message news:rnlbybkfqokypxlgf...@forum.dlang.org...
>
>> I see array.sort is planned for future deprecation, what does "future"
>> fall under?
>
>
> Generally 'future deprecation' means at least 6 months after it gets turned
> into a warning.  Often it's significantly longer, because nobody bothers to
> update it after six months have passed.

1 year down the line, someone notices the "deprecated, planned removal
in Nov 2014" comment, and bumps the removal date to Nov 2015.  :-)


Re: D is for Data Science

2014-11-27 Thread Daniel Murphy via Digitalmars-d-announce

"weaselcat"  wrote in message news:rnlbybkfqokypxlgf...@forum.dlang.org...

I see array.sort is planned for future deprecation, what does "future" 
fall under?


Generally 'future deprecation' means at least 6 months after it gets turned 
into a warning.  Often it's significantly longer, because nobody bothers to 
update it after six months have passed. 



Re: D is for Data Science

2014-11-25 Thread ketmar via Digitalmars-d-announce
On Mon, 24 Nov 2014 17:10:25 -0800
Walter Bright via Digitalmars-d-announce
 wrote:

> I know it's a tough call. But I do see these sorts of comments regularly, and
> it is a fact that there are too many D libraries gone to seed that won't compile
> anymore, and that makes us look bad.

but D wins in overall. being one of the architects in my business i
was eagerly pushing D as our main development language. it's good that
this thing (and some others too) happened before i succeeded. now we keep
going with C++, as it fscks safety too, fscks principle of least
astonishment, almost never fixes inconsistencies, but it has a lot more
libraries and i can hire a lot more programmers with it. i'm still using
D as a language for my hobbyist throw-away projects though, and D is
great for such things. D wins, 'cause i *almost* stopped ranting (not
only in this NG) and just accepting it as is. well, almost as is, i'm
applying a lot of patches over vanilla D. this, of course, makes my code
incompatible with every other D compiler out here, but luckily this is
not a concern anymore.




Re: D is for Data Science

2014-11-25 Thread bearophile via Digitalmars-d-announce

weaselcat:

I see array.sort is planned for future deprecation, what does 
"future" fall under?


For those of us who enable warnings in dmd (because of a design 
mistake they are disabled by default, but hopefully this will be 
fixed in future), the latest github version of the compiler 
gives a warning if you use the built-in "sort" and "reverse". 
Unfortunately the library "reverse" still needs to be fixed to 
return the array, as the built-in "reverse" does.


Bye,
bearophile


Re: D is for Data Science

2014-11-25 Thread Kagamin via Digitalmars-d-announce

On Tuesday, 25 November 2014 at 01:10:56 UTC, Walter Bright wrote:
I know it's a tough call. But I do see these sorts of comments 
regularly, and it is a fact that there are too many D libraries 
gone to seed that won't compile anymore, and that makes us look 
bad.


Or this: 
https://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/cmbssac
It was the endless std.logger bikeshedding that finally did me 
in. Even if they get it into std.experimental in the next 
release, I'm finally done. I cancelled my projects and pulled 
them off dub.


Is this a much better reason?


Re: D is for Data Science

2014-11-25 Thread Paolo Invernizzi via Digitalmars-d-announce

On Tuesday, 25 November 2014 at 01:10:56 UTC, Walter Bright wrote:

On 11/24/2014 4:50 PM, Adam D. Ruppe wrote:
On Tuesday, 25 November 2014 at 00:34:30 UTC, Walter Bright 
wrote:
Thought I'd post this as a counterpoint to the recent "please 
break our code"

thread.


I would caution against putting very much weight in Reddit 
opinions - there's
people who will never use D and just look for excuses to 
justify their prejudice
and there's people who think they want something, but don't 
really have any idea

(this is common in feature requests, as I'm sure you know)

That comment, in particular, seems very questionable to me. 
dstats at least
compiles out of the box and has github activity within the 
last few months. It
has a lot of templates, so maybe actually using it would 
reveal compilation

problems, but at quick glance it seems to work.


I know it's a tough call. But I do see these sorts of comments 
regularly, and it is a fact that there are too many D libraries 
gone to seed that won't compile anymore, and that makes us look 
bad.


If that's the problem, it's time to go ahead with explicit 
support for the work done in dfix, no?
It's not a silver bullet, but it's a clear indication to 
potential adopters that there's a plan, and it actively shows 
that we definitely "care" about that particular issue, which is 
common to every language.


---
/Paolo


Re: D is for Data Science

2014-11-24 Thread weaselcat via Digitalmars-d-announce
With algorithm.sort the deciles bench from the article runs twice 
as fast (it's in the reddit thread).
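
A minimal sketch of the library sort being referred to, assuming the samples sit in a plain double[]. The helper name sortedCopy is hypothetical and is not taken from the article or the reddit thread. As far as I understand, calling sort as a function (with parentheses) selects the Phobos version from std.algorithm rather than the old built-in array property.

```d
import std.algorithm : sort;

/// Hypothetical helper: returns a sorted copy and leaves the input untouched.
double[] sortedCopy(const double[] values)
{
    auto copy = values.dup;   // mutable copy of the input
    sort(copy);               // Phobos sort, works in place on the copy
    return copy;
}

void main()
{
    auto data = [3.0, 1.0, 2.0];
    assert(sortedCopy(data) == [1.0, 2.0, 3.0]);
    assert(data == [3.0, 1.0, 2.0]);   // original order is preserved
}
```

Sorting a copy is only a choice made for the example; a benchmark would more likely sort in place.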


I see array.sort is planned for future deprecation, what does 
"future" fall under?


Re: D is for Data Science

2014-11-24 Thread Walter Bright via Digitalmars-d-announce

On 11/24/2014 4:50 PM, Adam D. Ruppe wrote:

On Tuesday, 25 November 2014 at 00:34:30 UTC, Walter Bright wrote:

Thought I'd post this as a counterpoint to the recent "please break our code"
thread.


I would caution against putting very much weight in Reddit opinions - there's
people who will never use D and just look for excuses to justify their prejudice
and there's people who think they want something, but don't really have any idea
(this is common in feature requests, as I'm sure you know)

That comment, in particular, seems very questionable to me. dstats at least
compiles out of the box and has github activity within the last few months. It
has a lot of templates, so maybe actually using it would reveal compilation
problems, but at quick glance it seems to work.


I know it's a tough call. But I do see these sorts of comments regularly, and it 
is a fact that there are too many D libraries gone to seed that won't compile 
anymore, and that makes us look bad.


Re: D is for Data Science

2014-11-24 Thread Adam D. Ruppe via Digitalmars-d-announce

On Tuesday, 25 November 2014 at 00:34:30 UTC, Walter Bright wrote:
Thought I'd post this as a counterpoint to the recent "please 
break our code" thread.


I would caution against putting very much weight in Reddit 
opinions - there's people who will never use D and just look for 
excuses to justify their prejudice and there's people who think 
they want something, but don't really have any idea (this is 
common in feature requests, as I'm sure you know)


That comment, in particular, seems very questionable to me. 
dstats at least compiles out of the box and has github activity 
within the last few months. It has a lot of templates, so maybe 
actually using it would reveal compilation problems, but at quick 
glance it seems to work.


Re: D is for Data Science

2014-11-24 Thread Walter Bright via Digitalmars-d-announce

On 11/24/2014 7:27 AM, Gary Willoughby wrote:

Just browsing reddit and found this article posted about D.



https://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/cmbn83i

Thought I'd post this as a counterpoint to the recent "please break our code" 
thread.


Re: D is for Data Science

2014-11-24 Thread Dmitry Olshansky via Digitalmars-d-announce

25-Nov-2014 02:43, bearophile wrote:

Dmitry Olshansky:


Which is 1:1 parity. Another myth busted? ;)


> dmitry@Ubu64 ~ $ time ./my2 log
>
> real    0m0.065s
> user    0m0.042s
> sys     0m0.023s
> dmitry@Ubu64 ~ $ time ./my2 log
>
> real    0m0.063s
> user    0m0.040s
> sys     0m0.023s
>

Read the above more carefully.
OMG. I really need to watch my fingers, and double-check:)

dmitry@Ubu64 ~ $ time ./my log

real    0m0.156s
user    0m0.130s
sys     0m0.026s

dmitry@Ubu64 ~ $ time ./my2 log

real    0m0.063s
user    0m0.040s
sys     0m0.023s

Which is quite bad. Optimizations do help but not much.



There is still an open bug report:
https://issues.dlang.org/show_bug.cgi?id=11810

Do you want also to benchmark that byLineFast that for me is usually
significantly faster than the byLine?



And it seems like byLineFast is indeed fast.

dmitry@Ubu64 ~ $ time ./my3 log

real    0m0.056s
user    0m0.031s
sys     0m0.025s
dmitry@Ubu64 ~ $ time ./my2 log

real    0m0.065s
user    0m0.041s
sys     0m0.024s


Now that I have been destroyed, the question is who is going to make a PR of this?

--
Dmitry Olshansky


Re: D is for Data Science

2014-11-24 Thread bearophile via Digitalmars-d-announce

Dmitry Olshansky:


Which is 1:1 parity. Another myth busted? ;)


There is still an open bug report:
https://issues.dlang.org/show_bug.cgi?id=11810

Do you want also to benchmark that byLineFast that for me is 
usually significantly faster than the byLine?


Bye,
bearophile


Re: D is for Data Science

2014-11-24 Thread Dmitry Olshansky via Digitalmars-d-announce

25-Nov-2014 01:28, bearophile wrote:

Dmitry Olshansky:


Why is File.byLine so slow?


Seems to be mostly fixed sometime ago.


Really? I am not so sure.

Bye,
bearophile


I too suspected it in the past, and then I tested it.
Now I've tested it again; it's always easier to check than to argue.

Two minimal programs:

//my.d:
import std.stdio;

void main(string[] args) {
    auto file = File(args[1], "r");
    size_t cnt = 0;
    foreach (char[] line; file.byLine()) {
        cnt++;
    }
}

//my2.d:
import core.stdc.stdio;

void main(string[] args) {
    char[] buf = new char[32768];
    size_t cnt;
    shared(FILE)* file = fopen(args[1].ptr, "r");
    while (fgets(buf.ptr, cast(int)buf.length, file) != null) {
        cnt++;
    }
    fclose(file);
}

In the console session below, the log file is my dmesg log replicated many 
times (34 megs total).


dmitry@Ubu64 ~ $ wc -l log
522240 log
dmitry@Ubu64 ~ $ du -hs log
34M log

# touch it, to have it in disk cache:
dmitry@Ubu64 ~ $ cat log > /dev/null

dmitry@Ubu64 ~ $ dmd my
dmitry@Ubu64 ~ $ dmd my2

dmitry@Ubu64 ~ $ time ./my2 log

real    0m0.062s
user    0m0.039s
sys     0m0.023s
dmitry@Ubu64 ~ $ time ./my log

real    0m0.181s
user    0m0.155s
sys     0m0.025s

~4 times in user mode, okay...
Now with full optimizations, ranges are very sensitive to optimizations:

dmitry@Ubu64 ~ $ dmd -O -release -inline  my
dmitry@Ubu64 ~ $ dmd -O -release -inline  my2
dmitry@Ubu64 ~ $ time ./my2 log

real    0m0.065s
user    0m0.042s
sys     0m0.023s
dmitry@Ubu64 ~ $ time ./my2 log

real    0m0.063s
user    0m0.040s
sys     0m0.023s

Which is 1:1 parity. Another myth busted? ;)

--
Dmitry Olshansky


Re: D is for Data Science

2014-11-24 Thread Jay Norwood via Digitalmars-d-announce

On Monday, 24 November 2014 at 23:32:14 UTC, Jay Norwood wrote:


Is this related?

https://github.com/dscience-developers/dscience


This seems good too.  Why the comments in the discussion about 
lack of libraries?


https://github.com/kyllingstad/scid/wiki




Re: D is for Data Science

2014-11-24 Thread Jay Norwood via Digitalmars-d-announce
On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby 
wrote:

Just browsing reddit and found this article posted about D.
Written by Andrew Pascoe of AdRoll.

From the article:
"The D programming language has quickly become our language of 
choice on the Data Science team for any task that requires 
efficiency, and is now the keystone language for our critical 
infrastructure. Why? Because D has a lot to offer."


Article:
http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html

Reddit:
http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/


Is this related?

https://github.com/dscience-developers/dscience




Re: D is for Data Science

2014-11-24 Thread Walter Bright via Digitalmars-d-announce

On 11/24/2014 2:25 PM, Dmitry Olshansky wrote:

[...]


Excellent comments. Please post them on the reddit page!



Re: D is for Data Science

2014-11-24 Thread bearophile via Digitalmars-d-announce

Dmitry Olshansky:


Why is File.byLine so slow?


Seems to be mostly fixed sometime ago.


Really? I am not so sure.

Bye,
bearophile


Re: D is for Data Science

2014-11-24 Thread Dmitry Olshansky via Digitalmars-d-announce

25-Nov-2014 00:34, weaselcat wrote:

On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby wrote:

Just browsing reddit and found this article posted about D.
Written by Andrew Pascoe of AdRoll.

From the article:
"The D programming language has quickly become our language of choice
on the Data Science team for any task that requires efficiency, and is
now the keystone language for our critical infrastructure. Why?
Because D has a lot to offer."

Article:
http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html



Quoting the article:

> One of the best things we can do is minimize the amount of memory 
we’re allocating; we allocate a new char[] every time we read a line.


This is wrong. byLine reuses its buffer if it's mutable, which is the case 
with char[]. I recommend that authors always double-check a hypothesis 
before stating it in an article, especially about performance.
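
To illustrate the buffer reuse, here is a small sketch, assuming a throwaway text file created in the temp directory. The function name readLinesCopied and the file name are hypothetical. Because byLine may overwrite the same char[] on the next iteration, any line that has to outlive the loop body is copied with idup.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

/// Hypothetical helper: collects all lines of a file as immutable strings.
string[] readLinesCopied(string path)
{
    string[] lines;
    foreach (line; File(path, "r").byLine())
        lines ~= line.idup;   // copy, since the buffer behind `line` is reused
    return lines;
}

void main()
{
    auto path = buildPath(tempDir(), "byline_demo.txt");
    write(path, "one\ntwo\nthree\n");
    scope (exit) remove(path);

    assert(readLinesCopied(path) == ["one", "two", "three"]);
}
```

If a line is only inspected inside the loop (counting, parsing numbers), the copy can be skipped, which is where the reuse pays off.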


Observe:
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1660
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1652

And notice a warning about reusing the buffer here:

https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1741


Reddit:
http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/



Why is File.byLine so slow?


Seems to be mostly fixed some time ago. It's slower than straight fgets 
but it's not that bad.


Also, a nearly optimal solution using C's fgets with a growable buffer is way 
simpler than the code outlined in the article. Or we can mmap the file too.



Having to work around the standard library
defeats the point of a standard library.


Truth be told, most of the slowdown should be in the eager split, notably 
with a GC allocation per line. It may also trigger a GC collection after 
splitting many lines, maybe even many collections.


The easy way out is to use the standard _splitter_, which is lazy and 
non-allocating. That is a _3-letter_ change, and it still uses a nice 
clean standard function.
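
A rough sketch of the two variants side by side, assuming tab-separated fields (the delimiter and both function names are assumptions made for illustration, not taken from the article). split from std.array builds a new array of slices for every line, while splitter from std.algorithm yields the same slices one at a time.

```d
import std.algorithm : splitter;
import std.array : split;

/// Lazy variant: walks the fields without building an array.
size_t countFieldsLazy(const(char)[] line)
{
    size_t n = 0;
    foreach (field; line.splitter('\t'))
        ++n;
    return n;
}

/// Eager variant: allocates an array holding all field slices.
size_t countFieldsEager(const(char)[] line)
{
    return line.split('\t').length;
}

void main()
{
    assert(countFieldsLazy("a\tb\tc") == 3);
    assert(countFieldsEager("a\tb\tc") == 3);
}
```

Both functions see the same fields; the difference is only in whether an intermediate array is allocated per line.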


The article was really disappointing for me because I expected to see the 
single-line change outlined above fix 80% of the problem elegantly. 
Instead I observe 100+ spooky lines that needlessly maintain 3 buffers 
at the same time (how scientific) instead of growing a single one to 
amortize the cost. And then a claim that it's nice to be able to improve 
speed so easily.



--
Dmitry Olshansky


Re: D is for Data Science

2014-11-24 Thread weaselcat via Digitalmars-d-announce
On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby 
wrote:

Just browsing reddit and found this article posted about D.
Written by Andrew Pascoe of AdRoll.

From the article:
"The D programming language has quickly become our language of 
choice on the Data Science team for any task that requires 
efficiency, and is now the keystone language for our critical 
infrastructure. Why? Because D has a lot to offer."


Article:
http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html

Reddit:
http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/


Why is File.byLine so slow? Having to work around the standard 
library defeats the point of a standard library.


Re: D is for Data Science - reddit discussion

2014-11-24 Thread MrSmith via Digitalmars-d-announce

Haven't noticed that it was already posted. Sorry about that.

The discussion is here 
http://forum.dlang.org/thread/qeyftagcvkhjjeeba...@forum.dlang.org


D is for Data Science - reddit discussion

2014-11-24 Thread MrSmith via Digitalmars-d-announce

D is for Data Science by Andrew Pascoe

http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/


D is for Data Science

2014-11-24 Thread Gary Willoughby via Digitalmars-d-announce

Just browsing reddit and found this article posted about D.
Written by Andrew Pascoe of AdRoll.

From the article:
"The D programming language has quickly become our language of 
choice on the Data Science team for any task that requires 
efficiency, and is now the keystone language for our critical 
infrastructure. Why? Because D has a lot to offer."


Article:
http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html

Reddit:
http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/