Re: Silicon Valley D Meetup - March 18, 2021 - "Templates in the D Programming Language" by Ali Çehreli
On Friday, 19 March 2021 at 17:10:27 UTC, Ali Çehreli wrote: Jon mentioned how PR 7678 reduced the performance of std.regex.matchOnce. After analyzing the code we realized that the performance loss must be due to two delegate context allocations: https://github.com/dlang/phobos/pull/7678/files#diff-269abc020de3a951eaaa5b8eca5a0700ba8b298767c7a64f459e74e1531a80aeR825

One delegate is 'matchOnceImp' and the other is the anonymous delegate created in the return expression. We understood that 'matchOnceImp' could not be a nested function because of an otherwise useful rule: the name of a nested function alone would *call* that function instead of being a symbol for it. That is not the case for a local delegate variable, which is why 'matchOnceImp' exists as a delegate variable there. Then there is the addition of the 'pure' attribute to it. Fine...

After tinkering with the code, we realized that the same effect can be achieved with a static member function of a static struct, which would not allocate any delegate context. I added @nogc to the following code to prove that point. The following code is even simpler than what Jon and I came up with yesterday.

[... Code snippet removed ...]

There: we injected @trusted code inside a @nogc @safe function. Question to others: Did we fully understand the reason for the convoluted code in that PR? Is the above method really a better solution?

I submitted PR 7902 (https://github.com/dlang/phobos/pull/7902) to address this. I wasn't able to use the version Ali showed in the post, but the PR does use what is essentially the same idea identified at the D Meetup. The issue it fixes is a performance regression, and the fix is a bit more nuanced than would be ideal. Comments and review would be appreciated. --Jon
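To illustrate the pattern being discussed (a minimal sketch only, not the actual std.regex code; the names here are invented):

```d
@safe @nogc unittest
{
    // The bare name of a nested function would *call* it (optional
    // parens), so it can't serve as a symbol for the function. Instead,
    // wrap the helper as a static member function of a static struct:
    // taking its address yields a plain function pointer, so no
    // GC-allocated delegate context is needed.
    static struct Impl
    {
        static int twice(int x) @safe @nogc pure nothrow { return 2 * x; }
    }
    auto fp = &Impl.twice; // a function pointer, not a delegate
    assert(fp(21) == 42);
}
```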
Re: Article: Why I use the D programming language for scripting
On Sunday, 31 January 2021 at 20:36:43 UTC, aberba wrote: It's finally out! https://opensource.com/article/21/1/d-scripting Very nice! Clearly I'm not taking enough advantage of scripting capabilities! --Jon
Re: Github Actions now support D out of the box!
On Friday, 21 August 2020 at 02:03:40 UTC, Mathias LANG wrote: Hi everyone, Almost a year ago, Ernesto Castelloti (@ErnyTech) submitted a PR for Github's "starter-workflow" to add support for D out of the box (https://github.com/actions/starter-workflows/pull/74). It was in a grey area for a while, as Github was trying to come up with a policy for external actions. I ended up picking up the project, after working with actions extensively for my own projects and the dlang org, and my PR was finally merged yesterday (https://github.com/actions/starter-workflows/pull/546). A thank you to everyone who helped put this together. I just started using it, and it works quite well. It's a very valuable tool to have! --Jon
Re: Github Actions now support D out of the box!
On Friday, 21 August 2020 at 02:03:40 UTC, Mathias LANG wrote: [...] Thanks for the effort on this, I'll definitely be checking it out! --Jon
Re: tsv-utils 2.0 release: Named field support
On Tuesday, 28 July 2020 at 15:57:57 UTC, bachmeier wrote: Thanks for your work. I've recommended tsv-utils to some students for their data analysis. It's a nice substitute for a database depending on what you're doing. It really helps that you can store your "database" in a repo like any other text file. I'm going to be checking out the new version soon. Thanks for the support and for checking out the tools! Much appreciated.
Re: tsv-utils 2.0 release: Named field support
On Monday, 27 July 2020 at 14:32:27 UTC, aberba wrote: On Sunday, 26 July 2020 at 20:28:56 UTC, Jon Degenhardt wrote: I'm happy to announce a new major release of eBay's TSV Utilities. The 2.0 release supports named field selection in all of the tools, a significant usability enhancement. So I didn't check it out until today, and I'm really impressed by the documentation, presentation, and just about everything. Thanks for the kind words, and for taking the time to check out the toolkit. Both are very much appreciated!
tsv-utils 2.0 release: Named field support
Hi all, I'm happy to announce a new major release of eBay's TSV Utilities. The 2.0 release supports named field selection in all of the tools, a significant usability enhancement.

For those not familiar, tsv-utils is a set of command line tools for manipulating tabular data files of the type commonly found in machine learning and data mining environments. Filtering, statistics, sampling, joins, etc. The tools are patterned after traditional Unix command line tools like 'cut', 'grep', 'sort', etc., and are intended to work with those tools. Each tool is a standalone executable. Most people will only care about a subset of the tools. It is not necessary to learn the entire toolkit to get value from the tools. The tools are all written in D and are the fastest tools of their type available (benchmarks are on the GitHub repository).

Previous versions of the tools referenced fields by field number, the same as traditional Unix tools like 'cut'. In version 2.0, tsv-utils tools take fields either by field number or by field name, for files with header lines. A few examples using 'tsv-select', a tool similar to 'cut' that also supports field reordering and dropping fields:

$ # Field numbers: Output fields 2 and 1, in that order.
$ tsv-select -f 2,1 data.tsv

$ # Field names: Output the 'Name' and 'RecordNum' fields.
$ tsv-select -H -f Name,RecordNum data.tsv

$ # Drop the 'Color' field, keep everything else.
$ tsv-select -H --exclude Color file.tsv

$ # Drop all the fields ending in '_time'.
$ tsv-select -H -e '*_time' data.tsv

More information is available on the tsv-utils GitHub repository, including documentation and pre-built binaries: https://github.com/eBay/tsv-utils --Jon
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 Got posted again to Hacker News earlier today. Currently at position 5.
Re: Our HOPL IV submission has been accepted!
On Saturday, 29 February 2020 at 01:00:40 UTC, Andrei Alexandrescu wrote: Walter, Mike, and I are happy to announce that our paper submission "Origins of the D Programming Language" has been accepted at the HOPL IV (History of Programming Languages) conference. https://hopl4.sigplan.org/track/hopl-4-papers Getting a HOPL paper in is quite difficult, and an important milestone for the D language. We'd like to thank the D community which was instrumental in putting the D language on the map. The HOPL IV conference will take place in London right before DConf. With regard to travel, right now Covid-19 fears are on everybody's mind; however, we are hopeful that between now and then the situation will improve. Congrats! Indeed a meaningful accomplishment.
New graphs for tsv-utils performance benchmarks
A small thing - Many people who have seen the performance benchmarks for eBay's TSV Utilities find the text table format I've used in the past hard to read. Me too. So, I finally generated more traditional graphical representations for the 2018 benchmark results. The graphs are here: https://github.com/eBay/tsv-utils/blob/master/docs/Performance.md#2018-benchmark-summary

There are no new benchmarks, just new visualizations of the results. For folks who are not familiar with these benchmarks - this is part of a performance study comparing eBay's TSV Utilities with a number of command line tools providing similar functionality (e.g. awk). The results shown were presented at DConf 2018.

* Details of the performance study - https://github.com/eBay/tsv-utils/blob/master/docs/Performance.md
* DConf 2018 talk slides - https://github.com/eBay/tsv-utils/blob/master/docs/dconf2018.pdf
Re: LDC 1.17.0-beta1
On Saturday, 10 August 2019 at 15:51:28 UTC, kinke wrote: Glad to announce the first beta for LDC 1.17: ... Please help test, and thanks to all contributors! No changes in my standard performance tests (good). All functional tests pass as well.
Re: bool (was DConf 2019 AGM Livestream)
On Sunday, 12 May 2019 at 17:08:49 UTC, Jonathan M Davis wrote: ... snip ... Fortunately, in the grand scheme of things, while this issue does matter, it's still much smaller than almost all of the issues that we have to worry about and consider having DIPs for. Personally, I'm not at all happy that this DIP was rejected, but I think that continued debate on it is a waste of everyone's time. Agreed. I too have never liked numeric values equated to true/false, in any programming language. However, it is very common. And, relative to the other big-ticket items on the table, of minor importance. Changing the current behavior won't materially affect the usability of D or its future. This is a case where the best course is to make a decision and move on. --Jon
Re: eBay's TSV Utilities status update
On Friday, 3 May 2019 at 03:54:14 UTC, James Blachly wrote: On 4/29/19 11:23 AM, Jon Degenhardt wrote: An update on changes to this tool-set over the last year. ... Thank you for this, and thanks for your blog post of a couple of years ago, which I referred to many times while learning D and writing fast(er) CLI tools. Looking forward to trying Steve's iopipe as well as your bufferedByLineReader. James Thanks for the kind words James!
eBay's TSV Utilities status update
An update on changes to this tool-set over the last year. For those not familiar, tsv-utils is a set of command line tools for manipulating large tabular data files - files of numeric and text data common in machine learning and data mining environments. Filtering, statistics, sampling, joins, and more. The tools are intended for large files, larger than ideal for loading in memory in tools like R or Pandas, but not so big as to necessitate moving to distributed compute environments. The tools are quite fast, the fastest of their kind available.

Besides being real tools, tsv-utils have also provided an environment for exploring the D programming language and the D ecosystem. In the past year there have been two main areas of work.

One area is the sampling and shuffling facilities provided by the tsv-sample program. New sampling methods are available and performance has been improved. tsv-sample is very similar to the excellent GNU shuf tool, but supports sampling methods not available in shuf. Sampling is a rich and diverse area, and the tsv-sample code is perhaps the most algorithmically interesting in the tool-set.

The other main update is improved I/O read performance in many of the tools. This is from developing a buffered version of byLine. It is especially effective for skinny files (short lines). Most of the tools saw performance gains of 10-40%. One of the earlier performance improvements came from buffering output lines. Combined, the line-by-line read-write performance is quite a bit faster than what is available in Phobos. The iopipe / std.io packages (Steven Schveighoffer, Martin Nowak) are faster still; these are the place to go for really high performance. (See the links below for a benchmark report.)
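The buffered-read idea can be sketched roughly as follows (illustrative only; the actual bufferedByLine in tsv-utils differs in its details - see the code docs in the links below):

```d
import std.stdio : File;

// Sketch: read the file in large chunks and hand out complete lines,
// rather than issuing many small reads as plain byLine effectively does.
// The win is largest for skinny files (short lines).
void eachLineBuffered(string path, scope void delegate(const(char)[]) sink)
{
    auto f = File(path, "r");
    auto buf = new ubyte[](1024 * 1024);
    char[] partial;   // holds a line fragment spanning chunk boundaries

    foreach (chunk; f.byChunk(buf))
    {
        auto text = cast(const(char)[]) chunk;
        size_t start = 0;
        foreach (i, c; text)
        {
            if (c == '\n')
            {
                if (partial.length)
                {
                    partial ~= text[start .. i];
                    sink(partial);
                    partial.length = 0;
                }
                else
                    sink(text[start .. i]);
                start = i + 1;
            }
        }
        partial ~= text[start .. $];   // save any trailing fragment
    }
    if (partial.length) sink(partial); // file may lack a final newline
}
```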
Links: * tsv-utils repo: https://github.com/eBay/tsv-utils * tsv-sample user docs: https://github.com/eBay/tsv-utils/blob/master/docs/ToolReference.md#tsv-sample-reference * tsv-sample code docs: https://tsv-utils.dpldocs.info/tsv_utils.tsv_sample.html * Performance benchmarks on line-oriented I/O facilities: https://github.com/jondegenhardt/dcat-perf/issues/1
Re: NEW Milestone: 1500 packages at code.dlang.org
On Thursday, 7 February 2019 at 18:02:21 UTC, H. S. Teoh wrote: On Thu, Feb 07, 2019 at 05:06:09PM +, Seb via Digitalmars-d-announce wrote: On Thursday, 7 February 2019 at 16:40:08 UTC, Anonymouse wrote: > > What was the word on the autotester (or similar) testing > popular > packages as part of the test suite? This has been done for more than a year now for the ~50 most popular packages: https://buildkite.com/dlang In my opinion this is one of the main reasons why the last releases were so successful (=almost no regressions). That's awesome. This is the way to go. Congrats to everyone who helped pull this off. T Agreed! This is a really nice bit of work that's come out of the D ecosystem.
Re: D-lighted, I'm Sure
On Friday, 18 January 2019 at 14:29:14 UTC, Mike Parker wrote: Not long ago, in my retrospective on the D Blog in 2018, I invited folks to write about their first impressions of D. Ron Tarrant, who you may have seen in the Learn forum, answered the call. The result is the latest post on the blog, the first guest post of 2019. Thanks, Ron! As a reminder, I'm still looking for new-user impressions and guest posts on any D-related topic. Please contact me if you're interested. And don't forget, there's a bounty for guest posts, so you can make a bit of extra cash in the process. The blog: https://dlang.org/blog/2019/01/18/d-lighted-im-sure/ Reddit: https://www.reddit.com/r/programming/comments/ahawhz/dlighted_im_sure_the_first_two_months_with_d/ Nicely done. Very enjoyable, thanks for publishing this! --Jon
Re: My Meeting C++ Keynote video is now available
On Saturday, 12 January 2019 at 15:51:03 UTC, Andrei Alexandrescu wrote: https://youtube.com/watch?v=tcyb1lpEHm0 If nothing else please watch the opening story, it's true and quite funny :o). Now as to the talk, as you could imagine, it touches on another language as well... Andrei Very nice. I especially liked how design by introspection was contrasted with other approaches and how the constexpr discussion fit into the overall theme. --Jon
Re: DCD, D-Scanner and DFMT : new year edition
On Monday, 31 December 2018 at 07:56:00 UTC, Basile B. wrote: DCD [1] 0.10.2 comes with bugfixes and small API changes. DFMT [2] and D-Scanner [3] with bugfixes too and all of the three products are based on d-parse 0.10.z, making life easier and the libraries versions more consistent for the D IDE and D IDE plugins developers. [1] https://github.com/dlang-community/DCD/releases/tag/v0.10.2 [2] https://github.com/dlang-community/dfmt/releases/tag/v0.9.0 [3] https://github.com/dlang-community/D-Scanner/releases/tag/v0.6.0 Thanks for the ongoing work on DCD et al!
Re: Iain Buclaw at GNU Tools Cauldron 2018
On Monday, 8 October 2018 at 05:12:03 UTC, Joakim wrote: On Sunday, 7 October 2018 at 15:41:43 UTC, greentea wrote: Date: September 7 to 9, 2018. Location: Manchester, UK GDC - D front-end GCC https://www.youtube.com/watch?v=iXRJJ_lrSxE Thanks for the link, just watched the whole video. The first half-hour sets the standard as an intro to the language, as only a compiler developer other than the main implementer could give, ie someone with fresh eyes. I loved that Iain started off with a list of real-world projects. That's a mistake a lot of tech talks make, ie not motivating _why_ anybody should care about their tech and simply diving into the tech itself. I hadn't heard some of that info either, great way to begin. I agree, a very nice talk, including the way the motivation part was handled. I especially liked the example of the group who typically used Python for rapid prototyping, then rewrote in C++ for production, who upon trying D for a prototype, were pleasantly surprised it was performant enough for production.
eBay's TSV Utilities repository renamed
I've renamed the TSV Utilities Github repository from eBay/tsv-utils-dlang to eBay/tsv-utils. This is to better reflect the functional nature of the tools. Links pointing to the old github repo will be redirected to the new repo. This includes git operations like clone, etc., so Project Tester should not be affected. Let me know if any issues surface. --Jon
Re: Driving Continuous Improvement in D
On Saturday, 2 June 2018 at 07:23:42 UTC, Mike Parker wrote: In this post for the D Blog, Jack Stouffer details how dscanner is used in the Phobos development process to help improve code quality and fight entropy. The blog: https://dlang.org/blog/2018/06/02/driving-continuous-improvement-in-d/ reddit: https://www.reddit.com/r/programming/comments/8nyzmk/driving_continuous_improvement_in_d/ Nice post. I haven't tried dscanner on my code, but I plan to now. It looks like the documentation on the dscanner repo is pretty good. If you think it's ready for wider adoption, consider adding a couple lines to the blog post indicating that folks who want to try it will find instructions in the repo.
Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 15:44:04 UTC, Steven Schveighoffer wrote: On 5/10/18 7:22 PM, Steven Schveighoffer wrote: Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers).

Yeah, the MacOS default versions of the Unix text processing tools are really slow. It's worth installing the GNU versions if doing performance comparisons on MacOS, or if you work with large files. Homebrew and MacPorts both have the GNU versions. Some relevant packages: coreutils, grep, gsed (sed), gawk (awk). Most tools are in coreutils. Many will be installed with a 'g' prefix by default, leaving the existing tools in place. E.g., 'cut' will be installed as 'gcut' unless specified otherwise. --Jon
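For reference, a typical install line looks like this (package names are the Homebrew ones as I recall them; MacPorts names differ slightly, e.g. its sed package is 'gsed'):

```shell
# GNU versions install alongside the BSD tools with a 'g' prefix:
# gcut, gsort, ggrep, gsed, gawk, ...
brew install coreutils grep gnu-sed gawk
```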
Re: Things to do in Munich
On Monday, 30 April 2018 at 19:57:10 UTC, Seb wrote: As I live in Munich and there have been a few threads about things to do in Munich, I thought I'd quickly share a few selected activities + current events. - over 80 museums (best ones: Museum Brandhorst, Pinakothek der Moderne, Haus der Kunst, Deutsches Museum, Glyptothek, potato museum, NS- Most of the museums are closed today (public holiday). Check before you go. However, the surfers are out! —Jon
Re: Project Highlight: The D Community Hub
On Saturday, 17 February 2018 at 12:56:34 UTC, Mike Parker wrote: In case you aren't aware of the dlang-community organization at GitHub, it's an umbrella group of contributors working to keep certain D projects alive and updated. Sebastian Wilzbach filled me in on some details for the latest Project Highlight on the blog. blog: https://dlang.org/blog/2018/02/17/project-highlight-the-d-community-hub/ reddit: https://www.reddit.com/r/programming/comments/7y6gw1/the_d_community_hub_an_umbrella_group_for_d/ Very nice article. There are more projects there than I had realized!
Re: TSV Utilities release with LTO and PGO enabled
On Wednesday, 17 January 2018 at 21:49:52 UTC, Johan Engelen wrote: On Wednesday, 17 January 2018 at 04:37:04 UTC, Jon Degenhardt wrote: Clearly personal judgment played a role. However, the tools are reasonably task focused, and I did take basic steps to ensure the benchmark data and tests were separate from the training data/tests. For these reasons, my confidence is good that the results are reasonable and well founded. Great, thanks for the details, I agree. Hope it's useful for others to see these details. Thanks Johan, much appreciated. :) (btw, did you also check the performance gains when using the profile of the benchmark itself, to learn about the upper-bound of PGO for your program?) I'll merge the IR PGO addition into LDC master soon. Don't know what difference it'll make. No, I didn't do an upper-bounds check, that's a good idea. I plan to test the IR based PGO when it's available, I'll run an upper-bounds check as part of it.
Re: TSV Utilities release with LTO and PGO enabled
On Tuesday, 16 January 2018 at 22:04:52 UTC, Johan Engelen wrote: Because PGO optimizes for the given profile, it would help a lot if you clarified how you do your PGO benchmarking. What kind of test load profile you used for optimization and what test load you use for the time measurement.

The profiling used is checked into the repo and run as part of a PGO build, so it is available for inspection. The benchmarks used for deltas are also documented; they are the ones used in the benchmark comparison to similar tools done in March 2017. This report is in the repo (https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md). However, it's hard to imagine anyone perusing the repo for this stuff, so I'll try to summarize what I did below.

Benchmarks - Six different tests of rather different but common operations run on large data files. The six tests were chosen because for each I was able to find at least three other tools, written in native compiled languages, with similar functionality. There are other valuable benchmarks, but I haven't published them.

Profiling - Profiling was developed separately for each tool. For each I generated several data files with data representative of typical use cases. Generally numeric or text data in several forms and distributions. The data was unrelated to the data used in the benchmarks, which is from publicly available machine learning data sets. However, personal judgement was used in the generation of the data sets, so it's not free from bias. After generating the data, I generated a set of run options specific to each tool. As an example, tsv-filter selects data file lines based on various numeric and text criteria (e.g. less-than). There are a bit over 50 comparison operations, plus a few meta operations. The profiling runs ensure all the operations are run at least once, with the most important ones overweighted.
The ldc.profile.resetAll call was used to exclude all the initial setup code (command line argument processing). This was nice because it meant the data files could be small relative to real-world sets, and it runs fast enough to do as part of the build step (i.e. on Travis-CI). Look at https://github.com/eBay/tsv-utils-dlang/tree/master/tsv-filter/profile_data to see a concrete example (tsv-filter). In that directory are five data files and a shell script that runs the commands and collects the data.

This was done for four of the tools, covering five of the benchmarks. I skipped one of the tools (tsv-join), as it's harder to come up with a concise set of profile operations for it. I then ran the standard benchmarks I usually report on in various D venues.

Clearly personal judgment played a role. However, the tools are reasonably task focused, and I did take basic steps to ensure the benchmark data and tests were separate from the training data/tests. For these reasons, my confidence is good that the results are reasonable and well founded. --Jon
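In outline, an LDC instrumentation-based PGO build looks like the following (a sketch only; the file and binary names are illustrative, and the repo's build scripts wire these steps together):

```shell
# 1. Build an instrumented binary.
ldc2 -O -release -fprofile-instr-generate=profile.raw -of=app-instr app.d

# 2. Run it on representative data to collect the profile.
./app-instr profile_data/typical-input.tsv > /dev/null

# 3. Merge the raw profile and rebuild using it.
ldc-profdata merge -output=profile.data profile.raw
ldc2 -O -release -fprofile-instr-use=profile.data -of=app app.d
```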
Re: TSV Utilities release with LTO and PGO enabled
On Tuesday, 16 January 2018 at 00:19:24 UTC, Martin Nowak wrote: On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt wrote: Combined, LTO and PGO resulted in performance improvements greater than 25% on three of my standard six benchmarks, and five of the six improved at least 8%. Yay, I'm usually seeing double digit improvements for PGO alone, and single digit improvements for LTO. Meaning PGO has more effect even though LTO seems to be the more hyped one. Have you bothered benchmarking them separately? Last spring I made a few quick tests of both separately. That was just against the app code, without druntime/phobos. Saw some benefit from LTO, mainly one of the tools, and not much from PGO. More recently I tried LTO standalone and LTO plus PGO, both against app code and druntime/phobos, but not PGO standalone. The LTO benchmarks are here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/dlang-meetup-14dec2017.pdf. I haven't published the LTO + PGO benchmarks. The takeaway from my tests is that LTO and PGO will benefit different apps differently, perhaps in ways not easily predicted. One of my tools benefited primarily from PGO, two primarily from LTO, and one materially from both. So, it is worth trying both. For both, the big win was from optimizing across app code and libs (druntime/phobos in my case). It'd be interesting to see if other apps see similar behavior, either with phobos/druntime or other libraries, perhaps libraries from dub dependencies.
TSV Utilities release with LTO and PGO enabled
I just released a new version of eBay's TSV Utilities. The cool thing about the release is not the changes in the toolkit, but that it was possible to build everything using LDC's support for Link Time Optimization (LTO) and Profile Guided Optimization (PGO). This includes running the optimizations on both the application code and the D standard libraries (druntime and phobos). Further, it was all doable on Travis-CI (Linux and MacOS), including building release binaries available from the GitHub release page. Combined, LTO and PGO resulted in performance improvements greater than 25% on three of my standard six benchmarks, and five of the six improved at least 8%. Release info: https://github.com/eBay/tsv-utils-dlang/releases/tag/v1.1.16
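For those wanting to try it, the LTO-across-the-runtime setup looks roughly like this (a sketch based on the LDC 1.5-era tooling; exact flags and output paths may differ by release):

```shell
# Build druntime and phobos with thin LTO enabled
# (outputs under ./ldc-build-runtime.tmp by default).
ldc-build-runtime --dFlags="-flto=thin"

# Build the application with LTO, linking against the
# LTO-enabled runtime libraries just built.
ldc2 -O -release -flto=thin \
    -L-L"$PWD/ldc-build-runtime.tmp/lib" app.d
```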
Re: DLang docker images for CircleCi 2.0
On Wednesday, 3 January 2018 at 13:12:48 UTC, Seb wrote: tl;dr: you can now use special D docker images for CircleCi 2.0 [snip] PS: I'm aware of Stefan Rohe's great D Docker images [1], but this Docker image is built on top of the specialized CircleCi image (e.g. for their SSH login). One useful characteristic of Stefan's images is that the Dockerhub pages include the Dockerfile and github repository links. I don't know what it takes to include them. It does make it easier to see exactly what the configuration is, find the repo, and even create PRs against them. It would be useful if they could be added to the CircleCI image pages. My interest in this case - I use Stefan's LDC image in Travis-CI builds. Building the runtime libraries with LTO/PGO requires the ldc-build-runtime tool, which in turn requires a few additional things in the docker image, like cmake or ninja. I was interested in whether they might have been included in the CircleCI images as well. (It doesn't appear so.)
Re: Article: Finding memory bugs in D code with AddressSanitizer
On Monday, 25 December 2017 at 17:03:37 UTC, Johan Engelen wrote: I've been writing this article since August, and finally found some time to finish it: http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html "LDC comes with improved support for Address Sanitizer since the 1.4.0 release. Address Sanitizer (ASan) is a runtime memory write/read checker that helps discover and locate memory access bugs. ASan is part of the official LDC release binaries; to use it you must build with -fsanitize=address. In this article, I’ll explain how to use ASan, what kind of bugs it can find, and what bugs it will be able to find in the (hopefully near) future." Nice article. Main question / comment is about the need for blacklisting the D standard libraries (druntime/phobos). If someone wants to try ASan out on their own code, can they start by ignoring the D standard libraries? And, for programs that use druntime/phobos, will this be effective? If I understand the post, the answer is "yes", but I think it could be more explicit. Second comment is related - If the reader were to try instrumenting druntime/phobos along with their own code, how much effort should be expected to correctly blacklist druntime/phobos code? Would many programs have smooth sailing if they took the blacklist published in the post? Or is this early-stage enough that some real effort should be expected? Also, if the blacklist file in the post represents a meaningful starting point, perhaps it makes sense to check it in and distribute it. This would provide a place for contributors to start making improvements.
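For readers wanting to try it on their own code, the basic invocation is simple (a sketch; the file names are illustrative, and the article covers the blacklist details):

```shell
# Compile with ASan instrumentation; running the binary reports
# bad memory reads/writes with a stack trace.
ldc2 -g -fsanitize=address myprog.d
./myprog

# Optionally suppress reports originating in uninstrumented code
# (e.g. druntime/phobos) via a blacklist file:
ldc2 -g -fsanitize=address -fsanitize-blacklist=asan-blacklist.txt myprog.d
```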
Re: Silicon Valley D Meetup - December 14, 2017 - "Experimenting with Link Time Optimization" by Jon Degenhardt
On Saturday, 16 December 2017 at 11:52:37 UTC, Johan Engelen wrote: Clearly very interested in what your PGO testing will show. :-) Early returns on adding PGO on top of LTO (first five benchmarks in the slide deck, tsv-join not tested): * Two meaningful improvements: - csv2tsv: Linux: 8%; macOS: 33% - tsv-summarize: Linux: 6%; macOS: 11% * Minor improvements on the other three benchmarks (< 5%) Overall, for LDC 1.5, the improvements going from a normal optimized build to one combining LTO and PGO ranged from 8-45% on Linux and 6-57% on macOS. (First five benchmarks, excluding tsv-join). Impressive! --Jon
Re: Silicon Valley D Meetup - December 14, 2017 - "Experimenting with Link Time Optimization" by Jon Degenhardt
On Saturday, 16 December 2017 at 11:52:37 UTC, Johan Engelen wrote: On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote: This should be live now: http://youtu.be/e05QvoKy_8k Great! I've added some comments there, pasted here: Fantastic feedback! Fills in some really important details. Can't wait to see the results of LTO on Weka.io's (LARGE) applications. Work in progress...! Agreed. It'd be great to see the experience of a few more apps. Could you add the reference links in the comment section there too? (can't click on blue links in the video ;-) Done. Thanks for pointing this out. I also updated the posted slide deck so that the hyperlinks work after downloading it. (They still aren't clickable in the GitHub inline viewer.) Clearly very interested in what your PGO testing will show. :-) Yes, should be interesting. Promising results in one benchmark. And sigh, I forgot to mention the opportunity you mentioned for someone to participate: Adding LLVM's IR-level PGO to the LDC compiler. Sounds pretty cool.
Re: Silicon Valley D Meetup - December 14, 2017 - "Experimenting with Link Time Optimization" by Jon Degenhardt
On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote: This should be live now: http://youtu.be/e05QvoKy_8k Ali On 11/21/2017 11:58 AM, Ali Çehreli wrote: Meetup page: https://www.meetup.com/D-Lang-Silicon-Valley/events/245288287/ LDC[1], the LLVM-based D compiler, has been adding Link Time Optimization capabilities over the last several releases. [...] This talk will look at the results of applying LTO to one set of applications, eBay's TSV utilities[2]. [...] Jon Degenhardt is a member of eBay's Search Science team. [...] D quickly became his favorite programming language, one he uses whenever he can. Ali [1] https://github.com/ldc-developers/ldc#ldc--the-llvm-based-d-compiler [2] https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Slides from the talk: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/dlang-meetup-14dec2017.pdf
Re: LDC 1.5.0
On Friday, 3 November 2017 at 17:17:04 UTC, kinke wrote: Hi everyone, on behalf of the LDC team, I'm glad to finally officially announce LDC 1.5. The highlights of this version in a nutshell: * Based on D 2.075.1. * Polished LLVM 5.0 support (now also used for the prebuilt release packages). * Prebuilt ARM-Linux package available again. * New command-line option `-linker` and ~25 new advanced ones for codegen fine-tuning. * Bugfixes, as always. Full release log and downloads: https://github.com/ldc-developers/ldc/releases/tag/v1.5.0 Thanks to all contributors! [LDC master is at v2.076.1, so LDC 1.6 won't take long.] Great work by the LDC team! Thanks to all the LTO work in 1.4 and 1.5, the Travis-CI builds of the eBay TSV utilities are LTO enabled for Phobos & Druntime as well as the application code. This is for both Linux and OS X builds. Couldn't do that before the LDC 1.5 release. The OS X executables are materially faster with the end-to-end LTO support. I haven't benchmarked the Linux versions yet. It would be very interesting to get benchmark numbers from other apps, especially those making material use of phobos.
Re: LDC 1.4.0-beta1
On Saturday, 26 August 2017 at 22:35:11 UTC, kinke wrote: Hi everyone, on behalf of the LDC team, I'm glad to announce LDC 1.4.0-beta1. The highlights of version 1.4 in a nutshell: * Based on D 2.074.1. * Shipping with ldc-build-runtime, a small D tool to easily (cross-)compile the runtime libraries yourself. * Full Android support, incl. emulated TLS. * Improved support for AddressSanitizer and libFuzzer. The libraries are shipped with the prebuilt Linux x86_64 and OSX packages. * Prebuilt Linux x86_64 package shipping with LTO plugin, catching up with the OSX package. Full release log and downloads: https://github.com/ldc-developers/ldc/releases/tag/v1.4.0-beta1 Thanks to everybody contributing! Wow, this looks fantastic, congrats! --Jon
Re: Compile-Time Sort in D
On Wednesday, 7 June 2017 at 20:59:50 UTC, Joakim wrote: On Tuesday, 6 June 2017 at 01:08:45 UTC, Mike Parker wrote: On Monday, 5 June 2017 at 17:54:05 UTC, Jon Degenhardt wrote: Very nice post! Thanks! If it gets half as many page views as yours did, I'll be happy. Yours is the most-viewed post on the blog -- over 1000 views more than #2 (my GC post), and 5,000 more than #3 (A New Import Idiom). I was surprised it's so popular, as the proggit thread didn't do that great, but it did well on HN and I now see it inspired more posts for Rust (written by bearophile, I think) and Go, in addition to the Nim post linked here before: https://users.rust-lang.org/t/faster-command-line-tools-in-d-rust/10992 https://aadrake.com/posts/2017-05-29-faster-command-line-tools-with-go.html I was surprised as well, pleasantly of course. Using a simple example may have helped. Personally, I'm not bothered by the specific instances of negative feedback on Reddit. It's hard to write a post that manages to avoid that sort of thing entirely. It was also nice to see related follow-up in the D forums ("how to count lines fast" and "std.csv Performance Review"). It's less clear whether the case for how well suited D's facilities are to this type of problem came across. It's much clearer in the Compile-Time Sort post. --Jon
Re: Compile-Time Sort in D
On Monday, 5 June 2017 at 14:23:34 UTC, Mike Parker wrote: The crowd-edited (?) blog post exploring some of D's compile-time features is now live. Thanks again to everyone who helped out with it. The blog: https://dlang.org/blog/2017/06/05/compile-time-sort-in-d/ Reddit: https://www.reddit.com/r/programming/comments/6fefdg/compiletime_sort_in_d/ Very nice post!
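For readers who haven't seen the post yet, the core idea fits in a few lines. This is a minimal sketch of my own, not code taken from the post: initializing an enum forces the expression through CTFE, so the sort happens entirely at compile time.

```d
import std.algorithm : sort;
import std.array : array;

// The sort runs during compilation: an enum initializer must be a
// compile-time constant, so the sorted result is baked into the binary.
enum sortedValues = sort([3, 1, 4, 1, 5, 9, 2, 6]).array;

// Also checked at compile time; no run-time work happens here.
static assert(sortedValues == [1, 1, 2, 3, 4, 5, 6, 9]);

void main() {}
```

The nice part, as the post explains, is that this is the ordinary std.algorithm.sort, not a special compile-time variant.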
Re: Faster Command Line Tools in D
On Thursday, 25 May 2017 at 05:17:29 UTC, Walter Bright wrote: Any time one writes an article comparing speed between languages X and Y, someone gets their ox gored and will bitterly complain about how unfair the article is (though I noticed that none of the complainers wrote a faster Python version). Even if you tried to optimize the Python program, you'd inevitably be accused of deliberately not doing it right. The nadir of this for me was when I compared Digital Mars C++ code with DMD. Both share the same optimizer and back end, yet I was accused of "sabotaging" my own C++ compiler in order to make D look better!! Me, I just don't do public comparison benchmarking anymore. It's a waste of time arguing with people about it. I thought you wrote a fine article, and the criticism about the Python code was unwarranted (especially since nobody suggested better code), because the article was about optimizing D code, not optimizing Python. Thanks Walter, I appreciate your comments. And correct: as multiple people noted, a speed comparison with other languages was not at all a goal of the article. The real intent was to tell a story of how several of D's features play together to enable optimizations like this, without having to write low-level code or step outside the core language features and standard library. --Jon
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 21:46:10 UTC, cym13 wrote: On Wednesday, 24 May 2017 at 21:34:08 UTC, Walter Bright wrote: It's now #4 on the front page of Hacker News: https://news.ycombinator.com/news The comments on HN are useless though; everybody went for the "D versus Python" thing and seems to complain that it's doing a D/Python benchmark while only talking about D optimization... even though optimizing D is the whole point of the article. In the same way they rant against the fact that many iterations on the D script are shown, while the point is obviously to present different tricks and be clear about what each one gives. I am disappointed because there are so many good things to say about this, so many good questions or remarks to make when not familiar with the language, and yet all we get is "Meh, this benchmark shows nothing of D's speed against Python". It's not easy writing an article that doesn't draw some form of criticism. FWIW, the reason I gave a Python example is that Python is very commonly used for this type of problem and the language is well suited to it. A second reason is that I've seen several posts where someone has tried to rewrite a Python program like this in D, started with `split`, and wondered how to make it faster. My hope is that the article clarifies how to achieve this. Another goal of the article was to describe how the performance of the TSV Utilities had been achieved. The article is not about the TSV Utilities, but discussing both the benchmark results and how they were achieved would have made for a very long article. --Jon
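For those who haven't read the article, the central optimization it walks through is moving from eager `split` to lazy iteration. A rough sketch of my own in that spirit (not the article's exact code; `sumColumn` and its input handling are illustrative only):

```d
import std.algorithm : splitter;
import std.conv : to;

// Sum one column of a stream of TSV lines. Lazy splitter avoids the
// per-line array allocation that eager split incurs, and only the
// needed field is converted from text to a number.
long sumColumn(Lines)(Lines lines, size_t col)
{
    long total = 0;
    foreach (line; lines)
    {
        size_t i = 0;
        foreach (field; line.splitter('\t'))
        {
            if (i++ == col)
            {
                total += field.to!long;
                break;  // skip the rest of the line's fields
            }
        }
    }
    return total;
}

void main()
{
    // In a real tool this would be driven by File("data.tsv").byLine.
    assert(sumColumn(["a\t10", "b\t32"], 1) == 42);
}
```

Because `splitter` and `byLine` are both lazy ranges, the same shape scales from a unittest literal to a multi-gigabyte file without code changes.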
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 17:36:29 UTC, cym13 wrote: On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: [...snip...] A bit off topic, but I really like that we still get quality content such as this post on this blog. Sustained quality is a hard job and I thank everyone involved for that. The compliment to the community is well deserved; thank you for including this post in that company. In this case, the post benefited from some really excellent review feedback, and Mike made the publication side really easy. --Jon
Re: [OT] Fast Deterministic Selection
On Thursday, 18 May 2017 at 15:14:17 UTC, Andrei Alexandrescu wrote: The implementation is an improved version of what we now have in the D standard library. I'll take up the task of updating phobos at a later time. https://www.reddit.com/r/programming/comments/6bwsjn/fast_deterministic_selection_sea_2017_now_with/ Andrei Very nice! Is this materially faster than what is currently in Phobos (PR 4815)? That update was a substantial performance win by itself. --Jon
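For context, the Phobos routine this work feeds into is std.algorithm.topN, which does selection: after the call, the element of rank n sits at index n, with smaller elements before it and larger ones after. A minimal illustration of that API (not the paper's algorithm itself):

```d
import std.algorithm : topN;

void main()
{
    auto a = [9, 2, 7, 4, 1, 8, 3];

    // Selection, not a full sort: place the rank-3 (0-based) element
    // at index 3 and partition the rest around it.
    topN(a, 3);

    assert(a[3] == 4);  // fully sorted order would be [1, 2, 3, 4, 7, 8, 9]
}
```

The point of deterministic selection algorithms like the one in the paper is to guarantee linear worst-case time for exactly this operation.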
Re: dmd Backend converted to Boost License
On Friday, 7 April 2017 at 15:14:40 UTC, Walter Bright wrote: https://github.com/dlang/dmd/pull/6680 Yes, this is for real! Symantec has given their permission to relicense it. Thank you, Symantec! Congrats, this is a great result!
Re: Updates to the tsv-utils toolkit
On Wednesday, 22 February 2017 at 18:12:50 UTC, Jon Degenhardt wrote: It's not quite a year since the open-sourcing of eBay's tsv utilities. Since then there have been a number of additions and updates, and the tools form a more complete package. The tools assist with manipulation of the tabular data files common in machine learning and data mining environments. They work alongside traditional Unix command line tools like 'cut' and 'sort'. They also fit well with data mining and stats packages like R and Pandas. The tools include filtering, slicing, joins and other manipulation, sampling, and statistical calculations. If you find yourself working with large data files from a Unix shell, you may like these tools. Speed matters when processing large data files, and these tools are fast. I've published new benchmarks comparing the tools to similar tools written in several natively compiled programming languages. The tools are the fastest on five of the six benchmarks run, generally by significant margins. It's a good result for the D programming language. The benchmarks may be of interest regardless of your interest in the tools themselves. Repository: https://github.com/eBay/tsv-utils-dlang Performance benchmarks: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md --Jon One more update: Schveiguy helped identify the performance bottleneck in the csv2tsv tool; the tools are now the fastest on all six benchmarks. The benchmarks have been updated (and reformatted a bit). Summary table here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md#top-four-in-each-benchmark
Re: Updates to the tsv-utils toolkit
On Wednesday, 22 February 2017 at 21:07:43 UTC, bpr wrote: On Wednesday, 22 February 2017 at 18:12:50 UTC, Jon Degenhardt wrote: ...snip... Repository: https://github.com/eBay/tsv-utils-dlang Performance benchmarks: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md --Jon This is very nice code, and a good result for D. I'll study this carefully. So much of data analysis is reading/transforming files... ...snip... Thanks! Both for the feedback and for any evaluation you might do. Any insights or thoughts you may have would be quite welcome. --Jon
Re: Updates to the tsv-utils toolkit
On Wednesday, 22 February 2017 at 18:43:57 UTC, Jack Stouffer wrote: On Wednesday, 22 February 2017 at 18:12:50 UTC, Jon Degenhardt wrote: Speed matters when processing large data files, and these tools are fast. I've published new benchmarks comparing the tools to similar tools written in several native compiled programming languages. The tools are the fastest on five of the six benchmarks run, generally by significant margins. It's a good result for the D programming language. Great news! Agreed, an outstanding result. I had not anticipated the deltas. The specialty toolkits have been anonymized in the tables below. The purpose of these benchmarks is to gauge performance of the D tools, not make comparisons between other toolkits. You're no fun ;) Yeah, I know. Not my style.
Updates to the tsv-utils toolkit
It's not quite a year since the open-sourcing of eBay's tsv utilities. Since then there have been a number of additions and updates, and the tools form a more complete package. The tools assist with manipulation of the tabular data files common in machine learning and data mining environments. They work alongside traditional Unix command line tools like 'cut' and 'sort'. They also fit well with data mining and stats packages like R and Pandas. The tools include filtering, slicing, joins and other manipulation, sampling, and statistical calculations. If you find yourself working with large data files from a Unix shell, you may like these tools. Speed matters when processing large data files, and these tools are fast. I've published new benchmarks comparing the tools to similar tools written in several natively compiled programming languages. The tools are the fastest on five of the six benchmarks run, generally by significant margins. It's a good result for the D programming language. The benchmarks may be of interest regardless of your interest in the tools themselves. Repository: https://github.com/eBay/tsv-utils-dlang Performance benchmarks: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md --Jon
Re: Silicon Valley D Meetup - January 26, 2017 - "High Performance Tools in D" by Jon Degenhardt
On Saturday, 18 February 2017 at 07:50:02 UTC, Joakim wrote: On Friday, 27 January 2017 at 18:20:53 UTC, Jon Degenhardt wrote: On Friday, 27 January 2017 at 16:21:51 UTC, Jack Stouffer wrote: On Friday, 27 January 2017 at 03:58:26 UTC, Ali Çehreli wrote: And this: http://youtu.be/-DK4r5xewTY Hey Jon, if you're in this thread, are you able to post any of the code that you use for tsv parsing? Code has been open-sourced: https://github.com/eBay/tsv-utils-dlang The performance benchmarks shown in the talk are not in the repo; the benchmarks currently listed are from a year ago. I'm planning to update the repo in the next few weeks, probably after the next LDC release. If there are questions about specific types of things, perhaps a thread in the General forum would work. --Jon Watched the video some time back, interesting results. Any plans to blog about this? It would be great if you could run them through a profiler too and see why D is so much faster. Would be really worth writing this up, maybe on the D blog. Thanks for the feedback. I'm pretty close to publishing the benchmarks; they'll go in a doc file in the repository. They weren't quite complete when the meetup happened. Regarding a blog post - I haven't talked to Mike Parker, but if there's interest I'd be open to it. As to why the tools compare so well - that's a really intriguing question, especially since the tools favor using high-level constructs from D / Phobos rather than hand-built data structures or memory management. I have hypotheses, but no sure answers. Some of it likely involves design choices rather than language facilities per se, but even so, it's a good story for D. --Jon
Re: two points
On Thursday, 9 February 2017 at 16:48:16 UTC, Joseph Rushton Wakeling wrote: There's clearly in part a scaling problem here (in terms of how many people are available in general, and in terms of how many people have expertise on particular parts of the library), but it also feels like a few simple things (like making sure every PR author is given a reliable contact or two who they can feel entitled to chase up) could make a big difference. Regarding the scaling problem - perhaps the bug system could be used to help engage a wider community of reviewers. Specifically, update the bugzilla ticket early in the PR lifecycle as an alerting mechanism. This idea comes from my experiences so far. I've found any number of bugs and enhancements in the bug system that directly interact with things I'm implementing. I typically add myself to the CC list so I hear about changes. In many of these cases I'd be willing to help with reviewing. However, when a PR associated with the issue is created, the ticket itself is normally not updated until after the review is finished and the PR closed, too late to help out. Of course, someone like myself, a part-timer to the community at best, should not be a primary reviewer. However, for specific issues, it's often the case that I've studied the area of code involved. If there is a wider set of people in a similar situation, this approach might help engage them. --Jon
Re: Silicon Valley D Meetup - January 26, 2017 - "High Performance Tools in D" by Jon Degenhardt
On Friday, 27 January 2017 at 20:48:30 UTC, Ali Çehreli wrote: On 01/27/2017 08:21 AM, Jack Stouffer wrote: On Friday, 27 January 2017 at 03:58:26 UTC, Ali Çehreli wrote: And this: http://youtu.be/-DK4r5xewTY Hey Jon, if you're in this thread, are you able to post any of the code that you use for tsv parsing? Yeah, the slide starting at 19'35 is the most interesting: https://www.youtube.com/watch?v=-DK4r5xewTY&feature=youtu.be&t=1175 Tools written in D (mostly with Phobos and with GC) are at least 3 times faster! Let's verify the results and then make some noise. :) Ali An independent verification of the results would be fantastic. Any time a single person does this type of benchmark, especially the author of the tool, there's real risk of an error. In this case I took every reasonable step I knew to be diligent about it, but still. And yes, the deltas are impressive. I was surprised.
Re: Silicon Valley D Meetup - January 26, 2017 - "High Performance Tools in D" by Jon Degenhardt
On Friday, 27 January 2017 at 16:21:51 UTC, Jack Stouffer wrote: On Friday, 27 January 2017 at 03:58:26 UTC, Ali Çehreli wrote: And this: http://youtu.be/-DK4r5xewTY Hey Jon, if you're in this thread, are you able to post any of the code that you use for tsv parsing? Code has been open-sourced: https://github.com/eBay/tsv-utils-dlang The performance benchmarks shown in the talk are not in the repo; the benchmarks currently listed are from a year ago. I'm planning to update the repo in the next few weeks, probably after the next LDC release. If there are questions about specific types of things, perhaps a thread in the General forum would work. --Jon
Command line tool for weighted reservoir sampling
I released a new tool for weighted random sampling of tabular data files: tsv-sample. It's one of several tools recently added to the tsv file toolkit I released last year. These tools are especially useful when data files are larger than is desirable to read entirely into memory in R and similar apps. I'll publish an announcement of a broader set of tool updates in the next few weeks; I have some performance benchmarks to finish first. However, weighted reservoir sampling algorithms are interesting, so I thought there might be enough interest to warrant a separate announcement. Repo: https://github.com/eBay/tsv-utils-dlang tsv-sample code: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d --Jon
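For the curious, the classic weighted reservoir scheme is the Efraimidis-Spirakis family: each item gets the random key u^(1/weight), and the k items with the largest keys form the sample. Here is a minimal sketch of that idea, my own illustration rather than tsv-sample's actual code; the `WeightedLine` record and the linear min-scan (a real implementation would use a min-heap) are simplifications:

```d
import std.algorithm : minIndex;
import std.math : pow;
import std.random : Random, uniform01;

struct WeightedLine { double weight; string line; }  // hypothetical input record
struct Entry { double key; string line; }

// Efraimidis-Spirakis style weighted reservoir sampling: one pass,
// O(k) memory, and each item's selection probability is proportional
// to its weight.
Entry[] weightedReservoir(WeightedLine[] input, size_t k, ref Random rng)
{
    Entry[] reservoir;
    foreach (item; input)
    {
        immutable key = pow(uniform01(rng), 1.0 / item.weight);
        if (reservoir.length < k)
            reservoir ~= Entry(key, item.line);
        else
        {
            // Replace the current minimum-key entry if this key is larger.
            auto i = reservoir.minIndex!((a, b) => a.key < b.key);
            if (key > reservoir[i].key)
                reservoir[i] = Entry(key, item.line);
        }
    }
    return reservoir;
}

void main()
{
    auto rng = Random(42);
    auto data = [WeightedLine(1.0, "a"), WeightedLine(5.0, "b"),
                 WeightedLine(2.0, "c"), WeightedLine(0.5, "d")];
    auto sample = weightedReservoir(data, 2, rng);
    assert(sample.length == 2);
}
```

The appeal of the algorithm is that it streams: the sample stays valid at every point in the pass, so file size never needs to be known up front.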
Re: Beta 2.073.0-b1
On Saturday, 7 January 2017 at 05:02:13 UTC, Martin Nowak wrote: First beta for the 2.073.0 release. This release comes with a few Phobos additions, a new -mcpu=avx switch, experimental safety checks (-transition=safe/-dip1000), and several bugfixes. http://dlang.org/download.html#dmd_beta http://dlang.org/changelog/2.073.0.html Please report any bugs at https://issues.dlang.org -Martin The change log should probably include the topN rewrite (PR 4815 and several issue reports). --Jon
Re: The D Language Foundation is now a tax exempt non-profit organization
On Monday, 29 August 2016 at 17:03:34 UTC, Andrei Alexandrescu wrote: We're happy to report that the D Language Foundation is now a public charity operating under US Internal Revenue Code Section 501(c)(3). The decision is retroactive to September 23, 2015. This has wide-ranging implications, the most important being that individuals and organizations may make tax deductible bequests, devises, transfers, or gifts to the Foundation. We will mull over defining donation and sponsorship packages in the near future. If interested in donating spontaneously, feel free to reach out to us via email at foundat...@dlang.org. Many thanks are due to the folks in this community who asked for and supported this initiative. Fantastic! Congrats, nice work!