Re: The Case Against Autodecode
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > I am as unclear about the problems of autodecoding as I am about the necessity > to remove curl. Whenever I ask I hear some arguments that work well emotionally > but are scant on reason and engineering. Maybe it's time to rehash them? I just > did so about curl, no solid argument seemed to come together. I'd be curious of > a crisp list of grievances about autodecoding. -- Andrei

Given the importance of performance in the auto-decoding topic, it seems reasonable to quantify it. I took a stab at this. It would of course be prudent to have others conduct similar analysis rather than rely on my numbers alone.

Measurements were done using an artificial scenario, counting lower-case ascii letters. This had the effect of calling front/popFront many times on a long block of text. Runs were done both treating the text as char[] and as ubyte[], and comparing the run times. (char[] performs auto-decoding, ubyte[] does not.) Timings were done with DMD and LDC, and on two different data sets. One data set was a mix of Latin languages (e.g. German, English, Finnish, etc.), the other non-Latin languages (e.g. Japanese, Chinese, Greek, etc.). The goal was to distinguish between scenarios with high and low Ascii character content.

The result: For DMD, auto-decoding showed a 1.6x to 2.6x cost. For LDC, a 12.2x to 12.9x cost.

Details:
- Test program: https://dpaste.dzfl.pl/67c7be11301f
- DMD 2.071.0. Options: -release -O -boundscheck=off -inline
- LDC 1.0.0-beta1 (based on DMD v2.070.2). Options: -release -O -boundscheck=off
- Machine: Macbook Pro (2.8 GHz Intel I7, 16GB ram)

Runs for each combination were done five times and the median times used.
The median times and the char[] to ubyte[] ratio are below:

| Compiler | Text type | char[] time (ms) | ubyte[] time (ms) | ratio |
|----------|-----------|------------------|-------------------|-------|
| DMD      | Latin     | 7261             | 4513              | 1.6   |
| DMD      | Non-latin | 10240            | 3928              | 2.6   |
| LDC      | Latin     | 11773            | 913               | 12.9  |
| LDC      | Non-latin | 10756            | 883               | 12.2  |

Note: The numbers above don't provide enough info to derive a front/popFront rate. The program artificially makes multiple loops to increase the run-times. (For these runs, the program's repeat-count was set to 20.)

Characteristics of the two data sets:

| Text type | Bytes   | DChars  | Ascii Chars | Bytes per DChar | Pct Ascii |
|-----------|---------|---------|-------------|-----------------|-----------|
| Latin     | 4156697 | 4059016 | 3965585     | 1.024           | 97.7%     |
| Non-latin | 4061554 | 1949290 | 348164      | 2.084           | 17.9%     |

Run-to-run variability: The run times recorded were quite stable. The largest delta between minimum and median time for any group was 17 milliseconds.
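The full test program is at the dpaste link above. As a rough sketch of the kind of loop being timed (an illustration only, not the actual benchmark; the function name is made up):

```d
import std.range : empty, front, popFront;
import std.stdio : writeln;

/* Count lower-case ASCII letters via range primitives. Instantiated with
 * char[], front/popFront auto-decode each UTF-8 sequence to a dchar; with
 * ubyte[], they simply step through bytes. UTF-8 continuation bytes are all
 * >= 0x80, so the byte-wise version produces the same count. */
size_t countLowerAscii(Range)(Range r)
{
    size_t n = 0;
    for (; !r.empty; r.popFront)
        if (r.front >= 'a' && r.front <= 'z')
            n++;
    return n;
}

void main()
{
    char[] text = "Grüße, Wörld".dup;
    writeln(countLowerAscii(text));                // auto-decoding path
    writeln(countLowerAscii(cast(ubyte[]) text));  // raw byte path
}
```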
Re: Command line parsing
On Saturday, 14 May 2016 at 13:17:05 UTC, Andrei Alexandrescu wrote: I showed a fellow programmer std.getopt. We were both on laptops. He wanted to show me how good Python's argparse is and how D should copy it. By the end of the chat it was obvious argparse was much more verbose and less pleasant to use than getopt. Like you have to create an object (?!?!) to parse the command line and many other lines of nonsense. I've found D's getopt package to be pretty good. There are a number of small things that could make it quite a bit better. To me these generally appear more the result of limited usage than anything fundamentally wrong with the design. For example, the error text produced when a run-time argument doesn't match the option spec is often not helpful to the user who entered the command, and I've found I need to take steps to address this. A package like Perl's Getopt::Long tends to be a bit more mature in some of these details. --Jon
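As a small illustration of the style being compared (the option names here are made up for the example):

```d
import std.getopt;

void main(string[] args)
{
    size_t count = 1;
    bool verbose;

    // getopt parses args in place and returns help info; no parser
    // object needs to be constructed up front.
    auto helpInfo = getopt(args,
        "count|c",   "Number of times to run.", &count,
        "verbose|v", "Print extra output.",     &verbose);

    if (helpInfo.helpWanted)
        defaultGetoptPrinter("A hypothetical tool.", helpInfo.options);
}
```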
Re: Intermediate level D and open source projects to study
On Wednesday, 11 May 2016 at 18:41:47 UTC, xtreak wrote: Hi, I am a D newbie. I worked through the D Programming Language and Programming in D books. I primarily use Python daily. I will be happy to know how I can go to intermediate level in D. It will be helpful to have projects in D of high quality and also beginner friendly code that I can study to improve my D. [snip] Might not be exactly what you are looking for, but I recently open-sourced some command line utilities you could take a look at. They are real apps in that they take command line arguments, have help, error handling, etc. But, they are doing relatively straightforward tasks, things you might do in Python also. A caution: I'm relatively new to D as well, and there are likely places where the code could be more idiomatic D. Utilities are at: https://github.com/eBay/tsv-utils-dlang. The readme has a section labeled "The code" that describes the code structure.
Re: Compiler benchmarks for an alternative to std.uni.asLowerCase.
On Monday, 9 May 2016 at 00:15:03 UTC, Peter Häggman wrote: On Sunday, 8 May 2016 at 23:38:31 UTC, Jon D wrote: I did a performance study on speeding up case conversion in std.uni.asLowerCase. Specifics for asLowerCase have been added to issue https://issues.dlang.org/show_bug.cgi?id=11229. Publishing here as some of the more general observations may be of wider interest. [...] Nice, it seems that you would have enough material to advocate a pull request in phobos then ;) Thanks! I haven't yet taken the time to go through the 'becoming a contributor' steps; when I have the time I'll do that. In this case, I'd want to start by validating with the library designers that the approach makes sense. It by-passes what appears to be a basic primitive, std.uni.toCaser. There may be reasons this is not desirable.
Compiler benchmarks for an alternative to std.uni.asLowerCase.
I did a performance study on speeding up case conversion in std.uni.asLowerCase. Specifics for asLowerCase have been added to issue https://issues.dlang.org/show_bug.cgi?id=11229. Publishing here as some of the more general observations may be of wider interest.

Background: Case conversion can generally be sped up by checking if a character is ascii before invoking a full unicode case conversion. The single character std.uni.toLower does this optimization, but std.uni.asLowerCase does not. asLowerCase does a lazy conversion of a range. For the test, I created a replacement for asLowerCase which uses map and toLower. In essence, `map!(x => x.toLower)` or `map!(x => x.byDchar.toLower)`.

Testing was with DMD (2.071) and LDC 1.0.0-beta1 (Phobos 2.070) on OSX. Compiler settings were `-release -O -boundscheck=off`. DMD was tested with and without `-inline`. LDC turns on inlining (-enable-inlining=1) by default with -O, but DMD does not. Texts tried were in Japanese, Chinese, Finnish, English, German, and Spanish. Timing was done both including and excluding decoding from utf-8 to dchar.

Performance delta including decoding to dchar:

| Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no inline |
|-----------------|-----------|------------|-----------|---------------|
| Latin           | 95-99%    | 64% (2.7x) | 93% (14x) | 48% (1.9x)    |
| Asian (Jpn/Chn) | 2.4-3.7%  | 36% (1.6x) | 80% (5x)  | -1%           |

Performance delta excluding decoding to dchar:

| Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no inline |
|-----------------|-----------|------------|-----------|---------------|
| Latin           | 95-99%    | 60% (2.5x) | 95% (20x) | 60% (2.5x)    |
| Asian (Jpn/Chn) | 2.4-3.7%  | 50% (2x)   | 95% (20x) | -2%           |

Observations:

* mapAsLowerCase was faster than asLowerCase across the board. That it was better for Asian texts suggests the improvement involved more than just the ascii check optimization.
* Performance varied widely between compilers, and for DMD, whether the -inline flag was included. The performance delta between asLowerCase and the mapAsLowerCase replacement was very dependent on these choices.
Similarly, the delta between inclusion and exclusion of auto-decoding was highly dependent on these selections.

* DMD improvement by using -inline: 30% for asLowerCase (1.5x), 90% for mapAsLowerCase (10x).
* DMD (-inline) vs LDC: For asLowerCase, LDC was 65-85% faster. For mapAsLowerCase, DMD was 10-40% faster. There were changes to the map implementation in 2.071, so these were not equivalent, but still, it's interesting that DMD beat LDC in this case.

Thoughts:

* The large variances between compiler settings call for extra diligence when performance tuning at the source code level, especially for code intended for multiple compilers.
* Perhaps DMD -O should also turn on -inline. This would present a better performance picture to new users. It's also helpful when the different compilers agree on the rough meaning of compiler switches.
* Auto-decoding is an oft-discussed concern. It doesn't show up in the tables above, but the data I looked at suggests the cost/penalty may vary quite a bit depending on usage context and compiler/settings. I wasn't studying this aspect explicitly. It may be worth its own analysis.

Other details:

* Code for mapAsLowerCase and the timing program is at: https://dpaste.dzfl.pl/a0e2fa1c71fd
* Texts used for timing were books in several languages from the Project Gutenberg site (http://www.gutenberg.org/), with boilerplate text removed.

--Jon
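The code for the replacement and the timing program is at the dpaste link above. As a simplified sketch of the approach (not the benchmarked code), the composition is essentially:

```d
import std.algorithm : map;
import std.conv : to;
import std.stdio : writeln;
import std.uni : toLower;

// Lazy lower-casing built from map plus the single-character std.uni.toLower,
// which already contains the ASCII fast-path check that asLowerCase lacks.
auto mapAsLowerCase(Range)(Range r)
{
    return r.map!(c => c.toLower);
}

void main()
{
    // Iterating a string auto-decodes to dchar, so the dchar toLower applies.
    writeln("Grüße VON D".mapAsLowerCase.to!string);
}
```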
Re: Can't use std.algorithm.remove on a char[]?
On Saturday, 30 April 2016 at 19:21:30 UTC, ag0aep6g wrote: On 30.04.2016 21:08, Jon D wrote: If an initial step is to fix the documentation, it would be helpful to include specifically that it doesn't work with characters. It's not obvious that characters don't meet the requirement. Characters are not the problem. remove works fine on a range of chars, when the elements are assignable lvalues. char[] as a range has neither assignable elements, nor lvalue elements. That is, lines 3 and 4 here don't compile: import std.range: front; char[] a = ['f', 'o', 'o']; a.front = 'g'; auto ptr = &a.front; I didn't mean to suggest making the documentation technically incorrect. Just that it should be helpful in important cases that won't necessarily be obvious. To me, char[] is an important case, one that's not made obvious by listing the hasLvalueElements constraint by itself. --Jon
Re: Can't use std.algorithm.remove on a char[]?
On Saturday, 30 April 2016 at 18:32:32 UTC, ag0aep6g wrote: On 30.04.2016 18:44, TheGag96 wrote: I was just writing some code trying to remove a value from a character array, but the compiler complained "No overload matches for remove", and if I specifically say use std.algorithm.remove() the compiler doesn't think it fits any definition. For reference, this would be all I'm doing: char[] thing = ['a', 'b', 'c']; thing = thing.remove(1); Is this a bug? std.algorithm claims remove() works on any forward range... The documentation is wrong. 1) remove requires a bidirectional range. The constraints and parameter documentation correctly say so. char[] is a bidirectional range, though. 2) remove requires lvalue elements. char[] fails this, as the range primitives decode the chars on-the-fly to dchars. Pull request to fix the documentation: https://github.com/dlang/phobos/pull/4271 By the way, I think requiring lvalues is too restrictive. It should work with assignable elements. Also, it has apparently been missed that const/immutable can make non-assignable lvalues. There's a ticket open related to the lvalue element requirement: https://issues.dlang.org/show_bug.cgi?id=8930 Personally, I think this example is more compelling than the one in the ticket. It seems very reasonable to expect that std.algorithm.remove will work regardless of whether the elements are characters, integers, ubytes, etc. If an initial step is to fix the documentation, it would be helpful to include specifically that it doesn't work with characters. It's not obvious that characters don't meet the requirement. --Jon
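To illustrate the distinction discussed in this thread (a small sketch; element types other than char[] satisfy the lvalue-element requirement):

```d
import std.algorithm : remove;
import std.stdio : writeln;

void main()
{
    // Fine: dchar[] is a bidirectional range with assignable lvalue elements.
    dchar[] d = "abc"d.dup;
    writeln(d.remove(1));   // drops the element at index 1

    // Fine for the same reason: ubyte[] elements are plain lvalues.
    ubyte[] b = [1, 2, 3];
    writeln(b.remove(1));

    // Does not compile: char[]'s range primitives decode to rvalue dchars,
    // so the hasLvalueElements constraint fails.
    // char[] c = "abc".dup;
    // c = c.remove(1);
}
```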
Re: So, to print or not to print?
On Tuesday, 26 April 2016 at 16:30:22 UTC, Jonathan M Davis wrote: On Tuesday, April 26, 2016 12:18:11 cym13 via Digitalmars-d wrote: Finally it doesn't bring much. One learns writeln, laments a bit that it doesn't put spaces itself then just accepts it. I confess that I was very surprised to find out that writeln worked with multiple arguments. In my initial look at D I would have appreciated print. However, at least part of the reason is that it was a while before I knew writefln existed. After finding it (and discovering that writeln takes multiple arguments), having the functionality of print was less of an issue. It's not easy to reconstruct why it took me a while to discover writefln, but perhaps finding places to show it off in introductory material would help others find it more quickly. --Jon
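For reference, the two variants being discussed, side by side (a trivial sketch):

```d
import std.stdio : writeln, writefln;

void main()
{
    // writeln accepts any number of arguments; no separator is inserted.
    writeln("x = ", 42, ", y = ", 3.5);

    // writefln takes a format string, closer to a classic formatted print.
    writefln("x = %d, y = %.1f", 42, 3.5);
}
```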
Re: Is there a way to disable 'dub test' for applications?
On Monday, 18 April 2016 at 11:47:42 UTC, Dicebot wrote: On Monday, 18 April 2016 at 04:25:25 UTC, Jon D wrote: I have a dub config file specifying a targetType of 'executable'. There is only one file, the file containing main(), and no unit tests. When I run 'dub test', dub builds and runs the executable. This is not really desirable. Is there a way to set up the dub configuration file to disable running the test? configuration "unittest" { excludedSourceFiles "path/to/main.d" } Very nice, thank you. What also seems to work is: configuration "unittest" { targetType "none" } Then 'dub test' produces the message: Configuration 'unittest' has target type "none". Skipping test.
Re: Is there a way to disable 'dub test' for applications?
On Monday, 18 April 2016 at 05:30:21 UTC, Jonathan M Davis wrote: On Monday, April 18, 2016 04:25:25 Jon D via Digitalmars-d-learn wrote: I have a dub config file specifying a targetType of 'executable'. There is only one file, the file containing main(), and no unit tests. When I run 'dub test', dub builds and runs the executable. This is not really desirable. Is there a way to set up the dub configuration file to disable running the test? Note: What I'd really like to do is run a custom shell command when 'dub test' is done, I haven't seen anything suggesting that's an option. However, disabling would still be useful. What's the point of even running dub test if you have no unit tests? Just do dub build, and then use the resulting executable, or if you want to build and run in one command, then use dub run. - Jonathan M Davis I should have supplied more context. A few days ago I announced open-sourcing a D package consisting of several executables. Multiple comments recommended making it available via the Dub repository. I wasn't using Dub to build, and there are a number of loose ends when working with Dub and multiple executables. I've been trying to limit the number of issues others might encounter if they pulled the package and ran typical commands, like 'dub test'. It's not a big deal, but if there's an easy way to provide a handler, I will. Also, the reason for a custom shell command is that there are tests, it's just that they are run against the built executable rather than via the unittest framework. --Jon
Is there a way to disable 'dub test' for applications?
I have a dub config file specifying a targetType of 'executable'. There is only one file, the file containing main(), and no unit tests. When I run 'dub test', dub builds and runs the executable. This is not really desirable. Is there a way to set up the dub configuration file to disable running the test? Note: What I'd really like to do is run a custom shell command when 'dub test' is done, but I haven't seen anything suggesting that's an option. However, disabling would still be useful. --Jon
Specifying a minimum Phobos version in dub?
Is there a way to specify a minimum Phobos version in a dub package specification? --Jon
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 19:52:30 UTC, Walter Bright wrote: On 4/11/2016 5:50 PM, Jon D wrote: I'd welcome any feedback, either on the apps or the code. Intention is that the code be reasonable example programs. And, I may write a blog post about my D explorations at some point, they'd be referenced in such an article. You've got questions on: https://www.reddit.com/r/programming/comments/4ems6a/commandline_utilities_for_large_tabseparated/ !! As the author, it'd be nice to do an AMA there. Thanks for posting there and letting me know. I responded and will watch the thread. What do you mean by an "AMA"?
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 18:22:21 UTC, Dicebot wrote: On Wednesday, 13 April 2016 at 17:21:58 UTC, Jon D wrote: You don't need to put anything on path to run utils from dub packages. `dub run` will take care of setting necessary environment (without messing with the system): dub fetch package_with_apps dub run package_with_apps:app1 --flags args These are command line utilities, along the lines of unix 'cut', 'grep', etc, intended to be used as part of a unix pipeline. It'd be less convenient to be invoking them via dub. They really should be on the path themselves. Sure, that would be beyond dub scope though. Making binary packages is independent of build system or source layout (and is highly platform-specific). The `dub run` feature is mostly helpful when you need to use one such tool as part of a build process for another dub package. Right. So, partly what I'm wondering is if during the normal dub fetch/run cycle there might be an opportunity to print a message to the user with some info to help them add the tools to their path. I haven't used dub much, so I'll have to look into it more. But there should be some way to make it reasonably easy and clear. It'll probably be a few days before I can get to this, but I would like to get them in the package registry. --Jon
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 12:36:56 UTC, Dejan Lekic wrote: On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote: I've open sourced a set of command line utilities for manipulating tab-separated value files. I rarely need TSV files, but I deal with CSV files every day. - It would be nice to test your implementation against std.csv (it can use TAB as separator). Did you try to compare the two? No, I didn't try using the std.csv library utilities. The utilities all take a delimiter, so comma can be specified, but that won't handle CSV escaping. For myself, I'd be more inclined to add TSV-CSV converters rather than adding native CSV support to each tool, but if you're working with CSV all the time that'd be a nuisance. If you want, you can try rewriting the inner loop of one of the tools to use csvNextToken rather than algorithm.splitter. tsv-select would be the easiest of the tools to try. It'd also be necessary to replace the writeln for the output to properly add CSV escapes. --Jon
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 07:34:11 UTC, Rory McGuire wrote: On Wed, Apr 13, 2016 at 3:41 AM, Puming via Digitalmars-d-announce < digitalmars-d-announce@puremagic.com> wrote: On Tuesday, 12 April 2016 at 06:22:55 UTC, Puming wrote: Here is what I know of it, using subPackages: Just tried your suggestion and it works. I just added the below to the parent project to get the apps built: void main() { import std.process : executeShell; executeShell(`dub build :app1`); executeShell(`dub build :app2`); executeShell(`dub build :app3`); } Thanks Rory, Puming. I'll look into this and see how best to make it fit. I'm realizing also there's one additional capability it'd be nice to have in dub for tools like this, which is an option to install the executables somewhere that can easily be put on the path. Still, even without this there'd be benefit to having them fetched via dub. --Jon
Re: Command line utilities for tab-separated value files
On Tuesday, 12 April 2016 at 06:22:55 UTC, Puming wrote: On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote: Hi all, I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises. [...] Interesting, I have large csv files, and this lib will be useful. Can you put it onto code.dlang.org so that we could use it with dub? I'd certainly like to make it available via dub, but I wasn't sure how to set it up. There are two issues. One is that the package builds multiple executables, which dub doesn't seem to support easily. More problematic is that quite a bit of the test suite is run against the executables, which I could automate using make, but didn't see how to do it with dub. If there are suggestions for setting this up in dub that'd be great. An example project doing something similar would be really helpful. --Jon
Command line utilities for tab-separated value files
Hi all, I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises. The tools are here: https://github.com/eBay/tsv-utils-dlang They are likely of interest primarily to people regularly working with large files, though others might find the performance benchmarks of interest as well (included in the README). I'd welcome any feedback, either on the apps or the code. Intention is that the code be reasonable example programs. And, I may write a blog post about my D explorations at some point, they'd be referenced in such an article. --Jon
Re: Weak Purity Blog Post
On Monday, 28 March 2016 at 01:44:02 UTC, sarn wrote: D's implementation of functional purity supports "weak" purity - functions that can mutate arguments but are otherwise traditionally pure. I wrote a post about some of the practical benefits of this kind of purity: https://theartofmachinery.com/2016/03/28/dirtying_pure_functions_can_be_useful.html Nice article. A suggestion: The point about improved testability when designing for purity is well made. In D, this is further supported by the ability to write and place unit tests alongside the functions themselves. That's my personal opinion at least - because unit tests are so easy to write in D, it encourages design for testability. My suggestion is to add a note about this to the post. --Jon
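As a small illustration of the combination being praised (a made-up example, not from the article): a weakly pure function may mutate its arguments, and a D unittest block can sit right next to it.

```d
// Weakly pure: mutates data reachable through its parameter, but reads and
// writes no global state, so callers keep the reasoning benefits of purity.
pure void scaleInPlace(double[] xs, double k)
{
    foreach (ref x; xs)
        x *= k;
}

// Compiled and run with -unittest; lives beside the function it tests.
unittest
{
    auto v = [1.0, 2.0, 3.0];
    scaleInPlace(v, 2.0);
    assert(v == [2.0, 4.0, 6.0]);
}
```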
Re: Pitching D to a gang of Gophers
On Saturday, 12 March 2016 at 08:09:41 UTC, Dmitry Olshansky wrote: On 05-Mar-2016 14:05, Dmitry Olshansky wrote: Obligatory slides: http://slides.com/dmitryolshansky/deck/fullscreen/ Very nice slide deck. Thanks for publishing. --Jon
Re: Speed kills
On Wednesday, 9 March 2016 at 20:30:10 UTC, Jon D wrote: I've seen a few cases while exploring D. Turns out there are issues filed for each of the performance issues I mentioned:

* Lower casing strings: https://issues.dlang.org/show_bug.cgi?id=11229
* Large associative arrays: https://issues.dlang.org/show_bug.cgi?id=2504
* Associative arrays - Checking membership with mutable values (char arrays) rather than strings (immutable): https://issues.dlang.org/show_bug.cgi?id=15038
Re: Speed kills
On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote: Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about - this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning. I've seen a few cases while exploring D. Not all fully researched (apologies for that), but since there appears to be interest in identification I'll list them.

* Lower-casing strings (likely upper-casing too), and some character type checks. Routines like toLower and asLowerCase call functions that work for all unicode characters. I was able to create much faster versions by checking if the character was ascii, then calling either the ascii version or the more general version. The same is true for a few routines like isNumber. Some have the ascii check optimization built in, but not all. If this optimization is added, it might also be useful to add a few common combinations (or a generic solution, if that's feasible). For example, to check if a character is alpha-numeric, one currently ORs two tests from the standard library, isAlpha and isNumber. Putting in an ascii optimization check requires putting it before doing the OR, rather than inside the tests being ORed.

* Large associative arrays. When associative arrays get beyond about 10 million entries, performance starts to decline. I believe this is due to resizing the arrays. It's worse with strings as keys than integers as keys. Having a way to reserve capacity may help under some circumstances.

* Associative arrays - Converting keys to immutable versions for lookup. Associative arrays want immutable values as keys. As far as I can tell, immutable values are also required when performing a lookup, even if a new entry won't be stored.
A couple apps I've written walk through large lists of text values, naturally available as char[] because they are read from input streams. To test presence in an associative array, it's necessary to copy them to immutable strings first. I haven't fully researched this one, but my experience is that copying from char[] to string becomes a meaningful cost. On the surface, this appears to be an optimization opportunity, to create the immutable strings only when actually storing a new value. --Jon
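The optimization described, copying to an immutable string only when actually storing a new key, can be hand-rolled at the call site. A sketch (a hypothetical snippet, not code from the utilities; the cast is defensible only because the AA does not retain the key when a lookup misses):

```d
import std.stdio : writeln;

void main()
{
    size_t[string] counts;
    char[] buf = "apple".dup;   // stands in for a reusable input-line buffer

    // Hash lookup without allocating: cast to immutable for the lookup only,
    // and idup (copy to an immutable string) only when storing a new key.
    if (auto p = cast(immutable(char)[]) buf in counts)
        (*p)++;
    else
        counts[buf.idup] = 1;

    writeln(counts.length);
}
```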
Re: Get memory usage report from GC
On Saturday, 20 February 2016 at 05:34:01 UTC, tcak wrote: On Saturday, 20 February 2016 at 05:33:00 UTC, tcak wrote: Is there any way (I checked core.memory already) to collect a report about memory usage from the garbage collector? So, I can see a list of pointer and length information. Since getting this information would require another memory area in heap, it could be like logging when the report is asked. My long running but idle program starts using 41.7% of memory (that's close to 3GB), and it is not obvious whether the memory is allocated by a single variable, or many variables. My mistake, it is close to 512MB. Doesn't sound like precisely what you want, but there are summary reports of GC activity available via the "--DRT-gcopt=profile:1" command line option. More info at: http://dlang.org/spec/garbage.html --Jon
Re: Scala Spark-like RDD for D?
On Wednesday, 17 February 2016 at 02:32:01 UTC, bachmeier wrote: You can discuss here, but there is also a gitter room https://gitter.im/DlangScience/public Also, I've got a project that embeds R inside D http://lancebachmeier.com/rdlang/ It's not quite as good a user experience as others because I have limited time for things not related to work. I've got an older project to embed D inside R, but it hasn't been updated in a while and it's Linux only. https://bitbucket.org/bachmeil/dmdinline2 Excellent, thanks, I'll check these out. --Jon
Re: Scala Spark-like RDD for D?
On Tuesday, 16 February 2016 at 16:27:27 UTC, bachmeier wrote: On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote: As an alternative are there plans for parallel/cluster computing frameworks for D? You can use MPI: https://github.com/DlangScience/OpenMPI FWIW, I'm interested in the wider topic of incorporating D into data science environments also. Sounds as if there are several interesting projects in the area, but so far my understanding of them is limited. Perhaps the forum isn't the best place to discuss, but if there happen to be any blog posts or other descriptions, it'd be great to get links. --Jon
Re: Reserving capacity in associative arrays
On Tuesday, 16 February 2016 at 19:49:55 UTC, H. S. Teoh wrote: On Tue, Feb 16, 2016 at 07:34:07PM +0000, Jon D via Digitalmars-d-learn wrote: On Tuesday, 16 February 2016 at 16:37:07 UTC, Steven Schveighoffer wrote: >On 2/14/16 10:22 PM, Jon D wrote: >>Is there a way to reserve capacity in associative arrays? >>[snip] >>The underlying implementation of associative arrays appears to take >>an initial number of buckets, and there's a private resize() method, >>but it's not clear if there's a public way to use these. Rehashing (aa.rehash) would resize the number of buckets, but if you don't already have the requisite number of keys, it wouldn't help. Thanks for the reply and the detailed example for manually controlling GC. I haven't experimented with taking control over GC that way. Regarding reserving capacity, the relevant method is aa.resize(), not aa.rehash(). See: https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L141. This allocates space for the buckets; it doesn't matter whether the keys are known. Note that every time the buckets array is resized, the old bucket array is walked and elements reinserted. Preallocating a large bucket array would avoid this. See also the private constructor in the same file (line 51). It takes an initial size. --Jon
Re: Reserving capacity in associative arrays
On Tuesday, 16 February 2016 at 16:37:07 UTC, Steven Schveighoffer wrote: On 2/14/16 10:22 PM, Jon D wrote: Is there a way to reserve capacity in associative arrays? [snip] The underlying implementation of associative arrays appears to take an initial number of buckets, and there's a private resize() method, but it's not clear if there's a public way to use these. There is not a public way to access these methods unfortunately. It would be a good addition to druntime I believe. Recently, I added a clear method to the AA, which does not reduce capacity. So if you frequently build large AAs, and then throw them away, you could instead reuse the memory. My programs build AAs lasting the lifetime of the program. I would caution to be sure of this cause, however, before thinking it would solve the problem. The AA not only uses an array for buckets, but allocates a memory location for each element as well. I'm often wrong when I assume what the problem is when it comes to GC issues... Completely agree. After posting I decided to take a more methodical look. Not finished yet, but I can share part of it. The key thing so far is a noticeable step function in GC costs as AA size grows (likely not a surprise). My programs work with large data sets. Size is open-ended; what I'm trying to do is get an idea of the data set sizes they will handle reasonably. For purposes of illustration, word-count is a reasonable proxy for what I'm doing. It was in this context that I saw significant performance drop-off after 'size_t[string]' AAs reached about 10 million entries. I've started measuring with a simple program. Basically: StopWatch sw; sw.start; size_t[size_t] counts; foreach (i; 0..iterations) counts[uniform(0, uniqMax)]++; sw.stop; Same thing with string as key ('size_t[string]') AAs. 'iterations' and 'uniqMax' are varied between runs. GC stats are printed (via "--DRT-gcopt=profile:1"), plus timing and AA size. (Runs use LDC 17, release mode compiles, a fast 16GB MacBook.)
For the integer as key case (size_t[size_t]), there are notable jumps in GC total time and GC max pause time as AA size crosses specific size thresholds. This makes sense, as the AA needs to grow. Approximate steps:

| entries | gc_total (ms) | gc_max_pause (ms) |
|---------|---------------|-------------------|
| 2M      | 30            | 60                |
| 4M      | 200           | 100               |
| 12M     | 650           | 330               |
| 22M     | 1650          | 750               |
| 44M     | 5300          | 3200              |

Iterations didn't matter, and gc total time and gc max time were largely flat between these jumps. This suggests AA resize is the likely driver, and that preallocating a large size might address it. To the point about being sure about cause - my programs use strings as keys, not integers. The performance drop-off with strings was quite a bit more significant than with integers. That analysis seems a bit trickier; I'm not done with that yet. Different memory allocation, perhaps effects from creating short-lived, temporary strings to test AA membership. Could easily be that string use or the combo of AAs with strings as keys is a larger effect. The other thing that jumps out from the table is that the GC max pause time gets to be multiple seconds. Not an issue for my tools, which aren't interactive at those points, but it would be a significant issue for many interactive apps. --Jon
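A self-contained version of the measurement loop above might look like the following (module paths per current releases; in 2016 StopWatch lived in std.datetime. The iteration counts here are made-up small values, and the original runs also passed "--DRT-gcopt=profile:1" to collect the GC stats):

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : uniform;
import std.stdio : writefln;

void main()
{
    enum iterations = 1_000_000;
    enum uniqMax = 500_000;   // bounds the number of distinct keys

    auto sw = StopWatch(AutoStart.yes);
    size_t[size_t] counts;
    foreach (i; 0 .. iterations)
        counts[uniform(0, uniqMax)]++;   // AA grows as new keys appear
    sw.stop();

    writefln("%s entries built in %s ms", counts.length, sw.peek.total!"msecs");
}
```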
Re: Reserving capacity in associative arrays
On Tuesday, 16 February 2016 at 17:05:11 UTC, Basile B. wrote: On Tuesday, 16 February 2016 at 16:37:07 UTC, Steven Schveighoffer wrote: There is not a public way to access these methods unfortunately. It would be a good addition to druntime I believe. -Steve After reading the topic I've added this enhancement proposal, though I'm not quite sure if it's possible: https://issues.dlang.org/show_bug.cgi?id=15682 The idea is to concatenate smaller AAs into the destination. There is also this: https://issues.dlang.org/show_bug.cgi?id=2504
Re: Reserving capacity in associative arrays
On Monday, 15 February 2016 at 05:29:23 UTC, sigod wrote: On Monday, 15 February 2016 at 03:22:44 UTC, Jon D wrote: Is there a way to reserve capacity in associative arrays? [snip] Maybe try using this: http://code.dlang.org/packages/aammm Thanks, I wasn't aware of this package. I'll give it a try. --Jon
Reserving capacity in associative arrays
Is there a way to reserve capacity in associative arrays? In some programs I've been writing I've been getting reasonable performance up to about 10 million entries, but beyond that performance is impacted considerably (say, 30 million or 50 million entries). GC stats (via the "--DRT-gcopt=profile:1" option) indicate dramatic increases in gc time, which I'm assuming comes from resizing the underlying hash table. I'm guessing that by preallocating a large size the performance degradation would not be quite so dramatic. The underlying implementation of associative arrays appears to take an initial number of buckets, and there's a private resize() method, but it's not clear if there's a public way to use these. --Jon
Re: Vision for the first semester of 2016
On Monday, 25 January 2016 at 02:37:40 UTC, Andrei Alexandrescu wrote: Hot off the press! http://wiki.dlang.org/Vision/2016H1 -- Andrei A couple comments: a) No mention of targeting increased organizational participation (academic, corporate, etc). Not trying to suggest it should or shouldn't be a goal. Just that if it is a goal that meaningful effort will be directed toward in H1, it'd be worth including in the writeup. b) More specificity in the roadmap and priorities, to the extent they are known - As a potential D adopter, it'd be useful to have better insight into where the language might be a year or two out. For example, what forms of C++ integration might be available, or whether the major components of the standard library are likely to be available as @nogc. However, it's hard to discern this from the writeup. Perhaps in many cases it would be premature to establish such goals, but to the extent there has been concrete thought it'd be useful to write it up. This comment is similar to a number of others suggesting a preference for more concrete goals. --Jon
Difference between toLower() and asLowerCase() for strings?
I'm trying to identify the preferred ways to lower-case a string. In std.uni there are two functions that return the lower-case form of a string: toLower() and asLowerCase(). There is also toLowerInPlace(). I'm having trouble figuring out what the relationship is between these, and when to prefer one over the other. Both take strings; asLowerCase also takes a range. Otherwise, I couldn't find the differences in the documentation. The implementations are apparently different, but it's not clear what the real difference is. Are there reasons to prefer one over the other? --Jon
Re: Difference between toLower() and asLowerCase() for strings?
On Sunday, 24 January 2016 at 21:04:46 UTC, Adam D. Ruppe wrote: On Sunday, 24 January 2016 at 20:56:20 UTC, Jon D wrote: I'm trying to identify the preferred ways to lower case a string. In std.uni there are two functions that return the lower case form of a string: toLower() and asLowerCase(). There is also toLowerInPlace(). toLower will allocate a new string, leaving the original untouched. toLowerInPlace will modify the existing string. asLowerCase will return the modified data as you iterate over it, but will not actually allocate the new string. [snip...] As a general rule, the asLowerCase (etc.) version should be your first go since it is the most efficient. But the others are around for convenience in cases where you need a new string built anyway. Great explanation, thank you!
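A small sketch contrasting the three behaviors described above (the inputs are just illustrative):

```d
import std.algorithm : equal;
import std.uni : asLowerCase, toLower, toLowerInPlace;

void main()
{
    string s = "Hello World";

    // toLower: allocates and returns a new string; original untouched.
    assert(s.toLower == "hello world");
    assert(s == "Hello World");

    // asLowerCase: a lazy range; no new string is allocated unless you
    // ask for one (e.g. via std.conv.to!string).
    assert(s.asLowerCase.equal("hello world"));

    // toLowerInPlace: mutates its argument, so it needs a mutable buffer.
    char[] buf = "Hello World".dup;
    toLowerInPlace(buf);
    assert(buf == "hello world");
}
```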
Re: Speed of csvReader
On Thursday, 21 January 2016 at 22:20:28 UTC, H. S. Teoh wrote: On Thu, Jan 21, 2016 at 10:09:24PM +, Jon D via Digitalmars-d-learn wrote: [...] FWIW - I've been implementing a few programs manipulating delimited files, e.g. tab-delimited. Simpler than CSV files because there is no escaping inside the data. I've been trying to do this in relatively straightforward ways, e.g. using byLine rather than byChunk. (Goal is to explore the power of D standard libraries). I've gotten significant speed-ups in a couple different ways: * DMD libraries 2.068+ - byLine is dramatically faster * LDC 0.17 (alpha) - Based on DMD 2.068, and faster than the DMD compiler While byLine has improved a lot, it's still not the fastest thing in the world, because it still performs (at least) one OS roundtrip per line, not to mention it will auto-reencode to UTF-8. If your data is already in a known encoding, reading in the entire file and casting to (|w|d)string then splitting it by line will be a lot faster, since you can eliminate a lot of I/O roundtrips that way. No disagreement, but I had other goals. At a high level, I'm trying to learn and evaluate D, which partly involves understanding the strengths and weaknesses of the standard library. From this perspective, byLine was a logical starting point. More specifically, the tools I'm writing are often used in unix pipelines, so input can be a mixture of standard input and files. And, the files can be arbitrarily large. In these cases, reading the entire file is not always appropriate. Buffering usually is, and my code knows when it is dealing with files vs standard input and could handle these differently. However, standard library code could handle these distinctions as well, which was part of the reason for trying the straightforward approach. Aside - Despite the 'learning D' motivation, the tools are real tools, and writing them in D has been a clear win, especially with the byLine performance improvements in 2.068.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 09:39:30 UTC, data pulverizer wrote: I have been reading large text files with D's csv file reader and have found it slow compared to R's read.table function which is not known to be particularly fast. FWIW - I've been implementing a few programs manipulating delimited files, e.g. tab-delimited. Simpler than CSV files because there is no escaping inside the data. I've been trying to do this in relatively straightforward ways, e.g. using byLine rather than byChunk. (Goal is to explore the power of D standard libraries). I've gotten significant speed-ups in a couple different ways: * DMD libraries 2.068+ - byLine is dramatically faster * LDC 0.17 (alpha) - Based on DMD 2.068, and faster than the DMD compiler * Avoid utf-8 to dchar conversion - This conversion often occurs silently when working with ranges, but is generally not needed when manipulating data. * Avoid unnecessary string copies. e.g. Don't gratuitously convert char[] to string. At this point performance of the utilities I've been writing is quite good. They don't have direct equivalents with other tools (such as gnu core utils), so a head-to-head is not appropriate, but generally it seems the tools are quite competitive without needing to do my own buffer or memory management. And, they are dramatically faster than the same tools written in perl (which I was happy with). --Jon
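As a sketch of the byLine-based approach described above (a minimal illustration, not one of the actual tools; the field handling is left as a comment). Note that byLine yields a reused char[] buffer, which is where the "avoid unnecessary string copies" point comes in:

```d
import std.algorithm : splitter;
import std.stdio : stdin, writeln;

void main()
{
    size_t lineCount;
    foreach (line; stdin.byLine)            // char[] buffer, reused per line
    {
        foreach (field; line.splitter('\t')) // lazy split, no allocation
        {
            // Work on each char[] slice here; call .idup only if a field
            // must outlive the current line.
        }
        ++lineCount;
    }
    writeln(lineCount, " lines");
}
```

Working directly on the char[] slices (rather than converting to string or iterating by dchar) is what avoids both the copies and the silent utf-8 to dchar decoding mentioned above.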
function argument accepting function or delegate?
My underlying question is how to compose functions taking functions as arguments, while allowing the caller the flexibility to pass either a function or delegate. Simply declaring an argument as either a function or delegate seems to prohibit the other. Overloading works. Are there better ways? An example:

auto callIntFn(int function(int) f, int x) { return f(x); }
auto callIntDel(int delegate(int) f, int x) { return f(x); }
auto callIntFnOrDel(int delegate(int) f, int x) { return f(x); }
auto callIntFnOrDel(int function(int) f, int x) { return f(x); }

void main(string[] args)
{
    alias AddN = int delegate(int);
    AddN makeAddN(int n) { return x => x + n; }

    auto addTwo = makeAddN(2);               // Delegate
    int function(int) addThree = x => x + 3; // Function

    // assert(callIntFn(addTwo, 4) == 6);    // Compile error
    // assert(callIntDel(addThree, 4) == 7); // Compile error
    assert(callIntDel(addTwo, 4) == 6);
    assert(callIntFn(addThree, 4) == 7);
    assert(callIntFnOrDel(addTwo, 4) == 6);
    assert(callIntFnOrDel(addThree, 4) == 7);
}

--Jon
Re: function argument accepting function or delegate?
On Sunday, 17 January 2016 at 06:49:23 UTC, rsw0x wrote: On Sunday, 17 January 2016 at 06:27:41 UTC, Jon D wrote: My underlying question is how to compose functions taking functions as arguments, while allowing the caller the flexibility to pass either a function or delegate. [...] Templates are an easy way.
---
auto call(F, Args...)(F fun, auto ref Args args)
{
    return fun(args);
}
---
Would probably look nicer with some constraints from std.traits. Thanks much, that works!
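Building on the template suggestion above, a version with a std.traits constraint might look like this (a sketch; 'call' is just the name from the reply, and the constraint accepts function pointers, delegates, and anything else callable):

```d
import std.traits : isCallable;

auto call(F, Args...)(F fun, Args args)
if (isCallable!F)
{
    return fun(args);
}

void main()
{
    int function(int) f = x => x + 3;  // function pointer
    int delegate(int) d = x => x + 2;  // delegate

    // One template handles both; F is deduced per call site.
    assert(call(f, 4) == 7);
    assert(call(d, 4) == 6);
}
```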
Re: Silicon Valley D Meetup December 17, 2015
On Friday, 18 December 2015 at 16:01:48 UTC, Andrei Alexandrescu wrote: On 12/17/2015 10:07 PM, Ali Cehreli wrote: On Thursday, 17 December 2015 at 17:41:30 UTC, Ali Çehreli wrote: On 12/12/2015 05:03 PM, Ali Çehreli wrote: Our guest speaker is Steven Schveighoffer. He will present "Mutability wildcards in D": How was it? -- Andrei From a newcomer's perspective (my 2nd meet-up) - Excellent. Steve's presentation improved my understanding of the language, and the opportunity for discussions with core members of the D community is fantastic. Thanks to Steve, Ali, and Truedat for putting this together. --Jon
Re: We need better documentation for functions with ranges and templates
On Monday, 14 December 2015 at 19:04:46 UTC, bachmeier wrote: Something has to be done with the documentation for Phobos functions that involve ranges and templates. Many useful ideas in this thread. One I don't recall seeing - a standard way to denote whether a routine is lazy or eager. I'm finding this to be a key piece of information. Many standard library routines document this in the description, but presence and presentation is not very consistent. It'd be nice to have this presented in a standard way for routines operating on ranges. --Jon
Re: Why should file names intended for executables be valid identifiers?
On Tuesday, 15 December 2015 at 03:31:18 UTC, Shriramana Sharma wrote: For instance, hyphens are often used as part of executable names on Linux, but if I do this: $ dmd usage-printer.d I get the following error: usage-printer.d: Error: module usage-printer has non-identifier characters in filename, use module declaration instead Try adding the line: module usage_printer; at the top of the file. This overrides the default module name (same as file name). --Jon
Re: Reason for 'static struct'
On Wednesday, 9 December 2015 at 21:23:03 UTC, Daniel Kozák wrote: V Wed, 09 Dec 2015 21:10:43 + Jon D via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote: There is a fair bit of range related code in the standard library structured like: auto MyRange(Range)(Range r) if (isInputRange!Range) { static struct Result { private Range source; // define empty, front, popFront, etc } return Result(r); } I'm curious about what declaring the Result struct as 'static' does, and if there are use cases where it would be better to exclude the static qualifier. --Jon It makes it a non-nested struct: https://dlang.org/spec/struct.html#nested Thanks. So, in the example above, would the advantage be that 'static' avoids saving the enclosing state, which is not needed?
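To illustrate the difference discussed above: a nested (non-static) struct carries a hidden pointer to the enclosing function's stack frame so it can reference locals directly, while a static struct does not and must store any state it needs as members. A minimal sketch (the names here are illustrative):

```d
import std.algorithm : equal;
import std.range : isInputRange;

auto countUpTo(int limit)
{
    // With 'static', Result has no hidden context pointer, so the state
    // it needs ('limit') must be a member, passed in at construction.
    static struct Result
    {
        int i;
        int limit;
        @property bool empty() { return i >= limit; }
        @property int front() { return i; }
        void popFront() { ++i; }
    }
    // Without 'static', Result would instead carry a hidden pointer to
    // countUpTo's frame (making the struct larger) just so its methods
    // could read the local 'limit' directly.
    return Result(0, limit);
}

void main()
{
    static assert(isInputRange!(typeof(countUpTo(1))));
    assert(countUpTo(3).equal([0, 1, 2]));
}
```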
Reason for 'static struct'
There is a fair bit of range related code in the standard library structured like:

auto MyRange(Range)(Range r)
if (isInputRange!Range)
{
    static struct Result
    {
        private Range source;
        // define empty, front, popFront, etc.
    }
    return Result(r);
}

I'm curious about what declaring the Result struct as 'static' does, and if there are use cases where it would be better to exclude the static qualifier. --Jon
Re: block file reads and lazy utf-8 decoding
On Thursday, 10 December 2015 at 00:36:27 UTC, Jon D wrote: Question I have is if there is a better way to do this. For example, a different way to construct the lazy 'decodeUTF8Range' rather than writing it out in this fashion. A further thought - The decodeUTF8Range function is basically constructing a lazy wrapper range around decodeFront, which is effectively combining a 'front' and 'popFront' operation. So perhaps a generic way to compose a wrapper for such functions.

auto decodeUTF8Range(Range)(Range charSource)
if (isInputRange!Range && is(Unqual!(ElementType!Range) == char))
{
    static struct Result
    {
        private Range source;
        private dchar next;
        bool empty = false;

        dchar front() @property { return next; }

        void popFront()
        {
            if (source.empty)
            {
                empty = true;
                next = dchar.init;
            }
            else
            {
                next = source.decodeFront;
            }
        }
    }
    auto r = Result(charSource);
    r.popFront;
    return r;
}
block file reads and lazy utf-8 decoding
I want to combine block reads with lazy conversion of utf-8 characters to dchars. Solution I came with is in the program below. This works fine. Has good performance, etc. Question I have is if there is a better way to do this. For example, a different way to construct the lazy 'decodeUTF8Range' rather than writing it out in this fashion. There is quite a bit of power in the library and I'm still learning it. I'm wondering if I overlooked a useful alternative. --Jon Program: --- import std.algorithm: each, joiner, map; import std.conv; import std.range; import std.stdio; import std.traits; import std.utf: decodeFront; auto decodeUTF8Range(Range)(Range charSource) if (isInputRange!Range && is(Unqual!(ElementType!Range) == char)) { static struct Result { private Range source; private dchar next; bool empty = false; dchar front() @property { return next; } void popFront() { if (source.empty) { empty = true; next = dchar.init; } else { next = source.decodeFront; } } } auto r = Result(charSource); r.popFront; return r; } void main(string[] args) { if (args.length != 2) { writeln("Provide one file name."); return; } ubyte[1024*1024] rawbuf; auto inputStream = args[1].File(); inputStream .byChunk(rawbuf)// Read in blocks .joiner // Join the blocks into a single input char range .map!(a => to!char(a)) // Cast ubyte to char for decodeFront. Any better ways? .decodeUTF8Range// utf8 to dchar conversion. .each; // Real work goes here. writeln("done"); }
Re: Wiki article: Starting as a Contributor
On Tuesday, 1 December 2015 at 18:58:37 UTC, Jack Stouffer wrote: On Monday, 3 August 2015 at 21:25:35 UTC, Andrei Alexandrescu wrote: I had to set up dmd and friends on a fresh Ubuntu box, so I thought I'd document the step-by-step process: http://wiki.dlang.org/Starting_as_a_Contributor Due to a realization that there were three places where contributing info was held on the wiki, I have merged the pages into this one as best as I could. This page now holds everything someone should need to get started. I suggest also having the description of the legal aspects of contributing identified in an easier to find location. There is a brief summary of copyright assignment in the Starting as a Contributor page (http://wiki.dlang.org/Starting_as_a_Contributor#Copyright_assignment), but it's not particularly easy to find. Similarly regarding licensing. I was able to find two statements in the FAQ page ("Is D open source", "Why does the standard library use the boost license? Why not public domain"), but it wasn't especially easy to find these. Could be I'm just looking in the wrong places for this info, but a clear link from the home page might be worthwhile. --Jon
Re: copy and array length vs capacity. (Doc suggestion?)
On Tuesday, 24 November 2015 at 01:00:40 UTC, Steven Schveighoffer wrote: On 11/23/15 7:29 PM, Ali Çehreli wrote: On 11/23/2015 04:03 PM, Steven Schveighoffer wrote: > On 11/23/15 4:29 PM, Jon D wrote: >> In the example I gave, what I was really wondering was if there is a >> difference between allocating with 'new' or with 'reserve', or with >> 'length', for that matter. That is, is there a material difference >> between: >> >> auto x = new int[](n); >> int[] y; y.length = n; > > There is no difference at all, other than the function that is called > (the former will call an allocation function, the latter will call a > length setting function, which then will determine if more data is > needed, and finding it is, call the allocation function). Although Jon's example above does not compare reserve, I have to ask: How about non-trivial types? Both cases above would set all elements to ..init, right? So, I think reserve would be faster if copy() knew how to take advantage of capacity. It could emplace elements instead of copying, no? I think the cost of looking up the array metadata is more than the initialization of elements to .init. However, using an Appender would likely fix all these problems. You could also use https://dlang.org/phobos/std_array.html#uninitializedArray to create the array before copying. There are quite a few options, actually :) A delegate is also surprisingly considered an output range! Because why not? So you can do this too as a crude substitute for appender (or for testing performance):

import std.range; // for iota
import std.algorithm;

void main()
{
    int[] arr;
    arr.reserve(100);
    iota(100).copy((int a) { arr ~= a; });
}

-Steve Thanks. I was also wondering if that initial allocation could be avoided. Code I was writing involved repeatedly using a buffer in a loop. I was trying out taskPool.amap, which needs a random access range. This meant copying from the input range being read.
Something like:

auto input = anInfiniteRange();
auto bufsize = workPerThread * taskPool.size();
auto workbuf = new int[](bufsize);
auto results = new int[](bufsize);
while (true)
{
    input.take(bufsize).copy(workbuf);
    input.popFrontN(bufsize);
    taskPool.amap!expensiveCalc(workbuf, workPerThread, results);
    results.doSomething();
}

I'm just writing a toy example, but it is where these questions came from. For this example, the next step would be to allow the buffer size to change while iterating. --Jon
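An alternative for the filling side is std.array.Appender, mentioned earlier in the thread: reserve once, fill through the output-range interface, then clear() to keep the memory for the next pass. A minimal sketch (the iota source stands in for the real input range):

```d
import std.algorithm : copy;
import std.array : appender;
import std.range : iota, take;

void main()
{
    auto buf = appender!(int[])();
    buf.reserve(100);                  // one up-front allocation

    iota(1000).take(100).copy(buf);    // Appender is an output range
    assert(buf.data.length == 100);

    buf.clear();                       // keeps capacity, resets length
    iota(5).copy(buf);                 // reuse the same memory next pass
    assert(buf.data == [0, 1, 2, 3, 4]);
}
```

Note that reserve must come before the copy here: Appender allocates its shared store on first use, and copies of the struct made inside copy() then append into that same store.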
Re: copy and array length vs capacity. (Doc suggestion?)
On Monday, 23 November 2015 at 15:19:08 UTC, Steven Schveighoffer wrote: On 11/21/15 10:19 PM, Jon D wrote: On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis wrote: Honestly, arrays suck as output ranges. They don't get appended to; they get filled, and for better or worse, the documentation for copy is probably assuming that you know that. If you want your array to be appended to when using it as an output range, then you need to use std.array.Appender. Hi Jonathan, thanks for the reply and the info about std.array.Appender. I was actually using copy to fill an array, not append. However, I also wanted to preallocate the space. And, since I'm mainly trying to understand the language, I was also trying to figure out the difference between these two forms of creating a dynamic array with an initial size: auto x = new int[](n); int[] y; y.reserve(n); If you want to change the size of the array, use length: y.length = n; This will extend y to the correct length, automatically reserving a block of data that can hold it, and allow you to write to the array. All reserve does is to make sure there is enough space so you can append that much data to it. It is not relevant to your use case. The obvious difference is that the first initializes n values, the second form does not. I'm still unclear if there are other material differences, or when one might be preferred over the other :) It was in this context that the behavior of copy surprised me, that it wouldn't operate on the second form without first filling in the elements. If this seems unclear, I can provide a slightly longer sample showing what I was doing. Setting length affects the given array, extending it if necessary. reserve is ONLY relevant if you are using appending (arr ~= x). It doesn't actually affect the "slice" or the variable you are using, at all (except to possibly point it at newly allocated space). copy uses an "output range" as its destination.
The output range supports taking elements and putting them somewhere. In the case of a simple array, putting them somewhere means assigning to the first element, and then moving to the next one. -Steve Thanks for the reply. And for your article (which Jonathan recommended). It clarified a number of things. In the example I gave, what I was really wondering was if there is a difference between allocating with 'new' or with 'reserve', or with 'length', for that matter. That is, is there a material difference between: auto x = new int[](n); int[] y; y.length = n; I can imagine that the first might be faster, but otherwise there appears no difference. As the article stresses, the question is the ownership model. If I'm understanding, both cause an allocation into the runtime managed heap. --Jon
Re: copy and array length vs capacity. (Doc suggestion?)
On Sunday, 22 November 2015 at 00:10:07 UTC, Ali Çehreli wrote: May I suggest that you improve that page. ;) If you don't already have a clone of the repo, you can do it easily by clicking the "Improve this page" button on that page. Hi Ali, thanks for the quick response. And point taken :) I hadn't noticed those buttons on the doc pages, looks very convenient. There are a couple formalities I need to look into before making contributions, even small ones, but I'll check into these. Regarding why copy() cannot use the capacity of the slice, it is because slices don't know about each other, so, copy could not let other slices know that the capacity has just been used by this particular slice. Thanks for the explanation, very helpful for understanding what's going on. --Jon
copy and array length vs capacity. (Doc suggestion?)
Something I found confusing was the relationship between array capacity and copy(). A short example:

void main()
{
    import std.algorithm: copy;

    auto a = new int[](3);
    assert(a.length == 3);
    [1, 2, 3].copy(a);   // Okay

    int[] b;
    b.reserve(3);
    assert(b.capacity >= 3);
    assert(b.length == 0);
    [1, 2, 3].copy(b);   // Error
}

I had expected that copy() would work if the target had sufficient capacity, but that's not the case. The target has to have sufficient length. If I've understood this correctly, a small change to the documentation for copy() might make this clearer. In particular, the "precondition" section: Preconditions: target shall have enough room to accomodate the entirety of source. Clarifying that "enough room" means 'length' rather than 'capacity' might be beneficial.
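For reference, the working variants suggested later in this thread: copy into an array that already has length, or append through an output range such as std.array.Appender when only capacity has been reserved. A minimal sketch:

```d
import std.algorithm : copy;
import std.array : appender;

void main()
{
    // copy needs length, not capacity: elements are assigned in place.
    auto a = new int[](3);
    [1, 2, 3].copy(a);
    assert(a == [1, 2, 3]);

    // With only reserved capacity, append through an output range instead.
    auto app = appender!(int[])();
    app.reserve(3);
    [1, 2, 3].copy(app);   // Appender appends; no prior length needed
    assert(app.data == [1, 2, 3]);
}
```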
Re: copy and array length vs capacity. (Doc suggestion?)
On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis wrote: Honestly, arrays suck as output ranges. They don't get appended to; they get filled, and for better or worse, the documentation for copy is probably assuming that you know that. If you want your array to be appended to when using it as an output range, then you need to use std.array.Appender. Hi Jonathan, thanks for the reply and the info about std.array.Appender. I was actually using copy to fill an array, not append. However, I also wanted to preallocate the space. And, since I'm mainly trying to understand the language, I was also trying to figure out the difference between these two forms of creating a dynamic array with an initial size: auto x = new int[](n); int[] y; y.reserve(n); The obvious difference is that the first initializes n values, the second form does not. I'm still unclear if there are other material differences, or when one might be preferred over the other :) It was in this context that the behavior of copy surprised me, that it wouldn't operate on the second form without first filling in the elements. If this seems unclear, I can provide a slightly longer sample showing what I was doing. --Jon
compatible types for chains of different lengths
I'd like to chain several ranges and operate on them. However, if the chains are different lengths, the data type is different. This makes it hard to use in a general way. There is likely an alternate way to do this that I'm missing. A short example:

$ cat chain.d
import std.stdio;
import std.range;
import std.algorithm;

void main(string[] args)
{
    auto x1 = ["abc", "def", "ghi"];
    auto x2 = ["jkl", "mno", "pqr"];
    auto x3 = ["stu", "vwx", "yz"];
    auto chain1 = (args.length > 1) ? chain(x1, x2) : chain(x1);
    auto chain2 = (args.length > 1) ? chain(x1, x2, x3) : chain(x1, x2);
    chain1.joiner(", ").writeln;
    chain2.joiner(", ").writeln;
}

$ dmd chain.d
chain.d(10): Error: incompatible types for ((chain(x1, x2)) : (chain(x1))): 'Result' and 'string[]'
chain.d(11): Error: incompatible types for ((chain(x1, x2, x3)) : (chain(x1, x2))): 'Result' and 'Result'

Is there a different way to do this? --Jon
Re: compatible types for chains of different lengths
On Tuesday, 17 November 2015 at 23:22:58 UTC, Brad Anderson wrote: One solution: [snip] Thanks for the quick response. Extending your example, here's another style that works and may be nicer in some cases.

import std.stdio;
import std.range;
import std.algorithm;

void main(string[] args)
{
    auto x1 = ["abc", "def", "ghi"];
    auto x2 = ["jkl", "mno", "pqr"];
    auto x3 = ["stu", "vwx", "yz"];
    auto y1 = (args.length > 1) ? x1 : [];
    auto y2 = (args.length > 2) ? x2 : [];
    auto y3 = (args.length > 3) ? x3 : [];
    chain(y1, y2, y3).joiner(", ").writeln;
}
Preferred behavior of take() with ranges (value vs reference range)
Just started looking at D, very promising! One of the first programs I constructed involved infinite sequences. A design question that showed up is whether to construct the range as a struct/value, or class/reference. It appears that structs/values are more the norm, but there are exceptions, notably refRange. I'm wondering if there are any community best practices or guidelines in this area. One key difference is the behavior of take(). If the range is a value/struct, take() does not consume elements. If it's a ref/class, it does consume elements. From a consistency perspective, it'd seem useful if the behavior was consistent as much as possible. Here's an example of the behavior differences below. It uses refRange, but the same behavior occurs if the range is created as a class rather than a struct.

import std.range;
import std.algorithm;

void main()
{
    auto fib1 = recurrence!((a,n) => a[n-1] + a[n-2])(1, 1);
    auto fib2 = recurrence!((a,n) => a[n-1] + a[n-2])(1, 1);
    auto fib3 = refRange(&fib2);

    // Struct/value based range - take() does not consume elements
    assert(fib1.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    assert(fib1.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    fib1.popFrontN(7);
    assert(fib1.take(7).equal([21, 34, 55, 89, 144, 233, 377]));

    // Reference range (fib3) - take() consumes elements
    assert(fib2.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    assert(fib3.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    assert(fib3.take(7).equal([21, 34, 55, 89, 144, 233, 377]));
    assert(fib2.take(7).equal([610, 987, 1597, 2584, 4181, 6765, 10946]));
    assert(fib2.take(7).equal([610, 987, 1597, 2584, 4181, 6765, 10946]));
}

--Jon
Re: Preferred behavior of take() with ranges (value vs reference range)
On Monday, 9 November 2015 at 02:44:48 UTC, TheFlyingFiddle wrote: On Monday, 9 November 2015 at 02:14:58 UTC, Jon D wrote: Here's an example of the behavior differences below. It uses refRange, but same behavior occurs if the range is created as a class rather than a struct. --Jon This is an artifact of struct based ranges being value types. When you use take, the range gets copied into another structure that is also a range but limits the number of elements you take from that range. ... If you want a more indepth explanation there were two talks at Dconf this year that (in part) discussed this topic. (https://www.youtube.com/watch?v=A8Btr8TPJ8c, https://www.youtube.com/watch?v=QdMdH7WX2ew&list=PLEDeq48KhndP-mlE-0Bfb_qPIMA4RrrKo&index=14) Thanks for the quick reply. The two videos were very helpful. I understood what was happening underneath (mostly), but the videos made it clear there are a number of open questions regarding reference and value ranges and how best to use them.