Re: to compose or hack?
On Wednesday, 7 July 2021 at 01:44:20 UTC, Steven Schveighoffer wrote: This is pretty minimal, but does what I want it to do. Is it ready for inclusion in Phobos? Not by a longshot! A truly generic interleave would properly forward everything else that the range supports (like `length`, `save`, etc). But it got me thinking, how often do people roll their own vs. trying to compose using existing Phobos nuggets? I found this pretty satisfying, even if I didn't test it to death and maybe I use it only in one place. Do you find it difficult to use Phobos in a lot of situations to compose your specialized ranges?

I try to compose using existing Phobos facilities, but don't hesitate to write my own ranges. The reasons are usually along the lines you describe. For one, range creation is easy in D, consistent with the pro/con tradeoffs described in the thread/talk [Iterator and Ranges: Comparing C++ to D to Rust](https://forum.dlang.org/thread/diexjstekiyzgxlic...@forum.dlang.org). Another is that if application/task specific logic is involved, it is often simpler/faster to just incorporate it into the range rather than figure out how to factor it out of the more general range. Especially if the range is not going to be used much.

--Jon
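For reference, a hand-rolled range of the kind being discussed stays small. This is a hypothetical sketch (my illustration, not Steven's code) of an interleave that inserts a separator between elements; a Phobos-quality version would add the `length`/`save` forwarding mentioned above:

```d
import std.range.primitives;

// Lazily yields r's elements with 'sep' between them. Minimal on
// purpose: only the input range primitives are implemented.
auto interleave(R, E)(R r, E sep)
if (isInputRange!R && is(E : ElementType!R))
{
    static struct Interleave
    {
        R r;
        E sep;
        bool atSep;

        @property bool empty() { return !atSep && r.empty; }
        @property auto front() { return atSep ? sep : r.front; }
        void popFront()
        {
            if (atSep)
                atSep = false;
            else
            {
                r.popFront();
                if (!r.empty) atSep = true;
            }
        }
    }
    return Interleave(r, sep);
}

unittest
{
    import std.algorithm : equal;
    assert([1, 2, 3].interleave(0).equal([1, 0, 2, 0, 3]));
}
```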
Re: Need for speed
On Thursday, 1 April 2021 at 19:55:05 UTC, H. S. Teoh wrote: On Thu, Apr 01, 2021 at 07:25:53PM +, matheus via Digitalmars-d-learn wrote: [...] Since this is the "Learn" part of the forum, be careful with "-boundscheck=off". I mean, for this little snippet it's OK, but for other projects this may be wrong, and as it says here: https://dlang.org/dmd-windows.html#switch-boundscheck "This option should be used with caution and as a last resort to improve performance. Confirm turning off @safe bounds checks is worthwhile by benchmarking." [...] It's interesting that whenever a question about D's performance pops up in the forums, people tend to reach for optimization flags. I wouldn't say it doesn't help; but I've found that significant performance improvements can usually be obtained by examining the code first, and catching common newbie mistakes. Those usually account for the majority of the observed performance degradation. Only after the code has been cleaned up and obvious mistakes fixed is it worth reaching for optimization flags, IMO.

This is my experience as well, and not just for D. Pick good algorithms and pay attention to memory allocation. Don't go crazy on the latter. Many people try to avoid GC at all costs, but I don't usually find it necessary to go quite that far. Very often simply reusing already allocated memory does the trick. The blog post I wrote a few years ago focuses on these ideas: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

--Jon
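As an illustration of the "reuse already allocated memory" point, here is a hedged sketch (a made-up example, not code from the blog post): one buffer serves the whole loop, with its length reset each iteration so appending reuses the existing allocation.

```d
import std.stdio;

void main()
{
    char[] buf;
    foreach (line; stdin.byLine)    // byLine also reuses its own buffer
    {
        buf.length = 0;
        buf.assumeSafeAppend();     // permit appending in place
        foreach (c; line)
            if (c != ' ')
                buf ~= c;           // example transform: strip spaces
        writeln(buf);
    }
}
```

After warm-up the loop allocates only when a line needs more space than any earlier line did.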
Re: Trying to reduce memory usage
On Tuesday, 23 February 2021 at 00:08:40 UTC, tsbockman wrote: On Friday, 19 February 2021 at 00:13:19 UTC, Jon Degenhardt wrote: It would be interesting to see how the performance compares to tsv-uniq (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The prebuilt binaries turn on all the optimizations (https://github.com/eBay/tsv-utils/releases). My program (called line-dedup below) is modestly faster than yours, with the gap gradually widening as files get bigger. Similarly, when not using a memory-mapped scratch file, my program is modestly less memory hungry than yours, with the gap gradually widening as files get bigger. In neither case is the difference very exciting though; the real benefit of my algorithm is that it can process files too large for physical memory. It might also handle frequent hash collisions better, and could be upgraded to handle huge numbers of very short lines efficiently.

Thanks for running the comparison! I appreciate seeing how other implementations compare. I'd characterize the results a bit differently though. Based on the numbers, line-dedup is materially faster than tsv-uniq, at least on the tests run. To your point, it may not make much practical difference on data sets that fit in memory. tsv-uniq is fast enough for most needs. But it's still a material performance delta. Nice job!

I agree also that the bigger pragmatic benefit is fast processing of files much larger than will fit in memory. There are other useful problems like this. One I often need is creating a random weighted ordering. Easy to do for data sets that fit in memory, but hard to do fast for data sets that do not.

--Jon
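For the in-memory case, a random weighted ordering can be produced with the keys of Efraimidis and Spirakis: assign each item the key u^(1/w) for a uniform random u and weight w, then sort descending. A hedged sketch (my illustration, not tsv-utils code); the hard out-of-memory case alluded to above is not covered:

```d
import std.algorithm : sort;
import std.math;
import std.random : uniform01;
import std.range : zip;
import std.stdio;

void main()
{
    auto items = ["a", "b", "c", "d"];
    auto weights = [1.0, 10.0, 1.0, 5.0];

    // Key u^(1/w): higher-weight items tend to draw larger keys.
    auto keys = new double[](items.length);
    foreach (i, w; weights)
        keys[i] = uniform01() ^^ (1.0 / w);

    // Sorting the zip by key permutes 'items' into a weighted order.
    zip(keys, items).sort!((a, b) => a[0] > b[0]);
    writeln(items);
}
```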
Re: Trying to reduce memory usage
On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote: I spent some time experimenting with this problem, and here is the best solution I found, assuming that perfect de-duplication is required. (I'll put the code up on GitHub / dub if anyone wants to have a look.)

It would be interesting to see how the performance compares to tsv-uniq (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The prebuilt binaries turn on all the optimizations (https://github.com/eBay/tsv-utils/releases). tsv-uniq wasn't included in the different comparative benchmarks I published, but I did run my own benchmarks and it holds up well. However, it should not be hard to beat it. What might be more interesting is what the delta is. tsv-uniq is using the most straightforward approach of popping things into an associative array. No custom data structures. Enough memory is required to hold all the unique keys in memory, so it won't handle arbitrarily large data sets. It would be interesting to see how the straightforward approach compares with the more highly tuned approach.

--Jon
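The straightforward approach referred to looks roughly like this (a sketch for illustration, not the actual tsv-uniq source):

```d
import std.stdio;

// Print each distinct input line once. Memory use grows with the
// number of unique lines, which is the limitation noted above.
void main()
{
    bool[string] seen;
    foreach (line; stdin.byLine)
    {
        if (line !in seen)
        {
            seen[line.idup] = true;   // idup: byLine reuses its buffer
            writeln(line);
        }
    }
}
```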
Re: std.algorithm.splitter on a string not always bidirectional
On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote: On 1/22/21 11:57 AM, Jon Degenhardt wrote: I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"} then that relationship hasn't held, and the results are unexpected. But that is possible with all 3 splitter variants. Why is one allowed to be bidirectional and the others are not?

I'm not defending it, just explaining what I believe the thinking was based on the examination I did. It wasn't just looking at the code; there was a discussion somewhere. A forum discussion, PR discussion, bug or code comments. Something somewhere, but I don't remember exactly.

However, to answer your question: the relationship described is guaranteed if the basis for the split is a single element. If the range is a string, that's a single 'char'. If the range is composed of integers, then a single integer. Note that if the basis for the split is itself a range, then the relationship described is not guaranteed.

Personally, I can see a good argument that bidirectionality should not be supported in any of these cases, and instead force the user to choose between eager splitting or reversing the range via retro. For the common case of strings, the further argument could be made that the distinction between char and dchar is another point of inconsistency. Regardless of whether the choices made were the best choices, there was some thinking that went into it, and it is worth understanding the thinking when considering changes.

--Jon
Re: std.algorithm.splitter on a string not always bidirectional
On Friday, 22 January 2021 at 14:14:50 UTC, Steven Schveighoffer wrote: On 1/22/21 12:55 AM, Jon Degenhardt wrote: On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote: On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote:

auto sp1 = "a|b|c".splitter('|');
writeln(sp1.back); // ok
auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
writeln(sp2.back); // error, not bidirectional

Why? is it an oversight, or is there a good reason for it?

I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. It's clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case.

Note that the predicate might use a random number generator to pick the split points. Even for the same sequence of random numbers, the split points would be different if run from the front than if run from the back.

I think this isn't a good explanation. All forms of splitter accept a predicate (including the one which supports a bi-directional result). Many other Phobos algorithms that accept a predicate provide bidirectional support. The splitter result is also a forward range (which makes no sense in the context of random splits). Finally, I'd suggest that even if you split based on a subrange that is also bidirectional, it doesn't make sense that you couldn't split backwards based on that. Common sense says a range split on substrings is the same whether you split it forwards or backwards.
I can do this too (and in fact I will, because it works, even though it's horrifically ugly):

auto sp3 = "a.b|c".splitter!((c, unused) => !isAlphaNum(c))('?');
writeln(sp3.back); // ok

Looking at the code, it looks like the first form of splitter uses a different result struct than the other two (which have a common implementation). It just needs cleanup. -Steve

I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"} then that relationship hasn't held, and the results are unexpected. Note that in the above example, 'xyz.retro.splitter(args)' might produce {"f", "ed", "cba"}, so again not the same.

Another way to look at it: if split (eager) took a predicate, then 'xyz.splitter(args).back' and 'xyz.split(args).back' should produce the same result. But they will not with the example given. I believe these consistency issues are the reason why the bidirectional support is limited.

Note: I didn't design any of this, but I did redo the examples in the documentation at one point, which is why I looked at this.

--Jon
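A concrete case where forward and backward scans pick different split points is an overlapping subrange delimiter. A small illustration (my example, using the forward-scan semantics):

```d
import std.algorithm : splitter;
import std.stdio;

void main()
{
    // Forward scan finds "XX" at index 1, so the pieces are "a", "Xb".
    writeln("aXXXb".splitter("XX"));   // ["a", "Xb"]

    // A backward scan would find "XX" at index 2 instead, implying
    // ["aX", "b"], a different split of the same input. This is the
    // inconsistency described above.
}
```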
Re: std.algorithm.splitter on a string not always bidirectional
On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote: On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote:

auto sp1 = "a|b|c".splitter('|');
writeln(sp1.back); // ok
auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
writeln(sp2.back); // error, not bidirectional

Why? is it an oversight, or is there a good reason for it? -Steve

I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. It's clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case. --Jon

Note that the predicate might use a random number generator to pick the split points. Even for the same sequence of random numbers, the split points would be different if run from the front than if run from the back.
Re: std.algorithm.splitter on a string not always bidirectional
On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote:

auto sp1 = "a|b|c".splitter('|');
writeln(sp1.back); // ok
auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
writeln(sp2.back); // error, not bidirectional

Why? is it an oversight, or is there a good reason for it? -Steve

I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. It's clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case.

--Jon
Re: Why is BOM required to use unicode in tokens?
On Tuesday, 15 September 2020 at 14:59:03 UTC, Steven Schveighoffer wrote: On 9/15/20 10:18 AM, James Blachly wrote: What will it take (i.e. order of difficulty) to get this fixed -- will merely a bug report (and PR, not sure if I can tackle or not) do it, or will this require more in-depth discussion with compiler maintainers? I'm thinking your issue will not be fixed (just like we don't allow $abc to be an identifier). But the spec can be fixed to refer to the correct standards.

Looks like it has to do with the '∂' character. But non-ascii alphabetic characters work generally.

# The 'Ш' and 'ä' characters are fine.
$ echo $'import std.stdio; void Шä() { writeln("Hello World!"); } void main() { Шä(); }' | dmd -run -
Hello World!

# But not '∂'
$ echo $'import std.stdio; void x∂() { writeln("Hello World!"); } void main() { x∂(); }' | dmd -run -
__stdin.d(1): Error: char 0x2202 not allowed in identifier
__stdin.d(1): Error: character 0x2202 is not a valid token
__stdin.d(1): Error: char 0x2202 not allowed in identifier
__stdin.d(1): Error: character 0x2202 is not a valid token

However, 'Ш' and 'ä' satisfy the definition of a Unicode letter; '∂' does not (using D's current Unicode definitions). I'll use tsv-filter (from tsv-utils) to show this rather than writing out the full D code. (This uses std.regex.matchFirst underneath.)

# The input
$ echo $'x\n∂\nШ\nä'
x
∂
Ш
ä

# The input filtered by Unicode letter '\p{L}'
$ echo $'x\n∂\nШ\nä' | tsv-filter --regex 1:'^\p{L}$'
x
Ш
ä

The spec can be made more clear and correct. But if a "universal alpha" is essentially a Unicode letter, then using the symbol chosen ('∂') would mean a change in the spec.

--Jon
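The same check can also be written directly against std.uni, whose isAlpha tests for a Unicode alphabetic character (a sketch of mine, not from the thread):

```d
import std.uni : isAlpha;

void main()
{
    assert( isAlpha('Ш'));   // Cyrillic letter: alphabetic
    assert( isAlpha('ä'));   // Latin letter: alphabetic
    assert(!isAlpha('∂'));   // U+2202 is a math symbol, not alphabetic
}
```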
Re: Why is BOM required to use unicode in tokens?
On Tuesday, 15 September 2020 at 02:23:31 UTC, Paul Backus wrote: On Tuesday, 15 September 2020 at 01:49:13 UTC, James Blachly wrote: I wish to write a function including ∂x and ∂y (these are trivial to type with appropriate keyboard shortcuts - alt+d on Mac), but without a unicode byte order mark at the beginning of the file, the lexer rejects the tokens. It is not apparently easy to insert such marks (AFAICT no common tool does this specifically), while other languages work fine (i.e., accept unicode in their source) without it. Is there a downside to at least presuming UTF-8? According to the spec [1] this should Just Work. I'd recommend filing a bug. [1] https://dlang.org/spec/lex.html#source_text Under the identifiers section (https://dlang.org/spec/lex.html#identifiers) it describes identifiers as: Identifiers start with a letter, _, or universal alpha, and are followed by any number of letters, _, digits, or universal alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D of the C99 Standard. I was unable to find the definition of a "universal alpha", or whether that includes non-ascii alphabetic characters.
Re: Install multiple executables with DUB
On Friday, 4 September 2020 at 07:27:33 UTC, glis-glis wrote: On Thursday, 3 September 2020 at 14:34:48 UTC, Jacob Carlborg wrote: Oh, multiple binaries, I missed that. You can try to add multiple configurations [1]. Or if you have executables depending on only one source file, you can use single-file packages [2]. Thanks, but this still means I would have to write an install-script running `dub build --single` on each script, right? I looked at tsv-utils [1] which seems to be a similar use-case as mine, and they declare each tool as a subpackage. The main package runs a d-file called `dub_build.d` which compiles all subpackages. Feels like overkill to me, I'll probably just stick to a makefile. [1] https://github.com/eBay/tsv-utils/blob/master/docs/AboutTheCode.md#building-and-makefile

The `dub_build.d` is there so that people can use `$ dub fetch` to download and build the tools with `$ dub run`, from code.dlang.org. dub fetch/run is the typical dub sequence, but it's awkward, and it is geared toward users that have a D compiler plus dub already installed. For building your own binaries you might as well use `make`. However, if you decide to add your tools to the public dub package registry you might consider the technique.

My understanding is that the dub developers recognize that multiple binaries are inconvenient at present and have ideas on improvements. Having a few more concrete use cases might help nail down the requirements.

The tsv-utils directory layout may be worth a look. It's been pretty successful for multiple binaries in a single repo with some shared code. (Different folks made suggestions leading to this structure.) It works for both make and dub, and works well with other tools, like dpldocs (Adam Ruppe's doc generator). The tsv-utils `make` setup is quite messy at this point; you can probably do quite a bit better.

--Jon
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 15:02:14 UTC, FreeSlave wrote: On Tuesday, 25 August 2020 at 03:41:06 UTC, Jon Degenhardt wrote: What's the best way to get the element type of an array at compile time? Something like std.range.ElementType except that works on any array type. There is std.traits.ForeachType, but it wasn't clear if that was the right thing. --Jon Why not just use typeof(a[0])? It does not matter if the array is empty or not. typeof does not actually evaluate its expression, just the type.

Wow, yet another way that should have been obvious! Thanks!

--Jon
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 12:50:35 UTC, Steven Schveighoffer wrote: The situation is still confusing though. If only 'std.range.ElementType' is imported, a static array does not have a 'front' member, but ElementType still gets the correct type. (This is where the documentation says it'll return void.) You are maybe thinking of how C works? D imports are different, the code is defined the same no matter how it is imported. *your* module cannot see std.range.primitives.front, but the range module itself can see that UFCS function.

This is a good characteristic. But the reason it surprised me was that I expected to be able to manually expand the ElementType (or ElementEncodingType) template and see the results of the expressions it uses.

template ElementType(R)
{
    static if (is(typeof(R.init.front.init) T))
        alias ElementType = T;
    else
        alias ElementType = void;
}

So, yes, I was expecting this to behave like an inline code expansion. Yesterday I was doing that for 'hasSlicing', which has a more complicated set of tests. I wanted to see exactly which expression in 'hasSlicing' was causing it to return false for a struct I wrote. (Turned out to be a test for 'length'.) I'll have to be more careful about this.
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 05:02:46 UTC, Basile B. wrote: On Tuesday, 25 August 2020 at 03:41:06 UTC, Jon Degenhardt wrote: What's the best way to get the element type of an array at compile time? Something like std.range.ElementType except that works on any array type. There is std.traits.ForeachType, but it wasn't clear if that was the right thing. --Jon I'm curious to know what are the array types that were not accepted by ElementType (or ElementEncodingType)?

Interesting. I need to test static arrays. In fact 'ElementType' does work with static arrays. Which is likely what you expected. I assumed ElementType would not work, because static arrays don't satisfy 'isInputRange', and the documentation for ElementType says: The element type is determined as the type yielded by r.front for an object r of type R. [...] If R doesn't have front, ElementType!R is void.

But, if std.range is imported, a static array does indeed get a 'front' member. It doesn't satisfy isInputRange, but it does have a 'front' element. The situation is still confusing though. If only 'std.range.ElementType' is imported, a static array does not have a 'front' member, but ElementType still gets the correct type. (This is where the documentation says it'll return void.)
--- Import std.range ---

@safe unittest
{
    import std.range;

    ubyte[10] staticArray;
    ubyte[] dynamicArray = new ubyte[](10);

    static assert(is(ElementType!(typeof(staticArray)) == ubyte));
    static assert(is(ElementType!(typeof(dynamicArray)) == ubyte));

    // front is available
    static assert(__traits(compiles, staticArray.front));
    static assert(__traits(compiles, dynamicArray.front));
    static assert(is(typeof(staticArray.front) == ubyte));
    static assert(is(typeof(dynamicArray.front) == ubyte));
}

--- Import std.range.ElementType ---

@safe unittest
{
    import std.range : ElementType;

    ubyte[10] staticArray;
    ubyte[] dynamicArray = new ubyte[](10);

    static assert(is(ElementType!(typeof(staticArray)) == ubyte));
    static assert(is(ElementType!(typeof(dynamicArray)) == ubyte));

    // front is not available
    static assert(!__traits(compiles, staticArray.front));
    static assert(!__traits(compiles, dynamicArray.front));
    static assert(!is(typeof(staticArray.front) == ubyte));
    static assert(!is(typeof(dynamicArray.front) == ubyte));
}

This suggests the documentation for ElementType is not quite correct.
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 04:36:56 UTC, H. S. Teoh wrote: [...] Harry Gillanders, H.S. Teoh, Thank you both for the quick replies. Both methods address my needs. Very much appreciated, I was having trouble figuring this one out. --Jon
How to get the element type of an array?
What's the best way to get the element type of an array at compile time? Something like std.range.ElementType except that works on any array type. There is std.traits.ForeachType, but it wasn't clear if that was the right thing. --Jon
Re: getopt Basic usage
On Saturday, 15 August 2020 at 04:09:19 UTC, James Gray wrote: I am trying to use getopt and would not like the program to throw an unhandled exception when parsing command line options. Is the following, adapted from the first example in the getopt documentation, a reasonable approach? I use the approach you showed, except for writing errors to stderr and returning an exit status. This has worked fine. An example: https://github.com/eBay/tsv-utils/blob/master/number-lines/src/tsv_utils/number-lines.d#L48
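In sketch form, the pattern looks like the following (hypothetical option names; the linked number-lines source is the real example):

```d
import std.getopt;
import std.stdio;

int main(string[] args)
{
    bool verbose;
    try
    {
        auto r = getopt(args, "verbose|v", "Verbose output.", &verbose);
        if (r.helpWanted)
        {
            defaultGetoptPrinter("Usage: myapp [options]", r.options);
            return 0;
        }
    }
    catch (Exception e)
    {
        // Invalid options land here. Report to stderr and return a
        // non-zero exit status instead of letting the exception escape.
        stderr.writefln("[myapp] %s", e.msg);
        return 1;
    }
    // ... program proper ...
    return 0;
}
```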
Re: Reading from stdin significantly slower than reading file directly?
On Thursday, 13 August 2020 at 14:41:02 UTC, Steven Schveighoffer wrote: But for sure, reading from stdin doesn't do anything different than reading from a file if you are using the File struct. A more appropriate test might be using the shell to feed the file into the D program: dprogram < FILE Which means the same code runs for both tests.

Indeed, using the 'prog < file' approach rather than 'cat file | prog' removes any distinction for 'tsv-select'. 'tsv-select' uses File.rawRead rather than File.byLine.
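For reference, reading via File.rawRead runs the same code whether the File is stdin or was opened by name, which is why 'prog < file' removes the distinction. A minimal sketch (my illustration, not the tsv-select source):

```d
import std.stdio;

// Count input bytes using fixed-size rawRead chunks. Pass a file name
// to read it directly; pass nothing to read standard input.
void main(string[] args)
{
    auto f = (args.length > 1) ? File(args[1], "r") : stdin;
    ubyte[64 * 1024] buffer;
    ulong total = 0;
    while (true)
    {
        auto chunk = f.rawRead(buffer[]);
        if (chunk.length == 0) break;
        total += chunk.length;
    }
    writeln(total, " bytes");
}
```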
Re: Reading from stdin significantly slower than reading file directly?
On Wednesday, 12 August 2020 at 22:44:44 UTC, methonash wrote: Hi, Relative beginner to D-lang here, and I'm very confused by the apparent performance disparity I've noticed between programs that do the following: 1) cat some-large-file | D-program-reading-stdin-byLine() 2) D-program-directly-reading-file-byLine() using File() struct The D-lang difference I've noticed from options (1) and (2) is somewhere in the range of 80% wall time taken (7.5s vs 4.1s), which seems pretty extreme.

I don't know enough details of the implementation to really answer the question, and I expect it's a bit complicated. However, it's an interesting question, and I have relevant programs and data files, so I tried to get some actuals. The tests I ran don't directly answer the question posed, but may be a useful proxy.

I used Unix 'cut' (latest GNU version) and 'tsv-select' from the tsv-utils package (https://github.com/eBay/tsv-utils). 'tsv-select' is written in D, and works like 'cut'. 'tsv-select' reads from stdin or a file via a 'File' struct. It's not using the built-in 'byLine' member though; it uses a version of 'byLine' that includes some additional buffering. Both stdin and a file system file are read this way.

I used a file from the google ngram collection (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html) and the file TREE_GRM_ESTN.csv from https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html, converted to a tsv file. The ngram file is a narrow file (21 bytes/line, 4 columns), the TREE file is wider (206 bytes/line, 49 columns). In both cases I cut the 2nd and 3rd columns. This tends to focus processing on input rather than processing and output. I also timed 'wc -l' for another data point.

I ran the benchmarks 5 times each way and recorded the median time below. Machine used is a MacMini (so Mac OS) with 16 GB RAM and SSD drives. The numbers are very consistent for this test on this machine.
Differences in the reported times are real deltas, not system noise. The commands timed were:

* bash -c 'tsv-select -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | tsv-select -f 2,3 > /dev/null'
* bash -c 'gcut -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | gcut -f 2,3 > /dev/null'
* bash -c 'gwc -l FILE > /dev/null'
* bash -c 'cat FILE | gwc -l > /dev/null'

Note that 'gwc' and 'gcut' are the GNU versions of 'wc' and 'cut' installed by Homebrew.

Google ngram file (the 's' unigram file):

  Test                          Elapsed  System   User
  ----                          -------  ------   ----
  tsv-select -f 2,3 FILE          10.28    0.42   9.85
  cat FILE | tsv-select -f 2,3    11.10    1.45  10.23
  cut -f 2,3 FILE                 14.64    0.60  14.03
  cat FILE | cut -f 2,3           14.36    1.03  14.19
  wc -l FILE                       1.32    0.39   0.93
  cat FILE | wc -l                 1.18    0.96   1.04

The TREE file:

  Test                          Elapsed  System   User
  ----                          -------  ------   ----
  tsv-select -f 2,3 FILE           3.77    0.95   2.81
  cat FILE | tsv-select -f 2,3     4.54    2.65   3.28
  cut -f 2,3 FILE                 17.78    1.53  16.24
  cat FILE | cut -f 2,3           16.77    2.64  16.36
  wc -l FILE                       1.38    0.91   0.46
  cat FILE | wc -l                 2.02    2.63   0.77

What this shows is that 'tsv-select' (a D program) was faster when reading from a file than when reading from standard input. It doesn't indicate why, or whether the delta is due to code in the D library or code in 'tsv-select'. Interestingly, 'cut' showed the opposite behavior: it was faster when reading from standard input than when reading from the file. For 'wc', which method was faster was dependent on line length.

Again, I caution against reading too much into this regarding performance of reading from standard input vs a disk file. Much more definitive tests can be done. However, it is an interesting comparison. Also, the D program is still fast in both cases.

--Jon
Re: getopt: How does arraySep work?
On Thursday, 16 July 2020 at 17:40:25 UTC, Steven Schveighoffer wrote: On 7/16/20 1:13 PM, Andre Pany wrote: On Thursday, 16 July 2020 at 05:03:36 UTC, Jon Degenhardt wrote: On Wednesday, 15 July 2020 at 07:12:35 UTC, Andre Pany wrote: [...] An enhancement is likely to hit some corner-cases involving list termination requiring choices that are not fully generic. Any time a legal list value looks like a legal option. Perhaps the most important case is single digit numeric options like '-1', '-2'. These are legal short form options, and there are programs that use them. They are also somewhat common numeric values to include in command line inputs. [...]

My naive implementation would be that any dash would stop the list of multiple values. If you want to have a value containing a space or a dash, you enclose it with double quotes in the terminal.

Enclosing with double quotes in the terminal does nothing:

myapp --modelicalibs "file-a.mo" "file-b.mo"

will give you EXACTLY the same string[] args as:

myapp --modelicalibs file-a.mo file-b.mo

I think Jon's point is that it's difficult to distinguish where an array list ends if you get the parameters as separate items. Like:

myapp --numbers 1 2 3 -5 -6

Is that numbers => [1, 2, 3, -5, -6], or is it numbers => [1, 2, 3], 5 => true, 6 => true? This is probably why the code doesn't support that. -Steve

Yes, this is what I was getting at. Thanks for the clarification. Also, it's not always immediately obvious what part of the argument splitting is being done by the shell, and what is being done by the program/getopt. Taking inspiration from the recent one-liners, here's a way to see how the program gets the args from the shell for different command lines:

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers 1,2,3,-5,-6
["--numbers", "1,2,3,-5,-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers 1 2 3 -5 -6
["--numbers", "1", "2", "3", "-5", "-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers "1" "2" "3" "-5" "-6"
["--numbers", "1", "2", "3", "-5", "-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers '1 2 3 -5 -6'
["--numbers", "1 2 3 -5 -6"]

The first case is what getopt supports now: all the values in a single string with a separator that getopt splits on. The 2nd and 3rd are identical from the program's perspective (Steve's point), but they've already been split, so getopt would need a different approach, and that requires dealing with ambiguity. The fourth form eliminates the ambiguity, but puts the burden on the user to use quotes.
Re: getopt: How does arraySep work?
On Wednesday, 15 July 2020 at 07:12:35 UTC, Andre Pany wrote: On Tuesday, 14 July 2020 at 15:48:59 UTC, Andre Pany wrote: On Tuesday, 14 July 2020 at 14:33:47 UTC, Steven Schveighoffer wrote: On 7/14/20 10:22 AM, Steven Schveighoffer wrote: The documentation needs updating, it should say "parameters are added sequentially" or something like that, instead of "separation by whitespace". https://github.com/dlang/phobos/pull/7557 -Steve Thanks for the answer and the pr. Unfortunately my goal here is to simulate a partner tool written in C/C++ which supports this behavior. I will also create an enhancement issue for supporting this behavior. Kind regards Andre Enhancement issue: https://issues.dlang.org/show_bug.cgi?id=21045 Kind regards André

An enhancement is likely to hit some corner-cases involving list termination, requiring choices that are not fully generic: any time a legal list value looks like a legal option. Perhaps the most important case is single digit numeric options like '-1', '-2'. These are legal short form options, and there are programs that use them. They are also somewhat common numeric values to include in command line inputs.

I ran into a couple of cases like this with a getopt cover I wrote. The cover supports runtime processing of command arguments in the order entered on the command line, rather than the compile-time getopt() call order. Since it was only for my stuff, not Phobos, it was an easy choice: disallow single digit short options. But a Phobos enhancement might make other choices. IIRC, a characteristic of the current getopt implementation is that it does not have run-time knowledge of all the valid options, so the set of ambiguous entries is larger than just the limited set of options specified in the program; essentially, anything that looks syntactically like an option. Doesn't mean an enhancement can't be built, just that there might be some constraints to be aware of.

--Jon
Re: Looking for a Code Review of a Bioinformatics POC
On Friday, 12 June 2020 at 06:20:59 UTC, H. S. Teoh wrote: I glanced over the implementation of byLine. It appears to be the unhappy compromise of trying to be 100% correct, cover all possible UTF encodings, and all possible types of input streams (on-disk file vs. interactive console). It does UTF decoding and resizing of arrays, and a lot of other frilly little squirrelly things. In fact I'm dismayed at how hairy it is, considering the conceptual simplicity of the task! Given this, it will definitely be much faster to load in large chunks of the file at a time into a buffer, and scanning in-memory for linebreaks. I wouldn't bother with decoding at all; I'd just precompute the byte sequence of the linebreaks for whatever encoding the file is expected to be in, and just scan for that byte pattern and return slices to the data.

This is basically what bufferedByLine in tsv-utils does. See: https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/utils.d#L793. tsv-utils has the advantage of only needing to support utf-8 files with Unix newlines, so the code is simpler. (Windows newlines are detected; this occurs separately from bufferedByLine.) But as you describe, support for a wider variety of input cases could be done without sacrificing basic performance. iopipe provides much more generic support, and it is quite fast.

Having said all of that, though: usually in non-trivial programs reading input is the least of your worries, so this kind of micro-optimization is probably unwarranted except for very niche cases and for micro-benchmarks and other such toy programs where the cost of I/O constitutes a significant chunk of running times. But knowing what byLine does under the hood is definitely interesting information for me to keep in mind, the next time I write an input-heavy program.

tsv-utils tools saw performance gains of 10-40% by moving from File.byLine to bufferedByLine, depending on tool and type of file (narrow or wide).
Gains of 5-20% were obtained by switching from File.write to BufferedOutputRange, with some special cases improving by 50%. tsv-utils tools aren't micro-benchmarks, but they are not typical apps either. Most of the tools go into a tight loop of some kind, running a transformation on the input and writing to the output. Performance is a real benefit to these tools, as they get run on reasonably large data sets.
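To make the chunk-and-scan idea concrete, here is a minimal sketch (not the actual bufferedByLine code; names are my own). It does no reading itself — it just splits an in-memory buffer into line slices by scanning for a precomputed newline byte, with no UTF decoding and no copying:

```d
import std.algorithm.searching : countUntil;

// Split an in-memory chunk into line slices by scanning for a
// precomputed newline byte. No UTF decoding, no copying: the results
// are slices into the original buffer. A full implementation (like
// bufferedByLine) would also carry partial lines across chunk reads.
const(ubyte)[][] splitIntoLines(const(ubyte)[] chunk, ubyte newline = '\n')
{
    const(ubyte)[][] lines;
    while (chunk.length > 0)
    {
        auto idx = chunk.countUntil(newline);
        if (idx < 0)
        {
            lines ~= chunk;   // trailing partial line
            break;
        }
        lines ~= chunk[0 .. idx];
        chunk = chunk[idx + 1 .. $];
    }
    return lines;
}
```

The real work in a buffered reader is managing the read-ahead buffer; the scan itself is this simple.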
Re: Looking for a Code Review of a Bioinformatics POC
On Friday, 12 June 2020 at 00:58:34 UTC, duck_tape wrote: On Thursday, 11 June 2020 at 23:45:31 UTC, H. S. Teoh wrote: Hmm, looks like it's not so much input that's slow, but *output*. In fact, it looks pretty bad, taking almost as much time as overlap() does in total! [snip...] I'll play with that a bit tomorrow! I saw a nice implementation in eBay's tsv-utils that I may need to look closer at. Someone else suggested that stdout flushes per line by default. I dug around the stdlib but couldn't confirm that. I also played around with setvbuf but it didn't seem to change anything. Have you run into that before / know if stdout is flushing every newline? I'm not above opening '/dev/stdout' as a file if that writes faster.

I put some comparative benchmarks in https://github.com/jondegenhardt/dcat-perf. It compares input and output using standard Phobos facilities (File.byLine, File.write), iopipe (https://github.com/schveiguy/iopipe), and the tsv-utils buffered input and buffered output facilities. I haven't spent much time on results presentation; I know the results aren't that easy to read and interpret.

Brief summary - On files with short lines, buffering results in dramatic throughput improvements over the standard Phobos facilities. This is true for both input and output, though likely for different reasons. For input, iopipe is the fastest available. The tsv-utils buffered facilities are materially faster than Phobos for both input and output, but not as fast as iopipe for input. Combining iopipe for input with the tsv-utils BufferedOutputRange for output works pretty well. For files with long lines, both iopipe and the tsv-utils bufferedByLine are materially faster than Phobos File.byLine when reading. For writing there wasn't much difference from Phobos File.write.

A note on File.byLine - I've had many opportunities to compare Phobos File.byLine to facilities in other programming languages, and it is not bad at all. But it is beatable.
About memory-mapped files - The benchmarks don't include a comparison against std.mmfile. That would certainly make sense as a comparison point. --Jon
Re: Idiomatic way to write a range that tracks how much it consumes
On Monday, 27 April 2020 at 05:06:21 UTC, anon wrote: To implement your option A you could simply use std.range.enumerate. Would something like this work?

import std.algorithm.iteration : map;
import std.algorithm.searching : until;
import std.range : tee;

size_t bytesConsumed;
auto result = input.map!(a => a.yourTransformation)
    .until!(stringTerminator)
    .tee!(a => bytesConsumed++);
// bytesConsumed is automatically updated as result is consumed

That's interesting. It wouldn't work quite like that, but something similar would. Still, I don't think it quite achieves what I want. One thing that's missing is that the initial input is simply a string; there's nothing to map over at that point. There is however a transformation step that transforms the string into a sequence of slices, and then a transformation on those slices. That would be a step prior to the 'map' step. Also, in my case 'map' cannot be used, because each slice may produce multiple outputs. The specifics are minor details, not really so important. The implementation can take a form along the lines described. However, structuring it like this exposes the details of these steps to all callers; that is, all callers would have to write the code above. My goal is to encapsulate the steps into a single range all callers can use. That is, encapsulate something like the steps above in a standalone range that takes the input string as an argument, produces all the output elements, and preserves bytesConsumed in a way the caller can access it.
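Something along these lines is what I mean — a sketch only, with toUpper standing in for the real transformation and ';' for the real terminator:

```d
import std.ascii : toUpper;

// Sketch: a factory returning a single range that applies the
// transformation, stops at the terminator, and tracks bytes consumed.
// toUpper and ';' are placeholders for the real transformation/terminator.
auto trackedTransform(string input, char terminator = ';')
{
    static struct Result
    {
        string input;
        char terminator;
        size_t bytesConsumed;

        bool empty() const
        {
            return bytesConsumed >= input.length
                || input[bytesConsumed] == terminator;
        }
        char front() const { return cast(char) input[bytesConsumed].toUpper; }
        void popFront() { ++bytesConsumed; }
    }
    return Result(input, terminator);
}
```

Callers iterate the range as usual and then read bytesConsumed to find where the next sequence starts — the steps are hidden behind one factory function.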
Re: Idiomatic way to write a range that tracks how much it consumes
On Monday, 27 April 2020 at 04:51:54 UTC, Steven Schveighoffer wrote: On 4/26/20 11:38 PM, Jon Degenhardt wrote: Is there a better way to write this? I had exactly the same problems. I created this to solve the problem, I've barely tested it, but I plan to use it with all my parsing utilities on iopipe: https://code.dlang.org/packages/bufref https://github.com/schveiguy/bufref/blob/master/source/bufref.d Thanks Steve, I'll definitely take a look at this. --Jon
Re: Idiomatic way to write a range that tracks how much it consumes
On Monday, 27 April 2020 at 04:41:58 UTC, drug wrote: On 27.04.2020 at 06:38, Jon Degenhardt wrote: Is there a better way to write this? --Jon I don't know a better way; I think you listed all the possible ways - get the value using either `front` or a special range member. I prefer the second variant, and I don't think it is less consistent with range paradigms. Considering you need the number of consumed bytes only when the range is empty, the second way is more effective. Thanks. Of the two, I like the second better as well.
Idiomatic way to write a range that tracks how much it consumes
I have a string that contains a sequence of elements, then a terminator character, followed by a different sequence of elements (of a different type). I want to create an input range that traverses the initial sequence. This is easy enough. But after the initial sequence has been traversed, the caller will need to know where the next sequence starts. That is, the caller needs to know the index in the input string where the initial sequence ends and the next sequence begins. The values returned by the range are a transformation of the input, so the values by themselves are insufficient for the caller to determine how much of the string has been consumed. And, the caller cannot simply search for the terminator character. Tracking the number of bytes consumed is easy enough. I'd like to do it in a way that is consistent with D's normal range paradigms. Two candidate approaches: a) Instead of having the range return the individual values, it could return a tuple containing the value and the number of bytes consumed. b) Give the input range an extra member function which returns the number of bytes consumed. The caller could call this after 'empty()' returns true to find the amount of data consumed. Both will work, but I'm not especially satisfied with either. Approach (a) seems more consistent with the typical range paradigms, but is also more of a hassle for callers. Is there a better way to write this? --Jon
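For illustration, approach (a) might look like this — a sketch, with toUpper standing in for the real transformation and ';' for the terminator:

```d
import std.typecons : Tuple;

// Sketch of approach (a): each element pairs the transformed value with
// the running byte count. toUpper and ';' are placeholders for the real
// transformation and terminator.
auto elementsWithCount(string input, char terminator = ';')
{
    static struct Result
    {
        string input;
        char terminator;
        size_t consumed;

        bool empty() const
        {
            return consumed >= input.length || input[consumed] == terminator;
        }
        auto front() const
        {
            import std.ascii : toUpper;
            return Tuple!(char, "value", size_t, "bytesConsumed")(
                cast(char) input[consumed].toUpper, consumed + 1);
        }
        void popFront() { ++consumed; }
    }
    return Result(input, terminator);
}
```

The caller reads the bytesConsumed field of the last element to find the start of the next sequence — which is exactly the per-element overhead that makes approach (a) a hassle.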
Re: Integration tests
On Friday, 17 April 2020 at 16:56:57 UTC, Russel Winder wrote: Hi, Thinking of trying to do the next project in D rather than Rust, but… Rust has built in unit testing on a module basis. D has this so no problem. Rust allows for integration tests in the tests directory of a project. These are automatically built and run along with all unit tests as part of "cargo test". Does D have any integrated support for integration tests in the way Rust does? Automated testing is important; perhaps you can describe further what's needed? I haven't worked with Rust test frameworks, but I took a look at the description of the integration tests and unit tests. It wasn't immediately obvious what can be done with the Rust integration test framework that cannot be done with D's unittest framework. An important concept described was testing a module as an external caller. That would seem very doable using D's unittest framework. For example, one could create a set of tests against Phobos, put them in a separate location (e.g. a separate file), and arrange to have the unittests run as part of a CI process, along with the build. My look was very superficial; perhaps you could explain more.
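As a concrete illustration of the external-caller idea: such tests are just unittest blocks kept in their own file, exercising the tested modules only through their public API — here against Phobos. The file name is my own; compile and run with something like `dmd -unittest -main external_tests.d`:

```d
// external_tests.d - integration-style tests kept in a separate file,
// exercising the tested modules only through their public API.
import std.algorithm.sorting : sort;
import std.string : strip;

unittest
{
    assert("  hi  ".strip == "hi");
}

unittest
{
    auto a = [3, 1, 2];
    sort(a);
    assert(a == [1, 2, 3]);
}
```

A CI job would build this file with -unittest and run it alongside the normal build.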
Re: How to correctly import tsv-utilites functions?
On Tuesday, 14 April 2020 at 20:25:08 UTC, p.shkadzko wrote: On Tuesday, 14 April 2020 at 20:05:28 UTC, Steven Schveighoffer wrote: On 4/14/20 3:34 PM, p.shkadzko wrote: [...] What about using dependency tsv-utils:common ? Looks like tsv-utils is a collection of subpackages, and the main package just serves as a namespace. -Steve Yes, it works! Thank you. Glad that worked for you. (And thanks Steve!) I have a small app with an example of a dub.json file that pulls the tsv-utils common dependencies this way: https://github.com/jondegenhardt/dcat-perf/blob/master/dub.json --Jon
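For reference, the relevant portion of such a dub.json looks along these lines — the package name and version here are placeholders; see the linked file for the real values:

```json
{
    "name": "myapp",
    "dependencies": {
        "tsv-utils:common": "~>1.4.4"
    }
}
```

The "tsv-utils:common" form is how dub names a subpackage of a collection package.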
Re: Unexpected result with std.conv.to
On Friday, 15 November 2019 at 03:51:04 UTC, Joel wrote: I made a feature that converts, say, [9:59am] -> [10:00am] to 1 minute, but found '9'.to!int = 57 (not 9). Doesn't seem right... I'm guessing that's standard though, same with ldc. Use a string or char[] array, e.g. writeln("9".to!int) => 9. With a single 'char', what is produced is the ASCII value of the character.
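A short demonstration of the difference:

```d
import std.conv : to;

// '9' is a char: converting char -> int yields the code value (ASCII 57).
// "9" is a string: to!int parses it as a number.
unittest
{
    assert('9'.to!int == 57);
    assert("9".to!int == 9);
    assert('9' - '0' == 9);   // arithmetic way to get a single digit's value
}
```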
Re: csvReader & specifying separator problems...
On Thursday, 14 November 2019 at 12:25:30 UTC, Robert M. Münch wrote: Just trying a very simple thing and it's pretty hard: "Read a CSV file (raw_data) that has a ; separator so that I can iterate over the lines and access the fields." csv_data = raw_data.byLine.joiner("\n") From the docs, which I find extremely hard to understand: auto csvReader(Contents = string, Malformed ErrorLevel = Malformed.throwException, Range, Separator = char)(Range input, Separator delimiter = ',', Separator quote = '"') So, let's see if I can decipher this, step-by-step by trying out: csv_records = csv_data.csvReader(); Would split the CSV data into iterable CSV records using ',' char as separator using UFCS syntax. When running this I get: [...]

Side comment - This code looks like it was taken from the first example in the std.csv documentation. To me, the code in the std.csv example is doing something that might not be obvious at first glance and is potentially confusing. In particular, 'byLine' is not reading individual CSV records. CSV can have embedded newlines; these are identified by CSV escape syntax. 'byLine' doesn't know the escape syntax. If there are embedded newlines, 'byLine' will read partial records, which may not be obvious at first glance. The .joiner("\n") step puts the newline back, stitching fields and records back together again in the process. The effect is to create an input range of characters representing the entire file, using 'byLine' to do buffered reads.

This could also be done using 'byChunk' and 'joiner' (with no separator). This would use a fixed size buffer, with no searching for newlines while reading, so it should be faster. An example:

csv_by_chunk.d:

import std.algorithm;
import std.csv;
import std.conv;
import std.stdio;
import std.typecons;
import std.utf;

void main()
{
    // Small buffer used to show it works. Normally would use a larger buffer.
    ubyte[16] buffer;
    auto stdinBytes = stdin.byChunk(buffer).joiner;
    auto stdinDChars = stdinBytes.map!((ubyte b) => cast(char) b).byDchar;
    writefln("--");
    foreach (record; stdinDChars.csvReader!(Tuple!(string, string, string)))
    {
        writefln("Field 0: |%s|", record[0]);
        writefln("Field 1: |%s|", record[1]);
        writefln("Field 2: |%s|", record[2]);
        writefln("--");
    }
}

Pass it csv data without embedded newlines:

$ echo $'abc,def,ghi\njkl,mno,pqr' | ./csv_by_chunk
--
Field 0: |abc|
Field 1: |def|
Field 2: |ghi|
--
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--

Pass it csv data with embedded newlines:

$ echo $'abc,"LINE 1\nLINE 2",ghi\njkl,mno,pqr' | ./csv_by_chunk
--
Field 0: |abc|
Field 1: |LINE 1
LINE 2|
Field 2: |ghi|
--
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--

An example like this may avoid the confusion about newlines. Unfortunately, the need to do the odd looking conversion from ubyte to char/dchar is undesirable in a code example. I haven't found a cleaner way to write that. If there's a nicer way I'd appreciate hearing about it. --Jon
Re: formatting a float or double in a string with all significant digits kept
On Thursday, 10 October 2019 at 17:12:25 UTC, dan wrote: Thanks also berni44 for the information about the dig attribute, and Jon for the neat packaging into one line using the attribute on the type. Unfortunately, the version of gdc that comes with the version of debian that i am using does not have the dig attribute yet, but perhaps i can upgrade, and eventually i think gdc will have it. Glad these ideas helped. The value of the 'double.dig' property is not going to change between compilers/versions/etc. It's really a property of IEEE 754 floating point for 64-bit floats. (D specifies double as 64 bits.) So, if you are using double, it's pretty safe to use 15 until the compiler you're using catches up. Declare an enum or const variable to give it a name so you can track it down later. Also, don't get thrown off by the fact that PI is a real, not a double. D supports 80-bit floats as real, so constants like PI are defined as real. But if you convert PI to a double, it'll then have 15 significant digits of precision. --Jon
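For instance, naming the constant and narrowing PI to double first — a small sketch along the lines suggested above:

```d
import std.format : format;
import std.math : PI;

// double.dig is 15 for IEEE 754 64-bit doubles; give it a name so it
// can be tracked down later, as suggested above.
enum doubleSigDigits = double.dig;

string formatAllDigits(double x)
{
    return format("%.*g", doubleSigDigits, x);
}
```

Passing PI to the function narrows the real to a double, so `formatAllDigits(PI)` yields "3.14159265358979" — 15 significant digits.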
Re: formatting a float or double in a string with all significant digits kept
On Wednesday, 9 October 2019 at 05:46:12 UTC, berni44 wrote: On Tuesday, 8 October 2019 at 20:37:03 UTC, dan wrote: But i would like to be able to do this without knowing the expansion of pi, or writing too much code, especially if there's some d function like writeAllDigits or something similar. You can use the property .dig to get the number of significant digits of a number: writeln(PI.dig); // => 18 You still need to account for the numbers before the dot. If you're happy with scientific notation you can do: auto t = format("%.*e", PI.dig, PI); writeln("PI = ", t);

Using the '.dig' property is a really nice idea and looks very useful for this. A clarification though - it's the significant digits of the data type, not the value. (PI.dig is 18 because PI is a real, not a double.) So:

writeln(1.0f.dig, ", ", float.dig);  // => 6, 6
writeln(1.0.dig, ", ", double.dig);  // => 15, 15
writeln(1.0L.dig, ", ", real.dig);   // => 18, 18

Another possibility would be to combine the '.dig' property with the "%g" option, similar to the "%e" usage shown. For example, these lines:

writeln(format("%0.*g", PI.dig, PI));
writeln(format("%0.*g", double.dig, 1.0));
writeln(format("%0.*g", double.dig, 100.0));
writeln(format("%0.*g", double.dig, 1.0001));
writeln(format("%0.*g", double.dig, 0.00000001));

produce:

3.14159265358979324
1
100
1.0001
1e-08

Hopefully experimenting with the different formatting options available will yield one that works for your use case.
Re: Help me decide D or C
On Wednesday, 31 July 2019 at 18:38:02 UTC, Alexandre wrote: Should I go for C and then when I become a better programmer change to D? Should I start with D right now? In my view, the most important thing is the decision you've already made - to pick a programming language and learn it in a reasonable bit of depth. Which programming language you choose is less important. No matter which choice you make, you'll have the opportunity to learn skills that will transfer to other programming languages. As you can tell from the other responses, the pros and cons of learning a specific language depend quite a bit on what you hope to get out of it, and are to a fair extent subjective. But both C and D provide meaningful opportunities to gain worthwhile experience. A couple of reasons for considering D over C are its support for functional programming and templates. These were also mentioned by a few other people. They are not really "beginner" topics, but as one moves past the beginner stage they are really quite valuable techniques to start mastering. For both, D is the far better option, and it's not necessary to use either when starting out. --Jon
Re: rdmd takes 2-3 seconds on a first-run of a simple .d script
On Saturday, 25 May 2019 at 22:18:16 UTC, Andre Pany wrote: On Saturday, 25 May 2019 at 08:32:08 UTC, BoQsc wrote: I have a simple standard .d script and I'm getting annoyed that it takes 2-3 seconds to run and see the results via rdmd. Also please keep in mind there could be other factors like slow disks, anti virus scanners,... which causes a slow down. I have seen similar behavior that I attribute to virus scan software. After compiling a program, the first run takes several seconds to run, after that it runs immediately. I'm assuming the first run of an unknown binary triggers a scan, though I cannot be completely sure. Try compiling a new binary in D or C++ and see if a similar effect is seen. --Jon
Re: Poor regex performance?
On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote: On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote: If you need performance use ldc not dmd (assumed). LLVM has many factors better code optimizes than dmd does. Thanks! I already had dmd installed from a brief look at D a long time ago, so I missed the details at https://dlang.org/download.html ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE. Try: ldc2 -O3 -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -enable-inlining This will improve inlining and optimization across the runtime library boundaries. This can help in certain types of code.
Dub: A json/sdl equivalent to --combined command line option?
In Dub, is there a way to specify the equivalent of the --combined command line argument in the json/sdl package config file? What I'd like to be able to do is create a custom build type such that

$ dub build --build=build-xyz

builds in combined mode, without needing to add --combined on the command line. Putting it on the command line as follows did what I intended:

$ dub build --build=build-xyz --combined

--Jon
Re: Which Docker to use?
On Monday, 22 October 2018 at 18:44:01 UTC, Jacob Carlborg wrote: On 2018-10-21 20:45, Jon Degenhardt wrote: The issue that caused me to go to Ubuntu 16.04 had to do with uncaught exceptions when using LTO with the gold linker and LDC 1.5. Problem occurred with 14.04, but not 16.04. I should go back and retest on Ubuntu 14.04 with a more recent LDC, it may well have been corrected. The issue thread is here: https://github.com/ldc-developers/ldc/issues/2390. Ah, that might be the reason. I am not using LTO. You might want to try a newer version of LDC as well since 1.5 is quite old now. I switched to LDC 1.12.0. The problem remains with LTO and static builds on Ubuntu 14.04. Ubuntu 16.04 is required, at least with LTO of druntime/phobos. The good news on this front is that the regularly updated dlang2 docker images work fine with LTO on druntime/phobos (using the LTO build support available in LDC 1.9.0). Examples of travis-ci setups for both dlanguage and dlang2 docker images are available on the tsv-utils travis config: https://github.com/eBay/tsv-utils/blob/master/.travis.yml. Look for the DOCKERSPECIAL environment variables.
Re: d word counting approach performs well but has higher mem usage
On Saturday, 3 November 2018 at 14:26:02 UTC, dwdv wrote: Hi there, the task is simple: count word occurrences from stdin (around 150mb in this case) and print sorted results to stdout in a somewhat idiomatic fashion. Now, d is quite elegant while maintaining high performance compared to both c and c++, but I, as a complete beginner, can't identify where the 10x memory usage (~300mb, see results below) is coming from. Unicode overhead? Internal buffer? Is something slurping the whole file? Assoc array allocations? Couldn't find huge allocs with dmd -vgc and -profile=gc either. What did I do wrong?

Not exactly the same problem, but there is relevant discussion in the blog post I wrote a while ago: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/. See in particular the section on associative array lookup optimization. It takes advantage of the fact that it's only necessary to create the immutable string the first time a key is entered into the hash; subsequent occurrences do not need to take this step. As creating the string allocates new memory, even if only used temporarily, this is a meaningful savings. There have been additional APIs added to the AA interface since I wrote the blog post; I believe it is now possible to accomplish the same thing with more succinct code.

Other optimization possibilities:

* Avoid auto-decode: Not sure if your code is hitting this, but if so it's a significant performance hit. Unfortunately, it's not always obvious when this is happening. The task you are performing doesn't need auto-decode because it is splitting on single-byte utf-8 char boundaries (newline and space).

* LTO on druntime/phobos: This is easy and will have a material speedup. Simply add '-defaultlib=phobos2-ldc-lto,druntime-ldc-lto' to the 'ldc2' build line, after the '-flto=full' entry. This will be a win because it enables a number of optimizations in the internal loop.

* Reading the whole file vs line by line: 'byLine' is really fast. It's also nice and general, as it allows reading arbitrary size files or standard input without changes to the code. However, it's not as fast as reading the file in a single shot.

* std.algorithm.joiner: Has improved dramatically, but is still slower than a foreach loop. See: https://github.com/dlang/phobos/pull/6492

--Jon
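The associative array lookup optimization mentioned above, in sketch form:

```d
// The AA lookup optimization from the blog post: check for the key
// first; allocate the immutable key copy (idup) only the first time a
// word is seen. Repeat keys trigger no allocation.
size_t[string] counts;

void countWord(const(char)[] word)
{
    if (auto p = word in counts) ++(*p);   // no allocation on repeat keys
    else counts[word.idup] = 1;            // idup only on first sight
}
```

With byLine-style processing, 'word' is typically a slice of a reused buffer, so the idup on first sight is what makes the key safe to retain in the hash.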
Re: Which Docker to use?
On Sunday, 21 October 2018 at 18:11:37 UTC, Jacob Carlborg wrote: On 2018-10-18 01:15, Jon Degenhardt wrote: I need to use docker to build static linked Linux executables. My reason is specific, may be different than the OP's. I'm using Travis-CI to build executables. Travis-CI uses Ubuntu 14.04, but static linking fails on 14.04. The standard C library from Ubuntu 16.04 or later is needed. There may be other/better ways to do this, I don't know. That's interesting. I've built static binaries for DStep using LDC on Travis CI without any problems. My comment painted too broad a brush. I had forgotten how specific the issue I saw was. Apologies for the confusion. The issue that caused me to go to Ubuntu 16.04 had to do with uncaught exceptions when using LTO with the gold linker and LDC 1.5. Problem occurred with 14.04, but not 16.04. I should go back and retest on Ubuntu 14.04 with a more recent LDC, it may well have been corrected. The issue thread is here: https://github.com/ldc-developers/ldc/issues/2390.
Re: Which Docker to use?
On Friday, 19 October 2018 at 22:16:04 UTC, Ky-Anh Huynh wrote: On Wednesday, 17 October 2018 at 23:15:53 UTC, Jon Degenhardt wrote: I need to use docker to build static linked Linux executables. My reason is specific, may be different than the OP's. I'm using Travis-CI to build executables. Travis-CI uses Ubuntu 14.04, but static linking fails on 14.04. The standard C library from Ubuntu 16.04 or later is needed. There may be other/better ways to do this, I don't know. Yes I'm also using Travis-CI and that's why I need some Docker support. I'm using dlanguage/ldc. The reason for that choice was because it was what was available when I put the travis build together. As you mentioned, it hasn't been updated in a while. I'm still producing this build with an older ldc version, but when I move to a more current version I'll have to switch to a different docker image. My travis config is here: https://github.com/eBay/tsv-utils/blob/master/.travis.yml. Look for the sections referencing the DOCKERSPECIAL environment variable.
Re: Which Docker to use?
On Wednesday, 17 October 2018 at 08:08:44 UTC, Gary Willoughby wrote: On Wednesday, 17 October 2018 at 03:37:21 UTC, Ky-Anh Huynh wrote: Hi, I need to build some static binaries with LDC. I also need to execute builds on both platform 32-bit and 64-bit. From Docker Hub there are two image groups: * language/ldc (last update 5 months ago) * dlang2/ldc-ubuntu (updated recently) Which one do you suggest? Thanks a lot. To be honest, you don't need docker for this. You can just download LDC in a self-contained folder and use it as is. https://github.com/ldc-developers/ldc/releases That's what I do on Linux. I need to use docker to build static linked Linux executables. My reason is specific, may be different than the OP's. I'm using Travis-CI to build executables. Travis-CI uses Ubuntu 14.04, but static linking fails on 14.04. The standard C library from Ubuntu 16.04 or later is needed. There may be other/better ways to do this, I don't know.
Re: Error: variable 'xyz' has scoped destruction, cannot build closure
On Friday, 5 October 2018 at 16:34:32 UTC, Paul Backus wrote: On Friday, 5 October 2018 at 06:56:49 UTC, Nicholas Wilson wrote: On Friday, 5 October 2018 at 06:44:08 UTC, Nicholas Wilson wrote: Alas is does not because each does not accept additional argument other than the range. Shouldn't be hard to fix though. https://issues.dlang.org/show_bug.cgi?id=19287 You can thread multiple arguments through to `each` using `std.range.zip`: tenRandomNumbers .zip(repeat(output)) .each!(unpack!((n, output) => output.appendln(n.to!string))); Full code: https://run.dlang.io/is/Qe7uHt Very interesting, thanks. It's a clever way to avoid the delegate capture issue. (Aside: A nested function that accesses 'output' from the lexical context has the same issue as delegates with respect to capturing the variable.)
Re: Error: variable 'xyz' has scoped destruction, cannot build closure
On Friday, 5 October 2018 at 06:44:08 UTC, Nicholas Wilson wrote: On Friday, 5 October 2018 at 06:22:57 UTC, Nicholas Wilson wrote: tenRandomNumbers.each!((n,o) => o.appendln(n.to!string))(output); or tenRandomNumbers.each!((n, ref o) => o.appendln(n.to!string))(output); should hopefully do the trick (run.dlang.io seems to be down atm). Alas it does not, because each does not accept additional arguments other than the range. Shouldn't be hard to fix though. Yeah, that's what I was seeing also. Thanks for taking a look. Is there perhaps a way to limit the scope of the delegate to the local function? Something that would tell the compiler the delegate has a lifetime shorter than the struct. One specific issue this points out is that this is a place where the BufferedOutputRange I wrote cannot be used interchangeably with other output ranges. It's minor, but the intent was to be able to pass it anyplace an output range could be used.
Error: variable 'xyz' has scoped destruction, cannot build closure
I got the compilation error in the subject line when trying to create a range via std.range.generate. Turns out this was caused by trying to create a closure for 'generate' where the closure was accessing a struct containing a destructor. The fix was easy enough: write out the loop by hand rather than using 'generate' with a closure. What I'm wondering/asking is if there is an alternate way to do this that would enable the 'generate' approach. This is more curiosity/learning at this point.

Below is a stripped down version of what I was doing. I have a struct for output buffering. The destructor writes any data left in the buffer to the output stream. This gets passed to routines performing output. It was in this context that I created a generator that wrote to it.

example.d:

struct BufferedStdout
{
    import std.array : appender;
    private auto _outputBuffer = appender!(char[]);

    ~this()
    {
        import std.stdio : write;
        write(_outputBuffer.data);
        _outputBuffer.clear;
    }

    void appendln(T)(T stuff)
    {
        import std.range : put;
        put(_outputBuffer, stuff);
        put(_outputBuffer, "\n");
    }
}

void foo(BufferedStdout output)
{
    import std.algorithm : each;
    import std.conv : to;
    import std.range : generate, takeExactly;
    import std.random : Random, uniform, unpredictableSeed;

    auto randomGenerator = Random(unpredictableSeed);
    auto randomNumbers = generate!(() => uniform(0, 1000, randomGenerator));
    auto tenRandomNumbers = randomNumbers.takeExactly(10);
    tenRandomNumbers.each!(n => output.appendln(n.to!string));
}

void main(string[] args)
{
    foo(BufferedStdout());
}

Compiling the above results in:

$ dmd example.d
example.d(22): Error: variable `example.foo.output` has scoped destruction, cannot build closure

As mentioned, using a loop rather than 'generate' works fine, but help with alternatives that would use generate would be appreciated. The actual buffered output struct has more behind it than shown above, but not too much. For anyone interested it's here: https://github.com/eBay/tsv-utils/blob/master/common/src/tsvutil.d#L358
Re: tupleof function parameters?
On Tuesday, 28 August 2018 at 06:20:37 UTC, Sebastiaan Koppe wrote: On Tuesday, 28 August 2018 at 06:11:35 UTC, Jon Degenhardt wrote: The goal is to write the argument list once and use it to create both the function and the Tuple alias. That way I could create a large number of these function / arglist tuple pairs with less brittleness. --Jon I would probably use a combination of std.traits.Parameters and std.traits.ParameterIdentifierTuple. Parameters returns a tuple of types and ParameterIdentifierTuple returns a tuple of strings. Maybe you'll need to implement a staticZip to interleave both tuples to get the result you want. (although I remember seeing one somewhere). Alex, Sebastiaan - Thanks much, this looks like it should get me what I'm looking for. --Jon
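For the record, the kind of thing this suggestion leads to looks along these lines — a sketch only; 'Interleave' is a hand-rolled helper, since Phobos doesn't provide a staticZip:

```d
import std.meta : AliasSeq;
import std.traits : Parameters, ParameterIdentifierTuple;
import std.typecons : Tuple;

bool fn(string op, int v1, int v2)
{
    switch (op)
    {
        default: return false;
        case "<": return v1 < v2;
        case ">": return v1 > v2;
    }
}

// Interleave parameter types with parameter names, producing
// AliasSeq!(string, "op", int, "v1", int, "v2") to feed to Tuple.
template Interleave(Types...)
{
    template With(names...)
    {
        static if (Types.length == 0)
            alias With = AliasSeq!();
        else
            alias With = AliasSeq!(Types[0], names[0],
                Interleave!(Types[1 .. $]).With!(names[1 .. $]));
    }
}

// The Tuple alias is now derived from fn's definition, not written by hand.
alias FnArgs = Tuple!(Interleave!(Parameters!fn).With!(ParameterIdentifierTuple!fn));
```

With this, adding or reordering parameters on fn automatically updates the Tuple alias.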
tupleof function parameters?
I'd like to create a Tuple alias representing a function's parameter list. Is there a way to do this? Here's an example creating a Tuple alias for a function's parameters by hand:

import std.typecons : Tuple;

bool fn(string op, int v1, int v2)
{
    switch (op)
    {
        default: return false;
        case "<": return v1 < v2;
        case ">": return v1 > v2;
    }
}

alias fnArgs = Tuple!(string, "op", int, "v1", int, "v2");

unittest
{
    auto args = fnArgs("<", 3, 5);
    assert(fn(args[]));
}

This is quite useful. I'm wondering if there is a way to create the 'fnArgs' alias from the definition of 'fn' without needing to manually write out the '(string, "op", int, "v1", int, "v2")' sequence by hand. Something like a 'tupleof' operation on the function parameter list. Or conversely, define the tuple and use it when defining the function. The goal is to write the argument list once and use it to create both the function and the Tuple alias. That way I could create a large number of these function / arglist tuple pairs with less brittleness. --Jon
Re: Splitting up large dirty file
On Monday, 21 May 2018 at 15:00:09 UTC, Dennis wrote: I want to be convinced that Range programming works like a charm, but the procedural approaches remain more flexible (and faster too) it seems. Thanks for the example. On Monday, 21 May 2018 at 22:11:42 UTC, Dennis wrote: In this case I used drop to drop lines, not characters. The exception was thrown by the joiner it turns out. ... From the benchmarking I did, I found that ranges are easily an order of magnitude slower even with compiler optimizations: My general experience is that range programming works quite well. It's especially useful when used to do lazy processing and as a result minimize memory allocations. I've gotten quite good performance with these techniques (see my DConf talk slides: https://dconf.org/2018/talks/degenhardt.html). Your benchmarks are not against the file split case, but if you benchmarked that you may have also seen it as slow. It that case you may be hitting specific areas where there are opportunities for performance improvement in the standard library. One is that joiner is slow (PR: https://github.com/dlang/phobos/pull/6492). Another is that the write[fln] routines are much faster when operating on a single large object than many small objects. e.g. It's faster to call write[fln] with an array of 100 characters than: (a) calling it 100 times with one character; (b) calling it once, with 100 characters as individual arguments (template form); (c) calling it once with range of 100 characters, each processed one at a time. When joiner is used as in your example, you not only hit the joiner performance issue, but the write[fln] issue. This is due to something that may not be obvious at first: When joiner is used to concatenate arrays or ranges, it flattens out the array/range into a single range of elements. So, rather than writing a line at a time, you example is effectively passing a character at a time to write[fln]. 
So, in the file split case, using byLine in an imperative fashion as in my example will have the effect of passing a full line at a time to write[fln], rather than individual characters. Mine will be faster, but not because it's imperative. The same thing could be achieved procedurally. Regarding the benchmark programs you showed - This is very interesting. It would certainly be worth additional looks into this. One thing I wonder is if the performance penalty may be due to a lack of inlining due to crossing library boundaries. The imperative versions aren't crossing these boundaries. If you're willing, you could try adding LDC's LTO options and see what happens. There are some instructions in the release notes for LDC 1.9.0 (https://github.com/ldc-developers/ldc/releases). Make sure you use the form that includes druntime and phobos. --Jon
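A small sketch (not from the thread) illustrating the flattening: after joiner, the range elements are individual characters, not lines.

```d
import std.algorithm : count, joiner;

void main()
{
    auto lines = ["line one", "line two"];

    // joiner yields a single range of individual characters, not a range
    // of lines. Handing it to write[fln] means one element per character.
    assert(lines.joiner("\n").count == 17);    // 8 + 1 + 8 characters

    // Writing line-by-line instead passes a whole array per call:
    //   foreach (line; lines) writeln(line);
}
```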
Re: Splitting up large dirty file
On Thursday, 17 May 2018 at 20:08:09 UTC, Dennis wrote: On Wednesday, 16 May 2018 at 15:47:29 UTC, Jon Degenhardt wrote: If you write it in the style of my earlier example and use counters and if-tests it will work. byLine by itself won't try to interpret the characters (won't auto-decode them), so it won't trigger an exception if there are invalid utf-8 characters. When printing to stdout it seems to skip any validation, but writing to a file does give an exception:

```
auto inputStream = (args.length < 2 || args[1] == "-") ? stdin : args[1].File;
auto outputFile = new File("output.txt");
foreach (line; inputStream.byLine(KeepTerminator.yes))
    outputFile.write(line);
```

std.exception.ErrnoException@C:\D\dmd2\windows\bin\..\..\src\phobos\std\stdio.d(2877): (No error) According to the documentation, byLine can throw a UTFException, so relying on the fact that it doesn't in some cases doesn't seem like a good idea. Instead of:

```
auto outputFile = new File("output.txt");
```

try:

```
auto outputFile = File("output.txt", "w");
```

That works for me. The second arg ("w") opens the file for write. When I omit it, I also get an exception, as the default open mode is for read:

* If the file does not exist: Cannot open file `output.txt' in mode `rb' (No such file or directory)
* If the file does exist: (Bad file descriptor)

The second error presumably occurs when writing. As an aside - I agree with one of your bigger picture observations: It would be preferable to have more control over utf-8 error handling behavior at the application level.
Re: Splitting up large dirty file
On Wednesday, 16 May 2018 at 07:06:45 UTC, Dennis wrote: On Wednesday, 16 May 2018 at 02:47:50 UTC, Jon Degenhardt wrote: Can you show the program you are using that throws when using byLine? Here's a version that only outputs the first chunk:

```
import std.stdio;
import std.range;
import std.algorithm;
import std.file;
import std.exception;

void main(string[] args)
{
    enforce(args.length == 2, "Pass one filename as argument");
    auto lineChunks = File(args[1], "r").byLine.drop(4).chunks(10_000_000/10);
    new File("output.txt", "w").write(lineChunks.front.joiner);
}
```

If you write it in the style of my earlier example and use counters and if-tests it will work. byLine by itself won't try to interpret the characters (won't auto-decode them), so it won't trigger an exception if there are invalid utf-8 characters.
Re: Splitting up large dirty file
On Tuesday, 15 May 2018 at 20:36:21 UTC, Dennis wrote: I have a file with two problems: - It's too big to fit in memory (apparently, I thought 1.5 Gb would fit but I get an out of memory error when using std.file.read) - It is dirty (contains invalid Unicode characters, null bytes in the middle of lines) I want to write a program that splits it up into multiple files, with the splits happening every n lines. I keep encountering roadblocks though: - You can't give Yes.useReplacementChar to `byLine`, and `byLine` (or `readln`) throws an Exception upon encountering an invalid character. Can you show the program you are using that throws when using byLine? I tried a very simple program that reads and outputs line-by-line, then fed it a file that contained invalid utf-8. I did not see an exception. The invalid utf-8 was created by taking part of this file: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt (a commonly used file with utf-8 edge cases), plus adding a number of random hex characters, including null. I don't see exceptions thrown. The program I used:

```
int main(string[] args)
{
    import std.stdio;
    import std.conv : to;

    try
    {
        auto inputStream = (args.length < 2 || args[1] == "-") ? stdin : args[1].File;
        foreach (line; inputStream.byLine(KeepTerminator.yes))
            write(line);
    }
    catch (Exception e)
    {
        stderr.writefln("Error [%s]: %s", args[0], e.msg);
        return 1;
    }
    return 0;
}
```
Re: What's the proper way to use std.getopt?
On Monday, 11 December 2017 at 20:58:25 UTC, Jordi Gutiérrez Hermoso wrote: What's the proper style, then? Can someone show me a good example of how to use getopt and the docstring it automatically generates? The command line tools I published use the approach described in a number of the replies, but with a tad more structure. It's hardly perfect, but may be useful if you want more examples. See: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d. See the main() routine and the TsvSampleOptions struct. Most of the tools have a similar pattern. --Jon
Re: splitter string/char different behavior
On Saturday, 30 September 2017 at 17:17:17 UTC, SrMordred wrote: writeln( "a.b.c".splitter('.').dropBack(1) ); //compiles ok writeln( "a.b.c".splitter(".").dropBack(1) ); //error: Error: template std.range.dropBack cannot deduce function from argument types !()(Result, int), candidates are: (...) Hm.. can someone explain whats going on? Let's try again. I'm not sure of the full explanation, but it likely involves two separate template overloads being instantiated, each with its own definition of the return type. * "a.b.c".splitter('.') - This overload: https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L3696-L3703 * "a.b.c".splitter(".") - This overload: https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L3973-L3982 But why one supports dropBack and the other doesn't, I don't know.
Re: splitter string/char different behavior
On Saturday, 30 September 2017 at 19:26:14 UTC, SrMordred wrote: For "a.b.c".splitter(x), Range r is a string, r.front is a char. The template can only be instantiated if the predicate function is valid. The predicate function is "a == b". Since r.front is a char, s must be a type that can be compared with '=='. A string and a char cannot be compared with '==', which is why a valid template instantiation could not be found. Would it be correct to just update the documentation to say "Lazily splits a range using a char as a separator"? What is it; wchar and dchar too? I notice the example that is there has ' ' as the element. But this works: writeln("a.b.c".splitter(".") ); Geez, my mistake. I'm sorry about that. It's dropBack that's failing, not splitter.
Re: splitter string/char different behavior
On Saturday, 30 September 2017 at 17:17:17 UTC, SrMordred wrote: writeln( "a.b.c".splitter('.').dropBack(1) ); //compiles ok writeln( "a.b.c".splitter(".").dropBack(1) ); //error: Error: template std.range.dropBack cannot deduce function from argument types !()(Result, int), candidates are: (...) Hm.. can someone explain whats going on? It's easy to overlook, but the documentation for splitter starts out: Lazily splits a range using an element as a separator. An element of a string is a char, not a string. It needs to be read somewhat literally, but it is correct. It's also part of the template constraint, useful once you've become accustomed to reading them:

```
auto splitter(alias pred = "a == b", Range, Separator)(Range r, Separator s)
if (is(typeof(binaryFun!pred(r.front, s)) : bool) && ...
```

For "a.b.c".splitter(x), Range r is a string, r.front is a char. The template can only be instantiated if the predicate function is valid. The predicate function is "a == b". Since r.front is a char, s must be a type that can be compared with '=='. A string and a char cannot be compared with '==', which is why a valid template instantiation could not be found.
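The constraint can be checked directly; a small sketch (not from the thread) using the same `binaryFun` expression as the template constraint:

```d
import std.functional : binaryFun;
import std.range : front;

void main()
{
    string s = "a.b.c";

    // The default predicate "a == b" is valid for a char separator,
    // since the front of a string is a (decoded) character:
    static assert(is(typeof(binaryFun!"a == b"(s.front, '.')) : bool));

    // A character and a string can't be compared with '==', so this
    // overload's constraint rejects a string separator:
    static assert(!is(typeof(binaryFun!"a == b"(s.front, "."))));
}
```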
Re: Region-based memory management and GC?
On Saturday, 30 September 2017 at 07:41:21 UTC, Igor wrote: On Friday, 29 September 2017 at 22:13:01 UTC, Jon Degenhardt wrote: Have there been any investigations into using region-based memory management (aka memory arenas) in D, possibly in conjunction with GC allocated memory? Sounds like just want to use https://dlang.org/phobos/std_experimental_allocator_building_blocks_region.html. Wow, thanks, I did not know about this. Will check it out.
Region-based memory management and GC?
Have there been any investigations into using region-based memory management (aka memory arenas) in D, possibly in conjunction with GC allocated memory? This is a speculative idea, but it'd be interesting to know if there has been work in this area. My own interest is request-response applications, where memory allocated as part of a specific request can be discarded as a single block when processing of that request completes, without running destructors. I've also seen some papers describing GC systems targeting big data platforms that incorporate this idea. e.g. http://www.ics.uci.edu/~khanhtn1/papers/osdi16.pdf --Jon
Re: DUB and LTO?
On Tuesday, 5 September 2017 at 11:36:06 UTC, Sönke Ludwig wrote: On 24.01.2017 at 17:02, Las wrote: How do I enable LTO in DUB in a sane way? I could add it to dflags, but I only want it on release builds. You can put a "buildTypes" section in your package recipe and override the default dflags or lflags there just for the "release" build type. See https://code.dlang.org/package-format?lang=json#build-types There are examples in my dub.json files. One here: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/dub.json#L24-L28. All the dub.json files in the repo are set up this way. This turns on LTO (thin) for LDC on OS X; it's not used for other builds. It works in Travis-CI for the combinations of OS X and Linux with LDC and DMD. --Jon
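A sketch of the relevant dub.json fragment (field names per the dub package format; the exact build options and platform suffix are illustrative and may differ from the linked file):

```json
{
    "buildTypes": {
        "release": {
            "buildOptions": ["releaseMode", "optimize", "inline"],
            "dflags-osx-ldc": ["-flto=thin"]
        }
    }
}
```

The `-osx-ldc` suffix scopes the flag to LDC builds on OS X, so other compiler/platform combinations are unaffected.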
Re: Help Required on Getopt
On Friday, 1 September 2017 at 19:04:39 UTC, Daniel Kozak wrote: I have the same issue. How does this help? Catching the exception does not help. How do I catch the exception and still print the help message? You are correct, sorry about that. What my response showed is how to avoid printing the full stack trace and instead print a more nicely formatted error message. And separately, how to print formatted help. But, you are correct in that you can't directly print the formatted help text from the catch block as shown. In particular, the GetoptResult returned by getopt is not available. I don't have any examples that try to work around this. Presumably one could call getopt again to get the options list, then generate the formatted help. It'd be an annoyance, though perhaps judicious use of AliasSeq might make the code structure reasonable. --Jon
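A sketch of the "call getopt again" workaround (untested; the option names and the `parse` wrapper are made up):

```d
import std.getopt;
import std.stdio;

int main(string[] args)
{
    string rootDir;

    // Wrap the getopt call so the same option list can be re-run to
    // recover the GetoptResult needed for formatted help.
    GetoptResult parse(string[] a)
    {
        return getopt(a, "r|root", "Root directory.", &rootDir);
    }

    try
    {
        auto r = parse(args);
        if (r.helpWanted) defaultGetoptPrinter("Options:", r.options);
    }
    catch (GetOptException e)
    {
        stderr.writeln("Error: ", e.msg);
        // Re-run getopt on a minimal argument list; the returned
        // GetoptResult still carries the option descriptions.
        auto r = parse(["program-name"]);
        defaultGetoptPrinter("Options:", r.options);
        return 1;
    }
    return 0;
}
```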
Re: Help Required on Getopt
On Friday, 1 September 2017 at 13:13:39 UTC, Vino.B wrote: Hi All, When I run the below program with the argument "-r" but no value ("D1.d -r") it throws an error, but I need it to show the help menu [snip...] Hi Vino, To get good error message behavior you need to put the getopt call in a try-catch block. Then you can choose how to respond. An example here: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-append/src/tsv-append.d#L138-L194. This code prints out the error message from the exception. In your case: "Missing value for argument -r.". But, you could also print out the help text as well. There is an example of that as well in the above code block; look for the 'if (r.helpWanted)' test. --Jon
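The basic shape of the pattern, as a sketch (a made-up string option stands in for '-r'):

```d
import std.getopt;
import std.stdio;

int main(string[] args)
{
    string rootDir;   // stand-in for the '-r' option taking a value
    GetoptResult r;
    try
    {
        r = getopt(args, "r", "Root directory.", &rootDir);
    }
    catch (Exception e)
    {
        // "D1.d -r" lands here: "Missing value for argument -r."
        stderr.writefln("[%s] Error processing command line: %s", args[0], e.msg);
        return 1;
    }
    if (r.helpWanted)
    {
        defaultGetoptPrinter("Synopsis: program [options]", r.options);
        return 0;
    }
    // ... rest of the program ...
    return 0;
}
```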
Re: General performance tip about possibly using the GC or not
On Tuesday, 29 August 2017 at 00:52:11 UTC, Cecil Ward wrote: I am vacillating - considering breaking a lifetime's C habits and letting the D garbage collector make life wonderful by just cleaning up after me, ruining my future C discipline by not deleting stuff myself. The tsv command line tools I open-sourced haven't had any problems with GC. They are only one type of app, perhaps better suited to GC than other apps, but still, it is a reasonable data point. I've done rather extensive benchmarking against similar tools written in native languages, mostly C. The D tools were faster, often by significant margins. The important part is not that they were faster on any particular benchmark, but that they did well against a fair variety of tools written by a fair number of different programmers, including several standard unix tools. The tools were programmed using the standard library where possible, without resorting to low-level optimizations. I don't know if the exercise says anything about GC vs manual memory management from the perspective of maximum possible code optimization. But, I do think it is suggestive of benefits that may occur in more regular programming, in that GC allows you to spend more time on other aspects of your program, and less time on memory management details. That said, all the caveats, suggestions, etc. given by others in this thread apply to my programs too. GC is hardly a free lunch. Benchmarks on the tsv utilities: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md Blog post describing some of the techniques used: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ --Jon
Re: std.range.put vs R.put: Best practices?
On Monday, 21 August 2017 at 05:58:01 UTC, Jonathan M Davis wrote: On Monday, August 21, 2017 02:34:23 Mike Parker via Digitalmars-d-learn wrote: On Sunday, 20 August 2017 at 18:08:27 UTC, Jon Degenhardt wrote: > Documentation for std.range.put > (https://dlang.org/phobos/std_range_primitives.html#.put) has > > the intriguing line: >> put should not be used "UFCS-style", e.g. r.put(e). Doing >> this >> may call R.put directly, by-passing any transformation >> feature >> provided by Range.put. put(r, e) is prefered. > > This raises the question of whether std.range.put is always > preferred over calling an output range's 'put' method, or if > there are times when calling an output range's 'put' method > directly is preferred. Also, it seems an easy oversight to > unintentionally call the wrong one. > > Does anyone have recommendations or best practice > suggestions for which form to use and when? It's recommended to always use the utility function in std.range unless you are working with an output range that has a well known put implementation. The issue is that put can be implemented to take any number or type of arguments, but as long as it has an implementation with one parameter of the range's element type, then the utility function will do the right thing internally whether you pass multiple elements, a single element, an array... It's particularly useful in generic code where most ranges are used. But again, if you are working with a specific range type then you can do as you like. Also, when the output range is a dynamic array, UFCS with the utility function is fine. As for mitigating the risk of calling the wrong one, when you do so you'll either get a compile-time error because of a parameter mismatch or it will do the right thing. If there's another likely outcome, I'm unaware of it. 
To add to that, the free function put handles putting different character types to a range of characters (IIRC, it also handles putting entire strings as well), whereas a particular implementation of put probably doesn't. In principle, a specific range type could do everything that the free function does, but it's highly unlikely that it will. In general, it's really just better to use the free function put, and arguably, we should have used a different function name for the output ranges themselves with the idea that the free function would always be the one called, and it would call the special function that the output ranges defined. Unfortunately, however, that's not how it works. In general, IMHO, output ranges really weren't thought out well enough. It's more like they were added as a counterpart to input ranges because Andrei felt like they needed to be there rather than having them be fully fleshed out on their own. The result is a basic idea that's very powerful but that suffers in the details and probably needs at least a minor redesign (e.g. the output API has no concept of an output range that's full). In any case, I'd just suggest that you never use put with UFCS. Unfortunately, if you're using UFCS enough, it becomes habit to just call the function as if it were a member function, which is then a problem when using output ranges, but we're kind of stuck at this point. On the bright side, it's really only likely to cause issues in generic code where the member function might work with your tests but not everything that's passed to it. In other cases, if what you're doing doesn't work with the member function, then the code won't compile, and you'll know to switch to using the free function. Mike, Jonathan - Thanks for the detailed responses! Yes, by habit I use UFCS, that is where the potential for the wrong call comes from. I agree also that output ranges are very powerful in concept, but the details are not fully fleshed out at this point.
A few enhancements could make it much more compelling. --Jon
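A small sketch (not from the thread) of the transformation the free function provides; `CharSink` is a made-up output range whose member put only accepts single chars:

```d
import std.range : put;

struct CharSink
{
    char[] data;
    // Member put handles only single chars.
    void put(char c) { data ~= c; }
}

void main()
{
    CharSink sink;

    // The free function breaks the string into elements the member
    // put can accept:
    put(sink, "hello");
    assert(sink.data == "hello");

    // sink.put("hello");  // UFCS-style: calls the member directly
    //                     // and would not compile for a string.
}
```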
std.range.put vs R.put: Best practices?
Documentation for std.range.put (https://dlang.org/phobos/std_range_primitives.html#.put) has the intriguing line: put should not be used "UFCS-style", e.g. r.put(e). Doing this may call R.put directly, by-passing any transformation feature provided by Range.put. put(r, e) is prefered. This raises the question of whether std.range.put is always preferred over calling an output range's 'put' method, or if there are times when calling an output range's 'put' method directly is preferred. Also, it seems an easy oversight to unintentionally call the wrong one. Does anyone have recommendations or best practice suggestions for which form to use and when? --Jon
Re: Efficiently streaming data to associative array
On Wednesday, 9 August 2017 at 13:36:46 UTC, Steven Schveighoffer wrote: On 8/8/17 3:43 PM, Anonymouse wrote: On Tuesday, 8 August 2017 at 16:00:17 UTC, Steven Schveighoffer wrote: I wouldn't use formattedRead, as I think this is going to allocate temporaries for a and b. What would you suggest to use in its stead? My use-case is similar to the OP's in that I have a string of tokens that I want split into variables. using splitter(","), and then parsing each field using an appropriate function (e.g. to!) For example, the OP's code, I would do: auto r = line.splitter(","); a = r.front; r.popFront; b = r.front; r.popFront; c = r.front.to!int; It would be nice if formattedRead didn't use appender, and instead sliced, but I'm not sure it can be fixed. Note, one could make a template that does this automatically in one line. -Steve The blog post Steve referred to has examples of this type of processing while iterating over lines in a file. A couple of different ways to access the elements are shown. AA access is addressed also: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ --Jon
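A minimal sketch in the same spirit, splitter-based field access feeding an associative array (the tab delimiter and field positions are illustrative, not from the blog post):

```d
import std.algorithm : splitter;
import std.array : array;
import std.conv : to;

/* Accumulate a sum keyed on field 0, summing field 1, for
 * tab-separated lines. */
int[string] sumByKey(const(char)[][] lines)
{
    int[string] sums;
    foreach (line; lines)
    {
        auto fields = line.splitter('\t').array;
        if (fields.length > 1)
            sums[fields[0].to!string] += fields[1].to!int;
    }
    return sums;
}

unittest
{
    auto sums = sumByKey(["a\t2", "b\t3", "a\t5"]);
    assert(sums["a"] == 7 && sums["b"] == 3);
}
```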
Re: Too slow readln
On Sunday, 16 July 2017 at 17:03:27 UTC, unDEFER wrote: [snip] How to write in D grep not slower than GNU grep? GNU grep is pretty fast; it's tough to beat while reading one line at a time. That's because it can play a bit of a trick: do the initial match ignoring line boundaries, and correct the line boundaries later. There's a good discussion in this thread ("Why GNU grep is fast" by Mike Haertel): https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html --Jon
Re: Getopt default int init and zero
On Friday, 19 May 2017 at 12:09:38 UTC, Suliman wrote: I would like to check if the user specified `0` as a getopt parameter. But the problem is that `int`s default to `0`. So if the user did not specify anything, `int x` will be zero, and all other code will work as if it's zero. One way to do this is to use a callback function or delegate. Have the callback set both the main variable and a boolean tracking whether the option was entered. --Jon
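A sketch of the callback approach (the option and variable names are made up):

```d
import std.getopt;
import std.conv : to;

int x;
bool xWasSet = false;

// The callback records both the value and the fact that the option
// appeared on the command line.
void xHandler(string option, string value)
{
    x = value.to!int;
    xWasSet = true;
}

void main(string[] args)
{
    getopt(args, "x", &xHandler);

    if (xWasSet && x == 0)
    {
        // The user explicitly passed '--x 0'.
    }
}
```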
Re: Processing a gzipped csv-file by line-by-line
On Wednesday, 10 May 2017 at 22:20:52 UTC, Nordlöw wrote: What's the fastest way to on-the-fly-decompress and process a gzipped csv-file line by line? Is it possible to combine http://dlang.org/phobos/std_zlib.html with some stream variant of File(path).byLineFast? I was curious what byLineFast was; I'm guessing it's from here: https://github.com/biod/BioD/blob/master/bio/core/utils/bylinefast.d. I didn't test it, but it appears it may pre-date the speed improvements made to std.stdio.byLine perhaps a year and a half ago. If so, it might be worth comparing it to the current Phobos version, and of course iopipe. As mentioned in one of the other replies, byLine and variants aren't appropriate for CSV with escapes. For that, a real CSV parser is needed. As an alternative, run a converter that converts from csv to another format. --Jon
Re: Command Line Parsing
On Wednesday, 12 April 2017 at 09:51:34 UTC, Russel Winder wrote: Are Argon https://github.com/markuslaker/Argon or darg https://github.com/jasonwhite/darg getting traction as the default command line handling system for D, or are they just peripheral and everyone just uses std.getopt https://dlang.org/phobos/std_getopt.html ? I use std.getopt in my tools. Overall it's pretty good, and the reliability of a package in the standard library has value. That said, I've bumped up against its limits, and looking at the code, it's not clear how to extend it to more advanced use cases. There may be a case for introducing a next generation package. --Jon
Re: length = 0 clears reserve
On Tuesday, 11 April 2017 at 20:00:48 UTC, Jethro wrote: On Tuesday, 11 April 2017 at 03:00:29 UTC, Jon Degenhardt wrote: On Tuesday, 11 April 2017 at 01:59:57 UTC, Jonathan M Davis wrote: On Tuesday, April 11, 2017 01:42:32 Jethro via Digitalmars-d-learn wrote: [...] You can't reuse the memory of a dynamic array by simply setting its length to 0. If that were allowed, it would risk allowing dynamic arrays to stomp on each other's memory (since there is no guarantee that there are no other dynamic arrays referring to the same memory). However, if you know that there are no other dynamic arrays referring to the same memory, then you can call assumeSafeAppend on the dynamic array, and then the runtime will assume that there are no other dynamic arrays referring to the same memory. [snip] Another technique that works for many cases is to use an Appender (std.array). Appender supports reserve and clear, the latter setting the length to zero without reallocating. A typical use case is an algorithm doing a series of appends, then setting the length to zero and starting to append again. --Jon Appender supports clear? Are you sure? Seems appender is no different than string, maybe worse? string has assumeSafeAppend, reserve and clear (although clear necessarily reallocates). They should have a function called empty, which resets the length to zero but doesn't reallocate. See the Appender.clear documentation (https://dlang.org/phobos/std_array.html#.Appender.clear), the key piece being: Clears the managed array. This allows the elements of the array to be reused for appending. I've tried using both a dynamic array and an appender in this way, setting the length of the dynamic array to zero vs using Appender.clear, in a cycle of fill-the-array by appending, operate on the array, clearing, and repeating. Appender is dramatically faster. And, if you look at GC reports you find that setting a dynamic array to zero creates garbage to collect, while Appender.clear does not.
(Use the --DRT-gcopt=profile:1 command line option to see GC reports, described here: https://dlang.org/spec/garbage.html#gc_config).
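A sketch of the fill / operate / clear cycle described above:

```d
import std.array : appender;

void main()
{
    auto buf = appender!(char[])();
    buf.reserve(1024);

    foreach (cycle; 0 .. 3)
    {
        // Fill the buffer by appending...
        foreach (c; "some data") buf.put(c);

        // ... operate on buf.data ...
        assert(buf.data.length == 9);

        // clear() resets length to zero without releasing the memory,
        // so the next cycle appends into the same allocation.
        buf.clear();
        assert(buf.data.length == 0);
    }
}
```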
Re: length = 0 clears reserve
On Tuesday, 11 April 2017 at 01:59:57 UTC, Jonathan M Davis wrote: On Tuesday, April 11, 2017 01:42:32 Jethro via Digitalmars-d-learn wrote: arrays have the ability to reserve but when setting the length to 0, it removes the reserve!! ;/ char[] buf; buf.reserve = 1000; buf.length = 0; assert(buf.capacity == 0); But I simply want to clear the buffer, not change it's reserve/capacity. I've tried to hack by setting the length to 0 through a pointer, but that still clears the capacity! I want to do this because I want to be able to reuse the array without ever reallocating(I'll set the capacity to the max that will ever be used, I don't have to worry about conflicts since it will always be ran serially). [snip] You can't reuse the memory of a dynamic array by simply setting its length to 0. If that were allowed, it would risk allow dynamic arrays to stomp on each others memory (since there is no guarantee that there are no other dynamic arrays referring to the same memory). However, if you know that there are no other dynamic arrays referrin to the same memory, then you can call assumeSafeAppend on the dynamic array, and then the runtime will assume that there are no other dynamic arrays referring to the same memory. [snip] Another technique that works for many cases is to use an Appender (std.array). Appender supports reserve and clear, the latter setting the length to zero without reallocating. A typical use case is an algorithm doing a series of appends, then setting the length to zero and starts appending again. --Jon
Re: pointer not aligned
On Friday, 31 March 2017 at 04:41:10 UTC, Joel wrote: Linking... ld: warning: pointer not aligned at address 0x10017A4C9 (_D30TypeInfo_AxS3std4file8DirEntry6__initZ + 16 from .dub/build/application-debug-posix.osx-x86_64-dmd_2072-EFDCDF4D45F944F7A9B1AEA5C32F81ED/spellit.o) ... and this goes on forever! Issue: https://issues.dlang.org/show_bug.cgi?id=17289
Re: Output range and writeln style functions
On Monday, 23 January 2017 at 22:20:59 UTC, Ali Çehreli wrote: On 01/23/2017 12:48 PM, Jon Degenhardt wrote: [snip] > So, what I'm really wondering is if there is built-in way to get closer to: outputStream.writefln(...); If it's about formatted output then perhaps formattedWrite? https://dlang.org/phobos/std_format.html#.formattedWrite The same function is used with stdout and an Appender: [snip] Ali Oh, that is better, thanks! --Jon
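For concreteness, a small sketch of the formattedWrite approach (the format string and values are illustrative):

```d
import std.array : appender;
import std.format : formattedWrite;

void main()
{
    auto buf = appender!string();

    // formattedWrite works with any output range, giving writefln-style
    // formatting directed at the range:
    buf.formattedWrite("Hello %s, your lucky number is %d\n", "Sam", 7);
    assert(buf.data == "Hello Sam, your lucky number is 7\n");
}
```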
Re: Output range and writeln style functions
On Monday, 23 January 2017 at 08:03:14 UTC, Ali Çehreli wrote: On 01/22/2017 01:54 PM, Jon Degenhardt wrote: I've been increasingly using output ranges in my code (the "component programming" model described in several articles on the D site). It works very well, except that it would often be more convenient to use writeln style functions rather than 'put'. Especially when you start by drafting a sketch of code using writeln functions, then convert it to an output range. Seems an obvious thing, I'm wondering if I missed something. Are there ways to use writeln style functions with output ranges? --Jon I don't think I understand the question. :) If you need a variadic put(), then I've come up with the following mildly tested AllAppender. Just as a reminder, I've also used std.range.tee that allows tapping into the stream to see what's flying through: [snip] Ali So I guess the answer is "no" :) It's mainly about consistency of the output primitives. Includes variadic args, formatting, and names of the primitives. I keep finding myself starting with something like:

```
void writeLuckyNumber(string name, int luckyNumber)
{
    writefln("Hello %s, your lucky number is %d", name, luckyNumber);
}
```

and then re-factoring it as:

```
void writeLuckyNumber(OutputRange)(OutputRange outputStream, string name, int luckyNumber)
if (isOutputRange!(OutputRange, char))
{
    import std.format;
    outputStream.put(
        format("Hello %s, your lucky number is %d\n", name, luckyNumber));
}
```

Not bad, but the actual output statements are a bit harder to read, especially if people reading your code are not familiar with output ranges. So, what I'm really wondering is if there is a built-in way to get closer to:

```
outputStream.writefln(...);
```

that I've overlooked. --Jon
Output range and writeln style functions
I've been increasingly using output ranges in my code (the "component programming" model described in several articles on the D site). It works very well, except that it would often be more convenient to use writeln style functions rather than 'put'. Especially when you start by drafting a sketch of code using writeln functions, then convert it to an output range. Seems an obvious thing, I'm wondering if I missed something. Are there ways to use writeln style functions with output ranges? --Jon
Re: compile-time test against dmd/phobos version number
On Saturday, 7 January 2017 at 02:41:54 UTC, ketmar wrote: On Saturday, 7 January 2017 at 02:30:53 UTC, Jon Degenhardt wrote: Is there a way to make a compile time check against the dmd/phobos version number? Functionally, what I'd like to achieve would be equivalent to: version(dmdVersion >= 2.070.1) { } else { ... } static if (__VERSION__ == 2072) { wow, it's dmd 2.072! } Perfect, thank you!
compile-time test against dmd/phobos version number
Is there a way to make a compile-time check against the dmd/phobos version number? Functionally, what I'd like to achieve would be equivalent to: version(dmdVersion >= 2.070.1) { } else { ... } I think I've seen something like this, probably using 'static if', but can't find it now. What I'm really trying to do is test for the existence of a specific enhancement in phobos: if it's present, use it; otherwise don't. Testing for a particular phobos release number seems the obvious thing to do. --Jon
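The `static if (__VERSION__ ...)` pattern can be sketched as follows (the enhancement itself is left abstract; the `codePath` name is made up):

```d
// Select code at compile time based on the compiler release, using the
// built-in __VERSION__ constant (2072 for dmd 2.072, etc.).
static if (__VERSION__ >= 2072L)
    enum codePath = "use the new phobos enhancement";
else
    enum codePath = "use the fallback";

void main()
{
    import std.stdio : writeln;
    writeln(codePath);
}
```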
Re: Constructing a variadic template parameter with source in two files
On Thursday, 22 December 2016 at 07:33:42 UTC, Ali Çehreli wrote: On 12/21/2016 07:59 PM, Jon Degenhardt wrote: > construct the 'opts' parameter from > definitions stored in two or more files. The reason for doing this is to > create a customization mechanism where-by there are a number of default > capabilities built-in to the main code base, but someone can customize > their copy of the code, putting definitions in a separate file, and have > it added in at compile time, including modifying command line arguments. I'm not sure this is any better than your mixin solution but getopt can be called multiple times on the same arguments. So, for example common code can parse them for its arguments and special code can parse them for its arguments. [...] Yes, that might work, thanks. I'll need to work on the code structure a bit (there are a couple other nuances to account for), but might be able to make it work. The mixin approach feels a bit brittle. --Jon
Constructing a variadic template parameter with source in two files
I'd like to find a way to define programming constructs in one file and reference them in a getopt call defined in another file. getopt uses a variadic template argument, so the argument list must be known at compile time. The std.getopt.getopt signature:

```
GetoptResult getopt(T...)(ref string[] args, T opts)
```

So, what I'm trying to do is construct the 'opts' parameter from definitions stored in two or more files. The reason for doing this is to create a customization mechanism whereby there are a number of default capabilities built into the main code base, but someone can customize their copy of the code, putting definitions in a separate file, and have it added in at compile time, including modifying command line arguments. I found a way to do this with a mixin template, shown below. However, it doesn't strike me as a particularly modular design. My question - Is there a better approach? The solution I identified is below. The '--say-hello' option is built-in (defined in app.d), the '--say-hello-world' command is defined in custom_commands.d. Running:

```
$ ./app --say-hello --say-hello-world
```

will print:

```
Hello
Hello World
```

Which is the goal. But, is there a better way? Help appreciated. --Jon

=== command_base.d ===

```
/* API for defining "commands". */
interface Command
{
    string exec();
}

class BaseCommand : Command
{
    private string _result;
    this (string result) { _result = result; }
    final string exec() { return _result; }
}
```

=== custom_commands.d ===

```
/* Defines custom commands and a mixin for generating the getopt argument.
 * Note that 'commandArgHandler' is defined in app.d, not visible in this file.
 */
import command_base;

class HelloWorldCommand : BaseCommand
{
    this() { super("Hello World"); }
}

mixin template CustomCommandDeclarations()
{
    import std.meta;
    auto pHelloWorldHandler = &commandArgHandler!HelloWorldCommand;
    alias CustomCommandOptions = AliasSeq!(
        "say-hello-world", "Print 'hello world'.", pHelloWorldHandler,
        );
}
```

=== app.d ===

```
/* This puts it all together. It creates built-in commands and uses the
 * mixin from custom_commands.d to declare commands and construct the
 * getopt argument.
 */
import std.stdio;
import command_base;

class HelloCommand : BaseCommand
{
    this() { super("Hello"); }
}

struct CmdOptions
{
    import std.meta;

    Command[] commands;

    void commandArgHandler(DerivedCommand : BaseCommand)()
    {
        commands ~= new DerivedCommand();
    }

    bool processArgs (ref string[] cmdArgs)
    {
        import std.getopt;
        import custom_commands;

        auto pHelloHandler = &commandArgHandler!HelloCommand;
        alias BuiltinCommandOptions = AliasSeq!(
            "say-hello", "Print 'hello'.", pHelloHandler,
            );

        mixin CustomCommandDeclarations;
        auto CommandOptions = AliasSeq!(BuiltinCommandOptions, CustomCommandOptions);
        auto r = getopt(cmdArgs, CommandOptions);

        if (r.helpWanted) defaultGetoptPrinter("Options:", r.options);
        return !r.helpWanted;    // Return true if execution should continue.
    }
}

void main(string[] cmdArgs)
{
    CmdOptions cmdopt;
    if (cmdopt.processArgs(cmdArgs))
        foreach (cmd; cmdopt.commands)
            writeln(cmd.exec());
}
```
Re: [Semi-OT] I don't want to leave this language!
On Wednesday, 7 December 2016 at 16:33:03 UTC, bachmeier wrote: On Wednesday, 7 December 2016 at 12:12:56 UTC, Ilya Yaroshenko wrote: R, Matlab, Python, Mathematica, Gauss, and Julia use C libs. --Ilya

You can call into those same C libs using D. Only if you want a pure D solution do you need to be able to rewrite those libraries and get the same performance. D is a fine solution for the academic or the working statistician that is doing day-to-day analysis. The GC and runtime are not going to be an obstacle for most of them (and most won't even know anything about them).

A cycle I think is common is for a researcher (industry or academic) to write functionality in native R code, then, when trying to scale it, find native R code too slow and switch to C/C++ to create a library used from R. C/C++ is chosen not because it is the preferred choice, but because it is the common choice. In such situations, the performance goal is often to be quite a bit faster than native R code, not to reach zero overhead. My personal opinion, but I do think D would be a very good choice here, run-time, phobos, gc, etc., included. The larger barrier to entry is more about ease of getting started, community (are others using this approach), etc., and less about having the absolutely most optimal performance. (There are obviously areas where the most optimal performance is critical; Mir seems to be targeting a number of them.) For D to compete directly with R, Python, and Julia in these communities, some additional capabilities are probably needed, like a repl, standard scientific packages, etc.
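For readers new to this, calling into a C library from D is a one-line declaration plus a call. A minimal sketch using a libc function (no bindings package needed, since D programs already link against libc):

```d
// Declare the C prototype with extern(C), then call it directly.
extern (C) size_t strlen(const char* s);

void main()
{
    // D string literals are null-terminated and implicitly convert to const(char)*.
    assert(strlen("hello") == 5);
}
```

The same pattern scales up to full library bindings: declare (or generate) the C prototypes, link the library, and call.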
Re: Impressed with Appender - Is there design/implementation description?
On Tuesday, 6 December 2016 at 15:29:59 UTC, Jonathan M Davis wrote: On Tuesday, December 06, 2016 13:19:22 Anonymouse via Digitalmars-d-learn wrote: On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote: [...]

> 2. Up until 4 KB it reallocates when growing, but after 4 KB the array lives in a larger pool of memory where it can often grow a lot without reallocating, so in many scenarios where other allocations do not interfere, the data array of appender grows in place without copying any data, thanks to the GC.extend() method.

I always assumed it kept its own manually allocated array on a malloc heap :O

No. The main thing that Appender does is reduce the number of checks required for whether there's room for the array to append in place, because that check is a good chunk of why ~= is expensive for arrays. [...]

Thanks everyone for the explanations. I should probably look into my data and see how often I'm reaching the 4 KB size that triggers GC.extend() use.

--Jon
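One way user code can lean on this behavior is to reserve capacity up front so the backing array starts out large. A small sketch (the sizes here are illustrative choices of mine, not constants from the Appender implementation):

```d
import std.array : appender;

void main()
{
    auto buf = appender!(char[]);
    buf.reserve(8192);              // pre-allocate well past the 4 KB threshold
    auto startCap = buf.capacity;

    foreach (i; 0 .. 500)
        buf.put("some text ");      // 5000 chars, within the reserved capacity

    // No reallocation was needed: capacity is unchanged from the reserved amount.
    assert(buf.capacity == startCap);
    assert(buf[].length == 5000);
}
```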
Impressed with Appender - Is there design/implementation description?
I've been using Appender quite a bit recently, typically when I need append-only arrays with variable and unknown final sizes. I had been prepared to write a custom data structure when I started using it with large amounts of data, but very nicely this has not surfaced as a need. Appender has held up quite well. I haven't actually benchmarked it against competing data structures, nor have I studied the implementation. I'd be very interested in understanding the design and how it compares to other data structures. Are there any write-ups or articles describing it? --Jon
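For context, the usage pattern in question is roughly this (a minimal sketch, not taken from any particular program of mine):

```d
import std.array : appender;

void main()
{
    auto squares = appender!(int[]);   // final size unknown up front
    foreach (i; 0 .. 5)
        squares.put(i * i);
    assert(squares[] == [0, 1, 4, 9, 16]);

    squares.clear();                   // discard contents, keep the allocation
    assert(squares[].length == 0);
}
```

The clear-and-reuse step is what makes it attractive for repeated fills without repeated allocation.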
Re: passing static arrays to each! with a ref param [Re: Why can't static arrays be sorted?]
On Tuesday, 11 October 2016 at 19:46:31 UTC, Jon Degenhardt wrote: On Tuesday, 11 October 2016 at 18:18:41 UTC, ag0aep6g wrote: On 10/11/2016 06:24 AM, Jon Degenhardt wrote: The example I gave uses ref parameters. On the surface it would seem reasonable that passing a static array by ref would allow it to be modified, without having to slice it first.

Your ref parameters are only for the per-element operations. You're not passing the array as a whole by reference. And you can't, because `each` itself takes the whole range by copy. So, the by-ref increments themselves do work, but they're applied to a copy of your original static array.

I see. Thanks for the explanation. I wasn't thinking it through properly. Also, I guess I had assumed that the intent was that each! be able to modify the elements, and that the whole array would therefore be passed by reference, but I didn't consider it properly.

Another perspective from which the current behavior could be confusing: it is somewhat natural to assume that 'each' is the functional equivalent of foreach, and that they can be used interchangeably. However, for static arrays they cannot be.
Re: passing static arrays to each! with a ref param [Re: Why can't static arrays be sorted?]
On Tuesday, 11 October 2016 at 18:18:41 UTC, ag0aep6g wrote: On 10/11/2016 06:24 AM, Jon Degenhardt wrote: The example I gave uses ref parameters. On the surface it would seem reasonable that passing a static array by ref would allow it to be modified, without having to slice it first.

Your ref parameters are only for the per-element operations. You're not passing the array as a whole by reference. And you can't, because `each` itself takes the whole range by copy. So, the by-ref increments themselves do work, but they're applied to a copy of your original static array.

I see. Thanks for the explanation. I wasn't thinking it through properly. Also, I guess I had assumed that the intent was that each! be able to modify the elements, and that the whole array would therefore be passed by reference, but I didn't consider it properly.

I'm not going to make any suggestions about whether the behavior should be changed. At some point when I get a bit of time I'll try to submit a documentation change to make the current behavior clearer.

--Jon
passing static arrays to each! with a ref param [Re: Why can't static arrays be sorted?]
On Monday, 10 October 2016 at 16:46:55 UTC, Jonathan M Davis wrote: On Monday, October 10, 2016 16:29:41 TheGag96 via Digitalmars-d-learn wrote: On Saturday, 8 October 2016 at 21:14:43 UTC, Jon Degenhardt wrote:

> This distinction is a bit on the nuanced side. Is it behaving as it should?
>
> --Jon

I think so? It's not being modified in the second case because the array is being passed by value... "x" there is a reference to an element of the copy created to be passed to each(). I assume there's a good reason why ranges in general are passed by value into these functions -- except in this one case, the stuff inside range types copied when passed by value won't be whole arrays, I'm guessing.

Whether it's by value depends entirely on the type of the range. They're passed around, and copying them has whatever semantics it has. In most cases, it copies the state of the range but doesn't copy all of the elements (e.g. that's what happens with a dynamic array, since it gets sliced). But if a range is a class, then it's definitely a reference type. The only way to properly save the state of a range is to call save. But passing by ref would make no sense at all with input ranges. It would completely kill chaining them. Almost all range-based functions return rvalues.

- Jonathan M Davis

The example I gave uses ref parameters. On the surface it would seem reasonable that passing a static array by ref would allow it to be modified, without having to slice it first. The documentation says:

    // If the range supports it, the value can be mutated in place
    arr.each!((ref n) => n++);
    assert(arr == [1, 2, 3, 4, 5]);

but 'arr' is a dynamic array, so technically it's not describing a static array (the opApply case). Expanding the example, using foreach with ref parameters will modify the static array in place, without slicing it. I would have expected each! with a ref parameter to behave the same. At a minimum this could be better documented, but it may also be a bug.
Example:

    T increment(T)(ref T x) { return x++; }

    void main()
    {
        import std.algorithm : each;

        int[] dynamicArray = [1, 2, 3, 4, 5];
        int[5] staticArray = [1, 2, 3, 4, 5];

        dynamicArray.each!(x => x++);              // Dynamic array by value
        assert(dynamicArray == [1, 2, 3, 4, 5]);   // ==> Not modified

        dynamicArray.each!((ref x) => x++);        // Dynamic array by ref
        assert(dynamicArray == [2, 3, 4, 5, 6]);   // ==> Modified

        staticArray[].each!((ref x) => x++);       // Slice of static array, by ref
        assert(staticArray == [2, 3, 4, 5, 6]);    // ==> Modified

        staticArray.each!((ref x) => x++);         // Static array by ref
        assert(staticArray == [2, 3, 4, 5, 6]);    // ==> Not modified

        /* Similar to above, using foreach and ref params. */
        foreach (ref x; dynamicArray) x.increment;
        assert(dynamicArray == [3, 4, 5, 6, 7]);   // Dynamic array => Modified

        foreach (ref x; staticArray[]) x.increment;
        assert(staticArray == [3, 4, 5, 6, 7]);    // Static array slice => Modified

        foreach (ref x; staticArray) x.increment;
        assert(staticArray == [4, 5, 6, 7, 8]);    // Static array => Modified
    }
Re: Why can't static arrays be sorted?
On Thursday, 6 October 2016 at 20:11:17 UTC, ag0aep6g wrote: On 10/06/2016 09:54 PM, TheGag96 wrote: Interestingly enough, I found that using .each() actually compiles without the [] [...] why can the compiler consider it a range here but not .sort()?

each is not restricted to ranges. It accepts other `foreach`-ables, too. The documentation says that it "also supports opApply-based iterators", but it's really anything that foreach accepts. [snip]

Thanks! Explains some things. I knew each! was callable in different circumstances than other functional operations, but hadn't quite figured it out. Looks like reduce! and fold! also take iterables.

There also appears to be a distinction between the iterator and range cases when a ref parameter is used. In the iterator case, each! won't modify the original. Example:

    void main()
    {
        import std.algorithm : each;

        int[] dynamicArray = [1, 2, 3, 4, 5];
        int[5] staticArray = [1, 2, 3, 4, 5];

        dynamicArray.each!((ref x) => x++);
        assert(dynamicArray == [2, 3, 4, 5, 6]);   // modified

        staticArray.each!((ref x) => x++);
        assert(staticArray == [1, 2, 3, 4, 5]);    // not modified

        staticArray[].each!((ref x) => x++);
        assert(staticArray == [2, 3, 4, 5, 6]);    // modified
    }

This distinction is a bit on the nuanced side. Is it behaving as it should?

--Jon
Re: Iterate over two arguments at once
On Monday, 19 September 2016 at 18:10:22 UTC, bachmeier wrote: Suppose I want to iterate over two arrays at once: foreach(v1, v2; [1.5, 2.5, 3.5], [4.5, 5.5, 6.5]) { ... } I have seen a way to do this but cannot remember what it is and cannot find it.

std.range.lockstep: https://dlang.org/phobos/std_range.html#lockstep
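A minimal sketch of how that looks in practice, using the arrays from the question (collecting the pairwise sums into an array just to make the pairing visible):

```d
import std.range : lockstep;

void main()
{
    double[] a = [1.5, 2.5, 3.5];
    double[] b = [4.5, 5.5, 6.5];

    double[] sums;
    foreach (v1, v2; lockstep(a, b))   // v1 walks a, v2 walks b, in step
        sums ~= v1 + v2;

    assert(sums == [6.0, 8.0, 10.0]);
}
```

std.range.zip is the range-returning alternative when the result needs to feed into further range chains.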
Re: Instantiating a class with range template parameter
On Thursday, 8 September 2016 at 08:44:54 UTC, Lodovico Giaretta wrote: On Thursday, 8 September 2016 at 08:20:49 UTC, Jon Degenhardt wrote: [snip]

I think that

    auto x = new Derived!(typeof(stdout.lockingTextWriter()))();  // note the parentheses

should work. But usually, you save the writer inside the object and make a free function called `derived` (same as the class, but with a lowercase first letter). You define it this way:

    auto derived(OutputRange)(auto ref OutputRange writer)
    {
        auto result = new Derived!OutputRange();
        result.writer = writer;  // save the writer in a field of the object
        return result;
    }

    void main()
    {
        auto x = derived(stdout.lockingTextWriter);
        x.writeString("Hello world");  // the writer is saved in the object, no need to pass it
    }

Yes, the form you suggested works, thanks! And thanks for the class structuring suggestion, it has some nice properties.
Instantiating a class with range template parameter
I've been generalizing output routines by passing an OutputRange as an argument. This gets interesting when the output routine is a virtual function. Virtual functions cannot be templates, so instead the template parameters need to be part of the class definition and specified when instantiating the class. An example is below. It works fine.

One thing I can't figure out: how to provide the range parameter without first declaring a variable of the appropriate type. What works is something like:

    auto writer = stdout.lockingTextWriter;
    auto x = new Derived!(typeof(writer));

Other forms I've tried fail to compile. For example, this fails:

    auto x = new Derived!(typeof(stdout.lockingTextWriter));

I'm curious if this can be done without declaring the variable first. Anyone happen to know?

--Jon

Full example:

    import std.stdio;
    import std.range;

    class Base(OutputRange)
    {
        abstract void writeString(OutputRange r, string s);
    }

    class Derived(OutputRange) : Base!OutputRange
    {
        override void writeString(OutputRange r, string s)
        {
            put(r, s);
            put(r, '\n');
        }
    }

    void main()
    {
        auto writer = stdout.lockingTextWriter;
        auto x = new Derived!(typeof(writer));
        x.writeString(writer, "Hello World");
    }
Re: Template constraints for reference/value types?
On Wednesday, 7 September 2016 at 00:40:27 UTC, Jonathan M Davis wrote: On Tuesday, September 06, 2016 21:16:05 Jon Degenhardt via Digitalmars-d-learn wrote: On Tuesday, 6 September 2016 at 21:00:53 UTC, Lodovico Giaretta wrote: On Tuesday, 6 September 2016 at 20:46:54 UTC, Jon Degenhardt wrote:

>> Is there a way to constrain template arguments to reference or value types? I'd like to do something like:
>>
>>     T foo(T)(T x)
>>     if (isReferenceType!T)
>>     { ... }
>>
>> --Jon
>
> You can use `if(is(T : class) || is(T : interface))`.
>
> If you also need other types, std.traits contains a bunch of useful templates: isArray, isAssociativeArray, isPointer, ...

Thanks. This looks like a practical approach.

It'll get you most of the way there, but I don't think that it's actually possible to test for reference types in the general case [snip]

- Jonathan M Davis

Thanks, very helpful. I've concluded that what I wanted to do isn't worth pursuing at the moment (see the thread on associative arrays in the General forum). However, your description is helpful to understand the details involved.
Re: Template constraints for reference/value types?
On Tuesday, 6 September 2016 at 21:00:53 UTC, Lodovico Giaretta wrote: On Tuesday, 6 September 2016 at 20:46:54 UTC, Jon Degenhardt wrote: Is there a way to constrain template arguments to reference or value types? I'd like to do something like: T foo(T)(T x) if (isReferenceType!T) { ... } --Jon You can use `if(is(T : class) || is(T : interface))`. If you also need other types, std.traits contains a bunch of useful templates: isArray, isAssociativeArray, isPointer, ... Thanks. This looks like a practical approach.
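Concretely, the suggested approach could be packaged like this (the isReferenceType helper is a name defined here for illustration, not a Phobos template):

```d
// Hypothetical helper following the suggestion above; not part of std.traits.
enum isReferenceType(T) = is(T == class) || is(T == interface);

T passThrough(T)(T x)
if (isReferenceType!T)
{
    return x;
}

class C { }
struct S { }

void main()
{
    auto c = new C;
    assert(passThrough(c) is c);                          // classes are accepted
    static assert(!__traits(compiles, passThrough(S()))); // structs are rejected
}
```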
Template constraints for reference/value types?
Is there a way to constrain template arguments to reference or value types? I'd like to do something like: T foo(T)(T x) if (isReferenceType!T) { ... } --Jon
Re: Why D isn't the next "big thing" already
On Saturday, 30 July 2016 at 22:52:23 UTC, bachmeier wrote: On Saturday, 30 July 2016 at 12:30:55 UTC, LaTeigne wrote: On Saturday, 30 July 2016 at 12:24:55 UTC, ketmar wrote: On Saturday, 30 July 2016 at 12:18:08 UTC, LaTeigne wrote: it you think that you know the things better than somebody who actually *lived* there in those times... well, keep thinking that. also, don't forget to teach physics to physicians, medicine to medics, and so on. i'm pretty sure that you will have a great success as a stupidiest comic they ever seen in their life. also, don't bother answering me, i won't see it anyway. Fucking schyzo ;) Have you took your little pills today ?

Well this is beautiful marketing for the language. At some point, the leadership will need to put away ideology and get realistic about what belongs on this site.

I agree with this sentiment. One of D's strengths is the helpful responses on the Learn forum. It is something the D community can be proud of. Participants in such personal attacks may view it primarily as a 1-1 interchange, but attacks like these do take away from this strength. Better would be to move personal conflicts to some other venue.
Re: Is there a way to clear an OutBuffer?
On Wednesday, 25 May 2016 at 19:42:43 UTC, Gary Willoughby wrote: On Monday, 23 May 2016 at 03:03:12 UTC, Jon Degenhardt wrote: Currently not possible. Enhancement request perhaps? Looking at the implementation, setting its 'offset' member seems to work. Based on the example from the documentation:

    import std.outbuffer;

    void main()
    {
        OutBuffer b = new OutBuffer();
        b.writefln("a%sb", 16);
        assert(b.toString() == "a16b\n");
        b.offset = 0;
        b.writefln("a%sb", 16);
        assert(b.toString() == "a16b\n");
    }

Bug report perhaps? :) Ali

Thanks. Enhancement request: https://issues.dlang.org/show_bug.cgi?id=16062

Is there a consensus on this? Does this really need a clear method, seeing as you can reset the offset directly?

As an end-user, I'd have more confidence using a documented mechanism. If it's setting a public member variable, fine; if it's a method, also fine. The 'offset' member is not part of the publicly documented API. Looking at the implementation, it doesn't appear 'offset' is intended to be part of the API. Personally, I'd add a method to keep 'offset' out of the public API. However, simply documenting it is an option as well.
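A sketch of what hiding the offset reset behind a method could look like. The subclass and the 'reset' name here are mine, purely illustrative; it relies only on 'offset' being a public field, as the example above does:

```d
import std.outbuffer;

// Illustrative sketch: wrap the offset reset in a named method.
class ReusableOutBuffer : OutBuffer
{
    void reset() { offset = 0; }  // keep the allocated buffer, discard contents
}

void main()
{
    auto b = new ReusableOutBuffer();
    b.write("hello");
    assert(b.toString() == "hello");

    b.reset();                    // reuse the same allocation
    b.write("world");
    assert(b.toString() == "world");
}
```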
Re: Is there a way to clear an OutBuffer?
On Sunday, 22 May 2016 at 23:01:07 UTC, Ali Çehreli wrote: On 05/22/2016 11:59 AM, Jon Degenhardt wrote: Is there a way to clear an OutBuffer, but without freeing the internally managed buffer? Something similar to the std.array.appender clear method. The intent is to reuse the OutBuffer, but without reallocating memory for the buffer. --Jon

Currently not possible. Enhancement request perhaps?

Looking at the implementation, setting its 'offset' member seems to work. Based on the example from the documentation:

    import std.outbuffer;

    void main()
    {
        OutBuffer b = new OutBuffer();
        b.writefln("a%sb", 16);
        assert(b.toString() == "a16b\n");
        b.offset = 0;
        b.writefln("a%sb", 16);
        assert(b.toString() == "a16b\n");
    }

Bug report perhaps? :)

Ali

Thanks. Enhancement request: https://issues.dlang.org/show_bug.cgi?id=16062
Is there a way to clear an OutBuffer?
Is there a way to clear an OutBuffer, but without freeing the internally managed buffer? Something similar to the std.array.appender clear method. The intent is to reuse the OutBuffer, but without reallocating memory for the buffer. --Jon