Re: Efficiently streaming data to associative array
On Wednesday, 9 August 2017 at 10:00:14 UTC, kerdemdemir wrote:
As a total beginner, I am feeling a bit uncomfortable with basic operations on AAs. First, even though I am very happy we have pointers, using pointers in a common operation like this IMHO makes the language a bit less safe. Second, the "in" keyword always seemed so specific to me. I think I will use your solution "ref Value GetWithDefault(Value)" very often since it hides the two things above.

You don't need this most of the time. If you already have the correct type, it's easy:

    size_t[string][string] indexed_map;
    string a, b; // a and b are strings, not char[]
    size_t value;
    indexed_map[a][b] = value; // this will create the AA slots if needed

In my specific case the data is streamed from stdin and is not kept in memory. byLine returns a view of the stdin buffer, which may be replaced at the next loop iteration, so I can't use the index operator directly: I need a string that does not change over time.

I could have used this code:

    import std.format;
    import std.stdio;

    void main() {
        size_t[string][string] indexed_map;
        foreach (char[] line; stdin.byLine) {
            char[] a;
            char[] b;
            size_t value;
            line.formattedRead!"%s,%s,%d"(a, b, value);
            indexed_map[a.idup][b.idup] = value; // copies both keys on every line
        }
        indexed_map.writeln;
    }

It's perfectly OK if the data is small. In my case the data is huge, and creating a copy of the strings at each iteration is costly.
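A minimal sketch of the easy case, where the keys are already immutable `string`s: assigning through `map[a][b]` creates the inner associative array on first use, so no pointers are needed. The `Row` tuple and the `build` function are hypothetical names used only for illustration.

```d
import std.typecons : Tuple;

// Hypothetical record type: two string keys and a value.
alias Row = Tuple!(string, "a", string, "b", size_t, "value");

// Builds a two-level map from rows whose keys are already immutable strings.
size_t[string][string] build(Row[] rows)
{
    size_t[string][string] map;
    foreach (row; rows)
        map[row.a][row.b] = row.value; // creates map[row.a] if it is missing
    return map;
}
```

This only works without copying because the key strings never change afterwards; with `byLine` the buffer is reused, which is why the streaming case needs an `idup` somewhere.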
Re: Efficiently streaming data to associative array
On Tuesday, 8 August 2017 at 16:00:17 UTC, Steven Schveighoffer wrote:
On 8/8/17 11:28 AM, Guillaume Chatelet wrote:
Let's say I'm processing MB of data: I'm lazily iterating over the incoming lines and storing data in an associative array. I don't want to copy unless I have to.

Contrived example follows:

input file
--
    a,b,15
    c,d,12

Efficient ingestion
---
    void main() {
        size_t[string][string] indexed_map;

        foreach (char[] line; stdin.byLine) {
            char[] a;
            char[] b;
            size_t value;
            line.formattedRead!"%s,%s,%d"(a, b, value);

            auto pA = a in indexed_map;
            if (pA is null) {
                pA = &(indexed_map[a.idup] = (size_t[string]).init);
            }

            auto pB = b in (*pA);
            if (pB is null) {
                pB = &((*pA)[b.idup] = size_t.init);
            }

            // Technically unneeded but let's say we have more than 2 dimensions.
            (*pB) = value;
        }

        indexed_map.writeln;
    }

I qualify this code as ugly but fast. Any idea on how to make this less ugly? Is there something in Phobos to help?

I wouldn't use formattedRead, as I think this is going to allocate temporaries for a and b.

Note, this is very close to Jon Degenhardt's blog post in May:

https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

-Steve

I haven't dug into formattedRead yet, but thanks for letting me know :) I was mostly speaking about the pattern with the AA. I guess the best I can do is a templated function to hide the ugliness:

    import std.format;
    import std.stdio;

    ref Value GetWithDefault(Value)(ref Value[string] map, const(char[]) key) {
        auto pValue = key in map;
        if (pValue) return *pValue;
        return map[key.idup] = Value.init; // copy the key only when inserting
    }

    void main() {
        size_t[string][string] indexed_map;
        foreach (char[] line; stdin.byLine) {
            char[] a;
            char[] b;
            size_t value;
            line.formattedRead!"%s,%s,%d"(a, b, value);
            indexed_map.GetWithDefault(a).GetWithDefault(b) = value;
        }
        indexed_map.writeln;
    }

Not too bad actually!
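Following Steven's remark about formattedRead, here is a sketch that splits each line with `std.algorithm.searching.findSplit`, which returns slices of the line rather than new arrays, and combines it with a copy-on-miss helper in the spirit of `GetWithDefault`. It assumes well-formed `key,key,value` lines and silently skips lines without two commas; `getOrInsert`, `parseLine` and `ingestLines` are hypothetical names, and the speed difference against the formattedRead version would need to be profiled.

```d
import std.algorithm.searching : findSplit;
import std.conv : to;

// Returns a reference to map[key]. The key is duplicated with idup only
// when it is inserted for the first time; lookups use the borrowed slice.
ref Value getOrInsert(Value)(ref Value[string] map, const(char)[] key)
{
    if (auto p = key in map)
        return *p;
    string owned = key.idup;
    map[owned] = Value.init;
    return map[owned];
}

// Splits "a,b,value" into slices of `line` without allocating.
// Returns false if the line does not contain two commas.
bool parseLine(const(char)[] line, out const(char)[] a,
               out const(char)[] b, out size_t value)
{
    auto first = line.findSplit(",");
    if (!first)
        return false;
    auto second = first[2].findSplit(",");
    if (!second)
        return false;
    a = first[0];
    b = second[0];
    value = second[2].to!size_t; // throws ConvException on a bad number
    return true;
}

// Fills a two-level map from any range of lines (e.g. stdin.byLine).
size_t[string][string] ingestLines(R)(R lines)
{
    size_t[string][string] map;
    foreach (line; lines)
    {
        const(char)[] a, b;
        size_t value;
        if (!parseLine(line, a, b, value))
            continue;
        map.getOrInsert(a).getOrInsert(b) = value;
    }
    return map;
}
```

With real input this would be called as `ingestLines(stdin.byLine)`; only the first occurrence of each key pays for a copy.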
Efficiently streaming data to associative array
Let's say I'm processing MB of data: I'm lazily iterating over the incoming lines and storing data in an associative array. I don't want to copy unless I have to.

Contrived example follows:

input file
--
    a,b,15
    c,d,12
    ...

Efficient ingestion
---
    import std.format;
    import std.stdio;

    void main() {
        size_t[string][string] indexed_map;

        foreach (char[] line; stdin.byLine) {
            char[] a;
            char[] b;
            size_t value;
            line.formattedRead!"%s,%s,%d"(a, b, value);

            auto pA = a in indexed_map;
            if (pA is null) {
                pA = &(indexed_map[a.idup] = (size_t[string]).init);
            }

            auto pB = b in (*pA);
            if (pB is null) {
                pB = &((*pA)[b.idup] = size_t.init);
            }

            // Technically unneeded but let's say we have more than 2 dimensions.
            (*pB) = value;
        }

        indexed_map.writeln;
    }

I qualify this code as ugly but fast. Any idea on how to make this less ugly? Is there something in Phobos to help?
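Why a copy is needed at all, and why it is only needed once per distinct key, can be seen in a small hypothetical counter (`countWithReusedBuffer` is not from the thread). It pushes every input through one reused `char[]` buffer, roughly as `byLine` may do; this is a simplification for illustration. Lookups use the mutable slice directly, and `idup` runs only when a key is inserted.

```d
// Counts occurrences of each input string while feeding them through a
// single reused char[] buffer. `copies` reports how many idup calls were made.
size_t[string] countWithReusedBuffer(const string[] inputs, out size_t copies)
{
    size_t[string] counts;
    char[] buffer;
    foreach (s; inputs)
    {
        buffer.length = s.length;
        buffer[] = s[];              // overwrite the shared buffer in place
        if (auto p = buffer in counts)
            ++*p;                    // lookup with the mutable slice: no copy
        else
        {
            counts[buffer.idup] = 1; // the stored key must outlive the buffer
            ++copies;
        }
    }
    return counts;
}
```

The number of copies equals the number of distinct keys, not the number of lines, which is the property the pointer-based code above is after.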
Re: Floating point rounding
On Thursday, 2 March 2017 at 21:34:56 UTC, ag0aep6g wrote:
On 03/02/2017 10:10 PM, Guillaume Chatelet wrote:
On Thursday, 2 March 2017 at 20:30:47 UTC, Guillaume Chatelet wrote:
Here is the same code in D:

    void main(string[] args)
    {
        import std.math;
        import std.stdio;
        FloatingPointControl fpctrl;
        fpctrl.rounding = FloatingPointControl.roundUp;
        writefln("%.32g", float.min_normal + 1.0f);
    }

Execution on my machine yields:

    dmd -run test_denormal.d
    1

Did I miss something?

This example is closer to the C++ one:

    void main(string[] args)
    {
        import core.stdc.fenv;
        import std.stdio;
        fesetround(FE_UPWARD);
        writefln("%.32g", float.min_normal + 1.0f);
    }

It still yields "1".

This prints the same as the C++ version:

    void main(string[] args)
    {
        import std.stdio;
        import core.stdc.fenv;
        fesetround(FE_UPWARD);
        float x = 1.0f;
        x += float.min_normal;
        writefln("%.32g", x);
    }

So, a bug/limitation of constant folding?

With FloatingPointControl it still prints "1". Does FloatingPointControl.rounding do something different from fesetround? The example in the docs [1] only shows how it changes rint's behavior.

[1] http://dlang.org/phobos/std_math.html#.FloatingPointControl

Thanks for the investigation! Here is the code for FloatingPointControl:
https://github.com/dlang/phobos/blob/master/std/math.d#L4809

The other functions (enableExceptions / disableExceptions) seem to have two code paths depending on "version(X86_Any)"; rounding doesn't. Maybe that's the bug?
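Following ag0aep6g's observation, a sketch that keeps the addition out of compile-time constant folding by passing the operands as function parameters, and restores the previous rounding mode afterwards. It uses the C functions from `core.stdc.fenv` rather than `FloatingPointControl`. `addRoundingUp` is a hypothetical name, and the behaviour assumes the compiler does not inline and fold the call (for example, a build without `-O -inline`) and that floats are computed with SSE, as on x86-64.

```d
import core.stdc.fenv : FE_TONEAREST, FE_UPWARD, fegetround, fesetround;

// Adds x and y with the rounding mode set to "round up", then restores the
// previous mode. Because x and y are runtime parameters, the sum is not a
// compile-time constant expression.
float addRoundingUp(float x, float y)
{
    immutable previous = fegetround();
    fesetround(FE_UPWARD);
    scope (exit) fesetround(previous);
    float result = x + y; // evaluated at run time under FE_UPWARD
    return result;
}
```

With `addRoundingUp(1.0f, float.min_normal)` the result is expected to be the next float above 1.0f, i.e. `1.0f + float.epsilon`.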
Re: Floating point rounding
On Thursday, 2 March 2017 at 20:30:47 UTC, Guillaume Chatelet wrote:
Here is the same code in D:

    void main(string[] args)
    {
        import std.math;
        import std.stdio;
        FloatingPointControl fpctrl;
        fpctrl.rounding = FloatingPointControl.roundUp;
        writefln("%.32g", float.min_normal + 1.0f);
    }

Execution on my machine yields:

    dmd -run test_denormal.d
    1

Did I miss something?

This example is closer to the C++ one:

    void main(string[] args)
    {
        import core.stdc.fenv;
        import std.stdio;
        fesetround(FE_UPWARD);
        writefln("%.32g", float.min_normal + 1.0f);
    }

It still yields "1".
Floating point rounding
I would expect (1.0f + smallest float subnormal) > 1.0f when the floating-point unit is set to round up.

Here is some C++ code:

    #include <cfenv>
    #include <cstdio>
    #include <limits>

    int main(int, char**) {
        std::fesetround(FE_UPWARD);
        printf("%.32g\n", std::numeric_limits<float>::denorm_min() + 1.0f);
        return 0;
    }

Execution on my machine yields:

    clang++ --std=c++11 test_denormal.cc && ./a.out
    1.0011920928955078125

Here is the same code in D:

    void main(string[] args)
    {
        import std.math;
        import std.stdio;
        FloatingPointControl fpctrl;
        fpctrl.rounding = FloatingPointControl.roundUp;
        writefln("%.32g", float.min_normal + 1.0f);
    }

Execution on my machine yields:

    dmd -run test_denormal.d
    1

Did I miss something?
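One detail worth noting when comparing the two programs: the C++ code adds `denorm_min()`, the smallest subnormal (2^-149), while the D code adds `float.min_normal`, the smallest normal (2^-126). Both are far below the spacing of floats near 1.0 (2^-23), so either should expose the rounding mode, but a closer D counterpart can be derived from the built-in properties. A sketch, where `floatDenormMin` is a hypothetical name:

```d
import std.math : isSubnormal, nextUp;

// D counterpart of std::numeric_limits<float>::denorm_min():
// 2^-126 (float.min_normal) * 2^-23 (float.epsilon) = 2^-149.
enum float floatDenormMin = float.min_normal * float.epsilon;

// Compile-time sanity checks on the constant itself.
static assert(floatDenormMin > 0.0f);
static assert(floatDenormMin < float.min_normal);
```

At run time, `nextUp(0.0f)` should give the same value, and `isSubnormal` should report it as subnormal.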
Re: Bug in csv or byLine ?
On Sunday, 10 January 2016 at 19:50:15 UTC, Tobi G. wrote:
On Sunday, 10 January 2016 at 19:07:52 UTC, Jesse Phillips wrote:
On Sunday, 10 January 2016 at 18:09:23 UTC, Tobi G. wrote:
The bug has been fixed...

Do you have a link for the fix? Is there a BugZilla entry?

Yes, sure:
https://issues.dlang.org/show_bug.cgi?id=15545
and the fix on GitHub:
https://github.com/D-Programming-Language/phobos/pull/3917

togrue

Thanks for the fix!
Bug in csv or byLine ?
    $ cat debug.csv
    timestamp,curr_property
    2015-12-01 06:07:55,7035

    $ cat process.d
    import std.stdio;
    import std.csv;
    import std.algorithm;
    import std.file;

    void main(string[] args) {
        version (Fail) {
            File(args[1], "r").byLine.joiner("\n").csvReader.each!writeln;
        } else {
            readText(args[1]).csvReader.each!writeln;
        }
    }

    $ dmd -run ./process.d debug.csv
    ["timestamp", "curr_property"]
    ["2015-12-01 06:07:55", "7035"]

    $ dmd -version=Fail -run ./process.d debug.csv
    ["timestamp", "curr_property"]
    ["2015-12-01 06:07:55", "7035"]
    core.exception.AssertError@std/algorithm/iteration.d(2027): Assertion failure
    ??:? _d_assert [0x4633d3]
    ??:? void std.algorithm.iteration.__assert(int) [0x46d770]
    ??:? pure @property @safe dchar std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result.front() [0x44eaf0]
    ??:? void std.csv.CsvReader!(immutable(char)[], 1, std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result, dchar, immutable(char)[][]).CsvReader.popFront() [0x44f7fc]
    ??:? void std.algorithm.iteration.__T4eachS183std5stdio7writelnZ.each!(std.csv.CsvReader!(immutable(char)[], 1, std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result, dchar, immutable(char)[][]).CsvReader).each(std.csv.CsvReader!(immutable(char)[], 1, std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result, dchar, immutable(char)[][]).CsvReader) [0x4608f7]
    ??:? _Dmain [0x44bc93]

Any idea?
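When the input may contain blank lines, one option for the streaming variant is to drop them before joining, so that an empty line never reaches `csvReader`. This is only a sketch: it assumes blank lines carry no data (that would not hold for a quoted field spanning a blank line), it reads each record into a two-field tuple to match the sample file, and `Row` and `readCsvSkippingBlankLines` are hypothetical names.

```d
import std.algorithm.iteration : filter, joiner;
import std.array : array;
import std.csv : csvReader;
import std.stdio : File;
import std.typecons : Tuple;

// Hypothetical record type for the two-column sample file.
alias Row = Tuple!(string, string);

// Streams the file line by line, drops empty lines (such as a trailing
// blank line), rejoins the remaining lines with "\n" and parses the result.
// No header handling: the header line comes back as the first Row.
Row[] readCsvSkippingBlankLines(File file)
{
    return file.byLine
        .filter!(line => line.length > 0)
        .joiner("\n")
        .csvReader!Row
        .array;
}
```

Called as `readCsvSkippingBlankLines(File(args[1], "r"))`, it would replace the `Fail` branch above.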
Re: Bug in csv or byLine ?
On Friday, 8 January 2016 at 13:22:40 UTC, Tobi G. wrote:
On Friday, 8 January 2016 at 12:13:59 UTC, Guillaume Chatelet wrote:
On Friday, 8 January 2016 at 12:07:05 UTC, Tobi G. wrote:
No, sorry. Under Windows DMD v2.069.2 it works perfectly in both cases. Which compiler do you use?

- DMD64 D Compiler v2.069.2 on Linux.
- LDC 0.16.1 (DMD v2.067.1, LLVM 3.7.0)

I ran it now under Linux/Ubuntu with DMD64 D Compiler v2.069.2, but both still worked. Are there some characters in your input data which are invalid and not displayed in the forum? (Multiple empty lines after the actual CSV data, for example.)

togrue

Indeed, there's an empty line at the end of the CSV. Interestingly enough, if I try with DMD64 D Compiler v2.069, the Fail version runs fine but the normal version returns:

    std.csv.CSVException@/usr/include/dlang/dmd/std/csv.d(1246): Row 3's length 1 does not match previous length of 2.
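For the in-memory (`readText`) variant, a sketch that strips trailing whitespace before parsing, so a blank last line cannot produce the short "Row 3" reported above, and that skips the header line before reading typed records. It assumes the header is a single line and that trailing spaces in the last field are not significant; `Record`, its field names and `parseRecords` are hypothetical.

```d
import std.algorithm.searching : findSplitAfter;
import std.array : array;
import std.csv : csvReader;
import std.string : stripRight;
import std.typecons : Tuple;

// Hypothetical typed layout for the sample file; the field names are our own.
alias Record = Tuple!(string, "timestamp", int, "currProperty");

// Parses CSV text held in memory: trailing whitespace (including blank lines
// at the end) is stripped, then everything up to the first newline (the
// header) is skipped before the remaining rows are converted to Records.
Record[] parseRecords(string input)
{
    auto rows = input.stripRight.findSplitAfter("\n")[1];
    return rows.csvReader!Record.array;
}
```

Usage would be `parseRecords(readText(args[1]))`.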
Re: Bug in csv or byLine ?
On Friday, 8 January 2016 at 12:07:05 UTC, Tobi G. wrote:
No, sorry. Under Windows DMD v2.069.2 it works perfectly in both cases. Which compiler do you use?

- DMD64 D Compiler v2.069.2 on Linux.
- LDC 0.16.1 (DMD v2.067.1, LLVM 3.7.0)

So if it works on Windows, I guess it's a problem with the File implementation. You could run DMD with the -g option. This will often print more useful output if it fails.

-g didn't bring much:

    core.exception.AssertError@std/algorithm/iteration.d(2027): Assertion failure
    ??:? _d_assert [0x4a9c33]
    ??:? void std.algorithm.iteration.__assert(int) [0x4b8048]
    /usr/include/dmd/phobos/std/algorithm/iteration.d:2027 pure @property @safe dchar std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result.front() [0x495330]
    /usr/include/dmd/phobos/std/csv.d:1018 void std.csv.CsvReader!(immutable(char)[], 1, std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result, dchar, immutable(char)[][]).CsvReader.popFront() [0x49608c]
    /usr/include/dmd/phobos/std/algorithm/iteration.d:881 void std.algorithm.iteration.__T4eachS183std5stdio7writelnZ.each!(std.csv.CsvReader!(immutable(char)[], 1, std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result, dchar, immutable(char)[][]).CsvReader).each(std.csv.CsvReader!(immutable(char)[], 1, std.algorithm.iteration.joiner!(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).joiner(std.stdio.File.ByLine!(char, char).ByLine, immutable(char)[]).Result, dchar, immutable(char)[][]).CsvReader) [0x4a5063]
    ./process.d:8 _Dmain [0x49226c]
Re: Idiomatic adjacent_difference
On Friday, 16 October 2015 at 11:38:35 UTC, John Colvin wrote:

    import std.range, std.algorithm;

    auto slidingWindow(R)(R r, size_t n)
    if (isForwardRange!R)
    {
        // you could definitely do this with less overhead
        // (interleaving two chunkings only yields every window when n == 2)
        return roundRobin(r.chunks(n), r.save.drop(1).chunks(n))
            .filter!(p => p.length == n);
    }

    auto adjacentDiff(R)(R r)
    {
        return r.slidingWindow(2).map!"a[1] - a[0]";
    }

Nice! I wanted to use lockstep(r, r.dropOne), but it doesn't return a range :-/ It has to be used in a foreach.
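Later versions of Phobos (newer than this thread) added `std.range.slide`, which yields overlapping windows directly. A sketch of the same difference on top of it, assuming the input is an array or another sliceable random-access range so that each window can be indexed; `adjacentDiffSlide` is a hypothetical name.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : map;
import std.range : slide;
import std.typecons : No;

// Differences of neighbouring elements via overlapping windows of size 2.
// No.withPartial drops a final window shorter than 2 elements, so an input
// with fewer than two elements yields an empty range.
auto adjacentDiffSlide(R)(R r)
{
    return r.slide!(No.withPartial)(2).map!(w => w[1] - w[0]);
}

unittest
{
    assert(adjacentDiffSlide([0, 1, 2, 3]).equal([1, 1, 1]));
}
```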
Idiomatic adjacent_difference
Is there an idiomatic way to do:

    int[] numbers = [0, 1, 2, 3];
    assert(adjacent_diff(numbers) == [1, 1, 1]);

I can't find anything useful in the standard library.
Re: Idiomatic adjacent_difference
On Friday, 16 October 2015 at 12:03:56 UTC, Per Nordlöw wrote:
On Friday, 16 October 2015 at 11:48:19 UTC, Edwin van Leeuwen wrote:

    zip(r, r[1..$]).map!((t) => t[1] - t[0]);

And for InputRanges (not requiring random access):

    zip(r, r.dropOne).map!((t) => t[1] - t[0]);

That's neat. Thanks, guys :)
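The `zip`/`dropOne` version can be wrapped into a reusable function. Strictly speaking it needs a forward range, since the input is traversed twice, so this sketch (the name `adjacentDiff` comes from the question; the constraint and the `save` call are our own additions) takes `r.save` before dropping the first element. That keeps the two zipped ranges independent even for ranges with reference semantics.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : map;
import std.range : dropOne, iota, isForwardRange, save, zip;

// Pairs each element with its successor and subtracts them. zip stops at
// the shorter range, so the result has one element fewer than the input.
auto adjacentDiff(R)(R r)
if (isForwardRange!R)
{
    return zip(r, r.save.dropOne).map!(t => t[1] - t[0]);
}

unittest
{
    assert(adjacentDiff([0, 1, 2, 3]).equal([1, 1, 1]));
    assert(adjacentDiff(iota(0, 10, 3)).equal([3, 3, 3])); // 0, 3, 6, 9
}
```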
Re: Nested C++ namespace library linking
On Wednesday, 21 January 2015 at 14:59:15 UTC, John Colvin wrote:
Looks like a bug to me, for sure. In the meantime you may be able to use some pragma(mangle, ...) hacks to force the compiler to emit the right symbols.

Thanks John,

    extern(C++, A.B) {
        struct Type {}
        pragma(mangle, "_ZN1A1B3fooENS0_4TypeE")
        int foo(Type unused);
    }

is indeed linking correctly :)
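To check which symbol the D side will reference without going through the linker, a declaration's `.mangleof` property can be inspected at compile time. A sketch assuming the Itanium C++ ABI used by g++ on Linux, where `A::B::foo(A::B::Type)` mangles to `_ZN1A1B3fooENS0_4TypeE`:

```d
extern (C++, A.B)
{
    struct Type {}

    // Force the Itanium ABI symbol for A::B::foo(A::B::Type).
    pragma(mangle, "_ZN1A1B3fooENS0_4TypeE")
    int foo(Type unused);
}

// Evaluated at compile time: the build fails if the symbol name differs.
// foo is never called here, so no C++ object file is needed to compile this.
static assert(foo.mangleof == "_ZN1A1B3fooENS0_4TypeE");
```

The same `static assert` without the `pragma(mangle, ...)` line is a quick way to see what the compiler emits on its own.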
Nested C++ namespace library linking
Consider the following foo.cpp:

    namespace A {
    namespace B {
        struct Type {};
        int foo(Type unused) { return 42; }
    }
    }

Compile it:

    g++ foo.cpp -c -o foo.o

Then the following main.d:

    extern(C++, A.B) {
        struct Type {}
        int foo(Type unused);
    }

    void main() {
        foo(Type());
    }

Compile it:

    dmd main.d foo.o

It fails with:

    undefined reference to « A::B::foo(A::Type) »

It looks like Type is not resolved in the right namespace: A::Type instead of A::B::Type. Did I miss something, or is this a bug? I also tried fully qualifying foo and Type, but I end up with the exact same error:

    A.B.foo(A.B.Type());
Re: Nested C++ namespace library linking
That's what I thought. I reported this bug a while ago but it didn't get a lot of attention. https://issues.dlang.org/show_bug.cgi?id=13337
Re: Should formattedWrite take the outputrange by ref?
+1 I've been bitten by this also.