On Tue, May 15, 2012 at 04:42:39PM +0200, dcoder wrote:
> On Monday, 14 May 2012 at 09:00:14 UTC, simendsjo wrote:
[...]
> >I believe byLine reuses the internal buffer. Try duping the lines:
> >
> >    auto i = f.byLine().map!"a.idup"().array();
>
> Can someone please explain to me the last line?
>
> I'm trying to learn D, by playing with code and reading this forum.
> I'm a slow learner. :)
>
> Anyways, I looked at std.stdio code and noticed that byLine returns
> a struct ByLine, but where does the .map come from? Thanks!

map is one of the generic algorithms from std.algorithm. It "maps" the
expression "a.idup" to each element returned by byLine().

I'll try to explain slowly.

First, f.byLine() returns a range of lines, that is, a struct that has
the members .front, .popFront(), and .empty. Anything that implements
these three methods can be treated as a sort of generic "array", which
we call a "range". By not requiring a specific type for map (and other
algorithms in std.algorithm), we allow any kind of concrete type to be
used without needing any explicit conversions. In a nutshell, the
struct that f.byLine() returns is, abstractly speaking, a range of
lines that you can iterate over.

Second, .map!"a.idup"() is making use of Unified Function Call Syntax
(UFCS), which is a neat feature of D that if you use member invocation
syntax obj.memb(x,y,z), but memb isn't a member of obj, then the
compiler quietly rewrites the call to be memb(obj,x,y,z). So, the line
up to the .map call is actually treated by the compiler as:

    map!"a.idup"(f.byLine())

that is, it takes the expression "a.idup" and applies it to each of the
lines in f.byLine(), substituting "a" with each respective line. This,
in effect, creates another range of lines, which is the range resulting
from calling .idup on each line returned by f.byLine(). In other words,
this makes a copy of each line returned by f.byLine().

Finally, the .array() call at the end turns the range returned by map()
back into an array. This is needed because, just as map() takes a range
as input (remember, a range is anything with the members .front,
.popFront(), and .empty), it also returns a range. What it returns is
an internal object that implements the range methods .front,
.popFront(), and .empty, and which iterates over each element of the
result. This internal object is, in general, not the same as an actual
array, so to get an array out of it, we need to explicitly make an
array from it using .array().

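To make the range and UFCS parts concrete, here's a rough sketch. UpTo is a made-up struct of mine (it is not in Phobos, and it is not how ByLine is implemented); it only exists to show that anything with .empty, .front, and .popFront() can be fed to map, and that the member-style and function-style spellings come out the same:

```d
import std.algorithm : map;
import std.array : array;
import std.range : isInputRange;

// Hypothetical example type, not part of Phobos: yields the integers
// current, current+1, ..., limit-1.
struct UpTo
{
    int current;
    int limit;

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
}

// Having the three members is all it takes to count as a range.
static assert(isInputRange!UpTo);

void main()
{
    // Member-style syntax: UpTo has no member called map or array,
    // so the compiler rewrites these as free-function calls (UFCS).
    auto viaUfcs = UpTo(0, 4).map!"a * 2"().array();

    // The same thing spelled in traditional function-call syntax.
    auto viaCalls = array(map!"a * 2"(UpTo(0, 4)));

    assert(viaUfcs == [0, 2, 4, 6]);
    assert(viaCalls == viaUfcs);
}
```

Note that map never needs to know what UpTo is; it only ever touches the three range members.
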
Here, again, UFCS is exploited: std.array actually defines a function
called array(R), which takes a single parameter, a range, to be turned
into an array. Since the range returned by map() doesn't have a member
called "array", when you write f.byLine().map!"a.idup"().array(), it
gets translated into:

    array(map!"a.idup"(f.byLine()))

which is how you'd write this in traditional function-call syntax.
UFCS, however, lets you write things in the order they happen, which
some find more readable:

    f.byLine().map!"a.idup"().array()

means "take f, get a range of its lines, map the expression "a.idup" to
the lines, then make an array out of that".

The key to this line is the map!"a.idup", which makes a copy of each
line returned by f.byLine(). This is necessary because byLine() doesn't
hand you a fresh array for each line; it _reuses_ an internal buffer to
store each line it reads from f. So if you don't duplicate each line,
then by the time the next line is read, the previous line has been
overwritten, and you'll get garbage data. Using map!"a.idup"
essentially means "call .idup on every line returned", thereby avoiding
this problem.

Hope this helps.


T

-- 
This is not a sentence.
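
For anyone who wants to try the buffer issue for themselves, here is a small sketch. The function names and the scratch file "byline_demo.txt" are my own inventions for illustration. I deliberately make no claim about what the non-copying version contains, since that depends on how byLine happens to manage its buffer:

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, write;
import std.stdio : File;

// Copies every line out of byLine's buffer before the next read.
string[] readLinesCopied(string path)
{
    auto f = File(path, "r");
    return f.byLine().map!"a.idup"().array();
}

// Keeps slices of byLine's buffer without copying. This compiles, but
// the contents are unreliable because the buffer gets overwritten.
char[][] readLinesAliased(string path)
{
    auto f = File(path, "r");
    return f.byLine().array();
}

void main()
{
    enum path = "byline_demo.txt"; // hypothetical scratch file
    write(path, "alpha\nbeta\ngamma\n");
    scope (exit) remove(path);

    // The copied version is safe to keep around.
    assert(readLinesCopied(path) == ["alpha", "beta", "gamma"]);

    // Three entries either way; only the contents are in doubt here.
    assert(readLinesAliased(path).length == 3);
}
```

If you print the result of readLinesAliased you will probably see repeated or mangled lines, which is the symptom the .idup is there to prevent.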