Sry, overlooked this post.

Andrei Alexandrescu wrote:
> On 5/8/11 5:57 PM, Timon Gehr wrote:
>> Andrei Alexandrescu wrote:
>>> On 5/8/11 3:04 PM, Timon Gehr wrote:
>>>> However I agree that Phobos has to provide some better input handling,
>>>> since using possibly unsafe C functions is the best way to do it by now.
>>>> (I think readf is severely crippled) I may try to implement a
>>>> meaningful "read" function.
>>>
>>> Looking forward to detailed feedback about readf. It was implemented in
>>> a hurry so definitely it has a long way to go.
>>>
>>> Andrei
>>
>> What I consider the most important points about readf:
>
> Thanks very much for providing detailed feedback.
>
>> 1. Whitespace handling is different than scanf. It is much stricter and
>> even feels inconsistent, Eg:
>>
>> int a,b;
>>
>> readf("%s %s",&a,&b);//input "1 2\n" read.
>> readf("%s %s",&a,&b);//input "1 2\n" read (and a==1&& b==2).
>
> So far so good. By design one space in readf means "skip all whitespace".
>
>> readf("%s",&a);//input "1\n" read. yay.
>> readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.
>
> I'm not seeing skipping in my tests; I do see an exception being thrown.
> Here's how I test:
>
> import std.stdio;
> void main()
> {
>     int a, b;
>     readf("%s",&a);
>     assert(a == 1);
>     readf("%s",&b);
>     assert(b == 2);
> }
>
> dmd ./test && echo '1\n 2' | ./test

I tested by typing the input manually in the terminal. The exception is
thrown only when I provide an EOF. It seems the input is not being skipped
after all, but readf does not return until there is an EOF.

> I'm not hooked on this strict whitespace handling, but I think it makes
> a lot of sense particularly when you want to make sure the input looks
> exactly as you think it should. With scanf you can't have precise
> parsing even if you wanted; with readf all you need is to insert a space.
>
> Precision is important. For example, Hive uses a \t for field separation
> when streaming to a file. It is very important to figure that you have
> one tab there versus two (two means a NULL field was in between).

It should be possible to do that with scanf using %[] if I'm not mistaken.

>> readf("%s ",&a);//input "1 \n" read.
>> readf("%s ",&a);//input "1\n" skipped, presumably because the trailing
>> space (!) is missing.
>
> On my machine this passes:
>
> import std.stdio;
> void main()
> {
>     int a, b;
>     readf("%s ",&a);
>     assert(a == 1);
>     readf("%s ",&b);
>     assert(b == 2);
> }
>
> dmd ./test && echo '1\n 2' | ./test
>
> The explanation is that, again, a space means "skip all whitespace". So
> the first space eats the "\n " and the second space eats the final "\n"
> in the input (produced by echo). Please adjust this example so it unduly
> fails.

Again, a misinterpretation on my side. When typing into the terminal, readf
keeps waiting for new input until a non-whitespace character is entered.
That should be fine, but it can be surprising.

>
>> readf(" %s",&a);//input "1\n" read.
>> readf("\t%s",&a);//input "1\n": exception is thrown.
>
> A "\t" in the formatting string for readf simply requires a tab. To skip
> over any number of tabs, do this:
>
> readf("%*1[\t]%s",&a);
>
> That instructs readf to read, but not store, a string consisting of at
> most one tab. (To skip multiple tabs drop the "1".) This functionality
> is not yet implemented.

I did not know it would ever be! That removes many of my concerns.
(and the 'read' function removes the rest)

>> readf("%s\n",&a);//input "1\n" read.
>> readf("%s\n",&a);//input "1 \n": exception is thrown.
>
> That is as expected - if you specify \n readf expects a \n.
>
>> readf("%s\t\n",&a);//input "1\t\n" read.
>> readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after
>> any further input.
>
> My testbed:
>
> import std.stdio;
>
> void main()
> {
>     int a, b;
>     readf("%s\t\n",&a);
>     assert(a == 1);
>     readf("%s \n",&b);
>     assert(b == 2);
> }
>
> dmd ./test && echo "1\t\n2 " | ./test
>
> It fails because it can't find the last \n. That's a bug.

At least I found one. =)

>> And some more, I do not remember all of them. Exceptions are most of the
>> time only as useful as "Enforcement failed".
>>
>>
>> You (almost?) never want this behavior, even at the points it marginally
>> makes sense. It would be nice to have an optional whitespace-enforcing
>> version that _really_ enforces it (as opposed to the current
>> implementation), but that should not be the default.
>> And then it should be consistent (also on skipping or exception throwing).

> Except for one bug and one lacking implementation artifact, I find the
> current behavior consistent with a strict approach to whitespace handling.

Agreed. Thanks for your explanations!

>> 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all
>> its '>>' more.
>> scanf has that problem too, but it is a C function, you _cannot_ expect
>> it to do any better than that.
>> D has variadic template functions that may take ref parameters. It can
>> be done entirely pointer-free.
>
> When I implemented readf, ref variadic arguments weren't working. I'd be
> hesitant to change it right now as it does not improve actual
> functionality and disrupts current uses. But I agree ideally it should
> accept parameters by reference.

We can have both, since it will never be possible to read in raw pointers:

import std.stdio;
import std.conv;

private bool containsPointersImpl(T...)(){
    //nesting this inside containsPointer template removes eponymous template trick. Is this a bug?
    foreach(t;T) static if(is(t U:U*)) return true;
    return false;
}
template containsPointers(T...){enum containsPointers=containsPointersImpl!T();}

private bool onlyPointersImpl(T...)(){
    foreach(t;T) static if(!is(t U:U*)) return false;
    return true;
}
template onlyPointers(T...){enum onlyPointers=onlyPointersImpl!T();}

private string _readfImpl(int len){
    string res="return std.stdio.stdin.readf(format,";
    foreach(t;0..len) res~="&args["~to!string(t)~"], ";
    res~=");";
    return res;
}

int _readf(T...)(string format, ref T args) if(!containsPointers!T){mixin(_readfImpl(T.length));}

//classic definition for backwards compatibility.
int _readf(T...)(string format, T args) if(onlyPointers!T){
    return std.stdio.stdin.readf(format, args);
}

void main(){
    int a;
    _readf(" %s",&a);
    writeln(a);
    _readf(" %s",a);
    writeln(a);
}

>> 3. nonsense like readf("mooh",&a); cannot be caught at compile time.
>> When/Why did you throw away the idea of static overloads? It would have
>> been a powerful feature, and very useful for this case. scanf in C/C++
>> does not have this problem, because most modern compilers generate
>> warnings for this. But that is making some functions
>> "more equal than the others"
>
> One early version I had was doing that and spelled
>
> readf!"format string"(arguments);
>
> Unfortunately, sometimes runtime-computed formatting strings are needed
> and useful (see the recent std.log discussion...) so I decided to go
> with dynamic formatting for now. Once we get that right, providing an
> optional compile-time-checked formatting function shouldn't be too
> difficult with CTFE.

The problem I see here is that the dynamic version still cannot be checked
when passed a statically known format string.
Why did you drop the idea of allowing something like

int readf(T...)(static string format, T args)

?

>
>> 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
>> mistakenly claimed before). I think this is just a quality of
>> implementation issue, but it is important.
>
> I agree. I'm amazed readf is not slower actually. It uses by character
> file iteration, by far the slowest (and most embarrassing) code I wrote
> in Phobos: each character read entails one call to getc() to fetch the
> character, one call to ungetc() to restore the stream position, and
> finally one more call to getc() to move forward. The code is correct but
> very slow. Some C APIs provide undocumented means to peek at the next
> character in the stream without actually advancing the stream, which is
> what we need. I know how to do it on most Unixen and Walter knows how to
> do it on his own cstdlib implementation. We didn't have the time yet,
> and I'm glad the matter is under spotlight.
>
>> Especially for programming competitions where there are time limits, you
>> do not want IO to unnecessarily become a major bottleneck. (Input files
>> can be huge)
>
> Agreed.
>
>> Other than that, D is WAY the most convenient language I have ever tried
>> to solve small algorithmic tasks in.
>> 5. Not really readf related: There's writef(ln) and there is write(ln).
>> And then there is readf. I will provide a proof-of-concept for the read
>> function soon.
>
> Good idea. I suggest you provide a template read(T)() that mimics the
> functionality of Java's nextInt, nextFloat etc:
>
> auto a = stdin.next!int();
> auto b = stdin.next!double();
> auto s = stdin.next!string("\n"); // read a string up to \n
> ...
>
>
> Andrei

Yes, I think it should support:

auto a = read!int;
auto b = read!double;
auto s = read!string("\n"); // this could be an overload on immutability. The alternative would be read!(string,"\n"); I do not know.
auto x = read!(int[])(50); // read an array of 50 integers separated by whitespace
auto y = read!(int[],",")(50); // read an array of 50 integers separated by commas
auto z = read!(int[],", ")(50); // read an array of 50 integers separated by commas and whitespace

Plus the same for every type that can be to!type(string)'d.

But also: read should replace readf wherever possible in the following forms:

int a; double b; string s;
read(a,b,s);//reads whitespace-separated a, b and s in turn. (delimiter could be changed by template argument or so)

char[] c=new char[1000];
read(c); // only reallocates c if the number of read characters exceeds 1000.

One problem I see: An evildoer could provide a huge input, filling up the
whole RAM. I think this vulnerability is also present in readln. Any ideas?

Non-string arrays are handled this way:

int[100] arr;
read(arr); // reads 100 integers and stores in arr
read(arr[0..20]); //reads 20 integers into the first 20 slots of arr

int[] arr = new int[100];
read(arr); //ditto

Rationale: reading input should not /require/ heap activity.

The read function would cover all cases where no strict whitespace handling
is required, and readf would take the rest! I think that would be a very
nice solution.

Timon
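
A minimal sketch of the whitespace-separated read(a,b,s) form proposed above, just to make the idea concrete. It works on a string instead of stdin so that it stays self-contained, and the names next and readInto are placeholders made up for this illustration; they are not Phobos functions and not the eventual read implementation.

```d
import std.ascii : isWhite;
import std.conv : to;

// Hypothetical helper (not part of Phobos): consume one whitespace-delimited
// token from the front of `input` and convert it to T.
T next(T)(ref string input)
{
    size_t i = 0;
    // Skip leading whitespace, much like a single space in a readf format.
    while (i < input.length && isWhite(input[i])) ++i;
    size_t j = i;
    // The token runs up to the next whitespace character or the end of input.
    while (j < input.length && !isWhite(input[j])) ++j;
    auto token = input[i .. j];
    input = input[j .. $];
    return to!T(token); // throws ConvException if the token is empty or malformed
}

// Hypothetical pointer-free variadic form: fills each argument in turn.
void readInto(T...)(ref string input, ref T args)
{
    foreach (ref a; args)
        a = next!(typeof(a))(input);
}

void main()
{
    string input = "  1 \n 2.5\thello\n";
    int a; double b; string s;
    readInto(input, a, b, s);
    assert(a == 1 && b == 2.5 && s == "hello");
}
```

Taking the arguments by ref keeps the call site free of pointers, and nothing here allocates beyond what to!T itself needs. A stdin-backed version would still have to settle the buffering and maximum-length questions raised above.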