On 5/8/11 5:57 PM, Timon Gehr wrote:
Andrei Alexandrescu wrote:
On 5/8/11 3:04 PM, Timon Gehr wrote:
However I agree that Phobos has to provide some better input handling, since
using possibly unsafe C functions is the best way to do it by now. (I think
readf is severely crippled) I may try to implement a meaningful "read" function.

Looking forward to detailed feedback about readf. It was implemented in
a hurry so definitely it has a long way to go.

Andrei

What I consider the most important points about readf:

Thanks very much for providing detailed feedback.

1. Whitespace handling is different from scanf. It is much stricter and even
feels inconsistent, e.g.:

int a,b;

readf("%s %s",&a,&b);//input "1 2\n" read.
readf("%s %s",&a,&b);//input "1  2\n" read (and a==1 && b==2).

So far so good. By design one space in readf means "skip all whitespace".

readf("%s",&a);//input "1\n" read. yay.
readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.

I'm not seeing skipping in my tests; I do see an exception being thrown. Here's how I test:

import std.stdio;
void main()
{
    int a, b;
    readf("%s",&a);
    assert(a == 1);
    readf("%s",&b);
    assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The first input is read into 'a' and reading stops just at the \n. Next you're trying to read "\n 2" into b, which fails due to the strict whitespace handling. To fix this, you'd need to insert a space before the second "%s".

I'm not hooked on this strict whitespace handling, but I think it makes a lot of sense particularly when you want to make sure the input looks exactly as you think it should. With scanf you can't have precise parsing even if you wanted; with readf all you need is to insert a space.
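
To make the rule concrete, here is a minimal sketch that exercises the same format-string semantics on an in-memory string through std.format.formattedRead instead of stdin. The helper name tryReadInt is hypothetical and exists only for this illustration; it assumes formattedRead treats a space in the format the way readf does.

```d
import std.format : formattedRead;

// Hypothetical helper: true if `input` yields exactly one int under `fmt`.
bool tryReadInt(string input, string fmt, ref int value)
{
    try
    {
        return formattedRead(input, fmt, &value) == 1;
    }
    catch (Exception)
    {
        // Strict whitespace handling reports a mismatch as an exception.
        return false;
    }
}

void main()
{
    int v;
    assert(tryReadInt("1", "%s", v) && v == 1);    // exact match
    assert(!tryReadInt(" 2", "%s", v));            // leading space rejected
    assert(tryReadInt(" 2", " %s", v) && v == 2);  // one space skips whitespace
    assert(tryReadInt("\n  3", " %s", v) && v == 3); // ...including newlines
}
```

The only difference between the rejected and the accepted call is the single space at the start of the format string.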

Precision is important. For example, Hive uses a \t for field separation when streaming to a file. It is very important to figure that you have one tab there versus two (two means a NULL field was in between).
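
As a small illustration of why the tab count matters, the sketch below splits a tab-separated record with std.array.split. The record contents are made up for the example; the point is only that splitting on an explicit separator preserves the empty field, whereas whitespace-collapsing parsing loses it.

```d
import std.array : split;

void main()
{
    // Hypothetical record: two adjacent tabs mean the middle field is NULL.
    string record = "alice\t\t42";

    // Splitting on an explicit "\t" keeps the empty field.
    auto strict = record.split("\t");
    assert(strict.length == 3);
    assert(strict[1] == "");

    // Splitting on runs of whitespace silently drops it.
    auto lenient = record.split();
    assert(lenient.length == 2);
}
```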

readf("%s ",&a);//input "1 \n" read.
readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!)
is missing.

On my machine this passes:

import std.stdio;
void main()
{
    int a, b;
    readf("%s ",&a);
    assert(a == 1);
    readf("%s ",&b);
    assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The explanation is that, again, a space means "skip all whitespace". So the first space eats the "\n " and the second space eats the final "\n" in the input (produced by echo). Please adjust this example so it unduly fails.

readf(" %s",&a);//input "1\n" read.
readf("\t%s",&a);//input "1\n": exception is thrown.

A "\t" in the formatting string for readf simply requires a tab. To skip over any number of tabs, do this:

readf("%*1[\t]%s",&a);

That instructs readf to read, but not store, a string consisting of at most one tab. (To skip multiple tabs drop the "1".) This functionality is not yet implemented.

readf("%s\n",&a);//input "1\n" read.
readf("%s\n",&a);//input "1 \n": exception is thrown.

That is as expected - if you specify \n readf expects a \n.

readf("%s\t\n",&a);//input "1\t\n" read.
readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
further input.

My testbed:

import std.stdio;

void main()
{
    int a, b;
    readf("%s\t\n",&a);
    assert(a == 1);
    readf("%s \n",&b);
    assert(b == 2);
}

dmd ./test && echo "1\t\n2 " | ./test

It fails because it can't find the last \n. That's a bug.

And some more, I do not remember all of them. Exceptions are most of the time
only as useful as "Enforcement failed".


You (almost?) never want this behavior, even at the points it marginally makes
sense. It would be nice to have an optional whitespace-enforcing version that
_really_ enforces it (as opposed to the current implementation), but that
should not be the default. And then it should be consistent (also on skipping
or exception throwing).

Except for one bug and one unimplemented feature, I find the current behavior consistent with a strict approach to whitespace handling.

2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its
'>>' more.
    scanf has that problem too, but it is a C function, you _cannot_ expect it
to do any better than that.
    D has variadic template functions that may take ref parameters. It can be
done entirely pointer-free.

When I implemented readf, ref variadic arguments weren't working. I'd be hesitant to change it right now as it does not improve actual functionality and disrupts current uses. But I agree ideally it should accept parameters by reference.
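
For reference, a pointer-free interface along those lines could look like the sketch below. The function readValues is hypothetical, not part of Phobos; it reads whitespace-separated values from a string (not from stdin) and assumes a compiler where ref variadic template parameters work.

```d
import std.conv : parse;
import std.string : stripLeft;

// Hypothetical reader: fills each argument by reference, no pointers needed.
void readValues(Args...)(ref string input, ref Args args)
{
    foreach (ref arg; args)
    {
        input = input.stripLeft();         // lenient: skip leading whitespace
        arg = parse!(typeof(arg))(input);  // parse consumes what it reads
    }
}

void main()
{
    string input = "1 2.5";
    int a;
    double d;
    readValues(input, a, d);
    assert(a == 1);
    assert(d == 2.5);
}
```

The call site passes the variables themselves, so the address-of operators that readf requires disappear.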

3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why
did you throw away the idea of static overloads? It would have been a powerful
feature, and very useful for this case. scanf in C/C++ does not have this
problem, because most modern compilers generate warnings for this. But that is
making some functions "more equal than the others"

One early version I had was doing that and spelled

readf!"format string"(arguments);

Unfortunately, sometimes runtime-computed formatting strings are needed and useful (see the recent std.log discussion...) so I decided to go with dynamic formatting for now. Once we get that right, providing an optional compile-time-checked formatting function shouldn't be too difficult with CTFE.
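
A rough sketch of what such an optional compile-time check might look like follows. Both countSpecs and checkedRead are hypothetical names for this illustration; the check is deliberately simplistic (it only compares the number of conversion specs with the number of arguments and does not understand suppressed specs such as "%*s"), and it works on a string through formattedRead, not on stdin.

```d
import std.format : formattedRead;

// Counts conversion specs in `fmt`; "%%" is a literal percent sign.
// Simplification: suppressed specs like "%*s" are counted like any other.
size_t countSpecs(string fmt)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (fmt[i + 1] != '%')
            ++n;
        ++i; // skip the character that follows '%'
    }
    return n;
}

// Hypothetical wrapper: the format string is a template argument, so CTFE
// can evaluate countSpecs and reject a mismatch at compile time.
uint checkedRead(string fmt, Args...)(ref string input, Args args)
{
    static assert(countSpecs(fmt) == Args.length,
        "number of format specs does not match number of arguments");
    return formattedRead(input, fmt, args);
}

void main()
{
    int a, b;
    string input = "1 2";
    checkedRead!"%s %s"(input, &a, &b);
    assert(a == 1 && b == 2);

    // The "mooh" case no longer compiles.
    static assert(!__traits(compiles, checkedRead!"mooh"(input, &a)));
}
```

A runtime-computed format string would still go through the dynamic function, so the two can coexist.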

4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
mistakenly claimed before). I think this is just a quality of implementation
issue, but it is important.

I agree. I'm amazed readf is not slower, actually. It uses by-character file iteration, by far the slowest (and most embarrassing) code I wrote in Phobos: each character read entails one call to getc() to fetch the character, one call to ungetc() to restore the stream position, and finally one more call to getc() to move forward. The code is correct but very slow. Some C APIs provide undocumented means to peek at the next character in the stream without actually advancing the stream, which is what we need. I know how to do it on most Unixen and Walter knows how to do it on his own cstdlib implementation. We haven't had the time yet, and I'm glad the matter is in the spotlight.
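
The three-call pattern can be sketched as follows using the C stdio bindings in core.stdc.stdio. This is an illustration of the pattern described above, not the actual Phobos code; the helper name peek is hypothetical, and the example uses a temporary file so it does not depend on stdin.

```d
import core.stdc.stdio : EOF, FILE, fclose, fputs, getc, rewind, tmpfile, ungetc;

// Looks at the next character without consuming it: getc(), then ungetc().
int peek(FILE* f)
{
    immutable c = getc(f);
    if (c != EOF)
        ungetc(c, f);
    return c;
}

void main()
{
    FILE* f = tmpfile();
    assert(f !is null);
    scope (exit) fclose(f);
    fputs("ab", f);
    rewind(f);

    assert(peek(f) == 'a');   // calls 1 and 2: getc + ungetc
    assert(getc(f) == 'a');   // call 3: actually move forward
    assert(peek(f) == 'b');
    assert(getc(f) == 'b');
    assert(peek(f) == EOF);   // nothing left to peek at
}
```

Every character consumed this way costs three library calls, which is where the overhead comes from.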

    Especially for programming competitions where there are time limits, you do
not want IO to unnecessarily become a major bottleneck. (Input files can be
huge)

Agreed.

    Other than that, D is by far the most convenient language I have ever tried
to solve small algorithmic tasks in.
5. Not really readf related: There's writef(ln) and there is write(ln). And then
there is readf. I will provide a proof-of-concept for the read function soon.

Good idea. I suggest you provide a template read(T)() that mimics the functionality of Java's nextInt, nextFloat etc:

auto a = stdin.next!int();
auto b = stdin.next!double();
auto s = stdin.next!string("\n"); // read a string up to \n
...
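
To give an idea of the shape, here is a minimal sketch of that interface over a plain string. The names next and nextUntil are hypothetical, the string source stands in for stdin, and the delimiter-based string read is split into its own function to keep the example short.

```d
import std.conv : parse;
import std.string : indexOf, stripLeft;

// Hypothetical: skip leading whitespace, then parse one value of type T.
T next(T)(ref string input)
{
    input = input.stripLeft();
    return parse!T(input);
}

// Hypothetical: read up to (and consume) `delim`, or the rest if absent.
string nextUntil(ref string input, string delim)
{
    immutable idx = input.indexOf(delim);
    if (idx < 0)
    {
        auto rest = input;
        input = "";
        return rest;
    }
    auto result = input[0 .. idx];
    input = input[idx + delim.length .. $];
    return result;
}

void main()
{
    string input = "42 3.5 rest of line\nnext";
    auto a = input.next!int();
    auto b = input.next!double();
    auto s = input.nextUntil("\n");
    assert(a == 42);
    assert(b == 3.5);
    assert(s == " rest of line");
    assert(input == "next");
}
```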


Andrei
