On 5/9/11 12:43 PM, Timon Gehr wrote:
Andrei Alexandrescu wrote:
I've implemented readf to be a fair amount more Nazi about whitespace than
scanf in an attempt to improve its precision. Scanf has been famously difficult
to use for complex input parsing and validation, and I attribute some of that
to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
readf's insistence on precise whitespace handling if there's enough evidence
that that serves most of our users. I personally believe that the current
behavior (strict by default, easy to relax) is best.

In my experience readf behavior is not very useful for routine coding tasks that
involve some IO.

If this assessment could be reversed simply by inserting spaces in the formatting string, I'd be hard pressed to agree.

I do agree that readf behavior is surprising if you expect 100% scanf compatibility. This is intentional and beneficial as I believe scanf is wanting in more than one way.

If you really need to have very strict requirements about the input format,
readf does not serve you well, because a ' ' still skips all whitespace, a
failure to read leaves the file pointer in an undefined position etc.

That is not an issue (albeit some of the underlying machinery is not yet implemented). If you want to skip at most one space but no other whitespace, insert "%*1[ ]" in the formatting string. To skip any number of spaces, insert "%*[ ]". Skipping exactly one space is not supported at the formatting string level, but you can always read one character with %c and then enforce that the character is ' '. I agree that this could be improved. What's needed is a specification for the minimum number of characters read, e.g. "%*1.1[ ]" for scanning and skipping exactly one space.
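For comparison, here is a minimal sketch of how the same two scanset directives behave in C's sscanf. This is C, not readf, and the helper names skip_one_space and skip_all_spaces are made up for illustration. One difference to keep in mind: in C a %[ directive must match at least one character, so "%*1[ ]" there means "exactly one space, or fail" rather than "at most one". The %n directive is used only to report how many characters were consumed.

```c
#include <stdio.h>

/* Hypothetical helper: number of characters consumed by "%*1[ ]",
   or -1 if the scanset failed to match (no leading space). */
int skip_one_space(const char *input) {
    int consumed = -1;                 /* stays -1 if %n is never reached */
    sscanf(input, "%*1[ ]%n", &consumed);
    return consumed;
}

/* Hypothetical helper: number of characters consumed by "%*[ ]",
   i.e. a run of one or more spaces (tabs and newlines are not in the set). */
int skip_all_spaces(const char *input) {
    int consumed = -1;
    sscanf(input, "%*[ ]%n", &consumed);
    return consumed;
}
```

Under these assumptions, skip_one_space("  x") consumes one character, skip_all_spaces("   x") consumes three, and skip_all_spaces(" \tx") stops after the single space because the tab is not part of the scanset.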

In contrast, having e.g. %d skipping all whitespace is a losing proposition if you want to do precision parsing. This is because that behavior can't be disabled. That's why I excised it.
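To illustrate the point with C's sscanf (a sketch only; the function names lenient_pair and try_strict_pair are invented for this example): %d silently eats any leading whitespace, including newlines, so even an attempt to enforce a single-space separator with %c cannot reject input that has extra whitespace after the separator.

```c
#include <stdio.h>

/* Parses two integers; any whitespace between them is accepted,
   because each %d skips leading whitespace on its own. */
int lenient_pair(const char *input, int *a, int *b) {
    return sscanf(input, "%d%d", a, b);
}

/* Attempt at strictness: read the separator with %c and require ' '.
   Returns 1 if the input is accepted, 0 otherwise. The second %d still
   skips whitespace, so "1  2" (two spaces) is accepted as well. */
int try_strict_pair(const char *input, int *a, int *b) {
    char sep = 0;
    int n = sscanf(input, "%d%c%d", a, &sep, b);
    return n == 3 && sep == ' ';
}
```

Here try_strict_pair rejects "1\n2" because the separator is a newline, but it accepts both "1 2" and "1  2", since the whitespace skipping built into %d cannot be turned off.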

Reading is greedy. Failure to read leaves the pointer in a defined position, but we need to improve documentation.

All carryovers from
scanf. I never want to use scanf when there is a valid chance of invalid input.

I agree, but that's a problem with scanf that should and could be fixed. There's almost always a chance of invalid input.

As
far as I can see, neither readf nor scanf can be used for sophisticated input
validation or parsing of non-trivial input. You have to do it manually. How does
readf make things better with strict(er) whitespace handling?

As far as I can see, implementing the POSIX %[charset] extension would make readf a powerful one-stop shop for parsing input. Of course its speed needs to be up to snuff too. And of course its specification can be improved, which is where your input is very valuable.

What behavior is by design, what behavior is caused by bugs? Can you give a
real-world example where readf design clearly beats scanf design? (as it is the
default it should be almost always better, but I fail to see it)

Apart from that, what about the other points I mentioned?

I answered all of these in my other, longer post.


Andrei
