Re: Google Code Jam 2011 Language Usage

Andrei Alexandrescu Mon, 09 May 2011 11:20:37 -0700

On 5/9/11 12:43 PM, Timon Gehr wrote:

Andrei Alexandrescu wrote:

I've implemented readf to be a fair amount more Nazi about whitespace than
scanf in an attempt to improve its precision. Scanf has been famously difficult
to use for complex input parsing and validation, and I attribute some of that
to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
readf's insistence on precise whitespace handling if there's enough evidence
that that serves most of our users. I personally believe that the current
behavior (strict by default, easy to relax) is best.


In my experience readf behavior is not very useful for routine coding tasks that
involve some IO.

If this assessment would be reverted by simply inserting spaces in the formatting string, I'd be hard pressed to agree.

I do agree that readf behavior is surprising if you expect 100% scanf compatibility. This is intentional and beneficial as I believe scanf is wanting in more than one way.

If you really need to have very strict requirements about the input format,
readf does not serve you well, because a ' ' still skips all whitespace, a
failure to read leaves the file pointer in an undefined position etc.

That is not an issue (albeit some of the underlying machinery is not yet implemented). If you want to skip at most one space but no other whitespace, insert "%*1[ ]" in the formatting string. To skip any number of spaces, insert "%*[ ]". Skipping exactly one space is not supported at the formatting string level, but you can always read one character with %c and then enforce the character is ' '. I agree that that could be improved. What's needed is a specification for the minimum number of characters read, e.g. "%*1.1[ ]" for scanning and skipping exactly one space.
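For illustration, here is a rough sketch of the "read one character with %c and then check it" approach, written against C's sscanf rather than readf (whose semantics differ and may have changed since). The function name parse_pair_strict is made up for this example. Because C's %d always skips leading whitespace, the sketch has to reject extra whitespace by hand, using %n to learn how many characters were consumed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: accepts exactly "<int><one space><int>".
 * Returns 1 on success, 0 on any deviation from that layout. */
int parse_pair_strict(const char *s, int *a, int *b)
{
    char sep = 0;
    int pos = 0, end = 0;

    assert(s && a && b);

    /* %d would silently skip leading whitespace, so reject it first. */
    if (isspace((unsigned char)s[0]))
        return 0;

    /* %c does not skip whitespace: it reads exactly the next character.
     * %n stores the number of characters consumed so far. */
    if (sscanf(s, "%d%c%n", a, &sep, &pos) != 2 || sep != ' ')
        return 0;

    /* The second %d would also skip whitespace; check by hand. */
    if (isspace((unsigned char)s[pos]))
        return 0;

    if (sscanf(s + pos, "%d%n", b, &end) != 1)
        return 0;

    /* Require that nothing follows the second integer. */
    return s[pos + end] == '\0';
}
```

With this sketch, "3 4" is accepted, while "3  4" (two spaces), "3\t4" and "3 4\n" are rejected. The amount of manual checking needed is the point: the format string alone cannot express "exactly one space" in C.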

In contrast, having e.g. %d skipping all whitespace is a losing proposition if you want to do precision parsing. This is because that behavior can't be disabled. That's why I excised it.

Reading is greedy. Failure to read leaves the pointer in a defined position, but we need to improve documentation.
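A small C sketch of the scanf behavior being described (this shows C's sscanf, not readf; the wrapper name scan_two_ints is invented for the example). With the format "%d%d" there is no way to tell sscanf to treat stray newlines or tabs as an error:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: returns how many integers sscanf converted. */
int scan_two_ints(const char *s, int *a, int *b)
{
    assert(s && a && b);
    /* Each %d first skips any amount of whitespace, newlines included. */
    return sscanf(s, "%d%d", a, b);
}
```

Both "1 2" and " \n1\n\t\n 2" convert two integers here, so a caller that needs to distinguish the two layouts cannot do so from the return value alone.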

All carryovers from scanf. I never want to use scanf when there is a valid
chance of invalid input.

I agree, but that's a problem with scanf that should and could be fixed. There's almost always a chance of invalid input.

As far as I can see, neither readf nor scanf can be used for sophisticated
input validation or parsing of non-trivial input. You have to do it manually.
How does readf make things better with strict(er) whitespace handling?

Far as I can see, implementing Posix %[charset] extension would make readf a powerful one-stop shop for parsing input. Of course its speed needs to be up to snuff too. And of course its specification can be improved, which is where your input is very valuable.
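As a rough idea of what a %[charset] conversion buys, here is how the scanset works in C's sscanf today (a sketch only; readf did not necessarily adopt the same syntax or semantics, and the name parse_setting is hypothetical). The negated scanset "%15[^=]" reads up to 15 characters that are not '=', and fails if it matches none:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses "name=number" with nothing after the number.
 * name must have room for 15 characters plus the terminating NUL. */
int parse_setting(const char *s, char name[16], int *value)
{
    int end = 0;

    assert(s && name && value);

    /* %15[^=] : non-empty run of characters other than '=', at most 15.
     * '='     : literal match.
     * %n      : number of characters consumed so far. */
    if (sscanf(s, "%15[^=]=%d%n", name, value, &end) != 2)
        return 0;

    return s[end] == '\0';
}
```

Under these assumptions "width=42" parses to the name "width" and the value 42, while "=42" and "width=42x" are rejected.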

What behavior is by design, what behavior is caused by bugs? Can you give a
real-world example where readf design clearly beats scanf design? (as it is the
default it should be almost always better, but I fail to see it)

Apart from that, what about the other points I mentioned?


I answered all of these in my other, longer post.


Andrei
