Re: Google Code Jam 2011 Language Usage

Timon Gehr Mon, 09 May 2011 14:01:18 -0700

Sorry, I overlooked this post.

Andrei Alexandrescu wrote:
> On 5/8/11 5:57 PM, Timon Gehr wrote:
>> Andrei Alexandrescu wrote:
>>> On 5/8/11 3:04 PM, Timon Gehr wrote:
>>>> However, I agree that Phobos has to provide better input handling,
>>>> since using possibly unsafe C functions is the best way to do it for
>>>> now. (I think readf is severely crippled.) I may try to implement a
>>>> meaningful "read" function.
>>>
>>> Looking forward to detailed feedback about readf. It was implemented in
>>> a hurry so definitely it has a long way to go.
>>>
>>> Andrei
>>
>> What I consider the most important points about readf:
>
> Thanks very much for providing detailed feedback.
>
>> 1. Whitespace handling is different from scanf's. It is much stricter and
>> even feels inconsistent, e.g.:
>>
>> int a,b;
>>
>> readf("%s %s",&a,&b);//input "1 2\n" read.
>> readf("%s %s",&a,&b);//input "1  2\n" read (and a==1 && b==2).
>
> So far so good. By design one space in readf means "skip all whitespace".
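To see the "one space skips all whitespace" rule without typing into a terminal, here is a minimal sketch against present-day Phobos (not the 2011 version discussed here). It uses std.format.formattedRead, which today's readf forwards to, to parse from a string instead of stdin; readTwo is a hypothetical helper name.

```d
import std.format : formattedRead;

// Reads two ints with the format "%s %s". The single space in the format
// skips every whitespace character between the numbers: blanks, tabs, newlines.
int[2] readTwo(string input)
{
    int a, b;
    formattedRead(input, "%s %s", a, b);
    return [a, b];
}

void main()
{
    assert(readTwo("1 2\n") == [1, 2]);
    assert(readTwo("1  2\n") == [1, 2]);
    assert(readTwo("1 \t\n 2\n") == [1, 2]);
}
```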
>
>> readf("%s",&a);//input "1\n" read. yay.
>> readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.
>
> I'm not seeing skipping in my tests; I do see an exception being thrown.
> Here's how I test:
>
> import std.stdio;
> void main()
> {
>      int a, b;
>      readf("%s",&a);
>      assert(a == 1);
>      readf("%s",&b);
>      assert(b == 2);
> }
>
> dmd ./test && echo '1\n 2' | ./test


I tested by typing the input manually in a terminal. The exception is thrown
only when I provide an EOF. It seems the input is not being skipped after all;
readf just does not return until there is an EOF.

> I'm not hooked on this strict whitespace handling, but I think it makes
> a lot of sense particularly when you want to make sure the input looks
> exactly as you think it should. With scanf you can't have precise
> parsing even if you wanted; with readf all you need is to insert a space.
>
> Precision is important. For example, Hive uses a \t for field separation
> when streaming to a file. It is very important to figure out whether you
> have one tab there or two (two means a NULL field was in between).
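A minimal sketch of why literal separators give that precision, again with present-day std.format.formattedRead on a string. It relies on a string %s reading up to the next literal character of the format, so two tabs in a row produce an empty middle field. readThreeFields is a hypothetical helper, and the fixed three-field layout is just an assumption for the example.

```d
import std.format : formattedRead;

// Splits a line of three tab-separated fields. Each string "%s" stops at
// the next literal tab of the format, so "\t\t" yields an empty field
// instead of being merged into one separator.
string[3] readThreeFields(string line)
{
    string f1, f2, f3;
    formattedRead(line, "%s\t%s\t%s", f1, f2, f3);
    return [f1, f2, f3];
}

void main()
{
    assert(readThreeFields("a\tb\tc") == ["a", "b", "c"]);
    assert(readThreeFields("a\t\tc") == ["a", "", "c"]); // NULL field in the middle
}
```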

It should be possible to do that with scanf using %[] if I'm not mistaken.
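A sketch of the scanf route, calling C's sscanf through druntime's core.stdc.stdio. One caveat: in a scanf format a literal \t is a whitespace directive that skips any amount of whitespace, so it would silently merge two tabs; the sketch consumes exactly one tab with %*1[\t] instead. scanTwoFields is a hypothetical name.

```d
import core.stdc.stdio : sscanf;

// Returns how many of two tab-separated fields sscanf manages to store.
// "%*1[\t]" consumes exactly one tab without storing it. A %[ conversion
// that matches zero characters is a matching failure, so an empty second
// field stops the scan early.
int scanTwoFields(const(char)* line)
{
    char[64] first, second;
    return sscanf(line, "%63[^\t]%*1[\t]%63[^\t]", first.ptr, second.ptr);
}

void main()
{
    assert(scanTwoFields("a\tb") == 2);
    assert(scanTwoFields("a\t\tb") == 1); // empty second field detected
}
```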

>> readf("%s ",&a);//input "1 \n" read.
>> readf("%s ",&a);//input "1\n" skipped, presumably because the trailing
>> space (!) is missing.
>
> On my machine this passes:
>
> import std.stdio;
> void main()
> {
>      int a, b;
>      readf("%s ",&a);
>      assert(a == 1);
>      readf("%s ",&b);
>      assert(b == 2);
> }
>
> dmd ./test && echo '1\n 2' | ./test
>
> The explanation is that, again, a space means "skip all whitespace". So
> the first space eats the "\n " and the second space eats the final "\n"
> in the input (produced by echo). Please adjust this example so it unduly
> fails.
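Andrei's explanation can be replayed on a string with present-day formattedRead (readTrailing is a hypothetical helper): each trailing space in "%s " consumes all the whitespace after the number, so both values are read and nothing is left over.

```d
import std.format : formattedRead;

// Reads `count` ints with the format "%s "; the trailing space swallows
// whatever whitespace follows each number, newlines included.
// `rest` receives the unread remainder of the input.
int[] readTrailing(string input, size_t count, out string rest)
{
    int[] values;
    foreach (i; 0 .. count)
    {
        int x;
        formattedRead(input, "%s ", x);
        values ~= x;
    }
    rest = input;
    return values;
}

void main()
{
    string rest;
    // "1\n 2" plus the newline that echo appends
    assert(readTrailing("1\n 2\n", 2, rest) == [1, 2]);
    assert(rest.length == 0);
}
```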

Again, a misinterpretation on my side: when typing into a terminal, readf keeps
waiting for new input until a non-whitespace character is entered. That should
be fine, but it can be surprising.

>
>> readf(" %s",&a);//input "1\n" read.
>> readf("\t%s",&a);//input "1\n": exception is thrown.
>
> A "\t" in the formatting string for readf simply requires a tab. To skip
> over any number of tabs, do this:
>
> readf("%*1[\t]%s",&a);
>
> That instructs readf to read, but not store, a string consisting of at
> most one tab. (To skip multiple tabs drop the "1".) This functionality
> is not yet implemented.

I did not know it would ever be implemented! That removes many of my concerns
(and the 'read' function removes the rest).

>> readf("%s\n",&a);//input "1\n" read.
>> readf("%s\n",&a);//input "1 \n": exception is thrown.
>
> That is as expected - if you specify \n readf expects a \n.
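A small sketch of the literal-matching rule using present-day std.format.formattedRead: only a space in the format is special, while \t and \n must match exactly. readWithSuffix is a hypothetical helper; assertThrown only checks that some exception is thrown, not its exact type.

```d
import std.exception : assertThrown;
import std.format : formattedRead;

// Reads one int using a caller-supplied format; any character other than
// a space (and the %s itself) must appear literally in the input.
int readWithSuffix(string input, string fmt)
{
    int a;
    formattedRead(input, fmt, a);
    return a;
}

void main()
{
    assert(readWithSuffix("1\n", "%s\n") == 1);
    assertThrown(readWithSuffix("1 \n", "%s\n")); // a space where '\n' was required
    assertThrown(readWithSuffix("1\n", "\t%s"));  // '\t' requires an actual tab
}
```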
>
>> readf("%s\t\n",&a);//input "1\t\n" read.
>> readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
>> further input.
>
> My testbed:
>
> import std.stdio;
>
> void main()
> {
>      int a, b;
>      readf("%s\t\n",&a);
>      assert(a == 1);
>      readf("%s \n",&b);
>      assert(b == 2);
> }
>
> dmd ./test && echo "1\t\n2 " | ./test
>
> It fails because it can't find the last \n. That's a bug.

At least I found one. =)

>> And some more; I do not remember all of them. Most of the time the
>> exceptions are only as useful as "Enforcement failed".
>>
>>
>> You (almost?) never want this behavior, even where it marginally makes
>> sense. It would be nice to have an optional whitespace-enforcing version
>> that _really_ enforces it (as opposed to the current implementation), but
>> that should not be the default. And then it should be consistent (also
>> with respect to skipping versus throwing an exception).

> Except for one bug and one lacking implementation artifact, I find the
> current behavior consistent with a strict approach to whitespace handling.

Agreed. Thanks for your explanations!

>> 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all
>> its '>>' better.
>>     scanf has that problem too, but it is a C function; you _cannot_
>> expect it to do any better than that.
>>     D has variadic template functions that may take ref parameters. It can
>> be done entirely pointer-free.
>
> When I implemented readf, ref variadic arguments weren't working. I'd be
> hesitant to change it right now as it does not improve actual
> functionality and disrupts current uses. But I agree ideally it should
> accept parameters by reference.

We can have both, since it will never be possible to read in raw pointers:

import std.stdio;
import std.conv;

// Nesting this function inside the containsPointers template breaks the
// eponymous template trick. Is this a bug?
private bool containsPointersImpl(T...)(){
    foreach(t;T) static if(is(t U:U*)) return true;
    return false;
}

template containsPointers(T...){enum containsPointers=containsPointersImpl!T();}

private bool onlyPointersImpl(T...)(){
    foreach(t;T) static if(!is(t U:U*)) return false;
    return true;
}

template onlyPointers(T...){enum onlyPointers=onlyPointersImpl!T();}

// Builds "return std.stdio.stdin.readf(format,&args[0], ..., );" for the mixin.
private string _readfImpl(int len){
    string res="return std.stdio.stdin.readf(format,";
    foreach(t;0..len) res~="&args["~to!string(t)~"], ";
    res~=");";
    return res;
}

// New form: arguments are taken by reference, so callers pass no pointers.
int _readf(T...)(string format, ref T args)
if(!containsPointers!T){mixin(_readfImpl(T.length));}

//classic definition for backwards compatibility.
int _readf(T...)(string format, T args) if(onlyPointers!T){
    return std.stdio.stdin.readf(format, args);
}

void main(){
    int a;
    _readf(" %s",&a);
    writeln(a);
    _readf(" %s",a);
    writeln(a);
}
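For comparison, a sketch of the same two constraints in present-day D, using std.meta.anySatisfy/allSatisfy with std.traits.isPointer instead of the hand-written CTFE helpers. These enum templates are meant as drop-in replacements for the ones above; Phobos does not define them itself.

```d
import std.meta : allSatisfy, anySatisfy;
import std.traits : isPointer;

// True if at least one type in T is a pointer.
enum containsPointers(T...) = anySatisfy!(isPointer, T);
// True if every type in T is a pointer (vacuously true for an empty list).
enum onlyPointers(T...) = allSatisfy!(isPointer, T);

static assert(!containsPointers!(int, double));
static assert(containsPointers!(int, int*));
static assert(onlyPointers!(int*, double*));
static assert(!onlyPointers!(int*, double));

void main() {}
```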



>> 3. Nonsense like readf("mooh",&a); cannot be caught at compile time.
>> When/Why did you throw away the idea of static overloads? It would have
>> been a powerful feature, and very useful for this case. scanf in C/C++
>> does not have this problem, because most modern compilers generate
>> warnings for this. But that is making some functions "more equal than
>> the others".
>
> One early version I had was doing that and spelled
>
> readf!"format string"(arguments);
>
> Unfortunately, sometimes runtime-computed formatting strings are needed
> and useful (see the recent std.log discussion...) so I decided to go
> with dynamic formatting for now. Once we get that right, providing an
> optional compile-time-checked formatting function shouldn't be too
> difficult with CTFE.

The problem I see here is that the dynamic version still cannot be checked when
passed a statically known format string.

Why did you drop the idea of allowing something like

int readf(T...)(static string format, T args) ?
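For reference, present-day Phobos does provide the compile-time form Andrei describes: std.format.formattedRead accepts the format string as a template argument, so it can be checked against the argument types during compilation. A minimal sketch, with parseRecord as a hypothetical helper:

```d
import std.format : formattedRead;

// Parses "name!count:ratio"; the format string is a template argument,
// so it is known at compile time.
uint parseRecord(string input, out string name, out int count, out double ratio)
{
    return input.formattedRead!"%s!%s:%s"(name, count, ratio);
}

void main()
{
    string name;
    int count;
    double ratio;
    assert(parseRecord("hello!124:34.5", name, count, ratio) == 3);
    assert(name == "hello" && count == 124 && ratio == 34.5);
}
```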


>
>> 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
>> mistakenly claimed before). I think this is just a quality of implementation
>> issue, but it is important.
>
> I agree. I'm amazed readf is not slower, actually. It uses
> character-by-character file iteration, by far the slowest (and most
> embarrassing) code I wrote
> in Phobos: each character read entails one call to getc() to fetch the
> character, one call to ungetc() to restore the stream position, and
> finally one more call to getc() to move forward. The code is correct but
> very slow. Some C APIs provide undocumented means to peek at the next
> character in the stream without actually advancing the stream, which is
> what we need. I know how to do it on most Unixen and Walter knows how to
> do it on his own cstdlib implementation. We didn't have the time yet,
> and I'm glad the matter is under spotlight.
>
>>     Especially for programming competitions, where there are time limits,
>> you do not want IO to needlessly become a major bottleneck. (Input files
>> can be huge.)
>
> Agreed.
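One common way around per-character I/O in contests is to read the input in large blocks and tokenize it in memory. A minimal sketch of the parsing half (parseInts is hypothetical); in real use the text would come from stdin, for instance by concatenating the blocks returned by File.byChunk.

```d
import std.algorithm.iteration : map, splitter;
import std.array : array;
import std.conv : to;

// Parses every whitespace-separated integer in a buffer that was read in
// one go, so no per-character getc/ungetc calls are involved.
int[] parseInts(const(char)[] text)
{
    return text.splitter.map!(tok => tok.to!int).array;
}

void main()
{
    assert(parseInts("3 1\n4  1\t5\n") == [3, 1, 4, 1, 5]);
    assert(parseInts("").length == 0);
}
```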
>
>>     Other than that, D is WAY the most convenient language I have ever
>> tried to solve small algorithmic tasks in.
>> 5. Not really readf related: there is writef(ln) and there is write(ln). And
>> then there is readf. I will provide a proof of concept for the read
>> function soon.
>
> Good idea. I suggest you provide a template read(T)() that mimics the
> functionality of Java's nextInt, nextFloat etc:
>
> auto a = stdin.next!int();
> auto b = stdin.next!double();
> auto s = stdin.next!string("\n"); // read a string up to \n
> ...
>
>
> Andrei
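A minimal sketch of what such helpers could look like, operating on a string that stands in for the stream. next and nextUntil are hypothetical names modelled on the proposal above, not Phobos functions, and only ASCII whitespace is treated as a separator.

```d
import std.ascii : isWhite;
import std.conv : to;
import std.string : indexOf;

// Skips leading whitespace, then reads one whitespace-delimited token
// from the front of `input` and converts it to T.
T next(T)(ref string input)
{
    size_t start = 0;
    while (start < input.length && isWhite(input[start])) ++start;
    size_t end = start;
    while (end < input.length && !isWhite(input[end])) ++end;
    auto token = input[start .. end];
    input = input[end .. $];
    return token.to!T;
}

// Reads everything up to `delim` and consumes the delimiter itself.
string nextUntil(ref string input, string delim)
{
    auto pos = input.indexOf(delim);
    if (pos < 0)
    {
        auto all = input;
        input = null;
        return all;
    }
    auto result = input[0 .. pos];
    input = input[pos + delim.length .. $];
    return result;
}

void main()
{
    string input = "42 3.5 word";
    assert(next!int(input) == 42);
    assert(next!double(input) == 3.5);
    assert(next!string(input) == "word");

    string line = "first line\nsecond line\n";
    assert(nextUntil(line, "\n") == "first line");
    assert(line == "second line\n");
}
```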


Yes, I think it should support:

auto a = read!int;
auto b = read!double;
auto s = read!string("\n"); // this could be an overload on immutability.
// An alternative would be read!(string,"\n"); I do not know.

auto x = read!(int[])(50); // read an array of 50 integers separated by whitespace
auto y = read!(int[],",")(50); // read an array of 50 integers separated by commas
auto z = read!(int[],", ")(50); // read an array of 50 integers separated by commas and whitespace

Plus the same for every type that can be to!type(string)'d.
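A sketch of the separated-array case (read!(int[],",")(50) and friends), with the text already in memory. parseList is a hypothetical helper; trimming whitespace around each value is one assumption about how ", " should behave.

```d
import std.algorithm.iteration : map, splitter;
import std.array : array;
import std.conv : to;
import std.exception : enforce;
import std.range : take;
import std.string : strip;

// Parses `n` values of type T separated by `sep`; whitespace around each
// value is ignored, which also covers the ", " case.
T[] parseList(T)(string text, size_t n, char sep)
{
    auto values = text.splitter(sep).map!(s => s.strip.to!T).take(n).array;
    enforce(values.length == n, "not enough values in input");
    return values;
}

void main()
{
    assert(parseList!int("1,2,3", 3, ',') == [1, 2, 3]);
    assert(parseList!int("1, 2,  3\n", 3, ',') == [1, 2, 3]);
}
```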

But also: read should replace readf wherever possible in the following forms:

int a; double b; string s;
read(a,b,s); // reads whitespace-separated a, b and s in turn (the delimiter could be changed by a template argument or so)
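A sketch of the variadic read(a,b,s) form using ref parameters, so no pointers are involved. readValues is a hypothetical name, and it assumes the input has already been split into whitespace-separated tokens; each token is converted with std.conv.to to the type of the matching argument.

```d
import std.array : split;
import std.conv : to;
import std.exception : enforce;

// Fills each argument, in order, from the next token and drops the
// consumed tokens from the front of `tokens`.
void readValues(T...)(ref string[] tokens, ref T args)
{
    foreach (ref arg; args)
    {
        enforce(tokens.length > 0, "not enough input");
        arg = tokens[0].to!(typeof(arg));
        tokens = tokens[1 .. $];
    }
}

void main()
{
    string[] tokens = "7 2.5 hello".split;
    int a;
    double b;
    string s;
    readValues(tokens, a, b, s);
    assert(a == 7 && b == 2.5 && s == "hello");
    assert(tokens.length == 0);
}
```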

char[] c=new char[1000];
read(c); // only reallocates c if the number of read characters exceeds 1000.
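Phobos's readln already has an overload that takes a char[] buffer by reference and reuses its memory when possible, which is close to read(c) for whole lines. A minimal sketch using File.readln on a temporary file; firstLine is a hypothetical helper.

```d
import std.stdio : File;

// Reads the first line of `contents` (via a temporary file) into a
// preallocated buffer; readln reuses the buffer's memory when it can.
char[] firstLine(string contents)
{
    auto f = File.tmpfile();
    f.write(contents);
    f.rewind();
    char[] buf = new char[1000];
    f.readln(buf); // buf now holds the line, terminator included
    return buf;
}

void main()
{
    assert(firstLine("first line\nsecond\n") == "first line\n");
}
```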

One problem I see: An evildoer could provide a huge input, filling up the whole
RAM. I think this vulnerability is also present in readln. Any ideas?


Non-string arrays are handled this way:

int[100] arr;
read(arr); // reads 100 integers and stores in arr

read(arr[0..20]); //reads 20 integers into the first 20 slots of arr

int[] arr = new int[100];
read(arr); //ditto

Rationale: reading input should not /require/ heap activity.
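A minimal sketch of filling a static array (or a slice of one) without allocating the destination, using present-day formattedRead with a leading space so that any whitespace, newlines included, separates the values. readInto is a hypothetical helper.

```d
import std.format : formattedRead;

// Fills a caller-provided slice with whitespace-separated ints taken from
// the front of `input`; the destination itself is never reallocated.
void readInto(ref string input, int[] dest)
{
    foreach (ref x; dest)
        formattedRead(input, " %s", x);
}

void main()
{
    int[5] arr;
    string input = "1 2 3\n4 5 6 7";
    readInto(input, arr[]);       // all five slots
    assert(arr == [1, 2, 3, 4, 5]);
    readInto(input, arr[0 .. 2]); // only the first two slots
    assert(arr == [6, 7, 3, 4, 5]);
}
```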

The read function would cover all cases where no strict whitespace handling is
required, and readf would take the rest! I think that would be a very nice 
solution.


Timon
