Re: Curl support RFC

2011-03-30 Thread Jonas Drewsen

On 29/03/11 17.31, Johannes Pfau wrote:

Jonas Drewsen wrote:


This is a nice protocol parser. I would very much like it to be used
with the curl API but without it being a dependency. This is already
possible now using the onReceiveHeader callback and this would
decouple the two. At least until std.protocol.http is in phobos as
well - at that point convenience methods could be added :)

/Jonas


Thanks, I think I'll propose the parser for the new experimental
namespace when it's available.


I'm looking forward to that.


About the headersReceived callback: You're totally right, it can be
done with the onReceiveHeader callback right now. But I think in the
common case the user wants the headers in a key/value array. So if the
user doesn't want to use the onReceiveHeader api, a headersReceived
callback would probably be convenient. But, as said, it's not necessary.


I'll put it on my todo and reconsider when I get to it :)
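For illustration, the convenience hook Johannes describes might look roughly
like this (onReceiveHeaders and the surrounding members are placeholder names,
not the actual API):

import std.conv : to;
import std.stdio;

class Http
{
    string[string] headers;   // filled by the default header callback

    // invoked once, right before the first data callback fires
    void delegate(string[string] headers) onReceiveHeaders;
}

void example(Http http)
{
    http.onReceiveHeaders = (string[string] h) {
        // decide up front whether the body goes to memory or to disk
        auto len = "content-length" in h;
        if (len !is null && to!ulong(*len) > 1024 * 1024)
            writeln("large response, stream to disk");
        else
            writeln("small response, keep in memory");
    };
}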


Reading the curl documentation showed another small trap:
CURLOPT_HEADERFUNCTION

It's important to note that the callback will be invoked for the
headers of all responses received after initiating a request and not
just the final response. This includes all responses which occur during
authentication negotiation. If you need to operate on only the headers
from the final response, you will need to collect headers in the
callback yourself and use HTTP status lines, for example, to delimit
response boundaries.


I think if we store the headers into an array, we should only store the
headers of the final response. Another question is should all headers
or only final headers trigger the onReceiveHeader callback? Passing
only the final headers would require extra work, passing all headers
should at least be documented.


Yeah... I've discovered this myself as well.

The current implementation does as libcurl does and passes all headers,
not just those of the final sub-request.



Thinking about this more, this also means the _receiveHeaderCallback is
not 100% correct, as it expects all lines after the first line to be
header or empty lines, but it's possible that we get multiple status lines.
It still works: the regex doesn't match anything and the code
ignores that line. But this way, the stored status line will always be
the first status line, which isn't optimal. We'd also need to detect if a
line is a status line to reset the headers array if it's used. Seems
like we have to think about this some more.


My local version already takes care of this. It was the wrong place for 
parsing status lines and headers anyway. It is now moved to the Http 
class where it should have been all the time.
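As a rough sketch of that detection (hypothetical names, not the code in the
branch): a new status line marks the start of another response, so the
previously collected headers get dropped and only the final response survives.

import std.regex;

string[string] headers;
string statusLine;

// called once per header line, as libcurl delivers them
void onHeaderLine(const(char)[] line)
{
    if (!matchFirst(line, r"^HTTP/\d\.\d +\d+").empty)
    {
        statusLine = line.idup;
        headers = null;   // reset: keep only the final response's headers
        return;
    }
    auto m = matchFirst(line, r"^([^:]+):\s*(.*)$");
    if (!m.empty)
        headers[m[1].idup] = m[2].idup;
}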


I have implemented almost all of the features/changes suggested now. The 
last one I'm currently fighting is the support for foreach and async 
.byLine/.byChunk. I may have to make some changes in the current design 
to support this with the calling API that I would like to expose.


I wonder who could take the step and open a std.experimental package for 
submissions?


Thank you for the feedback!


Re: Curl support RFC

2011-03-29 Thread Johannes Pfau
Jonas Drewsen wrote:

This is a nice protocol parser. I would very much like it to be used 
with the curl API but without it being a dependency. This is already 
possible now using the onReceiveHeader callback and this would
decouple the two. At least until std.protocol.http is in phobos as
well - at that point convenience methods could be added :)

/Jonas

Thanks, I think I'll propose the parser for the new experimental
namespace when it's available.

About the headersReceived callback: You're totally right, it can be
done with the onReceiveHeader callback right now. But I think in the
common case the user wants the headers in a key/value array. So if the
user doesn't want to use the onReceiveHeader api, a headersReceived
callback would probably be convenient. But, as said, it's not necessary.

Reading the curl documentation showed another small trap:
CURLOPT_HEADERFUNCTION

It's important to note that the callback will be invoked for the
headers of all responses received after initiating a request and not
just the final response. This includes all responses which occur during
authentication negotiation. If you need to operate on only the headers
from the final response, you will need to collect headers in the
callback yourself and use HTTP status lines, for example, to delimit
response boundaries.


I think if we store the headers into an array, we should only store the
headers of the final response. Another question is should all headers
or only final headers trigger the onReceiveHeader callback? Passing
only the final headers would require extra work, passing all headers
should at least be documented.

Thinking about this more, this also means the _receiveHeaderCallback is
not 100% correct, as it expects all lines after the first line to be
header or empty lines, but it's possible that we get multiple status lines.
It still works: the regex doesn't match anything and the code
ignores that line. But this way, the stored status line will always be
the first status line, which isn't optimal. We'd also need to detect if a
line is a status line to reset the headers array if it's used. Seems
like we have to think about this some more.

-- 
Johannes Pfau




Re: Curl support RFC

2011-03-27 Thread Jonas Drewsen

On 25/03/11 10.54, Johannes Pfau wrote:

Jonas Drewsen wrote:

Hi,

   So I've been working a bit on the etc.curl module. Currently most of
the HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in
phobos if I finish it with the current design. If not then it will be
for my own project only and doesn't need as much documentation or all
the features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot
of things that you can do with libcurl which I did not know so I'm
starting out small.

For now I've created all the declarations for the latest public curl
C api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently
works but before proceeding further down this road I would like to
get your comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );

//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ ": " ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick");
http.perform;

//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
"./downloaded-file"));


// ... authentication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas




I looked at the code again and I got 2 more suggestions:

1.) Would it be useful to have a headersReceived callback which would be
called when all headers have been received (when the data callback is
called the first time)? I think of a situation where you don't know
what data the server will return: a few KB html which you can easily
keep in memory or a huge file which you'd have to save to disk. You
can only know that if the headers have been received. It would also be
possible to do that by just overwriting the headerCallback and looking
out for the ContentLength/ContentType header, but I think it should
also work with the default headerCallback.


I'm a little confused as to what a headersReceived(string[string]
headers) would give you compared to the onReceiveHeader(const(char)[],
const(char)[]) callback that exists today in the example.


The headersReceived callback would probably lookup the content-length 
header and set a flag about whether to save content to file or memory.


The existing onReceiveHeader could do the same by setting the flag when 
it receives the content-length field.


Or maybe I'm misunderstanding you?



2.)
As far as I can see you store the http headers in a case sensitive way.
(res.headers[key] ~= value;). This means Content-Length vs
content-length would produce two entries in the array and it makes
it difficult to get the header from the associative array. It is maybe
useful to keep the original casing, but probably not in the array key.

BTW: According to RFC2616 the only headers which are allowed
to be included multiple times in the response must consist of comma
separated lists. So in theory we could keep a simple string[string]
list and if we see a header twice we can just merge it with a ','.

http://tools.ietf.org/html/rfc2616#section-4.2
Relevant part from the RFC:
--
Multiple message-header fields with the same field-name MAY be
present in a message if and only if the entire field-value for that
header field is defined as a comma-separated list [i.e., #(values)].
It MUST be possible to combine the multiple header fields into one
field-name: field-value pair, without changing the semantics of the
message, by appending each subsequent field-value to the first, each
separated by a comma. The order in which header fields with the same
field-name are received is therefore significant to the
interpretation of the combined field value, and thus a proxy MUST NOT
change the order of these field values when a message is forwarded.
--


I will surely implement this combined 
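
A small sketch of that merge rule (addHeader and the headers variable are just
illustrative names): repeated fields get folded into one comma-separated value,
with a lower-cased key so lookups are case-insensitive.

string[string] headers;

void addHeader(string key, string value)
{
    import std.string : strip, toLower;

    auto k = toLower(key);                  // case-insensitive key
    if (auto existing = k in headers)
        *existing ~= ", " ~ strip(value);   // RFC 2616 4.2: fold into one field
    else
        headers[k] = strip(value);
}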

Re: Curl support RFC

2011-03-27 Thread Jonas Drewsen

On 25/03/11 12.07, Johannes Pfau wrote:

Johannes Pfau wrote:

Jonas Drewsen wrote:

Hi,

   So I've been working a bit on the etc.curl module. Currently most of
the HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in
phobos if I finish it with the current design. If not then it will be
for my own project only and doesn't need as much documentation or all
the features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot
of things that you can do with libcurl which I did not know so I'm
starting out small.

For now I've created all the declarations for the latest public curl
C api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently
works but before proceeding further down this road I would like to
get your comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );

//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ ": " ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick");
http.perform;

//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
"./downloaded-file"));


// ... authentication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas




I looked at the code again and I got 2 more suggestions:

1.) Would it be useful to have a headersReceived callback which would
be called when all headers have been received (when the data callback
is called the first time)? I think of a situation where you don't know
what data the server will return: a few KB html which you can easily
keep in memory or a huge file which you'd have to save to disk. You
can only know that if the headers have been received. It would also be
possible to do that by just overwriting the headerCallback and looking
out for the ContentLength/ContentType header, but I think it should
also work with the default headerCallback.

2.)
As far as I can see you store the http headers in a case sensitive way.
(res.headers[key] ~= value;). This means Content-Length vs
content-length would produce two entries in the array and it makes
it difficult to get the header from the associative array. It is maybe
useful to keep the original casing, but probably not in the array key.

BTW: According to RFC2616 the only headers which are allowed
to be included multiple times in the response must consist of comma
separated lists. So in theory we could keep a simple string[string]
list and if we see a header twice we can just merge it with a ','.

http://tools.ietf.org/html/rfc2616#section-4.2
Relevant part from the RFC:
--
   Multiple message-header fields with the same field-name MAY be
   present in a message if and only if the entire field-value for that
   header field is defined as a comma-separated list [i.e., #(values)].
   It MUST be possible to combine the multiple header fields into one
   field-name: field-value pair, without changing the semantics of
the message, by appending each subsequent field-value to the first,
each separated by a comma. The order in which header fields with the
same field-name are received is therefore significant to the
   interpretation of the combined field value, and thus a proxy MUST
NOT change the order of these field values when a message is
forwarded.
--

I'm also done with the first pass through the http parsers.
Documentation is here:
http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html

Code here:
https://gist.github.com/886612
The http.d file is generated from the http.d.rl file.



I added some code to show how I think this could be used in the HTTP
client:
https://gist.github.com/886612#file_gistfile1.d

Like in the .net webclient we'd need two of these collections: one for
received headers and one for headers to be sent.


Thanks!

It would be very nice to have 

Re: Curl support RFC

2011-03-25 Thread Johannes Pfau
Jonas Drewsen wrote:
Hi,

   So I've been working a bit on the etc.curl module. Currently most of
the HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in
phobos if I finish it with the current design. If not then it will be
for my own project only and doesn't need as much documentation or all
the features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.

 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 //
 // GET with custom data receiver delegates
 //
 Http http = new Http("http://www.google.dk");
 http.setReceiveHeaderCallback( (string key, string value) {
 writeln(key ~ ": " ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

 //
 // POST with some timeouts
 //
 http.setUrl("http://www.testing.com/test.cgi");
 http.setReceiveCallback( (string data) { writeln(data); } );
 http.setConnectTimeout(1000);
 http.setDataTimeout(1000);
 http.setDnsTimeout(1000);
 http.setPostData("The quick");
 http.perform;

 //
 // PUT with data sender delegate
 //
 string msg = "Hello world";
 size_t len = msg.length; /* using chunked transfer if omitted */

 http.setSendCallback( delegate size_t(char[] data) {
 if (msg.empty) return 0;
 auto l = msg.length;
 data[0..l] = msg[0..$];
 msg.length = 0;
 return l;
 },
 HttpMethod.put, len );
 http.perform;

 //
 // HTTPS
 //
 writeln(Http.get("https://mail.google.com").content);

 //
 // FTP
 //
 writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
 "./downloaded-file"));


 // ... authentication, cookies, interface select, progress callback
 // etc. is also implemented this way.


 /Jonas


I looked at the code again and I got 2 more suggestions:

1.) Would it be useful to have a headersReceived callback which would be
called when all headers have been received (when the data callback is
called the first time)? I think of a situation where you don't know
what data the server will return: a few KB html which you can easily
keep in memory or a huge file which you'd have to save to disk. You
can only know that if the headers have been received. It would also be
possible to do that by just overwriting the headerCallback and looking
out for the ContentLength/ContentType header, but I think it should
also work with the default headerCallback.

2.)
As far as I can see you store the http headers in a case sensitive way.
(res.headers[key] ~= value;). This means Content-Length vs
content-length would produce two entries in the array and it makes
it difficult to get the header from the associative array. It is maybe
useful to keep the original casing, but probably not in the array key.

BTW: According to RFC2616 the only headers which are allowed
to be included multiple times in the response must consist of comma
separated lists. So in theory we could keep a simple string[string]
list and if we see a header twice we can just merge it with a ','.

http://tools.ietf.org/html/rfc2616#section-4.2
Relevant part from the RFC:
--
   Multiple message-header fields with the same field-name MAY be
   present in a message if and only if the entire field-value for that
   header field is defined as a comma-separated list [i.e., #(values)].
   It MUST be possible to combine the multiple header fields into one
   field-name: field-value pair, without changing the semantics of the
   message, by appending each subsequent field-value to the first, each
   separated by a comma. The order in which header fields with the same
   field-name are received is therefore significant to the
   interpretation of the combined field value, and thus a proxy MUST NOT
   change the order of these field values when a message is forwarded.
--

I'm also done with the first pass through the http parsers.
Documentation is here:
http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html

Code here:
https://gist.github.com/886612
The http.d file is generated from the http.d.rl file. 

-- 
Johannes Pfau




Re: Curl support RFC

2011-03-25 Thread Johannes Pfau
Johannes Pfau wrote:
Jonas Drewsen wrote:
Hi,

   So I've been working a bit on the etc.curl module. Currently most of
the HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in
phobos if I finish it with the current design. If not then it will be
for my own project only and doesn't need as much documentation or all
the features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.

 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 //
 // GET with custom data receiver delegates
 //
 Http http = new Http("http://www.google.dk");
 http.setReceiveHeaderCallback( (string key, string value) {
 writeln(key ~ ": " ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

 //
 // POST with some timeouts
 //
 http.setUrl("http://www.testing.com/test.cgi");
 http.setReceiveCallback( (string data) { writeln(data); } );
 http.setConnectTimeout(1000);
 http.setDataTimeout(1000);
 http.setDnsTimeout(1000);
 http.setPostData("The quick");
 http.perform;

 //
 // PUT with data sender delegate
 //
 string msg = "Hello world";
 size_t len = msg.length; /* using chunked transfer if omitted */

 http.setSendCallback( delegate size_t(char[] data) {
 if (msg.empty) return 0;
 auto l = msg.length;
 data[0..l] = msg[0..$];
 msg.length = 0;
 return l;
 },
 HttpMethod.put, len );
 http.perform;

 //
 // HTTPS
 //
 writeln(Http.get("https://mail.google.com").content);

 //
 // FTP
 //
 writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
 "./downloaded-file"));


 // ... authentication, cookies, interface select, progress callback
 // etc. is also implemented this way.


 /Jonas


I looked at the code again and I got 2 more suggestions:

1.) Would it be useful to have a headersReceived callback which would
be called when all headers have been received (when the data callback
is called the first time)? I think of a situation where you don't know
what data the server will return: a few KB html which you can easily
keep in memory or a huge file which you'd have to save to disk. You
can only know that if the headers have been received. It would also be
possible to do that by just overwriting the headerCallback and looking
out for the ContentLength/ContentType header, but I think it should
also work with the default headerCallback.

2.)
As far as I can see you store the http headers in a case sensitive way.
(res.headers[key] ~= value;). This means Content-Length vs
content-length would produce two entries in the array and it makes
it difficult to get the header from the associative array. It is maybe
useful to keep the original casing, but probably not in the array key.

BTW: According to RFC2616 the only headers which are allowed
to be included multiple times in the response must consist of comma
separated lists. So in theory we could keep a simple string[string]
list and if we see a header twice we can just merge it with a ','.

http://tools.ietf.org/html/rfc2616#section-4.2
Relevant part from the RFC:
--
   Multiple message-header fields with the same field-name MAY be
   present in a message if and only if the entire field-value for that
   header field is defined as a comma-separated list [i.e., #(values)].
   It MUST be possible to combine the multiple header fields into one
   field-name: field-value pair, without changing the semantics of
 the message, by appending each subsequent field-value to the first,
 each separated by a comma. The order in which header fields with the
 same field-name are received is therefore significant to the
   interpretation of the combined field value, and thus a proxy MUST
 NOT change the order of these field values when a message is
 forwarded.
--

I'm also done with the first pass through the http parsers.
Documentation is here:
http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html

Code here:
https://gist.github.com/886612
The http.d file is generated from the http.d.rl file. 


I added some code to show how I think this could be used in the HTTP
client:
https://gist.github.com/886612#file_gistfile1.d

Like in the .net webclient we'd need two of these collections: one for
received headers and one for headers to be sent.
-- 
Johannes Pfau



Re: Curl support RFC

2011-03-14 Thread Jonas Drewsen

On 13/03/11 23.44, Andrei Alexandrescu wrote:

On 3/11/11 9:20 AM, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.


Great! Could you please create a pull request for that?


Will do as soon as I've figured out how to create a pull request for a
single file in a branch. Does anyone know how to do that on github? Or
should I just create a pull request including the etc.curl wrapper as well?



On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );


Sweet. As has been discussed, often the content is not text so you may
want to have content return ubyte[] and add a new property such as
textContent or text.


I've already changed it to void[] as done in the std.file module. Is 
ubyte[] better suited?


I'll add a text property as well.



//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");


You'll probably need to justify the existence of a class hierarchy and
what overridable methods there are. In particular, since you seem to
offer hooks via delegates, probably classes wouldn't be needed at all.
(FWIW I would've done the same; I wouldn't want to inherit just to
intercept the headers etc.)


http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ ": " ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;


As discussed, properties may be better here than setXxx and getXxx. The
setReceiveCallback hook should take a ubyte[]. The
setReceiveHeaderCallback should take a const(char)[]. That way you won't
need to copy all headers, leaving safely that option to the client.


I've already replaced the set/get methods with properties and renamed 
them. Hadn't thought of using const(char)[].. thanks for the hint.




//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick");
http.perform;


setPostData -> setTextPostData, and then changing everything to
properties would make it something like textPostData. Or wait, there
could be some overloading going on... Anyway, the basic idea is that
generally get and post data could be raw bytes, and the user could elect
to transfer strings instead.


I'll make sure both text and byte[]/void[] versions will be available.


//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;


The callback would take ubyte[].


Already fixed.



//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
"./downloaded-file"));


// ... authentication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas


This is all very encouraging. I think this API covers nicely a variety
of needs. We need to make sure everything interacts well with threads,
in particular that one can shut down a transfer (or the entire library)
from a thread or callback and have the existing transfer(s) throw an
exception immediately.


I'll have a look at it.



Regarding a range interface, it would be great if you allowed e.g.

foreach (line; Http.get("https://mail.google.com").byLine()) {
...
}

The data transfer should happen concurrently with the foreach code. The
type of line is char[] or const(char)[]. Similarly, there would be a
byChunk interface that transfers in ubyte[] chunks.

Also we need a head() method for the corresponding command.

Andrei


That would be neat. What do you mean about concurrent data transfers 
with foreach?
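
For what it's worth, one way to picture it (purely a sketch; Http/byLine below
are stand-ins, not the eventual API): the transfer runs in a spawned thread and
each line is handed to the foreach body through message passing, so the network
I/O overlaps with the loop body.

import std.concurrency;

struct ByLine
{
    Tid worker;
    string line;
    bool done;

    this(string url)
    {
        worker = spawn(&fetch, thisTid, url);
        popFront();
    }

    @property bool empty() { return done; }
    @property string front() { return line; }

    void popFront()
    {
        receive((string l)    { line = l; },
                (bool finish) { done = true; });
    }
}

void fetch(Tid owner, string url)
{
    // a real version would drive libcurl here, sending lines as they arrive
    foreach (l; ["<html>", "</html>"])
        owner.send(l);
    owner.send(true);
}

// usage: foreach (line; ByLine("http://example.com")) writeln(line);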



/Jonas


Re: Curl support RFC

2011-03-14 Thread Jonathan M Davis
On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
  On 3/11/11 9:20 AM, Jonas Drewsen wrote:
  Hi,
  
  So I've spent some time trying to wrap libcurl for D. There is a lot of
  things that you can do with libcurl which I did not know so I'm starting
  out small.
  
  For now I've created all the declarations for the latest public curl C
  api. I have put that in the etc.c.curl module.
  
  Great! Could you please create a pull request for that?
 
 Will do as soon as I've figured out howto create a pull request for a
 single file in a branch. Anyone knows how to do that on github? Or
 should I just create a pull request including the etc.curl wrapper as well?

You can't. A pull request is for an entire branch. It pulls _everything_ from
that branch which differs from the one being merged with. git cares about
commits, not files. And pulling from another repository pulls all of the
commits which you don't have. So, if you want to do a pull request, you create
a branch with exactly the commits that you wanted merged in on it. No more, no
less.

  On top of that I've created a more D like api as seen below. This is
  located in the 'etc.curl' module. What you can see below currently works
  but before proceeding further down this road I would like to get your
  comments on it.
  
  //
  // Simple HTTP GET with sane defaults
  // provides the .content, .headers and .status
  //
  writeln( Http.get("http://www.google.com").content );
  
  Sweet. As has been discussed, often the content is not text so you may
  want to have content return ubyte[] and add a new property such as
  textContent or text.
 
 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?

That's debatable. Some would argue one way, some another. Personally, I'd argue
ubyte[]. I don't like void[] one bit. Others would agree with me, and yet
others would disagree. I don't think that there's really a general agreement on
whether void[] or ubyte[] is better when it comes to reading binary data like
that.

- Jonathan M Davis


Re: Curl support RFC

2011-03-14 Thread Johannes Pfau
Jonas Drewsen wrote:
Hi,

   So I've been working a bit on the etc.curl module. Currently most of
the HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in
phobos if I finish it with the current design. If not then it will be
for my own project only and doesn't need as much documentation or all
the features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.

 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 //
 // GET with custom data receiver delegates
 //
 Http http = new Http("http://www.google.dk");
 http.setReceiveHeaderCallback( (string key, string value) {
 writeln(key ~ ": " ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

 //
 // POST with some timeouts
 //
 http.setUrl("http://www.testing.com/test.cgi");
 http.setReceiveCallback( (string data) { writeln(data); } );
 http.setConnectTimeout(1000);
 http.setDataTimeout(1000);
 http.setDnsTimeout(1000);
 http.setPostData("The quick");
 http.perform;

 //
 // PUT with data sender delegate
 //
 string msg = "Hello world";
 size_t len = msg.length; /* using chunked transfer if omitted */

 http.setSendCallback( delegate size_t(char[] data) {
 if (msg.empty) return 0;
 auto l = msg.length;
 data[0..l] = msg[0..$];
 msg.length = 0;
 return l;
 },
 HttpMethod.put, len );
 http.perform;

 //
 // HTTPS
 //
 writeln(Http.get("https://mail.google.com").content);

 //
 // FTP
 //
 writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
 "./downloaded-file"));


 // ... authentication, cookies, interface select, progress callback
 // etc. is also implemented this way.


 /Jonas

Hi,
I really like the API. A few comments:

You use the internal curl progress meter. According to the
documentation (It's a little hidden, look at CURLOPT_NOPROGRESS) the
progress meter is likely to be removed in future curl versions. The
download progress should be easy to reimplement, although you'd have to
parse the Content-Length header. Upload shouldn't be too difficult either
(One problem: What does curl pass as ultotal/dltotal when chunked
encoding is used or the total size is not known?). Then we could also
use different delegates for upload/download.

The callback interface suits curl best and I actually like it, but how
will it interact with streams? As an example: If someone wrote a
stream/filter that decoded gzip for files it should be usable with
the http streams as well. But files/filestreams have a pull
interface (no callbacks, stream.read() in a loop). So how could a gzip
stream be written without too much code duplication supporting files and
the http stuff?

Do you plan to add some kind of support for header parsing? I think
something like what the .net webclient uses
( http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx )
would be great. Especially the HeaderCollection supporting headers as
strings and as data types (for both parsing and formatting), but
without a class hierarchy for the headers, using templates instead.

I've written D parsers/formatters for almost all headers in
rfc2616 (1 or 2 might be missing) and for a few additional commonly
used headers (Content-Disposition, cookie headers). The parsers are
written with ragel and are to be used with curl (continuations must be
removed and the parsers always take 1 line of input, just as you get it
from curl). Right now only the client side is implemented (no parsers
for headers which can only be sent from client to server). However, I
need to add some more documentation to the parsers, need to do
some refactoring and I've got absolutely no time for that in the next 2
weeks ('abitur' final exams). But if you could wait 2 weeks or if
you wanted to do the refactoring yourself, I would be happy to
contribute that code.


-- 
Johannes Pfau




Re: Curl support RFC

2011-03-14 Thread Lars T. Kyllingstad
On Mon, 14 Mar 2011 02:36:07 -0700, Jonathan M Davis wrote:

 On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
  On 3/11/11 9:20 AM, Jonas Drewsen wrote:
  Hi,
  
  So I've spent some time trying to wrap libcurl for D. There is a lot
  of things that you can do with libcurl which I did not know so I'm
  starting out small.
  
  For now I've created all the declarations for the latest public curl
  C api. I have put that in the etc.c.curl module.
  
  Great! Could you please create a pull request for that?
 
 Will do as soon as I've figured out howto create a pull request for a
 single file in a branch. Anyone knows how to do that on github? Or
 should I just create a pull request including the etc.curl wrapper as
 well?
 
 You can't. A pull request is for an entire branch. It pulls _everything_
 from that branch which differs from the one being merged with. git cares
 about commits, not files. And pulling from another repository pulls all
 of the commits which you don't have. So, if you want to do a pull
 request, you create a branch with exactly the commits that you wanted
 merged in on it. No more, no less.
 
  On top of that I've created a more D like api as seen below. This is
  located in the 'etc.curl' module. What you can see below currently
  works but before proceeding further down this road I would like to
  get your comments on it.
  
  //
  // Simple HTTP GET with sane defaults // provides the .content,
  .headers and .status //
  writeln( Http.get("http://www.google.com").content );
  
  Sweet. As has been discussed, often the content is not text so you
  may want to have content return ubyte[] and add a new property such
  as textContent or text.
 
 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?
 
 That's debatable. Some would argue one way, some another. Personally,
 I'd argue ubyte[]. I don't like void[] one bit. Others would agree with
 me, and yet others would disagree. I don't think that there's really a
 general agreement on whether void[] or ubyte[] is better when it comes
 to reading binary data like that.

I also think ubyte[] is best, because:

1. It can be used directly.  (You can't get an element from a void[] 
array without casting it to something else first.)

2. There are no assumptions about the type of data contained in the 
array.  (char[] arrays are assumed to be UTF-8 encoded.)

3. ubyte[] arrays are (AFAIK) not scanned by the GC.  (void[] arrays may 
contain pointers and must therefore be scanned.)

I think the rule of thumb should be:  If the array contains raw data of 
unspecified type, but no pointers or references, use ubyte[].  

void[] is very useful for input parameters, however, since all arrays are 
implicitly castable to void[]:

  void writeData(void[] data) { ... }

  writeData("Hello World!");
  writeData([1, 2, 3, 4]);

-Lars


Re: Curl support RFC

2011-03-14 Thread Kagamin
Lars T. Kyllingstad Wrote:

 2. There are no assumptions about the type of data contained in the 
 array.  (char[] arrays are assumed to be UTF-8 encoded.)

http has content-type, so it's known what is contained in the array.


Re: Curl support RFC

2011-03-14 Thread Steven Schveighoffer
On Mon, 14 Mar 2011 07:20:26 -0400, Lars T. Kyllingstad  
public@kyllingen.nospamnet wrote:



On Mon, 14 Mar 2011 02:36:07 -0700, Jonathan M Davis wrote:


On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:

On 13/03/11 23.44, Andrei Alexandrescu wrote:
 On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.

 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 Great! Could you please create a pull request for that?

Will do as soon as I've figured out howto create a pull request for a
single file in a branch. Anyone knows how to do that on github? Or
should I just create a pull request including the etc.curl wrapper as
well?


You can't. A pull request is for an entire branch. It pulls _everything_
from that branch which differs from the one being merged with. git cares
about commits, not files. And pulling from another repository pulls all
of the commits which you don't have. So, if you want to do a pull
request, you create a branch with exactly the commits that you wanted
merged in on it. No more, no less.


 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.

 //
 // Simple HTTP GET with sane defaults // provides the .content,
 .headers and .status //
 writeln( Http.get("http://www.google.com").content );

 Sweet. As has been discussed, often the content is not text so you
 may want to have content return ubyte[] and add a new property such
 as textContent or text.

I've already changed it to void[] as done in the std.file module. Is
ubyte[] better suited?


That's debatable. Some would argue one way, some another. Personally,
I'd argue ubyte[]. I don't like void[] one bit. Others would agree with
me, and yet others would disagree. I don't think that there's really a
general agreement on whether void[] or ubyte[] is better when it comes
to reading binary data like that.


I also think ubyte[] is best, because:

1. It can be used directly.  (You can't get an element from a void[]
array without casting it to something else first.)

2. There are no assumptions about the type of data contained in the
array.  (char[] arrays are assumed to be UTF-8 encoded.)

3. ubyte[] arrays are (AFAIK) not scanned by the GC.  (void[] arrays may
contain pointers and must therefore be scanned.)


This isn't exactly true.  Arrays *created* as void[] will be scanned.
Arrays created as ubyte[] and then cast to void[] will not be scanned.


However, it is far too easy while dealing with a void[] array to have it  
mysteriously flip its bit to scan-able.



I think the rule of thumb should be:  If the array contains raw data of
unspecified type, but no pointers or references, use ubyte[].

void[] is very useful for input parameters, however, since all arrays are
implicitly castable to void[]:

  void writeData(void[] data) { ... }

  writeData("Hello World!");
  writeData([1, 2, 3, 4]);


I think (and this differs from  my previous opinion) const(void)[] should  
be used for input parameters where any array type could be passed in.   
However, ubyte[] should be used for output parameters and for internal  
storage.  void[] just has too many pitfalls to be used anywhere but where  
its implicit casting ability is useful.
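
In code, the convention Steve describes would amount to something like this on
the wrapper (illustrative names and signatures only, not the actual API):

struct Http
{
    void postData(const(void)[] data) { /* copy into the request body */ }
    @property ubyte[] content()       { return received; }
    private ubyte[] received;         // internal storage stays plain bytes
}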


-Steve


Re: Curl support RFC

2011-03-14 Thread Jonas Drewsen

On 14/03/11 12.10, Johannes Pfau wrote:

Jonas Drewsen wrote:

Hi,

   So I've been working a bit on the etc.curl module. Currently most of
the HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in
phobos if I finish it with the current design. If not then it will be
for my own project only and doesn't need as much documentation or all
the features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot
of things that you can do with libcurl which I did not know so I'm
starting out small.

For now I've created all the declarations for the latest public curl
C api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently
works but before proceeding further down this road I would like to
get your comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );

//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ ": " ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick");
http.perform;

//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
"./downloaded-file"));


// ... authentication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas



Hi,
I really like the API. A few comments:

You use the internal curl progress meter. According to the
documentation (It's a little hidden, look at CURLOPT_NOPROGRESS) the
progress meter is likely to be removed in future curl versions. The
download progress should be easy to reimplement, although you'd have to
parse the Content-Length header. Upload shouldn't be too difficult either
(One problem: What does curl pass as ultotal/dltotal when chunked
encoding is used or the total size is not known?). Then we could also
use different delegates for upload/download.


I did see the notice about the future of NOPROGRESS's removal but 
decided to wrap it anyway. Maybe I should just remove it in an initial 
version. As you say it is pretty simple to implement ourselves.
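
A rough sketch of doing it on our side (names are illustrative; contentLength
would come from parsing the Content-Length header):

ulong contentLength;    // 0 when unknown (e.g. chunked encoding)
ulong received;
void delegate(ulong received, ulong total) onProgress;

void onData(const(ubyte)[] chunk)
{
    received += chunk.length;
    if (onProgress !is null)
        onProgress(received, contentLength);   // total == 0 means "unknown"
}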



The callback interface suits curl best and I actually like it, but how
will it interact with streams? As an example: If someone wrote a
stream/filter that decoded gzip for files it should be usable with
the http streams as well. But files/filestreams have a pull
interface (no callbacks, stream.read() in a loop). So how could a gzip
stream be written without too much code duplication supporting files and
the http stuff?


If we take Andrei's stream proposal as the base of a new streaming 
design then the http would just be another Transport. Files have a pull 
interface that blocks until data is read. The same could be done for
the http class.
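
Very roughly, the push-style receive callback could be wrapped in a pull-style
read() like this (buffering only; a real version would have to pump curl -
curl_multi or a worker thread - inside read() until data shows up), so a gzip
filter written against streams could sit on top:

import std.algorithm : min;

ubyte[] buffer;          // filled by the curl receive callback
bool transferDone;

void onReceive(const(ubyte)[] chunk)
{
    buffer ~= chunk;
}

size_t read(ubyte[] dest)
{
    // while (buffer.length == 0 && !transferDone) pumpCurlOnce();
    auto n = min(dest.length, buffer.length);
    dest[0 .. n] = buffer[0 .. n];
    buffer = buffer[n .. $];
    return n;   // 0 once the transfer has finished and the buffer is drained
}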


What I would really like is for the stream design to support 
non-blocking as mentioned in the stream proposal. Just have to figure 
out how the streaming API should behave in such cases I guess.




Do you plan to add some kind of support for header parsing? I think
something like what the .net webclient uses
( http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx )
would be great. Especially the HeaderCollection supporting headers as
strings and as data types (for both parsing and formatting), but
without a class hierarchy for the headers, using templates instead.


It would be nice to be able to get/set headers by string and enums 
(http://msdn.microsoft.com/en-us/library/system.net.httprequestheader.aspx). 
But I cannot see that .net is using datatypes or templates for it. Could 
you give me a pointer please?




I've written D parsers/formatters for almost all headers in
rfc2616 (1 or 2 might be missing) and for a few additional commonly
used headers (Content-Disposition, cookie headers). The parsers are
written with ragel and are to be used with curl (continuations must be
removed and the parsers always take 1 line of input, just as you get it
from curl). Right 

Re: Curl support RFC

2011-03-14 Thread Jacob Carlborg

On 2011-03-13 22:39, Jonas Drewsen wrote:

Hi,

So I've been working a bit on the etc.curl module. Currently most of the
HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in phobos
if I finish it with the current design. If not then it will be for my
own project only and doesn't need as much documentation or all the
features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );

//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ ": " ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick");
http.perform;

//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
"./downloaded-file"));


// ... authentication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas


I thought that the etc package was for C bindings and would expect the 
curl module to be placed in std.curl or std.net.curl.


--
/Jacob Carlborg


Re: Curl support RFC

2011-03-14 Thread Jonas Drewsen

On 14/03/11 13.28, Steven Schveighoffer wrote:

On Mon, 14 Mar 2011 07:20:26 -0400, Lars T. Kyllingstad
public@kyllingen.nospamnet wrote:


On Mon, 14 Mar 2011 02:36:07 -0700, Jonathan M Davis wrote:


On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:

On 13/03/11 23.44, Andrei Alexandrescu wrote:
 On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.

 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 Great! Could you please create a pull request for that?

Will do as soon as I've figured out howto create a pull request for a
single file in a branch. Anyone knows how to do that on github? Or
should I just create a pull request including the etc.curl wrapper as
well?


You can't. A pull request is for an entire branch. It pulls _everything_
from that branch which differs from the one being merged with. git cares
about commits, not files. And pulling from another repository pulls all
of the commits which you don't have. So, if you want to do a pull
request, you create a branch with exactly the commits that you wanted
merged in on it. No more, no less.


 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.

 //
 // Simple HTTP GET with sane defaults // provides the .content,
 .headers and .status //
 writeln( Http.get("http://www.google.com").content );

 Sweet. As has been discussed, often the content is not text so you
 may want to have content return ubyte[] and add a new property such
 as textContent or text.

I've already changed it to void[] as done in the std.file module. Is
ubyte[] better suited?


That's debatable. Some would argue one way, some another. Personally,
I'd argue ubyte[]. I don't like void[] one bit. Others would agree with
me, and yet others would disagree. I don't think that there's really a
general agreement on whether void[] or ubyte[] is better when it comes
to reading binary data like that.


I also think ubyte[] is best, because:

1. It can be used directly. (You can't get an element from a void[]
array without casting it to something else first.)

2. There are no assumptions about the type of data contained in the
array. (char[] arrays are assumed to be UTF-8 encoded.)

3. ubyte[] arrays are (AFAIK) not scanned by the GC. (void[] arrays may
contain pointers and must therefore be scanned.)


This isn't exactly true. arrays *created* as void[] will be scanned.
Arrays created as ubyte[] and then cast to void[] will not be scanned.

However, it is far too easy while dealing with a void[] array to have it
mysteriously flip its bit to scan-able.


I think the rule of thumb should be: If the array contains raw data of
unspecified type, but no pointers or references, use ubyte[].

void[] is very useful for input parameters, however, since all arrays are
implicitly castable to void[]:

void writeData(void[] data) { ... }

writeData("Hello World!");
writeData([1, 2, 3, 4]);


I think (and this differs from my previous opinion) const(void)[] should
be used for input parameters where any array type could be passed in.
However, ubyte[] should be used for output parameters and for internal
storage. void[] just has too many pitfalls to be used anywhere but where
its implicit casting ability is useful.

-Steve


const(ubyte)[] for input
void[] for output

that sounds reasonable. I guess that if everybody can agree on this then
all of phobos (e.g. std.file) should use the same types?


/Jonas



Re: Curl support RFC

2011-03-14 Thread Jonas Drewsen

On 13/03/11 23.44, Andrei Alexandrescu wrote:

You'll probably need to justify the existence of a class hierarchy and
what overridable methods there are. In particular, since you seem to
offer hooks via delegates, probably classes wouldn't be needed at all.
(FWIW I would've done the same; I wouldn't want to inherit just to
intercept the headers etc.)


Missed this one in my last reply.

Ftp/Http etc. are all inheriting from a Protocol class. The Protocol 
class defines common settings (@properties) for all protocols e.g. 
dnsTimeout, connectTimeout, networkInterface, url, port selection.


I could make these into a mixin and thereby get rid of the inheritance 
of course.
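
A sketch of what that mixin route could look like (names are made up): the
shared settings live in a mixin template instead of an abstract Protocol base
class.

mixin template ProtocolSettings()
{
    private string url_;
    private int dnsTimeout_, connectTimeout_;

    @property void url(string u)          { url_ = u; }
    @property void dnsTimeout(int ms)     { dnsTimeout_ = ms; }
    @property void connectTimeout(int ms) { connectTimeout_ = ms; }
}

class Http { mixin ProtocolSettings; /* HTTP-specific members */ }
class Ftp  { mixin ProtocolSettings; /* FTP-specific members */ }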


I think that keeping the Protocol as an abstract base class would 
benefit e.g. the integration with streams. In that case we could simply 
create a CurlTransport that contains a reference to Protocol-derived 
objects (Http, Ftp...).


Or would it be better to have specific HttpTransport, FtpTransport?


/Jonas


Re: Curl support RFC

2011-03-14 Thread Johannes Pfau
Jonas Drewsen wrote:
 Do you plan to add some kind of support for header parsing? I think
 something like what the .net webclient uses
 ( 
 http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx )
 would be great. Especially the HeaderCollection supporting headers as
 strings and as data types (for both parsing and formatting), but
 without a class hierarchy for the headers, using templates instead.

It would be nice to be able to get/set headers by string and enums 
(http://msdn.microsoft.com/en-us/library/system.net.httprequestheader.aspx). 
But I cannot see that .net is using datatypes or templates for it.
Could you give me a pointer please?


You're right I didn't look close enough at the .net documentation. I
thought HttpRequestHeader is a class. What I meant for D was something
like this:

struct ETagHeader
{
    //Data members
    bool Weak = false;
    string Value;

    //All header structs provide these
    static string Key = "ETag";

    static ETagHeader parse(string value)
    {
        //parser logic here
        ETagHeader etag;
        return etag;
    }

    void format(T)(T writer)
        if (isOutputRange!(T, string))
    {
        if (Weak)
            writer.put("W/");
        assert(Value != "");
        writer.put(quote(Value));
    }
}

Then we can offer methods like these:

setHeader(T)(T header)
if(isHeader(T))
{
headers[T.Key] = formatHeader(header);
}

T getHeader(T type)()
if(isHeader(T))
{
   if(!T.Key in headers)
   throw Exception();
   return T.parse(headers[T.key]);
}

So user code wouldn't have to deal with header parsing / formatting:
auto etag = client.getHeader!ETagHeader();
assert(etag.Weak);
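
The isHeader constraint used above is assumed; a minimal version of such a
trait could look like this (hypothetical, not part of the posted code):

// a type counts as a header if it has a static Key string and a static
// parse(string) that returns the type itself
template isHeader(T)
{
    enum isHeader = is(typeof(T.Key) : string)
        && is(typeof(T.parse("")) == T);
}

static assert(isHeader!ETagHeader);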

 I've written D parsers/formatters for almost all headers in
 rfc2616 (1 or 2 might be missing) and for a few additional commonly
 used headers (Content-Disposition, cookie headers). The parsers are
 written with ragel and are to be used with curl (continuations must
 be removed and the parsers always take 1 line of input, just as you
 get it from curl). Right now only the client side is implemented (no
 parsers for headers which can only be sent from client--server ).
 However, I need to add some more documentation to the parsers, need
 to do some refactoring and I've got absolutely no time for that in
 the next 2 weeks ('abitur' final exams). But if you could wait 2
 weeks or if you wanted to do the refactoring yourself, I would be
 happy to contribute that code.

That sounds very interesting. I would very much like to see the code
and see if it fits in.

Ok, here it is, but it seriously needs to be refactored and documented:
https://gist.github.com/869324

-- 
Johannes Pfau




Re: Curl support RFC

2011-03-14 Thread Andrei Alexandrescu

On 3/14/11 4:36 AM, Jonathan M Davis wrote:

That's debatable. Some would argue one way, some another. Personally, I'd argue
ubyte[]. I don't like void[] one bit. Others would agree with me, and yet others
would disagree. I don't think that there's really a general agreement on whether
void[] or ubyte[] is better when it comes to reading binary data like that.


void[]: There is a typed array underneath, but I forgot its exact type.

Evidence: all array types convert to void[] automatically.

ubyte[]: We're dealing with an array of octets here.

Evidence: ubyte[] has no special properties over T[].

All raw data reads should yield ubyte[], not void[]. This is because the 
user may or may not know that there's really a different type underneath, 
but the compiler and runtime have no such idea. So the burden of the 
assumption is on the user.


Raw data writes that take arrays could be allowed to accept void[] if 
implicit conversion from T[] is desirable.



Andrei


Re: Curl support RFC

2011-03-14 Thread Andrei Alexandrescu

On 3/14/11 10:06 AM, Jonas Drewsen wrote:

const(ubyte)[] for input
void[] for output

that sounds reasonable. I guess that if everybody can agree on this then
all of Phobos (e.g. std.file) should use the same types?


Move the const from the first to the second line :o). I see no reason 
why user code can't mess with the buffer once read.


Yes, I agree std.file et al should switch to ubyte[].

Andrei



Re: Curl support RFC

2011-03-14 Thread Andrei Alexandrescu

On 3/14/11 10:38 AM, Jonas Drewsen wrote:

On 13/03/11 23.44, Andrei Alexandrescu wrote:

You'll probably need to justify the existence of a class hierarchy and
what overridable methods there are. In particular, since you seem to
offer hooks via delegates, probably classes wouldn't be needed at all.
(FWIW I would've done the same; I wouldn't want to inherit just to
intercept the headers etc.)


Missed this one in my last reply.

Ftp/Http etc. are all inheriting from a Protocol class. The Protocol
class defines common settings (@properties) for all protocols e.g.
dnsTimeout, connectTimeout, networkInterface, url, port selection.

I could make these into a mixin and thereby get rid of the inheritance
of course.


Use Occam's razor and the path of least resistance to get the most 
natural interface.



I think that keeping the Protocol as an abstract base class would
benefit e.g. the integration with streams. In that case we could simply
create a CurlTransport that contains a reference to a Protocol derived
objects (Http,Ftp...).

Or would it be better to have specific HttpTransport, FtpTransport?


Count the commonalities and the differences and then make an executive 
decision.



Andrei


Re: Curl support RFC

2011-03-14 Thread Andrei Alexandrescu

On 3/14/11 4:16 AM, Jonas Drewsen wrote:

On 13/03/11 23.44, Andrei Alexandrescu wrote:

Sweet. As has been discussed, often the content is not text so you may
want to have content return ubyte[] and add a new property such as
textContent or text.


I've already changed it to void[] as done in the std.file module. Is
ubyte[] better suited?


Yah, as per the ensuing discussion.


As discussed, properties may be better here than setXxx and getXxx. The
setReceiveCallback hook should take a ubyte[]. The
setReceiveHeaderCallback should take a const(char)[]. That way you won't
need to copy all headers, leaving safely that option to the client.


I've already replaced the set/get methods with properties and renamed
them. Hadn't thought of using const(char)[].. thanks for the hint.


A good general guideline: make sure that the user could easily and 
safely use a loop that reads a large http stream (with hooks and all) 
without allocating one item each pass through the loop.
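
For instance (just a sketch; the onContent name is only a proposal from
elsewhere in the thread, and the buffer reuse is assumed, not actual
behaviour), the receive hook could hand out slices of a buffer owned and
reused by the Http object, so a loop over a large stream copies nothing
unless the user explicitly asks for it:

size_t total;
http.onContent = (ubyte[] chunk) {
    // chunk points into a reused internal buffer and is only valid
    // until the next callback; .dup it if it must be kept around
    total += chunk.length;      // inspect in place, no allocation
    // auto kept = chunk.dup;   // copy only when needed
};
http.perform;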



Regarding a range interface, it would be great if you allowed e.g.

foreach (line; Http.get("https://mail.google.com").byLine()) {
...
}

The data transfer should happen concurrently with the foreach code. The
type of line is char[] or const(char)[]. Similarly, there would be a
byChunk interface that transfers in ubyte[] chunks.

Also we need a head() method for the corresponding command.

Andrei


That would be neat. What do you mean about concurrent data transfers
with foreach?


Assume the body of the loop does some time-consuming processing - like 
e.g. writing to another HTTP stream. Then your network reads should not 
wait for that processing. While the user code does something, you should 
already have the next transfer in flight.


Example: a utility that efficiently uses GET from one http source and 
uses the data to POST it to an http target should be an efficient 
few-liner. (FTP versions and mixed ones too.)



Andrei


Re: Curl support RFC

2011-03-14 Thread Jonas Drewsen

On 14/03/11 16.40, Johannes Pfau wrote:

Jonas Drewsen wrote:

Do you plan to add some kind of support for header parsing? I think
something like what the .net webclient uses
( http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx )
would be great. Especially the HeaderCollection supporting headers as
strings and as data types (for both parsing and formatting), but
without a class hierarchy for the headers, using templates instead.


It would be nice to be able to get/set headers by string and enums
(http://msdn.microsoft.com/en-us/library/system.net.httprequestheader.aspx).
But I cannot see that .net is using datatypes or templates for it.
Could you give me a pointer please?



You're right I didn't look close enough at the .net documentation. I
thought HttpRequestHeader is a class. What I meant for D was something
like this:

struct ETagHeader
{
    //Data members
    bool Weak = false;
    string Value;

    //All header structs provide these
    static string Key = "ETag";

    static ETagHeader parse(string value)
    {
        //parser logic here
        return ETagHeader.init; // placeholder
    }

    void format(T)(T writer)
        if (isOutputRange!(T, string))
    {
        if(Weak)
            writer.put("W/");
        assert(Value != "");
        writer.put(quote(Value));
    }
}

Then we can offer methods like these:

void setHeader(T)(T header)
    if(isHeader!T)
{
    headers[T.Key] = formatHeader(header);
}

T getHeader(T)()
    if(isHeader!T)
{
    if(!(T.Key in headers))
        throw new Exception("Header not found: " ~ T.Key);
    return T.parse(headers[T.Key]);
}

So user code wouldn't have to deal with header parsing / formatting:
auto etag = client.getHeader!ETagHeader();
assert(etag.Weak);


Seems like a very nice addition. I will have a look at your github and 
probably wait until you have made it ready for consumption before adding 
it :)



I've written D parsers/formatters for almost all headers in
rfc2616 (1 or 2 might be missing) and for a few additional commonly
used headers (Content-Disposition, cookie headers). The parsers are
written with ragel and are to be used with curl (continuations must
be removed and the parsers always take 1 line of input, just as you
get it from curl). Right now only the client side is implemented (no
parsers for headers which can only be sent from client--server ).
However, I need to add some more documentation to the parsers, need
to do some refactoring and I've got absolutely no time for that in
the next 2 weeks ('abitur' final exams). But if you could wait 2
weeks or if you wanted to do the refactoring yourself, I would be
happy to contribute that code.


That sounds very interesting. I would very much like to see the code
and see if fits in.


Ok, here it is, but it seriously needs to be refactored and documented:
https://gist.github.com/869324





Re: Curl support RFC

2011-03-14 Thread Jonas Drewsen

On 14/03/11 18.46, Andrei Alexandrescu wrote:

On 3/14/11 10:06 AM, Jonas Drewsen wrote:

const(ubyte)[] for input
void[] for output

that sounds reasonable. I guess that if everybody can agree on this then
all of Phobos (e.g. std.file) should use the same types?


Move the const from the first to the second line :o). I see no reason
why user code can't mess with the buffer once read.


You are right of course. bummer.


Yes, I agree std.file et al should switch to ubyte[].

Andrei


Then let's hope someone makes a patch for it. Maybe I'll make it when I'm 
done with the curl stuff if no one beats me to it.


/Jonas




Re: Curl support RFC

2011-03-14 Thread Jonas Drewsen

On 14/03/11 18.55, Andrei Alexandrescu wrote:

On 3/14/11 4:16 AM, Jonas Drewsen wrote:

On 13/03/11 23.44, Andrei Alexandrescu wrote:

Sweet. As has been discussed, often the content is not text so you may
want to have content return ubyte[] and add a new property such as
textContent or text.


I've already changed it to void[] as done in the std.file module. Is
ubyte[] better suited?


Yah, as per the ensuing discussion.


As discussed, properties may be better here than setXxx and getXxx. The
setReceiveCallback hook should take a ubyte[]. The
setReceiveHeaderCallback should take a const(char)[]. That way you won't
need to copy all headers, leaving safely that option to the client.


I've already replaced the set/get methods with properties and renamed
them. Hadn't thought of using const(char)[].. thanks for the hint.


A good general guideline: make sure that the user could easily and
safely use a loop that reads a large http stream (with hooks and all)
without allocating one item each pass through the loop.


Makes sense. I'll keep that in mind.


Regarding a range interface, it would be great if you allowed e.g.

foreach (line; Http.get("https://mail.google.com").byLine()) {
...
}

The data transfer should happen concurrently with the foreach code. The
type of line is char[] or const(char)[]. Similarly, there would be a
byChunk interface that transfers in ubyte[] chunks.

Also we need a head() method for the corresponding command.

Andrei


That would be neat. What do you mean about concurrent data transfers
with foreach?


Assume the body of the loop does some time-consuming processing - like
e.g. writing to another HTTP stream. Then your network reads should not
wait for that processing. While the user code does something, you should
already have the next transfer in flight.

Example: a utility that efficiently uses GET from one http source and
uses the data to POST it to an http target should be an efficient
few-liner. (FTP versions and mixed ones too.)


Andrei


I get it. Any existing implementation that does this I can have a look at?

/Jonas



Re: Curl support RFC

2011-03-14 Thread Andrei Alexandrescu

On 3/14/11 4:11 PM, Jonas Drewsen wrote:

On 14/03/11 18.55, Andrei Alexandrescu wrote:

Assume the body of the loop does some time-consuming processing - like
e.g. writing to another HTTP stream. Then your network reads should not
wait for that processing. While the user code does something, you should
already have the next transfer in flight.

Example: a utility that efficiently uses GET from one http source and
uses the data to POST it to an http target should be an efficient
few-liner. (FTP versions and mixed ones too.)


Andrei


I get it. Any existing implementation that does this I can have a look at?


Unfortunately not at the moment. I wanted to define such a thing for 
std.stdio called byLineAsync and byChunkAsync but never got to it.


The basic idea is:

1. Define a new range type, e.g. AsyncHttpInputRange

2. Inside that range start a secondary thread that does the actual 
transfer and passes read buffers to the main thread by means of messages


3. See std.concurrency and the free chapter 
http://www.informit.com/articles/printerfriendly.aspx?p=1609144 for details


4. Control congestion (too many buffers in flight) with setMaxMailboxSize.

5. Make sure you have a little protocol that stops the secondary thread 
when the range is destroyed.
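
A very rough sketch of points 1-4 (names are made up, there are no real
libcurl calls, and the shutdown protocol from point 5 is left out):

import std.concurrency;

struct AsyncHttpByChunk
{
    Tid worker;
    immutable(ubyte)[] chunk;
    bool done;

    this(string url)
    {
        // point 4: limit the number of buffers in flight; the worker
        // blocks when our mailbox is full
        setMaxMailboxSize(thisTid, 16, OnCrowding.block);
        // point 2: a secondary thread does the actual transfer
        worker = spawn(&fetch, thisTid, url);
        popFront();
    }

    @property bool empty() { return done; }
    @property immutable(ubyte)[] front() { return chunk; }

    void popFront()
    {
        // point 3: buffers arrive as messages from the worker
        chunk = receiveOnly!(immutable(ubyte)[])();
        done = chunk.length == 0;   // empty chunk marks end of transfer
    }

    static void fetch(Tid owner, string url)
    {
        // the real thing would drive libcurl here; this just sends a
        // few dummy chunks followed by an empty terminator
        foreach (i; 0 .. 4)
            send(owner, cast(immutable(ubyte)[]) "dummy chunk\n");
        immutable(ubyte)[] eof;
        send(owner, eof);
    }
}

// foreach works since this is an input range:
// foreach (chunk; AsyncHttpByChunk("http://www.example.com")) { ... }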



Andrei



Re: Curl support RFC

2011-03-13 Thread Jesse Phillips
Jonas Drewsen Wrote:

 Just tried the property stuff out but it seems a bit inconsistent. Maybe 
 someone can enlighten me:
 
 import std.stdio;
 
 alias void delegate() deleg;
 
 class T {
private deleg tvalue;
@property void prop(deleg dg) {
  tvalue = dg;
}
@property deleg prop() {
  return tvalue;
}
 }
 
 void main(string[] args) {
T t = new T;
   t.prop = { writeln("fda"); };

   // Seems a bit odd that assigning to a temporary (tvalue) suddenly
   // changes the behaviour.
   auto tvalue = t.prop;
   tvalue(); // Works as expected by printing "fda"
t.prop(); // Just returns the delegate!
 
// Shouldn't the @property attribute ensure that no () is needed
// when using the property
t.prop()(); // Works
 }
 
 /Jonas

Ah, yes. One of the big reasons for introducing @property was because returning 
delegates could be very confusing in terms of whether the delegate is called or 
returned from the function. Since the old system has not yet been ripped out, 
@property basically does nothing except under some conditions where it will 
complain that you have added a ().

So the situation should improve, but I really don't know how or when things 
will change.


Re: Curl support RFC

2011-03-13 Thread Jonas Drewsen

Hi,

  So I've been working a bit on the etc.curl module. Currently most of 
the HTTP functionality is done and some very simple Ftp.


I would very much like to know if this has a chance of getting into Phobos 
if I finish it with the current design. If not then it will be for my 
own project only and doesn't need as much documentation or all the features.


https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );

//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ ": " ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick");
http.perform;

//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
"./downloaded-file"));


// ... authentication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas




Re: Curl support RFC

2011-03-13 Thread Andrei Alexandrescu

On 3/11/11 9:20 AM, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.


Great! Could you please create a pull request for that?


On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );


Sweet. As has been discussed, often the content is not text so you may 
want to have content return ubyte[] and add a new property such as 
textContent or text.



//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");


You'll probably need to justify the existence of a class hierarchy and 
what overridable methods there are. In particular, since you seem to 
offer hooks via delegates, probably classes wouldn't be needed at all. 
(FWIW I would've done the same; I wouldn't want to inherit just to 
intercept the headers etc.)



http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ ": " ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;


As discussed, properties may be better here than setXxx and getXxx. The 
setReceiveCallback hook should take a ubyte[]. The 
setReceiveHeaderCallback should take a const(char)[]. That way you won't 
need to copy all headers, leaving safely that option to the client.



//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick");
http.perform;


setPostData -> setTextPostData, and then changing everything to 
properties would make it something like textPostData. Or wait, there 
could be some overloading going on... Anyway, the basic idea is that 
generally get and post data could be raw bytes, and the user could elect 
to transfer strings instead.
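
Sketch of that idea (hypothetical signatures, not the current code): raw
bytes are the primitive and the string setter is just sugar on top of it:

@property void postData(const(void)[] data) { /* set raw request body */ }
@property void postData(string data)
{
    // forward to the raw-bytes overload
    postData(cast(const(void)[]) data);
}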



//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;


The callback would take ubyte[].


//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
"./downloaded-file"));


// ... authentication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas


This is all very encouraging. I think this API covers nicely a variety 
of needs. We need to make sure everything interacts well with threads, 
in particular that one can shut down a transfer (or the entire library) 
from a thread or callback and have the existing transfer(s) throw an 
exception immediately.


Regarding a range interface, it would be great if you allowed e.g.

foreach (line; Http.get("https://mail.google.com").byLine()) {
   ...
}

The data transfer should happen concurrently with the foreach code. The 
type of line is char[] or const(char)[]. Similarly, there would be a 
byChunk interface that transfers in ubyte[] chunks.


Also we need a head() method for the corresponding command.


Andrei


Re: Curl support RFC

2011-03-12 Thread Jonas Drewsen

Thank you.

Regarding scalability: In my experience the fastest network handling for 
multiple concurrent requests is done asynchronously using select or epoll. 
The current wrapper would probably use threading and messages to handle 
multiple concurrent requests, which is not as efficient.


Usually you only need this kind of scalability for server-side 
networking and not for the client side that libcurl provides, so I do not see 
this as a major issue for an initial version.


I do know how to support epoll/select-based curl, and with that better 
scalability, and fortunately that would just be an extension to the API 
I've shown. Currently I will focus on getting the common things finished 
and rock solid.


/Jonas


On 11/03/11 16.30, dsimcha wrote:

I don't know much about this kind of stuff except that I use it for very simple
use cases occasionally.  One thing I'll definitely give your design credit for,
based on your examples, is making simple things simple.  I don't know how it
scales to more complex use cases (not saying it doesn't, just that I'm not
qualified to evaluate that), but I definitely would use this.  Nice work.

BTW, what is the license status of libcurl?  According to Wikipedia it's MIT
licensed.  Where does that leave us with regard to the binary attribution issue?

== Quote from Jonas Drewsen (jdrew...@nospam.com)'s article

Hi,
 So I've spent some time trying to wrap libcurl for D. There is a lot
of things that you can do with libcurl which I did not know so I'm
starting out small.
For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.
On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.
//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get(http://www.google.com;).content );
//
// GET with custom data receiver delegates
//
Http http = new Http(http://www.google.dk;);
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ : ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;
//
// POST with some timouts
//
http.setUrl(http://www.testing.com/test.cgi;);
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData(The quick);
http.perform;
//
// PUT with data sender delegate
//
string msg = Hello world;
size_t len = msg.length; /* using chuncked transfer if omitted */
http.setSendCallback( delegate size_t(char[] data) {
  if (msg.empty) return 0;
  auto l = msg.length;
  data[0..l] = msg[0..$];
  msg.length = 0;
  return l;
  },
  HttpMethod.put, len );
http.perform;
//
// HTTPS
//
writeln(Http.get(https://mail.google.com;).content);
//
// FTP
//
writeln(Ftp.get(ftp://ftp.digitalmars.com/sieve.ds;,
  ./downloaded-file));
// ... authenication, cookies, interface select, progress callback
// etc. is also implemented this way.
/Jonas






Re: Curl support RFC

2011-03-12 Thread Jonas Drewsen

On 11/03/11 19.31, Jacob Carlborg wrote:

On 2011-03-11 16:20, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get(http://www.google.com;).content );

//
// GET with custom data receiver delegates
//
Http http = new Http(http://www.google.dk;);
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ : ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timouts
//
http.setUrl(http://www.testing.com/test.cgi;);
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData(The quick);
http.perform;

//
// PUT with data sender delegate
//
string msg = Hello world;
size_t len = msg.length; /* using chuncked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get(https://mail.google.com;).content);

//
// FTP
//
writeln(Ftp.get(ftp://ftp.digitalmars.com/sieve.ds;,
./downloaded-file));


// ... authenication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas


Is there support for other HTTP methods/verbs in the D wrapper, like
delete?



Yes, all methods in libcurl are supported.

/Jonas


Re: Curl support RFC

2011-03-12 Thread Jonas Drewsen

On 11/03/11 22.21, Jesse Phillips wrote:

I'll make some comments on the API. Do we have to choose Http/Ftp...? The URI 
already contains this, I could see being able to specifically request one or 
the other for performance or so www.google.com works.


That is a good question.

The problem with creating a grand unified Curl class that does it all is 
that each protocol supports different things, i.e. http supports cookie 
handling and http redirection, ftp supports passive/active mode and dir 
listings and so on.


I think it would confuse the user of the API if e.g. he were allowed to 
set cookies on his ftp request.


The protocols supported (Http, Ftp,... classes) do have a base class 
Protocol that implements common things like timeouts etc.




And what about properties? They tend to be very nice instead of set methods. 
examples below.


Actually I thought of this and went the usual C++ way of _not_ using 
public properties but using accessor methods. Are public properties 
accepted as the D way, and if so, what about the usual reasons why 
you should use accessor methods (like encapsulation and tolerance to 
future changes to the API)?


I do like the shorter onHeader/onContent much better though :)

/Jonas


Jonas Drewsen Wrote:


//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get(http://www.google.com;).content );

//
// GET with custom data receiver delegates
//
Http http = new Http(http://www.google.dk;);
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ : ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;


http.onHeader = (string key, string value) {...};
http.onContent = (string data) { ... };
http.perform();




Re: Curl support RFC

2011-03-12 Thread Jonas Drewsen

On 12/03/11 05.30, Ary Manzana wrote:

On 3/11/11 12:20 PM, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.


I *love* it.

All APIs should be like yours. One-liners for what you want right now.
If it's a little more complex, some more lines. This is perfect.

Congratulations!


Thank you! Words like these keep up the motivation.

/Jonas


Re: Curl support RFC

2011-03-12 Thread Jonas Drewsen

On 11/03/11 17.33, Vladimir Panteleev wrote:

On Fri, 11 Mar 2011 17:20:38 +0200, Jonas Drewsen jdrew...@nospam.com
wrote:


writeln( Http.get("http://www.google.com").content );


Does this return a string? What if the page's encoding isn't UTF-8?

Data should probably be returned as void[], similar to std.file.read.


Currently it returns a string, but should probably return void[] as you 
suggest.


Maybe the interface should be something like this to support misc. 
encodings (like std.file.readText does):


class Http {
struct Result(S) {
S content;
...
}
static Result!S get(S = void[])(in string url);

}
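
Hypothetical usage of the sketch above: pick the element type at the call
site, or take the default void[]:

auto text = Http.get!(char[])("http://www.google.com").content; // decoded, readText-style
auto raw  = Http.get("http://www.google.com").content;          // void[]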

Actually I just took a look at Andrei's std.stream2 suggestion and 
Http/Ftp... Transports would be pretty neat to have as well for reading 
formatted data.


I'll follow the newly spawned Stream proposal thread on this one :)

/Jonas


Re: Curl support RFC

2011-03-12 Thread Lutger Blijdestijn
Jonas Drewsen wrote:

 On 11/03/11 22.21, Jesse Phillips wrote:
 I'll make some comments on the API. Do we have to choose Http/Ftp...? The
 URI already contains this, I could see being able to specifically request
 one or the other for performance or so www.google.com works.
 
 That is a good question.
 
 The problem with creating a grand unified Curl class that does it all is
 that each protocol supports different things ie. http supports cookie
 handling and http redirection, ftp supports passive/active mode and dir
 listings and so on.
 
 I think it would confuse the user of the API if e.g. he were allowed to
 set cookies on his ftp request.
 
 The protocols supported (Http, Ftp,... classes) do have a base class
 Protocol that implements common things like timouts etc.
 
 
 And what about properties? They tend to be very nice instead of set
 methods. examples below.
 
 Actually I thought off this and went the usual C++ way of _not_ using
 public properties but use accessor methods. Is public properties
 accepted as the D way and if so what about the usual reasons about why
 you should use accessor methods (like encapsulation and tolerance to
 future changes to the API)?
 
 I do like the shorter onHeader/onContent much better though :)
 
 /Jonas

Properties *are* accessor methods, with some sugar. In fact you already have 
used them, try it:

http.setReceiveHeaderCallback =  (string key, string value) {
writeln(key ~ ": " ~ value);
};

Marking a function with @property just signals its intended use, in which 
case it's nicer to drop the get/set prefixes. Supposedly using parentheses 
with such declarations will be outlawed in the future, but I don't think 
that's the case currently.
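
For the curl wrapper that would mean the setters could be declared roughly
like this (a sketch using the shorter onHeader/onContent names suggested
earlier, not the current code):

class Http
{
    private void delegate(string, string) headerDg;
    private void delegate(string) contentDg;

    @property void onHeader(void delegate(string, string) dg) { headerDg = dg; }
    @property void onContent(void delegate(string) dg)        { contentDg = dg; }
}

// used as:
// http.onHeader = (string key, string value) { writeln(key, ": ", value); };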

 Jonas Drewsen Wrote:

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get(http://www.google.com;).content );

 //
 // GET with custom data receiver delegates
 //
 Http http = new Http(http://www.google.dk;);
 http.setReceiveHeaderCallback( (string key, string value) {
 writeln(key ~ : ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

 http.onHeader = (string key, string value) {...};
 http.onContent = (string data) { ... };
 http.perform();



Re: Curl support RFC

2011-03-12 Thread Jesse Phillips
Jonas Drewsen Wrote:

 On 11/03/11 22.21, Jesse Phillips wrote:
  I'll make some comments on the API. Do we have to choose Http/Ftp...? The 
  URI already contains this, I could see being able to specifically request 
  one or the other for performance or so www.google.com works.
 
 That is a good question.
 
 The problem with creating a grand unified Curl class that does it all is 
 that each protocol supports different things ie. http supports cookie 
 handling and http redirection, ftp supports passive/active mode and dir 
 listings and so on.
 
 I think it would confuse the user of the API if e.g. he were allowed to 
 set cookies on his ftp request.
 
 The protocols supported (Http, Ftp,... classes) do have a base class 
 Protocol that implements common things like timouts etc.

Ah. I guess I was just thinking that if you want to download some file, you 
don't really care where you are getting it from; you just have the URL and are 
ready to go.

  And what about properties? They tend to be very nice instead of set 
  methods. examples below.
 
 Actually I thought off this and went the usual C++ way of _not_ using 
 public properties but use accessor methods. Is public properties 
 accepted as the D way and if so what about the usual reasons about why 
 you should use accessor methods (like encapsulation and tolerance to 
 future changes to the API)?
 
 I do like the shorter onHeader/onContent much better though :)

D was originally very friendly with properties. Your code can at this moment 
be written: 

http.setReceiveHeaderCallback = (string key, string value) {
writeln(key ~ ": " ~ value);
};

But that is going to be deprecated in favour of the @property attribute. You are 
probably aware of properties in C#, so yes, D is fine with public fields and 
functions that look like public fields.

Otherwise this looks really good and I do hope to see it in Phobos.



Re: Curl support RFC

2011-03-12 Thread Jonas Drewsen

On 12/03/11 20.44, Jesse Phillips wrote:

Jonas Drewsen Wrote:


On 11/03/11 22.21, Jesse Phillips wrote:

I'll make some comments on the API. Do we have to choose Http/Ftp...? The URI 
already contains this, I could see being able to specifically request one or 
the other for performance or so www.google.com works.


That is a good question.

The problem with creating a grand unified Curl class that does it all is
that each protocol supports different things ie. http supports cookie
handling and http redirection, ftp supports passive/active mode and dir
listings and so on.

I think it would confuse the user of the API if e.g. he were allowed to
set cookies on his ftp request.

The protocols supported (Http, Ftp,... classes) do have a base class
Protocol that implements common things like timouts etc.


Ah. I guess I was just thinking about if you want to download some file, you 
don't really care where you are getting it from you just have the URL and are 
read to go.


There should definitely be a simple method based only on an url. I'll 
put that in.




And what about properties? They tend to be very nice instead of set methods. 
examples below.


Actually I thought off this and went the usual C++ way of _not_ using
public properties but use accessor methods. Is public properties
accepted as the D way and if so what about the usual reasons about why
you should use accessor methods (like encapsulation and tolerance to
future changes to the API)?

I do like the shorter onHeader/onContent much better though :)


D was originally very friendly with properties. Your could can at this moment 
be written:

http.setReceiveHeaderCallback = (string key, string value) {
 writeln(key ~ : ~ value);
};

But is going to be deprecated for the use of the @property attribute. You are 
probably aware of properties in C#, so yes D is fine with public fields and 
functions that look like public fields.


Just tried the property stuff out but it seems a bit inconsistent. Maybe 
someone can enlighten me:


import std.stdio;

alias void delegate() deleg;

class T {
  private deleg tvalue;
  @property void prop(deleg dg) {
tvalue = dg;
  }
  @property deleg prop() {
return tvalue;
  }
}

void main(string[] args) {
  T t = new T;
  t.prop = { writeln("fda"); };

  // Seems a bit odd that assigning to a temporary (tvalue) suddenly
  // changes the behaviour.
  auto tvalue = t.prop;
  tvalue(); // Works as expected by printing "fda"
  t.prop(); // Just returns the delegate!

  // Shouldn't the @property attribute ensure that no () is needed
  // when using the property
  t.prop()(); // Works
}

/Jonas





Otherwise this looks really good and I do hope to see it in Phobos.





Re: Curl support RFC

2011-03-12 Thread Jonathan M Davis
On Saturday 12 March 2011 13:51:37 Jonas Drewsen wrote:
 On 12/03/11 20.44, Jesse Phillips wrote:
  Jonas Drewsen Wrote:
  On 11/03/11 22.21, Jesse Phillips wrote:
  I'll make some comments on the API. Do we have to choose Http/Ftp...?
  The URI already contains this, I could see being able to specifically
  request one or the other for performance or so www.google.com works.
  
  That is a good question.
  
  The problem with creating a grand unified Curl class that does it all is
  that each protocol supports different things ie. http supports cookie
  handling and http redirection, ftp supports passive/active mode and dir
  listings and so on.
  
  I think it would confuse the user of the API if e.g. he were allowed to
  set cookies on his ftp request.
  
  The protocols supported (Http, Ftp,... classes) do have a base class
  Protocol that implements common things like timouts etc.
  
  Ah. I guess I was just thinking about if you want to download some file,
  you don't really care where you are getting it from you just have the
  URL and are read to go.
 
 There should definitely be a simple method based only on an url. I'll
 put that in.
 
  And what about properties? They tend to be very nice instead of set
  methods. examples below.
  
  Actually I thought off this and went the usual C++ way of _not_ using
  public properties but use accessor methods. Is public properties
  accepted as the D way and if so what about the usual reasons about why
  you should use accessor methods (like encapsulation and tolerance to
  future changes to the API)?
  
  I do like the shorter onHeader/onContent much better though :)
  
  D was originally very friendly with properties. Your could can at this
  moment be written:
  
  http.setReceiveHeaderCallback = (string key, string value) {
  
   writeln(key ~ : ~ value);
  
  };
  
  But is going to be deprecated for the use of the @property attribute. You
  are probably aware of properties in C#, so yes D is fine with public
  fields and functions that look like public fields.
 
 Just tried the property stuff out but it seems a bit inconsistent. Maybe
 someone can enlighten me:
 
 import std.stdio;
 
 alias void delegate() deleg;
 
 class T {
private deleg tvalue;
@property void prop(deleg dg) {
  tvalue = dg;
}
@property deleg prop() {
  return tvalue;
}
 }
 
 void main(string[] args) {
T t = new T;
t.prop = { writeln("fda"); };

// Seems a bit odd that assigning to a temporary (tvalue) suddenly
// changes the behaviour.
auto tvalue = t.prop;
tvalue(); // Works as expected by printing "fda"
t.prop(); // Just returns the delegate!
 
// Shouldn't the @property attribute ensure that no () is needed
// when using the property
t.prop()(); // Works
 }

@property doesn't currently enforce much of anything. Things are in a
transitory state with regards to property. Originally, there was no such
thing as @property and any function which had no parameters and returned a
value could be used as a getter and any function which returned nothing and
took a single argument could be used as a setter. It was decided to make it
more restrictive, so @property was added. Eventually, you will _only_ be
able to use such functions as property functions if they are marked with
@property, and you will _have_ to call them with the property syntax and
will _not_ be able to call non-property functions with the property syntax.
However, at the moment, the compiler doesn't enforce that. It will
eventually, but there are several bugs with regards to property functions
(they mostly work, but you found one of the cases where they don't), and it
probably wouldn't be a good idea to enforce it until more of those bugs
have been fixed.

- Jonathan M Davis


Re: Curl support RFC

2011-03-12 Thread Jonas Drewsen

On 13/03/11 00.28, Jonathan M Davis wrote:

On Saturday 12 March 2011 13:51:37 Jonas Drewsen wrote:

On 12/03/11 20.44, Jesse Phillips wrote:

Jonas Drewsen Wrote:

On 11/03/11 22.21, Jesse Phillips wrote:

I'll make some comments on the API. Do we have to choose Http/Ftp...?
The URI already contains this, I could see being able to specifically
request one or the other for performance or so www.google.com works.


That is a good question.

The problem with creating a grand unified Curl class that does it all is
that each protocol supports different things ie. http supports cookie
handling and http redirection, ftp supports passive/active mode and dir
listings and so on.

I think it would confuse the user of the API if e.g. he were allowed to
set cookies on his ftp request.

The protocols supported (Http, Ftp,... classes) do have a base class
Protocol that implements common things like timouts etc.


Ah. I guess I was just thinking about if you want to download some file,
you don't really care where you are getting it from you just have the
URL and are read to go.


There should definitely be a simple method based only on an url. I'll
put that in.


And what about properties? They tend to be very nice instead of set
methods. examples below.


Actually I thought off this and went the usual C++ way of _not_ using
public properties but use accessor methods. Is public properties
accepted as the D way and if so what about the usual reasons about why
you should use accessor methods (like encapsulation and tolerance to
future changes to the API)?

I do like the shorter onHeader/onContent much better though :)


D was originally very friendly with properties. Your could can at this
moment be written:

http.setReceiveHeaderCallback = (string key, string value) {

  writeln(key ~ : ~ value);

};

But is going to be deprecated for the use of the @property attribute. You
are probably aware of properties in C#, so yes D is fine with public
fields and functions that look like public fields.


Just tried the property stuff out but it seems a bit inconsistent. Maybe
someone can enlighten me:

import std.stdio;

alias void delegate() deleg;

class T {
private deleg tvalue;
@property void prop(deleg dg) {
  tvalue = dg;
}
@property deleg prop() {
  return tvalue;
}
}

void main(string[] args) {
T t = new T;
t.prop = { writeln("fda"); };

// Seems a bit odd that assigning to a temporary (tvalue) suddenly
// changes the behaviour.
auto tvalue = t.prop;
tvalue(); // Works as expected by printing "fda"
t.prop(); // Just returns the delegate!

// Shouldn't the @property attribute ensure that no () is needed
// when using the property
t.prop()(); // Works
}


@property doesn't currently enforce much of anything. Things are in a transitory
state with regards to property. Originally, there was no such thing as @property
and any function which had no parameters and returned a value could be used as a
getter and any function which returned nothing and took a single argument could
be used as a setter. It was decided to make it more restrictive, so @property
was added. Eventually, you will _only_ be able to use such functions as property
functions if they are marked with @property, and you will _have_ to call them
with the property syntax and will _not_ be able to call non-property functions
with the property syntax. However, at the moment, the compiler doesn't enforce
that. It will eventually, but there are several bugs with regards to property
functions (they mostly work, but you found one of the cases where they don't),
and it probably wouldn't be a good idea to enforce it until more of those bugs
have been fixed.

- Jonathan M Davis


Okay... nice to hear that this is coming up.

Thanks again!
/Jonas




Re: Curl support RFC

2011-03-11 Thread dsimcha
I don't know much about this kind of stuff except that I use it for very simple
use cases occasionally.  One thing I'll definitely give your design credit for,
based on your examples, is making simple things simple.  I don't know how it
scales to more complex use cases (not saying it doesn't, just that I'm not
qualified to evaluate that), but I definitely would use this.  Nice work.

BTW, what is the license status of libcurl?  According to Wikipedia it's MIT
licensed.  Where does that leave us with regard to the binary attribution issue?

== Quote from Jonas Drewsen (jdrew...@nospam.com)'s article
 Hi,
 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.
 For now I've created all the declarations for the latest public curl C
 api. I have put that in the etc.c.curl module.
 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently works
 but before proceeding further down this road I would like to get your
 comments on it.
 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get(http://www.google.com;).content );
 //
 // GET with custom data receiver delegates
 //
 Http http = new Http(http://www.google.dk;);
 http.setReceiveHeaderCallback( (string key, string value) {
   writeln(key ~ : ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;
 //
 // POST with some timouts
 //
 http.setUrl(http://www.testing.com/test.cgi;);
 http.setReceiveCallback( (string data) { writeln(data); } );
 http.setConnectTimeout(1000);
 http.setDataTimeout(1000);
 http.setDnsTimeout(1000);
 http.setPostData(The quick);
 http.perform;
 //
 // PUT with data sender delegate
 //
 string msg = Hello world;
 size_t len = msg.length; /* using chuncked transfer if omitted */
 http.setSendCallback( delegate size_t(char[] data) {
  if (msg.empty) return 0;
  auto l = msg.length;
  data[0..l] = msg[0..$];
  msg.length = 0;
  return l;
  },
  HttpMethod.put, len );
 http.perform;
 //
 // HTTPS
 //
 writeln(Http.get(https://mail.google.com;).content);
 //
 // FTP
 //
 writeln(Ftp.get(ftp://ftp.digitalmars.com/sieve.ds;,
  ./downloaded-file));
 // ... authenication, cookies, interface select, progress callback
 // etc. is also implemented this way.
 /Jonas



Re: Curl support RFC

2011-03-11 Thread Vladimir Panteleev
On Fri, 11 Mar 2011 17:20:38 +0200, Jonas Drewsen jdrew...@nospam.com  
wrote:



writeln( Http.get("http://www.google.com").content );


Does this return a string? What if the page's encoding isn't UTF-8?

Data should probably be returned as void[], similar to std.file.read.

--
Best regards,
 Vladimir  mailto:vladi...@thecybershadow.net


Re: Curl support RFC

2011-03-11 Thread Lutger Blijdestijn
dsimcha wrote:

 I don't know much about this kind of stuff except that I use it for very
 simple
 use cases occasionally.  One thing I'll definitely give your design credit
 for,
 based on your examples, is making simple things simple.  I don't know how
 it scales to more complex use cases (not saying it doesn't, just that I'm
 not
 qualified to evaluate that), but I definitely would use this.  Nice work.
 
 BTW, what is the license status of libcurl?  According to Wikipedia it's
 MIT
 licensed.  Where does that leave us with regard to the binary attribution
 issue?
 

Walter contacted the author, it's not a problem:

http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.comgroup=digitalmars.Dartnum=112832


Re: Curl support RFC

2011-03-11 Thread Jacob Carlborg

On 2011-03-11 16:20, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get(http://www.google.com;).content );

//
// GET with custom data receiver delegates
//
Http http = new Http(http://www.google.dk;);
http.setReceiveHeaderCallback( (string key, string value) {
writeln(key ~ : ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timouts
//
http.setUrl(http://www.testing.com/test.cgi;);
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData(The quick);
http.perform;

//
// PUT with data sender delegate
//
string msg = Hello world;
size_t len = msg.length; /* using chuncked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
if (msg.empty) return 0;
auto l = msg.length;
data[0..l] = msg[0..$];
msg.length = 0;
return l;
},
HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get(https://mail.google.com;).content);

//
// FTP
//
writeln(Ftp.get(ftp://ftp.digitalmars.com/sieve.ds;,
./downloaded-file));


// ... authenication, cookies, interface select, progress callback
// etc. is also implemented this way.


/Jonas


Is there support for other HTTP methods/verbs in the D wrapper, like delete?

--
/Jacob Carlborg


Re: Curl support RFC

2011-03-11 Thread Jesse Phillips
I'll make some comments on the API. Do we have to choose Http/Ftp...? The URI 
already contains this, I could see being able to specifically request one or 
the other for performance or so www.google.com works.

And what about properties? They tend to be very nice instead of set methods. 
examples below.

Jonas Drewsen Wrote:

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get(http://www.google.com;).content );
 
 //
 // GET with custom data receiver delegates
 //
 Http http = new Http(http://www.google.dk;);
 http.setReceiveHeaderCallback( (string key, string value) {
   writeln(key ~ : ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

http.onHeader = (string key, string value) {...};
http.onContent = (string data) { ... };
http.perform();


Re: Curl support RFC

2011-03-11 Thread Ary Manzana

On 3/11/11 12:20 PM, Jonas Drewsen wrote:

Hi,

So I've spent some time trying to wrap libcurl for D. There is a lot of
things that you can do with libcurl which I did not know so I'm starting
out small.

For now I've created all the declarations for the latest public curl C
api. I have put that in the etc.c.curl module.

On top of that I've created a more D like api as seen below. This is
located in the 'etc.curl' module. What you can see below currently works
but before proceeding further down this road I would like to get your
comments on it.


I *love* it.

All APIs should be like yours. One-liners for what you want right now. 
If it's a little more complex, some more lines. This is perfect.


Congratulations!