Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-28 Thread Ilija Tovilo
Hi Ben

On Tue, Jun 27, 2023 at 9:54 PM Ben Ramsey  wrote:
>
> > On Jun 27, 2023, at 04:01, Ilija Tovilo  wrote:
> >
> > Hi Ben, Hi Rowan
> >
> > On Mon, Jun 26, 2023 at 8:55 PM Ben Ramsey  wrote:
> >>
> >>> On Jun 20, 2023, at 06:06, Rowan Tommins  wrote:
> >>>
> >>> On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:
> >>>
>  Introduce a new function (currently named populate_post_data()) to
>  read the input stream and populate the $_POST and $_FILES
>  superglobals.
>
> In the past, I’ve used something like the following to solve this:
>
> parse_str(file_get_contents('php://input'), $data);
>
> I haven’t looked up how any of the frameworks solve this, but I would be 
> willing to bet they also do something similar.
>
> Rather than implementing functionality to populate globals, would you be 
> interested in introducing some new HTTP request functions. Something like:
>
> http_request_body(): string
> http_parse_query(string $queryString): array
>
> `http_request_body()` would return the raw body and would be the equivalent 
> of calling `file_get_contents('php://input')`. Of special note is that it 
> should _always_ return the raw body, even if `$_POST` is populated, for the 
> sake of consistency and reducing confusion.
>
> `http_parse_query()` would be the opposite of `http_build_query()` and would 
> return a value instead of requiring a reference parameter, like `parse_str()`.

The problem is that the content stream for multipart/form-data is
expected to be big, as in possibly multiple gigabytes big. We can't
use http_request_body() to return the entire content as a string at
once. The current RFC1867 implementation reads and operates in chunks,
i.e. appends it to a file or to a string, depending on the content
part. It never has to hold on to the entire content in memory.
http_request_body() also can't return the content of the request again
after it has been consumed, because that's not how the HTTP protocol
works. We would need to buffer the content somewhere when reading it
for the first time, which again we can't do because it may be very
big.

It may be possible to pass the fopen('php://input', 'r') stream to
this function and let it consume it. However, as mentioned in my
original e-mail this requires some changes to how RFC1867 requests are
handled. Currently, it calls sapi_module.read_post() which directly
reads from the TCP socket. Instead, we'd need to read from the stream,
possibly in addition so that the general case is not degraded in terms
of performance. I'll verify if this is an option, and whether the
changes are (too) big. However, I don't suspect there to be a lot of
use-cases for this as RFC1867 is primarily used for requests and not
for responses, so you wouldn't usually need to parse this type of
content from some other source.

As for returning the parsed values as non-globals, that's entirely
possible. However, it's inconsistent with how requests are currently
handled. The values will need to be passed around manually and kept
alive, but the function still modifies global state (i.e. the input
stream, whether that's sapi_module.read_post() or php://input). I
don't believe it will be common to call this function more than once
per request, and thus decoupling the state is not really necessary.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-28 Thread Michał Marcin Brzuchalski
Hi Andreas,

śr., 28 cze 2023 o 07:55 Andreas Heigl  napisał(a):

> ...
> While I like not adding more Superglobals, it seems like we are adding
> more and more functions to retrieve the different parts of a
> Request-Object...
>
> So when we are at it: Why don't we introduce exactly that? A
> Request-Object that has all the methods. And which is immutable.
>
> And one method `request(): \Request`
>

I agree with the above, this would allow us to clean up the global
namespace in the future.
My personal use cases for PHP are mostly queue workers/event stream
consumers,
so the usual request/response model SAPI is used rarely. But I understand
that it's not the most used case.
Adding a couple of additional functions to a set of already ones just adds
more symbols not always used.
Ideally, I'd see an HTTP module in the future enabled in
request/response-oriented SAPI but that's a different story.


>
> I deliberately didn't call it `getRequest` (or `get_request`) to not
> confuse people why there isn't also a `post_request` or `put_request` or
> ... you get the picture)
>
> One additional function in global namespace and then we can use one of
> the request-objects that are already out in the wild. I don't think
> there's a need to invent the wheel again.
>
> The advantage might be that no matter how many calls to `request()` you
> will always get the same result. The Request as it came in.
>

That sounds like a use for a const?!

Just my .50€

Cheers,
Michał Marcin Brzuchalski


Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-27 Thread Andreas Heigl

Hey All

On 28.06.23 02:45, Larry Garfield wrote:

On Tue, Jun 27, 2023, at 3:26 PM, Stephen Reay wrote:

On 28 Jun 2023, at 02:53, Ben Ramsey  wrote:


On Jun 27, 2023, at 04:01, Ilija Tovilo  wrote:

Hi Ben, Hi Rowan

On Mon, Jun 26, 2023 at 8:55 PM Ben Ramsey  wrote:



On Jun 20, 2023, at 06:06, Rowan Tommins  wrote:

On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:


Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.


How about "request_form_populate_globals"?


The word "form" seems a bit out of place (even though it appears in
both multipart/form-data and application/x-www-form-urlencoded),
because this function is mainly targeted at PUT/PATCH requests for
REST APIs. Maybe request_body_populate_globals?


Another option for the name: `populate_multipart_form_data()`.


I avoided the term "multipart" because the function technically also
works for application/x-www-form-urlencoded requests. It's less
necessary for the reasons outlined in my previous email, but it would
allow for consistent handling of such requests for all HTTP methods.

Some people on GitHub voiced that they would prefer an INI setting.
Therefore I will create an RFC accordingly.



I know this issue comes up enough because it’s confusing to developers that 
there’s not a `$_PUT`, etc., but I’m not a fan of introducing something new 
that populates globals.

While I’ve never used `application/x-www-form-urlencoded` data for `PUT` 
requests (because I don’t think it’s a proper content-type for the semantics of 
`PUT`), I do see how this content-type could be used with `PATCH`, and I also 
don’t want to rule-out use cases of it for `PUT` or any other HTTP method.

In the past, I’ve used something like the following to solve this:

parse_str(file_get_contents('php://input'), $data);

I haven’t looked up how any of the frameworks solve this, but I would be 
willing to bet they also do something similar.

Rather than implementing functionality to populate globals, would you be 
interested in introducing some new HTTP request functions. Something like:

http_request_body(): string
http_parse_query(string $queryString): array

`http_request_body()` would return the raw body and would be the equivalent of 
calling `file_get_contents('php://input')`. Of special note is that it should 
_always_ return the raw body, even if `$_POST` is populated, for the sake of 
consistency and reducing confusion.

`http_parse_query()` would be the opposite of `http_build_query()` and would 
return a value instead of requiring a reference parameter, like `parse_str()`.

While these don’t address the confusion users face by not having a `$_PUT` 
superglobal, I’d prefer not to overload the superglobals. Maybe we can update 
the documentation to encourage use of these new functions instead of the 
superglobals?

We also might want to introduce something like `http_query_string(): string` 
that always returns the raw query string, instead of requiring use of the 
superglobal `$_SERVER['QUERY_STRING']`.

Cheers,
Ben



As a userland/library developer I *much* prefer this approach, but
*please* also include `http_uploads()` (name/signature TBD), returning
an array of file uploads using a structure akin to what $_POST (or
http_parse_query) gives, based on the field name, rather than the
current scenario where the presence of square brackets radically
changes the structure of the $_FILES array.

My initial thought was that perhaps the leaf entities in said array
could be a subclass of SplFileInfo referencing the uploaded temporary
file, with additional read-only properties for the extra upload-related
attributes - but really the important thing is just having a sane,
consistent structure. Having an object rather than an array with
constant keys would be a nice addition, but is much less important.



Cheers

Stephen


I also like Ben's suggestion.  It avoids adding more magic globals, but also is 
still simple and straightforward to use in an easily composable fashion.  Based 
on the Content-Type header of the request, it would allow for:

$formData = http_parse_query(http_request_body());
$jsonData = json_decode(http_request_body());
... Whatever other body parsers.

While you should probably not call it exactly like that, it means "take body string, 
pass to mime-appropriate body parser" is an easy pattern, and one that can also be 
easily mocked for testing.


While I like not adding more Superglobals, it seems like we are adding 
more and more functions to retrieve the different parts of a 
Request-Object...


So when we are at it: Why don't we introduce exactly that? A 
Request-Object that has all the methods. And which is immutable.


And one method `request(): \Request`

I deliberately didn't call it `getRequest` (or `get_request`) to not 
confuse people why there isn't also a `post_request` or `put_request` or 
... you get the picture)


One additional 

Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-27 Thread Larry Garfield
On Tue, Jun 27, 2023, at 3:26 PM, Stephen Reay wrote:
>> On 28 Jun 2023, at 02:53, Ben Ramsey  wrote:
>> 
>>> On Jun 27, 2023, at 04:01, Ilija Tovilo  wrote:
>>> 
>>> Hi Ben, Hi Rowan
>>> 
>>> On Mon, Jun 26, 2023 at 8:55 PM Ben Ramsey  wrote:
 
> On Jun 20, 2023, at 06:06, Rowan Tommins  wrote:
> 
> On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:
> 
>> Introduce a new function (currently named populate_post_data()) to
>> read the input stream and populate the $_POST and $_FILES
>> superglobals.
> 
> How about "request_form_populate_globals"?
>>> 
>>> The word "form" seems a bit out of place (even though it appears in
>>> both multipart/form-data and application/x-www-form-urlencoded),
>>> because this function is mainly targeted at PUT/PATCH requests for
>>> REST APIs. Maybe request_body_populate_globals?
>>> 
 Another option for the name: `populate_multipart_form_data()`.
>>> 
>>> I avoided the term "multipart" because the function technically also
>>> works for application/x-www-form-urlencoded requests. It's less
>>> necessary for the reasons outlined in my previous email, but it would
>>> allow for consistent handling of such requests for all HTTP methods.
>>> 
>>> Some people on GitHub voiced that they would prefer an INI setting.
>>> Therefore I will create an RFC accordingly.
>> 
>> 
>> I know this issue comes up enough because it’s confusing to developers that 
>> there’s not a `$_PUT`, etc., but I’m not a fan of introducing something new 
>> that populates globals.
>> 
>> While I’ve never used `application/x-www-form-urlencoded` data for `PUT` 
>> requests (because I don’t think it’s a proper content-type for the semantics 
>> of `PUT`), I do see how this content-type could be used with `PATCH`, and I 
>> also don’t want to rule-out use cases of it for `PUT` or any other HTTP 
>> method.
>> 
>> In the past, I’ve used something like the following to solve this:
>> 
>>parse_str(file_get_contents('php://input'), $data);
>> 
>> I haven’t looked up how any of the frameworks solve this, but I would be 
>> willing to bet they also do something similar.
>> 
>> Rather than implementing functionality to populate globals, would you be 
>> interested in introducing some new HTTP request functions. Something like:
>> 
>>http_request_body(): string
>>http_parse_query(string $queryString): array
>> 
>> `http_request_body()` would return the raw body and would be the equivalent 
>> of calling `file_get_contents('php://input')`. Of special note is that it 
>> should _always_ return the raw body, even if `$_POST` is populated, for the 
>> sake of consistency and reducing confusion.
>> 
>> `http_parse_query()` would be the opposite of `http_build_query()` and would 
>> return a value instead of requiring a reference parameter, like 
>> `parse_str()`.
>> 
>> While these don’t address the confusion users face by not having a `$_PUT` 
>> superglobal, I’d prefer not to overload the superglobals. Maybe we can 
>> update the documentation to encourage use of these new functions instead of 
>> the superglobals?
>> 
>> We also might want to introduce something like `http_query_string(): string` 
>> that always returns the raw query string, instead of requiring use of the 
>> superglobal `$_SERVER['QUERY_STRING']`.
>> 
>> Cheers,
>> Ben
>
>
> As a userland/library developer I *much* prefer this approach, but 
> *please* also include `http_uploads()` (name/signature TBD), returning 
> an array of file uploads using a structure akin to what $_POST (or 
> http_parse_query) gives, based on the field name, rather than the 
> current scenario where the presence of square brackets radically 
> changes the structure of the $_FILES array.
>
> My initial thought was that perhaps the leaf entities in said array 
> could be a subclass of SplFileInfo referencing the uploaded temporary 
> file, with additional read-only properties for the extra upload-related 
> attributes - but really the important thing is just having a sane, 
> consistent structure. Having an object rather than an array with 
> constant keys would be a nice addition, but is much less important.
>
>
>
> Cheers
>
> Stephen

I also like Ben's suggestion.  It avoids adding more magic globals, but also is 
still simple and straightforward to use in an easily composable fashion.  Based 
on the Content-Type header of the request, it would allow for:

$formData = http_parse_query(http_request_body());
$jsonData = json_decode(http_request_body());
... Whatever other body parsers.

While you should probably not call it exactly like that, it means "take body 
string, pass to mime-appropriate body parser" is an easy pattern, and one that 
can also be easily mocked for testing.

--Larry Garfield

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-27 Thread Stephen Reay


> On 28 Jun 2023, at 02:53, Ben Ramsey  wrote:
> 
>> On Jun 27, 2023, at 04:01, Ilija Tovilo  wrote:
>> 
>> Hi Ben, Hi Rowan
>> 
>> On Mon, Jun 26, 2023 at 8:55 PM Ben Ramsey  wrote:
>>> 
 On Jun 20, 2023, at 06:06, Rowan Tommins  wrote:
 
 On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:
 
> Introduce a new function (currently named populate_post_data()) to
> read the input stream and populate the $_POST and $_FILES
> superglobals.
 
 How about "request_form_populate_globals"?
>> 
>> The word "form" seems a bit out of place (even though it appears in
>> both multipart/form-data and application/x-www-form-urlencoded),
>> because this function is mainly targeted at PUT/PATCH requests for
>> REST APIs. Maybe request_body_populate_globals?
>> 
>>> Another option for the name: `populate_multipart_form_data()`.
>> 
>> I avoided the term "multipart" because the function technically also
>> works for application/x-www-form-urlencoded requests. It's less
>> necessary for the reasons outlined in my previous email, but it would
>> allow for consistent handling of such requests for all HTTP methods.
>> 
>> Some people on GitHub voiced that they would prefer an INI setting.
>> Therefore I will create an RFC accordingly.
> 
> 
> I know this issue comes up enough because it’s confusing to developers that 
> there’s not a `$_PUT`, etc., but I’m not a fan of introducing something new 
> that populates globals.
> 
> While I’ve never used `application/x-www-form-urlencoded` data for `PUT` 
> requests (because I don’t think it’s a proper content-type for the semantics 
> of `PUT`), I do see how this content-type could be used with `PATCH`, and I 
> also don’t want to rule-out use cases of it for `PUT` or any other HTTP 
> method.
> 
> In the past, I’ve used something like the following to solve this:
> 
>parse_str(file_get_contents('php://input'), $data);
> 
> I haven’t looked up how any of the frameworks solve this, but I would be 
> willing to bet they also do something similar.
> 
> Rather than implementing functionality to populate globals, would you be 
> interested in introducing some new HTTP request functions. Something like:
> 
>http_request_body(): string
>http_parse_query(string $queryString): array
> 
> `http_request_body()` would return the raw body and would be the equivalent 
> of calling `file_get_contents('php://input')`. Of special note is that it 
> should _always_ return the raw body, even if `$_POST` is populated, for the 
> sake of consistency and reducing confusion.
> 
> `http_parse_query()` would be the opposite of `http_build_query()` and would 
> return a value instead of requiring a reference parameter, like `parse_str()`.
> 
> While these don’t address the confusion users face by not having a `$_PUT` 
> superglobal, I’d prefer not to overload the superglobals. Maybe we can update 
> the documentation to encourage use of these new functions instead of the 
> superglobals?
> 
> We also might want to introduce something like `http_query_string(): string` 
> that always returns the raw query string, instead of requiring use of the 
> superglobal `$_SERVER['QUERY_STRING']`.
> 
> Cheers,
> Ben


As a userland/library developer I *much* prefer this approach, but *please* 
also include `http_uploads()` (name/signature TBD), returning an array of file 
uploads using a structure akin to what $_POST (or http_parse_query) gives, 
based on the field name, rather than the current scenario where the presence of 
square brackets radically changes the structure of the $_FILES array.

My initial thought was that perhaps the leaf entities in said array could be a 
subclass of SplFileInfo referencing the uploaded temporary file, with 
additional read-only properties for the extra upload-related attributes - but 
really the important thing is just having a sane, consistent structure. Having 
an object rather than an array with constant keys would be a nice addition, but 
is much less important.



Cheers

Stephen 

Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-27 Thread Ben Ramsey
> On Jun 27, 2023, at 04:01, Ilija Tovilo  wrote:
> 
> Hi Ben, Hi Rowan
> 
> On Mon, Jun 26, 2023 at 8:55 PM Ben Ramsey  wrote:
>> 
>>> On Jun 20, 2023, at 06:06, Rowan Tommins  wrote:
>>> 
>>> On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:
>>> 
 Introduce a new function (currently named populate_post_data()) to
 read the input stream and populate the $_POST and $_FILES
 superglobals.
>>> 
>>> How about "request_form_populate_globals"?
> 
> The word "form" seems a bit out of place (even though it appears in
> both multipart/form-data and application/x-www-form-urlencoded),
> because this function is mainly targeted at PUT/PATCH requests for
> REST APIs. Maybe request_body_populate_globals?
> 
>> Another option for the name: `populate_multipart_form_data()`.
> 
> I avoided the term "multipart" because the function technically also
> works for application/x-www-form-urlencoded requests. It's less
> necessary for the reasons outlined in my previous email, but it would
> allow for consistent handling of such requests for all HTTP methods.
> 
> Some people on GitHub voiced that they would prefer an INI setting.
> Therefore I will create an RFC accordingly.


I know this issue comes up enough because it’s confusing to developers that 
there’s not a `$_PUT`, etc., but I’m not a fan of introducing something new 
that populates globals.

While I’ve never used `application/x-www-form-urlencoded` data for `PUT` 
requests (because I don’t think it’s a proper content-type for the semantics of 
`PUT`), I do see how this content-type could be used with `PATCH`, and I also 
don’t want to rule-out use cases of it for `PUT` or any other HTTP method.

In the past, I’ve used something like the following to solve this:

parse_str(file_get_contents('php://input'), $data);

I haven’t looked up how any of the frameworks solve this, but I would be 
willing to bet they also do something similar.

Rather than implementing functionality to populate globals, would you be 
interested in introducing some new HTTP request functions. Something like:

http_request_body(): string
http_parse_query(string $queryString): array

`http_request_body()` would return the raw body and would be the equivalent of 
calling `file_get_contents('php://input')`. Of special note is that it should 
_always_ return the raw body, even if `$_POST` is populated, for the sake of 
consistency and reducing confusion.

`http_parse_query()` would be the opposite of `http_build_query()` and would 
return a value instead of requiring a reference parameter, like `parse_str()`.

While these don’t address the confusion users face by not having a `$_PUT` 
superglobal, I’d prefer not to overload the superglobals. Maybe we can update 
the documentation to encourage use of these new functions instead of the 
superglobals?

We also might want to introduce something like `http_query_string(): string` 
that always returns the raw query string, instead of requiring use of the 
superglobal `$_SERVER['QUERY_STRING']`.

Cheers,
Ben



signature.asc
Description: Message signed with OpenPGP


Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-27 Thread Ilija Tovilo
Hi Ben, Hi Rowan

On Mon, Jun 26, 2023 at 8:55 PM Ben Ramsey  wrote:
>
> > On Jun 20, 2023, at 06:06, Rowan Tommins  wrote:
> >
> > On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:
> >
> >> Introduce a new function (currently named populate_post_data()) to
> >> read the input stream and populate the $_POST and $_FILES
> >> superglobals.
> >
> > How about "request_form_populate_globals"?

The word "form" seems a bit out of place (even though it appears in
both multipart/form-data and application/x-www-form-urlencoded),
because this function is mainly targeted at PUT/PATCH requests for
REST APIs. Maybe request_body_populate_globals?

> Another option for the name: `populate_multipart_form_data()`.

I avoided the term "multipart" because the function technically also
works for application/x-www-form-urlencoded requests. It's less
necessary for the reasons outlined in my previous email, but it would
allow for consistent handling of such requests for all HTTP methods.

Some people on GitHub voiced that they would prefer an INI setting.
Therefore I will create an RFC accordingly.

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-26 Thread Ben Ramsey
> On Jun 20, 2023, at 06:06, Rowan Tommins  wrote:
> 
> On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:
> 
>> Introduce a new function (currently named populate_post_data()) to
>> read the input stream and populate the $_POST and $_FILES
>> superglobals.
>> 
> 
> 
> My initial instinct was to discuss how this could be made more flexible in
> terms of input and output; but looking at how simple the implementation is,
> this seems like a really quick win, and anything else can be future scope.
> 
> To bikeshed the name a bit:
> 
> * We should probably avoid the word "post"; although it populates $_POST,
> that's really misnamed anyway, what we're really talking about is "form
> data"
> * Even if there isn't a current category prefix where it belongs, it should
> probably follow the "category_specific" naming scheme
> * We might in future add some related functions, such as returning the data
> rather than populating globals
> 
> How about "request_form_populate_globals"?


Another option for the name: `populate_multipart_form_data()`.

Cheers,
Ben





signature.asc
Description: Message signed with OpenPGP


Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-20 Thread Rowan Tommins
On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo  wrote:

> Introduce a new function (currently named populate_post_data()) to
> read the input stream and populate the $_POST and $_FILES
> superglobals.
>


My initial instinct was to discuss how this could be made more flexible in
terms of input and output; but looking at how simple the implementation is,
this seems like a really quick win, and anything else can be future scope.

To bikeshed the name a bit:

* We should probably avoid the word "post"; although it populates $_POST,
that's really misnamed anyway, what we're really talking about is "form
data"
* Even if there isn't a current category prefix where it belongs, it should
probably follow the "category_specific" naming scheme
* We might in future add some related functions, such as returning the data
rather than populating globals

How about "request_form_populate_globals"?

Regards,
-- 
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-20 Thread Hans Henrik Bergan
how are errors handled, like if the format of php://input is
unrecognized, not valid multipart/form-data and not valid
application/x-www-form-urlencoded?
errors? exceptions? nothing?

On Tue, 20 Jun 2023 at 11:26, Ilija Tovilo  wrote:
>
> Hi internals
>
> A while ago I encountered a limitation of how RFC1867 requests are
> handled in PHP. PHP populates the $_POST and $_FILES superglobals when
> the Content-Type is multipart/form-data or
> application/x-www-form-urlencoded, but only when the method is POST.
> For application/x-www-form-urlencoded PUT requests this is not a
> problem because the format is simple, usually limited in size and PHP
> offers functions to parse it, namely parse_str and parse_url. For
> RFC1867 it's a different story.
>
> The code handling the request will need to use streams because RFC1867
> is often used with files, the format is much more complicated, files
> should be cleaned up when the request ends if unused, etc. Handling
> this manually is non-trivial. This has been reported many years ago,
> and evidently caused a bit of frustration.
> https://bugs.php.net/bug.php?id=55815
>
> This is not limited to PUT either, multipart/form-data bodies are
> valid with other requests. Here's the approach I believe is best.
>
> Introduce a new function (currently named populate_post_data()) to
> read the input stream and populate the $_POST and $_FILES
> superglobals. The function works for any non-POST requests. It assumes
> that none of the input stream has been consumed, and that the
> Content-Type is set accordingly. A nice side-effect of this approach
> is that it may be used with the enable_post_data_reading ini setting
> to decide whether to parse the RFC1867 bodies dynamically. For
> example, a specific endpoint may accept bigger requests. The function
> may be implemented in a more generic way 1. by returning the
> data/files arrays instead of populating the superglobals and 2. by
> providing an input stream manually. I don't know if there's such a
> use-case and thus if this is worthwhile, as it would require bigger
> changes in the RFC1867 handling.
>
> Here's the proof-of-concept implementation:
> https://github.com/php/php-src/pull/11472
>
> For completeness, here are other options I considered.
>
> 1. Create a new $_PUT superglobal that is always populated. Two
> issues: The obvious one is that this is limited to PUT requests. While
> we could also introduce $_PATCH, this seems like a poor solution.
> While discouraged, other methods can also contain bodies. Another
> issue is that the code for processing RFC1867 consumes the input
> stream. This constitutes a BC break. Buffering the input is not
> feasible for large requests that would be expected here.
> 2. The same as option 1, but populate the existing $_POST global. This
> comes with the same BC break.
> 3. The same as options 1 or 2 with an additional ini setting to opt
> into the behavior. The issue with this approach is that both the old
> and new behavior might be desired in different parts of the same
> application. The ini option can't be changed at runtime because the
> populating of the superglobals happens before user code is being
> executed.
>
> Let me know what your thoughts are. If there is consensus in the
> feedback I'll update the implementation accordingly and post an update
> to the list. If there is no consensus, I will create an RFC.
>
> Ilija
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



[PHP-DEV] RFC1867 (multipart/form-data) PUT requests

2023-06-20 Thread Ilija Tovilo
Hi internals

A while ago I encountered a limitation of how RFC1867 requests are
handled in PHP. PHP populates the $_POST and $_FILES superglobals when
the Content-Type is multipart/form-data or
application/x-www-form-urlencoded, but only when the method is POST.
For application/x-www-form-urlencoded PUT requests this is not a
problem because the format is simple, usually limited in size and PHP
offers functions to parse it, namely parse_str and parse_url. For
RFC1867 it's a different story.

The code handling the request will need to use streams because RFC1867
is often used with files, the format is much more complicated, files
should be cleaned up when the request ends if unused, etc. Handling
this manually is non-trivial. This has been reported many years ago,
and evidently caused a bit of frustration.
https://bugs.php.net/bug.php?id=55815

This is not limited to PUT either, multipart/form-data bodies are
valid with other requests. Here's the approach I believe is best.

Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals. The function works for any non-POST requests. It assumes
that none of the input stream has been consumed, and that the
Content-Type is set accordingly. A nice side-effect of this approach
is that it may be used with the enable_post_data_reading ini setting
to decide whether to parse the RFC1867 bodies dynamically. For
example, a specific endpoint may accept bigger requests. The function
may be implemented in a more generic way 1. by returning the
data/files arrays instead of populating the superglobals and 2. by
providing an input stream manually. I don't know if there's such a
use-case and thus if this is worthwhile, as it would require bigger
changes in the RFC1867 handling.

Here's the proof-of-concept implementation:
https://github.com/php/php-src/pull/11472

For completeness, here are other options I considered.

1. Create a new $_PUT superglobal that is always populated. Two
issues: The obvious one is that this is limited to PUT requests. While
we could also introduce $_PATCH, this seems like a poor solution.
While discouraged, other methods can also contain bodies. Another
issue is that the code for processing RFC1867 consumes the input
stream. This constitutes a BC break. Buffering the input is not
feasible for large requests that would be expected here.
2. The same as option 1, but populate the existing $_POST global. This
comes with the same BC break.
3. The same as options 1 or 2 with an additional ini setting to opt
into the behavior. The issue with this approach is that both the old
and new behavior might be desired in different parts of the same
application. The ini option can't be changed at runtime because the
populating of the superglobals happens before user code is being
executed.

Let me know what your thoughts are. If there is consensus in the
feedback I'll update the implementation accordingly and post an update
to the list. If there is no consensus, I will create an RFC.

Ilija

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php