testdir.c

Branko Čibej Tue, 27 Aug 2019 02:43:44 -0700

On 27.08.2019 11:25, Ruediger Pluem wrote:
>
> On 08/25/2019 04:04 PM, Branko Čibej wrote:
>> On 24.08.2019 16:39, Ivan Zhakov wrote:
>>> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <i...@visualsvn.com> wrote:
>>>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <i...@visualsvn.com> wrote:
>>>>> On Wed, 3 Jul 2019 at 18:37, Joe Orton <jor...@redhat.com> wrote:
>>>>>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>>>>>>> They also make the existing old API unusable for many cases thus
>>>>>>> making a switch to the new API mandatory, even though it doesn't have
>>>>>>> to be so (see below).
>>>>>>>
>>>>>>> APR 1.x did not specify that the result of apr_dir_read() has to be 
>>>>>>> allocated
>>>>>>> in dir->pool:
>>>>>>>
>>>>>>>   
>>>>>>> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>>>>>>
>>>>>>> This function does not accept a pool argument, and I think that it's 
>>>>>>> natural
>>>>>>> to assume limited lifetime in such case. This is what the current major 
>>>>>>> APR
>>>>>>> users (httpd, svn) do, and other users should probably be ready for 
>>>>>>> that too,
>>>>>>> since on Win32 the subsequent calls to apr_dir_read() already 
>>>>>>> invalidate the
>>>>>>> previous apr_finfo_t.
>>>>>> Are there many examples in the APR API where returned strings are not
>>>>>> either pool-allocated or constant (process lifetime)?  It seems unusual
>>>>>> and very surprising to me.
>>>>>>
>>>>>> A hypothetical API consumer can have made either assumption:
>>>>>>
>>>>>> a) of constant memory (true with limits on Win32, breaks for Unix), or
>>>>>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>>>>>>
>>>>>> neither is documented and both are broken by current implementations. It
>>>>>> is impossible to satisfy both so anybody can argue that fixing either
>>>>>> implementation to match the other is a regression.
>>>>>>
>>>>>> Doubling down on (a) over (b) seems strongly inferior to me, the API
>>>>>> cannot support this without introducing runtime overhead for all callers
>>>>>> (a new iterpool in the dir object), so I'd rather fix this properly with
>>>>>> a new API.
>>>>>>
>>>>>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
>>>>>> the same as introducing _pread() if desired - making the strdup only
>>>>>> happen for _pread would only be a minor tweak, not a whole new subpool.
>>>>>>
>>>>> There is example of apr_hash_t.ITERATOR that is statically allocated in
>>>>> the hash object and returned to the caller.
>>>>>
>>>>> Also the native readdir() API behaves the same way [1]:
>>>>> [[[
>>>>> The returned pointer, and pointers within the structure, might be 
>>>>> invalidated
>>>>> or the structure or the storage areas might be overwritten by a 
>>>>> subsequent call
>>>>> to readdir() on the same directory stream.
>>>>> ]]]
>>>>>
>>>>> From this perspective the current behavior of apr_dir_read on Unix looks
>>>>> inconsistent and surprising for the API user because it is impossible to 
>>>>> use
>>>>> in a loop without unbounded memory usage.
>>>>>
>>>>> I strongly agree that the revved version of this function that accepts a 
>>>>> POOL
>>>>> argument will be good from both usage and implementation terms. I would 
>>>>> be +1
>>>>> on adding such API. But I also think that unless we fix the original issue
>>>>> by using the preallocated buffer/iterpool, the whole thing would still be
>>>>> problematic for any existing API user.
>>>>>
>>>>> More detailed: if we take this opportunity to choose and document the API 
>>>>> to
>>>>> consistently return a temporary result, then as the caller I can either 
>>>>> process
>>>>> the data during iteration or manually put it into a long-living pool if 
>>>>> that is
>>>>> necessary. I think that all current callers work with the API this way. 
>>>>> And I
>>>>> can also switch to the new revved function to manually specify the pool 
>>>>> and
>>>>> better control the allocations. If I would like to support both APR 1.x 
>>>>> and 2.x
>>>>> the compatibility is also straightforward to achieve.
>>>>>
>>>>> If we instead choose an option where the results may be allocated
>>>>> in the dir object pool, then as the caller I cannot avoid unbounded memory
>>>>> usage when supporting both APR 1.x and 2.x. And this approach also 
>>>>> introduces
>>>>> the unavoidable the unbounded memory usage for all Win32 users that
>>>>> do not update their code.
>>>>>
>>>>> [1] 
>>>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>>>>>
>>>> Is there any feedback or thoughts on these proposed changes?
>>>>
>>>> Thanks!
>>>>
>>> For what it's worth, I'm -1 on the changes done in r1862071 and
>>> r1862435 for the reasons listed above.
>>>
>>> PS: No idea if I can actually veto changes, because while I'm a
>>> committer on the APR project, I am not in its PMC. But "-1" properly
>>> expresses my technical opinion on this topic.
>>
>> I agree with your analysis. I think it would be best if you just
>> implemented your proposal, including the revised version of the API. The
>> unbounded-memory case you describe is pretty much a showstopper. The
>> allocation choices do have to be documented better, IMO.
>>
>> That would be trunk and 1.7.x, although we can't add a revised API in
>> the 1.7.x line since it's released already. For compatibility it would
>> probably be best to leave apr_dir_read on trunk without the extra pool
>> parameter.
> The API that allows to specify a pool can only be introduced with 1.8 / 
> trunk, but what about the current API which
> needs to stay for 1.8? Either way we fix the code we might break 
> applications. If we use the Windows approach on Unix
> you break applications that rely on getting a copy back, if we use the Unix 
> approach we have an unbounded-memory issue.
> Do you see any other option but choosing one implementation (Windows or Unix) 
> for the other case as well and place
> a fat note about the changed behavior somewhere (Release notes / Change log)?



The way stuff is allocated was only documented a couple months ago on
trunk. Fix those docs and that's it. Anyone who previously relied on
getting a copy was looking at the implementation, not the documentation.

This is release notes/API errata territory.

-- Brane

Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

Reply via email to