On 27.08.2019 11:25, Ruediger Pluem wrote: > > On 08/25/2019 04:04 PM, Branko Čibej wrote: >> On 24.08.2019 16:39, Ivan Zhakov wrote: >>> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <[email protected]> wrote: >>>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <[email protected]> wrote: >>>>> On Wed, 3 Jul 2019 at 18:37, Joe Orton <[email protected]> wrote: >>>>>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote: >>>>>>> They also make the existing old API unusable for many cases thus >>>>>>> making a switch to the new API mandatory, even though it doesn't have >>>>>>> to be so (see below). >>>>>>> >>>>>>> APR 1.x did not specify that the result of apr_dir_read() has to be >>>>>>> allocated >>>>>>> in dir->pool: >>>>>>> >>>>>>> >>>>>>> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8 >>>>>>> >>>>>>> This function does not accept a pool argument, and I think that it's >>>>>>> natural >>>>>>> to assume limited lifetime in such case. This is what the current major >>>>>>> APR >>>>>>> users (httpd, svn) do, and other users should probably be ready for >>>>>>> that too, >>>>>>> since on Win32 the subsequent calls to apr_dir_read() already >>>>>>> invalidate the >>>>>>> previous apr_finfo_t. >>>>>> Are there many examples in the APR API where returned strings are not >>>>>> either pool-allocated or constant (process lifetime)? It seems unusual >>>>>> and very surprising to me. >>>>>> >>>>>> A hypothetical API consumer can have made either assumption: >>>>>> >>>>>> a) of constant memory (true with limits on Win32, breaks for Unix), or >>>>>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32) >>>>>> >>>>>> neither is documented and both are broken by current implementations. It >>>>>> is impossible to satisfy both so anybody can argue that fixing either >>>>>> implementation to match the other is a regression. >>>>>> >>>>>> Doubling down on (a) over (b) seems strongly inferior to me, the API >>>>>> cannot support this without introducing runtime overhead for all callers >>>>>> (a new iterpool in the dir object), so I'd rather fix this properly with >>>>>> a new API. >>>>>> >>>>>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at >>>>>> the same as introducing _pread() if desired - making the strdup only >>>>>> happen for _pread would only be a minor tweak, not a whole new subpool. >>>>>> >>>>> There is example of apr_hash_t.ITERATOR that is statically allocated in >>>>> the hash object and returned to the caller. >>>>> >>>>> Also the native readdir() API behaves the same way [1]: >>>>> [[[ >>>>> The returned pointer, and pointers within the structure, might be >>>>> invalidated >>>>> or the structure or the storage areas might be overwritten by a >>>>> subsequent call >>>>> to readdir() on the same directory stream. >>>>> ]]] >>>>> >>>>> From this perspective the current behavior of apr_dir_read on Unix looks >>>>> inconsistent and surprising for the API user because it is impossible to >>>>> use >>>>> in a loop without unbounded memory usage. >>>>> >>>>> I strongly agree that the revved version of this function that accepts a >>>>> POOL >>>>> argument will be good from both usage and implementation terms. I would >>>>> be +1 >>>>> on adding such API. But I also think that unless we fix the original issue >>>>> by using the preallocated buffer/iterpool, the whole thing would still be >>>>> problematic for any existing API user. >>>>> >>>>> More detailed: if we take this opportunity to choose and document the API >>>>> to >>>>> consistently return a temporary result, then as the caller I can either >>>>> process >>>>> the data during iteration or manually put it into a long-living pool if >>>>> that is >>>>> necessary. I think that all current callers work with the API this way. >>>>> And I >>>>> can also switch to the new revved function to manually specify the pool >>>>> and >>>>> better control the allocations. If I would like to support both APR 1.x >>>>> and 2.x >>>>> the compatibility is also straightforward to achieve. >>>>> >>>>> If we instead choose an option where the results may be allocated >>>>> in the dir object pool, then as the caller I cannot avoid unbounded memory >>>>> usage when supporting both APR 1.x and 2.x. And this approach also >>>>> introduces >>>>> the unavoidable the unbounded memory usage for all Win32 users that >>>>> do not update their code. >>>>> >>>>> [1] >>>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475 >>>>> >>>> Is there any feedback or thoughts on these proposed changes? >>>> >>>> Thanks! >>>> >>> For what it's worth, I'm -1 on the changes done in r1862071 and >>> r1862435 for the reasons listed above. >>> >>> PS: No idea if I can actually veto changes, because while I'm a >>> committer on the APR project, I am not in its PMC. But "-1" properly >>> expresses my technical opinion on this topic. >> >> I agree with your analysis. I think it would be best if you just >> implemented your proposal, including the revised version of the API. The >> unbounded-memory case you describe is pretty much a showstopper. The >> allocation choices do have to be documented better, IMO. >> >> That would be trunk and 1.7.x, although we can't add a revised API in >> the 1.7.x line since it's released already. For compatibility it would >> probably be best to leave apr_dir_read on trunk without the extra pool >> parameter. > The API that allows to specify a pool can only be introduced with 1.8 / > trunk, but what about the current API which > needs to stay for 1.8? Either way we fix the code we might break > applications. If we use the Windows approach on Unix > you break applications that rely on getting a copy back, if we use the Unix > approach we have an unbounded-memory issue. > Do you see any other option but choosing one implementation (Windows or Unix) > for the other case as well and place > a fat note about the changed behavior somewhere (Release notes / Change log)?
The way stuff is allocated was only documented a couple months ago on trunk. Fix those docs and that's it. Anyone who previously relied on getting a copy was looking at the implementation, not the documentation. This is release notes/API errata territory. -- Brane
