On Tue, 2 Jul 2019 at 13:18, Joe Orton <jor...@redhat.com> wrote:
>
> On Tue, Jul 02, 2019 at 09:59:20AM +0200, Branko Čibej wrote:
> > On 02.07.2019 08:49, Joe Orton wrote:
> > > On Thu, Jun 27, 2019 at 05:01:56PM +0300, Ivan Zhakov wrote:
> > >> On Tue, 25 Jun 2019 at 17:21, <jor...@apache.org> wrote:
> > >>
> > >>> Author: jorton
> > >>> Date: Tue Jun 25 14:21:56 2019
> > >>> New Revision: 1862071
> > >>>
> > >>> URL: http://svn.apache.org/viewvc?rev=1862071&view=rev
> > >>> Log:
> > >>> Add apr_dir_pread(), a variant of apr_dir_read() which allows callers
> > >>> to read a directory with constant memory consumption:
> > > ...
> > >> I'm not sure it's best fix. Better solution would be allocate buffer for
> > >> dirname + MAX_FILE_NAME in apr_dir_t and reuse it on every iteration. 
> > >> Win32
> > >> already has such code.
> >
> > > I didn't look at win32 too closely, so on win32 currently doing:
> > >
> > >   apr_dir_open(&x, ...);
> > >   apr_dir_read(&f1, ..., x);
> > >   apr_dir_read(&f2, ..., x);
> > >
> > > invalidates f1?  That's pretty ugly, there is no indication in the API
> > > that apr_dir_read() has side-effects like that.
> >
> > The 2.0 docstring says explicitly that "memory is allocated in the pool
> > passed to apr_dir_open()". While this is really bad API design (hence
> > apr_dir_pread() ...), it pretty much forbids the implementation from
> > reusing an internal name buffer. Looks like the Windows implementation
> > needs fixing?
>
> Reading the win32 code a bit more, it is not constant-memory even with
> the buffer in apr_dir_t, because it can allocate from dir->pool (and
> even create cleanups!) in the more_finfo() call, so I think the current
> behaviour is not justifiable.
>
> Ivan, do you have a counter-proposal to apr_dir_pread() to allow
> constant memory while reading a directory?  If we could have a hard
> constraint on apr_dir_read() *never* allocating any memory I guess that
> would work but appears to require substantial changes to the win32 side
> which I'm not qualified for.
>
> IMO not pstrdup'ing the filenames on Win32 is an API violation.  You
> have to be very clear in documenting things like that if someone is
> passing in a struct *.
>
Hi Joe,

I think that these changes to apr_dir_read() may be troublesome, because
they basically *introduce* unbounded memory usage for any existing user of
apr_dir_read() on Win32. They also make the existing old API unusable for
many cases thus making a switch to the new API mandatory, even though it
doesn't have to be so (see below).

APR 1.x did not specify that the result of apr_dir_read() has to be allocated
in dir->pool:

  
https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8

This function does not accept a pool argument, and I think that it's natural
to assume limited lifetime in such case. This is what the current major APR
users (httpd, svn) do, and other users should probably be ready for that too,
since on Win32 the subsequent calls to apr_dir_read() already invalidate the
previous apr_finfo_t.

With all that in mind, here is what I propose instead:

1) First, fix the apr_dir_read()'s unbounded memory usage without introducing
   the new API.

  I think that this can be done by using the preallocated buffer for the
  filename or by introducing an internal `iterpool` that will be cleared
  by subsequent apr_dir_read() calls.

   If there are other Win32-related unbounded memory usages in the current
   apr_dir_read() implementation, I'll also be willing to fix them.

2) For APR 2.0, revv apr_dir_read() to apr_dir_read2() that accepts two pools:
   `result_pool` and `scratch_pool`.

   I think that this would be an appropriate change by itself (by giving the
   caller more fine-grained control over the allocations), but I also see it
   as an improvement that is unrelated to (1).

   In other words, I think that we should fix the existing problem first by
   changing or plumbing the existing API and only then add the new API that
   improves over it, because doing so doesn't make the switch to the new API
   mandatory for existing users who want to avoid regressions.

-- 
Ivan Zhakov

Reply via email to