On Tue, Jun 11, 2019 at 9:14 AM William A Rowe Jr <wr...@rowe-clan.net> wrote:
> On Tue, Jun 11, 2019 at 4:15 AM Branko Čibej <br...@apache.org> wrote:
>
>> On 07.06.2019 21:58, William A Rowe Jr wrote:
>> > I think the optimal way is to allocate a pair of apr thread-specific
>> > wchar buffers in each thread's pool on startup, and use those
>> > exclusively per-thread for wchar translations. We could be looking at
>> > 64k/thread exclusively for name translation, but it doesn't seem
>> > unreasonable.
>> >
>> > The alternative is to continue to use stack, we surely don't want to
>> > lock on acquiring or allocating name translation buffers. /shrug
>>
>> Since this is Windows, and there's no embedded Windows environment left
>> that I'm aware of that's still alive, we can continue with using the
>> stack ... but wouldn't it be so much better to alloca() the required
>> size instead of blindly burning through 64k every time? Obviously that
>> means counting the characters first.
>
> I don't think there is any need to do so. A name buffer needs 32k (right
> now we limit that to 8k, but if we wanted to satisfy the OS-acceptable
> pathname length, we would blow that out 4x the current arbitrary limit.)
> For a rename, make that 2x buffers. But there is no benefit to wasting
> cycles determining the string length ahead of allocation, because the
> very next call can run right past that limit and hit the wall on
> available stack. Demanding that there always be a potential 32k runway
> of available stack doesn't seem excessive.
>
> The point to the stack is that it contracts immediately on return. So we
> aren't burning through a 64k buffer - we are ensuring that we have been
> topped-up to 64k remaining. The targets of these calls are all Win32 API
> invocations, we are never nesting them inside further big-buffer
> allocations of our own. What might happen in ntdll we have little
> control over.
>
> We either reserve about 2x buffers for file name transliteration in heap
> per thread, or we use the thread stack.
> As long as we trust that our utf-8 to ucs-2 logic is rock solid and the
> allocations and limits are correctly coded, this continues to be a safe
> approach. The 32k/64k are immediately given back in the current
> stack-based approach, but would simply be boat anchors most of the time
> if they are set aside in heap.
>
> I'm +/-0 on switching from stack to heap for this particular
> transformation but welcome good suggestions.

I failed to call this out, but I'm guessing we agree on one point: if we
preserve wide character string results (rather than discarding them after
a single API call), then using the string size, or moving towards a counted
byte string representation, makes a whole lot more sense for longer-term
heap allocations.
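
To make that last case concrete, here is a rough sketch in portable C of
"count first, then allocate exactly" for a wide string that outlives a
single call. Everything named here is an assumption for illustration:
utf8_to_utf16_units() and utf8_to_utf16_dup() are hypothetical helpers, not
APR functions; plain malloc() stands in for whatever pool or heap allocation
we would actually use; and uint16_t stands in for the Windows wide character
type. The input is not validated, so treat it as a sketch rather than a
drop-in replacement for the existing conversion code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: UTF-16 code units needed for a NUL-terminated
 * UTF-8 string, excluding the terminator. No validation is performed. */
static size_t utf8_to_utf16_units(const char *s)
{
    size_t units = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)s; *p; ++p) {
        if ((*p & 0xC0) == 0x80)
            continue;                  /* continuation byte: no new unit */
        units += (*p >= 0xF0) ? 2 : 1; /* 4-byte lead: surrogate pair */
    }
    return units;
}

/* Hypothetical helper: convert UTF-8 into a heap buffer sized to fit.
 * Returns NULL on allocation failure. *len_out receives the number of
 * code units written, so the caller holds a counted string. */
static uint16_t *utf8_to_utf16_dup(const char *s, size_t *len_out)
{
    size_t n = utf8_to_utf16_units(s);
    uint16_t *out = malloc((n + 1) * sizeof *out);
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    if (out == NULL)
        return NULL;

    while (*p) {
        uint32_t cp;
        int extra;

        if ((*p & 0xC0) == 0x80) {     /* stray continuation byte: skip */
            ++p;
            continue;
        }
        if (*p < 0x80)      { cp = *p;        extra = 0; }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        ++p;

        /* Fold in continuation bytes, stopping early if one is missing. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);

        if (cp >= 0x10000) {           /* encode as a surrogate pair */
            cp -= 0x10000;
            out[i++] = (uint16_t)(0xD800 | (cp >> 10));
            out[i++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[i++] = (uint16_t)cp;
        }
    }
    out[i] = 0;
    *len_out = i;
    return out;
}
```

The extra counting pass only pays for itself when the result is kept
around. For the transient case discussed above, where the buffer feeds a
single Win32 call and is dropped on return, a fixed-size stack buffer
avoids both the counting pass and the allocation.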