[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-06-26 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #16 from Jonathan Wakely  ---
Author: redi
Date: Wed Jun 26 15:12:15 2019
New Revision: 272697

URL: https://gcc.gnu.org/viewcvs?rev=272697=gcc=rev
Log:
PR libstdc++/90634 reduce allocations in filesystem::path construction

Backport from mainline
2019-05-28  Jonathan Wakely  

PR libstdc++/90634
* include/experimental/bits/fs_path.h (path::path(path&&)): Only call
_M_split_cmpts() for a path with multiple components.
(path::_S_is_dir_sep()): Add missing 'static' keyword to function.
* src/filesystem/path.cc (path::_M_split_cmpts()): Count number of
components and reserve space in vector. Return early when there is
only one component.
* testsuite/experimental/filesystem/path/construct/90634.cc: New test.

Added:
   
branches/gcc-7-branch/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc
Modified:
branches/gcc-7-branch/libstdc++-v3/ChangeLog
branches/gcc-7-branch/libstdc++-v3/include/experimental/bits/fs_path.h
branches/gcc-7-branch/libstdc++-v3/src/filesystem/path.cc

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-30 Thread 1000hz.radiowave at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #15 from baltic <1000hz.radiowave at gmail dot com> ---
(In reply to Jonathan Wakely from comment #14)

> It's true that the standard does not require path::iterator to be usable in
> generic algorithms that expect forward iterators or bidirectional iterators.

Indeed! Ability to run a modifying algorithm on a path's elements is extremely
useful. Sorting path's components makes a lot of practical sense. So does
std::random_shuffle them. And ability to heapify them (std::make_heap) is
basically unavoidable in any app, which has to deal with filesystem::path. 
But what weirdo would want to store hundreds of thousands of paths without x10
memory overhead, right?

So i agree. Just screw the C++ standard and do what you think is best for ppl
out there.

Thanks for your great work!

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-29 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Jonathan Wakely  ---
(In reply to baltic from comment #9)
> (In reply to Jonathan Wakely from comment #8)
> 
> > The libc++ implementation also fails this test:
> 
> As i've shown before, neither of those are failures. By the current C++
> standard at least.

It's true that the standard does not require path::iterator to be usable in
generic algorithms that expect forward iterators or bidirectional iterators.

I think the standard is wrong.

> So, long story short: "I am not going to fix the x10 overhead, because I
> believe the standard is wrong."

Yes.

The vector reallocations during path construction are gone for gcc 8.4 (and
also in experimental::filesystem::path in all branches except gcc-7-branch).

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-28 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #13 from Jonathan Wakely  ---
Author: redi
Date: Tue May 28 20:48:31 2019
New Revision: 271719

URL: https://gcc.gnu.org/viewcvs?rev=271719=gcc=rev
Log:
PR libstdc++/90634 reduce allocations in filesystem::path construction

Backport from mainline
2019-05-28  Jonathan Wakely  

PR libstdc++/90634
* include/experimental/bits/fs_path.h (path::path(path&&)): Only call
_M_split_cmpts() for a path with multiple components.
(path::_S_is_dir_sep()): Add missing 'static' keyword to function.
* src/filesystem/path.cc (path::_M_split_cmpts()): Count number of
components and reserve space in vector. Return early when there is
only one component.
* testsuite/27_io/filesystem/path/construct/90634.cc: New test.
* testsuite/experimental/filesystem/path/construct/90634.cc: New test.

Added:
   
branches/gcc-9-branch/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90634.cc
   
branches/gcc-9-branch/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc
Modified:
branches/gcc-9-branch/libstdc++-v3/ChangeLog
branches/gcc-9-branch/libstdc++-v3/include/experimental/bits/fs_path.h
branches/gcc-9-branch/libstdc++-v3/src/filesystem/path.cc

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-28 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #12 from Jonathan Wakely  ---
Author: redi
Date: Tue May 28 19:39:48 2019
New Revision: 271717

URL: https://gcc.gnu.org/viewcvs?rev=271717=gcc=rev
Log:
PR libstdc++/90634 reduce allocations in filesystem::path construction

PR libstdc++/90634
* include/experimental/bits/fs_path.h (path::path(path&&)): Only call
_M_split_cmpts() for a path with multiple components.
(path::_S_is_dir_sep()): Add missing 'static' keyword to function.
* src/filesystem/path.cc (path::_M_split_cmpts()): Count number of
components and reserve space in vector. Return early when there is
only one component.
* testsuite/27_io/filesystem/path/construct/90634.cc: New test.
* testsuite/experimental/filesystem/path/construct/90634.cc: New test.

Added:
trunk/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90634.cc
   
trunk/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc
Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/include/experimental/bits/fs_path.h
trunk/libstdc++-v3/src/filesystem/path.cc

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-28 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #11 from Jonathan Wakely  ---
Author: redi
Date: Tue May 28 16:08:51 2019
New Revision: 271712

URL: https://gcc.gnu.org/viewcvs?rev=271712=gcc=rev
Log:
Fix check for root-directory path and add tests

PR libstdc++/90634
* src/filesystem/path.cc (path::_M_split_cmpts()): Fix check for "/".
* testsuite/27_io/filesystem/path/construct/90634.cc: New test.
* testsuite/experimental/filesystem/path/construct/90634.cc: New test.

Added:
   
branches/gcc-8-branch/libstdc++-v3/testsuite/27_io/filesystem/path/construct/90634.cc
   
branches/gcc-8-branch/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc
Modified:
branches/gcc-8-branch/libstdc++-v3/ChangeLog
branches/gcc-8-branch/libstdc++-v3/src/filesystem/path.cc

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-28 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #10 from Jonathan Wakely  ---
Author: redi
Date: Tue May 28 14:57:35 2019
New Revision: 271710

URL: https://gcc.gnu.org/viewcvs?rev=271710=gcc=rev
Log:
PR libstdc++/90634 reduce allocations in filesystem::path construction

PR libstdc++/90634
* include/bits/fs_path.h (path::path(path&&)): Only call
_M_split_cmpts() for a path with multiple components.
(path::_S_is_dir_sep()): Add missing 'static' keyword to function.
* include/experimental/bits/fs_path.h: Likewise.
* src/filesystem/path.cc (path::_M_split_cmpts()): Count number of
components and reserve space in vector. Return early when there is
only one component.
* src/filesystem/std-path.cc (path::_M_split_cmpts()): Likewise.

Modified:
branches/gcc-8-branch/libstdc++-v3/ChangeLog
branches/gcc-8-branch/libstdc++-v3/include/bits/fs_path.h
branches/gcc-8-branch/libstdc++-v3/include/experimental/bits/fs_path.h
branches/gcc-8-branch/libstdc++-v3/src/filesystem/path.cc
branches/gcc-8-branch/libstdc++-v3/src/filesystem/std-path.cc

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-28 Thread 1000hz.radiowave at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #9 from baltic <1000hz.radiowave at gmail dot com> ---
(In reply to Jonathan Wakely from comment #8)

> The libc++ implementation also fails this test:

As i've shown before, neither of those are failures. By the current C++
standard at least.

So, long story short: "I am not going to fix the x10 overhead, because I
believe the standard is wrong."

Ok.
But damn! how good clang looks on the same test:

about to quit. total allocated 0 bytes

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #8 from Jonathan Wakely  ---
Yes, I'm well aware what the standard says, I reported
https://wg21.link/lwg2674

I consider it a mistake for the standard to say "these are bidirectional
iterators, except they're not, but it's OK because at some point in the future
a subset of algorithms won't care about that requirement any more". GCC's
implementation provides real bidirectional iterators that meet the C++17
requirements and the C++2a BidirectionalIterator requirement that:

"Pointers and references obtained from a forward iterator into a range [i, s)
shall remain valid while [i, s) continues to denote a range."

The libc++ implementation also fails this test:
https://wandbox.org/permlink/RwiFphMn1Hh2iis4
It's OK though, because you don't pay for what you don't use!

The design trade off was discussed in Bug 71044 comment 6. Populating the range
lazily would still be possible (as long as it was thread-safe) but I'm not
convinced it would be an improvement. In theory the range could also be shared
by copies of the same path (again, as long as it was thread-safe). Those
optimizations can be considered later, but for now the implementation aims to
be correct and not produce dangling pointers.

As I said, I'll reduce the number of allocations done by the GCC 8.x
implementation, but I'm not planning on any other changes for now.

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-27 Thread 1000hz.radiowave at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #7 from baltic <1000hz.radiowave at gmail dot com> ---
(In reply to Jonathan Wakely from comment #5)

> So don't store them as filesystem::path objects then, store them as strings
> and create a path as needed.

Type system is here for a reason. And it has advantages to type something,
which is naturally a path, as such, not as an arbitrary string.

> GCC's implementation creates the path components eagerly, so that
> path::iterator meets the requirements of a forward iterator, whereas the
> libc++ implementation creates them lazily during iteration, and so is not a
> valid forward iterator. 

It doesn't have to be. See standard 30.11.7.5:

A path::iterator is a constant iterator satisfying all the requirements of a
bidirectional iterator (27.2.6)
except that, for dereferenceable iterators a and b of type path::iterator with
a == b, there is no requirement
that *a and *b are bound to the same object. Its value_type is path.

> The implementations have different trade-offs. That is not a bug.

Poorly considered trade-offs are called "bad design" ;)
Besides, your implementation breaks the fundamental C++ design principle "you
don't pay for what you don't use". now everyone has to pay x10 memory overhead
even if they are not ever going to iterate over a path object. I agree, that is
not a bug, in C++ world, its a pure crime ;)

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-26 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|9.0 |8.4

--- Comment #6 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #5)
> The correct results with 9.1.0 are:
> 
> allocating 21 bytes
> allocating 248 bytes
> about to quit. total allocated 269 bytes
> freed 248 bytes
> freed 21 bytes

Proof: https://wandbox.org/permlink/zJVe39jAZtL1p6je

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-26 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #5 from Jonathan Wakely  ---
(In reply to baltic from comment #3)
> (In reply to Jonathan Wakely from comment #2)
>  
> > I'm not sure where the repeated 72 allocations come from, and can't check
> > the code right now, but the new code doesn't do it anyway.
> 
> It comes from std::string creation, which then is passed to path constructor.

No it doesn't, that's the 21 bytes.

With GCC 9.x the 72 bytes is sizeof(path), which comes from allocating a path.
It comes from each component of the path creating a vector during
parsing, and then if there's only one component it empties the vector again.
Those transient allocations could be avoided, so I'll change that for GCC
8.4.0, but there's nothing to do for GCC 9.x as it's already avoiding
unnecessary vector operations.

> > I think this can be considered fixed.
> 
> hardly. the gcc 9.1 is even worse in that regard, as it allocates 2kB of
> memory for the same case
> see the line:
> 
> about to quit. total allocated 2131 bytes
> 
> in here:
> https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34

You should try using a real compiler and not only rely on wandbox. Maybe
there's something wrong with the wandbox compiler, but that result only happens
when using Boost (so you should take it up with Boost instead).

The correct results with 9.1.0 are:

allocating 21 bytes
allocating 248 bytes
about to quit. total allocated 269 bytes
freed 248 bytes
freed 21 bytes

(In reply to baltic from comment #4)
> besides the 269 bytes you have mentioned, is still x10 overhead for a 20
> chars string. and massively adds up, if you store a lot objects.

The overhead is linear in the number of components in the path, not the string
length. If you have a path with a single filename and no directories then the
overhead is nothing like 10x.

> for example, when i go around home folder on my machine and save all the
> found paths to vector, it takes 2.7GB, while should take ~150MB!! quite an
> overhead on a simple task, for a language which strives for efficiency.

So don't store them as filesystem::path objects then, store them as strings and
create a path as needed.

> check out clang, for example:
> https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34
> it's smart enough to allocate such short strings inplace:
> about to quit. total allocated 0 bytes

GCC will also create short strings in place, but with a different limit for
what is considered "short".

GCC's implementation creates the path components eagerly, so that
path::iterator meets the requirements of a forward iterator, whereas the libc++
implementation creates them lazily during iteration, and so is not a valid
forward iterator. This fails with libc++:

https://wandbox.org/permlink/eas3ZqA2CojecrQJ

The implementations have different trade-offs. That is not a bug.

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-26 Thread 1000hz.radiowave at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #4 from baltic <1000hz.radiowave at gmail dot com> ---
besides the 269 bytes you have mentioned, is still x10 overhead for a 20 chars
string. and massively adds up, if you store a lot objects.
for example, when i go around home folder on my machine and save all the found
paths to vector, it takes 2.7GB, while should take ~150MB!! quite an overhead
on a simple task, for a language which strives for efficiency.

check out clang, for example:
https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34
it's smart enough to allocate such short strings inplace:
about to quit. total allocated 0 bytes

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-26 Thread 1000hz.radiowave at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

baltic <1000hz.radiowave at gmail dot com> changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|FIXED   |---

--- Comment #3 from baltic <1000hz.radiowave at gmail dot com> ---
(In reply to Jonathan Wakely from comment #2)

> I'm not sure where the repeated 72 allocations come from, and can't check
> the code right now, but the new code doesn't do it anyway.

It comes from std::string creation, which then is passed to path constructor.

> I think this can be considered fixed.

hardly. the gcc 9.1 is even worse in that regard, as it allocates 2kB of memory
for the same case
see the line:

about to quit. total allocated 2131 bytes

in here:
https://wandbox.org/permlink/u9dEfPh1Zc5pmJ34

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-26 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |9.0

--- Comment #2 from Jonathan Wakely  ---
The 813 bytes seen with GCC 8.x is due to reallocations within std::vector, as
the sequence of path objects in  [begin(),end()) is populated. The new code
counts the number of components first, so there's no need to keep growing as
the path is parsed. The sequence no longer uses std::vector at all, which also
reduces sizeof(path).

I'm not sure where the repeated 72 allocations come from, and can't check the
code right now, but the new code doesn't do it anyway. I think this can be
considered fixed.

[Bug libstdc++/90634] filesystem::path insane memory allocations

2019-05-26 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90634

--- Comment #1 from Jonathan Wakely  ---
The path type was rewritten for GCC 9, and now prints:

allocating 21 bytes
allocating 248 bytes
about to quit.
total allocated 269 bytes
freed 248 bytes
freed 21 bytes

The 21 bytes is for the native() string, and 248 bytes is the sequence of path
objects in the range [begin(),end()).