El dilluns, 27 d’octubre del 2025, a les 10:00:09 (Hora estàndard d’Europa central), Sune Stolborg Vuorela va escriure: > On Sunday, October 26, 2025 1:55:25 AM Central European Standard Time > Jonathan > Hähne wrote: > > I have been working on a patch to generally use `std::string_view` for > > passing around read-only strings. The old handling has been highly > > inconsistent, with rather random usage of `const std::string &`, `const > > GooString &`, `const GooString *` and `std::unique_ptr<GooString>`. > > I have also been doing some things here. What I have been trying is to > - document ownership (by changing GooString* to std::unique_ptr<GooString>) > - Reducing the GooString api to bring it closer to std::string > - Removing the GooString api where it very much differs from std::string > - Changing various occurences of GooString to std::string in small steps > - not only for string types, but converting many malloc/new/delete/free to > c+ + collections. > > > * Is there some fundamental reason why most `GooString`s are > > > > wrapped in `std::unique_ptr`s rather than being stored inline? > > The reason for the std::unique_ptr is because it was 'raw pointers' before. > Please note that GooString specifically isn't copyable, that's from before > my time, but it is likely an attempt to avoid accidentally copying large > strings. > > * Is there even much value in `GooString` over `std::string`? Would > > > > it make sense to phase it out, to somewhat improve readability and > > compatibility? > > I think everyone agrees that GooString is to be phased out (except the > formatter things, but they can be moved to std::string fully). But it is > also not the thing on top of everyones list to do, so it is more like > chomping slowly away at it. Try compare a january GooString.h with a > 'today' GooString.h to see some of the changes that has happened. > > > * Finally, this effort may also help improve the encoding handling > > > > story. It's currently not well-documented how any given string within > > the library is encoded. > > Some strings may hold either single-byte or double-byte characters, > > distinguished by a BOM. > > I've tried a couple of times to move from pointer,size for example Unicode* > and char* and others to views and spans, not every time I ended up > succesful. > > I have also some non-successful experiments of std::u32string and > std::u16string which I think should be the good way to go for the utf8, > utf16 and utf32 characters (and utf16 in two bytes in a std::string and > utf16 partially filling out a 4bytes block) rather than the current > 'surprise encoding' we currently have. > > > Those are my rough goals and impressions from digging through the > > codebase, but I'd really like some feedback before committing to > > polishing up the rest of the many, many places that touch strings. Would > > the Poppler project be generally interested in this kind of overhaul? > > Are there any major pitfalls waiting on this path? > > I suggest you take it a little bit at a time, gets the reviews in, get > Albert to run his megatestsuite - and maybe hang around in the irc/matrix > room and ask from time to time. > > I do think it is worth the effort doing it. I will happily look over such > changes, and I'm quite sure Albert will happily merge them (He at least did > so with all my changes). I'm also sure he will write something here soon if > I'm wrong. > > It also have the side effect of getting a cleaner memory allocation/cleanup > pattern > > I think the major pitfall is 'getting it right' and 'keeping the changes not > too big'. > > One of the big pitfalls that I have removed was that GooString handled a > null char pointer differently (empty string) than std::string (crashing) > > Good luck, and try to keep the changesets small and many rather than big and > few.
Yes, small changes is the key. This also makes that in case we disagree your wasted effort is minimal. Cheers, Albert > > /Sune
