Re: String handling overhaul

Albert Astals Cid Mon, 27 Oct 2025 06:35:04 -0700

El dilluns, 27 d’octubre del 2025, a les 10:00:09 (Hora estàndard d’Europa 
central), Sune Stolborg Vuorela va escriure:
> On Sunday, October 26, 2025 1:55:25 AM Central European Standard Time
> Jonathan
> Hähne wrote:
> > I have been working on a patch to generally use `std::string_view` for
> > passing around read-only strings. The old handling has been highly
> > inconsistent, with rather random usage of `const std::string &`, `const
> > GooString &`, `const GooString *` and `std::unique_ptr<GooString>`.
> 
> I have also been doing some things here. What I have been trying is to
>  - document ownership (by changing GooString* to std::unique_ptr<GooString>)
> - Reducing the GooString api to bring it closer to std::string
>  - Removing the GooString api where it very much differs from std::string
>  - Changing various occurences of GooString to std::string in small steps
>  - not only for string types, but converting many malloc/new/delete/free to
> c+ + collections.
> 
> >     * Is there some fundamental reason why most `GooString`s are
> >     
> >       wrapped in `std::unique_ptr`s rather than being stored inline?
> 
> The reason for the std::unique_ptr is because it was 'raw pointers' before.
> Please note that GooString specifically isn't copyable, that's from before
> my time, but it is likely an attempt to avoid accidentally copying large
> strings.
> >     * Is there even much value in `GooString` over `std::string`? Would
> >     
> >       it make sense to phase it out, to somewhat improve readability and
> >       compatibility?
> 
> I think everyone agrees that GooString is to be phased out (except the
> formatter things, but they can be moved to std::string fully). But it is
> also not the thing on top of everyones list to do, so it is more like
> chomping slowly away at it. Try compare a january GooString.h with a
> 'today' GooString.h to see some of the changes that has happened.
> 
> > * Finally, this effort may also help improve the encoding handling
> > 
> >   story. It's currently not well-documented how any given string within
> >   the library is encoded.
> >   Some strings may hold either single-byte or double-byte characters,
> >   distinguished by a BOM.
> 
> I've tried a couple of times to move from pointer,size for example Unicode*
> and char* and others to views and spans, not every time I ended up
> succesful.
> 
> I have also some non-successful experiments of std::u32string and
> std::u16string which I think should be the good way to go for the utf8,
> utf16 and utf32 characters (and utf16 in two bytes in a std::string and
> utf16 partially filling out a 4bytes block)  rather than the current
> 'surprise encoding' we currently have.
> 
> > Those are my rough goals and impressions from digging through the
> > codebase, but I'd really like some feedback before committing to
> > polishing up the rest of the many, many places that touch strings. Would
> > the Poppler project be generally interested in this kind of overhaul?
> > Are there any major pitfalls waiting on this path?
> 
> I suggest you take it a little bit at a time, gets the reviews in, get
> Albert to run his megatestsuite - and maybe hang around in the irc/matrix
> room and ask from time to time.
> 
> I do think it is worth the effort doing it. I will happily look over such
> changes, and I'm quite sure Albert will happily merge them (He at least did
> so with all my changes). I'm also sure he will write something here soon if
> I'm wrong.
> 
> It also have the side effect of getting a cleaner memory allocation/cleanup
> pattern
> 
> I think the major pitfall is 'getting it right' and 'keeping the changes not
> too big'.
> 
> One of the big pitfalls that I have removed was that GooString handled a
> null char pointer differently (empty string) than std::string (crashing)
> 
> Good luck, and try to keep the changesets small and many rather than big and
> few.


Yes, small changes is the key.

This also makes that in case we disagree your wasted effort is minimal.

Cheers,
  Albert

> 
> /Sune

Re: String handling overhaul

Reply via email to