Hi!

I have been working on a patch to generally use `std::string_view` for
passing around read-only strings. The old handling has been highly
inconsistent, with rather random usage of `const std::string &`, `const
GooString &`, `const GooString *` and `std::unique_ptr<GooString>`.

* The `string_view`ification, while still in progress, is already helping
  with reducing some unnecessary allocations and copies. But more
  importantly, this more abstract representation will allow improving
  string allocation strategies in important places, such as the
  `Object`'s string storage. The total performance benefit I have
  reached so far is already around 10%.

* The other big benefit will be type safety. Some functions taking
  `const GooString *` arguments use them to mean a non-null reference,
  whereas others mean an optional string. Similarly, it feels like half
  the places using a `std::unique_ptr<GooString>` ascribe meaning to a
  nullptr value, and the other half don't.
    * Is there some fundamental reason why most `GooString`s are
      wrapped in `std::unique_ptr`s rather than being stored inline?
    * Is there even much value in `GooString` over `std::string`? Would
      it make sense to phase it out, to somewhat improve readability and
      compatibility?

* Finally, this effort may also help improve the encoding handling
  story. It's currently not well-documented how any given string within
  the library is encoded.
  Some strings may hold either single-byte or double-byte characters,
  distinguished by a BOM. This is again bad for type safety, and
  prevents taking in-place substrings: a few places in Annot.cc and
  Form.cc that need substringing have to make copies and re-insert the
  BOM every time, sometimes in suspiciously quadratic-looking loops.

Those are my rough goals and impressions from digging through the
codebase, but I'd really like some feedback before committing to
polishing up the rest of the many, many places that touch strings. Would
the Poppler project be generally interested in this kind of overhaul?
Are there any major pitfalls waiting on this path?

Cheers
Jonathan

Reply via email to