On Wed, Jun 3, 2020 at 1:22 AM Emil Dotchevski via Boost-users <[email protected]> wrote: > > On Tue, Jun 2, 2020 at 7:44 PM Glen Fernandes via Boost-users > <[email protected]> wrote: > > > > The Boost formal review of Zach Laine's Unicode library, Text, > > commences on June 11th and concludes June 20th. We welcome the > > participation of Boost developers and users, and the C++ community. > > Here is my review of Text in the form of a list of notes/questions, in no > particular order.
Thanks for the review, Emil. > Reading through the documentation, everything makes sense. The motivation for > each type and operation is easy to understand. > > I welcome the use of signed int as index/size type, but I wonder if this > would interact in some negative way when mixed with similar types from the > standard library. I don't think so, but who knows. In some cases, that is sure to be the case. text::string needs to exist only so that I can experiment with integrating something like text::text into it, to show how we might add to std::string if std::text ever becomes a thing. So, I don't expect the signed/unsigned interop issue to come up much, if at all -- I don't expect people to use text::string much directly. > I don't like the multi-page format of the documentation. I'd prefer a single > HTML page for easy searching. There's a lot of it! In previous Boost reviews, I've gotten the opposite of this feedback, FWIW. > What is the purpose of the throw_on_encoding_error type here? > https://github.com/tzlaine/text/blob/b8bb1a1101e7cf53b460671a3ead19516e7636fe/include/boost/text/transcode_iterator.hpp#L35 > > I searched and couldn't find any place in the documentation that explains how > ErrorHandler is used. Looking at the source code, it's just a function, but > its interface should be documented. Right! That's an oversight. I'll add documentation on its use and interface. > Generally I dislike interfaces with configurable error handling but it > appears that in the case of the conversion iterators this is warranted: both > replacement character and throwing are useful. I wonder though if it would be > better to delete this template parameter and always output 0xfffd, and in > addition provide an iterator adaptor type which can be used to wrap e.g. an > output iterator, and which throws if it sees 0xfffd being output. This is possible, and if others agree I can certainly make this change. > Both .c_str() and .data() should be added to string for compatibility. Also, > it is not true that s.begin() is equivalent to s.c_str(), because in debug > iterators should not be pointers. Perhaps &*s.begin() is equivalent, but this > should not be valid on an empty string (assert), null termination > notwithstanding. Well, I don't have any non-pointer debug iterators to consider, and an empty text::string is null-terminated. .c_str() .data() and .begin() truly are synonyms in this case. I see no need to have more than one. > The note about passing ropes by value, I think, is redundant. Logically, > copying a rope gets you an independent copy, the fact that internally the > copies share state is an implementation detail. Therefore, the documentation > can just specify that a rope is as thread-safe as an int. I don't know about that. That is, I don't know if it's actually as threadsafe as an int. I'm simply trying to remind folks that if you pass a rope via const& you're not going to get the thread-safety you want. I've seen a lot of shared_ptrs passed that way, and I'm trying to short-circuit the internal decision making that might lead to the same choice when passing a rope. > Speaking of thread-safety, where are the thread-safety guarantees of rope > documented? Where are the relevant tests? The documentation of threadsafety is here: https://tzlaine.github.io/text/doc/html/boost_text__proposed_/the_string_layer.html#boost_text__proposed_.the_string_layer.the__code__phrase_role__identifier__unencoded_rope__phrase___code__type You have to read down a bit. The tests are in rope_threadsafety.cpp, which I have run with and without TSan. > Speaking of tests, I suggest separating the generated test sources in a > subdirectory so things like test/string.cpp are easy to find. Not to mention > github.com truncates the test directory to 1000 files. This seems like a good idea -- it would certainly make things easier for me too! I'll do this. > The grammar seems wrong here; 'I don't know if you've ever written an undo > system that can do, undo, or redo any command you can think of, in less than > 40 lines of code, but there one is." It parses for me, and apparently Gavin, but I agree it is a bit awkward. I'll reword it. > "Every text editor you've ever used that handles large files gracefully has > done something to break the file into chunks" What about the buffer gap > algorithm? I think Borland editors used this way back in the day. :) I have no idea. I *think* this confirms what I was trying to say, though -- that your editor is dealing with sections of your large file (and not the whole thing) for efficiency. The rope approach I use in Boost.Text is just one possible implementation of this high-level approach. Zach _______________________________________________ Boost-users mailing list [email protected] https://lists.boost.org/mailman/listinfo.cgi/boost-users
