Re: [Boost-users] [review] [text] Text review begins June 11

Zach Laine via Boost-users Fri, 12 Jun 2020 10:49:33 -0700

On Wed, Jun 3, 2020 at 1:22 AM Emil Dotchevski via Boost-users
<[email protected]> wrote:
>
> On Tue, Jun 2, 2020 at 7:44 PM Glen Fernandes via Boost-users 
> <[email protected]> wrote:
> >
> > The Boost formal review of Zach Laine's Unicode library, Text,
> > commences on June 11th and concludes June 20th. We welcome the
> > participation of Boost developers and users, and the C++ community.
>
> Here is my review of Text in the form of a list of notes/questions, in no 
> particular order.


Thanks for the review, Emil.

> Reading through the documentation, everything makes sense. The motivation for 
> each type and operation is easy to understand.
>
> I welcome the use of signed int as index/size type, but I wonder if this 
> would interact in some negative way when mixed with similar types from the 
> standard library. I don't think so, but who knows.

In some cases, that is sure to be the case.  text::string needs to
exist only so that I can experiment with integrating something like
text::text into it, to show how we might add to std::string if
std::text ever becomes a thing.  So, I don't expect the
signed/unsigned interop issue to come up much, if at all -- I don't
expect people to use text::string much directly.

> I don't like the multi-page format of the documentation. I'd prefer a single 
> HTML page for easy searching.

There's a lot of it!  In previous Boost reviews, I've gotten the
opposite of this feedback, FWIW.

> What is the purpose of the throw_on_encoding_error type here? 
> https://github.com/tzlaine/text/blob/b8bb1a1101e7cf53b460671a3ead19516e7636fe/include/boost/text/transcode_iterator.hpp#L35
>
> I searched and couldn't find any place in the documentation that explains how 
> ErrorHandler is used. Looking at the source code, it's just a function, but 
> its interface should be documented.

Right!  That's an oversight.  I'll add documentation on its use and interface.

> Generally I dislike interfaces with configurable error handling but it 
> appears that in the case of the conversion iterators this is warranted: both 
> replacement character and throwing are useful. I wonder though if it would be 
> better to delete this template parameter and always output 0xfffd, and in 
> addition provide an iterator adaptor type which can be used to wrap e.g. an 
> output iterator, and which throws if it sees 0xfffd being output.

This is possible, and if others agree I can certainly make this change.

> Both .c_str() and .data() should be added to string for compatibility. Also, 
> it is not true that s.begin() is equivalent to s.c_str(), because in debug 
> iterators should not be pointers. Perhaps &*s.begin() is equivalent, but this 
> should not be valid on an empty string (assert), null termination 
> notwithstanding.

Well, I don't have any non-pointer debug iterators to consider, and an
empty text::string is null-terminated.  .c_str() .data() and .begin()
truly are synonyms in this case.  I see no need to have more than one.

> The note about passing ropes by value, I think, is redundant. Logically, 
> copying a rope gets you an independent copy, the fact that internally the 
> copies share state is an implementation detail. Therefore, the documentation 
> can just specify that a rope is as thread-safe as an int.

I don't know about that.  That is, I don't know if it's actually as
threadsafe as an int.  I'm simply trying to remind folks that if you
pass a rope via const& you're not going to get the thread-safety you
want.  I've seen a lot of shared_ptrs passed that way, and I'm trying
to short-circuit the internal decision making that might lead to the
same choice when passing a rope.

> Speaking of thread-safety, where are the thread-safety guarantees of rope 
> documented? Where are the relevant tests?

The documentation of threadsafety is here:

https://tzlaine.github.io/text/doc/html/boost_text__proposed_/the_string_layer.html#boost_text__proposed_.the_string_layer.the__code__phrase_role__identifier__unencoded_rope__phrase___code__type

You have to read down a bit.  The tests are in rope_threadsafety.cpp,
which I have run with and without TSan.

> Speaking of tests, I suggest separating the generated test sources in a 
> subdirectory so things like test/string.cpp are easy to find. Not to mention 
> github.com truncates the test directory to 1000 files.

This seems like a good idea -- it would certainly make things easier
for me too!  I'll do this.

> The grammar seems wrong here; 'I don't know if you've ever written an undo 
> system that can do, undo, or redo any command you can think of, in less than 
> 40 lines of code, but there one is."

It parses for me, and apparently Gavin, but I agree it is a bit
awkward.  I'll reword it.

> "Every text editor you've ever used that handles large files gracefully has 
> done something to break the file into chunks" What about the buffer gap 
> algorithm? I think Borland editors used this way back in the day. :)

I have no idea.  I *think* this confirms what I was trying to say,
though -- that your editor is dealing with sections of your large file
(and not the whole thing) for efficiency.  The rope approach I use in
Boost.Text is just one possible implementation of this high-level
approach.

Zach
_______________________________________________
Boost-users mailing list
[email protected]
https://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: [Boost-users] [review] [text] Text review begins June 11

Reply via email to