On Tue, Jan 01, 2013 at 11:18:29PM +0100, Ondřej Bílka wrote:
> On Tue, Jan 01, 2013 at 09:12:07PM +0100, Loup Vaillant-David wrote:
> > 
> >   void latin1_to_utf8(std::string & s);
> > 
> Let me guess. They do it to save cycles caused by allocation of new
> string.
> > instead of
> > 
> >   std::string utf8_of_latin1(std::string s)
> > or
> >   std::string utf8_of_latin1(const std::string & s)

You may have guessed right.  But then, *they* guessed wrong.

First, the program in which I saw this conversion routine is dead slow
anyway.  If they really cared about the performance of a few encoding
conversions, they should have started by unifying string handling
(there are 6 string types in the program, all actively used, and
sometimes converted back and forth).

Second, every time the conversion actually does anything, the utf8
string will be longer than the original one, and will require a
realloc() anyway (unless they wrote some very clever code, but the
overall quality of their monstrosity makes that unlikely).
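To make the size argument concrete, here is a minimal sketch of the
conversion.  It assumes the value-returning signature quoted above and
ISO-8859-1 input.  Every byte below 0x80 is copied as is, and every
other byte turns into two UTF-8 bytes, so any string that actually
needs converting grows.  Counting those bytes first, so that the result
is allocated once, is my own choice for the sketch, not what the
program in question does.

```cpp
#include <cstddef>
#include <string>

// Sketch: convert a Latin-1 (ISO-8859-1) byte string to UTF-8.
// Each Latin-1 byte is the code point U+0000..U+00FF with the same value,
// so bytes below 0x80 are copied unchanged and every other byte becomes
// a two-byte UTF-8 sequence.
std::string utf8_of_latin1(const std::string & s)
{
    // Count the bytes that need two UTF-8 bytes, so the result can be
    // allocated once at its final size.
    std::size_t extra = 0;
    for (unsigned char c : s)
        if (c >= 0x80)
            ++extra;

    std::string out;
    out.reserve(s.size() + extra);
    for (unsigned char c : s) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte: 0xC2 or 0xC3
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

Pure ASCII comes out byte for byte identical; "caf\xE9" (4 bytes)
becomes "caf\xC3\xA9" (5 bytes).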

Finally, I often needed to write this:

  std::string temp = compute_text();
  latin1_to_utf8(temp);
  call_function(temp);

Which does not reduce allocations in the slightest, compared to

  call_function(utf8_of_latin1(compute_text()));

My version may even be a bit more amenable to optimisation by the
compiler.  (In addition to being more readable, I dare say.)
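Here are the two calling styles side by side, as a sketch assuming
C++11.  compute_text() and call_function() are hypothetical stand-ins
with trivial bodies.  The in-place interface is written as a one-line
wrapper around the value-returning one, so in this sketch both call
sites build the converted string in a fresh buffer; the only
difference is that the first needs a named temporary.

```cpp
#include <cstddef>
#include <string>

// Value-returning conversion (Latin-1 -> UTF-8).
std::string utf8_of_latin1(const std::string & s)
{
    std::string out;
    out.reserve(s.size() * 2);  // upper bound: each Latin-1 byte needs at most 2 UTF-8 bytes
    for (unsigned char c : s) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// The in-place interface needs no separate algorithm: it can assign
// the converted copy back (a move assignment in C++11).
void latin1_to_utf8(std::string & s)
{
    s = utf8_of_latin1(s);
}

// Hypothetical stand-ins for the functions used in the example above.
std::string compute_text() { return "na\xEFve"; }  // Latin-1 "naive" with i-diaeresis
std::size_t call_function(const std::string & text) { return text.size(); }

// Imperative style: a named temporary, mutated, then passed on.
std::size_t imperative_style()
{
    std::string temp = compute_text();
    latin1_to_utf8(temp);
    return call_function(temp);
}

// Functional style: one expression, no named temporary.
std::size_t functional_style()
{
    return call_function(utf8_of_latin1(compute_text()));
}
```

Both functions return the same result (6, the UTF-8 length of the
5-byte Latin-1 input).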

So, they *may* have made this move because they cared about
performance.  A more likely explanation, though, is that they simply
thought "oh, I need to convert some strings to utf8", and
transliterated that into C++.  They could have thought "oh, I need
utf8 versions of some strings" instead, but that would be functional
thinking.

Loup.
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc