Re: Efficient string concatenation

2012-12-03 Thread Lubos Lunak
On Monday 03 of December 2012, Caolán McNamara wrote:
> On Sun, 2012-12-02 at 22:59 -0600, Norbert Thiebaud wrote:
> > vaguely related... since we are talking about performance... why
> > *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly
> > allocated buffer...
>
> yeah, someone noticed that when writing i18npool/i18nutil and wrote a
> non-memsetting version which I moved into comphelper as
> rtl_uString_alloc/rtl_string_alloc

 But that doesn't make much sense, does it? What's the point of an internal 
string helper function in comphelper instead of sal?

-- 
 Lubos Lunak
 l.lu...@suse.cz
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-03 Thread Caolán McNamara
On Sun, 2012-12-02 at 22:59 -0600, Norbert Thiebaud wrote:
> vaguely related... since we are talking about performance... why
> *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly
> allocated buffer...

yeah, someone noticed that when writing i18npool/i18nutil and wrote a
non-memsetting version which I moved into comphelper as
rtl_uString_alloc/rtl_string_alloc

c.

___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-03 Thread Stephan Bergmann

On 12/02/2012 11:56 PM, Lubos Lunak wrote:

From 9b6e150c83c43c51dbdc6a5075d110fcc6e25210 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lubo=C5=A1=20Lu=C5=88=C3=A1k?=
Date: Sun, 2 Dec 2012 22:33:47 +0100
Subject: [PATCH 4/5] make sure uno::Any works with fast operator+

The result of the operation needs to be first converted to O(U)String.

Change-Id: I24dafeaebf68a0eff3edf1d1cf713bfc10bbd8f4
---
  cppu/inc/com/sun/star/uno/Any.hxx |   42 +
  1 file changed, 42 insertions(+)

diff --git a/cppu/inc/com/sun/star/uno/Any.hxx 
b/cppu/inc/com/sun/star/uno/Any.hxx
index beeed81..8f6dfc6 100644
--- a/cppu/inc/com/sun/star/uno/Any.hxx
+++ b/cppu/inc/com/sun/star/uno/Any.hxx
@@ -188,6 +188,23 @@ inline Any SAL_CALL makeAny( bool const & value ) 
SAL_THROW(())
  }

  
//__
+#ifdef RTL_FAST_STRING
+template< class C1, class C2 >
+inline Any SAL_CALL makeAny( const rtl::OStringConcat< C1, C2 >& value ) 
SAL_THROW(())
+{
+const rtl::OString str( value );
+return Any( &str, ::cppu::getTypeFavourUnsigned(&str) );
+}


There should be no support for OString in Any.

Stephan


+//__
+template< class C1, class C2 >
+inline Any SAL_CALL makeAny( const rtl::OUStringConcat< C1, C2 >& value ) 
SAL_THROW(())
+{
+const rtl::OUString str( value );
+return Any( &str, ::cppu::getTypeFavourUnsigned(&str) );
+}
+#endif
+//__
  template< class C >
  inline void SAL_CALL operator <<= ( Any & rAny, const C & value ) 
SAL_THROW(())
  {
@@ -208,6 +225,31 @@ inline void SAL_CALL operator <<= ( Any & rAny, bool const 
& value )
  (uno_AcquireFunc) cpp_acquire, (uno_ReleaseFunc) cpp_release );
  }

+//__
+#ifdef RTL_FAST_STRING
+template< class C1, class C2 >
+inline void SAL_CALL operator <<= ( Any & rAny, const rtl::OStringConcat< C1, C2 
>& value )
+SAL_THROW(())
+{
+const rtl::OString str( value );
+const Type & rType = ::cppu::getTypeFavourUnsigned(&str);
+::uno_type_any_assign(
+&rAny, const_cast< rtl::OString * >( &str ), rType.getTypeLibType(),
+(uno_AcquireFunc)cpp_acquire, (uno_ReleaseFunc)cpp_release );
+}
+
+//__
+template< class C1, class C2 >
+inline void SAL_CALL operator <<= ( Any & rAny, const rtl::OUStringConcat< C1, C2 
>& value )
+SAL_THROW(())
+{
+const rtl::OUString str( value );
+const Type & rType = ::cppu::getTypeFavourUnsigned(&str);
+::uno_type_any_assign(
+&rAny, const_cast< rtl::OUString * >( &str ), rType.getTypeLibType(),
+(uno_AcquireFunc)cpp_acquire, (uno_ReleaseFunc)cpp_release );
+}
+#endif
  
//__
  template< class C >
  inline sal_Bool SAL_CALL operator >>= ( const Any & rAny, C & value ) 
SAL_THROW(())
-- 1.7.10.4


___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-03 Thread Lubos Lunak
> > On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak  wrote:
> >>  The work is based on threads [1] and [2] and occassionally seeing in
> >> the commits that people doing string cleanups sometimes change ugly code
> >> to only slightly less ugly code. With the new feature enabled, any
> >> string concatenation/creation is simply done as (well, ok, the number()
> >> part is not done yet, but shouldn't be difficult to add):
> >>
> >> OUString s = foo + bar + "baz" + OUString::number( many ) + "whatever";
> >>
> >> All the other alternatives, like explicit OUStringBuffer and repeated
> >> append() should be now worse in all possible aspects.
>
> What is the recommended way to deal with
> for()
> {
>   sString += foo;
> }
>
> OUStringBuffer is still the way to go in that case right ?

 Yes. I was referring above to the practice of chaining of append() calls in 
one expression or several consequent statements. If the construction really 
needs to be done step by step, then it still needs to be done this way, I 
don't see any good way around it (except for folding OUStringBuffer 
functionality to OUString one day).

 But thinking of this, I should add support for this fast concat to 
operator+=/append().

On Monday 03 of December 2012, Norbert Thiebaud wrote:
> On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak  wrote:
> +char* end = c.addData( buffer->buffer );
> +buffer->length = end - buffer->buffer;
> +pData = buffer;
> +nCapacity = l + 16;
>  how does that work ? you allocate l-bytes but declare a capacity
> of l + 16 

 OUStringBuffer always allocates 16 extra characters at the beginning. See 
e.g. the default ctor.

On Monday 03 of December 2012, Norbert Thiebaud wrote:
> +OStringBuffer( const OStringConcat< T1, T2 >& c )
> +{
> +const int l = c.length();
> +rtl_String* buffer = NULL;
> +rtl_string_new_WithLength( &buffer, l );
> +char* end = c.addData( buffer->buffer );
> ^^^
> here the buffer is not 0-terminated...

 It is 0-terminated because of the 0-memset in the _new_WithLength() function. 
But given the comment about that not being necessary, it makes sense to add 
it explicitly.

> vaguely related... since we are talking about performance... why
> *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly
> allocated buffer...

 Good question. But the string stuff does more things that are not very smart.

-- 
 Lubos Lunak
 l.lu...@suse.cz
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-02 Thread Norbert Thiebaud
On Sun, Dec 2, 2012 at 10:59 PM, Norbert Thiebaud  wrote:
> vaguely related... since we are talking about performance... why
> *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly
> allocated buffer...
> surely the allocated buffer will be filled with something soon
> enough... all that is needed is buffer[0] = 0 to make getStr()
> C-string-safe.

Oh, apparently you noticed too... later in the patch:

+rtl_uString* buffer = NULL;
+rtl_uString_new_WithLength( &buffer, l ); // TODO this
clears, not necessary

:-)

Norbert
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-02 Thread Norbert Thiebaud
Patch 0001

it seems to me that addData via the different addDataHelper, do _not_
add the final binary 0, like rtl_stringbuffer_insert() for instance is
doing

so:
+#ifdef RTL_FAST_STRING
+template< typename T1, typename T2 >
+OStringBuffer( const OStringConcat< T1, T2 >& c )
+{
+const int l = c.length();
+rtl_String* buffer = NULL;
+rtl_string_new_WithLength( &buffer, l );
+char* end = c.addData( buffer->buffer );
^^^
here the buffer is not 0-terminated...

so

OStringBuffer sBuff
sBuff.append("foo")
printf("%s", sBuff.getStr())  -> ok

OStringBuffer sBuff = "f" + "oo";
printf("%s", sBuff.getStr())  -> segfault or at least buffer overrun

no?

vaguely related... since we are talking about performance... why
*_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly
allocated buffer...
surely the allocated buffer will be filled with something soon
enough... all that is needed is buffer[0] = 0 to make getStr()
C-string-safe.

Norbert
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-02 Thread Norbert Thiebaud
On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak  wrote:
>

in pathc 0001:

+#ifdef RTL_FAST_STRING
+template< typename T1, typename T2 >
+OStringBuffer( const OStringConcat< T1, T2 >& c )
+{
+const int l = c.length();
+rtl_String* buffer = NULL;
+rtl_string_new_WithLength( &buffer, l );
+char* end = c.addData( buffer->buffer );
+buffer->length = end - buffer->buffer;
+pData = buffer;
+nCapacity = l + 16;
 how does that work ? you allocate l-bytes but declare a capacity
of l + 16 

Norbert
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-02 Thread Norbert Thiebaud
On Sun, Dec 2, 2012 at 6:55 PM, Norbert Thiebaud  wrote:

Sorry fro previous post. mishap...

> On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak  wrote:
>>
>>  Hello,
>>
>>  The work is based on threads [1] and [2] and occassionally seeing in the
>> commits that people doing string cleanups sometimes change ugly code to only
>> slightly less ugly code. With the new feature enabled, any string
>> concatenation/creation is simply done as (well, ok, the number() part is not
>> done yet, but shouldn't be difficult to add):
>>
>> OUString s = foo + bar + "baz" + OUString::number( many ) + "whatever";
>>
>> All the other alternatives, like explicit OUStringBuffer and repeated 
>> append()
>> should be now worse in all possible aspects.
>

What is the recommended way to deal with
for()
{
  sString += foo;
}

OUStringBuffer is still the way to go in that case right ?

Norbert
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Efficient string concatenation

2012-12-02 Thread Norbert Thiebaud
On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak  wrote:
>
>  Hello,
>
>  The work is based on threads [1] and [2] and occassionally seeing in the
> commits that people doing string cleanups sometimes change ugly code to only
> slightly less ugly code. With the new feature enabled, any string
> concatenation/creation is simply done as (well, ok, the number() part is not
> done yet, but shouldn't be difficult to add):
>
> OUString s = foo + bar + "baz" + OUString::number( many ) + "whatever";
>
> All the other alternatives, like explicit OUStringBuffer and repeated append()
> should be now worse in all possible aspects.

What is the recommended way to deal with
for()
{

}


 In fact, this should result in
> just one OUString allocation, one data copy for anything and at most one
> length computation, so it should possibly beat even strcpy+strcat, while at
> the same time looking good.
>
>  I successfully built with gcc (not the ancient Apple one though, I intend to
> pass there again), clang and msvc2010 and passed 'make check'. The resulting
> binary size is about the same (funnily enough it seems that gcc's -Os stops
> it from fully inlining, preventing it from optimizing out more stuff and
> making the code smaller).
>
>  Even though this is in sal/, the intention is to keep this code LO-internal,
> so there won't be any BIC problems, 3rd party apps will keep getting the
> original code. All O(U)String code is inline functions, so there shouldn't be
> any trouble there.
>
>  So as you can see, this would be perfect, if it weren't for some small
> gotchas:
> - since operator+ now returns a different object, this is not entirely source
> compatible, and explicit conversions to O(U)String may need to be added
> (e.g. '( a + "b" ).getStr()' -> 'OUString( a + "b" ).getStr()' ). If some of
> those cases would be too annoying, I can try harder to avoid them, but some
> are unavoidable ( ?: operator being one of them and somewhat vexing). However
> the patch 0005 patch fixing all such issues in LO is pretty small, so this
> does not currently seem to be an issue (although that may be because the idea
> of writing simple string-handling code may be catching up slowly).
> - as it is template-based, error messages can get somewhat longer, but IMO
> it's nothing horrific. Compilers with decent error reporting are
> recommended :). Alternatively, temporary "#define RTL_DISABLE_FAST_STRING" at
> the top of the source file should help too.
>
>  Still, I think it works pretty well.
>
> [1]
> http://lists.freedesktop.org/archives/libreoffice/2011-November/021156.html
> [2]
> http://lists.freedesktop.org/archives/libreoffice/2011-December/022323.html
>
> --
>  Lubos Lunak
>  l.lu...@suse.cz
>
> ___
> LibreOffice mailing list
> LibreOffice@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/libreoffice
>
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice