Re: Efficient string concatenation
On Monday 03 of December 2012, Caolán McNamara wrote: > On Sun, 2012-12-02 at 22:59 -0600, Norbert Thiebaud wrote: > > vaguely related... since we are talking about performance... why > > *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly > > allocated buffer... > > yeah, someone noticed that when writing i18npool/i18nutil and wrote a > non-memsetting version which I moved into comphelper as > rtl_uString_alloc/rtl_string_alloc But that doesn't make much sense, does it? What's the point of an internal string helper function in comphelper instead of sal? -- Lubos Lunak l.lu...@suse.cz ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
On Sun, 2012-12-02 at 22:59 -0600, Norbert Thiebaud wrote: > vaguely related... since we are talking about performance... why > *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly > allocated buffer... yeah, someone noticed that when writing i18npool/i18nutil and wrote a non-memsetting version which I moved into comphelper as rtl_uString_alloc/rtl_string_alloc c. ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
On 12/02/2012 11:56 PM, Lubos Lunak wrote: From 9b6e150c83c43c51dbdc6a5075d110fcc6e25210 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lubo=C5=A1=20Lu=C5=88=C3=A1k?= Date: Sun, 2 Dec 2012 22:33:47 +0100 Subject: [PATCH 4/5] make sure uno::Any works with fast operator+ The result of the operation needs to be first converted to O(U)String. Change-Id: I24dafeaebf68a0eff3edf1d1cf713bfc10bbd8f4 --- cppu/inc/com/sun/star/uno/Any.hxx | 42 + 1 file changed, 42 insertions(+) diff --git a/cppu/inc/com/sun/star/uno/Any.hxx b/cppu/inc/com/sun/star/uno/Any.hxx index beeed81..8f6dfc6 100644 --- a/cppu/inc/com/sun/star/uno/Any.hxx +++ b/cppu/inc/com/sun/star/uno/Any.hxx @@ -188,6 +188,23 @@ inline Any SAL_CALL makeAny( bool const & value ) SAL_THROW(()) } //__ +#ifdef RTL_FAST_STRING +template< class C1, class C2 > +inline Any SAL_CALL makeAny( const rtl::OStringConcat< C1, C2 >& value ) SAL_THROW(()) +{ +const rtl::OString str( value ); +return Any( &str, ::cppu::getTypeFavourUnsigned(&str) ); +} There should be no support for OString in Any. Stephan +//__ +template< class C1, class C2 > +inline Any SAL_CALL makeAny( const rtl::OUStringConcat< C1, C2 >& value ) SAL_THROW(()) +{ +const rtl::OUString str( value ); +return Any( &str, ::cppu::getTypeFavourUnsigned(&str) ); +} +#endif +//__ template< class C > inline void SAL_CALL operator <<= ( Any & rAny, const C & value ) SAL_THROW(()) { @@ -208,6 +225,31 @@ inline void SAL_CALL operator <<= ( Any & rAny, bool const & value ) (uno_AcquireFunc) cpp_acquire, (uno_ReleaseFunc) cpp_release ); } +//__ +#ifdef RTL_FAST_STRING +template< class C1, class C2 > +inline void SAL_CALL operator <<= ( Any & rAny, const rtl::OStringConcat< C1, C2 >& value ) +SAL_THROW(()) +{ +const rtl::OString str( value ); +const Type & rType = ::cppu::getTypeFavourUnsigned(&str); +::uno_type_any_assign( +&rAny, const_cast< rtl::OString * >( &str ), rType.getTypeLibType(), +(uno_AcquireFunc)cpp_acquire, (uno_ReleaseFunc)cpp_release ); +} + +//__ +template< class C1, class C2 > +inline void SAL_CALL operator <<= ( Any & rAny, const rtl::OUStringConcat< C1, C2 >& value ) +SAL_THROW(()) +{ +const rtl::OUString str( value ); +const Type & rType = ::cppu::getTypeFavourUnsigned(&str); +::uno_type_any_assign( +&rAny, const_cast< rtl::OUString * >( &str ), rType.getTypeLibType(), +(uno_AcquireFunc)cpp_acquire, (uno_ReleaseFunc)cpp_release ); +} +#endif //__ template< class C > inline sal_Bool SAL_CALL operator >>= ( const Any & rAny, C & value ) SAL_THROW(()) -- 1.7.10.4 ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
> > On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak wrote: > >> The work is based on threads [1] and [2] and occassionally seeing in > >> the commits that people doing string cleanups sometimes change ugly code > >> to only slightly less ugly code. With the new feature enabled, any > >> string concatenation/creation is simply done as (well, ok, the number() > >> part is not done yet, but shouldn't be difficult to add): > >> > >> OUString s = foo + bar + "baz" + OUString::number( many ) + "whatever"; > >> > >> All the other alternatives, like explicit OUStringBuffer and repeated > >> append() should be now worse in all possible aspects. > > What is the recommended way to deal with > for() > { > sString += foo; > } > > OUStringBuffer is still the way to go in that case right ? Yes. I was referring above to the practice of chaining of append() calls in one expression or several consequent statements. If the construction really needs to be done step by step, then it still needs to be done this way, I don't see any good way around it (except for folding OUStringBuffer functionality to OUString one day). But thinking of this, I should add support for this fast concat to operator+=/append(). On Monday 03 of December 2012, Norbert Thiebaud wrote: > On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak wrote: > +char* end = c.addData( buffer->buffer ); > +buffer->length = end - buffer->buffer; > +pData = buffer; > +nCapacity = l + 16; > how does that work ? you allocate l-bytes but declare a capacity > of l + 16 OUStringBuffer always allocates 16 extra characters at the beginning. See e.g. the default ctor. On Monday 03 of December 2012, Norbert Thiebaud wrote: > +OStringBuffer( const OStringConcat< T1, T2 >& c ) > +{ > +const int l = c.length(); > +rtl_String* buffer = NULL; > +rtl_string_new_WithLength( &buffer, l ); > +char* end = c.addData( buffer->buffer ); > ^^^ > here the buffer is not 0-terminated... It is 0-terminated because of the 0-memset in the _new_WithLength() function. But given the comment about that not being necessary, it makes sense to add it explicitly. > vaguely related... since we are talking about performance... why > *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly > allocated buffer... Good question. But the string stuff does more things that are not very smart. -- Lubos Lunak l.lu...@suse.cz ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
On Sun, Dec 2, 2012 at 10:59 PM, Norbert Thiebaud wrote: > vaguely related... since we are talking about performance... why > *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly > allocated buffer... > surely the allocated buffer will be filled with something soon > enough... all that is needed is buffer[0] = 0 to make getStr() > C-string-safe. Oh, apparently you noticed too... later in the patch: +rtl_uString* buffer = NULL; +rtl_uString_new_WithLength( &buffer, l ); // TODO this clears, not necessary :-) Norbert ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
Patch 0001 it seems to me that addData via the different addDataHelper, do _not_ add the final binary 0, like rtl_stringbuffer_insert() for instance is doing so: +#ifdef RTL_FAST_STRING +template< typename T1, typename T2 > +OStringBuffer( const OStringConcat< T1, T2 >& c ) +{ +const int l = c.length(); +rtl_String* buffer = NULL; +rtl_string_new_WithLength( &buffer, l ); +char* end = c.addData( buffer->buffer ); ^^^ here the buffer is not 0-terminated... so OStringBuffer sBuff sBuff.append("foo") printf("%s", sBuff.getStr()) -> ok OStringBuffer sBuff = "f" + "oo"; printf("%s", sBuff.getStr()) -> segfault or at least buffer overrun no? vaguely related... since we are talking about performance... why *_new_WithLength() in strtmpl.cxx is doing a memset on the whole newly allocated buffer... surely the allocated buffer will be filled with something soon enough... all that is needed is buffer[0] = 0 to make getStr() C-string-safe. Norbert ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak wrote: > in pathc 0001: +#ifdef RTL_FAST_STRING +template< typename T1, typename T2 > +OStringBuffer( const OStringConcat< T1, T2 >& c ) +{ +const int l = c.length(); +rtl_String* buffer = NULL; +rtl_string_new_WithLength( &buffer, l ); +char* end = c.addData( buffer->buffer ); +buffer->length = end - buffer->buffer; +pData = buffer; +nCapacity = l + 16; how does that work ? you allocate l-bytes but declare a capacity of l + 16 Norbert ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
On Sun, Dec 2, 2012 at 6:55 PM, Norbert Thiebaud wrote: Sorry fro previous post. mishap... > On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak wrote: >> >> Hello, >> >> The work is based on threads [1] and [2] and occassionally seeing in the >> commits that people doing string cleanups sometimes change ugly code to only >> slightly less ugly code. With the new feature enabled, any string >> concatenation/creation is simply done as (well, ok, the number() part is not >> done yet, but shouldn't be difficult to add): >> >> OUString s = foo + bar + "baz" + OUString::number( many ) + "whatever"; >> >> All the other alternatives, like explicit OUStringBuffer and repeated >> append() >> should be now worse in all possible aspects. > What is the recommended way to deal with for() { sString += foo; } OUStringBuffer is still the way to go in that case right ? Norbert ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: Efficient string concatenation
On Sun, Dec 2, 2012 at 4:56 PM, Lubos Lunak wrote: > > Hello, > > The work is based on threads [1] and [2] and occassionally seeing in the > commits that people doing string cleanups sometimes change ugly code to only > slightly less ugly code. With the new feature enabled, any string > concatenation/creation is simply done as (well, ok, the number() part is not > done yet, but shouldn't be difficult to add): > > OUString s = foo + bar + "baz" + OUString::number( many ) + "whatever"; > > All the other alternatives, like explicit OUStringBuffer and repeated append() > should be now worse in all possible aspects. What is the recommended way to deal with for() { } In fact, this should result in > just one OUString allocation, one data copy for anything and at most one > length computation, so it should possibly beat even strcpy+strcat, while at > the same time looking good. > > I successfully built with gcc (not the ancient Apple one though, I intend to > pass there again), clang and msvc2010 and passed 'make check'. The resulting > binary size is about the same (funnily enough it seems that gcc's -Os stops > it from fully inlining, preventing it from optimizing out more stuff and > making the code smaller). > > Even though this is in sal/, the intention is to keep this code LO-internal, > so there won't be any BIC problems, 3rd party apps will keep getting the > original code. All O(U)String code is inline functions, so there shouldn't be > any trouble there. > > So as you can see, this would be perfect, if it weren't for some small > gotchas: > - since operator+ now returns a different object, this is not entirely source > compatible, and explicit conversions to O(U)String may need to be added > (e.g. '( a + "b" ).getStr()' -> 'OUString( a + "b" ).getStr()' ). If some of > those cases would be too annoying, I can try harder to avoid them, but some > are unavoidable ( ?: operator being one of them and somewhat vexing). However > the patch 0005 patch fixing all such issues in LO is pretty small, so this > does not currently seem to be an issue (although that may be because the idea > of writing simple string-handling code may be catching up slowly). > - as it is template-based, error messages can get somewhat longer, but IMO > it's nothing horrific. Compilers with decent error reporting are > recommended :). Alternatively, temporary "#define RTL_DISABLE_FAST_STRING" at > the top of the source file should help too. > > Still, I think it works pretty well. > > [1] > http://lists.freedesktop.org/archives/libreoffice/2011-November/021156.html > [2] > http://lists.freedesktop.org/archives/libreoffice/2011-December/022323.html > > -- > Lubos Lunak > l.lu...@suse.cz > > ___ > LibreOffice mailing list > LibreOffice@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/libreoffice > ___ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice