Re: xub_StrLen and = 64k paragraphs

2013-12-02 Thread Caolán McNamara
On Thu, 2013-11-28 at 23:03 +0100, Eike Rathke wrote:
 I'd rather like to see a value (SAL_MAX_INT32-1) as replacement and name
 it SAL_MAX_STRING 

FWIW, writer has a TXTNODE_MAX which is currently STRING_LEN - 2;

C.



___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


xub_StrLen and = 64k paragraphs

2013-11-28 Thread Caolán McNamara
So now UniString is gone we have the chance to make use of OUString for
= 64k paragraphs (#i17171#) but we basically have to decide what to do
with places that currently return STRING_LEN/STRING_MAXLEN in xub_Strlen
(unsigned short). Do we want to return -1 or SAL_MAX_INT32 as the
replacement. -1 is nice but there is so much code in existence that
expects the error/boundary condition return value to be  any valid
number that I reckon SAL_MAX_INT32 might be safer as the default.

For a concrete place see OutputDevice::GetTextBreak where STRING_LEN is
returned for the boundary case. See also grepping for STRING_LEN in sw
and think mass search/replace STRING_LEN SAL_MAX_INT32 :-)

For reference, I attach the trivial patches to remove the 16bit length
limits in writer that stop anything larger that 0x entering it which
are the final bits to be applied once everything else is done.

C.
From 00a379e51e629b218bcdd281e8be25f5f6cc3b7d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Caol=C3=A1n=20McNamara?= caol...@redhat.com
Date: Thu, 7 Nov 2013 14:59:33 +
Subject: [PATCH 01/12] Related: fdo#17171 stop clipping strings to STRING_LEN
 on .doc import

Change-Id: Ic99132d0ee7804dc3625bef88cd35b2894f342c2
---
 sw/source/filter/ww8/ww8par.cxx | 42 +++--
 1 file changed, 7 insertions(+), 35 deletions(-)

diff --git a/sw/source/filter/ww8/ww8par.cxx b/sw/source/filter/ww8/ww8par.cxx
index 6a1c048..66107ff 100644
--- a/sw/source/filter/ww8/ww8par.cxx
+++ b/sw/source/filter/ww8/ww8par.cxx
@@ -2974,19 +2974,13 @@ bool SwWW8ImplReader::ReadPlainChars(WW8_CP rPos, sal_Int32 nEnd, sal_Int32 nCp
 // Reset Unicode flag and correct FilePos if needed.
 // Note: Seek is not expensive, as we're checking inline whether or not
 // the correct FilePos has already been reached.
-xub_StrLen nStrLen;
-if (nValidStrLen = (STRING_MAXLEN-1))
-nStrLen = writer_castxub_StrLen(nValidStrLen);
-else
-nStrLen = STRING_MAXLEN-1;
-
 const rtl_TextEncoding eSrcCharSet = bVer67 ? GetCurrentCharSet() :
 RTL_TEXTENCODING_MS_1252;
 const rtl_TextEncoding eSrcCJKCharSet = bVer67 ? GetCurrentCJKCharSet() :
 RTL_TEXTENCODING_MS_1252;
 
 // allocate unicode string data
-rtl_uString *pStr = rtl_uString_alloc(nStrLen);
+rtl_uString *pStr = rtl_uString_alloc(nValidStrLen);
 sal_Unicode* pBuffer = pStr-buffer;
 sal_Unicode* pWork = pBuffer;
 
@@ -2997,7 +2991,7 @@ bool SwWW8ImplReader::ReadPlainChars(WW8_CP rPos, sal_Int32 nEnd, sal_Int32 nCp
 hConverter = rtl_createTextToUnicodeConverter(eSrcCharSet);
 
 if (!bIsUnicode)
-p8Bits = new sal_Char[nStrLen];
+p8Bits = new sal_Char[nValidStrLen];
 
 // read the stream data
 sal_uInt8   nBCode = 0;
@@ -3009,7 +3003,7 @@ bool SwWW8ImplReader::ReadPlainChars(WW8_CP rPos, sal_Int32 nEnd, sal_Int32 nCp
 if (pItem != NULL)
 nCTLLang = dynamic_castconst SvxLanguageItem *(pItem)-GetLanguage();
 
-for( nL2 = 0; nL2  nStrLen; ++nL2, ++pWork )
+for( nL2 = 0; nL2  nValidStrLen; ++nL2, ++pWork )
 {
 if (bIsUnicode)
 *pStrm  nUCode; // unicode  -- read 2 bytes
@@ -3064,9 +3058,9 @@ bool SwWW8ImplReader::ReadPlainChars(WW8_CP rPos, sal_Int32 nEnd, sal_Int32 nCp
 xub_StrLen nEndUsed = nL2;
 
 if (!bIsUnicode)
-nEndUsed = Custom8BitToUnicode(hConverter, p8Bits, nL2, pBuffer, nStrLen);
+nEndUsed = Custom8BitToUnicode(hConverter, p8Bits, nL2, pBuffer, nValidStrLen);
 
-for( sal_Int32 nI = 0; nI  nStrLen; ++nI, ++pBuffer )
+for( sal_Int32 nI = 0; nI  nValidStrLen; ++nI, ++pBuffer )
 if (m_bRegardHindiDigits  bBidi  LangUsesHindiNumbers(nCTLLang))
 *pBuffer = TranslateToHindiNumbers(*pBuffer);
 
@@ -3085,7 +3079,7 @@ bool SwWW8ImplReader::ReadPlainChars(WW8_CP rPos, sal_Int32 nEnd, sal_Int32 nCp
 if (pStr)
 rtl_uString_release(pStr);
 delete [] p8Bits;
-return nL2 = nStrLen;
+return nL2 = nValidStrLen;
 }
 
 #define MSASCII SAL_MAX_INT16
@@ -3326,29 +3320,7 @@ void SwWW8ImplReader::simpleAddTextToParagraph(const OUString rAddString)
 if (!pNd)
 return;
 
-if ((pNd-GetTxt().getLength() + rAddString.getLength())  STRING_MAXLEN-1)
-{
-rDoc.InsertString(*pPaM, rAddString);
-}
-else
-{
-
-if (pNd-GetTxt().getLength()  STRING_MAXLEN -1)
-{
-OUString sTempStr = rAddString.copy( 0,
-STRING_MAXLEN - pNd-GetTxt().getLength() -1);
-rDoc.InsertString(*pPaM, sTempStr);
-sTempStr = rAddString.copy(sTempStr.getLength(),
-rAddString.getLength() - sTempStr.getLength());
-AppendTxtNode(*pPaM-GetPoint());
-rDoc.InsertString(*pPaM, sTempStr);
-}
-else
-{
-AppendTxtNode(*pPaM-GetPoint());
-rDoc.InsertString(*pPaM, rAddString);
-}
-}
+

Re: xub_StrLen and = 64k paragraphs

2013-11-28 Thread Matteo Casalin
On Thu, 28 Nov 2013 16:02:06 +
Caolán McNamara caol...@redhat.com wrote:

 So now UniString is gone we have the chance to make use of OUString for
 = 64k paragraphs (#i17171#) but we basically have to decide what to do
 with places that currently return STRING_LEN/STRING_MAXLEN in xub_Strlen
 (unsigned short). Do we want to return -1 or SAL_MAX_INT32 as the
 replacement. -1 is nice but there is so much code in existence that
 expects the error/boundary condition return value to be  any valid
 number that I reckon SAL_MAX_INT32 might be safer as the default.
 
 For a concrete place see OutputDevice::GetTextBreak where STRING_LEN is
 returned for the boundary case. See also grepping for STRING_LEN in sw
 and think mass search/replace STRING_LEN SAL_MAX_INT32 :-)
 
 For reference, I attach the trivial patches to remove the 16bit length
 limits in writer that stop anything larger that 0x entering it which
 are the final bits to be applied once everything else is done.
 
 C.

I'm (slowly) working on this in sw and, from what I 've seen up to now, 
sometimes STRING_LEN/STRING_MAXLEN are used just as an upper bound for some 
following code, while sometimes they report an error condition.
Even if I really would like this conversion process being completed quickly, I 
can't see a mass search/replace as a safe solution. Deciding the right value to 
be used on a case by case basis is also difficult since it involves looking at 
a lot of (sometimes obscure) code but it has two advantages:
* finally separating error from boundary conditions
* incremental work should be easier to debug/fix
If a mass replacement is chosen as the way to go, I would suggest to choose 
SAL_MAX_INT32.

Cheers
Matteo

PS: in some code also sal_uInt32 was used as an extentended xub_StrLen: I was 
planning to ask on the ML for hints about a class that does that and then 
interfaces with OUStrings, sadly I can't remember its name at this moment.

PPS: in order to further improve code clarity and maintainability, I was also 
thinking about introducing a

class OUString
{
[...]
static sal_Int32 maxLength() {return SAL_MAX_INT32;}
};

I can provide a patch, in case this is seen as a good idea.
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: xub_StrLen and = 64k paragraphs

2013-11-28 Thread Eike Rathke
Hi Caolán,

On Thursday, 2013-11-28 16:02:06 +, Caolán McNamara wrote:

 Do we want to return -1 or SAL_MAX_INT32 as the
 replacement. -1 is nice but there is so much code in existence that
 expects the error/boundary condition return value to be  any valid
 number that I reckon SAL_MAX_INT32 might be safer as the default.

I'd rather like to see a value (SAL_MAX_INT32-1) as replacement and name
it SAL_MAX_STRING or some such. Reason is there are/were places that
loop until = STRING_MAXLEN and if the string really was that long
looped forever because it wrapped around 0x ... I'd like to avoid
situations where such could happen again and SAL_MAX_STRING+1 actually
is a value within the bounds of the type.

  Eike

-- 
LibreOffice Calc developer. Number formatter stricken i18n transpositionizer.
GPG key ID: 0x65632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563 2D3A
Support the FSFE, care about Free Software! https://fsfe.org/support/?erack


pgpL3idvbs8w5.pgp
Description: PGP signature
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice