[dev] Re: refactoring OUString

2011-06-10 Thread Stephan Bergmann
On Fri, Jun 10, 2011 at 6:51 AM, tora - Takamichi Akiyama <
t...@openoffice.org> wrote:

> Sorry, this mail is too long...


No problem, I'll briefly go through your five items one by one:

> 1. Delegation of the responsibility to choose a type of memory allocator
> To achieve both stability and performance at the same time, I would like to
> propose "Don't do all of it in the SAL", rather "Delegate certain
> responsibility to its users, i.e. programmers."
>
> Who knows the type of life time of data? SAL does? No. The programmers do.
>

Do they?  Theoretically, they do have the information available that is
necessary to choose the most efficient approach that is still correct
(depending on how early data can be destroyed again, whether it needs to be
multi-thread safe or async safe, etc.).

However, in practice, the necessary information is often so big that
programmers will make mistakes, choosing approaches that are incorrect.  If
these are not caught at compile time, and lead to program failure, I think
this is a problem (and history shows it to be a rather severe one).


> 2. Potential deadlock
> The crash reporter code has a potential deadlock problem.
> http://hg.services.openoffice.org/DEV300/file/tip/sal/osl/unx/backtrace.c
> Async-signal-unsafe functions such as fprintf() are called from the
> signal handler.
>
> Consider this situation:
> 1. A segmentation violation (SIGSEGV) occurs in malloc() or free() due to
> memory corruption while that function holds the global mutex lock.
> 2. The first call to fprintf() internally calls malloc() to obtain a
> memory area as a text buffer. Then a deadlock occurs.
>
> I will raise a question on that topic later.


Yes, OOo's signal handler code is horribly broken.  I do not know whether
its original authors were unaware of the gross violations of correct
programming that are taking place here, but general consensus appears to be
that the code happens to work (by accident, I would say) most of the time,
and sometimes just locks up even worse than the crash that caused the signal
handler to be called in the first place.

Anyway, this should be cleaned up one day (and is indeed a topic for a
thread of its own).
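
A minimal sketch of a handler that restricts itself to async-signal-safe calls, assuming a POSIX system. The names crash_handler, install_crash_handler and g_crash_fd are invented for illustration and are not part of the SAL. write() and sigaction() are standard POSIX calls; write() is on the POSIX list of async-signal-safe functions, whereas fprintf() and malloc() are not.

```cpp
#include <signal.h>
#include <cstring>
#include <unistd.h>

// Hypothetical destination for crash output; stderr by default.
static volatile sig_atomic_t g_crash_fd = STDERR_FILENO;

// The handler only calls write(), which POSIX lists as async-signal-safe.
// It uses a fixed message, so no buffer has to be allocated.
extern "C" void crash_handler(int) {
    static const char msg[] = "fatal signal caught\n";
    ssize_t ignored = write(g_crash_fd, msg, sizeof(msg) - 1);
    (void)ignored;
}

// Installs the handler for one signal; returns true on success.
bool install_crash_handler(int signo) {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}
```

A real crash reporter would also need a backtrace, which is harder to produce under the same restriction; the sketch only shows the reporting path that avoids the malloc() lock.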


> Please have a look at an additional code fragment in the destructor above:
>
>    if ( Application::IsMemoryCheckRequested() )
>        for ( iterator m_vector )  // Turn them to be a trap
>            alter_page_attribute( *it,
>                NO_READ_ACCESS|NO_WRITE_ACCESS|NO_EXEC );
>
> 1. soffice.bin is invoked with a new command line option such as
> "-memorycheck"
> 2. Application::IsMemoryCheckRequested() returns TRUE.
> 3. The memory pages being freed are turned into traps.
> 4. Problematic code mistakenly attempts to read or write data in the
> already-freed memory area.
> 5. The trap sets off the alarm and the OS delivers a signal.
> 6. A signal handler in the SAL catches the signal.
> 7. A crash report that reveals the exact location of the code is made.
>
> We have been cultivating thousands of test scenarios for more than a
> decade.
> Just leave the qatesttool running for a day and night with the option
> -memorycheck.
>
>
> 4. Utilizing the cutting-edge technology invented in the 21st century.
>
> solaris$ cat attempt-of-accessing-the-already-freed-memory-area.c
>
> #include <stdlib.h>
> int main()
> {
>     char *p = (char *) malloc(10);
>     free(p);
>     *p = 1;
>     return 0;
> }
>
> $ cc -g attempt-of-accessing-the-already-freed-memory-area.c
>
> $ LD_PRELOAD=watchmalloc.so.1 MALLOC_DEBUG=WATCH,RW ./a.out
> Trace/Breakpoint Trap (core dumped)
>
> $ dbx ./a.out core
> ...
> program terminated by signal TRAP (write access watchpoint trap)
> Current function is main
>7   *p = 1;
>
> Is it easy enough?
>

Both approaches above (Application::IsMemoryCheckRequested and watchmalloc)
are good for debugging buggy software, but I do not think they are very good
answers to the question: "When designing classes like OUString etc., how
should efficiency be balanced against safety and maintainability?"

I understand that you argue that efficiency should be a priority, and safety
can be guaranteed (more or less thoroughly) by testing the code with the
mechanisms outlined above.

I rather argue that the abstractions available to the programmers should be
as safe as possible (even if that costs some efficiency), as programmers
will invariably make mistakes, so the potential for mistakes should be
minimized.  Testing code is all well and important (very much so!), but the
tests cannot find all problems (let alone the fact that test coverage for
OOo is still rather small).

This is probably just another facet of the everlasting dispute between the
dynamic and static typing camps.  I confess I am sold on the benefits of
type theory.

> 5. 99.9% use cases could be the default.
>
[...]

> In the case above, i.e. in the typical, 99.9% code of OpenOffice.org, I
> don't think multithread awareness is required.
>
> Therefore, the current implem

[dev] Re: refactoring OUString

2011-06-09 Thread tora - Takamichi Akiyama

Sorry, this mail is too long...


On Thu, Jun 9, 2011 at 9:20 AM, tora - Takamichi Akiyama <t...@openoffice.org> wrote:
That is why I would like to encourage programmers to take care of the life 
time of data.


I know that that statement is controversial.

On 2011/06/09 18:02, Stephan Bergmann wrote:

First off, I am doubtful that encouraging manual memory management is a good 
idea.  Errors in manual memory management probably are the cause for the vast 
majority of severe failures in C/C++ programs.


Please note that I am not saying programmers should explicitly call 
memory-management functions such as malloc() or free().

Rather, I would like to suggest thinking about the characteristics of the 
data in question.

1. Delegation of the responsibility to choose a type of memory allocator
To achieve both stability and performance at the same time, I would like to propose "Don't do 
all of it in the SAL", rather "Delegate certain responsibility to its users, i.e. 
programmers."

Who knows the type of life time of data? SAL does? No. The programmers do.

Life time of data
 (1) data lasting until the soffice.bin quits.
 (2) data lasting until a document is closed.
 (3) data lasting until a current thread ends.
 (4) data lasting until a certain task finishes.
 (5) data lasting until a current function call returns.
 (6) data lasting until a current block ends.

Multithread awareness
 (a) data that is shared between more than one thread.
 (b) data that is used only in the current thread.

Asynchronous awareness
 (i) data that is used in an asynchronously called function such as a signal 
handler.
 (ii) data that is used in a normal function.


2. Potential deadlock
The crash reporter code has a potential deadlock problem.
http://hg.services.openoffice.org/DEV300/file/tip/sal/osl/unx/backtrace.c
Async-signal-unsafe functions such as fprintf() are called from the 
signal handler.

Consider this situation:
1. A segmentation violation (SIGSEGV) occurs in malloc() or free() due to memory 
corruption while that function holds the global mutex lock.
2. The first call to fprintf() internally calls malloc() to obtain a memory 
area as a text buffer. Then a deadlock occurs.

I will raise a question on that topic later.


> Hence, I would always try to abstract from actual memory as much as possible. 
 (Performance considerations are of course valid, but they must be balanced 
against safety and maintainability considerations.)

3. Come up with exciting measures
There is no need to keep relying on the traditional approaches invented in the 
20th century.

From my experience going back to 8-bit processors, I firmly believe that the 
programmers' awareness of how memory is treated is the crucial factor in achieving 
performance, safety, and maintainability at the same time.

I do not object to your idea of "abstraction," though.

=
// Slicing cheese and throwing them out at once
// memory_page_allocation(), memory_page_deallocation() and
// alter_page_attribute() are placeholder names, not real APIs.
#define ALLOCATION_SIZE ( 1024 * 1024 ) // 1MB
#define ALIGNMENT   4

void* SCATTOAO::xmalloc( size_t nSize )
{
    // Round the request up to a multiple of ALIGNMENT
    nSize = ( ( nSize - 1 ) / ALIGNMENT + 1 ) * ALIGNMENT;
    if ( m_nRest < nSize ) {
        // Not enough room left: obtain a new block, a multiple of 1MB
        size_t nAllocationSize = ( ( nSize - 1 ) / ALLOCATION_SIZE + 1 ) *
            ALLOCATION_SIZE;
        char* p = (char *) memory_page_allocation( nAllocationSize, PRIVATE|ANONYMOUS );
        m_vector.append( Entry( p, nAllocationSize ) );
        m_pNose = p;
        m_nRest = nAllocationSize;
    }
    char* ret = m_pNose;
    m_pNose += nSize;  // Slice a block of cheese
    m_nRest -= nSize;
    return (void *) ret;
}

void SCATTOAO::xfree( void* )
{
    // do nothing at all
}

SCATTOAO::~SCATTOAO()
{
    // "for ( iterator m_vector )" is shorthand for visiting every Entry as "it"
    if ( Application::IsMemoryCheckRequested() )
        for ( iterator m_vector )  // Turn them to be a trap
            alter_page_attribute( *it, NO_READ_ACCESS|NO_WRITE_ACCESS|NO_EXEC );
    else
        for ( iterator m_vector )  // Throw them at once
            memory_page_deallocation( it->m_pAddress, it->m_nSize );
}
=

Please have a look at an additional code fragment in the destructor above:

    if ( Application::IsMemoryCheckRequested() )
        for ( iterator m_vector )  // Turn them to be a trap
            alter_page_attribute( *it, NO_READ_ACCESS|NO_WRITE_ACCESS|NO_EXEC );

1. soffice.bin is invoked with a new command line option such as "-memorycheck"
2. Application::IsMemoryCheckRequested() returns TRUE.
3. The memory pages being freed are turned into traps.
4. Problematic code mistakenly attempts to read or write data in the 
already-freed memory area.
5. The trap sets off the alarm and the OS delivers a signal.
6. A signal handler in the SAL catches the signal.
7. A crash report that reveals the exact location of the code is made.
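
Steps 3 to 5 could be sketched on a POSIX system roughly as follows. This is only an illustration under assumptions: page_alloc, page_trap and access_faults are invented names standing in for the placeholder calls above, mmap() and mprotect() are the real POSIX calls used, and MAP_ANONYMOUS is widely available but not guaranteed everywhere.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Obtains whole pages directly from the OS, private and not file-backed.
void* page_alloc(std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Instead of unmapping, revoke all access so that any later touch faults.
bool page_trap(void* p, std::size_t size) {
    return mprotect(p, size, PROT_NONE) == 0;
}

// Demonstration helper: writes to the address in a forked child and
// reports whether the child was killed by a signal.
bool access_faults(volatile char* p) {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        *p = 1;      // faults if the page has been trapped
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status);
}
```

Trapped pages stay mapped, so address space is never returned while the check is on; that seems acceptable for a test run but not for normal operation.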

We have been cultivating thousands of test scenarios for more than a decade.
Just leave the qatesttool running for a day and night with the option 
-memorycheck.


4. Utilizing the cutting-edge technology invented in the 21st century.

solaris$ cat attempt-of

[dev] Re: refactoring OUString

2011-06-09 Thread Stephan Bergmann
On Thu, Jun 9, 2011 at 9:20 AM, tora - Takamichi Akiyama <
t...@openoffice.org> wrote:

> That is why I would like to encourage programmers to take care of the life
> time of data.
>

First off, I am doubtful that encouraging manual memory management is a good
idea.  Errors in manual memory management probably are the cause for the
vast majority of severe failures in C/C++ programs.  Hence, I would always
try to abstract from actual memory as much as possible.  (Performance
considerations are of course valid, but they must be balanced against safety
and maintainability considerations.)

What you describe with "Slicing cheese and throwing them out at once" can be
done, but I would not want to do it manually.  There are systems more clever
than C++, building on effect types and region-based memory management, that
exploit such optimizations.  But there, it is the language
implementation---and not the programmer writing a program in that
language---that carries out the proof that keeping data in a region of
memory that is discarded wholesale at a certain point in time is sound.

That said, it might work to map your various levels of data---from "data
lasting until the soffice.bin quits" to "data lasting until a current
function call returns"---to different C++ types with appropriate conversion
functions that potentially need to copy data, to statically ensure sound
memory access while on the one hand allowing to exploit optimized memory
management strategies and on the other hand still being safe if data does
escape from its anticipated level.  Would be a nice experiment.
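
One possible shape of such an experiment, as a minimal sketch only. The tag types, ScopedString, promote and keep_in_document are all hypothetical names, and std::string merely stands in for whatever level-specific allocator would back each type.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical lifetime levels, from shortest to longest.
struct CallScope     { static constexpr int rank = 0; };
struct TaskScope     { static constexpr int rank = 1; };
struct DocumentScope { static constexpr int rank = 2; };

// A string tagged at compile time with the lifetime level of its storage.
template <typename Scope>
class ScopedString {
public:
    explicit ScopedString(std::string s) : value_(std::move(s)) {}
    const std::string& str() const { return value_; }
private:
    std::string value_;
};

// The only way to move data to a longer-lived level: an explicit deep copy.
template <typename To, typename From>
ScopedString<To> promote(const ScopedString<From>& s) {
    static_assert(To::rank > From::rank,
                  "promote() must lengthen the lifetime");
    return ScopedString<To>(s.str());
}

// Accepts document-level strings only. Passing a TaskScope string does
// not compile, so data cannot silently escape its level.
std::size_t keep_in_document(const ScopedString<DocumentScope>& s) {
    return s.str().size();
}
```

With this shape, a formula result computed during a save would be a ScopedString<TaskScope>, and storing it in a cell would require a visible promote<DocumentScope>() call.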

-Stephan
-- 
-
To unsubscribe send email to dev-unsubscr...@openoffice.org
For additional commands send email to sy...@openoffice.org
with Subject: help


[dev] Re: refactoring OUString

2011-06-09 Thread tora - Takamichi Akiyama

On 2011/06/08 0:22, Niklas Nebel wrote:

Of course we should try to make more use of multiple threads. This isn't a new 
idea either, see 
http://wiki.services.openoffice.org/wiki/Calc/Performance/misc. Christian did 
some experiments with parallel loading a while ago 
(http://blogs.oracle.com/GullFOSS/entry/xml_performance_and_now_for). The 
results for Impress weren't spectacular, but Calc or Writer may be different.


Yep! I am a multithread, data-driven programming lover, too. :-)


On 07.06.2011 13:15, tora - Takamichi Akiyama wrote:

2) Slicing cheese and throwing them out at once
For the internal tasks such as "Save as" and "Export to" we might get
a big advantage. Such a task starts from the framework, calls thousands
of methods, and finally leaves the only single value meaning a SUCCESS
or FAILURE. No String instance involved during the task is needed to be
persistent.


On 2011/06/08 0:22, Niklas Nebel wrote:

Right now, that isn't entirely true. For example, saving might need to 
calculate a formula, and the calculated result is then kept in the cell, in a 
string that continues to be referenced after saving. There might be similar 
cases elsewhere. These would probably have to be moved into a separate step 
before saving. Sounds a bit fragile, but then it could actually save a 
significant amount of time.


That is why I would like to encourage programmers to take care of the life time 
of data.

For instance, in the user scenario below, there might be
 (1) data lasting until the soffice.bin quits.
 (2) data lasting until a document is closed.
 (3) data lasting until a current thread ends.
 (4) data lasting until a certain task finishes.
 (5) data lasting until a current function call returns.

 1. File - New - Spreadsheet
 2. work on it and save it.
 3. File - Close.

In step 1, construct an instance of a memory allocator for (2).
In step 2, use it to allocate memory chunks that last as long as the document 
is open.
In step 3, destroy the allocator to free all of the allocated memory at once.
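
The three steps map naturally onto object lifetime in C++. A minimal sketch, assuming hypothetical Arena and Document classes; this Arena makes one heap allocation per request and only illustrates the ownership, not a fast allocator.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hands out memory and releases all of it together in its destructor.
class Arena {
public:
    void* allocate(std::size_t n) {
        blocks_.push_back(std::unique_ptr<char[]>(new char[n]));
        return blocks_.back().get();
    }
    std::size_t block_count() const { return blocks_.size(); }
private:
    std::vector<std::unique_ptr<char[]>> blocks_;
};

// Hypothetical document that owns the allocator for lifetime level (2).
class Document {
public:
    // Copies a C string into storage that lives as long as the document.
    const char* intern(const char* text) {
        std::size_t n = std::strlen(text) + 1;
        char* p = static_cast<char*>(arena_.allocate(n));
        std::memcpy(p, text, n);
        return p;
    }
    std::size_t allocations() const { return arena_.block_count(); }
private:
    Arena arena_;   // step 1: constructed with the document; step 3: destroyed with it
};
```

Any pointer returned by intern() becomes invalid when the document is closed, which is exactly the kind of mistake the other posts in this thread worry about.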

Lessons we might have learned:
 We can implement and use purpose-oriented memory allocators alongside 
the general, expensive one: malloc() and free().

 Programmers may wisely choose which memory allocator is appropriate for 
the data in question.



On the other hand, now might be a perfect time to discuss "crazy ideas", 
without mundane details getting in the way.


Aha! Here is another "crazy idea" :-)

 https://bitbucket.org/tora/ooo-idea-zstring/src

  memory_allocator_for_zstring.cxx
shows an idea for a reusable, caching memory allocation mechanism for a new String class. 
The key concept here is not to actually "free" the memory being freed, but to 
cache it for later use.

  Reuse the most recently freed memory first so that the Translation Lookaside 
Buffer (TLB) achieves a higher hit ratio.

  In contrast, if the oldest freed memory is used first, overall system 
performance might suffer because the relevant entry is likely absent from the 
TLB and, moreover, the relevant memory page might have been swapped out to 
disk.
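
The most-recently-freed-first policy can be sketched as a simple stack of cached blocks. This is not the code from the repository above; BlockCache is a hypothetical name and the sketch assumes fixed-size blocks and single-threaded use.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Cache for fixed-size blocks: released blocks are kept and handed out
// again in last-in, first-out order.
class BlockCache {
public:
    explicit BlockCache(std::size_t block_size) : block_size_(block_size) {}
    BlockCache(const BlockCache&) = delete;
    BlockCache& operator=(const BlockCache&) = delete;

    // Only blocks sitting in the cache are freed here.
    ~BlockCache() {
        for (void* p : free_) std::free(p);
    }

    void* allocate() {
        if (!free_.empty()) {
            void* p = free_.back();   // most recently released block
            free_.pop_back();
            return p;
        }
        return std::malloc(block_size_);
    }

    // Does not call free(); the block is remembered for reuse.
    void release(void* p) {
        if (p) free_.push_back(p);
    }

    std::size_t cached() const { return free_.size(); }

private:
    std::size_t block_size_;
    std::vector<void*> free_;
};
```

Whether the TLB benefit is measurable would have to be shown by profiling; the sketch only fixes the reuse order.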

  vec.hxx
implements a c++ template for cheaply expandable vector.

  test_vec.cxx
demonstrates usage of vec.hxx

Best regards,
Tora


[dev] Re: refactoring OUString

2011-06-07 Thread Mathias Bauer

On 07.06.2011 17:22, Niklas Nebel wrote:

On 07.06.2011 13:15, tora - Takamichi Akiyama wrote:

As many already know, malloc() is too general and too expensive.
Moreover, free() is much more expensive than malloc().
e.g. a source code of malloc() in glibc:
http://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c


We use our own implementation, rtl_allocateMemory (see
sal/rtl/source/alloc*). But of course the point remains valid: Both
allocation and deallocation take time.


Even though current OpenOffice.org runs as a multi-thread process,
it runs as if it is a single thread. So, we could have several options
to implement its underlying memory allocation mechanism for the specific
purposes of OpenOffice.org.


If there was only a single thread, we could get rid of quite some
locking overhead. But in fact, with clipboard, UNO acceptor thread and
such stuff, we have just enough multithreading going on to cause the
overhead, without the benefit of actually doing work in parallel.


Properly using a read-only string class (at least in code that might be 
accessed in multiple threads) could also prevent locking overhead.


Regards,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Oracle: http://blogs.sun.com/GullFOSS
Please don't reply to "nospamfor...@gmx.de".
I use it for the OOo lists and only rarely read other mails sent to it.


[dev] Re: refactoring OUString

2011-06-07 Thread Niklas Nebel

On 07.06.2011 13:15, tora - Takamichi Akiyama wrote:

As many already know, malloc() is too general and too expensive.
Moreover, free() is much more expensive than malloc().
e.g. a source code of malloc() in glibc:
http://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c


We use our own implementation, rtl_allocateMemory (see 
sal/rtl/source/alloc*). But of course the point remains valid: Both 
allocation and deallocation take time.



Even though current OpenOffice.org runs as a multi-thread process,
it runs as if it is a single thread. So, we could have several options
to implement its underlying memory allocation mechanism for the specific
purposes of OpenOffice.org.


If there was only a single thread, we could get rid of quite some 
locking overhead. But in fact, with clipboard, UNO acceptor thread and 
such stuff, we have just enough multithreading going on to cause the 
overhead, without the benefit of actually doing work in parallel.


Of course we should try to make more use of multiple threads. This isn't 
a new idea either, see 
http://wiki.services.openoffice.org/wiki/Calc/Performance/misc. 
Christian did some experiments with parallel loading a while ago 
(http://blogs.oracle.com/GullFOSS/entry/xml_performance_and_now_for). 
The results for Impress weren't spectacular, but Calc or Writer may be 
different.



2) Slicing cheese and throwing them out at once
For the internal tasks such as "Save as" and "Export to" we might get
a big advantage. Such a task starts from the framework, calls thousands
of methods, and finally leaves the only single value meaning a SUCCESS
or FAILURE. No String instance involved during the task is needed to be
persistent.


Right now, that isn't entirely true. For example, saving might need to 
calculate a formula, and the calculated result is then kept in the cell, 
in a string that continues to be referenced after saving. There might be 
similar cases elsewhere. These would probably have to be moved into a 
separate step before saving. Sounds a bit fragile, but then it could 
actually save a significant amount of time.



I think the above are just a tip of potential, brilliant ideas.
Let's discuss later this kind of topic once the surrounding situation is
settled.


On the other hand, now might be a perfect time to discuss "crazy ideas", 
without mundane details getting in the way.


Niklas


[dev] Re: refactoring OUString

2011-06-07 Thread tora - Takamichi Akiyama

On 06.06.2011 19:43, tora - Takamichi Akiyama wrote:

And also, please cover the underlying memory allocation mechanism which
would be another key factor for the performance improvement.


On 2011/06/07 3:04, Niklas Nebel wrote:

There's an old suggestion to treat small strings differently, see 
http://wiki.services.openoffice.org/wiki/Uno/Binary/Analysis/String_Performance.


Thank you for the information!

In addition to it, I am wondering if these ideas might help a lot.

As many already know, malloc() is too general and too expensive.
Moreover, free() is much more expensive than malloc().
e.g. a source code of malloc() in glibc:
http://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c

Not only the large number of machine instructions but also the memory
overhead affects system-wide performance.
In practice, malloc(1) consumes 32 bytes on a CentOS 5.4 64-bit kernel:
http://tora-japan.com/wiki/Boundaries_of_memory_allocation_with_malloc%28%29


Even though current OpenOffice.org runs as a multi-threaded process,
it behaves as if it were single-threaded. So, we have several options
for implementing an underlying memory allocation mechanism tailored to
the specific purposes of OpenOffice.org.

1) Memory allocation mechanism used in a kernel
For temporary use, utilize a memory allocation mechanism similar to
the one normally used in a kernel. Use a single bit to hold the status
of each memory chunk, e.g. 0 means vacant and 1 means occupied. The size of
a memory chunk could be 128, 256, 512, 1024, ...
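
A minimal sketch of the one-bit-per-chunk idea for a single chunk size. BitmapPool is a hypothetical name; the linear scan for a vacant bit is the simplest possible search, not what a kernel would necessarily do.

```cpp
#include <cstddef>
#include <vector>

// Pool of equally sized chunks; one flag per chunk records vacant (false)
// or occupied (true).
class BitmapPool {
public:
    BitmapPool(std::size_t chunk_size, std::size_t chunk_count)
        : chunk_size_(chunk_size),
          storage_(chunk_size * chunk_count),
          used_(chunk_count, false) {}

    // Returns the first vacant chunk, or nullptr if the pool is full.
    void* allocate() {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return &storage_[i * chunk_size_];
            }
        }
        return nullptr;
    }

    // Freeing clears the chunk's flag; p must come from allocate().
    void release(void* p) {
        std::size_t offset =
            static_cast<std::size_t>(static_cast<char*>(p) - storage_.data());
        used_[offset / chunk_size_] = false;
    }

private:
    std::size_t chunk_size_;
    std::vector<char> storage_;
    std::vector<bool> used_;   // typically stored as one bit per element
};
```

Separate pools for 128, 256, 512 and 1024 bytes would cover the sizes listed above.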

2) Slicing cheese and throwing it all out at once
For internal tasks such as "Save as" and "Export to" we might gain
a big advantage. Such a task starts from the framework, calls thousands
of methods, and finally leaves only a single value meaning SUCCESS
or FAILURE. No String instance involved in the task needs to be
persistent.

// memory_page_allocation() and memory_page_deallocation() are
// placeholder names, not real APIs.
#define ALLOCATION_SIZE ( 1024 * 1024 ) // 1MB
#define ALIGNMENT   4

void* SCATTOAO::xmalloc( size_t nSize )
{
    // Round the request up to a multiple of ALIGNMENT
    nSize = ( ( nSize - 1 ) / ALIGNMENT + 1 ) * ALIGNMENT;
    if ( m_nRest < nSize ) {
        // Not enough room left: obtain a new block, a multiple of 1MB
        size_t nAllocationSize = ( ( nSize - 1 ) / ALLOCATION_SIZE + 1 ) *
            ALLOCATION_SIZE;
        char* p = (char *) memory_page_allocation( nAllocationSize, PRIVATE|ANONYMOUS );
        m_vector.append( Entry( p, nAllocationSize ) );
        m_pNose = p;
        m_nRest = nAllocationSize;
    }
    char* ret = m_pNose;
    m_pNose += nSize;  // Slice a block of cheese
    m_nRest -= nSize;
    return (void *) ret;
}

void SCATTOAO::xfree( void* )
{
    // do nothing at all
}

SCATTOAO::~SCATTOAO()
{
    // "for ( iterator m_vector )" is shorthand for visiting every Entry as "it"
    for ( iterator m_vector )  // Throw them at once
        memory_page_deallocation( it->m_pAddress, it->m_nSize );
}

An instance of the allocator class SCATTOAO is a thread-specific object,
used only by its own thread. Therefore, no mutex lock is required.
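
In C++11 the thread-specific arrangement could be sketched with thread_local storage. BumpArena and thread_arena are hypothetical names, and the sketch uses one fixed 1MB buffer per thread rather than growing block by block as SCATTOAO does.

```cpp
#include <cstddef>
#include <vector>

const std::size_t kAlignment = 8;

// Bump allocator over a single fixed buffer; nothing is freed individually.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buffer_(capacity), used_(0) {}

    // Rounds the request up to kAlignment and slices it off the buffer.
    // Returns nullptr when the buffer cannot satisfy the request.
    void* allocate(std::size_t n) {
        if (n > buffer_.size()) return nullptr;
        std::size_t rounded = (n + kAlignment - 1) / kAlignment * kAlignment;
        if (rounded > buffer_.size() - used_) return nullptr;
        void* p = buffer_.data() + used_;
        used_ += rounded;
        return p;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<char> buffer_;
    std::size_t used_;
};

// One arena per thread, created on first use in that thread. No other
// thread can reach it through this function, so no mutex is taken.
BumpArena& thread_arena() {
    thread_local BumpArena arena(1024 * 1024);
    return arena;
}
```

Memory obtained this way must not be handed to another thread that outlives the allocating one, since the arena is destroyed when its thread exits.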

I think the above is just the tip of the iceberg of potentially brilliant ideas.
Let's discuss this kind of topic later, once the surrounding situation has 
settled.

Best regards,
Tora


[dev] Re: refactoring OUString

2011-06-07 Thread Christian Lippka

Am 06.06.2011 17:27, schrieb Michael Stahl:

On 06.06.11 16:35, tora - Takamichi Akiyama wrote:

Has anyone tried refactoring OUString?

   - It converts ISO-8859-1 characters in the range 0x00-0x7f into UCS-2 even when 
that is not necessary.
   - It requires malloc(), realloc(), and free() or their equivalents.
   - It hampers debugging because of sal_Unicode buffer[1].
   - It mixes different purposes: passing/returning parameters and 
long-lasting data.
   - and more...


hi Tora,

refactoring OUString has to be done carefully because it is a central part
of the URE API/ABI and those must be compatible.

I would put that "must be" up for discussion.


a number of people here have come to the conclusion that it would be an
improvement to use ::rtl::OString with UTF8 encoding as the standard
string type, but unfortunately this would be an enormous effort to change,
and it would mean breaking the backward compatibility of the C++ UNO
binding, so it was never likely to actually happen.

Still a desirable project.

Regards,
Christian



[dev] Re: refactoring OUString

2011-06-06 Thread Niklas Nebel

On 06.06.2011 19:43, tora - Takamichi Akiyama wrote:

And also, please cover the underlying memory allocation mechanism which
would be another key factor for the performance improvement.


There's an old suggestion to treat small strings differently, see 
http://wiki.services.openoffice.org/wiki/Uno/Binary/Analysis/String_Performance.


Niklas


[dev] Re: refactoring OUString

2011-06-06 Thread tora - Takamichi Akiyama

On 06.06.2011 18:57, tora - Takamichi Akiyama wrote:

This is just an idea. How about adding a new class besides OUString?


On 2011/06/07 2:16, Mathias Bauer wrote:

We already have enough string classes. :-)


Yes, we have! :-)


Besides that, you are right, rtl::OUString is stupid. We planned to discuss its 
replacement in the context of a future OOo 4.0 release, allowing for some 
incompatibility here. If done properly, the changes would require only 
recompilation of in-process C++ code.


Sounds nice!

And also, please cover the underlying memory allocation mechanism which would 
be another key factor for the performance improvement.

One more thing. This might be controversial, but IMHO it would be better if the 
programmer took care of the lifetime of a string instance. Is it for 
temporary or persistent use? I would suggest that the new String class 
offer certain ways to manage its lifetime.


But as you know, we are faced with a completely new situation for the OOo 
future. So we should postpone discussing this topic until the dust has settled.


Yep, we should postpone this exciting topic!

Thank you for your time, Michael, Mathias.

Best regards,
Tora


[dev] Re: refactoring OUString

2011-06-06 Thread Mathias Bauer

On 06.06.2011 18:57, tora - Takamichi Akiyama wrote:

On 2011/06/07 0:27, Michael Stahl wrote:
 > refactoring OUString has to be done carefully because it is a central
part
 > of the URE API/ABI and those must be compatible.
 >
 > a number of people here have come to the conclusion that it would be an
 > improvement to use ::rtl::OString with UTF8 encoding as the standard
 > string type, but unfortunately this would be an enormous effort to
change,
 > and it would mean breaking the backward compatibility of the C++ UNO
 > binding, so it was never likely to actually happen.
 >
 > so far we haven't even got rid of the tools strings... sigh.

I see.

This is just an idea. How about adding a new class besides OUString?

We already have enough string classes. :-)

Besides that, you are right, rtl::OUString is stupid. We planned to 
discuss its replacement in the context of a future OOo 4.0 release, 
allowing for some incompatibility here. If done properly, the changes 
would require only recompilation of in-process C++ code.


But as you know, we are faced with a completely new situation for the 
OOo future. So we should postpone discussing this topic until the dust 
has settled.


Regards,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Oracle: http://blogs.sun.com/GullFOSS
Please don't reply to "nospamfor...@gmx.de".
I use it for the OOo lists and only rarely read other mails sent to it.


[dev] Re: refactoring OUString

2011-06-06 Thread tora - Takamichi Akiyama

On 2011/06/07 0:27, Michael Stahl wrote:
> refactoring OUString has to be done carefully because it is a central part
> of the URE API/ABI and those must be compatible.
>
> a number of people here have come to the conclusion that it would be an
> improvement to use ::rtl::OString with UTF8 encoding as the standard
> string type, but unfortunately this would be an enormous effort to change,
> and it would mean breaking the backward compatibility of the C++ UNO
> binding, so it was never likely to actually happen.
>
> so far we haven't even got rid of the tools strings... sigh.

I see.

This is just an idea. How about adding a new class besides OUString?

class ZString
{
    sal_Char*           buffer;
    sal_Int32           length;
    sal_uInt16          type;
    rtl_TextEncoding    encoding;
    oslInterlockedCount refCount;
};

 - Gradually shift to the new ZString, where applicable, in place of OUString 
and OString.
 - "type" might be an ID number denoting "const char*" "char *" "const sal_Unicode 
*", ...
 - "encoding" is an encoding id defined in "rtl/textenc.h"
 - refCount, assignment, copy constructor, ... would be done in the same manner.
 - No encoding conversion will be done until the conversion is really demanded.
 - Use arrays as a memory pool for the fixed-sized structure ZString.
 - ...

e.g
1. String literal that is treated as it is

 ZString a( "xyz" );
   buffer directly points to "xyz"
   neither memory allocation nor data copying is involved until encoding conversion 
is demanded.
   length is left uninitialized in this case, but will be measured and cached 
upon being requested.
   type denotes "const char*, zero terminated"
   encoding might be ASCII_US or UTF-8; which might depend on the OS and 
compiler.

 (debugger) print a.buffer  ... prints "xyz"
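
The "measured and cached upon being requested" behaviour of length could look roughly like this. LiteralView is a hypothetical stand-in for the literal case of ZString only; the mutable cache is not thread-safe as written.

```cpp
#include <cstddef>
#include <cstring>

// Non-owning view of a zero-terminated literal. Construction neither
// allocates nor copies; the length is computed on first request.
class LiteralView {
public:
    explicit LiteralView(const char* literal)
        : buffer_(literal), length_(0), length_known_(false) {}

    const char* data() const { return buffer_; }

    // Measures the string once, then returns the cached value.
    std::size_t length() const {
        if (!length_known_) {
            length_ = std::strlen(buffer_);
            length_known_ = true;
        }
        return length_;
    }

    bool length_cached() const { return length_known_; }

private:
    const char* buffer_;
    mutable std::size_t length_;
    mutable bool length_known_;
};
```

Because buffer_ points straight at the literal, a debugger prints it as ordinary text, which matches the debugging benefit described above.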

2. Receiving a result string from a callee in a storage allocated by alloca() 
instead of malloc()

 ZString temp( 100, RTL_ALLOCA );
 func( temp );

 void func( ZString& x )
 {
     x = "abc";
 }

 In the destructor of temp above,
   if the reference count is 1, nothing special is done and the memory 
allocated on the stack is freed automatically upon returning to the 
caller's frame.
   if the reference count is more than 1, memory allocation and a data copy 
are required.

Best,
Tora


On 06.06.11 16:35, tora - Takamichi Akiyama wrote:

Has anyone tried refactoring OUString?

   - It converts ISO-8859-1 characters in the range 0x00-0x7f into UCS-2 even when 
that is not necessary.
   - It requires malloc(), realloc(), and free() or their equivalents.
   - It hampers debugging because of sal_Unicode buffer[1].
   - It mixes different purposes: passing/returning parameters and 
long-lasting data.
   - and more...



[dev] Re: refactoring OUString

2011-06-06 Thread Michael Stahl
On 06.06.11 16:35, tora - Takamichi Akiyama wrote:
> Has anyone tried refactoring OUString?
> 
>   - It converts ISO-8859-1 characters in the range 0x00-0x7f into UCS-2 even when 
> that is not necessary.
>   - It requires malloc(), realloc(), and free() or their equivalents.
>   - It hampers debugging because of sal_Unicode buffer[1].
>   - It mixes different purposes: passing/returning parameters and 
> long-lasting data.
>   - and more...

hi Tora,

refactoring OUString has to be done carefully because it is a central part
of the URE API/ABI and those must be compatible.

a number of people here have come to the conclusion that it would be an
improvement to use ::rtl::OString with UTF8 encoding as the standard
string type, but unfortunately this would be an enormous effort to change,
and it would mean breaking the backward compatibility of the C++ UNO
binding, so it was never likely to actually happen.

so far we haven't even got rid of the tools strings... sigh.

regards,
 michael

-- 
"One of [the Middle Ages'] characteristics was that 'reasoning by analogy'
 was rampant; another characteristic was almost total intellectual stag-
 nation, and we now see why the two go together. [...] by developing a
 keen ear for unwarranted analogies, one can detect a lot of medieval
 thinking today." -- Edsger W. Dijkstra
