On Mon, 11 Feb 2008, Saulius Zrelskis wrote:
> I always compile xHarbour with BCC -a4 alignment; sizeof(HB_ITEM) = 24
> and memory(HB_MEM_STACK) / memory(HB_MEM_STACKITEMS) = 24.
> But I have never noticed any anomaly in OLE work. Can you help me verify
> this? Until now I thought it was enough for the OLE structures to be
> compiled with -a8 and that they are in no way related to the [x]Harbour
> structures/internals.
> The OLE include files have their own #pragma option directives which
> restore _all_ initial settings, so the different alignment seems to be
> taken care of by the compiler...

I also thought it should work, but people reported problems: the code
GPFs when compiled with -a4 and works with -a8. Maybe it is the result
of some other problem which -a4 only exposes in some cases. I'm not an
MS-Windows user, so I cannot make such a test myself. I can run most
Windows applications in my Linux box using WINE, but to make any tests
with OLE I would have to install some Windows applications which are
OLE servers, and even then I would never be sure whether a possible
problem comes from the HBOLE implementation, from WINE, or from a
3rd-party application executed in a foreign environment. It's quite
possible that you are right.

Below I'm attaching the message I sent to Ron yesterday. Today I've
looked more carefully at the HBOLE code and found that the memory leak
is a false alarm, because there is a call to VariantClear() which
should release the BSTR blocks. Anyhow, some other things certainly
need serious cleanup. We need much cleaner functions to convert between
HB_ITEMs and VARIANTs. If I find some spare time I'll read up on the
OLE implementation and maybe write such code, but first I have to learn
more about OLE. I don't like writing code when I don't know exactly
what it should do and what the expected behavior is in unusual
conditions; that is a source of silly mistakes, like the VariantClear()
call I missed when I was writing the message to Ron.

best regards,
Przemek


----- Forwarded message from Przemyslaw Czerpak <[EMAIL PROTECTED]> -----
> I received a note about your comments on win32ole.prg. As you might
> know, I'm not conscious of such low-level details except when
> intensely focusing on a specific issue. Can you please help me save
> time by explaining the alignment requirement? I have always used BCC,
> MSVC, and xCC with the default make files/alignment, and I'm not aware
> of GPF traps. I will greatly appreciate your help.

Because you were able to recompile the whole code from source, and any
3rd-party binaries you were using were compiled with the same
alignment. But that does not mean the default alignment is optimal.
F.e. BCC uses 8-byte alignment by default, which makes some internal
structures much bigger than they should be. A very good example is the
HB_ITEM structure. In xHarbour the size of HB_ITEM compiled by BCC with
default switches is now 32 bytes. In Harbour it's 24, because some
internal things were implemented in a different way. But in Harbour,
access to HB_ITEM members is blocked: PHB_ITEM is mapped to void* for
non-core code, just like most of the other internal structures, so user
code has to use the documented API functions and can be compiled with
an alignment different from the one used for the HVM. That is why many
people use the -a4 BCC switch, with which HB_ITEM occupies 16 bytes in
Harbour. If you compare that to the xHarbour default size (32), then
programs which need a lot of arrays or objects allocate about half the
memory necessary in xHarbour, and operations on items like copy or move
have to touch half as many bytes. In some cases, like AINS, ADEL and
ASIZE, this can give a huge speed difference. Furthermore, current CPUs
use their caches more efficiently, because smaller data increases the
chance that it can be accessed from the L1, L2 or L3 cache, which gives
an additional improvement. If you run some tests, code which does not
use expensive array operations is 2-3 percent faster when the HVM is
compiled with -a4 (and this improvement is noticeable in all
applications), while code with a lot of AINS/ADEL can be 30% or more
faster. Of course I could also easily create tests where the difference
is hardly noticeable.
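To picture what the switch changes, here is a minimal standalone
example (the members are only illustrative, it is not the real HB_ITEM
layout, and the exact sizes depend on compiler and platform):

    /* BCC -a4 corresponds to pack(4), the BCC default -a8 to pack(8) */
    #include <stdio.h>

    #pragma pack(push, 4)
    typedef struct
    {
       int    type;      /* item type flags */
       double value;     /* 8-byte numeric payload */
    } ITEM_PACK4;        /* members at offsets 0 and 4: usually 12 bytes */
    #pragma pack(pop)

    #pragma pack(push, 8)
    typedef struct
    {
       int    type;
       double value;
    } ITEM_PACK8;        /* members at offsets 0 and 8: usually 16 bytes */
    #pragma pack(pop)

    int main( void )
    {
       /* two modules compiled with different packing disagree about
        * these offsets and sizes; mixing them is exactly the GPF
        * scenario described below for HBOLE and the HVM */
       printf( "pack(4): %u bytes\n", ( unsigned ) sizeof( ITEM_PACK4 ) );
       printf( "pack(8): %u bytes\n", ( unsigned ) sizeof( ITEM_PACK8 ) );
       return 0;
    }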
And now the problem: the HBOLE code accesses HB_ITEM internals, which
means it has to be compiled with the same alignment switches as the
HVM. But it also uses the OLE VARIANT structures, which in BCC have to
be compiled with -a8 or the application will GPF. Try to recompile
xHarbour using BCC with -a4 and run some OLE code and you will see what
happens. In effect this code blocks compiling the HVM with the optimal
alignment switches. Maybe in this case it will be possible to create a
workaround and add some #pragma pack before
   #include <ole2.h>
   #include <oleauto.h>
but at least in my version of the BCC55 header files such pragmas
already exist in them and it does not help. At least Enrico and other
users reported that the HBOLE code GPFs when [x]Harbour is compiled
with -a4. I cannot test it myself because the OLE code cannot be
executed in my Linux box, but these are such simple tests that I
believe them. Maybe the problem is inside other header files included
by the above two. If you can make some tests, move the #include of
these files before windows.h or any other windows/BCC headers and maybe
it will work; a rough sketch of what I mean is below. As I said, I
cannot test it myself.
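Something along these lines is what I mean (an untested sketch; whether
ole2.h and oleauto.h can really be pulled in this early under BCC has
to be checked):

    /* untested sketch of the workaround: force 8-byte packing around
     * the OLE headers and include them before any other windows/BCC
     * header, while the rest of the module keeps the alignment used
     * for the HVM (f.e. -a4) */
    #pragma pack(push, 8)
    #include <ole2.h>
    #include <oleauto.h>
    #pragma pack(pop)

    /* Harbour headers and the HBOLE code itself follow with the normal
     * project alignment */
    #include "hbapi.h"
    #include "hbapiitm.h"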
Anyhow, this would only be a workaround for a problem caused by direct
access to HB_ITEM internals. Instead of passing pointers to HB_ITEM
members directly when a variable is passed by reference or when a
string item is passed, we should allocate an array of unions of all
possible values and pass pointers to the members of this union in
hb_oleItemToVariant(), then simply copy the results back and free the
allocated resources in FreeParams(). The modification is small (it will
be necessary to change the type of aPrgParams), but because I cannot
test it I do not want to make it myself. The code will be cleaner and
will not depend on the alignment switches used to compile the HVM, so
it will work with any version. Furthermore, you will probably find some
memory leaks and possible GPFs which IMHO are hard to see now due to
the unclean access to HB_ITEM members. F.e. in FreeParams():
        case VT_BYREF | VT_BSTR:
          SysFreeString( *pVariant->n1.n2.n3.pbstrVal );
          sString = hb_oleWideToAnsi( *( pVariant->n1.n2.n3.pbstrVal ) );
          hb_itemPutCPtr( pItem, sString, strlen( sString ) );
          break;
Here the memory is freed first and then the freed pointer is passed to
hb_oleWideToAnsi(). That can cause unpredictable results: corrupted
data (the free operation may damage part of the freed block), or a GPF
if it was a bigger memory area allocated from a few system pages and
SysFreeString() returned those pages to the OS. If you then try to
access them in hb_oleWideToAnsi(), you will get a GPF because you are
accessing memory which no longer belongs to the process. The conversion
simply has to be done before the string is freed. A few lines below
there is:
         case VT_BSTR:
           sString = hb_oleWideToAnsi( pVariant->n1.n2.n3.bstrVal );
           hb_itemPutCPtr( pItem, sString, strlen( sString ) );
           break;
which looks like an unreported memory leak, because we do not know what
happens to the memory allocated by hb_oleItemToVariant() in:
        case HB_IT_STRING:
        case HB_IT_MEMO:
          [...]
          }
          else
          {
             pVariant->n1.n2.vt   = VT_BSTR;
             pVariant->n1.n2.n3.bstrVal = hb_oleAnsiToSysString( sString );
          }
          break;
I'm not an OLE specialist, but IMHO it should be freed by the calling
process. If the receiver may change the variant value then we no longer
have the original pointer and we cannot free pVariant->n1.n2.n3.bstrVal.
BTW, if the receiver can change pVariant->n1.n2.n3.pbstrVal then even
the SysFreeString() above does not seem safe. A separate array of
unions for all possible types, where we can keep the addresses of the
allocated resources, would resolve all such problems, and that is yet
another reason why we should have it; a rough sketch follows.
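Roughly I mean something like this (HB_OLE_PARAM and both helper names
are invented for illustration only and I cannot test it; note that the
second helper also makes the conversion before SysFreeString(), which
fixes the ordering problem shown above):

    /* rough sketch only: HB_OLE_PARAM and both helpers are hypothetical
     * names, not existing HBOLE code */
    #include <ole2.h>
    #include <oleauto.h>
    #include <string.h>
    #include "hbapi.h"
    #include "hbapiitm.h"

    /* assumed prototypes of the conversion helpers used in the current
     * HBOLE code */
    extern BSTR   hb_oleAnsiToSysString( const char * szAnsi );
    extern char * hb_oleWideToAnsi( BSTR bstr );

    typedef struct
    {
       VARTYPE vt;           /* which union member is in use */
       union
       {
          BSTR   bstrVal;    /* allocated BSTR which we have to free */
          double dblVal;
          long   lVal;
       } value;
    } HB_OLE_PARAM;

    /* instead of pointing the VARIANT at HB_ITEM internals we point it
     * at our own slot, so the HVM alignment and item layout never
     * matter */
    static void hb_oleParamByRefString( VARIANT * pVariant,
                                        HB_OLE_PARAM * pSlot,
                                        PHB_ITEM pItem )
    {
       pSlot->vt = VT_BSTR;
       pSlot->value.bstrVal = hb_oleAnsiToSysString( hb_itemGetCPtr( pItem ) );
       V_VT( pVariant ) = VT_BYREF | VT_BSTR;
       V_BSTRREF( pVariant ) = &pSlot->value.bstrVal;
    }

    /* after the OLE call: copy the (possibly changed) value back to the
     * item first and only then release what the slot still owns */
    static void hb_oleParamFree( HB_OLE_PARAM * pSlot, PHB_ITEM pItem )
    {
       if( pSlot->vt == VT_BSTR && pSlot->value.bstrVal )
       {
          char * sString = hb_oleWideToAnsi( pSlot->value.bstrVal );

          hb_itemPutCPtr( pItem, sString, strlen( sString ) );
          SysFreeString( pSlot->value.bstrVal );
          pSlot->value.bstrVal = NULL;
       }
    }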
And yet another thing. In the future we will have to add Unicode
support to the HVM. The API should be VM independent, and 3rd-party
code developers should not have to worry about the type of the passed
string or make all conversions manually with respect to the HVM
codepage (right now the whole OLE code ignores the HVM CP setting and
only uses MultiByteToWideChar(), which operates on the default Windows
CP, not the HVM one, and that is yet another problem; see the
illustration below). This can be reached very easily by adding a set of
functions for accessing converted strings, f.e. hb_parc_utf8(),
hb_parc_u16(), etc. (I'm attaching the message I sent to the XHGTK
devel list about it). It means that all code which hacks string HB_ITEM
internals is potentially dangerous for future extensions. With the
separate array of allocated resources I'm talking about, it will not be
necessary to use hb_itemPutCRawStatic() or similar tricks; in fact you
are using hb_itemPutCRawStatic() only to make the HB_ITEM a temporary
pointer holder. Returning values will also be easier, because it will
be possible to use hb_storc_u16()/hb_arraySetUC16() and similar new
functions.
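To show what I mean by ignoring the HVM CP, here is a simplified sketch
(not the literal HBOLE code, and the helper name is invented):

    /* an ANSI -> BSTR conversion built directly on MultiByteToWideChar()
     * uses CP_ACP, the default Windows ANSI codepage, so whatever was
     * selected with HB_SETCODEPAGE() in the HVM is ignored */
    #include <windows.h>
    #include <oleauto.h>

    static BSTR s_AnsiToBSTR( const char * szAnsi )
    {
       int  nLen = MultiByteToWideChar( CP_ACP, 0, szAnsi, -1, NULL, 0 );
       BSTR bstr = SysAllocStringLen( NULL, nLen - 1 );

       if( bstr )
          MultiByteToWideChar( CP_ACP, 0, szAnsi, -1, bstr, nLen );
       return bstr;
    }

    /* with the proposed API such helpers disappear: the 3-rd party code
     * would receive a string already converted by the HVM core, f.e.
     * from hb_parc_u16() or hb_parc_utf8() */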

HTH.

Best regards,
Przemek



----- Forwarded message from Przemyslaw Czerpak <[EMAIL PROTECTED]> -----
From: Przemyslaw Czerpak <[EMAIL PROTECTED]>
Subject: Re: [xhgtk-developers] Code page conversion
To: Rodrigo Miguel <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 09:26:31 +0100
Lines: 142

On Wed, 05 Dec 2007, Rodrigo Miguel wrote:
> All,
> There is something that I'm already worried about.
> I'm using a postgres database in LATIN1 (ISO-8859-1) and my application
> code (accents and the like) is in windows/ISO-8859-1. But GTK uses UTF8,
> so how will it be handled by hb_parcutf8()?
> I'm just wondering, when I have some code in linux/windows and
> vice-versa, about the complete confusion that can arise.

Hi Rodrigo,

First I would like to clarify one thing. In another message you wrote:

> when I write code using f.e. Ubuntu Linux, there are no code changes
> or conversions made by xhgtk_locale*, but they are required when my
> code has Windows encoding

This means that in your Linux box you use editors which store files
using UTF-8 encoding. And of course it's not necessary to make an
additional translation inside the xhgtk layer, because the strings are
already in UTF-8. But the HVM does not know anything about it.
Functions like LEN(), SUBSTR(), STUFF(), STRPEEK(), STRPOKE(), UPPER(),
LOWER(), ISALPHA(), etc. do not operate on letters but on allocated
bytes. The comparison operators ( <, <=, >, >= ) do not respect
national characters, indexes are not sorted correctly, and all other
libraries would have to expect UTF-8 strings instead of some CP
encoding. In summary, XHGTK can show the text with national characters
you put in your editor without any problems, but other code is not
ready to accept such strings. We do not have native support for Unicode
strings in the HVM yet, so at least for now we should not use UTF-8 as
the source code encoding. In the future I'll add support to the
compiler for encoding conversion during compilation, but for now you
should use some CP valid for your country. So if you have source code
encoded in UTF-8 then I suggest converting it to some CP, f.e.
ISO-8859-1, using iconv.
Then you should inform the HVM about the encoding used by your source
code with the HB_SETCODEPAGE() function, which also sets the national
characters and collation rules. F.e. if you are living in Spain you can
use:
    REQUEST HB_CODEPAGE_ESWIN   // force linking ESWIN CP.
    HB_SETCODEPAGE( "ESWIN" )   // set ESWIN CP in HVM 

HB_SETCODEPAGE() does yet another important thing: it binds a Unicode
table to the selected CP, so the Unicode values of the characters in
strings are known.
It means that the HVM can now make automatic translations from normal
CP strings to Unicode ones in different representations, f.e. to UTF-8,
and because internally the strings stay in CP encoding, all other
character-oriented functions and subsystems keep working well.
At the .prg level you can translate strings to/from UTF-8 using the
HB_STRTOUTF8()/HB_UTF8TOSTR() functions, f.e.:

    proc main()
       local s1, s2
       s1 := "ĄĆĘŁŃÓŚŹŻ"
       REQUEST HB_CODEPAGE_PLISO
       HB_SETCODEPAGE( "PLISO" )
       ? LEN( s1 )
       s2 := HB_STRTOUTF8( s1 )
       ? LEN( s2 ), HB_UTF8LEN( s2 )
    return

I added a set of functions for conversions and operations on UTF-8
strings to Harbour:

   2007-06-23 11:10 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
       + added two prg functions for translations from/to UTF-8:
            HB_STRTOUTF8( <cStr> [, <cCPID> ] ) -> <cUTF8Str>
            HB_UTF8TOSTR( <cUTF8Str> [, <cCPID> ] ) -> <cStr>
         <cCPID> is Harbour codepage id, f.e.: "EN", "ES", "ESWIN",
         "PLISO", "PLMAZ", "PL852", "PLWIN", ...
         When not given then default HVM codepage (set by HB_SETCODEPAGE())
         is used.
   2007-07-18 21:30 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
       + added .prg functions: HB_UTF8SUBSTR(), HB_UTF8LEFT(), HB_UTF8RIGHT(),
                               HB_UTF8LEN(), HB_UTF8PEEK()
         They are working like corresponding functions: SUBSTR(), LEFT(),
         RIGHT(), LEN(), STRPEEK() but operates on UTF-8 strings.
   2007-08-14 15:22 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
       + added HB_UTF8TRAN(), HB_UTF8STUFF(), HB_UTF8POKE()

and Phil added this code to xHarbour, so they work with both compilers.

But these are for .prg code. I also plan to extend the C API with
functions which operate on Unicode strings, and this is where
hb_parc_utf8() and similar functions will appear.
Now XHGTK uses the xhgtk_locale_to_utf8() function to convert HVM
strings to UTF-8. Internally it uses g_locale_to_utf8() for the string
encoding. That causes two bad side effects:
1. It needs exactly the same source code encoding as the one used in
   the LOCALE setting. It's impossible to take source code with a
   different encoding and set a valid HB_SETCODEPAGE(), because
   g_locale_to_utf8() does not know anything about the HVM, its
   internal settings, or how the strings should be translated.
2. When we have native support for Unicode strings in the HVM, you will
   have to update the XHGTK code to use Unicode strings. I would like
   to eliminate this problem, so that code you create now works with
   the current HVM and the future one without any modifications, and
   all necessary translations are done internally by the Harbour core
   functions.

How will it work?
In a very simple way. I'll add a set of functions which operate on
Unicode strings (in different encodings). F.e. for UTF-8 I'll add:
    char * hb_parc_utf8( int iParam );
It will return the string in UTF-8 encoding. If the passed item
contains a string in CP encoding then it will be translated to UTF-8
using the Unicode table set by the HB_SETCODEPAGE() function and then
returned. In the future, when we have native Unicode support and the
string item is already in UTF-8, it will be returned without any
translation. For you as a 3rd-party code programmer it is completely
unimportant what operations are done inside hb_parc_utf8(): you ask for
a string in UTF-8 encoding and you receive it.
I'll also add:
    ULONG hb_parclen_utf8( int iParam );
    void hb_retc_utf8( const char *szUtf8Value );
    void hb_storc_utf8( const char *szUtf8Value, int iParam, ... );

and item-oriented functions, f.e.:
    char * hb_itemGetCUTF8( PHB_ITEM pItem );
    PHB_ITEM hb_itemPutCUTF8( PHB_ITEM pItem, const char * szUtf8Value );
    PHB_ITEM hb_itemPutCLUTF8( PHB_ITEM pItem, const char * szUtf8Value,
                               ULONG ulBytesLen );

As long as we do not have native Unicode support, these functions will
always make the translations from/to character strings internally,
using the Unicode table bound to the CP set by the HB_SETCODEPAGE()
function. In the future the translation will be done only when
necessary, and the normal character functions like hb_parc() will make
translations too: f.e. when you pass a string item encoded in UTF-8 and
call hb_parc(), it will be automatically translated from UTF-8 to a
character string. But all such translations will be done inside the
Harbour core code, and you will not have to worry about them or change
your C source code.
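For illustration only, a 3rd-party function could then look like this
(GTKLABEL_SETTEXT() is an invented example and hb_parc_utf8() is the
proposed function, it does not exist yet):

    #include <gtk/gtk.h>
    #include "hbapi.h"

    HB_FUNC( GTKLABEL_SETTEXT )
    {
       GtkLabel   * label  = ( GtkLabel * ) hb_parptr( 1 );
       const char * szText = hb_parc_utf8( 2 );   /* always UTF-8 here */

       /* GTK expects UTF-8; today the core would translate from the HVM
        * codepage, in the future a native Unicode item would be passed
        * through without any conversion, and the caller does not care */
       if( label && szText )
          gtk_label_set_text( label, szText );
    }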

best regards
Przemek
----- End forwarded message -----
----- End forwarded message -----
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
