On Mon, 11 Feb 2008, Saulius Zrelskis wrote:
> I always compile xHarbour with BCC -a4 alignment; sizeof(HB_ITEM) = 24
> and memory(HB_MEM_STACK) / memory(HB_MEM_STACKITEMS) = 24.
> But I have never noticed any anomaly in OLE work. Can you help me
> figure out how to verify this? Until now I thought it was enough for
> the OLE structures to be compiled with -a8, and that they are in no
> way related to [x]Harbour structures/internals.
> The OLE include files have their own suitable #pragma option
> directives which restore _all_ initial settings, so the different
> alignment seems to be handled by the compiler...
I also thought that it should work, but people reported that it GPFs when compiled with -a4 while it works with -a8. Maybe it's the result of some other problem and -a4 only exposes it in some cases. I'm not an MS-Windows user, so I cannot make such a test myself. I can run most Windows applications on my Linux box using WINE, but to make some tests with OLE I would need to install some Windows applications which are OLE servers, and even if I made the test I would never be sure whether a possible problem comes from the HBOLE implementation, from WINE, or from the 3rd-party application executed in a foreign environment. It's quite possible that you are right.

Below I'm attaching the message I sent to Ron yesterday. Today I looked more carefully at the HBOLE code and found that the memory leak is a false alarm, because there is a call to VariantClear() which should release the BSTR blocks. Anyhow, some other things definitely need serious cleanup. We need much cleaner functions to convert between HB_ITEMs and VARIANTs. If I find some spare time I'll read up on the OLE implementation and maybe create such code. But first I'll have to learn more about OLE. I do not like to write code when I do not know exactly what it should do and what the expected behavior is in unusual conditions. That can be the source of stupid mistakes, like the VariantClear() call I missed when I was writing the message to Ron.

best regards,
Przemek

----- Forwarded message from Przemyslaw Czerpak <[EMAIL PROTECTED]> -----

> I received a note about your comments for win32ole.prg. As you might
> know, I'm not conscious of such low-level details, except when
> intensely focusing on a specific issue. Can you please help me save
> time by explaining the alignment requirement? I always used BCC,
> MSVC, and xCC with default make files/alignment, and I'm not aware of
> GPF traps. Will greatly appreciate your help.
Because you were able to recompile the whole code from source, and any 3rd-party binaries you were using were compiled with the same alignment. But that does not mean the default alignment is optimal. F.e. BCC uses 8-byte alignment by default. This makes some internal structures much bigger than they should be. A very good example is the HB_ITEM structure. In xHarbour the size of HB_ITEM compiled by BCC with default switches is now 32. In Harbour it's 24, because some internal things are implemented in a different way.

But in Harbour, access to HB_ITEM members is blocked: PHB_ITEM is mapped to void* for non-core code, just like most other internal structures, so user code has to call the documented API functions and can be compiled with any alignment different from the HVM one. So many people use the -a4 BCC switch. It causes HB_ITEM to occupy 16 bytes in Harbour. If you compare this to the xHarbour default size (32), you will find that programs which need a lot of arrays or objects allocate about half the memory necessary for xHarbour. Operations on an item, like copy or move, have to update half as many bytes. In some cases, like AINS, ADEL, ASIZE, this can give a huge speed difference. Further, current CPUs use their cache more efficiently, because smaller data increases the chance that it can be accessed from the L1, L2, or L3 cache. That gives an additional improvement. If you make some tests: code which does not use expensive array operations is 2-3 percent faster if HVM is compiled with -a4 (and this improvement will be noticeable in all applications), but code with a lot of AINS, ADEL can be 30% or more faster. I can also easily create tests where it will be hardly noticeable.

And now the problem. The HBOLE code accesses HB_ITEM internals. This means it has to be compiled with the same alignment switches as HVM. But it also uses the OLE VARIANT structures which, in BCC, have to be compiled with -a8 or the application will GPF. Try to recompile xHarbour using BCC with -a4 and use some OLE code.
You will see what happens. It means that this code blocks compiling HVM with the optimal alignment switches. Maybe in this case it will be possible to create a workaround and add some #pragma pack before

   #include <ole2.h>
   #include <oleauto.h>

but at least in my version of BCC55 such pragmas already exist in the header files and it does not help. At least Enrico and other users reported that the HBOLE code GPFs when [x]Harbour is compiled with -a4. I cannot test it myself because OLE code cannot be executed on my Linux box, but these seem to be such simple tests that I believe them. Maybe the problem is inside other header files included by the above two files. If you can make some tests, move the #include for these files before windows.h or any other windows/bcc headers; then maybe it will work. As I said, I cannot test it myself.

Anyhow, this is only a workaround for the problem caused by direct access to HB_ITEM. Instead of passing pointers to HB_ITEM members directly when a variable is passed by reference, or when a string item is passed, we should allocate an array of unions of all possible values and pass pointers to the members of this union in hb_oleItemToVariant(). Then simply copy the results and free the allocated resources in FreeParams(). The modification is small, and it will be necessary to change the type of aPrgParams, but because I cannot test it I do not want to make it myself. The code will be cleaner and will not depend on the alignment switches used to compile HVM, so it will work with any version. Further, you will probably find some memory leaks and possible GPFs which are hard to see now, IMHO, due to the unclean access of HB_ITEM members. F.e. in FreeParams():

   case VT_BYREF | VT_BSTR:
      SysFreeString( *pVariant->n1.n2.n3.pbstrVal );
      sString = hb_oleWideToAnsi( *( pVariant->n1.n2.n3.pbstrVal ) );
      hb_itemPutCPtr( pItem, sString, strlen( sString ) );
      break;

First the memory is freed, then the freed pointer is passed to hb_oleWideToAnsi(). It may cause unpredictable results.
Like corrupted results (the free operation may damage part of the freed data), or a GPF: if it was a bigger memory area spanning a few system pages and SysFreeString() returned those pages to the OS, then when you try to access them in hb_oleWideToAnsi() you will get a GPF, because you are accessing memory which no longer belongs to the process. A few lines below is:

   case VT_BSTR:
      sString = hb_oleWideToAnsi( pVariant->n1.n2.n3.bstrVal );
      hb_itemPutCPtr( pItem, sString, strlen( sString ) );
      break;

which seems to be an unreported memory leak, because we do not know what happened to the memory allocated by hb_oleItemToVariant() in:

   case HB_IT_STRING:
   case HB_IT_MEMO:
      [...]
      }
      else
      {
         pVariant->n1.n2.vt = VT_BSTR;
         pVariant->n1.n2.n3.bstrVal = hb_oleAnsiToSysString( sString );
      }
      break;

I'm not an OLE specialist, but IMHO it should be freed by the caller process. If the receiver may change the variant value, then we no longer have the original pointer and we cannot free pVariant->n1.n2.n3.bstrVal. BTW, if the receiver can change pVariant->n1.n2.n3.pbstrVal, then even the SysFreeString() above does not seem to be safe. A separate array of unions for all possible types, where we can keep the addresses of allocated resources, would resolve all such problems, and it's yet another reason why we should have it.

And yet another thing. In the future we will have to add Unicode support to HVM. The API should be VM-independent, and 3rd-party code developers should not have to worry about the type of the passed string or make all conversions manually respecting the HVM CP (right now the whole OLE code ignores the HVM CP setting and uses only MultiByteToWideChar(), which operates on the default Windows CP, not the HVM one - yet another problem). This can be achieved very easily by adding a set of functions to access converted strings, f.e. hb_parc_utf8(), hb_parc_u16(), etc. (I'm attaching the message I sent to the XHGTK devel list about it). It means that all code which hacks string HB_ITEM internals is potentially dangerous for future extensions.
With the separate array of allocated resources I'm talking about, it will not be necessary to use hb_itemPutCRawStatic() or similar tricks. In fact, you are using hb_itemPutCRawStatic() only to make HB_ITEM a temporary pointer holder. Also, returning a value will be easier, because it will be possible to use hb_storc_u16()/hb_arraySetUC16() and similar new functions.

HTH.
Best regards,
Przemek

----- Forwarded message from Przemyslaw Czerpak <[EMAIL PROTECTED]> -----

From: Przemyslaw Czerpak <[EMAIL PROTECTED]>
Subject: Re: [xhgtk-developers] Code page conversion
To: Rodrigo Miguel <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 09:26:31 +0100
Lines: 142

On Wed, 05 Dec 2007, Rodrigo Miguel wrote:
> All,
> There is something that I'm already worried about.
> I'm using a postgres database with LATIN1 (ISO-8859-1) and my
> application code (like accents) is in windows/ISO-8859-1. But GTK
> uses UTF8, so how will it be handled by hb_parcutf8()?
> I'm just wondering, when I have some code in linux/windows and
> vice-versa, about the complete confusion that can arise.

Hi Rodrigo,

First I would like to clarify one thing. In another message you wrote:

> when I write code using f.e. Ubuntu Linux, there are no code changes
> or conversions made by xhgtk_locale*, but it's required when my code
> has windows encoding

This means that on your Linux box you use editors which store files using UTF8 encoding. And of course it's not necessary to make an additional translation inside the xhgtk layer, because the strings are already in UTF8. But HVM does not know anything about it. Functions like LEN(), SUBSTR(), STUFF(), STRPEEK(), STRPOKE(), UPPER(), LOWER(), ISALPHA(), etc. do not operate on letters but on allocated bytes. Comparison operators ( <, <=, >, >= ) do not respect national characters, indexes are not sorted well, and all other libraries would have to expect UTF8 strings instead of some CP encoding.
In summary, XHGTK can show the text with national characters you put in your editor without any problems, but other code is not ready to accept such strings. We do not have native support for Unicode strings in HVM yet, so at least for now we should not use UTF8 as the source code encoding. In the future I'll add compiler support for encoding conversion during compilation, but for now you should use some CP valid for your country. So if you have source code encoded in UTF-8, I suggest converting it to some CP, f.e. ISO-8859-1, using iconv. Then you should inform HVM about the encoding used by your source code with the HB_SETCODEPAGE() command. This command will also set the national characters and collation rules. F.e. if you are living in Spain you can use:

   REQUEST HB_CODEPAGE_ESWIN    // force linking the ESWIN CP
   HB_SETCODEPAGE( "ESWIN" )    // set the ESWIN CP in HVM

HB_SETCODEPAGE() does yet another important thing - it informs HVM about the Unicode values of the characters in strings: a Unicode table is bound to each CP. It means that HVM can now make automatic translations from normal CP strings to Unicode ones in different representations, f.e. to UTF8, and because internally strings stay in CP encoding, all other character-oriented functions and subsystems keep working well. At the .prg level you can translate strings to/from UTF8 using the HB_STRTOUTF8()/HB_UTF8TOSTR() functions, f.e.:

   proc main()
      local s1, s2
      s1 := "ĄĆĘŁŃÓŚŹŻ"
      REQUEST HB_CODEPAGE_PLISO
      HB_SETCODEPAGE( "PLISO" )
      ? LEN( s1 )
      s2 := HB_STRTOUTF8( s1 )
      ? LEN( s2 ), HB_UTF8LEN( s2 )
   return

I added a set of functions for conversions and operations on UTF8 strings to Harbour:

2007-06-23 11:10 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
  + added two .prg functions for translations from/to UTF-8:
       HB_STRTOUTF8( <cStr> [, <cCPID> ] ) -> <cUTF8Str>
       HB_UTF8TOSTR( <cUTF8Str> [, <cCPID> ] ) -> <cStr>
    <cCPID> is a Harbour codepage id, f.e.: "EN", "ES", "ESWIN",
    "PLISO", "PLMAZ", "PL852", "PLWIN", ...
When not given, the default HVM codepage (set by HB_SETCODEPAGE()) is used.

2007-07-18 21:30 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
  + added .prg functions: HB_UTF8SUBSTR(), HB_UTF8LEFT(), HB_UTF8RIGHT(),
    HB_UTF8LEN(), HB_UTF8PEEK()
    They work like the corresponding functions SUBSTR(), LEFT(), RIGHT(),
    LEN(), STRPEEK() but operate on UTF-8 strings.

2007-08-14 15:22 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
  + added HB_UTF8TRAN(), HB_UTF8STUFF(), HB_UTF8POKE()

and Phil added this code to xHarbour, so they work on both compilers. But they are for .prg code. I also plan to extend the C API with functions which will operate on Unicode strings, and here hb_parcutf8() and similar functions will appear.

Now XHGTK uses the xhgtk_locale_to_utf8() function to convert HVM strings to UTF8. This function internally uses g_locale_to_utf8() for the string encoding. It causes two bad side effects:

1. it needs exactly the same source code encoding as used in the LOCALE
   setting. It's impossible to take source code with a different encoding
   and set a valid HB_SETCODEPAGE(), because it does not know anything
   about HVM, its internal settings, and how strings should be translated.
2. when we have native support for Unicode strings in HVM, you will have
   to update the XHGTK code to use Unicode strings.

I would like to eliminate this problem, so that code you create now will work with the current HVM and the future one without any modifications, and all necessary translations will be done internally by the Harbour core functions.

How will it work? In a very simple way. I'll add a set of functions which operate on Unicode strings (using different encodings). F.e. for UTF-8 I'll add:

   char * hb_parc_utf8( int iParam );

It will return a string in UTF-8 encoding. If the passed item contains a string in CP encoding, it will be translated to UTF8 using the Unicode table set by the HB_SETCODEPAGE() function and then returned.
In the future, when we have native support for Unicode and the string item is already in UTF8, it will be returned without any translation. For you as a 3rd-party code programmer it's absolutely unimportant what operations are done inside hb_parc_utf8(): you ask for a string in UTF8 encoding and you receive it. I'll also add:

   ULONG hb_parclen_utf8( int iParam );
   void hb_retc_utf8( const char * szUtf8Value );
   void hb_storc_utf8( const char * szUtf8Value, int iParam, ... );

and item-oriented functions, f.e.:

   char * hb_itemGetCUTF8( PHB_ITEM pItem );
   PHB_ITEM hb_itemPutCUTF8( PHB_ITEM pItem, const char * szUtf8Value );
   PHB_ITEM hb_itemPutCLUTF8( PHB_ITEM pItem, const char * szUtf8Value,
                              ULONG ulBytesLen );

As long as we do not have native Unicode support, these functions will always make internal translations from/to character strings using the Unicode table bound to the CP set by the HB_SETCODEPAGE() function. In the future the translation will be done only if necessary, and the normal character functions like hb_parc() will also make the translation. F.e. when you pass a string item encoded in UTF8 and call hb_parc(), it will be automatically translated from UTF8 to a character string. But all such translations will be done inside the Harbour core code, and you will not have to worry about them and/or change your C source code.

best regards
Przemek

----- End forwarded message -----

----- End forwarded message -----

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour