On Thu, Jul 16, 2009 at 8:26 PM, Nathan Beyer<ndbe...@apache.org> wrote:
> On Thu, Jul 16, 2009 at 8:18 PM, Charles Lee<littlee1...@gmail.com> wrote:
>> Hi Nathan,
>>
>> What I got is 936, the code page identifier. Is there an API for us to map
>> 936 to gb2312?
>
> Oh, the 'identifier' bit was missing - yeah, we'll need to translate
> that into a name of some sort. I'll poke around a bit and see what I
> can find.

We'll probably just have to put in a mapping ourselves based on the
documentation. We'd call GetACP [1] and map that to a known alias in
java.nio.charset that matches the definitions [2] of the identifiers.

[1] http://msdn.microsoft.com/en-us/library/dd318070%28VS.85%29.aspx
[2] http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx

>
>> If we put 936 in the file.encoding, can we successfully get the encoder and
>> decoder by charset?
>>
>> On Fri, Jul 17, 2009 at 9:05 AM, Nathan Beyer <ndbe...@apache.org> wrote:
>>
>>> On Thu, Jul 16, 2009 at 1:28 AM, Charles Lee<littlee1...@gmail.com> wrote:
>>> > Hi guys,
>>> >
>>> > I have added the locale function in the drlvm; the patch is attached.
>>> > Please try this new patch on Linux.
>>> >
>>> > The patch should work on Linux but fail on Windows, because Windows
>>> > returns a code page, not a charset, from setlocale.
>>>
>>> Code page and character set are the same thing. We shouldn't need to
>>> convert it as the Charset APIs will have to support the values anyway.
>>>
>>> What's the value you're getting? If it's 'Cp1252', then we're good, as
>>> that's just an alias for 'Windows-1252' (or vice-versa).
>>>
>>> -Nathan
>>>
>>> > I have tried for a long time to
>>> > get the charset name from the code page, for example:
>>> > CPINFOEX cpInfoEx;
>>> > BOOL iReturn = GetCPInfoEx(CP_ACP, 0, &cpInfoEx);
>>> > if (iReturn > 0) {
>>> >     printf("FULL NAME %s\n", cpInfoEx.CodePageName);
>>> > }
>>> > But I only get the full name without any format.
>>> >
>>> > There is a code page identifiers map in the msdn, detail here. I may
>>> > hard-code this map in the file. But the note on the msdn says:
>>> > "ANSI code pages can be different on different computers, or can be
>>> > changed for a single computer, leading to data corruption. For the most
>>> > consistent results, applications should use Unicode, such as UTF-8 or
>>> > UTF-16, instead of a specific code page."
>>> > I am afraid hard-coding will fail on some machines. (By the way, it seems
>>> > UTF-8 is suggested as the default again :-)
>>> >
>>> > There is also a class Encoding in the VC++, detail here. But we can not
>>> > use it here.
>>> >
>>> > So does anyone know something about locale on Windows?
>>> > Again, shall we use UTF-8 as our default?
>>> >
>>> > On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee <littlee1...@gmail.com> wrote:
>>> >>
>>> >> It seems we should add it in the drlvm.
>>> >>
>>> >> On Wed, Jul 15, 2009 at 1:58 PM, Regis <xu.re...@gmail.com> wrote:
>>> >>>
>>> >>> Nathan Beyer wrote:
>>> >>>>
>>> >>>> Is the IBM VME dealing with this correctly? Do we just need to fix
>>> >>>> DRLVM?
>>> >>>
>>> >>> Yes, I only tested on Linux; the IBM VME sets the property correctly.
>>> >>>
>>> >>>>
>>> >>>> On Wed, Jul 15, 2009 at 12:25 AM, Regis<xu.re...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Kevin Zhou wrote:
>>> >>>>>>
>>> >>>>>> Yea, from luniglob.c, CL attempts to read the "file.encoding"
>>> >>>>>> property from the VM but fails to get the correct encoding.
>>> >>>>>>
>>> >>>>>> Regis, do you know any other specific ways that CL can gain the
>>> >>>>>> right property?
>>> >>>>>
>>> >>>>> We can get it from the OS directly. Maybe just read env variables on Linux?
>>> >>>>>
>>> >>>>>> Wed, Jul 15, 2009 at 9:59 AM, Regis <xu.re...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>>> Charles Lee wrote:
>>> >>>>>>>
>>> >>>>>>>> Hi Nathan,
>>> >>>>>>>>
>>> >>>>>>>> If the file encoding derives from the OS, there should be some
>>> >>>>>>>> bugs in it, because on my Linux machine the locale is en_US.UTF-8.
>>> >>>>>>>> Our default codec is still ISO8859-1. Do you know where we can
>>> >>>>>>>> find such code?
>>> >>>>>>>>
>>> >>>>>>> Classlib expected the vm to do this and set the property, but it
>>> >>>>>>> didn't, so we have to do this by ourselves.
>>> >>>>>>>
>>> >>>>>>>> On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer <nbe...@gmail.com>
>>> >>>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Are we talking about Windows or Linux? The default file encoding
>>> >>>>>>>>> should derive from the OS. I believe that's defined by the specs.
>>> >>>>>>>>>
>>> >>>>>>>>> On Jul 14, 2009, at 5:51 AM, Charles Lee <littlee1...@gmail.com>
>>> >>>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, Jul 14, 2009 at 6:12 PM, Jimmy,Jing Lv
>>> >>>>>>>>> <firep...@gmail.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hi,
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Charles, I believe UTF-8 is the default encoding for RI, and it
>>> >>>>>>>>>>> sounds reasonable.
>>> >>>>>>>>>>> BTW, it may encounter some compatibility problems, maybe we need
>>> >>>>>>>>>>> to run more tests to verify?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 2009/7/14 Charles Lee <littlee1...@gmail.com>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Hi guys:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I am running some of the ant junit test cases and hitting
>>> >>>>>>>>>>>> some encoding problems. I think they may be caused by the
>>> >>>>>>>>>>>> different default encodings of RI and harmony. My locale is
>>> >>>>>>>>>>>> en_US.UTF-8; the RI default is UTF-8 but harmony is 8859-1.
>>> >>>>>>>>>>>> Then I came across
>>> >>>>>>>>>>>> HARMONY-3736 <https://issues.apache.org/jira/browse/HARMONY-3736>,
>>> >>>>>>>>>>>> and the two diffs attached to that issue. It seems we always
>>> >>>>>>>>>>>> get 8859-1, because: (correct me if wrong :-)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 1. we remove the set code in the vm.
>>> >>>>>>>>>>>> we will always get null if we call the vm method
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2. we set file.encoding in libglob.c; if we get null from the
>>> >>>>>>>>>>>> vm, we set 8859-1.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Sorry, it should be luniglob.c
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> 3. we can not set file.encoding at run time.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> ant uses UTF-8 to encode filenames which contain non-ascii
>>> >>>>>>>>>>>> characters.
>>> >>>>>>>>>>>> So why do we use iso8859-1 as our unchangeable default?
>>> >>>>>>>>>>>> From the wiki http://en.wikipedia.org/wiki/ISO8859-1, it says
>>> >>>>>>>>>>>> "In computing applications, encodings that provide full UCS
>>> >>>>>>>>>>>> support (such as UTF-8 <http://en.wikipedia.org/wiki/UTF-8> and
>>> >>>>>>>>>>>> UTF-16 <http://en.wikipedia.org/wiki/UTF-16>) are finding
>>> >>>>>>>>>>>> increasing favor over encodings based on ISO 8859-1." Should
>>> >>>>>>>>>>>> we simply change iso8859-1 to utf-8?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> --
>>> >>>>>>>>>>>> Yours sincerely,
>>> >>>>>>>>>>>> Charles Lee
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>> --
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Best Regards!
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Jimmy, Jing Lv
>>> >>>>>>>>>>> China Software Development Lab, IBM
>>> >>>>>>>>>>>
>>> >>>>>>>>>> --
>>> >>>>>>>>>> Yours sincerely,
>>> >>>>>>>>>> Charles Lee
>>> >>>>>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Best Regards,
>>> >>>>>>> Regis.
>>> >>>>>>>
>>> >>>>> --
>>> >>>>> Best Regards,
>>> >>>>> Regis.
>>> >>>>>
>>> >>> --
>>> >>> Best Regards,
>>> >>> Regis.
>>> >>
>>> >> --
>>> >> Yours sincerely,
>>> >> Charles Lee
>>> >>
>>> >
>>> > --
>>> > Yours sincerely,
>>> > Charles Lee
>>> >
>>
>> --
>> Yours sincerely,
>> Charles Lee
>>
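
To make the mapping idea at the top of this message a bit more concrete, here is a rough sketch, not the actual patch. The function names (codepage_to_charset, default_file_encoding) and the table are hypothetical. The handful of entries follow the names listed in the MSDN code page identifiers table [2]; whether java.nio.charset on a given VM accepts each name as an alias is an assumption that would need checking. The Linux branch uses nl_langinfo(CODESET), and the ISO8859-1 fallback only mirrors the default the thread says luniglob.c uses today.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>   /* GetACP */
#else
#include <locale.h>    /* setlocale */
#include <langinfo.h>  /* nl_langinfo, CODESET */
#endif

/* Hypothetical table: a few code page identifiers and the names MSDN
 * lists for them. A real table would cover many more entries. */
struct cp_entry {
    unsigned int id;
    const char *name;
};

static const struct cp_entry CP_TABLE[] = {
    { 932,   "shift_jis"    },
    { 936,   "gb2312"       },
    { 950,   "big5"         },
    { 1252,  "windows-1252" },
    { 28591, "iso-8859-1"   },
    { 65001, "utf-8"        },
};

/* Returns the charset name for a code page identifier,
 * or NULL if the identifier is not in the table. */
const char *codepage_to_charset(unsigned int cp)
{
    size_t i;
    for (i = 0; i < sizeof(CP_TABLE) / sizeof(CP_TABLE[0]); i++) {
        if (CP_TABLE[i].id == cp) {
            return CP_TABLE[i].name;
        }
    }
    return NULL;
}

/* Picks a value for file.encoding from the OS settings.
 * Assumption: falls back to ISO8859-1 when nothing better is known. */
const char *default_file_encoding(void)
{
    const char *name = NULL;
#ifdef _WIN32
    /* GetACP returns the current ANSI code page identifier, e.g. 936. */
    name = codepage_to_charset(GetACP());
#else
    /* Note: this switches the process LC_CTYPE locale to the one from
     * the environment, then asks for its codeset, e.g. "UTF-8". */
    if (setlocale(LC_CTYPE, "") != NULL) {
        name = nl_langinfo(CODESET);
    }
#endif
    return (name != NULL && name[0] != '\0') ? name : "ISO8859-1";
}
```

With something like this, the VM would set file.encoding to the result of default_file_encoding() at startup, so 936 from GetACP would end up as "gb2312" rather than a bare number. Returning NULL for unknown identifiers keeps the fallback decision in one place.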