On Thu, Jul 16, 2009 at 8:35 PM, Nathan Beyer<ndbe...@apache.org> wrote:
> On Thu, Jul 16, 2009 at 8:26 PM, Nathan Beyer<ndbe...@apache.org> wrote:
>> On Thu, Jul 16, 2009 at 8:18 PM, Charles Lee<littlee1...@gmail.com> wrote:
>>> Hi Nathan,
>>>
>>> What I got is 936, the code page identifier. Is there an API for us to map
>>> 936 to gb2312?
>>
>> Oh, the 'identifier' bit was missing - yeah, we'll need to translate
>> that into a name of some sort. I'll poke around a bit and see what I
>> can find.
>
> We'll probably just have to put in a mapping ourselves based on the
> documentation. We'd call GetACP [1] and map that to a known alias in
> java.nio.charset that matches the definitions [2] of the identifiers.
>
> [1] http://msdn.microsoft.com/en-us/library/dd318070%28VS.85%29.aspx
> [2] http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
This may be better - APR has a function for getting the OS default
encoding. This would work across all platforms that APR supports, and I
believe we already use APR.

http://apr.apache.org/docs/apr/1.3/group__apr__portabile.html#g6e21845a4a5f3b7dd107b2beea50c91e

-Nathan

>
>>
>>> If we put 936 in the file.encoding, can we successfully get the encoder and
>>> decoder by charset?
>>>
>>> On Fri, Jul 17, 2009 at 9:05 AM, Nathan Beyer <ndbe...@apache.org> wrote:
>>>
>>>> On Thu, Jul 16, 2009 at 1:28 AM, Charles Lee<littlee1...@gmail.com> wrote:
>>>> > Hi guys,
>>>> >
>>>> > I have added the locale function in the drlvm, the patch is attached. Please
>>>> > try this new patch on the linux.
>>>> >
>>>> > The patch should work on the linux but fail on the windows. Because windows
>>>> > returns code page not charset from the setlocale.
>>>>
>>>> Code page and character set are the same thing. We shouldn't need to
>>>> convert it as the Charset APIs will have to support the values anyway.
>>>>
>>>> What's the value you're getting? If it's 'Cp1252', then we're good, as
>>>> that's just an alias for 'Windows-1252' (or vice-versa).
>>>>
>>>> -Nathan
>>>>
>>>> > I have tried for a long time to
>>>> > get the charset name from the codepage, for example:
>>>> > CPINFOEX cpInfoEx;
>>>> > BOOL iReturn = GetCPInfoEx(CP_ACP, 0, &cpInfoEx);
>>>> > if (iReturn) {
>>>> >     printf("FULL NAME %s\n", cpInfoEx.CodePageName);
>>>> > }
>>>> > But I only get the full name without any format.
>>>> >
>>>> > There is code page identifiers map in the msdn, detail here. I may hard code
>>>> > this map in the file. But the note on the msdn says:
>>>> > "ANSI code pages can be different on different computers, or can be
>>>> > changed for a single computer, leading to data corruption. For the most
>>>> > consistent results, applications should use Unicode, such as UTF-8 or
>>>> > UTF-16, instead of a specific code page."
>>>> > I am afraid hard-coding will fail on some machines.
>>>> > (By the way, this seems the UTF-8 is suggested to be the default again :-)
>>>> >
>>>> > There is also a class Encoding in the VC++, detail here. But we can not use
>>>> > it here.
>>>> >
>>>> > So anyone knows some thing about locale on the windows?
>>>> > Again, shall use UTF-8 as our default?
>>>> >
>>>> > On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee <littlee1...@gmail.com> wrote:
>>>> >>
>>>> >> That seems we should add it in the drlvm.
>>>> >>
>>>> >> On Wed, Jul 15, 2009 at 1:58 PM, Regis <xu.re...@gmail.com> wrote:
>>>> >>>
>>>> >>> Nathan Beyer wrote:
>>>> >>>>
>>>> >>>> Is the IBM VME dealing with this correctly? Do we just need to fix
>>>> >>>> DRLVM?
>>>> >>>
>>>> >>> Yes, I only tested on Linux, IBM VME set the property correctly.
>>>> >>>
>>>> >>>> On Wed, Jul 15, 2009 at 12:25 AM, Regis<xu.re...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> Kevin Zhou wrote:
>>>> >>>>>>
>>>> >>>>>> Yea, from luniglob.c, CL attempts to read the "file.encoding" property adown
>>>> >>>>>> VM but fails to get the correct encoding.
>>>> >>>>>>
>>>> >>>>>> Regis, do you know any other specific ways that CL can gain the right
>>>> >>>>>> property?
>>>> >>>>>
>>>> >>>>> We can get from OS directly. Maybe just read env variables on Linux?
>>>> >>>>>
>>>> >>>>>> Wed, Jul 15, 2009 at 9:59 AM, Regis <xu.re...@gmail.com> wrote:
>>>> >>>>>>
>>>> >>>>>>> Charles Lee wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Hi Nanthan,
>>>> >>>>>>>>
>>>> >>>>>>>> If the file encoding derive from the OS, it should be the some bugs in it
>>>> >>>>>>>> because on my LINUX machine the locale is en_US.UTF-8. Our default codec is
>>>> >>>>>>>> still ISO8859-1. Do you know where can we found such codes?
>>>> >>>>>>>>
>>>> >>>>>>> Classlib expected vm do this and set the property, but it didn't, so we
>>>> >>>>>>> have to do this by ourselves.
>>>> >>>>>>>
>>>> >>>>>>>> On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer <nbe...@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Are we talking about windows or linux? the default file encoding should
>>>> >>>>>>>>> derive from the OS. I believe that's defined by the specs.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Sent from my iPhone
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Jul 14, 2009, at 5:51 AM, Charles Lee <littlee1...@gmail.com> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>> On Tue, Jul 14, 2009 at 6:12 PM, Jimmy,Jing Lv <firep...@gmail.com> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> Hi,
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Charles, I believe UTF-8 is the default encoding for RI, and it sounds
>>>> >>>>>>>>>>> reasonable.
>>>> >>>>>>>>>>> BTW, it may encounter some compatibility problem, maybe we need to run
>>>> >>>>>>>>>>> more tests to verify?
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> 2009/7/14 Charles Lee <littlee1...@gmail.com>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Hi guys:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> I am doing some test cases on the ant junit test case and meeting some
>>>> >>>>>>>>>>>> encoding problems. I find they are maybe caused by the different default
>>>> >>>>>>>>>>>> encoding from RI and harmony. My local is en_US.UTF-8, RI default is
>>>> >>>>>>>>>>>> UTF-8 but harmony is 8859-1. And then I have encountered
>>>> >>>>>>>>>>>> HARMONY-3736 <https://issues.apache.org/jira/browse/HARMONY-3736>,
>>>> >>>>>>>>>>>> and the two diffs attached on that issue. It seems we always get
>>>> >>>>>>>>>>>> 8859-1.
>>>> >>>>>>>>>>>> Because: (correct me if wrong :-)
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> 1.
>>>> >>>>>>>>>>>>    we remove the set code in the vm. we will always get null if we call
>>>> >>>>>>>>>>>>    vm method
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> 2. we set the file.encode in the libglob.c, if we got null from vm, we
>>>> >>>>>>>>>>>>    set 8859-1.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Sorry, it should be luniglob.c
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>>> 3. we can not set file.encode on the run time.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> ant use UTF-8 to encode filename which contains the non-ascii
>>>> >>>>>>>>>>>> character.
>>>> >>>>>>>>>>>> So why we use iso8859-1 as our unchangeable default?
>>>> >>>>>>>>>>>> From the wiki http://en.wikipedia.org/wiki/ISO8859-1, it says "In
>>>> >>>>>>>>>>>> computing applications, encodings that provide full UCS support (such as
>>>> >>>>>>>>>>>> UTF-8 <http://en.wikipedia.org/wiki/UTF-8> and
>>>> >>>>>>>>>>>> UTF-16 <http://en.wikipedia.org/wiki/UTF-16>) are finding increasing
>>>> >>>>>>>>>>>> favor over encodings based on ISO 8859-1." Should we simply change
>>>> >>>>>>>>>>>> iso8859-1 to utf-8?
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> --
>>>> >>>>>>>>>>>> Yours sincerely,
>>>> >>>>>>>>>>>> Charles Lee
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>> --
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Best Regards!
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Jimmy, Jing Lv
>>>> >>>>>>>>>>> China Software Development Lab, IBM
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>> --
>>>> >>>>>>>>>> Yours sincerely,
>>>> >>>>>>>>>> Charles Lee
>>>> >>>>>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> Best Regards,
>>>> >>>>>>> Regis.
>>>> >>>>>>>
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>> Best Regards,
>>>> >>>>> Regis.
>>>> >>>>>
>>>> >>>>
>>>> >>>
>>>> >>> --
>>>> >>> Best Regards,
>>>> >>> Regis.
>>>> >>
>>>> >> --
>>>> >> Yours sincerely,
>>>> >> Charles Lee
>>>> >>
>>>> >
>>>> > --
>>>> > Yours sincerely,
>>>> > Charles Lee
>>>> >
>>>>
>>>
>>> --
>>> Yours sincerely,
>>> Charles Lee
>>>
>>
>
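
For reference, the mapping idea discussed in this thread (take the numeric code page identifier, such as the 936 reported on Windows, and translate it into a name that java.nio.charset understands) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the attached patch: the class name CodePageMapper, the method charsetNameFor, and the four table entries are hypothetical choices for the example, and a real table would have to follow the full MSDN code page identifier list. Whether a given name such as GB2312 is actually available depends on the charsets installed in the running VM, so the sketch falls back to a caller-supplied default.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps a Windows code page identifier to a charset name.
class CodePageMapper {
    // Illustrative subset only; names follow the MSDN identifier table.
    private static final Map<Integer, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put(936, "GB2312");
        KNOWN.put(1252, "windows-1252");
        KNOWN.put(28591, "ISO-8859-1");
        KNOWN.put(65001, "UTF-8");
    }

    // Returns the canonical charset name for the code page, or the fallback
    // if the code page is unknown or the charset is not supported by this VM.
    public static String charsetNameFor(int codePage, String fallback) {
        String name = KNOWN.get(codePage);
        if (name == null) {
            // Try the conventional "CpNNNN" alias before giving up.
            name = "Cp" + codePage;
        }
        try {
            if (Charset.isSupported(name)) {
                return Charset.forName(name).name();
            }
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: use the fallback below.
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The result for 936 depends on which charsets this VM provides.
        System.out.println(charsetNameFor(936, "ISO-8859-1"));
        System.out.println(charsetNameFor(65001, "ISO-8859-1"));
    }
}
```

The fallback argument keeps the current behaviour (ISO-8859-1) available when nothing better can be determined, so a missing table entry degrades to what happens today rather than failing.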