On Sat, 18 Nov 2000, ha shao wrote:

> On Fri, Nov 17, 2000 at 09:24:05PM +0400, [EMAIL PROTECTED] wrote:
> > 
> >  wv was really terribly broken for word6 format files. Here is a patch that
> > fixes this.
> > 
> >  To CJK guys:
> > 
> > * Now word6.doc from _Belcon_ imports properly too (and the word2k document
> >   from Chih-Wei Huang is also OK)
> > 
> > * word6.doc from Chih-Wei Huang's mail, which I've forwarded here, doesn't
> >   import properly (chars are not converted to Unicode) because wv thinks it's
> >   in Word7 format (!): wvQuerySupported(&ps->fib,NULL) returns WORD7. So
> >   there seems to be no clean workaround/hack for importing it (maybe WordPad
> >   is that broken; is Word able to read this file? And which version of
> >   Windows is the WordPad from: Win2k, NT, or Win9x?).
> >   IMO the only hack that can be used is to check whether the incoming
> >   character's code is less than, greater than, or within the range of some
> >   constant(s) for the given charset, and if it doesn't satisfy those
> >   constraints, set its character type to 1 to force conversion to Unicode.
> 
> I only see Word6 and Word8/Word97 format documentation at
> http://www.wotsit.org/search.asp?s=text
> So it may well be that WORD7 doesn't use Unicode either. It looks
> like Word6 has more features in common with Word95 than
> Word8 has with Word95.

 Yes, I have the same impression now.
 
> I see only one place in the Word97 documentation that mentions Unicode
> in connection with Word95; it states:
> =====
> XCHAR (eXtended CHARacter set):
> 
> A data type which defines a "character". Each XCHAR corresponds to a character in
> the document, where "character" is defined as a glyph, regardless of whether it
> is a single-byte or double-byte character. With Word6/FE, Word95/FE, Word97/all
> and future versions of Word, this is defined as a 16-bit integer corresponding
> to the Unicode character code of the glyph.
> ======
> where /FE means Far East.
> If setting chartype to 1 for the Word7 format also gives correct results for
> .doc import in other languages (Russian?), we can assume Word7
> behaves like Word6 in this respect. What do you think?

 I should say I didn't have problems importing Russian Word95 documents (I
believe; I don't remember exactly whether I tested Word95 docs). At least I
didn't have problems with docs generated by WordPad (chartype was set
correctly). So I conclude that Word6/7/WordPad fails to set chartype properly
only for CJK docs. So we can change "<= WORD6" to "<= WORD7" in the hack
you've added and check the results.
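A rough C sketch of what the widened check could look like, assuming the hack gates on the format version and, following the range-check idea quoted above, forces chartype to 1 only for codes outside an allowed range. The enum names, charset_range, ASCII_ONLY, code_in_range() and fixup_chartype() are all hypothetical and are not wv's actual API; the real constants returned by wvQuerySupported() may be named and numbered differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the format codes wvQuerySupported() reports;
 * the real names and numeric values in wv's headers may differ. */
typedef enum {
    FMT_WORD2 = 2,
    FMT_WORD5 = 5,
    FMT_WORD6 = 6,
    FMT_WORD7 = 7,
    FMT_WORD8 = 8
} doc_format;

/* Hypothetical per-charset constraint: the inclusive range of codes that
 * can be taken at face value for the document's charset. */
typedef struct {
    unsigned lo;
    unsigned hi;
} charset_range;

/* Example range (assumption): only 7-bit codes are accepted as-is; for
 * double-byte CJK charsets, bytes >= 0x80 typically start a multibyte char. */
static const charset_range ASCII_ONLY = { 0x00, 0x7F };

/* True if code lies inside the range accepted for the charset. */
static bool code_in_range(unsigned code, charset_range r)
{
    assert(r.lo <= r.hi); /* a malformed range is a programming error */
    return code >= r.lo && code <= r.hi;
}

/* Returns the chartype to use for one character. A result of 1 forces
 * conversion to Unicode; otherwise the chartype read from the file is kept.
 * The version gate is the proposed "<= WORD7" (previously "<= WORD6"). */
static int fixup_chartype(doc_format fmt, unsigned code, charset_range r,
                          int chartype_from_file)
{
    if (fmt <= FMT_WORD7 && !code_in_range(code, r))
        return 1;
    return chartype_from_file;
}
```

With this shape, a Word97 (FMT_WORD8) file is never touched, and for Word6/Word7 files only out-of-range codes are forced to Unicode, which is one way to keep Russian Word95 imports (where chartype was already correct) unaffected.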

> -- 
> Best regards
> ha_shao
> 

 Best regards,
  -Vlad



