Hi all,

Following Tony Thompson's advice, I got JTidy. Because JTidy doesn't 
support the Simplified Chinese character encodings, I use a command line 
like this:

java -jar Tidy.jar -raw -asxml -m mine.html

Although this seems to work for everybody else, I still have a problem. The 
&nbsp; entity is output as '?' (the hex code is #A030).

I spent a lot of time testing and thinking. In the end, I decided to 
substitute " " for &nbsp; with jakarta-oro, and then convert the HTML to 
well-formed XML with JTidy.
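
The substitution step could look roughly like the sketch below. It is not the exact code: it uses java.util.regex rather than jakarta-oro, the class name NbspReplacer and its methods are hypothetical, and it assumes the HTML is in an ASCII-compatible encoding. The file is read as ISO-8859-1 only so that every byte passes through unchanged, in the same spirit as the -raw option above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper: replaces the &nbsp entity with a plain space
// before the file is handed to JTidy.
public class NbspReplacer {

    // Matches "&nbsp" with or without the trailing semicolon.
    private static final Pattern NBSP = Pattern.compile("&nbsp;?");

    // Returns the input with every &nbsp entity replaced by one space.
    public static String replaceNbsp(String html) {
        return NBSP.matcher(html).replaceAll(" ");
    }

    // Rewrites a file byte-for-byte except for the entity.
    // ISO-8859-1 maps each byte to exactly one char, so multi-byte
    // Chinese characters survive the round trip untouched.
    public static void replaceInFile(Path in, Path out) throws IOException {
        String html = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
        Files.write(out, replaceNbsp(html).getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // prints "a b c"
        System.out.println(replaceNbsp("a&nbsp;b&nbspc"));
    }
}
```

After this step, the cleaned file can be passed to the JTidy command line shown earlier.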

That's great! It's working successfully.

But, as a side note, can JTidy or tidy directly convert HTML in a Chinese 
character encoding to well-formed XML? :)

Thanks a lot.
Surfbird

