Hi all,

Following Tony Thompson's advice, I got JTidy. Because JTidy doesn't support the Simplified Chinese character encodings, I use a command line like this:
java -jar Tidy.jar -raw -asxml -m mine.html

Although this seems to work for everybody, I still ran into some trouble. The &nbsp; entity is parsed as '?' (the hex code is #A030). After spending a lot of time testing and thinking, I finally decided to substitute " " for &nbsp; with jakarta-oro first, and then convert the HTML to well-formed XML with JTidy. That's great! It works successfully.

As a side note, can JTidy or tidy directly convert HTML in a Chinese character encoding to well-formed XML? :)

Thanks a lot.
Surfbird
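
P.S. For anyone hitting the same problem, here is a minimal sketch of the substitution step. It is not my exact code: I used jakarta-oro, while this sketch swaps in the plain String.replace from the JDK, and the class name NbspPreprocessor and method name replaceNbsp are only illustrative. It assumes the HTML has already been read into a String with the correct encoding, and it only handles the literal &nbsp; entity.

```java
// Illustrative sketch only: the class and method names are hypothetical.
// Uses String.replace from the JDK instead of jakarta-oro.
class NbspPreprocessor {

    // Replace every literal "&nbsp;" entity with an ordinary space
    // so the entity never reaches the HTML-to-XML conversion step.
    static String replaceNbsp(String html) {
        return html.replace("&nbsp;", " ");
    }

    public static void main(String[] args) {
        String input = "<p>Hello&nbsp;world</p>";
        // prints: <p>Hello world</p>
        System.out.println(replaceNbsp(input));
    }
}
```

The cleaned text can then be written back to the file and passed to JTidy with the same command line as above.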