Thanks, I will read the docs carefully.

On Wednesday, August 31, 2016 at 9:15:47 PM UTC+8, Paul Tremberth wrote:
>
> Hello,
>
> you can fetch this URL with scrapy shell and check the response encoding:
>
> $ scrapy shell "http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=&q3=&q4="
> 2016-08-31 15:05:58 [scrapy] INFO: Scrapy 1.1.2 started (bot: scrapybot)
> (...)
> 2016-08-31 15:05:59 [scrapy] DEBUG: Crawled (200) <GET http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=&q3=&q4=> (referer: None)
> (...)
> In [1]: response.encoding
> Out[1]: 'gb18030'
>
> Note: in the examples below, I'm using Python 3.5.
>
> You can also verify the encoding of the URL using parse_qs [1] (in Python 3
> you can pass the encoding):
>
> In [2]: from urllib.parse import parse_qs
>
> In [3]: parse_qs('q1=%C0%EF&q2=&q3=&q4=', encoding='gb18030')
> Out[3]: {'q1': ['里']}
>
> When you build Request objects from URLs containing Chinese characters,
> you need to either pass safe URL strings encoded with gb18030 (here I
> pass the same 里 four times, just as an example):
>
> In [4]: from w3lib.url import safe_url_string
>
> In [5]: safe_url_string('http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=里&q3=里&q4=里', encoding=response.encoding)
> Out[5]: 'http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=%C0%EF&q3=%C0%EF&q4=%C0%EF'
>
> In [6]: from scrapy import Request
>
> In [7]: Request('http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=%C0%EF&q3=%C0%EF&q4=%C0%EF')
> Out[7]: <GET http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=%C0%EF&q3=%C0%EF&q4=%C0%EF>
>
> or pass the encoding parameter to the Request constructor. Otherwise,
> UTF-8 is used for query parameters before percent-escaping:
>
> In [9]: Request('http://chengyu.t086.com/chaxun.php?q1=里&q2=里')
> Out[9]: <GET http://chengyu.t086.com/chaxun.php?q1=%E9%87%8C&q2=%E9%87%8C>
>
> In [10]: Request('http://chengyu.t086.com/chaxun.php?q1=里&q2=里', encoding='gb18030')
> Out[10]: <GET http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=%C0%EF>
>
> Hope it helps.
>
> Regards,
> Paul.
>
> [1] https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qs
>
> On Tuesday, August 30, 2016 at 11:33:27 AM UTC+2, 李哲 wrote:
>>
>> http://chengyu.t086.com/chaxun.php?q1=%C0%EF&q2=&q3=&q4=
>>
>> I want to scrape this website. I need to fill in &q1="something" or
>> &q2="something" with an encoded Chinese character, but the query
>> content %C0%EF does not seem to be UTF-8 (I tried both UTF-8 and plain
>> URL encoding).
>>
>> How can I find out which encoding is used here?
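
For reference, the same round trip can be sketched with only the standard library, outside of Scrapy. This is a minimal illustration rather than anything from the Scrapy docs: it assumes Python 3 and takes the gb18030 value from the response.encoding output quoted above. The variable names are made up for the example.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Encoding reported by response.encoding in the scrapy shell session above.
ENCODING = "gb18030"

# Decode the percent-escaped bytes from the original URL.
decoded = unquote("%C0%EF", encoding=ENCODING)

# Percent-escape a single value with the site's encoding.
escaped = quote("里", encoding=ENCODING)

# For comparison: quote() defaults to UTF-8, which this site does not expect.
escaped_utf8 = quote("里")

# Build a whole query string; empty parameters are kept as "q2=" and so on.
query = urlencode({"q1": "里", "q2": "", "q3": "", "q4": ""}, encoding=ENCODING)

# Parse it back; parse_qs drops blank values by default.
parsed = parse_qs(query, encoding=ENCODING)

print(decoded, escaped, escaped_utf8)
print(query)
print(parsed)
```

The resulting query string can be appended to http://chengyu.t086.com/chaxun.php? and passed to Request as-is, since it is already percent-escaped.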
