DFS writes:

> On 5/2/2016 12:57 PM, Jussi Piitulainen wrote:
>> DFS writes:
>>
>>> Have: list1 = ['\r\n   Item 1  ','  Item 2  ','\r\n  ']
>>> Want: list1 = ['Item 1','Item 2']

. .

>> Funny-looking data you have.
>
> I know - sadly, it's actual data:
>
> --------------------------------------------------------------------
> from lxml import html
> import requests
>
> webpage = "http://www.usdirectory.com/ypr.aspx?fromform=qsearch&qs=TN&wqhqn=2&qc=Nashville&rg=30&qhqn=restaurant&sb=zipdisc&ap=2"
>
> page  = requests.get(webpage)
> tree  = html.fromstring(page.content)
> addr1 = tree.xpath('//span[@class="text3"]/text()')
> print 'Addresses: ', addr1
> --------------------------------------------------------------------
>
> I couldn't figure out a better way to extract it from the HTML (maybe
> XML and DOM?)

I should have guessed :) But now I'm a bit worried about those spaces
inside your items. Can the text of a single item be split into several
strings in the middle? If so, the sanitization above does the wrong thing.
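
To make the worry concrete, here is a minimal sketch. It assumes the
clean-up amounts to "strip each string, drop the empty ones"; the name
clean and the split-up input are made up for illustration and are not
taken from your page.

```python
def clean(strings):
    """Strip surrounding whitespace and drop strings that end up empty."""
    stripped = (s.strip() for s in strings)
    return [s for s in stripped if s]

# The example from the original question.
have = ['\r\n   Item 1  ', '  Item 2  ', '\r\n  ']
want = clean(have)            # ['Item 1', 'Item 2']

# Hypothetical case: the text of one item arrives as two strings.
split_item = ['\r\n   Item', ' 1  ', '  Item 2  ']
broken = clean(split_item)    # ['Item', '1', 'Item 2']
```

In the second case the list has three entries where two were wanted,
and nothing in the strings themselves says which pieces belong together.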

If someone has the right solution, I'm watching, too.
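
One direction that might be worth trying, untested against the actual
page: select the span elements themselves rather than their text()
nodes, and join all the text inside each element before normalising
whitespace. The sketch below uses the standard library's ElementTree on
a made-up, well-formed snippet; SNIPPET and span_texts are hypothetical
names. lxml elements also provide itertext(), so the same loop should
carry over to the tree built by html.fromstring.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the real page.
SNIPPET = """<div>
  <span class="text3">
     Item <b>1</b>  </span>
  <span class="text3">  Item 2  </span>
  <span class="text3">
  </span>
</div>"""

def span_texts(markup, css_class="text3"):
    """Return one whitespace-normalised string per matching span."""
    root = ET.fromstring(markup)
    items = []
    for span in root.iter("span"):
        if span.get("class") != css_class:
            continue
        # itertext() yields every text node inside the element, in
        # document order, so text split by child tags is rejoined here.
        text = " ".join("".join(span.itertext()).split())
        if text:  # skip spans that contain only whitespace
            items.append(text)
    return items

addresses = span_texts(SNIPPET)   # ['Item 1', 'Item 2']
```

Working per element keeps the item boundaries that the markup already
provides, so a split text node no longer turns into two items.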
