Re: pylint woes

DFS Sun, 08 May 2016 14:28:49 -0700

On 5/8/2016 7:36 AM, Steven D'Aprano wrote:

On Sun, 8 May 2016 11:16 am, DFS wrote:

address data is scraped from a website:

names = tree.xpath()
addr  = tree.xpath()


Why are you scraping the data twice?



Because it exists in 2 different sections of the document.

names     = tree.xpath('//span[@class="header_text3"]/text()')
addresses = tree.xpath('//span[@class="text3"]/text()')

I thought you were a "master who knew her tools", and I was the apprentice?


So why did "the master" think xpath() was magic?

names = addr = tree.xpath()

or if you prefer the old-fashioned:

names = tree.xpath()
addr = names

but that raises the question, how can you describe the same set of data as
both "names" and "addr[esses]" and have them both be accurate?
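
For anyone following along, a minimal sketch of what the chained assignment does. `scrape()` is a hypothetical stand-in for the argument-less `tree.xpath()` quoted above, and the list it returns is made up for illustration:

```python
def scrape():
    # Hypothetical stand-in for tree.xpath(); returns a new list on every call.
    return ["Acme Corp", "1250 Peachtree Rd, Atlanta, GA 30303"]

# Chained assignment: one call, and both names are bound to the same list object.
names = addr = scrape()
same_object = names is addr

# Two separate calls: two distinct list objects, even if the contents are equal.
names_2 = scrape()
addr_2 = scrape()
different_objects = names_2 is not addr_2
```

With the chained form, appending to `names` also changes what `addr` shows, because there is only one list.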

I want to store the data atomically,


I'm not really sure what you mean by "atomically" here. I know what *I* mean
by "atomically", which is to describe an operation which either succeeds
entirely or fails.


That's atomicity.



> But I don't know what you mean by it.

http://www.databasedesign-resource.com/atomic-database-values.html

so I parse street, city, state, and
zip into their own lists.


None of which is atomic.


All of which are atomic.

"1250 Peachtree Rd, Atlanta, GA 30303"

street = [s.split(',')[0] for s in addr]
city   = [c.split(',')[1].strip() for c in addr]
state  = [s[-8:][:2] for s in addr]
zipcd  = [z[-5:] for z in addr]


At this point, instead of iterating over the same list four times, doing the
same thing over and over again, you should do things the old-fashioned way:

streets, cities, states, zipcodes = [], [], [], []
for word in addr:
    items = word.split(',')
    streets.append(items[0])
    cities.append(items[1].strip())
    states.append(word[-8:-6])
    zipcodes.append(word[-5:])
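
A self-contained sketch of the single-pass version, for comparison with the four comprehensions. The function name `split_addresses` and the second sample address are made up for illustration. It assumes every entry ends in a two-letter state, a space, and a five-digit zip, so the state is taken with the slice `[-8:-6]`, which selects the same two characters as `s[-8:][:2]`:

```python
# First address is the one quoted in the thread; the second is invented.
addr = [
    "1250 Peachtree Rd, Atlanta, GA 30303",
    "1 Main St, Springfield, IL 62701",
]

def split_addresses(addresses):
    """Split 'street, city, ST 12345' strings into four parallel lists."""
    streets, cities, states, zipcodes = [], [], [], []
    for line in addresses:
        items = line.split(',')          # split once per address
        streets.append(items[0])
        cities.append(items[1].strip())
        states.append(line[-8:-6])       # two characters before ' 12345'
        zipcodes.append(line[-5:])       # last five characters
    return streets, cities, states, zipcodes

streets, cities, states, zipcodes = split_addresses(addr)

# The four-comprehension form, kept only to check that both agree.
states_by_comprehension = [s[-8:][:2] for s in addr]
```

Both forms walk the list, but the loop splits each string once instead of twice.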




That's a good one.

Chris Angelico mentioned something like that, too, and I already put it in place.

Oh, and use better names. "street" is a single street, not a list of
streets, note plural.



I'll use whatever names I like.





--
https://mail.python.org/mailman/listinfo/python-list
