Web scraping help / better way to do it?

Matt Tue, 19 Jan 2016 00:46:22 -0800

I'm a beginner Python user (3.5) trying to scrape the ladder from
www.afl.com.au/ladder. It's dynamic content, so I used lynx -dump to get
a text file and I'm parsing that.


Here is the code:

# Read the text file produced by lynx -dump (the with block closes it)
with open('c:/temp/afl2.txt', 'r') as f:
    text = f.read()

# Split the text into a list of space-separated tokens
afl_list = text.split(' ')

# Team abbreviations to search for
search_list = ['FRE', 'WCE', 'HAW', 'SYD', 'RICH', 'WB', 'ADEL', 'NMFC',
               'PORT', 'GEEL', 'GWS', 'COLL', 'MELB', 'STK', 'ESS', 'GCFC',
               'BL', 'CARL']

def build_ladder():
    for team in search_list:
        # index() returns the position of the FIRST match only
        team_pos = afl_list.index(team)
        # The ladder position is the token just before the team name
        ladder_pos = afl_list[team_pos - 1]
        print(ladder_pos + ' - ' + team)

build_ladder()
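For reference, list.index() returns the position of the first matching element only, so any earlier mention of a team abbreviation on the page is picked up instead of the ladder entry. It also accepts an optional start argument, which allows skipping everything before a given position. A minimal sketch, assuming the dump contains some token that marks the start of the ladder table: 'LADDER' is a hypothetical marker, and find_after() and the sample text are made up for illustration.

```python
def find_after(tokens, team, marker):
    """Return the token just before `team`, searching only after `marker`.

    Raises ValueError (like list.index) if either token is not found.
    """
    # list.index(x, start) begins the search at position `start`,
    # so any occurrence of `team` before the marker is ignored.
    start = tokens.index(marker) + 1
    pos = tokens.index(team, start)
    return tokens[pos - 1]


# Hypothetical dump: 'GWS' appears once in a menu, then again in the ladder.
# 'LADDER' is an assumed marker token, not taken from the real page.
sample = "Clubs * GWS News LADDER 10 GEEL 11 GWS 12 COLL"
tokens = sample.split()
print(find_after(tokens, 'GWS', 'LADDER'))  # 11
```

The catch is finding a marker that reliably appears once, just before the table, in the real lynx output.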


This outputs:

1 - FRE
2 - WCE
3 - HAW
4 - SYD
5 - RICH
6 - WB
7 - ADEL
8 - NMFC
9 - PORT
10 - GEEL
* - GWS
12 - COLL
13 - MELB
14 - STK
15 - ESS
16 - GCFC
17 - BL
18 - CARL

Notice that number 11 is missing: my script picks up an earlier "GWS" on
the page, so the token before it is "*" rather than 11. What is the best
way to skip that match and get the "GWS" further down in the text file, or
am I better off approaching this in a different way?
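Another option is to stop relying on index() and scan the tokens in pairs, keeping a team only when its preceding token is a whole number. This assumes each ladder row in the dump has the rank immediately before the abbreviation, as the output above suggests; parse_ladder() and the sample string are hypothetical. Calling split() with no argument also splits on any run of whitespace, including the newlines in a lynx dump.

```python
def parse_ladder(tokens, teams):
    """Map each team abbreviation to the number printed just before it.

    Only pairs whose preceding token is all digits are kept, so a stray
    mention such as '* GWS' elsewhere on the page is skipped. The first
    numeric match for each team wins.
    """
    wanted = set(teams)
    ladder = {}
    for prev, tok in zip(tokens, tokens[1:]):
        if tok in wanted and prev.isdigit() and tok not in ladder:
            ladder[tok] = int(prev)
    return ladder


# Hypothetical excerpt of a lynx dump: 'GWS' appears once in a menu
# (preceded by '*') and again in the ladder (preceded by its rank).
sample = "Clubs * GWS News\n10 GEEL 11 GWS\n12 COLL"
teams = ['GEEL', 'GWS', 'COLL']

ladder = parse_ladder(sample.split(), teams)
# Print in ladder order, using str.format for Python 3.5 compatibility
for team, rank in sorted(ladder.items(), key=lambda item: item[1]):
    print('{} - {}'.format(rank, team))
```

Any other number that happens to sit directly before a team abbreviation elsewhere on the page would still be picked up first, so the result is worth checking against the site.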


TIA

Matt




-- 
https://mail.python.org/mailman/listinfo/python-list
