[amibroker] Re: Python 2.5 based Yahoo ticker downloader

haberdasher111 Mon, 03 Aug 2009 03:13:45 -0700

I was wondering whether the script referred to here is available anywhere.


Thanks,

H


--- In [email protected], "tpowers2010" <wing...@...> wrote:
>
> I'm currently working on a Python 2.5 script to download all the stocks
> listed in the Yahoo Industry Browser <http://biz.yahoo.com/p/> by
> sector, then by industry.
> 
> I basically do the same thing as the Excel workbook found at
> <http://icc-az.com/amibroker_files%5CStocks_XLS.zip>. However, that
> page says "Since this using plain VBA for all extraction, it is very
> slow. Expect 12 hours to do an extract...".
> 
> For comparison, my Python script currently takes about 8 minutes.
> The main reason is that I can get ticker, company name, sector, and
> industry without having to download the individual company profile
> pages. And unlike the Excel solution, which downloads entire webpages
> (including images), I only have to grab the basic HTML page.
> 
> Using the third-party BeautifulSoup module for Python
> <http://www.crummy.com/software/BeautifulSoup/>, it turns out to be
> pretty easy to extract the required information from the raw HTML
> (rather than making Excel convert webpages to spreadsheets).
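A minimal sketch of this kind of extraction follows. It swaps in Python 3's standard-library html.parser for BeautifulSoup so it runs without extra packages, and the page layout it expects (company links of the form <a href="...?s=TICKER">Company Name</a>) is an assumption for illustration, not the actual Industry Browser markup. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class CompanyLinkParser(HTMLParser):
    """Collect (ticker, company name) pairs from <a> tags.

    Assumption (hypothetical layout): each company appears as
    <a href="...?s=TICKER">Company Name</a>; other links are ignored.
    """

    def __init__(self):
        super().__init__()
        self.companies = []
        self._ticker = None  # ticker of the <a> tag we are inside, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        query = parse_qs(urlparse(href).query)
        if "s" in query:  # only links carrying an s=TICKER parameter
            self._ticker = query["s"][0].upper()
            self._text = []

    def handle_data(self, data):
        if self._ticker is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._ticker is not None:
            name = "".join(self._text).strip()
            self.companies.append((self._ticker, name))
            self._ticker = None


def extract_companies(html):
    """Return a list of (ticker, company name) tuples found in html."""
    parser = CompanyLinkParser()
    parser.feed(html)
    parser.close()
    return parser.companies
```

For example, extract_companies('<a href="http://finance.yahoo.com/q?s=ibm">Intl Business Machines</a>') returns [('IBM', 'Intl Business Machines')]. Text inside nested tags such as <b> is joined into the name.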
> 
> Finally, to get the exchange information, instead of reading each
> company's profile page I use the URL
> http://finance.yahoo.com/d/quotes.csv?s=TICKERS&f=x with TICKERS
> replaced by a list of ticker symbols separated by "+", which returns
> the exchanges for 200 companies at once.
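The batching can be sketched like this (Python 3; the helper names are hypothetical). It only builds URLs and parses response text, without fetching anything, and it assumes the response holds one quoted exchange value per line, in the same order as the symbols in the request. The sample values in the usage note are illustrative only.

```python
import csv
import io

# URL template quoted in the post; {tickers} stands in for TICKERS.
QUOTES_URL = "http://finance.yahoo.com/d/quotes.csv?s={tickers}&f=x"
BATCH_SIZE = 200  # the post fetches exchanges for 200 companies per request


def batches(items, size=BATCH_SIZE):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def exchange_urls(tickers):
    """Build one quotes.csv URL per batch, joining symbols with '+'."""
    return [QUOTES_URL.format(tickers="+".join(batch)) for batch in batches(tickers)]


def parse_exchanges(tickers, csv_text):
    """Pair each ticker in one batch with the exchange on its response line.

    Assumption: one quoted value per line, in request order.
    """
    rows = [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]
    return dict(zip(tickers, rows))
```

For example, 450 tickers produce 3 URLs, and parse_exchanges(["IBM", "MSFT"], '"NYSE"\n"NasdaqNM"\n') returns {'IBM': 'NYSE', 'MSFT': 'NasdaqNM'}.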
> 
> A caveat is that getting info from the Industry Browser pages alone
> surprisingly yields some ticker symbols that are already incorrect!
> (This seems to happen for any stock whose exchange is listed as "n/a".)
> My impression is that the newer Yahoo Industry Center page
> <http://biz.yahoo.com/ic/ind_index.html> is more accurate but slightly
> harder to parse.
> 
> Therefore, to be absolutely sure that the tickers are valid, you have
> to confirm that each company's profile or quotes page can actually be
> downloaded. The only time I tried doing that, it took about 3 hours. As
> a side benefit of this process, you can scrape additional information on
> each company (such as number of employees). Only about 10 of the 7500+
> symbols were listed incorrectly on the main Industry Browser pages (all
> of them OTC BB traded stocks).
> 
> I'm thinking about using multiple threads to download, say, 10 pages at
> once to speed up this last step. Unfortunately, I didn't design the
> original code to be thread-safe, so this will take some work.
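One way to approach this, assuming Python 3.2+ (concurrent.futures did not exist in Python 2.5), is to keep the worker threads free of shared state: each worker only reports whether one fetch succeeded, and the results are assembled in the main thread. fetch_profile is a hypothetical, caller-supplied download function that must itself be thread-safe; find_invalid_tickers is likewise a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def find_invalid_tickers(tickers, fetch_profile, workers=10):
    """Return the tickers whose profile page could not be fetched.

    fetch_profile (hypothetical, caller-supplied) takes a ticker, returns
    the page text, and raises an exception on failure. Up to `workers`
    calls run at the same time, so it must be thread-safe.
    """
    def is_valid(ticker):
        # Each worker touches no shared state; it just reports success.
        try:
            fetch_profile(ticker)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(is_valid, tickers))  # keeps input order
    return [t for t, ok in zip(tickers, results) if not ok]
```

Because pool.map returns results in input order, the invalid tickers come back in the same order as the original list, regardless of which downloads finish first.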
> 
> Once I have the basic stock information I spit out a .csv list (readable
> by Excel), broker.sectors, and broker.industries files. I also use a
> separate small Python script to initialize a new AmiBroker database. You
> have to manually update the Markets since there is apparently no way to
> do this from COM (but there are only 8 of them).
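For the .csv list, Python's csv module handles quoting, which matters because company names often contain commas. A minimal sketch follows; the column order is an assumption based on the fields the post collects, and the function names are hypothetical.

```python
import csv
import io

# Assumed column order, based on the fields described in the post.
HEADER = ["Ticker", "Name", "Sector", "Industry", "Exchange"]


def stock_list_csv(stocks):
    """Return (ticker, name, sector, industry, exchange) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(stocks)  # fields containing commas get quoted
    return buf.getvalue()


def write_stock_list(path, stocks):
    """Write the CSV text to `path` so Excel can open it."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(stock_list_csv(stocks))
```

A name such as Foo, Inc. is written as "Foo, Inc." (in quotes), so Excel keeps it in a single cell instead of splitting it across two columns.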
> 
> One thing I noticed is that the broker.industries file used to
> initialize new databases seems to have an undocumented limit of about 38
> or 39 characters for the industry name. The "Textile - Apparel Footwear &
> Accessories" industry gets truncated and a bogus industry gets added
> unless I first limit the industry name length.
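Limiting the names before writing broker.industries can be sketched as follows. It uses 38 characters, taken from the post's observation (the exact limit is unconfirmed), and also checks that truncation does not merge two different industries into one name. The function name is hypothetical.

```python
MAX_INDUSTRY_LEN = 38  # post observes a limit of "about 38 or 39" characters


def shorten_industry_names(names, max_len=MAX_INDUSTRY_LEN):
    """Map each industry name to a version no longer than max_len.

    Raises ValueError if two different names shorten to the same string,
    since they would then be indistinguishable after import.
    """
    mapping = {}
    seen = {}  # shortened name -> original name that produced it
    for name in names:
        short = name[:max_len].rstrip()
        if short in seen and seen[short] != name:
            raise ValueError("%r and %r both shorten to %r"
                             % (seen[short], name, short))
        seen[short] = name
        mapping[name] = short
    return mapping
```

With the default limit, "Textile - Apparel Footwear & Accessories" (40 characters) becomes "Textile - Apparel Footwear & Accessori", while shorter names pass through unchanged.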
> 
> Also, Industries don't appear to be sorted correctly under their Sectors
> (I saw another post here that mentions the same thing).
> 
> Anyway, this is all something of a work in progress. It is also a
> command-line-only script; there is no GUI. You'll have to be
> comfortable installing ActiveState's free Python 2.5 distribution for
> Windows, installing the BeautifulSoup and mechanize modules, and
> running scripts from a Command Prompt.
>
