No. While trying his tools I noticed that the US-Stocks database seemed to be
out of date. Since I knew this kind of thing is "simple" in Python, I decided
to write my own downloader from scratch.

Of course the devil is in the details and I still have a few issues to sort out.

--- In [email protected], "areehoi" <aree...@...> wrote:
>
> This is great news.  Have you been in contact with Jim Swindle?  I know he
> has been looking for someone to take over the updating of the US-Stocks
> database. He has provided a great service, so hopefully you can do the same
> now that you've solved the main ingredient. Let us hear from you on progress
> on the project.  Thanks for your interest and help.
> 
> Dick H
> 
> --- In [email protected], "tpowers2010" <wingusr@> wrote:
> >
> > I'm currently working on a Python 2.5 script to download all the stocks
> > listed in the Yahoo Industry Browser <http://biz.yahoo.com/p/>   by
> > sector then industry.
> > 
> > I basically do the same thing that is done by the Excel workbook found
> > at http://icc-az.com/amibroker_files%5CStocks_XLS.zip . However, that
> > page says "Since this using plain VBA for all extraction, it is very
> > slow. Expect 12 hours to do an extract...".
> > 
> > For comparison, my Python script currently takes about 8 minutes or so.
> > The main reason is that I can get the ticker, company name, sector, and
> > industry without having to download the individual company profile
> > pages. Also, unlike the Excel solution, which downloads entire webpages
> > (including images), I only have to grab the basic HTML page.
> > 
> > Using the Python 3rd party BeautifulSoup module
> > <http://www.crummy.com/software/BeautifulSoup/> , it turns out it's
> > pretty easy to extract the required information from the raw html
> > (rather than making Excel convert webpages to spreadsheets).
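For readers who want to try the parsing step described above, here is a minimal Python 3 sketch. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs, and it assumes (hypothetically) that each company row links to a quote page whose query string carries the ticker as s=TICKER; the real Industry Browser markup may well differ.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class CompanyLinkParser(HTMLParser):
    """Collect (ticker, company name) pairs from quote links.

    Assumption: each company row links to a URL whose query string
    holds the ticker as s=TICKER. This is illustrative, not Yahoo's
    documented markup.
    """

    def __init__(self):
        super().__init__()
        self.companies = []  # list of (ticker, name) tuples
        self._ticker = None  # ticker of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        symbols = parse_qs(urlparse(href).query).get("s")
        if symbols:
            self._ticker = symbols[0].upper()
            self._text = []

    def handle_data(self, data):
        if self._ticker is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._ticker is not None:
            # Collapse runs of whitespace in the link text.
            name = " ".join("".join(self._text).split())
            self.companies.append((self._ticker, name))
            self._ticker = None


def parse_companies(html):
    """Return every (ticker, name) pair found in one page of HTML."""
    parser = CompanyLinkParser()
    parser.feed(html)
    parser.close()
    return parser.companies
```

Links without an s= parameter (navigation, help pages) are simply ignored, which is what keeps the scrape down to the company rows.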
> > 
> > Finally, to get the exchange information, instead of reading each
> > company's profile page I use the
> > http://finance.yahoo.com/d/quotes.csv?s=TICKERS&f=x URL, with TICKERS
> > replaced by a +-separated list of ticker symbols, to get the exchanges
> > for 200 companies at once.
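A minimal sketch of that batching trick, assuming a plain list of ticker symbols. QUOTES_URL is the URL quoted in the post; the helper names and the URL-escaping of each symbol are illustrative choices, not the author's code.

```python
from urllib.parse import quote

# URL pattern quoted in the post; f=x asks for the exchange only.
QUOTES_URL = "http://finance.yahoo.com/d/quotes.csv?s={tickers}&f=x"
BATCH_SIZE = 200


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def exchange_urls(tickers, size=BATCH_SIZE):
    """Build one quotes.csv URL per batch of tickers, joined with '+'."""
    tickers = list(tickers)
    for batch in batched(tickers, size):
        # Escape anything unusual (e.g. '^' in index symbols) per ticker.
        joined = "+".join(quote(t, safe="") for t in batch)
        yield QUOTES_URL.format(tickers=joined)
```

With 7500+ symbols this works out to about 38 requests instead of thousands of profile-page downloads, and the fixed batch size keeps each URL to a bounded length.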
> > 
> > A caveat: getting info from the Industry Browser pages alone
> > surprisingly yields some ticker symbols that are already incorrect!
> > (This seems to happen for any stock whose exchange is listed as "n/a".)
> > My impression is that the newer Yahoo Industry Center page
> > <http://biz.yahoo.com/ic/ind_index.html> is more accurate but slightly
> > harder to parse.
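One way to spot the suspect symbols is to line up the quotes.csv response with the tickers that were requested and flag any whose exchange comes back as n/a. A sketch, assuming the f=x response holds one line per requested ticker, in request order, each with a single (possibly quoted) field; map_exchanges and suspicious are hypothetical names.

```python
import csv


def map_exchanges(tickers, csv_text):
    """Pair each requested ticker with the exchange on its CSV line.

    Assumes one non-empty line per ticker, in the order requested.
    """
    tickers = list(tickers)
    # splitlines() copes with both \n and \r\n line endings.
    rows = [row for row in csv.reader(csv_text.splitlines()) if row]
    if len(rows) != len(tickers):
        raise ValueError("expected %d lines, got %d" % (len(tickers), len(rows)))
    return {ticker: row[0].strip() for ticker, row in zip(tickers, rows)}


def suspicious(exchanges):
    """Tickers whose exchange is missing or reported as n/a."""
    return sorted(t for t, x in exchanges.items() if x.lower() in ("", "n/a", "na"))
```

Only the flagged symbols then need the slower per-company check, rather than all 7500+.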
> > 
> > Therefore, to be absolutely sure that the tickers are valid, you end up
> > having to confirm that you can download each company's profile or quotes
> > page. The only time I tried doing that, it took about 3 hours. As a side
> > benefit, this process lets you scrape additional information on each
> > company (such as number of employees). Only about 10 of the 7500+
> > symbols were listed incorrectly on the main Industry Browser pages (all
> > of them OTC BB traded stocks).
> > 
> > I'm thinking about using multiple threads to download, say, 10 pages at
> > once to speed up this last step. Unfortunately, I didn't design the
> > original code to be thread-safe, so this will take some work.
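Thread safety gets much easier if each worker only fetches its own page and returns a result, leaving all bookkeeping to the main thread. A minimal sketch with concurrent.futures.ThreadPoolExecutor (Python 3 standard library; on Python 2.5 the same shape would need the threading and Queue modules). PROFILE_URL is a hypothetical URL pattern, and treating any successful download as "valid" is a simplification: a real check would also inspect the page contents.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical profile-page URL pattern; substitute the page you actually scrape.
PROFILE_URL = "http://finance.yahoo.com/q/pr?s={ticker}"


def fetch_profile(ticker, timeout=30):
    """Download one profile page and return its raw bytes."""
    with urlopen(PROFILE_URL.format(ticker=ticker), timeout=timeout) as resp:
        return resp.read()


def validate_tickers(tickers, fetch=fetch_profile, workers=10):
    """Map each ticker to True if its page downloaded, else False.

    Workers share no mutable state: each call touches only its own
    ticker and returns a bool, and the results are gathered here.
    """
    tickers = list(tickers)

    def check(ticker):
        try:
            fetch(ticker)
            return True
        except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(tickers, pool.map(check, tickers)))
```

Passing `fetch` in as a parameter also makes it easy to swap in a function that parses extra fields (such as number of employees) while keeping the same thread pool.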
> > 
> > Once I have the basic stock information I spit out a .csv list (readable
> > by Excel), broker.sectors, and broker.industries files. I also use a
> > separate small Python script to initialize a new AmiBroker database. You
> > have to manually update the Markets since there is apparently no way to
> > do this from COM (but there are only 8 of them).
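The .csv part is straightforward with Python's csv module. A sketch, assuming each stock is a dict with the (hypothetical) keys ticker, name, sector, industry and exchange; the broker.sectors and broker.industries formats are not shown here.

```python
import csv

# Hypothetical column layout for the stock list.
FIELDS = ["ticker", "name", "sector", "industry", "exchange"]


def write_stock_csv(path, records):
    """Write a header line plus one row per stock record."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
```

DictWriter quotes fields that contain commas, so company names such as "Intl Business Machines, Inc." survive the round trip.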
> > 
> > One thing I noticed is that the broker.industries file used to
> > initialize new databases seems to have an undocumented limit of about 38
> > or 39 characters on industry names. The "Textile - Apparel Footwear &
> > Accessories" industry gets truncated and a bogus industry gets added
> > unless I first limit the industry name length.
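A sketch of the length workaround, assuming a 38-character limit (the post only observes "about 38 or 39"). It cuts at a word boundary and refuses to let two different industries collapse to the same shortened name, since a collision would silently merge them.

```python
MAX_INDUSTRY_LEN = 38  # assumed from the observed "about 38 or 39" limit


def shorten(name, limit=MAX_INDUSTRY_LEN):
    """Trim an industry name to at most `limit` characters at a word boundary."""
    name = " ".join(name.split())
    if len(name) <= limit:
        return name
    cut = name[:limit]
    # If the cut falls mid-word, drop the partial word.
    if name[limit] != " " and " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    # Don't leave a dangling separator such as " -" or " &".
    return cut.rstrip(" -&,")


def shorten_all(names, limit=MAX_INDUSTRY_LEN):
    """Map each full name to its short form, refusing ambiguous collisions."""
    mapping, owner = {}, {}
    for name in names:
        short = shorten(name, limit)
        if owner.get(short, name) != name:
            raise ValueError("%r and %r both shorten to %r" % (owner[short], name, short))
        owner[short] = name
        mapping[name] = short
    return mapping
```

For example, "Textile - Apparel Footwear & Accessories" (40 characters) becomes "Textile - Apparel Footwear".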
> > 
> > Also, Industries don't appear to be sorted correctly under their Sectors
> > (I saw another post here that mentions the same thing).
> > 
> > Anyway, this is all somewhat of a work in progress. It is also a
> > command-line-only script, with no GUI. You'll have to be comfortable
> > installing ActiveState's free Python 2.5 for Windows distribution,
> > installing the BeautifulSoup and mechanize modules, and running scripts
> > from a Command Prompt.
> >
>