Hi, if needed, you can run the script below to store the data on your local system:

# Run with bash: {1901..2012} is bash brace expansion
mkdir -p /home/ubuntu/work/files
for i in {1901..2012}; do
  cd /home/ubuntu/work/ || exit 1
  # -nH drops the host name and --cut-dirs=3 drops pub/data/noaa/,
  # so each year's files are saved under /home/ubuntu/work/$i/
  wget -r -np -nH --cut-dirs=3 -R index.html "http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/"
  cp "$i"/*.gz /home/ubuntu/work/files/
  rm -r "$i"
done

On Mon, Feb 13, 2012 at 3:43 PM, Andy Doddington <a...@doddington.net> wrote:

> OK, well for starters, I think you can safely ignore the PDF data; to
> paraphrase Star Wars: “that isn’t the data in which you are interested”.
>
> Page 16 of the book describes the data format and refers to a data store
> that contains directories for each year from 1901 to 2001. It also shows
> the naming of the .gz files within a sample directory (1990). The files in
> this directory have names "010010-99999-1990.gz", "010014-99999-1990.gz",
> "010015-99999-1990.gz", and so on…
>
> Going back to the NCDC web site (http://www.ncdc.noaa.gov), clicking on
> the ‘Free Data’ link on the left-hand side of the screen brings up a new
> screen, as shown below:
>
> [screenshot not preserved]
>
> Clicking the ‘Free Data’ link again, in the middle section of this page,
> brings up another page listing the available data sets:
>
> [screenshot not preserved]
>
> As this page notes, although some of this data must be paid for, there is
> at least one ‘free’ option within each section. For simplicity, I went for
> the first one, labelled “3505 FTP data access”, which the comment says is
> free. I used anonymous FTP and found that this site contained directories
> for each year from 1901 to 2012. I expect the additional directories
> reflect the fact that time has moved on since the book was written :-)
>
> There are also several text and PDF files that provide further information
> on the contents of the site. I suggest you read some of these for more
> details. One of them, "ish-format-document.pdf", seems to describe the
> record format in some detail. If you open it, you can check whether it
> matches the format expected by the hadoop sample code.
> There is also a ‘software’ directory, which contains various bits of code
> that might prove useful.
>
> On drilling down into the directory for 1990, I get the following list of
> files:
>
> [file listing not preserved]
>
> That looks close enough to the file names in the hadoop book, so I’d guess
> that these are the correct files.
>
> Given the passage of time, it is still possible that the file format has
> changed and is now incompatible with the hadoop code. However, it
> shouldn’t be that difficult to modify the code to suit the new format
> (which is very well documented, as already noted).
>
> Good luck!
>
> Andy
>
> ------------------------------
>
> On 12 Feb 2012, at 08:50, Bing Li wrote:
>
> Andy,
>
> Since there is a lot of data in the free data section of the site, I
> cannot figure out which data set is the one discussed in the book. Any
> format differences might cause the source code to throw exceptions. Some
> data is even in PDF format!
>
> Thanks so much!
> Bing
>
> On Sun, Feb 12, 2012 at 4:35 PM, Andy Doddington <a...@doddington.net> wrote:
>
> > According to page 15 of the book, this data is available from the US
> > National Climatic Data Center at http://www.ncdc.noaa.gov. Once you get to
> > this site, there is a menu of links on the left-hand side of the page,
> > listed under the heading ‘Data & Products’. I suspect that the entry
> > labelled ‘Free Data’ is the most likely area you need to investigate :-)
> >
> > Good luck
> >
> > Andy D
> >
> > ------------------------------
> >
> > On 12 Feb 2012, at 07:14, Bing Li wrote:
> >
> > Dear all,
> >
> > I am following the book Hadoop: The Definitive Guide. However, I got
> > stuck because I could not get the NCDC weather data that is used by the
> > source code in the book. Appendix C told me I could follow some
> > instructions at www.hadoopbook.com, but I didn't find the instructions
> > there. Could you give me a hand?
> >
> > Thanks so much!
> >
> > Best regards,
> > Bing
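If you want to check whether the downloaded files match the record format the hadoop sample code expects, as Andy suggests above, a quick awk filter is one way to do it. This is only a sketch: the column positions (temperature in columns 88-92 as a sign plus four digits in tenths of a degree C, quality code in column 93, +9999 meaning missing) are my reading of the book's max-temperature example and should be checked against ish-format-document.pdf. The helper name max_temp is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the maximum valid air temperature (tenths of a
# degree C) from uncompressed NCDC records read on stdin.
# ASSUMED field positions (1-based, verify against ish-format-document.pdf):
#   columns 88-92 = signed temperature, column 93 = quality code, 9999 = missing
max_temp() {
  awk '{
    temp = substr($0, 88, 5) + 0   # e.g. "+0123" -> 123, "-0050" -> -50
    q = substr($0, 93, 1)
    # keep only non-missing readings with an accepted quality code;
    # "seen" makes this work even when every reading is below zero
    if (temp != 9999 && q ~ /[01459]/ && (!seen || temp > max)) {
      max = temp
      seen = 1
    }
  }
  END { if (seen) print max }'
}

# Example: maximum for 1990 over the files collected by the script above
# for f in /home/ubuntu/work/files/*-1990.gz; do gunzip -c "$f"; done | max_temp
```

If this prints sensible values (for example, a maximum that is a plausible air temperature once divided by ten), the file layout probably still matches what the book's code parses.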