Hi, I'm very new to Hadoop and am working through how we may be able to apply it to our data set.
One of the things that I am struggling with is understanding whether it is possible to tell Hadoop that only parts of the input file will be needed for a specific job. The reason I believe I may need this is that we have two big dimensions in our data set. Queries may want only one of these dimensions, and while some unneeded reading is unavoidable, there are cases where reading the entire data set presents a very significant overhead.

Or have I just misunderstood something ;-(

thanks

--
*Franc Carter* | Systems architect | Sirca Ltd
<marc.zianideferra...@sirca.org.au>
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 9236 9118
Level 9, 80 Clarence St, Sydney NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215