Hi,

I'm very new to Hadoop and am working through how we may be able to apply
it to our data set.

One of the things that I am struggling with is understanding whether it is
possible to tell Hadoop that only parts of the input file will be
needed for a specific job. The reason I believe I may need this is that we
have two big dimensions in our data set. Queries may want only one of these
dimensions, and while some unneeded reading is unavoidable, there are cases
where reading the entire data set presents a very significant overhead.

Or have I just misunderstood something? ;-(

thanks

-- 

*Franc Carter* | Systems architect | Sirca Ltd

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215
