The binary decoder needs some performance work that requires extra
buffering (AVRO-327).  Once that is done, adding deferred (lazy) loading
wouldn't be that intrusive, and I am willing to build it into the Java
BinaryDecoder if it is needed.
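
To make the "remember the position and resume" idea concrete, here is a
minimal sketch of deferred decoding over a buffered record.  It is not Avro
code: the class and method names are hypothetical, it uses only the JDK, and
it assumes the whole record is already sitting in a ByteBuffer (the buffer
copy Philip mentions below).  Strings are laid out the way Avro's binary
encoding writes them, as a zigzag varint byte length followed by UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Avro.
final class DeferredFieldSketch {

    // Append a string as: zigzag varint byte length, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        long n = ((long) utf8.length << 1) ^ ((long) utf8.length >> 63); // zigzag
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80)); // 7 payload bits + continuation bit
            n >>>= 7;
        }
        out.write((int) n);
        out.write(utf8, 0, utf8.length);
    }

    // Decode one zigzag varint long from the buffer's current position.
    static long readLong(ByteBuffer buf) {
        long n = 0;
        int shift = 0;
        int b;
        do {
            b = buf.get() & 0xFF;
            n |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1); // undo zigzag
    }

    static String readString(ByteBuffer buf) {
        byte[] utf8 = new byte[(int) readLong(buf)];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Step over a string without materializing it.
    static void skipString(ByteBuffer buf) {
        int len = (int) readLong(buf);
        buf.position(buf.position() + len);
    }

    // SELECT col2 FROM t WHERE col3 = wanted, for one record (col1, col2, col3).
    // Returns null when the predicate fails; buf always ends up after the record.
    static String selectCol2WhereCol3(ByteBuffer buf, String wanted) {
        skipString(buf);                 // col1: never needed
        int col2Start = buf.position();  // remember where col2 begins
        skipString(buf);                 // col2: defer decoding
        String col3 = readString(buf);   // col3: needed for the predicate
        if (!wanted.equals(col3)) {
            return null;                 // col2 was never decoded
        }
        int recordEnd = buf.position();
        buf.position(col2Start);         // resume from the remembered position
        String col2 = readString(buf);
        buf.position(recordEnd);         // restore, ready for the next record
        return col2;
    }

    static byte[] encodeRecord(String col1, String col2, String col3) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, col1);
        writeString(out, col2);
        writeString(out, col3);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ByteBuffer hit = ByteBuffer.wrap(encodeRecord("k1", "hello", "abcde"));
        ByteBuffer miss = ByteBuffer.wrap(encodeRecord("k2", "world", "zzzzz"));
        System.out.println(selectCol2WhereCol3(hit, "abcde"));  // prints "hello"
        System.out.println(selectCol2WhereCol3(miss, "abcde")); // prints "null"
    }
}
```

The trade-off is that the record has to stay in memory until the predicate
is evaluated, which is why this only becomes cheap once the decoder is
already buffering its input.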

-Scott

On Jan 22, 2010, at 6:38 PM, Philip Zeyliger wrote:

> Not with any of today's APIs.  "SELECT col1, col3 FROM t" is handled
> easily: you construct a schema that only has those columns, and col2
> is skipped at read time.
> 
> Does Hive have a use case for this that you're interested in?  If you
> don't mind paying the buffer copy, you could probably write a
> "DeferredFoo" class that doesn't de-serialize certain structures...
> 
> -- Philip
> 
> On Fri, Jan 22, 2010 at 6:20 PM, Zheng Shao <[email protected]> wrote:
>> I noticed that avro has the "skip" functions which can help skip a
>> field when deserializing data.
>> This is good for column pruning in most cases, but we might be able to
>> do better in the following case.
>> 
>> 
>> Let's say we have a query like this:
>> 
>> CREATE TABLE t (col1 STRING, col2 STRING, col3 STRING);
>> SELECT col2 FROM t WHERE col3 = 'abcde';
>> 
>> We want to read field col3 first; if it matches what we want, then we
>> want to go back and read field col2.
>> 
>> 
>> Is there any way to "remember" the current location of deserialization,
>> so that we can "resume" from that point?
>> 
>> 
>> --
>> Yours,
>> Zheng
>> 
