Hi Ryan,

You're right. I did some benchmarking and found that fillInDefaults()
accounts for over 70% of the total run time.

I am wondering whether it is possible to simply assign null as the
default for columns that do not appear in the read schema. For example,

private void fillInDefaults() {
  for (Map.Entry<Schema.Field, Object> entry : recordDefaults.entrySet()) {
    Schema.Field f = entry.getKey();
    // proposed change: put null here instead of a copy of entry.getValue()
    Object defaultValue = null;
    this.currentRecord.put(f.pos(), defaultValue);
  }
}

The application would then fill in the correct default value only when
the column is actually accessed.
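
Here is a rough sketch of that "resolve on access" idea, written
without any Avro or Parquet classes so it stands alone. LazyDefaultRecord
and all of its methods are hypothetical names made up for illustration;
they are not part of either library, and the real converter would need
its own way to track which positions were read from the file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a record with lazily resolved defaults.
// Not an Avro or Parquet class.
class LazyDefaultRecord {
    private final Object[] values;
    // resolved[pos] is true once values[pos] holds a real value.
    private final boolean[] resolved;
    // Position -> factory that builds a fresh default value (assumed input).
    private final Map<Integer, Supplier<Object>> defaults;

    LazyDefaultRecord(int fieldCount, Map<Integer, Supplier<Object>> defaults) {
        this.values = new Object[fieldCount];
        this.resolved = new boolean[fieldCount];
        this.defaults = new HashMap<>(defaults);
    }

    // Called for columns that are actually read from the file.
    void put(int pos, Object value) {
        values[pos] = value;
        resolved[pos] = true;
    }

    // Cheap per-record reset: store null and mark the slot unresolved,
    // instead of copying every default value up front.
    void fillInDefaults() {
        for (int pos : defaults.keySet()) {
            values[pos] = null;
            resolved[pos] = false;
        }
    }

    // The default is built only when the column is first accessed.
    Object get(int pos) {
        if (!resolved[pos]) {
            Supplier<Object> factory = defaults.get(pos);
            if (factory != null) {
                values[pos] = factory.get();
            }
            resolved[pos] = true;
        }
        return values[pos];
    }
}
```

With this shape the per-record cost is one null store per defaulted
column, and the cost of building a default is paid only for columns
the application reads. The trade-off is that every access goes through
get(), so code that reads the underlying values directly would still
see null.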

What do you think?

Thanks,
Yan

On Mon, Dec 8, 2014 at 6:29 PM, Ryan Blue <[email protected]> wrote:
> On 12/04/2014 01:22 PM, Yan Qi wrote:
>>
>> I see.
>>
>> Is there a way to keep the original object?  My use case might be a little
>> different, because the read and projected schema can change frequently but
>> the file schema is kind of fixed.
>>
>> Thanks,
>> Yan
>
>
> Assuming that my guess is the actual cause of the slow-down, then I think
> you have a choice between changing objects and defaulting the fields you've
> projected out. I think this sounds reasonable in general, but I'm open to
> suggestions for how we might fix it. I don't think that it's a possibility
> to eliminate columns and not fill in defaults though.
>
>
> rb
>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
