On 12/04/2014 10:43 AM, Yan Qi wrote:
Hi Ryan,

Thanks for your quick reply!

Probably you're right. The projected schema has 4 columns (out of 13 in the
read schema). If that's the problem, how does the read schema get FILTERED?
I thought the read schema should always be the same as the file schema
(i.e., Profile.getClassSchema()), right?

Thanks,
Yan

The read schema is the schema that your application expects. If your application relies on 4 data fields, then your read schema should reflect that. The read and projection schemas are separate because you might want to load only 4 columns of data into an object that has more fields, which you'll just ignore. In that case, you don't mind that the ignored fields are set to default values instead of data values.
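
To make the distinction concrete, here is a minimal sketch in plain Java. It does not use the Parquet or Avro APIs (in parquet-avro the two schemas are set with AvroReadSupport.setAvroReadSchema and AvroReadSupport.setRequestedProjection). The class name, the materialize helper, and the field names are all hypothetical, and schemas are modeled as simple maps and sets of field names. It only illustrates the idea that the record has every read-schema field, but only projected fields carry data values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ReadVsProjection {
    // Hypothetical model: the read schema is a map of field name -> default value,
    // the projection is the set of columns actually loaded from the file.
    static Map<String, Object> materialize(Map<String, Object> readDefaults,
                                           Set<String> projection,
                                           Map<String, Object> fileRow) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readDefaults.entrySet()) {
            String name = field.getKey();
            if (projection.contains(name) && fileRow.containsKey(name)) {
                // Projected column: use the value stored in the file.
                record.put(name, fileRow.get(name));
            } else {
                // Not projected (or missing from the file): fall back to the default.
                record.put(name, field.getValue());
            }
        }
        return record;
    }

    public static void main(String[] args) {
        // Read schema with three fields and their defaults (illustrative names).
        Map<String, Object> readDefaults = new LinkedHashMap<>();
        readDefaults.put("id", 0L);
        readDefaults.put("name", "");
        readDefaults.put("email", "");

        Map<String, Object> fileRow = Map.of("id", 42L, "name", "yan", "email", "yan@example.com");

        // Only two columns are projected, so "email" keeps its default.
        Set<String> projection = Set.of("id", "name");
        System.out.println(materialize(readDefaults, projection, fileRow));
    }
}
```

Here the email field is still present on the record, but it holds the default rather than the value in the file, because it was left out of the projection.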

I actually think we need to fix how this works and derive the projection schema from the read schema and the file schema. That way, we wouldn't fill in defaults for columns that you don't want loaded, and we would select every column in the read schema whenever it exists in the data.
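
As a rough sketch of that derivation, assuming fields are matched purely by name (real Avro schema resolution also considers aliases, types, and nesting), the projection would be the read-schema fields that are present in the file. This is plain Java with hypothetical names, not an existing Parquet API or the actual fix:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeriveProjection {
    // Hypothetical helper: keep each read-schema field that the file contains,
    // preserving read-schema order. Matching is by name only (an assumption).
    static List<String> deriveProjection(List<String> readFields, List<String> fileFields) {
        List<String> projection = new ArrayList<>();
        for (String name : readFields) {
            if (fileFields.contains(name)) {
                projection.add(name);
            }
        }
        return projection;
    }

    public static void main(String[] args) {
        // Illustrative field names only.
        List<String> fileFields = Arrays.asList("id", "name", "email", "age");
        List<String> readFields = Arrays.asList("id", "name", "nickname");

        // "nickname" is not in the file, so it is not projected and would be defaulted;
        // "email" and "age" are not in the read schema, so they are never loaded.
        System.out.println(deriveProjection(readFields, fileFields));
    }
}
```

With this approach the application would only specify the read schema, and the columns to load would follow from it.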

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.
