On 12/04/2014 10:43 AM, Yan Qi wrote:
Hi Ryan,
Thanks for your quick reply!
Probably you're right. The projected schema has 4 columns (out of 13 in the
read schema). If that's the problem, how does the read schema get FILTERED?
I thought the read schema should always be the same as the file schema
(i.e., Profile.getClassSchema()), right?
Thanks,
Yan
The read schema is the schema that your application expects. If you rely
on 4 data fields in your application, then your read schema should
reflect that. The reason why the read and projection schemas are
separate is that you might want to load 4 columns of data, but the
object you're using has more fields that you'll just ignore. In that
case, you don't mind that those are set to default values instead of
data values.
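
To make the distinction concrete, here is a rough sketch that uses plain
Java collections in place of real schemas and records. The class name,
the materialize helper, and the field names are all hypothetical; this
is not the Parquet or Avro API, only an illustration of which fields end
up with data and which end up with defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ReadVsProjection {
    // Hypothetical helper: builds the record the application sees.
    //   readDefaults: every field in the read schema, mapped to its default
    //   projection:   the names of the columns actually loaded from the file
    //   fileRow:      the values stored in the file for one record
    static Map<String, Object> materialize(Map<String, Object> readDefaults,
                                           Set<String> projection,
                                           Map<String, Object> fileRow) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readDefaults.entrySet()) {
            String name = field.getKey();
            if (projection.contains(name) && fileRow.containsKey(name)) {
                record.put(name, fileRow.get(name));  // loaded from the data
            } else {
                record.put(name, field.getValue());   // not loaded: default
            }
        }
        return record;
    }

    public static void main(String[] args) {
        // Read schema with three fields (illustrative names and defaults).
        Map<String, Object> readDefaults = new LinkedHashMap<>();
        readDefaults.put("id", 0L);
        readDefaults.put("name", "");
        readDefaults.put("email", "");

        // One record as stored in the file.
        Map<String, Object> fileRow = new LinkedHashMap<>();
        fileRow.put("id", 42L);
        fileRow.put("name", "yan");
        fileRow.put("email", "yan@example.com");

        // Only "id" and "name" are projected, so "email" keeps its default.
        System.out.println(materialize(readDefaults, Set.of("id", "name"), fileRow));
    }
}
```

The object still has all of the read schema's fields, but only the
projected ones carry values from the file.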
I actually think we need to fix how this works and derive the projection
schema from the read schema and the file schema. That way, we wouldn't
fill in defaults for columns you don't want loaded, and every column you
do ask for would be read from the data if it exists there.
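
One way to picture that derivation, as a sketch only: treat each schema
as a list of field names and keep the read-schema fields that also exist
in the file. The class, the deriveProjection method, and the field names
are made up for illustration, and nothing here reflects how Parquet
currently behaves or would implement it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DerivedProjection {
    // Hypothetical: the projection is every read-schema field that the
    // file actually contains, kept in read-schema order.
    static List<String> deriveProjection(List<String> readFields,
                                         List<String> fileFields) {
        Set<String> inFile = new HashSet<>(fileFields);
        List<String> projection = new ArrayList<>();
        for (String name : readFields) {
            if (inFile.contains(name)) {
                projection.add(name);  // requested and present: load it
            }
            // requested but absent from the file: left to the read
            // schema's default, nothing to load
        }
        return projection;
    }

    public static void main(String[] args) {
        // Illustrative field names only.
        List<String> readFields = Arrays.asList("id", "name", "nickname");
        List<String> fileFields = Arrays.asList("id", "name", "email", "age");

        // "nickname" is not in the file, "email" and "age" are not requested.
        System.out.println(deriveProjection(readFields, fileFields));
    }
}
```

With this approach the application would state only its read schema,
and the set of columns to load would follow from it.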
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.