I use "LATERAL VIEW explode(...)" to read data from a Parquet file, but Parquet requests the full schema instead of just the columns I use. When I don't use LATERAL VIEW, the requested schema contains only the two columns I reference. Is this expected behavior, is there room for an optimization, or am I misunderstanding something?
Here are my examples:

1) hiveContext.sql("SELECT userid FROM pef WHERE observeddays==20140509")

The requested schema is:

    optional group observedDays (LIST) {
      repeated int32 array;
    }
    required int64 userid;

This is what I expect (the query itself does not return the right result, but that is not the issue here).

2) hiveContext.sql("SELECT userid FROM pef LATERAL VIEW explode(observeddays) od AS observed WHERE observed==20140509")

The requested schema is:

    required int64 userid;
    optional int32 source;
    optional group observedDays (LIST) {
      repeated int32 array;
    }
    optional group placetobe (LIST) {
      repeated group bag {
        optional group array {
          optional binary palces (UTF8);
          optional group dates (LIST) {
            repeated int32 array;
          }
        }
      }
    }

Why does Parquet request the full schema here, even though I only use two fields of the table? Can somebody please explain why this happens? Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/LATERAL-VIEW-explode-requests-the-full-schema-tp21893.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
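P.S. One workaround I am considering, but have not verified: project only the two needed columns in a subquery before applying the LATERAL VIEW, in the hope that column pruning is applied to the inner SELECT even if it is not pushed through the explode. The table and column names are the same as in my examples above; the subquery alias "pruned" is just a placeholder:

    hiveContext.sql(
      "SELECT userid FROM " +
      "(SELECT userid, observeddays FROM pef) pruned " +
      "LATERAL VIEW explode(observeddays) od AS observed " +
      "WHERE observed==20140509")

I don't know whether the optimizer actually narrows the Parquet read in this form, so this is only a sketch of what I would try next.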