Thanks, Yongqiang, for the quick reply. It was helpful.

Ashutosh

On Mon, Jun 28, 2010 at 10:43, yongqiang he <heyongqiang...@gmail.com> wrote:
> This is expected.
> The reason we did that is because of easy implementation. Because that
> way, hive will not need to compute array offset to get a column.
>
> On Mon, Jun 28, 2010 at 10:11 AM, Ashutosh Chauhan
> <ashutosh.chau...@gmail.com> wrote:
>> Hi,
>>
>> I am trying to use RCFile outside of realms of Hive. Though I am
>> still using column serde and column struct to get the row. I found
>> that the way to tell RCFile the columns I am interested in is through
>> setting READ_COLUMN_IDS_CONF_STR key in jobconf. This worked except
>> for one thing. If there are originally 5 columns in the data and I ask
>> RCFile to project 3 columns out of it. I get back row of 5 columns
>> with data in 3 columns I asked it to project and 2 nulls. I expected
>> it to give me back row with exactly 3 columns. As a concrete example,
>> assume data is as follows:
>>
>> 123 | 456 | "hadoop" | 23090L | 5.3D |
>> and I ask to project column 0,2,4 I get back
>> 123 | null | "hadoop" | null | 5.3D |
>> instead I had expected to get:
>> |123| "hadoop" | 5.3D |
>>
>> So, my question is this the expected behavior (or I am doing something
>> wrong ?). If it is, then is this by design and it is expected that
>> "higher layers" (like hive) are expected to reconstruct the row with
>> nulls weeded out.
>>
>> Thanks,
>> Ashutosh
>>
>
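
For anyone reading this thread later: since the reader hands back a full-width row with nulls in the unprojected positions, the "higher layer" has to compact the row itself. Below is a minimal sketch of that step in plain Python. It is not Hive or RCFile code; the function name `compact_projected_row` and the list-based row are assumptions made purely for illustration.

```python
from typing import Any, List, Sequence


def compact_projected_row(row: Sequence[Any], projected_ids: Sequence[int]) -> List[Any]:
    """Return only the projected columns of a full-width row.

    Hypothetical helper, not part of Hive: `row` stands in for the
    full-width row the reader returns, and `projected_ids` for the
    column indexes that were requested.
    """
    # Select by index instead of dropping nulls, so that a projected
    # column whose value really is null keeps its slot in the output.
    return [row[i] for i in sorted(set(projected_ids))]


# The example from the thread: 5 columns stored, columns 0, 2 and 4 projected.
full_row = [123, None, "hadoop", None, 5.3]
print(compact_projected_row(full_row, [0, 2, 4]))
```

Selecting by column index rather than filtering out nulls matters because a projected column can legitimately contain a null, and that value should survive the compaction.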