[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731797#comment-15731797 ]
Tianwan Zhao commented on HIVE-6131: ------------------------------------ Hive provides new DDL to handle such situation in later versions after HIVE-3833 is applied in 0.11.0. See: [Language Manual CASCADE|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns] ??The CASCADE|RESTRICT clause is available in Hive 0.15.0. ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata.?? In this case you can change your step 5 to 5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS (col1 string, col2 string) *CASCADE*; > New columns after table alter result in null values despite data > ---------------------------------------------------------------- > > Key: HIVE-6131 > URL: https://issues.apache.org/jira/browse/HIVE-6131 > Project: Hive > Issue Type: Bug > Affects Versions: 0.11.0, 0.12.0, 0.13.0, 1.2.1 > Reporter: James Vaughan > Priority: Critical > Attachments: HIVE-6131.1.patch > > > Hi folks, > I found and verified a bug on our CDH 4.0.3 install of Hive when adding > columns to tables with Partitions using 'REPLACE COLUMNS'. I dug through the > Jira a little bit and didn't see anything for it so hopefully this isn't just > noise on the radar. > Basically, when you alter a table with partitions and then reupload data to > that partition, it doesn't seem to recognize the extra data that actually > exists in HDFS- as in, returns NULL values on the new column despite having > the data and recognizing the new column in the metadata. > Here's some steps to reproduce using a basic table: > 1. Run this hive command: CREATE TABLE jvaughan_test (col1 string) > partitioned by (day string); > 2. Create a simple file on the system with a couple of entries, something > like "hi" and "hi2" separated by newlines. > 3. Run this hive command, pointing it at the file: LOAD DATA LOCAL INPATH > '<FILEDIR>' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02'); > 4. Confirm the data with: SELECT * FROM jvaughan_test WHERE day = > '2014-01-02'; > 5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS > (col1 string, col2 string); > 6. Edit your file and add a second column using the default separator > (ctrl+v, then ctrl+a in Vim) and add two more entries, such as "hi3" on the > first row and "hi4" on the second > 7. Run step 3 again > 8. Check the data again like in step 4 > For me, this is the results that get returned: > hive> select * from jvaughan_test where day = '2014-01-01'; > OK > hi NULL 2014-01-02 > hi2 NULL 2014-01-02 > This is despite the fact that there is data in the file stored by the > partition in HDFS. > Let me know if you need any other information. The only workaround for me > currently is to drop partitions for any I'm replacing data in and THEN > reupload the new data file. > Thanks, > -James -- This message was sent by Atlassian JIRA (v6.3.4#6332)