[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253070#comment-13253070 ]

Hudson commented on HIVE-2711:
------------------------------

Integrated in Hive-trunk-h0.21 #1370 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1370/])
HIVE-2711 : Make the header of RCFile unique (Owen Omalley via Ashutosh Chauhan) (Revision 1325442)

Result = SUCCESS
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325442
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/data
* /hive/trunk/ql/src/test/data/rc-file-v0.rc
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java
* /hive/trunk/ql/src/test/results/clientpositive/alter_concatenate_indexed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge_stats.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_merge_compressed.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat.q.out
* /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample10.q.out

Make the header of RCFile unique
--------------------------------

Key: HIVE-2711
URL: https://issues.apache.org/jira/browse/HIVE-2711
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Fix For: 0.10
Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch, rc-file-v0.rc

The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles. I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ.

To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
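The compatibility scheme proposed in the issue description (write a new header, keep accepting the old 'SEQ\06') can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class and method names are hypothetical, and the new header is assumed to be the bytes 'R', 'C', 'F' followed by a version byte of 1, matching the "RCF1" label used elsewhere in this thread.

```java
import java.util.Arrays;

// Hypothetical sketch; these names are illustrative and are not taken from RCFile.java.
final class RcFileMagicSketch {
    // Legacy header inherited from SequenceFile: 'S', 'E', 'Q' plus version byte 6.
    static final byte[] LEGACY_HEADER = {'S', 'E', 'Q', 6};
    // Assumed new header: 'R', 'C', 'F' plus version byte 1.
    static final byte[] NEW_HEADER = {'R', 'C', 'F', 1};

    /** New files always start with the new header. */
    static byte[] headerForNewFile() {
        return NEW_HEADER.clone();
    }

    /** A reader accepts the new header and, for compatibility, the legacy one. */
    static boolean isAcceptedHeader(byte[] fileStart) {
        if (fileStart == null || fileStart.length < 4) {
            return false;
        }
        byte[] header = Arrays.copyOf(fileStart, 4);
        return Arrays.equals(header, NEW_HEADER) || Arrays.equals(header, LEGACY_HEADER);
    }
}
```

Only the writer changes behavior in this sketch; the reader merely widens the set of headers it accepts, which is why existing files would stay readable.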
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247463#comment-13247463 ]

Phabricator commented on HIVE-2711:
-----------------------------------

ashutoshc has accepted the revision "HIVE-2711 [jira] Make the header of RCFile unique".

+1

REVISION DETAIL
  https://reviews.facebook.net/D2115

BRANCH
  h-2711
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244404#comment-13244404 ]

Phabricator commented on HIVE-2711:
-----------------------------------

omalley has abandoned the revision "HIVE-2711 [jira] Make the header of RCFile unique".

I accidentally got a new revision. This is the same as 2511.

REVISION DETAIL
  https://reviews.facebook.net/D2571
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244981#comment-13244981 ]

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

The patch causes failures in TestCliDriver for the following queries:
* alter_concatenate_indexed_table.q
* alter_merge.q
* alter_merge_stats.q
* create_merge_compressed.q
* ctas.q
* partition_wise_fileformat.q
* partition_wise_fileformat3.q
* sample10.q
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243910#comment-13243910 ]

Phabricator commented on HIVE-2711:
-----------------------------------

omalley has commented on the revision "HIVE-2711 [jira] Make the header of RCFile unique".

Ashutosh,
My point is that RCFile was *always* distinct from Sequence Files. RCFile was a fork of Sequence File when the Sequence File version was 6, therefore nothing before version 6 can possibly be an RCFile.

Headers:
  Sequence Files: SEQ1, SEQ2, SEQ3, SEQ4, SEQ5, SEQ6
  RCFiles: SEQ6, RCF1

Also note that SEQ5 was last written by Hadoop 0.10 back in Feb 2007, a year and a half before Hive was created.

REVISION DETAIL
  https://reviews.facebook.net/D2115
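The header table in the comment above can be read as a small classification rule. The sketch below encodes exactly that table and nothing more. The class, enum, and method names are hypothetical, and "SEQ6" or "RCF1" is assumed to mean the three ASCII magic bytes followed by a single version byte. Note that SEQ6 appears in both rows, so the header alone cannot settle that case; the sketch reports it as ambiguous rather than guessing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the header table from the comment; not code from the patch.
final class HeaderKindSketch {
    enum Kind { SEQUENCE_FILE, RCFILE, SEQUENCE_FILE_OR_RCFILE, UNKNOWN }

    /** Classifies a file by its first four bytes: three magic bytes and a version byte. */
    static Kind classify(byte[] header) {
        if (header == null || header.length < 4) {
            return Kind.UNKNOWN;
        }
        String magic = new String(header, 0, 3, StandardCharsets.US_ASCII);
        int version = header[3];
        if (magic.equals("RCF")) {
            // Only RCF1 is listed in the table.
            return version == 1 ? Kind.RCFILE : Kind.UNKNOWN;
        }
        if (magic.equals("SEQ")) {
            if (version >= 1 && version <= 5) {
                // RCFile forked at version 6, so older versions are SequenceFiles only.
                return Kind.SEQUENCE_FILE;
            }
            if (version == 6) {
                // Listed for both formats; the header cannot tell them apart.
                return Kind.SEQUENCE_FILE_OR_RCFILE;
            }
        }
        return Kind.UNKNOWN;
    }
}
```

Under these assumptions, every file written with the new header classifies unambiguously, which is the point of the change; only files written before it still need some signal beyond the header.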
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243918#comment-13243918 ]

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

I see. Yeah, the very first commit of RCFile
http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java?view=markup&pathrev=770548
started with SEQ6, so there cannot be any data written in RCFile format with version SEQ5 or earlier. Backward compatibility with SEQ6 therefore suffices.

So, +1. Will commit if tests pass.
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243920#comment-13243920 ]

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

Patch fails to apply. Needs to be rebased.
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242930#comment-13242930 ]

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

That makes sense. I will take a look.
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242932#comment-13242932 ]

Phabricator commented on HIVE-2711:
-----------------------------------

ashutoshc has commented on the revision "HIVE-2711 [jira] Make the header of RCFile unique".

It seems that you are backward compatible with SEQ6 but not with anything before that. If the intent is to break backward compatibility, then I think we should send an email to both the dev and user lists about this change, since folks might have historical data in this format that could then no longer be read.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1205 By doing this, you are effectively dropping the ability to read data before version SEQ6. This is a backward-incompatible change.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1224 By removing this, you will not be able to read data before version SEQ2.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1252 This removes the ability to read before version 6.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1256 This removes the ability to read before SEQ1.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1357 This removes the ability to read before SEQ1.

REVISION DETAIL
  https://reviews.facebook.net/D2115
[jira] [Commented] (HIVE-2711) Make the header of RCFile unique
[ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237371#comment-13237371 ]

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

@Owen,
I think the original design of RCFile was done with compatibility with Sequence File in mind. This patch will break that. What's the advantage of this change?