[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-04-12 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253070#comment-13253070
 ] 

Hudson commented on HIVE-2711:
--

Integrated in Hive-trunk-h0.21 #1370 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1370/])
HIVE-2711 : Make the header of RCFile unique (Owen Omalley via Ashutosh 
Chauhan) (Revision 1325442)

 Result = SUCCESS
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325442
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/data
* /hive/trunk/ql/src/test/data/rc-file-v0.rc
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java
* /hive/trunk/ql/src/test/results/clientpositive/alter_concatenate_indexed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge_stats.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_merge_compressed.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat.q.out
* /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample10.q.out


 Make the header of RCFile unique
 

 Key: HIVE-2711
 URL: https://issues.apache.org/jira/browse/HIVE-2711
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.10

 Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, 
 HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch, rc-file-v0.rc


 The RCFile implementation was copied from Hadoop's SequenceFile and copied 
 the 'magic' string in the header. This means that you can't use the header to 
 distinguish between RCFiles and SequenceFiles.
 I'd propose that we create a new header for RCFiles (RCF?) to replace the 
 current SEQ. To maintain compatibility, we'll need to continue to accept the 
 current 'SEQ\06' and just make new files contain the new header.
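
 The compatibility rule proposed above could be sketched roughly as follows. 
 This is only an illustration, not the code from the patch: the class and 
 method names are hypothetical, and the version byte 1 after the new RCF magic 
 is an assumption based on the RCF1 header named in the review discussion.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and not taken from RCFile.java.
final class RcFileMagicSketch {
    // Legacy header shared with SequenceFile: 'S','E','Q' followed by version byte 6.
    static final byte[] LEGACY_MAGIC = {'S', 'E', 'Q', 6};
    // New header: 'R','C','F' followed by a version byte (assumed to be 1 here).
    static final byte[] NEW_MAGIC = {'R', 'C', 'F', 1};

    // New files always get the new header.
    static byte[] headerForNewFile() {
        return NEW_MAGIC.clone();
    }

    // Readers accept both the new header and the legacy 'SEQ\06' header.
    static boolean isAcceptedHeader(byte[] fileStart) {
        if (fileStart == null || fileStart.length < 4) {
            return false;
        }
        byte[] first4 = Arrays.copyOf(fileStart, 4);
        return Arrays.equals(first4, NEW_MAGIC) || Arrays.equals(first4, LEGACY_MAGIC);
    }
}
```

 With this shape, only the write path changes; a reader built this way would 
 still open files that start with the old header.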

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-04-05 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247463#comment-13247463
 ] 

Phabricator commented on HIVE-2711:
---

ashutoshc has accepted the revision HIVE-2711 [jira] Make the header of RCFile 
unique.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D2115

BRANCH
  h-2711






[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-04-02 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244404#comment-13244404
 ] 

Phabricator commented on HIVE-2711:
---

omalley has abandoned the revision HIVE-2711 [jira] Make the header of RCFile 
unique.

  I accidentally got a new revision. This is the same as D2115.

REVISION DETAIL
  https://reviews.facebook.net/D2571






[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-04-02 Thread Ashutosh Chauhan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244981#comment-13244981
 ] 

Ashutosh Chauhan commented on HIVE-2711:


The patch causes failures in TestCliDriver for the following queries:

* alter_concatenate_indexed_table.q
* alter_merge.q
* alter_merge_stats.q
* create_merge_compressed.q
* ctas.q
* partition_wise_fileformat.q
* partition_wise_fileformat3.q
* sample10.q





[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-04-01 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243910#comment-13243910
 ] 

Phabricator commented on HIVE-2711:
---

omalley has commented on the revision HIVE-2711 [jira] Make the header of 
RCFile unique.

  Ashutosh,

  My point is that RCFile was *always* distinct from Sequence Files. RCFile was 
a fork of Sequence File when the Sequence File version was 6, therefore nothing 
before version 6 can possibly be an RCFile.

  Headers:
Sequence Files: SEQ1, SEQ2, SEQ3, SEQ4, SEQ5, SEQ6
RCFiles: SEQ6, RCF1

  Also note that SEQ5 was last written by Hadoop 0.10 back in Feb 2007, a year 
and a half before Hive was created.
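
  The header table above could be expressed as a small classifier, sketched 
below. This is a hypothetical illustration and not code from the revision: the 
class, enum, and method names are made up, and the fourth byte is assumed to 
hold the numeric version.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the header table above; not code from the patch.
final class HeaderKindSketch {
    enum Kind { SEQUENCE_FILE, RCFILE, SEQUENCE_FILE_OR_RCFILE, UNKNOWN }

    // Classifies a file by its 3-byte magic plus 1-byte version.
    static Kind classify(byte[] header) {
        if (header == null || header.length < 4) {
            return Kind.UNKNOWN;
        }
        String magic = new String(header, 0, 3, StandardCharsets.US_ASCII);
        int version = header[3];
        if (magic.equals("RCF")) {
            // RCF1 is only ever written by RCFile.
            return version == 1 ? Kind.RCFILE : Kind.UNKNOWN;
        }
        if (magic.equals("SEQ")) {
            // SEQ1..SEQ5 predate the fork, so they can only be Sequence Files.
            if (version >= 1 && version <= 5) {
                return Kind.SEQUENCE_FILE;
            }
            // SEQ6 was written by both formats; the header alone cannot tell them apart.
            if (version == 6) {
                return Kind.SEQUENCE_FILE_OR_RCFILE;
            }
        }
        return Kind.UNKNOWN;
    }
}
```

  Only SEQ6 is ambiguous, which is why an RCFile reader needs to keep 
accepting SEQ6 but nothing earlier.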

REVISION DETAIL
  https://reviews.facebook.net/D2115






[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-04-01 Thread Ashutosh Chauhan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243918#comment-13243918
 ] 

Ashutosh Chauhan commented on HIVE-2711:


I see. Yeah, the very first commit of RCFile 
http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java?view=markup&pathrev=770548
 started with SEQ6, so there cannot be any data written in RCFile format 
with version SEQ5 or earlier. Backward compatibility with SEQ6 therefore 
suffices. So, +1; I will commit if the tests pass.





[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-04-01 Thread Ashutosh Chauhan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243920#comment-13243920
 ] 

Ashutosh Chauhan commented on HIVE-2711:


Patch fails to apply. Needs to be rebased.





[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-03-30 Thread Ashutosh Chauhan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242930#comment-13242930
 ] 

Ashutosh Chauhan commented on HIVE-2711:


That makes sense. I will take a look.





[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-03-30 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242932#comment-13242932
 ] 

Phabricator commented on HIVE-2711:
---

ashutoshc has commented on the revision HIVE-2711 [jira] Make the header of 
RCFile unique.

  It seems like you are backward compatible with SEQ6 but not with anything 
before that. If the intent is to break backward compatibility, then I think we 
should send an email to both the dev and user lists about this change, since 
folks might have historical data in this format that could no longer be read.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1205 By doing this, you 
are effectively dropping the ability to read data written before version SEQ6. 
This is a backward-incompatible change.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1224 By removing this, 
you will not be able to read data written before version SEQ2.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1252 This removes the 
ability to read data written before version 6.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1256 This removes the 
ability to read data written before SEQ1.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1357 This removes the 
ability to read data written before SEQ1.

REVISION DETAIL
  https://reviews.facebook.net/D2115






[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

2012-03-23 Thread Ashutosh Chauhan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237371#comment-13237371
 ] 

Ashutosh Chauhan commented on HIVE-2711:


@Owen,
 I think the original design of RCFile was done with Sequence File 
compatibility in mind. This patch will break that. What's the advantage of 
this change?
