[jira] [Commented] (HIVE-1950) Block merge for RCFile

2014-10-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174764#comment-14174764
 ] 

Lefty Leverenz commented on HIVE-1950:
--

Doc note:  [~prasanth_j] documented this in the wiki here:

* [DDL -- Alter Table/Partition Concatenate | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate]

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.8.0

 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch, 
 HIVE-1950.4.patch, HIVE-1950.5.patch, HIVE-1950.6.patch


 In our environment, there are a lot of small files inside one partition/table.
 To reduce the namenode load, we run a dedicated housekeeping job that merges
 these files. Right now the merge is an 'insert overwrite' in Hive, which
 requires decompressing the data and compressing it again. This jira adds a
 command to Hive that does the merge without decompressing and recompressing
 the data, something like alter table tbl_name [partition ()] merge files. In
 this jira the new command will only support RCFile, since it needs some new
 APIs in the file format.
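
 To illustrate the idea in the description, here is a minimal Java sketch of a
 block-level merge. It is a toy model under stated assumptions, not the RCFile
 format and not the code in the attached patches: a "file" is assumed to be
 just a list of independently compressed blocks, and BlockMergeSketch,
 compressBlock, decompressBlock, mergeFiles and readAll are hypothetical names.
 What it shows is that the merge copies the compressed bytes as they are, so
 nothing is inflated or deflated while merging.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only: a "file" is a list of independently compressed blocks.
// This is NOT the RCFile layout and NOT Hive's merge implementation.
class BlockMergeSketch {

    // Compress one block of rows on its own, so it can later be copied as opaque bytes.
    static byte[] compressBlock(String rows) {
        Deflater deflater = new Deflater();
        deflater.setInput(rows.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a single block; only readers do this, the merge never does.
    static String decompressBlock(byte[] block) {
        Inflater inflater = new Inflater();
        inflater.setInput(block);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Merge small files by appending their blocks unchanged: no inflate, no deflate.
    static List<byte[]> mergeFiles(List<List<byte[]>> smallFiles) {
        List<byte[]> merged = new ArrayList<>();
        for (List<byte[]> file : smallFiles) {
            merged.addAll(file);
        }
        return merged;
    }

    // Read a file back by decompressing its blocks one after another.
    static String readAll(List<byte[]> file) {
        StringBuilder sb = new StringBuilder();
        for (byte[] block : file) {
            sb.append(decompressBlock(block));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<byte[]> fileA = Collections.singletonList(compressBlock("a1\na2\n"));
        List<byte[]> fileB = Arrays.asList(compressBlock("b1\n"), compressBlock("b2\n"));
        List<byte[]> merged = mergeFiles(Arrays.asList(fileA, fileB));
        System.out.println(merged.size() + " blocks in the merged file");
        System.out.print(readAll(merged));
    }
}
```

 A real RCFile also carries its own header and per-block metadata, which is
 presumably why the description says new APIs are needed in the file format;
 the sketch leaves all of that out.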



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-23 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998583#comment-12998583
 ] 

He Yongqiang commented on HIVE-1950:


It's a typo, and I fixed it in the new patch. 

HIVE_STATS_ATOMIC is an existing conf for stats.

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch, 
 HIVE-1950.4.patch, HIVE-1950.5.patch, HIVE-1950.6.patch



-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-14 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994610#comment-12994610
 ] 

Ning Zhang commented on HIVE-1950:
--

Yongqiang, I'm still reviewing the new patch (.4) but found that some of my 
comments have not been addressed (e.g., QTestUtil). Can you list which comments 
have been addressed and which have not (and the reasons)?

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch, 
 HIVE-1950.4.patch






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-11 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993806#comment-12993806
 ] 

Ning Zhang commented on HIVE-1950:
--

Yongqiang, does the review board have the latest patch?

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-11 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993852#comment-12993852
 ] 

Ning Zhang commented on HIVE-1950:
--

Yongqiang, the patch doesn't compile. Below are some initial review comments 
from me:

QTestUtil.java: 
 334: you may want to add the index tables that you want to keep to 
srcTables. Otherwise indexes that are created inside a test will not be 
cleaned up, which is a side effect. 

StatsTask:
 A StatsTask is added in DDLSemanticAnalyzer for the merge task, but why is it 
set to do nothing? 

ExecDriver:
 jobExecHelper is constructed in both the constructors and initialize(). Is 
there a reason?

 checkFatalError: why was some code removed?

 Why remove METASTOREPWD?

DDLTask:
 Move semantic checking (index and archive checking etc.) to 
DDLSemanticAnalyzer. Execution should only raise an exception for runtime 
errors. In other words, the explain plan of the query should throw an 
exception if there are indexes or the table is archived. 
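
The DDLTask point can be sketched as follows. This is a minimal illustration 
with hypothetical stand-ins (ConcatenateCheckSketch, TableInfo, analyze, 
execute, run); none of these are Hive's actual DDLSemanticAnalyzer or DDLTask 
classes. It assumes EXPLAIN goes through the same analysis step as normal 
execution, so a check placed there rejects the statement before anything runs.

```java
// Hypothetical stand-ins for illustration; these are not Hive's actual
// DDLSemanticAnalyzer, DDLTask or table metadata classes.
class ConcatenateCheckSketch {

    static final class TableInfo {
        final String name;
        final String fileFormat;
        final boolean hasIndexes;
        final boolean archived;

        TableInfo(String name, String fileFormat, boolean hasIndexes, boolean archived) {
            this.name = name;
            this.fileFormat = fileFormat;
            this.hasIndexes = hasIndexes;
            this.archived = archived;
        }
    }

    // Compile-time step: every statement passes through here, including EXPLAIN.
    static void analyze(TableInfo t) {
        if (!"RCFILE".equalsIgnoreCase(t.fileFormat)) {
            throw new IllegalArgumentException("merge supports only RCFile: " + t.name);
        }
        if (t.hasIndexes) {
            throw new IllegalArgumentException("table has indexes: " + t.name);
        }
        if (t.archived) {
            throw new IllegalArgumentException("table is archived: " + t.name);
        }
    }

    // Run-time step: does the work only; the static checks have already passed.
    static String execute(TableInfo t) {
        return "merged files of " + t.name;
    }

    // Analysis always runs first, so EXPLAIN fails on the same inputs as execution.
    static String run(TableInfo t, boolean explainOnly) {
        analyze(t);
        return explainOnly ? "plan: merge files of " + t.name : execute(t);
    }

    public static void main(String[] args) {
        TableInfo ok = new TableInfo("t_rc", "RCFILE", false, false);
        System.out.println(run(ok, true));
        System.out.println(run(ok, false));

        TableInfo archived = new TableInfo("t_arch", "RCFILE", false, true);
        try {
            run(archived, true);
        } catch (IllegalArgumentException e) {
            System.out.println("EXPLAIN rejected: " + e.getMessage());
        }
    }
}
```

With the checks in analyze(), execute() is left to report genuine runtime 
failures only.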

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-09 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992674#comment-12992674
 ] 

Namit Jain commented on HIVE-1950:
--

1. Can you change merge_files to concatenate?
   alter table T concatenate;

2. Move RCFile check to SemanticAnalyzer from runtime.

3. More comments: DDLTask.java/mergeFiles
   RCFile: all the new functions etc.


 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-08 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992112#comment-12992112
 ] 

He Yongqiang commented on HIVE-1950:


Review comments from the internal review:
1) if stats are present, try to correct them
2) jobClose of RCFileMergeMapper should share the code in FileSinkOperator
3) move the original data to a dump location first
4) remove getRecordWriter() and RCFileBlockMergeOutputFormat
5) ioCxt for input file changed
6) disable merge for archived table/partition and bucketized table/partition
7) comments
8) negative tests for hiveinputformat
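
One possible reading of item 3, sketched in Java on the local filesystem: park 
the original directory in a dump location before promoting the merged output, 
so the original can be restored if the second step fails. This is an assumption 
about the intent, not the patch's code; DumpLocationSwapSketch, swapIn and demo 
are hypothetical names, and it uses java.nio.file rather than whatever 
filesystem API the patch uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch on the local filesystem; not the code from the patch.
class DumpLocationSwapSketch {

    // Replace 'target' with 'merged', parking the original in 'dump' first.
    static void swapIn(Path target, Path merged, Path dump) {
        try {
            Files.move(target, dump);            // 1. park the original data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            Files.move(merged, target);          // 2. promote the merged output
        } catch (IOException e) {
            try {
                Files.move(dump, target);        // 3. roll back to the original
            } catch (IOException restoreFailure) {
                e.addSuppressed(restoreFailure);
            }
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway directory layout, runs the swap, and reports the result.
    static String demo() {
        try {
            Path root = Files.createTempDirectory("swap-sketch");
            Path target = Files.createDirectory(root.resolve("partition"));
            Path merged = Files.createDirectory(root.resolve("merged"));
            Path dump = root.resolve("dump");
            Files.write(target.resolve("small-0"), "a".getBytes(StandardCharsets.UTF_8));
            Files.write(merged.resolve("merged-0"), "ab".getBytes(StandardCharsets.UTF_8));

            swapIn(target, merged, dump);

            boolean mergedInPlace = Files.exists(target.resolve("merged-0"));
            boolean originalKept = Files.exists(dump.resolve("small-0"));
            return mergedInPlace + "," + originalKept;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The dump copy is what makes the swap recoverable: until it is deleted, the 
pre-merge files still exist.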



 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-08 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992227#comment-12992227
 ] 

Ning Zhang commented on HIVE-1950:
--

As discussed offline, this patch should be able to handle the stats update 
(by creating a StatsTask as a child task). 

Also, please keep in mind that the design and implementation of the new 
MergeTask should make it easy to reuse in the merge step of INSERT OVERWRITE. 

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990389#comment-12990389
 ] 

He Yongqiang commented on HIVE-1950:


review board:
https://reviews.apache.org/r/388/


 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch

