[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1307:
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed

Finally -- Committed. Thanks Ning

> More generic and efficient merge method
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
> Issue Type: New Feature
> Affects Versions: 0.6.0
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
> Currently, if hive.merge.mapfiles or hive.merge.mapredfiles is true, a new MapReduce job is created that reads the input files and outputs them to a single reducer for merging. This MR job is created at compile time, one MR job per partition. With dynamic partitions, multiple partitions can be created at execution time, so generating the merge MR job at compile time is impossible.
> We should generalize the merge framework to allow multiple partitions. Most of the time a map-only job should be sufficient if we use CombineHiveInputFormat.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
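As a rough illustration of why a map-only merge can replace the per-partition reduce job, the sketch below packs each partition's small output files into combined splits, one mapper per split. It is a hypothetical greedy packer written for this note, not the grouping logic of CombineHiveInputFormat; `FileEntry`, `groupFiles`, and the `targetBytes` threshold are assumed names, and records require Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeGrouping {
    // Hypothetical description of one small output file written by the query.
    public record FileEntry(String partition, String path, long bytes) {}

    /**
     * Groups files into merge splits. Files never cross partition boundaries,
     * and a split is closed once adding the next file would exceed targetBytes.
     * Each split would be read by one mapper, so no reducer is needed.
     */
    public static List<List<FileEntry>> groupFiles(List<FileEntry> files, long targetBytes) {
        // Bucket files by partition, preserving the order partitions were seen.
        Map<String, List<FileEntry>> byPartition = new LinkedHashMap<>();
        for (FileEntry f : files) {
            byPartition.computeIfAbsent(f.partition(), k -> new ArrayList<>()).add(f);
        }
        List<List<FileEntry>> splits = new ArrayList<>();
        for (List<FileEntry> partFiles : byPartition.values()) {
            List<FileEntry> current = new ArrayList<>();
            long currentBytes = 0;
            for (FileEntry f : partFiles) {
                if (!current.isEmpty() && currentBytes + f.bytes() > targetBytes) {
                    splits.add(current);
                    current = new ArrayList<>();
                    currentBytes = 0;
                }
                current.add(f);
                currentBytes += f.bytes();
            }
            if (!current.isEmpty()) {
                splits.add(current);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
            new FileEntry("ds=1", "f1", 40),
            new FileEntry("ds=1", "f2", 40),
            new FileEntry("ds=1", "f3", 40),
            new FileEntry("ds=2", "g1", 10));
        for (List<FileEntry> split : groupFiles(files, 100)) {
            System.out.println(split);
        }
    }
}
```

Because the grouping runs over whatever partitions exist when the job executes, the same code path would also cover dynamically created partitions; keeping files from different partitions in separate splits keeps each merged output inside a single partition directory.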
[jira] Updated: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1578:
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed

Committed. Thanks Paul.

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Paul Yang
> Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
> If a job fails, Hive currently displays a link to the task with the most failures for easy access to the error logs. However, generating the link may require many RPCs to get all the task completion events, adding a delay of up to 30 minutes. This patch adds a configuration variable to control whether the link is generated. Turning off this feature would also disable automatic debugging tips generated by heuristics reading from the error logs.
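A minimal sketch of gating the slow lookup on the new property. Only the property name comes from the issue; the `java.util.Properties` stand-in for HiveConf, the default of `true`, and the `onJobFailure` hook are assumptions for illustration.

```java
import java.util.Properties;

public class FailureDebugInfoGate {
    // Property name taken from HIVE-1578; the default value below is an assumption.
    public static final String SHOW_DEBUG_INFO = "hive.exec.show.job.failure.debug.info";

    /** Returns true when the (possibly slow) failure debug-info lookup should run. */
    public static boolean shouldShowDebugInfo(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(SHOW_DEBUG_INFO, "true"));
    }

    /** Hypothetical hook invoked after a job fails. */
    public static void onJobFailure(Properties conf, Runnable showJobFailDebugInfo) {
        if (shouldShowDebugInfo(conf)) {
            // May issue many RPCs to fetch all task completion events.
            showJobFailDebugInfo.run();
        }
        // When disabled, neither the task link nor the heuristic debugging tips are produced.
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SHOW_DEBUG_INFO, "false");
        onJobFailure(conf, () -> System.out.println("fetching task completion events..."));
        System.out.println("debug info enabled: " + shouldShowDebugInfo(conf));
    }
}
```

From the Hive CLI the property would presumably be turned off for a session with `set hive.exec.show.job.failure.debug.info=false;`.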
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
Attachment: HIVE-1307.9.patch

Sigh, hopefully this is the last patch. I'm fixing some conflicts in bucketmapjoin[1-3].q.out in 0.17 and will run the 0.17 tests again.
[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900948#action_12900948 ]

Carl Steinbach commented on HIVE-1578:

It would probably also be a good idea to add some INFO-level logging statements to the loop in showJobDebugFailInfo() so that the user can roughly gauge the rate of progress.
[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900946#action_12900946 ]

Carl Steinbach commented on HIVE-1578:

Hi Paul, before calling showJobDebugFailInfo() can you please print out a message telling the user that you're going to do this, that it may take a long time to complete, and that this feature can be disabled by setting the conf property {{hive.exec.show.job.failure.debug.info}}?
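Carl's two suggestions, a warning before the lookup starts and INFO-level progress logging inside its loop, might look roughly like this. The batching loop, the `fetchBatch` callback standing in for the RPCs that return task completion events, the message wording, and the use of `java.util.logging` instead of Hive's own logger are all assumptions; this is not the actual body of showJobDebugFailInfo().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.logging.Logger;

public class DebugInfoProgress {
    private static final Logger LOG = Logger.getLogger(DebugInfoProgress.class.getName());

    /**
     * Hypothetical stand-in for the event-fetching loop: calls
     * fetchBatch(fromIndex, batchSize) until it returns an empty batch,
     * logging progress at INFO level after each batch.
     */
    public static List<String> collectEvents(BiFunction<Integer, Integer, List<String>> fetchBatch,
                                             int batchSize) {
        // Warn the user up front, and tell them how to skip this step.
        System.err.println("Looking up the task with the most failures. This may take a long time;"
            + " set hive.exec.show.job.failure.debug.info=false to skip it.");
        List<String> events = new ArrayList<>();
        int rounds = 0;
        while (true) {
            List<String> batch = fetchBatch.apply(events.size(), batchSize);
            if (batch.isEmpty()) {
                break;
            }
            events.addAll(batch);
            rounds++;
            LOG.info("Fetched " + events.size() + " task completion events after " + rounds + " RPC(s)");
        }
        return events;
    }

    public static void main(String[] args) {
        int totalEvents = 25; // simulated job size
        List<String> all = collectEvents((from, size) -> {
            List<String> out = new ArrayList<>();
            for (int i = from; i < Math.min(from + size, totalEvents); i++) {
                out.add("event-" + i);
            }
            return out;
        }, 10);
        System.out.println("total events: " + all.size());
    }
}
```

Logging the running total after each RPC gives the user a rate they can extrapolate from, which is the point of the review comment.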
[jira] Updated: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1578:
Summary: Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures (was: Add conf. variable for displaying link to the task with most failures)
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900941#action_12900941 ]

Ning Zhang commented on HIVE-741:

The SMB test case still has a minor issue: the tables were created with 2 buckets, but there is only 1 file in each table. This conflicts with the table schema: if a table is defined with 2 buckets, there should be 2 files in the partition or table. The SMB join joins the 1st file in T1 with the 1st file in T2, and the 2nd file in T1 with the 2nd file in T2, so the test case should cover this use case.

> NULL is not handled correctly in join
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Ning Zhang
> Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, patch-741.txt, smbjoin_nulls.q.txt
>
> With the following data in table input4_cb:
> Key    Value
> ----   -----
> NULL   325
> 18     NULL
>
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL   325   18   NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.
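The expected semantics can be sketched as a sort-merge join that never matches NULL keys: in SQL, `NULL = x` is not true for any x, including NULL, so the row (NULL, 325) must not pair with (18, NULL) on `a.key = b.value`. Following the SMB comment, bucket n of one table is joined only with bucket n of the other. This is a hypothetical illustration, not Hive's SMB join operator; `Row`, `joinBucket`, and `smbJoin` are made-up names, each bucket is assumed to be sorted ascending on its non-NULL join keys, and records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeSmbJoin {
    // key is the join key (may be NULL); payload is the rest of the row.
    public record Row(Integer key, String payload) {}

    /** Sort-merge join of two buckets sorted ascending on key; NULL keys never match. */
    public static List<Row[]> joinBucket(List<Row> left, List<Row> right) {
        List<Row[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Integer lk = left.get(i).key();
            Integer rk = right.get(j).key();
            // NULL = x is never true in SQL, not even NULL = NULL, so skip NULL keys.
            if (lk == null) { i++; continue; }
            if (rk == null) { j++; continue; }
            int cmp = lk.compareTo(rk);
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Emit the cross product of the equal-key runs on both sides.
                int jEnd = j;
                while (jEnd < right.size() && lk.equals(right.get(jEnd).key())) jEnd++;
                while (i < left.size() && lk.equals(left.get(i).key())) {
                    for (int m = j; m < jEnd; m++) {
                        out.add(new Row[] {left.get(i), right.get(m)});
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }

    /** Bucket n of the left table joins only bucket n of the right table. */
    public static List<Row[]> smbJoin(List<List<Row>> leftBuckets, List<List<Row>> rightBuckets) {
        if (leftBuckets.size() != rightBuckets.size()) {
            throw new IllegalArgumentException("bucket counts differ");
        }
        List<Row[]> out = new ArrayList<>();
        for (int b = 0; b < leftBuckets.size(); b++) {
            out.addAll(joinBucket(leftBuckets.get(b), rightBuckets.get(b)));
        }
        return out;
    }

    public static void main(String[] args) {
        // input4_cb rows: (key=NULL, value=325) and (key=18, value=NULL).
        // Left side is keyed on a.key, right side on b.value.
        List<Row> left = List.of(new Row(null, "NULL 325"), new Row(18, "18 NULL"));
        List<Row> right = List.of(new Row(null, "18 NULL"), new Row(325, "NULL 325"));
        List<Row[]> result = smbJoin(List.of(left), List.of(right));
        System.out.println("rows: " + result.size());
    }
}
```

In `main` the right-hand rows are keyed on `value` to model `a.key = b.value`; with NULL keys skipped the join produces no rows, matching the expected empty result.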
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900940#action_12900940 ]

Ning Zhang commented on HIVE-1510:

It doesn't fail on trunk; the failure was caused by the parallel test run. HIVE-1576 was filed for this. Will test again and commit once HIVE-1307 is committed.

> HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int, value string) partitioned by (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", hr="00") select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", hr="001") select * from src;
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00");
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001");
> select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key;
> drop table combine_3_srcpart_seq_rc;
>
> This script will fail.
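The reproduction relies on `hr=00` being a string prefix of `hr=001`: a lookup that returns the first partition directory the file path starts with can hand back the sequencefile partitionDesc for a file that actually lives in the rcfile partition. A minimal sketch of the difference, assuming scheme-less POSIX paths, a plain map from partition directory to file format, and `java.nio.file.Path` in place of Hadoop's `Path`; `byPrefix` and `byParentDirs` are hypothetical helpers, not Hive methods.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionLookup {
    /** Buggy variant: first partition directory that is a string prefix of the file path. */
    public static String byPrefix(Map<String, String> partDescs, String file) {
        for (Map.Entry<String, String> e : partDescs.entrySet()) {
            if (file.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    /** Safer variant: exact match on the file's ancestor directories, nearest first. */
    public static String byParentDirs(Map<String, String> partDescs, String file) {
        for (Path dir = Path.of(file).getParent(); dir != null; dir = dir.getParent()) {
            String desc = partDescs.get(dir.toString());
            if (desc != null) {
                return desc;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Insertion order matters for the buggy lookup: hr=00 is checked first.
        Map<String, String> partDescs = new LinkedHashMap<>();
        partDescs.put("/warehouse/t/ds=2010-08-03/hr=00", "sequencefile");
        partDescs.put("/warehouse/t/ds=2010-08-03/hr=001", "rcfile");
        String file = "/warehouse/t/ds=2010-08-03/hr=001/000000_0";
        System.out.println("prefix match: " + byPrefix(partDescs, file));
        System.out.println("parent match: " + byParentDirs(partDescs, file));
    }
}
```

Walking up the parent directories and requiring an exact key match avoids the ambiguity, because a directory named `hr=001` can never equal one named `hr=00`.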
[DISCUSS] Hive as TLP
The Hive subproject has voted to become a TLP: http://bit.ly/9nb4nN

Does the Hadoop community have any questions or concerns about this? I will call a more formal vote after this discussion.

The Hive dev community is still dominated by Facebook, but the community is working hard to diversify its base and hopes to add committers from Yahoo and Cloudera. We anticipate a more diversified base by the end of the year, subject to contributions from developers at these companies, and there are a fair number of those in the pipeline.

Thanks,
Ashish
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
Attachment: HIVE-1307.8.patch

Uploading HIVE-1307.8.patch, which cleans up TestParse in 0.17.
[jira] Commented: (HIVE-1578) Add conf. variable for displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900927#action_12900927 ]

Namit Jain commented on HIVE-1578:

+1
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900926#action_12900926 ]

Ning Zhang commented on HIVE-1307:

OK. I thought only these 3 .q files were failing on 0.17. I'm rerunning TestParse.
[jira] Updated: (HIVE-1578) Add conf. variable for displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1578:
Description: If a job fails, Hive currently displays a link to the task with the most failures for easy access to the error logs. However, generating the link may require many RPCs to get all the task completion events, adding a delay of up to 30 minutes. This patch adds a configuration variable to control whether the link is generated. Turning off this feature would also disable automatic debugging tips generated by heuristics reading from the error logs. (was: If a job fails, Hive currently displays a link to the task with the most failures, for easy access to the error logs. However, generating the link may require many RPC calls to get all the task completion events, adding a delay of up to 30 minutes. This patch adds a configuration variable to control whether the link is generated. Turning off this feature would also disable automatic debugging tips generated by heuristics reading from the error logs.)
[jira] Updated: (HIVE-1578) Add conf. variable for displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1578:
Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1578) Add conf. variable for displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1578:
Attachment: HIVE-1578.1.patch
[jira] Created: (HIVE-1578) Add conf. variable for displaying link to the task with most failures
Add conf. variable for displaying link to the task with most failures

Key: HIVE-1578
URL: https://issues.apache.org/jira/browse/HIVE-1578
Project: Hadoop Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Paul Yang
Fix For: 0.7.0

If a job fails, Hive currently displays a link to the task with the most failures, for easy access to the error logs. However, generating the link may require many RPC calls to get all the task completion events, adding a delay of up to 30 minutes. This patch adds a configuration variable to control whether the link is generated. Turning off this feature would also disable automatic debugging tips generated by heuristics reading from the error logs.
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900906#action_12900906 ]

Namit Jain commented on HIVE-1307:

ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running
[jira] Issue Comment Edited: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900906#action_12900906 ]

Namit Jain edited comment on HIVE-1307 at 8/20/10 7:04 PM:

ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running OK for you ?

was (Author: namit):
ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running
[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path
[ https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900904#action_12900904 ]

He Yongqiang commented on HIVE-1510:

Even without this patch, the 0.17 test failed on index_compat3.q. Please file a separate JIRA for this issue.
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:
Attachment: HIVE-1307.7.patch

Uploading HIVE-1307.7.patch. The only differences from the last one are the log changes in input[1-3].q.xml in 0.17 and input[2-3].q.xml in 0.20.
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900884#action_12900884 ]

Ning Zhang commented on HIVE-1307:

Will regenerate the patch.
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1307:
Status: Open (was: Patch Available)
Re: [DISCUSSION] Move to become a TLP
I'm not qualified to vote on this, but as a fan and user I'm curious to hear what, if any, disadvantages there are of becoming a TLP. On Fri, 20 Aug 2010 13:20:39 -0700, Edward Capriolo wrote: I am +1 as well. On Fri, Aug 20, 2010 at 1:29 PM, Ashish Thusoo wrote: Thanks everyone who voted. Looks like this is unanimous at this point. I will start the proceedings in the Hadoop PMC to make Hive a TLP. Ashish -Original Message- From: Paul Yang [mailto:py...@facebook.com] Sent: Thursday, August 19, 2010 4:05 PM To: hive-dev@hadoop.apache.org Subject: RE: [DISCUSSION] Move to become a TLP +1 -Original Message- From: Joydeep Sen Sarma [mailto:jssa...@facebook.com] Sent: Thursday, August 19, 2010 3:30 PM To: hive-dev@hadoop.apache.org Subject: RE: [DISCUSSION] Move to become a TLP +1 -Original Message- From: Carl Steinbach [mailto:c...@cloudera.com] Sent: Thursday, August 19, 2010 3:18 PM To: hive-dev@hadoop.apache.org Subject: Re: [DISCUSSION] Move to become a TLP +1 On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang wrote: +1 as well. On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote: > +1. > > Zheng > > On Mon, Aug 16, 2010 at 11:58 AM, John Sichi wrote: >> +1 from me. The momentum on cross-company collaboration we're >> +seeing now, plus big integration contributions such as the new storage handlers (HyperTable and Cassandra), are all signs that Hive is growing up fast. >> >> HBase recently took the same route, so I'm going to have a chat >> with Jonathan Gray to find out what that involved for them. >> >> JVS >> >> On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote: >> >>> Yes, I think Hive is ready to become a TLP. >>> >>> On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo >>> wrote: >>> Nice one Ed... Folks, Please chime in. I think we should close this out next week one way or the other. We can consider this a vote at this point, so please vote on this issue. 
Thanks, Ashish -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Thursday, August 12, 2010 8:05 AM To: hive-dev@hadoop.apache.org Subject: Re: [DISCUSSION] Move to become a TLP On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo wrote: > Folks, > > This question has come up in the PMC once again and would be > great to hear once more on this topic. What do people think? Are we ready to become a TLP? > > Thanks, > Ashish I thought of one more benefit. We can rename our packages from org.apache.hadoop.hive.* to org.apache.hive.* :) >> >> > > > > -- > Yours, > Zheng > http://www.linkedin.com/in/zshao
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900866#action_12900866 ] Namit Jain commented on HIVE-1307: -- TestParse is failing on both 0.17 and 0.20. On 0.17, the following tests are failing: bucketmapjoin1.q, bucketmapjoin2.q, bucketmapjoin3.q. All of them are log file updates; can you fix the log files and submit a new patch?
[jira] Created: (HIVE-1577) Add configuration property hive.exec.local.scratchdir
Add configuration property hive.exec.local.scratchdir - Key: HIVE-1577 URL: https://issues.apache.org/jira/browse/HIVE-1577 Project: Hadoop Hive Issue Type: New Feature Components: Configuration Reporter: Carl Steinbach When Hive is run in local mode it uses the hardcoded local directory {{/${java.io.tmpdir}/${user.name}}} for temporary files. This path should be configurable via the property {{hive.exec.local.scratchdir}}.
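A minimal sketch of the lookup the issue asks for, assuming the configuration is available as a plain key/value mapping: use `hive.exec.local.scratchdir` when it is set, and otherwise fall back to a directory built from `java.io.tmpdir` and `user.name`, as the current hardcoded default does. The Python function, its arguments, and the sample user name are hypothetical stand-ins for Hive's Java configuration code, and POSIX-style paths are assumed.

```python
import posixpath


def local_scratchdir(conf, java_tmpdir, user_name):
    """Pick the local-mode scratch directory (hypothetical sketch).

    Prefers the proposed hive.exec.local.scratchdir property and otherwise
    falls back to a directory derived from java.io.tmpdir and user.name,
    mirroring the currently hardcoded default.
    """
    configured = conf.get("hive.exec.local.scratchdir")
    if configured:
        return configured
    return posixpath.join(java_tmpdir, user_name)


# Property unset: fall back to the tmpdir/user default.
print(local_scratchdir({}, "/tmp", "hiveuser"))  # /tmp/hiveuser

# Property set: it wins over the default.
print(local_scratchdir({"hive.exec.local.scratchdir": "/data/hive-scratch"}, "/tmp", "hiveuser"))  # /data/hive-scratch
```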
Re: [DISCUSSION] Move to become a TLP
I am +1 as well. On Fri, Aug 20, 2010 at 1:29 PM, Ashish Thusoo wrote: > Thanks everyone who voted. Looks like this is unanimous at this point. I will > start the proceedings in the Hadoop PMC to make Hive a TLP. > > Ashish > > -Original Message- > From: Paul Yang [mailto:py...@facebook.com] > Sent: Thursday, August 19, 2010 4:05 PM > To: hive-dev@hadoop.apache.org > Subject: RE: [DISCUSSION] Move to become a TLP > > +1 > > -Original Message- > From: Joydeep Sen Sarma [mailto:jssa...@facebook.com] > Sent: Thursday, August 19, 2010 3:30 PM > To: hive-dev@hadoop.apache.org > Subject: RE: [DISCUSSION] Move to become a TLP > > +1 > > -Original Message- > From: Carl Steinbach [mailto:c...@cloudera.com] > Sent: Thursday, August 19, 2010 3:18 PM > To: hive-dev@hadoop.apache.org > Subject: Re: [DISCUSSION] Move to become a TLP > > +1 > > On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang wrote: > >> +1 as well. >> >> On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote: >> >> > +1. >> > >> > Zheng >> > >> > On Mon, Aug 16, 2010 at 11:58 AM, John Sichi >> wrote: >> >> +1 from me. The momentum on cross-company collaboration we're >> >> +seeing >> now, plus big integration contributions such as the new storage >> handlers (HyperTable and Cassandra), are all signs that Hive is growing up >> fast. >> >> >> >> HBase recently took the same route, so I'm going to have a chat >> >> with >> Jonathan Gray to find out what that involved for them. >> >> >> >> JVS >> >> >> >> On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote: >> >> >> >>> Yes, I think Hive is ready to become a TLP. >> >>> >> >>> On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo >> >>> >> wrote: >> >>> >> Nice one Ed... >> >> Folks, >> >> Please chime in. I think we should close this out next week one >> way or >> the >> other. We can consider this a vote at this point, so please vote >> on >> this >> issue. 
>> >> Thanks, >> Ashish >> >> -Original Message- >> From: Edward Capriolo [mailto:edlinuxg...@gmail.com] >> Sent: Thursday, August 12, 2010 8:05 AM >> To: hive-dev@hadoop.apache.org >> Subject: Re: [DISCUSSION] Move to become a TLP >> >> On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo >> >> wrote: >> > Folks, >> > >> > This question has come up in the PMC once again and would be >> > great to >> hear once more on this topic. What do people think? Are we ready >> to >> become a >> TLP? >> > >> > Thanks, >> > Ashish >> >> I thought of one more benefit. We can rename our packages from >> >> org.apache.hadoop.hive.* >> to >> org.apache.hive.* >> >> :) >> >> >> >> >> >> > >> > >> > >> > -- >> > Yours, >> > Zheng >> > http://www.linkedin.com/in/zshao >> >> >
[jira] Created: (HIVE-1576) index_compact*.q should not share common result file
index_compact*.q should not share common result file Key: HIVE-1576 URL: https://issues.apache.org/jira/browse/HIVE-1576 Project: Hadoop Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: He Yongqiang Some of the index_compact*.q tests write their index output to the same file name (e.g., /tmp/index_test_index_result). This causes intermittent failures when the tests run in parallel. Ideally they should write to the local warehouse directory, where parallel tests won't conflict.
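To make the conflict concrete, here is a small sketch that computes where each test would write and reports paths claimed by more than one test. With no warehouse directory, every test falls back to the shared `/tmp/index_test_index_result` path quoted in the issue; with one, each test gets its own subdirectory. The helper names, the warehouse path, and the `index_compact_1.q` test name are illustrative assumptions, not Hive's test harness.

```python
import posixpath

SHARED_PATH = "/tmp/index_test_index_result"  # path quoted in the issue


def output_path(test_name, warehouse_dir=None):
    """Where a test writes its index result (hypothetical helper).

    Without a warehouse dir every test uses the shared /tmp path; with one,
    the path is scoped per test so parallel runs cannot collide.
    """
    if warehouse_dir is None:
        return SHARED_PATH
    return posixpath.join(warehouse_dir, test_name, "index_result")


def collisions(test_names, warehouse_dir=None):
    """Map each output path used by more than one test to those tests."""
    seen = {}
    for name in test_names:
        seen.setdefault(output_path(name, warehouse_dir), []).append(name)
    return {path: names for path, names in seen.items() if len(names) > 1}


tests = ["index_compact_1.q", "index_compact_2.q"]  # illustrative names
print(collisions(tests))                       # both tests share SHARED_PATH
print(collisions(tests, "/build/warehouse"))   # {}
```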
[jira] Updated: (HIVE-1512) Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version
[ https://issues.apache.org/jira/browse/HIVE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Basab Maulik updated HIVE-1512: --- Attachment: HIVE-1512.3.patch Thanks John. This is a small change to the patch that fixes a potential NPE. Also, HBase 0.89.x introduces an additional runtime dependency for the tests: guava-r05.jar (I think this is the Google Collections library jar). > Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and > cloudera CDH3 version > --- > > Key: HIVE-1512 > URL: https://issues.apache.org/jira/browse/HIVE-1512 > Project: Hadoop Hive > Issue Type: Improvement > Components: HBase Handler >Affects Versions: 0.7.0 >Reporter: Jimmy Hu >Assignee: Basab Maulik > Fix For: 0.7.0 > > Attachments: HIVE-1512.2.patch, HIVE-1512.3.patch, HIVE-1512.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > the current trunk hive_hbase-handler only works with hbase 0.20.3, we need > to get it to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900817#action_12900817 ] Ning Zhang commented on HIVE-1307: -- All tests on 0.17 and 0.20 passed. There is an intermittent diff in index_compact_2.q on 0.20 in the parallel test run; when I ran it individually, it succeeded. Not sure if it is due to parallel testing. Will run 0.20 sequentially again.
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900812#action_12900812 ] Namit Jain commented on HIVE-1307: -- The patch applied cleanly
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1307: - Attachment: HIVE-1307.6.patch Uploading HIVE-1307.6.patch, which applies cleanly to the current trunk.
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900788#action_12900788 ] Namit Jain commented on HIVE-1307: -- The patch does not apply cleanly; can you regenerate it?
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900786#action_12900786 ] Namit Jain commented on HIVE-1307: -- will start testing and reviewing again
RE: [DISCUSSION] Move to become a TLP
Thanks everyone who voted. Looks like this is unanimous at this point. I will start the proceedings in the Hadoop PMC to make Hive a TLP. Ashish -Original Message- From: Paul Yang [mailto:py...@facebook.com] Sent: Thursday, August 19, 2010 4:05 PM To: hive-dev@hadoop.apache.org Subject: RE: [DISCUSSION] Move to become a TLP +1 -Original Message- From: Joydeep Sen Sarma [mailto:jssa...@facebook.com] Sent: Thursday, August 19, 2010 3:30 PM To: hive-dev@hadoop.apache.org Subject: RE: [DISCUSSION] Move to become a TLP +1 -Original Message- From: Carl Steinbach [mailto:c...@cloudera.com] Sent: Thursday, August 19, 2010 3:18 PM To: hive-dev@hadoop.apache.org Subject: Re: [DISCUSSION] Move to become a TLP +1 On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang wrote: > +1 as well. > > On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote: > > > +1. > > > > Zheng > > > > On Mon, Aug 16, 2010 at 11:58 AM, John Sichi > wrote: > >> +1 from me. The momentum on cross-company collaboration we're > >> +seeing > now, plus big integration contributions such as the new storage > handlers (HyperTable and Cassandra), are all signs that Hive is growing up > fast. > >> > >> HBase recently took the same route, so I'm going to have a chat > >> with > Jonathan Gray to find out what that involved for them. > >> > >> JVS > >> > >> On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote: > >> > >>> Yes, I think Hive is ready to become a TLP. > >>> > >>> On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo > >>> > wrote: > >>> > Nice one Ed... > > Folks, > > Please chime in. I think we should close this out next week one > way or > the > other. We can consider this a vote at this point, so please vote > on > this > issue. 
> > Thanks, > Ashish > > -Original Message- > From: Edward Capriolo [mailto:edlinuxg...@gmail.com] > Sent: Thursday, August 12, 2010 8:05 AM > To: hive-dev@hadoop.apache.org > Subject: Re: [DISCUSSION] Move to become a TLP > > On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo > > wrote: > > Folks, > > > > This question has come up in the PMC once again and would be > > great to > hear once more on this topic. What do people think? Are we ready > to > become a > TLP? > > > > Thanks, > > Ashish > > I thought of one more benefit. We can rename our packages from > > org.apache.hadoop.hive.* > to > org.apache.hive.* > > :) > > >> > >> > > > > > > > > -- > > Yours, > > Zheng > > http://www.linkedin.com/in/zshao > >
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900773#action_12900773 ] Ning Zhang commented on HIVE-1307: -- OK, 0.17 tests passed.
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1307: - Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1307: - Attachment: HIVE-1307.5.patch Uploading HIVE-1307.5.patch, which should solve the 0.17 issue. I'm running the 0.17 tests now.
[jira] Commented: (HIVE-1505) Support non-UTF8 data
[ https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900697#action_12900697 ] Edward Capriolo commented on HIVE-1505: --- Maybe you should fork hive and call it chive. On a serious note: great job. Would you consider editing cli.xml in the xdocs to explain this feature? I think it would be very helpful; look in docs/xdocs/. > Support non-UTF8 data > - > > Key: HIVE-1505 > URL: https://issues.apache.org/jira/browse/HIVE-1505 > Project: Hadoop Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Affects Versions: 0.5.0 >Reporter: bc Wong >Assignee: Ted Xu > Attachments: trunk-encoding.patch > > > I'd like to work with non-UTF8 data easily. > Suppose I have data in latin1. Currently, doing a "select *" will return the > upper ascii characters as '\xef\xbf\xbd', which is the replacement character > '\ufffd' encoded in UTF-8. Would be nice for Hive to understand different > encodings, or to have a concept of byte string.
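The replacement-character symptom described above is easy to reproduce outside Hive. The Python snippet below is illustrative only (it does not touch Hive's SerDe code): it takes the latin1 bytes for "café", decodes them as UTF-8 with replacement, and shows that the unmappable byte 0xE9 becomes U+FFFD, whose UTF-8 encoding is exactly the `\xef\xbf\xbd` from the report. Decoding with the data's real charset recovers the text, which is what encoding-aware handling would need to do.

```python
# "café" as it would be stored in a latin1 (ISO-8859-1) file.
raw = "caf\u00e9".encode("latin-1")  # b"caf\xe9"

# Reading the bytes as UTF-8: 0xE9 does not form a valid UTF-8 sequence here,
# so the decoder substitutes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")  # "caf\ufffd"

# Writing that string back out as UTF-8 yields the bytes from the report.
reencoded = as_utf8.encode("utf-8")  # b"caf\xef\xbf\xbd"

# Decoding with the data's real charset recovers the original text.
recovered = raw.decode("latin-1")  # "café"
```

Note that the lossy path cannot be undone: once 0xE9 has become U+FFFD, the original byte is gone, which is why keeping the raw bytes (or knowing the charset) matters.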
[jira] Updated: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1307: - Status: Open (was: Patch Available)
[jira] Commented: (HIVE-1307) More generic and efficient merge method
[ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900644#action_12900644 ] Ning Zhang commented on HIVE-1307: -- It's weird. 0.20 passed, but 0.17 failed mysteriously. Investigating.
[jira] Updated: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-741: - Attachment: patch-741-3.txt Thanks Ning for the comments. The patch incorporates the review comments. Looked at the smb_mapjoin* query files and updated the smb join queries. > NULL is not handled correctly in join > - > > Key: HIVE-741 > URL: https://issues.apache.org/jira/browse/HIVE-741 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Amareshwari Sriramadasu > Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, > patch-741.txt, smbjoin_nulls.q.txt > > > With the following data in table input4_cb: > Key Value > -- > NULL 325 > 18 NULL > The following query: > {code} > select * from input4_cb a join input4_cb b on a.key = b.value; > {code} > returns the following result: > NULL 325 18 NULL > The correct result should be the empty set. > When 'null' is replaced by '' it works.
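The expected empty result follows from SQL's NULL semantics: `NULL = x` is never true, not even when `x` is also NULL. As an illustration only (nested-loop joins over Python dicts, not Hive's join operators), the sketch below contrasts a join that compares keys with plain `==`, which treats `None == None` as true and reproduces the reported `NULL 325 18 NULL` row, with one that skips NULL keys as SQL requires. The function names are hypothetical.

```python
def naive_join(left, right, left_key, right_key):
    """Equi-join using Python ==, where None == None is True (wrong for SQL)."""
    return [(l, r) for l in left for r in right if l[left_key] == r[right_key]]


def sql_join(left, right, left_key, right_key):
    """Equi-join with SQL semantics: a NULL (None) key never matches.

    Under SQL's three-valued logic, NULL = x is unknown for every x,
    including NULL itself, so such row pairs are filtered out.
    """
    return [
        (l, r)
        for l in left
        for r in right
        if l[left_key] is not None
        and r[right_key] is not None
        and l[left_key] == r[right_key]
    ]


# Table input4_cb from the issue; None stands in for NULL.
input4_cb = [{"key": None, "value": 325}, {"key": 18, "value": None}]

# select * from input4_cb a join input4_cb b on a.key = b.value;
print(naive_join(input4_cb, input4_cb, "key", "value"))
# [({'key': None, 'value': 325}, {'key': 18, 'value': None})]  i.e. NULL 325 18 NULL
print(sql_join(input4_cb, input4_cb, "key", "value"))
# []
```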