[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1307:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Finally --

Committed. Thanks Ning

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, 
> HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently, if hive.merge.mapfiles/mapredfiles=true, a new MapReduce job is
> created to read the input files and write them to one reducer for merging.
> This MR job is created at compile time, with one MR job per partition. In
> the dynamic partition case, multiple partitions could be created at
> execution time, so generating the merge MR job at compile time is
> impossible.
> We should generalize the merge framework to allow multiple partitions;
> most of the time a map-only job should be sufficient if we use
> CombineHiveInputFormat.
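
A rough illustration of the idea in the description above, not the code in the attached patches: once the partitions exist at execution time, the output files can be grouped per partition directory and packed into batches, each of which a single map task could rewrite as one file. The function name plan_merge, the target_bytes parameter, and the example paths are hypothetical.

```python
from collections import defaultdict
import posixpath


def plan_merge(files, target_bytes):
    """Group small files per partition directory into merge batches.

    files: iterable of (path, size_in_bytes) found after the job has run.
    target_bytes: hypothetical upper bound for one merged output file.
    Returns {partition_dir: [[path, ...], ...]}; each inner list is one
    batch that a single map task could read and rewrite as one file.
    """
    by_partition = defaultdict(list)
    for path, size in files:
        by_partition[posixpath.dirname(path)].append((path, size))

    plan = {}
    for part_dir, entries in by_partition.items():
        batches, current, current_size = [], [], 0
        for path, size in sorted(entries):
            # Start a new batch when adding this file would exceed the target.
            if current and current_size + size > target_bytes:
                batches.append(current)
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            batches.append(current)
        plan[part_dir] = batches
    return plan
```

Because the plan is keyed by partition directory and built from files that already exist, it does not need to know the set of partitions at compile time, which is the limitation the issue describes.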

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures

2010-08-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1578:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Paul.

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable 
> displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>
> If a job fails, Hive currently displays a link to the task with the most 
> number of failures for easy access to the error logs. However, generating the 
> link may require many RPC's to get all the task completion events, adding a 
> delay of up to 30 minutes. This patch adds a configuration variable to 
> control whether the link is generated. Turning off this feature would also 
> disable automatic debugging tips generated by heuristics reading from the 
> error logs.
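
To make the behaviour described above concrete, here is a minimal sketch of gating the expensive lookup behind the property named in the issue title. It is an illustration only: report_job_failure and fetch_debug_info are hypothetical names, the configuration is modelled as a plain dict, and the default of "true" is an assumption rather than something stated in this thread.

```python
def report_job_failure(conf, fetch_debug_info):
    """Return the lines to show after a failed job.

    conf: dict of configuration properties (string values).
    fetch_debug_info: callable standing in for the expensive lookup of
    the task with the most failures; it is only invoked when the flag is on.
    """
    lines = ["Job failed."]
    # Assumed default: the link is generated unless explicitly disabled.
    flag = conf.get("hive.exec.show.job.failure.debug.info", "true")
    if flag.strip().lower() == "true":
        lines.append(fetch_debug_info())
    return lines
```

With the property set to "false" the callable is never invoked, so the RPC cost described in the issue is avoided entirely.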




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.9.patch

Sigh, hopefully this is the last patch. I'm fixing some conflicts in
bucketmapjoin[1-3].q.out for 0.17 and will run the 0.17 tests again.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.9.patch, 
> HIVE-1307.patch, HIVE-1307_java_only.patch
>
>




[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures

2010-08-20 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900948#action_12900948
 ] 

Carl Steinbach commented on HIVE-1578:
--

It would probably also be a good idea to add some INFO level logging statements
to the loop in showJobDebugFailInfo() so that the user can roughly gauge the
rate of progress.
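
The suggestion above could look roughly like the following. This is a sketch, not the body of showJobDebugFailInfo(): scan_events, fetch_batch, and log_every are hypothetical names, and the batch-by-batch fetch is an assumed stand-in for however the task completion events are actually retrieved.

```python
import logging

log = logging.getLogger("job_debug")


def scan_events(fetch_batch, log_every=10):
    """Drain task completion events batch by batch, logging progress.

    fetch_batch(start) is a hypothetical callable returning the list of
    events beginning at index `start`; an empty list means no more events.
    """
    events, batches = [], 0
    while True:
        batch = fetch_batch(len(events))
        if not batch:
            break
        events.extend(batch)
        batches += 1
        # Periodic INFO message so the user can gauge the rate of progress.
        if batches % log_every == 0:
            log.info("Fetched %d task completion events so far", len(events))
    log.info("Finished: %d task completion events in %d batches",
             len(events), batches)
    return events
```

Logging every Nth batch rather than every batch keeps the output readable when the loop makes many RPCs.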

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable 
> displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>




[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures

2010-08-20 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900946#action_12900946
 ] 

Carl Steinbach commented on HIVE-1578:
--

Hi Paul, before calling showJobDebugFailInfo() can you please print out a 
message telling the user that you're going to do this, that it may take a long 
time to complete, and that this feature can be disabled by setting the conf 
property {{hive.exec.show.job.failure.debug.info}}?

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable 
> displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>




[jira] Updated: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures

2010-08-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1578:
-

Summary: Add conf. property hive.exec.show.job.failure.debug.info to 
enable/disable displaying link to the task with most failures  (was: Add conf. 
variable for displaying link to the task with most failures)

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable 
> displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Status: Patch Available  (was: Open)

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>




[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900941#action_12900941
 ] 

Ning Zhang commented on HIVE-741:
-

The SMB test case still has a minor issue: the tables were created with 2
buckets, but there is only 1 file in each table. This conflicts with the table
schema. If a table is defined with 2 buckets, there should be 2 files in the
partition or table. The SMB join joins the 1st file in T1 with the 1st file in
T2, and the 2nd file in T1 with the 2nd file in T2. So the test case should
cover this use case.
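
The pairing rule described in this comment can be sketched as follows. This is an illustration under stated assumptions, not Hive's implementation: pair_bucket_files is a hypothetical helper, and it assumes that sorting the file names yields bucket order.

```python
def pair_bucket_files(t1_files, t2_files, num_buckets):
    """Pair the i-th bucket file of T1 with the i-th bucket file of T2.

    Raises ValueError when a table does not have exactly one file per
    declared bucket, which is the mismatch described in the comment.
    Assumes sorted file names correspond to bucket order.
    """
    for name, files in (("T1", t1_files), ("T2", t2_files)):
        if len(files) != num_buckets:
            raise ValueError(
                f"{name} declares {num_buckets} buckets "
                f"but has {len(files)} file(s)")
    return list(zip(sorted(t1_files), sorted(t2_files)))
```

A test table declared with 2 buckets but holding a single file would fail the check above, so it never exercises the second pair of files.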

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, 
> patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key     Value
> ------  ------
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL    325    18    NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.
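
The expected behaviour follows from standard SQL semantics: comparing NULL with anything, including another NULL, does not evaluate to true, so no pair of rows satisfies the join condition. The sketch below reproduces the case with SQLite from Python's standard library as a stand-in for Hive; the INTEGER column types are an assumption, since the issue does not give the table schema.

```python
import sqlite3


def null_join_rows():
    """Run the self-join from the issue against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE input4_cb (key INTEGER, value INTEGER)")
        conn.executemany(
            "INSERT INTO input4_cb VALUES (?, ?)",
            [(None, 325), (18, None)],
        )
        # NULL = NULL is not true, so neither row finds a join partner.
        return conn.execute(
            "SELECT * FROM input4_cb a JOIN input4_cb b ON a.key = b.value"
        ).fetchall()
    finally:
        conn.close()


print(null_join_rows())  # prints []
```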




[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900940#action_12900940
 ] 

Ning Zhang commented on HIVE-1510:
--

It doesn't fail on trunk; the failure is caused by the parallel test. HIVE-1576
was filed for this.

Will test again and commit once HIVE-1307 is committed.

> HiveCombineInputFormat should not use prefix matching to find the 
> partitionDesc for a given path
> 
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch
>
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00");
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001");
> select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key;
> drop table combine_3_srcpart_seq_rc;
> will fail.
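
The script above creates two partitions whose directory names share a prefix (hr="00" and hr="001") but whose file formats differ. A minimal sketch of why string prefix matching picks the wrong partition, and of an exact directory match that avoids it, is shown below. The warehouse paths and both function names are hypothetical, and this is not the code from the attached patches.

```python
import posixpath

# Hypothetical partition directories and the format each was written in.
PARTITIONS = {
    "/warehouse/t/ds=2010-08-03/hr=00": "sequencefile",
    "/warehouse/t/ds=2010-08-03/hr=001": "rcfile",
}


def lookup_by_prefix(path, partitions):
    """Buggy: return the first partition whose path is a string prefix."""
    for part_dir, fmt in partitions.items():
        if path.startswith(part_dir):
            return fmt
    return None


def lookup_by_parent(path, partitions):
    """Walk up the directory tree and require an exact directory match."""
    current = posixpath.dirname(path)
    while current not in ("", "/"):
        if current in partitions:
            return partitions[current]
        current = posixpath.dirname(current)
    return None
```

For a file under hr=001, lookup_by_prefix matches the hr=00 entry first and reports sequencefile, while lookup_by_parent reports rcfile because it compares whole directory names.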




[DISCUSS] Hive as TLP

2010-08-20 Thread Ashish Thusoo
The Hive subproject has voted to become a TLP

http://bit.ly/9nb4nN

Does the Hadoop community have any questions or concerns on this? I will be 
calling a more formal vote after this discussion.

The Hive dev community is still dominated by Facebook but the community is 
working hard to diversify the base and hopes to add committers from Yahoo and 
Cloudera. We anticipate that we will have a more diversified base by the end of 
the year modulo contributions from developers at these entities - and there are 
a fair bit in the pipeline.

Thanks,
Ashish


[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.8.patch

Uploading HIVE-1307.8.patch, which cleans up TestParse in 0.17.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.8.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>




[jira] Commented: (HIVE-1578) Add conf. variable for displaying link to the task with most failures

2010-08-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900927#action_12900927
 ] 

Namit Jain commented on HIVE-1578:
--

+1


> Add conf. variable for displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900926#action_12900926
 ] 

Ning Zhang commented on HIVE-1307:
--

OK. I thought only these 3 .q files were failing on 0.17. I'm rerunning
TestParse.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>




[jira] Updated: (HIVE-1578) Add conf. variable for displaying link to the task with most failures

2010-08-20 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1578:


Description: If a job fails, Hive currently displays a link to the task 
with the most number of failures for easy access to the error logs. However, 
generating the link may require many RPC's to get all the task completion 
events, adding a delay of up to 30 minutes. This patch adds a configuration 
variable to control whether the link is generated. Turning off this feature 
would also disable automatic debugging tips generated by heuristics reading 
from the error logs.  (was: If a job fails, Hive currently displays a link to 
the task with the most number of failures, for easy access to the error logs. 
However, generating the link may require many RPC calls to get all the task 
completion events, adding a delay of up to 30 minutes. This patch adds a 
configuration variable to control whether the link is generated. Turning off 
this feature would also disable automatic debugging tips generated by 
heuristics reading from the error logs.)

> Add conf. variable for displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>




[jira] Updated: (HIVE-1578) Add conf. variable for displaying link to the task with most failures

2010-08-20 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1578:


Status: Patch Available  (was: Open)

> Add conf. variable for displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>




[jira] Updated: (HIVE-1578) Add conf. variable for displaying link to the task with most failures

2010-08-20 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1578:


Attachment: HIVE-1578.1.patch

> Add conf. variable for displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>
> If a job fails, Hive currently displays a link to the task with the most 
> number of failures, for easy access to the error logs. However, generating 
> the link may require many RPC calls to get all the task completion events, 
> adding a delay of up to 30 minutes. This patch adds a configuration variable 
> to control whether the link is generated. Turning off this feature would also 
> disable automatic debugging tips generated by heuristics reading from the 
> error logs.




[jira] Created: (HIVE-1578) Add conf. variable for displaying link to the task with most failures

2010-08-20 Thread Paul Yang (JIRA)
Add conf. variable for displaying link to the task with most failures
-

 Key: HIVE-1578
 URL: https://issues.apache.org/jira/browse/HIVE-1578
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Paul Yang
 Fix For: 0.7.0


If a job fails, Hive currently displays a link to the task with the most number 
of failures, for easy access to the error logs. However, generating the link 
may require many RPC calls to get all the task completion events, adding a 
delay of up to 30 minutes. This patch adds a configuration variable to control 
whether the link is generated. Turning off this feature would also disable 
automatic debugging tips generated by heuristics reading from the error logs.




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900906#action_12900906
 ] 

Namit Jain commented on HIVE-1307:
--

ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running 

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>




[jira] Issue Comment Edited: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900906#action_12900906
 ] 

Namit Jain edited comment on HIVE-1307 at 8/20/10 7:04 PM:
---

ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running OK for you?

  was (Author: namit):
ant test -Dtestcase=TestParse -Doffline=true -Dhadoop.version="0.17.2.1"

I am still getting a lot of diffs for the above. Is it running 
  
> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>




[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-20 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900904#action_12900904
 ] 

He Yongqiang commented on HIVE-1510:


Even without this patch, the 0.17 test failed on index_compat3.q. Please file a
separate JIRA for this issue.

> HiveCombineInputFormat should not use prefix matching to find the 
> partitionDesc for a given path
> 
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch, hive-1510.4.patch
>
>




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.7.patch

Uploading HIVE-1307.7.patch. The only difference from the last one is the log
change in input[1-3].q.xml in 0.17 and input[2-3].q.xml in 0.20.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.7.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900884#action_12900884
 ] 

Ning Zhang commented on HIVE-1307:
--

Will regenerate the patch.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1307:
-

Status: Open  (was: Patch Available)

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




Re: [DISCUSSION] Move to become a TLP

2010-08-20 Thread Venky Iyer
I'm not qualified to vote on this, but as a fan and user I'm curious to  
hear what, if any, disadvantages there are of becoming a TLP.


On Fri, 20 Aug 2010 13:20:39 -0700, Edward Capriolo  
 wrote:



I am +1 as well.



On Fri, Aug 20, 2010 at 1:29 PM, Ashish Thusoo   
wrote:
Thanks everyone who voted. Looks like this is unanimous at this point.  
I will start the proceedings in the Hadoop PMC to make Hive a TLP.


Ashish

-Original Message-
From: Paul Yang [mailto:py...@facebook.com]
Sent: Thursday, August 19, 2010 4:05 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-Original Message-
From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
Sent: Thursday, August 19, 2010 3:30 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-Original Message-
From: Carl Steinbach [mailto:c...@cloudera.com]
Sent: Thursday, August 19, 2010 3:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

+1

On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang  wrote:


+1 as well.

On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote:

> +1.
>
> Zheng
>
> On Mon, Aug 16, 2010 at 11:58 AM, John Sichi 
wrote:
>> +1 from me.  The momentum on cross-company collaboration we're
>> +seeing
now, plus big integration contributions such as the new storage
handlers (HyperTable and Cassandra), are all signs that Hive is  
growing up fast.

>>
>> HBase recently took the same route, so I'm going to have a chat
>> with
Jonathan Gray to find out what that involved for them.
>>
>> JVS
>>
>> On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote:
>>
>>> Yes, I think Hive is ready to become a TLP.
>>>
>>> On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo
>>> 
wrote:
>>>
 Nice one Ed...

 Folks,

 Please chime in. I think we should close this out next week one
 way or
the
 other. We can consider this a vote at this point, so please vote
 on
this
 issue.

 Thanks,
 Ashish

 -Original Message-
 From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
 Sent: Thursday, August 12, 2010 8:05 AM
 To: hive-dev@hadoop.apache.org
 Subject: Re: [DISCUSSION] Move to become a TLP

 On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo
 
 wrote:
> Folks,
>
> This question has come up in the PMC once again and would be
> great to
 hear once more on this topic. What do people think? Are we ready
 to
become a
 TLP?
>
> Thanks,
> Ashish

 I thought of one more benefit. We can rename our packages from

 org.apache.hadoop.hive.*
 to
 org.apache.hive.*

 :)

>>
>>
>
>
>
> --
> Yours,
> Zheng
> http://www.linkedin.com/in/zshao




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900866#action_12900866
 ] 

Namit Jain commented on HIVE-1307:
--

TestParse is failing on both 17 and 20.

On 17, the following tests are failing:

bucketmapjoin1.q
bucketmapjoin2.q
bucketmapjoin3.q


All of them are log file updates - can you fix the log files and submit a new 
patch?

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Created: (HIVE-1577) Add configuration property hive.exec.local.scratchdir

2010-08-20 Thread Carl Steinbach (JIRA)
Add configuration property hive.exec.local.scratchdir
-

 Key: HIVE-1577
 URL: https://issues.apache.org/jira/browse/HIVE-1577
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Configuration
Reporter: Carl Steinbach


When Hive is run in local mode it uses the hardcoded local directory 
{{/${java.io.tmpdir}/${user.name}}} for temporary files. This path should be
configurable via the property {{hive.exec.local.scratchdir}}.
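
A minimal sketch of the requested behaviour, in Python for brevity: use the configured value when the property is set, otherwise fall back to the per-user temp directory described above. The helper name local_scratch_dir and the dictionary-based configuration are assumptions for illustration, not Hive's API.

```python
import posixpath


def local_scratch_dir(conf, tmpdir, user):
    """Return the configured local scratch directory, falling back to
    <tmpdir>/<user> when hive.exec.local.scratchdir is unset or empty."""
    default = posixpath.join(tmpdir, user)
    return conf.get("hive.exec.local.scratchdir") or default


# Unset: behaves like the current hardcoded path.
print(local_scratch_dir({}, "/tmp", "hive"))
# Set: the configured path wins.
print(local_scratch_dir(
    {"hive.exec.local.scratchdir": "/data/hive-scratch"}, "/tmp", "hive"))
```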




Re: [DISCUSSION] Move to become a TLP

2010-08-20 Thread Edward Capriolo
I am +1 as well.



On Fri, Aug 20, 2010 at 1:29 PM, Ashish Thusoo  wrote:
> Thanks everyone who voted. Looks like this is unanimous at this point. I will 
> start the proceedings in the Hadoop PMC to make Hive a TLP.
>
> Ashish
>
> -Original Message-
> From: Paul Yang [mailto:py...@facebook.com]
> Sent: Thursday, August 19, 2010 4:05 PM
> To: hive-dev@hadoop.apache.org
> Subject: RE: [DISCUSSION] Move to become a TLP
>
> +1
>
> -Original Message-
> From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
> Sent: Thursday, August 19, 2010 3:30 PM
> To: hive-dev@hadoop.apache.org
> Subject: RE: [DISCUSSION] Move to become a TLP
>
> +1
>
> -Original Message-
> From: Carl Steinbach [mailto:c...@cloudera.com]
> Sent: Thursday, August 19, 2010 3:18 PM
> To: hive-dev@hadoop.apache.org
> Subject: Re: [DISCUSSION] Move to become a TLP
>
> +1
>
> On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang  wrote:
>
>> +1 as well.
>>
>> On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote:
>>
>> > +1.
>> >
>> > Zheng
>> >
>> > On Mon, Aug 16, 2010 at 11:58 AM, John Sichi 
>> wrote:
>> >> +1 from me.  The momentum on cross-company collaboration we're
>> >> +seeing
>> now, plus big integration contributions such as the new storage
>> handlers (HyperTable and Cassandra), are all signs that Hive is growing up 
>> fast.
>> >>
>> >> HBase recently took the same route, so I'm going to have a chat
>> >> with
>> Jonathan Gray to find out what that involved for them.
>> >>
>> >> JVS
>> >>
>> >> On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote:
>> >>
>> >>> Yes, I think Hive is ready to become a TLP.
>> >>>
>> >>> On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo
>> >>> 
>> wrote:
>> >>>
>>  Nice one Ed...
>> 
>>  Folks,
>> 
>>  Please chime in. I think we should close this out next week one
>>  way or
>> the
>>  other. We can consider this a vote at this point, so please vote
>>  on
>> this
>>  issue.
>> 
>>  Thanks,
>>  Ashish
>> 
>>  -Original Message-
>>  From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>>  Sent: Thursday, August 12, 2010 8:05 AM
>>  To: hive-dev@hadoop.apache.org
>>  Subject: Re: [DISCUSSION] Move to become a TLP
>> 
>>  On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo
>>  
>>  wrote:
>> > Folks,
>> >
>> > This question has come up in the PMC once again and would be
>> > great to
>>  hear once more on this topic. What do people think? Are we ready
>>  to
>> become a
>>  TLP?
>> >
>> > Thanks,
>> > Ashish
>> 
>>  I thought of one more benefit. We can rename our packages from
>> 
>>  org.apache.hadoop.hive.*
>>  to
>>  org.apache.hive.*
>> 
>>  :)
>> 
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Yours,
>> > Zheng
>> > http://www.linkedin.com/in/zshao
>>
>>
>


[jira] Created: (HIVE-1576) index_compact*.q should not share common result file

2010-08-20 Thread Ning Zhang (JIRA)
index_compact*.q should not share common result file


 Key: HIVE-1576
 URL: https://issues.apache.org/jira/browse/HIVE-1576
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: He Yongqiang


Some index outputs of index_compact*.q share the same file name (e.g., 
/tmp/index_test_index_result). This causes parallel tests to break 
intermittently. Ideally they should output to the local warehouse directory, 
where parallel tests won't conflict. 
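
One way to picture the fix, sketched in Python rather than in the test framework itself: derive each test's result path from the test name under a common base directory, so that no two tests write to the same file. The helper result_path and the file name index_result are hypothetical.

```python
import os
import tempfile


def result_path(test_name, base_dir):
    """Return a result file path that is unique to one test, so tests
    running in parallel never write to the same file."""
    test_dir = os.path.join(base_dir, test_name)
    os.makedirs(test_dir, exist_ok=True)
    return os.path.join(test_dir, "index_result")


# A temporary directory stands in for the local warehouse directory.
base = tempfile.mkdtemp(prefix="warehouse_")
p1 = result_path("index_compact_1", base)
p2 = result_path("index_compact_2", base)
print(p1 != p2)
```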




[jira] Updated: (HIVE-1512) Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version

2010-08-20 Thread Basab Maulik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Basab Maulik updated HIVE-1512:
---

Attachment: HIVE-1512.3.patch

Thanks John. This is a small change to the patch that fixes a potential NPE. 
Also, HBase 0.89.x introduces an additional runtime dependency for the tests, 
guava-r05.jar, which I think is the Google collections library jar.

> Need to get hive_hbase-handler to work with hbase versions 0.20.4  0.20.5 and 
> cloudera CDH3 version
> ---
>
> Key: HIVE-1512
> URL: https://issues.apache.org/jira/browse/HIVE-1512
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Jimmy Hu
>Assignee: Basab Maulik
> Fix For: 0.7.0
>
> Attachments: HIVE-1512.2.patch, HIVE-1512.3.patch, HIVE-1512.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> the current trunk  hive_hbase-handler only works with hbase 0.20.3, we need 
> to get it to work with hbase versions 0.20.4  0.20.5 and cloudera CDH3 version




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900817#action_12900817
 ] 

Ning Zhang commented on HIVE-1307:
--

All tests on 0.17 and 0.20 passed. There is an intermittent diff in 
index_compact_2.q on 0.20 in the parallel test. When I ran it individually it 
succeeded. Not sure if it is due to parallel testing. Will run 0.20 
sequentially again. 

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900812#action_12900812
 ] 

Namit Jain commented on HIVE-1307:
--

The patch applied cleanly

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.6.patch

Uploading HIVE-1307.6.patch which applies cleanly with the current trunk.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.6.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900788#action_12900788
 ] 

Namit Jain commented on HIVE-1307:
--

The patch does not apply cleanly - can you regenerate it?

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900786#action_12900786
 ] 

Namit Jain commented on HIVE-1307:
--

will start testing and reviewing again

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




RE: [DISCUSSION] Move to become a TLP

2010-08-20 Thread Ashish Thusoo
Thanks everyone who voted. Looks like this is unanimous at this point. I will 
start the proceedings in the Hadoop PMC to make Hive a TLP.

Ashish 

-Original Message-
From: Paul Yang [mailto:py...@facebook.com] 
Sent: Thursday, August 19, 2010 4:05 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-Original Message-
From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
Sent: Thursday, August 19, 2010 3:30 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-Original Message-
From: Carl Steinbach [mailto:c...@cloudera.com]
Sent: Thursday, August 19, 2010 3:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

+1

On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang  wrote:

> +1 as well.
>
> On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote:
>
> > +1.
> >
> > Zheng
> >
> > On Mon, Aug 16, 2010 at 11:58 AM, John Sichi 
> wrote:
> >> +1 from me.  The momentum on cross-company collaboration we're 
> >> +seeing
> now, plus big integration contributions such as the new storage 
> handlers (HyperTable and Cassandra), are all signs that Hive is growing up 
> fast.
> >>
> >> HBase recently took the same route, so I'm going to have a chat 
> >> with
> Jonathan Gray to find out what that involved for them.
> >>
> >> JVS
> >>
> >> On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote:
> >>
> >>> Yes, I think Hive is ready to become a TLP.
> >>>
> >>> On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo 
> >>> 
> wrote:
> >>>
>  Nice one Ed...
> 
>  Folks,
> 
>  Please chime in. I think we should close this out next week one 
>  way or
> the
>  other. We can consider this a vote at this point, so please vote 
>  on
> this
>  issue.
> 
>  Thanks,
>  Ashish
> 
>  -Original Message-
>  From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>  Sent: Thursday, August 12, 2010 8:05 AM
>  To: hive-dev@hadoop.apache.org
>  Subject: Re: [DISCUSSION] Move to become a TLP
> 
>  On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo 
>  
>  wrote:
> > Folks,
> >
> > This question has come up in the PMC once again and would be 
> > great to
>  hear once more on this topic. What do people think? Are we ready 
>  to
> become a
>  TLP?
> >
> > Thanks,
> > Ashish
> 
>  I thought of one more benefit. We can rename our packages from
> 
>  org.apache.hadoop.hive.*
>  to
>  org.apache.hive.*
> 
>  :)
> 
> >>
> >>
> >
> >
> >
> > --
> > Yours,
> > Zheng
> > http://www.linkedin.com/in/zshao
>
>


[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900773#action_12900773
 ] 

Ning Zhang commented on HIVE-1307:
--

OK, 0.17 tests passed. 

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Status: Patch Available  (was: Open)

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.5.patch

Uploading HIVE-1307.5.patch which should solve the 0.17 issue. I'm running the 
0.17 tests now.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.5.patch, 
> HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Commented: (HIVE-1505) Support non-UTF8 data

2010-08-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900697#action_12900697
 ] 

Edward Capriolo commented on HIVE-1505:
---

 Maybe you should fork hive and call it chive. 

On a serious note: great job. Would you consider editing the cli.xml in the 
xdocs to explain this feature? I think it would be very helpful; look in 
docs/xdocs/.

> Support non-UTF8 data
> -
>
> Key: HIVE-1505
> URL: https://issues.apache.org/jira/browse/HIVE-1505
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: bc Wong
>Assignee: Ted Xu
> Attachments: trunk-encoding.patch
>
>
> I'd like to work with non-UTF8 data easily.
> Suppose I have data in latin1. Currently, doing a "select *" will return the 
> upper ascii characters in '\xef\xbf\xbd', which is the replacement character 
> '\ufffd' encoded in UTF-8. Would be nice for Hive to understand different 
> encodings, or to have a concept of byte string.
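
The symptom in the description can be reproduced outside Hive. The Python sketch below (the sample string is made up) decodes latin1 bytes as UTF-8 with replacement, which turns the upper-ASCII byte into U+FFFD and, once re-encoded, into the bytes '\xef\xbf\xbd'. Decoding with the correct charset keeps the character.

```python
# "caf\u00e9" is "cafe" with an acute accent on the final e.
raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9'

# Decoding latin1 bytes as UTF-8 fails on 0xe9; with errors="replace" the
# bad byte becomes U+FFFD, which re-encodes to the three bytes EF BF BD.
mangled = raw.decode("utf-8", errors="replace")
print(mangled.encode("utf-8"))

# Decoding with the right charset preserves the character.
restored = raw.decode("latin-1")
print(restored == "caf\u00e9")
```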




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1307:
-

Status: Open  (was: Patch Available)

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Commented: (HIVE-1307) More generic and efficient merge method

2010-08-20 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900644#action_12900644
 ] 

Ning Zhang commented on HIVE-1307:
--

It's weird. 0.20 passed, but 0.17 failed mysteriously. Investigating.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.4.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-741) NULL is not handled correctly in join

2010-08-20 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-741:
-

Attachment: patch-741-3.txt

Thanks Ning for the comments.

Patch incorporates the review comments. Looked at smb_mapjoin* query files and 
updated smb join queries. 

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, 
> patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key   Value
> ---   -----
> NULL  325
> 18    NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.
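
The expected behaviour can be checked against another SQL engine. The Python sketch below uses the standard-library sqlite3 module (SQLite stands in for Hive here, and the INTEGER column types are an assumption) to load the same two rows and run the same join. Since NULL = NULL is not true in SQL, no row pair satisfies the join condition and the result is empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input4_cb (key INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO input4_cb VALUES (?, ?)",
    [(None, 325), (18, None)],  # None is stored as SQL NULL
)

rows = conn.execute(
    "SELECT * FROM input4_cb a JOIN input4_cb b ON a.key = b.value"
).fetchall()
print(rows)  # an empty list: no pair of rows has a.key equal to b.value
```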
