[jira] Updated: (HIVE-741) NULL is not handled correctly in join

2010-08-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-741:
-

Attachment: patch-741-5.txt

Updated the patch. Thanks Ning for your help.

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, 
> patch-741-4.txt, patch-741-5.txt, patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key   Value
> ----  -----
> NULL  325
> 18    NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901275#action_12901275
 ] 

Ning Zhang commented on HIVE-741:
-

Looks good except for one minor thing: should SerDeUtils.java:369 return true? 
Amareshwari, can you upload a new patch? I'll then run the unit tests. 

Yongqiang, can you test this patch on the production SMB join queries?



> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt, 
> patch-741-4.txt, patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key   Value
> ----  -----
> NULL  325
> 18    NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.




[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901264#action_12901264
 ] 

Ning Zhang commented on HIVE-1582:
--

@namit, merging happened even before HIVE-1307. There does not seem to be a 
unit test for this feature (no merge when inserting into a directory). BTW, 
what's the rationale behind this? 

> merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
> --
>
> Key: HIVE-1582
> URL: https://issues.apache.org/jira/browse/HIVE-1582
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> hive> SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> hive> SET hive.exec.compress.output=false;
> hive> INSERT OVERWRITE DIRECTORY 'x'
>     > SELECT  from  a;
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks is set to 0 since there's no reduce operator
> ..
> Ended Job = job_201008191557_54169
> Ended Job = 450290112, job is filtered out (removed at runtime).
> Launching Job 2 out of 2
> .
> the second job should not get started.




[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901260#action_12901260
 ] 

Namit Jain commented on HIVE-1582:
--

@Ning, there should be no merge job for insert directory; previously we only 
merged when inserting into tables and partitions.




[jira] Updated: (HIVE-1293) Concurrency Model for Hive

2010-08-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1293:
-

Status: Patch Available  (was: Open)

> Concurrency Model for Hive
> --
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive.1293.7.patch, 
> hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only 
> guarantee provided in case of concurrent readers and writers is that
> a reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Updated: (HIVE-1293) Concurrency Model for Hive

2010-08-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1293:
-

Attachment: hive.1293.7.patch

another - hopefully, final patch




[jira] Updated: (HIVE-741) NULL is not handled correctly in join

2010-08-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-741:
-

Attachment: patch-741-4.txt

Updated smb input with two files. 




[jira] Created: (HIVE-1583) Hive should not override Hadoop specific system properties

2010-08-22 Thread Amareshwari Sriramadasu (JIRA)
Hive should not override Hadoop specific system properties
--

 Key: HIVE-1583
 URL: https://issues.apache.org/jira/browse/HIVE-1583
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Configuration
Reporter: Amareshwari Sriramadasu


Currently Hive overrides Hadoop-specific system properties such as 
HADOOP_CLASSPATH.
It does the following in the bin/hive script:
{code}
# pass classpath to hadoop
export HADOOP_CLASSPATH=${CLASSPATH}
{code}
Instead, it should honor the value of HADOOP_CLASSPATH set by the client by 
appending CLASSPATH to it.
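
One possible shape for this change, as a minimal sketch only: the helper name append_classpath is hypothetical and is not taken from bin/hive or from any attached patch. It keeps whatever HADOOP_CLASSPATH the client exported and appends Hive's CLASSPATH after it, avoiding a stray leading or trailing colon when either side is empty.

```shell
# Hypothetical helper (not part of bin/hive): join two classpath strings.
# $1 = HADOOP_CLASSPATH as set by the client (may be empty)
# $2 = CLASSPATH assembled by the hive script (may be empty)
append_classpath() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    # Both present: client entries first, Hive entries appended.
    printf '%s:%s\n' "$1" "$2"
  else
    # At most one is non-empty: emit it without adding a colon.
    printf '%s\n' "${1}${2}"
  fi
}

# pass classpath to hadoop, preserving any value the client already set
HADOOP_CLASSPATH=$(append_classpath "${HADOOP_CLASSPATH:-}" "${CLASSPATH:-}")
export HADOOP_CLASSPATH
```

Placing the client's entries first means they take precedence on lookup; whether Hive's own jars should come first instead is a design choice this issue leaves open.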




[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901250#action_12901250
 ] 

Ning Zhang commented on HIVE-1582:
--

I'm confused. Do you mean the second job should not be started or the second 
job should not be filtered out? I've tested the behavior before and after 
HIVE-1307; it is the same in both cases, and the merge job is always fired. 




[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901242#action_12901242
 ] 

He Yongqiang commented on HIVE-1582:


Ended Job = 450290112, job is filtered out (removed at runtime).

The second job seems to be filtered out at runtime.




[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901239#action_12901239
 ] 

Ning Zhang commented on HIVE-1582:
--

Is hive.merge.mapfiles=true? If so the second merge job should be fired. Am I 
missing something?




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Attachment: HIVE-1581.1.patch

> CompactIndexInputFormat should create split only for files in the index 
> output file.
> 
>
> Key: HIVE-1581
> URL: https://issues.apache.org/jira/browse/HIVE-1581
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-1581.1.patch
>
>
> We can get a list of files from the index file, so there is no need to create 
> splits based on all files in the base table/partition.




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Status: Patch Available  (was: Open)




[jira] Created: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-08-22 Thread He Yongqiang (JIRA)
merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
--

 Key: HIVE-1582
 URL: https://issues.apache.org/jira/browse/HIVE-1582
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang


hive> SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> SET hive.exec.compress.output=false;
hive> INSERT OVERWRITE DIRECTORY 'x'
    > SELECT  from  a;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
..
Ended Job = job_201008191557_54169
Ended Job = 450290112, job is filtered out (removed at runtime).
Launching Job 2 out of 2
.

the second job should not get started.




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Attachment: (was: HIVE-1581.1.patch)

> CompactIndexInputFormat should create split only for files in the index 
> output file.
> 
>
> Key: HIVE-1581
> URL: https://issues.apache.org/jira/browse/HIVE-1581
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> We can get a list of files from the index file, so there is no need to create 
> splits based on all files in the base table/partition.




[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1581:
---

Attachment: HIVE-1581.1.patch




[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures

2010-08-22 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901207#action_12901207
 ] 

Paul Yang commented on HIVE-1578:
-

@Carl

The message to the user about the conf var is a good idea. I can add info-level 
logging statements, but I don't think it's possible to know the number of task 
completion events before retrieving them, so there won't be a % complete 
message.

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable 
> displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>
> If a job fails, Hive currently displays a link to the task with the most 
> failures for easy access to the error logs. However, generating the 
> link may require many RPCs to get all the task completion events, adding a 
> delay of up to 30 minutes. This patch adds a configuration variable to 
> control whether the link is generated. Turning off this feature would also 
> disable automatic debugging tips generated by heuristics reading from the 
> error logs.




[jira] Created: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.

2010-08-22 Thread He Yongqiang (JIRA)
CompactIndexInputFormat should create split only for files in the index output 
file.


 Key: HIVE-1581
 URL: https://issues.apache.org/jira/browse/HIVE-1581
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1581.1.patch

We can get a list of files from the index file, so there is no need to create 
splits based on all files in the base table/partition.
