[jira] Updated: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-741:
-----------------------------------------

    Attachment: patch-741-5.txt

Updated the patch. Thanks Ning for your help.

> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-741-1.txt, patch-741-2.txt, patch-741-3.txt,
> patch-741-4.txt, patch-741-5.txt, patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key   Value
> ----  -----
> NULL  325
> 18    NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be an empty set.
> When 'null' is replaced by '' it works.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
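The expected behaviour can be illustrated with a small sketch. This is not Hive's join code or the attached patch; it is a hypothetical nested-loop join in Python, where `None` stands in for NULL and `inner_join` is an illustrative name. Under SQL semantics `NULL = NULL` does not evaluate to true, so rows with a NULL join key must not match.

```python
# Rows of input4_cb as (key, value); None stands in for SQL NULL.
input4_cb = [(None, 325), (18, None)]


def inner_join(left, right, left_col, right_col):
    """Nested-loop inner equi-join that never matches on NULL (None).

    Hypothetical helper for illustration only; not Hive's implementation.
    """
    result = []
    for lrow in left:
        for rrow in right:
            lkey, rkey = lrow[left_col], rrow[right_col]
            # NULL = NULL is not true in SQL, so a NULL key never joins.
            if lkey is None or rkey is None:
                continue
            if lkey == rkey:
                result.append(lrow + rrow)
    return result


# select * from input4_cb a join input4_cb b on a.key = b.value;
rows = inner_join(input4_cb, input4_cb, left_col=0, right_col=1)
print(rows)
```

If the `None` check is removed, Python's `None == None` comparison matches the two rows and yields the single row reported in the issue (NULL 325 18 NULL).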
[jira] Commented: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901275#action_12901275 ]

Ning Zhang commented on HIVE-741:
---------------------------------

Looks good except for one minor thing: SerDeUtils.java:369 should return true?

Amareshwari, can you upload a new patch and I'll run unit tests. Yongqiang, can you test this patch on the production SMB join queries?
[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
[ https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901264#action_12901264 ]

Ning Zhang commented on HIVE-1582:
----------------------------------

@namit, merging happens even before HIVE-1307. There does not seem to be a unit test for this feature (no merge when inserting into a directory). BTW, what's the rationale behind this?

> merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-1582
>                 URL: https://issues.apache.org/jira/browse/HIVE-1582
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: He Yongqiang
>
> hive>
>     >
>     >
>     > SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> hive>SET hive.exec.compress.output=false;
> hive>INSERT OVERWRITE DIRECTORY 'x'
>     > SELECT from a;
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks is set to 0 since there's no reduce operator
> ..
> Ended Job = job_201008191557_54169
> Ended Job = 450290112, job is filtered out (removed at runtime).
> Launching Job 2 out of 2
> .
> the second job should not get started.
[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
[ https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901260#action_12901260 ]

Namit Jain commented on HIVE-1582:
----------------------------------

@Ning, there should be no merge job for insert directory; before, we only merged when inserting into tables and partitions.
[jira] Updated: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1293:
-----------------------------

    Status: Patch Available  (was: Open)

> Concurrency Model for Hive
> --------------------------
>
>                 Key: HIVE-1293
>                 URL: https://issues.apache.org/jira/browse/HIVE-1293
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.7.0
>
>         Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch,
> hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive.1293.7.patch,
> hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only
> guarantee provided in case of concurrent readers and writers is that
> the reader will not see partial data from the old version (before the write) and
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of
> the query or the write locks can be delayed till the move
> task (when the directory is actually moved). Care needs to be taken for
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is
> being read. Note that it is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be
> deleted when all of them have finished.
> Comments.
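Option 2 (versioning) from the description can be sketched as follows. This is only an illustration of the idea as worded in the issue; the notification does not say which option the attached patches implement, and the class and method names (`VersionedTable`, `begin_read`, `end_read`, `write`) are hypothetical. The sketch is single-threaded; a real implementation would need synchronization around every method.

```python
class VersionedTable:
    """Readers pin the version they started on; a writer publishes a new one.

    Hypothetical illustration of the 'Versioning' option; not thread-safe.
    """

    def __init__(self, data):
        self.current = 0
        self.versions = {0: data}  # version id -> data
        self.readers = {0: 0}      # version id -> number of active readers

    def begin_read(self):
        """Pin the current version and return (version id, data)."""
        version = self.current
        self.readers[version] += 1
        return version, self.versions[version]

    def end_read(self, version):
        """Unpin a version; drop it if it is old and has no readers left."""
        self.readers[version] -= 1
        if version != self.current and self.readers[version] == 0:
            del self.versions[version]
            del self.readers[version]

    def write(self, data):
        """Publish a new version; keep the old one only while it is being read."""
        old = self.current
        self.current = old + 1
        self.versions[self.current] = data
        self.readers[self.current] = 0
        if self.readers[old] == 0:
            del self.versions[old]
            del self.readers[old]


table = VersionedTable(["old rows"])
pinned, seen = table.begin_read()   # a reader starts on version 0
table.write(["new rows"])           # a writer publishes version 1 meanwhile
latest = table.begin_read()[1]      # a later reader sees only the new data
table.end_read(pinned)              # the last reader of version 0 finishes
```

Unlike a snapshot, the old version is reachable only through readers that were already active when the write happened, and it disappears as soon as the last of them finishes.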
[jira] Updated: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1293:
-----------------------------

    Attachment: hive.1293.7.patch

another - hopefully, final patch
[jira] Updated: (HIVE-741) NULL is not handled correctly in join
[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-741:
-----------------------------------------

    Attachment: patch-741-4.txt

Updated smb input with two files.
[jira] Created: (HIVE-1583) Hive should not override Hadoop specific system properties
Hive should not override Hadoop specific system properties
----------------------------------------------------------

                 Key: HIVE-1583
                 URL: https://issues.apache.org/jira/browse/HIVE-1583
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Configuration
            Reporter: Amareshwari Sriramadasu

Currently Hive overrides Hadoop specific system properties such as HADOOP_CLASSPATH. It does the following in the bin/hive script:
{code}
# pass classpath to hadoop
export HADOOP_CLASSPATH=${CLASSPATH}
{code}
Instead, it should honor the value of HADOOP_CLASSPATH set by the client by appending CLASSPATH to it.
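The requested behaviour amounts to a small string operation. The sketch below shows that logic in Python rather than in the bin/hive shell script itself; `append_classpath` is a hypothetical helper, and the example paths are made up. It assumes that an unset or empty client value should simply be skipped, so no stray leading separator is produced.

```python
import os


def append_classpath(existing, classpath, sep=os.pathsep):
    """Return the client's HADOOP_CLASSPATH with Hive's CLASSPATH appended.

    Hypothetical helper: empty or missing values are dropped so the result
    never starts or ends with a separator.
    """
    parts = [part for part in (existing, classpath) if part]
    return sep.join(parts)


# Client already set HADOOP_CLASSPATH: keep it and append Hive's entries.
merged = append_classpath("/client/lib/*", "/hive/lib/*", sep=":")
# Client did not set it: the result is just Hive's CLASSPATH.
hive_only = append_classpath(None, "/hive/lib/*", sep=":")
print(merged)
print(hive_only)
```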
[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
[ https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901250#action_12901250 ]

Ning Zhang commented on HIVE-1582:
----------------------------------

I'm confused. Do you mean the second job should not be started, or the second job should not be filtered out? I've tested the behavior before and after HIVE-1307; it is the same, and the merge job is always fired.
[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
[ https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901242#action_12901242 ]

He Yongqiang commented on HIVE-1582:
------------------------------------

Ended Job = 450290112, job is filtered out (removed at runtime).

The second job seems to be filtered out at runtime.
[jira] Commented: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
[ https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901239#action_12901239 ]

Ning Zhang commented on HIVE-1582:
----------------------------------

Is hive.merge.mapfiles=true? If so, the second merge job should be fired. Am I missing something?
[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.
[ https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1581:
-------------------------------

    Attachment: HIVE-1581.1.patch

> CompactIndexInputFormat should create split only for files in the index
> output file.
> ------------------------------------------------------------------------
>
>                 Key: HIVE-1581
>                 URL: https://issues.apache.org/jira/browse/HIVE-1581
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: HIVE-1581.1.patch
>
>
> We can get a list of files from the index file, so there is no need to create splits
> based on all files in the base table/partition.
[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.
[ https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1581:
-------------------------------

    Status: Patch Available  (was: Open)
[jira] Created: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
------------------------------------------------------------------------------

                 Key: HIVE-1582
                 URL: https://issues.apache.org/jira/browse/HIVE-1582
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: He Yongqiang

hive>
    >
    >
    > SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive>SET hive.exec.compress.output=false;
hive>INSERT OVERWRITE DIRECTORY 'x'
    > SELECT from a;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
..
Ended Job = job_201008191557_54169
Ended Job = 450290112, job is filtered out (removed at runtime).
Launching Job 2 out of 2
.

the second job should not get started.
[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.
[ https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1581:
-------------------------------

    Attachment:     (was: HIVE-1581.1.patch)
[jira] Updated: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.
[ https://issues.apache.org/jira/browse/HIVE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1581:
-------------------------------

    Attachment: HIVE-1581.1.patch
[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901207#action_12901207 ]

Paul Yang commented on HIVE-1578:
---------------------------------

@Carl The message to the user about the conf var is a good idea. I can add info-level logging statements, but I don't think it's possible to know the number of task completion events before retrieving them, so there won't be a % complete message.

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable
> displaying link to the task with most failures
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-1578
>                 URL: https://issues.apache.org/jira/browse/HIVE-1578
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1578.1.patch
>
>
> If a job fails, Hive currently displays a link to the task with the most
> failures for easy access to the error logs. However, generating the
> link may require many RPCs to get all the task completion events, adding a
> delay of up to 30 minutes. This patch adds a configuration variable to
> control whether the link is generated. Turning off this feature would also
> disable automatic debugging tips generated by heuristics reading from the
> error logs.
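The effect of such a switch can be sketched as below. This is not the code in HIVE-1578.1.patch: the function `task_with_most_failures`, the `fetch_events` callback, the `(task_id, status)` event shape, and the default of "true" are all assumptions made for illustration. Only the property name comes from the issue. The point is that the expensive retrieval of task completion events is skipped entirely when the property is off.

```python
from collections import Counter

# Property name taken from the issue title.
FLAG = "hive.exec.show.job.failure.debug.info"


def task_with_most_failures(conf, fetch_events):
    """Return the id of the task that failed most often, or None.

    conf is a plain dict of settings; fetch_events is a hypothetical callback
    standing in for the RPCs that retrieve (task_id, status) completion events.
    """
    # Assumed default of "true"; the issue does not state the default.
    if str(conf.get(FLAG, "true")).lower() != "true":
        return None  # feature disabled: never issue the expensive calls
    failures = Counter(
        task_id for task_id, status in fetch_events() if status == "FAILED"
    )
    if not failures:
        return None
    return failures.most_common(1)[0][0]


calls = []


def fake_fetch():
    """Stand-in for the RPCs; records each invocation."""
    calls.append(1)
    return [("t1", "FAILED"), ("t2", "FAILED"), ("t2", "FAILED"), ("t3", "SUCCEEDED")]


enabled_result = task_with_most_failures({FLAG: "true"}, fake_fetch)
disabled_result = task_with_most_failures({FLAG: "false"}, fake_fetch)
```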
[jira] Created: (HIVE-1581) CompactIndexInputFormat should create split only for files in the index output file.
CompactIndexInputFormat should create split only for files in the index output file.
------------------------------------------------------------------------------------

                 Key: HIVE-1581
                 URL: https://issues.apache.org/jira/browse/HIVE-1581
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: He Yongqiang
            Assignee: He Yongqiang
         Attachments: HIVE-1581.1.patch

We can get a list of files from the index file, so there is no need to create splits based on all files in the base table/partition.
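The idea can be sketched as a simple filter. This is not the CompactIndexInputFormat code; `files_to_split` and the example paths are hypothetical, and how the file list is read from the index output is not described in the issue, so it is taken here as an already parsed list.

```python
def files_to_split(base_files, indexed_files):
    """Keep only the base table/partition files that the index output names.

    Hypothetical helper: splits would then be created for the returned files
    only, instead of for every file in the base table/partition.
    """
    wanted = set(indexed_files)
    return [path for path in base_files if path in wanted]


base = ["/warehouse/t/part-00000", "/warehouse/t/part-00001", "/warehouse/t/part-00002"]
from_index = ["/warehouse/t/part-00001"]
selected = files_to_split(base, from_index)
print(selected)
```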