date:20121119


 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata reassigned HIVE-3718:
---

Assignee: Pamela Vagata

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization


[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500469#comment-13500469
 ] 

Carl Steinbach commented on HIVE-2206:
--

@Yin: The correlation optimizer is only enabled for a small set of new 
CliDriver tests. If I enable the correlation optimizer by default, which of the 
existing CliDriver tests are expected to fail?

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer,
 which merges correlated MapReduce jobs (MR jobs) into a single MR job. The
 idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and
 slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, it generates
 a separate MapReduce job for every operation that may need to shuffle the
 data (e.g. join and aggregation operations). However, these operations may
 be correlated in the ways explained below, in which case they can be executed
 in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of
 its child nodes if it has the same partition key as that child node.
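 The three correlation types above can be sketched as predicates over a toy
 model of an MR job. This is a minimal illustration, not code from the patch:
 the MRJob and Correlations classes are hypothetical, and a partition key is
 assumed to be comparable as an ordered list of column names (a real plan
 would first have to map join and group-by keys from different tables onto
 common columns).

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of an MR job; not a Hive class.
final class MRJob {
    final Set<String> inputs;         // input relations read by the job
    final List<String> partitionKey;  // columns the shuffle is partitioned on

    MRJob(Set<String> inputs, List<String> partitionKey) {
        this.inputs = inputs;
        this.partitionKey = partitionKey;
    }
}

final class Correlations {
    /** IC: the input relation sets are not disjoint. */
    static boolean inputCorrelated(MRJob a, MRJob b) {
        return !Collections.disjoint(a.inputs, b.inputs);
    }

    /** TC: input correlation plus the same partition key. */
    static boolean transitCorrelated(MRJob a, MRJob b) {
        return inputCorrelated(a, b) && a.partitionKey.equals(b.partitionKey);
    }

    /** JFC: a job and its child are partitioned on the same key. */
    static boolean jobFlowCorrelated(MRJob parent, MRJob child) {
        return parent.partitionKey.equals(child.partitionKey);
    }
}
```

 For example, with hypothetical tables, a join of lineitem and orders on
 orderkey followed by a GROUP BY orderkey over its output is a parent/child
 pair with JFC.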
 The current implementation of the correlation optimizer only detects
 correlations among MR jobs for reduce-side join operators and reduce-side
 aggregation operators (not map-only aggregation). A query will be optimized
 if it satisfies the following conditions:
 # There exists an MR job for a reduce-side join operator or a reduce-side
 aggregation operator which has JFC with all of its parent MR jobs (TCs will
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not
 intermediate tables generated by sub-queries); and
 # No self join is involved in those correlated MR jobs.
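 The three conditions can likewise be read as a check over a toy plan. This
 sketch assumes a hypothetical PlanJob model (not a Hive class); it applies
 conditions 2 and 3 to the parent jobs, since those are the ones reading
 original tables, and treats a table listed twice in one job's inputs as a
 self join. It illustrates the stated rules, not the patch's implementation.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical, simplified plan model for illustration; not a Hive class.
final class PlanJob {
    enum Kind { REDUCE_SIDE_JOIN, REDUCE_SIDE_AGGREGATION, OTHER }

    final Kind kind;
    final List<String> partitionKey;  // shuffle (partition) key columns
    final List<String> inputTables;   // a table listed twice means a self join
    final boolean readsIntermediate;  // true if any input is a sub-query result
    final List<PlanJob> parents;      // MR jobs whose output this job consumes

    PlanJob(Kind kind, List<String> partitionKey, List<String> inputTables,
            boolean readsIntermediate, List<PlanJob> parents) {
        this.kind = kind;
        this.partitionKey = partitionKey;
        this.inputTables = inputTables;
        this.readsIntermediate = readsIntermediate;
        this.parents = parents;
    }

    boolean isReduceSide() {
        return kind == Kind.REDUCE_SIDE_JOIN || kind == Kind.REDUCE_SIDE_AGGREGATION;
    }
}

final class CorrelationEligibility {
    /** True if this job and its parent jobs satisfy the three conditions. */
    static boolean canMergeWithParents(PlanJob job) {
        if (!job.isReduceSide() || job.parents.isEmpty()) {
            return false;
        }
        for (PlanJob parent : job.parents) {
            // Only reduce-side joins and aggregations are considered.
            if (!parent.isReduceSide()) {
                return false;
            }
            // Condition 1: JFC with every parent (same partition key).
            if (!parent.partitionKey.equals(job.partitionKey)) {
                return false;
            }
            // Condition 2: the parent reads only original input tables.
            if (parent.readsIntermediate) {
                return false;
            }
            // Condition 3: no self join (no table read twice by one job).
            if (new HashSet<>(parent.inputTables).size() != parent.inputTables.size()) {
                return false;
            }
        }
        return true;
    }

    /** A query qualifies if at least one job in its plan can be merged. */
    static boolean queryQualifies(List<PlanJob> jobs) {
        return jobs.stream().anyMatch(CorrelationEligibility::canMergeWithParents);
    }
}
```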
 The correlation optimizer is implemented as a logical optimizer. The main
 reasons are that it only needs to manipulate the query plan tree and that it
 can leverage the existing component for generating MR jobs.
 The current implementation can serve as a framework for correlation-related
 optimizations. I think that this is better than adding individual optimizers.
 Several improvements can be made to this optimizer in the future. Here are
 three examples:
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve
 intermediate tables; and
 # Optimize queries involving self joins.
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc


[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time


 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: (was: HIVE-3718.1.patch.txt)


[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time


 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: HIVE-3718.1.patch.txt

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt





[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time


 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Status: Patch Available  (was: Open)


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread David Inbar (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500474#comment-13500474
]

David Inbar commented on HIVE-2206:
---

I will be on vacation through Friday Nov 23rd, but will be checking email and
voicemail periodically.

For all time-critical items, please call my mobile phone.

Many thanks,
David


[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-11-19 Thread Kevin Wilfong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500493#comment-13500493
 ] 

Kevin Wilfong commented on HIVE-3718:
-

+1


[jira] [Updated] (HIVE-3647) map-side groupby wrongly due to HIVE-3432

2012-11-19 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3647:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks Namit.

 map-side groupby wrongly due to HIVE-3432
 -

 Key: HIVE-3647
 URL: https://issues.apache.org/jira/browse/HIVE-3647
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3647.1.patch, hive.3647.2.patch, hive.3647.3.patch, 
 hive.3647.4.patch, hive.3647.5.patch, hive.3647.6.patch, hive.3647.7.patch, 
 hive.3647.8.patch


 There seems to be a bug due to HIVE-3432.
 We are converting the group by to a map side group by after only looking at
 sorting columns. This can give wrong results if the data is sorted and
 bucketed by different columns.
 Add some tests for that scenario, verify and fix any issues.
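 The failure mode can be made concrete with a small check. The sketch below
 is an assumption about what a safe condition looks like, not the committed
 fix: a map-side group by is only correct if every row of a group lands in the
 same bucket file (so the bucketing columns must be a subset of the group-by
 keys) and the rows of a group are contiguous within that file (so the
 group-by keys must cover exactly a prefix of the sort columns). It also
 assumes each bucket file is read by exactly one mapper. The class and method
 names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; illustrates the condition, not Hive's optimizer code.
final class MapSideGroupByCheck {
    static boolean canUseMapSideGroupBy(List<String> groupKeys,
                                        List<String> bucketCols,
                                        List<String> sortCols) {
        Set<String> keys = new HashSet<>(groupKeys);
        // An unbucketed table gives no guarantee that a group stays in one file.
        if (keys.isEmpty() || bucketCols.isEmpty()) {
            return false;
        }
        // Rows with equal group-by keys must hash to the same bucket.
        if (!keys.containsAll(bucketCols)) {
            return false;
        }
        // Rows with equal group-by keys must be adjacent in sort order.
        if (keys.size() > sortCols.size()) {
            return false;
        }
        Set<String> prefix = new HashSet<>(sortCols.subList(0, keys.size()));
        return prefix.equals(keys);
    }
}
```

 The second test case below (sorted by value, bucketed by key, grouped by
 value) is the kind of table the description warns about: looking at the sort
 columns alone would accept it.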


[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes


[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500500#comment-13500500
 ] 

Carl Steinbach commented on HIVE-3678:
--

The upgrade scripts look good to me. As for HIVE-3712, which is included in this
patch, I have started to wonder if it would be better for the metastore DB to
store the column stats values (e.g. min/max value, num trues/falses,
min/max/avg length, etc.) as a JSON text blob. This approach would make the code
more portable by eliminating dependencies on specific DBs and would also make it
easier to add new fields in the future. The big downside of this approach is
that we won't be able to push down column stats filters on these fields, but
I'm not convinced that this is a practical use case in the first place.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500499#comment-13500499
]

Yin Huai commented on HIVE-2206:

[~cwsteinbach]
If the optimizer is enabled by default, then based on my last tests, only
auto_join26.q is expected to fail, because it will be optimized by the
correlation optimizer. Apart from the query plan, the query result of
auto_join26.q is correct. Also, once I finish HIVE-3671 (I am working on it
right now), the failure of auto_join26.q should be eliminated.

add a new optimizer for query correlation discovery and optimization

[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time


 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: (was: HIVE-3718.1.patch.txt)


[jira] [Updated] (HIVE-3719) Improve HiveServer to support username/password authentication

2012-11-19 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3719:
---

Assignee: Yu Gao

 Improve HiveServer to support username/password authentication
 --

 Key: HIVE-3719
 URL: https://issues.apache.org/jira/browse/HIVE-3719
 Project: Hive
  Issue Type: Improvement
  Components: Authentication, JDBC
Affects Versions: 0.9.0
Reporter: Yu Gao
Assignee: Yu Gao
  Labels: security

 The current HiveServer implementation (call it HiveServer version 1 to
 distinguish it from HiveServer2, which is currently under development) does
 not have any authentication mechanism for connecting clients, which means
 anyone can access it, e.g. through the Hive JDBC driver, without any security
 control. The user and password properties are simply ignored by the Hive JDBC
 driver and never reach HiveServer1.
 It would be good to introduce an authentication infrastructure to HiveServer1,
 and to improve the JDBC driver implementation to support it, so that,
 together with the existing authorization infrastructure, connections and
 operations from applications that access HiveServer1 via the JDBC driver
 are under security control.
 Although HiveServer2 has been under implementation for a while, this
 improvement for HiveServer1 is still necessary to fill the big security
 hole, and would greatly benefit applications that use HiveServer1.
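 As a rough illustration of the server-side half of such a check, the sketch
 below verifies a username/password pair against salted PBKDF2 hashes and
 compares the derived bytes with MessageDigest.isEqual. The
 PasswordAuthenticator class, its in-memory user table, and the iteration
 count are hypothetical; the issue does not say where credentials would be
 stored or how the JDBC driver would pass them to HiveServer1.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical authenticator; not part of HiveServer or the Hive JDBC driver.
final class PasswordAuthenticator {
    private static final int ITERATIONS = 65_536;  // assumed work factor
    private static final int KEY_BITS = 256;

    private static final class Credential {
        final byte[] salt;
        final byte[] hash;

        Credential(byte[] salt, byte[] hash) {
            this.salt = salt;
            this.hash = hash;
        }
    }

    private final Map<String, Credential> users = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Registers a user, storing only a random salt and the derived hash. */
    void addUser(String user, String password) {
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        users.put(user, new Credential(salt, derive(password, salt)));
    }

    /** Returns true only if the user exists and the password matches. */
    boolean authenticate(String user, String password) {
        if (user == null || password == null) {
            return false;  // missing credentials are rejected, not ignored
        }
        Credential c = users.get(user);
        if (c == null) {
            return false;
        }
        // Running time does not depend on where the two byte arrays differ.
        return MessageDigest.isEqual(c.hash, derive(password, c.salt));
    }

    private static byte[] derive(String password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 unavailable", e);
        } finally {
            spec.clearPassword();
        }
    }
}
```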


[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-19 Thread Shreepadma Venugopalan (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500526#comment-13500526
]

Shreepadma Venugopalan commented on HIVE-3678:
--

With the changes from HIVE-3712, the column schema has *no* dependency on any
specific DB. The column schema, with the changes from HIVE-3712, uses simple
data types, which are supported across DBs. The primary motivation for making
the change to the schema in HIVE-3712 was to avoid storing column statistics
fields as a BLOB. The problems with using a BLOB are: a) BLOBs are designed to
store large volumes of data, on the order of GBs, and are hence stored outside
the row. A consequence of this design is that BLOBs don't perform well for
storing small amounts of data. While some DBs, such as Oracle, inline small
BLOBs, not all DBs do. While BLOBs are the only practical choice for storing
data whose size is not known in advance, they are overkill for storing around
100 bytes of data; and b) there is no uniform support across DB vendors and
versions. Hence I don't really see the value in storing this as a JSON BLOB.


[jira] [Updated] (HIVE-3709) Stop storing default ConfVars in temp file


 [ 
https://issues.apache.org/jira/browse/HIVE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3709:
-

Status: Open  (was: Patch Available)

@Kevin: I still see errors in TestHiveServerSessions when I run the test 
individually:

% ant clean package test -Dtestcase=TestHiveServerSessions

test:
 [echo] Project: service
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/Users/carl/.local/java/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/Users/carl/Work/repos/hive-test/build/ivy/lib/hadoop0.20.shim/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hive.service.TestHiveServerSessions
[junit] Hive history file=/Users/carl/Work/repos/hive-test/build/service/tmp/hive_job_log_carl_201211191056_789001489.txt
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 8.439 sec
[junit] Hive history file=/Users/carl/Work/repos/hive-test/build/service/tmp/hive_job_log_carl_201211191056_788616740.txt
[junit] [Fatal Error] :1:1: Content is not allowed in prolog.
[junit] [Fatal Error] :92:58: The element type name must be terminated by the matching end-tag /name.
[junit] Test org.apache.hadoop.hive.service.TestHiveServerSessions FAILED
  [for] /Users/carl/Work/repos/hive-test/service/build.xml: The following error occurred while executing this line:
  [for] /Users/carl/Work/repos/hive-test/build.xml:325: The following error occurred while executing this line:
  [for] /Users/carl/Work/repos/hive-test/build-common.xml:455: Tests failed!

BUILD FAILED
/Users/carl/Work/repos/hive-test/build.xml:320: Keepgoing execution: 1 of 12 iterations failed.


 Stop storing default ConfVars in temp file
 --

 Key: HIVE-3709
 URL: https://issues.apache.org/jira/browse/HIVE-3709
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3709.1.patch.txt, HIVE-3709.2.patch.txt


 To work around issues with Hadoop's Configuration object, specifically its
 addResource(InputStream), default configurations are written to a temp file
 (I think HIVE-2362 introduced this).
 This, however, introduces the problem that once that file is deleted from
 /tmp, the client crashes. This is particularly problematic for long-running
 services like the metastore server.
 Writing a custom InputStream to deal with the problems in the Configuration
 object should provide a workaround which does not introduce a time bomb
 into Hive.
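 One way to read the proposed workaround: keep the default configuration in
 memory and hand Configuration a stream that can be consumed more than once.
 The sketch below is hypothetical (the class name is invented, and it assumes
 the consumer closes the stream after each full read). It rewinds an
 in-memory ByteArrayInputStream when closed, so a later re-read sees the whole
 document and no file under /tmp is involved. Concurrent readers sharing one
 instance would also share its position, which this sketch does not address.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory stream that starts over each time it is closed.
final class RewindOnCloseInputStream extends ByteArrayInputStream {
    RewindOnCloseInputStream(byte[] contents) {
        super(contents);
    }

    /** Convenience factory for XML text such as a serialized default config. */
    static RewindOnCloseInputStream ofUtf8(String text) {
        return new RewindOnCloseInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public synchronized void close() {
        // Instead of releasing anything, rewind so the next reader sees everything.
        pos = 0;
        mark = 0;
    }
}
```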


[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes


[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500570#comment-13500570
 ] 

Carl Steinbach commented on HIVE-3678:
--

Sorry for the confusion. When I wrote blob I was trying to convey only that 
the field will be opaque to the DB (since it's a JSON struct), not that it will 
actually be stored in a BLOB column. If we store the JSON struct in a VARCHAR 
we have at least 4000 bytes to work with.
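To make the JSON-in-VARCHAR idea concrete, the sketch below serializes a flat
map of numeric column statistics into a JSON object and checks it against the
4000-byte budget mentioned above. The field names and the ColumnStatsJson
class are hypothetical. Adding a new statistic only adds a key, with no schema
change, while the database can no longer filter on individual fields. Length is
measured in UTF-8 bytes to stay conservative where VARCHAR limits count bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical encoder for illustration; not part of the Hive metastore.
final class ColumnStatsJson {
    /** Byte budget taken from the discussion: a VARCHAR of at least 4000 bytes. */
    static final int MAX_BYTES = 4000;

    /** Serializes numeric stats as a flat JSON object, keys in sorted order. */
    static String toJson(Map<String, ? extends Number> stats) {
        Map<String, Number> sorted = new TreeMap<>(stats);
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Number> e : sorted.entrySet()) {
            double d = e.getValue().doubleValue();
            if (Double.isNaN(d) || Double.isInfinite(d)) {
                // JSON has no literal for NaN or infinity.
                throw new IllegalArgumentException("non-finite value for " + e.getKey());
            }
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }

    /** True if the encoded text fits the column when stored as UTF-8. */
    static boolean fitsInColumn(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length <= MAX_BYTES;
    }

    // Keys are assumed to be plain identifiers; only quotes and backslashes are escaped.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```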


Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Nov. 19, 2012, 7:51 p.m.)


Review request for hive.


Changes
---

The correlation optimizer will guess which join operators at the bottom of the
plan (those whose input tables are not intermediate tables) will be converted
by auto join conversion, and will ignore those join operators during
correlation optimization.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated
MapReduce jobs into one job. I opened a new request since I have been working
on hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9fa9525 
  conf/hive-default.xml.template f332f3a 
  
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
 7c4c413 
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 330aa52 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 
  ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 
  ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 
  ql/src/test/results/compiler/plan/groupby5.q.xml 25e3583 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai

[jira] [Updated] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs

2012-11-19 Thread Arup Malakar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated HIVE-3648:
---

Attachment: HIVE_3648_branch_0.patch
HIVE_3648_trunk_1.patch

Patch available for branch. Added one missing abstract method in 
HadoopShimsSecure class.

Updated trunk review: https://reviews.facebook.net/D6759
Branch review: https://reviews.facebook.net/D6801

Thanks,
Arup

 HiveMetaStoreFsImpl is not compatible with hadoop viewfs
 

 Key: HIVE-3648
 URL: https://issues.apache.org/jira/browse/HIVE-3648
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0, 0.10.0
Reporter: Kihwal Lee
 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, 
 HIVE_3648_trunk_1.patch


 HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may 
 not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() 
 instead.  Please note that this method is not available in hadoop versions 
 earlier than 0.23.


[jira] [Created] (HIVE-3721) ALTER TABLE ADD PARTS should check for valid partition spec and throw a SemanticException if part spec is not valid

Pamela Vagata created HIVE-3721:
---

 Summary: ALTER TABLE ADD PARTS should check for valid partition 
spec and throw a SemanticException if part spec is not valid
 Key: HIVE-3721
 URL: https://issues.apache.org/jira/browse/HIVE-3721
 Project: Hive
  Issue Type: Task
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
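The summary leaves "valid" undefined. A minimal sketch, assuming a valid spec
for ADD PARTITION names each of the table's partition columns exactly once
(case-insensitively) with a non-empty value and nothing else. The validator is
hypothetical, and InvalidPartitionSpecException stands in for Hive's
SemanticException so the example compiles on its own.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Stand-in for Hive's SemanticException so the sketch is self-contained. */
final class InvalidPartitionSpecException extends Exception {
    InvalidPartitionSpecException(String message) {
        super(message);
    }
}

// Hypothetical validator; not Hive's DDLSemanticAnalyzer code.
final class PartitionSpecValidator {
    static void validateForAdd(List<String> partitionColumns, Map<String, String> spec)
            throws InvalidPartitionSpecException {
        Set<String> expected = new HashSet<>();
        for (String col : partitionColumns) {
            expected.add(col.toLowerCase(Locale.ROOT));
        }
        Set<String> seen = new HashSet<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            String key = e.getKey().toLowerCase(Locale.ROOT);
            if (!expected.contains(key)) {
                throw new InvalidPartitionSpecException("not a partition column: " + e.getKey());
            }
            if (e.getValue() == null || e.getValue().isEmpty()) {
                throw new InvalidPartitionSpecException("no value for partition column: " + e.getKey());
            }
            seen.add(key);
        }
        // Every partition column must be given a value.
        if (!seen.equals(expected)) {
            Set<String> missing = new HashSet<>(expected);
            missing.removeAll(seen);
            throw new InvalidPartitionSpecException("missing partition columns: " + missing);
        }
    }
}
```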





[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.19-r1410581.patch.txt

I just integrated HIVE-3671 into this patch. At the beginning of correlation
optimization, the optimizer predicts whether a join operator will be converted
by CommonJoinResolver. If so, the correlation optimizer annotates this join
operator and ignores it in the rest of the optimization. The prediction can
only be made for join operators whose input tables are not intermediate
tables. The prediction method is ported from CommonJoinResolver. Also, a test
is added in correlationoptimizer1.q.
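The prediction step can be sketched as follows. This is a simplification
under stated assumptions, not the code ported from CommonJoinResolver: a join
is predicted to become a map join if some input could be streamed as the big
table while the other inputs together fit under a small-table size threshold
(in the spirit of Hive's hive.mapjoin.smalltable.filesize setting). Sizes of
intermediate tables are unknown at this stage, so no prediction is made for
them. Outer-join restrictions on which input may be the big table are ignored.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper; not Hive's CommonJoinResolver.
final class MapJoinPrediction {
    /**
     * sizes.get(i) holds the size in bytes of join input i, or is empty when
     * that input is an intermediate table whose size is not yet known.
     */
    static boolean predictMapJoin(List<OptionalLong> sizes, long smallTableThreshold) {
        long total = 0;
        for (OptionalLong size : sizes) {
            if (!size.isPresent()) {
                return false;  // cannot predict for intermediate inputs
            }
            total += size.getAsLong();
        }
        // Try each input as the streamed big table; the rest must be small.
        for (OptionalLong size : sizes) {
            if (total - size.getAsLong() <= smallTableThreshold) {
                return true;
            }
        }
        return false;
    }
}
```

Joins for which such a check returns true would be annotated and then skipped
by the correlation optimizer.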

[~namit]
Please take a look at this patch. Let me know if you have any comment.

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
Attachments: HIVE-2206.10-r1384442.patch.txt,
HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt,
HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt,
HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt,
HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt,
HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt,
HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt,
HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt,
HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt,
HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch

[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-19 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500599#comment-13500599
 ] 

Ashutosh Chauhan commented on HIVE-3678:


I agree with Carl: making it easier to evolve, so that it is independent of the
exact type, will be a win. We already have one such use case with BigDecimal
support being added in HIVE-2693.

Also, the following looks like an unintentional change:
{code}
 -- Constraints for table PARTITION_KEYS
-ALTER TABLE PARTITION_KEYS ADD CONSTRAINT PARTITION_KEYS_FK1 FOREIGN KEY 
(TBL_ID) REFERENCES TBLS (TBL_ID) INITIALLY DEFERRED ;
+ALTER TABLE PARTITION_KEYS ADD CONSTRAINT PARTITION_KEYS_FK1 FOREIGN KEY 
(TBTB_ID) REFERENCES TBLS (TBL_ID) INITIALLY DEFERRED ;
{code}


 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby

[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization


[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500626#comment-13500626
 ] 

Carl Steinbach commented on HIVE-2206:
--

I'm surprised that auto_join26 is the only test that fails due to different 
EXPLAIN output. Is that because this optimization doesn't affect the queries in 
most tests, or because we don't consistently call EXPLAIN in the tests?

What is preventing us from enabling this by default right now?

 add a new optimizer for query correlation discovery and optimization
 

 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, it generates 
 a MapReduce job for every operation that may need to shuffle the data (e.g. 
 join and aggregation operations). However, such operations may have the 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
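
The three definitions can be written as pairwise predicates over a deliberately simplified, hypothetical model of an MR job: the set of input relations it reads and the key it partitions on. The sketch assumes partition keys have already been normalised to comparable column names; in a real plan they are expressions that would have to be traced back through the operator tree.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of an MR job for illustrating the YSmart correlations.
final class MRJob {
  final Set<String> inputRelations;
  final List<String> partitionKey;

  MRJob(Set<String> inputRelations, List<String> partitionKey) {
    this.inputRelations = inputRelations;
    this.partitionKey = partitionKey;
  }
}

final class Correlations {
  private Correlations() {}

  /** Input correlation (IC): the input relation sets are not disjoint. */
  static boolean inputCorrelated(MRJob a, MRJob b) {
    return !Collections.disjoint(a.inputRelations, b.inputRelations);
  }

  /** Transit correlation (TC): input correlation plus the same partition key. */
  static boolean transitCorrelated(MRJob a, MRJob b) {
    return inputCorrelated(a, b) && a.partitionKey.equals(b.partitionKey);
  }

  /** Job flow correlation (JFC): a job and its child partition on the same key. */
  static boolean jobFlowCorrelated(MRJob job, MRJob child) {
    return job.partitionKey.equals(child.partitionKey);
  }
}
```

For example, a job aggregating lineitem by l_orderkey and a job joining lineitem with orders on l_orderkey would be transit-correlated in this model, so one shuffle of lineitem on that key could in principle serve both.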
 The current implementation of the correlation optimizer only detects correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map-only aggregation). A query will be optimized if it 
 satisfies the following conditions:
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator that has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
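
The three conditions can be sketched as a single check over one reduce-side join or aggregation job and the MR jobs feeding it. `JobNode` and `canMerge` are hypothetical names; a self join is approximated as the same table being scanned twice within one job, and partition keys are again assumed to be normalised column names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of an MR job in the plan; not Hive's operator classes.
final class JobNode {
  final List<String> partitionKey;
  final List<String> inputTables;   // tables scanned by this job, one entry per scan
  final boolean readsIntermediate;  // true if any input is a sub-query's output

  JobNode(List<String> partitionKey, List<String> inputTables, boolean readsIntermediate) {
    this.partitionKey = partitionKey;
    this.inputTables = inputTables;
    this.readsIntermediate = readsIntermediate;
  }
}

final class CorrelationCheck {
  private CorrelationCheck() {}

  /** Returns true if the child job and all of its parent jobs satisfy the three conditions. */
  static boolean canMerge(JobNode child, List<JobNode> parents) {
    if (parents.isEmpty()) {
      return false;
    }
    for (JobNode parent : parents) {
      // Condition 1: JFC between the child and every parent job.
      if (!parent.partitionKey.equals(child.partitionKey)) {
        return false;
      }
      // Condition 2: parents must read original tables only.
      if (parent.readsIntermediate) {
        return false;
      }
      // Condition 3: no self join, approximated as a table scanned twice within one job.
      Set<String> seen = new HashSet<>();
      for (String table : parent.inputTables) {
        if (!seen.add(table)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

Note that two different parents reading the same table is allowed here: that is input correlation, which the optimizer wants to exploit, not a self join.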
 The correlation optimizer is implemented as a logical optimizer, mainly because 
 it only needs to manipulate the query plan tree and it can leverage 
 the existing component for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations, which I think is better than adding individual optimizers. 
 Several improvements can be made to this optimizer in the future. 
 Here are three examples:
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self joins. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc

Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #203

2012-11-19 Thread Apache Jenkins Server

See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/203/

--
[...truncated 36981 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-19_12-44-29_760_9041461297608391868/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_402762564.txt
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-19_12-44-33_658_2399849414089401271/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-19_12-44-33_658_2399849414089401271/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_1902789586.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_994263279.txt
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_1983954224.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (key int, value 
string)

Re: hive 0.10 release

2012-11-19 Thread Ashutosh Chauhan

Another quick update: I have created a hive-0.10 branch. At this point,
HIVE-3678 is a blocker for the 0.10 release. There are a few other
nice-to-haves, listed in my previous email. I will be happy to merge new
patches between now and the RC if folks request it and they are low risk.

Thanks,
Ashutosh
On Thu, Nov 15, 2012 at 2:29 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Good progress. Looks like folks are on board. I propose to cut the branch
 in the next couple of days. There are a few JIRAs that are patch-ready which I
 want to get into the hive-0.10 release, including HIVE-3255, HIVE-2517,
 HIVE-3400 and HIVE-3678.
 Ed has already made a request for HIVE-3083. If folks have other patches
 they want to see in 0.10, please chime in.
 Also, I request other committers to help review patches. There are
 quite a few in Patch Available state.

 Thanks,
 Ashutosh


 On Thu, Nov 8, 2012 at 3:22 PM, Owen O'Malley omal...@apache.org wrote:

 +1


 On Thu, Nov 8, 2012 at 3:18 PM, Carl Steinbach c...@cloudera.com wrote:

  +1
 
  On Wed, Nov 7, 2012 at 11:23 PM, Alexander Lorenz wget.n...@gmail.com
  wrote:
 
   +1, good karma
  
   On Nov 8, 2012, at 4:58 AM, Namit Jain nj...@fb.com wrote:
  
+1 to the idea
   
On 11/8/12 6:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

That sounds good. I think this issue needs to be solved, as well as
anything else that produces a bogus query result.

https://issues.apache.org/jira/browse/HIVE-3083

Edward

On Wed, Nov 7, 2012 at 7:50 PM, Ashutosh Chauhan hashut...@apache.org
wrote:
Hi,

It's been a while since we released 0.9, more than six months ago. All
this while, a lot of action has happened, with various cool features
landing in trunk. Additionally, I am looking forward to HiveServer2
landing in trunk. So, I propose that we cut the branch for 0.10 soon
afterwards and then release it. Thoughts?

Thanks,
Ashutosh
   
  
   --
   Alexander Alten-Lorenz
   http://mapredit.blogspot.com
   German Hadoop LinkedIn Group: http://goo.gl/N8pCF

Re: hive 0.10 release

2012-11-19 Thread kulkarni.swar...@gmail.com

There are a couple of enhancements I have been working on, mainly related
to the Hive/HBase integration. It would be awesome if it were possible to
include them in this release. None of them should be high risk. I have
submitted patches for a few of them and will try to get the others
submitted in the next couple of days. Is there a specific deadline I should
be working toward?

[1] https://issues.apache.org/jira/browse/HIVE-2599 (Patch Available)
[2] https://issues.apache.org/jira/browse/HIVE-3553 (Patch Available)
[3] https://issues.apache.org/jira/browse/HIVE-3211
[4] https://issues.apache.org/jira/browse/HIVE-3555
[5] https://issues.apache.org/jira/browse/HIVE-3725


-- 
Swarnim

Hive-trunk-h0.21 - Build # 1805 - Still Failing

2012-11-19 Thread Apache Jenkins Server

Changes for Build #1764
[kevinwilfong] HIVE-3610. Add a command Explain dependency ... (Sambavi 
Muthukrishnan via kevinwilfong)


Changes for Build #1765

Changes for Build #1766
[hashutosh] HIVE-3441 : testcases escape1,escape2 fail on windows (Thejas Nair 
via Ashutosh Chauhan)

[kevinwilfong] HIVE-3499. add tests to use bucketing metadata for partitions. 
(njain via kevinwilfong)


Changes for Build #1767
[kevinwilfong] HIVE-3276. optimize union sub-queries. (njain via kevinwilfong)


Changes for Build #1768

Changes for Build #1769

Changes for Build #1770
[namit] HIVE-3570 Add/fix facility to collect operator specific statisticsin 
hive + add hash-in/hash-out
counter for GroupBy Optr (Satadru Pan via namit)

[namit] HIVE-3554 Hive List Bucketing - Query logic
(Gang Tim Liu via namit)

[cws] HIVE-3563. Drop database cascade fails when there are indexes on any 
tables (Prasad Mujumdar via cws)


Changes for Build #1771
[kevinwilfong] HIVE-3640. Reducer allocation is incorrect if enforce bucketing 
and mapred.reduce.tasks are both set. (Vighnesh Avadhani via kevinwilfong)


Changes for Build #1772

Changes for Build #1773

Changes for Build #1774

Changes for Build #1775
[namit] HIVE-3673 Sort merge join not used when join columns have different 
names
(Kevin Wilfong via namit)


Changes for Build #1776
[kevinwilfong] HIVE-3627. eclipse misses library: 
javolution-@javolution-version@.jar. (Gang Tim Liu via kevinwilfong)


Changes for Build #1777
[kevinwilfong] HIVE-3524. Storing certain Exception objects thrown in 
HiveMetaStore.java in MetaStoreEndFunctionContext. (Maheshwaran Srinivasan via 
kevinwilfong)

[cws] HIVE-1977. DESCRIBE TABLE syntax doesn't support specifying a database 
qualified table name (Zhenxiao Luo via cws)

[cws] HIVE-3674. Test case TestParse broken after recent checkin (Sambavi 
Muthukrishnan via cws)


Changes for Build #1778
[cws] HIVE-1362. Column level scalar valued statistics on Tables and Partitions 
(Shreepadma Venugopalan via cws)


Changes for Build #1779

Changes for Build #1780
[kevinwilfong] HIVE-3686. Fix compile errors introduced by the interaction of 
HIVE-1362 and HIVE-3524. (Shreepadma Venugopalan via kevinwilfong)


Changes for Build #1781
[namit] HIVE-3687 smb_mapjoin_13.q is nondeterministic
(Kevin Wilfong via namit)


Changes for Build #1782
[hashutosh] HIVE-2715: Upgrade Thrift dependency to 0.9.0 (Ashutosh Chauhan)


Changes for Build #1783
[kevinwilfong] HIVE-3654. block relative path access in hive. (njain via 
kevinwilfong)

[hashutosh] HIVE-3658 : Unable to generate the Hbase related unit tests using 
velocity templates on Windows (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3661 : Remove the Windows specific = related swizzle path 
changes from Proxy FileSystems (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3480 : Resource leak: Fix the file handle leaks in Symbolic 
 Symlink related input formats. (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1784
[kevinwilfong] HIVE-3675. NaN does not work correctly for round(n). (njain via 
kevinwilfong)

[cws] HIVE-3651. bucketmapjoin?.q tests fail with hadoop 0.23 (Prasad Mujumdar 
via cws)


Changes for Build #1785
[namit] HIVE-3613 Implement grouping_id function
(Ian Gorbachev via namit)

[namit] HIVE-3692 Update parallel test documentation
(Ivan Gorbachev via namit)

[namit] HIVE-3649 Hive List Bucketing - enhance DDL to specify list bucketing 
table
(Gang Tim Liu via namit)


Changes for Build #1786
[namit] HIVE-3696 Revert HIVE-3483 which causes performance regression
(Gang Tim Liu via namit)


Changes for Build #1787
[kevinwilfong] HIVE-3621. Make prompt in Hive CLI configurable. (Jingwei Lu via 
kevinwilfong)

[kevinwilfong] HIVE-3695. TestParse breaks due to HIVE-3675. (njain via 
kevinwilfong)


Changes for Build #1788
[kevinwilfong] HIVE-3557. Access to external URLs in hivetest.py. (Ivan 
Gorbachev via kevinwilfong)


Changes for Build #1789
[hashutosh] HIVE-3662 : TestHiveServer: testScratchDirShouldClearWhileStartup 
is failing on Windows (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3659 : TestHiveHistory::testQueryloglocParentDirNotExist Test 
fails on Windows because of some resource leaks in ZK (Kanna Karanam via 
Ashutosh Chauhan)

[hashutosh] HIVE-3663 Unable to display the MR Job file path on Windows in case 
of MR job failures.  (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1790

Changes for Build #1791

Changes for Build #1792

Changes for Build #1793
[hashutosh] HIVE-3704 : name of some metastore scripts are not per convention 
(Ashutosh Chauhan)


Changes for Build #1794
[hashutosh] HIVE-3243 : ignore white space between entries of hive/hbase table 
mapping (Shengsheng Huang via Ashutosh Chauhan)

[hashutosh] HIVE-3215 : JobDebugger should use RunningJob.getTrackingURL 
(Bhushan Mandhani via Ashutosh Chauhan)


Changes for Build #1795
[cws] HIVE-3437. 0.23 compatibility: fix unit tests when building against 0.23 
(Chris Drome via cws)

[jira] [Commented] (HIVE-3722) Create index fails on CLI using remote metastore

2012-11-19 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500828#comment-13500828
 ] 

Namit Jain commented on HIVE-3722:
--

+1

 Create index fails on CLI using remote metastore
 

 Key: HIVE-3722
 URL: https://issues.apache.org/jira/browse/HIVE-3722
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3722.1.patch.txt


 If the CLI uses a remote metastore and the user attempts to create an index 
 without a comment, it will fail with an NPE.

[jira] [Commented] (HIVE-3722) Create index fails on CLI using remote metastore

2012-11-19 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500834#comment-13500834
 ] 

Ashutosh Chauhan commented on HIVE-3722:


Kevin,
I am not sure whether you have looked at the discussion on HIVE-2800. Adding a
null check may just be masking an underlying issue. I think it would be
worthwhile to uncover it, since this Thrift nuisance (null handling) may
bite us again in the future.

[jira] [Commented] (HIVE-3722) Create index fails on CLI using remote metastore

2012-11-19 Thread Kevin Wilfong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500847#comment-13500847
 ] 

Kevin Wilfong commented on HIVE-3722:
-

Ashutosh, I missed that JIRA.  But based on THRIFT-1625 it sounds like we have 
to add a check to our code.

[jira] [Commented] (HIVE-3589) describe/show partition/show tblproperties command should accept database name

2012-11-19 Thread Phabricator (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500857#comment-13500857
 ] 

Phabricator commented on HIVE-3589:
---

navis has commented on the revision HIVE-3589 [jira] describe/show 
partition/show tblproperties command should accept database name.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:1802 fixed.
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:1407 I 
just split the original method into two. The exception seemed to be for handling 
Thrift errors and should be re-thrown to the user.
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:1472 
Agreed. I'll do it.
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:1474 I 
always thought splitting with a regex pattern for this kind of simple string was 
a bit too much. But if it's cleaner, I'll do it.
  ql/src/java/org/apache/hadoop/hive/ql/plan/DescTableDesc.java:38 ok.
  ql/src/java/org/apache/hadoop/hive/ql/plan/DescTableDesc.java:112 I'll check 
on that.
  ql/src/java/org/apache/hadoop/hive/ql/plan/ShowPartitionsDesc.java:64 ok.
  ql/src/java/org/apache/hadoop/hive/ql/plan/ShowTblPropertiesDesc.java:34 ok.
  ql/src/test/queries/clientpositive/describe_table.q:5 Yes, it was HIVE-3676. 
I'll add the test.

REVISION DETAIL
  https://reviews.facebook.net/D6075

BRANCH
  DPAL-1916

To: JIRA, cwsteinbach, navis


 describe/show partition/show tblproperties command should accept database name
 --

 Key: HIVE-3589
 URL: https://issues.apache.org/jira/browse/HIVE-3589
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.8.1
Reporter: Sujesh Chirackkal
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3589.D6075.1.patch


 The describe command does not give the details when called as describe 
 dbname.tablename; it throws the error "Table dbname not found".
 Ex: hive -e "describe masterdb.table1" will throw the error
 Table masterdb not found
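
A minimal sketch of resolving a possibly qualified name with plain indexOf rather than a regex split, in the spirit of the review comment above. `QualifiedName` and the current-database fallback are hypothetical. It deliberately ignores that Hive's DESCRIBE also accepts a table.column form, which makes a.b ambiguous in the real grammar.

```java
// Hypothetical helper: split "db.table" into database and table names.
final class QualifiedName {
  final String database;
  final String table;

  private QualifiedName(String database, String table) {
    this.database = database;
    this.table = table;
  }

  /** Parses "table" or "db.table"; an unqualified name resolves to currentDatabase. */
  static QualifiedName parse(String name, String currentDatabase) {
    int dot = name.indexOf('.');
    if (dot < 0) {
      return new QualifiedName(currentDatabase, name);
    }
    // Reject empty parts ("db.", ".table") and more than one dot.
    if (dot == 0 || dot == name.length() - 1 || name.indexOf('.', dot + 1) >= 0) {
      throw new IllegalArgumentException("Invalid table name: " + name);
    }
    return new QualifiedName(name.substring(0, dot), name.substring(dot + 1));
  }
}
```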

[jira] [Updated] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type


 [ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alten-Lorenz updated HIVE-3635:
-

Attachment: (was: HIVE-3635.patch)

  allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for 
 the boolean hive type
 ---

 Key: HIVE-3635
 URL: https://issues.apache.org/jira/browse/HIVE-3635
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.9.0
Reporter: Alexander Alten-Lorenz
Assignee: Alexander Alten-Lorenz
 Fix For: 0.10.0

 Attachments: HIVE-3635.patch


 Interpret t as true and f as false for boolean types; PostgreSQL exports 
 represent booleans that way.
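
A sketch of the requested parsing rule. `LenientBoolean` is a hypothetical name; keeping case-insensitive true/false and mapping anything unrecognised to null (SQL NULL) are assumptions about how the new literals would sit alongside the existing ones, not a description of the attached patch.

```java
// Hypothetical parser for the boolean text forms proposed in HIVE-3635.
final class LenientBoolean {
  private LenientBoolean() {}

  /** Returns TRUE or FALSE for recognised text, or null (SQL NULL) otherwise. */
  static Boolean parse(String text) {
    if (text == null) {
      return null;
    }
    switch (text) {
      case "t": case "T": case "1":
        return Boolean.TRUE;
      case "f": case "F": case "0":
        return Boolean.FALSE;
      default:
        if (text.equalsIgnoreCase("true")) {
          return Boolean.TRUE;
        }
        if (text.equalsIgnoreCase("false")) {
          return Boolean.FALSE;
        }
        return null;
    }
  }
}
```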

[jira] [Updated] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type


 [ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alten-Lorenz updated HIVE-3635:
-

Status: Patch Available  (was: Open)

[jira] [Updated] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type


 [ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alten-Lorenz updated HIVE-3635:
-

Attachment: HIVE-3635.patch

[jira] [Commented] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type