Changes for Build #2171
[hashutosh] HIVE-4618 : show create table creating unusable DDL when field delimiter is \001 (Navis via Ashutosh Chauhan)
[hashutosh] HIVE-4559 : hcatalog/webhcat scripts in tar.gz don't have execute permissions set (Eugene Koifman via Ashutosh Chauhan)
[hashutosh] HIVE-4798 : NPE when we call isSame from an instance of ExprNodeConstantDesc with null value (Yin Huai via Ashutosh Chauhan)
[hashutosh] HIVE-4781 : LEFT SEMI JOIN generates wrong results when the number of rows belonging to a single key of the right table exceed hive.join.emit.interval (Yin Huai via Ashutosh Chauhan)
[hashutosh] HIVE-4647 : RetryingHMSHandler logs too many error messages (Navis via Ashutosh Chauhan)
[hashutosh] HIVE-4692 : Constant agg parameters will be replaced by ExprNodeColumnDesc with single-sourced multi-gby cases (Navis via Ashutosh Chauhan)

Changes for Build #2172
[hashutosh] HIVE-4781 : Adding new data files for tests. Missed in original commit.

Changes for Build #2173

Changes for Build #2174
[navis] HIVE-2517 : Support group by on struct type (Ashutosh Chauhan via Navis)
[hashutosh] HIVE-4406 : Missing / or /<dbname> in hs2 jdbc uri switches mode to embedded mode (Anandha Ranganathan via Ashutosh Chauhan)
[hashutosh] HIVE-4430 : Semantic analysis fails in presence of certain literals in on clause (Kevin Wilfong via Ashutosh Chauhan)
[hashutosh] HIVE-4757 : LazyTimestamp goes into irretrievable NULL mode once inited with NULL once (Gopal V via Ashutosh Chauhan)
[hashutosh] HIVE-4785 : Implement isCaseSensitive for Hive JDBC driver (Robert Roland via Ashutosh Chauhan)

Changes for Build #2175
[navis] HIVE-4436 : hive.exec.parallel=true doesn't work on hadoop-2 (Gopal V via Navis)

Changes for Build #2176

Changes for Build #2177
[hashutosh] HIVE-4689 : For outerjoins, joinEmitInterval might make wrong result (Navis via Ashutosh Chauhan)
[hashutosh] HIVE-3253 : ArrayIndexOutOfBounds exception for deeply nested structs (Thejas Nair via Ashutosh Chauhan)

Changes for Build #2178

Changes for Build #2179

Changes for Build #2180

Changes for Build #2181
[hashutosh] HIVE-4089 : javax.jdo : jdo2-api dependency not in Maven Central (Navis via Ashutosh Chauhan)
[ecapriolo] HIVE-4804 parallel order by fails for small datasets (Navis via egc)
Submitted by: Navis
Reviewed by: Edward Capriolo

Changes for Build #2182

Changes for Build #2183
[hashutosh] HIVE-4814 : Adjust WebHCat e2e tests until HIVE4703 is addressed (Eugene Koifman via Ashutosh Chauhan)

Changes for Build #2184
[hashutosh] HIVE-4811 : (Slightly) break up the SemanticAnalyzer monstrosity (Gunther Hagleitner via Ashutosh Chauhan)

Changes for Build #2185
[hashutosh] HIVE-4251 : Indices can't be built on tables whose schema info comes from SerDe (Mark Wagner via Ashutosh Chauhan)
[hashutosh] HIVE-4805 : Enhance coverage of package org.apache.hadoop.hive.ql.exec.errors (Ivan Veselovsky via Ashutosh Chauhan)

Changes for Build #2186
[hashutosh] HIVE-4733 : HiveLockObjectData is not compared properly (Navis via Ashutosh Chauhan)
[ecapriolo] HIVE-3475 INLINE UDTF does not convert types properly (Igor Kabiljo and Navis Ryu via egc)
Submitted by: Navis Ryu and Igor Kabiljo
Reviewed by: Edward Capriolo

Changes for Build #2187
[hashutosh] HIVE-4802 : Fix url check for missing / or /<db> after hostname in jdb uri (Thejas Nair via Ashutosh Chauhan)

Changes for Build #2188
[hashutosh] HIVE-4813 : Improve test coverage of package org.apache.hadoop.hive.ql.optimizer.pcr (Ivan Veselovsky via Ashutosh Chauhan)
[hashutosh] HIVE-4580 : Change DDLTask to report errors using canonical error messages rather than http status codes (Eugene Koifman via Ashutosh Chauhan)
[hashutosh] HIVE-4796 : Increase coverage of package org.apache.hadoop.hive.common.metrics (Ivan Veselovsky via Ashutosh Chauhan)
[navis] HIVE-4812 : Logical explain plan (Gunther Hagleitner V via Navis)
[hashutosh] HIVE-3810 : HiveHistory.log need to replace \r with space before writing Entry.value to historyfile (Mark Grover via Ashutosh Chauhan)

Changes for Build #2189
[hashutosh] HIVE-4810 [jira] Refactor exec package (Gunther Hagleitner via Ashutosh Chauhan)
Summary: HIVE-4810 The exec package contains both operators and classes used to execute the job. Moving the latter into a sub package makes the package slightly more manageable and will make it easier to provide a tez-based implementation.
Test Plan: Refactoring
Reviewers: ashutoshc
Reviewed By: ashutoshc
Differential Revision: https://reviews.facebook.net/D11625
[hashutosh] HIVE-4829 : TestWebHCatE2e checkstyle violation causes all tests to fail (Eugene Koifman via Ashutosh Chauhan)
[hashutosh] HIVE-4819 : Comments in CommonJoinOperator for aliasTag is not valid (Navis via Ashutosh Chauhan)

Changes for Build #2190
[hashutosh] HIVE-4807 : Hive metastore hangs (Sarvesh Sakalanaga via Ashutosh Chauhan)
[hashutosh] HIVE-4833 : Fix eclipse template classpath to include the correct jdo lib (Yin Huai via Ashutosh Chauhan)
[hashutosh] HIVE-4830 : Test clientnegative/nested_complex_neg.q got broken due to 4580 (Vikram Dixit via Ashutosh Chauhan)

Changes for Build #2191
[hashutosh] HIVE-3691 : TestDynamicSerDe failed with IBM JDK (Bing Li & Renata Ghisloti via Ashutosh Chauhan)

Changes for Build #2192

Changes for Build #2193

Changes for Build #2194

Changes for Build #2195
[hashutosh] HIVE-4840 : Fix eclipse template classpath to include the BoneCP lib (Yin Huai via Ashutosh Chauhan)

Changes for Build #2196
[navis] HIVE-4290 : Build profiles: Partial builds for quicker dev (Gunther Hagleitner via Navis)
[navis] HIVE-4658 : Make KW_OUTER optional in outer joins (Edward Capriolo via Navis)

Changes for Build #2197

Changes for Build #2198

Changes for Build #2199
[hashutosh] HIVE-4852 : -Dbuild.profile=core fails (Gunther Hagleitner via Ashutosh Chauhan)
[hashutosh] HIVE-4854 : testCliDriver_load_hdfs_file_with_space_in_the_name fails on hadoop 2 (Gunther Hagleitner via Ashutosh Chauhan)
[hashutosh] HIVE-4853 : junit timeout needs to be updated (Gunther Hagleitner via Ashutosh Chauhan)
[hashutosh] HIVE-4721 : Fix TestCliDriver.ptf_npath.q on 0.23 (Gunther Hagleitner via Ashutosh Chauhan)

Changes for Build #2200
[ecapriolo] HIVE-3603 Enable client-side caching for scans on HBase (Navis Ryu via EGC)
Submitted by: Navis Ryu
Reviewed by: Edward Capriolo

Changes for Build #2201

Changes for Build #2203
[daijy] HIVE-4820 : webhcat_config.sh should set default values for HIVE_HOME and HCAT_PREFIX that work with default build tree structure (Eugene Koifman via Jianyong Dai)
[hashutosh] HIVE-4845 : Correctness issue with MapJoins using the null safe operator (Brock Noland via Ashutosh Chauhan)

Changes for Build #2204
[brock] HIVE-4865 - HiveLockObjects: Unlocking retries/times out when query contains ":" (Gunther Hagleitner via Brock Noland)

Changes for Build #2205
[hashutosh] HIVE-2206 [jira] add a new optimizer for query correlation discovery and optimization (Yin Huai via Ashutosh Chauhan)
Summary: update test results
This issue proposes a new logical optimizer called Correlation Optimizer, which merges correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom.
Since Hive translates queries sentence by sentence, it generates a separate MapReduce job for every operation that may need to shuffle the data (e.g. join and aggregation operations). However, such operations may be correlated in the ways defined below, in which case they can be executed in a single MR job.
Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint.
Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation but also the same partition key.
Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.
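The three correlation definitions above reduce to simple set and key comparisons. The following is a minimal sketch of those predicates only, assuming a hypothetical, simplified MRJob record that carries just a set of input table names and a partition key; it is not Hive's Correlation Optimizer code, and none of these names come from the Hive codebase. It needs Java 16+ for records.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class CorrelationSketch {
    // Hypothetical, simplified model of an MR job: the tables it reads and the
    // partition (shuffle) key its reduce side is keyed on. Not a Hive class.
    record MRJob(Set<String> inputTables, List<String> partitionKey) {}

    // Input correlation (IC): the input relation sets are not disjoint.
    static boolean inputCorrelated(MRJob a, MRJob b) {
        return !Collections.disjoint(a.inputTables(), b.inputTables());
    }

    // Transit correlation (TC): input correlation plus the same partition key.
    static boolean transitCorrelated(MRJob a, MRJob b) {
        return inputCorrelated(a, b) && a.partitionKey().equals(b.partitionKey());
    }

    // Job flow correlation (JFC): a job has the same partition key as one of
    // its child nodes in the plan tree.
    static boolean jobFlowCorrelated(MRJob job, MRJob child) {
        return job.partitionKey().equals(child.partitionKey());
    }

    public static void main(String[] args) {
        // An aggregation over t1 grouped by key, and a join of t1 with t2 on key.
        MRJob agg = new MRJob(Set.of("t1"), List.of("key"));
        MRJob join = new MRJob(Set.of("t1", "t2"), List.of("key"));
        // A job over t3 only: same partition key, but no shared input.
        MRJob other = new MRJob(Set.of("t3"), List.of("key"));

        System.out.println("IC(agg, join)  = " + inputCorrelated(agg, join));
        System.out.println("TC(agg, join)  = " + transitCorrelated(agg, join));
        System.out.println("TC(agg, other) = " + transitCorrelated(agg, other));
        // For illustration only, treat agg as a child node of join.
        System.out.println("JFC(join, agg) = " + jobFlowCorrelated(join, agg));
    }
}
```

Running main prints true for IC and TC between agg and join, false for TC between agg and other (the keys match but the inputs are disjoint, so there is no IC), and true for JFC. The sketch deliberately ignores how a child's output becomes its parent's input; it only checks the conditions as stated in the definitions.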
The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions: (1) there exists an MR job for a reduce-side join operator or reduce-side aggregation operator that has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists); (2) all input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and (3) no self join is involved in those correlated MR jobs.
The correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and that it can leverage the existing component for generating MR jobs. The current implementation can serve as a framework for correlation-related optimizations, which I think is better than adding individual optimizers.
Several pieces of future work could improve this optimizer. Here are three examples: (1) support queries that only involve TC; (2) support queries in which the input tables of correlated MR jobs involve intermediate tables; and (3) optimize queries involving self join.
References: Paper and presentation of YSmart.
Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc
Test Plan: EMPTY
Reviewers: JIRA, ashutoshc
Reviewed By: ashutoshc
CC: brock
Differential Revision: https://reviews.facebook.net/D11097
[ecapriolo] HIVE-4873 Sort candidate functions in case of UDFArgumentException (Xuefu Zhang via egc)
Submitted by: Xuefu Zhang
Reviewed by: Edward Capriolo

Changes for Build #2206

Changes for Build #2207
[ecapriolo] HIVE-4675 Create new parallel unit test environment (Brock Noland via egc)
Submitted by: Brock Noland
Reviewed by: Edward Capriolo

Changes for Build #2208

Changes for Build #2209
[gates] Enable parallel execution of various E2E tests (deepeshk via gates)
[hashutosh] HIVE-4730 : Join on more than 2^31 records on single reducer failed (wrong results) (Navis via Ashutosh Chauhan)
[brock] HIVE-4818: SequenceId in operator is not thread safe (Edward Capriolo via Brock Noland)
[brock] HIVE-4874 Identical methods PTFDeserializer.addOIPropertiestoSerDePropsMap(), PTFTranslator.addOIPropertiestoSerDePropsMap() (Edward Capriolo via Brock Noland)

Changes for Build #2210

Changes for Build #2211
[hashutosh] HIVE-4877 : In ExecReducer, remove tag from the row which will be passed to the first Operator at the Reduce-side (Yin Huai via Ashutosh Chauhan)
[omalley] HIVE-4724 Better detection of non-ORC files in the ORC reader (omalley)

Changes for Build #2212
All tests passed
The Apache Jenkins build system has built Hive-trunk-h0.21 (build #2212)
Status: Still Failing
Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/2212/ to view the results.