Changes for Build #2171
[hashutosh] HIVE-4618 : show create table creating unusable DDL when field 
delimiter is \001 (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-4559 : hcatalog/webhcat scripts in tar.gz don't have execute 
permissions set (Eugene Koifman via Ashutosh Chauhan)

[hashutosh] HIVE-4798 : NPE when we call isSame from an instance of 
ExprNodeConstantDesc with null value (Yin Huai via Ashutosh Chauhan)

[hashutosh] HIVE-4781 : LEFT SEMI JOIN generates wrong results when the number 
of rows belonging to a single key of the right table exceed 
hive.join.emit.interval (Yin Huai via Ashutosh Chauhan)

[hashutosh] HIVE-4647 : RetryingHMSHandler logs too many error messages (Navis 
via Ashutosh Chauhan)

[hashutosh] HIVE-4692 : Constant agg parameters will be replaced by 
ExprNodeColumnDesc with single-sourced multi-gby cases (Navis via Ashutosh 
Chauhan)


Changes for Build #2172
[hashutosh] HIVE-4781 : Adding new data files for tests. Missed in original 
commit.


Changes for Build #2173

Changes for Build #2174
[navis] HIVE-2517 : Support group by on struct type (Ashutosh Chauhan via Navis)

[hashutosh] HIVE-4406 : Missing / or /<dbname> in hs2 jdbc uri switches mode to 
embedded mode (Anandha Ranganathan via Ashutosh Chauhan)

[hashutosh] HIVE-4430 : Semantic analysis fails in presence of certain literals 
in on clause (Kevin Wilfong via Ashutosh Chauhan)

[hashutosh] HIVE-4757 : LazyTimestamp goes into irretrievable NULL mode once 
inited with NULL once (Gopal V via Ashutosh Chauhan)

[hashutosh] HIVE-4785 : Implement isCaseSensitive for Hive JDBC driver (Robert 
Roland via Ashutosh Chauhan)


Changes for Build #2175
[navis] HIVE-4436 : hive.exec.parallel=true doesn't work on hadoop-2
(Gopal V via Navis)


Changes for Build #2176

Changes for Build #2177
[hashutosh] HIVE-4689 : For outerjoins, joinEmitInterval might make wrong 
result (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-3253 : ArrayIndexOutOfBounds exception for deeply nested 
structs (Thejas Nair via Ashutosh Chauhan)


Changes for Build #2178

Changes for Build #2179

Changes for Build #2180

Changes for Build #2181
[hashutosh] HIVE-4089 : javax.jdo : jdo2-api dependency not in Maven Central 
(Navis via Ashutosh Chauhan)

[ecapriolo] HIVE-4804 parallel order by fails for small datasets (Navis via egc)

Submitted by:   Navis
Reviewed by:    Edward Capriolo


Changes for Build #2182

Changes for Build #2183
[hashutosh] HIVE-4814 : Adjust WebHCat e2e tests until HIVE-4703 is addressed 
(Eugene Koifman via Ashutosh Chauhan)


Changes for Build #2184
[hashutosh] HIVE-4811 : (Slightly) break up the SemanticAnalyzer monstrosity 
(Gunther Hagleitner via Ashutosh Chauhan)


Changes for Build #2185
[hashutosh] HIVE-4251 : Indices can't be built on tables whose schema info 
comes from SerDe (Mark Wagner via Ashutosh Chauhan)

[hashutosh] HIVE-4805 : Enhance coverage of package 
org.apache.hadoop.hive.ql.exec.errors (Ivan Veselovsky via Ashutosh Chauhan)


Changes for Build #2186
[hashutosh] HIVE-4733 : HiveLockObjectData is not compared properly (Navis via 
Ashutosh Chauhan)

[ecapriolo] HIVE-3475 INLINE UDTF does not convert types properly (Igor Kabiljo 
and Navis Ryu via egc)

Submitted by:   Navis Ryu and Igor Kabiljo
Reviewed by:    Edward Capriolo


Changes for Build #2187
[hashutosh] HIVE-4802 : Fix url check for missing / or /<db> after hostname in 
jdbc uri (Thejas Nair via Ashutosh Chauhan)


Changes for Build #2188
[hashutosh] HIVE-4813 : Improve test coverage of package 
org.apache.hadoop.hive.ql.optimizer.pcr (Ivan Veselovsky via Ashutosh Chauhan)

[hashutosh] HIVE-4580 : Change DDLTask to report errors using canonical error 
messages rather than http status codes (Eugene Koifman via Ashutosh Chauhan)

[hashutosh] HIVE-4796 : Increase coverage of package 
org.apache.hadoop.hive.common.metrics (Ivan Veselovsky via Ashutosh Chauhan)

[navis] HIVE-4812 : Logical explain plan (Gunther Hagleitner via Navis)

[hashutosh] HIVE-3810 : HiveHistory.log need to replace \r with space before 
writing Entry.value to historyfile (Mark Grover via Ashutosh Chauhan)


Changes for Build #2189
[hashutosh] HIVE-4810 [jira] Refactor exec package
(Gunther Hagleitner via Ashutosh Chauhan)

Summary:
HIVE-4810

The exec package contains both operators and classes used to execute the job. 
Moving the latter into a sub package makes the package slightly more manageable 
and will make it easier to provide a tez-based implementation.

Test Plan: Refactoring

Reviewers: ashutoshc

Reviewed By: ashutoshc

Differential Revision: https://reviews.facebook.net/D11625

[hashutosh] HIVE-4829 : TestWebHCatE2e checkstyle violation causes all tests to 
fail (Eugene Koifman via Ashutosh Chauhan)

[hashutosh] HIVE-4819 : Comments in CommonJoinOperator for aliasTag is not 
valid (Navis via Ashutosh Chauhan)


Changes for Build #2190
[hashutosh] HIVE-4807 : Hive metastore hangs (Sarvesh Sakalanaga via Ashutosh 
Chauhan)

[hashutosh] HIVE-4833 : Fix eclipse template classpath to include the correct 
jdo lib (Yin Huai via Ashutosh Chauhan)

[hashutosh] HIVE-4830 : Test clientnegative/nested_complex_neg.q got broken due 
to HIVE-4580 (Vikram Dixit via Ashutosh Chauhan)


Changes for Build #2191
[hashutosh] HIVE-3691 : TestDynamicSerDe failed with IBM JDK (Bing Li & Renata 
Ghisloti via Ashutosh Chauhan)


Changes for Build #2192

Changes for Build #2193

Changes for Build #2194

Changes for Build #2195
[hashutosh] HIVE-4840 : Fix eclipse template classpath to include the BoneCP 
lib (Yin Huai via Ashutosh Chauhan)


Changes for Build #2196
[navis] HIVE-4290 : Build profiles: Partial builds for quicker dev (Gunther 
Hagleitner via Navis)

[navis] HIVE-4658 : Make KW_OUTER optional in outer joins (Edward Capriolo via 
Navis)


Changes for Build #2197

Changes for Build #2198

Changes for Build #2199
[hashutosh] HIVE-4852 : -Dbuild.profile=core fails (Gunther Hagleitner via 
Ashutosh Chauhan)

[hashutosh] HIVE-4854 : testCliDriver_load_hdfs_file_with_space_in_the_name 
fails on hadoop 2 (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4853 : junit timeout needs to be updated (Gunther Hagleitner 
via Ashutosh Chauhan)

[hashutosh] HIVE-4721 : Fix TestCliDriver.ptf_npath.q on 0.23 (Gunther 
Hagleitner via Ashutosh Chauhan)


Changes for Build #2200
[ecapriolo] HIVE-3603 Enable client-side caching for scans on HBase (Navis Ryu 
via EGC)

Submitted by:   Navis Ryu
Reviewed by:    Edward Capriolo


Changes for Build #2201

Changes for Build #2203
[daijy] HIVE-4820 : webhcat_config.sh should set default values for HIVE_HOME 
and HCAT_PREFIX that work with default build tree structure (Eugene Koifman via 
Jianyong Dai)

[hashutosh] HIVE-4845 : Correctness issue with MapJoins using the null safe 
operator (Brock Noland via Ashutosh Chauhan)


Changes for Build #2204
[brock] HIVE-4865 - HiveLockObjects: Unlocking retries/times out when query 
contains ":" (Gunther Hagleitner via Brock Noland)


Changes for Build #2205
[hashutosh] HIVE-2206 [jira] add a new optimizer for query correlation 
discovery and optimization
(Yin Huai via Ashutosh Chauhan)

Summary:
update test results

This issue proposes a new logical optimizer called Correlation Optimizer, which 
is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The 
idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and 
slides of YSmart are linked at the bottom.

Since Hive translates queries sentence by sentence, it generates a separate 
MapReduce job for every operation that may need to shuffle the data (e.g. join 
and aggregation operations). However, such operations may involve the 
correlations explained below, in which case they can be executed in a single 
MR job.

        Input Correlation: Multiple MR jobs have input correlation (IC) if 
their input relation sets are not disjoint;
        Transit Correlation: Multiple MR jobs have transit correlation (TC) if 
they have not only input correlation, but also the same partition key;
        Job Flow Correlation: An MR job has job flow correlation (JFC) with one 
of its child nodes if it has the same partition key as that child node.
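The three correlation definitions above can be restated as simple predicates. 
The following is a minimal Python sketch, not code from the patch: the MRJob 
class, its fields, the function names, and the example tables (src, tmp_a, 
tmp_b) are all hypothetical, and each job is reduced to just its set of input 
tables and its partition (shuffle) key.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MRJob:
    """Hypothetical stand-in for one MR job: only its inputs and shuffle key."""
    name: str
    inputs: FrozenSet[str]          # tables (relations) the job reads
    partition_key: Tuple[str, ...]  # columns the job shuffles on


def input_correlation(a: MRJob, b: MRJob) -> bool:
    """IC: the jobs' input relation sets are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


def transit_correlation(a: MRJob, b: MRJob) -> bool:
    """TC: input correlation plus the same partition key."""
    return input_correlation(a, b) and a.partition_key == b.partition_key


def job_flow_correlation(job: MRJob, child: MRJob) -> bool:
    """JFC: a job has the same partition key as one of its child jobs."""
    return job.partition_key == child.partition_key


# Hypothetical plan: two GROUP BY key jobs over the same table feed a
# JOIN ON key job that reads their intermediate outputs.
agg_a = MRJob("agg_a", frozenset({"src"}), ("key",))
agg_b = MRJob("agg_b", frozenset({"src"}), ("key",))
join_job = MRJob("join", frozenset({"tmp_a", "tmp_b"}), ("key",))
# A job over the same table that shuffles on a different column.
agg_v = MRJob("agg_v", frozenset({"src"}), ("value",))

if __name__ == "__main__":
    print(transit_correlation(agg_a, agg_b))    # True
    print(job_flow_correlation(agg_a, join_job))  # True
    print(input_correlation(agg_a, agg_v))      # True
    print(transit_correlation(agg_a, agg_v))    # False
```

Here both aggregation jobs have JFC with the join job, which is the pattern 
the summary describes as a merge candidate. The other conditions listed below 
(original input tables, no self join) are not modeled in this sketch.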

The current implementation of the correlation optimizer only detects 
correlations among MR jobs for reduce-side join operators and reduce-side 
aggregation operators (not map-only aggregation). A query will be optimized if 
it satisfies the following conditions:

        There exists an MR job for a reduce-side join operator or reduce-side 
aggregation operator that has JFC with all of its parent MR jobs (TCs will 
also be exploited if JFC exists);
        All input tables of those correlated MR jobs are original input tables 
(not intermediate tables generated by sub-queries); and
        No self join is involved in those correlated MR jobs.

Correlation optimizer is implemented as a logical optimizer. The main reasons 
are that it only needs to manipulate the query plan tree and it can leverage 
the existing component for generating MR jobs.

The current implementation can serve as a framework for correlation-related 
optimizations. I think that this is better than adding individual optimizers.

Several improvements can be made to this optimizer in the future. Here are 
three examples:

        Support queries that only involve TC;
        Support queries in which input tables of correlated MR jobs involve 
intermediate tables; and
        Optimize queries involving self join.

References:
Paper and presentation of YSmart.
Paper: 
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

Test Plan: EMPTY

Reviewers: JIRA, ashutoshc

Reviewed By: ashutoshc

CC: brock

Differential Revision: https://reviews.facebook.net/D11097

[ecapriolo] HIVE-4873 Sort candidate functions in case of UDFArgumentException 
(Xuefu Zhang via egc)

Submitted by:   Xuefu Zhang
Reviewed by:    Edward Capriolo


Changes for Build #2206

Changes for Build #2207
[ecapriolo] HIVE-4675 Create new parallel unit test environment (Brock Noland 
via egc)

Submitted by:   Brock Noland
Reviewed by:    Edward Capriolo


Changes for Build #2208

Changes for Build #2209
[gates] Enable parallel execution of various E2E tests (deepeshk via gates)

[hashutosh] HIVE-4730 : Join on more than 2^31 records on single reducer failed 
(wrong results) (Navis via Ashutosh Chauhan)

[brock] HIVE-4818: SequenceId in operator is not thread safe (Edward Capriolo 
via Brock Noland)

[brock] HIVE-4874 Identical methods 
PTFDeserializer.addOIPropertiestoSerDePropsMap(), 
PTFTranslator.addOIPropertiestoSerDePropsMap() (Edward Capriolo via Brock Noland)


Changes for Build #2210

Changes for Build #2211
[hashutosh] HIVE-4877 : In ExecReducer, remove tag from the row which will be 
passed to the first Operator at the Reduce-side (Yin Huai via Ashutosh Chauhan)

[omalley] HIVE-4724 Better detection of non-ORC files in the ORC reader 
(omalley)


Changes for Build #2212



All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #2212)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/2212/ to 
view the results.
