This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hive-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 4fd8629 deploy: 8c2160824c476a53b74aee27bedaac7595b8157e
4fd8629 is described below
commit 4fd862910cb625905f6fa4870763ef1d6a4adf41
Author: zabetak <[email protected]>
AuthorDate: Mon Sep 1 13:39:13 2025 +0000
deploy: 8c2160824c476a53b74aee27bedaac7595b8157e
---
community/resources/hive-apis-overview/index.html | 21 --
community/resources/index.xml | 4 +-
.../resources/unit-testing-hive-sql/index.html | 41 +---
.../column-statistics-in-hive/index.html | 28 ---
.../hivereplicationdevelopment/index.html | 6 -
development/desingdocs/index.xml | 4 +-
development/desingdocs/links/index.html | 263 ---------------------
development/desingdocs/listbucketing/index.html | 112 +--------
development/desingdocs/llap/index.html | 8 -
.../desingdocs/skewed-join-optimization/index.html | 24 --
.../admin/adminmanual-configuration/index.html | 3 -
.../index.html | 1 -
.../admin/hive-on-spark-getting-started/index.html | 21 --
docs/latest/admin/index.html | 2 +-
docs/latest/admin/index.xml | 2 +-
docs/latest/admin/replication/index.html | 3 +-
.../index.html | 36 ---
docs/latest/language/index.html | 2 +-
docs/latest/language/index.xml | 2 +-
docs/latest/language/languagemanual-ddl/index.html | 2 -
.../language/languagemanual-types/index.html | 1 -
docs/latest/language/reflectudf/index.html | 39 +--
docs/latest/language/supported-features/index.html | 75 +-----
.../user/configuration-properties/index.html | 22 --
docs/latest/user/hive-transactions-acid/index.html | 2 -
docs/latest/user/hive-transactions/index.html | 2 -
docs/latest/user/hiveserver2-clients/index.html | 1 -
docs/latest/user/index.html | 2 +-
docs/latest/user/index.xml | 2 +-
docs/latest/user/multidelimitserde/index.html | 35 +--
docs/latest/user/serde/index.html | 68 +-----
index.json | 2 +-
index.xml | 14 +-
33 files changed, 32 insertions(+), 818 deletions(-)
diff --git a/community/resources/hive-apis-overview/index.html
b/community/resources/hive-apis-overview/index.html
index 14baf3d..5ad5c8c 100644
--- a/community/resources/hive-apis-overview/index.html
+++ b/community/resources/hive-apis-overview/index.html
@@ -167,7 +167,6 @@ Last updated: December 12, 2024
<li><a href=#streaming-data-ingest-java>Streaming Data Ingest (Java)</a></li>
<li><a href=#streaming-mutation-java>Streaming Mutation (Java)</a></li>
<li><a href=#hive-jdbc-jdbc>hive-jdbc (JDBC)</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -196,7 +195,6 @@ Last updated: December 12, 2024
<li><a href=#streaming-data-ingest-java>Streaming Data Ingest (Java)</a></li>
<li><a href=#streaming-mutation-java>Streaming Mutation (Java)</a></li>
<li><a href=#hive-jdbc-jdbc>hive-jdbc (JDBC)</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -231,24 +229,6 @@ Last updated: December 12, 2024
<p>Operation based Java API focused on mutating (insert/update/delete) records
into transactional tables using Hive’s <a
href=https://hive.apache.org/docs/latest/user/hive-transactions/>ACID</a>
feature. Large volumes of mutations are applied atomically in a single
long-lived transaction. Documented <a
href=https://hive.apache.org/docs/latest/hcatalog/hcatalog-streaming-mutation-api/>on
the wiki</a>. Scheduled for release in Hive version 2.0.0 (<a
href=https://issues.apache.org/jira/brow [...]
<h2 id=hive-jdbc-jdbc>hive-jdbc (JDBC)</h2>
<p>JDBC API supported by Hive. It supports most of the functionality in JDBC
spec.</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Page created after <a
href="https://issues.apache.org/jira/browse/HIVE-12285?focusedCommentId=14981551&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14981551">an
interesting discussion</a>.</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by teabot at Oct 30, 2015 17:09
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -292,7 +272,6 @@ Last updated: December 12, 2024
<li><a href=#streaming-data-ingest-java>Streaming Data Ingest (Java)</a></li>
<li><a href=#streaming-mutation-java>Streaming Mutation (Java)</a></li>
<li><a href=#hive-jdbc-jdbc>hive-jdbc (JDBC)</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
diff --git a/community/resources/index.xml b/community/resources/index.xml
index d4ef5de..04a2b7a 100644
--- a/community/resources/index.xml
+++ b/community/resources/index.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Resources on Hive
Site</title><link>https://hive.apache.org/community/resources/</link><description>Recent
content in Resources on Hive Site</description><generator>Hugo --
gohugo.io</generator><language>en-us</language><lastBuildDate>Thu, 24 Jul 2025
00:00:00 +0000</lastBuildDate><atom:link
href="https://hive.apache.org/community/resources/index.xml" rel="se [...]
-Apache Hive : Hive APIs Overview API categories Operation based APIs Query
based APIs Available APIs HCatClient (Java) HCatalog Storage Handlers (Java)
HCatalog CLI (Command Line) Metastore (Java) WebHCat (REST) Streaming Data
Ingest (Java) Streaming Mutation (Java) hive-jdbc (JDBC) Comments: API
categories The APIs can be segmented into two conceptual categories: operation
based APIs and query based APIs.</description></item><item><title>Apache Hive :
HiveDeveloperFAQ</title><link>https [...]
+Apache Hive : Hive APIs Overview API categories Operation based APIs Query
based APIs Available APIs HCatClient (Java) HCatalog Storage Handlers (Java)
HCatalog CLI (Command Line) Metastore (Java) WebHCat (REST) Streaming Data
Ingest (Java) Streaming Mutation (Java) hive-jdbc (JDBC) API categories The
APIs can be segmented into two conceptual categories: operation based APIs and
query based APIs.</description></item><item><title>Apache Hive :
HiveDeveloperFAQ</title><link>https://hive.ap [...]
Apache Hive : Guide for Committers New committers Review Reject PreCommit
runs, and committing patches Commit Backporting commits to previous branches
Dialog New committers New committers are encouraged to first read
Apache&rsquo;s generic committer
documentation:</description></item><item><title>Apache Hive :
HowToContribute</title><link>https://hive.apache.org/community/resources/howtocontribute/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0000</pubDate><guid>https://hive.apache.org/ [...]
Apache Hive : How to Contribute Getting the Source Code Becoming a Contributor
Making Changes Coding Conventions Understanding Maven Understanding Hive
Branches Hadoop Dependencies Unit Tests Add a Unit Test Submitting a PR
Fetching a PR from Github Contributing Your Work JIRA Guidelines Generating
Thrift Code See Also Getting the Source Code First of all, you need the Hive
source code.</description></item><item><title>Apache Hive :
HowToRelease</title><link>https://hive.apache.org/commu [...]
To ensure that the IMetaStoreClient implementations provide the same API we
created a set of tests to validate their workings.
@@ -11,4 +11,4 @@ Apache Hivemall (incubating) Apache Hivemall is a scalable
machine learning lib
Apache Sentry (incubating) Sentry is a role-based authorization system for
Apache Hive.</description></item><item><title>Apache Hive : Running
Yetus</title><link>https://hive.apache.org/community/resources/running-yetus/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/community/resources/running-yetus/</guid><description>Apache
Hive : Running Yetus Overview Yetus was added to Hive in release 3.0.0 to run
checks on the new patches. See HIVE-15051.
There are several rules already defined by the community, but most of them are
not enforced.
Yetus helps us by checking these rules for newly introduced errors. Note that
Yetus checks only the changed part of the code. If any unchanged code contains
errors, then Yetus will not report them, but all of the new code should conform
to the rules.</description></item><item><title>Apache Hive :
TestingDocs</title><link>https://hive.apache.org/community/resources/testingdocs/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/community/resources/testin [...]
-Hive Developer FAQ: Testing Developer Guide: Unit Tests Unit Testing Hive SQL
Running Yetus MetaStore API Tests Query File
Test(qtest)</description></item><item><title>Apache Hive : Unit Testing Hive
SQL</title><link>https://hive.apache.org/community/resources/unit-testing-hive-sql/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/community/resources/unit-testing-hive-sql/</guid><description>Apache
Hive : Unit Testing Hive SQL Apache Hive : Unit Testi [...]
\ No newline at end of file
+Hive Developer FAQ: Testing Developer Guide: Unit Tests Unit Testing Hive SQL
Running Yetus MetaStore API Tests Query File
Test(qtest)</description></item><item><title>Apache Hive : Unit Testing Hive
SQL</title><link>https://hive.apache.org/community/resources/unit-testing-hive-sql/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/community/resources/unit-testing-hive-sql/</guid><description>Apache
Hive : Unit Testing Hive SQL Apache Hive : Unit Testi [...]
\ No newline at end of file
diff --git a/community/resources/unit-testing-hive-sql/index.html
b/community/resources/unit-testing-hive-sql/index.html
index ffff0c8..0a890c6 100644
--- a/community/resources/unit-testing-hive-sql/index.html
+++ b/community/resources/unit-testing-hive-sql/index.html
@@ -166,11 +166,7 @@ Last updated: December 12, 2024
<li><a href=#tools-and-frameworks>Tools and frameworks</a></li>
<li><a href=#useful-practices>Useful practices</a></li>
<li><a href=#relevant-issues>Relevant issues</a></li>
-<li><a href=#other-hive-unit-testing-concerns>Other Hive unit testing
concerns</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#other-hive-unit-testing-concerns>Other Hive unit testing
concerns</a></li>
</ul>
</nav>
</div>
@@ -195,11 +191,7 @@ Last updated: December 12, 2024
<li><a href=#tools-and-frameworks>Tools and frameworks</a></li>
<li><a href=#useful-practices>Useful practices</a></li>
<li><a href=#relevant-issues>Relevant issues</a></li>
-<li><a href=#other-hive-unit-testing-concerns>Other Hive unit testing
concerns</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#other-hive-unit-testing-concerns>Other Hive unit testing
concerns</a></li>
</ul>
</nav>
</aside>
@@ -291,29 +283,6 @@ GROUP BY ...; -- Query 1
</ul>
<h1 id=other-hive-unit-testing-concerns>Other Hive unit testing concerns</h1>
<p>Although not specifically related to Hive SQL, tooling exists for the
testing of other aspects of the Hive ecosystem. In particular the <a
href=https://github.com/HotelsDotCom/beeju>BeeJU</a> project provides JUnit
rules to simplify the testing of integrations with the Hive Metastore and
HiveServer2 services. These are useful if, for example, you are developing
alternative data processing frameworks or tools that aim to leverage
Hive’s metadata features.</p>
-<p> </p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Disclosure: The tools are listed according to level of experience I have
with each tool, HiveRunner being the tool that I have used the most.
Furthermore, I have previously contributed to the HiveRunner project.
I’ve also been involved with the BeeJU project.</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by teabot at Nov 11, 2015 10:20
-|
-|
-Where does <a
href=https://cwiki.apache.org/confluence/download/attachments/27362054/CapybaraHiveMeetupNov2015.pptx>Capybara</a>
fit into this (it at all)?</p>
-<p>Posted by teabot at Dec 03, 2015 11:30
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -356,11 +325,7 @@ Where does <a
href=https://cwiki.apache.org/confluence/download/attachments/2736
<li><a href=#tools-and-frameworks>Tools and frameworks</a></li>
<li><a href=#useful-practices>Useful practices</a></li>
<li><a href=#relevant-issues>Relevant issues</a></li>
-<li><a href=#other-hive-unit-testing-concerns>Other Hive unit testing
concerns</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#other-hive-unit-testing-concerns>Other Hive unit testing
concerns</a></li>
</ul>
</nav>
</div>
diff --git a/development/desingdocs/column-statistics-in-hive/index.html
b/development/desingdocs/column-statistics-in-hive/index.html
index 2797c6c..fc29ad5 100644
--- a/development/desingdocs/column-statistics-in-hive/index.html
+++ b/development/desingdocs/column-statistics-in-hive/index.html
@@ -160,7 +160,6 @@ Last updated: December 12, 2024
<li><a href=#metastore-thrift-api><strong>Metastore Thrift
API</strong></a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -181,7 +180,6 @@ Last updated: December 12, 2024
<li><a href=#metastore-thrift-api><strong>Metastore Thrift
API</strong></a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -339,31 +337,6 @@ bool delete_table_column_statistics(1:string db_name,
2:string tbl_name, 3:strin
4:InvalidInputException o4)</p>
<p>Note that delete_column_statistics is needed to remove the entries from the
metastore when a table is dropped. Also note that currently Hive doesn’t
support drop column.</p>
<p>Note that in V1 of the project, we will support only scalar statistics.
Furthermore, we will support only static partitions, i.e., both the partition
key and partition value should be specified in the analyze command. In a
following version, we will add support for height balanced histograms as well
as support for dynamic partitions in the analyze command for column level
statistics.</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Shreepadma, is there a jira for this ? Is this ready for review, or is it
a initial design ?</td>
-</tr>
-<tr>
-<td>Also, can you go over <a
href=https://issues.apache.org/jira/browse/HIVE-3421>https://issues.apache.org/jira/browse/HIVE-3421</a>
and see how the two are related ?</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by namit.jain at Sep 14, 2012 00:51
-|
-|
-Namit, This patch is ready for review. There is already a JIRA for this -
HIVE-1362. I’ve the patch on both JIRA and reviewboard. Please note that
this goes beyond HIVE-3421 - this patch adds the stats specified on both this
wiki and the JIRA page. Thanks.</p>
-<p>Posted by shreepadma at Oct 03, 2012 00:46
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -400,7 +373,6 @@ Namit, This patch is ready for review. There is already a
JIRA for this - HIVE-1
<li><a href=#metastore-thrift-api><strong>Metastore Thrift
API</strong></a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
diff --git a/development/desingdocs/hivereplicationdevelopment/index.html
b/development/desingdocs/hivereplicationdevelopment/index.html
index af74d51..e3e2546 100644
--- a/development/desingdocs/hivereplicationdevelopment/index.html
+++ b/development/desingdocs/hivereplicationdevelopment/index.html
@@ -439,12 +439,6 @@ Last updated: December 12, 2024
<p><a href=https://issues.apache.org/jira/browse/HIVE-7973>HIVE-7973</a>
tracks progress on developing replication in Hive.</p>
<h1 id=references>References</h1>
<p>[1] Kemme, B., et al., “Database Replication: A Tutorial,” in
<em>Replication: Theory and Practice</em>, B. Charron-Bost et al., Eds. Berlin,
Germany: Springer, 2010, pp. 219-252.</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/development/desingdocs/index.xml b/development/desingdocs/index.xml
index d2e0458..e183068 100644
--- a/development/desingdocs/index.xml
+++ b/development/desingdocs/index.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Design Documents on
Hive
Site</title><link>https://hive.apache.org/development/desingdocs/</link><description>Recent
content in Design Documents on Hive Site</description><generator>Hugo --
gohugo.io</generator><language>en-us</language><lastBuildDate>Thu, 24 Jul 2025
00:00:00 +0000</lastBuildDate><atom:link
href="https://hive.apache.org/development/desingdoc [...]
-Make Apache Hive’s data model and metadata services accessible to users of the
Apache Pig dataflow programming language as well as other Hadoop language
runtimes. Make it possible for Hive users and users of other Hadoop language
runtimes to share data stored in Hive’s HDFS data
warehouse.</description></item><item><title>Apache Hive : Binary DataType
Proposal</title><link>https://hive.apache.org/development/desingdocs/binary-datatype-proposal/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0 [...]
+Make Apache Hive’s data model and metadata services accessible to users of the
Apache Pig dataflow programming language as well as other Hadoop language
runtimes. Make it possible for Hive users and users of other Hadoop language
runtimes to share data stored in Hive’s HDFS data
warehouse.</description></item><item><title>Apache Hive : Binary DataType
Proposal</title><link>https://hive.apache.org/development/desingdocs/binary-datatype-proposal/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0 [...]
set hive.optimize.correlation=true; 1. Overview In Hadoop environments, an SQL
query submitted to Hive will be evaluated in distributed systems. Thus, after
generating a query operator tree representing the submitted SQL query, Hive
needs to determine what operations can be executed in a task which will be
evaluated in a single node.</description></item><item><title>Apache Hive :
Default Constraint
(HIVE-18726)</title><link>https://hive.apache.org/development/desingdocs/default-constrain
[...]
Background With the addition of DEFAULT constraint (HIVE-18726) user can
define columns to have default value which will be used in case user doesn’t
explicitly specify it while INSERTING data. For DEFAULT constraint to kick in
user has to explicitly specify column schema leaving out the column name for
which user would like the system to use DEFAULT
value.</description></item><item><title>Apache Hive : Dependent
Tables</title><link>https://hive.apache.org/development/desingdocs/dependent
[...]
create table T (key string, value string) partitioned by (ds string, hr
string);
@@ -34,7 +34,7 @@ Howl wiki Yahoo group for Howl developers (including mailing
list archive) Howl
Materialized views with automatic rewriting can result in very similar
results. Hive 2.3.0 adds support for materialized
views.</description></item><item><title>Apache Hive : IndexDev
Bitmap</title><link>https://hive.apache.org/development/desingdocs/indexdev-bitmap/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/indexdev-bitmap/</guid><description>Apache
Hive : Bitmap Indexing Apache Hive : Bitmap Indexing Introduction Approac [...]
Bitmap indexing (http://en.wikipedia.org/wiki/Bitmap_index) is a standard
technique for indexing columns with few distinct
values, such as gender.
-Approach We want to develop a bitmap index that can reuse as much of the
existing Compact Index code as
possible.</description></item><item><title>Apache Hive :
Links</title><link>https://hive.apache.org/development/desingdocs/links/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/links/</guid><description>Apache
Hive : Links Motivation Today, the infrastructure provided by Hive allows for
the setup of a single shared warehouse [...]
+Approach We want to develop a bitmap index that can reuse as much of the
existing Compact Index code as
possible.</description></item><item><title>Apache Hive :
Links</title><link>https://hive.apache.org/development/desingdocs/links/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/links/</guid><description>Apache
Hive : Links Motivation Today, the infrastructure provided by Hive allows for
the setup of a single shared warehouse [...]
There are many tables of the following format:
create table T(a, b, c, .</description></item><item><title>Apache Hive :
LLAP</title><link>https://hive.apache.org/development/desingdocs/llap/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/llap/</guid><description>Apache
Hive : LLAP Live Long And Process (LLAP) functionality was added in Hive 2.0
(HIVE-7926 and associated tasks). HIVE-9850 links documentation, features, and
issues for this enhancement. For configuration of L [...]
Apache Hive : LLAP Overview Persistent Daemon Execution Engine Query Fragment
Execution I/O Caching Workload Management ACID Support Security Monitoring Web
Services SLIDER on YARN Deployment LLAP Status Resources Attachments: Overview
Hive has become significantly faster thanks to various features and
improvements that were built by the community in recent years, including Tez
and Cost-based-optimization.</description></item><item><title>Apache Hive :
Locking</title><link>https://hive.a [...]
diff --git a/development/desingdocs/links/index.html
b/development/desingdocs/links/index.html
index 896fba1..4215d03 100644
--- a/development/desingdocs/links/index.html
+++ b/development/desingdocs/links/index.html
@@ -159,7 +159,6 @@ Last updated: December 12, 2024
<li><a
href=#modeling-namespace-as-a-role-in-hive-using-the-authorization-model>Modeling
Namespace as a Role in Hive (using the authorization model)</a></li>
<li><a href=#modeling-namespace-by-tagging-objects>Modeling Namespace by
tagging objects</a></li>
<li><a
href=#modeling-namespace-as-a-database-but-using-views-for-imports>Modeling
Namespace as a database but using views for imports</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -257,267 +256,6 @@ Last updated: December 12, 2024
<li>We would like to differentiate the set of partitions that are available
for the same imported tables across namespaces. This would require partition
pruning based on the view partitions in query rewrite which is not how it works
today.</li>
</ul>
<p>The above notes make it clear that what we are trying to build is a very
special case of a degenerate view, and it would be cleaner to introduce a new
concept in Hive to model these ‘imports’.</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Questions from Ashutosh Chauhan (with inline responses):</td>
-</tr>
-</tbody>
-</table>
-<ul>
-<li>
-<p>What exactly is contained in tracking capacity usage. One is disk space.
That I presume you are going to track via summing size under database
directory. Are you also thinking of tracking resource usage in terms of
CPU/memory/network utilization for different teams?
-<em>Right now the capacity usage in Hive we will track is the disk space
(managed tables that belong to the namespace + imported tables). We will track
the mappers and reducers that the namepace utilizes directly from
Hadoop.</em></p>
-</li>
-<li>
-<p>Each namespace (ns) will have exactly one database. If so, then users are
not allowed to create/use databases in such deployment? Not necessarily a
problem, just trying to understand design.
-<em>This is correct – this is a limitation of the design. Introducing a new
concept seemed heavyweight, so we re-used databases for namespaces. But this
approach means that a given namespace cannot have sub-databases in it.</em></p>
-</li>
-<li>
-<p>How are you going to keep metadata consistent across two ns? If metadata
gets updated in remote ns, will it get automatically updated in user’s
local ns? If yes, how will this be implemented? If no, then every time user
need to use data from remote ns, she has to bring metadata uptodate in her ns.
How will she do it?
-<em>Metadata will be kept in sync for linked tables. We will make alter table
on the remote table (source of the link) cause an update to the target of the
link. Note that from a Hive perspective, the metadata for the source and target
of a link is in the same metastore.</em></p>
-</li>
-<li>
-<p>Is it even required that metadata of two linked tables to be consistent?
Seems like user has to run “alter link add partition” herself for
each partition. She can choose only to add few partitions. In this case, tables
in two ns have different number of partitions and thus data.
-<em>What you say above is true for static links. For dynamic links, add and
drop partition on the source of the link will cause the target to get those
partitions as well (we trap alter table add/drop partition to provide this
behavior).</em></p>
-</li>
-<li>
-<p>Who is allowed to create links?
-<em>Any user on the database who has create/all privileges on the database. We
could potentially create a new privilege for this, but I think create privilege
should suffice. We can similarly map alter, drop privileges to the appropriate
operations.</em></p>
-</li>
-<li>
-<p>Once user creates a link, who can use it? If everyone is allowed to access,
then I don’t see how is it different from the problem that you are
outlining in first alternative design option, wherein user having an access to
two ns via roles has access to data on both ns.
-<em>The link creates metadata in the target database. So you can only access
data that has been linked into this database (access is via the T@Y or Y.T
syntax depending on the chosen design option). Note that this is different than
having a role that a user maps to since in that case, there is no local
metadata in the target database specifying if the imported data is accessible
from this database.</em></p>
-</li>
-<li>
-<p>If links are first class concepts, then authorization model also needs to
understand them? I don’t see any mention of that.
-<em>Yes, we need to account for the authorization model.</em></p>
-</li>
-<li>
-<p>I see there is a hdfs jira for implementing hard links of files in hdfs
layer, so that takes care of linking physical data on hdfs. What about tables
whose data is stored in external systems. For example, hbase. Does hbase also
needs to implement feature of hard-linking their table for hive to make use of
this feature? What about other storage handlers like cassandra, mongodb etc.
-<em>The link does not create a link on HDFS. It just points to the source
table/partitions. One can think of it as a Hive-level link so there is no need
for any changes/features from the other storage handlers.</em></p>
-</li>
-<li>
-<p>Migration will involve two step process of distcp’ing data from one
cluster to another and then replicating one mysql instance to another. Are
there any other steps? Do you plan to (later) build tools to automate this
process of migration.
-<em>We will be building tools to enable migration of a namespace to another
cluster. Migration will involve replicating the metadata and the data as you
mention above.</em></p>
-</li>
-<li>
-<p>When migrating ns from one datacenter to another, will links be dropped or
they are also preserved?
-<em>We will preserve them – by copying the data for the links to the other
datacenter.</em></p>
-</li>
-</ul>
-<p>Posted by sambavi at May 22, 2012 02:10
-|
-|
-The first draft of this proposal is very hard to decipher because it relies on
terms that aren’t well defined. For example, here’s the second
sentence from the motivations section:
-Growth beyond a single warehouse (or) separation of capacity usage and
allocation requires the creation of multiple physical warehouses, i.e.,
separate Hive instances.
-What’s the difference between a <em>warehouse</em> and a <em>physical
warehouse</em>? How do you define a <em>Hive instance</em>? In the requirements
section the term <em>virtual warehouse</em> is introduced and equated to a
namespace, but clearly it’s more than that because otherwise DBs/Schemas
would suffice. Can you please update the proposal to include definitions of
these terms?</p>
-<p>Posted by cwsteinbach at May 22, 2012 18:35
-|
-|
-Prevent access using two part name syntax (Y.T) if namespaces feature is
“on” in a Hive instance. This ensures the database is
self-contained.
-The cross-namespace Hiveconf ACL proposed in HIVE-3016 doesn’t prevent
anyone from doing anything because there is no way to keep users from disabling
it. I’m surprised to see this ticket mentioned here since three
committers have already gone on record saying that this is the wrong approach,
and one committer even -1’d it. If preventing cross-db references in
queries is a requirement for this project, then I think Hive’s
authorization mechanism will need to be ex [...]
-<p>Posted by cwsteinbach at May 22, 2012 18:48
-|
-|
-From the design section:
-We are building a namespace service external to Hive that has metadata on
namespace location across the Hive instances, and allows importing data across
Hive instances using replication.
-Does the work proposed in HIVE-2989 also include adding this Db/Table
replication infrastructure to Hive?</p>
-<p>Posted by cwsteinbach at May 22, 2012 18:53
-|
-|
-We mention the JIRA here for the sake of completeness. We are implementing
this as a pre-execution hook for now, but support for namespaces will be
incomplete without this control (since you can’t guarantee self-contained
namespaces unless you prevent two-part name access).
-What extensions to the authorization system are you thinking of? One idea
would be to set role for a session (corresponding to the namespace the user is
operating in), so that a user operating in the context of that role can only
see the data available to that role.</p>
-<p>Posted by sambavi at May 22, 2012 19:42
-|
-|
-What extensions to the authorization system are you thinking of?
-Add a new privilege named something like “select_cross_db” and
GRANT it to specific users as follows:
-GRANT select_cross_db ON DATABASE db TO USER x;
-GRANT select_cross_db ON DATABASE db to ROLE x;
-This privilege would be provided by default, but if absent, then the user
would be prevented from referencing DBs outside of ‘db’ while using
‘db’ as the primary database.</p>
-<p>Posted by cwsteinbach at May 22, 2012 20:01
-|
-|
-Thanks Carl - this is an interesting suggestion - for installations with
namespaces, we would need to turn this privilege off by default and have no
users. groups or roles be granted this privilege. We’ll discuss
internally.</p>
-<p>Posted by sambavi at May 22, 2012 20:25
-|
-|
-I’ve edited the main page to include a definition for physical warehouse
and removed the term Hive instance to reduce ambiguity.</p>
-<p>Posted by sambavi at May 22, 2012 20:26
-|
-|
-No Hive-2989 will not include the replication infrastructure. We plan to
provide replication in the second half of the year.</p>
-<p>Posted by sambavi at May 23, 2012 10:30
-|
-|
-I have opened a JIRA for adding a new privilege for cross database commands
and resolved HIVE-3016. Thanks for the suggestion!</p>
-<p>Posted by sambavi at May 23, 2012 12:36
-|
-|
-This proposal describes the DDL and metadata changes that are necessary to
support DB Links, but it doesn’t include any details about the mechanics
of replicating data across clusters (it’s probably a little more
complicated than just running distcp). I think the proposal needs to include
these details before it can be considered complete.
-No Hive-2989 will not include the replication infrastructure. We plan to
provide replication in the second half of the year.
-The metadata extensions described in this proposal will require users to run
metastore upgrade scripts, and the DDL extensions will become part of the
public API. The former imposes a burden on users, and the latter constitutes a
continuing maintenance burden on the people who contribute to this project.
Taking this into account I think we need to be able to demonstrate that the new
code tangibly benefits users before it appears in a Hive release. I don’t
think it will be possible [...]
-<p>Posted by cwsteinbach at May 25, 2012 19:21
-|
-|
-Some thoughts on this from our team:</p>
-<ol>
-<li>Even in a single physical warehouse, namespaces allow better quota
management and isolation between independent teams’ data/workload. This
is independent of security and replication considerations.</li>
-<li>Further, replication as a feature is pretty big and will take a while to
be built out. We can hide the table link feature behind a config parameter so
that it’s not exposed to users who don’t need it until it’s completed. The
only piece we cannot hide is the metastore changes, but the upgrade script for
the metastore will just add a few columns in a few tables, and should not take more
than a few minutes even for a pretty large warehouse (few thousand tables +
~100,000 partitions). I [...]
-If there was links support to start with in hive, we would have used it from
the very beginning, and not gotten into the mess of one large warehouse with
difficulty in dealing with multi-tenancy. We seriously believe that this is the
right direction for the community, and all new users can design the warehouse
in the right way from the very start, and learn from Facebook’s
experience.</li>
-</ol>
-<p>Posted by sambavi at May 25, 2012 21:37
-|
-|
-Even in a single physical warehouse, namespaces allow better quota management
and isolation between independent teams’ data/workload. This is
independent of security and replication considerations.
-Hive already provides support for namespaces in the form of databases/schemas.
As far as I can tell the table link feature proposed here actually weakens the
isolation guarantees provided by databases/schemas, and consequently will make
quota and workload management between teams more complicated. In order to
resolve this disagreement I think it would help if you provided some concrete
examples of how the table link feature improves this situation.</p>
-<p>Posted by cwsteinbach at May 29, 2012 13:54
-|
-|
-Suppose there are 2 teams which want to use the same physical warehouse.
-Team 1 wants to use the following: (let us say that each table is partitioned
by date)
-T1 (all partitions)
-T2 (all partitions)
-T3 (all partitions)
-Team 2 wants to use the following:
-T1 (partitions for the last 3 days)
-T2 (partition for a fixed day: say May 29, 2012)
-T4 (all partitions)
-Using the current hive architecture, we can perform the following:</p>
-<ul>
-<li>Use a single database and have scripts for quota</li>
-<li>Use 2 databases and copy the data in both the databases (say, the
databases are DB1 and DB2 respectively)</li>
-<li>Use 2 databases, and use views in database 2 (to be used by team 2).</li>
-</ul>
-<p>The problems with these approaches is as follows:</p>
-<ul>
-<li>Table Discovery etc. becomes very messy. You can do that via tags, but
then, all the functionality that is provided by databases
-can also be provided via tags.</li>
-<li>Duplication of data</li>
-<li>The user will have to perform the management himself. When a partition
gets added to DB1.T1, the corresponding partition needs to be added to
DB2.View1, and the 3 day old partition from DB2.View1 needs to be dropped. This
has to be done outside hive, and makes
-the task of maintaining these partitions very difficult - how do you make sure
this is atomic etc. The user has to do a lot more scripting.</li>
-</ul>
-<p>Links is a degenerate case of views. With links, the above use case can be
solved very easily.
-This is a real use case at Facebook today, and I think, there will be similar
use cases for other users. Maybe, they are not solving it in the most optimal
manner currently.</p>
-<p>Posted by namit.jain at May 29, 2012 18:33
-|
-|
-Furthermore, databases don’t provide isolation today since two part name
access is unrestricted. By introducing a trackable way of accessing content
outside the current database (using table links), we get isolation for a
namespace using Hive databases.</p>
-<p>Posted by sambavi at May 30, 2012 01:22
-|
-|
-@Namit: Thanks for providing an example. I have a couple followup questions:
-Use a single database and have scripts for quota
-Can you provide some more details about how quota management works? For
example, Team 1 and 2 both need access to a single partition in table T2, so
who pays for the space occupied by this partition? Are both teams charged for
it?
-Use 2 databases and copy the data in both the databases (say, the databases
are DB1 and DB2 respectively)
-I’m not sure why this is listed as an option. What does this actually
accomplish?
-Use 2 databases, and use views in database 2 (to be used to team 2).
-This seems like the most reasonable approach given my current understanding of
your use case.
-You listed three problems with these approaches. Most of them don’t seem
applicable to views:
-Table Discovery etc. becomes very messy. You can do that via tags, but then,
all the functionality that is provided by databases can also be provided via
tags.
-It’s hard for me to evaluate this claim since I’m not totally sure
what is meant by “table discovery”. Can you please provide an
example? However, my guess is that this is not a differentiator if you’re
comparing the table links approach to views.
-Duplication of data
-Not applicable for views and table links.
-The user will have to perform the management himself. When a partition gets
added to DB1.T1, the corresponding partition needs to be added to DB2.View1,
and the 3 day old partition from DB2.View1 needs to be dropped.
-Based on the description of table links in HIVE-2989 it sounds like the user
will still have to perform manual management even with table links, e.g.
dropping the link that points to the partition from four days ago and adding a
new link that points to the most recent partition. In this case views may
actually work better since you can embed the filter condition (last three days)
in the view definition instead of relying on external tools to update the table
links.
-This has to be done outside hive, and makes the task of maintaining these
partitions very difficult - how do you make sure this is atomic etc. The user
has to do a lot more scripting.
-I don’t think table links make this process atomic, and as I mentioned
above the process of maintaining this linked set of partitions actually seems
easier if you use views instead.
-Links is a degenerate case of views.
-I agree that table links are a degenerate case of views. Since that’s
the case, why is it necessary to implement table links? Why not leverage the
functionality that is already provided with views?</p>
-<p>Posted by cwsteinbach at May 31, 2012 16:45
-|
-|
-Furthermore, databases don’t provide isolation today since two part name
access is unrestricted.
-DBs in conjunction with the authorization system provide strong isolation
between different namespaces. Also, it should be possible to extend the
authorization system to the two-part-name-access case that you described above
(e.g. <a href=https://issues.apache.org/jira/browse/HIVE-3047>HIVE-3047</a>).
-By introducing a trackable way of accessing content outside the current
database (using table links), we get isolation for a namespace using Hive
databases.
-I think you already get that by using views. If I’m wrong can you please
explain how the view approach falls short? Thanks.</p>
-<p>Posted by cwsteinbach at May 31, 2012 16:59
-|
-|
-Carl: I’ve addressed your questions below.</p>
-<blockquote>
-<blockquote>
-<p>Can you provide some more details about how quota management works? For
example, Team 1 and 2 both need access to a single partition in table T2, so
who pays for the space occupied by this partition? Are both teams charged for
it?
-If the partition is shared, it is accounted towards both their quota (base
quota for the team that owns the partition, and imported quota for the team
that imports it via a link). The reason for this is that when a namespace is
moved to another datacenter, we have to account for all the quota (both
imported and base) as belonging to the namespace (the data can no longer be
shared directly via a link, and we will need to replicate it).
-Use 2 databases and copy the data in both the databases (say, the databases
are DB1 and DB2 respectively)
-I’m not sure why this is listed as an option. What does this actually
accomplish?
-It was just one way of achieving the same data being available in the two
namespaces. You can ignore this one (smile)
-It’s hard for me to evaluate this claim since I’m not totally sure
what is meant by “table discovery”. Can you please provide an
example? However, my guess is that this is not a differentiator if you’re
comparing the table links approach to views.
-I think Namit meant this in reference to the design option of using a single
database and using scripts for quota management. In the case of views, due to
views being opaque, it will be hard to see which tables are imported into the
namespace.
-Based on the description of table links in HIVE-2989 it sounds like the user
will still have to perform manual management even with table links, e.g.
dropping the link that points to the partition from four days ago and adding a
new link that points to the most recent partition. In this case views may
actually work better since you can embed the filter condition (last three days)
in the view definition instead of relying on external tools to update the table
links.
-Maybe the description was unclear. Table links have two types: static and
dynamic. Static links behave the way you describe, but dynamic links will add
and drop partitions when the source table (of the link) has partitions added
or removed from it.
-I don’t think table links make this process atomic, and as I mentioned
above the process of maintaining this linked set of partitions actually seems
easier if you use views instead.
-Addressed this above - table links do keep the links updated when the source
of the link has partitions added or dropped. This will be atomic since it is
done in one metastore operation during an ALTER TABLE ADD/DROP PARTITION
command.
-I agree that table links are a degenerate case of views. Since that’s
the case, why is it necessary to implement table links? Why not leverage the
functionality that is already provided with views?
-Table links allow for better accounting of imported data (views are opaque),
single instancing of imports and partition pruning when the imports only have
some of the partitions of the source table of the link. Given this, it seems
ideal to introduce table links as a concept rather than overload views.</p>
-</blockquote>
-</blockquote>
-<p>Posted by sambavi at May 31, 2012 18:09
-|
-|
-Hi Carl, I explained how views fall short in the post below (response to your
comments on Namit’s post). Please add any more questions you have - I can
explain further if unclear.</p>
-<p>Posted by sambavi at May 31, 2012 18:11
-|
-|
-I think Namit meant this in reference to the design option of using a single
database and using scripts for quota management. In the case of views, due to
views being opaque, it will be hard to see which tables are imported into the
namespace.
-Views are not opaque. DESCRIBE FORMATTED currently includes the following
information:</p>
-<pre tabindex=0><code># View Information
-View Original Text: SELECT value FROM src WHERE key=86
-View Expanded Text: SELECT `src`.`value` FROM `src` WHERE `src`.`key`=86
-
-</code></pre><p>Currently the metastore only tracks the original and expanded
text view query, but it would be straightforward to also extract and store the
list of source tables that are referenced in the query when the view is created
(in fact, there’s already a JIRA ticket for this (<a
href=https://issues.apache.org/jira/browse/HIVE-1073>HIVE-1073</a>), and the
information is already readily available internally as described <a
href=https://cwiki.apache.org/confluence/display/Hi [...]
-Maybe the description was unclear. Table links have two types: static and
dynamic. Static links behave the way you describe, but dynamic links will add
and drop partitions when the source table (of the link) has partitions added
or removed from it.
-I don’t think dynamic table links satisfy the use case covered by Team
2’s access requirements for table T1. Team 2 wants to see only the most
recent three partitions in table T1, and my understanding of dynamic table
links is that once the link is created, Team 2 will subsequently see every new
partition that is added to the source table. In order to satisfy Team 2’s
requirements I think you’re going to have to manually add and drop
partitions from the link using [...]
-The functionality provided by dynamic links does make sense in some contexts,
but the same is true for dynamic partitioned views. Why not extend the
partitioned view feature to support dynamic partitions?
-Table links allow for better accounting of imported data (views are opaque),
single instancing of imports and partition pruning when the imports only have
some of the partitions of the source table of the link. Given this, it seems
ideal to introduce table links as a concept rather than overload views.
-I addressed the “views are opaque” argument above. I’m
having trouble following the rest of the sentence. What does “single
instancing of imports” mean? If possible can you provide an example in
terms of table links and partitioned views?</p>
-<p>Posted by cwsteinbach at May 31, 2012 22:27
-|
-|
-Going back to my example:
-Suppose there are 2 teams which want to use the same physical warehouse.
-Team 1 wants to use the following: (let us say that each table is partitioned
by date)
-T1 (all partitions)
-T2 (all partitions)
-T3 (all partitions)
-Team 2 wants to use the following:
-T1 (partitions for the last 3 days)
-T2 (partition for a fixed day: say May 29, 2012)
-T4 (all partitions)
-Using the current hive architecture, we can perform the following:</p>
-<ul>
-<li>Use a single database and have scripts for quota</li>
-<li>Use 2 databases and copy the data in both the databases (say, the
databases are DB1 and DB2 respectively)</li>
-<li>Use 2 databases, and use views in database 2 (to be used by team 2).</li>
-</ul>
-<p>We have discarded the first 2 approaches above, so let us discuss how we
will use approach 3 (specifically for T1).
-Team 2 will create the view: create view V1T1 as select * from DB1.T1
-Now, whenever a partition gets added in DB1.T1, someone (a hook or something -
outside hive) needs to add the corresponding partition in V1T1.
-That extra layer needs to make sure that the new partition in V1T1 is part of
the inputs (may be necessary for auditing etc.)
-Hive metastore has no knowledge of this dependency (view partition -> table
partition), and it is maintained in multiple places (for possibly
-different teams).
-The same argument applies when a partition gets dropped from DB1.T1.
-By design, there is no one-to-one dependency between a table partition and a
view partition, and we do not want to create such a dependency.
-The view may depend on multiple tables/partitions. The views in hive are not
updatable.
-By design, the schema of the view and the underlying table(s) can be different.
-Links provide the above functionality. If I understand right, you are
proposing to extend views to support the above functionality. We will end up
-with a very specific model for a specific type of views, which are not like
normal hive views. That would be more confusing, in my opinion.</p>
-<p>Posted by namit.jain at Jun 01, 2012 14:25
-|
-|
-Please comment - we haven’t gotten any updates on the wiki as well as
the jira <a
href=https://issues.apache.org/jira/browse/HIVE-2989>https://issues.apache.org/jira/browse/HIVE-2989</a></p>
-<p>Posted by namit.jain at Jun 02, 2012 19:35
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -553,7 +291,6 @@ Please comment - we haven’t gotten any updates on the
wiki as well as the
<li><a
href=#modeling-namespace-as-a-role-in-hive-using-the-authorization-model>Modeling
Namespace as a Role in Hive (using the authorization model)</a></li>
<li><a href=#modeling-namespace-by-tagging-objects>Modeling Namespace by
tagging objects</a></li>
<li><a
href=#modeling-namespace-as-a-database-but-using-views-for-imports>Modeling
Namespace as a database but using views for imports</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
diff --git a/development/desingdocs/listbucketing/index.html
b/development/desingdocs/listbucketing/index.html
index 2763ada..abe2d5d 100644
--- a/development/desingdocs/listbucketing/index.html
+++ b/development/desingdocs/listbucketing/index.html
@@ -172,11 +172,7 @@ Last updated: December 12, 2024
</li>
</ul>
</li>
-<li><a href=#implementation>Implementation</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#implementation>Implementation</a></li>
</ul>
</nav>
</div>
@@ -207,11 +203,7 @@ Last updated: December 12, 2024
</li>
</ul>
</li>
-<li><a href=#implementation>Implementation</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#implementation>Implementation</a></li>
</ul>
</nav>
</aside>
@@ -384,100 +376,6 @@ Last updated: December 12, 2024
<li><a href=https://issues.apache.org/jira/browse/HIVE-3073>HIVE-3073</a>:
Hive List Bucketing - DML support (release 0.11.0)</li>
</ul>
<p>For more information, see <a href=#skewed-tables-in-the-ddl-document>Skewed
Tables in the DDL document</a>.</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Does this feature require any changes to the metastore? If so can you
please describe them? Thanks.</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by cwsteinbach at Jun 11, 2012 15:13
-|
-|
-Please also describe any changes that will be made to public APIs including
the following:</p>
-<ul>
-<li>The metastore and/or HiveServer Thrift interfaces (note that this includes
overloading functions that are already included in the current Thrift
interfaces, as well as modifying or adding new Thrift structs/objects).</li>
-<li>Hive Query Language, including new commands, extensions to existing
commands, or changes to the output generated by commands (e.g. DESCRIBE
FORMATTED TABLE).</li>
-<li>New configuration properties.</li>
-<li>Modifications to any of the public plugin APIs including SerDes and
Hook/Listener interfaces.</li>
-</ul>
-<p>Also, if this feature requires any changes to the Metastore schema, those
changes should be described in this document.
-Finally, please describe your plan for implementing this feature and getting
it committed. Will it go in as a single patch or be split into several
different patches?</p>
-<p>Posted by cwsteinbach at Jun 12, 2012 01:47
-|
-|
-Yes, it requires a metastore change.
-We want to store the following information in metastore:</p>
-<ol>
-<li>skewed column names</li>
-<li>skewed column values</li>
-<li>mappings from skewed column value to directories.
The above 3 will be added to MStorageDescriptor.java etc.</li>
-</ol>
-<p>Posted by gangtimliu at Jun 14, 2012 12:47
-|
-|
-Yes, I will update the document with any changes in the areas you mention.
-Here is the plan:</p>
-<ol>
-<li>Implement End-to-end feature for single skewed column (DDL+DML) and go in
as a single patch.</li>
-<li>Implement End-to-end feature for multiple skewed columns (DDL+DML) and go
in as a single patch.</li>
-<li>Implement follow-ups and go in as a single patch.
#3 is a slot for items that are nice to have but not critical, and not in #1 & #2
due to resource constraints etc.</li>
-</ol>
-<p>Posted by gangtimliu at Jun 14, 2012 12:55
-|
-|
-It wasn’t clear to me from this wiki page what the benefit is of storing
the skewed values “as directories” over just storing them as files
as regular skew tables do? Tim, could you please elaborate on that?</p>
-<p>Posted by <a href=mailto:[email protected]>[email protected]</a> at Nov 07,
2012 11:23
-|
-|
-Different terms, but they refer to the same thing: creating a sub-directory
for a skewed value and storing records in files.
-Note that a regular skew table doesn’t create sub-directories. It’s
different from a non-skewed table because it has metadata for skewed column
names and values, so features like skewed join can leverage it.
-Only a list bucketing table creates sub-directories for skewed values. We use
“stored as directories” to mark it.
-Hope it helps.</p>
-<p>Posted by gangtimliu at Nov 07, 2012 12:49
-|
-|
-Tim, thanks for responding but I am still missing something. I re-read the
wiki page and here is my understanding. Please correct me if I am wrong.
-Let’s take a hand-wavy example.
-Skewed table:
-create table t1 (x string) skewed by (error) on (‘a’,
‘b’) partitioned by dt location
‘/user/hive/warehouse/t1’;
-will create the following files:
-/user/hive/warehouse/t1/dt=something/x=a.txt
-/user/hive/warehouse/t1/dt=something/x=b.txt
-/user/hive/warehouse/t1/dt=something/default
-List bucketing table:
-create table t2 (x string) skewed by (error) on (‘a’,
‘b’) partitioned by dt location
‘/user/hive/warehouse/t2’ ;
-will create the following files:
-/user/hive/warehouse/t2/dt=something/x=a/data.txt
-/user/hive/warehouse/t2/dt=something/x=b/data.txt
-/user/hive/warehouse/t2/dt=something/default/data.txt
-Is that correct?
-In that case, why would a user ever choose to create sub-directories? Skewed
joins would perform just as well for regular skewed tables or list bucketing
tables. Given that list bucketing introduces sub-directories it imposes
restrictions on what other things users can and cannot do while regular skewed
tables don’t. So what would be someone’s motivation to choose list
bucketing over skewed tables?</p>
-<p>Posted by <a href=mailto:[email protected]>[email protected]</a> at Nov 09,
2012 00:11
-|
-|
-Sorry for the confusion. The wiki requires some polish to make it clear.
-I assume t2 has “stored as directories”.
-t1 doesn’t have sub-directories but t2 does. The directory structure looks
like:
-/user/hive/warehouse/t1/dt=something/data.txt
-/user/hive/warehouse/t2/dt=something/x=a/data.txt
-/user/hive/warehouse/t2/dt=something/x=b/data.txt
-/user/hive/warehouse/t2/dt=something/default/data.txt
-“stored as directories” tells hive to create sub-directories.
-What’s the use case of t1? t1 can be used for skewed joins since it has
skewed column and value information.</p>
-<p>Posted by gangtimliu at Nov 09, 2012 01:55
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -526,11 +424,7 @@ what’s use case of t1? t1 can be used for skewed
join since t1 has skewed
</li>
</ul>
</li>
-<li><a href=#implementation>Implementation</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#implementation>Implementation</a></li>
</ul>
</nav>
</div>
diff --git a/development/desingdocs/llap/index.html
b/development/desingdocs/llap/index.html
index e8b05b7..24459ed 100644
--- a/development/desingdocs/llap/index.html
+++ b/development/desingdocs/llap/index.html
@@ -325,14 +325,6 @@ LLAP Peers - /peers </p>
<h2 id=resources>Resources</h2>
<p><a
href=https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf>LLAP
Design Document</a></p>
<p><a
href="https://cwiki.apache.org/confluence/download/attachments/27362054/LLAP-Meetup-Nov.ppsx?version=1&modificationDate=1447885307000&api=v2">Hive
Contributor Meetup Presentation</a></p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
<h2 id=attachments>Attachments:</h2>
<p><img src=images/icons/bullet_blue.gif alt></p>
</div>
diff --git a/development/desingdocs/skewed-join-optimization/index.html
b/development/desingdocs/skewed-join-optimization/index.html
index 20414b9..a5f0530 100644
--- a/development/desingdocs/skewed-join-optimization/index.html
+++ b/development/desingdocs/skewed-join-optimization/index.html
@@ -158,7 +158,6 @@ Last updated: December 12, 2024
<li><a href=#hive-enhancements>Hive Enhancements</a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -206,28 +205,6 @@ We want to do a join corresponding to the following
query</p>
<h3 id=hive-enhancements>Hive Enhancements</h3>
<p><em>Original plan:</em> <del>The skew data will be obtained from list
bucketing (see the <a
href=https://hive.apache.org/development/desingdocs/listbucketing/>List
Bucketing</a> design document). There will be no additions to the Hive
grammar.</del></p>
<p><em>Implementation:</em> Starting in Hive 0.10.0, tables can be created as
skewed or altered to be skewed (in which case partitions created after the
ALTER statement will be skewed). In addition, skewed tables can use the list
bucketing feature by specifying the STORED AS DIRECTORIES option. See the DDL
documentation for details: <a href=#create-table>Create Table</a>, <a
href=#skewed-tables>Skewed Tables</a>, and <a
href=#alter-table-skewed-or-stored-as-directories>Alter Table Skewe [...]
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Is this proposal ready for review?</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by cwsteinbach at May 31, 2012 21:27
-|
-|
-yes</p>
-<p>Posted by namit.jain at Jun 01, 2012 21:07
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -262,7 +239,6 @@ yes</p>
<li><a href=#hive-enhancements>Hive Enhancements</a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
diff --git a/docs/latest/admin/adminmanual-configuration/index.html
b/docs/latest/admin/adminmanual-configuration/index.html
index b21a343..0aa8c3b 100644
--- a/docs/latest/admin/adminmanual-configuration/index.html
+++ b/docs/latest/admin/adminmanual-configuration/index.html
@@ -597,9 +597,6 @@ See <a
href=http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-co
<p>For Hive releases prior to 0.11.0, see the “Thrift Server
Setup” section in the HCatalog 0.5.0 document <a
href=http://hive.apache.org/docs/hcat_r0.5.0/install.html>Installation from
Tarball</a>.</p>
<h3 id=webhcat>WebHCat</h3>
<p>For information about configuring WebHCat, see <a
href=https://hive.apache.org/docs/latest/webhcat/webhcat-configure/>WebHCat
Configuration</a>.</p>
-<p> </p>
-<p> </p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/admin/adminmanual-metastore-administration/index.html
b/docs/latest/admin/adminmanual-metastore-administration/index.html
index ec6951e..cedc5de 100644
--- a/docs/latest/admin/adminmanual-metastore-administration/index.html
+++ b/docs/latest/admin/adminmanual-metastore-administration/index.html
@@ -590,7 +590,6 @@ Last updated: December 12, 2024
<p>Hive now records the schema version in the metastore database and verifies
that the metastore schema version is compatible with Hive binaries that are
going to access the metastore. Note that the Hive properties to implicitly
create or alter the existing schema are disabled by default. Hive will not
attempt to change the metastore schema implicitly. When you execute a Hive
query against an old schema, it will fail to access the metastore.</p>
<p>To suppress the schema check and allow the metastore to implicitly modify
the schema, you need to set a configuration property
<code>hive.metastore.schema.verification</code> to false in
<code>hive-site.xml</code>.</p>
<p>Starting in release 0.12, Hive also includes an off-line schema tool to
initialize and upgrade the metastore schema. Please refer to the details <a
href=https://hive.apache.org/docs/latest/admin/hive-schema-tool/>here</a>.</p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/admin/hive-on-spark-getting-started/index.html
b/docs/latest/admin/hive-on-spark-getting-started/index.html
index a3d5078..f1e257b 100644
--- a/docs/latest/admin/hive-on-spark-getting-started/index.html
+++ b/docs/latest/admin/hive-on-spark-getting-started/index.html
@@ -169,7 +169,6 @@ Last updated: December 12, 2024
<li><a href=#recommended-configuration>Recommended Configuration</a></li>
<li><a href=#design-documents>Design documents</a></li>
<li><a href=#attachments>Attachments:</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -202,7 +201,6 @@ Last updated: December 12, 2024
<li><a href=#recommended-configuration>Recommended Configuration</a></li>
<li><a href=#design-documents>Design documents</a></li>
<li><a href=#attachments>Attachments:</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -495,24 +493,6 @@
spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.had
<h2 id=attachments>Attachments:</h2>
<p><img src=images/icons/bullet_blue.gif alt>
<a
href=/attachments/44302539/53575687.pdf>attachments/44302539/53575687.pdf</a>
(application/pdf)</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Spark has its own property to control whether to merge small files. Set
hive.merge.sparkfiles=true to merge small files.</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by lirui at Jan 15, 2015 01:34
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -558,7 +538,6 @@
spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.had
<li><a href=#recommended-configuration>Recommended Configuration</a></li>
<li><a href=#design-documents>Design documents</a></li>
<li><a href=#attachments>Attachments:</a></li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
diff --git a/docs/latest/admin/index.html b/docs/latest/admin/index.html
index 08fbda6..626f0e4 100644
--- a/docs/latest/admin/index.html
+++ b/docs/latest/admin/index.html
@@ -297,7 +297,7 @@ Dec 12, 2024
<div class=docs-card-summary>
Apache Hive : Hive on Spark: Getting Started Hive on Spark provides Hive with
the ability to utilize Apache Spark as its execution engine.
set hive.execution.engine=spark; Hive on Spark was added in HIVE-7292.
-Apache Hive : Hive on Spark: Getting Started Version Compatibility Spark
Installation Configuring YARN Configuring Hive Configuration property details
Configuring Spark Tuning Details Common Issues (Green are resolved, will be
removed from this list) Recommended Configuration Design documents Attachments:
Comments: Version Compatibility Hive on Spark is only tested with a specific
version of Spark, so a given version of Hive is only guaranteed to work with a
specific version of Spark.
+Apache Hive : Hive on Spark: Getting Started Version Compatibility Spark
Installation Configuring YARN Configuring Hive Configuration property details
Configuring Spark Tuning Details Common Issues (Green are resolved, will be
removed from this list) Recommended Configuration Design documents Attachments:
Version Compatibility Hive on Spark is only tested with a specific version of
Spark, so a given version of Hive is only guaranteed to work with a specific
version of Spark.
</div>
<div class=docs-card-footer>
<a
href=https://hive.apache.org/docs/latest/admin/hive-on-spark-getting-started/
class=docs-card-link>
diff --git a/docs/latest/admin/index.xml b/docs/latest/admin/index.xml
index 89892ad..17201a0 100644
--- a/docs/latest/admin/index.xml
+++ b/docs/latest/admin/index.xml
@@ -3,7 +3,7 @@ Apache Hive : AdminManual Metastore Administration Introduction
Local/Embedded M
The Hive MetaTool enables administrators to do bulk updates on the location
fields in database, table, and partition records in the metastore. It provides
the following functionality:
Ability to search and replace the HDFS NN (NameNode) location in metastore
records that reference the NN. One use is to transition a Hive deployment to
HDFS HA NN (HDFS High Availability
NameNode).</description></item><item><title>Apache Hive : Hive on Spark:
Getting
Started</title><link>https://hive.apache.org/docs/latest/admin/hive-on-spark-getting-started/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/admin/hive-on-spark-getting-star
[...]
set hive.execution.engine=spark; Hive on Spark was added in HIVE-7292.
-Apache Hive : Hive on Spark: Getting Started Version Compatibility Spark
Installation Configuring YARN Configuring Hive Configuration property details
Configuring Spark Tuning Details Common Issues (Green are resolved, will be
removed from this list) Recommended Configuration Design documents Attachments:
Comments: Version Compatibility Hive on Spark is only tested with a specific
version of Spark, so a given version of Hive is only guaranteed to work with a
specific version of Spark.</d [...]
+Apache Hive : Hive on Spark: Getting Started Version Compatibility Spark
Installation Configuring YARN Configuring Hive Configuration property details
Configuring Spark Tuning Details Common Issues (Green are resolved, will be
removed from this list) Recommended Configuration Design documents Attachments:
Version Compatibility Hive on Spark is only tested with a specific version of
Spark, so a given version of Hive is only guaranteed to work with a specific
version of Spark.</description [...]
Introduced in Hive 0.12.0. See HIVE-3764.
Hive now records the schema version in the metastore database and verifies
that the metastore schema version is compatible with Hive binaries that are
going to access the metastore.</description></item><item><title>Apache Hive :
HiveAmazonElasticMapReduce</title><link>https://hive.apache.org/docs/latest/admin/hiveamazonelasticmapreduce/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/admin/hiveamazonelasticmapreduce/</guid><description>A
[...]
Background This document explores the different ways of leveraging Hive on
Amazon Web Services - namely S3, EC2 and Elastic Map-Reduce.
diff --git a/docs/latest/admin/replication/index.html
b/docs/latest/admin/replication/index.html
index fab6ddd..a2788b3 100644
--- a/docs/latest/admin/replication/index.html
+++ b/docs/latest/admin/replication/index.html
@@ -241,8 +241,7 @@ Last updated: December 12, 2024
<name>hive.exim.uri.scheme.whitelist</name>
<value>hdfs,s3a</value>
</property>
-</code></pre><p> </p>
-<p>Save</p>
+</code></pre>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git
a/docs/latest/language/enhanced-aggregation-cube-grouping-and-rollup/index.html
b/docs/latest/language/enhanced-aggregation-cube-grouping-and-rollup/index.html
index b17b63a..f7986f1 100644
---
a/docs/latest/language/enhanced-aggregation-cube-grouping-and-rollup/index.html
+++
b/docs/latest/language/enhanced-aggregation-cube-grouping-and-rollup/index.html
@@ -162,7 +162,6 @@ Last updated: December 12, 2024
<li><a href=#grouping__id-function-before-hive-230>Grouping__ID function
(before Hive 2.3.0)</a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -186,7 +185,6 @@ Last updated: December 12, 2024
<li><a href=#grouping__id-function-before-hive-230>Grouping__ID function
(before Hive 2.3.0)</a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
@@ -595,39 +593,6 @@ GROUP BY key, value WITH ROLLUP;
For the first row, none of the columns are being selected.<br>
For the second row, only the first column is being selected, which explains
the count of 2.<br>
For the third row, both the columns are being selected (and the second column
happens to be null), which explains the count of 1.</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Is there really much value-add in the grouping sets grammar? If I think
about the plan for generating a CUBE/ROLLUP (), it’s pretty much as
efficient as generating the CUBE and then sub-selecting what you need from
it.</td>
-</tr>
-<tr>
-<td>Can we just provide CUBE and ROLLUP and not provide the additional
syntax?</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by sambavi at Sep 21, 2012 15:32
-|
-|
-Depends on what the use case is.
-By sub-selecting for the right grouping set, we would be passing more data
across the map-reduce boundaries.
-I have started a prototype implementation, and the work for grouping set
should not be substantially more than
-a cube or a rollup. We can stage it, and implement GROUPING_ID later, on
demand.</p>
-<p>Posted by namit.jain at Sep 25, 2012 06:50
-|
-|
-I can only implement CUBE and ROLLUP first, but keep the execution layer
general.
-It will only require parser changes to plug in grouping sets, if need be,
later.</p>
-<p>Posted by namit.jain at Sep 25, 2012 07:16
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -666,7 +631,6 @@ It will only require parser changes to plug in grouping
sets, if need be, later.
<li><a href=#grouping__id-function-before-hive-230>Grouping__ID function
(before Hive 2.3.0)</a></li>
</ul>
</li>
-<li><a href=#comments>Comments:</a></li>
</ul>
</li>
</ul>
diff --git a/docs/latest/language/index.html b/docs/latest/language/index.html
index cea8c2c..baed042 100644
--- a/docs/latest/language/index.html
+++ b/docs/latest/language/index.html
@@ -248,7 +248,7 @@ Dec 12, 2024
</div>
<div class=docs-card-summary>
Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup This document
describes enhanced aggregation features for the GROUP BY clause of SELECT
statements.
-Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup GROUPING SETS
clause Grouping__ID function Grouping function Cubes and Rollups
hive.new.job.grouping.set.cardinality Grouping__ID function (before Hive 2.3.0)
Comments: Version
+Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup GROUPING SETS
clause Grouping__ID function Grouping function Cubes and Rollups
hive.new.job.grouping.set.cardinality Grouping__ID function (before Hive 2.3.0)
Version
Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were
added in Hive 0.
</div>
<div class=docs-card-footer>
diff --git a/docs/latest/language/index.xml b/docs/latest/language/index.xml
index f19e1f7..027306b 100644
--- a/docs/latest/language/index.xml
+++ b/docs/latest/language/index.xml
@@ -5,7 +5,7 @@ Patterns are case-insensitive, except AM/PM and T/Z. See these
sections for more
Version</description></item><item><title>Apache Hive : Compaction
pooling</title><link>https://hive.apache.org/docs/latest/language/compaction-pooling/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/compaction-pooling/</guid><description>Apache
Hive : Compaction pooling Concept: Compaction requests and workers can be
assigned to pools. A worker assigned to a specific pool will only process
compaction requests in that pool. Worke [...]
This enables various kind of sketch operations thru regular sql statement.
Apache Hive : Datasketches Integration Sketch functions Naming convention
sketchType functionName List declared sketch functions Integration with
materialized views BI mode Rewrite COUNT(DISTINCT(X)) Rewrite
percentile_disc(p) within group(order by x) Rewrite cume_dist() over (order by
id) Rewrite NTILE Rewrite RANK Examples Simple distinct counting examples using
HLL Use HLL to compute distinct values using an intermediate table Use HLL to
compute distinct values without intermediate t [...]
-Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup GROUPING SETS
clause Grouping__ID function Grouping function Cubes and Rollups
hive.new.job.grouping.set.cardinality Grouping__ID function (before Hive 2.3.0)
Comments: Version
+Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup GROUPING SETS
clause Grouping__ID function Grouping function Cubes and Rollups
hive.new.job.grouping.set.cardinality Grouping__ID function (before Hive 2.3.0)
Version
Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were
added in Hive 0.</description></item><item><title>Apache Hive : Exchange
Partition</title><link>https://hive.apache.org/docs/latest/language/exchange-partition/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/exchange-partition/</guid><description>Apache
Hive : Exchange Partition The EXCHANGE PARTITION command will move a partition
from a source table to [...]
When the command is executed, the source table&rsquo;s partition folder in
HDFS will be renamed to move it to the destination table&rsquo;s partition
folder.</description></item><item><title>Apache Hive :
GenericUDAFCaseStudy</title><link>https://hive.apache.org/docs/latest/language/genericudafcasestudy/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/genericudafcasestudy/</guid><description>Apache
Hive : Tutorial to writ [...]
package com.example.hive.udf; import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text; public final class Lower extends UDF { public
Text evaluate(final Text s) { if (s == null) { return null; } return new
Text(s.</description></item><item><title>Apache Hive :
HiveQL</title><link>https://hive.apache.org/docs/latest/language/hiveql/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/hiveql/</guid><description>Apac
[...]
diff --git a/docs/latest/language/languagemanual-ddl/index.html
b/docs/latest/language/languagemanual-ddl/index.html
index dac7e7e..197cb6b 100644
--- a/docs/latest/language/languagemanual-ddl/index.html
+++ b/docs/latest/language/languagemanual-ddl/index.html
@@ -1934,8 +1934,6 @@ DESCRIBE default.src_thrift lintString.$elem$.myint;
<li><a href=#hcatalog-ddl>HCatalog DDL</a> in the <a
href=https://hive.apache.org/docs/latest/hcatalog/hcatalog-base/>HCatalog
manual</a></li>
<li><a
href=https://hive.apache.org/docs/latest/webhcat/webhcat-reference-allddl/>WebHCat
DDL Resources</a> in the <a
href=https://hive.apache.org/docs/latest/webhcat/webhcat-base/>WebHCat
manual</a></li>
</ul>
-<p>Save</p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/language/languagemanual-types/index.html
b/docs/latest/language/languagemanual-types/index.html
index d15f3d5..455624a 100644
--- a/docs/latest/language/languagemanual-types/index.html
+++ b/docs/latest/language/languagemanual-types/index.html
@@ -841,7 +841,6 @@ SELECT foo FROM union_test;
</tr>
</tbody>
</table>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/language/reflectudf/index.html
b/docs/latest/language/reflectudf/index.html
index 6a9833b..8d13595 100644
--- a/docs/latest/language/reflectudf/index.html
+++ b/docs/latest/language/reflectudf/index.html
@@ -151,11 +151,7 @@ Last updated: December 12, 2024
<nav id=TableOfContents>
<ul>
<li><a href=#apache-hive--reflectudf>Apache Hive : ReflectUDF</a></li>
-<li><a href=#reflect-generic-udf>Reflect (Generic) UDF</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#reflect-generic-udf>Reflect (Generic) UDF</a></li>
</ul>
</nav>
</div>
@@ -177,24 +173,6 @@ FROM src LIMIT 1;
</code></pre><p>Version information</p>
<p>As of Hive 0.9.0, java_method() is a synonym for reflect(). See <a
href=#misc--functions>Misc. Functions</a> in Hive Operators and UDFs.</p>
<p>Note that Reflect UDF is non-deterministic since there is no guarantee what
a specific method will return given the same parameters. So be cautious when
using Reflect on the WHERE clause because that may invalidate Predicate
Pushdown optimization.</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>This doc comes from the Hive xdocs, with minor edits. It is included here
because the xdocs are currently unavailable (Feb. 2013).</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by leftyl at Feb 21, 2013 09:30
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -216,21 +194,6 @@ FROM src LIMIT 1;
</div>
</footer>
</article>
-<aside class=docs-toc-sidebar>
-<div class=docs-toc-sticky>
-<h4><i class="fas fa-list"></i> On this page</h4>
-<nav id=TableOfContents>
-<ul>
-<li><a href=#apache-hive--reflectudf>Apache Hive : ReflectUDF</a></li>
-<li><a href=#reflect-generic-udf>Reflect (Generic) UDF</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
-</ul>
-</nav>
-</div>
-</aside>
</main>
</div>
</div>
diff --git a/docs/latest/language/supported-features/index.html
b/docs/latest/language/supported-features/index.html
index d75d413..97a157e 100644
--- a/docs/latest/language/supported-features/index.html
+++ b/docs/latest/language/supported-features/index.html
@@ -150,11 +150,7 @@ Last updated: December 12, 2024
<h4><i class="fas fa-list"></i> Table of Contents</h4>
<nav id=TableOfContents>
<ul>
-<li><a href=#apache-hive--supported-features-apache-hive-31>Apache Hive :
Supported Features: Apache Hive 3.1</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#apache-hive--supported-features-apache-hive-31>Apache Hive :
Supported Features: Apache Hive 3.1</a></li>
</ul>
</nav>
</div>
@@ -1819,61 +1815,6 @@ Last updated: December 12, 2024
</tr>
</tbody>
</table>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-<th></th>
-<th></th>
-<th></th>
-<th></th>
-<th></th>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-<td></td>
-<td></td>
-<td></td>
-<td></td>
-<td></td>
-<td></td>
-</tr>
-<tr>
-<td><a href=https://cwiki.apache.org/confluence/display/~alangates>Alan
Gates</a> Following features are supported in 3.1:</td>
-<td></td>
-<td></td>
-<td></td>
-<td></td>
-<td></td>
-<td></td>
-</tr>
-</tbody>
-</table>
-<p>| E061-09 | Subqueries in comparison predicate |</p>
-<p>| E141-06 | CHECK constraints | No | Mandatory |</p>
-<p>Posted by vgarg at Nov 29, 2018 19:18
-|
-|
-<em>No need to declare NOT NULL with PRIMARY KEY or UNIQUE</em> - I think this
is not true. NOT NULL is not inferred on UNIQUE and needs to be explicitly
declared.</p>
-<p>Posted by vgarg at Nov 29, 2018 19:20
-|
-|</p>
-<p>| E121-02 | ORDER BY columns need not be in select list | No | Mandatory
|</p>
-<p> Looks like this feature is partially supported. Hive allows this if there
is not aggregate.</p>
-<p>Posted by vgarg at Nov 29, 2018 19:26
-|
-|
-IIUC the requirement isn’t that you don’t need to declare not null
and it is inferred, but rather that it can support unique/pk indices with nulls
in them. </p>
-<p>Posted by alangates at Nov 29, 2018 20:57
-|
-|
-Agreed, I missed this one. Feel free to edit it. I’ll be circling back
on this and a few others shortly to fix it.</p>
-<p>Posted by alangates at Nov 29, 2018 20:57
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -1895,20 +1836,6 @@ Agreed, I missed this one. Feel free to edit it.
I’ll be circling back
</div>
</footer>
</article>
-<aside class=docs-toc-sidebar>
-<div class=docs-toc-sticky>
-<h4><i class="fas fa-list"></i> On this page</h4>
-<nav id=TableOfContents>
-<ul>
-<li><a href=#apache-hive--supported-features-apache-hive-31>Apache Hive :
Supported Features: Apache Hive 3.1</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
-</ul>
-</nav>
-</div>
-</aside>
</main>
</div>
</div>
diff --git a/docs/latest/user/configuration-properties/index.html
b/docs/latest/user/configuration-properties/index.html
index c8dedd3..408a016 100644
--- a/docs/latest/user/configuration-properties/index.html
+++ b/docs/latest/user/configuration-properties/index.html
@@ -4753,28 +4753,6 @@ Some of the optimizations, such as <strong><a
href=#hiveblobstoreuseblobstoreass
<p>Jobs submitted to HCatalog can specify configuration properties that affect
storage, error tolerance, and other kinds of behavior during the job. See <a
href=https://hive.apache.org/docs/latest/hcatalog/hcatalog-configuration-properties/>HCatalog
Configuration Properties</a> for details.</p>
<h1 id=webhcat-configuration-properties>WebHCat Configuration Properties</h1>
<p>For WebHCat configuration, see <a
href=#configuration-variables>Configuration Variables</a> in the WebHCat
manual.</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/user/hive-transactions-acid/index.html
b/docs/latest/user/hive-transactions-acid/index.html
index c801cf8..b47b2a4 100644
--- a/docs/latest/user/hive-transactions-acid/index.html
+++ b/docs/latest/user/hive-transactions-acid/index.html
@@ -700,8 +700,6 @@ ALTER TABLE table_name COMPACT 'major'
<li><a
href=https://www.slideshare.net/Hadoop_Summit/transactional-operations-in-apache-hive-present-and-future-102803358>Slides</a></li>
<li><a href="https://www.youtube.com/watch?v=GyzU9wG0cFQ&t=834s">Video</a></li>
</ul>
-<p>Save</p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/user/hive-transactions/index.html
b/docs/latest/user/hive-transactions/index.html
index 7f6e760..50a150c 100644
--- a/docs/latest/user/hive-transactions/index.html
+++ b/docs/latest/user/hive-transactions/index.html
@@ -568,8 +568,6 @@ ALTER TABLE table_name COMPACT 'major'
<li><a
href=https://www.slideshare.net/Hadoop_Summit/transactional-operations-in-apache-hive-present-and-future-102803358>Slides</a></li>
<li><a href="https://www.youtube.com/watch?v=GyzU9wG0cFQ&t=834s">Video</a></li>
</ul>
-<p>Save</p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/user/hiveserver2-clients/index.html
b/docs/latest/user/hiveserver2-clients/index.html
index 578675f..7a0bb50 100644
--- a/docs/latest/user/hiveserver2-clients/index.html
+++ b/docs/latest/user/hiveserver2-clients/index.html
@@ -1325,7 +1325,6 @@ See examples in <a
href=https://github.com/apache/hive/blob/master/beeline/src/
<p>JDBC connection URL: </p>
<p><code>jdbc:hive2://<host>:<port>/<db>;transportMode=http;httpPath=<http_endpoint>;http.cookie.<name1>=<value1>;http.cookie.<name2>=<value2></code></p>
<p>When the above URL is specified, Beeline will call underlying requests to
add HTTP cookie in the request header, and will set it to <em></em>=<em></em>
and <em></em>=<em></em>. </p>
-<p>Save</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
diff --git a/docs/latest/user/index.html b/docs/latest/user/index.html
index f2492d4..381000a 100644
--- a/docs/latest/user/index.html
+++ b/docs/latest/user/index.html
@@ -844,7 +844,7 @@ Dec 12, 2024
</span>
</div>
<div class=docs-card-summary>
-Apache Hive : SerDe Apache Hive : SerDe SerDe Overview Built-in and Custom
SerDes Built-in SerDes Custom SerDes HiveQL for SerDes Input Processing Output
Processing Additional Notes Comments: SerDe Overview SerDe is short for
Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface
handles both serialization and deserialization and also interpreting the
results of serialization as individual fields for processing.
+Apache Hive : SerDe Apache Hive : SerDe SerDe Overview Built-in and Custom
SerDes Built-in SerDes Custom SerDes HiveQL for SerDes Input Processing Output
Processing Additional Notes SerDe Overview SerDe is short for
Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface
handles both serialization and deserialization and also interpreting the
results of serialization as individual fields for processing.
</div>
<div class=docs-card-footer>
<a href=https://hive.apache.org/docs/latest/user/serde/ class=docs-card-link>
diff --git a/docs/latest/user/index.xml b/docs/latest/user/index.xml
index 77282b2..6ded464 100644
--- a/docs/latest/user/index.xml
+++ b/docs/latest/user/index.xml
@@ -40,7 +40,7 @@ Apache Hive : Query ReExecution ReExecution strategies
Overlay Reoptimize Operat
RCFile stores table data in a flat file consisting of binary key/value pairs.
It first partitions rows horizontally into row splits, and then it vertically
partitions each row split in a columnar way. RCFile stores the metadata of a
row split as the key part of a record, and all the data of a row split as the
value part.</description></item><item><title>Apache Hive :
RCFileCat</title><link>https://hive.apache.org/docs/latest/user/rcfilecat/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0000< [...]
Apache Hive : RCFileCat Data Metadata Data Prints out the rows stored in an
RCFile, columns are tab separated and rows are newline separated.
Usage:
-hive --rcfilecat [--start=start_offset] [--length=len] [--verbose] fileName
--start=start_offset Start offset to begin reading in the file --length=len
Length of data to read from the file --verbose Prints periodic stats about the
data read, how many records, how many bytes, scan rate Metadata New in
0.</description></item><item><title>Apache Hive : Rebalance
compaction</title><link>https://hive.apache.org/docs/latest/user/rebalance-compaction/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0 [...]
+hive --rcfilecat [--start=start_offset] [--length=len] [--verbose] fileName
--start=start_offset Start offset to begin reading in the file --length=len
Length of data to read from the file --verbose Prints periodic stats about the
data read, how many records, how many bytes, scan rate Metadata New in
0.</description></item><item><title>Apache Hive : Rebalance
compaction</title><link>https://hive.apache.org/docs/latest/user/rebalance-compaction/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0 [...]
HIVE-3705 added metastore server security to Hive in release 0.10.0.
For additional information about storage based authorization in the metastore
server, see the HCatalog document Storage Based Authorization. For an overview
of Hive authorization models and other security options, see the Authorization
document.</description></item><item><title>Apache Hive : Streaming Data
Ingest</title><link>https://hive.apache.org/docs/latest/user/streaming-data-ingest/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/us [...]
The TeradataBinarySerDe is available in Hive 2.4 or greater.
diff --git a/docs/latest/user/multidelimitserde/index.html
b/docs/latest/user/multidelimitserde/index.html
index fe4c1be..5a83bb9 100644
--- a/docs/latest/user/multidelimitserde/index.html
+++ b/docs/latest/user/multidelimitserde/index.html
@@ -154,11 +154,7 @@ Last updated: December 12, 2024
<li><a href=#introduction>Introduction</a></li>
<li><a href=#version>Version</a></li>
<li><a href=#hive-ql-syntax>Hive QL Syntax</a></li>
-<li><a href=#limitations>Limitations</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#limitations>Limitations</a></li>
</ul>
</nav>
</div>
@@ -184,29 +180,6 @@ WITH SERDEPROPERTIES
("field.delim"="[,]","collection.d
<li>Nested complex type is not supported, e.g. an Array.</li>
<li>To use MultiDelimitSerDe prior to Hive release 4.0.0, you have to add the
hive-contrib jar to the class path, e.g. with the add jar command.</li>
</ul>
-<p> </p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>Thank you <a
href=https://cwiki.apache.org/confluence/display/~leftyl>Lefty Leverenz</a></td>
-</tr>
-</tbody>
-</table>
-<p>Posted by afan at Oct 05, 2018 06:18
-|
-|
-And thanks for your contributions <a
href=https://cwiki.apache.org/confluence/display/~afan>Alice Fan</a>.</p>
-<p>Posted by leftyl at Oct 05, 2018 06:27
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -237,11 +210,7 @@ And thanks for your contributions <a
href=https://cwiki.apache.org/confluence/d
<li><a href=#introduction>Introduction</a></li>
<li><a href=#version>Version</a></li>
<li><a href=#hive-ql-syntax>Hive QL Syntax</a></li>
-<li><a href=#limitations>Limitations</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#limitations>Limitations</a></li>
</ul>
</nav>
</div>
diff --git a/docs/latest/user/serde/index.html
b/docs/latest/user/serde/index.html
index 22548f5..358a2f4 100644
--- a/docs/latest/user/serde/index.html
+++ b/docs/latest/user/serde/index.html
@@ -164,11 +164,7 @@ Last updated: December 12, 2024
</li>
<li><a href=#input-processing>Input Processing</a></li>
<li><a href=#output-processing>Output Processing</a></li>
-<li><a href=#additional-notes>Additional Notes</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#additional-notes>Additional Notes</a></li>
</ul>
</nav>
</div>
@@ -191,11 +187,7 @@ Last updated: December 12, 2024
</li>
<li><a href=#input-processing>Input Processing</a></li>
<li><a href=#output-processing>Output Processing</a></li>
-<li><a href=#additional-notes>Additional Notes</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#additional-notes>Additional Notes</a></li>
</ul>
</nav>
</aside>
@@ -250,56 +242,6 @@ Last updated: December 12, 2024
</ul>
<p>In short, Hive will automatically convert objects so that Integer will be
converted to IntWritable (and vice versa) if needed. This allows people without
Hadoop knowledge to use Java primitive classes (Integer, etc), while hadoop
users/experts can use IntWritable which is more efficient.</p>
<p>Between map and reduce, Hive uses LazyBinarySerDe and BinarySortableSerDe
’s serialize methods. SerDe can serialize an object that is created by
another serde, using ObjectInspector.</p>
-<h2 id=comments>Comments:</h2>
-<table>
-<thead>
-<tr>
-<th></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td></td>
-</tr>
-<tr>
-<td>I noticed that there are ‘!’s in the text, but didn’t
figure out why.</td>
-</tr>
-</tbody>
-</table>
-<p>Posted by xuefu at Feb 22, 2014 20:17
-|
-|
-The exclamation marks also appear in two sections of the Developer Guide:* <a
href=#hive-serde>Hive SerDe</a></p>
-<ul>
-<li><a href=#objectinspector>ObjectInspector</a></li>
-</ul>
-<p>I asked about them in a comment on <a
href="https://issues.apache.org/jira/browse/HIVE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895544#comment-13895544">HIVE-5380</a>.
If they aren’t escape characters, could they be leftovers from a
previous formatting style?</p>
-<p>Posted by leftyl at Feb 23, 2014 08:47
-|
-|
-Yes, they are artifacts of the old MoinMoin Wiki syntax and can be removed.</p>
-<p>Posted by larsfrancke at Feb 23, 2014 09:09
-|
-|
-And they’re gone, gone, solid gone. Thanks Lars.</p>
-<p>Posted by leftyl at Feb 25, 2014 09:19
-|
-|
-<a href=https://cwiki.apache.org/confluence/display/~leftyl>Lefty Leverenz</a>
I added JsonSerDe to the list of built-in serdes and created new page for Json
Serde. Can you review it?</p>
-<p>Posted by apivovarov at Dec 15, 2015 01:43
-|
-|
-Great! Thanks <a
href=https://cwiki.apache.org/confluence/display/~apivovarov>Alexander
Pivovarov</a>, I’ll just make a few minor edits.</p>
-<p>Posted by leftyl at Jan 06, 2016 03:17
-|
-|
-<a href=https://cwiki.apache.org/confluence/display/~apivovarov>Alexander
Pivovarov</a>, in the Json SerDe doc you have a code box with the title
“Create table, specify CSV properties” but I don’t see
anything about CSV in the code – should it be “Create table, specify
JsonSerDe” instead?</p>
-<p>Posted by leftyl at Jan 07, 2016 08:31
-|
-|
-<a href=https://cwiki.apache.org/confluence/display/~apivovarov>Alexander
Pivovarov</a>, pinging about “CSV” in the Json SerDe doc’s
code box (see my reply to your comment on the SerDe doc).</p>
-<p>Posted by leftyl at Mar 19, 2016 08:33
-|</p>
</div>
<footer class=docs-footer>
<div class=docs-feedback>
@@ -340,11 +282,7 @@ Great! Thanks <a
href=https://cwiki.apache.org/confluence/display/~apivovarov
</li>
<li><a href=#input-processing>Input Processing</a></li>
<li><a href=#output-processing>Output Processing</a></li>
-<li><a href=#additional-notes>Additional Notes</a>
-<ul>
-<li><a href=#comments>Comments:</a></li>
-</ul>
-</li>
+<li><a href=#additional-notes>Additional Notes</a></li>
</ul>
</nav>
</div>
diff --git a/index.json b/index.json
index 9de3eb1..24350cb 100644
--- a/index.json
+++ b/index.json
@@ -1 +1 @@
-[{"categories":null,"contents":"Background In Hive, lineage information is
captured in the form of LineageInfo object. This object is created in the
SemanticAnalyzer and is passed to the HookContext object. Users can use the
following existing Hooks or implement their own custom hooks to capture this
information and utilize it.\nExisting Hooks
org.apache.hadoop.hive.ql.hooks.PostExecutePrinter
org.apache.hadoop.hive.ql.hooks.LineageLogger
org.apache.atlas.hive.hook.HiveHook To facilita [...]
\ No newline at end of file
+[{"categories":null,"contents":"Background In Hive, lineage information is
captured in the form of LineageInfo object. This object is created in the
SemanticAnalyzer and is passed to the HookContext object. Users can use the
following existing Hooks or implement their own custom hooks to capture this
information and utilize it.\nExisting Hooks
org.apache.hadoop.hive.ql.hooks.PostExecutePrinter
org.apache.hadoop.hive.ql.hooks.LineageLogger
org.apache.atlas.hive.hook.HiveHook To facilita [...]
\ No newline at end of file
diff --git a/index.xml b/index.xml
index 3d72bcc..a0283f4 100644
--- a/index.xml
+++ b/index.xml
@@ -9,7 +9,7 @@ Apache Hive : AuthDev 1. Privilege 1.1 Access Privilege 2. Hive
Operations 3. Me
The AvroSerde is available in Hive 0.9.1 and
greater.</description></item><item><title>Apache Hive :
BecomingACommitter</title><link>https://hive.apache.org/community/becomingcommitter/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/community/becomingcommitter/</guid><description>Apache
Hive : BecomingACommitter Becoming A Hive Committer The Apache Software
Foundation defines generic guidelines for what it means to be a committer.
However, it leaves [...]
Committer Zen Contributors often ask Hive PMC members the question,
&ldquo;What do I need to do in order to become a
committer?</description></item><item><title>Apache Hive : Binary DataType
Proposal</title><link>https://hive.apache.org/development/desingdocs/binary-datatype-proposal/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/binary-datatype-proposal/</guid><description>Apache
Hive : Binary DataType Proposal Binary Ty [...]
Hive is a project of the Apache Software Foundation. The foundation holds the
copyright on Apache code including the code in the Hive codebase. The
foundation FAQ explains the operation and background of the
foundation.</description></item><item><title>Apache Hive : CAST...FORMAT with
SQL:2016 datetime
formats</title><link>https://hive.apache.org/docs/latest/language/cast-format-with-sql2016-datetime-formats/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0000</pubDate><guid>https://hive.apac [...]
-Patterns are case-insensitive, except AM/PM and T/Z. See these sections for
more details. For string to datetime conversion, no duplicate format tokens are
allowed, including tokens</description></item><item><title>Apache Hive :
ChangeLog</title><link>https://hive.apache.org/docs/latest/changelog/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/changelog/</guid><description>Apache
Hive : ChangeLog Release 4.0.0 - 2024-03-29 NEW FEATURES: [...]
+Patterns are case-insensitive, except AM/PM and T/Z. See these sections for
more details. For string to datetime conversion, no duplicate format tokens are
allowed, including tokens</description></item><item><title>Apache Hive :
ChangeLog</title><link>https://hive.apache.org/docs/latest/changelog/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/changelog/</guid><description>Apache
Hive : ChangeLog Release 4.0.0 - 2024-03-29 NEW FEATURES: [...]
Version</description></item><item><title>Apache Hive : Compaction
pooling</title><link>https://hive.apache.org/docs/latest/language/compaction-pooling/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/compaction-pooling/</guid><description>Apache
Hive : Compaction pooling Concept: Compaction requests and workers can be
assigned to pools. A worker assigned to a specific pool will only process
compaction requests in that pool. Worke [...]
You can import text files compressed with Gzip or Bzip2 directly into a table
stored as TextFile. The compression will be detected automatically and the file
will be decompressed on-the-fly during query execution. For
example:</description></item><item><title>Apache Hive : Configuration
Properties</title><link>https://hive.apache.org/docs/latest/user/configuration-properties/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/user/configurat [...]
The canonical list of configuration properties is managed in the HiveConf Java
class, so refer to the HiveConf.java file for a complete list of configuration
properties available in your Hive release.
@@ -65,7 +65,7 @@ TIMESTAMP and TIMESTAMP WITHOUT TIME ZONE The TIMESTAMP and
TIMESTAMP WITHOUT TI
Apache Hive : Druid Integration Objectives Preliminaries Druid Storage
Handlers Usage Discovery and management of Druid datasources from Hive Create
tables linked to existing Druid datasources Create Druid datasources from Hive
Druid kafka ingestion from Hive INSERT, INSERT OVERWRITE and DROP statements
Queries completely executed in Druid Queries across Druid and Hive Open Issues
(JIRA) Objectives Our main goal is to be able to index data from Hive into
Druid, and to be able to query Dr [...]
Tutorial: Dynamic-Partition Insert Hive DML: Dynamic Partition Inserts
HCatalog Dynamic Partitioning Usage with Pig Usage from MapReduce References:
Original design doc HIVE-936 Terminology Static Partition (SP) columns: in
DML/DDL involving multiple partitioning columns, the columns whose values are
known at COMPILE TIME (given by user).</description></item><item><title>Apache
Hive : Enabling gRPC in Hive/Hive Metastore
(Proposal)</title><link>https://hive.apache.org/development/desingdocs/enabling-grpc-in-hive-metastore/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/ena [...]
-Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup GROUPING SETS
clause Grouping__ID function Grouping function Cubes and Rollups
hive.new.job.grouping.set.cardinality Grouping__ID function (before Hive 2.3.0)
Comments: Version
+Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup GROUPING SETS
clause Grouping__ID function Grouping function Cubes and Rollups
hive.new.job.grouping.set.cardinality Grouping__ID function (before Hive 2.3.0)
Version
Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were
added in Hive 0.</description></item><item><title>Apache Hive : Exchange
Partition</title><link>https://hive.apache.org/docs/latest/language/exchange-partition/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/exchange-partition/</guid><description>Apache
Hive : Exchange Partition The EXCHANGE PARTITION command will move a partition
from a source table to [...]
When the command is executed, the source table&rsquo;s partition folder in
HDFS will be renamed to move it to the destination table&rsquo;s partition
folder.</description></item><item><title>Apache Hive :
FileFormats</title><link>https://hive.apache.org/docs/latest/user/fileformats/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/user/fileformats/</guid><description>Apache
Hive : FileFormats File Formats and Compression File Forma [...]
Text File SequenceFile RCFile Avro Files ORC Files Parquet Custom INPUTFORMAT
and OUTPUTFORMAT The hive.default.fileformat configuration parameter determines
the format to use if it is not specified in a CREATE TABLE or ALTER TABLE
statement. Text file is the parameter&rsquo;s default value.
@@ -92,7 +92,7 @@ Hive installation is documented here.
HCatalog Command Line If you install Hive from the binary tarball, the hcat
command is available in the hcatalog/bin
directory.</description></item><item><title>Apache Hive : HCatalog
LoadStore</title><link>https://hive.apache.org/docs/latest/hcatalog/hcatalog-loadstore/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/hcatalog/hcatalog-loadstore/</guid><description>Apache
Hive : HCatalog Load and Store Interfaces Apache Hive : HCatalog Lo [...]
Apache Hive : HCatalog Streaming Mutation API Background Structure Data
Requirements Streaming Requirements Record Layout Connection and Transaction
Management Writing Data Dynamic Partition Creation Reading Data Example
Attachments: Background In certain data processing use cases it is necessary to
modify existing data when new facts
arrive.</description></item><item><title>Apache Hive : HCatalog
UsingHCat</title><link>https://hive.apache.org/docs/latest/hcatalog/hcatalog-usinghcat/</li
[...]
Apache Hive : Hive across Multiple Data Centers (Physical Clusters) Use Cases
Requirements Use Cases Inside facebook, we are running out of power inside a
data center (physical cluster), and we have a need to have a bigger
cluster.</description></item><item><title>Apache Hive : Hive APIs
Overview</title><link>https://hive.apache.org/community/resources/hive-apis-overview/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/community/resources/hive-apis-o
[...]
-Apache Hive : Hive APIs Overview API categories Operation based APIs Query
based APIs Available APIs HCatClient (Java) HCatalog Storage Handlers (Java)
HCatalog CLI (Command Line) Metastore (Java) WebHCat (REST) Streaming Data
Ingest (Java) Streaming Mutation (Java) hive-jdbc (JDBC) Comments: API
categories The APIs can be segmented into two conceptual categories: operation
based APIs and query based APIs.</description></item><item><title>Apache Hive :
Hive Aws EMR</title><link>https://h [...]
+Apache Hive : Hive APIs Overview API categories Operation based APIs Query
based APIs Available APIs HCatClient (Java) HCatalog Storage Handlers (Java)
HCatalog CLI (Command Line) Metastore (Java) WebHCat (REST) Streaming Data
Ingest (Java) Streaming Mutation (Java) hive-jdbc (JDBC) API categories The
APIs can be segmented into two conceptual categories: operation based APIs and
query based APIs.</description></item><item><title>Apache Hive : Hive Aws
EMR</title><link>https://hive.apache [...]
Here you can find the most important configurations and default values.
Config Name Default Value Description Config file
hive.metastore.client.cache.v2.enabled true This property enables a Caffeine
Cache for the Metastore client MetastoreConf More configs are in
MetastoreConf.</description></item><item><title>Apache Hive : Hive deprecated
authorization mode / Legacy
Mode</title><link>https://hive.apache.org/docs/latest/user/hive-deprecated-authorization-mode/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/user/ [...]
Apache Hive : Hive deprecated authorization mode / Legacy Mode Disclaimer
Prerequisites Users, Groups, and Roles Creating/Dropping/Using Roles Privileges
Hive Operations and Required Privileges Disclaimer Hive authorization is not
completely secure.</description></item><item><title>Apache Hive : Hive
HPL/SQL</title><link>https://hive.apache.org/docs/latest/user/hive-hpl-sql/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/user/hive-hpl-sq [...]
@@ -102,7 +102,7 @@ The Hive MetaTool enables administrators to do bulk updates
on the location fiel
Ability to search and replace the HDFS NN (NameNode) location in metastore
records that reference the NN. One use is to transition a Hive deployment to
HDFS HA NN (HDFS High Availability
NameNode).</description></item><item><title>Apache Hive : Hive
Metrics</title><link>https://hive.apache.org/docs/latest/user/hive-metrics/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/user/hive-metrics/</guid><description>Apache
Hive : Hive Metrics The [...]
The metrics dump will display any metric available over JMX encoded in JSON:
Alternatively the metrics can be written directly into HDFS, a JSON file on the
local file system where the HS2 instance is running or to the console by
enabling the corresponding metric reporters. By default only the JMX and the
JSON file reporter are enabled.</description></item><item><title>Apache Hive :
Hive on
Spark</title><link>https://hive.apache.org/docs/latest/user/hive-on-spark/</link><pubDate>Thu,
12 [...]
set hive.execution.engine=spark; Hive on Spark was added in HIVE-7292.
-Apache Hive : Hive on Spark: Getting Started Version Compatibility Spark
Installation Configuring YARN Configuring Hive Configuration property details
Configuring Spark Tuning Details Common Issues (Green are resolved, will be
removed from this list) Recommended Configuration Design documents Attachments:
Comments: Version Compatibility Hive on Spark is only tested with a specific
version of Spark, so a given version of Hive is only guaranteed to work with a
specific version of Spark.</d [...]
+Apache Hive : Hive on Spark: Getting Started Version Compatibility Spark
Installation Configuring YARN Configuring Hive Configuration property details
Configuring Spark Tuning Details Common Issues (Green are resolved, will be
removed from this list) Recommended Configuration Design documents Attachments:
Version Compatibility Hive on Spark is only tested with a specific version of
Spark, so a given version of Hive is only guaranteed to work with a specific
version of Spark.</description [...]
We believe that this type of functionality will be of increasing importance as
Hadoop and Hive workloads migrate to the
cloud.</description></item><item><title>Apache Hive : Hive Schema
Tool</title><link>https://hive.apache.org/docs/latest/admin/hive-schema-tool/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/admin/hive-schema-tool/</guid><description>Apache
Hive : Hive Schema Tool Apache Hive : Hive Schema Tool Metastore Schema
Verifica [...]
Introduced in Hive 0.12.0. See HIVE-3764.
Hive now records the schema version in the metastore database and verifies
that the metastore schema version is compatible with Hive binaries that are
going to access the metastore.</description></item><item><title>Apache Hive :
Hive
Transactions</title><link>https://hive.apache.org/docs/latest/user/hive-transactions/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/user/hive-transactions/</guid><description>Apache
Hive : ACID Transaction [...]
@@ -166,7 +166,7 @@ One is INPUT__FILE__NAME, which is the input
file&rsquo;s name for a mapper
the other is BLOCK__OFFSET__INSIDE__FILE, which is the current global file
position.
For block compressed file, it is the current block&rsquo;s file offset,
which is the current block&rsquo;s first byte&rsquo;s file
offset.</description></item><item><title>Apache Hive : LanguageManual
WindowingAndAnalytics</title><link>https://hive.apache.org/docs/latest/language/languagemanual-windowingandanalytics/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/languagemanual-windowingandanalytics/</guid><descripti
[...]
This section introduces the Hive QL enhancements for windowing and analytics
functions. See &ldquo;Windowing Specifications in HQL&rdquo; (attached
to HIVE-4197) for details. HIVE-896 has more information, including links to
earlier documentation in the initial
comments.</description></item><item><title>Apache Hive : LanguageManual
XPathUDF</title><link>https://hive.apache.org/docs/latest/language/languagemanual-xpathudf/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0000</pubDate><g [...]
-UDFs xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double,
xpath_number, xpath_string Functions for parsing XML data using XPath
expressions. Since version: 0.6.0 Overview The xpath family of UDFs are
wrappers around the Java XPath library javax.xml.xpath provided by the JDK. The
library is based on the XPath 1.0 specification. Please refer to
http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for
detailed information on the Java XPath library.</de [...]
+UDFs xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double,
xpath_number, xpath_string Functions for parsing XML data using XPath
expressions. Since version: 0.6.0 Overview The xpath family of UDFs are
wrappers around the Java XPath library javax.xml.xpath provided by the JDK. The
library is based on the XPath 1.0 specification. Please refer to
http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for
detailed information on the Java XPath library.</de [...]
There are many tables of the following format:
create table T(a, b, c, .</description></item><item><title>Apache Hive :
Literals</title><link>https://hive.apache.org/docs/latest/language/literals/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/language/literals/</guid><description>Apache
Hive : Literals Literals Integral types Integral literals are assumed to be
INT by default, unless the number exceeds the range of INT in which case it is
interpreted as a BIGINT, or if one of the fo [...]
Type Postfix Example TINYINT Y 100Y SMALLINT S 100S BIGINT L 100L String types
String literals can be expressed with either single quotes (') or double quotes
(&quot;).</description></item><item><title>Apache Hive :
LLAP</title><link>https://hive.apache.org/development/desingdocs/llap/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/llap/</guid><description>Apache
Hive : LLAP Live Long And Process (LLAP) functionality was a [...]
@@ -210,7 +210,7 @@ There are several rules already defined by the community,
but most of them are n
Yetus helps us by checking these rules for newly introduced errors. Note that
Yetus checks only the changed part of the code. If any unchanged code contains
errors, then Yetus will not report them, but all of the new code should conform
to the rules.</description></item><item><title>Apache Hive : Scheduled
Queries</title><link>https://hive.apache.org/docs/latest/language/scheduled-queries/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/l [...]
Authorization modes
The links below refer to the original Hive authorization mode. See
Authorization for an overview of authorization modes, which include storage
based authorization and SQL standards based authorization.
-Thoughts on security from Venkatesh Howl&rsquo;s approach for persisting
and validating DDL authorization via HDFS permissions HIVE-1264: Hadoop
security integration THRIFT-889: allow Kerberos authentication over Thrift HTTP
THRIFT-876: SASL integration Howl Authorization Proposal Hive Authorization
Proposal Note that Howl was the precursor to
HCatalog.</description></item><item><title>Apache Hive :
SerDe</title><link>https://hive.apache.org/docs/latest/user/serde/</link><pubDate>Thu
[...]
+Thoughts on security from Venkatesh Howl&rsquo;s approach for persisting
and validating DDL authorization via HDFS permissions HIVE-1264: Hadoop
security integration THRIFT-889: allow Kerberos authentication over Thrift HTTP
THRIFT-876: SASL integration Howl Authorization Proposal Hive Authorization
Proposal Note that Howl was the precursor to
HCatalog.</description></item><item><title>Apache Hive :
SerDe</title><link>https://hive.apache.org/docs/latest/user/serde/</link><pubDate>Thu
[...]
STEP 1: Pull the image Pull the 4.0.0 image from Hive DockerHub docker pull
apache/hive:4.0.0 STEP 2: Export the Hive version export HIVE_VERSION=4.0.0
STEP 3: Launch the HiveServer2 with an embedded Metastore. This is lightweight
and for a quick setup, it uses Derby as metastore db.
docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2
--name hive4 apache/hive:${HIVE_VERSION} STEP 4: Connect to beeline docker exec
-it hiveserver2 beeline -u 'jdbc:hive2://hiveserver2:10000/' Note: Launch
Standalone Metastore To use standalone Metastore with
Derby,</description></item><item><title>Apache Hive : Setting Up
HiveServer2</title><link>https://hive.apache.org/docs/latest/admin/setting-up-hiveserver2/</link><pubDate>Thu,
12 Dec 2024 00:00:00 +0000</pubDa [...]
The Thrift interface definition language (IDL) for HiveServer2 is available at
https://github.</description></item><item><title>Apache Hive : Skewed Join
Optimization</title><link>https://hive.apache.org/development/desingdocs/skewed-join-optimization/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/development/desingdocs/skewed-join-optimization/</guid><description>Apache
Hive : Skewed Join Optimization Optimizing Skewed Joins The Problem A join of
[...]
@@ -235,7 +235,7 @@ select * from
(subq1
UNION ALL
sub2) u;
-If the parents to union were map reduce jobs, they will write the output to
temporary files. The Union will then read the rows from these temporary files
and write to a final directory. In effect, the results are read and written
twice unnecessarily. We can avoid this by directly writing to the final
directory.</description></item><item><title>Apache Hive : Unit Testing Hive
SQL</title><link>https://hive.apache.org/community/resources/unit-testing-hive-sql/</link><pubDate>Thu,
12 Dec 202 [...]
+If the parents to union were map reduce jobs, they will write the output to
temporary files. The Union will then read the rows from these temporary files
and write to a final directory. In effect, the results are read and written
twice unnecessarily. We can avoid this by directly writing to the final
directory.</description></item><item><title>Apache Hive : Unit Testing Hive
SQL</title><link>https://hive.apache.org/community/resources/unit-testing-hive-sql/</link><pubDate>Thu,
12 Dec 202 [...]
The view refers to exactly one base table or updatable view in the FROM clause
without a WHERE clause. Each column in the view is a column in the underlying
table/updatable view with no underlying columns duplicated. Views must have the
same partition columns as the underlying table/updatable view. When inserting
into a view:
If a view does not specify all underlying columns, NULL will be inserted for
each column not specified.</description></item><item><title>Apache Hive : User
and Group Filter Support with LDAP Atn Provider in
HiveServer2</title><link>https://hive.apache.org/docs/latest/admin/user-and-group-filter-support-with-ldap-atn-provider-in-hiveserver2/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/admin/user-and-group-filter-support-with-ldap-atn-p
[...]
Navigation Links Next: Using WebHCat</description></item><item><title>Apache
Hive : WebHCat
Configure</title><link>https://hive.apache.org/docs/latest/webhcat/webhcat-configure/</link><pubDate>Thu,
12 Dec 2024 00:00:00
+0000</pubDate><guid>https://hive.apache.org/docs/latest/webhcat/webhcat-configure/</guid><description>Apache
Hive : WebHCat Configure Apache Hive : WebHCat Configure Configuration Files
Configuration Variables Configuration Files The configuration for WebHCat
(Templeton) [...]