[Hadoop Wiki] Update of "Hive" by JohnSichi

Apache Wiki Sun, 26 Jun 2011 16:08:01 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "Hive" page has been changed by JohnSichi:
http://wiki.apache.org/hadoop/Hive?action=diff&rev1=81&rev2=83

+ The Apache Hive wiki has moved to 
[[https://cwiki.apache.org/confluence/display/Hive|Confluence]]!
- = What is Hive =
- [[http://hadoop.apache.org/hive/|Hive]] is a data warehouse infrastructure 
built on top of [[.|Hadoop]]. It provides tools for easy data ETL, a 
mechanism to impose structure on the data, and the capability to query and 
analyze large data sets stored in Hadoop files. Hive defines a simple 
SQL-like query language, called QL, that enables users familiar with SQL to 
query the data. At the same time, this language also allows programmers who are 
familiar with the MapReduce framework to plug in their custom 
mappers and reducers to perform more sophisticated analysis that may not be 
supported by the built-in capabilities of the language.
  
- Hive does not require that the data it reads or writes be in a "Hive format"; there is 
no such thing. Hive works equally well on Thrift, control-delimited, or your own 
specialized data formats.  Please see [[/DeveloperGuide#File_Formats|File 
Format]] and 
[[http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook|SerDe]]
 in the [[/DeveloperGuide|Developer Guide]] for details.
+ If you're looking for a particular page name, try 
[[https://cwiki.apache.org/confluence/pages/listpages-dirview.action?key=Hive|this
 list]].
  
- = What Hive is NOT =
- Hadoop is a batch processing system, and Hadoop jobs tend to have high latency 
and incur substantial overhead in job submission and scheduling. As a result, 
latency for Hive queries is generally very high (minutes), even when the data sets 
involved are very small (say, a few hundred megabytes). Hive therefore cannot be 
compared with systems such as Oracle, where analyses are conducted on a 
significantly smaller amount of data but proceed much more 
iteratively, with response times between iterations of less than a few 
minutes. Hive aims to provide acceptable (but not optimal) latency for 
interactive data browsing, queries over small data sets, or test queries. Hive 
also does not provide any sort of data or query cache to make repeated queries over 
the same data set faster.
- 
- Hive is not designed for online transaction processing and does not offer 
real-time queries or row-level updates. It is best used for batch jobs over 
large sets of immutable data (like web logs). What Hive values most is 
scalability (scaling out as more machines are added dynamically to the Hadoop 
cluster), extensibility (with the MapReduce framework and UDF/UDAF/UDTF), 
fault tolerance, and loose coupling with its input formats.
- 
- = Information =
-  * General information about Hive
-   * [[/GettingStarted|Getting Started]]
-   * [[/Presentations|Presentations and Papers about Hive]]
-   * [[/PoweredBy|A List of Sites and Applications Powered by Hive]]
-   * [[/FAQ|FAQ]]
-   * [[http://hadoop.apache.org/hive/mailing_lists.html#Users|hive-users 
mailing list]]
-   * Hive IRC Channel: #hive at irc.freenode.net
-  * For users:
-   * [[/Tutorial|Hive Tutorial]]
-   * [[/LanguageManual|HiveQL Language Manual (Queries, DML, DDL, and CLI)]]
-   * [[/HivePlugins|Hive Plug-in Interfaces - User-Defined Functions and 
SerDes]]
-   * [[/LanguageManual/UDF|Guide to Hive Operators and Functions]]
-    * [[Hive/StatisticsAndDataMining|Functions for Statistics and Data Mining]]
-   * [[/HiveWebInterface|Hive Web Interface]]
-   * [[/HiveClient|Hive Client (JDBC, ODBC, Thrift, etc)]]
-  * For administrators:
-   * [[/AdminManual/Installation|Installing Hive]]
-   * [[/AdminManual/Configuration|Configuring Hive]]
-   * [[/AdminManual/MetastoreAdmin|Setting up Metastore]]
-   * [[/HiveWebInterface|Setting up Hive Web Interface]]
-   * [[/AdminManual/SettingUpHiveServer|Setting up Hive Server (JDBC, ODBC, 
Thrift, etc)]]
-   * [[/HiveAws|Hive on Amazon Web Services]]
-  * For developers:
-   * [[/HowToContribute|How to Contribute]]
-   * [[/Development/ContributorsMeetings|Hive Contributors Meetings]]
-   * [[/DeveloperGuide|Hive Developer Guide]]
-   * [[/Performance|Hive Performance]]
-   * [[/Design|Hive Architecture Overview]]
-   * [[/DesignDocs|Hive Design Docs]]
-   * [[/Roadmap|Roadmap/call to Add More Features]]
-   * [[http://search-hadoop.com/Hive|Full-text search over all Hive resources]]
-   * [[/HowToCommit|How to Commit]]
-   * [[/HowToRelease|How to Release]]
-   * [[/HudsonBuild|Build status on Jenkins (formerly Hudson)]]
-   * [[https://cwiki.apache.org/confluence/display/Hive/Bylaws|Project Bylaws]]
- 
- For more information, please see the official 
[[http://hadoop.apache.org/hive/|Hive website]].
-
