[
https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629416#action_12629416
]
lengwuqing commented on HADOOP-3601:
------------------------------------
--- How to improve Hive ---
1. compiling:
   1. download and unzip hadoop-0.17.2.1.tar.gz
   2. download and unzip facebook-hive.tar.gz
   3. copy hive to ./hadoop-0.17.2.1/src/contrib/hive
   4. export CLASSPATH=.:../../../../hadoop-0.17.2.1/hadoop-0.17.2.1-core.jar:$CLASSPATH
   5. ant -Ddist.dir=hive_dist -Dtarget.dir=hive_target package
   6. cp -rf hive_target ../../../../hadoop-0.17.2.1/contrib/hive
      cp -rf hive_target ../../../../hive
2. developing & debugging
   1. create an Eclipse project
   2. collect all Hive-related .java files into the src directory under the project.
   3. collect all necessary third-party .jar files into lib and add them to the
      project's library settings.
   4. modify and run this command:
      java -classpath .;./lib/antlr-3.0.1.jar;./lib/stringtemplate-3.1b1.jar;./lib/antlr-2.7.7.jar;./lib/antlr-runtime-3.0.1.jar org.antlr.Tool -fo src/org/apache/hadoop/hive/ql/parse/ src/org/apache/hadoop/hive/ql/parse/Hive.g
   5. refresh the project; you now have a complete Hive development environment.
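The classpath in step 4 uses ';' as the separator, which is the Windows convention; on Linux or Mac OS X the separator is ':'. A minimal Python sketch that builds the same command with the separator of the current platform follows. The jar names and paths are the ones from step 4; the function name antlr_command and the lib_dir parameter are hypothetical, and the script only prints the command unless you choose to run it.

```python
import os

# Jar names are taken from step 4; ./lib is the project layout from step 3.
ANTLR_JARS = [
    "antlr-3.0.1.jar",
    "stringtemplate-3.1b1.jar",
    "antlr-2.7.7.jar",
    "antlr-runtime-3.0.1.jar",
]
PARSE_DIR = "src/org/apache/hadoop/hive/ql/parse"


def antlr_command(lib_dir="./lib", pathsep=os.pathsep):
    """Build the java command line that regenerates the parser from Hive.g.

    pathsep defaults to the platform separator (';' on Windows, ':' elsewhere).
    """
    classpath = pathsep.join(["."] + [lib_dir + "/" + jar for jar in ANTLR_JARS])
    return [
        "java", "-classpath", classpath,
        "org.antlr.Tool", "-fo", PARSE_DIR + "/",
        PARSE_DIR + "/Hive.g",
    ]


if __name__ == "__main__":
    cmd = antlr_command()
    print(" ".join(cmd))
    # To run it from the project root (requires java on PATH):
    #   import subprocess; subprocess.check_call(cmd)
```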
3. execution
   export HADOOP_HOME=/home/hadoop/setup/hadoop-release
   ./bin/hive -hiveconf hive.root.logger=INFO,console
4. hivefly
   1. create some data formatted like these two tables:
      the resume table, with 1024*1024*100 records.
      the course table, with 1024*1024*100*3 records.
   2. run these scripts and you may find that the Hive system does not compute
      the correct result.
      You can debug the Hive system in the development environment built above.
   CREATE TABLE resume(id INT, name STRING, gender STRING, years INT, intro STRING);
   CREATE TABLE course(id INT, name STRING, course STRING, score INT, notes STRING);
   LOAD DATA LOCAL INPATH '/home/hadoop/john/hive/resume.txt' OVERWRITE INTO TABLE resume;
   LOAD DATA LOCAL INPATH '/home/hadoop/john/hive/course.txt' OVERWRITE INTO TABLE course;
   CREATE TABLE test00(name STRING, count INT);
   INSERT OVERWRITE TABLE test00 SELECT t1.name, count(DISTINCT t1.name) FROM course t1 GROUP BY t1.name;
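
The steps above do not say how resume.txt and course.txt are produced. One possible way, sketched in Python, is to generate scaled-down files with the same column layout and to compute a reference answer for the test00 query to compare against Hive's output. The Ctrl-A field delimiter, the value pools, the row counts and every function name below are assumptions for illustration and are not part of the original report; adjust the delimiter to whatever row format your tables actually use.

```python
import random
from collections import defaultdict

DELIM = "\x01"  # assumption: Ctrl-A field delimiter; change to match your tables

NAMES = ["alice", "bob", "carol", "dave"]  # hypothetical value pools
COURSES = ["math", "physics", "history"]


def resume_rows(n, rng):
    """Yield n rows shaped like resume(id, name, gender, years, intro)."""
    for i in range(n):
        yield (i, rng.choice(NAMES), rng.choice(["M", "F"]),
               rng.randint(0, 40), "intro %d" % i)


def course_rows(n, rng):
    """Yield n rows shaped like course(id, name, course, score, notes)."""
    for i in range(n):
        yield (i, rng.choice(NAMES), rng.choice(COURSES),
               rng.randint(0, 100), "notes %d" % i)


def write_rows(path, rows):
    """Write one record per line, fields joined by DELIM."""
    with open(path, "w") as f:
        for row in rows:
            f.write(DELIM.join(str(v) for v in row) + "\n")


def expected_test00(path):
    """Reference answer for:
    SELECT name, count(DISTINCT name) FROM course GROUP BY name
    """
    distinct = defaultdict(set)
    with open(path) as f:
        for line in f:
            name = line.rstrip("\n").split(DELIM)[1]  # second column is name
            distinct[name].add(name)
    return {name: len(values) for name, values in distinct.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000  # scaled down; the report uses 1024*1024*100 resume rows
    write_rows("resume.txt", resume_rows(n, rng))
    write_rows("course.txt", course_rows(3 * n, rng))
    print(expected_test00("course.txt"))
```

Because the query groups by name and counts the distinct values of that same column, every row of test00 should carry a count of 1 for non-null names. Any other value in Hive's output points at the problem described above.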
> Hive as a contrib project
> -------------------------
>
> Key: HADOOP-3601
> URL: https://issues.apache.org/jira/browse/HADOOP-3601
> Project: Hadoop Core
> Issue Type: Wish
> Components: contrib/hive
> Affects Versions: 0.19.0
> Environment: N/A
> Reporter: Joydeep Sen Sarma
> Assignee: Ashish Thusoo
> Priority: Minor
> Fix For: 0.19.0
>
> Attachments: ant.log, hive.tgz, hive.tgz, hive.tgz, HiveTutorial.pdf
>
> Original Estimate: 1080h
> Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in
> HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution
> engine. Queries can use either single stage or multi-stage map-reduce. Hive
> has a native format for tables - but can handle any data set (for example
> json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and
> may use Apache Derby as an embedded database for MetaStore. Antlr has a BSD
> license and should be compatible with Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib
> project (since that is the version under which it will get tested internally)
> - but looking for advice on the best release path.