Hi Bing,
Something seems wrong about your create table statements.
You are using "LOAD DATA LOCAL INPATH" to load data into Hive tables. This
makes me think that the files /home/biadmin/hivetbl/student_details.txt and
/home/biadmin/hivetbl/student_score.txt are on the local drive.
In such a case you want to copy them onto HDFS for use by Hive tables and if
so, doing "LOAD DATA LOCAL INPATH" is the right thing to do.
However, when you create a table and specify the location
...STORED AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl'...
This location refers to the HDFS location. If you don't specify this, Hive will
choose this by for you by default. However, if you do specify it, it's your
responsibility to ensure that this location doesn't conflict with another Hive
table.
Long story short, a leaf level directory in HDFS stores the partitions/buckets
and contents of 1 table. In your create table statement, you were pointing both
Hive tables to the same HDFS location /home/biadmin/hivetbl.
If you could make your two tables, point to different HDFS locations like,
'/usr/hive/warehouse/student_score' and '/usr/hive/warehouse/student_details'
and then do your load data inpath statements, that should fix the problem.
Good luck!
Mark
Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
e: mgro...@oanda.com
"Best Trading Platform" - World Finance's Forex Awards 2009.
"The One to Watch" - Treasury Today's Adam Smith Awards 2009.
- Original Message -
From: "Bing Li"
To: dev@hive.apache.org, u...@hive.apache.org
Sent: Monday, January 16, 2012 5:06:00 AM
Subject: Hive JOIN fails if SELECT statement contains fields from the first
table.
1. I create two Hive table:
Hive> CREATE EXTERNAL TABLE student_details (studentid INT,studentname
STRING,age INT,gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED
AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl';
Hive>CREATE EXTERNAL TABLE student_score(studentid INT, classid INT,score
FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
LOCATION '/home/biadmin/hivetbl';
2. Load data
HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt'
OVERWRITE INTO TABLE student_details;
HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt' OVERWRITE
INTO TABLE student_score;
3. Run inner join
Hive> SELECT a.studentid,a.studentname,a.age,b.classid,b.score,c.classname FROM
student_details a JOIN student_score b ON (a.studentid = b.studentid);
Result:
There are the following exception:
cannot find field studentname from [0:studentid, 1:classid, 2:score]
[My Question]: studentname is a field of the table student_details (The first
table), why search it in the table student_score(the second table)?
log is like that;
... ...
2012-01-15 23:24:41,727 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201201152221_0014_m_00_3: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row {"studentid":106,"classid":null,"score":635.0}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row {"studentid":106,"classid":null,"score":635.0}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 4 more
Caused by: java.lang.RuntimeException: cannot find field studentname from
[0:studentid, 1:classid, 2:score]
at
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345)
at
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
at
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896)
at
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)
at
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:200)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
... 5 more