Re: Hive JOIN fails if SELECT statement contains fields from the first table.

2012-01-16 Thread Mark Grover
Hi Bing,
Something seems wrong about your create table statements.
You are using "LOAD DATA LOCAL INPATH" to load data into Hive tables. This 
makes me think that the files /home/biadmin/hivetbl/student_details.txt and 
/home/biadmin/hivetbl/student_score.txt are on the local drive.
In such a case you want to copy them onto HDFS for use by Hive tables and if 
so, doing "LOAD DATA LOCAL INPATH" is the right thing to do.

However, when you create a table and specify the location
...STORED AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl'...
This location refers to the HDFS location. If you don't specify this, Hive will 
choose this by for you by default. However, if you do specify it, it's your 
responsibility to ensure that this location doesn't conflict with another Hive 
table.

Long story short, a leaf level directory in HDFS stores the partitions/buckets 
and contents of 1 table. In your create table statement, you were pointing both 
Hive tables to the same HDFS location /home/biadmin/hivetbl.

If you could make your two tables, point to different HDFS locations like, 
'/usr/hive/warehouse/student_score' and '/usr/hive/warehouse/student_details' 
and then do your load data inpath statements, that should fix the problem.

Good luck!
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation

www: oanda.com www: fxtrade.com
e: mgro...@oanda.com

"Best Trading Platform" - World Finance's Forex Awards 2009.
"The One to Watch" - Treasury Today's Adam Smith Awards 2009.


- Original Message -
From: "Bing Li" 
To: dev@hive.apache.org, u...@hive.apache.org
Sent: Monday, January 16, 2012 5:06:00 AM
Subject: Hive JOIN fails if SELECT statement contains fields from the first 
table.


1. I create two Hive table:
Hive> CREATE EXTERNAL TABLE student_details (studentid INT,studentname 
STRING,age INT,gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED 
AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl';


Hive>CREATE EXTERNAL TABLE student_score(studentid INT, classid INT,score 
FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE 
LOCATION '/home/biadmin/hivetbl';


2. Load data
HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt' 
OVERWRITE INTO TABLE student_details;


HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt' OVERWRITE 
INTO TABLE student_score;


3. Run inner join
Hive> SELECT a.studentid,a.studentname,a.age,b.classid,b.score,c.classname FROM 
student_details a JOIN student_score b ON (a.studentid = b.studentid);


Result:
There are the following exception:
cannot find field studentname from [0:studentid, 1:classid, 2:score]


[My Question]: studentname is a field of the table student_details (The first 
table), why search it in the table student_score(the second table)?


log is like that;
... ...
2012-01-15 23:24:41,727 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
from attempt_201201152221_0014_m_00_3: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {"studentid":106,"classid":null,"score":635.0}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"studentid":106,"classid":null,"score":635.0}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 4 more
Caused by: java.lang.RuntimeException: cannot find field studentname from 
[0:studentid, 1:classid, 2:score]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345)
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:200)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
... 5 more


Hive JOIN fails if SELECT statement contains fields from the first table.

2012-01-16 Thread Bing Li
1. I create two Hive table:
Hive> CREATE EXTERNAL TABLE student_details (studentid INT,studentname
STRING,age INT,gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE LOCATION  ‘/home/biadmin/hivetbl';

Hive>CREATE EXTERNAL TABLE student_score(studentid INT, classid INT,score
FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
LOCATION '/home/biadmin/hivetbl';

2. Load data
HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt'
OVERWRITE INTO TABLE student_details;

HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt'
OVERWRITE INTO TABLE student_score;

3. Run inner join
Hive> SELECT a.studentid,a.studentname,a.age,b.classid,b.score,c.classname
FROM student_details a JOIN student_score b ON (a.studentid = b.studentid);

Result:
There are the following exception:
cannot find field studentname from [0:studentid, 1:classid, 2:score]

[My Question]: studentname is a field of the table student_details (The
first table), why search it in the table student_score(the second table)?

log is like that;
... ...
2012-01-15 23:24:41,727 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201201152221_0014_m_00_3: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row {"studentid":106,"classid":null,"score":635.0}
at
org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
Error while processing row {"studentid":106,"classid":null,"score":635.0}
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at
org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 4 more
Caused by: java.lang.RuntimeException: cannot find field studentname from
[0:studentid, 1:classid, 2:score]
at
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345)
at
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
at
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
at
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896)
at
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)
at
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:200)
at
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
... 5 more