Map-side join of a non-partitioned table with a partitioned table causes error
-------------------------------------------------------------------------
Key: HIVE-1452
URL: https://issues.apache.org/jira/browse/HIVE-1452
Project: Hadoop Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
I am running a script that joins two tables: one is dynamically partitioned
and stored in RCFile format, and the other is stored as a text file.
The text file is around 397 MB in size and has around 24 million rows.
{code}
drop table joinquery;
create external table joinquery (
id string,
type string,
sec string,
num string,
url string,
cost string,
listinfo array <map<string,string>>
)
STORED AS TEXTFILE
LOCATION '/projects/joinquery';
CREATE EXTERNAL TABLE idtable20mil(
id string
)
STORED AS TEXTFILE
LOCATION '/projects/idtable20mil';
insert overwrite table joinquery
select
/*+ MAPJOIN(idtable20mil) */
rctable.id,
rctable.type,
rctable.map['sec'],
rctable.map['num'],
rctable.map['url'],
rctable.map['cost'],
rctable.listinfo
from rctable
JOIN idtable20mil on (rctable.id = idtable20mil.id)
where
rctable.id is not null and
rctable.part='value' and
rctable.subpart='value' and
rctable.pty='100' and
rctable.uniqid='1000'
order by id;
{code}
Result:
{verbatim}
Possible error:
Data file split:string,part:string,subpart:string,subsubpart:string> is corrupted.
Solution:
Replace file. i.e. by re-running the query that produced the source table / partition.
{verbatim}
-----
If I look at the mapper logs, I see:
{verbatim}
Caused by: java.io.IOException: java.io.EOFException
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:109)
	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
	at org.apache.hadoop.hive.ql.util.jdbm.htree.HashBucket.readExternal(HashBucket.java:284)
	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
	at org.apache.hadoop.hive.ql.util.jdbm.helper.Serialization.deserialize(Serialization.java:106)
	at org.apache.hadoop.hive.ql.util.jdbm.helper.DefaultSerializer.deserialize(DefaultSerializer.java:106)
	at org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:360)
	at org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:332)
	at org.apache.hadoop.hive.ql.util.jdbm.htree.HashDirectory.get(HashDirectory.java:195)
	at org.apache.hadoop.hive.ql.util.jdbm.htree.HTree.get(HTree.java:155)
	at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.get(HashMapWrapper.java:114)
	... 11 more
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:375)
	at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
	at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
	at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:98)
{verbatim}
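For context on the bottom frames of the trace (not part of the original report): `BytesWritable.readFields` begins by calling `in.readInt()` to read the 4-byte length prefix of the payload, so if the persisted hash-bucket record backing the map-join table is truncated, the stream ends before those bytes and `readInt` throws `EOFException`. A minimal stdlib-only sketch of that failure mode, with a hypothetical helper name:

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedReadDemo {
    // Hypothetical helper: mimics the first step of BytesWritable.readFields,
    // which reads a 4-byte int length prefix from the persisted record.
    public static String readLengthPrefix(byte[] persisted) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(persisted))) {
            int len = in.readInt();   // the DataInputStream.readInt frame in the trace
            return "length=" + len;
        } catch (EOFException e) {
            return "EOFException";    // what surfaces in the mapper logs
        } catch (IOException e) {
            return "IOException";
        }
    }

    public static void main(String[] args) {
        // A complete 4-byte prefix reads cleanly...
        System.out.println(readLengthPrefix(new byte[] {0, 0, 0, 7}));
        // ...but a record cut short mid-prefix reproduces the EOFException.
        System.out.println(readLengthPrefix(new byte[] {0, 0}));
    }
}
{code}

This only illustrates why a truncated or corrupted serialized record surfaces as `EOFException` during deserialization; it does not explain why the map-join spill file became truncated in the first place, which is the actual bug here.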
I am trying to create a testcase that demonstrates this error.