The hive logs go into /tmp/$USER/hive.log not hive_job_log*.txt.

________________________________
From: Bill Graham <[email protected]>
Reply-To: <[email protected]>
Date: Thu, 30 Jul 2009 10:52:06 -0700
To: Prasad Chakka <[email protected]>
Cc: <[email protected]>, Zheng Shao <[email protected]>
Subject: Re: partitions not being created

I'm trying to set a string to a string and I'm seeing this error. I also tried
setting a string to an int, and saw the same error.

The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but I've 
included its output below. Only the Hive server logs show the exceptions 
listed above. (Note that the table I'm loading from in this log output is 
ApiUsageSmall, which is identical to ApiUsageTemp. For some reason the data 
from ApiUsageTemp is now gone.)

QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = 
"20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = 
'2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" 
TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,  reduce 
=0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="Job 
Counters .Launched map tasks:1,Job Counters .Data-local map tasks:1" 
TASK_ID="Stage-1" QUERY_ID="app_20090730104242" 
TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30 
10:42:43,031 map = 40%,  reduce =0%" 
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File 
Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job 
Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job 
Counters .Data-local map 
tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
 Framework.Map input records:302,Map-Reduce Framework.Map input 
bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" 
QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" 
TIME="1248975763033"
TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30 
10:42:44,068 map = 100%,  reduce =100%" 
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File 
Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job 
Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job 
Counters .Data-local map 
tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
 Framework.Map input records:1498,Map-Reduce Framework.Map input 
bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" 
QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" 
TIME="1248975764071"
TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0" 
TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%" 
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File 
Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job 
Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job 
Counters .Data-local map 
tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
 Framework.Map input records:1498,Map-Reduce Framework.Map input 
bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" 
QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" 
TIME="1248975764199"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" 
TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
TaskEnd TASK_RET_CODE="0" 
TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" 
QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" 
QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskEnd TASK_RET_CODE="1" TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" 
TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782473"
QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE 
ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM 
ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" 
QUERY_NUM_TASKS="2" TIME="1248975782474"



On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <[email protected]> wrote:
Are you sure you are getting the same error even with the schema below (i.e. 
trying to set a string to an int column?). Can you give the full stack trace 
that you might see in /tmp/$USER/hive.log?


________________________________
From: Bill Graham <[email protected]>
Reply-To: <[email protected]>, <[email protected]>
Date: Thu, 30 Jul 2009 10:02:54 -0700
To: Zheng Shao <[email protected]>
Cc: <[email protected]>
Subject: Re: partitions not being created


Based on these describe statements, is what I'm trying to do feasible? I'm 
basically trying to load the contents of ApiUsageTemp into ApiUsage, with the 
ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
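To be concrete, each load is one INSERT per date, with the partition value given as a literal and the backtick regex dropping the requestDate column from the select list (single quotes used here only to sidestep the nested double quotes visible in the log output above):

```sql
-- One INSERT per distinct date: the partition value is a static literal.
-- The backtick regex selects every column except requestDate, since the
-- dt partition column replaces it in the target table.
INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090518')
SELECT `(requestDate)?+.+` FROM ApiUsageTemp
WHERE requestDate = '2009/05/18';
```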


On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <[email protected]> wrote:
Sure. The only difference I see is that ApiUsage has a dt partition 
instead of the requestdate column:

hive> describe extended ApiUsage;
OK
user    string
restresource    string
statuscode      int
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default, 
owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), 
FieldSchema(name:restresource, type:string, comment:null), 
FieldSchema(name:statuscode, type:int, comment:null), 
FieldSchema(name:requesthour, type:int, comment:null), 
FieldSchema(name:numrequests, type:string, comment:null), 
FieldSchema(name:responsetime, type:string, comment:null), 
FieldSchema(name:numslowrequests, type:string, comment:null)], 
location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{field.delim= , serialization.format= }), bucketCols:[], 
sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, 
comment:null)], parameters:{})

Time taken: 0.277 seconds
hive> describe extended ApiUsageTemp;
OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string

Detailed Table Information      Table(tableName:apiusagetemp, dbName:default, 
owner:grahamb, createTime:1248466925, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), 
FieldSchema(name:restresource, type:string, comment:null), 
FieldSchema(name:statuscode, type:int, comment:null), 
FieldSchema(name:requestdate, type:string, comment:null), 
FieldSchema(name:requesthour, type:int, comment:null), 
FieldSchema(name:numrequests, type:string, comment:null), 
FieldSchema(name:responsetime, type:string, comment:null), 
FieldSchema(name:numslowrequests, type:string, comment:null)], 
location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{field.delim= , serialization.format= }), bucketCols:[], 
sortCols:[], parameters:{}), partitionKeys:[], 
parameters:{last_modified_time=1248826696, last_modified_by=app})

Time taken: 0.235 seconds



On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <[email protected]> wrote:
Can you send the output of these 2 commands?

describe extended ApiUsage;
describe extended ApiUsageTemp;


Zheng

On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <[email protected]> wrote:
> Thanks for the tip, but it fails in the same way when I use a string.
>
> On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <[email protected]> wrote:
>>
>> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>>
>> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> > '2009/05/18'
>>
>> The table has an int partition column (dt), but you're trying to set a
>> string value (dt = "20090518").
>>
>> Try :
>>
>> create table partTable (a string, b int) partitioned by (dt string);
>>
>> and then do your insert.
>>
>
>



--
Yours,
Zheng




