I sent my previous reply before seeing your last email. Thanks, that seems possible. I did initially create ApiUsageTemp using the most recent Hive release; then, while working on a JIRA, I updated my Hive client and server to more recent builds from trunk.

If that could cause such a problem, though, it's troubling, since it implies that we can't upgrade Hive without possibly corrupting our metadata store. I'll try again from scratch and see if it works, thanks.
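Concretely, when I retry from scratch I plan to do roughly the following, per your earlier suggestion. This is just a sketch of the plan, not something I've verified yet; the warehouse path comes from the describe output quoted further down in this thread, and -rmr is the recursive remove in our Hadoop build:

    hive> DROP TABLE ApiUsage;                       -- drops the table metadata (and, for a managed table, its data)
    $ hadoop fs -rmr /user/hive/warehouse/apiusage   # double-check no directory was left behind from the int or string partition attempts

Then I'll recreate ApiUsage with the same columns, PARTITIONED BY (dt STRING), and rerun the INSERT.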
On Thu, Jul 30, 2009 at 1:04 PM, Bill Graham <billgra...@gmail.com> wrote:
> Prasad,
>
> My setup is Hive client -> Hive Server (with local metastore) -> Hadoop. I
> was also suspecting metastore issues, so I've tried multiple times with
> newly created destination tables, and I see the same thing happening.
>
> All of the log info I've been able to find is already included in this
> thread. Let me know if there's anywhere else I could look for clues.
>
> I've included from the client:
> - /tmp/$USER/hive.log
>
> And from the Hive server:
> - stdout/stderr logs
> - /tmp/$USER/hive_job_log*.txt
>
> Is there anything else I should be looking at? None of the M/R logs
> show any exceptions or anything else suspect.
>
> Thanks for your time and insights on this issue, I appreciate it.
>
> thanks,
> Bill
>
>
> On Thu, Jul 30, 2009 at 11:57 AM, Prasad Chakka <pcha...@facebook.com> wrote:
>
>> Bill,
>>
>> The real error is happening on the Hive Metastore Server or Hive Server
>> (depending on the setup you are using), so the error logs there should
>> have a different stack trace. From the information below, I am guessing
>> that the destination table's HDFS directories got created with some
>> problems. Can you drop that table (making sure that no corresponding HDFS
>> directory remains for either the integer or the string type partition you
>> created) and retry the query?
>>
>> If you don't want to drop the destination table, then send me the logs
>> from the Hive Server.
>>
>> Prasad
>>
>>
>> ------------------------------
>> *From: *Bill Graham <billgra...@gmail.com>
>> *Reply-To: *<billgra...@gmail.com>
>> *Date: *Thu, 30 Jul 2009 11:47:41 -0700
>> *To: *Prasad Chakka <pcha...@facebook.com>
>> *Cc: *<hive-user@hadoop.apache.org>
>> *Subject: *Re: partitions not being created
>>
>> That file contains a similar error to the Hive Server logs:
>>
>> 2009-07-30 11:44:21,095 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2009-07-30 11:44:48,070 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588)) - org.apache.thrift.TApplicationException: get_partition failed: unknown result
>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>>         at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>>         at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>> 2009-07-30 11:45:27,797 ERROR exec.MoveTask (SessionState.java:printError(279)) - Failed with exception org.apache.thrift.TApplicationException: get_partition failed: unknown result
>> org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: get_partition failed: unknown result
>>         at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
>>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>>         at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>> Caused by: org.apache.thrift.TApplicationException: get_partition failed: unknown result
>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>>         at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>>         ... 16 more
>>
>> 2009-07-30 11:45:27,798 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>>
>> On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pcha...@facebook.com> wrote:
>>
>> The hive logs go into /tmp/$USER/hive.log, not hive_job_log*.txt.
>>
>>
>> ------------------------------
>> *From: *Bill Graham <billgra...@gmail.com>
>> *Reply-To: *<billgra...@gmail.com>
>> *Date: *Thu, 30 Jul 2009 10:52:06 -0700
>> *To: *Prasad Chakka <pcha...@facebook.com>
>> *Cc: *<hive-user@hadoop.apache.org>, Zheng Shao <zsh...@gmail.com>
>>
>> *Subject: *Re: partitions not being created
>>
>> I'm trying to set a string to a string and I'm seeing this error. I also
>> had an attempt where it was a string to an int, and I saw the same error
>> there too.
>>
>> The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but
>> I've included its output below. Only the Hive server logs show the
>> exceptions listed above. (Note that the table I'm loading from in this log
>> output is ApiUsageSmall, which is identical to ApiUsageTemp. For some
>> reason the data from ApiUsageTemp is now gone.)
>>
>> QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
>> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
>> TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%, reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
>> TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30 10:42:43,031 map = 40%, reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:302,Map-Reduce Framework.Map input bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975763033"
>> TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%, reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764071"
>> TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%, reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764199"
>> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
>> TaskEnd TASK_RET_CODE="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975782277"
>> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
>> TaskEnd TASK_RET_CODE="1" TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782473"
>> QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"
>>
>>
>>
>> On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pcha...@facebook.com> wrote:
>>
>> Are you sure you are getting the same error even with the schema below
>> (i.e. trying to set a string to an int column)? Can you give the full
>> stack trace that you might see in /tmp/$USER/hive.log?
>>
>>
>> ------------------------------
>> *From: *Bill Graham <billgra...@gmail.com>
>> *Reply-To: *<hive-user@hadoop.apache.org>, <billgra...@gmail.com>
>>
>> *Date: *Thu, 30 Jul 2009 10:02:54 -0700
>> *To: *Zheng Shao <zsh...@gmail.com>
>> *Cc: *<hive-user@hadoop.apache.org>
>>
>> *Subject: *Re: partitions not being created
>>
>> Based on these describe statements, is what I'm trying to do feasible? I'm
>> basically trying to load the contents of ApiUsageTemp into ApiUsage, with
>> the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
>>
>>
>> On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgra...@gmail.com> wrote:
>>
>> Sure. The only difference I see is that ApiUsage has a dt partition
>> instead of the requestdate column:
>>
>> hive> describe extended ApiUsage;
>> OK
>> user             string
>> restresource     string
>> statuscode       int
>> requesthour      int
>> numrequests      string
>> responsetime     string
>> numslowrequests  string
>> dt               string
>>
>> Detailed Table Information Table(tableName:apiusage, dbName:default, owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{})
>>
>> Time taken: 0.277 seconds
>> hive> describe extended ApiUsageTemp;
>> OK
>> user             string
>> restresource     string
>> statuscode       int
>> requestdate      string
>> requesthour      int
>> numrequests      string
>> responsetime     string
>> numslowrequests  string
>>
>> Detailed Table Information Table(tableName:apiusagetemp, dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requestdate, type:string, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{last_modified_time=1248826696, last_modified_by=app})
>>
>> Time taken: 0.235 seconds
>>
>>
>>
>> On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zsh...@gmail.com> wrote:
>>
>> Can you send the output of these 2 commands?
>>
>> describe extended ApiUsage;
>> describe extended ApiUsageTemp;
>>
>>
>> Zheng
>>
>> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgra...@gmail.com> wrote:
>> > Thanks for the tip, but it fails in the same way when I use a string.
>> >
>> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dler...@videoegg.com> wrote:
>> >>
>> >> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>> >>
>> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> >> > '2009/05/18'
>> >>
>> >> The table has an int partition column (dt), but you're trying to set a
>> >> string value (dt = "20090518").
>> >>
>> >> Try:
>> >>
>> >> create table partTable (a string, b int) partitioned by (dt string);
>> >>
>> >> and then do your insert.
>> >
>> >
>>
>>
>> --
>> Yours,
>> Zheng
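PS: to recap, the end-to-end sequence I've been attempting looks roughly like this. The DDL is reconstructed from the describe output above, so the column and partition types match it, though my original CREATE statement may have differed in details such as the row format:

    -- destination table: same columns as ApiUsageTemp minus requestDate,
    -- which is carried by the partition column instead
    CREATE TABLE ApiUsage (
      user STRING,
      restResource STRING,
      statusCode INT,
      requestHour INT,
      numRequests STRING,
      responseTime STRING,
      numSlowRequests STRING)
    PARTITIONED BY (dt STRING);   -- partition column type must match the value in PARTITION (dt = ...)

    INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090518')
    SELECT `(requestDate)?+.+` FROM ApiUsageTemp
    WHERE requestDate = '2009/05/18';

The backquoted `(requestDate)?+.+` is Hive's regex column syntax; it selects every column except requestDate, whose value is supplied through the static partition key dt instead.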