[jira] Commented: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions
[ https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861508#action_12861508 ] Matt Pestritto commented on HIVE-1115: -- Any ETA on resolving this issue? There hasn't been any activity in a while, and it would be a significant performance increase in our environment. Thanks > optimize combinehiveinputformat in presence of many partitions > -- > > Key: HIVE-1115 > URL: https://issues.apache.org/jira/browse/HIVE-1115 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Namit Jain >Assignee: Paul Yang > > A query like : > select .. from T where ... > where T contains a very large number of partitions does not work very well > with CombineHiveInputFormat. > A pool is created per directory, which leads to a high number of mappers. > In case all partitions share the same operator tree, and the same partition > description, only a single pool should be created. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
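To illustrate the pooling behavior described in the quoted report, here is a minimal, hypothetical sketch against Hadoop's old-API CombineFileInputFormat: it registers one pool whose filter accepts every partition directory of the table, instead of one pool per directory. The subclass, filter, and method names are illustrative assumptions, not the actual HIVE-1115 change.

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;

// Sketch only: a single pool covering all partition directories lets
// CombineFileInputFormat pack files from many partitions into one split,
// which is what keeps the mapper count low when every partition shares the
// same operator tree and partition description.
public class SinglePoolCombineFormat extends CombineFileInputFormat<Object, Object> {

  public void addPoolForTable(JobConf job, final List<Path> partitionDirs) {
    // One filter for the whole table instead of one pool per directory.
    createPool(job, new PathFilter() {
      public boolean accept(Path path) {
        for (Path dir : partitionDirs) {
          if (path.toString().startsWith(dir.toString())) {
            return true;
          }
        }
        return false;
      }
    });
  }

  @Override
  public RecordReader<Object, Object> getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }
}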
[jira] Commented: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n
[ https://issues.apache.org/jira/browse/HIVE-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797150#action_12797150 ] Matt Pestritto commented on HIVE-820: - All - Do we have a decision on what you want the output to show ? A few different ideas were being thrown around. I would rather replace only characters that would break the output ( tab, \n ) with something meaningful vs, as Edward stated, always showing the octal representation which would require an ascii table to figure out what the delimiter is. If something is | ( pipe ) delimited, I always need to look it up when that is a printable character. I'll wait for feedback from the FB team and make the changes. Thanks. > Describe Extended Line Breaks When Delimiter is \n > -- > > Key: HIVE-820 > URL: https://issues.apache.org/jira/browse/HIVE-820 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.5.0 >Reporter: Matt Pestritto >Assignee: Matt Pestritto >Priority: Minor > Fix For: 0.5.0 > > Attachments: hive_820.patch > > > Tables defined delimited with \t and breaks using \n has output of describe > extended that is not contiguous. > Line.delim outputs an actual \n which breaks the display output so using the > hiveservice you have to do another FetchOne to get the rest of the line. > For example. > Original Output: > Detailed Table InformationTable(tableName:cobra_merchandise, > dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, > retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, > type:string, comment:null), FieldSchema(name:client_merch_type_tid, > type:string, comment:null), FieldSchema(name:description, type:string, > comment:null), FieldSchema(name:client_description, type:string, > comment:null), FieldSchema(name:price, type:string, comment:null), > FieldSchema(name:cost, type:string, comment:null), > FieldSchema(name:start_date, type:string, comment:null), > FieldSchema(name:end_date, type:string, comment:null)], > location:hdfs://mustique:9000/user/hive/warehouse/m, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=9,line.delim= > ,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), > partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], > parameters:{}) > Proposed Output: > Detailed Table InformationTable(tableName:cobra_merchandise, > dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, > retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, > type:string, comment:null), FieldSchema(name:client_merch_type_tid, > type:string, comment:null), FieldSchema(name:description, type:string, > comment:null), FieldSchema(name:client_description, type:string, > comment:null), FieldSchema(name:price, type:string, comment:null), > FieldSchema(name:cost, type:string, comment:null), > FieldSchema(name:start_date, type:string, comment:null), > FieldSchema(name:end_date, type:string, comment:null)], > location:hdfs://mustique:9000/user/hive/warehouse/m, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=9,line.delim=,field.delim=}), > bucketCols:[], sortCols:[], parameters:{}), > partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], > parameters:{}) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
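Following up on the comment above, here is a small, hypothetical helper showing what the "replace only the characters that break the output with something meaningful" option could look like. It is not the attached hive_820.patch, just a sketch of the idea.

// Sketch only: show tab and newline delimiters as visible escape sequences
// instead of octal codes. Printable delimiters such as '|' are left alone,
// since only characters that break describe-extended output need rewriting.
public final class DelimiterDisplay {

  private DelimiterDisplay() {
  }

  public static String escapeForDisplay(String tableInfo) {
    if (tableInfo == null) {
      return null;
    }
    return tableInfo.replace("\n", "\\n").replace("\t", "\\t");
  }
}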
[jira] Updated: (HIVE-983) Function from_unixtime only takes Int. Override to support Long
[ https://issues.apache.org/jira/browse/HIVE-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Pestritto updated HIVE-983: Priority: Minor (was: Major) > Function from_unixtime only takes Int. Override to support Long > > > Key: HIVE-983 > URL: https://issues.apache.org/jira/browse/HIVE-983 > Project: Hadoop Hive > Issue Type: Improvement > Reporter: Matt Pestritto >Priority: Minor > > UDFFromUnixTime.java only supports int. We have dates that are future dated > so they fail when it tries to parse. Can there be additional support for > LongWritable input parameter ? > We also have dates stored with milliseconds which blows up the integer > limitation. Long support will be helpful. > FAILED: Error in semantic analysis: line 1:7 Function Argument Type Mismatch > from_unixtime: Looking for UDF "from_unixtime" with parameters [class > org.apache.hadoop.io.LongWritable > ] > 09/12/14 11:42:10 ERROR ql.Driver: FAILED: Error in semantic analysis: line > 1:7 Function Argument Type Mismatch from_unixtime: Looking for UDF > "from_unixtime" with parameters [clas > s org.apache.hadoop.io.LongWritable] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-983) Function from_unixtime only takes Int. Override to support Long
Function from_unixtime only takes Int. Override to support Long Key: HIVE-983 URL: https://issues.apache.org/jira/browse/HIVE-983 Project: Hadoop Hive Issue Type: Improvement Reporter: Matt Pestritto UDFFromUnixTime.java only supports int. We have dates that are future-dated, so they fail when the UDF tries to parse them. Can there be additional support for a LongWritable input parameter? We also have dates stored in milliseconds, which overflows the integer limit. Long support will be helpful. FAILED: Error in semantic analysis: line 1:7 Function Argument Type Mismatch from_unixtime: Looking for UDF "from_unixtime" with parameters [class org.apache.hadoop.io.LongWritable] 09/12/14 11:42:10 ERROR ql.Driver: FAILED: Error in semantic analysis: line 1:7 Function Argument Type Mismatch from_unixtime: Looking for UDF "from_unixtime" with parameters [class org.apache.hadoop.io.LongWritable] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
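A rough sketch of how a Long-capable variant could look, for readers following the request above. The class below follows the plain Hive UDF pattern, but the class name and the assumption that the input value is in seconds are illustrative only, not taken from the Hive source.

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Sketch only: an evaluate() overload that accepts LongWritable, so
// future-dated or very large timestamp values no longer overflow int.
public class UDFFromUnixTimeLong extends UDF {
  private final SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
  private final Text result = new Text();

  public Text evaluate(LongWritable unixtime) {
    if (unixtime == null) {
      return null;
    }
    // Assumes the value is in seconds, like from_unixtime; multiply by 1000
    // for java.util.Date. Millisecond inputs would skip the multiplication.
    result.set(formatter.format(new Date(unixtime.get() * 1000L)));
    return result;
  }
}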
Re: Hive-74
Namit - I have tried hive-trunk as of this afternoon and hive release 814942 ( revision with CombineHiveInputFormat commit ) . Also - there are no logs that get generated on the tasktrackers for the hadoop job that fails. The only log that is generated on the jobtracker is the jobconf. Thanks -Matt On Thu, Oct 8, 2009 at 1:26 AM, Namit Jain wrote: > Hi Matt, > > Sorry for the late reply. > > hive> set > hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > > I tried it running on hadoop 20 and it ran fine for me. > > Which hive release are you using ? > > Also, you got a runtime error – can you see the stderr logs on the tracker > ? > > Thanks, > -namit > > > > On 10/1/09 5:01 PM, "Matt Pestritto" wrote: > > Namit - > Any idea on how to resolve ? > Thanks > > On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto > wrote: > > > There were errors in the hive.log > > > > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin > > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > > "org.eclipse.core.resources" but it cannot be resolved. > > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin > > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > > "org.eclipse.core.resources" but it cannot be resolved. > > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin > > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > > "org.eclipse.core.runtime" but it cannot be resolved. > > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin > > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > > "org.eclipse.core.runtime" but it cannot be resolved. > > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin > > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > > "org.eclipse.text" but it cannot be resolved. > > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin > > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > > "org.eclipse.text" but it cannot be resolved. > > 2009-10-01 10:40:57,143 WARN mapred.JobClient > > (JobClient.java:configureCommandLineOptions(539)) - Use > GenericOptionsParser > > for parsing the arguments. Applications should implement Tool for the > same. > > 2009-10-01 10:40:58,609 ERROR exec.ExecDriver > > (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 > with > > errors > > 2009-10-01 10:40:58,622 ERROR ql.Driver > (SessionState.java:printError(248)) > > - FAILED: Execution Error, return code 2 from > > org.apache.hadoop.hive.ql.exec.ExecDriver > > > > > > > > On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain wrote: > > > >> What you are doing seems OK ? > >> Can you get the stack trace from /tmp//hive.log ? > >> > >> > >> > >> > >> > >> -Original Message- > >> From: Matt Pestritto [mailto:m...@pestritto.com] > >> Sent: Wednesday, September 30, 2009 6:51 AM > >> To: hive-dev@hadoop.apache.org; hive-u...@hadoop.apache.org > >> Subject: Fwd: Hive-74 > >> > >> Including hive-user in case someone has any experience with this.. > >> Thanks > >> -Matt > >> > >> -- Forwarded message -- > >> From: Matt Pestritto > >> Date: Tue, Sep 29, 2009 at 5:26 PM > >> Subject: Hive-74 > >> To: hive-dev@hadoop.apache.org > >> > >> > >> Hi- > >> > >> I'm having a problem using CombineHiveInputSplit. I believe this was > >> patched in http://issues.apache.org/jira/browse/HIVE-74 > >> > >> I'm currently running hadoop 20.1 using hive trunk. 
> >> > >> hive-default.xml has the following property: > >> > >> hive.input.format > >> > >> The default input format, if it is not specified, the > system > >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and > >> 19, > >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can > >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it > can > >> always be manually set to HiveInputFormat. > >> > >> > >> I added the following to hive-site.xml: ( Notice, the description in > >> hive-default.xml has CombinedHiveInputFormat which does not work for me > - > >> the property value seems to be Combine(-d) ) >
Re: Hive-74
Namit - Any idea on how to resolve ? Thanks On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto wrote: > There were errors in the hive.log > > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > "org.eclipse.core.resources" but it cannot be resolved. > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > "org.eclipse.core.resources" but it cannot be resolved. > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > "org.eclipse.core.runtime" but it cannot be resolved. > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > "org.eclipse.core.runtime" but it cannot be resolved. > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > "org.eclipse.text" but it cannot be resolved. > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires > "org.eclipse.text" but it cannot be resolved. > 2009-10-01 10:40:57,143 WARN mapred.JobClient > (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser > for parsing the arguments. Applications should implement Tool for the same. > 2009-10-01 10:40:58,609 ERROR exec.ExecDriver > (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with > errors > 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248)) > - FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.ExecDriver > > > > On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain wrote: > >> What you are doing seems OK ? >> Can you get the stack trace from /tmp//hive.log ? >> >> >> >> >> >> -Original Message- >> From: Matt Pestritto [mailto:m...@pestritto.com] >> Sent: Wednesday, September 30, 2009 6:51 AM >> To: hive-dev@hadoop.apache.org; hive-u...@hadoop.apache.org >> Subject: Fwd: Hive-74 >> >> Including hive-user in case someone has any experience with this.. >> Thanks >> -Matt >> >> -- Forwarded message -- >> From: Matt Pestritto >> Date: Tue, Sep 29, 2009 at 5:26 PM >> Subject: Hive-74 >> To: hive-dev@hadoop.apache.org >> >> >> Hi- >> >> I'm having a problem using CombineHiveInputSplit. I believe this was >> patched in http://issues.apache.org/jira/browse/HIVE-74 >> >> I'm currently running hadoop 20.1 using hive trunk. >> >> hive-default.xml has the following property: >> >> hive.input.format >> >> The default input format, if it is not specified, the system >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and >> 19, >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can >> always be manually set to HiveInputFormat. >> >> >> I added the following to hive-site.xml: ( Notice, the description in >> hive-default.xml has CombinedHiveInputFormat which does not work for me - >> the property value seems to be Combine(-d) ) >> >> hive.input.format >> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat >> The default input format, if it is not specified, the system >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and >> 19, >> whereas it is set to CombinedHiveInputFormat for hadoop 20. 
The user can >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can >> always be manually set to HiveInputFormat. >> >> >> When I launch a job the cli exits immediately: >> hive> select count(1) from my_table; >> Total MapReduce jobs = 1 >> Number of reduce tasks determined at compile time: 1 >> In order to change the average load for a reducer (in bytes): >> set hive.exec.reducers.bytes.per.reducer= >> In order to limit the maximum number of reducers: >> set hive.exec.reducers.max= >> In order to set a constant number of reducers: >> set mapred.reduce.tasks= >> FAILED: Execution Error, return code 2 from >> org.apache.hadoop.hive.ql.exec.ExecDriver >> hive> exit ; >> >> If I set the property value to >> org.apache.hadoop.hive.ql.io.HiveInputFormat, >> the job runs fine. >> >> Suggestions ? Is there something that I am missing ? >> >> Thanks >> -Matt >> > >
Re: Hive-74
There were errors in the hive.log 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved. 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved. 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved. 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved. 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved. 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved. 2009-10-01 10:40:57,143 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 2009-10-01 10:40:58,609 ERROR exec.ExecDriver (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with errors 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain wrote: > What you are doing seems OK ? > Can you get the stack trace from /tmp//hive.log ? > > > > > > -Original Message- > From: Matt Pestritto [mailto:m...@pestritto.com] > Sent: Wednesday, September 30, 2009 6:51 AM > To: hive-dev@hadoop.apache.org; hive-u...@hadoop.apache.org > Subject: Fwd: Hive-74 > > Including hive-user in case someone has any experience with this.. > Thanks > -Matt > > -- Forwarded message -- > From: Matt Pestritto > Date: Tue, Sep 29, 2009 at 5:26 PM > Subject: Hive-74 > To: hive-dev@hadoop.apache.org > > > Hi- > > I'm having a problem using CombineHiveInputSplit. I believe this was > patched in http://issues.apache.org/jira/browse/HIVE-74 > > I'm currently running hadoop 20.1 using hive trunk. > > hive-default.xml has the following property: > > hive.input.format > > The default input format, if it is not specified, the system > assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, > whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can > always overwrite it - if there is a bug in CombinedHiveInputFormat, it can > always be manually set to HiveInputFormat. > > > I added the following to hive-site.xml: ( Notice, the description in > hive-default.xml has CombinedHiveInputFormat which does not work for me - > the property value seems to be Combine(-d) ) > > hive.input.format > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat > The default input format, if it is not specified, the system > assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, > whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can > always overwrite it - if there is a bug in CombinedHiveInputFormat, it can > always be manually set to HiveInputFormat. 
> > > When I launch a job the cli exits immediately: > hive> select count(1) from my_table; > Total MapReduce jobs = 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.ExecDriver > hive> exit ; > > If I set the property value to > org.apache.hadoop.hive.ql.io.HiveInputFormat, > the job runs fine. > > Suggestions ? Is there something that I am missing ? > > Thanks > -Matt >
Fwd: Hive-74
Including hive-user in case someone has any experience with this.. Thanks -Matt -- Forwarded message -- From: Matt Pestritto Date: Tue, Sep 29, 2009 at 5:26 PM Subject: Hive-74 To: hive-dev@hadoop.apache.org Hi- I'm having a problem using CombineHiveInputSplit. I believe this was patched in http://issues.apache.org/jira/browse/HIVE-74 I'm currently running hadoop 20.1 using hive trunk. hive-default.xml has the following property: hive.input.format The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. I added the following to hive-site.xml: ( Notice, the description in hive-default.xml has CombinedHiveInputFormat which does not work for me - the property value seems to be Combine(-d) ) hive.input.format org.apache.hadoop.hive.ql.io.CombineHiveInputFormat The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. When I launch a job the cli exits immediately: hive> select count(1) from my_table; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver hive> exit ; If I set the property value to org.apache.hadoop.hive.ql.io.HiveInputFormat, the job runs fine. Suggestions ? Is there something that I am missing ? Thanks -Matt
Hive-74
Hi- I'm having a problem using CombineHiveInputSplit. I believe this was patched in http://issues.apache.org/jira/browse/HIVE-74 I'm currently running hadoop 20.1 using hive trunk. hive-default.xml has the following property: hive.input.format The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. I added the following to hive-site.xml: ( Notice, the description in hive-default.xml has CombinedHiveInputFormat which does not work for me - the property value seems to be Combine(-d) ) hive.input.format org.apache.hadoop.hive.ql.io.CombineHiveInputFormat The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombinedHiveInputFormat, it can always be manually set to HiveInputFormat. When I launch a job the cli exits immediately: hive> select count(1) from my_table; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver hive> exit ; If I set the property value to org.apache.hadoop.hive.ql.io.HiveInputFormat, the job runs fine. Suggestions ? Is there something that I am missing ? Thanks -Matt
[jira] Updated: (HIVE-851) Thrift Client: BaseException.message deprecation warning in Python 2.6
[ https://issues.apache.org/jira/browse/HIVE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Pestritto updated HIVE-851: Attachment: hive_851.patch Note: this patches two files: service/lib/py/thrift/Thrift.py and service/src/gen-py/hive/ttypes.py It looks like ttypes.py is generated automatically. I'm not sure where that comes from, so you may not want to patch that file. Thanks > Thrift Client: BaseException.message deprecation warning in Python 2.6 > -- > > Key: HIVE-851 > URL: https://issues.apache.org/jira/browse/HIVE-851 > Project: Hadoop Hive > Issue Type: Bug > Components: Clients >Reporter: Matt Pestritto >Priority: Minor > Attachments: hive_851.patch > > > In Python 2.6 BaseException.message has been deprecated. This is a patch to > remove these warnings. > src/thrift/Thrift.py:62: DeprecationWarning: BaseException.message has been > deprecated as of Python 2.6 > self.message = message > Also note: I could only replicate this error in two classes, even though there were > other classes that inherit from Exception. > Patch is only attached for those two classes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-851) Thrift Client: BaseException.message deprecation warning in Python 2.6
Thrift Client: BaseException.message deprecation warning in Python 2.6 -- Key: HIVE-851 URL: https://issues.apache.org/jira/browse/HIVE-851 Project: Hadoop Hive Issue Type: Bug Components: Clients Reporter: Matt Pestritto Priority: Minor In Python 2.6 BaseException.message has been deprecated. This is a patch to remove these warnings. src/thrift/Thrift.py:62: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 self.message = message Also note: I could only replicate this error in two classes, even though there were other classes that inherit from Exception. Patch is only attached for those two classes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: vote for release candidate for hive
Please disregard. I found the cause of my error. Thanks. On Thu, Sep 17, 2009 at 3:09 PM, Matt Pestritto wrote: > I recently switched to the 0.4 branch to do some testing and I'm running > into a problem. > > When I run a query from the cli - the first one works, but the second query > always fails with a NullPointerException. > > Did anyone else run into this ? > > Thanks > -Matt > > hive> select count(1) from table1; > Total MapReduce jobs = 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > Starting Job = job_200909171501_0001, Tracking URL = > http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001 > Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job > -Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001 > 2009-09-17 03:05:54,855 map = 0%, reduce =0% > 2009-09-17 03:06:02,895 map = 22%, reduce =0% > 2009-09-17 03:06:06,933 map = 44%, reduce =0% > 2009-09-17 03:06:11,965 map = 67%, reduce =0% > 2009-09-17 03:06:15,988 map = 89%, reduce =0% > 2009-09-17 03:06:20,009 map = 100%, reduce =0% > 2009-09-17 03:06:25,036 map = 100%, reduce =11% > 2009-09-17 03:06:30,054 map = 100%, reduce =15% > 2009-09-17 03:06:31,063 map = 100%, reduce =22% > 2009-09-17 03:06:34,075 map = 100%, reduce =26% > 2009-09-17 03:06:36,101 map = 100%, reduce =100% > Ended Job = job_200909171501_0001 > OK > 274087 > Time taken: 45.401 seconds > hive> select count(1) from table1; > Total MapReduce jobs = 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154) > at > org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) > at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) > at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) > Job Submission failed with exception > 'java.lang.RuntimeException(java.lang.NullPointerException)' > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.ExecDriver > hive> > > > On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain wrote: > >> https://issues.apache.org/jira/browse/HIVE-838 >> >> is a blocker for 0.4 - >> Once this is merged, I will have another release candidate 
>> >> >> -Original Message- >> From: Johan Oskarsson [mailto:jo...@oskarsson.nu] >> Sent: Wednesday, September 16, 2009 8:29 AM >> To: hive-dev@hadoop.apache.org >> Subject: Re: vote for release candidate for hive >> >> +1 based on running unit tests. >> >> /Johan >> >> Namit Jain wrote: >> > Sorry, was meant for hive-dev@ >> > >> > From: Namit Jain [mailto:nj...@facebook.com] >> > Sent: Tuesday, September 15, 2009 1:30 PM >> > To: hive-u...@hadoop.apache.org >> > Subject: vote for release candidate for hive >> > >> > >> > I have created another release candidate for Hive. >> > >> > >> > >> > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/ >> > >> > >> > >> > >> > >> > Let me know if it is OK to publish this release candidate. >> > >> > >> > >> > The only change from the previous candidate ( >> https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is >> the fix for >> > >> > https://issues.apache.org/jira/browse/HIVE-718 >> > >> > >> > >> > >> > >> > >> > >> > Thanks, >> > >> > -namit >> > >> > >> > >> > >> >> >
Re: vote for release candidate for hive
I recently switched to the 0.4 branch to do some testing and I'm running into a problem. When I run a query from the cli - the first one works, but the second query always fails with a NullPointerException. Did anyone else run into this ? Thanks -Matt hive> select count(1) from table1; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= Starting Job = job_200909171501_0001, Tracking URL = http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001 Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001 2009-09-17 03:05:54,855 map = 0%, reduce =0% 2009-09-17 03:06:02,895 map = 22%, reduce =0% 2009-09-17 03:06:06,933 map = 44%, reduce =0% 2009-09-17 03:06:11,965 map = 67%, reduce =0% 2009-09-17 03:06:15,988 map = 89%, reduce =0% 2009-09-17 03:06:20,009 map = 100%, reduce =0% 2009-09-17 03:06:25,036 map = 100%, reduce =11% 2009-09-17 03:06:30,054 map = 100%, reduce =15% 2009-09-17 03:06:31,063 map = 100%, reduce =22% 2009-09-17 03:06:34,075 map = 100%, reduce =26% 2009-09-17 03:06:36,101 map = 100%, reduce =100% Ended Job = job_200909171501_0001 OK 274087 Time taken: 45.401 seconds hive> select count(1) from table1; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:155) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) Job Submission failed with exception 'java.lang.RuntimeException(java.lang.NullPointerException)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver hive> On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain wrote: > https://issues.apache.org/jira/browse/HIVE-838 > > is a blocker for 0.4 - > Once this is merged, I will have another release candidate > > > -Original Message- > From: Johan Oskarsson [mailto:jo...@oskarsson.nu] > Sent: Wednesday, September 16, 2009 8:29 AM > To: hive-dev@hadoop.apache.org > Subject: Re: vote for release candidate for hive > > +1 based on running unit tests. 
> > /Johan > > Namit Jain wrote: > > Sorry, was meant for hive-dev@ > > > > From: Namit Jain [mailto:nj...@facebook.com] > > Sent: Tuesday, September 15, 2009 1:30 PM > > To: hive-u...@hadoop.apache.org > > Subject: vote for release candidate for hive > > > > > > I have created another release candidate for Hive. > > > > > > > > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/ > > > > > > > > > > > > Let me know if it is OK to publish this release candidate. > > > > > > > > The only change from the previous candidate ( > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is > the fix for > > > > https://issues.apache.org/jira/browse/HIVE-718 > > > > > > > > > > > > > > > > Thanks, > > > > -namit > > > > > > > > > >
[jira] Commented: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n
[ https://issues.apache.org/jira/browse/HIVE-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753629#action_12753629 ] Matt Pestritto commented on HIVE-820: - Edward - I made this suggested change and it did not work. For the LF, the output still breaks and two fetches have to be done to get the extended plan. The 054 did not display anything. I also tried escaping the backslash and just a 054 and 012 were printed. Would you prefer that notation ? 054 and 012 with no \ Thanks -Matt > Describe Extended Line Breaks When Delimiter is \n > -- > > Key: HIVE-820 > URL: https://issues.apache.org/jira/browse/HIVE-820 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.5.0 > Reporter: Matt Pestritto >Assignee: Matt Pestritto >Priority: Minor > Fix For: 0.5.0 > > Attachments: hive_820.patch > > > Tables defined delimited with \t and breaks using \n has output of describe > extended that is not contiguous. > Line.delim outputs an actual \n which breaks the display output so using the > hiveservice you have to do another FetchOne to get the rest of the line. > For example. > Original Output: > Detailed Table InformationTable(tableName:cobra_merchandise, > dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, > retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, > type:string, comment:null), FieldSchema(name:client_merch_type_tid, > type:string, comment:null), FieldSchema(name:description, type:string, > comment:null), FieldSchema(name:client_description, type:string, > comment:null), FieldSchema(name:price, type:string, comment:null), > FieldSchema(name:cost, type:string, comment:null), > FieldSchema(name:start_date, type:string, comment:null), > FieldSchema(name:end_date, type:string, comment:null)], > location:hdfs://mustique:9000/user/hive/warehouse/m, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=9,line.delim= > ,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), > partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], > parameters:{}) > Proposed Output: > Detailed Table InformationTable(tableName:cobra_merchandise, > dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, > retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, > type:string, comment:null), FieldSchema(name:client_merch_type_tid, > type:string, comment:null), FieldSchema(name:description, type:string, > comment:null), FieldSchema(name:client_description, type:string, > comment:null), FieldSchema(name:price, type:string, comment:null), > FieldSchema(name:cost, type:string, comment:null), > FieldSchema(name:start_date, type:string, comment:null), > FieldSchema(name:end_date, type:string, comment:null)], > location:hdfs://mustique:9000/user/hive/warehouse/m, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=9,line.delim=,field.delim=}), > bucketCols:[], sortCols:[], parameters:{}), > 
partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], > parameters:{}) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simulteneously
[ https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Pestritto updated HIVE-80: --- Attachment: (was: hive_820.patch) > Allow Hive Server to run multiple queries simulteneously > > > Key: HIVE-80 > URL: https://issues.apache.org/jira/browse/HIVE-80 > Project: Hadoop Hive > Issue Type: Improvement > Components: Server Infrastructure >Reporter: Raghotham Murthy >Assignee: Neil Conway >Priority: Critical > Fix For: 0.4.0 > > Attachments: hive_input_format_race-2.patch, > org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal.patch > > > Can use one driver object per query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n
[ https://issues.apache.org/jira/browse/HIVE-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Pestritto updated HIVE-820: Attachment: hive_820.patch Patch Attached. > Describe Extended Line Breaks When Delimiter is \n > -- > > Key: HIVE-820 > URL: https://issues.apache.org/jira/browse/HIVE-820 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor > Reporter: Matt Pestritto >Priority: Minor > Attachments: hive_820.patch > > > Tables defined delimited with \t and breaks using \n has output of describe > extended that is not contiguous. > Line.delim outputs an actual \n which breaks the display output so using the > hiveservice you have to do another FetchOne to get the rest of the line. > For example. > Original Output: > Detailed Table InformationTable(tableName:cobra_merchandise, > dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, > retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, > type:string, comment:null), FieldSchema(name:client_merch_type_tid, > type:string, comment:null), FieldSchema(name:description, type:string, > comment:null), FieldSchema(name:client_description, type:string, > comment:null), FieldSchema(name:price, type:string, comment:null), > FieldSchema(name:cost, type:string, comment:null), > FieldSchema(name:start_date, type:string, comment:null), > FieldSchema(name:end_date, type:string, comment:null)], > location:hdfs://mustique:9000/user/hive/warehouse/m, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=9,line.delim= > ,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), > partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], > parameters:{}) > Proposed Output: > Detailed Table InformationTable(tableName:cobra_merchandise, > dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, > retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, > type:string, comment:null), FieldSchema(name:client_merch_type_tid, > type:string, comment:null), FieldSchema(name:description, type:string, > comment:null), FieldSchema(name:client_description, type:string, > comment:null), FieldSchema(name:price, type:string, comment:null), > FieldSchema(name:cost, type:string, comment:null), > FieldSchema(name:start_date, type:string, comment:null), > FieldSchema(name:end_date, type:string, comment:null)], > location:hdfs://mustique:9000/user/hive/warehouse/m, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=9,line.delim=,field.delim=}), > bucketCols:[], sortCols:[], parameters:{}), > partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], > parameters:{}) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simulteneously
[ https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Pestritto updated HIVE-80: --- Attachment: hive_820.patch Patch Attached. > Allow Hive Server to run multiple queries simulteneously > > > Key: HIVE-80 > URL: https://issues.apache.org/jira/browse/HIVE-80 > Project: Hadoop Hive > Issue Type: Improvement > Components: Server Infrastructure >Reporter: Raghotham Murthy >Assignee: Neil Conway >Priority: Critical > Fix For: 0.4.0 > > Attachments: hive_input_format_race-2.patch, > org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal.patch > > > Can use one driver object per query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n
Describe Extended Line Breaks When Delimiter is \n -- Key: HIVE-820 URL: https://issues.apache.org/jira/browse/HIVE-820 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Matt Pestritto Priority: Minor Tables defined delimited with \t and breaks using \n has output of describe extended that is not contiguous. Line.delim outputs an actual \n which breaks the display output so using the hiveservice you have to do another FetchOne to get the rest of the line. For example. Original Output: Detailed Table InformationTable(tableName:cobra_merchandise, dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, type:string, comment:null), FieldSchema(name:client_merch_type_tid, type:string, comment:null), FieldSchema(name:description, type:string, comment:null), FieldSchema(name:client_description, type:string, comment:null), FieldSchema(name:price, type:string, comment:null), FieldSchema(name:cost, type:string, comment:null), FieldSchema(name:start_date, type:string, comment:null), FieldSchema(name:end_date, type:string, comment:null)], location:hdfs://mustique:9000/user/hive/warehouse/m, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=9,line.delim= ,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], parameters:{}) Proposed Output: Detailed Table InformationTable(tableName:cobra_merchandise, dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, type:string, comment:null), FieldSchema(name:client_merch_type_tid, type:string, comment:null), FieldSchema(name:description, type:string, comment:null), FieldSchema(name:client_description, type:string, comment:null), FieldSchema(name:price, type:string, comment:null), FieldSchema(name:cost, type:string, comment:null), FieldSchema(name:start_date, type:string, comment:null), FieldSchema(name:end_date, type:string, comment:null)], location:hdfs://mustique:9000/user/hive/warehouse/m, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=9,line.delim=,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], parameters:{}) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Describe Extended - Replace Tab and LF
Hi. I was wondering if you could replace the Tab and LF with a string in the describe extended output? I have tables defined with fields delimited by \t and lines terminated by \n, so the output of describe extended is not contiguous. Minor patch below. Feel free to use it if you want to. For example, note that line.delim outputs an actual \n, which breaks the display output, so when using the hive service you have to do another FetchOne to get the rest of the line.

Detailed Table InformationTable(tableName:cobra_merchandise, dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, type:string, comment:null), FieldSchema(name:client_merch_type_tid, type:string, comment:null), FieldSchema(name:description, type:string, comment:null), FieldSchema(name:client_description, type:string, comment:null), FieldSchema(name:price, type:string, comment:null), FieldSchema(name:cost, type:string, comment:null), FieldSchema(name:start_date, type:string, comment:null), FieldSchema(name:end_date, type:string, comment:null)], location:hdfs://mustique:9000/user/hive/warehouse/m, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=9,*line.delim=,field.delim=*}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], parameters:{})

Detailed Table InformationTable(tableName:cobra_merchandise, dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, type:string, comment:null), FieldSchema(name:client_merch_type_tid, type:string, comment:null), FieldSchema(name:description, type:string, comment:null), FieldSchema(name:client_description, type:string, comment:null), FieldSchema(name:price, type:string, comment:null), FieldSchema(name:cost, type:string, comment:null), FieldSchema(name:start_date, type:string, comment:null), FieldSchema(name:end_date, type:string, comment:null)], location:hdfs://mustique:9000/user/hive/warehouse/m, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=9,*line.delim= ,field.delim=*}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], parameters:{})

Patch File:
Index: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
===================================================================
--- ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (revision 812724)
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (working copy)
@@ -588,7 +588,7 @@
       // show table information
       outStream.writeBytes("Detailed Table Information");
       outStream.write(separator);
-      outStream.writeBytes(tbl.getTTable().toString());
+      outStream.writeBytes(tbl.getTTable().toString().replaceAll("\n", "").replaceAll("\t", ""));
       outStream.write(separator);
       // comment column is empty
       outStream.write(terminator);

Thanks
-Matt
Re: [jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables
Hi All. I see a lot of good work being done on HBase/Hive integration, especially around how to express hbase metadata in hive and how to load data from/to hbase/hive. Has any thought been put into how to use HBase data as lookup data in a query, rather than loading all of the data as a normal hive query would? My use case is as follows: I have a table < users > with 50m users. I have a 5gb daily clickstream file that only touches 150k of those users on a daily basis. It would be much more efficient if I didn't have to load all of the data in HBase to a hive table and write a traditional hive query, but could just do 150k lookups in the map ( or reduce ) phase of the MR job. If the hbase lookups were done in realtime it would be much faster than sourcing the original user table with 50m rows. Thoughts? Thanks -Matt On Sun, Aug 23, 2009 at 8:20 AM, Samuel Guo (JIRA) wrote: > >[ https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746592#action_12746592] > > Samuel Guo commented on HIVE-705: - > > Attach a new patch. > > 1) move the related hbase code to the contrib package, as hbase just an > optional storage for hive, not neccessary. > I have tried to avoid modifying the hive original code and just add a hbase > serde to connect hive with hbase. But the hbase storage model is quite > different with file storage model. For example, a loadwork is used to > rename/copy files from temp dir to the target table's dir if a query's > target is a hive table. But in a hbased hive table, we can't rename a table > now. So it's hard to let a hbased hive table to follow the logic of a normal > file-based hive table. So I add some code(HiveFormatUtils) to distinguish a > file-based table from a not-file-based table. > > 2) fix some bugs in the draft patch, such as "select *" return nothing. > > > -- > > How to use the hbase as hive's storage? > > 1) remember to add the contrib jar and the hbase jar in the hive's auxPath, > so m/r can populate the neccessary hbase-related jars to the whole hadoop > m/r cluster. > > > $HIVE_HOME/bin/hive -auxPath ${contrib_jar},${hbase_jar} > > 2) modify the configuration to add the following configuration parameters. > > "hbase.master" : pointer to the hbase's master. > "hive.othermetadata.handlers" : > "org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler" > > "hive.othermetadata.handlers" collects the metadata handlers to handle the > other metadata operations in the not-file-based hive tables. Take hbase as > an example. HBaseMetadataHandler will create the neccessary hbase table and > its family columns when we create a hbased hive table from hive's client. It > also drop the hbase table when we drop the hive table. > > The metastore read the registered handlers map from the configuration file > during initialization. The registered handlers map is formated as > "table_format_classname:table_metadata_handler_classname,table_format_classname:table_metadata_handler_classname,...". > > 3) enjoy "hive over hbase"! > > > > Other problems. > > 1) Altering a hased-hive table is not supported now. :( > renaming a table in hbase is not supported now, so I just do not support > rename operation. ( maybe if we rename a hive table, we do not need to > rename the base hbase table.) > > adding/replacing cloumns. > Now we need to specify the schema mapping in the SerDe properties > explicitly.
If we want to adding columns, we need to call 'alter' twice to > adding columns: change the serde properties and the hive columns. Either > change the serde properties first or change the hive columns first will fail > now, because we validate the schema mapping during SerDe initialization. One > of the hbase serde validation is to check the counts of hive columns and > hbase mapping columns. If we first change the hive columns, the number of > hive columns will be more than hbase mapping columns, the HBase Serde > initialization will fail this alter operation. (maybe we need to remove the > validation code from HBaseSerDe initialization and do it in other place?) > > 2) more flexible schema mapping? > As Schubert metioned before, more flexible schema mapping will be useful > for user. This feature will be added later. > > > welcome for comments~ > > > > > > Let Hive can analyse hbase's tables > > --- > > > > Key: HIVE-705 > > URL: https://issues.apache.org/jira/browse/HIVE-705 > > Project: Hadoop Hive > > Issue Type: New Feature > >Reporter: Samuel Guo > > Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, > HIVE-705_draft.patch, HIVE-705_revision806905.pa
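For the lookup use case Matt describes above, a map-side point lookup could look roughly like the sketch below. The table name, column family and qualifier, and tab-delimited record layout are hypothetical, and the HBase calls use the later HTable/Get client API rather than the 0.19-era interface.

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch only: enrich each clickstream record with a point lookup against an
// HBase "users" table instead of loading and joining all 50m rows.
public class ClickstreamLookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private HTable users;

  @Override
  public void configure(JobConf job) {
    try {
      users = new HTable(HBaseConfiguration.create(job), "users");
    } catch (IOException e) {
      throw new RuntimeException("could not open HBase table", e);
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    String[] fields = value.toString().split("\t");
    String userId = fields[0];

    // One Get per record: only the ~150k users that actually appear in the
    // day's clickstream are ever read from HBase.
    Result row = users.get(new Get(Bytes.toBytes(userId)));
    byte[] segment = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("segment"));

    out.collect(new Text(userId),
        new Text(segment == null ? "unknown" : Bytes.toString(segment)));
  }

  @Override
  public void close() throws IOException {
    if (users != null) {
      users.close();
    }
  }
}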
Re: Problem with Thrift Server Concurrency
Ok. Thanks for the update. I'll watch that issue. On Tue, Jul 28, 2009 at 1:57 PM, Prasad Chakka wrote: > This is a known issue in Hive Server. This is because the same metastore > client is being used to issue both queries and JDBC does not like that. We > should use thread specific or session specific metastore clients but I don't > think Hive Server is doing that right now. HIVE-584 is supposed to fix this > issue. > > ____ > From: Matt Pestritto > Reply-To: > Date: Tue, 28 Jul 2009 10:48:24 -0700 > To: > Subject: Problem with Thrift Server Concurrency > > Hi all > > Does the Thrift server support concurrency ? I'm having a problem that > only > happens if I fire off multiple ( 2+ ) DML queries at the same time. > Randomly, one of the queries will succeed but the other will fail with the > following error I pulled from the hiveserver output: > > java.io.IOException: cannot find dir = > hdfs://mustique:9000/user/hadoop/mantis-output/mantis-job/20090601 in > partToPartitionInfo! >at > > org.apache.hadoop.hive.ql.io.HiveInputFormat.getTableDescFromPath(HiveInputFormat.java:311) >at > > org.apache.hadoop.hive.ql.io.HiveInputFormat.validateInput(HiveInputFormat.java:288) >at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:735) >at > org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:388) >at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:357) >at org.apache.hadoop.hive.ql.Driver.run(Driver.java:263) >at > > org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:108) >at > > org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:302) >at > > org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:290) >at > > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252) >at > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >at java.lang.Thread.run(Thread.java:619) > > If I execute the queries via thrift a few seconds apart from each other, it > succeeds. It only seems to fail if the queries start at about the same > time. > > When I run the same two queries using *hive -e "query 1" & hive -e "query > 2" > * is also works fine. > > Any ideas ? > > Thanks > -Matt > >
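The per-thread client idea Prasad mentions above can be illustrated with a small sketch. This is just the general ThreadLocal pattern, not the actual HIVE-584 change, and the way the metastore client is constructed here is an assumption.

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;

// Sketch only: give each server thread its own metastore client so two
// concurrent queries never share one connection.
public class PerThreadMetaStore {

  private static final ThreadLocal<HiveMetaStoreClient> CLIENT =
      new ThreadLocal<HiveMetaStoreClient>() {
        @Override
        protected HiveMetaStoreClient initialValue() {
          try {
            return new HiveMetaStoreClient(new HiveConf());
          } catch (MetaException e) {
            throw new RuntimeException("could not connect to metastore", e);
          }
        }
      };

  public static HiveMetaStoreClient get() {
    return CLIENT.get();
  }
}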
Problem with Thrift Server Concurrency
Hi all Does the Thrift server support concurrency ? I'm having a problem that only happens if I fire off multiple ( 2+ ) DML queries at the same time. Randomly, one of the queries will succeed but the other will fail with the following error I pulled from the hiveserver output: java.io.IOException: cannot find dir = hdfs://mustique:9000/user/hadoop/mantis-output/mantis-job/20090601 in partToPartitionInfo! at org.apache.hadoop.hive.ql.io.HiveInputFormat.getTableDescFromPath(HiveInputFormat.java:311) at org.apache.hadoop.hive.ql.io.HiveInputFormat.validateInput(HiveInputFormat.java:288) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:735) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:388) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:357) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:263) at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:108) at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:302) at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:290) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) If I execute the queries via thrift a few seconds apart from each other, it succeeds. It only seems to fail if the queries start at about the same time. When I run the same two queries using *hive -e "query 1" & hive -e "query 2" * it also works fine. Any ideas ? Thanks -Matt
Re: Error on Load into multiple Partitions
Namit. I just Updated to revision 794686 and that worked. It looks like Zheng committed this patch in the afternoon and this failed for me earlier that morning. Bad luck on my timing but I'm happy it works now. Thanks. -Matt On Thu, Jul 16, 2009 at 10:09 AM, Namit Jain wrote: > Most probably, this is the same as > > https://issues.apache.org/jira/browse/HIVE-636 > > which was merged just a days back. Can you try on the latest trunk ? > > > > > On 7/16/09 6:45 AM, "Matt Pestritto" wrote: > > Does anyone have any idea as to the reason for this error ? > > Thanks in Advance > -Matt > > -- Forwarded message -- > From: Matt Pestritto > Date: Wed, Jul 15, 2009 at 10:09 AM > Subject: Error on Load into multiple Partitions > To: hive-dev@hadoop.apache.org > > > Hi All. > > Are there are existing test cases that load into multiple partitions using > a > single from query? This query worked in an older revision but the mappers > fails when I run on trunk: > > java.lang.RuntimeException: Map operator initialization failed > >at > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) >at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) >at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) > > Caused by: java.lang.NullPointerException >at > org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176) >at > org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204) > >at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264) >at > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103) > > > Here is a simplified version of what I'm running and DDL to support: > *create table test_m ( client int, description string ) > row format delimited fields terminated by '\011' lines terminated by > '\012' stored as textfile; > * > *create table test_m_p ( description string ) > partitioned by ( client int ) row format delimited fields terminated by > '\011' lines terminated by '\012' stored as textfile; > * > *LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ; * > > *FROM test_m > INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description > where client=1 > INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description > where client=2 ; > * > --- contents of /tmp/m.lst > 1test > 1test2 > 1test3 > 2hi > 2hi1 > 2hi3 > > Thanks! > -Matt > >
Fwd: Error on Load into multiple Partitions
Does anyone have any idea as to the reason for this error ? Thanks in Advance -Matt -- Forwarded message -- From: Matt Pestritto Date: Wed, Jul 15, 2009 at 10:09 AM Subject: Error on Load into multiple Partitions To: hive-dev@hadoop.apache.org Hi All. Are there any existing test cases that load into multiple partitions using a single from query? This query worked in an older revision but the mappers fail when I run on trunk: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103) Here is a simplified version of what I'm running and DDL to support: *create table test_m ( client int, description string ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *create table test_m_p ( description string ) partitioned by ( client int ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ; * *FROM test_m INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description where client=1 INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description where client=2 ; * --- contents of /tmp/m.lst 1test 1test2 1test3 2hi 2hi1 2hi3 Thanks! -Matt
Error on Load into multiple Partitions
Hi All. Are there any existing test cases that load into multiple partitions using a single from query? This query worked in an older revision but the mappers fail when I run on trunk: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103) Here is a simplified version of what I'm running and DDL to support: *create table test_m ( client int, description string ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *create table test_m_p ( description string ) partitioned by ( client int ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ; * *FROM test_m INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description where client=1 INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description where client=2 ; * --- contents of /tmp/m.lst 1test 1test2 1test3 2hi 2hi1 2hi3 Thanks! -Matt
Re: Stdout -> Stderr
Ok. Thanks for the info. I have a Java wrapper written around hive -e that threw an exception when the error stream had any data. I changed the wrapper to check for a non-zero exit status returned from the process instead of looking at the error stream and that works fine now. Thanks. On Mon, Jun 1, 2009 at 4:33 PM, Raghu Murthy wrote: > Only info messages are being written to stderr. The actual data should be > written to stdout. See https://issues.apache.org/jira/browse/HIVE-505 > > > On 6/1/09 1:29 PM, "Matt Pestritto" wrote: > > > Hi All. > > > > It seems like the latest trunk is writing stdout to stderr on a hive -e > > call. Is this the intended functionality ? > > > > hive -e "query 1; query 2; query 3; " 2> errors.out > > > > errors.out has stdout. stdout has no output. > > > > Thanks > > -Matt > >
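For reference, a small sketch of the wrapper change described above: treat a non-zero exit status as failure instead of failing whenever stderr has data. The query string is a placeholder and this is not the original wrapper.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class HiveWrapper {
        public static void main(String[] args) throws Exception {
            ProcessBuilder pb = new ProcessBuilder("hive", "-e", "select 1");
            // Let Hive's info/progress messages flow through to our own stderr
            // instead of treating their mere presence as an error.
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process p = pb.start();

            // Query results arrive on stdout (see HIVE-505).
            BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line);
            }

            // Failure is signalled by the exit status, not by stderr having data.
            int status = p.waitFor();
            if (status != 0) {
                throw new RuntimeException("hive -e exited with status " + status);
            }
        }
    }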
Stdout -> Stderr
Hi All. It seems like the latest trunk is writing stdout to stderr on a hive -e call. Is this the intended functionality ? hive -e "query 1; query 2; query 3; " 2> errors.out errors.out has stdout. stdout has no output. Thanks -Matt
Re: Trunk runtime errors
Prasad. My query is pretty complex so I created a simple test case for you. I first tried on a table with only 1 partition and that succeeded. I then tried with two partitions and that did not copy the data. So it seems like it is only for tables with more than 1 partition. I ran this in the CLI. drop table hive_test_src; create table hive_test_src ( col1 string ) stored as textfile ; load data local inpath '/home/mpestritto/hive_test/data.dat' overwrite into table hive_test_src ; drop table hive_test_dst; create table hive_test_dst ( col1 string ) partitioned by ( pcol1 string , pcol2 string) stored as sequencefile; insert overwrite table hive_test_dst partition ( pcol1='test_part' , pcol2='test_part') select col1 from hive_test_src ; select count(1) from hive_test_dst where pcol1='test_part' and pcol2='test_part'; mpestri...@mustique:~/hive_test$ cat data.dat 1 2 3 4 5 6 CLI - OUTPUT: hive> drop table hive_test_src; OK Time taken: 0.188 seconds hive> create table hive_test_src ( col1 string ) stored as textfile ; OK Time taken: 0.098 seconds hive> load data local inpath '/home/mpestritto/hive_test/data.dat' overwrite into table hive_test_src ; Copying data from file:/home/mpestritto/hive_test/data.dat Loading data to table hive_test_src OK Time taken: 0.36 seconds hive> > drop table hive_test_dst; OK Time taken: 0.124 seconds hive> create table hive_test_dst ( col1 string ) partitioned by ( pcol1 string , pcol2 string) stored as sequencefile; OK Time taken: 0.084 seconds hive> > insert overwrite table hive_test_dst partition ( pcol1='test_part' , pcol2='test_part') select col1 from hive_test_src ; Total MapReduce jobs = 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_200905111618_0098, Tracking URL = http://mustique.ps.tld:50030/jobdetails.jsp?jobid=job_200905111618_0098 Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=mustique.ps.tld:9001 -kill job_200905111618_0098 map = 0%, reduce =0% map = 100%, reduce =100% Ended Job = job_200905111618_0098 Loading data to table hive_test_dst partition {pcol1=test_part, pcol2=test_part} 6 Rows loaded to hive_test_dst OK Time taken: 5.687 seconds hive> select count(1) from hive_test_dst where pcol1='test_part' and pcol2='test_part'; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= Job need not be submitted: no output: Success OK Time taken: 0.41 seconds hive> On Wed, May 13, 2009 at 12:15 PM, Prasad Chakka wrote: > Matt, > > Can you send me the query for the first problem? Also whether the directory > for the partition exists before the query is issued? > > Thanks, > Prasad > > > > From: Matt Pestritto > Reply-To: > Date: Wed, 13 May 2009 09:04:38 -0700 > To: > Subject: Trunk runtime errors > > All - > > 1st problem. > I was having a problem loading data into partitions when the partition did > not exist and traced the problem to revision 772746. Trunk also has the > same error. > Revision 772746 SVN Commend: HIVE-442. Create partitions after data is > moved in the query in order to close out an inconsistent window. (Prasad > Chakka via athusoo) > Revision 772012 works fine for me. > > Essentially the partition directories are created but the data is never > copied over. 
If I run the same job again, the data is copied to the target > directory in HDFS. > > 2nd problem. > When I try to do a select count(1) from a table I get the following > exception and I'm not sure what the cause is. Again, this works fine if I > roll back to revision 772012. > Job Submission failed with exception > 'java.lang.IllegalArgumentException(Wrong FS: file:/tmp/hive-hive/1, > expected: hdfs://mustique.ps.tld:9000)' > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.ExecDriver > > Let me know if I can facilitate further. > > Thanks > -Matt > >
Trunk runtime errors
All - 1st problem. I was having a problem loading data into partitions when the partition did not exist and traced the problem to revision 772746. Trunk also has the same error. Revision 772746 SVN comment: HIVE-442. Create partitions after data is moved in the query in order to close out an inconsistent window. (Prasad Chakka via athusoo) Revision 772012 works fine for me. Essentially the partition directories are created but the data is never copied over. If I run the same job again, the data is copied to the target directory in HDFS. 2nd problem. When I try to do a select count(1) from a table I get the following exception and I'm not sure what the cause is. Again, this works fine if I roll back to revision 772012. Job Submission failed with exception 'java.lang.IllegalArgumentException(Wrong FS: file:/tmp/hive-hive/1, expected: hdfs://mustique.ps.tld:9000)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver Let me know if I can help further. Thanks -Matt
trunk not working.
Hi all. The latest trunk is not working for me. When I roll back to revision 773043 it works fine. Any ideas ? Let me know if you need anything else. Thanks in advance. Error is below: hive> select count(1) from tbl1; Total MapReduce jobs = 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= Job Submission failed with exception 'java.lang.IllegalArgumentException(Wrong FS: file:/tmp/hive-mpestritto/1, expected: hdfs://mustique.ps.tld:9000)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver
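For what it's worth, the "Wrong FS" message itself comes from Hadoop's FileSystem rejecting a Path whose scheme does not match the filesystem it is asked through, which is what happens when a local file:/tmp/... scratch path reaches an HDFS-rooted FileSystem. A small standalone sketch of that behaviour (not the Hive code path; the namenode URI and path are taken from the error above):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WrongFsDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem hdfs = FileSystem.get(URI.create("hdfs://mustique.ps.tld:9000"), conf);

            // Handing a file:-scheme path to an HDFS FileSystem fails the scheme check:
            // java.lang.IllegalArgumentException: Wrong FS: file:/tmp/hive-mpestritto/1,
            // expected: hdfs://mustique.ps.tld:9000
            hdfs.getFileStatus(new Path("file:/tmp/hive-mpestritto/1"));
        }
    }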
[jira] Commented: (HIVE-354) [hive] udf needed for getting length of a string
[ https://issues.apache.org/jira/browse/HIVE-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700970#action_12700970 ] Matt Pestritto commented on HIVE-354: - Thanks for patch. Worked for me. > [hive] udf needed for getting length of a string > > > Key: HIVE-354 > URL: https://issues.apache.org/jira/browse/HIVE-354 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Neil Conway > Attachments: JIRA_hive-354.patch.1 > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-383) New String Function: length
[ https://issues.apache.org/jira/browse/HIVE-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700937#action_12700937 ] Matt Pestritto commented on HIVE-383: - Ah. Thanks Neil. I didn't notice the dup. > New String Function: length > --- > > Key: HIVE-383 > URL: https://issues.apache.org/jira/browse/HIVE-383 > Project: Hadoop Hive > Issue Type: New Feature > Reporter: Matt Pestritto > > Request for an additional string function: length ( ) > returns an integer of the length of the string passed in -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-383) New String Function: length
[ https://issues.apache.org/jira/browse/HIVE-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700594#action_12700594 ] Matt Pestritto commented on HIVE-383: - Any ETA on this request ? It seems like a straightforward request. functional example: select max( length( col1 ) ) from table_1; Thanks. > New String Function: length > --- > > Key: HIVE-383 > URL: https://issues.apache.org/jira/browse/HIVE-383 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: Matt Pestritto > > Request for an additional string function: length ( ) > returns an integer of the length of the string passed in -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-383) New String Function: length
New String Function: length --- Key: HIVE-383 URL: https://issues.apache.org/jira/browse/HIVE-383 Project: Hadoop Hive Issue Type: New Feature Reporter: Matt Pestritto Request for an additional string function: length ( ) returns an integer of the length of the string passed in -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
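A sketch of what such a length UDF could look like using Hive's plain UDF API; this is illustrative only and not the patch that was eventually committed under HIVE-354.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;

    // Returns the length of the input string; NULL in -> NULL out.
    public class UDFLength extends UDF {
        private final IntWritable result = new IntWritable();

        public IntWritable evaluate(Text s) {
            if (s == null) {
                return null;
            }
            result.set(s.toString().length());
            return result;
        }
    }

Once registered as a function, usage would match the example given in the comments above, e.g. select max( length( col1 ) ) from table_1;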