[jira] Commented: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions

2010-04-27 Thread Matt Pestritto (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861508#action_12861508
 ] 

Matt Pestritto commented on HIVE-1115:
--

Is there an ETA for resolving this issue?  There hasn't been any activity in a 
while, and it would be a significant performance increase in our environment.  
Thanks

> optimize combinehiveinputformat in presence of many partitions
> --
>
> Key: HIVE-1115
> URL: https://issues.apache.org/jira/browse/HIVE-1115
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
>
> A query like :
> select ..  from T where ...
> where T contains a very large number of partitions does not work very well 
> with CombineHiveInputFormat.
> A pool is created per directory, which leads to a high number of mappers.
> In case all partitions share the same operator tree, and the same partition 
> description, only a single pool should be created.
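
As a rough illustration of the proposal above, the sketch below groups partition directories by a shared (operator tree, partition description) key, so that identical partitions land in one pool instead of one pool per directory. It is not Hive code: `group_into_pools`, the tuple layout, and the sample paths are hypothetical names chosen for the example.

```python
from collections import defaultdict


def group_into_pools(partitions):
    """Group partition directories that share an operator tree and a
    partition description. Each input item is a hypothetical
    (directory, operator_tree, partition_desc) tuple."""
    pools = defaultdict(list)
    for directory, operator_tree, partition_desc in partitions:
        # Directories with the same key can be combined into the same pool.
        pools[(operator_tree, partition_desc)].append(directory)
    return dict(pools)


# Three partitions of the same table, all with the same plan and description.
partitions = [
    ("/warehouse/t/ds=2010-01-01", "TS-FIL-SEL", "text/tab"),
    ("/warehouse/t/ds=2010-01-02", "TS-FIL-SEL", "text/tab"),
    ("/warehouse/t/ds=2010-01-03", "TS-FIL-SEL", "text/tab"),
]
pools = group_into_pools(partitions)
print(len(pools))  # one pool rather than three
```

With one pool per directory the same input would produce three pools; a partition with a different description would still get its own pool under this scheme.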

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n

2010-01-06 Thread Matt Pestritto (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797150#action_12797150
 ] 

Matt Pestritto commented on HIVE-820:
-

All -

Do we have a decision on what the output should show?  A few different 
ideas were being thrown around.

I would rather replace only the characters that would break the output (tab, \n) 
with something meaningful than, as Edward stated, always show the octal 
representation, which would require an ASCII table to figure out what the 
delimiter is.  If something is | (pipe) delimited, I would have to look it 
up every time even though it is a printable character.

I'll wait for feedback from the FB team and make the changes.

Thanks.

> Describe Extended Line Breaks When Delimiter is \n
> --
>
> Key: HIVE-820
> URL: https://issues.apache.org/jira/browse/HIVE-820
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.5.0
>Reporter: Matt Pestritto
>Assignee: Matt Pestritto
>Priority: Minor
> Fix For: 0.5.0
>
> Attachments: hive_820.patch
>
>
> Tables defined with \t as the field delimiter and \n as the line delimiter 
> produce describe extended output that is not contiguous.
> line.delim outputs an actual \n, which breaks the display output, so when 
> using the hiveservice you have to do another FetchOne to get the rest of the line.
> For example.
> Original Output:
> Detailed Table Information Table(tableName:cobra_merchandise, 
> dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
> type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
> type:string, comment:null), FieldSchema(name:description, type:string, 
> comment:null), FieldSchema(name:client_description, type:string, 
> comment:null), FieldSchema(name:price, type:string, comment:null), 
> FieldSchema(name:cost, type:string, comment:null), 
> FieldSchema(name:start_date, type:string, comment:null), 
> FieldSchema(name:end_date, type:string, comment:null)], 
> location:hdfs://mustique:9000/user/hive/warehouse/m, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=9,line.delim=
> ,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), 
> partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
> parameters:{})   
> Proposed Output:
> Detailed Table Information Table(tableName:cobra_merchandise, 
> dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
> type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
> type:string, comment:null), FieldSchema(name:description, type:string, 
> comment:null), FieldSchema(name:client_description, type:string, 
> comment:null), FieldSchema(name:price, type:string, comment:null), 
> FieldSchema(name:cost, type:string, comment:null), 
> FieldSchema(name:start_date, type:string, comment:null), 
> FieldSchema(name:end_date, type:string, comment:null)], 
> location:hdfs://mustique:9000/user/hive/warehouse/m, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=9,line.delim=,field.delim=}), 
> bucketCols:[], sortCols:[], parameters:{}), 
> partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
> parameters:{})   
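
A minimal sketch of one way the delimiter values could be made printable, along the lines discussed in the comment above: only characters that break the display are replaced, and printable delimiters such as | pass through unchanged. The escape table and the `show_delimiter` helper are assumptions made for illustration, not the contents of hive_820.patch, and the tab field delimiter is assumed from the issue description.

```python
# Hypothetical escape table: only characters that break single-line output.
VISIBLE = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}


def show_delimiter(value):
    """Return value with line-breaking characters replaced by visible escapes."""
    return "".join(VISIBLE.get(ch, ch) for ch in value)


# SerDe parameters shaped like the ones in the describe extended output.
params = {"serialization.format": "9", "line.delim": "\n", "field.delim": "\t"}
rendered = ",".join(f"{key}={show_delimiter(val)}" for key, val in params.items())
print(rendered)  # serialization.format=9,line.delim=\n,field.delim=\t
```

A pipe delimiter would be printed as-is, which avoids the octal lookup mentioned in the comment.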




[jira] Updated: (HIVE-983) Function from_unixtime only takes Int. Override to support Long

2009-12-14 Thread Matt Pestritto (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Pestritto updated HIVE-983:


Priority: Minor  (was: Major)

> Function from_unixtime only takes Int.  Override to support Long
> 
>
> Key: HIVE-983
> URL: https://issues.apache.org/jira/browse/HIVE-983
> Project: Hadoop Hive
>  Issue Type: Improvement
>    Reporter: Matt Pestritto
>Priority: Minor
>
> UDFFromUnixTime.java only supports int.  We have future-dated values, 
> so they fail when it tries to parse them.  Can support be added for a 
> LongWritable input parameter? 
> We also have dates stored with milliseconds, which exceed the integer 
> limit.  Long support would be helpful.
> FAILED: Error in semantic analysis: line 1:7 Function Argument Type Mismatch 
> from_unixtime: Looking for UDF "from_unixtime" with parameters [class 
> org.apache.hadoop.io.LongWritable]
> 09/12/14 11:42:10 ERROR ql.Driver: FAILED: Error in semantic analysis: line 
> 1:7 Function Argument Type Mismatch from_unixtime: Looking for UDF 
> "from_unixtime" with parameters [class 
> org.apache.hadoop.io.LongWritable]




[jira] Created: (HIVE-983) Function from_unixtime only takes Int. Override to support Long

2009-12-14 Thread Matt Pestritto (JIRA)
Function from_unixtime only takes Int.  Override to support Long


 Key: HIVE-983
 URL: https://issues.apache.org/jira/browse/HIVE-983
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Matt Pestritto


UDFFromUnixTime.java only supports int.  We have future-dated values, so 
they fail when it tries to parse them.  Can support be added for a 
LongWritable input parameter? 

We also have dates stored with milliseconds, which exceed the integer 
limit.  Long support would be helpful.

FAILED: Error in semantic analysis: line 1:7 Function Argument Type Mismatch 
from_unixtime: Looking for UDF "from_unixtime" with parameters [class 
org.apache.hadoop.io.LongWritable]
09/12/14 11:42:10 ERROR ql.Driver: FAILED: Error in semantic analysis: line 1:7 
Function Argument Type Mismatch from_unixtime: Looking for UDF "from_unixtime" 
with parameters [class 
org.apache.hadoop.io.LongWritable]
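
To make the integer limit concrete, here is a small sketch showing that both a future-dated epoch value and a millisecond timestamp fall outside the signed 32-bit range, and how a long-capable conversion might behave. The helper names are hypothetical, and formatting in UTC is an assumption made to keep the example deterministic; it does not describe how Hive's from_unixtime picks a time zone.

```python
from datetime import datetime, timedelta, timezone

INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fits_in_int(value):
    """True if value can be stored in a signed 32-bit int."""
    return -(2**31) <= value <= INT_MAX


def from_unixtime_long(seconds):
    """Format epoch seconds of any size as 'YYYY-MM-DD HH:MM:SS' (UTC assumed)."""
    return (EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%d %H:%M:%S")


future_seconds = 2208988800   # 2040-01-01 00:00:00 UTC, past the 32-bit limit
millis = 1260790930000        # 2009-12-14 11:42:10 UTC, stored in milliseconds

print(fits_in_int(future_seconds))       # False
print(fits_in_int(millis))               # False
print(from_unixtime_long(future_seconds))
print(from_unixtime_long(millis // 1000))
```

Dividing the millisecond value by 1000 brings it back into int range, but the future-dated value still needs a long.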





Re: Hive-74

2009-10-08 Thread Matt Pestritto
Namit -

I have tried hive-trunk as of this afternoon and hive release 814942 (
revision with CombineHiveInputFormat commit ) .

Also - there are no logs that get generated on the tasktrackers for the
hadoop job that fails.  The only log that is generated on the jobtracker is
the jobconf.

Thanks
-Matt

On Thu, Oct 8, 2009 at 1:26 AM, Namit Jain  wrote:

>  Hi Matt,
>
> Sorry for the late reply.
>
> hive> set
> hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> I tried it running on hadoop 20 and it ran fine for me.
>
> Which hive release are you using ?
>
> Also, you got a runtime error – can you see the stderr logs on the tracker
> ?
>
> Thanks,
> -namit
>
>
>
> On 10/1/09 5:01 PM, "Matt Pestritto"  wrote:
>
> Namit -
> Any idea on how to resolve ?
> Thanks
>
> On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto 
> wrote:
>
> > There were errors in the hive.log
> >
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> > (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser
> > for parsing the arguments. Applications should implement Tool for the same.
> > 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> > (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with
> > errors
> > 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248))
> > - FAILED: Execution Error, return code 2 from
> > org.apache.hadoop.hive.ql.exec.ExecDriver
> >
> >
> >
> > On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain  wrote:
> >
> >> What you are doing seems OK ?
> >> Can you get the stack trace from /tmp//hive.log ?
> >>
> >>
> >>
> >>
> >>
> >> -Original Message-
> >> From: Matt Pestritto [mailto:m...@pestritto.com] 
> >> Sent: Wednesday, September 30, 2009 6:51 AM
> >> To: hive-dev@hadoop.apache.org; hive-u...@hadoop.apache.org
> >> Subject: Fwd: Hive-74
> >>
> >> Including hive-user in case someone has any experience with this..
> >> Thanks
> >> -Matt
> >>
> >> -- Forwarded message --
> >> From: Matt Pestritto 
> >> Date: Tue, Sep 29, 2009 at 5:26 PM
> >> Subject: Hive-74
> >> To: hive-dev@hadoop.apache.org
> >>
> >>
> >> Hi-
> >>
> >> I'm having a problem using CombineHiveInputSplit.  I believe this was
> >> patched in http://issues.apache.org/jira/browse/HIVE-74
> >>
> >> I'm currently running hadoop 20.1 using hive trunk.
> >>
> >> hive-default.xml has the following property:
> >> <property>
> >>  <name>hive.input.format</name>
> >>  <value></value>
> >>  <description>The default input format, if it is not specified, the system
> >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
> >> 19,
> >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
> >> always be manually set to HiveInputFormat. </description>
> >> </property>
> >>
> >> I added the following to hive-site.xml:  ( Notice, the description in
> >> hive-default.xml has CombinedHiveInputFormat which does not work for me -
> >> the property value seems to be Combine(-d) )
> 

Re: Hive-74

2009-10-01 Thread Matt Pestritto
Namit -
Any idea on how to resolve ?
Thanks

On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto  wrote:

> There were errors in the hive.log
>
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with
> errors
> 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248))
> - FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
>
>
> On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain  wrote:
>
>> What you are doing seems OK ?
>> Can you get the stack trace from /tmp//hive.log ?
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Matt Pestritto [mailto:m...@pestritto.com]
>> Sent: Wednesday, September 30, 2009 6:51 AM
>> To: hive-dev@hadoop.apache.org; hive-u...@hadoop.apache.org
>> Subject: Fwd: Hive-74
>>
>> Including hive-user in case someone has any experience with this..
>> Thanks
>> -Matt
>>
>> -- Forwarded message --
>> From: Matt Pestritto 
>> Date: Tue, Sep 29, 2009 at 5:26 PM
>> Subject: Hive-74
>> To: hive-dev@hadoop.apache.org
>>
>>
>> Hi-
>>
>> I'm having a problem using CombineHiveInputSplit.  I believe this was
>> patched in http://issues.apache.org/jira/browse/HIVE-74
>>
>> I'm currently running hadoop 20.1 using hive trunk.
>>
>> hive-default.xml has the following property:
>> <property>
>>  <name>hive.input.format</name>
>>  <value></value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> I added the following to hive-site.xml:  ( Notice, the description in
>> hive-default.xml has CombinedHiveInputFormat which does not work for me -
>> the property value seems to be Combine(-d) )
>> <property>
>>  <name>hive.input.format</name>
>>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> When I launch a job the cli exits immediately:
>> hive> select count(1) from my_table;
>> Total MapReduce jobs = 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>  set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>  set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>  set mapred.reduce.tasks=<number>
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.ExecDriver
>> hive> exit ;
>>
>> If I set the property value to
>> org.apache.hadoop.hive.ql.io.HiveInputFormat,
>> the job runs fine.
>>
>> Suggestions ? Is there something that I am missing ?
>>
>> Thanks
>> -Matt
>>
>
>


Re: Hive-74

2009-10-01 Thread Matt Pestritto
There were errors in the hive.log

2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2009-10-01 10:40:57,143 WARN  mapred.JobClient
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.
2009-10-01 10:40:58,609 ERROR exec.ExecDriver
(SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with
errors
2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248))
- FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver


On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain  wrote:

> What you are doing seems OK ?
> Can you get the stack trace from /tmp//hive.log ?
>
>
>
>
>
> -Original Message-
> From: Matt Pestritto [mailto:m...@pestritto.com]
> Sent: Wednesday, September 30, 2009 6:51 AM
> To: hive-dev@hadoop.apache.org; hive-u...@hadoop.apache.org
> Subject: Fwd: Hive-74
>
> Including hive-user in case someone has any experience with this..
> Thanks
> -Matt
>
> -- Forwarded message --
> From: Matt Pestritto 
> Date: Tue, Sep 29, 2009 at 5:26 PM
> Subject: Hive-74
> To: hive-dev@hadoop.apache.org
>
>
> Hi-
>
> I'm having a problem using CombineHiveInputSplit.  I believe this was
> patched in http://issues.apache.org/jira/browse/HIVE-74
>
> I'm currently running hadoop 20.1 using hive trunk.
>
> hive-default.xml has the following property:
> <property>
>  <name>hive.input.format</name>
>  <value></value>
>  <description>The default input format, if it is not specified, the system
> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
> always be manually set to HiveInputFormat. </description>
> </property>
>
> I added the following to hive-site.xml:  ( Notice, the description in
> hive-default.xml has CombinedHiveInputFormat which does not work for me -
> the property value seems to be Combine(-d) )
> <property>
>  <name>hive.input.format</name>
>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>  <description>The default input format, if it is not specified, the system
> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
> always be manually set to HiveInputFormat. </description>
> </property>
>
> When I launch a job the cli exits immediately:
> hive> select count(1) from my_table;
> Total MapReduce jobs = 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>  set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>  set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>  set mapred.reduce.tasks=<number>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
> hive> exit ;
>
> If I set the property value to
> org.apache.hadoop.hive.ql.io.HiveInputFormat,
> the job runs fine.
>
> Suggestions ? Is there something that I am missing ?
>
> Thanks
> -Matt
>


Fwd: Hive-74

2009-09-30 Thread Matt Pestritto
Including hive-user in case someone has any experience with this..
Thanks
-Matt

-- Forwarded message --
From: Matt Pestritto 
Date: Tue, Sep 29, 2009 at 5:26 PM
Subject: Hive-74
To: hive-dev@hadoop.apache.org


Hi-

I'm having a problem using CombineHiveInputSplit.  I believe this was
patched in http://issues.apache.org/jira/browse/HIVE-74

I'm currently running hadoop 20.1 using hive trunk.

hive-default.xml has the following property:

<property>
  <name>hive.input.format</name>
  <value></value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>

I added the following to hive-site.xml:  ( Notice, the description in
hive-default.xml has CombinedHiveInputFormat which does not work for me -
the property value seems to be Combine(-d) )

<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>

When I launch a job the cli exits immediately:
hive> select count(1) from my_table;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver
hive> exit ;

If I set the property value to org.apache.hadoop.hive.ql.io.HiveInputFormat,
the job runs fine.

Suggestions ? Is there something that I am missing ?

Thanks
-Matt


Hive-74

2009-09-29 Thread Matt Pestritto
Hi-

I'm having a problem using CombineHiveInputSplit.  I believe this was
patched in http://issues.apache.org/jira/browse/HIVE-74

I'm currently running hadoop 20.1 using hive trunk.

hive-default.xml has the following property:

<property>
  <name>hive.input.format</name>
  <value></value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>

I added the following to hive-site.xml:  ( Notice, the description in
hive-default.xml has CombinedHiveInputFormat which does not work for me -
the property value seems to be Combine(-d) )

<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>

When I launch a job the cli exits immediately:
hive> select count(1) from my_table;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver
hive> exit ;

If I set the property value to org.apache.hadoop.hive.ql.io.HiveInputFormat,
the job runs fine.

Suggestions ? Is there something that I am missing ?

Thanks
-Matt
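
Because the description text spells the class CombinedHiveInputFormat while the working value is CombineHiveInputFormat, a quick check of the configured value can rule out a spelling mismatch. The following is only a sketch: `read_property`, the inline XML, and the set of accepted class names are assumptions for the example, and the set lists only the two classes named in this thread.

```python
import xml.etree.ElementTree as ET

# Only the two input format classes mentioned in this thread (an assumption,
# not an exhaustive list of what Hive accepts).
KNOWN_INPUT_FORMATS = {
    "org.apache.hadoop.hive.ql.io.HiveInputFormat",
    "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat",
}

# Inline stand-in for a hive-site.xml file.
SITE_XML = """
<configuration>
  <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  </property>
</configuration>
"""


def read_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return (prop.findtext("value") or "").strip()
    return None


value = read_property(SITE_XML, "hive.input.format")
print(value in KNOWN_INPUT_FORMATS)  # True for the Combine (no "d") spelling
```

The same check would flag the "Combined" spelling from the description, since that name is not in the set.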


[jira] Updated: (HIVE-851) Thrift Client: BaseException.message deprecation warning in Python 2.6

2009-09-22 Thread Matt Pestritto (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Pestritto updated HIVE-851:


Attachment: hive_851.patch

Note. This patches two files: 
service/lib/py/thrift/Thrift.py
and 
service/src/gen-py/hive/ttypes.py

It looks like ttypes.py is generated automatically.  I'm not sure where that 
comes from so you may not want to patch that file.

Thanks

> Thrift Client: BaseException.message deprecation warning in Python 2.6
> --
>
> Key: HIVE-851
> URL: https://issues.apache.org/jira/browse/HIVE-851
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients
>Reporter: Matt Pestritto
>Priority: Minor
> Attachments: hive_851.patch
>
>
> In Python 2.6 BaseException.message has been deprecated.  This is a patch to 
> remove these warnings.
> src/thrift/Thrift.py:62: DeprecationWarning: BaseException.message has been 
> deprecated as of Python 2.6
>   self.message = message
> Also note: I could only replicate this warning in two classes, although there 
> were other classes that inherited from Exception. 
> Patch is only attached for those two classes.




[jira] Created: (HIVE-851) Thrift Client: BaseException.message deprecation warning in Python 2.6

2009-09-22 Thread Matt Pestritto (JIRA)
Thrift Client: BaseException.message deprecation warning in Python 2.6
--

 Key: HIVE-851
 URL: https://issues.apache.org/jira/browse/HIVE-851
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Matt Pestritto
Priority: Minor


In Python 2.6 BaseException.message has been deprecated.  This is a patch to 
remove these warnings.
src/thrift/Thrift.py:62: DeprecationWarning: BaseException.message has been 
deprecated as of Python 2.6
  self.message = message

Also note: I could only replicate this warning in two classes, although there 
were other classes that inherited from Exception. 

Patch is only attached for those two classes.
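
For readers unfamiliar with the warning, here is one common way to avoid it, shown as a sketch rather than as the contents of hive_851.patch: the exception class defines its own `message` property, which shadows the deprecated BaseException.message attribute. The class name `TException` is used only as a stand-in. The snippet also runs on Python 3, where the inherited attribute no longer exists.

```python
class TException(Exception):
    """Stand-in for a Thrift-style base exception (hypothetical example)."""

    def __init__(self, message=None):
        Exception.__init__(self, message)
        # Store the text in a private attribute instead of assigning to the
        # inherited BaseException.message.
        self._message = message

    @property
    def message(self):
        """Expose the text through a property defined on this class."""
        return self._message

    @message.setter
    def message(self, value):
        self._message = value


err = TException("connection refused")
print(err.message)  # connection refused
```

Callers that read or assign `err.message` keep working, because the property on the subclass takes precedence over the inherited attribute.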




Re: vote for release candidate for hive

2009-09-17 Thread Matt Pestritto
Please disregard.  I found the cause of my error.

Thanks.

On Thu, Sep 17, 2009 at 3:09 PM, Matt Pestritto  wrote:

> I recently switched to the 0.4 branch to do some testing and I'm running
> into a problem.
>
> When I run a query from the cli - the first one works, but the second query
> always fails with a NullPointerException.
>
> Did anyone else run into this ?
>
> Thanks
> -Matt
>
> hive> select count(1) from table1;
> Total MapReduce jobs = 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_200909171501_0001, Tracking URL =
> http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001
> Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job
> -Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001
> 2009-09-17 03:05:54,855 map = 0%,  reduce =0%
> 2009-09-17 03:06:02,895 map = 22%,  reduce =0%
> 2009-09-17 03:06:06,933 map = 44%,  reduce =0%
> 2009-09-17 03:06:11,965 map = 67%,  reduce =0%
> 2009-09-17 03:06:15,988 map = 89%,  reduce =0%
> 2009-09-17 03:06:20,009 map = 100%,  reduce =0%
> 2009-09-17 03:06:25,036 map = 100%,  reduce =11%
> 2009-09-17 03:06:30,054 map = 100%,  reduce =15%
> 2009-09-17 03:06:31,063 map = 100%,  reduce =22%
> 2009-09-17 03:06:34,075 map = 100%,  reduce =26%
> 2009-09-17 03:06:36,101 map = 100%,  reduce =100%
> Ended Job = job_200909171501_0001
> OK
> 274087
> Time taken: 45.401 seconds
> hive> select count(1) from table1;
> Total MapReduce jobs = 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> Job Submission failed with exception
> 'java.lang.RuntimeException(java.lang.NullPointerException)'
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
> hive>
>
>
> On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain  wrote:
>
>> https://issues.apache.org/jira/browse/HIVE-838
>>
>> is a blocker for 0.4 -
>> Once this is merged, I will have another release candidate
>>
>>
>> -Original Message-
>> From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
>> Sent: Wednesday, September 16, 2009 8:29 AM
>> To: hive-dev@hadoop.apache.org
>> Subject: Re: vote for release candidate for hive
>>
>> +1 based on running unit tests.
>>
>> /Johan
>>
>> Namit Jain wrote:
>> > Sorry, was meant for hive-dev@
>> >
>> > From: Namit Jain [mailto:nj...@facebook.com]
>> > Sent: Tuesday, September 15, 2009 1:30 PM
>> > To: hive-u...@hadoop.apache.org
>> > Subject: vote for release candidate for hive
>> >
>> >
>> > I have created another release candidate for Hive.
>> >
>> >
>> >
>> >  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/
>> >
>> >
>> >
>> >
>> >
>> > Let me know if it is OK to publish this release candidate.
>> >
>> >
>> >
>> > The only change from the previous candidate (
>> https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is
>> the fix for
>> >
>> > https://issues.apache.org/jira/browse/HIVE-718
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > -namit
>> >
>> >
>> >
>> >
>>
>>
>


Re: vote for release candidate for hive

2009-09-17 Thread Matt Pestritto
I recently switched to the 0.4 branch to do some testing and I'm running
into a problem.

When I run a query from the cli - the first one works, but the second query
always fails with a NullPointerException.

Did anyone else run into this ?

Thanks
-Matt

hive> select count(1) from table1;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_200909171501_0001, Tracking URL =
http://mustique:50030/jobdetails.jsp?jobid=job_200909171501_0001
Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job
-Dmapred.job.tracker=mustique:9001 -kill job_200909171501_0001
2009-09-17 03:05:54,855 map = 0%,  reduce =0%
2009-09-17 03:06:02,895 map = 22%,  reduce =0%
2009-09-17 03:06:06,933 map = 44%,  reduce =0%
2009-09-17 03:06:11,965 map = 67%,  reduce =0%
2009-09-17 03:06:15,988 map = 89%,  reduce =0%
2009-09-17 03:06:20,009 map = 100%,  reduce =0%
2009-09-17 03:06:25,036 map = 100%,  reduce =11%
2009-09-17 03:06:30,054 map = 100%,  reduce =15%
2009-09-17 03:06:31,063 map = 100%,  reduce =22%
2009-09-17 03:06:34,075 map = 100%,  reduce =26%
2009-09-17 03:06:36,101 map = 100%,  reduce =100%
Ended Job = job_200909171501_0001
OK
274087
Time taken: 45.401 seconds
hive> select count(1) from table1;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:154)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:373)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:379)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:285)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Job Submission failed with exception 'java.lang.RuntimeException(java.lang.NullPointerException)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver
hive>


On Thu, Sep 17, 2009 at 12:36 PM, Namit Jain  wrote:

> https://issues.apache.org/jira/browse/HIVE-838
>
> is a blocker for 0.4 -
> Once this is merged, I will have another release candidate
>
>
> -Original Message-
> From: Johan Oskarsson [mailto:jo...@oskarsson.nu]
> Sent: Wednesday, September 16, 2009 8:29 AM
> To: hive-dev@hadoop.apache.org
> Subject: Re: vote for release candidate for hive
>
> +1 based on running unit tests.
>
> /Johan
>
> Namit Jain wrote:
> > Sorry, was meant for hive-dev@
> >
> > From: Namit Jain [mailto:nj...@facebook.com]
> > Sent: Tuesday, September 15, 2009 1:30 PM
> > To: hive-u...@hadoop.apache.org
> > Subject: vote for release candidate for hive
> >
> >
> > I have created another release candidate for Hive.
> >
> >
> >
> >  https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/
> >
> >
> >
> >
> >
> > Let me know if it is OK to publish this release candidate.
> >
> >
> >
> > The only change from the previous candidate (
> https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/) is
> the fix for
> >
> > https://issues.apache.org/jira/browse/HIVE-718
> >
> >
> >
> >
> >
> >
> >
> > Thanks,
> >
> > -namit
> >
> >
> >
> >
>
>


[jira] Commented: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n

2009-09-10 Thread Matt Pestritto (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753629#action_12753629
 ] 

Matt Pestritto commented on HIVE-820:
-

Edward - 

I made this suggested change and it did not work.  For the LF, the output still 
breaks and two fetches have to be done to get the extended plan.  The 054 did 
not display anything.  

I also tried escaping the backslash and just a 054 and 012 were printed.  Would 
you prefer that notation ?   054 and 012 with no \

Thanks
-Matt



> Describe Extended Line Breaks When Delimiter is \n
> --
>
> Key: HIVE-820
> URL: https://issues.apache.org/jira/browse/HIVE-820
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.5.0
>    Reporter: Matt Pestritto
>Assignee: Matt Pestritto
>Priority: Minor
> Fix For: 0.5.0
>
> Attachments: hive_820.patch
>
>
> Tables defined with fields delimited by \t and lines terminated by \n produce 
> describe extended output that is not contiguous.
> Line.delim outputs an actual \n, which breaks the display output, so when using 
> the hiveservice you have to do another FetchOne to get the rest of the line.
> For example.
> Original Output:
> Detailed Table InformationTable(tableName:cobra_merchandise, 
> dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
> type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
> type:string, comment:null), FieldSchema(name:description, type:string, 
> comment:null), FieldSchema(name:client_description, type:string, 
> comment:null), FieldSchema(name:price, type:string, comment:null), 
> FieldSchema(name:cost, type:string, comment:null), 
> FieldSchema(name:start_date, type:string, comment:null), 
> FieldSchema(name:end_date, type:string, comment:null)], 
> location:hdfs://mustique:9000/user/hive/warehouse/m, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=9,line.delim=
> ,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), 
> partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
> parameters:{})   
> Proposed Output:
> Detailed Table InformationTable(tableName:cobra_merchandise, 
> dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
> type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
> type:string, comment:null), FieldSchema(name:description, type:string, 
> comment:null), FieldSchema(name:client_description, type:string, 
> comment:null), FieldSchema(name:price, type:string, comment:null), 
> FieldSchema(name:cost, type:string, comment:null), 
> FieldSchema(name:start_date, type:string, comment:null), 
> FieldSchema(name:end_date, type:string, comment:null)], 
> location:hdfs://mustique:9000/user/hive/warehouse/m, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=9,line.delim=,field.delim=}), 
> bucketCols:[], sortCols:[], parameters:{}), 
> partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
> parameters:{})   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simulteneously

2009-09-09 Thread Matt Pestritto (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Pestritto updated HIVE-80:
---

Attachment: (was: hive_820.patch)

> Allow Hive Server to run multiple queries simulteneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Neil Conway
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: hive_input_format_race-2.patch, 
> org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal.patch
>
>
> Can use one driver object per query.




[jira] Updated: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n

2009-09-09 Thread Matt Pestritto (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Pestritto updated HIVE-820:


Attachment: hive_820.patch

Patch Attached.

> Describe Extended Line Breaks When Delimiter is \n
> --
>
> Key: HIVE-820
> URL: https://issues.apache.org/jira/browse/HIVE-820
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>    Reporter: Matt Pestritto
>Priority: Minor
> Attachments: hive_820.patch
>
>
> Tables defined with fields delimited by \t and lines terminated by \n produce 
> describe extended output that is not contiguous.
> Line.delim outputs an actual \n, which breaks the display output, so when using 
> the hiveservice you have to do another FetchOne to get the rest of the line.
> For example.
> Original Output:
> Detailed Table InformationTable(tableName:cobra_merchandise, 
> dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
> type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
> type:string, comment:null), FieldSchema(name:description, type:string, 
> comment:null), FieldSchema(name:client_description, type:string, 
> comment:null), FieldSchema(name:price, type:string, comment:null), 
> FieldSchema(name:cost, type:string, comment:null), 
> FieldSchema(name:start_date, type:string, comment:null), 
> FieldSchema(name:end_date, type:string, comment:null)], 
> location:hdfs://mustique:9000/user/hive/warehouse/m, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=9,line.delim=
> ,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), 
> partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
> parameters:{})   
> Proposed Output:
> Detailed Table InformationTable(tableName:cobra_merchandise, 
> dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
> type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
> type:string, comment:null), FieldSchema(name:description, type:string, 
> comment:null), FieldSchema(name:client_description, type:string, 
> comment:null), FieldSchema(name:price, type:string, comment:null), 
> FieldSchema(name:cost, type:string, comment:null), 
> FieldSchema(name:start_date, type:string, comment:null), 
> FieldSchema(name:end_date, type:string, comment:null)], 
> location:hdfs://mustique:9000/user/hive/warehouse/m, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=9,line.delim=,field.delim=}), 
> bucketCols:[], sortCols:[], parameters:{}), 
> partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
> parameters:{})   




[jira] Updated: (HIVE-80) Allow Hive Server to run multiple queries simulteneously

2009-09-09 Thread Matt Pestritto (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Pestritto updated HIVE-80:
---

Attachment: hive_820.patch

Patch Attached.

> Allow Hive Server to run multiple queries simulteneously
> 
>
> Key: HIVE-80
> URL: https://issues.apache.org/jira/browse/HIVE-80
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Neil Conway
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: hive_input_format_race-2.patch, 
> org.apache.hadoop.hive.ql.exec.Utilities-ThreadLocal.patch
>
>
> Can use one driver object per query.




[jira] Created: (HIVE-820) Describe Extended Line Breaks When Delimiter is \n

2009-09-09 Thread Matt Pestritto (JIRA)
Describe Extended Line Breaks When Delimiter is \n
--

 Key: HIVE-820
 URL: https://issues.apache.org/jira/browse/HIVE-820
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Matt Pestritto
Priority: Minor


Tables defined with fields delimited by \t and lines terminated by \n produce 
describe extended output that is not contiguous.

Line.delim outputs an actual \n, which breaks the display output, so when using 
the hiveservice you have to do another FetchOne to get the rest of the line.

For example.

Original Output:
Detailed Table InformationTable(tableName:cobra_merchandise, 
dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
type:string, comment:null), FieldSchema(name:description, type:string, 
comment:null), FieldSchema(name:client_description, type:string, comment:null), 
FieldSchema(name:price, type:string, comment:null), FieldSchema(name:cost, 
type:string, comment:null), FieldSchema(name:start_date, type:string, 
comment:null), FieldSchema(name:end_date, type:string, comment:null)], 
location:hdfs://mustique:9000/user/hive/warehouse/m, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=9,line.delim=
,field.delim=}), bucketCols:[], sortCols:[], parameters:{}), 
partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
parameters:{})   

Proposed Output:
Detailed Table InformationTable(tableName:cobra_merchandise, 
dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0, 
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid, 
type:string, comment:null), FieldSchema(name:client_merch_type_tid, 
type:string, comment:null), FieldSchema(name:description, type:string, 
comment:null), FieldSchema(name:client_description, type:string, comment:null), 
FieldSchema(name:price, type:string, comment:null), FieldSchema(name:cost, 
type:string, comment:null), FieldSchema(name:start_date, type:string, 
comment:null), FieldSchema(name:end_date, type:string, comment:null)], 
location:hdfs://mustique:9000/user/hive/warehouse/m, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=9,line.delim=,field.delim=}), 
bucketCols:[], sortCols:[], parameters:{}), 
partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)], 
parameters:{})   




Describe Extended - Replace Tab and LF

2009-09-08 Thread Matt Pestritto
Hi.

I was wondering if you could replace the Tab and LF characters with printable
placeholder strings in the describe extended output?
I have tables defined with fields delimited by \t and lines terminated by \n,
so the output of describe extended is not contiguous.

Minor patch below.  Feel free to use if you want to.

For example.  Note Line.delim outputs an actual \n which breaks the display
output so using the hiveservice you have to do another FetchOne to get the
rest of the line.

Detailed Table InformationTable(tableName:cobra_merchandise,
dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0,
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid,
type:string, comment:null), FieldSchema(name:client_merch_type_tid,
type:string, comment:null), FieldSchema(name:description, type:string,
comment:null), FieldSchema(name:client_description, type:string,
comment:null), FieldSchema(name:price, type:string, comment:null),
FieldSchema(name:cost, type:string, comment:null),
FieldSchema(name:start_date, type:string, comment:null),
FieldSchema(name:end_date, type:string, comment:null)],
location:hdfs://mustique:9000/user/hive/warehouse/m,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{serialization.format=9,*line.delim=,field.delim=*}),
bucketCols:[], sortCols:[], parameters:{}),
partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)],
parameters:{})

Detailed Table InformationTable(tableName:cobra_merchandise,
dbName:default, owner:hive, createTime:1248726291, lastAccessTime:0,
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:merchandise_tid,
type:string, comment:null), FieldSchema(name:client_merch_type_tid,
type:string, comment:null), FieldSchema(name:description, type:string,
comment:null), FieldSchema(name:client_description, type:string,
comment:null), FieldSchema(name:price, type:string, comment:null),
FieldSchema(name:cost, type:string, comment:null),
FieldSchema(name:start_date, type:string, comment:null),
FieldSchema(name:end_date, type:string, comment:null)],
location:hdfs://mustique:9000/user/hive/warehouse/m,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{serialization.format=9,*line.delim=
,field.delim=*}), bucketCols:[], sortCols:[], parameters:{}),
partitionKeys:[FieldSchema(name:client_tid, type:int, comment:null)],
parameters:{})


Patch File:

Index: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
===================================================================
--- ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (revision 812724)
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (working copy)
@@ -588,7 +588,7 @@
 // show table information
 outStream.writeBytes("Detailed Table Information");
 outStream.write(separator);
-outStream.writeBytes(tbl.getTTable().toString());
+outStream.writeBytes(tbl.getTTable().toString().replaceAll("\n", "").replaceAll("\t", ""));
 outStream.write(separator);
 // comment column is empty
 outStream.write(terminator);


Thanks
-Matt
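
A minimal standalone sketch of the replacement idea in the patch above,
assuming the goal is to print the two-character escapes \n and \t in place of
the raw control characters. The class and method names (DelimiterEscaper,
escapeControlChars) are hypothetical and are not part of DDLTask or of the
attached patch; the placeholder strings the patch itself used are not visible
in this message, so the escapes below are an assumption.

```java
final class DelimiterEscaper {
    private DelimiterEscaper() {}

    /**
     * Replaces every raw line feed and tab with the visible two-character
     * sequences \n and \t so the text stays on one output line.
     * String.replace performs literal (non-regex) replacement.
     */
    static String escapeControlChars(String s) {
        return s.replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) {
        // Shaped like the serde parameters in the describe extended output.
        String raw = "parameters:{serialization.format=9,line.delim=\n,field.delim=\t}";
        System.out.println(escapeControlChars(raw));
    }
}
```

With this approach a client reading the output line by line would get the
whole table description in a single fetch, and the delimiters stay readable
without an ASCII table.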


Re: [jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2009-08-23 Thread Matt Pestritto
Hi All.  I see a lot of good work being done on HBase/Hive integration,
especially around how to express HBase metadata in Hive and how to load data
between HBase and Hive.

Has any thought been put into how to use HBase data as lookup data in a
query, rather than loading all of the data as in a normal Hive query ?

My use case is as follows:  I have a table < users > with 50m users.  I have
a 5gb daily clickstream file that only touches 150k of those users on a daily
basis.  It would be much more efficient if I didn't have to load all of the
data in HBase into a Hive table and write a traditional Hive query, but could
instead do 150k lookups in the map ( or reduce ) phase of the MR job.  If the
HBase lookups were done in realtime it would be much faster than sourcing the
original user table with 50m rows.

Thoughts ?

Thanks
-Matt
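
The lookup idea above can be sketched roughly as follows. This is only an
illustration under stated assumptions: an in-memory Map stands in for the
HBase users table, no HBase or Hive API is used, click records are assumed to
be "userId<TAB>url" lines, and the names LookupJoinSketch and enrich are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class LookupJoinSketch {
    private LookupJoinSketch() {}

    /**
     * Appends the user record to each click, calling `lookup` at most once
     * per distinct user id that resolves to a record. `lookup` stands in for
     * a random-access get against the user store. Clicks whose user is
     * unknown, or that contain no tab, are dropped (inner-join behaviour).
     */
    static List<String> enrich(List<String> clicks, Function<String, String> lookup) {
        Map<String, String> cache = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String click : clicks) {
            int tab = click.indexOf('\t');
            if (tab < 0) {
                continue; // malformed record
            }
            String userId = click.substring(0, tab);
            // computeIfAbsent stores nothing when the lookup returns null.
            String user = cache.computeIfAbsent(userId, lookup);
            if (user != null) {
                out.add(click + "\t" + user);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>(); // stand-in for the large users table
        users.put("u1", "alice");
        users.put("u2", "bob");
        users.put("u3", "carol");
        List<String> clicks = Arrays.asList("u1\t/home", "u3\t/cart", "u1\t/checkout");
        for (String row : enrich(clicks, users::get)) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is that the number of lookups is bounded by the
number of distinct users in the clickstream, not by the size of the users
table.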


On Sun, Aug 23, 2009 at 8:20 AM, Samuel Guo (JIRA)  wrote:

>
>[
> https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746592#action_12746592]
>
> Samuel Guo commented on HIVE-705:
> -
>
> Attach a new patch.
>
> 1) move the related hbase code to the contrib package, as hbase just an
> optional storage for hive, not neccessary.
> I have tried to avoid modifying the hive original code and just add a hbase
> serde to connect hive with hbase. But the hbase storage model is quite
> different with file storage model. For example, a loadwork is used to
> rename/copy files from temp dir to the target table's dir if a query's
> target is a hive table. But in a hbased hive table, we can't rename a table
> now. So it's hard to let a hbased hive table to follow the logic of a normal
> file-based hive table.  So I add some code(HiveFormatUtils) to distinguish a
> file-based table from a not-file-based table.
>
> 2) fix some bugs in the draft patch, such as "select *" return nothing.
>
>
> --
>
> How to use the hbase as hive's storage?
>
> 1) remember to add the contrib jar and the hbase jar in the hive's auxPath,
> so m/r can populate the neccessary hbase-related jars to the whole hadoop
> m/r cluster.
>
> > $HIVE_HOME/bin/hive -auxPath ${contrib_jar},${hbase_jar}
>
> 2) modify the configuration to add the following configuration parameters.
>
> "hbase.master" : pointer to the hbase's master.
> "hive.othermetadata.handlers" :
> "org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler"
>
> "hive.othermetadata.handlers" collects the metadata handlers to handle the
> other metadata operations in the not-file-based hive tables. Take hbase as
> an example. HBaseMetadataHandler will create the neccessary hbase table and
> its family columns when we create a hbased hive table from hive's client. It
> also drop the hbase table when we drop the hive table.
>
> The metastore read the registered handlers map from the configuration file
> during initialization. The registered handlers map is formated as
> "table_format_classname:table_metadata_handler_classname,table_format_classname:table_metadata_handler_classname,...".
>
> 3) enjoy "hive over hbase"!
>
> 
>
> Other problems.
>
> 1) Altering a hased-hive table is not supported now. :(
> renaming a table in hbase is not supported now, so I just do not support
> rename operation. ( maybe if we rename a hive table, we do not need to
> rename the base hbase table.)
>
> adding/replacing cloumns.
> Now we need to specify the schema mapping in the SerDe properties
> explicitly. If we want to adding columns, we need to call 'alter' twice to
> adding columns: change the serde properties and the hive columns.  Either
> change the serde properties first or change the hive columns first will fail
> now, because we validate the schema mapping during SerDe initialization. One
> of the hbase serde validation is to check the counts of hive columns and
> hbase mapping columns. If we first change the hive columns, the number of
> hive columns will be more than hbase mapping columns, the HBase Serde
> initialization will fail this alter operation.  (maybe we need to remove the
> validation code from HBaseSerDe initialization and do it in other place?)
>
> 2) more flexible schema mapping?
> As Schubert metioned before, more flexible schema mapping will be useful
> for user. This feature will be added later.
>
>
> welcome for comments~
>
>
>
>
> > Let Hive can analyse hbase's tables
> > ---
> >
> > Key: HIVE-705
> > URL: https://issues.apache.org/jira/browse/HIVE-705
> > Project: Hadoop Hive
> >  Issue Type: New Feature
> >Reporter: Samuel Guo
> > Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar,
> HIVE-705_draft.patch, HIVE-705_revision806905.pa

Re: Problem with Thrift Server Concurrency

2009-07-28 Thread Matt Pestritto
Ok. Thanks for the update.  I'll watch that issue.

On Tue, Jul 28, 2009 at 1:57 PM, Prasad Chakka  wrote:

> This is a known issue in Hive Server. This is because the same metastore
> client is being used to issue both queries and JDBC does not like that. We
> should use thread specific or session specific metastore clients but I don't
> think Hive Server is doing that right now. HIVE-584 is supposed to fix this
> issue.
>
> ____
> From: Matt Pestritto 
> Reply-To: 
> Date: Tue, 28 Jul 2009 10:48:24 -0700
> To: 
> Subject: Problem with Thrift Server Concurrency
>
> Hi all
>
> Does the Thrift server support concurrency ?  I'm having a problem that only
> happens if I fire off multiple ( 2+ ) DML queries at the same time.
> Randomly, one of the queries will succeed but the other will fail with the
> following error I pulled from the hiveserver output:
>
> java.io.IOException: cannot find dir = hdfs://mustique:9000/user/hadoop/mantis-output/mantis-job/20090601 in partToPartitionInfo!
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getTableDescFromPath(HiveInputFormat.java:311)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.validateInput(HiveInputFormat.java:288)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:735)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:388)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:357)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:263)
>     at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:108)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:302)
>     at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:290)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
> If I execute the queries via thrift a few seconds apart from each other, it
> succeeds.  It only seems to fail if the queries start at about the same
> time.
>
> When I run the same two queries using *hive -e "query 1" & hive -e "query 2"*
> it also works fine.
>
> Any ideas ?
>
> Thanks
> -Matt
>
>
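
The "thread specific" client idea mentioned in the reply above can be
sketched with a plain ThreadLocal. This is an illustration only: Client is a
hypothetical stand-in for a connection-holding object that is unsafe to
share, not the Hive metastore client, and nothing here reflects how HIVE-584
was actually implemented.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadClientSketch {
    private PerThreadClientSketch() {}

    /** Hypothetical stand-in for a client that must not be shared across threads. */
    static final class Client {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        final int id = NEXT_ID.incrementAndGet();
    }

    // Each thread lazily creates its own Client on first access.
    private static final ThreadLocal<Client> CLIENT = ThreadLocal.withInitial(Client::new);

    /** Returns the calling thread's own client instance. */
    static Client client() {
        return CLIENT.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable query = () -> System.out.println(
                Thread.currentThread().getName() + " uses client " + client().id);
        Thread a = new Thread(query, "query-1");
        Thread b = new Thread(query, "query-2");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Two queries started at the same moment on different worker threads then
never touch the same client object, which is the property the shared-client
setup lacks.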


Problem with Thrift Server Concurrency

2009-07-28 Thread Matt Pestritto
Hi all

Does the Thrift server support concurrency ?  I'm having a problem that only
happens if I fire off multiple ( 2+ ) DML queries at the same time.
Randomly, one of the queries will succeed but the other will fail with the
following error I pulled from the hiveserver output:

java.io.IOException: cannot find dir = hdfs://mustique:9000/user/hadoop/mantis-output/mantis-job/20090601 in partToPartitionInfo!
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getTableDescFromPath(HiveInputFormat.java:311)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.validateInput(HiveInputFormat.java:288)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:735)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:388)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:357)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:263)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:108)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:302)
    at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:290)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

If I execute the queries via thrift a few seconds apart from each other, it
succeeds.  It only seems to fail if the queries start at about the same
time.

When I run the same two queries using *hive -e "query 1" & hive -e "query 2"*
it also works fine.

Any ideas ?

Thanks
-Matt


Re: Error on Load into multiple Partitions

2009-07-16 Thread Matt Pestritto
Namit.

I just updated to revision 794686 and that worked.  It looks like Zheng
committed this patch in the afternoon and this failed for me earlier that
morning.  Bad luck on my timing but I'm happy it works now.

Thanks.
-Matt


On Thu, Jul 16, 2009 at 10:09 AM, Namit Jain  wrote:

> Most probably, this is the same as
>
> https://issues.apache.org/jira/browse/HIVE-636
>
> which was merged just days back. Can you try on the latest trunk ?
>
>
>
>
> On 7/16/09 6:45 AM, "Matt Pestritto"  wrote:
>
> Does anyone have any idea as to the reason for this error ?
>
> Thanks in Advance
> -Matt
>
> -- Forwarded message --
> From: Matt Pestritto 
> Date: Wed, Jul 15, 2009 at 10:09 AM
> Subject: Error on Load into multiple Partitions
> To: hive-dev@hadoop.apache.org
>
>
> Hi All.
>
> Are there any existing test cases that load into multiple partitions using a
> single from query?  This query worked in an older revision but the mappers
> fail when I run on trunk:
>
> java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264)
>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103)
>
>
> Here is a simplified version of what I'm running and DDL to support:
> *create table test_m ( client int, description string )
>  row format delimited fields terminated by '\011' lines terminated by
> '\012' stored as textfile;
> *
> *create table test_m_p ( description string )
>  partitioned by ( client int ) row format delimited fields terminated by
> '\011' lines terminated by '\012' stored as textfile;
> *
> *LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m  ; *
>
> *FROM test_m
> INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description
> where client=1
> INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description
> where client=2 ;
> *
> --- contents of /tmp/m.lst
> 1	test
> 1	test2
> 1	test3
> 2	hi
> 2	hi1
> 2	hi3
>
> Thanks!
> -Matt
>
>


Fwd: Error on Load into multiple Partitions

2009-07-16 Thread Matt Pestritto
Does anyone have any idea as to the reason for this error ?

Thanks in Advance
-Matt

-- Forwarded message --
From: Matt Pestritto 
Date: Wed, Jul 15, 2009 at 10:09 AM
Subject: Error on Load into multiple Partitions
To: hive-dev@hadoop.apache.org


Hi All.

Are there any existing test cases that load into multiple partitions using a
single from query?  This query worked in an older revision but the mappers
fail when I run on trunk:

java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103)


Here is a simplified version of what I'm running and DDL to support:
create table test_m ( client int, description string )
  row format delimited fields terminated by '\011' lines terminated by
'\012' stored as textfile;

create table test_m_p ( description string )
  partitioned by ( client int ) row format delimited fields terminated by
'\011' lines terminated by '\012' stored as textfile;

LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ;

FROM test_m
INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description
where client=1
INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description
where client=2 ;

--- contents of /tmp/m.lst
1	test
1	test2
1	test3
2	hi
2	hi1
2	hi3

Thanks!
-Matt


Error on Load into multiple Partitions

2009-07-15 Thread Matt Pestritto
Hi All.

Are there any existing test cases that load into multiple partitions using a
single FROM query?  This query worked in an older revision, but the mappers
fail when I run on trunk:

java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103)


Here is a simplified version of what I'm running and DDL to support:
create table test_m ( client int, description string )
  row format delimited fields terminated by '\011' lines terminated by
'\012' stored as textfile;

create table test_m_p ( description string )
  partitioned by ( client int ) row format delimited fields terminated by
'\011' lines terminated by '\012' stored as textfile;

LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ;

FROM test_m
INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description
where client=1
INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description
where client=2 ;

--- contents of /tmp/m.lst
1	test
1	test2
1	test3
2	hi
2	hi1
2	hi3

Thanks!
-Matt


Re: Stdout -> Stderr

2009-06-02 Thread Matt Pestritto
OK, thanks for the info.  I have a Java wrapper around hive -e that threw an
exception whenever the error stream had any data.  I changed the wrapper to
check for a nonzero exit status returned from the process instead of looking
at the error stream, and that works fine now.
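
For anyone hitting the same thing, here is a minimal sketch of that kind of wrapper. It is not the actual wrapper code: the class name CliRunner, the Result type, and the sh command used as a stand-in for hive -e are all made up for illustration. It only shows the idea of treating stderr as log output and deciding success from the exit status.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical wrapper: judges a CLI run by its exit status, not by stderr content. */
class CliRunner {

    /** Outcome of one run; stderr is kept for logging only. */
    static final class Result {
        final int exitCode;
        final String stdout;
        final String stderr;

        Result(int exitCode, String stdout, String stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }

        boolean succeeded() {
            return exitCode == 0;
        }
    }

    static Result run(List<String> command) throws IOException, InterruptedException {
        // Send stderr to a temp file so a chatty stderr cannot fill a pipe and stall the child.
        Path errFile = Files.createTempFile("cli-stderr", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectError(errFile.toFile())
                    .start();
            process.getOutputStream().close(); // the child gets no stdin
            String out = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            String err = new String(Files.readAllBytes(errFile), StandardCharsets.UTF_8);
            return new Result(exitCode, out, err);
        } finally {
            Files.deleteIfExists(errFile);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for: hive -e "query 1; query 2; query 3;"
        // It writes an info line to stderr and the data to stdout, then exits 0.
        Result r = run(List.of("sh", "-c", "echo 'Total MapReduce jobs = 1' 1>&2; echo 6"));
        if (!r.succeeded()) {
            throw new IllegalStateException("exit status " + r.exitCode + ": " + r.stderr);
        }
        System.out.print(r.stdout);
    }
}
```

With this shape, info messages on stderr no longer fail the run; only a nonzero exit status does.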

Thanks.


On Mon, Jun 1, 2009 at 4:33 PM, Raghu Murthy  wrote:

> Only info messages are being written to stderr. The actual data should be
> written to stdout. See https://issues.apache.org/jira/browse/HIVE-505
>
>
> On 6/1/09 1:29 PM, "Matt Pestritto"  wrote:
>
> > Hi All.
> >
> > It seems like the latest trunk is writing stdout to stderr on a hive -e
> > call.  Is this the intended functionality ?
> >
> > hive -e "query 1; query 2; query 3; " 2> errors.out
> >
> > errors.out has stdout.  stdout has no output.
> >
> > Thanks
> > -Matt
>
>


Stdout -> Stderr

2009-06-01 Thread Matt Pestritto
Hi All.

It seems like the latest trunk is writing stdout to stderr on a hive -e
call.  Is this the intended functionality ?

hive -e "query 1; query 2; query 3; " 2> errors.out

errors.out has stdout.  stdout has no output.

Thanks
-Matt


Re: Trunk runtime errors

2009-05-13 Thread Matt Pestritto
Prasad.

My query is pretty complex so I created a simple test case for you.  I first
tried on a table with only 1 partition and that succeeded.  I then tried
with two partitions and that did not copy the data.  So it seems like it is
only for tables with more than 1 partition.

I ran this in the CLI.

drop table hive_test_src;
create table hive_test_src ( col1 string ) stored as textfile ;
load data local inpath '/home/mpestritto/hive_test/data.dat' overwrite into
table hive_test_src ;

drop table hive_test_dst;
create table hive_test_dst ( col1 string ) partitioned by ( pcol1 string ,
pcol2 string) stored as sequencefile;

insert overwrite table hive_test_dst partition ( pcol1='test_part' ,
pcol2='test_part') select col1 from hive_test_src ;
select count(1) from hive_test_dst where pcol1='test_part' and
pcol2='test_part';

mpestri...@mustique:~/hive_test$ cat data.dat
1
2
3
4
5
6


CLI - OUTPUT:

hive> drop table hive_test_src;
OK
Time taken: 0.188 seconds
hive> create table hive_test_src ( col1 string ) stored as textfile ;
OK
Time taken: 0.098 seconds
hive> load data local inpath '/home/mpestritto/hive_test/data.dat' overwrite
into table hive_test_src ;
Copying data from file:/home/mpestritto/hive_test/data.dat
Loading data to table hive_test_src
OK
Time taken: 0.36 seconds
hive>
> drop table hive_test_dst;
OK
Time taken: 0.124 seconds
hive> create table hive_test_dst ( col1 string ) partitioned by ( pcol1
string , pcol2 string) stored as sequencefile;
OK
Time taken: 0.084 seconds
hive>
> insert overwrite table hive_test_dst partition ( pcol1='test_part' ,
pcol2='test_part') select col1 from hive_test_src ;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_200905111618_0098, Tracking URL =
http://mustique.ps.tld:50030/jobdetails.jsp?jobid=job_200905111618_0098
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job
-Dmapred.job.tracker=mustique.ps.tld:9001 -kill job_200905111618_0098
 map = 0%,  reduce =0%
 map = 100%,  reduce =100%
Ended Job = job_200905111618_0098
Loading data to table hive_test_dst partition {pcol1=test_part,
pcol2=test_part}
6 Rows loaded to hive_test_dst
OK
Time taken: 5.687 seconds
hive> select count(1) from hive_test_dst where pcol1='test_part' and
pcol2='test_part';
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Job need not be submitted: no output: Success
OK
Time taken: 0.41 seconds
hive>



On Wed, May 13, 2009 at 12:15 PM, Prasad Chakka wrote:

> Matt,
>
> Can you send me the query for the first problem? Also whether the directory
> for the partition exists before the query is issued?
>
> Thanks,
> Prasad
>
>
> 
> From: Matt Pestritto 
> Reply-To: 
> Date: Wed, 13 May 2009 09:04:38 -0700
> To: 
> Subject: Trunk runtime errors
>
> All -
>
> 1st problem.
> I was having a problem loading data into partitions when the partition did
> not exist and traced the problem to revision 772746.  Trunk also has the
> same error.
> Revision 772746 SVN comment:  HIVE-442. Create partitions after data is
> moved in the query in order to close out an inconsistent window. (Prasad
> Chakka via athusoo)
> Revision 772012 works fine for me.
>
> Essentially the partition directories are created but the data is never
> copied over.  If I run the same job again, the data is copied to the target
> directory in HDFS.
>
> 2nd problem.
> When I try to do a select count(1) from a table I get the following
> exception and I'm not sure what the cause is.  Again, this works fine if I
> roll back to revision 772012.
> Job Submission failed with exception
> 'java.lang.IllegalArgumentException(Wrong FS: file:/tmp/hive-hive/1,
> expected: hdfs://mustique.ps.tld:9000)'
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
> Let me know if I can facilitate further.
>
> Thanks
> -Matt
>
>


Trunk runtime errors

2009-05-13 Thread Matt Pestritto
All -

1st problem.
I was having a problem loading data into partitions when the partition did
not exist and traced the problem to revision 772746.  Trunk also has the
same error.
Revision 772746 SVN comment:  HIVE-442. Create partitions after data is
moved in the query in order to close out an inconsistent window. (Prasad
Chakka via athusoo)
Revision 772012 works fine for me.

Essentially the partition directories are created but the data is never
copied over.  If I run the same job again, the data is copied to the target
directory in HDFS.

2nd problem.
When I try to do a select count(1) from a table I get the following
exception and I'm not sure what the cause is.  Again, this works fine if I
roll back to revision 772012.
Job Submission failed with exception
'java.lang.IllegalArgumentException(Wrong FS: file:/tmp/hive-hive/1,
expected: hdfs://mustique.ps.tld:9000)'
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.ExecDriver

Let me know if I can facilitate further.

Thanks
-Matt


trunk not working.

2009-05-12 Thread Matt Pestritto
Hi all.

The latest trunk is not working for me.  When I roll back to revision
773043 it works fine.  Any ideas?
Let me know if you need anything else.

Thanks in advance.

Error is below:

hive> select count(1) from tbl1;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Job Submission failed with exception
'java.lang.IllegalArgumentException(Wrong FS: file:/tmp/hive-mpestritto/1,
expected: hdfs://mustique.ps.tld:9000)'
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.ExecDriver
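
To show what the message itself is complaining about, here is a small sketch. It is a simplified imitation written for this post, not Hadoop's actual code: the class WrongFsCheck and its checkPath method are hypothetical, and the real check may compare more than the scheme. The point is only that a file: path is being handed to a filesystem whose URI is hdfs:.

```java
import java.net.URI;

/** Hypothetical, simplified imitation of a filesystem's path sanity check. */
class WrongFsCheck {

    /** Throws if the path names a scheme that differs from the filesystem's scheme. */
    static void checkPath(URI fsUri, URI path) {
        String pathScheme = path.getScheme();
        if (pathScheme == null) {
            return; // scheme-less paths are assumed to belong to this filesystem
        }
        if (!pathScheme.equalsIgnoreCase(fsUri.getScheme())) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://mustique.ps.tld:9000");

        // A path on the same filesystem passes the check.
        checkPath(fs, URI.create("hdfs://mustique.ps.tld:9000/tmp/hive-mpestritto/1"));

        // A local scratch path handed to the HDFS filesystem is rejected.
        try {
            checkPath(fs, URI.create("file:/tmp/hive-mpestritto/1"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This does not say where the file: path comes from in the failing revision, only why the two URIs in the message cannot be reconciled.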


[jira] Commented: (HIVE-354) [hive] udf needed for getting length of a string

2009-04-20 Thread Matt Pestritto (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700970#action_12700970
 ] 

Matt Pestritto commented on HIVE-354:
-

Thanks for the patch.  Worked for me.

> [hive] udf needed for getting length of a string
> 
>
> Key: HIVE-354
> URL: https://issues.apache.org/jira/browse/HIVE-354
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Neil Conway
> Attachments: JIRA_hive-354.patch.1
>
>





[jira] Commented: (HIVE-383) New String Function: length

2009-04-20 Thread Matt Pestritto (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700937#action_12700937
 ] 

Matt Pestritto commented on HIVE-383:
-

Ah. Thanks Neil.  I didn't notice the dup.

> New String Function: length
> ---
>
> Key: HIVE-383
> URL: https://issues.apache.org/jira/browse/HIVE-383
> Project: Hadoop Hive
>  Issue Type: New Feature
>    Reporter: Matt Pestritto
>
> Request for an additional string function: length (  ) 
> returns an integer of the length of the string passed in




[jira] Commented: (HIVE-383) New String Function: length

2009-04-19 Thread Matt Pestritto (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700594#action_12700594
 ] 

Matt Pestritto commented on HIVE-383:
-

Any ETA on this request ?  It seems like a straightforward request.
functional example:  select max( length( col1 ) ) from table_1; 

Thanks.

> New String Function: length
> ---
>
> Key: HIVE-383
> URL: https://issues.apache.org/jira/browse/HIVE-383
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Matt Pestritto
>
> Request for an additional string function: length (  ) 
> returns an integer of the length of the string passed in




[jira] Created: (HIVE-383) New String Function: length

2009-04-02 Thread Matt Pestritto (JIRA)
New String Function: length
---

 Key: HIVE-383
 URL: https://issues.apache.org/jira/browse/HIVE-383
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Matt Pestritto


Request for an additional string function: length (  ) 
returns an integer of the length of the string passed in
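
A rough sketch of the requested behavior, in plain Java so it runs without Hive on the classpath. This is not the HIVE-354 patch: the class name LengthFunction is made up, and returning null for null input and counting Unicode code points rather than bytes are both assumptions about how such a function would behave.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for a string-length function; not the actual Hive UDF. */
class LengthFunction {

    /** Returns the number of Unicode code points in s, or null when s is null (assumed). */
    static Integer evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Rough equivalent of: select max( length( col1 ) ) from table_1;
        List<String> col1 = Arrays.asList("test", "test2", null, "hi");
        int max = col1.stream()
                .map(LengthFunction::evaluate)
                .filter(Objects::nonNull) // aggregates are assumed to skip nulls
                .mapToInt(Integer::intValue)
                .max()
                .orElse(0);
        System.out.println(max);
    }
}
```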
