Re: Running Hive on multi node

2013-02-21 Thread bejoy_ks
Hi

Hive uses the Hadoop installation specified in HADOOP_HOME. If your Hadoop home
is configured for fully distributed operation, Hive will utilize the cluster itself.
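For context, "fully distributed" here means the Hadoop configuration under $HADOOP_HOME/conf points at real cluster daemons rather than local mode. A minimal sketch, with hypothetical host names (Hadoop 0.20/1.x property names):

```xml
<!-- core-site.xml: use HDFS instead of the local filesystem -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:8020</value>
</property>

<!-- mapred-site.xml: use a real JobTracker instead of "local" -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker:8021</value>
</property>
```

If these point at cluster daemons, Hive queries compile into MapReduce jobs that run across all nodes with no extra Hive-side configuration.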

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Hamza Asad hamza.asa...@gmail.com
Date: Thu, 21 Feb 2013 14:26:40 
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Running Hive on multi node

Does Hive automatically run on multiple nodes since I configured Hadoop on
multiple nodes, or do I have to configure it explicitly?

-- 
*Muhammad Hamza Asad*



RE: Adding comment to a table for columns

2013-02-21 Thread Bhaskar, Snehalata
Try using the 'DESCRIBE FORMATTED' command, i.e. DESCRIBE FORMATTED test

Thanks and regards,
Snehalata Deorukhkar

From: Chunky Gupta [mailto:chunky.gu...@vizury.com]
Sent: Thursday, February 21, 2013 4:47 PM
To: user@hive.apache.org
Subject: Adding comment to a table for columns


Hi,

I am using this syntax to add comments for all columns:

CREATE EXTERNAL TABLE test ( c STRING COMMENT 'Common  class', time STRING 
COMMENT 'Common  time', url STRING COMMENT 'Site URL' ) PARTITIONED BY (dt 
STRING ) LOCATION 's3://BucketName/'

The output of DESCRIBE EXTENDED is like the following (the output is just an
example copied from the internet):

hive> DESCRIBE EXTENDED table_name;

Detailed Table Information Table(tableName:table_name, dbName:benchmarking, 
owner:root, createTime:1309480053, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:session_key, type:string, 
comment:null), FieldSchema(name:remote_address, type:string, comment:null), 
FieldSchema(name:canister_lssn, type:string, comment:null), 
FieldSchema(name:canister_session_id, type:bigint, comment:null), 
FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, 
type:string, comment:null), FieldSchema(name:tltvid, type:string, 
comment:null), FieldSchema(name:canister_server, type:string, comment:null), 
FieldSchema(name:session_timestamp, type:string, comment:null), 
FieldSchema(name:session_duration, type:string, comment:null), 
FieldSchema(name:hit_count, type:bigint, comment:null), 
FieldSchema(name:http_user_agent, type:string, comment:null), 
FieldSchema(name:extractid, type:bigint, comment:null), 
FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, 
type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], 
location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, 
inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe)

Is there any way of getting these detailed comments and column names in a
readable format, just like the output of DESCRIBE table_name?



Thanks,

Chunky.



Re: Adding comment to a table for columns

2013-02-21 Thread bejoy_ks
Hi Gupta

Try out

DESCRIBE EXTENDED FORMATTED table-name

I vaguely recall an operation like this.
Please check hive wiki for the exact syntax.
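For reference, a sketch of the commands under discussion, using the table from the original question (exact DESCRIBE variants differ across Hive versions, so treat this as illustrative rather than authoritative):

```sql
-- create the table with column comments (from the original question)
CREATE EXTERNAL TABLE test (
  c    STRING COMMENT 'Common class',
  time STRING COMMENT 'Common time',
  url  STRING COMMENT 'Site URL'
)
PARTITIONED BY (dt STRING)
LOCATION 's3://BucketName/';

-- plain DESCRIBE prints col_name / data_type / comment columns
DESCRIBE test;

-- FORMATTED adds storage and table-level detail in a readable layout
DESCRIBE FORMATTED test;
```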

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Chunky Gupta chunky.gu...@vizury.com
Date: Thu, 21 Feb 2013 17:15:37 
To: user@hive.apache.org; bejoy...@yahoo.com; 
snehalata_bhas...@syntelinc.com
Reply-To: user@hive.apache.org
Subject: Re: Adding comment to a table for columns

Hi Bejoy, Bhaskar

I tried using FORMATTED, but it does not give me the comments which I put in
while creating the table. Its output looks like:

col_name    data_type    comment
c           string       from deserializer
time        string       from deserializer

Thanks,
Chunky.

On Thu, Feb 21, 2013 at 4:50 PM, bejoy...@yahoo.com wrote:

 **
 Hi Gupta

 You can get the describe output in a formatted way using

 DESCRIBE FORMATTED table name;
 Regards
 Bejoy KS

 Sent from remote device, Please excuse typos


Re: Using HiveJDBC interface

2013-02-21 Thread Aditya Rao
Thanks for the tips. I would think #2 works well when you are setting
hiveconf variables that are isolated to your query. I have instances in my
scripts where I need to set hadoop properties before executing a query. For
example setting the number of reducers using

set mapred.reduce.tasks=50

Without the concept of a session in HiveServer, won't setting Hadoop
configurations like the one above affect all queries that are being
submitted concurrently?

Also, how do you tackle conflicts with tables stored in the meta store?

Aditya



On Mon, Feb 18, 2013 at 8:09 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I personally do not find it a large problem.

 1) have multiple backend Hive Thrift servers with ha-proxy in front
 2) don't use variable names like x; use myprocess1.x to remove
 possible collisions
 3) experiment with hivethrift2
 4) don't use zk locking + thrift (it leaks as far as I can tell (older
 versions))

 Really #2 solves the problem mentioned on the wiki page. There are
 other subtle issues, but all in all it works pretty well.
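Tip #2 in practice: namespace substitution variables per process, so concurrent clients sharing one HiveServer don't trample each other's settings. A sketch (`myprocess1` and the `logs` table are illustrative, not from the original thread):

```sql
-- risky: another client may also set a bare variable named x
SET x=2013-02-16;
SELECT * FROM logs WHERE day='${hiveconf:x}';

-- safer: prefix the variable with a per-process namespace
SET myprocess1.day=2013-02-16;
SELECT * FROM logs WHERE day='${hiveconf:myprocess1.day}';
```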

 Edward

 On Mon, Feb 18, 2013 at 9:15 AM, Aditya Rao adityac...@gmail.com wrote:
  Hi,
 
  I've just recently started using Hive and I'm particularly interested
 about
  the capabilities of the HiveJDBC interface. I'm writing a simple
  application that aims to use the Hive JDBC driver to submit hive
 queries. My
  end goal is to be able to create multiple connections using the Hive JDBC
  driver and submit queries concurrently.
 
  I came across a few issues in the mailing list and in JIRA related to
  issuing concurrent requests to the hive server (explained here
  https://cwiki.apache.org/Hive/hiveserver2-thrift-api.html) . I would
 like to
  know if anyone has suggestions/guidelines regarding best practices to
 work
  around this problem? Apart from restricting to a single query at a time,
 are
  there any other known pitfalls that one should keep an eye out for when
 using
   the HiveJDBC interface?
 
  Thanks,
 
  Aditya



Re: Hive 0.7.1 Query hangs

2013-02-21 Thread Jarek Jarcec Cecho
Hi sir,
the root cause of your issue seems to be the java.io.EOFException, which based
on the javadoc description means the following:

  Signals that an end of file or end of stream has been reached unexpectedly
during input.

What is the health status of the box with IP 10.6.0.55? Isn't it by any chance
having some issues? Port 8021 is used by the JobTracker, so I would start by
connecting to that box and checking the JobTracker logs.

Jarcec

On Thu, Feb 21, 2013 at 02:44:09PM +0300, Павел Мезенцев wrote:
 Hello!
 
 I use Hive 0.7.1 over Hadoop 0.20.2 (CHD3u3) on 70 nodes cluster.
 I have a trouble with query like this:
 
 FROM (
   SELECT id, {expressions} FROM table1
   WHERE day='2013-02-16' AND ({conditions1})
   UNION ALL
   SELECT id, {expressions} FROM table2
   WHERE day='2013-02-16' AND (conditions)
   UNION ALL
   SELECT id, {expressions} FROM table3
   WHERE day='2013-02-16' AND (conditions)
   UNION ALL
   SELECT id, {expressions} FROM table4
   WHERE day='2013-02-16' AND (conditions)
 ) union_tmp
 INSERT OVERWRITE TABLE result_table PARTITION (day='2013-02-16')
 SELECT id, transformations(expressions)
 GROUP BY id;
 
 It had 4865 map tasks and 100 reduce tasks.
 The first 4780 map tasks completed successfully and the last
 85 tasks hung.

 All of these tasks hung with no progress,
 and there were no task attempts for any of them.

 One hour after the hang started, the job failed with an exception:
 
 2013-02-20 20:02:02,000 Stage-1 map = 0%,  reduce = 0%
 2013-02-20 20:02:40,679 Stage-1 map = 1%,  reduce = 0%
 2013-02-20 20:02:54,022 Stage-1 map = 2%,  reduce = 0%
 2013-02-20 20:03:14,129 Stage-1 map = 3%,  reduce = 0%
 ..
 2013-02-20 21:18:00,361 Stage-1 map = 98%,  reduce = 22%
 2013-02-20 21:18:05,691 Stage-1 map = 98%,  reduce = 23%
 java.io.IOException: Call to statlabjt/10.6.0.55:8021 failed on local
 exception: java.io.EOFException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
   at org.apache.hadoop.ipc.Client.call(Client.java:1110)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
   at org.apache.hadoop.mapred.$Proxy8.getJobStatus(Unknown Source)
   at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053)
   at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065)
   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351)
   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:672)
   at 
 org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:425)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:815)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:724)
 Ended Job = job_201302152355_4764 with exception
 'java.io.IOException(Call to statlabjt/10.6.0.55:8021 failed on local
 exception: java.io.EOFException)'
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.MapRedTask
 
 How can I find the reasons for such a situation?
 How can I prevent such situations in the future?
 
 Best regards
 Mezentsev Pavel




Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Owen O'Malley
What are the semantics for ROW_NUMBER? Is it a global row number? Per a
partition? Per a bucket?

-- Owen


On Wed, Feb 20, 2013 at 11:33 PM, kumar mr kumar...@aol.com wrote:

 Hi,

  This is Kumar, and this is my first question in this group.

  I have a requirement to implement Teradata's ROW_NUMBER() in Hive,
 where partitioning happens on multiple columns along with ordering on
 multiple columns.
 It can be easily implemented in Hadoop MR, but I have to do it in Hive.
 A UDF can assign the same rank to a grouping key, considering the dataset is
 small, but the ordering needs to be done in a prior step.
 Can we do this in a much simpler way?

  Thanks in advance.

  Regards,
 Kumar



Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Ashutosh Chauhan
Kumar,

If you are willing to be on the bleeding edge, this and much other partitioning
and windowing functionality is being developed by some of us in a branch over at:
https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
Check out this branch, build Hive, and then you will have row_number()
functionality. Look in
ql/src/test/queries/clientpositive/ptf_general_queries.q, which has about 60
or so example queries demonstrating various capabilities that we already have
working (including row_number).
We hope to have this branch merged into trunk soon.

Hope it helps,
Ashutosh
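For a concrete picture, row_number() on that branch should follow standard SQL window syntax. A hedged sketch (the column and table names come from Kumar's requirement, and the exact syntax is not verified against the branch):

```sql
SELECT
  columnA,
  columnB,
  columnC,
  ROW_NUMBER() OVER (PARTITION BY columnA, columnB, columnC
                     ORDER BY columnX DESC, columnY DESC) AS row_num
FROM some_table;
```

Each (columnA, columnB, columnC) group gets its own 1-based numbering, ordered by columnX and columnY descending.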


Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread kumar mr

Owen,


It's for the entire table. The sample TD query looks like the one below:

SELECT
    columnA
    , columnB
    , columnC
    , columnD
    , columnX
    , ROW_NUMBER() OVER (PARTITION BY columnA, columnB, columnC ORDER BY
      columnX DESC, columnY DESC) AS rank
FROM table a


Regards,
Kumar





-Original Message-
From: Owen O'Malley omal...@apache.org
To: user user@hive.apache.org
Sent: Thu, Feb 21, 2013 8:08 am
Subject: Re: ROW_NUMBER() equivalent in Hive


What are the semantics for ROW_NUMBER? Is it a global row number? Per a 
partition? Per a bucket?


-- Owen






please remove me

2013-02-21 Thread Erik Thorson
Can you please take me off the mailing list.

Erik Thorson
Varick Media Management
Lead Engineer
212.337.4796
201.694.1122

Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Stephen Boesch
Hi Ashutosh,
   I am interested in reviewing your windowing feature. Can you be more
specific about which (a) tests and (b) src files constitute your additions?
(There are lots of files there ;) )

thanks

stephen boesch


2013/2/21 Ashutosh Chauhan hashut...@apache.org

 Kumar,

 If you are willing to be on bleeding edge, this and many other
 partitioning and windowing functionality some of us are developing in a
 branch over at:
 https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
 Check out this branch, build hive and than you can have row_number()
 functionality. Look in
 ql/src/test/queries/clientpositive/ptf_general_queries.q which has about 60
 or so example queries demonstrating various capabilities which we have
 already working (including row_number).
 We hope to have this branch merged in trunk soon.

 Hope it helps,
 Ashutosh




Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Ashutosh Chauhan
Hi Stephen,

As I indicated in my previous email, check out the file
ql/src/test/queries/clientpositive/ptf_general_queries.q; it has plenty of
example queries demonstrating the functionality that is available. If you are
interested in the Hive source changes that enabled this feature, you may want
to start by looking at the patch attached to HIVE-896, which was the starting
point for this work. That JIRA also links to the other JIRAs we did/are doing
on top of that patch.

Hope it helps,
Ashutosh

On Thu, Feb 21, 2013 at 12:17 PM, Stephen Boesch java...@gmail.com wrote:

 Hi Ashutosh,
I am interested / reviewing your windowing feature.  Can you be more
 specific about which (a) tests and (b) src files constitute your additions
 (there are lots of files there ;)  )

 thanks

 stephen boesch






Re: hive 0.10.0 doc

2013-02-21 Thread Lefty Leverenz

 Can someone point me to the apache docs for hive 0.10.0?


Now you can use the Hive wiki for all documentation except javadocs:

   - https://cwiki.apache.org/confluence/display/Hive/Home

I've just added two docs that weren't originally in the wiki, and for
everything else the wiki versions are the same as the regular docs or
better.  Here are the new wikidocs:

   1.
   
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
   2. https://cwiki.apache.org/confluence/display/Hive/ReflectUDF


– Lefty Leverenz



On Mon, Jan 28, 2013 at 11:21 AM, Shreepadma Venugopalan 
shreepa...@cloudera.com wrote:

 All,

 Can someone point me to the apache docs for hive 0.10.0?

 Thanks.
 Shreepadma



Re: unbalanced transaction calls

2013-02-21 Thread Hemanth Yamijala
Hi,

We are running into the same problem as well. Is there any clue as to what
could be wrong?

Thanks
hemanth


On Wed, Feb 6, 2013 at 1:51 AM, James Warren 
james.war...@stanfordalumni.org wrote:

 As part of our daily workflow, we're running a few hundred hive
 queries that are coordinated through oozie.  Recently we're
 encountering issues where on average a job or two fails - and never
 the same query.  The observed error is:

 FAILED: Error in metadata: java.lang.RuntimeException:
 commitTransaction was called but openTransactionCalls = 0. This
 probably indicates that there are unbalanced calls to
 openTransaction/commitTransaction

 which I've seen was referred to in HIVE-1760 but patched in 0.7.

 We're running Hive 0.9 (from CDH 4.1 - I will redirect to the Cloudera
 lists if that is a more appropriate forum) - any ideas / suggestions on
 where I should start looking?

 cheers,
 -James



Reporting a dead link at GettingStarted.

2013-02-21 Thread 치민 박

Hello Hive guys.

I found a dead link in the GettingStarted document:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted

But I have no way to fix it, so I'm reporting the dead link and giving a link
which may be the correct one.

wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
=
wget 
http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-10m.zip

Thank you!