Re: Running Hive on multi node
Hi, Hive uses the Hadoop installation specified in HADOOP_HOME. If your Hadoop home is configured for fully distributed operation, Hive will use the cluster itself. Regards, Bejoy KS (Sent from remote device, please excuse typos)

-----Original Message-----
From: Hamza Asad hamza.asa...@gmail.com
Date: Thu, 21 Feb 2013 14:26:40
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Running Hive on multi node

Does Hive automatically run on multiple nodes since I configured Hadoop on multiple nodes, or do I have to configure it explicitly? -- Muhammad Hamza Asad
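One way to verify which Hadoop installation Hive picked up, sketched below, is to ask the Hive CLI for the JobTracker address its configuration points at (the property name applies to Hadoop 0.20-era setups like the ones in this thread):

```sql
-- From the Hive CLI: print the JobTracker taken from the HADOOP_HOME config.
-- "local" means jobs run in-process; a host:port value means Hive will
-- submit jobs to the cluster.
set mapred.job.tracker;
```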
RE: Adding comment to a table for columns
Try using the 'describe formatted' command, i.e.:

describe formatted test

Thanks and regards, Snehalata Deorukhkar

From: Chunky Gupta [mailto:chunky.gu...@vizury.com]
Sent: Thursday, February 21, 2013 4:47 PM
To: user@hive.apache.org
Subject: Adding comment to a table for columns

Hi, I am using this syntax to add comments for all columns:

CREATE EXTERNAL TABLE test (
  c STRING COMMENT 'Common class',
  time STRING COMMENT 'Common time',
  url STRING COMMENT 'Site URL'
)
PARTITIONED BY (dt STRING)
LOCATION 's3://BucketName/'

The output of DESCRIBE EXTENDED looks like this (the output is just an example copied from the internet):

hive> DESCRIBE EXTENDED table_name;
Detailed Table Information Table(tableName:table_name, dbName:benchmarking, owner:root, createTime:1309480053, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:session_key, type:string, comment:null), FieldSchema(name:remote_address, type:string, comment:null), FieldSchema(name:canister_lssn, type:string, comment:null), FieldSchema(name:canister_session_id, type:bigint, comment:null), FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, type:string, comment:null), FieldSchema(name:tltvid, type:string, comment:null), FieldSchema(name:canister_server, type:string, comment:null), FieldSchema(name:session_timestamp, type:string, comment:null), FieldSchema(name:session_duration, type:string, comment:null), FieldSchema(name:hit_count, type:bigint, comment:null), FieldSchema(name:http_user_agent, type:string, comment:null), FieldSchema(name:extractid, type:bigint, comment:null), FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe)

Is there any way of getting these detailed comments and column names in a readable format, just like the output of DESCRIBE table_name?

Thanks, Chunky.
Re: Adding comment to a table for columns
Hi Gupta, try out DESCRIBE EXTENDED FORMATTED table-name. I vaguely recall an operation like this; please check the Hive wiki for the exact syntax. Regards, Bejoy KS (Sent from remote device, please excuse typos)

-----Original Message-----
From: Chunky Gupta chunky.gu...@vizury.com
Date: Thu, 21 Feb 2013 17:15:37
To: user@hive.apache.org; bejoy...@yahoo.com; snehalata_bhas...@syntelinc.com
Reply-To: user@hive.apache.org
Subject: Re: Adding comment to a table for columns

Hi Bejoy, Bhaskar, I tried using FORMATTED, but it does not give me the comments which I put while creating the table. Its output looks like:

col_name    data_type    comment
c           string       from deserializer
time        string       from deserializer

Thanks, Chunky.

On Thu, Feb 21, 2013 at 4:50 PM, bejoy...@yahoo.com wrote:

Hi Gupta, you can get the describe output in a formatted way using DESCRIBE FORMATTED table name; Regards, Bejoy KS (Sent from remote device, please excuse typos)

From: Chunky Gupta chunky.gu...@vizury.com
Date: Thu, 21 Feb 2013 16:46:30 +0530
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Adding comment to a table for columns

Hi, I am using this syntax to add comments for all columns:

CREATE EXTERNAL TABLE test (
  c STRING COMMENT 'Common class',
  time STRING COMMENT 'Common time',
  url STRING COMMENT 'Site URL'
)
PARTITIONED BY (dt STRING)
LOCATION 's3://BucketName/'

The output of DESCRIBE EXTENDED looks like this (the output is just an example copied from the internet):

hive> DESCRIBE EXTENDED table_name;
Detailed Table Information Table(tableName:table_name, dbName:benchmarking, owner:root, createTime:1309480053, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:session_key, type:string, comment:null), FieldSchema(name:remote_address, type:string, comment:null), FieldSchema(name:canister_lssn, type:string, comment:null), FieldSchema(name:canister_session_id, type:bigint, comment:null), FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, type:string, comment:null), FieldSchema(name:tltvid, type:string, comment:null), FieldSchema(name:canister_server, type:string, comment:null), FieldSchema(name:session_timestamp, type:string, comment:null), FieldSchema(name:session_duration, type:string, comment:null), FieldSchema(name:hit_count, type:bigint, comment:null), FieldSchema(name:http_user_agent, type:string, comment:null), FieldSchema(name:extractid, type:bigint, comment:null), FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe)

Is there any way of getting these detailed comments and column names in a readable format, just like the output of DESCRIBE table_name?

Thanks, Chunky.
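For reference, the commands discussed in this thread, written out in one place (table and column names are taken from the original message; output shapes are omitted, since, as Chunky observed, some SerDes report "from deserializer" in the comment column instead of the declared comment):

```sql
-- Column comments are declared at table-creation time:
CREATE EXTERNAL TABLE test (
  c    STRING COMMENT 'Common class',
  time STRING COMMENT 'Common time',
  url  STRING COMMENT 'Site URL'
)
PARTITIONED BY (dt STRING)
LOCATION 's3://BucketName/';

-- Pretty-printed alternative to DESCRIBE EXTENDED:
DESCRIBE FORMATTED test;
```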
Re: Using HiveJDBC interface
Thanks for the tips. I would think #2 works well when you are setting hiveconf variables that are isolated to your query. I have instances in my scripts where I need to set Hadoop properties before executing a query, for example setting the number of reducers with:

set mapred.reduce.tasks=50

Without the concept of a session in HiveServer, won't setting Hadoop configurations like the one above affect all queries that are being submitted concurrently? Also, how do you tackle conflicts with tables stored in the metastore? Aditya

On Mon, Feb 18, 2013 at 8:09 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

I personally do not find it a large problem.
1) Have multiple backend Hive Thrift servers with ha-proxy in front.
2) Don't use variable names like x; use myprocess1.x to remove possible collisions.
3) Experiment with the Hive Thrift server 2.
4) Don't use ZK locking + Thrift (it leaks, as far as I can tell, in older versions).
Really, #2 solves the problem mentioned on the wiki page. There are other subtle issues, but all in all it works pretty well. Edward

On Mon, Feb 18, 2013 at 9:15 AM, Aditya Rao adityac...@gmail.com wrote: Hi, I've just recently started using Hive and I'm particularly interested in the capabilities of the Hive JDBC interface. I'm writing a simple application that aims to use the Hive JDBC driver to submit Hive queries. My end goal is to be able to create multiple connections using the Hive JDBC driver and submit queries concurrently. I came across a few issues in the mailing list and in JIRA related to issuing concurrent requests to the Hive server (explained here: https://cwiki.apache.org/Hive/hiveserver2-thrift-api.html). I would like to know if anyone has suggestions/guidelines regarding best practices to work around this problem. Apart from restricting to a single query at a time, are there any other known pitfalls one should keep an eye out for when using the Hive JDBC interface? Thanks, Aditya
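Edward's tip #2 might look like the sketch below; the variable name and its use via substitution are illustrative, not from the thread:

```sql
-- Namespace per-client settings to avoid collisions between concurrent
-- clients sharing one Hive Thrift server ("myprocess1" as in Edward's tip).
set myprocess1.reducers=50;

-- Later, reference the namespaced value where a query needs it
-- (this assumes hive.variable.substitute is enabled):
set mapred.reduce.tasks=${hiveconf:myprocess1.reducers};
```

Note that the final `set` still mutates configuration shared across a session-less HiveServer, which is exactly the concern Aditya raises; the prefix only protects the hiveconf variables themselves from colliding.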
Re: Hive 0.7.1 Query hangs
Hi sir, the root cause of your issue seems to be java.io.EOFException, which, based on the Javadoc description, means the following: "Signals that an end of file or end of stream has been reached unexpectedly during input." What is the health status of the box with IP 10.6.0.55? Isn't it by any chance having some issues? Port 8021 is used by the JobTracker (note the hostname statlabjt and the JobClient calls in the trace), so I would start by connecting to that box and checking the JobTracker logs. Jarcec

On Thu, Feb 21, 2013 at 02:44:09PM +0300, Павел Мезенцев wrote:

Hello! I use Hive 0.7.1 over Hadoop 0.20.2 (CDH3u3) on a 70-node cluster. I have trouble with a query like this:

FROM (
  SELECT id, {expressions} FROM table1 WHERE day='2013-02-16' AND ({conditions1})
  UNION ALL
  SELECT id, {expressions} FROM table2 WHERE day='2013-02-16' AND (conditions)
  UNION ALL
  SELECT id, {expressions} FROM table3 WHERE day='2013-02-16' AND (conditions)
  UNION ALL
  SELECT id, {expressions} FROM table4 WHERE day='2013-02-16' AND (conditions)
) union_tmp
INSERT OVERWRITE TABLE result_table PARTITION (day='2013-02-16')
SELECT id, transformations (expressions)
GROUP BY id;

It had 4865 map tasks and 100 reduce tasks. The first 4780 map tasks completed successfully and the last 85 tasks hang with no progress, and no task attempts for each. One hour after the hang started, the job failed with an exception:

2013-02-20 20:02:02,000 Stage-1 map = 0%, reduce = 0%
2013-02-20 20:02:40,679 Stage-1 map = 1%, reduce = 0%
2013-02-20 20:02:54,022 Stage-1 map = 2%, reduce = 0%
2013-02-20 20:03:14,129 Stage-1 map = 3%, reduce = 0%
..
2013-02-20 21:18:00,361 Stage-1 map = 98%, reduce = 22%
2013-02-20 21:18:05,691 Stage-1 map = 98%, reduce = 23%
java.io.IOException: Call to statlabjt/10.6.0.55:8021 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
    at org.apache.hadoop.ipc.Client.call(Client.java:1110)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at org.apache.hadoop.mapred.$Proxy8.getJobStatus(Unknown Source)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:672)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:425)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:815)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:724)
Ended Job = job_201302152355_4764 with exception 'java.io.IOException(Call to statlabjt/10.6.0.55:8021 failed on local exception: java.io.EOFException)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

How can I find the reason for such a situation? How can I prevent it in the future?

Best regards, Mezentsev Pavel
Re: ROW_NUMBER() equivalent in Hive
What are the semantics for ROW_NUMBER? Is it a global row number? Per partition? Per bucket? -- Owen

On Wed, Feb 20, 2013 at 11:33 PM, kumar mr kumar...@aol.com wrote: Hi, this is Kumar, and this is my first question in this group. I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where partitioning happens on multiple columns along with ordering on multiple columns. It can easily be implemented in Hadoop MR, but I have to do it in Hive. A UDF could assign the same rank to each grouping key, considering the dataset is small, but the ordering would need to be done in a prior step. Can we do this in a much simpler way? Thanks in advance. Regards, Kumar
Re: ROW_NUMBER() equivalent in Hive
Kumar, if you are willing to be on the bleeding edge, this and much other partitioning and windowing functionality is being developed by some of us in a branch over at: https://svn.apache.org/repos/asf/hive/branches/ptf-windowing Check out this branch, build Hive, and then you will have row_number() functionality. Look at ql/src/test/queries/clientpositive/ptf_general_queries.q, which has about 60 or so example queries demonstrating the various capabilities we already have working (including row_number). We hope to have this branch merged into trunk soon. Hope it helps, Ashutosh

On Wed, Feb 20, 2013 at 11:33 PM, kumar mr kumar...@aol.com wrote: Hi, this is Kumar, and this is my first question in this group. I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where partitioning happens on multiple columns along with ordering on multiple columns. It can easily be implemented in Hadoop MR, but I have to do it in Hive. A UDF could assign the same rank to each grouping key, considering the dataset is small, but the ordering would need to be done in a prior step. Can we do this in a much simpler way? Thanks in advance. Regards, Kumar
Re: ROW_NUMBER() equivalent in Hive
Owen, it's for the entire table. A sample TD query looks like this:

SELECT columnA, columnB, columnC, columnD, columnX,
       ROW_NUMBER() OVER (PARTITION BY columnA, columnB, columnC
                          ORDER BY columnX DESC, columnY DESC) AS rank
FROM table a

Regards, Kumar

-----Original Message-----
From: Owen O'Malley omal...@apache.org
To: user user@hive.apache.org
Sent: Thu, Feb 21, 2013 8:08 am
Subject: Re: ROW_NUMBER() equivalent in Hive

What are the semantics for ROW_NUMBER? Is it a global row number? Per partition? Per bucket? -- Owen

On Wed, Feb 20, 2013 at 11:33 PM, kumar mr kumar...@aol.com wrote: Hi, this is Kumar, and this is my first question in this group. I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where partitioning happens on multiple columns along with ordering on multiple columns. It can easily be implemented in Hadoop MR, but I have to do it in Hive. A UDF could assign the same rank to each grouping key, considering the dataset is small, but the ordering would need to be done in a prior step. Can we do this in a much simpler way? Thanks in advance. Regards, Kumar
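For readers trying to reason about these semantics, here is a minimal Python emulation of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC): rows are numbered 1..n within each partition, in the requested order. The data and helper names are invented for illustration, not taken from the thread:

```python
from itertools import groupby

def row_number(rows, partition_key, order_key):
    """Emulate ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...):
    assign 1..n within each partition, in the given order."""
    # Sort by partition key first so groupby sees each partition as one
    # contiguous run, then by the ordering key within the partition.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    numbered = []
    for _, group in groupby(ordered, key=partition_key):
        for i, row in enumerate(group, start=1):
            numbered.append((row, i))
    return numbered

rows = [
    {"a": "x", "v": 3},
    {"a": "x", "v": 1},
    {"a": "y", "v": 2},
]
# Negate v to get DESC ordering, mirroring "ORDER BY columnX DESC".
numbered = row_number(rows,
                      partition_key=lambda r: r["a"],
                      order_key=lambda r: -r["v"])
for row, n in numbered:
    print(row["a"], row["v"], n)
```

Partition "x" gets row numbers 1 and 2 (largest v first), and partition "y" restarts at 1, which matches the per-partition restart behavior of the Teradata function.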
please remove me
Can you please take me off the mailing list.

Erik Thorson
Varick Media Management
Lead Engineer
212.337.4796
201.694.1122
Re: ROW_NUMBER() equivalent in Hive
Hi Ashutosh, I am interested in reviewing your windowing feature. Can you be more specific about which (a) tests and (b) src files constitute your additions (there are lots of files there ;))? thanks, stephen boesch

2013/2/21 Ashutosh Chauhan hashut...@apache.org

Kumar, if you are willing to be on the bleeding edge, this and much other partitioning and windowing functionality is being developed by some of us in a branch over at: https://svn.apache.org/repos/asf/hive/branches/ptf-windowing Check out this branch, build Hive, and then you will have row_number() functionality. Look at ql/src/test/queries/clientpositive/ptf_general_queries.q, which has about 60 or so example queries demonstrating the various capabilities we already have working (including row_number). We hope to have this branch merged into trunk soon. Hope it helps, Ashutosh

On Wed, Feb 20, 2013 at 11:33 PM, kumar mr kumar...@aol.com wrote: Hi, this is Kumar, and this is my first question in this group. I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where partitioning happens on multiple columns along with ordering on multiple columns. It can easily be implemented in Hadoop MR, but I have to do it in Hive. A UDF could assign the same rank to each grouping key, considering the dataset is small, but the ordering would need to be done in a prior step. Can we do this in a much simpler way? Thanks in advance. Regards, Kumar
Re: ROW_NUMBER() equivalent in Hive
Hi Stephen, as I indicated in my previous email, check out the file ql/src/test/queries/clientpositive/ptf_general_queries.q; it has plenty of example queries demonstrating the functionality that is available. If you are interested in the Hive src changes that enabled this feature, you may want to start by looking at the patch attached to HIVE-896, which was the starting point for this work. That jira also has links to the other jiras we did / are doing on top of that patch. Hope it helps, Ashutosh

On Thu, Feb 21, 2013 at 12:17 PM, Stephen Boesch java...@gmail.com wrote:

Hi Ashutosh, I am interested in reviewing your windowing feature. Can you be more specific about which (a) tests and (b) src files constitute your additions (there are lots of files there ;))? thanks, stephen boesch

2013/2/21 Ashutosh Chauhan hashut...@apache.org

Kumar, if you are willing to be on the bleeding edge, this and much other partitioning and windowing functionality is being developed by some of us in a branch over at: https://svn.apache.org/repos/asf/hive/branches/ptf-windowing Check out this branch, build Hive, and then you will have row_number() functionality. Look at ql/src/test/queries/clientpositive/ptf_general_queries.q, which has about 60 or so example queries demonstrating the various capabilities we already have working (including row_number). We hope to have this branch merged into trunk soon. Hope it helps, Ashutosh

On Wed, Feb 20, 2013 at 11:33 PM, kumar mr kumar...@aol.com wrote: Hi, this is Kumar, and this is my first question in this group. I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where partitioning happens on multiple columns along with ordering on multiple columns. It can easily be implemented in Hadoop MR, but I have to do it in Hive. A UDF could assign the same rank to each grouping key, considering the dataset is small, but the ordering would need to be done in a prior step. Can we do this in a much simpler way? Thanks in advance. Regards, Kumar
Re: hive 0.10.0 doc
Can someone point me to the apache docs for hive 0.10.0?

Now you can use the Hive wiki for all documentation except javadocs:
- https://cwiki.apache.org/confluence/display/Hive/Home

I've just added two docs that weren't originally in the wiki, and for everything else the wiki versions are the same as the regular docs or better. Here are the new wikidocs:
1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
2. https://cwiki.apache.org/confluence/display/Hive/ReflectUDF

– Lefty Leverenz

On Mon, Jan 28, 2013 at 11:21 AM, Shreepadma Venugopalan shreepa...@cloudera.com wrote: All, can someone point me to the apache docs for hive 0.10.0? Thanks. Shreepadma
Re: unbalanced transaction calls
Hi, we are running into the same problem as well. Is there any clue what could be wrong? Thanks, Hemanth

On Wed, Feb 6, 2013 at 1:51 AM, James Warren james.war...@stanfordalumni.org wrote:

As part of our daily workflow, we're running a few hundred Hive queries that are coordinated through Oozie. Recently we've been encountering issues where on average a job or two fails - and never the same query. The observed error is:

FAILED: Error in metadata: java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction

which I've seen referred to in HIVE-1760 but patched in 0.7. We're running Hive 0.9 (from CDH 4.1; I will redirect to the Cloudera lists if that is a more appropriate forum). Any ideas / suggestions where I should start looking? cheers, -James
Reporting dead link in GettingStarted
Hello Hive guys. I found a dead link in the GettingStarted document: https://cwiki.apache.org/confluence/display/Hive/GettingStarted but I have no way to fix it, so I'm reporting the dead link and giving what may be the correct link.

The broken command:
wget http://www.grouplens.org/system/files/ml-data.tar+0.gz

A possible replacement:
wget http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-10m.zip

Thank you!