RE: Hive Alter Partition Location Issue
Hi Harsha,

Have you updated stats on table1 after adding the partition? In other words, is it possible that the optimiser is not yet aware of that partition?

    analyze table table1 partition (dt=201501) compute statistics;

HTH,

Mich Talebzadeh
http://talebzadehmich.wordpress.com

From: Harsha N [mailto:harsha.hadoo...@gmail.com]
Sent: 30 April 2015 07:24
To: user@hive.apache.org
Subject: Hive Alter Partition Location Issue

Hi All,

Can experts share their views on Hive's behaviour in the scenario below? I am facing the following issue when using alter partition locations in Hive.
Below are the steps I have followed.

1. I altered a partition location in Hive:

    ALTER TABLE table1 PARTITION (dt=201501) SET LOCATION 'hdfs:///data/dt=201501/1430201400/';

2. I inserted new data into the new location:

    INSERT INTO TABLE table1 SELECT * FROM table2 WHERE dt=201501;

3. Now "select count(*) from table1 where dt = 201501;" fails, while "select * from table1 where dt = 201501;" works fine:

    select count(*) from table1 where dt = 201501;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=number
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=number
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=number
    java.io.IOException: cannot find dir = hdfs:///data/dt=201501/1430201400/
      in pathToPartitionInfo: [hdfs:/data/dt=201501/1430201400/]

Please let me know if you need more information.

Thanks,
Harsha
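One thing worth noting in the stack trace above: the directory is looked up as hdfs:///data/... while pathToPartitionInfo holds hdfs:/data/..., i.e. the two URIs differ only in how the scheme is written. A hedged sketch of a possible workaround (nn1:8020 is a hypothetical namenode authority, not from the original thread): set the location with one canonical, fully-qualified URI so the metastore entry matches what the input format resolves at runtime, then refresh the stats.

```sql
-- Sketch only; 'nn1:8020' is a placeholder for your namenode host:port.
-- Use a single canonical, fully-qualified URI for the partition location.
ALTER TABLE table1 PARTITION (dt=201501)
  SET LOCATION 'hdfs://nn1:8020/data/dt=201501/1430201400/';

-- Recompute partition statistics so the planner sees the new location.
ANALYZE TABLE table1 PARTITION (dt=201501) COMPUTE STATISTICS;
```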
Re: Too many connections from hive to zookeeper
I have had the same problem, and bouncing the Hive server temporarily alleviates it.

Abe

On Thu, Apr 30, 2015 at 2:20 AM, Shady Xu shad...@gmail.com wrote:

I found the patch here: https://issues.apache.org/jira/browse/HIVE-9119. But upgrading is the last thing we want to do. Also, I found another solution, which changes the hive.server2.enable.doAs property to false so that the same user performs all the Hive operations. But I'm not sure whether a "user" here is the same as a "client" as described in the issue.

2015-04-30 11:01 GMT+08:00 Shady Xu shad...@gmail.com:

Thanks. But is there any other solution besides upgrading?

2015-04-30 10:55 GMT+08:00 Xuefu Zhang xzh...@cloudera.com:

This is a known issue and has been fixed in later releases.

--Xuefu

On Wed, Apr 29, 2015 at 7:44 PM, Shady Xu shad...@gmail.com wrote:

Recently I found in the ZooKeeper log that there were too many client connections, and it was Hive that was establishing more and more of them. I modified the max client connection property in ZooKeeper and that fixed the problem, temporarily. But the connections Hive made to ZooKeeper were still growing. It seems that Hive does not use a connection pool to reuse connections. So how can I configure it to reduce the connections?

P.S. I am using Hive-0.12.0
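The doAs change Shady mentions is a HiveServer2 setting in hive-site.xml; a minimal fragment would look like the sketch below. Whether this actually reduces ZooKeeper sessions on an affected release is an assumption worth verifying against your distribution, not something confirmed in this thread.

```xml
<!-- hive-site.xml: run queries as the HiveServer2 service user instead of
     the connecting end user. Sketch only; verify the effect on ZooKeeper
     session counts for your Hive version before relying on it. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```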
Re: Too many connections from hive to zookeeper
It looks to me like using Hive's Table Lock Manager with ZooKeeper on Cloudera's CDH 5.3.3 distribution (Hive 0.13.1) is a bad thing: the ticket indicates the problem is fixed only in version 1.1.0.

I asked earlier in a different email (Subject: Table Lock Manager: ZooKeeper cluster) about reusing the same ZK quorum that serves HBase for Hive's locking, and I had the impression from a Cloudera response that it should be fine. My thought now is that it's not fine: if Hive causes trouble on the ZK servers because it does not handle connections properly, sharing the quorum is asking for trouble. It also seems to me this Table Lock feature should be turned off.
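To see who is actually holding the sessions, ZooKeeper's "cons" four-letter command lists every client connection. A small pipeline can aggregate them per source IP; the sample below uses captured, illustrative output so it runs anywhere — against a live quorum you would instead pipe `echo cons | nc <zk-host> 2181` into the same filter.

```shell
#!/bin/sh
# Count ZooKeeper client connections per source IP from 'cons' output.
# The sample lines are illustrative; real output comes from:
#   echo cons | nc <zk-host> 2181
sample='/10.0.0.5:51234[1](queued=0,recved=12,sent=12)
/10.0.0.5:51240[1](queued=0,recved=3,sent=3)
/10.0.0.7:40022[1](queued=0,recved=8,sent=8)'

# Extract the IP before the ':' port separator, then tally per IP.
printf '%s\n' "$sample" \
  | sed -n 's|^ */\([0-9.]*\):.*|\1|p' \
  | sort | uniq -c | sort -rn
```

A steadily growing count for the HiveServer2 host is the signature of the leak discussed in this thread.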
map side join
Hi,

I am doing a few map-side joins in one query to load a user-facing ORC table, in order to denormalize. Two of the tables I am joining to are pretty large, so I am setting hive.auto.convert.join.noconditionaltask.size pretty high. The query filters on those two tables, but it seems to be shipping the whole table to the job. If I use a view with the filter baked in, will that help? I don't want to have to jack up the JVM for the client/HiveServer2 to accommodate the full table.

Thanks,
Abe
Re: map side join
Great info, thanks. Makes sense on the partition, since those files can be shipped by themselves. These are reference tables, but one happens to be pretty long.

Thanks,
Abe

On Thu, Apr 30, 2015 at 12:54 PM, Gopal Vijayaraghavan gop...@apache.org wrote:

Hi,

> Using CDH 5.3 - Hive 0.13. Does a view help here? Does how I format the table help in reducing size?

No, a view does not help - views are not materialized, and you need hive-1.0 to have temporary table support. The only way out is if you have only one filter column in the system.

I assume your data is not in ORC (because of CDH), which prevents any speedups from the ORC row-index filters. In ORC, you can reorganize data to get min-max pruning IO savings by reinserting data via "insert overwrite table x select * from x sort by filter-col". In my point-lookup queries, when looking for ~3 rows among 6 billion unique values, ORC ends up reading only 54,000 rows into memory thanks to ORC indexes, even in MapReduce.

But if your case is much simpler and you have a low-cardinality column (i.e. ~100 unique items), you can use that column as a partition column for your table, so that it is pre-filtered during planning time.

Outside of those scenarios, a scalable distributed solution exists in Tez's broadcast JOIN - you can test Tez using the open-source Apache hive-0.13.1 (the one Yahoo uses), because the CDH version has Tez removed from the package.

Cheers,
Gopal
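Gopal's two suggestions can be sketched in HiveQL; table and column names here are illustrative placeholders, not from the original thread.

```sql
-- Sketch only; 'x' and 'filter_col' are placeholder names.

-- (1) Rewrite the table clustered by the filter column, so ORC's
--     per-row-group min/max indexes can prune reads on that column:
INSERT OVERWRITE TABLE x
SELECT * FROM x SORT BY filter_col;

-- (2) If filter_col is low-cardinality, partition by it instead, so the
--     pruning happens at query-planning time rather than at read time:
CREATE TABLE x_by_filter (id BIGINT, payload STRING)
PARTITIONED BY (filter_col STRING)
STORED AS ORC;
```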
Re: PL/HQL - Procedural SQL-on-Hadoop
Hi everyone,

PL/HQL 0.3.1 was released today. There are some features I hope you find interesting:

1) Stored procedures and SQL user-defined functions. CREATE PROCEDURE and CREATE FUNCTION statements are supported, so you can define procedures and functions in your scripts; when added to .plhqlrc they behave like real stored objects.

2) Calling PL/HQL functions from Hive queries. You can use PL/HQL functions in Hive queries: PL/HQL automatically puts the function definitions into the distributed cache, registers a Hive UDF, and executes the functions. You can also call PL/HQL functions from the Hive CLI if you manually register the UDF and execute ADD FILE for the PL/HQL scripts. See http://plhql.org/udf

3) Multiple connections. You can work with multiple databases from a single PL/HQL script - for example, run queries in Hive, but look up and log to an RDBMS. The MAP OBJECT statement allows you to assign a table to a connection (again, you can do that in .plhqlrc), and PL/HQL will then automatically use the appropriate connection.

In the next releases, procedural features will be extended to cover more existing dialects, and on-the-fly SQL conversion will be improved. Federation features will also be added - such as saving query results to an RDBMS (actually any JDBC source), using RDBMS tables as map-join tables in Hive queries, and so on.

Early adopters are very welcome. Please contact me at dmtolp...@gmail.com

Thanks,
Dmitry Tolpeko
PL/HQL - Procedural SQL-on-Hadoop, www.plhql.org

On Fri, Mar 6, 2015 at 6:23 PM, Ronak Bhatt ronakb...@gmail.com wrote:

plhql is a great initiative and will certainly help with more adoption of the eco-system... Good job, Dmitry

-r

On Fri, Mar 6, 2015 at 1:07 AM, Dmitry Tolpeko dmtolp...@gmail.com wrote:

Run dos2unix on the plhql file. Sorry for this issue, it will be fixed.
Dmitry

On Fri, Mar 6, 2015 at 11:27 AM, r7raul1...@163.com wrote:

I typed plhql --version but got an error:

    bash: /usr/plhql/plhql: /bin/bash^M: bad interpreter: No such file or directory

From: Dmitry Tolpeko dmtolp...@gmail.com
Date: 2015-02-28 04:36
To: user@hive.apache.org
Subject: PL/HQL - Procedural SQL-on-Hadoop

Let me introduce PL/HQL, an open-source tool that implements procedural SQL on Hadoop. It is going to support all major procedural syntaxes. The tool can be used with any SQL-on-Hadoop solution.

Motivation:
- Writing the driver code in well-known procedural SQL (not bash), opening Hadoop to an even wider audience
- Allowing dynamic SQL, iteration, flow control, and SQL exception handling
- Facilitating migration of RDBMS workloads to Hadoop

Plans (besides extending the syntax):
- Supporting CREATE PROCEDURE/FUNCTION/PACKAGE to reuse code
- Allowing connections to multiple databases (e.g. lookup tables in relational databases)
- On-the-fly SQL conversion (e.g. SELECT), compatibility layers

More details can be found at http://www.plhql.org/ It is just the first release, PL/HQL 0.01, to show that such a project exists and to get initial feedback.

Thank you,
Dmitry Tolpeko
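The ^M in the "bad interpreter" error means the plhql launcher script has Windows (CRLF) line endings, so the kernel looks for an interpreter literally named /bin/bash^M. Besides dos2unix, a sed one-liner strips the carriage returns; the sketch below demonstrates it on a scratch file (GNU sed assumed for in-place -i).

```shell
#!/bin/sh
# Reproduce the CRLF problem on a scratch file, then fix it the way
# dos2unix would. The scratch file stands in for /usr/plhql/plhql.
tmp=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$tmp"   # simulated DOS-ended script

# Strip trailing carriage returns in place (GNU sed syntax).
sed -i 's/\r$//' "$tmp"

# The shebang line is now clean, so the kernel can find /bin/bash.
first=$(head -1 "$tmp")
echo "$first"
```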
Re: Edit access to hive wiki to update hive-hbase integration docs
Done. Welcome to the Hive wiki team, Andrew.

-- Lefty
Re: Edit access to hive wiki to update hive-hbase integration docs
Great! Just give us a Confluence username (see About This Wiki -- How to get permission to edit: https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit).

-- Lefty
Edit access to hive wiki to update hive-hbase integration docs
Hi all,

Could I get edit access to the Hive wiki in order to update the Hive/HBase integration docs (https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration)? Specifically I'd like to:

1. Add documentation about compound key support (the only statement on the wiki right now is "there must be exactly one :key mapping (we don't support compound keys yet)", which is no longer true post https://issues.apache.org/jira/browse/HIVE-2599).

2. Update the test running information (the command still refers to ant; it should be switched to maven).

3. Add documentation for support for complex value types (https://issues.apache.org/jira/browse/HIVE-6147). The ticket has a comment asking about docs, but I haven't found any yet.

Thanks!
Andrew
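For context, the compound-key support from HIVE-2599 maps a Hive struct onto the HBase row key. The sketch below shows roughly what that looks like; the table, field, and column-family names are illustrative, and the exact syntax should be verified against the patch and the updated wiki page.

```sql
-- Hedged sketch of a post-HIVE-2599 compound row key; names are
-- placeholders. The struct fields are concatenated into the HBase
-- row key using the collection-items delimiter.
CREATE TABLE hbase_compound (
  key struct<p1:STRING, p2:STRING>,
  value STRING
)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1');
```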
Re: Edit access to hive wiki to update hive-hbase integration docs
Doh, can't believe I forgot that, sorry!

Username: amains12

Thanks again.