RE: Hive Alter Partition Location Issue

2015-04-30 Thread Mich Talebzadeh
Hi Harsha,

 

Have you updated the statistics on table1 after adding the partition? In other words, is 
it possible that the optimiser is not yet aware of that partition?

 

analyze table table1 partition (dt=201501) compute statistics;
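For reference, a fuller sequence might look like this (a sketch; the unqualified FOR COLUMNS variant depends on your Hive version):

```sql
-- Refresh basic partition statistics after the location change
ANALYZE TABLE table1 PARTITION (dt=201501) COMPUTE STATISTICS;

-- Optionally gather column-level statistics as well (sketch; requires
-- a reasonably recent Hive)
ANALYZE TABLE table1 PARTITION (dt=201501) COMPUTE STATISTICS FOR COLUMNS;
```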

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Author of A Practitioner’s Guide to Upgrading to Sybase ASE 15, 
ISBN 978-0-9563693-0-7. 

Co-author of Sybase Transact SQL Guidelines Best Practices, ISBN 
978-0-9759693-0-4

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 


 

From: Harsha N [mailto:harsha.hadoo...@gmail.com] 
Sent: 30 April 2015 07:24
To: user@hive.apache.org
Subject: Hive Alter Partition Location Issue

 

Hi All,

Can the experts share their views on Hive's behaviour in the scenario below?

 

I am facing the issue below when altering partition locations in Hive.

 

select count(*) from table1 where dt = 201501;

 

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=number

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=number

In order to set a constant number of reducers:

  set mapreduce.job.reduces=number

java.io.IOException: cannot find dir = hdfs:///data/dt =201501/1430201400/

in pathToPartitionInfo: [hdfs:/data/dt=201501/1430201400/]

 

Below are the steps I have followed.

I have altered a partition location in Hive using the command below.

ALTER TABLE table1 PARTITION (dt=201501) SET LOCATION 
'hdfs:///data/dt=201501/1430201400/';

 

I have inserted new data into this new location.

 

INSERT INTO TABLE table1

SELECT * FROM table2 where dt=201501

 

select count(*) from table1 where dt = 201501; doesn't work, but 

select * from table1 where dt = 201501; works fine.
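One way to narrow this down (a diagnostic sketch using the table names from the post; note the error message shows "dt =201501" with a space, while pathToPartitionInfo has "dt=201501" without one):

```sql
-- Compare the partition location stored in the metastore with the
-- path in the error message, character by character
DESCRIBE FORMATTED table1 PARTITION (dt=201501);

-- If the stored location differs, reset it with the same ALTER as above:
-- ALTER TABLE table1 PARTITION (dt=201501)
--   SET LOCATION 'hdfs:///data/dt=201501/1430201400/';
```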

 

Please let me know if you need more information.

 

Thanks

Harsha



Re: Too many connections from hive to zookeeper

2015-04-30 Thread Abe Weinograd
I have had the same problem and bouncing the Hive server temporarily
alleviates this.

Abe

On Thu, Apr 30, 2015 at 2:20 AM, Shady Xu shad...@gmail.com wrote:

 I found the patch here https://issues.apache.org/jira/browse/HIVE-9119.
 But upgrading is the last thing we want to do.

 Also, I found another solution, which changes the hive.server2.enable.doAs 
 property to false so that the same user performs the Hive operations. But I'm not sure
 whether a user here is the same as a `client` as described in the issue.

 2015-04-30 11:01 GMT+08:00 Shady Xu shad...@gmail.com:

 Thanks. But is there any other solution besides upgrading?

 2015-04-30 10:55 GMT+08:00 Xuefu Zhang xzh...@cloudera.com:

 This is a known issue and has been fixed in later releases.

 --Xuefu

 On Wed, Apr 29, 2015 at 7:44 PM, Shady Xu shad...@gmail.com wrote:

 Recently I found in the ZooKeeper log that there were too many client
 connections, and it was Hive that was establishing more and more 
 connections.

 I modified the max client connection property in zookeeper and that
 fixed the problem, temporarily. But the connections hive made to zookeeper
 were still growing.

 It seems that Hive does not use a connection pool to reuse
 connections. So how can I configure Hive to reduce the number of connections?

 P.S.
 I am using Hive-0.12.0







Re: Too many connections from hive to zookeeper

2015-04-30 Thread Eduardo Ferreira
It looks to me like using Hive's Table Lock Manager with ZooKeeper on Cloudera's
CDH 5.3.3 distribution (Hive 0.13.1) is a bad idea.
The ticket indicates the problem is fixed only in version 1.1.0.
I asked earlier in a different email (Subject: Table Lock Manager:
ZooKeeper cluster) about reusing the same ZK quorum that HBase uses for
handling Hive's locks, and I had the impression from a response from
Cloudera that it should be fine.
My thought now is that it's not fine. If Hive causes trouble on the ZK server
because it does not handle connections properly, it's risky to share
the same ZK quorum. It also seems to me this Table Lock feature should
be turned off.
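If turning it off is the route taken, a minimal sketch (these are the standard Hive properties; set them in hive-site.xml for a permanent change):

```sql
-- Disable the ZooKeeper-backed table lock manager for this session (sketch)
SET hive.support.concurrency=false;
-- With concurrency off, hive.lock.manager and hive.zookeeper.quorum
-- are no longer consulted for table locking
```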

On Thu, Apr 30, 2015 at 9:44 AM, Abe Weinograd a...@flonet.com wrote:

 I have had the same problem and bouncing the Hive server temporarily
 alleviates this.

 Abe

 On Thu, Apr 30, 2015 at 2:20 AM, Shady Xu shad...@gmail.com wrote:

 I found the patch here https://issues.apache.org/jira/browse/HIVE-9119.
 But upgrading is the last thing we want to do.

 Also, I found another solution which changes the
 hive.server2.enable.doAs property to false to use the same user to do
 the hive operations. But I'm not sure whether a user here is the same with
 a `client` described in the issue.

 2015-04-30 11:01 GMT+08:00 Shady Xu shad...@gmail.com:

 Thanks. But is there any other solution besides upgrading?

 2015-04-30 10:55 GMT+08:00 Xuefu Zhang xzh...@cloudera.com:

 This is a known issue and has been fixed in later releases.

 --Xuefu

 On Wed, Apr 29, 2015 at 7:44 PM, Shady Xu shad...@gmail.com wrote:

 Recently I found in the zookeeper log that there were too many client
 connections and it was hive that was establishing more and more 
 connections.

 I modified the max client connection property in zookeeper and that
 fixed the problem, temporarily. But the connections hive made to zookeeper
 were still growing.

 It seems that Hive does not use a connection pool to reuse
 connections. So how can I configure Hive to reduce the number of connections?

 P.S.
 I am using Hive-0.12.0








map side join

2015-04-30 Thread Abe Weinograd
Hi,

I am doing a few map-side joins in one query to load a user-facing ORC
table in order to denormalize.

Two of the tables I am joining to are pretty large.  I am
setting hive.auto.convert.join.noconditionaltask.size pretty high.
However, the join itself filters on those two tables, yet it seems like
it's submitting the whole table to the job.  If I use a view with the filter
baked in, will that help?  I don't want to have to jack up the JVM heap for the
client/HiveServer2 to accommodate the full table.

Thanks,
Abe


Re: map side join

2015-04-30 Thread Abe Weinograd
Great info, thanks.  Makes sense on the partition since those files can be
shipped by themselves.  These are reference tables, but one happens to be
pretty long.

Thanks,
Abe

On Thu, Apr 30, 2015 at 12:54 PM, Gopal Vijayaraghavan gop...@apache.org
wrote:

 Hi,

  Using CDH 5.3 - Hive 0.13.  Does a view help here?  Does how I format
 the table help in reducing size?

 No, a view does not help - they are not materialized and you need hive-1.0
 to have temporary table support.

 The only way out is if you only have 1 filter column in the system.

 I assume your data is not in ORC (because of CDH), which prevents any
 speedups due to the ORC row-index filters.

 In ORC, you can reorganize data to get min-max pruning IO savings by
 reinserting data via "insert overwrite table x select * from x sort by
 filter-col".

 In my point lookup queries, when looking for ~3 rows in 6 billion unique
 values, ORC ends up reading only 54,000 rows into memory thanks to ORC
 indexes, even in MapReduce.

 But if your case is much simpler and you have a low-cardinality column (e.g.
 100 unique items), you can use that column as a partition column for your
 table, so that it is pre-filtered at planning time.
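 The low-cardinality partition idea can be sketched like this (table and
 column names are invented for illustration):

```sql
-- Hypothetical layout: partition on a ~100-value column so the planner
-- prunes everything but the needed partitions at compile time
CREATE TABLE events_by_region (id BIGINT, payload STRING)
PARTITIONED BY (region STRING)
STORED AS ORC;

SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE events_by_region PARTITION (region)
SELECT id, payload, region FROM events;
```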

 Outside of those scenarios, a scalable distributed solution exists in
 Tez's broadcast JOIN - you can test Tez using the open-source Apache
 hive-0.13.1 (the one Yahoo uses), because the CDH version had Tez removed
 from the package.

 Cheers,
 Gopal





Re: PL/HQL - Procedural SQL-on-Hadoop

2015-04-30 Thread Dmitry Tolpeko
Hi everyone,

PL/HQL 0.3.1 was released today. There are some features I hope you will find
interesting:

1) Stored procedures and SQL user-defined functions

CREATE PROCEDURE and CREATE FUNCTION statements are supported, so you can
define procedures and functions in your scripts, and when added to .plhqlrc
they look like real stored objects.
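A hypothetical sketch of what such definitions could look like (names are invented; consult the PL/HQL docs for the exact syntax):

```sql
-- Sketch: a function and a procedure that could live in .plhqlrc
CREATE FUNCTION to_label(code INT) RETURNS STRING
BEGIN
  RETURN 'item-' || CAST(code AS STRING);
END;

CREATE PROCEDURE load_day(day INT)
BEGIN
  INSERT INTO TABLE target_table SELECT * FROM source_table WHERE dt = day;
END;
```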

2) Calling PL/HQL functions from Hive queries

You can use PL/HQL functions in Hive queries. PL/HQL automatically puts
function definitions into the distributed cache, registers a Hive UDF, and executes
the functions. You can also call PL/HQL functions from the Hive CLI if you manually
register the UDF and execute ADD FILE for the PL/HQL scripts. See
http://plhql.org/udf

3) Multiple connections

You can work with multiple databases from a single PL/HQL script. For
example, run queries in Hive, but do lookups and logging in an RDBMS.
The MAP OBJECT statement allows you to assign a table to a connection (again,
you can do that in .plhqlrc). Then PL/HQL will automatically use the appropriate
connection.

In the next releases, the procedural features will be extended to cover more
existing dialects, and on-the-fly SQL conversion will be improved. Federation
features will also be added, such as saving query results to an RDBMS
(actually any JDBC source), using RDBMS tables as map-join tables in Hive
queries, and so on.

Early adopters are very welcome. Please contact me at dmtolp...@gmail.com

Thanks,

Dmitry Tolpeko
PL/HQL - Procedural SQL-on-Hadoop, www.plhql.org


On Fri, Mar 6, 2015 at 6:23 PM, Ronak Bhatt ronakb...@gmail.com wrote:

 plhql is a great initiative and will certainly help drive more adoption of
 the eco-system... Good job, Dmitry

 -r

 *-r*




 On Fri, Mar 6, 2015 at 1:07 AM, Dmitry Tolpeko dmtolp...@gmail.com
 wrote:

 Run dos2unix on the plhql file. Sorry for this issue; it will be fixed.
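 A minimal reproduction and a portable fix (the file names here are made up
 for illustration; dos2unix does the same thing as the tr command):

```shell
# Reproduce: a script saved with Windows (CRLF) line endings carries a
# carriage return after "/bin/bash" in the shebang, which the kernel
# reports as: bad interpreter: /bin/bash^M
printf '#!/bin/bash\r\necho ok\r\n' > plhql-crlf.sh

# Fix: strip the carriage returns (equivalent to: dos2unix plhql-crlf.sh)
tr -d '\r' < plhql-crlf.sh > plhql-fixed.sh
chmod +x plhql-fixed.sh
./plhql-fixed.sh    # prints "ok"
```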

 Dmitry

 On Fri, Mar 6, 2015 at 11:27 AM, r7raul1...@163.com r7raul1...@163.com
 wrote:

 I typed plhql --version but got the error: bash: /usr/plhql/plhql:
 /bin/bash^M: bad interpreter: No such file or directory

 --
 r7raul1...@163.com


 *From:* Dmitry Tolpeko dmtolp...@gmail.com
 *Date:* 2015-02-28 04:36
 *To:* user user@hive.apache.org
 *Subject:* PL/HQL - Procedural SQL-on-Hadoop
 Let me introduce PL/HQL, an open-source tool that implements procedural
 SQL on Hadoop. It is going to support all major procedural syntaxes. The
 tool can be used with any SQL-on-Hadoop solution.

 Motivation:

   - Writing the driver code using well-known procedural SQL (not bash),
   which opens Hadoop to an even wider audience
- Allowing dynamic SQL, iterations, flow-of-control and SQL
exception handling
- Facilitating migration of RDBMS workload to Hadoop

 Plans (besides extending syntax):


- Supporting CREATE PROCEDURE/FUNCTION/PACKAGE to reuse code
- Allowing connections to multiple databases (i.e. lookup tables in
relational databases)
   - On-the-fly SQL conversion (e.g., SELECT), compatibility layers

 More details can be found at http://www.plhql.org/

 This is just the first release, PL/HQL 0.01, to show that such a project
 exists and to get initial feedback.

 Thank you,

 Dmitry Tolpeko






Re: Edit access to hive wiki to update hive-hbase integration docs

2015-04-30 Thread Lefty Leverenz
Done.  Welcome to the Hive wiki team, Andrew.

-- Lefty

On Thu, Apr 30, 2015 at 7:20 PM, Andrew Mains andrew.ma...@kontagent.com
wrote:

  Doh, can't believe I forgot that, sorry!

 Username: amains12

 Thanks again.


 On 4/30/15 4:11 PM, Lefty Leverenz wrote:

 Great!  Just give us a Confluence username (see About This Wiki -- How to
 get permission to edit
 https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit
 ).

  -- Lefty

 On Thu, Apr 30, 2015 at 7:07 PM, Andrew Mains andrew.ma...@kontagent.com
 wrote:

  Hi all,

 Could I get edit access to the hive wiki in order to update the
 hive/hbase integration docs (
 https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration).
 Specifically I'd like to:

 1. Add documentation about compound key support
 (the only statement on the wiki right now is "there must be exactly
 one :key mapping (we don't support compound keys yet)", which is no
 longer true post https://issues.apache.org/jira/browse/HIVE-2599).

 2. Update the test running information (the command still refers to ant;
 should be switched to maven).

 3. Add documentation for support for complex value types (
 https://issues.apache.org/jira/browse/HIVE-6147). The ticket has a
 comment asking about docs, but I haven't found any yet.

 Thanks!

 Andrew






Re: Edit access to hive wiki to update hive-hbase integration docs

2015-04-30 Thread Lefty Leverenz
Great!  Just give us a Confluence username (see About This Wiki -- How to
get permission to edit
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit
).

-- Lefty

On Thu, Apr 30, 2015 at 7:07 PM, Andrew Mains andrew.ma...@kontagent.com
wrote:

  Hi all,

 Could I get edit access to the hive wiki in order to update the hive/hbase
 integration docs (
 https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration).
 Specifically I'd like to:

 1. Add documentation about compound key support
 (the only statement on the wiki right now is "there must be exactly one
 :key mapping (we don't support compound keys yet)", which is no longer
 true post https://issues.apache.org/jira/browse/HIVE-2599).

 2. Update the test running information (the command still refers to ant;
 should be switched to maven).

 3. Add documentation for support for complex value types (
 https://issues.apache.org/jira/browse/HIVE-6147). The ticket has a
 comment asking about docs, but I haven't found any yet.

 Thanks!

 Andrew



Edit access to hive wiki to update hive-hbase integration docs

2015-04-30 Thread Andrew Mains

Hi all,

Could I get edit access to the hive wiki in order to update the 
hive/hbase integration docs 
(https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration). 
Specifically I'd like to:


1. Add documentation about compound key support
(the only statement on the wiki right now is "there must be exactly 
one :key mapping (we don't support compound keys yet)", which is no 
longer true post https://issues.apache.org/jira/browse/HIVE-2599).


2. Update the test running information (the command still refers to ant; 
should be switched to maven).


3. Add documentation for support for complex value types 
(https://issues.apache.org/jira/browse/HIVE-6147). The ticket has a 
comment asking about docs, but I haven't found any yet.


Thanks!

Andrew


Re: Edit access to hive wiki to update hive-hbase integration docs

2015-04-30 Thread Andrew Mains

Doh, can't believe I forgot that, sorry!

Username: amains12

Thanks again.

On 4/30/15 4:11 PM, Lefty Leverenz wrote:
Great!  Just give us a Confluence username (see About This Wiki -- How 
to get permission to edit 
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit).


-- Lefty

On Thu, Apr 30, 2015 at 7:07 PM, Andrew Mains 
andrew.ma...@kontagent.com wrote:


Hi all,

Could I get edit access to the hive wiki in order to update the
hive/hbase integration docs
(https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration).
Specifically I'd like to:

1. Add documentation about compound key support
(the only statement on the wiki right now is "there must be
exactly one :key mapping (we don't support compound keys yet)",
which is no longer true post
https://issues.apache.org/jira/browse/HIVE-2599).

2. Update the test running information (the command still refers
to ant; should be switched to maven).

3. Add documentation for support for complex value types
(https://issues.apache.org/jira/browse/HIVE-6147). The ticket has
a comment asking about docs, but I haven't found any yet.

Thanks!

Andrew