Re: Need help running query over yesterday. getting No partition predicate found errors

2012-06-05 Thread Matthew Hooker
Thanks, Mark.

I was hoping the expressions could be evaluated at query time, but
generating the query works just as well.

On Mon, Jun 4, 2012 at 7:42 PM, Mark Grover mgro...@oanda.com wrote:

 Hi Matthew,
 unix_timestamp() is being executed on the Hive server in your case.
 Therefore, as your query progresses, the timestamp returned by your
 unix_timestamp() would change. Based on your usage, this would lead to
 unsuspecting problems when this queries runs across UTC midnight. In any
 case, I'd recommend that you get your Hive client code to get the present
 unix_timestamp and generate your partition pruning constant(s) on the
 client before you send your query to the server. When you do so, partition
 pruning will occur as expected.

 Mark

 - Original Message -
 From: Matthew Hooker mwhoo...@disqus.com
 To: user@hive.apache.org
 Sent: Monday, June 4, 2012 8:01:26 PM
 Subject: Need help running query over yesterday. getting No partition
 predicate found errors

 Hello,


 I'm trying to construct a query which will do some simple counts over a
 table of events, and insert them in to a summary table.


 The query I came up with looks something like



 INSERT OVERWRITE TABLE activity
 PARTITION(date_utc)
 select count(1) views,
 from impressions

 where dt = date_sub(to_date(from_unixtime(unix_timestamp())),1)
 and dt  to_date(from_unixtime(unix_timestamp()))
 group by d_theme;


 This seemed to work fine, but I realized it was doing a full table scan,
 and not using the partitions in views appropriately.


 I set hive.mapred.mode=strict and the above query fails to run.
 If I change the expressions in the where clause to literals, it works as I
 would expect.


 Can anyone help me figure out how to do what I want?




 I put together a test script to illustrate my problem:



 set hive.mapred.mode=strict;
 set hive.optimize.ppd=true;
 DROP TABLE IF EXISTS test_where_expr;
 CREATE TABLE test_where_expr (
 ts int
 )
 PARTITIONED BY ( dt STRING );
 explain SELECT * from test_where_expr WHERE dt =
 to_date(date_sub(from_unixtime(unix_timestamp()),1));
 explain SELECT * from test_where_expr WHERE dt = '2012-06-01';




 This is the output:



 hive set hive.mapred.mode=strict;
 hive set hive.optimize.ppd=true;
 hive
  DROP TABLE IF EXISTS test_where_expr;
 OK
 Time taken: 3.405 seconds
 hive
  CREATE TABLE test_where_expr (
  ts int
  )
  PARTITIONED BY ( dt STRING );
 OK
 Time taken: 0.189 seconds
 hive
  explain SELECT * from test_where_expr
  WHERE dt = to_date(date_sub(from_unixtime(unix_timestamp()),1));
 FAILED: Error in semantic analysis: No partition predicate found for Alias
 test_where_expr Table test_where_expr
 hive
  explain SELECT * from test_where_expr
  WHERE dt = '2012-06-01';
 OK
 ABSTRACT SYNTAX TREE:
 (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_where_expr)))
 (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT
 (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL dt)
 '2012-06-01'


 STAGE DEPENDENCIES:
 Stage-0 is a root stage


 STAGE PLANS:
 Stage: Stage-0
 Fetch Operator
 limit: -1




 Time taken: 0.166 seconds
 hive hadoop@ip-10-68-190-136:~$ hive --version
 Hive version 0.8.1.
 hadoop@ip-10-68-190-136:~$ hadoop -version
 java version 1.6.0_26
 Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)





 Thanks,
 --Matthew Hooker



Need help running query over yesterday. getting No partition predicate found errors

2012-06-04 Thread Matthew Hooker
Hello,

I'm trying to construct a query which will do some simple counts over a
table of events, and insert them in to a summary table.

The query I came up with looks something like

INSERT OVERWRITE TABLE activity
PARTITION(date_utc)
select count(1) views,
from impressions
where dt = date_sub(to_date(from_unixtime(unix_timestamp())),1)
and dt  to_date(from_unixtime(unix_timestamp()))
group by d_theme;

This seemed to work fine, but I realized it was doing a full table scan,
and not using the partitions in views appropriately.

I set hive.mapred.mode=strict and the above query fails to run.
If I change the expressions in the where clause to literals, it works as I
would expect.

Can anyone help me figure out how to do what I want?


I put together a test script to illustrate my problem:

set hive.mapred.mode=strict;
set hive.optimize.ppd=true;
DROP TABLE IF EXISTS test_where_expr;
CREATE TABLE test_where_expr (
ts int
  )
  PARTITIONED BY ( dt STRING );
explain SELECT * from test_where_expr WHERE dt =
to_date(date_sub(from_unixtime(unix_timestamp()),1));
explain SELECT * from test_where_expr WHERE dt = '2012-06-01';


This is the output:

hive set hive.mapred.mode=strict;
hive set hive.optimize.ppd=true;
hive
 DROP TABLE IF EXISTS test_where_expr;
OK
Time taken: 3.405 seconds
hive
 CREATE TABLE test_where_expr (
 ts int
   )
   PARTITIONED BY ( dt STRING );
OK
Time taken: 0.189 seconds
hive
 explain SELECT * from test_where_expr
 WHERE dt = to_date(date_sub(from_unixtime(unix_timestamp()),1));
FAILED: Error in semantic analysis: No partition predicate found for Alias
test_where_expr Table test_where_expr
hive
 explain SELECT * from test_where_expr
 WHERE dt = '2012-06-01';
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_where_expr)))
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT
(TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL dt)
'2012-06-01'

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1


Time taken: 0.166 seconds
hive hadoop@ip-10-68-190-136:~$ hive --version
Hive version 0.8.1.
hadoop@ip-10-68-190-136:~$ hadoop -version
java version 1.6.0_26
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)


Thanks,
--Matthew Hooker


Re: Need help running query over yesterday. getting No partition predicate found errors

2012-06-04 Thread Mark Grover
Hi Matthew,
unix_timestamp() is being executed on the Hive server in your case. Therefore, 
as your query progresses, the timestamp returned by your unix_timestamp() would 
change. Based on your usage, this would lead to unsuspecting problems when this 
queries runs across UTC midnight. In any case, I'd recommend that you get your 
Hive client code to get the present unix_timestamp and generate your partition 
pruning constant(s) on the client before you send your query to the server. 
When you do so, partition pruning will occur as expected.

Mark

- Original Message -
From: Matthew Hooker mwhoo...@disqus.com
To: user@hive.apache.org
Sent: Monday, June 4, 2012 8:01:26 PM
Subject: Need help running query over yesterday. getting No partition 
predicate found errors

Hello, 


I'm trying to construct a query which will do some simple counts over a table 
of events, and insert them in to a summary table. 


The query I came up with looks something like 



INSERT OVERWRITE TABLE activity 
PARTITION(date_utc) 
select count(1) views, 
from impressions 

where dt = date_sub(to_date(from_unixtime(unix_timestamp())),1) 
and dt  to_date(from_unixtime(unix_timestamp())) 
group by d_theme; 


This seemed to work fine, but I realized it was doing a full table scan, and 
not using the partitions in views appropriately. 


I set hive.mapred.mode=strict and the above query fails to run. 
If I change the expressions in the where clause to literals, it works as I 
would expect. 


Can anyone help me figure out how to do what I want? 




I put together a test script to illustrate my problem: 



set hive.mapred.mode=strict; 
set hive.optimize.ppd=true; 
DROP TABLE IF EXISTS test_where_expr; 
CREATE TABLE test_where_expr ( 
ts int 
) 
PARTITIONED BY ( dt STRING ); 
explain SELECT * from test_where_expr WHERE dt = 
to_date(date_sub(from_unixtime(unix_timestamp()),1)); 
explain SELECT * from test_where_expr WHERE dt = '2012-06-01'; 




This is the output: 



hive set hive.mapred.mode=strict; 
hive set hive.optimize.ppd=true; 
hive 
 DROP TABLE IF EXISTS test_where_expr; 
OK 
Time taken: 3.405 seconds 
hive 
 CREATE TABLE test_where_expr ( 
 ts int 
 ) 
 PARTITIONED BY ( dt STRING ); 
OK 
Time taken: 0.189 seconds 
hive 
 explain SELECT * from test_where_expr 
 WHERE dt = to_date(date_sub(from_unixtime(unix_timestamp()),1)); 
FAILED: Error in semantic analysis: No partition predicate found for Alias 
test_where_expr Table test_where_expr 
hive 
 explain SELECT * from test_where_expr 
 WHERE dt = '2012-06-01'; 
OK 
ABSTRACT SYNTAX TREE: 
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_where_expr))) (TOK_INSERT 
(TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL dt) '2012-06-01' 


STAGE DEPENDENCIES: 
Stage-0 is a root stage 


STAGE PLANS: 
Stage: Stage-0 
Fetch Operator 
limit: -1 




Time taken: 0.166 seconds 
hive hadoop@ip-10-68-190-136:~$ hive --version 
Hive version 0.8.1. 
hadoop@ip-10-68-190-136:~$ hadoop -version 
java version 1.6.0_26 
Java(TM) SE Runtime Environment (build 1.6.0_26-b03) 
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode) 





Thanks, 
--Matthew Hooker