Re: Strict mode and joins

2015-10-19 Thread Gopal Vijayaraghavan

> However reality is I just want it called 1 time which is during planning
>and if I flag is as deterministic this is exactly what happens so you can
>do this:

There was a new constant added in hive-1.2.0 named CURRENT_TIMESTAMP which
works the way most people want unix_timestamp() to work.

> AND article_meta.publish_timestamp > ((unix_timestamp() * 1000) - (1000
>* 60 * 60 * 24 * 2))

If if helps, I have a backport as a UDF on my github.

https://github.com/t3rmin4t0r/current-timestamp/blob/master/src/main/java/o
rg/notmysock/hive/udf/CurrentTimeStampUDF.java#L35


with that UDF, I expect it to constant fold

(to_unix_timestamp(current_timestamp()) * 1000)

into a single bigint.

Cheers,
Gopal








Re: Strict mode and joins

2015-10-19 Thread Bennie Schut

Hi Edward,

That's possibly due to using unix_timestamp (although the error message 
seems misleading if that proves true) . It's technically correct it 
shouldn't be flagged as deterministic because every time you call it 
you'll get a different answer as time progresses. However reality is I 
just want it called 1 time which is during planning and if I flag is as 
deterministic this is exactly what happens so you can do this:


@UDFType(deterministic = true)
public class UnixTimeStamp extends GenericUDFUnixTimeStamp {
// Making the udf deterministic which is kind of cheating but makes 
partition pruning work.

}

And then register the udf like you normally would.

If that's not helping do some creative sub-querying might help like
FROM (select * from entry_hourly_v3 where dt=2015101517 ) 
entry_hourly_v3 INNER JOIN article_meta ON


Bennie.

On 15/10/15 23:06, Edward Capriolo wrote:

So I have strict mode on and I like to keep it that way.

I am trying to do this query.

INSERT OVERWRITE TABLE vertical_stats_recent PARTITION (dt=2015101517)
SELECT ...
FROM entry_hourly_v3 INNER JOIN article_meta ON
entry_hourly_v3.entry_id = article_meta.entry_id
INNER JOIN channel_meta ON
channel_meta.section_name = article_meta.channel

WHERE entry_hourly_v3.dt=2015101517
AND article_meta.dt=2015101517
AND channel_meta.hitdate=20151015
AND article_meta.publish_timestamp > ((unix_timestamp() * 1000) - 
(1000 * 60 * 60 * 24 * 2))

GROUP

entry_hourly_v3, channel_meta and article_meta are partitioned tables.

*Your query has the following error(s):*

Error while compiling statement: FAILED: SemanticException [Error 
10041]: No partition predicate found for Alias "entry_hourly_v3" Table 
"entry_hourly_v3"


I also tried putting views on the table and I had no luck.

Is there any way I can do this query without turning strict mode off?






Strict mode and joins

2015-10-15 Thread Edward Capriolo
So I have strict mode on and I like to keep it that way.

I am trying to do this query.

INSERT OVERWRITE TABLE vertical_stats_recent PARTITION (dt=2015101517)
SELECT ...

FROM entry_hourly_v3 INNER JOIN article_meta ON
entry_hourly_v3.entry_id = article_meta.entry_id
INNER JOIN channel_meta ON
channel_meta.section_name = article_meta.channel

WHERE entry_hourly_v3.dt=2015101517
AND article_meta.dt=2015101517
AND channel_meta.hitdate=20151015
AND article_meta.publish_timestamp > ((unix_timestamp() * 1000) - (1000 *
60 * 60 * 24 * 2))
GROUP

entry_hourly_v3, channel_meta and article_meta are partitioned tables.

*Your query has the following error(s):*

Error while compiling statement: FAILED: SemanticException [Error 10041]:
No partition predicate found for Alias "entry_hourly_v3" Table
"entry_hourly_v3"

I also tried putting views on the table and I had no luck.

Is there any way I can do this query without turning strict mode off?