Re: [ANN] Hivemall: Hive scalable machine learning library

2013-10-11 Thread Makoto YUI

Hi,

I added support for state-of-the-art classifiers (not yet supported in
Mahout), as well as Hivemall's cute(!?) logo, in Hivemall 0.1-rc3.


Newly supported classifiers include
- Confidence Weighted (CW)
- Adaptive Regularization of Weight Vectors (AROW)
- Soft Confidence Weighted (SCW1, SCW2)

These classifiers are much smarter than the standard SGD-based or
passive-aggressive classifiers. Please check them out for yourself.


Thanks,
Makoto

(2013/10/11 4:28), Clark Yang (杨卓荦) wrote:

It looks really cool; I think I will try it out.

Cheers,
Zhuoluo (Clark) Yang


2013/10/5 Makoto YUI yuin...@gmail.com

Hi Edward,

Thank you for your interest.

The Hivemall project does not plan to have a dedicated mailing list;
I will answer further questions/comments on Twitter or through GitHub
issues (with a question label).

BTW, I just added a CTR (Click-Through-Rate) prediction example that is
provided by a commercial search engine provider for the KDDCup 2012
track 2.

https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset

I guess many of you are working on ad CTR/CVR prediction. This example
might help you understand how to do it entirely within Hive.

Thanks,
Makoto @myui


(2013/10/04 23:02), Edward Capriolo wrote:

Looks cool, I'm already starting to play with it.

On Friday, October 4, 2013, Makoto Yui yuin...@gmail.com wrote:
   Hi Dean,
  
   Thank you for your interest in Hivemall.
  
   Twitter's paper actually influenced me in developing
Hivemall and I
   initially implemented such functionality as Pig UDFs.
  
   Though my Pig ML library is not released, you can find a similar
   attempt for Pig in
   https://github.com/y-tag/java-pig-MyUDFs
  
   Thanks,
   Makoto
  
   2013/10/3 Dean Wampler deanwamp...@gmail.com:

   This is great news! I know that Twitter has done something
similar
with UDFs
   for Pig, as described in this paper:
  

http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf

  
   I'm glad to see the same thing start with Hive.
  
   Dean
  
  
   On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI yuin...@gmail.com wrote:
  
   Hello all,
  
   My employer, AIST, has given the thumbs up to open source
our machine
   learning library, named Hivemall.
  
   Hivemall is a scalable machine learning library running on
Hive/Hadoop,
   licensed under the LGPL 2.1.
  
   https://github.com/myui/hivemall
  
   Hivemall provides machine learning functionality as well
as feature
   engineering functions through UDFs/UDAFs/UDTFs of Hive. It
is designed
   to be scalable to the number of training instances as well
as the
number
   of training features.
  
   Hivemall is very easy to use as every machine learning
step is done
   within HiveQL.
  
   -- Installation is just as follows:
   add jar /tmp/hivemall.jar;
   source /tmp/define-all.hive;
  
   -- Logistic regression is performed by a query.
   SELECT
 feature,
 avg(weight) as weight
   FROM
(SELECT logress(features,label) as (feature,weight) FROM
   training_features) t
   GROUP BY feature;
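Once the model table of (feature, weight) pairs is built, prediction is just another join. The following is only a sketch, assuming the test features have been exploded into (rowid, feature, value) rows; the table and column names are made up for illustration, not taken from the Hivemall wiki.

```sql
-- Hypothetical sketch of prediction with the learned weights.
-- Assumes test data exploded as (rowid, feature, value); table and
-- column names here are illustrative only.
SELECT
  t.rowid,
  sum(m.weight * t.value) as margin
FROM
  test_exploded t
  JOIN model m ON (t.feature = m.feature)
GROUP BY t.rowid;
```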
  
   You can find detailed examples on our wiki pages.
   https://github.com/myui/hivemall/wiki/_pages
  
   Though we consider Hivemall much easier to use and more scalable
   than Mahout for classification/regression tasks, please check it
   for yourself. If you have a Hive environment, you can evaluate
   Hivemall within 5 minutes or so.
  
   Hope you enjoy the 

NullPointerException on Sample Tables / CDH 4.4

2013-10-11 Thread fab wol
hey everyone,

I've been supplied with a decent ten-node CDH 4.4 cluster, only 7 days old,
and someone tried some HBase stuff on it. I wanted to apply my workflows
(which work on another cluster, consisting of HiveQL scripts and Oozie
workflows) to that cluster, but unfortunately I hit the following issue:
when issuing any SELECT statement against any table (sample tables or
tables belonging to my workflow), I get the following error stack (from
Beeswax and from the Hive CLI):

java.io.IOException:
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:334)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1245)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)


I already tried setting up a new (blank) table and inserting some
dummy data, but the error stays the same. Some connection between HDFS
and Hive regarding block information seems to be broken. Any idea how
to fix this, where the relevant configuration is, or what other things
we should check?

Cheers

Fabian


Re: NullPointerException on Sample Tables / CDH 4.4

2013-10-11 Thread fab wol
Somehow, after three days of searching, I just deployed the client
configuration for all Hive roles again, and the error seems to be gone.
Let's see what the future brings.

cheers





Re: [ANN] Hivemall: Hive scalable machine learning library

2013-10-11 Thread Nitin Pawar
Just tried this on some hot trends in forum management. It was pretty
impressive.

I will try this more deeply and, if possible, integrate it into my product.

Thanks for the awesome work.

Nitin



Re: NPE org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable

2013-10-11 Thread xinyan Yang
Development environment: Hive 0.11, Hadoop 1.0.3


2013/10/11 xinyan Yang moon.yan...@gmail.com

 Hi,
 When I run this SQL, it fails. Can anyone give me some advice?

 
 select e.udid as udid,e.app_id as app_id
 from acorn_3g.ClientChannelDefine cc
 join (
 select udid,app_id,from_id
 from (
  select u.device_id as udid,u.app_id as app_id,g.device_id as
 3gdid,u.from_id as from_id from acorn_3g.user_device_info u
 left outer join (select device_id from acorn_3g.3g_device_id where
 log_date'2013-09-15') g
  on u.device_id=g.device_id where u.log_date='2013-09-15' and
 u.from_id0 and u.type=1) f1
 where 3gdid is null ) e
 on(e.from_id=cc.from_id)

 

 error info:
 Task with the most failures(4):
 -
 Task ID:
   task_201305281414_236693_m_01

 URL:

 http://YZSJHL18-22.opi.com:50030/taskdetails.jsp?jobid=job_201305281414_236693&tipid=task_201305281414_236693_m_01
 -
 Diagnostic Messages for this Task:
 java.lang.RuntimeException:
 org.apache.hadoop.hive.ql.metadata.HiveException:
 java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
 java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
 at
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
 at
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1377)
 at
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
 at
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
 at
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:611)
 at
 org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 Caused by: java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:186)
 ... 14 more


 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched:
 Job 0: Map: 343  Reduce: 2   Cumulative CPU: 3478.61 sec   HDFS Read:
 1862106687 HDFS Write: 3838425 SUCCESS
 Job 1: Map: 2   HDFS Read: 0 HDFS Write: 0 FAIL




Re: NPE org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable

2013-10-11 Thread Yin Huai
Hello Xinyan,

Can you attach the query plan (the output of EXPLAIN)? I think a bad plan
caused the error.

Also, can you try Hive trunk? It looks like this is a bug that was fixed
after the 0.11 release.
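To get the plan Yin asks for, prefix the failing query with EXPLAIN (or EXPLAIN EXTENDED for a more detailed plan) and attach the output; the subquery is abbreviated here for brevity:

```sql
-- Print the query plan instead of executing the query.
EXPLAIN EXTENDED
select e.udid as udid, e.app_id as app_id
from acorn_3g.ClientChannelDefine cc
join ( ... ) e          -- the original subquery, abbreviated here
on (e.from_id = cc.from_id);
```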

Thanks,

Yin




hive partition pruning on joining on partition column

2013-10-11 Thread java8964 java8964
I have a requirement I am trying to support in Hive; I am not sure if it
is doable.

I have Hadoop 1.1.1 with Hive 0.9.0 (using Derby as the metastore).

I partition my data by a dt column, so my table 'foo' has partitions like
'dt=2013-07-01' through 'dt=2013-07-30'.

Now the user wants to query the data of Saturdays only.

To make it flexible, instead of asking the end user to find out which
dates in that month are Saturdays, I added a lookup table (call it 'bar')
in Hive with the following columns:

year, month, day, dt_format, week_of_day

So I want to see if I can join foo with bar and still get partition
pruning:

select *
from foo
join bar
on (bar.year=2013 and bar.month=7 and bar.week_of_day=6
    and bar.dt_format = foo.dt)

I tried several ways, like switching the table order, joining with a
subquery, etc., but none of them makes partition pruning work on table
foo in this case.

Is this really achievable in Hive?

Thanks
Yong

Re: hive partition pruning on joining on partition column

2013-10-11 Thread Nitin Pawar
The easiest way to do this is to create a table where each date maps to
week of month, week of year, day of week, and day of month, and then
join on just the date, putting the conditions in the WHERE clause.

In my understanding, it is easy to manipulate the date column that way:
you join just on the date and filter the results with WHERE conditions.

PS: this is what we currently do where we have to run continuous rollup
analytics for year-to-date or parameter-to-date calculations.
Wait for others to give you better solutions.
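A sketch of that suggestion follows; the table and column names are made up for illustration. Note that in Hive 0.9 partition pruning on a value coming from a join may still not kick in, so the surest route is to resolve the Saturday dates first and pass them as literal partition filters.

```sql
-- Hypothetical calendar lookup table, one row per date.
CREATE TABLE calendar (
  dt STRING,            -- same format as foo's partition column
  day_of_week INT,      -- e.g. 1 = Sunday ... 7 = Saturday
  week_of_year INT,
  day_of_month INT
);

-- Join on the date and filter in WHERE:
SELECT f.*
FROM foo f
JOIN calendar c ON (f.dt = c.dt)
WHERE c.day_of_week = 7;

-- If the optimizer still scans all partitions, resolve the dates
-- first and inline them as static partition predicates, e.g.:
-- SELECT * FROM foo WHERE dt IN ('2013-07-06', '2013-07-13', ...);
```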





-- 
Nitin Pawar