hive query to select top 10 product of each subcategory and select most recent product info

2014-04-11 Thread Mohit Durgapal
I have a Hive table partitioned by date. It contains e-commerce data in the format siteid, sitecatid, catid, subcatgid, pid, pname, pprice, pmrp, pdesc. What I need to do is run a query on this table in Hive for the top 10 products (by count) in each subcategory. What adds a bit more complexity is

Re: hive query to select top 10 product of each subcategory and select most recent product info

2014-04-11 Thread Nitin Pawar
Maybe you can share your table DDL, your query, and the output you are looking for? On Fri, Apr 11, 2014 at 12:26 PM, Mohit Durgapal durgapalmo...@gmail.com wrote: I have a hive table partitioned by dates. It contains ecomm data in the format

Re: hive query to select top 10 product of each subcategory and select most recent product info

2014-04-11 Thread Mohit Durgapal
Hi Nitin, The DDL is as follows: CREATE EXTERNAL TABLE user_logs( users_iduuid string, siteid int, site_catid int, stext string, catg int, // CATEGORY scatg int, // SUBCATEGORY catgname string, scatgname string, brand string, // PRODUCT BRAND NAME prrange

Re: hive query to select top 10 product of each subcategory and select most recent product info

2014-04-11 Thread Nitin Pawar
Would it be a good idea to just get the top 10 ranked products by whatever your ranking is based on, and then join them with their metadata (self join or any other way)? On Fri, Apr 11, 2014 at 1:52 PM, Mohit Durgapal durgapalmo...@gmail.com wrote: Hi Nitin, The ddl is as follows: CREATE EXTERNAL
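
The ranking step suggested here could be sketched roughly as follows. This is only an illustration, not a query from the thread: it assumes Hive 0.11 or later (which added windowing functions), a hypothetical table name product_events, and the column names subcatgid and pid from the original question. "Top" is taken to mean the number of rows logged per product.

```sql
-- Hypothetical table: product_events(subcatgid, pid, ...), partitioned by date.
-- Step 1 (innermost): count rows per product within each subcategory.
-- Step 2 (middle): number the products inside each subcategory by that count.
-- Step 3 (outer): keep only the first 10 per subcategory.
SELECT subcatgid, pid, cnt
FROM (
  SELECT subcatgid, pid, cnt,
         ROW_NUMBER() OVER (PARTITION BY subcatgid ORDER BY cnt DESC) AS rn
  FROM (
    SELECT subcatgid, pid, COUNT(*) AS cnt
    FROM product_events
    GROUP BY subcatgid, pid
  ) counts
) ranked
WHERE rn <= 10;
```

ROW_NUMBER() breaks ties arbitrarily, so exactly 10 rows come back per subcategory that has at least 10 products. If tied products should all be kept, RANK() could be swapped in, at the cost of possibly returning more than 10 rows.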

Re: hive query to select top 10 product of each subcategory and select most recent product info

2014-04-11 Thread Adrian Hains
I think you need to separate out the logic that does your group by aggregations from the logic of then retrieving all of the other columns for a single row from that set. Something like: select tbl.myKeyColumn1, tbl.myKeyColumn2, tbl.otherValueColumn1,
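
One way to sketch this separation, under stated assumptions: the aggregation result is taken to already exist in a hypothetical table top_products(subcatgid, pid, cnt), the raw data lives in a hypothetical table product_events, and dt is an assumed name for the date partition column. The descriptive columns (pname, pprice, pmrp, pdesc) come from the original question. Windowing functions require Hive 0.11 or later.

```sql
-- Hypothetical inputs:
--   top_products(subcatgid, pid, cnt)  : output of the group-by/ranking step
--   product_events(pid, pname, pprice, pmrp, pdesc, ...) partitioned by dt
-- The inner query keeps one row per pid, the one with the latest dt.
-- The outer join attaches those columns to the aggregated rows.
SELECT t.subcatgid, t.pid, t.cnt,
       m.pname, m.pprice, m.pmrp, m.pdesc
FROM top_products t
JOIN (
  SELECT pid, pname, pprice, pmrp, pdesc
  FROM (
    SELECT pid, pname, pprice, pmrp, pdesc,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY dt DESC) AS rn
    FROM product_events
  ) latest
  WHERE rn = 1
) m ON t.pid = m.pid;
```

If several rows for a product share the latest dt, ROW_NUMBER() picks one of them arbitrarily. Adding a finer-grained timestamp to the ORDER BY, if the table has one, would make the choice deterministic.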