I have two tables:
pages( title, domain, url )
top_domains(domain)

top_domains was created from a group by domain operation on the pages table.

Because the pages table is very large, I only want to sample 5 rows
for each domain in top_domains.

In a traditional programming language, I could just use a for loop to
iterate over the domain field and run a select with a limit 5 clause
for each domain.
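
To make the loop concrete, here is a minimal Python sketch of what I mean.
It uses an in-memory SQLite database purely as a stand-in for Hive (an
assumption for illustration only), and the demo rows and the helper names
sample_per_domain and build_demo_db are made up for this example.

```python
import sqlite3


def sample_per_domain(conn, per_domain=5):
    """Return {domain: [(title, url), ...]} with at most per_domain rows each."""
    samples = {}
    # Outer loop: one iteration per domain listed in top_domains.
    domains = [row[0] for row in conn.execute("SELECT domain FROM top_domains")]
    for domain in domains:
        # Inner query: the "select with a limit 5" step for this domain.
        rows = conn.execute(
            "SELECT title, url FROM pages WHERE domain = ? LIMIT ?",
            (domain, per_domain),
        ).fetchall()
        samples[domain] = rows
    return samples


def build_demo_db():
    """Create a tiny in-memory database with hypothetical demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, domain TEXT, url TEXT)")
    conn.execute("CREATE TABLE top_domains (domain TEXT)")
    pages = [
        (f"page {i}", domain, f"http://{domain}/{i}")
        for domain, count in (("a.example", 8), ("b.example", 3))
        for i in range(count)
    ]
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)
    # top_domains is derived from pages with a group by on domain.
    conn.execute(
        "INSERT INTO top_domains SELECT domain FROM pages GROUP BY domain"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for domain, rows in sample_per_domain(conn).items():
        print(domain, len(rows))
```

This issues one query per domain, which is exactly what I would like to
avoid on a table this large.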

Is there a way to express this query in Hive?


-
@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com
