DISTINCT as "Function" rather than statement - High Level Pig
-------------------------------------------------------------

                 Key: PIG-826
                 URL: https://issues.apache.org/jira/browse/PIG-826
             Project: Pig
          Issue Type: New Feature
            Reporter: David Ciemiewicz

In SQL, a user would think nothing of doing something like:

{code}
select
    COUNT(DISTINCT(user)) as user_count,
    COUNT(DISTINCT(country)) as country_count,
    COUNT(DISTINCT(url)) as url_count
from
    server_logs;
{code}

But in Pig, we'd need to do something like the following, and this is about the most compact version I could come up with.

{code}
Logs = load 'log' using PigStorage()
    as ( user: chararray, country: chararray, url: chararray);

DistinctUsers = distinct (foreach Logs generate user);
DistinctCountries = distinct (foreach Logs generate country);
DistinctUrls = distinct (foreach Logs generate url);

DistinctUsersCount = foreach (group DistinctUsers all) generate
    group, COUNT(DistinctUsers) as user_count;
DistinctCountriesCount = foreach (group DistinctCountries all) generate
    group, COUNT(DistinctCountries) as country_count;
DistinctUrlCount = foreach (group DistinctUrls all) generate
    group, COUNT(DistinctUrls) as url_count;

AllDistinctCounts = cross
    DistinctUsersCount, DistinctCountriesCount, DistinctUrlCount;

Report = foreach AllDistinctCounts generate
    DistinctUsersCount::user_count,
    DistinctCountriesCount::country_count,
    DistinctUrlCount::url_count;

store Report into 'log_report' using PigStorage();
{code}

It would be good if there were a higher level version of Pig that permitted code to be written as:

{code}
Logs = load 'log' using PigStorage()
    as ( user: chararray, country: chararray, url: chararray);

Report = overall Logs generate
    COUNT(DISTINCT(user)) as user_count,
    COUNT(DISTINCT(country)) as country_count,
    COUNT(DISTINCT(url)) as url_count;

store Report into 'log_report' using PigStorage();
{code}

I do want this in Pig and not as SQL. I'd expect High Level Pig to generate Lower Level Pig.
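
To make the "High Level Pig generates Lower Level Pig" idea concrete, here is a minimal Python sketch of the expansion. It is only an illustration: the function name `expand_distinct_counts` and the generated alias names (`Distinct_user`, `Distinct_user_Count`, and so on) are hypothetical and not part of Pig. It emits the same distinct / group all / cross pattern as the hand-written script in this issue, as plain text.

```python
def expand_distinct_counts(relation, columns, output="Report"):
    """Expand COUNT(DISTINCT(col)) per column into lower-level Pig Latin text.

    Hypothetical helper for illustration only; it is not part of Pig.
    """
    lines = []
    count_aliases = []  # (alias of the one-row count relation, count field name)
    for col in columns:
        distinct_alias = f"Distinct_{col}"
        count_alias = f"Distinct_{col}_Count"
        # One distinct projection per column.
        lines.append(
            f"{distinct_alias} = distinct (foreach {relation} generate {col});"
        )
        # Collapse each distinct relation to a single row holding its count.
        lines.append(
            f"{count_alias} = foreach (group {distinct_alias} all) "
            f"generate COUNT({distinct_alias}) as {col}_count;"
        )
        count_aliases.append((count_alias, f"{col}_count"))

    if len(count_aliases) == 1:
        # A single count needs no cross.
        alias, field = count_aliases[0]
        lines.append(f"{output} = foreach {alias} generate {field};")
    else:
        # Cross the one-row relations to get all counts on one row.
        crossed = ", ".join(alias for alias, _ in count_aliases)
        lines.append(f"AllDistinctCounts = cross {crossed};")
        projection = ", ".join(f"{a}::{f}" for a, f in count_aliases)
        lines.append(f"{output} = foreach AllDistinctCounts generate {projection};")
    return "\n".join(lines)


script = expand_distinct_counts("Logs", ["user", "country", "url"])
print(script)
```

Calling it with `"Logs"` and the three column names prints eight Pig statements: two per column, then the `cross`, then the final projection into `Report`. The `load` and `store` statements are left to the caller.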