Sven Haster created ZEPPELIN-1106:
-------------------------------------

             Summary: Unique count / count distinct mode for 'Values' in notebook
                 Key: ZEPPELIN-1106
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1106
             Project: Zeppelin
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.5.6
            Reporter: Sven Haster

When building a notebook in Zeppelin, it would be nice if, apart from the 'count' option (and 'sum', 'avg', etc.), there were also a 'count distinct' option for the 'Values' field.

For example, if I have a table Employee with the columns (id, department, status, active, startyear), I would like to make a graph displaying the number of unique departments for each status/startyear combination (i.e., the number of departments that hired one or more employees in that year, for each status). I could of course write a quick query (select startyear, status, count(distinct(department)) as count_dep from employee group by startyear, status) to display the data, put startyear in 'Keys' and status in 'Groups', pick 'sum(count_dep)' for the Value, and it works.

However, if I then want the number of unique departments per startyear (i.e., the number of departments that hired someone in that year), I can't just remove status from the 'Groups' field. The value would then be the sum of the distinct counts per startyear/status combination, so a department that hired under more than one status in the same year would be counted more than once.

I would like to have a 'count distinct' option (for display purposes maybe 'uqcount'?) so that I could simply run the query 'select * from employee', put startyear in the 'Keys' field, status in the 'Groups' field and count distinct(department) in the 'Values' field, and have it display the number of unique departments with one or more employees for the given startyear/status combinations. Then, if I want just the number of departments that hired an employee in each startyear, I would remove status from the 'Groups' field and the graph would update with the correct numbers.
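
The double counting described above can be reproduced outside Zeppelin. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for whatever SQL interpreter the notebook uses, the sample rows are hypothetical, and the summing loop merely imitates what the chart does with 'sum(count_dep)' once status is removed from 'Groups'. It does not call any Zeppelin API.

```python
import sqlite3

# Hypothetical sample rows for the Employee table from the report:
# (id, department, status, active, startyear)
rows = [
    (1, "sales", "fulltime", 1, 2015),
    (2, "sales", "parttime", 1, 2015),
    (3, "it", "fulltime", 1, 2015),
    (4, "it", "fulltime", 0, 2016),
    (5, "hr", "parttime", 1, 2016),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee ("
    "id integer, department text, status text, active integer, startyear integer)"
)
conn.executemany("insert into employee values (?, ?, ?, ?, ?)", rows)

# The pre-aggregated query from the report: distinct departments
# per startyear/status combination.
per_group = conn.execute(
    "select startyear, status, count(distinct(department)) as count_dep "
    "from employee group by startyear, status"
).fetchall()

# Imitate the chart with status removed from 'Groups' and
# 'sum(count_dep)' as the value: the per-status counts are added up.
summed = {}
for startyear, _status, count_dep in per_group:
    summed[startyear] = summed.get(startyear, 0) + count_dep

# What a 'count distinct' aggregation over the raw rows would show.
distinct = dict(
    conn.execute(
        "select startyear, count(distinct(department)) "
        "from employee group by startyear"
    ).fetchall()
)

print(sorted(summed.items()))    # [(2015, 3), (2016, 2)]
print(sorted(distinct.items()))  # [(2015, 2), (2016, 2)]
```

In this sample, 'sales' hired both a fulltime and a parttime employee in 2015, so the summed value reports 3 departments for that year while only 2 distinct departments exist. For 2016 the two numbers happen to agree because no department appears under more than one status.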