Sven Haster created ZEPPELIN-1106:
-------------------------------------
Summary: Unique count / count distinct mode for 'Values' in
notebook
Key: ZEPPELIN-1106
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1106
Project: Zeppelin
Issue Type: Improvement
Components: Core
Affects Versions: 0.5.6
Reporter: Sven Haster
While making a notebook in Zeppelin, it would be nice if, apart from the
'count' option (and 'sum', 'avg', etc.), there were also a 'count distinct'
option for the field in 'Values'.
For example, if I have a table Employee with the columns (id, department,
status, active, startyear), I would like to make a graph displaying the number
of unique departments for each status/startyear combination (i.e., the number of
departments that hired one or more employees with that status in that year).
I could of course write a quick query (select startyear, status,
count(distinct department) as count_dep from employee group by startyear,
status) to produce the data, put startyear in 'Keys' and status in 'Groups',
pick 'sum(count_dep)' for the Value, and it works.
However, if I then want the number of unique departments per startyear (i.e.,
the number of departments that hired someone in that year), I can't just remove
status from the 'Groups' field. The value would then be the sum of the distinct
counts per startyear/status combination, so a department that hired employees
with different statuses in the same year would be counted more than once.
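To make the double counting concrete, here is a small Python sketch using a made-up Employee table (the rows are illustrative, not real data). It mimics the grouped query above, then sums count_dep per startyear the way 'sum(count_dep)' would once status is removed from 'Groups', and compares that with the true number of distinct departments per startyear.

```python
from collections import defaultdict

# Hypothetical sample rows: (id, department, status, active, startyear).
employees = [
    (1, "sales", "fulltime", True, 2015),
    (2, "sales", "parttime", True, 2015),
    (3, "hr", "fulltime", True, 2015),
    (4, "it", "fulltime", False, 2016),
]

# Equivalent of: select startyear, status, count(distinct department) as count_dep
#                from employee group by startyear, status
depts_by_year_status = defaultdict(set)
for _id, dept, status, _active, year in employees:
    depts_by_year_status[(year, status)].add(dept)
count_dep = {key: len(depts) for key, depts in depts_by_year_status.items()}

# What 'sum(count_dep)' shows once status is removed from 'Groups'.
summed = defaultdict(int)
for (year, _status), n in count_dep.items():
    summed[year] += n

# The correct number of distinct departments per startyear.
depts_by_year = defaultdict(set)
for _id, dept, _status, _active, year in employees:
    depts_by_year[year].add(dept)
distinct = {year: len(depts) for year, depts in depts_by_year.items()}

print(dict(summed))  # {2015: 3, 2016: 1}
print(distinct)      # {2015: 2, 2016: 1}
```

In 2015, "sales" hired both a full-time and a part-time employee, so it appears in two startyear/status groups and is counted twice by the sum.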
I would like to have a 'count distinct' option (for display purposes maybe
'uqcount'?) so that I could simply run 'select * from employee', put startyear
in the 'Keys' field and status in the 'Groups' field, choose count distinct of
department in the 'Values' field, and have the graph display the number of
unique departments with one or more employees for each startyear/status
combination.
Then, if I want just the number of departments that hired an employee in each
startyear, I would remove status from the 'Groups' field and the graph would
update with the correct numbers.
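A rough Python sketch of the requested behaviour follows. The function name aggregate_count_distinct and its parameters are hypothetical and do not reflect Zeppelin's actual pivot code. The key idea is that a 'count distinct' aggregator works from the raw rows for whatever Keys and Groups are currently selected, so removing status from 'Groups' recomputes the distinct count rather than summing partial counts.

```python
from collections import defaultdict


def aggregate_count_distinct(rows, keys, groups, value):
    """Count distinct values of `value` for each (keys, groups) combination.

    rows:   list of dicts, one per result row (e.g. from 'select * from employee')
    keys:   column names placed in the 'Keys' field
    groups: column names placed in the 'Groups' field
    value:  column whose distinct values are counted
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[k] for k in keys)
        group = tuple(row[g] for g in groups)
        seen[(key, group)].add(row[value])
    return {combo: len(values) for combo, values in seen.items()}


# Hypothetical rows from 'select * from employee'.
rows = [
    {"id": 1, "department": "sales", "status": "fulltime", "startyear": 2015},
    {"id": 2, "department": "sales", "status": "parttime", "startyear": 2015},
    {"id": 3, "department": "hr", "status": "fulltime", "startyear": 2015},
    {"id": 4, "department": "it", "status": "fulltime", "startyear": 2016},
]

# startyear in 'Keys', status in 'Groups'.
by_year_status = aggregate_count_distinct(rows, ["startyear"], ["status"], "department")
# status removed from 'Groups': recomputed from raw rows, so no double counting.
by_year = aggregate_count_distinct(rows, ["startyear"], [], "department")
print(by_year)  # {((2015,), ()): 2, ((2016,), ()): 1}
```

Because distinct counts are not additive across groups, this kind of aggregator has to keep (or re-scan) the underlying values rather than combine already-aggregated numbers, which is what separates it from 'sum' or 'count'.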
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)