Sven Haster created ZEPPELIN-1106:
-------------------------------------

             Summary: Unique count / count distinct mode for 'Values' in notebook
                 Key: ZEPPELIN-1106
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1106
             Project: Zeppelin
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.5.6
            Reporter: Sven Haster


When building a notebook in Zeppelin, it would be nice if, alongside the 
'count' option (and 'sum', 'avg', etc.), there were also a 'count distinct' 
option for the 'Values' field.

For example, if I have a table Employee filled with (id, department, status, 
active, startyear), I would like to make a graph displaying the number of unique 
departments for each status/startyear combination (i.e., the number of 
departments that hired one or more employees in that year for each status).

I could of course write a quick query (select startyear, status, 
count(distinct(department)) as count_dep from employee group by startyear, 
status) to display the data, put startyear in 'Keys' and status in 'Groups', 
pick 'sum(count_dep)' for the Value, and it works.

However, if I then want the number of unique departments per startyear (i.e., 
the number of departments that hired someone in that year), I can't just remove 
status from the 'Groups' field. The value would then be the sum of the distinct 
counts per startyear/status combination, so a department that hired under more 
than one status in the same year would be counted more than once.
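
The overcount can be reproduced with a small sketch. The sample rows below are 
made up for illustration, and SQLite (driven from Python) merely stands in for 
whatever interpreter backs the notebook; neither comes from Zeppelin itself.

```python
import sqlite3

# Hypothetical sample data; column names follow the Employee table above.
rows = [
    (1, "sales", "fulltime", 1, 2015),
    (2, "sales", "parttime", 1, 2015),
    (3, "engineering", "fulltime", 1, 2015),
    (4, "engineering", "fulltime", 1, 2016),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER, department TEXT, status TEXT, "
    "active INTEGER, startyear INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)

# The pre-aggregated workaround query from the description.
per_status = conn.execute(
    "SELECT startyear, status, COUNT(DISTINCT department) AS count_dep "
    "FROM employee GROUP BY startyear, status"
).fetchall()

# What 'sum(count_dep)' produces once status is dropped from 'Groups'.
summed = {}
for startyear, _status, count_dep in per_status:
    summed[startyear] = summed.get(startyear, 0) + count_dep

# The number actually wanted: distinct departments per startyear.
distinct = dict(
    conn.execute(
        "SELECT startyear, COUNT(DISTINCT department) "
        "FROM employee GROUP BY startyear"
    ).fetchall()
)

print(summed)    # {2015: 3, 2016: 1}
print(distinct)  # {2015: 2, 2016: 1}
```

For 2015 the summed value is 3 although only two departments hired anyone, 
because 'sales' appears under both statuses.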

I would like to have a 'count distinct' option (for display purposes maybe 
'uqcount'?) so that I could simply run the query 'select * from employee', put 
startyear in the 'Keys' field, status in the 'Groups' field and 'count 
distinct(department)' in the 'Values' field, and have it display the number of 
unique departments with one or more employees for the given startyear/status 
combinations.

And then, if I want just the number of departments that hired an employee in 
each startyear, I would remove status from the 'Groups' field and the graph 
would update with the correct numbers.
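
A minimal sketch of the requested aggregation, assuming it runs over the raw 
rows of 'select * from employee'. The function name count_distinct, its 
parameters and the sample rows are hypothetical and only illustrate the idea; 
this is not Zeppelin's actual pivot code.

```python
from collections import defaultdict


def count_distinct(rows, value, keys, groups=()):
    """Count distinct `value` entries per (keys + groups) combination."""
    seen = defaultdict(set)
    for row in rows:
        # One bucket per combination of the selected key and group columns.
        bucket = tuple(row[name] for name in (*keys, *groups))
        seen[bucket].add(row[value])
    return {bucket: len(values) for bucket, values in seen.items()}


# Hypothetical raw rows, as returned by 'select * from employee'.
employees = [
    {"department": "sales", "status": "fulltime", "startyear": 2015},
    {"department": "sales", "status": "parttime", "startyear": 2015},
    {"department": "engineering", "status": "fulltime", "startyear": 2015},
    {"department": "engineering", "status": "fulltime", "startyear": 2016},
]

# startyear in 'Keys', status in 'Groups'.
by_year_and_status = count_distinct(
    employees, "department", ["startyear"], ["status"]
)

# status removed from 'Groups': recomputed from the raw rows, no overcount.
by_year = count_distinct(employees, "department", ["startyear"])

print(by_year_and_status)
print(by_year)
```

Because the distinct sets are rebuilt from the raw rows each time the 'Keys' 
or 'Groups' selection changes, removing a group never sums overlapping counts.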



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
