Use rollup and cube.

On Fri, Aug 24, 2018 at 7:55 PM 崔苗 <cuim...@danale.com> wrote:
> -------- Forwarding messages --------
> From: "崔苗" <cuim...@danale.com>
> Date: 2018-08-25 10:54:31
> To: d...@spark.apache.org
> Subject: multiple group by action
>
> Hi,
> we have some user data with columns (userId, company, client, country,
> region, city). Now we want to count userId by multiple columns, such as:
>
> select count(distinct userId) group by company
> select count(distinct userId) group by company,client
> select count(distinct userId) group by company,client,country
> select count(distinct userId) group by company,client,country,region
> etc.
>
> Each action brings its own shuffle stage. Since the columns
> (company, client) contain the column company, is there a way to reduce
> the number of shuffle stages?
>
> Thanks for any replies.
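
For reference, a minimal sketch of what the ROLLUP suggestion could look like for the four queries quoted above. The table name `user_data` is an assumption (the original message does not name the table); the columns are the ones listed in the question. Whether this actually reduces shuffle cost for a given data set is not something the thread confirms, so treat it as a starting point to check against the query plan.

```sql
-- Hypothetical table name `user_data`; columns are taken from the question.
-- ROLLUP over (company, client, country, region) produces one result set
-- covering these grouping levels:
--   (company, client, country, region)
--   (company, client, country)
--   (company, client)
--   (company)
--   ()  -- grand total
SELECT company,
       client,
       country,
       region,
       COUNT(DISTINCT userId) AS distinct_users
FROM user_data
GROUP BY company, client, country, region WITH ROLLUP;
```

In the result, a row whose trailing grouping columns are NULL is the subtotal for the coarser level, for example a row with only `company` set corresponds to `group by company`. CUBE would instead produce every combination of the listed columns, which is more than the four prefixes asked for here.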