Can you please raise a JIRA and attach the required files? I can try to reproduce it.
Rahul

On Jun 3, 2017 6:19 AM, "Stefán Baxter" <ste...@activitystream.com> wrote:

> Hi,
>
> I have a sample data set (a few million records) that is saved to Parquet
> in 2 ways: a simple file structure with primary types to store dimensions
> and metrics (String, Double), and one using nested maps (String,String and
> String,Double) respectively.
>
> Querying the data set with the simple types only:
>
> select roundTimeStamp(s.occurred_at,'PT1H') as `at`, sum(metrics_price) as
> price, sum(metrics_kwh) as kwh from
> dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> group by roundTimeStamp(s.occurred_at,'PT1H')
>
> takes: *28.442* sec. (dev. laptop x 1)
>
> Same query against the nested structure:
>
> select roundTimeStamp(s.occurred_at,'PT1H') as `at`, sum(s.metrics.price)
> as price, sum(s.metricss.kwh) as kwh from
> dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> group by roundTimeStamp(s.occurred_at,'PT1H')
>
> takes: *719.810* sec.
>
> Even counting the number of records takes very, very long if there is a
> nested structure involved. (select count(*) from)
>
> It does not behave like this on our production servers (1.8), but I have not
> run this particular test on them (their performance has never been an
> issue).
>
> I have these sample files available if anyone wishes to reproduce this
> consistently.
>
> Regards,
> -Stefán
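
For the JIRA, it may also help to attach the query plan next to the sample files. A minimal sketch, assuming Drill's `EXPLAIN PLAN FOR` is run from the same session and workspace; the path, the `roundTimeStamp` function and the column names are copied from the quoted message and not verified here, and the query is trimmed to a single nested metric for brevity:

```sql
-- Sketch only: shows the physical plan Drill picks for the nested-map query.
-- Path, function and column names are taken as posted in the quoted message.
explain plan for
select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
       sum(s.metrics.price) as price
from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
group by roundTimeStamp(s.occurred_at,'PT1H');
```

Running the same statement against the flat-column query and comparing the two plans might show where the scan differs between the two layouts.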