Hi Rahul,

Sure, but can I perhaps get the files to you directly?

Regards,
 -Stefán

On Sat, Jun 3, 2017 at 8:13 PM, rahul challapalli <
challapallira...@gmail.com> wrote:

> Can you please raise a jira and attach the required files? I can try to
> reproduce it.
>
> Rahul
>
> On Jun 3, 2017 6:19 AM, "Stefán Baxter" <ste...@activitystream.com> wrote:
>
> > Hi,
> >
> > I have a sample data set (a few million records) that is saved to Parquet
> > in 2 ways: a simple file structure with primitive types to store dimensions
> > and metrics (String, Double), and one using nested maps (String,String and
> > String,Double) respectively.
> >
> > Querying the data set with the simple types only:
> >
> > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> >   sum(metrics_price) as price, sum(metrics_kwh) as kwh
> > from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > group by roundTimeStamp(s.occurred_at,'PT1H')
> >
> >
> > takes: *28.442* sec. (dev. laptop x 1)
> >
> >
> > Same query against the nested structure:
> >
> > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> >   sum(s.metrics.price) as price, sum(s.metrics.kwh) as kwh
> > from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > group by roundTimeStamp(s.occurred_at,'PT1H')
> >
> > takes: *719.810* sec.
> >
> > Even counting the number of records takes a very long time if there is a
> > nested structure involved. (select count(*) from)
> > It does not behave like this on our production servers (1.8), but I have not
> > run this particular test on them (their performance has never been an
> > issue).
> > I have these sample files available if anyone wishes to reproduce this
> > consistently.
> >
> > Regards,
> >  -Stefán
> >
>
