As Maryann said, secondary index would help a lot in your case. Beside that
if your HBase cluster consists of at-least couple of region servers and the
data is distributed, then stats
<http://phoenix.apache.org/update_statistics.html> available in latest
Phoenix release should get better performance.

On Tue, Nov 25, 2014 at 12:30 PM, Maryann Xue <maryann....@gmail.com> wrote:

> Your query time should be the sum of running the following two individual
> queries:
> 1. SELECT MAX("Records"."timestamp") FROM "History" where "rowId" like ?
> 2. select "Records"."Operation", "Records"."status", "Records"."timestamp"
> from "History" where "Records"."timestamp"=?
>
> Without secondary indexing on column "timestamp", the first query will be
> full-scan with aggregation and the second will be a full-scan with a filter
> on the "timestamp" column.
>
> Generally the higher parallel level (which depends on how many nodes you
> have in your cluster and how many regions you have for the target table)
> the better performance you get.
>
> In this case, you could also have an secondary index on "timestamp" column
> to avoid a full-table scan for the second query.
>
>
> Thanks,
> Maryann
>
> On Tue, Nov 25, 2014 at 1:26 AM, Ahmed Hussien <aahussi...@gmail.com>
> wrote:
>
> > Good Day to you guys,
> >
> > I have a db that contains about 33 million records per table (4 tables).
> > when i execute the following queries it takes to much time to retrieve
> > about 5 miniutes per query.
> >
> > 1- select "Records"."Operation", "Records"."status",
> "Records"."timestamp"
> > from "History" where "Records"."timestamp"=(SELECT
> > MAX("Records"."timestamp") FROM "History" where "rowId" like ?)
> >
> > can you help me adjust the performance?
> > ---
> > Ahmed
> >
> >
>
>
> --
> Thanks,
> Maryann
>

Reply via email to