Re: Query TimeOut on Azure HDInsight

2017-02-10 Thread Sumanta Gh
Thanks. I got an idea how distinct works and probably would be able to solve 
this issue now.
 

Regards
Sumanta

 -Juvenn Woo  wrote: -

 ===
 To: user@phoenix.apache.org
 From: Juvenn Woo 
 Date: 02/10/2017 09:08PM 
 Subject: Re: Query TimeOut on Azure HDInsight
 ===
   Sumanta,

Actually DISTINCT makes big difference, it may require scan as many rows as
possible to find 10 (limit 10) distinct rows. If your COL1 has less than 10
distinct value, it'll scan whole table to know that there are less than
that.

On Feb 10, 2017 11:25 PM, "Sumanta Gh"  wrote:

> If we remove DISTINCT from the below query, everything works fine.
> Any pointer why DISTINCT could fail?
>
>
> Regards
> Sumanta
>
>
>  -Mark Heppner  wrote: -
>
>  ===
>  To: user@phoenix.apache.org
>  From: Mark Heppner 
>  Date: 02/10/2017 08:02PM
>  Subject: Re: Query TimeOut on Azure HDInsight
>  ===
>Sumanta,
> Doing the full scan over 100 million rows is going to be costly. How many
> region servers do you have? If this is a common query, you could add a
> secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to
> increase hbase.rpc.timeout to something higher than 6 and maybe even
> phoenix.query.timeoutMs. I'm sure there are other optimizations too, but
> I'll let someone else answer that.
>
> On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh  wrote:
>
> > Hi,
> > We have a production system on Azure HDInsight.
> > There is a table called TABLE1 which has approx 100 million rows.
> >
> > Recently the following query is always timing out -
> >
> > *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;*
> >
> > java.lang.RuntimeException: org.apache.phoenix.exception.
> PhoenixIOException:
> > org.apache.phoenix.exception.PhoenixIOException: Failed after
> > attempts=36, exceptions:
> > Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException:
> > callTimeout=6, callDuration=72705: row '?  ?' on table 'TABLE1' at
> > region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0.,
> > hostname=workernode1.xx.d1.internal.cloudapp.net,
> 60020,1483615853438,
> > seqNum=173240701
> >
> >
> > The explain plan is -
> > +--+
> > |   PLAN   |
> > +--+
> > | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] |
> > | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS |
> > | CLIENT MERGE SORT|
> > | CLIENT 10 ROW LIMIT  |
> > +--+
> >
> >
> > How can we make this above query successful? Kindly reply urgently.
> >
> > Regards
> > Sumanta
> >
> > =-=-=
> > Notice: The information contained in this e-mail
> > message and/or attachments to it may contain
> > confidential or privileged information. If you are
> > not the intended recipient, any dissemination, use,
> > review, distribution, printing or copying of the
> > information contained in this e-mail message
> > and/or attachments to it are strictly prohibited. If
> > you have received this communication in error,
> > please notify us by reply e-mail or telephone and
> > immediately and permanently delete the message
> > and any attachments. Thank you
> >
> >
>
>
> --
> Mark Heppner
>
>



Re: Query TimeOut on Azure HDInsight

2017-02-10 Thread Sumanta Gh
If we remove DISTINCT from the below query, everything works fine. 
Any pointer why DISTINCT could fail?
 

Regards
Sumanta


 -Mark Heppner  wrote: -

 ===
 To: user@phoenix.apache.org
 From: Mark Heppner 
 Date: 02/10/2017 08:02PM 
 Subject: Re: Query TimeOut on Azure HDInsight
 ===
   Sumanta,
Doing the full scan over 100 million rows is going to be costly. How many
region servers do you have? If this is a common query, you could add a
secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to
increase hbase.rpc.timeout to something higher than 6 and maybe even
phoenix.query.timeoutMs. I'm sure there are other optimizations too, but
I'll let someone else answer that.

On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh  wrote:

> Hi,
> We have a production system on Azure HDInsight.
> There is a table called TABLE1 which has approx 100 million rows.
>
> Recently the following query is always timing out -
>
> *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;*
>
> java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException:
> org.apache.phoenix.exception.PhoenixIOException: Failed after
> attempts=36, exceptions:
> Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=72705: row '?  ?' on table 'TABLE1' at
> region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0.,
> hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438,
> seqNum=173240701
>
>
> The explain plan is -
> +--+
> |   PLAN   |
> +--+
> | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS |
> | CLIENT MERGE SORT|
> | CLIENT 10 ROW LIMIT  |
> +--+
>
>
> How can we make this above query successful? Kindly reply urgently.
>
> Regards
> Sumanta
>
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
>
>


-- 
Mark Heppner



Re: Query TimeOut on Azure HDInsight

2017-02-10 Thread Ted Yu
Sumanta:
bq. at region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0.,

Please check the log of region server which hosted the above region around
the time of your query.

Which Phoenix / hbase release are you using ?

Thanks

On Fri, Feb 10, 2017 at 6:31 AM, Mark Heppner 
wrote:

> Sumanta,
> Doing the full scan over 100 million rows is going to be costly. How many
> region servers do you have? If this is a common query, you could add a
> secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to
> increase hbase.rpc.timeout to something higher than 6 and maybe even
> phoenix.query.timeoutMs. I'm sure there are other optimizations too, but
> I'll let someone else answer that.
>
> On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh  wrote:
>
>> Hi,
>> We have a production system on Azure HDInsight.
>> There is a table called TABLE1 which has approx 100 million rows.
>>
>> Recently the following query is always timing out -
>>
>> *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;*
>>
>> java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException:
>> org.apache.phoenix.exception.PhoenixIOException: Failed after
>> attempts=36, exceptions:
>> Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException:
>> callTimeout=6, callDuration=72705: row '?  ?' on table 'TABLE1' at
>> region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0.,
>> hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438,
>> seqNum=173240701
>>
>>
>> The explain plan is -
>> +--+
>> |   PLAN   |
>> +--+
>> | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] |
>> | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS |
>> | CLIENT MERGE SORT|
>> | CLIENT 10 ROW LIMIT  |
>> +--+
>>
>>
>> How can we make this above query successful? Kindly reply urgently.
>>
>> Regards
>> Sumanta
>>
>> =-=-=
>> Notice: The information contained in this e-mail
>> message and/or attachments to it may contain
>> confidential or privileged information. If you are
>> not the intended recipient, any dissemination, use,
>> review, distribution, printing or copying of the
>> information contained in this e-mail message
>> and/or attachments to it are strictly prohibited. If
>> you have received this communication in error,
>> please notify us by reply e-mail or telephone and
>> immediately and permanently delete the message
>> and any attachments. Thank you
>>
>>
>
>
> --
> Mark Heppner
>


Re: Query TimeOut on Azure HDInsight

2017-02-10 Thread Mark Heppner
Sumanta,
Doing the full scan over 100 million rows is going to be costly. How many
region servers do you have? If this is a common query, you could add a
secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to
increase hbase.rpc.timeout to something higher than 6 and maybe even
phoenix.query.timeoutMs. I'm sure there are other optimizations too, but
I'll let someone else answer that.

On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh  wrote:

> Hi,
> We have a production system on Azure HDInsight.
> There is a table called TABLE1 which has approx 100 million rows.
>
> Recently the following query is always timing out -
>
> *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;*
>
> java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException:
> org.apache.phoenix.exception.PhoenixIOException: Failed after
> attempts=36, exceptions:
> Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=72705: row '?  ?' on table 'TABLE1' at
> region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0.,
> hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438,
> seqNum=173240701
>
>
> The explain plan is -
> +--+
> |   PLAN   |
> +--+
> | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS |
> | CLIENT MERGE SORT|
> | CLIENT 10 ROW LIMIT  |
> +--+
>
>
> How can we make this above query successful? Kindly reply urgently.
>
> Regards
> Sumanta
>
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
>
>


-- 
Mark Heppner


Query TimeOut on Azure HDInsight

2017-02-10 Thread Sumanta Gh
Hi,
We have a production system on Azure HDInsight.
There is a table called TABLE1 which has approx 100 million rows.

Recently the following query is always timing out - 

SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;

java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: 
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, 
exceptions:
Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException: 
callTimeout=6, callDuration=72705: row '?  ?' on table 'TABLE1' at 
region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0.,
hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438, 
seqNum=173240701


The explain plan is -
+--+
|   PLAN   |
+--+
| CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS |
| CLIENT MERGE SORT|
| CLIENT 10 ROW LIMIT  |
+--+


How can we make this above query successful? Kindly reply urgently.

Regards
Sumanta 
=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you