Re: Query TimeOut on Azure HDInsight
Thanks. I got an idea how distinct works and probably would be able to solve this issue now. Regards Sumanta -Juvenn Woowrote: - === To: user@phoenix.apache.org From: Juvenn Woo Date: 02/10/2017 09:08PM Subject: Re: Query TimeOut on Azure HDInsight === Sumanta, Actually DISTINCT makes big difference, it may require scan as many rows as possible to find 10 (limit 10) distinct rows. If your COL1 has less than 10 distinct value, it'll scan whole table to know that there are less than that. On Feb 10, 2017 11:25 PM, "Sumanta Gh" wrote: > If we remove DISTINCT from the below query, everything works fine. > Any pointer why DISTINCT could fail? > > > Regards > Sumanta > > > -Mark Heppner wrote: - > > === > To: user@phoenix.apache.org > From: Mark Heppner > Date: 02/10/2017 08:02PM > Subject: Re: Query TimeOut on Azure HDInsight > === >Sumanta, > Doing the full scan over 100 million rows is going to be costly. How many > region servers do you have? If this is a common query, you could add a > secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to > increase hbase.rpc.timeout to something higher than 6 and maybe even > phoenix.query.timeoutMs. I'm sure there are other optimizations too, but > I'll let someone else answer that. > > On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh wrote: > > > Hi, > > We have a production system on Azure HDInsight. > > There is a table called TABLE1 which has approx 100 million rows. > > > > Recently the following query is always timing out - > > > > *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;* > > > > java.lang.RuntimeException: org.apache.phoenix.exception. > PhoenixIOException: > > org.apache.phoenix.exception.PhoenixIOException: Failed after > > attempts=36, exceptions: > > Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException: > > callTimeout=6, callDuration=72705: row '? ?' on table 'TABLE1' at > > region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0., > > hostname=workernode1.xx.d1.internal.cloudapp.net, > 60020,1483615853438, > > seqNum=173240701 > > > > > > The explain plan is - > > +--+ > > | PLAN | > > +--+ > > | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] | > > | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS | > > | CLIENT MERGE SORT| > > | CLIENT 10 ROW LIMIT | > > +--+ > > > > > > How can we make this above query successful? Kindly reply urgently. > > > > Regards > > Sumanta > > > > =-=-= > > Notice: The information contained in this e-mail > > message and/or attachments to it may contain > > confidential or privileged information. If you are > > not the intended recipient, any dissemination, use, > > review, distribution, printing or copying of the > > information contained in this e-mail message > > and/or attachments to it are strictly prohibited. If > > you have received this communication in error, > > please notify us by reply e-mail or telephone and > > immediately and permanently delete the message > > and any attachments. Thank you > > > > > > > -- > Mark Heppner > >
Re: Query TimeOut on Azure HDInsight
If we remove DISTINCT from the below query, everything works fine. Any pointer why DISTINCT could fail? Regards Sumanta -Mark Heppnerwrote: - === To: user@phoenix.apache.org From: Mark Heppner Date: 02/10/2017 08:02PM Subject: Re: Query TimeOut on Azure HDInsight === Sumanta, Doing the full scan over 100 million rows is going to be costly. How many region servers do you have? If this is a common query, you could add a secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to increase hbase.rpc.timeout to something higher than 6 and maybe even phoenix.query.timeoutMs. I'm sure there are other optimizations too, but I'll let someone else answer that. On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh wrote: > Hi, > We have a production system on Azure HDInsight. > There is a table called TABLE1 which has approx 100 million rows. > > Recently the following query is always timing out - > > *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;* > > java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: > org.apache.phoenix.exception.PhoenixIOException: Failed after > attempts=36, exceptions: > Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException: > callTimeout=6, callDuration=72705: row '? ?' on table 'TABLE1' at > region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0., > hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438, > seqNum=173240701 > > > The explain plan is - > +--+ > | PLAN | > +--+ > | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS | > | CLIENT MERGE SORT| > | CLIENT 10 ROW LIMIT | > +--+ > > > How can we make this above query successful? Kindly reply urgently. > > Regards > Sumanta > > =-=-= > Notice: The information contained in this e-mail > message and/or attachments to it may contain > confidential or privileged information. If you are > not the intended recipient, any dissemination, use, > review, distribution, printing or copying of the > information contained in this e-mail message > and/or attachments to it are strictly prohibited. If > you have received this communication in error, > please notify us by reply e-mail or telephone and > immediately and permanently delete the message > and any attachments. Thank you > > -- Mark Heppner
Re: Query TimeOut on Azure HDInsight
Sumanta: bq. at region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0., Please check the log of region server which hosted the above region around the time of your query. Which Phoenix / hbase release are you using ? Thanks On Fri, Feb 10, 2017 at 6:31 AM, Mark Heppnerwrote: > Sumanta, > Doing the full scan over 100 million rows is going to be costly. How many > region servers do you have? If this is a common query, you could add a > secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to > increase hbase.rpc.timeout to something higher than 6 and maybe even > phoenix.query.timeoutMs. I'm sure there are other optimizations too, but > I'll let someone else answer that. > > On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Gh wrote: > >> Hi, >> We have a production system on Azure HDInsight. >> There is a table called TABLE1 which has approx 100 million rows. >> >> Recently the following query is always timing out - >> >> *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;* >> >> java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: >> org.apache.phoenix.exception.PhoenixIOException: Failed after >> attempts=36, exceptions: >> Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException: >> callTimeout=6, callDuration=72705: row '? ?' on table 'TABLE1' at >> region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0., >> hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438, >> seqNum=173240701 >> >> >> The explain plan is - >> +--+ >> | PLAN | >> +--+ >> | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] | >> | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS | >> | CLIENT MERGE SORT| >> | CLIENT 10 ROW LIMIT | >> +--+ >> >> >> How can we make this above query successful? Kindly reply urgently. >> >> Regards >> Sumanta >> >> =-=-= >> Notice: The information contained in this e-mail >> message and/or attachments to it may contain >> confidential or privileged information. If you are >> not the intended recipient, any dissemination, use, >> review, distribution, printing or copying of the >> information contained in this e-mail message >> and/or attachments to it are strictly prohibited. If >> you have received this communication in error, >> please notify us by reply e-mail or telephone and >> immediately and permanently delete the message >> and any attachments. Thank you >> >> > > > -- > Mark Heppner >
Re: Query TimeOut on Azure HDInsight
Sumanta, Doing the full scan over 100 million rows is going to be costly. How many region servers do you have? If this is a common query, you could add a secondary index on COL1 and INCLUDE(COLX). Otherwise, you'll have to increase hbase.rpc.timeout to something higher than 6 and maybe even phoenix.query.timeoutMs. I'm sure there are other optimizations too, but I'll let someone else answer that. On Fri, Feb 10, 2017 at 7:40 AM, Sumanta Ghwrote: > Hi, > We have a production system on Azure HDInsight. > There is a table called TABLE1 which has approx 100 million rows. > > Recently the following query is always timing out - > > *SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10;* > > java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: > org.apache.phoenix.exception.PhoenixIOException: Failed after > attempts=36, exceptions: > Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException: > callTimeout=6, callDuration=72705: row '? ?' on table 'TABLE1' at > region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0., > hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438, > seqNum=173240701 > > > The explain plan is - > +--+ > | PLAN | > +--+ > | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS | > | CLIENT MERGE SORT| > | CLIENT 10 ROW LIMIT | > +--+ > > > How can we make this above query successful? Kindly reply urgently. > > Regards > Sumanta > > =-=-= > Notice: The information contained in this e-mail > message and/or attachments to it may contain > confidential or privileged information. If you are > not the intended recipient, any dissemination, use, > review, distribution, printing or copying of the > information contained in this e-mail message > and/or attachments to it are strictly prohibited. If > you have received this communication in error, > please notify us by reply e-mail or telephone and > immediately and permanently delete the message > and any attachments. Thank you > > -- Mark Heppner
Query TimeOut on Azure HDInsight
Hi, We have a production system on Azure HDInsight. There is a table called TABLE1 which has approx 100 million rows. Recently the following query is always timing out - SELECT DISTINCT COLX FROM TABLE1 WHERE COL1=1 LIMIT 10; java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Fri Feb 10 12:06:14 GMT 2017, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=72705: row '? ?' on table 'TABLE1' at region=TABLE1,,1450429763940.e30cec826e39df2e3b21e0baa6e1d9c0., hostname=workernode1.xx.d1.internal.cloudapp.net,60020,1483615853438, seqNum=173240701 The explain plan is - +--+ | PLAN | +--+ | CLIENT 47-CHUNK PARALLEL 47-WAY RANGE SCAN OVER TABLE1 [1] | | SERVER AGGREGATE INTO DISTINCT ROWS BY [COLX] LIMIT 10 GROUPS | | CLIENT MERGE SORT| | CLIENT 10 ROW LIMIT | +--+ How can we make this above query successful? Kindly reply urgently. Regards Sumanta =-=-= Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you