Are values of a sequence deleted after an incrementation?

2017-02-22 Thread Cheyenne Forbes
I am not sure how sequences work in phoenix but something popped up in my
mind.

If I request 100 values from a sequence, will there be 100 values stored in
the database, or just one value, "100", telling Phoenix that this is the
number to be incremented?
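
A hedged sketch of how this works (table and sequence names below are made up for illustration): Phoenix keeps a single current value per sequence in its SYSTEM."SEQUENCE" table, and each client reserves a batch of values per round trip, sized by the CACHE clause (default 100). So asking for 100 values typically bumps one stored value once rather than storing 100 rows:

```sql
-- Illustrative only: my_seq and my_table are hypothetical names.
-- CACHE 100 lets each client reserve 100 values per server round trip;
-- the server still stores just one current value for the sequence.
CREATE SEQUENCE my_seq START WITH 1 INCREMENT BY 1 CACHE 100;

-- Consuming a value; the client hands out values from its cached batch.
UPSERT INTO my_table (id, kind) VALUES (NEXT VALUE FOR my_seq, 'example');
```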


Select date range performance issue

2017-02-22 Thread NaHeon Kim
Hi all,
I've seen a performance problem when selecting rows within a date range.

My table schema is:

CREATE TABLE MY_TABLE (
   OBJ_ID varchar(20) not null,
   CREATE_DT timestamp not null,
   KEYWORD varchar(100) not null,
   COUNT integer,
   CONSTRAINT PK PRIMARY KEY (OBJ_ID,CREATE_DT,KEYWORD)
);

MY_TABLE has almost 5,200,000 rows,
and CREATE_DT spans about 6 months.

And executed query:

SELECT KEYWORD, SUM(COUNT)
FROM MY_TABLE
WHERE CREATE_DT > to_timestamp('2016-03-01 00:00')
AND CREATE_DT < to_timestamp('2016-04-01 00:00')
GROUP BY KEYWORD;

It took 46 seconds, much slower than expected, since CREATE_DT is part of the
row key.
I created a secondary index on CREATE_DT, but there was no improvement.

Query plan looks weird:
CLIENT 2-CHUNK 0 ROWS 0 BYTES PARALLEL 2-WAY FULL SCAN OVER
NEWS_KEYWORD_COUNT
 SERVER FILTER BY (CREATE_DT > TIMESTAMP '2017-03-01 00:00:00.000' AND
CREATE_DT < TIMESTAMP '2017-04-01 00:00:00.000')
 SERVER AGGREGATE INTO DISTINCT ROWS BY [KEYWORD]
CLIENT MERGE SORT
CLIENT 100 ROW LIMIT

But if CREATE_DT comes first in the row key, the plan says a range scan will be done.
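
One thing worth checking (a sketch, not verified against this data): a plain index on CREATE_DT can be ignored by the optimizer because COUNT is not covered, so serving the query would require a join back to the data table. A covered index, or (hypothetically) a table whose row key leads with CREATE_DT, might let the date range become a range scan:

```sql
-- Covered index: COUNT is INCLUDEd so the aggregate can be served entirely
-- from the index (KEYWORD is already carried in the index row key, since
-- PK columns are appended automatically).
CREATE INDEX IDX_MY_TABLE_DT ON MY_TABLE (CREATE_DT) INCLUDE (COUNT);

-- Alternative (hypothetical table): lead the row key with CREATE_DT so
-- date-range predicates become a range scan on the data table itself.
CREATE TABLE MY_TABLE_BY_DT (
   CREATE_DT timestamp not null,
   OBJ_ID varchar(20) not null,
   KEYWORD varchar(100) not null,
   COUNT integer,
   CONSTRAINT PK PRIMARY KEY (CREATE_DT, OBJ_ID, KEYWORD)
);
```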

Any suggestion? : )

Thanks,
NaHeon


Re: Phoenix query performance

2017-02-22 Thread Arvind S
Why can't you reduce your query to:

  select msbo1.PARENTID
  from msbo_phoenix_comp_rowkey msbo1
  where msbo1.PARENTTYPE = 'SHIPMENT'
   and msbo1.OWNERORGID = 100
   and msbo1.MILESTONETYPEID != 19661
   and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
  group by msbo1.PARENTID
  order by msbo1.PARENTID


??

It looks like that's what you get from this left outer join.



*Cheers !!*
Arvind

On 22 February 2017 at 22:58, Pradheep Shanmugam <
pradheep.shanmu...@infor.com> wrote:

> Hi,
>
> We have an HBase cluster with 8 region servers with 20G memory.
> We have a table with 1 column family along with a secondary index.
> The following query took only a few milliseconds when we had less data (< 1
> million).
> After adding more data (~30M rows), performance declined; queries took about
> a minute or more (not stable).
>
> select msbo1.PARENTID
>   from msbo_phoenix_comp_rowkey msbo1
>   left outer join (
>  select PARENTID,MILESTONETYPEID
>from msbo_phoenix_comp_rowkey
>   where PARENTREFERENCETIME between 1479964000 and 1480464000
> and OWNERORGID = 100
> and PARENTTYPE = 'SHIPMENT'
> and MILESTONETYPEID = 19661
> group by PARENTID,MILESTONETYPEID
>  ) msbo2
>   on msbo1.PARENTID = msbo2.PARENTID
>   where msbo1.PARENTTYPE = 'SHIPMENT'
>and msbo1.OWNERORGID = 100
>and msbo2.MILESTONETYPEID is null
>and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
> group by msbo1.PARENTID
>   order by msbo1.PARENTID
>
> The RHS returns about 500K rows, the LHS about 18M rows; the final result is
> about 500K rows.
>
> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
>  is the index
>
> *Query plan:*
> CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_
> INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
> [0,'SHIPMENT',100]
> SERVER FILTER BY FIRST KEY ONLY AND 
> (TO_UNSIGNED_LONG("PARENTREFERENCETIME")
> >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
> SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
> CLIENT MERGE SORT
> PARALLEL LEFT-JOIN TABLE 0
> CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER
> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
> [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,
> 464,000]
> SERVER FILTER BY FIRST KEY ONLY
> SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID",
> "MILESTONETYPEID"]
> CLIENT MERGE SORT
> AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL
>
> Attached the phoenix log.
> I see the scanner caching is set to 100, and "maxResultSize":2097152. Is that
> something that can be tuned to help?
> Is the client merge sort consuming most of the time, and can it be improved? Is
> there any other tuning possible?
>
> Thanks,
> Pradheep
>


Re: Phoenix Query Server tenant_id

2017-02-22 Thread Josh Elser
Also, remember that Bash is going to interpret that semi-colon in your 
URL if you don't quote it. It will be treated as two separate commands:


`/usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py 
http://pqs1.mydomain.com:8765` and `TenantId=myTenant`


Please try `/usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py 
'http://pqs1.mydomain.com:8765;TenantId=myTenant'`, and, if that still 
does not work, feel free to open a JIRA issue to investigate why this is 
happening.
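
A quick way to see the splitting for yourself (illustrative; `echo` stands in for sqlline-thin.py):

```shell
#!/bin/sh
# Illustrative demo: 'echo' stands in for sqlline-thin.py to show how
# the shell splits the argument at an unquoted semicolon.

# Unquoted: ';' ends the command, so only the URL reaches echo, and
# 'TenantId=myTenant' runs separately as a harmless variable assignment.
unquoted=$(echo http://pqs1.mydomain.com:8765;TenantId=myTenant)

# Quoted: the entire string, properties included, is one argument.
quoted=$(echo 'http://pqs1.mydomain.com:8765;TenantId=myTenant')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```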


Michael Young wrote:

In response to your question:
 >>So, just that I'm on the same page as you, when you invoke the Java
application yourself, it works fine, but when you use sqlline-thin.py,
the extra argument is dropped?
Yes. The property gets passed when invoking the Java application, but
not when adding it to the sqlline-thin.py command line.
It is simple to fix in my local dev test environment, but not in our
production environment, unfortunately.
We are using Phoenix 4.7 (from HDP 2.5 Community release).

On Wed, Feb 22, 2017 at 4:07 PM, Josh Elser wrote:

So, just that I'm on the same page as you, when you invoke the Java
application yourself, it works fine, but when you use
sqlline-thin.py, the extra argument is dropped?

If so, that just sounds like a bug in sqlline-thin.py that we could
fix...

If you're skilled in Python (or even just adventurous), I'd guess it
would be a relatively simple fix :)

Michael Young wrote:

Hi Josh,

you are right, the TenantId property does get passed through
when I run
the command by hand (see below).

Phoenix Version: phoenix-4.7.0.2.5.0.0-1245

java $PHOENIX_OPTS \
-cp $CLASSPATH \

-Dlog4j.configuration=file:/usr/hdp/2.5.0.0-1245/phoenix/bin/log4j.properties
\
org.apache.phoenix.queryserver.client.SqllineWrapper \
-d org.apache.phoenix.queryserver.client.Driver \
-u
"jdbc:phoenix:thin:url=http://pqs1.mydomain.com:8765;serialization=PROTOBUF;TenantId=myTenant" \
-n none -p none --color=true --fastConnect=false --verbose=true
--incremental=false --isolation=TRANSACTION_READ_COMMITTED

However, with Phoenix version 4.7.0.2.5.0.0-1245 (which comes
with HDP
2.5), the python script doesn't pass any property along. Here's the
obfuscated output when I connect to PQS:

[prompt]$ /usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py
http://pqs1.mydomain.com:8765;TenantId=myTenant

Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect
jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF
none none org.apache.phoenix.queryserver.client.Driver
Connecting to
jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in

[jar:file:/usr/hdp/2.5.0.0-1245/phoenix/phoenix-4.7.0.2.5.0.0-1245-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in

[jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Phoenix (version unknown version)
Driver: Phoenix Remote JDBC Driver (version unknown version)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set
fastconnect
to true to skip)...
7603/7603 (100%) Done
Done
sqlline version 1.1.8
0: jdbc:phoenix:thin:url=http://pqs1.mydomain
> !list
1 active connection:
   #0  open
jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF


Is this something that has changed in newer versions of Phoenix?

On Mon, Feb 20, 2017 at 1:47 PM, Josh Elser wrote:

 See

https://github.com/apache/calcite/blob/5181563f9f26d1533a7d98ecca8443077e7b7efa/avatica/core/src/main/java/org/apache/calcite/avatica/remote/Service.java#L1759-L1768




Re: Phoenix Query Server tenant_id

2017-02-22 Thread Josh Elser
So, just that I'm on the same page as you, when you invoke the Java 
application yourself, it works fine, but when you use sqlline-thin.py, 
the extra argument is dropped?


If so, that just sounds like a bug in sqlline-thin.py that we could fix...

If you're skilled in Python (or even just adventurous), I'd guess it 
would be a relatively simple fix :)


Michael Young wrote:

Hi Josh,

you are right, the TenantId property does get passed through when I run
the command by hand (see below).

Phoenix Version: phoenix-4.7.0.2.5.0.0-1245

java $PHOENIX_OPTS \
-cp $CLASSPATH \
-Dlog4j.configuration=file:/usr/hdp/2.5.0.0-1245/phoenix/bin/log4j.properties
\
org.apache.phoenix.queryserver.client.SqllineWrapper \
-d org.apache.phoenix.queryserver.client.Driver \
-u
"jdbc:phoenix:thin:url=http://pqs1.mydomain.com:8765;serialization=PROTOBUF;TenantId=myTenant;
\
-n none -p none --color=true --fastConnect=false --verbose=true
--incremental=false --isolation=TRANSACTION_READ_COMMITTED

However, with Phoenix version 4.7.0.2.5.0.0-1245 (which comes with HDP
2.5), the python script doesn't pass any property along.   Here's the
obfuscated output when I connect to PQS:

[prompt]$ /usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py
http://pqs1.mydomain.com:8765;TenantId=myTenant

Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect
jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF
none none org.apache.phoenix.queryserver.client.Driver
Connecting to
jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/2.5.0.0-1245/phoenix/phoenix-4.7.0.2.5.0.0-1245-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Phoenix (version unknown version)
Driver: Phoenix Remote JDBC Driver (version unknown version)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect
to true to skip)...
7603/7603 (100%) Done
Done
sqlline version 1.1.8
0: jdbc:phoenix:thin:url=http://pqs1.mydomain> !list
1 active connection:
  #0  open
jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF


Is this something that has changed in newer versions of Phoenix?

On Mon, Feb 20, 2017 at 1:47 PM, Josh Elser wrote:

See

https://github.com/apache/calcite/blob/5181563f9f26d1533a7d98ecca8443077e7b7efa/avatica/core/src/main/java/org/apache/calcite/avatica/remote/Service.java#L1759-L1768



This should be passed down just fine. If you can provide details as
to how it isn't, that'd be great.


Josh Elser wrote:

I thought arbitrary properties would be passed through, but I'm
not sure
off the top of my head anymore

Would have to dig through the Avatica JDBC driver to (re)figure
this one
out.

Michael Young wrote:

Is it possible to pass the TenantID attribute on the URL
when using the
phoenix query server? For example,

/usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py
http://pqshost.myhost.com:8765;TenantId=tenant1

This works fine for me when connecting via jdbc. Just didn't
seem to
work with the query server.

Thanks,
-Michael






Re: Phoenix query performance

2017-02-22 Thread Maryann Xue
Hi Pradheep,

Thank you for the answers! Please see my response inline.


On Wed, Feb 22, 2017 at 12:39 PM, Pradheep Shanmugam <
pradheep.shanmu...@infor.com> wrote:

> Hi Maryann
>
> Please find my answers inline.
>
> Thanks,
> Pradheep
>
> From: Maryann Xue 
> Reply-To: "user@phoenix.apache.org" 
> Date: Wednesday, February 22, 2017 at 2:22 PM
> To: "user@phoenix.apache.org" 
> Subject: Re: Phoenix query performance
>
> Hi Pradheep,
>
> Thank you for posting the query and the log file! There are two things
> going on on the server side at the same time here. I think it'd be a good
> idea to isolate the problem first. So a few questions:
> 1. When you say data size went from "< 1M" to 30M, did the data from both
> LHS and RHS grow proportionately?
> *It is basically the same table..the query is like a self join..yes, you
> can say that it is proportional.*
>

Sorry that I didn't read the query well enough. Just went through it again,
but it looks to me that instead of a LEFT OUTER join, you are actually
trying to do an ANTI join, is that correct? i.e.,
SELECT PARENTID
FROM msbo_phoenix_comp_rowkey msbo1
WHERE <same filters> AND NOT EXISTS (
SELECT 1 FROM msbo_phoenix_comp_rowkey
WHERE <same filters> AND PARENTID = msbo1.PARENTID)
If the query can be rewritten to an ANTI join, the join operation can be
more efficient.
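
Spelled out with the filters taken from the original query (a sketch, untested against this schema), the ANTI-join form would be:

```sql
SELECT msbo1.PARENTID
FROM msbo_phoenix_comp_rowkey msbo1
WHERE msbo1.PARENTTYPE = 'SHIPMENT'
  AND msbo1.OWNERORGID = 100
  AND msbo1.PARENTREFERENCETIME BETWEEN 1479964000 AND 1480464000
  -- Anti-join: keep only parents with no milestone row of type 19661.
  AND NOT EXISTS (
    SELECT 1
    FROM msbo_phoenix_comp_rowkey msbo2
    WHERE msbo2.PARENTTYPE = 'SHIPMENT'
      AND msbo2.OWNERORGID = 100
      AND msbo2.MILESTONETYPEID = 19661
      AND msbo2.PARENTREFERENCETIME BETWEEN 1479964000 AND 1480464000
      AND msbo2.PARENTID = msbo1.PARENTID)
GROUP BY msbo1.PARENTID
ORDER BY msbo1.PARENTID;
```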


> 2. If yes to (1), what if we only increase the data in LHS, but keep it
> small for RHS? Would the query run significantly faster?
> *When RHS count is 420336, time taken is 37 seconds*
> *When RHS count is 63575, time taken is 32 seconds (not a significant
> difference)*
> 3. What if we only do group by on LHS? Would the query time be linear to
> the data size?
> After Removing group by on RHS
> *When RHS count is 420336, time taken is 34 seconds*
> *When RHS count is 63575, time taken is 32 seconds*
>

Just to confirm, are you saying that you removed GROUP BY and kept the
JOIN, and it's taking a long time? One more question, how long would it
take to further remove the JOIN?


> 4. How was GC when running the query?
> *About 12ms in 1 RS, 10ms in 1 RS, 4-5ms in a couple of them, and less than
> 1ms in the rest of the region servers when the query is running.*
>
> Thanks,
> Maryann
>


Re: Phoenix query performance

2017-02-22 Thread Pradheep Shanmugam
Hi Maryann

Please find my answers inline.

Thanks,
Pradheep

From: Maryann Xue
Reply-To: "user@phoenix.apache.org"
Date: Wednesday, February 22, 2017 at 2:22 PM
To: "user@phoenix.apache.org"
Subject: Re: Phoenix query performance

Hi Pradheep,

Thank you for posting the query and the log file! There are two things going on 
on the server side at the same time here. I think it'd be a good idea to 
isolate the problem first. So a few questions:
1. When you say data size went from "< 1M" to 30M, did the data from both LHS 
and RHS grow proportionately?
It is basically the same table; the query is like a self join. Yes, you can say
that it is proportional.
2. If yes to (1), what if we only increase the data in LHS, but keep it small 
for RHS? Would the query run significantly faster?
When RHS count is 420336, time taken is 37 seconds
When RHS count is 63575, time taken is 32 seconds (not a significant difference)
3. What if we only do group by on LHS? Would the query time be linear to the 
data size?
After Removing group by on RHS
When RHS count is 420336, time taken is 34 seconds
When RHS count is 63575, time taken is 32 seconds
4. How was GC when running the query?
About 12ms in 1 RS, 10ms in 1 RS, 4-5ms in a couple of them, and less than 1ms in
the rest of the region servers when the query is running.

Thanks,
Maryann




Re: Phoenix query performance

2017-02-22 Thread Maryann Xue
Hi Pradheep,

Thank you for posting the query and the log file! There are two things
going on on the server side at the same time here. I think it'd be a good
idea to isolate the problem first. So a few questions:
1. When you say data size went from "< 1M" to 30M, did the data from both
LHS and RHS grow proportionately?
2. If yes to (1), what if we only increase the data in LHS, but keep it
small for RHS? Would the query run significantly faster?
3. What if we only do group by on LHS? Would the query time be linear to
the data size?
4. How was GC when running the query?


Thanks,
Maryann
