[ https://issues.apache.org/jira/browse/PHOENIX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230858#comment-14230858 ]

Jeffrey Zhong commented on PHOENIX-1498:
----------------------------------------

We tried several major compactions, and they didn't help when
KEEP_DELETED_CELLS=true because the deleted row keys are mostly unique. When
KEEP_DELETED_CELLS=false, major compaction removes all deleted cells. In HBase
0.98, "versions=1" is already the default.
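To illustrate why repeated major compactions did not help, here is a toy Python model. It is a hypothetical sketch, not HBase code, and it assumes a simplified rule: with KEEP_DELETED_CELLS=false a major compaction drops deleted cells, while with KEEP_DELETED_CELLS=true it keeps the newest `max_versions` cells per row key whether or not they are deleted. Because each deleted row key is unique, it holds only one cell, so VERSIONS=1 never evicts it. The names `Cell`, `major_compact`, `build_workload`, `scan_cost`, and `live_count` are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified model of one HBase column family.
# Not HBase code: it only shows which cells survive a major compaction
# under the two KEEP_DELETED_CELLS settings (delete markers not modelled).


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int
    deleted: bool  # True if a delete marker masks this cell


def major_compact(cells, keep_deleted_cells, max_versions=1):
    """Return the cells this simplified major compaction would retain."""
    by_row = {}
    for c in cells:
        by_row.setdefault(c.row, []).append(c)
    kept = []
    for row_cells in by_row.values():
        # Newest versions first.
        row_cells.sort(key=lambda c: c.ts, reverse=True)
        if not keep_deleted_cells:
            # Deleted cells are purged outright.
            row_cells = [c for c in row_cells if not c.deleted]
        # Retained deleted cells still count toward max_versions.
        kept.extend(row_cells[:max_versions])
    return kept


def build_workload(n_rows, n_deleted):
    """Load n_rows, delete the first n_deleted, reinsert them under new keys."""
    cells = [Cell(row=f"r{i}", ts=1, deleted=(i < n_deleted)) for i in range(n_rows)]
    cells += [Cell(row=f"r{n_rows + j}", ts=2, deleted=False) for j in range(n_deleted)]
    return cells


def scan_cost(cells):
    """Cells a scan must read: live ones plus any retained deleted ones."""
    return len(cells)


def live_count(cells):
    """Cells a COUNT(*) actually returns."""
    return sum(1 for c in cells if not c.deleted)


if __name__ == "__main__":
    cells = build_workload(1000, 300)
    for keep in (False, True):
        kept = major_compact(cells, keep_deleted_cells=keep)
        # keep=False -> live 1000, scanned 1000
        # keep=True  -> live 1000, scanned 1300
        print(f"KEEP_DELETED_CELLS={keep}: live={live_count(kept)} scanned={scan_cost(kept)}")
```

In this model the query result is identical either way, but with KEEP_DELETED_CELLS=true every deleted-and-reinserted row leaves an extra cell that each scan must still read and skip.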

For normal backup and restore, users typically rely on snapshots, export, or replication.



> Turn KEEP_DELETED_CELLS off by default
> --------------------------------------
>
>                 Key: PHOENIX-1498
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1498
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 5.0.0
>            Reporter: Jeffrey Zhong
>
> Phoenix tables are created with KEEP_DELETED_CELLS enabled by default. This
> is only needed for flashback queries to work correctly. However, flashback
> queries are rarely used in the field, and we found that query performance
> degrades with the option on. This is likely an HBase scan issue (I will
> create a JIRA once I have more info).
> In any case, keeping deleted cells adds a performance penalty and is rarely
> needed. Therefore, I suggest turning it off by default.
> We have a test where a table is loaded with more than 5M rows and then some
> rows are deleted and reinserted. COUNT(*) performance became progressively worse:
> {code}
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (33.273 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (174.771 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (458.251 seconds)
> {code}
> I think we can provide a table property in CREATE TABLE and ALTER TABLE
> statements so that users can enable KEEP_DELETED_CELLS when needed, but by
> default it should be turned off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
