[ https://issues.apache.org/jira/browse/PHOENIX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237093#comment-14237093 ]

James Taylor commented on PHOENIX-1498:
---------------------------------------

bq. This is a good point. But we also have to set VERSIONS=unlimited; otherwise 
it won't work for some data tables anyway. 
We set VERSIONS=1000 which should be plenty for any realistic case. We could 
make it configurable if necessary (but it'd only be applied once when the 
system tables are created).

bq. In addition, do we need to enable it for SYSTEM.STATS, where we should use 
the latest info since region splits are happening all the time?
We allow it to be versioned when the user is controlling the timestamps and 
doing flashback queries. It's definitely a corner case. If we find this 
problematic from a perf perspective, we can change it.
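
For context, a flashback query is one where the client pins the connection to a 
past timestamp. A minimal sketch of that setup, assuming the standard Phoenix 
"CurrentSCN" connection property; the class name, helper name, timestamp, and 
JDBC URL are illustrative only:

```java
import java.util.Properties;

class FlashbackProps {
    // Phoenix connection property that pins reads to a point in time.
    static final String CURRENT_SCN = "CurrentSCN";

    // Hypothetical helper: build connection properties for reading "as of" a timestamp.
    static Properties asOf(long timestampMillis) {
        Properties props = new Properties();
        props.setProperty(CURRENT_SCN, Long.toString(timestampMillis));
        return props;
    }

    public static void main(String[] args) {
        Properties props = asOf(1417392000000L); // arbitrary example timestamp
        // With the Phoenix driver on the classpath, one would then open
        // DriverManager.getConnection("jdbc:phoenix:localhost", props)
        // (URL is an assumption) and run ordinary SELECTs against it.
        System.out.println(CURRENT_SCN + "=" + props.getProperty(CURRENT_SCN));
    }
}
```

Rows deleted after the pinned timestamp are only visible to such a query if the 
table keeps deleted cells, which is why the setting matters here.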

So if you turn KEEP_DELETED_CELLS on for these system tables, are the changes 
to the base test classes still necessary? My preference would be to not change 
those unless there's a good reason.



> Turn KEEP_DELETED_CELLS off by default
> --------------------------------------
>
>                 Key: PHOENIX-1498
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1498
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 5.0.0
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: PHOENIX-1498-v2.patch, PHOENIX-1498.patch
>
>
> Phoenix tables are created with "KEEP_DELETED_CELLS" enabled by default. The 
> option is only used to allow flashback queries to work correctly. Flashback 
> queries aren't used often in the field, and we found that query performance 
> degraded with the option on. This is likely an HBase scan issue, though (I will 
> create a JIRA once I have more info). 
> In any case, keeping deleted cells adds a performance penalty and the feature 
> is not used often. Therefore, I'm suggesting we turn it off by default. 
> We have a test where a table is loaded with more than 5 million rows and then 
> some are deleted/reinserted. COUNT(*) performance got progressively worse:
> {code}
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (33.273 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (174.771 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (458.251 seconds)
> {code}
> I think we can provide a table property in the CREATE TABLE & ALTER TABLE 
> statements for people to enable KEEP_DELETED_CELLS if there is a need, but by 
> default it should be turned off.
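
A rough sketch of what the proposed opt-in could look like from client code, 
assuming KEEP_DELETED_CELLS is passed through as a table option in the same 
way as VERSIONS. The table name, columns, and helper names are hypothetical, 
and the exact property syntax is an assumption rather than something this 
issue has settled:

```java
class KeepDeletedCellsDdl {
    // Hypothetical helper: DDL for a table that opts in to keeping deleted cells.
    static String createTable(String table, boolean keepDeletedCells, int versions) {
        return "CREATE TABLE " + table
            + " (k VARCHAR PRIMARY KEY, v VARCHAR)"
            + " KEEP_DELETED_CELLS=" + keepDeletedCells
            + ", VERSIONS=" + versions;
    }

    // Hypothetical helper: DDL to flip the option on an existing table.
    static String alterTable(String table, boolean keepDeletedCells) {
        return "ALTER TABLE " + table + " SET KEEP_DELETED_CELLS=" + keepDeletedCells;
    }

    public static void main(String[] args) {
        // The strings would be executed through a Phoenix JDBC Statement.
        System.out.println(createTable("MY_TABLE", true, 1000));
        System.out.println(alterTable("MY_TABLE", false));
    }
}
```

Tables that never run flashback queries would simply omit the option and get 
the proposed default of off.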



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
