[ https://issues.apache.org/jira/browse/PHOENIX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230961#comment-14230961 ]
Lars Hofhansl commented on PHOENIX-1498:
----------------------------------------

I think KEEP_DELETED_CELLS is fairly misunderstood :)
With this on, you are telling HBase to keep deleted cells around unless they are expunged by some other means (VERSIONS or TTL).

In the example, rows are inserted, then deleted, then *new* rows are inserted (not new versions of existing rows). With that setup the old deleted rows will never be removed (you told HBase not to do that, and the deleted rows are the 1st version of the rowkey/column in question), and hence HBase now has to skip all the delete markers and deleted rows.

+1 on defaulting KEEP_DELETED_CELLS to false going forward. Anything else will be surprising.

> Turn KEEP_DELETED_CELLS off by default
> --------------------------------------
>
>                 Key: PHOENIX-1498
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1498
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 5.0.0
>            Reporter: Jeffrey Zhong
>
> Phoenix tables are created with "KEEP_DELETED_CELLS" enabled by default; this
> is only used to allow flashback queries to work correctly. Flashback queries
> aren't used often in the field, and we found that query performance
> degraded with the option on. This is likely an HBase scan issue though (will
> create a JIRA once we have more info).
> Anyway, keeping deleted cells adds a performance penalty and it's not used
> often. Therefore, I'm suggesting to turn it off by default.
> We have a test where a table is loaded with 5m rows and then some are
> deleted/reinserted.
> The count(*) performance became worse & worse:
> {code}
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (33.273 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (174.771 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (458.251 seconds)
> {code}
> I think we can provide a table property in CREATE TABLE & ALTER TABLE
> statement for people to enable KEEP_DELETED_CELLS if there is a need but by
> default it should be turned off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
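
To make the retention behavior described in the comment concrete, here is a toy Python model. It is not HBase code, only a sketch under stated assumptions: `Cell`, `major_compact`, `simulate` and the `reuse_rowkeys` switch are hypothetical names, a delete marker is reduced to a flag on the cell, TTL is left out, and VERSIONS is assumed to be 1. It only illustrates why inserting *new* rowkeys after a delete leaves the old deleted cells in place when KEEP_DELETED_CELLS is on.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    row: str
    ts: int
    deleted: bool = False  # toy stand-in for "covered by a delete marker"


def major_compact(cells, keep_deleted_cells, max_versions=1):
    """Toy compaction: return the cells that survive (TTL not modeled)."""
    kept, versions_seen = [], {}
    # Walk each rowkey newest-first so the version rank is easy to track.
    for cell in sorted(cells, key=lambda c: (c.row, -c.ts)):
        rank = versions_seen.get(cell.row, 0)
        versions_seen[cell.row] = rank + 1
        if rank >= max_versions:
            continue  # expunged by VERSIONS, deleted or not
        if cell.deleted and not keep_deleted_cells:
            continue  # option off: deleted cells are dropped
        kept.append(cell)
    return kept


def simulate(rounds, rows_per_round, keep_deleted_cells, reuse_rowkeys=False):
    """Each round: delete all rows, insert a fresh batch, then compact."""
    store, ts = [], 0
    for r in range(rounds):
        for cell in store:
            cell.deleted = True  # delete everything written so far
        ts += 1
        prefix = "" if reuse_rowkeys else f"round{r}-"
        store += [Cell(f"{prefix}row{i}", ts) for i in range(rows_per_round)]
        store = major_compact(store, keep_deleted_cells)
    live = sum(not c.deleted for c in store)
    # (rows a COUNT would report, cells a scan has to walk over)
    return live, len(store)
```

In this model, `simulate(3, 5, True)` returns `(5, 15)`: the count stays at 5 while the number of cells to skip grows every round, because each deleted cell is the only version of its rowkey. `simulate(3, 5, False)` returns `(5, 5)`, and so does `simulate(3, 5, True, reuse_rowkeys=True)`, where the new versions push the old ones out via VERSIONS.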