[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-03-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590601#comment-13590601
 ] 

Jonathan Ellis commented on CASSANDRA-5182:
---

bq. So overall I do like the last patch attached by Yuki. Of course, the 
solution of just saying you shouldn't disable bloom filters on workloads that 
perform deletes works too, and I wouldn't oppose it, but it doesn't have my 
preference because I'm always a bit afraid of solving an issue by saying "don't 
do this", as it usually ends up in people getting bitten first and hearing they 
shouldn't have done it second.

The problem is that it's not as simple as "people get bitten if we don't call 
getPosition, and don't if we do" -- they get bitten either way, and IMO the 
bite from getPosition is worse, since it will destroy compaction performance 
for any workload where the index doesn't fit entirely in RAM, which makes 
disabling BF almost useless.  But if we say "only disable BF where you're not 
doing deletes", it has a legitimate if narrow use case.

 Deletable rows are sometimes not removed during compaction
 --

 Key: CASSANDRA-5182
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5182
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
Reporter: Binh Van Nguyen
Assignee: Yuki Morishita
 Fix For: 1.2.3

 Attachments: 5182-1.1.txt, 5182-1.2.txt, test_ttl.tar.gz


 Our use case is write heavy and read seldom.  To optimize the space used, 
 we've set bloom_filter_fp_ratio=1.0.  That, along with the fact that each 
 row is only written one time and that there are more than 20 SSTables, 
 keeps the rows from ever being compacted. Here is the code:
 https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/db/compaction/CompactionController.java#L162
 We hit this corner case, and because of it C* keeps consuming more and more 
 space on disk when it should not.
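
The interaction described above can be illustrated with a minimal sketch. This
is not the code behind the linked CompactionController line; the names
PurgeCheckSketch, SSTableStub, mightContain and shouldPurge are hypothetical.
It assumes that a tombstoned row may be dropped only when no SSTable outside
the compaction might contain the key, and that a filter with a false-positive
ratio of 1.0 answers "maybe" for every key.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PurgeCheckSketch {

    /** Hypothetical stand-in for an SSTable that is not part of the current compaction. */
    static final class SSTableStub {
        private final Set<String> keys;
        private final boolean filterDisabled; // models a false-positive ratio of 1.0

        SSTableStub(Set<String> keys, boolean filterDisabled) {
            this.keys = new HashSet<>(keys);
            this.filterDisabled = filterDisabled;
        }

        /** Bloom-filter-style answer: true means "maybe present", false means "definitely absent". */
        boolean mightContain(String key) {
            // An exact set stands in for a real filter; a disabled filter answers "maybe" for every key.
            return filterDisabled || keys.contains(key);
        }
    }

    /** Assumed rule: a tombstoned row may be dropped only if no other SSTable might hold the key. */
    static boolean shouldPurge(String key, List<SSTableStub> overlapping) {
        for (SSTableStub sstable : overlapping) {
            if (sstable.mightContain(key)) {
                return false; // another SSTable might still hold data for this key, so keep the tombstone
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> otherKeys = Set.of("row-2", "row-3"); // "row-1" was written exactly once, elsewhere

        List<SSTableStub> withFilter = List.of(new SSTableStub(otherKeys, false));
        List<SSTableStub> filterOff = List.of(new SSTableStub(otherKeys, true));

        System.out.println("working filter:  purge row-1? " + shouldPurge("row-1", withFilter));
        System.out.println("fp ratio of 1.0: purge row-1? " + shouldPurge("row-1", filterOff));
    }
}
```

Under these assumptions the first call reports that row-1 can be purged, while
the second never does, no matter how many compactions run, which matches the
unbounded disk growth reported here.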

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-03-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590683#comment-13590683
 ] 

Sylvain Lebresne commented on CASSANDRA-5182:
-

bq.  if we say only disable BF where you're not doing deletes, it has a 
legitimate if narrow use case

I guess I agree on the principle that we should say "only disable BF where 
you're not doing deletes". That being said, if we do use getPosition, we extend 
the possible use cases, since it becomes "only disable BF where you're not doing 
deletes or your index fits entirely in RAM" (because getPosition will not 
destroy performance for the no-deletes case, since we don't even call 
shouldPurge() unless we know there are tombstones).

bq. and IMO the bite from getPosition is worse, since it will destroy 
compaction performance

I'm not totally sure I agree on the "worse". As said above, if people have no 
tombstones, it won't destroy compaction performance. So I guess the question is: 
for people that 1) do not follow the recommendation (because we should definitely 
say when disabling BF is ok or not) and that 2) do have deletes, is it better 
for them to be bitten by a) bad compaction performance or b) their tombstones 
never being purged?

I don't doubt that which of a) or b) is worse is a matter of perspective. That 
being said, my own personal preference goes to avoiding b) because:
* to me b) is a break of correctness, which somewhat trumps performance 
considerations. It's purely subjective though.
* accumulating tombstones forever is a pretty nasty time-bomb. Having 
compaction be slow because it hits disk more than it should, on the other 
hand, seems easier to me to detect (and thus fix by following the 
recommendation of not disabling BF when you shouldn't).

So, I still have a preference for using Yuki's last patch (and making it clear 
that you should only disable BF where you're not doing deletes or your index 
fits entirely in RAM), if only because that's a bit better than "only disable 
BF where you're not doing deletes". But if you still prefer keeping the status 
quo, I won't oppose; do feel free to close the issue (we should still write 
the recommendation on when to disable BF somewhere in any case).



[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-03-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590940#comment-13590940
 ] 

Jonathan Ellis commented on CASSANDRA-5182:
---

bq. if we do use getPosition, we extend the possible use cases, since it becomes 
only disable BF where you're not doing deletes or your index fits entirely in 
RAM

That makes sense.  Let's ship Yuki's patch.



[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-02-13 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577446#comment-13577446
 ] 

Sylvain Lebresne commented on CASSANDRA-5182:
-

bq. If our goal is to throw out the maximum possible amount of obsolete data

I kind of agree with Bryan, this doesn't have to be black and white. What we 
want is to do the best we can to remove obsolete rows without impacting 
compaction too much. Now if you do have active bloom filters, then I think just 
checking the bloom filters as we do now is the right trade-off: it maximizes, 
with a very high probability, the amount of removed data at a relatively cheap 
cost. Using getPosition in that case would be a bad idea, because the reward (a 
tiny fraction of additional data potentially removed) is not worth the cost 
(hitting disk each time a row we compact is also in a non-compacted sstable) 
imo, hence my opposition to the idea.

But if you deactivate bloom filters, you also fully destroy our bloom filter 
trade-off. So using getPosition does now provide a substantial benefit, as it 
allows us to go from 'no deletion' to 'maximize deletion'. The reward is, in 
that case, likely worth the cost, especially since people shouldn't deactivate 
bloom filters unless their index files fit in memory, in which case 
getPosition costs won't be that big.

So overall I do like the last patch attached by Yuki. Of course, the solution 
of just saying you shouldn't disable bloom filters on workloads that perform 
deletes works too, and I wouldn't oppose it, but it doesn't have my preference 
because I'm always a bit afraid of solving an issue by saying "don't do this", 
as it usually ends up in people getting bitten first and hearing they shouldn't 
have done it second.

 Deletable rows are sometimes not removed during compaction
 --

 Key: CASSANDRA-5182
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5182
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.7
Reporter: Binh Van Nguyen
Assignee: Yuki Morishita
 Fix For: 1.2.2

 Attachments: 5182-1.1.txt, 5182-1.2.txt, test_ttl.tar.gz




[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-02-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577178#comment-13577178
 ] 

Jonathan Ellis commented on CASSANDRA-5182:
---

I'm still not comfortable with this.

If our goal is to throw out the maximum possible amount of obsolete data, we 
should perform getPosition across the board.

But if our goal is to be minimally impactful with compaction then we shouldn't 
do it at all, and rely instead on the timestamp check.  If that's not enough, 
then you shouldn't disable bloom filters on workloads that perform deletes.  
I'm okay with that message.



[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-02-12 Thread Bryan Talbot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577189#comment-13577189
 ] 

Bryan Talbot commented on CASSANDRA-5182:
-

Our use case doesn't require maximum effort to delete rows.  What we ran into 
was an unexpected interaction between two features: bloom filter tuned for low 
read rate, and deleting tombstoned rows.  With that configuration NO rows were 
being removed.  

As long as there is some reasonable effort to remove rows with bloom filter 
disabled OR it's clearly known that a reasonable FP setting is required to 
remove tombstones, I think we could have avoided a lot of headaches.

How does the new tombstone histogram feature in 1.2 affect this issue?  If that 
feature solves the problem already, maybe this fix is irrelevant.




[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-24 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561502#comment-13561502
 ] 

Sylvain Lebresne commented on CASSANDRA-5182:
-

bq. Maybe it is better to check if fp_chance is high before going through index 
file

Actually, I agree with Yuki on that and I'm kind of -1 on the patch in its 
current form. The current patch means that whatever your fp_chance is, each 
time the row is indeed present in a non-compacted sstable (which does prevent 
gcing the row for this compaction but is not something that will necessarily be 
rare), we might hit the disk (unless the key cache saves you). So I'd be in 
favor of using getPosition only if fp_chance == 1, at least on 1.1, as we have 
no idea of the impact this can have on people that haven't disabled bloom 
filters and have no problem whatsoever with gcing tombstones.
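
The gating proposed in the paragraph above could look roughly like the sketch
below. It is not the attached patch, whose contents are not shown in this
thread; GatedPurgeSketch, SSTableStub, filterMightContain and indexContains are
hypothetical names, and an exact key set stands in for both the bloom filter
and the index. The point is only that the exact, possibly disk-bound lookup is
paid when fp_chance is 1, and never when a usable filter exists.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GatedPurgeSketch {

    /** Hypothetical stand-in for an overlapping SSTable; none of these names are Cassandra's. */
    static final class SSTableStub {
        private final Set<String> keys;
        final double fpChance;
        int indexLookups = 0; // counts the expensive, possibly disk-bound checks

        SSTableStub(Set<String> keys, double fpChance) {
            this.keys = new HashSet<>(keys);
            this.fpChance = fpChance;
        }

        /** Cheap in-memory check; with fpChance >= 1.0 it cannot rule anything out. */
        boolean filterMightContain(String key) {
            return fpChance >= 1.0 || keys.contains(key);
        }

        /** Exact check standing in for an index lookup such as getPosition. */
        boolean indexContains(String key) {
            indexLookups++;
            return keys.contains(key);
        }
    }

    static boolean shouldPurge(String key, List<SSTableStub> overlapping) {
        for (SSTableStub sstable : overlapping) {
            boolean present = sstable.fpChance >= 1.0
                    ? sstable.indexContains(key)        // filter disabled: pay for the exact answer
                    : sstable.filterMightContain(key);  // filter usable: never touch the index
            if (present) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SSTableStub normal = new SSTableStub(Set.of("row-2"), 0.01);
        SSTableStub disabled = new SSTableStub(Set.of("row-2"), 1.0);

        System.out.println("normal filter, purge row-1?   " + shouldPurge("row-1", List.of(normal)));
        System.out.println("index lookups (normal):       " + normal.indexLookups);
        System.out.println("disabled filter, purge row-1? " + shouldPurge("row-1", List.of(disabled)));
        System.out.println("index lookups (disabled):     " + disabled.indexLookups);
    }
}
```

In this model both configurations allow row-1 to be purged, but only the
disabled-filter SSTable records an index lookup, so tables with a working
filter keep their current compaction cost.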

As a side note, I've opened CASSANDRA-5183 that is related to this purge 
tombstone problem.

 Deletable rows are sometimes not removed during compaction
 --

 Key: CASSANDRA-5182
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5182
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.5
Reporter: Binh Van Nguyen
Assignee: Yuki Morishita
 Fix For: 1.1.10, 1.2.1

 Attachments: 5182-1.1.txt, test_ttl.tar.gz




[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Bryan Talbot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561072#comment-13561072
 ] 

Bryan Talbot commented on CASSANDRA-5182:
-

A mailing list thread with more details about the use case and symptoms can be 
found at http://www.mail-archive.com/user@cassandra.apache.org/msg27049.html

 Deletable rows are sometimes not removed during compaction
 --

 Key: CASSANDRA-5182
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5182
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.5
Reporter: Binh Van Nguyen
 Attachments: test_ttl.tar.gz




[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561229#comment-13561229
 ] 

Yuki Morishita commented on CASSANDRA-5182:
---

Maybe it is better to check if fp_chance is high before going through index 
file, since it has performance penalty.



[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Binh Van Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561242#comment-13561242
 ] 

Binh Van Nguyen commented on CASSANDRA-5182:


I agree that we should find a better way, since getPosition will check the 
bloom filter, then the key cache, and in the worst case (which is our case) it 
will scan the whole index table. This will cause a performance issue.
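
The lookup order described above can be sketched as a three-tier check. This
is an illustration only, not the real getPosition: PositionLookupSketch,
Outcome and lookup are hypothetical names, and plain maps stand in for the key
cache and the on-disk index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

final class PositionLookupSketch {

    /** Which tier ended the lookup; the two INDEX_* outcomes are the ones that may cost disk I/O. */
    enum Outcome { FILTER_NEGATIVE, KEY_CACHE_HIT, INDEX_HIT, INDEX_MISS }

    static Outcome lookup(String key,
                          Predicate<String> filterMightContain,
                          Map<String, Long> keyCache,
                          Map<String, Long> index) {
        if (!filterMightContain.test(key)) {
            return Outcome.FILTER_NEGATIVE; // cheapest exit: the filter rules the key out
        }
        if (keyCache.containsKey(key)) {
            return Outcome.KEY_CACHE_HIT;   // position already cached in memory
        }
        // Last resort: consult the index itself (modelled as a map here).
        return index.containsKey(key) ? Outcome.INDEX_HIT : Outcome.INDEX_MISS;
    }

    public static void main(String[] args) {
        Map<String, Long> index = new HashMap<>();
        index.put("row-1", 0L);
        Map<String, Long> keyCache = new HashMap<>(); // cold cache, as in a read-seldom workload

        Predicate<String> workingFilter = index::containsKey; // exact stand-in, no false positives
        Predicate<String> disabledFilter = key -> true;        // fp ratio of 1.0

        System.out.println(lookup("row-9", workingFilter, keyCache, index));
        System.out.println(lookup("row-9", disabledFilter, keyCache, index));
    }
}
```

With a working filter the absent key is rejected at the first tier; with the
filter disabled and a cold key cache, the same key falls through to the index,
which is the worst case mentioned above.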



[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561257#comment-13561257
 ] 

Jonathan Ellis commented on CASSANDRA-5182:
---

Do you want a performance issue, or do you only want to remove tombstones 
during major compaction? :)



[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561259#comment-13561259
 ] 

Jonathan Ellis commented on CASSANDRA-5182:
---

Personally I am +1 on the fix; if you run a lot of deletes and can't cache your 
index files in ram, then don't disable bloom filters.



[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Bryan Talbot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561271#comment-13561271
 ] 

Bryan Talbot commented on CASSANDRA-5182:
-

Using the test program attached, I've reproduced the problem using 1.1.9 and 
then upgraded that cluster (1 node on laptop) to 1.2.0.  The problem remains, 
with the load and sstable count increasing.

However, when I run the test program on a fresh 1.2.0 cluster the problem does 
not come up.  My process to reproduce on upgrade is:

install fresh 1.1.9
run test to get 500 MB of data (20-30 mins)
drain and shutdown 1.1.9
start 1.2.0
run nodetool upgradesstables
run test and watch load grow to 2.5 GB while away at lunch


When running the test program on a fresh 1.2.0 installation, the load tops out 
at about 200 MB and 90 or so SSTables which is what is desired.




[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Bryan Talbot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561274#comment-13561274
 ] 

Bryan Talbot commented on CASSANDRA-5182:
-

About the check for a high fp_chance before checking indexes: did you mean to 
only check index files if fp_chance is high (say over 0.5 or something)?  That 
way the additional check is only incurred when bloom filters are effectively 
disabled, and the common case using an effective (low fp) bloom filter is not 
impacted.




[jira] [Commented] (CASSANDRA-5182) Deletable rows are sometimes not removed during compaction

2013-01-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561308#comment-13561308
 ] 

Jonathan Ellis commented on CASSANDRA-5182:
---

getPosition does the right thing here: it checks the index file only on bloom 
filter positives, so a high bloom filter setting will benefit automatically.

The only improvement I think makes sense would be adding support for compaction 
strategy tombstone threshold.
