[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0001-move-compaction-code-into-own-package.patch

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-Pluggable-Compaction-and-Expiration.patch

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days. 



[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032205#comment-13032205
 ] 

Alan Liang commented on CASSANDRA-1610:
---

Some TODOs:
- add mockito dependency to test build only
- determine why DatabaseDescriptorTest#serDe() fails
- validation of compaction_strategy_options
- more tests for expiration of files



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: (was: 0002-Pluggable-Compaction-and-Expiration.patch)



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: (was: 0001-move-compaction-code-into-own-package.patch)



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-Pluggable-Compaction-and-Expiration.patch



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0001-move-compaction-code-into-own-package.patch



[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-05-11 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032235#comment-13032235
 ] 

Alan Liang commented on CASSANDRA-1610:
---

Updated patch files.



[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-05-13 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033258#comment-13033258
 ] 

Alan Liang commented on CASSANDRA-1610:
---

"Looking quickly through that code, it looks like a good chunk of the code is 
here to support the expiring of sstables, and it's pretty much hardcoded. Isn't 
there a way to encapsulate that better?"

You're right, it might make more sense to allow a strategy to define how it 
should expire the sstables.

I'll try and fix the description. But I want to keep the implemented strategies 
with this ticket because they justify why the interfaces are worthwhile as Stu 
pointed out above.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-05-13 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Description: 
In CASSANDRA-1608, I proposed some changes on how compaction works. I think it 
also makes sense to allow the ability to have pluggable compaction per CF. 
There could be many types of workloads where this makes sense. One example we 
had at Digg was to completely throw away certain SSTables after N days.

The goal of this ticket is to make compaction pluggable enough to support 
compaction based on max timestamp ordering of the sstables while satisfying max 
sstable size, min and max compaction thresholds. Another goal is to allow 
expiration of sstables based on a timestamp.

  was:In CASSANDRA-1608, I proposed some changes on how compaction works. I 
think it also makes sense to allow the ability to have pluggable compaction per 
CF. There could be many types of workloads where this makes sense. One example 
we had at Digg was to completely throw away certain SSTables after N days. 


> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> The goal of this ticket is to make compaction pluggable enough to support 
> compaction based on max timestamp ordering of the sstables while satisfying 
> max sstable size, min and max compaction thresholds. Another goal is to allow 
> expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-01 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-pluggable-compaction.patch
0001-move-compaction-code-into-own-package.patch

2nd attempt after rebasing with trunk

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> The goal of this ticket is to make compaction pluggable enough to support 
> compaction based on max timestamp ordering of the sstables while satisfying 
> max sstable size, min and max compaction thresholds. Another goal is to allow 
> expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-01 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Description: 
In CASSANDRA-1608, I proposed some changes on how compaction works. I think it 
also makes sense to allow the ability to have pluggable compaction per CF. 
There could be many types of workloads where this makes sense. One example we 
had at Digg was to completely throw away certain SSTables after N days.

This ticket addresses making compaction pluggable only.

  was:
In CASSANDRA-1608, I proposed some changes on how compaction works. I think it 
also makes sense to allow the ability to have pluggable compaction per CF. 
There could be many types of workloads where this makes sense. One example we 
had at Digg was to completely throw away certain SSTables after N days.

The goal of this ticket is to make compaction pluggable enough to support 
compaction based on max timestamp ordering of the sstables while satisfying max 
sstable size, min and max compaction thresholds. Another goal is to allow 
expiration of sstables based on a timestamp.


> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Created] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-01 Thread Alan Liang (JIRA)

Timestamp Based Compaction Strategy
---

 Key: CASSANDRA-2735
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Alan Liang
Assignee: Alan Liang
Priority: Minor


Compaction strategy implementation based on max timestamp ordering of the 
sstables while satisfying max sstable size, min and max compaction thresholds. 
It also handles expiration of sstables based on a timestamp.
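
To make the selection rule above concrete, here is a minimal Java sketch of one 
possible reading of it. It is not the attached patch: the SSTable class, the 
method names and the parameters (expireBefore, maxSSTableSize, minThreshold, 
maxThreshold) are all hypothetical stand-ins chosen for illustration, and the 
real strategy may bucket or order sstables differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TimestampCompactionSketch {

    /** Hypothetical stand-in for an sstable; not Cassandra's real class. */
    static final class SSTable {
        final String name;
        final long maxTimestamp; // newest write timestamp contained in the file
        final long sizeBytes;

        SSTable(String name, long maxTimestamp, long sizeBytes) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Expiration: files whose newest data is older than the cutoff can be dropped whole. */
    static List<SSTable> selectExpired(List<SSTable> sstables, long expireBefore) {
        List<SSTable> expired = new ArrayList<>();
        for (SSTable s : sstables) {
            if (s.maxTimestamp < expireBefore) {
                expired.add(s);
            }
        }
        return expired;
    }

    /**
     * Compaction: keep only unexpired files below the max sstable size, order
     * them by max timestamp, and take the oldest ones while honouring the
     * min and max compaction thresholds.
     */
    static List<SSTable> selectForCompaction(List<SSTable> sstables, long expireBefore,
                                             long maxSSTableSize, int minThreshold,
                                             int maxThreshold) {
        List<SSTable> candidates = new ArrayList<>();
        for (SSTable s : sstables) {
            if (s.maxTimestamp >= expireBefore && s.sizeBytes < maxSSTableSize) {
                candidates.add(s);
            }
        }
        candidates.sort(Comparator.comparingLong((SSTable s) -> s.maxTimestamp));
        if (candidates.size() < minThreshold) {
            return new ArrayList<>(); // too few files to be worth compacting
        }
        int count = Math.min(maxThreshold, candidates.size());
        return new ArrayList<>(candidates.subList(0, count));
    }
}
```

Dropping an expired sstable whole avoids rewriting it, which is the Digg use 
case from CASSANDRA-1610; the thresholds only bound how many files one 
compaction touches.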



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-01 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Commented] (CASSANDRA-2576) Rewrite into new file post streaming

2011-06-02 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043078#comment-13043078
 ] 

Alan Liang commented on CASSANDRA-2576:
---

Looks good, but why are we not adding row sizes and column counts to the 
estimated histograms for CommutativeRowIndexer#doIndexing?

> Rewrite into new file post streaming
> 
>
> Key: CASSANDRA-2576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2576
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
> Fix For: 1.0
>
> Attachments: 
> 0001-CASSANDRA-2576-Don-t-depend-on-a-byte-for-byte-match-f.txt, 
> 0002-CASSANDRA-2576-Rebuild-into-a-new-file-to-minimize-mag.txt
>
>
> Commutative/counter column families use a separate path to rebuild sstables 
> post streaming, and that path currently rewrites the data within the streamed 
> file. While this is great for space efficiency, it means a duplicated code 
> path for writing sstables, which makes it more difficult to make changes like 
> #674.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-03 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-pluggable-compaction.patch
0001-move-compaction-code-into-own-package.patch

rebased to trunk

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Issue Comment Edited] (CASSANDRA-1610) Pluggable Compaction

2011-06-03 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043867#comment-13043867
 ] 

Alan Liang edited comment on CASSANDRA-1610 at 6/3/11 4:56 PM:
---

Wanted to add a little more context. This ticket now addresses only 
pluggable compaction; I've moved the implementation of a timestamp-based 
compaction strategy to https://issues.apache.org/jira/browse/CASSANDRA-2735. 

This patch makes compaction pluggable in the sense that you can implement your 
own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible 
for selecting the sstables for minor and major compaction. The strategy returns 
a list of AbstractCompactionTasks that are to be executed by the 
CompactionManager. These tasks can be regular compaction, expiration of 
sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a 
list of CompactionTasks.

  was (Author: alanliang):
Wanted to add a little bit more context. This ticket now only addresses 
pluggable compaction only, I've moved the implementation of a timestamp based 
compaction to https://issues.apache.org/jira/browse/CASSANDRA-2735. 

This patch makes compaction pluggable in the sense that, you can implement your 
own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible 
for selecting the sstables for minor and major compaction. The strategy returns 
a list of AbstractCompactionTasks that are to be executed by the 
CompactionManager. These tasks can be regular compaction, expiration of 
sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a 
list of CompactionTask's.
  


[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-06-03 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043867#comment-13043867
 ] 

Alan Liang commented on CASSANDRA-1610:
---

Wanted to add a little more context. This ticket now addresses only 
pluggable compaction; I've moved the implementation of a timestamp-based 
compaction strategy to https://issues.apache.org/jira/browse/CASSANDRA-2735. 

This patch makes compaction pluggable in the sense that you can implement your 
own AbstractCompactionStrategy. An AbstractCompactionStrategy is responsible 
for selecting the sstables for minor and major compaction. The strategy returns 
a list of AbstractCompactionTasks that are to be executed by the 
CompactionManager. These tasks can be regular compaction, expiration of 
sstables (see #2735), cleanup tasks, etc. For compaction, a strategy returns a 
list of CompactionTasks.
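
A rough Java sketch of the shape described in this comment may help. The class 
names AbstractCompactionStrategy, AbstractCompactionTask, CompactionTask and 
CompactionManager come from the comment; everything else (the SSTable 
stand-in, the method names and signatures, the CountThresholdStrategy example) 
is an assumption made for illustration and is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class PluggableCompactionSketch {

    /** Hypothetical stand-in for an sstable; not Cassandra's real class. */
    static final class SSTable {
        final String name;
        final long sizeBytes;

        SSTable(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /** A unit of work for the manager: compaction, expiration, cleanup, etc. */
    static abstract class AbstractCompactionTask {
        abstract String execute(); // assumed signature; returns a log line here
    }

    /** Regular compaction of a set of sstables (body elided to a log line). */
    static final class CompactionTask extends AbstractCompactionTask {
        final List<SSTable> inputs;

        CompactionTask(List<SSTable> inputs) {
            this.inputs = inputs;
        }

        @Override
        String execute() {
            return "compacted " + inputs.size() + " sstables";
        }
    }

    /** The pluggable part: a strategy only decides what to do, as a list of tasks. */
    static abstract class AbstractCompactionStrategy {
        abstract List<AbstractCompactionTask> getMinorCompactionTasks(List<SSTable> sstables);

        abstract List<AbstractCompactionTask> getMajorCompactionTasks(List<SSTable> sstables);
    }

    /** Toy strategy: minor compaction waits for minThreshold files; major takes all. */
    static final class CountThresholdStrategy extends AbstractCompactionStrategy {
        final int minThreshold;

        CountThresholdStrategy(int minThreshold) {
            this.minThreshold = minThreshold;
        }

        @Override
        List<AbstractCompactionTask> getMinorCompactionTasks(List<SSTable> sstables) {
            List<AbstractCompactionTask> tasks = new ArrayList<>();
            if (sstables.size() >= minThreshold) {
                tasks.add(new CompactionTask(new ArrayList<>(sstables)));
            }
            return tasks;
        }

        @Override
        List<AbstractCompactionTask> getMajorCompactionTasks(List<SSTable> sstables) {
            List<AbstractCompactionTask> tasks = new ArrayList<>();
            if (!sstables.isEmpty()) {
                tasks.add(new CompactionTask(new ArrayList<>(sstables)));
            }
            return tasks;
        }
    }

    /** The manager executes whatever tasks the configured strategy returned. */
    static final class CompactionManager {
        List<String> run(List<AbstractCompactionTask> tasks) {
            List<String> log = new ArrayList<>();
            for (AbstractCompactionTask task : tasks) {
                log.add(task.execute());
            }
            return log;
        }
    }
}
```

Because the manager only sees AbstractCompactionTask, a strategy could return 
an expiration or cleanup task alongside CompactionTasks without the manager 
changing.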



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-03 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-pluggable-compaction.patch
0001-move-compaction-code-into-own-package.patch

Rebased, fixed tests, added documentation in the cli help.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-03 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: (was: 
0003-implemented-timestamp-bucketed-compaction-strategy-a.patch)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-03 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch

Rebased.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-04 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-pluggable-compaction.patch
0001-move-compaction-code-into-own-package.patch

I apologize, I uploaded the wrong diffs. This is the one.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-04 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch

Uploaded the correct patch.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-04 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: (was: 
0003-implemented-timestamp-bucketed-compaction-strategy-a.patch)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-05 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-pluggable-compaction.patch
0001-move-compaction-code-into-own-package.patch

Combed through the files and removed unused/duplicate imports.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-05 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch

Removed unused/duplicate imports.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch, 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-05 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Comment: was deleted

(was: Removed unused/duplicate imports.)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-05 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: (was: 
0003-implemented-timestamp-bucketed-compaction-strategy-a.patch)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-08 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0001-pluggable-compaction.patch

Removed updateEstimatedCompactions() from the strategy since it is no longer called.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-pluggable-compaction.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-08 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0002-timestamp-bucketed-compaction-strategy.patch

Rebased once again due to the change from AbstractCompactionStrategy to 
ICompactionStrategy in CASSANDRA-1610.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0002-timestamp-bucketed-compaction-strategy.patch, 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-06-08 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046199#comment-13046199
 ] 

Alan Liang commented on CASSANDRA-1610:
---

bq. I think Ben's selection of methods for the CompactionStrategy is an 
improvement, but I do like having an abstract class so it's obvious what the 
contract is for us vs having to inject parameters post-construction.

I agree, I'll go back to the Abstract class approach.

bq. I'd like to move away from minor/major terms as too tied to the old 
compaction internals. Perhaps background/maximal instead?

Sounds good to me.

bq. We should also make user defined compactions part of ACS – for some 
strategies (e.g. leveldb) we want to be able to reject user requests that would 
break strategy invariants. Note that this should probably return a single Task, 
rather than a list. ("Maximal" will also usually return a single task, but it's 
cleaner to represent "nothing to do" as an empty list, than as null.)

Sounds good to me.

bq. handleInsufficientSpaceForCompaction is a bad encapsulation; it means both 
it and its caller have to deal with "find a place for an sstable." suggest 
leaving it up to CT.execute to deal with.

Sounds good to me.


I'll resubmit a patch with all these suggestions. Thanks!

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-pluggable-compaction.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Issue Comment Edited] (CASSANDRA-1610) Pluggable Compaction

2011-06-08 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046199#comment-13046199
 ] 

Alan Liang edited comment on CASSANDRA-1610 at 6/8/11 9:10 PM:
---

bq. I think Ben's selection of methods for the CompactionStrategy is an 
improvement, but I do like having an abstract class so it's obvious what the 
contract is for us vs having to inject parameters post-construction.

I agree, I'll go back to the Abstract class approach.

bq. I'd like to move away from minor/major terms as too tied to the old 
compaction internals. Perhaps background/maximal instead?

Sounds good to me.

bq. We should also make user defined compactions part of ACS – for some 
strategies (e.g. leveldb) we want to be able to reject user requests that would 
break strategy invariants. Note that this should probably return a single Task, 
rather than a list. ("Maximal" will also usually return a single task, but it's 
cleaner to represent "nothing to do" as an empty list, than as null.)

Sounds good to me.

bq. handleInsufficientSpaceForCompaction is a bad encapsulation; it means both 
it and its caller have to deal with "find a place for an sstable." suggest 
leaving it up to CT.execute to deal with.

Sounds good to me. So if a strategy wants to customize how insufficient space 
is handled, it would have to implement its own CompactionTask (or override the 
existing one). What do you think about that? Another option: since any check 
of free space is subject to a race, I could leave it up to the strategy to 
ensure the sstable it has selected has a reasonable amount of space for 
compaction.


I'll resubmit a patch with all these suggestions. Thanks!
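
For readers following the review, the agreed points (an abstract base class, background/maximal naming, a single user-defined task, an empty list for "nothing to do", and space handling inside the task) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patches: every class and method name here (SketchSSTable, SketchCompactionTask, SketchCompactionStrategy, SketchSimpleStrategy) is hypothetical, and rejecting a user request by throwing an exception is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an sstable; only its size matters in this sketch.
final class SketchSSTable {
    final String name;
    final long sizeBytes;

    SketchSSTable(String name, long sizeBytes) {
        this.name = name;
        this.sizeBytes = sizeBytes;
    }
}

// Hypothetical unit of work. Dealing with insufficient space is left to
// execute(), so a strategy that wants different behavior overrides it here.
class SketchCompactionTask {
    final List<SketchSSTable> inputs;

    SketchCompactionTask(Collection<SketchSSTable> inputs) {
        this.inputs = new ArrayList<>(inputs);
    }

    // Returns false (and does nothing) when the inputs would not fit in freeBytes.
    boolean execute(long freeBytes) {
        long needed = 0;
        for (SketchSSTable s : inputs) {
            needed += s.sizeBytes;
        }
        return needed <= freeBytes;
    }
}

// Abstract base class: the constructor makes the contract explicit instead of
// injecting parameters after construction.
abstract class SketchCompactionStrategy {
    protected final int minThreshold;
    protected final int maxThreshold;

    protected SketchCompactionStrategy(int minThreshold, int maxThreshold) {
        this.minThreshold = minThreshold;
        this.maxThreshold = maxThreshold;
    }

    // "Nothing to do" is represented as an empty list, never null.
    abstract List<SketchCompactionTask> getBackgroundTasks(List<SketchSSTable> live);

    abstract List<SketchCompactionTask> getMaximalTasks(List<SketchSSTable> live);

    // A single task; throwing on rejection is an assumption of this sketch.
    abstract SketchCompactionTask getUserDefinedTask(Collection<SketchSSTable> requested);
}

// Toy strategy used only to exercise the contract above.
final class SketchSimpleStrategy extends SketchCompactionStrategy {
    SketchSimpleStrategy(int minThreshold, int maxThreshold) {
        super(minThreshold, maxThreshold);
    }

    @Override
    List<SketchCompactionTask> getBackgroundTasks(List<SketchSSTable> live) {
        if (live.size() < minThreshold) {
            return Collections.emptyList();
        }
        int count = Math.min(live.size(), maxThreshold);
        return Collections.singletonList(new SketchCompactionTask(live.subList(0, count)));
    }

    @Override
    List<SketchCompactionTask> getMaximalTasks(List<SketchSSTable> live) {
        if (live.size() < 2) {
            return Collections.emptyList();
        }
        return Collections.singletonList(new SketchCompactionTask(live));
    }

    @Override
    SketchCompactionTask getUserDefinedTask(Collection<SketchSSTable> requested) {
        if (requested.isEmpty()) {
            throw new IllegalArgumentException("request rejected: no sstables given");
        }
        return new SketchCompactionTask(requested);
    }
}
```

With this shape, a caller can always iterate over the result of getBackgroundTasks or getMaximalTasks without a null check, and a strategy with stricter invariants would reject requests inside getUserDefinedTask.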

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-pluggable-compaction.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Created] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-09 Thread Alan Liang (JIRA)
Capture the max client timestamp for an SSTable
---

 Key: CASSANDRA-2753
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Alan Liang
Assignee: Alan Liang
Priority: Minor






[jira] [Issue Comment Edited] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-09 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046750#comment-13046750
 ] 

Alan Liang edited comment on CASSANDRA-2735 at 6/9/11 8:01 PM:
---

Splitting out the capturing of max client supplied timestamp into a separate 
ticket (#2753) so that other tickets can benefit.

  was (Author: alanliang):
Splitting out the capturing of max client supplied timestamp into a 
separate ticket so that other tickets can benefit.
  
> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0002-timestamp-bucketed-compaction-strategy.patch, 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-09 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-rename-major-minor-to-maximal-background-in-Compacti.patch
0001-pluggable-compaction.patch

The new patch incorporates the suggestions from jbellis and also renames 
minor/major to background/maximal.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-pluggable-compaction.patch, 0001-pluggable-compaction.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 
> 0002-rename-major-minor-to-maximal-background-in-Compacti.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-09 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2753:
--

Attachment: 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>




[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-09 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0004-timestamp-bucketed-compaction-strategy.patch

New patch has code just for timestamp compaction strategy.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0004-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.
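
As a rough illustration of the description above, the selection step could look something like the following. The patch itself is not reproduced in this thread, so this is one possible reading and not the attached implementation: the names TsSSTable and TimestampOrderedSelector are hypothetical, and the exact rules (skip tables at or above the size cap, take the oldest candidates first) are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of an sstable: its newest client timestamp and its size.
final class TsSSTable {
    final String name;
    final long maxTimestamp;
    final long sizeBytes;

    TsSSTable(String name, long maxTimestamp, long sizeBytes) {
        this.name = name;
        this.maxTimestamp = maxTimestamp;
        this.sizeBytes = sizeBytes;
    }
}

final class TimestampOrderedSelector {
    private final int minThreshold;
    private final int maxThreshold;
    private final long maxSSTableBytes;

    TimestampOrderedSelector(int minThreshold, int maxThreshold, long maxSSTableBytes) {
        this.minThreshold = minThreshold;
        this.maxThreshold = maxThreshold;
        this.maxSSTableBytes = maxSSTableBytes;
    }

    // SSTables whose newest write is older than the cutoff can be dropped whole.
    List<TsSSTable> expired(List<TsSSTable> live, long expireBefore) {
        List<TsSSTable> result = new ArrayList<>();
        for (TsSSTable s : live) {
            if (s.maxTimestamp < expireBefore) {
                result.add(s);
            }
        }
        return result;
    }

    // Orders by max timestamp (oldest first), skips expired tables and tables
    // already at the size cap, and takes at most maxThreshold candidates.
    // Returns an empty list when fewer than minThreshold candidates remain.
    List<TsSSTable> nextBucket(List<TsSSTable> live, long expireBefore) {
        List<TsSSTable> sorted = new ArrayList<>(live);
        sorted.sort(Comparator.comparingLong((TsSSTable s) -> s.maxTimestamp));
        List<TsSSTable> bucket = new ArrayList<>();
        for (TsSSTable s : sorted) {
            if (s.maxTimestamp < expireBefore || s.sizeBytes >= maxSSTableBytes) {
                continue;
            }
            bucket.add(s);
            if (bucket.size() == maxThreshold) {
                break;
            }
        }
        return bucket.size() >= minThreshold ? bucket : new ArrayList<>();
    }
}
```

Ordering by max timestamp keeps rows written around the same time together, which is what makes whole-sstable expiration possible in the first place.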



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-09 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: (was: 
0003-implemented-timestamp-bucketed-compaction-strategy-a.patch)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0004-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-09 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: (was: 0002-timestamp-bucketed-compaction-strategy.patch)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0004-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.



[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-09 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046939#comment-13046939
 ] 

Alan Liang commented on CASSANDRA-2753:
---

In this patch, I've captured the max timestamp and stored it as part of the 
stats file. I've encapsulated this file in a class called SSTableMetadata. 
The estimated histograms for row size and column count, and the replay 
position, will also be available via this class.
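
To make the idea concrete, here is a minimal sketch of tracking the max client timestamp while rows are written and carrying it in a small metadata object. It is not the SSTableMetadata class from the patch: the names SketchSSTableMetadata and SketchMaxTimestampCollector are hypothetical, and the 8-byte layout is invented for illustration and is not the stats file format.

```java
import java.nio.ByteBuffer;

// Hypothetical metadata holder; the byte layout below is an assumption made
// for this sketch, not the real stats file format.
final class SketchSSTableMetadata {
    final long maxTimestamp;

    SketchSSTableMetadata(long maxTimestamp) {
        this.maxTimestamp = maxTimestamp;
    }

    // Serializes the max timestamp as a single big-endian long.
    byte[] toBytes() {
        return ByteBuffer.allocate(Long.BYTES).putLong(maxTimestamp).array();
    }

    static SketchSSTableMetadata fromBytes(byte[] bytes) {
        return new SketchSSTableMetadata(ByteBuffer.wrap(bytes).getLong());
    }
}

// Fed once per column as the sstable is written; keeps only the running max.
final class SketchMaxTimestampCollector {
    private long maxTimestamp = Long.MIN_VALUE;

    void update(long clientTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, clientTimestamp);
    }

    SketchSSTableMetadata finish() {
        return new SketchSSTableMetadata(maxTimestamp);
    }
}
```

Keeping the value in one metadata object means other tickets, such as the timestamp-based strategy in CASSANDRA-2735, can read it without rescanning the data file.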

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>




[jira] [Commented] (CASSANDRA-2336) Extract SSTable.Builder/IndexWriter

2011-06-10 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047329#comment-13047329
 ] 

Alan Liang commented on CASSANDRA-2336:
---

These changes look good. +1

> Extract SSTable.Builder/IndexWriter
> ---
>
> Key: CASSANDRA-2336
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2336
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>Priority: Minor
> Fix For: 1.0
>
> Attachments: 0001-CASSANDRA-2336-Extract-IndexWriter.txt, 
> 0002-CASSANDRA-2336-Extract-Builder.txt, 
> 0003-CASSANDRA-2336-Move-statistics-writing-into-IndexWrite.txt
>
>
> The Builder and IndexWriter classes in SSTableWriter are static, and 
> independently useful. Additionally, we need the ability to subclass them for 
> CASSANDRA-674 and CASSANDRA-2319.



[jira] [Commented] (CASSANDRA-1610) Pluggable Compaction

2011-06-10 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047442#comment-13047442
 ] 

Alan Liang commented on CASSANDRA-1610:
---

The only difference is these three methods: 
    public boolean isCompactionDisabled()
    public int getMinCompactionThreshold()
    public int getMaxCompactionThreshold()

They were there as a convenience, so that the strategy implementer has 
everything in one place. I'll remove them.

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-pluggable-compaction.patch, 0001-pluggable-compaction.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 
> 0002-rename-major-minor-to-maximal-background-in-Compacti.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Updated] (CASSANDRA-1610) Pluggable Compaction

2011-06-10 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-1610:
--

Attachment: 0002-rename-major-minor-to-maximal-background-in-Compacti.patch
0001-pluggable-compaction.patch

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Fix For: 1.0
>
> Attachments: 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-move-compaction-code-into-own-package.patch, 
> 0001-pluggable-compaction.patch, 0001-pluggable-compaction.patch, 
> 0001-pluggable-compaction.patch, 
> 0002-Pluggable-Compaction-and-Expiration.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 0002-pluggable-compaction.patch, 
> 0002-pluggable-compaction.patch, 
> 0002-rename-major-minor-to-maximal-background-in-Compacti.patch, 
> 0002-rename-major-minor-to-maximal-background-in-Compacti.patch
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days.
> This ticket addresses making compaction pluggable only.



[jira] [Commented] (CASSANDRA-2629) Move key reads into SSTableIterators

2011-06-13 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048946#comment-13048946
 ] 

Alan Liang commented on CASSANDRA-2629:
---

CompactionManager.java:
- retrying from key/length from index is useful; we should add this back, as you mentioned in your comments above.
- move "long rowSizeFromIndex = nextRowPositionFromIndex - currentRowPositionFromIndex;" into the IF statement where it is needed
- in your log warnings, specifying the actual sstable will help with debugging

SSTableNamesIterator.java:
- remove "this.key = key;" from both constructors; that means "public DecoratedKey key;" can still be final
- init() method should be more descriptive
- remove @param key comment from IFilter.java and SSTableSliceIterator.java

SSTableWriter.java:
- calling close() on an SSTableIdentityIterator to go to the end doesn't sound right. Use a name other than "close()"
- safer to updateCache(iter) AFTER appending to writer

> Move key reads into SSTableIterators
> 
>
> Key: CASSANDRA-2629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2629
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
> Fix For: 1.0
>
> Attachments: 
> 0001-CASSANDRA-2629-Move-key-and-row-size-reading-into-the-.txt, 
> 0002-CASSANDRA-2629-Remove-the-retry-with-key-from-index-st.txt
>
>
> All SSTableIterators have a constructor that assumes the key and length has 
> already been parsed. Moving this logic inside the iterator will improve 
> symmetry and allow the file format to change without iterator consumers 
> knowing it.





[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-14 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049323#comment-13049323
 ] 

Alan Liang commented on CASSANDRA-2753:
---

Makes sense, I'll move the tracking outside of the serializer. However, one 
thing I realized I missed is capturing the max timestamp of counter data 
streamed over from the other nodes. The challenge is where to capture the 
max timestamp without doing it within the AbstractCompactedRow#write method. 
It seems I have no choice unless I sacrifice performance by iterating over 
the file again to collect the max timestamp. This is because a 
LazilyCompactedRow keeps only a single column in memory at a time, and only 
within the write method. What do you think?

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-14 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049413#comment-13049413
 ] 

Alan Liang commented on CASSANDRA-2753:
---

I already have a solution to capture max timestamp for non counter data as seen 
in the current patch. So this really is only a problem for streamed counter 
data. 

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Issue Comment Edited] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-14 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049444#comment-13049444
 ] 

Alan Liang edited comment on CASSANDRA-2753 at 6/14/11 9:40 PM:


There are basically 3 places where we need to track max timestamps:

1. Memtable flush
2. During compaction (we simply take the max timestamp already recorded for the 
sstables)
3. Streamed data (normal columns and counter columns)

The challenge here is to capture the max timestamp for newly streamed data. 

For non-counter streamed data, RowIndexer#doIndexing goes through the streamed 
data files and simply updates the cache for the new rows. It iterates over the 
column families without deserializing the columns. To capture max timestamp 
here, I actually deserialize the columns from disk. This incurs more CPU but 
since it is already doing disk seeks when calling  
deserializeFromSSTableNoColumns(), the seek is less costly.

For counter streamed data, CommutativeRowIndexer#doIndexing actually creates 
new data files from the streamed data files. It does this by building an 
AbstractCompactedRow which can be either PreCompactedRow or LazilyCompactedRow. 
Collecting the max timestamp for PreCompactedRow is easy since all the columns 
are in memory. For LazilyCompactedRow, the only place where I can observe the 
max timestamp is during the #write method. Capturing the max timestamp inside 
#write is obviously not ideal since it would introduce a side effect. 
Alternatively, I could capture the max timestamp by deserializing the entire 
LazilyCompactedRow again but this obviously would mean more IO/CPU.

So it looks like I have to capture the max timestamp inside #write.

  was (Author: alanliang):
There are basically 3 places where we need to track max timestamps:

1. Memtable flush
2. During compaction (we simply take the max timestamp already recorded for the 
sstables)
3. Streamed data (normal columns and counter columns)

The challenge here is to capture the max timestamp for newly streamed data. 

For non-counter streamed data, RowIndexer#doIndexing goes through the streamed 
data files and simply updates the cache for the new rows. It iterates over the 
column families without deserializing the columns. To capture max timestamp 
here, I actually deserialize the columns from disk. This incurs more CPU but 
since it is already doing disk seeks when calling  
deserializeFromSSTableNoColumns(), the seek is less costly.

For counter streamed data, CommutativeRowIndexer#doIndexing actually creates 
new data files from the streamed data files. It does this by building an 
AbstractCompactedRow which can be either PreCompactedRow or LazilyCompactedRow. 
Collecting the max timestamp for PreCompactedRow is easy since all the columns 
are in memory. For LazilyCompactedRow, the only place where I can observe the 
max timestamp is during the #write method. Capturing the max timestamp is 
obviously not ideal since it would introduce a side effect. Alternatively, I 
could capture the max timestamp by deserializing the entire LazilyCompactedRow 
again but this obviously would mean more IO/CPU.

So it looks like I have to capture the max timestamp inside #write.
  
> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-14 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049444#comment-13049444
 ] 

Alan Liang commented on CASSANDRA-2753:
---

There are basically 3 places where we need to track max timestamps:

1. Memtable flush
2. During compaction (we simply take the max timestamp already recorded for the 
sstables)
3. Streamed data (normal columns and counter columns)

The challenge here is to capture the max timestamp for newly streamed data. 

For non-counter streamed data, RowIndexer#doIndexing goes through the streamed 
data files and simply updates the cache for the new rows. It iterates over the 
column families without deserializing the columns. To capture max timestamp 
here, I actually deserialize the columns from disk. This incurs more CPU but 
since it is already doing disk seeks when calling  
deserializeFromSSTableNoColumns(), the seek is less costly.

For counter streamed data, CommutativeRowIndexer#doIndexing actually creates 
new data files from the streamed data files. It does this by building an 
AbstractCompactedRow which can be either PreCompactedRow or LazilyCompactedRow. 
Collecting the max timestamp for PreCompactedRow is easy since all the columns 
are in memory. For LazilyCompactedRow, the only place where I can observe the 
max timestamp is during the #write method. Capturing the max timestamp is 
obviously not ideal since it would introduce a side effect. Alternatively, I 
could capture the max timestamp by deserializing the entire LazilyCompactedRow 
again but this obviously would mean more IO/CPU.

So it looks like I have to capture the max timestamp inside #write.
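
To illustrate the trade-off, here is a minimal sketch of capturing the max timestamp as a side effect of a single-pass write. LazyRowSketch, its {value, timestamp} column layout, and the ByteBuffer sink are hypothetical stand-ins, not the actual LazilyCompactedRow code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a lazily compacted row: columns arrive one at a
// time and are never all held in memory together.
final class LazyRowSketch
{
    private final Iterator<long[]> columns; // each element is {value, timestamp}
    private long maxTimestamp = Long.MIN_VALUE;

    LazyRowSketch(Iterator<long[]> columns)
    {
        this.columns = columns;
    }

    // Serializes each column exactly once and, as a side effect, remembers the
    // largest timestamp seen. The caller sizes the buffer (16 bytes per column).
    void write(ByteBuffer out)
    {
        while (columns.hasNext())
        {
            long[] column = columns.next();
            out.putLong(column[0]);
            out.putLong(column[1]);
            maxTimestamp = Math.max(maxTimestamp, column[1]);
        }
    }

    // Only meaningful once write() has consumed the columns.
    long maxTimestamp()
    {
        return maxTimestamp;
    }

    public static void main(String[] args)
    {
        Iterator<long[]> columns = Arrays.asList(
            new long[]{ 1L, 100L }, new long[]{ 2L, 300L }, new long[]{ 3L, 200L }).iterator();
        LazyRowSketch row = new LazyRowSketch(columns);
        row.write(ByteBuffer.allocate(3 * 16));
        System.out.println(row.maxTimestamp());
    }
}
```

The cost of this shape is that maxTimestamp() depends on write() having run first, which is the side effect mentioned above; the benefit is that the data is read only once.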

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Commented] (CASSANDRA-2769) Cannot Create Duplicate Compaction Marker

2011-06-15 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050081#comment-13050081
 ] 

Alan Liang commented on CASSANDRA-2769:
---

Instead of letting DataTracker#markCompacting modify the subset of sstables to 
be compacted, I think it would be cleaner to rely on the CompactionStrategy to 
select the correct sstables. We can do this by having the CompactionStrategy 
get the non-compacting sstables from the DataTracker and work with those to 
generate the buckets. The strategy should also be responsible for creating 
buckets that fit within the min/max thresholds. #markCompacting would then be 
changed so that it either accepts or rejects a bucket to be compacted instead 
of modifying the subset. #markCompacting would also handle the race condition 
where the DataTracker's view is stale: when a bucket is rejected, the caller 
simply moves on to other buckets.

With this, we avoid generating buckets that are already compacting, and the 
CompactionStrategy gets full control over what is actually compacted.
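
A rough sketch of the accept/reject idea, assuming sstables are identified by name only. CompactingTrackerSketch and its method signatures are made up for illustration and are not the DataTracker API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker; sstables are plain names to keep the sketch small.
final class CompactingTrackerSketch
{
    private final Set<String> compacting = new HashSet<String>();

    // Accepts the bucket only if none of its sstables is already compacting.
    // Either the whole bucket is marked or nothing is; the bucket is never trimmed.
    synchronized boolean markCompacting(Collection<String> bucket)
    {
        if (!Collections.disjoint(compacting, bucket))
            return false;
        compacting.addAll(bucket);
        return true;
    }

    synchronized void unmarkCompacting(Collection<String> bucket)
    {
        compacting.removeAll(bucket);
    }

    // What a strategy would consult before it builds buckets.
    synchronized Set<String> nonCompacting(Collection<String> all)
    {
        Set<String> result = new HashSet<String>(all);
        result.removeAll(compacting);
        return result;
    }

    public static void main(String[] args)
    {
        CompactingTrackerSketch tracker = new CompactingTrackerSketch();
        System.out.println(tracker.markCompacting(Arrays.asList("a", "b"))); // accepted
        System.out.println(tracker.markCompacting(Arrays.asList("b", "c"))); // rejected: b is busy
        System.out.println(tracker.markCompacting(Arrays.asList("c", "d"))); // accepted: c was never marked
    }
}
```

Because a rejected bucket leaves no partial marks behind, the strategy can try its next bucket without any cleanup.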

What do you guys think?


> Cannot Create Duplicate Compaction Marker
> -
>
> Key: CASSANDRA-2769
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2769
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Benjamin Coverston
>Assignee: Sylvain Lebresne
> Fix For: 0.8.2
>
> Attachments: 
> 0001-0.8.0-Remove-useless-unmarkCompacting-in-doCleanup.patch, 
> 0001-Do-compact-only-smallerSSTables.patch, 
> 0002-Only-compact-what-has-been-succesfully-marked-as-com.patch
>
>
> Concurrent compaction can trigger the following exception when two threads 
> compact the same sstable. DataTracker attempts to prevent this but apparently 
> not successfully.
> java.io.IOError: java.io.IOException: Unable to create compaction marker
>   at 
> org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:638)
>   at 
> org.apache.cassandra.db.DataTracker.removeOldSSTablesSize(DataTracker.java:321)
>   at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:294)
>   at 
> org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:255)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:932)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:119)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:102)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:680)
> Caused by: java.io.IOException: Unable to create compaction marker
>   at 
> org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:634)
>   ... 12 more





[jira] [Created] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command

2011-06-15 Thread Alan Liang (JIRA)
Unable to set compaction strategy in cli using create column family command
---

 Key: CASSANDRA-2778
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Alan Liang
Assignee: Alan Liang


The following command does not set compaction strategy and its options:
{code}
create column family Standard1
with comparator = BytesType
and compaction_strategy = 
'org.apache.cassandra.db.compaction.TimestampBucketedCompactionStrategy'
and compaction_strategy_options = [{max_sstable_size:504857600, 
retention_in_seconds:60}];
{code}





[jira] [Updated] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command

2011-06-16 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2778:
--

Attachment: 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch

> Unable to set compaction strategy in cli using create column family command
> ---
>
> Key: CASSANDRA-2778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
> Attachments: 
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch
>
>
> The following command does not set compaction strategy and its options:
> {code}
> create column family Standard1
> with comparator = BytesType
> and compaction_strategy = 
> 'org.apache.cassandra.db.compaction.TimestampBucketedCompactionStrategy'
> and compaction_strategy_options = [{max_sstable_size:504857600, 
> retention_in_seconds:60}];
> {code}





[jira] [Updated] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command

2011-06-16 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2778:
--

Attachment: 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch

> Unable to set compaction strategy in cli using create column family command
> ---
>
> Key: CASSANDRA-2778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
> Attachments: 
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch, 
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch
>
>
> The following command does not set compaction strategy and its options:
> {code}
> create column family Standard1
> with comparator = BytesType
> and compaction_strategy = 
> 'org.apache.cassandra.db.compaction.TimestampBucketedCompactionStrategy'
> and compaction_strategy_options = [{max_sstable_size:504857600, 
> retention_in_seconds:60}];
> {code}





[jira] [Updated] (CASSANDRA-2778) Unable to set compaction strategy in cli using create column family command

2011-06-16 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2778:
--

Attachment: (was: 
0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch)

> Unable to set compaction strategy in cli using create column family command
> ---
>
> Key: CASSANDRA-2778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2778
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
> Attachments: 
> 0001-2778-allow-for-dynamic-changes-to-compaction-strateg.patch
>
>
> The following command does not set compaction strategy and its options:
> {code}
> create column family Standard1
> with comparator = BytesType
> and compaction_strategy = 
> 'org.apache.cassandra.db.compaction.TimestampBucketedCompactionStrategy'
> and compaction_strategy_options = [{max_sstable_size:504857600, 
> retention_in_seconds:60}];
> {code}





[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-20 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2753:
--

Attachment: 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch

2nd attempt based on Jonathan Ellis' comments. 

Highlights of the patch are:

- captures max column timestamp at the following places: memtable flush, 
compaction, and rebuilding after streaming
- stores max timestamp in the stats file; created an SSTableMetadata class to 
encapsulate the stats file
- moved estimated histograms for column/row counts and replay position into 
the stats file
- bumped version number
- tests


> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch, 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Commented] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-21 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052834#comment-13052834
 ] 

Alan Liang commented on CASSANDRA-2735:
---

This compaction strategy is useful for time series data, e.g. you capture 
counts for each minute, hour, and day. Ordering and compacting the sstables by 
column timestamp allows you to expire sstables more effectively than the 
size-tiered approach in trunk. This is because the size-tiered approach could 
combine an old sstable with a new sstable, which makes the resulting sstable 
look quite new. You would not be able to expire the old data in this case.
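
The effect can be shown with a small sketch. The names and the "max timestamp older than now minus retention" rule are assumptions for illustration; all timestamps share one arbitrary unit.

```java
final class ExpirationSketch
{
    // A merged sstable inherits the newest timestamp of any of its inputs.
    static long mergedMaxTimestamp(long... inputMaxTimestamps)
    {
        long max = Long.MIN_VALUE;
        for (long ts : inputMaxTimestamps)
            max = Math.max(max, ts);
        return max;
    }

    // An sstable can be dropped whole only when even its newest column is
    // older than the retention window.
    static boolean isExpired(long maxTimestamp, long now, long retention)
    {
        return maxTimestamp < now - retention;
    }

    public static void main(String[] args)
    {
        long now = 1000, retention = 60;
        long oldTable = 900, newTable = 990;

        System.out.println(isExpired(oldTable, now, retention)); // old sstable on its own
        System.out.println(isExpired(mergedMaxTimestamp(oldTable, newTable), now, retention)); // after merging with a new one
    }
}
```

On its own the old sstable is past retention, but once merged with the newer one the result carries the newer max timestamp and has to be kept.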

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0004-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.





[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-21 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0001-timestamp-bucketed-compaction-strategy.patch

Highlights of this patch:
- Introduce a timestamp compaction strategy
- Introduce Expiration Task
- option to delete or move to expired folder
- Tests for timestamp bucketing strategy

This patch depends on https://issues.apache.org/jira/browse/CASSANDRA-2753 to 
be committed.
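
For readers who have not opened the patch, the bucketing described in the issue summary could be sketched roughly as below. This is not the patch's code: the greedy grouping rule, the Table class, and the parameter names are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class TimestampBucketsSketch
{
    // Hypothetical summary of an sstable: newest column timestamp and size on disk.
    static final class Table
    {
        final long maxTimestamp;
        final long size;

        Table(long maxTimestamp, long size)
        {
            this.maxTimestamp = maxTimestamp;
            this.size = size;
        }
    }

    // Orders sstables by max timestamp, then groups neighbours greedily. A bucket
    // is closed when it already holds maxThreshold sstables or when one more
    // sstable would push its total size past maxSize. Buckets with fewer than
    // minThreshold sstables are dropped.
    static List<List<Table>> buckets(List<Table> tables, long maxSize, int minThreshold, int maxThreshold)
    {
        List<Table> sorted = new ArrayList<Table>(tables);
        Collections.sort(sorted, new Comparator<Table>()
        {
            public int compare(Table a, Table b)
            {
                return Long.compare(a.maxTimestamp, b.maxTimestamp);
            }
        });

        List<List<Table>> result = new ArrayList<List<Table>>();
        List<Table> current = new ArrayList<Table>();
        long currentSize = 0;
        for (Table t : sorted)
        {
            boolean full = current.size() == maxThreshold || currentSize + t.size > maxSize;
            if (!current.isEmpty() && full)
            {
                if (current.size() >= minThreshold)
                    result.add(current);
                current = new ArrayList<Table>();
                currentSize = 0;
            }
            current.add(t);
            currentSize += t.size;
        }
        if (current.size() >= minThreshold)
            result.add(current);
        return result;
    }
}
```

Grouping only neighbours in timestamp order keeps old and new data in separate sstables, which is what makes whole-sstable expiration possible.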

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.





[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-21 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: (was: 0004-timestamp-bucketed-compaction-strategy.patch)

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.





[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-28 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056362#comment-13056362
 ] 

Alan Liang commented on CASSANDRA-2753:
---

bq. No support for supercolumns?

Wow. Good catch. I've added tests for this as well.

bq. it would be more clear if observeColumnsInSSTable took a CFMetaData object 
instead of a CF, to get a serializer from.

I've added a helper method CFMetaData.getColumnSerializer() to do this.

bq. nit: SSTMC.setMaxTimestamp would be more accurately named updateMaxTimestamp

Makes sense.

bq. IMO SSTM deserialize versioning logic would be clearer if it were all in 
SSTMSerializer instead of split between that and openFromDescriptor.

Makes sense.

bq. Suggest adding a comment that SSTableWriter.append(AbstractCompactedRow 
row) deliberately avoids calling updateMaxTimestamp b/c otherwise we'd have to 
deserialize EchoedRow.

Sounds good.

bq. where is the max-timestamp-of-compacted-sstables logic? I didn't notice it.

I put this in ColumnFamilyStore.createCompactionWriter():

{code}
public SSTableWriter createCompactionWriter(long estimatedRows, String location, Collection<SSTableReader> sstables) throws IOException
{
    ReplayPosition rp = ReplayPosition.getReplayPosition(sstables);
    SSTableMetadata.Collector sstableMetadataCollector = SSTableMetadata.createCollector().replayPosition(rp);

    // get the max timestamp of the precompacted sstables
    for (SSTableReader sstable : sstables)
        sstableMetadataCollector.updateMaxTimestamp(sstable.getMaxTimestamp());

    return new SSTableWriter(getTempSSTablePath(location), estimatedRows, metadata, partitioner, sstableMetadataCollector);
}
{code}

bq. nit: renaming SSTableWriter.writeMetadata feels gratuitous

I renamed it back to writeMetadata.

bq. nit: prefer initializing fields that don't need constructor parameters, at 
declaration time (looking at RowIndexer.sstMC)

Makes sense.


> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch, 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-06-28 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2753:
--

Attachment: 
0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch

V2 patch based on jbellis' comments

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch, 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch, 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Updated] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-07-06 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2753:
--

Attachment: 
0001-capture-max-timestamp-and-created-SSTableMetadata-to-V3.patch

added maxTimestamp() to IColumn

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Attachments: 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch, 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V3.patch, 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch, 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch
>
>






[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-07-12 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064230#comment-13064230
 ] 

Alan Liang commented on CASSANDRA-2753:
---

Daniel, 

Which test does this break? Can you elaborate?

> Capture the max client timestamp for an SSTable
> ---
>
> Key: CASSANDRA-2753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
> Fix For: 1.0
>
> Attachments: 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch, 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V3.patch, 
> 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch, 
> 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch, 
> supercolumn.patch
>
>






[jira] [Updated] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-08-02 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2735:
--

Attachment: 0001-timestamp-bucketed-compaction-strategy-V2.patch

rebased onto trunk

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0001-timestamp-bucketed-compaction-strategy-V2.patch, 
> 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.





[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-08-19 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087991#comment-13087991
 ] 

Alan Liang commented on CASSANDRA-1608:
---

There's a problem with Interval#intersects:

{code}
public boolean intersects(Interval interval)
{
    return this.contains(interval.min) || this.contains(interval.min);
}
{code}

I think you wanted:
{code}
return this.contains(interval.min) || this.contains(interval.max);
{code}

However, a more efficient way to do this would be:
{code}
return this.min.compareTo(interval.max) <= 0 && this.max.compareTo(interval.min) >= 0;
{code}
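
Besides being cheaper, the comparison form also covers a case the endpoint check misses: when the other interval strictly encloses this one, neither of its endpoints falls inside this interval. A small self-contained sketch, using int bounds and a made-up IntervalSketch class rather than the Interval class from the patch:

```java
final class IntervalSketch
{
    final int min;
    final int max;

    IntervalSketch(int min, int max)
    {
        this.min = min;
        this.max = max;
    }

    boolean contains(int point)
    {
        return min <= point && point <= max;
    }

    // Endpoint check: misses the case where `other` strictly encloses this interval.
    boolean intersectsByEndpoints(IntervalSketch other)
    {
        return contains(other.min) || contains(other.max);
    }

    // Comparison check: two closed intervals overlap exactly when each one
    // starts no later than the other ends.
    boolean intersects(IntervalSketch other)
    {
        return min <= other.max && max >= other.min;
    }

    public static void main(String[] args)
    {
        IntervalSketch inner = new IntervalSketch(5, 6);
        IntervalSketch outer = new IntervalSketch(1, 10);
        System.out.println(inner.intersectsByEndpoints(outer)); // endpoint check on an enclosing interval
        System.out.println(inner.intersects(outer));            // comparison check on the same pair
    }
}
```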



> Redesigned Compaction
> -
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chris Goffinet
>Assignee: Benjamin Coverston
> Attachments: 1608-v11.txt, 1608-v13.txt, 1608-v2.txt
>
>
> After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
> thinking on this subject that I wanted to lay out.
> I propose we redo the concept of how compaction works in Cassandra. At the 
> moment, compaction is kicked off based on a write access pattern, not read 
> access pattern. In most cases, you want the opposite. You want to be able to 
> track how well each SSTable is performing in the system. If we were to keep 
> statistics in-memory of each SSTable, prioritize them based on most accessed, 
> and bloom filter hit/miss ratios, we could intelligently group sstables that 
> are being read most often and schedule them for compaction. We could also 
> schedule lower priority maintenance on SSTables not often accessed.
> I also propose we limit the size of each SSTable to a fixed size, which gives 
> us the ability to better utilize our bloom filters in a predictable manner. 
> At the moment after a certain size, the bloom filters become less reliable. 
> This would also allow us to group data most accessed. Currently the size of 
> an SSTable can grow to a point where large portions of the data might not 
> actually be accessed as often.





[jira] [Issue Comment Edited] (CASSANDRA-1608) Redesigned Compaction

2011-08-19 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087991#comment-13087991
 ] 

Alan Liang edited comment on CASSANDRA-1608 at 8/19/11 9:34 PM:


There's a problem with Interval#intersects:

{code}
public boolean intersects(Interval interval)
{
    return this.contains(interval.min) || this.contains(interval.min);
}
{code}

I think you wanted:
{code}
return this.contains(interval.min) || this.contains(interval.max);
{code}

However, a more efficient way to do this would be:
{code}
return this.min.compareTo(interval.max) <= 0 && 
this.max.compareTo(interval.min) >= 0;
{code}



  was (Author: alanliang):
There's a problem with Interval#intersects:

{code}
public boolean intersects(Interval interval)
{
return this.contains(interval.min) || this.contains(interval.min);
}
{code}

I think you wanted:
{code}
return this.contains(interval.min) || this.contains(interval.max);
{code}

However, a more efficient way to do this would be:
{code}
return this.min.compareTo(interval.max) <= 0 && return 
this.max.compareTo(interval.min) >= 0;
{code}


  





[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-08-22 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089129#comment-13089129
 ] 

Alan Liang commented on CASSANDRA-1608:
---

From a high level, it's looking good.

In Manifest.java, "public void add(SSTableReader reader)" should either be 
synchronized or use a NonBlockingHashMap to hold generations, because 
multiple threads could be calling this.
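
A minimal sketch of the second option follows. The real Manifest is not shown in this thread, so the layout below (generation number mapped to a set of sstable names) is an assumption, and ManifestSketch, count, and concurrentAddDemo are hypothetical names. NonBlockingHashMap comes from the separate high-scale-lib library; the JDK's ConcurrentHashMap is used here as a stand-in so the sketch needs no extra dependency. The simpler alternative is to mark add as synchronized and keep a plain HashMap.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for Manifest; sstables are represented by their names. */
final class ManifestSketch {
    // generation number -> sstables in that generation (assumed layout)
    private final Map<Integer, Set<String>> generations = new ConcurrentHashMap<>();

    /** Safe to call from several threads without external locking. */
    void add(int generation, String sstableName) {
        generations
            .computeIfAbsent(generation, g -> ConcurrentHashMap.newKeySet())
            .add(sstableName);
    }

    int count(int generation) {
        Set<String> tables = generations.get(generation);
        return tables == null ? 0 : tables.size();
    }

    /** Adds from several threads at once and returns the resulting count. */
    static int concurrentAddDemo(int threads, int perThread) {
        ManifestSketch manifest = new ManifestSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    manifest.add(0, "sstable-" + id + "-" + i);
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return manifest.count(0);
    }
}
```

With a plain HashMap and an unsynchronized add, the same demo could lose entries or corrupt the map; here every add is retained.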

> Redesigned Compaction
> -
>
> Key: CASSANDRA-1608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Chris Goffinet
> Assignee: Benjamin Coverston
> Attachments: 1608-22082011.txt, 1608-v2.txt
>
>
> After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
> thinking on this subject that I wanted to lay out.
> I propose we redo the concept of how compaction works in Cassandra. At the 
> moment, compaction is kicked off based on a write access pattern, not read 
> access pattern. In most cases, you want the opposite. You want to be able to 
> track how well each SSTable is performing in the system. If we were to keep 
> statistics in-memory of each SSTable, prioritize them based on most accessed, 
> and bloom filter hit/miss ratios, we could intelligently group sstables that 
> are being read most often and schedule them for compaction. We could also 
> schedule lower priority maintenance on SSTables not often accessed.
> I also propose we limit the size of each SSTable to a fixed size, which gives 
> us the ability to better utilize our bloom filters in a predictable manner. 
> At the moment after a certain size, the bloom filters become less reliable. 
> This would also allow us to group data most accessed. Currently the size of 
> an SSTable can grow to a point where large portions of the data might not 
> actually be accessed as often.





[jira] Updated: (CASSANDRA-2171) Record and expose flush rate per CF

2011-03-07 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2171:
--

Attachment: expose_flush_rate_per_cf_patch.diff

Attached patch to record flush rate per CF. Exposed this rate through JMX.
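
For readers without the diff at hand, the general shape of recording a rate and exposing it over JMX can be sketched as below. This is not the attached patch: FlushTracker, recordFlush, the bytes-per-second definition of "flush rate", and the object name are all assumptions made for illustration. Only the javax.management calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class FlushRateSketch {
    /** Standard MBean interface: must be public and named <ImplClass>MBean. */
    public interface FlushTrackerMBean {
        double getFlushRateBytesPerSec();
    }

    /** Hypothetical per-CF tracker; the patch's real bookkeeping is not shown here. */
    public static final class FlushTracker implements FlushTrackerMBean {
        private long totalBytes;
        private long totalMillis;

        synchronized void recordFlush(long bytes, long elapsedMillis) {
            totalBytes += bytes;
            totalMillis += elapsedMillis;
        }

        @Override
        public synchronized double getFlushRateBytesPerSec() {
            return totalMillis == 0 ? 0.0 : totalBytes * 1000.0 / totalMillis;
        }
    }

    /** Registers a tracker, records two flushes, and reads the rate back over JMX. */
    static double demo() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // The object name is made up for this sketch.
            ObjectName name =
                new ObjectName("sketch.flush:type=FlushTracker,columnfamily=Standard1");
            FlushTracker tracker = new FlushTracker();
            server.registerMBean(tracker, name);
            try {
                tracker.recordFlush(1000, 500);
                tracker.recordFlush(3000, 1500);
                return (Double) server.getAttribute(name, "FlushRateBytesPerSec");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The getter name determines the JMX attribute name, so getFlushRateBytesPerSec is read as the attribute FlushRateBytesPerSec.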

> Record and expose flush rate per CF
> ---
>
> Key: CASSANDRA-2171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2171
> Project: Cassandra
> Issue Type: Improvement
> Affects Versions: 0.8
> Reporter: Stu Hood
> Assignee: Stu Hood
> Fix For: 0.8
>
> Attachments: expose_flush_rate_per_cf_patch.diff
>
>
> In order to automatically throttle compaction to some multiple of the flush 
> rate, we need to record the flush rate across the system. Since this might be 
> useful information on a per CF basis, this ticket will deal with recording 
> the flush rate in the CFStore object, and exposing it via JMX.



[jira] Assigned: (CASSANDRA-2171) Record and expose flush rate per CF

2011-03-07 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang reassigned CASSANDRA-2171:
-

Assignee: Alan Liang  (was: Stu Hood)

> Record and expose flush rate per CF
> ---
>
> Key: CASSANDRA-2171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2171
> Project: Cassandra
> Issue Type: Improvement
> Affects Versions: 0.8
> Reporter: Stu Hood
> Assignee: Alan Liang
> Fix For: 0.8
>
> Attachments: expose_flush_rate_per_cf_patch.diff
>
>
> In order to automatically throttle compaction to some multiple of the flush 
> rate, we need to record the flush rate across the system. Since this might be 
> useful information on a per CF basis, this ticket will deal with recording 
> the flush rate in the CFStore object, and exposing it via JMX.



[jira] Created: (CASSANDRA-2288) AES Counter Repair Improvements

2011-03-08 Thread Alan Liang (JIRA)
AES Counter Repair Improvements
---

 Key: CASSANDRA-2288
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2288
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8
Reporter: Alan Liang
Assignee: Alan Liang


A few issues found for AES Counter Repair in 
AESCommutativeRowIndexer#doIndexing:

- sync() is called for every row in the sstable
- because the sstable is rebuilt inline (reads and writes go to the same file), 
the read and write positions seek back and forth, which causes many flushes
- BufferedRandomAccessFile#setLength does not work with buffers

Fixed:
- defer sync() until the end
- use two BufferedRandomAccessFiles, one for reading and one for writing
- cache the length of the reader file
- implement BufferedRandomAccessFile#setLength so that it works with the buffer
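
The first three fixes can be sketched together as follows. This is an illustration, not the patch: it uses the JDK's java.io.RandomAccessFile in place of Cassandra's BufferedRandomAccessFile, assumes fixed-size 8-byte records so the rewrite never changes the file length, and uses "add one" as a placeholder for whatever the indexer actually rewrites. InlineRebuildSketch, rebuildInPlace, and demo are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

final class InlineRebuildSketch {
    /** Rewrites every 8-byte record in place; returns the number of records. */
    static int rebuildInPlace(File file) throws IOException {
        // Separate handles: neither position has to seek back and forth.
        try (RandomAccessFile reader = new RandomAccessFile(file, "r");
             RandomAccessFile writer = new RandomAccessFile(file, "rw")) {
            long length = reader.length();       // cached once, not queried per row
            int rows = 0;
            while (reader.getFilePointer() + Long.BYTES <= length) {
                long value = reader.readLong();  // reader only moves forward
                writer.writeLong(value + 1);     // overwrite the record just read
                rows++;
            }
            writer.getFD().sync();               // a single sync at the end
            return rows;
        }
    }

    /** Writes three records to a temp file, rebuilds it, and returns the new contents. */
    static long[] demo() {
        try {
            File file = File.createTempFile("inline-rebuild", ".bin");
            file.deleteOnExit();
            try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
                out.writeLong(1);
                out.writeLong(2);
                out.writeLong(3);
            }
            rebuildInPlace(file);
            long[] result = new long[3];
            try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
                for (int i = 0; i < result.length; i++)
                    result[i] = in.readLong();
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a single handle, each row would need a seek to the write position and another back to the read position, which is the pattern the ticket describes as causing many flushes.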




[jira] Updated: (CASSANDRA-2288) AES Counter Repair Improvements

2011-03-09 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang updated CASSANDRA-2288:
--

Attachment: CASSANDRA-2288-aes_counter_repair_improvements.diff

> AES Counter Repair Improvements
> ---
>
> Key: CASSANDRA-2288
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2288
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.8
> Reporter: Alan Liang
> Assignee: Alan Liang
> Attachments: CASSANDRA-2288-aes_counter_repair_improvements.diff
>
>
> A few issues found for AES Counter Repair in 
> AESCommutativeRowIndexer#doIndexing:
> - sync() is called for every row in the sstable
> - because the sstable is rebuilt inline (reads and writes go to the same file), 
> the read and write positions seek back and forth, which causes many flushes
> - BufferedRandomAccessFile#setLength does not work with buffers
> Fixed:
> - defer sync() until the end
> - use two BufferedRandomAccessFiles, one for reading and one for writing
> - cache the length of the reader file
> - implement BufferedRandomAccessFile#setLength so that it works with the buffer



[jira] [Assigned] (CASSANDRA-1610) Pluggable Compaction

2011-04-08 Thread Alan Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Liang reassigned CASSANDRA-1610:
-

Assignee: Alan Liang

> Pluggable Compaction
> 
>
> Key: CASSANDRA-1610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1610
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Chris Goffinet
> Assignee: Alan Liang
> Priority: Minor
> Fix For: 1.0
>
>
> In CASSANDRA-1608, I proposed some changes on how compaction works. I think 
> it also makes sense to allow the ability to have pluggable compaction per CF. 
> There could be many types of workloads where this makes sense. One example we 
> had at Digg was to completely throw away certain SSTables after N days. 
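
One possible shape for per-CF pluggability, using the Digg example from the description, is sketched below. Nothing here is taken from the patches on this ticket: the Strategy interface, the Table class, the registry, and every method name are hypothetical, and the sketch assumes an sstable can be dropped once its newest timestamp is more than N days old.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

final class PluggableCompactionSketch {
    /** Hypothetical stand-in for an sstable. */
    static final class Table {
        final String name;
        final long maxTimestampMillis;

        Table(String name, long maxTimestampMillis) {
            this.name = name;
            this.maxTimestampMillis = maxTimestampMillis;
        }
    }

    /** Hypothetical plug-in point: given a CF's sstables, pick the ones to act on. */
    interface Strategy {
        List<Table> select(List<Table> tables, long nowMillis);
    }

    /** Selects tables whose newest data is older than the given number of days. */
    static Strategy expireAfterDays(int days) {
        long ttlMillis = TimeUnit.DAYS.toMillis(days);
        return (tables, now) -> {
            List<Table> expired = new ArrayList<>();
            for (Table t : tables)
                if (now - t.maxTimestampMillis > ttlMillis)
                    expired.add(t);
            return expired;
        };
    }

    private final Map<String, Strategy> byColumnFamily = new HashMap<>();
    // Column families without a configured strategy select nothing in this sketch.
    private final Strategy fallback = (tables, now) -> Collections.emptyList();

    void configure(String columnFamily, Strategy strategy) {
        byColumnFamily.put(columnFamily, strategy);
    }

    Strategy strategyFor(String columnFamily) {
        return byColumnFamily.getOrDefault(columnFamily, fallback);
    }
}
```

A column family would be configured once, for example registry.configure("Events", PluggableCompactionSketch.expireAfterDays(7)), and the compaction code would then ask strategyFor("Events") which tables to discard.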
