[ https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486692#comment-14486692 ]
Jonathan Shook commented on CASSANDRA-8826:
-------------------------------------------

Consider that many systems implement aggregate processing at the client node. A more optimal system would allow those aggregates to be processed close to storage rather than bulk-shipping operands across the wire to the client before any computation can even be started. Even using the coordinator for this is relatively wasteful. After considering multiple options for how to handle aggregates in a Cassandra-idiomatic way, I arrived at pretty much the same place as [~benedict]. The point is not to try to emulate other systems, but to highly optimize a very common and traffic-sensitive usage pattern.

The partial-data scenarios (CL>1) are interesting, but you can easily describe what a reasonable behavior would be if data were missing from a replica. In the most basic case, you simply reflect the standard CL interpretation that "the results from these nodes are not consistent at CL=Q". While this is not helpful to clients as such, it is a consistent interpretation of the semantics. The same types of things you might do as a user to deal with it do not change. If the data of interest is consistent, then aggregations of that data will be consistent, and vice versa.

That almost certainly invites more questions about the likely scenario of partial data for near-time reads at CL>1. That, to me, is the most interesting and challenging part of this idea. If you simply activate read repair logic as an intermediate step, you still maintain the same CL semantics that users would expect. Am I missing something that makes this more complicated than I am thinking?

My impression is that the concern for complexity is more fairly placed on the more advanced things that you might build on top of distributed single-partition aggregates, not on the basic idea of it.
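The replica-side aggregation described above can be sketched as a mergeable partial state: each replica folds its local rows into a small summary, and the coordinator combines the summaries in any order. This is an illustrative sketch only — the class and method names here are hypothetical and are not Cassandra's internal API:

```java
// Illustrative sketch, not Cassandra internals: each replica computes a
// partial state for AVG over its local rows, so only (sum, count) pairs
// cross the wire instead of raw operand rows.
final class AvgPartial {
    final double sum;
    final long count;

    AvgPartial(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Computed locally on a replica from the rows it owns.
    static AvgPartial ofLocalRows(double[] values) {
        double s = 0;
        for (double v : values) s += v;
        return new AvgPartial(s, values.length);
    }

    // Merged on the coordinator; associative and commutative, so replica
    // responses may be combined in whatever order they arrive.
    AvgPartial merge(AvgPartial other) {
        return new AvgPartial(sum + other.sum, count + other.count);
    }

    // Finalized once the required number of replicas (per CL) have responded.
    double finish() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

The key property is that the merge step is order-independent, which is what lets the coordinator treat partial aggregates exactly like any other replica response under the usual CL accounting.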
> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is
> pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient.
> Some related tickets (esp. CASSANDRA-8099) are currently in progress - we
> should wait for them to land before talking about implementation.
> Another playground (not covered by this ticket) that might be related is
> _distributed filtering_.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)