[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356881#comment-15356881 ]
Sylvain Lebresne commented on CASSANDRA-10707: ---------------------------------------------- Sorry for the long iterations on the reviews. I still have a bunch of remarks, though a lot are fairly minor. In general, I "think" the general logic if fine, but it's still a bit hard to wrap your head around so I'm also listing things that are unclear to me, for which some extra-commenting might be just what's missing. Anyway, here we go: * On {{GroupMaker.State}}: ** I'd promote it to a top-level {{GroupingState}} class since it's used in {{DataLimits}} too and would have a nice symmetry with {{PagingState}}. ** Any reason not to reuse the {{Clustering.Serializer}} in the serializer (it probably requires keeping the CFMetaData or ClusteringComparator around for the types but not a big deal)? In particular, the hand-rolled serialization doesn't handle nulls (and should). ** I think it would useful to spell out in more details what having or not having each component means. My understanding is that if {{partitionKey() != null}} and {{clustering == null}}, it means we haven't started counting the partition at all (meaning that seeing that partition in the previous page made use close a group and subsequently hit the page limit, hence stopping). But if {{clustering != null}}, it means we stopped the previous page in the middle of a group. But I'm not sure I'm fully correct (maybe I'm missing some edge cases in particular?) and documenting this would be nice. ** Nit: The comment on the {{clustering()}} method is incorrect (talks about "partition key"). * In general, the handling of the static column is a bit subtle, and it would be useful to have a comment somewhere that explain its general handling in details. For instance, the first case of {{GroupByAwareCounter.applyToPartition()}} contradicts slightly my understanding of what {{state.partitionKey()}} means. That is, I though that if reaching a new partition {{X}} makes us close a group and hit a page limit, then we'll stop and have {{X}} as the partitionKey in the state. However, on the next page, it seems we assume we've somehow already accounted the static row and I'm unclear why. The code in {{applyToStatic}} is also bemusing to me: to start with, what is the subtle difference between the first condition ({{!row.isEmpty() && (assumeLiveData || row.hasLiveData(nowInSec))}}) and the second one ({{hasLiveStaticRow}})? * In {{DataResolver}}, why not just calling {{rowCounted()}} unconditionally (rather than adding helper methods)? * In {{PkPrefixGroupMaker}}, instead of dealing with a ByteBuffer array, I'd just kept the {{Clustering}} of the last row seen (null initially), and the prefix size. We can then even delegate the check of whether the 2 clustering belong to the same group to some {{Clustering}} method. On that front, we shouldn't use {{ByteBuffer.equals}} but rather the {{ClusteringComparator}} types. It happens that some types equality does not align with the bytes equality (ok, one type, IntegerType, which allow as many zeroed bytes as prefix without changing the actual value). * In {{DataLimits.CQLGroupByLimits}} the silent assumption that {{filter()}} should only called on replica, and {{newCounter()}} on the coordinator, feels pretty error-prone for the caller (I haven't even carefully checked it's the case tbh). One option would be to rename the methods to {{filterOnReplica()}} and {{newCounterOnCoordinator()}} but even that feels a bit weird/limiting. I would prefer removing the {{onReplica}} flag completely and handling that in deserialization: we can assume when deserializing a {{DataLimits}} that we are on the {{replica}} (it's arguably still a somewhat silent assumption, but one that feels safer to me; we have no reason to serialize a {{ReadCommand}} except to send it to replica, while having a need to call {{filter()}} on the coordinator or {{newCounter}} on replicas in the future sounds very plausible), so we could just call some {{withoutClustering()}} method on the {{State}} if it's a range query. Another more explicit alternative could be to add some {{onReplica()}} method to {{ReadCommand}} that we'd call in {{ReadCommandVerbHandler}} (where we know for sure we are on a replica) that would return a modified {{ReadCommand}} that would have a modified {{DataLimits}}. That said, it's probably a bit more code just for that, so I think I'm personally fine with the deserializer option. * In {{GroupByAwareCounter}}: ** in {{applyToPartition}}, in the comment starting by {{partitionKey is the last key for which we're returned rows}}, I believe it should read {{state.partitionKey() is the last ...}}. ** at end of {{applyToPartition}}, why the {{!isDone()}} test? It seems that since you just changed the {{currentPartitionKey}}, you should unconditionally change the other variables. ** In {{applyToRow}}, why do we put {{hasGroupStarted == false}} as soon as we find a non-live row? It makes it sound like we "close" a group when we get a non-live row in the middle of a group. ** I'm confused by the 2nd part of the comment on {{onClose()}} (starting with {{The last group ...}}). I'm not sure to see how that relate to the next line (contrarily to the first part of the comment), nor am I entirely sure what that part of the comment is trying to say in general. ** Nit: we should move {{nowInSec}} and {{assumeLiveData}} to {{Counter}} and add a {{protected boolean isLive(Row row)}} helper method. ** Nit: don't need to pass the state to the ctor (currently, slightly confusing that the ctor doesn't store it but its used by other methods). Same for the spec. * For the documentation change, would be nice to modify the new documentation now that it's committed. Also, the modification to the textile have some merge marker left in them (I'm happy if you just want to skip the textile file, I assume we'll probably just remove it soon). * At the end of {{SelectStatement.getAggregationSpecification}}, I believe the {{clusteringPrefixSize > 0 && isDistinct}} test is redundant since we'd have exited early in that case. Happy to make it an assertion, though I'm not sure it's that useful (the code is relatively clear). * Not sure why in {{ReadCommand}} the patch switches the {{withStateTracking}} and {{withPurgeableTombstone}} transformation. Intuitively feels to me that having {{withStateTracking}} (which check for aborts) first is more logical. * Nit: In {{AggregationSpecification.Serializer}}, I'd add a {{break}} after the {{case}} in the {{serialize}} and {{serializedSize}} method just to avoid future mistake (would also maybe add a case for {{AGGREGATE_EVERYTHING}} that does nothing for symmetry with {{desrialize}}). Also, in {{deserialize}}, [seems people prefer|https://issues.apache.org/jira/browse/CASSANDRA-11868] normalizing on putting the {{AssertionError}} in a {{default}} case. * In {{AggregationQueryPager}}: ** {{handlePagingOff()}} is called twice (both before call the iterator ctor on the argument and inside the ctor). ** the pager is doing paging a top-level page, and the naming/javadoc doesn't help not being confused. As an example, in {{computePageSize(int pageSize, int counted)}}, the "pageSize" in the method name refers to sub-pages, while the "pageSize" argument is the top-level page size. Could use some small consistent naming change to help the reader (and some top-level comment to catch the attention on the "subtlety"). * The NEWS file change should be "rebased". > Add support for Group By to Select statement > -------------------------------------------- > > Key: CASSANDRA-10707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10707 > Project: Cassandra > Issue Type: Improvement > Components: CQL > Reporter: Benjamin Lerer > Assignee: Benjamin Lerer > Fix For: 3.x > > > Now that Cassandra support aggregate functions, it makes sense to support > {{GROUP BY}} on the {{SELECT}} statements. > It should be possible to group either at the partition level or at the > clustering column level. > {code} > SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey; > SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP > BY partitionKey, clustering0, clustering1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)