ableegoldman commented on a change in pull request #6592: URL: https://github.com/apache/kafka/pull/6592#discussion_r621676170
########## File path: clients/src/main/java/org/apache/kafka/common/serialization/ListSerializer.java ########## @@ -77,21 +87,39 @@ public void configure(Map<String, ?> configs, boolean isKey) { } } + private void serializeNullIndexList(final DataOutputStream out, List<Inner> data) throws IOException { + List<Integer> nullIndexList = IntStream.range(0, data.size()) + .filter(i -> data.get(i) == null) + .boxed().collect(Collectors.toList()); + out.writeInt(nullIndexList.size()); + for (int i : nullIndexList) out.writeInt(i); + } + @Override public byte[] serialize(String topic, List<Inner> data) { if (data == null) { return null; } - final int size = data.size(); try (final ByteArrayOutputStream baos = new ByteArrayOutputStream(); final DataOutputStream out = new DataOutputStream(baos)) { + out.writeByte(serStrategy.ordinal()); // write serialization strategy flag + if (serStrategy == SerializationStrategy.NULL_INDEX_LIST) { + serializeNullIndexList(out, data); + } + final int size = data.size(); out.writeInt(size); for (Inner entry : data) { - final byte[] bytes = inner.serialize(topic, entry); - if (!isFixedLength) { - out.writeInt(bytes.length); + if (entry == null) { + if (serStrategy == SerializationStrategy.NEGATIVE_SIZE) { + out.writeInt(Serdes.ListSerde.NEGATIVE_SIZE_VALUE); + } + } else { + final byte[] bytes = inner.serialize(topic, entry); + if (!isFixedLength || serStrategy == SerializationStrategy.NEGATIVE_SIZE) { + out.writeInt(bytes.length); Review comment: Just to clarify you mean don't expose this to the user at all, right? That sounds completely fine to me. If there are enough people trying to serialize lists of custom classes with all constant data size who want this optimization exposed for general use, then someone will request the feature and we can go back and add it in. Then we can debate what the API should look like at that time, and keep things simple for now. Personally I suspect the vast majority of non-primitive data types are not going to be constant size anyways. Given the above, I think whether to track the strategy as an actual `SerializationStrategy` enum vs a boolean flag becomes a matter of code style and personal preference, since it's no longer exposed to the user. So it's up to you whether you find the enum or the flag to be more readable or clean -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org