scwhittle commented on code in PR #38139:
URL: https://github.com/apache/beam/pull/38139#discussion_r3063433363
##########
runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/TriggerStateMachineRunner.java:
##########
@@ -59,7 +59,7 @@
public class TriggerStateMachineRunner<W extends BoundedWindow> {
@VisibleForTesting
public static final StateTag<ValueState<BitSet>> FINISHED_BITS_TAG =
- StateTags.makeSystemTagInternal(StateTags.value("closed", BitSetCoder.of()));
+ StateTags.makeSystemTagInternal(StateTags.value("closed", SentinelBitSetCoder.of()));
Review Comment:
Could this lead to unbounded state growth in cases where that was not
previously possible?
Consider a pipeline that uses global windows with an unbounded keyspace and
some AfterProcessingTime trigger. Would we now store an encoded empty bit set
for the finished bits of every key, whereas before there would be no such
state, since the value would be empty and treated as a delete?
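
To make the size difference concrete, here is a minimal sketch using only `java.util.BitSet`. The class and method names are hypothetical, and the byte-array length prefix that the real coders add in a nested context is left out. Whether a given runner treats an empty payload as a delete is the premise of the question above, not something this sketch verifies.

```java
import java.util.BitSet;

// Hypothetical illustration class; not part of the PR.
final class FinishedBitsEncodingSketch {
  // Payload for an encoding that writes the BitSet bytes unchanged.
  static byte[] plainPayload(BitSet bits) {
    return bits.toByteArray();
  }

  // Payload following the sentinel rule from the diff: an empty BitSet
  // becomes a single zero byte instead of an empty array.
  static byte[] sentinelPayload(BitSet bits) {
    return bits.isEmpty() ? new byte[] {0} : bits.toByteArray();
  }

  public static void main(String[] args) {
    BitSet empty = new BitSet();
    System.out.println("plain payload length: " + plainPayload(empty).length);
    System.out.println("sentinel payload length: " + sentinelPayload(empty).length);
  }
}
```

With the sentinel rule, an empty set of finished bits always produces a one-byte payload, so any backend that drops zero-length values would now keep an entry per key.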
##########
runners/core-java/src/main/java/org/apache/beam/runners/core/serialization/SentinelBitSetCoder.java:
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.serialization;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.BitSet;
+import org.apache.beam.sdk.coders.AtomicCoder;
+import org.apache.beam.sdk.coders.ByteArrayCoder;
+import org.apache.beam.sdk.coders.CoderException;
+
+/**
+ * Coder for {@link BitSet} that stores an empty bit set as a byte array with a single 0 element.
Review Comment:
// In general, BitSetCoder should be preferred, as it encodes an empty bit set
// as an empty byte array. However, there are cases where non-empty values are
// useful to indicate presence.
##########
runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/TriggerStateMachineRunner.java:
##########
@@ -59,7 +59,7 @@
public class TriggerStateMachineRunner<W extends BoundedWindow> {
@VisibleForTesting
public static final StateTag<ValueState<BitSet>> FINISHED_BITS_TAG =
- StateTags.makeSystemTagInternal(StateTags.value("closed", BitSetCoder.of()));
+ StateTags.makeSystemTagInternal(StateTags.value("closed", SentinelBitSetCoder.of()));
Review Comment:
Can you manually test that this won't break Dataflow update compatibility?
I'm not sure whether this is one of the coders that gets verified.
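
As a starting point for that manual test, the payload-level question can be sketched without any Beam dependency. This assumes that both coders decode with `BitSet.valueOf` on the decoded byte array (the new coder does so in this diff; for `BitSetCoder` that is an assumption). The class and method names are hypothetical. It does not cover any coder-identity or pipeline-graph checks that Dataflow may perform during an update.

```java
import java.util.BitSet;

// Hypothetical illustration class; not part of the PR.
final class PayloadCompatibilitySketch {
  // Payload as written before the change (assumed: raw BitSet bytes).
  static byte[] oldEncode(BitSet bits) {
    return bits.toByteArray();
  }

  // Payload as written after the change (sentinel byte for the empty set).
  static byte[] newEncode(BitSet bits) {
    return bits.isEmpty() ? new byte[] {0} : bits.toByteArray();
  }

  // Shared decode step (assumed identical for both coders).
  static BitSet decode(byte[] payload) {
    return BitSet.valueOf(payload);
  }

  // True if the value survives a round trip through either encoding.
  static boolean roundTripsBothWays(BitSet value) {
    return decode(oldEncode(value)).equals(value) && decode(newEncode(value)).equals(value);
  }

  public static void main(String[] args) {
    BitSet empty = new BitSet();
    BitSet some = new BitSet();
    some.set(0);
    some.set(70);
    System.out.println("empty round-trips: " + roundTripsBothWays(empty));
    System.out.println("non-empty round-trips: " + roundTripsBothWays(some));
  }
}
```

Under those assumptions, state written by the old coder still decodes after the update, and a single zero byte decodes back to an empty `BitSet` because `BitSet.valueOf` ignores trailing zero bytes.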
##########
runners/core-java/src/main/java/org/apache/beam/runners/core/serialization/SentinelBitSetCoder.java:
##########
@@ -0,0 +1,78 @@
+package org.apache.beam.runners.core.serialization;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.BitSet;
+import org.apache.beam.sdk.coders.AtomicCoder;
+import org.apache.beam.sdk.coders.ByteArrayCoder;
+import org.apache.beam.sdk.coders.CoderException;
+
+/**
+ * Coder for {@link BitSet} that stores an empty bit set as a byte array with a single 0 element.
+ */
+public class SentinelBitSetCoder extends AtomicCoder<BitSet> {
+ private static final SentinelBitSetCoder INSTANCE = new SentinelBitSetCoder();
+ private static final ByteArrayCoder BYTE_ARRAY_CODER = ByteArrayCoder.of();
+
+ private SentinelBitSetCoder() {}
+
+ public static SentinelBitSetCoder of() {
+ return INSTANCE;
+ }
+
+ @Override
+ public void encode(BitSet value, OutputStream outStream) throws CoderException, IOException {
+ encode(value, outStream, Context.NESTED);
+ }
+
+ @Override
+ public void encode(BitSet value, OutputStream outStream, Context context)
+ throws CoderException, IOException {
+ if (value == null) {
+ throw new CoderException("cannot encode a null BitSet");
+ }
+ byte[] bytes = value.isEmpty() ? new byte[] {0} : value.toByteArray();
+ BYTE_ARRAY_CODER.encodeAndOwn(bytes, outStream, context);
+ }
+
+ @Override
+ public BitSet decode(InputStream inStream) throws CoderException, IOException {
+ return decode(inStream, Context.NESTED);
+ }
+
+ @Override
+ public BitSet decode(InputStream inStream, Context context) throws CoderException, IOException {
+ return BitSet.valueOf(BYTE_ARRAY_CODER.decode(inStream, context));
+ }
+
+ @Override
+ public void verifyDeterministic() throws NonDeterministicException {
+ verifyDeterministic(
+ this,
+ "SentinelBitSetCoder requires its ByteArrayCoder to be deterministic.",
+ BYTE_ARRAY_CODER);
+ }
+
+ @Override
+ public boolean consistentWithEquals() {
+ return true;
Review Comment:
I took a look at the use cases as well, and it seems this would only
matter for some state objects and the like. I think it is fine here.
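
For reference, the property behind `consistentWithEquals()` can be spot-checked with plain `java.util.BitSet`: two sets that are `equals()` should produce identical payloads under the sentinel rule. The class and method names below are hypothetical, and this is a spot check on a few values, not a proof.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration class; not part of the PR.
final class EqualsConsistencySketch {
  // Same sentinel rule as in the diff, without the length prefix.
  static byte[] sentinelPayload(BitSet bits) {
    return bits.isEmpty() ? new byte[] {0} : bits.toByteArray();
  }

  // True if both values produce byte-for-byte identical payloads.
  static boolean samePayload(BitSet a, BitSet b) {
    return Arrays.equals(sentinelPayload(a), sentinelPayload(b));
  }

  public static void main(String[] args) {
    // Built differently, but equal: a high bit is set and then cleared.
    BitSet a = new BitSet();
    a.set(5);
    a.set(200);
    a.clear(200);
    BitSet b = new BitSet();
    b.set(5);
    System.out.println("equal: " + a.equals(b) + ", same payload: " + samePayload(a, b));
  }
}
```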
##########
runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/TriggerStateMachineRunner.java:
##########
Review Comment:
Doesn't this mean the new coder logic doesn't really matter?
##########
runners/core-java/src/main/java/org/apache/beam/runners/core/serialization/SentinelBitSetCoder.java:
##########
@@ -0,0 +1,78 @@
+package org.apache.beam.runners.core.serialization;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.BitSet;
+import org.apache.beam.sdk.coders.AtomicCoder;
+import org.apache.beam.sdk.coders.ByteArrayCoder;
+import org.apache.beam.sdk.coders.CoderException;
+
+/**
+ * Coder for {@link BitSet} that stores an empty bit set as a byte array with a single 0 element.
+ */
+public class SentinelBitSetCoder extends AtomicCoder<BitSet> {
+ private static final SentinelBitSetCoder INSTANCE = new SentinelBitSetCoder();
+ private static final ByteArrayCoder BYTE_ARRAY_CODER = ByteArrayCoder.of();
+
+ private SentinelBitSetCoder() {}
+
+ public static SentinelBitSetCoder of() {
+ return INSTANCE;
+ }
+
+ @Override
+ public void encode(BitSet value, OutputStream outStream) throws CoderException, IOException {
+ encode(value, outStream, Context.NESTED);
+ }
+
+ @Override
+ public void encode(BitSet value, OutputStream outStream, Context context)
+ throws CoderException, IOException {
+ if (value == null) {
+ throw new CoderException("cannot encode a null BitSet");
+ }
+ byte[] bytes = value.isEmpty() ? new byte[] {0} : value.toByteArray();
Review Comment:
Ah, I missed that part of the PR description. Makes sense now.