[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r804660040 ## File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.base.sink.writer; + +import org.apache.flink.annotation.PublicEvolving; + +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Deque; +import java.util.List; + +/** + * Class holding state of {@link AsyncSinkWriter} needed at taking a snapshot. The state captures + * the {@code bufferedRequestEntries} buffer for the writer at snapshot to resume the requests. This + * guarantees at least once semantic in sending requests where restoring from a snapshot where + * buffered requests were flushed to the sink will cause duplicate requests. + * + * @param request type. 
+ */ +@PublicEvolving +public class BufferedRequestState implements Serializable { +private final List> bufferedRequestEntries; +private final long stateSize; + +public BufferedRequestState(Deque> bufferedRequestEntries) { +this.bufferedRequestEntries = new ArrayList<>(bufferedRequestEntries); +this.stateSize = calculateStateSize(); +} + +public BufferedRequestState(List> bufferedRequestEntries) { +this.bufferedRequestEntries = new ArrayList<>(bufferedRequestEntries); +this.stateSize = calculateStateSize(); +} Review comment: They already were; they were split out to ensure an ordered data structure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
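To illustrate the ordering point in the comment above, here is a minimal sketch of copying a `Deque` into an `ArrayList` so that the snapshot keeps head-to-tail order and is decoupled from later changes to the buffer. `OrderedSnapshotSketch` is a hypothetical stand-in, not the PR's `BufferedRequestState`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical snapshot holder for illustration; not the class from the PR. */
final class OrderedSnapshotSketch<T> {
    private final List<T> entries;

    /** Copies the deque head-to-tail so the snapshot is ordered and independent of the buffer. */
    OrderedSnapshotSketch(Deque<T> buffer) {
        this.entries = Collections.unmodifiableList(new ArrayList<>(buffer));
    }

    List<T> getEntries() {
        return entries;
    }

    public static void main(String[] args) {
        Deque<String> buffer = new ArrayDeque<>();
        buffer.addLast("a");
        buffer.addLast("b");
        buffer.addFirst("retry"); // an entry re-queued at the head
        OrderedSnapshotSketch<String> snapshot = new OrderedSnapshotSketch<>(buffer);
        buffer.clear(); // later changes to the buffer do not affect the snapshot
        System.out.println(snapshot.getEntries());
    }
}
```

Copying into a list, rather than holding the deque itself, is what keeps the snapshot stable while the writer continues to mutate its buffer.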
[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r804518817 ## File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.base.sink.writer; + +import org.apache.flink.annotation.PublicEvolving; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.Objects; + +/** + * Class holding internal state of buffered requests. + * + * @param request type. + */ +@PublicEvolving Review comment: It violated the `PUBLIC_EVOLVING_API_METHODS_USE_ONLY_PUBLIC_EVOLVING_API_TYPES` arch rule: the `snapshot()` function in the writer is public and can't return an `@Internal` class.
[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r803575849 ## File path: flink-connectors/flink-connector-aws-kinesis-data-streams/src/main/java/org/apache/flink/connector/kinesis/sink/KinesisDataStreamsStateSerializer.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.kinesis.sink; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.connector.base.sink.writer.AsyncSinkWriterStateSerializer; + +import software.amazon.awssdk.core.SdkBytes; +import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; + +/** Kinesis Streams implementation {@link AsyncSinkWriterStateSerializer}. 
*/ +@Internal +public class KinesisDataStreamsStateSerializer +extends AsyncSinkWriterStateSerializer { +@Override +protected void serializeRequestToStream(PutRecordsRequestEntry request, DataOutputStream out) +throws IOException { +out.write(request.data().asByteArrayUnsafe()); +serializePartitionKeyToStream(request.partitionKey(), out); +validateExplicitHashKey(request); +} + +protected void serializePartitionKeyToStream(String partitionKey, DataOutputStream out) +throws IOException { +out.writeInt(partitionKey.length()); +out.write(partitionKey.getBytes()); +} + +protected void validateExplicitHashKey(PutRecordsRequestEntry request) { +if (request.explicitHashKey() != null) { +throw new IllegalStateException( +"Request contains field not included in serialization."); +} +} + +@Override +protected PutRecordsRequestEntry deserializeRequestFromStream( +long requestSize, DataInputStream in) throws IOException { +byte[] requestData = readBytes(in, (int) requestSize); + +return PutRecordsRequestEntry.builder() +.data(SdkBytes.fromByteArray(requestData)) +.partitionKey(deserializePartitionKeyToStream(in)) +.build(); +} + +protected String deserializePartitionKeyToStream(DataInputStream in) throws IOException { +int partitionKeyLength = readInt(in); +byte[] requestPartitionKeyData = readBytes(in, partitionKeyLength); +return new String(requestPartitionKeyData); +} Review comment: It is agreed to remove byte stream validation and bubble up the original error.
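The partition-key round trip in the quoted diff is a length-prefixed string encoding. The standalone sketch below shows that pattern under two assumptions of mine: UTF-8 is used explicitly, and the prefix counts encoded bytes rather than characters. The quoted diff writes `partitionKey.length()` and uses the platform default charset, and those two values can disagree for non-ASCII keys. `LengthPrefixedStringSketch` and its method names are illustrative, not Flink API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative helper; the names here are assumptions, not part of the PR. */
final class LengthPrefixedStringSketch {

    /** Writes the UTF-8 byte count, then the bytes themselves. */
    static void writeString(String value, DataOutputStream out) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Reads the byte count, then exactly that many bytes, and decodes them as UTF-8. */
    static String readString(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Serializes and deserializes a value in memory; I/O failures surface unchecked. */
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                writeString(value, out);
            }
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return readString(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("partition-1"));
    }
}
```

Letting `readFully` throw on a truncated stream is in line with the comment above about bubbling up the original error instead of adding separate byte-stream validation.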
[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r803465710 ## File path: flink-connectors/flink-connector-aws-kinesis-data-streams/src/main/java/org/apache/flink/connector/kinesis/sink/KinesisDataStreamsStateSerializer.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.kinesis.sink; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.connector.base.sink.writer.AsyncSinkWriterStateSerializer; + +import software.amazon.awssdk.core.SdkBytes; +import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry; + +import javax.annotation.Nullable; + +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; + +/** Kinesis Streams implementation {@link AsyncSinkWriterStateSerializer}. 
*/ +@Internal +public class KinesisDataStreamsStateSerializer +extends AsyncSinkWriterStateSerializer { +@Override +protected void serializeRequestToStream(PutRecordsRequestEntry request, DataOutputStream out) +throws IOException { +out.write(request.data().asByteArrayUnsafe()); +serializePartitionKeyToStream(request.partitionKey(), out); +serializeExplicitHashKeyToStream(request.explicitHashKey(), out); Review comment: Agreed. It is specific to serializing `BufferedRequestState`, whose requests would never have an explicit hash key. Will remove and assert.
[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r802818889 ## File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.base.sink.writer; + +import org.apache.flink.annotation.PublicEvolving; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.Objects; + +/** + * Class holding internal state of buffered requests. + * + * @param request type. + */ +@PublicEvolving +public class BufferedRequestState implements Serializable { Review comment: It was intended for readability; we would otherwise be transferring state as `Collection>>`. I am fine with removing it.
[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r802815599 ## File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java ## @@ -406,11 +403,30 @@ private void addEntryToBuffer(RequestEntryT entry, boolean insertAtHead) { * a failure/restart of the application. */ @Override -public List> snapshotState() { -return Arrays.asList( -bufferedRequestEntries.stream() -.map(RequestEntryWrapper::getRequestEntry) -.collect(Collectors.toList())); +public List> snapshotState() { +return Collections.singletonList( +new BufferedRequestState<>( + Collections.unmodifiableCollection(bufferedRequestEntries))); +} + +protected void initialize(BufferedRequestState state) { +this.bufferedRequestEntries.clear(); +this.bufferedRequestEntries.addAll(state.getBufferedRequestEntries()); + +long sum = 0L; +for (RequestEntryWrapper wrapper : bufferedRequestEntries) { +if (wrapper.getSize() > maxRecordSizeInBytes) { +throw new IllegalStateException( Review comment: We allowed `maxBufferedRequests` to flush at the end of initialisation. It is agreed not to validate and to drop the flush logic.
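The restore path agreed on above (clear the buffer, re-add the snapshotted entries in order, total their sizes, and neither validate nor flush) might look roughly like the sketch below. `RestoringBufferSketch`, `SizedEntry`, and the accessor names are assumptions made for illustration; the actual `AsyncSinkWriter` fields and types may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Illustrative writer-side buffer; not the PR's AsyncSinkWriter. */
final class RestoringBufferSketch {

    /** Hypothetical wrapper pairing a payload with its size in bytes. */
    static final class SizedEntry {
        final String payload;
        final long size;

        SizedEntry(String payload, long size) {
            this.payload = payload;
            this.size = size;
        }
    }

    private final Deque<SizedEntry> bufferedEntries = new ArrayDeque<>();
    private long bufferedBytes;

    /** Replaces the buffer with the restored entries in order; no size validation, no flush. */
    void initialize(List<SizedEntry> restored) {
        bufferedEntries.clear();
        bufferedEntries.addAll(restored);
        bufferedBytes = 0L;
        for (SizedEntry entry : bufferedEntries) {
            bufferedBytes += entry.size;
        }
    }

    int getBufferedCount() {
        return bufferedEntries.size();
    }

    long getBufferedBytes() {
        return bufferedBytes;
    }

    /** Payload at the head of the buffer, or null when the buffer is empty. */
    String peekPayload() {
        SizedEntry head = bufferedEntries.peekFirst();
        return head == null ? null : head.payload;
    }

    public static void main(String[] args) {
        RestoringBufferSketch writer = new RestoringBufferSketch();
        writer.initialize(List.of(new SizedEntry("a", 3L), new SizedEntry("b", 5L)));
        System.out.println(writer.getBufferedCount() + " entries, " + writer.getBufferedBytes() + " bytes");
    }
}
```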
[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r802808380 ## File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.base.sink.writer; + +import org.apache.flink.annotation.PublicEvolving; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.Objects; + +/** + * Class holding internal state of buffered requests. + * + * @param request type. + */ +@PublicEvolving Review comment: I wanted to use `@Internal` but this fails architecture tests.
[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.
vahmed-hamdy commented on a change in pull request #18669: URL: https://github.com/apache/flink/pull/18669#discussion_r802792404 ## File path: flink-connectors/flink-connector-aws-kinesis-data-streams/src/main/java/org/apache/flink/connector/kinesis/sink/KinesisDataStreamsSink.java ## @@ -126,13 +126,19 @@ getMaxRecordSizeInBytes(), failOnError, streamName, -kinesisClientProperties); +kinesisClientProperties, +getInitialState(states)); +} + +private BufferedRequestState getInitialState( +List> states) { +return states.isEmpty() ? BufferedRequestState.emptyState() : states.get(0); Review comment: Yes, we wrap the entire state in a single item. I agree; will validate and fail.
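Because the whole writer state is wrapped in a single list item, the restore side can reject any list with more than one element instead of silently taking `states.get(0)`. A hedged sketch of such a check follows; `SingleStateSketch.getInitialState`, its `emptyState` parameter, and the choice of `IllegalStateException` are assumptions for illustration, not the code that was eventually merged.

```java
import java.util.List;

/** Illustrative validation helper; names and exception type are assumptions. */
final class SingleStateSketch {

    /**
     * Returns the single restored state, or the supplied empty state when nothing was restored.
     * Fails fast when more than one state item is present.
     */
    static <S> S getInitialState(List<S> states, S emptyState) {
        if (states.isEmpty()) {
            return emptyState;
        }
        if (states.size() > 1) {
            throw new IllegalStateException(
                    "Expected at most one state item but found " + states.size());
        }
        return states.get(0);
    }

    public static void main(String[] args) {
        System.out.println(getInitialState(List.of("restored"), "empty"));
    }
}
```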