[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-11 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r804660040



##
File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.base.sink.writer;
+
+import org.apache.flink.annotation.PublicEvolving;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Deque;
+import java.util.List;
+
+/**
+ * Class holding state of {@link AsyncSinkWriter} needed at taking a snapshot. The state captures
+ * the {@code bufferedRequestEntries} buffer for the writer at snapshot to resume the requests. This
+ * guarantees at least once semantic in sending requests where restoring from a snapshot where
+ * buffered requests were flushed to the sink will cause duplicate requests.
+ *
+ * @param  request type.
+ */
+@PublicEvolving
+public class BufferedRequestState implements Serializable {
+private final List> bufferedRequestEntries;
+private final long stateSize;
+
+public BufferedRequestState(Deque> bufferedRequestEntries) {
+this.bufferedRequestEntries = new ArrayList<>(bufferedRequestEntries);
+this.stateSize = calculateStateSize();
+}
+
+public BufferedRequestState(List> bufferedRequestEntries) {
+this.bufferedRequestEntries = new ArrayList<>(bufferedRequestEntries);
+this.stateSize = calculateStateSize();
+}

Review comment:
   They already were; they were split out to ensure an ordered data structure.
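
A minimal sketch of the ordering point, under assumptions: both `Deque` and `List` iterate in a defined order, so copying either into an `ArrayList` keeps the buffered entries in send order, whereas a plain `Collection` parameter would also accept unordered types such as `HashSet`. `OrderedSnapshot` and `demo()` are hypothetical names for illustration, not the classes from this PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a snapshot state class; not Flink's BufferedRequestState. */
final class OrderedSnapshot<T> {
    private final List<T> entries;

    // Accepting Deque (rather than Collection) rules out unordered inputs at compile time.
    OrderedSnapshot(Deque<T> buffered) {
        this.entries = new ArrayList<>(buffered); // copies in head-to-tail order
    }

    OrderedSnapshot(List<T> buffered) {
        this.entries = new ArrayList<>(buffered); // copies in index order
    }

    List<T> getEntries() {
        return Collections.unmodifiableList(entries);
    }

    /** Small demo: the snapshot keeps its order and is detached from the live buffer. */
    static List<String> demo() {
        Deque<String> buffer = new ArrayDeque<>();
        buffer.addLast("b");
        buffer.addLast("c");
        buffer.addFirst("a"); // e.g. an entry re-queued at the head
        OrderedSnapshot<String> snapshot = new OrderedSnapshot<>(buffer);
        buffer.clear(); // later changes to the buffer do not affect the copy
        return snapshot.getEntries();
    }
}
```

One trade-off of the two-constructor shape: a type that implements both interfaces (for example `LinkedList`) makes the call ambiguous and needs a cast.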




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-11 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r804518817



##
File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.base.sink.writer;
+
+import org.apache.flink.annotation.PublicEvolving;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Objects;
+
+/**
+ * Class holding internal state of buffered requests.
+ *
+ * @param  request type.
+ */
+@PublicEvolving

Review comment:
   It violated the `PUBLIC_EVOLVING_API_METHODS_USE_ONLY_PUBLIC_EVOLVING_API_TYPES` arch rule:
   the `snapshot()` function in the writer is public and can't return an `@Internal` class.
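
To illustrate the kind of constraint described here, the following is a rough reflection-based sketch, not the actual architecture test. The annotations are local, hypothetical stand-ins for Flink's `@PublicEvolving` and `@Internal`, and `ApiRuleSketch`, `BadWriter`, `GoodWriter`, and `violations` are made-up names. The real rule likely checks more than return types.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class ApiRuleSketch {
    // Hypothetical stand-ins for the stability annotations; runtime retention so reflection sees them.
    @Retention(RetentionPolicy.RUNTIME) @interface PublicEvolving {}
    @Retention(RetentionPolicy.RUNTIME) @interface Internal {}

    @Internal static class InternalState {}
    @PublicEvolving static class EvolvingState {}

    @PublicEvolving
    static class BadWriter {
        public InternalState snapshot() { return new InternalState(); } // leaks an internal type
    }

    @PublicEvolving
    static class GoodWriter {
        public EvolvingState snapshot() { return new EvolvingState(); }
    }

    /** Names of public methods declared on a @PublicEvolving class that return an @Internal type. */
    static List<String> violations(Class<?> apiClass) {
        List<String> result = new ArrayList<>();
        if (!apiClass.isAnnotationPresent(PublicEvolving.class)) {
            return result; // the sketch only checks classes that are part of the evolving API
        }
        for (Method method : apiClass.getDeclaredMethods()) {
            if (Modifier.isPublic(method.getModifiers())
                    && method.getReturnType().isAnnotationPresent(Internal.class)) {
                result.add(method.getName());
            }
        }
        return result;
    }
}
```

In this sketch, `violations(BadWriter.class)` reports `snapshot`, which mirrors why the state class could not be marked `@Internal` while the writer's public method returns it.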








[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-10 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r803575849



##
File path: flink-connectors/flink-connector-aws-kinesis-data-streams/src/main/java/org/apache/flink/connector/kinesis/sink/KinesisDataStreamsStateSerializer.java
##
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.kinesis.sink;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.connector.base.sink.writer.AsyncSinkWriterStateSerializer;
+
+import software.amazon.awssdk.core.SdkBytes;
+import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+
+/** Kinesis Streams implementation {@link AsyncSinkWriterStateSerializer}. */
+@Internal
+public class KinesisDataStreamsStateSerializer
+extends AsyncSinkWriterStateSerializer {
+@Override
+protected void serializeRequestToStream(PutRecordsRequestEntry request, DataOutputStream out)
+throws IOException {
+out.write(request.data().asByteArrayUnsafe());
+serializePartitionKeyToStream(request.partitionKey(), out);
+validateExplicitHashKey(request);
+}
+
+protected void serializePartitionKeyToStream(String partitionKey, DataOutputStream out)
+throws IOException {
+out.writeInt(partitionKey.length());
+out.write(partitionKey.getBytes());
+}
+
+protected void validateExplicitHashKey(PutRecordsRequestEntry request) {
+if (request.explicitHashKey() != null) {
+throw new IllegalStateException(
+"Request contains field not included in serialization.");
+}
+}
+
+@Override
+protected PutRecordsRequestEntry deserializeRequestFromStream(
+long requestSize, DataInputStream in) throws IOException {
+byte[] requestData = readBytes(in, (int) requestSize);
+
+return PutRecordsRequestEntry.builder()
+.data(SdkBytes.fromByteArray(requestData))
+.partitionKey(deserializePartitionKeyToStream(in))
+.build();
+}
+
+protected String deserializePartitionKeyToStream(DataInputStream in) throws IOException {
+int partitionKeyLength = readInt(in);
+byte[] requestPartitionKeyData = readBytes(in, partitionKeyLength);
+return new String(requestPartitionKeyData);
+}

Review comment:
   We agreed to remove the byte stream validation and bubble up the original error.
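
A minimal sketch of a length-prefixed partition key round trip without extra stream validation, assuming UTF-8 encoding: `DataInputStream.readFully` already throws `EOFException` on a truncated stream, so the original error surfaces on its own. `PartitionKeyCodec` and its methods are hypothetical names, not part of the PR. The sketch prefixes the byte length rather than `String.length()`, since the two differ for non-ASCII keys.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating a length-prefixed string encoding. */
final class PartitionKeyCodec {
    static byte[] encode(String partitionKey) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] bytes = partitionKey.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length); // byte count, not character count
            out.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static String decode(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            int length = in.readInt();
            byte[] bytes = new byte[length];
            in.readFully(bytes); // throws EOFException if fewer than 'length' bytes remain
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // No custom validation: the original exception is kept as the cause.
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping in `UncheckedIOException` is only a convenience for the sketch; a serializer method that declares `throws IOException` could let the exception propagate unchanged.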








[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-10 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r803465710



##
File path: flink-connectors/flink-connector-aws-kinesis-data-streams/src/main/java/org/apache/flink/connector/kinesis/sink/KinesisDataStreamsStateSerializer.java
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.kinesis.sink;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.connector.base.sink.writer.AsyncSinkWriterStateSerializer;
+
+import software.amazon.awssdk.core.SdkBytes;
+import software.amazon.awssdk.services.kinesis.model.PutRecordsRequestEntry;
+
+import javax.annotation.Nullable;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+
+/** Kinesis Streams implementation {@link AsyncSinkWriterStateSerializer}. */
+@Internal
+public class KinesisDataStreamsStateSerializer
+extends AsyncSinkWriterStateSerializer {
+@Override
+protected void serializeRequestToStream(PutRecordsRequestEntry request, DataOutputStream out)
+throws IOException {
+out.write(request.data().asByteArrayUnsafe());
+serializePartitionKeyToStream(request.partitionKey(), out);
+serializeExplicitHashKeyToStream(request.explicitHashKey(), out);

Review comment:
   Agreed. It is specific to serializing `BufferedRequestState`, whose requests would never have an explicit hash key.
   Will remove and assert.
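
A rough sketch of the "remove and assert" idea, under assumptions: the serializer writes only the fields it supports and fails fast when an unsupported field is set. `Entry` is a hypothetical stand-in for the SDK's `PutRecordsRequestEntry`, and the byte layout here is invented for the example. The check runs before any bytes are written, so a rejected entry leaves no partial output.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class FailFastSerializerSketch {
    /** Hypothetical request entry; stands in for the SDK request type. */
    static final class Entry {
        final byte[] data;
        final String partitionKey;
        final String explicitHashKey; // null when not set

        Entry(byte[] data, String partitionKey, String explicitHashKey) {
            this.data = data;
            this.partitionKey = partitionKey;
            this.explicitHashKey = explicitHashKey;
        }
    }

    static byte[] serialize(Entry entry) {
        // Assert up front that the field we do not serialize is absent.
        if (entry.explicitHashKey != null) {
            throw new IllegalStateException(
                    "Request contains field not included in serialization.");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] key = entry.partitionKey.getBytes(StandardCharsets.UTF_8);
            out.writeInt(entry.data.length); // length-prefixed payload
            out.write(entry.data);
            out.writeInt(key.length); // length-prefixed partition key
            out.write(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```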








[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-09 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r802818889



##
File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.base.sink.writer;
+
+import org.apache.flink.annotation.PublicEvolving;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Objects;
+
+/**
+ * Class holding internal state of buffered requests.
+ *
+ * @param  request type.
+ */
+@PublicEvolving
+public class BufferedRequestState implements Serializable {

Review comment:
   It was intended for readability; we would otherwise be transferring state as `Collection>>`.
   I am fine with removing it.








[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-09 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r802815599



##
File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java
##
@@ -406,11 +403,30 @@ private void addEntryToBuffer(RequestEntryT entry, boolean insertAtHead) {
  * a failure/restart of the application.
  */
 @Override
-public List> snapshotState() {
-return Arrays.asList(
-bufferedRequestEntries.stream()
-.map(RequestEntryWrapper::getRequestEntry)
-.collect(Collectors.toList()));
+public List> snapshotState() {
+return Collections.singletonList(
+new BufferedRequestState<>(
+Collections.unmodifiableCollection(bufferedRequestEntries)));
+}
+
+protected void initialize(BufferedRequestState state) {
+this.bufferedRequestEntries.clear();
+this.bufferedRequestEntries.addAll(state.getBufferedRequestEntries());
+
+long sum = 0L;
+for (RequestEntryWrapper wrapper : bufferedRequestEntries) {
+if (wrapper.getSize() > maxRecordSizeInBytes) {
+throw new IllegalStateException(

Review comment:
   We allowed `maxBufferedRequests` to flush at the end of initialisation.
   It is agreed not to validate and to drop the flush logic.
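
A minimal sketch of what restoring without validation or flushing could look like, under assumptions: the restore step only refills the buffer in order and recomputes the buffered size. `BufferRestoreSketch` and `SizedEntry` are hypothetical names, not the `AsyncSinkWriter` code from this PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class BufferRestoreSketch {
    /** Hypothetical wrapper pairing an entry with its size in bytes. */
    static final class SizedEntry {
        final String entry;
        final long size;

        SizedEntry(String entry, long size) {
            this.entry = entry;
            this.size = size;
        }
    }

    private final Deque<SizedEntry> buffered = new ArrayDeque<>();
    private long bufferedBytes;

    /** Refills the buffer from restored state; no per-entry size check and no flush. */
    void initialize(List<SizedEntry> restored) {
        buffered.clear();
        buffered.addAll(restored); // keeps the restored order
        long sum = 0L;
        for (SizedEntry e : buffered) {
            sum += e.size;
        }
        bufferedBytes = sum;
    }

    long getBufferedBytes() {
        return bufferedBytes;
    }

    int getBufferedCount() {
        return buffered.size();
    }

    String peekFirstEntry() {
        SizedEntry head = buffered.peekFirst();
        return head == null ? null : head.entry;
    }
}
```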








[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-09 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r802808380



##
File path: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/BufferedRequestState.java
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.base.sink.writer;
+
+import org.apache.flink.annotation.PublicEvolving;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Objects;
+
+/**
+ * Class holding internal state of buffered requests.
+ *
+ * @param  request type.
+ */
+@PublicEvolving

Review comment:
   I wanted to use `@Internal` but this fails architecture tests.








[GitHub] [flink] vahmed-hamdy commented on a change in pull request #18669: [FLINK-25943][connector/common] Add buffered requests to snapshot state in AsyncSyncWriter.

2022-02-09 Thread GitBox


vahmed-hamdy commented on a change in pull request #18669:
URL: https://github.com/apache/flink/pull/18669#discussion_r802792404



##
File path: flink-connectors/flink-connector-aws-kinesis-data-streams/src/main/java/org/apache/flink/connector/kinesis/sink/KinesisDataStreamsSink.java
##
@@ -126,13 +126,19 @@
 getMaxRecordSizeInBytes(),
 failOnError,
 streamName,
-kinesisClientProperties);
+kinesisClientProperties,
+getInitialState(states));
+}
+
+private BufferedRequestState getInitialState(
+List> states) {
+return states.isEmpty() ? BufferedRequestState.emptyState() : states.get(0);

Review comment:
   Yes, we wrap the entire state in a single item.
   I agree; will validate and fail.
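
A minimal sketch of the "validate and fail" behaviour, assuming the restored list holds at most one wrapped state: an empty list falls back to an empty state, a single item is used as-is, and anything larger is rejected instead of silently dropping items. `InitialStateSketch` and `singleStateOrDefault` are hypothetical names, not the method added in the PR.

```java
import java.util.List;

final class InitialStateSketch {
    /**
     * Returns the only restored state, or the given empty state when nothing was restored.
     * Fails when more than one state is present, since the rest would otherwise be ignored.
     */
    static <T> T singleStateOrDefault(List<T> states, T emptyState) {
        if (states.isEmpty()) {
            return emptyState;
        }
        if (states.size() > 1) {
            throw new IllegalStateException(
                    "Expected at most one restored state but found " + states.size());
        }
        return states.get(0);
    }
}
```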



