[jira] [Created] (FLINK-17957) Forbidden syntax "CREATE SYSTEM FUNCTION" for sql parser
Danny Chen created FLINK-17957: -- Summary: Forbidden syntax "CREATE SYSTEM FUNCTION" for sql parser Key: FLINK-17957 URL: https://issues.apache.org/jira/browse/FLINK-17957 Project: Flink Issue Type: Bug Components: Table SQL / API Affects Versions: 1.11.0 Reporter: Danny Chen Fix For: 1.12.0 This syntax is invalid, but the parser still accepts it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17958) Kubernetes session constantly allocates taskmanagers after canceling a job
Yang Wang created FLINK-17958: - Summary: Kubernetes session constantly allocates taskmanagers after canceling a job Key: FLINK-17958 URL: https://issues.apache.org/jira/browse/FLINK-17958 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.11.0, 1.12.0 Reporter: Yang Wang Fix For: 1.11.0

When I am testing the {{kubernetes-session.sh}}, I find that the {{KubernetesResourceManager}} constantly allocates TaskManagers after a job is canceled. I think it may be caused by a bug in the following code. When the {{dividend}} is 0 and the {{divisor}} is bigger than 1, the return value will be 1. However, we expect it to be 0.

{code:java}
/**
 * Divide and rounding up to integer.
 * E.g., divideRoundUp(3, 2) returns 2.
 *
 * @param dividend value to be divided by the divisor
 * @param divisor value by which the dividend is to be divided
 * @return the quotient rounding up to integer
 */
public static int divideRoundUp(int dividend, int divisor) {
    return (dividend - 1) / divisor + 1;
}
{code}

How to reproduce this issue?
# Start a Kubernetes session
# Submit a Flink job to the existing session
# Cancel the job and wait for the TaskManager to be released via idle timeout
# More and more TaskManagers will be allocated

-- This message was sent by Atlassian Jira (v8.3.4#803005)
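A minimal sketch of a possible fix, assuming the intended semantics are plain ceiling division with divideRoundUp(0, n) == 0 (this is one reading of the report, not a committed patch):

{code:java}
/**
 * Divide and round up to integer, treating a zero dividend as zero.
 * E.g., divideRoundUp(3, 2) returns 2 and divideRoundUp(0, 3) returns 0.
 */
public static int divideRoundUp(int dividend, int divisor) {
    // Guard the zero case explicitly: (0 - 1) / divisor + 1 evaluates to 1
    // because Java integer division truncates -1 / divisor toward zero.
    return dividend == 0 ? 0 : (dividend - 1) / divisor + 1;
}
{code}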
[jira] [Created] (FLINK-17959) Exception: "CANCELLED: call already cancelled" is thrown when running Python UDF
Hequn Cheng created FLINK-17959: --- Summary: Exception: "CANCELLED: call already cancelled" is thrown when running Python UDF Key: FLINK-17959 URL: https://issues.apache.org/jira/browse/FLINK-17959 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.10.1, 1.11.0 Reporter: Hequn Cheng

The exception is thrown when running a Python UDF:

{code:java}
May 27, 2020 3:20:49 PM org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed@3960b30e
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:366)
    at org.apache.beam.runners.fnexecution.state.GrpcStateService$Inbound.onError(GrpcStateService.java:145)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:270)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:337)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:793)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}

The job outputs the correct results; however, something seems to go wrong during the shutdown procedure.
You can reproduce the exception with the following code (note: the exception happens occasionally):

{code}
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, DataTypes
from pyflink.table.descriptors import Schema, OldCsv, FileSystem
from pyflink.table.udf import udf

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)
t_env = StreamTableEnvironment.create(env)

add = udf(lambda i, j: i + j, [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT())

t_env.register_function("add", add)

t_env.connect(FileSystem().path('/tmp/input')) \
    .with_format(OldCsv()
                 .field('a', DataTypes.BIGINT())
                 .field('b', DataTypes.BIGINT())) \
    .with_schema(Schema()
                 .field('a', DataTypes.BIGINT())
                 .field('b', DataTypes.BIGINT())) \
    .create_temporary_table('mySource')

t_env.connect(FileSystem().path('/tmp/output')) \
    .with_format(OldCsv()
                 .field('sum', DataTypes.BIGINT())) \
    .with_schema(Schema()
                 .field('sum', DataTypes.BIGINT())) \
    .create_temporary_table('mySink')

t_env.from_path('mySource') \
    .select("add(a, b)") \
    .insert_into('mySink')

t_env.execute("tutorial_job")
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints
I agree with Konstantin and Steven that it makes sense to point this out explicitly. I think that the following would be helpful: 1/ Mention breaking compatibility in release notes 2/ Update the linked table to reflect compatibilities while pointing out what the community commits to maintain going forward (e.g. "happens to work" vs. "guaranteed to work") In general, the table is quite large. Would it make sense to order the releases in reverse order (assuming that the table is more relevant for recent releases)? – Ufuk On Tue, May 26, 2020 at 8:36 PM Steven Wu wrote: > > A use case for this might be when you want to rollback a framework > upgrade (after some time) due to e.g. a performance > or stability issue. > > Downgrade (that Konstantin called out) is an important and realistic > scenario. It will be great to support backward compatibility for savepoint > or at least document any breaking change. > > On Tue, May 26, 2020 at 4:39 AM Piotr Nowojski > wrote: > > > Hi, > > > > It might have been implicit choice, but so far we were not supporting the > > scenario that you are asking for. It has never been tested and we have > > lot’s of state migration code sprinkled among our code base (for example > > upgrading state fields of the operators like [1]), that only supports > > upgrades, not downgrades. > > > > Also we do not have testing infrastructure for checking the downgrades. > We > > would need to check if save points taken from master branch, are readable > > by previous releases (not release branch!). > > > > So all in all, I don’t think it can be easily done. It would require some > > effort to start maintaining backward compatibility. > > > > Piotrek > > > > [1] > > > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011#migrateNextTransactionalIdHindState > > > > > On 26 May 2020, at 13:18, Konstantin Knauf wrote: > > > > > > Hi everyone, > > > > > > I recently stumbled across the fact that Savepoints created with Flink > > 1.11 > > > can not be read by Flink 1.10. A use case for this might be when you > want > > > to rollback a framework upgrade (after some time) due to e.g. a > > performance > > > or stability issue. > > > > > > From the documentation [1] it seems as if the Savepoint format is > > generally > > > only forward-compatible although in many cases it is actually also > > > backwards compatible (e.g. Savepoint taken in Flink 1.10, restored with > > > Flink 1.9). > > > > > > Was it a deliberate choice not to document any backwards compatibility? > > If > > > not, should we add the missing entries in the compatibility table? > > > > > > Thanks, > > > > > > Konstantin > > > > > > [1] > > > > > > https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html#compatibility-table > > > > > > -- > > > > > > Konstantin Knauf > > > > > > https://twitter.com/snntrable > > > > > > https://github.com/knaufk > > > > >
[jira] [Created] (FLINK-17960) Improve commands in the "Common Questions" document for PyFlink
Hequn Cheng created FLINK-17960: --- Summary: Improve commands in the "Common Questions" document for PyFlink Key: FLINK-17960 URL: https://issues.apache.org/jira/browse/FLINK-17960 Project: Flink Issue Type: Improvement Components: API / Python Affects Versions: 1.11.0 Reporter: Hequn Cheng

Currently, in the ["Common Questions"|https://ci.apache.org/projects/flink/flink-docs-master/dev/table/python/common_questions.html#preparing-python-virtual-environment] document, we have the command `$ setup-pyflink-virtual-env.sh` to run the script. However, the script is not executable. It would be better to replace the command with `$ sh setup-pyflink-virtual-env.sh` and to add a download command:

{code}
$ curl -O https://ci.apache.org/projects/flink/flink-docs-master/downloads/setup-pyflink-virtual-env.sh
$ sh setup-pyflink-virtual-env.sh
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17961) Create an Elasticsearch source
Etienne Chauchot created FLINK-17961: Summary: Create an Elasticsearch source Key: FLINK-17961 URL: https://issues.apache.org/jira/browse/FLINK-17961 Project: Flink Issue Type: New Feature Reporter: Etienne Chauchot

There is only an Elasticsearch sink available. There are open-source GitHub repos such as [this one|https://github.com/mnubo/flink-elasticsearch-source-connector]. Also, the Apache Bahir project does not provide an Elasticsearch source connector for Flink either. IMHO the project would benefit from having a stock source connector for ES alongside the available sink connector.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Release flink-shaded 11.0, release candidate #1
+1 (binding) Checks: - diff to flink-shaded 10.0: https://github.com/apache/flink-shaded/compare/release-10.0...release-11.0-rc1 - mvn clean install passes on the source archive - sha of source archive is correct - source archive is signed by Chesnay - mvn staging repo looks reasonable - flink-shaded-zookeeper 3 jar license documentation seems correct On Mon, May 25, 2020 at 7:14 PM Chesnay Schepler wrote: > Hi everyone, > Please review and vote on the release candidate #1 for the version 11.0, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > > The complete staging area is available for your review, which includes: > * JIRA release notes [1], > * the official Apache source release to be deployed to dist.apache.org > [2], which are signed with the key with fingerprint 11D464BA [3], > * all artifacts to be deployed to the Maven Central Repository [4], > * source code tag "release-11.0-rc1" [5], > * website pull request listing the new release [6]. > > The vote will be open for at least 72 hours. It is adopted by majority > approval, with at least 3 PMC affirmative votes. > > Thanks, > Chesnay > > [1] > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12347784 > [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-11.0-rc1/ > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > [4] > https://repository.apache.org/content/repositories/orgapacheflink-1372/ > [5] https://github.com/apache/flink-shaded/tree/release-11.0-rc1 > [6] https://github.com/apache/flink-web/pull/340 > >
[jira] [Created] (FLINK-17962) Add document for how to define Python UDF with DDL
Hequn Cheng created FLINK-17962: --- Summary: Add document for how to define Python UDF with DDL Key: FLINK-17962 URL: https://issues.apache.org/jira/browse/FLINK-17962 Project: Flink Issue Type: Improvement Components: API / Python, Documentation Reporter: Hequn Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17963) Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)
Tzu-Li (Gordon) Tai created FLINK-17963: --- Summary: Revert execution environment patching in StatefulFunctionsJob (FLINK-16926) Key: FLINK-17963 URL: https://issues.apache.org/jira/browse/FLINK-17963 Project: Flink Issue Type: Task Components: Stateful Functions Affects Versions: statefun-2.0.1, statefun-2.1.0 Reporter: Tzu-Li (Gordon) Tai Assignee: Tzu-Li (Gordon) Tai In FLINK-16926, we explicitly "patched" the {{StreamExecutionEnvironment}} due to FLINK-16560. Now that we have upgraded the Flink version in StateFun to 1.10.1, which includes a fix for FLINK-16560, we can revert the patching of {{StreamExecutionEnvironment}} in the {{StatefulFunctionsJob}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17964) HiveModule does not support map type
夏帅 created FLINK-17964: -- Summary: HiveModule does not support map type Key: FLINK-17964 URL: https://issues.apache.org/jira/browse/FLINK-17964 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 1.10.1 Reporter: 夏帅 Fix For: 1.12.0, 1.11.1

{code:java}
Exception in thread "main" scala.MatchError: MAP (of class org.apache.flink.table.types.logical.LogicalTypeRoot)
    at org.apache.flink.table.planner.codegen.CodeGenUtils$.hashCodeForType(CodeGenUtils.scala:212)
    at org.apache.flink.table.planner.codegen.HashCodeGenerator$$anonfun$2.apply(HashCodeGenerator.scala:97)
    at org.apache.flink.table.planner.codegen.HashCodeGenerator$$anonfun$2.apply(HashCodeGenerator.scala:91)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.flink.table.planner.codegen.HashCodeGenerator$.generateCodeBody(HashCodeGenerator.scala:91)
    at org.apache.flink.table.planner.codegen.HashCodeGenerator$.generateRowHash(HashCodeGenerator.scala:61)
    at org.apache.flink.table.planner.plan.nodes.physical.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.scala:182)
    at org.apache.flink.table.planner.plan.nodes.physical.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.scala:52)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17965) Hive dialect doesn't properly handle special character escaping with SQL CLI
Rui Li created FLINK-17965: -- Summary: Hive dialect doesn't properly handle special character escaping with SQL CLI Key: FLINK-17965 URL: https://issues.apache.org/jira/browse/FLINK-17965 Project: Flink Issue Type: Bug Components: Table SQL / Client Reporter: Rui Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17966) Documentation improvement of checkpointing API
Vijaya Bhaskar V created FLINK-17966: Summary: Documentation improvement of checkpointing API Key: FLINK-17966 URL: https://issues.apache.org/jira/browse/FLINK-17966 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 1.10.1 Reporter: Vijaya Bhaskar V

As per the mail thread discussion: [https://lists.apache.org/thread.html/rfe38f6f15f35c28b3ea7865cee2a4dd1d31d93c8d212ad0723b9fd7b%40%3Cuser.flink.apache.org%3E]

While using the Flink REST monitoring APIs related to checkpoints: [https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html], it would be better to reference [https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/checkpoint_monitoring.html#overview-tab] and retained checkpoints: [https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html], and to give examples of its various fields in the following scenarios:
a) When the job has enough attempts remaining after failure and recovery
b) When the job has retained checkpoints, and how the response looks in that case
c) When the job has only its last-but-one attempt left and already has a few checkpoints

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17967) Could not find document *** in docs
Aven Wu created FLINK-17967: --- Summary: Could not find document *** in docs Key: FLINK-17967 URL: https://issues.apache.org/jira/browse/FLINK-17967 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.11.1 Reporter: Aven Wu Fix For: 1.12.0, 1.11.1

When I run build_docs.sh -p in the docs dir on *master*, I get errors like the one listed below, and the web server cannot start.

Liquid Exception: Could not find document 'concepts/stateful-stream-processing.md' in tag 'link'. Make sure the document exists and the path is correct. in dev/stream/state/state.zh.md

The following files also have the same problem:
_concepts/flink-architecture.zh.md_
_dev/batch/hadoop_compatibility.zh.md_
_dev/batch/index.zh.md_
_dev/connectors/cassandra.zh.md_
_dev/stream/operators/index.zh.md_
_dev/stream/operators/state.zh.md_
_dev/table/common.zh.md_

1.11 seems to have the same problem.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17968) Hadoop Configuration is not properly serialized in HBaseRowInputFormat
Robert Metzger created FLINK-17968: -- Summary: Hadoop Configuration is not properly serialized in HBaseRowInputFormat Key: FLINK-17968 URL: https://issues.apache.org/jira/browse/FLINK-17968 Project: Flink Issue Type: Bug Components: Connectors / HBase, Table SQL / Ecosystem Affects Versions: 1.12.0 Reporter: Robert Metzger

This pull request mentions an issue with the serialization of the {{Configuration}} field in {{HBaseRowInputFormat}}: https://github.com/apache/flink/pull/12146. After reviewing the code, it seems that this field is {{transient}}, thus it is always {{null}} at runtime.

Note: {{HBaseRowDataInputFormat}} is likely suffering from the same issue (because the code has been copied).

-- This message was sent by Atlassian Jira (v8.3.4#803005)
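For context: the usual way to carry a non-Serializable Hadoop {{Configuration}} across Java serialization is a wrapper with custom {{writeObject}}/{{readObject}} hooks, rather than a plain transient field. A minimal sketch, mirroring the {{SerializableHadoopConfiguration}} utility idea that also comes up later in this digest (this is not the actual connector code):

{code:java}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of a wrapper that lets a Hadoop Configuration survive Java
 * serialization. A plain transient field is simply null after
 * deserialization, which is the bug described above.
 */
public class SerializableHadoopConfiguration implements Serializable {

    private transient Configuration configuration;

    public SerializableHadoopConfiguration(Configuration configuration) {
        this.configuration = configuration;
    }

    public Configuration get() {
        return configuration;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Explicitly write the Hadoop configuration key/value pairs.
        configuration.write(out);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Rebuild the configuration from the serialized key/value pairs.
        configuration = new Configuration(false);
        configuration.readFields(in);
    }
}
{code}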
[jira] [Created] (FLINK-17969) Enhance Flink (Task) logging to include job name as context diagnostic information
Bhagavan created FLINK-17969: Summary: Enhance Flink (Task) logging to include job name as context diagnostic information Key: FLINK-17969 URL: https://issues.apache.org/jira/browse/FLINK-17969 Project: Flink Issue Type: Improvement Components: Runtime / Task Affects Versions: 1.10.0 Reporter: Bhagavan

Problem statement: We use a shared session cluster (Standalone/Yarn) to execute jobs. All logs from the cluster are shipped using a log aggregation framework (Logstash/Splunk) so that application diagnostics are easier. However, we are missing one vital piece of information in the logline, i.e. the job name, so that we can filter the logs for a single job.

Background: Currently, Flink logging uses SLF4J as an API to abstract away from the concrete logging implementation (log4j 1.x, Logback or log4j2), and the logging pattern and implementation can be configured at deployment. However, there is no MDC info from the framework indicating the job context.

Proposed improvement: Add a jobName field to the Task class so that we can add it as MDC when the task thread starts executing. The change is trivial and uses the SLF4J MDC API. With this change, users can customise the logging pattern to include the MDC (e.g. in Logback, [%X{jobName}]).

Change required:

{code:java}
@@ -319,6 +323,7 @@ public class Task implements Runnable, TaskSlotPayload, TaskActions, PartitionPr
 		this.jobId = jobInformation.getJobId();
+		this.jobName = jobInformation.getJobName();
 		this.vertexId = taskInformation.getJobVertexId();

@@ -530,8 +535,10 @@ public class Task implements Runnable, TaskSlotPayload, TaskActions, PartitionPr
 	@Override
 	public void run() {
 		try {
+			MDC.put("jobName", this.jobName);
 			doRun();
 		} finally {
+			MDC.remove("jobName");
 			terminationFuture.complete(executionState);
 		}
 	}
{code}

If we are in agreement on this small change, I will raise a PR.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17970) Increase default value of IO pool executor to 4 * #cores
Till Rohrmann created FLINK-17970: - Summary: Increase default value of IO pool executor to 4 * #cores Key: FLINK-17970 URL: https://issues.apache.org/jira/browse/FLINK-17970 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Affects Versions: 1.11.0 Reporter: Till Rohrmann Assignee: Till Rohrmann Fix For: 1.11.0 Currently, the default value of {{ClusterOptions.CLUSTER_IO_EXECUTOR_POOL_SIZE}} is #cores. I propose to increase it to 4 * #cores to support a higher load of blocking IO operations. Moreover, I propose to use a cached thread pool instead of a fixed thread pool. That way, only those use cases which have high IO load will actually occupy the required resources to start more threads. Last but not least, I propose to change the config option name from {{cluster.io-executor.pool-size}} to {{cluster.io-pool.size}} which is a bit shorter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
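For illustration only, a sketch of the proposed sizing with a bounded pool whose idle threads time out, built from plain JDK executors (the shape is my assumption, not the actual Flink change):

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IoPoolSketch {

    public static ThreadPoolExecutor createIoExecutor() {
        int poolSize = 4 * Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor ioExecutor = new ThreadPoolExecutor(
                poolSize, poolSize,
                60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        // Let idle threads time out, so only workloads with high IO load
        // actually hold on to the full 4 * #cores threads.
        ioExecutor.allowCoreThreadTimeOut(true);
        return ioExecutor;
    }
}
{code}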
[jira] [Created] (FLINK-17971) Speed up RocksDB bulk loading with SST generation and ingestion
Joey Pereira created FLINK-17971: Summary: Speed up RocksDB bulk loading with SST generation and ingestion Key: FLINK-17971 URL: https://issues.apache.org/jira/browse/FLINK-17971 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Reporter: Joey Pereira RocksDB provides an API for creating SST files and ingesting them directly into RocksDB: [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files] Using this method for bulk loading data into RocksDB may provide a significant performance increase, specifically for paths doing inserts such as full savepoint recovery and state migrations. This is one method of optimizing bulk loads, as described in https://issues.apache.org/jira/browse/FLINK-17288 This was discussed on the user maillist: [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/RocksDB-savepoint-recovery-performance-improvements-td35238.html] A draft PR is here: [https://github.com/apache/flink/pull/12345/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
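For reference, the RocksJava surface of that API looks roughly like this (a sketch assuming pre-sorted input and a hypothetical scratch path, not the draft PR's code):

{code:java}
import java.util.Collections;

import org.rocksdb.EnvOptions;
import org.rocksdb.IngestExternalFileOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileWriter;

public class SstBulkLoadSketch {

    /** Writes pre-sorted key/value pairs into an SST file, then ingests it into the DB. */
    static void bulkLoad(RocksDB db, byte[][] sortedKeys, byte[][] values) throws RocksDBException {
        String sstPath = "/tmp/bulk-load.sst"; // hypothetical scratch path
        try (Options options = new Options();
                EnvOptions envOptions = new EnvOptions();
                SstFileWriter writer = new SstFileWriter(envOptions, options)) {
            writer.open(sstPath);
            for (int i = 0; i < sortedKeys.length; i++) {
                // SST generation requires keys in strictly increasing order.
                writer.put(sortedKeys[i], values[i]);
            }
            writer.finish();
        }
        try (IngestExternalFileOptions ingestOptions = new IngestExternalFileOptions()) {
            db.ingestExternalFile(Collections.singletonList(sstPath), ingestOptions);
        }
    }
}
{code}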
[jira] [Created] (FLINK-17972) Consider restructuring channel state
Roman Khachatryan created FLINK-17972: - Summary: Consider restructuring channel state Key: FLINK-17972 URL: https://issues.apache.org/jira/browse/FLINK-17972 Project: Flink Issue Type: Improvement Components: Runtime / Checkpointing Affects Versions: 1.11.0 Reporter: Roman Khachatryan Assignee: Roman Khachatryan Fix For: 1.12.0

The current structure is the following (this PR doesn't change it):

{code:java}
Each subtask reports to JM a TaskStateSnapshot,
  each with zero or more OperatorSubtaskState,
    each with zero or more InputChannelStateHandle and ResultSubpartitionStateHandle,
      each referencing an underlying StreamStateHandle
{code}

The underlying {{StreamStateHandle}} duplicates the filename ({{ByteStreamStateHandle}} has it too, at least because of {{equals/hashcode}}, I guess). An alternative would be something like:

{code:java}
Each subtask reports to JM a TaskStateSnapshot,
  each with zero or more OperatorSubtaskState,
    each with zero or one StreamStateHandle (for channel state),
      each with zero or more InputChannelStateHandle and ResultSubpartitionStateHandle
{code}

(probably with {{StreamStateHandle}}, {{InputChannelStateHandle}} and {{ResultSubpartitionStateHandle}} encapsulated)

It would be more efficient (less data duplication) but probably also more error-prone (implicit structure) and less flexible (re-scaling).

-- This message was sent by Atlassian Jira (v8.3.4#803005)
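A purely illustrative sketch of the alternative layout (all type names here are hypothetical placeholders, not actual Flink classes):

{code:java}
import java.util.List;

// Illustration only: the filename lives once in the shared stream handle,
// and the per-channel handles become offsets into that single handle.
class OperatorSubtaskChannelStateSketch {

    interface StreamStateHandle {}

    static class ChannelStateHandle {
        long[] offsets; // offsets into the shared underlying stream
    }

    StreamStateHandle underlying;                 // zero or one shared handle
    List<ChannelStateHandle> inputChannels;       // InputChannelStateHandle equivalents
    List<ChannelStateHandle> resultSubpartitions; // ResultSubpartitionStateHandle equivalents
}
{code}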
[jira] [Created] (FLINK-17973) Test memory configuration of Flink cluster
Till Rohrmann created FLINK-17973: - Summary: Test memory configuration of Flink cluster Key: FLINK-17973 URL: https://issues.apache.org/jira/browse/FLINK-17973 Project: Flink Issue Type: Task Components: Runtime / Coordination, Tests Affects Versions: 1.11.0 Reporter: Till Rohrmann Fix For: 1.11.0 Make sure that Flink processes (in particular Master processes) fail with a meaningful exception message if they exceed the configured memory budgets. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17974) Test new Flink Docker image
Till Rohrmann created FLINK-17974: - Summary: Test new Flink Docker image Key: FLINK-17974 URL: https://issues.apache.org/jira/browse/FLINK-17974 Project: Flink Issue Type: Task Components: Deployment / Docker Affects Versions: 1.11.0 Reporter: Till Rohrmann Fix For: 1.11.0

Test Flink's new Docker image and the corresponding Dockerfile:
* Try to build custom image
* Try to run different Flink processes (Master (session, per-job), TaskManager)
* Try custom configuration and log properties

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17975) Test ZooKeeper 3.5 support
Till Rohrmann created FLINK-17975: - Summary: Test ZooKeeper 3.5 support Key: FLINK-17975 URL: https://issues.apache.org/jira/browse/FLINK-17975 Project: Flink Issue Type: Task Components: Runtime / Coordination Affects Versions: 1.11.0 Reporter: Till Rohrmann Fix For: 1.11.0 Setup a Flink cluster with ZooKeeper 3.5 and run some HA tests on it (killing processes, failing jobs). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17976) Test native K8s integration
Till Rohrmann created FLINK-17976: - Summary: Test native K8s integration Key: FLINK-17976 URL: https://issues.apache.org/jira/browse/FLINK-17976 Project: Flink Issue Type: Task Components: Runtime / Coordination Affects Versions: 1.11.0 Reporter: Till Rohrmann Fix For: 1.11.0

Test Flink's native K8s integration:
* session mode
* application mode
* custom Flink image
* custom configuration and log properties

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17977) Check log sanity
Till Rohrmann created FLINK-17977: - Summary: Check log sanity Key: FLINK-17977 URL: https://issues.apache.org/jira/browse/FLINK-17977 Project: Flink Issue Type: Task Components: Runtime / Coordination Affects Versions: 1.11.0 Reporter: Till Rohrmann Fix For: 1.11.0 Run a normal Flink workload (e.g. job with fixed number of failures on session cluster) and check that the produced Flink logs make sense and don't contain confusing statements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17978) Test Hadoop dependency change
Till Rohrmann created FLINK-17978: - Summary: Test Hadoop dependency change Key: FLINK-17978 URL: https://issues.apache.org/jira/browse/FLINK-17978 Project: Flink Issue Type: Task Components: Runtime / Coordination Affects Versions: 1.11.0 Reporter: Till Rohrmann Fix For: 1.11.0

Test the Hadoop dependency change:
* Run Flink with HBase/ORC (maybe add e2e test)
* Validate meaningful exception message if Hadoop dependency is missing

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17979) Support rescaling for Unaligned Checkpoints
Roman Khachatryan created FLINK-17979: - Summary: Support rescaling for Unaligned Checkpoints Key: FLINK-17979 URL: https://issues.apache.org/jira/browse/FLINK-17979 Project: Flink Issue Type: Improvement Components: Runtime / Checkpointing Reporter: Roman Khachatryan Fix For: 1.12.0 This is one of the limitations of Unaligned Checkpoints MVP. (see [Unaligned checkpoints: recovery & rescaling|https://docs.google.com/document/d/1T2WB163uf8xt6Eu2JS0Jyy2XZyF4YpnzGiHlo6twrks/edit?usp=sharing] for possible options) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17980) Simplify Flink Getting Started Material
Seth Wiesman created FLINK-17980: Summary: Simplify Flink Getting Started Material Key: FLINK-17980 URL: https://issues.apache.org/jira/browse/FLINK-17980 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Seth Wiesman

With the addition of hands-on training, Flink contains too much overlapping material and is confusing for new users. We propose adding a Try / Learn reorganization. The updated structure should:
* Remove the "Getting Started" section and replace it with "Try Flink" containing the walkthroughs and playground
* Move project setup content to DataStream
* Rename Hands-on Flink to Learn Flink

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17981) Add new Flink docs homepage content
Seth Wiesman created FLINK-17981: Summary: Add new Flink docs homepage content Key: FLINK-17981 URL: https://issues.apache.org/jira/browse/FLINK-17981 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Seth Wiesman The Flink docs homepage requires a serious redesign to better guide users through the different sections of the documentation. This ticket is focused solely on updating the text. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17982) Finish Flink concepts docs
Seth Wiesman created FLINK-17982: Summary: Finish Flink concepts docs Key: FLINK-17982 URL: https://issues.apache.org/jira/browse/FLINK-17982 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Seth Wiesman The concepts section in the documentation contains several TODOs. These should be replaced with content. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17983) Add new Flink docs homepage (style)
Seth Wiesman created FLINK-17983: Summary: Add new Flink docs homepage (style) Key: FLINK-17983 URL: https://issues.apache.org/jira/browse/FLINK-17983 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Seth Wiesman The Flink documentation homepage is one of the first things users see when working with Flink. We should invest in making it more visually appealing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17984) Update Flink sidebar nav
Seth Wiesman created FLINK-17984: Summary: Update Flink sidebar nav Key: FLINK-17984 URL: https://issues.apache.org/jira/browse/FLINK-17984 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Seth Wiesman Flink's docs contain two distinct sections: 1) getting-started material and 2) reference documentation. This should be better differentiated in the side nav. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17985) Consolidate Connector Documentation
Seth Wiesman created FLINK-17985: Summary: Consolidate Connector Documentation Key: FLINK-17985 URL: https://issues.apache.org/jira/browse/FLINK-17985 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Seth Wiesman The connector documentation is a mess and needs to be reorganized. We should focus on organizing based on external system vs. Flink API, and make formats a first-class section. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Releasing Stateful Functions 2.1.0 soon?
Hi all, Quick update on progress: There is still ongoing work on adding state TTL for remote functions (FLINK-17875), but the PR is expected to be mergeable over the next day or two. Therefore the feature branch cut will be slightly delayed. I'll announce the cut in a separate email once it happens. Gordon On Tue, May 26, 2020, 1:32 AM Stephan Ewen wrote: > Nice work, thanks for pushing this, Gordon! > > +1 also from my side for a quick release. > > I think it already warrants a release to have the 1.10.1 upgrade and the > fix to not fail on savepoints that are triggered concurrently to a > checkpoint. > Even nicer that there are two cool new features included. > > On Fri, May 22, 2020 at 7:13 AM Tzu-Li (Gordon) Tai > wrote: > > > Thanks for the positive feedback so far. > > > > Lets then set the feature freeze date for Stateful Functions 2.1.0 to be > > next Wednesday (May 27th). > > > > We've made good progress over the past days, all mentioned features > merged > > besides the following: > > - https://issues.apache.org/jira/browse/FLINK-17875 State TTL support > for > > remote functions > > > > Will keep track of that and hopefully cut the feature branch as > scheduled. > > > > Cheers, > > Gordon > > > > On Thu, May 21, 2020 at 7:22 PM Yuan Mei wrote: > > > > > faster iteration definitely helps early-stage projects. > > > > > > +1 > > > > > > Best, > > > Yuan > > > > > > > > > On Thu, May 21, 2020 at 4:14 PM Congxian Qiu > > > wrote: > > > > > > > +1 from my side to have smaller and more frequent feature releases > for > > > the > > > > project in its early phases. > > > > > > > > Best, > > > > Congxian > > > > > > > > > > > > Marta Paes Moreira wrote on Thu, May 21, 2020 at 12:49 PM: > > > > > > > > > +1 for more frequent releases with a shorter (but > feedback-informed) > > > > > feature set. > > > > > > > > > > Thanks, Gordon (and Igal)! > > > > > > > > > > Marta > > > > > > > > > > On Thu, 21 May 2020 at 03:44, Yu Li wrote: > > > > > > > > > > > +1, it makes a lot of sense for stateful functions to evolve > > faster. > > > > > > > > > > > > Best Regards, > > > > > > Yu > > > > > > > > > > > > > > > > > > On Wed, 20 May 2020 at 23:36, Zhijiang < > wangzhijiang...@aliyun.com > > > > > > .invalid> > > > > > > wrote: > > > > > > > > > > > > > I also like this idea, considering stateful functions flexible > > > enough > > > > > to > > > > > > > have a faster release cycle. +1 from my side. > > > > > > > > > > > > > > Best, > > > > > > > Zhijiang > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > From: Seth Wiesman > > > > > > > Send Time: Wed, May 20, 2020, 21:45 > > > > > > > To: dev > > > > > > > Subject: Re: [DISCUSS] Releasing Stateful Functions 2.1.0 soon? > > > > > > > > > > > > > > +1 for a fast release cycle > > > > > > > > > > > > > > Seth > > > > > > > > > > > > > > On Wed, May 20, 2020 at 8:43 AM Robert Metzger < > > > rmetz...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > I like the idea of releasing Statefun more frequently to have > > > > faster > > > > > > > > feedback cycles! > > > > > > > > > > > > > > > > No objections for releasing 2.1.0 from my side. 
> > > > > > > > > > > > > > > > On Wed, May 20, 2020 at 2:22 PM Tzu-Li (Gordon) Tai < > > > > > > tzuli...@apache.org > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi devs, > > > > > > > > > > > > > > > > > > Since Stateful Functions 2.0 was released early April, > > > > > > > > > we've been getting some good feedback from various > channels, > > > > > > > > > including the Flink mailing lists, JIRA issues, as well as > > > Stack > > > > > > > Overflow > > > > > > > > > questions. > > > > > > > > > > > > > > > > > > Some of the discussions have actually translated into new > > > > features > > > > > > > > > currently being implemented into the project, such as: > > > > > > > > > > > > > > > > > >- State TTL for the state primitives in Stateful > Functions > > > > (for > > > > > > both > > > > > > > > >embedded/remote functions) > > > > > > > > >- Transport for remote functions via UNIX domain > sockets, > > > > which > > > > > > > would > > > > > > > > be > > > > > > > > >useful when remote functions are co-located with Flink > > > > StateFun > > > > > > > > workers > > > > > > > > >(i.e. the "sidecar" deployment mode) > > > > > > > > > > > > > > > > > > > > > > > > > > > Besides that, some critical shortcomings have already been > > > > > addressed > > > > > > > > since > > > > > > > > > the last release: > > > > > > > > > > > > > > > > > >- After upgrading to Flink 1.10.1, failure recovery in > > > > Stateful > > > > > > > > >Functions now works properly with the new scheduler. > > > > > > > > >- Support for concurrent checkpoints > > > > > > > > > > > > > > > > > > > > > > > > > > > With these ongoing threads, while it's only been just short > > of > > > 2 > > > > > > months > > > > > > > > > s
[jira] [Created] (FLINK-17986) Erroneous check in FsCheckpointStateOutputStream#write(int)
Roman Khachatryan created FLINK-17986: - Summary: Erroneous check in FsCheckpointStateOutputStream#write(int) Key: FLINK-17986 URL: https://issues.apache.org/jira/browse/FLINK-17986 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing, Runtime / Task Affects Versions: 1.11.0, 1.12.0 Reporter: Roman Khachatryan Assignee: Roman Khachatryan Fix For: 1.11.0, 1.12.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
Fwd: Season of Docs - Ex-participant Intro
Hi, Kartik! Sorry that you didn't get our replies to your self-presentation. You can find them below! Let me know if you have any questions. Marta -- Forwarded message - From: Marta Paes Moreira Date: Thu, May 14, 2020 at 8:37 AM Subject: Re: Season of Docs - Ex-participant Intro To: dev Cc: Hey, Kartik! Thanks for reaching out and sharing your experience on GSoD last year. Let us know if you have any questions about the project that we can help with before the applications phase kicks off! Marta On Wed, May 13, 2020 at 11:30 PM Seth Wiesman wrote: > Hey Kartik, > > Nice to connect, I am one of the GSoD mentors, along with Aljoscha in cc. > Happy to see you're interested in working with the community. > > Seth > > On Wed, May 13, 2020 at 4:10 PM Kartik Khare > wrote: > > > Hi, > > This is Kartik. I an genuinely interested in working with your > organisation > > for Google season of Docs. I work as data engineer @ walmart labs but > also > > write a lot of blogs on distributed systems on > > https://medium.com/@kharekartik. > > > > I contributed one on flink's website as well: > > > > > https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html > > > > I participated as technical writer last year as well for Apache Airflow. > > You can check my project and journey here: > > > > > https://airflow.apache.org/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/ > > > > Let's keep in touch. > > > > > > Regards, > > Kartik Khare > > >
Re: Season of Docs - Ex-participant Intro
Hi, Accidentally hit on Single Reply button instead of Reply All. Are you able to see my previous mail now? On Wed, May 27, 2020 at 11:09 PM Marta Paes Moreira wrote: > Hi, Kartik! > > Sorry that you didn't get our replies to your self-presentation. You can > find them below! > > Let me know if you have any questions. > > Marta > > -- Forwarded message - > From: Marta Paes Moreira > Date: Thu, May 14, 2020 at 8:37 AM > Subject: Re: Season of Docs - Ex-participant Intro > To: dev > Cc: > > > Hey, Kartik! > > Thanks for reaching out and sharing your experience on GSoD last year. > > Let us know if you have any questions about the project that we can help > with before the applications phase kicks off! > > Marta > > On Wed, May 13, 2020 at 11:30 PM Seth Wiesman wrote: > >> Hey Kartik, >> >> Nice to connect, I am one of the GSoD mentors, along with Aljoscha in cc. >> Happy to see you're interested in working with the community. >> >> Seth >> >> On Wed, May 13, 2020 at 4:10 PM Kartik Khare >> wrote: >> >> > Hi, >> > This is Kartik. I an genuinely interested in working with your >> organisation >> > for Google season of Docs. I work as data engineer @ walmart labs but >> also >> > write a lot of blogs on distributed systems on >> > https://medium.com/@kharekartik. >> > >> > I contributed one on flink's website as well: >> > >> > >> https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html >> > >> > I participated as technical writer last year as well for Apache Airflow. >> > You can check my project and journey here: >> > >> > >> https://airflow.apache.org/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/ >> > >> > Let's keep in touch. >> > >> > >> > Regards, >> > Kartik Khare >> > >> >
[jira] [Created] (FLINK-17987) KafkaITCase.testStartFromGroupOffsets fails with SchemaException: Error reading field
Robert Metzger created FLINK-17987: -- Summary: KafkaITCase.testStartFromGroupOffsets fails with SchemaException: Error reading field Key: FLINK-17987 URL: https://issues.apache.org/jira/browse/FLINK-17987 Project: Flink Issue Type: Bug Components: Connectors / Kafka, Tests Affects Versions: 1.12.0 Reporter: Robert Metzger

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2276&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20

{code}
2020-05-27T13:05:24.5355101Z Test testStartFromGroupOffsets(org.apache.flink.streaming.connectors.kafka.KafkaITCase) failed with:
2020-05-27T13:05:24.5355935Z org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'api_versions': Error reading array of size 131084, only 12 bytes available
2020-05-27T13:05:24.5356501Z    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:77)
2020-05-27T13:05:24.5356911Z    at org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:308)
2020-05-27T13:05:24.5357350Z    at org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:152)
2020-05-27T13:05:24.5357838Z    at org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:687)
2020-05-27T13:05:24.5358333Z    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:811)
2020-05-27T13:05:24.5358840Z    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:544)
2020-05-27T13:05:24.5359297Z    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)
2020-05-27T13:05:24.5359832Z    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
2020-05-27T13:05:24.5360659Z    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:227)
2020-05-27T13:05:24.5361292Z    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
2020-05-27T13:05:24.5361885Z    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:245)
2020-05-27T13:05:24.5362454Z    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:657)
2020-05-27T13:05:24.5363089Z    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1425)
2020-05-27T13:05:24.5363558Z    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1384)
2020-05-27T13:05:24.5364130Z    at org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl$KafkaOffsetHandlerImpl.setCommittedOffset(KafkaTestEnvironmentImpl.java:444)
2020-05-27T13:05:24.5365027Z    at org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runStartFromGroupOffsets(KafkaConsumerTestBase.java:554)
2020-05-27T13:05:24.5365596Z    at org.apache.flink.streaming.connectors.kafka.KafkaITCase.testStartFromGroupOffsets(KafkaITCase.java:158)
2020-05-27T13:05:24.5366035Z    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-27T13:05:24.5366425Z    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-27T13:05:24.5366871Z    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-27T13:05:24.5367285Z    at java.lang.reflect.Method.invoke(Method.java:498)
2020-05-27T13:05:24.5367675Z    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-05-27T13:05:24.5368142Z    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-05-27T13:05:24.5368655Z    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-05-27T13:05:24.5369103Z    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-05-27T13:05:24.5369590Z    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
2020-05-27T13:05:24.5370094Z    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
2020-05-27T13:05:24.5370543Z    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2020-05-27T13:05:24.5370947Z    at java.lang.Thread.run(Thread.java:748)
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17988) Checkpointing slows down after reaching state.checkpoints.num-retained
Roman Khachatryan created FLINK-17988: - Summary: Checkpointing slows down after reaching state.checkpoints.num-retained Key: FLINK-17988 URL: https://issues.apache.org/jira/browse/FLINK-17988 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Affects Versions: 1.11.0 Reporter: Roman Khachatryan Assignee: Roman Khachatryan -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17989) java.lang.NoClassDefFoundError: org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter
Israel Ekpo created FLINK-17989: --- Summary: java.lang.NoClassDefFoundError: org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter Key: FLINK-17989 URL: https://issues.apache.org/jira/browse/FLINK-17989 Project: Flink Issue Type: Bug Components: Connectors / FileSystem, FileSystems Affects Versions: 1.10.1, 1.9.3 Environment: Ubuntu 18, Java 1.8, Flink 1.9.x and 1.10.x Reporter: Israel Ekpo Fix For: 1.10.1, 1.9.3

In the POM.xml, classes from certain packages are relocated and filtered out of the final jar. This is causing errors for customers and users that are using the StreamingFileSink with Azure Blob Storage in Flink versions 1.9.x, 1.10.x and possibly 1.11.x:

https://github.com/apache/flink/blob/release-1.9/flink-filesystems/flink-azure-fs-hadoop/pom.xml#L170
https://github.com/apache/flink/blob/release-1.9/flink-filesystems/flink-fs-hadoop-shaded/pom.xml#L233

I would like to know why the relocation is happening and the reasoning behind the exclusion and filtering of the classes. It seems to affect just the Azure file systems in my sample implementations.

{code:java}
String outputPath = "wasbs://contai...@account.blob.core.windows.net/streaming-output/";

final StreamingFileSink sink = StreamingFileSink
    .forRowFormat(new Path(outputPath), new SimpleStringEncoder("UTF-8"))
    .build();

stream.addSink(sink);

// execute program
env.execute(StreamingJob.class.getCanonicalName());
{code}

2020-05-27 17:23:16
java.lang.NoClassDefFoundError: org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter
    at org.apache.flink.fs.azure.common.hadoop.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
    at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.(Buckets.java:112)
    at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$RowFormatBuilder.createBuckets(StreamingFileSink.java:242)
    at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:327)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:60)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter
Some users are running into issues when using Azure Blob Storage for the StreamingFileSink:

https://issues.apache.org/jira/browse/FLINK-17989

The issue is that certain packages are relocated in the POM file and some classes are dropped from the final shaded jar.

I have attempted to comment out the relocations and recompile the source, but I keep hitting roadblocks with other relocations and filtering each time I update a specific pom file.

How can this be addressed so that these users can be unblocked? Why are the classes filtered out? What is the workaround? I can work on the patch if I have some guidance.

This is an issue in Flink 1.9 and 1.10, and I believe 1.11 has the same issue, but I am yet to confirm.

Thanks.
Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter
You can assign the task to me and I will like to collaborate with someone to fix it. On Wed, May 27, 2020 at 5:52 PM Israel Ekpo wrote: > Some users are running into issues when using Azure Blob Storage for the > StreamFileSink > > https://issues.apache.org/jira/browse/FLINK-17989 > > The issue is because certain packages are relocated in the POM file and > some classes are dropped in the final shaded jar > > I have attempted to comment out the relocated and recompile the source but > I keep hitting roadblocks of other relocation and filtration each time I > update a specific pom file > > How can this be addressed so that these users can be unblocked? Why are > the classes filtered out? What is the workaround? I can work on the patch > if I have some guidance. > > This is an issue in Flink 1.9 and 1.10 and I believe 1.11 has the same > issue but I am yet to confirm > > Thanks. > > >
Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter
Hi, I think the StreamingFileSink does not support Azure currently. You can find more detailed info here [1]. [1] https://issues.apache.org/jira/browse/FLINK-17444 Best, Guowei Israel Ekpo wrote on Thu, May 28, 2020 at 6:04 AM: > You can assign the task to me and I will like to collaborate with someone > to fix it. > > On Wed, May 27, 2020 at 5:52 PM Israel Ekpo wrote: > >> Some users are running into issues when using Azure Blob Storage for the >> StreamFileSink >> >> https://issues.apache.org/jira/browse/FLINK-17989 >> >> The issue is because certain packages are relocated in the POM file and >> some classes are dropped in the final shaded jar >> >> I have attempted to comment out the relocated and recompile the source >> but I keep hitting roadblocks of other relocation and filtration each time >> I update a specific pom file >> >> How can this be addressed so that these users can be unblocked? Why are >> the classes filtered out? What is the workaround? I can work on the patch >> if I have some guidance. >> >> This is an issue in Flink 1.9 and 1.10 and I believe 1.11 has the same >> issue but I am yet to confirm >> >> Thanks. >> >> >> >
[jira] [Created] (FLINK-17990) ArrowSourceFunctionTestBase.testParallelProcessing is unstable
Dian Fu created FLINK-17990: --- Summary: ArrowSourceFunctionTestBase.testParallelProcessing is unstable Key: FLINK-17990 URL: https://issues.apache.org/jira/browse/FLINK-17990 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.11.0 Reporter: Dian Fu Assignee: Dian Fu

It failed in the cron JDK 11 tests with the following error:

{code}
org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTest
2020-05-27T21:37:21.2840832Z [ERROR] testParallelProcessing(org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTest) Time elapsed: 0.433 s <<< FAILURE!
2020-05-27T21:37:21.2841551Z java.lang.AssertionError: expected:<4> but was:<5>
2020-05-27T21:37:21.2842025Z    at org.junit.Assert.fail(Assert.java:88)
2020-05-27T21:37:21.2842470Z    at org.junit.Assert.failNotEquals(Assert.java:834)
2020-05-27T21:37:21.2842994Z    at org.junit.Assert.assertEquals(Assert.java:645)
2020-05-27T21:37:21.2843484Z    at org.junit.Assert.assertEquals(Assert.java:631)
2020-05-27T21:37:21.2844779Z    at org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTestBase.checkElementsEquals(ArrowSourceFunctionTestBase.java:243)
2020-05-27T21:37:21.2845873Z    at org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTestBase.testParallelProcessing(ArrowSourceFunctionTestBase.java:233)
{code}

Instance: [https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/2304/logs/535]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17991) Add JdbcAsyncLookupFunction
底限 created FLINK-17991: -- Summary: Add JdbcAsyncLookupFunction Key: FLINK-17991 URL: https://issues.apache.org/jira/browse/FLINK-17991 Project: Flink Issue Type: New Feature Components: Connectors / JDBC Affects Versions: 1.12.0 Reporter: 底限 It would be very useful to support a JdbcAsyncLookupFunction; users could use it to asynchronously look up dimension fields in their streaming jobs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
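For context, a rough sketch of what such a function could look like on top of Flink's AsyncTableFunction (everything here is hypothetical, including the single-key signature and the JdbcClient helper; it is not the eventual implementation):

{code:java}
import java.util.Collection;
import java.util.concurrent.CompletableFuture;

import org.apache.flink.table.functions.AsyncTableFunction;
import org.apache.flink.types.Row;

/** Hypothetical async JDBC lookup: completes the future off the task thread. */
public class JdbcAsyncLookupFunction extends AsyncTableFunction<Row> {

    /** Placeholder for some asynchronous JDBC client (an assumption, not a real API). */
    public interface JdbcClient {
        CompletableFuture<Collection<Row>> queryAsync(Object key);
    }

    private final JdbcClient client;

    public JdbcAsyncLookupFunction(JdbcClient client) {
        this.client = client;
    }

    public void eval(CompletableFuture<Collection<Row>> resultFuture, Object key) {
        // Issue the query asynchronously and complete the future when done,
        // so the task thread is never blocked on the database round trip.
        client.queryAsync(key).whenComplete((rows, error) -> {
            if (error != null) {
                resultFuture.completeExceptionally(error);
            } else {
                resultFuture.complete(rows);
            }
        });
    }
}
{code}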
[jira] [Created] (FLINK-17992) Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler
Zhijiang created FLINK-17992: Summary: Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler Key: FLINK-17992 URL: https://issues.apache.org/jira/browse/FLINK-17992 Project: Flink Issue Type: Bug Components: Runtime / Network Affects Versions: 1.10.1, 1.10.0 Reporter: Zhijiang Assignee: Zhijiang Fix For: 1.11.0

RemoteInputChannel#onBuffer is invoked by CreditBasedPartitionRequestClientHandler while receiving and decoding the network data. #onBuffer can throw exceptions, which would tag the error in the client handler and fail all the input channels added to that handler. This can then cause a tricky potential issue, as follows.

If the RemoteInputChannel is being canceled by the canceler thread, the task thread might exit earlier than the canceler thread terminates. That means the PartitionRequestClient might not yet be closed (its closing is triggered by the canceler thread) while the new task attempt is already deployed onto this TaskManager. Therefore the new task might reuse the previous PartitionRequestClient when requesting partitions, but note that the respective client handler was already tagged with an error during the earlier RemoteInputChannel#onBuffer. This causes the next round of unnecessary failover.

It is hard to find this potential issue in production because the job is eventually restored to normal after one or more additional failovers. We found this potential problem via UnalignedCheckpointITCase because it defines a precise number of restarts within the configured failures.

The solution is to fail only the respective task when its internal RemoteInputChannel#onBuffer throws an exception, instead of failing all the channels inside the client handler. The client then stays healthy and can also be reused by other input channels as long as it is not released yet.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17993) read schema from orc file
dingli123 created FLINK-17993: - Summary: read schema from orc file Key: FLINK-17993 URL: https://issues.apache.org/jira/browse/FLINK-17993 Project: Flink Issue Type: Improvement Components: Connectors / ORC Reporter: dingli123

Currently, if we use OrcTableSource, we must provide the schema info. In fact, the ORC file already stores the schema in the file itself (hive --orcfiledump can show it), so OrcTableSource could read the schema info from the ORC file.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
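For reference, a minimal sketch of reading the schema with the ORC core API (assuming the orc-core library is on the classpath; this is not tied to OrcTableSource internals):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.TypeDescription;

public class OrcSchemaProbe {

    /** Reads the schema that every ORC file carries in its footer. */
    public static TypeDescription readSchema(String file) throws IOException {
        Configuration conf = new Configuration();
        Reader reader = OrcFile.createReader(new Path(file), OrcFile.readerOptions(conf));
        return reader.getSchema();
    }
}
{code}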
[jira] [Created] (FLINK-17994) Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived
Zhijiang created FLINK-17994: Summary: Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived Key: FLINK-17994 URL: https://issues.apache.org/jira/browse/FLINK-17994 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Reporter: Zhijiang Assignee: Zhijiang Fix For: 1.11.0

The race condition happens as follows:
* The barrier for checkpoint ch1 is received from the network by the netty thread, which schedules ch1 into the mailbox via #notifyBarrierReceived.
* The barrier for checkpoint ch2 is received from the network by the netty thread, but it is inserted into the channel's data queue before #notifyBarrierReceived is called. The task thread can therefore process ch2 before the netty thread calls #notifyBarrierReceived.
* The task thread executes the checkpoint for ch2 directly because ch2 > ch1.
* After that, the previously scheduled ch1 is taken from the mailbox by the task thread, which causes an IllegalArgumentException inside SubtaskCheckpointCoordinatorImpl#checkpointState because it breaks the assumption that checkpoints are executed with increasing checkpoint ids.

One possible solution is to insert the received barrier into the channel's data queue only after calling #notifyBarrierReceived. We can then rely on the checkpoint always being triggered by the netty thread, which simplifies the current situation where a checkpoint might be triggered either by the task thread or by the netty thread. It also avoids the task thread accessing #notifyBarrierReceived while processing the barrier, simplifying the logic inside CheckpointBarrierUnaligner.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
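For illustration, a self-contained sketch of the proposed ordering on the netty thread; the types below merely stand in for CheckpointBarrierUnaligner and the channel's buffer queue and are not the actual Flink classes:
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch only: demonstrates "notify first, enqueue second".
class BarrierOrderingSketch {

    interface BarrierHandler {
        void notifyBarrierReceived(long checkpointId);
    }

    private final Queue<Object> receivedBuffers = new ArrayDeque<>();
    private final BarrierHandler handler;

    BarrierOrderingSketch(BarrierHandler handler) {
        this.handler = handler;
    }

    // Runs on the netty thread.
    void onBarrier(long checkpointId, Object barrierBuffer) {
        // 1. Notify first, so the checkpoint is always triggered by the
        //    netty thread, never by the task thread racing ahead.
        handler.notifyBarrierReceived(checkpointId);
        // 2. Only then make the barrier visible to the task thread by
        //    enqueueing it into the channel's data queue.
        synchronized (receivedBuffers) {
            receivedBuffers.add(barrierBuffer);
        }
    }
}
{code}
Because the notification always happens before the barrier becomes visible to the task thread, the checkpoint can only ever be triggered from the netty thread.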
Re: [DISCUSS] Introduce a new module 'flink-hadoop-utils'
Hi Sivaprasanna,

thanks a lot for your proposal. Now that I ran into a HadoopUtils-related issue myself [1], I see the benefit in this proposal.

I'm happy to be the Flink committer that mentors this change. If we do this, I would like to keep the scope of the initial change small:
- create a "flink-hadoop-utils" module
- move generic, common utils into that module (for example SerializableHadoopConfiguration)

I agree with Till that we should initially leave out the Hadoop compatibility modules.

You can go ahead with filing a JIRA ticket! Let's discuss the exact scope there.

[1] https://github.com/apache/flink/pull/12146

On Thu, Apr 30, 2020 at 6:54 PM Sivaprasanna wrote:
> Bump.
>
> Please let me know if someone is interested in reviewing this one. I am
> willing to start working on this. BTW, a small and new addition to the
> list: with FLINK-10114 merged, OrcBulkWriterFactory can also reuse
> `SerializableHadoopConfiguration` along with SequenceFileWriterFactory and
> CompressWriterFactory.
>
> CC - Kostas Kloudas, since he has a better understanding of
> `SerializableHadoopConfiguration`.
>
> Cheers,
> Sivaprasanna
>
> On Mon, Mar 30, 2020 at 3:17 PM Chesnay Schepler wrote:
>
> > I would recommend waiting until a committer has signed up to review
> > your changes before preparing any PR.
> > Otherwise the chances are high that you invest a lot of time but the
> > changes never get in.
> >
> > On 30/03/2020 11:42, Sivaprasanna wrote:
> > > Hello Till,
> > >
> > > I agree with having the scope limited and more concentrated. I can
> > > file a Jira and get started with the code changes; as and when someone
> > > has some bandwidth, the review can also be done. What do you think?
> > >
> > > Cheers,
> > > Sivaprasanna
> > >
> > > On Mon, Mar 30, 2020 at 3:00 PM Till Rohrmann wrote:
> > >
> > >> Hi Sivaprasanna,
> > >>
> > >> thanks for starting this discussion. In general I like the idea of
> > >> removing duplication and moving common code to a shared module. As a
> > >> recommendation, I would exclude the whole part about Flink's Hadoop
> > >> compatibility modules because they are legacy code and hardly used
> > >> anymore. This would also have the benefit of making the scope of the
> > >> proposal a bit smaller.
> > >>
> > >> What we now need is a committer who wants to help with this effort.
> > >> It might be that this takes a bit of time, as many of the committers
> > >> are quite busy.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Thu, Mar 19, 2020 at 2:15 PM Sivaprasanna <sivaprasanna...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Continuing an earlier discussion [1] about having a separate module
> > >>> for Hadoop-related utility components, I have gone through our
> > >>> project briefly and found the following components which I feel
> > >>> could be moved to a separate module for reusability and better
> > >>> module structure.
> > >>>
> > >>> Module Name / Class Name / Used at & Remarks
> > >>>
> > >>> flink-hadoop-fs
> > >>> flink.runtime.util.HadoopUtils
> > >>> flink-runtime => HadoopModule & HadoopModuleFactory
> > >>> flink-swift-fs-hadoop => SwiftFileSystemFactory
> > >>> flink-yarn => Utils, YarnClusterDescriptor
> > >>>
> > >>> flink-hadoop-compatibility
> > >>> api.java.hadoop.mapred.utils.HadoopUtils
> > >>> api.java.hadoop.mapreduce.utils.HadoopUtils
> > >>> Both belong to the same module but with different packages
> > >>> (api.java.hadoop.mapred and api.java.hadoop.mapreduce).
> > >>>
> > >>> flink-sequence-file
> > >>> formats.sequencefile.SerializableHadoopConfiguration
> > >>> Currently it is used at formats.sequencefile.SequenceFileWriterFactory
> > >>> but can also be used at HadoopCompressionBulkWriter, a potential
> > >>> OrcBulkWriter, and pretty much everywhere else to avoid
> > >>> NotSerializableException.
> > >>>
> > >>> *Proposal*
> > >>> To summarise, I believe we can create a new module (flink-hadoop-utils?)
> > >>> and move these reusable components to it; the new module will have an
> > >>> optional/provided dependency on flink-shaded-hadoop-2.
> > >>>
> > >>> *Structure*
> > >>> In the present form, I think we will have two classes with the
> > >>> packaging structure being *org.apache.flink.hadoop.[utils/serialization]*
> > >>> 1. HadoopUtils with all static methods (after combining and
> > >>> eliminating the duplicate code fragments from the three HadoopUtils
> > >>> classes mentioned above)
> > >>> 2. Move the existing SerializableHadoopConfiguration from
> > >>> flink-sequence-file to this new module.
> > >>>
> > >>> *Justification*
> > >>> * With this change, we would strip away the dependency on
> > >>> flink-hadoop-fs from flink-runtime, as I don't see any other classes
> > >>> from flink-hadoop-fs being used anywhere in the flink-runtime module.
> > >>> * We will have a common place where all the utilities related to
> > >>> Hadoop can go which
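The SerializableHadoopConfiguration mentioned throughout this thread follows a simple pattern; as a reference for the discussion, here is a minimal sketch of it, a Serializable wrapper around Hadoop's Configuration (which is Writable but not Serializable). Details may differ from the actual class in flink-sequence-file:
{code:java}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;

public class SerializableHadoopConfiguration implements Serializable {

    private transient Configuration hadoopConfig;

    public SerializableHadoopConfiguration(Configuration hadoopConfig) {
        this.hadoopConfig = hadoopConfig;
    }

    public Configuration get() {
        return hadoopConfig;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Configuration is Writable, so delegate to its own serialization.
        hadoopConfig.write(out);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Rebuild without loading defaults; the serialized fields carry everything.
        hadoopConfig = new Configuration(false);
        hadoopConfig.readFields(in);
    }
}
{code}
Wrapping the configuration this way lets writer factories carry a Hadoop Configuration across serialization boundaries without hitting NotSerializableException.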
Re: [DISCUSS] Introduce a new module 'flink-hadoop-utils'
Awesome. :) Thanks, Robert, for signing up to be the reviewer. I will create the Jira and share the link here.

Stay safe.

- Sivaprasanna

On Thu, May 28, 2020 at 12:13 PM Robert Metzger wrote:

> Hi Sivaprasanna,
>
> thanks a lot for your proposal. Now that I ran into a HadoopUtils-related
> issue myself [1], I see the benefit in this proposal.
>
> I'm happy to be the Flink committer that mentors this change. If we do
> this, I would like to keep the scope of the initial change small:
> - create a "flink-hadoop-utils" module
> - move generic, common utils into that module (for example
> SerializableHadoopConfiguration)
>
> I agree with Till that we should initially leave out the Hadoop
> compatibility modules.
>
> You can go ahead with filing a JIRA ticket! Let's discuss the exact scope
> there.
>
> [1] https://github.com/apache/flink/pull/12146