[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend
flinkbot edited a comment on pull request #19335: URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086 ## CI report: * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34149) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-27022) Expose taskSlots in TaskManager spec
[ https://issues.apache.org/jira/browse/FLINK-27022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516461#comment-17516461 ] Gyula Fora commented on FLINK-27022: Yea, maybe you are right; with my logic we could start adding everything, which obviously doesn't make sense... it just always annoys me a little :D > Expose taskSlots in TaskManager spec > > > Key: FLINK-27022 > URL: https://issues.apache.org/jira/browse/FLINK-27022 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Major > > Basically every Flink job needs the number of taskslots configuration and it > is an important part of the resource spec. > We should include this in the taskmanager spec as an integer parameter. -- This message was sent by Atlassian Jira (v8.20.1#820001)
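Expressed as code, the proposal could look roughly like the sketch below. This is a hypothetical illustration only: the class shape, the field name, and the `effectiveTaskSlots` helper are assumptions, not the operator's actual CRD model.

```java
// Hypothetical sketch for FLINK-27022: expose taskSlots as a first-class
// integer field on the TaskManager spec, instead of requiring users to set
// taskmanager.numberOfTaskSlots through the raw flinkConfiguration map.
// Class and member names are illustrative, not the operator's real model.
class TaskManagerSpecSketch {
    private Integer taskSlots; // null means "not set in the spec"

    Integer getTaskSlots() {
        return taskSlots;
    }

    void setTaskSlots(Integer taskSlots) {
        if (taskSlots != null && taskSlots < 1) {
            throw new IllegalArgumentException("taskSlots must be >= 1");
        }
        this.taskSlots = taskSlots;
    }

    // Value that would be rendered into the effective Flink configuration:
    // an explicit spec value wins over whatever the default flink config has.
    int effectiveTaskSlots(int defaultFromFlinkConf) {
        return taskSlots != null ? taskSlots : defaultFromFlinkConf;
    }
}
```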
[jira] [Created] (FLINK-27029) DeploymentValidator should take default flink config into account during validation
Gyula Fora created FLINK-27029: -- Summary: DeploymentValidator should take default flink config into account during validation Key: FLINK-27029 URL: https://issues.apache.org/jira/browse/FLINK-27029 Project: Flink Issue Type: Bug Components: Kubernetes Operator Reporter: Gyula Fora Fix For: kubernetes-operator-1.0.0 Currently the DefaultDeploymentValidator only takes the FlinkDeployment object into account. However, in places where we validate the presence of config keys, we should also consider the default Flink config, which might already provide default values for the required configs even if the deployment itself doesn't. We should make sure this works correctly both in the operator and the webhook.
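A minimal sketch of the intended behavior, under the assumption that validation boils down to key-presence checks over simple string maps; the real validator works on the CRD object and a Flink Configuration, and all names here are illustrative:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the fix FLINK-27029 describes: when validating that
// a required config key is present, consult the deployment's own
// flinkConfiguration first and fall back to the operator's default flink
// config. Class and method names are illustrative, not the operator's API.
class ConfigAwareValidatorSketch {
    private final Map<String, String> defaultFlinkConfig;

    ConfigAwareValidatorSketch(Map<String, String> defaultFlinkConfig) {
        this.defaultFlinkConfig = defaultFlinkConfig;
    }

    // Empty result means the check passed; a present value is the error.
    Optional<String> validateRequiredKey(Map<String, String> deploymentConfig, String key) {
        boolean inDeployment = deploymentConfig != null && deploymentConfig.containsKey(key);
        boolean inDefaults = defaultFlinkConfig.containsKey(key);
        if (!inDeployment && !inDefaults) {
            return Optional.of("Missing required config: " + key);
        }
        return Optional.empty();
    }
}
```

The same check would have to run identically in the operator and in the webhook, since both load the same default config.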
[jira] [Commented] (FLINK-27005) Bump CRD version to v1alpha2
[ https://issues.apache.org/jira/browse/FLINK-27005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516455#comment-17516455 ] Gyula Fora commented on FLINK-27005: I am wondering whether we should keep the v1alpha1 classes. For stable versions this would be a requirement, but I think for alpha versions we can simply replace them (bump the version and "delete" the old one). This will avoid a lot of garbage accumulating in the repo for the alpha iterations. What do you think [~matyas]? > Bump CRD version to v1alpha2 > > > Key: FLINK-27005 > URL: https://issues.apache.org/jira/browse/FLINK-27005 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Blocker > > We should upgrade the CRD version to v1alpha2 for both FlinkDeployment and > FlinkSessionJob to avoid any conflicts with the preview release. > We should also upgrade the version in all examples and documentation that > references it. > We can also consider introducing some tooling to make this easier as we will > have to repeat this step at least a few more times.
[jira] [Commented] (FLINK-27028) Support to upload jar and run jar in RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-27028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516448#comment-17516448 ] Aitozi commented on FLINK-27028: cc [~wangyang0918] [~chesnay] If there is no objection, I'm willing to open a pull request for this. > Support to upload jar and run jar in RestClusterClient > -- > > Key: FLINK-27028 > URL: https://issues.apache.org/jira/browse/FLINK-27028 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Aitozi >Priority: Major > > The {{flink-kubernetes-operator}} is using the JarUpload + JarRun to support > the session job submission. However, currently the RestClusterClient does not > expose a way to upload the user jar to the session cluster or trigger the jar > run API. So a bare RestClient is used to achieve this, but it lacks the > common retry logic. > Can we expose these two APIs on the RestClusterClient to make it more > convenient to use in the operator?
[jira] [Updated] (FLINK-27028) Support to upload jar and run jar in RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-27028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aitozi updated FLINK-27028: --- Description: The {{flink-kubernetes-operator}} is using the JarUpload + JarRun to support the session job submission. However, currently the RestClusterClient does not expose a way to upload the user jar to the session cluster or trigger the jar run API. So a bare RestClient is used to achieve this, but it lacks the common retry logic. Can we expose these two APIs on the RestClusterClient to make it more convenient to use in the operator? was: The flink-kubernetes-operator is using the JarUpload + JarRun to support the session job management. However, currently the RestClusterClient does not expose a way to upload the user jar to the session cluster or trigger the jar run API, so a bare RestClient is used to achieve this. Can we expose these two APIs on the RestClusterClient to make it more convenient to use in the operator? > Support to upload jar and run jar in RestClusterClient > -- > > Key: FLINK-27028 > URL: https://issues.apache.org/jira/browse/FLINK-27028 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Aitozi >Priority: Major > > The {{flink-kubernetes-operator}} is using the JarUpload + JarRun to support > the session job submission. However, currently the RestClusterClient does not > expose a way to upload the user jar to the session cluster or trigger the jar > run API. So a bare RestClient is used to achieve this, but it lacks the > common retry logic. > Can we expose these two APIs on the RestClusterClient to make it more > convenient to use in the operator?
[jira] [Created] (FLINK-27028) Support to upload jar and run jar in RestClusterClient
Aitozi created FLINK-27028: -- Summary: Support to upload jar and run jar in RestClusterClient Key: FLINK-27028 URL: https://issues.apache.org/jira/browse/FLINK-27028 Project: Flink Issue Type: Improvement Components: Client / Job Submission Reporter: Aitozi The flink-kubernetes-operator is using the JarUpload + JarRun to support the session job management. However, currently the RestClusterClient does not expose a way to upload the user jar to the session cluster or trigger the jar run API, so a bare RestClient is used to achieve this. Can we expose these two APIs on the RestClusterClient to make it more convenient to use in the operator?
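The flow the ticket describes (upload via the JarUpload endpoint, then submit via JarRun) can be sketched with an in-memory stand-in for the REST calls. Everything below is illustrative: these methods do not exist on today's RestClusterClient, which is exactly what the ticket proposes to change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the two operations FLINK-27028 asks the
// RestClusterClient to expose. The map stands in for the session cluster's
// jar storage; a real implementation would issue the JarUpload/JarRun REST
// calls and reuse the client's common retry logic.
class JarSubmissionSketch {
    private final Map<String, String> uploadedJars = new HashMap<>();

    // Stand-in for the JarUpload endpoint: registers the jar, returns a jar id.
    String uploadJar(String jarPath) {
        String jarId = UUID.randomUUID() + "_" + jarPath;
        uploadedJars.put(jarId, jarPath);
        return jarId;
    }

    // Stand-in for the JarRun endpoint: submits a previously uploaded jar
    // and returns the id of the resulting job.
    String runJar(String jarId) {
        if (!uploadedJars.containsKey(jarId)) {
            throw new IllegalArgumentException("Unknown jar id: " + jarId);
        }
        return UUID.randomUUID().toString();
    }
}
```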
[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend
flinkbot edited a comment on pull request #19335: URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086 ## CI report: * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34149) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Commented] (FLINK-26845) Document testing guidelines
[ https://issues.apache.org/jira/browse/FLINK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516446#comment-17516446 ] Salva commented on FLINK-26845: --- Sounds good [~trohrmann], I could start looking at the different SDKs once the tickets are created. Also agree on having a separate one for the e2e part. > Document testing guidelines > --- > > Key: FLINK-26845 > URL: https://issues.apache.org/jira/browse/FLINK-26845 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Salva >Priority: Minor > > As of now, there seems to be little guidance on how to approach testing. > Although it's true [~sjwiesman] that > {quote}Unlike other flink apis, the sdk doesn't pull in the runtime so > testing should really look like any other code. There isn't much statefun > specific. > {quote} > I think that at the very least testing should be mentioned as part of the > docs, even if it is only to stress the above fact. Indeed, as reported by > [~trohrmann], it seems that there are already some test utilities in place, > but they are not well documented, plus there are some potential ideas on how > to improve on that front > {quote}Testing tools is definitely one area where we want to improve > significantly. Also the documentation for how to do things needs to be > updated. There are some testing utils that you can already use today: > [https://github.com/apache/flink-statefun/blob/master/statefun-sdk-java/src/main/java/org/apache/flink/statefun/sdk/java/testing/TestContext.java…|https://t.co/WGjyMA510b]. > However, it is not well documented. > {quote} > Once the overall guidelines are in place for different testing strategies, > tests could be added to the examples in the > {_}[playground|https://github.com/apache/flink-statefun-playground]{_}. > {_}Note{_}: Issue originally reported on Twitter: > [https://twitter.com/salvalcantara/status/1505834101026267136?s=20&t=Go2IHP6iP4ZmIyVLmIeD3g]
[GitHub] [flink] Myasuka commented on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend
Myasuka commented on pull request #19335: URL: https://github.com/apache/flink/pull/19335#issuecomment-1086779793 @flinkbot run azure
[jira] [Commented] (FLINK-27001) Support to specify the resource of the operator
[ https://issues.apache.org/jira/browse/FLINK-27001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516445#comment-17516445 ] Aitozi commented on FLINK-27001: It seems this can also be implemented via https://issues.apache.org/jira/browse/FLINK-26663, so I will not work on this right now; I will keep an eye on it. > Support to specify the resource of the operator > > > Key: FLINK-27001 > URL: https://issues.apache.org/jira/browse/FLINK-27001 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Aitozi >Priority: Major > > Support specifying the operator's resource requirements and limits
[jira] [Updated] (FLINK-27000) Support to set JVM args for operator
[ https://issues.apache.org/jira/browse/FLINK-27000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27000: --- Labels: pull-request-available (was: ) > Support to set JVM args for operator > > > Key: FLINK-27000 > URL: https://issues.apache.org/jira/browse/FLINK-27000 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Aitozi >Priority: Major > Labels: pull-request-available > > In production we often need to set JVM options for the operator
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841158223 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch Review comment: Let's spend more effort on polishing the user story. The current wording is a bit colloquial and not formal enough.
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157829 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot + +Table Store generates one or two snapshots per commit. To avoid too many snapshots +that create a lot of small files and redundant storage, Table Store write defaults +to eliminating expired snapshots, controlled by the following options: + + + + + Option + Required + Default + Type + Description + + + + + snapshot.time-retained + No + 1 h + Duration + The maximum time of completed snapshots to retain. + + + snapshot.num-retained + No + Integer.MAX_VALUE + Integer + The maximum number of completed snapshots to retain. + + + + +Please note that too short retain time or too small retain number may result in: +- Batch query cannot find the file. For example, the table is relatively large and + the batch query takes 10 minutes to read, but the snapshot from 10 minutes ago + expires, at which point the batch query will read a deleted snapshot. +- Continuous reading jobs on FileStore (Without Log System) fail to restart. At the + time of the job failover, the snapshot it recorded has expired. + +## Performance + +Table Store uses LSM data structure, which itself has the ability to support a large +number of updates. Update performance and query performance is a tradeoff, the +following parameters control this tradeoff: + + + + + Option + Required + Default + Type + Description + + + + + num-sorted-run.max + No + 5 + Integer + The max sorted run number. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run). + + + + +- The larger `num-sorted-run.max`: the less merge cost when updating data, which + can avoid many invalid merges. 
However, if this value is too large, more memory + will be needed when merging files, because each FileReader will take up a lot + of memory. +- Smaller `num-sorted-run.max`: better performance when querying, fewer files + will be merged. Review comment: Nit: Keep consistency - The larger `num-sorted-run.max`, the less merge cost when updating data, which can avoid many invalid merges. However, if this value is too large, more memory will be needed when merging files because each FileReader will take up a lot of memory. - The smaller `num-sorted-run.max`, the better performance
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157936 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot + +Table Store generates one or two snapshots per commit. To avoid too many snapshots +that create a lot of small files and redundant storage, Table Store write defaults +to eliminating expired snapshots, controlled by the following options: + + + + + Option + Required + Default + Type + Description + + + + + snapshot.time-retained + No + 1 h + Duration + The maximum time of completed snapshots to retain. + + + snapshot.num-retained + No + Integer.MAX_VALUE + Integer + The maximum number of completed snapshots to retain. + + + + +Please note that too short retain time or too small retain number may result in: +- Batch query cannot find the file. For example, the table is relatively large and + the batch query takes 10 minutes to read, but the snapshot from 10 minutes ago + expires, at which point the batch query will read a deleted snapshot. +- Continuous reading jobs on FileStore (Without Log System) fail to restart. At the + time of the job failover, the snapshot it recorded has expired. + +## Performance + +Table Store uses LSM data structure, which itself has the ability to support a large +number of updates. Update performance and query performance is a tradeoff, the +following parameters control this tradeoff: + + + + + Option + Required + Default + Type + Description + + + + + num-sorted-run.max + No + 5 + Integer + The max sorted run number. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run). + + + + +- The larger `num-sorted-run.max`: the less merge cost when updating data, which + can avoid many invalid merges. 
However, if this value is too large, more memory + will be needed when merging files, because each FileReader will take up a lot + of memory. +- Smaller `num-sorted-run.max`: better performance when querying, fewer files + will be merged. + +## Memory + +There are three main places in the Table Store's sink writer that take up memory: +- MemTable's write buffer, which is individually occupied by each partition, each + bucket, and this memory value can be adjustable by the `write-buffer-size` + option (default 64 MB). +- The memory consumed by compaction for reading files, it can be adj
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157612 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot + +Table Store generates one or two snapshots per commit. To avoid too many snapshots +that create a lot of small files and redundant storage, Table Store write defaults +to eliminating expired snapshots, controlled by the following options: Review comment: Table Store generates one or two snapshots per commit. To avoid too many snapshots that create a lot of small files and redundant storage, Table Store writes defaults to eliminate expired snapshots -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157424 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot Review comment: `Snapshot Expiration` or `Expiring Snapshot`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal edited a comment on pull request #18386: [FLINK-25684][table] Support enhanced `show databases` syntax
RocMarshal edited a comment on pull request #18386: URL: https://github.com/apache/flink/pull/18386#issuecomment-1086631358 @ZhangChaoming Thanks for the update. In general, it's recommended to push updates as a new commit, because squashing multiple commits into one makes it inconvenient for reviewers to start the next review across so many changed lines. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changelog of the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the users why they want to do something and then how to achieve the goal. For instance, removing the normalized node is not the intention. It is just a means, and the purpose is to see all changelog or manually handle du-dep, so we should first express why and then how. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19344: [hotfix] Modify spelling error in IOUtils.java
flinkbot edited a comment on pull request #19344: URL: https://github.com/apache/flink/pull/19344#issuecomment-1086770696 ## CI report: * 1f3f21220dc9ac5041c28da2e7a49db823addfbe Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34188) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changelog of the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the users why they want to do something and then how to achieve the goal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155997 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. Review comment: How about > By default, data is only visible after the checkpoint, which means that the streaming reading has transactional consistency. 
> If you want the data to be immediately visible, you need to set the following options Table Option Default Description `log.system` = `kafka` No You need to enable the log system because the FileStore's continuous consumption only provides checkpoint-based visibility. `log.consistency` = `eventual` No This means that writes are visible without using LogSystem's transaction mechanism. Note: All tables need to have the primary key defined because only then can the data be de-duplicated by the normalizing node of the downstream job. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
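The two options from the suggested table above might be combined in a DDL like this (a hedged sketch; the table name, columns, and primary-key clause are illustrative, not from the patch):

```sql
-- Sketch: eventual consistency over Kafka; writes become visible without
-- waiting for a checkpoint, so a primary key is needed for de-duplication
CREATE TABLE VisibleTable (
    user_id BIGINT,
    cnt BIGINT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'log.system' = 'kafka',
    'log.consistency' = 'eventual'
);
```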
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155997 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. Review comment: How about > By default, data is only visible after the checkpoint, which means that the streaming reading has transactional consistency. 
> If you want the data to be immediately visible, you need to set the following options Table Option Default Description `log.system` = `kafka` No You need to enable the log system because the FileStore's continuous consumption only provides checkpoint-based visibility. `log.consistency` = `eventual` No This means that writes are visible without using LogSystem's transaction mechanism. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #19344: [hotfix] Modify spelling error in IOUtils.java
flinkbot commented on pull request #19344: URL: https://github.com/apache/flink/pull/19344#issuecomment-1086770696 ## CI report: * 1f3f21220dc9ac5041c28da2e7a49db823addfbe UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r84110 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. + +If you want to remove downstream normalized node (It's costly) or see the all +changes of this table, you can configure: +- 'log.changelog-mode' = 'all' +- 'log.consistency' = 'transactional' (default) + +The query written to the table store must contain all message types with +UPDATE_BEFORE, otherwise the planner will throw an exception. Review comment: Nit: add a punctuation mark for `UPDATE_BEFORE` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
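The two settings this thread names for removing the costly downstream normalized node could be sketched as follows (illustrative only; the table name and schema are assumptions):

```sql
-- Sketch: emit the full changelog (including UPDATE_BEFORE) from the log,
-- so downstream consuming jobs need no normalized node
CREATE TABLE ChangelogTable (
    user_id BIGINT,
    cnt BIGINT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'log.changelog-mode' = 'all',
    'log.consistency' = 'transactional'  -- the default
);
```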
[GitHub] [flink] DefuLi opened a new pull request #19344: [hotfix] Modify spelling error in IOUtils.java
DefuLi opened a new pull request #19344: URL: https://github.com/apache/flink/pull/19344 ## What is the purpose of the change *The word "Premeture" is misspelled in the IOUtils.java file of flink-core.* ## Brief change log - *Modify spelling error in IOUtils.java file of flink-core.* ## Verifying this change *This change is a trivial rework / code cleanup without any test coverage.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changes to the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the users why they want to do something and then how to achieve the goal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changes to the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the user why they want to do something and then how to achieve the goal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-27022) Expose taskSlots in TaskManager spec
[ https://issues.apache.org/jira/browse/FLINK-27022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516426#comment-17516426 ] Thomas Weise commented on FLINK-27022: -- I still think that this setting should not be repeated. While it is used everywhere I also think that all but the most trivial demo application will need to have a flinkConfiguration section and this is just the first setting of several. > Expose taskSlots in TaskManager spec > > > Key: FLINK-27022 > URL: https://issues.apache.org/jira/browse/FLINK-27022 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Major > > Basically every Flink job needs the number of taskslots configuration and it > is an important part of the resource spec. > We should include this in the taskmanager spec as an integer parameter. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841152998 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for Review comment: Nit: `the node will` can be simplified as `it will` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841152540 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. 
+ +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only Review comment: By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. No `UPDATE_BEFORE`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841151631 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: Review comment: complete the sentence as > you need to set the following options -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841151482 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters Review comment: It is highly recommended to take partition and primary key filters along with the query, which will speed up the data skipping of the query. See [Ref](https://www.macmillandictionary.com/us/dictionary/american/recommend#:~:text=recommend%20doing%20something%3A,to%20read%20the%20following%20books.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841149095 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment Review comment: then read the incremental -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
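The reading order this comment refers to ("read the snapshot first, then read the incremental") can be sketched as a simple generator. This is purely illustrative and assumes nothing about Table Store internals — in streaming mode the source first serves the full snapshot, then switches to the incremental changes that follow it:

```python
def hybrid_read(snapshot, increments):
    # Sketch of the hybrid read order described above: emit the full
    # snapshot, then the incremental records, tagging each phase.
    for row in snapshot:
        yield ("snapshot", row)
    for row in increments:
        yield ("incremental", row)
```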
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841148673 ## File path: docs/content/docs/development/overview.md ## @@ -30,6 +30,25 @@ Flink Table Store is a unified streaming and batch store for building dynamic tables on Apache Flink. Flink Table Store serves as the storage engine behind Flink SQL Managed Table. +## Setup Table Store + +{{< hint info >}} +__Note:__ Table Store is only supported starting from Flink 1.15. +{{< /hint >}} + +You can get the bundle jar for the Table Store in one of the following ways: +- [Download the latest bundle jar](https://flink.apache.org/downloads.html) of + Flink Table Store. +- Build bundle jar from `flink-table-store-dist` in the source code. + +We have shaded all the dependencies in the package, so you don't have Review comment: `Flink Table Store has shaded` Ref: [Apache Flink Documentation Language Syle](https://flink.apache.org/contributing/docs-style.html) > Use you, never we. Using we can be confusing and patronizing to some users, giving the impression that “we are all members of a secret club and you didn’t get a membership invite”. Address the user as you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841148428 ## File path: docs/content/docs/development/overview.md ## @@ -30,6 +30,25 @@ Flink Table Store is a unified streaming and batch store for building dynamic tables on Apache Flink. Flink Table Store serves as the storage engine behind Flink SQL Managed Table. +## Setup Table Store + +{{< hint info >}} +__Note:__ Table Store is only supported starting from Flink 1.15. +{{< /hint >}} + +You can get the bundle jar for the Table Store in one of the following ways: +- [Download the latest bundle jar](https://flink.apache.org/downloads.html) of + Flink Table Store. +- Build bundle jar from `flink-table-store-dist` in the source code. Review comment: Build bundle jar under submodule`flink-table-store-dist` from source code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841147919 ## File path: docs/content/docs/development/overview.md ## @@ -30,6 +30,25 @@ Flink Table Store is a unified streaming and batch store for building dynamic tables on Apache Flink. Flink Table Store serves as the storage engine behind Flink SQL Managed Table. +## Setup Table Store + +{{< hint info >}} +__Note:__ Table Store is only supported starting from Flink 1.15. Review comment: How about > Table Store is only supported since Flink 1.15. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841147805 ## File path: docs/content/docs/development/create-table.md ## @@ -151,3 +151,85 @@ Creating a table will create the corresponding physical storage: - If `log.system` is configured as Kafka, a Topic named "${catalog_name}.${database_name}.${table_name}" will be created automatically when the table is created. + +## Distribution + +The data distribution of Table Store consists of three concepts: +Partition, Bucket, and Primary Key. + +```sql +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + behavior STRING, + dt STRING, + PRIMARY KEY (dt, user_id) NOT ENFORCED +) PARTITIONED BY (dt) WITH ( + 'bucket' = '4' +); +``` + +For example, the `MyTable` table above has its data distribution +in the following order: +- Partition: isolating different data based on partition fields. +- Bucket: Within a single partition, distributed into 4 different + buckets based on the hash value of the primary key. +- Primary key: Within a single bucket, sorted by primary key to + build LSM structure. + +## Partition + +Table Store adopts the same partitioning concept as Apache Hive to +separate data, and thus various operations can be managed by partition +as a management unit. + +Partitioned filtering is the most effective way to improve performance, +your query statements should contain partition filtering conditions +as much as possible. + +## Bucket + +The record is hashed into different buckets according to the +primary key or the whole row (without primary key). + +The number of buckets is very important as it determines the +worst-case maximum processing parallelism. But it should not be +too big, otherwise, the system will create a lot of small files. + +In general, the desired file size is 128 MB, the recommended data +to be kept on disk in each sub-bucket is about 1 GB. + +## Primary Key + +The primary key is unique and is indexed. 
+ +Flink Table Store imposes an ordering of data, which means the system +will sort the primary key within each bucket. All fields will be used +to sort if no primary key is defined. Using this feature, you can +achieve high performance by adding filter conditions on the primary key. + +The primary key's choice is critical, especially when setting the composite +primary key. A rule of thumb is to put the most frequently queried field in +the front. For example: + +```sql +CREATE TABLE MyTable ( + catalog_id BIGINT, + user_id BIGINT, + item_id BIGINT, + behavior STRING, + dt STRING, + .. +); +``` + +For this table, assuming that the composite primary keys are +the `catalog_id` and `user_id` fields, there are two ways to +set the primary key: +1. PRIMARY KEY (user_id, catalog_id) +2. PRIMARY KEY (catalog_id, user_id) + +The two methods do not behave in the same way when querying. +Use approach 1 if you have a large number of filtered queries +with only `user_id`, and use approach 2 if you have a large +number of filtered queries with only `catalog_id`. Review comment: Nit: approach 1 => approach one, approach 2 => approach two -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
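The bucketing rule quoted in this review ("distributed into 4 different buckets based on the hash value of the primary key") can be sketched in a few lines. The hash function below (`crc32` over a joined string) is an illustrative stand-in only — Table Store's actual hashing differs — and the function name is invented:

```python
import zlib

def assign_bucket(primary_key_values, num_buckets=4):
    # Hash the composite primary key into one of num_buckets buckets.
    # The join/crc32 scheme here is illustrative, not Table Store's.
    key = "|".join(str(v) for v in primary_key_values)
    return zlib.crc32(key.encode("utf-8")) % num_buckets
```

The same record always lands in the same bucket, which is what makes the bucket count the worst-case upper bound on read/write parallelism that the quoted doc describes.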
[jira] [Updated] (FLINK-26899) Introduce write/query table document for table store
[ https://issues.apache.org/jira/browse/FLINK-26899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-26899: --- Labels: pull-request-available (was: ) > Introduce write/query table document for table store > > > Key: FLINK-26899 > URL: https://issues.apache.org/jira/browse/FLINK-26899 > Project: Flink > Issue Type: Sub-task > Components: Table Store >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: table-store-0.1.0 > > > * Batch or streaming > * file store continuous and log store > * retention > * parallelism > * memory -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841147565 ## File path: docs/content/docs/development/create-table.md ## @@ -151,3 +151,85 @@ Creating a table will create the corresponding physical storage: - If `log.system` is configured as Kafka, a Topic named "${catalog_name}.${database_name}.${table_name}" will be created automatically when the table is created. + +## Distribution + +The data distribution of Table Store consists of three concepts: +Partition, Bucket, and Primary Key. + +```sql +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + behavior STRING, + dt STRING, + PRIMARY KEY (dt, user_id) NOT ENFORCED +) PARTITIONED BY (dt) WITH ( + 'bucket' = '4' +); +``` + +For example, the `MyTable` table above has its data distribution +in the following order: +- Partition: isolating different data based on partition fields. +- Bucket: Within a single partition, distributed into 4 different + buckets based on the hash value of the primary key. +- Primary key: Within a single bucket, sorted by primary key to + build LSM structure. + +## Partition + +Table Store adopts the same partitioning concept as Apache Hive to +separate data, and thus various operations can be managed by partition +as a management unit. + +Partitioned filtering is the most effective way to improve performance, +your query statements should contain partition filtering conditions +as much as possible. + +## Bucket + +The record is hashed into different buckets according to the +primary key or the whole row (without primary key). + +The number of buckets is very important as it determines the +worst-case maximum processing parallelism. But it should not be +too big, otherwise, the system will create a lot of small files. + +In general, the desired file size is 128 MB, the recommended data +to be kept on disk in each sub-bucket is about 1 GB. + +## Primary Key + +The primary key is unique and is indexed. 
Review comment: Nit: The primary key is unique and indexed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
flinkbot edited a comment on pull request #19343: URL: https://github.com/apache/flink/pull/19343#issuecomment-1086724313 ## CI report: * 52653837e78b17dcdc196b3c6562ccebf22fca17 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34184) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19263: [FLINK-21585][metrics] Add options for in-/excluding metrics
flinkbot edited a comment on pull request #19263: URL: https://github.com/apache/flink/pull/19263#issuecomment-1081065491 ## CI report: * 087457408aa0db8ed8d49e2c7da68b296c25142f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34183) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19342: [FLINK-27027] Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
flinkbot edited a comment on pull request #19342: URL: https://github.com/apache/flink/pull/19342#issuecomment-1086722113 ## CI report: * a051c979b72bac01401b1190096f29f30d21dadc Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34182) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-26524) Elasticsearch (v5.3.3) sink end-to-end test
[ https://issues.apache.org/jira/browse/FLINK-26524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-26524: --- Labels: auto-deprioritized-critical test-stability (was: stale-critical test-stability) Priority: Major (was: Critical) This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Elasticsearch (v5.3.3) sink end-to-end test > --- > > Key: FLINK-26524 > URL: https://issues.apache.org/jira/browse/FLINK-26524 > Project: Flink > Issue Type: Bug > Components: Connectors / ElasticSearch >Affects Versions: 1.14.3 >Reporter: Matthias Pohl >Priority: Major > Labels: auto-deprioritized-critical, test-stability > > e2e test {{Elasticsearch (v5.3.3) sink end-to-end test}} failed in [this > build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=32627&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=070ff179-953e-5bda-71fa-d6599415701c&l=16598] > on {{release-1.14}} probably because of the following stacktrace showing up > in the logs: > {code} > Mar 07 15:40:41 2022-03-07 15:40:40,336 WARN > org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to > trigger checkpoint 1 for job 3a2fd4c6fb03d5b20929a6f2b7131d82. 
(0 consecutive > failed attempts so far) > Mar 07 15:40:41 org.apache.flink.runtime.checkpoint.CheckpointException: > Checkpoint was declined (task is closing) > Mar 07 15:40:41 at > org.apache.flink.runtime.messages.checkpoint.SerializedCheckpointException.unwrap(SerializedCheckpointException.java:51) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveDeclineMessage(CheckpointCoordinator.java:988) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$declineCheckpoint$2(ExecutionGraphHandler.java:103) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_322] > Mar 07 15:40:41 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_322] > Mar 07 15:40:41 at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322] > Mar 07 15:40:41 Caused by: org.apache.flink.util.SerializedThrowable: Task > name with subtask : Source: Sequence Source (Deprecated) -> Flat Map -> Sink: > Unnamed (1/1)#0 Failure reason: Checkpoint was declined (task is closing) > Mar 07 15:40:41 at > org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1389) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1382) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.taskmanager.Task.triggerCheckpointBarrier(Task.java:1348) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > 
org.apache.flink.runtime.taskexecutor.TaskExecutor.triggerCheckpoint(TaskExecutor.java:956) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) ~[?:1.8.0_322] > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25765) Kubernetes: flink's configmap and flink's actual config are out of sync
[ https://issues.apache.org/jira/browse/FLINK-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-25765: --- Labels: auto-deprioritized-major usability (was: stale-major usability) Priority: Minor (was: Major) This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Kubernetes: flink's configmap and flink's actual config are out of sync > --- > > Key: FLINK-25765 > URL: https://issues.apache.org/jira/browse/FLINK-25765 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.15.0 >Reporter: Niklas Semmler >Priority: Minor > Labels: auto-deprioritized-major, usability > > For kubernetes setups, Flink's configmap does not reflect the actual config. > Causes > # Config values are overridden by the environment variables in the docker > image (see FLINK-25764) > # Flink reads the config on start-up, but does not subscribe to changes > # Changes to the config map do not lead to restarts of the flink cluster > Effects > # Users cannot expect to understand Flink's config from the configmap > # TaskManager/JobManager started at different times may start with different > configs, if the user edits the configmap > Related to FLINK-21383. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26386) CassandraConnectorITCase.testCassandraTableSink failed on azure
[ https://issues.apache.org/jira/browse/FLINK-26386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-26386: --- Labels: auto-deprioritized-critical test-stability (was: stale-critical test-stability) Priority: Major (was: Critical) This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > CassandraConnectorITCase.testCassandraTableSink failed on azure > --- > > Key: FLINK-26386 > URL: https://issues.apache.org/jira/browse/FLINK-26386 > Project: Flink > Issue Type: Bug > Components: Connectors / Cassandra >Affects Versions: 1.15.0, 1.13.6 >Reporter: Yun Gao >Priority: Major > Labels: auto-deprioritized-critical, test-stability > > {code:java} > Feb 28 02:39:19 [ERROR] Tests run: 20, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 226.77 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase > Feb 28 02:39:19 [ERROR] > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.testCassandraTableSink > Time elapsed: 52.49 s <<< ERROR! 
> Feb 28 02:39:19 > com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: > Keyspace flink doesn't exist > Feb 28 02:39:19 at > com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:37) > Feb 28 02:39:19 at > com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:27) > Feb 28 02:39:19 at > com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) > Feb 28 02:39:19 at > com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) > Feb 28 02:39:19 at > com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63) > Feb 28 02:39:19 at > com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39) > Feb 28 02:39:19 at > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.createTable(CassandraConnectorITCase.java:391) > Feb 28 02:39:19 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Feb 28 02:39:19 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Feb 28 02:39:19 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Feb 28 02:39:19 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 28 02:39:19 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Feb 28 02:39:19 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Feb 28 02:39:19 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > Feb 28 02:39:19 at > 
org.apache.flink.testutils.junit.RetryRule$RetryOnExceptionStatement.evaluate(RetryRule.java:192) > Feb 28 02:39:19 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Feb 28 02:39:19 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Feb 28 02:39:19 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Feb 28 02:39:19 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Feb 28 02:39:19 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:2
[jira] [Updated] (FLINK-22955) lookup join filter push down result to mismatch function signature
[ https://issues.apache.org/jira/browse/FLINK-22955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-22955: --- Labels: auto-deprioritized-critical stale-major (was: auto-deprioritized-critical) I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issue has been marked as Major but is unassigned, and neither it nor its Sub-Tasks have been updated for 60 days. I have gone ahead and added a "stale-major" label to the issue. If this ticket is a Major, please either assign yourself or give an update. Afterwards, please remove the label, or in 7 days the issue will be deprioritized. > lookup join filter push down result to mismatch function signature > -- > > Key: FLINK-22955 > URL: https://issues.apache.org/jira/browse/FLINK-22955 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner > Affects Versions: 1.11.3, 1.13.1, 1.12.4 > Environment: Flink 1.13.1 > how to reproduce: patch file attached > Reporter: Cooper Luan > Priority: Major > Labels: auto-deprioritized-critical, stale-major > Attachments: > 0001-try-to-produce-lookup-join-filter-pushdown-expensive.patch > > > a SQL statement like this may result in a lookup function signature mismatch exception when the SQL is explained > {code:sql} > CREATE TEMPORARY VIEW v_vvv AS > SELECT * FROM MyTable AS T > JOIN LookupTableAsync1 FOR SYSTEM_TIME AS OF T.proctime AS D > ON T.a = D.id; > SELECT a,b,id,name > FROM v_vvv > WHERE age = 10;{code} > the lookup function is > {code:scala} > class AsyncTableFunction1 extends AsyncTableFunction[RowData] { > def eval(resultFuture: CompletableFuture[JCollection[RowData]], a: Integer): Unit = { > } > }{code} > exec plan is > {code:java} > LegacySink(name=[`default_catalog`.`default_database`.`appendSink1`], fields=[a, b, id, name]) > +- LookupJoin(table=[default_catalog.default_database.LookupTableAsync1], joinType=[InnerJoin], async=[true], lookup=[age=10, id=a], where=[(age = 10)], select=[a, b, id, name]) > +- Calc(select=[a, b]) > +- DataStreamScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime]) > {code} > the "lookup=[age=10, id=a]" results in a signature mismatch > > but if I add one more insert, it works well > {code:sql} > SELECT a,b,id,name > FROM v_vvv > WHERE age = 30 > {code} > exec plan is > {code:java} > == Optimized Execution Plan == > LookupJoin(table=[default_catalog.default_database.LookupTableAsync1], joinType=[InnerJoin], async=[true], lookup=[id=a], select=[a, b, c, proctime, rowtime, id, name, age, ts])(reuse_id=[1]) > +- DataStreamScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime]) > LegacySink(name=[`default_catalog`.`default_database`.`appendSink1`], fields=[a, b, id, name]) > +- Calc(select=[a, b, id, name], where=[(age = 10)]) > +- Reused(reference_id=[1]) > LegacySink(name=[`default_catalog`.`default_database`.`appendSink2`], fields=[a, b, id, name]) > +- Calc(select=[a, b, id, name], where=[(age = 30)]) > +- Reused(reference_id=[1]) > {code} > the LookupJoin node uses "lookup=[id=a]" (right), not "lookup=[age=10, id=a]" (wrong) > > so in the "multi insert" case the planner works great, while in the "single insert" case the planner throws an exception -- This message was sent by Atlassian Jira (v8.20.1#820001)
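[Editor's note] The signature mismatch above arises because the planner folds the pushed-down constant filter (age = 10) into the lookup keys, so it tries to resolve an eval overload taking both columns while the function only declares one keyed on id. The sketch below is hypothetical and not from the ticket; it assumes the planner passes keys in the lookup=[age=10, id=a] order, and merely illustrates why the single-argument signature no longer matches:

{code:scala}
import java.lang.{Integer => JInteger}
import java.util.{Collection => JCollection}
import java.util.concurrent.CompletableFuture

import org.apache.flink.table.data.RowData
import org.apache.flink.table.functions.AsyncTableFunction

class AsyncTableFunction1 extends AsyncTableFunction[RowData] {

  // Original lookup path: keyed on id only. This is the overload
  // the planner resolves when the plan shows lookup=[id=a].
  def eval(resultFuture: CompletableFuture[JCollection[RowData]],
           a: JInteger): Unit = {
    // ... perform the async lookup by id ...
  }

  // Hypothetical overload matching lookup=[age=10, id=a]: the
  // pushed-down constant arrives as an extra lookup key, so an
  // eval with both keys would be required for the plan to resolve.
  // (Key order is an assumption, not confirmed by the ticket.)
  def eval(resultFuture: CompletableFuture[JCollection[RowData]],
           age: JInteger,
           a: JInteger): Unit = {
    // ... perform the async lookup by (age, id) ...
  }
}
{code}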
[GitHub] [flink] flinkbot edited a comment on pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
flinkbot edited a comment on pull request #19343: URL: https://github.com/apache/flink/pull/19343#issuecomment-1086724313 ## CI report: * 52653837e78b17dcdc196b3c6562ccebf22fca17 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34184) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19263: [FLINK-21585][metrics] Add options for in-/excluding metrics
flinkbot edited a comment on pull request #19263: URL: https://github.com/apache/flink/pull/19263#issuecomment-1081065491 ## CI report: * 8eda65b9d6af64aafa22c855e8177f2eacbaf1d0 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34033) * 087457408aa0db8ed8d49e2c7da68b296c25142f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34183) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
flinkbot commented on pull request #19343: URL: https://github.com/apache/flink/pull/19343#issuecomment-1086724313 ## CI report: * 52653837e78b17dcdc196b3c6562ccebf22fca17 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19263: [FLINK-21585][metrics] Add options for in-/excluding metrics
flinkbot edited a comment on pull request #19263: URL: https://github.com/apache/flink/pull/19263#issuecomment-1081065491 ## CI report: * 8eda65b9d6af64aafa22c855e8177f2eacbaf1d0 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34033) * 087457408aa0db8ed8d49e2c7da68b296c25142f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19342: [FLINK-27027] Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
flinkbot edited a comment on pull request #19342: URL: https://github.com/apache/flink/pull/19342#issuecomment-1086722113 ## CI report: * a051c979b72bac01401b1190096f29f30d21dadc Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34182) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-27026) Upgrade checkstyle plugin
[ https://issues.apache.org/jira/browse/FLINK-27026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27026: --- Labels: pull-request-available (was: ) > Upgrade checkstyle plugin > - > > Key: FLINK-27026 > URL: https://issues.apache.org/jira/browse/FLINK-27026 > Project: Flink > Issue Type: Technical Debt > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Newer versions of the checkstyle plugin allow running checkstyle:check > without requiring dependency resolution. This allows it to be used in a fresh > environment. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] zentol opened a new pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
zentol opened a new pull request #19343: URL: https://github.com/apache/flink/pull/19343 Newer versions no longer require dependency resolution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-27027) Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
[ https://issues.apache.org/jira/browse/FLINK-27027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27027: --- Labels: pull-request-available (was: ) > Get rid of oddly named mvn-${sys mvn.forkNumber}.log files > -- > > Key: FLINK-27027 > URL: https://issues.apache.org/jira/browse/FLINK-27027 > Project: Flink > Issue Type: Technical Debt > Components: Build System / CI >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot commented on pull request #19342: [FLINK-27027] Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
flinkbot commented on pull request #19342: URL: https://github.com/apache/flink/pull/19342#issuecomment-1086722113 ## CI report: * a051c979b72bac01401b1190096f29f30d21dadc UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-27027) Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
Chesnay Schepler created FLINK-27027: Summary: Get rid of oddly named mvn-${sys mvn.forkNumber}.log files Key: FLINK-27027 URL: https://issues.apache.org/jira/browse/FLINK-27027 Project: Flink Issue Type: Technical Debt Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.16.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27026) Upgrade checkstyle plugin
Chesnay Schepler created FLINK-27026: Summary: Upgrade checkstyle plugin Key: FLINK-27026 URL: https://issues.apache.org/jira/browse/FLINK-27026 Project: Flink Issue Type: Technical Debt Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.16.0 Newer versions of the checkstyle plugin allow running checkstyle:check without requiring dependency resolution. This allows it to be used in a fresh environment. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-27025) Cannot read parquet file, after putting the jar in the right place with right permissions
[ https://issues.apache.org/jira/browse/FLINK-27025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated FLINK-27025: Attachment: tpch-12-parquet.py > Cannot read parquet file, after putting the jar in the right place with right > permissions > - > > Key: FLINK-27025 > URL: https://issues.apache.org/jira/browse/FLINK-27025 > Project: Flink > Issue Type: Bug > Components: API / Python, Table SQL / API > Affects Versions: 1.14.0 > Reporter: Ziheng Wang > Priority: Blocker > Attachments: tpch-12-parquet.py > > > I am using Flink with the SQL API on AWS EMR. I can run queries on CSV files, no problem. > However, when I try to run queries on Parquet files, I get this error: Caused by: java.io.StreamCorruptedException: unexpected block data > I have put flink-sql-parquet_2.12-1.14.0.jar under /usr/lib/flink/lib on the master node of the EMR cluster. Indeed, it seems that Flink picks it up, because if the jar is not there the error is different (it says it can't understand the parquet source). The jar has full 777 permissions under the same username as all the other jars in that directory. > I tried passing a folder name as the Parquet source as well as a single Parquet file; nothing works. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27025) Cannot read parquet file, after putting the jar in the right place with right permissions
Ziheng Wang created FLINK-27025: --- Summary: Cannot read parquet file, after putting the jar in the right place with right permissions Key: FLINK-27025 URL: https://issues.apache.org/jira/browse/FLINK-27025 Project: Flink Issue Type: Bug Components: API / Python, Table SQL / API Affects Versions: 1.14.0 Reporter: Ziheng Wang I am using Flink with the SQL API on AWS EMR. I can run queries on CSV files, no problem. However, when I try to run queries on Parquet files, I get this error: Caused by: java.io.StreamCorruptedException: unexpected block data. I have put flink-sql-parquet_2.12-1.14.0.jar under /usr/lib/flink/lib on the master node of the EMR cluster. Indeed, it seems that Flink picks it up, because if the jar is not there the error is different (it says it can't understand the parquet source). The jar has full 777 permissions under the same username as all the other jars in that directory. I tried passing a folder name as the Parquet source as well as a single Parquet file; nothing works. -- This message was sent by Atlassian Jira (v8.20.1#820001)
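[Editor's note] For context on the report above: in Flink 1.14 a Parquet-backed table is typically declared through the filesystem SQL connector, which requires flink-sql-parquet on the classpath (e.g. under /usr/lib/flink/lib, as the reporter did). The sketch below is hypothetical; the table name, schema, and path are invented for illustration and are not taken from the attached tpch-12-parquet.py:

{code:scala}
// Minimal sketch, assuming a TableEnvironment `tableEnv` is in scope.
// The 'path' may point at a single Parquet file or a directory of files.
tableEnv.executeSql(
  """
    |CREATE TABLE lineitem (
    |  l_orderkey BIGINT,
    |  l_shipdate DATE
    |) WITH (
    |  'connector' = 'filesystem',
    |  'path' = 's3://my-bucket/tpch/lineitem/',  -- hypothetical location
    |  'format' = 'parquet'
    |)
  """.stripMargin)

// Queries against the table then go through the Parquet format factory:
// tableEnv.executeSql("SELECT count(*) FROM lineitem").print()
{code}

A StreamCorruptedException at runtime (rather than a "can't understand parquet source" planning error) usually points at a classpath or serialization mismatch rather than at the DDL itself.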
[jira] [Closed] (FLINK-27024) Cleanup surefire configuration
[ https://issues.apache.org/jira/browse/FLINK-27024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-27024. Resolution: Fixed master: c41f169765646bf9dd1ac8264127e1fc9e399898 > Cleanup surefire configuration > -- > > Key: FLINK-27024 > URL: https://issues.apache.org/jira/browse/FLINK-27024 > Project: Flink > Issue Type: Technical Debt > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > We have a few redundant surefire configurations in some connector modules, > and overall a lot of duplication and stuff defined on the argLine which could > be systemEnvironmentVariables (which are easier to extend in sub-modules). -- This message was sent by Atlassian Jira (v8.20.1#820001)
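[Editor's note] To illustrate the argLine-versus-variables point from the description above: a value baked into a single argLine string must be repeated wholesale by any sub-module that wants to change one part of it, while individual entries under surefire's systemPropertyVariables or environmentVariables can be overridden one at a time. The fragment below is a hypothetical sketch, not the actual Flink pom:

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Before: everything packed into one argLine string; a sub-module
         overriding any flag has to restate the whole line. -->
    <argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber}</argLine>

    <!-- After: individual entries that a sub-module can extend or
         override one at a time. ${surefire.forkNumber} is surefire's
         built-in fork-number placeholder. -->
    <systemPropertyVariables>
      <mvn.forkNumber>${surefire.forkNumber}</mvn.forkNumber>
    </systemPropertyVariables>
    <environmentVariables>
      <MVN_FORK_NUMBER>${surefire.forkNumber}</MVN_FORK_NUMBER>
    </environmentVariables>
  </configuration>
</plugin>
{code}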
[GitHub] [flink] zentol merged pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
zentol merged pull request #19341: URL: https://github.com/apache/flink/pull/19341 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-26994) Merge libraries CI profile into core
[ https://issues.apache.org/jira/browse/FLINK-26994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-26994. Resolution: Fixed master: d17b0ba0487a421d1dc00c5264ef820908f54658 > Merge libraries CI profile into core > > > Key: FLINK-26994 > URL: https://issues.apache.org/jira/browse/FLINK-26994 > Project: Flink > Issue Type: Technical Debt > Components: Build System / CI >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > The libraries profile spends more time on setting up the environment than > actually running tests. Merge it with Core for more efficiency. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] zentol merged pull request #19337: [FLINK-26994][ci] Merge Core/Libraries CI profiles
zentol merged pull request #19337: URL: https://github.com/apache/flink/pull/19337 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
flinkbot edited a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1085919480 ## CI report: * 5cc15a539159eaeeb8a11961c46e755041bb7840 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34176) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
flinkbot edited a comment on pull request #19341: URL: https://github.com/apache/flink/pull/19341#issuecomment-1086648587 ## CI report: * 12bf7a182480d9677b2e7c50c08793665c99dd5f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34177) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
flinkbot edited a comment on pull request #19340: URL: https://github.com/apache/flink/pull/19340#issuecomment-1086644469 ## CI report: * 948190b21e39a82c9accade41cd1fe0c861a8184 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34175) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-web] gyfora commented on pull request #519: Add Kubernetes Operator 0.1.0 release
gyfora commented on pull request #519: URL: https://github.com/apache/flink-web/pull/519#issuecomment-1086667655 @mbalassi please build, review and merge if all good :) I have modified the blog post release date to 04/04 (Monday). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-27024) Cleanup surefire configuration
Chesnay Schepler created FLINK-27024: Summary: Cleanup surefire configuration Key: FLINK-27024 URL: https://issues.apache.org/jira/browse/FLINK-27024 Project: Flink Issue Type: Technical Debt Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.16.0 We have a few redundant surefire configurations in some connector modules, and overall a lot of duplication and stuff defined on the argLine which could be systemEnvironmentVariables (which are easier to extend in sub-modules). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot edited a comment on pull request #18386: [FLINK-25684][table] Support enhanced `show databases` syntax
flinkbot edited a comment on pull request #18386: URL: https://github.com/apache/flink/pull/18386#issuecomment-1015100174 ## CI report: * 91a6c605b4d6395e6bd01ce25ae6d0bec1146cae Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34173) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-26969) Write operation examples of tumbling window, sliding window, session window and count window based on pyflink
[ https://issues.apache.org/jira/browse/FLINK-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-26969. --- Fix Version/s: 1.16.0 Resolution: Fixed Merged to master via db23add96305e163328283859fb644af72350018 > Write operation examples of tumbling window, sliding window, session window > and count window based on pyflink > - > > Key: FLINK-26969 > URL: https://issues.apache.org/jira/browse/FLINK-26969 > Project: Flink > Issue Type: Improvement > Components: API / Python, Examples >Reporter: zhangjingcun >Assignee: zhangjingcun >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Write operation examples of tumbling window, sliding window, session window > and count window based on pyflink -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot edited a comment on pull request #19218: hive dialect supports select current database
flinkbot edited a comment on pull request #19218: URL: https://github.com/apache/flink/pull/19218#issuecomment-1077287637 ## CI report: * 959005953d10ba602eb1bad4ee2004179f9e8626 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34172) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] dianfu closed pull request #19328: [FLINK-26969][Examples] Write operation examples of tumbling window, sliding window, session window and count window based on pyflink
dianfu closed pull request #19328: URL: https://github.com/apache/flink/pull/19328 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-26540) Support handle join involving complex types in on condition
[ https://issues.apache.org/jira/browse/FLINK-26540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516320#comment-17516320 ] Jing Zhang edited comment on FLINK-26540 at 4/2/22 2:39 PM: merged in master: 0e459bf810736347ad7bf1eb62be4dad3382392e was (Author: qingru zhang): fixed in master: 0e459bf810736347ad7bf1eb62be4dad3382392e > Support handle join involving complex types in on condition > --- > > Key: FLINK-26540 > URL: https://issues.apache.org/jira/browse/FLINK-26540 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: luoyuxia > Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Currently, the Hive dialect can't handle a join involving complex types in the ON condition, which can be reproduced using the following code in HiveDialectITCase: > {code:java} > tableEnv.executeSql("CREATE TABLE test2a (a ARRAY<INT>)"); > tableEnv.executeSql("CREATE TABLE test2b (a INT)"); > List<Row> results = > CollectionUtil.iteratorToList( > tableEnv.executeSql( > "select * from test2b join test2a on test2b.a = test2a.a[1]") > .collect()); > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26540) Support handle join involving complex types in on condition
[ https://issues.apache.org/jira/browse/FLINK-26540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhang closed FLINK-26540. -- Resolution: Fixed > Support handle join involving complex types in on condition > --- > > Key: FLINK-26540 > URL: https://issues.apache.org/jira/browse/FLINK-26540 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: luoyuxia > Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Currently, the Hive dialect can't handle a join involving complex types in the ON condition, which can be reproduced using the following code in HiveDialectITCase: > {code:java} > tableEnv.executeSql("CREATE TABLE test2a (a ARRAY<INT>)"); > tableEnv.executeSql("CREATE TABLE test2b (a INT)"); > List<Row> results = > CollectionUtil.iteratorToList( > tableEnv.executeSql( > "select * from test2b join test2a on test2b.a = test2a.a[1]") > .collect()); > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26540) Support handle join involving complex types in on condition
[ https://issues.apache.org/jira/browse/FLINK-26540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516320#comment-17516320 ] Jing Zhang commented on FLINK-26540: fixed in master: 0e459bf810736347ad7bf1eb62be4dad3382392e > Support handle join involving complex types in on condition > --- > > Key: FLINK-26540 > URL: https://issues.apache.org/jira/browse/FLINK-26540 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: luoyuxia > Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Currently, the Hive dialect can't handle a join involving complex types in the ON condition, which can be reproduced using the following code in HiveDialectITCase: > {code:java} > tableEnv.executeSql("CREATE TABLE test2a (a ARRAY<INT>)"); > tableEnv.executeSql("CREATE TABLE test2b (a INT)"); > List<Row> results = > CollectionUtil.iteratorToList( > tableEnv.executeSql( > "select * from test2b join test2a on test2b.a = test2a.a[1]") > .collect()); > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] beyond1920 closed pull request #19012: [FLINK-26540][hive] Support handle join involving complex types in on condition
beyond1920 closed pull request #19012: URL: https://github.com/apache/flink/pull/19012 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
flinkbot edited a comment on pull request #19341: URL: https://github.com/apache/flink/pull/19341#issuecomment-1086648587 ## CI report: * 12bf7a182480d9677b2e7c50c08793665c99dd5f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34177) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
flinkbot commented on pull request #19341: URL: https://github.com/apache/flink/pull/19341#issuecomment-1086648587 ## CI report: * 12bf7a182480d9677b2e7c50c08793665c99dd5f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-27011) Add state access examples of Python DataStream API
[ https://issues.apache.org/jira/browse/FLINK-27011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516315#comment-17516315 ] jay li commented on FLINK-27011: [~dianfu] Cool, it looks like this ticket has already been assigned. > Add state access examples of Python DataStream API > -- > > Key: FLINK-27011 > URL: https://issues.apache.org/jira/browse/FLINK-27011 > Project: Flink > Issue Type: Improvement > Components: API / Python, Examples > Reporter: zhangjingcun > Assignee: zhangjingcun > Priority: Major > > Add various types of state usage examples -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26512) Document debugging and manual error handling
[ https://issues.apache.org/jira/browse/FLINK-26512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-26512: --- Fix Version/s: kubernetes-operator-1.0.0 (was: kubernetes-operator-0.1.0) > Document debugging and manual error handling > > > Key: FLINK-26512 > URL: https://issues.apache.org/jira/browse/FLINK-26512 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Major > Fix For: kubernetes-operator-1.0.0 > > > We should document how users can discover errors that happen during job > deployments, such as which k8s logs to check etc. > We should also highlight limitation of the operator error handling and > document how users can manually recover jobs (for example: delete -> manually > restart with initial savepoint) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26611) Document operator config options
[ https://issues.apache.org/jira/browse/FLINK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-26611. -- Resolution: Fixed Merged main: ffe0d55613764a9dd768cc20f5d96da8154c0865 release-0.1: 01d717e63ed02747a79808558e5aa75a2b1cb4f3 > Document operator config options > > > Key: FLINK-26611 > URL: https://issues.apache.org/jira/browse/FLINK-26611 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Assignee: Biao Geng >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Similar to how other flink configs are documented we should also document the > configs found in OperatorConfigOptions. > Generating the documentation might be possible but maybe it's easier to do it > manually. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot edited a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
flinkbot edited a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1085919480 ## CI report: * Unknown: [CANCELED](TBD) * 5cc15a539159eaeeb8a11961c46e755041bb7840 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34176) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-27024) Cleanup surefire configuration
[ https://issues.apache.org/jira/browse/FLINK-27024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27024: --- Labels: pull-request-available (was: ) > Cleanup surefire configuration > -- > > Key: FLINK-27024 > URL: https://issues.apache.org/jira/browse/FLINK-27024 > Project: Flink > Issue Type: Technical Debt > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > We have a few redundant surefire configurations in some connector modules, > and overall a lot of duplication and stuff defined on the argLine which could > be systemEnvironmentVariables (which are easier to extend in sub-modules). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] zentol opened a new pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
zentol opened a new pull request #19341: URL: https://github.com/apache/flink/pull/19341 Removes unnecessary special surefire configurations and models as much as possible as environment variables to make things easier to extend.
[GitHub] [flink] flinkbot edited a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
flinkbot edited a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1085919480 ## CI report: * Unknown: [CANCELED](TBD) * 5cc15a539159eaeeb8a11961c46e755041bb7840 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #19315: [FLINK-27013][hive] Hive dialect supports is_distinct_from
flinkbot edited a comment on pull request #19315: URL: https://github.com/apache/flink/pull/19315#issuecomment-1085386734 ## CI report: * 5c2e924a5bf679bf9b54cd00b1ec1d1e5116048e UNKNOWN * 549983475a45a52f5917acacb4e9f00ad7687a49 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34170)
[GitHub] [flink] jay-li-csck commented on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
jay-li-csck commented on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1086646200 @flinkbot run azure
[GitHub] [flink] jay-li-csck removed a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
jay-li-csck removed a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1086597006 @flinkbot run azure
[GitHub] [flink] jay-li-csck removed a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
jay-li-csck removed a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1086608452 @flinkbot run azure
[GitHub] [flink] flinkbot edited a comment on pull request #19331: [FLINK-26985][runtime] Don't discard shared state of restored checkpoints
flinkbot edited a comment on pull request #19331: URL: https://github.com/apache/flink/pull/19331#issuecomment-108520 ## CI report: * 3d8a33dad195ea7303270333d05f5449a5ea71bf Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34171)
[GitHub] [flink] flinkbot edited a comment on pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
flinkbot edited a comment on pull request #19340: URL: https://github.com/apache/flink/pull/19340#issuecomment-1086644469 ## CI report: * 948190b21e39a82c9accade41cd1fe0c861a8184 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34175)
[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework
lindong28 commented on a change in pull request #71: URL: https://github.com/apache/flink-ml/pull/71#discussion_r841078416 ## File path: flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/data/DataGenerator.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.benchmark.data; + +import org.apache.flink.ml.common.param.HasSeed; +import org.apache.flink.table.api.Table; +import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; + +/** Interface for generating data as table arrays. */ +public interface DataGenerator> extends HasSeed { Review comment: After thinking about this more, I think it might be simpler to just remove `DataGeneratorParams` and define parameters directly in `DataGenerator`.
[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework
lindong28 commented on a change in pull request #71: URL: https://github.com/apache/flink-ml/pull/71#discussion_r841074926 ## File path: flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/data/InputDataGeneratorParams.java ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.benchmark.data; + +import org.apache.flink.ml.param.LongParam; +import org.apache.flink.ml.param.Param; +import org.apache.flink.ml.param.ParamValidators; +import org.apache.flink.ml.param.StringArrayParam; + +/** Interface for the input data generator params. */ +public interface InputDataGeneratorParams extends DataGeneratorParams { Review comment: Would it be simpler to remove this class and define parameters directly in `InputDataGenerator`? By doing this, all subclasses of `InputDataGenerator`, such as `DenseVectorArrayGenerator` and `DenseVectorGenerator`, would inherit those parameters. Note that we have dedicated parameter classes for algorithms because an algorithm typically has a pair of estimator and model, where the estimator and model share some but not all parameters. Data generators do not have this problem. 
Similarly, it might be simpler to remove the generator-specific parameter classes such as `DenseVectorArrayGeneratorParams` and `DenseVectorGeneratorParams`. We can still define non-generator-specific parameters in dedicated classes such as `HasArraySize`. ## File path: flink-ml-benchmark/src/test/java/org/apache/flink/ml/benchmark/BenchmarkTest.java ## @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.ml.benchmark; + +import org.apache.flink.ml.benchmark.data.InputDataGenerator; +import org.apache.flink.ml.benchmark.data.common.DenseVectorGenerator; +import org.apache.flink.ml.clustering.kmeans.KMeans; +import org.apache.flink.ml.clustering.kmeans.KMeansModel; +import org.apache.flink.ml.param.WithParams; +import org.apache.flink.ml.util.ReadWriteUtils; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; +import org.apache.flink.test.util.AbstractTestBase; + +import org.apache.commons.io.FileUtils; +import org.junit.After; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.ByteArrayOutputStream; +import java.io.File; +import java.io.InputStream; +import java.io.PrintStream; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import static org.apache.flink.ml.util.ReadWriteUtils.OBJECT_MAPPER; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +/** Tests benchmarks. */ +@SuppressWarnings("unchecked") +public class BenchmarkTest extends AbstractTestBase { +@Rule public final TemporaryFolder tempFolder = new TemporaryFolder(); + +private static final String exampleConfigFile = "kmeansmodel-benchmark.json"; +private static final String expectedBenchmarkName = "KMeansModel-1"; + +private final ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); +private final PrintStream originalOut = System.out; + +@Before
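The refactoring suggested in the review thread above — declaring parameters directly on the generator interface so every concrete generator inherits them, instead of keeping a dedicated `*Params` class — can be sketched with plain Java default methods. This is a simplified, hypothetical stand-in for Flink ML's `WithParams`/`Param` machinery (the map-backed `WithParams` and the `numValues` parameter here are illustrative, not the actual API):

```java
import java.util.HashMap;
import java.util.Map;

public class ParamInheritanceSketch {
    // Simplified stand-in for Flink ML's WithParams: a shared parameter map
    // with default getter/setter helpers.
    interface WithParams {
        Map<String, Object> paramMap();

        default void set(String key, Object value) {
            paramMap().put(key, value);
        }
    }

    // Parameters defined directly on the generator interface...
    interface InputDataGenerator extends WithParams {
        default void setNumValues(long n) {
            set("numValues", n);
        }

        default long getNumValues() {
            return (long) paramMap().getOrDefault("numValues", 0L);
        }
    }

    // ...are inherited by every concrete generator without a dedicated
    // InputDataGeneratorParams class.
    static class DenseVectorGenerator implements InputDataGenerator {
        private final Map<String, Object> params = new HashMap<>();

        @Override
        public Map<String, Object> paramMap() {
            return params;
        }
    }

    public static void main(String[] args) {
        DenseVectorGenerator gen = new DenseVectorGenerator();
        gen.setNumValues(1000L); // inherited from InputDataGenerator
        System.out.println(gen.getNumValues()); // prints 1000
    }
}
```

As the comment notes, algorithms keep dedicated parameter classes because an estimator and its model share only some parameters; generators have no such split, so the interface itself can own the parameters.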
[GitHub] [flink] flinkbot commented on pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
flinkbot commented on pull request #19340: URL: https://github.com/apache/flink/pull/19340#issuecomment-1086644469 ## CI report: * 948190b21e39a82c9accade41cd1fe0c861a8184 UNKNOWN
[GitHub] [flink] snuyanzin opened a new pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
snuyanzin opened a new pull request #19340: URL: https://github.com/apache/flink/pull/19340 This is a 1.14 backport of PR https://github.com/apache/flink/pull/19303
[GitHub] [flink] flinkbot edited a comment on pull request #19328: [FLINK-26969][Examples] Write operation examples of tumbling window, sliding window, session window and count window based on pyflink
flinkbot edited a comment on pull request #19328: URL: https://github.com/apache/flink/pull/19328#issuecomment-1085814902 ## CI report: * b4c99e946189a0f96d61260557ed042431b14480 UNKNOWN * 49ce90f0906fc05b69af7fa969aee682d69121d4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34168)
[GitHub] [flink] flinkbot edited a comment on pull request #19339: [FLINK-26961][BP-1.15][connectors][filesystems][formats] Update Jackson Databi…
flinkbot edited a comment on pull request #19339: URL: https://github.com/apache/flink/pull/19339#issuecomment-1086609854 ## CI report: * 4e4dd3279b26e140bfef2bba76a57369000aa044 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34169)