[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend
flinkbot edited a comment on pull request #19335: URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086 ## CI report: * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34149) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-27022) Expose taskSlots in TaskManager spec
[ https://issues.apache.org/jira/browse/FLINK-27022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516461#comment-17516461 ] Gyula Fora commented on FLINK-27022: Yea, maybe you are right; with my logic we could start adding everything, which obviously doesn't make sense... it just always annoys me a little :D > Expose taskSlots in TaskManager spec > > > Key: FLINK-27022 > URL: https://issues.apache.org/jira/browse/FLINK-27022 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Major > > Basically every Flink job needs the number of taskslots configuration and it > is an important part of the resource spec. > We should include this in the taskmanager spec as an integer parameter. -- This message was sent by Atlassian Jira (v8.20.1#820001)
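Expressed as code, the proposal could look roughly like the sketch below. This is a hypothetical illustration only: the class shape, the field name, and the `effectiveTaskSlots` helper are assumptions, not the operator's actual CRD model.

```java
// Hypothetical sketch for FLINK-27022: expose taskSlots as a first-class
// integer field on the TaskManager spec, instead of requiring users to set
// taskmanager.numberOfTaskSlots through the raw flinkConfiguration map.
// Class and member names are illustrative, not the operator's real model.
class TaskManagerSpecSketch {
    private Integer taskSlots; // null means "not set in the spec"

    Integer getTaskSlots() {
        return taskSlots;
    }

    void setTaskSlots(Integer taskSlots) {
        if (taskSlots != null && taskSlots < 1) {
            throw new IllegalArgumentException("taskSlots must be >= 1");
        }
        this.taskSlots = taskSlots;
    }

    // Value that would be rendered into the effective Flink configuration:
    // an explicit spec value wins over whatever the default flink config has.
    int effectiveTaskSlots(int defaultFromFlinkConf) {
        return taskSlots != null ? taskSlots : defaultFromFlinkConf;
    }
}
```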
[jira] [Created] (FLINK-27029) DeploymentValidator should take default flink config into account during validation
Gyula Fora created FLINK-27029: -- Summary: DeploymentValidator should take default flink config into account during validation Key: FLINK-27029 URL: https://issues.apache.org/jira/browse/FLINK-27029 Project: Flink Issue Type: Bug Components: Kubernetes Operator Reporter: Gyula Fora Fix For: kubernetes-operator-1.0.0 Currently the DefaultDeploymentValidator only takes the FlinkDeployment object into account. However, in places where we validate the presence of config keys, we should also consider the default Flink config, which might already provide default values for the required configs even if the deployment itself doesn't. We should make sure this works correctly both in the operator and the webhook.
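A minimal sketch of the intended behavior, under the assumption that validation boils down to key-presence checks over simple string maps; the real validator works on the CRD object and a Flink Configuration, and all names here are illustrative:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the fix FLINK-27029 describes: when validating that
// a required config key is present, consult the deployment's own
// flinkConfiguration first and fall back to the operator's default flink
// config. Class and method names are illustrative, not the operator's API.
class ConfigAwareValidatorSketch {
    private final Map<String, String> defaultFlinkConfig;

    ConfigAwareValidatorSketch(Map<String, String> defaultFlinkConfig) {
        this.defaultFlinkConfig = defaultFlinkConfig;
    }

    // Empty result means the check passed; a present value is the error.
    Optional<String> validateRequiredKey(Map<String, String> deploymentConfig, String key) {
        boolean inDeployment = deploymentConfig != null && deploymentConfig.containsKey(key);
        boolean inDefaults = defaultFlinkConfig.containsKey(key);
        if (!inDeployment && !inDefaults) {
            return Optional.of("Missing required config: " + key);
        }
        return Optional.empty();
    }
}
```

The same check would have to run identically in the operator and in the webhook, since both load the same default config.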
[jira] [Commented] (FLINK-27005) Bump CRD version to v1alpha2
[ https://issues.apache.org/jira/browse/FLINK-27005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516455#comment-17516455 ] Gyula Fora commented on FLINK-27005: I am wondering whether we should keep the v1alpha1 classes. For stable versions this would be a requirement, but I think for alpha versions we can simply replace them (bump the version and "delete" the old one). This will avoid a lot of garbage accumulating in the repo for the alpha iterations. What do you think [~matyas]? > Bump CRD version to v1alpha2 > > > Key: FLINK-27005 > URL: https://issues.apache.org/jira/browse/FLINK-27005 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Blocker > > We should upgrade the CRD version to v1alpha2 for both FlinkDeployment and > FlinkSessionJob to avoid any conflicts with the preview release. > We should also upgrade the version in all examples and documentation that > references it. > We can also consider introducing some tooling to make this easier as we will > have to repeat this step at least a few more times.
[jira] [Commented] (FLINK-27028) Support to upload jar and run jar in RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-27028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516448#comment-17516448 ] Aitozi commented on FLINK-27028: cc [~wangyang0918] [~chesnay] If there is no objection, I'm willing to open a pull request for this. > Support to upload jar and run jar in RestClusterClient > -- > > Key: FLINK-27028 > URL: https://issues.apache.org/jira/browse/FLINK-27028 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Aitozi >Priority: Major > > The {{flink-kubernetes-operator}} is using the JarUpload + JarRun to support > the session job submission. However, currently the RestClusterClient does not > expose a way to upload the user jar to the session cluster or trigger the jar > run API. So a bare RestClient is used to achieve this, but it lacks the > common retry logic. > Can we expose these two APIs on the RestClusterClient to make it more > convenient to use in the operator?
[jira] [Updated] (FLINK-27028) Support to upload jar and run jar in RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-27028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aitozi updated FLINK-27028: --- Description: The {{flink-kubernetes-operator}} is using the JarUpload + JarRun to support the session job submission. However, currently the RestClusterClient does not expose a way to upload the user jar to the session cluster or trigger the jar run API. So a bare RestClient is used to achieve this, but it lacks the common retry logic. Can we expose these two APIs on the RestClusterClient to make it more convenient to use in the operator? was: The flink-kubernetes-operator is using the JarUpload + JarRun to support the session job management. However, currently the RestClusterClient does not expose a way to upload the user jar to the session cluster or trigger the jar run API, so a bare RestClient is used to achieve this. Can we expose these two APIs on the RestClusterClient to make it more convenient to use in the operator? > Support to upload jar and run jar in RestClusterClient > -- > > Key: FLINK-27028 > URL: https://issues.apache.org/jira/browse/FLINK-27028 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Aitozi >Priority: Major > > The {{flink-kubernetes-operator}} is using the JarUpload + JarRun to support > the session job submission. However, currently the RestClusterClient does not > expose a way to upload the user jar to the session cluster or trigger the jar > run API. So a bare RestClient is used to achieve this, but it lacks the > common retry logic. > Can we expose these two APIs on the RestClusterClient to make it more > convenient to use in the operator?
[jira] [Created] (FLINK-27028) Support to upload jar and run jar in RestClusterClient
Aitozi created FLINK-27028: -- Summary: Support to upload jar and run jar in RestClusterClient Key: FLINK-27028 URL: https://issues.apache.org/jira/browse/FLINK-27028 Project: Flink Issue Type: Improvement Components: Client / Job Submission Reporter: Aitozi The flink-kubernetes-operator is using the JarUpload + JarRun to support the session job management. However, currently the RestClusterClient does not expose a way to upload the user jar to the session cluster or trigger the jar run API, so a bare RestClient is used to achieve this. Can we expose these two APIs on the RestClusterClient to make it more convenient to use in the operator?
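The flow the ticket describes (upload via the JarUpload endpoint, then submit via JarRun) can be sketched with an in-memory stand-in for the REST calls. Everything below is illustrative: these methods do not exist on today's RestClusterClient, which is exactly what the ticket proposes to change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the two operations FLINK-27028 asks the
// RestClusterClient to expose. The map stands in for the session cluster's
// jar storage; a real implementation would issue the JarUpload/JarRun REST
// calls and reuse the client's common retry logic.
class JarSubmissionSketch {
    private final Map<String, String> uploadedJars = new HashMap<>();

    // Stand-in for the JarUpload endpoint: registers the jar, returns a jar id.
    String uploadJar(String jarPath) {
        String jarId = UUID.randomUUID() + "_" + jarPath;
        uploadedJars.put(jarId, jarPath);
        return jarId;
    }

    // Stand-in for the JarRun endpoint: submits a previously uploaded jar
    // and returns the id of the resulting job.
    String runJar(String jarId) {
        if (!uploadedJars.containsKey(jarId)) {
            throw new IllegalArgumentException("Unknown jar id: " + jarId);
        }
        return UUID.randomUUID().toString();
    }
}
```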
[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend
flinkbot edited a comment on pull request #19335: URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086 ## CI report: * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34149) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Commented] (FLINK-26845) Document testing guidelines
[ https://issues.apache.org/jira/browse/FLINK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516446#comment-17516446 ] Salva commented on FLINK-26845: --- Sounds good [~trohrmann], I could start looking at the different SDKs once the tickets are created. Also agree on having a separate one for the e2e part. > Document testing guidelines > --- > > Key: FLINK-26845 > URL: https://issues.apache.org/jira/browse/FLINK-26845 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Salva >Priority: Minor > > As of now, there seems to be little guidance on how to approach testing. > Although it's true [~sjwiesman] that > {quote}Unlike other flink apis, the sdk doesn't pull in the runtime so > testing should really look like any other code. There isn't much statefun > specific. > {quote} > I think that at the very least testing should be mentioned as part of the > docs, even if it is only to stress the above fact. Indeed, as reported by > [~trohrmann], it seems that there are already some test utilities in place, > but they are not well documented, plus there are some potential ideas on how > to improve on that front > {quote}Testing tools is definitely one area where we want to improve > significantly. Also the documentation for how to do things needs to be > updated. There are some testing utils that you can already use today: > [https://github.com/apache/flink-statefun/blob/master/statefun-sdk-java/src/main/java/org/apache/flink/statefun/sdk/java/testing/TestContext.java…|https://t.co/WGjyMA510b]. > However, it is not well documented. > {quote} > Once the overall guidelines are in place for different testing strategies, > tests could be added to the examples in the > {_}[playground|https://github.com/apache/flink-statefun-playground]{_}. > {_}Note{_}: Issue originally reported on Twitter: > [https://twitter.com/salvalcantara/status/1505834101026267136?s=20&t=Go2IHP6iP4ZmIyVLmIeD3g]
[GitHub] [flink] Myasuka commented on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend
Myasuka commented on pull request #19335: URL: https://github.com/apache/flink/pull/19335#issuecomment-1086779793 @flinkbot run azure
[jira] [Commented] (FLINK-27001) Support to specify the resource of the operator
[ https://issues.apache.org/jira/browse/FLINK-27001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516445#comment-17516445 ] Aitozi commented on FLINK-27001: It seems this can also be implemented via https://issues.apache.org/jira/browse/FLINK-26663, so I will not work on this right now; I will keep an eye on it. > Support to specify the resource of the operator > > > Key: FLINK-27001 > URL: https://issues.apache.org/jira/browse/FLINK-27001 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Aitozi >Priority: Major > > Support specifying the operator's resource requirements and limits
[jira] [Updated] (FLINK-27000) Support to set JVM args for operator
[ https://issues.apache.org/jira/browse/FLINK-27000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27000: --- Labels: pull-request-available (was: ) > Support to set JVM args for operator > > > Key: FLINK-27000 > URL: https://issues.apache.org/jira/browse/FLINK-27000 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Aitozi >Priority: Major > Labels: pull-request-available > > In production we often need to set JVM options for the operator
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841158223 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch Review comment: Let's spend more effort on polishing the user story. The current wording is a bit colloquial and not formal enough.
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157829 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot + +Table Store generates one or two snapshots per commit. To avoid too many snapshots +that create a lot of small files and redundant storage, Table Store write defaults +to eliminating expired snapshots, controlled by the following options: + + + + + Option + Required + Default + Type + Description + + + + + snapshot.time-retained + No + 1 h + Duration + The maximum time of completed snapshots to retain. + + + snapshot.num-retained + No + Integer.MAX_VALUE + Integer + The maximum number of completed snapshots to retain. + + + + +Please note that too short retain time or too small retain number may result in: +- Batch query cannot find the file. For example, the table is relatively large and + the batch query takes 10 minutes to read, but the snapshot from 10 minutes ago + expires, at which point the batch query will read a deleted snapshot. +- Continuous reading jobs on FileStore (Without Log System) fail to restart. At the + time of the job failover, the snapshot it recorded has expired. + +## Performance + +Table Store uses LSM data structure, which itself has the ability to support a large +number of updates. Update performance and query performance is a tradeoff, the +following parameters control this tradeoff: + + + + + Option + Required + Default + Type + Description + + + + + num-sorted-run.max + No + 5 + Integer + The max sorted run number. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run). + + + + +- The larger `num-sorted-run.max`: the less merge cost when updating data, which + can avoid many invalid merges. 
However, if this value is too large, more memory + will be needed when merging files, because each FileReader will take up a lot + of memory. +- Smaller `num-sorted-run.max`: better performance when querying, fewer files + will be merged. Review comment: Nit: Keep consistency - The larger `num-sorted-run.max`, the less merge cost when updating data, which can avoid many invalid merges. However, if this value is too large, more memory will be needed when merging files because each FileReader will take up a lot of memory. - The smaller `num-sorted-run.max`, the better performance
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157936 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot + +Table Store generates one or two snapshots per commit. To avoid too many snapshots +that create a lot of small files and redundant storage, Table Store write defaults +to eliminating expired snapshots, controlled by the following options: + + + + + Option + Required + Default + Type + Description + + + + + snapshot.time-retained + No + 1 h + Duration + The maximum time of completed snapshots to retain. + + + snapshot.num-retained + No + Integer.MAX_VALUE + Integer + The maximum number of completed snapshots to retain. + + + + +Please note that too short retain time or too small retain number may result in: +- Batch query cannot find the file. For example, the table is relatively large and + the batch query takes 10 minutes to read, but the snapshot from 10 minutes ago + expires, at which point the batch query will read a deleted snapshot. +- Continuous reading jobs on FileStore (Without Log System) fail to restart. At the + time of the job failover, the snapshot it recorded has expired. + +## Performance + +Table Store uses LSM data structure, which itself has the ability to support a large +number of updates. Update performance and query performance is a tradeoff, the +following parameters control this tradeoff: + + + + + Option + Required + Default + Type + Description + + + + + num-sorted-run.max + No + 5 + Integer + The max sorted run number. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run). + + + + +- The larger `num-sorted-run.max`: the less merge cost when updating data, which + can avoid many invalid merges. 
However, if this value is too large, more memory + will be needed when merging files, because each FileReader will take up a lot + of memory. +- Smaller `num-sorted-run.max`: better performance when querying, fewer files + will be merged. + +## Memory + +There are three main places in the Table Store's sink writer that take up memory: +- MemTable's write buffer, which is individually occupied by each partition, each + bucket, and this memory value can be adjustable by the `write-buffer-size` + option (default 64 MB). +- The memory consumed by compaction for reading files, it can be adj
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157612 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot + +Table Store generates one or two snapshots per commit. To avoid too many snapshots +that create a lot of small files and redundant storage, Table Store write defaults +to eliminating expired snapshots, controlled by the following options: Review comment: Table Store generates one or two snapshots per commit. To avoid too many snapshots that create a lot of small files and redundant storage, Table Store writes defaults to eliminate expired snapshots -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157424 ## File path: docs/content/docs/development/write-table.md ## @@ -0,0 +1,190 @@ +--- +title: "Write Table" +weight: 3 +type: docs +aliases: +- /development/write-table.html +--- + + +# Write Table + +```sql +INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name + [PARTITION part_spec] [column_list] select_statement + +part_spec: + (part_col_name1=val1 [, part_col_name2=val2, ...]) + +column_list: + (col_name1 [, column_name2, ...]) +``` + +## Unify Streaming and Batch + +Table Store data writing supports Flink's batch and streaming modes. +Moreover, Table Store supports simultaneous writing of streaming and batch: + +```sql +-- a managed table ddl +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + dt STRING +) PARTITIONED BY (dt); + +-- Run a stream job that continuously writes to the table +SET 'execution.runtime-mode' = 'streaming'; +INSERT INTO MyTable SELECT ... FROM MyCdcTable; +``` + +You have this partitioned table, and now there is a stream job that +is continuously writing data. + +After a few days, you find that there is a problem with yesterday's +partition data, the data in the partition is wrong, you need to +recalculate and revise the partition. + +```sql +-- Run a batch job to revise yesterday's partition +SET 'execution.runtime-mode' = 'batch'; +INSERT OVERWRITE MyTable PARTITION ('dt'='20220402') + SELECT ... FROM SourceTable WHERE dt = '20220402'; +``` + +This way you revise yesterday's partition without suspending the streaming job. + +{{< hint info >}} +__Note:__ Multiple jobs writing to a single partition at the same time is +not supported. The behavior does not result in data errors, but can lead +to job failover. +{{< /hint >}} + +## Parallelism + +It is recommended that the parallelism of sink should be less than or +equal to the number of buckets, preferably equal. 
You can control the +parallelism of the sink with the `sink.parallelism` option. + + + + + Option + Required + Default + Type + Description + + + + + sink.parallelism + No + (none) + Integer + Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator. + + + + +## Expire Snapshot Review comment: `Snapshot Expiration` or `Expiring Snapshot`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal edited a comment on pull request #18386: [FLINK-25684][table] Support enhanced `show databases` syntax
RocMarshal edited a comment on pull request #18386: URL: https://github.com/apache/flink/pull/18386#issuecomment-1086631358 @ZhangChaoming Thanks for the update. In general, it's recommended to push updates as a new commit, because squashing multiple commits into one makes it inconvenient for reviewers to start the next review across so many changed lines. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changelog of the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the users why they want to do something and then how to achieve the goal. For instance, removing the normalized node is not the intention. It is just a means, and the purpose is to see all changelog or manually handle du-dep, so we should first express why and then how. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19344: [hotfix] Modify spelling error in IOUtils.java
flinkbot edited a comment on pull request #19344: URL: https://github.com/apache/flink/pull/19344#issuecomment-1086770696 ## CI report: * 1f3f21220dc9ac5041c28da2e7a49db823addfbe Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34188) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changelog of the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the users why they want to do something and then how to achieve the goal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155997 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. Review comment: How about > By default, data is only visible after the checkpoint, which means that the streaming reading has transactional consistency. 
> If you want the data to be immediately visible, you need to set the following options Table Option Default Description `log.system` = `kafka` No You need to enable the log system because the FileStore's continuous consumption only provides checkpoint-based visibility. `log.consistency` = `eventual` No This means that writes are visible without using LogSystem's transaction mechanism. Note: All tables need to have the primary key defined because only then can the data be de-duplicated by the normalizing node of the downstream job. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
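The two options from the suggested table above might be combined in a DDL like this (a hedged sketch; the table name, columns, and primary-key clause are illustrative, not from the patch):

```sql
-- Sketch: eventual consistency over Kafka; writes become visible without
-- waiting for a checkpoint, so a primary key is needed for de-duplication
CREATE TABLE VisibleTable (
    user_id BIGINT,
    cnt BIGINT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'log.system' = 'kafka',
    'log.consistency' = 'eventual'
);
```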
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155997 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. Review comment: How about > By default, data is only visible after the checkpoint, which means that the streaming reading has transactional consistency. 
> If you want the data to be immediately visible, you need to set the following options Table Option Default Description `log.system` = `kafka` No You need to enable the log system because the FileStore's continuous consumption only provides checkpoint-based visibility. `log.consistency` = `eventual` No This means that writes are visible without using LogSystem's transaction mechanism. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #19344: [hotfix] Modify spelling error in IOUtils.java
flinkbot commented on pull request #19344: URL: https://github.com/apache/flink/pull/19344#issuecomment-1086770696 ## CI report: * 1f3f21220dc9ac5041c28da2e7a49db823addfbe UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r84110 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. + +If you want to remove downstream normalized node (It's costly) or see the all +changes of this table, you can configure: +- 'log.changelog-mode' = 'all' +- 'log.consistency' = 'transactional' (default) + +The query written to the table store must contain all message types with +UPDATE_BEFORE, otherwise the planner will throw an exception. Review comment: Nit: add a punctuation mark for `UPDATE_BEFORE` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
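The two settings this thread names for removing the costly downstream normalized node could be sketched as follows (illustrative only; the table name and schema are assumptions):

```sql
-- Sketch: emit the full changelog (including UPDATE_BEFORE) from the log,
-- so downstream consuming jobs need no normalized node
CREATE TABLE ChangelogTable (
    user_id BIGINT,
    cnt BIGINT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'log.changelog-mode' = 'all',
    'log.consistency' = 'transactional'  -- the default
);
```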
[GitHub] [flink] DefuLi opened a new pull request #19344: [hotfix] Modify spelling error in IOUtils.java
DefuLi opened a new pull request #19344: URL: https://github.com/apache/flink/pull/19344 ## What is the purpose of the change *The word "Premeture" is misspelled in the IOUtils.java file of flink-core.* ## Brief change log - *Modify spelling error in IOUtils.java file of flink-core.* ## Verifying this change *This change is a trivial rework / code cleanup without any test coverage.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changes to the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the users why they want to do something and then how to achieve the goal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841155419 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for +producing UPDATE_BEFORE message. Review comment: How about > By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. The downstream consuming job will have a normalized node, and it stores all processed key-value to produce the `UPDATE_BEFORE` message, which will bring extra overhead. If you want to handle the deduplication manually or require all changes to the table, you can remove this node by setting: - `log.changelog-mode`= `all` - `log.consistency` = `transactional` (default) --- P.S. I think the main difference between the original statement and the suggested change is **How** v.s. **Why**. We should first explain to the user why they want to do something and then how to achieve the goal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-27022) Expose taskSlots in TaskManager spec
[ https://issues.apache.org/jira/browse/FLINK-27022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516426#comment-17516426 ] Thomas Weise commented on FLINK-27022: -- I still think that this setting should not be repeated. While it is used everywhere I also think that all but the most trivial demo application will need to have a flinkConfiguration section and this is just the first setting of several. > Expose taskSlots in TaskManager spec > > > Key: FLINK-27022 > URL: https://issues.apache.org/jira/browse/FLINK-27022 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Major > > Basically every Flink job needs the number of taskslots configuration and it > is an important part of the resource spec. > We should include this in the taskmanager spec as an integer parameter. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841152998 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. + +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only +contains INSERT, UPDATE_AFTER, DELETE. No UPDATE_BEFORE. 
A normalized node is +generated in downstream consuming job, the node will store all key-value for Review comment: Nit: `the node will` can be simplified as `it will` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841152540 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: +- 'log.system' = 'kafka', you can't use the FileStore's continuous consumption + capability because the FileStore only provides checkpoint-based visibility. +- 'log.consistency' = 'eventual', this means that writes are visible without + using LogSystem's transaction mechanism. +- All tables need to have primary key defined, because only then can the + data be de-duplicated by normalize node of downstream job. 
+ +## Streaming Low Cost + +By default, for the table with primary key, the records in the table store only Review comment: By default, for the table with the primary key, the records in the table store only contain `INSERT`, `UPDATE_AFTER`, and `DELETE`. No `UPDATE_BEFORE`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841151631 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters +in the query, which will speed up the data skipping of the query. + +Supported filter functions are: +- `=` +- `<>` +- `<` +- `<=` +- `>` +- `>=` +- `in` +- starts with `like` + +## Streaming Real-time + +By default, data is only visible after the checkpoint, which means +that the streaming reading has transactional consistency. + +If you want the data to be immediately visible, you need to: Review comment: complete the sentence as > you need to set the following options -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841151482 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable; + +-- Streaming mode, read latest incremental +SET 'execution.runtime-mode' = 'streaming'; +SELECT * FROM MyTable /*+ OPTIONS ('log.scan'='latest') */; +``` + +## Query Optimization + +It is highly recommended taking partition and primary key filters Review comment: It is highly recommended to take partition and primary key filters along with the query, which will speed up the data skipping of the query. See [Ref](https://www.macmillandictionary.com/us/dictionary/american/recommend#:~:text=recommend%20doing%20something%3A,to%20read%20the%20following%20books.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841149095 ## File path: docs/content/docs/development/query-table.md ## @@ -0,0 +1,87 @@ +--- +title: "Query Table" +weight: 4 +type: docs +aliases: +- /development/query-table.html +--- + + +# Query Table + +The Table Store is streaming batch unified, you can read full +and incremental data depending on the runtime execution mode: + +```sql +-- Batch mode, read latest snapshot +SET 'execution.runtime-mode' = 'batch'; +SELECT * FROM MyTable; + +-- Streaming mode, read incremental snapshot, read the snapshot first, then read the increment Review comment: then read the incremental -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
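The reading order this comment refers to ("read the snapshot first, then read the incremental") can be sketched as a simple generator. This is purely illustrative and assumes nothing about Table Store internals — in streaming mode the source first serves the full snapshot, then switches to the incremental changes that follow it:

```python
def hybrid_read(snapshot, increments):
    # Sketch of the hybrid read order described above: emit the full
    # snapshot, then the incremental records, tagging each phase.
    for row in snapshot:
        yield ("snapshot", row)
    for row in increments:
        yield ("incremental", row)
```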
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841148673 ## File path: docs/content/docs/development/overview.md ## @@ -30,6 +30,25 @@ Flink Table Store is a unified streaming and batch store for building dynamic tables on Apache Flink. Flink Table Store serves as the storage engine behind Flink SQL Managed Table. +## Setup Table Store + +{{< hint info >}} +__Note:__ Table Store is only supported starting from Flink 1.15. +{{< /hint >}} + +You can get the bundle jar for the Table Store in one of the following ways: +- [Download the latest bundle jar](https://flink.apache.org/downloads.html) of + Flink Table Store. +- Build bundle jar from `flink-table-store-dist` in the source code. + +We have shaded all the dependencies in the package, so you don't have Review comment: `Flink Table Store has shaded` Ref: [Apache Flink Documentation Language Syle](https://flink.apache.org/contributing/docs-style.html) > Use you, never we. Using we can be confusing and patronizing to some users, giving the impression that “we are all members of a secret club and you didn’t get a membership invite”. Address the user as you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841148428 ## File path: docs/content/docs/development/overview.md ## @@ -30,6 +30,25 @@ Flink Table Store is a unified streaming and batch store for building dynamic tables on Apache Flink. Flink Table Store serves as the storage engine behind Flink SQL Managed Table. +## Setup Table Store + +{{< hint info >}} +__Note:__ Table Store is only supported starting from Flink 1.15. +{{< /hint >}} + +You can get the bundle jar for the Table Store in one of the following ways: +- [Download the latest bundle jar](https://flink.apache.org/downloads.html) of + Flink Table Store. +- Build bundle jar from `flink-table-store-dist` in the source code. Review comment: Build bundle jar under submodule`flink-table-store-dist` from source code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841147919 ## File path: docs/content/docs/development/overview.md ## @@ -30,6 +30,25 @@ Flink Table Store is a unified streaming and batch store for building dynamic tables on Apache Flink. Flink Table Store serves as the storage engine behind Flink SQL Managed Table. +## Setup Table Store + +{{< hint info >}} +__Note:__ Table Store is only supported starting from Flink 1.15. Review comment: How about > Table Store is only supported since Flink 1.15. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841147805 ## File path: docs/content/docs/development/create-table.md ## @@ -151,3 +151,85 @@ Creating a table will create the corresponding physical storage: - If `log.system` is configured as Kafka, a Topic named "${catalog_name}.${database_name}.${table_name}" will be created automatically when the table is created. + +## Distribution + +The data distribution of Table Store consists of three concepts: +Partition, Bucket, and Primary Key. + +```sql +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + behavior STRING, + dt STRING, + PRIMARY KEY (dt, user_id) NOT ENFORCED +) PARTITIONED BY (dt) WITH ( + 'bucket' = '4' +); +``` + +For example, the `MyTable` table above has its data distribution +in the following order: +- Partition: isolating different data based on partition fields. +- Bucket: Within a single partition, distributed into 4 different + buckets based on the hash value of the primary key. +- Primary key: Within a single bucket, sorted by primary key to + build LSM structure. + +## Partition + +Table Store adopts the same partitioning concept as Apache Hive to +separate data, and thus various operations can be managed by partition +as a management unit. + +Partitioned filtering is the most effective way to improve performance, +your query statements should contain partition filtering conditions +as much as possible. + +## Bucket + +The record is hashed into different buckets according to the +primary key or the whole row (without primary key). + +The number of buckets is very important as it determines the +worst-case maximum processing parallelism. But it should not be +too big, otherwise, the system will create a lot of small files. + +In general, the desired file size is 128 MB, the recommended data +to be kept on disk in each sub-bucket is about 1 GB. + +## Primary Key + +The primary key is unique and is indexed. 
+ +Flink Table Store imposes an ordering of data, which means the system +will sort the primary key within each bucket. All fields will be used +to sort if no primary key is defined. Using this feature, you can +achieve high performance by adding filter conditions on the primary key. + +The primary key's choice is critical, especially when setting the composite +primary key. A rule of thumb is to put the most frequently queried field in +the front. For example: + +```sql +CREATE TABLE MyTable ( + catalog_id BIGINT, + user_id BIGINT, + item_id BIGINT, + behavior STRING, + dt STRING, + .. +); +``` + +For this table, assuming that the composite primary keys are +the `catalog_id` and `user_id` fields, there are two ways to +set the primary key: +1. PRIMARY KEY (user_id, catalog_id) +2. PRIMARY KEY (catalog_id, user_id) + +The two methods do not behave in the same way when querying. +Use approach 1 if you have a large number of filtered queries +with only `user_id`, and use approach 2 if you have a large +number of filtered queries with only `catalog_id`. Review comment: Nit: approach 1 => approach one, approach 2 => approach two -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
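The bucketing rule quoted in this review ("distributed into 4 different buckets based on the hash value of the primary key") can be sketched in a few lines. The hash function below (`crc32` over a joined string) is an illustrative stand-in only — Table Store's actual hashing differs — and the function name is invented:

```python
import zlib

def assign_bucket(primary_key_values, num_buckets=4):
    # Hash the composite primary key into one of num_buckets buckets.
    # The join/crc32 scheme here is illustrative, not Table Store's.
    key = "|".join(str(v) for v in primary_key_values)
    return zlib.crc32(key.encode("utf-8")) % num_buckets
```

The same record always lands in the same bucket, which is what makes the bucket count the worst-case upper bound on read/write parallelism that the quoted doc describes.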
[jira] [Updated] (FLINK-26899) Introduce write/query table document for table store
[ https://issues.apache.org/jira/browse/FLINK-26899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-26899: --- Labels: pull-request-available (was: ) > Introduce write/query table document for table store > > > Key: FLINK-26899 > URL: https://issues.apache.org/jira/browse/FLINK-26899 > Project: Flink > Issue Type: Sub-task > Components: Table Store >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: table-store-0.1.0 > > > * Batch or streaming > * file store continuous and log store > * retention > * parallelism > * memory -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store
LadyForest commented on a change in pull request #72: URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841147565 ## File path: docs/content/docs/development/create-table.md ## @@ -151,3 +151,85 @@ Creating a table will create the corresponding physical storage: - If `log.system` is configured as Kafka, a Topic named "${catalog_name}.${database_name}.${table_name}" will be created automatically when the table is created. + +## Distribution + +The data distribution of Table Store consists of three concepts: +Partition, Bucket, and Primary Key. + +```sql +CREATE TABLE MyTable ( + user_id BIGINT, + item_id BIGINT, + behavior STRING, + dt STRING, + PRIMARY KEY (dt, user_id) NOT ENFORCED +) PARTITIONED BY (dt) WITH ( + 'bucket' = '4' +); +``` + +For example, the `MyTable` table above has its data distribution +in the following order: +- Partition: isolating different data based on partition fields. +- Bucket: Within a single partition, distributed into 4 different + buckets based on the hash value of the primary key. +- Primary key: Within a single bucket, sorted by primary key to + build LSM structure. + +## Partition + +Table Store adopts the same partitioning concept as Apache Hive to +separate data, and thus various operations can be managed by partition +as a management unit. + +Partitioned filtering is the most effective way to improve performance, +your query statements should contain partition filtering conditions +as much as possible. + +## Bucket + +The record is hashed into different buckets according to the +primary key or the whole row (without primary key). + +The number of buckets is very important as it determines the +worst-case maximum processing parallelism. But it should not be +too big, otherwise, the system will create a lot of small files. + +In general, the desired file size is 128 MB, the recommended data +to be kept on disk in each sub-bucket is about 1 GB. + +## Primary Key + +The primary key is unique and is indexed. 
Review comment: Nit: The primary key is unique and indexed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
flinkbot edited a comment on pull request #19343: URL: https://github.com/apache/flink/pull/19343#issuecomment-1086724313 ## CI report: * 52653837e78b17dcdc196b3c6562ccebf22fca17 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34184) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19263: [FLINK-21585][metrics] Add options for in-/excluding metrics
flinkbot edited a comment on pull request #19263: URL: https://github.com/apache/flink/pull/19263#issuecomment-1081065491 ## CI report: * 087457408aa0db8ed8d49e2c7da68b296c25142f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34183) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19342: [FLINK-27027] Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
flinkbot edited a comment on pull request #19342: URL: https://github.com/apache/flink/pull/19342#issuecomment-1086722113 ## CI report: * a051c979b72bac01401b1190096f29f30d21dadc Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34182) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-26524) Elasticsearch (v5.3.3) sink end-to-end test
[ https://issues.apache.org/jira/browse/FLINK-26524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-26524: --- Labels: auto-deprioritized-critical test-stability (was: stale-critical test-stability) Priority: Major (was: Critical) This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Elasticsearch (v5.3.3) sink end-to-end test > --- > > Key: FLINK-26524 > URL: https://issues.apache.org/jira/browse/FLINK-26524 > Project: Flink > Issue Type: Bug > Components: Connectors / ElasticSearch >Affects Versions: 1.14.3 >Reporter: Matthias Pohl >Priority: Major > Labels: auto-deprioritized-critical, test-stability > > e2e test {{Elasticsearch (v5.3.3) sink end-to-end test}} failed in [this > build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=32627&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=070ff179-953e-5bda-71fa-d6599415701c&l=16598] > on {{release-1.14}} probably because of the following stacktrace showing up > in the logs: > {code} > Mar 07 15:40:41 2022-03-07 15:40:40,336 WARN > org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to > trigger checkpoint 1 for job 3a2fd4c6fb03d5b20929a6f2b7131d82. 
(0 consecutive > failed attempts so far) > Mar 07 15:40:41 org.apache.flink.runtime.checkpoint.CheckpointException: > Checkpoint was declined (task is closing) > Mar 07 15:40:41 at > org.apache.flink.runtime.messages.checkpoint.SerializedCheckpointException.unwrap(SerializedCheckpointException.java:51) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveDeclineMessage(CheckpointCoordinator.java:988) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$declineCheckpoint$2(ExecutionGraphHandler.java:103) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_322] > Mar 07 15:40:41 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_322] > Mar 07 15:40:41 at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322] > Mar 07 15:40:41 Caused by: org.apache.flink.util.SerializedThrowable: Task > name with subtask : Source: Sequence Source (Deprecated) -> Flat Map -> Sink: > Unnamed (1/1)#0 Failure reason: Checkpoint was declined (task is closing) > Mar 07 15:40:41 at > org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1389) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1382) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > org.apache.flink.runtime.taskmanager.Task.triggerCheckpointBarrier(Task.java:1348) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at > 
org.apache.flink.runtime.taskexecutor.TaskExecutor.triggerCheckpoint(TaskExecutor.java:956) > ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT] > Mar 07 15:40:41 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) ~[?:1.8.0_322] > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25765) Kubernetes: flink's configmap and flink's actual config are out of sync
[ https://issues.apache.org/jira/browse/FLINK-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-25765: --- Labels: auto-deprioritized-major usability (was: stale-major usability) Priority: Minor (was: Major) This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Kubernetes: flink's configmap and flink's actual config are out of sync > --- > > Key: FLINK-25765 > URL: https://issues.apache.org/jira/browse/FLINK-25765 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.15.0 >Reporter: Niklas Semmler >Priority: Minor > Labels: auto-deprioritized-major, usability > > For kubernetes setups, Flink's configmap does not reflect the actual config. > Causes > # Config values are overridden by the environment variables in the docker > image (see FLINK-25764) > # Flink reads the config on start-up, but does not subscribe to changes > # Changes to the config map do not lead to restarts of the flink cluster > Effects > # Users cannot expect to understand Flink's config from the configmap > # TaskManager/JobManager started at different times may start with different > configs, if the user edits the configmap > Related to FLINK-21383. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26386) CassandraConnectorITCase.testCassandraTableSink failed on azure
[ https://issues.apache.org/jira/browse/FLINK-26386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-26386: --- Labels: auto-deprioritized-critical test-stability (was: stale-critical test-stability) Priority: Major (was: Critical) This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > CassandraConnectorITCase.testCassandraTableSink failed on azure > --- > > Key: FLINK-26386 > URL: https://issues.apache.org/jira/browse/FLINK-26386 > Project: Flink > Issue Type: Bug > Components: Connectors / Cassandra >Affects Versions: 1.15.0, 1.13.6 >Reporter: Yun Gao >Priority: Major > Labels: auto-deprioritized-critical, test-stability > > {code:java} > Feb 28 02:39:19 [ERROR] Tests run: 20, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 226.77 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase > Feb 28 02:39:19 [ERROR] > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.testCassandraTableSink > Time elapsed: 52.49 s <<< ERROR! 
> Feb 28 02:39:19 > com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: > Keyspace flink doesn't exist > Feb 28 02:39:19 at > com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:37) > Feb 28 02:39:19 at > com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException.copy(InvalidConfigurationInQueryException.java:27) > Feb 28 02:39:19 at > com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) > Feb 28 02:39:19 at > com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) > Feb 28 02:39:19 at > com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63) > Feb 28 02:39:19 at > com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39) > Feb 28 02:39:19 at > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.createTable(CassandraConnectorITCase.java:391) > Feb 28 02:39:19 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Feb 28 02:39:19 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Feb 28 02:39:19 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Feb 28 02:39:19 at java.lang.reflect.Method.invoke(Method.java:498) > Feb 28 02:39:19 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Feb 28 02:39:19 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Feb 28 02:39:19 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > Feb 28 02:39:19 at > 
org.apache.flink.testutils.junit.RetryRule$RetryOnExceptionStatement.evaluate(RetryRule.java:192) > Feb 28 02:39:19 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Feb 28 02:39:19 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Feb 28 02:39:19 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Feb 28 02:39:19 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Feb 28 02:39:19 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Feb 28 02:39:19 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Feb 28 02:39:19 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:2
[jira] [Updated] (FLINK-22955) lookup join filter push down result to mismatch function signature
[ https://issues.apache.org/jira/browse/FLINK-22955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-22955: --- Labels: auto-deprioritized-critical stale-major (was: auto-deprioritized-critical) I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issue has been marked as Major but is unassigned, and neither it nor its Sub-Tasks have been updated for 60 days. I have gone ahead and added a "stale-major" label to the issue. If this ticket is a Major, please either assign yourself or give an update. Afterwards, please remove the label, or in 7 days the issue will be deprioritized. > lookup join filter push down result to mismatch function signature > -- > > Key: FLINK-22955 > URL: https://issues.apache.org/jira/browse/FLINK-22955 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner > Affects Versions: 1.11.3, 1.13.1, 1.12.4 > Environment: Flink 1.13.1 > how to reproduce: patch file attached > Reporter: Cooper Luan > Priority: Major > Labels: auto-deprioritized-critical, stale-major > Attachments: > 0001-try-to-produce-lookup-join-filter-pushdown-expensive.patch > > > a SQL statement like this may result in a lookup function signature mismatch exception when the SQL is explained > {code:sql} > CREATE TEMPORARY VIEW v_vvv AS > SELECT * FROM MyTable AS T > JOIN LookupTableAsync1 FOR SYSTEM_TIME AS OF T.proctime AS D > ON T.a = D.id; > SELECT a,b,id,name > FROM v_vvv > WHERE age = 10;{code} > the lookup function is > {code:scala} > class AsyncTableFunction1 extends AsyncTableFunction[RowData] { > def eval(resultFuture: CompletableFuture[JCollection[RowData]], a: Integer): Unit = { > } > }{code} > exec plan is > {code:java} > LegacySink(name=[`default_catalog`.`default_database`.`appendSink1`], fields=[a, b, id, name]) > +- LookupJoin(table=[default_catalog.default_database.LookupTableAsync1], joinType=[InnerJoin], async=[true], lookup=[age=10, id=a], where=[(age = 10)], select=[a, b, id, name]) > +- Calc(select=[a, b]) > +- DataStreamScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime]) > {code} > the "lookup=[age=10, id=a]" results in a signature mismatch > > but if I add one more insert, it works well > {code:sql} > SELECT a,b,id,name > FROM v_vvv > WHERE age = 30 > {code} > exec plan is > {code:java} > == Optimized Execution Plan == > LookupJoin(table=[default_catalog.default_database.LookupTableAsync1], joinType=[InnerJoin], async=[true], lookup=[id=a], select=[a, b, c, proctime, rowtime, id, name, age, ts])(reuse_id=[1]) > +- DataStreamScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime]) > LegacySink(name=[`default_catalog`.`default_database`.`appendSink1`], fields=[a, b, id, name]) > +- Calc(select=[a, b, id, name], where=[(age = 10)]) > +- Reused(reference_id=[1]) > LegacySink(name=[`default_catalog`.`default_database`.`appendSink2`], fields=[a, b, id, name]) > +- Calc(select=[a, b, id, name], where=[(age = 30)]) > +- Reused(reference_id=[1]) > {code} > the LookupJoin node uses "lookup=[id=a]" (right), not "lookup=[age=10, id=a]" (wrong) > > so in the "multi insert" case the planner works great, while in the "single insert" case the planner throws an exception -- This message was sent by Atlassian Jira (v8.20.1#820001)
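[Editor's note] The signature mismatch above arises because the planner folds the pushed-down constant filter (age = 10) into the lookup keys, so it tries to resolve an eval overload taking both columns while the function only declares one keyed on id. The sketch below is hypothetical and not from the ticket; it assumes the planner passes keys in the lookup=[age=10, id=a] order, and merely illustrates why the single-argument signature no longer matches:

{code:scala}
import java.lang.{Integer => JInteger}
import java.util.{Collection => JCollection}
import java.util.concurrent.CompletableFuture

import org.apache.flink.table.data.RowData
import org.apache.flink.table.functions.AsyncTableFunction

class AsyncTableFunction1 extends AsyncTableFunction[RowData] {

  // Original lookup path: keyed on id only. This is the overload
  // the planner resolves when the plan shows lookup=[id=a].
  def eval(resultFuture: CompletableFuture[JCollection[RowData]],
           a: JInteger): Unit = {
    // ... perform the async lookup by id ...
  }

  // Hypothetical overload matching lookup=[age=10, id=a]: the
  // pushed-down constant arrives as an extra lookup key, so an
  // eval with both keys would be required for the plan to resolve.
  // (Key order is an assumption, not confirmed by the ticket.)
  def eval(resultFuture: CompletableFuture[JCollection[RowData]],
           age: JInteger,
           a: JInteger): Unit = {
    // ... perform the async lookup by (age, id) ...
  }
}
{code}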
[GitHub] [flink] flinkbot edited a comment on pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
flinkbot edited a comment on pull request #19343: URL: https://github.com/apache/flink/pull/19343#issuecomment-1086724313 ## CI report: * 52653837e78b17dcdc196b3c6562ccebf22fca17 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34184) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19263: [FLINK-21585][metrics] Add options for in-/excluding metrics
flinkbot edited a comment on pull request #19263: URL: https://github.com/apache/flink/pull/19263#issuecomment-1081065491 ## CI report: * 8eda65b9d6af64aafa22c855e8177f2eacbaf1d0 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34033) * 087457408aa0db8ed8d49e2c7da68b296c25142f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34183) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
flinkbot commented on pull request #19343: URL: https://github.com/apache/flink/pull/19343#issuecomment-1086724313 ## CI report: * 52653837e78b17dcdc196b3c6562ccebf22fca17 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19263: [FLINK-21585][metrics] Add options for in-/excluding metrics
flinkbot edited a comment on pull request #19263: URL: https://github.com/apache/flink/pull/19263#issuecomment-1081065491 ## CI report: * 8eda65b9d6af64aafa22c855e8177f2eacbaf1d0 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34033) * 087457408aa0db8ed8d49e2c7da68b296c25142f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19342: [FLINK-27027] Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
flinkbot edited a comment on pull request #19342: URL: https://github.com/apache/flink/pull/19342#issuecomment-1086722113 ## CI report: * a051c979b72bac01401b1190096f29f30d21dadc Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34182) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-27026) Upgrade checkstyle plugin
[ https://issues.apache.org/jira/browse/FLINK-27026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27026: --- Labels: pull-request-available (was: ) > Upgrade checkstyle plugin > - > > Key: FLINK-27026 > URL: https://issues.apache.org/jira/browse/FLINK-27026 > Project: Flink > Issue Type: Technical Debt > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Newer versions of the checkstyle plugin allow running checkstyle:check > without requiring dependency resolution. This allows it to be used in a fresh > environment. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] zentol opened a new pull request #19343: [FLINK-27026][build] Upgrade checkstyle plugin
zentol opened a new pull request #19343: URL: https://github.com/apache/flink/pull/19343 Newer versions no longer require dependency resolution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-27027) Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
[ https://issues.apache.org/jira/browse/FLINK-27027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27027: --- Labels: pull-request-available (was: ) > Get rid of oddly named mvn-${sys mvn.forkNumber}.log files > -- > > Key: FLINK-27027 > URL: https://issues.apache.org/jira/browse/FLINK-27027 > Project: Flink > Issue Type: Technical Debt > Components: Build System / CI >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot commented on pull request #19342: [FLINK-27027] Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
flinkbot commented on pull request #19342: URL: https://github.com/apache/flink/pull/19342#issuecomment-1086722113 ## CI report: * a051c979b72bac01401b1190096f29f30d21dadc UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-27027) Get rid of oddly named mvn-${sys mvn.forkNumber}.log files
Chesnay Schepler created FLINK-27027: Summary: Get rid of oddly named mvn-${sys mvn.forkNumber}.log files Key: FLINK-27027 URL: https://issues.apache.org/jira/browse/FLINK-27027 Project: Flink Issue Type: Technical Debt Components: Build System / CI Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.16.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27026) Upgrade checkstyle plugin
Chesnay Schepler created FLINK-27026: Summary: Upgrade checkstyle plugin Key: FLINK-27026 URL: https://issues.apache.org/jira/browse/FLINK-27026 Project: Flink Issue Type: Technical Debt Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.16.0 Newer versions of the checkstyle plugin allow running checkstyle:check without requiring dependency resolution. This allows it to be used in a fresh environment. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-27025) Cannot read parquet file, after putting the jar in the right place with right permissions
[ https://issues.apache.org/jira/browse/FLINK-27025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated FLINK-27025: Attachment: tpch-12-parquet.py > Cannot read parquet file, after putting the jar in the right place with right > permissions > - > > Key: FLINK-27025 > URL: https://issues.apache.org/jira/browse/FLINK-27025 > Project: Flink > Issue Type: Bug > Components: API / Python, Table SQL / API > Affects Versions: 1.14.0 > Reporter: Ziheng Wang > Priority: Blocker > Attachments: tpch-12-parquet.py > > > I am using Flink with the SQL API on AWS EMR. I can run queries on CSV files, no problem. > However, when I try to run queries on Parquet files, I get this error: Caused by: java.io.StreamCorruptedException: unexpected block data > I have put flink-sql-parquet_2.12-1.14.0.jar under /usr/lib/flink/lib on the master node of the EMR cluster. Indeed, it seems that Flink picks it up, because if the jar is not there the error is different (it says it can't understand the parquet source). The jar has full 777 permissions under the same username as all the other jars in that directory. > I tried passing a folder name as the Parquet source as well as a single Parquet file; nothing works. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-27025) Cannot read parquet file, after putting the jar in the right place with right permissions
Ziheng Wang created FLINK-27025: --- Summary: Cannot read parquet file, after putting the jar in the right place with right permissions Key: FLINK-27025 URL: https://issues.apache.org/jira/browse/FLINK-27025 Project: Flink Issue Type: Bug Components: API / Python, Table SQL / API Affects Versions: 1.14.0 Reporter: Ziheng Wang I am using Flink with the SQL API on AWS EMR. I can run queries on CSV files, no problem. However, when I try to run queries on Parquet files, I get this error: Caused by: java.io.StreamCorruptedException: unexpected block data. I have put flink-sql-parquet_2.12-1.14.0.jar under /usr/lib/flink/lib on the master node of the EMR cluster. Indeed, it seems that Flink picks it up, because if the jar is not there the error is different (it says it can't understand the parquet source). The jar has full 777 permissions under the same username as all the other jars in that directory. I tried passing a folder name as the Parquet source as well as a single Parquet file; nothing works. -- This message was sent by Atlassian Jira (v8.20.1#820001)
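[Editor's note] For context on the report above: in Flink 1.14 a Parquet-backed table is typically declared through the filesystem SQL connector, which requires flink-sql-parquet on the classpath (e.g. under /usr/lib/flink/lib, as the reporter did). The sketch below is hypothetical; the table name, schema, and path are invented for illustration and are not taken from the attached tpch-12-parquet.py:

{code:scala}
// Minimal sketch, assuming a TableEnvironment `tableEnv` is in scope.
// The 'path' may point at a single Parquet file or a directory of files.
tableEnv.executeSql(
  """
    |CREATE TABLE lineitem (
    |  l_orderkey BIGINT,
    |  l_shipdate DATE
    |) WITH (
    |  'connector' = 'filesystem',
    |  'path' = 's3://my-bucket/tpch/lineitem/',  -- hypothetical location
    |  'format' = 'parquet'
    |)
  """.stripMargin)

// Queries against the table then go through the Parquet format factory:
// tableEnv.executeSql("SELECT count(*) FROM lineitem").print()
{code}

A StreamCorruptedException at runtime (rather than a "can't understand parquet source" planning error) usually points at a classpath or serialization mismatch rather than at the DDL itself.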
[jira] [Closed] (FLINK-27024) Cleanup surefire configuration
[ https://issues.apache.org/jira/browse/FLINK-27024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-27024. Resolution: Fixed master: c41f169765646bf9dd1ac8264127e1fc9e399898 > Cleanup surefire configuration > -- > > Key: FLINK-27024 > URL: https://issues.apache.org/jira/browse/FLINK-27024 > Project: Flink > Issue Type: Technical Debt > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > We have a few redundant surefire configurations in some connector modules, > and overall a lot of duplication and stuff defined on the argLine which could > be systemEnvironmentVariables (which are easier to extend in sub-modules). -- This message was sent by Atlassian Jira (v8.20.1#820001)
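[Editor's note] To illustrate the argLine-versus-variables point from the description above: a value baked into a single argLine string must be repeated wholesale by any sub-module that wants to change one part of it, while individual entries under surefire's systemPropertyVariables or environmentVariables can be overridden one at a time. The fragment below is a hypothetical sketch, not the actual Flink pom:

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Before: everything packed into one argLine string; a sub-module
         overriding any flag has to restate the whole line. -->
    <argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber}</argLine>

    <!-- After: individual entries that a sub-module can extend or
         override one at a time. ${surefire.forkNumber} is surefire's
         built-in fork-number placeholder. -->
    <systemPropertyVariables>
      <mvn.forkNumber>${surefire.forkNumber}</mvn.forkNumber>
    </systemPropertyVariables>
    <environmentVariables>
      <MVN_FORK_NUMBER>${surefire.forkNumber}</MVN_FORK_NUMBER>
    </environmentVariables>
  </configuration>
</plugin>
{code}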
[GitHub] [flink] zentol merged pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
zentol merged pull request #19341: URL: https://github.com/apache/flink/pull/19341 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-26994) Merge libraries CI profile into core
[ https://issues.apache.org/jira/browse/FLINK-26994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-26994. Resolution: Fixed master: d17b0ba0487a421d1dc00c5264ef820908f54658 > Merge libraries CI profile into core > > > Key: FLINK-26994 > URL: https://issues.apache.org/jira/browse/FLINK-26994 > Project: Flink > Issue Type: Technical Debt > Components: Build System / CI >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > The libraries profile spends more time on setting up the environment than > actually running tests. Merge it with Core for more efficiency. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] zentol merged pull request #19337: [FLINK-26994][ci] Merge Core/Libraries CI profiles
zentol merged pull request #19337: URL: https://github.com/apache/flink/pull/19337 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
flinkbot edited a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1085919480 ## CI report: * 5cc15a539159eaeeb8a11961c46e755041bb7840 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34176) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
flinkbot edited a comment on pull request #19341: URL: https://github.com/apache/flink/pull/19341#issuecomment-1086648587 ## CI report: * 12bf7a182480d9677b2e7c50c08793665c99dd5f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34177) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
flinkbot edited a comment on pull request #19340: URL: https://github.com/apache/flink/pull/19340#issuecomment-1086644469 ## CI report: * 948190b21e39a82c9accade41cd1fe0c861a8184 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34175) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-web] gyfora commented on pull request #519: Add Kubernetes Operator 0.1.0 release
gyfora commented on pull request #519: URL: https://github.com/apache/flink-web/pull/519#issuecomment-1086667655 @mbalassi please build, review and merge if all good :) I have modified the blog post release date to 04/04 (Monday). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-27024) Cleanup surefire configuration
Chesnay Schepler created FLINK-27024: Summary: Cleanup surefire configuration Key: FLINK-27024 URL: https://issues.apache.org/jira/browse/FLINK-27024 Project: Flink Issue Type: Technical Debt Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.16.0 We have a few redundant surefire configurations in some connector modules, and overall a lot of duplication and stuff defined on the argLine which could be systemEnvironmentVariables (which are easier to extend in sub-modules). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot edited a comment on pull request #18386: [FLINK-25684][table] Support enhanced `show databases` syntax
flinkbot edited a comment on pull request #18386: URL: https://github.com/apache/flink/pull/18386#issuecomment-1015100174 ## CI report: * 91a6c605b4d6395e6bd01ce25ae6d0bec1146cae Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34173) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-26969) Write operation examples of tumbling window, sliding window, session window and count window based on pyflink
[ https://issues.apache.org/jira/browse/FLINK-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-26969. --- Fix Version/s: 1.16.0 Resolution: Fixed Merged to master via db23add96305e163328283859fb644af72350018 > Write operation examples of tumbling window, sliding window, session window > and count window based on pyflink > - > > Key: FLINK-26969 > URL: https://issues.apache.org/jira/browse/FLINK-26969 > Project: Flink > Issue Type: Improvement > Components: API / Python, Examples >Reporter: zhangjingcun >Assignee: zhangjingcun >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Write operation examples of tumbling window, sliding window, session window > and count window based on pyflink -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot edited a comment on pull request #19218: hive dialect supports select current database
flinkbot edited a comment on pull request #19218: URL: https://github.com/apache/flink/pull/19218#issuecomment-1077287637 ## CI report: * 959005953d10ba602eb1bad4ee2004179f9e8626 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34172) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] dianfu closed pull request #19328: [FLINK-26969][Examples] Write operation examples of tumbling window, sliding window, session window and count window based on pyflink
dianfu closed pull request #19328: URL: https://github.com/apache/flink/pull/19328 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-26540) Support handle join involving complex types in on condition
[ https://issues.apache.org/jira/browse/FLINK-26540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516320#comment-17516320 ] Jing Zhang edited comment on FLINK-26540 at 4/2/22 2:39 PM: merged in master: 0e459bf810736347ad7bf1eb62be4dad3382392e was (Author: qingru zhang): fixed in master: 0e459bf810736347ad7bf1eb62be4dad3382392e > Support handle join involving complex types in on condition > --- > > Key: FLINK-26540 > URL: https://issues.apache.org/jira/browse/FLINK-26540 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: luoyuxia > Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Currently, the Hive dialect can't handle a join involving complex types in the ON condition, which can be reproduced using the following code in HiveDialectITCase: > {code:java} > tableEnv.executeSql("CREATE TABLE test2a (a ARRAY<INT>)"); > tableEnv.executeSql("CREATE TABLE test2b (a INT)"); > List<Row> results = > CollectionUtil.iteratorToList( > tableEnv.executeSql( > "select * from test2b join test2a on test2b.a = test2a.a[1]") > .collect()); > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26540) Support handle join involving complex types in on condition
[ https://issues.apache.org/jira/browse/FLINK-26540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhang closed FLINK-26540. -- Resolution: Fixed > Support handle join involving complex types in on condition > --- > > Key: FLINK-26540 > URL: https://issues.apache.org/jira/browse/FLINK-26540 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: luoyuxia > Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Currently, the Hive dialect can't handle a join involving complex types in the ON condition, which can be reproduced using the following code in HiveDialectITCase: > {code:java} > tableEnv.executeSql("CREATE TABLE test2a (a ARRAY<INT>)"); > tableEnv.executeSql("CREATE TABLE test2b (a INT)"); > List<Row> results = > CollectionUtil.iteratorToList( > tableEnv.executeSql( > "select * from test2b join test2a on test2b.a = test2a.a[1]") > .collect()); > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26540) Support handle join involving complex types in on condition
[ https://issues.apache.org/jira/browse/FLINK-26540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516320#comment-17516320 ] Jing Zhang commented on FLINK-26540: fixed in master: 0e459bf810736347ad7bf1eb62be4dad3382392e > Support handle join involving complex types in on condition > --- > > Key: FLINK-26540 > URL: https://issues.apache.org/jira/browse/FLINK-26540 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: luoyuxia > Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > Currently, the Hive dialect can't handle a join involving complex types in the ON condition, which can be reproduced using the following code in HiveDialectITCase: > {code:java} > tableEnv.executeSql("CREATE TABLE test2a (a ARRAY<INT>)"); > tableEnv.executeSql("CREATE TABLE test2b (a INT)"); > List<Row> results = > CollectionUtil.iteratorToList( > tableEnv.executeSql( > "select * from test2b join test2a on test2b.a = test2a.a[1]") > .collect()); > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] beyond1920 closed pull request #19012: [FLINK-26540][hive] Support handle join involving complex types in on condition
beyond1920 closed pull request #19012: URL: https://github.com/apache/flink/pull/19012 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
flinkbot edited a comment on pull request #19341: URL: https://github.com/apache/flink/pull/19341#issuecomment-1086648587 ## CI report: * 12bf7a182480d9677b2e7c50c08793665c99dd5f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34177) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
flinkbot commented on pull request #19341: URL: https://github.com/apache/flink/pull/19341#issuecomment-1086648587 ## CI report: * 12bf7a182480d9677b2e7c50c08793665c99dd5f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-27011) Add state access examples of Python DataStream API
[ https://issues.apache.org/jira/browse/FLINK-27011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516315#comment-17516315 ] jay li commented on FLINK-27011: [~dianfu] Cool, it looks like this ticket has already been assigned. > Add state access examples of Python DataStream API > -- > > Key: FLINK-27011 > URL: https://issues.apache.org/jira/browse/FLINK-27011 > Project: Flink > Issue Type: Improvement > Components: API / Python, Examples > Reporter: zhangjingcun > Assignee: zhangjingcun > Priority: Major > > Add various types of state usage examples -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26512) Document debugging and manual error handling
[ https://issues.apache.org/jira/browse/FLINK-26512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-26512: --- Fix Version/s: kubernetes-operator-1.0.0 (was: kubernetes-operator-0.1.0) > Document debugging and manual error handling > > > Key: FLINK-26512 > URL: https://issues.apache.org/jira/browse/FLINK-26512 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Major > Fix For: kubernetes-operator-1.0.0 > > > We should document how users can discover errors that happen during job > deployments, such as which k8s logs to check etc. > We should also highlight limitation of the operator error handling and > document how users can manually recover jobs (for example: delete -> manually > restart with initial savepoint) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26611) Document operator config options
[ https://issues.apache.org/jira/browse/FLINK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-26611. -- Resolution: Fixed Merged main: ffe0d55613764a9dd768cc20f5d96da8154c0865 release-0.1: 01d717e63ed02747a79808558e5aa75a2b1cb4f3 > Document operator config options > > > Key: FLINK-26611 > URL: https://issues.apache.org/jira/browse/FLINK-26611 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gyula Fora >Assignee: Biao Geng >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Similar to how other flink configs are documented we should also document the > configs found in OperatorConfigOptions. > Generating the documentation might be possible but maybe it's easier to do it > manually. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] flinkbot edited a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
flinkbot edited a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1085919480 ## CI report: * Unknown: [CANCELED](TBD) * 5cc15a539159eaeeb8a11961c46e755041bb7840 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34176) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-27024) Cleanup surefire configuration
[ https://issues.apache.org/jira/browse/FLINK-27024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-27024: --- Labels: pull-request-available (was: ) > Cleanup surefire configuration > -- > > Key: FLINK-27024 > URL: https://issues.apache.org/jira/browse/FLINK-27024 > Project: Flink > Issue Type: Technical Debt > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.16.0 > > > We have a few redundant surefire configurations in some connector modules, > and overall a lot of duplication and stuff defined on the argLine which could > be systemEnvironmentVariables (which are easier to extend in sub-modules). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [flink] zentol opened a new pull request #19341: [FLINK-27024][build] Cleanup surefire configuration
zentol opened a new pull request #19341: URL: https://github.com/apache/flink/pull/19341 Removes unnecessary special surefire configurations and models as much as possible as environment variables to make things easier to extend.
[GitHub] [flink] flinkbot edited a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
flinkbot edited a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1085919480 ## CI report: * Unknown: [CANCELED](TBD) * 5cc15a539159eaeeb8a11961c46e755041bb7840 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #19315: [FLINK-27013][hive] Hive dialect supports is_distinct_from
flinkbot edited a comment on pull request #19315: URL: https://github.com/apache/flink/pull/19315#issuecomment-1085386734 ## CI report: * 5c2e924a5bf679bf9b54cd00b1ec1d1e5116048e UNKNOWN * 549983475a45a52f5917acacb4e9f00ad7687a49 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34170)
[GitHub] [flink] jay-li-csck commented on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
jay-li-csck commented on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1086646200 @flinkbot run azure
[GitHub] [flink] jay-li-csck removed a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
jay-li-csck removed a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1086597006 @flinkbot run azure
[GitHub] [flink] jay-li-csck removed a comment on pull request #19330: [FLINK-14998] [FileSystems] Remove FileUtils#deletePathIfEmpty
jay-li-csck removed a comment on pull request #19330: URL: https://github.com/apache/flink/pull/19330#issuecomment-1086608452 @flinkbot run azure
[GitHub] [flink] flinkbot edited a comment on pull request #19331: [FLINK-26985][runtime] Don't discard shared state of restored checkpoints
flinkbot edited a comment on pull request #19331: URL: https://github.com/apache/flink/pull/19331#issuecomment-108520 ## CI report: * 3d8a33dad195ea7303270333d05f5449a5ea71bf Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34171)
[GitHub] [flink] flinkbot edited a comment on pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
flinkbot edited a comment on pull request #19340: URL: https://github.com/apache/flink/pull/19340#issuecomment-1086644469 ## CI report: * 948190b21e39a82c9accade41cd1fe0c861a8184 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34175)
[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework
lindong28 commented on a change in pull request #71: URL: https://github.com/apache/flink-ml/pull/71#discussion_r841078416 ## File path: flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/data/DataGenerator.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.benchmark.data; + +import org.apache.flink.ml.common.param.HasSeed; +import org.apache.flink.table.api.Table; +import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; + +/** Interface for generating data as table arrays. */ +public interface DataGenerator> extends HasSeed { Review comment: After thinking about this more, I think it might be simpler to just remove `DataGeneratorParams` and define parameters directly in `DataGenerator`.
[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework
lindong28 commented on a change in pull request #71: URL: https://github.com/apache/flink-ml/pull/71#discussion_r841074926 ## File path: flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/data/InputDataGeneratorParams.java ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.benchmark.data; + +import org.apache.flink.ml.param.LongParam; +import org.apache.flink.ml.param.Param; +import org.apache.flink.ml.param.ParamValidators; +import org.apache.flink.ml.param.StringArrayParam; + +/** Interface for the input data generator params. */ +public interface InputDataGeneratorParams extends DataGeneratorParams { Review comment: Would it be simpler to remove this class and define parameters directly in `InputDataGenerator`? By doing this, all subclasses of `InputDataGenerator`, such as `DenseVectorArrayGenerator` and `DenseVectorGenerator`, would inherit those parameters. Note that we have dedicated parameter classes for algorithms because an algorithm typically has a pair of estimator and model, where the estimator and model share some but not all parameters. Data generators do not have this problem. 
Similarly, it might be simpler to remove the generator-specific parameter classes such as `DenseVectorArrayGeneratorParams` and `DenseVectorGeneratorParams`. We can still define non-generator-specific parameters in dedicated classes such as `HasArraySize`. ## File path: flink-ml-benchmark/src/test/java/org/apache/flink/ml/benchmark/BenchmarkTest.java ## @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.ml.benchmark; + +import org.apache.flink.ml.benchmark.data.InputDataGenerator; +import org.apache.flink.ml.benchmark.data.common.DenseVectorGenerator; +import org.apache.flink.ml.clustering.kmeans.KMeans; +import org.apache.flink.ml.clustering.kmeans.KMeansModel; +import org.apache.flink.ml.param.WithParams; +import org.apache.flink.ml.util.ReadWriteUtils; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; +import org.apache.flink.test.util.AbstractTestBase; + +import org.apache.commons.io.FileUtils; +import org.junit.After; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.ByteArrayOutputStream; +import java.io.File; +import java.io.InputStream; +import java.io.PrintStream; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import static org.apache.flink.ml.util.ReadWriteUtils.OBJECT_MAPPER; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +/** Tests benchmarks. */ +@SuppressWarnings("unchecked") +public class BenchmarkTest extends AbstractTestBase { +@Rule public final TemporaryFolder tempFolder = new TemporaryFolder(); + +private static final String exampleConfigFile = "kmeansmodel-benchmark.json"; +private static final String expectedBenchmarkName = "KMeansModel-1"; + +private final ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); +private final PrintStream originalOut = System.out; + +@Before
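The refactoring suggested in the review thread above — declaring parameters directly on the generator interface so every concrete generator inherits them, instead of keeping a dedicated `*Params` class — can be sketched with plain Java default methods. This is a simplified, hypothetical stand-in for Flink ML's `WithParams`/`Param` machinery (the map-backed `WithParams` and the `numValues` parameter here are illustrative, not the actual API):

```java
import java.util.HashMap;
import java.util.Map;

public class ParamInheritanceSketch {
    // Simplified stand-in for Flink ML's WithParams: a shared parameter map
    // with default getter/setter helpers.
    interface WithParams {
        Map<String, Object> paramMap();

        default void set(String key, Object value) {
            paramMap().put(key, value);
        }
    }

    // Parameters defined directly on the generator interface...
    interface InputDataGenerator extends WithParams {
        default void setNumValues(long n) {
            set("numValues", n);
        }

        default long getNumValues() {
            return (long) paramMap().getOrDefault("numValues", 0L);
        }
    }

    // ...are inherited by every concrete generator without a dedicated
    // InputDataGeneratorParams class.
    static class DenseVectorGenerator implements InputDataGenerator {
        private final Map<String, Object> params = new HashMap<>();

        @Override
        public Map<String, Object> paramMap() {
            return params;
        }
    }

    public static void main(String[] args) {
        DenseVectorGenerator gen = new DenseVectorGenerator();
        gen.setNumValues(1000L); // inherited from InputDataGenerator
        System.out.println(gen.getNumValues()); // prints 1000
    }
}
```

As the comment notes, algorithms keep dedicated parameter classes because an estimator and its model share only some parameters; generators have no such split, so the interface itself can own the parameters.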
[GitHub] [flink] flinkbot commented on pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
flinkbot commented on pull request #19340: URL: https://github.com/apache/flink/pull/19340#issuecomment-1086644469 ## CI report: * 948190b21e39a82c9accade41cd1fe0c861a8184 UNKNOWN
[GitHub] [flink] snuyanzin opened a new pull request #19340: [FLINK-26961][BP-1.14][connectors][filesystems][formats] Update Jackson Databi…
snuyanzin opened a new pull request #19340: URL: https://github.com/apache/flink/pull/19340 This is a 1.14 backport of PR https://github.com/apache/flink/pull/19303
[GitHub] [flink] flinkbot edited a comment on pull request #19328: [FLINK-26969][Examples] Write operation examples of tumbling window, sliding window, session window and count window based on pyflink
flinkbot edited a comment on pull request #19328: URL: https://github.com/apache/flink/pull/19328#issuecomment-1085814902 ## CI report: * b4c99e946189a0f96d61260557ed042431b14480 UNKNOWN * 49ce90f0906fc05b69af7fa969aee682d69121d4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34168)
[GitHub] [flink] flinkbot edited a comment on pull request #19339: [FLINK-26961][BP-1.15][connectors][filesystems][formats] Update Jackson Databi…
flinkbot edited a comment on pull request #19339: URL: https://github.com/apache/flink/pull/19339#issuecomment-1086609854 ## CI report: * 4e4dd3279b26e140bfef2bba76a57369000aa044 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34169)