[GitHub] [flink] flinkbot edited a comment on pull request #17921: [FLINK-23798][state] Avoid using reflection to get filter when partition filter is enable

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17921:
URL: https://github.com/apache/flink/pull/17921#issuecomment-979737381


   
   ## CI report:
   
   * 80213498bbe8a905b5fcd2ee0077b3c59853b443 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27105)
 
   * 9c6dcb1aafe9c2062f809f4e673a8af77b17a23b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] snuyanzin opened a new pull request #17922: [FLINK-24908][Table SQL/Client] Improve SQL Error description for SQL Client

2021-11-25 Thread GitBox


snuyanzin opened a new pull request #17922:
URL: https://github.com/apache/flink/pull/17922


   ## What is the purpose of the change
   Some queries contain misprints, for example:
   ```sql
   ELECT  1;
   /*
   comment
   */ SEECT 1;
   ```
   Currently the SQL Client responds to such queries with `Non-query expression 
encountered in illegal context`. For larger queries it can be time-consuming 
to understand what is wrong.
   
   The PR aims to provide more info for such queries, such as the exact unknown 
token and its start position, which helps detect issues faster. In fact, this 
info is already present in the `pos` field of `SqlParseException`.
   
   ## Brief change log
   In case of a `SqlParseException` with the message `Non-query expression 
encountered in illegal context`, extract and show the `pos` info.
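   A minimal sketch of the idea, assuming Calcite's `SqlParseException` /
`SqlParserPos` API; the helper class and method names are illustrative, not
the actual change:
   ```java
   import org.apache.calcite.sql.parser.SqlParseException;
   import org.apache.calcite.sql.parser.SqlParserPos;

   final class ParseErrorFormatter {
       // Illustrative helper: append the parse position to the error message.
       static String describe(SqlParseException e) {
           SqlParserPos pos = e.getPos(); // start position of the offending token
           if (pos == null) {
               return e.getMessage();
           }
           return String.format(
                   "%s (at line %d, column %d)",
                   e.getMessage(), pos.getLineNum(), pos.getColumnNum());
       }
   }
   ```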
   
   ## Verifying this change
   Tests were added to 
_flink-table/flink-sql-client/src/test/resources/sql/select.q_.

   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   






[GitHub] [flink] AHeise merged pull request #17881: [FLINK-24971][tests] Adding retry mechanism in case git clone fails

2021-11-25 Thread GitBox


AHeise merged pull request #17881:
URL: https://github.com/apache/flink/pull/17881


   






[GitHub] [flink] shenzhu commented on pull request #17788: [FLINK-15826][Tabel SQL/API] Add renameFunction() to Catalog

2021-11-25 Thread GitBox


shenzhu commented on pull request #17788:
URL: https://github.com/apache/flink/pull/17788#issuecomment-979761896


   Hey @JingsongLi , thanks for assigning the ticket to me! Would you mind 
taking a look at this PR when you have a moment?






[GitHub] [flink] xiangqiao123 commented on a change in pull request #17826: [FLINK-24953][Connectors / Hive]Optime hive parallelism inference

2021-11-25 Thread GitBox


xiangqiao123 commented on a change in pull request #17826:
URL: https://github.com/apache/flink/pull/17826#discussion_r757281586



##
File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveParallelismInference.java
##
@@ -63,10 +63,10 @@ int limit(Long limit) {
             return parallelism;
         }
 
-        if (limit != null) {
-            parallelism = Math.min(parallelism, (int) (limit / 1000));
+        if (!infer || limit == null) {

Review comment:
   @luoyuxia Thank you for your review.
   > I mean If set table.exec.hive.infer-source-parallelism = false, it will 
always return the table.exec.resource.default-parallelism configured by user.
   > And if it's -1 , it will fallback to the parallelism of 
StreamExecutionEnvironment.
   
   If `table.exec.hive.infer-source-parallelism` is set to `false` and 
`table.exec.resource.default-parallelism` is not set by the user (so it keeps 
its default value of -1), the current logic of the method `int limit(Long limit)` 
returns 1 and will not fall back to the parallelism of the StreamExecutionEnvironment.
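   For illustration, the scenario above expressed as SQL Client settings (the
quoting style of `SET` varies across Flink versions):
   ```sql
   -- Reproduce the scenario discussed above (illustrative):
   SET 'table.exec.hive.infer-source-parallelism' = 'false';
   -- 'table.exec.resource.default-parallelism' is left unset, so it stays -1.
   ```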
   








[GitHub] [flink] Tartarus0zm commented on a change in pull request #17826: [FLINK-24953][Connectors / Hive]Optime hive parallelism inference

2021-11-25 Thread GitBox


Tartarus0zm commented on a change in pull request #17826:
URL: https://github.com/apache/flink/pull/17826#discussion_r757279531



##
File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveParallelismInference.java
##
@@ -63,10 +63,10 @@ int limit(Long limit) {
             return parallelism;
         }
 
-        if (limit != null) {
-            parallelism = Math.min(parallelism, (int) (limit / 1000));
+        if (!infer || limit == null) {

Review comment:
   I think it is better to only add `if (!infer && limit == null) { return parallelism; }`,
   so that the `limit` still works;
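   A hedged sketch of how the suggested guard could sit in context; the
surrounding method body is reconstructed from the diff above, not taken from
the actual file:
   ```java
   // Illustrative reconstruction, not the actual HiveParallelismInference source.
   class ParallelismLimiter {
       private final boolean infer; // table.exec.hive.infer-source-parallelism
       private int parallelism;     // configured or previously inferred value

       ParallelismLimiter(boolean infer, int parallelism) {
           this.infer = infer;
           this.parallelism = parallelism;
       }

       int limit(Long limit) {
           // Suggested guard: bail out only when inference is disabled AND no
           // LIMIT exists, so a LIMIT can still cap the parallelism below.
           if (!infer && limit == null) {
               return parallelism;
           }
           if (limit != null) {
               // Cap parallelism at roughly 1000 rows per subtask.
               parallelism = Math.min(parallelism, (int) (limit / 1000));
           }
           return parallelism;
       }
   }
   ```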








[jira] [Created] (FLINK-25068) Show the maximum parallelism (number of key groups) of a job in Web UI

2021-11-25 Thread zlzhang0122 (Jira)
zlzhang0122 created FLINK-25068:
---

 Summary: Show the maximum parallelism (number of key groups) of a 
job in Web UI
 Key: FLINK-25068
 URL: https://issues.apache.org/jira/browse/FLINK-25068
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.14.0
Reporter: zlzhang0122


Flink uses the maximum parallelism as the number of key groups for distributing 
keys. The maximum parallelism can be set manually, or Flink will set it up 
automatically. The value is sometimes useful to know, so maybe we can expose it 
in the Web UI.

By doing this, we can easily know up to which parallelism we can suggest the 
user scale when they are facing a lack of throughput, and we can tell which 
subtask will process a particular key value, which makes it easier to find the 
relevant log.
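
For context, this is roughly how a key maps to a key group and then to a
subtask, sketched with Flink's KeyGroupRangeAssignment utility (the concrete
numbers are examples):

```java
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupLookup {
    public static void main(String[] args) {
        int maxParallelism = 128; // the number of key groups
        int parallelism = 4;      // the operator's current parallelism

        // Which key group does a key land in, and which subtask owns that group?
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup("user-42", maxParallelism);
        int subtask =
                KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
                        maxParallelism, parallelism, keyGroup);

        System.out.printf("key group %d -> subtask %d%n", keyGroup, subtask);
    }
}
```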



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] yunfengzhou-hub commented on a change in pull request #24: [Flink 24557] - Add knn algorithm to flink-ml

2021-11-25 Thread GitBox


yunfengzhou-hub commented on a change in pull request #24:
URL: https://github.com/apache/flink-ml/pull/24#discussion_r757264541



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/knn/KnnParams.java
##
@@ -0,0 +1,18 @@
+package org.apache.flink.ml.classification.knn;
+
+import org.apache.flink.ml.common.param.HasFeatureColsDefaultAsNull;
+import org.apache.flink.ml.common.param.HasK;
+import org.apache.flink.ml.common.param.HasLabelCol;
+import org.apache.flink.ml.common.param.HasPredictionCol;
+import org.apache.flink.ml.common.param.HasVectorColDefaultAsNull;
+import org.apache.flink.ml.param.WithParams;
+
+/** knn parameters. */
+public interface KnnParams<T>
+        extends WithParams<T>,

Review comment:
   It seems that `KnnParams` does not need to directly inherit from 
`WithParams`.
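   A sketch of the point: if the `Has*` mix-in interfaces already extend
`WithParams<T>`, the composite interface inherits it transitively (shapes are
illustrative, mirroring the flink-ml param mix-in pattern):
   ```java
   interface WithParams<T> {}

   interface HasK<T> extends WithParams<T> {}

   interface HasLabelCol<T> extends WithParams<T> {}

   // WithParams<T> need not be listed explicitly; it is inherited transitively.
   interface KnnParams<T> extends HasK<T>, HasLabelCol<T> {}
   ```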

##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/knn/KnnModel.java
##
@@ -0,0 +1,493 @@
+package org.apache.flink.ml.classification.knn;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.connector.source.Source;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.connector.file.sink.FileSink;
+import org.apache.flink.connector.file.src.FileSource;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.ml.api.core.Model;
+import org.apache.flink.ml.common.broadcast.BroadcastUtils;
+import org.apache.flink.ml.linalg.DenseMatrix;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.VectorUtils;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.shaded.curator4.com.google.common.base.Preconditions;
+import org.apache.flink.shaded.curator4.com.google.common.collect.ImmutableMap;
+import 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonProcessingException;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import 
org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;
+import 
org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;
+import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.Schema;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.table.catalog.ResolvedSchema;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.utils.LogicalTypeParser;
+import org.apache.flink.table.types.utils.LogicalTypeDataTypeConverter;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.PriorityQueue;
+import java.util.TreeMap;
+import java.util.function.Function;
+
+/** Knn classification model fitted by estimator. */
+public class KnnModel implements Model<KnnModel>, KnnParams<KnnModel> {
+protected Map<Param<?>, Object> params = new HashMap<>();
+private Table[] modelData;
+
+/** constructor. */
+public KnnModel() {
+ParamUtils.initializeMapWithDefaultValues(params, this);
+}
+
+/**
+ * constructor.
+ *
+ * @param params parameters for algorithm.
+ */
+public KnnModel(Map<Param<?>, Object> params) {
+this.params = params;
+}
+
+/**
+ * Set model data for knn prediction.
+ *
+ * @param modelData knn model.
+ * @return knn classification model.
+ */
+@Override
+public KnnModel setModelData(Table... modelData) {
+this.modelData = modelData;
+return this;
+}
+
+/**
+ * get model data.
+ *
+ * @return list of tables.
+ */
+@Override
+public Table[] getModelData() {
+return modelData;
+}
+
+/**
+ * @param inputs a list of tables.
+ * @return result.
+ */
+@Override
+public Table[] transform(Table... inputs) {
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) 
inputs[0]).getTableEnvironment();
+DataStream<Row> input = tEnv.toDataStream(inputs[0]);
+DataStream<Row> model = tEnv.toDataStream(modelData[0]);
+final String BROADCAST_STR = 

[GitHub] [flink] flinkbot edited a comment on pull request #17761: [FLINK-24862][Connectors / Hive]Fix user-defined hive udaf/udtf cannot be used normally in hive dialect

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17761:
URL: https://github.com/apache/flink/pull/17761#issuecomment-965943227


   
   ## CI report:
   
   * a321c9ac3866e238f9b1b7198c5b73def28d4d3f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27102)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] banmoy commented on pull request #17921: [FLINK-23798][state] Avoid using reflection to get filter when partition filter is enable

2021-11-25 Thread GitBox


banmoy commented on pull request #17921:
URL: https://github.com/apache/flink/pull/17921#issuecomment-979737934


   @Myasuka Please help to review this PR.






[GitHub] [flink] flinkbot commented on pull request #17921: [FLINK-23798][state] Avoid using reflection to get filter when partition filter is enable

2021-11-25 Thread GitBox


flinkbot commented on pull request #17921:
URL: https://github.com/apache/flink/pull/17921#issuecomment-979737516


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 80213498bbe8a905b5fcd2ee0077b3c59853b443 (Fri Nov 26 
06:59:01 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   






[GitHub] [flink] flinkbot commented on pull request #17921: [FLINK-23798][state] Avoid using reflection to get filter when partition filter is enable

2021-11-25 Thread GitBox


flinkbot commented on pull request #17921:
URL: https://github.com/apache/flink/pull/17921#issuecomment-979737381


   
   ## CI report:
   
   * 80213498bbe8a905b5fcd2ee0077b3c59853b443 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Updated] (FLINK-23798) Avoid using reflection to get filter when partition filter is enabled

2021-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-23798:
---
Labels: pull-request-available  (was: )

> Avoid using reflection to get filter when partition filter is enabled
> -
>
> Key: FLINK-23798
> URL: https://issues.apache.org/jira/browse/FLINK-23798
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Yun Tang
>Assignee: PengFei Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> FLINK-20496 introduced partitioned index & filter to Flink. However, RocksDB 
> only supports the new full format of filter for this feature, so we need to 
> replace the previous filter if the user enables it. [The previous implementation 
> uses reflection to get the 
> filter|https://github.com/apache/flink/blob/7ff4cbdc25aa971dccaf5ce02aaf46dc1e7345cc/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBResourceContainer.java#L251-L258],
>  and we could use the API to get it after upgrading to a newer version.





[GitHub] [flink] banmoy opened a new pull request #17921: [FLINK-23798][state] Avoid using reflection to get filter when partition filter is enable

2021-11-25 Thread GitBox


banmoy opened a new pull request #17921:
URL: https://github.com/apache/flink/pull/17921


   
   
   ## What is the purpose of the change
   
   FLINK-20496 introduced partitioned index & filter to Flink. However, RocksDB 
only supports the new full format of filter for this feature, so we need to 
replace the previous filter if the user enables it. The previous implementation 
used reflection to get the filter; we can use the API to get it after upgrading 
RocksDB to `6.20.3-ververica-1.0`.
   
   ## Brief change log
   
 - Use `BlockBasedTableConfig#filterPolicy()` to get filter in 
`RocksDBResourceContainer#overwriteFilterIfExist()`
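
   A rough before/after sketch (simplified; the reflected field name is
hypothetical, and only `BlockBasedTableConfig#filterPolicy()` comes from the
change itself):
   ```java
   import org.rocksdb.BlockBasedTableConfig;
   import org.rocksdb.Filter;

   final class FilterAccess {
       // Before: fish the filter out via reflection, because the bundled
       // RocksDB exposed no getter (field name is hypothetical).
       static Filter viaReflection(BlockBasedTableConfig tableConfig) throws Exception {
           java.lang.reflect.Field field =
                   BlockBasedTableConfig.class.getDeclaredField("filterPolicy");
           field.setAccessible(true);
           return (Filter) field.get(tableConfig);
       }

       // After: the upgraded RocksDB exposes the getter directly.
       static Filter viaApi(BlockBasedTableConfig tableConfig) {
           return tableConfig.filterPolicy();
       }
   }
   ```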
   
   ## Verifying this change
   
   This change is a trivial rework, and already covered by existing tests in 
`RocksDBResourceContainerTest#testGetColumnFamilyOptionsWithPartitionedIndex()`.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   






[jira] [Updated] (FLINK-21027) Add isStateKeyValueSerialized() method to KeyedStateBackend interface

2021-11-25 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-21027:
-
Fix Version/s: 1.15.0

> Add isStateKeyValueSerialized() method to KeyedStateBackend interface
> -
>
> Key: FLINK-21027
> URL: https://issues.apache.org/jira/browse/FLINK-21027
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Jark Wu
>Assignee: Yun Tang
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-unassigned
> Fix For: 1.15.0
>
>
> In Table/SQL operators, we have some optimizations that reuse objects of keys 
> and records. For example, we buffer input records in {{BytesMultiMap}} and 
> use the reused object to map to the underlying memory segment to reduce bytes 
> copy. 
> However, if we put the reused key and value into Heap statebackend, the 
> result will be wrong, because it is not allowed to mutate keys and values in 
> Heap statebackend. 
> Therefore, it would be great if {{KeyedStateBackend}} can expose such API, so 
> that Table/SQL can dynamically decide whether to copy the keys and values 
> before putting into state. 
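
A sketch of the decision this API would enable (the boolean mirrors the
proposed method; TypeSerializer#copy is Flink's standard deep-copy hook):

```java
import org.apache.flink.api.common.typeutils.TypeSerializer;

final class StateWriteHelper {
    // Hypothetical helper following the proposal: copy the reused object only
    // when the backend stores object references (heap backend) rather than
    // serialized bytes (e.g. the RocksDB backend).
    static <K> K prepareForState(
            K reused, TypeSerializer<K> serializer, boolean keyValueSerialized) {
        return keyValueSerialized ? reused : serializer.copy(reused);
    }
}
```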





[jira] [Assigned] (FLINK-21027) Add isStateKeyValueSerialized() method to KeyedStateBackend interface

2021-11-25 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-21027:


Assignee: Yun Tang

> Add isStateKeyValueSerialized() method to KeyedStateBackend interface
> -
>
> Key: FLINK-21027
> URL: https://issues.apache.org/jira/browse/FLINK-21027
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Jark Wu
>Assignee: Yun Tang
>Priority: Minor
>  Labels: auto-deprioritized-major, auto-unassigned
>
> In Table/SQL operators, we have some optimizations that reuse objects of keys 
> and records. For example, we buffer input records in {{BytesMultiMap}} and 
> use the reused object to map to the underlying memory segment to reduce bytes 
> copy. 
> However, if we put the reused key and value into Heap statebackend, the 
> result will be wrong, because it is not allowed to mutate keys and values in 
> Heap statebackend. 
> Therefore, it would be great if {{KeyedStateBackend}} can expose such API, so 
> that Table/SQL can dynamically decide whether to copy the keys and values 
> before putting into state. 





[GitHub] [flink-ml] lindong28 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


lindong28 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757255599



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeans.java
##
@@ -146,7 +145,7 @@ public IterationBodyResult process(
         DataStream points = dataStreams.get(0);
 
         DataStream terminationCriteria =
-                centroids.flatMap(new TerminateOnMaxIterationNum<>(maxIterationNum));
+                centroids.map(x -> 0.).flatMap(new TerminationCriteria(maxIterationNum));

Review comment:
   How about we have `TerminateOnMaxIter` and `TerminateOnMaxIterOrTol`?








[GitHub] [flink] flinkbot edited a comment on pull request #17920: [FLINK-25067][doc] Correct the description of RocksDB's background threads for flush and compaction

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17920:
URL: https://github.com/apache/flink/pull/17920#issuecomment-979723335


   
   ## CI report:
   
   * 92ab9a62c27a104103b1ba3d9d9d55a956dadb50 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27104)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink-ml] lindong28 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


lindong28 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757247408



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/linear/LogisticRegressionModel.java
##
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification.linear;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.connector.source.Source;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.connector.file.sink.FileSink;
+import org.apache.flink.connector.file.src.FileSource;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.ml.api.core.Model;
+import 
org.apache.flink.ml.classification.linear.LogisticRegressionModelData.LogisticRegressionModelDataEncoder;
+import 
org.apache.flink.ml.classification.linear.LogisticRegressionModelData.LogisticRegressionModelDataStreamFormat;
+import org.apache.flink.ml.common.broadcast.BroadcastUtils;
+import org.apache.flink.ml.common.linalg.BLAS;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import 
org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;
+import 
org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;
+import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.types.Row;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/** This class implements {@link Model} for {@link LogisticRegression}. */
+public class LogisticRegressionModel
+implements Model<LogisticRegressionModel>,
+LogisticRegressionModelParams<LogisticRegressionModel> {
+
+private Map<Param<?>, Object> paramMap = new HashMap<>();
+
+private Table model;
+
+public LogisticRegressionModel() {
+ParamUtils.initializeMapWithDefaultValues(this.paramMap, this);
+}
+
+@Override
+public Map<Param<?>, Object> getParamMap() {
+return paramMap;
+}
+
+@Override
+public void save(String path) throws IOException {
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) 
model).getTableEnvironment();
+String dataPath = ReadWriteUtils.getDataPath(path);
+FileSink<LogisticRegressionModelData> sink =
+FileSink.forRowFormat(new Path(dataPath), new 
LogisticRegressionModelDataEncoder())
+.withRollingPolicy(OnCheckpointRollingPolicy.build())
+.withBucketAssigner(new BasePathBucketAssigner<>())
+.build();
+ReadWriteUtils.saveMetadata(this, path);
+tEnv.toDataStream(model)
+.map(x -> (LogisticRegressionModelData) x.getField(0))
+.sinkTo(sink)
+.setParallelism(1);
+}
+
+public static LogisticRegressionModel load(StreamExecutionEnvironment env, 
String path)
+throws IOException {
+StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
+Source<LogisticRegressionModelData, ?, ?> source =
+FileSource.forRecordStreamFormat(
+new LogisticRegressionModelDataStreamFormat(),
+ReadWriteUtils.getDataPaths(path))
+.build();
+LogisticRegressionModel model 

[GitHub] [flink-ml] lindong28 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


lindong28 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757252256



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/linear/LogisticGradient.java
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification.linear;
+
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.ml.common.linalg.BLAS;
+
+import java.io.Serializable;
+
+/** Utility class to compute gradient and loss for logistic loss. */

Review comment:
   The link provided by Spark links to slides, which is not very easy to 
glance through.
   
   It would be sufficient if we can have a link to wikipedia with the necessary 
mathematical formulas.
   
   I am OK to just keep the slideshare link for now.
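
   For reference, the formulas in question, with labels y ∈ {0, 1}, weights w,
features x_i, and sigmoid σ(z) = 1 / (1 + e^(-z)):
   ```latex
   \ell(w) = -\sum_{i} \Big[ y_i \log \sigma(w^\top x_i)
             + (1 - y_i) \log\big(1 - \sigma(w^\top x_i)\big) \Big]
   \qquad
   \nabla_w \ell(w) = \sum_{i} \big( \sigma(w^\top x_i) - y_i \big)\, x_i
   ```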








[GitHub] [flink] flinkbot commented on pull request #17920: [FLINK-25067][doc] Correct the description of RocksDB's background threads for flush and compaction

2021-11-25 Thread GitBox


flinkbot commented on pull request #17920:
URL: https://github.com/apache/flink/pull/17920#issuecomment-979723264










[GitHub] [flink] flinkbot edited a comment on pull request #17919: [FLINK-24419][Table SQL/API] Casting to a CHAR() and VARCHAR() doesn't trim the string to the specified precision

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17919:
URL: https://github.com/apache/flink/pull/17919#issuecomment-979659835


   
   ## CI report:
   
   * aa962bd691e9648a1cfcacaebee015fecef2ec7c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27100)
 
   * 87fd0c5c53eab00390b5e49eb4a5f51663b00928 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27103)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Updated] (FLINK-25067) Correct the description of RocksDB's background threads

2021-11-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-25067:
---
Labels: pull-request-available  (was: )

> Correct the description of RocksDB's background threads
> ---
>
> Key: FLINK-25067
> URL: https://issues.apache.org/jira/browse/FLINK-25067
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Runtime / State Backends
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.15.0, 1.14.1, 1.13.4
>
>
> RocksDB changed the default maximum number of concurrent background flush 
> and compaction jobs to 2 a long time ago; we should fix the related 
> documentation.
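
For context, the knob in question on the RocksDB side (a minimal standalone
sketch using the RocksDB Java API; the printed default matches the value
described above):

```java
import org.rocksdb.DBOptions;
import org.rocksdb.RocksDB;

public class BackgroundJobsSketch {
    public static void main(String[] args) {
        RocksDB.loadLibrary();
        try (DBOptions options = new DBOptions()) {
            // Since RocksDB 5.6 this single knob controls the number of
            // concurrent background flush and compaction jobs; it defaults to 2.
            options.setMaxBackgroundJobs(2);
            System.out.println(options.maxBackgroundJobs());
        }
    }
}
```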





[GitHub] [flink] Myasuka opened a new pull request #17920: [FLINK-25067][doc] Correct the description of RocksDB's background threads for flush and compaction

2021-11-25 Thread GitBox


Myasuka opened a new pull request #17920:
URL: https://github.com/apache/flink/pull/17920


   ## What is the purpose of the change
   
   Correct the description of RocksDB's background threads for flush and 
compaction.
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   






[GitHub] [flink] flinkbot edited a comment on pull request #17919: [FLINK-24419][Table SQL/API] Casting to a CHAR() and VARCHAR() doesn't trim the string to the specified precision

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17919:
URL: https://github.com/apache/flink/pull/17919#issuecomment-979659835


   
   ## CI report:
   
   * aa962bd691e9648a1cfcacaebee015fecef2ec7c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27100)
 
   * 87fd0c5c53eab00390b5e49eb4a5f51663b00928 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757248608



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBConfigurableOptions.java
##
@@ -55,23 +57,23 @@
 public static final ConfigOption<Integer> MAX_BACKGROUND_THREADS =
 key("state.backend.rocksdb.thread.num")
 .intType()
-.noDefaultValue()
+.defaultValue(1)

Review comment:
   Good point. I'll fix the docs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


zhipeng93 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757248612



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/common/datastream/SortPartitionImpl.java
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.common.datastream;
+
+import org.apache.flink.api.common.state.ListState;
+import org.apache.flink.api.common.state.ListStateDescriptor;
+import org.apache.flink.api.common.typeutils.TypeComparator;
+import org.apache.flink.ml.common.utils.ComparatorAdapter;
+import org.apache.flink.runtime.state.StateInitializationContext;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.BoundedOneInput;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+
+import org.apache.commons.collections.IteratorUtils;
+
+import java.util.List;
+
+/** Applies sortPartition to a bounded data stream. */
+class SortPartitionImpl {

Review comment:
   Sure, let's follow the existing practice. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17761: [FLINK-24862][Connectors / Hive]Fix user-defined hive udaf/udtf cannot be used normally in hive dialect

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17761:
URL: https://github.com/apache/flink/pull/17761#issuecomment-965943227


   
   ## CI report:
   
   * 6b9562e4437590efafbfbc1d714f537c0567fba9 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27101)
 
   * a321c9ac3866e238f9b1b7198c5b73def28d4d3f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27102)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


zhipeng93 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757248185



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeans.java
##
@@ -146,7 +145,7 @@ public IterationBodyResult process(
 DataStream points = dataStreams.get(0);
 
 DataStream terminationCriteria =
-centroids.flatMap(new 
TerminateOnMaxIterationNum<>(maxIterationNum));
+centroids.map(x -> 0.).flatMap(new 
TerminationCriteria(maxIterationNum));

Review comment:
   Having separate classes for different termination conditions is fine 
for me and also may not confuse users. Then we are going to have the following 
three classes:
   - `TerminateOnMaxIterationNum`
   - `TerminateOnToleranceThreshold`
   - `TerminateOnMaxIterationNumOrToleranceThreshold`
   
   Do you think it is a viable solution? (A rough sketch of the round-based 
one follows.)
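
For illustration, a minimal sketch of the round-based variant (a hypothetical 
shape, not the actual flink-ml implementation):

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Sketch: re-emit a token each round until maxIterationNum is reached; the
// iteration terminates once this feedback stream becomes empty.
public class TerminateOnMaxIterationNum<T> implements FlatMapFunction<T, Integer> {

    private final int maxIterationNum;
    private int round = 0;

    public TerminateOnMaxIterationNum(int maxIterationNum) {
        this.maxIterationNum = maxIterationNum;
    }

    @Override
    public void flatMap(T value, Collector<Integer> out) {
        if (++round < maxIterationNum) {
            out.collect(round); // non-empty output keeps the iteration running
        }
    }
}
```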





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757247965



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBResourceContainer.java
##
@@ -91,10 +108,13 @@ public RocksDBResourceContainer(
 
 /** Gets the RocksDB {@link DBOptions} to be used for RocksDB instances. */
 public DBOptions getDbOptions() {
-// initial options from pre-defined profile
-DBOptions opt = predefinedOptions.createDBOptions(handlesToClose);
+// initial options from common profile
+DBOptions opt = createDBOptions();
 handlesToClose.add(opt);
 
+// load configurable options on top of pre-defined profile
+opt = getDBOptionsFromConfigurableOptions(opt, handlesToClose);

Review comment:
   Sure.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-25045) Introduce AdaptiveBatchScheduler

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25045:
---

Assignee: Lijie Wang

> Introduce AdaptiveBatchScheduler
> 
>
> Key: FLINK-25045
> URL: https://issues.apache.org/jira/browse/FLINK-25045
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> Introduce AdaptiveBatchScheduler



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] lindong28 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


lindong28 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757247408



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/linear/LogisticRegressionModel.java
##
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification.linear;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.connector.source.Source;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.connector.file.sink.FileSink;
+import org.apache.flink.connector.file.src.FileSource;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.ml.api.core.Model;
+import 
org.apache.flink.ml.classification.linear.LogisticRegressionModelData.LogisticRegressionModelDataEncoder;
+import 
org.apache.flink.ml.classification.linear.LogisticRegressionModelData.LogisticRegressionModelDataStreamFormat;
+import org.apache.flink.ml.common.broadcast.BroadcastUtils;
+import org.apache.flink.ml.common.linalg.BLAS;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import 
org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;
+import 
org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;
+import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.types.Row;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/** This class implements {@link Model} for {@link LogisticRegression}. */
+public class LogisticRegressionModel
+implements Model,
+LogisticRegressionModelParams {
+
+private Map, Object> paramMap = new HashMap<>();
+
+private Table model;
+
+public LogisticRegressionModel() {
+ParamUtils.initializeMapWithDefaultValues(this.paramMap, this);
+}
+
+@Override
+public Map, Object> getParamMap() {
+return paramMap;
+}
+
+@Override
+public void save(String path) throws IOException {
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) 
model).getTableEnvironment();
+String dataPath = ReadWriteUtils.getDataPath(path);
+FileSink sink =
+FileSink.forRowFormat(new Path(dataPath), new 
LogisticRegressionModelDataEncoder())
+.withRollingPolicy(OnCheckpointRollingPolicy.build())
+.withBucketAssigner(new BasePathBucketAssigner<>())
+.build();
+ReadWriteUtils.saveMetadata(this, path);
+tEnv.toDataStream(model)
+.map(x -> (LogisticRegressionModelData) x.getField(0))
+.sinkTo(sink)
+.setParallelism(1);
+}
+
+public static LogisticRegressionModel load(StreamExecutionEnvironment env, 
String path)
+throws IOException {
+StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
+Source source =
+FileSource.forRecordStreamFormat(
+new LogisticRegressionModelDataStreamFormat(),
+ReadWriteUtils.getDataPaths(path))
+.build();
+LogisticRegressionModel model 

[jira] [Assigned] (FLINK-25033) Let some scheduler components updatable

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25033:
---

Assignee: Lijie Wang

> Let some scheduler components updatable
> ---
>
> Key: FLINK-25033
> URL: https://issues.apache.org/jira/browse/FLINK-25033
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> Many scheduler components rely on the execution topology to make decisions. 
> Some of them will build up some mappings against the execution topology on 
> initialization for later use. When the execution topology becomes dynamic, 
> these components need to be notified about the topology changes and adjust 
> themselves accordingly. These components are:
>  * DefaultExecutionTopology
>  * SchedulingStrategy
>  * PartitionReleaseStrategy
>  * SlotSharingStrategy
>  * OperatorCoordinatorHandler
>  * Network memory of SlotSharingGroup.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25032) Allow to create execution vertices and execution edges lazily

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25032:
---

Assignee: Lijie Wang

> Allow to create execution vertices and execution edges lazily
> -
>
> Key: FLINK-25032
> URL: https://issues.apache.org/jira/browse/FLINK-25032
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> For a dynamic graph, its execution vertices and execution edges should be 
> lazily created.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25035) Shuffle Service Supports Consuming Subpartition Range

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25035:
---

Assignee: Lijie Wang

> Shuffle Service Supports Consuming Subpartition Range
> -
>
> Key: FLINK-25035
> URL: https://issues.apache.org/jira/browse/FLINK-25035
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> In the adaptive batch scheduler, the shuffle service needs to allow a 
> SingleInputGate to consume a certain range of subpartitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25034) Support flexible number of subpartitions in IntermediateResultPartition

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25034:
---

Assignee: Lijie Wang

> Support flexible number of subpartitions in IntermediateResultPartition
> ---
>
> Key: FLINK-25034
> URL: https://issues.apache.org/jira/browse/FLINK-25034
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> Currently, when a task is deployed, it needs to know the parallelism of its 
> consumer job vertex. This is because the consumer vertex parallelism is 
> needed to decide the _numberOfSubpartitions_ of _PartitionDescriptor_ which 
> is part of the {_}ResultPartitionDeploymentDescriptor{_}. The reason behind 
> that is, at the moment, for one result partition, different subpartitions 
> serve different consumer execution vertices. More specifically, one consumer 
> execution vertex only consumes data from subpartition with the same index. 
> Considering a dynamic graph, the parallelism of a job vertex may not have 
> been decided when its upstream vertices are deployed. To enable Flink to work 
> in this case, we need a way to allow an execution vertex to run without 
> knowing the parallelism of its consumer job vertices. One basic idea is to 
> enable multiple subpartitions in one result partition to serve the same 
> consumer execution vertex.
> To achieve this goal, we can set the number of subpartitions to be the *max 
> parallelism* of the consumer job vertex. When the consumer vertex is 
> deployed, it should be assigned with a subpartition range to consume.
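
As a toy illustration (plain Java, not Flink code; the exact range formula is 
an assumption for exposition), dividing numSubpartitions = max parallelism 
among the actual consumer tasks could look like:

```java
// Sketch: each consumer task gets a contiguous subpartition range when the
// result partition was created with numSubpartitions = consumer max parallelism.
final class SubpartitionRanges {
    static int[] rangeFor(int taskIndex, int parallelism, int numSubpartitions) {
        int start = taskIndex * numSubpartitions / parallelism;         // inclusive
        int end = (taskIndex + 1) * numSubpartitions / parallelism - 1; // inclusive
        return new int[] {start, end};
    }
}
// e.g. numSubpartitions = 8 (max parallelism), actual parallelism = 3:
// task 0 -> [0, 1], task 1 -> [2, 4], task 2 -> [5, 7]
```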



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25036) Introduce stage-wised scheduling strategy

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25036:
---

Assignee: Lijie Wang

> Introduce stage-wised scheduling strategy
> -
>
> Key: FLINK-25036
> URL: https://issues.apache.org/jira/browse/FLINK-25036
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> The scheduling of the adaptive batch job scheduler should happen at stage 
> granularity, because the information for deciding parallelism can only be 
> collected after the upstream stage has fully finished, so we need to 
> introduce a new scheduling strategy: the Stage-wise Scheduling Strategy.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25011) Introduce VertexParallelismDecider

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25011:
---

Assignee: Lijie Wang

> Introduce VertexParallelismDecider
> --
>
> Key: FLINK-25011
> URL: https://issues.apache.org/jira/browse/FLINK-25011
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> Introduce VertexParallelismDecider and provide a default implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25031) Job finishes iff all job vertices finish

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-25031:
---

Assignee: Lijie Wang

> Job finishes iff all job vertices finish
> 
>
> Key: FLINK-25031
> URL: https://issues.apache.org/jira/browse/FLINK-25031
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> The adaptive batch scheduler needs to build the ExecutionGraph dynamically. 
> For a dynamic graph, since its execution vertices can be lazily created, a 
> job should not finish when all ExecutionVertex(es) finish. Changes should be 
> made to let a job finish only when all registered ExecutionJobVertex have 
> finished.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-24892) FLIP-187: Adaptive Batch Scheduler

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-24892:
---

Assignee: Lijie Wang

> FLIP-187: Adaptive Batch Scheduler
> --
>
> Key: FLINK-24892
> URL: https://issues.apache.org/jira/browse/FLINK-24892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>
> Introduce a new scheduler to Flink: the adaptive batch scheduler. The new 
> scheduler can automatically decide the parallelism of each job vertex for 
> batch jobs, according to the amount of data each vertex needs to process.
> For more details, see 
> [FLIP-187|https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-24980) Collect sizes of finished BLOCKING result partitions

2021-11-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-24980:
---

Assignee: Lijie Wang

> Collect sizes of finished BLOCKING result partitions
> 
>
> Key: FLINK-24980
> URL: https://issues.apache.org/jira/browse/FLINK-24980
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>  Labels: pull-request-available
>
> The adaptive batch scheduler needs to know the size of each result partition 
> when the task is finished.
> This issue will introduce the *numBytesProduced* counter and register it into 
> *TaskIOMetricGroup*, to record the size of each result partition. 
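
A minimal sketch of how such a counter could be registered (the counter name 
comes from this ticket; the wrapper class and method names are illustrative):

```java
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

// Sketch: register a bytes-produced counter on the task's I/O metric group;
// `ioMetricGroup` stands in for the TaskIOMetricGroup instance.
final class ResultPartitionBytesTracker {
    private final Counter numBytesProduced;

    ResultPartitionBytesTracker(MetricGroup ioMetricGroup) {
        this.numBytesProduced = ioMetricGroup.counter("numBytesProduced");
    }

    // Called whenever the result partition writes a buffer.
    void onBufferWritten(int bufferSizeInBytes) {
        numBytesProduced.inc(bufferSizeInBytes);
    }
}
```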



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757247017



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java
##
@@ -782,6 +791,29 @@ public void setWriteBatchSize(long writeBatchSize) {
 //  utilities
 // 
 
+private ReadableConfig mergeConfigurableOptions(ReadableConfig base, 
ReadableConfig onTop) {
+if (base == null) {
+base = new Configuration();
+}
+if (onTop == null) {

Review comment:
   Sure, will remove this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757246538



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBResourceContainer.java
##
@@ -245,6 +268,121 @@ private boolean 
overwriteFilterIfExist(BlockBasedTableConfig blockBasedTableConf
 return true;
 }
 
+/** Create a {@link DBOptions} for RocksDB, including some common 
settings. */
+DBOptions createDBOptions() {

Review comment:
   Sure, I'll change it. Same as below




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757245729



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java
##
@@ -500,15 +509,15 @@ private RocksDBOptionsFactory configureOptionsFactory(
 return originalOptionsFactory;
 }
 
-// if using DefaultConfigurableOptionsFactory by default, we could 
avoid reflection to speed
-// up.
-if 
(factoryClassName.equalsIgnoreCase(DefaultConfigurableOptionsFactory.class.getName()))
 {
-DefaultConfigurableOptionsFactory optionsFactory =
-new DefaultConfigurableOptionsFactory();
-optionsFactory.configure(config);
-LOG.info("Using default options factory: {}.", optionsFactory);
-
-return optionsFactory;
+// From FLINK-24046, we deprecate the 
DefaultConfigurableOptionsFactory.
+if (factoryClassName == null) {
+return null;
+} else if (factoryClassName.equalsIgnoreCase(
+DefaultConfigurableOptionsFactory.class.getName())) {
+LOG.info(
+"{} is deprecated. Please remove this value from the 
configuration.",

Review comment:
   Good point. I'll add those warnings.
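
Presumably the added warnings would be along these lines (a sketch, not the 
merged code; `LOG` is the class's SLF4J logger):

```java
// Sketch: escalate the deprecation notice from info to warn so users notice it.
LOG.warn(
        "{} is deprecated. Please remove this value from the configuration and "
                + "set the corresponding state.backend.rocksdb.* options directly.",
        DefaultConfigurableOptionsFactory.class.getName());
```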




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757245444



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java
##
@@ -50,26 +56,15 @@
  * The following options are set:
  *
  * 
- *   setUseFsync(false)
  *   setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
- *   setStatsDumpPeriodSec(0)
  * 
  */
-DEFAULT {
-
-@Override
-public DBOptions createDBOptions(Collection<AutoCloseable> 
handlesToClose) {
-return new DBOptions()
-.setUseFsync(false)
-.setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
-.setStatsDumpPeriodSec(0);
-}
-
-@Override
-public ColumnFamilyOptions 
createColumnOptions(Collection<AutoCloseable> handlesToClose) {
-return new ColumnFamilyOptions();
-}
-},
+DEFAULT(
+new HashMap<ConfigOption<?>, Object>() {
+{
+put(RocksDBConfigurableOptions.LOG_LEVEL, 
InfoLogLevel.HEADER_LEVEL);
+}
+}),

Review comment:
   Will do




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


lindong28 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757245219



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeans.java
##
@@ -146,7 +145,7 @@ public IterationBodyResult process(
 DataStream points = dataStreams.get(0);
 
 DataStream terminationCriteria =
-centroids.flatMap(new 
TerminateOnMaxIterationNum<>(maxIterationNum));
+centroids.map(x -> 0.).flatMap(new 
TerminationCriteria(maxIterationNum));

Review comment:
   IMO, for users who want to terminate the iteration based on the round 
number, asking them to map the input to double-typed values seems unnecessary 
and might confuse them.
   
   And if we have two separate classes such as `TerminateOnMaxIterationNum ` 
and `TerminateOnToleranceThreshold`, given that users already have concepts of 
these two types of termination pattern, the class names seem to be pretty 
self-explanatory and intuitive.
   
   What do you think? Or maybe we can wait for @gaoyunhaii comment?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757245062



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java
##
@@ -83,35 +78,24 @@ public ColumnFamilyOptions 
createColumnOptions(Collection handles
  *   setCompactionStyle(CompactionStyle.LEVEL)
  *   setLevelCompactionDynamicLevelBytes(true)
  *   setIncreaseParallelism(4)
- *   setUseFsync(false)
  *   setDisableDataSync(true)
  *   setMaxOpenFiles(-1)
  *   setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
- *   setStatsDumpPeriodSec(0)
  * 
  *
  * Note: Because Flink does not rely on RocksDB data on disk for 
recovery, there is no need
  * to sync data to stable storage.
  */
-SPINNING_DISK_OPTIMIZED {
-
-@Override
-public DBOptions createDBOptions(Collection<AutoCloseable> 
handlesToClose) {
-return new DBOptions()
-.setIncreaseParallelism(4)
-.setUseFsync(false)
-.setMaxOpenFiles(-1)
-.setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
-.setStatsDumpPeriodSec(0);
-}
-
-@Override
-public ColumnFamilyOptions 
createColumnOptions(Collection<AutoCloseable> handlesToClose) {
-return new ColumnFamilyOptions()
-.setCompactionStyle(CompactionStyle.LEVEL)
-.setLevelCompactionDynamicLevelBytes(true);
-}
-},
+SPINNING_DISK_OPTIMIZED(
+new HashMap<ConfigOption<?>, Object>() {
+{
+put(RocksDBConfigurableOptions.MAX_BACKGROUND_THREADS, 4);
+put(RocksDBConfigurableOptions.MAX_OPEN_FILES, -1);
+put(RocksDBConfigurableOptions.LOG_LEVEL, 
InfoLogLevel.HEADER_LEVEL);
+put(RocksDBConfigurableOptions.COMPACTION_STYLE, 
CompactionStyle.LEVEL);
+put(RocksDBConfigurableOptions.USE_DYNAMIC_LEVEL_SIZE, 
true);
+}

Review comment:
   Sure, I'll add serialVersionUID to each class.
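
For reference, the suggested fix presumably looks like this (a sketch of the 
pattern, not the merged code):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.contrib.streaming.state.RocksDBConfigurableOptions;

// Sketch: the anonymous HashMap initializer is a serializable inner class, so
// an explicit serialVersionUID keeps the enum constant's serialized form stable.
Map<ConfigOption<?>, Object> defaultOptions =
        new HashMap<ConfigOption<?>, Object>() {
            private static final long serialVersionUID = 1L;

            {
                put(RocksDBConfigurableOptions.MAX_OPEN_FILES, -1);
            }
        };
```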




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757244954



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java
##
@@ -131,55 +115,35 @@ public ColumnFamilyOptions 
createColumnOptions(Collection handles
  *   setIncreaseParallelism(4)
  *   setMinWriteBufferNumberToMerge(3)
  *   setMaxWriteBufferNumber(4)
- *   setUseFsync(false)
  *   setMaxOpenFiles(-1)
  *   setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
- *   setStatsDumpPeriodSec(0)
  *   BlockBasedTableConfig.setBlockCacheSize(256 MBytes)
  *   BlockBasedTableConfigsetBlockSize(128 KBytes)

Review comment:
   Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17919: [FLINK-24419][Table SQL/API] Casting to a CHAR() and VARCHAR() doesn't trim the string to the specified precision

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17919:
URL: https://github.com/apache/flink/pull/17919#issuecomment-979659835


   
   ## CI report:
   
   * aa962bd691e9648a1cfcacaebee015fecef2ec7c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27100)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


lindong28 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757242053



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/common/datastream/SortPartitionImpl.java
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.common.datastream;
+
+import org.apache.flink.api.common.state.ListState;
+import org.apache.flink.api.common.state.ListStateDescriptor;
+import org.apache.flink.api.common.typeutils.TypeComparator;
+import org.apache.flink.ml.common.utils.ComparatorAdapter;
+import org.apache.flink.runtime.state.StateInitializationContext;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.BoundedOneInput;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+
+import org.apache.commons.collections.IteratorUtils;
+
+import java.util.List;
+
+/** Applies sortPartition to a bounded data stream. */
+class SortPartitionImpl {

Review comment:
   The problem you described here is common to most projects (e.g. Flink). 
Can we follow the existing practice in e.g. Flink instead of inventing new 
patterns?
   
   Is there any util class in popular projects like Flink/Kafka/Spark that 
explicitly moves the implementation of its static methods to individual `*impl` 
classes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17598: [FLINK-24703][connectors][formats] Add CSV format support for filesystem based on StreamFormat and BulkWriter interfaces.

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17598:
URL: https://github.com/apache/flink/pull/17598#issuecomment-954282485


   
   ## CI report:
   
   * 12d370196fdbe6ff098e8c93512b90fb774aa5c6 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27098)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


lindong28 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757240950



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/common/param/HasMaxIter.java
##
@@ -26,7 +26,7 @@
 /** Interface for the shared maxIter param. */
 public interface HasMaxIter<T> extends WithParams<T> {
 Param<Integer> MAX_ITER =
-new IntParam("maxIter", "Maximum number of iterations.", 20, 
ParamValidators.gtEq(0));
+new IntParam("maxIter", "Maximum number of iterations.", 20, 
ParamValidators.gt(0));

Review comment:
   OK. Then let's keep it as is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17699: [FLINK-20370][table] part1: Fix wrong results when sink primary key is not the same with query result's changelog upsert key

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17699:
URL: https://github.com/apache/flink/pull/17699#issuecomment-961963701


   
   ## CI report:
   
   * cf97485d0fe1e71b9fefd3f7f1d67014492b3bd5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27099)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] liuyongvs commented on pull request #17777: [FLINK-24886][core] TimeUtils supports the form of m.

2021-11-25 Thread GitBox


liuyongvs commented on pull request #17777:
URL: https://github.com/apache/flink/pull/17777#issuecomment-979708663


   @Thesharing, but if the existing configurations are not modified, it will 
throw an exception, and the "m" form is also supported. So I think it is not a 
problem; what do you think @tillrohrmann 
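
For example (assumed behavior under this PR, not verified against the merged 
change):

```java
import java.time.Duration;

import org.apache.flink.util.TimeUtils;

// Sketch: with the change, a bare "m" unit is accepted and parsed as minutes
// instead of throwing an IllegalArgumentException.
Duration d = TimeUtils.parseDuration("10m"); // expected: Duration.ofMinutes(10)
```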


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-24884) flink flame graph webui bug

2021-11-25 Thread jackylau (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449377#comment-17449377
 ] 

jackylau commented on FLINK-24884:
--

hi [~trohrmann], will anyone have time to review this? It has already been 
some time.

> flink flame graph webui bug
> ---
>
> Key: FLINK-24884
> URL: https://issues.apache.org/jira/browse/FLINK-24884
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.14.0, 1.13.3
>Reporter: jackylau
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2021-11-12-15-48-08-140.png
>
>
> I cannot compile successfully when I port the flame graph feature to our 
> lower version of Flink, but it compiles successfully in the higher version 
> of Flink.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] (FLINK-24884) flink flame graph webui bug

2021-11-25 Thread jackylau (Jira)


[ https://issues.apache.org/jira/browse/FLINK-24884 ]


jackylau deleted comment on FLINK-24884:
--

was (Author: jackylau):
hi [~junhany], will anyone review this?

> flink flame graph webui bug
> ---
>
> Key: FLINK-24884
> URL: https://issues.apache.org/jira/browse/FLINK-24884
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.14.0, 1.13.3
>Reporter: jackylau
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2021-11-12-15-48-08-140.png
>
>
> I cannot compile successfully when I port the flame graph feature to our 
> lower version of Flink, but it compiles successfully in the higher version 
> of Flink.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-24884) flink flame graph webui bug

2021-11-25 Thread jackylau (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449376#comment-17449376
 ] 

jackylau commented on FLINK-24884:
--

hi [~junhany], will anyone review this?

> flink flame graph webui bug
> ---
>
> Key: FLINK-24884
> URL: https://issues.apache.org/jira/browse/FLINK-24884
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.14.0, 1.13.3
>Reporter: jackylau
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2021-11-12-15-48-08-140.png
>
>
> I cannot compile successfully when I port the flame graph feature to our 
> lower version of Flink, but it compiles successfully in the higher version 
> of Flink.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] xiangqiao123 commented on pull request #17761: [FLINK-24862][Connectors / Hive]Fix user-defined hive udaf/udtf cannot be used normally in hive dialect

2021-11-25 Thread GitBox


xiangqiao123 commented on pull request #17761:
URL: https://github.com/apache/flink/pull/17761#issuecomment-979703361


   @rmetzger @Tartarus0zm Thank you for your review.  
   I have modified the logic to skip validation: validation is skipped if the 
input is not an instance of UserDefinedFunction.
   
   > And for the second question you mention in FLINK-24862 .
   > I think we don't need to modify the logic of HiveGenericUDTF.
   > 
   > You can implement your UDTF by overriding method public 
StructObjectInspector initialize(ObjectInspector[] argOIs) just like 
HiveGenericUDTFTest.TestSplitUDTF does.
   
   Since Hive 0.13, many Hive UDTF implementations have not overridden the 
method `public StructObjectInspector initialize(ObjectInspector[] argOIs)`, 
and this method is deprecated.  
   Therefore, Flink needs to be compatible with such UDTFs; a sketch of the 
post-0.13 style follows.
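
For illustration, a minimal UDTF in the post-0.13 style (a hypothetical 
example; it overrides only the non-deprecated 
`initialize(StructObjectInspector)` variant):

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.Collections;

// Sketch: a UDTF that emits its first argument as a single string column.
public class IdentityUDTF extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(StructObjectInspector argOIs)
            throws UDFArgumentException {
        // Only the non-deprecated variant is overridden, which is exactly
        // the case the PR needs to stay compatible with.
        return ObjectInspectorFactory.getStandardStructObjectInspector(
                Collections.singletonList("col"),
                Collections.<ObjectInspector>singletonList(
                        PrimitiveObjectInspectorFactory.javaStringObjectInspector));
    }

    @Override
    public void process(Object[] args) throws HiveException {
        forward(new Object[] {String.valueOf(args[0])});
    }

    @Override
    public void close() throws HiveException {}
}
```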
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17761: [FLINK-24862][Connectors / Hive]Fix user-defined hive udaf/udtf cannot be used normally in hive dialect

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17761:
URL: https://github.com/apache/flink/pull/17761#issuecomment-965943227


   
   ## CI report:
   
   * d77e81463fafbc26a61207f3246bb82ecdce88ff Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26517)
 
   * 6b9562e4437590efafbfbc1d714f537c0567fba9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27101)
 
   * a321c9ac3866e238f9b1b7198c5b73def28d4d3f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27102)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17761: [FLINK-24862][Connectors / Hive]Fix user-defined hive udaf/udtf cannot be used normally in hive dialect

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17761:
URL: https://github.com/apache/flink/pull/17761#issuecomment-965943227


   
   ## CI report:
   
   * d77e81463fafbc26a61207f3246bb82ecdce88ff Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26517)
 
   * 6b9562e4437590efafbfbc1d714f537c0567fba9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27101)
 
   * a321c9ac3866e238f9b1b7198c5b73def28d4d3f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


zhipeng93 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757229351



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/linear/LogisticGradient.java
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification.linear;
+
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.ml.common.linalg.BLAS;
+
+import java.io.Serializable;
+
+/** Utility class to compute gradient and loss for logistic loss. */

Review comment:
   I am not sure whether we should include such a detailed explanation, as the 
information is already available on the website.
   
   I just added one URL there. What do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


zhipeng93 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757227966



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/linear/LogisticRegressionModelData.java
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification.linear;
+
+import org.apache.flink.api.common.serialization.Encoder;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.SimpleStreamFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.List;
+
+/** Model data of {@link LogisticRegressionModel}. */
+public class LogisticRegressionModelData {
+
+public final double[] coefficient;
+

Review comment:
   Let's follow the common practice in Flink.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


zhipeng93 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757227966



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/linear/LogisticRegressionModelData.java
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification.linear;
+
+import org.apache.flink.api.common.serialization.Encoder;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.SimpleStreamFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.ArrayList;
+import java.util.List;
+
+/** Model data of {@link LogisticRegressionModel}. */
+public class LogisticRegressionModelData {
+
+public final double[] coefficient;
+

Review comment:
   Let's see the common practice in Flink.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #28: [Flink-24556] Add Estimator and Transformer for logistic regression

2021-11-25 Thread GitBox


zhipeng93 commented on a change in pull request #28:
URL: https://github.com/apache/flink-ml/pull/28#discussion_r757227812



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/linear/LogisticRegressionModel.java
##
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification.linear;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.connector.source.Source;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.connector.file.sink.FileSink;
+import org.apache.flink.connector.file.src.FileSource;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.ml.api.core.Model;
+import org.apache.flink.ml.classification.linear.LogisticRegressionModelData.LogisticRegressionModelDataEncoder;
+import org.apache.flink.ml.classification.linear.LogisticRegressionModelData.LogisticRegressionModelDataStreamFormat;
+import org.apache.flink.ml.common.broadcast.BroadcastUtils;
+import org.apache.flink.ml.common.linalg.BLAS;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;
+import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;
+import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.types.Row;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/** This class implements {@link Model} for {@link LogisticRegression}. */
+public class LogisticRegressionModel
+implements Model<LogisticRegressionModel>, LogisticRegressionModelParams<LogisticRegressionModel> {
+
+private Map<Param<?>, Object> paramMap = new HashMap<>();
+
+private Table model;
+
+public LogisticRegressionModel() {
+ParamUtils.initializeMapWithDefaultValues(this.paramMap, this);
+}
+
+@Override
+public Map<Param<?>, Object> getParamMap() {
+return paramMap;
+}
+
+@Override
+public void save(String path) throws IOException {
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) model).getTableEnvironment();
+String dataPath = ReadWriteUtils.getDataPath(path);
+FileSink<LogisticRegressionModelData> sink =
+FileSink.forRowFormat(new Path(dataPath), new LogisticRegressionModelDataEncoder())
+.withRollingPolicy(OnCheckpointRollingPolicy.build())
+.withBucketAssigner(new BasePathBucketAssigner<>())
+.build();
+ReadWriteUtils.saveMetadata(this, path);
+tEnv.toDataStream(model)
+.map(x -> (LogisticRegressionModelData) x.getField(0))
+.sinkTo(sink)
+.setParallelism(1);
+}
+
+public static LogisticRegressionModel load(StreamExecutionEnvironment env, String path)
+throws IOException {
+StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
+Source source =
+FileSource.forRecordStreamFormat(
+new LogisticRegressionModelDataStreamFormat(),
+ReadWriteUtils.getDataPaths(path))
+.build();
+LogisticRegressionModel model 

[GitHub] [flink] Zakelly commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Zakelly commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757225258



##
File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java
##
@@ -83,35 +78,24 @@ public ColumnFamilyOptions 
createColumnOptions(Collection handles
  *   setCompactionStyle(CompactionStyle.LEVEL)
  *   setLevelCompactionDynamicLevelBytes(true)
  *   setIncreaseParallelism(4)
- *   setUseFsync(false)
  *   setDisableDataSync(true)
  *   setMaxOpenFiles(-1)
  *   setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
- *   setStatsDumpPeriodSec(0)
  * 
  *
  * Note: Because Flink does not rely on RocksDB data on disk for 
recovery, there is no need
  * to sync data to stable storage.
  */
-SPINNING_DISK_OPTIMIZED {
-
-@Override
-public DBOptions createDBOptions(Collection<AutoCloseable> handlesToClose) {
-return new DBOptions()
-.setIncreaseParallelism(4)
-.setUseFsync(false)
-.setMaxOpenFiles(-1)
-.setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
-.setStatsDumpPeriodSec(0);
-}
-
-@Override
-public ColumnFamilyOptions createColumnOptions(Collection<AutoCloseable> handlesToClose) {
-return new ColumnFamilyOptions()
-.setCompactionStyle(CompactionStyle.LEVEL)
-.setLevelCompactionDynamicLevelBytes(true);
-}
-},
+SPINNING_DISK_OPTIMIZED(
+new HashMap<ConfigOption<?>, Object>() {
+{
+put(RocksDBConfigurableOptions.MAX_BACKGROUND_THREADS, 4);
+put(RocksDBConfigurableOptions.MAX_OPEN_FILES, -1);
+put(RocksDBConfigurableOptions.LOG_LEVEL, InfoLogLevel.HEADER_LEVEL);
+put(RocksDBConfigurableOptions.COMPACTION_STYLE, CompactionStyle.LEVEL);
+put(RocksDBConfigurableOptions.USE_DYNAMIC_LEVEL_SIZE, true);
+}

Review comment:
   Sure, I added a serialVersionUID to each class.
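
   For illustration, a minimal self-contained sketch of the fix described above: an anonymous HashMap subclass is its own (anonymous) serializable class, so it should declare an explicit serialVersionUID. Plain String keys stand in here for the actual `ConfigOption<?>` keys in the quoted diff; this is a hedged sketch, not the merged code.

      import java.util.HashMap;
      import java.util.Map;

      public class PredefinedDefaultsSketch {
          // Without an explicit serialVersionUID, the compiler derives one,
          // which is fragile across edits to the anonymous class.
          static final Map<String, Object> DEFAULTS =
                  new HashMap<String, Object>() {
                      private static final long serialVersionUID = 1L; // added per review

                      {
                          // Hypothetical keys standing in for RocksDBConfigurableOptions fields.
                          put("thread.num", 4);
                          put("files.open", -1);
                      }
                  };

          public static void main(String[] args) {
              System.out.println(DEFAULTS);
          }
      }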




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] yunfengzhou-hub commented on a change in pull request #32: [FLINK-24817] Add Naive Bayes implementation

2021-11-25 Thread GitBox


yunfengzhou-hub commented on a change in pull request #32:
URL: https://github.com/apache/flink-ml/pull/32#discussion_r757201463



##
File path: flink-ml-lib/src/test/java/org/apache/flink/ml/classification/NaiveBayesTest.java
##
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.ml.classification.naivebayes.NaiveBayes;
+import org.apache.flink.ml.classification.naivebayes.NaiveBayesModel;
+import org.apache.flink.ml.classification.naivebayes.NaiveBayesModelData;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.ml.util.StageTestUtils;
+import org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.Schema;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.CloseableIterator;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+
+/** Tests {@link NaiveBayes} and {@link NaiveBayesModel}. */
+public class NaiveBayesTest {
+private StreamExecutionEnvironment env;
+private StreamTableEnvironment tEnv;
+private Table trainTable;
+private Table predictTable;
+private Map expectedOutput;
+private NaiveBayes estimator;
+
+@Before
+public void before() {
+Configuration config = new Configuration();
+config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, true);
+env = StreamExecutionEnvironment.getExecutionEnvironment(config);
+env.setParallelism(4);
+env.enableCheckpointing(100);
+env.setRestartStrategy(RestartStrategies.noRestart());
+tEnv = StreamTableEnvironment.create(env);
+
+Schema trainSchema =
+Schema.newBuilder()
+.column("f0", DataTypes.of(DenseVector.class))
+.column("f1", DataTypes.INT())
+.columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
+.watermark("rowtime", "SOURCE_WATERMARK()")
+.build();
+
+trainTable =
+tEnv.fromDataStream(
+env.fromElements(
+Row.of(Vectors.dense(0, 0.), 11),
+Row.of(Vectors.dense(1, 0), 10),
+Row.of(Vectors.dense(1, 1.), 10))
+.assignTimestampsAndWatermarks(
+WatermarkStrategy.noWatermarks()),
+trainSchema)
+.as("features", "label");
+
+Schema predictSchema =
+Schema.newBuilder()
+.column("f0", DataTypes.of(DenseVector.class))
+.columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
+.watermark("rowtime", "SOURCE_WATERMARK()")
+.build();
+
+predictTable =
+tEnv.fromDataStream(
+env.fromElements(
+Row.of(Vectors.dense(0, 1.)),
+Row.of(Vectors.dense(0, 0.)),
+Row.of(Vectors.dense(1, 0)),
+Row.of(Vectors.dense(1, 1.)))
+

[jira] [Created] (FLINK-25067) Correct the description of RocksDB's background threads

2021-11-25 Thread Yun Tang (Jira)
Yun Tang created FLINK-25067:


 Summary: Correct the description of RocksDB's background threads
 Key: FLINK-25067
 URL: https://issues.apache.org/jira/browse/FLINK-25067
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / State Backends
Reporter: Yun Tang
Assignee: Yun Tang
 Fix For: 1.15.0, 1.14.1, 1.13.4


RocksDB changed the maximum number of concurrent background flush and compaction jobs to 2 a long time ago; we should fix the related documentation.
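
For reference, a hedged sketch (assuming the standard RocksDB Java bindings) that prints the setting this ticket documents; since RocksDB 5.x, max_background_jobs covers both flushes and compactions and defaults to 2:

    import org.rocksdb.DBOptions;

    public class MaxBackgroundJobsSketch {
        public static void main(String[] args) {
            // DBOptions loads the native RocksDB library on first use.
            try (DBOptions options = new DBOptions()) {
                System.out.println("maxBackgroundJobs = " + options.maxBackgroundJobs());
            }
        }
    }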



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #30: [FLINK-24845] Add allreduce utility function in FlinkML

2021-11-25 Thread GitBox


zhipeng93 commented on a change in pull request #30:
URL: https://github.com/apache/flink-ml/pull/30#discussion_r757194716



##
File path: flink-ml-lib/src/main/java/org/apache/flink/ml/common/datastream/DataStreamUtils.java
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.common.datastream;
+
+import org.apache.flink.streaming.api.datastream.DataStream;
+
+/** Provides utility functions for {@link org.apache.flink.streaming.api.datastream.DataStream}. */

Review comment:
   The citation is generated automatically. There seems to be an error if I use `{@link DataStream}`.
   
   BTW, Flink has an example of this: `org.apache.flink.streaming.api.operators.Output`.
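
   For illustration, a hedged sketch of the two javadoc forms being compared; the fully qualified form resolves even where the short `{@link DataStream}` form reportedly fails:

      import org.apache.flink.streaming.api.datastream.DataStream;

      /**
       * Short form, resolved against the import above: {@link DataStream}.
       *
       * Fully qualified form, as in the quoted javadoc:
       * {@link org.apache.flink.streaming.api.datastream.DataStream}.
       */
      public final class JavadocLinkSketch {
          private JavadocLinkSketch() {}
      }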




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] shenzhu commented on a change in pull request #17831: [FLINK-15825][Table SQL/API] Add renameDatabase() to Catalog

2021-11-25 Thread GitBox


shenzhu commented on a change in pull request #17831:
URL: https://github.com/apache/flink/pull/17831#discussion_r757216616



##
File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java
##
@@ -398,6 +398,34 @@ public void alterDatabase(
 }
 }
 
+@Override
+public void renameDatabase(
+String databaseName, String newDatabaseName, boolean ignoreIfNotExists)
+throws DatabaseNotExistException, DatabaseAlreadyExistException, CatalogException {
+checkArgument(
+!isNullOrWhitespaceOnly(databaseName), "databaseName cannot be null or empty");
+checkArgument(
+!isNullOrWhitespaceOnly(newDatabaseName),
+"newDatabaseName cannot be null or empty");
+
+try {
+if (databaseExists(databaseName)) {
+if (databaseExists(newDatabaseName)) {
+throw new DatabaseAlreadyExistException(getName(), newDatabaseName);
+} else {
+Database database = getHiveDatabase(databaseName);
+database.setName(newDatabaseName);
+client.alterDatabase(databaseName, database);

Review comment:
   Hey @YesOrNo828, thanks for your suggestion! Sorry, I didn't realize that before; I have just updated this PR.
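
   For context, a hedged usage sketch of the `renameDatabase` API proposed in this PR; the catalog instance and database names are hypothetical:

      import org.apache.flink.table.catalog.Catalog;

      public class RenameDatabaseSketch {
          // Renames old_db to new_db using the signature from the quoted diff.
          static void rename(Catalog catalog) throws Exception {
              if (catalog.databaseExists("old_db")) {
                  // false: throw instead of silently ignoring a missing database
                  catalog.renameDatabase("old_db", "new_db", false);
              }
          }
      }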




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Myasuka commented on a change in pull request #17874: [FLINK-24046] Refactor the EmbeddedRocksDBStateBackend configuration logic

2021-11-25 Thread GitBox


Myasuka commented on a change in pull request #17874:
URL: https://github.com/apache/flink/pull/17874#discussion_r757203945



##
File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java
##
@@ -50,26 +56,15 @@
  * The following options are set:
  *
  * 
- *   setUseFsync(false)
  *   setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
- *   setStatsDumpPeriodSec(0)
  * 
  */
-DEFAULT {
-
-@Override
-public DBOptions createDBOptions(Collection<AutoCloseable> handlesToClose) {
-return new DBOptions()
-.setUseFsync(false)
-.setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
-.setStatsDumpPeriodSec(0);
-}
-
-@Override
-public ColumnFamilyOptions createColumnOptions(Collection<AutoCloseable> handlesToClose) {
-return new ColumnFamilyOptions();
-}
-},
+DEFAULT(
+new HashMap<ConfigOption<?>, Object>() {
+{
+put(RocksDBConfigurableOptions.LOG_LEVEL, InfoLogLevel.HEADER_LEVEL);
+}
+}),

Review comment:
   This could be simplified to `Collections.singletonMap(RocksDBConfigurableOptions.LOG_LEVEL, InfoLogLevel.HEADER_LEVEL)`.
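
   A hedged sketch of the suggested simplification, with String keys standing in for the `ConfigOption<?>` keys in the diff:

      import java.util.Collections;
      import java.util.HashMap;
      import java.util.Map;

      public class SingletonMapSketch {
          public static void main(String[] args) {
              // Before: anonymous HashMap subclass with an instance initializer.
              Map<String, Object> verbose =
                      new HashMap<String, Object>() {
                          private static final long serialVersionUID = 1L;

                          {
                              put("log.level", "HEADER_LEVEL");
                          }
                      };

              // After: a single immutable entry, no extra anonymous class.
              Map<String, Object> concise = Collections.singletonMap("log.level", "HEADER_LEVEL");

              System.out.println(verbose.equals(concise)); // true
          }
      }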

##
File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java
##
@@ -131,55 +115,35 @@ public ColumnFamilyOptions 
createColumnOptions(Collection handles
  *   setIncreaseParallelism(4)
  *   setMinWriteBufferNumberToMerge(3)
  *   setMaxWriteBufferNumber(4)
- *   setUseFsync(false)
  *   setMaxOpenFiles(-1)
  *   setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
- *   setStatsDumpPeriodSec(0)
  *   BlockBasedTableConfig.setBlockCacheSize(256 MBytes)
  *   BlockBasedTableConfigsetBlockSize(128 KBytes)

Review comment:
   Maybe we can also fix this to read `BlockBasedTableConfig.setBlockSize(128 KBytes)`.
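
   For reference, a hedged sketch of the corrected javadoc line expressed as actual calls (standard RocksDB Java bindings assumed):

      import org.rocksdb.BlockBasedTableConfig;

      public class BlockConfigSketch {
          public static void main(String[] args) {
              BlockBasedTableConfig tableConfig =
                      new BlockBasedTableConfig()
                              .setBlockCacheSize(256 * 1024 * 1024L) // 256 MBytes
                              .setBlockSize(128 * 1024L);            // 128 KBytes
              System.out.println("block size = " + tableConfig.blockSize());
          }
      }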

##
File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBResourceContainer.java
##
@@ -245,6 +268,121 @@ private boolean 
overwriteFilterIfExist(BlockBasedTableConfig blockBasedTableConf
 return true;
 }
 
+/** Create a {@link DBOptions} for RocksDB, including some common settings. */
+DBOptions createDBOptions() {

Review comment:
   I think this method should be renamed to `createBaseCommonDBOptions` with `private` scope.

##
File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java
##
@@ -782,6 +791,29 @@ public void setWriteBatchSize(long writeBatchSize) {
 //  utilities
 // 
 
+private ReadableConfig mergeConfigurableOptions(ReadableConfig base, ReadableConfig onTop) {
+if (base == null) {
+base = new Configuration();
+}
+if (onTop == null) {

Review comment:
   Actually, the `onTop` cannot be null.
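
   A hedged, reduced sketch of the contract discussed here, using Flink's `Preconditions` (names are from the quoted diff; the body is abbreviated to the null handling only):

      import org.apache.flink.configuration.Configuration;
      import org.apache.flink.configuration.ReadableConfig;
      import org.apache.flink.util.Preconditions;

      public class MergeOptionsSketch {
          // base may legitimately be absent and is defaulted; onTop is required.
          static ReadableConfig normalize(ReadableConfig base, ReadableConfig onTop) {
              Preconditions.checkNotNull(onTop, "onTop configuration must not be null");
              return base == null ? new Configuration() : base;
          }
      }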

##
File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBConfigurableOptions.java
##
@@ -55,23 +57,23 @@
 public static final ConfigOption<Integer> MAX_BACKGROUND_THREADS =
 key("state.backend.rocksdb.thread.num")
 .intType()
-.noDefaultValue()
+.defaultValue(1)

Review comment:
   I noticed that RocksDB has changed the default value to 2; I will create a hotfix for this.
   Once we fill `RocksDBConfigurableOptions` with default values, we should not say "RocksDB has a default configuration", as Flink then controls the configuration entirely.
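
   For illustration, a hedged sketch of the `ConfigOption` pattern this comment refers to; once Flink declares a default here, the effective value is controlled by Flink rather than by RocksDB:

      import org.apache.flink.configuration.ConfigOption;
      import org.apache.flink.configuration.ConfigOptions;

      public class ThreadNumOptionSketch {
          // Mirrors the option in the quoted diff; defaultValue(2) reflects the
          // hotfix mentioned above, not necessarily the merged code.
          public static final ConfigOption<Integer> MAX_BACKGROUND_THREADS =
                  ConfigOptions.key("state.backend.rocksdb.thread.num")
                          .intType()
                          .defaultValue(2);
      }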

##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBResourceContainer.java
##
@@ -245,6 +268,121 @@ private boolean 
overwriteFilterIfExist(BlockBasedTableConfig blockBasedTableConf
 return true;
 }
 
+/** Create a {@link DBOptions} for RocksDB, including some common settings. */
+DBOptions createDBOptions() {
+return new DBOptions().setUseFsync(false).setStatsDumpPeriodSec(0);
+}
+
+/** Create a {@link ColumnFamilyOptions} for RocksDB, including some common settings. */
+ColumnFamilyOptions createColumnOptions() {

Review comment:
   I think this method should be renamed to `createBaseColumnOptions` with `private` scope.

##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/EmbeddedRocksDBStateBackend.java
##
@@ -782,6 +791,29 @@ public void setWriteBatchSize(long writeBatchSize) {
 //  utilities
 // 

[GitHub] [flink] ZhangChaoming closed pull request #14978: [hotfix][flink-yarn] Modify options' description format.

2021-11-25 Thread GitBox


ZhangChaoming closed pull request #14978:
URL: https://github.com/apache/flink/pull/14978


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] ZhangChaoming closed pull request #14980: [hotfix][flink-streaming-java] Remove the reference of nonexistent class.

2021-11-25 Thread GitBox


ZhangChaoming closed pull request #14980:
URL: https://github.com/apache/flink/pull/14980


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] yunfengzhou-hub commented on a change in pull request #32: [FLINK-24817] Add Naive Bayes implementation

2021-11-25 Thread GitBox


yunfengzhou-hub commented on a change in pull request #32:
URL: https://github.com/apache/flink-ml/pull/32#discussion_r757211597



##
File path: flink-ml-lib/src/test/java/org/apache/flink/ml/classification/NaiveBayesTest.java
##
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.ml.classification.naivebayes.NaiveBayes;
+import org.apache.flink.ml.classification.naivebayes.NaiveBayesModel;
+import org.apache.flink.ml.classification.naivebayes.NaiveBayesModelData;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.ml.util.StageTestUtils;
+import org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.Schema;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.CloseableIterator;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+
+/** Tests {@link NaiveBayes} and {@link NaiveBayesModel}. */
+public class NaiveBayesTest {
+private StreamExecutionEnvironment env;
+private StreamTableEnvironment tEnv;
+private Table trainTable;
+private Table predictTable;
+private Map expectedOutput;
+private NaiveBayes estimator;
+
+@Before
+public void before() {
+Configuration config = new Configuration();
+config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, true);
+env = StreamExecutionEnvironment.getExecutionEnvironment(config);
+env.setParallelism(4);
+env.enableCheckpointing(100);
+env.setRestartStrategy(RestartStrategies.noRestart());
+tEnv = StreamTableEnvironment.create(env);
+
+Schema trainSchema =
+Schema.newBuilder()
+.column("f0", DataTypes.of(DenseVector.class))
+.column("f1", DataTypes.INT())
+.columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
+.watermark("rowtime", "SOURCE_WATERMARK()")
+.build();
+
+trainTable =
+tEnv.fromDataStream(
+env.fromElements(
+Row.of(Vectors.dense(0, 0.), 11),
+Row.of(Vectors.dense(1, 0), 10),
+Row.of(Vectors.dense(1, 1.), 10))
+.assignTimestampsAndWatermarks(
+WatermarkStrategy.noWatermarks()),
+trainSchema)
+.as("features", "label");
+
+Schema predictSchema =
+Schema.newBuilder()
+.column("f0", DataTypes.of(DenseVector.class))
+.columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
+.watermark("rowtime", "SOURCE_WATERMARK()")
+.build();
+
+predictTable =
+tEnv.fromDataStream(
+env.fromElements(
+Row.of(Vectors.dense(0, 1.)),
+Row.of(Vectors.dense(0, 0.)),
+Row.of(Vectors.dense(1, 0)),
+Row.of(Vectors.dense(1, 1.)))
+

[GitHub] [flink] flinkbot edited a comment on pull request #17598: [FLINK-24703][connectors][formats] Add CSV format support for filesystem based on StreamFormat and BulkWriter interfaces.

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17598:
URL: https://github.com/apache/flink/pull/17598#issuecomment-954282485


   
   ## CI report:
   
   * 12d370196fdbe6ff098e8c93512b90fb774aa5c6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27098)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17761: [FLINK-24862][Connectors / Hive]Fix user-defined hive udaf/udtf cannot be used normally in hive dialect

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17761:
URL: https://github.com/apache/flink/pull/17761#issuecomment-965943227


   
   ## CI report:
   
   * d77e81463fafbc26a61207f3246bb82ecdce88ff Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26517)
 
   * 6b9562e4437590efafbfbc1d714f537c0567fba9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27101)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17598: [FLINK-24703][connectors][formats] Add CSV format support for filesystem based on StreamFormat and BulkWriter interfaces.

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17598:
URL: https://github.com/apache/flink/pull/17598#issuecomment-954282485


   
   ## CI report:
   
   * 12d370196fdbe6ff098e8c93512b90fb774aa5c6 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27098)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17918: [FLINK-25065][docs] Update document for jdbc connector

2021-11-25 Thread GitBox


flinkbot edited a comment on pull request #17918:
URL: https://github.com/apache/flink/pull/17918#issuecomment-979524868


   
   ## CI report:
   
   * b846e608190b1f030040c3670a3259a81d127258 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=27097)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



