[jira] [Created] (FLINK-28574) Bump the fabric8 kubernetes-client to 6.0.0
ConradJam created FLINK-28574: - Summary: Bump the fabric8 kubernetes-client to 6.0.0 Key: FLINK-28574 URL: https://issues.apache.org/jira/browse/FLINK-28574 Project: Flink Issue Type: Bug Components: Kubernetes Operator Affects Versions: kubernetes-operator-1.1.0 Reporter: ConradJam fabric8 kubernetes-client [6.0.0|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.0.0] has now been released. We can later upgrade to this version and remove the deprecated API usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
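For context, a hedged sketch of the most visible client-construction change in 6.0.0: fabric8 deprecates DefaultKubernetesClient in favor of KubernetesClientBuilder. Whether the operator uses this exact pattern is an assumption here, not something stated in the ticket:

{code}
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public final class ClientFactory {
    public static KubernetesClient create(Config config) {
        // Pre-6.0 style, deprecated in 6.0.0:
        //   KubernetesClient client = new DefaultKubernetesClient(config);
        // 6.0.0 builder-based construction:
        return new KubernetesClientBuilder().withConfig(config).build();
    }
}
{code}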
[RESULT][VOTE] Creating benchmark channel in Apache Flink slack
Hi everyone, I’m happy to announce that 'Creating new benchmark Slack channel' has been accepted, with 7 approving votes, 5 of which are binding:

- Yun Gao (binding)
- Yuan Mei (binding)
- Godfrey He
- Marton Balassi (binding)
- Alexander Fedulov
- Yun Tang (binding)
- Jing Zhang (binding)

There is no disapproving vote. Thanks everyone for the votes. Márton, I will reach out to you later about handling the admin steps. -- Best regards, Anton Kalashnikov
[DISCUSS] FLIP-252: Amazon DynamoDB Sink Connector
Hello all, We would like to start a discussion thread on FLIP-252: Amazon DynamoDB Sink Connector [1] where we propose to provide a sink connector for Amazon DynamoDB [2] based on the Async Sink [3]. Looking forward to comments and feedback. Thank you. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector [2] https://aws.amazon.com/dynamodb [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
[jira] [Created] (FLINK-28573) Nested type will lose nullability when converting from TableSchema
Jane Chan created FLINK-28573: - Summary: Nested type will lose nullability when converting from TableSchema Key: FLINK-28573 URL: https://issues.apache.org/jira/browse/FLINK-28573 Project: Flink Issue Type: Bug Components: Table Store Affects Versions: table-store-0.2.0 Reporter: Jane Chan Fix For: table-store-0.2.0 E.g., ArrayDataType, MultisetDataType, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD
Hi Matyas! So to clarify your suggestion, we would have the following JobStatus fields:

- jobId : String
- state : String
- savepointInfo : SavepointInfo
- jobDetailsInfo : String (optional) - output of Flink Rest API job details

And the user could configure with a flag whether or not to include jobDetailsInfo in the status.

Cheers,
Gyula

On Fri, Jul 15, 2022 at 3:02 PM Őrhidi Mátyás wrote:
> Hi Gyula,
>
> since the jobDetailsInfo could evolve, another option would be to dump it as yaml/json into the metadata.
>
> Best,
> Matyas
>
> On Fri, Jul 15, 2022 at 2:58 PM Gyula Fóra wrote:
> > Based on some further thought, a reasonable middle ground would be to add an optional metadata/jobDetailsInfo field to the JobStatus. We would also add an accompanying config option (default false) on whether to populate this field for jobs.
> >
> > This way operator users could decide if they want to expose the job information provided by the Flink Rest API or only the information that the operator itself needs.
> >
> > What do you all think?
> >
> > Gyula
> >
> > On Fri, Jul 15, 2022 at 2:09 PM Gyula Fóra wrote:
> > > Hi All!
> > >
> > > I fully acknowledge the general need to access more info about the running deployments. This need however is very specific to the use-cases / platforms built on the operator. I think we need a good way to tackle this without growing the status arbitrarily.
> > >
> > > Currently the JobStatus in the operator contains the following fields:
> > > - jobId
> > > - state : Flink JobStatus
> > > - savepointInfo : Operator savepoint tracking info
> > > - startTime : Flink job startTime
> > > - updateTime : Last time state was updated in the operator
> > > - jobName : Name of the job
> > >
> > > Technically speaking only jobId, state and savepointInfo are used inside the operator logic; the rest is unnecessary and "could be removed" without affecting any operator functionality.
> > >
> > > I think instead of adding more of these "unnecessary/arbitrary" fields we should add a more generic way that allows a configurable / pluggable way to extend the status with user/platform specific fields based on the Flink job information. At the same time we should already @Deprecate / phase out the currently unnecessary fields.
> > >
> > > One way of doing this would be adding a new Map metadata (or similar) field, and at the same time adding a configurable / pluggable way to create the content of this metadata based on the Flink rest api response (the extended job details).
> > >
> > > What do you think?
> > > Gyula
> > >
> > > On Fri, Jul 15, 2022 at 1:05 PM WONG, DAREN wrote:
> > > > Hi Martijn,
> > > >
> > > > Yes, that's understandable. I think adding job endTime, duration, jobPlan is useful to other Flink users too as they now have info to track:
> > > >
> > > > 1. endTime: If the job has ended, the user can know when it has ended. If the job is still streaming, then the user can know, as it defaults to "-1".
> > > > 2. duration: Info on how long the job has been running for, useful for monitoring purposes.
> > > > 3. jobPlan: Contains more detailed job info such as the operators in the job graph and the parallelism of each operator. This could benefit Flink users as follows:
> > > > 3.1. Helps users get a quick view of jobs simply by querying via the k8s API, without the need to integrate with the Flink Client/API. Useful for users who mainly use kubectl.
> > > > 3.2. Allows users to easily notice a change in the job. For example, if a user changed the job code by adding a new operator but built it with the same jar name, they can notice the change in jobPlan.
> > > > 3.3. Users may want to act on jobPlan differences. For example, create difference notifications, allocate resources, or other automation purposes.
> > > >
> > > > In general, I think adding this info is useful for Flink users, from simple monitoring to audit trail purposes. In addition, this info is available via the Flink REST API, hence I believe Flink users who track this info via the API would benefit from it when they start using the Flink Kubernetes Operator.
> > > >
> > > > Regards,
> > > > Daren
> > > >
> > > > On 13/07/2022, 08:25, "Martijn Visser" wrote:
> > > > > Hi Daren,
> > > > >
> > > > > Could you list the benefits for the users of Flink? I do think that an internal AWS requirement is not a good argument for getting something done in Flink.
> > > > >
> > > > > Best regards,
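To make the shape concrete, a minimal sketch of the status discussed above, written as a plain CRD POJO; the fields mirror this thread and nothing here is final operator API:

{code}
// Illustrative only: the field set follows the discussion, not a committed CRD schema.
public class JobStatus {
    private String jobId;                 // Flink job id
    private String state;                 // Flink job state
    private SavepointInfo savepointInfo;  // operator savepoint tracking info

    // Optional dump of the Flink REST API job details, populated only when an
    // operator config flag (default false) enables it.
    private String jobDetailsInfo;

    // getters/setters omitted
}

// Placeholder for the operator's existing savepoint tracking type.
class SavepointInfo {}
{code}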
Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD
Hi Gyula,

since the jobDetailsInfo could evolve, another option would be to dump it as yaml/json into the metadata.

Best,
Matyas

On Fri, Jul 15, 2022 at 2:58 PM Gyula Fóra wrote:
> Based on some further thought, a reasonable middle ground would be to add an optional metadata/jobDetailsInfo field to the JobStatus. We would also add an accompanying config option (default false) on whether to populate this field for jobs.
>
> This way operator users could decide if they want to expose the job information provided by the Flink Rest API or only the information that the operator itself needs.
>
> What do you all think?
>
> Gyula
>
> On Fri, Jul 15, 2022 at 2:09 PM Gyula Fóra wrote:
> > Hi All!
> >
> > I fully acknowledge the general need to access more info about the running deployments. This need however is very specific to the use-cases / platforms built on the operator. I think we need a good way to tackle this without growing the status arbitrarily.
> >
> > Currently the JobStatus in the operator contains the following fields:
> > - jobId
> > - state : Flink JobStatus
> > - savepointInfo : Operator savepoint tracking info
> > - startTime : Flink job startTime
> > - updateTime : Last time state was updated in the operator
> > - jobName : Name of the job
> >
> > Technically speaking only jobId, state and savepointInfo are used inside the operator logic; the rest is unnecessary and "could be removed" without affecting any operator functionality.
> >
> > I think instead of adding more of these "unnecessary/arbitrary" fields we should add a more generic way that allows a configurable / pluggable way to extend the status with user/platform specific fields based on the Flink job information. At the same time we should already @Deprecate / phase out the currently unnecessary fields.
> >
> > One way of doing this would be adding a new Map metadata (or similar) field, and at the same time adding a configurable / pluggable way to create the content of this metadata based on the Flink rest api response (the extended job details).
> >
> > What do you think?
> > Gyula
> >
> > On Fri, Jul 15, 2022 at 1:05 PM WONG, DAREN wrote:
> > > Hi Martijn,
> > >
> > > Yes, that's understandable. I think adding job endTime, duration, jobPlan is useful to other Flink users too as they now have info to track:
> > >
> > > 1. endTime: If the job has ended, the user can know when it has ended. If the job is still streaming, then the user can know, as it defaults to "-1".
> > > 2. duration: Info on how long the job has been running for, useful for monitoring purposes.
> > > 3. jobPlan: Contains more detailed job info such as the operators in the job graph and the parallelism of each operator. This could benefit Flink users as follows:
> > > 3.1. Helps users get a quick view of jobs simply by querying via the k8s API, without the need to integrate with the Flink Client/API. Useful for users who mainly use kubectl.
> > > 3.2. Allows users to easily notice a change in the job. For example, if a user changed the job code by adding a new operator but built it with the same jar name, they can notice the change in jobPlan.
> > > 3.3. Users may want to act on jobPlan differences. For example, create difference notifications, allocate resources, or other automation purposes.
> > >
> > > In general, I think adding this info is useful for Flink users, from simple monitoring to audit trail purposes. In addition, this info is available via the Flink REST API, hence I believe Flink users who track this info via the API would benefit from it when they start using the Flink Kubernetes Operator.
> > >
> > > Regards,
> > > Daren
> > >
> > > On 13/07/2022, 08:25, "Martijn Visser" wrote:
> > > > Hi Daren,
> > > >
> > > > Could you list the benefits for the users of Flink? I do think that an internal AWS requirement is not a good argument for getting something done in Flink.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Tue, 12 Jul 2022 at 21:17, WONG, DAREN wrote:
> > > > > Hi Yang,
> > > > >
> > > > > The requirement to add *plan* currently originates from an internal AWS requirement as our service needs visibility of *plan*, but we think it could be beneficial as well to customers who use *plan* too.
> > > > >
> > > > > Regards,
> > > > > Daren
> > > > >
> > > > > On 12/07/2022, 13:23, "Yang Wang" wrote:
[jira] [Created] (FLINK-28572) FlinkSQL executes Table.execute multiple times on the same Table, and changing the Table.execute code position will produce different phenomena
Andrew Chan created FLINK-28572: --- Summary: FlinkSQL executes Table.execute multiple times on the same Table, and changing the Table.execute code position will produce different phenomena Key: FLINK-28572 URL: https://issues.apache.org/jira/browse/FLINK-28572 Project: Flink Issue Type: Bug Components: Table SQL / API, Table SQL / Planner Affects Versions: 1.13.6 Environment: flink-table-planner-blink_2.11 1.13.6 Reporter: Andrew Chan

*The following code both prints and inserts fine:*

{code}
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port", 2000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    tEnv.executeSql("create table sensor(" +
            " id string, " +
            " ts bigint, " +
            " vc int" +
            ")with(" +
            " 'connector' = 'kafka', " +
            " 'topic' = 's1', " +
            " 'properties.bootstrap.servers' = 'hadoop162:9092', " +
            " 'properties.group.id' = 'atguigu', " +
            " 'scan.startup.mode' = 'latest-offset', " +
            " 'format' = 'csv'" +
            ")");
    Table result = tEnv.sqlQuery("select * from sensor");
    tEnv.executeSql("create table s_out(" +
            " id string, " +
            " ts bigint, " +
            " vc int" +
            ")with(" +
            " 'connector' = 'kafka', " +
            " 'topic' = 's2', " +
            " 'properties.bootstrap.servers' = 'hadoop162:9092', " +
            " 'format' = 'json', " +
            " 'sink.partitioner' = 'round-robin' " +
            ")");
    result.executeInsert("s_out");
    result.execute().print();
}
{code}

*When the print line is moved up, it prints normally, but the insert statement takes no effect, as follows:*

{code}
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port", 2000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    // 1. Create the (dynamic) table via DDL, associated with the file
    tEnv.executeSql("create table sensor(" +
            " id string, " +
            " ts bigint, " +
            " vc int" +
            ")with(" +
            " 'connector' = 'kafka', " +
            " 'topic' = 's1', " +
            " 'properties.bootstrap.servers' = 'hadoop162:9092', " +
            " 'properties.group.id' = 'atguigu', " +
            " 'scan.startup.mode' = 'latest-offset', " +
            " 'format' = 'csv'" +
            ")");
    Table result = tEnv.sqlQuery("select * from sensor");
    tEnv.executeSql("create table s_out(" +
            " id string, " +
            " ts bigint, " +
            " vc int" +
            ")with(" +
            " 'connector' = 'kafka', " +
            " 'topic' = 's2', " +
            " 'properties.bootstrap.servers' = 'hadoop162:9092', " +
            " 'format' = 'json', " +
            " 'sink.partitioner' = 'round-robin' " +
            ")");
    result.execute().print();      // moved up
    result.executeInsert("s_out"); // takes no effect now
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
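As an editorial aside, a hedged sketch of how both sinks can run in a single job, which sidesteps the order sensitivity above: bundle the inserts into one StatementSet. The 'console_out' print-connector table is an assumed addition for illustration, not part of the report:

{code}
// Assumes the tEnv, sensor source and s_out sink from the snippets above,
// plus an import of org.apache.flink.table.api.StatementSet.
tEnv.executeSql("create table console_out(" +
        " id string, " +
        " ts bigint, " +
        " vc int" +
        ") with ('connector' = 'print')");

StatementSet statementSet = tEnv.createStatementSet();
statementSet.addInsert("s_out", result);        // Kafka sink
statementSet.addInsert("console_out", result);  // console output instead of execute().print()
statementSet.execute();                         // submits one job covering both sinks
{code}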
Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD
Based on some further thought, a reasonable middle ground would be to add an optional metadata/jobDetailsInfo field to the JobStatus. We would also add an accompanying config option (default false) on whether to populate this field for jobs.

This way operator users could decide if they want to expose the job information provided by the Flink Rest API or only the information that the operator itself needs.

What do you all think?

Gyula

On Fri, Jul 15, 2022 at 2:09 PM Gyula Fóra wrote:
> Hi All!
>
> I fully acknowledge the general need to access more info about the running deployments. This need however is very specific to the use-cases / platforms built on the operator. I think we need a good way to tackle this without growing the status arbitrarily.
>
> Currently the JobStatus in the operator contains the following fields:
> - jobId
> - state : Flink JobStatus
> - savepointInfo : Operator savepoint tracking info
> - startTime : Flink job startTime
> - updateTime : Last time state was updated in the operator
> - jobName : Name of the job
>
> Technically speaking only jobId, state and savepointInfo are used inside the operator logic; the rest is unnecessary and "could be removed" without affecting any operator functionality.
>
> I think instead of adding more of these "unnecessary/arbitrary" fields we should add a more generic way that allows a configurable / pluggable way to extend the status with user/platform specific fields based on the Flink job information. At the same time we should already @Deprecate / phase out the currently unnecessary fields.
>
> One way of doing this would be adding a new Map metadata (or similar) field, and at the same time adding a configurable / pluggable way to create the content of this metadata based on the Flink rest api response (the extended job details).
>
> What do you think?
> Gyula
>
> On Fri, Jul 15, 2022 at 1:05 PM WONG, DAREN wrote:
> > Hi Martijn,
> >
> > Yes, that's understandable. I think adding job endTime, duration, jobPlan is useful to other Flink users too as they now have info to track:
> >
> > 1. endTime: If the job has ended, the user can know when it has ended. If the job is still streaming, then the user can know, as it defaults to "-1".
> > 2. duration: Info on how long the job has been running for, useful for monitoring purposes.
> > 3. jobPlan: Contains more detailed job info such as the operators in the job graph and the parallelism of each operator. This could benefit Flink users as follows:
> > 3.1. Helps users get a quick view of jobs simply by querying via the k8s API, without the need to integrate with the Flink Client/API. Useful for users who mainly use kubectl.
> > 3.2. Allows users to easily notice a change in the job. For example, if a user changed the job code by adding a new operator but built it with the same jar name, they can notice the change in jobPlan.
> > 3.3. Users may want to act on jobPlan differences. For example, create difference notifications, allocate resources, or other automation purposes.
> >
> > In general, I think adding this info is useful for Flink users, from simple monitoring to audit trail purposes. In addition, this info is available via the Flink REST API, hence I believe Flink users who track this info via the API would benefit from it when they start using the Flink Kubernetes Operator.
> >
> > Regards,
> > Daren
> >
> > On 13/07/2022, 08:25, "Martijn Visser" wrote:
> > > Hi Daren,
> > >
> > > Could you list the benefits for the users of Flink? I do think that an internal AWS requirement is not a good argument for getting something done in Flink.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Tue, 12 Jul 2022 at 21:17, WONG, DAREN wrote:
> > > > Hi Yang,
> > > >
> > > > The requirement to add *plan* currently originates from an internal AWS requirement as our service needs visibility of *plan*, but we think it could be beneficial as well to customers who use *plan* too.
> > > >
> > > > Regards,
> > > > Daren
> > > >
> > > > On 12/07/2022, 13:23, "Yang Wang" wrote:
> > > > > Thanks for the explanation. Only having 1 API call in most cases makes sense to me.
> > > > >
> > > > > Could you please elaborate more on why we need the *plan* in the CR status?
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > On Tue, Jul 12, 2022 at 17:36, Gyula Fóra wrote:
Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD
Hi All!

I fully acknowledge the general need to access more info about the running deployments. This need however is very specific to the use-cases / platforms built on the operator. I think we need a good way to tackle this without growing the status arbitrarily.

Currently the JobStatus in the operator contains the following fields:

- jobId
- state : Flink JobStatus
- savepointInfo : Operator savepoint tracking info
- startTime : Flink job startTime
- updateTime : Last time state was updated in the operator
- jobName : Name of the job

Technically speaking only jobId, state and savepointInfo are used inside the operator logic; the rest is unnecessary and "could be removed" without affecting any operator functionality.

I think instead of adding more of these "unnecessary/arbitrary" fields we should add a more generic way that allows a configurable / pluggable way to extend the status with user/platform specific fields based on the Flink job information. At the same time we should already @Deprecate / phase out the currently unnecessary fields.

One way of doing this would be adding a new Map metadata (or similar) field, and at the same time adding a configurable / pluggable way to create the content of this metadata based on the Flink rest api response (the extended job details).

What do you think?
Gyula

On Fri, Jul 15, 2022 at 1:05 PM WONG, DAREN wrote:
> Hi Martijn,
>
> Yes, that's understandable. I think adding job endTime, duration, jobPlan is useful to other Flink users too as they now have info to track:
>
> 1. endTime: If the job has ended, the user can know when it has ended. If the job is still streaming, then the user can know, as it defaults to "-1".
> 2. duration: Info on how long the job has been running for, useful for monitoring purposes.
> 3. jobPlan: Contains more detailed job info such as the operators in the job graph and the parallelism of each operator. This could benefit Flink users as follows:
> 3.1. Helps users get a quick view of jobs simply by querying via the k8s API, without the need to integrate with the Flink Client/API. Useful for users who mainly use kubectl.
> 3.2. Allows users to easily notice a change in the job. For example, if a user changed the job code by adding a new operator but built it with the same jar name, they can notice the change in jobPlan.
> 3.3. Users may want to act on jobPlan differences. For example, create difference notifications, allocate resources, or other automation purposes.
>
> In general, I think adding this info is useful for Flink users, from simple monitoring to audit trail purposes. In addition, this info is available via the Flink REST API, hence I believe Flink users who track this info via the API would benefit from it when they start using the Flink Kubernetes Operator.
>
> Regards,
> Daren
>
> On 13/07/2022, 08:25, "Martijn Visser" wrote:
> > Hi Daren,
> >
> > Could you list the benefits for the users of Flink? I do think that an internal AWS requirement is not a good argument for getting something done in Flink.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Tue, 12 Jul 2022 at 21:17, WONG, DAREN wrote:
> > > Hi Yang,
> > >
> > > The requirement to add *plan* currently originates from an internal AWS requirement as our service needs visibility of *plan*, but we think it could be beneficial as well to customers who use *plan* too.
> > >
> > > Regards,
> > > Daren
> > >
> > > On 12/07/2022, 13:23, "Yang Wang" wrote:
> > > > Thanks for the explanation. Only having 1 API call in most cases makes sense to me.
> > > >
> > > > Could you please elaborate more on why we need the *plan* in the CR status?
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > On Tue, Jul 12, 2022 at 17:36, Gyula Fóra wrote:
> > > > > Hi Devs!
> > > > >
> > > > > I discussed with Daren offline, and I agree with him that technically we almost never need 2 API calls.
> > > > >
> > > > > I think it's fine to have a second API call once directly after application submission (technically even this can be eliminated by always setting a fixed job id).
> > > > >
> > > > > +1 from me.
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > On Tue, Jul 12, 2022 at 11:32 AM WONG, DAREN wrote:
[jira] [Created] (FLINK-28571) Add Chi-squared test as Transformer to ml.feature
Simon Tao created FLINK-28571: - Summary: Add Chi-squared test as Transformer to ml.feature Key: FLINK-28571 URL: https://issues.apache.org/jira/browse/FLINK-28571 Project: Flink Issue Type: New Feature Components: Library / Machine Learning Reporter: Simon Tao Pearson's chi-squared test: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test For more information on chi-squared tests: http://en.wikipedia.org/wiki/Chi-squared_test -- This message was sent by Atlassian Jira (v8.20.10#820010)
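For reference, the statistic behind the linked test (the standard textbook definition, independent of the proposed Transformer) compares observed counts O_i against expected counts E_i over k categories:

{code}
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
{code}

Larger values indicate a stronger deviation of the observed distribution from the expected one.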
Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD
Hi Martijn,

Yes, that's understandable. I think adding job endTime, duration, jobPlan is useful to other Flink users too as they now have info to track:

1. endTime: If the job has ended, the user can know when it has ended. If the job is still streaming, then the user can know, as it defaults to "-1".
2. duration: Info on how long the job has been running for, useful for monitoring purposes.
3. jobPlan: Contains more detailed job info such as the operators in the job graph and the parallelism of each operator. This could benefit Flink users as follows:
3.1. Helps users get a quick view of jobs simply by querying via the k8s API, without the need to integrate with the Flink Client/API. Useful for users who mainly use kubectl.
3.2. Allows users to easily notice a change in the job. For example, if a user changed the job code by adding a new operator but built it with the same jar name, they can notice the change in jobPlan.
3.3. Users may want to act on jobPlan differences. For example, create difference notifications, allocate resources, or other automation purposes.

In general, I think adding this info is useful for Flink users, from simple monitoring to audit trail purposes. In addition, this info is available via the Flink REST API, hence I believe Flink users who track this info via the API would benefit from it when they start using the Flink Kubernetes Operator.

Regards,
Daren

On 13/07/2022, 08:25, "Martijn Visser" wrote:
> Hi Daren,
>
> Could you list the benefits for the users of Flink? I do think that an internal AWS requirement is not a good argument for getting something done in Flink.
>
> Best regards,
>
> Martijn
>
> On Tue, 12 Jul 2022 at 21:17, WONG, DAREN wrote:
> > Hi Yang,
> >
> > The requirement to add *plan* currently originates from an internal AWS requirement as our service needs visibility of *plan*, but we think it could be beneficial as well to customers who use *plan* too.
> >
> > Regards,
> > Daren
> >
> > On 12/07/2022, 13:23, "Yang Wang" wrote:
> > > Thanks for the explanation. Only having 1 API call in most cases makes sense to me.
> > >
> > > Could you please elaborate more on why we need the *plan* in the CR status?
> > >
> > > Best,
> > > Yang
> > >
> > > On Tue, Jul 12, 2022 at 17:36, Gyula Fóra wrote:
> > > > Hi Devs!
> > > >
> > > > I discussed with Daren offline, and I agree with him that technically we almost never need 2 API calls.
> > > >
> > > > I think it's fine to have a second API call once directly after application submission (technically even this can be eliminated by always setting a fixed job id).
> > > >
> > > > +1 from me.
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > On Tue, Jul 12, 2022 at 11:32 AM WONG, DAREN wrote:
> > > > > Hi Matyas,
> > > > >
> > > > > Thanks for the feedback, and yes I agree. An alternative approach would instead be:
> > > > >
> > > > > - 2 API calls only when the jobID is not available (i.e. when submitting a new application cluster, which is a one-off event).
> > > > > - 1 API call when the jobID is already available, by directly calling "/jobs/:jobid".
> > > > >
> > > > > With this approach, we can keep the API calls to 1 in most cases.
> > > > >
> > > > > Regards,
> > > > > Daren
> > > > >
> > > > > On 11/07/2022, 14:44, "Őrhidi Mátyás" wrote:
> > > > > > Hi Daren,
> > > > > >
> > > > > > At the moment the Operator fetches the job state via https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-overview which contains the 'end-time' and 'duration' fields already. I feel calling the https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid after the previous call for every job in every reconcile loop would be too
Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions
Hi Jing,

Thanks for driving this, LGTM.

Best,
Godfrey

On Fri, Jul 15, 2022 at 11:38, Jingsong Li wrote:
> Thanks for starting this discussion.
>
> Have we considered introducing a listPartitionWithStats() in Catalog?
>
> Best,
> Jingsong
>
> On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote:
> > Hi Jing,
> >
> > Thanks for starting this discussion. The bulk fetch is a great improvement for the optimizer. The FLIP looks good to me.
> >
> > Best,
> > Jark
> >
> > On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote:
> > > Hi devs,
> > >
> > > After having multiple discussions with Jark and Godfrey, I'd like to start a discussion on the mailing list w.r.t. FLIP-247[1], which will significantly improve performance by providing a bulk fetch capability for table and column statistics.
> > >
> > > Currently the statistics information about tables can only be fetched from the catalog for each given partition iteratively. Since getting statistics information from catalogs is a very heavy operation, in order to improve query performance we'd better provide the functionality to fetch the statistics information of a table for all given partitions in one shot.
> > >
> > > Based on a manual performance test with 2000 partitions, the cost improves from 10s to 2s, i.e. a 5x speedup.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions
> > >
> > > Best regards,
> > > Jing
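To sketch what a bulk fetch could look like using existing Flink catalog types; the method names below follow the FLIP's direction but are illustrative and should be checked against the final FLIP-247 API:

{code}
import java.util.List;
import org.apache.flink.table.catalog.CatalogPartitionSpec;
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.catalog.exceptions.CatalogException;
import org.apache.flink.table.catalog.exceptions.PartitionNotExistException;
import org.apache.flink.table.catalog.stats.CatalogColumnStatistics;
import org.apache.flink.table.catalog.stats.CatalogTableStatistics;

/** Sketch: fetch statistics for all given partitions in one call instead of one call per partition. */
public interface BulkPartitionStatistics {

    List<CatalogTableStatistics> bulkGetPartitionStatistics(
            ObjectPath tablePath, List<CatalogPartitionSpec> partitionSpecs)
            throws PartitionNotExistException, CatalogException;

    List<CatalogColumnStatistics> bulkGetPartitionColumnStatistics(
            ObjectPath tablePath, List<CatalogPartitionSpec> partitionSpecs)
            throws PartitionNotExistException, CatalogException;
}
{code}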
[jira] [Created] (FLINK-28570) Introduces a StreamPhysicalPlanChecker to validate if there's any non-deterministic updates which may cause wrong result
lincoln lee created FLINK-28570: --- Summary: Introduces a StreamPhysicalPlanChecker to validate if there's any non-deterministic updates which may cause wrong result Key: FLINK-28570 URL: https://issues.apache.org/jira/browse/FLINK-28570 Project: Flink Issue Type: Sub-task Components: Table SQL / Planner Reporter: lincoln lee Fix For: 1.16.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28569) SinkUpsertMaterializer should be aware of the input upsertKey if it is not empty
lincoln lee created FLINK-28569: --- Summary: SinkUpsertMaterializer should be aware of the input upsertKey if it is not empty Key: FLINK-28569 URL: https://issues.apache.org/jira/browse/FLINK-28569 Project: Flink Issue Type: Sub-task Components: Table SQL / Runtime Reporter: lincoln lee Fix For: 1.16.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28568) Implements a new lookup join operator (sync mode only) with state to eliminate the non determinism
lincoln lee created FLINK-28568: --- Summary: Implements a new lookup join operator (sync mode only) with state to eliminate the non determinism Key: FLINK-28568 URL: https://issues.apache.org/jira/browse/FLINK-28568 Project: Flink Issue Type: Sub-task Components: Table SQL / Runtime Reporter: lincoln lee Fix For: 1.16.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28567) Introduce predicate inference from one side of join to the other
Alexander Trushev created FLINK-28567: - Summary: Introduce predicate inference from one side of join to the other Key: FLINK-28567 URL: https://issues.apache.org/jira/browse/FLINK-28567 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Reporter: Alexander Trushev

h2. Context
There is JoinPushTransitivePredicatesRule in Calcite that infers predicates on a Join and creates Filters if those predicates can be pushed to its inputs. *Example.* (a0 = b0 AND a0 > 0) => (b0 > 0)

h2. Proposal
Add org.apache.calcite.rel.rules.JoinPushTransitivePredicatesRule to FlinkStreamRuleSets and to FlinkBatchRuleSets

h2. Benefit
Before the changes:
{code}
Flink SQL> explain select * from A join B on a0 = b0 and a0 > 0;

Join(joinType=[InnerJoin], where=[=(a0, b0)], select=[a0, b0], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey])
:- Exchange(distribution=[hash[a0]])
:  +- Calc(select=[a0], where=[>(a0, 0)])
:     +- TableSourceScan(table=[[default_catalog, default_database, A, filter=[]]], fields=[a0])
+- Exchange(distribution=[hash[b0]])
   +- TableSourceScan(table=[[default_catalog, default_database, B]], fields=[b0])
{code}

After the changes:
{code}
Join(joinType=[InnerJoin], where=[=(a0, b0)], select=[a0, b0], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey])
:- Exchange(distribution=[hash[a0]])
:  +- Calc(select=[a0], where=[>(a0, 0)])
:     +- TableSourceScan(table=[[default_catalog, default_database, A, filter=[]]], fields=[a0])
+- Exchange(distribution=[hash[b0]])
   +- Calc(select=[b0], where=[>(b0, 0)])
      +- TableSourceScan(table=[[default_catalog, default_database, B, filter=[]]], fields=[b0])
{code}

i.e., b0 > 0 is inferred and pushed down to the B side.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
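As a rough illustration of the proposal (not the actual FlinkStreamRuleSets/FlinkBatchRuleSets code, which lives in Scala), registering the Calcite rule in a rule set could look like this, assuming a Calcite version that ships the CoreRules holder:

{code}
import org.apache.calcite.rel.rules.CoreRules;
import org.apache.calcite.tools.RuleSet;
import org.apache.calcite.tools.RuleSets;

public class TransitivePredicateRules {
    // Sketch: include JOIN_PUSH_TRANSITIVE_PREDICATES among the logical rules,
    // so (a0 = b0 AND a0 > 0) lets the planner derive b0 > 0 on the other side.
    public static RuleSet logicalRules() {
        return RuleSets.ofList(CoreRules.JOIN_PUSH_TRANSITIVE_PREDICATES);
    }
}
{code}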
[jira] [Created] (FLINK-28566) Adds materialization support to eliminate the non determinism generated by lookup join node
lincoln lee created FLINK-28566: --- Summary: Adds materialization support to eliminate the non determinism generated by lookup join node Key: FLINK-28566 URL: https://issues.apache.org/jira/browse/FLINK-28566 Project: Flink Issue Type: Sub-task Components: Table SQL / Planner Reporter: lincoln lee Fix For: 1.16.0

This aims to minimize the potential exceptions or data errors when users lookup join an external table from an update stream (essentially caused by the non-deterministic results of processing-time lookups against external tables). When updates exist in the input stream and the lookup key does not contain the primary key of the external table, Flink automatically adds materialization of the updates by default, so that it only looks up the external table when an insert or update_after message arrives; when a delete or update_before message arrives, it directly queries the latest version of the locally materialized data and sends it to the downstream operator.

To do so, we introduce a new option 'table.exec.lookup-join.upsert-materialize' and reuse the `UpsertMaterialize`. By default, the materialize operator is added when an update stream looks up an external table whose primary key is not fully contained in the lookup key (including the case where no primary key is defined). You can also choose no materialization (NONE) or forced materialization (FORCE), which always enables materialization unless the input is insert-only.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
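For concreteness, a minimal sketch of how a user might set this option programmatically; the option key is quoted from the issue above, and the FORCE/NONE values follow its description (assuming the option accepts them as plain strings):

{code}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinMaterializeExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Force the materialize operator for lookup joins on update streams;
        // "NONE" would disable it, and the default only materializes when the
        // lookup key does not contain the external table's primary key.
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.lookup-join.upsert-materialize", "FORCE");
    }
}
{code}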
[RESULT][VOTE] FLIP-249: Flink Web UI Enhancement for Speculative Execution
Hi everyone, I’m happy to announce that FLIP-249[1] has been accepted, with 4 approving votes, 3 of which are binding[2]:

- Zhu Zhu (binding)
- Lijie Wang
- Jing Zhang (binding)
- Yun Gao (binding)

There is no disapproving vote.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-249%3A+Flink+Web+UI+Enhancement+for+Speculative+Execution
[2] https://lists.apache.org/thread/0xonn4y8lkj49comors0z86tpyzkvvqg

Best, Gen
[jira] [Created] (FLINK-28565) Create NOTICE file for flink-table-store-hive-catalog
Jingsong Lee created FLINK-28565: Summary: Create NOTICE file for flink-table-store-hive-catalog Key: FLINK-28565 URL: https://issues.apache.org/jira/browse/FLINK-28565 Project: Flink Issue Type: Improvement Components: Table Store Reporter: Jingsong Lee Fix For: table-store-0.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28564) Update NOTICE/LICENCE files for 1.1.0 release
Matyas Orhidi created FLINK-28564: - Summary: Update NOTICE/LICENCE files for 1.1.0 release Key: FLINK-28564 URL: https://issues.apache.org/jira/browse/FLINK-28564 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Matyas Orhidi Fix For: kubernetes-operator-1.1.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28563) Add Transformer for VectorSlicer
weibo zhao created FLINK-28563: -- Summary: Add Transformer for VectorSlicer Key: FLINK-28563 URL: https://issues.apache.org/jira/browse/FLINK-28563 Project: Flink Issue Type: New Feature Components: Library / Machine Learning Reporter: weibo zhao h1. Add Transformer for VectorSlicer. -- This message was sent by Atlassian Jira (v8.20.10#820010)