[jira] [Created] (FLINK-19793) KafkaTableITCase.testKafkaSourceSinkWithMetadata fails on AZP

2020-10-23 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-19793:
-

 Summary: KafkaTableITCase.testKafkaSourceSinkWithMetadata fails on 
AZP
 Key: FLINK-19793
 URL: https://issues.apache.org/jira/browse/FLINK-19793
 Project: Flink
  Issue Type: Task
  Components: Connectors / Kafka, Table SQL / Ecosystem
Affects Versions: 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0


The {{KafkaTableITCase.testKafkaSourceSinkWithMetadata}} test seems to fail on AZP:

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8197&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Releasing Apache Flink 1.11.3

2020-10-23 Thread Till Rohrmann
Thanks for starting this discussion, Gordon. There are over 100 issues that
have been fixed for 1.11.3. Hence, +1 for a soonish 1.11.3 release. Thanks
for volunteering as our release managers, Gordon and Xintong!

Cheers,
Till

On Fri, Oct 23, 2020 at 5:02 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi,
>
> Xintong and I would like to start a discussion for releasing Flink 1.11.3
> soon.
>
> It seems like we already have a few pressing issues that need to be
> included in a new hotfix release:
>
>- Heap-based timers’ restore behaviour is causing a critical recovery
>issue for StateFun [1] [2] [3].
>- There are several robustness issues for the FLIP-27 new source API,
>such as [4]. We already have some users using the FLIP-27 API with
> 1.11.x,
>so it would be important to get those fixes in for 1.11.x as well.
>
> Apart from the issues that are already marked as blocker for 1.11.3 in our
> JIRA [5], please let us know in this thread if there is already ongoing
> work for other important fixes that we should try to include.
>
> Xintong and I would like to volunteer for managing this release, and will
> try to communicate the priority of pending blockers over the next few days.
> Since the aforementioned issues are quite critical, we’d like to aim
> for a *feature
> freeze by the end of next week (Oct. 30th)* and start the release voting
> process the week after.
> If that is too short of a notice and you might need more time, please let
> us know!
>
> Cheers,
> Gordon
>
> [1] https://issues.apache.org/jira/browse/FLINK-19692
> [2] https://issues.apache.org/jira/browse/FLINK-19741
> [3] https://issues.apache.org/jira/browse/FLINK-19748
> [4] https://issues.apache.org/jira/browse/FLINK-19717
> [5]
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%201.11.3
>


[DISCUSS] Releasing Apache Flink 1.11.3

2020-10-23 Thread Tzu-Li (Gordon) Tai
Hi,

Xintong and I would like to start a discussion for releasing Flink 1.11.3
soon.

It seems like we already have a few pressing issues that need to be
included in a new hotfix release:

   - Heap-based timers’ restore behaviour is causing a critical recovery
   issue for StateFun [1] [2] [3].
   - There are several robustness issues for the FLIP-27 new source API,
   such as [4]. We already have some users using the FLIP-27 API with 1.11.x,
   so it would be important to get those fixes in for 1.11.x as well.

Apart from the issues that are already marked as blocker for 1.11.3 in our
JIRA [5], please let us know in this thread if there is already ongoing
work for other important fixes that we should try to include.

Xintong and I would like to volunteer for managing this release, and will
try to communicate the priority of pending blockers over the next few days.
Since the aforementioned issues are quite critical, we’d like to aim
for a *feature
freeze by the end of next week (Oct. 30th)* and start the release voting
process the week after.
If that is too short of a notice and you might need more time, please let
us know!

Cheers,
Gordon

[1] https://issues.apache.org/jira/browse/FLINK-19692
[2] https://issues.apache.org/jira/browse/FLINK-19741
[3] https://issues.apache.org/jira/browse/FLINK-19748
[4] https://issues.apache.org/jira/browse/FLINK-19717
[5]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%201.11.3


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Robert Metzger
Hey Piyush,
thanks a lot for raising this concern. I believe we should then keep Mesos
in Flink for the foreseeable future.
Your offer to help is much appreciated. We'll let you know once there is
something.

On Fri, Oct 23, 2020 at 4:28 PM Piyush Narang  wrote:

> Thanks Kostas. If there are items we can help with, I'm sure we'd be able to
> find folks who would be excited to contribute / help in any way.
>
> -- Piyush
>
>
> On 10/23/20, 10:25 AM, "Kostas Kloudas"  wrote:
>
> Thanks Piyush for the message.
> After this, I revoke my +1. I agree with the previous opinions that we
> cannot drop code that is actively used by users, especially if it is
> something as deep in the stack as support for a cluster management
> framework.
>
> Cheers,
> Kostas
>
> On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang 
> wrote:
> >
> > Hi folks,
> >
> >
> >
> > We at Criteo are active users of the Flink on Mesos resource
> management component. We are pretty heavy users of Mesos for scheduling
> workloads on our edge datacenters and we do want to continue to be able to
> run some of our Flink topologies (to compute machine learning short term
> features) on those DCs. If possible our vote would be not to drop Mesos
> support as that will tie us to an old release / have to maintain a fork as
> we’re not planning to migrate off Mesos anytime soon. Is the burden
> something that can be helped with by the community? (Or are you referring
> to having to ensure PRs handle the Mesos piece as well when they touch the
> resource managers?)
> >
> >
> >
> > Thanks,
> >
> >
> >
> > -- Piyush
> >
> >
> >
> >
> >
> > From: Till Rohrmann 
> > Date: Friday, October 23, 2020 at 8:19 AM
> > To: Xintong Song 
> > Cc: dev , user 
> > Subject: Re: [SURVEY] Remove Mesos support
> >
> >
> >
> > Thanks for starting this survey Robert! I second Konstantin and
> Xintong in the sense that our Mesos users' opinions should matter most
> here. If our community is no longer using the Mesos integration, then I
> would be +1 for removing it in order to decrease the maintenance burden.
> >
> >
> >
> > Cheers,
> >
> > Till
> >
> >
> >
> > On Fri, Oct 23, 2020 at 2:03 PM Xintong Song 
> wrote:
> >
> > +1 for adding a warning in 1.12 about planning to remove Mesos
> support.
> >
> >
> >
> > With my developer hat on, removing the Mesos support would
> definitely reduce the maintaining overhead for the deployment and resource
> management related components. On the other hand, the Flink on Mesos users'
> voices definitely matter a lot for this community. Either way, it would be
> good to draw users' attention to this discussion early.
> >
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> >
> >
> > On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf 
> wrote:
> >
> > Hi Robert,
> >
> > +1 to the plan you outlined. If we were to drop support in Flink
> 1.13+, we
> > would still support it in Flink 1.12- with bug fixes for some time
> so that
> > users have time to move on.
> >
> > It would certainly be very interesting to hear from current Flink on
> Mesos
> > users, on how they see the evolution of this part of the ecosystem.
> >
> > Best,
> >
> > Konstantin
>
>
>


[jira] [Created] (FLINK-19792) Interval join with equal time attributes is not recognized

2020-10-23 Thread Timo Walther (Jira)
Timo Walther created FLINK-19792:


 Summary: Interval join with equal time attributes is not recognized
 Key: FLINK-19792
 URL: https://issues.apache.org/jira/browse/FLINK-19792
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Timo Walther


A user reported that an interval join with an equality predicate on the time 
attributes is not recognized; instead, a regular inner join is used.

For example:
{code}
table1 = table_env.from_path("table1")
table2 = table_env.from_path("table2")

print(table1.join(table2).where("ts = ts2 && id = id2").select("id, ts"))
{code}

The documentation clearly states that this should be supported:
{code}
For example, the following predicates are valid interval join conditions:

ltime === rtime
ltime >= rtime && ltime < rtime + 10.minutes
{code}
Source: 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/tableApi.html#joins

See also the discussion here:
https://stackoverflow.com/q/64445207/806430
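To make the expected planner behavior concrete, here is a minimal, hypothetical sketch in plain Python (this is not Flink's planner code; the function names and the operator representation are invented for illustration). The key observation is that an equality between two time attributes is a degenerate interval: `l.ts = r.ts` is equivalent to `l.ts >= r.ts AND l.ts <= r.ts`, i.e. a window with bounds [0, 0], so it should qualify as an interval join condition:

```python
# Hypothetical sketch, NOT Flink's planner code: why an equality predicate
# between two time attributes should be planned as an interval join.

def time_bounds(op):
    """Map a comparison between the left and right time attributes to the
    (lower, upper) relative-time window it implies (None = unbounded)."""
    return {
        "=":  (0, 0),      # equality: a window of size zero
        ">=": (0, None),   # bounds the window from below only
        "<=": (None, 0),   # bounds the window from above only
    }.get(op)

def is_interval_join(ops):
    """A conjunction of time-attribute comparisons qualifies as an interval
    join only if the window is bounded on both sides."""
    lower = upper = False
    for op in ops:
        bounds = time_bounds(op)
        if bounds is None:
            continue
        lower = lower or bounds[0] is not None
        upper = upper or bounds[1] is not None
    return lower and upper

# "ts = ts2" alone bounds both sides, so the reported query should use an
# interval join (bounded state) rather than a regular inner join.
print(is_interval_join(["="]))        # True
print(is_interval_join([">="]))       # False: upper bound missing
```

Under this reading, falling back to a regular inner join for `ts = ts2` looks like a missed pattern match for the degenerate [0, 0] window, consistent with the `ltime === rtime` example in the documentation.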



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Piyush Narang
Thanks Kostas. If there are items we can help with, I'm sure we'd be able to find 
folks who would be excited to contribute / help in any way.

-- Piyush
 

On 10/23/20, 10:25 AM, "Kostas Kloudas"  wrote:

Thanks Piyush for the message.
After this, I revoke my +1. I agree with the previous opinions that we
cannot drop code that is actively used by users, especially if it is
something as deep in the stack as support for a cluster management
framework.

Cheers,
Kostas

On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang  wrote:
>
> Hi folks,
>
>
>
> We at Criteo are active users of the Flink on Mesos resource management 
component. We are pretty heavy users of Mesos for scheduling workloads on our 
edge datacenters and we do want to continue to be able to run some of our Flink 
topologies (to compute machine learning short term features) on those DCs. If 
possible our vote would be not to drop Mesos support as that will tie us to an 
old release / have to maintain a fork as we’re not planning to migrate off 
Mesos anytime soon. Is the burden something that can be helped with by the 
community? (Or are you referring to having to ensure PRs handle the Mesos piece 
as well when they touch the resource managers?)
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>
>
>
> From: Till Rohrmann 
> Date: Friday, October 23, 2020 at 8:19 AM
> To: Xintong Song 
> Cc: dev , user 
> Subject: Re: [SURVEY] Remove Mesos support
>
>
>
> Thanks for starting this survey Robert! I second Konstantin and Xintong 
in the sense that our Mesos users' opinions should matter most here. If our 
community is no longer using the Mesos integration, then I would be +1 for 
removing it in order to decrease the maintenance burden.
>
>
>
> Cheers,
>
> Till
>
>
>
> On Fri, Oct 23, 2020 at 2:03 PM Xintong Song  
wrote:
>
> +1 for adding a warning in 1.12 about planning to remove Mesos support.
>
>
>
> With my developer hat on, removing the Mesos support would definitely 
reduce the maintaining overhead for the deployment and resource management 
related components. On the other hand, the Flink on Mesos users' voices 
definitely matter a lot for this community. Either way, it would be good to 
draw users' attention to this discussion early.
>
>
>
> Thank you~
>
> Xintong Song
>
>
>
>
>
> On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf  
wrote:
>
> Hi Robert,
>
> +1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
> would still support it in Flink 1.12- with bug fixes for some time so that
> users have time to move on.
>
> It would certainly be very interesting to hear from current Flink on Mesos
> users, on how they see the evolution of this part of the ecosystem.
>
> Best,
>
> Konstantin




Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Kostas Kloudas
Thanks Piyush for the message.
After this, I revoke my +1. I agree with the previous opinions that we
cannot drop code that is actively used by users, especially if it is
something as deep in the stack as support for a cluster management
framework.

Cheers,
Kostas

On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang  wrote:
>
> Hi folks,
>
>
>
> We at Criteo are active users of the Flink on Mesos resource management 
> component. We are pretty heavy users of Mesos for scheduling workloads on our 
> edge datacenters and we do want to continue to be able to run some of our 
> Flink topologies (to compute machine learning short term features) on those 
> DCs. If possible our vote would be not to drop Mesos support as that will tie 
> us to an old release / have to maintain a fork as we’re not planning to 
> migrate off Mesos anytime soon. Is the burden something that can be helped 
> with by the community? (Or are you referring to having to ensure PRs handle 
> the Mesos piece as well when they touch the resource managers?)
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>
>
>
> From: Till Rohrmann 
> Date: Friday, October 23, 2020 at 8:19 AM
> To: Xintong Song 
> Cc: dev , user 
> Subject: Re: [SURVEY] Remove Mesos support
>
>
>
> Thanks for starting this survey Robert! I second Konstantin and Xintong in 
> the sense that our Mesos users' opinions should matter most here. If our 
> community is no longer using the Mesos integration, then I would be +1 for 
> removing it in order to decrease the maintenance burden.
>
>
>
> Cheers,
>
> Till
>
>
>
> On Fri, Oct 23, 2020 at 2:03 PM Xintong Song  wrote:
>
> +1 for adding a warning in 1.12 about planning to remove Mesos support.
>
>
>
> With my developer hat on, removing the Mesos support would definitely reduce 
> the maintaining overhead for the deployment and resource management related 
> components. On the other hand, the Flink on Mesos users' voices definitely 
> matter a lot for this community. Either way, it would be good to draw users' 
> attention to this discussion early.
>
>
>
> Thank you~
>
> Xintong Song
>
>
>
>
>
> On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf  wrote:
>
> Hi Robert,
>
> +1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
> would still support it in Flink 1.12- with bug fixes for some time so that
> users have time to move on.
>
> It would certainly be very interesting to hear from current Flink on Mesos
> users, on how they see the evolution of this part of the ecosystem.
>
> Best,
>
> Konstantin


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Piyush Narang
Hi folks,

We at Criteo are active users of the Flink on Mesos resource management 
component. We are pretty heavy users of Mesos for scheduling workloads on our 
edge datacenters, and we do want to continue to be able to run some of our Flink 
topologies (to compute short-term machine learning features) on those DCs. If 
possible, our vote would be not to drop Mesos support, as that would tie us to an 
old release / force us to maintain a fork, since we're not planning to migrate 
off Mesos anytime soon. Is the burden something the community can help with? (Or 
are you referring to having to ensure PRs handle the Mesos piece as well when 
they touch the resource managers?)

Thanks,

-- Piyush


From: Till Rohrmann 
Date: Friday, October 23, 2020 at 8:19 AM
To: Xintong Song 
Cc: dev , user 
Subject: Re: [SURVEY] Remove Mesos support

Thanks for starting this survey Robert! I second Konstantin and Xintong in the 
sense that our Mesos users' opinions should matter most here. If our community 
is no longer using the Mesos integration, then I would be +1 for removing it in 
order to decrease the maintenance burden.

Cheers,
Till

On Fri, Oct 23, 2020 at 2:03 PM Xintong Song 
mailto:tonysong...@gmail.com>> wrote:
+1 for adding a warning in 1.12 about planning to remove Mesos support.



With my developer hat on, removing the Mesos support would definitely reduce 
the maintaining overhead for the deployment and resource management related 
components. On the other hand, the Flink on Mesos users' voices definitely 
matter a lot for this community. Either way, it would be good to draw users' 
attention to this discussion early.



Thank you~

Xintong Song


On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf 
mailto:kna...@apache.org>> wrote:
Hi Robert,

+1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
would still support it in Flink 1.12- with bug fixes for some time so that
users have time to move on.

It would certainly be very interesting to hear from current Flink on Mesos
users, on how they see the evolution of this part of the ecosystem.

Best,

Konstantin


[jira] [Created] (FLINK-19791) PartitionRequestClientFactoryTest.testInterruptsNotCached fails with NullPointerException

2020-10-23 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-19791:
--

 Summary: PartitionRequestClientFactoryTest.testInterruptsNotCached 
fails with NullPointerException
 Key: FLINK-19791
 URL: https://issues.apache.org/jira/browse/FLINK-19791
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.12.0
Reporter: Robert Metzger


https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8517&view=logs&j=6e58d712-c5cc-52fb-0895-6ff7bd56c46b&t=f30a8e80-b2cf-535c-9952-7f521a4ae374

{code}
2020-10-23T13:25:12.0774554Z [ERROR] testInterruptsNotCached(org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactoryTest)  Time elapsed: 0.762 s  <<< ERROR!
2020-10-23T13:25:12.0775695Z java.io.IOException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
2020-10-23T13:25:12.0776455Z   at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:95)
2020-10-23T13:25:12.0777038Z   at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactoryTest.testInterruptsNotCached(PartitionRequestClientFactoryTest.java:72)
2020-10-23T13:25:12.0777465Z   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-10-23T13:25:12.0777815Z   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-10-23T13:25:12.0778221Z   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-10-23T13:25:12.0778581Z   at java.lang.reflect.Method.invoke(Method.java:498)
2020-10-23T13:25:12.0778921Z   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-10-23T13:25:12.0779331Z   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-10-23T13:25:12.0779733Z   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-10-23T13:25:12.0780117Z   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-10-23T13:25:12.0780484Z   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2020-10-23T13:25:12.0780851Z   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2020-10-23T13:25:12.0781236Z   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2020-10-23T13:25:12.0781600Z   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-10-23T13:25:12.0781937Z   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-10-23T13:25:12.0782431Z   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2020-10-23T13:25:12.0782877Z   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2020-10-23T13:25:12.0783223Z   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2020-10-23T13:25:12.0783541Z   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2020-10-23T13:25:12.0783905Z   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
2020-10-23T13:25:12.0784315Z   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
2020-10-23T13:25:12.0784718Z   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
2020-10-23T13:25:12.0785125Z   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
2020-10-23T13:25:12.0785552Z   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
2020-10-23T13:25:12.0785980Z   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
2020-10-23T13:25:12.0786379Z   at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
2020-10-23T13:25:12.0786763Z   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
2020-10-23T13:25:12.0787922Z Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
2020-10-23T13:25:12.0788575Z   at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
2020-10-23T13:25:12.0788954Z   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
2020-10-23T13:25:12.0789431Z   at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:88)
2020-10-23T13:25:12.0789808Z   ... 26 more
{code}

[jira] [Created] (FLINK-19790) Writing MAP&lt;STRING, STRING&gt; to Kafka with JSON format produces incorrect data.

2020-10-23 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-19790:
-

 Summary: Writing MAP&lt;STRING, STRING&gt; to Kafka with JSON format 
produces incorrect data.
 Key: FLINK-19790
 URL: https://issues.apache.org/jira/browse/FLINK-19790
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Ecosystem
Affects Versions: 1.11.2
Reporter: Fabian Hueske


Running the following SQL script writes incorrect data to Kafka:
{code:java}
CREATE TEMPORARY TABLE tmp_1 (m MAP&lt;STRING, STRING&gt;) WITH (
  'connector' = 'kafka',
  'format' = 'json',
  'properties.bootstrap.servers' = '...',
  'properties.group.id' = '...',
  'topic' = 'tmp-1'
);

CREATE TEMPORARY TABLE gen (k STRING, v STRING) WITH (
  'connector' = 'datagen'
);

CREATE TEMPORARY VIEW gen_short AS
SELECT SUBSTR(k, 0, 4) AS k, SUBSTR(v, 0, 4) AS v FROM gen;

INSERT INTO tmp_1
SELECT MAP[k, v] FROM gen_short; {code}
Printing the content of the {{tmp-1}} topic results in the following output:
{code:java}
$ kafka-console-consumer --bootstrap-server ... --from-beginning --topic tmp-1 
| head -n 5
{"m":{"8a93":"6102"}}
{"m":{"8a93":"6102","7922":"f737"}}
{"m":{"8a93":"6102","7922":"f737","9b63":"15b0"}}
{"m":{"8a93":"6102","7922":"f737","9b63":"15b0","c38b":"b55c"}}
{"m":{"8a93":"6102","7922":"f737","9b63":"15b0","c38b":"b55c","222c":"f3e2"}}
{code}
As you can see, the map is not correctly encoded as JSON and written to Kafka.

I've run the query with the Blink planner with object reuse and operator 
pipelining disabled.
Writing with Avro works as expected.

Hence I assume that the JSON encoder/serializer reuses the Map object when 
encoding the JSON.
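The suspected reuse pattern can be illustrated with a minimal, hypothetical Python sketch (this is not Flink's actual JSON serializer; the function names are invented for illustration). If the serializer keeps one mutable map instance and only ever adds entries to it, each encoded record accumulates all previously seen keys, which is exactly the growth pattern visible in the {{tmp-1}} output above:

```python
import json

def encode_records_buggy(records):
    """Encode (key, value) pairs, reusing one map instance across records.
    Old entries are never cleared, so every record 'inherits' all earlier
    entries -- mirroring the accumulating output observed on the topic."""
    reused = {}
    out = []
    for k, v in records:
        reused[k] = v
        out.append(json.dumps({"m": reused}))
    return out

def encode_records_correct(records):
    """Encode each record with a fresh map, as the user would expect."""
    return [json.dumps({"m": {k: v}}) for k, v in records]

recs = [("8a93", "6102"), ("7922", "f737")]
print(encode_records_buggy(recs))    # second record carries both entries
print(encode_records_correct(recs))  # each record carries exactly one entry
```

This is consistent with Avro working correctly: a format whose writer copies or fully consumes the map per record would not show the accumulation.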

 

 
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Till Rohrmann
Thanks for starting this survey Robert! I second Konstantin and Xintong in
the sense that our Mesos users' opinions should matter most here. If our
community is no longer using the Mesos integration, then I would be +1 for
removing it in order to decrease the maintenance burden.

Cheers,
Till

On Fri, Oct 23, 2020 at 2:03 PM Xintong Song  wrote:

> +1 for adding a warning in 1.12 about planning to remove Mesos support.
>
>
> With my developer hat on, removing the Mesos support would
> definitely reduce the maintaining overhead for the deployment and resource
> management related components. On the other hand, the Flink on Mesos users'
> voices definitely matter a lot for this community. Either way, it would be
> good to draw users' attention to this discussion early.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf 
> wrote:
>
>> Hi Robert,
>>
>> +1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
>> would still support it in Flink 1.12- with bug fixes for some time so that
>> users have time to move on.
>>
>> It would certainly be very interesting to hear from current Flink on Mesos
>> users, on how they see the evolution of this part of the ecosystem.
>>
>> Best,
>>
>> Konstantin
>>
>


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Kostas Kloudas
+1 for adding a warning about the removal of Mesos support, and I would
also propose stating explicitly in the warning the version in which we
are planning to actually remove it (e.g. 1.13, or even 1.14 if we feel
that is too aggressive).

This will help as a reminder to users and devs about the upcoming
removal and it will avoid future, potentially endless, discussions.

Cheers,
Kostas

On Fri, Oct 23, 2020 at 2:03 PM Xintong Song  wrote:
>
> +1 for adding a warning in 1.12 about planning to remove Mesos support.
>
>
> With my developer hat on, removing the Mesos support would definitely reduce 
> the maintaining overhead for the deployment and resource management related 
> components. On the other hand, the Flink on Mesos users' voices definitely 
> matter a lot for this community. Either way, it would be good to draw users' 
> attention to this discussion early.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf  wrote:
>>
>> Hi Robert,
>>
>> +1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
>> would still support it in Flink 1.12- with bug fixes for some time so that
>> users have time to move on.
>>
>> It would certainly be very interesting to hear from current Flink on Mesos
>> users, on how they see the evolution of this part of the ecosystem.
>>
>> Best,
>>
>> Konstantin


[jira] [Created] (FLINK-19789) Migrate Hive connector to new table source sink interface

2020-10-23 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-19789:


 Summary: Migrate Hive connector to new table source sink interface
 Key: FLINK-19789
 URL: https://issues.apache.org/jira/browse/FLINK-19789
 Project: Flink
  Issue Type: Sub-task
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Xintong Song
+1 for adding a warning in 1.12 about planning to remove Mesos support.


With my developer hat on, removing the Mesos support would
definitely reduce the maintaining overhead for the deployment and resource
management related components. On the other hand, the Flink on Mesos users'
voices definitely matter a lot for this community. Either way, it would be
good to draw users' attention to this discussion early.


Thank you~

Xintong Song



On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf  wrote:

> Hi Robert,
>
> +1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
> would still support it in Flink 1.12- with bug fixes for some time so that
> users have time to move on.
>
> It would certainly be very interesting to hear from current Flink on Mesos
> users, on how they see the evolution of this part of the ecosystem.
>
> Best,
>
> Konstantin
>


Re: [VOTE] FLIP-135: Approximate Task-Local Recovery

2020-10-23 Thread Yuan Mei
Hey All,

The voting time for FLIP-135 has passed. I'm closing the vote now.

There are 6 +1 votes, 4 of which are binding:
- Piotr (binding)
- Zhijiang (binding)
- Steven (non-binding)
- Roman (non-binding)
- Yu (binding)
- Becket (binding)

There were no disapproving votes.

Thus, FLIP-135 has been accepted.

Thanks everyone for giving great feedback!

Best,
Yuan


On Fri, Oct 23, 2020 at 7:46 PM Becket Qin  wrote:

> +1 (binding)
>
> Thanks for the proposal and well written wiki, Yuan. It is a very useful
> feature.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Oct 21, 2020 at 11:31 AM Yu Li  wrote:
>
> > +1 (binding)
> >
> > Thanks for driving this Yuan! It definitely helps in relative scenarios,
> > especially in the machine learning area.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Wed, 21 Oct 2020 at 03:50, Khachatryan Roman <
> > khachatryan.ro...@gmail.com>
> > wrote:
> >
> > > +1 (non-binding).
> > >
> > > It would be a great improvement, thanks for the effort!
> > >
> > > Regards,
> > > Roman
> > >
> > >
> > > On Tue, Oct 20, 2020 at 4:49 PM Steven Wu 
> wrote:
> > >
> > > > +1 (non-binding).
> > > >
> > > > Some of our users have asked for this tradeoff of consistency over
> > > > availability for some cases.
> > > >
> > > > On Mon, Oct 19, 2020 at 8:02 PM Zhijiang  > > > .invalid>
> > > > wrote:
> > > >
> > > > > Thanks for driving this effort, Yuan.
> > > > >
> > > > > +1 (binding) on my side.
> > > > >
> > > > > Best,
> > > > > Zhijiang
> > > > >
> > > > >
> > > > > --
> > > > > From:Piotr Nowojski 
> > > > > Send Time: Monday, October 19, 2020, 21:02
> > > > > To:dev 
> > > > > Subject:Re: [VOTE] FLIP-135: Approximate Task-Local Recovery
> > > > >
> > > > > Hey,
> > > > >
> > > > > I carry over my +1 (binding) from the discussion thread.
> > > > >
> > > > > Best,
> > > > > Piotrek
> > > > >
> > > > > On Mon, Oct 19, 2020 at 14:56, Yuan Mei wrote:
> > > > >
> > > > > > Hey,
> > > > > >
> > > > > > I would like to start a voting thread for FLIP-135 [1], for
> > > approximate
> > > > > > task local recovery. The proposal has been discussed in [2].
> > > > > >
> > > > > > The vote will be open till Oct. 23rd (72h, excluding weekends)
> > unless
> > > > > there
> > > > > > is an objection or not enough votes.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-135+Approximate+Task-Local+Recovery
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-135-Approximate-Task-Local-Recovery-tp43930.html
> > > > > >
> > > > > >
> > > > > > Best
> > > > > >
> > > > > > Yuan
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Konstantin Knauf
Hi Robert,

+1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
would still support it in Flink 1.12- with bug fixes for some time so that
users have time to move on.

It would certainly be very interesting to hear from current Flink on Mesos
users, on how they see the evolution of this part of the ecosystem.

Best,

Konstantin


Re: [DISCUSS] Release 1.12 Feature Freeze

2020-10-23 Thread Yang Wang
+1 for Sunday night. This will help a lot for FLIP-144 (Native Kubernetes HA
for Flink).

Best,
Yang

On Fri, Oct 23, 2020 at 2:15 PM, Zhu Zhu wrote:

> +1 for November 2nd
>
> Thanks,
> Zhu
>
> On Tue, Oct 20, 2020 at 11:26 PM, Xintong Song wrote:
>
> > +1 for Nov. 2nd.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Oct 20, 2020 at 8:56 PM Leonard Xu  wrote:
> >
> > > +1 for **Sunday night**.
> > >
> > >
> > > Best
> > > Leonard
> > >
> > > > On Oct 20, 2020, at 19:03, Jingsong Li wrote:
> > > >
> > > > +1 for Sunday night. This also helps the Filesystem and Hive
> > > > implementations for the FLIP-27 Source, and that implementation will
> > > > block "Support Multiple Input for Blink Planner". Multiple input is
> > > > very important for batch performance.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Tue, Oct 20, 2020 at 6:36 PM Yu Li  wrote:
> > > >
> > > >> +1 for Sunday night. This also helps for more thorough testing of
> the
> > > >> RocksDB version bumping up job [1].
> > > >>
> > > >> Thanks.
> > > >>
> > > >> Best Regards,
> > > >> Yu
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/FLINK-14482
> > > >>
> > > >> On Tue, 20 Oct 2020 at 17:06, Robert Metzger 
> > > wrote:
> > > >>
> > > >>> Thanks a lot.
> > > >>>
> > > >>> It seems that a few people are supportive of the idea of moving the
> > > >> feature
> > > >>> freeze to Friday.
> > > >>> I would actually propose to make it *Sunday night* then. We'll then
> > > >> create
> > > >>> the first release candidate on Monday morning, November 2nd.
> > > >>>
> > > >>>
> > > >>> On Mon, Oct 19, 2020 at 1:27 PM Danny Chan 
> > > wrote:
> > > >>>
> > >  Per FLIP-145, we have many runtime operators to implement and
> bridge
> > > it
> > >  with the planner.
> > > 
> > >  Best,
> > >  Danny Chan
> > > On Oct 19, 2020 at 6:59 PM +0800, Robert Metzger wrote:
> > > > Thank you for your responses so far.
> > > >
> > > > @Kurt, Jingsong, Danny: Which JIRAs/FLIPs are going to benefit
> from
> > > >> the
> > > > extension?
> > > >
> > > > On Mon, Oct 19, 2020 at 12:45 PM Aljoscha Krettek <
> > > >> aljos...@apache.org
> > > 
> > > > wrote:
> > > >
> > > >> @Robert Your (and Dian's) suggestions sound good to me! I like
> > > >>> keeping
> > > >> to master frozen for a while since it will prevent a lot of
> > > >> duplicate
> > > >> merging efforts.
> > > >>
> > > >> Regarding the date: I'm fine with the proposed date but I can
> also
> > > >>> see
> > > >> that extending it to the end of the week could be helpful.
> > > >>
> > > >> Aljoscha
> > > >>
> > > >> On 19.10.20 12:24, Danny Chan wrote:
> > > >>> +1 for Kurt suggestion, there are many features for SQL yet, 2
> > > >> more
> > >  days
> > > >> are valuable.
> > > >>>
> > > >>> Best,
> > > >>> Danny Chan
> > > >>> On Oct 19, 2020 at 6:22 PM +0800, Jingsong Li wrote:
> > >  Hi Robert,
> > > 
> > >  Thanks for your detailed explanation.
> > > 
> > >  At present, we are preparing or participating in Flink
> forward,
> > > >>> so
> > >  +1
> > > >> for
> > >  appropriate extension of deadline.
> > > 
> > >  Best,
> > >  Jingsong
> > > 
> > >  On Mon, Oct 19, 2020 at 5:36 PM Kurt Young 
> > >  wrote:
> > > 
> > > > Can we change the freeze date to October 30th (Friday next
> > >  week)? It
> > > >> would
> > > > be helpful
> > > > for us if we have 2 more days.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Mon, Oct 19, 2020 at 5:00 PM Robert Metzger <
> > >  rmetz...@apache.org>
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Dian and I would like to discuss a few items regarding the
> > >  upcoming
> > > >> Flink
> > > >> 1.12 feature freeze:
> > > >>
> > > >> *A) Exact feature freeze day*
> > > >> So far, we've always said "end of October
> > > >> <
> > >  https://cwiki.apache.org/confluence/display/FLINK/1.12+Release>"
> > for
> > > > the
> > > >> freeze. We propose (end of day CEST) October 28th
> > > >> (Wednesday
> > >  next
> > > >> week)
> > > > as
> > > >> the feature freeze time.
> > > >> We want to create RC0 on the day after the feature freeze,
> > > >> to
> > >  make
> > > >> sure
> > > > the
> > > >> RC creation process is running smoothly, and to have a
> > > >> common
> > >  testing
> > > >> reference point.
> > > >>
> > > >>
> > > >>
> > > >> *B) What does feature freeze mean?* After the feature
> > > >> freeze,
> > >  no new
> > > >> features are allowed to be merged to master. Only bug fixes
> > > >>> and
> > > >> documentation improvements

Re: [VOTE] FLIP-135: Approximate Task-Local Recovery

2020-10-23 Thread Becket Qin
+1 (binding)

Thanks for the proposal and the well-written wiki, Yuan. It is a very useful
feature.

Thanks,

Jiangjie (Becket) Qin

On Wed, Oct 21, 2020 at 11:31 AM Yu Li  wrote:

> +1 (binding)
>
> Thanks for driving this Yuan! It definitely helps in relative scenarios,
> especially in the machine learning area.
>
> Best Regards,
> Yu
>
>
> On Wed, 21 Oct 2020 at 03:50, Khachatryan Roman <
> khachatryan.ro...@gmail.com>
> wrote:
>
> > +1 (non-binding).
> >
> > It would be a great improvement, thanks for the effort!
> >
> > Regards,
> > Roman
> >
> >
> > On Tue, Oct 20, 2020 at 4:49 PM Steven Wu  wrote:
> >
> > > +1 (non-binding).
> > >
> > > Some of our users have asked for this tradeoff of consistency over
> > > availability for some cases.
> > >
> > > On Mon, Oct 19, 2020 at 8:02 PM Zhijiang  > > .invalid>
> > > wrote:
> > >
> > > > Thanks for driving this effort, Yuan.
> > > >
> > > > +1 (binding) on my side.
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > >
> > > > --
> > > > From: Piotr Nowojski 
> > > > Send Time: Monday, Oct 19, 2020, 21:02
> > > > To: dev 
> > > > Subject: Re: [VOTE] FLIP-135: Approximate Task-Local Recovery
> > > >
> > > > Hey,
> > > >
> > > > I carry over my +1 (binding) from the discussion thread.
> > > >
> > > > Best,
> > > > Piotrek
> > > >
> > > > On Mon, Oct 19, 2020 at 14:56, Yuan Mei 
> > wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > I would like to start a voting thread for FLIP-135 [1], for
> > approximate
> > > > > task local recovery. The proposal has been discussed in [2].
> > > > >
> > > > > The vote will be open till Oct. 23rd (72h, excluding weekends)
> unless
> > > > there
> > > > > is an objection or not enough votes.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-135+Approximate+Task-Local+Recovery
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-135-Approximate-Task-Local-Recovery-tp43930.html
> > > > >
> > > > >
> > > > > Best
> > > > >
> > > > > Yuan
> > > > >
> > > >
> > > >
> > >
> >
>


[SURVEY] Remove Mesos support

2020-10-23 Thread Robert Metzger
Hi all,

I wanted to discuss if it makes sense to remove support for Mesos in Flink.
It seems that nobody is actively maintaining that component (except for
necessary refactorings because of interfaces we are changing), and there
are almost no users reporting issues or asking for features.

The Apache Mesos project itself seems very inactive: There has been only
one message on the dev@ list in the last 3 months.

In 2020, I found only 3 users mentioning that they are using Mesos on the
user@ list.

Maybe it makes sense to add a prominent log warning to the Mesos code in
the Flink 1.12 release, stating that we are planning to remove Mesos support.
Users will then have enough opportunity to raise concerns or discuss with us.

Best,
Robert


Re: [VOTE] NEW FLIP-102: Add More Metrics to TaskManager

2020-10-23 Thread Xintong Song
I'm aware that many steps are already done. But I believe not all of them
are done, as the FLIP doc currently shows, unless the FLIP is already
completed.

Taking a closer look at the issues, I believe step 5 should point
to FLINK-19764, rather than FLINK-15328, which was closed as a duplicate.

Anyway, +1 on the proposed changes.

Thank you~

Xintong Song



On Fri, Oct 23, 2020 at 6:59 PM Andrey Zagrebin 
wrote:

> Thanks a lot for this nice UI guys!
>
> +1. As for the closed issues, that is just because many of the steps have
> already been done.
>
> Best,
> Andrey
>
> On Fri, Oct 23, 2020 at 11:12 AM Till Rohrmann 
> wrote:
>
> > Thanks for reviving this Flip Yadong! The changes look good to me and the
> > new memory UI looks awesome :-)
> >
> > I think the reason why the REST issues are closed is because they are
> > already done. In that sense some of the work already finished.
> >
> > +1 for adopting this FLIP and moving forward with updating the web UI
> > accordingly.
> >
> > Cheers,
> > Till
> >
> > On Fri, Oct 23, 2020 at 8:58 AM Jark Wu  wrote:
> >
> > > +1
> > >
> > > Thanks for the work.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 23 Oct 2020 at 10:13, Xintong Song 
> > wrote:
> > >
> > > > Thanks Yadong, Mattias and Lining for reviving this FLIP.
> > > >
> > > > I've seen so many users confused by the current webui page of task
> > > manager
> > > > metrics. This FLIP should definitely help them understand the memory
> > > > footprints and tune the configurations for task managers.
> > > >
> > > > The design part of this proposal looks really good to me. The UI is
> > clear
> > > > and easy to understand. The metrics look correct to me.
> > > >
> > > > KIND REMINDER: I think the section `Implementation Proposal` in the
> > FLIP
> > > > doc needs to be updated, so that we can vote on this FLIP. Currently,
> > all
> > > > the tickets listed are closed.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Thu, Oct 22, 2020 at 5:53 PM Yadong Xie 
> > wrote:
> > > >
> > > > > Hi all
> > > > >
> > > > > I want to start a new vote for FLIP-102, which proposes to add more
> > > > metrics
> > > > > to the task manager in web UI.
> > > > >
> > > > > The new FLIP-102 was revisited and adapted following the old ML
> > > > discussion
> > > > > <
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-td37898.html
> > > > > >
> > > > > .
> > > > >
> > > > > Thanks to Matthias and Lining's effort, more metrics are available.
> > We
> > > > can
> > > > > match most of the effective configuration to the metrics just as
> > Flink
> > > > Doc
> > > > > <
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/memory/mem_setup_tm.html#detailed-memory-model
> > > > > >
> > > > > describes now.
> > > > >
> > > > > The vote will last for at least 72 hours, following the consensus
> > > voting.
> > > > >
> > > > >
> > > > > FLIP-102 wiki:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> > > > >
> > > > >
> > > > > Discussion thread:
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-td37898.html
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Yadong
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] NEW FLIP-102: Add More Metrics to TaskManager

2020-10-23 Thread Andrey Zagrebin
Thanks a lot for this nice UI guys!

+1. As for the closed issues, that is just because many of the steps have
already been done.

Best,
Andrey

On Fri, Oct 23, 2020 at 11:12 AM Till Rohrmann  wrote:

> Thanks for reviving this Flip Yadong! The changes look good to me and the
> new memory UI looks awesome :-)
>
> I think the reason why the REST issues are closed is because they are
> already done. In that sense some of the work already finished.
>
> +1 for adopting this FLIP and moving forward with updating the web UI
> accordingly.
>
> Cheers,
> Till
>
> On Fri, Oct 23, 2020 at 8:58 AM Jark Wu  wrote:
>
> > +1
> >
> > Thanks for the work.
> >
> > Best,
> > Jark
> >
> > On Fri, 23 Oct 2020 at 10:13, Xintong Song 
> wrote:
> >
> > > Thanks Yadong, Mattias and Lining for reviving this FLIP.
> > >
> > > I've seen so many users confused by the current webui page of task
> > manager
> > > metrics. This FLIP should definitely help them understand the memory
> > > footprints and tune the configurations for task managers.
> > >
> > > The design part of this proposal looks really good to me. The UI is
> clear
> > > and easy to understand. The metrics look correct to me.
> > >
> > > KIND REMINDER: I think the section `Implementation Proposal` in the
> FLIP
> > > doc needs to be updated, so that we can vote on this FLIP. Currently,
> all
> > > the tickets listed are closed.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Thu, Oct 22, 2020 at 5:53 PM Yadong Xie 
> wrote:
> > >
> > > > Hi all
> > > >
> > > > I want to start a new vote for FLIP-102, which proposes to add more
> > > metrics
> > > > to the task manager in web UI.
> > > >
> > > > The new FLIP-102 was revisited and adapted following the old ML
> > > discussion
> > > > <
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-td37898.html
> > > > >
> > > > .
> > > >
> > > > Thanks to Matthias and Lining's effort, more metrics are available.
> We
> > > can
> > > > match most of the effective configuration to the metrics just as
> Flink
> > > Doc
> > > > <
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/memory/mem_setup_tm.html#detailed-memory-model
> > > > >
> > > > describes now.
> > > >
> > > > The vote will last for at least 72 hours, following the consensus
> > voting.
> > > >
> > > >
> > > > FLIP-102 wiki:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> > > >
> > > >
> > > > Discussion thread:
> > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-td37898.html
> > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > >
> > > > Thanks,
> > > >
> > > > Yadong
> > > >
> > >
> >
>


[jira] [Created] (FLINK-19788) Flaky test ShuffleCompressionITCase.testDataCompressionForBlockingShuffle?

2020-10-23 Thread Matthias (Jira)
Matthias created FLINK-19788:


 Summary: Flaky test 
ShuffleCompressionITCase.testDataCompressionForBlockingShuffle?
 Key: FLINK-19788
 URL: https://issues.apache.org/jira/browse/FLINK-19788
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.11.2
Reporter: Matthias
 Fix For: 1.12.0


The {{ShuffleCompressionITCase.testDataCompressionForBlockingShuffle}} 
[failed|https://dev.azure.com/mapohl/flink/_build/results?buildId=92&view=logs&j=70ad9b63-500e-5dc9-5a3c-b60356162d7e&t=944c7023-8984-5aa2-b5f8-54922bd90d3a]
 once as part of [PR #13747|https://github.com/apache/flink/pull/13747] which 
fixes FLINK-19662.

The failure shouldn't be related to the changes, and the test succeeded locally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19787) Migrate Filesystem source to new table source sink interface

2020-10-23 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-19787:


 Summary: Migrate Filesystem source to new table source sink 
interface
 Key: FLINK-19787
 URL: https://issues.apache.org/jira/browse/FLINK-19787
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: 1.12.0








[jira] [Created] (FLINK-19786) Flink doesn't set proper nullability for Logical types for Confluent Avro Serialization

2020-10-23 Thread Jira
Maciej Bryński created FLINK-19786:
--

 Summary: Flink doesn't set proper nullability for Logical types 
for Confluent Avro Serialization
 Key: FLINK-19786
 URL: https://issues.apache.org/jira/browse/FLINK-19786
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.12.0
Reporter: Maciej Bryński


When Flink creates the schema in the registry, nullability is not properly set
for logical types.
Example table:


{code:sql}
create table `test_logical_null` (
`string_field` STRING,
`timestamp_field` TIMESTAMP(3)
) WITH (
  'connector' = 'kafka', 
  'topic' = 'test-logical-null', 
  'properties.bootstrap.servers' = 'localhost:9092', 
  'properties.group.id' = 'test12345', 
  'scan.startup.mode' = 'earliest-offset', 
  'format' = 'avro-confluent', -- Must be set to 'avro-confluent' to configure 
this format.
  'avro-confluent.schema-registry.url' = 'http://localhost:8081', -- URL to 
connect to Confluent Schema Registry
  'avro-confluent.schema-registry.subject' = 'test-logical-null' -- Subject 
name to write to the Schema Registry service; required for sinks
)
{code}

Schema:


{code:json}
{
  "type": "record",
  "name": "record",
  "fields": [
{
  "name": "string_field",
  "type": [
"string",
"null"
  ]
},
{
  "name": "timestamp_field",
  "type": {
"type": "long",
"logicalType": "timestamp-millis"
  }
}
  ]
}
{code}

For NOT NULL fields:

{code:sql}
create table `test_logical_notnull` (
`string_field` STRING NOT NULL,
`timestamp_field` TIMESTAMP(3) NOT NULL
) WITH (
  'connector' = 'kafka', 
  'topic' = 'test-logical-notnull', 
  'properties.bootstrap.servers' = 'localhost:9092', 
  'properties.group.id' = 'test12345', 
  'scan.startup.mode' = 'earliest-offset', 
  'format' = 'avro-confluent', -- Must be set to 'avro-confluent' to configure 
this format.
  'avro-confluent.schema-registry.url' = 'http://localhost:8081', -- URL to 
connect to Confluent Schema Registry
  'avro-confluent.schema-registry.subject' = 'test-logical-notnull-value' -- 
Subject name to write to the Schema Registry service; required for sinks
);
{code}
Schema

{code:json}
{
  "type": "record",
  "name": "record",
  "fields": [
{
  "name": "string_field",
  "type": "string"
},
{
  "name": "timestamp_field",
  "type": {
"type": "long",
"logicalType": "timestamp-millis"
  }
}
  ]
}
{code}
As we can see, string_field gets a proper union with null (for the nullable
field), while for timestamp_field the union is missing in both examples.
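
For comparison, a nullable logical type would be expected to serialize as a
union with null, analogous to the string_field case (a sketch of the expected
field definition, not actual Flink output; the order of the union members may
differ):

{code:json}
{
  "name": "timestamp_field",
  "type": [
    "null",
    {
      "type": "long",
      "logicalType": "timestamp-millis"
    }
  ]
}
{code}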





Re: [VOTE]FLIP-149: Introduce the upsert-kafka connector

2020-10-23 Thread Timo Walther

+1

Thanks,
Timo

On 23.10.20 10:21, Jingsong Li wrote:

+1

On Fri, Oct 23, 2020 at 3:52 PM Konstantin Knauf  wrote:


+1

On Fri, Oct 23, 2020 at 9:36 AM Jark Wu  wrote:


+1

On Fri, 23 Oct 2020 at 15:25, Shengkai Fang  wrote:


Hi, all,

I would like to start the vote for FLIP-149[1], which is discussed and
reached a consensus in the discussion thread[2]. The vote will be open
until 16:00 (UTC+8) on Oct. 28th (72h, excluding weekends), unless there is

an

objection or not enough votes.

[1]





https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector

[2]





https://lists.apache.org/thread.html/r83e3153377594276b2066e49e399ec05d127b58bb4ce0fde33309da2%40%3Cdev.flink.apache.org%3E







--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk








[jira] [Created] (FLINK-19785) Upgrade commons-io to 2.7 or newer

2020-10-23 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-19785:
-

 Summary: Upgrade commons-io to 2.7 or newer
 Key: FLINK-19785
 URL: https://issues.apache.org/jira/browse/FLINK-19785
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Affects Versions: 1.11.2, 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0, 1.11.3


A user reported a dependency vulnerability which affects {{commons-io}} [1]. We 
should try to upgrade this dependency to {{2.7}} or newer.

[1] 
https://lists.apache.org/thread.html/r0dd7ff197b2e3bdd80a0326587ca3d0c22e10d1dba17c769d6da7d7a%40%3Cuser.flink.apache.org%3E





[jira] [Created] (FLINK-19784) Upgrade okhttp to 3.13.0 or newer

2020-10-23 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-19784:
-

 Summary: Upgrade okhttp to 3.13.0 or newer
 Key: FLINK-19784
 URL: https://issues.apache.org/jira/browse/FLINK-19784
 Project: Flink
  Issue Type: Task
  Components: Runtime / Metrics
Affects Versions: 1.11.2, 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0, 1.11.3


A user reported a dependency vulnerability which affects {{okhttp}} [1]. We 
should upgrade this dependency to {{3.13.0}} or newer. The dependency is used 
by the datadog reporter.

[1] 
https://lists.apache.org/thread.html/r0dd7ff197b2e3bdd80a0326587ca3d0c22e10d1dba17c769d6da7d7a%40%3Cuser.flink.apache.org%3E





[jira] [Created] (FLINK-19783) Upgrade mesos to 1.7 or newer

2020-10-23 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-19783:
-

 Summary: Upgrade mesos to 1.7 or newer
 Key: FLINK-19783
 URL: https://issues.apache.org/jira/browse/FLINK-19783
 Project: Flink
  Issue Type: Task
  Components: Deployment / Mesos
Affects Versions: 1.11.2, 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0, 1.11.3


A user reported a dependency vulnerability which affects {{mesos}} [1]. We 
should upgrade {{mesos}} to {{1.7.0}} or newer.

[1] 
https://lists.apache.org/thread.html/r0dd7ff197b2e3bdd80a0326587ca3d0c22e10d1dba17c769d6da7d7a%40%3Cuser.flink.apache.org%3E





[jira] [Created] (FLINK-19782) Upgrade antlr to 4.7.1 or newer

2020-10-23 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-19782:
-

 Summary: Upgrade antlr to 4.7.1 or newer
 Key: FLINK-19782
 URL: https://issues.apache.org/jira/browse/FLINK-19782
 Project: Flink
  Issue Type: Task
  Components: API / Python
Affects Versions: 1.11.2, 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0, 1.11.3


A user reported dependency vulnerabilities which affect {{antlr}} [1]. We 
should upgrade this dependency to {{4.7.1}} or newer.

[1] 
https://lists.apache.org/thread.html/r0dd7ff197b2e3bdd80a0326587ca3d0c22e10d1dba17c769d6da7d7a%40%3Cuser.flink.apache.org%3E





[jira] [Created] (FLINK-19781) Upgrade commons_codec to 1.13 or newer

2020-10-23 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-19781:
-

 Summary: Upgrade commons_codec to 1.13 or newer
 Key: FLINK-19781
 URL: https://issues.apache.org/jira/browse/FLINK-19781
 Project: Flink
  Issue Type: Task
  Components: Table SQL / Planner
Affects Versions: 1.11.2, 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0, 1.11.3


A user reported a dependency vulnerability which affects {{commons_codec}} [1]. 
We should try to upgrade this version to 1.13 or newer.

[1] 
https://lists.apache.org/thread.html/r0dd7ff197b2e3bdd80a0326587ca3d0c22e10d1dba17c769d6da7d7a%40%3Cuser.flink.apache.org%3E





Re: [VOTE]FLIP-149: Introduce the upsert-kafka connector

2020-10-23 Thread Jingsong Li
+1

On Fri, Oct 23, 2020 at 3:52 PM Konstantin Knauf  wrote:

> +1
>
> On Fri, Oct 23, 2020 at 9:36 AM Jark Wu  wrote:
>
> > +1
> >
> > On Fri, 23 Oct 2020 at 15:25, Shengkai Fang  wrote:
> >
> > > Hi, all,
> > >
> > > I would like to start the vote for FLIP-149[1], which is discussed and
> > > reached a consensus in the discussion thread[2]. The vote will be open
> > > until 16:00 (UTC+8) on Oct. 28th (72h, excluding weekends), unless there is
> an
> > > objection or not enough votes.
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
> > > [2]
> > >
> > >
> >
> https://lists.apache.org/thread.html/r83e3153377594276b2066e49e399ec05d127b58bb4ce0fde33309da2%40%3Cdev.flink.apache.org%3E
> > >
> >
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


-- 
Best, Jingsong Lee


Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Jingsong Li
I see. I understand that what you mean is avoiding the loss of historical
data.

Logically, another option is to never clean up, so one doesn't have to turn
on compaction.

I am OK with the implementation; I just feel it shouldn't be a logical
limitation.

Best,
Jingsong
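
(For readers following along: whether or not the broker compacts, a consumer
that materializes the changelog ends up with the latest value per key;
compaction only bounds how much history the broker retains. A minimal sketch
of that materialization, in plain Python purely for illustration, not Flink
code; a None value stands in for a Kafka tombstone:)

```python
def materialize(changelog):
    """Fold an upsert changelog into a table: keep the latest value
    per key; a None value acts as a tombstone and deletes the key."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # DELETE via tombstone
        else:
            table[key] = value    # INSERT or UPDATE
    return table

# The same final table results whether the log was compacted or not.
events = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
print(materialize(events))  # prints {'a': 3}
```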

On Fri, Oct 23, 2020 at 4:09 PM Kurt Young  wrote:

> To be precise, it means the Kafka topic should set the configuration
> "cleanup.policy" to "compact" not "delete".
>
> Best,
> Kurt
>
>
> On Fri, Oct 23, 2020 at 4:01 PM Jingsong Li 
> wrote:
>
> > I just notice there is a limitation in the FLIP:
> >
> > > Generally speaking, the underlying topic of the upsert-kafka source
> must
> > be compacted. Besides, the underlying topic must have all the data with
> the
> > same key in the same partition, otherwise, the result will be wrong.
> >
> > According to my understanding, this is not accurate: compaction is an
> > optimization, not a limitation. It depends on the user.
> >
> > I don't want to stop voting, just want to make it clear.
> >
> > Best,
> > Jingsong
> >
> > On Fri, Oct 23, 2020 at 3:16 PM Timo Walther  wrote:
> >
> > > +1 for voting
> > >
> > > Regards,
> > > Timo
> > >
> > > On 23.10.20 09:07, Jark Wu wrote:
> > > > Thanks Shengkai!
> > > >
> > > > +1 to start voting.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Fri, 23 Oct 2020 at 15:02, Shengkai Fang 
> wrote:
> > > >
> > > >> Add one more message, I have already updated the FLIP[1].
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
> > > >>
> > > >> Shengkai Fang wrote on Fri, Oct 23, 2020 at 2:55 PM:
> > > >>
> > > >>> Hi, all.
> > > >>> It seems we have reached a consensus on the FLIP. If no one has
> other
> > > >>> objections, I would like to start the vote for FLIP-149.
> > > >>>
> > > >>> Best,
> > > >>> Shengkai
> > > >>>
> > > >>> Jingsong Li wrote on Fri, Oct 23, 2020 at 2:25 PM:
> > > >>>
> > >  Thanks for explanation,
> > > 
> > >  I am OK for `upsert`. Yes, Its concept has been accepted by many
> > > >> systems.
> > > 
> > >  Best,
> > >  Jingsong
> > > 
> > >  On Fri, Oct 23, 2020 at 12:38 PM Jark Wu 
> wrote:
> > > 
> > > > Hi Timo,
> > > >
> > > > I have some concerns about `kafka-cdc`,
> > > > 1) cdc is an abbreviation of Change Data Capture which is
> commonly
> > > >> used
> > >  for
> > > > databases, not for message queues.
> > > > 2) usually, cdc produces full content of changelog, including
> > > > UPDATE_BEFORE, however "upsert kafka" doesn't
> > > > 3) `kafka-cdc` sounds like a natively support for `debezium-json`
> > >  format,
> > > > however, it is not and even we don't want
> > > > "upsert kafka" to support "debezium-json"
> > > >
> > > >
> > > > Hi Jingsong,
> > > >
> > > > I think the terminology of "upsert" is fine, because Kafka also
> > uses
> > > > "upsert" to define such behavior in their official documentation
> > [1]:
> > > >
> > > >> a data record in a changelog stream is interpreted as an UPSERT
> > aka
> > > > INSERT/UPDATE
> > > >
> > > > Materialize uses the "UPSERT" keyword to define such behavior too
> > > [2].
> > > > Users have been requesting such feature using "upsert kafka"
> > >  terminology in
> > > > user mailing lists [3][4].
> > > > Many other systems support "UPSERT" statement natively, such as
> > > impala
> > >  [5],
> > > > SAP [6], Phoenix [7], Oracle NoSQL [8], etc..
> > > >
> > > > Therefore, I think we don't need to be afraid of introducing
> > "upsert"
> > > > terminology, it is widely accepted by users.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > [1]:
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#streams_concepts_ktable
> > > > [2]:
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
> > > > [3]:
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-materialized-upsert-tables-td18482.html#a18503
> > > > [4]:
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> http://apache-flink.147419.n8.nabble.com/Kafka-Sink-AppendStreamTableSink-doesn-t-support-consuming-update-changes-td5959.html
> > > > [5]:
> > > 
> https://impala.apache.org/docs/build/html/topics/impala_upsert.html
> > > > [6]:
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/ea8b6773be584203bcd99da76844c5ed.html
> > > > [7]: https://phoenix.apache.org/atomic_upsert.html
> > > > [8]:
> > > >
> > > >
> > > 
> > > >>
> > >
> >
> https://docs.oracle.com/en/database/other-databases/nosql-database/18.3/sqlfornosql/adding-table

Re: [VOTE] NEW FLIP-102: Add More Metrics to TaskManager

2020-10-23 Thread Till Rohrmann
Thanks for reviving this Flip Yadong! The changes look good to me and the
new memory UI looks awesome :-)

I think the reason why the REST issues are closed is because they are
already done. In that sense some of the work already finished.

+1 for adopting this FLIP and moving forward with updating the web UI
accordingly.

Cheers,
Till

On Fri, Oct 23, 2020 at 8:58 AM Jark Wu  wrote:

> +1
>
> Thanks for the work.
>
> Best,
> Jark
>
> On Fri, 23 Oct 2020 at 10:13, Xintong Song  wrote:
>
> > Thanks Yadong, Mattias and Lining for reviving this FLIP.
> >
> > I've seen so many users confused by the current webui page of task
> manager
> > metrics. This FLIP should definitely help them understand the memory
> > footprints and tune the configurations for task managers.
> >
> > The design part of this proposal looks really good to me. The UI is clear
> > and easy to understand. The metrics look correct to me.
> >
> > KIND REMINDER: I think the section `Implementation Proposal` in the FLIP
> > doc needs to be updated, so that we can vote on this FLIP. Currently, all
> > the tickets listed are closed.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Thu, Oct 22, 2020 at 5:53 PM Yadong Xie  wrote:
> >
> > > Hi all
> > >
> > > I want to start a new vote for FLIP-102, which proposes to add more
> > metrics
> > > to the task manager in web UI.
> > >
> > > The new FLIP-102 was revisited and adapted following the old ML
> > discussion
> > > <
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-td37898.html
> > > >
> > > .
> > >
> > > Thanks to Matthias and Lining's effort, more metrics are available. We
> > can
> > > match most of the effective configuration to the metrics just as Flink
> > Doc
> > > <
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/memory/mem_setup_tm.html#detailed-memory-model
> > > >
> > > describes now.
> > >
> > > The vote will last for at least 72 hours, following the consensus
> voting.
> > >
> > >
> > > FLIP-102 wiki:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> > >
> > >
> > > Discussion thread:
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-td37898.html
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > >
> > > Thanks,
> > >
> > > Yadong
> > >
> >
>


Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Kurt Young
To be precise, it means the Kafka topic should set the configuration
"cleanup.policy" to "compact" not "delete".

Best,
Kurt
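
(For readers following along: on recent Kafka versions this can be set on an
existing topic with the kafka-configs tool that ships with Kafka; the
bootstrap address and topic name below are placeholders, and older Kafka
versions use --zookeeper instead of --bootstrap-server:)

```shell
# Placeholders: adjust the bootstrap address and topic name to your setup.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-upsert-topic \
  --alter --add-config cleanup.policy=compact
```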


On Fri, Oct 23, 2020 at 4:01 PM Jingsong Li  wrote:

> I just notice there is a limitation in the FLIP:
>
> > Generally speaking, the underlying topic of the upsert-kafka source must
> be compacted. Besides, the underlying topic must have all the data with the
> same key in the same partition, otherwise, the result will be wrong.
>
> According to my understanding, this is not accurate: compaction is an
> optimization, not a limitation. It depends on the user.
>
> I don't want to stop voting, just want to make it clear.
>
> Best,
> Jingsong
>
> On Fri, Oct 23, 2020 at 3:16 PM Timo Walther  wrote:
>
> > +1 for voting
> >
> > Regards,
> > Timo
> >
> > On 23.10.20 09:07, Jark Wu wrote:
> > > Thanks Shengkai!
> > >
> > > +1 to start voting.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 23 Oct 2020 at 15:02, Shengkai Fang  wrote:
> > >
> > >> Add one more message, I have already updated the FLIP[1].
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
> > >>
> > >> Shengkai Fang wrote on Fri, Oct 23, 2020 at 2:55 PM:
> > >>
> > >>> Hi, all.
> > >>> It seems we have reached a consensus on the FLIP. If no one has other
> > >>> objections, I would like to start the vote for FLIP-149.
> > >>>
> > >>> Best,
> > >>> Shengkai
> > >>>
> > >>> Jingsong Li wrote on Fri, Oct 23, 2020 at 2:25 PM:
> > >>>
> >  Thanks for explanation,
> > 
> >  I am OK for `upsert`. Yes, Its concept has been accepted by many
> > >> systems.
> > 
> >  Best,
> >  Jingsong
> > 
> >  On Fri, Oct 23, 2020 at 12:38 PM Jark Wu  wrote:
> > 
> > > Hi Timo,
> > >
> > > I have some concerns about `kafka-cdc`,
> > > 1) cdc is an abbreviation of Change Data Capture which is commonly
> > >> used
> >  for
> > > databases, not for message queues.
> > > 2) usually, cdc produces full content of changelog, including
> > > UPDATE_BEFORE, however "upsert kafka" doesn't
> > > 3) `kafka-cdc` sounds like a natively support for `debezium-json`
> >  format,
> > > however, it is not and even we don't want
> > > "upsert kafka" to support "debezium-json"
> > >
> > >
> > > Hi Jingsong,
> > >
> > > I think the terminology of "upsert" is fine, because Kafka also
> uses
> > > "upsert" to define such behavior in their official documentation
> [1]:
> > >
> > >> a data record in a changelog stream is interpreted as an UPSERT
> aka
> > > INSERT/UPDATE
> > >
> > > Materialize uses the "UPSERT" keyword to define such behavior too
> > [2].
> > > Users have been requesting such feature using "upsert kafka"
> >  terminology in
> > > user mailing lists [3][4].
> > > Many other systems support "UPSERT" statement natively, such as
> > impala
> >  [5],
> > > SAP [6], Phoenix [7], Oracle NoSQL [8], etc..
> > >
> > > Therefore, I think we don't need to be afraid of introducing
> "upsert"
> > > terminology, it is widely accepted by users.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > [1]:
> > >
> > >
> > 
> > >>
> >
> https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#streams_concepts_ktable
> > > [2]:
> > >
> > >
> > 
> > >>
> >
> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
> > > [3]:
> > >
> > >
> > 
> > >>
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-materialized-upsert-tables-td18482.html#a18503
> > > [4]:
> > >
> > >
> > 
> > >>
> >
> http://apache-flink.147419.n8.nabble.com/Kafka-Sink-AppendStreamTableSink-doesn-t-support-consuming-update-changes-td5959.html
> > > [5]:
> >  https://impala.apache.org/docs/build/html/topics/impala_upsert.html
> > > [6]:
> > >
> > >
> > 
> > >>
> >
> https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/ea8b6773be584203bcd99da76844c5ed.html
> > > [7]: https://phoenix.apache.org/atomic_upsert.html
> > > [8]:
> > >
> > >
> > 
> > >>
> >
> https://docs.oracle.com/en/database/other-databases/nosql-database/18.3/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html
> > >
> > > On Fri, 23 Oct 2020 at 10:36, Jingsong Li 
> >  wrote:
> > >
> > >> The `kafka-cdc` looks good to me.
> > >> We can even give options to indicate whether to turn on compact,
> >  because
> > >> compact is just an optimization?
> > >>
> > >> - ktable let me think about KSQL.
> > >> - kafka-compacted it is not just compacted, more than that, it
> still
> >  has
> > >> the ability of CDC
> > >> - upsert-kafka , upsert is back, and I don't really want to see it
> >  again
> > >> since we have CDC
> > >>
> > >> Best,
> 

[jira] [Created] (FLINK-19780) FlinkRelMdDistinctRowCount#getDistinctRowCount(Calc) will always return 0 when number of rows are large

2020-10-23 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19780:
---

 Summary: FlinkRelMdDistinctRowCount#getDistinctRowCount(Calc) will 
always return 0 when number of rows are large
 Key: FLINK-19780
 URL: https://issues.apache.org/jira/browse/FLINK-19780
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Caizhi Weng
 Fix For: 1.12.0


Due to CALCITE-4351, {{FlinkRelMdDistinctRowCount#getDistinctRowCount(Calc)}} 
will always return 0 when the number of rows is large.

What I would suggest is to introduce our own {{FlinkRelMdUtil#numDistinctVals}} 
to treat small and large inputs in different ways. For small inputs we use the 
more precise {{RelMdUtil#numDistinctVals}} and for large inputs we copy the 
old, approximated implementation of {{RelMdUtil#numDistinctVals}}.

This is a temporary solution. When CALCITE-4351 is fixed we should revert this 
commit.
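To illustrate the precision issue, here is a standalone sketch (not the actual Calcite/Flink code; class and method names are made up). The expected number of distinct values when selecting n rows from a domain of size d is d * (1 - (1 - 1/d)^n); evaluated directly, 1 - 1/d rounds to 1.0 once d exceeds double precision and the estimate collapses to 0, while a log1p/expm1 rearrangement stays accurate for large inputs:

```java
// Sketch of treating small and large inputs differently when estimating
// the expected number of distinct values: E = d * (1 - (1 - 1/d)^n).
public final class DistinctValsSketch {

    // Direct evaluation; fine for small domains, but for very large d the
    // term (1 - 1/d) rounds to 1.0 in double precision and the whole
    // estimate collapses to 0 -- the symptom described in this ticket.
    static double naive(double d, double n) {
        return d * (1.0 - Math.pow(1.0 - 1.0 / d, n));
    }

    // Rearranged with log1p/expm1 so large domains stay accurate:
    //   (1 - 1/d)^n = exp(n * log1p(-1/d))
    //   E = -d * expm1(n * log1p(-1/d))
    static double safe(double d, double n) {
        return -d * Math.expm1(n * Math.log1p(-1.0 / d));
    }

    public static void main(String[] args) {
        System.out.println(naive(100, 80));    // ~55.25, both variants agree
        System.out.println(safe(100, 80));
        System.out.println(naive(1e18, 1000)); // 0.0 -- precision collapse
        System.out.println(safe(1e18, 1000));  // ~1000.0
    }
}
```

For d = 1e18 and n = 1000 the naive variant returns exactly 0.0 while the rearranged form returns ~1000, which is why small and large inputs need different treatment.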



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Resuming Savepoint issue with upgraded Flink version 1.11.2

2020-10-23 Thread Till Rohrmann
Hi Partha,

have you set explicit operator uids [1]? This is usually recommended in
order to ensure that Flink can match operator state across re-submissions
in particular when doing job upgrades. If you haven't set the uids
explicitly, then Flink will generate them automatically. It could be the
case that this computation changed across different Flink versions.

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#assigning-operator-ids
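For reference, a minimal sketch of what assigning explicit uids looks like in the DataStream API (the uid strings and source/sink classes here are placeholders, not from your job):

```java
// Hypothetical pipeline: each stateful operator carries a stable,
// explicitly chosen uid so savepoint state can be matched after upgrades.
DataStream<String> events = env
    .addSource(new MySource())
    .uid("source-events");           // stable id for the source's state

events
    .keyBy(e -> e)
    .map(new MyStatefulMapper())
    .uid("mapper-dedup")             // stable id for the mapper's state
    .addSink(new MySink())
    .uid("sink-events");             // stable id for the sink's state
```

If the uids were auto-generated in the 1.9 job, the hashes in the savepoint will not match the ones the 1.11.2 job generates.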

Cheers,
Till

On Fri, Oct 23, 2020 at 7:20 AM Partha Mishra 
wrote:

> Hi,
>
> We are trying to save checkpoints for one of the flink job running in
> Flink version 1.9 and tried to resume the same flink job in Flink version
> 1.11.2. We are getting the below error when trying to restore the saved
> checkpoint in the newer flink version. Can
>
> Cannot map checkpoint/savepoint state for operator
> fbb4ef531e002f8fb3a2052db255adf5 to the new program, because the operator
> is not available in the new program.
>
>
> Complete Stack Trace :
> {​"errors":["org.apache.flink.runtime.rest.handler.RestHandlerException:
> Could not execute application.\n\tat
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$1(JarRunHandler.java:103)\n\tat
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)\n\tat
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)\n\tat
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)\n\tat
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
> java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)\n\tat
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)\n\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
> java.lang.Thread.run(Thread.java:748)\nCaused by:
> java.util.concurrent.CompletionException:
> org.apache.flink.util.FlinkRuntimeException: Could not execute
> application.\n\tat
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)\n\tat
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)\n\tat
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)\n\t...
> 7 more\nCaused by: org.apache.flink.util.FlinkRuntimeException: Could not
> execute application.\n\tat
> org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:81)\n\tat
> org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:67)\n\tat
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$0(JarRunHandler.java:100)\n\tat
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)\n\t...
> 7 more\nCaused by:
> org.apache.flink.client.program.ProgramInvocationException: The main method
> caused an error: Failed to execute job
> 'ST1_100Services-preprod-Tumbling-ProcessedBased'.\n\tat
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:302)\n\tat
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)\n\tat
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:149)\n\tat
> org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:78)\n\t...
> 10 more\nCaused by: org.apache.flink.util.FlinkException: Failed to execute
> job 'ST1_100Services-preprod-Tumbling-ProcessedBased'.\n\tat
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1821)\n\tat
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128)\n\tat
> org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)\n\tat
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1697)\n\tat
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:699)\n\tat
> com.man.ceon.cep.jobs.AnalyticService$.main(AnalyticService.scala:108)\n\tat
> com.man.ceon.cep.jobs.AnalyticService.main(AnalyticService.scala)\n\tat
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
> java.lang.reflect.Method.invoke(Method.jav

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Jingsong Li
I just notice there is a limitation in the FLIP:

> Generally speaking, the underlying topic of the upsert-kafka source must
be compacted. Besides, the underlying topic must have all the data with the
same key in the same partition, otherwise, the result will be wrong.

According to my understanding, this is not accurate? Compact is an
optimization, not a limitation. It depends on users.

I don't want to stop voting, just want to make it clear.
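For context, a DDL sketch of what such a table could look like under the FLIP (table name and options are illustrative, not final; SQL is used here to match the format of the other examples in this thread):

```sql
-- Hypothetical example: the PRIMARY KEY declares which fields form the
-- Kafka message key; a record is interpreted as an upsert on that key,
-- and a null value as a delete. Option names follow the FLIP draft and
-- may change before the final design.
CREATE TABLE pageviews_per_region (
  region STRING,
  view_count BIGINT,
  PRIMARY KEY (region) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'pageviews-per-region',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```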

Best,
Jingsong

On Fri, Oct 23, 2020 at 3:16 PM Timo Walther  wrote:

> +1 for voting
>
> Regards,
> Timo
>
> On 23.10.20 09:07, Jark Wu wrote:
> > Thanks Shengkai!
> >
> > +1 to start voting.
> >
> > Best,
> > Jark
> >
> > On Fri, 23 Oct 2020 at 15:02, Shengkai Fang  wrote:
> >
> >> Add one more message, I have already updated the FLIP[1].
> >>
> >> [1]
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
> >>
> >> Shengkai Fang  wrote on Friday, Oct 23, 2020 at 2:55 PM:
> >>
> >>> Hi, all.
> >>> It seems we have reached a consensus on the FLIP. If no one has other
> >>> objections, I would like to start the vote for FLIP-149.
> >>>
> >>> Best,
> >>> Shengkai
> >>>
> >>> Jingsong Li  wrote on Friday, Oct 23, 2020 at 2:25 PM:
> >>>
>  Thanks for explanation,
> 
>  I am OK for `upsert`. Yes, Its concept has been accepted by many
> >> systems.
> 
>  Best,
>  Jingsong
> 
>  On Fri, Oct 23, 2020 at 12:38 PM Jark Wu  wrote:
> 
> > Hi Timo,
> >
> > I have some concerns about `kafka-cdc`,
> > 1) cdc is an abbreviation of Change Data Capture which is commonly
> >> used
>  for
> > databases, not for message queues.
> > 2) usually, cdc produces full content of changelog, including
> > UPDATE_BEFORE, however "upsert kafka" doesn't
> > 3) `kafka-cdc` sounds like a natively support for `debezium-json`
>  format,
> > however, it is not and even we don't want
> > "upsert kafka" to support "debezium-json"
> >
> >
> > Hi Jingsong,
> >
> > I think the terminology of "upsert" is fine, because Kafka also uses
> > "upsert" to define such behavior in their official documentation [1]:
> >
> >> a data record in a changelog stream is interpreted as an UPSERT aka
> > INSERT/UPDATE
> >
> > Materialize uses the "UPSERT" keyword to define such behavior too
> [2].
> > Users have been requesting such feature using "upsert kafka"
>  terminology in
> > user mailing lists [3][4].
> > Many other systems support "UPSERT" statement natively, such as
> impala
>  [5],
> > SAP [6], Phoenix [7], Oracle NoSQL [8], etc..
> >
> > Therefore, I think we don't need to be afraid of introducing "upsert"
> > terminology, it is widely accepted by users.
> >
> > Best,
> > Jark
> >
> >
> > [1]:
> >
> >
> 
> >>
> https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#streams_concepts_ktable
> > [2]:
> >
> >
> 
> >>
> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
> > [3]:
> >
> >
> 
> >>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-materialized-upsert-tables-td18482.html#a18503
> > [4]:
> >
> >
> 
> >>
> http://apache-flink.147419.n8.nabble.com/Kafka-Sink-AppendStreamTableSink-doesn-t-support-consuming-update-changes-td5959.html
> > [5]:
>  https://impala.apache.org/docs/build/html/topics/impala_upsert.html
> > [6]:
> >
> >
> 
> >>
> https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/ea8b6773be584203bcd99da76844c5ed.html
> > [7]: https://phoenix.apache.org/atomic_upsert.html
> > [8]:
> >
> >
> 
> >>
> https://docs.oracle.com/en/database/other-databases/nosql-database/18.3/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html
> >
> > On Fri, 23 Oct 2020 at 10:36, Jingsong Li 
>  wrote:
> >
> >> The `kafka-cdc` looks good to me.
> >> We can even give options to indicate whether to turn on compact,
>  because
> >> compact is just an optimization?
> >>
> >> - ktable let me think about KSQL.
> >> - kafka-compacted it is not just compacted, more than that, it still
>  has
> >> the ability of CDC
> >> - upsert-kafka , upsert is back, and I don't really want to see it
>  again
> >> since we have CDC
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Fri, Oct 23, 2020 at 2:21 AM Timo Walther 
>  wrote:
> >>
> >>> Hi Jark,
> >>>
> >>> I would be fine with `connector=upsert-kafka`. Another idea would
>  be to
> >>> align the name to other available Flink connectors [1]:
> >>>
> >>> `connector=kafka-cdc`.
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>> [1] https://github.com/ververica/flink-cdc-connectors
> >>>
> >>> On 22.10.20 17:17, Jark Wu wrote:
> >>

Re: [VOTE]FLIP-149: Introduce the upsert-kafka connector

2020-10-23 Thread Konstantin Knauf
+1

On Fri, Oct 23, 2020 at 9:36 AM Jark Wu  wrote:

> +1
>
> On Fri, 23 Oct 2020 at 15:25, Shengkai Fang  wrote:
>
> > Hi, all,
> >
> > I would like to start the vote for FLIP-149[1], which is discussed and
> > reached a consensus in the discussion thread[2]. The vote will be open
> > until 16:00(UTC+8) 28th Oct. (72h, exclude weekends), unless there is an
> > objection or not enough votes.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
> > [2]
> >
> >
> https://lists.apache.org/thread.html/r83e3153377594276b2066e49e399ec05d127b58bb4ce0fde33309da2%40%3Cdev.flink.apache.org%3E
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [VOTE]FLIP-149: Introduce the upsert-kafka connector

2020-10-23 Thread Jark Wu
+1

On Fri, 23 Oct 2020 at 15:25, Shengkai Fang  wrote:

> Hi, all,
>
> I would like to start the vote for FLIP-149[1], which is discussed and
> reached a consensus in the discussion thread[2]. The vote will be open
> until 16:00(UTC+8) 28th Oct. (72h, exclude weekends), unless there is an
> objection or not enough votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
> [2]
>
> https://lists.apache.org/thread.html/r83e3153377594276b2066e49e399ec05d127b58bb4ce0fde33309da2%40%3Cdev.flink.apache.org%3E
>


[VOTE]FLIP-149: Introduce the upsert-kafka connector

2020-10-23 Thread Shengkai Fang
Hi, all,

I would like to start the vote for FLIP-149[1], which is discussed and
reached a consensus in the discussion thread[2]. The vote will be open
until 16:00(UTC+8) 28th Oct. (72h, exclude weekends), unless there is an
objection or not enough votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
[2]
https://lists.apache.org/thread.html/r83e3153377594276b2066e49e399ec05d127b58bb4ce0fde33309da2%40%3Cdev.flink.apache.org%3E


Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Timo Walther

+1 for voting

Regards,
Timo

On 23.10.20 09:07, Jark Wu wrote:

Thanks Shengkai!

+1 to start voting.

Best,
Jark

On Fri, 23 Oct 2020 at 15:02, Shengkai Fang  wrote:

Add one more message, I have already updated the FLIP[1].

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector

Shengkai Fang  wrote on Friday, Oct 23, 2020 at 2:55 PM:

Hi, all.
It seems we have reached a consensus on the FLIP. If no one has other
objections, I would like to start the vote for FLIP-149.

Best,
Shengkai

Jingsong Li  wrote on Friday, Oct 23, 2020 at 2:25 PM:

Thanks for the explanation,

I am OK with `upsert`. Yes, its concept has been accepted by many systems.

Best,
Jingsong

On Fri, Oct 23, 2020 at 12:38 PM Jark Wu  wrote:

Hi Timo,

I have some concerns about `kafka-cdc`:
1) cdc is an abbreviation of Change Data Capture, which is commonly used
for databases, not for message queues.
2) usually, cdc produces the full content of the changelog, including
UPDATE_BEFORE; however, "upsert kafka" doesn't.
3) `kafka-cdc` sounds like native support for the `debezium-json` format;
however, it is not, and we don't even want "upsert kafka" to support
"debezium-json".

Hi Jingsong,

I think the terminology of "upsert" is fine, because Kafka also uses
"upsert" to define such behavior in their official documentation [1]:

> a data record in a changelog stream is interpreted as an UPSERT aka
> INSERT/UPDATE

Materialize uses the "UPSERT" keyword to define such behavior too [2].
Users have been requesting such a feature using "upsert kafka" terminology
in user mailing lists [3][4].
Many other systems support the "UPSERT" statement natively, such as Impala
[5], SAP [6], Phoenix [7], Oracle NoSQL [8], etc.

Therefore, I think we don't need to be afraid of introducing "upsert"
terminology; it is widely accepted by users.

Best,
Jark

[1]:
https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#streams_concepts_ktable
[2]:
https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
[3]:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-materialized-upsert-tables-td18482.html#a18503
[4]:
http://apache-flink.147419.n8.nabble.com/Kafka-Sink-AppendStreamTableSink-doesn-t-support-consuming-update-changes-td5959.html
[5]: https://impala.apache.org/docs/build/html/topics/impala_upsert.html
[6]:
https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/ea8b6773be584203bcd99da76844c5ed.html
[7]: https://phoenix.apache.org/atomic_upsert.html
[8]:
https://docs.oracle.com/en/database/other-databases/nosql-database/18.3/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html

On Fri, 23 Oct 2020 at 10:36, Jingsong Li  wrote:

The `kafka-cdc` looks good to me.
We can even give options to indicate whether to turn on compaction,
because compaction is just an optimization?

- ktable makes me think about KSQL.
- kafka-compacted: it is not just compacted, more than that, it still has
the ability of CDC
- upsert-kafka: upsert is back, and I don't really want to see it again
since we have CDC

Best,
Jingsong

On Fri, Oct 23, 2020 at 2:21 AM Timo Walther  wrote:

Hi Jark,

I would be fine with `connector=upsert-kafka`. Another idea would be to
align the name to other available Flink connectors [1]:

`connector=kafka-cdc`.

Regards,
Timo

[1] https://github.com/ververica/flink-cdc-connectors

On 22.10.20 17:17, Jark Wu wrote:

Another name is "connector=upsert-kafka"; I think this can solve Timo's
concern on the "compacted" word.

Materialize also uses the "ENVELOPE UPSERT" [1] keyword to identify such
kafka sources.
I think "upsert" is a well-known terminology widely used in many systems
and matches the behavior of how we handle the kafka messages.

What do you think?

Best,
Jark

[1]:
https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic

On Thu, 22 Oct 2020 at 22:53, Kurt Young  wrote:

Good validation messages can't solve the broken user experience,
especially since such an update mode option will implicitly make half of
the current kafka options invalid or meaningless.

Best,
Kurt

On Thu, Oct 22, 2020 at 10:31 PM Jark Wu  wrote:

Hi Timo, Seth,

The default value "inserting" of "mode" might not be suitable, because
"debezium-json" emits changelog messages which include updates.

On Thu, 22 Oct 2020 at 22:10, Seth Wiesman  wrote:

+1 for supporting upsert results into Kafka.

I have no comments on the implementation details.

As far as configuration goes, I tend to favor Timo's option where we add a
"mode" property to the existing Kafka table with default value
"inserting". If the mode is set to "updating" then the validation changes
to the new requirements. I personally find it more intuitive than a
separate connector; my fear is users won't understand it's the same
physical kafka sink under the hood and 

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Jark Wu
Thanks Shengkai!

+1 to start voting.

Best,
Jark

On Fri, 23 Oct 2020 at 15:02, Shengkai Fang  wrote:

> Add one more message, I have already updated the FLIP[1].
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
>
> Shengkai Fang  wrote on Friday, Oct 23, 2020 at 2:55 PM:
>
> > Hi, all.
> > It seems we have reached a consensus on the FLIP. If no one has other
> > objections, I would like to start the vote for FLIP-149.
> >
> > Best,
> > Shengkai
> >
> > Jingsong Li  wrote on Friday, Oct 23, 2020 at 2:25 PM:
> >
> >> Thanks for explanation,
> >>
> >> I am OK for `upsert`. Yes, Its concept has been accepted by many
> systems.
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Fri, Oct 23, 2020 at 12:38 PM Jark Wu  wrote:
> >>
> >> > Hi Timo,
> >> >
> >> > I have some concerns about `kafka-cdc`,
> >> > 1) cdc is an abbreviation of Change Data Capture which is commonly
> used
> >> for
> >> > databases, not for message queues.
> >> > 2) usually, cdc produces full content of changelog, including
> >> > UPDATE_BEFORE, however "upsert kafka" doesn't
> >> > 3) `kafka-cdc` sounds like a natively support for `debezium-json`
> >> format,
> >> > however, it is not and even we don't want
> >> >"upsert kafka" to support "debezium-json"
> >> >
> >> >
> >> > Hi Jingsong,
> >> >
> >> > I think the terminology of "upsert" is fine, because Kafka also uses
> >> > "upsert" to define such behavior in their official documentation [1]:
> >> >
> >> > > a data record in a changelog stream is interpreted as an UPSERT aka
> >> > INSERT/UPDATE
> >> >
> >> > Materialize uses the "UPSERT" keyword to define such behavior too [2].
> >> > Users have been requesting such feature using "upsert kafka"
> >> terminology in
> >> > user mailing lists [3][4].
> >> > Many other systems support "UPSERT" statement natively, such as impala
> >> [5],
> >> > SAP [6], Phoenix [7], Oracle NoSQL [8], etc..
> >> >
> >> > Therefore, I think we don't need to be afraid of introducing "upsert"
> >> > terminology, it is widely accepted by users.
> >> >
> >> > Best,
> >> > Jark
> >> >
> >> >
> >> > [1]:
> >> >
> >> >
> >>
> https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#streams_concepts_ktable
> >> > [2]:
> >> >
> >> >
> >>
> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
> >> > [3]:
> >> >
> >> >
> >>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-materialized-upsert-tables-td18482.html#a18503
> >> > [4]:
> >> >
> >> >
> >>
> http://apache-flink.147419.n8.nabble.com/Kafka-Sink-AppendStreamTableSink-doesn-t-support-consuming-update-changes-td5959.html
> >> > [5]:
> >> https://impala.apache.org/docs/build/html/topics/impala_upsert.html
> >> > [6]:
> >> >
> >> >
> >>
> https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/ea8b6773be584203bcd99da76844c5ed.html
> >> > [7]: https://phoenix.apache.org/atomic_upsert.html
> >> > [8]:
> >> >
> >> >
> >>
> https://docs.oracle.com/en/database/other-databases/nosql-database/18.3/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html
> >> >
> >> > On Fri, 23 Oct 2020 at 10:36, Jingsong Li 
> >> wrote:
> >> >
> >> > > The `kafka-cdc` looks good to me.
> >> > > We can even give options to indicate whether to turn on compact,
> >> because
> >> > > compact is just an optimization?
> >> > >
> >> > > - ktable let me think about KSQL.
> >> > > - kafka-compacted it is not just compacted, more than that, it still
> >> has
> >> > > the ability of CDC
> >> > > - upsert-kafka , upsert is back, and I don't really want to see it
> >> again
> >> > > since we have CDC
> >> > >
> >> > > Best,
> >> > > Jingsong
> >> > >
> >> > > On Fri, Oct 23, 2020 at 2:21 AM Timo Walther 
> >> wrote:
> >> > >
> >> > > > Hi Jark,
> >> > > >
> >> > > > I would be fine with `connector=upsert-kafka`. Another idea would
> >> be to
> >> > > > align the name to other available Flink connectors [1]:
> >> > > >
> >> > > > `connector=kafka-cdc`.
> >> > > >
> >> > > > Regards,
> >> > > > Timo
> >> > > >
> >> > > > [1] https://github.com/ververica/flink-cdc-connectors
> >> > > >
> >> > > > On 22.10.20 17:17, Jark Wu wrote:
> >> > > > > Another name is "connector=upsert-kafka', I think this can solve
> >> > Timo's
> >> > > > > concern on the "compacted" word.
> >> > > > >
> >> > > > > Materialize also uses "ENVELOPE UPSERT" [1] keyword to identify
> >> such
> >> > > > kafka
> >> > > > > sources.
> >> > > > > I think "upsert" is a well-known terminology widely used in many
> >> > > systems
> >> > > > > and matches the
> >> > > > >   behavior of how we handle the kafka messages.
> >> > > > >
> >> > > > > What do you think?
> >> > > > >
> >> > > > > Best,
> >> > > > > Jark
> >> > > > >
> >> > > > > [1]:
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thu, 22 Oct 2020 at 22:53, K

[jira] [Created] (FLINK-19779) Remove the "record_" field name prefix for Confluent Avro format deserialization

2020-10-23 Thread Danny Chen (Jira)
Danny Chen created FLINK-19779:
--

 Summary: Remove the "record_" field name prefix for Confluent Avro 
format deserialization
 Key: FLINK-19779
 URL: https://issues.apache.org/jira/browse/FLINK-19779
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.12.0
Reporter: Danny Chen
 Fix For: 1.12.0


Reported by Maciej Bryński :

Problem is this is not compatible. I'm unable to read anything from Kafka using 
Confluent Registry. Example:
I have data in Kafka with following value schema:


{code:java}
{
  "type": "record",
  "name": "myrecord",
  "fields": [
{
  "name": "f1",
  "type": "string"
}
  ]
}
{code}

I'm creating table using this avro-confluent format:


{code:sql}
create table `test` (
  `f1` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'test',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'test1234',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'avro-confluent',
  'avro-confluent.schema-registry.url' = 'http://localhost:8081'
);
{code}

When trying to select data I'm getting error:


{code:noformat}
SELECT * FROM test;
[ERROR] Could not execute SQL statement. Reason:
org.apache.avro.AvroTypeException: Found myrecord, expecting record, missing 
required field record_f1
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Shengkai Fang
Add one more message, I have already updated the FLIP[1].

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector

Shengkai Fang  wrote on Friday, Oct 23, 2020 at 2:55 PM:

> Hi, all.
> It seems we have reached a consensus on the FLIP. If no one has other
> objections, I would like to start the vote for FLIP-149.
>
> Best,
> Shengkai
>
> Jingsong Li  wrote on Friday, Oct 23, 2020 at 2:25 PM:
>
>> Thanks for explanation,
>>
>> I am OK for `upsert`. Yes, Its concept has been accepted by many systems.
>>
>> Best,
>> Jingsong
>>
>> On Fri, Oct 23, 2020 at 12:38 PM Jark Wu  wrote:
>>
>> > Hi Timo,
>> >
>> > I have some concerns about `kafka-cdc`,
>> > 1) cdc is an abbreviation of Change Data Capture which is commonly used
>> for
>> > databases, not for message queues.
>> > 2) usually, cdc produces full content of changelog, including
>> > UPDATE_BEFORE, however "upsert kafka" doesn't
>> > 3) `kafka-cdc` sounds like a natively support for `debezium-json`
>> format,
>> > however, it is not and even we don't want
>> >"upsert kafka" to support "debezium-json"
>> >
>> >
>> > Hi Jingsong,
>> >
>> > I think the terminology of "upsert" is fine, because Kafka also uses
>> > "upsert" to define such behavior in their official documentation [1]:
>> >
>> > > a data record in a changelog stream is interpreted as an UPSERT aka
>> > INSERT/UPDATE
>> >
>> > Materialize uses the "UPSERT" keyword to define such behavior too [2].
>> > Users have been requesting such feature using "upsert kafka"
>> terminology in
>> > user mailing lists [3][4].
>> > Many other systems support "UPSERT" statement natively, such as impala
>> [5],
>> > SAP [6], Phoenix [7], Oracle NoSQL [8], etc..
>> >
>> > Therefore, I think we don't need to be afraid of introducing "upsert"
>> > terminology, it is widely accepted by users.
>> >
>> > Best,
>> > Jark
>> >
>> >
>> > [1]:
>> >
>> >
>> https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#streams_concepts_ktable
>> > [2]:
>> >
>> >
>> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
>> > [3]:
>> >
>> >
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-materialized-upsert-tables-td18482.html#a18503
>> > [4]:
>> >
>> >
>> http://apache-flink.147419.n8.nabble.com/Kafka-Sink-AppendStreamTableSink-doesn-t-support-consuming-update-changes-td5959.html
>> > [5]:
>> https://impala.apache.org/docs/build/html/topics/impala_upsert.html
>> > [6]:
>> >
>> >
>> https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/ea8b6773be584203bcd99da76844c5ed.html
>> > [7]: https://phoenix.apache.org/atomic_upsert.html
>> > [8]:
>> >
>> >
>> https://docs.oracle.com/en/database/other-databases/nosql-database/18.3/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html
>> >
>> > On Fri, 23 Oct 2020 at 10:36, Jingsong Li 
>> wrote:
>> >
>> > > The `kafka-cdc` looks good to me.
>> > > We can even give options to indicate whether to turn on compact,
>> because
>> > > compact is just an optimization?
>> > >
>> > > - ktable let me think about KSQL.
>> > > - kafka-compacted it is not just compacted, more than that, it still
>> has
>> > > the ability of CDC
>> > > - upsert-kafka , upsert is back, and I don't really want to see it
>> again
>> > > since we have CDC
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> > > On Fri, Oct 23, 2020 at 2:21 AM Timo Walther 
>> wrote:
>> > >
>> > > > Hi Jark,
>> > > >
>> > > > I would be fine with `connector=upsert-kafka`. Another idea would
>> be to
>> > > > align the name to other available Flink connectors [1]:
>> > > >
>> > > > `connector=kafka-cdc`.
>> > > >
>> > > > Regards,
>> > > > Timo
>> > > >
>> > > > [1] https://github.com/ververica/flink-cdc-connectors
>> > > >
>> > > > On 22.10.20 17:17, Jark Wu wrote:
>> > > > > Another name is "connector=upsert-kafka', I think this can solve
>> > Timo's
>> > > > > concern on the "compacted" word.
>> > > > >
>> > > > > Materialize also uses "ENVELOPE UPSERT" [1] keyword to identify
>> such
>> > > > kafka
>> > > > > sources.
>> > > > > I think "upsert" is a well-known terminology widely used in many
>> > > systems
>> > > > > and matches the
>> > > > >   behavior of how we handle the kafka messages.
>> > > > >
>> > > > > What do you think?
>> > > > >
>> > > > > Best,
>> > > > > Jark
>> > > > >
>> > > > > [1]:
>> > > > >
>> > > >
>> > >
>> >
>> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, 22 Oct 2020 at 22:53, Kurt Young 
>> wrote:
>> > > > >
>> > > > >> Good validation messages can't solve the broken user experience,
>> > > > especially
>> > > > >> that
>> > > > >> such update mode option will implicitly make half of current
>> kafka
>> > > > options
>> > > > >> invalid or doesn't
>> > > > >> make sense.
>> > > > >>
>> > > > >> Best,
>> > > > >> Kurt
>> > > > >>
>> > > > >>
>> > > > >> On Thu, Oct 22, 2020 at 10:31 PM Jark Wu 
>> wro