[jira] [Created] (FLINK-12263) Remove SINGLE_VALUE aggregate function from physical plan

2019-04-18 Thread Jark Wu (JIRA)
Jark Wu created FLINK-12263:
---

 Summary: Remove SINGLE_VALUE aggregate function from physical plan
 Key: FLINK-12263
 URL: https://issues.apache.org/jira/browse/FLINK-12263
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: Jark Wu


 SINGLE_VALUE is an aggregate function which only accepts one row and throws an 
exception when it receives more than one row.

 

For example: 
{code:sql}
SELECT a2, SUM(a1) FROM A GROUP BY a2 HAVING SUM(a1) > (SELECT SUM(a1) * 0.1 FROM A)
{code}
will produce a physical plan that contains SINGLE_VALUE:
{code:sql}
+- NestedLoopJoin(joinType=[InnerJoin], where=[>(EXPR$1, $f0)], select=[a2, EXPR$1, $f0], build=[right], singleRowJoin=[true])
   :- HashAggregate(isMerge=[true], groupBy=[a2], select=[a2, Final_SUM(sum$0) AS EXPR$1])
   :  +- Exchange(distribution=[hash[a2]])
   :     +- LocalHashAggregate(groupBy=[a2], select=[a2, Partial_SUM(a1) AS sum$0])
   :        +- TableSourceScan(table=[[A, source: [TestTableSource(a1, a2)]]], fields=[a1, a2])
   +- Exchange(distribution=[broadcast])
      +- HashAggregate(isMerge=[true], select=[Final_SINGLE_VALUE(value$0, count$1) AS $f0])
         +- Exchange(distribution=[single])
            +- LocalHashAggregate(select=[Partial_SINGLE_VALUE(EXPR$0) AS (value$0, count$1)])
               +- Calc(select=[*($f0, 0.1) AS EXPR$0])
                  +- HashAggregate(isMerge=[true], select=[Final_SUM(sum$0) AS $f0])
                     +- Exchange(distribution=[single])
                        +- LocalHashAggregate(select=[Partial_SUM(a1) AS sum$0])
                           +- Calc(select=[a1])
                              +- TableSourceScan(table=[[A, source: [TestTableSource(a1, a2)]]], fields=[a1, a2])
{code}
But SINGLE_VALUE is a bit weird in the physical plan, because the logical plan can 
already make sure there is only one input row. Moreover, it also introduces 
additional overhead.
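
For illustration, the semantics of SINGLE_VALUE can be sketched with a small, hand-rolled accumulator outside of Flink. This is not Flink's actual implementation; the class and method names below are made up:
{code:java}
// Illustrative only: mimics SINGLE_VALUE semantics outside of Flink.
// It returns its single input value and fails as soon as a second row arrives.
public final class SingleValueAccumulator<T> {

    private T value;
    private long count;

    public void accumulate(T input) {
        if (++count > 1) {
            // SINGLE_VALUE tolerates exactly one input row.
            throw new IllegalStateException("SINGLE_VALUE received more than one row");
        }
        value = input;
    }

    public T getValue() {
        return count == 0 ? null : value;   // SQL semantics: empty input yields NULL
    }

    public static void main(String[] args) {
        SingleValueAccumulator<Double> acc = new SingleValueAccumulator<>();
        acc.accumulate(0.1);                 // fine: exactly one row
        System.out.println(acc.getValue());
        try {
            acc.accumulate(0.2);             // throws: a second row is not allowed
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}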



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12261) Support e2e group window in blink batch

2019-04-18 Thread Jingsong Lee (JIRA)
Jingsong Lee created FLINK-12261:


 Summary: Support e2e group window in blink batch
 Key: FLINK-12261
 URL: https://issues.apache.org/jira/browse/FLINK-12261
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: Jingsong Lee
Assignee: Jingsong Lee


1. Add support for generating an optimized logical plan for group windows (almost 
the same as in the legacy runner).

2. Let WindowProperty be independent of Expression.

3. Introduce a batch execution sort group window.

4. Introduce a batch execution hash group window.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12260) Slot allocation failure by taskmanager registration timeout and race

2019-04-18 Thread Hwanju Kim (JIRA)
Hwanju Kim created FLINK-12260:
--

 Summary: Slot allocation failure by taskmanager registration 
timeout and race
 Key: FLINK-12260
 URL: https://issues.apache.org/jira/browse/FLINK-12260
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.6.3
Reporter: Hwanju Kim


 

In 1.6.2, we have seen slot allocation failures keep happening for a long time. 
Having looked at the log, I see the following behavior:
 # TM sends a registration request R1 to the resource manager.
 # R1 times out after 100ms, which is the initial timeout.
 # TM retries with a registration request R2 to the resource manager (with a 
timeout of 200ms).
 # R2 arrives first at the resource manager and is registered, and then the TM 
gets a successful response, moving on to step 5 below.
 # On successful registration, R2's instance is put into taskManagerRegistrations.
 # Then R1 arrives at the resource manager, which realizes that the same TM resource 
ID is already registered and therefore unregisters R2's instance ID from 
taskManagerRegistrations. A new instance ID for R1 is registered in 
workerRegistration.
 # R1's response is not handled though, since it has already timed out (see the akka 
temp actor resolve failure below), hence there is no registration in 
taskManagerRegistrations.
 # The TM keeps heartbeating to the resource manager with its slot status.
 # The resource manager ignores this slot status, since taskManagerRegistrations 
contains R2, not R1, which replaced R2 in workerRegistration at step 6.
 # Slot requests can never be fulfilled and time out.

The following are the debug logs for the above steps:

 
{code:java}
JM log:

2019-04-11 22:39:40.000,Registering TaskManager with ResourceID 
46c8e0d0fcf2c306f11954a1040d5677 
(akka.ssl.tcp://flink@flink-taskmanager:6122/user/taskmanager_0) at 
ResourceManager

2019-04-11 22:39:40.000,Registering TaskManager 
46c8e0d0fcf2c306f11954a1040d5677 under deade132e2c41c52019cdc27977266cf at the 
SlotManager.

2019-04-11 22:39:40.000,Replacing old registration of TaskExecutor 
46c8e0d0fcf2c306f11954a1040d5677.

2019-04-11 22:39:40.000,Unregister TaskManager deade132e2c41c52019cdc27977266cf 
from the SlotManager.

2019-04-11 22:39:40.000,Registering TaskManager with ResourceID 
46c8e0d0fcf2c306f11954a1040d5677 
(akka.ssl.tcp://flink@flink-taskmanager:6122/user/taskmanager_0) at 
ResourceManager

TM log:

2019-04-11 22:39:40.000,Registration at ResourceManager attempt 1 
(timeout=100ms)

2019-04-11 22:39:40.000,Registration at ResourceManager 
(akka.ssl.tcp://flink@flink-jobmanager:6123/user/resourcemanager) attempt 1 
timed out after 100 ms

2019-04-11 22:39:40.000,Registration at ResourceManager attempt 2 
(timeout=200ms)

2019-04-11 22:39:40.000,Successful registration at resource manager 
akka.ssl.tcp://flink@flink-jobmanager:6123/user/resourcemanager under 
registration id deade132e2c41c52019cdc27977266cf.

2019-04-11 22:39:41.000,resolve of path sequence [/temp/$c] failed{code}
 

As RPC calls seem to use akka ask, which creates a temporary source actor, I 
think multiple RPC calls could have arrived out of order via different actor pairs, 
and the symptom above seems to be due to that. If so, could the registration 
attempt count be included in the call arguments to prevent the unexpected 
unregistration? At this point, what I have done is only log analysis, so I could 
do further analysis, but before that I wanted to check whether this is a known 
issue. I also searched with some relevant terms and log pieces, but couldn't 
find a duplicate. Please mark this as a duplicate if one exists.
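
For illustration only, here is a rough sketch of the kind of guard suggested above: rejecting a stale registration based on an attempt counter. It does not use Flink's actual ResourceManager/TaskExecutor classes, and every name in it is hypothetical:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: keeps the highest registration attempt seen per TaskManager
// resource ID and ignores requests/responses that belong to an older attempt.
public final class RegistrationAttemptGuard {

    private final Map<String, Long> latestAttemptPerResourceId = new ConcurrentHashMap<>();
    private final AtomicLong attemptGenerator = new AtomicLong();

    public long nextAttempt(String resourceId) {
        long attempt = attemptGenerator.incrementAndGet();
        latestAttemptPerResourceId.put(resourceId, attempt);
        return attempt;
    }

    /** Returns true if the given attempt is still the most recent one for this TM. */
    public boolean isCurrent(String resourceId, long attempt) {
        return latestAttemptPerResourceId.getOrDefault(resourceId, -1L) == attempt;
    }

    public static void main(String[] args) {
        RegistrationAttemptGuard guard = new RegistrationAttemptGuard();
        long r1 = guard.nextAttempt("46c8e0d0");   // first registration attempt (timed out)
        long r2 = guard.nextAttempt("46c8e0d0");   // retry that actually succeeded
        // A late-arriving R1 must not replace R2's registration:
        System.out.println("accept R1? " + guard.isCurrent("46c8e0d0", r1)); // false
        System.out.println("accept R2? " + guard.isCurrent("46c8e0d0", r2)); // true
    }
}
{code}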



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12259) Improve debuggability of method invocation failures in OptimizerPlanEnvironment

2019-04-18 Thread Gaurav (JIRA)
Gaurav created FLINK-12259:
--

 Summary: Improve debuggability of method invocation failures in 
OptimizerPlanEnvironment 
 Key: FLINK-12259
 URL: https://issues.apache.org/jira/browse/FLINK-12259
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Gaurav


In cases where the method invocation fails without setting the `optimizerPlan`, 
Flink does not always dump stderr/stdout. Hence, logging from the invoked method 
is lost, and the stack trace alone is not always helpful.
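
As a hedged illustration of the requested improvement, the sketch below captures stdout/stderr while running a user program and attaches the captured output to the failure. It is not OptimizerPlanEnvironment's actual code; all names are hypothetical:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Illustrative only: run a user "main" while capturing stdout/stderr, and include
// the captured output in the exception if plan extraction fails.
public final class PlanExtractionSketch {

    public static String extractPlanOrFail(Runnable userMain) {
        PrintStream originalOut = System.out;
        PrintStream originalErr = System.err;
        ByteArrayOutputStream capturedOut = new ByteArrayOutputStream();
        ByteArrayOutputStream capturedErr = new ByteArrayOutputStream();
        System.setOut(new PrintStream(capturedOut, true));
        System.setErr(new PrintStream(capturedErr, true));
        try {
            userMain.run();                       // would normally set the plan as a side effect
            return null;                          // pretend no plan was set
        } catch (RuntimeException e) {
            throw new RuntimeException(
                "Plan extraction failed.\nstdout:\n" + capturedOut + "\nstderr:\n" + capturedErr, e);
        } finally {
            System.setOut(originalOut);
            System.setErr(originalErr);
        }
    }

    public static void main(String[] args) {
        try {
            extractPlanOrFail(() -> {
                System.out.println("user log line that would otherwise be lost");
                throw new IllegalStateException("boom");
            });
        } catch (RuntimeException e) {
            System.err.println(e.getMessage());   // now contains the captured user output
        }
    }
}
{code}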



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] [FLINK SQL] External catalog for Confluent Kafka

2019-04-18 Thread Bowen Li
Great! I linked that JIRA to FLINK-11275, and put it along with the
JIRAs for HiveCatalog and GenericHiveMetastoreCatalog.

I have some initial thoughts on the solution you described, but I'll wait
till a more complete google design doc comes up, since this discussion is
about engaging community interest.

On Thu, Apr 18, 2019 at 8:39 AM Artsem Semianenka 
wrote:

> Sorry guys I've attached the wrong link for Jira ticket in the
> previous email. This is the correct link :
> https://issues.apache.org/jira/browse/FLINK-12256
>
> On Thu, 18 Apr 2019 at 18:29, Artsem Semianenka 
> wrote:
>
> > Thank you guys so much!
> >
> > You provided me a lot of helpful information.
> > I've created the Jira ticket[1] and added to it an initial description
> > only with the main purpose of the new feature. More detailed
> implementation
> > description will be added further.
> >
> > Hi Rong, to tell the truth, my first idea was to use some predefined
> > prefix/postfix for topic name and lookup mapping between
> > topic/schema-subject.  But the idea with a separated view of a logical
> > table with schema looks more elegant and flexible.
> >
> > Also, I thought about other approaches on how to define the mapping
> > between topic and schema-subject in case if they have different names:
> > Define the "subject" as a part of the table definition:
> >
> > Select * from kafka.topic.subject
> > or
> > Select * from kafka.topic#subject
> >
> > In case if the subject is not defined try to find a subject with the same
> > name as a topic.
> > If the subject still not found  -  take one last message and try to infer
> > the schema ( retrieve schema id from the message and get last defined
> > schema)
> >
> > But I see one disadvantage for all of these approaches: the subject name
> > may contain not supported in SQL symbols.
> >
> > I try to investigate how to escape the illegal symbols in the table name
> > definition.
> >
> > Thanks,
> > Artsem
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-11275
> >
> > On Thu, 18 Apr 2019 at 11:54, Timo Walther  wrote:
> >
> >> Hi Artsem,
> >>
> >> having a catalog support for Confluent Schema Registry would be a great
> >> addition. Although the implementation of FLIP-30 is still ongoing, we
> >> merged the stable interfaces today [0]. This should unblock people from
> >> contributing new catalog implementations. So you could already start
> >> designing an implementation. The implementation could be unit tested for
> >> now until it can also be registered in a table environment for
> >> integration tests/end-to-end tests.
> >>
> >> I hope we can reuse the existing SQL Kafka connector and SQL Avro
> format?
> >>
> >> Looking forward to a JIRA issue and a little design document how to
> >> connect the APIs.
> >>
> >> Thanks,
> >> Timo
> >>
> >> [0] https://github.com/apache/flink/pull/8007
> >>
> >> Am 18.04.19 um 07:03 schrieb Bowen Li:
> >> > Hi,
> >> >
> >> > Thanks Artsem and Rong for bringing up the demand from user
> >> perspective. A
> >> > Kafka/Confluent Schema Registry catalog would have a good use case in
> >> > Flink. We actually mentioned the potential of Unified Catalog APIs for
> >> > Kafka in our talk a couple weeks ago at Flink Forward SF [1], and glad
> >> to
> >> > learn you are interested in contributing. I think creating a JIRA
> ticket
> >> > with link in FLINK-11275 [2], and starting with discussions and design
> >> > would help to advance the effort.
> >> >
> >> > The most interesting part of Confluent Schema Registry, from my point
> of
> >> > view, is the core idea of smoothing real production experience and
> >> things
> >> > built around it, including versioned schemas, schema evolution and
> >> > compatibility checks, etc. Introducing a confluent-schema-registry
> >> backed
> >> > catalog to Flink may also help our design to benefit from those ideas.
> >> >
> >> > To add on Dawid's points. I assume the MVP for this project would be
> >> > supporting Kafka as streaming tables thru the new catalog. FLIP-30 is
> >> for
> >> > both streaming and batch tables, thus it won't be blocked by the whole
> >> > FLIP-30. I think as soon as we finish the table operation APIs,
> finalize
> >> > properties and formats, and connect the APIs to Calcite, this work can
> >> be
> >> > unblocked. Timo and Xuefu may have more things to say.
> >> >
> >> > [1]
> >> >
> >>
> https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
> >> > [2] https://issues.apache.org/jira/browse/FLINK-11275
> >> >
> >> > On Wed, Apr 17, 2019 at 6:39 PM Jark Wu  wrote:
> >> >
> >> >> Hi Rong,
> >> >>
> >> >> Thanks for pointing out the missing FLIPs in the FLIP main page. I
> >> added
> >> >> all the missing FLIP (incl. FLIP-14, FLIP-22, FLIP-29, FLIP-30,
> >> FLIP-31) to
> >> >> the page.
> >> >>
> >> >> I also include @xuef...@alibaba-inc.com 
> >> and @Bowen
> >> >> Li   into the thread who are familiar with the
> >> 

[jira] [Created] (FLINK-12258) Decouple CatalogManager with Calcite

2019-04-18 Thread Bowen Li (JIRA)
Bowen Li created FLINK-12258:


 Summary: Decouple CatalogManager with Calcite
 Key: FLINK-12258
 URL: https://issues.apache.org/jira/browse/FLINK-12258
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Bowen Li
Assignee: Dawid Wysakowicz
 Fix For: 1.9.0


FLINK-11476 introduced the CatalogManager API and the FlinkCatalogManager 
implementation.

FlinkCatalogManager currently depends on Calcite, a dependency that 
flink-table-api-java doesn't have right now, so we temporarily put 
FlinkCatalogManager in flink-table-planner-blink.

Ideally, FlinkCatalogManager should live in the flink-table-api-java module.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12257) Convert CatalogBaseTable to org.apache.calcite.schema.Table so that planner can use unified catalog APIs

2019-04-18 Thread Bowen Li (JIRA)
Bowen Li created FLINK-12257:


 Summary: Convert CatalogBaseTable to 
org.apache.calcite.schema.Table so that planner can use unified catalog APIs
 Key: FLINK-12257
 URL: https://issues.apache.org/jira/browse/FLINK-12257
 Project: Flink
  Issue Type: Sub-task
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.9.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] [FLINK SQL] External catalog for Confluent Kafka

2019-04-18 Thread Artsem Semianenka
Sorry guys I've attached the wrong link for Jira ticket in the
previous email. This is the correct link :
https://issues.apache.org/jira/browse/FLINK-12256

On Thu, 18 Apr 2019 at 18:29, Artsem Semianenka 
wrote:

> Thank you guys so much!
>
> You provided me a lot of helpful information.
> I've created the Jira ticket[1] and added to it an initial description
> only with the main purpose of the new feature. More detailed implementation
> description will be added further.
>
> Hi Rong, to tell the truth, my first idea was to use some predefined
> prefix/postfix for topic name and lookup mapping between
> topic/schema-subject.  But the idea with a separated view of a logical
> table with schema looks more elegant and flexible.
>
> Also, I thought about other approaches on how to define the mapping
> between topic and schema-subject in case if they have different names:
> Define the "subject" as a part of the table definition:
>
> Select * from kafka.topic.subject
> or
> Select * from kafka.topic#subject
>
> In case if the subject is not defined try to find a subject with the same
> name as a topic.
> If the subject still not found  -  take one last message and try to infer
> the schema ( retrieve schema id from the message and get last defined
> schema)
>
> But I see one disadvantage for all of these approaches: the subject name
> may contain not supported in SQL symbols.
>
> I try to investigate how to escape the illegal symbols in the table name
> definition.
>
> Thanks,
> Artsem
>
> [1] https://issues.apache.org/jira/browse/FLINK-11275
>
> On Thu, 18 Apr 2019 at 11:54, Timo Walther  wrote:
>
>> Hi Artsem,
>>
>> having a catalog support for Confluent Schema Registry would be a great
>> addition. Although the implementation of FLIP-30 is still ongoing, we
>> merged the stable interfaces today [0]. This should unblock people from
>> contributing new catalog implementations. So you could already start
>> designing an implementation. The implementation could be unit tested for
>> now until it can also be registered in a table environment for
>> integration tests/end-to-end tests.
>>
>> I hope we can reuse the existing SQL Kafka connector and SQL Avro format?
>>
>> Looking forward to a JIRA issue and a little design document how to
>> connect the APIs.
>>
>> Thanks,
>> Timo
>>
>> [0] https://github.com/apache/flink/pull/8007
>>
>> Am 18.04.19 um 07:03 schrieb Bowen Li:
>> > Hi,
>> >
>> > Thanks Artsem and Rong for bringing up the demand from user
>> perspective. A
>> > Kafka/Confluent Schema Registry catalog would have a good use case in
>> > Flink. We actually mentioned the potential of Unified Catalog APIs for
>> > Kafka in our talk a couple weeks ago at Flink Forward SF [1], and glad
>> to
>> > learn you are interested in contributing. I think creating a JIRA ticket
>> > with link in FLINK-11275 [2], and starting with discussions and design
>> > would help to advance the effort.
>> >
>> > The most interesting part of Confluent Schema Registry, from my point of
>> > view, is the core idea of smoothing real production experience and
>> things
>> > built around it, including versioned schemas, schema evolution and
>> > compatibility checks, etc. Introducing a confluent-schema-registry
>> backed
>> > catalog to Flink may also help our design to benefit from those ideas.
>> >
>> > To add on Dawid's points. I assume the MVP for this project would be
>> > supporting Kafka as streaming tables thru the new catalog. FLIP-30 is
>> for
>> > both streaming and batch tables, thus it won't be blocked by the whole
>> > FLIP-30. I think as soon as we finish the table operation APIs, finalize
>> > properties and formats, and connect the APIs to Calcite, this work can
>> be
>> > unblocked. Timo and Xuefu may have more things to say.
>> >
>> > [1]
>> >
>> https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
>> > [2] https://issues.apache.org/jira/browse/FLINK-11275
>> >
>> > On Wed, Apr 17, 2019 at 6:39 PM Jark Wu  wrote:
>> >
>> >> Hi Rong,
>> >>
>> >> Thanks for pointing out the missing FLIPs in the FLIP main page. I
>> added
>> >> all the missing FLIP (incl. FLIP-14, FLIP-22, FLIP-29, FLIP-30,
>> FLIP-31) to
>> >> the page.
>> >>
>> >> I also include @xuef...@alibaba-inc.com 
>> and @Bowen
>> >> Li   into the thread who are familiar with the
>> >> latest catalog design.
>> >>
>> >> Thanks,
>> >> Jark
>> >>
>> >> On Thu, 18 Apr 2019 at 02:39, Rong Rong  wrote:
>> >>
>> >>> Thanks Artsem for looking into this problem and Thanks Dawid for
>> bringing
>> >>> up the discussion on FLIP-30.
>> >>>
>> >>> We've observe similar scenarios when we also would like to reuse the
>> >>> schema
>> >>> registry of both Kafka stream as well as the raw ingested kafka
>> messages
>> >>> in
>> >>> datalake.
>> >>> FYI another more catalog-oriented document can be found here [1]. I do
>> >>> have
>> >>> one question to follow up with Dawid's point (2): are we suggesting
>> that
>> >>> different kafka topics (e

Re: [DISCUSS] [FLINK SQL] External catalog for Confluent Kafka

2019-04-18 Thread Artsem Semianenka
Thank you guys so much!

You provided me a lot of helpful information.
I've created the Jira ticket [1] and added an initial description covering only
the main purpose of the new feature. A more detailed implementation
description will be added later.

Hi Rong, to tell the truth, my first idea was to use a predefined
prefix/postfix for the topic name and look up the mapping between
topic and schema subject. But the idea of a separate view of a logical
table with a schema looks more elegant and flexible.

Also, I thought about other approaches for defining the mapping between
topic and schema subject in case they have different names, e.g. defining
the "subject" as part of the table definition:

Select * from kafka.topic.subject
or
Select * from kafka.topic#subject

If the subject is not defined, try to find a subject with the same
name as the topic.
If the subject is still not found, take the last message and try to infer
the schema (retrieve the schema id from the message and get the last
defined schema).

But I see one disadvantage for all of these approaches: the subject name
may contain symbols that are not supported in SQL.

I will try to investigate how to escape the illegal symbols in the table name
definition.
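
For illustration, here is a rough sketch of the subject-resolution fallback described above. Every class and method name is hypothetical; it only shows the lookup order (explicit subject, then topic name, then schema inferred from the latest message):
{code:java}
import java.util.Optional;
import java.util.function.Function;

// Illustrative only: resolve the Schema Registry subject/schema for a Kafka topic.
public final class SubjectResolverSketch {

    /** Looks up a schema by subject name, e.g. against the Confluent Schema Registry. */
    public interface SubjectLookup extends Function<String, Optional<String>> {}

    public static String resolveSchema(
            String topic,
            Optional<String> explicitSubject,
            SubjectLookup lookup,
            Function<String, String> inferFromLatestMessage) {

        // 1. A subject given explicitly in the table definition wins.
        if (explicitSubject.isPresent()) {
            return lookup.apply(explicitSubject.get())
                    .orElseThrow(() -> new IllegalArgumentException("Unknown subject"));
        }
        // 2. Otherwise try a subject with the same name as the topic.
        Optional<String> byTopicName = lookup.apply(topic);
        if (byTopicName.isPresent()) {
            return byTopicName.get();
        }
        // 3. As a last resort, read the latest message and infer the schema
        //    from the schema id embedded in it.
        return inferFromLatestMessage.apply(topic);
    }

    public static void main(String[] args) {
        SubjectLookup registry = subject ->
                "clicks".equals(subject) ? Optional.of("<avro schema for clicks>") : Optional.empty();
        // No explicit subject given: falls back to the topic name, then to inference.
        System.out.println(resolveSchema("clicks", Optional.empty(), registry, t -> "<inferred schema>"));
    }
}
{code}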

Thanks,
Artsem

[1] https://issues.apache.org/jira/browse/FLINK-11275

On Thu, 18 Apr 2019 at 11:54, Timo Walther  wrote:

> Hi Artsem,
>
> having a catalog support for Confluent Schema Registry would be a great
> addition. Although the implementation of FLIP-30 is still ongoing, we
> merged the stable interfaces today [0]. This should unblock people from
> contributing new catalog implementations. So you could already start
> designing an implementation. The implementation could be unit tested for
> now until it can also be registered in a table environment for
> integration tests/end-to-end tests.
>
> I hope we can reuse the existing SQL Kafka connector and SQL Avro format?
>
> Looking forward to a JIRA issue and a little design document how to
> connect the APIs.
>
> Thanks,
> Timo
>
> [0] https://github.com/apache/flink/pull/8007
>
> Am 18.04.19 um 07:03 schrieb Bowen Li:
> > Hi,
> >
> > Thanks Artsem and Rong for bringing up the demand from user perspective.
> A
> > Kafka/Confluent Schema Registry catalog would have a good use case in
> > Flink. We actually mentioned the potential of Unified Catalog APIs for
> > Kafka in our talk a couple weeks ago at Flink Forward SF [1], and glad to
> > learn you are interested in contributing. I think creating a JIRA ticket
> > with link in FLINK-11275 [2], and starting with discussions and design
> > would help to advance the effort.
> >
> > The most interesting part of Confluent Schema Registry, from my point of
> > view, is the core idea of smoothing real production experience and things
> > built around it, including versioned schemas, schema evolution and
> > compatibility checks, etc. Introducing a confluent-schema-registry backed
> > catalog to Flink may also help our design to benefit from those ideas.
> >
> > To add on Dawid's points. I assume the MVP for this project would be
> > supporting Kafka as streaming tables thru the new catalog. FLIP-30 is for
> > both streaming and batch tables, thus it won't be blocked by the whole
> > FLIP-30. I think as soon as we finish the table operation APIs, finalize
> > properties and formats, and connect the APIs to Calcite, this work can be
> > unblocked. Timo and Xuefu may have more things to say.
> >
> > [1]
> >
> https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
> > [2] https://issues.apache.org/jira/browse/FLINK-11275
> >
> > On Wed, Apr 17, 2019 at 6:39 PM Jark Wu  wrote:
> >
> >> Hi Rong,
> >>
> >> Thanks for pointing out the missing FLIPs in the FLIP main page. I added
> >> all the missing FLIP (incl. FLIP-14, FLIP-22, FLIP-29, FLIP-30,
> FLIP-31) to
> >> the page.
> >>
> >> I also include @xuef...@alibaba-inc.com   and
> @Bowen
> >> Li   into the thread who are familiar with the
> >> latest catalog design.
> >>
> >> Thanks,
> >> Jark
> >>
> >> On Thu, 18 Apr 2019 at 02:39, Rong Rong  wrote:
> >>
> >>> Thanks Artsem for looking into this problem and Thanks Dawid for
> bringing
> >>> up the discussion on FLIP-30.
> >>>
> >>> We've observe similar scenarios when we also would like to reuse the
> >>> schema
> >>> registry of both Kafka stream as well as the raw ingested kafka
> messages
> >>> in
> >>> datalake.
> >>> FYI another more catalog-oriented document can be found here [1]. I do
> >>> have
> >>> one question to follow up with Dawid's point (2): are we suggesting
> that
> >>> different kafka topics (e.g. test-topic-prod, test-topic-non-prod, etc)
> >>> considered as a "view" of a logical table with schema (e.g.
> test-topic) ?
> >>>
> >>> Also, seems like a few of the FLIPs, like the FLIP-30 page is not
> linked
> >>> in
> >>> the main FLIP confluence wiki page [2] for some reason.
> >>> I tried to fix that be seems like I don't have permission. Maybe
> someone
> >>> can also

[jira] [Created] (FLINK-12256) Implement KafkaReadableCatalog

2019-04-18 Thread Artsem Semianenka (JIRA)
Artsem Semianenka created FLINK-12256:
-

 Summary: Implement KafkaReadableCatalog
 Key: FLINK-12256
 URL: https://issues.apache.org/jira/browse/FLINK-12256
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Kafka, Table SQL / Client
Affects Versions: 1.9.0
Reporter: Artsem Semianenka
Assignee: Artsem Semianenka


 KafkaReadableCatalog is a special implementation of the ReadableCatalog interface 
(which was introduced in 
[FLIP-30|https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs]
 ) to retrieve meta information, such as the topic name and the schema of a topic, 
from Apache Kafka and the Confluent Schema Registry. 

The new ReadableCatalog allows a user to run SQL queries like:
{code:java}
Select * from kafka.topic_name
{code}
without the need to manually define the table schema.
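
To make the intent concrete, here is a very rough sketch of what such a catalog could look like. It deliberately does not use the real ReadableCatalog/FLIP-30 interfaces (whose exact signatures are still evolving); every type and method name below is hypothetical, and the "<topic>-value" subject naming is only the common Confluent convention, not a requirement:
{code:java}
import java.util.Arrays;
import java.util.List;

// Illustrative only: a read-only "catalog" backed by Kafka plus a schema registry.
public final class KafkaCatalogSketch {

    /** Minimal stand-ins for the services the catalog would talk to. */
    public interface KafkaAdmin { List<String> listTopics(); }
    public interface SchemaRegistry { String latestAvroSchema(String subject); }

    private final KafkaAdmin kafka;
    private final SchemaRegistry registry;

    public KafkaCatalogSketch(KafkaAdmin kafka, SchemaRegistry registry) {
        this.kafka = kafka;
        this.registry = registry;
    }

    /** Tables are simply the Kafka topics. */
    public List<String> listTables() {
        return kafka.listTopics();
    }

    /** The table schema is derived from the registry, typically subject "<topic>-value". */
    public String getTableSchema(String topic) {
        return registry.latestAvroSchema(topic + "-value");
    }

    public static void main(String[] args) {
        KafkaCatalogSketch catalog = new KafkaCatalogSketch(
                () -> Arrays.asList("clicks", "orders"),
                subject -> "<latest avro schema for " + subject + ">");
        System.out.println(catalog.listTables());
        System.out.println(catalog.getTableSchema("clicks"));
    }
}
{code}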



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Apply for contributor permission

2019-04-18 Thread peng2251
Hi,


I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is TonySu.
Thanks.

[jira] [Created] (FLINK-12255) Rename a few exception class names that were migrated from scala

2019-04-18 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created FLINK-12255:
---

 Summary: Rename a few exception class names that were migrated 
from scala
 Key: FLINK-12255
 URL: https://issues.apache.org/jira/browse/FLINK-12255
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Affects Versions: 1.9.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


As a follow-up to FLINK-11474, and based on PR review comments, a few exception 
(Java) classes will be renamed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


HA lock nodes, Checkpoints, and JobGraphs after failure

2019-04-18 Thread dyana . rose
Flink v1.7.1

After a Flink reboot, we've been seeing an unexpected issue where excess 
retained checkpoints cannot be removed from ZooKeeper after a new 
checkpoint is created.

I believe I've got my head around the role of ZK and lockNodes in Checkpointing 
after going through the code. Could you check my logic on this and add any 
insight, especially if I've got it wrong?

The situation:
1) Say we run JM1 and JM2 and retain 10 checkpoints and are running in HA with 
S3 as the backing store.

2) JM1 and JM2 start up and each instance of ZooKeeperStateHandleStore has its 
own lockNode UUID. JM1 is elected leader.

3) We submit a job, that JobGraph lockNode is added to ZK using JM1's JobGraph 
lockNode.

4) Checkpoints start rolling in, latest 10 are retained in ZK using JM1's 
checkpoint lockNode. We continue running, and checkpoints are successfully 
being created and excess checkpoints removed.

5) Both JM1 and JM2 now are rebooted.

6) The JobGraph is recovered by the leader, the job restarts from the latest 
checkpoint.

Now after every new checkpoint we see in the ZooKeeper logs:
INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@653] - Got user-level 
KeeperException when processing sessionid:0x1047715000d type:delete 
cxid:0x210 zxid:0x71091 txntype:-1 reqpath:n/a Error 
Path:/flink/job-name/checkpoints/2fa0d694e245f5ec1f709630c7c7bf69/0057813
 Error:KeeperErrorCode = Directory not empty for 
/flink/job-name/checkpoints/2fa0d694e245f5ec1f709630c7c7bf69/005781
with an increasing checkpoint id on each subsequent call.

When JM1 and JM2 were rebooted, the lockNode UUIDs would have rolled, right? Since 
the old checkpoints were created under the old UUID, the new JMs will never be 
able to remove the old retained checkpoints from ZooKeeper.

Is that correct?

If so, would this also happen with JobGraphs in the following situation (we saw 
this just recently where we had a JobGraph for a cancelled job still in ZK):

Steps 1 through 3 above, then:
4) JM1 fails over to JM2, the job keeps running uninterrupted. JM1 restarts.

5) Some time later, while JM2 is still the leader, we hard-cancel the job and restart 
the JMs.

In this case JM2 would successfully remove the job from S3, but because its 
lockNode is different from JM1's, it cannot delete the lock file in the jobgraph 
folder and so can't remove the JobGraph. Then Flink restarts and tries to 
process the JobGraph it has found, but the S3 files have been deleted.
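
For reference, a toy model of the lock-node behaviour described above (not ZooKeeper's or Flink's actual code), purely to illustrate why a node created under one JobManager's lockNode UUID cannot be removed by a JobManager holding a different UUID:
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Illustrative only: each checkpoint znode keeps a set of child "lock nodes",
// one per JobManager instance (identified by a UUID chosen at startup).
// A parent node can only be deleted once it has no children left.
public final class LockNodeModel {

    private final Map<String, Set<String>> locksPerCheckpoint = new HashMap<>();

    public void addCheckpoint(String path, String lockNodeUuid) {
        locksPerCheckpoint.computeIfAbsent(path, p -> new HashSet<>()).add(lockNodeUuid);
    }

    /** A JM can only release its own lock node; it does not know other UUIDs. */
    public boolean tryDelete(String path, String lockNodeUuid) {
        Set<String> locks = locksPerCheckpoint.get(path);
        if (locks == null) {
            return true;
        }
        locks.remove(lockNodeUuid);
        if (locks.isEmpty()) {
            locksPerCheckpoint.remove(path);
            return true;                       // like ZooKeeper deleting an empty directory
        }
        return false;                          // "Directory not empty"
    }

    public static void main(String[] args) {
        LockNodeModel zk = new LockNodeModel();
        String oldJmUuid = UUID.randomUUID().toString();  // JM before the reboot
        zk.addCheckpoint("/flink/job/checkpoints/0057813", oldJmUuid);

        String newJmUuid = UUID.randomUUID().toString();  // JM after the reboot
        // The new JM cannot remove the lock created under the old UUID:
        System.out.println(zk.tryDelete("/flink/job/checkpoints/0057813", newJmUuid)); // false
    }
}
{code}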

Possible related closed issues (fixes went in v1.7.0): 
https://issues.apache.org/jira/browse/FLINK-10184 and 
https://issues.apache.org/jira/browse/FLINK-10255

Thanks for any insight,
Dyana


Re: [Discuss] Add JobListener (hook) in flink job lifecycle

2019-04-18 Thread Till Rohrmann
Thanks for starting this discussion Jeff. I can see the need for additional
hooks for third party integrations.

The thing I'm wondering is whether we really need/want to expose a
JobListener via the ExecutionEnvironment. The ExecutionEnvironment is
usually used by the user who writes the code and this person (I assume)
would not be really interested in these callbacks. If he would, then one
should rather think about a better programmatic job control where the
`ExecutionEnvironment#execute` call returns a `JobClient` instance.
Moreover, we would effectively make this part of the public API and every
implementation would need to offer it.
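
As a loose illustration of the programmatic job control mentioned above (an `ExecutionEnvironment#execute` call that returns a handle instead of callbacks registered on the environment), a sketch could look like the following. None of this is existing Flink API; the interface and method names are hypothetical:
{code:java}
import java.util.concurrent.CompletableFuture;

// Illustrative only: a handle-based alternative to environment-level callbacks.
public interface JobClientSketch {

    /** The id assigned at submission time. */
    String jobId();

    /** Completes when the job reaches a terminal state. */
    CompletableFuture<Void> completionFuture();

    /** Requests cancellation, optionally taking a savepoint first; completes with the savepoint path. */
    CompletableFuture<String> cancelWithSavepoint(String savepointDir);
}
{code}
With such a handle, a framework like Zeppelin or Kylin could associate the returned job id with its own execution context and cancel the job later, without the environment itself exposing listener registration.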

In your case, it could be sufficient to offer some hooks for the
ClusterClient or being able to provide a custom ClusterClient. The
ClusterClient is the component responsible for the job submission and
retrieval of the job result and, hence, would be able to signal when a job
has been submitted or completed.

Cheers,
Till

On Thu, Apr 18, 2019 at 8:57 AM vino yang  wrote:

> Hi Jeff,
>
> I personally like this proposal. From the perspective of programmability,
> the JobListener can make the third program more appreciable.
>
> The scene where I need the listener is the Flink cube engine for Apache
> Kylin. In the case, the Flink job program is embedded into the Kylin's
> executable context.
>
> If we could have this listener, it would be easier to integrate with Kylin.
>
> Best,
> Vino
>
> Jeff Zhang  于2019年4月18日周四 下午1:30写道:
>
>>
>> Hi All,
>>
>> I created FLINK-12214  for
>> adding JobListener (hook) in flink job lifecycle. Since this is a new
>> public api for flink, so I'd like to discuss it more widely in community to
>> get more feedback.
>>
>> The background and motivation is that I am integrating flink into apache
>> zeppelin (which is a notebook in case you
>> don't know). And I'd like to capture some job context (like jobId) in the
>> lifecycle of flink job (submission, executed, cancelled) so that I can
>> manipulate job in more fined grained control (e.g. I can capture the jobId
>> when job is submitted, and then associate it with one paragraph, and when
>> user click the cancel button, I can call the flink cancel api to cancel
>> this job)
>>
>> I believe other projects which integrate flink would need similar
>> mechanism. I plan to add api addJobListener in
>> ExecutionEnvironment/StreamExecutionEnvironment so that user can add
>> customized hook in flink job lifecycle.
>>
>> Here's draft interface JobListener.
>>
>> public interface JobListener {
>>
>> void onJobSubmitted(JobID jobId);
>>
>> void onJobExecuted(JobExecutionResult jobResult);
>>
>> void onJobCanceled(JobID jobId, String savepointPath);
>> }
>>
>> Let me know your comment and concern, thanks.
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>


Re: I want to subscribe Flink mail group

2019-04-18 Thread Fan Liya
If you only want to subscribe to the mail group, send a message to
dev-subscr...@flink.apache.org
If you want to be a flink contributor, please provide your apache account.

Best,
Liya Fan

On Thu, Apr 18, 2019 at 7:50 PM Armstrong  wrote:

> Thanks a lot
>


I want to subscribe Flink mail group

2019-04-18 Thread Armstrong
Thanks a lot


Re: [DISCUSS] Apache Flink at Season of Docs

2019-04-18 Thread jincheng sun
Hi Konstantin&all,

The blog post is released, please see
https://flink.apache.org/news/2019/04/17/sod.html

And I think it is better to spread the word via https://twitter.com/ApacheFlink,
but I found that only the PMC can manage and publish messages.

Help spread the word here:
https://twitter.com/sunjincheng121/status/1118831783762481152

Best,
Jincheng

jincheng sun  于2019年4月18日周四 上午11:01写道:

> Thanks Konstantin!
>
> I registered as an alternative administrator,  and have had left few
> comments at the blog post.
>
> Best,
> Jincheng
>
> Fabian Hueske  于2019年4月18日周四 上午12:36写道:
>
>> Thanks Konstantin!
>>
>> I registered as a mentor and will have a look at the blog post.
>>
>> Best, Fabian
>>
>> Am Mi., 17. Apr. 2019 um 18:14 Uhr schrieb Konstantin Knauf <
>> konstan...@ververica.com>:
>>
>> > Hi everyone,
>> >
>> > a few updates on our application:
>> >
>> > 1. As Aizhamal (Thanks!) has suggested I also added the information to
>> > submit during our application to the Google doc [1] and Fabian added a
>> > description for the SQL project idea (Thanks!).
>> >
>> > 2. I had a quick chat with Fabian offline and we concluded, that the
>> "Flink
>> > Internals" project might not be good fit for Season of Docs after all,
>> > because, we think, the amount of mentoring by core developers that
>> would be
>> > necessary to produce such a documentation could not be guaranteed. Any
>> > opinions?
>> >
>> > 3. To submit our application, we need to publish our project ideas list.
>> > For this I have just opened a PR to add a small blog post about Season
>> of
>> > Docs [2]. Please have a look and provide feedback.
>> >
>> > 4. For mentors (Stephan, Fabian, Jark, David) please complete the mentor
>> > registration [3] by next Tuesday (*Application Deadline*)
>> >
>> > Cheers,
>> >
>> > Konstantin
>> >
>> > [1]
>> >
>> >
>> https://docs.google.com/document/d/1Up53jNsLztApn-mP76AB6xWUVGt3nwS9p6xQTiceKXo/edit?usp=sharing
>> > [2] https://github.com/apache/flink-web/pull/202
>> > [3]
>> >
>> >
>> https://docs.google.com/forms/d/e/1FAIpQLSe-JjGvaKKGWZOXxrorONhB8qN3mjPrB9ZVkcsntR73Cv_K7g/viewform
>> >
>> > On Mon, Apr 15, 2019 at 9:12 PM Aizhamal Nurmamat kyzy <
>> > aizha...@google.com>
>> > wrote:
>> >
>> > > +Konstantin Knauf  this is looking good,
>> > thanks
>> > > for sharing!
>> > >
>> > > I also created a similar doc for Apache Airflow [1]. It is a bit
>> messy,
>> > > but it has questions from the application form that you can work with.
>> > >
>> > > Cheers,
>> > > Aizhamal
>> > >
>> > > [1]
>> > >
>> >
>> https://docs.google.com/document/d/1HoL_yjNYiTAP9IxSlhx3EUnPFU4l9WOT9EnwBZjCZo0/edit#
>> > >
>> > >
>> > > On Mon, Apr 15, 2019 at 2:24 AM Robert Metzger 
>> > > wrote:
>> > >
>> > >> Hi all,
>> > >> I'm very happy to see this project happening!
>> > >>
>> > >> Thank you for the proposal Konstantin! One idea for the "related
>> > >> material": we could also link to talks or blog posts about concepts /
>> > >> monitoring / operations. Potential writers could feel overwhelmed by
>> our
>> > >> demand for improvements, without any additional material.
>> > >>
>> > >>
>> > >> On Mon, Apr 15, 2019 at 10:16 AM Konstantin Knauf <
>> > >> konstan...@ververica.com> wrote:
>> > >>
>> > >>> Hi everyone,
>> > >>>
>> > >>> thanks @Aizhamal Nurmamat kyzy . As we only
>> have
>> > >>> one
>> > >>> week left until the application deadline, I went ahead and created a
>> > >>> document for the project ideas [1]. I have added the description for
>> > the
>> > >>> "stream processing concepts" as well as the "deployment & operations
>> > >>> documentation" project idea. Please let me know what you think,
>> edit &
>> > >>> comment. We also need descriptions for the other two projects (Table
>> > >>> API/SQL & Flink Internals). @Fabian/@Jark/@Stephan can you chime in?
>> > >>>
>> > >>> Any more project ideas?
>> > >>>
>> > >>> Best,
>> > >>>
>> > >>> Konstantin
>> > >>>
>> > >>>
>> > >>> [1]
>> > >>>
>> > >>>
>> >
>> https://docs.google.com/document/d/1Up53jNsLztApn-mP76AB6xWUVGt3nwS9p6xQTiceKXo/edit?usp=sharing
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Fri, Apr 12, 2019 at 6:50 PM Aizhamal Nurmamat kyzy <
>> > >>> aizha...@google.com>
>> > >>> wrote:
>> > >>>
>> > >>> > Hello everyone,
>> > >>> >
>> > >>> > @Konstantin Knauf  - yes, you are
>> correct.
>> > >>> > Between steps 1 and 2 though, the open source organization, in
>> this
>> > >>> case
>> > >>> > Flink, has to be selected by SoD as one of the participating orgs
>> > >>> *fingers
>> > >>> > crossed*.
>> > >>> >
>> > >>> > One tip about organizing ideas is that you want to communicate
>> > >>> potential
>> > >>> > projects to the tech writers that are applying. Just make sure the
>> > >>> scope of
>> > >>> > the project is clear to them. The SoD wants to set up the tech
>> > writers
>> > >>> for
>> > >>> > success by making sure the work can be done in the allotted time.
>> > >>> >
>> > >>> > Hope it helps.
>> > >>> >
>> > >>> > Aizhamal
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> 

[jira] [Created] (FLINK-12254) Expose the new type system through the API

2019-04-18 Thread Timo Walther (JIRA)
Timo Walther created FLINK-12254:


 Summary: Expose the new type system through the API
 Key: FLINK-12254
 URL: https://issues.apache.org/jira/browse/FLINK-12254
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


Exposes the new type system through API methods.

Introduces new methods, adds converters for backwards-compatibility, and 
deprecates old methods.

Adds checks for types that are not supported by the legacy planner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12253) Setup a class hierarchy for the new type system

2019-04-18 Thread Timo Walther (JIRA)
Timo Walther created FLINK-12253:


 Summary: Setup a class hierarchy for the new type system
 Key: FLINK-12253
 URL: https://issues.apache.org/jira/browse/FLINK-12253
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


Setup a new class hierarchy around {{DataType}} and {{LogicalType}} in 
{{table-common}}.

The classes implement the types listed in the table of FLIP-37.

The classes won't be connected to the API yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12252) Support not escaped table name for External Catalogs in INSERT SQL statement

2019-04-18 Thread Artsem Semianenka (JIRA)
Artsem Semianenka created FLINK-12252:
-

 Summary: Support not escaped table name for External Catalogs in 
INSERT SQL statement
 Key: FLINK-12252
 URL: https://issues.apache.org/jira/browse/FLINK-12252
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.8.0, 1.9.0
Reporter: Artsem Semianenka
Assignee: Artsem Semianenka


If I want to write a SQL stream query that inserts data into a sink 
described in an external catalog, I have to escape the full table name 
in quotes. Example:

INSERT INTO +`test.db3.tb3`+ SELECT a,b,c,d FROM test.db2.tb2

I see a discrepancy in the query semantics because the SELECT statement is written 
without quotes.

I propose to add support for queries without quotes in the INSERT statement, like 
 INSERT INTO test.db3.tb3 SELECT a,b,c,d FROM test.db2.tb2

A pull request is available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12251) Rework the Table API & SQL type system

2019-04-18 Thread Timo Walther (JIRA)
Timo Walther created FLINK-12251:


 Summary: Rework the Table API & SQL type system
 Key: FLINK-12251
 URL: https://issues.apache.org/jira/browse/FLINK-12251
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


Currently, the Table & SQL API relies on Flink's TypeInformation at different 
positions in the code base. The API uses it for conversion between the 
DataSet/DataStream API, casting, and table schema representation. The planner 
uses it for code generation and for serialization of runtime operators.

The past has shown that TypeInformation is useful when converting between the 
DataSet/DataStream API; however, it does not integrate nicely with SQL's type 
system and depends on the programming language that is used.

There is consensus to perform a big rework of the type system with a better 
long-term vision and closer semantics to other SQL vendors and the standard 
itself.

FLIP-37 discusses the mid-term and long-term plan for the supported types and 
their semantics.
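
As a conceptual illustration of the DataType/LogicalType separation discussed for FLIP-37, here is a deliberately simplified sketch. It is not the class hierarchy actually proposed in the FLIP; the names and fields are only indicative:
{code:java}
// Illustrative only: a logical type describes SQL semantics (kind, nullability),
// while a data type additionally carries a hint about the physical conversion class.
public final class TypeSystemSketch {

    enum LogicalTypeRoot { INT, BIGINT, VARCHAR, TIMESTAMP }

    static final class LogicalType {
        final LogicalTypeRoot root;
        final boolean nullable;
        LogicalType(LogicalTypeRoot root, boolean nullable) {
            this.root = root;
            this.nullable = nullable;
        }
    }

    static final class DataType {
        final LogicalType logicalType;
        final Class<?> conversionClass;   // e.g. int vs. java.lang.Integer
        DataType(LogicalType logicalType, Class<?> conversionClass) {
            this.logicalType = logicalType;
            this.conversionClass = conversionClass;
        }
    }

    public static void main(String[] args) {
        LogicalType intNotNull = new LogicalType(LogicalTypeRoot.INT, false);
        DataType asPrimitive = new DataType(intNotNull, int.class);
        DataType asBoxed = new DataType(intNotNull, Integer.class);
        System.out.println(asPrimitive.conversionClass + " vs " + asBoxed.conversionClass);
    }
}
{code}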



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12250) Rewrite assembleNewPartPath to let it return a new PartPath

2019-04-18 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created FLINK-12250:


 Summary: Rewrite assembleNewPartPath to let it return a new 
PartPath
 Key: FLINK-12250
 URL: https://issues.apache.org/jira/browse/FLINK-12250
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong


While debugging some code, I noticed that assembleNewPartPath does not really 
return a new path. I also rewrote the code a bit so that the mutable inProgressPart 
is only changed in a single function.
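
In spirit, the change is about having the helper build and return a fresh path instead of mutating shared state. A minimal, non-Flink sketch of that style (all names hypothetical):
{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: assemble the next in-progress part path as a pure function,
// so the caller is the only place that mutates the current part.
public final class PartPathSketch {

    static Path assembleNewPartPath(Path bucketPath, long partCounter) {
        return bucketPath.resolve("part-" + partCounter + ".inprogress");
    }

    public static void main(String[] args) {
        Path bucket = Paths.get("/tmp/bucket-0");
        Path next = assembleNewPartPath(bucket, 42);   // no hidden state is touched
        System.out.println(next);
    }
}
{code}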



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

2019-04-18 Thread Timo Walther

Hi everyone,

thanks for the valuable feedback I got so far. I updated the design 
document at different positions due to the comments I got online and 
offline.


In general, the feedback was very positive. It seems there is consensus 
to perform a big rework of the type system with a better long-term 
vision and closer semantics to other SQL vendors and the standard itself.


Since my last mail, we improved topics around date-time types (esp. due 
to the cross-platform discussions [0]). And improved the general 
inter-operability with UDF implementation and Java classes.


I would like to convert the design document [1] into a FLIP soon and 
start with an implementation of the basic structure. I'm sure we will 
have subsequent discussion about certain types or semantics but this can 
also happen in the corresponding issues/PRs.


@Rong: Sorry for not responding earlier. I think we should avoid 
crossposting design dicussions on both MLs, because there are a lot of 
them right now. People that are interested should follow this ML.


Thanks,
Timo

[0] 
https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#
[1] 
https://docs.google.com/document/d/1a9HUb6OaBIoj9IRfbILcMFPrOL7ALeZ3rVI66dvA2_U/edit#



Am 28.03.19 um 17:24 schrieb Rong Rong:

Thanks @Timo for starting this effort and preparing the document :-)

I took a pass and left some comments. I also very much like the idea of the
DataType and LogicalType separation.
As explained in the doc, we've also been looking into ways to improve the
type system so a huge +1 on our side.

One question I have is, since this touches many of the external systems
like Hive / Blink comparison, does it make sense to share this to a border
audience (such as user@) later to gather more feedbacks?

Looking forward to this change and would love to contribute in anyway I can!

Best,
Rong








On Thu, Mar 28, 2019 at 3:25 AM Timo Walther  wrote:


Maybe to give some background about Dawid's latest email:

Kurt raised some good points regarding the conversion of data types at
the boundaries of the API and SPI. After that, Dawid and I had a long
discussion of how users can define those boundaries in a nicer way. The
outcome of this discussion was similar to Blink's current distinction
between InternalTypes and ExternalTypes. I updated the document with a
improved structure of DataTypes (for users, API, SPI with conversion
information) and LogicalTypes (used internally and close to standard SQL
types).

Thanks for the feedback so far,
Timo

Am 28.03.19 um 11:18 schrieb Dawid Wysakowicz:

Another big +1 from my side. Thank you Timo for preparing the document!

I really look forward for this to have a standardized way of type
handling. This should solve loads of problems. I really like the
separation of logical type from its physical representation, I think we
should aim to introduce that and keep it separated.

Best,

Dawid

On 28/03/2019 08:51, Kurt Young wrote:

Big +1 to this! I left some comments in google doc.

Best,
Kurt


On Wed, Mar 27, 2019 at 11:32 PM Timo Walther  wrote:

Hi everyone,

some of you might have already read FLIP-32 [1] where we've described an
approximate roadmap of how to handle the big Blink SQL contribution and
how we can make the Table & SQL API equally important to the existing
DataStream API.

As mentioned there (Advance the API and Unblock New Features, Item 1),
the rework of the Table/SQL type system is a crucial step for unblocking
future contributions. In particular, Flink's current type system has
many shortcomings which make an integration with other systems (such as
Hive), DDL statements, and a unified API for Java/Scala difficult. We
propose a new type system that is closer to the SQL standard, integrates
better with other SQL vendors, and solves most of the type-related
issues we had in the past.

The design document for FLIP-37 can be found here:

https://docs.google.com/document/d/1a9HUb6OaBIoj9IRfbILcMFPrOL7ALeZ3rVI66dvA2_U/edit?usp=sharing

I'm looking forward to your feedback.

Thanks,
Timo

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions








Re: [DISCUSS] [FLINK SQL] External catalog for Confluent Kafka

2019-04-18 Thread Timo Walther

Hi Artsem,

having a catalog support for Confluent Schema Registry would be a great 
addition. Although the implementation of FLIP-30 is still ongoing, we 
merged the stable interfaces today [0]. This should unblock people from 
contributing new catalog implementations. So you could already start 
designing an implementation. The implementation could be unit tested for 
now until it can also be registered in a table environment for 
integration tests/end-to-end tests.


I hope we can reuse the existing SQL Kafka connector and SQL Avro format?

Looking forward to a JIRA issue and a little design document how to 
connect the APIs.


Thanks,
Timo

[0] https://github.com/apache/flink/pull/8007

Am 18.04.19 um 07:03 schrieb Bowen Li:

Hi,

Thanks Artsem and Rong for bringing up the demand from user perspective. A
Kafka/Confluent Schema Registry catalog would have a good use case in
Flink. We actually mentioned the potential of Unified Catalog APIs for
Kafka in our talk a couple weeks ago at Flink Forward SF [1], and glad to
learn you are interested in contributing. I think creating a JIRA ticket
with link in FLINK-11275 [2], and starting with discussions and design
would help to advance the effort.

The most interesting part of Confluent Schema Registry, from my point of
view, is the core idea of smoothing real production experience and things
built around it, including versioned schemas, schema evolution and
compatibility checks, etc. Introducing a confluent-schema-registry backed
catalog to Flink may also help our design to benefit from those ideas.

To add on Dawid's points. I assume the MVP for this project would be
supporting Kafka as streaming tables thru the new catalog. FLIP-30 is for
both streaming and batch tables, thus it won't be blocked by the whole
FLIP-30. I think as soon as we finish the table operation APIs, finalize
properties and formats, and connect the APIs to Calcite, this work can be
unblocked. Timo and Xuefu may have more things to say.

[1]
https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
[2] https://issues.apache.org/jira/browse/FLINK-11275

On Wed, Apr 17, 2019 at 6:39 PM Jark Wu  wrote:


Hi Rong,

Thanks for pointing out the missing FLIPs in the FLIP main page. I added
all the missing FLIP (incl. FLIP-14, FLIP-22, FLIP-29, FLIP-30, FLIP-31) to
the page.

I also include @xuef...@alibaba-inc.com   and @Bowen
Li   into the thread who are familiar with the
latest catalog design.

Thanks,
Jark

On Thu, 18 Apr 2019 at 02:39, Rong Rong  wrote:


Thanks Artsem for looking into this problem and Thanks Dawid for bringing
up the discussion on FLIP-30.

We've observe similar scenarios when we also would like to reuse the
schema
registry of both Kafka stream as well as the raw ingested kafka messages
in
datalake.
FYI another more catalog-oriented document can be found here [1]. I do
have
one question to follow up with Dawid's point (2): are we suggesting that
different kafka topics (e.g. test-topic-prod, test-topic-non-prod, etc)
considered as a "view" of a logical table with schema (e.g. test-topic) ?

Also, seems like a few of the FLIPs, like the FLIP-30 page is not linked
in
the main FLIP confluence wiki page [2] for some reason.
I tried to fix that be seems like I don't have permission. Maybe someone
can also take a look?

Thanks,
Rong


[1]

https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit#heading=h.xp424vn7ioei
[2]

https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Wed, Apr 17, 2019 at 2:30 AM Artsem Semianenka 
Thank you, Dawid!
This is very helpful information. I will keep a close eye on the

updates of

FLIP-30 and contribute whenever it possible.
I guess I may create a Jira ticket for my proposal in which I describe

the

idea and attach intermediate pull request based on current API(just for
initial discuss). But the final pull request definitely will be based on
FLIP-30 API.

Best regards,
Artsem

On Wed, 17 Apr 2019 at 09:36, Dawid Wysakowicz 
wrote:


Hi Artsem,

I think it totally makes sense to have a catalog for the Schema
Registry. It is also good to hear you want to contribute that. There

is

few important things to consider though:

1. The Catalog interface is currently under rework. You make take a

look

at the corresponding FLIP-30[1], and also have a look at the first PR
that introduces the basic interfaces[2]. I think it would be worth to
already consider those changes. I cc Xuefu who is participating in the
efforts of Catalog integration.

2. There is still ongoing discussion about what properties should we
store for streaming tables and how. I think this might affect (but

maybe

doesn't have to) the design of the Catalog.[3] I cc Timo who might

give

more insights if those should be blocking for the work around this

Catalog.

Best,

Dawid

[1]



https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs

[2] https://github.com/apache/

[jira] [Created] (FLINK-12249) Type equivalence check fails for Window Aggregates

2019-04-18 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-12249:


 Summary: Type equivalence check fails for Window Aggregates
 Key: FLINK-12249
 URL: https://issues.apache.org/jira/browse/FLINK-12249
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Legacy Planner, Tests
Affects Versions: 1.9.0
Reporter: Dawid Wysakowicz


Creating the Aggregate node fails in the rules {{LogicalWindowAggregateRule}} and 
{{ExtendedAggregateExtractProjectRule}} if the only grouping expression is a 
window and we compute an aggregation on a NOT NULLABLE field.

The root cause is how return type inference strategies in Calcite work and how 
we handle window aggregates. Take 
{{org.apache.calcite.sql.type.ReturnTypes#AGG_SUM}} as an example: it adjusts 
the type's nullability based on the {{groupCount}}.

However, we pass wrong information there, because we strip the window aggregation 
from the groupSet (in {{LogicalWindowAggregateRule}}).
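
A schematic, non-Calcite illustration of the failure mode: if the inferred nullability depends on the group count, stripping the window from the group set flips the result type from NOT NULL to nullable, and the type equivalence check fails. All names below are made up:
{code:java}
// Illustrative only: a toy version of a SUM return-type inference that derives
// nullability from the number of grouping keys, as described above.
public final class NullabilityInferenceSketch {

    /** SUM over a non-nullable field is NOT NULL unless there are no group keys. */
    static boolean sumResultNullable(boolean inputFieldNullable, int groupCount) {
        return inputFieldNullable || groupCount == 0;
    }

    public static void main(String[] args) {
        boolean inputNullable = false;          // aggregation on a NOT NULL field

        // Original query: GROUP BY TUMBLE(...) counts as one grouping expression.
        boolean original = sumResultNullable(inputNullable, 1);   // false -> NOT NULL

        // After the rule strips the window from the groupSet, groupCount becomes 0.
        boolean rewritten = sumResultNullable(inputNullable, 0);  // true -> NULLABLE

        // The differing nullability is what makes the type equivalence check fail.
        System.out.println("original nullable=" + original + ", rewritten nullable=" + rewritten);
    }
}
{code}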

One can reproduce this problem also with a unit test like this:

{code}
@Test
def testTumbleFunction2() = {

  val innerQuery =
    """
      |SELECT
      | CASE a WHEN 1 THEN 1 ELSE 99 END AS correct,
      | rowtime
      |FROM MyTable
    """.stripMargin

  val sql =
    "SELECT " +
      "  SUM(correct) as cnt, " +
      "  TUMBLE_START(rowtime, INTERVAL '15' MINUTE) as wStart " +
      s"FROM ($innerQuery) " +
      "GROUP BY TUMBLE(rowtime, INTERVAL '15' MINUTE)"

  val expected = ""
  streamUtil.verifySql(sql, expected)
}
{code}

This causes e2e tests to fail: 
https://travis-ci.org/apache/flink/builds/521183361?utm_source=slack&utm_medium=notification



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)