[jira] [Created] (FLINK-16739) PrestoS3FileSystemITCase#testSimpleFileWriteAndRead fails with no such key
Zhijiang created FLINK-16739: Summary: PrestoS3FileSystemITCase#testSimpleFileWriteAndRead fails with no such key Key: FLINK-16739 URL: https://issues.apache.org/jira/browse/FLINK-16739 Project: Flink Issue Type: Task Components: Connectors / FileSystem, Tests Reporter: Zhijiang Build: [https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6546=logs=e9af9cde-9a65-5281-a58e-2c8511d36983=df5b2bf5-bcff-5dc9-7626-50bed0866a82] logs {code:java} 2020-03-24T01:51:19.6988685Z [INFO] Running org.apache.flink.fs.s3presto.PrestoS3FileSystemBehaviorITCase 2020-03-24T01:51:21.6250893Z [INFO] Running org.apache.flink.fs.s3presto.PrestoS3FileSystemITCase 2020-03-24T01:51:25.1626385Z [WARNING] Tests run: 8, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 5.461 s - in org.apache.flink.fs.s3presto.PrestoS3FileSystemBehaviorITCase 2020-03-24T01:51:50.5503712Z [ERROR] Tests run: 7, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 28.922 s <<< FAILURE! - in org.apache.flink.fs.s3presto.PrestoS3FileSystemITCase 2020-03-24T01:51:50.5506010Z [ERROR] testSimpleFileWriteAndRead[Scheme = s3p](org.apache.flink.fs.s3presto.PrestoS3FileSystemITCase) Time elapsed: 0.7 s <<< ERROR! 2020-03-24T01:51:50.5513057Z com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: A07D70A474EABC13; S3 Extended Request ID: R2ReW39oZ9ncoc82xb+V5h/EJV5/Mnsee+7uZ7cFMkliTQ/nKhvHPCDfr5zddbfUdR/S49VdbrA=), S3 Extended Request ID: R2ReW39oZ9ncoc82xb+V5h/EJV5/Mnsee+7uZ7cFMkliTQ/nKhvHPCDfr5zddbfUdR/S49VdbrA= (Path: s3://***/temp/tests-c79a578b-13d9-41ba-b73b-4f53fc965b96/test.txt) 2020-03-24T01:51:50.5517642Z Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. 
(Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: A07D70A474EABC13; S3 Extended Request ID: R2ReW39oZ9ncoc82xb+V5h/EJV5/Mnsee+7uZ7cFMkliTQ/nKhvHPCDfr5zddbfUdR/S49VdbrA=) 2020-03-24T01:51:50.5519791Z 2020-03-24T01:51:50.5520679Z [ERROR] org.apache.flink.fs.s3presto.PrestoS3FileSystemITCase Time elapsed: 17.431 s <<< FAILURE! 2020-03-24T01:51:50.5521841Z java.lang.AssertionError: expected: but was: 2020-03-24T01:51:50.5522437Z 2020-03-24T01:51:50.8966641Z [INFO] 2020-03-24T01:51:50.8967386Z [INFO] Results: 2020-03-24T01:51:50.8967849Z [INFO] 2020-03-24T01:51:50.8968357Z [ERROR] Failures: 2020-03-24T01:51:50.8970933Z [ERROR] PrestoS3FileSystemITCase>AbstractHadoopFileSystemITTest.teardown:155->AbstractHadoopFileSystemITTest.checkPathExistence:61 expected: but was: 2020-03-24T01:51:50.8972311Z [ERROR] Errors: 2020-03-24T01:51:50.8973807Z [ERROR] PrestoS3FileSystemITCase>AbstractHadoopFileSystemITTest.testSimpleFileWriteAndRead:87 » UnrecoverableS3Operation {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16738) Add convenience Kinesis generic egress record builder to the Python SDK
Tzu-Li (Gordon) Tai created FLINK-16738: --- Summary: Add convenience Kinesis generic egress record builder to the Python SDK Key: FLINK-16738 URL: https://issues.apache.org/jira/browse/FLINK-16738 Project: Flink Issue Type: Task Components: Stateful Functions Affects Versions: statefun-2.0 Reporter: Tzu-Li (Gordon) Tai Assignee: Tzu-Li (Gordon) Tai Similar to the {{kafka_egress_record}} builder method, we're missing a counterpart for the Kinesis generic egress in the StateFun Python SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16737) Remove KubernetesUtils#getContentFromFile
Canbin Zheng created FLINK-16737: Summary: Remove KubernetesUtils#getContentFromFile Key: FLINK-16737 URL: https://issues.apache.org/jira/browse/FLINK-16737 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 Since {{org.apache.flink.util.FileUtils}} has already provided some utilities such as {{readFile}} or {{readFileUtf8}} for reading file contents, we can remove the {{KubernetesUtils#getContentFromFile}} and use the {{FileUtils}} tool instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
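The replacement is essentially a one-liner. Below is a minimal sketch of the equivalent behavior using only the JDK; in Flink itself this is what {{FileUtils.readFileUtf8}} already provides, and the class/helper names here are illustrative, not the actual Flink code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the utility that KubernetesUtils#getContentFromFile duplicates:
// read a whole file into a UTF-8 string (what FileUtils.readFileUtf8 does).
class FileUtilsSketch {

    static String readFileUtf8(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for demonstration: write a temp file and return its path.
    static Path writeTempUtf8(String content) {
        try {
            Path p = Files.createTempFile("kube-conf", ".yaml");
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```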
[jira] [Created] (FLINK-16736) Expose Elasticsearch sink metric
YangLee created FLINK-16736: --- Summary: Expose Elasticsearch sink metric Key: FLINK-16736 URL: https://issues.apache.org/jira/browse/FLINK-16736 Project: Flink Issue Type: Improvement Components: Connectors / ElasticSearch Reporter: YangLee It would be quite useful to expose metrics of the sink operator via the Flink metric system for debugging and monitoring purposes, such as: * total number of bulk requests * total size (sum) of bulk requests * total duration (sum) spent on bulk requests With these metrics, we could derive the average size and duration. The Elasticsearch SDK already exposes all this information via BulkProcessor.Listener. A simple bridge should be enough. -- This message was sent by Atlassian Jira (v8.3.4#803005)
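As a rough illustration of the proposed bridge: the listener callbacks only need to update a handful of counters. In the actual connector these would be Flink Counter metrics registered on the operator's metric group and fed from BulkProcessor.Listener#afterBulk; the class and method names below are purely illustrative:

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative metric holder for the proposed bridge: count bulk requests
// and accumulate their sizes and durations, so averages can be derived.
class BulkMetrics {
    final LongAdder bulkRequests = new LongAdder();
    final LongAdder bulkBytes = new LongAdder();
    final LongAdder bulkDurationMs = new LongAdder();

    // Would be invoked from a BulkProcessor.Listener#afterBulk callback.
    void afterBulk(long estimatedSizeInBytes, long tookMillis) {
        bulkRequests.increment();
        bulkBytes.add(estimatedSizeInBytes);
        bulkDurationMs.add(tookMillis);
    }

    double averageBulkSizeBytes() {
        long n = bulkRequests.sum();
        return n == 0 ? 0.0 : (double) bulkBytes.sum() / n;
    }
}
```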
Re: The question about the FLIP-45
Hi LakeShen, Sorry for the late response. For the first question, literally, the stop command should be used if one means to stop the job instead of canceling it. For the second one, since FLIP-45 is still under discussion [1] [2] (although a little bit stalled due to priority), we don't support stop with (retained) checkpoint yet. Accordingly, there's no implementation in our code base. Best Regards, Yu [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-45-Reinforce-Job-Stop-Semantic-td30161.html [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-45%3A+Reinforce+Job+Stop+Semantic On Thu, 19 Mar 2020 at 20:50, LakeShen wrote: > Hi community, > > I am now reading FLIP-45 (Reinforce Job Stop Semantic) and I have two > questions about it: > 1. Which command should be used to stop a Flink job, stop or cancel? > > 2. When using the stop command to stop a Flink job, I see in the Flink source > code that the stop command lets us set the savepoint dir; if we don't set it, > the default savepoint dir is used. If both the target savepoint dir and the > default savepoint dir are null, Flink will throw an exception. But in > FLIP-45, if retained checkpoint is enabled, we should always do a > checkpoint when stopping the job. I can't find this code. > > Thanks for your reply. > > Best regards, > LakeShen >
[jira] [Created] (FLINK-16735) FlinkKafkaProducer should check that it is not null before sending a record
Shangwen Tang created FLINK-16735: - Summary: FlinkKafkaProducer should check that it is not null before sending a record Key: FLINK-16735 URL: https://issues.apache.org/jira/browse/FLINK-16735 Project: Flink Issue Type: Bug Components: Connectors / JDBC Affects Versions: 1.10.0 Reporter: Shangwen Tang Attachments: image-2020-03-24-11-25-47-898.png In our user scenario, some users implemented the KafkaSerializationSchema and sometimes returned a null record, resulting in a null pointer exception !image-2020-03-24-11-25-47-898.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
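A hedged sketch of the proposed check: wrap the user-provided schema and fail fast with a descriptive message when it returns null. The interface and class names are illustrative, not the actual FlinkKafkaProducer internals (where the serialized result is a Kafka ProducerRecord rather than a byte array):

```java
// Illustrative stand-in for a KafkaSerializationSchema-style interface.
interface RecordSerializer<T> {
    byte[] serialize(T element); // buggy user code may return null
}

// Defensive wrapper: turns a deep NullPointerException inside the producer
// into a descriptive failure at the serialization step.
class NullGuardingSerializer<T> implements RecordSerializer<T> {
    private final RecordSerializer<T> delegate;

    NullGuardingSerializer(RecordSerializer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] serialize(T element) {
        byte[] record = delegate.serialize(element);
        if (record == null) {
            throw new IllegalStateException(
                    "Serialization schema returned null record for element: " + element);
        }
        return record;
    }
}
```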
Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource
Hi Benchao, > do you think we need to add more columns with various types? I didn't list all types, but we should support primitive types, varchar, Decimal, Timestamp, etc. This can be done continuously. Hi Benchao, Jark, About console and blackhole: yes, they can have no schema; the schema can be inferred from the upstream node. - But right now we don't have a mechanism to do these configurable sink things. - If we want to support them, we need a single way to support these two sinks. - And users can use "CREATE TABLE LIKE" and other ways to simplify the DDL. And for providing system/registered tables (`console` and `blackhole`): - I have no strong opinion on these system tables. In SQL, it will be "insert into blackhole select a /*int*/, b /*string*/ from tableA", "insert into blackhole select a /*double*/, b /*Map*/, c /*string*/ from tableB". It seems that blackhole is a universal thing, which makes me feel bad intuitively. - Can users override these tables? If they can, we need to ensure they can be overwritten by catalog tables. So I think we can leave these system tables to the future too. What do you think? Best, Jingsong Lee On Mon, Mar 23, 2020 at 4:44 PM Jark Wu wrote: > Hi Jingsong, > > Regarding (2) and (3), I was thinking to avoid manual DDL work, so users > can use them directly: > > # this will log results to `.out` files > INSERT INTO console > SELECT ... > > # this will drop all received records > INSERT INTO blackhole > SELECT ... > > Here `console` and `blackhole` are system sinks, which are similar to system > functions. > > Best, > Jark > > On Mon, 23 Mar 2020 at 16:33, Benchao Li wrote: > > > Hi Jingsong, > > > > Thanks for bringing this up. Generally, it's a very good proposal. > > > > About the data gen source, do you think we need to add more columns with > > various types? > > > > About the print sink, do we need to specify the schema? > > > > Jingsong Li wrote on Mon, Mar 23, 2020 at 1:51 PM: > > > > > Thanks Bowen, Jark and Dian for your feedback and suggestions. 
> > > > > > I reorganize with your suggestions, and try to expose DDLs: > > > > > > 1.datagen source: > > > - easy startup/test for streaming job > > > - performance testing > > > > > > DDL: > > > CREATE TABLE user ( > > > id BIGINT, > > > age INT, > > > description STRING > > > ) WITH ( > > > 'connector.type' = 'datagen', > > > 'connector.rows-per-second'='100', > > > 'connector.total-records'='100', > > > > > > 'schema.id.generator' = 'sequence', > > > 'schema.id.generator.start' = '1', > > > > > > 'schema.age.generator' = 'random', > > > 'schema.age.generator.min' = '0', > > > 'schema.age.generator.max' = '100', > > > > > > 'schema.description.generator' = 'random', > > > 'schema.description.generator.length' = '100' > > > ) > > > > > > Default is random generator. > > > Hi Jark, I don't want to bring complicated regularities, because it can > > be > > > done through computed columns. And it is hard to define > > > standard regularities, I think we can leave it to the future. > > > > > > 2.print sink: > > > - easy test for streaming job > > > - be very useful in production debugging > > > > > > DDL: > > > CREATE TABLE print_table ( > > > ... > > > ) WITH ( > > > 'connector.type' = 'print' > > > ) > > > > > > 3.blackhole sink > > > - very useful for high performance testing of Flink > > > - I've also run into users trying UDF to output, not sink, so they need > > > this sink as well. > > > > > > DDL: > > > CREATE TABLE blackhole_table ( > > > ... > > > ) WITH ( > > > 'connector.type' = 'blackhole' > > > ) > > > > > > What do you think? > > > > > > Best, > > > Jingsong Lee > > > > > > On Mon, Mar 23, 2020 at 12:04 PM Dian Fu > wrote: > > > > > > > Thanks Jingsong for bringing up this discussion. +1 to this > proposal. I > > > > think Bowen's proposal makes much sense to me. > > > > > > > > This is also a painful problem for PyFlink users. 
Currently there is > no > > > > built-in easy-to-use table source/sink and it requires users to > write a > > > lot > > > > of code to try out PyFlink. This is especially painful for new > users > > > who > > > > are not familiar with PyFlink/Flink. I have also encountered the > > tedious > > > > process Bowen encountered, e.g. writing a random source connector, > print > > > sink > > > > and also a blackhole sink, as there are no built-in ones to use. > > > > > > > > Regards, > > > > Dian > > > > > > > > > On Mar 22, 2020, at 11:24 AM, Jark Wu wrote: > > > > > > > > > > +1 to Bowen's proposal. I also saw many requirements on such > built-in > > > > > connectors. > > > > > > > > > > I will leave some of my thoughts here: > > > > > > > > > >> 1. datagen source (random source) > > > > > I think we can merge the functionality of sequence-source into > random > > > > source > > > > > to allow users to customize their data values. > > > > > Flink can generate random data according to the field types, users > > > > > can customize their values to be more domain specific, e.g. >
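The per-field generator options in the proposed datagen DDL ('sequence' with a start value, 'random' with min/max bounds) could be sketched roughly as below; all names are illustrative, not the eventual Flink API:

```java
import java.util.Random;

// Illustrative per-field generators behind a 'datagen' source, matching the
// 'schema.<field>.generator' options in the proposed DDL.
interface FieldGenerator<T> {
    T next();
}

// 'schema.id.generator' = 'sequence', 'schema.id.generator.start' = '1'
class SequenceGenerator implements FieldGenerator<Long> {
    private long next;
    SequenceGenerator(long start) { this.next = start; }
    public Long next() { return next++; }
}

// 'schema.age.generator' = 'random', with 'min'/'max' bounds; seeded here
// for reproducibility in the example.
class RandomIntGenerator implements FieldGenerator<Integer> {
    private final Random random;
    private final int min;
    private final int max;
    RandomIntGenerator(int min, int max, long seed) {
        this.random = new Random(seed);
        this.min = min;
        this.max = max;
    }
    public Integer next() { return min + random.nextInt(max - min + 1); }
}
```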
[jira] [Created] (FLINK-16733) Refactor YarnClusterDescriptor
Xintong Song created FLINK-16733: Summary: Refactor YarnClusterDescriptor Key: FLINK-16733 URL: https://issues.apache.org/jira/browse/FLINK-16733 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Xintong Song Currently, YarnClusterDescriptor is not in a good shape. It has 1600+ lines of code, of which the method {{startAppMaster}} alone has 400+ lines, leading to poor maintainability. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16732) Failed to call Hive UDF with constant return value
Rui Li created FLINK-16732: -- Summary: Failed to call Hive UDF with constant return value Key: FLINK-16732 URL: https://issues.apache.org/jira/browse/FLINK-16732 Project: Flink Issue Type: Bug Components: Connectors / Hive Affects Versions: 1.10.0 Reporter: Rui Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces
Thanks Timo for the design doc. In general I'm +1 to this, with a minor comment. Since we introduced dozens of interfaces all at once, I'm not sure if it's good to annotate them with @PublicEvolving already. I can imagine these interfaces would only be stable after 1 or 2 major releases. Given the fact that these interfaces will only be used by connector developers, how about we annotate them as @Internal first? After more trying out and feedback from connector developers, we can improve those interfaces quickly and mark them @PublicEvolving after we are confident about them. BTW, if I'm not mistaken, the end users will only see Row with the enhanced RowKind. This is the only one which actually goes public IMO. Best, Kurt On Tue, Mar 24, 2020 at 9:24 AM Becket Qin wrote: > Hi Timo, > > Thanks for the proposal. I completely agree that the current Table > connectors could be simplified quite a bit. I haven't finished reading > everything, but here are some quick thoughts. > > Actually to me the biggest question is why should there be two different > connector systems for DataStream and Table? What is the fundamental reason > that is preventing us from merging them into one? > > The basic functionality of a connector is to provide capabilities to do IO > and Serde. Conceptually, Table connectors should just be DataStream > connectors that are dealing with Rows. It seems that quite a few of the > special connector requirements are just a specific way to do IO / Serde. > Taking SupportsFilterPushDown as an example, imagine we have the following > interface: > > interface FilterableSource { > void applyFilterable(Supplier predicate); > } > > And if a ParquetSource would like to support filtering, it will become: > > class ParquetSource implements Source, FilterableSource<FilterPredicate> { > ... > } > > For Table, one just needs to provide a predicate supplier that converts an > Expression to the specified predicate type. This has a few benefits: 1. 
Same unified API for filterable for sources, regardless of DataStream or > Table. > 2. The DataStream users now can also use the ExpressionToPredicate > supplier if they want to. > > To summarize, my main point is that I am wondering if it is possible to > have a single set of connector interface for both Table and DataStream, > rather than having two hierarchies. I am not 100% sure if this would work, > but if it works, this would be a huge win from both code maintenance and > user experience perspective. > > Thanks, > > Jiangjie (Becket) Qin > > > > On Tue, Mar 24, 2020 at 2:03 AM Dawid Wysakowicz > wrote: > > > Hi Timo, > > > > Thank you for the proposal. I think it is an important improvement that > > will benefit many parts of the Table API. The proposal looks really good > > to me and personally I would be comfortable with voting on the current > > state. > > > > Best, > > > > Dawid > > > > On 23/03/2020 18:53, Timo Walther wrote: > > > Hi everyone, > > > > > > I received some questions around how the new interfaces play together > > > with formats and their factories. > > > > > > Furthermore, for MySQL or Postgres CDC logs, the format should be able > > > to return a `ChangelogMode`. > > > > > > Also, I incorporated the feedback around the factory design in general. > > > > > > I added a new section `Factory Interfaces` to the design document. > > > This should be helpful to understand the big picture and connecting > > > the concepts. > > > > > > Please let me know what you think? > > > > > > Thanks, > > > Timo > > > > > > > > > On 18.03.20 13:43, Timo Walther wrote: > > >> Hi Benchao, > > >> > > >> this is a very good question. I will update the FLIP about this. > > >> > > >> The legacy planner will not support the new interfaces. It will only > > >> support the old interfaces. With the next release, I think the Blink > > >> planner is stable enough to be the default one as well. 
> > >> > > >> Regards, > > >> Timo > > >> > > >> On 18.03.20 08:45, Benchao Li wrote: > > >>> Hi Timo, > > >>> > > >>> Thank you and others for the efforts to prepare this FLIP. > > >>> > > >>> The FLIP LGTM generally. > > >>> > > >>> +1 for moving blink data structures to table-common, it's useful to > > >>> udf too > > >>> in the future. > > >>> A little question is, do we plan to support the new interfaces and > data > > >>> types in legacy planner? > > >>> Or we only plan to support these new interfaces in blink planner. > > >>> > > >>> And using primary keys from DDL instead of derived key information > from > > >>> each query is also a good idea, > > >>> we met some use cases where this does not works very well before. > > >>> > > >>> This FLIP also makes the dependencies of table modules more clear, I > > >>> like > > >>> it very much. > > >>> > > >>> Timo Walther 于2020年3月17日周二 上午1:36写道: > > >>> > > Hi everyone, > > > > I'm happy to present the results of long discussions that we had > > internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, me,
Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces
Hi Timo, Thanks for the proposal. I completely agree that the current Table connectors could be simplified quite a bit. I haven't finished reading everything, but here are some quick thoughts. Actually to me the biggest question is why should there be two different connector systems for DataStream and Table? What is the fundamental reason that is preventing us from merging them into one? The basic functionality of a connector is to provide capabilities to do IO and Serde. Conceptually, Table connectors should just be DataStream connectors that are dealing with Rows. It seems that quite a few of the special connector requirements are just a specific way to do IO / Serde. Taking SupportsFilterPushDown as an example, imagine we have the following interface: interface FilterableSource { void applyFilterable(Supplier predicate); } And if a ParquetSource would like to support filtering, it will become: class ParquetSource implements Source, FilterableSource<FilterPredicate> { ... } For Table, one just needs to provide a predicate supplier that converts an Expression to the specified predicate type. This has a few benefits: 1. Same unified API for filtering for sources, regardless of DataStream or Table. 2. The DataStream users now can also use the ExpressionToPredicate supplier if they want to. To summarize, my main point is that I am wondering if it is possible to have a single set of connector interfaces for both Table and DataStream, rather than having two hierarchies. I am not 100% sure if this would work, but if it works, this would be a huge win from both code maintenance and user experience perspective. Thanks, Jiangjie (Becket) Qin On Tue, Mar 24, 2020 at 2:03 AM Dawid Wysakowicz wrote: > Hi Timo, > > Thank you for the proposal. I think it is an important improvement that > will benefit many parts of the Table API. The proposal looks really good > to me and personally I would be comfortable with voting on the current > state. 
> > Best, > > Dawid > > On 23/03/2020 18:53, Timo Walther wrote: > > Hi everyone, > > > > I received some questions around how the new interfaces play together > > with formats and their factories. > > > > Furthermore, for MySQL or Postgres CDC logs, the format should be able > > to return a `ChangelogMode`. > > > > Also, I incorporated the feedback around the factory design in general. > > > > I added a new section `Factory Interfaces` to the design document. > > This should be helpful to understand the big picture and connecting > > the concepts. > > > > Please let me know what you think? > > > > Thanks, > > Timo > > > > > > On 18.03.20 13:43, Timo Walther wrote: > >> Hi Benchao, > >> > >> this is a very good question. I will update the FLIP about this. > >> > >> The legacy planner will not support the new interfaces. It will only > >> support the old interfaces. With the next release, I think the Blink > >> planner is stable enough to be the default one as well. > >> > >> Regards, > >> Timo > >> > >> On 18.03.20 08:45, Benchao Li wrote: > >>> Hi Timo, > >>> > >>> Thank you and others for the efforts to prepare this FLIP. > >>> > >>> The FLIP LGTM generally. > >>> > >>> +1 for moving blink data structures to table-common, it's useful to > >>> udf too > >>> in the future. > >>> A little question is, do we plan to support the new interfaces and data > >>> types in legacy planner? > >>> Or we only plan to support these new interfaces in blink planner. > >>> > >>> And using primary keys from DDL instead of derived key information from > >>> each query is also a good idea, > >>> we met some use cases where this does not works very well before. > >>> > >>> This FLIP also makes the dependencies of table modules more clear, I > >>> like > >>> it very much. > >>> > >>> Timo Walther 于2020年3月17日周二 上午1:36写道: > >>> > Hi everyone, > > I'm happy to present the results of long discussions that we had > internally. 
Jark, Dawid, Aljoscha, Kurt, Jingsong, me, and many more > have contributed to this design document. > > We would like to propose new long-term table source and table sink > interfaces: > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > > > This is a requirement for FLIP-105 and finalizing FLIP-32. > > The goals of this FLIP are: > > - Simplify the current interface architecture: > - Merge upsert, retract, and append sinks. > - Unify batch and streaming sources. > - Unify batch and streaming sinks. > > - Allow sources to produce a changelog: > - UpsertTableSources have been requested a lot by users. Now > is the > time to open the internal planner capabilities via the new interfaces. > - According to FLIP-105, we would like to support changelogs for > processing formats such as Debezium. > > - Don't rely on DataStream API for
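A toy, compilable rendering of the FilterableSource idea from this thread (the generic parameter and the in-memory source are additions for illustration; this is not a real Flink API): the source accepts a supplier that produces a connector-specific predicate, so Table and DataStream could share one push-down interface.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Sketch of the proposed interface: the predicate type is connector-specific.
interface FilterableSource<P> {
    void applyFilterable(Supplier<P> predicate);
}

// Toy source over an in-memory list; a Parquet source would instead take a
// Parquet FilterPredicate and push it into the reader.
class InMemorySource implements FilterableSource<Predicate<Integer>> {
    private final List<Integer> data;
    private Predicate<Integer> filter = x -> true; // no filter by default

    InMemorySource(List<Integer> data) {
        this.data = data;
    }

    @Override
    public void applyFilterable(Supplier<Predicate<Integer>> predicate) {
        this.filter = predicate.get();
    }

    List<Integer> read() {
        return data.stream().filter(filter).collect(Collectors.toList());
    }
}
```

For the Table case, the supplier would wrap the translation from an Expression to the connector's predicate type, which is exactly the "ExpressionToPredicate supplier" mentioned in the email.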
[jira] [Created] (FLINK-16731) Support show partitions table command in sql client
Jun Zhang created FLINK-16731: - Summary: Support show partitions table command in sql client Key: FLINK-16731 URL: https://issues.apache.org/jira/browse/FLINK-16731 Project: Flink Issue Type: New Feature Components: Table SQL / Client Affects Versions: 1.10.0 Reporter: Jun Zhang Fix For: 1.11.0 Add a SHOW PARTITIONS command to the SQL client to show the partition information of a partitioned table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16730) Python SDK Getting Started
Seth Wiesman created FLINK-16730: Summary: Python SDK Getting Started Key: FLINK-16730 URL: https://issues.apache.org/jira/browse/FLINK-16730 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Seth Wiesman We should add a Python-specific version of the walkthrough so users can quickly get started. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [Dev Blog] Migrating Flink's CI Infrastructure from Travis CI to Azure Pipelines
Thank you Robert! (also thanks for incorporating my feedback so swiftly) On Mon, Mar 23, 2020 at 8:54 PM Seth Wiesman wrote: > Very interesting! No questions but thank you for taking the initiative to > put out the first dev blog. > > Seth > > > On Mar 23, 2020, at 5:14 AM, Robert Metzger wrote: > > > > Hi all, > > > > I have just published the first post to the dev blog: > > > https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines > > . > > > > I'm looking forward to your feedback and questions on the article :) > > > > Best, > > Robert >
Re: [Dev Blog] Migrating Flink's CI Infrastructure from Travis CI to Azure Pipelines
Very interesting! No questions but thank you for taking the initiative to put out the first dev blog. Seth > On Mar 23, 2020, at 5:14 AM, Robert Metzger wrote: > > Hi all, > > I have just published the first post to the dev blog: > https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines > . > > I'm looking forward to your feedback and questions on the article :) > > Best, > Robert
Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces
Hi Timo, Thank you for the proposal. I think it is an important improvement that will benefit many parts of the Table API. The proposal looks really good to me and personally I would be comfortable with voting on the current state. Best, Dawid On 23/03/2020 18:53, Timo Walther wrote: > Hi everyone, > > I received some questions around how the new interfaces play together > with formats and their factories. > > Furthermore, for MySQL or Postgres CDC logs, the format should be able > to return a `ChangelogMode`. > > Also, I incorporated the feedback around the factory design in general. > > I added a new section `Factory Interfaces` to the design document. > This should be helpful to understand the big picture and connecting > the concepts. > > Please let me know what you think? > > Thanks, > Timo > > > On 18.03.20 13:43, Timo Walther wrote: >> Hi Benchao, >> >> this is a very good question. I will update the FLIP about this. >> >> The legacy planner will not support the new interfaces. It will only >> support the old interfaces. With the next release, I think the Blink >> planner is stable enough to be the default one as well. >> >> Regards, >> Timo >> >> On 18.03.20 08:45, Benchao Li wrote: >>> Hi Timo, >>> >>> Thank you and others for the efforts to prepare this FLIP. >>> >>> The FLIP LGTM generally. >>> >>> +1 for moving blink data structures to table-common, it's useful to >>> udf too >>> in the future. >>> A little question is, do we plan to support the new interfaces and data >>> types in legacy planner? >>> Or we only plan to support these new interfaces in blink planner. >>> >>> And using primary keys from DDL instead of derived key information from >>> each query is also a good idea, >>> we met some use cases where this does not works very well before. >>> >>> This FLIP also makes the dependencies of table modules more clear, I >>> like >>> it very much. 
>>> >>> Timo Walther 于2020年3月17日周二 上午1:36写道: >>> Hi everyone, I'm happy to present the results of long discussions that we had internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, me, and many more have contributed to this design document. We would like to propose new long-term table source and table sink interfaces: https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces This is a requirement for FLIP-105 and finalizing FLIP-32. The goals of this FLIP are: - Simplify the current interface architecture: - Merge upsert, retract, and append sinks. - Unify batch and streaming sources. - Unify batch and streaming sinks. - Allow sources to produce a changelog: - UpsertTableSources have been requested a lot by users. Now is the time to open the internal planner capabilities via the new interfaces. - According to FLIP-105, we would like to support changelogs for processing formats such as Debezium. - Don't rely on DataStream API for source and sinks: - According to FLIP-32, the Table API and SQL should be independent of the DataStream API which is why the `table-common` module has no dependencies on `flink-streaming-java`. - Source and sink implementations should only depend on the `table-common` module after FLIP-27. - Until FLIP-27 is ready, we still put most of the interfaces in `table-common` and strictly separate interfaces that communicate with a planner and actual runtime reader/writers. - Implement efficient sources and sinks without planner dependencies: - Make Blink's internal data structures available to connectors. - Introduce stable interfaces for data structures that can be marked as `@PublicEvolving`. - Only require dependencies on `flink-table-common` in the future It finalizes the concept of dynamic tables and consideres how all source/sink related classes play together. We look forward to your feedback. Regards, Timo >>> >>> > signature.asc Description: OpenPGP digital signature
Dynamic Flink SQL
Dear Flink community! In our company we have implemented a system that realizes the dynamic business rules pattern. We spoke about it during Flink Forward 2019: https://www.youtube.com/watch?v=CyrQ5B0exqU. The system is a great success and we would like to improve it. Let me shortly mention what the system does: * We have a Flink job with an engine that applies business rules to multiple data streams. These rules find patterns in the data and produce complex events from these patterns. * The engine is built on top of CoProcessFunction; the rules are preimplemented using state and timers. * The engine accepts control messages that deliver the configuration of the rules and start the rule instances. There may be many rule instances with different configurations running in parallel. * Data streams are routed to those rules, to all instances. The *advantages* of this design are: * *The performance is superb.* The key to it is that we read data from the Kafka topic once, deserialize once, shuffle it once (thankfully we have one partitioning key) and then apply over 100 rule instances needing the same data. * We are able to deploy multiple rule instances dynamically without starting/stopping the job. The performance especially is crucial: we have up to 500K events/s processed by 100 rules on fewer than 100 cores. I can't imagine having 100 Flink SQL queries each consuming these streams from Kafka on such a cluster. The main *pain points* of the design are: * To deploy a new kind of business rule, we need to predevelop the rule template with our SDK. *We can't use the great Flink CEP and Flink SQL libraries*, which are getting stronger every day. Flink SQL with MATCH_RECOGNIZE would fit our cases perfectly. * The isolation of the rules is weak. There are many rules running per job; if one fails, the whole job fails. * There is one set of Kafka offsets, one watermark, one checkpoint for all the rules. * We have just one distribution key, although that can be overcome. 
I would like to focus on solving the *first point*; we can live with the rest. *Question to the community*: Do you have ideas on how to make it possible to develop the rules using Flink SQL with MATCH_RECOGNIZE? My current ideas are: 1. *A possibility to dynamically modify the job topology.* Then I imagine dynamically attaching Flink SQL jobs to the same Kafka sources. 2. *A possibility to save data streams internally to Flink, predistributed.* Then Flink SQL queries should be able to read these streams. The ideal imaginary solution would be that simple in use: CREATE TABLE my_stream(...) WITH (..., cached = 'true') PARTITIONED BY my_partition_key (the cached table can also be the result of CREATE TABLE and INSERT INTO my_stream_cached SELECT ... FROM my_stream). Then I can run multiple parallel Flink SQL queries reading from that cached table in Flink. Technical implementation: Ideally, I imagine saving events in Flink state before they are consumed. Then implement a Flink source that can read the Flink state of the state-filling job. It's a different job, I know! Of course it needs to run on the same Flink cluster. A lot of options are possible: building on top of Flink, modifying Flink (even keeping our own fork for the time being), or using an external component. In my opinion the keys to maximum performance are: * avoiding pulling data through the network from Kafka * avoiding deserialization of messages for each of the queries/processors. Comments, ideas: any feedback is welcome! Thank you! Krzysztof P.S. I'm writing to both the dev and user groups because I suspect I would need to modify Flink to achieve what I wrote above.
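The read-once/fan-out property that makes the current design fast can be illustrated with a toy sketch (names are illustrative; the real engine is a CoProcessFunction-based Flink job): each event is deserialized once and handed to every registered rule instance, instead of each rule consuming and deserializing the Kafka topic on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy fan-out engine: one deserialized event stream, N rule instances.
class RuleEngine<T> {
    private final List<Consumer<T>> rules = new ArrayList<>();

    // Rule instances can be registered dynamically, mirroring the
    // control-message-driven deployment described above.
    void registerRule(Consumer<T> rule) {
        rules.add(rule);
    }

    void process(T event) {
        // The event is read and deserialized once; every rule sees it.
        for (Consumer<T> rule : rules) {
            rule.accept(event);
        }
    }
}
```

With 100 rules, the cost per event is one read, one deserialization, and 100 in-memory rule evaluations, rather than 100 independent Kafka consumers.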
Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces
Hi everyone, I received some questions around how the new interfaces play together with formats and their factories. Furthermore, for MySQL or Postgres CDC logs, the format should be able to return a `ChangelogMode`. Also, I incorporated the feedback around the factory design in general. I added a new section `Factory Interfaces` to the design document. This should be helpful for understanding the big picture and connecting the concepts. Please let me know what you think. Thanks, Timo On 18.03.20 13:43, Timo Walther wrote: Hi Benchao, this is a very good question. I will update the FLIP about this. The legacy planner will not support the new interfaces. It will only support the old interfaces. With the next release, I think the Blink planner is stable enough to be the default one as well. Regards, Timo On 18.03.20 08:45, Benchao Li wrote: Hi Timo, Thank you and the others for the efforts to prepare this FLIP. The FLIP LGTM generally. +1 for moving the Blink data structures to table-common; they will be useful for UDFs too in the future. A little question: do we plan to support the new interfaces and data types in the legacy planner, or only in the Blink planner? And using primary keys from DDL instead of deriving key information from each query is also a good idea; we met some use cases where the latter did not work very well. This FLIP also makes the dependencies of the table modules clearer; I like it very much. Timo Walther wrote on Tuesday, March 17, 2020 at 1:36 AM: Hi everyone, I'm happy to present the results of long discussions that we had internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, me, and many more have contributed to this design document. We would like to propose new long-term table source and table sink interfaces: https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces This is a requirement for FLIP-105 and finalizing FLIP-32.
The goals of this FLIP are:
- Simplify the current interface architecture:
  - Merge upsert, retract, and append sinks.
  - Unify batch and streaming sources.
  - Unify batch and streaming sinks.
- Allow sources to produce a changelog:
  - UpsertTableSources have been requested a lot by users. Now is the time to open the internal planner capabilities via the new interfaces.
  - According to FLIP-105, we would like to support changelogs for processing formats such as Debezium.
- Don't rely on the DataStream API for sources and sinks:
  - According to FLIP-32, the Table API and SQL should be independent of the DataStream API, which is why the `table-common` module has no dependencies on `flink-streaming-java`.
  - Source and sink implementations should only depend on the `table-common` module after FLIP-27.
  - Until FLIP-27 is ready, we still put most of the interfaces in `table-common` and strictly separate the interfaces that communicate with the planner from the actual runtime readers/writers.
- Implement efficient sources and sinks without planner dependencies:
  - Make Blink's internal data structures available to connectors.
  - Introduce stable interfaces for data structures that can be marked as `@PublicEvolving`.
  - Only require dependencies on `flink-table-common` in the future.

It finalizes the concept of dynamic tables and considers how all source/sink related classes play together. We look forward to your feedback. Regards, Timo
[jira] [Created] (FLINK-16729) Offer an out-of-the-box Set serializer
Nico Kruber created FLINK-16729: --- Summary: Offer an out-of-the-box Set serializer Key: FLINK-16729 URL: https://issues.apache.org/jira/browse/FLINK-16729 Project: Flink Issue Type: New Feature Components: API / Type Serialization System Affects Versions: 1.10.0 Reporter: Nico Kruber Currently, Set types are serialized via Kryo by default, since Flink does not come with its own SetSerializer (there is only one for maps). While the MapSerializer could easily be adapted to cover sets instead, I think this should be available by default to get the maximum performance out of Flink (Kryo is slow!). When this is added, however, we need to provide a migration path for old state (or not use the new SetSerializer by default but offer it as an opt-in). This needs further investigation as to whether it is possible to migrate from Kryo automatically and whether we can check for potential changes to the encapsulated entry class. -- This message was sent by Atlassian Jira (v8.3.4#803005)
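As a rough sketch of what adapting the map approach to sets could look like on the wire (a length prefix followed by the elements) — this is not Flink's TypeSerializer API, just an assumed minimal format for illustration; a real implementation would extend TypeSerializer<Set<T>> and delegate to a nested element serializer:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch of a size-prefixed wire format for sets, mirroring the map case. */
public class SetWireFormatSketch {

    public static byte[] serialize(Set<String> set) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(set.size());      // element count first
        for (String element : set) {
            out.writeUTF(element);     // then each element
        }
        return bytes.toByteArray();
    }

    public static Set<String> deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        Set<String> set = new LinkedHashSet<>();
        for (int i = 0; i < size; i++) {
            set.add(in.readUTF());
        }
        return set;
    }
}
```

The migration question from the ticket remains the hard part: Kryo's on-disk bytes do not match a format like this, so old state cannot simply be re-read with the new serializer.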
[jira] [Created] (FLINK-16728) Taskmanager dies after job got stuck and canceling fails
Leonid Ilyevsky created FLINK-16728: --- Summary: Taskmanager dies after job got stuck and canceling fails Key: FLINK-16728 URL: https://issues.apache.org/jira/browse/FLINK-16728 Project: Flink Issue Type: Bug Affects Versions: 1.10.0 Reporter: Leonid Ilyevsky Attachments: taskmanager.log.20200323.gz At some point I noticed that a few jobs got stuck (they basically stopped processing messages; I could detect this by watching the expected output), so I tried to cancel them. The cancel operation failed, complaining that the job was stuck at StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:86), and then the whole taskmanager shut down. See the attached log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] JMX remote monitoring integration with Flink
Thanks @Till for sharing the JIRA information. I thought as well that this should not be an isolated case specific to our situation. We will continue to follow up on the JIRA ticket. Best, Rong On Fri, Mar 20, 2020 at 7:30 AM Till Rohrmann wrote: > Hi Rong Rong, > > you are right that JMX is quite hard to use in production due to the > mentioned problems with discovering the port. There is actually already a > JIRA ticket [1] discussing this problem. It just never gained enough > traction to be tackled. > > In general, I agree that it would be nice to have a REST API with which one > can obtain JVM-specific information about a Flink process. This information > could also contain a potentially open JMX port. > > [1] https://issues.apache.org/jira/browse/FLINK-5552 > > Cheers, > Till > > On Fri, Mar 13, 2020 at 2:02 PM Forward Xu wrote: > > > Hi RongRong, > > Thank you for bringing up this discussion; it is indeed not appropriate to > > occupy additional ports in the production environment to provide JMXRMI > > services. I think [2], a REST API or the JobManager/TaskManager UI, is a good > > idea. > > > > Best, > > Forward > > > > Rong Rong wrote on Friday, March 13, 2020 at 8:54 PM: > > > Hi All, > > > > > > Has anyone tried to manage production Flink applications through JMX > > > remote monitoring & management [1]? > > > > > > We were experimenting with enabling JMXRMI on Flink by default in > > > production and would like to share some of our thoughts: > > > * Is there any straightforward way to dynamically allocate JMXRMI > > > remote ports? > > > - It is unrealistic to use a static JMXRMI port in a production > > > environment; however, we have to go all around the logging system to get the dynamic > > > remote port number printed out in the log files - this seems very > > > inconvenient. > > > - I think it would be very handy if we could show the JMXRMI remote > > > information on the JobManager/TaskManager UI, or via the REST API.
(I am thinking > > > about something similar to [2]) > > > > > > * Is there any performance overhead to enabling JMX for a Flink > > > application? > > > - We haven't seen any significant performance impact in our > > > experiments. However, the experiments were not that well-rounded and the observations are > > > inconclusive. > > > - I was wondering whether it would be a good idea to put some benchmarks in the > > > regression tests [3] to see what the overhead would be? > > > > > > It would be highly appreciated if anyone could share some experiences or > > > provide any suggestions on how we can improve the JMX remote integration > > > with Flink. > > > > > > Thanks, > > > Rong > > > > > > [1] > > > https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html > > > [2] > > > https://samza.apache.org/learn/documentation/0.14/jobs/web-ui-rest-api.html > > > [3] http://codespeed.dak8s.net:8000/ > > >
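The JVM-level information discussed in this thread (what remote JMX exposes, and what a Flink REST endpoint could surface instead) is readable in-process from the platform MXBeans without opening any remote port. A small sketch, using only standard `java.lang.management` API; the class name is just illustrative:

```java
import java.lang.management.ManagementFactory;

/** Reads a few JVM facts locally -- the kind of data remote JMX exposes. */
public class JvmInfoSketch {

    /** Currently used heap memory, in bytes. */
    public static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** JVM uptime in milliseconds. */
    public static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        System.out.println("jvm:       " + ManagementFactory.getRuntimeMXBean().getVmName());
        System.out.println("uptime ms: " + uptimeMillis());
        System.out.println("heap used: " + heapUsedBytes());
    }
}
```

A REST endpoint returning exactly this kind of MXBean data, plus any open JMX port, would address the discoverability problem without requiring a static JMXRMI port.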
[DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis
Hey devs, I would like to discuss whether it makes sense to fully switch to Azure Pipelines and phase out our Travis integration. More information on our Azure integration can be found here: https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines Travis will stay for the release-1.10 and older branches, as I have set up Azure only for the master branch. Proposal:
- We keep the flinkbot infrastructure supporting both Travis and Azure around while we still receive pull requests and pushes for the "master" and "release-1.10" branches.
- We remove the Travis-specific files from "master", so that builds are not triggered anymore.
- Once we receive no more builds at Travis (because 1.11 has been released), we remove the remaining Travis-related infrastructure.
What do you think? Best, Robert
[jira] [Created] (FLINK-16727) cannot cast 2020-11-12 as class java.time.LocalDate
Matrix42 created FLINK-16727: Summary: cannot cast 2020-11-12 as class java.time.LocalDate Key: FLINK-16727 URL: https://issues.apache.org/jira/browse/FLINK-16727 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.10.0 Environment: [^Flinktest.zip] Reporter: Matrix42 Attachments: Flinktest.zip I defined a ScalarFunction as follows: {code:java} public class DateFunc extends ScalarFunction { public String eval(Date date) { return date.toString(); } @Override public TypeInformation getResultType(Class[] signature) { return Types.STRING; } @Override public TypeInformation[] getParameterTypes(Class[] signature) { return new TypeInformation[]{Types.INT}; } } {code} I use it in SQL: `select func(DATE '2020-11-12') as a from source`, and Flink throws 'cannot cast 2020-11-12 as class java.time.LocalDate'. The full code is in [^Flinktest.zip]; the main class is com.lorinda.template.TestDateFunction -- This message was sent by Atlassian Jira (v8.3.4#803005)
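For context on the error above: the message suggests the planner represents the DATE literal as java.time.LocalDate, while the UDF declares the legacy java.sql.Date parameter, and the two are not cast-compatible. They do convert explicitly with plain JDK API; this sketch is background on the type mismatch, not the Flink fix itself:

```java
import java.sql.Date;
import java.time.LocalDate;

/** Explicit bridging between the legacy java.sql.Date and java.time.LocalDate. */
public class DateBridgeSketch {

    public static Date toSqlDate(LocalDate d) {
        return Date.valueOf(d);       // loses nothing: both are date-only
    }

    public static LocalDate toLocalDate(Date d) {
        return d.toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate local = LocalDate.parse("2020-11-12");
        System.out.println(toSqlDate(local)); // 2020-11-12
    }
}
```

A UDF whose eval method accepts java.time.LocalDate directly (with a matching declared parameter type) would avoid the cast entirely in planners that use the java.time representation.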
Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API
Hi Jingsong, Dawid, I created https://issues.apache.org/jira/browse/FLINK-16725 to track this issue. We can continue discussion there. Best, Jark On Thu, 27 Feb 2020 at 10:32, Jingsong Li wrote: > Hi Jark, > > The matrix I see is SQL cast. If we need bring another conversion matrix > that is different from SQL cast, I don't understand the benefits. It makes > me difficult to understand. > And It seems bad to change the timestamp of different time zones to the > same value silently. > > I have seen a lot of timestamp formats, SQL, ISO, RFC. I can think that a > "timestampFormat" could help them to deal with various formats. > What way do you think can solve all the problems? > > Best, > Jingsong Lee > > On Wed, Feb 26, 2020 at 10:45 PM Jark Wu wrote: > >> Hi Jingsong, >> >> I don't think it should follow SQL CAST semantics, because it is out of >> SQL, it happens in connectors which converts users'/external's format into >> SQL types. >> I also doubt "timestampFormat" may not work in some cases, because the >> timestamp format maybe various and mixed in a topic. >> >> Best, >> Jark >> >> On Wed, 26 Feb 2020 at 22:20, Jingsong Li wrote: >> >>> Thanks all for your discussion. >>> >>> Hi Dawid, >>> >>> +1 to apply the logic of parsing a SQL timestamp literal. >>> >>> I don't fully understand the matrix your list. Should this be the >>> semantics of SQL cast? >>> Do you mean this is implicit cast in JSON parser? >>> I doubt that because these implicit casts are not support >>> in LogicalTypeCasts. And it is not so good to understand when it occur >>> silently. >>> >>> How about add "timestampFormat" property to JSON parser? Its default >>> value is SQL timestamp literal format. And user can configure this. >>> >>> Best, >>> Jingsong Lee >>> >>> On Wed, Feb 26, 2020 at 6:39 PM Jark Wu wrote: >>> Hi Dawid, I agree with you. If we want to loosen the format constraint, the important piece is the conversion matrix. The conversion matrix you listed makes sense to me. 
From my understanding, there should be 6 combinations. We can add WITHOUT TIMEZONE => WITHOUT TIMEZONE and WITH TIMEZONE => WITH TIMEZONE to make the matrix complete. When the community reaches an agreement on this, we should write it down in the documentation and follow the matrix in all text-based formats. Regarding the RFC 3339 compatibility mode switch, it also sounds good to me. Best, Jark On Wed, 26 Feb 2020 at 17:44, Dawid Wysakowicz wrote: > Hi all, > > @NiYanchun Thank you for reporting this. Yes, I think we could improve the > behaviour of the JSON format. > > @Jark First of all I do agree we could/should improve the > "user-friendliness" of the JSON format (and unify the behavior across text-based > formats). I am not sure though if it is as simple as just ignoring the > time zone here. > > My suggestion would rather be to apply the logic of parsing a SQL > timestamp literal (if the expected type is of LogicalTypeFamily.TIMESTAMP), > which would actually also derive the "stored" type of the timestamp (either > WITHOUT TIMEZONE or WITH TIMEZONE) and then apply a proper SQL conversion. > Therefore:
>
> parsed type      | requested type      | behaviour
> WITHOUT TIMEZONE | WITH TIMEZONE       | store the local timezone with the data
> WITHOUT TIMEZONE | WITH LOCAL TIMEZONE | do nothing with the data, interpret the time in the local timezone
> WITH TIMEZONE    | WITH LOCAL TIMEZONE | convert the timestamp to the local timezone and drop the time zone information
> WITH TIMEZONE    | WITHOUT TIMEZONE    | drop the time zone information
>
> It might just boil down to what you said, "being more lenient with regards > to parsing the time zone". Nevertheless I think this way it is a bit better > defined behaviour, especially as it has a defined behaviour when converting > between representations with or without time zone. > > An implementation note: I think we should aim to base the implementation > on the DataTypes rather than going back to the TypeInformation.
> > I would still try to leave the RFC 3339 compatibility mode, but maybe for > that mode it would make sense to not support any types WITHOUT TIMEZONE? > This would be enabled with a switch (disabled by default). As I understand > the RFC, making the time zone mandatory is actually a big part of the > standard as it makes time types unambiguous. > > What do you think? > > Ps. I cross posted this on the dev ML. > > Best, > > Dawid >
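For illustration, the conversion matrix quoted in this thread maps naturally onto java.time types if one assumes WITHOUT TIMEZONE ~ LocalDateTime and WITH TIMEZONE ~ OffsetDateTime. A sketch of two of the rows, using an assumed fixed session zone for reproducibility (this models the semantics only, not Flink's internal representation):

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

/** java.time model of two rows of the timestamp conversion matrix. */
public class TimestampMatrixSketch {

    static final ZoneOffset LOCAL = ZoneOffset.ofHours(2); // assumed session zone

    /** WITHOUT TIMEZONE -> WITH TIMEZONE: store the local zone with the data. */
    static OffsetDateTime attachLocalZone(LocalDateTime t) {
        return t.atOffset(LOCAL);
    }

    /** WITH TIMEZONE -> WITHOUT TIMEZONE: drop the time zone information. */
    static LocalDateTime dropZone(OffsetDateTime t) {
        return t.toLocalDateTime();
    }

    public static void main(String[] args) {
        LocalDateTime plain = LocalDateTime.parse("2020-02-26T10:00:00");
        System.out.println(attachLocalZone(plain));
        System.out.println(dropZone(OffsetDateTime.parse("2020-02-26T10:00:00+05:00")));
    }
}
```

Note that dropZone keeps the wall-clock reading and discards the offset, exactly the lossy behaviour the matrix documents for that row.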
[jira] [Created] (FLINK-16726) ScalarFunction throws Given parameters of function 'func' do not match any signature.
Matrix42 created FLINK-16726: Summary: ScalarFunction throws Given parameters of function 'func' do not match any signature. Key: FLINK-16726 URL: https://issues.apache.org/jira/browse/FLINK-16726 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.10.0 Environment: [^Flinktest.zip] Reporter: Matrix42 Attachments: Flinktest.zip I wrote a ScalarFunction as follows: {code:java} public class UDF3 extends ScalarFunction { public String eval(String s, int a, double d) { return s + a + d; } @Override public boolean isDeterministic() { return true; } @Override public TypeInformation getResultType(Class[] signature) { return Types.STRING; } @Override public TypeInformation[] getParameterTypes(Class[] signature) { return new TypeInformation[]{Types.STRING, Types.INT, Types.DOUBLE}; } } {code} I use it in SQL `select func(s, 1, 2.2) from source`, and Flink throws the following exception: {noformat} Exception in thread "main" org.apache.flink.table.api.ValidationException: SQL validation failed. Given parameters of function 'func' do not match any signature.
Actual: (java.lang.String, java.lang.Integer, java.math.BigDecimal) Expected: (java.lang.String, int, double) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:129) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:104) at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:127) at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:85) at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:464) at com.lorinda.template.TestUDF3.main(TestUDF3.java:40){noformat} The full code is in [^Flinktest.zip]; the class name is com.lorinda.template.TestUDF3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
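The "Actual" line in the stack trace explains the mismatch: the unquoted literal 2.2 is typed as SQL DECIMAL, which arrives on the Java side as java.math.BigDecimal, not as a primitive double, so the declared eval(String, int, double) does not match. One sketch of a workaround on the Java side is to accept BigDecimal and convert explicitly (the method name is the reporter's; whether this is the recommended Flink fix is an assumption here — casting the literal to DOUBLE in the SQL is another option):

```java
import java.math.BigDecimal;

/** Illustrates matching the types the planner actually passes. */
public class DecimalArgSketch {

    /** Accepts what arrives: boxed Integer and BigDecimal, not int/double. */
    public static String eval(String s, Integer a, BigDecimal d) {
        return s + a + d.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(eval("x", 1, new BigDecimal("2.2"))); // x12.2
    }
}
```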
[jira] [Created] (FLINK-16725) Support to parse various timestamp data format in JSON
Jark Wu created FLINK-16725: --- Summary: Support to parse various timestamp data format in JSON Key: FLINK-16725 URL: https://issues.apache.org/jira/browse/FLINK-16725 Project: Flink Issue Type: New Feature Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem Reporter: Jark Wu Many users have reported that timestamp values can't be parsed even if the data type is specified as TIMESTAMP, see [1][2][3]. Currently, the JSON format only accepts timestamp data in RFC-3339 form, i.e. "2019-07-09 02:02:00.040Z". For usability, we should support various data formats, including "2019-07-09 02:02:00.040". [1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/TIME-TIMESTAMP-parse-in-Flink-TABLE-SQL-API-td33061.html [2]: http://apache-flink.147419.n8.nabble.com/json-timestamp-json-flink-sql-td1914.html [3]: http://apache-flink.147419.n8.nabble.com/FLINK-SQL-td2074.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
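A sketch of the kind of lenient parsing the ticket asks for: a single DateTimeFormatter with optional sections accepts the value with or without fractional seconds and with or without the trailing 'Z'. Treating the 'Z' as a literal marker (as the ticket's example strings suggest) is an assumption of this sketch, not necessarily how Flink implements it:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** One formatter covering several timestamp shapes via optional sections. */
public class LenientTimestampSketch {

    // [.SSS] = optional millisecond fraction, ['Z'] = optional literal Z
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]['Z']");

    static LocalDateTime parse(String s) {
        return LocalDateTime.parse(s, FMT);
    }

    public static void main(String[] args) {
        System.out.println(parse("2019-07-09 02:02:00.040Z"));
        System.out.println(parse("2019-07-09 02:02:00.040"));
        System.out.println(parse("2019-07-09 02:02:00"));
    }
}
```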
[jira] [Created] (FLINK-16724) ListSerializer cannot serialize lists that contain null
Chongchen Chen created FLINK-16724: -- Summary: ListSerializer cannot serialize lists that contain null Key: FLINK-16724 URL: https://issues.apache.org/jira/browse/FLINK-16724 Project: Flink Issue Type: Bug Reporter: Chongchen Chen Attachments: list_serializer_err.diff MapSerializer handles null values correctly, but ListSerializer doesn't. The attachment is a modification of a unit test that reproduces the bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
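A sketch of the null-handling scheme the ticket implies: like the way MapSerializer treats values, write a null flag before each element so null entries round-trip. This models the encoding idea only; it is not the actual ListSerializer code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Null-safe size-prefixed list encoding: a null flag precedes each element. */
public class NullSafeListSketch {

    public static byte[] serialize(List<String> list) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(list.size());
        for (String element : list) {
            out.writeBoolean(element == null);  // null flag, as the map case does
            if (element != null) {
                out.writeUTF(element);
            }
        }
        return bytes.toByteArray();
    }

    public static List<String> deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        List<String> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(in.readBoolean() ? null : in.readUTF());
        }
        return list;
    }
}
```

Without the flag, writeUTF on a null element throws, which matches the failure mode the attached unit-test diff reproduces.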
[jira] [Created] (FLINK-16723) Move Python SDK example out of statefun-python-sdk
Igal Shilman created FLINK-16723: Summary: Move Python SDK example out of statefun-python-sdk Key: FLINK-16723 URL: https://issues.apache.org/jira/browse/FLINK-16723 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16722) Add Python SDK walkthrough
Igal Shilman created FLINK-16722: Summary: Add Python SDK walkthrough Key: FLINK-16722 URL: https://issues.apache.org/jira/browse/FLINK-16722 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman Assignee: Igal Shilman -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16721) getProcessCpuLoad seems to report a wrong cpu load
xiaogang zhou created FLINK-16721: - Summary: getProcessCpuLoad seems to report a wrong cpu load Key: FLINK-16721 URL: https://issues.apache.org/jira/browse/FLINK-16721 Project: Flink Issue Type: Improvement Components: Runtime / Metrics Affects Versions: 1.10.0 Reporter: xiaogang zhou -- This message was sent by Atlassian Jira (v8.3.4#803005)
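For reference when investigating the reported value: the underlying metric typically comes from com.sun.management.OperatingSystemMXBean#getProcessCpuLoad, which returns the recent CPU usage of the whole JVM process in [0.0, 1.0], or a negative value when unavailable. A minimal sketch reading it directly (note com.sun.management is a HotSpot-specific extension, and whether Flink's metric wraps exactly this call is an assumption here):

```java
import java.lang.management.ManagementFactory;

/** Reads the JVM process CPU load via the HotSpot-specific MXBean. */
public class CpuLoadSketch {

    public static double processCpuLoad() {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        return os.getProcessCpuLoad(); // [0.0, 1.0], or negative if unavailable
    }

    public static void main(String[] args) {
        System.out.println("process cpu load = " + processCpuLoad());
    }
}
```

Two classic sources of "wrong" readings with this API: the very first call often returns 0 or a negative value because there is no previous sample to diff against, and in containers the load is relative to all host CPUs unless the JVM is container-aware.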
[jira] [Created] (FLINK-16720) Maven gets stuck downloading artifacts on Azure
Robert Metzger created FLINK-16720: -- Summary: Maven gets stuck downloading artifacts on Azure Key: FLINK-16720 URL: https://issues.apache.org/jira/browse/FLINK-16720 Project: Flink Issue Type: Task Components: Build System / Azure Pipelines Affects Versions: 1.11.0 Reporter: Robert Metzger Logs: https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6509=logs=fc5181b0-e452-5c8f-68de-1097947f6483=27d1d645-cbce-54e2-51c4-d8b45fe24607 {code} 2020-03-23T08:43:28.4128014Z [INFO] 2020-03-23T08:43:28.4128557Z [INFO] Building flink-avro-confluent-registry 1.11-SNAPSHOT 2020-03-23T08:43:28.4129129Z [INFO] 2020-03-23T08:48:47.6591333Z == 2020-03-23T08:48:47.6594540Z Maven produced no output for 300 seconds. 2020-03-23T08:48:47.6595164Z == 2020-03-23T08:48:47.6605370Z == 2020-03-23T08:48:47.6605803Z The following Java processes are running (JPS) 2020-03-23T08:48:47.6606173Z == 2020-03-23T08:48:47.7710037Z 920 Jps 2020-03-23T08:48:47.7778561Z 238 Launcher 2020-03-23T08:48:47.9270289Z == 2020-03-23T08:48:47.9270832Z Printing stack trace of Java process 967 2020-03-23T08:48:47.9271199Z == 2020-03-23T08:48:48.0165945Z 967: No such process 2020-03-23T08:48:48.0218260Z == 2020-03-23T08:48:48.0218736Z Printing stack trace of Java process 238 2020-03-23T08:48:48.0219075Z == 2020-03-23T08:48:48.3404066Z 2020-03-23 08:48:48 2020-03-23T08:48:48.3404828Z Full thread dump OpenJDK 64-Bit Server VM (25.242-b08 mixed mode): 2020-03-23T08:48:48.3405064Z 2020-03-23T08:48:48.3405445Z "Attach Listener" #370 daemon prio=9 os_prio=0 tid=0x7fe130001000 nid=0x452 waiting on condition [0x] 2020-03-23T08:48:48.3405868Zjava.lang.Thread.State: RUNNABLE 2020-03-23T08:48:48.3411202Z 2020-03-23T08:48:48.3413171Z "resolver-5" #105 daemon prio=5 os_prio=0 tid=0x7fe1ec2ad800 nid=0x177 waiting on condition [0x7fe1872d9000] 2020-03-23T08:48:48.3414175Zjava.lang.Thread.State: WAITING (parking) 2020-03-23T08:48:48.3414560Zat sun.misc.Unsafe.park(Native Method) 2020-03-23T08:48:48.3415451Z- parking to wait 
for <0x0003d5a9f828> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 2020-03-23T08:48:48.3416180Zat java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 2020-03-23T08:48:48.3416825Zat java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) 2020-03-23T08:48:48.3417602Zat java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 2020-03-23T08:48:48.3418250Zat java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) 2020-03-23T08:48:48.3418930Zat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) 2020-03-23T08:48:48.3419900Zat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 2020-03-23T08:48:48.3420395Zat java.lang.Thread.run(Thread.java:748) 2020-03-23T08:48:48.3420648Z 2020-03-23T08:48:48.3421424Z "resolver-4" #104 daemon prio=5 os_prio=0 tid=0x7fe1ec2ad000 nid=0x176 waiting on condition [0x7fe1863dd000] 2020-03-23T08:48:48.3421914Zjava.lang.Thread.State: WAITING (parking) 2020-03-23T08:48:48.3422233Zat sun.misc.Unsafe.park(Native Method) 2020-03-23T08:48:48.3422919Z- parking to wait for <0x0003d5a9f828> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 2020-03-23T08:48:48.3423447Zat java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 2020-03-23T08:48:48.3424141Zat java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) 2020-03-23T08:48:48.3424734Zat java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 2020-03-23T08:48:48.3425339Zat java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) 2020-03-23T08:48:48.3426359Zat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) 2020-03-23T08:48:48.3426912Zat
[jira] [Created] (FLINK-16719) flink-runtime-web install failure
jackray wang created FLINK-16719: Summary: flink-runtime-web install failure Key: FLINK-16719 URL: https://issues.apache.org/jira/browse/FLINK-16719 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Affects Versions: 1.10.0 Reporter: jackray wang When I compile the module flink-runtime-web, I get 1 error and 1 warning, as follows: {code:java} [WARNING] npm WARN prepare removing existing node_modules/ before installation npm WARN prepare removing existing node_modules/ before installation npm 1 error,1 warining added 1302 packages in 24.161s npm 1 error Browserslist: caniuse-lite is outdated. Please run next command `npm update`{code} Following the reminder I ran 'npm update' several times, but it does not help in the next compilation. In the end the module reports 'build success', but the remaining errors concern me. What can I do to remove all the errors? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[Dev Blog] Migrating Flink's CI Infrastructure from Travis CI to Azure Pipelines
Hi all, I have just published the first post to the dev blog: https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines . I'm looking forward to your feedback and questions on the article :) Best, Robert
[jira] [Created] (FLINK-16718) KvStateServerHandlerTest leaks Netty ByteBufs
Gary Yao created FLINK-16718: Summary: KvStateServerHandlerTest leaks Netty ByteBufs Key: FLINK-16718 URL: https://issues.apache.org/jira/browse/FLINK-16718 Project: Flink Issue Type: Bug Components: Runtime / Queryable State, Tests Affects Versions: 1.10.0, 1.11.0 Reporter: Gary Yao Assignee: Gary Yao Fix For: 1.11.0 The {{KvStateServerHandlerTest}} leaks Netty {{ByteBuf}}s. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16717) Use headless service for RPC and blob ports when running Flink on K8s
vanderliang created FLINK-16717: --- Summary: Use headless service for RPC and blob ports when running Flink on K8s Key: FLINK-16717 URL: https://issues.apache.org/jira/browse/FLINK-16717 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.10.0 Environment: flink 1.10 Reporter: vanderliang Currently, when submitting a Flink job cluster on Kubernetes, [https://github.com/apache/flink/blob/release-1.10/flink-container/kubernetes/job-cluster-service.yaml] creates a NodePort-type service for all ports. First, the RPC and blob ports only need a headless service, which would avoid iptables or IPVS forwarding. Second, serverless Kubernetes offerings in public clouds (like AWS EKS) do not support NodePort, which causes problems when running Flink on serverless Kubernetes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[ANNOUNCE] Feature freeze for Apache Flink Stateful Functions 2.0.0
Hi devs, Following the consensus [1] to release Stateful Functions soon with new version as 2.0.0, I've proceeded to cut the feature branch `release-2.0` [2] for the project. I'll aim to create the first voting release candidate by the end of this week (Mar. 23). The release candidate will also come out with a proposed test plan, including things you can help with verifying for this release. Cheers, Gordon [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Update-on-Flink-Stateful-Functions-amp-what-are-the-next-steps-td38646.html [2] https://github.com/apache/flink-statefun/tree/release-2.0
Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource
Hi Jingsong, Regarding (2) and (3), I was thinking to ignore manually DDL work, so users can use them directly: # this will log results to `.out` files INSERT INTO console SELECT ... # this will drop all received records INSERT INTO blackhole SELECT ... Here `console` and `blackhole` are system sinks which is similar to system functions. Best, Jark On Mon, 23 Mar 2020 at 16:33, Benchao Li wrote: > Hi Jingsong, > > Thanks for bring this up. Generally, it's a very good proposal. > > About data gen source, do you think we need to add more columns with > various types? > > About print sink, do we need to specify the schema? > > Jingsong Li 于2020年3月23日周一 下午1:51写道: > > > Thanks Bowen, Jark and Dian for your feedback and suggestions. > > > > I reorganize with your suggestions, and try to expose DDLs: > > > > 1.datagen source: > > - easy startup/test for streaming job > > - performance testing > > > > DDL: > > CREATE TABLE user ( > > id BIGINT, > > age INT, > > description STRING > > ) WITH ( > > 'connector.type' = 'datagen', > > 'connector.rows-per-second'='100', > > 'connector.total-records'='100', > > > > 'schema.id.generator' = 'sequence', > > 'schema.id.generator.start' = '1', > > > > 'schema.age.generator' = 'random', > > 'schema.age.generator.min' = '0', > > 'schema.age.generator.max' = '100', > > > > 'schema.description.generator' = 'random', > > 'schema.description.generator.length' = '100' > > ) > > > > Default is random generator. > > Hi Jark, I don't want to bring complicated regularities, because it can > be > > done through computed columns. And it is hard to define > > standard regularities, I think we can leave it to the future. > > > > 2.print sink: > > - easy test for streaming job > > - be very useful in production debugging > > > > DDL: > > CREATE TABLE print_table ( > > ... 
> > ) WITH ( > > 'connector.type' = 'print' > > ) > > > > 3.blackhole sink > > - very useful for high performance testing of Flink > > - I've also run into users trying UDF to output, not sink, so they need > > this sink as well. > > > > DDL: > > CREATE TABLE blackhole_table ( > > ... > > ) WITH ( > > 'connector.type' = 'blackhole' > > ) > > > > What do you think? > > > > Best, > > Jingsong Lee > > > > On Mon, Mar 23, 2020 at 12:04 PM Dian Fu wrote: > > > > > Thanks Jingsong for bringing up this discussion. +1 to this proposal. I > > > think Bowen's proposal makes much sense to me. > > > > > > This is also a painful problem for PyFlink users. Currently there is no > > > built-in easy-to-use table source/sink and it requires users to write a > > lot > > > of code to trying out PyFlink. This is especially painful for new users > > who > > > are not familiar with PyFlink/Flink. I have also encountered the > tedious > > > process Bowen encountered, e.g. writing random source connector, print > > sink > > > and also blackhole print sink as there are no built-in ones to use. > > > > > > Regards, > > > Dian > > > > > > > 在 2020年3月22日,上午11:24,Jark Wu 写道: > > > > > > > > +1 to Bowen's proposal. I also saw many requirements on such built-in > > > > connectors. > > > > > > > > I will leave some my thoughts here: > > > > > > > >> 1. datagen source (random source) > > > > I think we can merge the functinality of sequence-source into random > > > source > > > > to allow users to custom their data values. > > > > Flink can generate random data according to the field types, users > > > > can customize their values to be more domain specific, e.g. > > > > 'field.user'='User_[1-9]{0,1}' > > > > This will be similar to kafka-datagen-connect[1]. > > > > > > > >> 2. console sink (print sink) > > > > This will be very useful in production debugging, to easily output an > > > > intermediate view or result view to a `.out` file. 
> > > > So that we can look into the data representation, or check dirty > data. > > > > This should be out-of-box without manually DDL registration. > > > > > > > >> 3. blackhole sink (no output sink) > > > > This is very useful for high performance testing of Flink, to > meansure > > > the > > > > throughput of the whole pipeline without sink. > > > > Presto also provides this as a built-in connector [2]. > > > > > > > > Best, > > > > Jark > > > > > > > > [1]: > > > > > > > > > > https://github.com/confluentinc/kafka-connect-datagen#define-a-new-schema-specification > > > > [2]: https://prestodb.io/docs/current/connector/blackhole.html > > > > > > > > > > > > On Sat, 21 Mar 2020 at 12:31, Bowen Li wrote: > > > > > > > >> +1. > > > >> > > > >> I would suggest to take a step even further and see what users > really > > > need > > > >> to test/try/play with table API and Flink SQL. Besides this one, > > here're > > > >> some more sources and sinks that I have developed or used previously > > to > > > >> facilitate building Flink table/SQL pipelines. > > > >> > > > >> > > > >> 1. random input data source > > > >> - should generate
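A non-Flink model of what the proposed datagen source would emit, following the example DDL in this thread (sequence generator for `id` starting at 1, random `age` in [0, 100], random `description` string). The generator class and its API are hypothetical illustrations, not the proposed connector:

```java
import java.util.Random;

/** Toy row generator mirroring the datagen DDL discussed above. */
public class DatagenSketch {

    private final Random random = new Random(42); // seeded for reproducibility
    private long nextId = 1;                      // 'sequence' generator, start = 1

    /** One generated row: (id, age in [0,100], description of length 10). */
    public Object[] nextRow() {
        int age = random.nextInt(101);            // 'random' generator, min 0, max 100
        StringBuilder description = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            description.append((char) ('a' + random.nextInt(26)));
        }
        return new Object[] {nextId++, age, description.toString()};
    }
}
```

A real connector would additionally throttle to the configured rows-per-second and stop after the configured total record count.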
[jira] [Created] (FLINK-16716) Update Roadmap after Flink 1.10 release
Fabian Hueske created FLINK-16716: - Summary: Update Roadmap after Flink 1.10 release Key: FLINK-16716 URL: https://issues.apache.org/jira/browse/FLINK-16716 Project: Flink Issue Type: Bug Components: Project Website Reporter: Fabian Hueske The roadmap on the Flink website needs to be updated to reflect the new features of Flink 1.10 and the planned features and improvements of future releases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource
Hi Jingsong,

Thanks for bringing this up. Generally, it's a very good proposal.

About the datagen source, do you think we need to add more columns with various types?

About the print sink, do we need to specify the schema?

Jingsong Li wrote on Mon, Mar 23, 2020 at 1:51 PM:

> Thanks Bowen, Jark and Dian for your feedback and suggestions.
>
> I reorganized the proposal based on your suggestions, trying to expose the DDLs:
>
> 1. datagen source:
> - easy startup/test for streaming jobs
> - performance testing
>
> DDL:
> CREATE TABLE user (
>     id BIGINT,
>     age INT,
>     description STRING
> ) WITH (
>     'connector.type' = 'datagen',
>     'connector.rows-per-second' = '100',
>     'connector.total-records' = '100',
>
>     'schema.id.generator' = 'sequence',
>     'schema.id.generator.start' = '1',
>
>     'schema.age.generator' = 'random',
>     'schema.age.generator.min' = '0',
>     'schema.age.generator.max' = '100',
>
>     'schema.description.generator' = 'random',
>     'schema.description.generator.length' = '100'
> )
>
> The default is the random generator.
> Hi Jark, I don't want to bring in complicated regularities, because that
> can be done through computed columns. And it is hard to define standard
> regularities, so I think we can leave that to the future.
>
> 2. print sink:
> - easy test for streaming jobs
> - very useful in production debugging
>
> DDL:
> CREATE TABLE print_table (
>     ...
> ) WITH (
>     'connector.type' = 'print'
> )
>
> 3. blackhole sink:
> - very useful for high-performance testing of Flink
> - I've also run into users trying to use a UDF to output, not a sink, so
> they need this sink as well.
>
> DDL:
> CREATE TABLE blackhole_table (
>     ...
> ) WITH (
>     'connector.type' = 'blackhole'
> )
>
> What do you think?
>
> Best,
> Jingsong Lee
>
> On Mon, Mar 23, 2020 at 12:04 PM Dian Fu wrote:
>
> > Thanks Jingsong for bringing up this discussion. +1 to this proposal.
> > Bowen's proposal makes much sense to me.
> >
> > This is also a painful problem for PyFlink users. Currently there is no
> > built-in easy-to-use table source/sink, and it requires users to write a
> > lot of code just to try out PyFlink. This is especially painful for new
> > users who are not familiar with PyFlink/Flink. I have also gone through
> > the tedious process Bowen encountered, e.g. writing a random source
> > connector, a print sink and also a blackhole sink, as there are no
> > built-in ones to use.
> >
> > Regards,
> > Dian
> >
> > On Mar 22, 2020, at 11:24 AM, Jark Wu wrote:
> > >
> > > +1 to Bowen's proposal. I have also seen many requirements for such
> > > built-in connectors.
> > >
> > > I will leave some of my thoughts here:
> > >
> > >> 1. datagen source (random source)
> > > I think we can merge the functionality of the sequence source into the
> > > random source to allow users to customize their data values.
> > > Flink can generate random data according to the field types, and users
> > > can customize their values to be more domain specific, e.g.
> > > 'field.user'='User_[1-9]{0,1}'
> > > This would be similar to kafka-connect-datagen [1].
> > >
> > >> 2. console sink (print sink)
> > > This will be very useful in production debugging, to easily output an
> > > intermediate view or result view to a `.out` file.
> > > That way we can look into the data representation, or check for dirty
> > > data. This should work out of the box, without manual DDL registration.
> > >
> > >> 3. blackhole sink (no output sink)
> > > This is very useful for high-performance testing of Flink, to measure
> > > the throughput of the whole pipeline without a sink.
> > > Presto also provides this as a built-in connector [2].
> > >
> > > Best,
> > > Jark
> > >
> > > [1]: https://github.com/confluentinc/kafka-connect-datagen#define-a-new-schema-specification
> > > [2]: https://prestodb.io/docs/current/connector/blackhole.html
> > >
> > > On Sat, 21 Mar 2020 at 12:31, Bowen Li wrote:
> > >
> > >> +1.
> > >>
> > >> I would suggest taking a step even further and seeing what users
> > >> really need to test/try/play with the Table API and Flink SQL. Besides
> > >> this one, here are some more sources and sinks that I have developed
> > >> or used previously to facilitate building Flink table/SQL pipelines.
> > >>
> > >> 1. random input data source
> > >>     - should generate random data at a specified rate according to a
> > >>       schema
> > >>     - purposes
> > >>         - test that a Flink pipeline works and data ends up in
> > >>           external storage correctly
> > >>         - stress test a Flink sink as well as tune the external
> > >>           storage
> > >> 2. print data sink
> > >>     - should print data in row format to the console
> > >>     - purposes
> > >>         - make it easier to test a Flink SQL job e2e in the IDE
> > >>         - test a Flink pipeline and ensure the output data
> > >>           format/value is correct
> > >> 3. no output data sink
> > >>     - just swallows output data without doing anything
> > >>     - purpose
> > >>         - evaluate and tune performance of
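The generator semantics proposed in the thread (a `sequence` generator with a start value, a `random` generator with min/max or a string length, driven by a per-field schema) can be sketched in plain Python. This is only an illustration of the proposed behavior, not Flink code; the generator names and options mirror the DDL in Jingsong's mail, and rate limiting (`rows-per-second`) is omitted for brevity.

```python
import random
import string

def make_generator(kind, **opts):
    """Build a per-field value generator; `kind` and option names follow
    the proposed DDL ('sequence' / 'random'), split here into typed
    variants for clarity."""
    if kind == "sequence":
        start = opts.get("start", 0)
        return lambda i: start + i            # deterministic, row-indexed
    elif kind == "random-int":
        lo, hi = opts.get("min", 0), opts.get("max", 100)
        return lambda i: random.randint(lo, hi)
    elif kind == "random-string":
        length = opts.get("length", 100)
        return lambda i: "".join(random.choices(string.ascii_letters, k=length))
    raise ValueError(f"unknown generator: {kind}")

def datagen(schema, total_records):
    """Yield `total_records` rows as dicts, like 'connector.total-records'."""
    for i in range(total_records):
        yield {name: gen(i) for name, gen in schema.items()}

# Schema matching the example table: id BIGINT, age INT, description STRING.
schema = {
    "id": make_generator("sequence", start=1),
    "age": make_generator("random-int", min=0, max=100),
    "description": make_generator("random-string", length=100),
}

rows = list(datagen(schema, total_records=100))
```

A print sink would then just be `for row in rows: print(row)`, and a blackhole sink would consume the iterator without doing anything, which is why it is useful for measuring pipeline throughput in isolation.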
[jira] [Created] (FLINK-16715) Always use the configuration argument in YarnClusterDescriptor#startAppMaster to make it more self-contained
Canbin Zheng created FLINK-16715: Summary: Always use the configuration argument in YarnClusterDescriptor#startAppMaster to make it more self-contained Key: FLINK-16715 URL: https://issues.apache.org/jira/browse/FLINK-16715 Project: Flink Issue Type: Improvement Components: Deployment / YARN Reporter: Canbin Zheng Fix For: 1.11.0 In {{YarnClusterDescriptor#startAppMaster()}} we sometimes use the method's configuration argument to get/set config options, and sometimes the flinkConfiguration class member. This ticket proposes to always use the configuration argument, to make the method more self-contained. -- This message was sent by Atlassian Jira (v8.3.4#803005)
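The refactoring principle behind this ticket, a method should read options only from its own argument, not from instance state, can be illustrated in a few lines. This is a language-agnostic toy in Python, not Flink's Java code; the class and option names are illustrative only.

```python
class ClusterDescriptor:
    """Toy illustration of the FLINK-16715 refactoring: start_app_master
    reads options exclusively from its `configuration` argument, never from
    the instance-level flink_configuration, so its behavior depends only on
    what the caller passes in."""

    def __init__(self, flink_configuration):
        self.flink_configuration = flink_configuration  # deliberately unused below

    def start_app_master(self, configuration):
        # Self-contained: every lookup goes through the argument.
        memory = configuration.get("jobmanager.memory", "1g")
        return f"app master started with {memory}"

desc = ClusterDescriptor({"jobmanager.memory": "4g"})
# The argument wins, even though the member holds a different value.
result = desc.start_app_master({"jobmanager.memory": "2g"})
```

Mixing the two sources, as the current code does, makes behavior depend on hidden state and is a common source of subtle config bugs.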
[jira] [Created] (FLINK-16714) Kafka committedOffsets value <0
pingle wang created FLINK-16714: --- Summary: Kafka committedOffsets value <0 Key: FLINK-16714 URL: https://issues.apache.org/jira/browse/FLINK-16714 Project: Flink Issue Type: Improvement Components: Runtime / Metrics Affects Versions: 1.6.3 Reporter: pingle wang Attachments: image-2020-03-23-15-36-16-487.png, image-2020-03-23-15-36-27-910.png Dear all: fetching Kafka committed offsets via the REST API sometimes produces erroneous values (< 0), so using these values to monitor Kafka's lag will produce false positives. Could this be a problem with Flink's metrics collection? Thanks! !image-2020-03-23-15-36-27-910.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
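Until the metric itself is fixed, a monitoring pipeline can guard against the symptom described here: a negative committed offset (Kafka reports -1 when no offset has been committed yet, and the bad samples in this report look similar) should be treated as "unknown" rather than fed into a lag calculation. A minimal sketch of such a guard, with a hypothetical function name:

```python
def consumer_lag(latest_offset, committed_offset):
    """Compute Kafka consumer lag for one partition, treating a negative
    committed offset (e.g. -1 before any commit, or a bad metric sample)
    as 'unknown' instead of producing a huge bogus lag that would fire
    false alerts."""
    if committed_offset < 0:
        return None  # no valid commit for this sample; skip alerting
    # Clamp at zero: a committed offset slightly ahead of the sampled
    # log-end offset can occur when the two values are read at different times.
    return max(latest_offset - committed_offset, 0)
```

Monitoring code would then alert only when `consumer_lag(...)` returns a number above the threshold, ignoring `None` samples.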
[jira] [Created] (FLINK-16713) Flink can not support the type of source table to query data from es
jackray wang created FLINK-16713: Summary: Flink can not support the type of source table to query data from es Key: FLINK-16713 URL: https://issues.apache.org/jira/browse/FLINK-16713 Project: Flink Issue Type: Improvement Components: Table SQL / Client Affects Versions: 1.10.0 Reporter: jackray wang [https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#elasticsearch-connector] For append-only queries, the connector can also operate in [append mode|https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#update-modes] for exchanging only INSERT messages with the external system. If no key is defined by the query, a key is automatically generated by Elasticsearch. I want to know why the Flink Elasticsearch connector only supports being used as a sink but not as a source. In which version could this feature be added? -- This message was sent by Atlassian Jira (v8.3.4#803005)