[jira] [Created] (FLINK-18535) Elasticsearch connector hangs for threads deadlocked
DeFOX created FLINK-18535:
------------------------------

             Summary: Elasticsearch connector hangs for threads deadlocked
                 Key: FLINK-18535
                 URL: https://issues.apache.org/jira/browse/FLINK-18535
             Project: Flink
          Issue Type: Bug
          Components: Connectors / ElasticSearch
    Affects Versions: 1.10.0
            Reporter: DeFOX

When sinking data to Elasticsearch (tested against ES 7.7.1 and 6.8.5) with a connector created by org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink.Builder, the Flink job eventually hangs because its threads deadlock. A connector created by org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink.Builder does not run into this deadlock. The deadlock may come from this Elasticsearch PR: [https://github.com/elastic/elasticsearch/pull/41451]. So upgrading the version of the Elasticsearch dependencies may fix this bug.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
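For reference, a minimal sketch of how such a sink is typically built with the ES7 builder. The host, index name, and String element type are illustrative assumptions, not taken from the report:

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.http.HttpHost;
import org.elasticsearch.client.Requests;

public class Es7SinkSketch {
    public static void attachSink(DataStream<String> stream) {
        List<HttpHost> hosts = Collections.singletonList(new HttpHost("localhost", 9200, "http"));

        ElasticsearchSink.Builder<String> builder = new ElasticsearchSink.Builder<>(
                hosts,
                (ElasticsearchSinkFunction<String>) (element, ctx, indexer) -> {
                    Map<String, String> json = new HashMap<>();
                    json.put("data", element);
                    // Queue an index request; the sink flushes it through the bulk processor.
                    indexer.add(Requests.indexRequest().index("my-index").source(json));
                });

        // Flush after every element, as in the documented example.
        builder.setBulkFlushMaxActions(1);

        stream.addSink(builder.build());
    }
}
{code}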
Re: [DISCUSS] Releasing Flink 1.11.1 soon?
Besides, it would be great if we could figure out the performance regression Thomas reported earlier. Do you know what the status is now? @zhijiang @Thomas

Best,
Jark

On Thu, 9 Jul 2020 at 11:10, Jark Wu wrote:

> Hi everyone,
>
> As discussed in the voting thread of 1.11.0-RC4 [1], we found a blocker
> issue in the CDC feature [2].
> Considering this is a new kind of connector, we don't want to block the
> ready-to-publish RC4 and prefer to have an immediate 1.11.1 release.
> Therefore, I would like to start the discussion about releasing 1.11.1
> soon, to deliver a complete CDC feature.
> We can also include some notable bug fixes found in recent days. But I
> suggest we not wait too long to collect/fix bugs;
> otherwise it will delay the feature delivery again. We can launch another
> patch release after that if needed.
>
> The most notable bug fixes so far are:
>
> - FLINK-18461 Changelog source can't be insert into upsert sink
> - FLINK-18426 Incompatible deprecated key type for registration cluster
>   options
>
> Furthermore, I think it would be better if we can fix these issues in
> 1.11.1:
>
> - FLINK-18434 Can not select fields with JdbcCatalog (PR opened)
> - FLINK-18520 New Table Function type inference fails (PR opened)
> - FLINK-18477 ChangelogSocketExample does not work
>
> I'd like to suggest creating the first RC next Monday. What do you think?
>
> If there are any concerns or missing blocker issues that need to be fixed in
> 1.11.1, please let me know. Thanks.
>
> Best,
> Jark
>
> [1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-11-0-release-candidate-4-tp42829p42858.html
>
> [2]: https://issues.apache.org/jira/browse/FLINK-18461
[DISCUSS] Releasing Flink 1.11.1 soon?
Hi everyone,

As discussed in the voting thread of 1.11.0-RC4 [1], we found a blocker issue in the CDC feature [2]. Considering this is a new kind of connector, we don't want to block the ready-to-publish RC4 and prefer to have an immediate 1.11.1 release. Therefore, I would like to start the discussion about releasing 1.11.1 soon, to deliver a complete CDC feature. We can also include some notable bug fixes found in recent days. But I suggest we not wait too long to collect/fix bugs; otherwise it will delay the feature delivery again. We can launch another patch release after that if needed.

The most notable bug fixes so far are:

- FLINK-18461 Changelog source can't be insert into upsert sink
- FLINK-18426 Incompatible deprecated key type for registration cluster options

Furthermore, I think it would be better if we can fix these issues in 1.11.1:

- FLINK-18434 Can not select fields with JdbcCatalog (PR opened)
- FLINK-18520 New Table Function type inference fails (PR opened)
- FLINK-18477 ChangelogSocketExample does not work

I'd like to suggest creating the first RC next Monday. What do you think?

If there are any concerns or missing blocker issues that need to be fixed in 1.11.1, please let me know. Thanks.

Best,
Jark

[1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-11-0-release-candidate-4-tp42829p42858.html
[2]: https://issues.apache.org/jira/browse/FLINK-18461
Re: [DISCUSS] FLIP-130: Support for Python DataStream API (Stateless Part)
Hi Shuiqiang,

Thanks a lot for driving this discussion. Big +1 for supporting Python DataStream. In many ML scenarios, operating on objects is more natural than operating on tables.

Best,
Xingbo

Wei Zhong wrote on Thu, Jul 9, 2020 at 10:35 AM:

> Hi Shuiqiang,
>
> Thanks for driving this. Big +1 for supporting DataStream API in PyFlink!
>
> Best,
> Wei
>
> > On Jul 9, 2020, at 10:29, Hequn Cheng wrote:
> >
> > +1 for adding the Python DataStream API and starting with the stateless
> > part.
> > There are already some users who have expressed their wish to have the Python
> > DataStream APIs. Once we have the APIs in PyFlink, we can cover more use
> > cases for our users.
> >
> > Best, Hequn
> >
> > On Wed, Jul 8, 2020 at 11:45 AM Shuiqiang Chen wrote:
> >
> >> Sorry, the 3rd link is broken, please refer to this one: Support Python
> >> DataStream API
> >> <https://docs.google.com/document/d/1H3hz8wuk22-8cDBhQmQKNw3m1q5gDAMkwTDEwnj3FBI/edit>
> >>
> >> Shuiqiang Chen wrote on Wed, Jul 8, 2020 at 11:13 AM:
> >>
> >>> Hi everyone,
> >>>
> >>> As we all know, Flink provides three layered APIs: the ProcessFunctions,
> >>> the DataStream API and the SQL & Table API. Each API offers a different
> >>> trade-off between conciseness and expressiveness and targets different
> >>> use cases [1].
> >>>
> >>> Currently, the SQL & Table API is already supported in PyFlink. The
> >>> API provides relational operations as well as user-defined functions,
> >>> offering convenience for users who are familiar with Python and relational
> >>> programming.
> >>>
> >>> Meanwhile, the DataStream API and ProcessFunctions provide more generic
> >>> APIs for implementing stream processing applications. The ProcessFunctions
> >>> expose time and state, which are the fundamental building blocks for any
> >>> kind of streaming application.
> >>> To cover more use cases, we are planning to support all of these APIs in
> >>> PyFlink.
> >>>
> >>> In this discussion (FLIP-130), we propose to support the Python DataStream
> >>> API for the stateless part. For more detail, please refer to the FLIP wiki
> >>> page here [2]. If you are interested in the stateful part, you can also take a
> >>> look at the design doc here [3], which we are going to discuss in a separate
> >>> FLIP.
> >>>
> >>> Any comments will be highly appreciated!
> >>>
> >>> [1] https://flink.apache.org/flink-applications.html#layered-apis
> >>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298
> >>> [3] https://docs.google.com/document/d/1H3hz8wuk228cDBhQmQKNw3m1q5gDAMkwTDEwnj3FBI/edit?usp=sharing
> >>>
> >>> Best,
> >>> Shuiqiang
Re: [DISCUSS] FLIP-130: Support for Python DataStream API (Stateless Part)
Hi Shuiqiang,

Thanks for driving this. Big +1 for supporting DataStream API in PyFlink!

Best,
Wei

> On Jul 9, 2020, at 10:29, Hequn Cheng wrote:
>
> +1 for adding the Python DataStream API and starting with the stateless
> part.
> There are already some users who have expressed their wish to have the Python
> DataStream APIs. Once we have the APIs in PyFlink, we can cover more use
> cases for our users.
>
> Best, Hequn
>
> On Wed, Jul 8, 2020 at 11:45 AM Shuiqiang Chen wrote:
>
>> Sorry, the 3rd link is broken, please refer to this one: Support Python
>> DataStream API
>> <https://docs.google.com/document/d/1H3hz8wuk22-8cDBhQmQKNw3m1q5gDAMkwTDEwnj3FBI/edit>
>>
>> Shuiqiang Chen wrote on Wed, Jul 8, 2020 at 11:13 AM:
>>
>>> Hi everyone,
>>>
>>> As we all know, Flink provides three layered APIs: the ProcessFunctions,
>>> the DataStream API and the SQL & Table API. Each API offers a different
>>> trade-off between conciseness and expressiveness and targets different
>>> use cases [1].
>>>
>>> Currently, the SQL & Table API is already supported in PyFlink. The
>>> API provides relational operations as well as user-defined functions,
>>> offering convenience for users who are familiar with Python and relational
>>> programming.
>>>
>>> Meanwhile, the DataStream API and ProcessFunctions provide more generic
>>> APIs for implementing stream processing applications. The ProcessFunctions
>>> expose time and state, which are the fundamental building blocks for any
>>> kind of streaming application.
>>> To cover more use cases, we are planning to support all of these APIs in
>>> PyFlink.
>>>
>>> In this discussion (FLIP-130), we propose to support the Python DataStream
>>> API for the stateless part. For more detail, please refer to the FLIP wiki
>>> page here [2]. If you are interested in the stateful part, you can also take a
>>> look at the design doc here [3], which we are going to discuss in a separate
>>> FLIP.
>>>
>>> Any comments will be highly appreciated!
>>>
>>> [1] https://flink.apache.org/flink-applications.html#layered-apis
>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298
>>> [3] https://docs.google.com/document/d/1H3hz8wuk228cDBhQmQKNw3m1q5gDAMkwTDEwnj3FBI/edit?usp=sharing
>>>
>>> Best,
>>> Shuiqiang
Re: [DISCUSS] FLIP-130: Support for Python DataStream API (Stateless Part)
+1 for adding the Python DataStream API and starting with the stateless part.
There are already some users who have expressed their wish to have the Python DataStream APIs. Once we have the APIs in PyFlink, we can cover more use cases for our users.

Best, Hequn

On Wed, Jul 8, 2020 at 11:45 AM Shuiqiang Chen wrote:

> Sorry, the 3rd link is broken, please refer to this one: Support Python
> DataStream API
> <https://docs.google.com/document/d/1H3hz8wuk22-8cDBhQmQKNw3m1q5gDAMkwTDEwnj3FBI/edit>
>
> Shuiqiang Chen wrote on Wed, Jul 8, 2020 at 11:13 AM:
>
> > Hi everyone,
> >
> > As we all know, Flink provides three layered APIs: the ProcessFunctions,
> > the DataStream API and the SQL & Table API. Each API offers a different
> > trade-off between conciseness and expressiveness and targets different
> > use cases [1].
> >
> > Currently, the SQL & Table API is already supported in PyFlink. The
> > API provides relational operations as well as user-defined functions,
> > offering convenience for users who are familiar with Python and relational
> > programming.
> >
> > Meanwhile, the DataStream API and ProcessFunctions provide more generic
> > APIs for implementing stream processing applications. The ProcessFunctions
> > expose time and state, which are the fundamental building blocks for any
> > kind of streaming application.
> > To cover more use cases, we are planning to support all of these APIs in
> > PyFlink.
> >
> > In this discussion (FLIP-130), we propose to support the Python DataStream
> > API for the stateless part. For more detail, please refer to the FLIP wiki
> > page here [2]. If you are interested in the stateful part, you can also take a
> > look at the design doc here [3], which we are going to discuss in a separate
> > FLIP.
> >
> > Any comments will be highly appreciated!
> >
> > [1] https://flink.apache.org/flink-applications.html#layered-apis
> > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298
> > [3] https://docs.google.com/document/d/1H3hz8wuk228cDBhQmQKNw3m1q5gDAMkwTDEwnj3FBI/edit?usp=sharing
> >
> > Best,
> > Shuiqiang
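As a point of reference for the scope, here is a minimal stateless pipeline in the existing Java DataStream API — the kind of program FLIP-130 proposes to make expressible in Python. The element values and transformations are illustrative only, not taken from the FLIP:

{code:java}
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StatelessPipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stateless transformations only: no timers, no keyed state.
        DataStream<String> words = env.fromElements("flip", "130", "pyflink");
        words.map(s -> s.toUpperCase()).returns(Types.STRING)
             .filter(s -> !s.isEmpty())
             .print();

        env.execute("stateless-sketch");
    }
}
{code}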
[jira] [Created] (FLINK-18534) KafkaTableITCase.testKafkaDebeziumChangelogSource failed with "Topic 'changelog_topic' already exists"
Dian Fu created FLINK-18534:
----------------------------

             Summary: KafkaTableITCase.testKafkaDebeziumChangelogSource failed with "Topic 'changelog_topic' already exists"
                 Key: FLINK-18534
                 URL: https://issues.apache.org/jira/browse/FLINK-18534
             Project: Flink
          Issue Type: Test
          Components: Connectors / Kafka, Table SQL / API, Tests
    Affects Versions: 1.12.0
            Reporter: Dian Fu

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4350&view=logs&j=4be4ed2b-549a-533d-aa33-09e28e360cc8&t=0db94045-2aa0-53fa-f444-0130d6933518

{code}
2020-07-08T21:14:04.1626423Z [ERROR] Failures:
2020-07-08T21:14:04.1629804Z [ERROR]   KafkaTableITCase.testKafkaDebeziumChangelogSource:66->KafkaTestBase.createTestTopic:197 Create test topic : changelog_topic failed, org.apache.kafka.common.errors.TopicExistsException: Topic 'changelog_topic' already exists.
2020-07-08T21:14:04.1630642Z [ERROR] Errors:
2020-07-08T21:14:04.1630986Z [ERROR]   KafkaTableITCase.testKafkaDebeziumChangelogSource:83 Failed to write debezium...
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
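The failure suggests the test topic survived an earlier run. A hedged sketch of the usual remedy — deleting a leftover topic (or picking a unique name) before creating it with Kafka's AdminClient. The bootstrap address and topic name are placeholders, and this is not the actual KafkaTestBase code:

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicCleanupSketch {
    public static void recreateTopic(String bootstrap, String topic) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);

        try (AdminClient admin = AdminClient.create(props)) {
            // Drop a leftover topic from a previous run, if any.
            // Note: deletion completes asynchronously on the broker,
            // so a flaky test may additionally need a retry loop here.
            if (admin.listTopics().names().get().contains(topic)) {
                admin.deleteTopics(Collections.singleton(topic)).all().get();
            }
            // Recreate it fresh: 1 partition, replication factor 1.
            admin.createTopics(Collections.singleton(new NewTopic(topic, 1, (short) 1))).all().get();
        }
    }
}
{code}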
Re: [ANNOUNCE] New PMC member: Piotr Nowojski
Congratulations, Piotr!

Best,
Xingcan

On Wed, Jul 8, 2020, 21:53 Yang Wang wrote:

> Congratulations Piotr!
>
> Best,
> Yang
>
> Dan Zou wrote on Wed, Jul 8, 2020 at 10:36 PM:
>
> > Congratulations!
> >
> > Best,
> > Dan Zou
> >
> > > On Jul 8, 2020, at 5:25 PM, godfrey he wrote:
> > >
> > > Congratulations
[jira] [Created] (FLINK-18533) AccumulatorLiveITCase.testStreaming hangs
Dian Fu created FLINK-18533:
----------------------------

             Summary: AccumulatorLiveITCase.testStreaming hangs
                 Key: FLINK-18533
                 URL: https://issues.apache.org/jira/browse/FLINK-18533
             Project: Flink
          Issue Type: Test
          Components: Tests
    Affects Versions: 1.12.0
            Reporter: Dian Fu

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4350&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=a99e99c7-21cd-5a1f-7274-585e62b72f56

{code}
Printing stack trace of Java process 40159
==========================================
Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
2020-07-08 21:46:15
Full thread dump OpenJDK 64-Bit Server VM (25.242-b08 mixed mode):

"Attach Listener" #86 daemon prio=9 os_prio=0 tid=0x7fef8c025800 nid=0x1b231 runnable [0x]
   java.lang.Thread.State: RUNNABLE

"flink-taskexecutor-io-thread-2" #85 daemon prio=5 os_prio=0 tid=0x7fef9c02 nid=0xb03a waiting on condition [0x7fefac1f3000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x87180a20> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"Flink-DispatcherRestEndpoint-thread-4" #84 daemon prio=5 os_prio=0 tid=0x7fef90431800 nid=0x9e49 waiting on condition [0x7fef7df4a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x87180cc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"Flink-DispatcherRestEndpoint-thread-3" #83 daemon prio=5 os_prio=0 tid=0x7fefa01e2800 nid=0x9dc9 waiting on condition [0x7fef7f1f4000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x87180cc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
{code}
Re: [ANNOUNCE] New PMC member: Piotr Nowojski
Congratulations Piotr!

Best,
Yang

Dan Zou wrote on Wed, Jul 8, 2020 at 10:36 PM:

> Congratulations!
>
> Best,
> Dan Zou
>
> > On Jul 8, 2020, at 5:25 PM, godfrey he wrote:
> >
> > Congratulations
[jira] [Created] (FLINK-18532) Remove Beta tag from MATCH_RECOGNIZE docs
Seth Wiesman created FLINK-18532:
---------------------------------

             Summary: Remove Beta tag from MATCH_RECOGNIZE docs
                 Key: FLINK-18532
                 URL: https://issues.apache.org/jira/browse/FLINK-18532
             Project: Flink
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.12.0
            Reporter: Seth Wiesman
            Assignee: Seth Wiesman

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18531) Minicluster option to reuse same port if 0 is applied
David Chen created FLINK-18531:
-------------------------------

             Summary: Minicluster option to reuse same port if 0 is applied
                 Key: FLINK-18531
                 URL: https://issues.apache.org/jira/browse/FLINK-18531
             Project: Flink
          Issue Type: Test
            Reporter: David Chen

If the MiniCluster port is set to 0, can we have an option to reuse the same port after the cluster has been closed and started again? The use case is simply that we have the port set to 0, but we'd like to restart the MiniCluster to get rid of completed/expired jobs.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
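This is not Flink API — just a minimal plain-Java sketch of the requested behavior, assuming the bound port can be read back after the first start: bind to port 0 once, record the ephemeral port the OS assigned, then rebind to that fixed port on restart:

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class PortReuseSketch {
    public static void main(String[] args) throws IOException {
        int recordedPort;

        // First start: port 0 asks the OS for any free ephemeral port.
        try (ServerSocket first = new ServerSocket(0)) {
            recordedPort = first.getLocalPort();
            System.out.println("first start bound to " + recordedPort);
        }

        // Restart: reuse the recorded port instead of asking for a new one.
        // (This can fail if another process grabbed the port in between.)
        try (ServerSocket second = new ServerSocket(recordedPort)) {
            System.out.println("restart rebound to " + second.getLocalPort());
        }
    }
}
{code}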
Re: [ANNOUNCE] New PMC member: Piotr Nowojski
Congratulations!

Best,
Dan Zou

> On Jul 8, 2020, at 5:25 PM, godfrey he wrote:
>
> Congratulations
[jira] [Created] (FLINK-18530) ParquetAvroWriters can not write data to hdfs
humengyu created FLINK-18530:
-----------------------------

             Summary: ParquetAvroWriters can not write data to hdfs
                 Key: FLINK-18530
                 URL: https://issues.apache.org/jira/browse/FLINK-18530
             Project: Flink
          Issue Type: Bug
          Components: Connectors / FileSystem
    Affects Versions: 1.11.0
            Reporter: humengyu

I read data from Kafka and write it to HDFS by StreamingFileSink:
# In version 1.11.0, ParquetAvroWriters does not work, but it works well in version 1.10.1.
# AvroWriters works well in 1.11.0.

{code:java}
public class TestParquetAvroSink {

  @Test
  public void testParquet() throws Exception {
    EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner()
        .inStreamingMode().build();
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);
    env.enableCheckpointing(2L);

    TableSchema tableSchema = TableSchema.builder().fields(
        new String[]{"id", "name", "sex"},
        new DataType[]{DataTypes.STRING(), DataTypes.STRING(), DataTypes.STRING()})
        .build();

    // build a kafka source
    DataStream<Row> rowDataStream = ...;

    Schema schema = SchemaBuilder
        .record("xxx")
        .namespace("")
        .fields()
        .optionalString("id")
        .optionalString("name")
        .optionalString("sex")
        .endRecord();

    OutputFileConfig config = OutputFileConfig
        .builder()
        .withPartPrefix("prefix")
        .withPartSuffix(".ext")
        .build();

    StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
            new Path("hdfs://host:port/xxx/xxx/xxx"),
            ParquetAvroWriters.forGenericRecord(schema))
        .withOutputFileConfig(config)
        .withBucketAssigner(new DateTimeBucketAssigner<>("'pdate='yyyy-MM-dd"))
        .build();

    SingleOutputStreamOperator<GenericRecord> recordDateStream = rowDataStream
        .map(new RecordMapFunction());

    recordDateStream.print();
    recordDateStream.addSink(sink);

    env.execute("test");
  }

  @Test
  public void testAvro() throws Exception {
    EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner()
        .inStreamingMode().build();
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);
    env.enableCheckpointing(2L);

    TableSchema tableSchema = TableSchema.builder().fields(
        new String[]{"id", "name", "sex"},
        new DataType[]{DataTypes.STRING(), DataTypes.STRING(), DataTypes.STRING()})
        .build();

    // build a kafka source
    DataStream<Row> rowDataStream = ...;

    Schema schema = SchemaBuilder
        .record("xxx")
        .namespace("")
        .fields()
        .optionalString("id")
        .optionalString("name")
        .optionalString("sex")
        .endRecord();

    OutputFileConfig config = OutputFileConfig
        .builder()
        .withPartPrefix("prefix")
        .withPartSuffix(".ext")
        .build();

    StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
            new Path("hdfs://host:port/xxx/xxx/xxx"),
            AvroWriters.forGenericRecord(schema))
        .withOutputFileConfig(config)
        .withBucketAssigner(new DateTimeBucketAssigner<>("'pdate='yyyy-MM-dd"))
        .build();

    SingleOutputStreamOperator<GenericRecord> recordDateStream = rowDataStream
        .map(new RecordMapFunction());

    recordDateStream.print();
    recordDateStream.addSink(sink);

    env.execute("test");
  }

  public static class RecordMapFunction implements MapFunction<Row, GenericRecord> {

    private transient Schema schema;

    @Override
    public GenericRecord map(Row row) throws Exception {
      if (schema == null) {
        schema = SchemaBuilder
            .record("xxx")
            .namespace("xxx")
            .fields()
            .optionalString("id")
            .optionalString("name")
            .optionalString("sex")
            .endRecord();
      }
      Record record = new Record(schema);
      record.put("id", row.getField(0));
      record.put("name", row.getField(1));
      record.put("sex", row.getField(2));
      return record;
    }
  }
}
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18529) Query Hive table and filter by timestamp partition doesn't work
Rui Li created FLINK-18529:
---------------------------

             Summary: Query Hive table and filter by timestamp partition doesn't work
                 Key: FLINK-18529
                 URL: https://issues.apache.org/jira/browse/FLINK-18529
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Hive
            Reporter: Rui Li

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18528) Update UNNEST to new type system
Timo Walther created FLINK-18528:
---------------------------------

             Summary: Update UNNEST to new type system
                 Key: FLINK-18528
                 URL: https://issues.apache.org/jira/browse/FLINK-18528
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / Planner
            Reporter: Timo Walther
            Assignee: Timo Walther

Updates the built-in UNNEST function to the new type system.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
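For context, a sketch of what the built-in UNNEST function does in Flink SQL: it expands an array column into one row per element. The table name `orders` and columns `id`/`tags` are made up for illustration:

{code:java}
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class UnnestSketch {
    // Assumes a table `orders` with an ARRAY<STRING> column `tags` is registered.
    public static Table unnestTags(TableEnvironment tableEnv) {
        return tableEnv.sqlQuery(
            "SELECT o.id, t.tag "
          + "FROM orders AS o "
          + "CROSS JOIN UNNEST(o.tags) AS t (tag)");
    }
}
{code}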
[jira] [Created] (FLINK-18527) Task Managers fail to start on HDP 2.6.5 due to commons-cli conflict
Truong Duc Kien created FLINK-18527:
------------------------------------

             Summary: Task Managers fail to start on HDP 2.6.5 due to commons-cli conflict
                 Key: FLINK-18527
                 URL: https://issues.apache.org/jira/browse/FLINK-18527
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.11.0
            Reporter: Truong Duc Kien

When launching a new job on HDP 2.6.5, we encounter these exceptions:

{code:java}
2020-07-08 16:10:36 E [default-dispatcher-4] [ o.a.f.y.YarnResourceManager] Could not start TaskManager in container container_xxx
java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
	at org.apache.flink.runtime.entrypoint.parser.CommandLineOptions.<clinit>(CommandLineOptions.java:28) ~[flink-dist_2.12-1.11.0.jar:1.11.0]

2020-07-08 16:12:46 E [default-dispatcher-4] [ o.a.f.y.YarnResourceManager] Could not start TaskManager in container container_xxx {} []
java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.entrypoint.parser.CommandLineOptions
{code}

We figure this is because HDP 2.6.5 puts commons-cli version 1.2 on the classpath, while Flink expects version 1.3. Maybe commons-cli should also be shaded to avoid such conflicts.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
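The failing call is easy to see in isolation: Option.builder(String) only exists since commons-cli 1.3, so the snippet below — a hedged minimal repro, not Flink code — throws the same NoSuchMethodError when commons-cli 1.2 wins on the classpath. The option name is a made-up example:

{code:java}
import org.apache.commons.cli.Option;

public class CommonsCliProbe {
    public static void main(String[] args) {
        // Compiles against commons-cli 1.3+; at runtime against 1.2 this line
        // throws NoSuchMethodError because Option.builder(String) does not exist there.
        Option opt = Option.builder("D")
                .longOpt("dynamic-property") // hypothetical option name, for illustration
                .hasArg()
                .build();
        System.out.println("commons-cli >= 1.3 is on the classpath: " + opt.getLongOpt());
    }
}
{code}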
Re: [ANNOUNCE] New PMC member: Piotr Nowojski
Congratulations Piotr!

Best,
Godfrey

Yun Tang wrote on Wed, Jul 8, 2020 at 5:22 PM:

> Congratulations, Piotr!
>
> Best
> Yun Tang
>
> From: Danny Chan
> Sent: Wednesday, July 8, 2020 15:46
> To: dev@flink.apache.org
> Subject: Re: [ANNOUNCE] New PMC member: Piotr Nowojski
>
> Congratulations and nice work Piotr ~
>
> Best,
> Danny Chan
> On Jul 7, 2020, 10:00 PM +0800, dev@flink.apache.org wrote:
> >
> > Congratulations!
Re: [ANNOUNCE] New PMC member: Piotr Nowojski
Congratulations, Piotr!

Best
Yun Tang

From: Danny Chan
Sent: Wednesday, July 8, 2020 15:46
To: dev@flink.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Piotr Nowojski

Congratulations and nice work Piotr ~

Best,
Danny Chan

On Jul 7, 2020, 10:00 PM +0800, dev@flink.apache.org wrote:
>
> Congratulations!
[jira] [Created] (FLINK-18526) Add the configuration of Python UDF using Managed Memory in the doc of Pyflink
Huang Xingbo created FLINK-18526:
---------------------------------

             Summary: Add the configuration of Python UDF using Managed Memory in the doc of Pyflink
                 Key: FLINK-18526
                 URL: https://issues.apache.org/jira/browse/FLINK-18526
             Project: Flink
          Issue Type: Improvement
          Components: API / Python, Documentation
            Reporter: Huang Xingbo
             Fix For: 1.12.0, 1.11.1

Document how to configure Python UDFs to use managed memory in the PyFlink docs.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-18525) Running yarn-session.sh occurs error
zhangyunyun created FLINK-18525:
--------------------------------

             Summary: Running yarn-session.sh occurs error
                 Key: FLINK-18525
                 URL: https://issues.apache.org/jira/browse/FLINK-18525
             Project: Flink
          Issue Type: Bug
          Components: Deployment / YARN
    Affects Versions: 1.11.0
         Environment: hadoop-2.8.5
            Reporter: zhangyunyun

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
	at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:382)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:514)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:751)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:751)
Caused by: java.io.FileNotFoundException: File does not exist: /tmp/application_1594196612035_0008-flink-conf.yaml3951184480005887817.tmp
	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1444)
	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1437)
	at org.apache.flink.yarn.YarnApplicationFileUploader.registerSingleLocalResource(YarnApplicationFileUploader.java:163)
	at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:839)
	at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:524)
	at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:375)
	... 7 more

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Why ApplicationMode for k8s doesn't support using remote jar?
Hi,

When I use Application Mode on Kubernetes to submit a new Flink application, I cannot use a remote jar the way YARN does, and I have to pack the jar into the image beforehand. So I want to know why Application Mode for Kubernetes doesn't support reading a remote jar.

--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
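For reference, a Flink 1.11 native Kubernetes application-mode submission looks roughly like the sketch below; the job jar must use the local:// scheme, i.e. it has to be present inside the image. The cluster id, image name, and jar path are placeholders:

{code}
./bin/flink run-application \
    -t kubernetes-application \
    -Dkubernetes.cluster-id=my-application-cluster \
    -Dkubernetes.container.image=my-custom-flink-image \
    local:///opt/flink/usrlib/my-job.jar
{code}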
Re: [ANNOUNCE] Apache Flink 1.11.0 released
Congratulations! Thanks Zhijiang and Piotr for the great work, and thanks to everyone for their contributions!

Best,
Godfrey

Benchao Li wrote on Wed, Jul 8, 2020 at 12:39 PM:

> Congratulations! Thanks Zhijiang & Piotr for the great work as release
> managers.
>
> Rui Li wrote on Wed, Jul 8, 2020 at 11:38 AM:
>
>> Congratulations! Thanks Zhijiang & Piotr for the hard work.
>>
>> On Tue, Jul 7, 2020 at 10:06 PM Zhijiang wrote:
>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.11.0, which is the latest major release.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data
>>> streaming applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements in this new major release:
>>> https://flink.apache.org/news/2020/07/06/release-1.11.0.html
>>>
>>> The full release notes are available in Jira:
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Cheers,
>>> Piotr & Zhijiang
[jira] [Created] (FLINK-18524) Scala varargs cause exception for new inference
Timo Walther created FLINK-18524:
---------------------------------

             Summary: Scala varargs cause exception for new inference
                 Key: FLINK-18524
                 URL: https://issues.apache.org/jira/browse/FLINK-18524
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API
            Reporter: Timo Walther
            Assignee: Timo Walther

Scala varargs are supported but currently cause an exception, because the class file contains two signatures (a valid and an invalid one).

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [ANNOUNCE] New PMC member: Piotr Nowojski
Congratulations and nice work Piotr ~

Best,
Danny Chan

On Jul 7, 2020, 10:00 PM +0800, dev@flink.apache.org wrote:
>
> Congratulations!
Re: [ANNOUNCE] Apache Flink 1.11.0 released
Thanks Zhijiang and Piotr for the great work as release managers, and thanks to everyone who made the release possible!

Best,
Congxian

Benchao Li wrote on Wed, Jul 8, 2020 at 12:39 PM:

> Congratulations! Thanks Zhijiang & Piotr for the great work as release
> managers.
>
> Rui Li wrote on Wed, Jul 8, 2020 at 11:38 AM:
>
>> Congratulations! Thanks Zhijiang & Piotr for the hard work.
>>
>> On Tue, Jul 7, 2020 at 10:06 PM Zhijiang wrote:
>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.11.0, which is the latest major release.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data
>>> streaming applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements in this new major release:
>>> https://flink.apache.org/news/2020/07/06/release-1.11.0.html
>>>
>>> The full release notes are available in Jira:
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Cheers,
>>> Piotr & Zhijiang
[jira] [Created] (FLINK-18523) The watermark could be updated without data for some time
chen yong created FLINK-18523:
------------------------------

             Summary: The watermark could be updated without data for some time
                 Key: FLINK-18523
                 URL: https://issues.apache.org/jira/browse/FLINK-18523
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.11.0
            Reporter: chen yong

In window calculations with event time, if the source has no data for some reason, the watermark cannot advance, and the last windows never trigger their calculations. The existing parameter table.exec.source.idle-timeout only solves the alignment problem of ignoring idle parallel instances when emitting the watermark; when none of the parallel instances has a watermark, the watermark still cannot advance.

Is it possible to add a lock-timeout parameter (which should be larger than maxOutOfOrderness, with a default of "-1 ms")? If the watermark has not been updated within this time (i.e., there is no data), the current time would be taken and sent downstream as the watermark.

Thanks!

-- This message was sent by Atlassian Jira (v8.3.4#803005)
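For comparison, the idleness mechanism that already exists in Flink 1.11 is attached to the watermark strategy; a minimal sketch follows (the event type and durations are illustrative). It marks an idle source as such so it stops holding back watermark alignment, but, as the reporter notes, it does not advance the watermark when all inputs are silent:

{code:java}
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class IdlenessSketch {
    // A hypothetical event carrying its own millisecond timestamp.
    public static class Event {
        public long timestampMillis;
    }

    public static WatermarkStrategy<Event> strategy() {
        return WatermarkStrategy
                .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, recordTs) -> event.timestampMillis)
                // After 1 minute without records, the source is flagged idle so the
                // downstream watermark can advance based on the remaining active inputs.
                .withIdleness(Duration.ofMinutes(1));
    }
}
{code}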