Re: [DISCUSS][Release 1.12] Stale blockers and build instabilities

2020-08-10 Thread Robert Metzger
Hi team,

Two weeks have passed since the last update. None of the test instabilities
I've mentioned have been addressed since then.

Here's an updated status report of Blockers and Test instabilities:

Blockers:
Currently 3 blockers (2x Hive, 1x CI Infra)

Test instabilities (79 total) which failed recently or frequently:


- FLINK-18807 
FlinkKafkaProducerITCase.testScaleUpAfterScalingDown
failed with "Timeout expired after 6milliseconds while awaiting
EndTxn(COMMIT)"

- FLINK-18634 
FlinkKafkaProducerITCase.testRecoverCommittedTransaction
failed with "Timeout expired after 6milliseconds while awaiting
InitProducerId"

- FLINK-16908 
FlinkKafkaProducerITCase
testScaleUpAfterScalingDown Timeout expired while initializing
transactional state in 6ms.

- FLINK-13733 
FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis

--> The first three tickets seem related.


- FLINK-17260 
StreamingKafkaITCase failure on Azure

--> This one seems really hard to reproduce


- FLINK-16768 
HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart
hangs

- FLINK-18374 
HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
produced no output for 900 seconds

--> Nobody seems to feel responsible for these tickets. My guess is that
the S3 connector should have shorter timeouts / faster retries to finish
within the 15-minute test timeout, OR there is really something wrong with
the code.
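
For anyone picking these up, one untested idea: Flink's S3 filesystems
forward "s3."-prefixed keys to the underlying Hadoop S3A configuration, so
shorter timeouts/retries could be tried in flink-conf.yaml, e.g.

  s3.connection.timeout: 10000   (ms; fail connection attempts faster)
  s3.attempts.maximum: 5         (fewer client-side retries per request)

(Both values are guesses and would need to be verified against the S3A docs.)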


- FLINK-18333 UnsignedTypeConversionITCase failed caused by MariaDB4j
"Asked to waitFor Program"

- FLINK-17159 ES6 ElasticsearchSinkITCase unstable

- FLINK-17949 
KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
expected:<310> but was:<0>

- FLINK-18222  "Avro
Confluent Schema Registry nightly end-to-end test" unstable with "Kafka
cluster did not start after 120 seconds"

- FLINK-17511  "RocksDB
Memory Management end-to-end test" fails with "Current block cache usage
202123272 larger than expected memory limit 2"




On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger  wrote:

> Hi team,
>
> We would like to use this thread as a permanent thread for
> regularly syncing on stale blockers (need to have somebody assigned within
> a week and progress, or a good plan) and build instabilities (need to check
> if it's a blocker).
>
> Recent test-instabilities:
>
>- https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
>- https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
>- https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
>- https://issues.apache.org/jira/browse/FLINK-17949
>(KafkaShuffleITCase)
>- https://issues.apache.org/jira/browse/FLINK-18634 (Kafka
>transactions)
>
>
> It would be nice if the committers taking care of these components could
> look into the test failures.
> If nothing happens, we'll personally reach out to the people we believe
> could look into the ticket.
>
> Best,
> Dian & Robert
>


[jira] [Created] (FLINK-18889) New Async Table Function type inference fails

2020-08-10 Thread Mulan (Jira)
Mulan created FLINK-18889:
-

 Summary: New Async Table Function type inference fails
 Key: FLINK-18889
 URL: https://issues.apache.org/jira/browse/FLINK-18889
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.11.1
Reporter: Mulan


{code:java}
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.functions.AsyncTableFunction;
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.types.Row;

import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisFuture;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.async.RedisHashAsyncCommands;
import io.lettuce.core.api.async.RedisKeyAsyncCommands;

@FunctionHint(
    input = @DataTypeHint("STRING"),
    output = @DataTypeHint("ROW<ct STRING>")
)
public class RedisAsyncTableFunction extends AsyncTableFunction<Row> {

    private RedisClient redisClient;
    private StatefulRedisConnection<String, String> connection;
    private RedisKeyAsyncCommands<String, String> async;
    private static final String PREFIX = "redis://";
    private static final String DEFAULT_DB = "0";
    private static final String DEFAULT_URL = "localhost:6379";
    private static final String DEFAULT_PASSWORD = "";

    @Override
    public void open(FunctionContext context) throws Exception {
        final String url = DEFAULT_URL;
        final String password = DEFAULT_PASSWORD;
        final String database = DEFAULT_DB;
        StringBuilder redisUri = new StringBuilder();
        redisUri.append(PREFIX).append(password).append(url).append("/").append(database);

        redisClient = RedisClient.create(redisUri.toString());
        connection = redisClient.connect();
        async = connection.async();
    }

    public void eval(CompletableFuture<Collection<Row>> outputFuture, String key) {
        RedisFuture<Map<String, String>> redisFuture =
                ((RedisHashAsyncCommands<String, String>) async).hgetall(key);
        redisFuture.thenAccept(new Consumer<Map<String, String>>() {
            @Override
            public void accept(Map<String, String> values) {
                int len = 1;
                Row row = new Row(len);
                row.setField(0, values.get("ct"));
                outputFuture.complete(Collections.singletonList(row));
            }
        });
    }

    @Override
    public void close() throws Exception {
        if (connection != null) {
            connection.close();
        }
        if (redisClient != null) {
            redisClient.shutdown();
        }
    }
}
{code}
{code:java}
tEnv.createTemporarySystemFunction("lookup_redis", RedisAsyncTableFunction.class);
{code}
{code:java}
Exception in thread "main" org.apache.flink.table.api.ValidationException: SQL validation failed. From line 3, column 31 to line 3, column 48: No match found for function signature lookup_redis()
at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:185)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlInsert(SqlToOperationConverter.java:525)
at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:202)
at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:684)
at hua.mulan.slink.SqlSubmit.callInsertInto(SqlSubmit.java:100)
at hua.mulan.slink.SqlSubmit.callCommand(SqlSubmit.java:75)
at hua.mulan.slink.SqlSubmit.run(SqlSubmit.java:57)
at hua.mulan.slink.SqlSubmit.main(SqlSubmit.java:38)
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 3, column 31 to line 3, column 48: No match found for function signature lookup_redis()
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:457)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:839)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:824)
at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:5089)
at org.apache.calcite.sql.validate.SqlValidatorImpl.handleUnresolvedFunction(SqlValidatorImpl.java:1882)
at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:305)
at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:218)
at org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5858)
at org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5845)
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139)
{code}

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-10 Thread Leonard Xu
Hi, all

I’m +1 to support HBase 2.x and also keep HBase 1.x in the Flink project.

IIRC, according to the investigation by HBase PMC member Yu Li and OpenInx in 
the early discussion of this thread, HBase 1.x is still widely used in 
production. Moving the HBase 1.x connector to Bahir means it would lose timely 
support from the Flink community, in my opinion, so I’m slightly -1 on this.

Best
Leonard Xu


> On Aug 11, 2020, at 02:36, Márton Balassi wrote:
> 
> Hi All,
> 
> I am also fairly torn on this one, however unless we are vigilant in keeping 
> the flink repository relatively lean the number of modules will just keep 
> increasing and pose an increasingly greater maintainability challenge.
> Less frequently used connectors are strong candidates to be maintained in 
> bahir-flink and/or via flink-packages.org (I do 
> not support creating a third option in apache/flink-connectors). If the 
> testing infrastructure of bahir-flink is a concern, then we should invest 
> into improving that, so that it can serve as a reasonable alternative. 
> 
> I prefer the option of HBase 2.x in Flink and 1.x in Bahir, with a community 
> commitment of improving the Bahir testing infra. If taking this step 
> immediately is deemed too risky I can accept having the two versions 
> side-by-side in Flink for the time being, but without refactoring them to use 
> a common base module (like flink-kafka-connector-base) as we expect to move 
> 1.x to Bahir when the infra is satisfactory.
> 
> My position is not against HBase by any means, it is for a more maintainable 
> Flink repository. I have assigned [1] to Miklos, he aims at opening a PR in 
> the coming days - which we might modify based on the outcome of this 
> discussion.
> 
> [1] https://issues.apache.org/jira/browse/FLINK-18795 
> 
> On Mon, Aug 10, 2020 at 4:16 PM Robert Metzger wrote:
> @Jark: Thanks for bringing up these concerns.
> All the problems you've mentioned are "solvable": 
> - uber jar: Bahir could provide a hbase1 uber jar (we could theoretically 
> also add a dependency from flink to bahir and provide the uber jar from Flink)
> - e2e tests: we know that the connector is stable, as long as we are not 
> adding major changes (or we are moving the respective e2e tests to bahir).
> 
> On the other hand, I agree with you that supporting multiple versions of a 
> connector is pretty common (see Kafka or elasticsearch), so why can't we 
> allow it for Hbase now?
> 
> I'm really torn on this and would like to hear more opinions on this.
> 
> 
> On Fri, Aug 7, 2020 at 11:24 PM Felipe Lolas wrote:
> Hi all!
> 
> I'm new here; I have been using the Flink connector for HBase 1.2, but 
> recently opted to upgrade to HBase 2.1 (basically because it was bundled in CDH6).
> 
> It would be nice to add support for HBase 2.x! 
> I found that supporting HBase 1.4.3 and 2.1 needs minimal changes, and keeping 
> that in mind, last week I sent a PR with a solution supporting HBase 1.4.3/2.1.0 
> (maybe not the best; I'm sorry if I broke some rules by sending the PR).
> 
> I would be happy to help if needed!
> 
> 
> 
> Felipe.
> 
>> On 07-08-2020 at 10:53, Jark Wu wrote:
>> 
>> 
>> I'm +1 to add HBase 2.x
>> 
>> However, I have some concerns about moving HBase 1.x to Bahir:
>> 1) As discussed above, there are still lots of people using HBase 1.x.
>> 2) Bahir doesn't have the infrastructure to run the existing HBase E2E tests.
>> 3) We also paid lots of effort to provide an uber connector jar for HBase 
>> (not yet released), it is helpful to improve the out-of-box experience. 
>> 
>> My thought is that adding HBase 2.x doesn't have to remove HBase 1.x. It 
>> doesn't add too much work to maintain a new version. 
>> Keeping the old version can also help us to develop the new one. I would 
>> suggest to keep HBase 1.x in the repository for at least one more release. 
>> Another idea is that maybe it's a good time to have a 
>> "apache/flink-connectors" repository, and move both HBase 1.x and 2.x to it. 
>> It would also be a good place to accept the contribution of pulsar connector 
>> and other connectors. 
>> 
>> Best,
>> Jark
>> 
>> 
>> On Fri, 7 Aug 2020 at 17:54, Robert Metzger wrote:
>> Hi,
>> 
>> Thank you for picking this up so quickly. I have no objections regarding
>> all the proposed items.
>> @Gyula: Once the bahir contribution is properly reviewed, ping me if you
>> need somebody to merge it.
>> 
>> 
>> On Fri, Aug 7, 2020 at 10:43 AM Márton Balassi wrote:
>> 
>> > Hi Robert and Gyula,
>> >
>> > Thanks for reviving this thread. We have the implementation (currently for
>> > 2.2.3) and it is straightforward to contribute it back. Miklos (ccd) has
>> > recently written a readme for said version, he would be interested in
>> > contr

[jira] [Created] (FLINK-18888) Add ElasticSearch connector for Python DataStream API

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18888:
--

 Summary: Add ElasticSearch connector for Python DataStream API
 Key: FLINK-18888
 URL: https://issues.apache.org/jira/browse/FLINK-18888
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18887) Add ElasticSearch connector for Python DataStream API

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18887:
--

 Summary: Add ElasticSearch connector for Python DataStream API
 Key: FLINK-18887
 URL: https://issues.apache.org/jira/browse/FLINK-18887
 Project: Flink
  Issue Type: Sub-task
Reporter: Shuiqiang Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18886) Support Kafka connectors for Python DataStream API

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18886:
--

 Summary: Support Kafka connectors for Python DataStream API
 Key: FLINK-18886
 URL: https://issues.apache.org/jira/browse/FLINK-18886
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18885) Add partitioning interfaces for Python DataStream.

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18885:
--

 Summary: Add partitioning interfaces for Python DataStream.
 Key: FLINK-18885
 URL: https://issues.apache.org/jira/browse/FLINK-18885
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18884) Add chaining strategy and slot sharing group interfaces for DataStream

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18884:
--

 Summary: Add chaining strategy and slot sharing group interfaces 
for DataStream
 Key: FLINK-18884
 URL: https://issues.apache.org/jira/browse/FLINK-18884
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18883) Support reduce() operation for Python KeyedStream.

2020-08-10 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-18883:
---

 Summary: Support reduce() operation for Python KeyedStream.
 Key: FLINK-18883
 URL: https://issues.apache.org/jira/browse/FLINK-18883
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Hequn Cheng
Assignee: Hequn Cheng
 Fix For: 1.12.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18882) Investigate supporting Hive interval type

2020-08-10 Thread Rui Li (Jira)
Rui Li created FLINK-18882:
--

 Summary: Investigate supporting Hive interval type
 Key: FLINK-18882
 URL: https://issues.apache.org/jira/browse/FLINK-18882
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: Rui Li


Some UDFs return interval results, and such UDFs cannot be called via Flink 
because the interval type is not supported at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18881) Modify the Access Broken Link

2020-08-10 Thread weizheng (Jira)
weizheng created FLINK-18881:


 Summary: Modify the Access Broken Link
 Key: FLINK-18881
 URL: https://issues.apache.org/jira/browse/FLINK-18881
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.11.1, 1.11.0
 Environment: On the page 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/learn-flink/etl.html, 
the link to the Rides and Fares Exercise 
(https://github.com/apache/flink-training/tree/release-1.11/rides-and-fares) 
is not accessible.
Reporter: weizheng
 Fix For: 1.11.1, 1.11.0


https://github.com/apache/flink-training/tree/release-1.11/rides-and-fares



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18880) Allow to configure the heap memory used by the gateway server

2020-08-10 Thread Dian Fu (Jira)
Dian Fu created FLINK-18880:
---

 Summary: Allow to configure the heap memory used by the gateway 
server
 Key: FLINK-18880
 URL: https://issues.apache.org/jira/browse/FLINK-18880
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.12.0, 1.11.2


Currently, the heap memory used by the gateway server cannot be configured. It 
will OOM in scenarios such as Table.to_pandas when the content of the Table is 
large. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-10 Thread jincheng sun
Thank you for your positive feedback, Seth!
Would you please vote in the voting mail thread? Thank you!

Best,
Jincheng


Seth Wiesman wrote on Mon, Aug 10, 2020, at 10:34 PM:

> I think this sounds good. +1
>
> On Wed, Aug 5, 2020 at 8:37 PM jincheng sun 
> wrote:
>
>> Hi David, Thank you for sharing the problems with the current documents,
>> and I agree with you, as I have also received the same feedback from
>> Chinese users. I am often contacted by users asking questions such as
>> whether PyFlink supports "Java UDF" and whether PyFlink supports
>> "xxxConnector". The root cause of these problems is that our existing
>> documents are written for Java users (text and API mixed together). Since
>> Python was only added in 1.9, much of the documentation is not friendly to
>> Python users. They don't want to look for Python content in unfamiliar
>> Java documents. Just yesterday, there were complaints from Chinese users
>> about where to find all the documentation entries for the Python API. So a
>> centralized entry point and a clear document structure are an urgent demand
>> of Python users. The original intention of the FLIP is to do our best to
>> solve these user pain points.
>>
>> Hi Xingbo and Wei, Thank you for sharing PySpark's status on document
>> optimization. You're right: PySpark already has a lot of Python user
>> groups, and they have also found that the Python user community is an
>> important audience for multilingual support. Centralizing and unifying the
>> Python document content will reduce the learning cost for Python users, and
>> a good document structure and content will also reduce the Q&A burden on
>> the community. It's a once-and-for-all job.
>>
>> Hi Seth, I wonder if your concerns have been resolved through the
>> previous discussion?
>>
>> Anyway, the principle of the FLIP is that the Python documents should only
>> include Python-specific content, instead of making a copy of the Java
>> content. And it would be great to have you join in the improvement of
>> PyFlink (both PRs and reviewing PRs).
>>
>> Best,
>> Jincheng
>>
>>
>> Wei Zhong wrote on Wed, Aug 5, 2020, at 5:46 PM:
>>
>>> Hi Xingbo,
>>>
>>> Thanks for your information.
>>>
>>> I think PySpark's documentation redesign deserves our attention.
>>> It seems that the Spark community has also begun to treat the user
>>> experience of the Python documentation more seriously. We can continue to
>>> pay attention to the discussion and progress of the redesign in the Spark
>>> community. It is so similar to our work that there should be some ideas
>>> worth borrowing for us.
>>>
>>> Best,
>>> Wei
>>>
>>>
>>> On Aug 5, 2020, at 15:02, Xingbo Huang wrote:
>>>
>>> Hi,
>>>
>>> I found that the Spark community has also been working on redesigning the
>>> PySpark documentation[1] recently. Maybe we can compare the differences
>>> between our document structure and theirs.
>>>
>>> [1] https://issues.apache.org/jira/browse/SPARK-31851
>>>
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>>>
>>> Best,
>>> Xingbo
>>>
>>> David Anderson wrote on Wed, Aug 5, 2020, at 3:17 AM:
>>>
 I'm delighted to see energy going into improving the documentation.

 With the current documentation, I get a lot of questions that I believe
 reflect two fundamental problems with what we currently provide:

 (1) We have a lot of contextual information in our heads about how
 Flink works, and we are able to use that knowledge to make reasonable
 inferences about how things (probably) work in cases we aren't so familiar
 with. For example, I get a lot of questions of the form "If I use <some feature> will I still have exactly once guarantees?" The answer is always
 yes, but they continue to have doubts because we have failed to clearly
 communicate this fundamental, underlying principle.

 This specific example about fault tolerance applies across all of the
 Flink docs, but the general idea can also be applied to the Table/SQL and
 PyFlink docs. The guiding principles underlying these APIs should be
 written down in one easy-to-find place.

 (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
 E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
 very difficult to answer because it is frequently the case that one has to
 reason about why a given feature doesn't seem to appear in the
 documentation. It could be that I'm looking in the wrong place, or it could
 be that someone forgot to document something, or it could be that it can in
 fact be done by applying a general mechanism in a specific way that I
 haven't thought of -- as in this case, where one can use a JDBC sink from
 Python if one thinks to use DDL.

 So I think it would be helpful to be explicit about both what is, and
 what is not, supported in PyFlink. And to have some very clear organizing
 principles in the documentation so that users can quickly learn where to
 look for specific facts.

[jira] [Created] (FLINK-18879) Support Row Serialization and Deserialization schemas for DataStream source/sink

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18879:
--

 Summary: Support Row Serialization and Deserialization schemas for 
DataStream source/sink
 Key: FLINK-18879
 URL: https://issues.apache.org/jira/browse/FLINK-18879
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0


There are many built-in RowSerializationSchemas and RowDeserializationSchemas 
for DataStream sources and sinks, like JsonRow(De)SerializationSchema and 
AvroRow(De)SerializationSchema. In the Python DataStream API, we will also 
support these schemas.
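
For reference, a sketch of the existing Java-side usage that the Python API 
would mirror (based on the flink-json builder; field names in the snippet are 
illustrative):

{code:java}
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.formats.json.JsonRowDeserializationSchema;

public class JsonRowSchemaExample {
    public static void main(String[] args) {
        // Deserializes JSON bytes into Rows with fields (id BIGINT, name STRING).
        JsonRowDeserializationSchema schema =
                new JsonRowDeserializationSchema.Builder(
                        Types.ROW_NAMED(
                                new String[] {"id", "name"},
                                Types.LONG,
                                Types.STRING))
                        .build();
        System.out.println(schema);
    }
}
{code}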



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-10 Thread Márton Balassi
Hi All,

I am also fairly torn on this one, however unless we are vigilant in
keeping the flink repository relatively lean the number of modules will
just keep increasing and pose an increasingly greater maintainability
challenge.
Less frequently used connectors are strong candidates to be maintained in
bahir-flink and/or via flink-packages.org (I do not support creating a
third option in apache/flink-connectors). If the testing infrastructure of
bahir-flink is a concern, then we should invest into improving that, so
that it can serve as a reasonable alternative.

I prefer the option of HBase 2.x in Flink and 1.x in Bahir, with a
community commitment of improving the Bahir testing infra. If taking this
step immediately is deemed too risky I can accept having the two versions
side-by-side in Flink for the time being, but without refactoring them to
use a common base module (like flink-kafka-connector-base) as we expect to
move 1.x to Bahir when the infra is satisfactory.

My position is not against HBase by any means, it is for a more
maintainable Flink repository. I have assigned [1] to Miklos, he aims at
opening a PR in the coming days - which we might modify based on the
outcome of this discussion.

[1] https://issues.apache.org/jira/browse/FLINK-18795

On Mon, Aug 10, 2020 at 4:16 PM Robert Metzger  wrote:

> @Jark: Thanks for bringing up these concerns.
> All the problems you've mentioned are "solvable":
> - uber jar: Bahir could provide a hbase1 uber jar (we could theoretically
> also add a dependency from flink to bahir and provide the uber jar from
> Flink)
> - e2e tests: we know that the connector is stable, as long as we are not
> adding major changes (or we are moving the respective e2e tests to bahir).
>
> On the other hand, I agree with you that supporting multiple versions of a
> connector is pretty common (see Kafka or elasticsearch), so why can't we
> allow it for Hbase now?
>
> I'm really torn on this and would like to hear more opinions on this.
>
>
> On Fri, Aug 7, 2020 at 11:24 PM Felipe Lolas  wrote:
>
>> Hi all!
>>
>> I'm new here; I have been using the Flink connector for HBase 1.2, but
>> recently opted to upgrade to HBase 2.1 (basically because it was bundled
>> in CDH6).
>>
>> It would be nice to add support for HBase 2.x!
>> I found that supporting HBase 1.4.3 and 2.1 needs minimal changes, and
>> keeping that in mind, last week I sent a PR with a solution supporting
>> HBase 1.4.3/2.1.0 (maybe not the best; I'm sorry if I broke some rules
>> by sending the PR).
>>
>> I would be happy to help if needed!
>>
>>
>>
>> Felipe.
>>
>> On 07-08-2020 at 10:53, Jark Wu wrote:
>>
>> 
>> I'm +1 to add HBase 2.x
>>
>> However, I have some concerns about moving HBase 1.x to Bahir:
>> 1) As discussed above, there are still lots of people using HBase 1.x.
>> 2) Bahir doesn't have the infrastructure to run the existing HBase E2E
>> tests.
>> 3) We also paid lots of effort to provide an uber connector jar for HBase
>> (not yet released), it is helpful to improve the out-of-box experience.
>>
>> My thought is that adding HBase 2.x doesn't have to remove HBase 1.x. It
>> doesn't add too much work to maintain a new version.
>> Keeping the old version can also help us to develop the new one. I would
>> suggest to keep HBase 1.x in the repository for at least one more release.
>> Another idea is that maybe it's a good time to have a
>> "apache/flink-connectors" repository, and move both HBase 1.x and 2.x to
>> it.
>> It would also be a good place to accept the contribution of pulsar
>> connector and other connectors.
>>
>> Best,
>> Jark
>>
>>
>> On Fri, 7 Aug 2020 at 17:54, Robert Metzger  wrote:
>>
>>> Hi,
>>>
>>> Thank you for picking this up so quickly. I have no objections regarding
>>> all the proposed items.
>>> @Gyula: Once the bahir contribution is properly reviewed, ping me if you
>>> need somebody to merge it.
>>>
>>>
>>> On Fri, Aug 7, 2020 at 10:43 AM Márton Balassi wrote:
>>>
>>> > Hi Robert and Gyula,
>>> >
>>> > Thanks for reviving this thread. We have the implementation (currently
>>> for
>>> > 2.2.3) and it is straightforward to contribute it back. Miklos (ccd)
>>> has
>>> > recently written a readme for said version, he would be interested in
>>> > contributing the upgraded connector back. The latest HBase version is
>>> > 2.3.0, if we are touching the codebase anyway I would propose to have
>>> that.
>>> >
>>> > If everyone is comfortable with it I would assign [1] to Miklos,
>>> > double-checking that all the functionality Felipe has proposed is
>>> included.
>>> > [1] https://issues.apache.org/jira/browse/FLINK-18795
>>> > [2] https://hbase.apache.org/downloads.html
>>> >
>>> > On Fri, Aug 7, 2020 at 10:13 AM Gyula Fóra 
>>> wrote:
>>> >
>>> >> Hi Robert,
>>> >>
>>> >> I completely agree with you on the Bahir based approach.
>>> >>
>>> >> I am happy to help with the contribution on the bahir side, with
>>> thorough
>>> >>  review and testing.
>>> >>
>>> >> Cheers,
>>> >

[jira] [Created] (FLINK-18878) Support dependency management for Python StreamExecutionEnvironment.

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18878:
--

 Summary: Support dependency management for Python 
StreamExecutionEnvironment.
 Key: FLINK-18878
 URL: https://issues.apache.org/jira/browse/FLINK-18878
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0


Add dependency management for StreamExecutionEnvironment so that users can 
specify third-party dependencies in their DataStream job. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18877) Faulty BinarySection.equals()

2020-08-10 Thread Timo Walther (Jira)
Timo Walther created FLINK-18877:


 Summary: Faulty BinarySection.equals()
 Key: FLINK-18877
 URL: https://issues.apache.org/jira/browse/FLINK-18877
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Reporter: Timo Walther
Assignee: Timo Walther


Classes extending from `BinarySection` rely on `BinarySection.equals()`, but 
this implementation does not take the concrete subclass into account. Thus, 
`BinaryRowData`, `BinaryMapData`, `BinaryArrayData`, and `NestedRowData` have 
faulty equality.
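
A generic illustration of the pitfall (simplified sketch, not the actual 
Flink classes):

{code:java}
// A base-class equals() that only checks "instanceof Base" lets instances
// of *different* subclasses compare equal whenever their base state matches.
class Base {
    final int payload;
    Base(int payload) { this.payload = payload; }

    @Override
    public boolean equals(Object o) {
        // Faulty: ignores the concrete class of 'o'.
        return o instanceof Base && ((Base) o).payload == payload;
    }

    @Override
    public int hashCode() { return payload; }
}

class RowLike extends Base { RowLike(int p) { super(p); } }
class MapLike extends Base { MapLike(int p) { super(p); } }

public class EqualsPitfall {
    public static void main(String[] args) {
        // Prints "true", although a row-like and a map-like value should
        // never be equal. Additionally requiring
        // 'o.getClass() == getClass()' fixes this.
        System.out.println(new RowLike(42).equals(new MapLike(42)));
    }
}
{code}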



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-10 Thread Seth Wiesman
I think this sounds good. +1

On Wed, Aug 5, 2020 at 8:37 PM jincheng sun 
wrote:

> Hi David, Thank you for sharing the problems with the current documents,
> and I agree with you, as I have also received the same feedback from
> Chinese users. I am often contacted by users asking questions such as
> whether PyFlink supports "Java UDF" and whether PyFlink supports
> "xxxConnector". The root cause of these problems is that our existing
> documents are written for Java users (text and API mixed together). Since
> Python was only added in 1.9, much of the documentation is not friendly to
> Python users. They don't want to look for Python content in unfamiliar
> Java documents. Just yesterday, there were complaints from Chinese users
> about where to find all the documentation entries for the Python API. So a
> centralized entry point and a clear document structure are an urgent demand
> of Python users. The original intention of the FLIP is to do our best to
> solve these user pain points.
>
> Hi Xingbo and Wei, Thank you for sharing PySpark's status on document
> optimization. You're right: PySpark already has a lot of Python user
> groups, and they have also found that the Python user community is an
> important audience for multilingual support. Centralizing and unifying the
> Python document content will reduce the learning cost for Python users, and
> a good document structure and content will also reduce the Q&A burden on
> the community. It's a once-and-for-all job.
>
> Hi Seth, I wonder if your concerns have been resolved through the previous
> discussion?
>
> Anyway, the principle of the FLIP is that the Python documents should only
> include Python-specific content, instead of making a copy of the Java
> content. And it would be great to have you join in the improvement of
> PyFlink (both PRs and reviewing PRs).
>
> Best,
> Jincheng
>
>
> Wei Zhong wrote on Wed, Aug 5, 2020, at 5:46 PM:
>
>> Hi Xingbo,
>>
>> Thanks for your information.
>>
>> I think PySpark's documentation redesign deserves our attention.
>> It seems that the Spark community has also begun to treat the user
>> experience of the Python documentation more seriously. We can continue to
>> pay attention to the discussion and progress of the redesign in the Spark
>> community. It is so similar to our work that there should be some ideas
>> worth borrowing for us.
>>
>> Best,
>> Wei
>>
>>
>> On Aug 5, 2020, at 15:02, Xingbo Huang wrote:
>>
>> Hi,
>>
>> I found that the Spark community has also been working on redesigning the
>> PySpark documentation[1] recently. Maybe we can compare the differences
>> between our document structure and theirs.
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-31851
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>>
>> Best,
>> Xingbo
>>
>> David Anderson wrote on Wed, Aug 5, 2020, at 3:17 AM:
>>
>>> I'm delighted to see energy going into improving the documentation.
>>>
>>> With the current documentation, I get a lot of questions that I believe
>>> reflect two fundamental problems with what we currently provide:
>>>
>>> (1) We have a lot of contextual information in our heads about how Flink
>>> works, and we are able to use that knowledge to make reasonable inferences
>>> about how things (probably) work in cases we aren't so familiar with. For
>>> example, I get a lot of questions of the form "If I use <some feature> will
>>> I still have exactly once guarantees?" The answer is always yes, but they
>>> continue to have doubts because we have failed to clearly communicate this
>>> fundamental, underlying principle.
>>>
>>> This specific example about fault tolerance applies across all of the
>>> Flink docs, but the general idea can also be applied to the Table/SQL and
>>> PyFlink docs. The guiding principles underlying these APIs should be
>>> written down in one easy-to-find place.
>>>
>>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
>>> E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
>>> very difficult to answer because it is frequently the case that one has to
>>> reason about why a given feature doesn't seem to appear in the
>>> documentation. It could be that I'm looking in the wrong place, or it could
>>> be that someone forgot to document something, or it could be that it can in
>>> fact be done by applying a general mechanism in a specific way that I
>>> haven't thought of -- as in this case, where one can use a JDBC sink from
>>> Python if one thinks to use DDL.
>>>
>>> So I think it would be helpful to be explicit about both what is, and
>>> what is not, supported in PyFlink. And to have some very clear organizing
>>> principles in the documentation so that users can quickly learn where to
>>> look for specific facts.
>>>
>>> Regards,
>>> David
>>>
>>>
>>> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun 
>>> wrote:
>>>
 Hi Seth and David,

 I'm very happy to have your reply and suggestions. I would like to
 share my thoughts here:

>>>

Re: Status of FLIPs

2020-08-10 Thread Robert Metzger
Thanks for cleaning up our Wiki.

Seth and Timo: Can you check the status of the FLIPs mentioned in (3)?

On Wed, Aug 5, 2020 at 9:23 AM jincheng sun 
wrote:

> Good job Dian!
> Thank you for helping to maintain the state of FLIPs which is very
> important for the community to understand the progress of FLIPs.
>
> Best,
> Jincheng
>
>
> Dian Fu wrote on Tue, Aug 4, 2020, at 5:41 PM:
>
> > Hi all,
> >
> > When going through the status of existing FLIPs[1], I found that the
> > status of quite a few FLIPs is out of date.
> >
> > 1) For the FLIPs which I'm pretty sure are already finished (the
> > umbrella JIRA has been resolved), I have updated their status by
> > moving them from "Adopted/Accepted but unreleased FLIPs" to "Implemented
> > and Released FLIPs":
> >
> > - FLIP-52: Remove legacy Program interface.
> > - FLIP-57: Rework FunctionCatalog
> > - FLIP-63: Rework table partition support
> > - FLIP-68: Extend Core Table System with Pluggable Modules
> > - FLIP-73: Introducing Executors for job submission
> > - FLIP-81: Executor-related new ConfigOptions.
> > - FLIP-84: Improve & Refactor API of TableEnvironment
> > - FLIP-85: Flink Application Mode
> > - FLIP-86: Improve Connector Properties
> > - FLIP 87: Primary key constraints in Table API
> > - FLIP-92: Add N-Ary Stream Operator in Flink
> > - FLIP-103: Better TM/JM Log Display
> > - FLIP-123: DDL and DML compatibility for Hive connector
> > - FLIP-124: Add open/close and Collector to (De)SerializationSchema
> >
> >
> > 2) For the following FLIPs, it seems that the work has already been
> > finished. However, as the umbrella JIRA is still open, I'm leaving
> > their status as is:
> >
> > - FLIP-55: Introduction of a Table API Java Expression DSL
> > - FLIP-93: JDBC catalog and Postgres catalog
> > - FLIP-110: Support LIKE clause in CREATE TABLE
> >
> >
> > 3) For the following FLIPs, there are still some open subtasks. However,
> > it seems that the main work has already been finished and so I guess (not
> > quite sure) maybe we should also move them to "Implemented and Released
> > FLIPs":
> >
> > - FLIP-43: State Processing API
> > - FLIP-66: Support time attribute in SQL DDL
> > - FLIP-69: Flink SQL DDL Enhancement
> > - FLIP-79: Flink Function DDL Support
> >
> > For the FLIPs under 2) and 3), it would be great if the people who are
> > familiar with them could double-check whether we should update their
> > status.
> >
> > Thanks,
> > Dian
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>


Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-10 Thread Robert Metzger
@Jark: Thanks for bringing up these concerns.
All the problems you've mentioned are "solvable":
- uber jar: Bahir could provide a hbase1 uber jar (we could theoretically
also add a dependency from flink to bahir and provide the uber jar from
Flink)
- e2e tests: we know that the connector is stable, as long as we are not
adding major changes (or we are moving the respective e2e tests to bahir).

On the other hand, I agree with you that supporting multiple versions of a
connector is pretty common (see Kafka or elasticsearch), so why can't we
allow it for Hbase now?

I'm really torn on this and would like to hear more opinions on this.


On Fri, Aug 7, 2020 at 11:24 PM Felipe Lolas  wrote:

> Hi all!
>
> I'm new here; I have been using the Flink connector for HBase 1.2, but
> recently opted to upgrade to HBase 2.1 (basically because it was bundled
> in CDH6).
>
> It would be nice to add support for HBase 2.x!
> I found that supporting HBase 1.4.3 and 2.1 needs minimal changes, and
> keeping that in mind, last week I sent a PR with a solution supporting
> HBase 1.4.3/2.1.0 (maybe not the best; I'm sorry if I broke some rules
> by sending the PR).
>
> I would be happy to help if needed!
>
>
>
> Felipe.
>
> On 07-08-2020 at 10:53, Jark Wu wrote:
>
> 
> I'm +1 to add HBase 2.x
>
> However, I have some concerns about moving HBase 1.x to Bahir:
> 1) As discussed above, there are still lots of people using HBase 1.x.
> 2) Bahir doesn't have the infrastructure to run the existing HBase E2E
> tests.
> 3) We also paid lots of effort to provide an uber connector jar for HBase
> (not yet released), it is helpful to improve the out-of-box experience.
>
> My thought is that adding HBase 2.x doesn't have to remove HBase 1.x. It
> doesn't add too much work to maintain a new version.
> Keeping the old version can also help us to develop the new one. I would
> suggest to keep HBase 1.x in the repository for at least one more release.
> Another idea is that maybe it's a good time to have a
> "apache/flink-connectors" repository, and move both HBase 1.x and 2.x to
> it.
> It would also be a good place to accept the contribution of pulsar
> connector and other connectors.
>
> Best,
> Jark
>
>
> On Fri, 7 Aug 2020 at 17:54, Robert Metzger  wrote:
>
>> Hi,
>>
>> Thank you for picking this up so quickly. I have no objections regarding
>> all the proposed items.
>> @Gyula: Once the bahir contribution is properly reviewed, ping me if you
>> need somebody to merge it.
>>
>>
>> On Fri, Aug 7, 2020 at 10:43 AM Márton Balassi 
>> wrote:
>>
>> > Hi Robert and Gyula,
>> >
>> > Thanks for reviving this thread. We have the implementation (currently
>> for
>> > 2.2.3) and it is straightforward to contribute it back. Miklos (ccd) has
>> > recently written a readme for said version, he would be interested in
>> > contributing the upgraded connector back. The latest HBase version is
>> > 2.3.0, if we are touching the codebase anyway I would propose to have
>> that.
>> >
>> > If everyone is comfortable with it I would assign [1] to Miklos,
>> > double-checking that all the functionality Felipe has proposed is
>> included.
>> > [1] https://issues.apache.org/jira/browse/FLINK-18795
>> > [2] https://hbase.apache.org/downloads.html
>> >
>> > On Fri, Aug 7, 2020 at 10:13 AM Gyula Fóra 
>> wrote:
>> >
>> >> Hi Robert,
>> >>
>> >> I completely agree with you on the Bahir based approach.
>> >>
>> >> I am happy to help with the contribution on the bahir side, with
>> thorough
>> >>  review and testing.
>> >>
>> >> Cheers,
>> >> Gyula
>> >>
>> >> On Fri, 7 Aug 2020 at 09:30, Robert Metzger 
>> wrote:
>> >>
>> >>> It seems that this thread is not on dev@ anymore. Adding it back ...
>> >>>
>> >>> On Fri, Aug 7, 2020 at 9:23 AM Robert Metzger 
>> >>> wrote:
>> >>>
>>  I would like to revive this discussion. There's a new JIRA[1] + PR[2]
>>  for adding HBase 2 support.
>> 
>>  it seems that there is demand for a HBase 2 connector, and consensus
>> to
>>  do it.
>> 
>>  The remaining question in this thread seems to be the "how". I would
>>  propose to go the other way around as Gyula suggested: We move the
>> legacy
>>  connector (1.4x) to bahir and add the new (2.x.x) to Flink.
>>  Why? In the Flink repo, we have a pretty solid testing infra, where
>> we
>>  also run Hbase end to end tests. This will help us to stabilize the
>> new
>>  connector and ensure good quality.
>>  Also, the perception of what goes into Flink and what goes into Bahir is
>>  a bit clearer if we put the stable, up-to-date stuff into Flink, and
>>  legacy, experimental or unstable connectors into Bahir.
>> 
>> 
>>  Who can take care of this effort? (Decide which HBase 2 PR to take,
>>  review it, and contribute it to Bahir.)
>> 
>> 
>>  [1] https://issues.apache.org/jira/browse/FLINK-18795
>>  [2] https://github.com/apache/flink/pull/13047
>> 
>>  On Mon, Jun 22, 2020 at 3:32 PM Gyula Fóra

Re: [DISCUSS] Planning Flink 1.12

2020-08-10 Thread Robert Metzger
I updated the release date in the Wiki page.

On Sun, Aug 9, 2020 at 8:18 PM Yun Tang  wrote:

> +1 for extending the feature freeze due date.
> 
> From: Zhijiang 
> Sent: Thursday, August 6, 2020 17:05
> To: dev 
> Subject: Re: [DISCUSS] Planning Flink 1.12
>
> +1 on my side for feature freeze date by the end of Oct.
>
>
> --
> From:Yuan Mei 
> Send Time: Thursday, Aug 6, 2020, 14:54
> To:dev 
> Subject:Re: [DISCUSS] Planning Flink 1.12
>
> +1
>
> > +1 for extending the feature freeze date to the end of October.
>
>
>
> On Thu, Aug 6, 2020 at 12:08 PM Yu Li  wrote:
>
> > +1 for extending feature freeze date to end of October.
> >
> > Feature development in the master branch could be unblocked through
> > creating the release branch, but every coin has its two sides (smile)
> >
> > Best Regards,
> > Yu
> >
> >
> > On Wed, 5 Aug 2020 at 20:12, Robert Metzger  wrote:
> >
> > > Thanks all for your opinion.
> > >
> > > @Chesnay: That is a risk, but I hope the people responsible for
> > > individual FLIPs plan accordingly. Extending the time till the feature
> > > freeze should not mean that we are extending the scope of the release.
> > > Ideally, features are done before FF, and they use the time till the
> > > freeze for additional testing and documentation polishing.
> > > This FF will be virtual, so there should be less disruption than a
> > > physical conference with all the travelling.
> > > Do you have a different proposal for the timing?
> > >
> > >
> > > I'm currently considering splitting the feature freeze and the release
> > > branch creation. Similar to the Linux kernel development, we could have
> > > a "merge window" and a stabilization phase. At the end of the
> > > stabilization phase, we cut the release branch and open the next merge
> > > window (I'll start a separate thread regarding this towards the end of
> > > this release cycle, if I still like the idea then)
> > >
> > >
> > > On Wed, Aug 5, 2020 at 12:04 PM Chesnay Schepler 
> > > wrote:
> > >
> > > > I'm a bit concerned about end of October, because it means we have
> > > > Flink Forward, which usually means at least 1 week of little-to-no
> > > > activity, and then 1 week until feature-freeze.
> > > >
> > > > On 05/08/2020 11:56, jincheng sun wrote:
> > > > > +1 for end of October from me as well.
> > > > >
> > > > > Best,
> > > > > Jincheng
> > > > >
> > > > >
> > > > > Kostas Kloudas wrote on Wed, Aug 5, 2020, at 4:59 PM:
> > > > >
> > > > >> +1 for end of October from me as well.
> > > > >>
> > > > >> Cheers,
> > > > >> Kostas
> > > > >>
> > > > >> On Wed, Aug 5, 2020 at 9:59 AM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > >>
> > > > >>> +1 for end of October from my side as well.
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Till
> > > > >>>
> > > > >>> On Tue, Aug 4, 2020 at 9:46 PM Stephan Ewen wrote:
> > > > >>>
> > > >  The end of October sounds good from my side, unless it collides
> > > >  with some holidays that affect many committers.
> > > > 
> > > >  Feature-wise, I believe we can definitely make good use of the
> > > >  time to wrap up some critical threads (like finishing the FLIP-27
> > > >  source efforts).
> > > > 
> > > >  So +1 to the end of October from my side.
> > > > 
> > > >  Best,
> > > >  Stephan
> > > > 
> > > > 
> > > >  On Tue, Aug 4, 2020 at 8:59 AM Robert Metzger <rmetz...@apache.org> wrote:
> > > > > Thanks a lot for commenting on the feature freeze date.
> > > > >
> > > > > You are raising a few good points on the timing.
> > > > > If we have already (2 months before) concerns regarding the
> > > > > deadline, then I agree that we should move it till the end of
> > > > > October.
> > > > >
> > > > > We then just need to be careful not to run into the Christmas
> > > > > season at the end of December.
> > > > >
> > > > > If nobody objects within a few days, I'll update the feature
> > > > > freeze date in the Wiki.
> > > > >
> > > > >
> > > > > On Tue, Aug 4, 2020 at 7:52 AM Kurt Young wrote:
> > > > >
> > > > >> Regarding setting the feature freeze date to late September, I
> > > > >> have some concern that it might make
> > > > >> the development time of 1.12 too short.
> > > > >>
> > > > >> One reason for this is we took too much time (about 1.5 months,
> > > > >> from mid-May to the beginning of July)
> > > > >> for testing 1.11. It's not ideal, but further squeezing the
> > > > >> development time of 1.12 won't make this better.
> > > > >> Besides, AFAIK July & August is also a popular vacation season
> > > > >> for European

Re: Adding a new "Docker Images" component to Jira

2020-08-10 Thread Robert Metzger
Thanks for the feedback.
Transaction committed.

On Mon, Aug 10, 2020 at 5:34 AM Matt Wang  wrote:

> +1 for unifying Deployment / Docker, Dockerfiles and Release System /
> Docker into Docker.
>
>
> --
>
> Best,
> Matt Wang
>
>
On 08/10/2020 11:21, Yang Wang wrote:
> +1 for the unifying of the component name.
>
> Best,
> Yang
>
Andrey Zagrebin wrote on Sat, Aug 8, 2020, at 6:30 PM:
>
> +1 for the consolidation
>
> Best,
> Andrey
>
> On Fri, Aug 7, 2020 at 3:38 PM Till Rohrmann  wrote:
>
> +1 for unifying Deployment / Docker, Dockerfiles and Release System /
> Docker into Docker.
>
> Cheers,
> Till
>
> On Fri, Aug 7, 2020 at 12:18 PM Robert Metzger 
> wrote:
>
> Hi all,
>
> we now have 3 components containing the word "docker":
- Deployment / Docker (63 issues):
https://issues.apache.org/jira/issues/?jql=project+%3D+FLINK+AND+component+%3D+%22Deployment+%2F+Docker%22
- Dockerfiles (12 issues):
https://issues.apache.org/jira/issues/?jql=project+%3D+FLINK+AND+component+%3D+Dockerfiles
- Release System / Docker (3 issues):
https://issues.apache.org/jira/issues/?jql=project+%3D+FLINK+AND+component+%3D+%22Release+System+%2F+Docker%22
>
I would suggest consolidating these three components into one, as there
are not that many tickets for this aspect of Flink.
Maybe we should just rename "Deployment / Docker" to "flink-docker", and
merge the two other components into it?
>
>
> On Fri, Feb 21, 2020 at 11:47 AM Patrick Lucas 
> wrote:
>
> Thanks, Chesnay!
>
On Fri, Feb 21, 2020 at 11:26 AM Chesnay Schepler <ches...@apache.org> wrote:
>
> I've added a "Release System / Docker" component.
>
> On 21/02/2020 11:19, Patrick Lucas wrote:
> Hi,
>
Could someone with permissions add a new component to the FLINK project in
Jira for the Docker images (https://github.com/apache/flink-docker)?
>
There is already a "Deployment / Docker" component, but that's not quite
the same as maintenance/improvements on the flink-docker images.
>
Either top-level "Docker Images" or perhaps "Release / Docker Images" would
be fine.
>
> Thanks,
> Patrick
>
>
>
>
>
>
>
>


Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

2020-08-10 Thread Timo Walther

Hi Xuannan,

sorry for joining the discussion so late. I agree that this is a very 
nice and useful feature. However, the impact it has on many components 
in the stack requires more discussion in my opinion.


1) Separation of concerns:
The current design seems to mix different layers. We should make sure 
that all layers do what they are supposed to do:


1a) The FLIP states: "The cache() method returns a new Table object with 
a flag set."


The `Table` object is just a thin API class that wraps a 
`QueryOperation`. Other than that the `Table` object should not contain 
further state. The tree of `QueryOperation` should be an immutable, 
independent data structure that can be passed around and will eventually 
be passed to the `Planner`.


The mentioned `CacheSink` should be added by the optimizer. It is not 
the responsibility of the API to perform optimizer-like tasks. A call to 
`t1.cache()` should simply wrap the original operation into something 
like `CacheOperation(t1.getQueryOperation)`. A `CacheOperation` could 
still extend from `QueryOperation` and assign a unique string identifier 
already. A specialized `StreamTransformation` would be necessary during 
translation.
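
To illustrate, a minimal sketch (my own code, not from the FLIP; it assumes 
the current `QueryOperation` interface with `getTableSchema()`, 
`getChildren()`, `asSummaryString()` and `accept()`, plus a generic 
`visit(QueryOperation)` fallback on the visitor):

import java.util.Collections;
import java.util.List;
import java.util.UUID;

import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.operations.QueryOperation;
import org.apache.flink.table.operations.QueryOperationVisitor;

// cache() would only decorate the operation tree and assign an identifier;
// adding the cache sink (or substituting a read of the cached partition)
// stays a planner concern during translation.
public final class CacheOperation implements QueryOperation {

    private final QueryOperation child;
    private final String cacheId = UUID.randomUUID().toString();

    public CacheOperation(QueryOperation child) {
        this.child = child;
    }

    public String getCacheId() {
        return cacheId;
    }

    @Override
    public TableSchema getTableSchema() {
        // Caching is transparent: it exposes exactly the child's schema.
        return child.getTableSchema();
    }

    @Override
    public String asSummaryString() {
        return "Cache(id: [" + cacheId + "])";
    }

    @Override
    public List<QueryOperation> getChildren() {
        return Collections.singletonList(child);
    }

    @Override
    public <T> T accept(QueryOperationVisitor<T> visitor) {
        // Assumes a catch-all visit(QueryOperation) exists on the visitor.
        return visitor.visit(this);
    }
}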


1b) The FLIP states: "The default table service stores the metadata in 
the client (e.g. TableEnvironment)"


`TableEnvironment` is not a client. Similar to `Table`, it is an API 
class that just delegates to other session components. Currently, the 
table environment has (again) too many responsibilities that should 
better be split into other components. The table's `Executor` is the 
client that performs the cluster communication. But in general it also 
just delegates to `org.apache.flink.core.execution.PipelineExecutor`. 
IMO the `PipelineExecutor` is a better fit for a back-and-forth 
communication to determine existing cluster partitions and modify the 
job graph. Or even further down the stack, because as far as I know, 
`PipelineExecutor` works with `StreamGraph`.


`flink-table-api-java` has no dependency to `flink-runtime`. This has 
been done on purpose.


2) API
I still see the rejected option 2 as a good fit to expose this feature.

A `Table.cache(): CachedTable` with `CachedTable.invalidate(): void` and 
maybe `CachedTable.getId(): String` makes the feature and its operations 
very explicit. It also avoids follow-up questions such as:


Will `invalidateCache()` be transitively propagated in 
`t1.cache().join(t2.cache()).invalidateCache()`?


Or as the FLIP states:

`Table t3 = t1.select(...) // cache will NOT be used.`
but
`t1.invalidateCache() // cache will be released`

This sounds a bit contradictory to me, because sometimes `t1.cache()` has 
implications on t1 and sometimes not.
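
For clarity, the API shape of that option could look like this (a sketch of 
mine; the names are not taken from the FLIP):

import org.apache.flink.table.api.Table;

// Caching becomes visible in the type system: a plain Table derived from
// t1 never carries cache semantics; only the explicit handle returned by
// cache() can be invalidated.
public interface CachedTable extends Table {

    // Identifier of the cached intermediate result (assumed helper).
    String getId();

    // Releases the cached result; later reads fall back to recomputation.
    void invalidate();
}

With that, `t1.cache().join(t2.cache())` yields a plain `Table`, and only 
the two explicit `CachedTable` handles can be invalidated.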


3) Big picture

After reading the FLIP, I still don't understand how a user can 
configure or control the table service. Will we offer options through 
`TableConfig` or `TableEnvironment` or is this configuration done via 
ConfigOptions for lower layers?


4) SQL integration

As I mentioned earlier, we should think about a SQL integration as well. 
Otherwise we need to redesign the Java API to align it with SQL later. 
SQL also has a bigger user base than the Table API. Let's assume we 
introduce a new keyword and combine the caching with regular CREATE VIEW 
syntax such as:

`CREATE CACHED TEMPORARY VIEW MyTable AS SELECT *`
This would align well with:
`tEnv.registerTemporaryView("MyTable", table.cache())`

What do you think?

5) TableEnvironment.close()

Will `TableEnvironment` implement `AutoCloseable`?


In summary, I think the FLIP should go into more detail on how this effort 
affects each layer, because a lot of the interfaces are `@Public` or 
`@PublicEvolving`. And the FLIP still leaves a lot of questions about how 
this high-level concept ends up in the JobGraph.


Regards,
Timo



On 30.07.20 09:00, Xuannan Su wrote:

Hi folks,

It seems that all the raised concerns so far have been resolved. I plan to 
start a voting thread for FLIP-36 early next week if there are no comments.

Thanks,
Xuannan
On Jul 28, 2020, 7:42 PM +0800, Xuannan Su , wrote:

Hi Kurt,

Thanks for the comments.

You are right that the FLIP lacked a proper discussion of the impact on the 
optimizer. I have added a section describing how the cached table works with 
the optimizer. I hope this resolves your concern. Please let me know if you 
have any further comments.

Best,
Xuannan
On Jul 22, 2020, 4:36 PM +0800, Kurt Young , wrote:

Thanks for the reply. I have one more comment about the effect on the
optimizer. Even if you are trying to make the cached table as orthogonal to
the optimizer as possible by introducing a special sink, it is still not
clear why this approach is safe. Maybe you can add a description of the
process from API to JobGraph; otherwise I can't make sure everyone reviewing
the design doc will have the same picture of this. And I'm also quite sure
some of the existing mechanisms will be affected by this special sink, e.g.
multi sink

[jira] [Created] (FLINK-18876) BlobServer moves file with an open FileInputStream

2020-08-10 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-18876:


 Summary: BlobServer moves file with an open FileInputStream
 Key: FLINK-18876
 URL: https://issues.apache.org/jira/browse/FLINK-18876
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.12.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.12.0


{{BlobServer#putInputStream}} moves a file while it has an open 
FileInputStream, causing tests like the BlobServerPutTest to fail on Windows.
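
A minimal standalone sketch of the underlying Windows behavior (assumed 
repro, not the actual test code):

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MoveWhileOpen {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("blob", ".bin");
        Path dst = src.resolveSibling("blob-moved.bin");
        try (FileInputStream in = new FileInputStream(src.toFile())) {
            // Succeeds on POSIX file systems, but throws a FileSystemException
            // on Windows because 'in' still holds an open handle on 'src'.
            Files.move(src, dst);
        }
        Files.deleteIfExists(src);
        Files.deleteIfExists(dst);
    }
}
{code}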



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18875) DESCRIBE table can return the table properties

2020-08-10 Thread Kaibo Zhou (Jira)
Kaibo Zhou created FLINK-18875:
--

 Summary: DESCRIBE table can return the table properties
 Key: FLINK-18875
 URL: https://issues.apache.org/jira/browse/FLINK-18875
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Kaibo Zhou


I created a table A with some properties, and then I want to see the properties 
when using DESCRIBE `A` or DESCRIBE EXTENDED `A`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18874) Support conversion between Table and DataStream

2020-08-10 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-18874:
--

 Summary: Support conversion between Table and DataStream
 Key: FLINK-18874
 URL: https://issues.apache.org/jira/browse/FLINK-18874
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Shuiqiang Chen
 Fix For: 1.12.0


Support converting a DataStream into a Table, and a Table into an 
append/retract stream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18873) Make the WatermarkStrategy API more scala friendly

2020-08-10 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-18873:


 Summary: Make the WatermarkStrategy API more scala friendly
 Key: FLINK-18873
 URL: https://issues.apache.org/jira/browse/FLINK-18873
 Project: Flink
  Issue Type: Improvement
  Components: API / Core, API / DataStream
Affects Versions: 1.11.0
Reporter: Dawid Wysakowicz
 Fix For: 1.12.0


Right now there is no reliable way of passing WatermarkGeneratorSupplier and/or 
TimestampAssigner as lambdas in Scala.

The only way to use this API is:

{code}
.assignTimestampsAndWatermarks(
  WatermarkStrategy
    .forGenerator[(String, Long)](
      new WatermarkGeneratorSupplier[(String, Long)] {
        override def createWatermarkGenerator(
            context: WatermarkGeneratorSupplier.Context)
            : WatermarkGenerator[(String, Long)] =
          new MyPeriodicGenerator
      })
    .withTimestampAssigner(new SerializableTimestampAssigner[(String, Long)] {
      override def extractTimestamp(t: (String, Long), l: Long): Long = t._2
    })
)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18872) Aggregate with mini-batch does not respect state retention

2020-08-10 Thread Benchao Li (Jira)
Benchao Li created FLINK-18872:
--

 Summary: Aggregate with mini-batch does not respect state retention
 Key: FLINK-18872
 URL: https://issues.apache.org/jira/browse/FLINK-18872
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Runtime
Affects Versions: 1.11.1, 1.10.2, 1.12.0
Reporter: Benchao Li


MiniBatchGroupAggFunction and MiniBatchGlobalGroupAggFunction do not respect 
the state retention config, so the state will grow infinitely.
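
For context, the retention config that is being ignored would be set like 
this (sketch using the standard TableConfig API and the documented 
table.exec.mini-batch.* options):

{code:java}
import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RetentionExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        // Enable mini-batch aggregation.
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.mini-batch.enabled", "true");
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.mini-batch.allow-latency", "1 s");
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.mini-batch.size", "1000");
        // Idle state should be cleaned up between 1 and 2 days, but the
        // mini-batch agg functions reportedly do not honor this.
        tEnv.getConfig().setIdleStateRetentionTime(Time.days(1), Time.days(2));
    }
}
{code}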



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18871) Non-deterministic function could break retract mechanism

2020-08-10 Thread Benchao Li (Jira)
Benchao Li created FLINK-18871:
--

 Summary: Non-deterministic function could break retract mechanism
 Key: FLINK-18871
 URL: https://issues.apache.org/jira/browse/FLINK-18871
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Runtime
Reporter: Benchao Li


For example, we have the following SQL:
{code:sql}
create view view1 as
select 
  max(a) as m1,
  max(b) as m2 -- b is a timestmap
from T
group by c, d;

create view view2 as
select * from view1
where m2 > CURRENT_TIMESTAMP;

insert into MySink
select sum(m1) as m1 
from view2
group by c;
{code}
view1 will produce retract messages, and the same message may produce different 
results in view2, so the second agg will produce a wrong result.
 For example,
{noformat}
view1:
+ (1, 2020-8-10 16:13:00)
- (1, 2020-8-10 16:13:00)
+ (2, 2020-8-10 16:13:10)

view2:
+ (1, 2020-8-10 16:13:00)
- (1, 2020-8-10 16:13:00) // this record may be filtered out
+ (2, 2020-8-10 16:13:10)

MySink:
+ (1, 2020-8-10 16:13:00)
+ (2, 2020-8-10 16:13:10) // will produce wrong result.
{noformat}
In general, a non-deterministic function may break the retract mechanism. All 
downstream operators that rely on the retraction mechanism may produce wrong 
results or throw exceptions, such as Agg, sinks which need retract messages, 
TopN, or Window.

(The example above is a simplified version of some production jobs in our 
scenario, just to explain the core problem)

CC [~ykt836] [~jark]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18870) [Kinesis][EFO] Update Kinesis Consumer website to reflect EFO

2020-08-10 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-18870:
-

 Summary: [Kinesis][EFO] Update Kinesis Consumer website to reflect 
EFO 
 Key: FLINK-18870
 URL: https://issues.apache.org/jira/browse/FLINK-18870
 Project: Flink
  Issue Type: Sub-task
Reporter: Danny Cranmer


Update the connector website to reflect the changes added for EFO:
- 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kinesis.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)