[ANNOUNCE] New scalafmt formatter has been merged

2022-04-12 Thread Francesco Guardiani
Hi all,
The new scalafmt formatter has been merged. From now on, running mvn
spotless:apply as usual will format both Java and Scala, and IntelliJ will
automatically pick up the scalafmt config for those who have the Scala plugin
installed. If it doesn't, just go to Preferences > Editor > Code Style >
Scala and change the Formatter to scalafmt. If you use the Actions on Save
plugin, make sure you have reformat on save enabled for Scala.

For more details on integration with IDEs, please refer to
https://scalameta.org/scalafmt/docs/installation.html

If you have a pending PR with Scala changes, chances are you're going to
have conflicts with upstream/master now. In order to fix them, here is the
suggested procedure (a condensed shell sketch follows the list):

   - Do an interactive rebase on commit
   3ea3fee5ac996f6ae8836c3cba252f974d20bd2e, which is the commit before the
   reformatting of the whole codebase, fixing the conflicting changes as usual.
   This will make sure you won't miss the changes between your branch and
   master *before* the reformatting commit.
   - Do a rebase on commit 91d81c427aa6312841ca868d54e8ce6ea721cd60
   accepting all changes from your local branch. You can easily do that via git
   rebase -Xours 91d81c427aa6312841ca868d54e8ce6ea721cd60
   - Run mvn spotless:apply and commit all the changes
   - Do an interactive rebase on upstream/master. This will make sure you
   won't miss the changes between your branch and master *after* the
   reformatting commit.
   - Force push your branch to update the PR
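
For convenience, here is the same procedure condensed into shell commands (a
sketch only: the commit hashes are the ones quoted above, while the remote and
branch names are placeholders you need to adapt to your setup):

    git fetch upstream
    # 1. interactive rebase onto the commit before the codebase-wide reformatting,
    #    resolving conflicts as usual
    git rebase -i 3ea3fee5ac996f6ae8836c3cba252f974d20bd2e
    # 2. rebase onto the reformatting commit; note that during a rebase the
    #    -Xours/-Xtheirs sides are swapped compared to a merge, so double-check
    #    that the conflict resolution keeps the side you intend (your local changes)
    git rebase -Xours 91d81c427aa6312841ca868d54e8ce6ea721cd60
    # 3. reformat everything and commit the result
    mvn spotless:apply
    git commit -am "Apply scalafmt formatting"
    # 4. pick up the changes after the reformatting commit, then update the PR
    git rebase -i upstream/master
    git push --force origin <your-branch>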

Sorry for this noise!

Thank you,
FG

-- 

Francesco Guardiani | Software Engineer

france...@ververica.com


<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH

Registered at Amtsgericht Charlottenburg: HRB 158244 B

Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason,
Jinwei (Kevin) Zhang


[jira] [Created] (FLINK-27201) Remove CollectTableSink

2022-04-12 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-27201:
---

 Summary: Remove CollectTableSink
 Key: FLINK-27201
 URL: https://issues.apache.org/jira/browse/FLINK-27201
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27200) Replace CollectionTableSource and CollectionTableSink with new table stack alternatives

2022-04-12 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-27200:
---

 Summary: Replace CollectionTableSource and CollectionTableSink 
with new table stack alternatives
 Key: FLINK-27200
 URL: https://issues.apache.org/jira/browse/FLINK-27200
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani


We need to replace the old CollectionTableSource and CollectionTableSink with 
alternatives from the new stack, for example TestValuesTableFactory or some 
simpler and more straightforward alternative coming from TableFactoryHarness.

This task requires identifying the alternative for the old 
CollectionTableSource and CollectionTableSink, creating a new one if necessary, 
and then replacing their usages in tests.
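
For illustration only, here is a rough sketch of what a values-based 
replacement could look like in a test. The option and method names below 
(registerData, 'connector' = 'values', 'data-id', 'bounded') are recalled from 
the planner test utilities and should be double-checked against 
TestValuesTableFactory:

{code:java}
// Hypothetical sketch: feed a fixed collection of rows through the test
// "values" connector instead of the legacy CollectionTableSource.
// Assumes a TableEnvironment `tEnv` is available in the test.
List<Row> rows = Arrays.asList(Row.of(1, "a"), Row.of(2, "b"));
String dataId = TestValuesTableFactory.registerData(rows);

tEnv.executeSql(
        "CREATE TABLE src (id INT, name STRING) WITH ("
                + " 'connector' = 'values',"
                + " 'data-id' = '" + dataId + "',"
                + " 'bounded' = 'true')");

// ... run the query under test against `src` and assert on the result
{code}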



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27198) Cleanup tests testing legacy TableSink/TableSource

2022-04-12 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-27198:
---

 Summary: Cleanup tests testing legacy TableSink/TableSource
 Key: FLINK-27198
 URL: https://issues.apache.org/jira/browse/FLINK-27198
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani


We have a lot of mock TableSink/TableSource implementations used only for testing 
features of the old TableSink/TableSource stack. We should identify them and 
remove them together with the tests. A non-exhaustive list:

Implementations of TableSink in  (25 usages found)
    Test  (14 usages found)
        flink-python_2.12  (4 usages found)
            org.apache.flink.table.legacyutils  (4 usages found)
                legacyTestingSinks.scala  (3 usages found)
                    39 private[flink] class TestAppendSink extends AppendStreamTableSink[Row] {
                    69 private[flink] class TestRetractSink extends RetractStreamTableSink[Row] {
                    95 private[flink] class TestUpsertSink(
                TestCollectionTableFactory.scala  (1 usage found)
                    168 class CollectionTableSink(val outputType: RowTypeInfo)
        flink-table-common  (1 usage found)
            org.apache.flink.table.utils  (1 usage found)
                TypeMappingUtilsTest.java  (1 usage found)
                    419 private static class TestTableSink implements TableSink> {
        flink-table-planner_2.12  (9 usages found)
            org.apache.flink.table.planner.factories.utils  (1 usage found)
                TestCollectionTableFactory.scala  (1 usage found)
                    152 class CollectionTableSink(val schema: TableSchema)
            org.apache.flink.table.planner.runtime.batch.sql  (1 usage found)
                PartitionableSinkITCase.scala  (1 usage found)
                    331 private class TestSink(
            org.apache.flink.table.planner.runtime.utils  (3 usages found)
                StreamTestSink.scala  (3 usages found)
                    265 final class TestingUpsertTableSink(val keys: Array[Int], val tz: TimeZone)
                    344 final class TestingAppendTableSink(tz: TimeZone) extends AppendStreamTableSink[Row] {
                    499 final class TestingRetractTableSink(tz: TimeZone) extends RetractStreamTableSink[Row] {
            org.apache.flink.table.planner.utils  (4 usages found)
                MemoryTableSourceSinkUtil.scala  (3 usages found)
                    80 final class UnsafeMemoryAppendTableSink
                    146 final class DataTypeOutputFormatTableSink(
                    181 final class DataTypeAppendStreamTableSink(
                testTableSourceSinks.scala  (1 usage found)
                    1317 class OptionsTableSink(

Implementations of TableSource in  (31 usages found)
    Production  (8 usages found)
        flink-batch-sql-test  (1 usage found)
            org.apache.flink.sql.tests  (1 usage found)
                BatchSQLTestProgram.java  (1 usage found)
                    79 public static class GeneratorTableSource extends InputFormatTableSource {
        flink-stream-sql-test_2.12  (1 usage found)
            org.apache.flink.sql.tests  (1 usage found)
                StreamSQLTestProgram.java  (1 usage found)
                    192 public static class GeneratorTableSource
    Test  (23 usages found)
        flink-python_2.12  (1 usage found)
            org.apache.flink.table.legacyutils  (1 usage found)
                TestCollectionTableFactory.scala  (1 usage found)
                    127 class CollectionTableSource(
        flink-sql-client_2.12  (1 usage found)
            org.apache.flink.table.client.gateway.utils  (1 usage found)
                SimpleCatalogFactory.java  (1 usage found)
                    95 new StreamTableSource() {
        flink-table-api-java  (1 usage found)
            org.apache.flink.table.utils  (1 usage found)
                TableSourceMock.java  (1 usage found)
                    27 public class TableSourceMock implements TableSource {
        flink-table-common  (1 usage found)
            org.apache.flink.table.utils  (1 usage found)
                TypeMappingUtilsTest.java  (1 usage found)
                    375 private static class TestTableSource
        flink-table-planner_2.12  (19 usages found)
            org.apache.flink.table.planner.factories.utils  (1 usage found)
                TestCollectionTableFactory.scala  (1 usage found)
                    115 class CollectionTableSource(
            org.apache.flink.table.planner.plan.stream.sql.join  (2 usages found)
                LookupJoinTest.scala  (2 usages found)

[jira] [Created] (FLINK-27197) Port ArrowTableSource and PythonInputFormatTableSource to DynamicTableSource

2022-04-12 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-27197:
---

 Summary: Port ArrowTableSource and PythonInputFormatTableSource to 
DynamicTableSource
 Key: FLINK-27197
 URL: https://issues.apache.org/jira/browse/FLINK-27197
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27196) Remove old format interfaces

2022-04-12 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-27196:
---

 Summary: Remove old format interfaces
 Key: FLINK-27196
 URL: https://issues.apache.org/jira/browse/FLINK-27196
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27185) Run the assertj conversion script to convert assertions in connectors and formats

2022-04-11 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-27185:
---

 Summary: Run the assertj conversion script to convert assertions 
in connectors and formats
 Key: FLINK-27185
 URL: https://issues.apache.org/jira/browse/FLINK-27185
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27043) Remove CSV connector test usages

2022-04-04 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-27043:
---

 Summary: Remove CSV connector test usages
 Key: FLINK-27043
 URL: https://issues.apache.org/jira/browse/FLINK-27043
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26988) Better error reporting when format fails

2022-04-01 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26988:
---

 Summary: Better error reporting when format fails
 Key: FLINK-26988
 URL: https://issues.apache.org/jira/browse/FLINK-26988
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Reporter: Francesco Guardiani


Today, when a format fails, depending on the format implementation there might 
be no information about the split (file name, offset, etc.), making it hard to 
debug format failures. For example, here is what I've seen with the csv format:


{code:java}
Caused by: java.lang.RuntimeException: One or more fetchers have encountered 
exception
    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:225)
 ~[flink-connector-files-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169)
 ~[flink-connector-files-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:130)
 ~[flink-connector-files-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNextNotReading(SourceOperator.java:403)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:387)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:531)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:227)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:841)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:767) 
~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.RuntimeException: SplitFetcher thread 0 received 
unexpected exception while polling the records
    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:150)
 ~[flink-connector-files-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:105)
 ~[flink-connector-files-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) 
~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.RuntimeException: Invalid UTF-8 middle byte 0x54 (at char 
#3529, byte #3528): check content encoding, does not look like UTF-8
    at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.MappingIterator._handleIOException(MappingIterator.java:417)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.MappingIterator.hasNext(MappingIterator.java:190)
 ~[flink-dist-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.formats.csv.CsvReaderFormat$Reader.read(CsvReaderFormat.java:194)
 ~[flink-csv-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.connector.file.src.impl.StreamFormatAdapter$Reader.readBatch(StreamFormatAdapter.java:214)
 ~[flink-connector-files-1.16-SNAPSHOT.jar:1.16-SNAPSHOT]
    at 
org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:67)
 ~[flink-connector-files-1.16-SNAPSHOT.jar:1.16-SNAPSHOT

[jira] [Created] (FLINK-26952) Remove old CSV connector

2022-03-31 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26952:
---

 Summary: Remove old CSV connector
 Key: FLINK-26952
 URL: https://issues.apache.org/jira/browse/FLINK-26952
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


This involves removing all references in our tests to the old CSV connector 
(mostly TableEnvironmentITCase, TableEnvironmentTest and TableITCase) and 
removing the usage from a couple of e2e tests



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26950) Remove TableSink and TableSource interfaces

2022-03-31 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26950:
---

 Summary: Remove TableSink and TableSource interfaces
 Key: FLINK-26950
 URL: https://issues.apache.org/jira/browse/FLINK-26950
 Project: Flink
  Issue Type: Technical Debt
Reporter: Francesco Guardiani


This issue involves removing the old TableSink and TableSource stack, replaced 
by the new DynamicTableSink and DynamicTableSource.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner

2022-03-30 Thread Francesco Guardiani
Sorry, I replied on the wrong thread, so I repost my answer here :)

As there was already a discussion in the doc, I'll just summarize my
opinions here on the proposed execution of this FLIP.

I think we should avoid exposing internal details, which I consider Calcite
to be part of, and rather reuse what we already have to define an AST from the
Table API, which is what I'll refer to in this mail as the Operation tree.

First of all, the reason I think this FLIP is not a good idea is that it
proposes to expose types out of our control, that is, an API we cannot control
and may realistically never be able to stabilize. A Calcite bump in the
table project is already pretty hard today, as shown by tasks like
https://github.com/apache/flink/pull/13577, and this will make them even
harder. Essentially it will couple us to Calcite even more, and create a
different but still big maintenance/complexity burden we would like to get
rid of with this FLIP.

There are also some technical aspects that seem to me a bit overlooked here:

* What about Scala? Is flink-table-planner-spi going to be a Scala module
with the related suffix? I see you want to expose a couple of types which are
implemented in Scala right now, and making this module Scala-dependent makes
it even more complicated to ship both the modules that use it and
flink-table-planner-loader.
* Are you sure exposing the Calcite interfaces is going to be enough? Don't
you also require some instance-specific methods, e.g.
FlinkTypeFactory#toLogicalType? What if at some point you need to expose
something like FlinkTypeFactory? How do you plan to support it and
stabilize it in the long term?

Now let me talk a bit about the Operation tree. For those who don't know what
it is, it's the pure Flink AST for defining DML, used for converting the
Table API DSL to an AST the planner can manipulate. Essentially, it's our
own version of the RelNode/RexNode tree. This Operation tree can already be
used by Hive, without any API changes on the Table side. You just need
a downcast of TableEnvironmentImpl to use getPlanner() and then
Planner#translate, or alternatively you can add getPlanner to
TableEnvironmentInternal directly. From what I've seen about your use case,
and please correct me if I'm wrong, you can implement your SQL -> Operation
tree layer without substantial changes on both sides.
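
To make that suggestion concrete, this is roughly the shape of the code path
described above (a sketch against internal APIs, so exact signatures may differ
between Flink versions; `tEnv` and `modifyOperations` are assumed to be provided
by the dialect front-end):

    // downcast the user-facing TableEnvironment to the internal implementation
    TableEnvironmentImpl tEnvImpl = (TableEnvironmentImpl) tEnv;
    Planner planner = tEnvImpl.getPlanner();
    // the dialect front-end builds the Operation tree (ModifyOperations) itself
    // and hands it to the planner, which translates it into runtime Transformations
    List<Transformation<?>> transformations = planner.translate(modifyOperations);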

The reason why I think this is a better idea, rather than exposing Calcite
and RelNodes directly, is:

* Aforementioned downsides of exposing Calcite
* It doesn't require a new API to get you started with it
* Doesn't add complexity on the planner side, it just removes the existing
coupling with Hive
* Letting another project use the Operation tree will harden it, make it
more stable and eventually lead it to become public

The last point in particular is extremely interesting for the future of the
project, as having a stable public Operation tree will allow people to
implement other relational APIs on top of Flink SQL, or manipulate
the AST to define new semantics, or even crazier things we can't think
of right now, leading to a broader and more diverse ecosystem. Which
is exactly what Hive is doing right now at the end of the day: defining a new
relational API on top of the Flink Table planner functionalities.

On Wed, Mar 30, 2022 at 4:45 PM 罗宇侠(莫辞)
 wrote:

> Hi, I would like to explain a bit more about the current discussion[1] on
> the ways to decouple the Hive connector from the Flink planner.
>
> The background is the support for the Hive dialect: the Hive connector depends
> on the Flink planner because the current implementation generates RelNodes and
> then delivers them to Flink.
> But this also brings much complexity and maintenance burden, so we want to
> decouple the Hive connector from the Flink planner.
>
> There are two ways to do that:
> 1. Make the Hive parser just generate an Operation tree, like the Table API
> currently does.
>
> 2. Introduce a slim module called flink-table-planner-spi which provides the
> Calcite dependency and exposes a limited set of public interfaces. Then, still
> generating Calcite RelNodes, the Hive connector will only require the slim module.
>
> The first way is the ideal one and we should go in that direction. But it
> will take much effort, as it requires rewriting all the code for the Hive
> dialect, and it's hard to do it in one shot.
> And given we want to move out the Hive connector in 1.16, it's more practical
> to decouple first, and then migrate to the Operation tree.
> So, FLIP-216[2] is for the second way. It explains the public
> interfaces to be exposed, all of which have been implemented by
> PlannerContext.
>
> [1]
> https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
>
> Best regards,
> Yuxia--
> From: 罗宇侠(莫辞)
> Date: 2022-03-25 20:41:15
> 

Re: FLIP-216 Decouple Hive connector with Flink planner

2022-03-29 Thread Francesco Guardiani
As there was already a discussion in the doc, I'll just summarize my
opinions here on the proposed execution of this FLIP.

I think we should avoid exposing internal details, which I consider Calcite
to be part of, and rather reuse what we already have to define an AST from the
Table API, which is what I'll refer to in this mail as the Operation tree.

First of all, the reason I think this FLIP is not a good idea is that it
proposes to expose types out of our control, that is, an API we cannot control
and may realistically never be able to stabilize. A Calcite bump in the
table project is already pretty hard today, as shown by tasks like
https://github.com/apache/flink/pull/13577, and this will make them even
harder. Essentially it will couple us to Calcite even more, and create a
different but still big maintenance/complexity burden we would like to get
rid of with this FLIP.

There are also some technical aspects that seem to me a bit overlooked here:

* What about Scala? Is flink-table-planner-spi going to be a Scala module
with the related suffix? I see you want to expose a couple of types which are
implemented in Scala right now, and making this module Scala-dependent makes
it even more complicated to ship both the modules that use it and
flink-table-planner-loader.
* Are you sure exposing the Calcite interfaces is going to be enough? Don't
you also require some instance-specific methods, e.g.
FlinkTypeFactory#toLogicalType? What if at some point you need to expose
something like FlinkTypeFactory? How do you plan to support it and
stabilize it in the long term?

Now let me talk a bit about the Operation tree. For those who don't know what
it is, it's the pure Flink AST for defining DML, used for converting the
Table API DSL to an AST the planner can manipulate. Essentially, it's our
own version of the RelNode/RexNode tree. This Operation tree can already be
used by Hive, without any API changes on the Table side. You just need
a downcast of TableEnvironmentImpl to use getPlanner() and then
Planner#translate, or alternatively you can add getPlanner to
TableEnvironmentInternal directly. From what I've seen about your use case,
and please correct me if I'm wrong, you can implement your SQL -> Operation
tree layer without substantial changes on both sides.

The reason why I think this is a better idea, rather than exposing Calcite
and RelNodes directly, is:

* Aforementioned downsides of exposing Calcite
* It doesn't require a new API to get you started with it
* Doesn't add complexity on the planner side, it just removes the existing
coupling with Hive
* Letting another project use the Operation tree will harden it, make it
more stable and eventually lead it to become public

The last point in particular is extremely interesting for the future of the
project, as having a stable public Operation tree will allow people to
implement other relational APIs on top of Flink SQL, or manipulate
the AST to define new semantics, or even crazier things we can't think
of right now, leading to a broader and more diverse ecosystem. Which
is exactly what Hive is doing right now at the end of the day: defining a new
relational API on top of the Flink Table planner functionalities.


On Mon, Mar 28, 2022 at 10:57 AM 罗宇侠(莫辞)
 wrote:

> Sorry for this email, it seems there's some formatting issue in my email client.
> Just ignore it, as it's a duplicate of [DISCUSS] FLIP-216 Decouple Hive
> connector with Flink planner [1]
>
>
> [1] https://lists.apache.org/thread/6xg33nxrnow5zy7xwqk5nwp00h9gcsbc
>
> Best regards,
> Yuxia--
> From: 罗宇侠
> Date: 2022-03-25 20:19:37
> To: dev
> Subject: FLIP-216 Decouple Hive connector with Flink planner
>
> Hi, everyone
>
> I would like to open a discussion about decoupling the Hive connector with the
> Flink table planner.
> It's a follow-up discussion after the Hive syntax discussion[1], but only focused
> on how to decouple the Hive connector. The origin doc is here[2], from which you
> can see the detailed work and the heated discussion about exposing the Calcite API
> or reusing the Operation tree to decouple.
>
> I have created FLIP-216: Decouple Hive connector with Flink planner[3] for it.
>
> Thanks and looking forward to a lively discussion!
>
> [1] https://lists.apache.org/thread/2w046dwl46tf2wy750gzmt0qrcz17z8t
>
> [2]
> https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
>
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
>
> Best regards,
> Yuxia
>
>
> 
>


[jira] [Created] (FLINK-26782) Remove PlannerExpression and related

2022-03-21 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26782:
---

 Summary: Remove PlannerExpression and related
 Key: FLINK-26782
 URL: https://issues.apache.org/jira/browse/FLINK-26782
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26704) Remove string expression DSL

2022-03-17 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26704:
---

 Summary: Remove string expression DSL
 Key: FLINK-26704
 URL: https://issues.apache.org/jira/browse/FLINK-26704
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Conventions on assertions to use in tests

2022-03-15 Thread Francesco Guardiani
Hi all.

I've worked on a little tool to try to automate a bit the conversion of
assertions to assertj: https://github.com/slinkydeveloper/assertj-migrator.
It's not perfect, but it's an attempt to unify a bit the test codebase.

We applied it to the flink-table modules and it worked like a charm, although
it required a couple more commits to improve the tool output. Here is
the merged PR: https://github.com/apache/flink/pull/19039

I'll try to find some time in the future to do a bulk conversion of other
modules as well, but you can also use it yourself when doing PRs and
updating the tests, so you don't have to manually convert the tests
themselves.
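
To give a feel for the kind of rewrite the tool performs, here is a small,
hypothetical before/after (not taken from the Flink codebase; it only assumes
the standard org.assertj.core.api.Assertions.assertThat static import):

    // before: JUnit-style assertions
    assertEquals(expectedRow, actualRow);
    assertTrue(results.isEmpty());

    // after: assertj assertions
    assertThat(actualRow).isEqualTo(expectedRow);
    assertThat(results).isEmpty();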

I hope it helps,
FG


On Tue, Dec 14, 2021 at 10:00 AM Jing Ge  wrote:

> Hi all,
>
> I took a close look at assertj and found there are two concepts for writing
> tests with two entry points interfaces: WithAssertions for normal style and
> BDDAssertions for BDD style. I would not suggest using them in one project
> simultaneously. Since all related work done previously were using the
> normal style afaik, the normal style seems to be the right one to stick
> with.
>
> WDYT?
>
> Best regards
> Jing
>
> On Fri, Dec 3, 2021 at 12:15 PM Marios Trivyzas  wrote:
>
> > Definitely +1 from me as well. Otherwise backporting tests (accompanying
> > fixes) would consume significant time.
> >
> > On Fri, Dec 3, 2021 at 11:42 AM Till Rohrmann 
> > wrote:
> >
> > > I think this is a very good idea, Matthias. +1 for backporting the
> > jassert
> > > changes to 1.14 and 1.13 if possible.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Dec 3, 2021 at 11:38 AM Matthias Pohl 
> > > wrote:
> > >
> > > > Currently, we only added the jassert to the master branch. I was
> > > wondering
> > > > whether we could backport the corresponding PR [1] to release-1.14
> and
> > > > release-1.13, too. Otherwise, we would have to implement tests twice
> > when
> > > > providing PRs with new tests that need to be backported: The jassert
> > > > version for master and a hamcrest (or any other available library)
> for
> > > the
> > > > backports.
> > > >
> > > > It's not really a bugfix. But it might help developers with their
> > > > backports.
> > > >
> > > > Matthias
> > > >
> > > > [1] https://github.com/apache/flink/pull/17871
> > > >
> > > > On Thu, Nov 25, 2021 at 12:54 PM Marios Trivyzas 
> > > wrote:
> > > >
> > > > > As @Matthias Pohl  mentioned, I agree that
> > no1
> > > > is
> > > > > to end up with consistency
> > > > > regarding the assertions in our tests, but I also like how those
> > > > assertions
> > > > > shape up with the AssertJ approach.
> > > > >
> > > > > On Thu, Nov 25, 2021 at 9:38 AM Francesco Guardiani <
> > > > > france...@ververica.com>
> > > > > wrote:
> > > > >
> > > > > > This is the result of experimenting around creating custom
> > assertions
> > > > for
> > > > > > Table API types
> > > > > > https://github.com/slinkydeveloper/flink/commit/
> > > > > > d1ce37a62c2200b2c3008a9cc2cac91234222fd5[1]. I will PR it once
> the
> > > two
> > > > > PRs
> > > > > > in the
> > > > > > previous mail get merged
> > > > > >
> > > > > > On Monday, 22 November 2021 17:59:29 CET Francesco Guardiani
> wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Given I see generally consensus around having a convention and
> > > using
> > > > > > > assertj, I propose to merge these 2 PRs:
> > > > > > >
> > > > > > > * Add the explanation of this convention in our code quality
> > guide:
> > > > > > > https://github.com/apache/flink-web/pull/482
> > > > > > > * Add assertj to dependency management in the parent pom and
> link
> > > in
> > > > > the
> > > > > > PR
> > > > > > > template the code quality guide:
> > > > > > https://github.com/apache/flink/pull/17871
> > > > > > >
> > > > > > > WDYT?
> > > > > > >
> > > > > > > Once we merge those, I'll work in the next days to add som

Re: [ANNOUNCE] New PMC member: Yuan Mei

2022-03-14 Thread Francesco Guardiani
Congratulations, Yuan!

On Mon, Mar 14, 2022 at 3:51 PM yanfei lei  wrote:

> Congratulations, Yuan!
>
>
>
> Zhilong Hong  于2022年3月14日周一 19:31写道:
>
> > Congratulations, Yuan!
> >
> > Best,
> > Zhilong
> >
> > On Mon, Mar 14, 2022 at 7:22 PM Konstantin Knauf 
> > wrote:
> >
> > > Congratulations, Yuan!
> > >
> > > On Mon, Mar 14, 2022 at 11:29 AM Jing Zhang 
> > wrote:
> > >
> > > > Congratulations, Yuan!
> > > >
> > > > Best,
> > > > Jing Zhang
> > > >
> > > > Jing Ge  于2022年3月14日周一 18:15写道:
> > > >
> > > > > Congrats! Very well deserved!
> > > > >
> > > > > Best,
> > > > > Jing
> > > > >
> > > > > On Mon, Mar 14, 2022 at 10:34 AM Piotr Nowojski <
> > pnowoj...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Congratulations :)
> > > > > >
> > > > > > pon., 14 mar 2022 o 09:59 Yun Tang 
> napisał(a):
> > > > > >
> > > > > > > Congratulations, Yuan!
> > > > > > >
> > > > > > > Best,
> > > > > > > Yun Tang
> > > > > > > 
> > > > > > > From: Zakelly Lan 
> > > > > > > Sent: Monday, March 14, 2022 16:55
> > > > > > > To: dev@flink.apache.org 
> > > > > > > Subject: Re: [ANNOUNCE] New PMC member: Yuan Mei
> > > > > > >
> > > > > > > Congratulations, Yuan!
> > > > > > >
> > > > > > > Best,
> > > > > > > Zakelly
> > > > > > >
> > > > > > > On Mon, Mar 14, 2022 at 4:49 PM Johannes Moser <
> > j...@ververica.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Congrats Yuan.
> > > > > > > >
> > > > > > > > > On 14.03.2022, at 09:45, Arvid Heise 
> > wrote:
> > > > > > > > >
> > > > > > > > > Congratulations and well deserved!
> > > > > > > > >
> > > > > > > > > On Mon, Mar 14, 2022 at 9:30 AM Matthias Pohl <
> > > map...@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Congratulations, Yuan.
> > > > > > > > >>
> > > > > > > > >> On Mon, Mar 14, 2022 at 9:25 AM Shuo Cheng <
> > > njucs...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> Congratulations, Yuan!
> > > > > > > > >>>
> > > > > > > > >>> On Mon, Mar 14, 2022 at 4:22 PM Anton Kalashnikov <
> > > > > > > kaa@yandex.com>
> > > > > > > > >>> wrote:
> > > > > > > > >>>
> > > > > > > >  Congratulations, Yuan!
> > > > > > > > 
> > > > > > > >  --
> > > > > > > > 
> > > > > > > >  Best regards,
> > > > > > > >  Anton Kalashnikov
> > > > > > > > 
> > > > > > > >  14.03.2022 09:13, Leonard Xu пишет:
> > > > > > > > > Congratulations Yuan!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Leonard
> > > > > > > > >
> > > > > > > > >> 2022年3月14日 下午4:09,Yangze Guo  写道:
> > > > > > > > >>
> > > > > > > > >> Congratulations!
> > > > > > > > >>
> > > > > > > > >> Best,
> > > > > > > > >> Yangze Guo
> > > > > > > > >>
> > > > > > > > >> On Mon, Mar 14, 2022 at 4:08 PM Martijn Visser <
> > > > > > > >  martijnvis...@apache.org> wrote:
> > > > > > > > >>> Congratulations Yuan!
> > > > > > > > >>>
> > > > > > > > >>> On Mon, 14 Mar 2022 at 09:02, Yu Li <
> car...@gmail.com>
> > > > > wrote:
> > > > > > > > >>>
> > > > > > > >  Hi all!
> > > > > > > > 
> > > > > > > >  I'm very happy to announce that Yuan Mei has joined
> > the
> > > > > Flink
> > > > > > > PMC!
> > > > > > > > 
> > > > > > > >  Yuan is helping the community a lot with creating
> and
> > > > > > validating
> > > > > > > >  releases,
> > > > > > > >  contributing to FLIP discussions and good code
> > > > contributions
> > > > > > to
> > > > > > > > >> the
> > > > > > > >  state backend and related components.
> > > > > > > > 
> > > > > > > >  Congratulations and welcome, Yuan!
> > > > > > > > 
> > > > > > > >  Best Regards,
> > > > > > > >  Yu (On behalf of the Apache Flink PMC)
> > > > > > > > 
> > > > > > > >  --
> > > > > > > > 
> > > > > > > >  Best regards,
> > > > > > > >  Anton Kalashnikov
> > > > > > > > 
> > > > > > > > 
> > > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> > >
> >
>


[jira] [Created] (FLINK-26582) Run the assertj conversion script to convert assertions in flink-table

2022-03-10 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26582:
---

 Summary: Run the assertj conversion script to convert assertions 
in flink-table
 Key: FLINK-26582
 URL: https://issues.apache.org/jira/browse/FLINK-26582
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Enable scala formatting check

2022-03-09 Thread Francesco Guardiani
It would be nice to merge it before the release branch cut, but I'm not
sure we're on time for that...

On Wed, Mar 9, 2022 at 4:58 PM Martijn Visser 
wrote:

> I think it would actually be better to merge it before the release branch
> is cut to avoid potential issues when needing to backport bugfixes?
>
> Thanks, Martijn
>
> On Wed, 9 Mar 2022 at 16:55, Seth Wiesman  wrote:
>
> > Happy to help get this merged.
> >
> > Do we want to wait until the 1.15 branch is cut? The change is mostly
> > trivial (reformatting) but does make changes to the build system.
> >
> > Seth
> >
> > On Wed, Mar 9, 2022 at 9:45 AM Francesco Guardiani <
> > france...@ververica.com>
> > wrote:
> >
> > > Hi all,
> > > I've been spending some time prototyping a scalafmt conf, which doesn't
> > > look too different from our java style and tries to keep the same
> > > properties from our scalastyle conf. Here is the PR:
> > > https://github.com/apache/flink/pull/19025
> > >
> > > In particular, this is the scalafmt config commit:
> > >
> > >
> >
> https://github.com/apache/flink/pull/19025/commits/cb32893df4b554e4526324c43c86681cc9fe8169
> > > And this is the commit removing scalastyle:
> > >
> > >
> >
> https://github.com/apache/flink/pull/19025/commits/9ffe7d52e3368c5c40f15e3dc48f6d81691a8dd0
> > >
> > > I need some committer to pair with to merge the big PR, any volunteers?
> > :)
> > >
> > > After we merge it I will also update the contributor guide doc to
> remove
> > > scalastyle.
> > >
> > > FG
> > >
> > > On Tue, Mar 8, 2022 at 10:07 AM David Anderson 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > For flink-training we initially tried cloning the scalastyle setup
> from
> > > > flink, but we decided to use spotless + scalafmt instead.
> > > >
> > > > David
> > > >
> > > > On Mon, Mar 7, 2022 at 1:12 PM Timo Walther 
> > wrote:
> > > >
> > > > > Big +1
> > > > >
> > > > > This will improve the contribution experience. Even though we
> stopped
> > > > > adding more Scala code, it is still necessary from time to time.
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > > Am 02.03.22 um 09:29 schrieb 刘首维:
> > > > > > +1
> > > > > >
> > > > > >
> > > > > I still remember my first PR. Lacking experience, I had to pay attention
> > > > > to Scala code format and correct the format manually, which made me a
> > > > > little embarrassed (though I'm a big fan of Scala). I think this
> > > > > proposal will lighten the burden of writing Scala code.
> > > > > >
> > > > > >
> > > > > > Shouwei Liu
> > > > > >
> > > > > >
> > > > > > --原始邮件--
> > > > > > 发件人:
> > > > > "dev"
> > > > >   <
> > > > > kna...@apache.org;
> > > > > > 发送时间:2022年3月2日(星期三) 下午3:01
> > > > > > 收件人:"dev" > > > > >
> > > > > > 主题:Re: [DISCUSS] Enable scala formatting check
> > > > > >
> > > > > >
> > > > > >
> > > > > > +1 I've never written any Scala in Flink, but this makes a lot of
> > > sense
> > > > > to
> > > > > > me. Converging on a smaller set of tools and simplifying the
> build
> > is
> > > > > > always a good idea and the Community already concluded before
> that
> > > > > spotless
> > > > > > is generally a good approach.
> > > > > >
> > > > > > On Tue, Mar 1, 2022 at 5:52 PM Francesco Guardiani <
> > > > > france...@ververica.com
> > > > > > wrote:
> > > > > >
> > > > > >  Hi all,
> > > > > > 
> > > > > >  I want to propose to enable the spotless scalafmt
> integration
> > > and
> > > > > remove
> > > > > >  the scalastyle plugin.

Re: [DISCUSS] Enable scala formatting check

2022-03-09 Thread Francesco Guardiani
Hi all,
I've been spending some time prototyping a scalafmt conf, which doesn't
look too different from our java style and tries to keep the same
properties from our scalastyle conf. Here is the PR:
https://github.com/apache/flink/pull/19025

In particular, this is the scalafmt config commit:
https://github.com/apache/flink/pull/19025/commits/cb32893df4b554e4526324c43c86681cc9fe8169
And this is the commit removing scalastyle:
https://github.com/apache/flink/pull/19025/commits/9ffe7d52e3368c5c40f15e3dc48f6d81691a8dd0

I need some committer to pair with to merge the big PR, any volunteers? :)

After we merge it I will also update the contributor guide doc to remove
scalastyle.

FG

On Tue, Mar 8, 2022 at 10:07 AM David Anderson  wrote:

> +1
>
> For flink-training we initially tried cloning the scalastyle setup from
> flink, but we decided to use spotless + scalafmt instead.
>
> David
>
> On Mon, Mar 7, 2022 at 1:12 PM Timo Walther  wrote:
>
> > Big +1
> >
> > This will improve the contribution experience. Even though we stopped
> > adding more Scala code, it is still necessary from time to time.
> >
> > Regards,
> > Timo
> >
> > Am 02.03.22 um 09:29 schrieb 刘首维:
> > > +1
> > >
> > >
> > > I still remember my first PR. Lacking experience, I had to pay attention
> > > to Scala code format and correct the format manually, which made me a
> > > little embarrassed (though I'm a big fan of Scala). I think this
> > > proposal will lighten the burden of writing Scala code.
> > >
> > >
> > > Shouwei Liu
> > >
> > >
> > > --原始邮件--
> > > 发件人:
> > "dev"
> >   <
> > kna...@apache.org;
> > > 发送时间:2022年3月2日(星期三) 下午3:01
> > > 收件人:"dev" > >
> > > 主题:Re: [DISCUSS] Enable scala formatting check
> > >
> > >
> > >
> > > +1 I've never written any Scala in Flink, but this makes a lot of sense
> > to
> > > me. Converging on a smaller set of tools and simplifying the build is
> > > always a good idea and the Community already concluded before that
> > spotless
> > > is generally a good approach.
> > >
> > > On Tue, Mar 1, 2022 at 5:52 PM Francesco Guardiani <
> > france...@ververica.com
> > > wrote:
> > >
> > >  Hi all,
> > > 
> > >  I want to propose to enable the spotless scalafmt integration and
> > remove
> > >  the scalastyle plugin.
> > > 
> > >  From an initial analysis, scalafmt can do everything scalastyle
> can
> > do, and
> > >  the integration with spotless looks easy to enable:
> > >  https://github.com/diffplug/spotless/tree/main/plugin-maven#scala
> .
> > The
> > >  scalafmt conf file gets picked up automatically from every IDE,
> and
> > it can
> > >  be heavily tuned.
> > > 
> > >  This way we can unify the formatting and integrate with our CI
> > without any
> > >  additional configurations. And we won't need scalastyle anymore,
> as
> > >  scalafmt will take care of the checks:
> > > 
> > >  * mvn spotless:check will check both java and scala
> > >  * mvn spotless:apply will format both java and scala
> > > 
> > >  WDYT?
> > > 
> > >  FG
> > > 
> > > 
> > > 
> > >  --
> > > 
> > >  Francesco Guardiani | Software Engineer
> > > 
> > >  france...@ververica.com
> > > 
> > > 
> > >  <https://www.ververica.com/;
> > > 
> > >  Follow us @VervericaData
> > > 
> > >  --
> > > 
> > >  Join Flink Forward <https://flink-forward.org/; - The Apache
> > Flink
> > >  Conference
> > > 
> > >  Stream Processing | Event Driven | Real Time
> > > 
> > >  --
> > > 
> > >  Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > > 
> > >  --
> > > 
> > >  Ververica GmbH
> > > 
> > >  Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > 
> > >  Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung
> > Jason,
> > >  Jinwei (Kevin) Zhang
> > > 
> > >
> > >
> >
> >
>


[jira] [Created] (FLINK-26553) Enable scalafmt for scala codebase

2022-03-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26553:
---

 Summary: Enable scalafmt for scala codebase
 Key: FLINK-26553
 URL: https://issues.apache.org/jira/browse/FLINK-26553
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


As discussed in 
https://lists.apache.org/thread/97398pc9cb8y922xlb6mzlsbjtjf5jnv, we should 
enable scalafmt in our codebase



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] CAST legacy behaviour

2022-03-09 Thread Francesco Guardiani
Hi all,
As I see this thread has consensus, here is the issue and PR to disable the
legacy behavior by default:
https://issues.apache.org/jira/browse/FLINK-26551
https://github.com/apache/flink/pull/19020

We target to merge it before the release of 1.15, unless there are any
objections.

FG

On Tue, Mar 1, 2022 at 2:18 PM Marios Trivyzas  wrote:

> Indeed, if we manage to use the configuration from *flink-conf.yaml* down
> the stack,
> it would be easy for everyone to configure a "system-wide" legacy cast
> behaviour.
>
> Best regards,
> Marios
>
> On Tue, Mar 1, 2022 at 2:52 PM Timo Walther  wrote:
>
> > +1
> >
> > Thanks for bringing up this discussion one more time Marios.
> >
> > I strongly support enabling the new behavior in 1.15. It definitely has
> > implications on existing users, but as Seth said, thinking about the
> > upcoming upgrade story we need to make sure that at least the core/basic
> > operations are correct. Otherwise we will have to maintain multiple
> > versions of functions with broken semantics.
> >
> > I since we also try to fix various issues around configuration, maybe it
> > might still be possible to configure the legacy cast behavior globally
> > via flink-conf.yaml. This should make the transitioning period easier in
> > production.
> >
> > Regards,
> > Timo
> >
> > Am 28.02.22 um 19:04 schrieb Seth Wiesman:
> > > +1
> > >
> > > Especially as SQL upgrades are right around the corner, it makes sense
> to
> > > get our defaults right.
> > >
> > > Seth
> > >
> > > On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser 
> > > wrote:
> > >
> > >> +1 for setting this option to disabled by default. I believe failures
> > >> should be brought forward as soon as possible, so they can be fixed as
> > fast
> > >> as possible. It will also be less confusing for new users. Last but
> not
> > >> least, I believe the impact on existing users will be minimal (since
> it
> > can
> > >> be changed by changing one flag).
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas 
> wrote:
> > >>
> > >>> Thanks Francesco,
> > >>>
> > >>> The two arguments you posted, further strengthen the need to make it
> > >>> DISABLED by default.
> > >>>
> > >>> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
> > >>> france...@ververica.com> wrote:
> > >>>
> > >>>> Hi all,
> >>>> I'm +1 with everything you said, Marios.
> > >>>>
> > >>>> I'm gonna add another argument on top of that: the
> > >> "legacy-cast-behavior"
> > >>>> has also a broken type inference, leading to incorrect results or
> > >> further
> > >>>> errors down in the pipeline[1]. For example, take this:
> > >>>>
> > >>>> SELECT COALESCE(CAST('a' AS INT), 0) ...
> > >>>>
> > >>>> With the legacy cast behavior ENABLED, this is going to lead to the
> > >> wrong
> > >>>> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return
> value
> > >> is
> > >>>> inferred as INT NOT NULL, so the planner will drop COALESCE, and
> will
> > >>> never
> > >>>> return 0. Essentially, CAST will infer the wrong nullability leading
> > to
> > >>>> assigning its result to a NOT NULL type, when its value can
> > effectively
> > >>> be
> > >>>> NULL.
> > >>>>
> > >>>>> You introduce a deprecated flag to help users
> > >>>> using old versions of the software to smoothly transition to the new
> > >>>> version, while the new users experience the new features/behavior,
> > >>>> without the need to set a flag.
> > >>>>
> > >>>> This is IMO the major point on why we should disable the legacy cast
> > >>>> behavior by default. This is even more relevant with 1.15 and the
> > >> upgrade
> > >>>> story, as per the problem described above, because users will now
> end
> > >> up
> > >>>> generating plans with wrong type inference, which will be hard to
> > >> migrate
> > >>>> i

[jira] [Created] (FLINK-26551) Make the legacy behavior disabled by default

2022-03-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26551:
---

 Summary: Make the legacy behavior disabled by default
 Key: FLINK-26551
 URL: https://issues.apache.org/jira/browse/FLINK-26551
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani


Followup of https://issues.apache.org/jira/browse/FLINK-25111

For the discussion, see 
https://lists.apache.org/thread/r13y3plwwyg3sngh8cz47flogq621txv



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26549) INSERT INTO with VALUES leads to wrong type inference with nested types

2022-03-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26549:
---

 Summary: INSERT INTO with VALUES leads to wrong type inference 
with nested types
 Key: FLINK-26549
 URL: https://issues.apache.org/jira/browse/FLINK-26549
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Francesco Guardiani


While working on casting, I've found out we have an interesting bug in the 
insert values type inference. This comes from the 
{{KafkaTableITCase#testKafkaSourceSinkWithMetadata}} (look at this version in 
particular 
https://github.com/apache/flink/blob/567440115bcacb5aceaf3304e486281c7da8c14f/flink-connectors/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/table/KafkaTableITCase.java).

The test scenario is an INSERT INTO VALUES query which also pushes some 
metadata to a Kafka table, in particular it writes the headers metadata.

The table is declared like that:

{code:sql}
 CREATE TABLE kafka (
  `physical_1` STRING,
  `physical_2` INT,
  `timestamp-type` STRING METADATA VIRTUAL,
  `timestamp` TIMESTAMP(3) METADATA,
  `leader-epoch` INT METADATA VIRTUAL,
  `headers` MAP METADATA,
  `partition` INT METADATA VIRTUAL,
  `topic` STRING METADATA VIRTUAL,
  `physical_3` BOOLEAN
) WITH (
   'connector' = 'kafka',
   [...]
)
{code}

The insert into query looks like:

{code:sql}
INSERT INTO kafka VALUES
('data 1', 1, TIMESTAMP '2020-03-08 13:12:11.123', MAP['k1', x'C0FFEE', 'k2', 
x'BABE'], TRUE),
('data 2', 2, TIMESTAMP '2020-03-09 13:12:11.123', CAST(NULL AS MAP), FALSE),
('data 3', 3, TIMESTAMP '2020-03-10 13:12:11.123', MAP['k1', X'10', 'k2', 
X'20'], TRUE)
{code}

Note that in the first row, the byte literal is of length 3, while in the last 
row the byte literal is of length 1.

The generated plan of this INSERT INTO is:

{code}
== Abstract Syntax Tree ==
LogicalSink(table=[default_catalog.default_database.kafka], fields=[physical_1, 
physical_2, physical_3, headers, timestamp])
+- LogicalProject(physical_1=[$0], physical_2=[$1], physical_3=[$4], 
headers=[CAST($3):(VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
VARBINARY(2147483647)) MAP], 
timestamp=[CAST($2):TIMESTAMP_WITH_LOCAL_TIME_ZONE(3)])
   +- LogicalUnion(all=[true])
  :- LogicalProject(EXPR$0=[_UTF-16LE'data 1'], EXPR$1=[1], 
EXPR$2=[2020-03-08 13:12:11.123:TIMESTAMP(3)], EXPR$3=[MAP(_UTF-16LE'k1', 
X'c0ffee':VARBINARY(3), _UTF-16LE'k2', X'babe':VARBINARY(3))], EXPR$4=[true])
  :  +- LogicalValues(tuples=[[{ 0 }]])
  :- LogicalProject(EXPR$0=[_UTF-16LE'data 2'], EXPR$1=[2], 
EXPR$2=[2020-03-09 13:12:11.123:TIMESTAMP(3)], 
EXPR$3=[null:(VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
VARBINARY(2147483647)) MAP], EXPR$4=[false])
  :  +- LogicalValues(tuples=[[{ 0 }]])
  +- LogicalProject(EXPR$0=[_UTF-16LE'data 3'], EXPR$1=[3], 
EXPR$2=[2020-03-10 13:12:11.123:TIMESTAMP(3)], EXPR$3=[MAP(_UTF-16LE'k1', 
X'10':BINARY(1), _UTF-16LE'k2', X'20':BINARY(1))], EXPR$4=[true])
 +- LogicalValues(tuples=[[{ 0 }]])

== Optimized Physical Plan ==
Sink(table=[default_catalog.default_database.kafka], fields=[physical_1, 
physical_2, physical_3, headers, timestamp])
+- Union(all=[true], union=[physical_1, physical_2, physical_3, headers, 
timestamp])
   :- Calc(select=[_UTF-16LE'data 1' AS physical_1, 1 AS physical_2, true AS 
physical_3, CAST(CAST(MAP(_UTF-16LE'k1', X'c0ffee':VARBINARY(3), _UTF-16LE'k2', 
X'babe':VARBINARY(3)) AS (CHAR(2) CHARACTER SET "UTF-16LE" NOT NULL, BINARY(1) 
NOT NULL) MAP) AS (VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
VARBINARY(2147483647)) MAP) AS headers, CAST(2020-03-08 
12:12:11.123:TIMESTAMP_WITH_LOCAL_TIME_ZONE(3) AS 
TIMESTAMP_WITH_LOCAL_TIME_ZONE(3)) AS timestamp])
   :  +- Values(type=[RecordType(INTEGER ZERO)], tuples=[[{ 0 }]])
   :- Calc(select=[_UTF-16LE'data 2' AS physical_1, 2 AS physical_2, false AS 
physical_3, null:(VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
VARBINARY(2147483647)) MAP AS headers, CAST(2020-03-09 
12:12:11.123:TIMESTAMP_WITH_LOCAL_TIME_ZONE(3) AS 
TIMESTAMP_WITH_LOCAL_TIME_ZONE(3)) AS timestamp])
   :  +- Values(type=[RecordType(INTEGER ZERO)], tuples=[[{ 0 }]])
   +- Calc(select=[_UTF-16LE'data 3' AS physical_1, 3 AS physical_2, true AS 
physical_3, CAST(MAP(_UTF-16LE'k1', X'10':BINARY(1), _UTF-16LE'k2', 
X'20':BINARY(1)) AS (VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 
VARBINARY(2147483647)) MAP) AS headers, CAST(2020-03-10 
12:12:11.123:TIMESTAMP_WITH_LOCAL_TIME_ZONE(3) AS 
TIMESTAMP_WITH_LOCAL_TIME_ZONE(3)) AS timestamp])
  +- Values(type=[RecordType(INTEGER ZERO)], tuples=[[{ 0 }]])

== Optimized Execution Plan ==
Sink(table=[default_catalog.default_database.kafka], fields=[physical_1, 
physical_2, physical_3, headers, timestamp])
+- Union(all=[true], union=[physical_1, physical_2, physical_3, headers, 
timestamp])
   :- Calc(select=['data 

[jira] [Created] (FLINK-26520) Implement SEARCH operator

2022-03-07 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26520:
---

 Summary: Implement SEARCH operator
 Key: FLINK-26520
 URL: https://issues.apache.org/jira/browse/FLINK-26520
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


The codegen right now does not implement the SEARCH operator, but uses the rex 
builder to circumvent it. We should implement the SEARCH operator directly, to 
remove the usage of the Flink type factory.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26519) Remove FlinkTypeFactory.INSTANCE singleton

2022-03-07 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26519:
---

 Summary: Remove FlinkTypeFactory.INSTANCE singleton
 Key: FLINK-26519
 URL: https://issues.apache.org/jira/browse/FLINK-26519
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


This singleton is not correct, as the user classloader needs to be passed for 
structured type creation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [ANNOUNCE] New Apache Flink Committer - Martijn Visser

2022-03-04 Thread Francesco Guardiani
Congratulations Martjin!

On Fri, Mar 4, 2022 at 8:38 AM Ingo Bürk  wrote:

> Congrats, Martijn!
>
> On 03.03.22 16:49, Robert Metzger wrote:
> > Hi everyone,
> >
> > On behalf of the PMC, I'm very happy to announce Martijn Visser as a new
> > Flink committer.
> >
> > Martijn is a very active Flink community member, driving a lot of efforts
> > on the dev@flink mailing list. He also pushes projects such as replacing
> > Google Analytics with Matomo, so that we can generate our web analytics
> > within the Apache Software Foundation.
> >
> > Please join me in congratulating Martijn for becoming a Flink committer!
> >
> > Cheers,
> > Robert
> >
>


[jira] [Created] (FLINK-26463) TableEnvironmentITCase should use MiniCluster

2022-03-03 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26463:
---

 Summary: TableEnvironmentITCase should use MiniCluster
 Key: FLINK-26463
 URL: https://issues.apache.org/jira/browse/FLINK-26463
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[DISCUSS] Enable scala formatting check

2022-03-01 Thread Francesco Guardiani
Hi all,

I want to propose to enable the spotless scalafmt integration and remove
the scalastyle plugin.

From an initial analysis, scalafmt can do everything scalastyle can do, and
the integration with spotless looks easy to enable:
https://github.com/diffplug/spotless/tree/main/plugin-maven#scala. The
scalafmt conf file gets picked up automatically from every IDE, and it can
be heavily tuned.

This way we can unify the formatting and integrate with our CI without any
additional configurations. And we won't need scalastyle anymore, as
scalafmt will take care of the checks:

* mvn spotless:check will check both java and scala
* mvn spotless:apply will format both java and scala

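For reference, the spotless/scalafmt integration boils down to a configuration
block along these lines in the parent pom (a sketch only; the scalafmt version
is just an example and the final setup may differ):

    <plugin>
      <groupId>com.diffplug.spotless</groupId>
      <artifactId>spotless-maven-plugin</artifactId>
      <configuration>
        <scala>
          <scalafmt>
            <version>3.4.3</version>
            <file>${project.basedir}/.scalafmt.conf</file>
          </scalafmt>
        </scala>
      </configuration>
    </plugin>
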
WDYT?

FG



-- 

Francesco Guardiani | Software Engineer

france...@ververica.com


<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH

Registered at Amtsgericht Charlottenburg: HRB 158244 B

Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason,
Jinwei (Kevin) Zhang


[jira] [Created] (FLINK-26422) Update Chinese documentation with the new TablePipeline docs

2022-03-01 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26422:
---

 Summary: Update Chinese documentation with the new TablePipeline 
docs
 Key: FLINK-26422
 URL: https://issues.apache.org/jira/browse/FLINK-26422
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Francesco Guardiani


Chinese docs need to be updated with the content of this commit: 
https://github.com/apache/flink/commit/4f65c7950f2c3ef849f2094deab0e199ffedf57b



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] CAST legacy behaviour

2022-02-22 Thread Francesco Guardiani
Hi all,
I'm +1 with everything you said, Marios.

I'm gonna add another argument on top of that: the "legacy-cast-behavior"
also has broken type inference, leading to incorrect results or further
errors down the pipeline [1]. For example, take this:

SELECT COALESCE(CAST('a' AS INT), 0) ...

With the legacy cast behavior ENABLED, this is going to lead to the wrong
result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value is
inferred as INT NOT NULL, so the planner will drop COALESCE, and will never
return 0. Essentially, CAST will infer the wrong nullability leading to
assigning its result to a NOT NULL type, when its value can effectively be
NULL.
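
For comparison, with the new behavior the same fallback can be expressed with
TRY_CAST, whose return type is correctly nullable, so COALESCE is kept and 0 is
returned when the cast fails (just a sketch):

SELECT COALESCE(TRY_CAST('a' AS INT), 0) ...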

> You introduce a deprecated flag to help users
using old versions of the software to smoothly transition to the new
version, while the new users experience the new features/behavior,
without the need to set a flag.

This is IMO the major point on why we should disable the legacy cast
behavior by default. This is even more relevant with 1.15 and the upgrade
story, as per the problem described above, because users will now end up
generating plans with wrong type inference, which will be hard to migrate
in the next flink versions.

FG

[1] In case you're wondering why it wasn't fixed, the reason is that fixing
it means introducing a breaking change; for details see
https://github.com/apache/flink/pull/18611#issuecomment-1028174877


On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas  wrote:

> Hello all!
>
> I would like to bring back the discussion regarding the
> *table.exec.legacy-cast-behaviour*
> configuration option which we are introducing with Flink *1.15*. This
> option provides the users
> with the flexibility to continue using the old (incorrect, according to SQL
> standards) behaviour
> of *CAST.*
>
> With Flink *1.15* we have introduced a bunch of fixes, improvements and new
> casting functionality
> between types, see https://issues.apache.org/jira/browse/FLINK-24403, and
> some of them are
> guarded behind the legacy behaviour option:
>
>- Trimming and padding when casting to CHAR/VARCHAR types to respect the
>specified length
>- Changes for casting various types to CHAR/VARCHAR/STRING
>- Runtime errors for CAST no longer emit *null *as result but exceptions
>are thrown with a meaningful message for the cause, that fail the
> pipeline. *TRY_CAST
>*is introduced instead, which emits *null* results instead of throwing
>exceptions.
>
> Those changes become active if users set the
> *table.exec.legacy-cast-behaviour
> *option to *DISABLED*, otherwise
> they will continue to experience the old, *erroneous*, behaviour of *CAST*.
>
> Currently, we have set the *table.exec.legacy-cast-behaviour *option
> to be *ENABLED
> *by default, so if users want
> to get the new correct behaviour, they are required to set explicitly the
> option to *DISABLED*. Moreover, the option
> itself is marked as deprecated, since the plan is to be removed in the
> future, so that the old, erroneous behaviour
> won't be an option, and the *TRY_CAST* would be the way to go if users
> don't want to have errors and failed pipelines,
> but have *null*s emitted in case of runtime errors when casting.
>
> I would like to start a discussion and maybe ask for voting, so that we set
> the *table.exec.legacy-cast-behaviour* option
> to *DISABLED *by default, so that users that want to keep their old
> pipelines working the same way, without changing their
> SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
>
> My main argument for changing the default value for the option, is that the
> *DISABLED* value is the one that enables
> the *correct* behaviour for CAST which should be the default for all new
> users. This way, new FLINK users, or users which
> build new pipelines, from now on would get the correct behaviour by default
> without the need of changing some flag in their
> configuration. It feels weird to me, especially for people very familiar
> with standard SQL, to be obliged to set some config
> flag, to be able to get the correct behaviour for CAST. On top, users that
> won't read about this option in our docs, will,
> "blindly", experience the old incorrect behaviour for their new pipelines,
> and issues that could cause the CAST to fail
> will remain hidden from them, since *nulls* would be emitted. IMHO, this
> last part is also very important during the development
> stages of an application/pipeline. Normally, developers would want to see
> all possible errors/scenarios during development
> stages , in order to build a robust production system. If errors, are
> hidden then, they can easily end up with those errors
> in the production system which would be even harder to discover and debug,
> since no exception will ever be thrown.
> Imagine that there is a CAST which generates nulls because of runtime
> errors, and its result is used in an aggregate function:
>
> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY 

[jira] [Created] (FLINK-26280) Add a flag to disable uid generation

2022-02-21 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26280:
---

 Summary: Add a flag to disable uid generation
 Key: FLINK-26280
 URL: https://issues.apache.org/jira/browse/FLINK-26280
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Francesco Guardiani


We should add a flag to disable uid generation for back-compat. See the 
discussion here: https://issues.apache.org/jira/browse/FLINK-25932



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26252) Refactor MiniClusterExtension

2022-02-18 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26252:
---

 Summary: Refactor MiniClusterExtension
 Key: FLINK-26252
 URL: https://issues.apache.org/jira/browse/FLINK-26252
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26249) Run BuiltInFunctionITCase tests in parallel

2022-02-18 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26249:
---

 Summary: Run BuiltInFunctionITCase tests in parallel
 Key: FLINK-26249
 URL: https://issues.apache.org/jira/browse/FLINK-26249
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner, Test Infrastructure
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS]Support the merge statement in FlinkSQL

2022-02-16 Thread Francesco Guardiani
> In the theory aspect, incremental data should be carefully considered for
streaming data. In this situation,  the data flow from target_table to
target_table
will be a loop, and the incremental data with one key will keep going
through
the loop. It looks very strange.

This is the same concern I have here, I don't see how MERGE can work in a
streaming scenario without modifying its preliminary assumptions and
semantics.

Even assuming we put some hard constraint on the state size, for example
requiring to specify a window definition (like in interval joins), I still
think that the fundamental assumption of MERGE here is a problem: the
target table is both a sink and a source. And I think this is a big issue,
as we cannot reasonably assume that sink and sources are available for the
same table definition or that they behave similarly.

Also, talking about the batch implementation, I don't understand how you
would implement this: from what I see in the "*validator*" paragraph of
your document, you convert the merge statement into a bunch of other SQL
statements, but you omit the initial join, which is fundamental for the
semantics of MERGE. Could you perhaps provide more details about it?
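
For reference, the semantics I'm referring to is roughly the standard batch
MERGE below (table and column names purely illustrative), where the target is
first joined with the source and every when_clause acts on the join result:

MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET total = s.total
WHEN NOT MATCHED THEN INSERT (id, total) VALUES (s.id, s.total)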

On another note, I think we can take inspiration from MERGE and its "event
driven" semantics, in order to have something that works both for batch and
streaming, say a "Flink-ified" version of MERGE.

For example, something that I can think of could be:

PUSH TO target_table
FROM source_table
ON [window TVF]
[when_clause [...]]

Where when_clause looks like the ones from MERGE (looking at the pgsql).
This has the window TVF constraint, so the state doesn't grow indefinitely,
and the source_table is effectively any select you can think of, removing
the assumption that the target is both a sink and a source. This statement
at the end produces a changelog stream, pushed to the output table. A
statement like this could then allow you to have something similar to the
MERGE, just by replacing source_table with a select performing the join. Of
course this is an example, and might not make much sense, but I hope it
gives you the idea.

FG


On Mon, Feb 14, 2022 at 4:28 AM OpenInx  wrote:

> I'm currently maintaining the iceberg flink modules from apache iceberg
> community.
>
> Currently, the spark has a great integration experience with iceberg format
> v2 in batch mode.  In this document [1],
> The merge into syntax from spark sql extensions does really help a lot when
> people want to change row-level data.
>
> We flink currently has a good integration with iceberg format v2 in
> streaming mode, I mean people can export their
> change log data into an iceberg table directly by writing a few sql.
> This[2] is a good material to read if anybody want to
> create a simple demo.
>
> But I'd say in the batch scenarios,  we flink sql currently lack few
> critical SQL syntax (for integrating iceberg format v2 in batch mode
> better):
> 1.  ALTER TABLE to change columns.
> 2.  UPDATE/DELETE sql to change the unexpected rows in a given table.
> 3.  MERGE INTO to merge a batch changing row set  (mixed with
> insert/delete/update) into the given table.
>
> In short, if we want to provide better integration and user experience with
> iceberg v2 in batch, then I think the support of the above syntax
> is very important (from iceberg perspective).
>
> > I think it's better to make that time investment at Calcite's
> implementation before bringing this to Flink.
>
> I find that there are some sql syntax which are critical for flink sql
> while not for other generic sql parser.  Is it possible to implement our
> flink sql plugin/extensions which
> extends the core calcite sql. Going a step further, is it possible for us
> to achieve a better abstraction of the flink sql framework, so that
> downstream components can implement
> their own customized sql plugins based on this sql framework. In this way,
> it is possible to meet the needs of different components to add their own
> sql implementation on top of
> flink sql.
>
> [1]. https://iceberg.apache.org/docs/latest/spark-writes/#merge-into
> [2].
>
> https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/build-real-time-data-lake-tutorial.html
>
>
> On Fri, Feb 11, 2022 at 4:28 PM zhou chao  wrote:
>
> > Hi, Martijn Visser, thanks for your reply. Firstly, I am sorry for
> posting
> > the
> > discussion twice. I sent the message to the dev mail group from an unsub-
> > scribed account,  but the message was not shown for a while, and I
> guessed
> > that
> > the dev mail group would not post an email coming from an unsubscribed
> > account, such that I sent it again from a subscribed account.
> >
> > Q: How would you see merge work for streaming data?
> > I think this is an interesting topic, especially for Flink, which is
> > wanting to unify
> > the streaming & batch processing. Back to the merge statement, there
> exist
> > two inputs, target_table and source_table(query). In the 

[jira] [Created] (FLINK-26131) CompiledPlan should implement Executable

2022-02-14 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26131:
---

 Summary: CompiledPlan should implement Executable
 Key: FLINK-26131
 URL: https://issues.apache.org/jira/browse/FLINK-26131
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Francesco Guardiani


In order to do that, we need to keep a reference to the TableEnvironment within the 
implementation of CompiledPlan. We should also check that the TableEnvironment used 
at execution time matches the one that compiled the plan.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26128) Improve Table, Expressions and related classes Javadocs/Scaladocs

2022-02-14 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26128:
---

 Summary: Improve Table, Expressions and related classes 
Javadocs/Scaladocs
 Key: FLINK-26128
 URL: https://issues.apache.org/jira/browse/FLINK-26128
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / API
Reporter: Francesco Guardiani


Right now we have inconsistent Javadocs and Scaladocs in Table and Expressions. 
Each of them has a different format, and some javadocs are even mentioning 
scala examples. We should properly reword them to be consistent, remove scala 
examples when not necessary, expand java examples. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26127) Cleanup usage of deprecated table methods in doc

2022-02-14 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26127:
---

 Summary: Cleanup usage of deprecated table methods in doc
 Key: FLINK-26127
 URL: https://issues.apache.org/jira/browse/FLINK-26127
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / API
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


In our docs we have a lot of usages of deprecated methods in various examples. We 
should clean them up and replace them with the new methods.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26125) Doc overhaul for the CAST behaviour

2022-02-14 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26125:
---

 Summary: Doc overhaul for the CAST behaviour
 Key: FLINK-26125
 URL: https://issues.apache.org/jira/browse/FLINK-26125
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani


This includes:

* Proper documentation of the new TRY_CAST
* Add a CAST matrix to document which CAST tuples are supported



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26090) Remove pre FLIP-84 methods

2022-02-11 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26090:
---

 Summary: Remove pre FLIP-84 methods
 Key: FLINK-26090
 URL: https://issues.apache.org/jira/browse/FLINK-26090
 Project: Flink
  Issue Type: Bug
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


See https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878 
and https://issues.apache.org/jira/browse/FLINK-16364.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26089) Introduce TablePipeline

2022-02-11 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26089:
---

 Summary: Introduce TablePipeline
 Key: FLINK-26089
 URL: https://issues.apache.org/jira/browse/FLINK-26089
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


See 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26071) TableEnvironment#compilePlan should fail

2022-02-10 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26071:
---

 Summary: TableEnvironment#compilePlan should fail
 Key: FLINK-26071
 URL: https://issues.apache.org/jira/browse/FLINK-26071
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


FLINK-25841 introduced {{compilePlan}} in {{TableEnvironment}}. This API should 
fail eagerly, rather than failing later in {{writeToFile}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26060) Make Python specific exec nodes unsupported

2022-02-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26060:
---

 Summary: Make Python specific exec nodes unsupported 
 Key: FLINK-26060
 URL: https://issues.apache.org/jira/browse/FLINK-26060
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


These nodes are using the old type system, which is going to be removed soon. 
We should avoid supporting them in the persisted plan, as we cannot commit to 
supporting them. Once migrated to the new type system, PyFlink won't need these 
nodes anymore and will just rely on the new Table function stack. For more details, 
also check https://issues.apache.org/jira/browse/FLINK-25231



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26054) Enable maven-enforcer to disable table-planner and table-runtime as dependencies

2022-02-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26054:
---

 Summary: Enable maven-enforcer to disable table-planner and 
table-runtime as dependencies
 Key: FLINK-26054
 URL: https://issues.apache.org/jira/browse/FLINK-26054
 Project: Flink
  Issue Type: Bug
  Components: Build System, Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


From https://github.com/apache/flink/pull/18676#discussion_r802502438, we 
should enable the enforcer for table-planner and table-runtime modules, as 
described in the flink-table/README.md



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26053) Fix parser generator warnings

2022-02-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-26053:
---

 Summary: Fix parser generator warnings
 Key: FLINK-26053
 URL: https://issues.apache.org/jira/browse/FLINK-26053
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Francesco Guardiani


When building flink-sql-parser, the javacc logs a couple of warnings:


{code}
Warning: Choice conflict in [...] construct at line 2920, column 5.
 Expansion nested within construct and expansion following construct
 have common prefixes, one of which is: "PLAN"
 Consider using a lookahead of 2 or more for nested expansion.
Warning: Choice conflict involving two expansions at
 line 2930, column 9 and line 2932, column 9 respectively.
 A common prefix is: "STATEMENT"
 Consider using a lookahead of 2 for earlier expansion.
Warning: Choice conflict involving two expansions at
 line 2952, column 9 and line 2954, column 9 respectively.
 A common prefix is: "STATEMENT"
 Consider using a lookahead of 2 for earlier expansion.
{code}

Taken from 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=3911

We should investigate them, as they're often a symptom of some bug in our parser 
template, which might result in unexpected parsing errors.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25986) Add FLIP-190 new API methods to python

2022-02-07 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25986:
---

 Summary: Add FLIP-190 new API methods to python
 Key: FLINK-25986
 URL: https://issues.apache.org/jira/browse/FLINK-25986
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Table SQL / API
Reporter: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25942) Use jackson jdk8/time modules for Duration ser/de

2022-02-03 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25942:
---

 Summary: Use jackson jdk8/time modules for Duration ser/de
 Key: FLINK-25942
 URL: https://issues.apache.org/jira/browse/FLINK-25942
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


https://issues.apache.org/jira/browse/FLINK-25588 introduced jackson jdk8 and 
datetime modules to flink-shaded jackson, so now we don't need our Duration 
ser/de anymore (which was introduced because we lacked this jackson module).
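
Just to illustrate what the jackson modules give us out of the box (a standalone 
sketch, not Flink code):

{code:java}
import java.time.Duration;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.datatype.jdk8.Jdk8Module;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

public class DurationSerDeExample {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper()
                .registerModule(new JavaTimeModule()) // java.time types, including Duration
                .registerModule(new Jdk8Module());    // Optional and friends
        // Serialize Duration as an ISO-8601 string instead of seconds/nanos
        mapper.disable(SerializationFeature.WRITE_DURATIONS_AS_TIMESTAMPS);

        String json = mapper.writeValueAsString(Duration.ofSeconds(90)); // "PT1M30S"
        Duration back = mapper.readValue(json, Duration.class);
        System.out.println(json + " -> " + back);
    }
}
{code}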



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25918) Use FileEnumerator to implement filter pushdown of filepath metadata

2022-02-02 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25918:
---

 Summary: Use FileEnumerator to implement filter pushdown of 
filepath metadata 
 Key: FLINK-25918
 URL: https://issues.apache.org/jira/browse/FLINK-25918
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Reporter: Francesco Guardiani


Right now, unless you configure partition keys, the table file source will 
ingest all the files in the provided {{path}}.

Which means that a query like:

{code:sql}
SELECT * FROM MyFileTable WHERE filepath LIKE '%.csv'
{code}

Will ingest all the files; the filtering then happens only after the records are 
loaded in Flink, discarding all the records not coming from a file matching the 
pattern '%.csv'.

Using the filter pushdown feature provided by the DynamicTableSource stack, we 
could instead provide the {{FileSourceBuilder}} directly with a {{FileEnumerator}} 
that filters the input files, so we can effectively skip reading them.
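
A rough sketch of the idea, assuming the {{FileEnumerator}} interface from 
flink-connector-files (class name, field names and the suffix-based predicate 
below are purely illustrative):

{code:java}
import java.io.IOException;
import java.util.Collection;
import java.util.stream.Collectors;

import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.src.enumerate.FileEnumerator;
import org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator;
import org.apache.flink.core.fs.Path;

/** Illustrative only: delegate enumeration and keep splits matching the pushed-down filepath predicate. */
class FilteringFileEnumerator implements FileEnumerator {

    private final FileEnumerator delegate = new NonSplittingRecursiveEnumerator();
    private final String suffix; // e.g. ".csv", derived from the pushed-down predicate

    FilteringFileEnumerator(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public Collection<FileSourceSplit> enumerateSplits(Path[] paths, int minDesiredSplits)
            throws IOException {
        // Enumerate as usual, then drop the files that cannot match the predicate
        return delegate.enumerateSplits(paths, minDesiredSplits).stream()
                .filter(split -> split.path().toString().endsWith(suffix))
                .collect(Collectors.toList());
    }
}
{code}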



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Francesco Guardiani
> Shameless plug:  Maybe the Wikipedia EventStreams
<https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams> SSE API
<https://stream.wikimedia.org/?doc#/streams> would make for a great
connector example in Flink?

Sounds like a great idea! Do you have a ready-to-use Java client for that?

On Mon, Jan 31, 2022 at 3:47 PM Jing Ge  wrote:

> Thanks @Martijn for driving this! +1 for deprecating and removing it. All
> the concerns mentioned previously are valid. It is good to know that the
> upcoming connector template/archetype will help the user for the kickoff.
> Beyond that, speaking of using a real connector as a sample, since Flink is
> heading towards the unified batch and stream processing, IMHO, it would be
> nice to pick up a feasible connector for this trend to let the user get a
> sample close to the use cases.
>
> Best regards
> Jing
>
> On Mon, Jan 31, 2022 at 3:07 PM Andrew Otto  wrote:
>
>> Shameless plug:  Maybe the Wikipedia EventStreams
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams> SSE API
>> <https://stream.wikimedia.org/?doc#/streams> would make for a great
>> connector example in Flink?
>>
>> :D
>>
>> On Mon, Jan 31, 2022 at 5:41 AM Martijn Visser 
>> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for your feedback. It's not about having this connector in the
>>> main repo, that has been voted on already. This is strictly about the
>>> connector itself, since it's not maintained and most probably also can't be
>>> used due to changes in Twitter's API that aren't reflected in our connector
>>> implementation. Therefore I propose to remove it.
>>>
>>> Fully agree on the template part, what's good to know is that a
>>> connector template/archetype is part of the goals for the external
>>> connector repository.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani <
>>> france...@ververica.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I agree with the concern about having this connector in the main repo.
>>>> But I think in general it doesn't harm to have a sample connector to show
>>>> how to develop a custom connector, and I think that the Twitter connector
>>>> can be a good candidate for such a template. It needs rework for sure, as
>>>> it has evident issues, notably it doesn't work with table.
>>>>
>>>> So i understand if we wanna remove what we have right now, but I think
>>>> we should have some replacement for a "connector template", which is both
>>>> ready to use and easy to hack to build your own connector starting from it.
>>>> Twitter API is a good example for such a template, as it's both "related"
>>>> to the known common use cases of Flink and because is quite simple to get
>>>> started with.
>>>>
>>>> FG
>>>>
>>>> On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
>>>> wrote:
>>>>
>>>>> I agree.
>>>>>
>>>>> The Twitter connector is used in a few (unofficial) tutorials, so if
>>>>> we remove it that will make it more difficult for those tutorials to be
>>>>> maintained. On the other hand, if I recall correctly, that connector uses
>>>>> V1 of the Twitter API, which has been deprecated, so it's really not very
>>>>> useful even for that purpose.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I would like to discuss deprecating Flinks' Twitter connector [1].
>>>>>> This was one of the first connectors that was added to Flink, which could
>>>>>> be used to access the tweets from Twitter. Given the evolution of Flink
>>>>>> over Twitter, I don't think that:
>>>>>>
>>>>>> * Users are still using this connector at all
>>>>>> * That the code for this connector should be in the main Flink
>>>>>> codebase.
>>>>>>
>>>>>> Given the circumstances, I would propose to deprecate and remove this
>>>>>> connector. I'm looking forward to your thoughts. If you agree, please 
>>>>>> also
>>>>>> let me know if you think we should first deprecate it in Flink 1.15 and
>>>>>> remove it in a version after that, or if you think we can remove it
>>>>>> directly.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Martijn Visser
>>>>>> https://twitter.com/MartijnVisser82
>>>>>>
>>>>>> [1]
>>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>>>>>>
>>>>>>


[jira] [Created] (FLINK-25897) Update project configuration gradle doc to 7.x version

2022-01-31 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25897:
---

 Summary: Update project configuration gradle doc to 7.x version
 Key: FLINK-25897
 URL: https://issues.apache.org/jira/browse/FLINK-25897
 Project: Flink
  Issue Type: Bug
Reporter: Francesco Guardiani
Assignee: Mario Rincon-Nigro


Update the gradle build script and its doc page to 7.x



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25895) Add ExecNodeGraph ser/de

2022-01-31 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25895:
---

 Summary: Add ExecNodeGraph ser/de
 Key: FLINK-25895
 URL: https://issues.apache.org/jira/browse/FLINK-25895
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


Right now we have an intermediate model called {{JsonPlanGraph}} to ser/de the 
ExecNodeGraph. This model is fine, but we should do the conversion through 
jackson rather than manually ser/de it with {{ExecNodeGraphJsonPlanGenerator}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Francesco Guardiani
Hi,

I agree with the concern about having this connector in the main repo. But
I think in general it doesn't harm to have a sample connector to show how
to develop a custom connector, and I think that the Twitter connector can
be a good candidate for such a template. It needs rework for sure, as it
has evident issues, notably it doesn't work with table.

So i understand if we wanna remove what we have right now, but I think we
should have some replacement for a "connector template", which is both
ready to use and easy to hack to build your own connector starting from it.
Twitter API is a good example for such a template, as it's both "related"
to the known common use cases of Flink and because is quite simple to get
started with.

FG

On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
wrote:

> I agree.
>
> The Twitter connector is used in a few (unofficial) tutorials, so if we
> remove it that will make it more difficult for those tutorials to be
> maintained. On the other hand, if I recall correctly, that connector uses
> V1 of the Twitter API, which has been deprecated, so it's really not very
> useful even for that purpose.
>
> David
>
>
>
> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> I would like to discuss deprecating Flinks' Twitter connector [1]. This
>> was one of the first connectors that was added to Flink, which could be
>> used to access the tweets from Twitter. Given the evolution of Flink over
>> Twitter, I don't think that:
>>
>> * Users are still using this connector at all
>> * That the code for this connector should be in the main Flink codebase.
>>
>> Given the circumstances, I would propose to deprecate and remove this
>> connector. I'm looking forward to your thoughts. If you agree, please also
>> let me know if you think we should first deprecate it in Flink 1.15 and
>> remove it in a version after that, or if you think we can remove it
>> directly.
>>
>> Best regards,
>>
>> Martijn Visser
>> https://twitter.com/MartijnVisser82
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>>
>>


Re: [ANNOUNCE] flink-shaded 15.0 released

2022-01-27 Thread Francesco Guardiani
Thanks Chesnay for this!

FG

On Mon, Jan 24, 2022 at 11:20 AM David Morávek  wrote:

> That's a great news Chesnay, thanks for driving this! This should unblock
> some ongoing Flink efforts +1
>
> Best,
> D.
>
> On Mon, Jan 24, 2022 at 10:58 AM Chesnay Schepler 
> wrote:
>
> > Hello everyone,
> >
> > we got a new flink-shaded release, with several nifty things:
> >
> >   * updated version for ASM, required for Java 17
> >   * jackson extensions for optionals/datetime, which will be used by the
> > Table API (and maybe REST API)
> >   * a relocated version of swagger, finally unblocking the merge of our
> > experimental swagger spec
> >   * updated version for Netty, providing a proper fix for FLINK-24197
> >
> >
>


[jira] [Created] (FLINK-25809) Introduce test infra for building FLIP-190 tests

2022-01-25 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25809:
---

 Summary: Introduce test infra for building FLIP-190 tests 
 Key: FLINK-25809
 URL: https://issues.apache.org/jira/browse/FLINK-25809
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


FLIP-190 requires building a new test infra. For this test infra, we want 
to define test cases and data once, and then for each case we want to execute 
the following:

* Integration test that roughly does {{create plan -> execute job -> trigger 
savepoint -> stop job -> restore plan -> restore savepoint -> execute job -> 
stop and assert}}. Plan and savepoint should be committed to git, so running 
these tests when a plan and savepoint are available will not regenerate them.
* Change detection test to check that, for the defined test cases, the plan 
hasn't been changed. Similar to the existing {{JsonPlanITCase}} tests.
* Completeness of tests/coverage, that is counting how many times ExecNodes 
(including versions) are used in the test cases. Fail if an ExecNode version is 
never covered.

Another requirement is to "version" the test cases, that is for each test 
case we can retain different versions of the plan and savepoint, in order to 
make sure that after we introduce a new plan change, the old plan still 
continues to run.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25791) Make ObjectIdentifier json representation simpler

2022-01-24 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25791:
---

 Summary: Make ObjectIdentifier json representation simpler
 Key: FLINK-25791
 URL: https://issues.apache.org/jira/browse/FLINK-25791
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


Use {{ObjectIdentifier#asSerializableString}} to serialize the object 
identifier, rather than serializing it as an object.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25779) Define ConfigOption for properties map in Kinesis

2022-01-24 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25779:
---

 Summary: Define ConfigOption for properties map in Kinesis
 Key: FLINK-25779
 URL: https://issues.apache.org/jira/browse/FLINK-25779
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: Francesco Guardiani


Currently there is no {{ConfigOption}} definition of {{properties.*}} in the Kinesis 
connector. This breaks option forwarding when restoring the table options from 
the persisted plan/catalog. We can define it using 
{{ConfigOptionBuilder#mapType}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25778) Define ConfigOption for properties map in Kafka

2022-01-24 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25778:
---

 Summary: Define ConfigOption for properties map in Kafka
 Key: FLINK-25778
 URL: https://issues.apache.org/jira/browse/FLINK-25778
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: Francesco Guardiani


Currently there is no {{ConfigOption}} definition of {{properties.*}} in the Kafka 
connector. This breaks option forwarding when restoring the table options from 
the persisted plan/catalog. We can define it using 
{{ConfigOptionBuilder#mapType}}.
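
A minimal sketch of how such an option could be declared via {{ConfigOptions}} 
(class name, key name and description below are illustrative):

{code:java}
import java.util.Map;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class KafkaConnectorOptions {

    /** Map-typed option covering all keys under the 'properties.' prefix. */
    public static final ConfigOption<Map<String, String>> PROPERTIES =
            ConfigOptions.key("properties")
                    .mapType()
                    .noDefaultValue()
                    .withDescription(
                            "Properties forwarded to the Kafka client, "
                                    + "e.g. 'properties.bootstrap.servers'.");
}
{code}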



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25777) Generate documentation for Table factories (formats and connectors)

2022-01-24 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25777:
---

 Summary: Generate documentation for Table factories (formats and 
connectors)
 Key: FLINK-25777
 URL: https://issues.apache.org/jira/browse/FLINK-25777
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Common, Documentation, Formats (JSON, Avro, 
Parquet, ORC, SequenceFile)
Reporter: Francesco Guardiani


The goal of this issue is to automatically generate from code the documentation 
of configuration options for table connectors and formats.

This issue includes:

* Tweak {{ConfigOptionsDocGenerator}} to work with {{Factory#requiredOptions}}, 
{{Factory#optionalOptions}} and the newly introduced 
{{DynamicTableFactory#forwardOptions}} and {{FormatFactory#forwardOptions}}. 
Also see https://github.com/apache/flink/pull/18290 as a reference. From 
these methods we should extract whether an option is required or not, and whether 
it's forwardable or not.
* Decide what the generator output should be, and how to link/include it in 
the connector/format docs pages.
* Enable the code generator in CI.
* Regenerate all the existing docs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25588) Add jdk8 and datetime module to jackson shaded

2022-01-10 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25588:
---

 Summary: Add jdk8 and datetime module to jackson shaded
 Key: FLINK-25588
 URL: https://issues.apache.org/jira/browse/FLINK-25588
 Project: Flink
  Issue Type: Improvement
  Components: BuildSystem / Shaded
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


Add modules jdk8 and jsr310 to jackson shaded. 
https://github.com/FasterXML/jackson-modules-java8



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-188: Introduce Built-in Dynamic Table Storage

2022-01-07 Thread Francesco Guardiani
+1 with a separate repo and +1 with the flink-storage name

On Fri, Jan 7, 2022 at 8:40 AM Jingsong Li  wrote:

> Hi everyone,
>
> Vote for create a separate sub project for FLIP-188 thread is here:
> https://lists.apache.org/thread/wzzhr27cvrh6w107bn464m1m1ycfll1z
>
> Best,
> Jingsong
>
>
> On Fri, Jan 7, 2022 at 3:30 PM Jingsong Li  wrote:
> >
> > Hi Timo,
> >
> > I think we can consider exposing to DataStream users in the future, if
> > the API definition is clear after.
> > I am fine with `flink-table-store` too.
> > But I tend to prefer shorter and clearer name:
> > `flink-store`.
> >
> > I think I can create a separate thread to vote.
> >
> > Looking forward to your thoughts!
> >
> > Best,
> > Jingsong
> >
> >
> > On Thu, Dec 30, 2021 at 9:48 PM Timo Walther  wrote:
> > >
> > > +1 for a separate repository. And also +1 for finding a good name.
> > >
> > > `flink-warehouse` would be definitely a good marketing name but I agree
> > > that we should not start marketing for code bases. Are we planning to
> > > make this storage also available to DataStream API users? If not, I
> > > would also vote for `flink-managed-table` or better:
> `flink-table-store`
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > >
> > > On 29.12.21 07:58, Jingsong Li wrote:
> > > > Thanks Till for your suggestions.
> > > >
> > > > Personally, I like flink-warehouse, this is what we want to convey to
> > > > the user, but it indicates a bit too much scope.
> > > >
> > > > How about just calling it flink-store?
> > > > Simply to convey an impression: this is flink's store project,
> > > > providing a built-in store for the flink compute engine, which can be
> > > > used by flink-table as well as flink-datastream.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Tue, Dec 28, 2021 at 5:15 PM Till Rohrmann 
> wrote:
> > > >>
> > > >> Hi Jingsong,
> > > >>
> > > >> I think that developing flink-dynamic-storage as a separate sub
> project is
> > > >> a very good idea since it allows us to move a lot faster and
> decouple
> > > >> releases from Flink. Hence big +1.
> > > >>
> > > >> Do we want to name it flink-dynamic-storage or shall we use a more
> > > >> descriptive name? dynamic-storage sounds a bit generic to me and I
> wouldn't
> > > >> know that this has something to do with letting Flink manage your
> tables
> > > >> and their storage. I don't have a very good idea but maybe we can
> call it
> > > >> flink-managed-tables, flink-warehouse, flink-olap or so.
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > >> On Tue, Dec 28, 2021 at 9:49 AM Martijn Visser <
> mart...@ververica.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Jingsong,
> > > >>>
> > > >>> That sounds promising! +1 from my side to continue development
> under
> > > >>> flink-dynamic-storage as a Flink subproject. I think having a more
> in-depth
> > > >>> interface will benefit everyone.
> > > >>>
> > > >>> Best regards,
> > > >>>
> > > >>> Martijn
> > > >>>
> > > >>> On Tue, 28 Dec 2021 at 04:23, Jingsong Li 
> wrote:
> > > >>>
> > >  Hi all,
> > > 
> > >  After some experimentation, we felt no problem putting the dynamic
> > >  storage outside of flink, and it also allowed us to design the
> > >  interface in more depth.
> > > 
> > >  What do you think? If there is no problem, I am asking for PMC's
> help
> > >  here: we want to propose flink-dynamic-storage as a flink
> subproject,
> > >  and we want to build the project under apache.
> > > 
> > >  Best,
> > >  Jingsong
> > > 
> > > 
> > >  On Wed, Nov 24, 2021 at 8:10 PM Jingsong Li <
> jingsongl...@gmail.com>
> > >  wrote:
> > > >
> > > > Hi Stephan,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > Data never expires automatically.
> > > >
> > > > If there is a need for data retention, the user can choose one
> of the
> > > > following options:
> > > > - In the SQL for querying the managed table, users filter the
> data by
> > >  themselves
> > > > - Define the time partition, and users can delete the expired
> > > > partition by themselves. (DROP PARTITION ...)
> > > > - In the future version, we will support the "DELETE FROM"
> statement,
> > > > users can delete the expired data according to the conditions.
> > > >
> > > > So to answer your question:
> > > >
> > > >> Will the VMQ send retractions so that the data will be removed
> from
> > >  the table (via compactions)?
> > > >
> > > > The current implementation is not sending retraction, which I
> think
> > > > theoretically should be sent, currently the user can filter by
> > > > subsequent conditions.
> > > > And yes, the subscriber would not see strictly a correct result.
> I
> > > > think this is something we can improve for Flink SQL.
> > > >
> > > >> Do we want time retention semantics handled by the compaction?
> > > >
> > > > Currently, no, Data never 

[jira] [Created] (FLINK-25526) Deprecate TableSinkFactory, TableSourceFactory and TableFormatFactory

2022-01-05 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25526:
---

 Summary: Deprecate TableSinkFactory, TableSourceFactory and 
TableFormatFactory
 Key: FLINK-25526
 URL: https://issues.apache.org/jira/browse/FLINK-25526
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / API
Reporter: Francesco Guardiani
 Fix For: 1.15.0


These factories are part of the old type system/sink & source stack and should 
not be used anymore, as users should work with 
DynamicTableSink/DynamicTableSource factories instead.

This task should deprecate {{TableSinkFactory}}, {{TableSourceFactory}} and 
{{TableFormatFactory}} and all the related types.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25518) Harden JSON Serialization utilities

2022-01-04 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25518:
---

 Summary: Harden JSON Serialization utilities
 Key: FLINK-25518
 URL: https://issues.apache.org/jira/browse/FLINK-25518
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-03 Thread Francesco Guardiani
Hi Jing,

Thanks for the FLIP. I'm not very knowledgeable about the topic, but going
through both the FLIP and the discussion here, I wonder: does it make
sense for a lookup join to use hash distribution whenever possible, by
default?

The point you're explaining here:

> Many Lookup table sources introduce cache in order
to reduce the RPC call, such as JDBC, CSV, HBase connectors.
For those connectors, we could raise cache hit ratio by routing the same
lookup keys to the same task instance

This seems like something we can infer automatically, rather than manually asking the
user to add this hint to the query. Note that I'm not arguing against the
hint syntax, which might still make sense to introduce, but I feel like
this optimization makes sense in the general case when using the connectors
you have quoted. Perhaps there is some downside I'm not aware of?

Talking about the hint themselves, taking this example as reference:

SELECT /*+ SHUFFLE_HASH('Orders', 'Customers') */ o.order_id, o.total,
c.country, c.zip
FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;

Shouldn't the hint take the table alias as the "table name"? What if you do
two lookup joins in cascade within the same query on the same table (once
on a key, then on another one), using two different aliases for the
table?
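
To make it concrete, something like this (column names purely made up):

SELECT o.order_id, c1.zip AS billing_zip, c2.zip AS shipping_zip
FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c1
  ON o.billing_customer_id = c1.id
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c2
  ON o.shipping_customer_id = c2.id;

With a hint that only references 'Customers', it's unclear whether it applies
to the first lookup, the second one, or both, while referencing c1/c2 would be
unambiguous.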


On Fri, Dec 31, 2021 at 9:56 AM Jing Zhang  wrote:

> Hi Lincoln,
> Thanks for the feedback.
>
> > 1. For the hint name, +1 for WenLong's proposal.
>
> I've added add 'SHUFFLE_HASH' to other alternatives in FLIP. Let's waiting
> for more voices here.
>
> > Regarding the `SKEW` hint, agree with you that it can be used widely, and
> I
> prefer to treat it as a metadata hint, a new category differs from a join
> hint.
> For your example:
> ```
> SELECT /*+ USE_HASH('Orders', 'Customers'), SKEW('Orders') */ o.order_id,
> o.total, c.country, c.zip
> FROM Orders AS o
> JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
> ON o.customer_id = c.id;
> ```
> I would prefer another form:
> ```
> -- provide the skew info to let the engine choose the optimal plan
> SELECT /*+ SKEW('Orders') */ o.order_id, ...
>
> -- or introduce a new hint for the join case, e.g.,
> SELECT /*+ REPLICATED_SHUFFLE_HASH('Orders') */ o.order_id, ...
> ```
>
> Maybe there is misunderstanding here.
> I just use a syntax sugar here.
>
> SELECT /*+ USE_HASH('Orders', 'Customers'), SKEW('Orders') */ o.order_id,
> 
>
> is just a syntax with
>
> SELECT /*+ USE_HASH('Orders', 'Customers') */ /*+SKEW('Orders') */
> o.order_id,
> 
>
> Although I list 'USE_HASH' and 'SKEW' hint in a query hint clause, it does
> not mean they must appear together as a whole.
> Based on calcite syntax doc [1], you could list more than one hint in
> a /*+' hint [, hint ]* '*/ clause.
>
> Each hint has different function.
> The'USE_HASH' hint suggests the optimizer use hash partitioner for Lookup
> Join for table 'Orders' and table 'Customers' while the 'SKEW' hint tells
> the optimizer the skew metadata about the table 'Orders'.
>
> Best,
> Jing Zhang
>
> [1] https://calcite.apache.org/docs/reference.html#sql-hints
>
>
>
>
> Jing Zhang  于2021年12月31日周五 16:39写道:
>
> > Hi Martijn,
> > Thanks for the feedback.
> >
> > Glad to hear that we reached a consensus on the first and second point.
> >
> > About whether to use `use_hash` as a term, I think your concern makes
> > sense.
> > Although the hash lookup join is similar to Hash join in oracle that they
> > all require hash distribution on input, there exists a little difference
> > between them.
> > About this point, Lincoln and WenLong both prefer the term
> 'SHUFFLE_HASH',
> > WDYT?
> >
> > Best,
> > Jing Zhang
> >
> >
> > Lincoln Lee  于2021年12月30日周四 11:21写道:
> >
> >> Hi Jing,
> >> Thanks for your explanation!
> >>
> >> 1. For the hint name, +1 for WenLong's proposal. I think the `SHUFFLE`
> >> keyword is important in a classic distributed computing system,
> >> a hash-join usually means there's a shuffle stage(include shuffle
> >> hash-join, broadcast hash-join). Users only need to pass the `build`
> side
> >> table(usually the smaller one) into `SHUFFLE_HASH` join hint, more
> >> concisely than `USE_HASH(left_table, right_table)`. Please correct me if
> >> my
> >> understanding is wrong.
> >> Regarding the `SKEW` hint, agree with you that it can be used widely,
> and
> >> I
> >> prefer to treat it as a metadata hint, a new category differs from a
> join
> >> hint.
> >> For your example:
> >> ```
> >> SELECT /*+ USE_HASH('Orders', 'Customers'), SKEW('Orders') */
> o.order_id,
> >> o.total, c.country, c.zip
> >> FROM Orders AS o
> >> JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
> >> ON o.customer_id = c.id;
> >> ```
> >> I would prefer another form:
> >> ```
> >> -- provide the skew info to let the engine choose the optimal plan
> >> SELECT /*+ SKEW('Orders') */ o.order_id, ...
> >>
> >> -- or introduce a new hint for the join case, e.g.,
> >> SELECT /*+ 

[jira] [Created] (FLINK-25428) Expose complex types CAST to String

2021-12-23 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25428:
---

 Summary: Expose complex types CAST to String
 Key: FLINK-25428
 URL: https://issues.apache.org/jira/browse/FLINK-25428
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Francesco Guardiani
 Attachments: cast_function_it_case.patch, logical_type_casts.patch

Right now we have all the casting rules for collection, structured and raw 
types to string, that is we have logic to stringify the following types:

* ARRAY
* MAP
* MULTISET
* ROW
* STRUCTURED
* RAW

Unfortunately these don't work, for different reasons, notably:

* We need to support these combinations in {{LogicalTypeCasts}} (check the 
attached patch)
* For some of them Calcite applies its casting validation logic and marks them 
as invalid
* For MULTISET and STRUCTURED, there are issues specific to Table API and its 
expression stack, which cannot correctly convert the values to literals

You can check all these errors by applying the attached patch to the CAST 
function IT cases.

We need to fix these issues, so users can use SQL and Table API to cast these 
values to string.
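
Some illustrative queries of the kind that should eventually work end to end:

{code:sql}
SELECT CAST(ARRAY[1, 2, 3] AS STRING);
SELECT CAST(MAP['k1', 1, 'k2', 2] AS STRING);
SELECT CAST(ROW(1, 'a', TRUE) AS STRING);
{code}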



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25300) Remove CEIL and FLOOR call with one argument for integral values

2021-12-14 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25300:
---

 Summary: Remove CEIL and FLOOR call with one argument for integral 
values
 Key: FLINK-25300
 URL: https://issues.apache.org/jira/browse/FLINK-25300
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Runtime
Reporter: Francesco Guardiani


Right now, when the user invokes {{FLOOR}} or {{CEIL}} on any integral 
number, a call to {{Math.floor}}/{{Math.ceil}} is generated, which is a no-op and 
might possibly end up in incorrect results.

We should rather implement a rule in the planner (or reuse the expression 
reducer rule) that removes {{FLOOR}} and {{CEIL}} in case their arguments are of 
integral types, that is TINYINT, SMALLINT, INT and BIGINT.
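
For illustration (column names made up), a query like the first one below is 
effectively a no-op today and could simply be reduced to the second one:

{code:sql}
SELECT FLOOR(f_int), CEIL(f_bigint) FROM T;
-- could be reduced by the planner to
SELECT f_int, f_bigint FROM T;
{code}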



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25299) Improve LIKE operator efficiency

2021-12-14 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25299:
---

 Summary: Improve LIKE operator efficiency
 Key: FLINK-25299
 URL: https://issues.apache.org/jira/browse/FLINK-25299
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Runtime
Reporter: Francesco Guardiani


Right now LIKE is implemented using regexes (check {{SqlLikeUtils}} provided by 
https://issues.apache.org/jira/browse/FLINK-25282).

We can improve it either by removing the regex-based implementation and manually 
implementing the matching code, or at least by caching the regex 
in the codegen, assigning the compiled pattern to a field of the generated function, as 
parsing the pattern is usually an expensive operation.
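
A rough sketch of the caching idea in plain Java (names made up; in the real fix 
the compiled pattern would become a member field of the generated function):

{code:java}
import java.util.regex.Pattern;

class LikePredicate {

    // The SQL LIKE pattern translated once into an equivalent regex, e.g. 'abc%' -> "abc.*"
    private final Pattern compiled;

    LikePredicate(String regex) {
        this.compiled = Pattern.compile(regex, Pattern.DOTALL);
    }

    boolean matches(String input) {
        // Only the matching happens per record; the expensive Pattern.compile is done once
        return compiled.matcher(input).matches();
    }
}
{code}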



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25282) Move runtime dependencies from table-planner to table-runtime

2021-12-13 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25282:
---

 Summary: Move runtime dependencies from table-planner to 
table-runtime
 Key: FLINK-25282
 URL: https://issues.apache.org/jira/browse/FLINK-25282
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


There are several runtime dependencies (e.g. functions used in codegen) that 
are shipped by table-planner and calcite-core. We should move these 
dependencies to table-runtime.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS][FLINK-24427] Hide Scala from table planner

2021-12-10 Thread Francesco Guardiani
> When would the follow ups (mentioned under out of
scope) be done?

For PyFlink, I think we can reasonably get it done soon-ish. What needs to
be done is for PyFlink to start using the new type system and then the
planner needs to expose a way to plug in rules and to use the code
generator. But we're not talking about this release cycle for sure, and of
course that depends on the bandwidth PyFlink contributors can allocate to
it.

For tests, we'll see what we can do in this release cycle, but this
requires an iterative process consisting of fixing and even rewriting
existing test utilities, and will definitely span the next release and
probably even the one after. On the other hand, the tests issue is relevant
only to those who develop formats and connectors and rely on internal,
undocumented test utilities; other users can just use the new
planner-loader module for developing tests.

For Hive, I can't really say because, as far as I investigated, the leaks
are too deep and it requires a significant amount of work to isolate them.

> For the old type system for UDFs: naive question as I am not involved in
SQL much, is there an agreed upon deprecation/removal plan for the legacy
type system yet?

IIRC the legacy type system was deprecated a year ago, and I'm not aware of
any component in particular, except pyflink, that relies heavily on it. It
wasn't removed until now just because it's so deep in the codebase that
removing it is an effort on its own and no one in the community had time to
get to it :)

> I am asking because the intermediate state of the uber JAR
described in the document seems a bit messy and I fear that users will
stumble across that.

I wouldn't say it's messy; on the contrary, what we have right now is messy,
as it's just one big jar with everything, including apis, scala, runtime
and planner. The split into different jars is way nicer, as you have apis
in a single jar, runtime in a single jar and planner in a single jar, as
three different components. The result is:

* You can swap planner and planner-loader depending on the fact you need
planner internals or not
* You can use the scala version you want
* Potentially, your task manager deployments can remove the planner from
the classpath, as only runtime and apis are required to run the actual job

I think what's really important here is that we communicate this change to
the users, both through mailing list mails, release notes and documentation
update.

Because we're going to continue to ship the old planner jar, together with
the new planner-loader jar, I suggest starting to use the planner-loader as
the default, as described in the doc, and, during the RC, if we see the new
planner-loader is unstable, swapping the default planner in "/lib" with the
old jar before the 1.15 release.

FG

On Fri, Dec 10, 2021 at 1:04 PM Konstantin Knauf  wrote:

> Hi Francesco,
>
> Thanks for this summary. When would the follow ups (mentioned under out of
> scope) be done? I am asking because the intermediate state of the uber JAR
> described in the document seems a bit messy and I fear that users will
> stumble across that.
>
> For the old type system for UDFs: naive question as I am not involved in
> SQL much, is there an agreed upon deprecation/removal plan for the legacy
> type system yet?
>
> Cheers,
>
> Konstantin
>
> On Wed, Dec 8, 2021 at 6:02 PM Francesco Guardiani <
> france...@ververica.com>
> wrote:
>
> > Hi all,
> > In case you haven't seen, last week I published in the issue comments
> this
> > document to explain how we're proceeding to hide Scala from table
> planner:
> >
> >
> https://docs.google.com/document/d/12yDUCnvcwU2mODBKTHQ1xhfOq1ujYUrXltiN_rbhT34/edit?usp=sharing
> >
> > There is a section I've added yesterday which is particularly relevant,
> > because it explains the impact on the distribution. I strongly encourage
> > people to look at it.
> >
> > Once we perform all the changes, I'm gonna announce them on the user
> > mailing list as well, together with the package name changes already
> > brought in by #17897 <https://github.com/apache/flink/pull/17897> to
> > flink-parquet and flink-orc.
> >
> > Thanks,
> > FG
> >
> > --
> >
> > Francesco Guardiani | Software Engineer
> >
> > france...@ververica.com
> >
> >
> > <https://www.ververica.com/>
> >
> > Follow us @VervericaData
> >
> > --
> >
> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> >
> > Ververica GmbH
> >
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> >
> > Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason,
> > Jinwei (Kevin) Zhang
> >
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


Re: [DISCUSS] Shall casting functions return null or throw exceptions for invalid input

2021-12-10 Thread Francesco Guardiani
ct) behaviour in Flink 1.15 by setting the
> >>> configuration flag to the current situation and only changing this
> >>> if/whenever a Flink 2.0 version is released.
> >>>
> >>>  From my perspective, I can understand that going for option 2 is a
> >>> preferred option for those that are running large Flink setups with a
> >>> great
> >>> number of users. I am wondering if those platforms also have the
> ability
> >>> to
> >>> set default values and/or override user configuration. That could be a
> way
> >>> to solve this issue for these platform teams.
> >>>
> >>> I would prefer to go for option 1, because I think correct execution of
> >>> CAST is important, especially for new Flink users. These new users
> should
> >>> have a smooth user experience and shouldn't need to change
> configuration
> >>> flags to get correct behaviour. I do expect that users who have used
> Flink
> >>> before are more familiar with checking release notes and interpreting
> how
> >>> this potentially affects them. That's why we have release notes. I also
> >>> doubt that we will have a Flink 2.0 release any time soon, meaning
> that we
> >>> are only going to make the pain even bigger for more users if we change
> >>> this incorrect behaviour at a later time.
> >>>
> >>> Best regards,
> >>>
> >>> Martijn
> >>>
> >>> On Tue, 23 Nov 2021 at 02:10, Kurt Young  wrote:
> >>>
> >>>>> This is what I don't really understand here: how adding a
> >>> configuration
> >>>> option causes issues here?
> >>>> This is why: for most Flink production use cases I see, it's not like
> a
> >>>> couple of people manage ~5 Flink
> >>>> jobs, so they can easily track all the big changes in every minor
> Flink
> >>>> version. Typically use case are like
> >>>> a group of people managing some streaming platform, which will provide
> >>>> Flink as an execution engine
> >>>> to their users. Alibaba has more than 40K online streaming SQL jobs,
> and
> >>>> ByteDance also has a similar
> >>>> number. Most of the time, whether upgrading Flink version will be
> >>>> controlled by the user of the platform,
> >>>> not the platform provider. The platform will most likely provide
> >>> multiple
> >>>> Flink version support.
> >>>>
> >>>> Even if you can count on the platform provider to read all the release
> >>>> notes carefully, their users won't. So
> >>>> we are kind of throw the responsibility to all the platform provider,
> >>> make
> >>>> them to take care of the semantic
> >>>> changes. They have to find some good way to control the impactions
> when
> >>>> their users upgrade Flink's version.
> >>>> And if they don't find a good solution around this, and their users
> >>>> encounter some online issues, they will be
> >>>> blamed. And you can guess who they would blame.
> >>>>
> >>>> Flink is a very popular engine now, every decision we make will affect
> >>> the
> >>>> users a lot. If you want them to make
> >>>> some changes, I would argue we should make them think it's worth it.
> >>>>
> >>>> Best,
> >>>> Kurt
> >>>>
> >>>>
> >>>> On Mon, Nov 22, 2021 at 11:29 PM Francesco Guardiani <
> >>>> france...@ververica.com> wrote:
> >>>>
> >>>>>> NULL in SQL essentially means "UNKNOWN", it's not as scary as a
> >>> null in
> >>>>> java which will easily cause a NPE or some random behavior with a c++
> >>>>> function call.
> >>>>>
> >>>>> This is true from the user point of view, except our runtime doesn't
> >>>> treat
> >>>>> null as some value where you can safely execute operations and get
> >>> "noop"
> >>>>> results. In our runtime null is Java's null, hence causing issues and
> >>>>> generating NPEs here and there when nulls are not expected.
> >>>>>
> >>>>>> It will really create a big mess after users upgrade their SQL jobs
> >>>>>
> >>>>

[jira] [Created] (FLINK-25229) Introduce flink-table-api-bridge-common

2021-12-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25229:
---

 Summary: Introduce flink-table-api-bridge-common
 Key: FLINK-25229
 URL: https://issues.apache.org/jira/browse/FLINK-25229
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Planner
Reporter: Francesco Guardiani


This package should deduplicate code from api-java-bridge and api-scala-bridge, 
notably:

* the various operations provided by both {{ScalaDataStreamQueryOperation}} and 
{{JavaDataStreamQueryOperation}} (which are essentially the same code)
* some code in {{StreamTableEnvironmentImpl}} and {{StreamStatementSetImpl}}

The end goal is that the planner can remove its runtime (non-test) dependency on 
flink-scala-api and flink-scala-api-bridge.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25228) Introduce flink-table-test-utils

2021-12-09 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25228:
---

 Summary: Introduce flink-table-test-utils
 Key: FLINK-25228
 URL: https://issues.apache.org/jira/browse/FLINK-25228
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Ecosystem, Table SQL / 
Planner
Reporter: Francesco Guardiani


Introduce a package to ship test utilities for formats, connectors and end 
users.

This package should provide:

* Assertions for data types, logical types and internal data structures.
* Test cases for formats and connectors

The end goal is to remove the test-jar planner dependency in formats and 
connectors and replace it with this package, so formats and connectors can then 
just depend on table-planner-loader.   



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[DISCUSS][FLINK-24427] Hide Scala from table planner

2021-12-08 Thread Francesco Guardiani
Hi all,
In case you haven't seen, last week I published in the issue comments this
document to explain how we're proceeding to hide Scala from table planner:
https://docs.google.com/document/d/12yDUCnvcwU2mODBKTHQ1xhfOq1ujYUrXltiN_rbhT34/edit?usp=sharing

There is a section I've added yesterday which is particularly relevant,
because it explains the impact on the distribution. I strongly encourage
people to look at it.

Once we perform all the changes, I'm gonna announce them on the user
mailing list as well, together with the package name changes already
brought in by #17897 <https://github.com/apache/flink/pull/17897> to
flink-parquet and flink-orc.

Thanks,
FG

-- 

Francesco Guardiani | Software Engineer

france...@ververica.com


<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH

Registered at Amtsgericht Charlottenburg: HRB 158244 B

Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason,
Jinwei (Kevin) Zhang


[jira] [Created] (FLINK-25158) Fix formatting for true, false and null to uppercase

2021-12-03 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25158:
---

 Summary: Fix formatting for true, false and null to uppercase
 Key: FLINK-25158
 URL: https://issues.apache.org/jira/browse/FLINK-25158
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Francesco Guardiani


All the cast rules using the constant strings {{true}}, {{false}} and {{null}} 
should use {{TRUE}}, {{FALSE}} and {{NULL}} instead.

This behavior should be enabled only if the legacy behavior is not enabled.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25156) DISTINCT is not handled correctly by CastRules

2021-12-03 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25156:
---

 Summary: DISTINCT is not handled correctly by CastRules
 Key: FLINK-25156
 URL: https://issues.apache.org/jira/browse/FLINK-25156
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani


We need to support DISTINCT types in CastRuleProvider, or via some sort of 
meta rule, so that every rule can support them.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25157) Introduce NULL type to string cast rule

2021-12-03 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25157:
---

 Summary: Introduce NULL type to string cast rule
 Key: FLINK-25157
 URL: https://issues.apache.org/jira/browse/FLINK-25157
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-196: Source API stability guarantees

2021-12-03 Thread Francesco Guardiani
Hi Till,

Thanks for starting this discussion. I think it's very beneficial for the
community to have stable APIs, in particular for developing connectors and
formats.

A couple of comments:

> I would suggest that we document these guarantees prominently under
/docs/dev/api_stability.

I think it would be nice if these docs were actually Javadocs, since our
PublicEvolving Javadocs already tend to be quite complete. Perhaps there is a
way we can play around with the javadoc tool to generate only pages for classes
annotated with @Public and @PublicEvolving.

> Test Plan

IMHO this proposal doesn't cover what testing means from the user perspective: I
think a discussion about API stability should go hand in hand with a
discussion about providing TCKs to users. If the interfaces to build a
Source are claimed to be stable, then, from the Flink runtime point of
view, a 3rd party source implementation is expected to exhibit certain
behaviors. On the other hand, as a 3rd party developer, I expect Flink
to give me some means to test whether my source is compliant with those basic
"expected behaviors". This is even more relevant when you're building a
source for Table, where we have well-defined behaviors through our
"abilities" interfaces.

TCKs also have another very important aspect: they serve as very detailed
documentation of all the behaviors that go in and out of a specific interface.

Another thing to mention is that we already have some of these tests; they
just need to be polished to make them publicly available. E.g. look at
JsonBatchFileSystemITCase: it reuses all the test cases from
FileSystemITCaseBase to verify that the format works correctly with the
filesystem source.

A TCK is something that can take quite some time to be considered
"complete", so I'm not at all suggesting getting it done all at once inside
this FLIP. But I think that if we bootstrap it in this FLIP, we can then
progressively add new test cases whenever we add them internally,
or when we need to rework the existing ones.
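
Just to make this more concrete, here is a minimal, purely illustrative sketch of
what such a reusable TCK-style test base could look like (all class and method names
are made up, this is not an existing Flink API; it only assumes JUnit 5 and AssertJ):

import static org.assertj.core.api.Assertions.assertThat;

import java.util.List;
import org.junit.jupiter.api.Test;

/**
 * Hypothetical TCK-style base class: the framework ships the test cases, the
 * 3rd party connector developer only implements the factory/hook methods.
 */
abstract class SourceContractTestBase<T> {

    /** Records the test harness wrote to the external system beforehand. */
    protected abstract List<T> expectedRecords();

    /** Reads records back through the source under test. */
    protected abstract List<T> pollRecords(int expectedCount) throws Exception;

    @Test
    void shouldDeliverAllRecords() throws Exception {
        List<T> expected = expectedRecords();
        List<T> actual = pollRecords(expected.size());

        // The "expected behavior" is encoded once here and reused by every implementation.
        assertThat(actual).containsExactlyInAnyOrderElementsOf(expected);
    }
}

A connector module would subclass this once and get the whole contract test suite for
free, similar to how FileSystemITCaseBase is reused by JsonBatchFileSystemITCase today.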


FG

On Fri, Dec 3, 2021 at 9:32 AM Konstantin Knauf  wrote:

> Hi Till,
>
> Thank you for picking up this topic. Explicit and consistent stability
> guarantees are important for our users as well as any downstream project of
> Apache Flink. Documenting the de-facto status quo and tackling existing
> inconsistencies sounds like a good first step. So, +1 from my side.
>
> Two questions for clarity:
> * Are you planning to implement the required tests via ArchUnit?
> * Fixing existing "test failures" is in the scope of this FLIP, correct?
>
> Cheers,
>
> Konstantin
>
> On Thu, Dec 2, 2021 at 3:48 PM Till Rohrmann  wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion about what kind of API stability
> > guarantees we want to provide to our users. The Flink community already
> > agreed on some stability guarantees, but these guarantees were only
> > communicated via the ML and not properly documented [2]. Moreover, I've
> > seen more and more complaints about breaking changes (some rightful and
> > others not) on the ML recently [3] because we rarely mark our APIs as
> > stable. This motivated this FLIP.
> >
> > The proposal first concentrates on source API stability guarantees.
> Binary
> > stability guarantees are explicitly left out for a follow-up discussion.
> >
> > In essence, the proposal follows our current stability guarantees while
> > updating the guarantees for @PublicEvolving that we are already providing
> > w/o having stated them. Moreover, this FLIP proposes some guidelines for
> > determining the stability guarantees for an API object, how to evolve
> them
> > and some additional requirements for the return and parameter types of
> > methods.
> >
> > All in all, I hope that this proposal is more reflecting our current
> > understanding of stability guarantees than being controversial. One of
> the
> > outcomes of this FLIP should be to properly document these guarantees so
> > that it is easily discoverable and understandable for our users.
> Moreover,
> > I hope that we can provide more stable APIs our users can rely and build
> > upon.
> >
> > There will be a follow-up FLIP discussing the problem of how to make sure
> > that APIs become stable over time.
> >
> > Looking forward to your feedback.
> >
> > [1] https://cwiki.apache.org/confluence/x/IJeqCw
> > [2] https://lists.apache.org/thread/5jm25783oq5svyk7rr8g1gly2ooxqhjr
> > [3] https://lists.apache.org/thread/kzhfc3t6omzo2kyo8zj9yxoh8twq5fzr
> >
> > Cheers,
> > Till
> >
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


Re: [ANNOUNCE] New Apache Flink Committer - Ingo Bürk

2021-12-02 Thread Francesco Guardiani
Congrats Ingo!

On Thu, Dec 2, 2021 at 4:58 PM Nicolaus Weidner <
nicolaus.weid...@ververica.com> wrote:

> Congrats Ingo!
> The PMC probably realized that it's simply too much work to review and
> merge all your PRs, so now you can/have to do part of that work yourself
> ;-)
>
> Best,
> Nico
>
> On Thu, Dec 2, 2021 at 4:50 PM Fabian Paul  wrote:
>
> > Thanks for always pushing Ingo. Congratulations!
> >
> > Best,
> > Fabian
> >
> > On Thu, Dec 2, 2021 at 4:24 PM Till Rohrmann 
> wrote:
> > >
> > > Hi everyone,
> > >
> > > On behalf of the PMC, I'm very happy to announce Ingo Bürk as a new
> Flink
> > > committer.
> > >
> > > Ingo has started contributing to Flink since the beginning of this
> year.
> > He
> > > worked mostly on SQL components. He has authored many PRs and helped
> > review
> > > a lot of other PRs in this area. He actively reported issues and helped
> > our
> > > users on the MLs. His most notable contributions were Support SQL 2016
> > JSON
> > > functions in Flink SQL (FLIP-90), Register sources/sinks in Table API
> > > (FLIP-129) and various other contributions in the SQL area. Moreover,
> he
> > is
> > > one of the few people in our community who actually understands Flink's
> > > frontend.
> > >
> > > Please join me in congratulating Ingo for becoming a Flink committer!
> > >
> > > Cheers,
> > > Till
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Matthias Pohl

2021-12-02 Thread Francesco Guardiani
Congrats Matthias!

On Thu, Dec 2, 2021 at 4:53 PM Nicolaus Weidner <
nicolaus.weid...@ververica.com> wrote:

> Congrats Matthias, well deserved!
>
> Best,
> Nico
>
> On Thu, Dec 2, 2021 at 4:48 PM Fabian Paul  wrote:
>
> > Congrats and well deserved.
> >
> > Best,
> > Fabian
> >
> > On Thu, Dec 2, 2021 at 4:42 PM Ingo Bürk  wrote:
> > >
> > > Congrats, Matthias!
> > >
> > > On Thu, Dec 2, 2021 at 4:28 PM Till Rohrmann 
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > On behalf of the PMC, I'm very happy to announce Matthias Pohl as a
> new
> > > > Flink committer.
> > > >
> > > > Matthias has worked on Flink since August last year. He helped review
> > a ton
> > > > of PRs. He worked on a variety of things but most notably the
> tracking
> > and
> > > > reporting of concurrent exceptions, fixing HA bugs and deprecating
> and
> > > > removing our Mesos support. He actively reports issues helping Flink
> to
> > > > improve and he is actively engaged in Flink's MLs.
> > > >
> > > > Please join me in congratulating Matthias for becoming a Flink
> > committer!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> >
>


[jira] [Created] (FLINK-25131) Update sql client to ship flink-table-planner-loader instead of flink-table-planner

2021-12-01 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25131:
---

 Summary: Update sql client to ship flink-table-planner-loader 
instead of flink-table-planner
 Key: FLINK-25131
 URL: https://issues.apache.org/jira/browse/FLINK-25131
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Client
Reporter: Francesco Guardiani


See the parent task for more details.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25130) Update flink-table-uber to ship flink-table-planner-loader instead of flink-table-planner

2021-12-01 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25130:
---

 Summary: Update flink-table-uber to ship 
flink-table-planner-loader instead of flink-table-planner
 Key: FLINK-25130
 URL: https://issues.apache.org/jira/browse/FLINK-25130
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Francesco Guardiani


This should also be tested by the sql client.

This change should be tested then by our e2e tests, specifically 
flink-end-to-end-tests/flink-batch-sql-test.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25129) Update docs and examples to use flink-table-planner-loader instead of flink-table-planner

2021-12-01 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25129:
---

 Summary: Update docs and examples to use 
flink-table-planner-loader instead of flink-table-planner
 Key: FLINK-25129
 URL: https://issues.apache.org/jira/browse/FLINK-25129
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Examples, Table SQL / API
Reporter: Francesco Guardiani


For more details 
https://docs.google.com/document/d/12yDUCnvcwU2mODBKTHQ1xhfOq1ujYUrXltiN_rbhT34/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25128) Introduce flink-table-planner-loader

2021-12-01 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25128:
---

 Summary: Introduce flink-table-planner-loader
 Key: FLINK-25128
 URL: https://issues.apache.org/jira/browse/FLINK-25128
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani


For more details, see 
https://docs.google.com/document/d/12yDUCnvcwU2mODBKTHQ1xhfOq1ujYUrXltiN_rbhT34/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25114) Remove flink-scala dependency from flink-table-runtime

2021-11-30 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25114:
---

 Summary: Remove flink-scala dependency from flink-table-runtime
 Key: FLINK-25114
 URL: https://issues.apache.org/jira/browse/FLINK-25114
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


flink-scala should not be necessary anymore to flink-table-runtime. We should 
try to remove it in order to help with the parent task 
[FLINK-24427|https://issues.apache.org/jira/browse/FLINK-24427]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25113) Cleanup from Parquet and Orc the partition key handling logic

2021-11-30 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25113:
---

 Summary: Cleanup from Parquet and Orc the partition key handling 
logic
 Key: FLINK-25113
 URL: https://issues.apache.org/jira/browse/FLINK-25113
 Project: Flink
  Issue Type: Technical Debt
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Francesco Guardiani


After https://issues.apache.org/jira/browse/FLINK-24617 the partition key 
handling logic is encapsulated within {{FileInfoExtractorBulkFormat}}. We should 
clean up this logic from the Orc and Parquet formats in order to simplify them. Note: 
Hive still depends on this logic, but it should rather use 
{{FileInfoExtractorBulkFormat}} or similar.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25090) Run the assertj conversion script to convert assertions

2021-11-29 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25090:
---

 Summary: Run the assertj conversion script to convert assertions
 Key: FLINK-25090
 URL: https://issues.apache.org/jira/browse/FLINK-25090
 Project: Flink
  Issue Type: Bug
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


See https://assertj.github.io/doc/#assertj-migration-using-regexes for more 
details



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25079) Add assertj assertions for table types

2021-11-26 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25079:
---

 Summary: Add assertj assertions for table types
 Key: FLINK-25079
 URL: https://issues.apache.org/jira/browse/FLINK-25079
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25075) Remove reflection to instantiate PlannerExpressionParser

2021-11-26 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25075:
---

 Summary: Remove reflection to instantiate PlannerExpressionParser
 Key: FLINK-25075
 URL: https://issues.apache.org/jira/browse/FLINK-25075
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


This reflection leaks the planner module classpath and can cause issues when 
isolating the classpath.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-11-25 Thread Francesco Guardiani
Hi Timo,

Thanks for putting this amazing work together. I have some considerations/questions 
about the FLIP:
*Proposed changes #6*: Other than defining this rule of thumb, we must also make sure 
that compiling a plan containing objects that cannot be serialized in the plan fails 
hard, so users don't bite themselves with such issues, or at least we need to output 
warning logs. In general, whenever the user is using the CompiledPlan APIs and at the 
same time is trying to do something "illegal" for the plan, we should immediately 
either log or fail, depending on the issue, in order to avoid any surprises once the 
user upgrades. I would say the same for things like registering a function, 
registering a DataStream, and every other thing which won't end up in the plan: we 
should log such info to the user by default.
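
To sketch what I mean (purely illustrative code, not Flink internals; all names here
are made up):

// Hypothetical sketch of the "fail hard or warn" policy described above.
enum PlanCompatibilityIssue { NON_SERIALIZABLE_OBJECT, INLINE_DATASTREAM, UNREGISTERED_FUNCTION }

final class PlanValidator {
    private static final java.util.logging.Logger LOG =
            java.util.logging.Logger.getLogger(PlanValidator.class.getName());

    void check(PlanCompatibilityIssue issue, String detail) {
        switch (issue) {
            case NON_SERIALIZABLE_OBJECT:
                // Objects that cannot be persisted in the plan: fail hard at compile time.
                throw new IllegalStateException(
                        "Cannot compile plan: " + detail + " cannot be serialized into the plan");
            case INLINE_DATASTREAM:
            case UNREGISTERED_FUNCTION:
            default:
                // Things that simply won't end up in the plan: tell the user by default.
                LOG.warning(detail + " will not be part of the compiled plan");
        }
    }
}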

*General JSON Plan Assumptions #9:* When thinking about connectors and formats, I 
think it's reasonable to assume, and keep out of the feature design, that no 
feature/ability can be deleted from a connector/format. I also don't think new 
features/abilities can influence this FLIP, given the plan is static: if, for example, 
MyCoolTableSink implements SupportsProjectionsPushDown in the next Flink version, it 
shouldn't be a problem for the upgrade story, since the plan is still configured as 
computed by the previous Flink version. What worries me are breaking changes, in 
particular behavioural changes that might happen in connectors/formats. Although this 
argument doesn't seem relevant for the connectors shipped by the Flink project itself, 
because we try to keep them as stable as possible and avoid breaking changes, it does 
matter for external connectors and formats, which might be decoupled from the Flink 
release cycle and might have different backward compatibility guarantees. It's totally 
reasonable if we don't want to tackle this in the first iteration of the feature, but 
it's something we need to keep in mind for the future.


*Functions:* It's not clear to me what you mean by "identifier", because somewhere 
else in the same context you talk about "name". Are we talking about the function name 
or the complete function signature? Let's assume, for example, we have these function 
definitions:


* TO_TIMESTAMP_LTZ(BIGINT)
* TO_TIMESTAMP_LTZ(STRING)
* TO_TIMESTAMP_LTZ(STRING, STRING)

These are, for me, very different functions with different implementations, where 
each of them might evolve separately at a different pace. Hence, when we store them in 
the JSON plan, we should perhaps use a logically defined unique id like 
bigIntToTimestamp, stringToTimestamp and stringToTimestampWithFormat. This also solves 
the issue of correctly referencing the functions when restoring the plan, without 
running the inference logic again (which might have changed in the meantime), and it 
might also solve versioning, in that the function identifier can contain the function 
version, like stringToTimestampWithFormat_1_1 or stringToTimestampWithFormat_1_2. An 
alternative could be to use the string signature representation, which might not be 
trivial to compute, given the complexity of our type inference logic.

*The term "JSON plan"*: I think we should rather keep JSON out of the concept 
and just 
name it "Compiled Plan" (like the proposed API) or something similar, as I see 
how in 
future we might decide to support/modify our persistence format to something 
more 
efficient storage wise like BSON. For example, I would rename /
CompiledPlan.fromJsonFile/ to simply /CompiledPlan.fromFile/.

*Who is the owner of the plan file?* I asked myself this question when reading 
this:


> For simplification of the design, we assume that upgrades use a step size of 
> a single 
minor version. We don't guarantee skipping minor versions (e.g. 1.11 to 1.14).

My understanding of this statement is that a user can upgrade between minors and, by 
following all the minors, keep the same query up and running. E.g. I upgrade from 1.15 
to 1.16, and then from 1.16 to 1.17, and I still expect my original query to work 
without recomputing the plan. This necessarily means that at some point in future 
releases we'll need some basic "migration" tool to keep the queries up and running, 
which ends up modifying the compiled plan. So I guess Flink should write it back to 
the original plan file, perhaps keeping a backup of the previous one? Can you please 
clarify this aspect?

Apart from these considerations, the proposal looks good to me and I'm eager to see 
it in play.

Thanks,
FG


-- 
Francesco Guardiani | Software Engineer
france...@ververica.com

Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

[jira] [Created] (FLINK-25061) Add assertj as dependency in flink-parent

2021-11-25 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25061:
---

 Summary: Add assertj as dependency in flink-parent
 Key: FLINK-25061
 URL: https://issues.apache.org/jira/browse/FLINK-25061
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


We recently discussed test assertions on the ML 
(https://lists.apache.org/thread/33t7hz8w873p1bc5msppk65792z08rgt), and came to 
the conclusion that we want to encourage contributions to use assertj as 
much as possible. In order to do that, we should add assertj to the flink-parent 
pom as a test dependency.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25060) Replace DataType.projectFields with Projection type

2021-11-25 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25060:
---

 Summary: Replace DataType.projectFields with Projection type
 Key: FLINK-25060
 URL: https://issues.apache.org/jira/browse/FLINK-25060
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


FLINK-24399 introduced new methods to perform data type projections in 
DataType. Note: no release included these changes yet.

FLINK-24776 introduced a new, more powerful type to perform operations on 
projections: not only projecting types, but also computing differences and 
complements.

To avoid providing different entry points for the same functionality, we should 
clean up the new methods introduced by FLINK-24399 and replace them with the new 
Projection type. We should also deprecate the corresponding functions in 
DataTypeUtils.
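
A rough usage sketch of the direction described above (the exact package, class and 
method names are assumptions based on the FLINK-24776 description, not verified 
against a release):

{code:java}
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.connector.Projection;
import org.apache.flink.table.types.DataType;

public class ProjectionExample {
    public static void main(String[] args) {
        DataType rowType =
                DataTypes.ROW(
                        DataTypes.FIELD("id", DataTypes.BIGINT()),
                        DataTypes.FIELD("name", DataTypes.STRING()),
                        DataTypes.FIELD("ts", DataTypes.TIMESTAMP(3)));

        // Select fields 0 and 2 ("id" and "ts"), replacing DataType.projectFields(...).
        Projection projection = Projection.of(new int[] {0, 2});
        DataType projected = projection.project(rowType);

        // Besides projecting, the type is also meant to offer set-like operations
        // on projections, such as difference and complement.
        System.out.println(projected);
    }
}
{code}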



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25052) Port row to row cast logic to CastRule

2021-11-25 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25052:
---

 Summary: Port row to row cast logic to CastRule
 Key: FLINK-25052
 URL: https://issues.apache.org/jira/browse/FLINK-25052
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25051) Port raw <-> binary logic to CastRule

2021-11-25 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25051:
---

 Summary: Port raw <-> binary logic to CastRule
 Key: FLINK-25051
 URL: https://issues.apache.org/jira/browse/FLINK-25051
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani


More details on the parent task



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Conventions on assertions to use in tests

2021-11-25 Thread Francesco Guardiani
This is the result of experimenting with creating custom assertions for Table API 
types: 
https://github.com/slinkydeveloper/flink/commit/d1ce37a62c2200b2c3008a9cc2cac91234222fd5. 
I will PR it once the two PRs in the previous mail get merged.
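
For reference, a minimal sketch of what a custom AssertJ assertion for Row can look
like (this is just the standard AssertJ extension pattern, not necessarily what the
commit above does):

import static org.assertj.core.api.Assertions.assertThat;

import org.apache.flink.types.Row;
import org.assertj.core.api.AbstractAssert;

/** Minimal sketch of a custom AssertJ assertion for Row. */
public class RowAssert extends AbstractAssert<RowAssert, Row> {

    private RowAssert(Row actual) {
        super(actual, RowAssert.class);
    }

    public static RowAssert assertThatRow(Row actual) {
        return new RowAssert(actual);
    }

    public RowAssert hasArity(int expectedArity) {
        isNotNull();
        if (actual.getArity() != expectedArity) {
            failWithMessage(
                    "Expected row arity to be <%s> but was <%s>",
                    expectedArity, actual.getArity());
        }
        return this;
    }

    public RowAssert hasFieldEqualTo(int pos, Object expectedValue) {
        isNotNull();
        assertThat(actual.getField(pos)).as("field at position %d", pos).isEqualTo(expectedValue);
        return this;
    }
}

A test can then read like assertThatRow(Row.of(1, "a")).hasArity(2).hasFieldEqualTo(1, "a").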

On Monday, 22 November 2021 17:59:29 CET Francesco Guardiani wrote:
> Hi all,
> 
> Given I see generally consensus around having a convention and using
> assertj, I propose to merge these 2 PRs:
> 
> * Add the explanation of this convention in our code quality guide:
> https://github.com/apache/flink-web/pull/482
> * Add assertj to dependency management in the parent pom and link in the PR
> template the code quality guide: https://github.com/apache/flink/pull/17871
> 
> WDYT?
> 
> Once we merge those, I'll work in the next days to add some custom
> assertions in table-common for RowData and Row (commonly asserted
> everywhere in the table codebase).
> 
> @Matthias Pohl  about the confluence page, it seems
> a bit outdated, judging from the last modified date. I propose to continue
> to use this guide
> https://flink.apache.org/contributing/code-style-and-quality-common.html as
> it seems more complete.
> 
> 
> On Mon, Nov 22, 2021 at 8:58 AM Matthias Pohl 
> 
> wrote:
> > Agree. Clarifying once more what our preferred option is here, is a good
> > idea. So, +1 for unification. I don't have a strong opinion on what
> > framework to use. But we may want to add this at the end of the discussion
> > to our documentation (e.g. [1] or maybe the PR description?) to make users
> > aware of it and be able to provide a reference in case it comes up again
> > (besides this ML thread). Or do we already have something like that
> > somewhere in the docs where I missed it?
> > 
> > Matthias
> > 
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Best+Practices+and+Lesso
> > ns+Learned> 
> > On Wed, Nov 17, 2021 at 11:13 AM Marios Trivyzas  wrote:
> >> I'm also +1 both for unification and specifically for assertJ.
> >> I think it covers a wide variety of assertions and as Francesco mentioned
> >> it's easily extensible, so that
> >> we can create custom assertions where needed, and avoid repeating test
> >> code.
> >> 
> >> On Tue, Nov 16, 2021 at 9:57 AM David Morávek  wrote:
> >> > I don't have any strong opinions on the asserting framework that we
> >> > use,
> >> > but big +1 for the unification.
> >> > 
> >> > Best,
> >> > D.
> >> > 
> >> > On Tue, Nov 16, 2021 at 9:37 AM Till Rohrmann 
> >> > 
> >> > wrote:
> >> > > Using JUnit5 with assertJ is fine with me if the community agrees.
> >> 
> >> Having
> >> 
> >> > > guides for best practices would definitely help with the transition.
> >> > > 
> >> > > Cheers,
> >> > > Till
> >> > > 
> >> > > On Mon, Nov 15, 2021 at 5:34 PM Francesco Guardiani <
> >> > > france...@ververica.com>
> >> > > 
> >> > > wrote:
> >> > > > > It is a bit unfortunate that we have tests that follow different
> >> > > > 
> >> > > > patterns.
> >> > > > This, however, is mainly due to organic growth. I think the
> >> 
> >> community
> >> 
> >> > > > started with Junit4, then we chose to use Hamcrest because of its
> >> > 
> >> > better
> >> > 
> >> > > > expressiveness.
> >> > > > 
> >> > > > That is fine, I'm sorry if my mail felt like a rant :)
> >> > > > 
> >> > > > > Personally, I don't have a strong preference for which testing
> >> 
> >> tools
> >> 
> >> > to
> >> > 
> >> > > > use. The important bit is that we agree as a community, then
> >> 
> >> document
> >> 
> >> > the
> >> > 
> >> > > > choice and finally stick to it. So before starting to use assertj,
> >> 
> >> we
> >> 
> >> > > > should probably align with the folks working on the Junit5 effort
> >> > 
> >> > first.
> >> > 
> >> > > > As Arvid pointed out, using assertj might help the people working

Re: [DISCUSS] Deprecate Java 8 support

2021-11-25 Thread Francesco Guardiani
+1 to what both Ingo and Matthias said. Personally, I cannot wait to start using some 
of the APIs introduced in Java 9, and I'm pretty sure that's the same for our users as 
well.

On Tuesday, 23 November 2021 13:35:07 CET Ingo Bürk wrote:
> Hi everyone,
> 
> continued support for Java 8 can also create project risks, e.g. if a
> vulnerability arises in Flink's dependencies and we cannot upgrade them
> because they no longer support Java 8. Some projects already started
> deprecating support as well, like Kafka, and other projects will likely
> follow.
> Let's also keep in mind that the proposal here is not to drop support right
> away, but to deprecate it, send the message, and motivate users to start
> migrating. Delaying this process could ironically mean users have less time
> to prepare for it.
> 
> 
> Ingo
> 
> On Tue, Nov 23, 2021 at 8:54 AM Matthias Pohl 
> 
> wrote:
> > Thanks for constantly driving these maintenance topics, Chesnay. +1 from
> > my
> > side for deprecating Java 8. I see the point Jingsong is raising. But I
> > agree with what David already said here. Deprecating the Java version is a
> > tool to make users aware of it (same as starting this discussion thread).
> > If there's no major opposition against deprecating it in the community we
> > should move forward in this regard to make the users who do not
> > regularly browse the mailing list aware of it. That said, deprecating Java
> > 8 in 1.15 does not necessarily mean that it is dropped in 1.16.
> > 
> > Best,
> > Matthias
> > 
> > On Tue, Nov 23, 2021 at 8:46 AM David Morávek  wrote:
> > > Thank you Chesnay for starting the discussion! This will generate bit of
> > 
> > a
> > 
> > > work for some users, but it's a good thing to keep moving the project
> > > forward. Big +1 for this.
> > > 
> > > Jingsong:
> > > 
> > > Receiving this signal, the user may be unhappy because his application
> > > 
> > > > may be all on Java 8. Upgrading is a big job, after all, many systems
> > > > have not been upgraded yet. (Like you said, HBase and Hive)
> > > 
> > > The whole point of deprecation is to raise awareness, that this will be
> > > happening eventually and users should take some steps to address this in
> > > medium-term. If I understand Chesnay correctly, we'd still keep Java 8
> > > around for quite some time to give users enough time to upgrade, but
> > > without raising awareness we'd fight the very same argument later in
> > 
> > time.
> > 
> > > All of the prerequisites from 3rd party projects for both HBase [1] and
> > > Hive [2] to fully support Java 11 have been completed, so the ball is on
> > > their side and there doesn't seem to be much activity. Generating bit
> > 
> > more
> > 
> > > pressure on these efforts might be a good thing.
> > > 
> > > It would be great to identify some of these users and learn bit more
> > 
> > about
> > 
> > > their situation. Are they keeping up with latest Flink developments or
> > 
> > are
> > 
> > > they lagging behind (this would also give them way more time for
> > > eventual
> > > upgrade)?
> > > 
> > > [1] https://issues.apache.org/jira/browse/HBASE-22972
> > > [2] https://issues.apache.org/jira/browse/HIVE-22415
> > > 
> > > Best,
> > > D.
> > > 
> > > On Tue, Nov 23, 2021 at 3:08 AM Jingsong Li 
> > > 
> > > wrote:
> > > > Hi Chesnay,
> > > > 
> > > > Thanks for bringing this for discussion.
> > > > 
> > > > We should dig deeper into the current Java version of Flink users. At
> > > > least make sure Java 8 is not a mainstream version.
> > > > 
> > > > Receiving this signal, the user may be unhappy because his application
> > > > may be all on Java 8. Upgrading is a big job, after all, many systems
> > > > have not been upgraded yet. (Like you said, HBase and Hive)
> > > > 
> > > > In my opinion, it is too early to deprecate support for Java 8. We
> > > > should wait for a safer point in time.
> > > > 
> > > > On Mon, Nov 22, 2021 at 11:45 PM Ingo Bürk  wrote:
> > > > > Hi,
> > > > > 
> > > > > also a +1 from me because of everything Chesnay already said.
> > > > > 
> > > > > 
> > > > > Ingo
> > > > > 
> > > > > On Mon, Nov 22, 2021 at 4:41 PM Martijn Visser <
> > 
> > mart...@ververica.com>

Re: [DISCUSS] Conventions on assertions to use in tests

2021-11-22 Thread Francesco Guardiani
Hi all,

Given I see generally consensus around having a convention and using
assertj, I propose to merge these 2 PRs:

* Add the explanation of this convention in our code quality guide:
https://github.com/apache/flink-web/pull/482
* Add assertj to dependency management in the parent pom and link in the PR
template the code quality guide: https://github.com/apache/flink/pull/17871

WDYT?

Once we merge those, I'll work in the next days to add some custom
assertions in table-common for RowData and Row (commonly asserted
everywhere in the table codebase).

@Matthias Pohl  about the confluence page, it seems
a bit outdated, judging from the last modified date. I propose to continue
to use this guide
https://flink.apache.org/contributing/code-style-and-quality-common.html as
it seems more complete.


On Mon, Nov 22, 2021 at 8:58 AM Matthias Pohl 
wrote:

> Agree. Clarifying once more what our preferred option is here, is a good
> idea. So, +1 for unification. I don't have a strong opinion on what
> framework to use. But we may want to add this at the end of the discussion
> to our documentation (e.g. [1] or maybe the PR description?) to make users
> aware of it and be able to provide a reference in case it comes up again
> (besides this ML thread). Or do we already have something like that
> somewhere in the docs where I missed it?
>
> Matthias
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Best+Practices+and+Lessons+Learned
>
> On Wed, Nov 17, 2021 at 11:13 AM Marios Trivyzas  wrote:
>
>> I'm also +1 both for unification and specifically for assertJ.
>> I think it covers a wide variety of assertions and as Francesco mentioned
>> it's easily extensible, so that
>> we can create custom assertions where needed, and avoid repeating test
>> code.
>>
>> On Tue, Nov 16, 2021 at 9:57 AM David Morávek  wrote:
>>
>> > I don't have any strong opinions on the asserting framework that we use,
>> > but big +1 for the unification.
>> >
>> > Best,
>> > D.
>> >
>> > On Tue, Nov 16, 2021 at 9:37 AM Till Rohrmann 
>> > wrote:
>> >
>> > > Using JUnit5 with assertJ is fine with me if the community agrees.
>> Having
>> > > guides for best practices would definitely help with the transition.
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Mon, Nov 15, 2021 at 5:34 PM Francesco Guardiani <
>> > > france...@ververica.com>
>> > > wrote:
>> > >
>> > > > > It is a bit unfortunate that we have tests that follow different
>> > > > patterns.
>> > > > This, however, is mainly due to organic growth. I think the
>> community
>> > > > started with Junit4, then we chose to use Hamcrest because of its
>> > better
>> > > > expressiveness.
>> > > >
>> > > > That is fine, I'm sorry if my mail felt like a rant :)
>> > > >
>> > > > > Personally, I don't have a strong preference for which testing
>> tools
>> > to
>> > > > use. The important bit is that we agree as a community, then
>> document
>> > the
>> > > > choice and finally stick to it. So before starting to use assertj,
>> we
>> > > > should probably align with the folks working on the Junit5 effort
>> > first.
>> > > >
>> > > > As Arvid pointed out, using assertj might help the people working on
>> > the
>> > > > junit5 effort as well, since assertj works seamlessly with junit4,
>> > junit5
>> > > > and even other java testing frameworks.
>> > > >
>> > > > > But I'm not sure if it's wise to change everything at once also
>> > > > from the perspective of less active contributors. We may alleviate
>> that
>> > > > pain by providing good guides though. So maybe, we should also
>> include
>> > a
>> > > > temporal dimension into the discussion.
>> > > >
>> > > > This is why I'm proposing a convention and not a rewrite of all the
>> > tests
>> > > > at once, that's unfeasible. As you sad, we can provide guides, like
>> in
>> > > our
>> > > > contribution guides, explaining our assertion convention, that is
>> use
>> > > > assertj or whatever other library we want to use and how. So then we
>> > can
>> > > > ask contributors to use such assertion convention when they PR new
>> > tests
>> > > or
>> > > 

Re: [DISCUSS] Shall casting functions return null or throw exceptions for invalid input

2021-11-22 Thread Francesco Guardiani
> NULL in SQL essentially means "UNKNOWN", it's not as scary as a null in
java which will easily cause a NPE or some random behavior with a c++
function call.

This is true from the user point of view, except our runtime doesn't treat
null as some value where you can safely execute operations and get "noop"
results. In our runtime null is Java's null, hence causing issues and
generating NPEs here and there when nulls are not expected.

> It will really create a big mess after users upgrade their SQL jobs

This is what I don't really understand here: how does adding a configuration
option cause issues? We make it very clear in our release notes that
you need to switch that flag if you're relying on this behavior, and that's
it: if you reprocess jobs every time you upgrade, you just flip the switch
before reprocessing and you won't have any issues. If you don't (because you
use the hybrid source), either you upgrade your query or you flip the flag,
and in both cases this shouldn't generate any issue.
Since it's a big change, I also expect to keep this flag for some releases,
at least up to Flink 2.

On Sat, Nov 20, 2021 at 7:25 AM Kurt Young  wrote:

> Hi Francesco,
>
> Thanks for sharing your opinion about this and examples with other
> programming
> languages. I just want to mention, that NULL in SQL world is a bit
> different with the
> meaning in programming languages like java.
>
> NULL in SQL essentially means "UNKNOWN", it's not as scary as a null in
> java which
> will easily cause a NPE or some random behavior with a c++ function call.
> UNKNOWN
> means it could be any value. In java, the condition "null == null" always
> return true. But
> in SQL, it returns NULL, which means UNKNOWN.
>
> Another example, if you run following statements:
> select 'true' where 3 in (1, 2, 3, null) // this will print true
> select 'true' where 3 not in (1, 2, null) // this won't print anything
>
> In summary, SQL's NULL is a bit different from others, it has its own
> meaning. So I won't
> compare the behavior of returning NULL with programming languages and then
> judge it
> as bad behavior. And it's not a very big deal if we return NULL when trying
> to cast "abc"
> to an integer, which means we don't know the correct value.
>
> But still, I'm ok to change the behavior, but just not now. It will really
> create a big mess after
> users upgrade their SQL jobs. I'm either fine to do it in some really big
> version change like
> Flink 2.0, or we can do it after we have some universal error records
> handling mechanism, so
> in that way, users could have a chance to handle such a situation.
>
> Best,
> Kurt
>
>
> On Fri, Nov 19, 2021 at 7:29 PM Francesco Guardiani <
> france...@ververica.com>
> wrote:
>
> > Hi all,
> >
> > tl;dr:
> >
> > I think Timo pretty much said it all. As described in the issue, my
> > proposal is:
> >
> > * Let's switch the default behavior of CAST to fail
> > * Let's add TRY_CAST to have the old behavior
> > * Let's add a rule (disabled by default) that wraps every CAST in a TRY,
> in
> > order to keep the old behavior.
> > * Let's put a giant warning in the release notes explaining to enable the
> > rule, in case you're depending on the old behavior
> >
> > This way, we break no SQL scripts, as you can apply this flag to every
> > previously running script. We can also think to another strategy, more
> than
> > the planner rule, to keep the old behavior, always behind a flag disabled
> > by default.
> >
> > Timing of this proposal is also crucial, since CAST is a basic primitive
> of
> > our language and, after we have the upgrade story in place, this is going
> > to be a whole more harder to deal with.
> >
> > And I would say that in the next future, we should start thinking to
> > support proper error handling strategies, that is:
> >
> > * How users are supposed to handle records that fails an expression
> > computation, aggregation, etc?
> > * Can we provide some default strategies, like log and discard, send to a
> > dead letter queue?
> >
> > Now let me go a bit more deep in the reason we really need such change:
> >
> > For me the issue is not really about being compliant with the SQL
> standard
> > or not, or the fact that other databases behaves differently from us, but
> > the fact that the CAST function we have right now is effectively a
> footgun
> > <https://en.wiktionary.org/wiki/footgun> for our users.
> > The concept of casting one value to another inherently involves some
> > concept of failure, this is something as a programmer I

Re: [DISCUSS] Shall casting functions return null or throw exceptions for invalid input

2021-11-19 Thread Francesco Guardiani
Hi all,

tl;dr:

I think Timo pretty much said it all. As described in the issue, my
proposal is:

* Let's switch the default behavior of CAST to fail
* Let's add TRY_CAST to have the old behavior
* Let's add a rule (disabled by default) that wraps every CAST in a TRY, in
order to keep the old behavior.
* Let's put a giant warning in the release notes explaining how to enable the
rule, in case you're depending on the old behavior

This way, we break no SQL scripts, as you can apply this flag to every
previously running script. We can also think of another strategy, other than
the planner rule, to keep the old behavior, always behind a flag that is
disabled by default.
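
To make the proposed split concrete, here is a small sketch of the intended
difference (TRY_CAST does not exist yet at the time of writing, so this reflects
the proposal rather than current behavior; the values are just for illustration):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CastProposalExample {
    public static void main(String[] args) {
        TableEnvironment env =
                TableEnvironment.create(
                        EnvironmentSettings.newInstance().inStreamingMode().build());

        // Proposed default: CAST raises an error at runtime on invalid input
        // (today it silently returns NULL instead).
        env.executeSql("SELECT CAST('not a number' AS INT)").print();

        // Proposed TRY_CAST: explicitly opts into the old behavior and returns NULL.
        env.executeSql("SELECT TRY_CAST('not a number' AS INT)").print();
    }
}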

Timing of this proposal is also crucial, since CAST is a basic primitive of
our language and, once we have the upgrade story in place, this is going
to be a whole lot harder to deal with.

And I would say that in the near future, we should start thinking about
supporting proper error handling strategies, that is:

* How are users supposed to handle records that fail an expression
computation, aggregation, etc.?
* Can we provide some default strategies, like log-and-discard or send to a
dead letter queue?

Now let me go a bit deeper into the reason we really need such a change:

For me the issue is not really about being compliant with the SQL standard
or not, or the fact that other databases behave differently from us, but
the fact that the CAST function we have right now is effectively a footgun
<https://en.wiktionary.org/wiki/footgun> for our users.
The concept of casting one value to another inherently involves a notion of
failure; this is something I expect as a programmer, exactly
like when dividing a value by 0 or when sending a message to an external
system. And this is why every programming language has some explicit way to
signal to you that such failures exist, and some of them even force you to
deal with such failures, e.g. Java has exceptions and the try/catch block,
Rust has the ? operator, Go returns you an error together with the
result. Not failing when a failure is inherently defined by the operation
itself, or not even being explicit about the fact that such an operation can
fail, leads users to mistakenly think the operation they're doing is always
safe and cannot lead to failures. And this is IMHO really the problem with
the CAST primitive we have right now: it has a concept of failure but we
shade it, we're not even explicit about the fact that we're shading it
[1], and we expect users to go to the documentation, read that CAST can
return NULL, and then deal with it. I even question why, for
example, we return NULL rather than a sane default value when an exception
happens; e.g. it would be way better to return epoch 0 rather than NULL when
failing a cast from STRING to TIMESTAMP. This is, for example, the approach
taken by Go: the string-to-timestamp parsing function returns both a
value and an error.

And the cherry on top is that, for our users, the consequences of the bad
usage of CAST are simply disastrous: best case, some operator fails with an
NPE; worst case, you get bad results or some data is even lost down the
pipeline. We give no indication at all that the cast failed, and even if we
pushed a change to log "hey, this cast failed on this record", it would still
be extremely complicated to track down how badly a single cast failure
affected the results of a projection, a grouping, an aggregation, etc.
Hence my definition of our CAST function as a footgun.

The bottom line for me is that our CAST primitive goes directly against the
goal of Flink SQL to provide a simple-to-use API for developers and
business people to develop computation pipelines, because it's not
explicit, it silently fails with NULLs, and we require users to deal with it.

The very same discussion applies to TO_TIMESTAMP, which among the others
might even be more crucial, because we use it directly in our documentation
to tell users how to compute the rowtime.

FG

[1] Note: here the naming is a fundamental part of the issue: the function
we have today is named CAST, not TRY_CAST or CAST_OR_NULL or any other
name indicating that the operation might fail and produce a result
different from the cast result.


On Fri, Nov 19, 2021 at 4:00 AM Kurt Young  wrote:

> Hi Timo,
>
> Regarding CAST, I think no one denies the standard behavior which should
> raise errors when
> failed. The only question is how do we solve it, given lots of users
> already relying on current
> more tolerant behavior. Some violation of standard but acceptable behavior
> doesn't deserve
> a breaking change in Flink minor version IMO, i'm more comfortable to fix
> it in versions like
> Flink 2.0.
>
> Best,
> Kurt
>
>
> On Thu, Nov 18, 2021 at 11:44 PM Timo Walther  wrote:
>
> > Hi everyone,
> >
> >
> > thanks for finally have this discussion on the mailing list. As both a
> > contributor and user, I have experienced a couple issues around
> > nullability coming out of 

[jira] [Created] (FLINK-24924) TO_TIMESTAMP and TO_DATE should fail

2021-11-16 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-24924:
---

 Summary: TO_TIMESTAMP and TO_DATE should fail
 Key: FLINK-24924
 URL: https://issues.apache.org/jira/browse/FLINK-24924
 Project: Flink
  Issue Type: Sub-task
Reporter: Francesco Guardiani


In a similar fashion to what is described in 
https://issues.apache.org/jira/browse/FLINK-24385, TO_TIMESTAMP and TO_DATE 
should fail instead of returning null.

In particular for these two functions, a failure in parsing could lead to very 
unexpected behavior, for example it could lead to records with null rowtime.

We should change these functions to fail by default when parsing generates an 
error. We can let users handle errors by letting them use TRY_CAST for the same 
functionality:

{code:sql}
-- This fails when input is invalid
TO_TIMESTAMP(input)

-- Behaves the same as above
CAST(input AS TIMESTAMP)

-- This returns null when input is invalid
TRY_CAST(input AS TIMESTAMP)
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Definition of Done for Apache Flink

2021-11-16 Thread Francesco Guardiani
+1 to Ingo's proposal: the goal of the template should be to help the developer
do a self-check of their PR quality, not to define whether something
is done or not. It's up to the committer to check that the "definition of
done" is fulfilled.

> The Definition of Done as suggested:

This checklist makes sense to me, although it seems to me we already have
these bullet points defined here:
https://flink.apache.org/contributing/contribute-code.html

On Tue, Nov 16, 2021 at 8:16 AM Ingo Bürk  wrote:

> Hi Joe,
>
> thank you for starting this discussion. Having a common agreement on what
> to expect from a PR for it to be merged is very much a worthwhile goal.
>
> I'm slightly worried about the addition to the PR template. We shouldn't
> make opening PRs even more difficult (unless it adds sufficient benefit).
>
> There are two main benefits to have from using templates: requiring
> information from authors to automate certain feedback, and serving as a
> self-control checklist for contributors.
>
> As it stands, a large number of PRs don't fill out the template, and I
> haven't yet seen anyone not merge a PR over that, so de-facto we are not
> using it for the former.
>
> For the latter purpose of contributors having a checklist for themselves, I
> think the current template is too long already and contains the wrong
> content. Being short here is key if we want anyone to read it, and
> personally I would cut it down significantly to a description and a couple
> of checkboxes.
>
> This isn't exactly the scope of your proposal, but personally I wouldn't
> like to add even more questions that need to be filled out, especially
> since they don't actually need to be filled out. It just creates an
> annoying burden for contributors and is ignored by those who might benefit
> most from reading it anyway.
>
>
> Ingo
>
>
> On Mon, Nov 15, 2021, 22:36 Johannes Moser  wrote:
>
> > Dear Flink Community,
> >
> > We as the release managers of the 1.15 release are suggesting to
> introduce
> > a “Definition of Done".
> >
> > Let me elaborate a bit on the reasons:
> > * During the release process for 1.14 the stability of master was
> > sometimes in a state that made contributing to Apache Flink a bad
> > experience.
> > * Some of the changes that have been contributed seem to be unusable by
> > users because of defects.
> > * Documentation is neglected which also leads to users unable to make use
> > of changes. One of the reasons is, because documentation is often pushed
> to
> > a later state.
> >
> > With this definition of done awareness and sensibility for these aspect
> > should be increased. Both, for the ones who are committing and for the
> ones
> > that are reviewing.
> > We focus on code quality, testing and documentation. A shared
> > understanding is created.
> >
> > The Definition of Done as suggested:
> >
> > -
> > A PR is done and can be merged, when:
> >
> > 1. It is following the code contribution process
> > 2. It is implemented according to the code style and quality guide.
> > 3. If it has user facing changes the documentation has been updated
> > according to the documentation style guide.
> > 4. It is covered by tests.
> > 5. All tests passed.
> > -
> >
> > There are two PRs to illustrate the changes.
> > https://github.com/apache/flink-web/pull/481 <
> > https://github.com/apache/flink-web/pull/481>
> > https://github.com/apache/flink/pull/17801 <
> > https://github.com/apache/flink/pull/17801>
> >
> >
> > It isn’t the goal to make it harder to get changes into Apache Flink. It
> > is rather the opposite of making contributing and using Apache Flink a
> > better experience.
> > By creating awareness a push towards quality and usability should happen.
> >
> > I’m happy to hear your feedback.
> >
> > Best,
> > Joe
>


Re: [DISCUSS] Conventions on assertions to use in tests

2021-11-15 Thread Francesco Guardiani
> It is a bit unfortunate that we have tests that follow different patterns.
This, however, is mainly due to organic growth. I think the community
started with Junit4, then we chose to use Hamcrest because of its better
expressiveness.

That is fine, I'm sorry if my mail felt like a rant :)

> Personally, I don't have a strong preference for which testing tools to
use. The important bit is that we agree as a community, then document the
choice and finally stick to it. So before starting to use assertj, we
should probably align with the folks working on the Junit5 effort first.

As Arvid pointed out, using assertj might help the people working on the
junit5 effort as well, since assertj works seamlessly with junit4, junit5
and even other java testing frameworks.

> But I'm not sure if it's wise to change everything at once also
> from the perspective of less active contributors. We may alleviate that
> pain by providing good guides though. So maybe, we should also include a
> temporal dimension into the discussion.

This is why I'm proposing a convention and not a rewrite of all the tests
at once, which is unfeasible. As you said, we can provide guides, for example
in our contribution guides, explaining our assertion convention, that is,
which assertion library we want to use (assertj or whatever else) and how to
use it. We can then ask contributors to follow that convention when they open
PRs with new tests or when they modify existing ones. Something like this:
https://github.com/apache/flink-web/commit/87c572ccd4e0ae48eeff3eb15ad9847d302e659d
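
As an illustration of what such a convention could describe in practice, here
is a hypothetical before/after sketch (class name and values are invented,
not taken from the guide above or from the Flink code base). The commented-out
lines show the mixed styles we have today, the test body shows the uniform
assertj style contributors would be asked to use:

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

import java.util.Arrays;
import java.util.List;

import org.junit.jupiter.api.Test;

class AssertionConventionExampleTest {

    // Before the convention, the same checks might be spread over JUnit
    // assertions, Hamcrest matchers and hand-rolled try/catch blocks:
    //
    //     assertEquals(3, result.size());            // junit 4/5
    //     assertThat(result, hasItem("table"));      // Hamcrest
    //     try { Integer.parseInt("x"); fail(); } catch (NumberFormatException e) {}

    @Test
    void afterTheConventionOnlyAssertjIsUsed() {
        List<String> result = Arrays.asList("flink", "table", "sql");

        // Collection assertions read as a single fluent chain.
        assertThat(result).hasSize(3).contains("table");

        // Exception assertions replace try/catch + fail() boilerplate.
        assertThatThrownBy(() -> Integer.parseInt("not a number"))
                .isInstanceOf(NumberFormatException.class);
    }
}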

On Fri, Nov 12, 2021 at 5:07 PM Arvid Heise  wrote:

> JUnit5 migration is currently mostly prepared. The rules are being migrated
> [1] and Hang and Qingsheng have migrated most tests in their branch afaik
> (Kudos to them!).
>
> Using assertj would make migration easier as it's independent of the JUnit
> version. But the same can be said about hamcrest, albeit less expressive.
>
> I'm personally in favor of assertj (disclaimer I contributed to the project
> a bit). But I'm not sure if it's wise to change everything at once also
> from the perspective of less active contributors. We may alleviate that
> pain by providing good guides though. So maybe, we should also include a
> temporal dimension into the discussion.
>
> [1] https://github.com/apache/flink/pull/17556
>
> On Fri, Nov 12, 2021 at 3:58 PM Till Rohrmann 
> wrote:
>
> > Thanks for starting this discussion Francesco. I think there is a lot of
> > value in consistency because it makes it a lot easier to navigate and
> > contribute to the code base. The testing tools are definitely one important
> > aspect of consistency.
> >
> > It is a bit unfortunate that we have tests that follow different patterns.
> > This, however, is mainly due to organic growth. I think the community
> > started with Junit4, then we chose to use Hamcrest because of its better
> > expressiveness. Most recently, there was an effort started that aimed at
> > switching over to Junit5 [1, 2]. @Arvid Heise  knows
> > more about the current status.
> >
> > Personally, I don't have a strong preference for which testing tools to
> > use. The important bit is that we agree as a community, then document the
> > choice and finally stick to it. So before starting to use assertj, we
> > should probably align with the folks working on the Junit5 effort first.
> >
> > [1] https://lists.apache.org/thread/jsjvc2cqb91pyh47d4p6olk3c1vxqm3w
> > [2] https://lists.apache.org/thread/d9y5tzcl8wpk6ozmf8575qfzww450jpk
> >
> > Cheers,
> > Till
> >
> > On Fri, Nov 12, 2021 at 3:41 PM David Anderson 
> > wrote:
> >
> >> For what it's worth, I recently rewrote all of the tests in flink-training
> >> to use assertj, removing a mixture of junit4 assertions and hamcrest in the
> >> process. I chose assertj because I found it to be more expressive and made
> >> the tests more readable.
> >>
> >> +1 from me
> >>
> >> David
> >>
> >> On Fri, Nov 12, 2021 at 10:03 AM Francesco Guardiani <
> >> france...@ververica.com> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I wonder if we have a convention on the testing tools (in particular
> >> > assertions) to use in our tests. If not, are modules free to decide on a
> >> > convention on their own?
> >> >
> >> > In case of table, we have a mixed bag of different assertions of all kinds,
> >> > sometimes mixed even in the same test:
> >> >
> >> >- Assertions from junit 4
> >> >- Assertions from junit 5
> >> >- Hamcrest
> >
